I am working on a web app project that needs to block all executable files from being uploaded.
For example, users may upload txt, png and other image and video files, but not executables or scripts such as Perl, Python, .exe, PHP, .so or .sh files.
For PHP files I use strstr to look for the "<?php" tag: if the tag is present, the file is PHP. How can I detect other script and executable files in the same way?
Edit: Sometimes attackers upload malicious files with a .png or .jpg extension, so what pattern should I look for inside the files?
Rather than writing your own checks, use an existing library and block everything that is not identified as one of the desired formats.
Most such libraries guess the content type and encoding of a file by looking for certain signatures or magic byte sequences at specific positions within the file.
Other libraries may be more specialised and will for example only identify image or video formats.
https://www.php.net/manual/en/intro.fileinfo.php
https://github.com/ahupp/python-magic
https://docs.python.org/3/library/imghdr.html
The file program is a command-line tool for identifying file types.
After this first pass, in which you identify and accept only the desired file formats, run every file that was not rejected through an antivirus scanner.
Depending on your use case, you may decide to strip the original file name extension, or even discard the complete file name provided during the upload, and assign the MIME type that was detected rather than relying on user-provided properties.
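A minimal sketch of the allowlist approach, in Python with only the standard library. To stay self-contained it checks a small hand-written table of magic bytes instead of calling python-magic or fileinfo; the table (PNG, JPEG, GIF, MP4) and the function name `check_upload` are illustrative assumptions, and a real deployment should use one of the libraries above. Anything that does not match is rejected, and the stored name is generated from the detected type rather than taken from the upload. Plain text has no magic number, so accepting .txt files would need a separate check that is not shown here.

```python
import uuid

# Hypothetical allowlist: MIME type -> (offset, signature bytes, extension).
# Only a few formats are covered; this is an illustration, not a complete table.
ALLOWED_SIGNATURES = {
    "image/png": (0, b"\x89PNG\r\n\x1a\n", ".png"),
    "image/jpeg": (0, b"\xff\xd8\xff", ".jpg"),
    "image/gif": (0, b"GIF8", ".gif"),
    # MP4 and other ISO base media files carry "ftyp" at byte offset 4.
    "video/mp4": (4, b"ftyp", ".mp4"),
}


def check_upload(data: bytes):
    """Return (mime_type, safe_name) for allowed content, or None to reject.

    The user-supplied file name and extension are deliberately ignored.
    """
    for mime, (offset, sig, ext) in ALLOWED_SIGNATURES.items():
        if data[offset:offset + len(sig)] == sig:
            # Generate a fresh name so nothing user-controlled reaches disk.
            return mime, uuid.uuid4().hex + ext
    return None  # not one of the desired formats: reject
```

Checking the first bytes only shows that a file starts like a PNG or JPEG; data appended after a valid header can still slip through, which is why the antivirus pass and the renaming described above still matter.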
Related
My friends,
I have a question: why were file extensions created?
I found this quote on Wikipedia:
"They are commonly used to imply information about the way data might be stored in the file"
What does it mean?
A file extension is an identifier that tells the operating system what kind of data and file type it is working with, and which associated program opens the file.
If you have a file with an .apk extension, the system can easily recognize it as an application package. If it is an .mp4, it is some kind of multimedia file and can be opened with multimedia applications.
Extensions are commonly used to imply information about the way data might be stored in the file. For example, a plain text file usually has the .txt extension, while an HTML document uses .html; the two store their data differently (plain text versus HTML markup).
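As an illustration of an extension "implying" the type, Python's standard `mimetypes` module maps a name to a MIME type from its extension alone, without ever opening the file. A minimal sketch; the helper name `guess_by_name` is hypothetical, and a `MimeTypes()` instance is used so that, in recent Python versions, only Python's built-in table is consulted rather than system files such as /etc/mime.types.

```python
import mimetypes

# A MimeTypes() instance starts from Python's built-in extension table
# (in recent Python versions), independent of system mime.types files.
_guesser = mimetypes.MimeTypes()


def guess_by_name(name: str):
    """Guess a MIME type from the file name's extension alone.

    The file is never opened, so the guess is only as trustworthy
    as the name itself.
    """
    mime, _encoding = _guesser.guess_type(name)
    return mime


for name in ["notes.txt", "index.html", "clip.mp4", "PHOTO.JPG", "README"]:
    print(name, "->", guess_by_name(name))
```

The extension lookup falls back to lower case, so PHOTO.JPG and photo.jpg get the same guess, and a name without an extension gets no guess at all (None).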
I'm looking to download a file from a website remotely and detect its type. The URL looks like this:
http://examplewebsite.com/100/download
When I open it in my browser, it automatically downloads with the appropriate file type, e.g. 100.pdf, but sometimes it can be a .xls or .doc file, etc.
Looking at the available libraries, file-type for example seems to work only if you already have the extension.
Is this possible?
If the URL contains the file name, you can split it on '.' and take the last element of the resulting list.
The file-type library you linked in your question actually inspects the file's contents (its leading bytes) to guess the type. It doesn't use the file extension at all.
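A hedged sketch for a URL that has no extension, assuming the server behaves like typical download endpoints: first look for a file name in the `Content-Disposition` response header (typically how a browser arrives at a name like 100.pdf), and otherwise fall back to the leading bytes of the body, which is the same idea file-type uses. The helper name `guess_download_type` and the tiny signature table are illustrative assumptions. Legacy .doc and .xls files share the OLE2 container signature, so magic bytes alone cannot tell them apart.

```python
from email.message import Message

# Illustrative signatures only; real libraries use much larger tables.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    # Legacy .doc/.xls/.ppt all use the OLE2 compound file header.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (doc/xls/ppt)"),
    # .docx/.xlsx are ZIP archives, as are many other formats.
    (b"PK\x03\x04", "zip (docx/xlsx/...)"),
]


def guess_download_type(headers, head: bytes) -> str:
    """Guess a downloaded file's type from response headers, then content.

    headers: any mapping with .get() (e.g. a dict, or resp.headers from urllib).
    head: the first bytes of the response body.
    """
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()  # parses the filename="..." parameter
        if filename and "." in filename:
            return filename.rsplit(".", 1)[1].lower()
    for signature, kind in SIGNATURES:
        if head.startswith(signature):
            return kind
    return "unknown"
```

With `urllib.request.urlopen(url) as resp`, passing `resp.headers` (whose lookups are case-insensitive) and `resp.read(16)` should fit this signature; with a plain dict, the header key must match the case used above.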
While using a Linux system, I noticed that many file extensions appear in upper case as well as lower case, like
myfile.JPG and myfile.jpg
I know the Linux file system is case-sensitive, but what is the difference between these two files? And why are they sometimes saved in upper case and sometimes in lower case?
I have seen the same for other files too, like
.ttf vs .TTF
Thanks
There is no difference if you name the file myfile.jpg or myfile.JPG or myfile.jpeg. Linux doesn't care.
The extension might be used by some programs running on Linux, and by humans, to easily identify the file type, but it doesn't affect the file in any way. You could even call it myfile.dog or leave out the extension entirely: it would still be the same image file, and for Linux it wouldn't make any difference.
If you have an image file and want to know what kind of image it is, you can use the file command or, if you have ImageMagick installed, the identify command.
Try renaming a JPEG file to give it a .png extension, then run file image.png: you will see that it is still a JPEG file and that the .png extension is only there to confuse you.
You might find this useful: https://www.quora.com/How-do-Linux-identify-file-types-without-extensions-And-why-cant-Windows-do-so
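The renaming experiment can be reproduced in Python as a small sketch: write bytes that begin with a JPEG signature to a file called image.png, then compare what the name suggests with what the first bytes say. The helper `name_vs_content` is hypothetical, the three-byte JPEG check is a simplification of what `file` does, and the written content is a fake header rather than a real image.

```python
import mimetypes
import os
import tempfile

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these bytes


def name_vs_content(path: str):
    """Return (type implied by the name, type implied by the first bytes)."""
    by_name, _encoding = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        head = f.read(3)
    by_content = "image/jpeg" if head == JPEG_MAGIC else "unknown"
    return by_name, by_content


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.png")
    # Fake JPEG header saved under a .png name (no real image data).
    with open(path, "wb") as f:
        f.write(JPEG_MAGIC + b"\xe0" + b"\x00" * 16)
    print(name_vs_content(path))  # the name says PNG, the bytes say JPEG
```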
I would like to convert a file to .dat. Here is my question:
I have a file, e.g. ABC, that doesn't have an extension (when I open its properties, it says "Type of file: file"). I want to convert this file to a .dat file by writing a Unix script.
Linux (and Unix) do not use the file extension to define the type of a file, though some programs do use the extension as a guideline. Unix/Linux examines the file's magic number (its first bytes) to determine the file type; the 'file' program is the best explanation of how this is done: it runs three sets of tests (filesystem tests, magic tests and language tests), and the first one that succeeds determines the file type.
Windows makes heavy use of the file extension to determine file type, and keeps metadata that maps each file extension to the application(s) that understand the file.
Linux/Unix uses the file's magic number and an examination of the file's first line, and treats the file extension as a hint at the file type (for humans and some programs).
macOS tracks file metadata using the extension, a file type code and a creator code (metadata kept apart from the file name). However, OS X is derived from a Unix-like OS, so many of the Linux/Unix notes also apply.
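To make the ordering described for `file` concrete, here is a heavily simplified Python imitation: filesystem tests first, then a few magic-number tests (including the `#!` that begins an interpreter script), then a crude text heuristic standing in for the language tests. The function `classify`, its labels and the four-entry magic table are hypothetical; the real `file` uses a large magic database and more elaborate language tests.

```python
import os

# A few well-known magic numbers (a tiny subset of file's database).
MAGIC = [
    (b"\x7fELF", "ELF binary"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def classify(path: str) -> str:
    """Rough imitation of file(1): filesystem, then magic, then text tests."""
    # 1. Filesystem tests: what kind of entry this is, before reading it.
    if os.path.isdir(path):
        return "directory"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(512)
    # 2. Magic tests: fixed byte sequences at the start of the file,
    #    including the "#!" line that begins an interpreter script.
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    if head.startswith(b"#!"):
        first_line = head[2:].split(b"\n", 1)[0]
        return "script, interpreter " + first_line.strip().decode("ascii", "replace")
    # 3. Stand-in for the language tests: treat NUL-free content as text.
    if b"\x00" not in head:
        return "text"
    return "data"
```

None of these tests look at the name, so renaming ABC to ABC.dat (for example with mv ABC ABC.dat) would not change what classify, or file, reports.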
I want to distribute a cross-platform application for which the executable file is slightly different, depending on the user who downloaded it. This is done by having a placeholder string somewhere in the executable that is replaced with something user-specific upon download.
The webserver that has to do these string replacements is a Linux machine. For Windows, the executable is not compressed in the installer .exe, so the string replacement is easy.
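A minimal sketch of the placeholder replacement for an uncompressed executable, assuming a fixed-length marker was compiled into the binary. The marker value and the function name `personalize` are hypothetical. Padding the replacement to exactly the marker's length keeps the file size and every later byte offset unchanged, which matters because executables refer to their data by offset.

```python
PLACEHOLDER = b"@@USER_TOKEN_PLACEHOLDER_0000000000@@"  # hypothetical marker


def personalize(binary: bytes, user_value: str) -> bytes:
    """Replace the placeholder with a user-specific value of the same length."""
    value = user_value.encode("utf-8")
    if len(value) > len(PLACEHOLDER):
        raise ValueError("user value is longer than the placeholder")
    if binary.count(PLACEHOLDER) != 1:
        raise ValueError("placeholder must occur exactly once")
    # Pad with NUL bytes so size and offsets stay the same; this assumes the
    # program reads the value as a NUL-terminated string.
    padded = value.ljust(len(PLACEHOLDER), b"\x00")
    return binary.replace(PLACEHOLDER, padded)
```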
For uncompressed Mac OS X .dmg files, this is also easy. However, .dmg files that are compressed with either gzip or bzip2 are not so easy. For example, in the latter case, the compressed .dmg is not one big bzip2-compressed disk image, but instead consists of a few different bzip2-compressed parts (with different block sizes) and a plist suffix. Also, decompressing and recompressing the different parts with bzip2 does not result in the original data, so I'm guessing Apple uses some different parameters to bzip2 than the command-line tool.
Is there a way to generate a compressed .dmg from an uncompressed one on Linux (which does not have hdiutil)? Or maybe another suggestion for creating customized applications without pregenerating them? It should work without any input by the user.
I realize that I'm a bit too late here, but we wanted to do exactly the same thing and got it to work using libdmg. https://github.com/planetbeing/libdmg-hfsplus
Basically, you can use libdmg to unpack a dmg file into an uncompressed file containing an HFS+ file system, modify the files inside the HFS+ file system, and then put it back together as a dmg file with the correct checksums.
If you use any fancy dmg features, like showing an EULA before the image is mounted, then these will not survive the process. Background images and so on work, though.
If your web server and client support gzip encoding, you can keep uncompressed files on the server and have them compressed and decompressed on the fly by the web server and web client respectively.
e.g. Apache's mod_gzip.
Otherwise maybe you can split your dmg into 3 parts:
the stuff before what you want to replace
the string you want to replace
the stuff after what you want to replace
If the gzip stream is splittable at those points, you could just concatenate the front and back parts, compressed once in advance, around the gzipped replacement string. That would let you generate it on the fly.
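The concatenation idea relies on a property of the gzip format: a gzip file may consist of several independently compressed members back to back, and decompressing it yields the concatenation of their contents. A minimal Python sketch under that assumption, where the prefix and suffix are compressed once and only the user-specific middle part is compressed per download. The byte strings and the function name `build_download` are hypothetical, and whether Apple's dmg tooling accepts such a multi-member stream at that position is exactly the open question in the answer.

```python
import gzip

# Hypothetical split of the uncompressed image around the placeholder.
prefix = b"HEADER BYTES BEFORE THE PLACEHOLDER "
suffix = b" TRAILING BYTES AFTER THE PLACEHOLDER"

# Compressed once, ahead of time, and reused for every download.
prefix_gz = gzip.compress(prefix)
suffix_gz = gzip.compress(suffix)


def build_download(user_value: bytes) -> bytes:
    """Join three gzip members; only the middle one is created per request."""
    return prefix_gz + gzip.compress(user_value) + suffix_gz


blob = build_download(b"user-specific string")
# gzip.decompress handles multi-member input and returns the joined contents.
assert gzip.decompress(blob) == prefix + b"user-specific string" + suffix
```

Python's bz2.decompress likewise accepts concatenated bzip2 streams, which is the same property the bzip2 case would need.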
Release a normal, read-only, compressed dmg. Then bundle your app in a package installer with a pre-flight script that sets the variables you need.