While using Linux, I've noticed that many file extensions appear in both capital and small letters, like
myfile.JPG and myfile.jpg
I know the Linux file system is case sensitive, but what's the difference between these two files? And why do they sometimes get saved in capitals and sometimes in lowercase?
I have seen the same for other files too, like
.ttf vs .TTF
Thanks
There is no difference if you name the file myfile.jpg or myfile.JPG or myfile.jpeg. Linux doesn't care.
The extension might be used by some programs running on Linux, and by humans to easily identify the file type, but it doesn't affect the file in any way. You can even call it myfile.dog or leave it without an extension: it would be the same image file, and for Linux it wouldn't make any difference.
If you have an image file and want to tell what kind of image it is, you can use the file command, or, if you have ImageMagick installed, the identify command.
Try renaming some JPEG file to give it a .png extension, then run file image.png: you will see that it is still a JPEG file and that the .png extension is there only to confuse you.
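Roughly speaking, file works by reading the first bytes of the file and comparing them against a database of known "magic" signatures, without looking at the name at all. A minimal Python sketch of that idea, assuming just two signatures (JPEG and PNG) where the real file command uses a far larger database; sniff and SIGNATURES are illustrative names:

```python
# Leading "magic" byte sequences for two formats only (illustrative, not exhaustive).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image data",
    b"\x89PNG\r\n\x1a\n": "PNG image data",
}


def sniff(path: str) -> str:
    """Guess a file's type from its first bytes; the file name is never consulted."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"
```

Calling sniff("image.png") on a renamed JPEG returns "JPEG image data", mirroring what file reports.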
You might find this useful: https://www.quora.com/How-do-Linux-identify-file-types-without-extensions-And-why-cant-Windows-do-so
Related
I am working on a web app project that must block all executable files from being uploaded.
Example: users can upload txt, png, image and video files, but not executable scripts or binaries such as Perl, Python, exe, PHP, .so or .sh files.
If it is a PHP file, I use strstr to search for the "<?php" tag; if this tag is present, it is a PHP file. How can I do the same for other script/executable files?
Edit: Sometimes hackers upload malicious files with a .png or .jpg extension, so what pattern should I check for inside the files?
Rather than writing your own checks, make use of an existing library and block everything that does not register as one of the desired formats.
Most such libraries guess the content type and encoding of a file by looking for certain signatures or magic byte sequences at specific positions within the file.
Other libraries may be more specialised and will for example only identify image or video formats.
https://www.php.net/manual/en/intro.fileinfo.php
https://github.com/ahupp/python-magic
https://docs.python.org/3/library/imghdr.html (note: imghdr was deprecated in Python 3.11 and removed in Python 3.13)
The file program is a command-line tool for identifying file types.
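"Block everything that does not register as a desired format" is an allow-list, not a deny-list: you never need a signature for PHP, Perl or ELF files, because anything unrecognised is refused. A minimal Python sketch of that decision, assuming a hypothetical detect_allowed_type helper and a small hand-written table of image signatures standing in for a real detection library such as fileinfo or python-magic:

```python
# Allow-list of accepted upload formats, keyed by leading magic bytes.
# These few signatures are illustrative; a real deployment would rather rely on libmagic.
ALLOWED_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


class RejectedUpload(ValueError):
    """Raised when an upload does not match any allowed format."""


def detect_allowed_type(data: bytes) -> str:
    """Return the MIME type of an allowed format, or raise RejectedUpload.

    Anything unrecognised (PHP, shell, Perl, ELF, PE, ...) is rejected by default.
    """
    for magic, mime in ALLOWED_SIGNATURES.items():
        if data.startswith(magic):
            return mime
    raise RejectedUpload("upload does not match any allowed format")
```

Checking the header only proves that the file starts like an image; a polyglot file (a valid image with script code appended) can still pass, which is one reason not to stop at this first pass.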
After this first pass, in which you identify and accept only the desired file formats, you should run every file that was not rejected through an antivirus scanner.
Depending on your use case, you may decide to strip the original file name extension, or even the complete file name that was provided during the upload, and assign the MIME type that was detected rather than rely on user-provided properties.
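A minimal sketch of that renaming step, assuming the upload has already passed a content check that produced a MIME type; storage_name and EXTENSION_FOR_MIME are hypothetical names, and the table would have to match whatever formats you allow:

```python
import secrets

# Extension derived from the *detected* MIME type; the client's file name is ignored.
# Hypothetical table matching the allow-list of accepted formats.
EXTENSION_FOR_MIME = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
}


def storage_name(detected_mime: str) -> str:
    """Build a random server-side file name for an accepted upload.

    Raises KeyError for any MIME type that is not allow-listed.
    """
    extension = EXTENSION_FOR_MIME[detected_mime]
    return secrets.token_hex(16) + extension  # 32 random hex chars + extension
```

If the original name is needed for display, it can be kept as data (e.g. in a database column) rather than used as the path on disk.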
Basically, I am trying to get a list of the installed programs in Linux that can open a particular file extension, .jpg for example. If not all of them, at least the default program should be listed.
Linux (the kernel) has no knowledge of any mapping from file types to applications. If you want to use GNOME programs, you can look at https://people.gnome.org/~shaunm/admin-guide/mimetypes-7.html. For KDE there is another mechanism. Each toolkit can define this as it likes, and the programmer can use the defaults or not. So it is simply application-specific!
What do you want to achieve?
If you (double-)click on an icon or file name in a file explorer/browser application, it is the explorer/browser itself that looks up the file type. Typically this is done via a MIME type database. But how a program determines the file type, and whether it then executes another program, is entirely up to the programmer who writes that program. GUI toolkits like GNOME and KDE have a lot of support for this, so you get basic conformity within each family of applications.
If you want to know how an application does the job, start it under strace, but digging through the huge amount of output is quite hard.
You can also take a look at xdg-open; many programs use this helper to start applications. For example, if you start Dolphin with strace, you will find a line like lstat64("/etc/xdg", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 after clicking on a file.
You can run it from the command line with:
xdg-open <file-name>
You may also want to look at the applications that register for file types: /usr/share/applications/*.desktop
Each desktop file lists the MIME types registered for that application. E.g. for Audacity it is:
MimeType=application/x-audacity-project;audio/flac;audio/x-flac;audio/basic;audio/x-aiff;audio/x-wav;application/ogg;audio/x-vorbis+ogg;
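A minimal Python sketch of looking up which desktop files register a given MIME type, assuming the standard /usr/share/applications location (apps_for_mime is a hypothetical helper). Unlike a plain substring grep, it compares whole entries, so image/jpeg does not also match image/jpeg2000; it scans only that one directory, not subdirectories or user-level entries such as ~/.local/share/applications:

```python
from pathlib import Path


def apps_for_mime(mime: str, app_dir: str = "/usr/share/applications") -> list:
    """List .desktop files in app_dir whose [Desktop Entry] MimeType= contains mime."""
    matches = []
    for desktop in sorted(Path(app_dir).glob("*.desktop")):
        section = None
        text = desktop.read_text(encoding="utf-8", errors="replace")
        for raw in text.splitlines():
            line = raw.strip()
            if line.startswith("[") and line.endswith("]"):
                section = line  # e.g. [Desktop Entry] or [Desktop Action new]
            elif section == "[Desktop Entry]" and "=" in line:
                key, _, value = line.partition("=")
                if key.strip() == "MimeType":
                    # The value is a ';'-separated list, usually with a trailing ';'.
                    types = [t.strip() for t in value.split(";") if t.strip()]
                    if mime in types:
                        matches.append(desktop.name)
                    break  # only one MimeType key per entry
    return matches
```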
For your example with jpg:
$ xdg-mime query filetype <any-jpg-file>
image/jpeg
$ grep 'image/jpeg' -R /usr/share/applications/*
...
/usr/share/applications/mimeinfo.cache:image/jpeg2000=kde4-kolourpaint.desktop;gimp.desktop;
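So you can see that gimp is one of the applications registered for JPEG-type images. Note that this cache lists every registered application rather than the default; xdg-mime query default image/jpeg prints the desktop file of the default one.
The place to start looking is the mailcap file (/etc/mailcap) and the MIME-types file, e.g. /etc/mime.types on Debian (the filename and path will vary according to who provides it).
The mailcap file gives rules for opening a file of a given type, while the MIME-types file lists the known file types, each tagged with a MIME type name and its usual extensions, so that multiple applications agree on what the file types are.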
Except for embedded or reduced-functionality systems (such as those based on busybox), you would find these files on almost every UNIX-like system.
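To show what the MIME-types file contains, here is a minimal Python sketch of a parser for the /etc/mime.types format (parse_mime_types and mime_type_for are hypothetical helpers; in practice Python's standard mimetypes module already reads /etc/mime.types, which is listed in mimetypes.knownfiles). Lowercasing the extension before lookup also means myfile.JPG and myfile.jpg resolve to the same type:

```python
import os
from typing import Dict, Optional


def parse_mime_types(path: str = "/etc/mime.types") -> Dict[str, str]:
    """Parse a mime.types-style file into {extension: MIME type}.

    Each non-comment line is "type/subtype ext1 ext2 ...".
    """
    table: Dict[str, str] = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            mime, *extensions = line.split()
            for ext in extensions:
                table.setdefault(ext.lower(), mime)  # first definition wins
    return table


def mime_type_for(filename: str, table: Dict[str, str]) -> Optional[str]:
    """Look up a file name's extension case-insensitively (so .JPG == .jpg)."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(ext) if ext else None
```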
I use various 3rd party libraries to convert files on my Linux server. For instance, ImageMagick/convert for images, libreoffice3.5/convert-to for Microsoft Office documents, etc.
Is it possible that these applications require the pre-converted file to have the proper extension for the type of file? For instance, if the file was a png file, it would need to be called whatever.png and not just whatever.
Thank you
Your question sounds general, and in general Linux apps do not require extensions. Bash will happily execute a .png file containing shell commands, and vi will open a text file called a.exe. Extensions are not really a Unix/Linux concept to begin with; . is just an allowed character in the file name.
That being said, some particular applications may interpret, or even require, correct extensions.
Given a text file on Ubuntu (or Debian in general), how do I find out its encoding? Can I run od or hexdump on it to fingerprint the encoding? What should I be looking for?
There are many tools to do this. Try a web search for "detect encoding". Here are some of the tools I found:
The International Components for Unicode (ICU) are a great place to start. See especially their page on Character Set Detection.
Chardet is a Python module to guess the encoding of a file. See chardet.feedparser.org
The *nix command-line tool file detects file types, and for text files it also makes a heuristic guess at the encoding (file --mime-encoding FILENAME prints just that guess). See man file
Perl modules Encode::Detect and Encode::Guess .
Someone asked a similar question in StackOverflow. Search for the question, PHP: Detect encoding and make everything UTF-8. That's in the context of fetching files from the net and using PHP, but you could write a command-line PHP script.
Note well what the ICU page says about character set detection: "Character set detection is ..., at best, an imprecise operation using statistics and heuristics...." In my experience the problem domain makes a big difference in how easy or difficult the job is. Don't forget that it's possible that the octets in a file can be of ambiguous encoding, i.e. sensibly interpreted using multiple different encodings. They can also be of mixed encoding, i.e. different subsets of the octets make sense interpreted in different encodings. This is why there's not a single command-line tool I can recommend which always does the job.
If you have a single file and you just want to get it into a known encoding, my trick is to open the file with a text editor which can import using a bunch of different encodings, such as TextWrangler or OpenOffice.org. First, open the file and let the editor guess the encoding. Take a look at the result. If you aren't satisfied with it, guess an encoding, open the file with the editor specifying that encoding, and take a look at the result. Then save as a known encoding, e.g. UTF-16.
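The editor trick amounts to: try a few plausible encodings, look at the results, then re-encode. A minimal Python sketch of the mechanical part, assuming a short hand-picked candidate list (decodes_cleanly and convert are hypothetical helper names); choosing among the encodings that survive is still up to you, since latin-1 in particular accepts every possible byte sequence:

```python
def decodes_cleanly(data: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> list:
    """Return the candidate encodings under which data decodes without error.

    Several encodings can succeed for the same bytes, so a successful decode
    only keeps an encoding in the running; it does not prove it is the right one.
    """
    ok = []
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        ok.append(encoding)
    return ok


def convert(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes once you have settled on their source encoding."""
    return data.decode(source).encode(target)
```

For example, the bytes b"caf\xe9" are rejected by UTF-8 (0xE9 starts a multi-byte sequence that never completes) but decode as "café" under both cp1252 and latin-1.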
You can use enca. Enca is a small command-line tool for encoding detection and conversion.
You can install it on Debian/Ubuntu with:
apt-get install enca
In order to use it, just call
enca FILENAME
Also see the manpage for more information.
I want to distribute a cross-platform application for which the executable file is slightly different, depending on the user who downloaded it. This is done by having a placeholder string somewhere in the executable that is replaced with something user-specific upon download.
The webserver that has to do these string replacements is a Linux machine. For Windows, the executable is not compressed in the installer .exe, so the string replacement is easy.
For uncompressed Mac OS X .dmg files, this is also easy. However, .dmg files that are compressed with either gzip or bzip2 are not so easy. For example, in the latter case, the compressed .dmg is not one big bzip2-compressed disk image, but instead consists of a few different bzip2-compressed parts (with different block sizes) and a plist suffix. Also, decompressing and recompressing the different parts with bzip2 does not result in the original data, so I'm guessing Apple uses some different parameters to bzip2 than the command-line tool.
Is there a way to generate a compressed .dmg from an uncompressed one on Linux (which does not have hdiutil)? Or maybe another suggestion for creating customized applications without pregenerating them? It should work without any input by the user.
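For the uncompressed cases the question describes, the replacement itself is simple as long as the length never changes. A minimal Python sketch for the Linux web server, assuming a hypothetical fixed-size placeholder reserved in the binary and a program that reads the value up to the first NUL byte (both are assumptions, not part of the question); patch_placeholder is an illustrative name:

```python
def patch_placeholder(binary: bytes, placeholder: bytes, value: bytes) -> bytes:
    """Replace a unique placeholder with a user-specific value of equal length.

    The value is NUL-padded to the placeholder's length so that no offsets in
    the executable move.
    """
    if len(value) > len(placeholder):
        raise ValueError("value is longer than the reserved placeholder")
    if binary.count(placeholder) != 1:
        raise ValueError("placeholder must occur exactly once")
    return binary.replace(placeholder, value.ljust(len(placeholder), b"\0"))
```

Any code signature covering those bytes would no longer verify after patching.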
I realize that I'm a bit too late here, but we wanted to do exactly the same thing and got it to work using libdmg. https://github.com/planetbeing/libdmg-hfsplus
Basically, you can use libdmg to unpack a dmg file to an uncompressed file containing an HFS+ file system, modify the files inside that file system, and then put it back together as a dmg file with the correct checksums.
If you use any fancy dmg features, like showing an EULA before the image is mounted, then these will not survive the process. Background images and so on work, though.
If your web server and client support the gzip encoding, then you can deal with uncompressed files on the server, but have them compressed / decompressed on the fly by the web server / web client respectively.
e.g. Apache's mod_gzip.
Otherwise maybe you can split your dmg into 3 parts:
the stuff before what you want to replace
the string you want to replace
the stuff after what you want to replace
If the gzip stream is splittable at those points, you could just concatenate the front and back onto the gzipped string you want to replace. That would let you generate it on the fly.
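The splitting idea works at the gzip level because a gzip file may consist of several members concatenated together, and a conforming decompressor (gunzip, Python's gzip.decompress) outputs their contents back to back. A minimal Python sketch, assuming a plain gzip stream (precompress and assemble are hypothetical names). As the question notes, a compressed .dmg is made of separately compressed chunks plus a plist, so whether a .dmg consumer accepts this layout is a separate question not settled here:

```python
import gzip


def precompress(prefix: bytes, suffix: bytes) -> tuple:
    """Compress the unchanging parts once, ahead of time (mtime=0 for stable output)."""
    return gzip.compress(prefix, mtime=0), gzip.compress(suffix, mtime=0)


def assemble(gz_prefix: bytes, middle: bytes, gz_suffix: bytes) -> bytes:
    """Build a multi-member gzip stream; only the user-specific middle is compressed per request."""
    return gz_prefix + gzip.compress(middle, mtime=0) + gz_suffix
```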
Release a normal, read-only, compressed dmg. Then bundle your app in a package installer with a pre-flight script that sets the variables you need.