zip mime types, when to pick which one - zip

So far for Mime Types for Zip files I've seen:
application/octet-stream
multipart/x-zip
application/zip
application/zip-compressed
application/x-zip-compressed
I guess my question is: which is the "best", and why? Why are there so many choices? I use WinRAR and it doesn't seem to care what the MIME type is, but WinZip seems to only like multipart/x-zip and application/octet-stream. Is there a MIME type I can have all zip files be downloaded as that will work in all programs?
thanks!

The MIME type registered with IANA is application/zip: http://www.iana.org/assignments/media-types/application/zip
WinZip is not a reference implementation, since the ZIP standard was originally developed by PKWARE.

Some facts about MIME types
MIME types follow a format: media-type/subtype-identifier. Example: image/png.
IANA maintains a list of all registered media types and subtypes.
The x- prefix of a subtype-identifier indicates that it is experimental and non-standard (not registered with IANA).
Now about the zip specifically...
application/zip is a standard MIME type for zip files, officially registered with IANA. It seems like a good first choice :)
application/octet-stream is defined in RFC 2045 and 2046: The “octet-stream” subtype is used to indicate that a body contains arbitrary binary data, so the content can be anything, not just zip.
multipart/x-zip - unlike a “discrete” type, the “multipart” type is one which represents a document that's comprised of multiple component parts, each of which may have its own individual MIME type. I suspect that the logic here is that a compressed file consists of multiple files. Thus, zip fits the “multipart” definition. But to me, it looks like overinterpretation: I would expect plain-text delimiters between parts to classify content as multipart. Moreover, it's not registered as a standard.
application/zip-compressed - a non-standard type; the naming violates RFC 2046: A media type value beginning with the characters “X-” is a private value, to be used by consenting systems by mutual agreement. Any format without a rigorous and public definition must be named with an “X-” prefix.
application/x-zip-compressed - some non-standard convention, I'm not sure if there is any significant usage
application/x-zip - some non-standard convention, I'm not sure if there is any significant usage
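If uploads or third-party systems report one of the non-standard names above, one option is to fold them into the registered type before comparing. This is only a minimal sketch: the class name ZipMimeAliases and the method normalize are hypothetical, and the alias set is just the names listed in this thread, not an exhaustive registry. application/octet-stream is deliberately left alone, since it can describe any binary content.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: maps the non-standard ZIP names from this thread
// onto the IANA-registered application/zip.
final class ZipMimeAliases {
    private static final Set<String> ALIASES = Set.of(
            "application/zip-compressed",
            "application/x-zip-compressed",
            "application/x-zip",
            "multipart/x-zip");

    /** Returns application/zip for a known alias, otherwise the lower-cased base type. */
    static String normalize(String reported) {
        if (reported == null) {
            return null;
        }
        // Drop parameters such as "; charset=binary" and compare case-insensitively.
        String base = reported.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return ALIASES.contains(base) ? "application/zip" : base;
    }
}
```

When sending zip files yourself, the simpler route is to always emit application/zip and skip normalization altogether.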

Related

Bouncycastle - how to distinguish attached from detached signature file programmatically

I am working on an application that is used to sign/verify files according to PKCS#7. I am using Bouncy Castle. The problem is that whenever I pass in (to verify!) a file containing a signature, I cannot find a way to distinguish whether the file contains both the signature and the signed data or just a signature. The point is to ask the user to select a second file if the first one contains only a signature (and display an appropriate error).
Is there any way around this problem?
To construct the CMSSignedData (the first time, before you know whether it has encapsulated content), just use the CMSSignedData(byte[]) constructor, where byte[] is the full contents of the file.
Once you have the CMSSignedData instance, then getSignedContent() simply returns null if the content was not encapsulated.
Once you have the basics working, if you are dealing with very large files, you may want to look at CMSSignedDataParser as a more advanced option that will avoid reading in the entire file.

How do you change the MIME type of a file from the terminal?

What I'm looking for is a counterpart to file -I (Darwin; -i on Linux).
For example, given:
$ file -I filename.pdf
filename.pdf: application/octet-stream; charset=binary
I would like to be able to do something like this:
$ [someCommand] filename.pdf application/pdf
The result would be that filename.pdf would then be typed as application/pdf.
The reason for the question is that sometimes web servers use the wrong MIME type, which results in programs refusing to open the file. (Most often text/plain, in my experience.)
I've been searching man, the web and this site for about two and a half hours. Tried everything from hex dumps to xattr to text editors.
Your help would very much be appreciated.
Chris
The thing about MIME types is they're almost entirely fictional.
MIME and HTTP ask us to pretend that all of our files have a piece of metadata identifying the "content type". When we send files around the network, the "content type" metadata goes with them, so nobody ever misinterprets the content of a file.
The truth is this metadata doesn't exist. By the time MIME was invented, it was really too late to convince any OS vendors to adopt a new type system for files. Unix had settled on magic numbers, DOS had settled on 3-letter filename suffixes, and classic MacOS had its creator codes and type codes. (MacOS type codes were closest to the MIME model, since they actually were separate from both the filename and the content. But since they were only 4 letters long, MIME types wouldn't fit in them.)
Nobody stores MIME-compatible content types in their filesystem. When a MIME message composer or HTTP server wants to send a file, it decides the file type in the traditional way (filename suffix and/or magic number) and maps the result to a MIME type.
In contrast to the theory (where MIME eliminates file type guessing), MIME as implemented in practice has moved the "guess file type based on filename suffix and/or magic number" logic from the receiver of the file to the sender. As you have noticed, the sender doesn't usually do a better job than the receiver would have done if forced to figure it out for itself. Frequently in the case of a web server, the server's eagerness to slap a Content-type on a file makes things worse. There's no reason for a web server to know anything about the format of files it serves when it is only being used to distribute them and has no need to interpret their contents.
The file command guesses file type by reading the content and looking for magic numbers and strings. The -I option doesn't change that. It just chooses a different output format.
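To make the "magic number" idea concrete, here is a rough sketch of content-based guessing. It is not how file itself is implemented (its magic database is far larger); the class name MagicSniffer and the method sniff are made up for illustration, and only two well-known signatures are checked: "%PDF-" for PDF and "PK" followed by bytes 3 and 4 for a ZIP local file header.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, deliberately tiny magic-number sniffer.
final class MagicSniffer {
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {'P', 'K', 3, 4}; // ZIP local file header

    /** Guesses a MIME type from the first bytes of the content. */
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, ZIP)) {
            return "application/zip";
        }
        // Nothing recognized: fall back to "arbitrary binary data".
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the filename never enters into it: renaming the file changes nothing, which is why there is no command that "sets" the type.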
To change the Content-Type header that a web server sends for a specific file, you should be looking in your web server's configuration manual. There's nothing you can do to the file itself.
It's a bit of a category mistake to talk about ‘the MIME type of a file’ – ‘files’ don't have MIME types; only octet streams have them (I'm not necessarily disagreeing with @wumpus-q-wumbley's description of MIME types as ‘fictional’, but this is another way of thinking about it).
MIME stands for Multipurpose Internet Mail Extensions, as originally described in RFC 2045, and MIME types were originally intended to describe what a receiver is supposed to do with the bunch of bytes soon to follow down the wire, in the rest of the email message. They were very naturally repurposed in (for example) the HTTP protocol, to let a client understand how it is to interpret the bytes in the HTTP response which this MIME type forms the header of.
The fact that the file command can display a MIME type suggests the further extension of the idea, to act as the key which lets a windowing system look up the name of an application which should be used to open the file.
Thus, if ‘the MIME type of a file’ means anything, it means ‘the MIME type which a web server would prefix to this file if it were to be delivered in response to an HTTP request’ (or something like that). Thought of like that, it's clear that the MIME type is part of the web server's configuration, and not anything intrinsic to the file – a single file might be delivered with various MIME types depending on the URL which retrieves it, and details of the request and configuration. Thus an XHTML file might be delivered as text/html or application/xml or application/octet-stream depending on the details of the HTTP request, the directory the file's located in, or indeed the phase of the moon (the latter would be an unhelpful server configuration).
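As a small illustration of that point, the sketch below serves the same bytes under two URLs with two different Content-Type headers, using the HTTP server bundled with the JDK (com.sun.net.httpserver). The class name SameBytesTwoTypes and the paths /as-html and /as-binary are invented for the example; a real server would take this decision from its configuration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Hypothetical demo: the Content-Type belongs to the response, not to the bytes.
final class SameBytesTwoTypes {
    /** Starts a loopback server on a free port; the caller should stop() it. */
    static HttpServer start(byte[] body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/as-html", ex -> send(ex, "text/html", body));
        server.createContext("/as-binary", ex -> send(ex, "application/octet-stream", body));
        server.start();
        return server;
    }

    private static void send(HttpExchange ex, String contentType, byte[] body) throws IOException {
        // The header is chosen here, by the server, independently of the payload.
        ex.getResponseHeaders().set("Content-Type", contentType);
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(body);
        }
    }
}
```

A browser would render the first URL as a page and offer the second as a download, even though the payload is byte-for-byte identical.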
A web server might have a number of mechanisms for deciding on this MIME type, which might include a lookup table based on any file extension, a .htaccess file, or indeed the output of the file command.
So the answer to your question is: it depends.
If what you want to do is change how a web server delivers this file, then you need to look at either your web server documentation, or the contents of your system's /etc/mime.types file (if your system uses that and if the server is configured to fall back on that).
If what you want to do is to change the application which opens a given (type of) file, then your OS/window-manager documentation should help.
If you need to change the output of the file command specifically, for some other reason, then man file is your friend, and you'll probably need to grub around in the magic numbers file, reasonably carefully.
If you have a PDF and the file --mime-type command reports application/octet-stream rather than application/pdf, your file is corrupted.
PDF readers will open it and ignore the problem, but if you upload the file to a web application, the application will recognize the MIME type as application/octet-stream. Sometimes that is a problem, mainly if you validate the MIME type (I sometimes have this problem in my application).
For a quick fix, rewrite the file with Ghostscript like this:
gs -o new.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress old.pdf

When do browsers send application/octet-stream as Content-Type?

I'm developing a file upload with JSF. The application saves three pieces of data about the file:
Filename
Bytes
Content-Type as submitted by the browser.
My problem is that some files are saved with content type = application/octet-stream even if they are *.doc or *.pdf files.
When does the browser submit such a content type?
I would like to clean up the database, so I need to know when the browser information is incorrect.
Ignore the value sent by the browser. This is indeed dependent on the client platform, browser and configuration used.
If you want full control over content types based on the file extension, then better determine it yourself using ServletContext#getMimeType().
String mimeType = servletContext.getMimeType(filename);
The default MIME types are defined in the web.xml of the servlet container in question. In Tomcat, for example, it's located in /conf/web.xml. You can extend/override it in the webapp's /WEB-INF/web.xml as follows:
<mime-mapping>
<extension>xlsx</extension>
<mime-type>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime-type>
</mime-mapping>
You can also determine the mime type based on the actual file content (because the file extension may not per se be accurate, it can be fooled by the client), but this is a lot of work. Consider using a 3rd party library to do all the work. I've found JMimeMagic useful for this. You can use it as follows:
String mimeType = Magic.getMagicMatch(file, false).getMimeType();
Note that it doesn't detect all MIME types equally reliably. You can also consider a combination of both approaches. E.g. if one returns null or application/octet-stream, use the other. Or if both return a different but "valid" MIME type, prefer the one returned by JMimeMagic.
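The combination rule can be written down as a small pure function. This is a sketch only: MimeResolver and resolve are hypothetical names, and the two arguments stand for whatever the extension-based lookup (for example ServletContext#getMimeType()) and the content-based lookup (for example JMimeMagic) returned, passed in as plain strings so that the snippet has no library dependencies.

```java
// Hypothetical helper implementing "prefer content-based, fall back to extension".
final class MimeResolver {
    private static final String UNKNOWN = "application/octet-stream";

    /**
     * @param byExtension result of the extension lookup, may be null
     * @param byContent   result of the content inspection, may be null
     */
    static String resolve(String byExtension, String byContent) {
        if (isKnown(byContent)) {
            // Also covers the case where both are valid but differ.
            return byContent.trim();
        }
        if (isKnown(byExtension)) {
            return byExtension.trim();
        }
        return UNKNOWN;
    }

    // null, empty and application/octet-stream all count as "no useful answer".
    private static boolean isKnown(String type) {
        if (type == null) {
            return false;
        }
        String t = type.trim();
        return !t.isEmpty() && !UNKNOWN.equalsIgnoreCase(t);
    }
}
```

For example, resolve("application/msword", null) keeps the extension-based answer, while resolve("text/plain", "application/pdf") trusts the content.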
Oh, I almost forgot to add, in JSF you can obtain the ServletContext as follows:
ServletContext servletContext = (ServletContext) FacesContext.getCurrentInstance().getExternalContext().getContext();
Or if you happen to use JSF 2.x already, use ExternalContext#getMimeType() instead.
It depends on the OS, the browser, and how the user has configured them. It's based on the way the browser determines the file type of local files (to display them). On most OS/browser combinations this is based on the file's extension, but on some it may be determined by other means. (eg: on Mac OS)
In any case, you shouldn't really rely on the Content-Type sent by the browser. The best approach would be to actually look at the contents of the file. You could probably also use the filename, but keep in mind that browsers aren't necessarily going to be good about telling you that either (though it's probably still a lot more reliable than the Content-Type they send).

Programmatically determine file types in SharePoint

Is there a way to programmatically determine a file type in SharePoint? I want to limit the types of files that are being uploaded into a document library. I have written an EventReceiver that on ItemAdding conducts the following -
if (!(properties.AfterUrl.Contains(".docx") || properties.AfterUrl.Contains(".pptx") || properties.AfterUrl.Contains(".xlsx") ))
Surely there's a better way to do so?
Blocking file types is only possible at the farm level (through the central admin).
An Event Handler checking the file's extension is the only way to go if you want to be able to administer this at a document library level.
So no, there is no better way of doing this.
If you are really interested in restricting certain types of files, I would recommend going beyond the file's extension or MIME type and inspecting the file's content to determine its nature, which is what IE and Firefox do.
(BTW, there's an IE API whose name I cannot remember right now that gives you the mime type of a file after inspecting it.)

What is the difference between: image/x-citrix-pjpeg and image/pjpeg

Some files are uploaded with a reported MIME type:
image/x-citrix-pjpeg
They are valid jpeg files and I accept them as such.
I was wondering however: why is the MIME type different?
Is there any difference in the format? or was this mimetype invented by some light bulb at citrix for no apparent reason?
Update:
Ok, I did some more searching and testing on this question, and it turns out they're all lying about the MIME type (never trust any info sent by the client, I know).
I've checked a bunch of files with different encodings (created with libjpeg)
Official MIME type for jpeg files: image/jpeg
But some applications (most notably MS Internet Explorer, but also Yahoo! Mail) send jpeg files as image/pjpeg
I thought I knew that pjpeg stood for 'progressive' jpeg. It turns out that progressive/standard encoding has nothing to do with it.
MS Internet Explorer sends out all jpeg files as pjpeg regardless of the contents of the file.
The same goes for Citrix: all jpeg files sent from a Citrix client are reported with the image/x-citrix-pjpeg MIME type.
The files themselves are untouched (identical before and after upload). So it turns out that the difference in MIME type is only an indication of the software used to send the file?
Why would people invent a new MIME type if there is no differences to the file contents?
image/x-citrix-pjpeg seems to be the MIME type sent by images which are exported from a Citrix session.
I haven't come across any format differences between them and regular JPEGs - most image conversion utilities will handle them the same as a regular pjpeg, once the appropriate mime-type rule is added.
It's possible that in a Citrix session there is some internal magic done when managing jpegs which led them to create this mime-type, which they leave on the file when it's exported from their systems, but that's only my guess. As I say, I haven't noticed any actual format differences from the occasional files in this format we receive.
The closest I have come to finding out what this is, is this thread. Hope it helps.
http://forums.citrix.com/message.jspa?messageID=713174
For some reason, when people are running Internet Explorer via Citrix, it changes the mime type for GIF and JPG files.
JPG: image/x-citrix-pjpeg
GIF: image/x-citrix-gif
Based on my testing, PNG files are not affected. I don't know if this is an Internet Explorer issue or Citrix.
It's to do with a feature of Citrix called SpeedBrowse, which intercepts jpegs and gifs in webpages on the [Citrix] server side, so that it can send them whole via ICA (the Citrix remoting protocol) -- this is more efficient than screen-scraping them. As a previous poster suggested, this is implemented by marking the images with a changed mime type.
IIRC it hooks FindMimeFromData in IE to change the mime type on the fly, but this is being applied to uploaded files as well as downloaded ones - surely a bug.
From what I recall, the Progressive JPG format is the one that allows the image to be shown with progressively higher resolution as the download progresses. I am not entirely aware of the details, but if you remember the days of dial-up, some files would show up blurry, then better, and eventually complete as they were downloaded. For this to work, the data needs to be sent in a different order than a JPEG would typically be sent.
The actual data, once you view it, is identical; it is just sent in a different order. The JPEG encoding itself may very well group pixels differently, I forget.
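Whether a given file really is progressive can be checked from its bytes rather than from the reported MIME type. In the JPEG format the frame header marker is SOF0 (0xFFC0) for baseline and SOF2 (0xFFC2) for progressive encoding. The sketch below walks the segments after the SOI marker until it meets one of them. The names JpegKind and detect are made up for illustration, and the parser is simplified: it assumes every marker before the frame header carries a length field and does not handle standalone markers.

```java
// Hypothetical helper: classifies a JPEG by its first start-of-frame marker.
final class JpegKind {
    enum Kind { BASELINE, PROGRESSIVE, OTHER, NOT_JPEG }

    static Kind detect(byte[] d) {
        // Every JPEG starts with the SOI marker FF D8.
        if (d == null || d.length < 4 || (d[0] & 0xFF) != 0xFF || (d[1] & 0xFF) != 0xD8) {
            return Kind.NOT_JPEG;
        }
        int i = 2;
        while (i + 3 < d.length) {
            if ((d[i] & 0xFF) != 0xFF) {
                return Kind.OTHER; // lost sync with the segment structure
            }
            int marker = d[i + 1] & 0xFF;
            if (marker == 0xFF) {
                i++; // fill byte before a marker, skip it
                continue;
            }
            if (marker == 0xC0) {
                return Kind.BASELINE;    // SOF0
            }
            if (marker == 0xC2) {
                return Kind.PROGRESSIVE; // SOF2
            }
            if (marker == 0xDA || marker == 0xD9) {
                return Kind.OTHER; // scan data or end of image reached first
            }
            // Segment length is big-endian and includes its own two bytes.
            int len = ((d[i + 2] & 0xFF) << 8) | (d[i + 3] & 0xFF);
            if (len < 2) {
                return Kind.OTHER;
            }
            i += 2 + len;
        }
        return Kind.OTHER;
    }
}
```

Running something like this over files that arrived as image/pjpeg or image/x-citrix-pjpeg is one way to confirm the asker's observation that the label says nothing about the encoding.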

Resources