What is Google Chrome's "Uncommon Download" warning based on? - security

I understand that Chrome's "Uncommon Download" warning is broadly based on how common a download is, but what are the specific conditions?
Is "commonness" measured, or is it a heuristic (e.g., "ZIP files are always considered uncommon")?
If it is measured, is it based on the domain, the protocol (e.g., http or a data URL), or the full specific URL?
It's clear that file or content type is one of the factors. On the same website, I've seen a ZIP file trigger the warning, whereas a PNG or JPG doesn't. Is there any way to keep it from triggering for a ZIP file that is created and saved through JSZip?

Related

Retrieve contents of a ZIP file on SharePoint without downloading it

I have written some automated code that checks a SharePoint site and looks for a ZIP file (let's call it doc.zip). If doc.zip is found, the code downloads it and then checks it for a file (say, target.docx). doc.zip is about 300 MB, so I only want to download it when necessary.
Given that SharePoint has some ZIP search capability, I would like to know whether it is possible to write code using CSOM (C#) that finds doc.zip and then retrieves the contents of doc.zip without downloading it.
Just to reiterate: I am comfortable with searching for files in a folder on SP, downloading a file, and unpacking ZIP entries. What I need is to retrieve a ZIP file's contents on SP without downloading it.
For example, is there an SP command along these lines:
cxt.Load(SomeZipFileQuery);
cxt.ExecuteQuery();
Thanks in advance.
This capability is not available. I do like the idea, though: being able to "parse" ZIP files on the server side and then download only the relevant bits would be ideal. Perhaps raise this on UserVoice to see if others also find it useful: https://sharepoint.uservoice.com
Ok, I have proven yet again that stubbornness will prevail.
I have figured out that if I use the /_api/search?query='myfile.zip' web REST API to search for my file, this search will also match ZIP files that contain the file I need. And it works perfectly.
Of course, there is the added pain of parsing an XML response, but it works very nicely for my code example.
At least if someone is looking for this solution, here it is. I won't bore anyone with code, as /_api/search has probably been done to death on other threads already.
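For readers who do want a starting point, here is a minimal Python sketch of the two pieces the answer mentions: building the search URL and pulling values out of the XML response. It assumes the documented `/_api/search/query?querytext='...'` form of the endpoint (the answer abbreviates it as `/_api/search?query=...`), and it assumes the verbose OData XML layout in which each result cell is an element with `Key` and `Value` children in the `http://schemas.microsoft.com/ado/2007/08/dataservices` namespace. Authentication is left out, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote

# Namespace of the "d:" elements in a verbose OData XML response (assumption).
D_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices"


def build_search_url(site_url: str, file_name: str) -> str:
    """Build a SharePoint search REST URL for a file name (hypothetical helper)."""
    # KQL query text is wrapped in single quotes; embedded quotes are doubled.
    kql = "'" + file_name.replace("'", "''") + "'"
    return site_url.rstrip("/") + "/_api/search/query?querytext=" + quote(kql, safe="")


def extract_cell_values(xml_text: str, key: str) -> List[str]:
    """Return the Value of every result cell whose Key equals `key`."""
    root = ET.fromstring(xml_text)
    values = []
    # Row and cell entries are both d:element; only cells have Key/Value children.
    for cell in root.iter(f"{{{D_NS}}}element"):
        k = cell.find(f"{{{D_NS}}}Key")
        v = cell.find(f"{{{D_NS}}}Value")
        if k is not None and v is not None and k.text == key:
            values.append(v.text or "")
    return values
```

Asking for the `Path` key would then give the URLs of matching items, including any ZIP files whose indexed contents matched the query.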

How to use ImageMagick to test if received input is an image (for security purposes)?

Imagine an environment in which users can upload images to a website by either uploading it from their pc or referring to a remote url.
As part of some security checks I'd like to make sure that the referenced object is indeed an image.
In the case of a remote-url, I of course check the content-type, but this isn't bullet-proof.
I figured I could use ImageMagick for the task: perhaps execute the ImageMagick.identify() method and, if no error is returned and the returned type is JPG, GIF, etc., treat the content as an image. (In a quick check I noticed that TXT files are identified successfully as well, so I have to blacklist those.)
Is there any better way in doing this?
You could probably simply load the image via ImageMagick's appropriate function for your language of choice. If the image isn't formatted properly (in terms of internal formatting, not its aesthetic properties, that is), I would expect ImageMagick to refuse to load it and report an error. In PHP, for example, Imagick::readImage throws an ImagickException if the image fails to load.
Alternatively, you could read the first few hundred bytes of the file and determine if the expected image file format headers are present; e.g., "GIF89" etc.
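As a sketch of the header check, here is a minimal Python version. For the common formats the signature fits in the first eight bytes: GIF files start with `GIF87a` or `GIF89a`, PNG files with `\x89PNG\r\n\x1a\n`, and JPEG files with `\xff\xd8\xff`. The function names and the 16-byte read are assumptions. A matching header only shows that the file claims to be an image; it says nothing about whether the rest of the file is well formed.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_type(head: bytes) -> Optional[str]:
    """Return the image type implied by the first bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def sniff_file(path: str) -> Optional[str]:
    # A handful of bytes is enough for the signatures above.
    with open(path, "rb") as f:
        return sniff_image_type(f.read(16))
```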
These checks may backfire if your image is in a compressible format (PNG, GIF) and is constructed like a zip bomb: https://en.wikipedia.org/wiki/Zip_bomb
Some examples at ftp://ftp.aerasec.de/pub/advisories/decompressionbombs/pictures/ (nothing special about that site, I just googled decompression bombs)
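One common mitigation is to read the declared dimensions from the header and refuse to decode anything whose pixel count is unreasonable, before any decompression happens. Below is a minimal Python sketch for PNG, whose IHDR chunk always comes first: 8 signature bytes, a 4-byte chunk length, the bytes `IHDR`, then width and height as big-endian 32-bit integers. The `MAX_PIXELS` limit is a hypothetical value, and GIF or other formats would need their own header parsing.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_PIXELS = 50_000_000  # hypothetical limit; tune for your application


def png_dimensions(head: bytes) -> Tuple[int, int]:
    """Read width and height from the IHDR chunk at the start of a PNG."""
    # Layout: signature (0-7), chunk length (8-11), b"IHDR" (12-15),
    # width (16-19), height (20-23).
    if len(head) < 24 or not head.startswith(PNG_SIGNATURE) or head[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", head[16:24])
    return width, height


def png_is_acceptable(head: bytes, max_pixels: int = MAX_PIXELS) -> bool:
    """Reject PNGs whose declared size would decompress to too many pixels."""
    width, height = png_dimensions(head)
    return 0 < width * height <= max_pixels
```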
Another related issue is that formats like SVG are in fact XML and some image processing tools are prone to a variant of "billion laughs" attack https://en.wikipedia.org/wiki/Billion_laughs
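A blunt but effective guard, assuming your SVG uploads never legitimately need a DOCTYPE, is to refuse any document that declares one, because the "billion laughs" entity definitions have to live in a DOCTYPE. The sketch below uses the standard library's `xml.parsers.expat`; an exception raised inside a handler propagates out of `Parse`. The names `UnsafeXMLError` and `check_svg` are hypothetical, and the third-party defusedxml package is a more thorough alternative.

```python
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Raised when an uploaded XML/SVG document contains a DOCTYPE."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Entity declarations (needed for "billion laughs") live in a DOCTYPE.
    raise UnsafeXMLError("DOCTYPE declarations are not allowed")


def check_svg(data: bytes) -> None:
    """Parse SVG/XML bytes, refusing any document that declares a DOCTYPE.

    Malformed XML raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = _reject_doctype
    parser.Parse(data, True)
```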
You should not store the original file. The generally recommended approach is to always re-process the image and convert it to an entirely new file. Vulnerabilities have been exploited inside valid image files (see GIFAR), so checking validity alone would have been useless.
Never expose your visitors to an image file that you have not written out yourself and for which you did not choose the file name yourself.

Programmatically downloading a large number of <insert file type here>

I'm wondering if there's an easy way to download a large number of files of one arbitrary type, e.g., downloading 10,000 XML files. In the past, I've used Bing's API. It's free and offers unlimited queries. However, it doesn't index as many types of files as Google does. Google indexes XML files, CSV files, and KML files. (These can all be found by doing searches like "filetype:XML".) As far as I know, Bing doesn't index these in a way that's easily searchable. Is there another API that has these capabilities?
How about using wget? You can give wget a URL (for example, a google search result) and tell it to follow all the links on that page and download them (I bet you could also give it a filter).
Just tried it and got an ERROR 403: Forbidden. Apparently Google blocks requests from Wget. You'll have to provide a different user agent. Quick search provided this example:
http://www.mail-archive.com/wget#sunsite.dk/msg06564.html
Then it worked with the example given.
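The same idea as the wget approach can be written with the Python standard library: fetch a page with an explicit User-Agent header and collect the links that end in the wanted extension. This is a minimal sketch; the class and function names and the `Mozilla/5.0` User-Agent string are assumptions. It follows links only one level deep, the way a filtered, single-page wget run would.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect <a href> targets that end with a given extension."""

    def __init__(self, base_url: str, extension: str):
        super().__init__()
        self.base_url = base_url
        self.extension = extension.lower()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(self.extension):
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def find_links(html: str, base_url: str, extension: str) -> List[str]:
    collector = LinkCollector(base_url, extension)
    collector.feed(html)
    return collector.links


def fetch(url: str, user_agent: str = "Mozilla/5.0") -> bytes:
    # Send an explicit User-Agent, since some sites reject default client strings.
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return resp.read()
```

Usage would be `find_links(fetch(page_url).decode("utf-8", "replace"), page_url, ".xml")`, followed by a `fetch` of each returned URL.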

What security issues do we take on if we publish a form that lets users upload any type of file into our database?

I am trying to assess our security risk if we allow a form on our public website that lets users upload any type of file and stores it in the database.
I am worried about the following:
Robots uploading information
A huge increase in the size of the database
The form is for résumé uploads, so HR staff will be downloading files they expect to be in JPEG, DOC, or PDF format but may actually get a virus.
You can use captchas for dealing with robots
Set a reasonable file size limit for each upload
You can apply multiple checks in your file upload control.
1) Check the file extension (.wmv, .exe, .doc). This can be implemented with a regular expression.
2) Actually check the file header or type definition (e.g., GIF, Word, image, XLS). Sometimes the file extension alone is not sufficient.
3) Limit the file size (e.g., 20 MB).
4) Never accept the file name provided by the user. Always rename the file to a GUID according to your specifications. This way an attacker won't be able to predict the actual name of the file stored on the server.
5) Store all files outside the web virtual directory, preferably on a separate file server.
6) Also implement a CAPTCHA for file uploads.
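Steps 1, 3 and 4 can be combined into one small function. This is a minimal Python sketch, assuming a hypothetical allow-list of résumé formats and the 20 MB limit from the list. It renames every file to a random GUID and keeps only the validated extension. The header check from step 2 could reuse a magic-byte sniffer, and step 5 means `storage_dir` should sit outside the web root.

```python
import re
import uuid
from pathlib import Path

# Hypothetical policy values; adjust for your application.
ALLOWED_EXTENSIONS = re.compile(r"\.(pdf|docx?|jpe?g)$", re.IGNORECASE)
MAX_BYTES = 20 * 1024 * 1024  # the 20 MB example limit


def accept_upload(original_name: str, data: bytes, storage_dir: Path) -> Path:
    """Validate an upload and store it under a server-chosen name."""
    match = ALLOWED_EXTENSIONS.search(original_name)
    if not match:
        raise ValueError("file extension not allowed")
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    # Never reuse the client's file name: use a random GUID and keep only
    # the (already validated) extension, lower-cased.
    target = storage_dir / (uuid.uuid4().hex + match.group(0).lower())
    target.write_bytes(data)
    return target
```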
In general, if you really mean to allow any kind of file to be uploaded, I'd recommend:
A minimal type check, using MIME magic numbers, that the file's extension matches its actual contents (though this doesn't solve much if you are not going to limit the kinds of files that can be uploaded).
Better yet, have an antivirus (the free ClamAV, for example) check the file after uploading.
On storage, I always prefer to use the filesystem for what it was created for: storing files. I would not recommend storing files in the database (supposing a relational database). You can store the file's metadata in the database along with a pointer to the file on the filesystem.
Generate a unique ID for the file, and you can use a two-level directory structure to store the data, e.g., Id=123456 => /path/to/store/12/34/123456.data
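The two-level layout can be computed directly from the ID. Here is a minimal Python sketch that reproduces the Id=123456 example. Zero-padding to six digits, so that short IDs still get two directory levels, is an assumption, as is the function name.

```python
from pathlib import PurePosixPath


def shard_path(root: str, file_id: int) -> PurePosixPath:
    """Map a numeric ID to root/AB/CD/<id>.data (e.g. 123456 -> 12/34/123456.data)."""
    # Zero-pad so that short IDs still yield two 2-digit directory levels.
    key = f"{file_id:06d}"
    return PurePosixPath(root, key[0:2], key[2:4], f"{file_id}.data")
```

Because sequential IDs share leading digits, recent files cluster in the same directories. Sharding on the trailing digits, or on a hash of the ID, spreads them more evenly if that matters for your filesystem.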
That said, this can vary depending on what you want to store and how you want to manage it. Serving a document repository is not the same as serving an image gallery or a simple "shared directory".

What is the difference between: image/x-citrix-pjpeg and image/pjpeg

Some files are uploaded with a reported MIME type:
image/x-citrix-pjpeg
They are valid jpeg files and I accept them as such.
I was wondering however: why is the MIME type different?
Is there any difference in the format, or was this MIME type invented by some light bulb at Citrix for no apparent reason?
Update:
OK, I did some more searching and testing on this question, and it turns out they're all lying about the MIME type (never trust any information sent by the client, I know).
I've checked a bunch of files with different encodings (created with libjpeg)
Official MIME type for jpeg files: image/jpeg
But some applications (most notably MS Internet Explorer, but also Yahoo! Mail) send JPEG files as image/pjpeg.
I thought I knew that pjpeg stood for "progressive" JPEG. It turns out that progressive/standard encoding has nothing to do with it.
MS Internet Explorer sends all JPEG files as pjpeg regardless of the file's contents.
The same goes for Citrix: all JPEG files sent from a Citrix client are reported with the image/x-citrix-pjpeg MIME type.
The files themselves are untouched (identical before and after upload). So is the difference in MIME type only an indication of the software used to send the file?
Why would people invent a new MIME type if there is no difference in the file contents?
image/x-citrix-pjpeg seems to be the MIME type sent for images that are exported from a Citrix session.
I haven't come across any format differences between them and regular JPEGs - most image conversion utilities will handle them the same as a regular pjpeg, once the appropriate mime-type rule is added.
It's possible that in a Citrix session there is some internal magic done when managing jpegs which led them to create this mime-type, which they leave on the file when it's exported from their systems, but that's only my guess. As I say, I haven't noticed any actual format differences from the occasional files in this format we receive.
The closest I have come to finding out what this is, is this thread. Hope it helps.
http://forums.citrix.com/message.jspa?messageID=713174
For some reason, when people run Internet Explorer via Citrix, the MIME type for GIF and JPG files changes.
JPG: image/x-citrix-pjpeg
GIF: image/x-citrix-gif
Based on my testing, PNG files are not affected. I don't know if this is an Internet Explorer issue or Citrix.
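Since the reported type depends on the sending software rather than on the file, one approach on the receiving side is to normalize the known aliases and then compare them with the type sniffed from the file's own bytes. Below is a minimal Python sketch. The alias table contains only the types mentioned in this thread, and the function names are hypothetical.

```python
from typing import Optional

# Client-reported aliases mentioned in this thread, mapped to official types.
MIME_ALIASES = {
    "image/pjpeg": "image/jpeg",
    "image/x-citrix-pjpeg": "image/jpeg",
    "image/x-citrix-gif": "image/gif",
}


def sniff_mime(head: bytes) -> Optional[str]:
    """Derive the MIME type from the file's leading bytes."""
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


def normalize_reported(reported: str) -> str:
    """Strip parameters, lower-case, and map known aliases to official types."""
    base = reported.split(";")[0].strip().lower()
    return MIME_ALIASES.get(base, base)


def reported_matches_content(reported: str, head: bytes) -> bool:
    # Trust the bytes; the reported type only has to agree with them.
    return normalize_reported(reported) == sniff_mime(head)
```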
It's to do with a feature of Citrix called SpeedBrowse, which intercepts JPEGs and GIFs in web pages on the [Citrix] server side so that it can send them whole via ICA (the Citrix remoting protocol); this is more efficient than screen-scraping them. As a previous poster suggested, this is implemented by marking the images with a changed MIME type.
IIRC it hooks FindMimeFromData in IE to change the MIME type on the fly, but this is being applied to uploaded files as well as downloaded ones, which is surely a bug.
From what I recall, the progressive JPEG format is the one that allows the image to be shown at progressively higher resolution as the download of the file progresses. I am not entirely aware of the details, but if you remember back in the days of dial-up, some images would show blurry, then better, and eventually complete as they were downloaded. For this to work, the data needs to be sent in a different order than a JPEG would typically be sent.
The actual data, once you view it, is identical; it is just sent in a different order. The JPEG encoding itself may very well group pixels differently, I forget.
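Whether a JPEG is actually progressive can be read from the file, independent of any MIME label. The start-of-frame marker says so: SOF0 (`FFC0`) is baseline, and SOF2 (`FFC2`) is progressive Huffman. Below is a minimal Python sketch that walks the marker segments up to the first frame header. It is a simplification that assumes no standalone markers (such as RSTn) appear before the frame header, and the function name is hypothetical.

```python
import struct
from typing import Optional

# Start-of-frame markers (0xC4, 0xC8 and 0xCC are DHT/JPG/DAC, not frames).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
PROGRESSIVE_MARKERS = {0xC2, 0xC6, 0xCA, 0xCE}


def is_progressive_jpeg(data: bytes) -> Optional[bool]:
    """Return True if progressive, False if sequential, None if undetermined."""
    if not data.startswith(b"\xff\xd8"):
        return None  # no SOI marker: not a JPEG
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # lost sync with the marker stream
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in SOF_MARKERS:
            return marker in PROGRESSIVE_MARKERS
        if marker == 0xDA:  # start of scan reached without a frame header
            return None
        # Other segments carry a big-endian length that includes its own 2 bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return None
```

Running this over files reported as image/pjpeg and image/jpeg would confirm the update's finding that the label says nothing about the encoding.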
