Querystring security and DoS attacks

I've got an image generation servlet that renders text from a querystring parameter as an image, and I use it like this:
<img src="myimage.jpg.jsp?text=hello%20world">
These are my security measures:
URL encoding of the querystring parameter
Domain whitelist
Querystring parameter length check
My questions:
Any security measure I'm forgetting?
How does the above increase DoS attack risk compared to a standard:
<img src="myimage.jpg">
Thanks for helping.

Things to check would be:
Use the HTTP Referer header to verify that requests originate from your pages. This is only relevant if these images are only going to be used on pages of your site. You can verify that the images are loaded from your pages and are not being included directly in a page on some other site, and that the image URL is not being accessed directly. The header can easily be forged for a DoS attack, though (a servlet filter sketch covering this and the length check follows this list).
Check the underlying library you are using to generate the images: what parameters you pass to it, and which of them can potentially be controlled by the user in a way that affects the size of the image or the time needed to process it. I am not sure how the font and font size are provided, whether they are hardcoded or derived from user input.
Since this URL pattern generates an image, I am assuming every call is CPU intensive and also involves some data transfer for the actual image. You may want to control the rate at which these URLs can be hit if you are really worried about DoS.
As I already mentioned in my comment, URL length is limited in practice (on the order of one to two thousand characters, depending on browser and server), so there is an inherent limit on the number of characters the text field can have. You can enforce an even smaller limit with an additional check.
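To make the Referer and length checks concrete, here is a minimal sketch as a servlet filter (assuming the javax.servlet API; the class name, allowed hosts, and length limit are illustrative, not taken from the original application):

```java
// Hypothetical filter combining the Referer whitelist and the text-length check.
// Host names and limits are assumptions for illustration only.
import java.io.IOException;
import java.net.URI;
import java.util.Set;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ImageRequestFilter implements Filter {

    // Hosts allowed to embed the generated image (assumed values).
    private static final Set<String> ALLOWED_REFERER_HOSTS = Set.of("www.example.com", "example.com");
    // Keep the limit well below any practical URL-length limit.
    private static final int MAX_TEXT_LENGTH = 100;

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Referer check: only serve when the embedding page is on our own site.
        // Remember the header can be absent or forged, so this is only a weak control.
        boolean refererOk = false;
        String referer = request.getHeader("Referer");
        if (referer != null) {
            try {
                String host = URI.create(referer).getHost();
                refererOk = host != null && ALLOWED_REFERER_HOSTS.contains(host.toLowerCase());
            } catch (IllegalArgumentException ignored) {
                // Malformed Referer header: treat as not allowed.
            }
        }
        if (!refererOk) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }

        // Length check: reject oversized text before any image generation happens.
        String text = request.getParameter("text");
        if (text == null || text.length() > MAX_TEXT_LENGTH) {
            response.sendError(HttpServletResponse.SC_BAD_REQUEST);
            return;
        }

        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}
```

Map the filter to the image URL pattern in web.xml (or with @WebFilter) so it runs before the generation code.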

For DoS prevention, rate limit how many requests can be received per IP per X seconds.
To implement this: before doing any processing, log the remote IP address of each request and then count the previous requests from that address in the last, say, 30 seconds. If the count is greater than, say, 15, reject the request with an "HTTP 500.13 Web Server Too Busy" (or, in a servlet container, a 503 Service Unavailable).
This assumes that your database logging and lookup are less processor intensive than your image generation code. It will not protect against a large-scale DDoS attack, but it will reduce the risk considerably.
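A minimal in-memory sketch of that scheme, assuming a single application server (the database-backed variant described above would replace the map with a table); the 15-requests-per-30-seconds figures are the ones suggested above, and the class and method names are illustrative:

```java
// Sketch of the per-IP rate limit described above: at most 15 requests per 30 seconds.
// In-memory only; a real deployment should also evict entries for long-idle IPs.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

public class IpRateLimiter {

    private static final int MAX_REQUESTS = 15;
    private static final long WINDOW_MILLIS = 30_000;

    private final ConcurrentHashMap<String, Deque<Long>> requestLog = new ConcurrentHashMap<>();

    /** Returns true if a request from this IP should be allowed, false if it should be rejected. */
    public boolean allow(String ip) {
        long now = System.currentTimeMillis();
        Deque<Long> timestamps = requestLog.computeIfAbsent(ip, k -> new ArrayDeque<>());
        synchronized (timestamps) {
            // Drop entries that have fallen out of the 30-second window.
            while (!timestamps.isEmpty() && now - timestamps.peekFirst() > WINDOW_MILLIS) {
                timestamps.pollFirst();
            }
            if (timestamps.size() >= MAX_REQUESTS) {
                return false; // caller responds with 503 (or IIS-style 500.13)
            }
            timestamps.addLast(now);
            return true;
        }
    }
}
```

The image servlet would call allow(request.getRemoteAddr()) before doing any work and send the error status when it returns false.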
Domain whitelist
I assume this is based on the "Referer" header? Yes, this would stop your image from being included directly in other websites, but it could be circumvented by proxying the request via the other site's server. The DoS protection above would help alleviate this, though.
Querystring parameter length check
Yes, that would help limit the amount of processing a single image request can cause.
My questions:
Any security measure I'm forgetting?
Probably. As a start, I would verify that you are not at risk from any of the OWASP Top 10.
How does the above increase DoS attack risk compared to a standard image?
A standard image request simply fetches a static file from your server, and the only real overhead is I/O. Processing through your JSP means it is possible to overload your server with multiple simultaneous requests, because the CPU is doing much more work per request.

Related

Pingdom breaks IIS output caching when using varyByHeaders (Cookie)

I've been doing a lot of research on output caching lately and have been able to successfully implement output caching in IIS via web.config with either varyByQueryString or varyByHeaders.
However, then there's the issue of Pingdom's Performance & Real User Monitoring (or PRUM). They have a "fun" little beforeUnload routine that sets a PRUM_EPISODES cookie just as you navigate away from the page so it can time your next page load. The value of this cookie is basically a Unix timestamp, which changes every second.
As you can imagine, this completely breaks user-mode output caching, because every subsequent request is sent with a different Cookie header.
So two questions:
My first inclination is to find a way to drop the PRUM_EPISODES cookie before it reaches the server, since it serves no purpose to the actual application (this is also my informal request for a ClientOnly flag in the next HTTP version). Is anyone familiar with a technique for dropping individual cookies before they reach IIS's output caching engine, or some other technique to leverage varyByHeaders="Cookie" while ignoring PRUM_EPISODES? I haven't found such a technique for web.config as of yet.
Do all monitoring systems manipulate cookies in this manner (changing them on every page request) for their tracking mechanisms, and do they not realize that by doing so they break user-mode output caching?

How to find out my site is being scraped?

I have some points:
Network bandwidth occupation, causing throughput problems (still applies if a proxy is used).
When querying a search engine for keywords, new references appear to other similar resources with the same content (still applies if a proxy is used).
Multiple requests from the same IP.
High request rate from a single IP. (By the way: what is a normal rate?)
Headless or unusual user agent (still applies if a proxy is used).
Requests at predictable (equal) intervals from the same IP.
Certain support files are never requested, e.g. favicon.ico, various CSS and JavaScript files (still applies if a proxy is used).
The client's request sequence, e.g. the client accesses pages that are not directly reachable through links (still applies if a proxy is used).
Would you add more to this list?
Which points would still match if a scraper uses a proxy?
As a first note, consider whether it is worthwhile to provide an API for bots in the future. If you are being crawled by another company and it is information you want to provide to them anyway, it means your website is valuable to them. Creating an API would reduce your server load substantially and give you full clarity on who is crawling you.
Second, from personal experience (I wrote web crawlers for quite a while), you can generally tell immediately by tracking which browser accessed your website. If they are using one of the automated clients, or one built with a development language's HTTP library, the user agent will be noticeably different from your average user's. Not to mention tracking the log file and updating your .htaccess to ban them (if that's what you are looking to do).
Other than that, it's usually fairly easy to spot: repeated, very consistent opening of pages.
Check out this other post for more information on how you might want to deal with them, and for some thoughts on how to identify them:
How to block bad unidentified bots crawling my website?
I would also add analysis of when the requests by the same people are made. For example, if the same IP address requests the same data at the same time every day, it's likely the process is on an automated schedule and hence likely to be scraping.
Possibly add analysis of how many pages each user session has touched. For example, if a particular user on a particular day has browsed every page on your site and you deem this unusual, then perhaps it's another indicator.
It feels like you need a range of indicators, and to score them and combine the scores to show who is most likely to be scraping.
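As a rough illustration of that "score and combine" idea, here is a minimal Java sketch; it assumes you already extract per-client statistics from your logs, and every field name, threshold, and weight is invented for illustration rather than taken from any real system:

```java
// Hypothetical scoring of scraping indicators; thresholds and weights are illustrative only.
public class ScraperScore {

    public static class ClientStats {
        long requestsLastHour;
        double intervalStdDevMillis;   // near-zero means very regular request timing
        boolean fetchedSupportFiles;   // favicon.ico, CSS, JS
        boolean knownBrowserUserAgent;
        double fractionOfSiteVisited;  // 0.0 - 1.0
    }

    /** Higher score = more likely to be a scraper. */
    public static int score(ClientStats s) {
        int score = 0;
        if (s.requestsLastHour > 1000) score += 3;     // unusually high request rate
        if (s.intervalStdDevMillis < 50) score += 2;   // suspiciously regular intervals
        if (!s.fetchedSupportFiles) score += 2;        // never loads favicon/CSS/JS
        if (!s.knownBrowserUserAgent) score += 2;      // headless or odd user agent
        if (s.fractionOfSiteVisited > 0.8) score += 3; // walked almost the whole site
        return score;
    }
}
```

Clients whose combined score crosses some threshold (say 6 in this sketch) would get flagged for manual review or blocking.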

ImageResizer and security considerations

We are considering using ImageResizer in our commercial application and have some questions related to security. The application will allow user upload of images for subsequent display on web pages.
We want to know how we can use ImageResizer to protect against attacks such as compression bombs, embedded JAR content, malicious payloads, EXIF exposure, and malformed image data.
I think I know how to address these in general, but I'd like to know what specific tools ImageResizer offers.
Most ImageResizer data adapters offer an "untrustedData=true" configuration setting.
This setting in turn sets &process=always in the request querystring during the ImageResizer.Configuration.Config.Current.Pipeline.PostRewrite event.
If you wish, you can set it for all image requests. Keep in mind, this will cause requests for original images to be re-encoded at a potential quality loss and/or size increase.
When process=always is set, all images are re-encoded and stripped of EXIF data to prevent potentially malicious images from reaching the browser. This means the client will get a 500 error instead of a malformed image.
How an image is interpreted, however, is just as important. If you permit user uploads to keep their original file name or even just the extension (instead of picking from a whitelist), you open yourself to easy attack vectors. In the same way, if an image is sent to the browser with a JavaScript MIME type, the client may interpret it as JavaScript and be exposed to XSS. ImageResizer's pipeline works with whitelists to prevent this from happening.
Also, if you intend to re-encode all uploads, it may be easier to do it during the upload stage instead of on every request. However, this relies on the security of your data store and being sure that no 'as-is' uploads can succeed.
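As a generic illustration of the "re-encode at upload time" idea (this is plain Java with javax.imageio, not ImageResizer's API, and the class name, size limit, and target format are assumptions), a minimal sketch might look like this:

```java
// Generic sketch (not ImageResizer): decode and re-encode an upload so that only pixel
// data survives; metadata such as EXIF and any appended payload is dropped.
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class UploadSanitizer {

    private static final long MAX_PIXELS = 25_000_000L; // guard against decompression bombs

    public static void reencodeToJpeg(File upload, File sanitized) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(upload)) {
            if (in == null) {
                throw new IOException("Cannot open upload as an image stream");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("Upload is not a recognizable image");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // Check declared dimensions before decoding the pixel data.
                long pixels = (long) reader.getWidth(0) * reader.getHeight(0);
                if (pixels > MAX_PIXELS) {
                    throw new IOException("Image too large");
                }
                BufferedImage image = reader.read(0); // decodes pixels only, drops metadata

                // Flatten onto an RGB canvas so transparent images can be written as JPEG.
                BufferedImage rgb = new BufferedImage(
                        image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
                Graphics2D g = rgb.createGraphics();
                g.drawImage(image, 0, 0, Color.WHITE, null);
                g.dispose();

                // Re-encode with a fixed, whitelisted format and extension.
                if (!ImageIO.write(rgb, "jpg", sanitized)) {
                    throw new IOException("No JPEG writer available");
                }
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The same trade-off described above applies: the re-encoded file is what you store and serve, so the original ("as-is") upload must never reach the data store.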

What are good limits to set for IIS Dynamic IP restriction module?

I recently tried to use the default settings, which are:
5 - maximum number of concurrent requests
20 - maximum number of requests in 200 milliseconds
However, this started cutting off my own connections to the website (loading JavaScript, CSS, etc.). I need something that will never fire for users who are using the site honestly, but I still want to prevent denial-of-service attacks.
What are good limits to set?
I don't think there is a good generic limit that fits all websites; it is specific to each site. It depends on requests per second, request execution time, and so on.
I suggest you configure IIS logging to record the IP of each request, then review the logs to see the traffic pattern for real users and how many requests they make within a normal flow. That should let you approximate the average number of requests coming from a user within a selected time frame.
However, in my experience, 20 requests within 200 milliseconds usually looks like an attack, so the default settings provided by IIS seem reasonable.
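A minimal sketch of that log analysis, assuming the log has already been reduced to lines of "epochMillis clientIp" (real IIS W3C logs would need a little field parsing first); it reports the busiest 200 ms burst per client IP so you can compare honest traffic against the module's threshold:

```java
// Hypothetical log analysis: find the busiest 200 ms burst per client IP.
// Assumes each input line is "<epochMillis> <clientIp>", pre-extracted from the real log.
// Usage: java BurstAnalyzer requests.txt
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BurstAnalyzer {

    public static void main(String[] args) throws IOException {
        long windowMillis = 200;
        Map<String, List<Long>> byIp = new HashMap<>();

        for (String line : Files.readAllLines(Path.of(args[0]))) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue;
            byIp.computeIfAbsent(parts[1], k -> new ArrayList<>()).add(Long.parseLong(parts[0]));
        }

        for (Map.Entry<String, List<Long>> e : byIp.entrySet()) {
            List<Long> times = e.getValue();
            Collections.sort(times);
            int maxBurst = 0;
            int start = 0;
            // Sliding window: count the most requests falling inside any 200 ms span.
            for (int end = 0; end < times.size(); end++) {
                while (times.get(end) - times.get(start) > windowMillis) start++;
                maxBurst = Math.max(maxBurst, end - start + 1);
            }
            System.out.println(e.getKey() + " peak requests per 200 ms: " + maxBurst);
        }
    }
}
```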

Chrome Instant invalid URLs triggering website lockout

My website uses obscure, random URLs to provide some security for sensitive documents. E.g. a URL might be http://example.com/<random 20-char string>. The URLs are not linked to by any other pages, have META tags to opt out of search engine crawling, and have short expiration periods. For top-tier security some of the URLs are also protected by a login prompt, but many are simply protected by the obscure URL. We have decided that this is an acceptable level of security.
We have a lockout mechanism implemented where an IP address will be blocked for some period of time following several invalid URL attempts, to discourage brute-force guessing of URLs.
However, Google Chrome has a feature called "Instant" (enabled in Options -> Basic -> Search), that will prefetch URLs as they are typed into the address bar. This is quickly triggering a lockout, since it attempts to fetch a bunch of invalid URLs, and by the time the user has finished, they are not allowed any more attempts.
Is there any way to opt out of this feature, or ignore HTTP requests that come from it?
Or is this lockout mechanism just stupid and annoying for users without providing any significant protection?
(Truthfully, I don't really understand how this is a helpful feature for Chrome. For search results it can be interesting to see what Google suggests as you type, but what are the odds that a subset of your intended URL will produce a meaningful page? When I have this feature turned on, all I get is a bunch of 404 errors until I've finished typing.)
Without commenting on the objective, I ran into a similar problem (unwanted page loads from Chrome Instant), and discovered that Google does provide a way to avoid this problem:
When Google Chrome makes the request to your website server, it will send the following header:
X-Purpose: preview
Detect this, and return an HTTP 403 ("Forbidden") status code.
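A minimal sketch of that check as a servlet filter (assuming a Java servlet stack like the one in the first question; the header name is as described above, the class name is illustrative):

```java
// Reject Chrome Instant prefetch requests before they count towards the lockout logic.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class PreviewRequestFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        if ("preview".equalsIgnoreCase(request.getHeader("X-Purpose"))) {
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
            return; // prefetches never reach the lockout counter
        }
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig filterConfig) { }
    @Override public void destroy() { }
}
```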
Or is this lockout mechanism just stupid and annoying for users without providing any significant protection?
You've potentially hit the nail on the head there: Security through obscurity is not security.
Instead of trying to "discourage brute-force guessing", use URLs that are actually hard to guess: the obvious example is using a cryptographically secure RNG to generate the "random 20 character string". If you use base64url (or a similar URL-safe 64-character alphabet), you get 64^20 = (2^6)^20 = 2^120 possibilities, i.e. 120 bits of entropy. Not quite 128 (or 160 or 256) bits, so you can make it longer if you want, but also note that the expected bandwidth cost of a correct guess is going to be enormous, so you don't really have to worry until your bandwidth bill becomes huge.
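A minimal sketch of generating such a token with a cryptographically secure RNG (Java; the class name is illustrative). Fifteen random bytes encode to exactly 20 URL-safe base64 characters with no padding, i.e. the 120 bits discussed above:

```java
// Generate an unguessable 20-character URL token: 15 random bytes = 120 bits of entropy,
// encoded as URL-safe base64 without padding.
import java.security.SecureRandom;
import java.util.Base64;

public class SecureUrlToken {

    private static final SecureRandom RANDOM = new SecureRandom();

    public static String newToken() {
        byte[] bytes = new byte[15];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(newToken()); // prints a 20-character URL-safe token
    }
}
```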
There are some additional ways you might want to protect the links:
Use HTTPS to reduce the potential for eavesdropping (the links will still travel unencrypted between SMTP servers if you e-mail them).
Don't link to anything, or if you do, link through an HTTPS redirect (last I checked, many web browsers will still send a Referer:, leaking the "secure" URL you were looking at previously). An alternative is to have the initial load set an unguessable secure session cookie and redirect to a new URL which is only valid for that session cookie.
Alternatively, you can alter the "lockout" to still work without compromising usability:
Serve only after a delay. Every time you serve a document or HTTP 404, increase the delay for that IP. There's probably an easy algorithm to asymptotically approach a rate limit but be more "forgiving" for the first few requests.
For each IP address, only allow one request at a time. When you receive a new request, return an HTTP 5xx on any existing requests (503 Service Unavailable is the closest standard fit for "server too busy").
Even if the delays increase exponentially (e.g. 1 s, 2 s, 4 s), the "current" delay isn't going to be much greater than the time taken to type in the whole URL. If it takes you 10 seconds to type in a random URL, then another 16 seconds to wait for it to load isn't too bad.
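A minimal sketch of the per-IP exponential delay, assuming an in-memory map and a single server; the 1 s base and 16 s cap follow the example above, and the class name is illustrative:

```java
// Per-IP exponential delay: every served response (document or 404) doubles the delay
// applied to that IP's next request, up to a cap. A real implementation would also
// evict entries for IPs that have been idle for a while.
import java.util.concurrent.ConcurrentHashMap;

public class PerIpDelay {

    private static final long BASE_DELAY_MILLIS = 1_000;
    private static final long MAX_DELAY_MILLIS = 16_000;

    private final ConcurrentHashMap<String, Long> delays = new ConcurrentHashMap<>();

    /** Sleep for the delay currently assigned to this IP, then double it for next time. */
    public void throttle(String ip) throws InterruptedException {
        long delay = delays.getOrDefault(ip, 0L);
        if (delay > 0) {
            Thread.sleep(delay);
        }
        // First request is free; after that the delay grows 1 s, 2 s, 4 s, ... up to the cap.
        delays.merge(ip, BASE_DELAY_MILLIS, (old, base) -> Math.min(old * 2, MAX_DELAY_MILLIS));
    }
}
```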
Keep in mind that someone who wants to get around your IP-based rate limiting can just rent a (fraction of the bandwidth of a) botnet.
Incidentally, I'm (only slightly) surprised by the view from an Unnamed Australian Software Company that low-entropy randomly-generated passwords are not a problem, because there should be a CAPTCHA in your login system. Funny, since some of those passwords are server-to-server.
