All,
I know this question is all over the internet and there is no single compelling answer. However, let me explain my situation so I can get some relevant help.
I need to store many small images (mostly user avatars) and show a list of users together with their avatars (which requires a bulk read of user records along with their avatar images). I tried storing the images as base64 strings in MongoDB, but it hurts read performance badly: retrieving just 10 records takes 900ms, as opposed to 70ms without the base64 fields. The application has the potential to grow quickly, as we are developing it for an established clinic.
Considering our situation (store many small images, read a bulk of them at once), what would you great minds recommend?
I know there are other methods, like GridFS or storing the images as binary data in MongoDB. I am still exploring those options. What do you think about them?
If the filesystem is your recommendation, I am unsure how it would be handled with load-balanced app servers. For example, if I am running 3 app servers and store the image on the filesystem of the app server that received the upload request, how would the load balancer know which app server has the image when it routes the browser's GET request? Also, how do I efficiently manage orphaned files? When a user record is updated or deleted via the application, I can remove the file from the filesystem at the time of the DB update/delete; is that considered a safe data-synchronization strategy?
Have you considered storing them in S3 instead? Storing them across load balanced app servers is a horrible idea. S3 is the solution generally recommended in this scenario. S3 can even directly serve the images for you, removing load from your web servers.
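If you go the S3 route, a minimal sketch of the upload side might look like the following, assuming the AWS SDK for .NET; the bucket name, region, and key layout are my own placeholders, and the URL (or just the key) you get back would be what you store on the user document instead of base64 data.
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;

// Rough sketch: upload an avatar to S3 and keep only its key/URL in MongoDB.
// Bucket name, region, and key layout are made-up examples.
public class AvatarStore
{
    private readonly IAmazonS3 _s3 = new AmazonS3Client(RegionEndpoint.USEast1);
    private const string Bucket = "my-app-avatars"; // hypothetical bucket

    public async Task<string> UploadAsync(string userId, string localFilePath)
    {
        string key = "avatars/" + userId + ".jpg";

        await _s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = Bucket,
            Key = key,
            FilePath = localFilePath,
            ContentType = "image/jpeg"
        });

        // Store this URL (or just the key) on the user document instead of the image bytes.
        return "https://" + Bucket + ".s3.amazonaws.com/" + key;
    }
}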
Just to let folks know, I did edit the post slightly, as some people suggested, and the question has also been posted on the Azure CDN forum here. The reason I am posting it on Stack Overflow as well is to reach a bigger audience, in the hope that folks who have dealt with the same or a similar issue can provide valuable solutions or feedback. As far as I know there is currently no solution to this problem, yet it affects any business that uses a CDN to deliver its content. I am open to editing this question further, but I would ask that folks not simply down-vote it because it sounds like a "rant"; it is not, and I can guarantee it affects thousands of businesses out there and costs people thousands of dollars a year, whether they are aware of it or not.
So here is the situation. Let's say I am building a picture gallery website and I would like to use Azure CDN to deliver my content. In the backend, Azure CDN pulls content from an Azure storage account. The CDN is fast and powerful, but it seems it can be a little insecure when it comes to preventing someone from pulling content in very large quantities, leaving the account owner with a huge bandwidth bill. Let me demonstrate what I mean.
So last night I decided to write a simple console app that downloads a single image from my future picture gallery website in a for loop; the code is below:
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Net;

namespace RefererSpoofer
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int x = 0; x < 1000; x++)
            {
                string myUri = "http://myazurecdnendpoint.azureedge.net/mystoragecontainer/DSC00580_1536x1152.jpg";

                // Spoof the Referer header so the request appears to come from my own site.
                HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
                myHttpWebRequest.Referer = "www.mywebsite.com";

                // Download the image and buffer it in a seekable stream before decoding.
                using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
                using (Stream response = myHttpWebResponse.GetResponseStream())
                using (MemoryStream buffer = new MemoryStream())
                {
                    response.CopyTo(buffer);
                    buffer.Position = 0;

                    // Save each downloaded copy to disk, numbered by loop iteration.
                    using (Image image = Image.FromStream(buffer))
                    {
                        image.Save(string.Format("D:\\Downloads\\image{0}.Jpeg", x), ImageFormat.Jpeg);
                    }
                }
            }
            Console.ReadKey();
        }
    }
}
This console application makes 1000 rapid, back-to-back requests for an image file hosted on my Azure CDN endpoint and saves each copy to the 'D:\Downloads' folder on my PC, with the filename numbered by the for loop iteration, i.e. image0.jpeg, image1.jpeg, etc.
So what just happened? In about one minute, I cost myself 140MB of bandwidth. With this being a Premium CDN priced at $0.17/GB, let's do the math together: 0.14GB * 60 minutes * 24 hours * 30 days * $0.17/GB = $1,028.16 in bandwidth costs if someone (a competitor, for example) simply kept requesting a single image for a month to jeopardize my website. I think you can see where I am going with this: my website will have thousands of images, in hi-res; by the way, the image I used in this example was a mere 140KB in size. These requests can also come from anonymous proxies, etc.
So the question I have is: what can one do to prevent someone from abusing a publicly exposed CDN endpoint? Obviously one cannot be stuck paying $5,000 or $20,000 for bandwidth resulting from malicious requests.
Now, the Azure Premium CDN has an advanced Rules Engine that can filter out requests based on the Referer header and respond with a 403 error if the Referer doesn't match your website. But the Referer can be faked, as I did in the code sample above, and the CDN still serves the request (I tested this with a spoofed Referer). That is frustrating: a lot of people use the Referer check to prevent hotlinking, but in this case of bandwidth abuse, what does it matter if the Referer can be faked with a single line of code?
A couple of ideas I've had for preventing such abuse and the resulting huge bandwidth costs (both solutions would require action from the CDN provider):
When a request for content reaches the CDN, the CDN could make a call to the customer's server, passing in a) the IP address of the user and b) the CDN URI requested. The customer's server would then check how many times that URI has been requested from that particular IP; if the logic sees that it was requested, say, 100 times over the past minute, that clearly signals abuse, because browsers cache images while malicious requests don't. The customer's server would then simply reply 'false', and the content would not be served for that request. This would not be a perfect solution, since the additional callback to the customer's infrastructure adds a small delay, but it is definitely better than potentially being stuck with a bill that looks like the balance of your bank's savings account.
A better solution: build in a limit on the number of times a file can be served over the CDN within a particular time frame, per IP. For example, for the image file above, one could configure the CDN to serve no more than, say, 50 requests per IP within a 10-minute window. If abuse is detected, the CDN could, for a period defined by the customer, a) serve a 403 for the particular abused URI, or b) serve a 403 for all URIs requested from the abuser's IP. All time frames and options should be left configurable by the customer. This would definitely help, and there is no callback, which saves time. The downside is that the CDN would have to keep track of URI / IP address / hit count; a rough sketch of such a counter follows below.
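To make the second idea concrete, here is a minimal sketch in C# of what such per-IP, per-URI tracking could look like. This is purely an illustration: the class, the in-memory dictionary, and the limits are my own assumptions, not anything Azure or any other CDN actually exposes, and a real edge network would need a distributed store rather than a single-process dictionary.
using System;
using System.Collections.Concurrent;

// Hypothetical illustration of the idea above: count hits per (IP, URI) within a
// fixed window and deny the request once the configured limit is exceeded.
public class HitCounter
{
    private readonly int _maxHits;
    private readonly TimeSpan _window;
    private readonly ConcurrentDictionary<string, Tuple<DateTime, int>> _hits =
        new ConcurrentDictionary<string, Tuple<DateTime, int>>();

    public HitCounter(int maxHits, TimeSpan window)
    {
        _maxHits = maxHits; // e.g. 50 requests
        _window = window;   // e.g. 10 minutes
    }

    // Returns true if the request should be served, false if it should get a 403.
    public bool ShouldServe(string clientIp, string uri)
    {
        string key = clientIp + "|" + uri;
        DateTime now = DateTime.UtcNow;

        Tuple<DateTime, int> entry = _hits.AddOrUpdate(
            key,
            _ => Tuple.Create(now, 1),                     // first hit: start a new window
            (_, old) => now - old.Item1 > _window
                ? Tuple.Create(now, 1)                     // window expired: start over
                : Tuple.Create(old.Item1, old.Item2 + 1)); // same window: bump the count

        return entry.Item2 <= _maxHits;
    }
}

// Usage: var counter = new HitCounter(50, TimeSpan.FromMinutes(10));
// if (!counter.ShouldServe(clientIp, requestUri)) { /* respond with 403 */ }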
Which solutions would NOT work:
Signed URLs won't work, because the signature query string parameter would be different every time, so browsers would constantly re-request the data, effectively wiping out the browser cache for images.
A SAS (shared access signature) for an Azure blob would not work either, because a) the URI is different every time and b) there is no limit on how many times a blob can be requested once the SAS is granted, so the abuse scenario is still possible.
Checking your logs and simply banning by IP won't work either. I tested this type of abuse via an anonymous proxy yesterday and it worked like a charm: I switched IPs in a matter of seconds and continued the abuse (of my own content) for testing purposes. So this is out as well, unless one has a nanny to monitor the logs.
Solutions that can work, but are not feasible:
Filter requests on your own web server. Sure, this would be the best way to control the issue: track the number of requests per IP and simply stop serving the content when abuse is detected. But then you lose the big benefit of delivering your content over a super-fast, proximity-optimized CDN, and your servers will be slowed down considerably by serving large content such as images.
Simply bite the bullet and not worry about it. Well... then you know the pothole that will take your wheel out is just down the road, so no, it's not a comfortable option.
With all of the above said, the Premium CDN offering from Azure, with its custom Rules Engine, might have a solution buried in there somewhere, but with very poor documentation and a lack of examples one can only guess at how to properly protect oneself, which is why I am writing this post. Has anyone ever tackled such an issue, and how would you solve it?
Any suggestions are appreciated, I am very open minded on the issue.
Thank you for reading.
This is theoretical, because we don't serve our "app" yet, but we will likely face this problem soon:
Couldn't you just issue a cookie that expires very quickly (10 seconds?) on your site itself and check for it using the Azure standard rules engine?
Then, on your "main site", you can have a custom check that blocks IPs (or, respectively, doesn't issue the cookie) if the IP exceeds a "reasonable" amount of requests.
Some preconditions:
You would probably need more control over the "main site" that issues the cookie than you have via the CDN. I don't know whether you can "channel" these requests directly to the client server.
The cookie has to be HttpOnly so that it cannot be faked from the client.
... Did I miss something?
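For what it's worth, a minimal sketch of the cookie-issuing side, assuming an ASP.NET-style handler on the main site; the handler name, cookie name, and 10-second lifetime are assumptions for illustration, and the CDN-side check would be a rules-engine condition on the cookie header, which is not shown here.
using System;
using System.Web;

// Hypothetical ASP.NET handler on the "main site": issues a short-lived, HttpOnly
// cookie that the CDN rules engine could then require on image requests.
public class IssueViewerCookie : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        var cookie = new HttpCookie("viewer-token", Guid.NewGuid().ToString("N"))
        {
            HttpOnly = true,                         // not readable from client-side script
            Expires = DateTime.UtcNow.AddSeconds(10) // expires very quickly, as suggested above
        };
        context.Response.Cookies.Add(cookie);
        context.Response.ContentType = "text/plain";
        context.Response.Write("ok");
    }

    public bool IsReusable { get { return true; } }
}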
I recently tried using the default settings, which are:
5 - maximum number of concurrent requests
20 - maximum number of requests in 200 milliseconds
However, this started cutting off my own connections to the website (loading JavaScript, CSS, etc.). I need something that will never trigger for users who are using the site honestly, but I do want to prevent denial-of-service attacks.
What are good limits to set?
I don't think there is one good generic limit that fits all websites; it is specific to each site. It depends on requests per second, request execution time, etc.
I suggest you configure IIS logging to record the IP of each request, then review the IIS logs to see what the traffic pattern looks like for real users and how many requests they make in a normal flow. That should let you approximate the average number of requests coming from a user within a selected time frame; a rough sketch of that kind of log analysis follows below.
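As an illustration of that approach, here is a rough sketch in C# that reads a W3C-format IIS log, locates the c-ip column from the #Fields header, and reports how many requests each IP made per minute. The log path and the number of rows shown are placeholders, and it assumes the c-ip field is enabled and that date and time are the first two logged fields (the IIS default).
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Rough sketch: summarize per-IP request counts per minute from a W3C IIS log.
class LogRateReport
{
    static void Main()
    {
        string logPath = @"C:\inetpub\logs\LogFiles\W3SVC1\u_ex160101.log"; // placeholder path
        int ipIndex = -1;
        var perIpPerMinute = new Dictionary<string, int>();

        foreach (string line in File.ReadLines(logPath))
        {
            if (line.StartsWith("#Fields:"))
            {
                // Field order is configurable in IIS, so locate c-ip dynamically.
                var fields = line.Substring("#Fields:".Length).Trim().Split(' ');
                ipIndex = Array.IndexOf(fields, "c-ip");
                continue;
            }
            if (line.StartsWith("#") || ipIndex < 0) continue;

            var parts = line.Split(' ');
            if (parts.Length <= ipIndex) continue;

            // Key on "date hh:mm|ip"; date and time are assumed to be the first two fields.
            string key = parts[0] + " " + parts[1].Substring(0, 5) + "|" + parts[ipIndex];
            int count;
            perIpPerMinute.TryGetValue(key, out count);
            perIpPerMinute[key] = count + 1;
        }

        // Show the busiest IP/minute combinations to get a feel for "normal" traffic.
        foreach (var entry in perIpPerMinute.OrderByDescending(e => e.Value).Take(20))
            Console.WriteLine("{0}  {1} requests", entry.Key, entry.Value);
    }
}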
However, in my experience, 20 requests within 200 milliseconds usually looks like an attack, so the default settings provided by IIS seem reasonable.
I've got an image generation servlet that generates an image from text passed in the query string, and I use it like this:
<img src="myimage.jpg.jsp?text=hello%20world">
Below are my security measures:
Urlencoding of querystring parameter
Domain whitelist
Querystring parameter length check
My questions:
Any security measure I'm forgetting there?
How does the above increase DOS attack risks compared to a standard:
<img src="myimage.jpg">
Thanks for helping.
Things to check would be:
Use the HTTP Referer header to verify that requests originate from your pages. This is only relevant if these images are meant to be used on pages of your own site: you can verify that they are loaded from your pages and that the image URL is not being embedded in someone else's page or accessed directly. This can easily be forged to perform a DoS attack, though.
Check the underlying library you are using to generate the images: what parameters you pass to it, and which of them can potentially be controlled by the user in ways that affect the size of the image or the processing time. I am not sure how the font and font size are provided, whether they are hardcoded or derived from user input.
Since this URL pattern generates an image, I assume every call is CPU-intensive and also involves some data transfer for the actual image. You may want to limit the rate at which these URLs can be hit if you are really worried about DoS.
As I already mentioned in my comment, the URL can only be 1024 characters long, so there is an inherent limit on the number of characters the text field can contain. You can enforce an even smaller limit with an additional check; a sketch of these ideas follows below.
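Your servlet is JSP, but to keep the examples in one language, here is a rough C# sketch of the kind of checks described above: URL-decode the text, clamp its length, and render onto a fixed-size canvas with a fixed font so that user input cannot inflate the processing cost. All names and limits here are made up for illustration.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Web;

// Hypothetical text-to-image renderer showing the validation ideas above:
// fixed font, fixed canvas, and a hard cap on the text length.
public static class TextImageRenderer
{
    private const int MaxTextLength = 100; // much smaller than the URL length limit

    public static byte[] Render(string rawText)
    {
        // URL-decode and clamp the user-supplied text.
        string text = HttpUtility.UrlDecode(rawText ?? string.Empty);
        if (text.Length > MaxTextLength)
            text = text.Substring(0, MaxTextLength);

        // Fixed font and canvas size: user input cannot change them.
        using (var font = new Font("Arial", 14f))
        using (var bitmap = new Bitmap(400, 50))
        using (var graphics = Graphics.FromImage(bitmap))
        using (var output = new MemoryStream())
        {
            graphics.Clear(Color.White);
            graphics.DrawString(text, font, Brushes.Black, 5f, 5f);
            bitmap.Save(output, ImageFormat.Jpeg);
            return output.ToArray();
        }
    }
}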
For DoS prevention, rate-limit how many requests can be received per IP per X number of seconds.
To implement this, before doing any processing, log the remote IP address of each request and then count the number of previous requests from that address in the last, e.g., 30 seconds. If the number is greater than, say, 15, reject the request with an "HTTP 500.13 Web Server Too Busy".
This assumes that your database logging and lookup are less processor-intensive than your image generation code. It will not protect against a large-scale DDoS attack, but it will reduce the risk considerably; a rough sketch of the check is below.
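A minimal in-memory sketch of that check, again in C# to match the other examples in this thread; a real deployment would log to the database as described, the 15-requests-per-30-seconds limit follows the numbers above, and the class and method names are my own.
using System;
using System.Collections.Generic;

// Minimal in-memory version of the check described above: keep the timestamps of
// recent requests per IP and reject once an IP exceeds 15 requests in 30 seconds.
public class RequestThrottle
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(30);
    private const int MaxRequests = 15;
    private static readonly Dictionary<string, Queue<DateTime>> History =
        new Dictionary<string, Queue<DateTime>>();
    private static readonly object Sync = new object();

    // Returns 200 if the request may proceed, 500 (i.e. 500.13 Web Server Too Busy) otherwise.
    public static int CheckRequest(string clientIp)
    {
        lock (Sync)
        {
            Queue<DateTime> timestamps;
            if (!History.TryGetValue(clientIp, out timestamps))
                History[clientIp] = timestamps = new Queue<DateTime>();

            DateTime now = DateTime.UtcNow;

            // Drop entries that have fallen outside the 30-second window.
            while (timestamps.Count > 0 && now - timestamps.Peek() > Window)
                timestamps.Dequeue();

            if (timestamps.Count >= MaxRequests)
                return 500; // reject: too many requests from this IP

            timestamps.Enqueue(now);
            return 200;
        }
    }
}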
Domain whitelist
I assume this is on the "referer" header? Yes, this would stop your image from being directly included in other websites but it could be circumvented by proxying the request via the other site's server. The DoS protection above would help alleviate this though.
Querystring parameter length check
Yes that would help reduce the amount of processing that a single image request could do.
My questions:
Any security measure I'm forgetting there?
Probably. As a start, I would verify you are not at risk from the OWASP Top 10.
How does the above increase DOS attack risks compared to a standard
A standard image request would simply fetch the static image off your server, and the only real overhead would be IO. Processing through your JSP means it is possible to overload your server by executing multiple requests at the same time, because the CPU is doing more work per request.
I'm a bit confused about the propagation time for a CDN. We already have a CDN that contains all of our static files. I'm trying to decide whether a CDN is a good choice for a new project.
Basically, how long will it take to see my updated file on the CDN?
Is there any way to reduce the replication time?
FYI, I'm using LimeLight, but my question applies to all CDNs.
Propagation depends on filesize and cache settings.
For small files (this varies by CDN but typically includes all files under 10MB individual file size), such as images, CSS, JS, etc., propagation is instant.
If a file is not in cache, the CDN will fetch it at the first request and then continue caching it until it expires. The standard is a 24 hour cache expiry, but most CDNs allow you to configure it and let it be cached for longer.
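One common way to influence how long a file stays cached is to set cache headers at the origin, which most CDNs honor; whether LimeLight uses the origin header or a TTL configured in your account depends on your settings, so treat the following C# handler only as a generic illustration (the handler name and file path are placeholders).
using System;
using System.Web;

// Generic illustration: an origin response that tells downstream caches (including
// most CDNs) to keep the file for 24 hours.
public class StaticImageHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "image/jpeg";
        context.Response.Cache.SetCacheability(HttpCacheability.Public);
        context.Response.Cache.SetMaxAge(TimeSpan.FromHours(24)); // Cache-Control: max-age=86400
        context.Response.WriteFile(context.Server.MapPath("~/images/example.jpg")); // placeholder path
    }

    public bool IsReusable { get { return true; } }
}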
For large files (any file over 10MB individual file size), such as videos, propagation typically takes between 2 and 12 hours, depending on the CDN and how much priority they give you as a client.
You can easily test if a file has propagated fully by doing a curl against the IP of each edge location/server.