What is the standard propagation time for a CDN?

I'm a bit confused about the propagation time for a CDN. We already have a CDN that contains all of our static files, and I'm trying to work out whether a CDN is a good choice for a new project.
Basically, how long will it take to see my updated file on the CDN?
Is there any way to reduce the replication time?
FYI, I'm using Limelight, but my question applies to CDNs in general.

Propagation depends on filesize and cache settings.
For small files (the threshold varies by CDN, but typically anything under 10MB) such as images, CSS, and JS, propagation is effectively instant.
If a file is not in cache, the CDN fetches it from the origin on the first request and then keeps it cached until it expires. The standard is a 24-hour cache expiry, but most CDNs let you configure it and cache for longer.
For large files (anything over 10MB) such as videos, propagation typically takes 2-12 hours, depending on the CDN and how much priority they give you as a client.
You can easily test whether a file has fully propagated by doing a curl against the IP of each edge location/server.
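For example (the hostname, path, and edge IP below are placeholders, and the exact cache headers vary by CDN; --resolve just pins the request to one edge so you can compare what each location returns):

curl -sI --resolve cdn.example.com:443:203.0.113.10 \
  https://cdn.example.com/css/site.css | grep -iE 'HTTP/|cache-control|age:|x-cache'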

Related

Allowing users to download very large files in IIS (10GB+)

What solutions are people using to let users download very large files from an application running on IIS?
For a web application there is a cap on the maximum request size and also on the request timeout; this limit can be used to prevent denial-of-service attacks caused by users posting large files to the server.
As far as I know, the maximum length of content in a request can be specified via maxAllowedContentLength. However, maxAllowedContentLength is a uint, so its maximum value is 4,294,967,295 bytes = 4 GB, which does not meet your requirements. Alternatively, you can consider splitting the large file into several parts.
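For reference, that limit is set under request filtering in web.config; a minimal fragment (the value is in bytes, shown here at the uint ceiling mentioned above) looks roughly like this:

<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="4294967295" />
    </requestFiltering>
  </security>
</system.webServer>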

How to prevent Azure CDN bandwidth abuse by malicious bandwidth-vampire requests? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Just to let folks know, I did edit the post slightly as some people suggested, and the question has also been posted on the Azure CDN forum here. The reason I am posting it on Stack Overflow as well is to reach a bigger audience, in the hope that folks who have dealt with the same or a similar issue can provide valuable solutions or feedback. As far as I know there is currently no solution to this problem, yet it is something that affects any business that uses a CDN to deliver its content. I am open to editing this question further, but I would ask that folks don't simply down-vote it because it sounds like a "rant"; it's not, and I can guarantee you that it affects thousands of businesses out there and costs people thousands of dollars a year, whether they are aware of it or not.
So here's the situation. Let's say that I am building a picture gallery website and I would like to use Azure CDN to deliver my content. In the backend, Azure CDN will pull content from an Azure storage account. The CDN is fast and powerful, but it seems it can be a little insecure when it comes to preventing someone from pulling content in very large quantities and leaving you with a huge bandwidth bill. Let me demonstrate what I mean.
So last night I decided to write a simple console app that downloads a single image from my future picture gallery website in a for loop; the code is below:
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Net;

namespace RefererSpoofer
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int x = 0; x < 1000; x++)
            {
                string myUri = "http://myazurecdnendpoint.azureedge.net/mystoragecontainer/DSC00580_1536x1152.jpg";
                HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);

                // Spoof the Referer header so referrer-based filtering lets the request through.
                myHttpWebRequest.Referer = "www.mywebsite.com";

                using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
                using (Stream response = myHttpWebResponse.GetResponseStream())
                using (Image image = Image.FromStream(response))
                {
                    // Save each downloaded copy under a name matching the loop iteration.
                    image.Save(string.Format("D:\\Downloads\\image{0}.jpeg", x), ImageFormat.Jpeg);
                }
            }
            Console.ReadKey();
        }
    }
}
This console application makes 1000 rapid, back-to-back requests for an image file hosted on my Azure CDN endpoint and saves each copy to the 'D:\Downloads' folder on my PC, with each filename corresponding to the for loop iteration, i.e. image0.jpeg, image1.jpeg, etc.
So what just happened? In about one minute, I cost myself 140MB of bandwidth. With this being a Premium CDN priced at $0.17/GB, let's do the math together: 0.14GB * 60 minutes * 24 hours * 30 days * $0.17/GB = $1,028.16 in bandwidth costs if someone (a competitor, for example) kept up that single-image request rate for a month to jeopardize my website. I think you can see where I am going with this... my website will have thousands of images, in hi-res; by the way, the image I used in this example was a mere 140KB in size. These types of requests can come from anonymous proxies, etc.
So the question I have is: what can one do to prevent someone from abusing a publicly exposed CDN endpoint? Obviously one can't be stuck paying $5,000 or $20,000 for bandwidth resulting from malicious requests.
Now, Azure Premium CDN has an advanced Rules Engine that can filter out requests based on the Referer and respond with a 403 error if the Referer doesn't match your website. But the Referer can be faked, as I did in the code sample above, and the CDN still serves the requests (I tested this with a Referer spoof). This sucks: a lot of people use the Referer to prevent hotlinking, but when it comes to bandwidth abuse, what good is it if the Referer can be faked with a single line of code?
A couple of ideas I've had for preventing such abuse and huge bandwidth costs:
*Both solutions would require action from the CDN:
When a request for content comes in to the CDN, the CDN could make a call to the client server, passing in a) the IP address of the user and b) the CDN URI requested. The client server would then check how many times that URI has been requested from this particular IP, and if the client logic sees that it was requested, say, 100 times over the past minute, that would clearly signal abuse, because browsers cache images while malicious requests don't. The client machine would then simply reply 'false' to serving the content for this particular request. This would not be a perfect solution, since the additional callback to client infrastructure adds a small delay, yet it's definitely better than potentially being stuck with a bill that looks like the amount of money in your bank's savings account.
A better solution: build in a limit on the number of times a file can be served over the CDN within a particular time frame, per IP. For example, with the image file above, one could configure the CDN to serve no more than, say, 50 requests per IP within a 10-minute window. If abuse is detected, the CDN could, for a period defined by the customer, a) serve a 403 for the particular abused URI, or b) serve a 403 for all URIs requested from the abuser's IP. All times and options should be left configurable for the customer. This would definitely help, and there's no callback here, which saves time. The downside is that the CDN would have to keep track of URI / IP address / hit count.
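Purely as an illustration (this is nothing the CDN exposes today; the class name and the 50-requests-per-10-minutes threshold just mirror the numbers above), that tracking might look like this:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Illustrative sketch of per-IP / per-URI hit counting with a sliding window.
public class HitCounter
{
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(10);
    private const int Limit = 50;

    private readonly ConcurrentDictionary<string, Queue<DateTime>> _hits =
        new ConcurrentDictionary<string, Queue<DateTime>>();

    // Returns true when this IP/URI pair should get a 403 instead of content.
    public bool IsAbusive(string ip, string uri)
    {
        Queue<DateTime> queue = _hits.GetOrAdd(ip + "|" + uri, _ => new Queue<DateTime>());
        lock (queue)
        {
            DateTime now = DateTime.UtcNow;
            while (queue.Count > 0 && now - queue.Peek() > Window)
                queue.Dequeue();              // drop hits that have left the sliding window
            queue.Enqueue(now);
            return queue.Count > Limit;
        }
    }
}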
Which solutions would NOT work:
Signed URLs won't work, because the signature query string parameter would be different every time, so browsers would constantly re-request the data, effectively wiping out the browser cache for images.
Having a SAS access signature for the Azure blob would not work either, because a) the URI is different every time and b) there is no limit on how many times a blob can be requested once a SAS has been granted, so the abuse scenario is still possible.
Checking your logs and simply banning by IP. I tested this type of abuse via an anonymous proxy yesterday and it worked like a charm: I switched IPs in a matter of seconds and continued the abuse (of my own content, for testing purposes). So this is out as well, unless one has a nanny to monitor the logs.
Solutions that can work, but are not feasible:
Filter requests on your web server. Sure, this would be the best way to control the issue and track the number of requests per IP, and simply not serve the content when abuse is detected. But then you lose the big benefit of delivering your content over a super-fast, proximity-optimized CDN, on top of the fact that your servers will be slowed down considerably by serving large content such as images.
Simply bite the bullet and not worry about it. Well... then you know that the pothole that will take out your wheel is just down the road, so no, it's not a comfortable option to go with.
With all of the above said, the Premium CDN offering from Azure with its custom Rules Engine might hold a solution somewhere in there, but with very poor documentation and a lack of examples one can only guess how to properly protect oneself, which is why I am writing this post. Has anyone ever tackled such an issue, and how would one solve it?
Any suggestions are appreciated, I am very open minded on the issue.
Thank you for reading.
This is theoretical because we don't serve our "app" yet, but we will likely face this problem soon:
Couldn't you just issue a cookie which expires very quickly (10 sec!?) on your site itself and check for that using the Azure standard rules engine?
Then, on your "main site", you can run a custom check that blocks IPs (or doesn't issue the cookie, respectively) if the IP exceeds a "reasonable" amount of requests.
Some preconditions:
You would probably need more control over the "main site" that issues the cookie than you have via the CDN. I don't know whether you can "channel" these requests directly to the "client server".
The cookie has to be HttpOnly to not be faked from the client.
... Did I miss something?
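For what it's worth, the cookie-issuing half could be as small as this hypothetical ASP.NET module on the "main site" (the cookie name and 10-second lifetime are illustrative only; whether the CDN rule can then act on the cookie is the open question above):

using System;
using System.Web;

// Attach a short-lived, HttpOnly cookie to every response from the main site.
public class CdnTokenModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.EndRequest += (sender, e) =>
        {
            HttpResponse response = ((HttpApplication)sender).Context.Response;
            response.Cookies.Add(new HttpCookie("cdn_token", Guid.NewGuid().ToString("N"))
            {
                HttpOnly = true,                          // not readable by client-side script
                Expires = DateTime.UtcNow.AddSeconds(10)  // expires very quickly, as suggested
            });
        };
    }

    public void Dispose() { }
}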

ImageResizer and security considerations

We are considering using ImageResizer in our commercial application and have some questions related to security. The application will allow user upload of images for subsequent display on web pages.
We want to know how we can use ImageResizer to protect against attacks such as compression bombs, JAR content, malicious payloads, EXIF exposure, and malformed image data.
I think I know how to address these in general, but I'd like to know what specific tools ImageResizer offers.
Most ImageResizer data adapters offer an "untrustedData=true" configuration setting.
This setting in turn sets &process=always in the request querystring during the ImageResizer.Configuration.Config.Current.Pipeline.PostRewrite event.
If you wish, you can set it for all image requests. Keep in mind, this will cause requests for original images to be re-encoded at a potential quality loss and/or size increase.
When process=always is set, all images are re-encoded and stripped of EXIF data to prevent potentially malicious images from reaching the browser. For malformed image data, this means the client will get a 500 error instead of the malformed image.
How an image is interpreted, however, is just as important. If you permit user uploads to keep their original file name, or even just the extension (instead of picking from a whitelist), you open yourself up to easy attack vectors. In the same way, if an image is served to the browser with a JavaScript MIME type, the client may interpret it as JavaScript and get XSS'd. ImageResizer's pipeline works with whitelists to prevent this from happening.
Also, if you intend to re-encode all uploads, it may be easier to do it during the upload stage instead of on every request. However, this relies on the security of your data store and on being sure that no 'as-is' uploads can get through.
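As a rough sketch of that upload-time re-encode using plain System.Drawing rather than the ImageResizer API (names here are illustrative): redrawing into a fresh Bitmap carries over pixel data but not metadata such as EXIF, and Image.FromStream throws on malformed data, so bad uploads fail at upload time.

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class UploadSanitizer
{
    public static byte[] ReencodeAsJpeg(Stream upload)
    {
        using (Image original = Image.FromStream(upload))    // throws on malformed image data
        using (Bitmap clean = new Bitmap(original))          // copies pixels only, not metadata
        using (MemoryStream output = new MemoryStream())
        {
            clean.Save(output, ImageFormat.Jpeg);            // re-encode, discarding any extra payload bytes
            return output.ToArray();
        }
    }
}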

Querystring security and dos attack

I've got an image-generation servlet that renders text from a querystring into an image, which I use like this:
<img src="myimage.jpg.jsp?text=hello%20world">
These below are my security measures:
Urlencoding of querystring parameter
Domain whitelist
Querystring parameter length check
My questions:
Any security measure I'm forgetting there?
How does the above increase DOS attack risks compared to a standard:
<img src="myimage.jpg">
Thanks for helping.
Things to check would be:
Use the HTTP Referer header to verify that requests originate from your pages. This is only relevant if these images will only ever be used on pages of your own site. You can verify that the images are loaded from your pages and are not being included directly in a page on some other site, and that the image URL is not being accessed directly. The Referer can easily be forged for a DOS attack, though.
Check the underlying library you are using to generate the images: what parameters you pass it when generating the image, and which of those can potentially be controlled by the user and so affect the size of the image or the processing time. I am not sure how the font and font size are provided to the image, whether they are hardcoded or derived from user input.
Since this URL pattern generates an image, I assume every call is CPU-intensive and also involves some data transfer for the actual image. You may want to control the rate at which these URLs can be fired if you are really worried about DOS.
As I already mentioned in my comment, the URL can only be 1024 characters long, so there is an inherent limit on the number of characters the text field can hold. You can enforce an even smaller limit with an additional check.
For DoS prevention, rate limit how many requests can be received per IP per X number of seconds.
To implement this, before doing any processing, log the remote IP address of each request and then count the number of previous requests from that IP in the last e.g. 30 seconds. If the number is greater than, say, 15, reject the request with an "HTTP 500.13 Web Server Too Busy".
This assumes that your database logging and lookup are less processor-intensive than your image generation code. It will not protect against a large-scale DDoS attack, but it will reduce the risks considerably.
Domain whitelist
I assume this is based on the "Referer" header? Yes, it would stop your image from being included directly on other websites, but it could be circumvented by proxying the request via the other site's server. The DoS protection above would help alleviate this, though.
Querystring parameter length check
Yes that would help reduce the amount of processing that a single image request could do.
My questions:
Any security measure I'm forgetting there?
Probably. As a start, I would verify that you are not at risk from the OWASP Top 10.
How does the above increase DOS attack risks compared to a standard
A standard image request simply serves the static image off your server, and the only real overhead is IO. Processing through your JSP means it is possible to overload your server by executing multiple requests at the same time, since the CPU is doing more work.

Strategy for spreading image downloads across domains?

I am working on a PHP wrapper for the Google Image Charts API service. It supports serving images from multiple domains, such as:
http://chart.googleapis.com
http://0.chart.googleapis.com
http://1.chart.googleapis.com
...
The numeric range is 0-9, so 11 domains are available in total.
I want to automatically track the count of images generated and rotate domains for best performance in the browser. However, Google itself only vaguely recommends:
...you should only need this if you're loading perhaps five or more charts on a page.
What should my strategy be? Should I just change the domain every N images, and what would a good N value be in the context of modern browsers?
Is there a point where it makes sense to reuse a domain rather than introduce a new one (to save a DNS lookup)?
I don't have a specific number of images in mind - since this is open source and publicly available code, I would like to implement a generic solution rather than optimize for my specific needs.
Considerations:
Is one host faster than another?
Does the browser limit connections per host?
How long does it take for the browser to resolve a DNS name?
As you want to make this a component, I'd suggest making it able to use multiple strategies for choosing the host name. This will not only allow you to offer different strategies but also to test them against each other.
Also, you might want to add support later for the JavaScript libraries that can render the data on the page, so you might want to stay modular anyway.
Variants:
Pick one domain name and stick with it, hardcoded: http://chart.googleapis.com
Pick one domain name out of many, stick with it: e.g. http://#.chart.googleapis.com
Like 2, but start to rotate the name after some number of images.
Like 3, but add a chunk of JavaScript at the end of the page that resolves the DNS of the remaining hostnames in the background so they are cached for the next request (provide it with the hostnames not used so far).
Then you can make your library configurable, so you don't hardcode the values in the code but instead provide a default configuration.
Then you can expose the strategy as configuration, so whoever implements the component can decide which one to use.
Then you can let the component load its configuration from outside; for example, if you create a Wordpress plugin, the plugin can store the configuration and offer the plugin user an admin interface to change the settings.
As the configuration already includes which strategy to follow, you have handed the responsibility entirely to the consumer of the component, and you can more easily integrate different usage scenarios for different websites or applications.
I don't exactly understand the request to rotate domains. I guess it makes sense in the context that your browser may only allow X open requests to a given domain at once, so if you have 10 images served from chart.googleapis.com, you may need to wait for the first to finish downloading before beginning to receive the fifth, and so on.
The problem with rotating domains randomly is that you then defeat browser caching entirely. If an image is served from 1.chart.googleapis.com on one page load and then from 7.chart.googleapis.com on the next, the cached chart is invalidated and the user has to wait for it to be requested, generated, and downloaded all over again.
The best solution I can think of is to determine the domain to request from algorithmically from the request itself. If it's built as a function, you can md5 the arguments, convert the hash to an integer, and then serve the image from {$result % 10}.chart.googleapis.com.
Probably a little overkill, but you can at least guarantee that a given image will always be served from the same server.
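For instance, a minimal PHP sketch of that idea (the function name and the sample querystring are made up; hashing the chart's own querystring keeps the same chart pinned to the same host, so browser caching still works):

<?php
function chart_host($chartQuery)
{
    // Map the first 8 hex chars of the md5 to a stable bucket 0-9.
    $bucket = hexdec(substr(md5($chartQuery), 0, 8)) % 10;
    return "http://{$bucket}.chart.googleapis.com";
}

echo chart_host('cht=p3&chs=250x100&chd=t:60,40'); // always the same host for this chart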
