Amazon CloudFront - Invalidating files by regex, e.g. *.png - amazon-cloudfront

Is there a way to have Amazon CloudFront invalidation (via the management console) invalidate all files that match a pattern, e.g. images/*.png?
Context:
I had set cache control for images on my site, but by mistake left out the png extension in the cache directive on Apache. So .gif/.jpg files were cached on users' computers but .png files were not.
I fixed the Apache directive and now my Apache server serves png files with the appropriate cache control directives. I tested this.
But CloudFront had fetched those png files in the past, so hitting those png files via CloudFront still returns them with no cache control. End result: still no user caching for those png files.
I tried to set the invalidation in the Amazon CloudFront console as images/*.png. The console said completed, but I still do not get the cache control directive on png files --> makes me believe the invalidation did not happen.
I can set the invalidation for the complete image directory, but then I have too many image files --> I would get charged > $100 for this. So trying to avoid this.
Changing image versions so that CloudFront fetches new versions is a painful exercise in my code; doing it for, say, 500 png files would be a pain --> trying to avoid it.
Listing individual png files is also a pain --> trying to avoid it as well.
Thanks,
-Amit

If your CloudFront distribution is configured in front of an S3 bucket, you can list all of the objects in the S3 bucket, filter them with a regex pattern (e.g., /\.png$/i), then use that list to construct your invalidation request.
That's what I do anyway. I hope this helps! :)
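For what it's worth, below is a minimal sketch of that approach using boto3. The bucket name, prefix, and distribution ID are placeholders, and it assumes AWS credentials are already configured.

    # Sketch: list the bucket, keep keys matching a pattern, then invalidate them.
    import re
    import time
    import boto3

    BUCKET = "my-bucket"               # placeholder bucket name
    DISTRIBUTION_ID = "E1234567890AB"  # placeholder CloudFront distribution ID

    s3 = boto3.client("s3")
    cloudfront = boto3.client("cloudfront")

    # Collect every object key under images/ that ends in .png
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix="images/"):
        for obj in page.get("Contents", []):
            if re.search(r"\.png$", obj["Key"], re.IGNORECASE):
                keys.append("/" + obj["Key"])  # invalidation paths start with "/"

    # Submit in modest batches; CloudFront limits how many invalidation
    # paths can be in progress at once.
    for i in range(0, len(keys), 1000):
        batch = keys[i:i + 1000]
        cloudfront.create_invalidation(
            DistributionId=DISTRIBUTION_ID,
            InvalidationBatch={
                "Paths": {"Quantity": len(batch), "Items": batch},
                "CallerReference": str(time.time()),
            },
        )

Keep in mind that each listed key still counts as an individual invalidation path, so the pricing concern in the question still applies beyond the first 1,000 free paths per month.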

Related

wget solution to avoid downloading files already present in the mirror when Last-modified header is not provided

I want to refresh the mirror of a website whose server is not set to deliver Last-Modified response headers.
Last-modified header missing -- time-stamps turned off. Remote file exists and could contain links to other resources -- retrieving.
I do not have admin rights to that server.
I'd be happy to set wget to ignore files for which the file size is identical, but I have not found a way to accomplish that.
I want to minimize bandwidth and avoid re-downloading files, even at the risk of skipping files that changed but kept the same size.
How would I implement such filtering from the CLI?
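If wget itself cannot be coaxed into a size-only comparison, one workaround is to script it: compare the remote Content-Length against the local file size and only fetch files whose size differs. A rough sketch in Python using the requests library; the URL and path are hypothetical:

    # Skip the download when the remote Content-Length matches the size of the
    # file already present in the local mirror.
    import os
    import requests

    def mirror_file(url, local_path):
        head = requests.head(url, allow_redirects=True, timeout=30)
        remote_size = int(head.headers.get("Content-Length", -1))

        if os.path.exists(local_path) and os.path.getsize(local_path) == remote_size:
            return  # same size as the local copy: assume unchanged and skip

        resp = requests.get(url, stream=True, timeout=30)
        resp.raise_for_status()
        with open(local_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=65536):
                fh.write(chunk)

    # mirror_file("https://example.com/page.html", "mirror/page.html")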

Azure CDN - truncating "large files" somehow?

Edit: This is all mysteriously working now, although I wasn't able to figure out what actually caused the issue. Might not have been the CDN at all? Leaving this here for posterity and will update if I ever see this kind of thing happen again...
I've been experimenting with using Azure CDN (Microsoft hosted, not Akamai or Verizon) to handle file downloads for a couple of Azure Web Apps, and it's been working fine until today, when it began returning truncated versions of a "large file", resulting in a PDF file that couldn't be opened (by "large file" I'm specifically referring to Azure CDN's Large File Optimisation feature).
The file works fine from the origin URL and is 8.59 MB, but the same file retrieved from the CDN endpoint is exactly 8 MB. Which, by a suspicious coincidence, happens to be the same as the chunk size used by the Large File Optimisation feature mentioned above. Relevant part of the documentation:
Azure CDN Standard from Microsoft uses a technique called object chunking. When a large file is requested, the CDN retrieves smaller pieces of the file from the origin. After the CDN POP server receives a full or byte-range file request, the CDN edge server requests the file from the origin in chunks of 8 MB.
... This optimization relies on the ability of the origin server to support byte-range requests
File URLs in question:
Origin
CDN
I've also uploaded the same file directly into the website's filesystem to rule out the CMS (Umbraco) and its blob-storage-filesystem stuff interfering, but it's the exact same result anyway. Here are the links for reference.
Origin
CDN
In both cases the two files are binary identical except that the file from the CDN abruptly stops at 8 MB, even though the origin supports byte-range requests (verified with Postman) and the Azure CDN documentation linked above claims that
If not all the chunks are cached on the CDN, prefetch is used to request chunks from the origin
And that
There are no limits on maximum file size.
The same issue has occurred with other files over 8 MB too, although these had previously worked as of last week. No changes to CDN configuration had been made since then.
What I'm thinking is happening is something like:
Client requests file download from CDN
CDN figures out that it's a "large file" and requests the first 8 MB chunk from the origin
Origin replies with an 8 MB chunk as requested
CDN begins returning 8 MB chunk to client
CDN either doesn't request the next chunk, or origin doesn't provide it
Client only receives the first 8 MB of the file
Or perhaps I'm barking up the wrong tree. Already tried turning off compression, not really sure where else to go from here. This is probably my fault, so have I misconfigured the CDN or something? I've considered purging the CDN's cache but I don't really see that as a solution and would rather avoid manual workarounds...
This optimization relies on the ability of the origin server to support byte-range requests; if the origin server doesn't support byte-range requests, requests to download data greater than 8 MB in size will fail.
https://learn.microsoft.com/en-us/azure/cdn/cdn-large-file-optimization
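As an aside, a quick way to confirm that an origin really honours byte-range requests (outside of Postman) is to send a Range header and check for a 206 response. A small sketch with a placeholder URL:

    # Ask for the first KiB; expect HTTP 206 Partial Content and a Content-Range
    # header if the origin supports byte-range requests.
    import requests

    ORIGIN_URL = "https://example.com/files/report.pdf"  # placeholder URL

    resp = requests.get(ORIGIN_URL, headers={"Range": "bytes=0-1023"}, timeout=30)
    print(resp.status_code)                   # 206 if ranges are honoured, 200 if ignored
    print(resp.headers.get("Content-Range"))  # e.g. "bytes 0-1023/9007199"
    print(resp.headers.get("Accept-Ranges"))  # usually "bytes" when supported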

Cloudfront compression does not invalidate?

I have been adjusting my AWS Cloudfront settings trying to optimize my site.
I tried turning compression on (a Y-Slow recommendation) and it corrupted the rendering of my site.
So I turned compression off and ran an invalidation on the whole directory tree, but the problem persists. I have had to turn the CDN off so my site will render.
Just for kicks I invalidated again and turned the CDN on after waiting a bit, but it is still sending me compressed js and css files.
What did I miss?
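As a diagnostic (not an answer from the thread), one way to narrow this down is to check what the edge is actually returning: whether the asset is still gzip-encoded and whether the response came from the CloudFront cache. A small sketch with a placeholder URL:

    # Check whether the edge still serves a gzip-encoded copy of an asset and
    # whether the response was a CloudFront cache hit or miss.
    import requests

    ASSET_URL = "https://dxxxxxxxxxxxx.cloudfront.net/js/app.js"  # placeholder URL

    resp = requests.get(ASSET_URL, headers={"Accept-Encoding": "gzip"}, timeout=30)
    print(resp.headers.get("Content-Encoding"))  # "gzip" means a compressed copy was served
    print(resp.headers.get("X-Cache"))           # e.g. "Hit from cloudfront" / "Miss from cloudfront"

A stale compressed object keeps being served from the cache until an invalidation for that exact path completes, so the X-Cache value helps tell a caching problem from an origin problem.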
How to upload a static HTML site to S3 and use it as a Web Server
Prerequisites:
You should have an IAM user with S3 access, with the username and password ready.
Some caveats:
Minify the files yourself [including CSS/JS and HTML files] [there is no web server to do that].
You can use grunt [preferably] or any online tool like http://www.willpeavy.com/minifier/
Gzip the files if you want to enable compression, with the command below [remember to minify the files before you do this step]:
gzip -9 file1.min.css file2.min.css
This will produce two files like file1.min.css.gz and file2.min.css.gz. Now remove the ".gz" extension with the help of the mv command, like: mv file1.min.css.gz file1.min.css [similarly for file2.min.css.gz]
Sign in to your AWS account and create an S3 bucket like mywebsite.com
Actions -> Create Bucket
Right-click on the S3 bucket, go to Properties, and click on Enable website hosting.
Here you will have to enter the index document. This is the root file of your website. If it is located at the root and is named index.html, then simply write index.html.
Go inside the bucket and click on Actions -> Upload. On the popup, select your folder / drag and drop the files that you want.
Next, click on Set Details [tick the default selection and then click on Set Permissions].
Check "Make Everything Public" and click on Set Metadata.
Then click on Add metadata and add two things:
Key: Cache-Control  Value: max-age=2592000 [this number is in seconds; modify it according to your needs]
Key: Content-Encoding  Value: gzip
Now click on Start Upload and, at the end, you can access the site via the S3 link of the index page.
Voila, you're good to go!
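If you prefer to script the upload instead of clicking through the console, the same metadata can be set programmatically. A minimal sketch with boto3; the bucket, key, and file names are placeholders based on the steps above:

    # Upload a pre-minified, pre-gzipped asset with the same metadata the
    # console steps set by hand (Cache-Control and Content-Encoding).
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        "file1.min.css",       # local file, already gzipped and renamed back to .css
        "mywebsite.com",       # bucket name from the steps above
        "css/file1.min.css",   # key inside the bucket (placeholder path)
        ExtraArgs={
            "ACL": "public-read",
            "ContentType": "text/css",
            "ContentEncoding": "gzip",
            "CacheControl": "max-age=2592000",  # 30 days, in seconds
        },
    )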
Thanks : #karan-shah

HTTP cache for Symfony

I want to follow Google's directive in terms of cache headers for images, scripts and styles.
After reading Symfony's documentation about HTTP cache, I decided to install FOSHttpCacheBundle. I then set up rules for paths like ^/Resources/ or ^/css/. But I fail to see the proper headers for my images in Chrome's console.
Alternatively, I have read that, since my server is handling the resource, it is not Symfony that deals with this matter (yet I read in the docs that the Symfony Proxy was good for shared-hosting servers, which is what I have).
So should I just add lines to my .htaccess as explained here, or am I simply misusing FOSHttpCacheBundle? (Or both.)
Static files (including JavaScript files, CSS stylesheets, images, fonts...) are served directly by the web server. As the PHP module is not even loaded for such files, you must configure the server to set the proper HTTP headers. You can do it using a .htaccess file if you use Apache, but doing it directly in httpd.conf/apache2.conf/the vhost conf (depending on your configuration) is better from a performance point of view.
If you also want to set HTTP cache headers for dynamic content (HTML generated by Symfony...), then you must use FOSHttpCache or any other method provided by Symfony, such as the #Cache annotation.

How can I prevent Amazon Cloudfront from hotlinking?

I use Amazon Cloudfront to host all my site's images and videos, to serve them faster to my users, who are pretty scattered across the globe. I also apply pretty aggressive forward caching to the elements hosted on Cloudfront, setting Cache-Control to public, max-age=7776000.
I've recently discovered to my annoyance that third party sites are hotlinking to my Cloudfront server to display images on their own pages, without authorization.
I've configured .htaccess to prevent hotlinking on my own server, but haven't found a way of doing this on Cloudfront, which doesn't seem to support the feature natively. And, annoyingly, Amazon's Bucket Policies, which could be used to prevent hotlinking, only have effect on S3; they have no effect on CloudFront distributions [link]. If you want to take advantage of the policies you have to serve your content from S3 directly.
Scouring my server logs for hotlinkers and manually changing the file names isn't really a realistic option, although I've been doing this to end the most blatant offenses.
You can forward the Referer header to your origin
Go to CloudFront settings
Edit Distributions settings for a distribution
Go to the Behaviors tab and edit or create a behavior
Set Forward Headers to Whitelist
Add Referer as a whitelisted header
Save the settings in the bottom right corner
Make sure to handle the Referer header on your origin as well.
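As a hypothetical illustration of that last step, once the Referer header reaches the origin you can reject requests whose Referer host is not on an allow-list; the domains below are placeholders:

    # Origin-side allow-list check on the forwarded Referer header.
    from urllib.parse import urlparse

    ALLOWED_HOSTS = {"example.com", "www.example.com"}  # placeholder domains

    def referer_allowed(referer_header):
        """Return True if the Referer points at one of our own pages."""
        if not referer_header:
            # Policy decision: treat a missing Referer as a hotlink. Note that
            # some legitimate clients (e.g. certain mobile HTML5 media players,
            # as mentioned further down) omit the header entirely.
            return False
        host = urlparse(referer_header).hostname or ""
        return host.lower() in ALLOWED_HOSTS

    # e.g. inside a request handler:
    # if not referer_allowed(request.headers.get("Referer")):
    #     return a 403 response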
We had numerous hotlinking issues. In the end we created css sprites for many of our images. Either adding white space to the bottom/sides or combining images together.
We displayed them correctly on our pages using CSS, but any hotlinks would show the images incorrectly unless they copied the CSS/HTML as well.
We've found that they don't bother (or don't know how).
The official approach is to use signed URLs for your media. For each media piece that you want to distribute, you can generate a specially crafted URL that only works within given constraints of time and source IPs.
One approach for static pages is to generate temporary URLs for the media included in that page that are valid for twice the page's caching time. Let's say your page's caching time is one day. Every two days, the links would be invalidated, which forces the hotlinkers to update their URLs. It's not foolproof, as they can build tools to get the new URLs automatically, but it should stop most people.
If your page is dynamic, you don't need to worry about trashing your page's cache, so you can simply generate URLs that only work for the requester's IP.
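For reference, signed URLs can be generated with botocore's CloudFrontSigner. A minimal sketch; the key pair ID, private key path, and URL are placeholders:

    # Generate a CloudFront signed URL that expires after a fixed window.
    from datetime import datetime, timedelta, timezone

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    KEY_PAIR_ID = "APKAEXAMPLEEXAMPLE"               # placeholder CloudFront key pair ID
    PRIVATE_KEY_PATH = "cloudfront_private_key.pem"  # placeholder path

    def rsa_signer(message):
        with open(PRIVATE_KEY_PATH, "rb") as fh:
            key = serialization.load_pem_private_key(fh.read(), password=None)
        # CloudFront expects an RSA SHA-1 signature over the policy
        return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)
    signed_url = signer.generate_presigned_url(
        "https://dxxxxxxxxxxxx.cloudfront.net/videos/clip.mp4",         # placeholder URL
        date_less_than=datetime.now(timezone.utc) + timedelta(days=2),  # ~2x a one-day page cache
    )
    print(signed_url)

If you want the source-IP restriction mentioned above, a custom policy can be built (for example with signer.build_policy, which takes an ip_address argument) and passed in via the policy parameter instead.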
As of Oct. 2015, you can use AWS WAF to restrict access to Cloudfront files. Here's an article from AWS that announces WAF and explains what you can do with it. Here's an article that helped me setup my first ACL to restrict access based on the referrer.
Basically, I created a new ACL with a default action of DENY. I added a rule that checks the end of the referer header string for my domain name (lowercase). If it passes that rule, it ALLOWS access.
After assigning my ACL to my Cloudfront distribution, I tried to load one of my data files directly in Chrome and I got an error, as expected (the ACL's default action is DENY).
As far as I know, there is currently no solution, but I have a few possibly relevant, possibly irrelevant suggestions...
First: Numerous people have asked this on the Cloudfront support forums. See here and here, for example.
Clearly AWS benefits from hotlinking: the more hits, the more they charge us. I think we (Cloudfront users) need to start some sort of heavily orchestrated campaign to get them to offer referer checking as a feature.
Another temporary solution I've thought of is changing the CNAME I use to send traffic to cloudfront/s3. So let's say you currently send all your images to:
cdn.blahblahblah.com (which redirects to some cloudfront/s3 bucket)
You could change it to cdn2.blahblahblah.com and delete the DNS entry for cdn.blahblahblah.com
As a DNS change, that would knock out all the people currently hotlinking before their traffic got anywhere near your server: the DNS entry would simply fail to look up. You'd have to keep changing the cdn CNAME to make this effective (say once a month?), but it would work.
It's actually a bigger problem than it seems because it means people can scrape entire copies of your website's pages (including the images) much more easily - so it's not just the images you lose and not just that you're paying to serve those images. Search engines sometimes conclude your pages are the copies and the copies are the originals... and bang goes your traffic.
I am thinking of abandoning Cloudfront in favor of a strategically positioned, super-fast dedicated server (serving all content to the entire world from one place) to give me much more control over such things.
Anyway, I hope someone else has a better answer!
This question mentioned image and video files.
Referer checking cannot be used to protect multimedia resources from hotlinking, because some mobile browsers do not send the Referer header when requesting an audio or video file played using HTML5.
I have confirmed this for Safari and Chrome on iPhone and for Safari on Android.
Too bad! Thank you, Apple and Google.
How about using signed cookies? Create a signed cookie using a custom policy, which also supports the various kinds of restrictions you want to set, and it works with wildcards.
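A rough sketch of what that could look like, assuming the cryptography library and a CloudFront key pair; the key pair ID, key path, domain, and expiry are placeholders:

    # Build CloudFront signed cookies from a custom policy with a wildcard resource.
    import base64
    import json
    from datetime import datetime, timedelta, timezone

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    KEY_PAIR_ID = "APKAEXAMPLEEXAMPLE"               # placeholder key pair ID
    PRIVATE_KEY_PATH = "cloudfront_private_key.pem"  # placeholder path
    expires = int((datetime.now(timezone.utc) + timedelta(days=1)).timestamp())

    # Custom policy: wildcard over the media paths, limited in time
    policy = json.dumps({
        "Statement": [{
            "Resource": "https://dxxxxxxxxxxxx.cloudfront.net/media/*",
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
        }],
    }, separators=(",", ":"))

    def cf_b64(data):
        # CloudFront's URL-safe base64 variant
        return base64.b64encode(data).decode().replace("+", "-").replace("=", "_").replace("/", "~")

    with open(PRIVATE_KEY_PATH, "rb") as fh:
        key = serialization.load_pem_private_key(fh.read(), password=None)
    signature = key.sign(policy.encode(), padding.PKCS1v15(), hashes.SHA1())

    cookies = {
        "CloudFront-Policy": cf_b64(policy.encode()),
        "CloudFront-Signature": cf_b64(signature),
        "CloudFront-Key-Pair-Id": KEY_PAIR_ID,
    }
    print(cookies)  # set these three cookies on responses served from your site's domain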
