How can I prevent Amazon Cloudfront from hotlinking? - amazon-cloudfront

I use Amazon Cloudfront to host all my site's images and videos, to serve them faster to my users which are pretty scattered across the globe. I also apply pretty aggressive forward caching to the elements hosted on Cloudfront, setting Cache-Controlto public, max-age=7776000.
I've recently discovered to my annoyance that third party sites are hotlinking to my Cloudfront server to display images on their own pages, without authorization.
I've configured .htaccessto prevent hotlinking on my own server, but haven't found a way of doing this on Cloudfront, which doesn't seem to support the feature natively. And, annoyingly, Amazon's Bucket Policies, which could be used to prevent hotlinking, have effect only on S3, they have no effect on CloudFront distributions [link]. If you want to take advantage of the policies you have to serve your content from S3 directly.
Scouring my server logs for hotlinkers and manually changing the file names isn't really a realistic option, although I've been doing this to end the most blatant offenses.

You can forward the Referer header to your origin
Go to CloudFront settings
Edit Distributions settings for a distribution
Go to the Behaviors tab and edit or create a behavior
Set Forward Headers to Whitelist
Add Referer as a whitelisted header
Save the settings in the bottom right corner
Make sure to handle the Referer header on your origin as well.

We had numerous hotlinking issues. In the end we created css sprites for many of our images. Either adding white space to the bottom/sides or combining images together.
We displayed them correctly on our pages using CSS, but any hotlinks would show the images incorrectly unless they copied the CSS/HTML as well.
We've found that they don't bother (or don't know how).

The official approach is to use signed urls for your media. For each media piece that you want to distribute, you can generate a specially crafted url that works in a given constraint of time and source IPs.
One approach for static pages, is to generate temporary urls for the medias included in that page, that are valid for 2x the duration as the page's caching time. Let's say your page's caching time is 1 day. Every 2 days, the links would be invalidated, which obligates the hotlinkers to update their urls. It's not foolproof, as they can build tools to get the new urls automatically but it should prevent most people.
If your page is dynamic, you don't need to worry to trash your page's cache so you can simply generate urls that are only working for the requester's IP.

As of Oct. 2015, you can use AWS WAF to restrict access to Cloudfront files. Here's an article from AWS that announces WAF and explains what you can do with it. Here's an article that helped me setup my first ACL to restrict access based on the referrer.
Basically, I created a new ACL with a default action of DENY. I added a rule that checks the end of the referer header string for my domain name (lowercase). If it passes that rule, it ALLOWS access.
After assigning my ACL to my Cloudfront distribution, I tried to load one of my data files directly in Chrome and I got this error:

As far as I know, there is currently no solution, but I have a few possibly relevant, possibly irrelevant suggestions...
First: Numerous people have asked this on the Cloudfront support forums. See here and here, for example.
Clearly AWS benefits from hotlinking: the more hits, the more they charge us for! I think we (Cloudfront users) need to start some sort of heavily orchestrated campaign to get them to offer referer checking as a feature.
Another temporary solution I've thought of is changing the CNAME I use to send traffic to cloudfront/s3. So let's say you currently send all your images to:
cdn.blahblahblah.com (which redirects to some cloudfront/s3 bucket)
You could change it to cdn2.blahblahblah.com and delete the DNS entry for cdn.blahblahblah.com
As a DNS change, that would knock out all the people currently hotlinking before their traffic got anywhere near your server: the DNS entry would simply fail to look up. You'd have to keep changing the cdn CNAME to make this effective (say once a month?), but it would work.
It's actually a bigger problem than it seems because it means people can scrape entire copies of your website's pages (including the images) much more easily - so it's not just the images you lose and not just that you're paying to serve those images. Search engines sometimes conclude your pages are the copies and the copies are the originals... and bang goes your traffic.
I am thinking of abandoning Cloudfront in favor of a strategically positioned, super-fast dedicated server (serving all content to the entire world from one place) to give me much more control over such things.
Anyway, I hope someone else has a better answer!

This question mentioned image and video files.
Referer checking cannot be used to protect multimedia resources from hotlinking because some mobile browsers do not send referer header when requesting for an audio or video file played using HTML5.
I am sure of that about Safari and Chrome on iPhone and Safari on Android.
Too bad! Thank you, Apple and Google.

How about using Signed cookies ? Create signed cookie using custom policy which also supports various kind of restrictions you want to set and also it is wildcard.

Related

Using Microsoft Azure CDN with already existing site

I am fairly new to using CDN but i've found that there are two types of CDN.
You redirect your DNS to your CDN and they automatically take over the traffic as a proxy and do the caching and content delivery. No change in URLs and it's basically no work. Even hard to understand if my content is being delivered through CDN (you have to check headers or use website tools that look for it). Good example is CloudFlare
You do not redirect your DNS. You give it an origin server, then everything gets copied over to the CDN servers and you content is available on the new CDN URLs.
Now, i have a website with a lot of images. I want to use Microsoft Azure CDN. I created my profile (Standart Microsoft CDN) and created the CDN endpoint. I tested and it works fine
https://xxxx.com/images/example.png
https://xxxx.azureedge.net/images/example.png
All good - my image is there, along wiht others
So what comes next? I have an image (img src tag) for example pointing to /images/example.png. It seems like i need to change it to https://xxxx.azureedge.net/images/example.png
So my website has a lot of images and if i have to go and manually re-do all the img src tags it seems like a lot of work and what happens if i decide to move to another CDN or stop using CDN. So all this leads me to believe i might be missing a point here and not doing this correctly.
Is that the correct way a CDN like this should work? If yes, may i get some help on how can i achieve that with minimum amount of labour? re-doing all my css, js and images to the new URLs? I am using Joomla CSM.
Documentation out there on how to tackle or deal with something as easy as this are unbelievably limited.
Basically you are right. Mainly, CDN services will basically "pull" static content (for example images) from your website, and then serve them from multiple locations (servers) to your visitors from your provided CDN url. For example:
Your origin url
mydomain.com/image.jpg
CDN url
mycdn.cdnservice.com/image.jpg
If the URL was the SAME as your existing url, then it wouldn't really work as a CDN now would it. There are often options so that you can use your own subdomain, for example cdn.mydomain.com/image.jpg, but it's still a change of URL. Most CMS's will often have options, or at least plugins, to set CDN url for static assets, which will dynamically replace the paths to point to the CDN url. If you have set file paths manually, then these will need to be replaced manually also with the full CDN path.
There are a few hacks like server rewrite which might allow you to use the same URL, but this is not recommended to pursue. Generally speaking, using a CDN requires changing url to your static assets.
Option #2 is to use a reverse proxy CDN service like Cloudflare. This requires changing your nameservers to route ALL your traffic through Cloudflare, and then Cloudflare will work as a CDN for static assets without you having to change url paths. However, it must be noted that Cloudflare is much more than just a CDN, and you can't really control how your assets are cached on their CDN/servers.

SSL with CartThrob - in-template redirect or htaccess on the basis of URL segment?

this is a broader question than I would probably ask of the CartThrob folks, which is why I'm posting it here. What would the community recommend as far as SSL is concerned with CartThrob? The store functions are limited to a couple of key template groups. So my thinking was perhaps the best way to handle it would be htaccess on the basis of the presence of those URL segments. I would like to return the user to a non-SSL connection when they are not in the store area. So a trigger might be the first segment being "basket" or "account" for example. Or what about an in-template redirect to the secure URL? Very interested to hear the community's suggestions on how best to handle SSL within a given area of an EE site. I'm interested in whatever makes the most sense to implement, while also ensuring that, for example, all assets - even those loaded with path variables - are loaded via SSL. Thanks all!
I've always used CartThrob's https_redirect tag (docs) on my checkout screens, which will rewrite your {path}, {permalink} (etc)-created URLs to use https, as well as redirect you to the https:// version of your page if necessary.
That, combined with using the protocol-agnostic style of calling scripts and stylesheets should get you most of the way in getting your secure icon in the browser.
(Example:)
<script src="//ajax.googleapis.com/ajax/libs/jqueryui/1.9.1/jquery-ui.min.js"></script>

HTTPS Failing on Media in Magento

I'm having a problem with a clients magento site that has https enabled on the secure pages,
The website it built heavily around static block content and on the https pages images are pulled from static blocks (over 400 of them) using the media insert in the static block {{media url="media/bla/bla/bob.png"}} these images are comign through as http://site.com/media/bla/bla/bob.png
its not realistic, and it wouldn't make any sense to go through and change all these links to direct links.
Any ideas?
Cheers
Roly!
You are suppose to use the {{store url=""}} or the {{secure_base_url}}media/ in ur blocks
if ur not certain that ur page will be on HTTPS or HTTP the use first one if you know for sure that the request will be HTTPs use second one. (NOTE. Second is a system config path not the actual value that u'll put in the CMS block).
Hope it helps.
Whereas media files are not subject to a fallback, and with the awareness that if the directory level for Magento changes w/r/t the webroot (e.g. http://site.com/ vs. http://site.com/magento/) you can lead with the double-slash network location:
<img src="//media/bla/bob.png" />
Therefore, a search and replace against using the current data in cms_block.content is indicated.
I'll reiterate that this is not appropriate for skin assets due to the fallback.

What is HTTP cache best practices for high-traffic static site?

We have a fairly high-traffic static site (i.e. no server code), with lots of images, scripts, css, hosted by IIS 7.0
We'd like to turn on some caching to reduce server load, and are considered setting the expiry of web content to be some time in the future. In IIS, we can do this on a global level via "Expire web content" section of the common http headers in the IIS response header module. Perhaps setting content to expire 7 days after serving.
All this actually does is sets the max-age HTTP response header, so far as I can tell, which makes sense, I guess.
Now, the confusion:
Firstly, all browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if content has changed. So, I'm not entirely sure what the max-age response header will actually effect?! Could it be older browsers? Or web-caches?
It is possible that we may want to change an image in the site at short notice... I'm guessing that if the max-age is actually used by something that, by its very nature, it won't then check if this image has changed for 7 days... so that's not what we want either
I wonder if a best practice is to partition one's site into folders of content really won't change often and only turn on some long-term expiry for these folders? Perhaps to vary the querystring to force a refresh of content in these folders if needed (e.g. /assets/images/background.png?version=2) ?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a server to six (6). This means if you have 30 or more resources that are all hosted on the same website, only 6 are being downloaded at any time until the page is loaded. You can speed this up a bit by using a content delivery network (CDN) so that everything downloads at once.

Implementing HTTP or HTTPS depending on page

I want to implement https on only a selection of my web-pages. I have purchased my SSL certificates etc and got them working. Despite this, due to speed demands i cannot afford to place them on every single page.
Instead i want my server to serve up http or https depending on the page being viewed. An example where this has been done is ‘99designs’
The problem in slightly more detail:
When my visitors first visit my site they only have access to non-sensitive information and therefore i want them to be presented with simple http.
Then once they login they are granted access to more sensitive information, e.g. profile information for which HTTPS is used to deliver.
Despite being logged in, if the user goes back to a non-sensitive page such as the homepage then i want it delivered using HTTP.
One common solution seems to be using the .htaccess file. The problem is that my site is relatively large meaning that to use this would require me to write a rule for every page (several hundred) to determine whether it should be server up using http or https.
And then there is the problem of defining user generated content pages.
Please help,
Many thanks,
David
You've not mentioned anything about the architecture you are using. Assuming that the SSL termination is on the webserver, then you should set up separate virtual hosts with completely seperate and non-overlapping document trees, and for preference, use a path schema which does not overlap (to avoid little accidents).

Resources