Amazon Cloudfront removes Referer header - amazon-cloudfront

I am using Amazon CloudFront to deliver some HDS files. I have an origin server which check the HTTP HEADER REFERER and in case is no allowed it block it.
The problem is that cloud front is removing the referer header, so it is not forwarded to the origin.
Is it possible to tell Amazon not to do it?

Within days of writing the answer below, changes have been announced to Cloudfront. Cloudfront will now pass through headers you select and can add some headers of its own.
However, much of what I stated below remains true. Note that in the announcement, an option is offered to forward all headers which, as I suggested, would effectively disable caching. There's also an option to forward specific headers, which will cause Cloudfront to cache the object against the complete set of forwarded headers -- not just the uri -- meaning that the effectiveness of the cache is somewhat reduced, since Cloudfront has no option but to assume that the inclusion of the header might modify the response the server will generate for that request.
Each of your CloudFront distributions now contains a list of headers that are to be forwarded to the origin server. You have three options:
None - This option requests the original behavior.
All - This option forwards all headers and effectively disables all caching at the edge.
Whitelist - This option give you full control of the headers that are to be forwarded. The list starts out empty, and grows as you add more headers. You can add common HTTP headers by choosing them from a list. You can also add "custom" headers by simply entering the name.
If you choose the Whitelist option, each header that you add to the list becomes part of the cache key for the URLs associated with the distribution. Adding a header to the list simply tells CloudFront that the value of the header can affect the content returned by the origin server.
http://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/
Cloudfront does remove the Referer header along with several others that are not particularly meaningful -- or whose presence would cause illogical consequences -- in the world of cached content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
Just like cookies, if the Referer: header were allowed to remain, such that the origin could see it and react to it, that would imply that the object should be cached based on the request plus the referring page, which would seem to largely defeat the cachability of objects. Otherwise, if the origin did react to an undesired referer and send no-cache responses, that would be all well and good until the first legitimate request came in, the response to which would be served to subsequent requesters regardless of their referer, also largely defeating the purpose.
RFC-2616 Section 13 requires that a cache return a response that has been "checked for equivalence with what the origin server would have returned," and this implies that the response be valid based on all headers in the request.
The same thing goes for User-agent and other headers an origin server might use to modify its response... if you need to react to these values at the origin, there's little obvious purpose for serving them with a CDN.
Referring page-based tests are quite a primitive measure, the way many people use them, since headers are so trivial to forge.
If you are dealing with a platform that you don't control, and this is something you need to override (with a dummy value, just to keep the existing system "happy,") then a reverse proxy in front of the origin server could serve such a purpose, with Cloudfront using the reverse proxy as its origin.

In today's newsletter amazon announced that it is now possible to forward request headers with cloudfront. See: http://aws.amazon.com/de/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/

Related

CloudFront - How to forward all request headers to the origin

At CloudFront behaviour setting, is "All" the one to forward all request headers to the origin?
Values That You Specify When You Create or Update a Distribution
If you configure CloudFront to forward all headers to your origin for a cache behavior, CloudFront never caches the associated objects. Instead, CloudFront forwards all requests for those objects to the origin. In that configuration, the value of Minimum TTL must be 0.
Yes, it is.
The documentation seems to focus more on caching based on headers and less on what's forwarded, but caching on headers and forwarding headers to the origin go hand-in-hand.
As I was looking for clear citations from the documentation, one reference I found in the Amazon CloudFront Developer Guide is the one shown below. It's a link to a section titled "Cache Based on Selected Request Headers" but its anchor tag is DownloadDistValuesForwardHeaders.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders
This suggests that someone has tried to clarify or simplify the documemtation... with apparently limited success.
Note that this forwards almost all headers to the origin, except for some that are still stripped for security and/or operational reasons, like X-Forwarded-Proto, X-Real-IP, and X-Edge-*.
Note also that if your origin protocol is HTTPS and you were not already whitelisting the Host header at CloudFront, then whitelisting all headers will potentially change the requirements for the origin's TLS certificate. Failure to handle this correctly is one of several reasons why CloudFront might return a 502 error to the viewer.
The layout has changed a bit since this question was asked and answered.
In the "behavior" settings, it is now necessary to select "Legacy cache settings" for these options to be visible. You can select "all" or a specific set of headers to forward. Below are a set of headers that allow websocket connections to work:

cloudfront fail to request objects in behavior

I have setup cloudfront, elb and my ec2 web server for default behavior (no caching), everything is working fine. There is only 1 origin (the elb) and the origin path is empty.
Now I want to cache static stuff with cloudfront from the web server (wildfly) like js/css, they're all served in /my-context/assets folder
So i add a new behavior with path pattern '/my-context/assets/*' and default cache settings using the same origin.
This is not working, my request login page return the page html itself, but all css/js are failed. Request to /my-context/assets/a/b/some.css return 502 with "CloudFront wasn't able to connect to the origin."
I also tried to setup a new origin (with the same elb) with path "/my-context/assets" for the new behavior, it also fail.
Can I have instruction on how to make this work? or is this actually not do-able?
Thank you!
The solution is to configure the cache behavior to forward (whitelist) the Host: header to the origin, from the incoming request.
This is not to imply that it's the "correct" configuration in every case, but many times it is desirable, or even required.
When CloudFront makes a back-end https connection to your origin server, the certificate offered by the server has to not only be valid (not expired, not self-signed, issued by a trusted CA, and with an intact intermediate chain) but also has to be valid for the request CloudFront will be sending.
For CloudFront to use HTTPS when communicating with your origin, one of the domain names in the certificate must match one or both of the following values:
• The value that you specified for Origin Domain Name for the applicable origin in your distribution.
• If you configured CloudFront to forward the Host header to your origin, the value of the Host header.
The SSL/TLS certificate on your origin includes a domain name in the Common Name field and possibly several more in the Subject Alternative Names field. (CloudFront supports wildcard characters in certificate domain names.) If your certificate doesn't contain any domain names that match either Origin Domain Name or the domain name in the Host header, CloudFront returns an HTTP status code 502 (Bad Gateway) to the viewer.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/SecureConnections.html#SecureConnectionsHowToRequireCustomProcedure
In your case, you originally were running requests through CloudFront with caching disabled, which is typically done by configuring CloudFront to forward all request headers to the origin, as this automatically disables caching of responses.
Later, when you attempted configure a second cache behavior so that objects matching certain path patterns could be cached, you naturally did not forward all headers to the origin -- but in this case, forwarding the Host: header (which CloudFront refers to as "whitelisting" the header for forwarding) was necessary, because CloudFront appeared to have needed that information in order to validate the certificate that the origin server was presenting.
If you don't forward the Host: header, the the certificate must match the Origin Domain Name, as noted above, and in your case, this us apparently not the case. If the Host: header is not whitelisted for forwarding, then CloudFront still sends a host header in the back-end request, but this header is set to the same value as Origin Domain Name, hence the reason the certificate must match that value.
If matching one way or the other were not required (along with all the other conditions CloudFront imposes on HTTPS connections to the origin), this would prevent CloudFront from determining with reasonable certainty that the back end connection was being handled by the intended server, and that the origin server is genuinely the server it claims to be, which is one of two protections provided by TLS/SSL (the other protection, of course, is the actual encryption of traffic).

Is there a security vulnerability to permit all CORS origins if the request origin matches a pattern?

We had a requirement to permit cross-origin requests provided that the origin was part of the corporate domain. While this question is not C# or Web API specific, the snippets below demonstrate our strategy.
We have a collection of regular expressions defining permitted origins:
private static readonly Regex[] AllowedOriginPatterns = new[]
{
new Regex(#"https?://\w[\w\-\.]*\.company.com",
RegexOptions.Compiled | RegexOptions.IgnoreCase)
};
Later, in our ICorsPolicyProvider attribute, we test the origin. If the origin matches any pattern, we add the origin to the set of allowed origins:
var requestOrigin = request.GetCorsRequestContext().Origin;
if (!string.IsNullOrWhiteSpace(requestOrigin) &&
AllowedOriginPatterns.Any(pattern => pattern.IsMatch(requestOrigin)))
{
policy.Origins.Add(requestOrigin);
}
The perk of this approach is that we can support a large and flexible set of authorized origins, and restrict unauthorized origins, all without placing hundreds of origins in our headers.
We have an issue with HTTP header rewriting on the part of our reverse proxy. Under some circumstances, the reverse proxy replaces the Access-Control-Allow-Origin value with the host. Seems fishy to me, but that is not my question.
It was proposed that we change our code so that if the origin meets our precondition that we return * instead of echoing the origin in the response.
My question is whether we open a potential security vulnerability by returning Access-Control-Allow-Origin: * to origins that match our preconditions. I would hope that browsers check the access control headers with every request. If so, I am not overly concerned. If not, I can imagine an attacker running an untrusted web site taking advantage of browsers that have previously received a * for a trusted origin.
I have read the OWASP article on scrutinizing CORS headers. The article does not reflect any of my concerns, which is reassuring; however, I wanted to inquire of the community before pressing forward.
Update
As it would turn out, this question is not valid. We discovered, after exploring this implementation, that allowing all origins for credentialed requests is disallowed. After troubleshooting our failed implementation, I found two small statements that make this quite clear.
From the W3C Recommendation, Section 6.1:
Note: The string "*" cannot be used for a resource that supports credentials.
From the Mozilla Developer Network:
Important note: when responding to a credentialed request, server must specify a domain, and cannot use wild carding.
This proved ultimately disappointing in our implementation; however, I imagine the complexity of adequately securing credentialed CORS requests would explain why this is a prohibited scenario.
I appreciate all assistance and consideration rendered.
My question is whether we open a potential security vulnerability by returning Access-Control-Allow-Origin: * to origins that match our preconditions.
Potentially, if the browser (or proxy server) caches the response. However, if you are not specifying Access-Control-Allow-Credentials then the risk is minimal.
To mitigate the risk of sending *, make sure you set the appropriate HTTP response headers to prevent caching:
Cache-Control: private, no-cache, no-store, max-age=0, no-transform
Pragma: no-cache
Expires: 0
Note that specifying private is not enough as an attacker could then target someone who has already visited your page and has it in the browser cache.
As an alternative, you might be able to use the Vary: Origin header. However, if some clients don't send the Origin header you might end up with * cached for all requests.
The best approach would be to fix the reverse proxy issue so that your page returns the appropriate Origin as per your code. However, if this is not feasible then * is fine, only if used with an aggressive anti-caching policy. Also, * for all requests may be fine if Access-Control-Allow-Credentials is false or missing - this depends whether your application holds sensitive user based state.

Is there any way to identify requests coming to custom origin server from CloudFront?

I'm using CloudFront with custom origin and want to redirect certain requests coming to a web app to CloudFront (clients use direct URLs, which cannot be changed to CloudFront-based URLs). In order to ensure that cache on CloudFront is updated properly, I must not redirect requests coming from CloudFront itself. Is there any way to identify such requests on origin server?
Does CloudFront add any custom headers to requests sent to origin server? Or is there any other reliable way to determine that requests is coming from CloudFront?
yes you can identify requests coming to your origin server from cloudfront by checking the useragent. the user agent would be 'Amazon CloudFront'
Update
It's an old question, but my update useful for someone research or looking for the new solution.
Recently AWS added new feature Origin Custom Headers.You can set a header with a secret value and check it on your origin server by the web server or your applications.
Update
Avinash Bijja correctly pointed out (+1) that the HTTP User-agent header would be 'Amazon CloudFront' for requests coming from Amazon CloudFront servers. Unfortunately this doesn't seem to be explicitly documented indeed, but is implicitly acknowledged by various posts in the respective forum, see e.g. the AWS Team response to User Agent String - does CF overwrite the user agent string?:
You are correct. The User-Agent field is always populated as "Amazon CloudFront".
However, it turns out this is not currently entirely reliable, insofar CloudFront sends an empty User-Agent to the origin if one is missing in the originating client request already:
I can confirm that CloudFront is not sending a User-Agent to the
origin when the original client does not send a User-Agent. We have
enhancements & fixes to User-Agent handling on our backlog, but no
release dates at this time. I've sent you a PM with further details.
These enhancements & fixes are apparently not rolled out still as of February 07 2013 at least.
These enhancements & fixes have been rolled out as of August 05 2013 (thanks webbiedave for the update!).
Initial Answer
Does CloudFront add any custom headers to requests sent to origin
server?
One would think so indeed, but at least they don't appear to be documented where I would have expected it, namely in How CloudFront Processes and Forwards Requests to Your Custom Origin Server. Given you are in control of the origin server, you might just check its HTTP access logs though?
Or is there any other reliable way to determine that requests is
coming from CloudFront?
You'll need to judge the reliability yourself, but The IP address that CloudFront forwards to the origin server is the IP addresses of a CloudFront server, not the IP address of the end user's computer. - consequently you could restrict access to the published Amazon CloudFront Public IP Ranges; however, be aware of the respective disclaimer:
The CloudFront IP addresses change frequently and we cannot guarantee
advance notice of changes. On a best-effort basis, we will provide the
list of current addresses. Customers should not use these addresses
for mission critical applications and must never hard code them in DNS
names. [emphasis mine]
Consequently you'll need to monitor this forum/post to take notice of respective changes as early as possible (if this constraint is acceptable for your use case in the first place of course).
CloudFront appears to add a X-Amz-Cf-Id header to every request before forwarding it to the origin. At least, it currently is doing that for me.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
This should probably be a comment on Reza's answer, but I can't do that :).
For completeness, here's the link to the official documentation regarding Forwarding Custom Headers, which currently claims the following.
You can configure CloudFront to include custom headers whenever it forwards a request to your origin. You can specify the names and values of custom headers for each origin, both for custom origins and for Amazon S3 buckets. Custom headers have a variety of uses, such as the following:
You can identify the requests that are forwarded to your custom origin by CloudFront. This is useful if you want to know whether users are bypassing CloudFront or if you're using more than one CDN and you want information about which requests are coming from each CDN. (If you're using an Amazon S3 origin and you enable Amazon S3 server access logging, the logs don't include header information.)

Is access-control-origin: * safe if session based auth is disallowed? [duplicate]

I recently had to set Access-Control-Allow-Origin to * in order to be able to make cross-subdomain AJAX calls. I feel like this might be a security problem. What risks am I exposing myself to if I keep the setting?
By responding with Access-Control-Allow-Origin: *, the requested resource allows sharing with every origin. This basically means that any site can send an XHR request to your site and access the server’s response which would not be the case if you hadn’t implemented this CORS response.
So any site can make a request to your site on behalf of their visitors and process its response. If you have something implemented like an authentication or authorization scheme that is based on something that is automatically provided by the browser (cookies, cookie-based sessions, etc.), the requests triggered by the third party sites will use them too.
This indeed poses a security risk, particularly if you allow resource sharing not just for selected resources but for every resource. In this context you should have a look at When is it safe to enable CORS?.
Update (2020-10-07)
Current Fetch Standard omits the credentials when credentials mode is set to include, if Access-Control-Allow-Origin is set to *.
Therefore, if you are using a cookie-based authentication, your credentials will not be sent on the request.
Access-Control-Allow-Origin: * is totally safe to add to any resource, unless that resource contains private data protected by something other than standard credentials. Standard credentials are cookies, HTTP basic auth, and TLS client certificates.
Eg: Data protected by cookies is safe
Imagine https://example.com/users-private-data, which may expose private data depending on the user's logged in state. This state uses a session cookie. It's safe to add Access-Control-Allow-Origin: * to this resource, as this header only allows access to the response if the request is made without cookies, and cookies are required to get the private data. As a result, no private data is leaked.
Eg: Data protected by location / ip / internal network is not safe (unfortunately common with intranets and home appliances):
Imagine https://intranet.example.com/company-private-data, which exposes private company data, but this can only be accessed if you're on the company's wifi network. It's not safe to add Access-Control-Allow-Origin: * to this resource, as it's protected using something other than standard credentials. Otherwise, a bad script could use you as a tunnel to the intranet.
Rule of thumb
Imagine what a user would see if they accessed the resource in an incognito window. If you're happy with everyone seeing this content (including the source code the browser received), it's safe to add Access-Control-Allow-Origin: *.
AFAIK, Access-Control-Allow-Origin is just a http header sent from the server to the browser. Limiting it to a specific address (or disabling it) does not make your site safer for, for example, robots. If robots want to, they can just ignore the header. The regular browsers out there (Explorer, Chrome, etc.) by default honor the header. But an application like Postman simply ignores it.
The server end doesn't actually check what the 'origin' is of the request when it returns the response. It just adds the http header. It's the browser (the client end) which sent the request that decides to read the access-control header and act upon it. Note that in the case of XHR it may use a special 'OPTIONS' request to ask for the headers first.
So, anyone with creative scripting abilities can easily ignore the whole header, whatever is set in it.
See also Possible security issues of setting Access-Control-Allow-Origin.
Now to actually answer the question
I can't help but feel that I'm putting my environment to security
risks.
If anyone wants to attack you, they can easily bypass the Access-Control-Allow-Origin. But by enabling '*' you do give the attacker a few more 'attack vectors' to play with, like, using regular webbrowsers that honor that HTTP header.
Here are 2 examples posted as comments, when a wildcard is really problematic:
Suppose I log into my bank's website. If I go to another page and then
go back to my bank, I'm still logged in because of a cookie. Other
users on the internet can hit the same URLs at my bank as I do, yet
they won't be able to access my account without the cookie. If
cross-origin requests are allowed, a malicious website can effectively
impersonate the user.
– Brad
Suppose you have a common home router, such as a Linksys WRT54g or
something. Suppose that router allows cross-origin requests. A script
on my web page could make HTTP requests to common router IP addresses
(like 192.168.1.1) and reconfigure your router to allow attacks. It
can even use your router directly as a DDoS node. (Most routers have
test pages which allow for pings or simple HTTP server checks. These
can be abused en masse.)
– Brad
I feel that these comments should have been answers, because they explain the problem with a real life example.
This answer was originally written as a reply to What are the security implications of setting Access-Control-Allow-Headers: *, if any? and was merged despite being irrelevant to this question.
To set it to a wildcard *, means to allow all headers apart from safelisted ones, and remove restrictions that keeps them safe.
These are the restrictions for the 4 safelisted headers to be considered safe:
For Accept-Language and Content-Language: can only have values consisting of 0-9, A-Z, a-z, space or *,-.;=.
For Accept and Content-Type: can't contain a CORS-unsafe request header byte: 0x00-0x1F (except for 0x09 (HT), which is allowed), "():<>?#[\]{}, and 0x7F (DEL).
For Content-Type: needs to have a MIME type of its parsed value (ignoring parameters) of either application/x-www-form-urlencoded, multipart/form-data, or text/plain.
For any header: the value’s length can't be greater than 128.
For simplicity's sake, I'll base my answer on these headers.
Depending on server implementation, simply removing these limitations can be very dangerous (to the user).
For example, this outdated wordpress plugin has a reflected XSS vulnerability where the value of Accept-Language was parsed and rendered on the page as-is, causing script execution on the user's browser should a malicious payload be included in the value.
With the wildcard header Access-Control-Allow-Headers: *, a third party site redirecting to your site could set the value of the header to Accept Language: <script src="https://example.com/malicious-script.js"></script>, given that the wildcard removes the restriction in Point 1 above.
The preflight response would then give the greenlight to this request, and the user will be redirected to your site, triggering an XSS on their browser, which impact can range from an annoying popup to losing control of their account through cookie hijacking.
Thus, I would strongly recommend against setting a wildcard unless it is for an API endpoint where nothing is being rendered on the page.
You can set Access-Control-Allow-Headers: Pragma as an alternative solution to your problem.
Note that the value * only counts as a special wildcard value for requests without credentials (requests without HTTP cookies or HTTP authentication information), otherwise it will be read as a literal header. Documentation
In scenario where server attempts to disable the CORS completely by setting below headers.
Access-Control-Allow-Origin: * (tells the browser that server accepts
cross site requests from any ORIGIN)
Access-Control-Allow-Credentials: true (tells the browser that cross
site requests can send cookies)
There is a fail safe implemented in browsers that will result in below error
"Credential is not supported if the CORS header ‘Access-Control-Allow-Origin’ is ‘*’"
So in most scenarios setting ‘Access-Control-Allow-Origin’ to * will not be a problem. However to secure against attacks, the server can maintain a list of allowed origins and whenever server gets a cross origin request, it can validate the ORIGIN header against the list of allowed origins and then echo back the same in Access-Control-Allow-Origin header.
Since ORIGIN header can't be changed by javascript running on the browser, the malicious site will not be able to spoof it.

Resources