CloudFront fails to request objects in behavior - amazon-cloudfront

I have set up CloudFront, an ELB, and my EC2 web server with the default behavior (no caching), and everything is working fine. There is only 1 origin (the ELB) and the origin path is empty.
Now I want CloudFront to cache static stuff from the web server (WildFly) like JS/CSS. These files are all served from the /my-context/assets folder.
So I added a new behavior with the path pattern '/my-context/assets/*' and default cache settings, using the same origin.
This is not working: a request for my login page returns the page HTML itself, but all CSS/JS requests fail. A request to /my-context/assets/a/b/some.css returns 502 with "CloudFront wasn't able to connect to the origin."
I also tried to set up a new origin (with the same ELB) with the path "/my-context/assets" for the new behavior, and it fails too.
Can I have instructions on how to make this work? Or is this actually not doable?
Thank you!

The solution is to configure the cache behavior to forward (whitelist) the Host: header to the origin, from the incoming request.
This is not to imply that it's the "correct" configuration in every case, but many times it is desirable, or even required.
When CloudFront makes a back-end https connection to your origin server, the certificate offered by the server has to not only be valid (not expired, not self-signed, issued by a trusted CA, and with an intact intermediate chain) but also has to be valid for the request CloudFront will be sending.
For CloudFront to use HTTPS when communicating with your origin, one of the domain names in the certificate must match one or both of the following values:
• The value that you specified for Origin Domain Name for the applicable origin in your distribution.
• If you configured CloudFront to forward the Host header to your origin, the value of the Host header.
The SSL/TLS certificate on your origin includes a domain name in the Common Name field and possibly several more in the Subject Alternative Names field. (CloudFront supports wildcard characters in certificate domain names.) If your certificate doesn't contain any domain names that match either Origin Domain Name or the domain name in the Host header, CloudFront returns an HTTP status code 502 (Bad Gateway) to the viewer.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/SecureConnections.html#SecureConnectionsHowToRequireCustomProcedure
In your case, you originally were running requests through CloudFront with caching disabled, which is typically done by configuring CloudFront to forward all request headers to the origin, as this automatically disables caching of responses.
Later, when you attempted to configure a second cache behavior so that objects matching certain path patterns could be cached, you naturally did not forward all headers to the origin -- but in this case, forwarding the Host: header (which CloudFront refers to as "whitelisting" the header for forwarding) was necessary, because CloudFront apparently needed that information in order to validate the certificate that the origin server was presenting.
If you don't forward the Host: header, then the certificate must match the Origin Domain Name, as noted above, and in your case, it apparently does not. If the Host: header is not whitelisted for forwarding, then CloudFront still sends a Host header in the back-end request, but this header is set to the same value as Origin Domain Name, hence the reason the certificate must match that value.
If matching one way or the other were not required (along with all the other conditions CloudFront imposes on HTTPS connections to the origin), this would prevent CloudFront from determining with reasonable certainty that the back end connection was being handled by the intended server, and that the origin server is genuinely the server it claims to be, which is one of two protections provided by TLS/SSL (the other protection, of course, is the actual encryption of traffic).
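The matching rule quoted above can be sketched in a few lines of Python. This is only a simplified model of the documented rule, not CloudFront's actual validation code; the function names, the certificate names, and the ELB hostname are all made up for illustration, and the wildcard handling is reduced to the common "one left-most label" case.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Match one certificate name against a hostname, case-insensitively.

    A leading "*." covers exactly one left-most label. Real certificate
    validation has more rules; this is a deliberate simplification.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        head, _, tail = hostname.partition(".")
        return bool(head) and tail == pattern[2:]
    return pattern == hostname


def origin_cert_acceptable(cert_names, origin_domain_name, forwarded_host=None):
    """True if any certificate name matches a value the rule allows.

    forwarded_host is None when the Host header is not whitelisted, in
    which case only the Origin Domain Name can satisfy the rule.
    """
    allowed = [origin_domain_name]
    if forwarded_host is not None:
        allowed.append(forwarded_host)
    return any(name_matches(n, a) for n in cert_names for a in allowed)


# Hypothetical setup: the ELB presents a certificate for the site's own
# domain, while the origin is configured by the ELB's hostname.
cert = ["www.example.com", "*.example.com"]
elb = "my-elb-123.us-east-1.elb.amazonaws.com"

print(origin_cert_acceptable(cert, elb))                     # Host not forwarded
print(origin_cert_acceptable(cert, elb, "www.example.com"))  # Host whitelisted
```

Under these assumptions the first call fails the check (the situation that would produce the 502) and the second passes, which mirrors why whitelisting Host resolved the problem.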

Related

Could automatically redirecting all HTTP traffic to HTTPS inadvertently encourage a man-in-the-middle attack vector?

If your web server implements HTTPs, it's common practice to 301 redirect all http://* URLs to their https:// equivalents.
However, it occurs to me that this means that the client's original HTTP request (and any data contained in it) remains fully unencrypted, and only the response is encrypted. Does automatically "upgrading" all insecure requests on the server end effectively encourage clients to continue sending data to insecure HTTP endpoints, more or less downgrade attacking myself?
I realize I can't stop a client from insecurely sending any data to any endpoint, but does the practice of automatically redirecting HTTP to HTTPS "condone" the client doing so? Would it be better practice to instead outright reject all HTTP traffic that could contain sensitive data and make it the browser's responsibility to attempt or recommend the upgrade to HTTPS?
This is indeed a known issue, and HTTP Strict Transport Security (HSTS)—released in 2012—aims to solve it. It is an HTTP header field which takes the form:
Strict-Transport-Security: max-age=<seconds> [; includeSubDomains]
HSTS informs the browser that all connections to a given domain must be "upgraded" to https, even if they were specified as non-secure http:
The UA MUST replace the URI scheme with "https"
This applies to all future connections to the domain (including following links), for the duration of the max-age specified in the header.
However this does leave open a potential vulnerability on the user's first visit to a domain with HSTS (if the header were stripped by an attacker). Google Chrome, Mozilla Firefox, Internet Explorer and Microsoft Edge attempt to limit this problem by including a "pre-loaded" list of HSTS sites.
This preloaded list includes many popular websites. You can see it in Chromium; the list is humongous (10M), so it solves the aforementioned problem to a certain extent.
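To make the header format concrete, here is a minimal Python sketch that parses a Strict-Transport-Security value into its two pieces. The function name parse_hsts is hypothetical, and the parser ignores details a real user agent must handle (unknown directives, duplicate directives, and so on).

```python
def parse_hsts(value: str):
    """Parse a Strict-Transport-Security value.

    Returns (max_age_seconds, include_subdomains), or None when max-age
    is missing or is not a non-negative integer.
    """
    max_age = None
    include_subdomains = False
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        arg = arg.strip().strip('"')         # the value may be quoted
        if name == "max-age" and arg.isascii() and arg.isdigit():
            max_age = int(arg)
        elif name == "includesubdomains":
            include_subdomains = True
    if max_age is None:
        return None
    return max_age, include_subdomains


print(parse_hsts("max-age=31536000; includeSubDomains"))
```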

Why does RFC 6797 forbid sending of the Strict-Transport-Security header over plain HTTP responses?

When reading the spec for HSTS (Strict-Transport-Security), I see an injunction in section 7.2 against sending the header when accessed over http instead of https:
An HSTS Host MUST NOT include the STS header field in HTTP responses
conveyed over non-secure transport.
Why is this? What are the risks if this is violated?
The danger is to the availability of the website itself. If the website is able to respond (either now or in the future) over HTTP but not over HTTPS, it will semi-permanently prevent browsers from accessing the site:
Browser: "I want http://example.com"
ExampleCom: "You should go to the https:// URL now and for the next 3 months!"
Browser: "I want https://example.com"
ExampleCom: [nothing]
By only serving the STS header over HTTPS connections, the site guarantees that at least right now it is not pointing browsers to an inaccessible site. Of course, if the max-age is set to 3 months and the HTTPS site breaks tomorrow, the effect is the same. This is merely an incremental protection.
If your server cannot positively tell from request characteristics whether it is being accessed over HTTP vs. HTTPS, but you believe you have set up your website to only be accessible over HTTPS anyhow (e.g. due to SSL/TLS termination in an nginx proxy), it should be safe to serve the header all the time. But if you want to serve both, e.g. if you wish to serve HTTP->HTTPS redirects from your server, find out how your proxy tells you about the connection and start gating the STS header response on that.
(Thanks to Deirdre Connolly for this explanation!)
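As a rough illustration of gating the header on the connection type, the Python sketch below assumes the TLS-terminating proxy sets X-Forwarded-Proto on each request. That header is a common convention but still an assumption about your proxy; the function name, the fallback host, and the max-age value are made up for the example.

```python
HSTS_VALUE = "max-age=7776000"  # 90 days, roughly the "3 months" in the dialogue


def response_for(request_headers: dict):
    """Return (status, headers) for a request, gating STS on the scheme.

    Assumes header names arrive exactly as spelled here and that the
    proxy reports the original scheme in X-Forwarded-Proto.
    """
    proto = request_headers.get("X-Forwarded-Proto", "").strip().lower()
    if proto == "https":
        # Secure transport: safe to tell the browser to stick to HTTPS.
        return "200 OK", {"Strict-Transport-Security": HSTS_VALUE}
    # Plain HTTP: redirect only, and do not send the STS header.
    host = request_headers.get("Host", "example.com")
    return "301 Moved Permanently", {"Location": f"https://{host}/"}


print(response_for({"Host": "example.com", "X-Forwarded-Proto": "http"}))
print(response_for({"Host": "example.com", "X-Forwarded-Proto": "https"}))
```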
Not sure if you have a specific issue you are trying to solve, or are only asking out of curiosity, but this might be better asked on http://security.stackexchange.com
Like you I can't see the threat from the server sending this over HTTP. It doesn't really make sense, but I'm not sure if there is a risk to be honest. Except to say if you can't set up the header properly then perhaps you're not ready to implement HSTS as it can be dangerous if misconfigured!
The far bigger danger is if a browser was to process a HSTS header received over HTTP, which section 8.1 explicitly states it MUST ignore:
If an HTTP response is received over insecure transport, the UA MUST
ignore any present STS header field(s).
The risk here is that a malicious attacker (or an accidentally misconfigured header) could take an HTTP-only website offline (or the HTTP-only parts of a mixed site) if a browser incorrectly processed it. This would effectively cause a DoS for the affected user(s) until either the header expires or the site implements HTTPS.
Additionally, if a browser did accept this header over HTTP rather than HTTPS, it could be possible for a MITM attacker to expire the header by setting it to a max-age of 0. For example, if you have a long HSTS header set on https://www.example.com but an attacker was able to publish a max-age=0 header with includeSubDomains over http://example.com, and the browser incorrectly processed that, then it could effectively remove the protection HTTPS gives to your www site.
For these reasons it's very important that clients (i.e. web browsers) implement this correctly, ignoring the HSTS header if served over HTTP and only processing it over HTTPS. This could be another reason the RFC states servers must not send this over HTTP, just in case a browser implements this wrong. But, to be honest, if that happens then that browser is putting all HTTP-only websites at risk, as a MITM attacker could add it as per above.
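The client-side rule can be illustrated with a toy model of a browser's HSTS list. The Python sketch below is not how any real browser stores this data; the class and method names are invented, and the header is assumed to be already parsed into max_age and include_subdomains.

```python
import time


class HstsStore:
    """Toy model of a user agent's list of known HSTS hosts."""

    def __init__(self):
        self._hosts = {}  # host -> (expires_at, include_subdomains)

    def note_response(self, host, secure, max_age, include_subdomains=False, now=None):
        """Record an STS header seen on a response from host."""
        if not secure:
            return  # RFC 6797 section 8.1: ignore STS over insecure transport
        now = time.time() if now is None else now
        host = host.lower()
        if max_age == 0:
            self._hosts.pop(host, None)  # max-age=0 tells the UA to forget the host
        else:
            self._hosts[host] = (now + max_age, include_subdomains)

    def must_use_https(self, host, now=None):
        """True if host, or a parent with includeSubDomains, is a live entry."""
        now = time.time() if now is None else now
        labels = host.lower().split(".")
        for i in range(len(labels)):
            entry = self._hosts.get(".".join(labels[i:]))
            # i == 0 is an exact match; parents count only with includeSubDomains
            if entry and entry[0] > now and (i == 0 or entry[1]):
                return True
        return False


store = HstsStore()
store.note_response("www.example.com", secure=True, max_age=31536000, now=0)
# A max-age=0 header injected over plain HTTP is ignored, so protection stays.
store.note_response("www.example.com", secure=False, max_age=0, now=10)
print(store.must_use_https("www.example.com", now=20))
# An HTTP-only site cannot be forced onto HTTPS by an injected header either.
store.note_response("plain.example.org", secure=False, max_age=31536000, now=10)
print(store.must_use_https("plain.example.org", now=20))
```

Dropping the `if not secure` check reproduces both attacks described above: the first injected header would erase the entry, and the second would lock users out of the HTTP-only site.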

Disable Serving from Default Cloudfront Hostname (ourdistid.cloudfront.net)

I've set up an alternate domain name for our CloudFront distribution so we can serve from oursite.com. We'd like to disable ourdistid.cloudfront.net so our site is only accessible from one hostname. Is this possible?
Yes, you can do this, though perhaps not in the place where you might expect to.
By default, CloudFront sets the Host: header in the request sent to the origin server to have the value of the origin server hostname.
However, you can configure CloudFront to forward the original request's host header to the origin server, instead. It doesn't change how the request is routed, only the header that gets forwarded.
After that, it is a simple matter to configure your web server to return the response you want when the request's Host: header matches the *.cloudfront.net host. This can be a generic error page with whatever code you deem most appropriate, such as 503 Service Unavailable, 404 Not Found, 403 Forbidden, or 410 Gone. You could even use 301 Moved Permanently. Whatever makes the most sense to you.
You can't literally disable the assigned endpoint, but you can prevent it from returning any of your content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
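If the origin happens to be a Python WSGI application, the host check could look something like the sketch below. It assumes CloudFront is already configured to forward the Host header as described above; the function names and the choice of 403 are illustrative only, and most web servers can express the same rule in their own configuration instead.

```python
def reject_cloudfront_hostname(app, status="403 Forbidden"):
    """Wrap a WSGI app so requests for *.cloudfront.net get an error page."""

    def middleware(environ, start_response):
        # Drop any ":port" suffix before comparing (IPv6 literals not handled).
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host.endswith(".cloudfront.net"):
            body = b"Not available on this hostname.\n"
            start_response(status, [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


site = reject_cloudfront_hostname(hello_app)
```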

Amazon Cloudfront removes Referer header

I am using Amazon CloudFront to deliver some HDS files. I have an origin server which checks the HTTP Referer header and blocks the request if the referer is not allowed.
The problem is that CloudFront is removing the Referer header, so it is not forwarded to the origin.
Is it possible to tell Amazon not to do it?
Within days of writing the answer below, changes have been announced to Cloudfront. Cloudfront will now pass through headers you select and can add some headers of its own.
However, much of what I stated below remains true. Note that in the announcement, an option is offered to forward all headers which, as I suggested, would effectively disable caching. There's also an option to forward specific headers, which will cause Cloudfront to cache the object against the complete set of forwarded headers -- not just the uri -- meaning that the effectiveness of the cache is somewhat reduced, since Cloudfront has no option but to assume that the inclusion of the header might modify the response the server will generate for that request.
Each of your CloudFront distributions now contains a list of headers that are to be forwarded to the origin server. You have three options:
None - This option requests the original behavior.
All - This option forwards all headers and effectively disables all caching at the edge.
Whitelist - This option gives you full control of the headers that are to be forwarded. The list starts out empty, and grows as you add more headers. You can add common HTTP headers by choosing them from a list. You can also add "custom" headers by simply entering the name.
If you choose the Whitelist option, each header that you add to the list becomes part of the cache key for the URLs associated with the distribution. Adding a header to the list simply tells CloudFront that the value of the header can affect the content returned by the origin server.
http://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/
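The effect of whitelisting on the cache key can be modelled in a few lines of Python. This is a conceptual sketch only, not CloudFront's real key format; the function name and the sample URI and referers are invented.

```python
def cache_key(uri, request_headers, whitelist):
    """Build a cache key from the URI plus the whitelisted header values."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    forwarded = tuple(
        (name.lower(), lowered.get(name.lower()))
        for name in sorted(whitelist, key=str.lower)
    )
    return (uri, forwarded)


req_a = {"Referer": "https://site-a.example/page"}
req_b = {"Referer": "https://site-b.example/page"}

# Nothing whitelisted: both requests share one cached object.
print(cache_key("/video/seg1.f4f", req_a, []) == cache_key("/video/seg1.f4f", req_b, []))
# Referer whitelisted: every distinct referer gets its own cache entry.
print(cache_key("/video/seg1.f4f", req_a, ["Referer"]) == cache_key("/video/seg1.f4f", req_b, ["Referer"]))
```

The second comparison is why forwarding a high-cardinality header such as Referer or User-Agent tends to lower the cache hit ratio.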
Cloudfront does remove the Referer header along with several others that are not particularly meaningful -- or whose presence would cause illogical consequences -- in the world of cached content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
Just like cookies, if the Referer: header were allowed to remain, such that the origin could see it and react to it, that would imply that the object should be cached based on the request plus the referring page, which would seem to largely defeat the cachability of objects. Otherwise, if the origin did react to an undesired referer and send no-cache responses, that would be all well and good until the first legitimate request came in, the response to which would be served to subsequent requesters regardless of their referer, also largely defeating the purpose.
RFC-2616 Section 13 requires that a cache return a response that has been "checked for equivalence with what the origin server would have returned," and this implies that the response be valid based on all headers in the request.
The same thing goes for User-agent and other headers an origin server might use to modify its response... if you need to react to these values at the origin, there's little obvious purpose for serving them with a CDN.
Referring page-based tests are quite a primitive measure, the way many people use them, since headers are so trivial to forge.
If you are dealing with a platform that you don't control, and this is something you need to override (with a dummy value, just to keep the existing system "happy,") then a reverse proxy in front of the origin server could serve such a purpose, with Cloudfront using the reverse proxy as its origin.
In today's newsletter Amazon announced that it is now possible to forward request headers with CloudFront. See: http://aws.amazon.com/de/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/

Is there any way to identify requests coming to custom origin server from CloudFront?

I'm using CloudFront with custom origin and want to redirect certain requests coming to a web app to CloudFront (clients use direct URLs, which cannot be changed to CloudFront-based URLs). In order to ensure that cache on CloudFront is updated properly, I must not redirect requests coming from CloudFront itself. Is there any way to identify such requests on origin server?
Does CloudFront add any custom headers to requests sent to the origin server? Or is there any other reliable way to determine that a request is coming from CloudFront?
Yes, you can identify requests coming to your origin server from CloudFront by checking the user agent. The user agent would be 'Amazon CloudFront'.
Update
It's an old question, but this update may be useful for someone researching or looking for a newer solution.
AWS recently added a new feature, Origin Custom Headers. You can set a header with a secret value and check it on your origin server, either in the web server or in your application.
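On the origin side, the check might look like the Python sketch below. The header name X-Origin-Verify and the secret value are placeholders chosen for this example; use whatever name and value you configured on the CloudFront origin, and keep the secret out of source control.

```python
import hmac

SECRET_HEADER = "X-Origin-Verify"                   # hypothetical header name
SECRET_VALUE = "replace-with-a-long-random-value"   # placeholder secret


def came_through_cloudfront(request_headers: dict) -> bool:
    """True if the request carries the shared secret header.

    Assumes header names arrive spelled exactly as SECRET_HEADER.
    compare_digest avoids leaking the secret through timing differences.
    """
    supplied = request_headers.get(SECRET_HEADER, "")
    return hmac.compare_digest(supplied.encode(), SECRET_VALUE.encode())


print(came_through_cloudfront({SECRET_HEADER: SECRET_VALUE}))
print(came_through_cloudfront({"User-Agent": "curl/8.0"}))
```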
Update
Avinash Bijja correctly pointed out (+1) that the HTTP User-agent header would be 'Amazon CloudFront' for requests coming from Amazon CloudFront servers. Unfortunately this doesn't seem to be explicitly documented indeed, but is implicitly acknowledged by various posts in the respective forum, see e.g. the AWS Team response to User Agent String - does CF overwrite the user agent string?:
You are correct. The User-Agent field is always populated as "Amazon CloudFront".
However, it turns out this is not currently entirely reliable, insofar as CloudFront sends an empty User-Agent to the origin if the originating client request lacks one:
I can confirm that CloudFront is not sending a User-Agent to the
origin when the original client does not send a User-Agent. We have
enhancements & fixes to User-Agent handling on our backlog, but no
release dates at this time. I've sent you a PM with further details.
These enhancements & fixes had apparently still not been rolled out as of February 07 2013, but they have been rolled out as of August 05 2013 (thanks webbiedave for the update!).
Initial Answer
Does CloudFront add any custom headers to requests sent to origin
server?
One would think so indeed, but at least they don't appear to be documented where I would have expected it, namely in How CloudFront Processes and Forwards Requests to Your Custom Origin Server. Given you are in control of the origin server, you might just check its HTTP access logs though?
Or is there any other reliable way to determine that a request is
coming from CloudFront?
You'll need to judge the reliability yourself, but "the IP address that CloudFront forwards to the origin server is the IP address of a CloudFront server, not the IP address of the end user's computer." Consequently, you could restrict access to the published Amazon CloudFront Public IP Ranges; however, be aware of the respective disclaimer:
The CloudFront IP addresses change frequently and we cannot guarantee
advance notice of changes. On a best-effort basis, we will provide the
list of current addresses. Customers should not use these addresses
for mission critical applications and must never hard code them in DNS
names. [emphasis mine]
Consequently you'll need to monitor this forum/post to take notice of respective changes as early as possible (if this constraint is acceptable for your use case in the first place of course).
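A range check of this kind could be sketched in Python as follows. The CIDR blocks below are documentation-only placeholder ranges, not real CloudFront addresses; you would have to load the currently published list yourself and refresh it regularly, given the disclaimer above.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real CloudFront IPs.
CLOUDFRONT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def from_listed_range(remote_addr: str) -> bool:
    """True if the connecting address falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in CLOUDFRONT_RANGES)


print(from_listed_range("192.0.2.17"))
print(from_listed_range("203.0.113.5"))
```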
CloudFront appears to add an X-Amz-Cf-Id header to every request before forwarding it to the origin. At least, it currently is doing that for me.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
This should probably be a comment on Reza's answer, but I can't do that :).
For completeness, here's the link to the official documentation regarding Forwarding Custom Headers, which currently claims the following.
You can configure CloudFront to include custom headers whenever it forwards a request to your origin. You can specify the names and values of custom headers for each origin, both for custom origins and for Amazon S3 buckets. Custom headers have a variety of uses, such as the following:
You can identify the requests that are forwarded to your custom origin by CloudFront. This is useful if you want to know whether users are bypassing CloudFront or if you're using more than one CDN and you want information about which requests are coming from each CDN. (If you're using an Amazon S3 origin and you enable Amazon S3 server access logging, the logs don't include header information.)

Resources