Is it possible to use AWS Cloudfront to proxy Google Fonts? - amazon-cloudfront

I have created an AWS CloudFront distribution in an attempt to proxy requests to fonts.googleapis.com through CloudFront. So for example, I'd like to use something like this:
https://xxxxxx.cloudfront.net/css2?family=Noto+Sans+HK:wght@400;500;700;900&display=swap
To fetch the actual content from the origin at:
https://fonts.googleapis.com/css2?family=Noto+Sans+HK:wght@400;500;700;900&display=swap
I have configured Cloudfront with an origin of "fonts.googleapis.com" and set it so that it passes through all URL parameters, but still the origin responds with:
404. That’s an error.
The requested URL /css2 was not found on this server.
Does anyone know what could be causing this? Afaik, the way I've configured Cloudfront should act like a transparent pass-through.
I can't share all of the Cloudfront config settings here (there are too many), but perhaps someone can point me in the right direction?
Or is this impossible?

This did in fact work fine. I had just set up the CloudFront distribution incorrectly.

I suspect the change the OP referred to was updating the behavior so that the viewer's Host header is not forwarded in the origin request. Create a cache policy that does not include the Host header, so that the origin request carries the origin's own hostname (fonts.googleapis.com) rather than the *.cloudfront.net name, and it should work.
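As a rough illustration of why forwarding the Host header produces the 404, here is a minimal model of which Host value the origin ends up seeing. The function name and its parameters are hypothetical; it models only the behavior described in these answers, not CloudFront's actual implementation.

```python
def origin_host_header(viewer_host: str, origin_domain: str, forward_host: bool) -> str:
    """Return the Host header value the origin would receive (simplified model)."""
    # If the viewer's Host header is forwarded, the origin sees the hostname the
    # client used (here the *.cloudfront.net name). Otherwise the Host header is
    # set to the configured origin domain name.
    return viewer_host if forward_host else origin_domain


# Forwarded: fonts.googleapis.com is asked for a host it does not serve.
print(origin_host_header("xxxxxx.cloudfront.net", "fonts.googleapis.com", forward_host=True))
# Not forwarded: the origin sees its own hostname.
print(origin_host_header("xxxxxx.cloudfront.net", "fonts.googleapis.com", forward_host=False))
```

In the first case the origin is asked for /css2 on a hostname it does not recognize, which is consistent with the 404 in the question.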

Related

CloudFront - How to forward all request headers to the origin

In the CloudFront behavior settings, is "All" the option that forwards all request headers to the origin?
Values That You Specify When You Create or Update a Distribution
If you configure CloudFront to forward all headers to your origin for a cache behavior, CloudFront never caches the associated objects. Instead, CloudFront forwards all requests for those objects to the origin. In that configuration, the value of Minimum TTL must be 0.
Yes, it is.
The documentation seems to focus more on caching based on headers and less on what's forwarded, but caching on headers and forwarding headers to the origin go hand-in-hand.
As I was looking for clear citations from the documentation, one reference I found in the Amazon CloudFront Developer Guide is the one shown below. It's a link to a section titled "Cache Based on Selected Request Headers" but its anchor tag is DownloadDistValuesForwardHeaders.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders
This suggests that someone has tried to clarify or simplify the documentation... with apparently limited success.
Note that this forwards almost all headers to the origin, except for some that are still stripped for security and/or operational reasons, like X-Forwarded-Proto, X-Real-IP, and X-Edge-*.
Note also that if your origin protocol is HTTPS and you were not already whitelisting the Host header at CloudFront, then whitelisting all headers will potentially change the requirements for the origin's TLS certificate. Failure to handle this correctly is one of several reasons why CloudFront might return a 502 error to the viewer.
The layout has changed a bit since this question was asked and answered.
In the "behavior" settings, it is now necessary to select "Legacy cache settings" for these options to be visible. You can select "all" or a specific set of headers to forward. Below are a set of headers that allow websocket connections to work:

cloudfront fail to request objects in behavior

I have set up CloudFront, an ELB, and my EC2 web server with the default behavior (no caching), and everything is working fine. There is only one origin (the ELB), and the origin path is empty.
Now I want CloudFront to cache static content from the web server (WildFly), such as JS/CSS, all of which is served from the /my-context/assets folder.
So I added a new behavior with path pattern '/my-context/assets/*' and default cache settings, using the same origin.
This is not working: a request for the login page returns the page HTML itself, but all CSS/JS requests fail. A request to /my-context/assets/a/b/some.css returns 502 with "CloudFront wasn't able to connect to the origin."
I also tried setting up a new origin (with the same ELB) with path "/my-context/assets" for the new behavior; it also fails.
Can I get instructions on how to make this work? Or is this actually not doable?
Thank you!
The solution is to configure the cache behavior to forward (whitelist) the Host: header to the origin, from the incoming request.
This is not to imply that it's the "correct" configuration in every case, but many times it is desirable, or even required.
When CloudFront makes a back-end https connection to your origin server, the certificate offered by the server has to not only be valid (not expired, not self-signed, issued by a trusted CA, and with an intact intermediate chain) but also has to be valid for the request CloudFront will be sending.
For CloudFront to use HTTPS when communicating with your origin, one of the domain names in the certificate must match one or both of the following values:
• The value that you specified for Origin Domain Name for the applicable origin in your distribution.
• If you configured CloudFront to forward the Host header to your origin, the value of the Host header.
The SSL/TLS certificate on your origin includes a domain name in the Common Name field and possibly several more in the Subject Alternative Names field. (CloudFront supports wildcard characters in certificate domain names.) If your certificate doesn't contain any domain names that match either Origin Domain Name or the domain name in the Host header, CloudFront returns an HTTP status code 502 (Bad Gateway) to the viewer.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/SecureConnections.html#SecureConnectionsHowToRequireCustomProcedure
In your case, you originally were running requests through CloudFront with caching disabled, which is typically done by configuring CloudFront to forward all request headers to the origin, as this automatically disables caching of responses.
Later, when you attempted to configure a second cache behavior so that objects matching certain path patterns could be cached, you naturally did not forward all headers to the origin -- but in this case, forwarding the Host: header (which CloudFront refers to as "whitelisting" the header for forwarding) was necessary, because CloudFront apparently needed that information in order to validate the certificate that the origin server was presenting.
If you don't forward the Host: header, then the certificate must match the Origin Domain Name, as noted above, and in your case it apparently does not. If the Host: header is not whitelisted for forwarding, then CloudFront still sends a Host header in the back-end request, but this header is set to the same value as Origin Domain Name, which is why the certificate must match that value.
If matching one way or the other were not required (along with all the other conditions CloudFront imposes on HTTPS connections to the origin), this would prevent CloudFront from determining with reasonable certainty that the back end connection was being handled by the intended server, and that the origin server is genuinely the server it claims to be, which is one of two protections provided by TLS/SSL (the other protection, of course, is the actual encryption of traffic).
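The matching rule quoted above can be sketched in a few lines. This is a simplified model under stated assumptions, not CloudFront's actual validation logic: the function names are hypothetical, the wildcard handling covers only a single leftmost label, and the hostnames are made-up examples.

```python
from typing import Iterable, Optional


def name_matches(pattern: str, hostname: str) -> bool:
    """Compare one certificate name against a hostname (simplified wildcard rule)."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # Assumption: "*" stands for exactly one leftmost label.
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and "." + tail == pattern[1:]
    return pattern == hostname


def certificate_acceptable(cert_names: Iterable[str],
                           origin_domain: str,
                           forwarded_host: Optional[str] = None) -> bool:
    """True if any certificate name matches the Origin Domain Name or,
    when the Host header is forwarded, the value of that header."""
    candidates = [origin_domain]
    if forwarded_host is not None:
        candidates.append(forwarded_host)
    return any(name_matches(name, host) for name in cert_names for host in candidates)


# Hypothetical setup: the certificate is issued for the site's name, not the ELB's.
cert = ["www.example.com"]
elb = "my-elb-123.us-east-1.elb.amazonaws.com"
print(certificate_acceptable(cert, elb))                     # Host not forwarded
print(certificate_acceptable(cert, elb, "www.example.com"))  # Host forwarded
```

With these example values, the first call corresponds to the failing cache behavior (the certificate does not cover the ELB hostname) and the second to the behavior after whitelisting Host.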

Disable Serving from Default Cloudfront Hostname (ourdistid.cloudfront.net)

I've set up an alternate domain name for our CloudFront distribution so we can serve from oursite.com. We'd like to disable ourdistid.cloudfront.net so our site is only accessible from one hostname. Is this possible?
Yes, you can do this, though perhaps not in the place where you might expect to.
By default, CloudFront sets the Host: header in the request sent to the origin server to have the value of the origin server hostname.
However, you can configure CloudFront to forward the original request's host header to the origin server, instead. It doesn't change how the request is routed, only the header that gets forwarded.
After that, it is a simple matter to configure your web server to return the response you want when the request's Host: header matches the *.cloudfront.net host. That response can be a generic error page with whatever code you deem most appropriate, such as 503 Service Unavailable, 404 Not Found, 403 Forbidden, or 410 Gone. You could even use 301 Moved Permanently. Whatever makes the most sense to you.
You can't literally disable the assigned endpoint, but you can prevent it from returning any of your content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
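One way the origin-side check described in this answer might look, sketched as a small WSGI middleware. It assumes CloudFront has been configured to forward the viewer's Host header; the names block_default_hostname and site are hypothetical, and 403 is just one of the status codes the answer suggests.

```python
def block_default_hostname(app):
    """WSGI middleware: refuse requests whose Host is a *.cloudfront.net name."""
    def middleware(environ, start_response):
        # Drop any port suffix and compare case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host.endswith(".cloudfront.net"):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real application."""
    body = b"Hello from oursite.com"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_default_hostname(site)
```

The same check could equally be expressed as a virtual-host rule in the web server itself; the middleware form is used here only to keep the sketch self-contained.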

Cloudfront origin pull from a folder

I have set up a CloudFront origin pull server. It allows me to set a domain name, which I have done. This works.
But I don't want the whole domain to be the origin. I want
mydomain.com/folder/subfolder
to be the origin. Also, the CloudFront distribution has a CNAME (cdn.mydomain.com), which is set up via DNS to point to CloudFront. This seems to work.
So, basically, instead of this URL:
xyz.cloudfront.net/folder/subfolder/1.jpg
I want this instead:
cdn.mydomain.com/1.jpg
Currently I have achieved, via CNAME and origin pull:
cdn.mydomain.com/folder/subfolder/1.jpg
The question is: on CloudFront, how do I set up an origin pull from a folder, not from the main domain name?
The accepted answer is out of date. This is possible using the "Origin Path" setting in AWS which will rewrite the request to a sub-folder on the origin:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesOriginPath
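To make the effect of Origin Path concrete, here is a minimal sketch of the rewrite it performs: the configured path is prepended to the path the viewer requested. The function name is hypothetical and the slash handling is an assumption of this sketch.

```python
def origin_request_path(origin_path: str, viewer_path: str) -> str:
    """Prepend the configured Origin Path to the viewer's request path."""
    # Normalize slashes so exactly one separates the two parts.
    return origin_path.rstrip("/") + "/" + viewer_path.lstrip("/")


# The viewer asks for cdn.mydomain.com/1.jpg ...
print(origin_request_path("/folder/subfolder", "/1.jpg"))  # /folder/subfolder/1.jpg
# ... and with an empty Origin Path the request path is unchanged.
print(origin_request_path("", "/1.jpg"))  # /1.jpg
```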
I am not aware of a way to do this in CloudFront. However, you could create a virtual host at your origin for subfolder.example.com and have its root directory be the directory you mentioned. Then you could set subfolder.example.com as your origin for the default cache behavior.

Is there any way to identify requests coming to custom origin server from CloudFront?

I'm using CloudFront with custom origin and want to redirect certain requests coming to a web app to CloudFront (clients use direct URLs, which cannot be changed to CloudFront-based URLs). In order to ensure that cache on CloudFront is updated properly, I must not redirect requests coming from CloudFront itself. Is there any way to identify such requests on origin server?
Does CloudFront add any custom headers to requests sent to the origin server? Or is there any other reliable way to determine that a request is coming from CloudFront?
Yes, you can identify requests coming to your origin server from CloudFront by checking the User-Agent header. The user agent would be 'Amazon CloudFront'.
Update
It's an old question, but this update may be useful for someone researching or looking for a newer solution.
AWS has since added a new feature, Origin Custom Headers. You can set a header with a secret value and check it on your origin server, either in the web server or in your application.
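A minimal sketch of the origin-side check, assuming CloudFront has been configured to add a custom header to every origin request. The header name X-Origin-Verify and the secret value are hypothetical placeholders, not names defined by AWS.

```python
import hmac

# Hypothetical header name and secret; both must match what is configured
# as an origin custom header on the distribution.
SECRET_HEADER = "X-Origin-Verify"
SECRET_VALUE = "replace-with-a-long-random-value"


def is_from_cloudfront(headers: dict) -> bool:
    """Return True if the request carries the shared secret header."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(SECRET_HEADER.lower(), "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode(), SECRET_VALUE.encode())
```

Requests for which this returns False came directly from a client and can be redirected to CloudFront; requests for which it returns True should be served normally so that the cache is populated.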
Update
Avinash Bijja correctly pointed out (+1) that the HTTP User-agent header would be 'Amazon CloudFront' for requests coming from Amazon CloudFront servers. Unfortunately this doesn't seem to be explicitly documented indeed, but is implicitly acknowledged by various posts in the respective forum, see e.g. the AWS Team response to User Agent String - does CF overwrite the user agent string?:
You are correct. The User-Agent field is always populated as "Amazon CloudFront".
However, it turns out this is not entirely reliable, insofar as CloudFront sends no User-Agent to the origin if one is missing from the originating client request:
I can confirm that CloudFront is not sending a User-Agent to the origin when the original client does not send a User-Agent. We have enhancements & fixes to User-Agent handling on our backlog, but no release dates at this time. I've sent you a PM with further details.
These enhancements & fixes had apparently still not been rolled out as of February 7, 2013.
These enhancements & fixes have been rolled out as of August 5, 2013 (thanks webbiedave for the update!).
Initial Answer
Does CloudFront add any custom headers to requests sent to origin server?
One would think so indeed, but at least they don't appear to be documented where I would have expected it, namely in How CloudFront Processes and Forwards Requests to Your Custom Origin Server. Given you are in control of the origin server, you might just check its HTTP access logs though?
Or is there any other reliable way to determine that requests is coming from CloudFront?
You'll need to judge the reliability yourself, but The IP address that CloudFront forwards to the origin server is the IP addresses of a CloudFront server, not the IP address of the end user's computer. - consequently you could restrict access to the published Amazon CloudFront Public IP Ranges; however, be aware of the respective disclaimer:
The CloudFront IP addresses change frequently and we cannot guarantee advance notice of changes. On a best-effort basis, we will provide the list of current addresses. Customers should not use these addresses for mission critical applications and must never hard code them in DNS names. [emphasis mine]
Consequently you'll need to monitor this forum/post to take notice of respective changes as early as possible (if this constraint is acceptable for your use case in the first place of course).
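If the IP-range approach is acceptable despite that disclaimer, the check itself is short. The ranges below are hypothetical placeholders taken from the RFC 5737 documentation blocks, not real CloudFront ranges; a real deployment would have to load and refresh the published list.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT actual
# CloudFront ranges. Replace with the currently published list and keep it fresh.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def in_allowed_ranges(remote_addr: str, ranges=ALLOWED_RANGES) -> bool:
    """Return True if the connecting address falls inside any allowed range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in ranges)
```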
CloudFront appears to add an X-Amz-Cf-Id header to every request before forwarding it to the origin. At least, it is currently doing that for me.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
This should probably be a comment on Reza's answer, but I can't do that :).
For completeness, here's the link to the official documentation regarding Forwarding Custom Headers, which currently claims the following.
You can configure CloudFront to include custom headers whenever it forwards a request to your origin. You can specify the names and values of custom headers for each origin, both for custom origins and for Amazon S3 buckets. Custom headers have a variety of uses, such as the following:
You can identify the requests that are forwarded to your custom origin by CloudFront. This is useful if you want to know whether users are bypassing CloudFront or if you're using more than one CDN and you want information about which requests are coming from each CDN. (If you're using an Amazon S3 origin and you enable Amazon S3 server access logging, the logs don't include header information.)
