Cloudfront origin pull from a folder - dns

I have set up a CloudFront origin pull server. It allows me to set a domain name as the origin, which I have done. This works.
But I don't want the whole domain to be the origin. I want
mydomain.com/folder/subfolder
to be the origin. Also, the CloudFront distribution is reached through a CNAME (cdn.mydomain.com) that is set up in DNS to point to CloudFront. This seems to work.
So, basically, instead of this URL:
xyz.cloudfront.net/folder/subfolder/1.jpg
I want this instead:
cdn.mydomain.com/1.jpg
Currently I have achieved, via CNAME and origin pull:
cdn.mydomain.com/folder/subfolder/1.jpg
The question is: on CloudFront, how do I set up an origin pull from a folder rather than from the root of the domain?

The accepted answer is out of date. This is possible using the "Origin Path" setting in AWS which will rewrite the request to a sub-folder on the origin:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesOriginPath
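To make the effect of "Origin Path" concrete, here is a minimal sketch of the path mapping, not CloudFront's actual implementation. The function name origin_url and the scheme parameter are made up for illustration, and whether the query string is forwarded depends on how the distribution is configured.

```python
from urllib.parse import urlsplit


def origin_url(origin_domain: str, origin_path: str, viewer_url: str,
               scheme: str = "https") -> str:
    """Model the URL requested from the origin when an Origin Path is set.

    The viewer's request path is appended to the origin path, so the
    viewer never sees the /folder/subfolder prefix.
    """
    parts = urlsplit(viewer_url)
    prefix = origin_path.rstrip("/")   # an origin path has no trailing slash
    path = parts.path or "/"
    url = f"{scheme}://{origin_domain}{prefix}{path}"
    if parts.query:                    # assumes query strings are forwarded
        url += "?" + parts.query
    return url


print(origin_url("mydomain.com", "/folder/subfolder",
                 "https://cdn.mydomain.com/1.jpg"))
# prints https://mydomain.com/folder/subfolder/1.jpg
```

With the origin domain set to mydomain.com and the origin path set to /folder/subfolder, a request for cdn.mydomain.com/1.jpg is fetched from mydomain.com/folder/subfolder/1.jpg, which is the behaviour asked for in the question.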

I am not aware of a way to do this in CloudFront itself. However, you could create a virtual host at your origin for subfolder.example.com and have its root directory be the directory you mentioned. Then you could set subfolder.example.com as the origin for the default cache behavior.

Related

Is it possible to use AWS Cloudfront to proxy Google Fonts?

I have created an AWS CloudFront distribution in an attempt to proxy requests to fonts.googleapis.com through CloudFront. So for example, I'd like to use something like this:
https://xxxxxx.cloudfront.net/css2?family=Noto+Sans+HK:wght@400;500;700;900&display=swap
To fetch the actual content from the origin at:
https://fonts.googleapis.com/css2?family=Noto+Sans+HK:wght@400;500;700;900&display=swap
I have configured Cloudfront with an origin of "fonts.googleapis.com" and set it so that it passes through all URL parameters, but still the origin responds with:
404. That’s an error.
The requested URL /css2 was not found on this server.
Does anyone know what could be causing this? Afaik, the way I've configured Cloudfront should act like a transparent pass-through.
I can't share all of the Cloudfront config settings here (there are too many), but perhaps someone can point me in the right direction?
Or is this impossible?
This did in fact work fine. I had just set up the CloudFront distribution incorrectly.
I suspect the change the OP referred to was updating the behavior so that the viewer's Host header is no longer sent in the origin request. Create a cache policy that does not include the Host header, so that CloudFront sends the origin's own hostname instead, and it should work.
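The Host header behaviour described above can be sketched as a small model. This is an illustration only, not CloudFront code: build_origin_headers and its forwarded parameter are hypothetical names, and the rule it encodes (forward the viewer's Host only if it is listed, otherwise use the origin's hostname) is taken from the answers in this thread.

```python
def build_origin_headers(viewer_headers: dict, origin_domain: str,
                         forwarded: set) -> dict:
    """Return the headers a CDN edge would send to the origin in this model."""
    wanted = {name.lower() for name in forwarded}
    # Keep only the viewer headers that are configured to be forwarded.
    headers = {k: v for k, v in viewer_headers.items() if k.lower() in wanted}
    if "host" not in wanted:
        # Host is not forwarded, so the origin sees its own hostname.
        headers["Host"] = origin_domain
    return headers


viewer = {"Host": "xxxxxx.cloudfront.net", "Accept": "text/css"}

# Forwarding Host: fonts.googleapis.com is asked for an unknown hostname.
print(build_origin_headers(viewer, "fonts.googleapis.com", {"Host"}))
# Not forwarding Host: the origin receives the hostname it expects.
print(build_origin_headers(viewer, "fonts.googleapis.com", set()))
```

In the first case the origin receives Host: xxxxxx.cloudfront.net, a name it does not serve, which is consistent with the 404 in the question.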

Origin for a CDN

I have some basic questions regarding configuring the CDN. I am using Amazon CloudFront for that.
1) Let's suppose my website is example.com. In the origin of cloudfront, do I mention example.com as the origin or create a CNAME like cdn.example.com which points to the server and then enter cdn.example.com as the origin?
2) Once the configuration is done, do I redirect example.com to the cloudfront domain like dxxxxxx.cloudfront.net?
3) I will update all the links in my website to http://dxxxxxx.cloudfront.net/xxx. Now when I browse example.com, I will be redirected to cloudfront. But cloudfront is also using the example.com as the origin. Isn't it like cloudfront is trying to pull data from itself? Won't that create a dead loop?
I am not able to get my head around this. I will be really grateful if someone could help. Thanks!!
Here is how it works.
1) Your website is at example.com, where all the static files that you want to serve through CloudFront are hosted. This example.com is called the Origin Server, Origin Host, or simply the Origin.
2) CloudFront will create a Pull Zone (a distribution, in CloudFront's terms) for you with a hostname like http://dxxxxxx.cloudfront.net. You now have to use this host instead of the original example.com for your static assets. All HTML files or dynamic files will still be loaded directly through example.com. Users will still enter example.com in their browser. Only the scripts, styles, images, fonts, icons and similar static files that the browser loads behind the scenes need to be changed to use the CDN host.
3) Your CDN setup is complete at this point. However, if someone looks at the source code of your page, they can see the CloudFront URLs being used to deliver static assets. This may look unprofessional. To hide the third-party host name and use your own host name for a branded feel, you can create a new subdomain cdn.example.com at your DNS provider and CNAME it to dxxxxxx.cloudfront.net.
4) If you created the CNAME subdomain above, you now have to update all URLs of static files again and change them to use cdn.example.com. Your website will still be loaded over example.com, but assets will now be delivered through cdn.example.com, which points to dxxxxxx.cloudfront.net.
5) When dxxxxxx.cloudfront.net receives a request from the browser for a static file, it forwards that request to the specified origin server example.com, where the files actually live. The origin sends the file to CloudFront; CloudFront saves it for future use and sends a copy to the browser.
Steps 3 and 4 are optional; they are not part of the CDN integration process itself. Also, the subdomain cdn.example.com is not a requirement. You can use some other subdomain, or some other domain. For example, the following are valid:
cdn2.example.com
static-assets.example.com
static.assets.example.com
images.example.parent-company-website.com
Similarly, it is not a requirement to fetch assets from example.com only. You can specify my-other-website.net as origin, and cloudfront will happily fetch resources from there for your example.com site.
In your scenario, none of the following depend on each other. You can change any or all of them and the process will not break, provided you make the necessary adjustments to the configuration and the code.
Your website: example.com
CDN origin: example.com (since currently assets are at this host)
Pull Zone: http://dxxxxxx.cloudfront.net/
CNAME Host: cdn.example.com
Hope this clears the picture.
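The fetch-once-then-serve-from-cache behaviour described in the answer above can be sketched in a few lines. This is a toy model under loud assumptions: PullThroughCache and fake_origin are invented names, and real CloudFront also honours TTLs, cache-control headers and invalidations, none of which are modelled here.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of an edge that pulls from the origin only on a cache miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_hits = 0

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # Cache miss: go to the origin once and keep a copy.
            self.origin_hits += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


def fake_origin(path: str) -> bytes:
    # Stand-in for an HTTP request to example.com (hypothetical).
    return f"contents of example.com{path}".encode()


edge = PullThroughCache(fake_origin)
edge.get("/css/site.css")   # first request goes to the origin
edge.get("/css/site.css")   # second request is served from the cache
print(edge.origin_hits)     # prints 1
```

This also shows why there is no loop: the edge asks the origin (example.com) for the file, and the origin never asks the edge for anything.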

Specifying origin path in Cloudfront without a redirect

I'm trying to host a github pages site on Cloudfront.
The problem is, the github repo is at username.github.io/repo rather than username.github.io.
If I specify username.github.io as the origin domain, and /repo as the origin path, then going to id.cloudfront.net redirects you fully to username.github.io/repo which is not what I want. I want it to stay at id.cloudfront.net (or mydomain.com aliased to id.cloudfront.net) and display the content of the github site, without redirecting to it.
Removing the origin path fixes this issue and loads the content from username.github.io, but I need the content from username.github.io/repo.
Found the issue: the GitHub Pages site has a 'force https' setting, which means it forces a redirect to https://username.github.io/repo if accessed through http.
I had my origin protocol set to http in cloudfront (default) which was triggering this redirect. Setting it to https fixed the issue.
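For reference, here is a hedged sketch of what the corrected origin settings might look like as data. The helper custom_origin and the origin ID are hypothetical; the key names and the three protocol policy values are modeled on the CloudFront API's origin structure as I understand it, so check them against the current API reference before relying on them.

```python
ALLOWED_POLICIES = {"http-only", "match-viewer", "https-only"}


def custom_origin(origin_id: str, domain: str, origin_path: str = "",
                  protocol_policy: str = "https-only") -> dict:
    """Build an origin entry shaped like a CloudFront custom origin."""
    if protocol_policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown protocol policy: {protocol_policy}")
    if origin_path and (not origin_path.startswith("/")
                        or origin_path.endswith("/")):
        raise ValueError("origin path must start with '/' and not end with '/'")
    return {
        "Id": origin_id,
        "DomainName": domain,
        "OriginPath": origin_path,
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            # https-only avoids the http -> https redirect from the origin.
            "OriginProtocolPolicy": protocol_policy,
        },
    }


origin = custom_origin("github-pages", "username.github.io", "/repo")
print(origin["CustomOriginConfig"]["OriginProtocolPolicy"])  # prints https-only
```

With an http origin protocol, the origin's own redirect to https://username.github.io/repo is passed back to the viewer, which is why the browser ended up on the GitHub hostname.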

Disable Serving from Default Cloudfront Hostname (ourdistid.cloudfront.net)

I've set up an alternate domain name for our CloudFront distribution so we can serve from oursite.com. We'd like to disable ourdistid.cloudfront.net so our site is only accessible from one hostname. Is this possible?
Yes, you can do this, though perhaps not in the place where you might expect to.
By default, CloudFront sets the Host: header in the request sent to the origin server to have the value of the origin server hostname.
However, you can configure CloudFront to forward the original request's host header to the origin server, instead. It doesn't change how the request is routed, only the header that gets forwarded.
After that, it is a simple matter to configure your web server to return the response you want when the request's Host: header matches the *.cloudfront.net host. That response can be a generic error page with whatever code you deem most appropriate, such as 503 Service Unavailable, 404 Not Found, 403 Forbidden, or 410 Gone. You could even use 301 Moved Permanently. Whatever makes the most sense to you.
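As an illustration of the origin-side check, here is a minimal WSGI sketch that only works if CloudFront is configured to forward the viewer's Host header as described above. The names reject_default_hostname, site and guarded are made up, and 403 is just one of the status codes mentioned; a web server rule would do the same job.

```python
def reject_default_hostname(app, status="403 Forbidden"):
    """Wrap a WSGI app so requests for *.cloudfront.net get an error page."""
    def middleware(environ, start_response):
        # Strip an optional port and normalise case before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host.endswith(".cloudfront.net"):
            body = b"Not available on this hostname.\n"
            start_response(status, [("Content-Type", "text/plain"),
                                    ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    # Placeholder for the real application.
    body = b"Welcome to oursite.com\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


guarded = reject_default_hostname(site)
```

Requests that arrive with Host: oursite.com pass through to the site unchanged, while requests for ourdistid.cloudfront.net receive the error response.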
You can't literally disable the assigned endpoint, but you can prevent it from returning any of your content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html

Is there any way to identify requests coming to custom origin server from CloudFront?

I'm using CloudFront with custom origin and want to redirect certain requests coming to a web app to CloudFront (clients use direct URLs, which cannot be changed to CloudFront-based URLs). In order to ensure that cache on CloudFront is updated properly, I must not redirect requests coming from CloudFront itself. Is there any way to identify such requests on origin server?
Does CloudFront add any custom headers to requests sent to origin server? Or is there any other reliable way to determine that requests is coming from CloudFront?
Yes, you can identify requests coming to your origin server from CloudFront by checking the User-Agent header. The user agent will be 'Amazon CloudFront'.
Update
It's an old question, but this update may be useful for someone researching this or looking for a newer solution.
AWS has since added a feature called Origin Custom Headers. You can set a header with a secret value and check it on your origin server, either in the web server or in your application.
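A minimal sketch of the origin-side check, assuming CloudFront has been configured to add a secret custom header to every origin request. The header name X-Origin-Verify and the placeholder value are invented for illustration; any name and a long random value would do.

```python
import hmac

# Hypothetical header name and value configured on the CloudFront origin.
SECRET_HEADER = "X-Origin-Verify"
SECRET_VALUE = "replace-with-a-long-random-value"


def came_through_cloudfront(headers: dict) -> bool:
    """Return True if the request carries the shared secret header."""
    supplied = next((value for name, value in headers.items()
                     if name.lower() == SECRET_HEADER.lower()), "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), SECRET_VALUE.encode())


print(came_through_cloudfront({"x-origin-verify": SECRET_VALUE}))  # prints True
print(came_through_cloudfront({"User-Agent": "curl/8.0"}))         # prints False
```

For the use case in the question, the origin would redirect to the CloudFront URL only when this check returns False, so requests that CloudFront itself makes are served normally.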
Update
Avinash Bijja correctly pointed out (+1) that the HTTP User-agent header would be 'Amazon CloudFront' for requests coming from Amazon CloudFront servers. Unfortunately this doesn't seem to be explicitly documented indeed, but is implicitly acknowledged by various posts in the respective forum, see e.g. the AWS Team response to User Agent String - does CF overwrite the user agent string?:
You are correct. The User-Agent field is always populated as "Amazon CloudFront".
However, it turns out this is not currently entirely reliable, insofar as CloudFront sends no User-Agent to the origin if the original client request did not include one:
I can confirm that CloudFront is not sending a User-Agent to the
origin when the original client does not send a User-Agent. We have
enhancements & fixes to User-Agent handling on our backlog, but no
release dates at this time. I've sent you a PM with further details.
These enhancements & fixes had apparently still not been rolled out as of February 07 2013.
Update: they have been rolled out as of August 05 2013 (thanks webbiedave for the update!).
Initial Answer
Does CloudFront add any custom headers to requests sent to origin
server?
One would think so indeed, but at least they don't appear to be documented where I would have expected it, namely in How CloudFront Processes and Forwards Requests to Your Custom Origin Server. Given you are in control of the origin server, you might just check its HTTP access logs though?
Or is there any other reliable way to determine that requests is
coming from CloudFront?
You'll need to judge the reliability yourself, but "the IP address that CloudFront forwards to the origin server is the IP address of a CloudFront server, not the IP address of the end user's computer". Consequently you could restrict access to the published Amazon CloudFront Public IP Ranges; however, be aware of the respective disclaimer:
The CloudFront IP addresses change frequently and we cannot guarantee
advance notice of changes. On a best-effort basis, we will provide the
list of current addresses. Customers should not use these addresses
for mission critical applications and must never hard code them in DNS
names. [emphasis mine]
Consequently you'll need to monitor this forum/post to take notice of respective changes as early as possible (if this constraint is acceptable for your use case in the first place of course).
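If the IP-range approach is acceptable, the check itself is simple. The sketch below assumes the ranges are available as a dictionary shaped like AWS's published ip-ranges.json (a "prefixes" list whose entries have "ip_prefix" and "service" keys); that shape is an assumption about the data source, not something stated in this answer, and the SAMPLE addresses are documentation ranges, not real CloudFront ranges.

```python
import ipaddress


def cloudfront_networks(ip_ranges: dict) -> list:
    """Extract the IPv4 networks tagged as CLOUDFRONT from the range data."""
    return [ipaddress.ip_network(entry["ip_prefix"])
            for entry in ip_ranges.get("prefixes", [])
            if entry.get("service") == "CLOUDFRONT"]


def is_from_cloudfront(remote_addr: str, networks: list) -> bool:
    """Return True if the connecting address falls in any listed network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in networks)


# Made-up sample data using reserved documentation ranges.
SAMPLE = {"prefixes": [
    {"ip_prefix": "203.0.113.0/24", "service": "CLOUDFRONT"},
    {"ip_prefix": "198.51.100.0/24", "service": "EC2"},
]}

networks = cloudfront_networks(SAMPLE)
print(is_from_cloudfront("203.0.113.7", networks))   # prints True
print(is_from_cloudfront("198.51.100.7", networks))  # prints False
```

Because the ranges change without guaranteed notice, the list would need to be refreshed regularly rather than hard-coded.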
CloudFront appears to add an X-Amz-Cf-Id header to every request before forwarding it to the origin. At least, it is currently doing that for me.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
This should probably be a comment on Reza's answer, but I can't do that :).
For completeness, here's the link to the official documentation regarding Forwarding Custom Headers, which currently claims the following.
You can configure CloudFront to include custom headers whenever it forwards a request to your origin. You can specify the names and values of custom headers for each origin, both for custom origins and for Amazon S3 buckets. Custom headers have a variety of uses, such as the following:
You can identify the requests that are forwarded to your custom origin by CloudFront. This is useful if you want to know whether users are bypassing CloudFront or if you're using more than one CDN and you want information about which requests are coming from each CDN. (If you're using an Amazon S3 origin and you enable Amazon S3 server access logging, the logs don't include header information.)
