How to invalidate with custom cache policy - amazon-cloudfront

Context
I have a distribution where I added the Host header to the cache policy.
These two domains point to the same distribution:
www.site1.com/pageA
www.site2.com/pageA
Each of these two hosts has its own cache entry. In this setup, I have a custom origin-response Lambda@Edge function that returns different content based on the host.
The question:
I'm used to invalidating based on the path, e.g. /pageA.
How should I format my invalidation if I want to invalidate pageA only for site1?

It is currently not possible to invalidate by domain.
CloudFront invalidation is by path only.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
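To make the "path only" point concrete, here is a minimal sketch of the invalidation batch structure as I understand it from the CreateInvalidation API. The helper name `build_invalidation_batch`, the caller reference, and the distribution ID in the comment are made up for illustration. Note that the batch has no field for a host, so an invalidation for /pageA should apply to the cached copies for both site1 and site2.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build an InvalidationBatch dict (hypothetical helper, not an AWS function)."""
    for path in paths:
        # Invalidation paths are relative to the distribution and start with "/".
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": caller_reference or str(time.time_ns()),
    }


batch = build_invalidation_batch(["/pageA"], caller_reference="invalidate-pageA-001")

# With boto3 installed (not imported here), the batch would be sent roughly like:
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="EXAMPLEDISTID",  # placeholder ID
#       InvalidationBatch=batch,
#   )
```

There is simply nowhere in this structure to say "site1 only", which is why the invalidation cannot be scoped to one domain.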

Related

Can we have two entries in AMConfig.properties for `com.iplanet.am.server.host`?

Can we have two entries in AMConfig.properties for com.iplanet.am.server.host?
e.g.
com.iplanet.am.server.host=server1.example.com,server1.example.info
OR
com.iplanet.am.server.host=server1.example.com
com.iplanet.am.server.host=server1.example.info
If not, how can we configure two identity_servers?
Update: Just one OpenAM instance servicing multiple FQDNs.
You can't have two entries for com.iplanet.am.server.host in AMConfig.properties.
You just need to configure an fqdnMap entry
com.sun.identity.server.fqdnMap[server1.example.info]=server1.example.info
as an advanced server property. However, please keep in mind that in your case the two FQDNs do not share a common cookie domain. From a cookie security point of view, you should use host-based cookies anyway, by removing all cookie domains from OpenAM's platform service (global configuration). If you still want to use domain cookies, make sure to have the cookie domains
example.info
example.com
set in the platform service. Make sure you understand the cookie specification.

Azure CDN update for WebApp

I have set up an Azure CDN that points to my web app. When I change my style sheet and deploy the web app, the styles update immediately. So is there no requirement to purge in this case? Does the CDN automatically update styles from the web app in this case?
I am working from this article:
https://azure.microsoft.com/en-in/documentation/articles/cdn-websites-with-cdn/
If the URL of the resource remains the same, the CDN servers (and the browsers) are free to cache them. So, if you are using CDN, you need to force a URL change every time the file content changes (commonly done by adding a version string).
Since it is working for you, either your files are not being served from the CDN at all, or the URL is somehow being updated.
Look at the URL your style sheet is fetched from (network tab in the browser's debugger). Make sure the URL actually points to the CDN and not directly to your website.
If you have an ASP.NET MVC app and you are using System.Web.Optimization.BundleCollection for the style bundle, it adds a query parameter to the URL embedded in the HTML and changes it when the file contents change. This ensures that stale cached copies of the resources are not used.
See CDN and bundle caching sections at http://www.asp.net/mvc/overview/performance/bundling-and-minification
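As a rough illustration of the version-string idea (this is not how BundleCollection is implemented, just a sketch of the same technique), the following derives a query parameter from the file contents so that the URL changes whenever the content does. The function name `versioned_url` and the parameter name `v` are my own choices, and it assumes the CDN is configured to treat the query string as part of the cache key.

```python
import hashlib


def versioned_url(url, content):
    """Append a short content hash to the URL as a cache-busting query parameter."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    # Use "&" if the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}v={digest}"


old_css = b"body { color: black; }"
new_css = b"body { color: navy; }"

old_link = versioned_url("/Content/site.css", old_css)
new_link = versioned_url("/Content/site.css", new_css)
```

Because `old_link` and `new_link` differ, the CDN and browsers see the updated style sheet as a new resource and fetch it from the origin, while unchanged files keep their URL and stay cached.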
No, the CDN does not automatically update the CSS for the web app.
To be safe, you should always purge.
A CDN is a global service: the fact that you saw the CSS update doesn't mean everyone else will see it too. A request from another IP address might still be served the old cached CSS.
Besides, the Cache-Control header also plays a role here.

CloudFront multiple origins configuration

I would like to start using CloudFront (CF) for my website hosted here, and I have a few questions that some of you AWS folks may be able to answer. Thanks for your help.
My website has multiple origins located on different servers and behind different ELBs. I would like to configure CF to fetch content from the different origins and serve it under different policies. This would let me remove a reverse-proxy layer that I currently have in place, and also cache some content.
Below my idea about the CF configuration:
Origins:
www.pippo.com > XXX.cloudfront.net
Origin 1: pluto.pippo.com = xxx.elb1.aws.amazon.com > WC
Origin 2: paperino.pippo.com = xxx.elb2.aws.amazon.com > WC
Origin 3: minnie.pippo.com = Apache\Nginx\Tomcat
Behaviors:
Origin 1: pluto.pippo.com/*.jpg cache
Origin 1: pluto.pippo.com/*.png cache
Origin 1: pluto.pippo.com/*.* cache
Origin 1: pluto.pippo.com Default(*) NON cache
Origin 2: paperino.pippo.com/paywall/* NON cache
Origin 2: paperino.pippo.com/*.png cache
Origin 2: paperino.pippo.com/*.jpg cache
Origin 2: paperino.pippo.com Default(*) NON cache
Origin 3: minnie.pippo.com/*.jpg cache
Origin 3: minnie.pippo.com/*.png cache
Origin 3: minnie.pippo.com/*.* cache
Origin 3: minnie.pippo.com Default(*) NON cache
Questions:
When my users open www.pippo.com, CF will serve the cached content (*.jpg, *.png in the example), and everything not specified in the behaviors will be requested directly (using the Default (*) policy) from the ELB to the web caches. Correct? Is that request made by CF or by the user?
How can I prevent users from going directly to pluto.pippo.com? Just a 301 with an exception for the CF subnet?
Will sticky sessions be maintained with this configuration?
Sorry for the newbie question.
Thanks for any help.
This may not completely answer your questions, however:
Sticky sessions: You'll need to forward all cookies (or whitelist the AWSELB cookie) for your default rule (assuming this captures your page requests, unless you're using page extensions, in which case they'd be caught by the *.* rule).
Presumably you don't want your cached content (*.jpg, *.png, *.*) to obey sticky sessions.
Preventing users from accessing the origin: There isn't a really effective way to do this for a custom origin. You can try security through obscurity (create an obscure origin domain) and/or only allow direct requests with an 'Amazon CloudFront' User-Agent.

Can Varnish Read a List of Backend Hosts from a Text File

Is there any way for varnish to read a list of backend urls from a text file, and then proxy cache misses to a random url taken from the text file?
What I imagine is something like this pseudocode...
/var/services/backend-urls.conf
http://backend-host-1/path/to/application
http://backend-host-2/path/to/application
http://backend-host-3/path/to/application
# etc
varnish config
sub vcl_miss {
    // read a list of urls from a text file
    backendHosts = readFile("/var/services/backend-urls.conf");
    // choose a random url from the file
    randomHost = chooseLineAtRandom(backendHosts);
    // proxy the request to the random host
    set req.backend = randomHost;
}
To provide some background, I work on a server system that comprises a number of backend applications that currently sit behind a front-end running apache. We are evaluating replacing the apache layer with varnish so we can benefit from the caching capabilities of varnish. We also have a service discovery framework that knows the endpoint locations for each backend application (the endpoint urls change periodically as new hosts emerge or are taken out of service).
Currently we use the RewriteMap functionality in mod_rewrite to route requests to the backend services. Then we have a process to maintain the lists of backend services based upon the contents of the service discovery framework.
All this works well for us in Apache, except that using Apache for this is like using a sledgehammer to crack a nut. All we really want is the reverse proxy logic, and the caching in Varnish would be helpful too.
Is there any way to have varnish read the list of backend urls from an external resource?
Without resorting to custom vmod/c modules, the quick answer is no.
VCL instructions are compiled by Varnish, which rules out run-time inclusions.
But why not include in the main VCL a separate backend VCL file that defines the current backends?
That VCL file could be written out on demand. Then, using a varnishadm CLI command, you could request a new compile of the VCL, thereby bringing the config live.
I can see two potential solutions.
The first is to have something generate your VCL and backends such as Chef or some custom scripting. You can then process the text file into backend definitions and the necessary VCL to invoke them. To handle the requirement for the random backend you could use a director. I've not dealt with directors myself but it looks like they are meant to solve that requirement. When changes to the backends occur you could rerun the generation script/Chef and tell Varnish to reload its configuration either using varnishadm or service varnish reload to avoid a full restart.
The second would be to implement it in C, either via a VMOD as Marcel Dumont suggests or possibly using inline C in your VCL.
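The first approach (generating the backend definitions from the text file) could look roughly like the sketch below. It assumes Varnish 4 or later syntax with the bundled `directors` VMOD; the function names, the `backend_N` naming scheme and the director name `pool` are invented for illustration. VCL backends only take a host and port, so the path part of each URL is dropped here and would have to be handled separately.

```python
from urllib.parse import urlparse


def parse_backend_urls(text):
    """Return the non-empty, non-comment lines of the backend list."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def render_backends_vcl(urls, director_name="pool"):
    """Render backend definitions plus a random director as a VCL fragment."""
    backend_blocks = []
    add_calls = []
    for index, url in enumerate(urls, start=1):
        parsed = urlparse(url)
        name = f"backend_{index}"
        # Backends are host/port only; the URL path is not represented here.
        port = parsed.port or 80
        backend_blocks.append(
            f"backend {name} {{\n"
            f'    .host = "{parsed.hostname}";\n'
            f'    .port = "{port}";\n'
            f"}}\n"
        )
        add_calls.append(f"    {director_name}.add_backend({name}, 1.0);")
    init_block = (
        "sub vcl_init {\n"
        f"    new {director_name} = directors.random();\n"
        + "\n".join(add_calls)
        + "\n}\n"
    )
    return "import directors;\n\n" + "\n".join(backend_blocks) + "\n" + init_block


sample = """http://backend-host-1/path/to/application
http://backend-host-2/path/to/application
http://backend-host-3/path/to/application
# etc
"""

vcl_fragment = render_backends_vcl(parse_backend_urls(sample))
```

The generated fragment would be written to a file that the main VCL pulls in with `include`, and the main VCL would pick a backend with something like `set req.backend_hint = pool.backend();` in vcl_recv. After regenerating the file, the new configuration can be loaded through varnishadm (vcl.load followed by vcl.use) without a restart.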
With vmod_dynamic you can just use any DNS name as a backend or even service records.
For your use case, one option would be to set up an SRV record in DNS pointing to all your servers and then just use that, as shown for example in the basic-stub.vtc test case.

Is there any way to identify requests coming to custom origin server from CloudFront?

I'm using CloudFront with a custom origin and want to redirect certain requests coming to a web app to CloudFront (clients use direct URLs, which cannot be changed to CloudFront-based URLs). In order to ensure that the cache on CloudFront is updated properly, I must not redirect requests coming from CloudFront itself. Is there any way to identify such requests on the origin server?
Does CloudFront add any custom headers to requests sent to the origin server? Or is there any other reliable way to determine that a request is coming from CloudFront?
Yes, you can identify requests coming to your origin server from CloudFront by checking the User-Agent header. The user agent will be 'Amazon CloudFront'.
Update
It's an old question, but this update may be useful for someone researching this or looking for a newer solution.
Recently AWS added a new feature, Origin Custom Headers. You can set a header with a secret value and check it on your origin server, either in the web server or in your application.
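On the origin side, the check could be as simple as the following sketch. The header name `X-Origin-Verify` and the secret value are placeholders I picked for the example; use whatever name and value you configured on the CloudFront origin.

```python
import hmac


def is_from_cloudfront(headers, expected_secret, header_name="X-Origin-Verify"):
    """Return True if the request carries the secret custom header value."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # compare_digest avoids leaking the secret through comparison timing.
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())


SECRET = "example-shared-secret"  # placeholder value

via_cloudfront = is_from_cloudfront({"x-origin-verify": SECRET}, SECRET)
direct_request = is_from_cloudfront({"User-Agent": "curl/8.0"}, SECRET)
```

Requests for which the check fails are the ones that reached the origin directly, so those are the ones to redirect to CloudFront.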
Update
Avinash Bijja correctly pointed out (+1) that the HTTP User-agent header would be 'Amazon CloudFront' for requests coming from Amazon CloudFront servers. Unfortunately this doesn't seem to be explicitly documented indeed, but is implicitly acknowledged by various posts in the respective forum, see e.g. the AWS Team response to User Agent String - does CF overwrite the user agent string?:
You are correct. The User-Agent field is always populated as "Amazon CloudFront".
However, it turns out this is currently not entirely reliable, insofar as CloudFront sends an empty User-Agent to the origin if one is already missing in the originating client request:
I can confirm that CloudFront is not sending a User-Agent to the
origin when the original client does not send a User-Agent. We have
enhancements & fixes to User-Agent handling on our backlog, but no
release dates at this time. I've sent you a PM with further details.
These enhancements & fixes had apparently still not been rolled out as of February 07 2013.
They have been rolled out as of August 05 2013 (thanks webbiedave for the update!).
Initial Answer
Does CloudFront add any custom headers to requests sent to origin
server?
One would think so indeed, but at least they don't appear to be documented where I would have expected it, namely in How CloudFront Processes and Forwards Requests to Your Custom Origin Server. Given you are in control of the origin server, you might just check its HTTP access logs though?
Or is there any other reliable way to determine that a request is
coming from CloudFront?
You'll need to judge the reliability yourself, but "the IP address that CloudFront forwards to the origin server is the IP address of a CloudFront server, not the IP address of the end user's computer." Consequently, you could restrict access to the published Amazon CloudFront Public IP Ranges; however, be aware of the respective disclaimer:
The CloudFront IP addresses change frequently and we cannot guarantee
advance notice of changes. On a best-effort basis, we will provide the
list of current addresses. Customers should not use these addresses
for mission critical applications and must never hard code them in DNS
names. [emphasis mine]
Consequently you'll need to monitor this forum/post to take notice of respective changes as early as possible (if this constraint is acceptable for your use case in the first place of course).
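If you do go the IP-range route, the membership check itself is straightforward. In this sketch the CIDR blocks are documentation-only placeholders, not real CloudFront ranges; you would have to load the currently published list yourself and refresh it regularly, given the disclaimer above.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT actual CloudFront ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(address, cidr_ranges):
    """Return True if the address falls inside any of the given CIDR blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


inside = ip_in_ranges("192.0.2.10", PLACEHOLDER_RANGES)
outside = ip_in_ranges("203.0.113.5", PLACEHOLDER_RANGES)
```

The origin would apply this to the connecting peer address and only skip the redirect when the address matches one of the ranges.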
CloudFront appears to add an X-Amz-Cf-Id header to every request before forwarding it to the origin. At least, it is currently doing that for me.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior
This should probably be a comment on Reza's answer, but I can't do that :).
For completeness, here's the link to the official documentation regarding Forwarding Custom Headers, which currently claims the following.
You can configure CloudFront to include custom headers whenever it forwards a request to your origin. You can specify the names and values of custom headers for each origin, both for custom origins and for Amazon S3 buckets. Custom headers have a variety of uses, such as the following:
You can identify the requests that are forwarded to your custom origin by CloudFront. This is useful if you want to know whether users are bypassing CloudFront or if you're using more than one CDN and you want information about which requests are coming from each CDN. (If you're using an Amazon S3 origin and you enable Amazon S3 server access logging, the logs don't include header information.)

Resources