Handling of requests with Header Accept-Encoding in CloudFront vs request without the header Accept-Encoding - amazon-cloudfront

I have a CloudFront distribution to cache my images. My origin server is NOT S3; it is a server I run myself.
I use these images on my website (taking advantage of CF caching). To explain the problem, let's assume my home page uses an image called banner.png.
I visit my home page from Chrome for the first time. For banner.png it is a cache miss, so it gets fetched from the origin and cached in CF.
After this I visit the page from FF, Opera and Chromium, and GET "banner.png" using Postman. All of these get the file from the CF cache.
Now I GET "banner.png" using Insomnia (another REST client). This time CF doesn't serve it from cache; it goes back to the origin to get the image and replies with **"x-cache: RefreshHit from cloudfront"**.
The difference between these two sets of clients is that the first set sends an "Accept-Encoding: gzip" header in the request and the second client does not.
In my CF behaviour:
"Cache Based on Selected Request Headers" = NONE
"Compress Objects Automatically" = NO
Any pointers?

CloudFront keeps two different copies of an object in its cache, depending on the Accept-Encoding request header:
One for requests whose header contains Accept-Encoding: gzip.
One for requests with any other Accept-Encoding value, or without the header at all.
You can test this using curl: send a first request without Accept-Encoding and a second one with Accept-Encoding: gzip, and you'll see a Miss from CloudFront on the second. This is expected behaviour.
The reason is that CloudFront (at the time of writing) supports only gzip compression, and it takes this header into consideration to know whether it needs to compress the response.
However, your problem seems different. You're seeing RefreshHit from CloudFront, which happens when the CloudFront TTL/max-age has expired and CloudFront makes a conditional GET to the origin to check whether the content has been modified.
Ideally, it should be a Miss from CloudFront if no Accept-Encoding header is present.
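As a rough illustration of the two-copy behaviour described above, the sketch below models a cache key that reduces Accept-Encoding to either "gzip" or nothing. The function names and the key format are hypothetical, and q-values are ignored; this is a toy model of the behaviour the answer describes, not CloudFront's actual implementation.

```python
from typing import Mapping


def normalized_encoding(headers: Mapping[str, str]) -> str:
    """Reduce Accept-Encoding to 'gzip' or '' (toy model, not CloudFront code)."""
    # Header names are case-insensitive, so compare them lowercased.
    value = ""
    for name, v in headers.items():
        if name.lower() == "accept-encoding":
            value = v
    # Split "gzip, deflate;q=0.5" into bare tokens: ["gzip", "deflate"].
    tokens = [part.split(";")[0].strip().lower() for part in value.split(",")]
    return "gzip" if "gzip" in tokens else ""


def cache_key(path: str, headers: Mapping[str, str]) -> str:
    """Hypothetical cache key: the path plus the normalized encoding."""
    return f"{path}|{normalized_encoding(headers)}"


browser = cache_key("/banner.png", {"Accept-Encoding": "gzip, deflate, br"})
postman = cache_key("/banner.png", {"accept-encoding": "gzip"})
no_header = cache_key("/banner.png", {})

print(browser == postman)    # do both map to the gzip variant?
print(browser == no_header)  # does a client without the header share it?
```

Under this model, a client that omits the header never shares a cached copy with the browsers, which matches the expectation of a separate cache entry for the Insomnia request.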

Related

Browser loads response from cache although no-cache header is set

I'm working on a web app and I'm having the following problem:
When I go on some page my server sends a response with cache-control: no-cache header.
Then I do some changes (graphql mutations) on that page.
When I go to another page and then click the browser's back button, my browser reads the outdated response from the disk cache instead of sending a request to the server to get the changed data.
I am wondering if there is something missing in my headers that would tell the browser not to use the disk cache.
Some info:
The browser does not send a request to my server. (So it is not cached somewhere else.)
It is not the back-forward cache. (There is already some logic handling the bfcache.)
I can reproduce it in all my browsers. (e.g. Firefox, Chrome, ...)
When I disable the disk cache in the Firefox settings then it is working correctly. (Now, the bfcache kicks in.)
I also found the following thread. Is there a better solution?
Chrome is caching even with HTTP no-cache headers

How to set cache-control to always check for updates but always fall back to cache if server is unreachable

I'm missing something trying to understand cache-control (e.g., from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control).
How do I set up cache control to accomplish the following (I'll be using an .htaccess file):
If client fetches a file, it should always store it in the cache.
When client needs a file, it should always check to see if the file has been changed and download a new copy if it has changed.
If the attempt to check fails -- e.g., server down or no Internet connection -- client should always use a cached copy if available, no matter how old. Any copy is better than none.
Use Cache-Control: no-cache and set the ETag header.
The resource will be stored in the cache. This is true of any cache header other than no-store.
no-cache tells the client that it must check with the server to see if the cached copy is valid. It does this by sending a conditional request, which requires that the cached response have an ETag (or Last-Modified) header.
Using a cached copy of a resource when there's no connectivity is the default behavior. You could prevent this with the must-revalidate directive.
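The revalidation flow described above can be sketched from the server's side. This is a minimal sketch, assuming the ETag is derived from a hash of the body and that the client sends a single ETag in If-None-Match; `make_etag` and `respond` are hypothetical names, not part of any framework.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body (one common choice, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET on a resource with this body."""
    etag = make_etag(body)
    headers = {"Cache-Control": "no-cache", "ETag": etag}
    if if_none_match == etag:
        # The cached copy is still valid, so no payload is sent.
        return 304, headers, b""
    return 200, headers, body


# First fetch: no validator yet, so the full body is returned and can be cached.
status, headers, payload = respond(b"v1", None)
# Revalidation with the stored ETag while the file is unchanged.
status_same, _, _ = respond(b"v1", headers["ETag"])
# Revalidation after the file changed on the server.
status_changed, _, new_payload = respond(b"v2", headers["ETag"])
print(status, status_same, status_changed)
```

The client stores the response in all three cases; only the unchanged case avoids transferring the body again.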

Azure Verizon CDN - 100% Cache CONFIG_NOCACHE

I set up an Azure Verizon Premium CDN a few days ago as follows:
Origin: An Azure web app (.NET MVC 5 website)
Settings: Custom Domain, no geo-filtering
Caching Rules: standard-cache (doesn't care about parameters)
Compression: Enabled
Optimized for: Dynamic site acceleration
Protocols: HTTP, HTTPS, custom domain HTTPS
Rules: Force HTTPS via Rules Engine (if request scheme = http, 301 redirect to https://{customdomain}/$1)
So - this CDN has been running for a few days now, but the ADN reports are saying that nearly 100% (99.36%) of the cache status is "CONFIG_NOCACHE" (Description: "The object was configured to never be cached in accordance with customer-specific configurations residing on the edge servers, so the response was served via the origin server.") A few (0.64%) of them are "NONE" (Description: "The cache was bypassed entirely for this request. For instance, the request was immediately rejected by the token auth module, or the client request method used an uncacheable request method such as "PUT".") Also, in the "Cache Hit" report, it says "0 hits, 0 misses" for every day. Nothing is coming through the "HTTP Large" side, only "ADN".
I couldn't find these exact messages while searching around, but I've tried:
Updating cache-control header to max-age, public (ie: cache-control: public,max-age=1209600)
Updating the cache-control header to max-age (cache-control: max-age=1209600)
Updating the expires header to a date way in the future (expires: Tue, 19 Jan 2038 03:14:07 GMT)
Using different browsers so the request cache info is different. In Chrome, the request is "cache-control: no-cache" in my browser. In Firefox, it'll say "Cache-Control: max-age=0". In any case, I'd assume the users on the website wouldn't have these same settings, right?
Refreshing the page a bunch of times, and looking at the real time report to see hits/misses/cache statuses, and it shows the same thing - CONFIG_NOCACHE for almost everything.
Tried running a "worldwide" speed test on https://www.dotcom-tools.com/website-speed-test.aspx, but that had the same result - a bunch of "NOCACHE" hits.
Tried adding ADN rules to set the internal and external max age to 864000 sec (10 days).
Tried adding an ADN rule to ignore "no-cache" requests and just return the cached result.
So, the message for "NOCACHE" says it's a node configuration issue... but I haven't really even configured it! I'm so confused. It could also be an application issue, but I feel like I've tried all the different permutations of "cache-control" that I can. Here's an example of one file that I'd expect to be cached:
Ultimately, I would hope that most of the requests are being cached, so I'd see most of the requests be "TCP Hit". Maybe that's incorrect? Thanks in advance for your help!
So, I eventually figured out this issue. Apparently the Azure Verizon Premium CDN ADN platform has "bypass cache" enabled by default.
To disable this behavior you need to add additional features to your caching rules.
Example:
IF Always
Features:
Bypass Cache Disabled
Force Internal Max-Age Response 200 864000 Seconds
Ignore Origin No-Cache 200

"Accept-Language" header missing in http request from the browser

We have come across an issue in our production logs where the "Accept-Language" header is missing from HTTP requests sent by browsers. I am not able to replicate it, so I want to understand any valid case in which a specific browser may send a request without the "Accept-Language" header.
Even GET / HTTP/1.0 is a valid HTTP request. You can create one from the telnet client if you wish and it will still return a result from the server!
Accept-Language is a header to aid in content negotiation and is optional. The most widely used browsers send the correct headers, but there may be corporate proxies that filter such headers. You should not rely on this header being present.
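Since the header is optional, server code can treat its absence as a normal case. Below is a minimal sketch, assuming a lowercase list of supported languages and a default of "en"; `pick_language` is a hypothetical helper, and q-values are ignored for brevity.

```python
from typing import Mapping, Sequence


def pick_language(headers: Mapping[str, str], supported: Sequence[str], default: str = "en") -> str:
    """Pick a language, tolerating a missing Accept-Language header."""
    raw = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "accept-language":
            raw = value
    # Walk the listed tags in header order, ignoring q-values for brevity.
    for part in raw.split(","):
        tag = part.split(";")[0].strip().lower()
        primary = tag.split("-")[0]
        if tag in supported:
            return tag
        if primary in supported:
            return primary
    return default


print(pick_language({}, ["en", "de"]))
print(pick_language({"Accept-Language": "de-DE,de;q=0.9,en;q=0.8"}, ["en", "de"]))
```

A request with no header, or with only unsupported languages, falls through to the default instead of raising an error.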

Amazon Cloudfront removes Referer header

I am using Amazon CloudFront to deliver some HDS files. I have an origin server which checks the HTTP Referer header and blocks the request if the referer is not allowed.
The problem is that CloudFront is removing the Referer header, so it is not forwarded to the origin.
Is it possible to tell Amazon not to do it?
Within days of writing the answer below, changes have been announced to Cloudfront. Cloudfront will now pass through headers you select and can add some headers of its own.
However, much of what I stated below remains true. Note that in the announcement, an option is offered to forward all headers which, as I suggested, would effectively disable caching. There's also an option to forward specific headers, which will cause Cloudfront to cache the object against the complete set of forwarded headers -- not just the uri -- meaning that the effectiveness of the cache is somewhat reduced, since Cloudfront has no option but to assume that the inclusion of the header might modify the response the server will generate for that request.
Each of your CloudFront distributions now contains a list of headers that are to be forwarded to the origin server. You have three options:
None - This option requests the original behavior.
All - This option forwards all headers and effectively disables all caching at the edge.
Whitelist - This option gives you full control of the headers that are to be forwarded. The list starts out empty, and grows as you add more headers. You can add common HTTP headers by choosing them from a list. You can also add "custom" headers by simply entering the name.
If you choose the Whitelist option, each header that you add to the list becomes part of the cache key for the URLs associated with the distribution. Adding a header to the list simply tells CloudFront that the value of the header can affect the content returned by the origin server.
http://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/
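To make the cache-key effect of the Whitelist option concrete, here is a toy model in which the key is the URI plus the values of the whitelisted headers. The `cache_key` function, the key layout, and the example URI are hypothetical; this only illustrates the idea in the quoted text and is not CloudFront's real key format.

```python
from typing import Iterable, Mapping, Tuple


def cache_key(uri: str, headers: Mapping[str, str], whitelist: Iterable[str]) -> Tuple:
    """Build a toy cache key from the URI plus the whitelisted header values."""
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = tuple(
        (name.lower(), lowered.get(name.lower(), ""))
        for name in sorted(whitelist, key=str.lower)
    )
    return (uri, forwarded)


req_a = {"Referer": "https://example.com/a", "User-Agent": "x"}
req_b = {"Referer": "https://example.com/b", "User-Agent": "y"}

# With nothing whitelisted, both requests share one cached object.
print(cache_key("/video.f4m", req_a, []) == cache_key("/video.f4m", req_b, []))
# Whitelisting Referer splits the cache per referring page.
print(cache_key("/video.f4m", req_a, ["Referer"]) == cache_key("/video.f4m", req_b, ["Referer"]))
```

Each additional whitelisted header multiplies the number of distinct keys, which is why forwarding all headers effectively disables caching at the edge.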
Cloudfront does remove the Referer header along with several others that are not particularly meaningful -- or whose presence would cause illogical consequences -- in the world of cached content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
Just like cookies, if the Referer: header were allowed to remain, such that the origin could see it and react to it, that would imply that the object should be cached based on the request plus the referring page, which would seem to largely defeat the cachability of objects. Otherwise, if the origin did react to an undesired referer and send no-cache responses, that would be all well and good until the first legitimate request came in, the response to which would be served to subsequent requesters regardless of their referer, also largely defeating the purpose.
RFC-2616 Section 13 requires that a cache return a response that has been "checked for equivalence with what the origin server would have returned," and this implies that the response be valid based on all headers in the request.
The same thing goes for User-agent and other headers an origin server might use to modify its response... if you need to react to these values at the origin, there's little obvious purpose for serving them with a CDN.
Referring page-based tests are quite a primitive measure, the way many people use them, since headers are so trivial to forge.
If you are dealing with a platform that you don't control, and this is something you need to override (with a dummy value, just to keep the existing system "happy,") then a reverse proxy in front of the origin server could serve such a purpose, with Cloudfront using the reverse proxy as its origin.
In today's newsletter Amazon announced that it is now possible to forward request headers with CloudFront. See: http://aws.amazon.com/de/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/
