Pagespeed and cache-control in response - mod-pagespeed

I have enabled the pagespeed module and find that for some resources (images, JS and CSS) rewritten by pagespeed, the cache lifetime is set to the default 5 minutes. A few other resources (images, JS and CSS) rewritten by pagespeed have Cache-Control: max-age=31536000.
I explicitly set ExpiresDefault to 1 year for all my static resources in .htaccess.
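For reference, a minimal .htaccess along those lines (assuming mod_expires is enabled; the exact directives in the original setup aren't shown):
<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresDefault "access plus 1 year"
</IfModule>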
The response I get has this:
Cache-Control:max-age=300,private
I am expecting:
Cache-Control:max-age=31536000,private
Suggestions and pointers are appreciated.

mod_pagespeed only serves responses with Cache-Control: max-age=300,private when the hash in the URL doesn't match the content. This can happen normally when A) the contents of the resource changed recently, so there is a mixture of requests for both the old and new URLs for some time, or B) the rewriting does not finish in time when serving the resource.
This is most likely to happen if the resource request goes to a different server than the HTML request. You can try flushing the cache and seeing if this clears things up.

Related

Cache-Control headers in a Cloudflare Workers site

I'm just testing Workers sites with a Hugo-created static site. It already existed, so I used the docs’ instructions for adapting an existing site. The cache-control headers for the woff2 and css files all show up with no-cache, contrary to what I'd have expected based on https://support.cloudflare.com/hc/en-us/articles/200172516#h_a01982d4-d5b6-4744-bb9b-a71da62c160a. The Workers site in question is https://hosts-test-hugo.brycewray.workers.dev/.
I found the following at https://levelup.gitconnected.com/use-cloudflare-javascript-workers-to-deploy-you-static-generated-site-ssg-1c518e078646 but don't know if it's related:
A Cloudflare Worker is a piece of JavaScript code that runs every time you access a specific route on a website proxied by Cloudflare. The code is executed on every request before they reach Cloudflare’s cache. This means Worker responses are not cached (although requests made by the worker to other web services might be cached with the appropriate caching headers).
Does the site need to have a custom domain — i.e., rather than being a “.workers.dev” URL — before it will have normal caching behavior? Is that even related?
[Note: I am posting this here because I’ve been unsuccessful in getting a response on either the Cloudflare community forum or the Cloudflare subreddit — hoping for better results here.]
Thanks again to Cloudflare’s Kenton Varda for his answer here, without which I’d have been totally stuck. I also thank Brian Li for additional and equally valuable help he provided separately.
Adding the following for others who may find it useful...
The remaining problem I encountered was that I didn't know how to assign differing cache-control values (such as one month, or 2592000 seconds) to most of the static assets while leaving the HTML at a much smaller value (like 3600 seconds, or 0). Then, a few days later, I found the answer in a comment on Issue #81 of the Cloudflare KV-Asset-Handler repository. So, now, I have the following code within my Worker's index.js file:
options.cacheControl = {
  browserTTL: 0,
  edgeTTL: 0,
  bypassCache: false // default
}
const filesRegex = /(.*\.(ac3|avi|bmp|br|bz2|css|cue|dat|doc|docx|dts|eot|exe|flv|gif|gz|ico|img|iso|jpeg|jpg|js|json|map|mkv|mp3|mp4|mpeg|mpg|ogg|pdf|png|ppt|pptx|qt|rar|rm|svg|swf|tar|tgz|ttf|txt|wav|webp|webm|webmanifest|woff|woff2|xls|xlsx|xml|zip))$/
if (url.pathname.match(filesRegex)) {
  options.cacheControl.edgeTTL = 2592000
  options.cacheControl.browserTTL = 2592000
}
If you look at that linked issue comment and wonder what’s different: the only thing is that I removed html (and, to be safe, htm) from the list of extensions. As a result, my Worker site’s HTML has zero caching while each CSS, font, or image file has a one-month cache-control setting — exactly the desired result. Note: The vast majority of the site’s images are hosted elsewhere, but I do still host a small number for favicons and fallback in general.
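These options only take effect because, further down in handleEvent, they are passed to the asset handler, roughly as in the default generated Worker:
const page = await getAssetFromKV(event, options)
return new Response(page.body, page)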
By default, Workers Sites does not serve a Cache-Control header, but you can customize it to do so. (EDIT to clarify: Workers Sites are cached on Cloudflare's edge by default, and support "etags" for revalidation. The Cache-Control header controls whether they are also cached in the browser without requiring revalidation.)
Note that Workers Sites works very differently from using Cloudflare with a classic origin server. If you're reading something about caching on Cloudflare, but it doesn't specifically mention Workers Sites, then it probably does not apply to Workers Sites.
With Workers Sites, your site is served by a Cloudflare Worker -- code that runs directly on Cloudflare's servers. So, you have no "origin" server behind Cloudflare, and Cloudflare's cache doesn't work in the normal way. The Worker code is completely responsible for serving the content, including setting any headers like Cache-Control.
In fact, when you create a new Workers Sites project using wrangler, the code for this Worker is generated for you -- but you are allowed to edit it! You can customize the code all you want to do whatever you want. The code for the Worker is found in your project directory under workers-site/index.js; it is initialized as a copy of the default template file from GitHub.
This worker code depends on a library (npm module) called @cloudflare/kv-asset-handler to do most of the work. This library can be customized to handle caching in various ways through the cacheControl option.
But where do you set this option? Well, in your worker code!
Open up workers-site/index.js and look for the part that looks like this:
async function handleEvent(event) {
  const url = new URL(event.request.url)
  let options = {}
  /**
   * You can add custom logic to how we fetch your assets
   * by configuring the function `mapRequestToAsset`
   */
  // options.mapRequestToAsset = handlePrefix(/^\/docs/)
The comment mentions one way that you can use options to customize how your site is served, but you can also set cacheControl here. Try adding this:
options.cacheControl = {
  browserTTL: 3600 // 1 hour
}
Now re-deploy your site, and you should see assets are served with Cache-Control: max-age=3600. Of course, this means that your content may be cached in people's browsers for up to an hour (3600 seconds); you may prefer a longer or shorter period.
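Putting it together, a minimal sketch of what workers-site/index.js can look like with this option set (simplified; the generated file includes more error handling and the mapRequestToAsset comments):
import { getAssetFromKV } from '@cloudflare/kv-asset-handler'

addEventListener('fetch', event => {
  event.respondWith(handleEvent(event))
})

async function handleEvent(event) {
  let options = {}
  // Allow browsers to cache assets for an hour without revalidating.
  options.cacheControl = {
    browserTTL: 3600 // 1 hour
  }
  try {
    const page = await getAssetFromKV(event, options)
    return new Response(page.body, page)
  } catch (e) {
    return new Response('Not Found', { status: 404 })
  }
}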
Note that if you aren't a programmer, this may all seem a bit daunting. Workers Sites is really designed for people who want to be able to customize how their sites are served by editing JavaScript code. If you're not interested in writing code, you will be limited to the default behavior, which may or may not suit your needs.

How to stop index.html being cached by Azure CDN

I am using Azure CDN to host a static website I am building.
It's great, other than the fact that when I update my web app, the old page is cached and so the old version is still shown.
I have added the following cache rule in the rules engine to make it refresh every 60 seconds, but this does nothing and I still get the old content; the only way to get the new content is to use an incognito browser window.
Anyone have any ideas? It's driving me crazy!
Here is a screenshot of the browser dev window when I hit the index.html page. I can't see any cache-control headers here; I would think that Azure CDN would/should be putting these on. Is that incorrect?
The rule you are modifying controls the "internal max age". If a file shows up correctly in incognito mode, this rule is working fine. You have to set "external max age" to control the Cache-Control header.
https://learn.microsoft.com/en-us/azure/cdn/cdn-verizon-premium-rules-engine-reference-features
It looks like it is not Azure CDN that is caching index.html; it is your browser. Ensure that the Cache-Control header is returned correctly by checking it in the developer tools.
https://learn.microsoft.com/en-us/azure/cdn/cdn-manage-expiration-of-cloud-service-content
https://learn.microsoft.com/en-us/azure/cdn/cdn-manage-expiration-of-blob-content

Should Content-Security-Policy header be in every server response or only in text/html?

Should Content-Security-Policy header be in every server response (images, CSS, JS, ...) or only in text/html (.html or HTML output of PHP script)?
Since CSP is a client-side protection and is only processed by browsers for HTML documents (whether static or dynamically created by PHP or the like), there is no need to have this header on anything but text/html documents.
In fact, as CSP policies can be quite large, there are bandwidth savings to be had by only serving it in HTML document responses.
The one exception at present to this is web workers. However, if you are not using them, then you can ignore them for now.
Note the current CSP draft spec says in the goals section that CSP is used to give control over:
The resources which can be requested (and subsequently embedded or executed) on behalf of a specific Document or Worker
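To illustrate serving the header only on HTML responses, here is a rough sketch using Express (the framework and the policy string are assumptions for the example, not something from the question):
const express = require('express')
const app = express()

// Static assets (images, CSS, JS) are served without a CSP header, saving bandwidth.
app.use('/assets', express.static('public'))

// The policy is attached only to responses that are HTML documents.
app.get('/', (req, res) => {
  res.set('Content-Security-Policy', "default-src 'self'")
  res.send('<!doctype html><title>Home</title><p>Hello</p>')
})

app.listen(3000)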

Express JS - Static files not being served from cache

I'm running an application on Express and my browser keeps fetching files that should already have been cached. The status code for the offending files is 304 and the size is consistently 220 B / 221 B. Other resources (that are getting served properly) are showing '(from cache)'.
A bit more information: the ETags / file contents haven't changed and I've set some response headers.
res.set('Cache-Control', 'max-age=345600');
res.set('Expires', new Date(Date.now() + 345600000).toUTCString());
Admittedly, I'm no HTTP expert, but maybe someone can help me understand why this might be happening?
Essentially, the browser IS caching and serving the cached bundles (although it doesn't display the "from cache" message). In order to serve them, it sends a conditional request to the server to check whether the file has changed. If it hasn't changed, the server sends a 304 response code and the browser pulls the file from cache. This takes about 15-50 ms, so it's not a substantial performance impact.
However, I CAN force the browser to serve the file without sending a validation request (as happens with externally hosted libraries, for example). That would require setting Expires/Cache-Control headers far in the future, time-stamping the filenames of static assets, and serving them dynamically (by maybe writing the updated filenames to a configuration file or something like that), but I think this would be more trouble than it's worth, honestly.
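For anyone who does want the far-future approach described above, a rough sketch with Express's static middleware (the directory name and one-year lifetime are assumptions; the filenames would need a content hash or timestamp so that updated files get new URLs):
const express = require('express')
const app = express()

// Fingerprinted assets (e.g. bundle.3f9c2b.js) can be cached for a year without revalidation.
app.use('/static', express.static('dist', {
  maxAge: '365d',  // produces Cache-Control: public, max-age=31536000
  immutable: true  // adds the immutable directive so browsers skip the conditional request
}))

app.listen(3000)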
Just posting this response for anyone who runs into the same issue.

What are HTTP cache best practices for a high-traffic static site?

We have a fairly high-traffic static site (i.e. no server code) with lots of images, scripts and CSS, hosted by IIS 7.0.
We'd like to turn on some caching to reduce server load, and are considering setting the expiry of web content to some time in the future. In IIS, we can do this at a global level via the "Expire web content" section of the common HTTP headers in the IIS response headers module. Perhaps setting content to expire 7 days after serving.
All this actually does, as far as I can tell, is set the max-age HTTP response header, which makes sense, I guess.
Now, the confusion:
Firstly, all browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if content has changed. So I'm not entirely sure what the max-age response header will actually affect. Could it be older browsers? Or web caches?
It is possible that we may want to change an image on the site at short notice... I'm guessing that if the max-age is actually honoured by something, then by its very nature it won't check whether this image has changed for 7 days... so that's not what we want either.
I wonder if a best practice is to partition one's site into folders of content that really won't change often, and only turn on long-term expiry for those folders? Perhaps also varying the querystring to force a refresh of content in these folders if needed (e.g. /assets/images/background.png?version=2)?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
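A rough sketch of that renaming idea in Node (the hashing scheme and paths here are assumptions; most build tools will do this for you automatically):
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Copy an asset to a name derived from its contents, e.g. background.png -> background.a1b2c3d4.png.
// The HTML is then updated to reference the new name, so stale cached copies are simply never requested.
function fingerprint(file) {
  const hash = crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex').slice(0, 8)
  const ext = path.extname(file)
  const out = file.slice(0, -ext.length) + '.' + hash + ext
  fs.copyFileSync(file, out)
  return out
}

console.log(fingerprint('assets/images/background.png'))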
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a server to six (6). This means if you have 30 or more resources that are all hosted on the same website, only 6 are being downloaded at any time until the page is loaded. You can speed this up a bit by using a content delivery network (CDN) so that everything downloads at once.
