how to cache static assets with revision query in URL

how to cache static assets with revision query in URL - varnish

I want to add a revision number to my static assets so when they're updated browsers will refresh them - how to force browsers reload static assets after code deployment?
Without a revision query string I can see hits in 'varnishstat', with them I see that its passing the request to the backend.
how can I cache static assets with revisions ?

Change your static assets regular expression to something like this:
if (req.url ~ "\.(jpeg|jpg|png|gif|ico|swf|js|css)(\?.*|)$") {
return (lookup);
}
it will match revisions too.

It is a strange needs but the following code should resolve your issue :
sub vcl_recv {
set req.url = regsub(req.url, "(\.(jpeg|jpg|png|gif|ico|swf|js|css))\?.*$", "\1");
}
You will need to clear varnish cache (purge/ban) every time your resource is updated in your backend.

Related

Different Varnish VCL based on URL

I have Magento installation in a subfolder and a WordPress site. How can I use the same Varnish installation for two different CMS?
I have prepared the following default.vcl file code example, but not sure if I am moving in the right direction. Thanks
vcl 4.0;
if (req.url ~ "^/path/to/magento/") {
include "magento.vcl";
} elseif (req.url ~ "^/path/to/wordpress/") {
include "wordpress.vcl";
}
P.S. My Varnish version is 4.0

Before I continue, please be ware of the fact that Varnish 4 is end-of-life. Any bug or error you encounter will not be fixed and can result in unexpected behavior. The best way forward is by upgrading to Varnish 6.
That being said, you are definitely moving in the right direction. But please make sure the included logic is put inside a subroutine.
Your example code will typically live inside sub vcl_recv {}. If you have logic that belongs in other subroutines (like sub vcl_backend_response {} for example), please make sure you include that logic as well.

Varnish clear cached items based on status code

We're caching 404 for images as sometimes our app would be released ahead of the actual images and would like to be able to clear them based on status code rather than ALL the images or specific images one by one.
However I am new to Varnish an unsure whether that is doable as I couldn't find any specific documentation on clearing based on status code.

you can either PURGE and image or BAN it.
Purging: it deletes a specific object from cache and to do so you will need to know the host and the URL of the specific object you want to purge.
Banning: to ban you can use regex and for your use case something among those lines should work.
In vcl_recv:
if (req.method == "BAN") {
ban("req.status == "404");
}

It seems that purge method is just an overlay on vcl's ban.
Using varnishadmn to test I've found to purge specific status, code only obj.status is accepted.
varnishadm ban obj.status == 404
verify with:
varnishadm ban.list

How do I cache bust imported modules in es6?

ES6 modules allows us to create a single point of entry like so:
// main.js
import foo from 'foo';
foo()
<script src="scripts/main.js" type="module"></script>
foo.js will be stored in the browser cache. This is desirable until I push a new version of foo.js to production.
It is common practice to add a query string param with a unique id to force the browser to fetch a new version of a js file (foo.js?cb=1234)
How can this be achieved using the es6 module pattern?

There is one solution for all of this that doesn't involve query string. let's say your module files are in /modules/. Use relative module resolution ./ or ../ when importing modules and then rewrite your paths in server side to include version number. Use something like /modules/x.x.x/ then rewrite path to /modules/. Now you can just have global version number for modules by including your first module with
<script type="module" src="/modules/1.1.2/foo.mjs"></script>
Or if you can't rewrite paths, then just put files into folder /modules/version/ during development and rename version folder to version number and update path in script tag when you publish.

HTTP headers to the rescue. Serve your files with an ETag that is the checksum of the file. S3 does that by default at example.
When you try to import the file again, the browser will request the file, this time attaching the ETag to a "if-none-match" header: the server will verify if the ETag matches the current file and send back either a 304 Not Modified, saving bandwith and time, or the new content of the file (with its new ETag).
This way if you change a single file in your project the user will not have to download the full content of every other module. It would be wise to add a short max-age header too, so that if the same module is requested twice in a short time there won't be additional requests.
If you add cache busting (e.g. appending ?x={randomNumber} through a bundler, or adding the checksum to every file name) you will force the user to download the full content of every necessary file at every new project version.
In both scenario you are going to do a request for each file anyway (the imported files on cascade will produce new requests, which at least may end in small 304 if you use etags). To avoid that you can use dynamic imports e.g if (userClickedOnSomethingAndINeedToLoadSomeMoreStuff) { import('./someModule').then('...') }

From my point of view dynamic imports could be a solution here.
Step 1)
Create a manifest file with gulp or webpack. There you have an mapping like this:
export default {
"/vendor/lib-a.mjs": "/vendor/lib-a-1234.mjs",
"/vendor/lib-b.mjs": "/vendor/lib-b-1234.mjs"
};
Step 2)
Create a file function to resolve your paths
import manifest from './manifest.js';
const busted (file) => {
return manifest[file];
};
export default busted;
Step 3)
Use dynamic import
import busted from '../busted.js';
import(busted('/vendor/lib-b.mjs'))
.then((module) => {
module.default();
});
I give it a short try in Chrome and it works. Handling relative paths is tricky part here.

I've created a Babel plugin which adds a content hash to each module name (static and dynamic imports).
import foo from './js/foo.js';
import('./bar.js').then(bar => bar());
becomes
import foo from './js/foo.abcd1234.js';
import('./bar.1234abcd.js').then(bar => bar());
You can then use Cache-control: immutable to let UAs (browsers, proxies, etc) cache these versioned URLs indefinitely. Some max-age is probably more reasonable, depending on your setup.
You can use the raw source files during development (and testing), and then transform and minify the files for production.

what i did was handle the cache busting in webserver (nginx in my instance)
instead of serving
<script src="scripts/main.js" type="module"></script>
serve it like this where 123456 is your cache busting key
<script src="scripts/123456/main.js" type="module"></script>
and include a location in nginx like
location ~ (.+)\/(?:\d+)\/(.+)\.(js|css)$ {
try_files $1/$2.min.$3 $uri;
}
requesting scripts/123456/main.js will serve scripts/main.min.js and an update to the key will result in a new file being served, this solution works well for cdns too.

Just a thought at the moment but you should be able to get Webpack to put a content hash in all the split bundles and write that hash into your import statements for you. I believe it does the second by default.

You can use an importmap for this purpose. I've tested it at least in Edge. It's just a twist on the old trick of appending a version number or hash to the querystring. import doesn't send the querystring onto the server but if you use an importmap it will.
<script type="importmap">
{
"imports": {
"/js/mylib.js": "/js/mylib.js?v=1",
"/js/myOtherLib.js": "/js/myOtherLib.js?v=1"
}
}
</script>
Then in your calling code:
import myThing from '/js/mylib.js';
import * as lib from '/js/myOtherLib.js';

You can use ETags, as pointed out by a previous answer, or alternatively use Last-Modified in relation with If-Modified-Since.
Here is a possible scenario:
The browser first loads the resource. The server responds with Last-Modified: Sat, 28 Mar 2020 18:12:45 GMT and Cache-Control: max-age=60.
If the second time the request is initiated earlier than 60 seconds after the first one, the browser serves the file from cache and doesn't make an actual request to the server.
If a request is initiated after 60 seconds, the browser will consider cached file stale and send the request with If-Modified-Since: Sat, 28 Mar 2020 18:12:45 GMT header. The server will check this value and:
If the file was modified after said date, it will issue a 200 response with the new file in the body.
If the file was not modified after the date, the server will issue a304 "not modified" status with empty body.
I ended up with this set up for Apache server:
<IfModule headers_module>
<FilesMatch "\.(js|mjs)$">
Header set Cache-Control "public, must-revalidate, max-age=3600"
Header unset ETag
</FilesMatch>
</IfModule>
You can set max-age to your liking.
We have to unset ETag. Otherwise Apache keeps responding with 200 OK every time (it's a bug). Besides, you won't need it if you use caching based on modification date.

A solution that crossed my mind but I wont use because I don't like it LOL is
window.version = `1.0.0`;
let { default: fu } = await import( `./bar.js?v=${ window.version }` );
Using the import "method" allows you to pass in a template literal string. I also added it to window so that it can be easily accessible no matter how deep I'm importing js files. The reason I don't like it though is I have to use "await" which means it has to be wrapped in an async method.

If you are using Visual Studio 2022 and TypeScript to write your code, you can follow a convention of adding a version number to your script file names, e.g. MyScript.v1.ts. When you make changes and rename the file to MyScript.v2.ts Visual Studio shows the following dialog similar to the following:
If you click Yes it will go ahead and update all the files that were importing this module to refer to MyScript.v2.ts instead of MyScript.v1.ts. The browser will notice the name change too and download the new modules as expected.
It's not a perfect solution (e.g. if you rename a heavily used module, a lot of files can end up being updated) but it is a simple one!

this work for me
let url = '/module/foo.js'
url = URL.createObjectURL(await (await fetch(url)).blob())
let foo = await import(url)

I came to the conclusion that cache-busting should not be used with ES Module.
Actually, if you have the versioning in the URL, the version is acting like a cache-busting. For instance https://unpkg.com/react#18.2.0/umd/react.production.min.js
If you don't have versioning in the URL, use the following HTTP header Cache-Control: max-age=0, no-cache to force the browser to always check if a new version of the file is available.
no-cache tells the browser to cache the file but to always perform a check
no-store tells the browser to don't cache the file. Don't use it!
Another approach: redirection
unpkg.com solved this problem with HTTP redirection.
Therefore it is not an ideal solution because it involves 2 HTTP requests instead of 1.
The first request is to get redirected to the latest version of the file (not cached, or cached for a short period of time)
The second request is to get the JS file (cached)
=> All JS files include the versioning in the URL (and have an aggressive caching strategy)
For instance https://unpkg.com/react#18.2.0/umd/react.production.min.js
=> Removing the version in the URL, will lead to a HTTP 302 redirect pointing to the latest version of the file
For instance https://unpkg.com/react/umd/react.production.min.js
Make sure the redirection is not cached by the browser, or cached for a short period of time. (unpkg allows 600 seconds of caching, but it's up to you)
About multiple HTTP requests: Yes, if you import 100 modules, your browser will do 100 requests. But with HTTP2 / HTTP3, it is not a problem anymore because all requests will be multiplexed into 1 (it is transparent for you)
About recursion:
If the module you are importing also imports other modules, you will want to check about <link rel="modulepreload"> (source Chrome dev blog).
The modulepreload spec actually allows for optionally loading not just the requested module, but all of its dependency tree as well. Browsers don't have to do this, but they can.
If you are using this technic in production, I am deeply interested to get your feedback!

Append version to all ES6 imports with PHP
I didn't want to use a bundler only because of this, so I created a small function that modifies the import statements of all the JS files in the given directory so that the version is at the end of each file import path in the form of a query parameter. It will break the cache on version change.
This is far from an ideal solution, as all JS file contents are verified by the server on each request and on each version change the client reloads every JS file that has imports instead of just the changed ones.
But it is good enough for my project right now. I thought I'd share.
$assetsPath = '/public/assets'
$version = '0.7';
$rii = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($assetsPath, FilesystemIterator::SKIP_DOTS) );
foreach ($rii as $file) {
if (pathinfo($file->getPathname())['extension'] === 'js') {
$content = file_get_contents($file->getPathname());
$originalContent = $content;
// Matches lines that have 'import ' then any string then ' from ' and single or double quote opening then
// any string (path) then '.js' and optionally numeric v GET param '?v=234' and '";' at the end with single or double quotes
preg_match_all('/import (.*?) from ("|\')(.*?)\.js(\?v=\d*)?("|\');/', $content, $matches);
// $matches array contains the following:
// Key [0] entire matching string including the search pattern
// Key [1] string after the 'import ' word
// Key [2] single or double quotes of path opening after "from" word
// Key [3] string after the opening quotes -> path without extension
// Key [4] optional '?v=1' GET param and [5] closing quotes
// Loop over import paths
foreach ($matches[3] as $key => $importPath) {
$oldFullImport = $matches[0][$key];
// Remove query params if version is null
if ($version === null) {
$newImportPath = $importPath . '.js';
} else {
$newImportPath = $importPath . '.js?v=' . $version;
}
// Old import path potentially with GET param
$existingImportPath = $importPath . '.js' . $matches[4][$key];
// Search for old import path and replace with new one
$newFullImport = str_replace($existingImportPath, $newImportPath, $oldFullImport);
// Replace in file content
$content = str_replace($oldFullImport, $newFullImport, $content);
}
// Replace file contents with modified one
if ($originalContent !== $content) {
file_put_contents($file->getPathname(), $content);
}
}
}
$version === null removes all query parameters of the imports in the given directory.
This adds between 10 and 20ms per request on my application (approx. 100 JS files when content is unchanged and 30—50ms when content changes).

Use of relative path works for me:
import foo from './foo';
or
import foo from './../modules/foo';
instead of
import foo from '/js/modules/foo';
EDIT
Since this answer is down voted, I update it. The module is not always reloaded. The first time, you have to reload the module manually and then the browser (at least Chrome) will "understand" the file is modified and then reload the file every time it is updated.

varnish not kicking in for www-version of domain

I have setup varnish on a server that runs two sites on two different domains, varnish works perfect without www in front of the two domains, I have attached the vcl file in this pastebin, I guess it's a basic misconfiguration somewhere, but I can't figure out where - does anyone know of a solution?
http://pastebin.com/CF37isis

To me the configuration looks fine and you should not get any troubles with/without www. Are you sure the DNS is pointing to your varnish server for the www? Of course there's also the possibility that your application acts differently and sets extra cookies/headers on the www request.
Further you should really redirect one of the www/non-www to the other so that only one is always used but both works (can also easily be done with varnish, or also probably in your DNS providers settings).
A redirect in varnish could look something like this (do not this needs to be added in your current vcl_recv/vlc_error and you should not add new blocks):
sub vcl_recv {
if (req.http.host == "www.somedomain.com") {
set req.http.x-Redir-Url = "http://somedomain.com" + req.url;
error 750 req.http.x-Redir-Url;
}
if (req.http.host == "www.someotherdomain.com") {
set req.http.x-Redir-Url = "http://someotherdomain.com" + req.url;
error 750 req.http.x-Redir-Url;
}
}
sub vcl_error {
if (obj.status == 750) {
set obj.http.Location = obj.response;
set obj.status = 301;
return (deliver);
}
}

Infinite redirection loop in varnish cache

we recently have put Varnish in front of our Drupal because the server was suffering heavy load and we are very pleased in general.
The only problem remaining is that we sometimes have an infinite redirection loop in the cached data. We have found this through our HTTP-Monitoring. We check the front page every minute. The page in the cache sometimes contains the full front page, but with a Location header set, that sends the user to the front page again.
We are not quite sure what could cause this, but also have no clue on how could track this down. Of course, the best way to handle this would be on the drupal side, but we can't really tell why this does happen.
Is there a way to log the cases when this happens? Or is it possible to detect this in varnish and mark the current cache content as invalid?
Of course, we don't want to always pass intentional redirects to the origin server, but the ones that would cause an infinite loop.
I hope to hear some ideas how we can further track this down. Many thank in advance for all kinds of hints.

I have found a workaround for this:
sub vcl_fetch {
// Fix a strange problem: HTTP 301 redirects to the same page sometimes go in$
if (beresp.http.Location == "http://" + req.http.host + req.url) {
if (req.restarts > 2) {
unset beresp.http.Location;
#set beresp.http.X-Restarts = req.restarts;
} else {
return (restart);
}
}
}
I give the backend a second (and thirhd) chance to return a proper page. If that fails as well, the Location header is removed. This works, because the proper page is served with just an additional invalid Location header.

The accepted answer by #philip updated for Varnish 4:
sub vcl_backend_response {
#Fix a strange problem: HTTP 301 redirects to the same page sometimes go in$
if (beresp.http.Location == "http://" + bereq.http.host + bereq.url) {
if (bereq.retries > 2) {
unset beresp.http.Location;
#set beresp.http.X-Restarts = bereq.retries;
} else {
return (retry);
}
}
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string