Caching a specific URL path in Varnish fails - varnish

I've set up some VCL to only cache the /js/ folder.
sub vcl_backend_response {
    if (bereq.url ~ "^/js/.$") {
        unset beresp.http.set-cookie;
        set beresp.http.cache-control = "max-age = 2592000";
        set beresp.ttl = 1y;
    } else {
        set beresp.http.cache-control = "max-age = 0";
        set beresp.ttl = 0s;
    }
}
When I check the cache status of items in the js folder after reloading a few times, nothing has been cached and the cache control header shows 0.
# curl -I localhost:6081/js/themes.js
HTTP/1.1 200 OK
Content-Type: application/javascript
Etag: W/"1655132873"
Last-Modified: Mon, 13 Jun 2022 15:07:53 GMT
Accept-Ranges: bytes
X-Content-Type-Options: nosniff
Content-Length: 656
Date: Thu, 11 Aug 2022 19:13:50 GMT
cache-control: max-age = 0
Vary: Accept-Encoding
X-Varnish: 5
Age: 0
Via: 1.1 varnish (Varnish/6.5)
Connection: keep-alive
Any idea on how to fix this?

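The vcl_backend_response subroutine is called before an object is stored in the cache, so it is the right place for this logic. For URLs that match the condition, the VCL you provided would at least ensure that the JavaScript ends up in the cache.
However, your ^/js/.$ regular expression will not match the right URLs. A single dot matches exactly one character, so the pattern only matches URLs such as /js/a, and /js/themes.js falls through to the else branch. That is why the response shows max-age = 0. ^/js/.*$ would be a better match.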
Built-in VCL for vcl_backend_response
There may also be other factors that prevent the object from being stored in the cache. If you take a look at the following tutorial about Varnish's built-in VCL, you'll see what logic is applied: https://www.varnish-software.com/developers/tutorials/varnish-builtin-vcl/#11-vcl_backend_response
If the response contains a Cache-Control: no-cache, Cache-Control: no-store or Cache-Control: private header, Varnish will still not store the object in the cache.
To prevent this from happening, you can call return(deliver). This will bypass any other built-in VCL logic.
Here's what your VCL code would look like:
sub vcl_backend_response {
    if (bereq.url ~ "^/js/.*$") {
        unset beresp.http.set-cookie;
        # No whitespace around "=" in Cache-Control directives
        set beresp.http.cache-control = "max-age=2592000";
        set beresp.ttl = 1y;
        return(deliver);
    } else {
        set beresp.http.cache-control = "max-age=0";
        # Mark the response as uncacheable instead of using a zero TTL
        set beresp.ttl = 120s;
        set beresp.uncacheable = true;
        return(deliver);
    }
}
However, built-in VCL logic mostly makes sense, so I'd be very cautious when bypassing it.
You probably noticed that I removed set beresp.ttl = 0s from the VCL example and replaced it with set beresp.uncacheable = true. That's because you should never set the TTL to zero. For reasons that are not relevant to this question, a zero TTL makes Varnish perform really poorly and increases the number of backend fetches.
Built-in VCL for vcl_recv
While we discussed how to force objects to be stored in the cache through vcl_backend_response logic, that doesn't mean the object will be served from the cache. That's where the built-in VCL for vcl_recv comes into play.
See https://www.varnish-software.com/developers/tutorials/varnish-builtin-vcl/#1-vcl_recv for a tutorial about the vcl_recv built-in VCL.
If a request contains a Cookie or an Authorization header, Varnish will not serve the object from cache, despite it being static content.
To bypass this behavior, you could also add the following code:
sub vcl_recv {
    if (req.url ~ "^/js/.*$") {
        unset req.http.Cookie;
        unset req.http.Authorization;
        return(hash);
    }
}
This will ensure all content from the /js/ folder will be served from the cache.
VCL template for caching static data
If it's your goal to cache static data regardless of the folder it is stored in, you can use the following VCL template:
sub vcl_recv {
    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset req.http.Cookie;
        unset req.http.Authorization;
        return(hash);
    }
}
sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }
}
It comes from the following tutorial: https://www.varnish-software.com/developers/tutorials/example-vcl-template/#13-caching-static-content.
You can still adjust it by increasing the TTL or by prefixing the specific folder in the regex.

Related

Varnish 6 missing requests for same URL coming from different browsers

This is what my varnish.vcl looks like.
vcl 4.0;
import directors;
import std;
backend client {
    .host = "service1";
    .port = "80";
}
sub vcl_recv {
    std.log("varnish log info:" + req.http.host);
    # caching pages in client
    set req.backend_hint = client;
    # If request is for content or for pages, remove headers and cache
    if ((req.url ~ "/content/") || (req.url ~ "/cms/api/") || req.url ~ "\.(png|gif|jpg|jpeg|json|ico)$" || (req.url ~ "/_nuxt/") ) {
        unset req.http.Cookie;
        std.log("Cachable request");
    }
    # If request is not from above do not cache and pass to Backend.
    else
    {
        std.log("Non cachable request");
        return (pass);
    }
}
sub vcl_backend_response {
    if ((bereq.url ~ "/content/") || (bereq.url ~ "/cms/api/") || bereq.url ~ "\.(png|gif|jpg|jpeg|json|ico)$" || (bereq.url ~ "/_nuxt/") )
    {
        unset beresp.http.set-cookie;
        set beresp.http.cache-control = "public, max-age=259200";
        set beresp.ttl = 12h;
        return (deliver);
    }
}
# Add some debug info headers when delivering the content:
# X-Cache: if content was served from Varnish or not
# X-Cache-Hits: Number of times the cached page was served
sub vcl_deliver {
    # Was a HIT or a MISS?
    if ( obj.hits > 0 )
    {
        set resp.http.X-Cache-Varnish = "HIT";
    }
    else
    {
        set resp.http.X-Cache-Varnish = "MISS";
    }
    # And add the number of hits in the header:
    set resp.http.X-Cache-Hits = obj.hits;
}
If I hit a page repeatedly from the same browser, the network tab shows:
X-Cache-Varnish = "HIT";
X-Cache-Hits = ;
Let's say I hit it from Chrome 10 times; this is what I get:
X-Cache-Varnish = "HIT";
X-Cache-Hits = 9;
It shows 9 because the first request was a miss and the remaining 9 were served from cache.
If I try an incognito window or a different browser, it gets its own count starting from 0. I think I am somehow still caching on cookies, but I could not identify what I am missing.
Ideally, I want to delete all cookies for specific paths, but somehow unset does not seem to be working for me.
If you really want to make sure these requests are cached, make sure you do a return(hash); in your if-statement.
If you don't return, the built-in VCL will take over, and continue executing its standard behavior.
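A minimal sketch of that change, applied to the vcl_recv from the question. The GET/HEAD guard is an assumption added here, because return (hash) skips the built-in checks that would otherwise pass other request methods to the backend:

```vcl
sub vcl_recv {
    set req.backend_hint = client;

    # Assumption: only GET and HEAD requests should ever be looked up in cache.
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    if (req.url ~ "/content/" || req.url ~ "/cms/api/" ||
        req.url ~ "\.(png|gif|jpg|jpeg|json|ico)$" || req.url ~ "/_nuxt/") {
        unset req.http.Cookie;
        # Go straight to the cache lookup; the built-in vcl_recv is skipped.
        return (hash);
    }

    # Everything else bypasses the cache.
    return (pass);
}
```

With an explicit return in every branch, the built-in vcl_recv never runs for these requests, so a leftover Authorization header or similar can no longer turn a cacheable request into a pass.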
Apart from that, it's unclear whether or not your backend sets a Vary header which might affect your hit rate.
Instead of guessing, I suggest we use the logs to figure it out.
Run the following command to track your requests:
varnishlog -g request -q "ReqUrl ~ '^/content/'"
This statement's VSL Query expression assumes the URL starts with /content. Please adjust accordingly.
Please send me an extract of varnishlog for 1 specific URL, but also for both situations:
The one that hits the cache on a regular browser tab
The one that results in a cache miss in incognito mode or from a different browser
The logs will give more context and explain what happened.

My varnish config doesn't seem to be working properly

I'm very new to Varnish and was recently handed a task: HTTP caching for a local magazine website (the tech stack is JavaScript + PHP). I'm trying to use Varnish 4 to cache the website. The requirements are that any new article appears on the front end immediately, any deleted article is removed from the front end immediately, any change to the website's current layout is applied directly (articles can be dragged anywhere on the site as their popularity changes), and any change to an existing article shows up immediately.
As you can see in the config below, in the vcl_recv block I tried to use return(purge) for POST requests, because new articles and article changes are submitted via POST requests. But it doesn't work at all: when I create new dummy content or change an existing article, the cache is not purged and the fresh content is not shown, even though the POST request succeeds. On the backend side, I also tried if (beresp.status == 404) for deleted articles, but that doesn't work either: when I delete the dummy article I created, I still see the stale content. How should I change my config to get all of this working? Thank you.
My Varnish config is:
import directors;
import std;
backend server1 {
    .host = "<some ip>";
    .port = "<some port>";
}
sub vcl_init {
    new bar = directors.round_robin();
    bar.add_backend(server1);
}
sub vcl_recv {
    set req.backend_hint = bar.backend();
    if (req.http.Cookie == "") {
        unset req.http.Cookie;
    }
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    if (req.url ~ "\.(css|js|png|gif|jp(e)?g|swf|ico)") {
        unset req.http.cookie;
    }
    if (req.url ~ "\.*") {
        unset req.http.cookie;
    }
    if (req.method == "POST") {
        return(purge);
    }
}
sub vcl_deliver {
    # A bit of debugging info.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    }
    else {
        set resp.http.X-Cache = "MISS";
    }
}
sub vcl_backend_response {
    set beresp.grace = 1h;
    set beresp.ttl = 120s;
    if (bereq.url ~ "\.*") {
        unset beresp.http.Set-Cookie;
        unset beresp.http.Cache-Control;
    }
    if (bereq.method == "POST") {
        return(abandon);
    }
    if (beresp.status == 404) {
        return(abandon);
    }
    return (deliver);
}
No need to use the director if you only have one backend. Varnish will automatically select the backend you declared if there's only 1 backend.
Purging content
The POST purge call you're doing is not ideal. Please have a look at the following page to learn more about content invalidation in Varnish: https://varnish-cache.org/docs/6.0/users-guide/purging.html#http-purging
The snippet on that page contains an ACL to protect your platform from unauthorized purges.
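For reference, the approach on that page can be sketched roughly as follows. The addresses in the ACL are placeholders and need to be replaced with the hosts that are actually allowed to purge, for example the server running the CMS:

```vcl
# Hosts that are allowed to send PURGE requests (placeholder values).
acl purge {
    "localhost";
    "192.168.55.0"/24;
}

sub vcl_recv {
    if (req.method == "PURGE") {
        # Reject purge attempts from clients outside the ACL.
        if (!client.ip ~ purge) {
            return (synth(405, "Not allowed."));
        }
        # Remove the object for this URL and host from the cache.
        return (purge);
    }
}
```

This replaces the return(purge) on POST requests: a purge only removes the object whose URL and host match the PURGE request itself, so purging on the POST URL does not touch the cached article pages.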
It's important to know that you'll need to create a hook in your CMS or your MVC controller that performs the purge call. Here's a simple example using curl in PHP:
$curl = curl_init("http://your.varnish.cache/url-to-purge");
curl_setopt($curl, CURLOPT_CUSTOMREQUEST, "PURGE");
curl_exec($curl);
As you can see, this is an HTTP request done in cURL that uses the custom PURGE HTTP request method. This call needs to be executed in your code right after the changes are stored in the database. This post-publishing hook will ensure that Varnish clears this specific object from cache.
VCL cleanup
The statement below doesn't look like a reliable way to remove cookies: the pattern \.* matches zero or more dots, so it matches every URL and removes the cookies from all pages:
if (req.url ~ "\.*") {
    unset req.http.cookie;
}
The same applies to the following statement coming from the vcl_backend_response hook:
if (bereq.url ~ "\.*") {
    unset beresp.http.Set-Cookie;
    unset beresp.http.Cache-Control;
}
I assume some pages do actually need cookies to properly function. An admin panel for example, or the CMS, or maybe even a header that indicates whether or not you're logged in.
The best way forward is to define a blacklist or whitelist of URL patterns that can or cannot be cached.
Here's an example:
if (req.url !~ "^/(admin|user)") {
    unset req.http.Cookie;
}
The example above will only keep cookies for pages that start with /admin or /user. There are other ways as well.
Conclusion
I hope the purging part is clear. If not, please take a closer look at https://varnish-cache.org/docs/6.0/users-guide/purging.html#http-purging.
In regards to the VCL cleanup: purging can only work if the right things are stored in cache. Dealing with cookies can be tricky in Varnish.
Just try to define under what circumstances cookies should be kept for specific pages. Otherwise, you can just remove the cookies.
Hope that helps. Good luck.
Thijs

Is there a way to force Varnish to read the Cache-Control header on 403 responses?

From the code, it looks like the Cache-Control header is ignored for a 403, because 403 is not a whitelisted status:
switch (http_GetStatus(hp)) {
default:
expp->ttl = -1.;
https://github.com/varnishcache/varnish-cache/blob/4.0/bin/varnishd/cache/cache_rfc2616.c#L112-L114
This is the best I could come up with:
sub vcl_backend_response {
    if (beresp.status == 403) {
        set beresp.http.X-Status = beresp.status;
        set beresp.status = 200;
    }
}
sub vcl_deliver {
    if (resp.http.X-Status) {
        set resp.status = std.integer(resp.http.X-Status, 403);
        unset resp.http.X-Status;
    }
}
While this properly sets and unsets the status, every request is a cache MISS.
See #2018
A 403 response is not cached by default in Varnish.
"You can cache other status codes than the ones listed above, but you have to set the beresp.ttl to a positive value in vcl_backend_response. "
See http://book.varnish-software.com/4.0/chapters/VCL_Basics.html#the-initial-value-of-beresp-ttl
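A minimal sketch of what that could look like for the 403 case, assuming the backend sends a max-age directive and that deriving the TTL from it is the desired behavior. It is a fragment to merge into an existing VCL file, and it deliberately ignores s-maxage and other directives:

```vcl
import std;

sub vcl_backend_response {
    # Only act on 403 responses that carry an explicit max-age.
    if (beresp.status == 403 && beresp.http.Cache-Control ~ "max-age=[0-9]+") {
        # Extract the number of seconds and convert it to a duration.
        # If the conversion fails, fall back to 0s.
        set beresp.ttl = std.duration(
            regsub(beresp.http.Cache-Control, ".*max-age=([0-9]+).*", "\1") + "s",
            0s);
    }
}
```

Because the status code itself stays 403, the rewrite to 200 and back in vcl_deliver is no longer needed. The built-in VCL still runs afterwards, so a Set-Cookie header or a Cache-Control: private directive on the response would still keep it out of the cache.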

Varnish Cached Object Time

How can we get the time a cached object has spent in Varnish?
My requirement is something like this: if an object has been in the cache for 5 minutes, then for a specified IP I want to serve the content from the backend instead of from the cache.
You can set up your VCL so that it always misses when certain headers are set or when the request comes from certain IP addresses.
In your vcl_recv, set:
sub vcl_recv {
    if (req.http.Cache-Control ~ "no-cache" && client.ip ~ editors) {
        set req.hash_always_miss = true;
    }
}
https://www.varnish-cache.org/trac/wiki/VCLExampleEnableForceRefresh
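The snippet above refers to an ACL named editors that it does not define. A sketch of the complete fragment could look like this, where the address in the ACL is a placeholder for the IP that should always be served fresh content:

```vcl
# Clients that are allowed to force a refresh (placeholder address).
acl editors {
    "192.0.2.10";
}

sub vcl_recv {
    # A forced reload in the browser sends Cache-Control: no-cache.
    if (req.http.Cache-Control ~ "no-cache" && client.ip ~ editors) {
        # Treat the lookup as a miss and refresh the cached object.
        set req.hash_always_miss = true;
    }
}
```

Dropping the Cache-Control condition would make every request from those addresses go to the backend, which is closer to the "specified IP" part of the question. Note that this sketch does not look at how long the object has been in the cache.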

Varnish 3 - how to set maximum age in http headers

I am using Varnish 3.0.3 and want to use it to leverage browser caching by setting a maximum age in the HTTP headers for static resources. I tried adding the following configuration to default.vcl:
sub vcl_fetch {
    if (beresp.cacheable) {
        /* Remove Expires from backend, it's not long enough */
        unset beresp.http.expires;
        /* Set the clients TTL on this object */
        set beresp.http.cache-control = "max-age=900";
        /* Set how long Varnish will keep it */
        set beresp.ttl = 1w;
        /* marker for vcl_deliver to reset Age: */
        set beresp.http.magicmarker = "1";
    }
}
sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Remove the magic marker */
        unset resp.http.magicmarker;
        /* By definition we have a fresh object */
        set resp.http.age = "0";
    }
}
This is copied from https://www.varnish-cache.org/trac/wiki/VCLExampleLongerCaching . Maybe I just made a typo. On restart of Varnish, it no longer worked.
I have two questions. First, is this the correct way to do it for Varnish 3? If so, what am I doing wrong? Second, is there a way to test the Varnish configuration file before a restart, something along the lines of what Apache has with "/sbin/service httpd configtest"? That catches mistakes before going live. Thank you.
Yes, in general this is the way of overriding the backend's TTL.
Remove beresp.http.expires, set beresp.http.cache-control, set beresp.ttl.
beresp.cacheable is a 2.[01]-ism. The same test in 3.0 is to check that beresp.ttl > 0.
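Applied to the vcl_fetch from the question, that change could look like the sketch below (Varnish 3.0 syntax). Everything except the condition is copied from the question, and the vcl_deliver part stays as it is:

```vcl
sub vcl_fetch {
    /* In 3.0, a positive TTL takes the place of beresp.cacheable */
    if (beresp.ttl > 0s) {
        /* Remove Expires from backend, it's not long enough */
        unset beresp.http.expires;
        /* Set the clients TTL on this object */
        set beresp.http.cache-control = "max-age=900";
        /* Set how long Varnish will keep it */
        set beresp.ttl = 1w;
        /* marker for vcl_deliver to reset Age: */
        set beresp.http.magicmarker = "1";
    }
}
```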
A small tip is to store your magic marker on req.http instead, then you don't have to clean it up before handing the object to the client.
With regards to testing a configuration file, you can call the VCL compiler directly with "varnishd -C -f /etc/varnish/default.vcl" for example. If your VCL is faulty you get the error message, if the VCL is valid you get a few pages with generated C code.

Resources