I am using Angular 5 with a Hapi Node.js backend. When I send the "cache-control: private, max-age=3600" header in the HTTP response, the response is cached correctly. The problem is that when I make the exact same request in a different tab that is connected to a different database, the data cached for browser tab 1 is shared with browser tab 2. Is there a way for the cache to be used only per application instance using the cache-control header?
Same Webapp in Browser Tab 1. Same Domain.
Database 1
Same Webapp in Browser Tab 2. Same Domain.
Database 2
The user agent needs to somehow differentiate these cache entries. Probably your best option is to adjust the cache entry key (add a subdomain, path segment, or query parameter that identifies the database to the URI).
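A minimal sketch of the query-parameter variant, written in Python for brevity (the same idea would apply in an Angular HTTP interceptor). The helper name `with_database_key` and the parameter name `db` are hypothetical, not something from the question:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_database_key(url: str, database_id: str, param: str = "db") -> str:
    """Return `url` with a query parameter that identifies the database.

    The browser cache keys entries by URL, so requests for different
    databases end up in different cache entries.
    """
    parts = urlsplit(url)
    # Drop any existing value of the parameter, then append the new one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, database_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Tab 1 would then request `/api/items?db=db1` and tab 2 `/api/items?db=db2`, so neither can be served the other's cached response.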
You can also use a custom HTTP header (such as X-Database) paired with the Vary HTTP header, but in this case the user agent may store only a single response at a time, because it still uses the URI as the cache key and uses the Vary header only for response validation. Relevant excerpt from the article The State of Browser Caching, Revisited by Mark Nottingham:
The one hiccup I saw was that all tested browser caches would only store one variant at a time; i.e., if your responses contain Vary: Foo and you get two requests, the first with Foo: 1 and the second with Foo: 2, the second response will evict the first from cache.
Whether that’s a problem depends on how you use Vary; if you want to reuse cached responses with old values, it might reduce efficiency. However, that doesn’t seem like a common use case;
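The eviction behaviour described in the excerpt can be illustrated with a toy model. This is only a sketch of the observed behaviour, not how any real browser cache is implemented; the class name is made up, and header names are matched case-sensitively for simplicity:

```python
class SingleVariantCache:
    """Toy model of a cache that keeps only one variant per URL."""

    def __init__(self):
        self._entries = {}  # url -> (vary_values, body)

    def store(self, url, request_headers, vary, body):
        # Remember the request header values named by Vary next to the body.
        vary_values = {name: request_headers.get(name) for name in vary}
        # Assigning by URL overwrites any variant stored earlier.
        self._entries[url] = (vary_values, body)

    def lookup(self, url, request_headers):
        entry = self._entries.get(url)
        if entry is None:
            return None
        vary_values, body = entry
        # The stored response is reusable only if every Vary header matches.
        if all(request_headers.get(name) == value for name, value in vary_values.items()):
            return body
        return None
```

Storing a response for `X-Database: 1` and then one for `X-Database: 2` under the same URL leaves only the second one retrievable, which is why the two tabs would keep evicting each other's entries.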
For more information, see RFC 7234, Hypertext Transfer Protocol (HTTP/1.1): Caching, and the article Understanding The Vary Header by Andrew Betts.
I want to send a cached page back to the user, but I need to generate a unique VISITOR_ID for every new user and send it back through headers. To do that, I need to make an API call from the Varnish proxy server to my backend servers to fetch the VISITOR_ID and then append it to the response.
We were previously using Akamai and were able to implement this using the edge workers available there.
I want to know whether such a thing is possible in Varnish.
Thanks in Advance
Open source solution for HTTP calls
You can use https://github.com/varnish/libvmod-curl and add this VMOD to perform HTTP calls from within VCL. The API for this module can be found here: https://github.com/varnish/libvmod-curl/blob/master/src/vmod_curl.vcc
Commercial solution for HTTP calls
Despite there being an open source solution, I want to mention that there might be a more stable solution that is actually supported and receives frequent updates. See https://docs.varnish-software.com/varnish-cache-plus/vmods/http/ for the HTTP VMOD that is part of Varnish Enterprise.
Generate the VISITOR_ID in VCL
Your solution implies that an HTTP call is needed for every single response. While that is possible through various VMODs, it will result in a lot of extra HTTP calls, unless you cache every variation that includes the VISITOR_ID.
You could also consider generating the unique ID yourself in VCL.
See https://github.com/otto-de/libvmod-uuid for a VMOD that generates UUIDs or https://github.com/varnish/libvmod-digest for a VMOD that generates hashes.
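The logic of generating the ID at the edge instead of calling the backend is small. The following Python sketch only illustrates that logic; it is not VCL and does not use the API of either VMOD. The header name `X-Visitor-Id` and the function name are assumptions:

```python
import uuid


def ensure_visitor_id(request_headers: dict, header: str = "X-Visitor-Id") -> str:
    """Reuse the visitor ID sent by the client, or mint a new random one."""
    existing = request_headers.get(header)
    if existing:
        # Returning visitor: keep the ID so it stays stable across requests.
        return existing
    # New visitor: a random (version 4) UUID needs no backend round trip.
    return str(uuid.uuid4())
```

The returned value would be attached to the outgoing response while the cached body itself stays shared between all visitors.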
Fetch VISITOR_ID from Redis
If you prefer to generate the VISITOR_ID in your origin application, you could use a Key/Value store like Redis to store or generate values.
You can generate the ID in your application and store it in Redis. You could also generate and store it using Lua scripting in Redis.
Varnish can then fetch the key from Redis and inject it into the response.
While this is a similar approach to the HTTP calls, at least we know Redis is capable of keeping up with Varnish in terms of performance.
See https://github.com/carlosabalde/libvmod-redis to learn how to interface with Redis from Varnish.
If I am ideologically opposed to the policies of a certain browser's developers (I think that the browser harms its users), can I somehow block that browser from accessing my website?
I would assume that such a block would have to be done on the backend, since the frontend won't help here, but can backend languages such as PHP/Ruby/C++/Python really help with that?
Your server can look at the User-Agent header of the HTTP request that the client sends (CGI-style environments such as PHP's $_SERVER expose it as HTTP_USER_AGENT). This header typically contains information about the user agent that made the request: if the request originated from a web browser, it will generally contain the vendor and version of the browser. So, your server can respond conditionally based on what the client sends in this header.
See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent for more info, and for examples of user agent strings for a number of widely used browsers.
However, be aware that the User-Agent header is populated by the client. It therefore cannot be trusted, as the client can easily forge it.
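As a rough sketch of such a conditional response, here is a minimal WSGI application in Python. The token `ExampleBrowser` is a placeholder rather than a real browser's user agent string, and, as noted above, a client that forges the header will get through:

```python
BLOCKED_TOKENS = ("ExampleBrowser",)  # hypothetical substring to block


def is_blocked(user_agent, blocked_tokens=BLOCKED_TOKENS):
    """Return True if the User-Agent value contains a blocked token."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token.lower() in lowered for token in blocked_tokens)


def app(environ, start_response):
    """WSGI app that answers 403 to blocked user agents and 200 otherwise."""
    # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
    if is_blocked(environ.get("HTTP_USER_AGENT")):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"This browser is not supported.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome.\n"]
```

The application can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.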
I need to do cache key generation/store based on the response header received from the backend not based on the URL requested from the client-side.
The main reason for doing so is: I have a backend logic to reply to the client with some different data if requested things are not available.
Ex:
Request: example.com/foo/102030?names=test1
Now my backend checks whether test1 is present for 102030. If not, it checks whether 102030 has a special tag = Y, which indicates that it has to look for some other matching object.
In that case, the backend replies to the client with the data that is also accessible at example.com/foo/000000?names=test1.
The problem is that if some other request comes in for example.com/foo/000000?names=test1, Varnish treats it as a different request based on the URL, but in reality I need to serve the same data that is already present in the cache for example.com/foo/000000?names=test1.
Currently, I issue bans from the backend using a regex, so I can easily invalidate the object stored under /foo/000000?names=test1, but not the other one.
So, is there a way through which I can store the cache key based on the response header info?
There's no way you can do this unfortunately. Only request information can be used to create the cache key.
That is by design, because incoming requests only have their own request properties they can present to Varnish to identify the resource they wish to retrieve.
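To make the constraint concrete, here is a simplified Python sketch of a lookup key derived only from request properties, loosely modelled on Varnish's built-in behaviour of hashing the URL and the Host header. The function name and the use of SHA-256 are illustrative assumptions:

```python
import hashlib


def cache_key(host: str, url: str) -> str:
    """Derive a cache key from request properties only.

    The lookup happens before any backend response exists, so response
    headers cannot take part in the key.
    """
    return hashlib.sha256(f"{url}\n{host}".encode("utf-8")).hexdigest()
```

With this scheme `/foo/102030?names=test1` and `/foo/000000?names=test1` always produce different keys, regardless of what the backend returns for either of them.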
I'm trying to use Varnish to cache RPMs and other giant binaries. I expected that when an object expires in the cache, Varnish would send a conditional request (If-Modified-Since or If-None-Match) to the backend and then, assuming the object didn't change, refresh the TTL on the locally cached object without downloading a new one. I wrote a test backend to generate specific responses (setting a small max-age and so on, and logging the headers Varnish sends), but I never get anything other than a full fetch. The conditional header is never sent. My VCL is basically the default VCL. I tried playing around with setting small ttl/grace values but never got any interesting behavior.
Is Varnish even able to do what I want? If so, has anyone done anything similar and can give tips?
The request sent to the backend when an object is expired is the one that Varnish receives from the client.
So when testing your setup, are you sending a conditional header (If-Modified-Since or If-None-Match) in your requests to Varnish?
Have a look at https://www.varnish-software.com/wiki/content/tutorials/varnish/builtin_vcl.html to see what the built in VCL is.
Under vcl_backend_fetch, which will be called if there is no object in the cache, you can see there is no complex logic around stale objects; it just passes on the request as is.
First of all, quite a bit has happened in varnish-cache since this question was posted. I am answering the questions for varnish-cache 6.0 and later:
The behavior the OP expects is how Varnish behaves now if the backend returns the Last-Modified and/or ETag headers.
Obviously, an object can only be refreshed if it still exists in the cache. This is what beresp.keep is for. It extends the time an object is kept in the cache after ttl and grace have expired. Note that objects are also LRU-evicted if the cache is too small to keep all objects for their maximum lifetime.
Regarding the comment by #maxschlepzig, which might be based on a misunderstanding:
When an object is not in the cache but is to be cached, Varnish cannot forward the client request's conditional headers (If-Modified-Since, If-None-Match), because a 304 response would not be good for caching (it has no body and is relevant only for a particular request). Instead, Varnish strips the conditional headers in this case to (potentially) get a 200 response with an object to put into the cache.
As explained above, for a subsequent backend request after the ttl has expired, the conditional headers are constructed based on the cached response. The conditional headers from the client are not used for this case either.
All of the above applies when an object is to be cached at all (Fetch, or Hit-for-Miss as created by setting beresp.uncacheable).
For Pass and Hit-for-Pass (as created by return(pass(duration)) in vcl_backend_response), the client conditional headers are passed to the backend.
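The revalidation flow for cached objects can be sketched in Python as follows. This models the idea only (conditional headers built from the cached response, and a 304 refreshing the TTL without a new body); it is not Varnish's implementation, and the names `CachedObject`, `conditional_headers` and `revalidate` are invented for the example:

```python
import time
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: bytes
    headers: dict
    expires_at: float


def conditional_headers(obj: CachedObject) -> dict:
    """Build backend request headers from the cached response's validators."""
    headers = {}
    if "ETag" in obj.headers:
        headers["If-None-Match"] = obj.headers["ETag"]
    if "Last-Modified" in obj.headers:
        headers["If-Modified-Since"] = obj.headers["Last-Modified"]
    return headers


def revalidate(obj, status, response_headers, body, ttl, now=None):
    """Apply the backend's answer to a stale cached object."""
    now = time.time() if now is None else now
    if status == 304:
        # Not modified: keep the stored body and only refresh headers and TTL.
        return CachedObject(obj.body, {**obj.headers, **response_headers}, now + ttl)
    # Anything else is treated here as a full replacement of the object.
    return CachedObject(body, dict(response_headers), now + ttl)
```

For a multi-gigabyte RPM, the 304 branch is the one that avoids transferring the body again.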
Do you know how to change the response header in CouchDB? Now it has Cache-control: must-revalidate; and I want to change it to no-cache.
I do not see any way to configure CouchDB's cache header behavior in its configuration documentation for general (built-in) API calls. Since this is not a typical need, lack of configuration for this does not surprise me.
Likewise, last I tried, even show and list functions (which do give custom developer-provided functions some control over headers) did not really leave the cache headers under developer control.
However, if you are hosting your CouchDB instance behind a reverse proxy like nginx, you could probably override the headers at that level. Another option would be to add the usual "cache busting" hack of adding a random query parameter in the code accessing your server. This is sometimes necessary in the case of broken client cache implementations but is not typical.
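A minimal sketch of the cache-busting approach in Python, assuming the server ignores unknown query parameters. The parameter name `_` and the helper name are arbitrary choices for the example:

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "_") -> str:
    """Append a random query parameter so every request URL is unique."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # 8 random bytes rendered as 16 hex characters.
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because each URL differs, a client cache never finds a stored entry to reuse, at the cost of losing revalidation entirely.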
But taking a step back: why do you want to make responses no-cache instead of must-revalidate? I could see perhaps occasionally wanting to override in the other direction, letting clients cache documents for a little while without having to revalidate. Not letting clients cache at all seems a little curious to me, since the built-in CouchDB behavior using revalidated ETags should not yield any incorrect data unless the client is broken.