What is the difference between bans and purges in Varnish HTTP-Cache?

Hi, I'm a newbie with Varnish HTTP-Cache, and I find it hard to understand the conceptual difference between purging and banning as cache-invalidation methods.
Can anyone explain and differentiate banning and purging in Varnish HTTP-Cache? Thanks!

Basically, Purge and Ban are the hard delete and soft delete of cache invalidation; both end up refreshing your cache. However, there are some further details that distinguish them:
Purge: Removes the object from the cache immediately. It works only for the specific URL being requested, and regular expressions cannot be used with Purge. For example: if a Purge for www.example.com/uri is issued, only the object for that URL is removed from the cache.
Ban: Used when you want to remove many objects at once, which can be accomplished with regular expressions that are not available with Purge. When a Ban is issued, a rule is created inside Varnish; every object requested from Varnish is checked against this rule and invalidated if it matches. The rule applies only to objects older than itself, and it stays in Varnish as long as any cached object is older than it. This avoids invalidating the same object more than once. A practical example: say you want to ban all the .png objects. Using the Varnish CLI you issue the command ban req.url ~ "\\.png$". Every time an object matching this condition is requested from the cache, it is discarded, and a fresh copy is generated and delivered to the client. Objects cached after the rule was created are not checked against it.
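The semantics above can be sketched with a small, self-contained Python model. This is a toy illustration, not real Varnish code; the class and method names are invented for the example:

```python
import re

class MiniCache:
    """Toy model of Varnish's purge vs. ban semantics (not real Varnish code)."""

    def __init__(self):
        self.objects = {}   # url -> (body, inserted_at)
        self.bans = []      # (compiled_pattern, ban_time)

    def insert(self, url, body, now):
        self.objects[url] = (body, now)

    def purge(self, url):
        # Purge: drop one exact URL immediately; no regex, no rule kept around.
        self.objects.pop(url, None)

    def ban(self, pattern, now):
        # Ban: record a rule; cached objects are only tested lazily, on lookup.
        self.bans.append((re.compile(pattern), now))

    def lookup(self, url):
        entry = self.objects.get(url)
        if entry is None:
            return None
        body, inserted_at = entry
        # Only objects older than a ban are checked against it.
        for pat, ban_time in self.bans:
            if inserted_at < ban_time and pat.search(url):
                del self.objects[url]
                return None
        return body
```

For example, after `ban(r"\.png$", now=2)`, a .png object inserted at time 1 is discarded on its next lookup, while one inserted at time 3 is never checked against that rule.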
If you want some practical examples and how to code it, maybe you should check this answer.

Related

How to change response header (cache) in CouchDB?

Does anyone know how to change the response header in CouchDB? Right now it sends Cache-Control: must-revalidate, and I want to change it to no-cache.
I do not see any way to configure CouchDB's cache header behavior in its configuration documentation for general (built-in) API calls. Since this is not a typical need, lack of configuration for this does not surprise me.
Likewise, last I tried even show and list functions (which do give custom developer-provided functions some control over headers) do not really leave the cache headers under developer control either.
However, if you are hosting your CouchDB instance behind a reverse proxy like nginx, you could probably override the headers at that level. Another option would be to add the usual "cache busting" hack of adding a random query parameter in the code accessing your server. This is sometimes necessary in the case of broken client cache implementations but is not typical.
But taking a step back: why do you want to make responses no-cache instead of must-revalidate? I could see perhaps occasionally wanting to override in the other direction, letting clients cache documents for a little while without having to revalidate. Not letting clients cache at all seems a little curious to me, since the built-in CouchDB behavior using revalidated Etags should not yield any incorrect data unless the client is broken.
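The "cache busting" hack mentioned above can be sketched in Python; `cache_bust` is a hypothetical helper name, not a CouchDB API:

```python
import secrets
from urllib.parse import urlencode

def cache_bust(url):
    # Hypothetical helper: append a random token so caches treat each
    # request as a distinct URL and never serve a stored response.
    token = secrets.token_hex(8)
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({"nocache": token})
```

The client simply wraps every request URL in this helper; since each call produces a different query string, no intermediary cache will ever see the same URL twice.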

Best way to invalidate a large number of Varnish objects?

I'm working on an API gateway-ish server, which supports user and groups.
I have an API endpoint something like the following.
/authorization/users/1?resource=users
basically, it's asking "Can this user 1 have access to 'users'?".
I would like to cache "/authorization/users/1?resource=users" in Varnish.
A permission can be set at the user level or the group level. Each user belongs to at least one group.
User level cache invalidation is easy since I just need to send a PURGE request to a single URL.
When it comes to groups, it's complicated. A group can have over 50,000 users. How do I invalidate all of those users' entries?
Looking at https://www.varnish-software.com/blog/advanced-cache-invalidation-strategies, using X-Article-ID might be a good solution. My concern is: how does it work with a large number of objects? Will there be huge CPU usage? How fast can it handle 50,000 objects?
Are there any better ways?
Using a Varnish ban puts the expression you want to ban into the Varnish ban list.
Each request is checked against the ban list.
If an object in the Varnish cache has a timestamp older than a matching item in the ban list, the object is removed from the cache and a fresh copy is requested from the backend.
On top of this, Varnish also runs a process called the "ban lurker", which proactively removes matching items from the cache. How fast this is done can be configured; for more information, see https://www.varnish-software.com/blog/ban-lurker
Personally, I did not have any issues with CPU or memory usage when using this type of Varnish ban. But that all depends on how often an item is added to the ban list and how complex the regex is that you are using to ban pages.
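A minimal Python sketch of the tag-based ("X-Article-ID"-style) approach, showing why the per-lookup cost scales with the number of ban rules rather than the number of tagged objects. `TaggedCache` and the tag strings are assumptions for illustration, not Varnish code; in real Varnish the backend would emit a header (say, a hypothetical X-Group-ID) and you would ban on `obj.http.X-Group-ID`:

```python
class TaggedCache:
    """Toy model of tag-based cache invalidation (not real Varnish code)."""

    def __init__(self):
        self.objects = {}  # url -> (body, tags, inserted_at)
        self.bans = []     # (tag, ban_time)

    def insert(self, url, body, tags, now):
        self.objects[url] = (body, frozenset(tags), now)

    def ban_tag(self, tag, now):
        # One rule per invalidation, no matter how many objects carry the
        # tag, so banning a 50,000-user group adds a single list entry.
        self.bans.append((tag, now))

    def lookup(self, url):
        entry = self.objects.get(url)
        if entry is None:
            return None
        body, tags, inserted_at = entry
        # Objects older than a matching ban are discarded lazily on lookup.
        for tag, ban_time in self.bans:
            if inserted_at < ban_time and tag in tags:
                del self.objects[url]
                return None
        return body
```

Each cached authorization response is tagged with the groups it depends on; one `ban_tag("group:7", ...)` then invalidates every member's entry on its next lookup (or via the ban lurker in real Varnish) without touching the others.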

Generate document ID server side

When creating a document and letting Couch create the ID for you, does it check if the ID already exists, or could I still produce a conflict?
I need to generate UUIDs in my app, and wondered if it would be any different than letting Couch do it.
Use a POST /db request for that, but be aware that the underlying HTTP POST method is not idempotent: a client may automatically retry it due to networking problems, which can create multiple documents in the database.
As Kxepal already mentioned, it is generally not recommended to POST a document without providing your own _id.
You could, however, use GET /_uuids to retrieve a list of UUIDs from the server and use those for storing your documents. The UUIDs returned depend on the algorithm in use, but the chance of a duplicate is (for most purposes) insignificantly small.
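Generating the IDs entirely client-side works just as well. A minimal Python sketch, where `new_doc_id` is a hypothetical helper and the choice of random (version 4) UUIDs is an assumption; CouchDB's own /_uuids algorithm is configurable:

```python
import uuid

def new_doc_id():
    # Client-side random UUID, comparable to what GET /_uuids hands out.
    return uuid.uuid4().hex

# With an explicit _id, the document can be created with PUT /db/{_id},
# which is idempotent: retrying the same request yields a conflict
# response instead of a duplicate document.
doc = {"_id": new_doc_id(), "type": "user"}
```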
You can and should give a document ID, even when using the bulk document interface. Skipping that step makes the problem of resubmitted requests creating duplicate documents even worse. On the other hand, if you do assign IDs and part of the request reaches CouchDB twice (as in the case of a reconnecting proxy), your response will include some conflicts, which you can safely ignore: you know the conflicts came from you, in the same request.

Varnish - how to serve stale content for all clients while re-fetching?

I'm using Varnish in front of the backend.
Because the backend is sometimes very slow, I've enabled grace mode to serve stale content to clients. However, with grace mode there is still one user who has to go to the backend and gets a very bad experience.
Is it possible with Varnish to serve stale content to ALL users while refreshing the cache?
I've seen some people suggest using a cron job or script to refresh the cache on localhost. This is not an elegant solution, because there are so many URLs on our site that it would be very difficult to refresh each of them manually.
I know the underlying problem is with the backend and we need to fix the problem there. But in the short term, I'm wondering if I can improve response time from Varnish layer?
You can do this (in the average case) in Varnish 3 by using restarts and a helper process.
How to write the VCL for it is described here (disclosure: my own blog):
http://lassekarstensen.wordpress.com/2012/10/11/varnish-trick-serve-stale-content-while-refetching/
It is fairly convoluted, but works when you have an existing object that just expired.
In (future) Varnish 4 there will be additional VCL hooks that will make such tricks easier.
Yes, it is possible to serve stale content to all users (for a specified amount of time). You should experiment with grace mode and saint mode to set time limits that suit your application.
Read more here: https://www.varnish-cache.org/docs/3.0/tutorial/handling_misbehaving_servers.html
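The TTL-plus-grace idea can be illustrated with a small, single-threaded Python sketch. This is a toy model, not Varnish code, and it simplifies one thing: where the sketch refetches inline, Varnish refetches asynchronously while the stale copy goes out:

```python
import time

class GraceCache:
    """Toy model of TTL + grace: serve stale while refreshing (not real Varnish)."""

    def __init__(self, ttl, grace, fetch):
        self.ttl, self.grace, self.fetch = ttl, grace, fetch
        self.value, self.expires = None, float("-inf")

    def get(self, now=None):
        now = time.time() if now is None else now
        if self.value is not None and now < self.expires:
            return self.value                 # fresh hit
        if self.value is not None and now < self.expires + self.grace:
            stale = self.value
            self._refresh(now)                # Varnish would do this asynchronously
            return stale                      # client gets the stale copy at once
        self._refresh(now)                    # hard miss: client waits on the backend
        return self.value

    def _refresh(self, now):
        self.value = self.fetch()
        self.expires = now + self.ttl
```

Within the grace window every client is answered from the stale copy while a refresh happens; only after TTL plus grace have both elapsed does a request have to wait for the slow backend.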

Varnish S3-like, signed, time-limited request before delivering objects, in VCL

This question may seem a bit odd, but is it possible, with a poor man's solution in VCL, to parse a signed request (signed with a shared secret key, i.e. a poor man's HMAC) created by the referring (main) site, and only serve the content from Varnish if the signature is correct and the (signed) timestamp hasn't expired?
That is, similar to how Amazon S3 works, where you can easily create a signed temporary URL to your S3 object that expires after a defined number of seconds.
Note: I'm not talking about cache object expiry here, but URL-expiration for the client.
It gets handy when you only want to give out temporary URL's to your users to prevent long-term hotlinking without checking the referrer-header.
So: a poor man's solution for temporary URLs in VCL (preferably in vcl_recv), making the internal object expire. Is it possible without writing a VMOD?
Edit:
I found another way of authorizing content with Varnish:
http://monolight.cc/2011/04/content-authorization-with-varnish/
But it's still not what I want to achieve.
Best regards!
Yes, this is possible.
In essence you need to verify the signature (digest vmod), pick out the timestamp from whatever header it is in (regsub), and compare it to the current time.
Use std.integer() to cast the timestamp:
https://www.varnish-cache.org/docs/trunk/reference/vmod_std.html#integer
Use the built-in now variable in VCL to find the current timestamp. You might want to write (now + 0s) to force Varnish to give you a Unix timestamp.
https://www.varnish-cache.org/docs/trunk/reference/vcl.html#variables
The digest vmod is on github:
https://github.com/varnish/libvmod-digest
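Outside of VCL, the check described above looks like this in Python (the VCL version would use digest.hmac_sha256() and std.integer() as noted). The function names, the query-parameter names, and the path:expires message layout are assumptions for the sketch:

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlsplit

SECRET = b"shared-secret"  # assumption: key known to both the site and Varnish

def sign_url(path, expires):
    # Sign "path:expires" and attach both the expiry and the signature.
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def verify_url(url, now=None):
    now = int(time.time()) if now is None else now
    parts = urlsplit(url)
    qs = parse_qs(parts.query)
    try:
        expires = int(qs["expires"][0])
        sig = qs["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False                      # signed timestamp has expired
    msg = f"{parts.path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison, like the digest vmod would do for you.
    return hmac.compare_digest(expected, sig)
```

The main site calls sign_url when rendering links; the cache runs verify_url on each request and rejects anything expired or tampered with, without ever contacting the main site.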
There is already a VMOD for this, if that helps: the Varnish Secure Download Module.
