Varnish Multi-Site configuration with varying caching

I have 3 groups of APIs, and each group has a unique caching requirement.
Group 1 can be cached "normally", i.e. only the URL matters.
Group 2 requires an auth header, so I'd like to cache based on that header plus the URL.
Group 3 generates responses based on the User-Agent and the URL.
I can easily do any of these on their own, but because all of the APIs are "small" I would like them to share one cache system and reduce costs.
From what I understand, using multiple VCLs with vcl.load in varnishadm would let me specify a custom vcl_hash (among other subroutines) for each. Or is there a better solution? An army of if statements just seems awful.
If I use vcl.load, is there a way to have Varnish do this automatically at startup so the servers can sit in an auto-scaling group? (currently using systemctl)
Cheers

It looks like you're looking for VCL Labels. Please check https://varnish-cache.org/docs/trunk/users-guide/vcl-separate.html or https://info.varnish-software.com/blog/one-vcl-per-domain for documentation and some examples.
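To illustrate how labels tie this together, here is a minimal sketch; the hostnames, file paths and label names are purely illustrative assumptions. Each group gets its own VCL file with its own vcl_hash, and a small top-level VCL routes requests to the right label:

    # main.vcl - top-level VCL that routes each request to a labeled per-group VCL.
    # Loaded roughly like this (via varnishadm, or a CLI file run at startup):
    #   vcl.load group1_cfg /etc/varnish/group1.vcl
    #   vcl.label group1 group1_cfg
    #   ... same for group2 and group3 ...
    #   vcl.load main_cfg /etc/varnish/main.vcl
    #   vcl.use main_cfg
    vcl 4.0;

    backend default { .host = "127.0.0.1"; .port = "8080"; }

    sub vcl_recv {
        if (req.http.host == "api1.example.com") {
            return (vcl(group1));
        } else if (req.http.host == "api2.example.com") {
            return (vcl(group2));
        }
        return (vcl(group3));
    }

A per-group file then only has to worry about its own hashing rules, for example group 2 keyed on the auth header (the header name here is an assumption):

    # group2.vcl - cache key is URL + host + auth header.
    vcl 4.0;

    backend default { .host = "127.0.0.1"; .port = "8081"; }

    sub vcl_recv {
        # The built-in vcl_recv refuses to cache requests that carry an
        # Authorization header, so send GETs to the cache lookup explicitly.
        if (req.method == "GET") {
            return (hash);
        }
    }

    sub vcl_hash {
        hash_data(req.url);
        hash_data(req.http.host);
        if (req.http.Authorization) {
            hash_data(req.http.Authorization);
        }
        return (lookup);
    }

On the startup question: VCLs and labels loaded through varnishadm do not survive a restart of varnishd, so for an auto-scaling group you would typically script the vcl.load/vcl.label/vcl.use sequence at boot, e.g. via varnishd's -I cli-file option (if your version supports it) or a systemd drop-in that runs varnishadm after the service starts.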

Related

Best way to invalidate a large number of Varnish objects?

I'm working on an API gateway-ish server, which supports users and groups.
I have an API endpoint something like the following.
/authorization/users/1?resource=users
Basically, it asks "can user 1 access 'users'?".
I would like to cache "/authorization/users/1?resource=users" in Varnish.
A permission can be set at the user level or the group level. Each user belongs to at least one group.
User level cache invalidation is easy since I just need to send a PURGE request to a single URL.
When it comes to groups, it gets complicated. A group can have over 50,000 users. How do I invalidate the cache for all of those users?
Looking at https://www.varnish-software.com/blog/advanced-cache-invalidation-strategies, using X-Article-ID might be a good solution. My concern is how it behaves with a large number of objects. Is there going to be huge CPU usage? How fast can it handle 50,000 objects?
Are there any better ways?
Issuing a Varnish ban puts the expression you want to ban on the Varnish ban list.
Each cached object is checked against that list when it is looked up: if the object is older than an entry on the list and matches its expression, the object is removed from the cache and a fresh copy is requested from the backend.
On top of this, Varnish also runs a background process called the "ban lurker", which proactively removes matching items from the cache. How fast this happens is configurable; for more information see https://www.varnish-software.com/blog/ban-lurker
Personally I have not had any issues with CPU or memory usage when using this type of ban, but it all depends on how often items are added to the ban list and how complex the regex you use to ban pages is.
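For the group case specifically, a tag-header approach along the lines of the linked article could look roughly like the sketch below. The X-Group-Tag header, the BAN method and the ACL are assumptions for illustration, and the backend would have to emit the tag (e.g. "g42") on every authorization response it serves:

    vcl 4.0;

    backend default { .host = "127.0.0.1"; .port = "8080"; }

    # Only accept bans from trusted hosts (illustrative ACL).
    acl invalidators { "127.0.0.1"; }

    sub vcl_recv {
        if (req.method == "BAN") {
            if (client.ip !~ invalidators) {
                return (synth(403, "Forbidden"));
            }
            # "Lurker-friendly" ban: the expression only references obj.*,
            # so the ban lurker can evict matching objects in the background
            # instead of testing them on client lookups.
            ban("obj.http.X-Group-Tag ~ " + req.http.X-Ban-Group);
            return (synth(200, "Ban added"));
        }
    }

    sub vcl_deliver {
        # Hide the tag from clients; the stored object keeps it for the lurker.
        unset resp.http.X-Group-Tag;
    }

Invalidating a whole group is then a single request, e.g. curl -X BAN -H "X-Ban-Group: g42" http://your-varnish-host/, no matter how many of the 50,000 users' cached entries it ends up matching; the lurker works through them gradually in the background.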

Types of scenarios implemented by companies to performance test a system

As a JMeter newbie, I have created some scenarios, such as a number of users logging in to the system, sending HTTP requests, looping requests, etc.
I would like to know what real-world scenarios companies implement to performance test their systems using JMeter.
Consider an e-commerce website: what scenarios might they use to performance test it?
The whole idea of performance testing is generating real-life load on the system, simulating real users as closely as possible. For an e-commerce system it would be something like:
N users searching for some term
M users browsing and navigating
X users making purchases
To simulate different usage scenarios you can use separate thread groups or set weights with a Throughput Controller.
To make your JMeter test look more like a real browser, add the following test elements to your test plan:
HTTP Cookie Manager - to represent browser cookies, simulate different unique sessions and deal with cookie-based authentication.
HTTP Cache Manager - to simulate the browser cache. Browsers download embedded resources like images, scripts and styles, but do it only once; the Cache Manager replicates this behavior and also respects cache-control headers.
HTTP Header Manager - to represent browser headers like User-Agent, Accept-Language and so on.
Also, according to How to make JMeter behave more like a real browser, you need to "tell" JMeter to retrieve all embedded resources from pages and use a concurrent pool of 3 to 5 threads for it. The best place for this configuration is HTTP Request Defaults.

Varnish - how to serve stale content for all clients while re-fetching?

I'm using Varnish in front of the backend.
Because the backend is sometimes very slow, I've enabled grace mode to serve stale content to clients. However, with grace mode there is still one user who has to go to the backend and gets a very bad experience.
Is it possible with Varnish to serve stale content to ALL users while refreshing the cache?
I've seen people suggest using a cron job or script to refresh the cache on localhost. This is not an elegant solution, because there are so many URLs on our site that it would be very difficult to refresh each of them manually.
I know the underlying problem is with the backend and we need to fix it there, but in the short term I'm wondering if I can improve response times at the Varnish layer?
You can do this (in the average case) in Varnish 3 by using restarts and a helper process.
How you'd write a VCL for it is described here: (disclosure: my own blog)
http://lassekarstensen.wordpress.com/2012/10/11/varnish-trick-serve-stale-content-while-refetching/
It is fairly convoluted, but works when you have an existing object that just expired.
In (future) Varnish 4 there will be additional VCL hooks that will make such tricks easier.
Yes, it is possible to serve stale content to all users (for a specified amount of time). You should experiment with grace mode and saint mode to set time limits that suit your application.
Read more here: https://www.varnish-cache.org/docs/3.0/tutorial/handling_misbehaving_servers.html
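As a starting point, a minimal Varnish 3 grace setup along the lines of that tutorial might look like the sketch below; the backend address, probe settings and grace periods are illustrative values, not recommendations:

    # Varnish 3 syntax (matching the linked 3.0 documentation).
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
        # Health probe: once the backend is marked sick, grace lets
        # Varnish keep serving the stale copies it still holds.
        .probe = {
            .url = "/healthcheck";
            .interval = 5s;
            .timeout = 1s;
            .window = 5;
            .threshold = 3;
        }
    }

    sub vcl_recv {
        # How stale an object a client is willing to accept.
        set req.grace = 30m;
    }

    sub vcl_fetch {
        # How long past its TTL an object is kept around, so there is
        # something stale left to serve.
        set beresp.grace = 6h;
    }

Note that with a healthy but slow backend, plain Varnish 3 grace still sends one unlucky request per expired object to the backend; that is exactly the gap the restart trick from the first answer works around.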

Strategy for spreading image downloads across domains?

I am working on a PHP wrapper for the Google Image Charts API service. It supports serving images from multiple domains, such as:
http://chart.googleapis.com
http://0.chart.googleapis.com
http://1.chart.googleapis.com
...
The numeric range is 0-9, so 11 domains are available in total (including the unnumbered one).
I want to automatically track the number of images generated and rotate domains for the best performance in the browser. However, Google itself only vaguely recommends:
...you should only need this if you're loading perhaps five or more charts on a page.
What should my strategy be? Should I just change the domain every N images, and what would a good value of N be for modern browsers?
Is there a point where it makes sense to reuse a domain rather than introduce a new one (to save a DNS lookup)?
I don't have a specific number of images in mind - since this is open-source, publicly available code, I would like to implement a generic solution rather than optimize for my specific needs.
Considerations:
Is one host faster than another?
Does the browser limit connections per host?
How long does it take for the browser to resolve a DNS name?
Since you want to make this a component, I'd suggest letting it support multiple strategies for picking the hostname to use. This will not only allow you to have different strategies but also to test them against each other.
You might also want to add support for the JavaScript libraries that can render the data on the page in the future, so staying modular makes sense anyway.
Variants:
Pick one domain name and stick with it, hardcoded: http://chart.googleapis.com
Pick one domain name out of many, stick with it: e.g. http://#.chart.googleapis.com
Like 2, but start rotating the name after some number of images.
Like 3, but add a JavaScript chunk at the end of the page that resolves the DNS of the hostnames not used so far in the background, so they are cached for the next request.
Then you can make your library configurable, so the values are not hardcoded in the code and you only ship a default configuration.
You can expose the strategy as configuration, so whoever implements the component can decide.
You can also let the component load its configuration from outside; for example, if you create a WordPress plugin, the plugin can store the configuration and offer the plugin user an admin interface to change the settings.
Since the configuration already includes which strategy to follow, you have handed the responsibility entirely to the consumer of the component, and you can more easily integrate different usage scenarios for different websites or applications.
I don't exactly understand the wish to rotate domains. I guess it makes sense in the context that a browser may only allow X open requests to a given domain at once, so if you have 10 images served from chart.googleapis.com, you may need to wait for the first to finish downloading before beginning to receive the fifth, and so on.
The problem with rotating domains randomly is that you defeat browser caching entirely. If an image is served from 1.chart.googleapis.com on one page load and from 7.chart.googleapis.com on the next, the cached chart is invalidated and the user has to wait for it to be requested, generated and downloaded all over again.
The best solution I can think of is determining the domain to request from algorithmically from the request itself. If it's in a function, you can md5 the arguments, convert the hash to an integer, and then serve the image from {$result % 10}.chart.googleapis.com.
Probably a little overkill, but at least you can guarantee that a given image will always be served from the same domain.

Getting MSISDN from mobile browser headers

What is the best way of going about this? I need to get MSISDN data from users accessing a mobisite to enhance the user experience.
I understand not all gateways will populate the headers, but I would like to have MSISDN capture as option one before falling back on a cookie-based model.
I know this is an old post, but I'd like to give my contribution.
I work for a mobile carrier, and we have a feature where you can set parameters for header enrichment. We create filters to match certain traffic passing through the GGSN (GPRS gateway node); it then opens the packets at layer 7 (when the application layer is plain HTTP, not protected with SSL) and writes the MSISDN, IMSI and other parameters into them.
So it is a carrier-dependent feature.
While some operators do this, the representation and mechanism depend entirely on the operator. There is no standard way to do this.
If you are willing to pay for it, try http://Bango.com. They provide an API, but you may need to redirect the user to their service.
As others have said, there is no standard way between mobile operators for passing the MSISDN in the HTTP headers.
Different operators vary on the header name used; some operators do not pass the MSISDN unless they "authorize" your website, and others have more complicated means of passing it (e.g. redirects through their network to pick up the header).
Developing a site for one specific operator is easy enough; developing for multiple is next to impossible if you need to rely on the header.
