Varnish not making origin call for infrequently requested cache - varnish

I'm seeing behavior on Varnish 6.5 where it does not make backend calls according to the max-age Cache-Control directive in the origin response when the URL is requested infrequently by clients.
Below is the expected behavior, which I see for an object requested every second. The origin sends a Cache-Control header with max-age=20:
Request 1:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:02 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1183681 1512819
age: 17
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 2:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:04 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 891620 1512819
age: 19
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 3:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:05 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1183687 1512819
age: 20
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 4:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:06 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 854039 1183688
age: 1
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
You can see that Request 4 above triggered a new origin request: the second ID in x-varnish changed from 1512819 to 1183688 and the age dropped to 1.
Now if I wait a long while and make that same request, the cached object's age is far beyond max-age, yet Varnish does not make an origin request to cache a fresh object:
Request 5 after a while:
HTTP/2 200
date: Tue, 20 Jul 2021 02:10:08 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1512998 1183688
age: 482
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
I suppose I could start sending the Expires header from the origin, but I'm looking for an explanation of why Varnish behaves this way when the request is infrequent. Thanks.
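To put a number on the mismatch, the Age value can be compared with the max-age directive from the same response. This is only an illustrative sketch: the helper names are made up for this example, and it inspects the headers exactly as pasted above rather than anything inside Varnish.

```python
import re


def parse_headers(dump):
    """Turn 'name: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in dump.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skips the status line, which has no colon
            headers[name.strip().lower()] = value.strip()
    return headers


def seconds_past_max_age(headers):
    """Return Age minus max-age, or None if either value is missing."""
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match is None or "age" not in headers:
        return None
    return int(headers["age"]) - int(match.group(1))


# Relevant headers copied from Request 5 above.
request_5 = """
HTTP/2 200
cache-control: public, max-age=20
x-varnish: 1512998 1183688
age: 482
"""

print(seconds_past_max_age(parse_headers(request_5)))  # 462
```

A positive result means the object was served after its max-age had already elapsed.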

TTL header precedence in Varnish
Varnish does check the max-age directive, but other factors can cause the TTL to be an unexpected value.
Here's the TTL precedence:
The Cache-Control header's s-maxage directive is checked.
When there's no s-maxage, Varnish will look for max-age to set its TTL.
When there's no Cache-Control header being returned, Varnish will use the Expires header to set its TTL.
When none of the above apply, Varnish will use the default_ttl runtime parameter as the TTL value. Its default value is 120 seconds.
Only then will Varnish enter vcl_backend_response, letting you change the TTL.
Any TTL being set in VCL using set beresp.ttl will get the upper hand, regardless of any other value being set via response headers.
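The precedence above can be sketched in Python. This is a simplified model rather than Varnish's actual implementation: the function names are hypothetical, header names are assumed to be lowercase dictionary keys, Expires is measured against the response's Date header, and anything done later in vcl_backend_response (such as set beresp.ttl) is out of scope.

```python
from email.utils import parsedate_to_datetime

DEFAULT_TTL = 120.0  # default value of the default_ttl runtime parameter


def _directive(cache_control, name):
    """Return the integer value of a Cache-Control directive, or None."""
    for part in cache_control.split(","):
        key, _, value = part.strip().partition("=")
        value = value.strip().strip('"')
        if key.strip().lower() == name and value.isdigit():
            return int(value)
    return None


def _http_date(value):
    """Parse an HTTP date; return None when it is missing or invalid."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def initial_ttl(headers, default_ttl=DEFAULT_TTL):
    """Approximate the TTL chosen before vcl_backend_response runs."""
    cache_control = headers.get("cache-control", "")
    # Steps 1 and 2: s-maxage wins over max-age.
    for name in ("s-maxage", "max-age"):
        seconds = _directive(cache_control, name)
        if seconds is not None:
            return float(seconds)
    # Step 3: fall back to Expires, relative to Date (a simplification:
    # this sketch skips Expires when Date is missing or unparseable).
    date = _http_date(headers.get("date", ""))
    expires = _http_date(headers.get("expires", ""))
    if date is not None and expires is not None:
        return max(0.0, (expires - date).total_seconds())
    # Step 4: nothing usable in the headers.
    return default_ttl


print(initial_ttl({"cache-control": "public, max-age=20"}))  # 20.0
print(initial_ttl({}))                                       # 120.0
```

With the headers from the question, this model yields 20 seconds, which is why a much larger effective TTL points at something outside the response headers, such as the VCL.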
Your specific situation
The best way to figure out what's going on is by running varnishlog and adding a filter for the URL you want to track.
Here's an example for the homepage:
varnishlog -g request -q "ReqUrl eq '/'"
The output will be extremely verbose, but will contain all the info you need.
Tags that are of particular interest are:
TTL (see https://varnish-cache.org/docs/6.5/reference/vsl.html#varnish-shared-memory-logging)
BerespHeader (specifically the Cache-Control backend response header)
RespHeader (specifically the Cache-Control response header)
Please also have a look at your VCL and check whether or not the TTL is changed by set beresp.ttl =.
What I need to help you
In summary, if you want further assistance, please provide your full VCL, as well as a varnishlog extract for the transactions that are giving you the unexpected behavior.
Based on that information, we'll have a pretty good idea what's going on.

Related

CDN - Serve different content-type based on Accept header (Verizon/EdgeCast Premium)?

I have a server which returns a different response based on the Accept header e.g. if Accept header includes "image/webp", a webp image is served, otherwise a jpg is served.
We run Varnish at server-level and it does this correctly, as per example below:
Request (with image/webp in Accept header):
curl -s -D - -o /dev/null "https://REDACTED/media/tokinoha_bowl-4.jpg?sh=2&fmt=webp,jpg" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
Response (webp image served):
HTTP/2 200
date: Wed, 06 Feb 2019 08:25:05 GMT
content-type: image/webp
access-control-allow-origin: *
cache-control: public, s-maxage=31536000, max-age=31536000
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
strict-transport-security: max-age=31536000; includeSubDomains
vary: Accept-Encoding, Accept-Encoding,Origin
referrer-policy: strict-origin-when-cross-origin
accept-ranges: bytes
content-length: 60028
Request (no webp in Accept header, jpg served):
curl -s -D - -o /dev/null "https://REDACTED/media/tokinoha_bowl-4.jpg?sh=2&fmt=webp,jpg" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/apng,*/*;q=0.8"
Response:
HTTP/2 200
date: Wed, 06 Feb 2019 08:25:18 GMT
content-type: image/jpeg
access-control-allow-origin: *
cache-control: public, s-maxage=31536000, max-age=31536000
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
strict-transport-security: max-age=31536000; includeSubDomains
vary: Accept-Encoding, Accept-Encoding,Origin
referrer-policy: strict-origin-when-cross-origin
accept-ranges: bytes
content-length: 166991
We have the below options in the Rules Engine set up, however whichever content-type is cached first is served on all subsequent requests irrespective of request header.
Rules Engine settings
Does anyone know of a way to achieve this?
Thanks in advance!
We had the same problem with Verizon/Edgecast: one URL delivered two different image types (JPEG and WebP) depending on the Accept header. The origin (imgix) correctly sent Vary: Accept, but Edgecast ignored it and cached whatever it received, so browsers without WebP support sometimes got the wrong format.
We solved it with a rule in Edgecast:
WebP rule
The query parameter auto is always part of the URL and can therefore always be removed from the cache key. With the second query parameter, varyWebP, we identify these URLs unambiguously and prevent a collision with URLs that lack the auto query parameter.
Without it, the URL
https://[HOST]/[PATH]?a=1&b=2&c=3&auto=compress,format
creates the same cache key as:
https://[HOST]/[PATH]?a=1&b=2&c=3
That's why the query parameter varyWebP protects us.
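The effect of such a rule on the cache key can be sketched as follows. This is an illustrative model only, not Edgecast Rules Engine syntax: the function name, the example.com host, and the varyWebP values 1 and 0 are assumptions made for the sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_key(url, accept_header):
    """Model a cache key that drops `auto` and adds a varyWebP marker."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    had_auto = any(name == "auto" for name, _ in query)
    # `auto` never changes per client, so it is removed from the key.
    kept = [(name, value) for name, value in query if name != "auto"]
    if had_auto:
        # Assumed encoding: 1 when the client accepts WebP, 0 otherwise.
        wants_webp = "image/webp" in accept_header.lower()
        kept.append(("varyWebP", "1" if wants_webp else "0"))
    return urlunsplit(parts._replace(query=urlencode(kept)))


with_auto = "https://example.com/img.jpg?a=1&b=2&c=3&auto=compress,format"
without_auto = "https://example.com/img.jpg?a=1&b=2&c=3"

print(cache_key(with_auto, "image/webp,*/*"))  # ...?a=1&b=2&c=3&varyWebP=1
print(cache_key(with_auto, "*/*"))             # ...?a=1&b=2&c=3&varyWebP=0
print(cache_key(without_auto, "*/*"))          # ...?a=1&b=2&c=3
```

The three keys differ, so the WebP and JPEG variants are cached separately and neither collides with the URL that never carried auto.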

cloudfront Cache-Control headers are different than origin headers

I'm seeing a situation where requests through CloudFront return a different Cache-Control header than my origin does. I have Object Caching set to "Use Origin Cache Headers" and (I don't think this is relevant) Compress Objects Automatically set to "No".
I've found that if I change Object Caching to "Customize" and change the values around, that does in fact change the headers returned from the CDN. That's okay and all... but I'm curious to know why, with my existing settings, this header isn't being passed through.
Thanks!
Compressed Request from Origin - shows Cache-Control of '31536000'
(05:34 PM) jsharpe#mbp:~ curl -I https://staging.testing.com/assets/application-0d5691ba401c3f5a305fda52745a831376545a605a6c16e50fc838fdaa567e57.css --compressed
HTTP/1.1 200 OK
Server: Cowboy
Date: Wed, 16 Aug 2017 21:34:22 GMT
Connection: keep-alive
Last-Modified: Wed, 16 Aug 2017 05:05:25 GMT
Content-Type: text/css
Cache-Control: public, max-age=31536000
Content-Encoding: gzip
Vary: Accept-Encoding, Origin
Content-Length: 33563
Via: 1.1 vegur
Compressed Request from CDN - shows Cache-Control of '86400'
(05:34 PM) jsharpe#mbp:~ curl -I https://staging-cdn.testing.com/assets/application-0d5691ba401c3f5a305fda52745a831376545a605a6c16e50fc838fdaa567e57.css --compressed
HTTP/1.1 200 OK
Content-Type: text/css
Content-Length: 33563
Connection: keep-alive
Server: Cowboy
Date: Wed, 16 Aug 2017 05:07:12 GMT
Last-Modified: Wed, 16 Aug 2017 05:05:25 GMT
Cache-Control: public, max-age=86400
Content-Encoding: gzip
Via: 1.1 vegur, 1.1 7d327ef7e21429ba6a44eb6374c976f3.cloudfront.net (CloudFront)
Vary: Accept-Encoding
Age: 59233
X-Cache: Hit from cloudfront
X-Amz-Cf-Id: TEqKbQ5ZYySY7m8rDft_MAlygEiam6gYvzrXBpS7D2DrBNbVUZ1y3Q==

Cache-Control and Pragma Response headers while reading files from Data Lakes

Why do we get the below 2 headers while trying to read files stored in Azure Data Lake using REST API?
Cache-Control →no-cache, no-cache, no-store, max-age=0 (Why do we have multiple no-cache)
Pragma →no-cache
Why are these headers being set, and how can we override them so that we can cache the responses?
Below is my curl request
curl -v -X GET -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IlJyUXF1OXJ5ZEJWUldtY29jdVhVYjIwSEdSTSIsImtpZCI6IlJyUXF1OXJ5ZEJWUldtY29jdVhVYjIwSEdSTSJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuY29yZS53aW5kb3dzLm5ldC8iLCJpc3MiOiJodHRwczovL3N0cy53aW5kb3dzLm5ldC8wNmZhNTQ1Yy01MmQ3LTRiOGMtYjBmNy03MzQ5MTFiNDA0MmEvIiwiaWF0IjoxNDgxMTc3MTE3LCJuYmYiOjE0ODExNzcxMTcsImV4cCI6MTQ4MTE4MTAxNywiYXBwaWQiOiI3MzU0NDBhNy1kM2RmLTQ0YjEtYTk2Yy0wMTlhMzE5NmEwYmYiLCJhcHBpZGFjciI6IjEiLCJpZHAiOiJodHRwczovL3N0cy53aW5kb3dzLm5ldC8wNmZhNTQ1Yy01MmQ3LTRiOGMtYjBmNy03MzQ5MTFiNDA0MmEvIiwib2lkIjoiNTM4OTQyMjQtNDIyNC00MTllLTgxNzAtMzQ3NTQwNGI2NGFlIiwic3ViIjoiNTM4OTQyMjQtNDIyNC00MTllLTgxNzAtMzQ3NTQwNGI2NGFlIiwidGlkIjoiMDZmYTU0NWMtNTJkNy00YjhjLWIwZjctNzM0OTExYjQwNDJhIiwidmVyIjoiMS4wIn0.TrkCayxF0MJbXe7SPc8ZtMx8Aw07Plv0PE1KDAUw1hjHBgmTE95y0ivA2qKpmkvbLkreaGICmzc-4DPNcPBgQFHaiHzS9MoiC6c0mOO_0oOw7FRsbDYnL-P03_MEoHYDas7o2BC88ruZlHHePmoOHqwwXwBOgr6si5RwRmFz7InJpfILqENKD-fk2uWBWfQ1JU3xvmVLUgeoToFK-q7Xs
g6eHgW84S4gGF7xuvjz2ogduxmhaV18A80rFFRFk70uHXllFcDylHKXPqgRJ9dfHswZEczxQSQCI2hH5XTn72xMUI0ygIFX4mPjwPQhxPAaygMLxYBOhG5gNm1vBAsJww" "https://signstorage.azuredatalakestore.net/webhdfs/v1/signsdata/test.txt?op=OPEN&api-version=2016-11-01&read=true"
Response
File contents and Response Headers are
HTTP/1.1 200 OK
Cache-Control: no-cache, no-cache, no-store, max-age=0
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: application/octet-stream
Expires: -1
x-ms-request-id: 302fd601-0eca-4db0-a2de-cc2ee5d951d8
x-ms-webhdfs-version: 16.07.18.01
Status: 0x0
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15724800; includeSubDomains
Date: Thu, 08 Dec 2016 06:27:31 GMT
For security reasons, ADLS doesn't support caching mechanisms, hence the headers that you see are sent by the service to limit caching.  
Thank you,
Guy

How can I get Varnish to hit on requests of static files on Cloudcontrol?

I'm serving static files (images, JavaScript, CSS files) from a (hopefully) cookieless domain that is also mapped to my cloudControl deployment. Here are the request and response headers. I see no Cookie header in the request, and the ETag and date checks should be satisfied, so I would expect the Varnish proxy in front of the cloudControl deployment to handle the request and serve it from cache. But every time I try it, all static files are served from the Apache processes according to the response headers. Any tips appreciated.
Request URL:http://static.hotelpress.mobi/bundles/viermediamagazine/icons/social/Facebook_64.png
Request Method:GET
Status Code:304 Not Modified
Request Headers
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Host:static.hotelpress.mobi
If-Modified-Since:Sat, 20 Apr 2013 18:23:31 GMT
If-None-Match:"6008d436-1108-4daceeec74ec0"
Referer:---stripped out or my boss kills me---
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
Response Headers
Accept-Ranges:bytes
Age:0
Connection:keep-alive
Date:Sat, 20 Apr 2013 18:31:33 GMT
ETag:"6008d436-1108-4daceeec74ec0"
Last-Modified:Sat, 20 Apr 2013 18:23:31 GMT
Server:Apache
Via:1.1 varnish
X-Varnish:995972028
X-varnish-cache:MISS
Assuming that Varnish is passing through all your Apache headers, it appears that you are not setting any headers telling Varnish to cache.
Varnish does cache silently for 2 minutes by default with no headers, but you probably want more than that.
You should also remove the Etag, for the reasons you say. More information on Etags is here.
If you have fingerprinted assets (per deploy/change), you should set those in Apache for 1 year.
Any others can be as long as you can stand (remembering that this may stop you frequently updating those assets, because they may be cached somewhere).
Here are the lines you need in Apache:
<LocationMatch "^/path/to/fingerprinted/assets/.*$">
    # Drop ETags so caches rely on expiry alone
    Header unset ETag
    FileETag None
    # RFC says only cache for 1 year
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    # Mark responses as cacheable by shared caches
    Header append Cache-Control "public"
</LocationMatch>
and for others:
<LocationMatch "^/bundles/viermediamagazine/icons/.*$">
    Header unset ETag
    FileETag None
    ExpiresActive On
    ExpiresDefault "access plus 1 week"
    Header append Cache-Control "public"
</LocationMatch>
You can use as many locations as you want - just make sure they do not overlap!
The example request you posted contains
Cache-Control:max-age=0
which prevents cached answers, iirc. You could also check whether setting a Cache-Control: max-age=<x> header in your response helps.
Extending the other answers: here's a sample request to an app on cloudControl that caches (when ?c=1 is set). In any case, send the request multiple times until you get hits consistently, to make sure all Varnish instances have cached the response.
$ curl -v http://impresstw.cloudcontrolled.com/?c=1
* About to connect() to impresstw.cloudcontrolled.com port 80 (#0)
* Trying 46.137.184.215...
* connected
* Connected to impresstw.cloudcontrolled.com (46.137.184.215) port 80 (#0)
> GET /?c=1 HTTP/1.1
> User-Agent: curl/7.27.0
> Host: impresstw.cloudcontrolled.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Server: TornadoServer/2.4.1
< Cache-Control: max-age=36000, must-revalidate
< Expires: Tue, 23 Apr 2013 20:18:12 GMT
< Content-Length: 13
< Accept-Ranges: bytes
< Date: Tue, 23 Apr 2013 10:18:28 GMT
< X-Varnish: 1434600184 1434599691
< Age: 16
< Via: 1.1 varnish
< Connection: keep-alive
< X-varnish-cache: HIT
<
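One way to tell hits from misses when repeating the request is the X-Varnish header: it carries two IDs when the response came from cache (the current request and the request that originally stored the object) and a single ID otherwise. The sketch below works on header values you have already collected; the function names and the threshold of three consecutive hits are assumptions for illustration.

```python
def classify_x_varnish(value):
    """Return ("HIT", storing_request_id) or ("MISS", None).

    "MISS" here simply means "not served from cache"; a passed
    request also shows a single ID.
    """
    ids = value.split()
    if len(ids) == 2:
        return "HIT", int(ids[1])
    return "MISS", None


def consistently_hitting(values, needed=3):
    """True when the last `needed` X-Varnish values all indicate a hit."""
    recent = values[-needed:]
    return len(recent) == needed and all(
        classify_x_varnish(value)[0] == "HIT" for value in recent
    )


# Values taken from the two cloudControl responses shown in this thread.
print(classify_x_varnish("995972028"))              # ('MISS', None)
print(classify_x_varnish("1434600184 1434599691"))  # ('HIT', 1434599691)
```

Feeding consistently_hitting the X-Varnish values from successive responses gives a simple stop condition for the "repeat until you get hits" advice.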

Foursquare venue photos API only occasionally working with client_id/client_secret?

I've found that some venues will only return photos if I use a signed in user instead of a client_id / client_secret. Is this intentional?
curl -i https://api.foursquare.com/v2/venues/4c36476d93db0f47f6cc1d92/photos?client_id=xxx\&client_secret=xxx\&group=venue\&v=20120304
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: no-cache, private, no-store
Content-Type: application/json; charset=utf-8
Date: Mon, 05 Mar 2012 00:28:34 GMT
Expires: Mon, 5 Mar 2012 00:28:34 GMT
Pragma: no-cache
Server: nginx/0.8.52
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
Content-Length: 66
Connection: keep-alive
{"meta":{"code":200},"response":{"photos":{"count":0,"items":[]}}}
curl -i https://api.foursquare.com/v2/venues/4c36476d93db0f47f6cc1d92/photos?group=venue\&v=20120304\&oauth_token=xxx\&v=20120304
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: no-cache, private, no-store
Content-Type: application/json; charset=utf-8
Date: Mon, 05 Mar 2012 00:29:19 GMT
Expires: Mon, 5 Mar 2012 00:29:19 GMT
Pragma: no-cache
Server: nginx/0.8.52
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 1000
Content-Length: 15311
Connection: keep-alive
{"meta":{"code":200},"notifications":[{"type":"notificationTray","item":{"unreadCount":0}}],"response":{"photos":{"count":14,"items":[lots of images here]}}}
I want to fetch a photo to associate with a given place as a background process, not tied to the specific user. Is it intended that this API only functions correctly for signed in users?
Looks like there's a bug in userless access to /venues/photos. The team is investigating. The intended behavior is that userless access of that endpoint returns all public photos attached to that venue.
