How to resolve Varnish FetchError "Timed out reusing backend connection"

I am seeing frequent Varnish FetchError "Timed out reusing backend connection" errors. I checked a couple of blog posts, but could not find a resolution. Could you please help? This is the varnishlog output:
BereqHeader Accept-Encoding: gzip
BereqHeader X-Varnish: 37849780
VCL_call BACKEND_FETCH
VCL_return fetch
BackendOpen 36 NODEJS_2 xx.xx.xx.xx 9000 yy.yy.yy.yy 43309
Timestamp Bereq: 1605444526.456709 0.000102 0.000102
FetchError Timed out reusing backend connection
BackendClose 36 NODEJS_2
Timestamp Beresp: 1605444571.456893 45.000285 45.000183
Timestamp Error: 1605444571.456900 45.000292 0.000006
BerespProtocol HTTP/1.1
BerespStatus 503
BerespReason Backend fetch failed
BerespHeader Date: Sun, 15 Nov 2020 12:49:31 GMT
BerespHeader Server: Varnish
VCL_call BACKEND_ERROR
BerespHeader Content-Type: text/html; charset=utf-8
BerespHeader Retry-After: 5
VCL_return deliver
Storage malloc Transient
Length 285
BereqAcct 2940 185 3125 0 0 0
End

The Timestamp Beresp: 1605444571.456893 45.000285 45.000183 tag in your VSL output indicates that Varnish waited 45 seconds for the first byte of the backend response, at which point the first_byte_timeout kicked in.
In reality, your backend probably needed even more than 45 seconds to generate the output, but Varnish gave up as soon as it hit the timeout.
Here are your options:

Increase the first_byte_timeout runtime parameter to a higher value.
Examine why your backend is taking so long.

Although option 1 is theoretically viable, you really want to go for option 2 and figure out why it takes the backend so long to respond.
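If you do raise the timeout, note that first_byte_timeout can also be set per backend in VCL, which keeps the change scoped to the slow service. A minimal sketch, assuming a backend definition matching the NODEJS_2 name and port 9000 from the log (the host address is a placeholder, and 90 seconds is just an example value):

```vcl
vcl 4.0;

backend nodejs_2 {
    .host = "xx.xx.xx.xx";      # placeholder, masked in the log above
    .port = "9000";
    # Wait up to 90 seconds for the first byte of the response
    .first_byte_timeout = 90s;
}
```

The global default can likewise be changed at startup with varnishd -p first_byte_timeout=90, but that affects every backend, and fixing the slow backend remains the better option.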

Related

Varnish not making origin call for infrequently requested cache

I'm noticing this behavior on Varnish 6.5: it does not make backend calls per the max-age cache-control header from the origin response if the resource is not frequently requested by clients.
Below is the expected behavior I see for a resource requested every second. It has a 20-second max-age cache-control header from the origin:
Request 1:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:02 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1183681 1512819
age: 17
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 2:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:04 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 891620 1512819
age: 19
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 3:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:05 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1183687 1512819
age: 20
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
Request 4:
HTTP/2 200
date: Tue, 20 Jul 2021 02:02:06 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 854039 1183688
age: 1
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
You can see that Request #4 above triggers a new origin request, the backend request id being 1183688.
Now if I wait a long while and make that same request, the cached object is quite old, yet Varnish does not make an origin request to fetch a fresh one:
Request 5 after a while:
HTTP/2 200
date: Tue, 20 Jul 2021 02:10:08 GMT
content-type: application/json
content-length: 33692
server: Apache/2.4.25 (Debian)
x-ua-compatible: IE=edge;chrome=1
pragma:
cache-control: public, max-age=20
x-varnish: 1512998 1183688
age: 482
via: 1.1 varnish (Varnish/6.5)
vary: Accept-Encoding
x-cache: HIT
accept-ranges: bytes
I suppose I could start adding an Expires header from the origin, but I'm looking for an explanation of why Varnish behaves this way when the request is infrequent. Thanks.
TTL header precedence in Varnish
Varnish does check the max-age directive, but other factors can cause the TTL to end up with an unexpected value.
Here's the TTL precedence:
1. The Cache-Control header's s-maxage directive is checked.
2. When there's no s-maxage, Varnish will look for max-age to set its TTL.
3. When there's no Cache-Control header being returned, Varnish will use the Expires header to set its TTL.
4. When none of the above apply, Varnish will use the default_ttl runtime parameter as the TTL value. Its default value is 120 seconds.
Only then will Varnish enter vcl_backend_response, letting you change the TTL.
Any TTL being set in VCL using set beresp.ttl will get the upper hand, regardless of any other value being set via response headers.
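Such a VCL override would look like the following sketch (the one-hour value is purely illustrative):

```vcl
sub vcl_backend_response {
    # Overrides s-maxage, max-age, Expires and default_ttl
    set beresp.ttl = 1h;
}
```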
Your specific situation
The best way to figure out what's going on is by running varnishlog and adding a filter for the URL you want to track.
Here's an example for the homepage:
varnishlog -g request -q "ReqUrl eq '/'"
The output will be extremely verbose, but will contain all the info you need.
Tags that are of particular interest are:
TTL (see https://varnish-cache.org/docs/6.5/reference/vsl.html#varnish-shared-memory-logging)
BerespHeader (specifically the Cache-Control backend response header)
RespHeader (specifically the Cache-Control response header)
Please also have a look at your VCL and check whether or not the TTL is changed by set beresp.ttl =.
What I need to help you
In summary, if you want further assistance, please provide your full VCL, as well as a varnishlog extract for the transaction that is giving you the unexpected behavior.
Based on that information, we'll have a pretty good idea what's going on.

Azure CDN - Images Respond 404 to CURL

We have a vendor who sends us photos that are hosted on Azure Edge. These photos are available and I can download them, but if we make a curl request we get a 404 roughly 4 out of 5 times. If we do a HEAD request to get the file size, we get a 404 about 7 out of 10 times. On our production server, we get a 404 100% of the time. Any idea how we might work around this, or if there's another way to check these files, without the vendor having to fix their issue?
Sample file:
curl -I http://tdrvehicles2.azureedge.net/photos/202008/1419/1850/f253435f-86b1-4cc4-b95c-7756addddad4.jpg
HTTP/1.1 404 Not Found
Pragma: no-cache
Content-Length: 0
Server: Microsoft-IIS/10.0
X-Powered-By: ASP.NET
Cache-Control: max-age=31536000
Expires: Thu, 19 Aug 2021 14:12:54 GMT
Date: Wed, 19 Aug 2020 14:12:54 GMT
Connection: keep-alive

Varnish FetchError for up to 30s after reload

I have a Varnish 6 setup with 26 backends. After a RAM upgrade, it throws 503 errors for about 15-30 seconds after a reload, and varnishlog says: FetchError backend reload_20190417_131210_1488.server15: unhealthy
Full headers from varnishlog:
<< BeReq >> 106235039
Begin bereq 106235038 fetch
Timestamp Start: 1555506951.751066 0.000000 0.000000
BereqMethod GET
BereqURL /_files/b6/ee/59/4f/af/b6ee594fafd3f13556216d89452f3dd4_1.jpg
BereqProtocol HTTP/1.1
BereqHeader User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
BereqHeader Accept: image/webp,image/apng,image/*,*/*;q=0.8
BereqHeader Referer: http://www.example.com/
BereqHeader Accept-Language: lv
BereqHeader x-range: bytes=1135466-1135466
BereqHeader grace: none
BereqHeader X-Forwarded-For: 84.237.232.159
BereqHeader host: www.example.com
BereqHeader Surrogate-Capability: key=ESI/1.0
BereqHeader Accept-Encoding: gzip
BereqHeader X-Varnish: 106235039
VCL_call BACKEND_FETCH
VCL_return fetch
FetchError backend reload_20190417_131210_1488.server15: unhealthy
Timestamp Beresp: 1555506951.751106 0.000040 0.000040
Timestamp Error: 1555506951.751111 0.000045 0.000005
BerespProtocol HTTP/1.1
BerespStatus 503
BerespReason Service Unavailable
BerespReason Backend fetch failed
BerespHeader Date: Wed, 17 Apr 2019 13:15:51 GMT
BerespHeader Server: Varnish
VCL_call BACKEND_ERROR
BerespHeader Content-Type: text/html; charset=utf-8
BerespHeader Retry-After: 5
VCL_return deliver
Storage malloc Transient
Length 286
BereqAcct 0 0 0 0 0 0
End
We had 16 GB of RAM with an 8 GB malloc storage, and now it is 32 GB with 23 GB malloc. We are using Varnish 6 with VSF, so it is a pretty complex setup, but it worked just fine before. The VCL compiles without any errors, yet after a reload some domains get 503 "Backend fetch failed" responses.
The FetchError is pretty clear: the backend is considered sick. Check it using varnishadm backend.list; it should tell you which backend probes are failing.
Showing your backend definition would help too.
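For reference, a backend definition with a health probe might look like the sketch below. All values here are assumptions (the host, port, and /health endpoint are placeholders); the .initial probe parameter is worth checking, because it controls how many good probe points the backend starts out with right after a (re)load:

```vcl
vcl 4.0;

backend server15 {
    .host = "192.0.2.15";       # placeholder address
    .port = "8080";             # placeholder port
    .probe = {
        .url = "/health";       # assumed health-check endpoint
        .interval = 5s;
        .timeout = 2s;
        .window = 5;
        .threshold = 3;
        # Start with enough good points to be considered healthy
        # immediately after a reload, instead of waiting for probes
        .initial = 3;
    }
}
```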

TYPO3 static file cache doesn't work for each request

I have a strange problem with the static file cache and TYPO3. It works, but not all the time. While testing the delivered pages I noticed that the cached HTML is not delivered every time.
I then ran several tests in different browsers which were never logged in to the TYPO3 backend. Everywhere the same: some pages are cached, some not.
Even when I make multiple requests to the same page, e.g. via curl, sometimes I get the static variant and sometimes TYPO3 renders the page. But it's not reproducible.
I've created a small shell script to automate this testing for me:
#!/bin/sh
url="https://domain.tld/foo/bar/"
for i in $(seq 10); do
    echo "$i"
    curl -s "$url" | grep statically
done
Expected behaviour would be to get the static page on each request, but reality looks like this:
➜ ~ ./test.sh
1
<!-- cached statically on: 14-12-16 08:46 -->
2
3
4
5
<!-- cached statically on: 14-12-16 08:46 -->
6
<!-- cached statically on: 14-12-16 08:46 -->
7
8
9
<!-- cached statically on: 14-12-16 08:46 -->
10
<!-- cached statically on: 14-12-16 08:46 -->
This is what the access log looks like:
xx.xxx.xx.xx - 14/Dec/2016:11:52:44 +0100 GET /foo/bar/ HTTP/1.1 200 26892 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:44 +0100 GET /foo/bar/ HTTP/1.1 200 26892 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:43 +0100 GET /foo/bar/ HTTP/1.1 200 26892 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:43 +0100 GET /foo/bar/ HTTP/1.1 200 26661 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:43 +0100 GET /foo/bar/ HTTP/1.1 200 26661 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:42 +0100 GET /foo/bar/ HTTP/1.1 200 26661 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:42 +0100 GET /foo/bar/ HTTP/1.1 200 26661 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:42 +0100 GET /foo/bar/ HTTP/1.1 200 26661 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:42 +0100 GET /foo/bar/ HTTP/1.1 200 26892 - curl/7.43.0
xx.xxx.xx.xx - 14/Dec/2016:11:52:41 +0100 GET /foo/bar/ HTTP/1.1 200 26892 - curl/7.43.0
The only difference is the size of the delivered content.
This is what the response headers look like. The first response is broken, the second one is correct:
➜ ~ curl -I https://domain.tld/foo/bar/
HTTP/1.1 200 OK
Date: Wed, 14 Dec 2016 12:14:17 GMT
Server: Apache/2.4.20
X-Powered-By: PHP/7.0.6
Content-Language: de
Content-Length: 21850
Strict-Transport-Security: max-age=31536000
Connection: close
Content-Type: text/html; charset=utf-8
➜ Downloads curl -I https://domain.tld/foo/bar/
HTTP/1.1 200 OK
Date: Wed, 14 Dec 2016 12:14:19 GMT
Server: Apache/2.4.20
Strict-Transport-Security: max-age=31536000
Vary: Host,Accept-Encoding
Last-Modified: Wed, 14 Dec 2016 07:46:38 GMT
Accept-Ranges: bytes
Content-Length: 21932
Cache-Control: max-age=38741
Expires: Wed, 14 Dec 2016 23:00:01 GMT
X-UA-Compatible: IE=edge
X-Content-Type-Options: nosniff
Connection: close
Content-Type: text/html; charset=utf-8
My .htaccess file with the mod_rewrite rules can be found here:
http://pastebin.com/5G6f3b4W
It's basically the default .htaccess file shipped with TYPO3 7.6 plus some custom additions.
Since the hosting is done on a managed server, there is no access to the vhost configuration to enable RewriteLog.
Tl;dr
the creation of static cache files works; the files are present in the file system
it is not reproducible why the static files aren't delivered on every request
I'm now looking for hints on what I can do to further track down the problem.

Varnish 503 after 200 from backend

I have a Varnish 4.0.3 server on Centos 7.2. Varnish has three backends configured. I am receiving intermittent 503's from Varnish. I have pulled a tcpdump during a 503 event, and I saw:
Consumer makes request to Varnish
Varnish opens socket to backend.
Backend responds in < 500ms
Varnish sends a ACK,FIN to the Backend.
Varnish sends a 503 to the consumer.
Backend sends a ACK,FIN to Varnish
The requests that fail do not appear fundamentally different from the requests that succeed. The failure rate is roughly 1 per 20,000 requests.
- Begin req 2795361 rxreq
- Timestamp Start: 1464106437.502383 0.000000 0.000000
- Timestamp Req: 1464106437.502383 0.000000 0.000000
- ReqStart 10.14.X.X 43190
- ReqMethod GET
- ReqURL /service/v2/service/parameter/parameter/parameter
- ReqProtocol HTTP/1.1
- ReqHeader Accept: application/json
- ReqHeader Content-Type: application/json
- ReqHeader Host: UpsteamLoadBalancer:6081
- ReqHeader Connection: Keep-Alive
- ReqHeader User-Agent: Apache-HttpClient/4.2.4 (java 1.5)
- ReqHeader X-Forwarded-For: 10.14.X.X
- VCL_call RECV
- ReqURL /service/v2/service/parameter/parameter/parameter
- ReqUnset X-Forwarded-For: 10.14.X.X
- ReqHeader X-Forwarded-For: 10.14.X.X
- VCL_return hash
- VCL_call HASH
- VCL_return lookup
- Debug "XXXX MISS"
- VCL_call MISS
- VCL_return fetch
- Link bereq 2795368 fetch
- Timestamp Fetch: 1464106442.526296 5.023913 5.023913
- Timestamp Process: 1464106442.526311 5.023929 0.000015
- RespHeader Date: Tue, 24 May 2016 16:14:02 GMT
- RespHeader Server: Varnish
- RespHeader X-Varnish: 2795367
- RespProtocol HTTP/1.1
- RespStatus 503
- RespReason Service Unavailable
- RespReason Service Unavailable
- VCL_call SYNTH
- RespHeader Content-Type: text/html; charset=utf-8
- RespHeader Retry-After: 5
- VCL_return deliver
- RespHeader Content-Length: 281
- Debug "RES_MODE 2"
- RespHeader Connection: keep-alive
- Timestamp Resp: 1464106442.526356 5.023974 0.000045
- ReqAcct 290 0 290 211 281 492
- End
Your client is using HTTP to communicate with Varnish.
HTTP status 503 means: "The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay."
In your log, this 503 is synthesized by Varnish itself because the backend fetch failed; the Link bereq 2795368 fetch line points at the backend transaction, so inspect it with varnishlog -g request to see why the fetch failed.
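If the failures turn out to be Varnish hitting a backend connection that was closed on the other side just as the request went out, one common mitigation is to retry the fetch once in VCL before synthesizing the 503. A minimal sketch, under the assumption that a single retry is safe for these GET requests:

```vcl
sub vcl_backend_error {
    # Retry the fetch once before delivering a synthetic 503
    if (bereq.retries < 1) {
        return (retry);
    }
}
```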
