Varnish Pass - request coalescing - varnish

I have a Varnish 4 setup: nginx SSL termination -> Varnish -> Varnish round-robin to 4 Apache backends.
We basically need to not cache any request where a specific cookie isn't set on the incoming request, so in my vcl_recv I have:
if (req.http.Cookie !~ "cookiename") {
    return (pass);
}
This works fine initially, but as it is a busy site, over time (10 minutes or so) our backend failures and busy sleep/wakeup counters increase and we get 503s from Varnish itself, even though the backends are fine and don't appear to be under any real load. This makes me think that the requests are being queued and sent sequentially to the backends, skipping any request coalescing.
I can't really find anything to support this. Is this the case? Or is there a better way to do this? I would appreciate the feedback.
Thanks

Passed requests aren't request coalescing candidates. Request coalescing only applies to cacheable resources.
This means requests that go through vcl_miss, but that don't end up becoming Hit-For-Miss/Hit-For-Pass objects in vcl_backend_response.
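For reference, this is roughly how such an uncacheable marker object gets created on Varnish 4 (a sketch mirroring what the built-in VCL does; the Set-Cookie condition is only an example to adapt to your own cacheability rules):

sub vcl_backend_response {
    # Example condition: responses that set a cookie are not cacheable.
    if (beresp.http.Set-Cookie) {
        # Remember "uncacheable" for 2 minutes, so later requests for this
        # object skip the waiting list and go straight to the backend
        # instead of being coalesced behind a busy fetch.
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }
}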
Please use the following command to monitor potential HTTP 503 errors:
varnishlog -g request -q "BerespStatus == 503"
It will allow you to figure out why the error is taking place.

Related

Send HTTP request at exact time in future with Nodejs

I need to make a POST HTTP request at an exact timestamp in the future, as accurately as possible, down to the millisecond. But there is network latency as well. How can I achieve such a goal?
setTimeout is not enough here, because the request always takes some time in flight, so it arrives late due to varying network latency. And firing the request before the target timestamp may result in it arriving early.
My goal is for the request to be guaranteed to arrive at the server after the target timestamp, but as soon after it as possible. Could you suggest any solutions with Node.js?
The best you can do in nodejs (which is not a real-time system) is to do the following:
1. Premeasure the expected latency so you know roughly how far in advance to send the request.
2. Use setTimeout() to schedule the send at precisely the one-way latency time before your target time. There is no other mechanism in nodejs that would be more precise (a rough sketch of points 1 and 2 follows this list).
3. If your request involves a DNS lookup, you can prefetch the IP address for your hostname and take the DNS lookup time out of your request cycle, or at least prime the local DNS cache.
4. Create a dedicated nodejs program that does nothing else, so its event loop will not be doing anything else at the time the setTimeout() needs to run. You could run this as a child_process from your larger program if desired.
5. Run a number of tests to see how the timing works and, if you are consistently off by some margin, adjust your latency offset.
6. You can develop a regular latency test to determine whether the latency changes over time.
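As a rough sketch of points 1 and 2 (host, path and target time are placeholders, the one-way latency is approximated as half of a single HEAD round trip, and none of this can guarantee arrival strictly after the target timestamp):

// sketch.js - try to make a POST land at the server at roughly targetTime
const https = require('https');

const host = 'example.com';                                // placeholder
const targetTime = Date.parse('2030-01-01T00:00:00.000Z'); // placeholder

function estimateOneWayLatency() {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const req = https.request({ host, method: 'HEAD', path: '/' }, (res) => {
      res.resume();                       // discard the response body
      resolve((Date.now() - start) / 2);  // one-way ~ half the round trip
    });
    req.on('error', reject);
    req.end();
  });
}

estimateOneWayLatency().then((latency) => {
  // Send the request one-way-latency milliseconds before the target time.
  const delay = Math.max(0, targetTime - latency - Date.now());
  setTimeout(() => {
    const req = https.request({ host, method: 'POST', path: '/endpoint' }, (res) => {
      console.log('status:', res.statusCode);
      res.resume();
    });
    req.on('error', console.error);
    req.end(JSON.stringify({ sentAt: Date.now() }));
  }, delay);
});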
As others have said, there is no way to predict what the natural response time of the target server will be (how long it takes to start processing your request from the moment your network packets arrive there). If lots of incoming requests are all racing for the same time slot, then your request will get interleaved among all the others and served in some order that you do not control.
Other things you can consider: if the target server supports the newer HTTP specifications, you can have a pre-established HTTP connection with the host (perhaps targeting some other endpoint) that will be kept alive for you to send your precisely timed request on. This would take some experimentation to figure out what the target host supports and whether this would work.
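If the host does turn out to support connection reuse, a keep-alive agent in Node can hold that connection open ahead of time (host and paths are placeholders):

const https = require('https');

// Reuse sockets so the timed request doesn't pay the TCP/TLS setup cost.
const agent = new https.Agent({ keepAlive: true });

// Warm the connection well before the target time.
https.get({ host: 'example.com', path: '/health', agent }, (res) => res.resume());

// Later, pass the same agent to the timed request so it reuses the open socket:
// https.request({ host: 'example.com', path: '/endpoint', method: 'POST', agent }, ...)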

Setting up High Availability Infrastructure

[Problem Statement]
We have a Tier 0 service with an HAProxy LB and multiple backend servers configured behind it. Currently the infrastructure serves a P99 of ~100 ms, and we are aiming for 100% availability and zero downtime. Sometimes some of the backend servers misbehave or drop out of the LB, and at that moment all requests that landed on those backend servers time out.
So we are looking for a configuration where, if any request on a server takes more than 100 ms, that same request can be routed to another backend server, so that we can get close to 100% of requests completing without timeouts.
[Disclaimer]
I understand that if the request still times out after a certain number of retries, the timeout will be served to the end consumer of our Tier 0 service.
[Tech Stack]
HAProxy
Java
MySQL
Azure
I would appreciate a discussion of this problem. I have searched a lot but haven't found any reference for the approach I have in mind, though it may well be achievable in other ways, as long as we get to no downtime within the defined SLA of the service.
Thanks
The option redispatch directive sends a request to a different server.
The retry-on directive states what type of errors to retry on.
The retries directive states how many times to retry.
option redispatch 1
retry-on all-retryable-errors
retries 3
Plus, you'll want to test how to set up the following timeouts:
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
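As a rough sketch of how these pieces might fit together in haproxy.cfg (this assumes HAProxy 2.0 or newer, since retry-on is not available before that; the backend name, server names, addresses and values are placeholders to adapt):

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    retries 3
    option redispatch 1
    retry-on all-retryable-errors

backend tier0_servers
    balance roundrobin
    server app1 10.0.0.11:8080 check
    server app2 10.0.0.12:8080 check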
Make sure all requests are idempotent and have no side effects. Otherwise, you will end up causing a lot of problems for yourself.

Dynamic action calls are getting through Amazon CloudFront

We have configured a CDN to speed up our website. On the website we make some AJAX calls, basically action calls, which take some time to get a response from the origin server because they run heavy queries.
A query can take more than 40-50 seconds to execute, so for most of the actions that take more than 30 seconds we get a 504 timeout error from CloudFront.
Is there any option in CloudFront to increase this limit for dynamic calls, or can we have CloudFront ignore these actions? Since they are all dynamic actions, they shouldn't be routed through the CloudFront CDN at all.
There is no way to set CloudFront timeouts.
A couple of methods:
Route the dynamic calls directly to your server. As you suggested, CloudFront is going to offer no benefit for those calls, so don't use the CloudFront URLs and instead use the backend URLs.
Polling. The goal is to change your long request into lots of short ones: one call to start the job, then subsequent calls to check on its status. This is clearly more effort as it will result in some coding changes; however, at some point your jobs are going to grow to the point of timing out at the browser level as well, so it might be something to think about now. (You could also use something like WebSockets, where there is a persistent connection that you pass data on.)
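A minimal browser-side sketch of the polling idea (the /api/... endpoints and the { jobId } / { done, result } response shapes are made-up placeholders):

// Kick off the heavy query, then poll with short requests that stay
// well under any CDN or browser timeout.
async function runLongJob(payload) {
  const { jobId } = await fetch('/api/start-report', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  }).then((r) => r.json());

  while (true) {
    await new Promise((resolve) => setTimeout(resolve, 2000)); // poll every 2s
    const status = await fetch(`/api/report-status/${jobId}`).then((r) => r.json());
    if (status.done) return status.result;
  }
}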

The behavior of varnish on MISS

Consider this scenario:
The Varnish cache has a MISS and the backend server is now regenerating the requested content. During the generation time a second request comes in and also gets a MISS. Does Varnish send this request to the backend while the other request is pending? What if a thousand requests come in during this time? The server would crash, right? Every request would make it slower.
Is this correct, or does Varnish "synchronize" these scenarios to prevent such a problem?
Thank you in advance!
Varnish sends all the requests to the backend. I.e. it does not queue other requests and issue just one backend request and use its response for all.
However Varnish has a grace option that lets you keep old, expired content in cache for these types of situations.
For example consider the following VCL:
sub vcl_recv {
    if (req.backend.healthy) {
        set req.grace = 5m;
    } else {
        set req.grace = 24h;
    }
}

sub vcl_fetch {
    set beresp.grace = 24h;
}
Now if a backend is healthy (see backend polling) and a request results in a MISS, the first request is sent to the backend. If another request comes in for the same content, and there is an item in the cache with age < TTL+req.grace (in this case 5 minutes), that request will get the "stale" content instead. This continues until either the first request that resulted in a MISS gets a response from the backend (and the cache is fresh again) or the age of the item exceeds TTL+req.grace.
If the backend was down (req.backend.healthy == FALSE), stale content would be served as long as age<TTL+24h.
You might also want to check out the Saving a request section of the Varnish book for a more thorough example and an exercise.
I believe Ketola's (accepted) answer is wrong.
Multiple requests to Varnish for the same URI will be queued.
Then it depends if the result of the first request is cacheable or not. If it is, it will be used for other (queued) requests as well.
If not, all other queued requests will be sent to the backend.
So if you have some slow API endpoint you want to cache, and it's cacheable (regarding Varnish rules), multiple requests will hit the backend only once for that URI.
I don't have the points or whatever to comment on #max_i's answer so I'm submitting another answer to verify his instead.
Ketola's accepted answer isn't completely wrong, it's possibly just out of date, and may have been true for older versions of Varnish. Specifically this part:
Varnish sends all the requests to the backend. I.e. it does not queue other requests and issue just one backend request and use its response for all.
I independently tested this myself using a standard installation of Varnish 4.1 LTS and Apache 2.4. I created a basic PHP file which contained the following:
<?php sleep(5); echo 'hello world!';
Then I used ab to test the HTTP request cycle with 50 requests at a concurrency of 5. The results showed that while Varnish accepted every single connection, only one request was ever made to the backend, which, as expected, took roughly 5 seconds to resolve. Each Varnish connection subsequently had to wait that minimum period before receiving a response.
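For reference, that run was along the lines of the following, with the hostname and script path being placeholders:

ab -n 50 -c 5 http://your-varnish-host/sleep.php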
The downside is of course that requests after the first one are "queued" behind it, but this is a minor concern compared to all 50 requests hitting the backend at once (or, in the case of my test, with a concurrency of 5).

Multiple requests triggered when using a browser but not when using Java HttpClient

Here is my application cloud environment.
I have an ELB with sticky sessions -> 2 HAProxy instances -> 1 machine which hosts my application on JBoss.
I am processing a request which takes more than 1 minute. I am logging IP addresses at the start of processing the request.
When I process this request through the browser, I see that a duplicate request is logged after 1 minute and a few seconds. If the first request routes through HAProxy1, the other request routes through HAProxy2. In the browser I get an HttpStatus=0 response after 2.1 minutes.
My hypothesis is that the ELB is triggering this duplicate request.
Kindly help me to verify this hypothesis.
When I use the Apache HttpClient for the same request, I do not see a duplicate request being triggered. Also, I get an exception after 1 minute and a few seconds:
org.apache.http.NoHttpResponseException: The target server failed to respond
Kindly help me to understand what is happening over here.
-Thanks
By ELB I presume you are referring to Amazon AWS's Elastic Load Balancer.
Elastic Load Balancer has a built-in request time-out of 60 seconds, which cannot be changed. The browser has smart re-try logic, hence you're seeing two requests, but your server should be processing them as two separate unrelated requests, so this actually makes matters worse. Using httpclient, the timeout causes the NoHttpResponseException, and no retry is used.
The solution is to either improve the performance of your request on the server, or have the initial request fire off a background task, and then a supplemental request (possibly using AJAX) which polls for completion.
