Here is my application cloud environment.
I have an ELB with sticky sessions -> 2 HAProxy instances -> 1 machine which hosts my application on JBoss.
I am processing a request which takes more than 1 minute, and I log the client IP address at the start of processing.
When I send this request through the browser, I see a duplicate request being logged after 1 minute and a few seconds. If the first request routes through HAProxy1, the duplicate routes through HAProxy2. In the browser I get an HttpStatus=0 response after about 2.1 minutes.
My hypothesis is that the ELB is triggering this duplicate request.
Kindly help me to verify this hypothesis.
When I use the Apache HttpClient for the same request, I do not see a duplicate request being triggered; instead I get an exception after 1 minute and a few seconds:
org.apache.http.NoHttpResponseException: The target server failed to respond
Kindly help me to understand what is happening over here.
-Thanks
By ELB I presume you are referring to Amazon AWS's Elastic Load Balancer.
Elastic Load Balancer has a built-in idle timeout of 60 seconds by default. When it fires, the browser's retry logic silently re-sends the request, which is why you see two requests; your server processes them as two separate, unrelated requests, which actually makes matters worse. Apache HttpClient does not retry, so the same timeout surfaces as the NoHttpResponseException instead.
The solution is either to improve the performance of the request on the server, or to have the initial request fire off a background task and then have a supplemental request (possibly using AJAX) poll for completion.
[Problem Statement]
We have a Tier 0 service with an HAProxy LB and multiple back-end servers configured behind it. The infrastructure currently serves a P99 latency of ~100 ms, and we are targeting 100% availability with zero downtime. Sometimes some of the back-end servers misbehave or drop out of the LB, and at that moment every request already landed on those servers times out.
So we are looking for a configuration where, if a request to one server takes more than 100 ms, the same request is routed to another back-end server, so that we can get close to 100% of requests completing without timeouts.
[Disclaimer]
I understand that if the request still times out after a certain number of retries, the timeout will be served to the end consumer of our Tier 0 service.
[Tech Stack]
HAProxy
Java
MySQL
Azure
I would appreciate a discussion of this problem; I have searched a lot but found no reference for the approach I have in mind. Other approaches are welcome too, as long as we can achieve no downtime within the defined SLA of the service.
Thanks
The option redispatch directive sends a request to a different server.
The retry-on directive states what type of errors to retry on.
The retries directive states how many times to retry.
option redispatch 1
retry-on all-retryable-errors
retries 3
Plus, you'll want to test how to set up the following timeouts:
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
Make sure all requests are idempotent and have no side effects. Otherwise, you will end up causing a lot of problems for yourself.
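For reference, a sketch of how these directives might sit together in haproxy.cfg (server names and addresses are placeholders; `retry-on` requires HAProxy 2.0 or later):

```
defaults
    mode http
    option redispatch 1
    retries 3
    retry-on all-retryable-errors
    timeout connect 5000ms
    timeout client  50000ms
    timeout server  50000ms

backend app_servers
    balance roundrobin
    server app1 10.0.0.1:8080 check
    server app2 10.0.0.2:8080 check
```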
I have a web application on an IIS server.
It has a POST method that takes a long time to run (around 30-40 minutes).
After some period of time the application stops running (without any exception).
I set the idle timeout to 0 and it did not help.
What can I do to solve this?
Instead of doing all the work initiated by the request before responding at all:
Receive the request
Put the information in the request in a queue (which you could manage with a database table, ZeroMQ, or whatever else you like)
Respond with a "Request received" message.
That way you respond within seconds, which is acceptable for HTTP.
Then have a separate process monitor the queue and process the data on it (doing the 30-40 minute long job). When the job is complete, notify the user.
You could do this through the browser with a Notification or through a WebSocket or use a completely different mechanism (such as by sending an email to the user who made the request).
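A sketch of the two halves in Node.js (the in-memory array stands in for the database table or ZeroMQ queue mentioned above; names and shapes are illustrative):

```javascript
// Request handler side fills `queue`; a separate worker process drains it.
const queue = [];     // stand-in for a database table / message queue
const completed = []; // where the worker records finished jobs

// Request handler side: enqueue and return "Request received" within seconds.
function enqueue(request) {
  queue.push({ request, receivedAt: Date.now() });
}

// Worker side: poll the queue and run the 30-40 minute job here, not in HTTP.
async function workerLoop(processOne, { pollMs = 1000, stopWhenEmpty = false } = {}) {
  for (;;) {
    const item = queue.shift();
    if (!item) {
      if (stopWhenEmpty) return;              // for testing; real workers keep polling
      await new Promise(r => setTimeout(r, pollMs));
      continue;
    }
    const result = await processOne(item.request);
    completed.push({ request: item.request, result }); // real code: notify the user here
  }
}
```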
We have a setup with several RESTful APIs on the same VM in Azure.
The websites run in Kestrel on IIS.
They are protected by the Azure Application Gateway with a firewall.
We now have requests that would run for at least 20 minutes.
The requests run to completion uninterrupted in Kestrel (visible in the logs), but the sender either gets "socket hang up" after exactly 5 minutes or waits forever, even after the request has finished in Kestrel. The request continues in Kestrel even if the connection to the sender was interrupted.
What I have done:
Wrote a small example application that returns after a set number of seconds, to rule out our websites being the problem.
Ran the request within the VM (to localhost): no problems, the response was received.
Ran the request within Azure from one VM to another: the request ran forever.
Ran the request from outside Azure: the request terminated after 5 minutes with "socket hang up".
Checked the configured timeouts: Kestrel: 50 min, IIS: 4000 s, Application Gateway HTTP settings: 3600 s.
Requests were tested with Postman.
Is there another request or connection timeout hidden somewhere in Azure?
We now have requests that would run for at least 20 minutes.
This is a horrible architecture and it should be rewritten to be asynchronous. Don't take this personally; it is what it is. Consider returning a 202 Accepted with a Location header that the client can poll for the result.
You're most probably hitting the Azure SNAT idle timeout. You can change it under the Configuration blade for the Public IP.
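If you prefer to script the change, I believe the Azure CLI equivalent is along these lines (resource names are placeholders; `--idle-timeout` is in minutes):

```
az network public-ip update \
  --resource-group my-rg \
  --name my-public-ip \
  --idle-timeout 15
```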
So I ran into something like this a little while back:
For us the issue was probably the timeout, as the other answer suggests, but our solution (instead of increasing the timeout) was to add PgBouncer in front of our Postgres database to manage the connections and make sure a new one is started before the timeout fires.
Not sure what your backend connection looks like but something similar (backend db proxy) could work to give you more ability to tune connection / reconnection on your side.
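As a sketch, a pgbouncer.ini along those lines might look like this (host, credentials, and values are illustrative; the point is that `server_lifetime` recycles server connections before the platform's idle timeout can kill them):

```
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
server_lifetime = 300      ; recycle connections well before the ~4 min idle timeout
server_idle_timeout = 60
```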
In our case we were running AKS (Azure Kubernetes Service), but all Azure public IPs obey the same rules that cause issues like this one.
While this isn't a full answer, I know there are also two types of public IP addresses: the 'basic' kind doesn't have the same configurability as 'standard'. Could this be related to the difference between basic and standard public IPs / load balancers?
I have an application with webix on UI and node js on server side.
From the UI, if I trigger a long-running AJAX request (e.g. process 1000 records), the request errors out after approximately 1.5 minutes (not consistently).
The error object contains no information about the reason for the failure, but since processing a smaller set of records works fine, I suspect a timeout.
From the developer console I see that the request appears Stalled and the response is empty.
Currently I can't drop the request and poll every few seconds to see whether the processing has finished. I have to wait for the request to finish, but I am not sure how to do that, as the webix forum doesn't seem to have any information on this except for setting a timeout.
If setting a timeout is the way to go, what happens tomorrow when the request grows to 2000 records? I don't want to keep increasing the timeout.
Also, if I am left with no choice, how would I implement the polling? If I drop a request to the server, there can be other clients triggering similar requests. How would I distinguish between requests originating from different clients?
I would really appreciate some help on this.
Azure apparently has a 4-minute timeout for HTTP requests before it kills the connection. This is not configurable in App Services:
https://social.msdn.microsoft.com/Forums/en-US/32b76114-67a4-4e6b-ac45-61b0f0a0829f/changing-the-4-minute-request-time-out-for-app-services?forum=AzureAPIApps
I have seen this first-hand in my application: I have a process that allows users to view files on a network drive, select a subset of those files, and upload them to a third-party service. This happens via a POST request which sends the list of file names with a JSON content type. The operation can take a while, and I receive a timeout error at almost exactly 4 minutes.
I also have another process which allows users to drag and drop files into the web application directly; these files are posted to the server using content-type multipart/form-data and forwarded to the third-party service. This request never times out, no matter how long the upload takes.
Is there something about using multipart/form-data that overrides Azure's 4-minute timeout?
It probably does not matter but I am using Node.
The timeout is actually 3 min 50 s (230 seconds), not 4 minutes.
But note that it is an idle connection timeout, meaning that it only kicks in if there is no data flowing in the request/response. So it is strange that you would hit this if you are actively uploading files. I would suggest monitoring network traffic to see if anything is being sent. If it really goes 230s with no uploaded data, then there is probably some other issue, and the timeout is just a side effect.