We currently have an application hosted on a Azure VM instance.
This application sometimes processes long-running and idle HTTP requests. This is causing an issue because Azure will close all connections that have been idle for longer than a few minutes.
I've seen some suggestions about setting a lower TCP keepalive rate. I've tried setting the this rate to around 45 seconds but my HTTP requests are still being closed.
Any suggestions? Our VM is running Server 2008 R2.
As a simple workaround, I had my script send a newline character every 5 seconds or so to keep the connection alive.
Example:
set_time_limit(60 * 30);
ini_set("zlib.output_compression", 0);
ini_set("implicit_flush", 1);
function flushBuffers()
{
#ob_end_flush();
#ob_flush();
#flush();
#ob_start();
}
function azureWorkaround($char = "\n")
{
echo $char;
flushBuffers();
}
$html = '';
$employees = getEmployees();
foreach($employee in $employees) {
html .= getReportHtmlForEmployee($employee);
azureWorkaround();
}
echo $html;
The Azure Load Balancer now supports configurable TCP Idle timeout for your Cloud Services and Virtual Machines. This feature can be configured using the Service Management API, PowerShell or the service model.
For more information check the announcement at http://azure.microsoft.com/blog/2014/08/14/new-configurable-idle-timeout-for-azure-load-balancer/
Related
I have a Blazor server side app published on IIS 10.
When browsing to an arbitrary page and just letting it idle after a minute or so (sometimes only 45 sec, sometimes something between 1 and two minutes) the modal
Attempting to reconnect to server ...
appears for a couple of seconds.
In the browser console the logging shows either
Error: Connection disconnected with error 'Error: Server timeout
elapsed without receiving a message from the server.'.
or
Information: Connection disconnected.
Since this seems to be a timeout problem I added the following options to ConfigureServices in my startup.cs
services.AddServerSideBlazor()
.AddHubOptions(options =>
{
options.ClientTimeoutInterval = TimeSpan.FromMinutes(10);
options.KeepAliveInterval = TimeSpan.FromSeconds(3);
options.HandshakeTimeout = TimeSpan.FromMinutes(10);
});
This does not solve the problem though.
I also went to the advanced settings of my site in IIS and increased the connection timeout from the default 120 sec to 600 sec. This did not help either.
Those frequent disconnections only happen on the live site hosted on IIS 10.
If I start the app locally with Visual Studio the connection is stable.
Any hints of what I'm missing would be appreciated!
Update:
As suggested by #agua from mars in comment below I changed transport type like this
app.UseEndpoints(endpoints =>
{
endpoints.MapControllers();
endpoints.MapBlazorHub(options => { options.Transports = HttpTransportType.LongPolling; });
endpoints.MapFallbackToPage("/_Host");
});
With this change the connection is still closed. The console log shows
Information: (LongPolling transport) Poll terminated by server.
I also tried HttpTransportType.ServerSentEvents which does not work at all but gives this error
Error: Failed to start the connection: Error: Unable to connect to the
server with any of the available transports. ServerSentEvents failed:
Error: 'ServerSentEvents' does not support Binary.
Update 2:
The IIS is configured to use HTTP 1.1
I tried changing to HTTP/2 but this did not change anything regarding the disconnections.
This is related to application pool recycling in IIS as stated by #Programmer. You can reproduce this by going into the application pool, right click the pool and choose recycle to force it. Your blazor app will get the "reconnect modal screen".
For me, I did not want to disable pool recycle, so I added js in the _Hosts.cshtml file as
<script>Blazor.defaultReconnectionHandler._reconnectCallback = function (d) {document.location.reload();}</script>
to automatically reconnect when the server comes back up.
Try this out..
app.UseEndpoints(endpoints =>
{
//other settings
.
.
endpoints.MapBlazorHub(options => options.WebSockets.CloseTimeout = new TimeSpan(1, 1, 1));
//other settings
.
.
});
This could be related to IIS application pool recycling. Try disabling the recycling to see if that's casing the disconnection.
I suffer the same problem on my Blazor server too: Myspector.com
I am sure this comes from network of data provider. I use Othello in Germany with 4G and see disconnection in 5 sec . When I am with wifi with t online on same target server no disconnection at all.
I Think some operators are incompatible with Blazor server/websoscket....
My recent experience especially on a shared server, increase the pool memory. Connectivity issues went away when we bumped 256MB up to 1GB for a small user base.
Backgound:
I'm currently hosting an ASP.NET application in Azure with the following specs:
ASP .Net Core 2.2
Using Flurl for HTTP requests
Kestrel Webserver
Docker (Linux - mcr.microsoft.com/dotnet/core/aspnet:2.2 runtime)
Azure App Service on P2V2 tier app service plan
I have a a couple of background jobs that run on the service that makes a lot of outbound HTTP calls to a 3rd party service.
Issue:
Under a small load (approximately 1 call per 10 seconds), all requests are completed in under a second with no issue. The issue I'm having is that under a heavy load, when service can make up to 3/4 calls in a 10 second span, some of the requests will randomly timeout and throw an exception. When I was using RestSharp the exception would read "The operation has timed out". Now that I'm using Flurl, the exception reads "The call timed out".
Here's the kicker - If I run the same job from my laptop running Windows 10 / Visual Studios 2017, this problem does NOT occur. This leads me to believe I'm hitting some limit or running out of some resource in my hosted environment. Unclear if that is connection/socket or thread related.
Things I've tried:
Ensure all code paths to the request are using async/await to prevent lockouts
Ensure Kestrel Defaults allow unlimited connections (it does by default)
Ensure Dockers default connection limits are sufficient (2000 by default, more than enough)
Configuring ServicePointManager settings for connection limits
Here is the code in my startup.cs that I'm currently using to try and prevent this issue:
public class Startup
{
public Startup(IHostingEnvironment hostingEnvironment)
{
...
// ServicePointManager setup
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
ServicePointManager.DefaultConnectionLimit = int.MaxValue;
ServicePointManager.EnableDnsRoundRobin = true;
ServicePointManager.ReusePort = true;
// Set Service point timeouts
var sp = ServicePointManager.FindServicePoint(new Uri("https://placeholder.thirdparty.com"));
sp.ConnectionLeaseTimeout = 15 * 1000; // 15 seconds
FlurlHttp.ConfigureClient("https://placeholder.thirdparty.com", cli => cli.Settings.ConnectionLeaseTimeout = new TimeSpan(0, 0, 15));
}
}
Has anyone else run into a similar issue to this? I'm open to any suggestions on how to best debug this situation, or possible methods to correct the issue. I'm at a complete loss after researching this for several days.
Thank you in advance.
I had similar issues. Take a look at Asp.net Core HttpClient has many TIME_WAIT or CLOSE_WAIT connections . Debugging via netstat helped identify the problem for me. As one possible solution. I suggest you use IHttpClientFactory. You can get more info from https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-2.2 It should be fairly easy to use as described in Flurl client lifetime in ASP.Net Core 2.1 and IHttpClientFactory
I have an ASP.NET MVC 5 application running in the azure german cloud as Azure Web App (single instance - Standard S3 size).
I'm calling a non azure hosted REST/SOAP service on a particular host and the web requests either succeed promptly or timeout after 21 / 42 seconds.
I've load tested the requests and the percentile of requests timing out is between 20 and 80.
One particular remarkable property of the timeout is, that they occur after exactly 21 or 42 seconds (this is serious, no reference to hitchhiker's guide to the galaxy intended).
Calling a different service from the web app works just fine, temporarily at least.
We've already checked the firewall of the non azure service and if the timeout occurs, not a single packet reached the host.
This issue occurred once in the past one year ago and support was unable to tell what the cause was until the issue suddenly went away roughly two weeks after first occuring, so the ticket got closed as fixed itself but now its back.
The code is using https://github.com/canton7/RestEase (uses HttpClient underneath) and looks like
[Header("Content-Type", "application/json")]
public interface IApi
{
[Post("/Login")]
Task<LoginToken> Login([Body]LoginRequest request);
}
private static Dictionary<string, IApi> ApiClientsByHost = new Dictionary<string, IApi>();
private IApi GetApiForHost(string host)
{
if (!ApiClientsByHost.TryGetValue(host, out var client))
{
lock (ApiClientsByHost)
{
if (!ApiClientsByHost.TryGetValue(host, out client))
{
ApiClientsByHost[host] = client = RestClient.For<IApi>(host);
}
}
}
return client;
}
var client = GetApiForHost("https://production/");
var loginToken = await client.Login(new LoginRequest { Username = username, Password = password });
By different service, i mean using "https://testserver/" instead of "https://production/" (testserver is located in a different data center with different IP and all).
The API authentication is passing a token via query but it timeouts already before being able to get a token.
The code is caching the IApi to avoid the TCP starvation problems of disposing HttpClients (but i've never run into port exhaustion).
Restarting the app does not resolve the issue and the issue only occurs to production currently (but a year ago, when this issue occurred on production, we've switched to testserver which worked initially but after some time, ran into the same problem)
EDIT: Found some explanation in the last answer as to where those magical 21 seconds are comming from.
EDIT: One way i've found to workaround is, is to setup a azure vm with a proxy on it and configure defaultProxy to pass through that vm.
That's TCP retransmission timing out. It's odd that you are getting different values though.
I'm in the process of troubleshooting an App Service that is using websockets.
It's running on service plan Basic which allows for 350 websockets.
This is the only app on that plan that uses websockets.
The problem is that after abou 20 hours I get 503 responses saying I reached my websocket limit.
The setup right now has 3 clients connecting to the service.
In the process of investigating websocket leakage in my app I would like to track the number of websockets in use.
Is there anywhere, from my app or in Azure portal, where I can see the number of active websocket connections?
Follow up:
I've logged the websocket connections as Amor suggested.
The HTTP part of my app is still working, I can get dynamic results from the app which now reports what websocket connections are active and how many has been created since start.
After restarting the app service and configured one client to reconnect indefinetely.
It worked fine until the "total websocket connections" reached 350. At this time I shut down the client.
The limit should be 350 concurrent connections but it looks like it is 350 in total since start.
Most (at least 340) of these connections were initiated by a single client which disposed each connection before starting a new one, it has been shutdown once the limit was reached.
I've been suggested to upgrade from Basic to Standard since standard doesn't have the artificial limitation. The only reason I can see this work would be if there is a bug in the websocket limitation for the Basic plan.
Update 2
In parallel I've been in contact with Microsoft Developer Support and they noticed what appears to be that the sockets are stuck in IIS whereas not in Kestrel. The cause of this is still being investigated.
Support could show me graphs of the connection usage over time which clearly showed how the limit was reached.
I'll keep this question updated in case there was some error in my code.
I suggest you define a variable to count the connections. If a web socket connection is opened, just increase the number of connections. If a web socket connection is closed, decrease the number of connections. Code below is for your reference.
Count the connections for ASP.NET SignalR.
public class MyHub : Hub
{
private int _connectionCount = 0;
public override Task OnConnected()
{
_connectionCount++;
return base.OnConnected();
}
public override Task OnReconnected()
{
_connectionCount++;
return base.OnReconnected();
}
public override Task OnDisconnected(bool stopCalled)
{
_connectionCount--;
return base.OnDisconnected(stopCalled);
}
}
Count the connections in traditional ASP.NET.
public class WSChatController : ApiController
{
private int _connectionCount = 0;
public HttpResponseMessage Get()
{
if (HttpContext.Current.IsWebSocketRequest)
{
HttpContext.Current.AcceptWebSocketRequest(ProcessWSChat);
}
return new HttpResponseMessage(HttpStatusCode.SwitchingProtocols);
}
private async Task ProcessWSChat(AspNetWebSocketContext context)
{
WebSocket socket = context.WebSocket;
while (true)
{
ArraySegment<byte> buffer = new ArraySegment<byte>(new byte[1024]);
WebSocketReceiveResult result = await socket.ReceiveAsync(
buffer, CancellationToken.None);
if (socket.State == WebSocketState.Open)
{
_connectionCount++;
//Process the request
}
else
{
_connectionCount--;
break;
}
}
}
}
Context
I have a RedisMqServer configured to handle a single message on my ServiceStack web service. The messages on that MQ originate from another application and show up in the .inq with all the correct properties. Everything is on 4.0.38.
My configuration in MyAppHost.cs:
public override void Configure(Container container)
{
var redisFactory = new PooledRedisClientManager(0, "etc:etc");
redisFactory.ConnectTimeout = 5;
redisFactory.IdleTimeOutSecs = 30;
redisFactory.PoolTimeout = 3;
container.Register<IRedisClientsManager>(redisFactory);
//Plugins, Filters, other Registrations omitted
var mqHost = new RedisMqServer(redisFactory, retryCount: 2);
mqHost.DisablePublishingResponses = true;
mqHost.RegisterHandler<CreateVisitor>(ServiceController.ExecuteMessage);
mqHost.Start();
}
And then in Global.asax.cs:
void Application_Start(object sender, EventArgs e)
{
new MyAppHost().Init();
}
Problem
The messages are not consistently handled when I deploy this elsewhere. They wait in the .inq until whenever. Nothing is lost, just delayed for an indeterminate duration.
As of this moment, the only things that come to mind are:
I'm using IIS Express locally, and the server is using IIS.
Application_Start needs to happen before it can handle messages.
I've tried initializing the service by making other API calls over HTTP, before and after queuing messages, with more failure than success. Sometimes the service starts to handle them, but I am unable to identify and thus influence when this happens.
Note
I do have several other console applications and windows services that listen on other MQs and handle messages placed by other applications, and those have always worked flawlessly. This is the first time I've tried this from within an existing web service, however.
Hard to know what the issue from this description (are messages getting lost or just delayed?) but this sounds like it's due to ASP.NET AppDomain recycling in which case you can disable AppDomain recycling or setup up a continuous ping route to hit your ASP.NET Web Application to keep the AppDomain alive.
If the ASP.NET Service is available on the Internet you can use services like https://uptimerobot.com or https://www.pingdom.com to configure it to ping your Service at different intervals (e.g. 5-10 minutes) otherwise if this is an internal Service you can use a Scheduled Task.