SignalR connection hangs; client calls arrive ~30 seconds later - IIS

I'm using the latest SignalR from NuGet in an MVC4 site. With the sample hub code (or any code), I'm getting some strange connection problems. Everything loads fine: SignalR makes the negotiate call, logs "EventSource Connected", and returns a connection id. The problem starts when it makes the signalr/connect request, with any transport. The request returns a 200 response with the proper headers, but the connection hangs open. If the called hub method invokes a method on Caller or Clients, nothing executes in the browser until the next request, or 10-30 seconds later if you sit and wait. It's as if something is stuck in the pipe and only gets flushed by the next request or some cleanup mechanism.
I made a clean project for this in a new website with its own app pool. The issue occurs only on one machine, and only under IIS 7.5. The same project run on the same machine under IIS Express or Cassini works fine, and it ran fine the last time I worked on it about a month ago. I've tried different browsers and different jQuery versions, restarted the entire machine, and spent several hours in Fiddler and the debugger to no avail.
This is the server-side code the test runs:
public class Chub : Hub {
    public void CallMeBack() {
        Caller.callme();
    }
}
Blew the whole day on this, hope someone can help!

Compression doesn't play well with streaming responses (EventSource, ForeverFrame, etc.) as used by SignalR, so switching off compression is the only solution right now.
Maybe it's possible to switch off compression just for the ~/signalr path; I'll try that later and update this answer with the results.
UPDATE: After a more thorough analysis, I've found that this issue only occurs when the application pool is set to Classic mode in IIS and dynamic compression is enabled. See my comment on this SignalR issue. If you use an Integrated mode application pool, you are not affected and SignalR works as expected (even with compression enabled).
In case you're stuck with Classic mode + dynamic compression, add the following to your web.config to switch off compression only for the /signalr path:
<location path="signalr">
  <system.webServer>
    <urlCompression doDynamicCompression="false" />
  </system.webServer>
</location>

Related

SignalR long polling repeatedly calls /negotiate and /hub POST and returns 404 occasionally on Azure Web App

We have enabled SignalR in our ASP.NET Core 5.0 web project running on an Azure Web App (Windows App Service Plan). Our SignalR client is an Angular client using the @microsoft/signalr npm package (version 5.0.11).
We have a hub located at /api/hub/notification.
Everything works as expected for most of our clients, the web socket connection is established and we can call methods from client to server and vice versa.
For a few of our clients, we see a massive number of requests to POST /api/hub/notification/negotiate and POST /api/hub/notification within a short period of time (multiple requests per minute per client). It seems like those clients switch to long polling instead of using WebSockets, since we see the POST /api/hub/notification requests.
We suspect that the affected clients might sit behind a proxy or a firewall that blocks WebSockets, which is why the connection switches to long polling in the first place.
The following screenshot shows requests to the hub endpoints for a single user within a short period of time. The list is very long, since this pattern repeats for as long as the user has our website open. We see two strange things:
The client repeatedly calls /negotiate twice every 15 seconds.
The call to POST /notification?id=<connectionId> takes exactly 15 seconds and the following call with the same connection ID returns a 404 response. Then the pattern repeats and /negotiate is called again.
For testing purposes, we enabled only long polling in our client. This works for us as expected too. Unfortunately, we currently don't have access to the browsers or the network of the users where this behavior occurs, so it is hard for us to reproduce the issue.
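(As a reference sketch only, transports can also be restricted per hub endpoint on the server; we have not changed any server-side options, as noted below, but it would look roughly like this:)

endpoints.MapHub<NotificationHub>("/api/hub/notification", options =>
{
    // HttpConnectionDispatcherOptions: allow only the listed transports for this endpoint.
    options.Transports = HttpTransportType.LongPolling;
});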
Some more notes:
We currently have just one single instance of the Web App running.
We use the Redis backplane for a future scale-out scenario.
The ARR affinity cookie is enabled and Web Sockets in the Azure Web App are enabled too.
The Web App instance doesn't suffer from high CPU usage or high memory usage.
We didn't change any SignalR options except for adding the Redis backplane. We just use services.AddSignalR().AddStackExchangeRedis(...) and endpoints.MapHub<NotificationHub>("/api/hub/notification") (spelled out in the sketch after these notes).
The website runs on HTTPS.
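That wiring, spelled out, is roughly the following (a sketch; how exactly the Redis connection string is read from configuration is illustrative, the rest is as described above):

// Inside Startup.cs (ASP.NET Core 5.0)
public void ConfigureServices(IServiceCollection services)
{
    services.AddSignalR()
            .AddStackExchangeRedis(Configuration.GetConnectionString("Redis"));
}

public void Configure(IApplicationBuilder app)
{
    app.UseRouting();
    app.UseEndpoints(endpoints =>
    {
        endpoints.MapHub<NotificationHub>("/api/hub/notification");
    });
}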
What could cause these repeated calls to /negotiate and the 404 responses from the hub endpoint?
How can we further debug the issue without having access to the clients where this issue occurs?
Update
We have now implemented a custom logger for the @microsoft/signalr package, which we pass to the configureLogging() overload. This logger writes to our Application Insights, which allows us to track the client-side logs of the clients where our issue occurs.
The following screenshot shows a short snippet of the log entries for one single client.
We see that the WebSocket connection fails (Failed to start the transport "WebSockets" ...) and the fallback transport ServerSentEvents is used. We see the log The HttpConnection connected successfully, but almost exactly 15 seconds after selecting the ServerSentEvents transport, a handshake request is sent which fails with the server message Server returned handshake error: Handshake was canceled. After that, a few follow-on errors occur and the connection gets closed. Then the connection gets established again and everything starts over: a new handshake error occurs after those 15 seconds, and so on.
Why does it take so long for the client to send the handshake request? Those 15 seconds seem to be the problem, since that is too long for the server, which cancels the connection due to a timeout.
We still think this may have something to do with the client's network (proxy, firewall, etc.).
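One more observation: the 15 seconds match the default server-side handshake timeout in ASP.NET Core SignalR (HubOptions.HandshakeTimeout defaults to 15 seconds), so that is presumably the timer cancelling the connection. Raising it would only mask whatever makes the client slow, but for reference it is configurable (a sketch, using the same AddSignalR call as above):

services.AddSignalR(options =>
{
    // Default is 15 seconds; the server cancels the handshake after this window.
    options.HandshakeTimeout = TimeSpan.FromSeconds(30);
})
.AddStackExchangeRedis(/* Redis connection string as above */);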
Fiddler
We used Fiddler to block WebSockets for testing. As expected, the fallback mechanism kicks in and ServerSentEvents is used as the transport. In contrast to the logs from the affected clients, the handshake request is sent immediately rather than after 15 seconds, and then everything works as expected.
You should check which pricing tier of Azure SignalR Service you are using in your project, Free or Standard. If you are still on the Free tier, there are some restrictions (see the limits doc below), so switch your connection string to a Standard-tier instance.
Official doc: Azure SignalR Service limits

Node app keeps crashing due to exhausted memory after migrating to a Windows environment

I am working on a React site that was originally built by someone else. The app uses the WordPress REST API for its content.
Currently the live app sits on an nginx server running Node v6 and has been working just fine. However, I now have to move the app over to an IIS environment (not by choice) and have been having nothing but problems with it. I finally got the app to run as expected, which is great, but now I am running into an issue with Node's memory becoming exhausted.
While debugging this I noticed the server's firewall was polling the home route every 5-10 seconds, which fired an API request to the WordPress API each time. The API would then return a pretty large JSON object.
My conclusion was that the firewall is polling the home route too often, which was killing the memory, because the app had to constantly fire API requests and load huge sets of data over and over.
My solution was to set up a polling route on the Node server (Express) which just returns a 200 response and nothing else. This seemed to fix the issue, as the app went from crashing within hours to lasting over two days. However, after about two days the app crashed again with another memory error. The error looked like this:
Since the app lasted much longer with the polling route added, I assume the firewall polling was/is in fact my issue here. However, now that the polling route is in place and the app has still crashed after a couple of days, I have no idea what to do, which is why I am asking for help.
I am very unfamiliar with working on Windows so I don't know if there are any memory restrictions or any obvious things I could do to help prevent this issue.
Some other notes: I have tried increasing --max-old-space-size to about 8000, but it didn't seem to do anything, so I may be implementing it wrong. These are the commands I have tried when starting the app:
Start-Process npm -ArgumentList "run server-prod --max-old-space-size=8192" -WorkingDirectory C:\node\prod
And when I used forever to handle the process
forever start -o out.log -e error.log .\lib\server\server.js -c "node --max_old_space_size=8000"
Any help on what the issue could be, or tips on what I should look for, would be great. Again, I am very new to working on Windows, so maybe there is just something I am missing.

Unable to connect to Azure Redis Cache with SSL

Connecting to Azure Redis Cache like this, on OWIN application startup...
var options = ConfigurationOptions.Parse(cacheConnectionString);
var kernel = new StandardKernel();

kernel.Bind<ConnectionMultiplexer>().ToMethod(context =>
{
    return ConnectionMultiplexer.Connect(options);
}).InSingletonScope();
Which works absolutely fine for Redis running on my local machine, or for Azure Redis with SSL turned off. However, as soon as I change the connection string from:
xyz.redis.cache.windows.net,ssl=false,password=abcdefghxyz=
to
xyz.redis.cache.windows.net,ssl=true,password=abcdefghxyz=
It throws:
It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. UnableToResolvePhysicalConnection on PING
I'm using StackExchange.Redis version 1.0.316.0 from NuGet. I've tried...
Creating different caches in Azure Portal. Hasn't worked over SSL with any of them :(
Connecting without using Ninject
Creating the ConfigurationOptions object manually rather than parsing a string (see the sketch below)
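That manual configuration looked roughly like this (a sketch; the endpoint and password mirror the connection strings above, and 6380 is Azure Redis' standard SSL port):

var options = new ConfigurationOptions
{
    EndPoints = { { "xyz.redis.cache.windows.net", 6380 } }, // 6380 = SSL port, 6379 = non-SSL
    Password = "abcdefghxyz=",
    Ssl = true,
    AbortOnConnectFail = false // as suggested by the error message above
};
var connection = ConnectionMultiplexer.Connect(options);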
I'm all out of ideas for what could possibly be going wrong now though. Hopefully it's something trivial I've missed, but just can't see it!
Edit
I ran the ConnectToAzure() unit test, passing in my cache name and password and it passed. So I'm almost certainly doing something silly here.
Edit 2
It also works from a console application without any issues.
I fixed it. Super-weird situation, but I'll answer just in case someone else is ever sitting there as confused as I've been.
The project was previously a web role in an Azure Cloud Service, which had In-role caching enabled. We moved it to a standalone Azure Web App, but never got around to removing all the unnecessary references that were left over.
Removing Microsoft.WindowsAzure.ServiceRuntime and Microsoft.WindowsAzure.Diagnostics miraculously got it working.

Ektorp querying performance against CouchDB is 4x slower when initiating the request from a remote host

I have a Spring MVC app running under Jetty. It connects to a CouchDB instance on the same host using Ektorp.
In this scenario, once the Web API request comes into Jetty, I don't have any code that connects to anything not on the localhost of where the Jetty instance is running. This is an important point for later.
I have debug statements in Jetty to show me the performance of various components of my app, including the performance of querying my CouchDB database.
Scenario 1: When I initiate the API request from localhost, i.e. I use Chrome to go to http://localhost:8080/, my debug statements indicate a CouchDB performance of X.
Scenario 2: When I initiate the exact same API request from a remote host, i.e. I use Chrome to go to http://<jetty-host>:8080/, my debug statements indicate a CouchDB performance of 4X.
It looks like something is causing the connection to CouchDB to be much slower in scenario 2 than in scenario 1. That doesn't seem to make sense: once the request comes into my app, regardless of where it came from, there is no code in the application that establishes the connection to CouchDB differently based on where the initial API request originated. As a matter of fact, nothing establishes the connection to CouchDB differently based on anything.
It's always the same connection (from the application's perspective), and I have been able to reproduce this issue 100% of the time with a Jetty restart in between scenario 1 and 2, so it does not seem to be related to caching either.
I've gone fairly deep into StdCouchDbConnector and StdHttpClient to try to figure out if anything is different in these two scenarios, but cannot see anything different.
I have added timers around the executeRequest(HttpUriRequest request, boolean useBackend) call in StdHttpClient to confirm this is where the delay is happening and it is. The time difference between Scenario 1 and 2 is several fold on client.execute(), which basically uses the Apache HttpClient to connect to CouchDB.
I have also tried always using the "backend" HttpClient in StdHttpClient, just to take Apache HTTP caching out of the equation, and I've gotten the same results as Scenarios 1 and 2.
Has anyone run into this issue before, or does anyone have any idea what may be happening here? I have gone all the way down to org.apache.http.impl.client.DefaultRequestDirector to try to see if anything was different between scenarios 1 and 2, but couldn't find anything ...
A couple of additional notes:
a. I'm currently constrained to a Windows environment in EC2, so instances are virtualized.
b. Scenarios 1 and 2 give the same response time when the underlying instance is not virtualized. But see a - I have to be on AWS.
c. I can also reproduce performance similar to the 4X slowdown of scenario 2 with a third scenario: instead of making the request to localhost:8080/ in Chrome, I make it with Postman, which is a Chrome application. Using Postman from the Jetty instance itself, I can reproduce the 4X slower times.
The only difference I see in c. above is that the request headers in Chrome's developer tools indicate a Remote Address of [::1]:8080. I don't have any way to set that through Postman, so I don't know if that's the difference maker. And if it were, first I wouldn't understand why. And second, I'm not sure what I could do about it, since I can't control how every single client is going to connect to my API.
All theories, questions, ideas welcome. Thanks in advance!

Speeding up Socket.IO

When I listen for a client connection in Socket.IO, there seems to be a latency of 8-9 seconds as it falls back to XHR. This is too slow for most purposes, as I'm using Socket.IO to push data to users' news feeds, and a lot can happen in 8 or 9 seconds.
Is there any way to speed up this failure?
EDIT:
After deploying to Nodejitsu's VPS I tried this again and the socket connection was nearly immediate (enough that a user wouldn't notice). I'm only experiencing this on my local machine. So the question may actually be: why is it so slow on my local machine?
This question is almost impossible to answer without more information on your local setup, but it's interesting that you're failing over to XHR. The following question might explain why the fallback happens, though not why the same browser works fine once the app is deployed.
Socket.io reverting to XHR / JSONP polling for no apparent reason
Another potential problem I've read about is that your browser has cached the wrong transport method. You could try clearing your browser cache and reconnecting to see if that gets around the problem.
https://groups.google.com/group/socket_io/browse_thread/thread/e6397e89efcdbcb7/a3ce764803726804
Lastly, if you're unable to figure out why it won't connect using WebSockets or FlashSockets, you could try removing them as transport options from your Socket.IO configuration; that way, at least when developing locally, you can get past that delay.
