Short version:
For a Blazor app running on a live IIS server (the production server for my old asp.net site), how can I debug: check the app's memory usage, check for unhandled errors, see status of live and disconnected circuits, etc.?
Long version:
I'm implementing a lot of Circuit management on my gaming page. It includes timers to determine when a player has timed out long enough to be pulled from a game, and so on.
In testing a game, I have 10-20 users connected. I'm hoping to scale up to about 200 live users. For the most part, the games work as expected, but sometimes players crash. I suspect there's a server error (a null reference to a non-existent user Circuit, something to do with timeout values, and so on).
Related
In my project, there is a process that can run for a very long time (> 20 min.). The progress is transmitted to interested clients as a percentage value using SignalR. Now I've noticed that the server is abruptly terminated after 20 minutes (the IIS default Idle Time-out), even though a client is connected and actively receiving data via SignalR.
Could it be that communication via WebSockets is not monitored by the IIS routine that resets the timeout? Is there any way to work around the problem? Or have I implemented something wrong?
We have enabled SignalR on our ASP.NET Core 5.0 web project running on an Azure Web App (Windows App Service Plan). Our SignalR client is an Angular client using the @microsoft/signalr NPM package (version 5.0.11).
We have a hub located at /api/hub/notification.
Everything works as expected for most of our clients, the web socket connection is established and we can call methods from client to server and vice versa.
For a few of our clients, we see a massive number of requests to POST /api/hub/notification/negotiate and POST /api/hub/notification within a short period of time (multiple requests per minute per client). It seems that those clients switch to long polling instead of using WebSockets, since we see the POST /api/hub/notification requests.
We suspect that the affected clients may be sitting behind a proxy or a firewall that blocks WebSockets, which is why the connection falls back to long polling in the first place.
The following screenshot shows requests to the hub endpoints for one single user within a short period of time. The list is very long, since this pattern repeats for as long as the user has our website open. We see two strange things:
The client repeatedly calls /negotiate twice every 15 seconds.
The call to POST /notification?id=<connectionId> takes exactly 15 seconds and the following call with the same connection ID returns a 404 response. Then the pattern repeats and /negotiate is called again.
For testing purposes, we enabled only long polling in our client. This works for us as expected too. Unfortunately, we currently don't have access to the browsers or the network of the users where this behavior occurs, so it is hard for us to reproduce the issue.
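For reference, restricting the client to a single transport looks roughly like this (a minimal sketch; apart from the hub URL, the values are only illustrative):

import * as signalR from "@microsoft/signalr";

// Force the client onto one transport (here long polling) so the fallback
// behaviour can be tested in isolation.
const connection = new signalR.HubConnectionBuilder()
  .withUrl("/api/hub/notification", {
    transport: signalR.HttpTransportType.LongPolling,
    // transport: signalR.HttpTransportType.WebSockets // or pin to WebSockets instead
  })
  .build();

connection.start().catch(err => console.error("SignalR start failed", err));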
Some more notes:
We currently have just one single instance of the Web App running.
We use the Redis backplane for a future scale-out scenario.
The ARR affinity cookie is enabled and Web Sockets in the Azure Web App are enabled too.
The Web App instance doesn't suffer from high CPU usage or high memory usage.
We didn't change any SignalR options except for adding the Redis backplane. We just use services.AddSignalR().AddStackExchangeRedis(...) and endpoints.MapHub<NotificationHub>("/api/hub/notification").
The website runs on HTTPS.
What could cause these repeated calls to /negotiate and the 404 returns from the hub endpoint?
How can we further debug the issue without having access to the clients where this issue occurs?
Update
We have now implemented a custom logger for the @microsoft/signalr package, which we pass to configureLogging(). This logger logs to our Application Insights, which allows us to track the client-side logs of those clients where our issue occurs.
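Roughly, the logger wiring looks like this (a minimal sketch; the Application Insights part is simplified and the connection string is a placeholder, only the ILogger shape and configureLogging() come from the @microsoft/signalr package):

import * as signalR from "@microsoft/signalr";
import { ApplicationInsights } from "@microsoft/applicationinsights-web";

const appInsights = new ApplicationInsights({
  config: { connectionString: "<our Application Insights connection string>" },
});
appInsights.loadAppInsights();

// Forwards every SignalR client log entry to Application Insights so we can
// inspect logs from clients we cannot access directly.
class AppInsightsSignalRLogger implements signalR.ILogger {
  log(logLevel: signalR.LogLevel, message: string): void {
    appInsights.trackTrace({
      message: `[SignalR] ${signalR.LogLevel[logLevel]}: ${message}`,
    });
  }
}

const connection = new signalR.HubConnectionBuilder()
  .withUrl("/api/hub/notification")
  .configureLogging(new AppInsightsSignalRLogger())
  .build();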
The following screenshot shows a short snippet of the log entries for one single client.
We see that the WebSocket connection fails (Failed to start the transport "WebSockets" ...) and the fallback transport ServerSentEvents is used. We see the log The HttpConnection connected successfully, but almost exactly 15 seconds after selecting the ServerSentEvents transport, a handshake request is sent and fails with the message from the server Server returned handshake error: Handshake was canceled. After that, several follow-on errors occur and the connection gets closed. Then the connection gets established again and everything starts over; a new handshake error occurs after those 15 seconds, and so on.
Why does it take so long for the client to send the handshake request? It seems like those 15 seconds are the problem, since this is too long for the server, and the server cancels the connection due to a timeout.
We still think this may have something to do with the client's network (proxy, firewall, etc.).
Fiddler
We used Fiddler to block WebSockets for testing. As expected, the fallback mechanism kicks in and ServerSentEvents is used as the transport. Unlike in the logs from our issue, the handshake request is sent immediately rather than after 15 seconds, and then everything works as expected.
You should check which pricing tier you use in your project, Free or Standard.
If you are still on the Free tier, there are some restrictions (for example on concurrent connections and messages per day), so you should switch to a connection string for a Standard-tier instance.
Official doc: Azure SignalR Service limits
My site works fine locally. It even works fine with my backend using Azure web services and the front end using Netlify, but occasionally, after several API calls (I'm not overloading the server, because these API calls are made one by one), I get LOTS of errors that are all the same: 500 Internal Server Error. I look at the logs and they give me some numbers: 500 1013 109 329 2144 391
The reason for this could be:
A network issue on your server
A server request timeout
The web app taking too long to respond to a request when connecting to a resource (database, a different server, etc.)
To resolve that, I would suggest increasing the idle timeout of your app.
In the app settings of your web app, add SCM_COMMAND_IDLE_TIMEOUT = 3600.
By default, Web Apps are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable ‘Always On’ to keep the app loaded all the time.
You may also check the diagnostic log stream to get more details on this issue, and see the blog post Troubleshooting Azure App Service Apps Using Web Server Logs.
Hope it helps.
I have deployed a Node.js web application on App Service in Azure. The issue is that my application occasionally gets killed for an unknown reason. I have done an exhaustive search through all the log files using Kudu.
If I restart the App Service, the application starts working again.
Is there any way I can restart my Node application once it has crashed, so it keeps running forever no matter what? For example, if an error happens in ASP.NET code deployed in IIS, IIS never crashes; it keeps serving other incoming requests.
Something like using forever/PM2 in Azure App Service.
node.js in Azure App Services is powered by IISNode, which takes care of everything you described, including monitoring your process for failures and restarting it.
Consider the following POC:
var http = require('http');

http.createServer(function (req, res) {
  if (req.url == '/bad') {
    // An uncaught exception: this crashes the Node process.
    throw 'bad';
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('bye');
}).listen(process.env.PORT || 1337); // iisnode provides PORT when hosted in a Web App
If I host this in a Web App and issue the following sequence of requests:
GET /
GET /bad
GET /
Then the first will yield HTTP 200, the second will throw on the server and yield HTTP 500, and the third will yield HTTP 200 without me having to do anything. IISNode will just detect the crash and restart the process.
So you shouldn't need PM2 or a similar solution, because this is built into App Service. However, if you really want to, there is now App Service on Linux (in preview), which is powered by PM2 and lets you configure PM2. More on this here. But again, you get this out of the box already.
Another thing to consider is the Always On setting, which is off by default:
By default, web apps are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the app loaded all the time. If your app runs continuous web jobs, you should enable Always On, or the web jobs may not run reliably.
This is another possible root cause for your issue, and the solution is to enable Always On for your Web App (see the link above).
I really want to thank itaysk for the support on this issue.
The issue was not what I suspected. Actually, the Node server was being restarted on failure correctly.
My website was becoming unresponsive for a different reason. Here is what was happening:
We use rethinkdbdash to connect to the RethinkDB database, with a connection pool, and there was a coding/design issue. We have around 15 changefeeds implemented along with socket.io, and a changefeed was being initialised for every logged-in user. This kept increasing the number of active connections in the pool. rethinkdbdash has a default limit of 1000 connections in the pool, and because there were lots of live connections, all the available connections eventually got exhausted, leaving none free. Requests then waited for an open connection that never became available, waiting forever and blocking any new requests from being served.
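For anyone hitting the same pattern: one way to avoid it is to open each changefeed once per process and fan the updates out over socket.io rooms, rather than initialising a feed per logged-in user. A rough sketch of that idea (table, room, and event names are made up; the rethinkdbdash and socket.io calls follow their documented APIs as far as I know):

import rethinkdbdash from "rethinkdbdash"; // assumes esModuleInterop; otherwise use require()
import { Server } from "socket.io";

// One bounded pool and one changefeed per table for the whole process,
// instead of one changefeed per logged-in user.
const r = rethinkdbdash({ host: "localhost", port: 28015, max: 50 });
const io = new Server(3000);

async function startGameFeed() {
  const cursor = await r.table("games").changes().run();
  cursor.each((err: Error | null, change: any) => {
    if (err) {
      console.error("changefeed error", err);
      return;
    }
    // Fan the single feed out to every socket in the affected game's room.
    io.to(`game:${change.new_val.id}`).emit("game:update", change.new_val);
  });
}

startGameFeed().catch(console.error);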
I have almost completed a turn-based multiplayer game with node.js and socket.io.
I have express.js as web server and another class acting as game server, using socket.io.
My problem is that these two are running in the same application. I have a web landing page where users can log in, see their player details, and chat in the lobby. Up to this point there is nothing related to game logic, so I'm asking myself why in the hell this game server is running in the same app as the web server. Also note that, in the future, there may be different servers (a Europe server, an Americas server, an Asia server, etc.), but the same landing page and login mechanism would stay.
So what is the perfect way to achieve this?
Should I make two separate Node.js apps, one web server and one game server? In that case, will I be able to share session data between sockets and requests?
Another drawback is that, when navigating the site, any unexpected behaviour may cause the web server to throw an exception and close the application, thus shutting down the game server as well.
You could have completely separate Node.js apps and a common session store, such as Redis, to check whether a user has authenticated with the web server before allowing access to the game server. The common session store would also help with simultaneous game servers, not just one web server + one game server.
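A minimal sketch of that idea, assuming express-session with a Redis-backed store such as connect-redis (package names, the store constructor style, and all routes and identifiers here are illustrative assumptions, not your actual setup):

import express from "express";
import session from "express-session";
import RedisStore from "connect-redis"; // v7-style import (assumption)
import { createClient } from "redis";

const redisClient = createClient({ url: "redis://localhost:6379" });
redisClient.connect().catch(console.error);

// Both apps must use the same store, cookie name and secret so a session
// created by the web server can be read by the game server.
const sharedSession = () =>
  session({
    store: new RedisStore({ client: redisClient }),
    name: "sid",
    secret: "use-a-real-secret",
    resave: false,
    saveUninitialized: false,
  });

const webServer = express();
webServer.use(sharedSession());
webServer.post("/login", (req, res) => {
  (req.session as any).userId = "player42"; // set after real authentication
  res.sendStatus(204);
});

const gameServer = express();
gameServer.use(sharedSession());
gameServer.get("/game/state", (req, res) => {
  if (!(req.session as any).userId) {
    res.sendStatus(401);
    return;
  }
  res.json({ inGame: true });
});

webServer.listen(3000);
gameServer.listen(4000);

The two apps are shown in one file only for brevity; in practice they would be separate processes (or separate machines) that simply point at the same Redis instance.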
You may also prevent closing the application by catching all errors, but that is not generally recommended.
Later edit:
Here is a small list of advantages and disadvantages regarding session stores. It's not for node.js, but it applies here:
http://techwhizbang.com/2009/12/memcache-session-store/