I have a production app that uses socket.io (node.js back-end)to distribute messages to all the logged in clients. Many of my users are experiencing disconnections from the socket.io server. The normal use case for a client is to keep the web app open the entire working day. Most of the time on the app in a work day time is spent idle, but the app is still open - until the socket.io connection is lost and then the app kicks them out.
Is there any way I can make the connection more reliable so my users are not constantly losing their connection to the socket.io server?
It appears that all we can do here is give you some debugging advice so that you might learn more about what is causing the problem. So, here's a list of things to look into.
Make sure that socket.io is configured for automatic reconnect. In the latest versions of socket.io, auto-reconnect defaults to on, but you may need to verify that no piece of code is turning it off.
Make sure the client is not going to sleep such that all network connections will become inactive get disconnected.
In a working client (before it has disconnected), use the Chrome debugger, Network tab, webSockets sub-tab to verify that you can see regular ping messages going between client and server. You will have to open the debug window, get to the network tab and then refresh your web page with that debug window open to start to see the network activity. You should see a funky looking URL that has ?EIO=3&transport=websocket&sid=xxxxxxxxxxxx in it. Click on that. Then click on the "Frames" sub-tag. At that point, you can watch individual websocket packets being sent. You should see tiny packets with length 1 every once in a while (these are the ping and pong keep-alive packets). There's a sample screen shot below that shows what you're looking for. If you aren't seeing these keep-alive packets, then you need to resolve why they aren't there (likely some socket.io configuration or version issue).
Since you mentioned that you can reproduce the situation, one thing you want to know is how is the socket getting closed (client-end initiated or server-end initiated). One way to gather info on this is to install a network analyzer on your client so you can literally watch every packet that goes over the network to/from your client. There are many different analyzers and many are free. I personally have used Fiddler, but I regularly hear people talking about WireShark. What you want to see is exactly what happens on the network when the client loses its connection. Does the client decide to send a close socket packet? Does the client receive a close socket packet from someone? What happens on the network at the time the connection is lost.
webSocket network view in Chrome Debugger
The most likely cause is one end closing a WebSocket due to inactivity. This is commonly done by load balancers, but there may be other culprits. The fix for this is to simply send a message every so often (I use 30 seconds, but depending on the issue you may be able to go higher) to every client. This will prevent it from appearing to be inactive and thus getting closed.
Related
I have a fault tolerant application, where an X Server requests to start an Application on a remote client (by some other mechanism) and receive and display its X-window. Fault tolerance means that the server needs to detect loss of the connection to the client and then call a different back-up-client and start the application there and show the window.
My question is whether there exists a mechanism in the X11 protocol that allows to reliably detect in an X11-Server whether the connection has been broken or not.
Experiments show that when unplugging a cable connection it needs some TCP-Timeout to detect the connection loss on socket level. This is very OS-dependent. In our case it was abut 30 minutes after which the X-Server eventually closed the window.
So another assumption could be that the X11-stream constantly delivers some commands and the server could implement some logic like this: If the X11-stream does not deliver any X11 traffic for a timeout y (e.g. 3 seconds), we assume the connection is lost and actively close the window and establish the connection to the fall-back-client.
Is the assumption true? I did not see any such statement in the X11-protocol about how to detect connection loss. Is there any explicit lifesign that is regularly transmitted? Or is the assumption valid that there is constant traffic? Or could there be longer periods of inactivity where nothing is transmitted at all while the connection is perfectly up and running?
There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?
I have a fault tolerant application, where an X Server needs to start an Application...
I don't think that an X server can "start an application". May be that some setup allows something similar to that, but normally is not so.
...whether there exists a mechanism in the X11 protocol that allows to reliably detect in an X11-Server whether the connection has been broken or not.
No, it does not exist. The X11 protocol is based on TCP/IP, which does not provide directly this "heartbeat". I think the assumption is that, if you click or otherwise stimulate an X11 window, the TCP layer will timeout or throw another error if the client application is gone.
I did not see any statement in the X11-protocol about how to detect connection loss.
There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?
Maybe that some application uses that NoOperation, but the purpose would be different from what you need. I mean, the X11 server is like an extension from the point of view of an application; the application can have interest to know whether the server is up and working, but it is not true the contrary. And, anyway, even if the server could detect that the application is gone, probably there is no way to tell the server to launch another application.
Probably a special proxy could be deployed; it could launch the application and monitor the connection (in both ways) and take the required steps in case the application goes away. But then again, who would monitor the proxy application?
First of all, X Protocol relies completely on TCP to send/receive information.
You cannot safely put a timeout capable transaction in order to detect a timeout in TCP. TCP is designed to retransmit only those segments that have already been sent but no acknowledged. It is completely asynchronous, in the sense that you send a command, and you can receive many responses or events unrelated to that command, before you receive the response. There's no heartbeat mechanism on XProtocol (except that the NOOP command is sent to synchronize operations with the server, and you receive a response for it, but you cannot overuse it, as that slows down severely the X connection, just launch any client with the -synchronous option to see it, see X(7)). You can even have TCP connections alive for years without interchanging a single packet. There's some mechanism, activated by option SO_KEEPALIVE that makes tcp to employ such heartbeat on TCP for a connection that has no data to transmit, but the X11 protocol normally doesn't make use of it. You don't post any code, nor a description of how the system is configured. The standard XServer never starts a connection by itself, except when launched specifically to negotiate with an XDMCP server (and this is done on UDP protocol) to serve as an XTerminal.
From your words probably you don't know that the roles of server and client are exchanged in X Protocol (the client is the remote application that connects to the server to display its output, and the server is the application that controls your display, mouse and keyboard) There's no means for the server to create a new client, so you need to be creating this connection in other means (probably through SSH, but not described).
By the way, when you say:
Experiments show that when unplugging a cable connection it needs some TCP-Timeout to detect the connection loss on socket level. This is very OS-dependent. In our case it was abut 30 minutes after which the X-Server eventually closed the window.
That is not OS-dependent. It is precisely the standard behaviour when you don't have traffic to send, there's no packet exchanged, so no detection is made (except if your client ---remember, this is the remote application program that wants to show its data in your local server--- activates the SO_KEEPALIVE option, and it requires several losses before declaring a lost connection) In your case the amount of time is variable because timers don't start until there's some data sent over the unplugged connection, and this makes it variable (not OS dependant)
On other side, you cannot pretend the server is going to turn on your monitor in case you leave the office and turn it off by mistake or by accident. What is the fault tolerance specification in that case?
IMHO, in regard of the presentation protocol, the application should be ready to show you as much information about the system as soon as you activate the connection (but the connection must be something allowed to fail). What is important is the means you develop for the application to be fault tolerant, even in the case you are not there to see the display. Will be somebody be advised that no one is looking at the screen? Are you going to detect the absence of operators in that case? Don't take this as a flame, but common sense should imperate in this case.
In case you need to ensure the connectivity to the remote host is available, you need to use another means to check for it. I recommend you to have a simple application pinging the remote host and alerting in case you don't get a positive result. Or you can open a connection to the server and then close it as soon as you get a positive response from the server (the first packet, for example) This will lead us to the next step, that is to ensure that some human is looking at the (turned on) screen of the display :)
For example, you can run a client in parallel to the one you are interested in, and force a heartbeat by asking for some server atom name (or a root window property value) in a loop with some delay. This will make the connection fail or your client can alert in case it doesn't receive the answer in some configurable time.
I'm using node.js, websockets/ws.
In my site sometimes a random client loses connection without losing connection to other internet stuff. Less then a second later they connect back, with a new socket. (There is a code that calls socket.onClose, which tries to reconnect back to server)
On the server side I can't see or log anything wrong. Everything looks like a normal disconnect, same as closing the browser tab.
I am guessing the reason is either socket related or client related but I don't know where to begin to debug this problem.
I got ping/pong responses with 60 second timer, this isn't it. The user usually loses connection while active.
How can I debug this problem and find the reason?
I keep all the session info, data, within the socket and that is why I do not want people to lose their connection.
Thanks
Essentially, I'm trying to work out the best way to ensure that a user is connected to the server / the internet and is thus able to make requests in my application without error.
I have come across various solutions, but I can;t really decide what is the best performing or useful.
Websockets, using Socket.io to keep an open connection with the server for each client. Also opens up the possibility for real time updates in my app, which could be a nice thing in the future. However, having lots of open sockets is sure to be hard hitting in performance.
Polling, so having an endpoint in my API that the angular app hits every 5 seconds or so to check the user is connected. Again, seems like it isn't a good idea to be hitting the server a load.
Waiting for an error, then start polling every couple of seconds to wait for the connection to be re-established. This is a little change on the above. However, you are still waiting for a user to fail, which isn't good for user experience.
Does anybody have any informed input on this issue?
Thanks
Socket.IO timeout (disconnect) occurs if there are no activity present in the socket, but how is detected that are no activity? I dont find information about that. How works the detection process? For example:
If user closes the website tab, occurs disconnect?
There is any way user loses connection and disconnect will not be executed?
Is possible cache information informs that Im online even if I leave the website?
If you open your browser's developer tools in the network panel, you can filter your requests to ws requests (web socket requests), in there you can see your active web sockets connections. If you choose one connection, you can see the headers, the frames, the cookies and timing. If you choose the frames option you can see what's being sent and received, between your browser and your web socket server.
The next image will make it clear for you, it's chrome's developer tools:
Now in there you can see there are some numbers, basically your browser and your server are doing ping pong. You can read more about these numbers in this answer SocketIO Chrome Inspector Frames
This ping pong is what keeps the socket alive so we know that there are no timeouts. As for the disconnect and the connect I advise you to read more about the WebSockets API, in there you can see there are event listeners for onclose, onerror, onmessage and onopen.
So answering this question:
For example: If user closes the website tab, occurs disconnect? There
is any way user loses connection and disconnect will not be executed?
No, the onclose event will be fired, but even if any cosmic reason the onclose isn't fired you will eventually disconnect due to timeout.
As for your other question:
Is possible cache information informs that Im online even if I leave
the website?
Yes, that's not up to sockets, that's up to you and your implementation. You can keep a list of online users and only update that list from time to time, let's say 10 minutes. You can keep the online users lists and between the time you update your online users list, some of them are already disconnected.
We have an browser application (SaaS) where we would like to notify the user in the case of internet connection or server connection loss. Gmail does this very nicely, the moment I unplug the internet cable or disable network traffic it immediately says unable to reach the server and gives me a count down for retry.
What is the best way to implement something like this? Would I want the client browser issuing AJAX requests to the application server every second, or have a separate server that just reports back "alive". Scalability will be come an issue down the road.
Because GMail already checks for new e-mails every some seconds and for chat information even more frequently, it can tell without a separate request if the connection is down. If you're not using Ajax for some other sort of constant update, then yes, you would just have your server reply with some sort of "alive" signal. Note that you couldn't use a separate server because of Ajax cross-domain restrictions, however.
With the server reporting to the client (push via Comet), you have to maintain an open connection for each client. This can be pretty expensive if you have a large number of clients. Scalability can be an issue, as you mentioned. The other option is to poll. Instead of doing it every second, you can have it poll every 5-10 seconds or so.
Something else that you can look at is Web Sockets (developed as part of HTML 5), but I am not sure if it is widely supported (AFAIK only Chrome supports it).