How do I find out from a socket client program that the remote connection is down (e.g. the server is down)? When I do a recv and the server is down, it blocks if I do not set any timeout. However, in my case I cannot pick a reliable timeout value to get around this, because then recv times out even when the server is up but the response genuinely takes longer than the timeout I set.
Unfortunately, ZeroMQ just passes this on to the next layer. So the protocol you are implementing on top of ZeroMQ will have to handle this.
Heartbeats are recommended. Basically, just have one side send a message if the connection is otherwise idle. The other side can treat the absence of such messages as a failure condition and close the connection.
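A minimal sketch of that idle-detection logic, assuming a plain TCP socket rather than a ZeroMQ socket, and a made-up convention that a single 0x00 byte means "heartbeat":

```c
/* Heartbeat sketch: assumes a connected TCP socket `fd` and a hypothetical
 * protocol where the byte 0x00 means "heartbeat". With ZeroMQ you would send
 * a small message instead; the timing logic is the same. */
#include <poll.h>
#include <unistd.h>

#define HEARTBEAT_MS   5000   /* send a heartbeat after 5 s of idleness   */
#define DEAD_AFTER_MS 15000   /* declare the peer dead after 15 s silence */

int wait_or_heartbeat(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int silent_ms = 0;

    while (silent_ms < DEAD_AFTER_MS) {
        int r = poll(&p, 1, HEARTBEAT_MS);
        if (r < 0)
            return -1;                 /* poll error */
        if (r > 0)
            return 0;                  /* data (or a heartbeat) arrived */
        /* idle: send our own heartbeat and keep waiting */
        char hb = 0x00;
        if (write(fd, &hb, 1) != 1)
            return -1;                 /* send failed: connection is gone */
        silent_ms += HEARTBEAT_MS;
    }
    return -1;                         /* peer silent too long: treat as down */
}
```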
You may wish to modify your higher-level protocols to be more robust. For example, you can submit a command, query its status, and allow the other side to forget about the command. That way, if the connection is lost, you can reconnect and query any outstanding commands. Any commands it doesn't know about, you know didn't get through and can resubmit. Once you get a reply with the result of a command, you can tell the other side that it can now forget the response.
This allows you to keep the connection active while a long-running command is in progress. Every so often you ask, "is everything okay?", and the other side responds, "yes". You can use long polling, where the other side delays its response for a second or so while the command is in progress. This allows it to return the result immediately rather than having to wait up to a second for your next query.
The specifics depend on your exact requirements, but you must design this correctly into your protocol.
If the remote host goes down without sending you a TCP FIN packet, then you have no immediate way to detect that. You can test this behaviour by firewalling a port after a connection has been established on it. Your program will "hang" forever.
However, the Linux kernel supports a mechanism called TCP keepalive, which probes an idle connection and closes it after a number of unanswered probes. If you can't rely on an application-level timeout, this is the mechanism to use. As a last resort, you may be able to use features of the application protocol (can you name it?); if that protocol has no provisions for connection handling, you may have to invent something of your own on top of it.
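For illustration, enabling keepalive on a socket under Linux looks roughly like this (the TCP_KEEP* options are Linux-specific; the values here are arbitrary examples):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>

/* Enable TCP keepalive on an already created socket `fd`. */
int enable_keepalive(int fd)
{
    int on = 1;
    int idle = 30;   /* seconds of idleness before the first probe          */
    int intvl = 5;   /* seconds between probes                              */
    int cnt = 3;     /* unanswered probes before the connection is dropped  */

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

With this set, a blocked recv eventually fails (typically with ETIMEDOUT) once the peer becomes unreachable, while a slow but live server keeps the connection up, which is exactly the distinction a fixed application timeout cannot make.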
When I connect to the socket server from the client side (a React app), a repeated request is sent by the socket client every few seconds. The requests are generally GET requests, and most of the time they sit in a pending state. Sometimes the response to these requests is just 2.
What do you think causes these repeated requests after connecting or doing anything with the socket?
UPDATE
This problem occurs when I use a namespace. I tried all the solutions I could find, but the problem was not solved.
This is expected behavior when the transport in use is polling (long-polling).
What happens is that, by default, the transports parameter is ["polling", "websocket"] on both client and server, and the order of the elements matters. So the first connection attempt is made via polling (which is faster to start than websocket), and then (or in parallel, I don't know the working details) a connection attempt is made via websocket (which takes a little longer to establish but is faster for later communication).
If the websocket connection is established successfully, communication is carried that way. But if an error occurs, or the connection takes too long to establish, or the websocket transport option is not present in the instance's parameters, then communication continues over polling, which is what those pending requests are. It is normal for them to remain pending: each one stays open until there is an update, so the server can inform the requester immediately, without the client needing to make several quick requests to poll the application's status.
Check the parameters you set for this connection to find out whether transport via websocket is enabled. Be careful when using the socket server behind a reverse proxy, as the proxy needs to be properly configured to accept websocket connections, otherwise it won't work.
You can check the websocket requests in the browser's developer tools, on the Network tab, by enabling the WS filter.
Here are some additional links for you to read more about:
https://socket.io/docs/v4/how-it-works/
https://socket.io/docs/v4/using-multiple-nodes/
https://socket.io/docs/v4/reverse-proxy/
https://ably.com/blog/websockets-vs-long-polling
I have a fault tolerant application, where an X Server requests to start an Application on a remote client (by some other mechanism) and then receives and displays its X window. Fault tolerance means that the server needs to detect loss of the connection to the client and then call a different back-up client, start the application there, and show the window.
My question is whether there exists a mechanism in the X11 protocol that allows an X11 server to reliably detect whether the connection has been broken or not.
Experiments show that when unplugging a cable connection, it takes some TCP timeout to detect the connection loss at the socket level. This is very OS-dependent. In our case it was about 30 minutes, after which the X server eventually closed the window.
So another assumption could be that the X11 stream constantly delivers commands, and the server could implement logic like this: if the X11 stream delivers no traffic for some timeout y (e.g. 3 seconds), assume the connection is lost, actively close the window, and establish the connection to the fall-back client.
Is the assumption true? I did not see any such statement in the X11-protocol about how to detect connection loss. Is there any explicit lifesign that is regularly transmitted? Or is the assumption valid that there is constant traffic? Or could there be longer periods of inactivity where nothing is transmitted at all while the connection is perfectly up and running?
There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?
I have a fault tolerant application, where an X Server needs to start an Application...
I don't think that an X server can "start an application". Maybe some setup allows something similar to that, but normally it is not so.
...whether there exists a mechanism in the X11 protocol that allows an X11 server to reliably detect whether the connection has been broken or not.
No, it does not exist. The X11 protocol runs over TCP/IP, which does not directly provide such a heartbeat. I think the assumption is that, if you click or otherwise stimulate an X11 window, the TCP layer will time out or throw another error if the client application is gone.
I did not see any statement in the X11-protocol about how to detect connection loss.
There is a NoOperation command from the client that could be used for such purpose. But do clients usually implement something like that as a lifesign?
Maybe some application uses that NoOperation, but the purpose would be different from what you need. I mean, the X11 server is like an extension from the application's point of view; the application may have an interest in knowing whether the server is up and working, but the contrary is not true. And anyway, even if the server could detect that the application is gone, there is probably no way to tell the server to launch another application.
Probably a special proxy could be deployed; it could launch the application, monitor the connection in both directions, and take the required steps in case the application goes away. But then again, who would monitor the proxy application?
First of all, the X protocol relies completely on TCP to send and receive information.
You cannot safely rely on a timed transaction to detect a TCP timeout. TCP is designed to retransmit only those segments that have already been sent but not acknowledged. The X protocol is completely asynchronous, in the sense that you send a command and you can receive many responses or events unrelated to that command before you receive its response. There is no heartbeat mechanism in the X protocol (except that the NoOperation command can be sent to synchronize operations with the server, and you receive a response for it; but you cannot overuse it, as that slows the X connection down severely — just launch any client with the -synchronous option to see it, see X(7)). You can even have TCP connections alive for years without exchanging a single packet.

There is a mechanism, activated by the SO_KEEPALIVE option, that makes TCP employ such a heartbeat on a connection that has no data to transmit, but the X11 protocol normally doesn't make use of it.

You don't post any code, nor a description of how the system is configured. The standard X server never starts a connection by itself, except when launched specifically to negotiate with an XDMCP server (and this is done over UDP) to serve as an X terminal.
From your words, you probably don't know that the roles of server and client are exchanged in the X protocol (the client is the remote application that connects to the server to display its output, and the server is the application that controls your display, mouse and keyboard). There is no means for the server to create a new client, so you must be creating this connection by other means (probably through SSH, but you don't describe it).
By the way, when you say:
Experiments show that when unplugging a cable connection, it takes some TCP timeout to detect the connection loss at the socket level. This is very OS-dependent. In our case it was about 30 minutes, after which the X server eventually closed the window.
That is not OS-dependent. It is precisely the standard behaviour when there is no traffic to send: no packets are exchanged, so no detection is made (unless your client ---remember, this is the remote application program that wants to show its data on your local server--- activates the SO_KEEPALIVE option, and even then several probes must be lost before the connection is declared dead). In your case the amount of time is variable because the timers don't start until some data is sent over the unplugged connection, and that is what makes it variable (not OS-dependent).
On the other hand, you cannot expect the server to turn your monitor on in case you leave the office and turn it off by mistake or by accident. What does your fault tolerance specification say about that case?
IMHO, as far as the presentation protocol is concerned, the application should be ready to show you as much information about the system as possible as soon as you activate the connection (but the connection must be allowed to fail). What is important is the means you develop for the application to be fault tolerant even when nobody is there to see the display. Will somebody be notified that no one is looking at the screen? Are you going to detect the absence of operators in that case? Don't take this as a flame, but common sense should prevail here.
In case you need to ensure that connectivity to the remote host is available, you need another means to check for it. I recommend a simple application that pings the remote host and raises an alert if it doesn't get a positive result. Alternatively, you can open a connection to the server and close it as soon as you get a positive response (the first packet, for example). This leads us to the next step, which is to ensure that some human is looking at the (turned-on) screen of the display :)
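One way to implement that "open and close" probe is a non-blocking connect() bounded by select(); a sketch, assuming IPv4 and abbreviated error handling:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 0 if `addr` accepts a TCP connection within `timeout_s` seconds. */
int probe_host(const struct sockaddr_in *addr, int timeout_s)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, O_NONBLOCK);

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }

    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_s, .tv_usec = 0 };

    int ok = -1;
    if (select(fd + 1, NULL, &wfds, NULL, &tv) == 1) {
        int err = 0;
        socklen_t len = sizeof err;
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);  /* connect result */
        ok = (err == 0) ? 0 : -1;
    }
    close(fd);   /* close immediately: we only wanted the reachability check */
    return ok;
}
```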
For example, you can run a client in parallel to the one you are interested in and force a heartbeat by asking for some server atom name (or a root window property value) in a loop with some delay. This will make the connection fail, or your client can raise an alert if it doesn't receive the answer within some configurable time.
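A sketch of such a watchdog client using Xlib (XInternAtom forces a round trip to the server; the atom name and interval are illustrative). Note that the blocking call itself still relies on TCP to notice a dead link, so for a hard deadline you would pair it with alarm() or a watchdog thread:

```c
/* X11 watchdog sketch (a hypothetical helper, not a standard tool): forces
 * a round trip to the X server every few seconds. XInternAtom blocks until
 * the server answers, so a broken connection eventually triggers the I/O
 * error handler. Link with -lX11. */
#include <X11/Xlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int on_io_error(Display *dpy)
{
    (void)dpy;
    fprintf(stderr, "X connection lost: trigger fail-over here\n");
    exit(1);   /* Xlib does not allow returning from the I/O error handler */
}

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;
    XSetIOErrorHandler(on_io_error);

    for (;;) {
        /* Round trip: ask the server to intern (or look up) an atom. */
        XInternAtom(dpy, "WM_NAME", False);
        sleep(3);   /* heartbeat interval; tune to your failover budget */
    }
}
```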
I have an app where a single client talks to a single server. Normally, the client does a single connect, and then calls send repeatedly, and there's no problem.
However, I need to do a version where the client sets up a connection for each individual send (a bit like HTTP with and without keep-alive). In this version, the client calls socket, connect, send once, and then close.
The problem with this is that I very quickly run out of ephemeral client ports, and the connect fails. To get around this I call setsockopt with SO_REUSEADDR, and then bind to port 0, before calling connect (see here, for example).
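Roughly, the pattern looks like this (a sketch; IPv4 assumed, error handling omitted):

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One-connection-per-send pattern with local address reuse, as described. */
int send_once(const struct sockaddr_in *server, const void *buf, size_t len)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = 0;                    /* let the kernel pick a port */
    bind(fd, (struct sockaddr *)&local, sizeof local);

    connect(fd, (const struct sockaddr *)server, sizeof *server);
    send(fd, buf, len, 0);
    shutdown(fd, SHUT_WR);                 /* we are done writing */
    close(fd);
    return 0;
}
```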
This works, except that the TCP connection is no longer reliable. I get occasional incorrect data, presumably because there's still data around when the TCP connection is closed.
Is there any way to make this reliable (and fast)? Calling shutdown before close doesn't help. Maybe I can get select to tell me whether the socket is ready for output, but that seems like overkill.
Do you have to use TCP? If so, you will probably have to maintain an open connection and route your messages over that one connection.
There is SCTP, which may be a good fit for your use case - a reliable datagram protocol:
Like TCP, SCTP provides reliable, connection-oriented data delivery with congestion control. Unlike TCP, SCTP also provides message boundary preservation, ordered and unordered message delivery, multi-streaming and multi-homing. Detection of data corruption, loss of data and duplication of data is achieved by using checksums and sequence numbers. A selective retransmission mechanism is applied to correct loss or corruption of data.
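A sketch of an SCTP client socket, assuming a Linux kernel with SCTP support enabled; the API is the plain sockets API, and message boundaries come for free:

```c
/* SCTP sketch: a one-to-one style socket, used almost exactly like TCP but
 * with message boundaries preserved. Assumes Linux with the sctp module
 * available; no lksctp helper library is needed for these plain calls. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int sctp_client(const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (fd < 0)
        return -1;   /* kernel may lack SCTP support */

    if (connect(fd, (const struct sockaddr *)server, sizeof *server) < 0) {
        close(fd);
        return -1;
    }

    /* Each send() is delivered as one message, not as part of a byte stream. */
    const char msg[] = "hello";
    send(fd, msg, sizeof msg - 1, 0);

    char reply[512];
    ssize_t n = recv(fd, reply, sizeof reply, 0);  /* one whole message */
    (void)n;
    close(fd);
    return 0;
}
```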
I'm trying to set up a server that can handle a high sustained load of simultaneous requests. I found that at a certain point, the server can't recycle "old" TCP connections quickly enough to accommodate extreme numbers of requests.
Do websockets eliminate or decrease the number of TCP connections that a server needs to handle, and are they a good alternative to "normal" requests?
Websockets are persistent connections, so it really depends on what you're talking about. The way socket.io uses XHR is different from a typical ajax call in that it hangs onto the request for as long as possible before sending a response. It's a technique called long-polling, and it tries to simulate a persistent connection by never letting go of the request. When the request is about to time out, it sends a response, a new request is initiated immediately, which it hangs onto yet again, and the cycle continues.
So I guess if you're getting flooded with connections because of ajax calls, that's probably because your client code is polling the server at some interval. This means that even idle clients will be hitting your server with fury because of this polling. If that's the case, then yes, socket.io will reduce your number of connections, because it tries to hang onto one single connection per client for as long as possible.
These days I recommend socket.io over doing plain ajax requests. Socket.io is designed to be performant with whatever transport it settles on. The way it gracefully degrades based on what connection is possible is great, and it means your server will be loaded as little as possible while still reaching as wide an audience as it can.
I have an app where a web server redirects some requests to backend servers, and the backend servers (Linux) do complicated computations and respond to the web server.
For the TCP socket connection management between the web server and the backend servers, I think there are two basic strategies:
"short" connection: that is, one connection per request. This seems very easy for socket management and simplify the whole program structure. After accept, we just get some thread to process the request and finally close this socket.
"long" connection: that is, for one tcp connection, there could be multi request one by one. It seems this strategy could make better use of socket resource and bring some performance improvement(i am not quite sure). BUT it seems this brings a lot of complexity than "short" connection. For example, since now socket fd may be used by multi-threads, synchronization must be involved. and there are more, socket failure process, message sequence...
Are there any suggestions for these two strategies?
UPDATE: @SargeATM's answer reminded me that I should say more about the backend service.
Each request is essentially context-free. The backend service can do its calculation based on a single request message. It seems to be something stateless.
Without getting into the architecture of the backend, which I think heavily influences this decision, I prefer short connections for stateless "quick" request/response traffic and long connections for stateful protocols like synchronization or file transfer.
I know there is some TCP overhead in establishing a new connection (if it isn't localhost), but that has never been anything I have had to optimize in my applications.
OK, I will get a little into architecture, since this is important. I always use threads, not per request but per function. So I would have one thread that listens on the socket, another thread that reads packets off all the active connections, another doing the backend calculations, and a last thread saving to a database if needed. This keeps things clean and simple: easy to measure slow spots, easy to maintain, and easy to optimize later when needed.
What about a third option... no connection!
If your job descriptions and job results are both small, UDP sockets may be a good idea. You have even fewer resources to manage, as there's no need to tie a request/response to a file descriptor, which gives you some flexibility for the future. Imagine you have more backend services and would like to do some load balancing: a busy service can forward the job to another one along with the UDP address of the job submitter. The latter just waits for the result and doesn't care where the task was performed.
Obviously you'd have to deal with lost, duplicated and out-of-order packets, but as a reward you don't have to deal with broken connections. Out-of-order packets are probably not a big deal if you can fit each request and response into one UDP message, duplication can be taken care of with some job IDs, and lost packets... well, they can simply be resent ;-)
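A rough sketch of that resend logic, using a per-attempt receive timeout (the timeout and retry count are made-up values):

```c
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send a job over UDP and wait for the reply, resending on timeout. */
ssize_t udp_request(int fd, const struct sockaddr_in *srv,
                    const void *req, size_t req_len,
                    void *resp, size_t resp_cap)
{
    /* 2-second receive timeout per attempt. */
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    for (int attempt = 0; attempt < 5; attempt++) {
        sendto(fd, req, req_len, 0,
               (const struct sockaddr *)srv, sizeof *srv);

        ssize_t n = recvfrom(fd, resp, resp_cap, 0, NULL, NULL);
        if (n >= 0)
            return n;          /* got an answer; the caller checks the job
                                  ID to discard duplicate or stale replies */
        /* timeout (or error): fall through and resend */
    }
    return -1;                 /* gave up after 5 attempts */
}
```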
Consider this!
Well, you are right.
The biggest problem with persistent connections is making sure that the app gets a "clean" connection from the pool, without any leftover data from another request.
There are a lot of ways to deal with that problem, but in the end it is better to close() a tainted connection and open a new one than to try to clean it...
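If you do want to detect a tainted connection before handing it out, a cheap test is a non-blocking peek; anything readable (or EOF) means the connection is not clean and should be closed. A sketch:

```c
#include <errno.h>
#include <sys/socket.h>

/* Returns 1 if the pooled connection looks clean, 0 if it should be closed. */
int pooled_connection_is_clean(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return 0;   /* leftover bytes from a previous request: tainted */
    if (n == 0)
        return 0;   /* peer already closed the connection */
    return (errno == EAGAIN || errno == EWOULDBLOCK);  /* truly idle */
}
```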