Understanding the BSD socket interface - linux

I'm trying to understand how the events in a BSD socket interface translate to the state of a TCP connection. In particular, I'm trying to understand at what stage of the connection process accept() returns on the server side:
client sends SYN
server sends SYN+ACK
client sends ACK
In which one of these steps does accept() return?

accept returns when the connection is complete, and the connection is complete once the server has received the client's ACK.
accept gives you a socket on which you can communicate. You can't communicate until the connection is established, and the connection can't be established before the handshake finishes.
It wouldn't make sense to return before the client sends its ACK: the client might well never say anything after the initial SYN.

The TCP/IP stack code in the kernel normally[1] completes the three-way handshake entirely without intervention from any user space code. The three steps you list all happen before accept() returns. Indeed, they may happen before accept() is even called!
When you tell the stack to listen() for connections on a particular TCP port, you pass a backlog parameter, which tells the kernel how many connections it can silently accept on behalf of your program at once. The kernel uses this queue when it automatically accepts new connection requests, and the connections are held there until your program gets around to accept()ing them. If there are one or more connections in the listen backlog queue when you call accept(), all that happens is that the oldest is removed from the queue and bound to a new socket.[2]
In other words, if your program calls listen(sd, 5), then goes into an infinite do-nothing loop so that it never calls accept(), five concurrent client connection requests will succeed, from the clients' point of view. A sixth connection request will get stalled on the first SYN packet until either the program owning the TCP port calls accept() or one of the other clients drops its connection.
[1] Firewall and other stack modifications can change this behavior, of course. I am speaking here only of default BSD sockets stack behavior.
[2] If there are no connections waiting in the backlog when you call accept(), it blocks by default, unless the listener socket was set to non-blocking, in which case it returns -1 and errno is EWOULDBLOCK.
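The behavior described above can be checked with a minimal sketch. It is written in Python only because its socket module wraps the same BSD calls; the function name and the loopback address are illustrative assumptions, not part of the original answer. The client connects and even sends data before the server has called accept() at all.

```python
import socket

def connect_before_accept():
    # Listening socket on an ephemeral loopback port (illustrative choice).
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)  # from here on the kernel completes handshakes by itself

    # accept() has not been called yet, but connect() still succeeds,
    # because the kernel finishes the three-way handshake on our behalf.
    cli = socket.create_connection(srv.getsockname(), timeout=2)
    cli.sendall(b"hello")  # the data waits in the kernel until it is read

    # Only now take the already established connection off the backlog queue.
    conn, peer = srv.accept()
    data = conn.recv(5)

    client_addr = cli.getsockname()
    conn.close()
    cli.close()
    srv.close()
    return peer, client_addr, data

if __name__ == "__main__":
    print(connect_before_accept())
```

The address returned by accept() is the client's own address, which shows that the queued connection is the one the client opened earlier.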

Related

Is the SO_REUSEADDR socket option useful on the client side?

I came across this line in a Java client library:
socket.setReuseAddress(true);
I thought it was there to improve performance, since the SO_REUSEADDR option lets a socket bind to a port that is still held by another socket in TIME_WAIT.
But I also found that this option is mostly used on the server side, so that a server can restart quickly without waiting for sockets in TIME_WAIT to close.
My question is: is this option useful on the client side, as in this client library? Can it harm the other socket, for example by enabling some kind of attack?
Thanks a lot!
-Dimi
It depends on what you mean by "client". You also mention "client library", which has nothing to do with it.
This is often misunderstood. SO_REUSEADDR lets you bind to an address that is still held by a socket in TIME_WAIT, and TIME_WAIT only happens on one side of the TCP connection: the side that initiates the termination sequence, i.e. sends the first FIN, i.e. calls shutdown(SHUT_WR) or close first. What close does is less clear-cut and may depend on connection state or platform, which is why you should call shutdown(SHUT_WR) before calling close.
This article is very informative, as are the two articles referenced at its end. It makes clear that TIME_WAIT may occur on the listening (server) side as well as on the client side, and it recommends having clients always initiate termination ("active close") so that the server, where it would be more of a problem, doesn't accumulate sockets in TIME_WAIT.
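As a rough sketch of what this looks like from the client side, the Python code below sets SO_REUSEADDR before binding and then performs the active close with shutdown(SHUT_WR) before close. The function name and the single-threaded loopback setup are assumptions made for illustration; in a real client the option only matters if you bind to a specific local address or port.

```python
import socket

def client_active_close():
    # Throwaway server on loopback so the example is self-contained.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): allows binding to a local address that is
    # still held by a socket in TIME_WAIT.
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    reuse = cli.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    cli.bind(("127.0.0.1", 0))
    cli.connect(srv.getsockname())
    conn, _ = srv.accept()

    # Active close from the client: send FIN first, close afterwards.
    cli.shutdown(socket.SHUT_WR)
    server_saw_eof = conn.recv(1024) == b""  # server reads the client's FIN
    conn.close()                             # server answers with its own FIN
    client_saw_eof = cli.recv(1024) == b""   # client reads EOF, then closes
    cli.close()                              # TIME_WAIT is now on the client side
    srv.close()
    return reuse, server_saw_eof, client_saw_eof

if __name__ == "__main__":
    print(client_active_close())
```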

Identifying remote disconnection in socket client

How do I find out from a socket client program that the remote connection is down (e.g. the server is down)? When I do a recv and the server is down, it blocks forever unless I set a timeout. However, in my case I cannot pick any reliable timeout value, because recv would then also time out when the server is up but the response simply takes longer than the timeout I have set.
Unfortunately, ZeroMQ just passes this on to the next layer. So the protocol you are implementing on top of ZeroMQ will have to handle this.
Heartbeats are recommended. Basically, just have one side send a message if the connection is otherwise idle. The other side can treat the absence of such messages as a failure condition and close the connection.
You may wish to modify your higher level protocols to be more robust. For example, you can submit a command, query its status, and allow the other side to forget about the command. That way, if the connection is lost, you can reconnect and query any outstanding commands. Any command the other side doesn't know about did not get through, so you can resubmit it. Once you get a reply with the result of a command, you can tell the other side that it can now forget the response.
This allows you to keep the connection active while a long-running command is ongoing. Every so often you ask, "is everything okay?" The other side responds, "yes". You can use long polling, where the other side delays responding for a second or so while the command is in progress. This allows it to return the results immediately rather than having to wait a second for your next query.
The specifics depend on your exact requirements, but you must design this correctly into your protocol.
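The heartbeat idea can be sketched roughly as follows. The one-byte HEARTBEAT frame, the recv_reply helper and the timeout values are all hypothetical, and the sketch assumes that real payloads never contain the heartbeat byte; a real protocol would need proper message framing. The point is that the timeout bounds the silence between messages, not the total time a reply may take.

```python
import socket

HEARTBEAT = b"\x00"  # hypothetical one-byte "still alive" frame

def recv_reply(sock, idle_timeout):
    """Return the next non-heartbeat data, or None if the peer looks dead.

    idle_timeout limits how long the peer may stay completely silent;
    a slow reply is fine as long as heartbeats keep arriving.
    """
    sock.settimeout(idle_timeout)
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return None  # no reply and no heartbeat: treat as failure
        if not data:
            return None  # peer closed the connection in an orderly way
        payload = data.replace(HEARTBEAT, b"")  # drop heartbeat bytes
        if payload:
            return payload

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(HEARTBEAT)  # peer is busy but alive
    b.sendall(b"done")    # the actual reply arrives later
    print(recv_reply(a, idle_timeout=0.5))
```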
If the remote host goes down without sending you a TCP FIN packet, a blocking recv has no way to notice. You can test this behaviour by firewalling a port after a connection has been established on that port. Your program will "hang" forever.
However, the Linux kernel supports a mechanism called TCP keepalive, which probes an idle connection and closes it if the peer does not answer within a given time. If you can't specify a timeout for your application, then you can't rely on that either. The last resort might be to use features of the application protocol (can you name it?). If that protocol does not support any connection handling, you may have to invent something of your own on top of it.
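For reference, enabling TCP keepalive on a socket might look like the sketch below. The helper name and the timing values are illustrative assumptions. SO_KEEPALIVE is portable, while TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are per-socket tuning options that exist on Linux, so the code only sets them when the platform exposes them.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive; the timing values are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning: seconds of idleness before the first probe,
    # seconds between probes, and unanswered probes before giving up.
    tuning = (("TCP_KEEPIDLE", idle),
              ("TCP_KEEPINTVL", interval),
              ("TCP_KEEPCNT", count))
    for name, value in tuning:
        opt = getattr(socket, name, None)  # not every platform defines these
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

With settings like these, a dead peer should be detected after roughly idle + interval * count seconds of silence, at which point a blocked recv fails with an error instead of hanging.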

Epoll and remote 1-way shutdown

Assume a TCP socket on the local linux host is in a connected state with a remote host. The local host is using epoll_wait to be notified of events on the socket with the remote host.
If the remote host were to call:
shutdown(s,SHUT_WR);
on its connected socket to indicate it is done transmitting, what event(s) will epoll_wait return on the local host for its socket?
I'm assuming EPOLLIN would always get returned and a subsequent recv call would return 0 to indicate the remote side has finished transmitting.
What about EPOLLHUP or EPOLLRDHUP? (And what is the difference between these two events)?
Or even EPOLLERR ?
If the remote host calls "close" instead of "shutdown", does the answer to any of the above change?
I'm answering this myself after doing the heavy lifting to find the answer.
A socket registered for epoll events will typically receive an EPOLLRDHUP event flag (in addition to EPOLLIN) when the remote peer calls close or shutdown(SHUT_WR). This does not necessarily mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket, and eventually 0 will be returned to indicate EOF. It may even be possible to send data back if the remote peer only did a half-close of its socket.
The one notable exception is a remote peer that has the SO_LINGER option enabled on its socket with a linger value of 0. Closing such a socket may result in a TCP RST getting sent instead of a FIN. From what I've read, a connection reset will generate either an EPOLLHUP or an EPOLLERR. (I haven't had time to confirm, but it makes sense.)
There is some documentation to suggest that older Linux implementations don't support EPOLLRDHUP (it was added in Linux 2.6.17), in which case EPOLLHUP gets generated instead.
And for what it is worth, in my particular case, I found that it is not very useful to have code that special-cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to the return values of recv() and send().
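The half-close case can be reproduced with a short, Linux-only sketch using Python's select.epoll wrapper. The function name and the loopback setup are assumptions for illustration; the remote peer is simulated by a second socket in the same process that calls shutdown(SHUT_WR).

```python
import select
import socket

def half_close_events():
    # Loopback connection: "peer" plays the remote host, "conn" is our side.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    peer = socket.create_connection(srv.getsockname(), timeout=2)
    conn, _ = srv.accept()

    ep = select.epoll()
    # EPOLLRDHUP has to be requested explicitly to be reported.
    ep.register(conn.fileno(), select.EPOLLIN | select.EPOLLRDHUP)

    peer.shutdown(socket.SHUT_WR)  # the remote side is done transmitting
    events = ep.poll(2)            # list of (fd, event mask) pairs
    mask = events[0][1]

    eof = conn.recv(1024)          # b"" signals EOF from the peer
    conn.sendall(b"bye")           # our sending direction is still open
    reply = peer.recv(1024)

    ep.close()
    conn.close()
    peer.close()
    srv.close()
    return mask, eof, reply

if __name__ == "__main__":
    mask, eof, reply = half_close_events()
    print(bool(mask & select.EPOLLIN), bool(mask & select.EPOLLRDHUP), eof, reply)
```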

Avoiding connection refused error for multiple sockets in C

Just a quick background. I want to open two sockets per thread of the application. The main thread has an accept() call to accept a TCP connection. There are three other threads, and all of them also have an accept(). The problem is that sometimes, in a multithreaded environment, the client tries to connect before the accept call of the server in a child thread, which results in a "connection refused" error. The client doesn't know when the server is ready to accept the connection.
I do not want the main thread socket to be sending any control information to the client like "You can now connect to the server". To avoid this, I have two approaches in my mind
1. To set a max counter(attempt) at the client side to connect to the server before exiting with connection refused error.
2. A separate thread whose only function is to accept connections at server side as a common accept function for all the thread connections except for the main thread.
Would really appreciate to know if there is any other approach. Thanks
Connection refused is not because you're calling accept late, it's because you're calling listen late. Make sure you call listen before any connect calls (you can check with strace). This probably requires that you listen before you spawn any children.
After you call listen on a socket incoming connections will queue until you call accept. At some point the not-yet-accepted connections can get dropped but this shouldn't occur with only 2 or 3 sockets.
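A small sketch makes the difference visible. The try_connect helper and the loopback address are assumptions for illustration: connecting to a port that is bound but not yet listening is refused, while the same connect succeeds once listen() has been called, even though accept() never runs.

```python
import socket

def try_connect(addr):
    """Return "connected" or "refused" for a single connection attempt."""
    try:
        with socket.create_connection(addr, timeout=1):
            return "connected"
    except ConnectionRefusedError:
        return "refused"

def listen_then_connect():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    addr = srv.getsockname()

    before_listen = try_connect(addr)  # bound, but not listening yet
    srv.listen(4)
    after_listen = try_connect(addr)   # queued by the kernel, no accept() needed

    srv.close()
    return before_listen, after_listen

if __name__ == "__main__":
    print(listen_then_connect())
```

In a threaded server this suggests calling bind() and listen() in the main thread first and only then starting the threads that call accept().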
If this is unix you can just use pipe2 or socketpair to create a pair of connected pipes/unix domain sockets with a lot less code. Of course, you need to do this before spawning the child thread and pass one end to the child.
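The socketpair approach can be sketched like this; the worker function and the echo behavior are made up for the example. The pair is created before the thread starts, and one end is handed to the thread, so there is no listen, connect or accept involved at all.

```python
import socket
import threading

def worker(chan):
    # The thread owns its end of the pair: read one request, answer, close.
    request = chan.recv(1024)
    chan.sendall(b"echo:" + request)
    chan.close()

def run():
    # Both ends are connected from the start, so nothing can be refused.
    parent_end, child_end = socket.socketpair()
    t = threading.Thread(target=worker, args=(child_end,))
    t.start()

    parent_end.sendall(b"ping")
    reply = parent_end.recv(1024)

    t.join()
    parent_end.close()
    return reply

if __name__ == "__main__":
    print(run())
```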

Forking with a listening socket

I'd like to make sure about the correctness of the way I try to use accept() on a socket.
I know that in Linux it's safe to bind() a socket, fork() N children and then recv() the packets in all of them without any synchronisation from the user side (the packets get more or less load-balanced between the children). But that's UDP.
Does the same property hold for TCP and listen(), fork(), accept()? Can I just assume that it's ok to accept on a shared socket created by the parent, even when other children do the same? Is POSIX, BSD sockets or any other standard defining it somewhere?
If you fork() and then accept() in your children, only one child process is going to get any given connection from accept() and then process it. This is pre-forking, and the connections won't be shared among the children.
You can do a standard one-child-per-connection scheme by reversing the order: accept first, then fork. However, both of these techniques are for efficiency, load balancing, etc., not for sharing a particular connection.
TCP is different from UDP. It would be inadvisable to have several processes read from the same TCP connection, as you will almost certainly end up with a mess. A given received message can be spread over one or more packets, and it would be more of a pain for multiple processes to coordinate than it would be to have one child handle the connection.
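A pre-forking setup along these lines can be sketched as follows (Unix only, since it uses os.fork). The function name, the number of children and the "reply with my PID" behavior are assumptions for the demonstration: the parent listens, each child blocks in accept() on the inherited socket, and every connection is handed to exactly one child.

```python
import os
import socket

def prefork_demo(n_children=3):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(n_children)
    addr = srv.getsockname()

    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: accept exactly one connection on the shared listening
            # socket, report our PID to the client, then exit.
            try:
                conn, _peer = srv.accept()
                conn.sendall(str(os.getpid()).encode())
                conn.close()
            finally:
                os._exit(0)
        pids.append(pid)

    # Parent acts as the clients: one connection per child.
    replies = []
    for _ in range(n_children):
        with socket.create_connection(addr, timeout=5) as c:
            data = b""
            while True:
                chunk = c.recv(64)
                if not chunk:
                    break
                data += chunk
            replies.append(int(data))

    srv.close()
    for pid in pids:
        os.waitpid(pid, 0)
    return pids, replies

if __name__ == "__main__":
    print(prefork_demo())
```

Each reply carries the PID of a different child, so no connection was handled twice and none was shared between processes.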

Resources