Epoll and remote 1-way shutdown - linux

Assume a TCP socket on the local linux host is in a connected state with a remote host. The local host is using epoll_wait to be notified of events on the socket with the remote host.
If the remote host were to call:
shutdown(s,SHUT_WR);
on its connected socket to indicate it is done transmitting, what event(s) will epoll_wait return on the local host for its socket?
I'm assuming EPOLLIN would always get returned and a subsequent recv call would return 0 to indicate the remote side has finished transmitting.
What about EPOLLHUP or EPOLLRDHUP? (And what is the difference between these two events)?
Or even EPOLLERR ?
If the remote host calls "close" instead of "shutdown", does the answer to any of the above change?

I'm answering this myself after doing the heavy lifting to find the answer.
A socket registered with epoll will typically receive an EPOLLRDHUP event flag (in addition to EPOLLIN) when the remote peer calls close or shutdown(SHUT_WR), provided EPOLLRDHUP was requested in the event mask. This does not necessarily mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket, and eventually 0 will be returned to indicate EOF. It may even be possible to send data back if the remote peer only did a half-close of its socket.
The one notable exception is when the remote peer has enabled the SO_LINGER option on its socket with a linger value of 0. Closing such a socket may result in a TCP RST getting sent instead of a FIN. From what I've read, a connection reset will generate either an EPOLLHUP or an EPOLLERR. (I haven't had time to confirm, but it makes sense.)
There is some documentation suggesting that older Linux kernels do not support EPOLLRDHUP (the flag was added in Linux 2.6.17) and that EPOLLHUP gets generated instead.
And for what it is worth, in my particular case, I found that it is not very useful to have code that special-cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to the return values of recv() and send().
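The advice above (treat every readiness flag as "go and call recv()") can be sketched as follows. This is a minimal illustration, not the answerer's actual code: watch_socket, on_readable and the peer_state enum are hypothetical names, and the handler assumes it is invoked for any of EPOLLIN, EPOLLRDHUP, EPOLLHUP or EPOLLERR.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch. */
enum peer_state { PEER_OPEN, PEER_EOF, PEER_ERROR };

/* Register fd for input and half-close notifications.
 * EPOLLHUP and EPOLLERR do not need to be requested; epoll always reports them. */
static int watch_socket(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Called for any readiness flag: drain the socket and let recv() decide
 * what actually happened. *total accumulates the bytes read. */
static enum peer_state on_readable(int fd, size_t *total)
{
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n > 0) {
            *total += (size_t)n;      /* unread data is still delivered first */
            continue;
        }
        if (n == 0)
            return PEER_EOF;          /* peer sent FIN: close or shutdown(SHUT_WR) */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return PEER_OPEN;         /* nothing more to read for now */
        return PEER_ERROR;            /* e.g. ECONNRESET after an RST */
    }
}
```

A PEER_EOF result only says the peer has stopped sending; whether send() still works in the other direction depends on whether the peer did a half-close or a full close.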

Related

How to get socket states from socket winsock2

Hi I'm new to C++ winsock2 sockets and I've been stuck at this for a while.
How do I find out whether a socket is connected, bound, listening or closed, given only the socket/file descriptor passed as an argument?
Note: I want to figure out the socket state without attempting to bind/connect/listen/closesocket, because that would perform the operation as well as return a result. Example: to check whether a socket is connected to a server, I would have to attempt to connect and look at the result, which I don't want.
I am trying to achieve something like this.
// Using Winsock2 API.
bool isSocketBound(SOCKET s) {
// Check if socket is bound without attempting to bind.
// The code which I'm not able to figure out.
return isBound;
}
// Same goes for the other functions.
Thanks in advance.
"How do I get if a socket is connected, bound, listening or closed from the socket/file descriptor passed as an argument."
There is no single socket API to query a SOCKET's current state. However:
getsockname() will tell you whether the socket is bound locally or not.
getpeername() will tell you whether the socket is connected to a remote peer or not.
getsockopt() with the SOL_SOCKET:SO_ACCEPTCONN option will tell you whether a connection-oriented (i.e., TCP) SOCKET is listen()'ing or not.
Alternatively, if you try to connect() on a listen()'ing socket, you will get a WSAEINVAL error. If you try to listen() on a connect()'ed socket, you will get a WSAEISCONN error.
Alternatively, if the socket is at least bound locally, then you can use getsockname() to get its local IP/port and then use GetTcpTable(), GetTcpTable2(), or GetExtendedTcpTable() to find out whether that IP/port is in a LISTENING state or not.
If a SOCKET has been closed locally, the SOCKET is not valid to begin with, so you can't query it for anything. The best thing you can do is just set the SOCKET to INVALID_SOCKET when you are done using it, and then check for that condition when needed. On the other hand, if the socket has been closed remotely but not locally yet, then the SOCKET is still valid, but there is no way to query its close state. The only way to know if the socket has been closed by the peer is if a recv() operation reports 0 bytes received, or if a send() operation fails with a connection error, or if you are using the SOCKET in asynchronous mode (WSAAsyncSelect() or WSAEventSelect()) and receive an FD_CLOSE notification for it.
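The three queries listed above can be sketched like this. The sketch is written against the POSIX sockets API so that it can be compiled and run on Linux; Winsock provides getsockname(), getpeername() and getsockopt() under the same names, with SOCKET in place of int and WSAGetLastError() in place of errno. The helper names (is_bound_inet, is_connected, is_listening) are hypothetical, and the port check in is_bound_inet is an assumption specific to Linux, where getsockname() succeeds on an unbound IPv4 socket and reports port 0.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* True if the IPv4 socket has a local port assigned.
 * On Linux an unbound socket still answers getsockname(), with port 0. */
static bool is_bound_inet(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) != 0)
        return false;
    return sin.sin_family == AF_INET && sin.sin_port != 0;
}

/* True if the socket has a remote peer; getpeername() fails otherwise. */
static bool is_connected(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;
    return getpeername(fd, (struct sockaddr *)&ss, &len) == 0;
}

/* True if listen() has been called on the socket. */
static bool is_listening(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    return getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == 0 && val != 0;
}
```

None of these calls changes the socket's state, which is what the question asked for. They cannot detect a socket that has already been closed locally, for the reason given above.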
"Note : I want to figure out the socket state without attempting to bind/connect/listen/closesocket because that will return a result and perform the operation. Example : if I wanted to check whether a socket is connected to a server I should attempt to connect and get the result which I don't want."
Why are you not simply keeping track of the state in your own code? If you bind() a SOCKET and it succeeds, then you know it is bound locally. If you connect() and it succeeds, then you know it is bound locally and is connected to a remote server. You should maintain a state variable along-side your SOCKET variable, and keep the state updated with each state-changing operation you perform on the SOCKET.
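One way to keep such a state variable next to the socket is a small wrapper that updates the state only when the underlying call succeeds. This is a sketch with hypothetical names (tracked_socket, ts_open and friends), written against POSIX sockets so it runs on Linux; a Winsock version would use SOCKET, INVALID_SOCKET and closesocket() instead.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

enum sock_state { ST_CLOSED, ST_CREATED, ST_BOUND, ST_LISTENING, ST_CONNECTED };

struct tracked_socket {
    int fd;                  /* -1 when closed */
    enum sock_state state;   /* updated after every successful operation */
};

static int ts_open(struct tracked_socket *ts)
{
    ts->fd = socket(AF_INET, SOCK_STREAM, 0);
    ts->state = ts->fd >= 0 ? ST_CREATED : ST_CLOSED;
    return ts->fd >= 0 ? 0 : -1;
}

static int ts_bind(struct tracked_socket *ts, const struct sockaddr *addr, socklen_t len)
{
    int rc = bind(ts->fd, addr, len);
    if (rc == 0)
        ts->state = ST_BOUND;
    return rc;
}

static int ts_listen(struct tracked_socket *ts, int backlog)
{
    int rc = listen(ts->fd, backlog);
    if (rc == 0)
        ts->state = ST_LISTENING;
    return rc;
}

static int ts_connect(struct tracked_socket *ts, const struct sockaddr *addr, socklen_t len)
{
    int rc = connect(ts->fd, addr, len);
    if (rc == 0)
        ts->state = ST_CONNECTED;   /* a connected socket is also bound locally */
    return rc;
}

static void ts_close(struct tracked_socket *ts)
{
    if (ts->fd >= 0)
        close(ts->fd);
    ts->fd = -1;                    /* plays the role of INVALID_SOCKET */
    ts->state = ST_CLOSED;
}
```

The wrapper only records what the local code did. A close initiated by the peer still has to be detected through recv() or send(), as described in the previous answer.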

TCP send() is blocking

I am running an application in which I am using TCP blocking socket. TCP send() is blocked, but netstat is showing send and recv Q = 0.
Can someone suggest why would send() be blocked?
The two reasons I can think of would be:
The receiving program keeps the socket open but does not read the data. In this case, once the receiving socket buffer becomes full, the sender cannot send any more; its socket send buffer fills up and send() blocks.
The network connection between the sender and the receiver is completely blocked after you initially connect. This would typically result in the sender socket failing after a timeout, and it would usually not be repeatable.
Neither case exactly agrees with your netstat results, but from experience I'd say tcp send() does not block unless the socket send buffer is full.
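The "send buffer is full" condition can be observed directly by sending with MSG_DONTWAIT, which makes send() fail with EAGAIN/EWOULDBLOCK at exactly the point where a blocking send() would stall. The following is a rough sketch; send_until_full is a hypothetical helper, and the 4096-byte chunk size is arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Queue data without blocking until the kernel refuses to take more.
 * Returns the number of bytes accepted. *would_block is set to 1 if the
 * loop stopped because the send buffer was full, 0 for any other error. */
static size_t send_until_full(int fd, int *would_block)
{
    char chunk[4096];
    size_t queued = 0;

    memset(chunk, 'x', sizeof chunk);
    *would_block = 0;
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            queued += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            *would_block = 1;       /* a blocking send() would stall here */
        return queued;
    }
}
```

If the receiver never reads, this returns after filling the receiver's buffer plus the local send buffer, which is the first scenario described above.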

read the content of send-q TCP socket in linux

I have a TCP client sending data to a server continuously. After the client connects to the server successfully, it sends data at intervals of a few seconds.
When the link between the client and the server was disconnected after some data had been sent, I learned that TCP retransmits the data according to the value of tcp_retries2. I configured this value to be 8, so that I get a write error after about 100 seconds.
But there will be some unacknowledged packets in the send-q.
Is there a way to read the content of these unacknowledged packets in the send-q from my program before closing the socket, or should I remember the sent data and resend it after connecting again? Is there any other way to implement this?
You can get the size of sendq with an ioctl:
SIOCOUTQ
Returns the amount of unsent data in the socket send queue.
The socket must not be in LISTEN state, otherwise an error
(EINVAL) is returned. SIOCOUTQ is defined in
<linux/sockios.h>. Alternatively, you can use the synonymous
TIOCOUTQ, defined in <sys/ioctl.h>.
Note that a drained sendq only tells you that the kernel of the remote system has accepted the data; it does not guarantee that the application running on that host handled it. Most failures occur in the network between the communicating parties, but this metric can't be used as definite proof of successful transmission.
Once the application has given its data to TCP, it is the responsibility of TCP to keep track of the acknowledgement of the packets. If ACKs are not forthcoming, it tries its best to get the packet delivered based on the RTO algorithm. Until the ACK is received, the data is kept in the TCP send queue. I do not think the application has any means to determine the current state of that queue.
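Reading the queue size with the ioctl quoted above might look like this. It is a minimal sketch: unsent_bytes is a hypothetical helper name, and it reports only a byte count, never the queued data itself.

```c
#include <errno.h>
#include <linux/sockios.h>
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Returns the size of the socket's send queue in bytes, or -1 on error.
 * For a listening socket the ioctl fails and errno is set to EINVAL. */
static int unsent_bytes(int fd)
{
    int pending = 0;
    if (ioctl(fd, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

A caller could poll this value before close() to see whether anything is still outstanding, keeping in mind the caveat that a value of 0 says nothing about what the peer application did with the data.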
//should i remember the send data and resend it after connecting again//
How do you do this? The previous connection state is gone, isn't it? Unless the client and the server applications maintain some shared understanding of what was received and sent, you have to start fresh with the new connection.
No, there isn't.
If you need to know that the peer application has received the data, you need to have the peer application acknowledge it back to your application via your application protocol, and treat any unacknowledged data as needing re-sending from your application somehow. This also brings in the question of transactional idempotence, so that you can resend with impunity.
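A possible shape for "remember what was sent until the peer application acknowledges it" is a small queue keyed by sequence number. Everything in this sketch is an assumption made for illustration: the resend_queue type, the fixed sizes, and a protocol in which the peer acknowledges with "everything up to sequence N". Sequence-number wraparound is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 16   /* hypothetical limits for this sketch */
#define MAX_PAYLOAD 64

struct pending_msg {
    uint32_t seq;
    size_t len;
    char data[MAX_PAYLOAD];
};

struct resend_queue {
    struct pending_msg slots[MAX_PENDING];
    size_t head;         /* index of the oldest unacknowledged message */
    size_t count;        /* number of unacknowledged messages */
    uint32_t next_seq;   /* sequence number for the next message */
};

static void rq_init(struct resend_queue *q)
{
    memset(q, 0, sizeof *q);
    q->next_seq = 1;     /* 0 is reserved as the "could not store" result */
}

/* Store a copy of a message before handing it to send().
 * Returns its sequence number, or 0 if the queue is full or the message is too large. */
static uint32_t rq_remember(struct resend_queue *q, const void *data, size_t len)
{
    if (q->count == MAX_PENDING || len > MAX_PAYLOAD)
        return 0;
    struct pending_msg *m = &q->slots[(q->head + q->count) % MAX_PENDING];
    m->seq = q->next_seq++;
    m->len = len;
    memcpy(m->data, data, len);
    q->count++;
    return m->seq;
}

/* The peer application acknowledged everything up to and including acked_seq. */
static void rq_ack(struct resend_queue *q, uint32_t acked_seq)
{
    while (q->count > 0 && q->slots[q->head].seq <= acked_seq) {
        q->head = (q->head + 1) % MAX_PENDING;
        q->count--;
    }
}

/* After reconnecting: the i-th oldest unacknowledged message, or NULL if there is none. */
static const struct pending_msg *rq_peek(const struct resend_queue *q, size_t i)
{
    return i < q->count ? &q->slots[(q->head + i) % MAX_PENDING] : NULL;
}
```

After a reconnect the client would walk rq_peek(q, 0), rq_peek(q, 1), and so on, and resend each entry. Because a message may have been processed before its acknowledgement was lost, the receiver needs to tolerate duplicates, for example by ignoring sequence numbers it has already seen; that is the idempotence concern mentioned above.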
It takes two to tango. You can close your end of the connection and it waits for the other end of the connection to drop, too. Think 3-way handshake in reverse.
How long do you wait between closing the connection and re-opening it? You must wait at least the TIME_WAIT period before trying to reconnect using the same connection info.

Understanding BSD interface

I'm trying to understand how the events in a BSD socket interface translate to the state of a TCP Connection. In particular, I'm trying to understand at what stage in the connection process accept() returns on the server side
client sends SYN
server sends SYN+ACK
client sends ACK
In which one of these steps does accept() return?
accept returns when the connection is complete. The connection is complete after the client sends his ACK.
accept gives you a socket on which you can communicate. Of course you know, you can't communicate until the connection is established. And the connection can't be established before the handshake.
It wouldn't make sense to return before the client sends his ACK. It is entirely possible he won't say anything after the initial SYN.
The TCP/IP stack code in the kernel normally[1] completes the three-way handshake entirely without intervention from any user space code. The three steps you list all happen before accept() returns. Indeed, they may happen before accept() is even called!
When you tell the stack to listen() for connections on a particular TCP port, you pass a backlog parameter, which tells the kernel how many connections it can silently accept on behalf of your program at once. It is this queue that is used when the kernel automatically accepts new connection requests, and that is where they are held until your program gets around to accept()ing them. When there are one or more connections in the listen backlog queue when you call accept(), all that happens is that the oldest is removed from the queue and bound to a new socket.[2]
In other words, if your program calls listen(sd, 5), then goes into an infinite do-nothing loop so that it never calls accept(), five concurrent client connection requests will succeed, from the clients' point of view. A sixth connection request will get stalled on the first SYN packet until either the program owning the TCP port calls accept() or one of the other clients drops its connection.
[1] Firewall and other stack modifications can change this behavior, of course. I am speaking here only of default BSD sockets stack behavior.
[2] If there are no connections waiting in the backlog when you call accept(), it blocks by default, unless the listener socket was set to non-blocking, in which case it returns -1 and errno is EWOULDBLOCK.
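Both points can be tried out on the loopback interface: a client's connect() succeeds before the server has called accept(), and a non-blocking accept() on an empty queue fails with EWOULDBLOCK (the same value as EAGAIN on Linux). This is a sketch with hypothetical helper names (make_listener, connect_loopback, accept_within); it assumes 127.0.0.1 is reachable.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Non-blocking TCP listener on 127.0.0.1 with a kernel-chosen port. */
static int make_listener(int backlog, uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Blocking connect to 127.0.0.1:port; returns once the handshake is complete. */
static int connect_loopback(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for a queued connection, then try to accept it.
 * With timeout_ms == 0 this is a plain non-blocking accept(). */
static int accept_within(int listener, int timeout_ms)
{
    struct pollfd p = { .fd = listener, .events = POLLIN };
    if (timeout_ms > 0)
        (void)poll(&p, 1, timeout_ms);
    return accept(listener, NULL, NULL);
}
```

Calling accept_within(listener, 0) on a fresh listener returns -1 with errno set to EWOULDBLOCK. If connect_loopback(port) is then called, it succeeds even though no accept() is pending, and a later accept_within(listener, 1000) picks the already-established connection off the queue.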

How to close connection for Node.js http.createClient?

Is there something like
remote_client.close()
or
request.close()
for closing the http.createClient connection?
It seems like in some occasions, the socket connection is still hanging there after the "response" event is emitted and properly handled.
Upgrade to the latest version of node.js (0.4.8).
The syntax for creating clients has changed. You now have
http.request
which returns a ClientRequest that you can .end(), and it gives you a ClientResponse in the callback.
The ClientResponse is just a Readable Stream, and you can .destroy() a stream.
Except for certain rare and extraordinary circumstances, your connections are performing correctly and will close by themselves, (or more precisely, will be closed automatically by your computer operating system's TCP stack.)
The issue you are seeing is that the socket underlying the connection will only be closed once all of its data have been delivered and acknowledged, and after the TIME_WAIT period has expired. This is well defined and perfectly normal behavior.
For an explanation of why this is required, see
UNIX Socket FAQ» 2.7 - Please explain the TIME_WAIT state
TCP Tutorial
But the bottom line is: don't worry about sockets; they're your operating system's responsibility.

Resources