TCP send() is blocking - linux

I am running an application that uses a blocking TCP socket. A call to send() is blocked, yet netstat shows Send-Q and Recv-Q both at 0.
Can someone suggest why send() would be blocked?

The two reasons I can think of would be:
The receiving program keeps the socket open but does not read the data. In this case, once the receiving socket's buffer is full, the sender cannot send any more; its socket send buffer then fills up and send() blocks.
The network connection between the sender and the receiver is completely blocked after you initially connect. The sending socket would typically fail after a timeout, and the problem would usually not be repeatable.
Neither case exactly agrees with your netstat results, but from experience I'd say TCP send() does not block unless the socket send buffer is full.
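The first case above can be reproduced on a single machine. The following is a minimal sketch in Python, whose socket module wraps the same system calls; it assumes a Linux loopback connection, and the helper names (tcp_pair, fill_until_stalled, stalled_send_demo) are made up for illustration. The sender is switched to non-blocking mode so that the point where a blocking send() would go to sleep shows up as an exception instead of a hang.

```python
import socket


def tcp_pair():
    """Return (client, server): two connected TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def fill_until_stalled(sender, chunk=b"x" * 65536):
    """Send on a non-blocking socket until the kernel refuses more data."""
    sender.setblocking(False)
    total = 0
    while True:
        try:
            total += sender.send(chunk)  # may accept fewer bytes than offered
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: the send buffer is full. A blocking
            # socket would sleep inside send() at exactly this point.
            return total


def stalled_send_demo():
    sender, receiver = tcp_pair()
    try:
        return fill_until_stalled(sender)  # the receiver never calls recv()
    finally:
        sender.close()
        receiver.close()


queued = stalled_send_demo()
print(f"send() would have blocked after {queued} bytes were queued")
```

The byte count depends on the kernel's buffer sizing, so it varies between systems. While the connection is in this state, netstat would normally show a non-zero Recv-Q on the receiving side and a non-zero Send-Q on the sending side.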

Related

TCP buffer doesn't receive more data

I have a program which sends TCP data, and another application is listening on the port. My program keeps sending data with send() after polling for buffer space with poll(). There are multiple packets to be sent one after another. After some point the TCP send buffer fills up and poll() times out. The data is stuck in the buffer and the listening application does not receive any packets, although the connection is established successfully.
I have tried to increase the buffer size too, but no luck.
Can someone please help me troubleshoot this issue?
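One way to narrow this down is to reproduce the symptom in isolation. The sketch below is in Python, assumes Linux loopback, and uses hypothetical helper names. It polls for POLLOUT before each send(), never reads on the peer until poll() times out, and then drains the peer to check that the sender becomes writable again. If the real listener behaves like the idle peer here, the fault is probably on the reading side rather than in the buffer size.

```python
import select
import socket


def tcp_pair():
    """Return (client, server): two connected TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def send_until_poll_times_out(sender, timeout_ms=200, chunk=b"x" * 65536):
    """poll() for POLLOUT before every send(); stop when poll() times out."""
    sender.setblocking(False)
    poller = select.poll()
    poller.register(sender, select.POLLOUT)
    total = 0
    while poller.poll(timeout_ms):       # an empty list means a timeout
        try:
            total += sender.send(chunk)
        except BlockingIOError:
            pass                         # buffer filled in between; poll again
    return total


def drain(receiver, nbytes):
    """Read nbytes from the receiver, as a working peer would."""
    receiver.settimeout(5)
    remaining = nbytes
    while remaining:
        data = receiver.recv(min(remaining, 65536))
        if not data:
            break
        remaining -= len(data)
    return nbytes - remaining


def poll_stall_demo():
    sender, receiver = tcp_pair()
    try:
        queued = send_until_poll_times_out(sender)  # peer is not reading
        drained = drain(receiver, queued)           # now the peer reads
        poller = select.poll()
        poller.register(sender, select.POLLOUT)
        writable_again = bool(poller.poll(2000))
        return queued, drained, writable_again
    finally:
        sender.close()
        receiver.close()


print(poll_stall_demo())
```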

Linux: socket close() vs shutdown()

So on Linux, shutdown() can take a parameter SHUT_RD, SHUT_WR or SHUT_RDWR to shut down only part of the communication channel. But in terms of the TCP messages sent to the peer, how does it work?
In the TCP state machine, closing works as a 4-way handshake:
(1) (2)
FIN---------->
<----------ACK
<----------FIN
ACK----------->
So what messages does it send when I do a shutdown(sock, SHUT_RD) or shutdown(sock, SHUT_WR)?
shutdown(sd, SHUT_WR) sends a FIN which the peer responds to with an ACK. Any further attempts to write to the socket will incur an error. However the peer can still continue to send data.
shutdown(sd, SHUT_RD) sends nothing on the network: it just conditions the local API to return end of stream for any subsequent reads on the socket. The behaviour when data arrives on a socket that has been shut down for reading is system-dependent: Unix will ACK it and throw it away; Linux will ACK it and buffer it, which will eventually stall the sender; Windows will issue an RST, which the sender sees as 'connection reset by peer'.
The FIN packets don't have to be symmetric. Each end sends FIN when its local writer has closed the socket.
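The SHUT_WR half-close described above can be observed from user space. This is a rough sketch in Python, assuming Linux loopback; tcp_pair and half_close_demo are hypothetical names. The client shuts down its write side, the server sees end of stream, and the server can still send a reply in the other direction.

```python
import socket


def tcp_pair():
    """Return (client, server): two connected TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def read_to_eof(sock):
    """Collect data until recv() returns b"", i.e. the peer's FIN was seen."""
    sock.settimeout(5)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def half_close_demo():
    client, server = tcp_pair()
    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)      # FIN: the client will write no more

    try:
        client.send(b"more")             # writing after SHUT_WR must fail
        write_failed = False
    except OSError:                      # EPIPE on Linux
        write_failed = True

    received = read_to_eof(server)       # the server sees data, then EOF

    server.sendall(b"reply")             # server-to-client is still open
    server.close()                       # the server's own FIN
    reply = read_to_eof(client)

    client.close()
    return received, reply, write_failed


print(half_close_demo())
```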
shutdown() on the write end sends a FIN, and close() closes the socket, so any attempt to send to the closed socket will result in an RST packet.
I ran into an interesting problem where the sender did the following:
Packet A sent
Packet B sent
shutdown for write (FIN)
shutdown for read
close socket
but the receiver received the packets out of order:
Packet A
Packet containing FIN
Packet B
ACK from the receiver; this caused a connection reset.

Epoll and remote 1-way shutdown

Assume a TCP socket on the local linux host is in a connected state with a remote host. The local host is using epoll_wait to be notified of events on the socket with the remote host.
If the remote host were to call:
shutdown(s,SHUT_WR);
on its connected socket to indicate it is done transmitting, what event(s) will epoll_wait return on the local host for its socket?
I'm assuming EPOLLIN would always get returned and a subsequent recv call would return 0 to indicate the remote side has finished transmitting.
What about EPOLLHUP or EPOLLRDHUP? (And what is the difference between these two events)?
Or even EPOLLERR ?
If the remote host calls "close" instead of "shutdown", does the answer to any of the above change?
I'm answering this myself after doing the heavy lifting to find the answer.
A socket listening for epoll events will typically receive an EPOLLRDHUP (in addition to EPOLLIN) event flag upon the remote peer calling close or shutdown(SHUT_WR). This does not necessarily mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket, and eventually "0" will be returned to indicate EOF. It may even be possible to send data back if the remote peer only did a half-close of its socket.
The one notable exception is if the remote peer has the SO_LINGER option enabled on its socket with a linger value of "0". Closing such a socket may cause a TCP RST to be sent instead of a FIN. From what I've read, a connection reset event will generate either an EPOLLHUP or an EPOLLERR. (I haven't had time to confirm, but it makes sense.)
There is some documentation to suggest that older Linux implementations don't support EPOLLRDHUP, so EPOLLHUP gets generated instead.
And for what it is worth, in my particular case, I found that it is not too interesting to have code that special-cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to the return codes from recv() and send().
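The half-close case can be checked with a small experiment. The sketch below uses Python's select.epoll on Linux loopback; the function names are hypothetical, and the numeric fallback for EPOLLRDHUP is only there in case a given Python build does not export the constant. The remote side sends some data and calls shutdown(SHUT_WR), and the local side collects the event flags that epoll reports.

```python
import select
import socket
import time

# Linux value of EPOLLRDHUP, used only if the select module lacks the name.
EPOLLRDHUP = getattr(select, "EPOLLRDHUP", 0x2000)


def tcp_pair():
    """Return (client, server): two connected TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def events_after_peer_half_close():
    local, remote = tcp_pair()
    local.settimeout(2)
    remote.settimeout(2)
    ep = select.epoll()
    ep.register(local.fileno(), select.EPOLLIN | EPOLLRDHUP)

    remote.sendall(b"last words")
    remote.shutdown(socket.SHUT_WR)      # half-close: remote sends its FIN

    # Accumulate reported flags until EPOLLRDHUP shows up (or we give up).
    seen = 0
    deadline = time.monotonic() + 2.0
    while not (seen & EPOLLRDHUP) and time.monotonic() < deadline:
        for _fd, mask in ep.poll(0.1):
            seen |= mask

    # recv() still returns the unread data first, then b"" for EOF.
    data = b""
    while True:
        chunk = local.recv(4096)
        if not chunk:
            break
        data += chunk

    local.sendall(b"ack")                # our write side is still usable
    echoed = remote.recv(4096)

    ep.close()
    local.close()
    remote.close()
    return seen, data, echoed


seen, data, echoed = events_after_peer_half_close()
print(bool(seen & select.EPOLLIN), bool(seen & EPOLLRDHUP),
      bool(seen & select.EPOLLHUP), data, echoed)
```

The reset case (SO_LINGER with a linger value of 0) is not covered by this sketch.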

Understanding BSD interface

I'm trying to understand how the events in a BSD socket interface translate to the state of a TCP connection. In particular, I'm trying to understand at what stage in the connection process accept() returns on the server side:
client sends SYN
server sends SYN+ACK
client sends ACK
In which one of these steps does accept() return?
accept returns when the connection is complete. The connection is complete after the client sends his ACK.
accept gives you a socket on which you can communicate. Of course you know, you can't communicate until the connection is established. And the connection can't be established before the handshake.
It wouldn't make sense to return before the client sends his ACK. It is entirely possible he won't say anything after the initial SYN.
The TCP/IP stack code in the kernel normally[1] completes the three-way handshake entirely without intervention from any user space code. The three steps you list all happen before accept() returns. Indeed, they may happen before accept() is even called!
When you tell the stack to listen() for connections on a particular TCP port, you pass a backlog parameter, which tells the kernel how many connections it can silently accept on behalf of your program at once. It is this queue that is used when the kernel automatically accepts new connection requests, and it is there that they are held until your program gets around to accept()ing them. When there are one or more connections in the listen backlog queue when you call accept(), all that happens is that the oldest is removed from the queue and bound to a new socket.[2]
In other words, if your program calls listen(sd, 5), then goes into an infinite do-nothing loop so that it never calls accept(), five concurrent client connection requests will succeed, from the clients' point of view. A sixth connection request will get stalled on the first SYN packet until either the program owning the TCP port calls accept() or one of the other clients drops its connection.
[1] Firewall and other stack modifications can change this behavior, of course. I am speaking here only of default BSD sockets stack behavior.
[2] If there are no connections waiting in the backlog when you call accept(), it blocks by default, unless the listener socket was set to non-blocking, in which case it returns -1 and errno is EWOULDBLOCK.
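The point that the handshake completes before accept() is even called can be checked directly. This is a minimal sketch in Python, assuming Linux loopback, with a hypothetical backlog_demo function. The client connects and sends data while the server has only called listen(); the server then calls accept() and finds both the connection and the data waiting.

```python
import select
import socket


def backlog_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a port
    listener.listen(5)
    addr = listener.getsockname()

    # accept() has not been called, yet connect() succeeds: the kernel
    # completes the three-way handshake on the program's behalf.
    client = socket.create_connection(addr, timeout=2)
    connected_before_accept = client.getpeername() == addr
    client.sendall(b"early")             # data can be sent before accept()

    # Wait until the listener reports a queued connection, then accept it.
    select.select([listener], [], [], 2)
    listener.setblocking(False)
    conn, _ = listener.accept()          # dequeues the oldest connection
    conn.settimeout(2)
    early = conn.recv(16)

    # With the queue empty, a non-blocking accept() fails with EWOULDBLOCK,
    # which Python reports as BlockingIOError.
    try:
        extra, _ = listener.accept()
        extra.close()
        empty_queue_would_block = False
    except BlockingIOError:
        empty_queue_would_block = True

    conn.close()
    client.close()
    listener.close()
    return connected_before_accept, early, empty_queue_would_block


print(backlog_demo())
```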

How the buffering work in socket on linux

How does buffering work with sockets on Linux?
i.e. if the server does not read the socket and the client keeps sending data.
So what will happen? How big is the socket's buffer? And will the client know so that it will stop sending?
With a UDP socket the client will never know: the server side will just start dropping packets once the receive buffer is full.
TCP, on the other hand, implements flow control. The server's kernel will gradually reduce the window, so the client will be able to send less and less data. At some point the window will go down to zero. At this point the client's send buffer fills up, and send(2) either blocks (on a blocking socket) or fails with EAGAIN/EWOULDBLOCK (on a non-blocking socket).
TCP sockets use buffering in the protocol stack. The stack itself implements flow control so that if the server's buffer is full, it will stop the client stack from sending more data. Your code will see this as a blocked call to send(). The buffer size can vary widely from a few kB to several MB.
I'm assuming that you're using send() and recv() for client and server communication.
So, send() will return the number of bytes that were accepted for sending. This is not necessarily equal to the number of bytes you wanted to send, so it's up to you to notice this and send the rest.
recv(), in turn, returns the number of bytes read into the buffer. So if recv() returns 0, then the server has probably closed the connection.
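The two rules above (resend whatever send() did not take, and treat a zero-length recv() as end of stream) might look like this in practice. This is a sketch in Python with hypothetical helper names, assuming Linux loopback; a reader thread stands in for the server so that a large payload cannot stall the sender.

```python
import socket
import threading


def tcp_pair():
    """Return (client, server): two connected TCP sockets on loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def send_all(sock, payload):
    """Call send() repeatedly until every byte has been accepted."""
    view = memoryview(payload)
    offset = 0
    while offset < len(view):
        # send() may take fewer bytes than offered; continue from there.
        offset += sock.send(view[offset:])
    return offset


def recv_until_eof(sock):
    """Read until recv() returns b"", meaning the peer closed its side."""
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def transfer(payload):
    client, server = tcp_pair()
    result = {}
    reader = threading.Thread(
        target=lambda: result.update(data=recv_until_eof(server)))
    reader.start()
    sent = send_all(client, payload)
    client.shutdown(socket.SHUT_WR)      # signal EOF to the reader
    reader.join(timeout=10)
    client.close()
    server.close()
    return sent, result.get("data")


payload = b"abc" * 350_000               # about 1 MB
sent, received = transfer(payload)
print(sent, received == payload)
```

Python's socket.sendall() already performs this loop internally; it is spelled out here only because the C send() call leaves it to the caller.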

Resources