How to gracefully exit from a blocking kernel_recvmsg()? - multithreading

I have the following running in a kernel thread:
size = kernel_recvmsg(kthread->sock, &msg, &iov, 1, bufsize, MSG_DONTWAIT);
I receive UDP packets very frequently (every 10 ms, say). In order to process them as quickly as possible, I don't have any sleep in the loop around kernel_recvmsg(). Unfortunately, I observe very high CPU consumption while no UDP packets are arriving.
If I make the socket blocking (remove MSG_DONTWAIT), is there some indirect way to unblock and exit from kernel_recvmsg()?
What would happen if I call sock_release() unexpectedly? Would kernel_recvmsg() unblock and return some error that I could handle accordingly (exit from the loop and thread)?

It's easy to unblock a UDP wait from another thread: sendto() it a datagram on the local stack (an empty one would be fine). You can use an 'abort' boolean flag to tell the post-kernel_recvmsg() code to return or perform some other unusual action.
That said, there are few comms stacks on mature OSes that do not unblock a socket wait if the socket is closed from another thread. I've heard on SO that this is unreliable, but I've never seen ANY case where an error return or exception was not immediately raised when the underlying socket was freed.


Using thread to write and select to read

Has anyone tried creating a socket in non-blocking mode, using a dedicated thread to write to the socket, but using the select system call to identify when data is available to read?
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
Is there a way of knowing the status of the write call without having to block on it?
Has anyone tried creating a socket in non-blocking mode, using a dedicated thread to write to the socket, but using the select system call to identify when data is available to read?
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
Is there a way of knowing the status of the write call without having to block on it?
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
Is there a way of knowing the status of the write call without having to block on it?
If the result of the write call is -1, then check errno for EAGAIN or EWOULDBLOCK. If it's one of those errors, it's benign and you can go back to waiting on a select call. Sample code below.
int result = write(sock, buffer, size);
if ((result == -1) && ((errno == EAGAIN) || (errno==EWOULDBLOCK)) )
{
// write failed because socket isn't ready to handle more data. Try again later (or wait for select)
}
else if (result == -1)
{
// fatal socket error
}
else
{
// result == number of bytes sent.
// TCP - May be less than the number of bytes passed in to write/send call.
// UDP - number of bytes sent (should be the entire thing)
}

What Error Shall We handle in Using Sockets

When using non-blocking sockets to communicate with clients, which error codes must we care about and handle by doing something other than calling close() directly? Can anyone list them and give a few comments about what extra work we must do?
Currently we only handle EAGAIN and EWOULDBLOCK; for all other errors we just close the socket. Is this kind of error handling enough for server software?
I think you've mostly got it right. There are two other transient errors that my code retries on rather than closing the socket, they are:
EINTR - this can be returned if the system call was interrupted by delivery of a signal. The usual appropriate response is to try the system call again. (See here for details on why this can occur)
ENOBUFS - Under certain circumstances, send() can return this if the network interface's output queue is full. (yes, you'd expect send() to return EWOULDBLOCK or EAGAIN instead, but I have seen this returned)

Linux TCP socket in blocking mode

When I create a TCP socket in blocking mode and use the send (or sendto) functions, when will the function call return?
Will it have to wait till the other side of the socket has received the data? In that case, if there is traffic jam on the internet, could it block for a long time?
Both the sender and the receiver (and possibly intermediaries) will buffer the data.
Sending data successfully is no guarantee that the receiving end has received it.
Normally, writes to a blocking socket won't block as long as there is space in the sending-side buffer.
Once the sender's buffer is full, the write WILL block until there is space for the entire write.
If the write is partially successful (the receiver closed the socket, shut it down or an error occurred), then the write might return fewer bytes than you had intended. A subsequent write should give an error or return 0 - such conditions are irreversible on TCP sockets.
Note that if a subsequent send() or write() gives an error, then some previously written data could be lost forever. I don't think there is a real way of knowing how much data actually arrived (or was acknowledged, anyway).

Handling short reads using epoll()

Let's say the client sent 100 bytes of data but somehow the server only received 90 bytes. How do I handle this case? If the server calls the read function inside a while loop checking the total received data, then the server will wait forever for the last 10 bytes.
Also, the client could get disconnected in the middle of the data transfer. In this case too, the server will wait forever, because the remaining data will never arrive.
I am using TCP, but in a real-world network environment this situation could happen. Thanks in advance...
You do not call the read() function in a loop until you receive the number of bytes you require. Instead, you set the socket to non-blocking and call the read() function in a loop until it returns 0 (indicating end of stream) or an error.
In the normal case the loop will terminate by read() returning -1, with errno set to EAGAIN. This indicates that the connection hasn't been closed, but no more data is available at the current time. At this point, if you do not have enough data from the client yet, you simply save the data that you do have for later, and return to the main epoll() loop.
If and when the remainder of the data arrives, the socket will be returned as readable by epoll(), you will read() the rest of the data, retrieve the saved data and process it all.
This means that you need space in your per-socket data structure to store the read-but-not-processed-yet data.
You must carefully check the return value of read. It can return any of three things:
A positive number, indicating some bytes were read.
Zero, indicating the other end has gracefully closed the connection.
-1, meaning an error occurred. (If the socket is non-blocking, then the error EAGAIN or EWOULDBLOCK means the connection is still open but no data is ready for you right now, so you need to wait until epoll says there is more data for you.)
If your code is not checking for each of these three things and handling them differently, then it is almost certainly broken.
These cover all of the cases you are asking about, like a client sending 90 bytes then closing or rudely breaking the connection (because read() will return 0 or -1 for those cases).
If you are worried that a client might send 90 bytes and then never send any more, and never close the connection, then you have to implement your own timeouts. For that your best bet is non-blocking sockets and putting a timeout on select() / poll() / epoll(), ditching the connection if it is idle for too long.
TCP connection is a bi-directional stream layered on top of packet-based network. It's a common occurrence to read only part of what the other side sent. You have to read in a loop, appending until you have a complete message. For that you need an application level protocol - types, structure, and semantics of messages - that you use on top of TCP (FTP, HTTP, SMTP, etc. are such protocols).
To answer the specific second part of the question - add EPOLLRDHUP to the set of epoll(7) events to get notified when connection drops.
In addition to what caf has said, I'd recommend subscribing to EPOLLRDHUP, because this is the only safe way to figure out whether a connection was closed (read() == 0 is not reliable since, as caf mentioned, it may also be true in case of an error). EPOLLERR is always subscribed to, even if you didn't specifically ask for it. The correct behaviour is to close the connection using close() in case of EPOLLRDHUP, and probably even when EPOLLERR is set.
For more information, I've given a similar answer here: epoll_wait() receives socket closed twice (read()/recv() returns 0)

Where does Linux kernel do process and TCP connections cleanup after process dies?

I am trying to find the place in the Linux kernel where it does cleanup after a process dies. Specifically, I want to see if/how it handles open TCP connections after the process is killed with a -9 signal. I am pretty sure it closes all connections, but I want to see the details, and whether there is any chance that connections are not closed properly.
Pointers to linux kernel sources are welcome.
The meat of process termination is handled by exit.c:do_exit(). This function calls exit_files(), which in turn calls put_files_struct(), which calls close_files().
close_files() loops over all file descriptors the process has open (which includes all sockets), calling filp_close() on each one, which calls fput() on the struct file object. When the last reference to the struct file has been put, fput() calls the file object's .release() method, which for sockets, is the sock_close() function in net/socket.c.
I'm pretty sure the socket cleanup is more of a side effect of releasing all the file descriptors after the process dies, and not directly done by the process cleanup.
I'm going to go out on a limb though, and assume you're hitting a common pitfall with network programming. If I am correct in guessing that your problem is that you get an "Address in use" error (EADDRINUSE) when trying to bind to an address after a process is killed, then you are running into the socket's TIME_WAIT.
If this is the case, you can either wait for the timeout, usually 60 seconds, or you can modify the socket to allow immediate reuse like so.
int sock, ret, on;
struct sockaddr_in servaddr;
sock = socket( AF_INET, SOCK_STREAM, 0 );
/* Enable address reuse */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) );
[EDIT]
From your comments, it sounds like you are having issues with half-open connections, and don't fully understand how TCP works. TCP has no way of knowing whether a client is dead or just idle. If you kill -9 a client process, the four-way closing handshake never completes. This shouldn't be leaving open connections on your server, though, so you may still need to get a network dump to be sure of what's going on.
I can't say for sure how you should handle this without knowing exactly what you are doing, but you can read about TCP Keepalive here. A couple other options are sending empty or null messages periodically to the client (may require modifying your protocol), or setting hard timers on idle connections (may result in dropped valid connections).
