recv with MSG_NONBLOCK and MSG_WAITALL - linux

I want to use recv syscall with nonblocking flags MSG_NONBLOCK. But with this flag syscall can return before full request is satisfied. So,
can I add MSG_WAITALL flag? Will it be nonblocking?
or how should I rewrite blocking recv into the loop with nonblocking recv

For IPv4 TCP receives on Linux at least, MSG_WAITALL is ignored if MSG_NONBLOCK is specified (or the file descriptor is set to non-blocking).
From tcp_recvmsg() in net/ipv4/tcp.c in the Linux kernel:
if (copied >= target && !sk->sk_backlog.tail)
break;
if (copied) {
if (sk->sk_err ||
sk->sk_state == TCP_CLOSE ||
(sk->sk_shutdown & RCV_SHUTDOWN) ||
!timeo ||
signal_pending(current))
break;
target in this cast is set to to the requested size if MSG_DONTWAIT is specified or some smaller value (at least 1) if not. The function will complete if:
Enough bytes have been copied
There's a socket error
The socket has been closed or shutdown
timeo is 0 (socket is set to non-blocking)
There's a signal pending for the process
To me this seems like it may be a bug in Linux, but either way it won't work the way you want. It looks like dec-vt100's solution will, but there is a race condition if you try to receive from the same socket in more than one process or thread.That is, another recv() call by another thread/process could occur after your thread has performed a peek, causing your thread to block on the second recv().

EDIT:
Plain recv() will return whatever is in the tcp buffer at the time of the call up to the requested number of bytes. MSG_DONTWAIT just avoids blocking if there is no data at all ready to be read on the socket. MSG_WAITALL requests blocking until the entire number of bytes requested can be read. So you won't get "all or none" behavior. At best you should get EAGAIN if no data is present and block until the full message is available otherwise.
You might be able to fashion something out of MSG_PEEK or ioctl() with a FIONREAD (if your system supports it) that effectively behaves like you want but I am unaware how you can accomplish your goal just using the recv() flags.

This is what I did for the same problem, but I'd like some confirmation that this works as expected...
ssize_t recv_allOrNothing(int socket_id, void *buffer, size_t buffer_len, bool block = false)
{
if(!block)
{
ssize_t bytes_received = recv(socket_id, buffer, buffer_len, MSG_DONTWAIT | MSG_PEEK);
if (bytes_received == -1)
return -1;
if ((size_t)bytes_received != buffer_len)
return 0;
}
return recv(socket_id, buffer, buffer_len, MSG_WAITALL);
}

Related

linux socket: lifetime of ancillary data for sendmsg

I use cmsg to activate timestamping on linux socket tx.
ssize_t sendWithOptions
(int sd, std::vector<uint8_t> &payload, uint32_t destIP, int flags)
{
msghdr msg { };
.... // filling standard
std::array<uint8_t, CMSG_LEN(sizeof(__u32))> buf;
msg.msg_control = buf.data();
msg.msg_controlen = buf.size();
auto cmsg { CMSG_FIRSTHDR ( &msg ) };
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SO_TIMESTAMPING;
cmsg->cmsg_len = buf.size();
*(reinterpret_cast<__u32>(CMSG_DATA (cmsg)) = static_cast<__u32>(flags);
return sendmsg ( sd, &msg, MSG_DONTWAIT );
}
Leaving the function, "buf" is automatically destroyed, but does sendmsg need this buffer to live longer?
Do I have a guarantee that the function does not need this buffer once it has returned the number of bytes sent.
Except for specific interfaces, it is generally the case that operating system calls do not rely on user-space to maintain data structures affecting their operation after they are finished. The exceptions will be spelled out in the manual pages.
With sendmsg, in particular, you can rely on the call to complete immediately - whether successful or not. It's fine therefore to use a dynamically allocated buffer as you're doing, and destroy it immediately after the call.
As an example of one exception, aio_write(2) is specifically intended to allow user-space to queue a write operation that will be completed asynchronously. For this call, the data is not consumed until it can be successfully written. Hence, you must not modify the data structures provided in the call until you have confirmed it is complete. That caveat is called out in the NOTES section of the manual page:
... The control block must not be changed while the write operation is in progress. The buffer area being written out must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid.
In summary: check the manual page for the system call. But most of the time, you don't need to worry about it.

Is write to SOCK_SEQPACKET atomic?

What I mean atomic is success or failed and do nothing.
I know socketpair(AF_LOCAL, SOCK_STREAM) is not atomic, if multiple processes/threads call write(fd, buf, len), the return value of the write() maybe > 0 && < len and cause data out of order.
If multiple processes/threads write(buf, len) to a sock_fd which created by socketpair(AF_LOCAL, SOCK_SEQPACKET), is it atomic?
I check the Linux manual and found something about pipe() which says if the len is less than PIPE_BUF, the write/writev is atomic.
I found nothing about socketpair. I wrote a test code and found it seems that the SOCK_SEQPACKET is atomic, I write random length buffer to fd and the return value is always -1 or len.
Yes.
Any interface that is datagram based (i.e. - the size you pass to write is visible to the person doing the read) must be atomic. There is no other way to guarantee that property.
So SOCK_SEQPACKET, as well as SOCK_DGRAM, must be atomic in order to function.
For that very same reason, SOCK_STREAM has no such atomicy guarantees.

Safely close an indefinitely running thread

So first off, I realize that if my code was in a loop I could use a do while loop to check a variable set when I want the thread to close, but in this case that is not possible (so it seems):
DWORD WINAPI recv thread (LPVOID random) {
recv(ClientSocket, recvbuffer, recvbuflen, 0);
return 1;
}
In the above, recv() is a blocking function.
(Please pardon me if the formatting isn't correct. It's the best I can do on my phone.)
How would I go about terminating this thread since it never closes but never loops?
Thanks,
~P
Amongst other solutions you can
a) set a timeout for the socket and handle timeouts correctly by checking the return values and/or errors in an appropriate loop:
setsockopt(ClientSocket,SOL_SOCKET,SO_RCVTIMEO,(char *)&timeout,sizeof(timeout))
b) close the socket with recv(..) returning from blocked state with error.
You can use poll before recv() to check if some thing there to receive.
struct pollfd poll;
int res;
poll.fd = ClientSocket;
poll.events = POLLIN;
res = poll(&poll, 1, 1000); // 1000 ms timeout
if (res == 0)
{
// timeout
}
else if (res == -1)
{
// error
}
else
{
// implies (poll.revents & POLLIN) != 0
recv(ClientSocket, recvbuffer, recvbuflen,0); // we can read ...
}
The way I handle this problem is to never block inside recv() -- preferably by setting the socket to non-blocking mode, but you may also be able to get away with simply only calling recv() when you know the socket currently has some bytes available to read.
That leads to the next question: if you don't block inside recv(), how do you prevent CPU-spinning? The answer to that question is to call select() (or poll()) with the correct arguments so that you'll block there until the socket has bytes ready to recv().
Then comes the third question: if your thread is now blocked (possibly forever) inside select(), aren't we back to the original problem again? Not quite, because now we can implement a variation of the self-pipe trick. In particular, because select() (or poll()) can 'watch' multiple sockets at the same time, we can tell the call to block until either of two sockets has data ready-to-read. Then, when we want to shut down the thread, all the main thread has to do is send a single byte of data to the second socket, and that will cause select() to return immediately. When the thread sees that it is this second socket that is ready-for-read, it should respond by exiting, so that the main thread's blocking call to WaitForSingleObject(theThreadHandle) will return, and then the main thread can clean up without any risk of race conditions.
The final question is: how to set up a socket-pair so that your main thread can call send() on one of the pair's sockets, and your recv-thread will see the sent data appear on the other socket? Under POSIX it's easy, there is a socketpair() function that does exactly that. Under Windows, socketpair() does not exist, but you can roll your own implementation of it as shown here.

How to handle the Linux socket revents POLLERR, POLLHUP and POLLNVAL?

I'm wondering what should be done when poll set these bits? Close socket, ignore it or what?
A POLLHUP means the socket is no longer connected. In TCP, this means FIN has been received and sent.
A POLLERR means the socket got an asynchronous error. In TCP, this typically means a RST has been received or sent. If the file descriptor is not a socket, POLLERR might mean the device does not support polling.
For both of the conditions above, the socket file descriptor is still open, and has not yet been closed (but shutdown() may have already been called). A close() on the file descriptor will release resources that are still being reserved on behalf of the socket. In theory, it should be possible to reuse the socket immediately (e.g., with another connect() call).
A POLLNVAL means the socket file descriptor is not open. It would be an error to close() it.
It depend on the exact error nature. Use getsockopt() to see the problem:
int error = 0;
socklen_t errlen = sizeof(error);
getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&error, &errlen);
Values: http://www.xinotes.net/notes/note/1793/
The easiest way is to assume that the socket is no longer usable in any case and close it.
POLLNVAL means that the file descriptor value is invalid. It usually indicates an error in your program, but you can rely on poll returning POLLNVAL if you've closed a file descriptor and you haven't opened any file since then that might have reused the descriptor.
POLLERR is similar to error events from select. It indicates that a read or write call would return an error condition (e.g. I/O error). This does not include out-of-band data which select signals via its errorfds mask but poll signals via POLLPRI.
POLLHUP basically means that what's at the other end of the connection has closed its end of the connection. POSIX describes it as
The device has been disconnected. This event and POLLOUT are mutually-exclusive; a stream can never be writable if a hangup has occurred.
This is clear enough for a terminal: the terminal has gone away (same event that generates a SIGHUP: the modem session has been terminated, the terminal emulator window has been closed, etc.). POLLHUP is never sent for a regular file. For pipes and sockets, it depends on the operating system. Linux sets POLLHUP when the program on the writing end of a pipe closes the pipe, and sets POLLIN|POLLHUP when the other end of a socket closed the socket, but POLLIN only for a socket shutdown. Recent *BSD set POLLIN|POLLUP when the writing end of a pipe closes the pipe, and the behavior for sockets is more variable.
Minimal FIFO example
Once you understand when those conditions happen, it should be easy to know what to do with them.
poll.c
#define _XOPEN_SOURCE 700
#include <fcntl.h> /* creat, O_CREAT */
#include <poll.h> /* poll */
#include <stdio.h> /* printf, puts, snprintf */
#include <stdlib.h> /* EXIT_FAILURE, EXIT_SUCCESS */
#include <unistd.h> /* read */
int main(void) {
char buf[1024];
int fd, n;
short revents;
struct pollfd pfd;
fd = open("poll0.tmp", O_RDONLY | O_NONBLOCK);
pfd.fd = fd;
pfd.events = POLLIN;
while (1) {
puts("loop");
poll(&pfd, 1, -1);
revents = pfd.revents;
if (revents & POLLIN) {
n = read(pfd.fd, buf, sizeof(buf));
printf("POLLIN n=%d buf=%.*s\n", n, n, buf);
}
if (revents & POLLHUP) {
printf("POLLHUP\n");
close(pfd.fd);
pfd.fd *= -1;
}
if (revents & POLLNVAL) {
printf("POLLNVAL\n");
}
if (revents & POLLERR) {
printf("POLLERR\n");
}
}
}
GitHub upstream.
Compile with:
gcc -o poll.out -std=c99 poll.c
Usage:
sudo mknod -m 666 poll0.tmp p
./poll.out
On another shell:
printf a >poll0.tmp
POLLHUP
If you don't modify the source: ./poll.out outputs:
loop
POLLIN n=1 buf=a
loop
POLLHUP
loop
So:
POLLIN happens when input becomes available
POLLHUP happens when the file is closed by the printf
close(pfd.fd); and pfd.fd *= -1; clean things up, and we stop receiving POLLHUP
poll hangs forever
This is the normal operation.
You could now repoen the FIFO to wait for the next open, or exit the loop if you are done.
POLLNAL
If you comment out pfd.fd *= -1;: ./poll.out prints:
POLLIN n=1 buf=a
loop
POLLHUP
loop
POLLNVAL
loop
POLLNVAL
...
and loops forever.
So:
POLLIN and POLLHUP and close happened as before
since we didn't set pfd.fd to a negative number, poll keeps trying to use the fd that we closed
this keeps returning POLLNVAL forever
So we see that this shouldn't have happened, and indicates a bug in your code.
POLLERR
I don't know how to generate a POLLERR with FIFOs. Let me know if there is way. But it should be possible with file_operations of a device driver.
Tested in Ubuntu 14.04.

recv with flags MSG_DONTWAIT | MSG_PEEK on TCP socket

I have a TCP stream connection used to exchange messages. This is inside Linux kernel. The consumer thread keeps processing incoming messages. After consuming one message, I want to check if there are more pending messages; in which case I would process them too. My code to achieve this looks like below. krecv is wrapper for sock_recvmsg(), passing value of flags without modification (krecv from ksocket kernel module)
With MSG_DONTWAIT, I am expecting it should not block, but apparently it blocks. With MSG_PEEK, if there is no data to be read, it should just return zero. Is this understanding correct ? Is there a better way to achieve what I need here ? I am guessing this should be a common requirement as message passing across nodes is used frequently.
int recvd = 0;
do {
recvd += krecv(*sockp, (uchar*)msg + recvd, sizeof(my_msg) - recvd, 0);
printk("recvd = %d / %lu\n", recvd, sizeof(my_msg));
} while(recvd < sizeof(my_msg));
BUG_ON(recvd != sizeof(my_msg));
/* For some reason, below line _blocks_ even with no blocking flags */
recvd = krecv(*sockp, (uchar*)tempbuf, sizeof(tempbuf), MSG_PEEK | MSG_DONTWAIT);
if (recvd) {
printk("more data waiting to be read");
more_to_process = true;
} else {
printk("NO more data waiting to be read");
}
You might check buffer's length first :
int bytesAv = 0;
ioctl(m_Socket,FIONREAD,&bytesAv); //m_Socket is the socket client's fd
If there are data in it , then recv with MSG_PEEK should not be blocked ,
If there are no data at all , then no need to MSG_PEEK ,
that might be what you like to do .
This is a very-very old question, but
1. problem persits
2. I faced with it.
At least for me (Ubuntu 19.04 with python 2.7) this MSG_DONTWAIT has no effect, however if I set the timeout to zero (with settimeout function), it works nicely.
This can be done in c with setsockopt function.

Resources