Is write to SOCK_SEQPACKET atomic? - linux

By atomic I mean that the write either succeeds completely or fails and does nothing.
I know that socketpair(AF_LOCAL, SOCK_STREAM) is not atomic: if multiple processes or threads call write(fd, buf, len), the return value of write() may be > 0 && < len, which leaves the data interleaved or out of order.
If multiple processes or threads call write(fd, buf, len) on a socket created by socketpair(AF_LOCAL, SOCK_SEQPACKET), is the write atomic?
I checked the Linux manual and found a statement about pipe(): if len is less than PIPE_BUF, write()/writev() is atomic.
I found nothing about socketpair. I wrote a test program and SOCK_SEQPACKET appears to be atomic: I write buffers of random length to the fd and the return value is always -1 or len.

Yes.
Any interface that is datagram based (i.e., the size you pass to write is visible to the reader) must be atomic. There is no other way to guarantee that property.
So SOCK_SEQPACKET, like SOCK_DGRAM, must be atomic in order to function.
By the same reasoning, SOCK_STREAM, which does not preserve message boundaries, has no such atomicity guarantee.
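The boundary-preserving behaviour described above can be observed with a small experiment. This is only a sketch, assuming Linux and an AF_UNIX socketpair; the helper name seqpacket_sizes is made up for illustration. It writes two records and checks that each read() returns exactly one of them.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: send two messages of len_a and len_b bytes over
 * an AF_UNIX SOCK_SEQPACKET pair and record what each read() returns.
 * Returns 0 on success, -1 on any failure. */
int seqpacket_sizes(size_t len_a, size_t len_b, ssize_t got[2])
{
    char out[256] = {0}, in[256];
    int sv[2];

    if (len_a > sizeof out || len_b > sizeof out)
        return -1;
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    /* Each write() is one record: it returns len or -1, never a partial count. */
    int ok = write(sv[0], out, len_a) == (ssize_t)len_a &&
             write(sv[0], out, len_b) == (ssize_t)len_b;

    /* Each read() returns exactly one record, never a merge of two. */
    got[0] = ok ? read(sv[1], in, sizeof in) : -1;
    got[1] = ok ? read(sv[1], in, sizeof in) : -1;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

With SOCK_STREAM in place of SOCK_SEQPACKET, the first read() would be free to return both writes merged into one chunk.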

Related

linux socket: lifetime of ancillary data for sendmsg

I use a cmsg to enable TX timestamping on a Linux socket.
ssize_t sendWithOptions
(int sd, std::vector<uint8_t> &payload, uint32_t destIP, int flags)
{
    msghdr msg { };
    .... // filling standard
    // CMSG_SPACE (not CMSG_LEN) sizes the buffer including padding;
    // alignas keeps the cmsghdr inside it correctly aligned.
    alignas(cmsghdr) std::array<uint8_t, CMSG_SPACE(sizeof(__u32))> buf { };
    msg.msg_control = buf.data();
    msg.msg_controllen = buf.size();
    auto cmsg { CMSG_FIRSTHDR ( &msg ) };
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SO_TIMESTAMPING;
    cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
    *(reinterpret_cast<__u32 *>(CMSG_DATA (cmsg))) = static_cast<__u32>(flags);
    return sendmsg ( sd, &msg, MSG_DONTWAIT );
}
On leaving the function, "buf" is automatically destroyed, but does sendmsg need this buffer to live longer?
Do I have a guarantee that sendmsg no longer needs this buffer once it has returned the number of bytes sent?
Except for specific interfaces, it is generally the case that operating system calls do not rely on user-space to maintain data structures affecting their operation after they are finished. The exceptions will be spelled out in the manual pages.
With sendmsg, in particular, you can rely on the call to complete immediately, whether successful or not: the kernel is done with the msghdr and its control buffer by the time the call returns. It is therefore fine to use an automatic (stack-allocated) buffer as you're doing and let it be destroyed right after the call.
As an example of one exception, aio_write(2) is specifically intended to allow user-space to queue a write operation that will be completed asynchronously. For this call, the data is not consumed until it can be successfully written. Hence, you must not modify the data structures provided in the call until you have confirmed it is complete. That caveat is called out in the NOTES section of the manual page:
... The control block must not be changed while the write operation is in progress. The buffer area being written out must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid.
In summary: check the manual page for the system call. But most of the time, you don't need to worry about it.
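To make the lifetime point concrete, here is a hedged sketch in which the control buffer lives only on the sending helper's stack. It swaps SO_TIMESTAMPING for SCM_RIGHTS (passing a file descriptor over an AF_UNIX socket), simply because the effect of that ancillary data is easy to observe on the receiving side; the helper names send_fd and recv_fd are invented for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer with correct alignment for one int-sized cmsg payload. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: the control buffer lives on this function's stack
 * and is gone as soon as we return. */
ssize_t send_fd(int sock, int fd_to_pass)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Receive the descriptor on the other end; returns -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver still obtains a working descriptor even though the sender's control buffer was destroyed the moment send_fd() returned.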

Use setsockopt() to modify kernel structures in TCP?

Is it possible to modify a single member of a kernel struct in TCP? I want to be able to use setsockopt() to update a member of the tcp_info struct in TCP.
I've tried the following:
struct tcp_info info;
unsigned int optlen = sizeof(struct tcp_info);
if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &info, &optlen) < 0)
    printf("Can't get data from getsockopt.\n");
info.retransmits += 10; // random member of tcp_info - as example
if (setsockopt(sock, IPPROTO_TCP, TCP_INFO, (char *) &info, optlen) < 0)
    printf("Can't set data with setsockopt.\n");
The call to setsockopt() fails (returns a negative value).
Even if the approach above had worked, it doesn't seem optimal. Is it possible to modify a single member's value in a struct without having to fetch and update the entire struct (all of its members)?
You may not set arbitrary values with setsockopt(). It has a finite list of options you may set.
I'll use the FreeBSD kernel in this example, but all of this is similar if not identical in Linux. I will jump to FreeBSD's sosetopt() function in sys/kern/uipc_socket.c.
The only valid options you may set are:
SO_ACCEPTFILTER, SO_LINGER, SO_DEBUG, SO_KEEPALIVE, SO_DONTROUTE, SO_USELOOPBACK, SO_BROADCAST, SO_REUSEADDR, SO_REUSEPORT, SO_REUSEPORT_LB, SO_OOBINLINE, SO_TIMESTAMP, SO_BINTIME, SO_NOSIGPIPE, SO_NO_DDP, SO_NO_OFFLOAD, SO_RERROR, SO_SETFIB, SO_USER_COOKIE, SO_SNDBUF, SO_RCVBUF, SO_SNDLOWAT, SO_RCVLOWAT, SO_SNDTIMEO, SO_RCVTIMEO, SO_LABEL, SO_TS_CLOCK, and SO_MAX_PACING_RATE.
That list contains a number of status flags that enable or disable features. Only a few options allow setting numerical values:
SO_USER_COOKIE - attaches a user-specified metadata value to a socket.
SO_SNDBUF/SO_RCVBUF - set the allocated buffer sizes for sending and receiving.
SO_SNDLOWAT/SO_RCVLOWAT - set a minimum amount of data to be sent/received per call.
SO_SNDTIMEO/SO_RCVTIMEO - set a timeout for sending/receiving calls.
SO_MAX_PACING_RATE - instructs the network adapter to limit the transfer rate.
None of these write values directly to kernel structures. To accomplish something of the sort you have requested, you will need to modify the kernel. Your other question addresses that objective.

recv with flags MSG_DONTWAIT | MSG_PEEK on TCP socket

I have a TCP stream connection used to exchange messages. This is inside the Linux kernel. The consumer thread keeps processing incoming messages. After consuming one message, I want to check whether more messages are pending, in which case I would process them too. My code to achieve this is shown below. krecv is a wrapper for sock_recvmsg() that passes the flags value through unmodified (krecv comes from the ksocket kernel module).
With MSG_DONTWAIT, I expect the call not to block, but apparently it blocks. With MSG_PEEK, if there is no data to be read, it should just return zero. Is this understanding correct? Is there a better way to achieve what I need here? I am guessing this is a common requirement, as message passing across nodes is used frequently.
int recvd = 0;
do {
    recvd += krecv(*sockp, (uchar*)msg + recvd, sizeof(my_msg) - recvd, 0);
    printk("recvd = %d / %lu\n", recvd, sizeof(my_msg));
} while(recvd < sizeof(my_msg));
BUG_ON(recvd != sizeof(my_msg));
/* For some reason, below line _blocks_ even with no blocking flags */
recvd = krecv(*sockp, (uchar*)tempbuf, sizeof(tempbuf), MSG_PEEK | MSG_DONTWAIT);
if (recvd) {
    printk("more data waiting to be read");
    more_to_process = true;
} else {
    printk("NO more data waiting to be read");
}
You might check how many bytes are queued first:
int bytesAv = 0;
ioctl(m_Socket, FIONREAD, &bytesAv); // m_Socket is the client socket's fd
If there is data queued, recv with MSG_PEEK should not block.
If there is no data at all, there is no need to MSG_PEEK.
That might be what you want to do.
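A user-space sketch of this check follows; the question itself runs inside the kernel, so treat it only as an illustration of the semantics. The helper names bytes_available and peek_nowait are hypothetical. Note that when nothing is queued, a non-blocking peek reports -1 with errno set to EAGAIN or EWOULDBLOCK rather than returning zero; zero means the peer closed the connection.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Number of bytes queued for reading on fd, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;
    return ioctl(fd, FIONREAD, &n) < 0 ? -1 : n;
}

/* Peek without blocking or consuming. Returns >0 if data is waiting,
 * 0 if the peer closed the connection, -1 with errno set otherwise
 * (EAGAIN/EWOULDBLOCK simply means "nothing queued yet"). */
ssize_t peek_nowait(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}
```

A caller that tests the result with a plain `if (recvd)` would treat the -1 case as "more data", so comparing against `> 0` is the safer check.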
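This is a very old question, but:
1. the problem persists
2. I ran into it myself.
At least for me (Ubuntu 19.04 with Python 2.7), MSG_DONTWAIT has no effect; however, if I set the timeout to zero (with the settimeout function), it works nicely.
In Python, settimeout(0) puts the socket into non-blocking mode. The C equivalent is setting O_NONBLOCK with fcntl(); note that a zero SO_RCVTIMEO set through setsockopt() means "no timeout" rather than "do not block".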
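For reference, Python's settimeout(0) is equivalent to making the socket non-blocking, so a C version would most likely use fcntl() with O_NONBLOCK. The following is a minimal sketch under that assumption; set_nonblocking and would_block are illustrative names, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Returns 1 if a recv() on the (non-blocking) fd has nothing to read
 * right now, 0 otherwise. MSG_PEEK leaves any data in place. */
int would_block(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

Once the descriptor is non-blocking, every recv() on it behaves as if MSG_DONTWAIT had been passed.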

Uninterruptible read/write calls

At some point during my C programming adventures on Linux, I encountered flags (possibly ioctl/fcntl?), that make reads and writes on a file descriptor uninterruptible.
Unfortunately I cannot recall how to do this, or where I read it. Can anyone shed some light?
Update0
To refine my query, I'm after the same blocking and guarantees that fwrite() and fread() provide, sans userspace buffering.
You can avoid EINTR from read() and write() by ensuring all your signal handlers are installed with the SA_RESTART flag of sigaction().
However this does not protect you from short reads / writes. This is only possible by putting the read() / write() into a loop (it does not require an additional buffer beyond the one that must already be supplied to the read() / write() call.)
Such a loop would look like:
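#include <errno.h>
#include <unistd.h>

/* If return value is less than `count', then errno == 0 indicates end of file,
 * otherwise errno indicates the error that occurred. */
ssize_t hard_read(int fd, void *buf, size_t count)
{
    ssize_t rv;
    size_t total_read = 0;
    while (total_read < count)
    {
        rv = read(fd, (char *)buf + total_read, count - total_read);
        if (rv == 0)
            errno = 0;      /* end of file: report a clean short read */
        if (rv < 1)
        {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            break;          /* end of file or a real error */
        }
        total_read += rv;
    }
    return total_read;      /* bytes actually read, not the last read() result */
}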
Do you wish to disable interrupts while reading/writing, or guarantee that nobody else will read/write the file while you are?
For the second, you can use fcntl() with F_SETLK or F_SETLKW to acquire and release record locks (F_SETLKW waits for a conflicting lock to clear) and F_GETLK to test for one. However, POSIX record locks are advisory: Linux does not enforce them, so they are only meaningful between cooperating processes.
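A minimal sketch of advisory whole-file locking with fcntl() follows. In it, F_SETLKW acquires a lock and waits if necessary, F_SETLK with F_UNLCK releases it, and F_GETLK tests for a conflicting lock. The wrapper names are invented for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* Apply a whole-file lock operation. cmd is F_SETLK or F_SETLKW;
 * type is F_RDLCK, F_WRLCK, or F_UNLCK. Returns 0 on success, -1 on error. */
static int whole_file_lock(int fd, int cmd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */
    return fcntl(fd, cmd, &fl);
}

/* Block until an exclusive lock is granted. */
int lock_exclusive_wait(int fd) { return whole_file_lock(fd, F_SETLKW, F_WRLCK); }

/* Try once; fails with EACCES or EAGAIN if another process holds the lock. */
int try_lock_exclusive(int fd)  { return whole_file_lock(fd, F_SETLK, F_WRLCK); }

/* Release the lock. */
int unlock_file(int fd)         { return whole_file_lock(fd, F_SETLK, F_UNLCK); }

/* Returns 1 if another process holds a conflicting lock, 0 if not,
 * -1 on error. Locks held by the calling process never conflict. */
int is_locked_by_other(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) < 0)
        return -1;
    return fl.l_type != F_UNLCK;
}
```

Because the locks are advisory, a process that never calls these functions can still read and write the file freely.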
The first task involves diving into ring zero and disabling interrupts on your local processor (or all, if you're on an SMP system). Remember to enable them again when you're done!

recv with MSG_NONBLOCK and MSG_WAITALL

I want to use the recv syscall with the nonblocking flag MSG_NONBLOCK. But with this flag the syscall can return before the full request is satisfied. So:
can I add the MSG_WAITALL flag? Will it still be nonblocking?
or how should I rewrite a blocking recv into a loop with nonblocking recv?
For IPv4 TCP receives on Linux at least, MSG_WAITALL is ignored if MSG_DONTWAIT (the flag the question calls MSG_NONBLOCK) is specified or the file descriptor is set to non-blocking.
From tcp_recvmsg() in net/ipv4/tcp.c in the Linux kernel:
if (copied >= target && !sk->sk_backlog.tail)
    break;
if (copied) {
    if (sk->sk_err ||
        sk->sk_state == TCP_CLOSE ||
        (sk->sk_shutdown & RCV_SHUTDOWN) ||
        !timeo ||
        signal_pending(current))
        break;
target in this case is set to the requested size if MSG_WAITALL is specified, or to some smaller value (at least 1) if not. The function will complete if:
Enough bytes have been copied
There's a socket error
The socket has been closed or shutdown
timeo is 0 (socket is set to non-blocking)
There's a signal pending for the process
To me this seems like it may be a bug in Linux, but either way it won't work the way you want. It looks like dec-vt100's solution will, but there is a race condition if you try to receive from the same socket in more than one process or thread. That is, another recv() call by another thread or process could occur after your thread has performed a peek, causing your thread to block on the second recv().
EDIT:
Plain recv() will return whatever is in the tcp buffer at the time of the call up to the requested number of bytes. MSG_DONTWAIT just avoids blocking if there is no data at all ready to be read on the socket. MSG_WAITALL requests blocking until the entire number of bytes requested can be read. So you won't get "all or none" behavior. At best you should get EAGAIN if no data is present and block until the full message is available otherwise.
You might be able to fashion something out of MSG_PEEK or ioctl() with a FIONREAD (if your system supports it) that effectively behaves like you want but I am unaware how you can accomplish your goal just using the recv() flags.
This is what I did for the same problem, but I'd like some confirmation that this works as expected...
ssize_t recv_allOrNothing(int socket_id, void *buffer, size_t buffer_len, bool block = false)
{
    if(!block)
    {
        ssize_t bytes_received = recv(socket_id, buffer, buffer_len, MSG_DONTWAIT | MSG_PEEK);
        if (bytes_received == -1)
            return -1;
        if ((size_t)bytes_received != buffer_len)
            return 0;
    }
    return recv(socket_id, buffer, buffer_len, MSG_WAITALL);
}
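For the second part of the question (rewriting a blocking recv as a loop over non-blocking calls), one possible shape is sketched below. It is an assumption-laden illustration, not a confirmed solution from this thread: recv_exact is a hypothetical helper that uses poll() to wait between chunks, and it is not "all or nothing", since bytes already received stay consumed if the loop stops early.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes using only non-blocking recv() calls, waiting in
 * poll() for at most timeout_ms between chunks. Returns the number of bytes
 * read; fewer than len means EOF, timeout, or error. */
ssize_t recv_exact(int fd, void *buf, size_t len, int timeout_ms)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, MSG_DONTWAIT);
        if (n > 0) {
            got += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                          /* peer closed the connection */
        if (errno == EINTR)
            continue;                       /* interrupted: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            break;                          /* real error */

        /* Nothing queued yet: wait until the socket becomes readable. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0)
            break;                          /* timed out */
        if (r < 0 && errno != EINTR)
            break;                          /* poll() failed */
    }
    return (ssize_t)got;
}
```

Because the caller keeps the partial count, it can resume later from `buf + got` instead of peeking, which sidesteps the peek-then-read race when only one thread reads from the socket.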
