Socket send concurrency guarantees - Linux

If I have a single socket shared between two processes (or two threads), and both of them try to send a big message (bigger than the underlying protocol buffer) that blocks, is it guaranteed that both messages will be sent sequentially? Or is it possible for the messages to be interleaved inside the kernel?
I am mainly interested in TCP over IP behavior, but it would be interesting to know if it varies according to the socket's protocol.

You're asking whether, if you write() message A and then message B on the same socket, A is guaranteed to arrive before B. For SOCK_STREAM (e.g. TCP) and SOCK_SEQPACKET (almost never used) sockets, the answer is an unqualified yes. For SOCK_DGRAM over the internet (i.e. UDP packets) the answer is no: packets can be reordered by the network. On a single host, a Unix domain datagram socket will (on all systems I know of) preserve ordering, but I don't believe that's guaranteed by any standard and I'm sure there are edge cases.
Or wait: maybe you're asking whether the messages written by the two processes can be mixed together? You're safe there: a single system call (write/writev/sendto/sendmsg) places its content into the file descriptor atomically. But obviously, if you or your library splits one message into multiple calls, you lose that guarantee.
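For illustration, one way to exploit that per-call atomicity is to gather a whole message into a single system call with writev(). This is a minimal sketch, not from the question; note also the caveat in the next answer about short writes on stream sockets, which can still apply:

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Gather header + payload into ONE system call, so another thread's
     * write cannot land between them. (On a stream socket a short write
     * is still possible; see the next answer.) */
    ssize_t send_framed(int fd, void *hdr, size_t hdr_len,
                        void *payload, size_t payload_len)
    {
        struct iovec iov[2] = {
            { .iov_base = hdr,     .iov_len = hdr_len     },
            { .iov_base = payload, .iov_len = payload_len },
        };
        return writev(fd, iov, 2);
    }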

For UDP, if two threads write to a socket handle simultaneously, both messages will be sent as separate datagrams. Each may undergo IP fragmentation if it is larger than the MTU, but the datagram boundaries will be preserved and each datagram will be reassembled correctly by the receiver. In other words, you are safe for UDP, except for the usual issues associated with UDP (datagram reordering, packet loss, etc.).
For TCP, which is stream-based, I don't know. Your question is essentially the equivalent of asking "if two threads write to the same file handle, will the file still be legible?" I actually don't know the answer.
The simplest thing you can do is use a mutex to guard the send/write calls on the socket, so that only one thread can write to the socket at a time.
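A minimal sketch of that approach, assuming POSIX threads (names are illustrative). The key point is to hold the lock across the whole message, including any retries after short writes, so two threads' messages cannot interleave:

    #include <pthread.h>
    #include <sys/socket.h>

    static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Serialize whole messages: the lock is held across every retry
     * after a short write, so messages from two threads cannot mix. */
    int locked_send(int fd, const char *buf, size_t len)
    {
        int rc = 0;
        pthread_mutex_lock(&send_lock);
        while (len > 0) {
            ssize_t n = send(fd, buf, len, 0);
            if (n < 0) { rc = -1; break; }
            buf += n;
            len -= (size_t)n;
        }
        pthread_mutex_unlock(&send_lock);
        return rc;
    }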
For TCP, I would suggest having a dedicated thread for handling all socket I/O. Then just invent a way for messages from the worker threads to be asynchronously queued to the socket thread for it to send. The socket thread could also handle recv() calls and notify the other threads when the socket connection is terminated by the remote side.

If you try to send a large message on a STREAM socket that exceeds the underlying buffer size, it's pretty much guaranteed that you will get a short write: the write or send call will write only part of the data (as much as will fit in the buffer) and then return the amount written, leaving you to do another write for the remaining data.
If you do this in multiple threads or processes, each write (or send) will write a portion of its message atomically into the send buffer, but the subsequent writes may happen in any order, with the result that the large buffers being sent will get interleaved.
If you send messages on DGRAM sockets, on the other hand, either the entire message will be sent atomically (as a single layer 4 packet, which might be fragmented and reassembled by lower layers of the protocol stack), or you will get an error (EMSGSIZE on Linux and other UNIX variants).
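A small sketch of that all-or-nothing behavior on a datagram socket (error handling beyond EMSGSIZE elided):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* On a SOCK_DGRAM socket a message goes out whole or not at all;
     * there are no partial writes to interleave. */
    void send_datagram(int fd, const void *msg, size_t len)
    {
        if (send(fd, msg, len, 0) < 0 && errno == EMSGSIZE)
            fprintf(stderr, "message too large to send atomically\n");
    }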

Related

Can two threads simultaneously `send` and `recv` on the same socket?

I need to repeatedly send and receive UDP datagrams to/from a socket. My idea was to spawn two threads, one responsible for sending and the other for receiving. The whole idea makes sense only if it is possible for one thread to wait on a blocking recv() while the other executes send() on the same socket at the same time.
I did some Googling and found this SO question: Are parallel calls to send/recv on the same socket valid? The accepted answer mentions that send() and recv() are thread-safe (whew…), but then proceeds with an alarming remark:
This doesn't necessarily mean that they'll be executed in parallel
Oops. Does this mean that if I implement my multithreaded idea, I will end up with the sending thread waiting for the receiving thread's recv() to return before it actually starts sending its data? Bad.
It is ambiguous whether this accepted answer refers only to two parallel send()'s, or whether the concern also applies to executing one send() and one recv() in parallel. Therefore:
Will a call to send() and a call to recv() on the same socket by two threads be executed in parallel, or will one of these calls block until the other returns?
Short answer: You should be ok to have separate threads for sending and receiving with the same socket handle.
A common scenario is a video conferencing application. You might want to have one thread recording from the microphone and sending audio out the udp port. Another thread receives packets on the same port and plays them back over the speakers.
If your protocol is more synchronous (i.e. a request/response flow, where in order to send you first have to receive something), then a single thread likely makes more sense from a design perspective.
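For the asynchronous case, a minimal sketch of the two-thread pattern on a single connected UDP socket, assuming POSIX threads (buffer sizes and timing are illustrative):

    #include <pthread.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* One thread blocks in recv() while the other calls send() on the
     * same socket; the two directions do not serialize each other. */
    static void *recv_loop(void *arg)
    {
        int sock = *(int *)arg;
        char buf[1500];
        while (recv(sock, buf, sizeof buf, 0) > 0) {
            /* hand the datagram to the playback/processing side */
        }
        return NULL;
    }

    static void *send_loop(void *arg)
    {
        int sock = *(int *)arg;
        char frame[160] = {0};   /* e.g. one 20 ms audio frame */
        while (send(sock, frame, sizeof frame, 0) >= 0)
            usleep(20000);
        return NULL;
    }

    /* usage: pthread_create(&t1, NULL, recv_loop, &sock);
     *        pthread_create(&t2, NULL, send_loop, &sock); */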

How do I eliminate EAGAIN errors on blocking send() calls on Linux

I am trying to write a test program that writes data across a TCP socket through Localhost on a Linux machine (CentOS 6.5 to be exact). I have one program writing, and one program reading. The reading is done via non-blocking recv() calls triggered from an epoll. There are enough cores on the machine to handle all CPU load without contention or scheduling issues.
The sends are buffers of smaller packets (about 100 bytes), aggregated up to 1400 bytes. Changing to aggregate larger (64K) buffers makes no apparent difference. When I do the sends, after tens of MB of data, I start getting EAGAIN errors on the sender. Verifying via fcntl, I am definitely configured as a blocking socket. I also noticed that calling ioctl(SIOCOUTQ) whenever I get EAGAIN yields larger and larger numbers.
The receiver is slower in processing the data read than the sender is in creating the data. Adding receiver threads is not an option. The fact that it is slower is OK assuming I can throttle the sender.
Now, my understanding of blocking sockets is that send() should block until the data goes out (this is based on past experience) - meaning the TCP stack should force the input side to be self-throttling.
Also, the Linux manual page for send() (and other sources on the web) indicates that EAGAIN is only returned for non-blocking sockets.
Can anyone give me some understanding of what is going on, and how I can get my send() to actually block until enough data goes out for me to put more in? I would rather not have to rework the logic to put usleep() or other things in the code. It is a lightly loaded system, and yield() is insufficient to allow it to drain.
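For reference, a sketch of the diagnostics the question describes: checking the O_NONBLOCK flag via fcntl() and reading the queued byte count via ioctl(SIOCOUTQ), which is Linux-specific:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

    /* Report whether the socket is blocking and how many bytes are
     * still sitting in the kernel send queue. */
    void report_send_state(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        int outq = 0;
        ioctl(fd, SIOCOUTQ, &outq);
        printf("nonblocking=%d, outq=%d bytes\n",
               (flags & O_NONBLOCK) != 0, outq);
    }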

Fairness of socket write() in 2 parallel connections?

Suppose I have a multi-threaded program in which each of the 2 threads:
has its own socket socket_fd in default (blocking) mode
repeatedly sends data using write(socket_fd, data, data_len) such that the network becomes a bottleneck
the size of the data passed to write (i.e. data_len) is always equal to the MSS; for simplicity, assume data_len = 500
I'm wondering about the fairness of the writes, assuming a single network interface card: if thread 2 calls write 9x less frequently, is there a weak guarantee that the data sent by thread 2 will be roughly 1/(1 + 9) of the total data sent within a reasonable time (i.e. that thread 2 will eventually send its data even though thread 1 keeps the underlying medium very busy by constantly sending an excessive amount of data)?
I am primarily interested in the case where thread 1 (which sends more data) uses TCP and thread 2 uses DCCP. Nevertheless, answers for the scenarios in which thread 2 uses UDP and TCP are also welcome.
It depends on the queuing discipline (qdisc) which schedules outgoing packets on the network interface. pfifo_fast, the default Linux qdisc, organizes outgoing packets into FIFO queues indexed by the ToS field. Outgoing packets with the same ToS are sent in the order the kernel receives them from applications.
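If thread 2's traffic needs better than plain FIFO treatment under pfifo_fast, one hedged option is to mark its socket with a different ToS; a sketch (whether this helps depends on the qdisc actually installed and its priomap):

    #include <netinet/in.h>
    #include <netinet/ip.h>    /* IPTOS_LOWDELAY */
    #include <sys/socket.h>

    /* pfifo_fast maps the ToS field to one of three FIFO bands; a
     * low-delay marking moves this socket's packets to a higher band. */
    int mark_low_delay(int fd)
    {
        int tos = IPTOS_LOWDELAY;
        return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
    }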

Should a socket be set NON-BLOCKING before it is polled by select()?

I recall that when we want to use select() on a socket descriptor, the socket should be set to NON-BLOCKING in advance.
But today I read a source file in which there seem to be no lines that set the socket to NON-BLOCKING.
Is my memory correct or not?
thanks!
duskwuff has the right idea when he says:
In general, you do not need to set a socket as non-blocking to use it in select().
This is true if your kernel is POSIX compliant with regard to select(). Unfortunately, some people use Linux, which is not, as the Linux select() man page says:
Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.
There was a discussion of this on LKML on or about Sat, 18 Jun 2011. One kernel hacker tried to justify the non-POSIX compliance. They honor POSIX when it's convenient and desecrate it when it's not.
He argued "there may be two readers and the second will block." But such an application flaw is a non sequitur. The kernel is not expected to prevent application flaws. The kernel has a clear duty: in all cases, the first read() after select() must return at least 1 byte, EOF, or an error, but NEVER block. As for write(), you should always test whether the socket is reported writable by select() before writing. This guarantees you can write at least one byte, or get an error, but NEVER block. Let select() help you; don't write blindly, hoping you won't block. The Linux hackers' grumbling about corner cases, etc., is a euphemism for "we're too lazy to work on hard problems."
Suppose you read a serial port set for:
min N; with -icanon, set N characters minimum for a completed read
time N; with -icanon, set read timeout of N tenths of a second
min 250 time 1
Here you want blocks of 250 characters, or a one-tenth-second timeout. When I tried this on Linux in non-blocking mode, the read returned for every single character, hammering the CPU. It was NECESSARY to leave it in blocking mode to get the documented behavior.
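For reference, the same settings applied from C via standard termios calls (a sketch; the descriptor stays in blocking mode):

    #include <termios.h>

    /* Equivalent of `stty -icanon min 250 time 1`: a blocking read()
     * returns after 250 bytes, or once 0.1 s passes between bytes. */
    int set_min_time(int fd)
    {
        struct termios tio;
        if (tcgetattr(fd, &tio) < 0)
            return -1;
        tio.c_lflag &= ~ICANON;   /* non-canonical: VMIN/VTIME apply */
        tio.c_cc[VMIN]  = 250;
        tio.c_cc[VTIME] = 1;      /* tenths of a second */
        return tcsetattr(fd, TCSANOW, &tio);
    }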
So there are good reasons to use blocking mode with select() and expect your kernel to be POSIX compliant.
But if you must use Linux, Jeremy's advice may help you cope with some of its kernel flaws.
It depends. Setting a socket as non-blocking does several things:
Makes read() / recv() return immediately with no data, instead of blocking, if there is nothing available to read on the socket.
If you are using select(), this is probably a non-issue. So long as you only read from a socket when select() tells you it is readable, you're fine.
Makes write() / send() return partial (or zero) writes, instead of blocking, if not enough space is available in kernel buffers.
This one is tricky. If your application is written to handle this situation, it's great, because it means your application will not block when a client is reading slowly. However, it means that your application will need to temporarily store writable data in its own application-level buffers, rather than writing directly to sockets, and selectively place sockets with pending writes in the writefds set. Depending on what your application is, this may either be a lifesaver or a huge added complication. Choose carefully.
If set before the socket is connected, makes connect() return immediately, before a connection is actually made.
Similarly, this is sometimes useful if your application needs to make connections to hosts that may respond slowly while continuing to respond on other sockets, but can cause issues if you aren't careful about how you handle these half-connected sockets. It's usually best avoided (by only setting sockets as non-blocking after they are connected, if at all).
In general, you do not need to set a socket as non-blocking to use it in select(). The system call already lets you handle sockets in a basic non-blocking fashion. Some applications will need non-blocking writes, though, and that's what the flag is still needed for.
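For completeness, the usual idiom for flipping that flag (a standard fcntl() sketch):

    #include <fcntl.h>

    /* Toggle O_NONBLOCK without disturbing the other status flags. */
    int set_nonblocking(int fd, int enable)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0)
            return -1;
        flags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
        return fcntl(fd, F_SETFL, flags);
    }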
send() and write() block if you provide more data than can fit into the socket send buffer. Normally in select() programming you don't want to block anywhere except in select(), so you use non-blocking mode.
With certain Windows APIs it is indeed essential to use non-blocking mode.
Usually when you are using select(), you are using it as the basis of an event loop; and when using an event loop you want the event loop to block only inside select() and never anywhere else. (The reason for that is so that your program will always wake up whenever there is something to do on any of the sockets it is handling -- if, for example, your program was blocked inside recv() for socket A, it would be unable to handle any data coming in on socket B until it got some data from socket A first to wake it up; and vice versa).
Therefore it is best to set all sockets non-blocking when using select(). That way there is no chance of your program getting blocked on a single socket and ignoring the other ones for an extended period of time.
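A skeleton of that structure (a sketch; real code would also track writability and per-socket state):

    #include <sys/select.h>
    #include <sys/socket.h>

    /* Block only in select(); the sockets are assumed non-blocking, so
     * no recv() here can stall the loop on a single connection. */
    void event_loop(const int socks[], int nsocks)
    {
        for (;;) {
            fd_set rfds;
            int maxfd = -1;
            FD_ZERO(&rfds);
            for (int i = 0; i < nsocks; i++) {
                FD_SET(socks[i], &rfds);
                if (socks[i] > maxfd)
                    maxfd = socks[i];
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                break;
            for (int i = 0; i < nsocks; i++) {
                if (FD_ISSET(socks[i], &rfds)) {
                    char buf[4096];
                    recv(socks[i], buf, sizeof buf, 0);
                    /* 0 => peer closed; <0 with EWOULDBLOCK => spurious */
                }
            }
        }
    }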

Server running in the Linux kernel. Should listen happen in a thread or not?

I am writing a client/server in the Linux kernel. (Yes, inside the kernel. It's a design decision, taken and finalised. It's not going to change.)
The server reads incoming packets from a raw socket. The transport protocol for these packets (on which the raw socket is listening) is custom and UDP-like. In short, I do not have to listen for incoming connections and then fork a thread to handle each connection.
I have to just process any IP datagram coming on that raw socket. I will keep reading for packets in an infinite loop on the raw socket. In the user-level equivalent program, I would have created a separate thread and kept listening for incoming packets.
Now, for the kernel-level server, I have doubts about whether I should run it in a separate thread or not, because:
I think read() is an I/O operation, so somewhere inside read() the kernel must call schedule() to relinquish control of the processor. Thus, after calling read() on the raw socket, the current kernel context will be put on hold (put on a sleep queue, maybe?) until packets are available. As packets arrive, the interrupt context will signal that the reading context, sleeping on the queue, is once again ready to run. I am using 'context' here on purpose instead of 'thread'. Thus I should not require a separate kernel thread.
On the other hand, if read() does not relinquish control, then the entire kernel will be blocked.
Can anyone provide tips about how should I design my server?
What is the fallacy of the argument presented in point 1?
I'm not sure whether you need a raw socket at all in the kernel. Inside the kernel you can add a netfilter hook, or register something else (???) which will receive all packets; this might be what you want.
If you DID use a raw socket inside the kernel, then you'd probably need a kernel thread (i.e. one started by kernel_thread) to call read() on it. But it need not be a kernel thread; it could be a userspace thread which just makes a special syscall or device call to reach the desired kernel-mode routine.
If you have a hook registered, the context it's called in is probably something which should not do too much processing; I don't know exactly what that is likely to be, it may be a "bottom half handler" or "tasklet", whatever they are (these types of control structures keep changing from one version to another). I hope it's not actually an interrupt service routine.
In answer to your original question:
Yes, sys_read will block the calling thread, whether it's a kernel thread or a userspace one. The system will not hang. However, if the calling thread is not in a state where blocking makes sense, the kernel will panic (scheduling in interrupt, or something similar).
Yes, you will need to do this in a separate thread; no, it won't hang the system. However, making system calls in kernel mode is very iffy, although it does work (sort of).
But if you installed some kind of hook instead, you wouldn't need to do any of that.
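For the hook route, a hedged sketch of a netfilter module (this assumes a kernel recent enough to have nf_register_net_hook(); the in-kernel API shifts between versions):

    #include <linux/module.h>
    #include <linux/skbuff.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <net/net_namespace.h>

    /* Called for every incoming IPv4 packet; keep this path short and
     * defer heavy processing to a workqueue or kernel thread. */
    static unsigned int my_hook(void *priv, struct sk_buff *skb,
                                const struct nf_hook_state *state)
    {
        /* inspect/queue skb here */
        return NF_ACCEPT;   /* let the stack continue with the packet */
    }

    static struct nf_hook_ops my_ops = {
        .hook     = my_hook,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_PRE_ROUTING,
        .priority = NF_IP_PRI_FIRST,
    };

    static int __init my_init(void)
    {
        return nf_register_net_hook(&init_net, &my_ops);
    }

    static void __exit my_exit(void)
    {
        nf_unregister_net_hook(&init_net, &my_ops);
    }

    module_init(my_init);
    module_exit(my_exit);
    MODULE_LICENSE("GPL");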
I think your best bet might be to emulate the way drivers are written: think of your server as a virtual device sitting on top of the ones the requests are coming from. For example, a mouse driver accepts continuous input but doesn't lock the system if programmed correctly, and a network adapter is probably more similar to your case.
