Can two threads simultaneously `send` and `recv` on the same socket? - linux

I need to repeatedly send and receive UDP datagrams to/from a socket. My idea was to spawn two threads, one responsible for sending and the other for receiving. The whole idea makes sense only if it is possible for one thread to wait on a blocking recv() while the other executes send() on the same socket at the same time.
I did some Googling and found this SO question: Are parallel calls to send/recv on the same socket valid? The accepted answer mentions that send() and recv() are thread-safe (whew…), but then proceeds with an alarming remark:
This doesn't necessarily mean that they'll be executed in parallel
Oops. Does this mean that if I implement my multithreaded idea, the sending thread will end up waiting for the receiving thread's recv() to return before it actually starts sending its data? Bad.
It is ambiguous whether this accepted answer refers only to two parallel send()'s, or whether the concern also applies to an attempt to execute one send() and one recv() in parallel. Therefore:
Will a call to send() and a call to recv() on the same socket by two threads be executed in parallel, or will one of these calls block until the other returns?

Short answer: You should be ok to have separate threads for sending and receiving with the same socket handle.
A common scenario is a video conferencing application. You might want to have one thread recording from the microphone and sending audio out over the UDP port. Another thread receives packets on the same port and plays them back over the speakers.
If your protocol is more synchronous (i.e. a request/response flow where, in order to send, you first have to receive something), then a single thread likely makes more sense from a design perspective.
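For illustration (this is my sketch, not code from the original answer), here is roughly what that layout looks like in C with POSIX threads, assuming a UDP socket bound to an example port; the peer address, port numbers, payload, and timing are all made up:

```c
/* Minimal sketch: one thread blocks in recvfrom() while another calls
 * sendto() on the same UDP socket.  Error handling is mostly omitted. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static void *recv_loop(void *arg)
{
    int sock = *(int *)arg;
    char buf[1500];

    for (;;) {
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
        if (n < 0)
            break;
        printf("got %zd bytes\n", n);   /* hand the datagram to the app here */
    }
    return NULL;
}

static void *send_loop(void *arg)
{
    int sock = *(int *)arg;
    struct sockaddr_in peer = {0};

    peer.sin_family = AF_INET;
    peer.sin_port = htons(5000);                      /* example port */
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* example peer */

    for (;;) {
        const char msg[] = "audio frame";             /* placeholder payload */
        sendto(sock, msg, sizeof(msg), 0,
               (struct sockaddr *)&peer, sizeof(peer));
        usleep(20000);                                /* e.g. one frame every 20 ms */
    }
    return NULL;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in local = {0};

    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(5000);
    bind(sock, (struct sockaddr *)&local, sizeof(local));

    pthread_t rx, tx;
    pthread_create(&rx, NULL, recv_loop, &sock);
    pthread_create(&tx, NULL, send_loop, &sock);
    pthread_join(rx, NULL);
    pthread_join(tx, NULL);
    close(sock);
    return 0;
}
```

Neither thread ever waits for the other: the recvfrom() in one thread and the sendto() in the other operate on the same descriptor independently.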

Related

Relative merits between one thread per client and queuing thread models for a threaded server?

Let's say we're building a threaded server intended to run on a system with four cores. The two thread management schemes I can think of are one thread per client connection and a queuing system.
As the first system's name implies, we'll spawn one thread per client that connects to our server. Assuming one thread is always dedicated to our program's main thread of execution, we'll be able to handle up to three clients concurrently and for any more simultaneous clients than that we'll have to rely on the operating system's preemptive multitasking functionality to switch among them (or the VM's in the case of green threads).
For our second approach, we'll make two thread-safe queues. One is for incoming messages and one is for outgoing messages. In other words, requests and replies. That means we'll probably have one thread accepting incoming connections and placing their requests into the incoming queue. One or two threads will handle the processing of the incoming requests, resolving the appropriate replies, and placing those replies on the outgoing queue. Finally, we'll have one thread just taking replies off of that queue and sending them back out to the clients.
What are the pros and cons of these approaches? Notice that I didn't mention what kind of server this is. I'm assuming that which one has a better performance profile depends on whether the server handles short connections, like web servers and POP3 servers, or longer connections, like WebSocket servers, game servers, and messaging app servers.
Are there other thread management strategies besides these two?
I believe I've done both organizations at one time or another.
Method 1
Just so we're on the same page, the first has the main thread do a listen. Then, in a loop, it does accept. It then passes the returned socket to pthread_create, and the client thread's loop does recv/send, processing all commands the remote client wants. When done, it cleans up and terminates.
For an example of this, see my recent answer: multi-threaded file transfer with socket
This has the virtues that the main thread and client threads are straightforward and independent. No thread waits on anything another thread is doing. No thread is waiting on anything that it doesn't have to. Thus, the client threads [plural] can all run at maximum line speed. Also, if a client thread is blocked on a recv or send, and another thread can go, it will. It is self-balancing.
All thread loops are simple: wait for input, process, send output, repeat. Even the main thread is simple: sock = accept, pthread_create(sock), repeat
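A rough sketch of those two loops (my illustration, not the code from the linked answer), with error handling and the real per-client protocol omitted; the port number is arbitrary:

```c
/* Method 1 sketch: the main thread accepts, one detached thread per client. */
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static void *client_thread(void *arg)
{
    int sock = (int)(intptr_t)arg;
    char buf[4096];
    ssize_t n;

    /* wait for input, process, send output, repeat */
    while ((n = recv(sock, buf, sizeof(buf), 0)) > 0) {
        /* ... process the command in buf ... */
        send(sock, buf, n, 0);           /* echo as a stand-in for a real reply */
    }

    close(sock);                         /* clean up and terminate */
    return NULL;
}

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(4000);         /* arbitrary example port */
    bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
    listen(lsock, 128);

    for (;;) {                           /* sock = accept, pthread_create(sock), repeat */
        int csock = accept(lsock, NULL, NULL);
        if (csock < 0)
            continue;

        pthread_t tid;
        pthread_create(&tid, NULL, client_thread, (void *)(intptr_t)csock);
        pthread_detach(tid);             /* client thread cleans up on its own */
    }
}
```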
Another thing. The interaction between the client thread and its remote client can be anything they agree on. Any protocol or any type of data transfer.
Method 2
This is somewhat akin to an N worker model, where N is fixed.
Because the accept is [usually] blocking, we'll need a main thread that is similar to method 1's. Except that, instead of firing up a new thread, it needs to malloc a control struct [or use some other mgmt scheme] and put the socket in that. It then puts this on a list of client connections and loops back to the accept.
In addition to the N worker threads, you are correct: we need at least two control threads, one to do select/poll, recv, and enqueue the request, and one to wait for a result, select/poll, and send.
Two threads are needed to prevent one of these threads from having to wait on two different things: the various sockets [as a group] and the request/result queues from the various worker threads. With a single control thread, all actions would have to be non-blocking and the thread would spin like crazy.
Here is an [extremely] simplified version of what the threads look like:
```
// control thread for recv:
while (1) {
    // (1) do blocking poll on all client connection sockets for read
    poll(...);

    // (2) for all pending sockets, do a recv for a request block and enqueue
    //     it on the request queue
    for (all in read_mask) {
        request_buf = dequeue(control_free_list);
        recv(request_buf);
        enqueue(request_list, request_buf);
    }
}

// control thread for send:
while (1) {
    // (1) do blocking wait on result queue
    // (2) peek at all result queue elements and create aggregate write mask
    //     for poll from the socket numbers
    // (3) do blocking poll on all client connection sockets for write
    poll(...);

    // (4) for all pending sockets that can be written to
    for (all in write_mask) {
        // find and dequeue the first result buffer from the result queue
        // that matches the given client
        result_buf = dequeue(result_list, client_id);
        send(result_buf);
        enqueue(control_free_list, result_buf);
    }
}

// worker thread:
while (1) {
    // (1) do blocking wait on request queue
    request_buf = dequeue(request_list);

    // (2) process request ...

    // (3) enqueue the result on the result queue
    enqueue(result_list, request_buf);
}
```
Now, a few things to notice. Only one request queue was used for all worker threads. The recv control thread did not try to pick an idle [or under utilized] worker thread and enqueue to a thread specific queue [this is another option to consider].
The single request queue is probably the most efficient. But, maybe, not all worker threads are created equal. Some may end up on CPU cores [or cluster nodes] that have special acceleration H/W, so some requests may have to be sent to specific threads.
And, if that is done, can a thread do "work stealing"? That is, a thread completes all its work and notices that another thread has a request in its queue [that is compatible] but hasn't been started. The thread dequeues the request and starts working on it.
Here's a big drawback to this method. The request/result blocks are of [mostly] fixed size. I've done an implementation where the control could have a field for a "side/extra" payload pointer that could be an arbitrary size.
But, if doing a large file transfer, either upload or download, trying to pass this piecemeal through request blocks is not a good idea.
In the download case, the worker thread could usurp the socket temporarily and send the file data before enqueuing the result to the control thread.
But, for the upload case, if the worker tried to do the upload in a tight loop, it would conflict with recv control thread. The worker would have to [somehow] alert the control thread to not include the socket in its poll mask.
This is beginning to get complex.
And, there is overhead to all this request/result block enqueue/dequeue.
Also, the two control threads are a "hot spot". The entire throughput of the system depends on them.
And, there are interactions between the sockets. In the simple case, the recv thread can start a recv on one socket, but other clients wishing to send requests are delayed until that recv completes. It is a bottleneck.
This means that all recv syscalls have to be non-blocking [asynchronous]. The control thread has to manage these async requests (i.e. initiate one and wait for an async completion notification, and only then enqueue the request on the request queue).
This is beginning to get complicated.
The main benefit of doing this is being able to handle a large number of simultaneous clients (e.g. 50,000) while keeping the number of threads at a sane value (e.g. 100).
Another advantage of this method is that it is possible to assign priorities and use multiple priority queues.
Comparison and hybrids
Meanwhile, method 1 does everything that method 2 does, but in a simpler, more robust [and, I suspect, higher throughput] way.
After a method 1 client thread is created, it might split the work up and create several sub-threads. It could then act like the control threads of method 2. In fact, it might draw on these threads from a fixed N pool just like method 2.
This would compensate for a weakness of method 1, where the client thread is going to do heavy computation. With a large number of threads all doing computation, the system would get swamped. The queuing approach helps alleviate this. The client thread is still created/active, but it's sleeping on the result queue.
So, we've just muddied up the waters a bit more.
Either method could be the "front facing" method and have elements of the other underneath.
A given client thread [method 1] or worker thread [method 2] could farm out its work by opening [yet] another connection to a "back office" compute cluster. The cluster could be managed with either method.
So, method 1 is simpler and easier to implement and can easily accommodate most job mixes. Method 2 might be better for heavy compute servers to throttle the requests to limited resources. But, care must be taken with method 2 to avoid bottlenecks.
I don't think your "second approach" is well thought out, so I'll just see if I can tell you how I find it most useful to think about these things.
Rule 1) Your throughput is maximized if all your cores are busy doing useful work. Try to keep your cores busy doing useful work.
These are things that can keep you from keeping your cores busy doing useful work:
you are keeping them busy creating threads. If tasks are short-lived, then use a thread pool so you aren't spending all your time starting up and killing threads.
you are keeping them busy switching contexts. Modern OSes are pretty good at multithreading, but if you've gotta switch jobs 10,000 times per second, that overhead is going to add up. If that's a problem for you, you'll have to consider an event-driven architecture or some other sort of more efficient explicit scheduling (see the epoll sketch after this list).
your jobs block or wait for a long time, and you don't have the resources to run enough threads to keep your cores busy. This can be a problem when you're serving protocols with persistent connections that hang around doing nothing most of the time, like WebSocket chat. You don't want to keep a whole thread hanging around doing nothing by tying it to a single client. You'll need to architect around that.
All your jobs need some other resource besides CPU, and you're bottlenecked on that -- that's a discussion for another day.
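For concreteness, here is a minimal sketch of the event-driven alternative mentioned in the context-switching point above, using Linux epoll; it is my illustration, not part of the original answer, and it omits making the descriptors non-blocking and all error handling:

```c
/* Minimal event-driven skeleton with epoll: one thread multiplexes many
 * sockets instead of dedicating a thread to each connection. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

void event_loop(int listen_sock)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_sock };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_sock, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, -1);      /* block until something is ready */

        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;

            if (fd == listen_sock) {                 /* new client connection */
                int c = accept(listen_sock, NULL, NULL);
                if (c >= 0) {
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = c };
                    epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
                }
            } else {
                char buf[4096];
                ssize_t r = recv(fd, buf, sizeof(buf), 0);
                if (r <= 0) {                        /* client went away */
                    epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                    close(fd);
                } else {
                    /* ... handle the request, queue a reply, etc. ... */
                }
            }
        }
    }
}
```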
All that said... for most request/response kinds of protocols, passing each request or connection off to a thread pool that assigns it a thread for the duration of the request is easy to implement and performant in most cases.
Rule 2) Given maximized throughput (all your cores are usefully busy), getting jobs done on a first-come, first-served basis minimizes latency and maximizes responsiveness.
This is true, but in most servers it is not considered at all. You can run into trouble here when your server is busy and jobs have to stop, even for short moments, to perform a lot of blocking operations.
The problem is that there is nothing to tell the OS thread scheduler which thread's job came in first. Every time your thread blocks and then becomes ready, it is scheduled on equal terms with all the other threads. If the server is busy, that means that the time it takes to process your request is roughly proportional to the number of times it blocks. That is generally no good.
If you have to block a lot in the process of processing a job, and you want to minimize the overall latency of each request, you'll have to do your own scheduling that keeps track of which jobs started first. In an event-driven architecture, for example, you can give priority to handling events for jobs that started earlier. In a pipelined architecture, you can give priority to later stages of the pipeline.
Remember these two rules, design your server to keep your cores busy with useful work, and do first things first. Then you can have a fast and responsive server.

Aborting a Read() call from another goroutine

I'm working on an IMAP server, and one of the operations is to upgrade the connection to use TLS (via the STARTTLS command). Our current architecture has one goroutine reading data from the socket, parsing the commands, and then sending logical commands over a channel. Another goroutine reads from that channel and executes the commands. This works great in general.
When executing STARTTLS, though, we need to stop the current in-progress Read() call, otherwise that Read() will consume bytes from the TLS handshake. We can insert another class in between, but then that class will be blocked on the Read() call and we have the same problem. If the network connection were a channel, we could add another signal channel and use a select{} block to stop reading, but network connections aren't channels (and simply wrapping it in a goroutine and channel just moves the problem to that goroutine).
Is there any way to stop a Read() call once it's begun, without waiting for a timeout to expire or something similar?
A Read() call relies on your operating system's behaviour under the hood, and that behaviour depends in turn on how the socket behaves.
If you're familiar with the socket interface (which is nearly standard across operating systems, with some small differences), you'll know that with a socket in synchronous (blocking) mode, the read system call blocks the thread's execution until data arrives or the timeout expires, and you can't change this behaviour.
Go exposes synchronous I/O to the programmer for all its needs because goroutines make explicit asynchronous code unnecessary by design.
There's also a way to break the read: shutting down the socket manually. That's not the best design decision for one's code in general, nor in your specific case, so I think you're better off playing with smaller timeouts, or redesigning your code to work some other way.
There's really not much you can do to stop a Read call, unless you SetReadDeadline, or Close the connection.
One thing you could do is buffer it with the bufio package. This will allow you to Peek without actually reading something off the buffer. Peek will block just like Read does, but will allow you to decide what to do when something is available to read.

Linux, cancel blocking read()

In a multi-threaded Linux program used for serial communication, is it possible (and what would be the best approach) to terminate a blocking read() call from another thread?
I would like to keep everything as reactive as possible and avoid any use of timeouts with repeated polling.
The background of this question is that I'm trying to create a Scala serial communication library for Linux using JNI. I'm trying to keep the native side as simple as possible providing, amongst others, a read() and close() function. On the Scala side, one thread would call read() and block until data from the serial port is available. However, the serial port can be closed by other means, resulting in a call to close(). Now, to free up the blocked thread, I would somehow need to cancel the system read call.
One fairly popular trick: instead of blocking in read(), block in select() on both your serial-socket and a pipe. Then when another thread wants to wake up your thread, it can do so by writing a byte to the other end of that pipe. That byte will cause select() to return and your thread can now cleanup and exit or whatever it needs to do. (Note that to make this work 100% reliably you'll probably want to set your serial-socket to be non-blocking, to ensure that your thread only blocks in select() and never in read())
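Here is a rough sketch of that trick (my illustration, not the answerer's code), assuming the serial port fd is already open and non-blocking, and that pipe(wake_pipe) was called once at startup:

```c
/* Self-pipe trick: block in select() on both the serial fd and a wakeup pipe.
 * Another thread writes one byte to the pipe to interrupt the wait. */
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

int wake_pipe[2];   /* wake_pipe[0] = read end, wake_pipe[1] = write end */

/* Returns false when the wakeup pipe fired (caller should clean up and exit),
 * true otherwise; *out holds the result of read() on the serial fd. */
bool wait_and_read(int serial_fd, char *buf, size_t len, ssize_t *out)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(serial_fd, &rfds);
    FD_SET(wake_pipe[0], &rfds);

    int maxfd = (serial_fd > wake_pipe[0] ? serial_fd : wake_pipe[0]) + 1;
    *out = 0;

    if (select(maxfd, &rfds, NULL, NULL, NULL) <= 0)
        return true;                       /* interrupted or error: caller just retries */

    if (FD_ISSET(wake_pipe[0], &rfds)) {
        char c;
        read(wake_pipe[0], &c, 1);         /* drain the wakeup byte */
        return false;
    }

    if (FD_ISSET(serial_fd, &rfds))
        *out = read(serial_fd, buf, len);  /* non-blocking fd: returns what's there */
    return true;
}

/* Called from another thread (e.g. from close()) to unblock the reader. */
void wake_reader(void)
{
    char c = 1;
    write(wake_pipe[1], &c, 1);
}
```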
AFAIK signals are the only way to break any thread out of a blocking system call.
Use a pthread_kill() aimed at the thread with a USR1 signal.
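A minimal sketch of that signal approach (my own illustration): the key detail is installing the SIGUSR1 handler without SA_RESTART, so the blocked read() returns -1 with errno set to EINTR instead of being silently restarted:

```c
/* Interrupt a blocking read() with pthread_kill(reader_thread, SIGUSR1). */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void usr1_handler(int sig)
{
    (void)sig;                   /* the handler only needs to exist; EINTR does the work */
}

void install_wakeup_handler(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;             /* deliberately NOT SA_RESTART */
    sigaction(SIGUSR1, &sa, NULL);
}

/* In the reading thread: */
ssize_t interruptible_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && errno == EINTR)
        return -2;               /* woken up: caller checks its shutdown flag */
    return n;
}

/* From another thread: */
void wake_thread(pthread_t reader)
{
    pthread_kill(reader, SIGUSR1);
}
```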
You could probably do fake data input:
char c = '!';            /* TIOCSTI injects one character of fake input per ioctl call */
ioctl(fd, TIOCSTI, &c);
Before calling it you should set some global flag, so that after read(...) returns you can check whether the received data is just wake-up goo or something more important.
Source: https://www.systutorials.com/docs/linux/man/4-tty_ioctl/

Socket send concurrency guarantees

If I have a single socket shared between two processes (or two threads), and in both of them I try to send a big message (bigger than the underlying protocol buffer) that blocks, is it guaranteed that both messages will be sent sequentially? Or is it possible for the messages to be interleaved inside the kernel?
I am mainly interested in TCP over IP behavior, but it would be interesting to know if it varies according to socket's protocol.
You're asking that if you write() message A, then B on the same socket, is A guaranteed to arrive before B? For SOCK_STREAM (e.g. TCP) and SOCK_SEQPACKET (almost never used) sockets, the answer is an unqualified yes. For SOCK_DGRAM over the internet (i.e. UDP packets) the answer is no: packets can be reordered by the network. On a single host, a unix domain datagram socket will (on all systems I know) preserve ordering, but I don't believe that's guaranteed by any standard and I'm sure there are edge cases.
Or wait: maybe you're asking if the messages written by the two processes won't be mixed? Yes: single system calls (write/writev/sendto/sendmsg) always place their content into a file descriptor atomically. But obviously if you or your library splits that write into multiple calls, you lose that guarantee.
For UDP, if two threads write to a socket handle simultaneously, both messages will be sent as separate datagrams. They might undergo IP fragmentation if a packet is larger than the MTU, but the resulting fragments will be reassembled correctly by the receiver, so each datagram's boundaries are preserved. In other words, you are safe for UDP, except for the normal issues associated with UDP (datagram reordering, packet loss, etc...).
For TCP, which is stream based, I don't know. Your question is essentially asking the equivalent of "if two threads try to write to the same file handle, will the file still be legible?" I actually don't know the answer.
The simplest thing you can do is just use a thread-safe lock (mutex) to guard send/write calls to the socket, so that only one thread can write to the socket at a time.
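As a rough sketch of that mutex-guarded approach (my illustration, assuming a connected stream socket), each writer takes the lock and loops until its whole message is on the stream, so messages from different threads cannot interleave:

```c
/* Serialize writers so each message goes onto the stream in one piece. */
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Send the whole message under the lock, looping over short writes. */
ssize_t send_message(int sock, const char *buf, size_t len)
{
    size_t off = 0;

    pthread_mutex_lock(&send_lock);
    while (off < len) {
        ssize_t n = send(sock, buf + off, len - off, 0);
        if (n < 0) {
            pthread_mutex_unlock(&send_lock);
            return -1;
        }
        off += (size_t)n;
    }
    pthread_mutex_unlock(&send_lock);
    return (ssize_t)off;
}
```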
For TCP, I would suggest having a dedicated thread for handling all socket I/O. Then just invent a way in which messages from the worker threads can get asynchronously queued to the socket thread for it to send on. The socket thread could also handle recv() calls and notify the other threads when the socket connection is terminated by the remote side.
If you try to send a large message on a STREAM socket that exceeds the underlying buffer size, it's pretty much guaranteed that you will get a short write -- the write or send call will write only part of the data (as much as will fit in the buffer) and then return the amount written, leaving you to do another write for the remaining data.
If you do this in multiple threads or processes, then each write (or send) will thus write a small portion of the message atomically into the send buffer, but the subsequent writes might happen in any order, with the result that the large buffers being sent will get interleaved.
If you send messages on DGRAM sockets, on the other hand, either the entire message will be sent atomically (as a single layer 4 packet, which might be fragmented and reassembled by lower layers of the protocol stack), or you will get an error (EMSGSIZE on Linux and other UNIX variants).

How to interrupt a thread performing a blocking socket connect?

I have some code that spawns a pthread that attempts to maintain a socket connection to a remote host. If the connection is ever lost, it attempts to reconnect using a blocking connect() call on its socket. Since the code runs in a separate thread, I don't really care about the fact that it uses the synchronous socket API.
That is, until it comes time for my application to exit. I would like to perform some semblance of an orderly shutdown, so I use thread synchronization primitives to wake up the thread and signal for it to exit, then perform a pthread_join() on the thread to wait for it to complete. This works great, unless the thread is in the middle of a connect() call when I command the shutdown. In that case, I have to wait for the connect to time out, which could be a long time. This makes the application appear to take a long time to shut down.
What I would like to do is to interrupt the call to connect() in some way. After the call returns, the thread will notice my exit signal and shut down cleanly. Since connect() is a system call, I thought that I might be able to intentionally interrupt it using a signal (thus making the call return EINTR), but I'm not sure if this is a robust method in a POSIX threads environment.
Does anyone have any recommendations on how to do this, either using signals or via some other method? As a note, the connect() call is down in some library code that I cannot modify, so changing to a non-blocking socket is not an option.
Try to close() the socket to interrupt the connect(). I'm not sure, but I think it will work at least on Linux. Of course, be careful to synchronize properly such that you only ever close() this socket once, or a second close() could theoretically close an unrelated file descriptor that was just opened.
EDIT: shutdown() might be more appropriate because it does not actually close the socket.
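A small sketch of that idea (my illustration; as the answer says, whether it actually unblocks connect() is platform-dependent), guarding the descriptor so it is shut down once and closed only by its owning thread:

```c
/* Shut down the socket from another thread, making sure the descriptor is
 * never closed twice, so a recycled fd cannot be closed by mistake. */
#include <pthread.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

struct conn {
    int fd;
    bool closed;
    pthread_mutex_t lock;
};

/* Called by the shutdown path: shutdown() may wake the blocked call but
 * leaves the descriptor valid, so the owning thread can close() it later. */
void conn_abort(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    if (!c->closed) {
        c->closed = true;
        shutdown(c->fd, SHUT_RDWR);
    }
    pthread_mutex_unlock(&c->lock);
}

/* Called only by the thread that owns the connection, once it has returned
 * from connect()/recv() and noticed the closed flag. */
void conn_close(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    close(c->fd);
    c->fd = -1;
    pthread_mutex_unlock(&c->lock);
}
```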
Alternatively, you might want to take a look at pthread_cancel() and pthread_kill(). However, I don't see a way to use these two without a race condition.
I advise that you abandon the multithreaded-server approach and instead go event-driven, for example by using epoll for event notification. This way you can avoid all these very basic problems that become very hard with threads, like proper shutdown. You are free to do anything you want at any time, e.g. safely close sockets and never hear from them again.
On the other hand, if in your worker thread you do a non-blocking connect() and get notified via epoll_pwait() (or ppoll() or pselect(); note the p), you may be able to avoid race conditions associated with signals.
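Here is a sketch of that combination (my illustration, not the answerer's code): a non-blocking connect() completed via ppoll(), so a signal can interrupt the wait without the classic check-then-block race:

```c
/* Non-blocking connect() completed via ppoll(); the atomic signal-mask swap
 * in ppoll() is what avoids the race between checking a flag and blocking. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/socket.h>

int connect_interruptible(int sock, const struct sockaddr *addr, socklen_t alen)
{
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    if (connect(sock, addr, alen) == 0)
        return 0;                          /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;

    struct pollfd pfd = { .fd = sock, .events = POLLOUT };
    sigset_t during_wait;                  /* signals allowed to interrupt the wait */
    sigemptyset(&during_wait);

    int r = ppoll(&pfd, 1, NULL, &during_wait);
    if (r < 0 && errno == EINTR)
        return -2;                         /* interrupted: caller checks its exit flag */

    int err = 0;
    socklen_t elen = sizeof(err);
    getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &elen);
    return err == 0 ? 0 : -1;
}
```

(The asker noted that the connect() lives in library code he cannot modify, so this only applies where the connecting code is under your control.)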
