epoll_wait on several Threads faster?

epoll_wait on several Threads faster? - multithreading

I am thinking of programming a tcp server based on epoll.
To achieve the best performance i want to implement multi core support too. But during my researches the following question came up:
Is it faster to call two epoll_wait()-Calls from two different threads, each observing their own file descriptors on a dual core? Or is this as fast as calling just one single epoll_wait() which observes all file descriptors?
Since the Kernel observes the file descriptors, i think it doens't matter how much threads i use on user space calling epoll_wait()?

You can even call epoll_wait concurrently on multiple threads for the same epoll_fd as long as you use edge-triggered (EPOLLET) mode (and be careful about synchronisation). Using that approach you can get real performance benefits on multi-core machines compared to a single-threaded epoll event loop.
I have actually done performance measurements some time ago, see my blog postings for the results:
http://cmeerw.org/blog/748.html#748
http://cmeerw.org/blog/746.html#746

I suspect you're correct that the performance of epoll per se is no different in the two scenarios you mention. What could make a difference, OTOH, is that by having an event loop in each thread you don't need to context switch to a separate worker thread to handle the connection. This is e.g. how nginx works, see e.g. http://www.aosabook.org/en/nginx.html .

Related

Sockets use threads instead of select()

I have a question about multi sockets.
I know that I have to use select() for multi sockets. select() waits for a fd ...
But why we need to use select() when we can create a thread for each socket and perform accept() on each one seperatly ? Is it even a bad idea ? Is it just about "too many sockets, too many threads so" or what ??

It's true, you can avoid multiplexing sockets by instead spawning one thread for each socket, and then using blocking I/O on each thread.
That saves you from having to deal with select() (or poll() or etc); but now you have to deal with multiple threads instead, which is often worse.
Whether threads will be more of a hassle to manage than socket-multiplexing in your particular program depends a lot on what your program is trying to do. For example, if the threads in your program don't need to communicate/co-operate with each other or share any resources, then a multithreaded design can work well (as would a multiprocess design). On the other hand, if your threads all need to access a shared data structure or other resource, or if they need to interact with each other, then you've got a bit of a programming challenge on your hands, which you'll need to solve 100% perfectly or you'll end up with a program that "seems to work most of the time" but then occasionally deadlocks, crashes, or gives incorrect results due to incorrect/insufficient synchronization. This phenomenon of "meta-stability" is much more common/severe amongst buggy multithreaded programs than in buggy single-threaded programs, since a multithreaded program's exact flow of execution will be different every time you run it (due to the asynchronous nature of the threads with respect to each other).
Stability and code-correctness issues aside, there are a couple of other problems particular to multithreading that you avoid by using a single-threaded design:
Most OS's don't scale well above a few dozen threads. So if you're thinking one-thread-per-client, and you want to support hundreds or thousands of simultaneous clients, you're in for some performance problems.
It's hard to control a thread that is blocked in a blocking-socket call. Say the user has pressed Command-Q (or whatever the appropriate equivalent is) so it's now time for your program to quit. If you have one or more threads blocked inside a blocking-socket call, there's no straightforward way to do that:
You can't just call unilaterally call exit(), because while the main thread is tearing down process-global resources, one or more threads might still be using them, leading to an occasional crash
You can't ask the threads to exit (via atomic-boolean or whatever) and then call join() to wait for them, because they are blocking inside I/O calls and thus might take minutes/hours/days before they respond
You can't send a signal to the threads and have them react in a signal-handler, because signals are per-process, and you can't control which thread will receive the signal.
You can't just unilaterally kill the threads, because they might be holding resources (like mutexes or file handles) that would then remain unreleased forever, potentially causing deadlocks or other problems
You can't close down the threads' sockets for them, and hope that this will cause the threads to error out and terminate, as this leads to race condition if the threads also try to close down those same resources.
So even in a multithreaded design, if you want a clean shutdown (or any other sort of local control of a network-thread) you usually end up having to use non-blocking I/O and/or socket multiplexing inside each thread anyway, so now you've got the worst of both worlds, complexity-wise.

Would handling each TCP connection in a separate thread improve latency?

I have an FTP server, implemented on top of QTcpServer and QTcpSocket.
I take advantage of the signals and slots mechanism to support multiple TCP connections simultaneously, even though I have a single thread. My code returns as soon as possible to the event loop, it doesn't block (no wait functions), and it doesn't use nested event loops anywhere. That way I already have cooperative multitasking, like Win3.1 applications had.
But a lot of other FTP servers are multithreaded. Now I'm wondering if using a separate thread for handling each TCP connection would improve performance, and especially latency.
On one hand, threads add to latency because you need to start a new thread for each new connection, but on the other, with my cooperative multitasking, other TCP connections have to wait until I've returned to the main loop before their readyRead()/bytesWritten() signals can be handled.

In your current system and ignoring file I/O time one processor is always doing something useful if there's something useful to be done, and waiting ready-to-go if there's nothing useful to be done. If this were a single processor (single core) system you would have maximized throughput. This is often a very good design -- particularly for an FTP server where you don't usually have a human waiting on a packet-by-packet basis.
You have also minimized average latency (for a single processor system.) What you do not have is consistent latency. Measuring your system's performance is likely to show a lot of jitter -- a lot of variation in the time it takes to handle a packet. Again because this is FTP and not real-time process control or human interaction, jitter may not be a problem.
Now, however consider that there is probably more than one processor available on your system and that it may be possible to overlap I/O time and processing time.
To take full advantage of a multi-processor(core) system you need some concurrency.
This normally translates to using multiple threads, but it may be possible to achieve concurrency via asynchronous (non-blocking) file reads and writes.
However, adding multiple threads to a program opens up a huge can-of-worms.
If you do decide to go the MT route, I'd suggest that you consider depending on a thread-aware I/O library. QT may provide that for you (I'm not sure.) If not, take a look at boost::asio (or ACE for an older, but still solid solution). You'll discover that using the MT capabilities of such a library involves a considerable investment in learning time; however as it turns out the time to add on multithreading "by-hand" and get it right is even worse.
So I'd say stay with your existing solution unless you are worried about unused Processor cycles and/or jitter in which case start learning QT's multithreading support or boost::asio.

Do you need to start a new thread for each new connection? Could you not just have a pool of threads that acts on requests as and when they arrive. This should reduce some of the latency. I have to say that in general a multi-threaded FTP server should be more responsive that a single-threaded one. Is it possible to have an event based FTP server?

Is non-blocking I/O really faster than multi-threaded blocking I/O? How?

I searched the web on some technical details about blocking I/O and non blocking I/O and I found several people stating that non-blocking I/O would be faster than blocking I/O. For example in this document.
If I use blocking I/O, then of course the thread that is currently blocked can't do anything else... Because it's blocked. But as soon as a thread starts being blocked, the OS can switch to another thread and not switch back until there is something to do for the blocked thread. So as long as there is another thread on the system that needs CPU and is not blocked, there should not be any more CPU idle time compared to an event based non-blocking approach, is there?
Besides reducing the time the CPU is idle I see one more option to increase the number of tasks a computer can perform in a given time frame: Reduce the overhead introduced by switching threads. But how can this be done? And is the overhead large enough to show measurable effects? Here is an idea on how I can picture it working:
To load the contents of a file, an application delegates this task to an event-based i/o framework, passing a callback function along with a filename
The event framework delegates to the operating system, which programs a DMA controller of the hard disk to write the file directly to memory
The event framework allows further code to run.
Upon completion of the disk-to-memory copy, the DMA controller causes an interrupt.
The operating system's interrupt handler notifies the event-based i/o framework about the file being completely loaded into memory. How does it do that? Using a signal??
The code that is currently run within the event i/o framework finishes.
The event-based i/o framework checks its queue and sees the operating system's message from step 5 and executes the callback it got in step 1.
Is that how it works? If it does not, how does it work? That means that the event system can work without ever having the need to explicitly touch the stack (such as a real scheduler that would need to backup the stack and copy the stack of another thread into memory while switching threads)? How much time does this actually save? Is there more to it?

The biggest advantage of nonblocking or asynchronous I/O is that your thread can continue its work in parallel. Of course you can achieve this also using an additional thread. As you stated for best overall (system) performance I guess it would be better to use asynchronous I/O and not multiple threads (so reducing thread switching).
Let's look at possible implementations of a network server program that shall handle 1000 clients connected in parallel:
One thread per connection (can be blocking I/O, but can also be non-blocking I/O).
Each thread requires memory resources (also kernel memory!), that is a disadvantage. And every additional thread means more work for the scheduler.
One thread for all connections.
This takes load from the system because we have fewer threads. But it also prevents you from using the full performance of your machine, because you might end up driving one processor to 100% and letting all other processors idle around.
A few threads where each thread handles some of the connections.
This takes load from the system because there are fewer threads. And it can use all available processors. On Windows this approach is supported by Thread Pool API.
Of course having more threads is not per se a problem. As you might have recognized I chose quite a high number of connections/threads. I doubt that you'll see any difference between the three possible implementations if we are talking about only a dozen threads (this is also what Raymond Chen suggests on the MSDN blog post Does Windows have a limit of 2000 threads per process?).
On Windows using unbuffered file I/O means that writes must be of a size which is a multiple of the page size. I have not tested it, but it sounds like this could also affect write performance positively for buffered synchronous and asynchronous writes.
The steps 1 to 7 you describe give a good idea of how it works. On Windows the operating system will inform you about completion of an asynchronous I/O (WriteFile with OVERLAPPED structure) using an event or a callback. Callback functions will only be called for example when your code calls WaitForMultipleObjectsEx with bAlertable set to true.
Some more reading on the web:
Multiple Threads in the User Interface on MSDN, also shortly handling the cost of creating threads
Section Threads and Thread Pools says "Although threads are relatively easy to create and use, the operating system allocates a significant amount of time and other resources to manage them."
CreateThread documentation on MSDN says "However, your application will have better performance if you create one thread per processor and build queues of requests for which the application maintains the context information.".
Old article Why Too Many Threads Hurts Performance, and What to do About It

I/O includes multiple kind of operations like reading and writing data from hard drives, accessing network resources, calling web services or retrieving data from databases. Depending on the platform and on the kind of operation, asynchronous I/O will usually take advantage of any hardware or low level system support for performing the operation. This means that it will be performed with as little impact as possible on the CPU.
At application level, asynchronous I/O prevents threads from having to wait for I/O operations to complete. As soon as an asynchronous I/O operation is started, it releases the thread on which it was launched and a callback is registered. When the operation completes, the callback is queued for execution on the first available thread.
If the I/O operation is executed synchronously, it keeps its running thread doing nothing until the operation completes. The runtime doesn't know when the I/O operation completes, so it will periodically provide some CPU time to the waiting thread, CPU time that could have otherwise be used by other threads that have actual CPU bound operations to perform.
So, as #user1629468 mentioned, asynchronous I/O does not provide better performance but rather better scalability. This is obvious when running in contexts that have a limited number of threads available, like it is the case with web applications. Web application usually use a thread pool from which they assign threads to each request. If requests are blocked on long running I/O operations there is the risk of depleting the web pool and making the web application freeze or slow to respond.
One thing I have noticed is that asynchronous I/O isn't the best option when dealing with very fast I/O operations. In that case the benefit of not keeping a thread busy while waiting for the I/O operation to complete is not very important and the fact that the operation is started on one thread and it is completed on another adds an overhead to the overall execution.
You can read a more detailed research I have recently made on the topic of asynchronous I/O vs. multithreading here.

To presume a speed improvement due to any form of multi-computing you must presume either that multiple CPU-based tasks are being executed concurrently upon multiple computing resources (generally processor cores) or else that not all of the tasks rely upon the concurrent usage of the same resource -- that is, some tasks may depend on one system subcomponent (disk storage, say) while some tasks depend on another (receiving communication from a peripheral device) and still others may require usage of processor cores.
The first scenario is often referred to as "parallel" programming. The second scenario is often referred to as "concurrent" or "asynchronous" programming, although "concurrent" is sometimes also used to refer to the case of merely allowing an operating system to interleave execution of multiple tasks, regardless of whether such execution must take place serially or if multiple resources can be used to achieve parallel execution. In this latter case, "concurrent" generally refers to the way that execution is written in the program, rather than from the perspective of the actual simultaneity of task execution.
It's very easy to speak about all of this with tacit assumptions. For example, some are quick to make a claim such as "Asynchronous I/O will be faster than multi-threaded I/O." This claim is dubious for several reasons. First, it could be the case that some given asynchronous I/O framework is implemented precisely with multi-threading, in which case they are one in the same and it doesn't make sense to say one concept "is faster than" the other.
Second, even in the case when there is a single-threaded implementation of an asynchronous framework (such as a single-threaded event loop) you must still make an assumption about what that loop is doing. For example, one silly thing you can do with a single-threaded event loop is request for it to asynchronously complete two different purely CPU-bound tasks. If you did this on a machine with only an idealized single processor core (ignoring modern hardware optimizations) then performing this task "asynchronously" wouldn't really perform any differently than performing it with two independently managed threads, or with just one lone process -- the difference might come down to thread context switching or operating system schedule optimizations, but if both tasks are going to the CPU it would be similar in either case.
It is useful to imagine a lot of the unusual or stupid corner cases you might run into.
"Asynchronous" does not have to be concurrent, for example just as above: you "asynchronously" execute two CPU-bound tasks on a machine with exactly one processor core.
Multi-threaded execution doesn't have to be concurrent: you spawn two threads on a machine with a single processor core, or ask two threads to acquire any other kind of scarce resource (imagine, say, a network database that can only establish one connection at a time). The threads' execution might be interleaved however the operating system scheduler sees fit, but their total runtime cannot be reduced (and will be increased from the thread context switching) on a single core (or more generally, if you spawn more threads than there are cores to run them, or have more threads asking for a resource than what the resource can sustain). This same thing goes for multi-processing as well.
So neither asynchronous I/O nor multi-threading have to offer any performance gain in terms of run time. They can even slow things down.
If you define a specific use case, however, like a specific program that both makes a network call to retrieve data from a network-connected resource like a remote database and also does some local CPU-bound computation, then you can start to reason about the performance differences between the two methods given a particular assumption about hardware.
The questions to ask: How many computational steps do I need to perform and how many independent systems of resources are there to perform them? Are there subsets of the computational steps that require usage of independent system subcomponents and can benefit from doing so concurrently? How many processor cores do I have and what is the overhead for using multiple processors or threads to complete tasks on separate cores?
If your tasks largely rely on independent subsystems, then an asynchronous solution might be good. If the number of threads needed to handle it would be large, such that context switching became non-trivial for the operating system, then a single-threaded asynchronous solution might be better.
Whenever the tasks are bound by the same resource (e.g. multiple needs to concurrently access the same network or local resource), then multi-threading will probably introduce unsatisfactory overhead, and while single-threaded asynchrony may introduce less overhead, in such a resource-limited situation it too cannot produce a speed-up. In such a case, the only option (if you want a speed-up) is to make multiple copies of that resource available (e.g. multiple processor cores if the scarce resource is CPU; a better database that supports more concurrent connections if the scarce resource is a connection-limited database, etc.).
Another way to put it is: allowing the operating system to interleave the usage of a single resource for two tasks cannot be faster than merely letting one task use the resource while the other waits, then letting the second task finish serially. Further, the scheduler cost of interleaving means in any real situation it actually creates a slowdown. It doesn't matter if the interleaved usage occurs of the CPU, a network resource, a memory resource, a peripheral device, or any other system resource.

The main reason to use AIO is for scalability. When viewed in the context of a few threads, the benefits are not obvious. But when the system scales to 1000s of threads, AIO will offer much better performance. The caveat is that AIO library should not introduce further bottlenecks.

One possible implementation of non-blocking I/O is exactly what you said, with a pool of background threads that do blocking I/O and notify the thread of the originator of the I/O via some callback mechanism. In fact, this is how the AIO module in glibc works. Here are some vague details about the implementation.
While this is a good solution that is quite portable (as long as you have threads), the OS is typically able to service non-blocking I/O more efficiently. This Wikipedia article lists possible implementations besides the thread pool.

I am currently in the process of implementing async io on an embedded platform using protothreads. Non blocking io makes the difference between running at 16000fps and 160fps. The biggest benefit of non blocking io is that you can structure your code to do other things while hardware does its thing. Even initialization of devices can be done in parallel.
Martin

In Node, multiple threads are being launched, but it's a layer down in the C++ run-time.
"So Yes NodeJS is single threaded, but this is a half truth, actually it is event-driven and single-threaded with background workers. The main event loop is single-threaded but most of the I/O works run on separate threads, because the I/O APIs in Node.js are asynchronous/non-blocking by design, in order to accommodate the event loop. "
https://codeburst.io/how-node-js-single-thread-mechanism-work-understanding-event-loop-in-nodejs-230f7440b0ea
"Node.js is non-blocking which means that all functions ( callbacks ) are delegated to the event loop and they are ( or can be ) executed by different threads. That is handled by Node.js run-time."
https://itnext.io/multi-threading-and-multi-process-in-node-js-ffa5bb5cde98 
The "Node is faster because it's non-blocking..." explanation is a bit of marketing and this is a great question. It's efficient and scaleable, but not exactly single threaded.

The improvement as far as I know is that Asynchronous I/O uses ( I'm talking about MS System, just to clarify ) the so called I/O completion ports. By using the Asynchronous call the framework leverage such architecture automatically, and this is supposed to be much more efficient that standard threading mechanism. As a personal experience I can say that you would sensibly feel your application more reactive if you prefer AsyncCalls instead of blocking threads.

Let me give you a counterexample that asynchronous I/O does not work.
I am writing a proxy similar to below-using boost::asio.
https://github.com/ArashPartow/proxy/blob/master/tcpproxy_server.cpp
However, the scenario of my case is, incoming (from clients side) messages are fast while outgoing (to server side) is slow for one session, to keep up with the incoming speed or to maximize the total proxy throughput, we have to use multiple sessions under one connection.
Thus this async I/O framework does not work anymore. We do need a thread pool to send to the server by assigning each thread a session.

Why should I use a thread vs. using a process?

Separating different parts of a program into different processes seems (to me) to make a more elegant program than just threading everything. In what scenario would it make sense to make things run on a thread vs. separating the program into different processes? When should I use a thread?
Edit
Anything on how (or if) they act differently with single-core and multi-core would also be helpful.

You'd prefer multiple threads over multiple processes for two reasons:
Inter-thread communication (sharing data etc.) is significantly simpler to program than inter-process communication.
Context switches between threads are faster than between processes. That is, it's quicker for the OS to stop one thread and start running another than do the same with two processes.
Example:
Applications with GUIs typically use one thread for the GUI and others for background computation. The spellchecker in MS Office, for example, is a separate thread from the one running the Office user interface. In such applications, using multiple processes instead would result in slower performance and code that's tough to write and maintain.

Well apart from advantages of using thread over process, like:
Advantages:
Much quicker to create a thread than
a process.
Much quicker to switch
between threads than to switch
between processes.
Threads share data
easily
Consider few disadvantages too:
No security between threads.
One thread can stomp on another thread's
data.
If one thread blocks, all
threads in task block.
As to the important part of your question "When should I use a thread?"
Well you should consider few facts that a threads should not alter the semantics of a program. They simply change the timing of operations. As a result, they are almost always used as an elegant solution to performance related problems. Here are some examples of situations where you might use threads:
Doing lengthy processing: When a windows application is calculating it cannot process any more messages. As a result, the display cannot be updated.
Doing background processing: Some
tasks may not be time critical, but
need to execute continuously.
Doing I/O work: I/O to disk or to
network can have unpredictable
delays. Threads allow you to ensure
that I/O latency does not delay
unrelated parts of your application.

I assume you already know you need a thread or a process, so I'd say the main reason to pick one over the other would be data sharing.
Use of a process means you also need Inter Process Communication (IPC) to get data in and out of the process. This is a good thing if the process is to be isolated though.

You sure don't sound like a newbie. It's an excellent observation that processes are, in many ways, more elegant. Threads are basically an optimization to avoid too many transitions or too much communication between memory spaces.
Superficially using threads may also seem like it makes your program easier to read and write, because you can share variables and memory between the threads freely. In practice, doing that requires very careful attention to avoid race conditions or deadlocks.
There are operating-system kernels (most notably L4) that try very hard to improve the efficiency of inter-process communication. For such systems one could probably make a convincing argument that threads are pointless.

I would like to answer this in a different way. "It depends on your application's working scenario and performance SLA" would be my answer.
For instance threads may be sharing the same address space and communication between threads may be faster and easier but it is also possible that under certain conditions threads deadlock and then what do you think would happen to your process.
Even if you are a programming whiz and have used all the fancy thread synchronization mechanisms to prevent deadlocks it certainly is not rocket science to see that unless a deterministic model is followed which may be the case with hard real time systems running on Real Time OSes where you have a certain degree of control over thread priorities and can expect the OS to respect these priorities it may not be the case with General Purpose OSes like Windows.
From a Design perspective too you might want to isolate your functionality into independent self contained modules where they may not really need to share the same address space or memory or even talk to each other. This is a case where processes will make sense.
Take the case of Google Chrome where multiple processes are spawned as opposed to most browsers which use a multi-threaded model.
Each tab in Chrome can be talking to a different server and rendering a different website. Imagine what would happen if one website stopped responding and if you had a thread stalled due to this, the entire browser would either slow down or come to a stop.
So Google decided to spawn multiple processes and that is why even if one tab freezes you can still continue using other tabs of your Chrome browser.
Read more about it here
and also look here

I agree to most of the answers above. But speaking from design perspective i would rather go for a thread when i want set of logically co-related operations to be carried out parallel. For example if you run a word processor there will be one thread running in foreground as an editor and other thread running in background auto saving the document at regular intervals so no one would design a process to do that auto saving task separately.

In addition to the other answers, maintaining and deploying a single process is a lot simpler than having a few executables.
One would use multiple processes/executables to provide a well-defined interface/decoupling so that one part or the other can be reused or reimplemented more easily than keeping all the functionality in one process.

Came across this post. Interesting discussion. but I felt one point is missing or indirectly pointed.
Creating a new process is costly because of all of the
data structures that must be allocated and initialized. The process is subdivided into different threads of control to achieve multithreading inside the process.
Using a thread or a process to achieve the target is based on your program usage requirements and resource utilization.

When is multi-threading not a good idea? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I was recently working on an application that sent and received messages over Ethernet and Serial. I was then tasked to add the monitoring of DIO discretes. I throught,
"No reason to interrupt the main
thread which is involved in message
processing, I'll just create
another thread that monitors DIO."
This decision, however, proved to be poor. Sometimes the main thread would be interrupted between a Send and a Receive serial message. This interruption would disrupt the timing and alas, messages would be lost (forever).
I found another way to monitor the DIO without using another thread and Ethernet and Serial communication were restored to their correct functionality.
The whole fiasco, however, got me thinking. Are their any general guidelines about when not to use multiple-threads and/or does anyone have anymore examples of situations when using multiple-threads is not a good idea?
**EDIT:Based on your comments and after scowering the internet for information, I have composed a blog post entitled When is multi-threading not a good idea?

On a single processor machine and a desktop application, you use multi threads so you don't freeze the app but for nothing else really.
On a single processor server and a web based app, no need for multi threading because the web server handles most of it.
On a multi-processor machine and desktop app, you are suggested to use multi threads and parallel programming. Make as many threads as there are processors.
On a multi-processor server and a web based app, no need again for multi threads because the web server handles it.
In total, if you use multiple threads for other than un-freezing desktop apps and any other generic answer, you will make the app slower if you have a single core machine due to the threads interrupting each other.
Why? Because of the hardware switches. It takes time for the hardware to switch between threads in total. On a multi-core box, go ahead and use 1 thread for each core and you will greatly see a ramp up.

To paraphrase an old quote: A programmer had a problem. He thought, "I know, I'll use threads." Now the programmer has two problems. (Often attributed to JWZ, but it seems to predate his use of it talking about regexes.)
A good rule of thumb is "Don't use threads, unless there's a very compelling reason to use threads." Multiple threads are asking for trouble. Try to find a good way to solve the problem without using multiple threads, and only fall back to using threads if avoiding it is as much trouble as the extra effort to use threads. Also, consider switching to multiple threads if you're running on a multi-core/multi-CPU machine, and performance testing of the single threaded version shows that you need the performance of the extra cores.

Multi-threading is a bad idea if:
Several threads access and update the same resource (set a variable, write to a file), and you don't understand thread safety.
Several threads interact with each other and you don't understand mutexes and similar thread-management tools.
Your program uses static variables (threads typically share them by default).
You haven't debugged concurrency issues.

Actually, multi threading is not scalable and is hard to debug, so it should not be used in any case if you can avoid it. There is few cases where it is mandatory : when performance on a multi CPU matters, or when you deal whith a server that have a lot of clients taking a long time to answer.
In any other cases, you can use alternatives such as queue + cron jobs or else.

You might want to take a look at the Dan Kegel's "The C10K problem" web page about handling multiple data sources/sinks.
Basically it is best to use minimal threads, which in sockets can be done in most OS's w/ some event system (or asynchronously in Windows using IOCP).
When you run into the case where the OS and/or libraries do not offer a way to perform communication in a non-blocking manner, it is best to use a thread-pool to handle them while reporting back to the same event loop.
Example diagram of layout:
Per CPU [*] EVENTLOOP ------ Handles nonblocking I/O using OS/library utilities
| \___ Threadpool for various blocking events
Threadpool for handling the I/O messages that would take long

Multithreading is bad except in the single case where it is good. This case is
The work is CPU Bound, or parts of it is CPU Bound
The work is parallelisable.
If either or both of these conditions are missing, multithreading is not going to be a winning strategy.
If the work is not CPU bound, then you are waiting not on threads to finish work, but rather for some external event, such as network activity, for the process to complete its work. Using threads, there is the additional cost of context switches between threads, The cost of synchronization (mutexes, etc), and the irregularity of thread preemption. The alternative in most common use is asynchronous IO, in which a single thread listens to several io ports, and acts on whichever happens to be ready now, one at a time. If by some chance these slow channels all happen to become ready at the same time, It might seem like you will experience a slow-down, but in practice this is rarely true. The cost of handling each port individually is often comparable or better than the cost of synchronizing state on multiple threads as each channel is emptied.
Many tasks may be compute bound, but still not practical to use a multithreaded approach because the process must synchronise on the entire state. Such a program cannot benefit from multithreading because no work can be performed concurrently. Fortunately, most programs that require enormous amounts of CPU can be parallelized to some level.

Multi-threading is not a good idea if you need to guarantee precise physical timing (like in your example). Other cons include intensive data exchange between threads. I would say multi-threading is good for really parallel tasks if you don't care much about their relative speed/priority/timing.

A recent application I wrote that had to use multithreading (although not unbounded number of threads) was one where I had to communicate in several directions over two protocols, plus monitoring a third resource for changes. Both protocol libraries required a thread to run the respective event loop in, and when those were accounted for, it was easy to create a third loop for the resource monitoring. In addition to the event loop requirements, the messages going through the wires had strict timing requirements, and one loop couldn't be risked blocking the other, something that was further alleviated by using a multicore CPU (SPARC).
There were further discussions on whether each message processing should be considered a job that was given to a thread from a thread pool, but in the end that was an extension that wasn't worth the work.
All-in-all, threads should if possible only be considered when you can partition the work into well defined jobs (or series of jobs) such that the semantics are relatively easy to document and implement, and you can put an upper bound on the number of threads you use and that need to interact. Systems where this is best applied are almost message passing systems.

In priciple everytime there is no overhead for the caller to wait in a queue.

A couple more possible reasons to use threads:
Your platform lacks asynchronous I/O operations, e.g. Windows ME (No completion ports or overlapped I/O, a pain when porting XP applications that use them.) Java 1.3 and earlier.
A third-party library function that can hang, e.g. if a remote server is down, and the library provides no way to cancel the operation and you can't modify it.
Keeping a GUI responsive during intensive processing doesn't always require additional threads. A single callback function is usually sufficient.
If none of the above apply and I still want parallelism for some reason, I prefer to launch an independent process if possible.

I would say multi-threading is generally used to:
Allow data processing in the background while a GUI remains responsive
Split very big data analysis onto multiple processing units so that you can get your results quicker.
When you're receiving data from some hardware and need something to continuously add it to a buffer while some other element decides what to do with it (write to disk, display on a GUI etc.).
So if you're not solving one of those issues, it's unlikely that adding threads will make your life easier. In fact it'll almost certainly make it harder because as others have mentioned; debugging mutithreaded applications is considerably more work than a single threaded solution.
Security might be a reason to avoid using multiple threads (over multiple processes). See Google chrome for an example of multi-process safety features.

Multi-threading is scalable, and will allow your UI to maintain its responsivness while doing very complicated things in the background. I don't understand where other responses are acquiring their information on multi-threading.
When you shouldn't multi-thread is a mis-leading question to your problem. Your problem is this: Why did multi-threading my application cause serial / ethernet communications to fail?
The answer to that question will depend on the implementation, which should be discussed in another question. I know for a fact that you can have both ethernet and serial communications happening in a multi-threaded application at the same time as numerous other tasks without causing any data loss.
The one reason to not use multi-threading is:
There is one task, and no user interface with which the task will interfere.
The reasons to use mutli-threading are:
Provides superior responsiveness to the user
Performs multiple tasks at the same time to decrease overall execution time
Uses more of the current multi-core CPUs, and multi-multi-cores of the future.
There are three basic methods of multi-threaded programming that make thread safety implemented with ease - you only need to use one for success:
Thread Safe Data types passed between threads.
Thread Safe Methods in the threaded object to modify data passed between.
PostMessage capabilities to communicate between threads.

Are the processes parallel? Is performance a real concern? Are there multiple 'threads' of execution like on a web server? I don't think there is a finite answer.

A common source of threading issues is the usual approaches employed to synchronize data. Having threads share state and then implement locking at all the appropriate places is a major source of complexity for both design and debugging. Getting the locking right to balance stability, performance, and scalability is always a hard problem to solve. Even the most experienced experts get it wrong frequently. Alternative techniques to deal with threading can alleviate much of this complexity. The Clojure programming language implements several interesting techniques for dealing with concurrency.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string