Does asynchronous IO reduce the number of threads compared to synchronous IO?

I heard from someone that he chose asynchronous IO over synchronous IO in the design of his system, because it needs fewer threads and threads are expensive.
But isn't the advantage of asynchronous IO that it releases the CPU from waiting for the IO to finish, so it can do something else and be interrupted once the IO operation completes? Does that change the number of threads or processes? I don't know.
Thanks.

The CPU is released in any case. It does not make sense to keep a CPU core idle while an IO is in progress. Another thread can be scheduled.
Async IO really is about having fewer threads. An IO is just a kernel data structure. An IO does not need to be "on a thread". Async IO decouples threads from IOs.
If you have 1M connected sockets and are reading on all of them you really don't want 1M threads.
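For illustration, here is a minimal sketch (assuming Linux and epoll, with error handling omitted) of how a single thread can watch any number of sockets without owning a thread per connection:

    /* Minimal sketch: one thread watching many sockets with Linux epoll.
       Assumes the sockets in fds[] are already connected and non-blocking. */
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    void serve(int *fds, int nfds) {
        int ep = epoll_create1(0);
        for (int i = 0; i < nfds; i++) {
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
            epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev);
        }
        struct epoll_event ready[64];
        char buf[4096];
        for (;;) {
            /* One blocking call covers every registered socket. */
            int n = epoll_wait(ep, ready, 64, -1);
            for (int i = 0; i < n; i++) {
                ssize_t got = read(ready[i].data.fd, buf, sizeof buf);
                if (got > 0)
                    printf("fd %d: %zd bytes\n", ready[i].data.fd, got);
            }
        }
    }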

I suggest you read this: https://stackoverflow.com/a/14797071/2241168 . It's a practical example illustrating the difference between the async and sync models.

The root of the confusion is that you actually have three options:
In a single thread, start I/O and wait for it to finish ("blocking" or "synchronous" I/O)
In a single thread, start I/O and do something else while it is running ("asynchronous" I/O)
In multiple threads, start I/O and "wait" for it to finish ("blocking" or "synchronous" I/O, but with multiple threads). In this case only one thread waits while the others go on doing useful work.
So if you use async I/O, you are avoiding both alternatives -- blocking the CPU or using multiple threads.
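To make option 2 concrete, here is a minimal single-threaded sketch (assuming a POSIX system; handle() and do_other_work() are hypothetical placeholders) that starts I/O and keeps working whenever no data is ready:

    /* Sketch of option 2 in a single thread: make the fd non-blocking and
       keep doing other work whenever no data has arrived yet.
       handle() and do_other_work() are hypothetical placeholders. */
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    static void handle(const char *buf, ssize_t n) { (void)buf; (void)n; }
    static void do_other_work(void) { /* anything useful */ }

    void pump(int fd) {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        char buf[4096];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0)
                handle(buf, n);                  /* data arrived */
            else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                do_other_work();                 /* nothing yet: stay busy */
            else
                break;                           /* EOF or a real error */
        }
    }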

Related

nodejs spawns threads implicitly by delegating the I/O to the kernel. How is this different from a server that makes a thread per request?

Following up on ideas from these two previous questions I had:
When a goroutine blocks on I/O how does the scheduler identify that it has stopped blocking?
When doing asynchronous I/O, how does the kernel determine if an I/O operation is completed?
I've been looking into nodejs recently. It's advertised as "single threaded", which is partially true since all your JS does run on one thread, but from what I've read, in the background, node achieves this by delegating the I/O tasks to the kernel so that it doesn't get stuck having to wait for the response.
What I'm having difficulty understanding is how this is any different than the paradigms where you explicitly are creating a thread per request.
Could someone explain the differences in depth?
This would be true if node created one thread for each I/O request. But, of course, it doesn't do that. It has an I/O engine that understands the best way to do I/O on each platform.
What nodejs hides from you is not some naive implementation where a scheduling entity waits for each request to complete, but a sophisticated implementation that understands the optimal way to do I/O on every platform on which it is implemented.
Updates:
If both approaches need the kernel for I/O aren't they both creating a kernel thread per request?
No. There are lots of ways to use the kernel for I/O that don't require a kernel thread per request. They differ from platform to platform. Windows has IOCP. Linux has epoll. And so on.
If nodejs somehow uses a fixed number of threads and queues the I/O operations, isn't that slower than a thread per request?
No, it's typically much faster for a variety of reasons that depend on the specifics of each platform. Here are a few advantages:
You can avoid "thundering herds" when lots of I/O completes at once. Instead, you can wake just the number of threads that can usefully run at the same time.
You can avoid needing lots of context switches to get all the different threads to execute. Instead, each thread can handle completion after completion.
You don't have to put each thread on a wait queue for each I/O operation. Instead, you can use a single wait queue for the group of threads.
Just to give you an idea of how significant it can be, consider the difference between using a thread per I/O and using epoll on Linux. If you use a thread per I/O, that means each I/O operation requires a thread to place itself on a wait queue, that thread to block, that thread to be unblocked, a context switch to occur to that thread, and that thread to remove itself from the wait queue.
By contrast, with epoll, a single thread can service any number of I/O completions without having to be rescheduled or added to or removed from a wait queue for each I/O. Similarly, a thread can issue a number of I/O requests without being descheduled. This difference is massive.
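For contrast, this is roughly what the thread-per-connection pattern looks like (a sketch, assuming POSIX threads; handle() is a hypothetical handler). Every one of these threads sleeps inside read(), which is exactly the wait-queue traffic described above:

    /* For contrast: thread-per-connection (POSIX threads, compile with
       -pthread). Each thread sleeps inside read() while its IO is pending.
       handle() is a hypothetical handler. */
    #include <pthread.h>
    #include <unistd.h>

    static void handle(char *buf, ssize_t n) { (void)buf; (void)n; }

    static void *per_conn(void *arg) {
        int fd = (int)(long)arg;
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* blocks on a wait queue */
            handle(buf, n);
        close(fd);
        return 0;
    }

    void spawn_for(int fd) {         /* one thread per accepted connection */
        pthread_t t;
        pthread_create(&t, 0, per_conn, (void *)(long)fd);
        pthread_detach(t);
    }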

Does a thread waiting on IO also block a core?

In the synchronous/blocking model of computation, we usually say that a thread of execution is blocked while it waits for an IO task to complete.
My question is simply: will this usually cause the CPU core executing the thread to be idle, or will a thread waiting on IO usually be context-switched out and put into a waiting state until the IO is ready to be processed?
A CPU core is normally not dedicated to one particular thread of execution. The kernel is constantly switching processes being executed in and out of the CPU. The process currently being executed by the CPU is in the "running" state. The list of processes waiting for their turn are in a "ready" state. The kernel switches these in and out very quickly. Modern CPU features (multiple cores, simultaneous multithreading, etc.) try to increase the number of threads of execution that can be physically executed at once.
If a process is I/O blocked, the kernel will just set it aside (put it in the "waiting" state) and not even consider giving it time in the CPU. When the I/O has finished, the kernel moves the blocked process from the "waiting" state to the "ready" state so it can have its turn ("running") in the CPU.
So your blocked thread of execution blocks only that: the thread of execution. The CPU and the CPU cores continue to have other threads of execution switched in and out of them, and are not idle.
For most programming languages, used in standard ways, the answer is that it will block your thread, but not your CPU.
You would need to explicitly reserve a CPU core for a particular thread (affinity) for one thread to block an entire core. To be more explicit, see this question:
You could call the SetProcessAffinityMask on every process but yours with a mask that excludes just the core that will "belong" to your process, and use it on your process to set it to run just on this core (or, even better, SetThreadAffinityMask just on the thread that does the time-critical task).
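A minimal sketch of that idea (Windows-only, assuming you only need to pin the calling thread; error handling trimmed):

    /* Windows-only sketch: pin the current thread to logical processor 0. */
    #include <windows.h>

    int main(void) {
        /* Mask bit 0 selects logical processor 0; the call returns the
           previous mask, or 0 on failure. */
        if (SetThreadAffinityMask(GetCurrentThread(), 1) == 0)
            return 1;
        /* time-critical work would run here, alone on that core, provided
           other processes were excluded from it as described above */
        return 0;
    }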
If we assume it's not async, then I would say the thread would definitely be put on the waiting queue, and its state would be "waiting".
Context-switching wise, IMO, it may need a bit more explanation, since the term context switch can mean/involve many things (swapping in/out, page table updates, register updates, etc.). Depending on the current state of execution, a second thread that belongs to the same process might be scheduled to run while the thread blocked on the IO operation is still waiting.
In that case, the context switch would most likely be limited to changing register values on that core (though the owning process might even get swapped out if there is not much memory left).
No. In Java, a blocked thread does not participate in scheduling.

more details about an asynchronous IO call

Whichever way I think about asynchronicity, I still come up with some sort of concurrency.
This guy here says that asynchronicity can have two flavors:
simulated asynchronicity (let me call it that), where a thread is spawned for the async execution of some operations. To me this is fake asynchronicity and it's similar to concurrency. I don't see any real benefits here.
hardware-supported asynchronicity, where the request is just forwarded to the hardware (like the hard disk or the network card) and control of execution is immediately returned to the CPU. When the IO operation is ready, the CPU is notified and a callback is executed.
This seems fine if you think about one single IO request, but if I try to extend the example to multiple IO requests then I still arrive at concurrency, only now the concurrency has been forwarded to the hardware.
Here's a diagram for two async IO calls:
CPU ----- io async req 1 -----> Hardware
CPU <------ returns the control (no data) ------- Hardware
CPU ----- io async req 2 ------> Hardware
CPU <------ returns the control (no data) ------- Hardware
CPU executes other operations while the Hardware executes two IO tasks
CPU <------- data for req 1 ------- Hardware
CPU executes the callback
CPU executes other operations
CPU <-------- data for req 2 ------- Hardware
CPU executes the callback
As you can see, at line 5 the hardware handles two tasks simultaneously, so the concurrency has been transferred to the hardware. So, as I said, whichever way I think about asynchronicity I still come up with some sort of concurrency; of course, this time it is not the CPU that handles it but the IO hardware.
Am I wrong ?
Does the IO hardware accept concurrency?
If yes, is the concurrency offered by the IO-hardware much better than that of the CPU?
If not, then the hardware is executing synchronously multiple IO operations, in which case, I don't see the benefits of asynchronicity vs. concurrency.
Thanks in advance for your help.
Async IO mainly is about not having to have a thread exist for the duration of the IO. Imagine a server waiting on 1000000 TCP connections for data to arrive. With one thread per connection that is a lot of memory burned.
Instead, a threadless async IO is issued, and it's just a small data structure. It's a registration with the OS that says "if data arrives, call me back".
How IOs map to hardware operations varies. Some hardware might have concurrency built-in. My SSD certainly has because it has multiple independent flash chips on it. Other hardware might not be able to process multiple IOs concurrently. Older magnetic disks did not do that. Simple NICs have no concurrency. Here, the driver or OS will serialize requests.
But that has nothing to do with how you initiate the IO. It's the same for thread-based and threadless IO. The driver and the hardware can't tell the difference usually (or don't care).
Async IO is about having fewer threads. It's not about driving the hardware differently at all.
It doesn't seem like you understand asynchronous I/O at all. Here's a typical example of how asynchronous I/O might work:
A thread is running. It wants to receive some data over the network. It does an asynchronous network read operation. The call into the operating system reports that no data is ready yet but arranges to notify when some data is ready. The thread keeps running until data arrives at the network card. The network card generates an interrupt, the interrupt handler dispatches to code that notices that there's a pending asynchronous read, and it queues an event signalling that the read has completed. Later, the thread is finished with all the work it has to do at that time, so it checks for events. It sees that the read completed, gets the data, processes it, and does another asynchronous read.
The thread may have dozens of asynchronous I/O operations pending at any particular time.
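One concrete way to get that start-now-collect-later shape is POSIX AIO. This is just an illustration of the pattern, not how any particular platform's event loop is implemented internally (on Linux, link with -lrt):

    /* Start-now-collect-later with POSIX AIO: aio_read() returns at once,
       the thread keeps working, and the result is collected later. */
    #include <aio.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        static char buf[4096];
        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = 0;               /* stdin, just for demonstration */
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;
        if (aio_read(&cb) != 0)          /* returns immediately */
            return 1;
        while (aio_error(&cb) == EINPROGRESS) {
            /* the thread is free to do other work here; a real program
               would use a completion notification instead of polling */
        }
        ssize_t n = aio_return(&cb);     /* completed: collect the result */
        printf("read %zd bytes\n", n);
        return 0;
    }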

IO Completion port Linux equivalent

On Windows, there is a threading model where a thread pool is associated with an IO completion port: a thread is released each time an async IO completes, and the released thread is then used to deal with the IO completion.
While Linux's select can be used for async IO, it doesn't seem to support the IO completion / thread pool logic.
Is there any Linux equivalent to the IO completion / thread pool threading model above?
I don't know of anything that does that directly, but you can combine a select() loop with a thread pool of your own to get similar behavior. When select() returns and you examine your fd_sets to see which file descriptors are ready, push each of those descriptors to a thread pool for processing. You'll still need a main thread to run the select() loop, independent of the thread pool for processing the I/O events.
Most of the complexity in this approach will be in how you keep track of which descriptors need to be included in the select() call at each iteration, and in the thread pool itself (since POSIX provides threads but not a standard thread pool API).
You might be interested in using a library like GLib, which provides a modular main loop that helps with polling on a varying set of file descriptors, as well as a thread pool implementation.
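A rough sketch of that structure (assuming POSIX; the bookkeeping the answer warns about, such as taking a descriptor out of the watched set while a worker owns it, is deliberately omitted here):

    /* Sketch: main thread runs select(), workers drain a tiny fd queue.
       Compile with -pthread. Re-arming logic (removing a ready fd from the
       watched set until its worker finishes) is omitted on purpose. */
    #include <pthread.h>
    #include <sys/select.h>
    #include <unistd.h>

    #define POOL 4
    static int queue[128];
    static unsigned qh, qt;
    static pthread_mutex_t qm = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qc = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        char buf[4096];
        for (;;) {
            pthread_mutex_lock(&qm);
            while (qh == qt)
                pthread_cond_wait(&qc, &qm);
            int fd = queue[qh++ % 128];
            pthread_mutex_unlock(&qm);
            read(fd, buf, sizeof buf);   /* process the ready descriptor */
        }
    }

    void event_loop(int *fds, int nfds) {
        pthread_t t[POOL];
        for (int i = 0; i < POOL; i++)
            pthread_create(&t[i], 0, worker, 0);
        for (;;) {
            fd_set rd;
            FD_ZERO(&rd);
            int maxfd = 0;
            for (int i = 0; i < nfds; i++) {
                FD_SET(fds[i], &rd);
                if (fds[i] > maxfd) maxfd = fds[i];
            }
            select(maxfd + 1, &rd, 0, 0, 0);   /* main thread waits here */
            for (int i = 0; i < nfds; i++)
                if (FD_ISSET(fds[i], &rd)) {   /* hand ready fd to the pool */
                    pthread_mutex_lock(&qm);
                    queue[qt++ % 128] = fds[i];
                    pthread_cond_signal(&qc);
                    pthread_mutex_unlock(&qm);
                }
        }
    }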

What is in simple words blocking IO and non-blocking IO?

How would you explain blocking IO and non-blocking IO to a mere mortal? I've found these concepts are not very clear among many of us programmers.
Blocking I/O means that the program execution is put on hold while the I/O is going on. So the program waits until the I/O is finished and then continues its execution.
In non-blocking I/O the program can continue during I/O operations.
It's a concurrency issue. In the normal case, after an OS kernel receives an I/O op from a user program, that program does not run again until the I/O operation completes. Other programs typically get scheduled in the meantime.
This solves lots of little problems. For example, how does a program know how many bytes were read unless the I/O is complete when the read(2) returns? How does it know if it can reuse a write(2) buffer if the operation is still in progress when write(2) returns? Obviously, a more complex interface is needed for truly asynchronous I/O.
Ultimately it comes down to:
I/O happens synchronously with respect to the program, by blocking the program until I/O is finished
I/O is merely scheduled by a system call, and some notification mechanism exists to communicate the real result
There is a compromise where I/O ops simply fail if they can't be completed immediately. This is the more common use of "non-blocking" I/O in practice.
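For example, with that flavor a write() may complete only partially or not at all, so the caller has to track the remaining bytes itself (a sketch, assuming a non-blocking POSIX descriptor):

    /* Non-blocking write(): it may accept only part of the buffer (or none),
       so the caller tracks what is left. Sketch; a real program would wait
       for writability instead of retrying in a tight loop. */
    #include <errno.h>
    #include <unistd.h>

    /* Returns 0 once the whole buffer is written, -1 on error. */
    int write_all(int fd, const char *buf, size_t len) {
        size_t off = 0;
        while (off < len) {
            ssize_t n = write(fd, buf + off, len - off);
            if (n >= 0)
                off += (size_t)n;            /* a partial write is normal */
            else if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;                   /* a real error */
            /* EAGAIN: kernel buffer full; try again (or poll for POLLOUT) */
        }
        return 0;
    }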
The whole issue is complicated moreover by the effort to schedule multithreaded programs when I/O could conceivably block only one thread, but that's a different question...
Simply said: non-blocking (asynchronous) I/O allows other operations to be carried out while it does its thing, whereas blocking I/O blocks other operations.
