Questions about the OS-level implementation of async programming and multithreading

I am not convinced by some of the claims about how async programming is implemented.
I know that if a thread requests "any" I/O operation, such as reading from a file descriptor (stdin, stdout, etc.), listening on a pipe, or reading and writing on sockets or hard drives, the OS immediately sends the corresponding thread to the "blocked" state. The CPU doesn't run this thread until it returns to the "ready" state.
My first question is: how does a program use a single (main) thread to do I/O tasks without creating multiple threads, each assigned to one I/O task?
My second question is: if async programming is implemented via multiple threads, even lightweight ones, why is it called single-threaded in cases like Node.js or JavaScript?
My third question is: if async programming is implemented via multiple threads, how does it provide memory efficiency?
Thanks for the answers.

Related

How many tasks can a single thread execute simultaneously? [closed]

How many tasks can a single thread execute simultaneously?
Concurrently: Zero or one. A thread is a thread. Not a magic yarn.
If by "in parallel" you mean "processed in parallel", and if you consider awaited Tasks, then there is no upper bound on how many tasks can be awaited - but only one will actually be executing per CPU hardware thread (usually 2x the CPU core count due to simultaneous multithreading, aka Hyper-Threading).
Also remember that Task is very abstract. It does not refer only to concurrently executing/executed (non-blocking) code, but can also refer to pending IO (e.g. disk IO, network IO, etc) that is being handled asynchronously by the host environment (e.g. the operating system) rather than it blocking the thread if it used a "traditional" (non-asynchronous) OS API call.
Re: comment
I just have a problem with handling multiple (it can be 5000, for instance) clients on the server, and for each of them I need to run a separate handling loop. But I'm concerned about the fact that a thread can handle either 0 or 1 tasks. Does that mean I should create a new thread for every new client? I know it doesn't matter how many threads I create, it won't change speed. But speed doesn't matter - the loop just needs to be executed independently for each client.
Ugh, this is not quite the same thing as your question - but I'll try my best to explain...
for each of them, I need to run a separate handling loop
Not necessarily. Just because you need to maintain state for each connected client does not mean you need a separate "loop" (i.e. a thread of execution).
Today, fundamentally, almost all network I/O goes through the BSD Sockets API ("WinSock" on Windows; in .NET this is represented by System.Net.Sockets.Socket). Remember that all kinds of computers work with sockets, including simple single-threaded ones. They don't need a blocking loop for each connection: instead they use select to get information about socket status without blocking, and only read data from a socket's input buffer when it's safe to do so. Voila! Only a single thread is needed. You can do this in .NET by checking Socket.Available or Socket.Select, or better yet by using the newer NetworkStream.ReadAsync method, for example.
If you're using the BSD Sockets API (System.Net.Sockets), then you should use Socket.Select.
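The single-threaded select approach described above can be sketched in Python (using the standard selectors module rather than .NET's Socket.Select, but the idea is the same): one thread asks the OS which sockets are ready and only then reads them, so no read ever blocks.

```python
import selectors
import socket

# One thread multiplexes many sockets by asking the OS which are ready,
# instead of blocking on any single one.
sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, data="accept")

# Two clients connect; still only one server thread.
clients = [socket.create_connection(server.getsockname()) for _ in range(2)]
for c in clients:
    c.sendall(b"ping")

echoed = 0
while echoed < 2:
    for key, _ in sel.select(timeout=5):    # wait until *something* is ready
        if key.data == "accept":
            conn, _ = key.fileobj.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="echo")
        else:
            msg = key.fileobj.recv(1024)    # safe: select said it's readable
            key.fileobj.sendall(b"pong:" + msg)
            sel.unregister(key.fileobj)
            echoed += 1

replies = [c.recv(1024) for c in clients]   # both clients served by one thread
server.close()
print(replies)
```

The same pattern scales to thousands of connections, which is exactly why select/epoll-style multiplexing, not thread-per-client, underlies most high-concurrency servers.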
Does it mean I should create a new thread for every new client?
No - absolutely not. Creating and running a new Thread for each connected client (Socket, NetworkStream, TcpClient, etc.) is an anti-pattern that will quickly exhaust your available process memory (each Thread costs 1 MB just for its default stack on desktop Windows, ~250 KB within IIS).
I know it doesn't matter how many threads I create
Yes, it does! Spawning lots of threads is a good way to torpedo your application's network performance and consume unnecessarily large amounts of memory.
the loop just should be executed independently for each client.
Please learn about asynchronous sockets. By using C#'s async feature with NetworkStream's or Socket's async methods, your code will use as few threads as necessary to handle network data.
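As a rough illustration of the async-socket style (here in Python's asyncio rather than C#'s async/await, but the model is the same): each client gets its own coroutine-based "handling loop", and all of them are multiplexed on one thread instead of one 1 MB-stack OS thread per client.

```python
import asyncio

# Each connection gets a coroutine, not an OS thread; awaiting I/O
# yields the single thread back to the event loop.
async def handle(reader, writer):
    data = await reader.read(100)       # yields while waiting for bytes
    writer.write(b"echo:" + data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main(n_clients=5):
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"c%d" % i)
        await writer.drain()
        writer.write_eof()
        reply = await reader.read()     # read until server closes
        writer.close()
        await writer.wait_closed()
        return reply

    # All clients are in flight concurrently on this one thread.
    replies = await asyncio.gather(*(client(i) for i in range(n_clients)))
    server.close()
    await server.wait_closed()
    return replies

replies = asyncio.run(main())
print(replies)
```

Bump n_clients to 5000 and the memory cost is a few KB per coroutine, not a stack per thread - that is the memory-efficiency argument from the original question.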

Where does blocking I/O come from?

My understanding is that the hardware architecture and operating systems are designed not to block the CPU. When any kind of blocking operation needs to happen, the operating system registers an interrupt and moves on to something else, making sure the CPU's precious time is always used effectively.
It makes me wonder why most programming languages were designed with blocking APIs. But most importantly: since the operating system works asynchronously when it comes to I/O, registering interrupts and dealing with results later when they are ready, I'm really puzzled about how our programming language APIs escape this asynchrony. How does the OS provide synchronous system calls for programming languages with blocking APIs?
Where does this synchrony come from? Certainly not from the hardware level. So, is there an infinite loop somewhere I don't know about, spinning and spinning until some interrupt is triggered?
My understanding is that the hardware architecture and the operating systems are designed not to block the cpu.
Any rationally designed operating system would have a system service interface that does what you say. However, there are many non-rational operating systems that do not work in this manner at the process level.
Blocking I/O is simpler to program than non-blocking I/O. Let me give you an example from the VMS operating system (Windows works the same way under the covers). VMS has system services called SYS$QIO and SYS$QIOW - that is, Queue I/O Request, and Queue I/O Request and Wait. The two services have identical parameters. One pair of parameters is the address of a completion routine and a parameter to pass to that routine; however, these parameters are rarely used with SYS$QIOW.
If you do a SYS$QIO call, it returns immediately. When the I/O operation completes, the completion routine is called as a software interrupt. You then have to do interrupt-style programming in your application. We did this all the time: if you wanted your application to read from 100 input streams simultaneously, this was the way you had to do it. It's just more complicated than doing simple blocking I/O on one device.
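The queue-and-completion-routine pattern can be sketched in Python. All names below are illustrative stand-ins, not the VMS API: a worker thread plays the role of the device, and the completion routine fires after queue_io has already returned, like a software interrupt.

```python
import threading
import queue

class IoQueue:
    """Toy analogue of a SYS$QIO-style queued-I/O service (hypothetical API)."""
    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            op, completion = self._q.get()
            if op is None:
                break
            result = op()          # the "device" performs the operation
            completion(result)     # deliver the "software interrupt"

    def queue_io(self, op, completion):
        # Like SYS$QIO: queue the request and return immediately.
        self._q.put((op, completion))

    def shutdown(self):
        self._q.put((None, None))
        self._worker.join()

done = threading.Event()
results = []

io = IoQueue()
io.queue_io(lambda: "stream-1 data", lambda r: results.append(r))
io.queue_io(lambda: "stream-2 data", lambda r: (results.append(r), done.set()))

# queue_io returned immediately; the caller is free to do other work here.
done.wait(timeout=5)
io.shutdown()
print(results)
```

With 100 input streams you would queue 100 requests and let the completion routines drive the program, which is exactly the interrupt-style programming the answer describes.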
If a programming language were to incorporate such a callback system into its I/O statements, it would be mirroring VMS/RSX/Windows. Ada uses the task concept to implement such systems in an operating-system-independent manner.
In the Unix world, it was traditional to create a separate process for each device. That was simpler - until you had to both read and write to each device.
Your observations are correct - the operating system interacts with the underlying hardware asynchronously to perform I/O requests.
The behavior of blocking I/O comes from threads. Typically, the OS provides threads as an abstraction for user-mode programs to use. But sometimes, green/lightweight threads are provided by a user-mode virtual machine like in Go, Erlang, Java (Project Loom), etc. If you aren't familiar with threads as an abstraction, read up on some background theory from any OS textbook.
Each thread has a state consisting of a fixed set of registers, a dynamically growing/shrinking stack (for function arguments, function call registers, and return addresses), and a next instruction pointer. The implementation of blocking I/O is that when a thread calls an I/O function, the underlying platform hosting the thread (Java VM, Linux kernel, etc.) immediately suspends the thread so that it cannot be scheduled for execution, and also submits the I/O request to the platform below. When the platform receives the completion of the I/O request, it puts the result on that thread's stack and puts the thread on the scheduler's execution queue. That's all there is to the magic.
Why are threads popular? Well, I/O requests happen in some sort of context. You don't just read or write a file as a standalone operation; you read a file, run a specific algorithm to process the result, and issue further I/O requests. A thread is one way to keep track of your progress. Another way is known as "continuation-passing style": every time you perform an I/O operation (A), you pass a callback or function pointer to explicitly specify what needs to happen after the I/O completes (B), but the call (A) returns immediately (non-blocking/asynchronous). This way of programming asynchronous I/O is considered hard to reason about and even harder to debug, because you no longer have a meaningful call stack - it gets cleared after every I/O operation. This is discussed at length in the great essay "What color is your function?".
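A minimal contrast of the two styles (the helpers below are illustrative stand-ins, not a real I/O library): blocking style keeps the context on the call stack, while continuation-passing style threads it through explicit callbacks, which is why stack traces go flat.

```python
def read_config_blocking():
    # Stand-in for a blocking file read.
    return {"retries": 3}

def process(cfg):
    return cfg["retries"] * 2

# Blocking/threaded style: progress lives in the call stack.
result_blocking = process(read_config_blocking())

# Continuation-passing style: progress lives in the callback chain.
def read_config_async(callback):
    cfg = {"retries": 3}       # a real implementation would complete later
    callback(cfg)              # "what needs to happen after the I/O"

results_cps = []
read_config_async(lambda cfg: results_cps.append(process(cfg)))

print(result_blocking, results_cps[0])
```

Both compute the same answer; the difference is purely where the "what happens next" context is stored.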
Note that the platform has no obligation to provide a threading abstraction to its users. The OS or language VM can very well expose an asynchronous I/O API to the user code. But the vast majority of platforms (with exceptions like Node.js) choose to provide threads because it's much easier for humans to reason about.

What really is asynchronous computing?

I've been reading (and working) quite a bit with massively multi-threaded applications, and with IO, and I've found that the term asynchronous has become some sort of catch-all for multiple vague ideas. I'm wondering if I understand it correctly. The way I see it is that there are two main branches of "asynchronicity".
Asynchronous I/O, such as network read/write. What this really boils down to is efficient parallel processing between multiple CPUs, such as your main CPU and your NIC's CPU. The idea is to have multiple processors running in parallel, exchanging data, without blocking while waiting for the other to finish and return the results of its job.
Minimizing context-switching penalties by minimizing the use of threads. This seems to be what the .NET framework is focusing on with its async/await features. Instead of spawning/closing/blocking threads, break parallel jobs into tasks and use a software task scheduler to keep a pool of threads as busy as possible without resorting to spawning new ones.
These seem like two entirely separate concepts with no similarities that could tie them together, but are both referred to by the same "asynchronous computing" vocabulary.
Am I understanding all of this correctly?
Asynchronous basically means not blocking, i.e. not having to wait for an operation to complete.
Threads are just one way of accomplishing that. There are many ways of doing this, from the hardware level, to the OS level, to the software level.
Someone with more experience than me can give examples of asynchronicity not related to threads.
What this really boils down to is efficient parallel processing between multiple CPUs, such as your main CPU and your NIC CPU. The idea is to have multiple processors running in parallel...
Asynchronous programming is not all about multi-core CPUs and parallelism: consider a single-core CPU with just one thread that creates email messages and sends them. In a synchronous program, it would spend a few microseconds creating a message, then a lot more time sending it through the network, and only then create the next message. In an asynchronous program, the thread can create a new message while the previous one is being sent over the network. One way to implement that kind of program is the .NET async/await feature, where you can have just one thread. But even a blocking-I/O program can be considered asynchronous: if the main thread creates the messages and queues them in a buffer, from which another thread pulls them and sends them with blocking I/O, then from the main thread's point of view it's completely async.
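The email example can be sketched with Python's asyncio (the network send is simulated with a timer): all messages are created on the single thread before the first send completes, because the thread is free while each send is in flight.

```python
import asyncio

log = []

async def send(msg):
    log.append(("send-start", msg))
    await asyncio.sleep(0.05)     # simulated network latency; yields the thread
    log.append(("send-done", msg))

async def main():
    sends = []
    for i in range(3):
        msg = f"message-{i}"      # cheap CPU work: create the message
        log.append(("created", msg))
        sends.append(asyncio.create_task(send(msg)))
        await asyncio.sleep(0)    # give pending sends a chance to start
    await asyncio.gather(*sends)  # wait for all sends to finish

asyncio.run(main())
print(log)
```

Run it and the log shows every "created" entry before any "send-done": one thread, overlapped work.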
.NET async/await just uses the OS APIs, which are already async - reading/writing a file, sending/receiving data over the network: they are all async anyway. The OS doesn't block on them (the drivers themselves are asynchronous).
Asynchronous is a general term which does not have a single, widely accepted meaning. Different domains give it different meanings.
For instance, async IO means that instead of blocking on an IO call, something else happens. That something else can be many different things, but it usually involves some sort of notification of call completion. Details differ: the notification might be built into the call itself, as in Microsoft's I/O Completion Ports (if memory serves). Or it can be a readiness check you perform before making the call, so that the call cannot block - this is what poll() and friends do.
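The poll()-style "check before you call" pattern can be sketched with Python's select module on a pipe: a zero-timeout select tells us whether a read would block, so we only read when it cannot.

```python
import os
import select

r, w = os.pipe()

# Zero timeout makes select a pure readiness poll, never a wait.
ready, _, _ = select.select([r], [], [], 0)
empty_before = (ready == [])               # nothing written yet

os.write(w, b"hello")

ready, _, _ = select.select([r], [], [], 0)
data = os.read(r, 1024) if ready else None  # read only when it can't block

os.close(r)
os.close(w)
print(empty_before, data)
```

epoll and kqueue generalize the same check to thousands of descriptors at once.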
Async might also well mean simply parallel execution. For instance, one might say that 'database is updated asynchronously' meaning that there is a dedicated thread which handles database connectivity, and that thread does not slow down the main processing thread.

file descriptor starvation and blocking file descriptors

In The Linux Programming Interface book (p. 1367):
Starvation considerations can also apply when using signal-driven I/O,
since it also presents an edge-triggered notification mechanism. By
contrast, starvation considerations don’t necessarily apply in
applications employing a level-triggered notification mechanism. This
is because we can employ blocking file descriptors with
level-triggered notification and use a loop that continuously checks
descriptors for readiness, and then performs some I/O on the ready
descriptors before once more checking for ready file descriptors.
I don't understand what this 'blocking' part means. I think it's irrelevant whether we use blocking or nonblocking I/O. (The author also says earlier in the chapter that nonblocking I/O is usually employed regardless of level-triggered vs. edge-triggered notification.)
SO, IO eh? Well, IO is "handling things" so we can go with a human metaphor. Imagine you are a process on a system getting stuff done for your boss.
Then blocking IO is like going to the dentist or having a face to face meeting with a customer. In both these scenarios, when you go to undertake that event, you're away from your desk and so totally unable to do anything else until you get back to your desk. Chances are, you're going to waste some time in the waiting room or idle chatter in the meeting/waiting for people to turn up.
Blocking IO is like this - blocking IO "sacrifices" (I say this because you lose the thread, effectively) the thread to the task in question. You can't use it for any other purpose whilst it is blocked - it's waiting on that IO to happen.
Non-blocking IO, by contrast, is like being on the telephone. When you're on the phone, you can engage in that IO whilst writing an answer on Stack Overflow! Such IO is said to be asynchronous - in that you accept an IO request and start processing it, but can handle other requests whilst they complete.
Now, my favourite resource for this kind of thing is the c10k problem page here. I'd say you're right - 99% of the time you are going to use non-blocking IO (in fact, your OS is conducting non-blocking IO for you all the time), mostly because using a whole thread for each incoming IO task is incredibly inefficient, even in Linux where threads and processes are the same thing (tasks) and are fairly lightweight.
The difference between edge-triggered and level-triggered notification applies mostly to non-blocking connections, since it would be irrelevant in the blocking case anyway. As I understand it, an edge-triggered notification only marks a descriptor as ready when there is new data since the last time you asked for a status update, whereas level-triggered marks a descriptor ready whenever there is data available. This means edge-triggered interfaces are considered a bit trickier, since you have to handle the incoming data when you see it - you won't be notified again. In theory, that should be more efficient (fewer notifications).
So, tl;dr - edge vs. level readiness is a slightly different consideration from blocking vs. non-blocking design; namely, there are several ways to do non-blocking IO and really only one way to do blocking IO.
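Level-triggered readiness can be observed directly with select(): a socket keeps polling as readable on every check until its buffered data is fully drained, so you are free to read it in pieces (an edge-triggered epoll would only notify once per new arrival).

```python
import select
import socket

a, b = socket.socketpair()
a.sendall(b"abcdef")       # 6 bytes sit in b's receive buffer

readable_polls = []
for _ in range(3):
    ready, _, _ = select.select([b], [], [], 1)
    readable_polls.append(bool(ready))
    if ready:
        b.recv(2)          # drain only 2 bytes; 6 bytes take 3 polls

ready_after, _, _ = select.select([b], [], [], 0)
drained = (ready_after == [])   # fully drained: no longer readable

a.close()
b.close()
print(readable_polls, drained)
```

This is why the book's starvation argument works: with level-triggered notification you can service each ready descriptor partially and still be re-notified on the next pass.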

Asynchronous vs Multithreading - Is there a difference?

Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on a single thread. How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
Frequently, higher-level abstractions and APIs don't expose the underlying asynchronous APIs available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
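The "start the operation, then move on" idea from the first paragraph can be sketched in Python with a non-blocking send: the call hands the bytes to the kernel's buffer and returns immediately, leaving the CPU free to compute while the network stack moves the data.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

sent = a.send(b"payload")   # returns at once; the kernel owns the transfer now

# ...the thread is free to do CPU work while the I/O proceeds...
busy_work = sum(range(1000))

received = b.recv(1024)     # the data arrived without us waiting on send
a.close()
b.close()
print(sent, busy_work, received)
```

The interrupt delivery described above is hidden inside the kernel here; user code only sees that send never waited.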
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XMLHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
Multithreading refers to more than one operation happening in the same process, while async programming can spread across processes. For example, if my operation calls a web service, the thread need not wait until the web service returns. Here we use async programming, which allows the thread not to wait for a process on another machine to complete. When the response from the web service starts arriving, it can interrupt the main thread to say that the web service has finished processing the request. Now the main thread can process the result.
Windows has always had asynchronous processing, since the non-preemptive days (versions 2.13, 3.0, 3.1, etc.), using the message loop - long before it supported real threads. So to answer your question: no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
"Thread of execution" is a vague term. In cooperative tasking systems such as the early Macintosh and Windows OSes, the thread of execution could simply be the same process that made the request, running with another stack, instruction pointer, and so on. However, when people talk about asynchronous calls, they typically mean calls handled by another thread if intra-process (i.e. within the same process) or by another process if inter-process.
Note that inter-process communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking and synchronizing data are usually the same regardless of which process the separate threads of execution run in.
Some systems let you take advantage of concurrency in the kernel for some facilities using callbacks. For a rather obscure example, asynchronous I/O callbacks were used to implement non-blocking internet servers back in the non-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" your program without threads as such.
Asynchronous just means that you don't block your program waiting for something (a function call, a device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system, thus achieving asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:
    do something
    send async request            (the call returns immediately)
    while not done:
        do something else
        check for async results   (non-blocking poll)
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread or at least utilise another thread that you have created solely for the purpose of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.
