node js - what happens to incoming events during callback execution - node.js

Suppose I have a callback with some heavy synchronous processing. During the execution, the event loop is not free to poll for incoming events. So what happens to these events? Are they queued somewhere to be processed later, or are they simply lost?
Thanks.

They are added to a queue and processed later:
A JavaScript runtime contains a message queue, which is a list of messages to be processed. A function is associated with each message. When the stack is empty, a message is taken out of the queue and processed. The processing consists of calling the associated function (and thus creating an initial stack frame). The message processing ends when the stack becomes empty again.
Concurrency model and Event Loop
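To make the quoted description concrete, here is a toy run-to-completion message queue in Python. It is a minimal sketch, not how node.js or V8 actually implement the event loop; the class MiniEventLoop and its methods post() and run() are hypothetical names. Messages posted while a callback is running are simply appended to the queue and handled after that callback returns, so nothing is lost.

```python
from collections import deque


class MiniEventLoop:
    """Hypothetical toy model of a run-to-completion message queue."""

    def __init__(self):
        self.queue = deque()  # pending (callback, payload) messages
        self.log = []         # order in which work actually happened

    def post(self, callback, payload):
        # New events only join the queue; they never interrupt a running callback.
        self.queue.append((callback, payload))

    def run(self):
        # Take one message at a time and run its callback to completion.
        while self.queue:
            callback, payload = self.queue.popleft()
            callback(self, payload)


def light_handler(loop, payload):
    loop.log.append(f"handled {payload}")


def heavy_handler(loop, payload):
    loop.log.append(f"start {payload}")
    # Two events "arrive" while this long synchronous callback is still running.
    loop.post(light_handler, "event-1")
    loop.post(light_handler, "event-2")
    sum(range(1_000_000))  # stand-in for heavy synchronous work
    loop.log.append(f"end {payload}")


loop = MiniEventLoop()
loop.post(heavy_handler, "job")
loop.run()
print(loop.log)
# ['start job', 'end job', 'handled event-1', 'handled event-2']
```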

The event loop does not poll. Therefore, being unable to run the event loop does not affect incoming events.
How the event loop works:
Most modern OSes (and even ancient Unix-like ones) handle I/O at the OS level instead of the application level. The POSIX standard requires the OS to support at least the select() system call. select() is a blocking function that most programs use to handle non-blocking I/O. That statement sounds contradictory, but it's not.
How non-blocking I/O works:
I'm going to use select() as an example, but different OSes also have other non-blocking APIs such as poll(), epoll() and overlapped I/O (Windows). JavaScript runtimes such as node.js typically use a library like libuv, which picks the API to use at compile time.
A non-blocking API typically provides one function like select() that blocks and waits for events on any I/O the application is listening on. Why blocking? Because that's the only way for the program to use 0% CPU time. Otherwise the process would be busy polling, which would be very inefficient.
Side note: What does blocking mean? Blocking is basically any function that tells the OS: hey, I'm waiting for this "thing", so can you remove me from the CPU sharing schedule and wake me up only when the "thing" arrives?
The difference between non-blocking I/O and blocking I/O is not that you never block: non-blocking I/O blocks waiting on multiple I/O sources, whereas blocking I/O blocks waiting on a single one. If you want to know more, read the documentation of the POSIX select() function.
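A minimal Python sketch of "blocking on many I/O sources at once". Python's selectors.DefaultSelector picks the best mechanism available on the platform (epoll on Linux, kqueue on BSD/macOS, otherwise poll() or select()), which is loosely analogous to what the answer says libuv does for node.js. Pipes stand in for sockets here, so this assumes a Unix-like OS (select() on Windows only accepts sockets).

```python
import os
import selectors

# DefaultSelector chooses epoll, kqueue, poll or select depending on the platform.
sel = selectors.DefaultSelector()

# Three independent I/O sources (pipes standing in for sockets).
pipes = [os.pipe() for _ in range(3)]
for index, (read_fd, _write_fd) in enumerate(pipes):
    sel.register(read_fd, selectors.EVENT_READ, data=index)

# Only pipe #1 receives data.
os.write(pipes[1][1], b"hello")

# A single blocking call waits on all registered descriptors at once and
# returns as soon as at least one is ready (or after the 1 s timeout).
ready = sel.select(timeout=1.0)
received = {key.data: os.read(key.fd, 1024) for key, _mask in ready}
print(received)  # {1: b'hello'}

sel.close()
for read_fd, write_fd in pipes:
    os.close(read_fd)
    os.close(write_fd)
```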
Anyway, JavaScript uses non-blocking I/O, so it does not block on reading from I/O but blocks on select() or a similar function. When the interpreter is executing JavaScript code, it is obviously not simultaneously calling select(). So while the interpreter is busy, the OS buffers any I/O destined for the program.
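The following Python sketch illustrates that buffering: a socket pair stands in for a network connection, a helper thread plays the remote peer, and the main thread busy-loops (standing in for a long synchronous callback) without reading anything. When it finally reads, everything the peer sent in the meantime is still there, in order. The thread and the 0.2 s busy period are illustrative assumptions.

```python
import socket
import threading
import time

# "app" is our program's end of the connection, "peer" is the remote side.
app, peer = socket.socketpair()


def remote_peer():
    # Keeps sending while our main thread is busy and not reading.
    for i in range(5):
        peer.sendall(f"msg{i};".encode())
        time.sleep(0.01)


sender = threading.Thread(target=remote_peer)
sender.start()

# Stand-in for a long synchronous callback: ~0.2 s of busy work, no reads.
deadline = time.monotonic() + 0.2
while time.monotonic() < deadline:
    pass
sender.join()

# Back "in the event loop": drain whatever the kernel buffered meanwhile.
app.setblocking(False)
chunks = []
while True:
    try:
        chunks.append(app.recv(4096))
    except BlockingIOError:
        break  # nothing more buffered right now
received = b"".join(chunks)
print(received)  # b'msg0;msg1;msg2;msg3;msg4;'

app.close()
peer.close()
```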
Does the OS poll?
No. The OS generally does not poll (then again it depends on the device driver but in general no). I/O activity is handled by interrupts. Even for non-interrupt driven I/O (for example USB) generally the chipset that handles that I/O will generate an interrupt when its buffers are full so that the OS will copy the data to OS buffers in RAM. Sometimes for high speed devices it's not even the OS that does the copy but the DMA controller which would generate an interrupt once data is copied to RAM.
What about GUI activity?
In the end, GUI activity like mouse clicks and key presses is also interrupt driven (early versions of DOS-based GUI managers like Windows 1.0 used a poll-driven mouse driver; then Microsoft saw a demo of the Mac OS, and legend has it that an engineer at Apple let slip that they didn't poll. Since then, mouse drivers generally trigger interrupts).
The exception:
One minor exception is threads in JavaScript. By threads I mean web workers in browsers and disk I/O handlers in node.js. In node.js, for example, disk I/O is implemented as blocking I/O in separate threads, so node.js is responsible for buffering data before passing it back to the event loop. Again, all the OS buffering layers still exist: while copying data, for example, the OS may buffer the data from a completed disk read before the node.js thread calls the next read(). In any case, the threads still communicate with the event loop via I/O channels (pipes, sockets or Unix domain sockets), so everything outlined above still holds: if the main JS thread is busy, the OS simply buffers data from the threads (or, if the channel is blocking, the threads simply block until the event loop processes their I/O).
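A minimal Python sketch of the pattern described above, offered as a simplification rather than libuv's actual code: a worker thread performs ordinary blocking file I/O, hands its result over through a thread-safe queue, and wakes the event loop by writing one byte to a pipe the loop watches with a selector. blocking_read_file() and event_loop() are hypothetical names, and the example assumes a Unix-like OS.

```python
import os
import queue
import selectors
import tempfile
import threading

results = queue.Queue()      # thread-safe hand-over of results
wake_r, wake_w = os.pipe()   # the I/O channel the event loop watches


def blocking_read_file(path):
    # Runs in a worker thread: ordinary blocking disk I/O.
    with open(path, "rb") as f:
        data = f.read()
    results.put((path, len(data)))  # publish the result first...
    os.write(wake_w, b"x")          # ...then wake the event loop with one byte


def event_loop(expected):
    # Single-threaded loop: sleeps in select() until some worker writes a byte.
    sel = selectors.DefaultSelector()
    sel.register(wake_r, selectors.EVENT_READ)
    done = []
    while len(done) < expected:
        for _key, _mask in sel.select():
            os.read(wake_r, 1)                 # one byte per finished job
            done.append(results.get_nowait())  # its result is already queued
    sel.close()
    return done


# Usage: three temporary files of different sizes, one worker thread each.
paths = []
for size in (10, 20, 30):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a" * size)
        paths.append(tmp.name)

workers = [threading.Thread(target=blocking_read_file, args=(p,)) for p in paths]
for w in workers:
    w.start()
done = event_loop(len(paths))
for w in workers:
    w.join()

sizes = sorted(size for _path, size in done)
print(sizes)  # [10, 20, 30]

for p in paths:
    os.remove(p)
os.close(wake_r)
os.close(wake_w)
```

Writing the result to the queue before writing the wake-up byte guarantees that, whenever the loop has consumed a byte, a matching result is already available.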

They're queued and handled in order upon being pulled from the event queue.
Your JS code can't block new events from entering the queue.

Related

Will non-blocking I/O be put to sleep during copying data from kernel to user?

I ask this question because I am looking at multiplexing I/O in Go, which uses epoll_wait.
When a socket is ready, a goroutine will be woken up and begin to read the socket in non-blocking mode. If the read system call still blocks while copying data from kernel to user space, I assume the kernel thread the goroutine is attached to will be put to sleep as well.
I am not sure of that, hoping someone can help correct me if I am wrong.
I fail to quite parse what you've written.
I'll take a guess and suppose you might be overlooking the fact that the write(2) and read(2) syscalls (and those of their ilk, such as send(2) and recv(2)) on sockets put into non-blocking mode are free to consume (and return, respectively) less data than requested.
In other words, a write(2) call on a non-blocking socket told to write 1 megabyte of data will consume only as much data as currently fits into the associated kernel buffer and return immediately, signalling how much data it consumed. The next immediate call to write(2) will likely return EWOULDBLOCK.
The same goes for the read(2) call: if you pass it a buffer large enough to hold 1 megabyte of data, and tell it to read that number of bytes, the call will only drain the contents of the kernel buffer and return immediately, signaling how much data it actually copied. The next immediate call to read(2) will likely return EWOULDBLOCK.
So any attempt to get or put data to the socket completes almost immediately: either after the data has been shoveled between the kernel's buffer and user space, or right away with the EAGAIN return code (on Linux, EAGAIN and EWOULDBLOCK have the same value).
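The behaviour described above can be observed directly. In this Python sketch (an illustration on a Unix-like OS, not Go code), a non-blocking socket pair is filled by repeatedly asking send() to write 1 MiB; each call accepts only what fits in the kernel buffer, until the kernel reports EAGAIN/EWOULDBLOCK, which Python raises as BlockingIOError. Draining the other end with recv() behaves symmetrically. The exact per-call byte counts depend on the OS and its buffer sizes.

```python
import errno
import socket

writer, reader = socket.socketpair()
writer.setblocking(False)
reader.setblocking(False)

payload = b"x" * (1 << 20)  # 1 MiB, more than a default socket buffer holds

# Keep asking send() to write the full 1 MiB; nobody is reading yet.
sent_per_call = []
would_block = False
while True:
    try:
        sent_per_call.append(writer.send(payload))  # may accept only part of it
    except BlockingIOError as exc:
        # Kernel buffer full: the call fails immediately instead of sleeping.
        would_block = exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        break

total_sent = sum(sent_per_call)
print("bytes accepted per send() call:", sent_per_call)

# Reading is symmetric: recv() returns whatever is buffered, then EWOULDBLOCK.
total_read = 0
while True:
    try:
        total_read += len(reader.recv(1 << 20))
    except BlockingIOError:
        break
print("sent", total_sent, "read", total_read)

writer.close()
reader.close()
```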
Sure, there's supposedly a possibility for an OS thread to be suspended right in the middle of performing such a syscall, but this does not count as "blocking in a syscall."
Update to the original answer in response to the following comment of the OP:
<…>
This is what I see in the book "UNIX Network Programming" (Volume 1, 3rd edition), chapter 6.2:
A synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes. Using these definitions, the first four I/O models—blocking, nonblocking, I/O multiplexing, and signal-driven I/O—are all synchronous because the actual I/O operation (recvfrom) blocks the process.
It uses "blocks" to describe the nonblocking I/O operation, which confuses me.
I still don't understand why the book uses "blocks the process" if the process is actually not blocked.
I can only guess that the book's author intended to highlight that the process is indeed blocked from entering a syscall until returning from it. Reads from and writes to a non-blocking socket do block to transfer the data, if available, between the kernel and user space. We colloquially say this does not block because we mean "it does not block waiting and doing nothing for an indeterminate amount of time".
The book's author might contrast this with so-called asynchronous I/O (called "overlapped" I/O on Windows™), where you basically give the kernel a buffer with or for data and ask it to carry out the operation entirely in parallel with your code, in the sense that the relevant syscall returns right away and the I/O is carried out in the background (with regard to your user-space code).
To my knowledge, Go does not use the kernel's async I/O facilities on any platform it supports. You might want to follow the developments regarding Linux and its contemporary io_uring subsystem.
Oh, and one more point. The book might (at least at that point in the narrative) be discussing a simplified "classic" scheme where there are no in-process threads, and the sole unit of concurrency is the process (with a single thread of execution). In this scheme, any syscall obviously blocks the whole process. In contrast, Go works only on kernels which support threads, so in a Go program a syscall never blocks the whole process, only the thread it's called on.
Let me take yet another stab at explaining the problem as I perceive the OP stated it.
The problem of serving multiple client requests is not new: one of the more visible early statements of it is "The C10k problem".
To quickly recap it, a single threaded server with blocking operations on the sockets it manages is only realistically able to handle a single client at a time.
To solve it, there exist two straightforward approaches:
Fork a copy of the server process to handle each incoming client connection.
On an OS which supports threads, spawn a new thread inside the same process to handle each incoming client.
They have their pros and cons, but they both suck with regard to resource usage, and, more importantly, they do not play well with the fact that most clients perform I/O at a relatively low rate and bandwidth compared to the processing resources available on a typical server.
In other words, when serving a typical TCP/IP exchange with a client, the serving thread most of the time sleeps in the write(2) and read(2) calls on the client socket.
This is what most people mean when talking about "blocking operations" on sockets: if a socket is blocking, an operation on it will block until it can actually be carried out, and the originating thread will be put to sleep for an indeterminate amount of time.
Another important thing to note is that when the socket becomes ready, the amount of work done is typically minuscule compared to the amount of time slept between the wakeups.
While the thread sleeps, its resources (such as memory) are effectively wasted, as they cannot be used to do any other work.
Enter "polling". It combats the problem of wasted resources by noticing that the moments when networked sockets become ready are relatively rare and far between, so it makes sense to have many such sockets served by a single thread. This keeps the thread almost as busy as theoretically possible and also makes it possible to scale out when needed: if a single thread is unable to cope with the data flow, add another thread, and so on.
This approach is definitely cool, but it has a downside: the code which reads and writes data must be rewritten in callback style instead of the original plain sequential style. Writing with callbacks is hard: you usually have to implement intricate buffer management and state machines to deal with it.
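To illustrate the buffer management and state the answer refers to, here is a minimal callback-style sketch in Python: one thread, several sockets, and a per-connection buffer because each callback may see only a fragment of a message. LineConnection and run_until_idle() are hypothetical names, and socket pairs stand in for client connections.

```python
import selectors
import socket

sel = selectors.DefaultSelector()


class LineConnection:
    """Hypothetical per-connection state for a newline-delimited protocol."""

    def __init__(self, sock, on_line):
        self.sock = sock
        self.buffer = b""       # partial message carried between callbacks
        self.on_line = on_line
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=self.on_readable)

    def on_readable(self):
        data = self.sock.recv(4096)
        if not data:            # peer closed the connection
            sel.unregister(self.sock)
            self.sock.close()
            return
        self.buffer += data
        # Emit every complete line; keep the incomplete tail for the next callback.
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.on_line(line.decode())


def run_until_idle():
    # One thread dispatches callbacks for all sockets until nothing is ready.
    while True:
        events = sel.select(timeout=0.1)
        if not events:
            return
        for key, _mask in events:
            key.data()  # the callback registered for that socket


lines = []
clients = []
for _ in range(2):
    server_side, client_side = socket.socketpair()
    LineConnection(server_side, lines.append)
    clients.append(client_side)

# Messages arrive in fragments, interleaved across connections.
clients[0].sendall(b"hel")
clients[1].sendall(b"wor")
run_until_idle()
partial = list(lines)             # nothing complete yet
clients[0].sendall(b"lo\n")
clients[1].sendall(b"ld\nbye\n")
run_until_idle()

for c in clients:
    c.close()
run_until_idle()                  # EOF callbacks unregister and close
remaining = len(sel.get_map())
sel.close()
print(partial, sorted(lines), remaining)  # [] ['bye', 'hello', 'world'] 0
```

Note how even this tiny protocol already needs a buffer per connection and a decision about what to do with incomplete data, which is exactly the bookkeeping that sequential code would not need.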
The Go runtime solves this problem by adding another layer of scheduling for its units of execution, goroutines. For goroutines, operations on sockets are always blocking, but when a goroutine is about to block on a socket, this is transparently handled by suspending only the goroutine itself (until the requested operation is able to proceed) and using the thread the goroutine was running on to do other work¹.
This gives you the best of both approaches: the programmer may write classic, no-brainer, sequential, callback-free networking code, yet the threads used to handle networking requests are fully utilized².
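The following Python sketch illustrates the idea, not Go's runtime: each "task" is a generator written in plain sequential style, and whenever it needs to wait for a socket it yields that socket. A tiny scheduler parks the task in a selector and resumes it once the socket is readable, running other tasks in the meantime. Scheduler, read_line() and handler() are hypothetical names; Go does this transparently inside its runtime, whereas here the yield points are explicit.

```python
import selectors
import socket
import threading
from collections import deque


class Scheduler:
    """Hypothetical toy scheduler: runs generator 'tasks', parking blocked ones."""

    def __init__(self):
        self.ready = deque()                    # runnable tasks
        self.sel = selectors.DefaultSelector()  # parked tasks, keyed by socket

    def spawn(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready or self.sel.get_map():
            while self.ready:
                task = self.ready.popleft()
                try:
                    sock = next(task)   # run the task until it needs to wait
                except StopIteration:
                    continue            # task finished
                # Park the task; it does not occupy the thread while waiting.
                self.sel.register(sock, selectors.EVENT_READ, data=task)
            if self.sel.get_map():
                for key, _mask in self.sel.select():
                    self.sel.unregister(key.fileobj)
                    self.ready.append(key.data)  # socket readable: runnable again
        self.sel.close()


def read_line(sock):
    # Reads like blocking code; each `yield` is where the task would block.
    # For brevity, bytes after the first newline are discarded.
    buf = b""
    while b"\n" not in buf:
        yield sock
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf.split(b"\n", 1)[0].decode()


def handler(name, sock, out):
    greeting = yield from read_line(sock)  # plain sequential style
    out.append(f"{name}: {greeting}")


sched = Scheduler()
out = []
pairs = [socket.socketpair() for _ in range(2)]
for i, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sched.spawn(handler(f"conn{i}", server_side, out))

pairs[0][1].sendall(b"hello\n")
pairs[1][1].sendall(b"hi ")            # incomplete line: conn1 must wait
late = threading.Timer(0.05, pairs[1][1].sendall, args=(b"there\n",))
late.start()

sched.run()
late.join()
print(sorted(out))  # ['conn0: hello', 'conn1: hi there']

for a, b in pairs:
    a.close()
    b.close()
```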
As to the original question of blocking, both the goroutine and the thread it runs on are indeed blocked when the data transfer on a socket is happening, but since what happens is data shoveling between a kernel and a user-space buffer, the delay is most of the time small, and is no different to the classic "polling" case.
Note that performing syscalls in Go (at least up to and including Go 1.14), including I/O on non-pollable descriptors, does block both the calling goroutine and the thread it runs on, but such syscalls are handled differently from I/O on pollable descriptors: when a special monitoring thread notices that a goroutine has spent more than a certain amount of time in a syscall (20 µs, IIRC), the runtime pulls the so-called "processor" (a runtime entity which runs goroutines on OS threads) from under the goroutine and tries to make it run another goroutine on another OS thread; if there is a goroutine wanting to run but no free OS thread, the Go runtime creates another one.
Hence "normal" blocking I/O is still blocking in Go in both senses: it blocks both goroutines and OS threads, but the Go scheduler makes sure the program as a whole is still able to make progress.
This could arguably be a perfect case for using true asynchronous I/O provided by the kernel, but it's not there yet.
¹ See this classic essay for more info.
² The Go runtime is certainly not the first one to pioneer this idea. For instance, look at the State Threads library (and the more recent libtask) which implement the same approach in plain C; the ST library has superb docs which explain the idea.

nodejs spawns threads implicitly by delegating the I/O to the kernel. How is this different from a server that makes a thread per request?

Following up on ideas from these two previous questions I had:
When a goroutine blocks on I/O how does the scheduler identify that it has stopped blocking?
When doing asynchronous I/O, how does the kernel determine if an I/O operation is completed?
I've been looking into nodejs recently. It's advertised as "single threaded", which is partially true since all your JS does run on one thread, but from what I've read, in the background, node achieves this by delegating the I/O tasks to the kernel so that it doesn't get stuck having to wait for the response.
What I'm having difficulty understanding is how this is any different than the paradigms where you explicitly are creating a thread per request.
Could someone explain the differences in depth?
This would be true if node created one thread for each I/O request. But, of course, it doesn't do that. It has an I/O engine that understands the best way to do I/O on each platform.
What nodejs hides from you is not some naive implementation where a scheduling entity waits for each request to complete, but a sophisticated implementation that understands the optimal way to do I/O on every platform on which it is implemented.
Updates:
If both approaches need the kernel for I/O aren't they both creating a kernel thread per request?
No. There are lots of ways to use the kernel for I/O that don't require a kernel thread per request. They differ from platform to platform. Windows has IOCP. Linux has epoll. And so on.
If nodejs somehow is using a fixed amount of threads and queueing the I/O operations, isn't that slower than a thread per request?
No, it's typically much faster for a variety of reasons that depend on the specifics of each platform. Here are a few advantages:
You can avoid "thundering herds" when lots of I/O completes at once. Instead, you can wake just the number of threads that can usefully run at the same time.
You can avoid needing lots of contexts switches to get all the different threads to execute. Instead, each thread can handle completion after completion.
You don't have to put each thread on a wait queue for each I/O operation. Instead, you can use a single wait queue for the group of threads.
Just to give you an idea of how significant it can be, consider the difference between using a thread per I/O and using epoll on Linux. If you use a thread per I/O, that means each I/O operation requires a thread to place itself on a wait queue, that thread to block, that thread to be unblocked, a context switch to occur to that thread, and that thread to remove itself from the wait queue.
By contrast, with epoll, a single thread can service any number of I/O completions without having to be rescheduled or added to or removed from a wait queue for each I/O. Similarly, a thread can issue a number of I/O requests without being descheduled. This difference is massive.
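A small Python sketch of that last point, assuming Linux (where selectors.DefaultSelector uses epoll): after fifty "completions" arrive on fifty sockets, a single select() call, i.e. a single wake-up of a single thread, reports all of them. It only demonstrates the batching; it does not measure the performance difference the answer describes.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux
pairs = [socket.socketpair() for _ in range(50)]
for index, (ours, _theirs) in enumerate(pairs):
    ours.setblocking(False)
    sel.register(ours, selectors.EVENT_READ, data=index)

# Fifty "I/O completions" happen before the thread wakes up.
for _ours, theirs in pairs:
    theirs.sendall(b"done")

# One wake-up of one thread reports every ready socket at once.
events = sel.select(timeout=1.0)
handled = sorted(key.data for key, _mask in events)
for key, _mask in events:
    key.fileobj.recv(16)  # data is already buffered, so each recv() returns at once
print(len(events))  # 50

sel.close()
for ours, theirs in pairs:
    ours.close()
    theirs.close()
```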

C#: When will thread switching most probably occur?

I was wondering when .NET would most probably switch from one thread to another.
I understand we can't predict exactly when this will happen, but is there any intelligence in this? For example, when a thread is executing, will it try to wait for a method to return or a loop to finish before switching?
I'm not an expert on .NET, but in general scheduling is handled by the kernel. A thread is typically switched out when:
Your thread's timeslice has expired (threads/processes only get a certain amount of CPU time).
Your thread has blocked for I/O.
Some other obscure reason applies, like waiting for an IPC message, a network packet or something similar.
Threads can be preempted at any point along their execution path, be it in a loop or returning from a function. This in general isn't handled by the underlying VM (.NET or JVM) but is controlled by the OS.
Of course there is 'intelligence', of a sort:). The set of running threads can only change upon an interrupt, either:
An actual hardware interrupt from a peripheral device, e.g. disk, NIC, keyboard, mouse, timer.
A software interrupt (i.e. a system call) that can change the state of threads. This encompasses sleep calls and calls to wait/signal on inter-thread synchronization objects, as well as I/O calls that request data that is not immediately available.
If there is no interrupt, the OS cannot change the set of running threads because it is not entered. The OS does not know or care about loops, function/method calls (except those that make system calls as above), gotos or any other user-level flow-control mechanisms.
I read your question just now; it may not be relevant anymore, but after reading the above answers, I just want to make sure:
Threads are managed (as far as I know) by the process they belong to. This has nothing to do with the operating system (and that's the main reason why working with multiple threads is faster than working with multiple processes: threads share data, and switching between them is faster than the context switch between processes performed by the short-term scheduler).
(NOTE: There are two types of threads, USER_MODE threads and KERNEL_MODE threads, and each OS can have both of them or just one of them. Anyway, a thread that works in a user application environment is considered a USER_MODE thread and is managed by the process it belongs to.)
Am I right?
Thanks!

How do system calls like select() or poll() work under the hood?

I understand that async I/O operations via select() and poll() do not use processor time, i.e. it's not a busy loop, but then how are these really implemented under the hood? Is it supported in hardware somehow, and is that why there is not much apparent processor cost for using these?
It depends on what the select/poll is waiting for. Let's consider a few cases; I'm going to assume a single-core machine for simplification.
First, consider the case where the select is waiting on another process (for example, the other process might be carrying out some computation and then outputs the result through a pipeline). In this case the kernel will mark your process as waiting for input, and so it will not provide any CPU time to your process. When the other process outputs data, the kernel will wake up your process (give it time on the CPU) so that it can deal with the input. This will happen even if the other process is still running, because modern OSes use preemptive multitasking, which means that the kernel will periodically interrupt processes to give other processes a chance to use the CPU ("time-slicing").
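A rough way to see that waiting in select() costs (almost) no CPU time, sketched in Python for a Unix-like OS: a helper thread stands in for the other process, sleeping half a second before writing into a pipe, while the main thread blocks in select.select(). Comparing time.process_time() (CPU time of the whole process) with elapsed wall-clock time shows the process was not spinning. The thresholds in the checks are illustrative assumptions.

```python
import os
import select
import threading
import time

read_fd, write_fd = os.pipe()


def other_process():
    # Stand-in for the other process: "computes" for 0.5 s, then outputs.
    time.sleep(0.5)
    os.write(write_fd, b"result")


producer = threading.Thread(target=other_process)
producer.start()

cpu_before = time.process_time()  # CPU time used by this whole process
wall_before = time.monotonic()
readable, _, _ = select.select([read_fd], [], [])  # sleeps until data arrives
wall_waited = time.monotonic() - wall_before
cpu_used = time.process_time() - cpu_before

data = os.read(read_fd, 64)
producer.join()
print(data, f"waited {wall_waited:.2f}s, used {cpu_used:.3f}s of CPU")

os.close(read_fd)
os.close(write_fd)
```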
The picture changes when the select is waiting on I/O: network data, for example, or keyboard input. In this case, while archaic hardware would have to spin the CPU waiting for input, all modern hardware can put the CPU itself into a low-power "wait" state until the hardware raises an interrupt, an event that the kernel handles specially. In the interrupt handler, the CPU records the incoming data, and after returning from the interrupt the kernel wakes up your process to allow it to handle the data.
There is no hardware support. Well, there is... but it is nothing special, and it depends on what kind of file descriptor you are watching. If there is a device driver involved, the implementation depends on the driver and/or the device. Take sockets, for example. If you wait for some data to read, there is a sequence of events:
1. Some process calls the poll()/select()/epoll() system call to wait for data on a socket. There is a context switch from user mode to the kernel.
2. The NIC interrupts the processor when a packet arrives. The interrupt routine in the driver pushes the packet onto the back of a queue.
3. There is a kernel thread that takes data from that queue and wakes up the network code inside the kernel to process that packet.
4. When the packet is processed, the kernel determines the socket that was waiting for it, saves the data in the socket buffer and returns from the system call back to user space.
This is just a very brief description, there are a lot of details missing but I think that is enough to get the point.
Another example where no drivers are involved is a Unix socket. If you wait for data from one of them, the waiting process is added to a list. When the process on the other side of the socket writes data, the kernel checks that list, and point 4 applies again.
I hope it helps. I think examples are the best way to understand it.

What mechanism do PIPES use to "wake up" the recipient?

I have two questions in one here.
On Windows, I am familiar with pipes and how they work. However, I am curious as to what mechanism the OS uses to notify the recipient thread of a message arrival.
Does the thread "poll & sleep" continuously for data? Does the OS check to see if the thread is sleeping and wake it up? Or is there some other mechanism used?
Specifically, I want to build an IPC system where many threads need to pass messages. I don't need to use pipes, but I do need to know the most efficient notification method possible.
The developer can decide how to work with the pipe: whether to sleep/poll, or to call blocking functions and wait until the data is available.
As for the mechanism the pipe has for waking up the process (assuming the process is in a blocking read call), it is not the pipe but the OS that takes charge, as with any other OS call: it registers the operation and blocks the process/thread until the data is available. When the data is available, it completes the system call.
This is an answer for Unix. I'd lay good money on Windows being pretty similar as the solution has been around a long time and is well known to be robust. The details will vary a bit (different API calls, specifics of semantics, etc.)
It depends on whether the other end is using the pipe's file descriptor in blocking or non-blocking mode.
In blocking mode, the process is waiting in the OS kernel for the data to become available. The way in which notification happens there depends on the OS. Chances are it involves a queue of processes that are considered to be runnable, and everything's made simpler by the fact that the kernel can (largely) control what interrupts it. In a simple (single processor) implementation you could go for something as trivial as noting on write to the pipe that the other process is waiting to read from it (via some kind of “interest set”), and so marking the reader as runnable at that point (at which time it becomes up to the scheduler to decide).
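A toy Python model of the "interest set" idea: ToyPipe is a hypothetical class, not kernel code, in which a threading.Condition plays the role of the kernel's wait queue. A reader that finds the pipe empty sleeps on the condition instead of polling, and a writer marks it runnable with notify_all(); a full pipe blocks writers symmetrically. For threads within one process, a condition variable (or a queue built on one, such as Python's queue.Queue) is a common notification mechanism.

```python
import threading
from collections import deque


class ToyPipe:
    """Hypothetical bounded pipe whose readers/writers sleep instead of polling."""

    def __init__(self, capacity=4):
        self._buf = deque()
        self._capacity = capacity
        self._cond = threading.Condition()  # stands in for the kernel wait queue

    def write(self, item):
        with self._cond:
            while len(self._buf) >= self._capacity:
                self._cond.wait()           # writer sleeps while the pipe is full
            self._buf.append(item)
            self._cond.notify_all()         # make sleeping readers runnable

    def read(self):
        with self._cond:
            while not self._buf:
                self._cond.wait()           # reader sleeps, using no CPU
            item = self._buf.popleft()
            self._cond.notify_all()         # a writer may be waiting for space
            return item


pipe = ToyPipe(capacity=2)
received = []


def reader():
    for _ in range(5):
        received.append(pipe.read())


t = threading.Thread(target=reader)
t.start()
for i in range(5):
    pipe.write(i)  # blocks whenever the reader falls two items behind
t.join()
print(received)  # [0, 1, 2, 3, 4]
```

Whether the scheduler runs the reader immediately after notify_all() is up to it, which matches the answer's point that waking only marks the waiter as runnable.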
In non-blocking mode, either the process is polling from time to time (yuck!) or they're using a system call like select() or poll() (there are some higher-performance variants too). That's very much like the Windows call WaitForMultipleObjects() and works just great with pipes. That in turn ends up back at that runnable process queue, the interest set, and the scheduler.
It also doesn't really matter too much whether it's blocking because the pipe is full or the pipe is empty, as the control flow is pretty much symmetric between readers and writers. (Unlike the data flow, of course.)
