Are there explicit considerations about the latency of any single request in the node.js event loop? AFAICT every I/O call returns an EventEmitter which emits an event. The processing of all the events is multiplexed through the use of a pipe. So it is possible that the event that needs to be processed for an important request may be placed too far back into the pipe. Is there some sort of priority queue that can be used to schedule the order of execution of event handlers?
Here's why I asked this question in the first place. I decided to give a gist.github link because the reason is long and related to the technical question.
It's not clear exactly what you're asking here. Your Javascript does not directly add things to the event queue (that is only done with native code). Instead, you call some async operation and the native code behind that operation adds something to the event queue when the async operation completes.
This article The Node.js Event Loop, Timers, and process.nextTick() gives you a lot of details about how the event queue is serviced and how it handles different types of events (timers, I/O, etc.).
In general, things are FIFO (first in, first out) within an event type, with some exceptions:
process.nextTick() will run its callback BEFORE waiting I/O events.
setImmediate() will run its callback AFTER waiting I/O events.
More detail here: setImmediate vs. nextTick and nextTick vs setImmediate, visual explanation and setTimeout vs. setImmediate vs. nextTick
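A minimal runnable sketch of that ordering, run inside an I/O callback (where the relative order of setImmediate() and a 0 ms timer is deterministic):

    const fs = require('fs');

    // Inside an I/O callback: process.nextTick() runs before the event loop
    // continues, setImmediate() runs in the check phase of this same loop
    // iteration, and the 0 ms timer waits for the timers phase of the next one.
    fs.readFile(__filename, () => {
      setTimeout(() => console.log('setTimeout(fn, 0)'), 0);
      setImmediate(() => console.log('setImmediate'));
      process.nextTick(() => console.log('process.nextTick'));
    });
    // Prints: process.nextTick, setImmediate, setTimeout(fn, 0)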
So it is possible that the event that needs to be processed for an important request may be placed too far back into the pipe.
You'd have to show us the specific situation you're concerned about. If you yourself are scheduling a callback with setTimeout(), setImmediate() or process.nextTick(), then you have some control over when it happens by which of these three you pick. If you aren't scheduling it yourself (e.g. it's the completion callback of some async operation), then you don't control its scheduling in the event loop. It will go into the sub-queue that matches its type and be served FIFO from that phase of the event loop (as described in the above articles).
Is there some sort of priority queue that can be used to schedule the order of execution of event handlers?
There is no exposed priority system. Within an event type, things are FIFO. Again, if you give us an actual coding example so we can see exactly what you're trying to do, we could offer some help on what your choices are. You may be able to use the setTimeout(), setImmediate() and process.nextTick() tools that are already available, or you may want to implement your own task queuing and prioritization system, driven by one of those three methods, that lets you prioritize the work you have queued yourself.
About priorities of execution in the event loop:
setImmediate() runs before setTimeout(fn, 0) when both are scheduled from within an I/O callback; outside of an I/O cycle their relative order is not deterministic.
process.nextTick() runs its callback as soon as the current operation completes, before the event loop continues (despite the name, not on the next loop iteration).
Natively, the event loop in node.js does not support priorities. You can always implement your own priority queue, or use an existing one like here, and assign your functions to it.
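For illustration, here is a minimal sketch of such a user-level priority scheme (the names are made up for the example, not any particular library's API). Tasks carry a priority, one task is run per setImmediate() turn so I/O is not starved, and the highest-priority pending task always runs first:

    const tasks = [];
    let drainScheduled = false;

    function schedule(priority, fn) {
      tasks.push({ priority, fn });
      if (!drainScheduled) {
        drainScheduled = true;
        setImmediate(drain);
      }
    }

    function drain() {
      drainScheduled = false;
      tasks.sort((a, b) => b.priority - a.priority); // highest priority first
      const task = tasks.shift();
      if (task) task.fn();
      if (tasks.length > 0) {
        drainScheduled = true;
        setImmediate(drain); // yield back to the event loop between tasks
      }
    }

    schedule(1, () => console.log('low-priority task'));
    schedule(10, () => console.log('high-priority task'));
    // Prints the high-priority task first, even though it was queued second.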
Related
I'm trying to understand the internals of node.js and how it works under the hood
So if I've understood correctly,
The event loop executes on a single thread. When the app generates a new I/O task, it makes a non-blocking call to the event demultiplexer; the call returns immediately without any data, allowing the event loop to continue with other tasks. When data is ready, the demultiplexer pushes one or more events into the event queue, and the event loop takes out each event (a handler with its data)...
As I understand it, the demultiplexer executes in the OS (epoll in Linux), not in the application's main thread (in which the event loop executes), and by definition the demultiplexer is synchronous and blocking, because the watch() call is blocking. This is exactly where my question stands.
I know that the demultiplexer is watching until one or more events are ready but:
If watch() is executed in the OS and not in the event loop, what does blocking mean? What does it block?
(Sorry for my English, I'm not a native speaker.)
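To make the question concrete, here is a toy simulation of the reactor loop described above (the names demultiplexer, watch() and so on are illustrative, not Node's real internals). The point it tries to show is what "blocking" blocks: watch() is called on the event-loop thread itself, so while a real epoll_wait sits inside it, the event loop is parked, which is fine because there is no JavaScript to run until something becomes ready. In this toy version watch() returns immediately because the sources are plain arrays:

    // Toy reactor: illustrative only, not Node's actual implementation.
    function makeDemultiplexer(sources) {
      return {
        // In a real demultiplexer this is where epoll/kqueue would block the
        // calling thread (the event-loop thread) until a watched handle is ready.
        watch() {
          const ready = [];
          for (const [name, queue] of Object.entries(sources)) {
            if (queue.length > 0) ready.push({ name, data: queue.shift() });
          }
          return ready;
        },
        hasWork() {
          return Object.values(sources).some((queue) => queue.length > 0);
        },
      };
    }

    const demux = makeDemultiplexer({
      socketA: ['request from A'],
      socketB: ['request from B'],
    });

    const handlers = {
      socketA: (data) => console.log('handler for socketA got:', data),
      socketB: (data) => console.log('handler for socketB got:', data),
    };

    // The "event loop": wait in watch(), then run each handler to completion.
    while (demux.hasWork()) {
      for (const event of demux.watch()) {
        handlers[event.name](event.data);
      }
    }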
I've been using various concurrency constructs for a while now without much consideration for how all the magic happens, which has recently made me increasingly uneasy.
In an attempt to remedy this ... feeling, I have been reading up on how async works under the hood. When I say async, in this case I'm referring to userland / greenthread / cooperative multitasking, although I assume some of the concepts will also apply to traditional OS managed threads insofar as a scheduler and workers are involved.
I see how a worker can suspend itself and let other workers execute, but at the lowest level in non-blocking library code, how does the scheduler know when a previously suspended worker's job is done and to wake up that worker?
For example, if you fire up a worker in some sort of async block and perform an operation that would normally block (e.g. an HTTP request, SQL query, or other I/O), then even though your calling code is async, that operation (library code) had better play nice with your async framework, or you've effectively defeated the purpose of using it and blocked your scheduler from running other waiting operations (the What Color Is Your Function problem) while it waits for your blocking call, executed inside your non-blocking calling code, to complete.
So now we've got async code calling other async library code, and now I'm asking myself the question all over again - how does the async library code know when to suspend and resume operation?
The idea of firing off a HTTP request, moving on, and returning later to check for results is weird to think about for me - not conceptually but from an implementation standpoint.
How do you perform a partial operation, e.g. sending TCP packets and then continuing with the rest of the program execution, only to come back later and check if results have been delivered? Delivered to what? A socket?
Now we're another layer deep and you are using socket selects to avoid creating threads and blocking, but, again...
how do those sockets start an operation, move on before completion, and then how does select know when data is available?
Are you continually checking some buffer to see if bytes have been delivered in an infinite loop and moving on if not?
Anyhow - I think you see where I'm going here....
I focused mainly on HTTP as a motivating example, but the same question applies for any normally blocking operations - how does it all work at the bottom?
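For what it's worth, here is a small runnable sketch of what "fire it off and come back later" looks like from userland in Node (it assumes example.com is reachable on port 80). The write returns immediately, and the bytes that eventually arrive are handed to you via 'data' events; your code never polls a buffer, because the event loop's demultiplexer (epoll/kqueue/IOCP) does the waiting and invokes your listeners when the socket is readable:

    const net = require('net');

    const socket = net.connect(80, 'example.com', () => {
      socket.write('HEAD / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n');
      console.log('request written; nothing is blocked while we wait');
    });

    socket.on('data', (chunk) => {
      console.log('response arrived:', chunk.toString().split('\r\n')[0]);
    });
    socket.on('end', () => console.log('server closed the connection'));
    socket.on('error', (err) => console.error('connection failed:', err.message));

    console.log('end of script body; the event loop keeps the process alive');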
Here are some of the resources I found helpful while researching the topic and which informed this question:
David Beazley's excellent video Build Your Own Async, where he walks you through a simple implementation of a scheduler which fires callbacks and suspends execution by sleeping on a waiting queue. I found this video tremendously instructive, but it stops a bit short: it shows how using an async sleep frees up the scheduler to execute other workers, but doesn't really go into what happens when you call code in those workers that itself must be non-blocking so it plays nice with the scheduler.
How does non-blocking IO work under the hood - This got me further along in my understanding, but still left me with a few uncertainties.
We all know that in Node.js, the functions are handled by a worker thread for execution and then sent to the event queue, and then the event loop looks into the call stack.
If the call stack is empty, then the event loop moves the function's context environment to the call stack, and then the call stack processes it and gives it as a response.
My question is: if we have multiple functions with the same timeout, and all the functions are given to the worker thread, then the worker thread sends their context environments to the event queue,
and if the timeouts of all the functions are the same, then they all come into the event queue at the same time, and then if the call stack is empty the event loop will send all the functions to the call stack, and we all know the property of a stack is FILO.
So if this happened, the last function should be sent in the response first, but this is not happening: the first function comes back in the response first when all the timeouts are the same. Why?
There are lots of things wrong in how you describe things in your question, but I'll speak to the timeout issue that you ask about.
nodejs has its own timer system. It keeps a sorted list of timers and the ONLY Javascript timeout that has a physical system timer is the next one to fire. If multiple Javascript timeouts are set to fire at the same point in time, then they all share that one OS timer.
When that OS timer fires and when the main JS thread is free and able to pull the next event from the event loop, it will see a JS timer is ready to call its callback. If there are more than one ready to go, all for the same time, then the interpreter will call each of their callbacks one after the other, in the order the timers were configured (FIFO).
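A quick runnable check of that FIFO behaviour: three timers set for the same point in time fire in the order they were created, one callback after another:

    setTimeout(() => console.log('first'), 100);
    setTimeout(() => console.log('second'), 100);
    setTimeout(() => console.log('third'), 100);
    // Prints: first, second, third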
We all know that in Node.js, the functions are handled by a worker thread for execution and then sent to the event queue, and then the event loop looks into the call stack. If the call stack is empty, then the event loop moves the function's context environment to the call stack, and then the call stack processes it and gives it as a response.
My question is: if we have multiple functions with the same timeout, and all the functions are given to the worker thread
That part is wrong. They aren't given to a worker thread.
Then the worker thread sends their context environments to the event queue, and if the timeouts of all the functions are the same then they all come into the event queue at the same time
As I described above, timers are a special type of event in the event loop code. They use only one system timer at a time to schedule the next timer to fire. Multiple timers set to fire at the same time are all stored in the same list (a list element for that particular time) and all share the same OS timer when it's their turn to be next. So, they don't all come into the event queue at the same time. Nodejs has set one system timer for the group of timers set to fire at the same time. When that system timer fires and when the interpreter is free to pull the next event from the event loop, then it will call each callback for each timer set for that time one after another in FIFO order.
and then if the call stack is empty the event loop will send all the functions to the call stack, and we all know the property of a stack is FILO.
I don't know what "send all the functions to that call stack" means. That's not how things work at all. node.js runs your Javascript in a single thread (except for WorkerThreads which are not in play here) so it calls the callback for one timer, runs that to completion, then calls the callback for the next timer, runs it to completion and so on...
so if this happened, the last function should be sent in the response first, but this is not happening: the first function comes back in the response first when all the timeouts are the same?
As I've said a couple times above, multiple timers set to fire at the same time use one system timer, and when that system timer fires, the callbacks for each of those timers are called one after the other in FIFO order.
References
You can learn more about the node.js timer architecture here: How does Node.js manage timers.
And, here's some more info about the node.js timer architecture taken from comments in the source code: How many concurrent setTimeouts before performance issues?
Looking for a solution between setting lots of timers or using a scheduled task queue
Six Part Series on the Node.js Event Loop and How It Works
My question is: if we have multiple functions with the same timeout...

The poll phase controls when timers are executed.
First read here. I highly recommend this series.
So the main question is how does Node decide what code to run next?
As we know, the Event Loop and the Worker Pool maintain queues for pending events and pending tasks, respectively (don't confuse the Worker Pool with worker threads, which are a different concept; read here).
But in reality the Event Loop does not actually maintain a queue. Instead, it has a collection of file descriptors that it asks the operating system to monitor, using a mechanism like epoll (Linux), kqueue (OSX), event ports (Solaris), or IOCP (Windows). These file descriptors correspond to network sockets, any files it is watching, and so on. When the operating system says that one of these file descriptors is ready, the Event Loop translates it to the appropriate event and invokes the callback(s) associated with that event.
The Worker Pool uses a real queue whose entries are tasks to be processed. A Worker pops a task from this queue and works on it, and when finished the Worker raises an "At least one task is finished" event for the Event Loop.
When a timer callback is called depends on the performance of the system (Node has to check the timer for expiration once before executing the callback, which takes some CPU time) as well as on what is currently running in the event loop.
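A small runnable illustration of that: the requested delay is only a minimum, and a burst of synchronous work keeps the event loop busy, so the 100 ms timer fires noticeably later than requested:

    const start = Date.now();
    setTimeout(() => {
      console.log(`requested 100 ms, actually fired after ${Date.now() - start} ms`);
    }, 100);

    // Simulate synchronous work that occupies the loop for ~250 ms.
    const busyUntil = Date.now() + 250;
    while (Date.now() < busyUntil) { /* spin */ }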
How do I fire an event EXACTLY after a given time in milliseconds? Is there any module for that? I was looking for it on Google, but didn't find anything satisfactory...
You can't control the exact execution time of code in the Event Loop. If you need this, then you should look at using a different framework/language.
Understanding the Node.js Event Loop, Timers and process.nextTick()
There are no guarantees of when setTimeout() will run, only a guaranteed minimum of how long it will wait, see below excerpt from the above guide:
setTimeout() schedules a script to be run after a minimum threshold in ms has elapsed.
The closest you have is process.nextTick() and even then you're at the mercy of the Event Queue because other things queued with process.nextTick() can occur before yours. This is also dangerous due to possible starvation of the Event Loop if not implemented correctly.
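Here is a hedged sketch of that starvation risk: while process.nextTick() keeps re-queueing itself, the event loop never reaches the timers phase, so the 0 ms timer below only fires once the nextTick chain stops (a counter guard keeps the example finite):

    let spins = 0;
    function hog() {
      if (++spins < 1e6) process.nextTick(hog); // guard so the example terminates
    }
    process.nextTick(hog);

    setTimeout(() => {
      console.log(`timer finally ran after ${spins} nextTick callbacks`);
    }, 0);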
Is "event based" the same as "asynchronous"?
No, it doesn't imply that the events are asynchronous.
In an event-driven, single-threaded system, you can fire events, but they are all processed serially. They may yield as part of their processing, but nothing happens concurrently: if they yield, they stop processing and have to wait until they are messaged again to start processing again.
Examples of this are Swing (Java), Twisted (Python), Node.js (JavaScript), and EventMachine (Ruby).
All of these examples are event-driven message loops, but they are all single-threaded: every event will block all subsequent events on that same thread.
In programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
So just because something is event driven doesn't make it asynchronous, and just because something is asynchronous doesn't make it event driven either; much less concurrent.
They are essentially orthogonal concepts.
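A runnable illustration of the distinction using Node itself: EventEmitter is event driven but not asynchronous, because emit() calls every listener synchronously, on the same call stack, before it returns:

    const EventEmitter = require('events');

    const bus = new EventEmitter();
    bus.on('ping', () => console.log('listener ran'));

    console.log('before emit');
    bus.emit('ping');          // the listener runs right here
    console.log('after emit'); // only printed once the listener has finished
    // Prints: before emit, listener ran, after emit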
"event driven" essentially means that the code associated to certain function calls is bind at runtime (and can change through the execution).
Who "fires" the event doesn't know what will handle it, and who handle the event is defined to respond to the event through an association defined while the program executes. Typically though function pointers, reference or pointers to object carrying virtual methods etc.)
"asynchronous" means that a program flow doesn't have to wait for a call to be executed before proceed over (mostly implemented with a call that returns immediately after delegating the execution to another thread or process)
Not all events are asynchronous (think of the Windows SendMessage as opposed to PostMessage), and not all asynchronous calls are necessarily implemented with "events" (although the event mechanism is quite commonly used to implement asynchronous calls).
One meaning of asynchronous is that at the point where you start a computation, you don't wait for an answer; you get the answer later. The answer comes in orthogonally to your normal control flow.
One way the answer comes in is via events: they happen spontaneously in this case, without your code triggering them directly. In a handler you may process the result.
Whereas in synchronous mode the computation and its answer are connected by the point in the control flow, in asynchronous mode you have to make that connection yourself, for example by using a sequence number or something similar.
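A minimal sketch of "making the connection yourself" (the names are made up for the example, not a real API): each request carries an id, and when the reply arrives later the id is used to route it to the code that was waiting for it:

    const pending = new Map();
    let nextId = 0;

    function sendRequest(payload, onReply) {
      const id = nextId++;
      pending.set(id, onReply);
      // Simulate a reply arriving later, out of band, after a random delay.
      setTimeout(() => handleReply({ id, result: payload.toUpperCase() }), Math.random() * 50);
    }

    function handleReply(reply) {
      const onReply = pending.get(reply.id); // the sequence number makes the connection
      pending.delete(reply.id);
      onReply(reply.result);
    }

    sendRequest('first', (result) => console.log('first  ->', result));
    sendRequest('second', (result) => console.log('second ->', result));
    // The replies may arrive in either order, but each reaches the right callback.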