Understanding the Event-Loop in node.js

I've been reading a lot about the event loop, and I understand the abstraction whereby I can make an I/O request (say, fs.readFile('foo.txt')) and just pass in a callback that will be executed once the event signalling completion of the file read is fired. However, what I do not understand is where the function that does the actual work of reading the file is executed. JavaScript is single-threaded, yet two things are happening at once: the execution of my Node.js file and whatever program/function is actually reading data from the hard drive. Where does this second function run in relation to Node?

The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard-library calls, the Node C++ side and libuv decide to do expensive work outside of the event loop entirely, so they do not block the main/event loop. Instead they make use of something called a thread pool: a set of (by default) four threads that can be used for running computationally intensive or blocking tasks. Only four things use this thread pool - DNS lookups, fs, crypto and zlib. Everything else executes on the main thread.
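A rough sketch of this (not from the original answer): crypto.pbkdf2 is one of the calls that runs on the libuv thread pool, so with the default pool size of four, roughly four of these hashes complete together and the fifth completes later.
const crypto = require('crypto');

const start = Date.now();
for (let i = 1; i <= 5; i++) {
  // each pbkdf2 call is handed to a thread-pool thread, not the event loop
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
    console.log(`hash ${i} done after ${Date.now() - start} ms`);
  });
}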

"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/

It's like using setTimeout(function(){ /* file reading code here */ }, 1000);. JavaScript can appear to run multiple things side by side, like having three setInterval(function(){ /* code to execute */ }, 1000); timers pending at once, so in a way it feels multi-threaded even though the callbacks still run one at a time. And for actually reading from or writing to the hard drive in Node.js, you can use:
var child = require("child_process");

function put_text(file, text) {
  child.exec("echo " + text + " > " + file);
}

function get_text(file, callback) {
  // exec is asynchronous, so the file contents arrive via a callback
  child.exec("cat " + file, function (err, stdout) {
    callback(err, stdout);
  });
}
Child processes like these can also be used for reading from and writing to the hard drive in Node.js (though the fs module is the more usual route).

Related

How many threads does Node really have?

Since Node uses a JavaScript engine internally, does that mean Node has only one thread? I was checking online and some people say two threads, some say one thread per core, and some say something else. Can anyone clarify?
And if it is single-threaded, then how many requests can it handle concurrently? If 1,000 requests come in at the same time and each request takes 100 ms, the one in the 1,001st position will take 100 seconds (100 × 1000 = 100,000 ms = 100 s).
So does that mean Node is not good for a large number of users?
Based on your writing, I can tell you know something about programming, some logic, and some basic math, but you are getting into javascript (not java-script) now.
That background makes you an excellent fit for this new javascript on the server-side with nodejs, and I foresee you becoming great at it.
What is clear to me is that you are confusing parallelism with concurrency and this post here is going to be very useful for you.
But a TLDR could be something like this:
Parallelism: two or more processes/threads running at the same time.
Concurrency: the main single process/thread does not wait for an I/O operation to end; it keeps doing other things and gets back to it whenever the I/O operation finishes.
IO (Input/Output) operations involve interactions with the OS (Operating System) by using the DISK or the NETWORK, for example.
In nodejs, those tasks are asynchronous, and that's why they ask for a callback, or they are Promise based.
const fs = require('fs')

function myCallbackFunc (err, data) {
  if (err) {
    return console.error(err)
  }
  console.log(data)
}

fs.readFile('./some-large-file', myCallbackFunc)
fs.readFile('./some-tiny-file', myCallbackFunc)
A simplified way to put it: in the example above you have a main thread taking care of your code and another mechanism (which you don't control at all) observing the asynchronous requests, and that second mechanism will call myCallbackFunc whenever the I/O operation, which happens concurrently, ends.
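For comparison, the same reads with the promise-based API mentioned earlier (a minimal sketch using fs.promises, available in recent Node versions) might look like this:
const fsp = require('fs').promises

// the reads are started back to back; each .then runs when its file is ready
fsp.readFile('./some-large-file')
  .then(data => console.log(data))
  .catch(err => console.error(err))

fsp.readFile('./some-tiny-file')
  .then(data => console.log(data))
  .catch(err => console.error(err))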
So YES, nodejs is GREAT for a large number of requests.
Of course, that single process will still share the same computational power among all the concurrent requests it is taking care of.
Meaning that if within the callback above you are doing some heavy computational tasks that take a while to execute, the second call would have to wait for the first one to end.
In that case, if you are running it on your own Server/Container, you can make use of real parallelism by forking the process using the Cluster native module that already comes with nodejs :D
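A minimal sketch of that idea, assuming an HTTP workload (the port and handler are placeholders):
const cluster = require('cluster')
const http = require('http')
const os = require('os')

if (cluster.isMaster) {
  // fork one worker per CPU core; each worker gets its own event loop
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }
} else {
  http.createServer((req, res) => {
    // heavy computation here only stalls this worker, not the others
    res.end(`handled by worker ${process.pid}\n`)
  }).listen(3000)
}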
Node has one main thread (that's why it is called single-threaded) which receives all requests and then hands them to internal worker threads (smaller threads, not the main thread) in the thread pool. Node.js puts all incoming requests in an event queue, and the server continuously runs an internal event loop which checks whether any request has been placed in the event queue. If not, it waits indefinitely for incoming requests; otherwise it picks one request from the event queue.
It then checks thread availability in the internal thread pool; if a thread is available, it picks one up and assigns the request to it.
That thread is responsible for taking the request, processing it, performing any blocking I/O operations, preparing the response and sending it back to the event loop.
The event loop, in turn, sends that response to the respective client.

Can the same line of Node.js code run at the same time?

I'm pretty new to node and am trying to setup an express server. The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
Is there anything like a synchronized block in Node? I believe Node is single-threaded and has a few helper threads to handle asynchronous/callback code. Is it at all possible that the same line of code could run at exactly the same time in Node?
Thanks!
Can the same line of Node.js code run at the same time? Is it at all possible that the same line of code could run at exactly the same time in Node?
No, it is not. Your Javascript in node.js is entirely single threaded. An event is pulled from the event queue. That calls a callback associated with that event. That callback runs until it returns. No other events can be processed until that first one returns. When it returns, the interpreter pulls the next event from the event queue and then calls the callback associated with it.
This does not mean that there are no concurrency issues in node.js. There can be, but they are caused not by code running at the same physical time and creating conflicting access to shared variables (as can happen in threaded languages like Java). Concurrency issues can be caused by the asynchronous nature of I/O in node.js. In the asynchronous case, you call an asynchronous function and pass it a callback (or expect a promise in return). Your code then continues on and returns to the interpreter. Some time later an event will occur inside node.js native code that will add something to the event queue. When the interpreter is free from running other Javascript, it will process that event and then call your callback, which will cause more of your code to run.
While all this is "in process", other events are free to run and other parts of your Javascript can run. So the exposure to concurrency issues comes, not from simultaneous running of two pieces of code, but from one piece of your code running while another piece of your code is waiting for a callback to occur. Both pieces of code are "in process". They are not "running" at the same time, but both operations are waiting for something else to occur in order to complete. If these two operations access variables in ways that can conflict with each other, then you can have a concurrency issue in Javascript. Because none of this is pre-emptive (like threads in Java), it's all very predictable and much easier to design for.
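A contrived sketch of that kind of hazard (the counter and the timer stand in for real shared state and real I/O):
let counter = 0;

function handleRequest() {
  const snapshot = counter;        // read shared state
  setTimeout(() => {               // some async work finishes later...
    counter = snapshot + 1;        // ...and writes back a stale value
  }, 10);
}

handleRequest();
handleRequest();
// Both callbacks saw counter === 0, so counter ends up as 1, not 2:
// a concurrency bug without any parallel threads.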
Is there anything like a synchronized block in Node?
No, there is not. It is simply not needed in your Javascript code. If you wanted to protect something from some other asynchronous operation modifying it while your own asynchronous operation was waiting to complete, you can use simple flags or variables in your code. Because there's no preemption, a simple flag will work just fine.
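For the asker's email case, a minimal sketch of that flag approach (sendFailureEmail is a hypothetical function):
let emailSent = false;

function reportFailure(err) {
  if (emailSent) return;   // later failures skip the email
  emailSent = true;        // safe: no preemption between the check and the set
  sendFailureEmail(err);   // hypothetical mailer call
}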
I believe node is single threaded and has a few helper threads to handle asynchronous/callback code.
Node.js runs your Javascript as single threaded. Internally in its own native code, it does use threads in order to do its work. For example, the asynchronous file system access code internal to node.js uses threads for disk I/O. But these threads are only internal and the result of these threads is not to call Javascript directly, but to insert events in the event queue and all your Javascript is serialized through the event queue. Pull event from event queue, run callback associated with the event. Wait for that callback to return. Pull next event from the event queue, repeat...
The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
We'd really have to see what your code looks like to understand what exact problem you're trying to solve. I'd guess that you can just use a simple boolean (regular variable) in node.js, but we'd have to see your code to really understand what you're doing.

Wrapping a blocking call in nodejs

I'm trying to connect to a 3rd party library, that has a function that can block. I would like to use it, but without blocking. Is it possible to wrap a blocking call that I don't have the control of, to make it async?
// calling this function will block the nodejs thread
blockingCall();
What I would like would be something like this.
// wrapper for the blocking call
var wrapper = wrapBlockingCall(blockingCall);
wrapper.on('complete', function() {});
Is this possible? Does this make sense?
There is no way to make blocking JavaScript code non-blocking in Node.js - the mechanism which Node uses for its non-blocking behaviour is implemented in the C/C++ layer, which in turn is used only when doing I/O operations (reading from disk, networking, etc.).
In reality, every line of JavaScript your program uses will be executed one-by-one, because it is always executed on the same thread, no matter what you do.
The only option I see is to execute the offending code in a separate Node process using the built-in Child Process module. However, this will have significant performance impact, even bigger one if the code needs to be executed frequently.
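One possible sketch of that approach, assuming a hypothetical blocking-task.js script that performs the blocking call and reports back via process.send() (the blocking function has to live in the child script, since a function can't be sent to another process):
const { fork } = require('child_process');
const EventEmitter = require('events');

function wrapBlockingCall() {
  const emitter = new EventEmitter();
  const child = fork('./blocking-task.js');   // the blocking work runs in its own process
  child.on('message', result => emitter.emit('complete', result));
  return emitter;
}

const wrapper = wrapBlockingCall();
wrapper.on('complete', result => console.log('done:', result));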
Note:
After reading the comments under your question, it seems that you are actually the author of the blocking function, which in turn calls a C API which performs blocking I/O. There are ways of calling C functions which would normally block in a manner which does not block the upper JavaScript layer.
While I am not a C expert, I think this is accomplished using the libuv library included in Node - have a look at the addons documentation for more info.

Node Background Threads - When Do These Get Created?

I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sure this happens on a function call/callback, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req, res) => {
  objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is I/O intensive (it calls an Elasticsearch service on another box and brings back tens to hundreds of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are there ways to force a method etc. to "run in the background"?
I've been noodling this and testing it for a while now, but all on one machine, so it's difficult to tell what's going on.
Who can clarify this for me?
Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are the ways to effect/force a method etc to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
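A small sketch of that deferral idea: split a synchronous computation into chunks and yield back to the event loop between them with setImmediate.
function sumInChunks(numbers, done, total = 0) {
  if (numbers.length === 0) return done(total);
  // process one chunk synchronously...
  total += numbers.slice(0, 1000).reduce((a, b) => a + b, 0);
  // ...then yield so other callbacks can run before the next chunk
  setImmediate(() => sumInChunks(numbers.slice(1000), done, total));
}

sumInChunks(Array.from({ length: 10000 }, (_, i) => i), total => console.log('sum:', total));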
More precisely, all networking is asynchronous. End of story. Specifically, the APIs in node core that are available are all asynchronous, and there's simply no synchronous API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs must only be used under special limited circumstances, and if you don't know confidently that it's OK in this specific case to make a synchronous IO API call, you should use the asynchronous API so you don't break the lynchpin that makes node perform as it does.

How the single threaded non blocking IO model works in Node.js

I'm not a Node programmer, but I'm interested in how the single-threaded non-blocking IO model works.
After I read the article understanding-the-node-js-event-loop, I'm really confused about it.
It gave an example for the model:
c.query(
  'SELECT SLEEP(20);',
  function (err, results, fields) {
    if (err) {
      throw err;
    }
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end('<html><head><title>Hello</title></head><body><h1>Return from async DB query</h1></body></html>');
    c.end();
  }
);
Question: when there are two requests, A (which comes first) and B, and there is only a single thread, the server-side program will handle request A first. The SQL query is a sleep statement standing in for an I/O wait; the program is stuck waiting on that I/O and cannot yet execute the code that renders the web page. Will the program switch to request B during the wait? In my opinion, because of the single-thread model, there is no way to switch from one request to another. But the title of the example code says that everything runs in parallel except your code.
(P.S. I'm not sure whether I misunderstand the code, since I have never used Node.) How does Node switch from A to B during the wait? And can you explain the single-threaded non-blocking I/O model of Node in a simple way? I would appreciate it if you could help me. :)
Node.js is built upon libuv, a cross-platform library that abstracts apis/syscalls for asynchronous (non-blocking) input/output provided by the supported OSes (Unix, OS X and Windows at least).
Asynchronous IO
In this programming model, open/read/write operations on devices and resources (sockets, the filesystem, etc.) don't block the calling thread (as in the typical synchronous C-like model); they just mark the process (in a kernel/OS-level data structure) to be notified when new data or events are available. In the case of a web-server-like app, the process is then responsible for figuring out which request/context the notified event belongs to, and for proceeding with that request from there. Note that this necessarily means you'll be on a different stack frame from the one that originated the request to the OS, since the latter had to yield to the process's dispatcher in order for a single-threaded process to handle new events.
The problem with the model I described is that it's not familiar and hard to reason about for the programmer as it's non-sequential in nature. "You need to make request in function A and handle the result in a different function where your locals from A are usually not available."
Node's model (Continuation Passing Style and Event Loop)
Node tackles the problem leveraging javascript's language features to make this model a little more synchronous-looking by inducing the programmer to employ a certain programming style. Every function that requests IO has a signature like function (... parameters ..., callback) and needs to be given a callback that will be invoked when the requested operation is completed (keep in mind that most of the time is spent waiting for the OS to signal the completion - time that can be spent doing other work). Javascript's support for closures allows you to use variables you've defined in the outer (calling) function inside the body of the callback - this allows to keep state between different functions that will be invoked by the node runtime independently. See also Continuation Passing Style.
Moreover, after invoking a function spawning an IO operation the calling function will usually return control to node's event loop. This loop will invoke the next callback or function that was scheduled for execution (most likely because the corresponding event was notified by the OS) - this allows the concurrent processing of multiple requests.
You can think of node's event loop as somewhat similar to the kernel's dispatcher: the kernel would schedule for execution a blocked thread once its pending IO is completed, while node will schedule a callback when the corresponding event has occurred.
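A small illustration of that style (the file name is a placeholder): the callback closes over requestId from the calling function, so per-request state survives the asynchronous boundary without threads.
const fs = require('fs');

function handleRequest(requestId) {
  fs.readFile('./config.json', 'utf8', (err, data) => {
    // requestId is still available here thanks to the closure
    if (err) return console.error(`request ${requestId} failed:`, err);
    console.log(`request ${requestId} read ${data.length} chars`);
  });
  // control returns to the event loop as soon as readFile has been issued
}

handleRequest(1);
handleRequest(2);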
Highly concurrent, no parallelism
As a final remark, the phrase "everything runs in parallel except your code" does a decent job of capturing the point that node allows your code to handle requests from hundreds of thousands of open sockets with a single thread concurrently, by multiplexing and sequencing all your js logic in a single stream of execution (even though saying "everything runs in parallel" is probably not correct here - see Concurrency vs Parallelism - What is the difference?). This works pretty well for web-app servers because most of the time is actually spent waiting for the network or disk (database / sockets) and the logic is not really CPU intensive - that is to say: this works well for IO-bound workloads.
Well, to give some perspective, let me compare node.js with apache.
Apache is a multi-threaded HTTP server: for each and every request that the server receives, it creates a separate thread which handles that request.
Node.js, on the other hand, is event driven, handling all requests asynchronously from a single thread.
When A and B are received on Apache, two threads are created which handle the requests, each handling its query separately and each waiting for the query results before serving the page. The page is only served once the query is finished. The query fetch is blocking because the server cannot execute the rest of the thread until it receives the result.
In Node, c.query is handled asynchronously, which means that while c.query fetches the results for A, Node jumps to handle c.query for B, and when the results for A arrive it passes them to the callback, which sends the response. Node.js knows to execute the callback when the fetch finishes.
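You can simulate that interleaving with timers instead of a real database (the delays are made up; what matters is the order of the output):
function fakeQuery(label, ms, callback) {
  setTimeout(() => callback(label + ' results'), ms);   // stands in for a DB round trip
}

fakeQuery('A', 200, results => console.log(results));   // slow query, issued first
fakeQuery('B', 50, results => console.log(results));    // fast query, issued second
// prints "B results" before "A results": B is answered while A is still pending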
In my opinion, because it's a single thread model, there is no way to switch from one request to another.
Actually, the Node server does exactly that for you all the time. To make those switches possible (the asynchronous behaviour), most functions that you would use have callbacks.
Edit
The SQL query is taken from the mysql library. It implements a callback style as well as an event emitter to queue SQL requests. It does not execute them asynchronously itself; that is done by the internal libuv threads that provide the abstraction of non-blocking I/O. The following steps happen when making a query:
Open a connection to the DB; the connection itself can be made asynchronously.
Once the DB is connected, the query is passed on to the server. Queries can be queued.
The main event loop gets notified of the completion with a callback or event.
The main loop executes your callback/event handler.
Incoming requests to the HTTP server are handled in a similar fashion. The internal thread architecture is roughly as follows:
The C++ threads are the libuv ones, which do the asynchronous I/O (disk or network). The main event loop continues to execute after dispatching the request to the thread pool; it can accept more requests because it does not wait or sleep. SQL queries, HTTP requests and file system reads all happen this way.
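A minimal sketch of those steps, assuming the mysql npm package (the connection details are placeholders):
const mysql = require('mysql');
const connection = mysql.createConnection({ host: 'localhost', user: 'me', database: 'test' });

connection.connect();                              // step 1: connect (asynchronous)
connection.query('SELECT SLEEP(2)', (err, results) => {
  // steps 3-4: the event loop is notified and runs this callback
  if (err) throw err;
  console.log('query finished', results);
  connection.end();
});
// step 2: the query is queued and sent; meanwhile the main thread stays free
console.log('query dispatched, event loop keeps running');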
Node.js uses libuv behind the scenes. libuv has a thread pool (of size 4 by default). Therefore Node.js does use threads to achieve concurrency.
However, your code runs on a single thread (i.e., all of the callbacks of Node.js functions will be called on the same thread, the so called loop-thread or event-loop). When people say "Node.js runs on a single thread" they are really saying "the callbacks of Node.js run on a single thread".
Node.js is based on the event loop programming model. The event loop runs in single thread and repeatedly waits for events and then runs any event handlers subscribed to those events. Events can be for example
a timer wait is complete
the next chunk of data is ready to be written to a file
there's a fresh new HTTP request coming our way
All of this runs in a single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and themselves wait for yet more events, everything works out nicely. This allows multiple requests to be handled concurrently by a single Node.js process.
(There's a little bit of magic under the hood as to where the events originate. Some of it involves low-level worker threads running in parallel.)
In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
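A tiny server sketch of that concurrency (the route name and delay are invented): a slow, non-blocking handler does not hold up other requests, because the waiting happens off the JavaScript thread.
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/slow') {
    setTimeout(() => res.end('slow done\n'), 2000);   // simulated slow I/O
  } else {
    res.end('fast done\n');                           // answered immediately
  }
}).listen(3000);
// while /slow is pending, requests to any other path are still served right away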
According to: "Event loop from 10,000ft - core concept behind Node.js".
The function c.query() has two arguments:
c.query("Fetch Data", "Post-Processing of Data")
The operation "Fetch Data" in this case is a DB-Query, now this may be handled by Node.js by spawning off a worker thread and giving it this task of performing the DB-Query. (Remember Node.js can create thread internally). This enables the function to return instantaneously without any delay
The second argument "Post-Processing of Data" is a callback function, the node framework registers this callback and is called by the event loop.
Thus the statement c.query (paramenter1, parameter2) will return instantaneously, enabling node to cater for another request.
P.S.: I have just started to understand Node. I actually wanted to write this as a comment to #Philip, but since I didn't have enough reputation points I wrote it as an answer.
if you read a bit further - "Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code."
about - "everything runs in parallel except your code" - your code is executed synchronously, whenever you invoke an asynchronous operation such as waiting for IO, the event loop handles everything and invokes the callback. it just not something you have to think about.
in your example: there are two requests A (comes first) and B. you execute request A, your code continue to run synchronously and execute request B. the event loop handles request A, when it finishes it invokes the callback of request A with the result, same goes to request B.
Okay, most things should be clear so far... the tricky part is the SQL: if it is not in reality running in another thread or process in its entirety, the SQL execution has to be broken down into individual steps (by an SQL processor made for asynchronous execution!), where the non-blocking ones are executed and the blocking ones (e.g. the sleep) can actually be transferred to the kernel (as an alarm interrupt/event) and put on the event list for the main loop.
That means, e.g., the interpretation of the SQL, etc. is done immediately, but during the wait (stored by the kernel as an event to come in the future, in some kqueue, epoll, ... structure, together with the other I/O operations) the main loop can do other things and eventually check whether any of those I/Os have completed.
So, to rephrase it again: the program is never (allowed to get) stuck; sleeping calls are never executed. Their duty is done by the kernel (write something, wait for something to come over the network, wait for time to elapse) or by another thread or process. The Node process checks whether at least one of those duties has been finished by the kernel, in the only blocking call to the OS, once in each event-loop cycle. That point is reached when everything non-blocking is done.
Clear? :-)
I don’t know Node. But where does the c.query come from?
The event loop is what allows Node.js to perform non-blocking I/O operations, despite the fact that JavaScript is single-threaded, by offloading operations to the system kernel whenever possible. Think of the event loop as the manager.
New requests are sent into a queue and watched by the synchronous event demultiplexer. Each operation's handler is also registered.
Then those requests are handed to the thread pool (worker pool) to be executed. JavaScript cannot perform asynchronous I/O operations itself: in the browser environment the browser handles the async operations, and in the Node environment they are handled by libuv, using C++. The thread pool's default size is 4, but it can be changed at startup by setting the UV_THREADPOOL_SIZE environment variable (the maximum is 128). A thread pool size of 4 means 4 requests can be executed at a time; if the event demultiplexer has 5 requests, 4 are passed to the thread pool and the 5th waits. Once each request is executed, its result is returned to the event demultiplexer.
When a set of I/O operations completes, the event demultiplexer pushes a set of corresponding events into the event queue.
The handler is the callback. The event loop keeps an eye on the event queue; if there is something ready, it is pushed onto the stack to execute the callback. Remember that callbacks ultimately get executed on the stack. Note that some callbacks have priority over others; the event loop picks callbacks based on their priorities.
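A sketch of growing that pool (the size 8 is arbitrary): the variable has to be set before the pool is first used, either in the shell (UV_THREADPOOL_SIZE=8 node app.js) or at the very top of the entry script.
process.env.UV_THREADPOOL_SIZE = 8;        // must run before any thread-pool work is queued

const crypto = require('crypto');
for (let i = 1; i <= 8; i++) {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => console.log(`task ${i} done`));
}
// with a pool of 8, all eight hashes can be in flight at the same time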
For those who seek short answer and don't want to go to the deepest levels of Node.js internals.
Node.js is not single-threaded; it runs on 5 threads by default (one main thread plus a libuv thread pool of 4).
Yes, the only single thread is for the actual JavaScript processing, but it constantly switches from function to function.
It sends the SQL query to a database and lets the wait happen on another thread, while the single JavaScript thread continues to run whatever other code is ready.
If you want more explanation, there are good articles about the Event Loop and the Worker Pool, and the whole libuv documentation.
