Does Node.js run single-threaded or multi-threaded? - node.js

I read an article saying that Node.js works as a single thread, handling the second request only after it has finished the first. So I tried to test it. I had some code like this (a reconstruction is sketched at the end of this question). Then I sent three requests sequentially:
http://localhost:3000/?wait=1 : start at **14:29:14**.469Z
http://localhost:3000/?wait=1 : start at **14:29:19**.496Z
http://localhost:3000/ : start at **14:29:15**.725Z
I don't understand. I thought the third request should have started at 14:29:19, right? Or did I misunderstand? Please explain it to me, thanks everyone!
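(The code was not included above; based on the answer below it was presumably something along these lines. The wait() helper, the port, and the logging are assumptions, not the original code:)
const express = require('express')
const app = express()

// Hypothetical helper: resolve after `seconds` seconds using setTimeout.
function wait (seconds) {
  return new Promise(resolve => setTimeout(resolve, seconds * 1000))
}

app.get('/', async function (req, res) {
  console.log('start at', new Date().toISOString())
  if (req.query.wait) {
    await wait(Number(req.query.wait))   // non-blocking pause
  }
  res.send('done')
})

app.listen(3000)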

JavaScript (not just Node.js, but also your web browser, whether it's Chrome, Safari, IE or Edge) is single-threaded. The structure of the JavaScript interpreter can generally be described by the following pseudocode:
script_to_exec = compile_js_from_files()
do {
    event_queue = execute_javascript(script_to_exec)
    completed_events = wait_for_events()
    for_each (event from completed_events) {
        this_event = event_queue.find_one_matching(event)
        script_to_exec = this_event.callback
    }
} while (there_is_pending_events)
You may notice that there is no threading code at all in the above code. All we are doing in wait_for_events() is using the asynchronous I/O API provided by the OS. Depending on the OS this can be something like POSIX select() or poll(), Linux epoll(), BSD/macOS kqueue(), or Windows overlapped I/O. Node uses the libuv library, which automatically selects the best API to use at compile time.
Loop round 1
OK. So in the first round of the loop it executes your entire script. This sets up a TCP socket listener when you set up Express. Then it gets stuck at wait_for_events().
The first HTTP request causes wait_for_events() to return. This causes Express to go through your list of middlewares and routes to find a route to execute. It finds your route and calls wait() which calls setTimeout() which adds its callback to the list of timers.
Loop round 2
Since there is no more javascript to execute until await returns we go round the loop again and get stuck at wait_for_events() again.
The second HTTP request causes wait_for_events() to return. This causes Express to go through your list of middlewares and routes and repeat what I described previously.
Loop round 3
Since there is no more javascript to execute until await returns we go round the loop again and get stuck at wait_for_events() again.
The third HTTP request causes wait_for_events() to return. This causes Express to go through your list of middlewares and routes and repeat what I described previously.
Loop round 4
Again, we go round the loop again and get stuck at wait_for_events().
The first await wait() timer expires. This causes wait_for_events() to return and javascript figures out that the event is the timer for that wait therefore it continues processing the first HTTP request. This causes Express to send a response.
Loop round 5
We go round the loop again and wait for wait_for_events().
The second await wait() timer expires. This causes wait_for_events() to return and javascript figures out that the event is the timer for that wait therefore it continues processing the second HTTP request. This causes Express to send a response.
Loop round 6
We go round the loop again and wait for wait_for_events().
The third await wait() timer expires. This causes wait_for_events() to return and javascript figures out that the event is the timer for that wait therefore it continues processing the third HTTP request. This causes Express to send a response.
Loop round 7
We go round the loop again and get stuck at wait_for_events() again waiting for more HTTP requests...
Summary
Basically, JavaScript can wait for multiple things in parallel. This evolved out of the fact that it started as a GUI scripting language that needed the ability to wait for things like mouse clicks and keyboard input. But it can only execute one instruction at a time (unless, of course, you deliberately use features that spawn additional processes/threads, such as web workers or worker threads).
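A tiny sketch of that point (not from the original answer, just an illustration): both timers below are waited for at the same time, but their callbacks still run one after the other on the single JS thread.
setTimeout(() => console.log('A fired after ~1s'), 1000)
setTimeout(() => console.log('B fired after ~1s'), 1000)
console.log('both timers scheduled without blocking')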
If you're wondering why people use node.js for high-performance web servers, I have written an answer to this related question explaining why node is fast: Node js architecture and performance
Here's some of my other answers going into different levels of details about asynchronous execution:
I know that callback function runs asynchronously, but why?
Is my understanding of asynchronous operations correct?
Is there any other way to implement a "listening" function without an infinite while loop?
node js - what happens to incoming events during callback excution

Related

How many threads does Node really have?

As Node uses a JavaScript engine internally, it means Node has only one thread, but when I checked online some people say two threads, some say one thread per core, and some say something else. Can anyone clarify?
And if it is single-threaded, then how many requests can it handle concurrently? Because if 1000 requests come in at the same time and each request takes 100 ms, then the 1001st will take 100 seconds (100 ms × 1000 = 100,000 ms = 100 s).
So does that mean Node is not good for a large number of users?
Based on your writing, I can tell you know something about programming, some logic, and some basic math, but you are getting into JavaScript (not java-script) now.
That background makes you an excellent fit for this new server-side JavaScript with Node.js, and I foresee you becoming great at it.
What is clear to me is that you are confusing parallelism with concurrency, and this post here is going to be very useful for you.
But a TLDR could be something like this:
Parallelism: 2 or more processes/threads running at the same time
Concurrency: the main single process/thread does not wait for an IO operation to end; it keeps doing other things and gets back to it whenever the IO operation ends
IO (Input/Output) operations involve interactions with the OS (Operating System) by using the DISK or the NETWORK, for example.
In nodejs, those tasks are asynchronous, and that's why they ask for a callback, or they are Promise based.
const fs = require('fs')

function myCallbackFunc (err, data) {
  if (err) {
    return console.error(err)
  }
  console.log(data)
}

// Both reads are started back to back; each callback runs whenever its file is ready.
fs.readFile('./some-large-file', myCallbackFunc)
fs.readFile('./some-tiny-file', myCallbackFunc)
A simple way to put it: you could theoretically say that in the example above you'll have a
main thread taking care of your code and another one (which you don't control at all) watching the asynchronous requests; that second one will call myCallbackFunc whenever the IO operation, which happens concurrently, ends.
So YES, nodejs is GREAT for a large number of requests.
Of course, that single process will still share the same computational power with all the concurrent requests it is taking care of.
Meaning that if, within the callback above, you are doing some heavy computational task that takes a while to execute, the second call will have to wait for the first one to end.
In that case, if you are running it on your own Server/Container, you can make use of real parallelism by forking the process using the native cluster module that already comes with Node.js :D
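A minimal sketch of that idea, assuming a plain HTTP server (the port and response body are made up):
const cluster = require('cluster')
const http = require('http')
const os = require('os')

if (cluster.isMaster) {
  // Fork one worker process per CPU core; each worker gets its own event loop.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }
} else {
  // Workers share the same listening port; connections are distributed among them.
  http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid + '\n')
  }).listen(3000)
}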
Node has one main thread (that's why it's called single-threaded) which receives all requests and then hands work off to internal threads (these are not the main thread; they are smaller worker threads) in the thread pool. Node.js puts all incoming requests in an event queue, and the Node server continuously runs an internal event loop which checks whether any request is placed in the event queue. If not, it waits indefinitely for incoming requests; otherwise it picks one request from the event queue.
It then checks thread availability in the internal thread pool; if a thread is available, it picks one and assigns the request to it.
That thread is responsible for taking the request, processing it, performing any blocking IO operations, preparing the response and sending it back to the event loop.
The event loop, in turn, sends that response to the respective client.

Express handling CPU-heavy requests

All:
I am pretty new to Node and Express. I wonder if anyone could give me a short explanation of how an Express server handles requests. My server gets stuck when I make a CPU-heavy request, for a simple example:
app.get("/", function(req, res){
// some long time math calculation in for loop or while to get data
res.json(data);
})
I thought Node was an event-loop-based single process: when a task is fired, the task is put into a thread pool, and this single process keeps asking whether it has finished or not. For my example, I thought app.get could put the handler function in that pool and keep waiting for other requests, but unfortunately it does not seem to work that way.
So how exactly does app.get work, and how can I separate this heavy calculation task so that this request handler does not block other requests?
Any example would be appreciated.
Thanks
NodeJS, like browser JavaScript, is single-threaded. Tasks to be executed (such as your handler function) are put in the event queue, and executed in sequence. Therefore, if one function is taking a long time to execute, it will prevent other stuff in the queue (such as another request) from running.
If you don't want your long computation to prevent other requests from being processed, you can split it up across multiple turns of the event loop, e.g. with setImmediate(). See this page on the NodeJS event loop.
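For example, a hedged sketch of that approach (the /sum route, the workload, and the chunk size are made up for illustration):
app.get('/sum', function (req, res) {
  const limit = 1e9
  let total = 0
  let i = 0

  function doChunk () {
    // Do a bounded amount of work per turn of the event loop.
    const end = Math.min(i + 1e6, limit)
    for (; i < end; i++) {
      total += i
    }
    if (i < limit) {
      setImmediate(doChunk)   // yield so other requests can be handled in between
    } else {
      res.json({ total: total })
    }
  }

  doChunk()
})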
A (probably better) option would be to do the computation in another thread/process, so that it doesn't interfere with request handling at all.
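One way to do that, sketched under the assumption of Node.js 12+ and a hypothetical heavy-calc.js worker file, is the built-in worker_threads module:
const { Worker } = require('worker_threads')

app.get('/', function (req, res) {
  // Run the heavy calculation in a separate thread so the event loop stays free.
  const worker = new Worker('./heavy-calc.js', { workerData: { n: 1e9 } })
  worker.once('message', function (result) {
    res.json(result)
  })
  worker.once('error', function (err) {
    res.status(500).json({ error: err.message })
  })
})

// heavy-calc.js (the worker side)
const { parentPort, workerData } = require('worker_threads')
let total = 0
for (let i = 0; i < workerData.n; i++) {
  total += i
}
parentPort.postMessage({ total: total })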

Single threaded and Event Loop in Node.js

First of all, I am a starter trying to understand what Node.js is. I have two questions.
First Question
From the article of Felix, it said "there can only be one callback firing at the same time. Until that callback has finished executing, all other callbacks have to wait in line".
Then, consider the following code (copied from the Node.js official website):
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
If two client requests are received simultaneously, it means the following workflow:
The first HTTP request event is received, then the second request event is received.
As soon as the first event is received, the callback function for the first event starts executing.
Meanwhile, the callback function for the second event has to wait.
Am I right? If I am right, how does Node.js cope when there are thousands of client requests within a very short time?
Second Question
The term "Event Loop" is mostly used in Node.js topic. I have understood "Event Loop" as the following from http://www.wisegeek.com/what-is-an-event-loop.htm;
An event loop - or main loop, is a construct within programs that
controls and dispatches events following an initial event.
The initial event can be anything, including pushing a button on a
keyboard or clicking a button on a program (in Node.js, I think the
initial events will be http request, db queries or I/O file access).
This is called a loop, not because the event circles and happens
continuously, but because the loop prepares for an event, checks the
event, dispatches an event and repeats the process all over again.
I am conflicted about the second paragraph, especially the phrase "repeats the process all over again". I accept that the http.createServer code from the question above is definitely an "event loop" because it repeatedly listens for HTTP request events.
But I don't know whether to classify the following code as event-driven or an event loop. It does not repeat anything except the callback function fired after the DB query finishes.
database.query("SELECT * FROM table", function(rows) {
  var result = rows;
});
Please, let me hear your opinions and answers.
Answer one: your logic is correct, the second event will wait, and it will execute when its queued callback's turn comes.
Also, remember that there is no such thing as "simultaneously" in the technical world. Everything has a very specific place and time.
The way node.js manages thousands of connections is that there is no need to hold a thread idling while some database call blocks the logic, or while another IO operation is processing (like streams, for example). It can "serve" the first request, maybe creating more callbacks, and proceed to others.
Because there is no way to block execution (except nonsense like while(true) and similar), it becomes extremely efficient at spreading actual resources over all of the application logic.
Threads are expensive, and a server's capacity in threads is directly related to the available memory. So most classic web applications would suffer just because RAM is used on threads that are simply idling while a database query (or similar) blocks. In node that's not the case.
Still, it allows you to create multiple processes (via child_process or through cluster), which expands the possibilities even more.
Answer two: there is no such "loop" as the one you might be thinking about. There is no loop behind the scenes that actively checks whether there are connections or data received and so on; nowadays that is handled by async mechanisms as well.
So from the application's point of view there is no "main loop", and from the developer's point of view everything is event-driven (not event-loop).
In the case of http.createServer, you bind a callback as the response to requests. All socket operations and IO stuff happen behind the scenes, as well as HTTP handshaking, parsing of headers, queries, parameters, and so on. Once the behind-the-scenes work is done, it keeps the data and pushes your callback onto the event loop with that data. Once the event loop is free and the time comes, it executes your callback, in the node.js application context, with the data from behind the scenes.
With a database request it's the same story: it prepares and sends the query (possibly asynchronously again), and then calls back once the database responds and the data has been prepared for the application context.
To be honest, all you need with node.js is to understand the concept of events, not their implementation.
And the best way to do that is to experiment.
1) Yes, you are right.
It works because everything you do with node is primarily I/O bound.
When a new request (event) comes in, it's put into a queue. At initialization time, Node allocates a ThreadPool which is responsible for spawning threads for I/O-bound processing, like network/socket calls, database access, etc. (this is non-blocking).
Now, your "callbacks" (or event handlers) are extremely fast because most of what you are doing is most likely CRUD and I/O operations, not CPU intensive.
Therefore, these callbacks give the feeling that they are being processed in parallel, but they are actually not, because the actual parallel work is being done via the ThreadPool (with multi-threading), while the callbacks per-se are just receiving the result from these threads so that processing can continue and send a response back to the client.
You can easily verify this: if your callbacks are heavy CPU tasks, you can be sure that you will not be able to process thousands of requests per second, and it scales really badly compared to a multi-threaded system.
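A hedged sketch of that experiment (port and workload are made up): a purely CPU-bound handler like this blocks the single JS thread, so a second request has to wait for the first to finish.
const http = require('http')

http.createServer(function (req, res) {
  let total = 0
  for (let i = 0; i < 5e9; i++) {   // nothing here can be offloaded to the thread pool
    total += i
  }
  res.end('done: ' + total + '\n')
}).listen(3000)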
2) You are right, again.
Unfortunately, due to all these abstractions, you have to dive deeper in order to understand what's going on in the background. However, yes, there is a loop.
In particular, Nodejs is implemented with libuv.
Interesting to read.
But I don't know whether to classify the following code as event-driven or an event loop. It does not repeat anything except the callback function fired after the DB query finishes.
Event-driven is a term you normally use when there is an event loop; it means an app driven by events such as a button click, data arriving, etc. Normally you associate a callback with such events.

How the single-threaded non-blocking IO model works in Node.js

I'm not a Node programmer, but I'm interested in how the single-threaded non-blocking IO model works.
After I read the article understanding-the-node-js-event-loop, I'm really confused about it.
It gave an example for the model:
c.query(
  'SELECT SLEEP(20);',
  function (err, results, fields) {
    if (err) {
      throw err;
    }
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end('<html><head><title>Hello</title></head><body><h1>Return from async DB query</h1></body></html>');
    c.end();
  }
);
Question: when there are two requests, A (which comes first) and B, and there is only a single thread, the server-side program handles request A first: the SQL query with the SLEEP statement stands in for an I/O wait. The program is stuck in that I/O wait and cannot execute the code which renders the web page afterwards. Will the program switch to request B during the wait? In my opinion, because of the single-thread model, there is no way to switch from one request to another. But the title of the example article says that everything runs in parallel except your code.
(P.S. I'm not sure whether I misunderstand the code or not, since I have never used Node.) How does Node switch from A to B during the wait? And can you explain the single-threaded non-blocking IO model of Node in a simple way? I would appreciate it if you could help me. :)
Node.js is built upon libuv, a cross-platform library that abstracts apis/syscalls for asynchronous (non-blocking) input/output provided by the supported OSes (Unix, OS X and Windows at least).
Asynchronous IO
In this programming model, open/read/write operations on devices and resources (sockets, filesystem, etc.) don't block the calling thread (as in the typical synchronous C-like model) and just mark the process (in a kernel/OS-level data structure) to be notified when new data or events are available. In the case of a web-server-like app, the process is then responsible for figuring out which request/context the notified event belongs to, and for proceeding to process the request from there. Note that this necessarily means you'll be on a different stack frame from the one that originated the request to the OS, as the latter had to yield to the process's dispatcher in order for a single-threaded process to handle new events.
The problem with the model I described is that it's unfamiliar and hard for the programmer to reason about, as it's non-sequential in nature: "You need to make a request in function A and handle the result in a different function, where your locals from A are usually not available."
Node's model (Continuation Passing Style and Event Loop)
Node tackles the problem by leveraging JavaScript's language features to make this model look a little more synchronous, by inducing the programmer to employ a certain programming style. Every function that requests IO has a signature like function (...parameters..., callback) and needs to be given a callback that will be invoked when the requested operation is completed (keep in mind that most of the time is spent waiting for the OS to signal the completion - time that can be spent doing other work). JavaScript's support for closures allows you to use variables you've defined in the outer (calling) function inside the body of the callback - this lets you keep state between the different functions that will be invoked by the node runtime independently. See also Continuation Passing Style.
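A small sketch of what that looks like in practice (the db object and fetchUser call are hypothetical; only the closure mechanics matter here):
function handleRequest (userId, res) {
  const startedAt = Date.now()                 // local to the calling function
  db.fetchUser(userId, function (err, user) {
    if (err) return res.end('error')
    // The callback still sees userId and startedAt through the closure,
    // even though handleRequest returned long before the query finished.
    res.end(user.name + ' loaded in ' + (Date.now() - startedAt) + 'ms')
  })
}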
Moreover, after invoking a function spawning an IO operation the calling function will usually return control to node's event loop. This loop will invoke the next callback or function that was scheduled for execution (most likely because the corresponding event was notified by the OS) - this allows the concurrent processing of multiple requests.
You can think of node's event loop as somewhat similar to the kernel's dispatcher: the kernel would schedule for execution a blocked thread once its pending IO is completed, while node will schedule a callback when the corresponding event has occurred.
Highly concurrent, no parallelism
As a final remark, the phrase "everything runs in parallel except your code" does a decent job of capturing the point that node allows your code to handle requests from hundreds of thousands of open sockets with a single thread concurrently, by multiplexing and sequencing all your js logic in a single stream of execution (even though saying "everything runs in parallel" is probably not correct here - see Concurrency vs Parallelism - What is the difference?). This works pretty well for webapp servers as most of the time is actually spent on waiting for network or disk (database / sockets) and the logic is not really CPU intensive - that is to say: this works well for IO-bound workloads.
Well, to give some perspective, let me compare node.js with apache.
Apache is a multi-threaded HTTP server, for each and every request that the server receives, it creates a separate thread which handles that request.
Node.js on the other hand is event driven, handling all requests asynchronously from single thread.
When A and B are received on apache, two threads are created which handle the requests, each handling its query separately and each waiting for the query results before serving the page. The page is only served once the query is finished. The query fetch is blocking because the server cannot execute the rest of the thread until it receives the result.
In node, c.query is handled asynchronously, which means while c.query fetches the results for A, it jumps to handle c.query for B, and when the results for A arrive, it sends them to the callback, which sends the response. Node.js knows to execute the callback when the fetch finishes.
In my opinion, because it's a single-thread model, there is no way to
switch from one request to another.
Actually the node server does exactly that for you all the time. To make switches, (the asynchronous behavior) most functions that you would use will have callbacks.
Edit
The SQL query is taken from the mysql library. It implements a callback style as well as an event emitter to queue SQL requests. It does not execute them asynchronously itself; that is done by the internal libuv threads, which provide the abstraction of non-blocking I/O. The following steps happen when making a query:
Open a connection to the db; the connection itself can be made asynchronously.
Once the db is connected, the query is passed on to the server. Queries can be queued.
The main event loop gets notified of the completion via a callback or event.
The main loop executes your callback/event handler.
The incoming requests to the http server are handled in a similar fashion. The internal thread architecture is roughly this: the C++ threads are the libuv ones, which do the asynchronous I/O (disk or network), while the main event loop continues to execute after dispatching the request to the thread pool. It can accept more requests as it does not wait or sleep. SQL queries/HTTP requests/file system reads all happen this way.
Node.js uses libuv behind the scenes. libuv has a thread pool (of size 4 by default). Therefore Node.js does use threads to achieve concurrency.
However, your code runs on a single thread (i.e., all of the callbacks of Node.js functions will be called on the same thread, the so called loop-thread or event-loop). When people say "Node.js runs on a single thread" they are really saying "the callbacks of Node.js run on a single thread".
Node.js is based on the event loop programming model. The event loop runs in a single thread and repeatedly waits for events, then runs any event handlers subscribed to those events. Events can be, for example:
a timer wait is complete
the next chunk of data is ready to be written to this file
there's a fresh new HTTP request coming our way
All of this runs in a single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and themselves wait for yet more events, everything works out nicely. This allows multiple requests to be handled concurrently by a single Node.js process.
(There's a little bit of magic under the hood as to where the events originate. Some of it involves low-level worker threads running in parallel.)
In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
According to: "Event loop from 10,000ft - core concept behind Node.js".
The function c.query() has two arguments:
c.query("Fetch Data", "Post-Processing of Data")
The operation "Fetch Data" in this case is a DB query; this may be handled by Node.js by spawning off a worker thread and giving it the task of performing the DB query (remember, Node.js can create threads internally). This enables the function to return instantaneously, without any delay.
The second argument "Post-Processing of Data" is a callback function; the Node framework registers this callback, and it is later called by the event loop.
Thus the statement c.query(parameter1, parameter2) returns instantaneously, enabling Node to cater to another request.
P.S.: I have just started to understand Node. Actually, I wanted to write this as a comment to #Philip, but since I didn't have enough reputation points I wrote it as an answer.
If you read a bit further: "Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can't worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code."
About "everything runs in parallel except your code": your code is executed synchronously; whenever you invoke an asynchronous operation, such as waiting for IO, the event loop handles everything and invokes the callback. It's just not something you have to think about.
In your example: there are two requests, A (comes first) and B. You execute request A, your code continues to run synchronously and executes request B. The event loop handles request A; when it finishes, it invokes the callback of request A with the result. The same goes for request B.
Okay, most things should be clear so far... the tricky part is the SQL: if it is not in reality running in another thread or process in its entirety, the SQL execution has to be broken down into individual steps (by an SQL processor made for asynchronous execution!), where the non-blocking ones are executed and the blocking ones (e.g. the sleep) can actually be handed to the kernel (as an alarm interrupt/event) and put on the event list for the main loop.
That means, e.g., the interpretation of the SQL, etc. is done immediately, but during the wait (stored by the kernel as an event to come in the future, in some kqueue, epoll, ... structure, together with the other IO operations) the main loop can do other things and eventually check whether any of those IOs and waits have completed.
So, to rephrase it again: the program is never (allowed to get) stuck; sleeping calls are never executed. Their duty is done by the kernel (write something, wait for something to come over the network, wait for time to elapse) or by another thread or process. The Node process checks whether at least one of those duties has been finished by the kernel in the only blocking call to the OS, once in each event-loop cycle. That point is reached when everything non-blocking is done.
Clear? :-)
I don’t know Node. But where does the c.query come from?
The event loop is what allows Node.js to perform non-blocking I/O operations — despite the fact that JavaScript is single-threaded — by offloading operations to the system kernel whenever possible. Think of event loop as the manager.
New requests are sent into a queue and watched by the synchronous event demultiplexer; each operation's handler is also registered.
Then those requests are sent to the thread pool (Worker Pool) to be executed. JavaScript cannot perform asynchronous I/O operations itself. In the browser environment, the browser handles the async operations; in the Node environment, async operations are handled by libuv (in C++). The thread pool's default size is 4, but it can be changed at startup time by setting the UV_THREADPOOL_SIZE environment variable (the maximum is 128). A thread pool size of 4 means 4 requests can be executed at a time; if the event demultiplexer has 5 requests, 4 are passed to the thread pool and the 5th waits. Once each request is executed, the result is returned to the event demultiplexer.
When a set of I/O operations completes, the Event Demultiplexer pushes a set of corresponding events into the Event Queue.
The handler is the callback. Now the event loop keeps an eye on the event queue; if there is something ready, it is pushed onto the stack to execute the callback. Remember, callbacks eventually get executed on the stack. Note that some callbacks have priority over others; the event loop picks callbacks based on their priorities.
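A hedged sketch of the thread pool in action (crypto.pbkdf2 is one of the operations that runs on the libuv pool; the iteration count is just for illustration): with the default pool size of 4, only four of the five calls below run in parallel and the fifth waits for a free thread.
const crypto = require('crypto')

for (let i = 1; i <= 5; i++) {
  const start = Date.now()
  crypto.pbkdf2('password', 'salt', 1e6, 64, 'sha512', function () {
    // The first four calls finish at roughly the same time; the fifth finishes later.
    console.log('call', i, 'done after', Date.now() - start, 'ms')
  })
}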
For those who seek short answer and don't want to go to the deepest levels of Node.js internals.
Node.js is not single-threaded; it runs on 5 threads by default (one main thread plus the libuv thread pool of 4).
Yes, the only single thread is the one doing the actual JavaScript processing, but it constantly switches from function to function.
It sends an SQL query to a database and lets it wait in another thread, while the single-threaded Node.js continues to compute other code that is ready to be computed.
If you want more explanation, there are good articles about the Event Loop and the Worker Pool, and the whole libuv documentation.

How to use browser.wait() in zombie.js?

I've got a Web application that continuously polls for data from the server using Ajax requests. I would like to implement an integration test for it using zombie.js.
What I am trying to do is to wait until the Ajax poll loop receives data from the server. The data should be received after 20 seconds, so I use browser.wait(done, callback) to check if the data is there, and set waitFor to a maximum timeout of one minute.
However, browser.wait() always returns almost immediately, even if my done callback returns false.
In the zombie API documentation, I read the following about browser.wait():
... it can't wait forever, especially not for timers that may fire repeatedly (e.g. checking page state, long polling).
I guess that's the reason for the behavior I see, but I don't really understand what's going on. Why can't I wait for one minute until my poll loop receives data from the server? Why can't browser.wait() wait for timers that may fire repeatedly? What do I need to do to implement my test?
Zombie.js will by default wait until all the scripts on your page have loaded and executed, if they are waiting for document ready.
If I understand you correctly, your script will not execute until 20 seconds after document ready. In that case Zombie has a function which will let you evaluate JavaScript in the context of the browser, so you can kick off your Ajax code sooner if it is on a timer and you do not want to wait for it.
Look at browser.evaluate(expr)
Another option would be to simply use a normal JavaScript timeout to wait 20 seconds, and then look at the DOM for the changes you are expecting.
setTimeout(function () {
  browser.document.query("#interestingElement")
}, 20 * 1000);
