How is Node.js asynchronous AND single-threaded at the same time? - node.js

The question title basically says it all, but to rephrase it:What handles the asynchronous function execution, if the main (and only) thread is occupied with running down the main code block?
So far I have only found that the async code gets executed elsewhere or outside the main thread, but what does this mean specifically?
EDIT: The proposed Node.js Event loop question's answers may also address this topic, but I was looking for a less complex, more specific answer, rather then an explanation of Node.js concept. Also, it does not show up in search for anything similar to "node asynchronous single-threaded".
EDIT, #Mr_Thorynque: Running a query to get data from database and log it to console. Nothing gets logged, because Node, being async, does not wait for query to finish and for data to populate. (this is just an example as requested, NOT a part of my question)
var = data;
mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
rows.forEach(function(row){
data += *gather the requested data*
});
});
console.log(data);

What it really comes down to is that the node process (which is single threaded) hands off the work to "something else". This could be the OS's I/O process, or a networked resource or whatever. By handing it off, it frees its thread to keep working on the next in-memory thing. It uses file handles to keep track of any pending work and in the event loop marries the two back together and fire the callback when the work is done.
Note that this is also why you can block processing in your own code if poorly designed. If your code runs complex tasks and doesn't hand off the work, you'll block the single thread.
This is about as simple an answer as I can make, I think that the details are well explained in the links in your comments.

Related

Play-Scala Async for Dummies

I tried researching and understanding the async and non-blocking ability of play.
What I understand(may be wrong):
Action.async leads to Future[Result] which are placeholders of results that are yet to be received. A request comes in and a method handles it (say a database query) but as the db call is made, the thread is freed up to take another request. So how does the system not lose track of that db call as a thread is no longer with it?
Once the result is received does an available thread then pick up on that result and responds with it?
Usually when learning new concepts I'd like an animated or layman's terms type video that visually shows threads and requests.
Also isn't there waiting that has to be done on every request anyway, is it just that resources are used when that waiting is being done?
Thanks in advance!

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile(foo.txt)) and just pass in a callback that will be executed once a particular event indicates completion of the file reading is fired. However, what I do not understand is where the function that is doing the work of actually reading the file is being executed. Javascript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However for some standard library function calls, the node C++ side and libuv decide to do expensive calculations outside of the event loop entirely. So they will not block the main loop or event loop. Instead they make use of something called a thread pool that thread pool is a series of (by default) four threads that can be used for running computationally intensive tasks. There are ONLY FOUR things that use this thread pool - DNS lookup, fs, crypto and zlib. Everything else execute in the main thread.
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
Its like using, setTimeout(function(){/*file reading code here*/},1000);. JavaScript can run multiple things side by side like, having three setInterval(function(){/*code to execute*/},1000);. So in a way, JavaScript is multi-threading. And for actually reading from/or writing to the hard drive, in NodeJS, if you use:
var child=require("child_process");
function put_text(file,text){
child.exec("echo "+text+">"+file);
}
function get_text(file){
//JQuery code for getting file contents here (i think)
return JQueryResults;
}
These can also be used for reading and writing to/from the hard drive using NodeJS.

Single threaded and Event Loop in Node.js

First of all, I am starter trying to understand what is Node.Js. I have two questions.
First Question
From the article of Felix, it said "there can only be one callback firing at the same time. Until that callback has finished executing, all other callbacks have to wait in line".
Then, consider about the following code (copied from nodejs official website)
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.end('Hello World\n');
}).listen(8124, "127.0.0.1");
If two client requests are received simultaneously, it means the following workflow:
First http request event received, Second request event received.
As soon as first event received, callback function for first event is executing.
At while, callback function for second event has to be waiting.
Am I right? If I am right, how Node.js control if there are thousands of client request within very short-time duration.
Second Question
The term "Event Loop" is mostly used in Node.js topic. I have understood "Event Loop" as the following from http://www.wisegeek.com/what-is-an-event-loop.htm;
An event loop - or main loop, is a construct within programs that
controls and dispatches events following an initial event.
The initial event can be anything, including pushing a button on a
keyboard or clicking a button on a program (in Node.js, I think the
initial events will be http request, db queries or I/O file access).
This is called a loop, not because the event circles and happens
continuously, but because the loop prepares for an event, checks the
event, dispatches an event and repeats the process all over again.
I have a conflict about second paragraph especially the phrase"repeats the process all over again". I accepted that the above http.createServer code from above question is absolutely "event loop" because it repeatedly listens the http request events.
But I don't know how to identify the following code as whether event-driven or event loop. It does not repeat anything except the callback function fired after db query is finished.
database.query("SELECT * FROM table", function(rows) {
var result = rows;
});
Please, let me hear your opinions and answers.
Answer one, your logic is correct: second event will wait. And will execute as far as its queued callbacks time comes.
As well, remember that there is no such thing as "simultaneously" in technical world. Everything have very specific place and time.
The way node.js manages thousands of connections is that there is no need to hold thread idling while there is some database call blocking the logic, or another IO operation is processing (like streams for example). It can "serve" first request, maybe creating more callbacks, and proceed to others.
Because there is no way to block the execution (except nonsense while(true) and similar), it becomes extremely efficient in spreading actual resources all over application logic.
Threads - are expensive, and server capacity of threads is directly related to available Memory. So most of classic web applications would suffer just because RAM is used on threads that are simply idling while there is database query block going on or similar. In node that's not a case.
Still, it allows to create multiple threads (as child_process) through cluster, that expands even more possibilities.
Answer Two. There is no such thing as "loop" that you might thinking about. There will be no loop behind the scenes that does checks if there is connections or any data received and so on. It is nowadays handled by Async methods as well.
So from application point of view, there is no 'main loop', and everything from developer point of view is event-driven (not event-loop).
In case with http.createServer, you bind callback as response to requests. All socket operations and IO stuff will happen behind the scenes, as well as HTTP handshaking, parsing headers, queries, parameters, and so on. Once it happens behind the scenes and job is done, it will keep data and will push callback to event loop with some data. Once event loop ill be free and will come time it will execute in node.js application context your callback with data from behind the scenes.
With database request - same story. It ill prepare and ask stuff (might do it even async again), and then will callback once database responds and data will be prepared for application context.
To be honest, all you need with node.js is to understand the concept, but not implementation of events.
And the best way to do it - experiment.
1) Yes, you are right.
It works because everything you do with node is primarily I/O bound.
When a new request (event) comes in, it's put into a queue. At initialization time, Node allocates a ThreadPool which is responsible to spawn threads for I/O bound processing, like network/socket calls, database, etc. (this is non-blocking).
Now, your "callbacks" (or event handlers) are extremely fast because most of what you are doing is most likely CRUD and I/O operations, not CPU intensive.
Therefore, these callbacks give the feeling that they are being processed in parallel, but they are actually not, because the actual parallel work is being done via the ThreadPool (with multi-threading), while the callbacks per-se are just receiving the result from these threads so that processing can continue and send a response back to the client.
You can easily verify this: if your callbacks are heavy CPU tasks, you can be sure that you will not be able to process thousands of requests per second and it scales down really bad, comparing to a multi-threaded system.
2) You are right, again.
Unfortunately, due to all these abstractions, you have to dive in order to understand what's going on in background. However, yes, there is a loop.
In particular, Nodejs is implemented with libuv.
Interesting to read.
But I don't know how to identify the following code as whether event-driven or event loop. It does not repeat anything except the callback function fired after db query is finished.
Event-driven is a term you normally use when there is an event-loop, and it means an app that is driven by events such as click-on-button, data-arrived, etc. Normally you associate a callback to such events.

In Node.JS, any way to make Array.prototype.sort() yield to other processes?

FYI: I am doing this already with Web Workers, and it works fine, but I was just exploring what can and can't be done with process.nextTick.
So I have an array of a million elements that I'm sorting in Node.JS. I want Node to be responsive to other requests while it's doing this.
Is there any way to make Array.prototype.sort() not block other processes? Since this is a core function, I can't insert any process.nextTick().
I could implement quicksort manually, but I can't see how you do that efficiently, in a continuation-passing-style, which seems to be required for process.nextTick(). I can modify a for loop to do this, but sort() seems impossible.
While it's not possible to make Array.prototype.sort somehow behave asynchronously itself, asynchronous sorting is definitely possible, as shown by this sorting demo to demonstrate the speed advantage of setImmediate (shim) over setTimeout.
The source code does not seem to come with any license, unfortunately. The Github repo for the demo at https://github.com/jphpsf/setImmediate-shim-demo names Jason Weber as the author. You may want to ask him if you want to use (parts) of the code.
I think that if you use setImmediate (available since Node 0.10) the individual sort operations will be effectively interleaved with I/O callbacks. For such a big amount of work, I would not recommend process.nextTick (if it works at all, because there's a 1000 maxTickDepth limit). See setImmediate vs. nextTick for some backgroud.
Using setImmediate instead of plain "synchronous" processing will certainly be slower overall, so you could choose to handle a batch of individual sort operations per "tick" to speed things up, at the expense of Node not being responsive during that time. I think the right balance between speed and responsiveness wrt I/O can only be found with experimentation.
A much simpler alternative would be to do it more like web workers: Spawn a child process and do the sorting there. Biggest problem you face then is transferring the sorted data back to your main process(to generate some kind of output, presumably). AFAIK there's nothing like transferable objects for Node.js. After having buffered the sorted array, you could stream the results to the child process stdout and parse the data in the main process, or perhaps more simple; use child process messaging.
You may not have a spare cpu core lying around, so the child process would invade some other process cpu time. To avoid the sort process from hurting your other processes, you may need to assign it a low priority. It's seemingly not possible to do this with Node itself, but you could try using nice, as discussed here: https://groups.google.com/forum/#!topic/nodejs/9O-2gLJzmcQ . I have no experience in this matter.
Well, I initially thought you could use async.sortBy, but upon closer examination it seems that won't behave as you need. See Array.sort and Node.js for a similar question, although at the moment there's no accepted answer.
I know this is a rather old question, but I came across a similar situation, with still no simple solution that I could find.
I modified an exising quick sort and published a package that gives up execution to the eventloop periodically here:
https://www.npmjs.com/package/qsort-async
If you are familiar with a traditional quicksort, my only modification was to to the initial function which does the partitioning. Basically the function still modifies the array in place, but now returns a promise. It stops execution for other things in the eventloop if it tries to process too many elements in a single iteration. (I believe the default size I specified was 10000).
Note: Its important to use setImmedate here and not process.nextTick or setTimeout here. nextTick will actually place your execution before process IO and you will still have issues responding to other requests tied to IO. setTimeout is just too slow (which I believe one of the other answers linked a demo for).
Note 2: If something like a mergesort is more your style, you could do the same sort of logic in the 'merge' function of the of the sort.
const immediate = require('util').promisify(setImmediate);
async function quickSort(items, compare, left, right, size) {
let index;
if (items.length > 1) {
index = partition(items, compare, left, right);
if (Math.abs(left - right) > size) {
await immediate();
}
if (left < index - 1) {
await quickSort(items, compare, left, index - 1, size);
}
if (index < right) {
await quickSort(items, compare, index, right, size);
}
}
return items;
}
The full code is here: https://github.com/Mudrekh/qsort-async/blob/master/lib/quicksort.js

How do I make a non-IO operation synchronous vs. asynchronous in node.js?

I know the title sounds like a dupe of a dozen other questions, and it may well be. However, I've read those dozen questions, and Googled around for awhile, and found nothing that answers these questions to my satisfaction.
This might be because nobody has answered it properly, in which case you should vote me up.
This might be because I'm dumb and didn't understand the other answers (much more likely), in which case you should vote me down.
Context:
I know that IO operations in Node.js are detected and made to run asynchronously by default. My question is about non-IO operations that still might block/run for a long time.
Say I have a function blockingfunction with a for loop that does addition or whatnot (pure CPU cycles, no IO), and a lot of it. It takes a minute or more to run.
Say I want this function to run whenever someone makes a certain request to my server.
Question:
Obviously, if I explicitly invoke this loop at the outer level in my code, everything will block until it completes.
Most suggestions I've read suggest pushing it off into the future by starting all of my other handlers/servers etc. first, and deferring invocation of the function via process.nextTick or setTimeout(blockingfunction, 0).
But won't blockingfunction1 then just block on the next spin around the execution loop? I may be wrong, but it seems like doing that would start all of my other stuff without blocking the app, but then the first time someone made the request that results in blockingfunction being called, everything would block for as long as it took to complete.
Does putting blockingfunction inside a setTimeout or process.nextTick call somehow make it coexist with future operations without blocking them?
If not, is there a way to make blockingfunction do that without rewriting it?
How do others handle this problem? A lot of the answers I've seen are to the tune of "just trust your CPU-intensive things to be fast, they will be", but this doesn't satisfy.
Absent threading (where I can be guaranteed that the execution of blockingfunction will be interleaved with the execution of whatever else is going on), should I re-write CPU-intensive/time consuming loops to use process.nextTick to perform a fixed, guaranteed-fast number of iterations per tick?
Yes, you are correct. If you defer your function until the next tick, it will just block in that tick rather than the current one.
Unfortunately, there is no magic here that solves this for you. While it is possible to fire up that function in another process, it might not be worth the hassle, depending on what you're doing.
I recommend re-writing your function in such a way that work happens for a bit, and then continues on the next tick. Node ticks are very efficient... you could call them every iteration of a decent sized loop if needed, without a whole ton of overhead. Of course, you would have to profile it in your code to see what the impact is.
Yes, a blocking function will keep blocking even if you run it process.nextTick.
Some options:
If it truly takes a while, then perhaps it should be spun out to a queue where you can have a dedicated worker process handle it.
1a. Node.js has a child-process flavor specifically for forking other node.js files with a built in communication channel. So e.g. you can create one (or several) thread that handles these requests in order, then responds and hits the callback. See: http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
You can break up the blockingFunction into chunks that run in a loop. Have it call every X iterations with process.nextTick to make way for other events to be handled.

Resources