When event loop starts? - node.js

I’ve recently started to figure out what event loop really is and that confused me a lot, seems like I don’t know how nodejs works..
I mean when program starts, gets loaded into memory - what’s next?
I can’t see a place inside event loop where all sync. Code executes (like for/ while cycles that’s computes something).. doesn’t that means that V8 executes JavaScript and starts event loop when needed?
If anybody can help and explain how nodejs runtime is functioning on the high level would be really great

I highly recommend reading this Asynchrony: Now & Later
and I'll quote some things that I've once read.
........
JS Engine know nothing about code being asynchronous,It only execute code at a time and finishes..no more no less
the JS host environment is the one who has an implementation of the event-loop concept where code that doesn't need to run now(in the future),is waiting(imagine a network call/ io call) to finish processing and get called (be added to the event-queue of the event-loop and then executed at a next tick)
At program start,I'm 100% sure but I think all code is added to the event-queue(the way how the event-loop is implemented) and it's processed as First in First out (FIFO) which means the earlier the code the first is executed,and while running if some code need to be stalled like a setTimeout or IO process or an Ajax call(which both need time) it's up to them to use for example a callback to call(here the callback is added to the event-queue) and it's the event-loop responsibility to execute these callback in order that they've reached in at a next future tick.

Related

Can the same line of Node.js code run at the same time?

I'm pretty new to node and am trying to setup an express server. The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
Is there anything like a synchronized block in Node? I believe node is single threaded and has a few helper threads to handle asyncronous/callback code. Is it at all possible that the same line of code could run at exactly the same time in Node?
Thanks!
Can the same line of Node.js code run at the same time? Is it at all possible that the same line of code could run at exactly the same time in Node?
No, it is not. Your Javascript in node.js is entirely single threaded. An event is pulled from the event queue. That calls a callback associated with that event. That callback runs until it returns. No other events can be processed until that first one returns. When it returns, the interpreter pulls the next event from the event queue and then calls the callback associated with it.
This does not mean that there are not concurrency issues in node.js. There can be, but it is caused not by code running at the same physical time and creating conflicting access to shared variables (like can happen in threaded languages like Java). Concurrency issues can be caused by the asynchronous nature of I/O in node.js. In the asynchronous case, you call an asynchronous function, pass it a callback (or expect a promise in return). Your code then continues on and returns to the interpreter. Some time later an event will occur inside of node.js native code that will add something to the event queue. When the interpreter is free from running other Javascript, it will process that event and then call your callback which will cause more of your code to run.
While all this is "in process", other events are free to run and other parts of your Javascript can run. So, the exposure to concurrency issues comes, not from simultaneous running of two pieces of code, but from one piece of your code running while another piece of your code is waiting for a callback to occur. Both pieces of code are "in process". They are not "running" at the same time, but both operations are waiting from something else to occur in order to complete. If these two operations access variables in ways that can conflict with each other, then you can have a concurrency issue in Javascript. Because none of this is pre-emptive (like threads in Java), it's all very predictable and is much easier to design for.
Is there anything like a synchronized block in Node?
No, there is not. It is simply not needed in your Javascript code. If you wanted to protect something from some other asynchronous operation modifying it while your own asynchronous operation was waiting to complete, you can use simple flags or variables in your code. Because there's no preemption, a simple flag will work just fine.
I believe node is single threaded and has a few helper threads to handle asyncronous/callback code.
Node.js runs your Javascript as single threaded. Internally in its own native code, it does use threads in order to do its work. For example, the asynchronous file system access code internal to node.js uses threads for disk I/O. But these threads are only internal and the result of these threads is not to call Javascript directly, but to insert events in the event queue and all your Javascript is serialized through the event queue. Pull event from event queue, run callback associated with the event. Wait for that callback to return. Pull next event from the event queue, repeat...
The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
We'd really have to see what your code looks like to understand what exact problem you're trying to solve. I'd guess that you can just use a simple boolean (regular variable) in node.js, but we'd have to see your code to really understand what you're doing.

What's the internal difference between async and multithreading?

I used to consider async as equavelent as multithreading. Multi tasks will be done parallel. However, in javascript I wrote this and it seems that dosomething will never happen.
setTimeout(1000, dosomething)
while(true){}
Why?
Node.js is a single threaded asynchronous language. As mentioned in another answer
Javascript is single threaded (with the exception of web workers, but that is irrelavent to this example so we will ignore it). What this means is that setTimeout actually does is schedules some code to be executed some time in the future, after at least some amount of time, but only when the browser has stopped whatever else it was doing on the rendering thread at the time, which could be rendering html, or executing javascript.
In your example the execution of the while loop never stops, control is never returned to the top level, so the scheduled setTimeout function is never executed.
Multithreading is one of many ways to implement asynchronous programming. Reacting to events and yielding to a scheduler is one other way, and happens to be the way that javascript in the browser is implemented.
In your example, the event that gave you control and allowed you to call setTimeout must be allowed to complete so that the javascript engine can monitor the timeout and call your doSomething callback when it expires.

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile(foo.txt)) and just pass in a callback that will be executed once a particular event indicates completion of the file reading is fired. However, what I do not understand is where the function that is doing the work of actually reading the file is being executed. Javascript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However for some standard library function calls, the node C++ side and libuv decide to do expensive calculations outside of the event loop entirely. So they will not block the main loop or event loop. Instead they make use of something called a thread pool that thread pool is a series of (by default) four threads that can be used for running computationally intensive tasks. There are ONLY FOUR things that use this thread pool - DNS lookup, fs, crypto and zlib. Everything else execute in the main thread.
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
Its like using, setTimeout(function(){/*file reading code here*/},1000);. JavaScript can run multiple things side by side like, having three setInterval(function(){/*code to execute*/},1000);. So in a way, JavaScript is multi-threading. And for actually reading from/or writing to the hard drive, in NodeJS, if you use:
var child=require("child_process");
function put_text(file,text){
child.exec("echo "+text+">"+file);
}
function get_text(file){
//JQuery code for getting file contents here (i think)
return JQueryResults;
}
These can also be used for reading and writing to/from the hard drive using NodeJS.

How do I make a non-IO operation synchronous vs. asynchronous in node.js?

I know the title sounds like a dupe of a dozen other questions, and it may well be. However, I've read those dozen questions, and Googled around for awhile, and found nothing that answers these questions to my satisfaction.
This might be because nobody has answered it properly, in which case you should vote me up.
This might be because I'm dumb and didn't understand the other answers (much more likely), in which case you should vote me down.
Context:
I know that IO operations in Node.js are detected and made to run asynchronously by default. My question is about non-IO operations that still might block/run for a long time.
Say I have a function blockingfunction with a for loop that does addition or whatnot (pure CPU cycles, no IO), and a lot of it. It takes a minute or more to run.
Say I want this function to run whenever someone makes a certain request to my server.
Question:
Obviously, if I explicitly invoke this loop at the outer level in my code, everything will block until it completes.
Most suggestions I've read suggest pushing it off into the future by starting all of my other handlers/servers etc. first, and deferring invocation of the function via process.nextTick or setTimeout(blockingfunction, 0).
But won't blockingfunction1 then just block on the next spin around the execution loop? I may be wrong, but it seems like doing that would start all of my other stuff without blocking the app, but then the first time someone made the request that results in blockingfunction being called, everything would block for as long as it took to complete.
Does putting blockingfunction inside a setTimeout or process.nextTick call somehow make it coexist with future operations without blocking them?
If not, is there a way to make blockingfunction do that without rewriting it?
How do others handle this problem? A lot of the answers I've seen are to the tune of "just trust your CPU-intensive things to be fast, they will be", but this doesn't satisfy.
Absent threading (where I can be guaranteed that the execution of blockingfunction will be interleaved with the execution of whatever else is going on), should I re-write CPU-intensive/time consuming loops to use process.nextTick to perform a fixed, guaranteed-fast number of iterations per tick?
Yes, you are correct. If you defer your function until the next tick, it will just block in that tick rather than the current one.
Unfortunately, there is no magic here that solves this for you. While it is possible to fire up that function in another process, it might not be worth the hassle, depending on what you're doing.
I recommend re-writing your function in such a way that work happens for a bit, and then continues on the next tick. Node ticks are very efficient... you could call them every iteration of a decent sized loop if needed, without a whole ton of overhead. Of course, you would have to profile it in your code to see what the impact is.
Yes, a blocking function will keep blocking even if you run it process.nextTick.
Some options:
If it truly takes a while, then perhaps it should be spun out to a queue where you can have a dedicated worker process handle it.
1a. Node.js has a child-process flavor specifically for forking other node.js files with a built in communication channel. So e.g. you can create one (or several) thread that handles these requests in order, then responds and hits the callback. See: http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
You can break up the blockingFunction into chunks that run in a loop. Have it call every X iterations with process.nextTick to make way for other events to be handled.

Does use of recursive process.nexttick let other processes or threads work?

Technically when we execute the following code(recursive process.nexttick), the CPU usage would get to 100% or near. The question is the imagining that I'm running on a machine with one CPU and there's another process of node HTTP server working, how does it affect it?
Does the thread doing recursive process.nexttick let the HTTP server work at all?
If we have two threads of recursive process.nexttick, do they both get 50% share?
Since I don't know any machine with one core cannot try it. And since my understanding of time sharing of the CPU between threads is limited in this case, I don't how should try it with machines that have 4 cores of CPU.
function interval(){
process.nextTick(function(){
someSmallSyncCode();
interval();
})
}
Thanks
To understand whats going on here, you have to understand a few things about node's event loop as well as the OS and CPU.
First of all, lets understand this code better. You call this recursive, but is it?
In recursion we normally think of nested call stacks and then when the computation is done (reaching a base case), the stack "unwinds" back to the point of where our recursive function was called.
While this is a method that calls itself (indirectly through a callback), the event loop skews what is actually going on.
process.nextTick takes a function as a callback and puts it first at the list of stuff to be done on the next go-around of the event loop. This callback is then executed and when it is done, you once again register the same callback. Essentially, the key difference between this and true recursion is that our call stack never gets more than one call deep. We never "unwind" the stack, we just have lots of small short stacks in succession.
Okay, so why does this matter?
When we understand the event loop better and what is really going on, we can better understand how system resources are used. By using process.nextTick in this fashion, you are assuring there is ALWAYS something to do on the event loop, which is why you get high cpu usage (but you knew that already). Now, if we were to suppose that your HTTP server were to run in the SAME process as the script, such as below
function interval(){
process.nextTick(doIntervalStuff)
}
function doIntervalStuff() {
someSmallSyncCode();
interval();
}
http.createServer(function (req, res) {
doHTTPStuff()
}).listen(1337, "127.0.0.1");
then how does the CPU usage get split up between the two different parts of the program? Well thats hard to say, but if we understand the event loop, we can at least guess.
Since we use process.nextTick, the doIntervalStuff function will be run every time at the "start" of the event loop, however, if there is something to be done for the HTTP server (like handle a connection) then we know that will get done before the next time the event loop starts, and remember, due to the evented nature of node, this could be handling any number of connections on one iteration of the event loop. What this implies, is that at least in theory, each function in the process gets what it "needs" as far as CPU usage, and then the process.nextTick functions uses the rest. While this isn't exactly true (for example, your bit of blocking code would mess this up), it is a good enough model to think about.
Okay now (finally) on to your real question, what about separate processes?
Interestingly enough, the OS and CPU are often times also very "evented" in nature. Whenever a processes wants to do something (like in the case of node, start an iteration of the event loop), it makes a request to the OS to be handled, the OS then shoves this job in a ready queue (which is prioritized) and it executes when the CPU scheduler decides to get around to it. This is once again is an oversimplified model, but the core concept to take away is that much like in node's event loop, each process gets what it "needs" and then a process like your node app tries to execute whenever possible by filling in the gaps.
So when your node processes says its taking 100% of cpu, that isn't accurate, otherwise, nothing else would ever be getting done and the system would crash. Essentially, its taking up all the CPU it can but the OS still determines other stuff to slip in.
If you were to add a second node process that did the same process.nextTick, the OS would try to accommodate both processes and, depending on the amount of work to be done on each node process's event loop, the OS would split up the work accordingly (at least in theory, but in reality would probably just lead to everything slowing down and system instability).
Once again, this is very oversimplified, but hopefully it gives you an idea of what is going on. That being said, I wouldn't recommend using process.nextTick unless you know you need it, if doing something every 5 ms is acceptable, using a setTimeout instead of process.nextTick will save oodles on cpu usage.
Hope that answers your question :D
No. You don't have to artificially pause your processes to let others do their work, your operating system has mechanisms for that. In fact, using process.nextTick in this way will slow your computer down because it has a lot of overhead.

Resources