I have a very basic question in node.js programming .
I am finding it interesting as I have started understanding it deeper.
I came across the following code :
Abc.do1(‘operation’,2, function(err,res1)
{
If(err) {
Callback(err);
}
def.do2(‘operation2’,function(l) {
}
}
My question is :
since def.do2 is written in the callback of abc.do1 ,
is it true that def.do2 will be executed after the ‘operation’ of abc.do1 is completed and the callback function gets called . If yes, is this a good programming practice , because we speak only asynchronous and non blocking code in node.js
Yes, you're correct. def.do2() is executed after abc.do1() is done. However, this done on purpose to make sure that do1() is done before do2() can start. If it's meant to be done in parallel, then do2() would have been outside of do1()'s callback. This code is not exactly blocking. after do1() is started, the code is still continuing on to execute everything below do1() functon (outside of do1's callback), just not do2() until do1() is done which is meant to be.
Yes you are absolutely correct and you have given a correct example of callback function.
The main browser process is a single threaded event loop. If you execute a long-running operation within a single-threaded event loop, the process "blocks". This is bad because the process stops processing other events while waiting for your operation to complete. 'alert' is one of the few blocking browser methods: if you call alert('test'), you can no longer click links, perform ajax queries, or interact with the browser UI.
In order to prevent blocking on long-running operations, the XMLHttpRequest provides an asynchronous interface. You pass it a callback to run after the operation is complete, and while it is processing it cedes control back to the main event loop instead of blocking.
There's no reason to use a callback unless you want to bind something to an event handler, or your operation is potentially blocking and therefore requires an asynchronous programming interface.
See this video
http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/
Related
If I understand correctly, the EventLoop is the mechanism that node uses to resolve asynchronous operations and then passing them to the call stack, right? My question is, when I use a synchronous method (for example pbkdf2Sync) it will get stuck in the call stack until it is finished but it won't be moved to the EventLoop because is not an async operation, so why is this known as blocking the event loop if in reality it is blocking everything? not JUST the event loop (that as far as I understand, can continue working and will pass the callbacks to the call stack when they're finished)
Is my understanding of the NodeJs inner workins completly wrong? This topic specifically is kinda hard to understand because every resource I read differs in some way or another, so even if I think I get the bigger picture, these are the "details" that confuse me.
What is blocking
Why using sync functions in nodejs is known as "blocking the event loop"?
In a nutshell, it's because the event loop can only process the next event when you return from whatever your current Javascript is doing and allow the event loop to look for the next event. A sync function blocks the interpreter until it finishes. So, the entire time a sync function is working and you're waiting for it to return, the interpreter is blocked and control is not returned back to the event loop. This blocks the event loop and also blocks your Javascript from running.
Single Thread
Nodejs runs your Javascript all with a single thread. Other threads are used internally, but your Javascript itself runs only in a single thread (we're assuming there is no use of WorkerThreads in your code). So, when you make a synchronous function call, that single thread that runs your Javascript is busy and blocked until the synchronous function call returns and can then continue executing more of your Javascript.
This blocks everything. It blocks running more of your Javascript after the synchronous function call and it blocks getting back to the event loop to run any other event handlers that are pending such as incoming network events, timers, completion events from other things such as disk I/O, etc... So, while this is blocking the event loop, it's also blocking running more of your own code after the function call.
Non-Blocking, Asynchronous Operations
On the other hand, asynchronous functions such as fs.readFile(), for example, don't block. They initiate their operation and return immediately. This allows the interpreter to continue running any more of your own Javascript after the call to fs.readFile() and it also allows you to return from whatever event triggered your work in the first place which will return control back to the event loop so it can service other waiting events or other events that will trigger in the future. fs.readFile() then does most of its work in native code (behind the scenes) outside of the main thread that runs your Javascript. So, these type of asynchronous functions don't block the event loop - instead they cooperate with the event loop so that other things can get run while waiting for the completion of the asynchronous operation that was previously initiated. When they complete, they insert an event into the event loop that causes the event loop to call the completion callback at it's earliest convenience (when it's not blocked).
Differences in Blocking
It's also worth noting that functions that represent both synchronous and asynchronous operations both block the execution of your Javascript and block the event loop until they return. The difference is that an asynchronous operation returns from the function nearly immediately, long before the asynchronous operation itself is complete and communicates its completion and/or eventual result back via a promise, callback or event (which are all callbacks at the lowest level of the event loop). The synchronous operation does not return until the operation itself is complete. So, the asynchronous operation only blocks for a very short duration while the operation is being initiated whereas the synchronous operation blocks for the entire duration of the operation (until it completes).
More About the Event Loop
So, in what moment during the event loop is my javascript code executed?
When control returns back to the event loop, it goes through several different phases looking for things to do. When it finds something to do, that "something" results in calling a Javascript callback that starts running some of your Javascript. For example if the "something to do" is a setTimeout() timer that is ready to fire, then it will call the Javascript callback that was passed to setTimeout(). That callback runs to its completion and only when your Javascript returns from that callback does the event loop regain control and get to look for the next event to run and call its callback.
it won't be moved to the EventLoop because is not an async operation
This is not really the correct way to think about things. Things are not really "moved to the event loop".
A synchronous operation is just a blocking function call that returns when it returns and execution of any other Javascript is blocked until that blocking function call returns. Things are blocked because the single threaded interpreter running your Javascript is stuck waiting for this function to finish. It can't do anything else and the event loop is also blocked because it can't do anything until the interpreter returns control back to the event loop.
An asynchronous operation, on the other hand, initiates some operation (let's say it issues an http request to some other host) and then immediately returns, long before it has the result of that http request. Since this asynchronous operation returns before it has its result, it is considered non-blocking and because it returns quickly, you can then return from whatever event caused your code to run and that will then return control back to the event loop. That allows the event loop to then look for other events to handle and run their corresponding callbacks. Meanwhile, the asynchronous operation that was previously started has some native code associated with it (that may or may not be running in an native code OS thread - depending upon what type of operation it is). But regardless, that native code is configured such that when the asynchronous operation completes, it will insert an event into the appropriate event queue. So, at some future point when nodejs has control back in the event loop, it will find that event and run the Javascript callback associated with that event, thus notifying the original Javascript code that the asynchronous operation is now complete and providing some sort of result or error code.
Example
As a simple example, let's say you run this code:
// timer that wants to fire in 1 second
setTimeout(function() {
console.log("timer fired")
}, 1000);
// loop that blocks for 5 seconds
const start = Date.now();
while (Date.now() - start < 5000) { }
console.log("blocking loop finished");
This will output:
blocking loop finished
timer fired
Even though the timer was set to run 1 second from now, the while() loop blocked everything for 5 seconds so it wasn't until after the while() loop finished and returned back to the system that the event loop could look at what it had to do next and call the callback associated with the timer.
This while loop is similar to a blocking function such as pbkdf2Sync(). Both block the interpreter and don't return until they are finished and therefore nothing in the event loop gets a chance to run until they are finished.
A Simple Analogy
Here's a simple analogy. Imagine you need to contact your cable company to troubleshoot a problem. There are two ways for you to get ahold of them. The first way is that you call them and sit on the phone on hold for an hour waiting for a customer service representative to pick up your call. You can't really do much else while you're sitting on hold because you have to be right there waiting and ready to respond when someone finally comes on the line to help you. Thus, you are "blocked" from doing a lot of other things. This is a "blocking, synchronous operation". You can't really do much else while you're waiting on hold.
The second way you can call the cable company is to request a callback sometime in the future and when a representative is available, they will call you back. As soon as you're done requesting the callback, you can go about your business doing other things. You have to be able to answer an incoming call when it arrives, but other than that you're not blocked from doing many other things. This is a "non-blocking, asynchronous operation".
But, if while you're supposed to be able to answer your telephone callback from the second scenario above, you make another call and are on the phone for 30 minutes, the cable company is blocked from reaching you on the phone for the duration of your other call. In our analogy, you are "blocking the event loop" during your other call as the incoming event from the cable company can't be processed while you're on a call.
I am doing an IO wait operation inside a for loop now the thing is when all of the operations terminates I want to send the response to the server. Now I was just wondering that suppose two IO operation terminates exactly at the same time now can they execute code at the same time(parallel) or will they execute serially?
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
node.js runs Javascript with a single thread. That means that two pieces of Javascript can never be running at the exact same moment.
node.js processes I/O completion using an event queue. That means when an I/O operation completes, it places an event in the event queue and when that event gets to the front of the event queue and the JS interpreter has finished whatever else it was doing, then it will pull that event from the event queue and call the callback associated with it.
Because of this process, even if two I/O operations finish at basically the same moment, one of them will put its completion event into the event queue before the other (access to the event queue internally is likely controlled by a mutex so one will get the mutex before the other) and that one's completion callback will get into the event queue first and then called before the other. The two completion callbacks will not run at the exact same time.
Keep in mind that more than one piece of Javascript can be "in flight" or "in process" at the same time if it contains non-blocking I/O operations or other asynchronous operations. This is because when you "wait" for an asynchronous operation to complete in Javscript, you return control back to the system and you then resume processing only when your completion callback is called. While the JS interpreter is waiting for an asynchronous I/O operation to complete and the associated callback to be called, then other Javascript can run. But, there's still only one piece of Javascript actually ever running at a time.
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
Yes, that's correct. That's not exactly how I'd describe it since "concurrent" and "parallel" don't have strict technical definitions, but based on what I think you mean by them, that is correct.
you can use Promise.all :
let promises = [];
for(...)
{
promises.push(somePromise); // somePromise represents your IO operation
}
Promise.all(promises).then((results) => { // here you send the response }
You don't have to worry about the execution order.
Node.js is designed to be single thread. So basically there is no way that 'two IO operation terminates exactly at the same time' could happen. They will just finish one by one.
Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs lets say we make an async version of makeBurgerAsync() which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread. I have troubling imagine what is really going on behind the scene.
So for sure when the function run, both async functions will enter event loops and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for both async function right? Since single thread is hogging console.log for 5 seconds.
After console.log is done. CPU will have time to switch between both async so that it can run a bit of each function each time.
So according to this, the function doesn't necessarily result in faster execution, async is probably slower due to switching between event loop? I imagine that, at the end of the day, everything will be spread on a single thread which will be the same thing as synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are like query DB etc. Basically nodejs will just say "Hey DB handle this for me while I'll do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
http.get(someUrl, function(err, response, body) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes it's operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using it's own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readfile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
I want to add some admin utilities to a little Web app, such as "Backup Database". The user will click on a button and the HTTP response will return immediately, although the potentially long-running process has been started in the background.
In Java this would probably be implemented by spawning an independent thread, in Scala by using an Actor. But what's an appropriate idiom in node.js? (code snippet appreciated)
I'm now re-reading the docs, this really does seem a node 101 question but that's pretty much where I am on this...anyhow, to clarify this is the basic scenario :
function onRequest(request, response) {
doSomething();
response.writeHead(202, headers);
response.end("doing something");
}
function doSomething(){
// long-running operation
}
I want the response to return immediately, leaving doSomething() running in the background.
Ok, given the single-thread model of node that doesn't seem possible without spawning another OS-level ChildProcess. My misunderstanding.
In my code what I need for backup is mostly I/O based, so node should handle that in a nice async fashion. What I think I'll do is shift the doSomething to after the response.end, see how that behaves.
As supertopi said you could have a look at Child process. But I think it will hurt the performance of your server, if this happens a lot sequentially. Then I think you should queue them instead. I think you should have a look at asynchronous message queues to process your jobs offline(distributed). Some(just to name two) example of message queues are beanstalkd, gearman.
I don't see the problem. All you need to do is have doSomething() start an asynchronous operation. It'll return immediately, your onRequest will write the response back, and the client will get their "OK, I started" message.
function doSomething() {
openDatabaseConnection(connectionString, function(conn) {
// This is called some time later, once the connection is established.
// Now you can tell the database to back itself up.
});
}
doSomething won't just sit there until the database connection is established, or wait while you tell it to back up. It'll return right away, having registered a callback that will run later. Behind the scenes, your database library is probably creating some threads for you, to make the async work the way it should, but your code doesn't need to worry about it; you just return right away, send the response to the client right away, and the asynchronous code keeps running asynchronously.
(It's actually more work to make this run synchronously -- you would have to pass your response object into doSomething, and have doSomething do the response.end call inside the innermost callback, after the backup is done. Of course, that's not what you want to do here; you want to return immediately, which is exactly what your code will do.)
Does asynchronous call always create a new thread?
Example:
If JavaScript is single threaded then how can it do an async postback? Is it actually blocking until it gets a callback? If so, is this really an async call?
This is an interesting question.
Asynchronous programming is a paradigm of programming that is principally single threaded, i.e. "following one thread of continuous execution".
You refer to javascript, so lets discuss that language, in the environment of a web browser. A web browser runs a single thread of javascript execution in each window, it handles events (such as onclick="someFunction()") and network connections (such as xmlhttprequest calls).
<script>
function performRequest() {
xmlhttp.open("GET", "someurl", true);
xmlhttp.onreadystatechange = function() {
if (xmlhttp.readyState == 4) {
alert(xmlhttp.responseText);
}
}
xmlhttp.send(sometext);
}
</script>
<span onclick="performRequest()">perform request</span>
(This is a nonworking example, for demonstration of concepts only).
In order to do everything in an asynchronous manner, the controlling thread has what is known as a 'main loop'. A main loop looks kind of like this:
while (true) {
event = nextEvent(all_event_sources);
handler = findEventHandler(event);
handler(event);
}
It is important to note that this is not a 'busy loop'. This is kind of like a sleeping thread, waiting for activity to occur. Activity could be input from the user (Mouse Movement, a Button Click, Typing), or it could be network activity (The response from the server).
So in the example above,
When the user clicks on the span, a ButtonClicked event would be generated, findEventHandler() would find the onclick event on the span tag, and then that handler would be called with the event.
When the xmlhttp request is created, it is added to the all_event_sources list of event sources.
After the performRequest() function returns, the mainloop is waiting at the nextEvent() step waiting for a response. At this point there is nothing 'blocking' further events from being handled.
The data comes back from the remote server, nextEvent() returns the network event, the event handler is found to be the onreadystatechange() method, that method is called, and an alert() dialog fires up.
It is worth noting that alert() is a blocking dialog. While that dialog is up, no further events can be processed. It's an eccentricity of the javascript model of web pages that we have a readily available method that will block further execution within the context of that page.
The Javascript model is single-threaded. An asynchronous call is not a new thread, but rather interrupts an existing thread. It's analogous to interrupts in a kernel.
Yes it makes sense to have asynchronous calls with a single thread. Here's how to think about it: When you call a function within a single thread, the state for the current method is pushed onto a stack (i.e. local variables). The subroutine is invoked and eventually returns, at which time the original state is popped off the stack.
With an asynchronous callback, the same thing happens! The difference is that the subroutine is invoked by the system, not by the current code invoking a subroutine.
A couple notes about JavaScript in particular:
XMLHttpRequests are non-blocking by default. The send() method returns immediately after the request has been relayed to the underlying network stack. A response from the server will schedule an invocation of your callback on the event loop as discussed by the other excellent answers.
This does not require a new thread. The underlying socket API is selectable, similar to java.nio.channels in Java.
It's possible to construct synchronous XMLHttpRequest objects by passing false as the third parameter to open(). This will cause the send() method to block until a response has been received from the server, thus placing the event loop at the mercy of network latency and potentially hanging the browser until network timeout. This is a Bad Thing™.
Firefox 3.5 will introduce honest-to-god multithreaded JavaScript with the Worker class. The background code runs in a completely separate environment and communicates with the browser window by scheduling callbacks on the event loop.
In many GUI applications, an async call (like Java's invokeLater) merely adds the Runnable object to its GUI thread queue. The GUI thread is already created, and it doesn't create a new thread. But threads aren't even strictly required for an asynchronous system. Take, for example, libevent, which uses select/poll/kqueue, etc. to make non-blocking calls to sockets, which then fires callbacks to your code, completely without threads.
No, but more than one thread will be involved.
An asynchronous call might launch another thread to do the work, or it might post a message into a queue on another, already running thread. The caller continues and the callee calls back once it processes the message.
If you wanted to do a synchronous call in this context, you'd need to post a message and actively wait for the callback to happen.
So in summary: More than one thread will be involved, but it doesn't necessarily create a new thread.
I don't know about javascript, but for instance in the Windows Forms world, asynchronous invocations can be made without multiple threads. This has to do with the way the Windows Message Pump operates. Basically a Windows Forms application sets up a message queue through which Windows places messages notifying it about events. For instance, if you move the mouse, messages will be placed on that queue. The Windows Forms application will be in an endless loop consuming all the messages that are thrown at it. According to what each message contains it will move windows around, repaint them or even invoke user-defined methods, amongst other things. Calls to methods are identified by delegates. When the application finds a delegate instance in the queue, it happily invokes the method referred by the delegate.
So, if you are in a method doing something and want to spawn some asynchronous work without creating a new thread, all you have to do is place a delegate instance into the queue, using the Control.BeginInvoke method. Now, this isn't actually multithreaded, but if you throw very small pieces of work to the queue, it will look like multithreaded. If, on the other hand you give it a time consuming method to execute, the application will freeze until the method is done, which will look like a jammed application, even though it is doing something.