Will condition block main thread in Node.js?

Will condition block main thread in Node.js? - node.js

I use socket.io in Node.js.
var rooms = {"a", "b"};
io.on('connection', function(client) {
socket.on('room', function(room) {
if(room in rooms){
socket.join(room);
}
});
});
...Code below...
Will block condition the main thread and process below:
if(room in rooms){
socket.join(room);
}
I mean, will code below wait the process upper due condition if?

There's nothing in your code that waits or blocks the main thread. Both io.on() and socket.on() are just installing event handlers. The rest of your Javascript continues to run and those event handlers will be called some time in the future when the event they are associated with occurs.
The conditional:
if(room in rooms){
is evaluated at the moment that code runs and immediately executes. It doesn't wait for anything or block.
We could probably help you better if you told us what the actual problem is that you're trying to solve.

Related

nodejs express . For loop is blocking my simple server [duplicate]

The following example is given in a Node.js book:
var open = false;
setTimeout(function() {
open = true
}, 1000)
while (!open) {
console.log('wait');
}
console.log('open sesame');
Explaining why the while loop blocks execution, the author says:
Node will never execute the timeout callback because the event loop is
stuck on this while loop started on line 7, never giving it a chance
to process the timeout event!
However, the author doesn't explain why this happens in the context of the event loop or what is really going on under the hood.
Can someone elaborate on this? Why does node get stuck? And how would one change the above code, whilst retaining the while control structure so that the event loop is not blocked and the code will behave as one might reasonably expect; wait
will be logged for only 1 second before the setTimeout fires and the process then exits after logging 'open sesame'.
Generic explanations such as the answers to this question about IO and event loops and callbacks do not really help me rationalise this. I'm hoping an answer which directly references the above code will help.

It's fairly simple really. Internally, node.js consists of this type of loop:
Get something from the event queue
Run whatever task is indicated and run it until it returns
When the above task is done, get the next item from the event queue
Run whatever task is indicated and run it until it returns
Rinse, lather, repeat - over and over
If at some point, there is nothing in the event queue, then go to sleep until something is placed in the event queue or until it's time for a timer to fire.
So, if a piece of Javascript is sitting in a while() loop, then that task is not finishing and per the above sequence, nothing new will be picked out of the event queue until that prior task is completely done. So, a very long or forever running while() loop just gums up the works. Because Javascript only runs one task at a time (single threaded for JS execution), if that one task is spinning in a while loop, then nothing else can ever execute.
Here's a simple example that might help explain it:
var done = false;
// set a timer for 1 second from now to set done to true
setTimeout(function() {
done = true;
}, 1000);
// spin wait for the done value to change
while (!done) { /* do nothing */}
console.log("finally, the done value changed!");
Some might logically think that the while loop will spin until the timer fires and then the timer will change the value of done to true and then the while loop will finish and the console.log() at the end will execute. That is NOT what will happen. This will actually be an infinite loop and the console.log() statement will never be executed.
The issue is that once you go into the spin wait in the while() loop, NO other Javascript can execute. So, the timer that wants to change the value of the done variable cannot execute. Thus, the while loop condition can never change and thus it is an infinite loop.
Here's what happens internally inside the JS engine:
done variable initialized to false
setTimeout() schedules a timer event for 1 second from now
The while loop starts spinning
1 second into the while loop spinning, the timer is ready to fire, but it won't be able to actually do anything until the interpreter gets back to the event loop
The while loop keeps spinning because the done variable never changes. Because it continues to spin, the JS engine never finishes this thread of execution and never gets to pull the next item from the event queue or run the pending timer.
node.js is an event driven environment. To solve this problem in a real world application, the done flag would get changed on some future event. So, rather than a spinning while loop, you would register an event handler for some relevant event in the future and do your work there. In the absolute worst case, you could set a recurring timer and "poll" to check the flag ever so often, but in nearly every single case, you can register an event handler for the actual event that will cause the done flag to change and do your work in that. Properly designed code that knows other code wants to know when something has changed may even offer its own event listener and its own notification events that one can register an interest in or even just a simple callback.

This is a great question but I found a fix!
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
done = true
}, 1000)
while (!done) {
sleep(100)
console.log('sleeping')
}
console.log('finally, the done value changed!')
I think it works because system-sleep is not a spin wait.

There is another solution. You can get access to event loop almost every cycle.
let done = false;
setTimeout(() => {
done = true
}, 5);
const eventLoopQueue = () => {
return new Promise(resolve =>
setImmediate(() => {
console.log('event loop');
resolve();
})
);
}
const run = async () => {
while (!done) {
console.log('loop');
await eventLoopQueue();
}
}
run().then(() => console.log('Done'));

Node is a single serial task. There is no parallelism, and its concurrency is IO bound. Think of it like this: Everything is running on a single thread, when you make an IO call that is blocking/synchronous your process halts until the data is returned; however say we have a single thread that instead of waiting on IO(reading disk, grabbing a url, etc) your task continues on to the next task, and after that task is complete it checks that IO. This is basically what node does, its an "event-loop" its polling IO for completion(or progress) on a loop. So when a task does not complete(your loop) the event loop does not progress. To put it simply.

because timer needs to comeback and is waiting loop to finish to add to the queue, so although the timeout is in a separate thread, and may indeed finsihed the timer, but the "task" to set done = true is waiting on that infinite loop to finish

var open = false;
const EventEmitter = require("events");
const eventEmitter = new EventEmitter();
setTimeout(function () {
open = true;
eventEmitter.emit("open_var_changed");
}, 1000);
let wait_interval = setInterval(() => {
console.log("waiting");
}, 100);
eventEmitter.on("open_var_changed", () => {
clearInterval(wait_interval);
console.log("open var changed to ", open);
});
this exemple works and you can do setInterval and check if the open value changed inside it and it will work

while loop in sync function in node js [duplicate]

The following example is given in a Node.js book:
var open = false;
setTimeout(function() {
open = true
}, 1000)
while (!open) {
console.log('wait');
}
console.log('open sesame');
Explaining why the while loop blocks execution, the author says:
Node will never execute the timeout callback because the event loop is
stuck on this while loop started on line 7, never giving it a chance
to process the timeout event!
However, the author doesn't explain why this happens in the context of the event loop or what is really going on under the hood.
Can someone elaborate on this? Why does node get stuck? And how would one change the above code, whilst retaining the while control structure so that the event loop is not blocked and the code will behave as one might reasonably expect; wait
will be logged for only 1 second before the setTimeout fires and the process then exits after logging 'open sesame'.
Generic explanations such as the answers to this question about IO and event loops and callbacks do not really help me rationalise this. I'm hoping an answer which directly references the above code will help.

This is a great question but I found a fix!
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
done = true
}, 1000)
while (!done) {
sleep(100)
console.log('sleeping')
}
console.log('finally, the done value changed!')
I think it works because system-sleep is not a spin wait.

There is another solution. You can get access to event loop almost every cycle.
let done = false;
setTimeout(() => {
done = true
}, 5);
const eventLoopQueue = () => {
return new Promise(resolve =>
setImmediate(() => {
console.log('event loop');
resolve();
})
);
}
const run = async () => {
while (!done) {
console.log('loop');
await eventLoopQueue();
}
}
run().then(() => console.log('Done'));

Node is a single serial task. There is no parallelism, and its concurrency is IO bound. Think of it like this: Everything is running on a single thread, when you make an IO call that is blocking/synchronous your process halts until the data is returned; however say we have a single thread that instead of waiting on IO(reading disk, grabbing a url, etc) your task continues on to the next task, and after that task is complete it checks that IO. This is basically what node does, its an "event-loop" its polling IO for completion(or progress) on a loop. So when a task does not complete(your loop) the event loop does not progress. To put it simply.

because timer needs to comeback and is waiting loop to finish to add to the queue, so although the timeout is in a separate thread, and may indeed finsihed the timer, but the "task" to set done = true is waiting on that infinite loop to finish

var open = false;
const EventEmitter = require("events");
const eventEmitter = new EventEmitter();
setTimeout(function () {
open = true;
eventEmitter.emit("open_var_changed");
}, 1000);
let wait_interval = setInterval(() => {
console.log("waiting");
}, 100);
eventEmitter.on("open_var_changed", () => {
clearInterval(wait_interval);
console.log("open var changed to ", open);
});
this exemple works and you can do setInterval and check if the open value changed inside it and it will work

Events & Event Emitter in Node.js

So I've attached two events if someone screams, they are called synchronously, why so?
const events = require('events');
const eventEmitter = new events.EventEmitter();
eventEmitter.on('scream', function() {
console.log("Screaming");
});
eventEmitter.on('scream', function(name) {
console.log(name+" is screaming");
});
eventEmitter.emit('scream', 'Bob');
O/P:
Screaming
Bob is screaming

Because in nodejs, The event loop is single threaded and pick one event at a time and treat those events independently.
In your case, there are two event handler with the same name, so when event loop gets the eventEmitter.emit('scream', 'Bob') it sends the particular event handler.
When first event handler done with it, Now it goes to the second handler because with the same name.
It follow the FIFO but if you use emitter.prependListener(eventName, listener) then it will be executed first the FIFO.
You should know, if you want to call only one time then you should use eventEmitter.once('scream') It will be called only one time.
eventEmitter.once('scream', function() {
console.log("Screaming");
});
eventEmitter.emit('scream', 'Bob');
eventEmitter.emit('scream', 'Bob');
eventEmitter.emit('scream', 'Bob');
Output: Screaming // Only one time.

Because the Event loop fetches events from Event Queue and sends them to call stack one by one.
And Event Queue is FIFO (First-In-First-Out)

How to execute an async task with socket.io and node.js?

When I receive an "on" event on the server side, I want to start a task in parallel so it does not block the current event loop thread. Is it possible to do so? How?
I don't want to block the server side loop and I want to be able to send back a message to the client once the task is done, something such as:
client.on('execute-parallel-task', function(msg) {
setTimeout(function() {
// do something that takes a while
client.emit('finished-that-task');
},0);
// this block should return asap, not waiting for the previous call
});
I am not sure if setTimeout will do the job.

It depends what the takes a while is. If it takes a while asynchronously (you can tell because you'll have to register a callback or complete handler), and takes a while because it's blocked on something like IO, rather than CPU bound, it'll inherently be parallel.
If however, its something synchronous or CPU bound, whilst you can use setTimeout, setImmediate etc. to send back a message immediately, once the handler for setTimeout or setImmediate executes, your single thread of execution will be stuck handling that; you're not really fixing the problem, merely deferring it.
To exhibit true parallel behaviour, you'll need to launch a child process. You can use the message passing functionality to notify your worker what work to do, and to notify the parent process once the work is complete.
var cp = require('child_process');
var child = cp.fork(__dirname + '/my-child-worker.js');
n.on('message', function(m) {
if (m === "done") {
// Whey!
}
});
n.send(/* Job id, or something */);
Then in my-child-worker.js;
process.on('message', function (m) {
switch (m) {
case 'get-x':
// blah
break;
// other jobs
}
process.send('done');
});

you do not need the setTimeout.
Your function(msg) will be called once the execute parallel task finishes.
if you are designing a task to run in an async manner, you can look at something like the async lib for node.js
Async Node JS Link

Run NodeJS event loop / wait for child process to finish

I first tried a general description of the problem, then some more detail why the usual approaches don't work. If you would like to read these abstracted explanations go on. In the end I explain the greater problem and the specific application, so if you would rather read that, jump to "Actual application".
I am using a node.js child-process to do some computationally intensive work. The parent process does it's work but at some point in the execution it reaches a point where it must have the information from the child process before continuing. Therefore, I am looking for a way to wait for the child-process to finish.
My current setup looks somewhat like this:
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
importantData = msg.data;
} else if (msg.type === "error") {
importantData = null;
} else {
throw new Error("Unknown message from dataGenerator!");
}
});
and somewhere else
function getImportantData() {
while (importantData === undefined) {
// wait for the importantDataGenerator to finish
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
So when the parent process starts, it executes the first bit of code, spawning a child process to calculate the data and goes on doing it's own bit of work. When the time comes that it needs the result from the child process to continue it calls getImportantData(). So the idea is that getImportantData() blocks until the data is calculated.
However, the way I used doesn't work. I think this is due to me preventing the event loop from executing by using the while-loop. And since the Event-Loop does not execute no message from the child-process can be received and thus the condition of the while-loop can not change, making it an infinite loop.
Of course, I don't really want to use this kind of while-loop. What I would rather do is tell node.js "execute one iteration of the event loop, then get back to me". I would do this repeatedly, until the data I need was received and then continue the execution where I left of by returning from the getter.
I realize that his poses the danger of reentering the same function several times, but the module I want to use this in does almost nothing on the event loop except for waiting for this message from the child process and sending out other messages reporting it's progress, so that shouldn't be a problem.
Is there way to execute just one iteration of the event loop in Node.js? Or is there another way to achieve something similar? Or is there a completely different approach to achieve what I'm trying to do here?
The only solution I could think of so far is to change the calculation in such a way that I introduce yet another process. In this scenario, there would be the process calculating the important data, a process calculating the bits of data for which the important data is not needed and a parent process for these two, which just waits for data from the two child-processes and combines the pieces when they arrive. Since it does not have to do any computationally intensive work itself, it can just wait for events from the event loop (=messages) and react to them, forwarding the combined data as necessary and storing pieces of data that cannot be combined yet.
However this introduces yet another process and even more inter-process communication, which introduces more overhead, which I would like to avoid.
Edit
I see that more detail is needed.
The parent process (let's call it process 1) is itself a process spawned by another process (process 0) to do some computationally intensive work. Actually, it just executes some code over which I don't have control, so I cannot make it work asynchronously. What I can do (and have done) is make the code that is executed regularly call a function to report it's progress and provided partial results. This progress report is then send back to the original process via IPC.
But in rare cases the partial results are not correct, so they have to be modified. To do so I need some data I can calculate independently from the normal calculation. However, this calculation could take several seconds; thus, I start another process (process 2) to do this calculation and provide the result to process 1, via an IPC message. Now process 1 and 2 are happily calculating there stuff, and hopefully the corrective data calculated by process 2 is finished before process 1 needs it. But sometimes one of the early results of process 1 needs to be corrected and in that case I have to wait for process 2 to finish its calculation. Blocking the event loop of process 1 is theoretically not a problem, since the main process (process 0) would not be be affected by it. The only problem is, that by preventing the further execution of code in process 1 I am also blocking the event loop, which prevents it from ever receiving the result from process 2.
So I need to somehow pause the further execution of code in process 1 without blocking the event loop. I was hoping that there was a call like process.runEventLoopIteration that executes an iteration of the event loop and then returns.
I would then change the code like this:
function getImportantData() {
while (importantData === undefined) {
process.runEventLoopIteration();
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
thus executing the event loop until I have received the necessary data but NOT continuing the execution of the code that called getImportantData().
Basically what I'm doing in process 1 is this:
function callback(partialDataMessage) {
if (partialDataMessage.needsCorrection) {
getImportantData();
// use data to correct message
process.send(correctedMessage); // send corrected result to main process
} else {
process.send(partialDataMessage); // send unmodified result to main process
}
}
function executeCode(code) {
run(code, callback); // the callback will be called from time to time when the code produces new data
// this call is synchronous, run is blocking until the calculation is finished
// so if we reach this point we are done
// the only way to pause the execution of the code is to NOT return from the callback
}
Actual application/implementation/problem
I need this behaviour for the following application. If you have a better approach to achieve this feel free to propose it.
I want to execute arbitrary code and be notified about what variables it changes, what functions are called, what exceptions occur etc. I also need the location of these events in the code to be able to display the gathered information in the UI next to the original code.
To achieve this, I instrument the code and insert callbacks into it. I then execute the code, wrapping the execution in a try-catch block. Whenever the callback is called with some data about the execution (e.g. a variable change) I send a message to the main process telling it about the change. This way, the user is notified about the execution of the code, while it is running. The location information for the events generated by these callbacks is added to the callback call during the instrumentation, so that is not a problem.
The problem appears, when an exception occurs. I also want to notify the user about exceptions in the tested code. Therefore, I wrapped the execution of the code in a try-catch and any exceptions that get out of the execution are caught and send to the user interface. But the location of the errors is not correct. An Error object created by node.js has a complete call stack so it knows where it occurred. But this location if relative to the instrumented code, so I cannot use this location information as is, to display the error next to the original code. I need to transform this location in the instrumented code into a location in the original code. To do so, after instrumenting the code, I calculate a source map to map locations in the instrumented code to locations in the original code. However, this calculation might take several seconds. So, I figured, I would start a child process to calculate the source map, while the execution of the instrumented code is already started. Then, when an exception occurs, I check whether the source map has already been calculated, and if it hasn't I wait for the calculation to finish to be able to correct the location.
Since the code to be executed and watched can be completely arbitrary I cannot trivially rewrite it to be asynchronous. I only know that it calls the provided callback, because I instrumented the code to do so. I also cannot just store the message and return to continue the execution of the code, checking back during the next call whether the source map has been finished, because continuing the execution of the code would also block the event-loop, preventing the calculated source map from ever being received in the execution process. Or if it is received, then only after the code to execute has completely finished, which could be quite late or never (if the code to execute contains an infinite loop). But before I receive the sourceMap I cannot send further updates about the execution state. Combined, this means I would only be able to send the corrected progress messages after the code to execute has finished (which might be never) which completely defeats the purpose of the program (to enable the programmer to watch what the code does, while it executes).
Temporarily surrendering control to the event loop would solve this problem. However, that does not seem to be possible. The other idea I have is to introduce a third process which controls both the execution process and the sourceMapGeneration process. It receives progress messages from the execution process and if any of the messages needs correction it waits for the sourceMapGeneration process. Since the processes are independent, the controlling process can store the received messages and wait for the sourceMapGeneration process while the execution process continues executing, and as soon as it receives the source map, it corrects the messages and sends all of them off.
However, this would not only require yet another process (overhead) it also means I have to transfer the code once more between processes and since the code can have thousands of line that in itself can take some time, so I would like to move it around as little as possible.
I hope this explains, why I cannot and didn't use the usual "asynchronous callback" approach.

Adding a third ( :) ) solution to your problem after you clarified what behavior you seek I suggest using Fibers.
Fibers let you do co-routines in nodejs. Coroutines are functions that allow multiple entry/exit points. This means you will be able to yield control and resume it as you please.
Here is a sleep function from the official documentation that does exactly that, sleep for a given amount of time and perform actions.
function sleep(ms) {
var fiber = Fiber.current;
setTimeout(function() {
fiber.run();
}, ms);
Fiber.yield();
}
Fiber(function() {
console.log('wait... ' + new Date);
sleep(1000);
console.log('ok... ' + new Date);
}).run();
console.log('back in main');
You can place the code that does the waiting for the resource in a function, causing it to yield and then run again when the task is done.
For example, adapting your example from the question:
var pausedExecution, importantData;
function getImportantData() {
while (importantData === undefined) {
pausedExecution = Fiber.current;
Fiber.yield();
pausedExecution = undefined;
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have proper data now
return importantData;
}
}
function callback(partialDataMessage) {
if (partialDataMessage.needsCorrection) {
var theData = getImportantData();
// use data to correct message
process.send(correctedMessage); // send corrected result to main process
} else {
process.send(partialDataMessage); // send unmodified result to main process
}
}
function executeCode(code) {
// setup child process to calculate the data
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
importantData = msg.data;
} else if (msg.type === "error") {
importantData = null;
} else {
throw new Error("Unknown message from dataGenerator!");
}
if (pausedExecution) {
// execution is waiting for the data
pausedExecution.run();
}
});
// wrap the execution of the code in a Fiber, so it can be paused
Fiber(function () {
runCodeWithCallback(code, callback); // the callback will be called from time to time when the code produces new data
// this callback is synchronous and blocking,
// but it will yield control to the event loop if it has to wait for the child-process to finish
}).run();
}
Good luck! I always say it is better to solve one problem in 3 ways than solving 3 problems the same way. I'm glad we were able to work out something that worked for you. Admittingly, this was a pretty interesting question.

The rule of asynchronous programming is, once you've entered asynchronous code, you must continue to use asynchronous code. While you can continue to call the function over and over via setImmediate or something of the sort, you still have the issue that you're trying to return from an asynchronous process.
Without knowing more about your program, I can't tell you exactly how you should structure it, but by and large the way to "return" data from a process that involves asynchronous code is to pass in a callback; perhaps this will put you on the right track:
function getImportantData(callback) {
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
if (msg.type === "result") {
callback(null, msg.data);
} else if (msg.type === "error") {
callback(new Error("Data could not be generated."));
} else {
callback(new Error("Unknown message from sourceMapGenerator!"));
}
});
}
You would then use this function like this:
getImportantData(function(error, data) {
if (error) {
// handle the error somehow
} else {
// `data` is the data from the forked process
}
});
I talk about this in a bit more detail in one of my screencasts, Thinking Asynchronously.

What you are running into is a very common scenario that skilled programmers who are starting with nodejs often struggle with.
You're correct. You can't do this the way you are attempting (loop).
The main process in node.js is single threaded and you are blocking the event loop.
The simplest way to resolve this is something like:
function getImportantData() {
if(importantData === undefined){ // not set yet
setImmediate(getImportantData); // try again on the next event loop cycle
return; //stop this attempt
}
if (importantData === null) {
throw new Error("Data could not be generated.");
} else {
// we should have a proper data now
return importantData;
}
}
What we are doing, is that the function is re-attempting to process the data on the next iteration of the event loop using setImmediate.
This introduces a new problem though, your function returns a value. Since it will not be ready, the value you are returning is undefined. So you have to code reactively. You need to tell your code what to do when the data arrives.
This is typically done in node with a callback
function getImportantData(err,whenDone) {
if(importantData === undefined){ // not set yet
setImmediate(getImportantData.bind(null,whenDone)); // try again on the next event loop cycle
return; //stop this attempt
}
if (importantData === null) {
err("Data could not be generated.");
} else {
// we should have a proper data now
whenDone(importantData);
}
}
This can be used in the following way
getImportantData(function(err){
throw new Error(err); // error handling function callback
}, function(data){ //this is whenDone in our case
//perform actions on the important data
})

Your question (updated) is very interesting, it appears to be closely related to a problem I had with asynchronously catching exceptions. (Also Brandon and Ihad an interesting discussion with me about it! It's a small world)
See this question on how to catch exceptions asynchronously. The key concept is that you can use (assuming nodejs 0.8+) nodejs domains to constrain the scope of an exception.
This will allow you to easily get the location of the exception since you can surround asynchronous blocks with atry/catch. I think this should solve the bigger issue here.
You can find the relevant code in the linked question. The usage is something like:
atry(function() {
setTimeout(function(){
throw "something";
},1000);
}).catch(function(err){
console.log("caught "+err);
});
Since you have access to the scope of atry you can get the stack trace there which would let you skip the more complicated source-map usage.
Good luck!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string