How to execute an async task with socket.io and node.js? - node.js

When I receive an "on" event on the server side, I want to start a task in parallel so it does not block the current event loop thread. Is it possible to do so? How?
I don't want to block the server side loop and I want to be able to send back a message to the client once the task is done, something such as:
client.on('execute-parallel-task', function(msg) {
setTimeout(function() {
// do something that takes a while
client.emit('finished-that-task');
},0);
// this block should return asap, not waiting for the previous call
});
I am not sure if setTimeout will do the job.

It depends what the takes a while is. If it takes a while asynchronously (you can tell because you'll have to register a callback or complete handler), and takes a while because it's blocked on something like IO, rather than CPU bound, it'll inherently be parallel.
If however, its something synchronous or CPU bound, whilst you can use setTimeout, setImmediate etc. to send back a message immediately, once the handler for setTimeout or setImmediate executes, your single thread of execution will be stuck handling that; you're not really fixing the problem, merely deferring it.
To exhibit true parallel behaviour, you'll need to launch a child process. You can use the message passing functionality to notify your worker what work to do, and to notify the parent process once the work is complete.
var cp = require('child_process');
var child = cp.fork(__dirname + '/my-child-worker.js');
n.on('message', function(m) {
if (m === "done") {
// Whey!
}
});
n.send(/* Job id, or something */);
Then in my-child-worker.js;
process.on('message', function (m) {
switch (m) {
case 'get-x':
// blah
break;
// other jobs
}
process.send('done');
});

you do not need the setTimeout.
Your function(msg) will be called once the execute parallel task finishes.
if you are designing a task to run in an async manner, you can look at something like the async lib for node.js
Async Node JS Link

Related

nodejs express . For loop is blocking my simple server [duplicate]

The following example is given in a Node.js book:
var open = false;
setTimeout(function() {
open = true
}, 1000)
while (!open) {
console.log('wait');
}
console.log('open sesame');
Explaining why the while loop blocks execution, the author says:
Node will never execute the timeout callback because the event loop is
stuck on this while loop started on line 7, never giving it a chance
to process the timeout event!
However, the author doesn't explain why this happens in the context of the event loop or what is really going on under the hood.
Can someone elaborate on this? Why does node get stuck? And how would one change the above code, whilst retaining the while control structure so that the event loop is not blocked and the code will behave as one might reasonably expect; wait
will be logged for only 1 second before the setTimeout fires and the process then exits after logging 'open sesame'.
Generic explanations such as the answers to this question about IO and event loops and callbacks do not really help me rationalise this. I'm hoping an answer which directly references the above code will help.
It's fairly simple really. Internally, node.js consists of this type of loop:
Get something from the event queue
Run whatever task is indicated and run it until it returns
When the above task is done, get the next item from the event queue
Run whatever task is indicated and run it until it returns
Rinse, lather, repeat - over and over
If at some point, there is nothing in the event queue, then go to sleep until something is placed in the event queue or until it's time for a timer to fire.
So, if a piece of Javascript is sitting in a while() loop, then that task is not finishing and per the above sequence, nothing new will be picked out of the event queue until that prior task is completely done. So, a very long or forever running while() loop just gums up the works. Because Javascript only runs one task at a time (single threaded for JS execution), if that one task is spinning in a while loop, then nothing else can ever execute.
Here's a simple example that might help explain it:
var done = false;
// set a timer for 1 second from now to set done to true
setTimeout(function() {
done = true;
}, 1000);
// spin wait for the done value to change
while (!done) { /* do nothing */}
console.log("finally, the done value changed!");
Some might logically think that the while loop will spin until the timer fires and then the timer will change the value of done to true and then the while loop will finish and the console.log() at the end will execute. That is NOT what will happen. This will actually be an infinite loop and the console.log() statement will never be executed.
The issue is that once you go into the spin wait in the while() loop, NO other Javascript can execute. So, the timer that wants to change the value of the done variable cannot execute. Thus, the while loop condition can never change and thus it is an infinite loop.
Here's what happens internally inside the JS engine:
done variable initialized to false
setTimeout() schedules a timer event for 1 second from now
The while loop starts spinning
1 second into the while loop spinning, the timer is ready to fire, but it won't be able to actually do anything until the interpreter gets back to the event loop
The while loop keeps spinning because the done variable never changes. Because it continues to spin, the JS engine never finishes this thread of execution and never gets to pull the next item from the event queue or run the pending timer.
node.js is an event driven environment. To solve this problem in a real world application, the done flag would get changed on some future event. So, rather than a spinning while loop, you would register an event handler for some relevant event in the future and do your work there. In the absolute worst case, you could set a recurring timer and "poll" to check the flag ever so often, but in nearly every single case, you can register an event handler for the actual event that will cause the done flag to change and do your work in that. Properly designed code that knows other code wants to know when something has changed may even offer its own event listener and its own notification events that one can register an interest in or even just a simple callback.
This is a great question but I found a fix!
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
done = true
}, 1000)
while (!done) {
sleep(100)
console.log('sleeping')
}
console.log('finally, the done value changed!')
I think it works because system-sleep is not a spin wait.
There is another solution. You can get access to event loop almost every cycle.
let done = false;
setTimeout(() => {
done = true
}, 5);
const eventLoopQueue = () => {
return new Promise(resolve =>
setImmediate(() => {
console.log('event loop');
resolve();
})
);
}
const run = async () => {
while (!done) {
console.log('loop');
await eventLoopQueue();
}
}
run().then(() => console.log('Done'));
Node is a single serial task. There is no parallelism, and its concurrency is IO bound. Think of it like this: Everything is running on a single thread, when you make an IO call that is blocking/synchronous your process halts until the data is returned; however say we have a single thread that instead of waiting on IO(reading disk, grabbing a url, etc) your task continues on to the next task, and after that task is complete it checks that IO. This is basically what node does, its an "event-loop" its polling IO for completion(or progress) on a loop. So when a task does not complete(your loop) the event loop does not progress. To put it simply.
because timer needs to comeback and is waiting loop to finish to add to the queue, so although the timeout is in a separate thread, and may indeed finsihed the timer, but the "task" to set done = true is waiting on that infinite loop to finish
var open = false;
const EventEmitter = require("events");
const eventEmitter = new EventEmitter();
setTimeout(function () {
open = true;
eventEmitter.emit("open_var_changed");
}, 1000);
let wait_interval = setInterval(() => {
console.log("waiting");
}, 100);
eventEmitter.on("open_var_changed", () => {
clearInterval(wait_interval);
console.log("open var changed to ", open);
});
this exemple works and you can do setInterval and check if the open value changed inside it and it will work

while loop in sync function in node js [duplicate]

The following example is given in a Node.js book:
var open = false;
setTimeout(function() {
open = true
}, 1000)
while (!open) {
console.log('wait');
}
console.log('open sesame');
Explaining why the while loop blocks execution, the author says:
Node will never execute the timeout callback because the event loop is
stuck on this while loop started on line 7, never giving it a chance
to process the timeout event!
However, the author doesn't explain why this happens in the context of the event loop or what is really going on under the hood.
Can someone elaborate on this? Why does node get stuck? And how would one change the above code, whilst retaining the while control structure so that the event loop is not blocked and the code will behave as one might reasonably expect; wait
will be logged for only 1 second before the setTimeout fires and the process then exits after logging 'open sesame'.
Generic explanations such as the answers to this question about IO and event loops and callbacks do not really help me rationalise this. I'm hoping an answer which directly references the above code will help.
It's fairly simple really. Internally, node.js consists of this type of loop:
Get something from the event queue
Run whatever task is indicated and run it until it returns
When the above task is done, get the next item from the event queue
Run whatever task is indicated and run it until it returns
Rinse, lather, repeat - over and over
If at some point, there is nothing in the event queue, then go to sleep until something is placed in the event queue or until it's time for a timer to fire.
So, if a piece of Javascript is sitting in a while() loop, then that task is not finishing and per the above sequence, nothing new will be picked out of the event queue until that prior task is completely done. So, a very long or forever running while() loop just gums up the works. Because Javascript only runs one task at a time (single threaded for JS execution), if that one task is spinning in a while loop, then nothing else can ever execute.
Here's a simple example that might help explain it:
var done = false;
// set a timer for 1 second from now to set done to true
setTimeout(function() {
done = true;
}, 1000);
// spin wait for the done value to change
while (!done) { /* do nothing */}
console.log("finally, the done value changed!");
Some might logically think that the while loop will spin until the timer fires and then the timer will change the value of done to true and then the while loop will finish and the console.log() at the end will execute. That is NOT what will happen. This will actually be an infinite loop and the console.log() statement will never be executed.
The issue is that once you go into the spin wait in the while() loop, NO other Javascript can execute. So, the timer that wants to change the value of the done variable cannot execute. Thus, the while loop condition can never change and thus it is an infinite loop.
Here's what happens internally inside the JS engine:
done variable initialized to false
setTimeout() schedules a timer event for 1 second from now
The while loop starts spinning
1 second into the while loop spinning, the timer is ready to fire, but it won't be able to actually do anything until the interpreter gets back to the event loop
The while loop keeps spinning because the done variable never changes. Because it continues to spin, the JS engine never finishes this thread of execution and never gets to pull the next item from the event queue or run the pending timer.
node.js is an event driven environment. To solve this problem in a real world application, the done flag would get changed on some future event. So, rather than a spinning while loop, you would register an event handler for some relevant event in the future and do your work there. In the absolute worst case, you could set a recurring timer and "poll" to check the flag ever so often, but in nearly every single case, you can register an event handler for the actual event that will cause the done flag to change and do your work in that. Properly designed code that knows other code wants to know when something has changed may even offer its own event listener and its own notification events that one can register an interest in or even just a simple callback.
This is a great question but I found a fix!
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
done = true
}, 1000)
while (!done) {
sleep(100)
console.log('sleeping')
}
console.log('finally, the done value changed!')
I think it works because system-sleep is not a spin wait.
There is another solution. You can get access to event loop almost every cycle.
let done = false;
setTimeout(() => {
done = true
}, 5);
const eventLoopQueue = () => {
return new Promise(resolve =>
setImmediate(() => {
console.log('event loop');
resolve();
})
);
}
const run = async () => {
while (!done) {
console.log('loop');
await eventLoopQueue();
}
}
run().then(() => console.log('Done'));
Node is a single serial task. There is no parallelism, and its concurrency is IO bound. Think of it like this: Everything is running on a single thread, when you make an IO call that is blocking/synchronous your process halts until the data is returned; however say we have a single thread that instead of waiting on IO(reading disk, grabbing a url, etc) your task continues on to the next task, and after that task is complete it checks that IO. This is basically what node does, its an "event-loop" its polling IO for completion(or progress) on a loop. So when a task does not complete(your loop) the event loop does not progress. To put it simply.
because timer needs to comeback and is waiting loop to finish to add to the queue, so although the timeout is in a separate thread, and may indeed finsihed the timer, but the "task" to set done = true is waiting on that infinite loop to finish
var open = false;
const EventEmitter = require("events");
const eventEmitter = new EventEmitter();
setTimeout(function () {
open = true;
eventEmitter.emit("open_var_changed");
}, 1000);
let wait_interval = setInterval(() => {
console.log("waiting");
}, 100);
eventEmitter.on("open_var_changed", () => {
clearInterval(wait_interval);
console.log("open var changed to ", open);
});
this exemple works and you can do setInterval and check if the open value changed inside it and it will work

Call promisify() on a non-callback function: "interesting" results in node. Why?

I discovered an odd behaviour in node's promisify() function and I cannot work out why it's doing what it's doing.
Consider the following script:
#!/usr/bin/env node
/**
* Module dependencies.
*/
var http = require('http')
var promisify = require('util').promisify
;(async () => {
try {
// UNCOMMENT THIS, AND NODE WILL QUIT
// var f = function () { return 'Straight value' }
// var fP = promisify(f)
// await fP()
/**
* Create HTTP server.
*/
var server = http.createServer()
/**
* Listen on provided port, on all network interfaces.
*/
server.listen(3000)
server.on('error', (e) => { console.log('Error:', e); process.exit() })
server.on('listening', () => { console.log('Listening') })
} catch (e) {
console.log('ERROR:', e)
}
})()
console.log('OUT OF THE ASYNC FUNCTION')
It's a straightforward self-invoking function that starts a node server.
And that's fine.
NOW... if you uncomment the lines under "UNCOMMENT THIS", node will quit without running the server.
I KNOW that I am using promisify() on a function that does not call the callback, but returns a value instead. So, I KNOW that that is in itself a problem.
However... why is node just quitting...?
This was really difficult to debug -- especially when you have something more complex that a tiny script.
If you change the function definition to something that actually calls a callback:
var f = function (cb) { setTimeout( () => { return cb( null, 'Straight value') }, 2000) }
Everything works as expected...
UPDATE
Huge simplification:
function f () {
return new Promise(resolve => {
console.log('AH')
})
}
f().then(() => {
console.log('Will this happen...?')
})
Will only print "AH"!
Call promisify() on a non-callback function: “interesting” results in node. Why?
Because you allow node.js to go to the event loop with nothing to do. Since there are no live asynchronous operations in play and no more code to run, node.js realizes that there is nothing else to do and no way for anything else to run so it exits.
When you hit the await and node.js goes back to the event loop, there is nothing keeping node.js running so it exits. There are no timers or open sockets or any of those types of things that keep node.js running so the node.js auto-exit-detection logic says that there's nothing else to do so it exits.
Because node.js is an event driven system, if your code returns back to the event loop and there are no asynchronous operations of any kind in flight (open sockets, listening servers, timers, file I/O operations, other hardware listeners, etc...), then there is nothing running that could ever insert any events in the event queue and the queue is currently empty. As such, node.js realizes that there can never be any way to run any more code in this app so it exits. This is an automatic behavior built into node.js.
A real async operation inside of fp() would have some sort of socket or timer or something open that keeps the process running. But because yours is fake, there's nothing there and nothing to keep node.js running.
If you put a setTimeout() for 1 second inside of f(), you will see that the process exit happens 1 second later. So, the process exit has nothing to do with the promise. It has to do with the fact that you've gone back to the event loop, but you haven't started anything yet that would keep node.js running.
Or, if you put a setInterval() at the top of your async function, you will similarly find that the process does not exit.
So, this would similarly happen if you did this:
var f = function () { return 'Straight value' }
var fP = promisify(f);
fP().then(() => {
// start your server here
});
Or this:
function f() {
return new Promise(resolve => {
// do nothing here
});
}
f().then(() => {
// start your server here
});
The issue isn't with the promisify() operation. It's because you are waiting on a non-existent async operation and thus node.js has nothing to do and it notices there's nothing to do so it auto-exits. Having an open promise with a .then() handler is not something that keeps node.js running. Rather there needs to be some active asynchronous operation (timer, network socket, listening server, file I/O operation underway, etc...) to keep node.js running.
In this particular case, node.js is essentially correct. Your promise will never resolve, nothing else is queued to ever run and thus your server will never get started and no other code in your app will ever run, thus it is not actually useful to keep running. There is nothing to do and no way for your code to actually do anything else.
If you change the function definition to something that actually calls a callback:
That's because you used a timer so node.js has something to actually do while waiting for the promise to resolve. A running timer that has not had .unref() called on it will prevent auto-exit.
Worth reading: How does a node.js process know when to stop?
FYI, you can "turn off" or "bypass" the node.js auto-exit logic by just adding this anywhere in your startup code:
// timer that fires once per day
let foreverInterval = setInterval(() => {
// do nothing
}, 1000 * 60 * 60 * 24);
That always gives node.js something to do so it will never auto-exit. Then when you do want your process to exit, you could either call clearInterval(foreverInterval) or just force things with process.exit(0).

Can I write a real async callback in Nodejs?

This is a normal example to read a file:
var fs = require('fs');
fs.readFile('./gparted-live-0.18.0-2-i486.iso', function (err, data) {
console.log(data.length);
});
console.log('All done.');
the code above outputs:
All done.
187695104
whereas this is my own version of a callback, I hope it could be async like the file reading code above, but it is not:
var f = function(cb) {
cb();
};
f(function() {
var i = 0;
// Do some very long job.
while(++i < (1<<30)) {}
console.log('Cb comes back.')
});
console.log('All done.');
the code above outputs:
Cb comes back.
All done.
Up till now, it's clear that in the first version of the file reading code, All done. is always printed before the file is read. However, in the second my home brewed version of code, All done. is always waiting until the very long job is done.
So what on earth is the magic that makes fs.readFile's callback an async call back while mine is not?
var f = function(cb) {
cb();
};
Is not async because it invokes cb immediately.
I think you want
var f = function(cb) {
setImmediate(function(){ cb(); });
};
In your example the while-loop is occupying the event-loop therefore the function call to console.log('All done.') is queued on the stack. When the event-loop becomes unblocked the subsequent function calls will be called in sequence.
In Mastering Node.js by Sandro Pasquali - Chapter 2, he discusses deferred execution and the event-loop in order to avoid the issue of the event-loop taking hold and blocking execution. I recommend reading that chapter in order to better understand this non-intuitive way of working in Node.js.
From Mastering Node.js...
Node processes JavaScript instructions using a single thread. Within
your JavaScript program no two operations will ever execute at exactly
the same moment, as might happen in a multithreaded environment.
Understanding this fact is essential to understanding how a Node
program, or process, is designed and runs.
The use of setImmediate() can remedy this issue.
You can use setImmediate() to defer the execution of code until the next cycle of the event loop, which I think accomplishes what you want:
var f = function(cb) {
cb();
};
f(function() {
setImmediate(function() {
var i = 0;
// Do some very long job.
while(++i < (1<<30)) {}
console.log('Cb comes back.')
});
});
console.log('All done.');
The documentation for setImmediate explains the difference between process.nextTick and setImmediate thusly:
Immediates are queued in the order created, and are popped off the queue once per loop iteration. This is different from process.nextTick which will execute process.maxTickDepth queued callbacks per iteration. setImmediate will yield to the event loop after firing a queued callback to make sure I/O is not being starved. While order is preserved for execution, other I/O events may fire between any two scheduled immediate callbacks.
Edit: Update answer based on #generalhenry's comment.

execute a function parallely, while completing execution of rest of the code

I have a code snippet in nodejs like this:
in every 2 sec, foo() will be called.
function foo()
{
while (count < 10)
{
doSometing()
count ++;``
}
}
doSomething()
{
...
}
The limitation is, foo() has no callback.
How to make while loop execute and foo() completes without waiting for dosomething() to complete (call dosomething() and proceed), and dosomething() executes parallely?
I think, what you want is:
function foo()
{
while (count < 10)
{
process.nextTick(doSometing);
count ++;
}
}
process.nextTick will schedule the execution of doSometing on the next tick of the event loop. So, instead of switching immediately to doSometing this code will just schedule the execution and complete foo first.
You may also try setTimeout(doSometing,0) and setImmediate(doSometing). They'll allow I/O calls to occur before doSometing will be executed.
Passing arguments to doSomething
If you want to pass some parameters to doSomething, then it's best to ensure they'll be encapsulated and won't change before doSomething will be executed:
setTimeout(doSometing.bind(null,foo,bar),0);
In this case doSometing will be called with correct arguments even if foo and bar will be changed or deleted. But this won't work in case if foo is an object and you changes one of its properties.
What the alternatives are?
If you want doSomething to be executed in parallel (not just asynchronous, but actually in parallel), then you may be interested in some job-processing solution. I recommend you to look at kickq:
var kickq = require('kickq');
kickq.process('some_job', function (jobItem, data, cb) {
doSomething(data);
cb();
});
// ...
function foo()
{
while (count < 10)
{
kickq.create('some_job', data);
count ++;
}
}
kickq.process will create a separate process for processing your jobs. So, kickq.create will just register the job to be processed.
kickq uses redis to queue jobs and it won't work without it.
Using node.js build-in modules
Another alternative is building your own job-processor using Child Process. The resulting code may look something like this:
var fork = require('child_process').fork,
child = fork(__dirname + '/do-something.js');
// ...
function foo()
{
while (count < 10)
{
child.send(data);
count ++;
}
}
do-something.js here is a separate .js file with doSomething logic:
process.on('message', doSomething);
The actual code may be more complicated.
Things you should be aware of
Node.js is single-threaded, so it executes only one function at a time. It also can't utilize more then one CPU.
Node.js is asynchronous, so it's capable of processing multiple functions at once by switching between them. It's really efficient when dealing with functions with lots of I/O calls, because it's newer blocks. So, when one function waits for the response from DB, another function is executed. But node.js is not a good choice for blocking tasks with heavy CPU utilization.
It's possible to do real parallel calculations in node.js using modules like child_process and cluster. child_process allows you to start a new node.js process. It also creates a communication channel between parent and child processes. Cluster allows you to run a cluster of identical processes. It's really handy when you're dealing with http requests, because cluster can distribute them randomly between workers. So, it's possible to create a cluster of workers processing your data in parallel, though generally node.js is single-threaded.

Resources