I need to run multiple parallel tasks (infinite loops) without blocking each other in node.js. I'm trying now to do:
const test = async () => {
  let a = new Promise(async res => {
    while (true) {
      console.log('test1')
    }
  })
  let b = new Promise(async res => {
    while (true) {
      console.log('test2')
    }
  })
}
test();
But it does not work, only 'test1' appears in the console. What am I doing wrong?
You can't. You need to stop thinking in terms of loops and start thinking in terms of the event loop instead.
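To see why only 'test1' ever appears: the function passed to new Promise() (the executor) runs synchronously, inside the constructor call itself, so the first while (true) never returns and the second promise is never even created. A minimal sketch of that ordering (the order array exists only for illustration):

```javascript
// Records the order in which the pieces of code run.
const order = [];

new Promise((resolve) => {
  // The executor runs immediately, before `new Promise(...)` returns.
  order.push('executor');
  resolve();
}).then(() => {
  // Runs later, only after the current synchronous code has finished.
  order.push('then callback');
});

order.push('after constructor');

// At this point the 'then callback' entry has not been added yet.
console.log(order);
```

If the executor contained an infinite loop, the line after the constructor would never be reached.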
There are two functions that schedule other functions to run in the future and can be used for this: setTimeout() and setInterval(). Note that in Node.js there is also setImmediate(), but I suggest you avoid using it until you really know what you are doing, because setImmediate() has the potential to block I/O.
Note that neither setTimeout() nor setImmediate() is Promise-aware.
The minimal code example that does what you want would be something like:
const test = () => {
  setInterval(() => {
    console.log('test1')
  }, 10) // execute the above code every 10ms
  setInterval(() => {
    console.log('test2')
  }, 10) // execute the above code every 10ms
}
test();
Or if you really want to run the two pieces of code as fast as possible you can pass 0 as the timeout. It won't run every 0ms but just as fast as the interpreter can. The minimal interval differs depending on OS and how busy your CPU is. For example Windows can run setInterval() down to every 1ms while Linux typically won't run any faster than 10ms. This is down to how fast the OS tick is (or jiffy in Linux terminology). Linux is a server oriented OS so sets its tick bigger to give it higher throughput (each process gets the CPU for longer thus can finish long tasks faster) while Windows is a UI oriented (some would say game oriented) OS so sets its tick smaller for smoother UI experience.
To get something closer to the style of code you want you can use a promisified setTimeout() and await it:
function delay (x) {
  return new Promise((done) => setTimeout(done, x));
}
const test = async () => {
  let a = async () => {
    while (true) {
      console.log('test1')
      await delay(0) // yield to the event loop so other tasks can run
    }
  }
  let b = async () => {
    while (true) {
      console.log('test2')
      await delay(0) // yield to the event loop so other tasks can run
    }
  }
  a();
  b();
}
test();
However, this is no longer the minimal possible working code, though not by much.
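As an aside, newer Node.js versions ship promise-returning timers in the built-in timers/promises module, so a hand-written delay() helper is optional there. A rough sketch of the same idea, with bounded loops so that it terminates (the names sleep, ticker and demo are made up for this example):

```javascript
const { setTimeout: sleep } = require('timers/promises');

// Logs `count` entries, yielding to the event loop after each one.
async function ticker(label, count, log) {
  for (let i = 0; i < count; i++) {
    log.push(`${label}:${i}`);
    await sleep(0); // hand control back so the other ticker can run
  }
}

// Starts two tickers side by side and resolves with the combined log.
async function demo() {
  const log = [];
  await Promise.all([ticker('test1', 3, log), ticker('test2', 3, log)]);
  return log;
}

demo().then((log) => console.log(log));
```

The entries of the two tickers end up interleaved in the log, because each await gives the other loop a turn.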
Note: After writing the promisified example above, I was suddenly reminded of the programming style on early cooperative multitasking OSes. I think Windows 3.1 did this, though I never wrote anything on it. It reminds me most of Mac OS Classic. You had to periodically call the WaitNextEvent() function to pass control of the CPU back to the OS so that the OS could run other programs. If you forgot to do that (or your program got stuck in a long loop with no WaitNextEvent() call), your entire computer would freeze. This is exactly what you are experiencing with node, where the entire JavaScript engine "freezes" and executes only one loop while ignoring all other code.
As I understand it, Node.js uses an event-loop-based mechanism: the main Node.js code is single-threaded and simply executes the functions and callbacks on the call stack, so at any point in time only one thread (the main thread) is running JavaScript. Therefore, if I define a data structure such as a queue or a map, I should not need to worry about locking it or blocking other code from accessing it while one of the callbacks is using it, because:
the main thread keeps executing a callback until it is done (unlike a multithreaded application, where a thread can run for a bit and then be context-switched out to give another thread a chance), and
no two callbacks are executed in parallel.
Example:
let arr = [];
const fun = async (website) => {
  const response = await fetch(website, { method: "POST" });
  // Assumes the response body is a JSON array.
  const result = await response.json();
  arr.push(...result);
}
const getData = async () => {
  let promises = [];
  for (let i = 0; i < 10; i++) {
    promises.push(fun(`website${i}`));
  }
  await Promise.all(promises);
}
So my question is: do I ever need to lock a resource on the Node.js side of things?
I do understand that libuv is multithreaded, so calling async operations that go through libuv (e.g. writing to a file) could cause problems if I don't lock.
But my question is whether I ever need a lock inside the Node.js runtime itself.
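A hedged illustration of where this reasoning holds and where it stops: code between two await points runs without interruption, but a read-modify-write that spans an await can still lose updates, because other callbacks run while the function is suspended. The names below (balance, tick, unsafeDeposit, safeDeposit) are invented for this sketch:

```javascript
// Resolves on a later turn of the event loop; stands in for any async call.
const tick = () => new Promise((resolve) => setImmediate(resolve));

let balance = 0;

// Reads, awaits, then writes: another caller can run during the await.
async function unsafeDeposit(amount) {
  const current = balance;    // read
  await tick();               // suspended here; other callbacks may run
  balance = current + amount; // write based on a possibly stale read
}

// Read and write happen in one synchronous step, so nothing can interleave.
function safeDeposit(amount) {
  balance += amount;
}

async function demo() {
  balance = 0;
  await Promise.all([unsafeDeposit(1), unsafeDeposit(1)]);
  const unsafeResult = balance; // one of the two updates is lost

  balance = 0;
  safeDeposit(1);
  safeDeposit(1);
  return { unsafeResult, safeResult: balance };
}

demo().then((result) => console.log(result));
```

So no lock is needed for purely synchronous access, while logic that must stay consistent across an await needs some form of serialization, for example a promise queue.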
I discovered an odd behaviour in node's promisify() function and I cannot work out why it's doing what it's doing.
Consider the following script:
#!/usr/bin/env node
/**
 * Module dependencies.
 */
var http = require('http')
var promisify = require('util').promisify
;(async () => {
  try {
    // UNCOMMENT THIS, AND NODE WILL QUIT
    // var f = function () { return 'Straight value' }
    // var fP = promisify(f)
    // await fP()
    /**
     * Create HTTP server.
     */
    var server = http.createServer()
    /**
     * Listen on provided port, on all network interfaces.
     */
    server.listen(3000)
    server.on('error', (e) => { console.log('Error:', e); process.exit() })
    server.on('listening', () => { console.log('Listening') })
  } catch (e) {
    console.log('ERROR:', e)
  }
})()
console.log('OUT OF THE ASYNC FUNCTION')
It's a straightforward self-invoking function that starts a node server.
And that's fine.
NOW... if you uncomment the lines under "UNCOMMENT THIS", node will quit without running the server.
I KNOW that I am using promisify() on a function that does not call the callback, but returns a value instead. So, I KNOW that that is in itself a problem.
However... why is node just quitting...?
This was really difficult to debug -- especially when you have something more complex than a tiny script.
If you change the function definition to something that actually calls a callback:
var f = function (cb) { setTimeout( () => { return cb( null, 'Straight value') }, 2000) }
Everything works as expected...
UPDATE
Huge simplification:
function f () {
  return new Promise(resolve => {
    console.log('AH')
  })
}
f().then(() => {
  console.log('Will this happen...?')
})
Will only print "AH"!
Call promisify() on a non-callback function: “interesting” results in node. Why?
Because you allow node.js to go to the event loop with nothing to do. Since there are no live asynchronous operations in play and no more code to run, node.js realizes that there is nothing else to do and no way for anything else to run so it exits.
When you hit the await and node.js goes back to the event loop, there is nothing keeping node.js running so it exits. There are no timers or open sockets or any of those types of things that keep node.js running so the node.js auto-exit-detection logic says that there's nothing else to do so it exits.
Because node.js is an event driven system, if your code returns back to the event loop and there are no asynchronous operations of any kind in flight (open sockets, listening servers, timers, file I/O operations, other hardware listeners, etc...), then there is nothing running that could ever insert any events in the event queue and the queue is currently empty. As such, node.js realizes that there can never be any way to run any more code in this app so it exits. This is an automatic behavior built into node.js.
A real async operation inside of fP() would have some sort of socket or timer or something open that keeps the process running. But because yours is fake, there's nothing there and nothing to keep node.js running.
If you put a setTimeout() for 1 second inside of f(), you will see that the process exit happens 1 second later. So, the process exit has nothing to do with the promise. It has to do with the fact that you've gone back to the event loop, but you haven't started anything yet that would keep node.js running.
Or, if you put a setInterval() at the top of your async function, you will similarly find that the process does not exit.
So, this would similarly happen if you did this:
var f = function () { return 'Straight value' }
var fP = promisify(f);
fP().then(() => {
  // start your server here
});
Or this:
function f() {
  return new Promise(resolve => {
    // do nothing here
  });
}
f().then(() => {
  // start your server here
});
The issue isn't with the promisify() operation. It's because you are waiting on a non-existent async operation and thus node.js has nothing to do and it notices there's nothing to do so it auto-exits. Having an open promise with a .then() handler is not something that keeps node.js running. Rather there needs to be some active asynchronous operation (timer, network socket, listening server, file I/O operation underway, etc...) to keep node.js running.
In this particular case, node.js is essentially correct. Your promise will never resolve, nothing else is queued to ever run and thus your server will never get started and no other code in your app will ever run, thus it is not actually useful to keep running. There is nothing to do and no way for your code to actually do anything else.
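The difference can be observed from the outside. The following is only a sketch: it runs two tiny scripts in child processes via child_process.spawnSync and collects what each prints. The helper name runSnippet and both snippet strings are made up for this illustration.

```javascript
const { spawnSync } = require('child_process');

// Runs `source` in a fresh node process and returns its stdout lines.
function runSnippet(source) {
  const result = spawnSync(process.execPath, ['-e', source], { encoding: 'utf8' });
  return result.stdout.split(/\r?\n/).filter(Boolean);
}

// A promise that never settles and no other async work: nothing keeps node alive.
const neverSettles = `
  new Promise(() => {}).then(() => console.log('then ran'));
  process.on('exit', () => console.log('exiting'));
  console.log('end of script');
`;

// Same shape, but a real timer is pending, so node waits for it.
const withTimer = `
  new Promise((resolve) => setTimeout(resolve, 50)).then(() => console.log('then ran'));
  process.on('exit', () => console.log('exiting'));
  console.log('end of script');
`;

console.log(runSnippet(neverSettles)); // no 'then ran' line
console.log(runSnippet(withTimer));    // includes 'then ran'
```

In the first snippet the process exits right after the script ends; in the second the pending timer holds the event loop open until the promise resolves.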
You noted that if you change the function definition to something that actually calls a callback, everything works as expected.
That's because you used a timer, so node.js has something to actually do while waiting for the promise to resolve. A running timer that has not had .unref() called on it will prevent auto-exit.
Worth reading: How does a node.js process know when to stop?
FYI, you can "turn off" or "bypass" the node.js auto-exit logic by just adding this anywhere in your startup code:
// timer that fires once per day
let foreverInterval = setInterval(() => {
// do nothing
}, 1000 * 60 * 60 * 24);
That always gives node.js something to do so it will never auto-exit. Then when you do want your process to exit, you could either call clearInterval(foreverInterval) or just force things with process.exit(0).
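One possible way to package this, sketched under the assumption that you want an explicit "release" step; holdProcessOpen and release are hypothetical names, not a Node.js API. The second half shows unref() and hasRef(), which control whether a timer counts as something that keeps the process alive.

```javascript
// Keeps the process alive until the returned function is called.
function holdProcessOpen() {
  const timer = setInterval(() => {}, 1000 * 60 * 60 * 24); // fires once per day
  return function release() {
    clearInterval(timer);
  };
}

const release = holdProcessOpen();
// ... later, once the app has finished its work:
release(); // the event loop can drain and the process exits normally

// unref() is the opposite tool: the timer stays scheduled
// but no longer prevents the process from exiting.
const background = setInterval(() => {}, 1000);
const refBefore = background.hasRef();
background.unref();
const refAfter = background.hasRef();
clearInterval(background);

console.log(refBefore, refAfter); // true false
```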
This is a normal example to read a file:
var fs = require('fs');
fs.readFile('./gparted-live-0.18.0-2-i486.iso', function (err, data) {
console.log(data.length);
});
console.log('All done.');
the code above outputs:
All done.
187695104
whereas this is my own version of a callback, I hope it could be async like the file reading code above, but it is not:
var f = function(cb) {
  cb();
};
f(function() {
  var i = 0;
  // Do some very long job.
  while(++i < (1<<30)) {}
  console.log('Cb comes back.')
});
console.log('All done.');
the code above outputs:
Cb comes back.
All done.
Up till now, it's clear that in the first version, the file-reading code, All done. is always printed before the file is read. In the second, home-brewed version, however, All done. is only printed after the very long job is done.
So what on earth is the magic that makes fs.readFile's callback an async call back while mine is not?
var f = function(cb) {
cb();
};
Is not async because it invokes cb immediately.
I think you want
var f = function(cb) {
setImmediate(function(){ cb(); });
};
In your example the while loop runs synchronously inside the callback and occupies the only JavaScript thread, so the call to console.log('All done.') cannot run until the loop has finished. Once the thread is free again, the remaining statements run in sequence.
In Mastering Node.js by Sandro Pasquali - Chapter 2, he discusses deferred execution and the event-loop in order to avoid the issue of the event-loop taking hold and blocking execution. I recommend reading that chapter in order to better understand this non-intuitive way of working in Node.js.
From Mastering Node.js...
Node processes JavaScript instructions using a single thread. Within
your JavaScript program no two operations will ever execute at exactly
the same moment, as might happen in a multithreaded environment.
Understanding this fact is essential to understanding how a Node
program, or process, is designed and runs.
The use of setImmediate() can remedy this issue.
You can use setImmediate() to defer the execution of code until the next cycle of the event loop, which I think accomplishes what you want:
var f = function(cb) {
  cb();
};
f(function() {
  setImmediate(function() {
    var i = 0;
    // Do some very long job.
    while(++i < (1<<30)) {}
    console.log('Cb comes back.')
  });
});
console.log('All done.');
The documentation for setImmediate explains the difference between process.nextTick and setImmediate thusly:
Immediates are queued in the order created, and are popped off the queue once per loop iteration. This is different from process.nextTick which will execute process.maxTickDepth queued callbacks per iteration. setImmediate will yield to the event loop after firing a queued callback to make sure I/O is not being starved. While order is preserved for execution, other I/O events may fire between any two scheduled immediate callbacks.
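The quoted text comes from an older version of the documentation, so details such as process.maxTickDepth may no longer apply. What can be checked directly is the relative order: synchronous code runs first, process.nextTick and promise callbacks run once it has finished, and setImmediate callbacks run on a later turn of the event loop. A small sketch (traceOrder is a made-up helper name):

```javascript
// Resolves with the order in which the differently scheduled callbacks ran.
function traceOrder() {
  return new Promise((resolve) => {
    const order = [];

    setImmediate(() => {
      order.push('setImmediate');
      resolve(order); // the immediate is the last of the four to run
    });
    process.nextTick(() => order.push('nextTick'));
    Promise.resolve().then(() => order.push('promise'));

    order.push('sync'); // runs before any of the scheduled callbacks
  });
}

traceOrder().then((order) => console.log(order));
```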
Edit: Updated answer based on @generalhenry's comment.
TL;DR
What is the best way to forcibly keep a Node.js process running, i.e., keep its event loop from running empty and hence keep the process from terminating? The best solution I could come up with was this:
const SOME_HUGE_INTERVAL = 1 << 30;
setInterval(() => {}, SOME_HUGE_INTERVAL);
This keeps an interval running without causing much disturbance, as long as the interval period is long enough.
Is there a better way to do it?
Long version of the question
I have a Node.js script using Edge.js to register a callback function so that it can be called from inside a DLL in .NET. This function will be called 1 time per second, sending a simple sequence number that should be printed to the console.
The Edge.js part is fine, everything is working. My only problem is that my Node.js process executes its script and after that it runs out of events to process. With its event loop empty, it just terminates, ignoring the fact that it should've kept running to be able to receive callbacks from the DLL.
My Node.js script:
var edge = require('edge');
var foo = edge.func({
  assemblyFile: 'cs.dll',
  typeName: 'cs.MyClass',
  methodName: 'Foo'
});
// The callback function that will be called from C# code:
function callback(sequence) {
  console.info('Sequence:', sequence);
}
// Register for a callback:
foo({ callback: callback }, true);
// My hack to keep the process alive:
setInterval(function() {}, 60000);
My C# code (the DLL):
using System;
using System.Threading;
using System.Threading.Tasks;
public class MyClass
{
    Func<object, Task<object>> Callback;
    void Bar()
    {
        int sequence = 1;
        while (true)
        {
            Callback(sequence++);
            Thread.Sleep(1000);
        }
    }
    public async Task<object> Foo(dynamic input)
    {
        // Receives the callback function that will be used:
        Callback = (Func<object, Task<object>>)input.callback;
        // Starts a new thread that will call back periodically:
        (new Thread(Bar)).Start();
        return new object { };
    }
}
The only solution I could come up with was to register a timer with a long interval to call an empty function just to keep the scheduler busy and avoid getting the event loop empty so that the process keeps running forever.
Is there any way to do this better than I did? I.e., keep the process running without having to use this kind of "hack"?
The simplest, least intrusive solution
I honestly think my approach is the least intrusive one:
setInterval(() => {}, 1 << 30);
This will set a harmless interval that will fire approximately once every 12 days, effectively doing nothing, but keeping the process running.
Originally, my solution used Number.POSITIVE_INFINITY as the period, so the timer would actually never fire, but this behavior was recently changed by the API and now it doesn't accept anything greater than 2147483647 (i.e., 2 ** 31 - 1). See docs here and here.
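For reference, the arithmetic behind those numbers can be checked with a few lines (the constant and function names are only for this sketch):

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

const keepAliveMs = 1 << 30;    // 1073741824 ms, the period used above
const maxTimerMs = 2 ** 31 - 1; // 2147483647 ms, the largest accepted delay

// Converts a duration in milliseconds to days.
function toDays(ms) {
  return ms / MS_PER_DAY;
}

console.log(toDays(keepAliveMs).toFixed(2)); // "12.43"
console.log(toDays(maxTimerMs).toFixed(2));  // "24.86"
console.log(keepAliveMs <= maxTimerMs);      // true
```

So 1 << 30 stays safely below the limit while still firing only about once every 12 days.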
Comments on other solutions
For reference, here are the other two answers given so far:
Joe's (deleted since then, but perfectly valid):
require('net').createServer().listen();
Will create a "bogus listener", as he called it. A minor downside is that we'd allocate a port just for that.
Jacob's:
process.stdin.resume();
Or the equivalent:
process.stdin.on("data", () => {});
Puts stdin into "old" mode, a deprecated feature that is still present in Node.js for compatibility with scripts written prior to Node.js v0.10 (reference).
I'd advise against it. Not only is it deprecated, it also unnecessarily messes with stdin.
Use "old" Streams mode to listen for a standard input that will never come:
// Start reading from stdin so we don't exit.
process.stdin.resume();
Here is an IIFE based on the accepted answer:
(function keepProcessRunning() {
  setTimeout(keepProcessRunning, 1 << 30);
})();
and here is a version with a conditional exit:
let flag = true;
(function keepProcessRunning() {
  setTimeout(() => flag && keepProcessRunning(), 1000);
})();
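You could use setTimeout(function() {}, 2147483647); to keep your script alive without overhead. Note that 2147483647 (2 ** 31 - 1) is the largest delay Node.js accepts; a larger value such as 1000000000000000000 overflows and the delay is set to 1 ms, so the timer would fire almost immediately.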
Spin up a nice REPL, which is what node does anyway when it is started without a script:
import("repl").then(repl=>
repl.start({prompt:"\x1b[31m"+process.versions.node+": \x1b[0m"}));
I'll throw another hack into the mix. Here's how to do it with Promise:
new Promise(_ => null);
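Throw that at the bottom of your .js file. Be aware, though, that a pending promise on its own does not keep the event loop alive, so the process will still exit unless a timer, socket, or other active handle is holding it open.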
Suppose you've got a 3rd-party library that's got a synchronous API. Naturally, attempting to use it in an async fashion yields undesirable results in the sense that you get blocked when trying to do multiple things in "parallel".
Are there any common patterns that allow us to use such libraries in an async fashion?
Consider the following example (using the async library from NPM for brevity):
var async = require('async');
function ts() {
  return new Date().getTime();
}
var startTs = ts();
process.on('exit', function() {
  console.log('Total Time: ~' + (ts() - startTs) + ' ms');
});
// This is a dummy function that simulates some 3rd-party synchronous code.
function vendorSyncCode() {
  var future = ts() + 50; // ~50 ms in the future.
  while(ts() <= future) {} // Spin to simulate blocking work.
}
// My code that handles the workload and uses `vendorSyncCode`.
function myTaskRunner(task, callback) {
  // Do async stuff with `task`...
  vendorSyncCode(task);
  // Do more async stuff...
  callback();
}
// Dummy workload.
var work = (function() {
  var result = [];
  for(var i = 0; i < 100; ++i) result.push(i);
  return result;
})();
// Problem:
// -------
// The following two calls will take roughly the same amount of time to complete.
// In this case, ~6 seconds each.
async.each(work, myTaskRunner, function(err) {});
async.eachLimit(work, 10, myTaskRunner, function(err) {});
// Desired:
// --------
// The latter call with 10 "workers" should complete roughly an order of magnitude
// faster than the former.
Are fork/join or spawning worker processes manually my only options?
Yes, it is your only option.
If you need to use 50ms of CPU time to do something, and need to do it 10 times, then you'll need 500ms of CPU time to do it. If you want it to be done in less than 500ms of wall clock time, you need to use more CPUs. That means multiple node instances (or a C++ addon that pushes the work out onto the thread pool). How to get multiple instances depends on your app structure: a child that you feed the work to using child_process.send() is one way, running multiple servers with cluster is another. Breaking up your server is another way. Say it's an image store application that is mostly fast at processing requests, unless someone asks to convert an image into another format, which is CPU intensive. You could push the image processing portion into a different app and access it through a REST API, leaving the main app server responsive.
If you aren't concerned that it takes 50ms of CPU time to do the request, but instead you are concerned that you can't interleave handling of other requests with the processing of the CPU-intensive request, then you could break the work up into small chunks, and schedule the next chunk with setInterval(). That's usually a horrid hack, though. Better to restructure the app.
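Newer Node.js versions also offer the built-in worker_threads module, which this answer does not mention; it is another way to put blocking code on additional CPUs without starting separate processes. A rough sketch, assuming the blocking work can be expressed as source text passed to the worker (workerSource, runInWorker and runAll are hypothetical names, and the busy loop merely simulates the vendor code):

```javascript
const { Worker } = require('worker_threads');

// Code executed inside each worker: spin for `busyMs`, then report back.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const end = Date.now() + workerData.busyMs;
  while (Date.now() <= end) {} // simulated blocking vendor call
  parentPort.postMessage(workerData.id);
`;

// Runs one blocking task on its own thread and resolves with its id.
function runInWorker(id, busyMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { id, busyMs },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Starts all tasks at once; each one blocks only its own thread.
function runAll(ids, busyMs) {
  return Promise.all(ids.map((id) => runInWorker(id, busyMs)));
}

runAll([0, 1, 2, 3], 50).then((ids) => console.log('finished', ids));
```

For a real workload you would cap the number of workers near the number of CPU cores rather than start one per task.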
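For completeness, the chunking approach could look roughly like this. Note that the sketch schedules each next chunk with setImmediate() rather than the setInterval() mentioned above, and that sumInChunks is an invented example task, not part of any library:

```javascript
// Sums `numbers` in slices of `chunkSize`, yielding to the event loop
// between slices so that other callbacks can be handled in the meantime.
function sumInChunks(numbers, chunkSize) {
  return new Promise((resolve) => {
    let index = 0;
    let total = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index++) {
        total += numbers[index];
      }
      if (index < numbers.length) {
        setImmediate(processChunk); // let pending I/O and timers run first
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}

sumInChunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3).then((total) => {
  console.log(total); // 55
});
```

This does not make the work finish sooner; it only keeps the process responsive while the work is in progress.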