I am writing a nodejs script to process (convert to PDF from various formats) large amount of files on regular basis. The script reads all files in the folder, splits them into even sized arrays and then is supposed to run a command line converter asynchronously on each of the arrays. Everything works fine except for the node waiting for the promise to return before continuing with the for loop. The code uses a test command for copying files instead of conversion because it's simpler for testing. Instead of running 5 simultaneous copy tasks it runs one after the other. I am using array of promises so I can call Promises.all()when all promises return. What am I missing that this is not working as it should?
let processCount = 5;
let promisesArray = [];
for (let processNo = 0; processNo < processCount; processNo++) {
promisesArray[processNo] = new Promise(function(resolve, reject) {
let fileList = splitArrays[processNo];
for (let file = 0; file < fileList.length; file++) {
let command = `Copy "${fileList[file][0]}" "${fileList[file][1]}"`;
execSync(command);
}
resolve(true);
});
}
execSync() by definition blocks the event loop until the child process is done. You'll have to use exec() instead. The third param of exec() is callback function which will be called when your child process exits.
Related
As i understand Nodejs has a event loop based mechanism, therefore the main nodejs code is single threaded which just tries to execute the methods and callbacks present in the call stack, therefore at a single point of time in the nodejs runtime only one thread executes(main thread), so if i define a data structure like a queue or map i need not to worry about locking it or blocking others accessing it when one of the callbacks is using because
unless a callback is done executing the main thread will keep executing ( unlike multithreaded application where a thread could execute for a bit and then contexted switched out to give chance to some other thread.)
no 2 callbacks are executed in parallel
Example:
let arr = [];
const fun = async (website) => {
const result = await fetch(website, {}, "POST");
arr.push(...result); }
const getData = async () => {
let promises = [];
for (let i = 0; i < 10; i++) {
promises.push(fun(`website${i}`));
}
Promise.all(promises);
}
So my question is do i ever need to lock a resource on the nodejs side of thing,
i do understand the libuv is multithreaded therefore when calling async operations that use libuv (ex writing to file) could cause problems if i don't lock.
But my question is do i ever need a lock in the nodejs runtime.
Recently I came across code that makes me wonder about how Node.js engine works. Specifically in regards to event loop, for loop, and asynchronous processing. For example:
const x = 100000000; // or some stupendously large number
for (let i = 0; i < x; i++) {
asyncCall(callback);
}
const callback = () => {
console.log("done one");
}
Let's say asyncCall takes anywhere from 1ms to 100ms. Is there a possibility that console.log("done one") will be called before the for loop finishes?
To try to find my own answer, I've read this article: https://blog.sessionstack.com/how-javascript-works-event-loop-and-the-rise-of-async-programming-5-ways-to-better-coding-with-2f077c4438b5. But I'm not sure if there is a case where the call stack will be empty in the middle of the for loop so that the event loop puts the callback in between asyncCall calls?
How can I run a single method multiple times multi-threaded when called as a method of a class?
At first I tried to use the cluster module, but I realize it just re-runs the whole process from the start, rightfully so.
How can I achieve something like what's outlined below?
I want a class's method to spawn n processes, and when the parallel tasks are completed, I can resolve a promise which the method returns.
The problem with the code below is that calling cluster.fork() will fork index.js process.
index.js
const Person = require('./Person.js');
var Mary = new Person('Mary');
Mary.run(5).then(() => {...});
console.log('I should only run once, but I am called 5 times too many');
Person.js
const cluster = require('cluster');
class Person{
run(distance){
var completed = 0;
return new Promise((resolve, reject) => {
for(var i = 0; i < distance; i++) {
// run a separate process for each
cluster.fork().send(i).on('message', message => {
if (message === 'completed') { ++completed; }
if (completed === distance) { resolve(); }
});
}
});
}
}
I think the short answer is impossible. It's even worse - this has nothing to do with js. To multi (process or thread) in your particular problem you will essentially need a copy of the object in every thread, since it needs (maybe) access to fields - in this case you would need to either initialize it in every thread or share memory. That last one I don't think is provided in cluster, and not trivial in other languages in every use case.
If the calculation is independent of the Person I suggest you extract it, and use the usual (in index.js):
if(cluster.isWorker) {
//Use the i for calculation
} else {
//Create Person, then fork children in for loop
}
You then collect the results and change the Person as needed. You will be copying index.js, but this is standard and you only run what you need.
The problem is if results are dependent on Person. If these are constant for all i you can still send them to your forks independently. Otherwise what you have is the only way to fork. In general forking in cluster is not meant for methods, but for the app itself, which is the standard forking behavior.
Another solution
Following your comment, I suggest you checkout child_process.execFile or child_process.exec on same file.
This way you can spawn a totally independent process on the fly. Now instead of calling cluster.fork you call execFile. You can use either the exit code or stdout as return values (stderr etc.). Promise is now replaced with:
var results = []
for(var i = 0; i < distance; i++) {
// run a separate process for each
results.push(child_process.execFile().child.execFile('node', 'mymethod.js`,i]));
}
//... catch the exit event from all results or return a callback using results.
Inside mymethod.js Have your code that takes i and returns what you want either in the exit code or through stdout, both properties of the returned child_process. This is a bit un-node.js-y since you're waiting on asynchronous calls, but you're requirements are non standard. Since I'm not sure how you use this perhaps returning a callback with the array is a better idea.
My Node.js program - which is an ordinary command line program that by and large doesn't do anything operationally unusual, nothing system-specific or asynchronous or anything like that - needs to write messages to a file from time to time, and then it will be interrupted with ^C and it needs the contents of the file to still be there.
I've tried using fs.createWriteStream but that just ends up with a 0-byte file. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
I've tried using winston but that ends up not creating the file at all. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
And fs.writeFile works perfectly when you have all the text you want to write up front, but doesn't seem to support appending a line at a time.
What is the recommended way to do this?
Edit: specific code I've tried:
var fs = require('fs')
var log = fs.createWriteStream('test.log')
for (var i = 0; i < 1000000; i++) {
console.log(i)
log.write(i + '\n')
}
Run for a few seconds, hit ^C, leaves a 0-byte file.
Turns out Node provides a lower level file I/O API that seems to work fine!
var fs = require('fs')
var log = fs.openSync('test.log', 'w')
for (var i = 0; i < 100000; i++) {
console.log(i)
fs.writeSync(log, i + '\n')
}
NodeJS doesn't work in the traditional way. It uses a single thread, so by running a large loop and doing I/O inside, you aren't giving it a chance (i.e. releasing the CPU) to do other async operations for eg: flushing memory buffer to actual file.
The logic must be - do one write, then pass your function (which invokes the write) as a callback to process.nextTick or as callback to the write stream's drain event (if buffer was full during last write).
Here's a quick and dirty version which does what you need. Notice that there are no long-running loops or other CPU blockage, instead I schedule my subsequent writes for future and return quickly, momentarily freeing up the CPU for other things.
var fs = require('fs')
var log = fs.createWriteStream('test.log');
var i = 0;
function my_write() {
if (i++ < 1000000)
{
var res = log.write("" + i + "\r\n");
if (!res) {
log.on('drain',my_write);
} else {
process.nextTick(my_write);
}
console.log("Done" + i + " " + res + "\r\n");
}
}
my_write();
This function might also be helpful.
/**
* Write `data` to a `stream`. if the buffer is full will block
* until it's flushed and ready to be written again.
* [see](https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback)
*/
export function write(data, stream) {
return new Promise((resolve, reject) => {
if (stream.write(data)) {
process.nextTick(resolve);
} else {
stream.once("drain", () => {
stream.off("error", reject);
resolve();
});
stream.once("error", reject);
}
});
}
You are writing into file using for loop which is bad but that's other case. First of all createWriteStream doesn't close the file automatically you should call close.
If you call close immediately after for loop it will close without writing because it's async.
For more info read here: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
Problem is async function inside for loop.
I have a NodeJS application which uses the fs API to read files from a directory tree. I'm using the fs-walk module to walk the tree. For every sub directory encountered, the same function executes again to handle it. (I don't think this is recursion; rather, the same function is bound to an event which is fired each time a directory is handled.) Files are handled by a different function, which does stuff to them.
I'd like to execute arbitrary code once all files have been read without using synchronous or blocking code. I couldn't find any way to keep track of the number of files in a directory (to count down, for instance), nor could I find any attribute in fs.stat to indicate that the entire operation has completed.
Had anyone found a way to do this yet? I could find nothing in the node docs or on stack overflow.
After reviewing the fs-walk library a little closer, it looks like the third argument to the walk() method is actually a final callback. Internally they are using the async library, specifically async.whilst() and async.waterfall() methods which will execute the final callback when everything is complete.
I think the intention of the library creator is for that final callback to be executed when all async actions are completed. If that isn't working, you may want to file an issue in Github for it:
According to the code, you should be able to do:
var walk = require('fs-walk';
walk('/some/dir', someFileOrDirHandler, function(err) {
// This should be a final callback, if the first argument is present,
// then there was an error
if (err) {
/* handle it */
return;
}
// Getting here indicates success
});
As a compromise in performance, I ended up doing a total file count using a recursive function that accessed the file system synchronously. Using the total, I then accessed all the files asynchronously, decrementing the total each time. Once the total reached zero, I executed a function to handle all of the completed data.
var countAllFiles = new Promise(function (resolve, reject) {
var total = 0,
count = function (path) {
var contents = fs.readdirSync(path), file, name;
for (file in contents) {
if (!contents.hasOwnProperty(file)) continue;
name = path + '/' + contents[file];
if (fs.statSync(name).isDirectory())
count(name);
else
++total;
}
};
count('/path/to/tree/');
resolve(total);
}).then(function (total) {
walk.dirs('/path/to/tree/', handlerFunction, errorHandler);
// for every file, decrement total. Then, if it's zero, execute the code that
// depends on all the read/write operations being complete
});