Node.js exec child process stdout not getting all the chunks

I'm trying to send messages from my child process to my main process, but some chunks are not being sent, possibly because the file is too big.
main process:
const { exec } = require('child_process')

let response = ''
let error = ''
await new Promise(resolve => {
  const p = exec(command)
  p.stdout.on('data', data => {
    // this gets triggered many times because the html string is big and gets split up
    response += data
  })
  p.stderr.on('data', data => {
    error += data
  })
  p.on('exit', resolve)
})
console.log(response)
child process:
// only fetch 1 page, then quit
const bigHtmlString = await fetchHtmlString(url)
process.stdout.write(bigHtmlString)
I know the child process works because when I run it directly, I can see the end of the file in the console. But when I run the main process, I cannot see the end of the file. It's quite big, so I'm not sure exactly which chunks are missing.
Edit: there's also a new, unknown problem. When I add a wait at the end of my child process, it doesn't wait; it closes. So I'm guessing it crashes somehow? I'm not seeing any error, even with p.on('error', console.log).
example:
const bigHtmlString = await fetchHtmlString(url)
process.stdout.write(bigHtmlString)
// this never gets executed, the process closes. The wait works if I launch the child process directly
await new Promise(resolve => setTimeout(resolve, 1000000))

process.stdout.write(...) returns true or false depending on whether the string fit in the internal buffer. If it returns false, you can listen for the 'drain' event to know when it has finished flushing.
Something like this:
const bigHtmlString = await fetchHtmlString(url);
const wrote = process.stdout.write(bigHtmlString);
if (!wrote) {
  // this effectively means "wait for this event to fire",
  // but it doesn't block everything
  process.stdout.once('drain', doSomethingHere);
}
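If you need to wait for the flush before exiting, one option is to wrap the write in a promise and await it. A minimal sketch (writeLarge is a hypothetical helper, not part of the Node API):
// Resolves once the chunk was either accepted immediately
// or the internal buffer has drained.
const writeLarge = (stream, chunk) =>
  new Promise(resolve => {
    if (stream.write(chunk)) {
      resolve(); // fit in the buffer, nothing to wait for
    } else {
      stream.once('drain', resolve); // wait for the buffer to flush
    }
  });

// Usage in the child process:
// await writeLarge(process.stdout, bigHtmlString);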

My suggestion from the comments resolved the issue, so I'm posting it as an answer.
I would suggest using spawn instead of exec. exec buffers the child's output and hands it all over when the process ends, subject to a maxBuffer limit (1 MB by default); if the output exceeds that limit, the child process is terminated and the output truncated, which would also explain why the wait at the end of your child process never runs. spawn streams the output instead, which is better for huge output like in your case.
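A minimal sketch of the main process rewritten with spawn, keeping the question's command string by way of the shell option:
const { spawn } = require('child_process')

let response = ''
let error = ''
await new Promise((resolve, reject) => {
  // shell: true lets spawn accept the whole command string;
  // output is streamed, so there is no maxBuffer limit to hit
  const p = spawn(command, { shell: true })
  p.stdout.on('data', data => { response += data })
  p.stderr.on('data', data => { error += data })
  p.on('error', reject)
  p.on('exit', resolve)
})
console.log(response)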

Related

How do I avoid a race condition with Node.js's process.send?

What exactly happens when a child process (created by child_process.fork()) in Node sends a message to its parent (process.send()) before the parent has an event handler for the message (child.on("message",...))? (It seems, at least, like there must be some kind of buffer.)
In particular, I'm faced with what seems like an unavoidable race condition - I cannot install a message handler on a child process until after I've finished the call to fork, but the child could potentially send me (the parent) a message right away. What guarantee do I have that, assuming a particularly horrible interleaving of OS processes, I will receive all messages sent by my child?
Consider the following example code:
parent.js:
const child_process = require("child_process");
const child_module = require.resolve("./child");
const run = async () => {
  console.log("parent start");
  const child = child_process.fork(child_module);
  await new Promise(resolve => setTimeout(resolve, 40));
  console.log("add handler");
  child.on("message", (m) => console.log("parent receive:", m));
  console.log("parent end");
};
run();
child.js:
console.log("child start");
process.send("123abc");
console.log("child end");
In the above, I'm hoping to simulate a "bad interleaving" by preventing the message handler from being installed for a few milliseconds (suppose that a context switch takes place immediately after the fork, and that some other processes run for a while before the parent's node.js process can be scheduled again). In my own testing, the parent seems to "reliably" receive the message with numbers << 40ms (e.g. 20ms), but for values >35ms, it's flaky at best, and for values >> 40ms (e.g. 50 or 60), the message is never received. What's special about these numbers - just how fast the processes are being scheduled on my machine?
It seems to be independent of whether the handler is installed before or after the message is sent. For example, I've observed both of the following executions with the timeout set to 40 milliseconds. Notice that in each one, the child's "end" message (indicating that the process.send() has already happened) comes before "add handler". In one case, the message is received, but in the next, it's lost. It's possible, I suppose, that buffering of the standard output of these processes could cause these outputs to misrepresent the true execution - is that what's going on here?
Execution A:
parent start
child start
child end
add handler
parent end
parent receive: 123abc
Execution B:
parent start
child start
child end
add handler
parent end
In short - is there a solution to this apparent race condition? I seem to be able to "reliably" receive messages as long as I install a handler "soon" enough - but am I just getting lucky, or is there some guarantee that I'm getting? How do I ensure, without relying on luck, that this code will always work (barring cosmic rays, spilled coffee, etc...)? I can't seem to find any detail about how this is supposed to work in the Node documentation.
What exactly happens when a child process (created by child_process.fork()) in Node sends a message to its parent (process.send()) before the parent has an event handler for the message (child.on("message",...))? (It seems, at least, like there must be some kind of buffer.)
First off, when a message arrives from another process, it goes into the nodejs event queue. It won't be processed until the current nodejs code finishes whatever it was doing and returns control back to the event loop so that it can process the next event in the queue. If that moment arrives before there is any listener for the incoming event, then the message is received and simply thrown away: the code looks for registered event handlers, and if there are none, it's done. It's the same as calling eventEmitter.emit("someMsg", data) when there are no listeners for "someMsg". But read on, there is hope for your specific situation.
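To illustrate with a plain emitter, a minimal sketch (the event name is made up):
const EventEmitter = require("events");
const emitter = new EventEmitter();

// No listener registered yet: this emit is silently dropped.
emitter.emit("someMsg", "first");

// Attaching a listener now does not replay the earlier event.
emitter.on("someMsg", (data) => console.log("received:", data));

emitter.emit("someMsg", "second"); // only "second" is received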
In particular, I'm faced with what seems like an unavoidable race condition - I cannot install a message handler on a child process until after I've finished the call to fork, but the child could potentially send me (the parent) a message right away. What guarantee do I have that, assuming a particularly horrible interleaving of OS processes, I will receive all messages sent by my child?
Fortunately, due to the single-threaded, event-driven nature of nodejs, this is not a problem. You can install the message handler before there's any chance of the message arriving and being processed. This is because even though the child may be started up and may be running independently using other CPUs or interleaved with your process, the single-threaded nature and the event driven architecture help you solve this problem.
If you do something like this:
const child = child_process.fork(child_module);
child.on("message", (m) => console.log("parent receive:", m));
Then you are guaranteed that your message handler will be installed before there's any chance of an incoming message being processed and you will not miss it. This is because the interpreter is busy running these two lines of code and does not return control back to the event loop until after these two lines of code are run. Therefore, no incoming message from the child_module can get processed before your child.on(...) handler is installed.
Now, if you purposely return control back to the event loop before installing your event handler, as you are doing with the await in your code here:
const run = async () => {
  console.log("parent start");
  const child = child_process.fork(child_module);
  // this await allows events in the event queue to be processed
  // while this function is suspended waiting for the await
  await new Promise(resolve => setTimeout(resolve, 40));
  console.log("add handler");
  child.on("message", (m) => console.log("parent receive:", m));
  console.log("parent end");
};
run();
Then, you have purposely introduced a race condition with your own coding that can be avoided by just installing the event handler BEFORE the await like this:
const run = async () => {
  console.log("parent start");
  // no events will be processed before these next three statements run
  const child = child_process.fork(child_module);
  console.log("add handler");
  child.on("message", (m) => console.log("parent receive:", m));
  await new Promise(resolve => setTimeout(resolve, 40));
  console.log("parent end");
};
run();

How not to exit node process when using readline-sync?

I'm creating a pomodoro timer with node.
At the moment I start the project like so: node start.js coding
I let that run as I do my work. When I need a break, I terminate the process and the time I've spent coding gets added to a JSON file like so:
{
  "code": [
    {
      "seconds": 1,
      "time": "00 : 00 : 01",
      "date": "2020-06-28T03:08:42.340Z"
    }
  ],
  "read": [],
  "write": []
}
Now I'm just trying to think of things I'll need in the future; I'll most definitely forget the keys in the above object. Is it code, coding, write, writing? So I thought I'd have a prompt.
var objKeys = [...Object.keys(obj), 'info']
const inputVariable = objKeys[readline.keyInSelect(objKeys, 'What are you going to be working on?')]
As it is, once I make the selection, the process terminates; I don't want that.
(I could make the selection when I actually want to terminate, but that would most likely be confusing.)
Is there a way to make the selection and still keep the process running?
EDIT
const time = require('./module/timeEntry');
var readline = require('readline-sync');
var obj = require('./data.json') // has the above json code
var start = process.hrtime(); // start the timer
var objKeys = Object.keys(obj)
const inputVariable = objKeys[readline.keyInSelect(objKeys, "Which task are you going to work on?")]
function exitHandler(options, exitCode) {
  if (options.cleanup) console.log('clean');
  if (exitCode || exitCode === 0) {
    //code
    if (inputVariable !== "info") {
      time.timeEntry(obj, start, inputVariable) // reads and writes to file
    }
  }
  if (options.exit) process.exit();
}
process.on('exit', exitHandler.bind(null,{cleanup:true}));
// I want to be able to do this: `ctrl+c`
process.on('SIGINT', exitHandler.bind(null, {exit:true}));
The problem is that the moment the inputVariable is entered by the user, the process ends.
time.timeEntry(obj, start, inputVariable) simply reads and writes some time keeping info into JSON.
var fs = require('fs');
var getTime = require('./getTime')
const timeEntry = (obj, start, segment) => {
  let totalSeconds = process.hrtime(start)[0];
  obj[segment].push({
    seconds: totalSeconds,
    time: getTime.getTime(totalSeconds),
    date: new Date()
  })
  let data = JSON.stringify(obj);
  fs.writeFileSync('data.json', data, 'utf-8');
}
exports.timeEntry = timeEntry;
I don't have to use readline-sync. If I instead use const inputVariable = process.argv[2] and run node start.js coding, the process isn't terminated, which is what I want.
Here's what your code is doing right now:
read in the inputVariable from the user
define a function exitHandler
tell Node to invoke exitHandler when the program exits
tell Node to invoke exitHandler upon receiving SIGINT
Note that none of these steps involves calling exitHandler, or doing anything else for that matter (e.g. there's no code here that waits for anything to happen).
Perhaps the confusion is coming from the use of process.on: this function tells Node that when it receives SIGINT (e.g. you press Ctrl-C), then it should call exitHandler. In particular, this does not tell your program "pause execution until SIGINT is received". As a result, after calling process.on, your code "continues" but there's no more code to run so the process ends (before it can ever receive SIGINT).
It seems like you want your program to do nothing until the signal is received. Note that a busy loop such as while (true) {} would not work here: it blocks the event loop, so your SIGINT handler would never get a chance to run. Instead, keep the event loop alive with something inert, e.g. add
setInterval(() => {}, 1 << 30);
or call process.stdin.resume() at the end. Your code will then do nothing until it receives SIGINT, at which point it will call exitHandler.

Node-cli freeze after Spawning a child_process

The CLI stops receiving keyboard input after execution, including 'ctrl-c' and 'ctrl-z', so you have to kill the program manually. It gave me a lot of trouble; please take a look at it:
var { exec, spawn } = require("child_process");
let cmd = (cmdline, consolelog = true) => {
  return new Promise((resolve, reject) => {
    let cmdarray = cmdline.split(" ");
    let result = "";
    let error = "";
    let child = spawn(cmdarray.shift(), cmdarray);
    process.stdin.pipe(child.stdin);
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stderr.on("data", data => {
      if (consolelog) process.stdout.write(data.toString());
      error = data.toString();
    });
    child.stdout.on("data", data => {
      if (consolelog) process.stdout.write(data.toString());
      result = data.toString();
    });
    child.on("close", code => {
      if (consolelog) process.stdout.write(`Exit code: ${code}\n`);
      code == 0 ? resolve(result) : reject(error);
    });
  });
};
OS: osx & ubuntu 19.04
Test case:
cmd("echo hi");
Edit:
Under normal circumstances, put the code inside myprogram.js and run node myprogram.js to activate the script. It works perfectly, and you can also try different commands. HOWEVER, if you run the following code:
$ node
> let cmd = require(PATH_TO_CMD_FUNCTION)
> cmd("echo hi");
The node-cli will freeze and stop listening to your keyboard input.
Edit 2:
Found out: you need to channel the stdio through {stdio: "inherit"}.
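For reference, a minimal sketch of that fix inside the cmd function (note the trade-off: with inherited stdio there are no child.stdout/child.stderr streams to capture, and the process.stdin.pipe(child.stdin) line has to go, since child.stdin is null):
// stdio: "inherit" wires the child directly to this process's
// stdin/stdout/stderr instead of piping through Node streams.
let child = spawn(cmdarray.shift(), cmdarray, { stdio: "inherit" });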
UPDATED ANSWER:
I trimmed down your spawner a little in order to be succinct and eliminate other possibilities. There is one common test case I could find that reproduces the stated issue regarding signals, keyboard shortcuts, and trapped input.
If you spawn the 'sh' command, you will not be able to escape from the spawned process by means of conventional signal keyboard shortcuts. This is because node.js "traps" the input and forwards it directly to the spawned process.
Most processes allow killing via signalling through keyboard shortcuts such as CTRL-C. 'sh', however, does not-- and so is a perfect example.
The only ways to exit are to use the 'exit' command, close the window (which may leave the spawned process running in the background), reboot your machine, etc. You could also kill it internally or by sending a signal through some other means, but not via stdin or its equivalent.
Your CTRL-C input, in other words, is "normally working" not because it is killing your node app, but because it is being forwarded to the spawned process and killing it.
The spawned process will continue to trap your input if it is immune.
require("child_process").spawn("sh", {
shell: true,
encoding: 'utf8',
stdio: [0,1,2]
});
This may not be the best example for your specific program, but it illustrates the principle, which is the closest I can come since I cannot replicate the issue with the given test case (I have tried it on my phone, my laptop, and my cloud server, with three different versions of node and two different versions of Ubuntu).
In any case, it sounds like your stdin is not being "let go" by the spawned process. You may need to "reassign" it to the original process.stdin.
As stated here:
https://node.readthedocs.io/en/latest/api/child_process/
Also, note that node establishes signal handlers for 'SIGINT' and
'SIGTERM', so it will not terminate due to receipt of those signals,
it will exit.
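A minimal sketch of one way to do that "reassigning", assuming the piping setup from the cmd function above:
// When the child closes, detach our stdin from the child's stdin
// and pause it so it no longer holds the event loop (and the REPL) open.
child.on("close", () => {
  process.stdin.unpipe(child.stdin);
  process.stdin.pause();
});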
PREVIOUSLY:
It looks like your cmd function is only getting one argument (the command itself) due to the split and shift. Spawn expects a string with the whole command, so likely it is only getting "echo" without "hi", so it isn't exiting due to hanging on "echo". May need to append a newline ("\n") as well.
It also may help to nest the command in an sh command that then executes it, so it runs in a shell.
Like this:
var { exec, spawn } = require("child_process");
let cmd = (cmdline, consolelog = true) => {
  return new Promise((resolve, reject) => {
    let result = "";
    let error = "";
    // This. Note the shell option.
    let child = spawn(cmdline, { shell: true });
    process.stdin.pipe(child.stdin);
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stderr.on("data", data => {
      if (consolelog) process.stdout.write(data.toString());
      error = data.toString();
    });
    child.stdout.on("data", data => {
      if (consolelog) process.stdout.write(data.toString());
      result = data.toString();
    });
    child.on("close", code => {
      if (consolelog) process.stdout.write(`Exit code: ${code}\n`);
      code == 0 ? resolve(result) : reject(error);
    });
  });
};
cmd("echo hello");
Output:
hello
Exit code: 0

Exit Process When all Readline on('line') Callbacks Complete

I have a Node v10.14.1 program that reads a CSV file line-by-line using the readline Interface
My .on('line') callback is async and performs some operations that read/write from a db, so I use async/await to deal with the promises.
A short version of the program's code block of interest would look something like:
const readline = require('readline');
const filesystem = require('fs');
const reader = readline.createInterface({
  input: filesystem.createReadStream(pathToSomeCSV)
});
reader.on('line', async (line) => {
  await doSomeDBStuff();
})
If I leave the above the way it is, the process does not exit. However, if I
reader.on('close', () => {process.exit()});
then the process exits prior to all of the on('line') callbacks finishing and their promises resolving.
My question is: is there a way to say "Upon all lines being read AND all on('line') callbacks being completed with their promises resolved, then exit the process (I assume with process.exit())"?
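One common pattern, sketched here as an assumption rather than a definitive answer (doSomeDBStuff is the placeholder from the question), is to collect the promise from each line handler and await them all on 'close':
const pending = [];
reader.on('line', (line) => {
  // Collect the promise instead of awaiting inside the handler.
  pending.push(doSomeDBStuff(line));
});
reader.on('close', async () => {
  await Promise.all(pending); // wait for every line's work to settle
  process.exit();
});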
Investigation
I get the feeling the docs are leaving some non-obvious details out. I was unable to get this official example working correctly (which is what your question appears to be based on). That implementation would kill my application prematurely. Or, if I removed the 'close' listener, the terminal would just hang forever on exit. I tried overriding process.on('exit') to no avail. I also tried the prompt-sync package, but it consistently corrupted my terminal.
Solution
I found a lovely answer here which offers a good solution.
Create the function:
const fs = require('fs');

const prompt = msg => {
  fs.writeSync(1, String(msg)); // print the prompt to stdout (fd 1)
  let s = '', buf = Buffer.alloc(1);
  // read one byte at a time from stdin (fd 0) until LF (10) or CR (13)
  while (buf[0] - 10 && buf[0] - 13)
    s += buf, fs.readSync(0, buf, 0, 1, 0);
  return s.slice(1); // drop the initial NUL from the zero-filled buffer
};
Use it:
const result = prompt('Input something: ');
console.log('Your input was: ' + result);
No terminal corruption, the application does not die prematurely, and it does not hang on exit, either.
This solution is not perfect, however: it intentionally blocks the main thread while waiting for user input, meaning you cannot run other functions in the background in the meantime. In my mind, user input should be blocking in most cases anyway, so this solution works very well for me personally.
Edit: see an improved version for Linux here.
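For contrast, the non-blocking alternative keeps the event loop running while you wait; a minimal sketch using Node's built-in readline module:
const readline = require('readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

// Non-blocking: timers and other I/O keep running while we wait.
rl.question('Input something: ', answer => {
  console.log('Your input was: ' + answer);
  rl.close(); // release stdin so the process can exit
});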

Better way to make node not exit?

In a node program I'm reading from a file stream with fs.createReadStream. But when I pause the stream, the program exits. I thought the program would keep running since the file is still open, just not being read.
Currently, to keep it from exiting, I'm setting an interval that does nothing.
setInterval(function() {}, 10000000);
When I'm ready to let the program exit, I clear it. But is there a better way?
Example Code where node will exit:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();
Node will exit when there is no more queued work. Calling pause on a ReadableStream simply pauses the data event. At that point, there are no more events being emitted and no outstanding work requests, so Node will exit. The setInterval works since it counts as queued work.
Generally this is not a problem since you will probably be doing something after you pause that stream. Once you resume the stream, there will be a bunch of queued I/O and your code will execute before Node exits.
Let me give you an example. Here is a script that exits without printing anything:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();
rs.on('data', function (data) {
  console.log(data); // never gets executed
});
The stream is paused, there is no outstanding work, and my callback never runs.
However, this script does actually print output:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();
rs.on('data', function (data) {
  console.log(data); // prints stuff
});
rs.resume(); // queues I/O
In conclusion, as long as you eventually call resume, you should be fine.
Short way, based on the answers above:
require('fs').createReadStream('file.js').pause();
