node.js child_process#spawn bypass stdin/stdout inner buffers

I'm using child_process#spawn to run external binaries from Node.js. Each binary searches for specific words in a string, depending on a language, and produces output based on the input text. The binaries themselves don't buffer internally. Usage examples:
echo "I'm a random input" | ./my-english-binary
produces text like The word X is in the sentence, and
cat /dev/urandom | ./my-english-binary produces infinite output.
I want to use each of these binaries as a "server": launch a new binary instance the first time a language is encountered, send data to it with stdin.write() when necessary, and get its output directly through the stdout.on('data') event. The problem is that stdout.on('data') isn't fired until a huge amount of data has been passed to stdin.write(). stdin or stdout (or both) seem to have internal blocking buffers. But I want the output as soon as possible, because otherwise the program might wait hours before new input shows up and unblocks stdin.write() or stdout.on('data'). How can I change their internal buffer size? Or is there another, non-blocking approach I could use?
My code is:
const spawn = require('child_process').spawn;
const path = require('path');

class Driver {
    constructor() {
        // I have one binary per language
        this.instances = {
            frFR: {
                instance: null,
                path: path.join(__dirname, './my-french-binary')
            },
            enGB: {
                instance: null,
                path: path.join(__dirname, './my-english-binary')
            }
        };
    }

    // checks whether an instance is already running for a given language
    isRunning(lang) {
        if (this.instances[lang] === undefined)
            throw new Error("Language not supported by TreeTagger: " + lang);
        return this.instances[lang].instance !== null;
    }

    // launches the binary for a language and attaches 'onData' to its stdout.on('data') event
    run(lang, onData) {
        const instance = spawn(this.instances[lang].path, { cwd: __dirname });
        instance.stdout.on('data', buf => onData(buf.toString()));
        // if a binary instance is killed, it will be relaunched later
        instance.on('close', () => this.instances[lang].instance = null);
        this.instances[lang].instance = instance;
    }

    /**
     * Indefinitely writes to instance.stdin.
     * I want to avoid this behavior by writing only once,
     * but if I write only once, stdout.on('data') is never called.
     * Everything works if I use stdin.end(), but I don't want to use it.
     */
    write(lang, text) {
        const id = setInterval(() => {
            console.log('setInterval');
            this.instances[lang].instance.stdin.write(text + '\n');
        }, 1000);
    }
}

// simple usage example
const driver = new Driver;
const txt = "This is a random input.";
if (driver.isRunning('enGB') === true)
    driver.write('enGB', txt);
else {
    /**
     * The arrow function is called only once every N stdin.write() calls,
     * while I want it to be called after each write.
     */
    driver.run('enGB', data => console.log('Data received!', data));
    driver.write('enGB', txt);
}
I tried to:
use cork() and uncork() around stdin.write();
pipe child_process.stdout to a custom Readable and to a Socket;
override the highWaterMark value to 1 and 0 on stdin, stdout and the aforementioned Readable;
and lots of other things I've forgotten.
Moreover, I can't use stdin.end() because I don't want to kill my binary instances every time a new text arrives. Does anyone have an idea?
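For illustration, a minimal sketch of one commonly suggested direction, offered as an assumption rather than a confirmed diagnosis: Node's stdin/stdout streams don't normally hold data back, but a child program written with C stdio tends to block-buffer its stdout (typically a few KB) when it is connected to a pipe instead of a TTY, so nothing reaches stdout.on('data') until that buffer fills or stdin is closed. GNU coreutils' stdbuf can force line buffering; stdbuf being installed and the wrapper below are assumptions, not part of the original code:
const { spawn } = require('child_process');
const path = require('path');

// hypothetical: wrap the binary in `stdbuf -oL -eL` so its stdout/stderr are line-buffered
const binaryPath = path.join(__dirname, './my-english-binary');
const child = spawn('stdbuf', ['-oL', '-eL', binaryPath], { cwd: __dirname });

child.stdout.on('data', buf => console.log('Data received!', buf.toString()));
// stdin stays open; each line should now produce output as soon as the binary prints it
child.stdin.write('This is a random input.\n');
If stdbuf isn't an option (for example because the binary does its own buffering internally), allocating a pseudo-TTY for the child is another direction that is sometimes suggested.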

Related

Node.js readable maximize throughput/performance for compute intense readable - Writable doesn't pull data fast enough

General setup
I developed an application using AWS Lambda node.js 14.
I use a custom Readable implementation FrameCreationStream that uses node-canvas to draw images, svgs and more on a canvas. This result is then extracted as a raw image buffer in BGRA. A single image buffer contains 1920 * 1080 * 4 Bytes = 8294400 Bytes ~8 MB.
This is then piped to stdin of a child_process running ffmpeg.
The highWaterMark of my Readable in objectMode:true is set to 25 so that the internal buffer can use up to 8 MB * 25 = 200 MB.
All this works fine and doesn't consume too much RAM either. But I noticed after some time that the performance is not ideal.
Performance not optimal
I have an example input that generates a video of 315 frames. If I set highWaterMark to a value above 25, performance keeps increasing, right up to the point where I set it to 315 or above.
For some reason ffmpeg doesn't start to pull any data until highWaterMark is reached. Obviously that's not what I want. ffmpeg should always consume data as long as at least one frame is cached in the Readable and it has finished processing the previous frame, and the Readable should keep producing frames as long as highWaterMark isn't reached and the last frame hasn't been produced. So ideally the Readable and the Writable are busy all the time.
I found another way to improve the speed: if I add a 100 ms timeout in the _read() method of the Readable after, say, every tenth frame, then the ffmpeg Writable uses this timeout to write some frames to ffmpeg.
It seems like frames aren't passed to ffmpeg during frame creation because the node.js main thread is busy?
The fastest result I get is when I increase highWaterMark above the number of frames, which doesn't work for longer videos as it would make the AWS Lambda RAM explode, and it makes the whole streaming idea useless. Using timeouts always gives me a stomach ache; also, a well-fitting timeout might differ depending on the environment the code runs in. Any ideas?
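For illustration, a hedged sketch of the yield idea hinted at above, assuming the busy main thread really is what starves the pipe: instead of a fixed 100 ms timeout every tenth frame, yield to the event loop after each frame so the piped Writable's queued writes to ffmpeg's stdin get a chance to run. yieldToEventLoop is a hypothetical helper, not part of the original code:
// hypothetical replacement for optionalTimeout(): defer with setImmediate instead of sleeping
function yieldToEventLoop(buffer) {
    // setImmediate runs after pending I/O callbacks, so the pipe's writes can proceed first
    return new Promise(resolve => setImmediate(() => resolve(buffer)));
}

// sketched use inside _read(), mirroring the original promise chain:
// this.createImageBuffer()
//     .then(buffer => yieldToEventLoop(buffer))
//     .then(buffer => { /* push the frame as before */ });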
FrameCreationStream
import canvas from 'canvas';
import {Readable} from 'stream';
import {IMAGE_STREAM_BUFFER_SIZE, PerformanceUtil, RenderingLibraryError, VideoRendererInput} from 'vm-rendering-backend-commons';
import {AnimationAssets, BufferType, DrawingService, FullAnimationData} from 'vm-rendering-library';
/**
* This is a proper back-pressure-compatible implementation of a Readable, for having a stream to read single frames from.
* Whenever read() is called, a new frame is created and added to the stream.
* read() will be called internally until options.highWaterMark has been reached.
* Then calling read will be paused until one frame is read from the stream.
*/
export class FrameCreationStream extends Readable {
drawingService: DrawingService;
endFrameIndex: number;
currentFrameIndex: number = 0;
startFrameIndex: number;
frameTimer: [number, number];
readTimer: [number, number];
fullAnimationData: FullAnimationData;
constructor(animationAssets: AnimationAssets, fullAnimationData: FullAnimationData, videoRenderingInput: VideoRendererInput, frameTimer: [number, number]) {
super({highWaterMark: IMAGE_STREAM_BUFFER_SIZE, objectMode: true});
this.frameTimer = frameTimer;
this.readTimer = PerformanceUtil.startTimer();
this.fullAnimationData = fullAnimationData;
this.startFrameIndex = Math.floor(videoRenderingInput.startFrameId);
this.currentFrameIndex = this.startFrameIndex;
this.endFrameIndex = Math.floor(videoRenderingInput.endFrameId);
this.drawingService = new DrawingService(animationAssets, fullAnimationData, videoRenderingInput, canvas);
console.time("read");
}
/**
* this method is only overridden for debugging
* @param size
*/
read(size?: number): string | Buffer {
console.log("read("+size+")");
const buffer = super.read(size);
console.log(buffer);
console.log(buffer?.length);
if(buffer) {
console.timeLog("read");
}
return buffer;
}
// _read() will be called when the stream wants to pull more data in.
// _read() will be called again after each call to this.push(dataChunk) once the stream is ready to accept more data. https://nodejs.org/api/stream.html#readable_readsize
// this way it is ensured, that even though this.createImageBuffer() is async, only one frame is created at a time and the order is kept
_read(): void {
// as frame numbers are consecutive and unique, we have to draw each frame number (also the first and the last one)
if (this.currentFrameIndex <= this.endFrameIndex) {
PerformanceUtil.logTimer(this.readTimer, 'WAIT -> READ\t');
this.createImageBuffer()
.then(buffer => this.optionalTimeout(buffer))
// push means adding a buffered raw frame to the stream
.then((buffer: Buffer) => {
this.readTimer = PerformanceUtil.startTimer();
// the following two frame numbers start with 1 as first value
const processedFrameNumberOfScene = 1 + this.currentFrameIndex - this.startFrameIndex;
const totalFrameNumberOfScene = 1 + this.endFrameIndex - this.startFrameIndex;
// the overall frameId or frameIndex starts with frameId 0
const processedFrameIndex = this.currentFrameIndex;
this.currentFrameIndex++;
this.push(buffer); // nothing besides logging should happen after calling this.push(buffer)
console.log(processedFrameNumberOfScene + ' of ' + totalFrameNumberOfScene + ' processed - full video frameId: ' + processedFrameIndex + ' - buffered frames: ' + this.readableLength);
})
.catch(err => {
// errors will be finally handled, when subscribing to frameCreation stream in ffmpeg service
// this log is just generated for tracing errors and if for some reason the handling in ffmpeg service doesn't work
console.log("createImageBuffer: ", err);
this.emit("error", err);
});
} else {
// push(null) makes clear that this stream has ended
this.push(null);
PerformanceUtil.logTimer(this.frameTimer, 'FRAME_STREAM');
}
}
private optionalTimeout(buffer: Buffer): Promise<Buffer> {
if(this.currentFrameIndex % 10 === 0) {
return new Promise(resolve => setTimeout(() => resolve(buffer), 140));
}
return Promise.resolve(buffer);
}
// prevent memory leaks - without this lambda memory will increase with every call
_destroy(): void {
this.drawingService.destroyStage();
}
/**
* This creates a raw pixel buffer that contains a single frame of the video drawn by the rendering library
*
*/
public async createImageBuffer(): Promise<Buffer> {
const drawTimer = PerformanceUtil.startTimer();
try {
await this.drawingService.drawForFrame(this.currentFrameIndex);
} catch (err: any) {
throw new RenderingLibraryError(err);
}
PerformanceUtil.logTimer(drawTimer, 'DRAW -> FRAME\t');
const bufferTimer = PerformanceUtil.startTimer();
// Creates a raw pixel buffer, containing simple binary data
// the exact same information (BGRA/screen ratio) has to be provided to ffmpeg, because ffmpeg cannot detect format for raw input
const buffer = await this.drawingService.toBuffer(BufferType.RAW);
PerformanceUtil.logTimer(bufferTimer, 'CANVAS -> BUFFER');
return buffer;
}
}
FfmpegService
import {ChildProcess, execFile} from 'child_process';
import {Readable} from 'stream';
import {FPS, StageSize} from 'vm-rendering-library';
import {
FfmpegError,
LOCAL_MERGE_VIDEOS_TEXT_FILE, LOCAL_SOUND_FILE_PATH,
LOCAL_VIDEO_FILE_PATH,
LOCAL_VIDEO_SOUNDLESS_MERGE_FILE_PATH
} from "vm-rendering-backend-commons";
/**
* This class bundles all ffmpeg usages for rendering one scene.
* FFmpeg is a console program which can transcode nearly all types of sounds, images and videos from one to another.
*/
export class FfmpegService {
ffmpegPath: string = null;
constructor(ffmpegPath: string) {
this.ffmpegPath = ffmpegPath;
}
/**
* Convert a stream of raw images into an .mp4 video using the command line program ffmpeg.
*
* @param inputStream an input stream containing images in raw format BGRA
* @param stageSize the size of a single frame in pixels (minimum is 2*2)
* @param outputPath the filepath to write the resulting video to
*/
public imageToVideo(inputStream: Readable, stageSize: StageSize, outputPath: string): Promise<void> {
const args: string[] = [
'-f',
'rawvideo',
'-r',
`${FPS}`,
'-pix_fmt',
'bgra',
'-s',
`${stageSize.width}x${stageSize.height}`,
'-i',
// input "-" means input will be passed via pipe (streamed)
'-',
// codec that also QuickTime player can understand
'-vcodec',
'libx264',
'-pix_fmt',
'yuv420p',
/*
* "-movflags faststart":
* metadata at beginning of file
* needs more RAM
* file will be broken, if not finished properly
* higher application compatibility
* better for browser streaming
*/
'-movflags',
'faststart',
// "-preset ultrafast", //use this to speed up compression, but quality/compression ratio gets worse
// don't overwrite an existing file here,
// but delete file in the beginning of execution index.ts
// (this is better for local testing believe me)
outputPath
];
return this.execFfmpegPromise(args, inputStream);
}
private execFfmpegPromise(args: string[], inputStream?: Readable): Promise<void> {
const ffmpegServiceSelf = this;
return new Promise(function (resolve, reject) {
const executionProcess: ChildProcess = execFile(ffmpegServiceSelf.ffmpegPath, args, (err) => {
if (err) {
reject(new FfmpegError(err));
} else {
console.log('ffmpeg finished');
resolve();
}
});
if (inputStream) {
// it's important to listen on errors of input stream before piping it into the write stream
// if we don't do this here, we get an unhandled promise exception for every issue in the input stream
inputStream.on("error", err => {
reject(err);
});
// don't reject promise here as the error will also be thrown inside execFile and will contain more debugging info
// this log is just generated for tracing errors and if for some reason the handling in execFile doesn't work
inputStream.pipe(executionProcess.stdin).on("error", err => console.log("pipe stream: " , err));
}
});
}
}

Memory efficient growing Nodejs Duplex/Transform stream

I am trying to add variables into a template at specific indices through streams.
The idea is that I have a readable stream in, plus a list of variables that can each be a readable stream, a buffer or a string of undetermined size. These variables can be inserted at a predefined list of indices. I have a few questions based on my assumptions and on what I have tried so far.
My first attempt was to do it manually with readable streams. However, I couldn't do const buffer = templateIn.read(size) (the buffers were still empty) before the combined template was already being read from. The solution to that problem is similar to how you'd use a transform stream, so that was the next step I took.
However, I have a problem with the transform streams. My problem is that something like this pseudo code will pile up buffers in memory until done() is called.
public _transform(chunk: Buffer, encoding: string, done: (err?: Error, data?: any) => void ): void {
let index = 0;
while (index < chunk.length) {
if (index === this.variableIndex) { // the basic idea (the actual logic is a bit more complex)
this.insertStreamHere(index);
index++;
} else {
// continue reading stream normally
}
}
done()
}
From: https://github.com/nodejs/node/blob/master/lib/_stream_transform.js
In a transform stream, the written data is placed in a buffer. When
_read(n) is called, it transforms the queued up data, calling the
buffered _write cb's as it consumes chunks. If consuming a single
written chunk would result in multiple output chunks, then the first
outputted bit calls the readcb, and subsequent chunks just go into
the read buffer, and will cause it to emit 'readable' if necessary.
This way, back-pressure is actually determined by the reading side,
since _read has to be called to start processing a new chunk. However,
a pathological inflate type of transform can cause excessive buffering
here. For example, imagine a stream where every byte of input is
interpreted as an integer from 0-255, and then results in that many
bytes of output. Writing the 4 bytes {ff,ff,ff,ff} would result in
1kb of data being output. In this case, you could write a very small
amount of input, and end up with a very large amount of output. In
such a pathological inflating mechanism, there'd be no way to tell
the system to stop doing the transform. A single 4MB write could
cause the system to run out of memory.
So TL;DR: how do I insert (large) streams at a specific index, without building up a huge back-pressure of buffers in memory? Any advice is appreciated.
After a lot of reading of the documentation and the source code, a lot of trial and error and some testing, I have come up with a solution for my problem. I could just copy and paste my solution, but for the sake of completeness I will explain my findings here.
Handling the back pressure with pipes consists of a few parts. We've got the Readable that writes data to the Writable. The Writable is given a callback with which it can tell the Readable that it is ready to receive a new chunk of data. The reading part is simpler. The Readable has an internal buffer. Using Readable.push() will add data to the buffer. When the data is being read, it will come from this internal buffer. Next to that, we can use Readable.readableHighWaterMark and Readable.readableLength to make sure we don't push too much data at once.
Readable.readableHighWaterMark - Readable.readableLength
is the maximum number of bytes we should push to this internal buffer.
So this means that, since we want to read from two Readable streams at the same time, we need two Writable streams to control the flow. To merge the data we will need to buffer it ourselves, since there is (as far as I know) no internal buffer in the Writable stream. So a Duplex stream is the best option, because we want to handle buffering, writing and reading ourselves.
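As a tiny illustration of that rule, a hedged sketch with a hypothetical getNextChunk() helper (not part of the merge class itself):
// push only while the internal read buffer still has room below the high water mark
function pushWhileRoom(readable, getNextChunk) {
    let available = readable.readableHighWaterMark - readable.readableLength;
    let chunk;
    while (available > 0 && (chunk = getNextChunk()) !== null) {
        readable.push(chunk);
        available -= chunk.length;
    }
}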
Writing
So let's get to the code now. To control the state of multiple streams we will create a state interface. which looks as follows:
declare type StreamCallback = (error?: Error | null) => void;
interface MergingState {
callback: StreamCallback;
queue: BufferList;
highWaterMark: number;
size: number;
finalizing: boolean;
}
The callback holds the last callback provided by either write or final (we'll get to final later). highWaterMark indicates the maximum size of our queue, and size is the current size of the queue. Lastly, the finalizing flag indicates that the current queue is the last queue, so once that queue is empty we're done reading the stream belonging to that state.
BufferList is a copy of the internal Node.js implementation used for the built-in streams.
As mentioned before, the writable side handles the back pressure, so the generalized method for both of our writables looks like the following:
/**
* Method to write to provided state if it can
*
* (Will unshift the bytes that cannot be written back to the source)
*
* @param src the readable source that writes the chunk
* @param chunk the chunk to be written
* @param encoding the chunk encoding, currently not used
* @param cb the streamCallback provided by the writing state
* @param state the state which should be written to
*/
private writeState(src: Readable, chunk: Buffer, encoding: string, cb: StreamCallback, state: MergingState): void {
this.mergeNextTick();
const bytesAvailable = state.highWaterMark - state.size;
if (chunk.length <= bytesAvailable) {
// save to write to our local buffer
state.queue.push(chunk);
state.size += chunk.length;
if (chunk.length === bytesAvailable) {
// our queue is full, so store our callback
this.stateCallbackAndSet(state, cb);
} else {
// we still have some space, so we can call the callback immediately
cb();
}
return;
}
if (bytesAvailable === 0) {
// no space available unshift entire chunk
src.unshift(chunk);
} else {
state.size += bytesAvailable;
const leftOver = Buffer.alloc(chunk.length - bytesAvailable);
chunk.copy(leftOver, 0, bytesAvailable);
// push amount of bytes available
state.queue.push(chunk.slice(0, bytesAvailable));
// unshift what we cannot fit in our queue
src.unshift(leftOver);
}
this.stateCallbackAndSet(state, cb);
}
First we check how much space is available for buffering. If there is enough space for the full chunk, we buffer it. If there is no space available, we unshift the chunk back to its readable source. If there is some space available, we only unshift what we cannot fit. If our buffer is full, we store the callback that requests a new chunk. If there is space, we request our next chunk.
this.mergeNextTick() is called because our state has changed and it should be read on the next tick:
private mergeNextTick(): void {
if (!this.mergeSync) {
// make sure it is only called once per tick
// we don't want to call it multiple times
// since there will be nothing left to read the second time
this.mergeSync = true;
process.nextTick(() => this._read(this.readableHighWaterMark));
}
}
this.stateCallbackAndSet is a helper function that calls the last stored callback, to make sure we don't get into a state that makes our stream stop flowing, and then stores the newly provided one.
/**
* Helper function to call the callback if it exists and set the new callback
* @param state the state which holds the callback
* @param cb the new callback to be set
*/
private stateCallbackAndSet(state: MergingState, cb: StreamCallback): void {
if (!state) {
return;
}
if (state.callback) {
const callback = state.callback;
// do callback next tick, such that we can't get stuck in a writing loop
process.nextTick(() => callback());
}
state.callback = cb;
}
Reading
Now onto the reading side; this is the part where we handle selecting the correct stream.
First, our function to read a state, which is pretty straightforward: it reads the number of bytes it is able to read. It returns the number of bytes read, which is useful information for our other function.
/**
* Method to read the provided state if it can
*
* @param size the number of bytes to consume
* @param state the state from which needs to be read
* @returns the amount of bytes read
*/
private readState(size: number, state: MergingState): number {
if (state.size === 0) {
// our queue is empty so we read 0 bytes
return 0;
}
let buffer = null;
if (state.size < size) {
buffer = state.queue.consume(state.size, false);
} else {
buffer = state.queue.consume(size, false);
}
this.push(buffer);
this.stateCallbackAndSet(state, null);
state.size -= buffer.length;
return buffer.length;
}
The doRead method is where all the merging takes place. It fetches the nextMergingIndex. If the merging index is END, we can just read the writingState until the end of the stream. If we are at the merging index, we read from the mergingState. Otherwise we read from the writingState until we reach the next merging index.
/**
* Method to read from the correct Queue
*
* The doRead method is called multiple times by the _read method until
* it is satisfied with the returned size, or until no more bytes can be read
*
* @param n the number of bytes that can be read until highWaterMark is hit
* @throws Errors when something goes wrong, so wrap this method in a try catch.
* @returns the number of bytes read from either buffer
*/
private doRead(n: number): number {
// first check all constants below 0,
// which is only Merge.END right now
const nextMergingIndex = this.getNextMergingIndex();
if (nextMergingIndex === Merge.END) {
// read writing state until the end
return this.readWritingState(n);
}
const bytesToNextIndex = nextMergingIndex - this.index;
if (bytesToNextIndex === 0) {
// We are at the merging index, thus should read merging queue
return this.readState(n, this.mergingState);
}
if (n <= bytesToNextIndex) {
// We are safe to read n bytes
return this.readWritingState(n);
}
// read the bytes until the next merging index
return this.readWritingState(bytesToNextIndex);
}
readWritingState reads the state and updates the index:
/**
* Method to read from the writing state
*
* @param n maximum number of bytes to be read
* @returns number of bytes written.
*/
private readWritingState(n: number): number {
const bytesWritten = this.readState(n, this.writingState);
this.index += bytesWritten;
return bytesWritten;
}
Merging
For selecting our streams to merge we'll use a generator function. The generator function yields an index and a stream to merge at that index:
export interface MergingStream { index: number; stream: Readable; }
In doRead getNextMergingIndex() is called. This function returns the index of the next MergingStream. If there is no next mergingStream the generator is called to fetch a new mergingStream. If there is no new merging stream, we'll just return END.
/**
* Method to get the next merging index.
*
* Also fetches the next merging stream if merging stream is null
*
* @returns the next merging index, or Merge.END if there is no new mergingStream
* @throws Error when invalid MergingStream is returned by streamGenerator
*/
private getNextMergingIndex(): number {
if (!this.mergingStream) {
this.setNewMergeStream(this.streamGenerator.next().value);
if (!this.mergingStream) {
return Merge.END;
}
}
return this.mergingStream.index;
}
In setNewMergeStream we create a new Writable which we can pipe our new merging stream into. For this Writable we need to handle the write callback for writing to our state, and the final callback for handling the last chunk. We should also not forget to reset our state.
/**
* Method to set the new merging stream
*
* @throws Error when mergingStream has an index less than the current index
*/
private setNewMergeStream(mergingStream?: MergingStream): void {
if (this.mergingStream) {
throw new Error('There already is a merging stream');
}
// Set a new merging stream
this.mergingStream = mergingStream;
if (mergingStream == null || mergingStream.index === Merge.END) {
// set new state
this.mergingState = newMergingState(this.writableHighWaterMark);
// We're done, for now...
// mergingStream will be handled further once nextMainStream() is called
return;
}
if (mergingStream.index < this.index) {
throw new Error('Cannot merge at ' + mergingStream.index + ' because current index is ' + this.index);
}
// Create a new writable our new mergingStream can write to
this.mergeWriteStream = new Writable({
// Create a write callback for our new mergingStream
write: (chunk, encoding, cb) => this.writeMerge(mergingStream.stream, chunk, encoding, cb),
final: (cb: StreamCallback) => {
this.onMergeEnd(mergingStream.stream, cb);
},
});
// Create a new mergingState for our new merging stream
this.mergingState = newMergingState(this.mergeWriteStream.writableHighWaterMark);
// Pipe our new merging stream to our sink
mergingStream.stream.pipe(this.mergeWriteStream);
}
Finalizing
The last step in the process is to handle our final chunks, so that we know when to end merging and can send an end chunk. In our main read loop we first read until our doRead() method returns 0 twice in a row or has filled our read buffer. Once that happens we end our read loop and check our states to see whether they have finished.
public _read(size: number): void {
if (this.finished) {
// we've finished, there is nothing to left to read
return;
}
this.mergeSync = false;
let bytesRead = 0;
do {
const availableSpace = this.readableHighWaterMark - this.readableLength;
bytesRead = 0;
READ_LOOP: while (bytesRead < availableSpace && !this.finished) {
try {
const result = this.doRead(availableSpace - bytesRead);
if (result === 0) {
// either there is nothing in our buffers
// or our states are outdated (since they get updated in doRead)
break READ_LOOP;
}
bytesRead += result;
} catch (error) {
this.emit('error', error);
this.push(null);
this.finished = true;
}
}
} while (bytesRead > 0 && !this.finished);
this.handleFinished();
}
Then in our handleFinished() we check our states.
private handleFinished(): void {
if (this.finished) {
// merge stream has finished, so nothing to check
return;
}
if (this.isStateFinished(this.mergingState)) {
this.stateCallbackAndSet(this.mergingState, null);
// set our mergingStream to null, to indicate we need a new one
// which will be fetched by getNextMergingIndex()
this.mergingStream = null;
this.mergeNextTick();
}
if (this.isStateFinished(this.writingState)) {
this.stateCallbackAndSet(this.writingState, null);
this.handleMainFinish(); // checks if there are still mergingStreams left, and sets finished flag
this.mergeNextTick();
}
}
The isStateFinished() checks if our state has the finalizing flag set and if the queue size equals 0
/**
* Method to check if a specific state has completed
* #param state the state to check
* #returns true if the state has completed
*/
private isStateFinished(state: MergingState): boolean {
if (!state || !state.finalizing || state.size > 0) {
return false;
}
return true;
}
The finalizing flag is set once our end callback runs, in the final callback of our merging Writable stream. For our main stream we have to approach it a little differently, since we have little control over when that stream ends: the readable calls end() on our writable by default. We want to remove this behavior so that we can decide when to finish our stream. This might cause some issues when other end listeners are set, but for most use cases this should be fine.
private onPipe(readable: Readable): void {
// prevent our stream from being closed prematurely and unpipe it instead
readable.removeAllListeners('end'); // Note: will cause issues if another end listener is set
readable.once('end', () => {
this.finalizeState(this.writingState);
readable.unpipe();
});
}
The finalizeState() sets the flag and the callback to end the stream.
/**
* Method to put a state in finalizing mode
*
* Finalizing mode: the last chunk has been received, when size is 0
* the stream should be removed.
*
* @param state the state which should be put in finalizing mode
*
*/
private finalizeState(state: MergingState, cb?: StreamCallback): void {
state.finalizing = true;
this.stateCallbackAndSet(state, cb);
this.mergeNextTick();
}
And that is how you merge multiple streams in one single sink.
TL;DR: The complete code
This code has been fully tested with my Jest test suite on multiple edge cases, and it has a few more features than are explained here, such as appending streams and merging into that appended stream by providing Merge.END as the index.
Test result
You can see the tests I have run here; if I forgot any, send me a message and I may write another test for it.
MergeStream
✓ should throw an error when nextStream is not implemented (9ms)
✓ should throw an error when nextStream returns a stream with lower index (4ms)
✓ should reset index after new main stream (5ms)
✓ should write a single stream normally (50ms)
✓ should be able to merge a stream (2ms)
✓ should be able to append a stream on the end (1ms)
✓ should be able to merge large streams into a smaller stream (396ms)
✓ should be able to merge at the correct index (2ms)
Usage
const mergingStream = new Merge({
*nextStream(): IterableIterator<MergingStream> {
for (let i = 0; i < 10; i++) {
const stream = new Readable();
stream.push(i.toString());
stream.push(null);
yield {index: i * 2, stream};
}
},
});
const template = new Readable();
template.push(', , , , , , , , , ');
template.push(null);
template.pipe(mergingStream).pipe(getSink());
The result of our sink would be
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Final Thoughts
This is not the most time-efficient way of doing it, since we only manage one merging buffer at a time, so there is a lot of waiting. For my use case that is fine: I care about it not eating up my memory, and this solution works for me. But there is definitely room for optimization. The complete code has some extra features that are not fully explained here, such as appending streams and merging into that appended stream; they are explained with comments, though.

How to write incrementally to a text file and flush output

My Node.js program - which is an ordinary command line program that by and large doesn't do anything operationally unusual, nothing system-specific or asynchronous or anything like that - needs to write messages to a file from time to time, and then it will be interrupted with ^C and it needs the contents of the file to still be there.
I've tried using fs.createWriteStream but that just ends up with a 0-byte file. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
I've tried using winston but that ends up not creating the file at all. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
And fs.writeFile works perfectly when you have all the text you want to write up front, but doesn't seem to support appending a line at a time.
What is the recommended way to do this?
Edit: specific code I've tried:
var fs = require('fs')
var log = fs.createWriteStream('test.log')
for (var i = 0; i < 1000000; i++) {
    console.log(i)
    log.write(i + '\n')
}
Run for a few seconds, hit ^C, leaves a 0-byte file.
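The 0-byte file is expected with the code above: the writes are still queued in the stream's in-memory buffer when ^C kills the process, so nothing has reached the kernel yet. As a hedged aside, a minimal sketch of one closely related alternative, fs.appendFileSync, which opens, writes and closes the file on every call, so each line survives an interrupt (slower, but simple):
var fs = require('fs')
for (var i = 0; i < 1000000; i++) {
    console.log(i)
    // each call is a synchronous open/write/close, so the line is handed
    // to the kernel before the loop continues
    fs.appendFileSync('test.log', i + '\n')
}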
Turns out Node provides a lower level file I/O API that seems to work fine!
var fs = require('fs')
var log = fs.openSync('test.log', 'w')
for (var i = 0; i < 100000; i++) {
    console.log(i)
    fs.writeSync(log, i + '\n')
}
Node.js doesn't work in the traditional way. It uses a single thread, so by running a large loop and doing I/O inside it, you aren't giving it a chance (i.e. releasing the CPU) to do other async operations, e.g. flushing the memory buffer to the actual file.
The logic must be: do one write, then pass your function (which invokes the write) as a callback to process.nextTick, or as a callback for the write stream's drain event (if the buffer was full during the last write).
Here's a quick and dirty version which does what you need. Notice that there are no long-running loops or other CPU blockage; instead I schedule my subsequent writes for the future and return quickly, momentarily freeing up the CPU for other things.
var fs = require('fs')
var log = fs.createWriteStream('test.log');
var i = 0;

function my_write() {
    if (i++ < 1000000) {
        var res = log.write("" + i + "\r\n");
        if (!res) {
            // wait for the buffer to drain before the next write
            // ('once' so listeners don't pile up every time the buffer is full)
            log.once('drain', my_write);
        } else {
            process.nextTick(my_write);
        }
        console.log("Done" + i + " " + res + "\r\n");
    }
}

my_write();
This function might also be helpful.
/**
 * Write `data` to a `stream`. If the buffer is full, wait
 * until it's flushed and ready to be written to again.
 * [see](https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback)
 */
export function write(data, stream) {
    return new Promise((resolve, reject) => {
        if (stream.write(data)) {
            process.nextTick(resolve);
        } else {
            stream.once("drain", () => {
                stream.off("error", reject);
                resolve();
            });
            stream.once("error", reject);
        }
    });
}
You are writing into the file using a for loop, which is bad, but that's a separate issue. First of all, createWriteStream doesn't close the file automatically; you should call close().
If you call close() immediately after the for loop, it may close without writing everything, because the writes are asynchronous.
For more info read here: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
The problem is the asynchronous writing combined with the synchronous for loop.
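A hedged sketch of ending the stream explicitly, using end() plus the 'finish' event so the buffered data is flushed before the file is closed (this helps on a normal exit, though it still won't help if the process is killed mid-loop with ^C):
var fs = require('fs')
var log = fs.createWriteStream('test.log')
for (var i = 0; i < 1000000; i++) {
    log.write(i + '\n')
}
log.end() // no more writes; flush whatever is buffered
log.on('finish', function () {
    // all buffered data has been handed off to the OS
    console.log('done writing')
})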

How do I apply back-pressure to node streams?

While attempting to experiment with Node.js streams I ran into an interesting conundrum. When the input (Readable) stream pushes more data than the destination (Writable) cares about, I was unable to apply back-pressure correctly.
The two methods I attempted were to return false from Writable.prototype._write and to retain a reference to the Readable so I could call Readable.pause() from the Writable. Neither solution helped much, which I'll explain.
In my exercise (you can view the full source as a Gist) I have three streams:
Readable - PasscodeGenerator
util.inherits(PasscodeGenerator, stream.Readable);

function PasscodeGenerator(prefix) {
    stream.Readable.call(this, {objectMode: true});
    this.count = 0;
    this.prefix = prefix || '';
}

PasscodeGenerator.prototype._read = function() {
    var passcode = '' + this.prefix + this.count;
    if (!this.push({passcode: passcode})) {
        this.pause();
        this.once('drain', this.resume.bind(this));
    }
    this.count++;
};
I thought that the return code from this.push() was enough to self pause and wait for the drain event to resume.
Transform - Hasher
util.inherits(Hasher, stream.Transform);

function Hasher(hashType) {
    stream.Transform.call(this, {objectMode: true});
    this.hashType = hashType;
}

Hasher.prototype._transform = function(sample, encoding, next) {
    var hash = crypto.createHash(this.hashType);
    hash.setEncoding('hex');
    hash.write(sample.passcode);
    hash.end();
    sample.hash = hash.read();
    this.push(sample);
    next();
};
Simply add the hash of the passcode to the object.
Writable - SampleConsumer
util.inherits(SampleConsumer, stream.Writable);

function SampleConsumer(max) {
    stream.Writable.call(this, {objectMode: true});
    this.max = (max != null) ? max : 10;
    this.count = 0;
}

SampleConsumer.prototype._write = function(sample, encoding, next) {
    this.count++;
    console.log('Hash %d (%s): %s', this.count, sample.passcode, sample.hash);
    if (this.count < this.max) {
        next();
    } else {
        return false;
    }
};
Here I want to consume the data as fast as possible until I reach my max number of samples, and then end the stream. I tried using this.end() instead of return false, but that caused the dreaded "write after end" problem. Returning false does stop everything if the sample size is small, but when it is large I get an out of memory error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
According to this SO answer, in theory the Writable stream would return false, causing the streams to buffer until the buffers were full (16 objects by default in objectMode), and eventually the Readable would call its this.pause() method. But 16 + 16 + 16 = 48; that's 48 objects in buffers before things fill up and the system is clogged. Actually less, because there is no cloning involved, so the objects passed between them are the same reference. Would that not mean only 16 objects in memory before the high water mark halts everything?
Lastly, I realize I could have the Writable reference the Readable to call its pause method using closures. However, this solution means the Writable stream knows too much about another object. I'd have to pass in a reference:
var foo = new PasscodeGenerator('foobar');
foo
    .pipe(new Hasher('md5'))
    .pipe(new SampleConsumer(samples, foo));
And this feels out of the norm for how streams should work. I thought back-pressure was enough to cause a Writable to stop a Readable from pushing data and prevent out of memory errors.
An analogous example would be the Unix head command. Implementing that in Node, I would assume that the destination could end, rather than just ignore data while the source keeps pushing, even though the destination already has enough data to satisfy the beginning portion of the file.
How do I idiomatically construct custom streams such that, when the destination is ready to end, the source stream doesn't attempt to push more data?
This is a known issue with how _read() is called internally. Since your _read() is always pushing synchronously/immediately, the internal stream implementation can get into a loop in the right conditions. _read() implementations are generally expected to do some sort of async I/O (e.g. reading from disk or network).
The workaround for this (as noted in the link above) is to make your _read() asynchronous at least some of the time. You could also just make it async every time it's called with:
PasscodeGenerator.prototype._read = function(n) {
    var passcode = '' + this.prefix + this.count;
    var self = this;
    // `setImmediate()` delays the push until the beginning
    // of the next tick of the event loop
    setImmediate(function() {
        self.push({passcode: passcode});
    });
    this.count++;
};

Stdout flush for NodeJS?

Is there any stdout flush for nodejs just like python or other languages?
sys.stdout.write('some data')
sys.stdout.flush()
Right now I only saw process.stdout.write() for nodejs.
process.stdout is a WritableStream object, and the method WritableStream.write() automatically flushes the stream (unless it was explicitly corked). However, it will return true if the flush was successful, and false if the kernel buffer was full and it can't write yet. If you need to write several times in succession, you should handle the drain event.
See the documentation for write.
In newer NodeJS versions, you can pass a callback to .write(), which will be called once the data is flushed:
process.stdout.write('some data', () => {
    console.log('The data has been flushed');
});
This is exactly the same as checking the .write() result and listening for the drain event:
let write = process.stdout.write('some data');
if (!write) {
    process.stdout.once('drain', () => {
        console.log('The data has been flushed');
    });
}
write returns true if the data has been flushed. If it returns false, you can wait for the 'drain' event.
I think there is no flush, because that would be a blocking operation.
There are other stdout functions that clear the last output on the terminal, which works somewhat like a flush:
function flush() {
    process.stdout.clearLine();
    process.stdout.cursorTo(0);
}

var total = 5000;
var current = 0;
var percent = 0;
var waitingTime = 500;

setInterval(function() {
    current += waitingTime;
    percent = Math.floor((current / total) * 100);
    flush();
    process.stdout.write(`downloading ... ${percent}%`);
    if (current >= total) {
        console.log("\nDone.");
        clearInterval(this);
    }
}, waitingTime);
cursorTo will move the cursor to position 0, which is the start of the line.
Use the flush function before stdout.write, because it clears the line; if you call it afterwards you will not see any output.
