NodeJS Stream flushed during the Event Loop iteration

I'm trying to pipe a single Axios response stream into multiple files. It's not working, and I can reproduce the problem with the simple code below.
This works:
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.write('foo')
// Now I have a stream with content
inputStream.pipe(process.stdout)
inputStream.pipe(process.stderr)
// prints 'foo' to both stdout and stderr (shown as 'foofoo' in a terminal)
This does not work:
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.write('foo')
inputStream.pipe(process.stdout)
setImmediate(() => {
inputStream.pipe(process.stderr)
})
// Will print only 'foo'
The question is: can I say that the content already in the stream will be piped only if the two pipe calls execute in the same event loop iteration?
Doesn't that make the behaviour non-deterministic?

By the time the callback scheduled with setImmediate is executed, the stream data has already been flushed. This can be checked via the stream's .readableLength property.
You can use cork and uncork to control when the buffered stream data is flushed.
const { PassThrough } = require('stream')
const inputStream = new PassThrough()
inputStream.cork()
inputStream.write('foo')
inputStream.pipe(process.stdout)
setImmediate(() => {
inputStream.pipe(process.stderr)
inputStream.uncork()
})
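The flush described above can be observed through readableLength. The following is a minimal sketch, not the asker's code: sink is a second PassThrough standing in for process.stdout so the result can be inspected, and observeFlush is a name made up for this illustration.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: reports how many bytes are buffered in the source
// before the first pipe and again inside a setImmediate callback.
function observeFlush() {
  return new Promise((resolve) => {
    const input = new PassThrough();
    const sink = new PassThrough(); // stand-in for process.stdout (assumption)

    input.write('foo');
    const before = input.readableLength; // bytes still buffered in `input`

    input.pipe(sink); // the buffered data starts flowing on the next tick

    setImmediate(() => {
      resolve({
        before,
        after: input.readableLength, // what a late pipe() could still receive
        received: String(sink.read()), // what the first destination got
      });
    });
  });
}

observeFlush().then((result) => console.log(result));
```

Here before is the number of bytes waiting right after the write, and after is what remains once the first destination has been attached for one tick. A destination piped inside the setImmediate callback only sees what after still counts.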

Related

Create read stream from Buffer for uploading to s3 [duplicate]

I have a library that takes as input a ReadableStream, but my input is just a base64 format image. I could convert the data I have in a Buffer like so:
var img = Buffer.from(img_string, 'base64'); // new Buffer() is deprecated
But I have no idea how to convert it to a ReadableStream or convert the Buffer I obtained to a ReadableStream.
Is there a way to do this?

For nodejs 10.17.0 and up:
const { Readable } = require('stream');
const stream = Readable.from(myBuffer);

Something like this:
import { Readable } from 'stream'
const buffer = Buffer.from(img_string, 'base64') // new Buffer() is deprecated
const readable = new Readable()
readable._read = () => {} // _read is required but you can noop it
readable.push(buffer)
readable.push(null)
readable.pipe(consumer) // consume the stream
In the general case, a readable stream's _read function should collect data from the underlying source and push it incrementally, so that you don't pull a huge source into memory before it's needed.
In this case, though, you already have the source in memory, so a real _read implementation is not required.
Pushing the whole buffer just wraps it in the readable stream API.
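For comparison, the incremental style described above might look like the sketch below. It still uses an in-memory buffer as the source, so it only illustrates the shape of a _read implementation; bufferToChunkedStream and the default chunk size are assumptions made for this example.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: exposes `buffer` as a Readable that hands out
// one slice per read() call instead of pushing everything at once.
function bufferToChunkedStream(buffer, chunkSize = 16 * 1024) {
  let offset = 0;
  return new Readable({
    read() {
      if (offset >= buffer.length) {
        this.push(null); // no more data: end the stream
        return;
      }
      const end = Math.min(offset + chunkSize, buffer.length);
      this.push(buffer.subarray(offset, end)); // a view, not a copy
      offset = end;
    },
  });
}

// Usage: stream a small buffer in 4-byte slices
bufferToChunkedStream(Buffer.from('hello world'), 4).pipe(process.stdout);
```

With a real underlying source (a socket, a database cursor), the body of read() would fetch the next piece on demand rather than slice a buffer.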

Node Stream Buffer is obviously designed for use in testing; the inability to avoid a delay makes it a poor choice for production use.
Gabriel Llamas suggests streamifier in this answer: How to wrap a buffer as a stream2 Readable stream?

You can create a ReadableStream using Node Stream Buffers like so:
// Initialize stream
var myReadableStreamBuffer = new streamBuffers.ReadableStreamBuffer({
frequency: 10, // in milliseconds.
chunkSize: 2048 // in bytes.
});
// With a buffer
myReadableStreamBuffer.put(aBuffer);
// Or with a string
myReadableStreamBuffer.put("A String", "utf8");
The frequency cannot be 0 so this will introduce a certain delay.

You can use the standard NodeJS stream API for this - stream.Readable.from
const { Readable } = require('stream');
const stream = Readable.from(buffer);
Note: Don't convert a buffer to string (buffer.toString()) if the buffer contains binary data. It will lead to corrupted binary files.
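A short sketch of why that note matters. The four bytes below are arbitrary illustrative data that happen not to be valid UTF-8; the variable names are made up for this example.

```javascript
// Four arbitrary bytes that are not valid UTF-8 (illustrative data only)
const original = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);

// Round trip through a UTF-8 string: invalid sequences get replaced
const viaUtf8 = Buffer.from(original.toString());

// Round trip through base64: a lossless text encoding
const viaBase64 = Buffer.from(original.toString('base64'), 'base64');

const utf8Intact = original.equals(viaUtf8);
const base64Intact = original.equals(viaBase64);
console.log({ utf8Intact, base64Intact });
```

If the data has to travel as text, an encoding such as base64 or hex keeps the bytes intact; otherwise pass the Buffer itself to Readable.from.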

You don't need to add a whole npm library for a single file. I refactored it to TypeScript:
import { Readable, ReadableOptions } from "stream";
export class MultiStream extends Readable {
_object: any;
constructor(object: any, options: ReadableOptions) {
super(object instanceof Buffer || typeof object === "string" ? options : { objectMode: true });
this._object = object;
}
_read = () => {
this.push(this._object);
this._object = null;
};
}
This is based on node-streamifier (the best option, as said above).

Here is a simple solution using streamifier module.
const streamifier = require('streamifier');
streamifier.createReadStream(Buffer.from([97, 98, 99])).pipe(process.stdout);
You can use Strings, Buffer and Object as its arguments.

This is my simple code for this.
import { Readable } from 'stream';
const newStream = new Readable({
  read() {
    this.push(someBuffer);
    this.push(null); // end the stream; otherwise someBuffer is pushed on every read
  },
})

Try this:
const Duplex = require('stream').Duplex; // core NodeJS API
function bufferToStream(buffer) {
let stream = new Duplex();
stream.push(buffer);
stream.push(null);
return stream;
}
Source:
Brian Mancini -> http://derpturkey.com/buffer-to-stream-in-node/

Node: pipeline not blocking on paused passthrough

One of the basic behaviours of Node's streams is to block when writing to a paused stream, and any stream that is not piped is paused.
In this example, the PassThrough is not piped to anything during the event loop iteration in which it is created. One would expect any pipeline run on this PassThrough to block until it is piped or a data listener is attached, but this is not the case.
The pipeline calls back, but nothing is consumed.
const {promises: pFs} = require('fs');
const fs = require('fs');
const {PassThrough} = require('stream');
const {pipeline: pipelineCb} = require('stream');
const util = require('util');
const pipeline = util.promisify(pipelineCb);
const path = require('path');
const assert = require('assert');
/**
* Start a test ftp server
* @param {string} outputPath
* @return {Promise<void>}
*/
function myCreateWritableStream (outputPath) {
// The stream is created in paused mode -> should block until piped
const stream = new PassThrough();
(async () => {
// Do some stuff (create directory / check space / connect...)
await new Promise(resolve => setTimeout(resolve, 500));
console.log('piping passThrough to finale output');
// Consume the stream
await pipeline(stream, fs.createWriteStream(outputPath));
console.log('passThrough stream content written');
})().catch(e => {
console.error(e);
stream.emit('error', e);
});
return stream;
}
/**
* Main test function
* @return {Promise<void>}
*/
async function main () {
// Prepare the test directory with a 'tmp1' file only
const smallFilePath = path.join(__dirname, 'tmp1');
const smallFileOut = path.join(__dirname, 'tmp2');
await Promise.all([
pFs.writeFile(smallFilePath, 'a small content'),
pFs.unlink(smallFileOut).catch(e => assert(e.code === 'ENOENT'))
]);
// Duplicate the tmp1 file to tmp2
await pipeline([
fs.createReadStream(smallFilePath),
myCreateWritableStream(smallFileOut)
]);
console.log('pipeline ended');
// Check content
const finalContent = await pFs.readdir(__dirname);
console.log('directory content');
console.log(finalContent.filter(file => file.startsWith('tmp')));
}
main().catch(e => {
process.exitCode = 1;
console.error(e);
});
This code outputs the following lines:
pipeline ended
directory content
[ 'tmp1' ]
piping passThrough to finale output
passThrough stream content written
If the pipeline really waited for the stream to end, the output would be:
piping passThrough to finale output
passThrough stream content written
pipeline ended
directory content
[ 'tmp1', 'tmp2' ]
How can this behaviour be explained?

I don't think the API gives the guarantees you are looking for here.
stream.pipeline calls its callback once all data has been written. Since the data has been written to a new Transform stream (your PassThrough), and that stream has nowhere to put the data yet, it simply gets stored in the stream's internal buffer. That is good enough for the pipeline.
If you were to read a large enough file, filling the Transform stream's buffer, backpressure can automatically trigger a pause() on the readable that is reading the file. Once the Transform stream drains, the readable is automatically resumed so data flow continues.
I think your example makes two incorrect assumptions:
(1) That you can pause a transform stream. According to the stream docs, pausing any stream that is piped to a destination is ineffective, because it will immediately unpause itself as soon as a piped destination asks for more data. Also, a paused transform stream still reads data! A paused stream just doesn't write data.
(2) That a pause further down a pipeline somehow propagates up to the front of a pipeline and causes data to stop flowing. This is only true if caused by backpressure, meaning, you would need to trigger node's detection of a full internal buffer.
When working with pipes, it's best to assume you have manual control over the two farthest ends, but not necessarily of any of the pieces in the middle. (You can manually pipe() and unpipe() to connect and disconnect intermediate streams, but you can't pause them.)
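The buffering described in this answer can be observed on an unpiped PassThrough. This is a minimal sketch under assumed values: the highWaterMark of 16 and the loop bound are chosen only to make the effect visible quickly, and the exact number of accepted writes may differ between Node versions.

```javascript
const { PassThrough } = require('stream');

// Small highWaterMark so the internal buffers fill quickly (illustrative value)
const unpiped = new PassThrough({ highWaterMark: 16 });

let accepted = 0;
const limit = 1000; // safety bound for the loop
while (accepted < limit) {
  const ok = unpiped.write('x'); // nothing is consuming `unpiped`
  accepted += 1;
  if (!ok) break; // write() returned false: backpressure is being signalled
}

console.log('writes before backpressure:', accepted);
console.log('bytes waiting on the readable side:', unpiped.readableLength);
```

A small file fits into these buffers without write() ever returning false, which is why the pipeline in the question can complete before anything consumes the PassThrough.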

Proper way to consume NodeJS stream into buffer and write stream

I have a need to pipe a readable stream into both a buffer (to be converted into a string) and a file. The stream is coming from node-fetch.
NodeJS streams have two states: paused and flowing. From what I understand, as soon as a 'data' listener is attached, the stream will change to flowing mode. I want to make sure the way I am reading a stream will not lose any bytes.
Method 1: piping and reading from 'data':
fetch(url).then(
response =>
new Promise(resolve => {
const buffers = []
const dest = fs.createWriteStream(filename)
response.body.pipe(dest)
response.body.on('data', chunk => buffers.push(chunk))
dest.on('close', () => resolve(Buffer.concat(buffers).toString()))
})
)
Method 2: using passthrough streams:
const { PassThrough } = require('stream')
fetch(url).then(
response =>
new Promise(resolve => {
const buffers = []
const dest = fs.createWriteStream(filename)
const forFile = new PassThrough()
const forBuffer = new PassThrough()
response.body.pipe(forFile).pipe(dest)
response.body.pipe(forBuffer)
forBuffer.on('data', chunk => buffers.push(chunk))
dest.on('close', () => resolve(Buffer.concat(buffers).toString()))
})
)
Is the second method required so there is no lost data? Is the second method wasteful since two more streams could be buffered? Or, is there another way to fill a buffer and write stream simultaneously?

You won't miss any data, since .pipe internally calls src.on('data') and writes any chunk to the target stream.
So any chunk written to your dest stream, will also be emitted to response.body.on('data') where you're buffering the chunks.
In any case, you should listen to 'error' events and reject if any error occurs.
And while your second method will work, you don't need it.
This is a chunk of code from the .pipe function:
src.on('data', ondata);
function ondata(chunk) {
debug('ondata');
var ret = dest.write(chunk);
debug('dest.write', ret);
if (ret === false) {
// If the user unpiped during `dest.write()`, it is possible
// to get stuck in a permanently paused state if that write
// also returned false.
// => Check whether `dest` is still a piping destination.
if (((state.pipesCount === 1 && state.pipes === dest) ||
(state.pipesCount > 1 && state.pipes.indexOf(dest) !== -1)) &&
!cleanedUp) {
debug('false write response, pause', state.awaitDrain);
state.awaitDrain++;
}
src.pause();
}
}
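Putting the advice above together, the first method with error handling could look like the sketch below. It swaps .pipe for stream.pipeline so that errors from either side reach one callback; saveAndCollect is a name invented for this example, and source stands for any readable such as response.body.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper: writes `source` to `filename` and resolves with the
// same content as a string; rejects if the source or the file stream fails.
function saveAndCollect(source, filename) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    // Attached in the same tick as pipeline(), so both consumers see every chunk
    source.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    pipeline(source, fs.createWriteStream(filename), (err) => {
      if (err) reject(err);
      else resolve(Buffer.concat(chunks).toString());
    });
  });
}
```

With node-fetch this would be called as saveAndCollect(response.body, filename) inside the then callback, assuming response.body is a Node readable stream as in the question.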

How to mock streams in NodeJS

I'm attempting to unit test one of my node-js modules which deals heavily in streams. I'm trying to mock a stream (that I will write to), as within my module I have ".on('data/end)" listeners that I would like to trigger. Essentially I want to be able to do something like this:
var mockedStream = new (require('stream').Readable)();
mockedStream.on('data', function withData(data) {
console.dir(data);
});
mockedStream.on('end', function() {
console.dir('goodbye');
});
mockedStream.push('hello world');
mockedStream.close();
This executes, but the 'on' event never gets fired after I do the push (and .close() is invalid).
All the guidance I can find on streams uses the 'fs' or 'net' library as a basis for creating a new stream (https://github.com/substack/stream-handbook), or mocks it out with sinon, but the mocking gets very lengthy very quickly.
Is there a nice way to provide a dummy stream like this?

There's a simpler way: stream.PassThrough
I've just found Node's very easy to miss stream.PassThrough class, which I believe is what you're looking for.
From Node docs:
The stream.PassThrough class is a trivial implementation of a Transform stream that simply passes the input bytes across to the output. Its purpose is primarily for examples and testing...
The code from the question, modified:
const { PassThrough } = require('stream');
const mockedStream = new PassThrough(); // <----
mockedStream.on('data', (d) => {
console.dir(d);
});
mockedStream.on('end', function() {
console.dir('goodbye');
});
mockedStream.emit('data', 'hello world');
mockedStream.end(); // <-- end. not close.
mockedStream.destroy();
mockedStream.push() works too, but the data arrives as a Buffer, so you might want to do: console.dir(d.toString());

Instead of using Push, I should have been using ".emit(<event>, <data>);"
My mock code now works and looks like:
var mockedStream = new require('stream').Readable();
mockedStream._read = function(size) { /* do nothing */ };
myModule.functionIWantToTest(mockedStream); // has .on() listeners in it
mockedStream.emit('data', 'Hello data!');
mockedStream.emit('end');

The accepted answer is only partially correct. If all you need is for events to fire, using .emit('data', datum) is okay, but if you need to pipe this mock stream anywhere else it won't work.
Mocking a Readable stream is surprisingly easy, requiring only the Readable class from the stream module.
const { Readable } = require('stream');
let eventCount = 0;
const mockEventStream = new Readable({
objectMode: true,
read: function (size) {
if (eventCount < 10) {
eventCount = eventCount + 1;
return this.push({message: `event${eventCount}`})
} else {
return this.push(null);
}
}
});
Now you can pipe this stream wherever and 'data' and 'end' will fire.
Another example from the node docs:
https://nodejs.org/api/stream.html#stream_an_example_counting_stream

Building on @flacnut's answer, I did this (in NodeJS 12+) using Readable.from() to construct a stream preloaded with data (a list of filenames):
const mockStream = require('stream').Readable.from([
'file1.txt',
'file2.txt',
'file3.txt',
])
In my case, I wanted to mock the stream of filenames returned by fast-glob.stream:
const glob = require('fast-glob')
// inject the mock stream into glob module
glob.stream = jest.fn().mockReturnValue(mockStream)
In the function being tested:
const stream = glob.stream(globFilespec)
for await (const filename of stream) {
// filename = file1.txt, then file2.txt, then file3.txt
}
Works like a charm!

Here's a simple implementation which uses jest.fn() where the goal is to validate what has been written to the stream created by fs.createWriteStream(). The nice thing about jest.fn() is that although the calls to fs.createWriteStream() and stream.write() are inline in this test function, these functions don't need to be called directly by the test.
const fs = require('fs');
const mockStream = {}
test('mock fs.createWriteStream with mock implementation', async () => {
const createMockWriteStream = (filename, args) => {
return mockStream;
}
mockStream.write = jest.fn();
fs.createWriteStream = jest.fn(createMockWriteStream);
const stream = fs.createWriteStream('foo.csv', {'flags': 'a'});
await stream.write('foobar');
expect(fs.createWriteStream).toHaveBeenCalledWith('foo.csv', {'flags': 'a'});
expect(mockStream.write).toHaveBeenCalledWith('foobar');
})

Is it possible to register multiple listeners to a child process's stdout data event? [duplicate]

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again. This doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');
var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify',['-']);
inputStream.pipe(identify.stdin);
var chunks = [];
identify.stdout.on('data',function(chunk) {
chunks.push(chunk);
});
identify.stdout.on('end',function() {
var size = getSize(Buffer.concat(chunks)); //width
var convert = spawn('convert',['-','-scale',size * 0.5,'png:-']);
inputStream.pipe(convert.stdin);
convert.stdout.pipe(fs.createWriteStream('half.png'));
});
function getSize(buffer){
return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this:
Error: You cannot pipe after data has been emitted from the response.
Changing the inputStream to fs.createReadStream yields the same issue, of course.
I don't want to write into a file but reuse in some way the stream that request produces (or any other for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?

You have to create a duplicate of the stream by piping it to two streams. You can do that with PassThrough streams, which simply pass the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;
const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();
a.stdout.pipe(b);
a.stdout.pipe(c);
let count = 0;
b.on('data', function (chunk) {
count += chunk.length;
});
b.on('end', function () {
console.log(count);
c.pipe(process.stdout);
});
Output:
8
hi user

The first answer only works if streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses a library based on Stream2 streams, Streamz, and Promises to synchronize async streams via a callback. Using the familiar example from the first answer:
spawn = require('child_process').spawn;
pass = require('stream').PassThrough;
streamz = require('streamz').PassThrough;
var Promise = require('bluebird');
a = spawn('echo', ['hi user']);
b = new pass;
c = new pass;
a.stdout.pipe(streamz(combineStreamOperations));
function combineStreamOperations(data, next){
  Promise.join(b, c, function(b, c){ //perform n operations on the same data
    next(); //request more
  });
}
count = 0;
b.on('data', function(chunk) { count += chunk.length; });
b.on('end', function() { console.log(count); c.pipe(process.stdout); });

You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need

For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough
a=PassThrough()
b1=PassThrough()
b2=PassThrough()
a.pipe(b1)
a.pipe(b2)
b1.on('data', function(data) {
console.log('b1:', data.toString())
})
b2.on('data', function(data) {
console.log('b2:', data.toString())
})
a.write('text')

I have a different solution that writes to two streams simultaneously. Naturally, the time to write will be the sum of the two write times. I use it to respond to a download request where I want to keep a copy of the downloaded file on my server (I actually use an S3 backup, so I cache the most used files locally to avoid multiple file transfers).
/**
* A utility class made to write to a file while answering a file download request
*/
class TwoOutputStreams {
constructor(streamOne, streamTwo) {
this.streamOne = streamOne
this.streamTwo = streamTwo
}
setHeader(header, value) {
if (this.streamOne.setHeader)
this.streamOne.setHeader(header, value)
if (this.streamTwo.setHeader)
this.streamTwo.setHeader(header, value)
}
write(chunk) {
this.streamOne.write(chunk)
this.streamTwo.write(chunk)
}
end() {
this.streamOne.end()
this.streamTwo.end()
}
}
You can then use this as a regular output stream:
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a file output stream.

If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream:
const Promise = require('bluebird');
const concat = require('concat-stream');
const getBuffer = function(stream){
return new Promise(function(resolve, reject){
var gotBuffer = function(buffer){
resolve(buffer);
}
var concatStream = concat(gotBuffer);
stream.on('error', reject);
stream.pipe(concatStream);
});
}
To create streams from the buffer you can use:
const { Readable } = require('stream');
const getBufferStream = function(buffer){
const stream = new Readable();
stream.push(buffer);
stream.push(null);
return Promise.resolve(stream);
}

What about piping into two or more streams, but not at the same time?
For example:
var PassThrough = require('stream').PassThrough;
var mybinaryStream = stream.start(); //never ending audio stream
var file1 = fs.createWriteStream('file1.wav',{encoding:'binary'})
var file2 = fs.createWriteStream('file2.wav',{encoding:'binary'})
var mypass = new PassThrough()
mybinaryStream.pipe(mypass)
mypass.pipe(file1)
setTimeout(function(){
mypass.pipe(file2);
},2000)
The above code does not produce any errors, but file2 is empty.
