fs.createReadStream - limit the amount of data streamed at a time - node.js

If I only want to read 10 bytes at a time, or one line of data at a time (looking for newline characters), is it possible to pass fs.createReadStream() options like so
var options = {}
var stream = fs.createReadStream('file.txt', options);
so that I can limit the amount of data streamed at a time?
Looking at the fs docs, I don't see any options that would allow me to do that, even though I'm guessing it's possible.
https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options

You can use .read():
var stream = fs.createReadStream('file.txt', options);
var byteSize = 10;

stream.on("readable", function() {
  var chunk;
  while ((chunk = stream.read(byteSize))) {
    console.log(chunk.length);
  }
});
The main benefit of knowing this one over just the highWaterMark option is that you can call it on streams you haven't created.
Here are the docs
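If you do control the stream's creation, here is a minimal sketch using the highWaterMark option mentioned above (my addition, not part of the original answer), which caps how much data each 'data' event delivers:

var fs = require('fs');

// Each 'data' event now delivers at most 10 bytes.
var stream = fs.createReadStream('file.txt', { highWaterMark: 10 });
stream.on('data', function (chunk) {
  console.log(chunk.length); // <= 10
});

For reading one line at a time, the built-in readline module (readline.createInterface({ input: stream })) is usually the simpler route.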

Related

NodeJS - fs.createReadStream() - how to cut off chunks at a certain point?

I have a large XML file (2GB) and I need to add a new line if a criterion is met. Example:
<chickens>
<chicken>
<name>sam</name>
<female>false</female>
</chicken>
<chicken>
<name>julia</name>
<female>true</female>
</chicken>
// many many more chickens
</chickens>
to:
<chickens>
<chicken>
<name>sam</name>
<female>false</female>
</chicken>
<chicken>
<name>julia</name>
<female>true</female>
<canLayEggs>true</canLayEggs> // <- Add this line if female is true;
</chicken>
// many many more chickens
</chickens>
However, the issue that I'm facing is that sometimes the chunk gets cut off like <female>true
and then the next chunk starts with </female>
Here is my code:
const fs = require("fs");

const input = "input.xml";
const MAX_CHUNK_SIZE = 50 * 1024 * 1024; // 50 MB
const buffer = Buffer.alloc(MAX_CHUNK_SIZE);

let readStream = fs.createReadStream(input, {
  encoding: "utf8",
  highWaterMark: MAX_CHUNK_SIZE,
});
let writeStream = fs.createWriteStream("output.xml");

readStream.on("data", (chunk) => {
  let data = chunk;
  if (data.includes("<female>true</female>")) {
    data = data.replace(
      /<female>true<\/female>/g,
      "<female>true</female><canLayEggs>true</canLayEggs>"
    );
  }
  writeStream.write(data, "utf-8");
});

readStream.on("end", () => {
  writeStream.end();
});
I have tried Google but I can't seem to find the right term, and many tutorials out there don't really cover this. Any help is appreciated.
You are reading 50 MB per chunk, so in the on('data') callback you can call:
readStream.destroy();
Also, you don't need to init the buffer with a 50 MB size; it is not used here, and after the text replacement the data is likely longer than 50 MB anyway.
It is good that you close the writeStream when the readStream ends.
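Beyond that, one way to deal with tags being split across chunk boundaries (not from the original answer, just a sketch of the carry-over idea) is to keep the tail of each chunk after the last complete </chicken> and prepend it to the next one:

const fs = require("fs");

const readStream = fs.createReadStream("input.xml", { encoding: "utf8" });
const writeStream = fs.createWriteStream("output.xml");

const addEggs = (xml) =>
  xml.replace(
    /<female>true<\/female>/g,
    "<female>true</female><canLayEggs>true</canLayEggs>"
  );

let remainder = ""; // tail of the previous chunk that might end mid-tag

readStream.on("data", (chunk) => {
  const data = remainder + chunk;
  // Only process up to the last complete </chicken>; keep the rest for the next chunk.
  const cut = data.lastIndexOf("</chicken>");
  if (cut === -1) {
    remainder = data;
    return;
  }
  const end = cut + "</chicken>".length;
  remainder = data.slice(end);
  writeStream.write(addEggs(data.slice(0, end)), "utf-8");
});

readStream.on("end", () => {
  // Flush whatever is left (e.g. the closing </chickens> tag).
  writeStream.write(addEggs(remainder), "utf-8");
  writeStream.end();
});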

Web Audio API: proper way to play data chunks from a Node.js server via socket

I'm using the following code to decode audio chunks from nodejs's socket
window.AudioContext = window.AudioContext || window.webkitAudioContext;
var context = new AudioContext();
var delayTime = 0;
var init = 0;
var audioStack = [];
var nextTime = 0;

client.on('stream', function(stream, meta) {
  stream.on('data', function(data) {
    context.decodeAudioData(data, function(buffer) {
      audioStack.push(buffer);
      if ((init != 0) || (audioStack.length > 10)) { // make sure we put at least 10 chunks in the buffer before starting
        init++;
        scheduleBuffers();
      }
    }, function(err) {
      console.log("err(decodeAudioData): " + err);
    });
  });
});

function scheduleBuffers() {
  while (audioStack.length) {
    var buffer = audioStack.shift();
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    if (nextTime == 0)
      nextTime = context.currentTime + 0.05; // add 50ms latency to work well across systems - tune this if you like
    source.start(nextTime);
    nextTime += source.buffer.duration; // make the next buffer wait the length of the last buffer before being played
  }
}
But it has some gaps/glitches between audio chunks that I'm unable to figure out.
I've also read that with MediaSource it's possible to do the same and having the timing handled by the player instead of doing it manually. Can someone provide an example of handling mp3 data?
Moreover, what is the proper way to handle live streaming with the Web Audio API? I've already read almost all the questions on SO about this subject and none of them seem to work without glitches. Any ideas?
You can take this code as an example: https://github.com/kmoskwiak/node-tcp-streaming-server
It basically uses Media Source Extensions. All you need to do is change it from video to audio:
buffer = mediaSource.addSourceBuffer('audio/mpeg');
Yes, @Keyne is right:
const mediaSource = new MediaSource()
player.src = URL.createObjectURL(mediaSource)

mediaSource.addEventListener('sourceopen', () => {
  // addSourceBuffer() can only be called once 'sourceopen' has fired
  const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg')
  sourceBuffer.appendBuffer(chunk) // Repeat this for each chunk as ArrayBuffer
})

player.play()
But do this only if you don't care about iOS support 🤔 (https://developer.mozilla.org/en-US/docs/Web/API/MediaSource#Browser_compatibility).
Otherwise, please let me know how you do it!
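One more caveat worth showing: appendBuffer() throws if it is called while the SourceBuffer is still updating, so chunks arriving from the socket usually need to be queued. A rough, self-contained sketch (the audio element and the onChunk hookup to your socket are assumptions, not part of the answers above):

const player = document.querySelector('audio'); // assumed <audio> element on the page
const mediaSource = new MediaSource();
player.src = URL.createObjectURL(mediaSource);

const queue = [];
let sourceBuffer = null;

mediaSource.addEventListener('sourceopen', () => {
  sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
  sourceBuffer.addEventListener('updateend', appendNext); // append the next chunk once the previous one is done
  appendNext();
});

function onChunk(chunk) { // call this for every ArrayBuffer received from the socket
  queue.push(chunk);
  appendNext();
}

function appendNext() {
  if (sourceBuffer && !sourceBuffer.updating && queue.length) {
    sourceBuffer.appendBuffer(queue.shift());
  }
}

player.play();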

Bad performance on combination of streams

I want to stream the results of a PostgreSQL query to a client via a websocket.
Data is fetched from the database using pg-promise and pg-query-stream. To stream data via a websocket I use socket.io-stream.
Individually, all components perform quite well. However, when I pipe the pg-query-stream into the socket.io-stream, performance drops drastically.
I've started with:
var QueryStream = require('pg-query-stream');
var ss = require('socket.io-stream');

// Query with a lot of results
var qs = new QueryStream('SELECT...');

db.stream(qs, s => {
  var socketStream = ss.createStream({objectMode: true});
  ss(socket).emit('data', socketStream);
  s.pipe(socketStream);
})
  .then(data => {
    console.log('Total rows processed:', data.processed,
      'Duration in milliseconds:', data.duration);
  });
I have tried to use non-object streams:
var socketStream = ss.createStream();
ss(socket).emit('data', socketStream);
s.pipe(JSONStream.stringify()).pipe(socketStream);
Or:
var socketStream = ss.createStream();
ss(socket).emit('data', socketStream);
s.pipe(JSONStream.stringify(false)).pipe(socketStream);
It takes roughly one minute to query and transfer the data for all solutions.
The query results can be written to a file within one second:
s.pipe(fs.createWriteStream('temp.txt'));
And that file can be transmitted within one second:
var socketStream = ss.createStream();
fs.createReadStream('temp.txt').pipe(socketStream);
So somehow, these streams don't seem to combine well.
As a silly experiment, I've tried placing something in between:
var socketStream = ss.createStream();
ss(socket).emit('data', socketStream);
var zip = zlib.createGzip();
var unzip = zlib.createGunzip();
s.pipe(JSONStream.stringify(false)).pipe(zip).pipe(unzip).pipe(socketStream);
And suddenly data can be queried and transferred within one second...
Unfortunately this is not going to work as my final solution. It would waste too much CPU. What is causing performance to degrade on this combination of streams? How can this be fixed?

Is it possible to register multiple listeners to a child process's stdout data event? [duplicate]

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again; this doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');

var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify', ['-']);

inputStream.pipe(identify.stdin);

var chunks = [];
identify.stdout.on('data', function(chunk) {
  chunks.push(chunk);
});
identify.stdout.on('end', function() {
  var size = getSize(Buffer.concat(chunks)); // width
  var convert = spawn('convert', ['-', '-scale', size * 0.5, 'png:-']);
  inputStream.pipe(convert.stdin);
  convert.stdout.pipe(fs.createWriteStream('half.png'));
});

function getSize(buffer) {
  return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this
Error: You cannot pipe after data has been emitted from the response.
and changing the inputStream to an fs.createReadStream yields the same issue, of course.
I don't want to write into a file but reuse in some way the stream that request produces (or any other for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?
You have to create a duplicate of the stream by piping it to two streams. You can create a simple copy with a PassThrough stream; it simply passes the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;

const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();

a.stdout.pipe(b);
a.stdout.pipe(c);

let count = 0;
b.on('data', function (chunk) {
  count += chunk.length;
});
b.on('end', function () {
  console.log(count);
  c.pipe(process.stdout);
});
Output:
8
hi user
The first answer only works if streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses Streamz, a library built on streams2, together with promises to synchronize the async streams via a callback. Using the familiar example from the first answer:
var spawn = require('child_process').spawn;
var pass = require('stream').PassThrough;
var streamz = require('streamz').PassThrough;
var Promise = require('bluebird');

var a = spawn('echo', ['hi user']);
var b = new pass();
var c = new pass();

a.stdout.pipe(streamz(combineStreamOperations));

function combineStreamOperations(data, next) {
  Promise.join(b, c, function(b, c) { // perform n operations on the same data
    next(); // request more
  });
}

var count = 0;
b.on('data', function(chunk) { count += chunk.length; });
b.on('end', function() { console.log(count); c.pipe(process.stdout); });
You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need
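A minimal usage sketch (based on my reading of the package's README; treat the exact constructor as an assumption rather than gospel):

const fs = require('fs');
const ReadableStreamClone = require('readable-stream-clone'); // assumed default export

const source = fs.createReadStream('file.txt');
const copy1 = new ReadableStreamClone(source); // each clone re-emits the source's data
const copy2 = new ReadableStreamClone(source);

copy1.pipe(fs.createWriteStream('out1.txt'));
copy2.pipe(fs.createWriteStream('out2.txt'));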
For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough;

var a = new PassThrough();
var b1 = new PassThrough();
var b2 = new PassThrough();

a.pipe(b1);
a.pipe(b2);

b1.on('data', function(data) {
  console.log('b1:', data.toString());
});
b2.on('data', function(data) {
  console.log('b2:', data.toString());
});

a.write('text');
I have a different solution: write to two streams simultaneously. Naturally, the time to write will be the sum of the two times, but I use it to respond to a download request where I want to keep a copy of the downloaded file on my server (actually, I use an S3 backup, so I cache the most used files locally to avoid multiple file transfers).
/**
 * A utility class made to write to a file while answering a file download request
 */
class TwoOutputStreams {
  constructor(streamOne, streamTwo) {
    this.streamOne = streamOne
    this.streamTwo = streamTwo
  }

  setHeader(header, value) {
    if (this.streamOne.setHeader)
      this.streamOne.setHeader(header, value)
    if (this.streamTwo.setHeader)
      this.streamTwo.setHeader(header, value)
  }

  write(chunk) {
    this.streamOne.write(chunk)
    this.streamTwo.write(chunk)
  }

  end() {
    this.streamOne.end()
    this.streamTwo.end()
  }
}
You can then use this as a regular OutputStream
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a fileOutputStream.
If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream
const Promise = require('bluebird');
const concat = require('concat-stream');

const getBuffer = function(stream) {
  return new Promise(function(resolve, reject) {
    var gotBuffer = function(buffer) {
      resolve(buffer);
    };
    var concatStream = concat(gotBuffer);
    stream.on('error', reject);
    stream.pipe(concatStream);
  });
};
To create streams from the buffer you can use:
const { Readable } = require('stream');

const getBufferStream = function(buffer) {
  const stream = new Readable();
  stream.push(buffer);
  stream.push(null);
  return Promise.resolve(stream);
};
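Tying the two helpers together, a hypothetical usage might look like this (buffer the source once, then hand an independent Readable to each consumer):

const fs = require('fs');

getBuffer(fs.createReadStream('input.png'))
  .then(function (buffer) {
    // Each consumer gets its own stream built from the same buffered data.
    return Promise.all([getBufferStream(buffer), getBufferStream(buffer)]);
  })
  .then(function (streams) {
    streams[0].pipe(fs.createWriteStream('copy1.png'));
    streams[1].pipe(fs.createWriteStream('copy2.png'));
  });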
What about piping into two or more streams not at the same time?
For example:

var fs = require('fs');
var PassThrough = require('stream').PassThrough;

var mybinaryStream = stream.start(); // never-ending audio stream
var file1 = fs.createWriteStream('file1.wav', {encoding: 'binary'});
var file2 = fs.createWriteStream('file2.wav', {encoding: 'binary'});
var mypass = new PassThrough();

mybinaryStream.pipe(mypass);
mypass.pipe(file1);

setTimeout(function() {
  mypass.pipe(file2);
}, 2000);

The above code does not produce any errors, but file2 is empty.

Stream and transform a file in place with nodejs

I'd like to do something like:
var fs = require('fs');
var through = require('through');

var file = 'path/to/file.json';
var input = fs.createReadStream(file, 'utf8');
var output = fs.createWriteStream(file, 'utf8');

var buf = '';
input
  .pipe(through(function data(chunk) { buf += chunk; }, function end() {
    var data = JSON.parse(buf);
    // Do some transformation on the obj, and then...
    this.queue(JSON.stringify(data, null, ' '));
  }))
  .pipe(output);
But this fails because it's trying to read and write to the same destination. There are ways around it, like only piping to output from within the end callback above.
Is there a better way? By better, I mean uses less code or less memory. And yes, I'm aware that I could just do:
var fs = require('fs');
var file = 'path/to/file.json';
var str = fs.readFileSync(file, 'utf8');
var data = JSON.parse(str);
// Do some transformation on the obj, and then...
fs.writeFileSync(file, JSON.stringify(data, null, ' '), 'utf8');
There is no way your code will use less memory, because you need the whole file in order to parse it into a JavaScript object; in that respect, both versions of your code are equivalent memory-wise. If you can do some of the work without needing the full JSON object, check out JSONStream.
In your example, you should read the file, then parse and transform it, then write the result to a file; although you shouldn't use the synchronous version of the functions, see the end of this paragraph of the Node.js documentation:
In busy processes, the programmer is strongly encouraged to use the asynchronous versions of these calls. The synchronous versions will block the entire process until they complete--halting all connections.
Anyway, I don't think you can read from a file while you're overwriting it. See this particular answer to the same problem.
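For completeness, here is a sketch of the read-then-transform-then-write approach the answer recommends, using the asynchronous callbacks instead of the *Sync variants (same shape as the snippet in the question):

var fs = require('fs');
var file = 'path/to/file.json';

fs.readFile(file, 'utf8', function (err, str) {
  if (err) throw err;
  var data = JSON.parse(str);
  // Do some transformation on the obj, and then...
  fs.writeFile(file, JSON.stringify(data, null, ' '), 'utf8', function (err) {
    if (err) throw err;
  });
});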

Resources