Node Streaming, Writing, and Memory - node.js

I'm attempting to dynamically concatenate files prior to serving their content. The following very simplified code shows an approach:
var http = require('http');
var fs = require('fs');
var start = '<!doctype html><html lang="en"><head><script>';
var funcsA = fs.readFileSync('functionsA.js', 'utf8');
var funcsB = fs.readFileSync('functionsB.js', 'utf8');
var funcsC = fs.readFileSync('functionsC.js', 'utf8');
var finish = '</script></head><body>some stuff here</body></html>';
var output = start + funcsA + funcsB + funcsC + finish;
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end(output);
}).listen(9000);
In reality, how I concatenate might depend on clues from the userAgent. My markup and scripts could be several hundred kilobytes combined.
I like this approach because no file-system I/O happens inside createServer. I seem to have read somewhere that this res.write(...) / res.end(...) approach is not as efficient or low-overhead as streaming the data with fs.createReadStream. I seem to recall this had something to do with what happens when the client cannot receive data as fast as Node can send it (backpressure?). We seem to be able to create a readable stream from a file, but not from memory. Is it possible to do what I have coded above with a streaming approach, with the file I/O still happening up front, outside of the createServer callback?
Or, on the other hand, are my concerns not that critical, and is the approach above really no less efficient than a streaming approach?
Thanks.

res.write(start)
var A = fs.createReadStream('functionsA.js')
var B = fs.createReadStream('functionsB.js')
var C = fs.createReadStream('functionsC.js')
A.pipe(res, {
  end: false
})
A.on('end', function () {
  B.pipe(res, {
    end: false
  })
})
B.on('end', function () {
  C.pipe(res, {
    end: false
  })
})
C.on('end', function () {
  res.write(finish)
  res.end()
})

Defining the streams ahead of time (outside the createServer callback) won't typically work: a readable stream can only be consumed once, so each request needs its own fresh streams (see the related question below).
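For what it's worth, here is a minimal sketch of the streaming version with the read streams created per request (it assumes the same functionsA/B/C.js files and start/finish strings as above):

var http = require('http');
var fs = require('fs');

var start = '<!doctype html><html lang="en"><head><script>';
var finish = '</script></head><body>some stuff here</body></html>';

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.write(start);

  // A readable stream is single-use, so create fresh ones on every request
  var files = ['functionsA.js', 'functionsB.js', 'functionsC.js'];

  (function pipeNext(i) {
    if (i === files.length) {
      res.write(finish);
      res.end();
      return;
    }
    var stream = fs.createReadStream(files[i]);
    stream.pipe(res, { end: false });
    stream.on('end', function () { pipeNext(i + 1); });
  })(0);
}).listen(9000);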

Related

Piping readstream into response makes it one-time-use

Right now I'm trying to use Node read streams and stream transforms to edit my HTML data before sending it to the client. Yes, I'm aware templating engines exist; this is the method I'm working with right now, though. The code I'm using looks like this:
const express = require('express')
const app = express()
const port = 8080
const fs = require('fs')
const Transform = require("stream").Transform
const parser = new Transform()
const newLineStream = require("new-line")
parser._transform = function(data, encoding, done) {
  const str = data.toString().replace('</body>', `<script>var questions = ${JSON.stringify(require('./questions.json'))};</script></body>`)
  this.push(str)
  done()
}
app.get('/', (req, res) => {
  console.log('Homepage served')
  res.write('<!-- Begin stream -->\n');
  let stream = fs.createReadStream('./index.html')
  stream.pipe(newLineStream())
    .pipe(parser)
    .on('end', () => {
      res.write('\n<!-- End stream -->')
    }).pipe(res)
})
This is just a rough draft to try and get this method working. Right now, the issue I'm running into is that the first time I load my webpage everything works fine, but every time after that the html I'm given looks like this:
<!-- Begin stream -->
<html>
<head></head>
<body></body>
</html>
It seems like the stream is getting hung up in the middle, because most of the data is never transmitted and the stream is never ended. Another thing I notice in the console is a warning after 10 reloads that there are 11 event listeners on [Transform] and there's a possible memory leak. I've tried clearing all event listeners on both the readstream and the parser once the readstream ends, but that didn't solve anything. Is there a way to change my code to fix this issue?
Original StackOverflow post that this method came from
The issue here was using a single parser and ._transform() instead of creating a new transform every time the app received a request. Putting const parser = ... and parser._transform = ... inside the app.get() fixed everything.
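A sketch of the handler with that fix applied (same paths and helpers as the original code; the Transform is simply created per request):

app.get('/', (req, res) => {
  // Create a fresh Transform for every request so listeners don't accumulate
  const parser = new Transform()
  parser._transform = function(data, encoding, done) {
    const str = data.toString().replace('</body>', `<script>var questions = ${JSON.stringify(require('./questions.json'))};</script></body>`)
    this.push(str)
    done()
  }
  res.write('<!-- Begin stream -->\n')
  const stream = fs.createReadStream('./index.html')
  stream.pipe(newLineStream())
    .pipe(parser)
    .on('end', () => {
      res.write('\n<!-- End stream -->')
    }).pipe(res)
})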

How to keep the request open to use the write() method after a long time

I need to keep the connection open so that after one track finishes I can write the next one. The problem is that, the way I did it, the stream simply stops after the first song.
How can I keep the connection open and play the next songs too?
const fs = require('fs');
const express = require('express');
const app = express();
const server = require('http').createServer(app)
const getMP3Duration = require('get-mp3-duration')
let sounds = ['61880.mp3', '62026.mp3', '62041.mp3', '62090.mp3', '62257.mp3', '60763.mp3']
app.get('/current', async (req, res) => {
  let readStream = fs.createReadStream('sounds/61068.mp3')
  let duration = await getMP3Duration(fs.readFileSync('sounds/61068.mp3'))
  let pipe = readStream.pipe(res, {end: false})
  async function put(){
    let file_path = 'sounds/'+sounds[Math.random() * sounds.length-1]
    duration = await getMP3Duration(fs.readFileSync(file_path))
    readStream = fs.createReadStream(file_path)
    readStream.on('data', chunk => {
      console.log(chunk)
      pipe.write(chunk)
    })
    console.log('Current Sound: ', file_path)
    setTimeout(put, duration)
  }
  setTimeout(put, duration)
})
server.listen(3005, async function () {
  console.log('Server is running on port 3005...')
});
You should use a library, or look at one's source code and see what it does.
A good one is:
https://github.com/obastemur/mediaserver
TIP:
Always start your research by learning from other projects (when possible, i.e. when you are not reinventing the wheel ;)). You are not the first to do this or to hit this problem :)
A quick search with the phrase "nodejs stream mp3 github" gave me a few directions.
Good luck!
Express works by returning a single response to a single request. As soon as the response has been sent, a new request is needed to trigger a new response.
In your case however you want to keep on generating new responses out of a single request.
Two approaches can be used to solve your problem:
Change the way you create your response to satisfy your use-case.
Use a real-time communication framework (WebSockets). The best and simplest that comes to mind is socket.io.
Adapting express
The solution here is to follow this procedure:
Request on endpoint /current comes in
The audio sequence is prepared
The stream of the entire sequence is returned
So your handler would look like this:
const fs = require('fs');
const express = require('express');
const app = express();
const server = require('http').createServer(app);
// Import the PassThrough class to concatenate the streams
const { PassThrough } = require('stream');
// The array of sounds now contain all the sounds
const sounds = ['61068.mp3','61880.mp3', '62026.mp3', '62041.mp3', '62090.mp3', '62257.mp3', '60763.mp3'];
// function which concatenates an array of streams
const concatStreams = streamArray => {
  let pass = new PassThrough();
  let waiting = streamArray.length;
  streamArray.forEach(soundStream => {
    pass = soundStream.pipe(pass, {end: false});
    soundStream.once('end', () => --waiting === 0 && pass.emit('end'));
  });
  return pass;
};
// function which returns a shuffled array
const shuffle = (array) => {
  const a = [...array]; // shallow copy of the array
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
};
app.get('/current', (req, res) => {
  // Start by shuffling the array
  const shuffledSounds = shuffle(sounds);
  // Create a readable stream for each sound
  const streams = shuffledSounds.map(sound => fs.createReadStream(`sounds/${sound}`));
  // Concatenate all the streams into a single stream
  const readStream = concatStreams(streams);
  // Pipe the concatenated stream to the response (which goes to the client);
  // the response is automatically ended when the stream emits the "end" event
  readStream.pipe(res);
});
Notice that the function does not require the async keyword any longer. The process is still asynchronous but the coding is emitter based instead of promise based.
If you want to loop the sounds you can create additional steps of shuffling/mapping to stream/concatenation.
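For example, a rough sketch of a looping handler (the /loop route is hypothetical; it reuses the shuffle and concatStreams helpers above and keeps the response open with end: false):

const playShuffled = (res) => {
  const streams = shuffle(sounds).map(sound => fs.createReadStream(`sounds/${sound}`));
  const readStream = concatStreams(streams);
  readStream.pipe(res, { end: false });
  // when the whole shuffled sequence has ended, start another one
  readStream.on('end', () => playShuffled(res));
};

app.get('/loop', (req, res) => playShuffled(res));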
I did not include the socket.io alternative so as to keep it simple.
Final Solution After a Few Edits:
I suspect your main issue is with your random array element generator. You need to wrap what you have with Math.floor to round down to ensure you end up with a whole number:
sounds[Math.floor(Math.random() * sounds.length)]
Also, readable.pipe() returns the destination stream, so what you're doing makes sense. However, you might get unexpected results from calling on('data') on your readable after you've already piped from it; the Node.js streams docs mention this. I tested your code on my local machine and it doesn't seem to be an issue, but it might make sense to change it so you don't have problems in the future.
Choose One API Style
The Readable stream API evolved across multiple Node.js versions and provides multiple methods of consuming stream data. In general, developers should choose one of the methods of consuming data and should never use multiple methods to consume data from a single stream. Specifically, using a combination of on('data'), on('readable'), pipe(), or async iterators could lead to unintuitive behavior.
Instead of calling on('data') and res.write, I would just pipe from the readStream into res again. Also, unless you really need the duration, I would pull that library out and just use the read stream's 'end' event to make additional calls to put(). This works because you're passing {end: false} when piping, which disables the default behaviour of ending the write stream and leaves the response open. The 'end' event still gets emitted on the readable, though, so you can use it as a marker for when the readable has finished piping. Here's the refactored code:
const fs = require('fs');
const express = require('express');
const app = express();
const server = require('http').createServer(app)
//const getMP3Duration = require('get-mp3-duration') no longer needed
let sounds = ['61880.mp3', '62026.mp3', '62041.mp3', '62090.mp3', '62257.mp3', '60763.mp3']
app.get('/current', (req, res) => {
  let readStream = fs.createReadStream('sounds/61068.mp3')
  let pipe = readStream.pipe(res, {end: false})
  function put(){
    let file_path = 'sounds/'+sounds[Math.floor(Math.random() * sounds.length)]
    readStream = fs.createReadStream(file_path)
    // you may also be able to do readStream.pipe(res, {end: false})
    readStream.pipe(pipe, {end: false})
    console.log('Current Sound: ', file_path)
    readStream.on('end', () => {
      put()
    });
  }
  readStream.on('end', () => {
    put()
  });
})
server.listen(3005, function () {
  console.log('Server is running on port 3005...')
});

Is it possible to register multiple listeners to a child process's stdout data event? [duplicate]

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again. This doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');
var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify',['-']);
inputStream.pipe(identify.stdin);
var chunks = [];
identify.stdout.on('data',function(chunk) {
  chunks.push(chunk);
});
identify.stdout.on('end',function() {
  var size = getSize(Buffer.concat(chunks)); //width
  var convert = spawn('convert',['-','-scale',size * 0.5,'png:-']);
  inputStream.pipe(convert.stdin);
  convert.stdout.pipe(fs.createWriteStream('half.png'));
});
function getSize(buffer){
  return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this:
Error: You cannot pipe after data has been emitted from the response.
and changing the inputStream to an fs.createReadStream yields the same issue, of course.
I don't want to write to a file; I want to reuse, in some way, the stream that request produces (or any other readable stream, for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?
You have to create a duplicate of the stream by piping it into two streams. You can create a simple copy with a PassThrough stream; it simply passes the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;
const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();
a.stdout.pipe(b);
a.stdout.pipe(c);
let count = 0;
b.on('data', function (chunk) {
  count += chunk.length;
});
b.on('end', function () {
  console.log(count);
  c.pipe(process.stdout);
});
Output:
8
hi user
The first answer only works if streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses a library based on Stream2 streams, Streamz, and Promises to synchronize async streams via a callback. Using the familiar example from the first answer:
var spawn = require('child_process').spawn;
var pass = require('stream').PassThrough;
var streamz = require('streamz').PassThrough;
var Promise = require('bluebird');
var a = spawn('echo', ['hi user']);
var b = new pass;
var c = new pass;
a.stdout.pipe(streamz(combineStreamOperations));
function combineStreamOperations(data, next){
  Promise.join(b, c, function(b, c){ //perform n operations on the same data
    next(); //request more
  });
}
var count = 0;
b.on('data', function(chunk) { count += chunk.length; });
b.on('end', function() { console.log(count); c.pipe(process.stdout); });
You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need
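A usage sketch along the lines of the package's README (the constructor-wrapping-the-source API is assumed here, so double-check it against the package docs):

const fs = require('fs');
const ReadableStreamClone = require('readable-stream-clone');

const source = fs.createReadStream('input.txt');
const copy1 = new ReadableStreamClone(source); // each clone is an independent readable
const copy2 = new ReadableStreamClone(source);

copy1.pipe(fs.createWriteStream('out1.txt'));
copy2.pipe(fs.createWriteStream('out2.txt'));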
For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough
var a = new PassThrough()
var b1 = new PassThrough()
var b2 = new PassThrough()
a.pipe(b1)
a.pipe(b2)
b1.on('data', function(data) {
  console.log('b1:', data.toString())
})
b2.on('data', function(data) {
  console.log('b2:', data.toString())
})
a.write('text')
I have a different solution that writes to two streams simultaneously. Naturally, the time to write is the sum of the two write times, but I use it to respond to a download request where I want to keep a copy of the downloaded file on my server (actually I use an S3 backup, so I cache the most-used files locally to avoid multiple file transfers).
/**
* A utility class made to write to a file while answering a file download request
*/
class TwoOutputStreams {
  constructor(streamOne, streamTwo) {
    this.streamOne = streamOne
    this.streamTwo = streamTwo
  }
  setHeader(header, value) {
    if (this.streamOne.setHeader)
      this.streamOne.setHeader(header, value)
    if (this.streamTwo.setHeader)
      this.streamTwo.setHeader(header, value)
  }
  write(chunk) {
    this.streamOne.write(chunk)
    this.streamTwo.write(chunk)
  }
  end() {
    this.streamOne.end()
    this.streamTwo.end()
  }
}
You can then use this as a regular OutputStream
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a file output stream.
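For instance, a hypothetical Express download handler might look like this (the route, cache path and getSourceStream helper are placeholders, not part of the original answer):

const fs = require('fs');

app.get('/download/:name', (req, res) => {
  // Write to a local cache file and to the HTTP response at the same time
  const fileOut = fs.createWriteStream(`cache/${req.params.name}`);
  const out = new TwoOutputStreams(fileOut, res);
  out.setHeader('Content-Type', 'application/octet-stream');

  const source = getSourceStream(req.params.name); // placeholder for the S3/backend download stream
  source.on('data', chunk => out.write(chunk));
  source.on('end', () => out.end());
});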
If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream
const Promise = require('bluebird');
const concat = require('concat-stream');
const getBuffer = function(stream){
  return new Promise(function(resolve, reject){
    var gotBuffer = function(buffer){
      resolve(buffer);
    }
    var concatStream = concat(gotBuffer);
    stream.on('error', reject);
    stream.pipe(concatStream);
  });
}
To create streams from the buffer you can use:
const { Readable } = require('stream');
const getBufferStream = function(buffer){
  const stream = new Readable();
  stream.push(buffer);
  stream.push(null);
  return Promise.resolve(stream);
}
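Putting the two helpers together, duplicating a stream might look like this (the file names are illustrative):

const fs = require('fs');

getBuffer(fs.createReadStream('input.txt'))
  .then(function (buffer) {
    // hand out as many independent streams of the buffered content as needed
    return Promise.all([getBufferStream(buffer), getBufferStream(buffer)]);
  })
  .then(function (streams) {
    streams[0].pipe(process.stdout);
    streams[1].pipe(fs.createWriteStream('copy.txt'));
  });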
What about piping into two or more streams, but not at the same time?
For example:
var fs = require('fs');
var PassThrough = require('stream').PassThrough;
var mybinaryStream = stream.start(); // never-ending audio stream (source left abstract here)
var file1 = fs.createWriteStream('file1.wav', {encoding: 'binary'})
var file2 = fs.createWriteStream('file2.wav', {encoding: 'binary'})
var mypass = new PassThrough()
mybinaryStream.pipe(mypass)
mypass.pipe(file1)
setTimeout(function(){
  mypass.pipe(file2);
}, 2000)
The above code does not produce any errors, but file2 is empty.

JSONStream handle one data with different parser

I'm using JSONStream to parse data from a server. The data can be either {"error": "SomeError"} or {"articles":[{"id": 123}]}.
My code goes like this:
var request = require('request');
var JSONStream = require('JSONStream');
var articleIDParser = JSONStream.parse(['articles', true, 'id']);
var errorParser = JSONStream.parse(['error']);
request({url: 'http://XXX/articles.json'})
  .pipe(articleIDParser).pipe(errorParser);
errorParser.on('data', function(data) {
  console.log(data);
});
articleIDParser.on('data', someFuncHere);
But no luck: the second parser does not work even when the server returns an error.
Am I using the pipe function wrong, or is it JSONStream?
Thanks in advance.
Well, I used the following approach to solve the problem:
var dest = request({url: 'http://XXX/articles.json'});
dest.pipe(articleIDParser);
dest.pipe(errorParser);
Piping the source into each parser separately gives both parsers the raw JSON, whereas chaining them feeds the second parser the first parser's output. See the piping section of the Node.js Stream documentation for the details.
The callback of the 'end' event doesn't receive a data parameter; listen for the 'data' event instead. In the case of piping, you can listen for the 'pipe' event on the destination.
var request, JSONStream, articleIDParser, errorParser;
request = require('request');
JSONStream = require('JSONStream');
articleIDParser = JSONStream.parse(['articles', true, 'id']);
errorParser = JSONStream.parse(['error']);
articleIDParser.on('pipe', function (src) {
  // some code
});
errorParser.on('pipe', function (src) {
  // some code
});
var source = request({url: 'http://XXX/articles.json'});
source.pipe(articleIDParser);
source.pipe(errorParser);
Note: JSONStream.getParserStream is the less ambiguous name; with parse, one might think you're already parsing when you're really just getting the parser (a writable stream). If you still have issues, please give more information (code) about JSONStream. The Stream module is still marked as unstable, by the way.

HTTP request stream not firing readable when reading fixed sizes

I am trying to work with the new Streams API in Node.js, but I'm having trouble when specifying a fixed read buffer size.
var http = require('http');
var req = http.get('http://143.226.75.100/waug_mp3_128k', function (res) {
  res.on('readable', function () {
    var receiveBuffer = res.read(1024);
    console.log(receiveBuffer.length);
  });
});
This code will receive a few buffers and then exit. However, if I add this line after the console.log() line:
res.read(0);
... all is well again. My program continues to stream as predicted.
Why is this happening? How can I fix it?
It's explained here.
As far as I understand it, by reading only 1024 bytes with each readable event, Node is left to assume that you're not interested in the rest of the data that's in the stream buffers, and discards it. Issuing the read(0) (in the same event loop iteration) 'resets' this behaviour. I'm not sure why the process exits after reading a couple of 1024-byte buffers though; I can recreate it, but I don't understand it yet :)
If you don't have a specific reason to use the 1024-byte reads, just read the entire buffer for each event:
var receiveBuffer = res.read();
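A fuller sketch of the non-flowing pattern is to drain whatever is buffered on each 'readable' event, which is the loop form shown in the Node.js stream docs:

res.on('readable', function () {
  var chunk;
  // read() returns null once the internal buffer is empty
  while ((chunk = res.read()) !== null) {
    console.log('chunk:', chunk.length);
  }
});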
Or, instead of using non-flowing (paused) mode, switch to flowing mode by using the 'data'/'end' events:
var http = require('http');
var req = http.get('http://143.226.75.100/waug_mp3_128k', function (res) {
  var chunks = [];
  res.on('data', function(chunk) {
    chunks.push(chunk);
    console.log('chunk:', chunk.length);
  });
  res.on('end', function() {
    var result = Buffer.concat(chunks);
    console.log('final result:', result.length);
  });
});
