Having trouble streaming response to client using expressjs - node.js

I am having a really hard time wrapping my head around how to stream data back to my client when using Node.js/Express.
I am grabbing a lot of data from my database in chunks, and I would like to stream it back to the client as I get it, so that I do not have to hold the entire dataset in memory as a JSON object before sending it.
I would like the data to come back as a file download, i.e. I want the browser to ask my users what to do with the file. Previously I was creating a file system write stream, streaming my data into a temporary file, and then sending that file to the client when done. I would like to eliminate the middle man (the temporary file on the file system) and just stream the data to the client.
app.get('/api/export', function (req, res, next) {
  var notDone = true;
  while (notDone) {
    var partialData = ...; // grab partial data from database (maybe first 1000 records)
    // stream this partial data as a string to res???
    if (checkIfDone) notDone = false;
  }
});
I can call res.write("some string data") and then call res.end() when I am done. However, I am not 100% sure this actually streams the response to the client as I write. It seems like Express is storing all the data until I call end() and then sending the response. Is that true?
What is the proper way to stream string chunks of data to a response using Express?

The response object is already a writable stream. Express handles sending chunked data automatically, so you don't need to do anything extra beyond:
response.send(data)
You may also want to check out the built-in pipe method, http://nodejs.org/api/stream.html#stream_event_pipe.
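For instance, if the data source can be exposed as a readable stream, a minimal sketch looks like the following (the './export.csv' path is only a placeholder for whatever readable you have):

var fs = require('fs');

app.get('/api/export', function (req, res) {
  // pipe() handles backpressure for you and ends the response
  // when the source stream ends
  fs.createReadStream('./export.csv').pipe(res);
});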

You can do this by setting the appropriate headers and then just writing to the response object. Example:
res.writeHead(200, {
  'Content-Type': 'text/plain',
  'Content-Disposition': contentDisposition('foo.data')
});

var c = 0;
var interval = setInterval(function() {
  res.write(JSON.stringify({ foo: Math.random() * 100, count: ++c }) + '\n');
  if (c === 10) {
    clearInterval(interval);
    res.end();
  }
}, 1000);
// extracted from Express, used by `res.download()`
var basename = require('path').basename;

function contentDisposition(filename) {
  var ret = 'attachment';
  if (filename) {
    filename = basename(filename);
    // if filename contains non-ascii characters, add a utf-8 version ala RFC 5987
    ret = /[^\040-\176]/.test(filename)
      ? 'attachment; filename="' + encodeURI(filename) + '"; filename*=UTF-8\'\'' + encodeURI(filename)
      : 'attachment; filename="' + filename + '"';
  }
  return ret;
}
Also, Express/node does not buffer data written to a socket unless the socket is paused (either explicitly or implicitly due to backpressure). Data buffered by node while in this paused state may or may not be combined with other data chunks that are already buffered. You can check the return value of res.write() to determine whether you should continue writing to the socket: if it returns false, listen for the 'drain' event and then continue writing.
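For example, here is a minimal sketch of honoring that return value; fetchNextBatch() is a hypothetical helper standing in for your paginated database query:

app.get('/api/export', function (req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/plain',
    'Content-Disposition': contentDisposition('export.txt')
  });

  var offset = 0;

  function writeNextBatch() {
    // hypothetical: fetch the next 1000 records starting at `offset`
    fetchNextBatch(offset, 1000, function (err, rows) {
      if (err || rows.length === 0) return res.end();
      offset += rows.length;

      var ok = res.write(rows.map(function (row) {
        return JSON.stringify(row);
      }).join('\n') + '\n');

      if (ok) {
        writeNextBatch();
      } else {
        // the socket buffer is full; wait for it to drain before writing more
        res.once('drain', writeNextBatch);
      }
    });
  }

  writeNextBatch();
});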

Related

How do I stream a chunked file using Node.js Readable?

I have a 400Mb file split into chunks that are ~1Mb each.
Each chunk is a MongoDB document:
{
  name: 'stuff.zip',
  index: 15,
  buffer: Binary('......'),
  totalChunks: 400
}
I am fetching each chunk from my database and then streaming it to the client.
Every time I get a chunk from the DB, I push it to the readableStream, which is piped to the client.
Here is the code:
import { Readable } from 'stream'

const name = 'stuff.zip'
const contentType = 'application/zip'

app.get('/api/download-stuff', async (req, res) => {
  res.set('Content-Type', contentType)
  res.set('Content-Disposition', `attachment; filename=${name}`)
  res.attachment(name)

  // get `totalChunks` from a random chunk
  let { totalChunks } = await ChunkModel.findOne({ name }).select('totalChunks')
  let index = 0

  const readableStream = new Readable({
    async read() {
      if (index < totalChunks) {
        let { buffer } = await ChunkModel.findOne({ name, index }).select('buffer')
        let canContinue = readableStream.push(buffer)
        console.log(`pushed chunk ${index}/${totalChunks}`)
        index++
        // sometimes it logs false,
        // which means I should be waiting before pushing more,
        // but I don't know how
        console.log('canContinue = ', canContinue)
      } else {
        readableStream.push(null)
        readableStream.destroy()
        console.log(`all ${totalChunks} chunks streamed to the client`)
      }
    }
  })

  readableStream.pipe(res)
})
The code works.
But I'm wondering whether I risk memory overflows on my server, especially when there are many concurrent requests for the same file or when a file has many chunks.
Question: my code is not waiting for readableStream to finish reading the chunk that was just pushed to it before pushing the next one. I thought it was, and that is why I'm using read() {..} in this probably wrong way. So how should I wait for each chunk to be pushed, read, streamed to the client and cleared from my server's memory before I push the next one in?
I have created this sandbox in case it helps anyone
In general, when the readable interface is implemented correctly (i.e., the backpressure signal is respected), it will prevent the code from overflowing memory regardless of source size.
When implemented according to the API spec, the readable itself does not keep references to data that has finished passing through the stream. The memory requirement of a readable's buffer is adjusted by specifying a highWaterMark.
In this case, the snippet does not conform to the readable interface. It violates the following two concepts:
No data shall be pushed to the readable's buffer unless read() has been called. Currently, this implementation pushes data from the DB immediately, so the readable buffer starts to fill before the sink has begun to consume data.
The readable's push() method returns a boolean flag. When the flag is false, the implementation must wait for read() to be called again before pushing additional data. If the flag is ignored, the buffer will overflow past the highWaterMark.
Note that ignoring these core criteria of Readables circumvents the backpressure logic.
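To make that concrete for the snippet above, a minimal sketch (reusing name, totalChunks, ChunkModel and res from the question) that pushes at most one chunk per read() call; because Node will not ask for more data until the buffer has room, the return value of push() can then safely be ignored:

const { Readable } = require('stream')

let index = 0
const readableStream = new Readable({
  async read() {
    if (index >= totalChunks) {
      this.push(null) // signal end of stream
      return
    }
    try {
      // fetch exactly one chunk, push it, and stop:
      // read() will be called again only when more data is wanted
      const { buffer } = await ChunkModel.findOne({ name, index }).select('buffer')
      index++
      this.push(buffer)
    } catch (err) {
      this.destroy(err)
    }
  }
})

readableStream.pipe(res)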
An alternative implementation, if this is a Mongoose query:
const stream = require("stream");

app.get('/api/download-stuff', async (req, res) => {
  // ... truncated handler

  // A helper stream to relay data from the pipeline to the response body
  const passThrough = new stream.PassThrough({ objectMode: false });

  // Pipe data using pipeline() to simplify handling stream errors
  stream.pipeline(
    // Create a cursor that fetches all relevant documents using a single query
    ChunkModel.find().limit(chunksLength).select("buffer").sort({ index: 1 }).lean().cursor(),
    // Cherry-pick the `buffer` property
    new stream.Transform({
      objectMode: true,
      transform: ({ buffer }, encoding, next) => {
        next(null, buffer);
      }
    }),
    // Write the retrieved documents to the helper stream
    passThrough,
    error => {
      if (error) {
        // Log and handle the error. At this point the HTTP headers have probably
        // already been sent, so it is too late to return an HTTP 500
      }
    }
  );

  passThrough.pipe(res);
});
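Two things keep memory bounded here: the Mongoose cursor pulls documents one at a time instead of loading the whole result set, and pipeline() connects the cursor, the transform and the pass-through so that backpressure from the HTTP response should propagate all the way back to the cursor, while any error tears down every stream in the chain.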

Node: How can I use pipe and change one file from a multipart

I have an HTTP service that needs to redirect a request. I cannot buffer the whole body, because I deal with big multipart files and that overwhelms RAM or disk (see How do Node.js Streams work?).
Now I am using pipes and it works; the code is something like this:
var Req = getReq(response);
request.pipe(Req);
The only shortcoming is that the multipart body I resend through the pipe contains one JSON file in which a few fields need to be changed.
Can I still use a pipe and change one file inside the piped multipart?
You can do this using a Transform Stream.
var Req = getReq(response);
var transformStream = new TransformStream();

// the boundary key for the multipart is in headers['content-type'];
// if this isn't set, the multipart request would be invalid
Req.headers['content-type'] = request.headers['content-type'];

// pipe from the request to our transform stream, and then to Req;
// it will pipe chunks, so it won't use too much RAM,
// however, you will have to keep the JSON you want to modify in memory
request.pipe(transformStream).pipe(Req);
Transform Stream code:
var Transform = require('stream').Transform,
    util = require('util');

var TransformStream = function() {
  Transform.call(this, { objectMode: true });
};
util.inherits(TransformStream, Transform);

TransformStream.prototype._transform = function(chunk, encoding, callback) {
  // here should be the "modify" logic;
  // this will push all chunks as they come, leaving the multipart unchanged
  // there's no limitation on what you can push:
  // you can push nothing, or you can push an entire file
  this.push(chunk);
  callback();
};

TransformStream.prototype._flush = function(callback) {
  // you can also push in _flush
  // this.push( SOMETHING );
  callback();
};
In the _transform function, your logic should be something like this:
If, in the current chunk, the JSON you want to modify begins
<SOME_DATA_BEFORE_JSON> <MY_JSON_START>
then this.push(SOME_DATA_BEFORE_JSON); and keep MY_JSON_START in a local var
While your JSON hasn't ended, append the chunk to your local var
If, in the current chunk, the JSON ends:
<JSON_END> <SOME_DATA_AFTER_JSON>
then add JSON_END to your var, do whatever changes you want,
and push the changes:
this.push(local_var);
this.push(SOME_DATA_AFTER_JSON);
If current chunk has nothing of your JSON, simply push the chunk
this.push(chunk);
Other than that, you may want to read the multipart format.
SOME_DATA_BEFORE_JSON from above will be:
--frontier
Content-Type: text/plain
<JSON_START>
Other than Content-Type, it may contain the filename, encoding, etc.
Something to keep in mind: chunks may end anywhere (even in the middle of the frontier).
The parsing could get quite tricky; I would search for the boundary key (frontier), and then check if the JSON starts after that. There would be two cases:
chunk: <SOME_DATA> --frontier <FILE METADATA> <FILE_DATA>
chunk 1: <SOME_DATA> --fron
chunk 2: tier <FILE METADATA> <FILE_DATA>
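As a rough sketch of dealing with that split (only an illustration, not a full multipart parser; `boundary` is assumed to have been pulled out of the content-type header), you can carry the tail of each chunk over to the next _transform call so a marker that straddles two chunks is still found:

var Transform = require('stream').Transform;
var util = require('util');

var BoundaryScanner = function(boundary) {
  Transform.call(this);
  this.marker = Buffer.from('--' + boundary);
  this.carry = Buffer.alloc(0);
};
util.inherits(BoundaryScanner, Transform);

BoundaryScanner.prototype._transform = function(chunk, encoding, callback) {
  // search in carry + chunk so a marker split across two chunks is still detected
  var window = Buffer.concat([this.carry, chunk]);
  if (window.indexOf(this.marker) !== -1) {
    // a part boundary was seen; this is where the "collect / modify the JSON part"
    // logic from the steps above would kick in
  }
  // keep the last marker.length - 1 bytes around for the next call
  this.carry = window.slice(Math.max(0, window.length - (this.marker.length - 1)));
  this.push(chunk); // this sketch passes everything through unchanged
  callback();
};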
Hope this helps!

request.on in http.createServer(function(request,response) {});

var http = require('http');
var map = require('through2-map');

var uc = map(function(ch) {
  return ch.toString().toUpperCase();
});

var server = http.createServer(function(request, response) {
  request.on('data', function(chunk) {
    if (request.method == 'POST') {
      // change the data from request to uppercase letters and
      // pipe to response.
    }
  });
});

server.listen(8000);
I have two questions about the code above. First, I read the documentation for request; it says that request is an instance of IncomingMessage, which implements the Readable Stream interface. However, I couldn't find the .on method in the Stream documentation, so I don't know what chunk in the callback passed to request.on is. Secondly, I want to manipulate the data from the request and pipe it to the response. Should I pipe from chunk or from request? Thank you for your consideration!
is chunk a stream?
Nope. The stream is the flow through which the chunks of the whole data are sent.
A simple example: if you read a 1GB file, a stream will read it in chunks of, say, 10k; each chunk will go through your stream, from beginning to end, in the right order.
I used a file as an example, but a socket, a request, or any other stream is based on the same idea.
Also, whenever someone sends a request to this server, would that entire thing be a chunk?
In the particular case of HTTP requests, only the request body is a stream. It can be the posted files/data, or the body of the response. Headers are treated as objects and applied to the request before the body is written to the socket.
A small example to help you with some concrete code:
var through2 = require('through2');
var Readable = require('stream').Readable;

var s1 = through2(function transform(chunk, enc, cb) {
  console.log("s1 chunk %s", chunk.toString());
  cb(null, chunk.toString() + chunk.toString());
});

var s2 = through2(function transform(chunk, enc, cb) {
  console.log("s2 chunk %s", chunk.toString());
  cb(null, chunk);
});

s2.on('data', function (data) {
  console.log("s2 data %s", data.toString());
});

s1.on('end', function () {
  console.log("s1 end");
});

s2.on('end', function () {
  console.log("s2 end");
});

var rs = new Readable;
rs.push('beep '); // this is a chunk
rs.push('boop');  // this is a chunk
rs.push(null);    // this is a signal to end the stream

rs.on('end', function () {
  console.log("rs end");
});

console.log(
  ".pipe always returns the destination stream: %s", rs.pipe(s1) === s1
);

s1.pipe(s2);
I would also suggest reading more:
https://github.com/substack/stream-handbook
http://maxogden.com/node-streams.html
https://github.com/maxogden/mississippi
All streams are instances of EventEmitter (docs); that is where the .on method comes from.
Regarding the second question, you MUST pipe from the Stream object (request in this case). The "data" event emits data as a Buffer or a String (the "chunk" argument in the event listener), not a stream.
Manipulating streams is usually done by implementing a Transform stream (docs). There are many npm packages that make this simpler (like through2-map), but in reality they just produce Transform streams.
Consider the following:
var http = require('http');
var map = require('through2-map');

// Transform stream to uppercase
var uc = map(function(ch) {
  return ch.toString().toUpperCase();
});

var server = http.createServer(function(request, response) {
  // Pipe from the request to our transform stream
  request
    .pipe(uc)
    // pipe from the transform stream to the response
    .pipe(response);
});

server.listen(8000);
You can test by running curl:
$ curl -X POST -d 'foo=bar' http://localhost:8000
# logs FOO=BAR
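One caveat with the snippet above: `uc` is created once and shared by every request, and pipe() by default ends the destination when the source ends, so after the first request completes the transform can no longer be written to. A sketch that creates the transform per request avoids that:

var http = require('http');
var map = require('through2-map');

var server = http.createServer(function(request, response) {
  // a fresh transform per request, so ending one request's pipe
  // does not close the stream used by the next request
  var uc = map(function(ch) {
    return ch.toString().toUpperCase();
  });

  request.pipe(uc).pipe(response);
});

server.listen(8000);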

How to correctly calculate the number of bytes of a node.js stream that have been processed?

I have a stream I'm sending over the wire that takes a bit of time to fully send, so I want to display how far along it is on the fly. I know you can listen for the 'data' event on streams, but in newer versions of node, doing so also puts the stream into "flowing mode". I want to make sure I'm doing this correctly.
Currently I have the following stuff:
deploymentPackageStream.pause() // to prevent it from entering "flowing mode"

var bytesSent = 0
deploymentPackageStream.on('data', function(data) {
  bytesSent += data.length
  process.stdout.write('\r ')
  process.stdout.write('\r' + (bytesSent / 1000) + 'kb sent')
})
deploymentPackageStream.resume()

// copy over the deployment package
execute(conn, 'cat > deploymentPackage.sh', deploymentPackageStream).wait()
This gives me the right bytesSent output, but the resulting package seems to be missing some data off the front. If I put the 'resume' line after executing the copy line (the last line), it doesn't copy anything. If I don't resume, it also doesn't copy anything. What's going on and how do I do this properly without disrupting the stream and without entering flowing mode (I want back pressure)?
I should mention I'm still using node v0.10.x.
Alright, I made something that essentially is a passthrough, but calls a callback with data as it comes in:
var Readable = require('stream').Readable
var util = require('util')

// creates a stream that can view all the data in a stream and passes the data through
// parameters:
//   stream - the stream to peek at
//   callback - called when there's data sent from the passed stream
var StreamPeeker = exports.StreamPeeker = function(stream, callback) {
  Readable.call(this)
  this.stream = stream
  stream.on('readable', function() {
    var data = stream.read()
    if (data !== null) {
      if (!this.push(data)) stream.pause()
      callback(data)
    }
  }.bind(this))
  stream.on('end', function() {
    this.push(null)
  }.bind(this))
}
util.inherits(StreamPeeker, Readable)

StreamPeeker.prototype._read = function() {
  this.stream.resume()
}
If I understand streams properly, this should appropriately handle backpressure.
Using this, I can just count up data.length in the callback like this:
var peeker = new StreamPeeker(stream, function(data) {
  // use data.length
})
peeker.pipe(destination)
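On current versions of Node (rather than v0.10), a PassThrough stream gives you the same byte count with less code while pipe() still manages backpressure; a sketch, assuming the same `stream` and `destination` as above:

var PassThrough = require('stream').PassThrough;

var counter = new PassThrough();
var bytesSent = 0;

counter.on('data', function (data) {
  bytesSent += data.length;
  process.stdout.write('\r' + (bytesSent / 1000) + 'kb sent');
});

stream.pipe(counter).pipe(destination);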

Node.js - writing data to a writable stream

In my node application I'm writing data to a file using the write method on a stream created with createWriteStream. Now I need to find out whether the write for a particular stream is complete or not. How can I find that?
var fs = require('fs');

var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var result = stream.write(data);

writeToStream();

function writeToStream() {
  var result = stream.write(data + '\n');
  if (!result) {
    stream.once('drain', writeToStream);
  }
}
I need to call another method every time a write completes. How can I do this?
From the node.js WritableStream.write(...) documentation you can give the "write" method a callback that is called when the written data is flushed:
var fs = require('fs');

var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var data = "Hello, World!\n";
stream.write(data, function() {
  // Now the data has been written.
});
Note that you probably don't need to actually wait for each call to "write" to complete before queueing the next call. Even if the "write" method returns false you can still call subsequent writes and node will buffer the pending write requests into memory.
I am using maerics's answer along with error handling. The flag 'a' is used to open the file for appending; the file is created if it does not exist. There are other flags you can use.
var fs = require('fs');

// Create a writable stream & write the data to the stream with utf8 encoding
var writerStream = fs.createWriteStream('MockData/output.txt', {flags: 'a'})
  .on('finish', function() {
    console.log("Write Finish.");
  })
  .on('error', function(err) {
    console.log(err.stack);
  });

writerStream.write(outPutData, function() {
  // Now the data has been written.
  console.log("Write completed.");
});

// Mark the end of file
writerStream.end();
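Note that writable.end() also accepts an optional final chunk and a callback, and the 'finish' event only fires after end() has been called and all buffered data has been flushed to the underlying system, so it is a reliable place to know the whole file has been written.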
