HTTP - how to send multiple pre-cached gzipped chunks? - node.js

Let's say I have two individually gzipped HTML chunks in memory.
Can I send chunk1+chunk2 to an HTTP client? Does any browser support this?
Or is there no way to do this, and I have to gzip the whole stream rather than individual chunks?
I want to serve clients, for example, chunk1+chunk2 and chunk2+chunk1 etc. (different orders), but I don't want to compress the whole page every time, and I don't want to cache the whole page. I want to use pre-compressed cached chunks and send them.
Node.js code (node v0.10.7):
// creating pre-cached data buffers
var zlib = require('zlib');
var chunk1, chunk2;
zlib.gzip(new Buffer('test1'), function (err, data) {
  chunk1 = data;
});
zlib.gzip(new Buffer('test2'), function (err, data) {
  chunk2 = data;
});
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain', 'Content-Encoding': 'gzip'});
  // writing two pre-gzipped buffers
  res.write(chunk1); // if I send only this one, everything is OK
  res.write(chunk2); // if I send two chunks, Chrome tries to download a file
  res.end();
}).listen(8080);
When my example server returns this kind of response, Chrome displays a download window (it doesn't understand the response :/).

I haven't tried it, but if the http clients are compliant with RFC 1952, then they should accept concatenated gzip streams, and decompress them with the same result as if the data were all compressed into one stream. The HTTP 1.1 standard in RFC 2616 does in fact refer to RFC 1952.
If by "chunks" you are referring to chunked transfer encoding, then that is independent of the compression. If the clients do accept concatenated streams, then there is no reason for chunked transfer encoded boundaries to have to align with the gzip streams within.
As to how to do it, simply gzip your pieces and directly concatenate them. No other formatting or preparation is required.
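As a quick sanity check of that claim, here is a minimal sketch (not part of the original answer) that gzips two fragments separately, concatenates the compressed buffers, and decompresses the result. Note that zlib.gzipSync and multi-member decompression assume a newer Node release than the v0.10.7 used in the question; browsers are expected to handle concatenated members per RFC 1952 regardless.
// Sketch: two independently gzipped members, concatenated, decompress as one stream.
var zlib = require('zlib');
var part1 = zlib.gzipSync('Hello, ');  // first gzip member
var part2 = zlib.gzipSync('world!');   // second gzip member
var combined = Buffer.concat([part1, part2]);
// Recent Node versions decode every member in a concatenated gzip buffer.
zlib.gunzip(combined, function (err, data) {
  if (err) throw err;
  console.log(data.toString()); // "Hello, world!"
});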

Related

Any performance concerns with sending a buffer array in express JSON response?

My nodejs server consumes data from a nodejs JSON API. Some endpoints on the API return image data like so:
let buffer = await getImageBuffer();
res.set('content-type', 'image/png');
res.end(buffer);
That works great. However, for a number of complexity reasons, I'd love to include the buffer in a JSON response instead... like so:
let buffer = await getBuffer();
res.json({
  contentType: 'image/png',
  buffer
});
Are there any performance issues w/ including a buffer array in a JSON response like that? Is there any inherent performance benefit to using res.end(buffer) instead? The consuming server is also running nodejs, and will naturally JSON.parse() the response from the API.
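One thing worth knowing when weighing this up: res.json runs the Buffer through JSON.stringify, and a Buffer serializes (via its toJSON method) as { type: 'Buffer', data: [ ... ] } with one decimal number per byte, which is considerably larger on the wire than the raw binary body from res.end(buffer). A minimal sketch of how the consuming Node server could get the bytes back (the URL is a placeholder; the field names match the snippet above):
// Sketch: reconstruct the image Buffer from the JSON payload on the consuming server.
const http = require('http');
http.get('http://api.example.com/image-as-json', (res) => {
  let body = '';
  res.setEncoding('utf8');
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => {
    const parsed = JSON.parse(body);
    const imageBuffer = Buffer.from(parsed.buffer.data); // { type: 'Buffer', data: [...] }
    console.log(parsed.contentType, imageBuffer.length, 'bytes');
  });
});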

How can I consume a stream of json chunks from http endpoint?

I have a server that is streaming json objects to an endpoint. Here is a simplified example:
app.get('/getJsonObjects', function (req, res) {
  res.write(JSON.stringify(json1));
  res.write(JSON.stringify(json2));
  res.write(JSON.stringify(json3));
  res.write(JSON.stringify(json4));
  res.write(JSON.stringify(json5));
  res.end();
});
Then, client-side, using browser-request, I'm trying to do:
var r = request(url);
r.on('data', function(data) {
  console.log(JSON.parse(data));
});
The problem is that, despite streaming chunks of valid stringified JSON to the endpoint, the chunks I get back from the request are just text chunks that don't necessarily align with the start/end of the JSON chunks sent from the server. This means that JSON.parse(data) will sometimes fail.
What is the best way to stream these chunks of json in the same way that they were written to the endpoint?
This is an async problem. The server code you have provided is not guaranteed to send the data out in that order.
You will either have to accumulate the chunks on the client side and determine the order of the chunks there for display, or do some sort of accumulation on the server side and then output the JSON in order as it gets processed.
Edit:
Note that the encoding parameter of res.write() is a character encoding such as 'utf8', not a transfer encoding; Node already applies Transfer-Encoding: chunked automatically when no Content-Length header is set, so there is nothing extra to pass there.
https://nodejs.org/api/http.html#http_response_write_chunk_encoding_callback
If that doesn't get you aligned chunks, you can chain the res.write() calls through their callback parameter (or a promise chain) to guarantee the order of the writes.
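As an alternative to the answer above, a common way to keep the client's parsing aligned with the server's writes is to frame each object, for example one JSON document per line (newline-delimited JSON), and buffer/split on the client. A minimal sketch, reusing json1..json5 and the browser-request client from the question:
// Server: write one JSON object per line so the client can find the boundaries.
app.get('/getJsonObjects', function (req, res) {
  [json1, json2, json3, json4, json5].forEach(function (obj) {
    res.write(JSON.stringify(obj) + '\n');
  });
  res.end();
});
// Client: accumulate text, split on newlines, and parse each complete line.
var r = request(url);
var pending = '';
r.on('data', function (data) {
  pending += data.toString();
  var lines = pending.split('\n');
  pending = lines.pop(); // keep the (possibly incomplete) last piece
  lines.forEach(function (line) {
    if (line) console.log(JSON.parse(line));
  });
});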

How can I stream multiple remote images to a zip file and stream that to browser with ExpressJS?

I've got a small web app built in ExpressJs that allows people in our company to browse product information. A recent feature request requires that users be able to download batches of images (potentially hundreds at a time). These are stored on another server.
Ideally, I think I need to stream the batch of files into a zip file and stream that to the end user's browser as a download, all preferably without having to store the files on the server. The idea being that I want to reduce load on the server as much as possible.
Is it possible to do this or do I need to look at another approach? I've been experimenting with the 'request' module for the initial download.
If anyone can point me in the right direction or recommend any NPM modules that might help it would be very much appreciated.
Thanks.
One useful module for this is archiver, but I'm sure there are others as well.
Here's an example program that shows:
how to retrieve a list of URLs (I'm using async to handle the requests, and also to limit the number of concurrent HTTP requests to 3);
how to add the responses for those URLs to a ZIP file;
how to stream the final ZIP file somewhere (in this case to stdout, but with Express you can pipe to the response object).
Example:
var async = require('async');
var request = require('request');
var archiver = require('archiver');

function zipURLs(urls, outStream) {
  var zipArchive = archiver.create('zip');

  // Pipe the archive to the output stream up front; data is written as entries are added.
  zipArchive.pipe(outStream);

  async.eachLimit(urls, 3, function(url, done) {
    var stream = request.get(url);

    stream.on('error', function(err) {
      return done(err);
    }).on('end', function() {
      return done();
    });

    // Use the last part of the URL as a filename within the ZIP archive.
    zipArchive.append(stream, { name : url.replace(/^.*\//, '') });
  }, function(err) {
    if (err) throw err;
    zipArchive.finalize();
  });
}

zipURLs([
  'http://example.com/image1.jpg',
  'http://example.com/image2.jpg',
  ...
], process.stdout);
Do note that although this doesn't require the image files to be locally stored, it does build the ZIP file entirely in memory. Perhaps there are other ZIP modules that would allow you to work around that, although (AFAIK) the ZIP file format isn't really great in terms of streaming, as it depends on metadata being appended to the end of the file.
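To tie this back to the Express app in the question, here is a hedged sketch of a download route that pipes the archive straight to the HTTP response; the route path, URL list, and filename are placeholders:
// Sketch: Express route streaming the generated ZIP to the browser as a download.
app.get('/download-images', function (req, res) {
  var urls = [
    'http://example.com/image1.jpg',
    'http://example.com/image2.jpg'
  ];
  res.set({
    'Content-Type': 'application/zip',
    'Content-Disposition': 'attachment; filename="images.zip"'
  });
  zipURLs(urls, res); // res is a writable stream, so it can serve as outStream
});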

Decode a base64 document in ExpressJS response

I would like to store some documents in a database as base64 strings. Then when those docs are requested using HTTP, I would like ExpressJS to decode the base64 docs and return them. So something like this:
app.get('/base64', function (req, res) {
  // pdf is my base64 encoded string that represents a document
  var buffer = new Buffer(pdf, 'base64');
  res.send(buffer);
});
The code is simply to give an idea of what I'm trying to accomplish. Do I need to use a stream for this? If so, how would I do that? Or should I be writing these docs to a temp directory and then serving up the file? Would be nice to skip that step if possible. Thanks!
UPDATE: Just to be clear, I would like this to work with a typical HTTP request. So the user will click a link in his browser that takes him to a URL that returns a file from the database. It seems like it must be possible; Microsoft SharePoint stores serialized files in a SQL database and returns those files over HTTP requests, and I don't believe it writes all those files to a temp location first. I'm feeling like a Node.js stream may be the answer, but I'm not very familiar with streaming.
Before saving the file representation to the DB, you can just use the toString method with base64 encoding:
var base64pdf = pdf.toString('base64');
After you get the base64 file representation back from the DB, use a Buffer as follows to convert it back to a file:
var decodedFile = new Buffer(base64pdf, 'base64');
More information on Buffer usage can be found here - NodeJS Buffer
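Tying that back to the route in the question, a minimal sketch of serving the decoded document over a plain HTTP request, with headers so the browser knows it is getting a PDF (getPdfBase64FromDb and the route path are placeholders):
// Sketch: look up the base64 string, decode it, and send it as a PDF response.
app.get('/documents/:id', function (req, res) {
  getPdfBase64FromDb(req.params.id, function (err, base64pdf) {
    if (err) return res.sendStatus(500);
    var decodedFile = new Buffer(base64pdf, 'base64');
    res.set({
      'Content-Type': 'application/pdf',
      'Content-Disposition': 'inline; filename="document.pdf"'
    });
    res.send(decodedFile);
  });
});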
As for how to send a buffer from the Express server to the client, Socket.IO can also solve this.
Using socket.emit -
Emits an event to the socket identified by the string name. Any other parameters can be included. All datastructures are supported, including Buffer. JavaScript functions can’t be serialized/deserialized.
var io = require('socket.io')();
io.on('connection', function (socket) {
  socket.emit('an event', { some: 'data' });
});
The relevant documentation is on the socket.io website.
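If you do go the Socket.IO route, a hedged sketch of emitting the decoded document as a Buffer and handling it in the browser (the 'document' event name and the surrounding variables are placeholders):
// Server: emit the decoded PDF buffer to a connected client.
io.on('connection', function (socket) {
  var decodedFile = new Buffer(base64pdf, 'base64');
  socket.emit('document', decodedFile);
});
// Browser: Socket.IO delivers the binary payload as an ArrayBuffer.
socket.on('document', function (data) {
  var blob = new Blob([data], { type: 'application/pdf' });
  window.open(URL.createObjectURL(blob));
});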

Node.js POST File to Server

I am trying to write an app that will allow my users to upload files to my Google Cloud Storage account. In order to prevent overwrites and to do some custom handling and logging on my side, I'm using a Node.js server as a middleman for the upload. So the process is:
User uploads file to Node.js Server
Node.js server parses file, checks file type, stores some data in DB
Node.js server uploads file to GCS
Node.js server responds to the user's request with a pass/fail remark
I'm getting a little lost on step 3, of exactly how to send that file to GCS. This question gives some helpful insight, as well as a nice example, but I'm still confused.
I understand that I can open a ReadStream for the temporary upload file and pipe that to the http.request() object. What I'm confused about is how do I signify in my POST request that the piped data is the file variable. According to the GCS API Docs, there needs to be a file variable, and it needs to be the last one.
So, how do I specify a POST variable name for the piped data?
Bonus points if you can tell me how to pipe it directly from my user's upload, rather than storing it in a temporary file
I believe that if you want to do POST, you have to use a Content-Type: multipart/form-data;boundary=myboundary header. And then, in the body, write() something like this for each string field (linebreaks should be \r\n):
--myboundary
Content-Disposition: form-data; name="field_name"
field_value
And then for the file itself, write() something like this to the body:
--myboundary
Content-Disposition: form-data; name="file"; filename="urlencoded_filename.jpg"
Content-Type: image/jpeg
Content-Transfer-Encoding: binary
binary_file_data
The binary_file_data is where you use pipe():
var fileStream = fs.createReadStream("path/to/my/file.jpg");
fileStream.pipe(requestToGoogle, {end: false});
fileStream.on('end', function () {
  requestToGoogle.end("\r\n--myboundary--\r\n\r\n");
});
The {end: false} prevents pipe() from automatically closing the request because you need to write one more boundary after you're finished sending the file. Note the extra -- on the end of the boundary.
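Putting those pieces together, here is a hedged sketch of writing the string fields and the file-part headers before piping the file; requestToGoogle, fileStream, and myboundary are carried over from above, and the field names are illustrative rather than the exact ones GCS expects:
// Sketch: write the text parts and the file-part headers, then pipe the binary data.
var boundary = 'myboundary';
var bodyBeforeFile =
  '--' + boundary + '\r\n' +
  'Content-Disposition: form-data; name="acl"\r\n' +
  '\r\n' +
  'private\r\n' +
  '--' + boundary + '\r\n' +
  'Content-Disposition: form-data; name="file"; filename="file.jpg"\r\n' +
  'Content-Type: image/jpeg\r\n' +
  'Content-Transfer-Encoding: binary\r\n' +
  '\r\n';
requestToGoogle.write(bodyBeforeFile);             // everything before the binary data
fileStream.pipe(requestToGoogle, { end: false });  // the file bytes themselves
fileStream.on('end', function () {
  requestToGoogle.end('\r\n--' + boundary + '--\r\n\r\n'); // closing boundary
});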
The big gotcha is that Google may require a content-length header (very likely). If that is the case, then you cannot stream a POST from your user into a POST to Google, because you won't reliably know what the content-length is until you've received the entire file.
The content-length header's value should be a single number for the entire body. The simple way to do this is to call Buffer.byteLength(body) on the entire body, but that gets ugly quickly if you have large files, and it also kills the streaming. An alternative would be to calculate it like so:
var body_before_file = "..."; // string fields + boundary and metadata for the file
var body_after_file = "--myboundary--\r\n\r\n";
var fs = require('fs');
fs.stat(local_path_to_file, function (err, file_info) {
  var content_length = Buffer.byteLength(body_before_file) +
                       file_info.size +
                       Buffer.byteLength(body_after_file);
  // create request to google, write content-length and other headers
  // write() the body_before_file part,
  // and then pipe the file and end the request like we did above
});
But that still kills your ability to stream from the user to Google; the file has to be downloaded to the local disk to determine its length.
Alternate option
...now, after going through all of that, PUT might be your friend here. According to https://developers.google.com/storage/docs/reference-methods#putobject you can use a transfer-encoding: chunked header so you don't need to find the file's length. And I believe that the entire body of the request is just the file, so you can use pipe() and just let it end the request when it's done. If you're using https://github.com/felixge/node-formidable to handle uploads, then you can do something like this:
incomingForm.onPart = function (part) {
  if (part.filename) {
    var req = ... // create a PUT request to google and set the headers
    part.pipe(req);
  } else {
    // let formidable handle all non-file parts
    incomingForm.handlePart(part);
  }
};
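For the elided var req = ... line, a rough sketch of what that chunked PUT might look like; the bucket name, object path, and accessToken are placeholders, and the exact host and auth scheme should be checked against the GCS documentation linked above:
// Sketch: stream the incoming file part straight to Google Cloud Storage with a chunked PUT.
var https = require('https');
var req = https.request({
  method: 'PUT',
  host: 'storage.googleapis.com',
  path: '/my-bucket/' + encodeURIComponent(part.filename),
  headers: {
    'Authorization': 'Bearer ' + accessToken,  // placeholder OAuth2 token
    'Content-Type': part.mime,
    'Transfer-Encoding': 'chunked'             // no Content-Length needed
  }
}, function (res) {
  console.log('GCS responded with status', res.statusCode);
});
part.pipe(req); // formidable's part is a readable stream; piping ends the request when done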
