Node Pipes Handle Input - node.js

I am piping data from a text file to another pipe which downloads images from some URLs. As expected, this sends a large number of requests in quick succession and the remote server shuts me down. I would like to handle the next chunk only after the previous one has been processed.
My code is:
read.pipe(JSONStream.parse('*'))
  .pipe(es.map(function (d, cb) {
    download_images(x, y)
      .then(function (r) { /* ... */ })
      .fail(function (r) { /* ... */ })
      .fin(function (f) { cb(); });
  }))
  .pipe(xyz);
Since I have just started looking into streams, I might have missed a very simple point, or in my zeal to use streams I could have ignored a better approach.
Extremely large JSON file
Download images with a delay

You can call read.pause() right before calling download_images() and then call read.resume() right before you call cb().
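For example, a rough sketch of that approach, reusing the pipeline from the question (the download_images arguments are left as in the original):
read.pipe(JSONStream.parse('*'))
  .pipe(es.map(function (d, cb) {
    read.pause(); // stop pulling more chunks from the file
    download_images(x, y)
      .fin(function () {
        read.resume(); // let the next chunk flow
        cb();
      });
  }))
  .pipe(xyz);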

Related

Piping a stream to a stream with custom results

After reading and marginally understanding the node stream handbook, I want to use streams whenever it seems appropriate/possible.
I have a request that uploads a file which should be written to another spot on the file system. This is done via:
readStream = fs.createReadStream(request.files.file.path);
readStream.pipe(fs.createWriteStream(targetPath));
This works great, but I want to pipe the result of the write stream to a response; specifically, I want the target path to be sent in the response when the write succeeds. Right now I'm doing:
readStream.pipe(fs.createWriteStream(targetPath)).on("close", function () {
  serverResponse.send(200, targetPath);
});
This works fine, but I feel like it is more verbose than it needs to be, and that I should be able to call .pipe on the result, as in read.pipe(write).pipe(response).
Is there something I can do to get the write stream to pipe the target path to the response, or is there a better way to go about what I'm doing?
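For what it's worth, the pattern above can be wrapped in a small helper so the route handler stays flat (a sketch only; copyUpload is a hypothetical name, and 'finish' is the event a write stream emits once all data has been flushed):
var fs = require('fs');

function copyUpload(sourcePath, targetPath, done) {
  fs.createReadStream(sourcePath)
    .pipe(fs.createWriteStream(targetPath))
    .on('finish', function () { done(null, targetPath); })
    .on('error', done);
}

copyUpload(request.files.file.path, targetPath, function (err, path) {
  if (err) return serverResponse.send(500);
  serverResponse.send(200, path);
});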

Websockets with Streaming Archives

So this is the setup I'm working with:
I am on an express server which must stream an archived binary payload to a browser (does not matter if it is zip, tar or tar.gz - although zip would be nice).
On this server, I have a websocket open that connects to another server which is sending me binary payloads of individual files in a directory. I get these payloads streamed, piece-by-piece, as buffers, and I'm doing this serially (that is - file-by-file - there aren't multiple websockets open at one time, and there is one websocket per file). This is the websocket library I'm using: https://github.com/einaros/ws
I would like to go through each file, open a websocket, and then append the buffers to an archiver as they come through the websockets. When data is appended to the archiver, it would be nice if I could stream the output of the archiver to the browser (via the response object with response.write). So, basically, as I'm getting the payload from the websocket, I would like that payload streamed through an archiver and then to the response. :-)
Some things I have looked into:
node-zipstream - This is nice because it gives me an output stream I can pipe directly to response.write. However, it doesn't appear to support nested files/folders, and, more importantly, it only accepts an input stream. I have looked at the source code (which is quite terse and readable), and it seems as though, if I were able to have access to the update function within ZipStream.prototype.addFile, I could just call that each time on the message event when I get a binary buffer from the websocket. This is quite messy/hacky though, and, given that this library already doesn't seem to support nested files/folders, I'm not sure I will be going with it.
node-archiver - This suffers from the same issue as node-zipstream (probably because it was inspired by it) where it allows me to pipe the output, but I cannot append multiple buffers for the same file within the archive (and then manually signal when the last buffer has been appended for a given file). However, it does allow me to have nested folders, which is a clear win over node-zipstream.
Is there something I'm not aware of, or is this just a really crazy thing that I want to do?
The only alternative I see at this point is to wait for the entire payload to be streamed through a websocket and then append with node-archiver, but I really would like to reap the benefit of true streaming/archiving on-the-fly.
I've also thought about creating a read stream of sorts, just to serve as a proxy object that I can pass into node-archiver, and then appending the buffers I get from the websocket to this read stream. Looking at various read streams, I'm not sure how to do this, though. The only way I could think of was creating a write stream, piping buffers to it, and having a read stream read from that write stream. Am I on the right track here?
As always, thanks for any help/direction you can offer, SO community.
EDIT:
Since I just opened this question, and I'm new to node, there may be a better answer than the one I provided. I will keep this question open and accept a better answer if one presents itself within a few days. As always, I will upvote any other answers, even if they're ridiculous, as long as they're correct and allow me to stream on-the-fly as mine does.
I figured out a way to get this working with node-archiver. :-)
It was based off my hunch of creating a temporary "proxy stream" of sorts, inspired by this SO question: How to create streams from string in Node.Js?
The basic gist is (CoffeeScript syntax):
archive = archiver 'zip'
archive.pipe response # where response is the http response

# and then for each file...
fileName = ... # known file name
fileSize = ... # known file size
ws = ...       # create websocket
proxyStream = new Stream()
numBytesStreamed = 0

archive.append proxyStream, name: fileName

ws.on 'message', (dataBuffer) ->
  numBytesStreamed += dataBuffer.length
  proxyStream.emit 'data', dataBuffer
  if numBytesStreamed is fileSize
    proxyStream.emit 'end'
    # function/indicator to do this for the next file in the folder

# and then when you're completely done...
archive.finalize (err, bytesOfArchive) ->
  if err?
    # do whatever
  else
    # unless you somehow knew this ahead of time
    response.addTrailers 'Content-Length': bytesOfArchive
    response.end()
Note that this is not the complete solution I implemented. There is still a lot of logic dealing with getting the files, their paths, etc. Not to mention error-handling.
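For anyone who prefers plain JavaScript, here is a rough equivalent of the per-file portion using a PassThrough stream in place of the bare Stream (fileName, fileSize, ws, and response are assumed to exist, as above):
var archiver = require('archiver');
var PassThrough = require('stream').PassThrough;

var archive = archiver('zip');
archive.pipe(response); // response is the HTTP response

// ...then for each file (fileName, fileSize, and the websocket ws come from elsewhere)...
var proxyStream = new PassThrough();
var numBytesStreamed = 0;
archive.append(proxyStream, { name: fileName });

ws.on('message', function (dataBuffer) {
  numBytesStreamed += dataBuffer.length;
  proxyStream.write(dataBuffer);
  if (numBytesStreamed === fileSize) {
    proxyStream.end(); // tells archiver this entry is complete
  }
});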

Can you pass meta data along with a stream?

When I pipe something like an image file through a stream, is there any way to send a meta object along with it?
My server gets sent an image from a user. The image gets pushed through a set of streams that perform various actions.
The final stream emits a data event; it passes the resulting image buffer into a callback, but I lose all context for the user. I need to keep the resulting image tied to the user's id and some other metadata.
Ideal:
stream.on('data', function (img, meta) {
  // ...
});
Thanks for any possible solutions!
In short, no, there's nothing built into Node.js to support including metadata with streams. You do have some other options, though, including:
You could use a closure to track the meta data separately from the stream. For example:
function handleImage(imageStream) {
  var meta = { /* ... */ };
  imageStream.pipe(otherStreams).on('data', function (image) {
    // you now have `image` and `meta` variables at your disposal here.
  });
}
The downside of this is that the metadata is not available to your otherStreams.
This is a good solution if your other streams are third-party code outside of your control, or if they don't need to know about the metadata.
You could do something similar to HTTP headers, where all the data up to a certain point is metadata and everything after it is the image. (In HTTP, the delimiter is the first blank line, i.e. \r\n\r\n.) All of your streams in the chain have to know about this and handle it, though.
If you know your metadata will always be in one chunk and none of your streams split or merge chunks, then you could simplify this a bit and just say that the first (or last) chunk is always metadata.
Switch to an object stream like Amoli mentioned in his answer. Here you would pass {image: imgData, meta: {...}}. You would then have to update your other streams to expect this format.
The main downside of this method, though, is that you either have to pass the metadata multiple times, cache it somewhere for each stream that needs it, or pass your entire image as one chunk (which kind of kills the entire point of "streams"). And, from what I've been told, node.js can optimize text/binary streams better than object streams. So, this probably isn't a good approach for your situation.
https://github.com/dominictarr/mux-demux might be helpful here. It combines multiple streams into one, so you could have separate image and meta streams. I'm not sure how well it would work for your situation though. You'd probably need to update all of your streams to be aware of it.
I know I said that all but the first option require modifying the other streams, but there is a way around that: you could create a generic "stream wrapper" that splits up the image and meta data and passes just the image data through the main stream, and has the meta data bypass it and go on to the next one down the chain. This gets ugly fast though, so probably not the best idea.
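As a rough illustration of the object-stream option above (TagWithMeta and its field names are made-up, not part of any library mentioned here):
var Transform = require('stream').Transform;
var util = require('util');

function TagWithMeta(meta) {
  Transform.call(this, { objectMode: true }); // object mode lets us push plain objects
  this.meta = meta;
}
util.inherits(TagWithMeta, Transform);

TagWithMeta.prototype._transform = function (imageChunk, encoding, done) {
  // wrap each incoming buffer with the metadata so downstream object streams see both
  this.push({ image: imageChunk, meta: this.meta });
  done();
};

// usage: imageStream.pipe(new TagWithMeta({ userId: someUserId })).pipe(someObjectModeStream)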
Basically, whenever you want to read or write any objects which are not strings or buffers, you'll need to put your stream into objectMode.
Example (source):
var stream = require('stream');
var util = require('util');

function S3Lister(s3, options) {
  options || (options = {});
  stream.Readable.call(this, { objectMode: true });

  this.s3 = s3; // a knox-like client.
  this.marker = options.start;
  this.connecting = false;
  this.ended = false;
}
util.inherits(S3Lister, stream.Readable);
We set the stream to use objectMode as we want to return not just data but also some metadata.
For more information:
Node.js Docs stream object mode
An introduction to Node's streams
I created a module called metastream for this type of thing. (It is in npm).

Handling chunked responses from process.stdout 'data' event

I have some code which I can't seem to fix. It looks as follows:
var childProcess = require('child_process');
var spawn = childProcess.spawn;

var child = spawn('./simulator', []);

child.stdout.on('data', function (data) {
  console.log(data);
});
This is all at the backend of my web application, which is running a specific type of simulation. The simulator executable is a C program which runs a loop waiting to be passed data (via its standard input). When the inputs come in for the simulation (i.e. from the client), I parse the input and then write data to the child process's stdin as follows:
child.stdin.write(INPUTS);
Now the data coming back is 40,000 bytes, give or take, but it seems to be getting broken into chunks of 8192 bytes. I've tried changing the standard output buffering of the C program, but that doesn't fix it. Is there a limit on the size of a 'data' event imposed by node.js? I need it to come back as one chunk.
The buffer chunk sizes are applied in node. Nothing you do outside of node will solve the problem. There is no way to get what you want from node without a little extra work in your messaging protocol. Any message larger than the chunk size will be chunked. There are two ways you can handle this issue.
If you know the total output size before you start to stream out of C, prepend the message length to the data so the node process knows how many chunks to pull before terminating the entire message.
Determine a special character you can append to each message the C program sends. When node sees that character, it knows it has reached the end of that message (a rough sketch of this approach follows).
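Something along these lines, assuming the C program terminates each message with a null byte (any character that cannot appear in the payload would do):
var buffered = '';

child.stdout.on('data', function (chunk) {
  buffered += chunk.toString();
  var boundary = buffered.indexOf('\u0000'); // terminator appended by the C program
  while (boundary !== -1) {
    var message = buffered.slice(0, boundary); // one complete simulator message
    buffered = buffered.slice(boundary + 1);
    handleMessage(message); // hypothetical handler for a full message
    boundary = buffered.indexOf('\u0000');
  }
});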
If you are dealing with IO in a web application, you really want to stick with the async methods. You need something like the following (untested). There is a good sample of how to consume the Stream API in the docs.
var data = '';

child.stdout.on('data', function (chunk) {
  data += chunk;
});

child.stdout.on('end', function () {
  // do something with var data
});
I ran into the same problem. I tried many different things and was starting to get annoyed. I tried prepending and appending with special characters. Maybe I was stupid but I just couldn't get it right.
I ran into a module called linerstream which basically parses every chunk until it sees an EOF. You can use it like this:
var Linerstream = require('linerstream');

child.stdout.pipe(new Linerstream()).on('data', (data) => {
  // data here is complete and not chunked
});
The important part is that you do have to write data to stdout as lines, each terminated with an end-of-line character; otherwise it doesn't know where a message ends.
I can say this worked for me. Hopefully it helps other people.
ppejovic's solution works, but I prefer concat-stream.
var concat = require('concat-stream');

child.stdout.pipe(concat(function (data) {
  // all your data, ready to be used.
}));
There are a number of good stream helpers worth looking into based on your problem area. Take a look at substack's stream-handbook.

Node.js request stream ends/stalls when piped to writable file stream

I'm trying to pipe() data from Twitter's Streaming API to a file using modern Node.js Streams. I'm using a library I wrote called TweetPipe, which leverages EventStream and Request.
Setup:
var TweetPipe = require('tweet-pipe')
, fs = require('fs');
var tp = new TweetPipe(myOAuthCreds);
var file = fs.createWriteStream('./tweets.json');
Piping to STDOUT works and the stream stays open:
tp.stream('statuses/filter', { track: ['bieber'] })
.pipe(tp.stringify())
.pipe(process.stdout);
Piping to the file writes one tweet and then the stream ends silently:
tp.stream('statuses/filter', { track: ['bieber'] })
.pipe(tp.stringify())
.pipe(file);
Could anyone tell me why this happens?
It's hard to say from what you have here; it sounds like the stream is getting cleaned up before you expect. This can be triggered in a number of ways; see https://github.com/joyent/node/blob/master/lib/stream.js#L89-112
A stream could emit 'end', and then something just stops.
Although I doubt this is the problem, one thing that concerns me is this
https://github.com/peeinears/tweet-pipe/blob/master/index.js#L173-174
destroy should be called after emitting error.
I would normally debug a problem like this by adding logging statements until I can see what is not happening right.
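For example, something like this (a sketch) makes it obvious which lifecycle events actually fire on the file stream:
['close', 'finish', 'error', 'unpipe'].forEach(function (event) {
  file.on(event, function () {
    console.log('file stream emitted:', event);
  });
});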
Can you post a script that can be run to reproduce?
(for extra points, include a package.json that specifies the dependencies :)
According to this, you should create an error handler on the stream created by tp.
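For example (a sketch; note that .pipe() does not forward 'error' events, so each stream in the chain may need its own handler):
tp.stream('statuses/filter', { track: ['bieber'] })
  .on('error', function (err) {
    console.error('tweet stream error:', err);
  })
  .pipe(tp.stringify())
  .pipe(file);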
