Handling chunked responses from process.stdout 'data' event - node.js

I have some code which I can't seem to fix. It looks as follows:
var childProcess = require('child_process');
var spawn = childProcess.spawn;
var child = spawn('./simulator', []);

child.stdout.on('data', function (data) {
  console.log(data);
});
This is all at the backend of my web application, which runs a specific type of simulation. The simulator executable is a C program that runs a loop waiting to be passed data via its standard input. When the inputs for the simulation come in (i.e. from the client), I parse them and then write data to the child process's stdin as follows:
child.stdin.write(INPUTS);
Now the data coming back is 40,000 bytes, give or take, but it seems to be getting broken into chunks of 8,192 bytes. I've tried adjusting the standard output buffer of the C program, but that doesn't fix it. Is there a limit on the size of the 'data' event imposed by node.js? I need it to come back as one chunk.

The buffer chunk sizes are applied in node. Nothing you do outside of node will solve the problem, and there is no way to get what you want from node without a little extra work in your messaging protocol. Any message larger than the chunk size will be chunked. There are two ways you can handle this issue:
1. If you know the total output size before you start to stream out of C, prepend the message length to the data so the node process knows how many chunks to pull before the message is complete.
2. Choose a special character to append to each message the C program sends. When node sees that character, it knows it has reached the end of that message (see the sketch below).
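
For example, here is a minimal sketch of the second approach, assuming the C program terminates each message with a newline; the '\n' delimiter and the handleMessage callback are assumptions for illustration, not part of the question:

// Accumulate chunks until the chosen terminator shows up, then hand the
// complete message to a callback.
var buffered = '';

child.stdout.on('data', function (chunk) {
  buffered += chunk.toString();

  var boundary = buffered.indexOf('\n');
  while (boundary !== -1) {
    var message = buffered.slice(0, boundary);
    buffered = buffered.slice(boundary + 1);
    handleMessage(message); // one complete ~40,000 byte response
    boundary = buffered.indexOf('\n');
  }
});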

If you are dealing with IO in a web application, you really want to stick with the async methods. You need something like the following (untested). There is a good example of how to consume the Stream API in the docs:
var data = '';

child.stdout.on('data', function (chunk) {
  data += chunk;
});

child.stdout.on('end', function () {
  // do something with var data
});

I ran into the same problem. I tried many different things and was starting to get annoyed. I tried prepending and appending with special characters. Maybe I was stupid but I just couldn't get it right.
I ran into a module called linerstream, which basically buffers every chunk until it sees the end of a line. You can use it like this:
var Linerstream = require('linerstream');

child.stdout.pipe(new Linerstream()).on('data', (data) => {
  // data here is complete and not chunked
});
The important part is that you do have to write each message to stdout as a line that ends with a newline character. Otherwise it doesn't know where the message ends.
I can say this worked for me. Hopefully it helps other people.

ppejovic's solution works, but I prefer concat-stream.
var concat = require('concat-stream');

child.stdout.pipe(concat(function (data) {
  // all your data ready to be used.
}));
There are a number of good stream helpers worth looking into based on your problem area. Take a look at substack's stream-handbook.

Related

Unable to use one readable stream to write to two different targets in Node JS

I have a client side app where users can upload an image. I receive this image in my Node JS app as readable data and then manipulate it before saving like this:
uploadPhoto: async (server, request) => {
  try {
    const randomString = `${uuidv4()}.jpg`;
    const stream = Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`);
    const resizer = Sharp()
      .resize({
        width: 450
      });

    await data.file
      .pipe(resizer)
      .pipe(stream);
This works fine and writes the file to the project's local directory. The problem comes when I try to use the same readable data again in the same async function. Please note, all of this code is in a try block.
    const stream2 = Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`);
    const resizer2 = Sharp()
      .resize({
        width: 45
      });

    await data.file
      .pipe(resizer2)
      .pipe(stream2);
The second file is written, but when I check the file, it seems corrupted or didn't successfully write the data. The first image is always fine.
I've tried a few things and found one method that seems to work, but I don't understand why. I add this code just before I create the second write stream:
data.file.on('end', () => {
  console.log('There will be no more data.');
});
Putting the code for the second write stream inside the on-end callback block doesn't make a difference, however, if I leave the code outside of the block, between the first write stream code and the second write stream code, then it works, and both files are successfully written.
It doesn't feel right leaving the code the way it is. Is there a better way I can write the second thumbnail image? I've tried using the Sharp module to read the file after the first write stream writes the data and then create a smaller version of it, but it doesn't work. The file never seems to be ready to use.
You have two alternatives, depending on how your software is designed.
If possible, I would avoid executing two transform operations on the same stream in the same "context", e.g. an API endpoint. I would rather separate those two different transforms so they do not work on the same input stream.
If that is not possible or would require too many changes, the solution is to fork the input stream and then pipe it into two different Writable streams. I normally use Highland.js fork for these tasks.
Please also see my comments on how to properly handle streams with async/await to check when the write operation is finished.
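
Here is a minimal sketch of the fork-and-pipe idea using two PassThrough streams instead of Highland.js, together with pipeline from 'stream/promises' (available in Node.js 15+) so the awaits actually wait until the writes finish. The names data.file, Sharp, Fse, rootUploadPath, userId and randomString are taken from the question:

const { PassThrough } = require('stream');
const { pipeline } = require('stream/promises');

// Tee the upload: every chunk from the request is written to both branches.
const fullSize = new PassThrough();
const thumb = new PassThrough();
data.file.pipe(fullSize);
data.file.pipe(thumb);

// Each branch gets its own Sharp instance and its own write stream, and we
// wait until both files are fully flushed to disk.
await Promise.all([
  pipeline(
    fullSize,
    Sharp().resize({ width: 450 }),
    Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`)
  ),
  pipeline(
    thumb,
    Sharp().resize({ width: 45 }),
    Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`)
  ),
]);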

Node.js: Race condition when receiving data on tcp socket

I'm using the net library of Node.js to connect to a server that is publishing data, so I'm listening for 'data' events on the client side. When the data event fires, I append the received data to my rx buffer and check whether I have a complete message by reading some bytes. If I have a valid message, I remove it from the buffer and process it. The source code looks like:
let rxBuffer = ''

client.on('data', (data) => {
  rxBuffer += data

  // for example... 10 stores the message length...
  while (rxBuffer.length > 10 && rxBuffer.length >= (10 + rxBuffer[10])) {
    const msg = rxBuffer.slice(0, 10 + rxBuffer[10])
    rxBuffer = rxBuffer.slice(msg.length) // remove message from buffer
    processMsg(msg) // process message..
  }
})
As far as I know, that's the typical way. But... what happens if the data event fires multiple times? Imagine I get a data event, and while I'm appending the data to my rx buffer the next data event arrives. The "new" data event will also append its data to the rxBuffer and start my while loop. So I'd have two handlers processing the same messages because they share the same rx buffer. Is this correct?
How can I handle this? In other languages I'd use something like a mutex to prevent concurrent access to the rx buffer... but what's the solution in JS? Or maybe I'm wrong and I never get multiple data events while one handler is still running? Any ideas?
JavaScript is single threaded. The second event will not run until the first one either completes or blocks, the latter of which could presumably happen in your processMsg(). If that's the case, multiple executions of processMsg() could be interleaved. If they aren't changing any global data (rxBuffer included), then you shouldn't have a problem.
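
To make that concrete, here is a sketch of a safe pattern: the 'data' callback always runs to completion before the next 'data' event is dispatched, so purely synchronous buffer handling needs no lock. If the processing is asynchronous, slice the complete messages out of the shared buffer before awaiting anything. This uses a Buffer (so rxBuffer[10] really is a numeric length byte), keeps the question's 10-byte header convention, and processMsgAsync is a hypothetical async version of processMsg:

let rxBuffer = Buffer.alloc(0)

client.on('data', async (data) => {
  rxBuffer = Buffer.concat([rxBuffer, data])

  // Extract every complete message synchronously, before any await.
  const complete = []
  while (rxBuffer.length > 10 && rxBuffer.length >= 10 + rxBuffer[10]) {
    complete.push(rxBuffer.slice(0, 10 + rxBuffer[10]))
    rxBuffer = rxBuffer.slice(10 + rxBuffer[10])
  }

  // The shared buffer is no longer touched below this point, so interleaved
  // handlers cannot corrupt it.
  for (const msg of complete) {
    await processMsgAsync(msg)
  }
})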

Nodejs Multipart File Upload, Resulting File Larger Than Original

I wanted to write a file uploader for a Node.js server using Express 4. I didn't want to use any middleware to achieve this because it was more of an academic exercise to better understand how Node.js works and how multipart uploads are handled.
Below is just the main bit of code for a route in Express 4 that collects the client data and writes it out.
var clientData = [];

// When data arrives
req.on('data', function (data) {
  clientData.push(data);
});

// Done
req.on('end', function () {
  var output = Buffer.concat(clientData);
  fs.writeFile('Thisisthesong.mp3', output, 'binary', function (err) {
    if (err) throw err;
    debug('Wrote out song');
  });
});
My issue is that when the file is finally written out, it is larger than the original. For example, if I upload an MP3 that is originally 10.5 MB, the result is 11 MB. I believe it has something to do with switching the encodings back and forth between reading the body and writing it out. I also understand that node does not truly have a binary encoding; could that be an issue?
I also thought it could be because I'm not stripping the boundaries or the Content-Disposition from the data (that would be the next step once this is working well), but the boundary and the Disposition are only about 300 bytes, not 500 KB. If anybody has an explanation or could point out what I'm doing incorrectly, I would greatly appreciate it.
Other Info:
+ Express 4
+ I'm not using any middleware at the moment besides cookieparser
+ Ubuntu 12.04
+ Node v0.10.31
Double check that you're comparing apples to apples here. Different interfaces on the operating system can calculate the size in different ways which could show a difference of hundreds of kilobytes for the exact same file. For example, I have a file on my computer right now which shows 2.3MB in Finder, but shows 2.2MB in Terminal when using ls -h.
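As a rough illustration, assuming one tool reports binary units (MiB) and the other decimal (MB): 10.5 MiB = 10.5 × 1,048,576 ≈ 11,010,048 bytes, which a decimal-based tool would display as roughly 11.0 MB, the same gap described in the question.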

Can you pass meta data along with a stream?

When I pipe something like an image file through a stream, is there any way to send a meta object along with it?
My server gets sent an image from a user. The image gets pushed through a set of streams that perform various actions.
The final stream emits a data event, it passes the resulting image buffer into a callback but I lose all context for the user. I need to keep the resulting image tied to the user's id and some other meta data.
Ideal:
stream.on('data', function (img, meta) {
  ...
});
Thanks for any possible solutions!
In short, no, there's nothing built into Node.js to support including metadata with streams. You do have some other options, though, including:
You could use a closure to track the meta data separately from the stream. For example:
function handleImage(imageStream) {
  var meta = {...};

  imageStream.pipe(otherStreams).on('data', function (image) {
    // you now have `image` and `meta` variables at your disposal here.
  });
}
The downside of this is that the metadata is not available to your otherStreams.
This is a good solution if your other streams are third-party code outside of your control, or if they don't need to know about the metadata.
You could do something similar to HTTP headers, where all the data up to a certain point is metadata and everything after it is the image. (In HTTP, the delimiter is wherever \n\n occurs first.) All of your streams in the chain have to know about this and handle it, though.
If you know your metadata will always be in one chunk and none of your streams split or merge chunks, then you could simplify this a bit and just say that the first (or last) chunk is always metadata.
Switch to an object stream, as Amoli mentioned in his answer. Here you would pass {image: imgData, meta: {...}} (see the sketch below). You would then have to update your other streams to expect this format.
The main downside of this method, though, is that you either have to pass the metadata multiple times, cache it somewhere for each stream that needs it, or pass your entire image as one chunk (which kind of kills the entire point of "streams"). And, from what I've been told, node.js can optimize text/binary streams better than object streams. So, this probably isn't a good approach for your situation.
https://github.com/dominictarr/mux-demux might be helpful here. It combines multiple streams into one, so you could have separate image and meta streams. I'm not sure how well it would work for your situation though. You'd probably need to update all of your streams to be aware of it.
I know I said that all but the first option require modifying the other streams, but there is a way around that: you could create a generic "stream wrapper" that splits up the image and meta data and passes just the image data through the main stream, and has the meta data bypass it and go on to the next one down the chain. This gets ugly fast though, so probably not the best idea.
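
For reference, here is a minimal sketch of the object-stream option above: a Transform in objectMode that wraps each chunk together with its metadata. imageStream, meta and the userId value are assumptions for illustration, not code from the question:

var Transform = require('stream').Transform;

function withMeta(meta) {
  return new Transform({
    objectMode: true, // lets us push plain objects instead of buffers/strings
    transform: function (chunk, encoding, callback) {
      callback(null, { image: chunk, meta: meta });
    }
  });
}

// Downstream consumers now receive { image, meta } objects.
imageStream.pipe(withMeta({ userId: 42 })).on('data', function (obj) {
  // obj.image and obj.meta are both available here
});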
Basically, whenever you want to read or write any objects which are not strings or buffers, you’ll need to put your stream into objectMode
Example (source):
function S3Lister(s3, options) {
  options || (options = {});
  stream.Readable.call(this, { objectMode: true });

  this.s3 = s3; // a knox-like client.
  this.marker = options.start;
  this.connecting = false;
  this.ended = false;
}

util.inherits(S3Lister, stream.Readable);
We set the stream to use objectMode as we want to return not just data but also some metadata.
For more information:
Node.js Docs stream object mode
An introduction to Node's streams
I created a module called metastream for this type of thing. (It is in npm).

Node.js request stream ends/stalls when piped to writable file stream

I'm trying to pipe() data from Twitter's Streaming API to a file using modern Node.js Streams. I'm using a library I wrote called TweetPipe, which leverages EventStream and Request.
Setup:
var TweetPipe = require('tweet-pipe');
var fs = require('fs');

var tp = new TweetPipe(myOAuthCreds);
var file = fs.createWriteStream('./tweets.json');
Piping to STDOUT works and stream stays open:
tp.stream('statuses/filter', { track: ['bieber'] })
  .pipe(tp.stringify())
  .pipe(process.stdout);
Piping to the file writes one tweet and then the stream ends silently:
tp.stream('statuses/filter', { track: ['bieber'] })
  .pipe(tp.stringify())
  .pipe(file);
Could anyone tell me why this happens?
It's hard to say from what you have here; it sounds like the stream is getting cleaned up before you expect. This can be triggered a number of ways, see here: https://github.com/joyent/node/blob/master/lib/stream.js#L89-112
A stream could emit 'end', and then something just stops.
Although I doubt this is the problem, one thing that concerns me is this
https://github.com/peeinears/tweet-pipe/blob/master/index.js#L173-174
destroy should be called after emitting error.
I would normally debug a problem like this by adding logging statements until I can see what is not happening right.
Can you post a script that can be run to reproduce?
(for extra points, include a package.json that specifies the dependencies :)
According to this, you should create an error handler on the stream created by tp.
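
Here is a sketch of that suggestion. 'error' events do not propagate through pipe(), so attach a handler to each stream in the chain; the stream and method names (tp.stream, tp.stringify, file) are taken from the question:

var tweets = tp.stream('statuses/filter', { track: ['bieber'] });
var stringifier = tp.stringify();

// Log failures from any stage instead of letting the pipeline end silently.
[tweets, stringifier, file].forEach(function (s) {
  s.on('error', function (err) {
    console.error('stream error:', err);
  });
});

tweets.pipe(stringifier).pipe(file);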
