I've been experiencing a problem with the request library of Node.js. When I try to pipe the response to both a file and a stream, I get the error: "You cannot pipe after data has been emitted from the response." This happens because I do some calculations before actually piping the data.
Example:
var request = require('request')
var fs = require('fs')
var through2 = require('through2')

var options = {
  url: 'url-to-fetch-a-file'
};

var req = request(options)

req.on('response', function (res) {
  // Some computations to potentially remove files.
  // These computations take quite some time.

  // Function that creates the path recursively
  createPath(path, function () {
    var file = fs.createWriteStream(path + fname)
    var stream = through2.obj(function (chunk, enc, callback) {
      this.push(chunk)
      callback()
    })

    req.pipe(file)
    req.pipe(stream)
  })
})
If I just pipe to the stream without any calculations, it works fine. How can I pipe to both a file and a stream using the request module in Node.js?
I found this: Node.js Piping the same readable stream into multiple (writable) targets, but it is not the same thing. There, the piping happens twice in different ticks. My example pipes the same way as the answer in that question and still gets the error.
Instead of piping directly to the file, you can add a listener to the stream you defined. So you can replace req.pipe(file) with:
stream.on('data', function (data) {
  file.write(data)
})
stream.on('end', function () {
  file.end()
})
or
stream.pipe(file)
This will pause the stream until it is read, something that doesn't happen with the request module.
More info: https://github.com/request/request/issues/887
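Applied to the code from the question, the handler could look roughly like this (a sketch only; createPath, path and fname are the question's own helpers and variables):

req.on('response', function (res) {
  // long-running computations here...
  createPath(path, function () {
    var file = fs.createWriteStream(path + fname)
    var stream = through2.obj(function (chunk, enc, callback) {
      this.push(chunk)
      callback()
    })

    // pipe the request only into the through2 stream,
    // then let that stream feed the file as well
    req.pipe(stream)
    stream.pipe(file)
  })
})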
Related
With a node http server I'm trying to pipe the request read stream to the response write stream with some intermediary transforms, one of which is a file system write.
The pipeline looks like this, with non-pertinent code removed for simplicity:
function handler (req, res) {
  req.pipe(jsonParse())
    .pipe(addTimeStamp())
    .pipe(jsonStringify())
    .pipe(saveToFs('saved.json'))
    .pipe(res);
}
The custom Transform streams are pretty straightforward, but I have no elegant way of writing saveToFs. It looks like this:
function saveToFs (filename) {
  const write$ = fs.createWriteStream(filename);
  write$.on('open', () => console.log('opened'));
  write$.on('close', () => console.log('closed'));

  // pass chunks through unchanged, but copy each one to the file stream
  const T = new Transform();
  T._transform = function (chunk, encoding, cb) {
    write$.write(chunk);
    cb(null, chunk);
  };
  return T;
}
The idea is simply to pipe the data to the write stream and then on to the response stream, but fs.createWriteStream(<file.name>) returns only a Writable stream, which makes this approach awkward.
Right now this code has two problems that I can see: the write stream never fires a close event (memory leak?), and I would like the data to pass through the file system write before returning data to the response stream instead of essentially multicasting to two sinks.
Any suggestions, or pointing out fundamental things I've missed would be greatly appreciated.
What you should do is save the stream returned by the last .pipe() before saveToFs, and then pipe that to both a file and res.
function handler(req, res) {
  const transformed = req.pipe(jsonParse())
    .pipe(addTimeStamp())
    .pipe(jsonStringify());

  transformed.pipe(fs.createWriteStream('saved.json'));
  transformed.pipe(res);
}
To sum it up, you can pipe the same readable stream (transformed) to multiple writable streams.
And I would like the data to pass through the file system write before returning data to the response stream instead of essentially multicasting to two sinks.
Use the { end: false } option when piping to res.
transformed.pipe(res, { end: false });
And then call res.end() when the file is written or whenever you want.
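For example, a minimal sketch (assuming fs and the same custom transforms from the question are in scope) that ends the response only once the file has been fully written:

function handler(req, res) {
  const transformed = req.pipe(jsonParse())
    .pipe(addTimeStamp())
    .pipe(jsonStringify());

  const file = fs.createWriteStream('saved.json');
  // 'finish' fires once all data has been flushed to the file
  file.on('finish', () => res.end());

  transformed.pipe(file);
  transformed.pipe(res, { end: false });
}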
I'm familiar with Node streams, but I'm struggling with best practices for abstracting code that I reuse a lot into a single pipe step.
Here's a stripped down version of what I'm writing today:
inputStream
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform(function(row) { return transform(row); }))
  .pipe(csv.stringify({header: true}))
  .pipe(outputStream);
The actual work happens in transform(). The only things that really change are inputStream, transform(), and outputStream. Like I said, this is a stripped-down version of what I actually use. I have a lot of error handling and logging on each pipe step, which is ultimately why I'm trying to abstract the code.
What I'm looking to write is a single pipe step, like so:
inputStream
  .pipe(csvFunction(transform))
  .pipe(outputStream);
What I'm struggling to understand is how to turn those pipe steps into a single function that accepts a stream and returns a stream. I've looked at libraries like through2, but I'm not sure how that gets me to where I'm trying to go.
You can use the PassThrough class like this:
var PassThrough = require('stream').PassThrough;

var csvStream = new PassThrough();

csvStream.on('pipe', function (source) {
  // undo piping of source
  source.unpipe(this);

  // build own pipe-line and store internally
  this.combinedStream = source
    .pipe(csv.parse({columns: true}))
    .pipe(csv.transform(function (row) {
      return transform(row);
    }))
    .pipe(csv.stringify({header: true}));
});

csvStream.pipe = function (dest, options) {
  // pipe internal combined stream to dest
  return this.combinedStream.pipe(dest, options);
};

inputStream
  .pipe(csvStream)
  .pipe(outputStream);
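If you want this in the csvFunction(transform) shape from the question, one possible way (a sketch, not tested) is to wrap the same trick in a factory function:

var PassThrough = require('stream').PassThrough;
var csv = require('csv');

function csvFunction(transform) {
  var csvStream = new PassThrough();
  csvStream.on('pipe', function (source) {
    // undo the piping of source and build the internal pipeline instead
    source.unpipe(this);
    this.combinedStream = source
      .pipe(csv.parse({columns: true}))
      .pipe(csv.transform(transform))
      .pipe(csv.stringify({header: true}));
  });
  csvStream.pipe = function (dest, options) {
    return this.combinedStream.pipe(dest, options);
  };
  return csvStream;
}

inputStream
  .pipe(csvFunction(transform))
  .pipe(outputStream);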
Here's what I ended up going with. I used the through2 library and the streaming API of the csv library to create the pipe function I was looking for.
var csv = require('csv');
var through = require('through2');

module.exports = function(transformFunc) {
  var parser = csv.parse({columns: true, relax_column_count: true}),
      transformer = csv.transform(function(row) {
        return transformFunc(row);
      }),
      stringifier = csv.stringify({header: true});

  return through(function(chunk, enc, cb) {
    var stream = this;

    parser.on('data', function(data) {
      transformer.write(data);
    });
    transformer.on('data', function(data) {
      stringifier.write(data);
    });
    stringifier.on('data', function(data) {
      stream.push(data);
    });

    parser.write(chunk);

    parser.removeAllListeners('data');
    transformer.removeAllListeners('data');
    stringifier.removeAllListeners('data');
    cb();
  });
};
It's worth noting the part towards the end where I remove the event listeners; this was due to running into memory errors from creating too many event listeners. I initially tried solving this problem by listening to events with once, but that prevented subsequent chunks from being read and passed on to the next pipe step.
Let me know if anyone has feedback or additional ideas.
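For reference, here is hypothetical usage of this module, assuming it is saved as csvPipe.js (the file names and the row transformation are only illustrative):

var fs = require('fs');
var csvPipe = require('./csvPipe');

fs.createReadStream('input.csv')
  .pipe(csvPipe(function (row) {
    // example transformation: add a column to every row
    row.processed = 'true';
    return row;
  }))
  .pipe(fs.createWriteStream('output.csv'));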
I have an HTTP service that needs to redirect a request. I'm streaming rather than buffering, because I deal with big multipart files and buffering them would overwhelm RAM or disk (see How do Node.js Streams work?).
Piping works for me; the code is something like this:
var Req = getReq(response);
request.pipe(Req);
The only shortcoming is that the multipart body I re-send through the pipe contains one JSON file that needs a few fields changed.
Can I still use a pipe and change one file in the piped multipart?
You can do this using a Transform Stream.
var Req = getReq(response);
var transformStream = new TransformStream();
// the boundary key for the multipart is in the headers['content-type']
// if this isn't set, the multipart request would be invalid
Req.headers['content-type'] = request.headers['content-type'];
// pipe from request to our transform stream, and then to Req
// it will pipe chunks, so it won't use too much RAM
// however, you will have to keep the JSON you want to modify in memory
request.pipe(transformStream).pipe(Req);
Transform Stream code:
var Transform = require('stream').Transform,
    util = require('util');

var TransformStream = function() {
  Transform.call(this, {objectMode: true});
};
util.inherits(TransformStream, Transform);

TransformStream.prototype._transform = function(chunk, encoding, callback) {
  // here should be the "modify" logic;
  // this will push all chunks as they come, leaving the multipart unchanged
  // there's no limitation on what you can push:
  // you can push nothing, or you can push an entire file
  this.push(chunk);
  callback();
};

TransformStream.prototype._flush = function (callback) {
  // you can also push in _flush
  // this.push( SOMETHING );
  callback();
};
In the _transform function, your logic should be something like this:
1. If, in the current chunk, the JSON you want to modify begins:

   <SOME_DATA_BEFORE_JSON> <MY_JSON_START>

   then this.push(SOME_DATA_BEFORE_JSON); and keep MY_JSON_START in a local var.

2. While your JSON hasn't ended, append the chunk to your local var.

3. If, in the current chunk, the JSON ends:

   <JSON_END> <SOME_DATA_AFTER_JSON>

   then add JSON_END to your var, do whatever changes you want, and push the changes:

   this.push(local_var);
   this.push(SOME_DATA_AFTER_JSON);

4. If the current chunk has nothing of your JSON, simply push the chunk:

   this.push(chunk);

(A rough code sketch of this buffering logic follows the multipart notes below.)
Other than that, you may want to read the multipart format.
SOME_DATA_BEFORE_JSON from above will be:
--frontier
Content-Type: text/plain
<JSON_START>
Other than Content-Type, it may contain the filename, encoding, etc.
Something to keep in mind: the chunks may end anywhere (they could even end in the middle of the boundary).
The parsing could get quite tricky; I would search for the boundary key (frontier), and then check if the JSON starts after that. There would be two cases:
chunk:   <SOME_DATA> --frontier <FILE METADATA> <FILE_DATA>

chunk 1: <SOME_DATA> --fron
chunk 2: tier <FILE METADATA> <FILE_DATA>
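As a very rough sketch of the buffering logic described above (this is not a real multipart parser: jsonStart, jsonEnd and editJson are hypothetical arguments standing in for the boundary/marker detection, and for simplicity it assumes the markers never straddle a chunk boundary, which the two-chunk case above shows can actually happen):

var Transform = require('stream').Transform,
    util = require('util');

var JsonEditStream = function (jsonStart, jsonEnd, editJson) {
  Transform.call(this);
  this.buffering = false;
  this.jsonBuf = '';
  this.jsonStart = jsonStart; // e.g. the header line of the JSON part
  this.jsonEnd = jsonEnd;     // e.g. the boundary that follows it
  this.editJson = editJson;   // function that rewrites the buffered JSON
};
util.inherits(JsonEditStream, Transform);

JsonEditStream.prototype._transform = function (chunk, encoding, callback) {
  var str = chunk.toString();

  if (!this.buffering) {
    var start = str.indexOf(this.jsonStart);
    if (start === -1) {
      // nothing of the JSON in this chunk, pass it through unchanged
      this.push(chunk);
      return callback();
    }
    // push everything before the JSON and start buffering from there
    if (start > 0) this.push(str.slice(0, start));
    str = str.slice(start);
    this.buffering = true;
  }

  var end = str.indexOf(this.jsonEnd);
  if (end === -1) {
    // the JSON hasn't ended yet, keep appending
    this.jsonBuf += str;
    return callback();
  }

  // the JSON ends in this chunk: include the end marker, edit, then push
  end += this.jsonEnd.length;
  this.jsonBuf += str.slice(0, end);
  this.push(this.editJson(this.jsonBuf));
  var rest = str.slice(end);
  if (rest.length > 0) this.push(rest);
  this.buffering = false;
  this.jsonBuf = '';
  callback();
};

A stream built like this could then be used in place of the plain TransformStream above, e.g. request.pipe(new JsonEditStream(start, end, editFn)).pipe(Req).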
Hope this helps!
var http = require('http');
var map = require('through2-map');

var uc = map(function(ch) {
  return ch.toString().toUpperCase();
});

var server = http.createServer(function(request, response) {
  request.on('data', function(chunk) {
    if (request.method == 'POST') {
      //change the data from request to uppercase letters and
      //pipe to response.
    }
  });
});

server.listen(8000);
I have two questions about the code above. First, I read the documentation for request, which says it is an instance of IncomingMessage and implements the Readable Stream interface. However, I couldn't find the .on method in the Stream documentation, so I don't know what chunk in the callback passed to request.on is. Secondly, I want to do some manipulation of the data from request and pipe it to response. Should I pipe from chunk or from request? Thank you for your consideration!
Is chunk a stream?

Nope. A stream is the flow along which the chunks of the whole data are sent.
A simple example: if you read a 1 GB file, a stream will read it in chunks of, say, 10 kB; each chunk goes through your stream, from beginning to end, in the right order.
I use a file as an example, but sockets, requests, and whatever else is stream-based work on the same idea.
Also, whenever someone sends a request to this server, would that entire thing be a chunk?

In the particular case of HTTP requests, only the request body is a stream: the posted files/data (or, on the other side, the body of the response). Headers are treated as objects and are applied to the request before the body is written to the socket.
A small example with some concrete code to help you:
var through2 = require('through2');
var Readable = require('stream').Readable;

var s1 = through2(function transform(chunk, enc, cb) {
  console.log("s1 chunk %s", chunk.toString());
  cb(null, chunk.toString() + chunk.toString());
});

var s2 = through2(function transform(chunk, enc, cb) {
  console.log("s2 chunk %s", chunk.toString());
  cb(null, chunk);
});

s2.on('data', function (data) {
  console.log("s2 data %s", data.toString());
});

s1.on('end', function () {
  console.log("s1 end");
});
s2.on('end', function () {
  console.log("s2 end");
});

var rs = new Readable;
rs.push('beep '); // this is a chunk
rs.push('boop');  // this is a chunk
rs.push(null);    // this is a signal to end the stream

rs.on('end', function () {
  console.log("rs end");
});

console.log(
  ".pipe always returns the piped stream: %s", rs.pipe(s1) === s1
);

s1.pipe(s2);
I would suggest reading more:
https://github.com/substack/stream-handbook
http://maxogden.com/node-streams.html
https://github.com/maxogden/mississippi
All streams are instances of EventEmitter (docs); that is where the .on method comes from.
Regarding the second question, you MUST pipe from the Stream object (request in this case). The "data" event emits data as a Buffer or a String (the "chunk" argument in the event listener), not a stream.
Manipulating streams is usually done by implementing a Transform stream (docs). There are many npm packages available that make this process simpler (like through2-map), but in reality they produce Transform streams under the hood.
Consider the following:
var http = require('http');
var map = require('through2-map');

// Transform stream to uppercase
var uc = map(function(ch) {
  return ch.toString().toUpperCase();
});

var server = http.createServer(function(request, response) {
  // pipe from the request to our transform stream
  request
    .pipe(uc)
    // pipe from the transform stream to the response
    .pipe(response);
});

server.listen(8000);
You can test by running curl:
$ curl -X POST -d 'foo=bar' http://localhost:8000
# logs FOO=BAR
I'm learning about Node.js in a hurry by trying to recreate some utility applications that I wrote once in C#. I'm sort of confused about what's happening in one of the programs I'm writing. What I want this program to do is open a file on the filesystem, gzip that file, then send the gzipped data off to another server. For testing, I settled on "save the gzipped data to the filesystem, then read it back to verify you got it right."
So what I expect my program to do is:
1. Create the output file.
2. Print "Created test output file."
3. Print "Attempting to re-read the data."
4. Read the file.
5. Output the plain text to the console.
What happens instead:
1. The output file is created.
2. "Attempting to re-read the data." is printed.
3. "Created test output file." is printed.
4. I get no console output.
Here is my code (don't throw stones at me if my style is terrible please!):
var inputFilePath = "data.xml";

var fs = require('fs');
var zlib = require('zlib');

var inputStream = fs.createReadStream(inputFilePath)
  .on('end', function() {
    console.log("Created test output file.");
  });

var gzipStream = zlib.createGzip();
inputStream.pipe(gzipStream);

var outputFile = fs.createWriteStream('output.gzip');
gzipStream.pipe(outputFile);

console.log('Attempting to re-read the data.');
var testInput = fs.createReadStream('output.gzip');
var unzip = zlib.createGunzip();
testInput.pipe(unzip).on('data', function(chunk) {
  console.log(chunk.toString('utf8'));
});
I'm suspicious the streaming stuff happens asynchronously, which is why I see the console output in a different order than I expected. If that's true, it's a problem as I need the output file to be ready before I try to open it. I noticed writable.end() and thought that would be the solution, but I don't understand what happens when I call it in various places:
- If I call outputFile.end() before trying to read the file, the output file is empty. I guess this is because it closes the stream before a write happens.
- If I call gzipStream.end() instead, I don't get the "Created test output file" message, but one is created!
Edit:
I figured out the second part. In the given code, I should be setting up the handler for the data event from the unzip stream before I call pipe(). The way the code is set up now, I'm setting up to handle the event on a stream returned by the pipe, but never reading from THAT stream so my handler is never called.
I stumbled upon this solution. It looks like conceptually the way to express that I want an order is to say, "When there is no more data in the input stream, start processing the output file it generated." Simplifying greatly:
var inputFilePath = "data.xml";

var fs = require('fs');
var zlib = require('zlib');

function testOutput() {
  console.log('Attempting to re-read the data.');
  var testInput = fs.createReadStream('output.gzip');
  var unzip = zlib.createGunzip();
  unzip.on('data', function(chunk) {
    console.log(chunk.toString('utf8'));
  });
  testInput.pipe(unzip);
}

// Load input file stream
var inputStream = fs.createReadStream(inputFilePath);

// Pipe the file stream into a gzip stream
var gzipStream = zlib.createGzip();
gzipStream.on('end', function() {
  console.log("Created test output file.");
  testOutput();
});

var outputFile = fs.createWriteStream('output.gzip');
inputStream.pipe(gzipStream).pipe(outputFile);
The pipe() calls drain inputStream and then gzipStream; when gzipStream ends, testOutput() is invoked.
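One alternative worth mentioning (a sketch, not part of the original solution): instead of listening for 'end' on the gzip stream, you could wait for 'finish' on the file's write stream, which fires only after the data has actually been flushed to output.gzip:

var outputFile = fs.createWriteStream('output.gzip');
outputFile.on('finish', function() {
  // the file is fully written at this point
  console.log("Created test output file.");
  testOutput();
});
inputStream.pipe(gzipStream).pipe(outputFile);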