nodejs appending data to the end of a readstream

I would like to read a .txt file, append data to the end, and finally send it to a zipstream. Right now I'm writing a new file and then using that new file for zipstream, but I would like to do it on the fly, without creating an unnecessary intermediate file.
My question is how to create a read stream, modify it, and send it on to another stream (maybe with a write stream in the middle).
Is this possible?
The original idea was this one, but I'm lost somewhere in the middle:
var fs = require('fs');
var zipstream = require('zipstream');
var Stream = require('stream');

var zipOut = fs.createWriteStream('file.zip');
var zip = zipstream.createZip({ level: 1 });
zip.pipe(zipOut);

var rs = fs.createReadStream('file.txt');
var newRs = new Stream(); // << Here should be an in/out stream??
newRs.pipe = function(dest) {
    dest.write(rs.read());
    dest.write("New text at the end");
};

zip.addEntry(newRs, { name: 'file.txt' }, function() {
    zip.finalize();
});

You can implement a transform stream (a subclass of stream.Transform). In its _flush method you have the opportunity to output any additional content you want once the input has reached its end. Such a transform can be piped between a readable stream and a writable one. Refer to Node's stream module documentation for implementation details.
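A minimal sketch of that approach for the original question, assuming the simplified Transform constructor available in current Node versions (the appended text is just a placeholder):

var fs = require('fs');
var { Transform } = require('stream');

// Forwards file data untouched, then appends extra text once the input ends.
var appendOnEnd = new Transform({
    transform(chunk, encoding, callback) {
        callback(null, chunk);            // pass the original data through
    },
    flush(callback) {
        this.push('New text at the end'); // runs after the last input chunk
        callback();
    }
});

// The combined stream can be handed to zipstream instead of writing a temp file:
// zip.addEntry(fs.createReadStream('file.txt').pipe(appendOnEnd), { name: 'file.txt' }, function() {
//     zip.finalize();
// });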

Related

Please tell me how to print space in a file after stream pipe works completely in nodeJS

I have a simple piece of code which creates a readable stream and pipes it to a writable stream while counting the spaces in the file.
My logic seems to be right. The problem is that the count is printed before the spaces have been calculated.
var fs = require('fs');
var inStream = fs.createReadStream('data.txt');
var outStream = fs.createWriteStream('out.txt');
var printStream = process.stdout;
var space = 0;

var upperStream = new Transform({
    transform(chunk, enc, cb) {
        var text = chunk.toString().toUpperCase();
        for (var i = 0; i < text.length; i++)
            if (text[i] == ' ')
                space++;
        this.push(text);
        cb();
    }
});
inStream.pipe(upperStream).pipe(printStream);
console.log("Number of space in file : ",space);
data.txt contains 'Standard error'
This is because your space-counting code runs asynchronously, while the console.log runs immediately. You should handle the inStream 'end' event, which is emitted once all data has been read from inStream. Only after the data has been fully consumed from inStream, counted, and written to printStream will you get the actual space count.
Also, you have missed requiring Transform from the stream module.
var { Transform } = require('stream');
var fs = require('fs');
var inStream = fs.createReadStream('data.txt');
var outStream = fs.createWriteStream('out.txt');
var printStream = process.stdout;
var space = 0;

var upperStream = new Transform({
    transform(chunk, enc, cb) {
        var text = chunk.toString().toUpperCase();
        for (var i = 0; i < text.length; i++)
            if (text[i] == ' ')
                space++;
        this.push(text);
        cb();
    }
});
inStream.pipe(upperStream).pipe(printStream);
inStream.on('end', () => console.log(`Number of space in file : ${space}`));
Read more about stream events here: https://nodejs.org/api/stream.html

Using stream-combiner and Writable Streams (stream-adventure)

I'm working on nodeschool.io's stream-adventure. The challenge:
Write a module that returns a readable/writable stream using the
stream-combiner module. You can use this code to start with:
var combine = require('stream-combiner')

module.exports = function () {
    return combine(
        // read newline-separated json,
        // group books into genres,
        // then gzip the output
    )
}
Your stream will be written a newline-separated JSON list of science fiction
genres and books. All the books after a "type":"genre" row belong in that
genre until the next "type":"genre" comes along in the output.
{"type":"genre","name":"cyberpunk"}
{"type":"book","name":"Neuromancer"}
{"type":"book","name":"Snow Crash"}
{"type":"genre","name":"space opera"}
{"type":"book","name":"A Deepness in the Sky"}
{"type":"book","name":"Void"}
Your program should generate a newline-separated list of JSON lines of genres,
each with a "books" array containing all the books in that genre. The input
above would yield the output:
{"name":"cyberpunk","books":["Neuromancer","Snow Crash"]}
{"name":"space opera","books":["A Deepness in the Sky","Void"]}
Your stream should take this list of JSON lines and gzip it with
zlib.createGzip().
HINTS
The stream-combiner module creates a pipeline from a list of streams,
returning a single stream that exposes the first stream as the writable side and
the last stream as the readable side like the duplexer module, but with an
arbitrary number of streams in between. Unlike the duplexer module, each
stream is piped to the next. For example:
var combine = require('stream-combiner');
var stream = combine(a, b, c, d);
will internally do a.pipe(b).pipe(c).pipe(d) but the stream returned by
combine() has its writable side hooked into a and its readable side hooked
into d.
As in the previous LINES adventure, the split module is very handy here. You
can put a split stream directly into the stream-combiner pipeline.
Note that split can send empty lines too.
If you end up using split and stream-combiner, make sure to install them
into the directory where your solution file resides by doing:
`npm install stream-combiner split`
Note: when you test the program, the source stream is automatically inserted into the program, so it's perfectly fine to have split() as the first parameter in combine(split(), etc., etc.)
I'm trying to solve this challenge without using the 'through' package.
My code:
var combiner = require('stream-combiner');
var stream = require('stream');
var split = require('split');
var zlib = require('zlib');

module.exports = function() {
    var ws = new stream.Writable({ decodeStrings: false });

    function ResultObj() {
        name: '';
        books: [];
    }

    ws._write = function(chunk, enc, next) {
        if (chunk.length === 0) {
            next();
        }
        chunk = JSON.parse(chunk);
        if (chunk.type === 'genre') {
            if (currentResult) {
                this.push(JSON.stringify(currentResult) + '\n');
            }
            var currentResult = new ResultObj();
            currentResult.name = chunk.name;
        } else {
            currentResult.books.push(chunk.name);
        }
        next();
        var wsObj = this;
        ws.end = function(d) {
            wsObj.push(JSON.stringify(currentResult) + '\n');
        }
    }

    return combiner(split(), ws, zlib.createGzip());
}
My code does not work and returns 'Cannot pipe. Not readable'. Can someone point out where I'm going wrong?
Any other comments on how to improve are welcome too...
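The 'Cannot pipe. Not readable' error comes from putting a plain Writable in the middle of the combiner pipeline: stream-combiner tries to pipe ws into the gzip stream, and a Writable has no readable side. Below is a sketch of the grouping stage built on a Transform instead, using Node's simplified constructor and no 'through' package; it is one possible shape, not necessarily the official stream-adventure solution.

var combiner = require('stream-combiner');
var { Transform } = require('stream');
var split = require('split');
var zlib = require('zlib');

module.exports = function () {
    var currentGenre = null;

    // A Transform is both writable and readable, so combiner can pipe it onward to gzip.
    var grouper = new Transform({
        transform(line, enc, next) {
            if (line.length === 0) return next(); // split() can emit empty lines
            var row = JSON.parse(line);
            if (row.type === 'genre') {
                if (currentGenre) this.push(JSON.stringify(currentGenre) + '\n');
                currentGenre = { name: row.name, books: [] };
            } else {
                currentGenre.books.push(row.name);
            }
            next();
        },
        flush(done) {
            // Emit the final genre once the input ends.
            if (currentGenre) this.push(JSON.stringify(currentGenre) + '\n');
            done();
        }
    });

    return combiner(split(), grouper, zlib.createGzip());
};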

How to redirect a stream to other stream depending on data in first chunk?

I'm processing files in a multipart form with Busboy. The process in simplified version looks like this:
file.pipe(filePeeker).pipe(gzip).pipe(encrypt).pipe(uploadToS3)
filePeeker is a through-stream (built with through2). This duplex stream does the following:
Identify filetype by looking at first bytes in first chunk
Accumulating file size
Calculating MD5 hash
After the first four bytes of the first chunk I know whether the file is a zip file. If that is the case, I want to redirect the file to a completely different stream. In the new stream the compressed files will be unzipped and then handled separately, following the same concept as the original file.
How can I accomplish this?
OriginalProcess:
file.pipe(filePeeker).if(!zipFile).pipe(gZip).pipe(encrypt).pipe(uploadToS3)
UnZip-process
file.pipe(filePeeker).if(zipFile).pipe(streamUnzip).pipeEachNewFile(originalProcess).
Thanks
//Michael
There are modules for that, but the basic idea would be to push to another readable stream and return early in your conditional. Write a Transform stream for it.
var Transform = require("stream").Transform;
var util = require("util");
var Readable = require('stream').Readable;

// Side channel: zip data is pushed onto this readable and piped into the unzip pipeline.
var rs = new Readable();
rs._read = function () {}; // no-op; data is pushed in manually from the transform below
rs.pipe(unzip());          // unzip() stands in for the separate unzip pipeline

function BranchStream() {
    Transform.call(this);
}
util.inherits(BranchStream, Transform);

BranchStream.prototype._transform = function (chunk, encoding, done) {
    if (isZip(chunk)) {      // isZip() is a placeholder for the magic-byte check
        rs.push(chunk);      // divert the chunk to the side channel
        return done();
    }
    this.push(doSomethingElseTo(chunk)); // normal path stays in this stream
    return done();
};
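Plugged into the pipeline from the question (gzip, encrypt and uploadToS3 are the question's own streams; rs above is the hypothetical zip side channel), usage would look roughly like:

file.pipe(new BranchStream()) // zip chunks are diverted to rs inside _transform
    .pipe(gzip)
    .pipe(encrypt)
    .pipe(uploadToS3);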

write base64 to file using stream

I am sending a base64 string to my server. On the server I want to create a readable stream, push the base64 chunks onto it, and pipe it to a writable stream that writes to a file. My problem is that only the first chunk is written to the file. My guess is that creating a new buffer for each chunk is causing the problem, but if I push the raw string chunks without creating a buffer, the image file is corrupt.
var readable = new stream.Readable();
readable._read = function() {};

req.on('data', function(data) {
    var dataText = data.toString();
    var dataMatch = dataText.match(/^data:([A-Za-z-+\/]+);base64,(.+)$/);
    var bufferData = null;
    if (dataMatch) {
        bufferData = new Buffer(dataMatch[2], 'base64');
    } else {
        bufferData = new Buffer(dataText, 'base64');
    }
    readable.push(bufferData);
});

req.on('end', function() {
    readable.push(null);
});
This is not as trivial as you might think:
1. Use a Transform, not a Readable. You can pipe the request stream into the transform, thus handling back-pressure.
2. You can't use regular expressions, because the text you are expecting can be broken across two or more chunks. You could try to accumulate chunks and run the regular expression each time, but if the format of the stream is incorrect (that is, not a data URI) you will end up buffering the whole request and running the regular expression many times over a megabytes-long string.
3. You can't take an arbitrary chunk and do new Buffer(chunk, 'base64'), because it may not be valid base64 on its own. Example: new Buffer('AQID', 'base64') yields new Buffer([1, 2, 3]), but Buffer.concat([new Buffer('AQ', 'base64'), new Buffer('ID', 'base64')]) yields new Buffer([1, 32]).
For the third problem you can use one of the available modules (like base64-stream). Here is an example:
var base64 = require('base64-stream');
var stream = require('stream');
var decoder = base64.decode();
var input = new stream.PassThrough();
var output = new stream.PassThrough();
input.pipe(decoder).pipe(output);
output.on('data', function (data) {
    console.log(data);
});
input.write('AQ');
input.write('ID');
You can see that it buffers input and emits data as soon as enough has arrived.
As for the second problem, you need to implement a simple stream parser. As an idea: wait for the data: prefix, then buffer chunks (if you need them) until ;base64, is found, then pipe the rest to base64-stream.
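A rough sketch of that parser idea (the helper name and the output file name are just illustrative; error handling for malformed input is omitted):

var { Transform } = require('stream');
var base64 = require('base64-stream');

// Buffers input until the ";base64," marker is seen, then passes everything
// after the marker through unchanged so it can be decoded downstream.
function stripDataUriHeader() {
    var headerDone = false;
    var pending = '';
    return new Transform({
        transform(chunk, enc, cb) {
            if (headerDone) {
                this.push(chunk);
                return cb();
            }
            pending += chunk.toString();
            var idx = pending.indexOf(';base64,');
            if (idx !== -1) {
                headerDone = true;
                this.push(pending.slice(idx + ';base64,'.length));
                pending = '';
            }
            // Until the marker arrives we keep buffering; a real parser should
            // cap this buffer and fail fast on input that is not a data URI.
            cb();
        }
    });
}

// Usage (file name is illustrative):
// req.pipe(stripDataUriHeader()).pipe(base64.decode()).pipe(fs.createWriteStream('upload.png'));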

How do I ensure a Node.js function only runs once a stream has ended?

I'm learning about Node.js in a hurry by trying to recreate some utility applications that I wrote once in C#. I'm sort of confused about what's happening in one of the programs I'm writing. What I want this program to do is open a file on the filesystem, gzip that file, then send the gzipped data off to another server. For testing, I settled on "save the gzipped data to the filesystem, then read it back to verify you got it right."
So what I expect my program to do is:
Create the output file.
Print "Created test output file".
Print "Attempting to re-read the data."
Read the file.
Output the plain text to the console.
What happens instead:
The output file is created.
"Attempting to re-read the data." is printed.
"Created test output file." is printed.
I get no console output.
Here is my code (don't throw stones at me if my style is terrible please!):
var inputFilePath = "data.xml";
var fs = require('fs');
var zlib = require('zlib');

var inputStream = fs.createReadStream(inputFilePath)
    .on('end', function() {
        console.log("Created test output file.");
    });

var gzipStream = zlib.createGzip();
inputStream.pipe(gzipStream);

var outputFile = fs.createWriteStream('output.gzip');
gzipStream.pipe(outputFile);

console.log('Attempting to re-read the data.');
var testInput = fs.createReadStream('output.gzip');
var unzip = zlib.createGunzip();
testInput.pipe(unzip).on('data', function(chunk) {
    console.log(chunk.toString('utf8'));
});
I'm suspicious the streaming stuff happens asynchronously, which is why I see the console output in a different order than I expected. If that's true, it's a problem as I need the output file to be ready before I try to open it. I noticed writable.end() and thought that would be the solution, but I don't understand what happens when I call it in various places:
If I call outputFile.end() before trying to read the file, the output file is empty. I guess this is because it closes the stream before a write happens.
If I call gzipStream.end() instead, I don't get the "Created test output file" message, but one is created!
edit
I figured out the second part. In the given code, I should be setting up the handler for the data event from the unzip stream before I call pipe(). The way the code is set up now, I'm setting up to handle the event on a stream returned by the pipe, but never reading from THAT stream so my handler is never called.
I stumbled upon this solution. It looks like conceptually the way to express that I want an order is to say, "When there is no more data in the input stream, start processing the output file it generated." Simplifying greatly:
var inputFilePath = "data.xml";
var fs = require('fs');
var zlib = require('zlib');

function testOutput() {
    console.log('Attempting to re-read the data.');
    var testInput = fs.createReadStream('output.gzip');
    var unzip = zlib.createGunzip();
    unzip.on('data', function(chunk) {
        console.log(chunk.toString('utf8'));
    });
    testInput.pipe(unzip);
}

// Load input file stream
var inputStream = fs.createReadStream(inputFilePath);

// Pipe the file stream into a gzip stream
var gzipStream = zlib.createGzip();
gzipStream.on('end', function() {
    console.log("Created test output file.");
    testOutput();
});

var outputFile = fs.createWriteStream('output.gzip');
inputStream.pipe(gzipStream).pipe(outputFile);
The pipe() calls drain inputStream and then gzipStream; when gzipStream emits 'end', testOutput() is invoked.
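One caveat worth hedging: gzipStream's 'end' event says the gzipped data has been fully read, not that outputFile has finished flushing it to disk. Listening for the write stream's 'finish' event avoids that race; a minimal variant of the same setup:

var outputFile = fs.createWriteStream('output.gzip');
inputStream.pipe(gzipStream).pipe(outputFile);

// 'finish' fires only after every gzipped chunk has been written to the file,
// so output.gzip is guaranteed to be complete before it is re-opened.
outputFile.on('finish', function() {
    console.log("Created test output file.");
    testOutput();
});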
