file upload with nodejs koa co-busboy

I use co-busboy to parse file fields when uploading files with Koa. The official example looks like this:
var parse = require('co-busboy');
var parts = parse(this);
var part;
while (part = yield parts) {
  if (!part.length) // it is a stream
    part.pipe(fs.createWriteStream('some file.txt'));
}
For some reason, I want to save all of the "part" streams into an array and perform the actual file writing once all of the streams have been fetched, i.e.:
var parse = require('co-busboy');
var parts = parse(this);
var part;
var partArray = [];
var cnt = 0;
while (part = yield parts) {
  partArray[cnt++] = part;
}
// after some processing, I perform the writing
for (var i = 0; i < partArray.length; i++) {
  partArray[i].pipe(fs.createWriteStream('some file.txt'));
}
The problem is that the while loop just does not continue. It seems that if I do not call part.pipe, the loop will halt.
So how can I make the while loop continue?
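co-busboy will not hand over the next part until the current file stream has been drained, which is why the loop halts when nothing consumes the stream. A minimal sketch of one workaround, not taken from the question, assuming co can yield promises and the uploads are small enough to buffer in memory: drain each part inside the loop with a hypothetical collect() helper, then do the writing afterwards.
var parse = require('co-busboy');
var fs = require('fs');

// Hypothetical helper: buffer an entire part stream into memory.
function collect(stream) {
  return new Promise(function (resolve, reject) {
    var chunks = [];
    stream.on('data', function (chunk) { chunks.push(chunk); });
    stream.on('end', function () { resolve(Buffer.concat(chunks)); });
    stream.on('error', reject);
  });
}

// Inside the same generator middleware as the question's code:
var parts = parse(this);
var part;
var buffered = [];
while (part = yield parts) {
  if (!part.length) {
    // Draining the stream here is what lets the next `yield parts` resolve.
    buffered.push(yield collect(part));
  }
}
// After some processing, perform the writing.
buffered.forEach(function (data, i) {
  fs.writeFileSync('some file ' + i + '.txt', data);
});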

Related

Nodejs stream.pipe performing asynchronously

I have a large XML file which is a combination of XML documents. I'm trying to use a Node.js XML splitter and respond with the number of documents I have found.
My code looks something like this. I'm looking to get the number of documents outside the function (in the last line). Is there something I can do to achieve this?
var XmlSplit = require('./xmlsplitter.js');
const fs = require('fs');

var xmlsplit = new XmlSplit();
var no_of_docs = 0;
var inputStream = fs.createReadStream('./files/input/test.xml');

inputStream.pipe(xmlsplit).on('data', function(data, callback) {
  var xmlDocument = data.toString();
  no_of_docs = no_of_docs + 1;
});

inputStream.pipe(xmlsplit).on('end', function() {
  console.log('Stream ended');
  console.log(no_of_docs); // <-- This prints the correct value but the value is lost as soon as we exit this.
});

console.log("This is outside the function " + no_of_docs); // <-- I need the value here.

node.js Get file through http request in chunks, then join them

I am using the Range header to download an MP4 file in parts, like so:
const chunkFile = fs.createWriteStream('5a52e9ba-9328-11ec-b909-0242ac120002.mp4.chunk');
const downloadRequest = https.get({
  ...
  range: 'bytes=0-10000'
}, response => {
  ...
  response.pipe(chunkFile);
});
I do this in a loop for all ranges and end up with a bunch of chunk files in a directory (I have simplified it for the sake of the question). Then I join the chunks back into one file like so:
function joinChunks({chunkHashes, fileName}) {
  const outputFile = fs.createWriteStream(fileName);
  for (let i = 0, {length} = chunkHashes; i < length; i++) {
    const chunkData = fs.readFileSync(chunkHashes[i]);
    outputFile.write(chunkData);
    fs.unlinkSync(chunkHashes[i]);
  }
}
Was I naive to think it would work like that? The resulting file is the same size, but it's broken. Is there a way to store the raw data from the HTTP responses in chunk files on disk, and then join that data to end up with the original file?
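Two things worth checking with this pattern: https.get is asynchronous, so if the loop does not wait for each response, the chunk files may not be fully flushed before joinChunks runs; and the Range end index is inclusive, so adjacent ranges like bytes=0-10000 and bytes=10000-20000 overlap by one byte. A sketch, not the asker's actual loop (downloadChunk and downloadInChunks are hypothetical names, and the Range header is passed under headers), that fetches ranges sequentially and waits for each chunk file's 'finish' event before moving on:
const fs = require('fs');
const https = require('https');

// Download one byte range and resolve once the chunk file is fully on disk.
// Note: the Range end index is inclusive, so 'bytes=0-9999' is 10000 bytes.
function downloadChunk(url, start, end, chunkPath) {
  return new Promise((resolve, reject) => {
    const chunkFile = fs.createWriteStream(chunkPath);
    https.get(url, { headers: { Range: `bytes=${start}-${end}` } }, response => {
      response.pipe(chunkFile);
      chunkFile.on('finish', resolve); // wait until the data is flushed
      response.on('error', reject);
    }).on('error', reject);
  });
}

// Fetch ranges one after another so chunks are complete and non-overlapping.
async function downloadInChunks(url, totalSize, chunkSize, fileName) {
  const chunkPaths = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, totalSize - 1);
    const chunkPath = `${fileName}.${start}.chunk`;
    await downloadChunk(url, start, end, chunkPath);
    chunkPaths.push(chunkPath);
  }
  return chunkPaths;
}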

Using stream-combiner and Writable Streams (stream-adventure)

I'm working on nodeschool.io's stream-adventure. The challenge:
Write a module that returns a readable/writable stream using the
stream-combiner module. You can use this code to start with:
var combine = require('stream-combiner')

module.exports = function () {
  return combine(
    // read newline-separated json,
    // group books into genres,
    // then gzip the output
  )
}
Your stream will be written a newline-separated JSON list of science fiction
genres and books. All the books after a "type":"genre" row belong in that
genre until the next "type":"genre" comes along in the output.
{"type":"genre","name":"cyberpunk"}
{"type":"book","name":"Neuromancer"}
{"type":"book","name":"Snow Crash"}
{"type":"genre","name":"space opera"}
{"type":"book","name":"A Deepness in the Sky"}
{"type":"book","name":"Void"}
Your program should generate a newline-separated list of JSON lines of genres,
each with a "books" array containing all the books in that genre. The input
above would yield the output:
{"name":"cyberpunk","books":["Neuromancer","Snow Crash"]}
{"name":"space opera","books":["A Deepness in the Sky","Void"]}
Your stream should take this list of JSON lines and gzip it with
zlib.createGzip().
HINTS
The stream-combiner module creates a pipeline from a list of streams,
returning a single stream that exposes the first stream as the writable side and
the last stream as the readable side like the duplexer module, but with an
arbitrary number of streams in between. Unlike the duplexer module, each
stream is piped to the next. For example:
var combine = require('stream-combiner');
var stream = combine(a, b, c, d);
will internally do a.pipe(b).pipe(c).pipe(d) but the stream returned by
combine() has its writable side hooked into a and its readable side hooked
into d.
As in the previous LINES adventure, the split module is very handy here. You
can put a split stream directly into the stream-combiner pipeline.
Note that split can send empty lines too.
If you end up using split and stream-combiner, make sure to install them
into the directory where your solution file resides by doing:
`npm install stream-combiner split`
Note: when you test the program, the source stream is automatically inserted into the program, so it's perfectly fine to have split() as the first parameter in combine(split(), etc., etc.)
I'm trying to solve this challenge without using the 'through' package.
My code:
var combiner = require('stream-combiner');
var stream = require('stream');
var split = require('split');
var zlib = require('zlib');

module.exports = function() {
  var ws = new stream.Writable({decodeStrings: false});

  function ResultObj() {
    name: '';
    books: [];
  }

  ws._write = function(chunk, enc, next) {
    if(chunk.length === 0) {
      next();
    }
    chunk = JSON.parse(chunk);
    if(chunk.type === 'genre') {
      if(currentResult) {
        this.push(JSON.stringify(currentResult) + '\n');
      }
      var currentResult = new ResultObj();
      currentResult.name = chunk.name;
    } else {
      currentResult.books.push(chunk.name);
    }
    next();

    var wsObj = this;
    ws.end = function(d) {
      wsObj.push(JSON.stringify(currentResult) + '\n');
    }
  }

  return combiner(split(), ws, zlib.createGzip());
}
My code does not work and returns 'Cannot pipe. Not readable'. Can someone point out to me where I'm going wrong?
Any other comments on how to improve it are welcome too.
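The 'Cannot pipe. Not readable' error comes from using a plain Writable in the middle of the pipeline: stream-combiner needs to pipe it into the gzip stream, and a Writable has no readable side. One direction that still avoids the 'through' package is stream.Transform, which is both writable and readable. A sketch along those lines, not the official solution (grouper and current are names chosen here):
var combiner = require('stream-combiner');
var stream = require('stream');
var split = require('split');
var zlib = require('zlib');

module.exports = function () {
  var current = null;

  // A Transform stream is both writable and readable, so stream-combiner
  // can pipe split() into it and pipe its output on to the gzip stream.
  var grouper = new stream.Transform();

  grouper._transform = function (line, enc, next) {
    if (line.length === 0) return next(); // split() can emit empty lines
    var row = JSON.parse(line);
    if (row.type === 'genre') {
      if (current) this.push(JSON.stringify(current) + '\n');
      current = { name: row.name, books: [] };
    } else {
      current.books.push(row.name);
    }
    next();
  };

  grouper._flush = function (done) {
    if (current) this.push(JSON.stringify(current) + '\n');
    done();
  };

  return combiner(split(), grouper, zlib.createGzip());
};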

wait for previous stream to be empty before allowing reading

Say I have a file that contains a list of integers, one per line. I use fs.createReadStream and pipe that into split (so that each chunk is an integer). Then I pipe that into a duplex stream that is supposed to add the numbers and write the sum by piping into fs.createWriteStream.
var fs = require('fs');
var stream = require('stream');
var split = require('split');

var addIntegers = new stream.Duplex();
addIntegers.sum = 0;

addIntegers._read = function(size) {
  this.push(this.sum + '\n');
};

addIntegers._write = function(chunk, encoding, done) {
  this.sum += +chunk;
  done();
};

fs.createReadStream('list-of-integers.txt')
  .pipe(split())
  .pipe(addIntegers)
  .pipe(fs.createWriteStream('sum.txt'));
When I run this, sum.txt just gets continually filled with zeroes and the program never terminates (as expected). How do I wait for the input stream (split) to be empty before allowing the output stream (fs.createWriteStream) to read from addIntegers?
I figured it out.
I decided to use a Transform stream instead (thanks mscdex) because it has a method (_flush) that gets called after all the written data has been consumed. The working code is below. Don't forget to npm i split :)
var fs = require('fs');
var stream = require('stream');
var split = require('split');

var addIntegers = new stream.Transform();
addIntegers.sum = 0;

addIntegers._transform = function(chunk, encoding, done) {
  this.sum += +chunk;
  done();
};

addIntegers._flush = function(done) {
  this.push(this.sum + '\n');
  done(); // signal that flushing is complete so the stream can end
};

fs.createReadStream('list-of-integers.txt')
  .pipe(split())
  .pipe(addIntegers)
  .pipe(fs.createWriteStream('sum.txt'));

Stream and transform a file in place with nodejs

I'd like to do something like:
var fs = require('fs');
var through = require('through');

var file = 'path/to/file.json';
var input = fs.createReadStream(file, 'utf8');
var output = fs.createWriteStream(file, 'utf8');
var buf = '';

input
  .pipe(through(function data(chunk) { buf += chunk; }, function end() {
    var data = JSON.parse(buf);
    // Do some transformation on the obj, and then...
    this.queue(JSON.stringify(data, null, ' '));
  }))
  .pipe(output);
But this fails because it's trying to read from and write to the same destination. There are ways around it, like only piping to output from within the end callback above.
Is there a better way? By better, I mean one that uses less code or less memory. And yes, I'm aware that I could just do:
var fs = require('fs');
var file = 'path/to/file.json';
var str = fs.readFileSync(file, 'utf8');
var data = JSON.parse(str);
// Do some transformation on the obj, and then...
fs.writeFileSync(file, JSON.stringify(data, null, ' '), 'utf8');
There is no way for your code to use less memory, because you need the whole file in order to parse it into a JavaScript object. In that respect, both versions of your code are equivalent memory-wise. If you can do some of the work without needing the full JSON object, check out JSONStream.
In your example, you should read the file, then parse and transform it, then write the result back to the file; however, you shouldn't use the synchronous versions of those functions; see the end of this paragraph of the Node.js documentation:
In busy processes, the programmer is strongly encouraged to use the asynchronous versions of these calls. The synchronous versions will block the entire process until they complete--halting all connections.
Anyway, I don't think you can read from a file while you're overwriting it. See this particular answer to the same problem.
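For completeness, a sketch of the asynchronous read, parse, transform and write version that this answer recommends, using fs.promises (transformFile is a hypothetical name and the transformation itself is still a placeholder):
const fs = require('fs');

const file = 'path/to/file.json';

async function transformFile(path) {
  // Read and parse the whole file; the full object is needed either way.
  const str = await fs.promises.readFile(path, 'utf8');
  const data = JSON.parse(str);
  // Do some transformation on the obj, and then...
  await fs.promises.writeFile(path, JSON.stringify(data, null, ' '), 'utf8');
}

transformFile(file).catch(console.error);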
