Node.js file system: end event not called for readable stream

I'm trying to extract a .tar file (packed from a directory) and then check the names of the files in the extracted directory. I'm using tar-fs to extract the tar file and fs.createReadStream to manipulate the data. Here's what I've got so far:
fs.createReadStream(req.files.file.path)
  .pipe(tar.extract(req.files.file.path + '0'))
  .on('error', function() {
    errorMessage = 'Failed to extract file. Please make sure to upload a tar file.';
  })
  .on('entry', function(header, stream, callback) {
    console.error(header);
    stream.on('end', function() {
      console.error("this is working");
    });
  })
  .on('end', function() {
    // this one did not get called
    console.error('end');
  });
I was hoping to extract the whole folder and then check the file names. Well, I haven't gotten that far yet.
To my understanding, I get a readable stream back after the pipe, and a readable stream has an end event. My question is: why is the end event in the code not called?
Thanks!

Listen for the finish event instead: pipe() returns the destination stream (here the tar-fs extract stream), which is a writable stream, and writable streams emit finish rather than end. It is fired when end() has been called and processing of the entry is finished. More on it in the Node.js stream documentation.
.on('finish', function() {
  console.error('end');
})
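Putting it together, a minimal sketch of the pipeline from the question with finish attached; the fs.readdir call at the end is an assumption about how you might check the extracted file names:

var fs = require('fs');
var tar = require('tar-fs');

var extractPath = req.files.file.path + '0'; // same target path as in the question

fs.createReadStream(req.files.file.path)
  .pipe(tar.extract(extractPath))
  .on('error', function() {
    errorMessage = 'Failed to extract file. Please make sure to upload a tar file.';
  })
  .on('finish', function() {
    // The writable side is done: everything has been unpacked.
    fs.readdir(extractPath, function(err, names) {
      if (err) throw err;
      console.log('extracted files:', names); // check the file names here
    });
  });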

Related

Read remote file into Duplex NodeJS stream, then write the stream data into form-data upload

I am reading from a remote file on an SFTP server. The method to read the file takes a writable stream, and writes the file data onto that stream before returning.
Once complete, I am passing this stream to form.append from the form-data library, in order to upload that file data to an API.
The stream is declared like this:
// assuming something like: const duplex = require('stream').Duplex;
const stream = new duplex({
  write(chunk, encoding, callback) {
    console.log('wrote chunk');
    callback();
  },
  read() {
    console.log(`Read method called`);
  }
});
When the write method is called during the SFTP access, 'wrote chunk' is logged multiple times. When the file upload using form-data is called, 'Read method called' is logged once. After that, the HTTP request never completes.
I suspect my implementation of the read method on the Duplex stream is wrong, and that the full file data is never read properly from the stream.
Is there a way I can validate what my problem is here or am I fundamentally misunderstanding how streams operate?
I have tried adding the following to the Duplex stream before it is passed to the form-data function, but none of it is ever called.
stream.on('data', () => {
  console.log(`Read bytes of data.`);
});
stream.on('end', () => {
  console.log('There will be no more data.');
});
stream.on('error', () => {
  console.log('error');
});

Running function once inside node stream pipe chain

I'm using vinyl-fs to write a simple pipeline that loads markdown files, converts them to HTML, and saves them to disk. This is all working.
However, in the middle of my pipe() chain, I want to perform an asynchronous task that should be done just once for all my files. My current problem relates to loading a file (and it's important that the file is loaded in the middle of the chain), but it's a problem I find myself stumbling upon all the time.
To solve this problem, I have started to do this:
var fs = require('fs');
var vfs = require('vinyl-fs');
var through2 = require('through2');

vfs.src('*.md').pipe(function() {
  var loaded = false;
  return through2.obj(function(file, enc, cb) {
    if (!loaded) {
      fs.readFile('myfile', function(err, data) {
        // use data for something
        loaded = true;
        cb(null, file);
      });
    } else {
      // passthrough
      cb(null, file);
    }
  });
}());
This feels a bit silly. Am I approaching this all wrong, or is this actually an okay thing to do?
After reading a ton of articles about Node streams, it seems that the best implementation for this is to listen to the stream's finish event and then create a new stream based on the files from the former stream. This allows me to do exactly what I want: stream files through the pipeline until a point where I need to access the array of files for some task, and then continue the pipeline stream afterwards.
Here's what that looks like:
var path = require('path');
var vfs = require('vinyl-fs');
var through = require('through2');

// array for storing file objects
var files = [];

// start the stream
var firstStream = vfs.src("*.jpg")
  // pipe it through a function that saves each file to the array
  .pipe(through.obj(function(file, enc, cb) {
    files.push(file);
    console.log('1: ', path.basename(file.path));
    cb(null, file);
  }))
  // when this stream is done
  .on('finish', function() {
    console.log('FINISH');
    // files will now be full of all files from the stream
    // and you can do whatever you want with them.

    // create a new stream
    var secondStream = through.obj();
    // write the files to the new stream
    files.forEach(function(file) {
      secondStream.write(file);
    });
    // end the stream to make sure the finish
    // event triggers
    secondStream.end();
    // now continue piping
    secondStream.pipe(through.obj(function(file, enc, cb) {
      console.log('2: ', path.basename(file.path));
      cb(null, file);
    }))
    .pipe(vfs.dest("tmp"));
  });
In this scenario, I have 5 JPG images next to my scripts, and the console output will be:
1: IMG_1.JPG
1: IMG_2.JPG
1: IMG_3.JPG
1: IMG_4.JPG
1: IMG_5.JPG
FINISH
2: IMG_1.JPG
2: IMG_2.JPG
2: IMG_3.JPG
2: IMG_4.JPG
2: IMG_5.JPG
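For reference, through2 also accepts a second, flush function that runs exactly once after the last file has passed through, which can express the same buffer-then-continue idea without building a second stream by hand. A minimal sketch, where doSomethingAsync is a hypothetical stand-in for the one-time task:

var vfs = require('vinyl-fs');
var through = require('through2');

var buffered = [];

vfs.src('*.jpg')
  .pipe(through.obj(
    function(file, enc, cb) {
      buffered.push(file); // hold each file until everything has arrived
      cb();
    },
    function(cb) {
      var self = this;
      // runs once, after the last file has been buffered
      doSomethingAsync(function() {
        buffered.forEach(function(file) {
          self.push(file); // re-emit the files downstream
        });
        cb();
      });
    }
  ))
  .pipe(vfs.dest('tmp'));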

How do you use Node.js to stream an MP4 file with ffmpeg?

I've been trying to solve this problem for several days now and would really appreciate any help on the subject.
I'm able to successfully stream an mp4 audio file stored on a Node.js server using fluent-ffmpeg by passing the location of the file as a string and transcoding it to mp3. If I create a file stream from the same file and pass that to fluent-ffmpeg instead, it works for an mp3 input file but not an mp4 file. In the case of the mp4 file, no error is thrown and it claims the stream completed successfully, but nothing plays in the browser. I'm guessing this has to do with the metadata being stored at the end of an mp4 file, but I don't know how to code around this. This is the exact same file that works correctly when its location is passed to ffmpeg rather than the stream. When I try to pass a stream of the mp4 file on S3, again no error is thrown, but nothing streams to the browser. This isn't surprising, as ffmpeg won't work with the local file as a stream, so expecting it to handle the stream from S3 is wishful thinking.
How can I stream the mp4 file from S3 without storing it locally as a file first? How do I get ffmpeg to do this without transcoding the file too? The following is the code I have at the moment, which isn't working. Note that it attempts to pass the S3 file as a stream to ffmpeg, and it's also transcoding it into an mp3, which I'd prefer not to do.
.get(function(req, res) {
  aws.s3(s3Bucket).getFile(s3Path, function (err, result) {
    if (err) {
      return next(err);
    }
    var proc = new ffmpeg(result)
      .withAudioCodec('libmp3lame')
      .format('mp3')
      .on('error', function (err, stdout, stderr) {
        console.log('an error happened: ' + err.message);
        console.log('ffmpeg stdout: ' + stdout);
        console.log('ffmpeg stderr: ' + stderr);
      })
      .on('end', function () {
        console.log('Processing finished !');
      })
      .on('progress', function (progress) {
        console.log('Processing: ' + progress.percent + '% done');
      })
      .pipe(res, {end: true});
  });
});
This is using the knox library when it calls aws.s3... I've also tried writing it using the standard AWS SDK for Node.js, as shown below, but I get the same outcome as above.
var AWS = require('aws-sdk');

var s3 = new AWS.S3({
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.AWS_SECRET_KEY,
  region: process.env.AWS_REGION_ID
});

var fileStream = s3.getObject({
  Bucket: s3Bucket,
  Key: s3Key
}).createReadStream();

var proc = new ffmpeg(fileStream)
  .withAudioCodec('libmp3lame')
  .format('mp3')
  .on('error', function (err, stdout, stderr) {
    console.log('an error happened: ' + err.message);
    console.log('ffmpeg stdout: ' + stdout);
    console.log('ffmpeg stderr: ' + stderr);
  })
  .on('end', function () {
    console.log('Processing finished !');
  })
  .on('progress', function (progress) {
    console.log('Processing: ' + progress.percent + '% done');
  })
  .pipe(res, {end: true});
=====================================
Updated
I placed an mp3 file in the same s3 bucket and the code I have here worked and was able to stream the file through to the browser without storing a local copy. So the streaming issues I face have something to do with the mp4/aac container/encoder format.
I'm still interested in a way to bring the m4a file down from S3 to the Node.js server in its entirety, then pass it to ffmpeg for streaming without actually storing the file in the local file system.
=====================================
Updated Again
I've managed to get the server streaming the file, as mp4, straight to the browser. This half answers my original question. My only issue now is that I have to download the file to a local store first, before I can stream it. I'd still like to find a way to stream from s3 without needing the temporary file.
aws.s3(s3Bucket).getFile(s3Path, function(err, result) {
  result.pipe(fs.createWriteStream(file_location));
  result.on('end', function() {
    console.log('File Downloaded!');
    var proc = new ffmpeg(file_location)
      .outputOptions(['-movflags isml+frag_keyframe'])
      .toFormat('mp4')
      .withAudioCodec('copy')
      .seekInput(offset)
      .on('error', function(err, stdout, stderr) {
        console.log('an error happened: ' + err.message);
        console.log('ffmpeg stdout: ' + stdout);
        console.log('ffmpeg stderr: ' + stderr);
      })
      .on('end', function() {
        console.log('Processing finished !');
      })
      .on('progress', function(progress) {
        console.log('Processing: ' + progress.percent + '% done');
      })
      .pipe(res, {end: true});
  });
});
On the receiving side I just have the following JavaScript in an empty HTML page:
window.AudioContext = window.AudioContext || window.webkitAudioContext;
context = new AudioContext();

function process(Data) {
  source = context.createBufferSource(); // Create Sound Source
  context.decodeAudioData(Data, function(buffer) {
    source.buffer = buffer;
    source.connect(context.destination);
    source.start(context.currentTime);
  });
}

function loadSound() {
  var request = new XMLHttpRequest();
  request.open("GET", "/stream/<audio_identifier>", true);
  request.responseType = "arraybuffer";
  request.onload = function() {
    var Data = request.response;
    process(Data);
  };
  request.send();
}

loadSound();
=====================================
The Answer
The code above under the title 'Updated Again' will stream an mp4 file from S3, via a Node.js server, to a browser without using Flash. It does require that the file be stored temporarily on the Node.js server so that the metadata in the file can be moved from the end of the file to the front. In order to stream without storing the temporary file, you need to actually modify the file on S3 first and make this metadata change. If you have changed the file in this way on S3, then you can modify the code under the title 'Updated Again' so that the result from S3 is piped straight into the ffmpeg constructor, rather than into a file stream on the Node.js server and then providing that file location to ffmpeg, as the code does now. You can change the final 'pipe' command to 'save(location)' to get a version of the mp4 file locally with the metadata moved to the front. You can then upload that new version of the file to S3 and try out the end-to-end streaming. Personally I'm now going to create a task that modifies the files in this way as they are uploaded to S3 in the first place. This allows me to record and stream in mp4 without transcoding or storing a temp file on the Node.js server.
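A minimal sketch of that one-off metadata fix with fluent-ffmpeg, assuming placeholder paths input.mp4 and fixed.mp4; ffmpeg's -movflags faststart option relocates the moov atom to the front of the file, and with the audio codec set to copy there is no transcoding:

var ffmpeg = require('fluent-ffmpeg');

// Remux the file so the moov atom (metadata) sits at the front,
// then upload fixed.mp4 back to S3 for pass-through streaming.
ffmpeg('input.mp4')
  .outputOptions(['-movflags faststart'])
  .audioCodec('copy')
  .on('error', function(err) {
    console.log('an error happened: ' + err.message);
  })
  .on('end', function() {
    console.log('moov atom moved to the front');
  })
  .save('fixed.mp4');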
One of the main issues here is that you cannot seek on a piped stream, so you would have to store the file first. However, if you want to just stream from the beginning you can use a slightly different construction and pipe. Here is an example of the most straightforward way to do it.
// can just create an in-memory read stream without saving
var stream = aws.s3(s3Bucket).getObject(s3Path).createReadStream();

// fluent-ffmpeg supports a readstream as an arg in constructor
var proc = new ffmpeg(stream)
  .outputOptions(['-movflags isml+frag_keyframe'])
  .toFormat('mp4')
  .withAudioCodec('copy')
  //.seekInput(offset) this is a problem with piping
  .on('error', function(err, stdout, stderr) {
    console.log('an error happened: ' + err.message);
    console.log('ffmpeg stdout: ' + stdout);
    console.log('ffmpeg stderr: ' + stderr);
  })
  .on('end', function() {
    console.log('Processing finished !');
  })
  .on('progress', function(progress) {
    console.log('Processing: ' + progress.percent + '% done');
  })
  .pipe(res, {end: true});
In response to this part of the question:
"How can I stream the mp4 file from S3 without storing it locally as a file first? How do I get ffmpeg to do this without transcoding the file too?"
AFAIK, if the moov atom is in the right place in the media file, nothing special is required to stream an S3-hosted mp4 because you can rely on HTTP for that. If the client requests "chunked" encoding, it will get just that: a chunked stream terminated by the end-of-stream marker shown below.
0\r\n
\r\n
By including the chunked header, the client is saying "I want a stream". Under the covers, S3 is just nginx or Apache, isn't it? They both honor the headers.
Test it with the curl CLI as your client...
> User-Agent: curl/7.28.1-DEV
> Host: S3.domain
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: video/mp4
> Expect: 100-continue
You may want to try adding the codecs to the "Content-Type:" header. I don't know, but I don't think it would be required for this type of streaming (the moov atom resolves that).
I had an issue buffering file streams from the S3 file object. The S3 file stream does not have the correct headers set, and it seems it does not implement piping correctly.
I think a better solution is to use the Node.js module called s3-streams. It sets the correct headers and buffers the output so that the stream can be correctly piped to the output response socket. It saves you from having to save the file stream locally before re-streaming it.

How do I stream large files over ssh in Node?

I'm trying to stream the output of a cat command using the ssh2 module, but it just hangs at some point during execution. I'm executing cat there.txt, where there.txt is around 10 MB or so.
For example:
var local = fs.createWriteStream('here.txt');

conn.exec('cat there.txt', function(err, stream) {
  if (err) throw err;
  stream.pipe(local).on('finish', function() { console.log('Done'); });
});
This just completely stops at one point. I've even piped the stream to local stdout, and it just hangs after a while. In my actual code, I pipe it through a bunch of other transform streams so I think this is better than transferring the files to the local system first (the files may get larger than 200MB).
I had just started working with streams recently, so when I was piping the ssh stream through various transform streams, I wasn't ending on a writable stream like I was in my example (I should've included my actual code, sorry!). This caused it to hang. The goal was to execute multiple commands remotely and merge their output, sorted, into a single file.
So my original code was stream.pipe(transformStream), pushing the transformStream to an array once it finished, and then sorting them with the mergesort-stream npm module. Instead of that, I now write the results from the multiple ssh commands (transformed) to temporary files and then sort them all at once.
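A minimal sketch of that "end on a writable stream" fix with ssh2; myTransform, the temp file path, and the connection settings are hypothetical placeholders:

var fs = require('fs');
var Client = require('ssh2').Client;

var conn = new Client();
conn.on('ready', function() {
  conn.exec('cat there.txt', function(err, stream) {
    if (err) throw err;
    stream
      .pipe(myTransform)                            // hypothetical Transform stream
      .pipe(fs.createWriteStream('tmp/part-1.txt')) // end on a writable stream
      .on('finish', function() {
        console.log('Done');                        // safe to sort the temp files now
        conn.end();
      });
  });
}).connect({ host: 'example.com', username: 'user', privateKey: fs.readFileSync('/path/to/key') });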
Try using fs.createReadStream for serving huge files:
fs.exists(correctfilepath, function(exists) {
  if (exists) {
    var readstream = fs.createReadStream(correctfilepath);
    console.log("About to serve " + correctfilepath);
    res.writeHead(200);
    readstream.setEncoding("binary");
    readstream.on("data", function (chunk) {
      res.write(chunk, "binary");
    });
    readstream.on("end", function () {
      console.log("Served file " + correctfilepath);
      res.end();
    });
    readstream.on('error', function(err) {
      res.write(err + "\n");
      res.end();
      return;
    });
  } else {
    res.writeHead(404);
    res.write("No data\n");
    res.end();
  }
});
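As a side note, the manual data/end handlers above can usually be replaced with a single pipe() call, which also handles backpressure; a minimal sketch using the same correctfilepath and res as above:

var readstream = fs.createReadStream(correctfilepath);
readstream.on('error', function(err) {
  res.writeHead(500);
  res.end(String(err));
});
res.writeHead(200);
readstream.pipe(res); // ends the response when the file stream ends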

Node.js - writing data to the writable stream

In my Node application I'm writing data to a file using the write method on the stream returned by createWriteStream. Now I need to find out whether the write for a particular stream is complete or not. How can I find that?
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var result = stream.write(data);

writeToStream();

function writeToStream() {
  var result = stream.write(data + '\n');
  if (!result) {
    stream.once('drain', writeToStream);
  }
}
I need to call another method every time a write completes. How can I do this?
From the Node.js writable.write(...) documentation: you can give the write method a callback that is called when the written data is flushed:
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var data = "Hello, World!\n";
stream.write(data, function() {
  // Now the data has been written.
});
Note that you probably don't need to actually wait for each call to "write" to complete before queueing the next call. Even if the "write" method returns false you can still call subsequent writes and node will buffer the pending write requests into memory.
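That said, if memory is a concern, a minimal sketch of respecting backpressure (write until write() returns false, then wait for drain) might look like this; writeMany and the lines array are illustrative names, not from the answer:

function writeMany(stream, lines, done) {
  var i = 0;
  (function writeNext() {
    var ok = true;
    while (i < lines.length && ok) {
      ok = stream.write(lines[i++] + '\n'); // false means the internal buffer is full
    }
    if (i < lines.length) {
      stream.once('drain', writeNext);      // resume once the buffer has drained
    } else {
      done();
    }
  })();
}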
I am using maerics's answer along with error handling. The flag 'a' is used to open the file for appending; the file is created if it does not exist. There are other flags you can use.
var fs = require('fs');

// outPutData is assumed to be defined elsewhere, e.g.:
// var outPutData = 'some data to append\n';

// Create a writable stream and write the data to it (utf8 by default)
var writerStream = fs.createWriteStream('MockData/output.txt', {flags: 'a'})
  .on('finish', function() {
    console.log("Write Finish.");
  })
  .on('error', function(err) {
    console.log(err.stack);
  });

writerStream.write(outPutData, function() {
  // Now the data has been written.
  console.log("Write completed.");
});

// Mark the end of file
writerStream.end();
