NodeJS writable streams: how to wait for data to be flushed?

I have a simple situation in which an https.get pipes its response stream into a file stream created with fs.createWriteStream, something like this:
var file = fs.createWriteStream('some-file');
var downloadComplete = function() {
  // check file size with fs.stat
};

https.get(options, function(response) {
  file.on('finish', downloadComplete);
  response.pipe(file);
});
Almost all the time this works fine and the file size determined in downloadComplete is what is expected. Every so often, however, it's a bit too small, almost as if the underlying file stream hasn't finished writing to disk even though it has raised the finish event.
Does anyone know what's happening here, or have a particular way to make this more robust against delays between finish being emitted and the underlying data actually being written to disk?
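One approach that may make this more robust (a sketch only, reusing the options object from the snippet above): fs write streams also emit 'close' once the underlying file descriptor has been closed, so deferring the fs.stat until 'close' rather than 'finish' should see the final size.
var fs = require('fs');
var https = require('https');

var file = fs.createWriteStream('some-file');

// 'close' fires after the underlying file descriptor has been closed,
// so fs.stat should report the final size at this point
file.on('close', function() {
  fs.stat('some-file', function(err, stats) {
    if (err) throw err;
    console.log('downloaded bytes:', stats.size);
  });
});

// options is the same request options object as in the question
https.get(options, function(response) {
  response.pipe(file);
});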

Related

Node - Is it possible that a stream closes or stops writing and how to reopen?

I could not find much information in the documentation. Using fs, is it possible that a stream opened with fs.createWriteStream closes unexpectedly or stops writing to the file before stream.end() is called?
One scenario that comes to mind is that an error occurs and the OS closes the stream. Is this possible, and in that case would fs reopen the stream automatically?
I think we can use something along the lines of
let stream = fs.createWriteStream('file.log', {flags: 'a'});
stream.on('close', () => {
  stream = fs.createWriteStream('file.log', {flags: 'a'});
});
But I wonder if this approach is prone to memory leaks or other issues. Thanks!
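For reference, a minimal sketch of that idea with an explicit 'error' handler attached; fs will not reopen the stream automatically, so any recovery has to be done in your own code (the recreate-on-'close' part is just the question's approach wrapped in a function):
const fs = require('fs');

let stream;

function openLog() {
  stream = fs.createWriteStream('file.log', { flags: 'a' });

  // fs does not reopen a failed stream by itself, so handle errors explicitly
  stream.on('error', (err) => {
    console.error('log stream error:', err);
  });

  // recreate the stream if it closes before logging was meant to stop
  stream.on('close', () => {
    openLog();
  });
}

openLog();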

Unable to use one readable stream to write to two different targets in Node JS

I have a client side app where users can upload an image. I receive this image in my Node JS app as readable data and then manipulate it before saving like this:
uploadPhoto: async (server, request) => {
  try {
    const randomString = `${uuidv4()}.jpg`;
    const stream = Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`);
    const resizer = Sharp()
      .resize({
        width: 450
      });

    await data.file
      .pipe(resizer)
      .pipe(stream);
This works fine, and writes the file to the project's local directory. The problem comes when I try to use the same readable data again in the same async function. Please note, all of this code is in a try block.
const stream2 = Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`);
const resizer2 = Sharp()
  .resize({
    width: 45
  });

await data.file
  .pipe(resizer2)
  .pipe(stream2);
The second file is written, but when I check the file, it seems corrupted or didn't successfully write the data. The first image is always fine.
I've tried a few things, and found one method that seems to work, but I don't understand why. I add this code just before I create the second write stream:
data.file.on('end', () => {
  console.log('There will be no more data.');
});
Putting the code for the second write stream inside the on-end callback block doesn't make a difference; however, if I leave the code outside of the block, between the first write stream code and the second write stream code, then it works and both files are successfully written.
It doesn't feel right leaving the code the way it is. Is there a better way I can write the second thumbnail image? I've tried using the Sharp module to read the file after the first write stream has written the data, and then create a smaller version of it, but it doesn't work. The file never seems to be ready to use.
You have two alternatives, and which one fits depends on how your software is designed.
If possible, I would avoid executing two transform operations on the same stream in the same "context", e.g. an API endpoint. I would rather separate those two different transforms so they do not work on the same input stream.
If that is not possible or would require too many changes, the solution is to fork the input stream and then pipe it into two different Writable streams. I normally use Highland.js fork for these tasks.
Please also see my comments on how to properly handle streams with async/await to check when the write operation has finished.
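For illustration, here is a minimal sketch of the fork idea using only Node's built-in stream module rather than Highland.js (one PassThrough per branch), together with pipeline from stream/promises so that await actually waits for the writes to finish; the paths and Sharp resizers are taken from the question, and a Node version that ships stream/promises is assumed:
const { PassThrough } = require('stream');
const { pipeline } = require('stream/promises');
const Sharp = require('sharp');
const Fse = require('fs-extra');

async function writeImageAndThumb(sourceStream, fullPath, thumbPath) {
  // fork the single readable into two branches before any data is consumed
  const fullBranch = new PassThrough();
  const thumbBranch = new PassThrough();
  sourceStream.pipe(fullBranch);
  sourceStream.pipe(thumbBranch);

  // pipeline() resolves only once each chain has finished writing
  await Promise.all([
    pipeline(fullBranch, Sharp().resize({ width: 450 }), Fse.createWriteStream(fullPath)),
    pipeline(thumbBranch, Sharp().resize({ width: 45 }), Fse.createWriteStream(thumbPath)),
  ]);
}

// usage, e.g.: await writeImageAndThumb(data.file, fullPath, thumbPath);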

Why is fs.createReadStream ... pipe(res) locking the read file?

I'm using express to stream audio & video files according to this answer. Relevant code looks like this:
function streamMedia(filePath, req, res) {
  // code here to determine which bytes to send, compute response headers, etc.
  res.writeHead(status, headers);
  var stream = fs.createReadStream(filePath, { start, end })
    .on('open', function() {
      stream.pipe(res);
    })
    .on('error', function(err) {
      res.end(err);
    });
}
This works just fine to stream bytes to <audio> and <video> elements on the client. However after these requests are served, another express request can delete the file being streamed from the filesystem. This second request is failing, sort of.
What happens is that once the file has been streamed at least once (meaning a createReadStream was invoked for the file's path while running the code above) and a different express request then comes in to delete the file, the file remains on the filesystem until express is stopped. As soon as express is stopped, the files are deleted from the filesystem.
What exactly is going on here? Is it fs or express that is locking the file, why, and how can I get the process to release the file so that it can be deleted (after its contents have been read and piped to a response, if any is pending)?
Update 1:
I've modified the above code to set autoClose: true for the second function arg, and added both 'end' and 'close' event handlers, like so:
res.writeHead(status, headers);
var streamReadOpts = { start: start, end: end, autoClose: true };
var stream = fs.createReadStream(filePath, streamReadOpts)
  // previous 'open' & 'error' event handlers are still here
  .on('end', function () {
    console.log('stream end');
  })
  .on('close', function () {
    console.log('stream close');
  });
What I have discovered is that when a page initially loads with a <video> or <audio> element, only the 'open' event is fired. Then when the user clicks to play the video/audio, a second request is made, and this second time both the 'end' and 'close' events fire, and subsequently deleting the file succeeds.
So it appears that the file is being locked when a user loads the page that has the <video> or <audio> element that gets its source from the request that calls this function. It isn't until that media file is played that a second request is made, and the file is unlocked.
I've also discovered that closing the browser also causes the 'end' and 'close' events to fire, and the file to be unlocked. My guess is that I'm doing something wrong with the express res to make it not close properly, but I'm still not sure what that could be.
It turned out the solution to this was to read and pipe smaller blocks of data from the file during each request. In my test cases for this, I was streaming a 6MB MP4 video file. Though I was able to reproduce the issue using either firefox or chrome, I debugged using the latter, and found that the client was blocking the stream.
When the page initially loads, there is an element that looks something like this:
<video> <!-- or <audio> -->
<source src="/path/to/express/request" type="video/mpeg" /> <!-- or audio/mpeg -->
</video> <!-- or </audio> -->
As is documented in the other answer referenced in the OP, chrome will send a request with a range header like so:
Range:bytes=0-
For this request, my function was sending the whole file, and my response looked like this:
Accept-Ranges:bytes
Connection:keep-alive
Content-Length:6070289
Content-Range:bytes 0-6070288/6070289
Content-Type:video/mp4
However, chrome was not reading the whole stream. It was only reading the first 3-4MB, then blocking the connection until a user action caused it to need the rest of the file. This explains why closing either the browser or stopping express caused the files to be unlocked, because it closed the connection from either the browser or the server's end.
My current solution is to only send a maximum of 1MB (the old school 1MB, 1024 * 1024) chunk at a time. The relevant code can be found in an additional answer to the question referenced in the OP.
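As a sketch of that capping logic (the streamChunk function, MAX_CHUNK constant and header parsing below are illustrative, not the exact code from the referenced answer; fileSize is assumed to come from an earlier fs.stat call):
var fs = require('fs');

var MAX_CHUNK = 1024 * 1024; // 1MB

function streamChunk(filePath, fileSize, req, res) {
  // parse a Range header such as "bytes=0-" or "bytes=1048576-2097151"
  var range = req.headers.range || 'bytes=0-';
  var parts = range.replace(/bytes=/, '').split('-');
  var start = parseInt(parts[0], 10);
  var requestedEnd = parts[1] ? parseInt(parts[1], 10) : fileSize - 1;

  // never send more than 1MB per response; the client will request the rest
  var end = Math.min(requestedEnd, start + MAX_CHUNK - 1, fileSize - 1);

  res.writeHead(206, {
    'Accept-Ranges': 'bytes',
    'Content-Range': 'bytes ' + start + '-' + end + '/' + fileSize,
    'Content-Length': end - start + 1,
    'Content-Type': 'video/mp4'
  });

  fs.createReadStream(filePath, { start: start, end: end }).pipe(res);
}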
Set autoClose: true in the options. If autoClose is false, you have to close the file descriptor manually in the 'end' event.
Refer to the Node docs: https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
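For reference, a one-line sketch of what that looks like (autoClose defaults to true in current Node versions):
// with autoClose: true (the default) the descriptor is closed automatically
// on 'end' or 'error', releasing the file
var stream = fs.createReadStream(filePath, { start: start, end: end, autoClose: true });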

how to read an incomplete file and wait for new data in nodejs

I have a UDP client that grabs some data from another source and writes it to a file on the server. Since this is a large amount of data, I don't want the end user to have to wait until it is fully written to the server before they can download it. So I made a NodeJS server that grabs the latest data from the file and sends it to the user.
Here is the code:
var stream = fs.readFileSync(filename)
  .on("data", function(data) {
    response.write(data)
  });
The problem here is, if the download starts when the file is only, for example, 10 MB, fs.readFileSync will only read my file up to 10 MB. Even if two minutes later the file has grown to 100 MB, fs.readFileSync will never know about the newly appended data. How can I do this in Node? I would like to somehow refresh the fs state, or perhaps wait for new data using the fs file system. Or is there some kind of fs file-content watcher?
EDIT:
I think the code below describes better what I would like to achieve; however, in this code it keeps reading forever and I don't have any variable from fs.read that can help me stop it:
fs.open(filename, 'r', function(err, fd) {
  var bufferSize = 1000,
      chunkSize = 512,
      buffer = new Buffer(bufferSize),
      bytesRead = 0;

  while (true) { // check if the file has new content inside
    fs.read(fd, buffer, 0, chunkSize, bytesRead);
    bytesRead += buffer.length;
  }
});
Node has a built-in method for this in the fs module. It is tagged as unstable, so it can change in the future.
It's called fs.watchFile(filename[, options], listener).
You can read more about it here: https://nodejs.org/api/fs.html#fs_fs_watchfile_filename_options_listener
But I highly suggest you use one of the good, actively maintained modules instead, like watchr.
From its readme:
Better file system watching for Node.js. Provides a normalised API the
file watching APIs of different node versions, nested/recursive file
and directory watching, and accurate detailed events for
file/directory changes, deletions and creations.
The module page is here: https://github.com/bevry/watchr
(I've used the module in a couple of projects and it works great; I'm not related to it in any other way.)
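As a rough sketch of the built-in approach (assuming you only need to forward bytes that get appended to the file; response is the same response object as in the question and the interval value is illustrative), fs.watchFile passes the current and previous stat objects to its listener, so you can read just the newly added range:
var fs = require('fs');

var bytesSent = 0;

fs.watchFile(filename, { interval: 500 }, function(curr, prev) {
  if (curr.size > bytesSent) {
    // read only the bytes appended since the last read
    fs.createReadStream(filename, { start: bytesSent, end: curr.size - 1 })
      .on('data', function(chunk) {
        response.write(chunk);
      });
    bytesSent = curr.size;
  }
});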
You need to store the last known size of the file somewhere, e.g. in a database.
Read the file size first.
Load your file.
Then make a script to check whether the file has changed.
You can poll the size with jquery.post to obtain your result and decide in JavaScript whether you need to reload.

Node.js request stream ends/stalls when piped to writable file stream

I'm trying to pipe() data from Twitter's Streaming API to a file using modern Node.js Streams. I'm using a library I wrote called TweetPipe, which leverages EventStream and Request.
Setup:
var TweetPipe = require('tweet-pipe')
  , fs = require('fs');

var tp = new TweetPipe(myOAuthCreds);
var file = fs.createWriteStream('./tweets.json');
Piping to STDOUT works and stream stays open:
tp.stream('statuses/filter', { track: ['bieber'] })
  .pipe(tp.stringify())
  .pipe(process.stdout);
Piping to the file writes one tweet and then the stream ends silently:
tp.stream('statuses/filter', { track: ['bieber'] })
  .pipe(tp.stringify())
  .pipe(file);
Could anyone tell me why this happens?
It's hard to say from what you have here; it sounds like the stream is getting cleaned up before you expect. This can be triggered a number of ways; see here: https://github.com/joyent/node/blob/master/lib/stream.js#L89-112
A stream could emit 'end', and then something just stops.
Although I doubt this is the problem, one thing that concerns me is this
https://github.com/peeinears/tweet-pipe/blob/master/index.js#L173-174
destroy should be called after emitting error.
I would normally debug a problem like this by adding logging statements until I can see what is not happening right.
Can you post a script that can be run to reproduce?
(for extra points, include a package.json that specifies the dependencies :)
According to this, you should create an error handler on the stream created by tp.
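In other words, something along these lines (a sketch reusing tp and file from the setup above; .pipe() does not forward 'error' events, so each stream in the chain needs its own handler):
var stream = tp.stream('statuses/filter', { track: ['bieber'] });
var stringifier = tp.stringify();

// 'error' is not propagated through pipe(), so listen on each piece
stream.on('error', function(err) { console.error('source error:', err); });
stringifier.on('error', function(err) { console.error('stringify error:', err); });
file.on('error', function(err) { console.error('file write error:', err); });

stream.pipe(stringifier).pipe(file);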
