What is the difference between async and stream writing files? - node.js

I know that it's possible to use async methods (like fs.appendFile) and streams (like fs.createWriteStream) to write files.
But why do we need both of them if streams are asynchronous as well and can provide us with better functionality?

Let's say you're downloading a huge file, say a 1TB file, and you want to write it to your filesystem.
You could download the whole file into an in-memory buffer and then write that buffer to a local file with fs.appendFile() or fs.writeFile(). Or rather, you could try to: you'd run out of memory long before the download finished.
Or you could create a read-stream for the downloading file, and pipe it to a write-stream for the write to your file-system:
const readStream = magicReadStreamFromUrl/*[1]*/('https://example.com/large.txt');
const writeStream = fs.createWriteStream('large.txt');
readStream.pipe(writeStream);
This means that the file is downloaded in chunks, and those chunks get piped to the writeStream (which would write them to disk), without having to store it in-memory yourself.
That is the reason for Streaming abstractions in general, and in Node in particular.
The built-in http module supports streaming in this way, as do most other HTTP libraries such as request and axios. For brevity, I've left the specifics of creating the read-stream as an exercise for the reader.

Node uploaded image save - stream vs buffer

I am working on image upload and don't know how to properly deal with storing the received file. It would be nice to first check whether the file really is an image or whether someone just changed the extension. Luckily, the sharp package I use has exactly such a feature. I am currently weighing two approaches.
Buffering approach
I can parse the multipart form as a buffer and easily decide whether to save the file or not.
const metadata = await sharp(buffer).metadata();
if (metadata) {
  saveImage(buffer);
} else {
  throw new Error('It is not an image');
}
Streaming approach
I can parse the multipart form as a readable stream. First I pipe the readable stream to a writable one and store the file on disk. Afterward, I create a new readable stream from the saved file and verify that it really is an image; if it is not, I revert everything.
// save uploaded file to file system with stream
readableStream.pipe(createWriteStream('./uploaded-file.jpg'));

// verify whether it is an image
createReadStream('./uploaded-file.jpg').pipe(
  sharp().metadata((err, metadata) => {
    if (err || !metadata) {
      revertAll();
      throw new Error('It is not an image');
    }
  })
);
I wanted to avoid using a buffer because, as far as I know, it has to hold the whole file in RAM. But on the other hand, the streaming approach seems really clunky.
Can someone help me to understand how these two approaches differ in terms of performance and used resources? Or is there some better approach to how to deal with such a situation?
In buffer mode, all the data coming from a resource is collected into a buffer, think of it as a data pool, until the operation is completed; it is then passed back to the caller as one single blob of data. Buffers in V8 are limited in size. You cannot allocate more than a few gigabytes of data, so you may hit a wall way before running out of physical memory if you need to read a big file.
On the other hand, streams allow us to process the data as soon as it arrives from the resource, without first collecting it all in memory. Streams can therefore be more efficient in terms of both space (memory usage) and time (overall processing time).

How to synchronously read from a ReadStream in node

I am trying to read UTF-8 text from a file in a memory- and time-efficient way. There are two ways to read directly from a file synchronously:
fs.readFileSync will read the entire file and return a buffer containing the file's entire contents
fs.readSync will read a set amount of bytes from a file and return a buffer containing just those contents
I initially just used fs.readFileSync because it's easiest, but I'd like to be able to efficiently handle potentially large files by only reading in chunks of text at a time. So I started using fs.readSync instead. But then I realized that fs.readSync doesn't handle UTF-8 decoding. UTF-8 is simple, so I could whip up some logic to manually decode it, but Node already has services for that, so I'd like to avoid that if possible.
I noticed fs.createReadStream, which returns a ReadStream that can be used for exactly this purpose, but unfortunately it seems to only be available in an asynchronous mode of operation.
Is there a way to read from a ReadStream in a synchronous way? I have a massive stack built on top of this already, and I'd rather not have to refactor it to be asynchronous.
I discovered the string_decoder module, which handles all that UTF-8 decoding logic I was worried I'd have to write. At this point, it seems like a no-brainer to use this on top of fs.readSync to get the synchronous behavior I was looking for.
You basically just keep feeding bytes to it, and as it is able to successfully decode characters, it will emit them. The Node documentation is sufficient at describing how it works.

is the nodejs Buffer asynchronous or synchronous?

I don't see a callback in the Buffer documentation at http://nodejs.org/api/buffer.html#buffer_buffer. Am I safe to assume that Buffer is synchronous? I'm trying to convert a binary file to a base64 encoded string.
What I'm ultimately trying to do is take a PNG file and store its base64 encoded string in MongoDB. I read somewhere that I should take the PNG file, use Buffer to convert to base64, then pass this base64 output to Mongo.
My code looks something like this:
fs.readFile(filepath, function(err, data) {
  // data is already a Buffer; Buffer.from() replaces the deprecated new Buffer()
  var fileBuffer = Buffer.from(data).toString('base64');
  // do Mongo save here with the fileBuffer ...
});
I'm a bit fearful that Buffer is synchronous, and thus would be blocking other requests while this base64 encoding takes place. If so, is there a better way of converting a binary file to a base64 encoded one for storage in Mongo?
It is synchronous. You could make it asynchronous by slicing your Buffer and converting a small amount at a time and calling process.nextTick() in between, or by running it in a child process - but I wouldn't recommend either of those approaches.
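For illustration only, here is roughly what that chunked approach might look like (the function name is mine, and, as the answer says, it's not recommended for production). The chunk size must be a multiple of 3 bytes so that each chunk's base64 output concatenates cleanly, with padding only at the very end:

```javascript
// Encode a large Buffer to base64 in slices, yielding to the event
// loop between slices so the encoding doesn't block other requests
// for its entire duration.
function toBase64Async(buf, callback) {
  const CHUNK = 3 * 64 * 1024; // 192 KiB, a multiple of 3
  const parts = [];
  let offset = 0;
  (function next() {
    parts.push(buf.subarray(offset, offset + CHUNK).toString('base64'));
    offset += CHUNK;
    if (offset < buf.length) {
      setImmediate(next); // let other work run between slices
    } else {
      callback(null, parts.join(''));
    }
  })();
}
```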
Instead, I would recommend not storing images in your DB: store them on disk, or perhaps in a file storage service such as Amazon S3, and then store just the file path or URL in your database.

Saving a base 64 string to a file via createWriteStream

I have an image coming into my Node.js application via email (through cloud service provider Mandrill). The image comes in as a base64 encoded string, email.content in the example below. I'm currently writing the image to a buffer, and then a file like this:
//create buffer and write to file
var dataBuffer = Buffer.from(email.content, 'base64'); // Buffer.from replaces the deprecated new Buffer()
var writeStream = fs.createWriteStream(tmpFileName);
writeStream.once('open', function(fd) {
  console.log('Our stream is open, lets write to it');
  writeStream.write(dataBuffer);
  writeStream.end();
}); // writeStream.once('open')
writeStream.on('close', function() {
  fileStats = fs.statSync(tmpFileName);
  // ...
});
This works fine and is all well and good, but am I essentially doubling the memory requirements for this section of code, since I have my image in memory (as the original string), and then create a buffer of that same string before writing the file? I'm going to be dealing with a lot of inbound images so doubling my memory requirements is a concern.
I tried several ways to write email.content directly to the stream, but it always produced an invalid file. I'm a rank amateur with modern coding, so you're welcome to tell me this concern is completely unfounded, as long as you tell me why, so some light will dawn on this marble head.
Thanks!
Since you already have the entire file in memory, there's no point in creating a write stream. Just use fs.writeFile:
fs.writeFile(tmpFileName, email.content, 'base64', callback)
Jonathan's answer is a better way to shorten the code you already have, so definitely do that.
I will expand on your question about memory, though. The fact is that Node will not write anything to a file without converting it to a Buffer first, so given what you have told us about email.content, there is nothing more you can do.
If you are really worried about this, then you would need some way to process the value of email.content as a stream, as it comes in from wherever you are getting it. Then, as the data is streamed into the server, you immediately write it to a file, thus never holding more in RAM than needed.
If you elaborate more, I can try to fill in more info.

Node.js - How does a readable stream react to a file that is still being written?

I have found a lot of information on how to pump, or pipe, data from a read stream to a write stream in Node. The newest version even auto-pauses and resumes for you. However, I have a different need and would like some help.
I am writing a video file using ffmpeg (to a local file, not a writeable stream), and I would like to create a readstream that reads the data as it gets written. Obviously, the read stream speed will surpass how quickly ffmpeg encodes the file. What will happen when the read stream reaches the end of data before ffmpeg finishes writing the file? I assume it will stop the read stream before the file is fully encoded.
Anyone have any suggestions for the best way to pause/resume the read stream so that it doesn't reach the end of the locally encoding file until the encoding is 100% complete?
In summary:
This is what people normally do: readStream --> writeStream (using .pipe)
This is what I want to do: local file (in slow creation process) --> readStream
As always, thanks to the Stack Overflow community.
The growing-file module is what you want.
