is the nodejs Buffer asynchronous or synchronous? - node.js

I don't see a callback in the Buffer documentation at http://nodejs.org/api/buffer.html#buffer_buffer. Am I safe to assume that Buffer is synchronous? I'm trying to convert a binary file to a base64 encoded string.
What I'm ultimately trying to do is take a PNG file and store its base64 encoded string in MongoDB. I read somewhere that I should take the PNG file, use Buffer to convert to base64, then pass this base64 output to Mongo.
My code looks something like this:
fs.readFile(filepath, function(err, data) {
  var fileBuffer = new Buffer(data).toString('base64');
  // do Mongo save here with the fileBuffer ...
});
I'm a bit fearful that Buffer is synchronous, and thus would be blocking other requests while this base64 encoding takes place. If so, is there a better way of converting a binary file to a base64 encoded one for storage in Mongo?

It is synchronous. You could make it asynchronous by slicing your Buffer and converting a small amount at a time and calling process.nextTick() in between, or by running it in a child process - but I wouldn't recommend either of those approaches.
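If you are curious, here is a rough sketch of that chunked approach (not recommended, just to illustrate the idea; toBase64Async is a made-up name, and I use setImmediate to yield because in modern Node process.nextTick would not let other I/O run in between slices):

function toBase64Async(buffer, callback) {
  var CHUNK = 3 * 1024 * 1024; // a multiple of 3 bytes, so each slice encodes to base64 independently
  var parts = [];
  var offset = 0;
  (function next() {
    if (offset >= buffer.length) return callback(null, parts.join(''));
    parts.push(buffer.slice(offset, offset + CHUNK).toString('base64'));
    offset += CHUNK;
    setImmediate(next); // yield to the event loop between slices
  })();
}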
Instead, I would recommend not storing images in your db. Store them on disk or in a file storage service such as Amazon S3, and keep just the file path or URL in your database.

Related

Node uploaded image save - stream vs buffer

I am working on image upload and don't know how to properly deal with storing the received file. It would be nice to analyze the file first to check whether it is really an image or whether someone just changed the extension. Luckily, I use the sharp package, which has exactly such a feature. I currently work with two approaches.
Buffering approach
I can parse the multipart form as a buffer and easily decide whether to save the file or not.
const metadata = await sharp(buffer).metadata();
if (metadata) {
  saveImage(buffer);
} else {
  throw new Error('It is not an image');
}
Streaming approach
I can parse the multipart form as a readable stream. First I need to pipe the readable stream to a writable one and store the file on disk. Afterward, I need to create a readable stream from the saved file again and verify whether it really is an image. Otherwise, revert everything.
// save uploaded file to file system with stream
readableStream.pipe(createWriteStream('./uploaded-file.jpg'));

// verify whether it is an image
createReadStream('./uploaded-file.jpg').pipe(
  sharp().metadata((err, metadata) => {
    if (!metadata) {
      revertAll();
      throw new Error('It is not an image');
    }
  })
);
My intention was to avoid the buffering approach because, as far as I know, it needs to hold the whole file in RAM. But on the other hand, the approach using streams seems really clunky.
Can someone help me understand how these two approaches differ in terms of performance and resource usage? Or is there a better way to deal with such a situation?
In buffer mode, all the data coming from a resource is collected into a buffer (think of it as a data pool) until the operation is completed; it is then passed back to the caller as one single blob of data. Buffers in V8 are limited in size: you cannot allocate more than a few gigabytes of data, so you may hit a wall well before running out of physical memory if you need to read a big file.
Streams, on the other hand, allow us to process the data as soon as it arrives from the resource, without storing it all in memory first. Streams can therefore be more efficient in terms of both space (memory usage) and time (clock time).
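A rough sketch of the streaming route, tidied up with stream.pipeline for error handling (this assumes a recent Node with stream/promises; readableStream and the file path come from the question, and sharp(path).metadata() rejects when the file is not a valid image):

const { pipeline } = require('stream/promises');
const { createWriteStream } = require('fs');
const { unlink } = require('fs/promises');
const sharp = require('sharp');

async function saveUpload(readableStream, path) {
  // stream the upload straight to disk: memory use stays roughly constant
  await pipeline(readableStream, createWriteStream(path));
  try {
    await sharp(path).metadata(); // rejects if the saved file is not an image
  } catch (err) {
    await unlink(path); // revert: remove the bogus upload
    throw new Error('It is not an image');
  }
}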

What is the difference between async and steam writing files?

I know that it's possible to use async methods (like fs.appendFile) and streams (like fs.createWriteStream) to write files.
But why do we need both of them if streams are asynchronous as well and can provide us with better functionality?
Let's say you're downloading a file, a huge file, 1TB file, and you want to write that file to your filesystem.
You could download the whole file into a buffer in memory, then fs.appendFile() or fs.writeFile() the buffer to a local file - or try to, at least; you'd run out of memory.
Or you could create a read-stream for the downloading file, and pipe it to a write-stream for the write to your file-system:
const readStream = magicReadStreamFromUrl('https://example.com/large.txt'); // [1]
const writeStream = fs.createWriteStream('large.txt');
readStream.pipe(writeStream);
This means that the file is downloaded in chunks, and those chunks get piped to the writeStream (which would write them to disk), without having to store it in-memory yourself.
That is the reason for Streaming abstractions in general, and in Node in particular.
The http module supports streaming in this way, as do most other HTTP libraries such as request and axios. I've left out the specifics of how to create the read-stream as an exercise to the reader, for brevity.
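For the curious, one bare-bones way to build such a read-stream with Node's built-in https module (a sketch; error handling omitted):

const https = require('https');
const fs = require('fs');

https.get('https://example.com/large.txt', (res) => {
  // res is a readable stream: chunks flow straight from the socket to disk
  res.pipe(fs.createWriteStream('large.txt'));
});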

How to synchronously read from a ReadStream in node

I am trying to read UTF-8 text from a file in a memory- and time-efficient way. There are two ways to read directly from a file synchronously:
fs.readFileSync will read the entire file and return a buffer containing the file's entire contents
fs.readSync will read a set amount of bytes from a file and return a buffer containing just those contents
I initially just used fs.readFileSync because it's easiest, but I'd like to be able to efficiently handle potentially large files by only reading in chunks of text at a time. So I started using fs.readSync instead. But then I realized that fs.readSync doesn't handle UTF-8 decoding. UTF-8 is simple, so I could whip up some logic to manually decode it, but Node already has services for that, so I'd like to avoid that if possible.
I noticed fs.createReadStream, which returns a ReadStream that can be used for exactly this purpose, but unfortunately it seems to only be available in an asynchronous mode of operation.
Is there a way to read from a ReadStream in a synchronous way? I have a massive stack built on top of this already, and I'd rather not have to refactor it to be asynchronous.
I discovered the string_decoder module, which handles all that UTF-8 decoding logic I was worried I'd have to write. At this point, it seems like a no-brainer to use this on top of fs.readSync to get the synchronous behavior I was looking for.
You basically just keep feeding bytes to it, and as it is able to successfully decode characters, it gives them back. The Node documentation does a good job of describing how it works.
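A minimal sketch of how the two fit together (the file name and chunk size here are placeholders):

const fs = require('fs');
const { StringDecoder } = require('string_decoder');

const fd = fs.openSync('big.txt', 'r');
const decoder = new StringDecoder('utf8');
const chunk = Buffer.alloc(64 * 1024);
let bytesRead;
while ((bytesRead = fs.readSync(fd, chunk, 0, chunk.length, null)) > 0) {
  // the decoder holds back incomplete multi-byte sequences until the next call
  const text = decoder.write(chunk.slice(0, bytesRead));
  // ...process text synchronously...
}
const tail = decoder.end(); // flush anything left over
fs.closeSync(fd);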

How to upload file in node.js with http module?

I have this code so far, but cannot get the binary Buffer.
var http = require('http');
var myServer = http.createServer(function(request, response) {
  var data = '';
  request.on('data', function (chunk) {
    data += chunk;
  });
  request.on('end', function () {
    if (request.headers['content-type'] == 'image/jpg') {
      var binary = Buffer.concat(data);
      // some file handling would come here if binary would be OK
      response.write(binary.size);
      response.writeHead(201);
      response.end();
    }
  });
});
But I get this error: throw new TypeError('Usage: Buffer.concat(list, [length])');
You're doing three bad things:
Using the Buffer API wrong - hence the error message.
Concatenating binary data as strings
Buffering data in memory
Mukesh has dealt with #1, so I'll cover the deeper problems.
First, you're receiving binary Buffer chunks and converting them to strings with the default (utf8) encoding, then concatenating them. This will corrupt your data: besides the byte sequences that aren't valid utf8 at all, a valid sequence that gets cut in half across a chunk boundary will be lost as well.
Instead, you should keep the data as binary all the way through. Maintain an array of Buffers, push each chunk onto it, then concatenate them all at the end.
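Something like this (a sketch of the in-memory variant):

var chunks = [];
request.on('data', function (chunk) {
  chunks.push(chunk); // keep each chunk as a Buffer, never convert to string
});
request.on('end', function () {
  var binary = Buffer.concat(chunks); // Buffer.concat expects an array of Buffers
  // ... handle the complete upload ...
});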
This leads to problem #3. You are buffering the whole upload into memory and then writing it to a file, instead of streaming it directly to a (temporary) file. This puts a lot of load on your application, both using up memory and using up time allocating it all. You should just pipe the request to a file output stream, then inspect the file on disk (see the sketch after the next paragraph).
If you are only accepting very small files you may get away with keeping them in memory, but you need to protect yourself from clients sending too much data (and indeed lying about how much they're going to send).
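A rough sketch of that streaming variant with a basic size guard (the limit, the temporary path, and the 413 response are illustrative choices, not requirements):

var fs = require('fs');

var MAX_BYTES = 5 * 1024 * 1024;   // made-up upper bound for one upload
var received = 0;
var rejected = false;
var out = fs.createWriteStream('/tmp/upload.tmp'); // made-up temporary path

request.on('data', function (chunk) {
  received += chunk.length;
  if (received > MAX_BYTES && !rejected) {
    rejected = true;
    request.unpipe(out);             // stop feeding the file
    out.destroy();
    response.writeHead(413);         // Payload Too Large
    response.end();
  }
});
request.pipe(out);
out.on('finish', function () {
  // the upload is now fully on disk; inspect/validate it here
});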

Saving a base 64 string to a file via createWriteStream

I have an image coming into my Node.js application via email (through cloud service provider Mandrill). The image comes in as a base64 encoded string, email.content in the example below. I'm currently writing the image to a buffer, and then a file like this:
// create buffer and write to file
var dataBuffer = new Buffer(email.content, 'base64');
var writeStream = fs.createWriteStream(tmpFileName);
writeStream.once('open', function(fd) {
  console.log('Our stream is open, lets write to it');
  writeStream.write(dataBuffer);
  writeStream.end();
}); // writeStream.once('open')
writeStream.on('close', function() {
  fileStats = fs.statSync(tmpFileName);
  // ...
});
This works fine and is all well and good, but am I essentially doubling the memory requirements for this section of code, since I have my image in memory (as the original string), and then create a buffer of that same string before writing the file? I'm going to be dealing with a lot of inbound images so doubling my memory requirements is a concern.
I tried several ways to write email.content directly to the stream, but it always produced an invalid file. I'm a rank amateur with modern coding, so you're welcome to tell me this concern is completely unfounded as long as you tell me why so some light will dawn on marble head.
Thanks!
Since you already have the entire file in memory, there's no point in creating a write stream. Just use fs.writeFile
fs.writeFile(tmpFileName, email.content, 'base64', callback)
@Jonathan's answer is a better way to shorten the code you already have, so definitely do that.
I will expand on your question about memory, though. The fact is that Node will not write anything to a file without converting it to a Buffer first, so given what you have told us about email.content, there is nothing more you can do.
If you are really worried about this, though, then you would need some way to process the value of email.content as a stream, as it comes in from wherever you are getting it. Then, as the data streams into the server, you immediately write it to a file, never holding more in RAM than needed.
If you elaborate more, I can try to fill in more info.
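To make that concrete, here is a rough sketch of decoding base64 on the fly with a Transform stream. incomingStream stands in for however the base64 text reaches your server, and the code assumes the text contains no whitespace (base64 only decodes cleanly in groups of 4 characters, so partial groups are carried over between chunks):

const { Transform, pipeline } = require('stream');
const fs = require('fs');

class Base64Decode extends Transform {
  constructor(options) {
    super(options);
    this.remainder = ''; // partial 4-character group carried to the next chunk
  }
  _transform(chunk, encoding, callback) {
    const text = this.remainder + chunk.toString('ascii');
    const usable = text.length - (text.length % 4);
    this.remainder = text.slice(usable);
    callback(null, Buffer.from(text.slice(0, usable), 'base64'));
  }
  _flush(callback) {
    callback(null, Buffer.from(this.remainder, 'base64'));
  }
}

pipeline(
  incomingStream,                      // hypothetical source of base64 text
  new Base64Decode(),
  fs.createWriteStream(tmpFileName),
  (err) => {
    if (err) console.error('Decode/write failed:', err);
  }
);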

Resources