Synchronous issue to save a file into GridFS - Node.js/Express

I'm trying to save a profile photo from a social network (Facebook) into my "fs.files" and "fs.chunks" collections. I can achieve this, but the way I found to do it is not a proper one.
My steps are:
Log the user in on Facebook using Passport;
Save the picture file to disk (an internal application folder);
Open and read the picture file and store it in the proper collections (fs.files and fs.chunks in Mongo).
The problem happens between steps 2 and 3: saving the file to disk is not the best idea here, and because the disk write is still in progress there is a timing/latency issue when I attempt to store it in the DB.
I used the setTimeout JavaScript function to make it work, but I know that is bad. I would like to know a way to get some kind of file stream and store it directly in GridFS, or at least to make the current process more reliable:
The codes:
Get the picture from a URL and save it (disk step)
// processing image 'http://graph.facebook.com/' + user.uid + '/picture';
// facebook -> disk
var userProfilePhoto = user.provider + '_' + user.uid + '.png';
request(user.photoURL)
  .pipe(fs.createWriteStream(configApp.temp_files + userProfilePhoto));
Get the picture saved on disk and store it in GridFS
setTimeout(function() {
  mongoFile.saveFile(user, true, userProfilePhoto, mimeType.mime_png,
    function(err) {
      if (!err) {
        user.save();
      }
    }
  );
}, 5000);
Unfortunately I had to use setTimeout to make it work. Without it, Mongo just inserts into "fs.files" and skips "fs.chunks", because the file on disk is not ready yet - it seems it has not finished being written.
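For what it's worth, a less fragile alternative to the timeout is to wait for the write stream's 'finish' event before importing the file; a minimal sketch, reusing the request/configApp/mongoFile pieces from the snippets above:
// Sketch: start the GridFS import only once the temp file is fully flushed to disk,
// instead of guessing with setTimeout.
var userProfilePhoto = user.provider + '_' + user.uid + '.png';

request(user.photoURL)
  .pipe(fs.createWriteStream(configApp.temp_files + userProfilePhoto))
  .on('finish', function () {
    mongoFile.saveFile(user, true, userProfilePhoto, mimeType.mime_png, function (err) {
      if (!err) {
        user.save();
      }
    });
  });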

I did it!
I replaced the request plugin with http-get, so the callback fires when the file is ready and I can then store it in MongoDB.
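Another option that avoids the disk step entirely is to pipe the HTTP response straight into a GridFS write stream. A rough sketch, assuming a gridfs-stream instance called gfs (not part of the original code):
// Sketch: stream the remote picture directly into GridFS, no disk step.
var userProfilePhoto = user.provider + '_' + user.uid + '.png';
var writestream = gfs.createWriteStream({ filename: userProfilePhoto });

request(user.photoURL).pipe(writestream);

writestream.on('close', function (file) {
  // file._id is the new fs.files document; both fs.files and fs.chunks exist by now
  user.save();
});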

Related

Optimal method for nodejs to hand off an image from database to browser

The end result that I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file, and then I could just give the browser the URL to the file.
I also know I can hand off a base64 string to the browser so it can render the image.
My question is: which option is the most optimal, or best practice? Keep in mind that if I go with the stream method, I would have to check whether the image has changed since the last time I displayed it... and if it has changed, then I have to re-stream it out of the database.
I have been playing with oracledb for Node.js and was able to successfully extract one blob into a file, but I am also having trouble streaming multiple files.
This is a two-question post:
Which is the most optimal:
1. Send a Base64 string - I kind of like this method because I don't have to worry about streaming out the file and checking whether it has changed, since it comes straight from the database. My concern is whether the browser/Node.js can handle it. I know those strings can be very large, and I could also be sending more than one image at a time.
2. Stream the blobs into files.
The second question is how I can get multiple blobs out. Below is my code for streaming just one file; I found this example on GitHub, lobstream1.js:
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
  if (lob.type === oracledb.CLOB) {
    console.log('Writing a CLOB to ' + outFileName);
    lob.setEncoding('utf8');  // set the encoding so we get a 'string' not a 'buffer'
  } else {
    console.log('Writing a BLOB to ' + outFileName);
  }

  var errorHandled = false;

  lob.on('error', function(err) {
    console.log("lob.on 'error' event");
    if (!errorHandled) {
      errorHandled = true;
      lob.close(function() {
        return cb(err);
      });
    }
  });

  lob.on('end', function() {
    console.log("lob.on 'end' event");
  });

  lob.on('close', function() {
    // console.log("lob.on 'close' event");
    if (!errorHandled) {
      return cb(null);
    }
  });

  var outStream = fs.createWriteStream(outFileName);

  outStream.on('error', function(err) {
    console.log("outStream.on 'error' event");
    if (!errorHandled) {
      errorHandled = true;
      lob.close(function() {
        return cb(err);
      });
    }
  });

  // Switch into flowing mode and push the LOB to the file
  lob.pipe(outStream);
};
I fixed spooling out the images with this method; I did change dostream a bit.
for (var x = 0; x < result.rows.length; x++) {
  outputFileName = x + '.jpg';
  console.log(outputFileName);
  console.log(x);
  var lob = result.rows[x][0];
  dostream(lob, outputFileName);
  // cb(null, lob);
}
Thank you for any help.
Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
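As a rough illustration of the two-pool idea (connection details are placeholders, and the pool sizes are just the 5/5 split from the example above):
const oracledb = require('oracledb');

// Sketch: two pools via the pool cache (poolAlias); one reserved for LOB/image fetches.
async function initPools() {
  await oracledb.createPool({
    poolAlias: 'default',                   // used by plain oracledb.getConnection()
    user: 'app_user', password: 'app_pw',   // placeholders
    connectString: 'dbhost/orclpdb1',
    poolMin: 5, poolMax: 5
  });
  await oracledb.createPool({
    poolAlias: 'lobPool',                   // dedicated to image requests
    user: 'app_user', password: 'app_pw',
    connectString: 'dbhost/orclpdb1',
    poolMin: 5, poolMax: 5
  });
}

// In the image route: const conn = await oracledb.getConnection('lobPool');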
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
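A sketch of what that buffered approach could look like in an Express route (the table and column names here are made up for illustration, not taken from the question):
// Tell the driver to fetch BLOB columns as Node.js Buffers instead of Lob streams.
oracledb.fetchAsBuffer = [oracledb.BLOB];

app.get('/images/:id', async function (req, res, next) {
  let conn;
  try {
    conn = await oracledb.getConnection('lobPool');
    const result = await conn.execute(
      'select image_blob from product_images where id = :id',   // hypothetical table
      [req.params.id]
    );
    if (result.rows.length === 0) {
      return res.status(404).end();
    }
    res.set('Content-Type', 'image/jpeg');   // adjust to the stored mime type
    res.send(result.rows[0][0]);             // a Buffer, thanks to fetchAsBuffer
  } catch (err) {
    next(err);
  } finally {
    if (conn) {
      await conn.close();
    }
  }
});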
The biggest problem with this scenario is that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js, based on what makes the most sense.
Unless you do something like tiling or base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this adds extra IO - you will need to test and measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb, there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.

Save an image file into a database with node/request/sequelize/mysql

I'm trying to save a remote image file into a database, but I'm having some issues with it since I've never done it before.
I need to download the image and pass it along (with node-request), together with a few other properties, to another node API that saves it into a MySQL database (using Sequelize). I've managed to get some data to save, but when I download it manually and try to open it, it's not really usable and no image shows up.
I've tried a few things: getting the image with node-request, converting it to a base64 string (I read about that somewhere) and passing it along in a JSON payload, but that didn't work. I tried sending it as multipart, but that didn't work either. I haven't worked with streams/buffers/multipart much before, and never in Node. I've looked into node-request pipes, but I couldn't really figure out how to apply them in this context.
Here's what I currently have (it's part of an ES6 class, so there are no 'function' keywords; also, request is promisified):
function getImageData(imageUrl) {
  return request({
    url: imageUrl,
    encoding: null,
    json: false
  });
}

function createEntry(entry) {
  return getImageData(entry.image)
    .then((imageData) => {
      entry.image_src = imageData.toString('base64');

      var requestObject = {
        url: 'http://localhost:3000/api/entry',
        method: 'post',
        json: false,
        formData: entry
      };

      return request(requestObject);
    });
}
I'm almost 100% certain the problem is in this part, because the API just takes what it gets and hands it to Sequelize to put into the table, but I could be wrong. The image field is set as a longblob.
I'm sure it's something simple once I figure it out, but so far I'm stumped.
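One thing worth trying (a sketch, not from the original post): skip the base64 step and send the raw Buffer as a multipart file part, which request's formData option supports; the field names here are just for illustration:
function createEntry(entry) {
  return getImageData(entry.image)
    .then((imageData) => {                 // imageData is a Buffer because encoding: null
      return request({
        url: 'http://localhost:3000/api/entry',
        method: 'post',
        formData: {
          title: entry.title,              // whatever scalar fields the entry has
          image: {
            value: imageData,
            options: { filename: 'image.jpg', contentType: 'image/jpeg' }
          }
        }
      });
    });
}
The receiving API would then need multipart handling (multer or similar) to pull the file part out as a Buffer before handing it to Sequelize.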
This is not a direct answer to your question, but it is rarely necessary to actually store an image in the database. What is usually done is storing the image on storage like S3, on a CDN like CloudFront, or even just in the file system of a static file server, and then storing only the file name or some ID of the image in the actual database.
If there is any chance that you are going to serve those images to some clients then serving them from the database instead of a CDN or file system will be very inefficient. If you're not going to serve those images then there is still very little reason to actually put them in the database. It's not like you're going to query the database for specific contents of the image or sort the results on the particular serialization of an image format that you use.
The simplest thing you can do is save the images with a unique filename (either a random string, UUID or a key from your database) and keep the ID or filename in the database with other data that you need. If you need to serve it efficiently then consider using S3 or some CDN for that.
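A minimal sketch of that approach, assuming the uuid package and an uploads directory (both are illustrative, not from the question):
const fs = require('fs');
const path = require('path');
const { v4: uuidv4 } = require('uuid');

function saveRemoteImage(imageUrl) {
  const fileName = uuidv4() + '.jpg';                   // unique name to store in the DB row
  return getImageData(imageUrl)                         // Buffer, via the question's helper
    .then((imageData) => new Promise((resolve, reject) => {
      fs.writeFile(path.join('uploads', fileName), imageData, (err) => {
        if (err) return reject(err);
        resolve(fileName);                              // persist only this name with Sequelize
      });
    }));
}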

How to save the ID of a saved file using GridFS?

I am trying to save the ID of the file that I send via GridFS to my MongoDB (working with Mongoose). However, I can't seem to find out how to get the ID created in fs.files from code.
var writestream = gfs.createWriteStream({
  filename: req.file.originalname
});

fs.createReadStream(req.file.path).pipe(writestream);

writestream.on('close', function (file) {
  // do something with `file`
  console.log(file.filename + 'Written To DB');
});
I can't seem to save the ID of the file that I wrote via the write stream.
The file is created in the collections etc., but how do I get hold of its ID so that I can save it in one of my other MongoDB documents?
This is old and I think you already solved your problem.
If I understood correctly, you were trying to get the id of the saved file.
You can simply do this:
console.log(writestream.id);
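If you want the id only after everything (including fs.chunks) has been flushed, the 'close' callback from the question already hands you the file document; a small sketch, assuming a hypothetical Mongoose model called Upload:
writestream.on('close', function (file) {
  // file._id === writestream.id once the stream has closed
  Upload.create({ gridfsId: file._id, name: file.filename }, function (err, doc) {
    if (err) return console.error(err);
    console.log('Stored reference to GridFS file ' + doc.gridfsId);
  });
});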

How can I stream multiple remote images to a zip file and stream that to browser with ExpressJS?

I've got a small web app built in ExpressJs that allows people in our company to browse product information. A recent feature request requires that users be able to download batches of images (potentially hundreds at a time). These are stored on another server.
Ideally, I think I need to stream the batch of files into a zip file and stream that to the end user's browser as a download, all preferably without having to store the files on the server. The idea is to reduce load on the server as much as possible.
Is it possible to do this or do I need to look at another approach? I've been experimenting with the 'request' module for the initial download.
If anyone can point me in the right direction or recommend any NPM modules that might help it would be very much appreciated.
Thanks.
One useful module for this is archiver, but I'm sure there are others as well.
Here's an example program that shows:
how to retrieve a list of URLs (I'm using async to handle the requests, and also to limit the number of concurrent HTTP requests to 3);
how to add the responses for those URLs to a ZIP file;
how to stream the final ZIP file somewhere (in this case to stdout, but with Express you can pipe it to the response object).
Example:
var async = require('async');
var request = require('request');
var archiver = require('archiver');

function zipURLs(urls, outStream) {
  var zipArchive = archiver.create('zip');

  async.eachLimit(urls, 3, function(url, done) {
    var stream = request.get(url);

    stream.on('error', function(err) {
      return done(err);
    }).on('end', function() {
      return done();
    });

    // Use the last part of the URL as a filename within the ZIP archive.
    zipArchive.append(stream, { name : url.replace(/^.*\//, '') });
  }, function(err) {
    if (err) throw err;
    zipArchive.finalize().pipe(outStream);
  });
}

zipURLs([
  'http://example.com/image1.jpg',
  'http://example.com/image2.jpg',
  ...
], process.stdout);
Do note that although this doesn't require the image files to be locally stored, it does build the ZIP file entirely in memory. Perhaps there are other ZIP modules that would allow you to work around that, although (AFAIK) the ZIP file format isn't really great in terms of streaming, as it depends on metadata being appended to the end of the file.
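For the Express case mentioned above, a minimal sketch of a route that pipes the archive straight to the browser (the route path and URL list are placeholders):
app.get('/download-images', function (req, res) {
  var urls = [ /* the batch of remote image URLs for this request */ ];
  res.set('Content-Type', 'application/zip');
  res.set('Content-Disposition', 'attachment; filename="images.zip"');
  zipURLs(urls, res);   // the response is just another writable stream
});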

Write file directly from the request

I am having the following logic:
// Defense mechanism code is before the fs operations...
fs.readFile(req.files.image.path, function (err, data) {
  if (err) {
    // (error handling elided in the original snippet)
  } else {
    fs.writeFile(pathLocation, data, function (err) {
      if (err) {
        return res.send(err, 500);
      }
      // ...rest of the handler
    });
  }
});
As far as I can tell, I am doing fs.readFile and then fs.writeFile... the question is: can I avoid the first read? In other words, can I read directly from the stream (req.files.image.path)?
I am trying to optimize the code as much as possible.
req.files.image is not a stream. It has already been buffered and written to disk by middleware (presumably the connect bodyParser). You can just rename it to its final FS location via fs.rename. The readFile/writeFile is unnecessary.
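A minimal sketch of the rename approach, using pathLocation from the question (note that fs.rename only works within the same filesystem; otherwise copy and unlink instead):
fs.rename(req.files.image.path, pathLocation, function (err) {
  if (err) {
    return res.send(err, 500);
  }
  // file is now at its final location; continue (e.g. respond or upload to S3)
  res.send({ ok: true });
});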
You could avoid the write and the rename by truly streaming it to disk: remove the bodyParser middleware and directly do req.pipe(fs.createWriteStream(pathLocation)) in your route handler.
Note since you mentioned it's an image going to S3, you could actually stream straight from the browser, through your app server without hitting the filesystem, up to S3. This is technically possible, but it's brittle, so most production deployments do use a temporary file on the app server to increase reliability.
You can also upload straight from the browser to S3 if you like.
