What's the best way to save files on the backend with MongoDB? - node.js

I'm in the middle of writing a MEAN stack application and I'm a beginner programmer.
The user must be able to submit some strings from form fields together with a file, which would be an image or a PDF.
I've looked at Multer, with which I could store the path to the file together with the form data in MongoDB.
I also found something on GridFS.
What would be the best and easiest way to make this work?

As usual: it depends...
In one of the projects I am working on, we deliberately chose to use the database to store the files. One of the reasons was to avoid adding yet another security and configuration artefact to the solution by introducing file-management requirements.
Now, mind you, storing a file on disk is simply faster than storing it in your database. But if performance is not a really big issue, storing it in MongoDB is a perfectly valid option as well (after all, it is a NoSQL solution that uses binary serialization as one of its core components). You can choose to store files either in one of the normal collections or in GridFS.
As a rule of thumb: if the files in your solution do not exceed 16 MB, you can save them as (part of) a document in a normal collection. If they are larger than 16 MB, it is best to save them in GridFS. When saving in GridFS, though, you may have to play around with chunk sizes to find optimal performance.
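To illustrate the two options, here is a minimal sketch using the official mongodb Node.js driver; the connection string, database, collection, and size threshold handling are assumptions for illustration, not part of the original answer:

const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function storeFile(localPath) {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // assumed URI
  const db = client.db('myapp'); // assumed database name

  const stats = fs.statSync(localPath);
  if (stats.size < 16 * 1024 * 1024) {
    // Small file: embed the bytes directly in a document of a normal
    // collection (a Node Buffer is stored as BSON binary data).
    await db.collection('uploads').insertOne({
      filename: localPath,
      data: fs.readFileSync(localPath),
      uploadedAt: new Date()
    });
  } else {
    // Large file: stream it into GridFS, optionally tuning the chunk size.
    const bucket = new GridFSBucket(db, { chunkSizeBytes: 1024 * 1024 });
    await new Promise((resolve, reject) => {
      fs.createReadStream(localPath)
        .pipe(bucket.openUploadStream(localPath))
        .on('finish', resolve)
        .on('error', reject);
    });
  }
  await client.close();
}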
A blog post on GridFS can be found here.
In the past I posted something in another answer on binary vs. string-based serialization and the use of GridFS here. Note: that post is in C#, but that should not really pose a problem.
Hope this helps, and good luck honing your programming skills. The best way to learn is to ask and experiment (a lot).

You would never want to save the file as binary data in the DB, for obvious reasons. So if you want to store files on the local storage where your app runs, then multer is the package you need.
var path = require('path')
var multer = require('multer')

// Configure where uploaded files go and how they are named.
var storage = multer.diskStorage({
  destination: function (req, file, callback) {
    callback(null, './uploads')
  },
  filename: function (req, file, callback) {
    console.log(file)
    callback(null, file.fieldname + '-' + Date.now() + path.extname(file.originalname))
  }
})

app.post('/api/file', function (req, res) {
  var upload = multer({
    storage: storage
  }).single('userFile')
  upload(req, res, function (err) {
    if (err) {
      return res.status(500).end('Upload failed')
    }
    res.end('File is uploaded')
  })
})
You can change the code a bit as per your requirements and the naming conventions you want. I would suggest keeping the filename predictable so you won't even need to store the path in the DB, i.e. for user id 5 the file will be stored at /img/user/profile/5.jpg or something similar.
And you can also try storing files on cloud storage like AWS S3 or Google Cloud Storage if you have a lot of files.
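If you go the cloud-storage route, a minimal sketch of pushing a multer-saved file to S3 with the AWS SDK v3 could look like this; the region, bucket name, and key scheme are assumptions for illustration:

const fs = require('fs')
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3')

const s3 = new S3Client({ region: 'us-east-1' }) // assumed region

async function uploadToS3(localPath, userId) {
  // The key follows the predictable-name idea above: one profile image per user id.
  await s3.send(new PutObjectCommand({
    Bucket: 'my-app-uploads',                     // assumed bucket name
    Key: 'img/user/profile/' + userId + '.jpg',
    Body: fs.readFileSync(localPath)
  }))
}

With a predictable key like this, the DB only needs to know the user id; the path is derived, not stored.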

Related

How can I stay organized when changing asset locations used by express.js server?

I have been learning Express and everything seems to be clicking into place, so I am now trying to concentrate on organization and keeping things clean and intuitive. I am wondering if there is a convention for storing where certain files live so they can be accessed by my server. For example, say I have a directory structure like this:
root/
|--html
   |--html1.html
but I want to change the structure mid-project to:
root/
|--assets
   |--html
      |--html1.html
Or, for example, as the site scales I might decide to move my public directory to S3 or some other cloud storage instead.
Now I have to go back to everywhere that sends that file and change the path, or change the entire middleware to send an S3 file. My solution is below, and it works, but I am not sure if it is optimal or if there is a convention that might be better.
I can store everything in an asset module like this which has methods to return a file:
const fs = require('fs')

function getFile(path) {
  return fs.readFileSync(path)
}

module.exports = {
  html1: getFile(`${__dirname}/html/html1.html`),
  html2: getFile(`${__dirname}/html/html2.html`),
  image1: getFile(`${__dirname}/image/image1.html`)
}
and import it into my server as "assets" and use it like this:
app.get('/*', (req, res, next) => {
  res.set('Content-Type', 'text/html');
  res.send(assets.html1);
});
Now if I want to change the path, or even serve an asset from S3, I can simply change the logic/function/path etc. all in one place rather than going through all my routers manually.
Any feedback or guidance on where to look for more info is greatly appreciated!
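For comparison, a minimal sketch of the same idea with a single lookup function, so that swapping the disk for S3 (or any other backend) touches only one module; the names and paths here are hypothetical, not taken from the question:

// assets.js - the one place that knows where assets physically live
const fs = require('fs')
const path = require('path')

// Change this base (or the body of getAsset) to point somewhere else,
// e.g. a cache populated from S3, without touching any routes.
const ASSET_ROOT = path.join(__dirname, 'assets')

function getAsset(relativePath) {
  return fs.readFileSync(path.join(ASSET_ROOT, relativePath))
}

module.exports = { getAsset }

// usage in a route:
// const { getAsset } = require('./assets')
// res.send(getAsset('html/html1.html'))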

More scalable way to write this Node.js logic for retrieving and updating CSV data from Google Cloud Storage?

I'm creating my first Node.js REST API test project, which retrieves prizes. The logic is meant to do the following:
Retrieve a CSV from the Google Cloud Storage bucket associated with the project
Parse the CSV
Find the first row where the column "Claimed" isn't populated
Update the "Claimed" column to "claimed!"
Overwrite the data in the CSV file in Google Cloud Storage
Return the prize name associated with that row
The logic I have is currently working locally, but I'm wondering if there is a better, more scalable way to write the code. I'll be testing it at a 500/min rate limit for around 100k users per day and would eventually pass an external user-id to be stored in the "Claimed" column. There will be a total of 500k rows (prizes) in the CSV.
The code I'm using is below. Any suggestions for making it scalable would be much appreciated! Thank you in advance.
const csv = require('csv-parser');
const fs = require('fs');
const jsonexport = require('jsonexport');
const { Storage } = require('@google-cloud/storage');

const bucketName = 'MY-BUCKET';
const filename = 'MY-CSV';

const storage = new Storage({ keyFilename: "MY-KEY.json" });
const myBucket = storage.bucket(bucketName);
const file = myBucket.file(filename);

let dataArray = [];
let prizeName;

file.createReadStream()
  .pipe(csv())
  .on('data', function (data) {
    dataArray.push(data);
  })
  .on('end', function () {
    // Find the first unclaimed row, mark it, and write the whole CSV back.
    let prize = dataArray.find(element => element.Claimed == "");
    prizeName = prize.Prize_name;
    prize.Claimed = "claimed!";
    jsonexport(dataArray, function (err, transformedData) {
      if (err) return console.log(err);
      file.save(transformedData, function (err) {
        if (err) return console.log(err);
      });
    });
    return prizeName;
  });
Okay, I have some experience with this. Judging by the question, it's mostly about performance, and the basic code as written does work.
I would guess the bottleneck is fs and createReadStream. It works fine, but it's not ideal (this question shows why).
Actually it is asynchronous, but if you try to hold the whole file at once, you will flood your RAM:
const fs = require('fs');
const { promisify } = require('util');
const readFile = promisify(fs.readFile);
// inside an async function; path_ and file are placeholders
let contents = await readFile(`${path_}/${file}`, { encoding: 'utf8' });
So even if you split one file with 500k users into 10 files with 50k each, you still need to parse all 10, and since you process them effectively synchronously, your logic won't run 10x faster the way you might think.
By the way, if you have other code that does the same thing but you don't know how to measure its speed, use this:
console.time('benchmark_name')
/* Some action here */
console.timeEnd('benchmark_name')
I understand that my answer is not really an answer at all, but this is how I would solve the problem if I were you:
Drop the CSV - it will only cause problems, especially when you are dealing with 100K+ rows.
Try cloud infrastructure: if you need to store your data somewhere, use MongoDB Atlas (free tier), for example. And don't forget to add indexes on the relevant columns. Then you don't need fs and its streams at all.
Focusing on Mongo Atlas here is just one option; you could use Azure Cosmos DB or anything else, or even take an AWS / GCP free-tier VPS and run whatever database suits your needs. The point is: run away from Google Sheets/Drive.
Why should you avoid CSV / Google Drive?
They are not as bad as you may think, but ask yourself: if Google Drive / CSV were an efficient way to store data, why would people use databases instead of keeping all their info in one big *.csv file? I think that comparison makes the point.
So, back to our future DB.
For now, you only need to connect to your database and modify its values. You modify everything you need at once, via one query, instead of:
let prize = dataArray.find(element => element.Claimed == "");
You do not need to find and update every row one by one. That is exactly the scalability you were asking about.
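For instance, a minimal sketch of that single atomic query with the official mongodb driver; the connection string, collection name, and field names are assumptions for illustration:

const { MongoClient } = require('mongodb');

async function claimPrize(userId) {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // or your Atlas URI
  const prizes = client.db('myapp').collection('prizes');

  // An index on the field you filter by keeps the lookup fast at 500k rows.
  await prizes.createIndex({ claimed: 1 });

  // Atomically pick the first unclaimed prize and mark it in one operation,
  // so two concurrent requests can never claim the same row.
  const result = await prizes.findOneAndUpdate(
    { claimed: null },
    { $set: { claimed: userId, claimedAt: new Date() } },
    { returnDocument: 'after' }
  );

  await client.close();
  // Depending on the driver version the matched document is returned directly
  // or wrapped as result.value.
  return result && result.value ? result.value : result;
}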
The only things you'll need are AWS Lambda or MongoDB Stitch/Realm, a webhook, or an API to modify your data in the DB, or to feed it from your form (if your data should be updated via an HTTP form). By the way, if you are not yet ready to jump and leave Google Sheets behind, you could read this article first and try to connect Google Sheets to your DB (MongoDB in this case, but Google Sheets supports Google Apps Script, so connecting to any other DB won't be a big problem).
All these steps will help your app become more scalable, as you requested. Using a DB will also solve other problems such as data validation and so on.
On the flip side, in one of my projects I depend on a data source that publishes a big *.csv sheet. How big? 65K+ rows, and finding and updating every value in it costs 7-12 minutes of processing time. God, how I hate the guy who used CSV instead of API endpoints / DB connectors.

Upload small files via binData type in mongodb

I want to upload small files of size less than 16MB to MongoDB via the BinData type, which I gather is the default option for smaller files, whereas GridFS is ideally used for files exceeding 16MB in size.
Unfortunately I couldn't easily find proper documentation and examples of uploading files without GridFS in the MongoDB docs. The information I found about the BinData type is either quite limited or I failed to understand it. Going through several similar questions here (mostly Python-based) and elsewhere, I got some idea of how BinData is used, but I'm still unable to upload files successfully this way.
I need more information about uploading files via BinData, and especially the right way to initialise it, as I usually get "BinData is not a function" or "BinData is not defined" errors. Here's my current code, where I'm testing the functionality:
import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

export const Attachment = new Mongo.Collection("attachment");

let BinData = Mongo.BinData; // wrong initialisation

function createAttachment(fileData) {
  const data = new Buffer(fileData, "base64");
  Attachment.insert({ file: new BinData(0, data) });
}
Some helpful links:
BSON Types in Mongo
BSON spec
There are several Meteor packages that you can use for file uploading.
I have used this one myself: https://atmospherejs.com/vsivsi/file-collection
It can store your files in GridFS, and provides URLs for retrieving images etc.
Also:
https://atmospherejs.com/jalik/ufs
https://atmospherejs.com/ostrio/files
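If you specifically want the sub-16MB, no-GridFS route from the question: on the server you usually don't construct BinData yourself, because a Node Buffer inserted through the driver is stored as BSON binary (BinData) automatically. A minimal sketch with the plain mongodb driver, with assumed connection string, collection, and field names:

const fs = require('fs');
const { MongoClient } = require('mongodb');

async function createAttachment(filePath) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const attachments = client.db('myapp').collection('attachment');

  // The Buffer is serialized as BSON binary data (BinData) by the driver.
  await attachments.insertOne({
    filename: filePath,
    file: fs.readFileSync(filePath),   // Buffer -> BinData in MongoDB
    createdAt: new Date()
  });
  await client.close();
}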

Write file directly from the request

I have the following logic:
// Defense mechanism code runs before the fs operations...
fs.readFile(req.files.image.path, function (err, data) {
  if (err) {
    // handle the read error
  } else {
    fs.writeFile(pathLocation, data, function (err) {
      if (err) {
        return res.send(err, 500);
      }
      // ...continue (e.g. respond or upload to S3)
    });
  }
});
As far as I can tell, I am doing an fs.read and then an fs.write... The question is: can I avoid the first fs.read? In other words, can I read directly from the stream (req.files.image.path)?
I am trying to optimize the code as much as possible.
req.files.image is not a stream. It has already been buffered and written to disk via middleware (presumably the connect bodyParser). You can just rename it to its final FS location via fs.rename. The readFile/writeFile is unnecessary.
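Roughly, that rename would look like this (pathLocation as in the question):

const fs = require('fs');

// Move the already-buffered temp file to its final location in one step,
// instead of reading it into memory and writing it back out.
fs.rename(req.files.image.path, pathLocation, function (err) {
  if (err) return res.send(err, 500);
  res.send('File saved');
});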
You could avoid both the write and the rename by truly streaming it to disk. Remove the bodyParser middleware and do req.pipe(fs.createWriteStream(pathLocation)) directly in your route handler.
Note that since you mentioned the image is going to S3, you could actually stream straight from the browser, through your app server, up to S3 without ever hitting the filesystem. This is technically possible, but it's brittle, so most production deployments do use a temporary file on the app server to increase reliability.
You can also upload straight from the browser to S3 if you like.
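For the browser-to-S3 option, one common pattern is a presigned URL; here is a sketch with the AWS SDK v3, where the region, bucket name, key scheme, and expiry are assumptions:

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' }); // assumed region

// The server hands the browser a short-lived URL; the browser PUTs the file
// to S3 directly, so the image bytes never pass through your app server.
app.get('/api/upload-url', async function (req, res) {
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: 'my-app-uploads', Key: 'img/' + Date.now() + '.jpg' }),
    { expiresIn: 300 } // seconds
  );
  res.json({ url: url });
});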

Synchronous issue to save a file into GridFS - Node.JS/Express

I'm trying to save a profile photo from a social network such as Facebook into my "fs.files" and "fs.chunks" collections. I can achieve this, but the way I found to do it is not a proper one.
My steps are:
Make the user log on to Facebook using Passport;
Save the picture file on disk (an internal application folder);
Open and read the picture file and store it in the proper collections (files and chunks in Mongo).
The problem happens between steps 2 and 3, because saving the file on disk in this case is not the best idea, and a synchronization issue plus latency appears when I attempt to store it in the DB.
I used the setTimeout JavaScript function to make it work, but that is bad, I know. I would like to know a way to get some kind of file stream and store it directly in GridFS, or to make the actual process more efficient:
The code:
Get the picture from a URL and save it (disk step):
// processing image 'http://graph.facebook.com/' + user.uid + '/picture';
// facebook -> disk
var userProfilePhoto = user.provider + '_' + user.uid + '.png';
request(user.photoURL)
  .pipe(fs.createWriteStream(configApp.temp_files + userProfilePhoto));
Get the picture saved on disk and store it in GridFS:
setTimeout(function () {
  mongoFile.saveFile(user, true, userProfilePhoto, mimeType.mime_png,
    function (err) {
      if (!err) {
        user.save();
      }
    }
  );
}, 5000);
Unfortunately I had to use setTimeout to make it work; without it, Mongo just inserts into "fs.files" and skips "fs.chunks" because the file is not ready yet - it seems not to be fully saved yet.
I did it!
I replaced the request plugin with http-get, so when the file is ready I can store it in MongoDB.
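An alternative that keeps request is to wait for the write stream's finish event instead of a fixed timeout, so GridFS only receives the file once it is fully on disk. This is a sketch reusing the question's own mongoFile.saveFile helper, not the answer's http-get approach:

// facebook -> disk, then disk -> GridFS once the write has finished
var userProfilePhoto = user.provider + '_' + user.uid + '.png';
var localPath = configApp.temp_files + userProfilePhoto;

request(user.photoURL)
  .pipe(fs.createWriteStream(localPath))
  .on('finish', function () {
    // 'finish' fires only after all data has been flushed to the file.
    mongoFile.saveFile(user, true, userProfilePhoto, mimeType.mime_png,
      function (err) {
        if (!err) {
          user.save();
        }
      }
    );
  });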
