I am working on an application where a React app uploads files to a backend (Node.js, LoopBack 4 based), and the backend then sends those files to an AWS S3 bucket. The upload supports multiple files, so a single request can contain any number of files organized into groups, something like below:
POST Request
  Group 1
    Group Type: Text
    File 1
    File 2
    File 3
  Group 2
    Group Type: Text
    File 4
    File 5
    File 6
  Group 3
    Group Type: Text
    File 7
    File 8
    File 9
A single FormData request sends all of these details to the backend server. For now, everything is working fine for these files.
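For illustration, a rough sketch of how such a grouped payload might be assembled on the React side; the field names (groupType_0, files_0, ...) are hypothetical:

```js
// `groups` is assumed to look like [{ type: 'Text', files: [File, File, File] }, ...]
const formData = new FormData();
groups.forEach((group, i) => {
  formData.append(`groupType_${i}`, group.type);
  group.files.forEach((file) => formData.append(`files_${i}`, file, file.name));
});

// One multipart/form-data POST carries every group and file.
await fetch('/api/uploads', { method: 'POST', body: formData });
```

On the LoopBack/Multer side, something like upload.any() exposes all of this as req.files (each entry tagged with its fieldname) plus the group types in req.body, so the groups can be reassembled on the server.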
Now, as per the production requirements, these files can be very big: some are above 100 MB but below 500 MB. On average, it is quite possible that gigabytes of data will be uploaded in a single POST request.
This makes me rethink whether the whole solution is feasible, and whether the chance of failure in production will grow with each such request. The Node.js/Multer application itself does not impose limits (and I can raise those limits if required).
One idea is to make the whole upload process interactive by showing upload progress, and to let the user work on other pages while the files are still uploading (since these are React SPA pages, maybe Redux can handle this), something like the AWS console's create-service flows.
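As a rough sketch of that idea (the action types and store wiring here are hypothetical), the browser's XMLHttpRequest upload events can feed a Redux store, so any page in the SPA can keep rendering the current progress:

```js
// Upload the FormData while reporting progress to a Redux store.
// 'upload/progress' and 'upload/done' are made-up action types.
function uploadWithProgress(formData, store) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.upload.addEventListener('progress', (event) => {
      if (event.lengthComputable) {
        const percent = Math.round((event.loaded / event.total) * 100);
        store.dispatch({ type: 'upload/progress', payload: percent });
      }
    });
    xhr.addEventListener('load', () => {
      store.dispatch({ type: 'upload/done' });
      resolve(xhr.response);
    });
    xhr.addEventListener('error', reject);
    xhr.open('POST', '/api/uploads');
    xhr.send(formData);
  });
}
```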
Lastly, I am wondering whether, instead of uploading from the frontend, setting up an FTP server and syncing files to S3 on a schedule via backend scripts would be a better and safer idea. Later we could inform the user of sync completion via WebSockets.
Is there an industry-standard approach for building such a solution?
Related
I am working on a Node web application and need a form that lets users provide a URL pointing to a large (potentially 100 MB) CSV or XML file. On submission, the server (Express) would download the file using fetch, process it, and save it to my Postgres database.
The problem I am having is the size of the file. Responses from the API take minutes to return and I'm worried this solution is not optimal for a production application. I've also seen that many servers (including cloud based ones) have response size limits on them, which would obviously be exceeded here.
Is there a better way to do this than simply via a fetch request?
Thanks
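One common pattern at this file size is to stream the remote file row by row instead of buffering it, and to answer the HTTP request immediately while the import runs in the background. A minimal sketch, assuming Node 18+, the csv-parse and pg packages, and a hypothetical items table with name and value columns:

```js
const { Readable } = require('node:stream');
const { parse } = require('csv-parse');
const { Pool } = require('pg');

const pool = new Pool(); // connection details come from the PG* environment variables

async function importCsv(url) {
  const response = await fetch(url);              // Node 18+ global fetch
  const records = Readable.fromWeb(response.body) // web stream -> Node stream
    .pipe(parse({ columns: true }));              // emit each CSV row as an object

  for await (const row of records) {
    // Row-by-row (or batched) inserts keep memory usage flat regardless of file size.
    await pool.query('INSERT INTO items (name, value) VALUES ($1, $2)', [
      row.name,
      row.value,
    ]);
  }
}
```

The route handler can respond with 202 Accepted and a job id right away and let the client poll for completion, so no HTTP response has to stay open for minutes.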
I would like to know the best way to handle image uploads and saving the references to the database. What I'm mostly interested in is the order in which to do the process.
Should you upload the images first from the front end (say, to Cloudinary), then call the API with the resulting image links and save them to the database?
Or should you send the images to your server first, upload them to storage from the back end, and save the reference afterwards?
Or should you save the record in the database first and then update it once the images have been uploaded?
It really depends on the resources, timeline, and number of images you need to upload daily.
So basically, if you have very few images to upload, you can send the image to your server and then upload it to whatever cloud storage (S3, Cloudinary, ...) you are using. This is very easy to implement (you can find code snippets all over the internet), and you can securely keep your secret keys/credentials for the cloud platform on the server side.
But in my opinion, the best way of doing this is something like the following; I'll take user registration as an example:
1. Make a server call to get temporary credentials for uploading files to the cloud (generally, all providers offer this functionality, e.g. STS/signed URLs in AWS).
2. The user fills in the form and selects the image on the client side. When the user clicks the submit button, make one call to save the user in the database and start the upload with those credentials. If possible, keep a predictable upload path, e.g. /users/:userId for user uploads; this highly depends on your use case.
3. Now, when the upload finishes, make a server call for acknowledgment and store a flag in the database (a rough sketch of this flow follows these steps).
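A minimal sketch of steps 1 and 3 using the AWS SDK v3 presigner; the bucket name, routes, and key pattern are assumptions for illustration:

```js
// Server (Express/Node): hand out a short-lived presigned PUT URL.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' });

app.get('/uploads/sign', async (req, res) => {
  const key = `users/${req.query.userId}/${req.query.fileName}`;
  const command = new PutObjectCommand({ Bucket: 'my-bucket', Key: key });
  const url = await getSignedUrl(s3, command, { expiresIn: 300 }); // valid for 5 minutes
  res.json({ url, key });
});
```

On the client, the browser then sends the bytes straight to S3 and acknowledges afterwards:

```js
async function uploadDirect(file, userId) {
  const signRes = await fetch(`/uploads/sign?userId=${userId}&fileName=${encodeURIComponent(file.name)}`);
  const { url, key } = await signRes.json();
  await fetch(url, { method: 'PUT', body: file }); // goes client -> S3, not through your server
  await fetch('/uploads/ack', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key }),
  });
}
```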
The advantages of this approach are:
You completely offload file handling, which is heavy and I/O-blocking, from your server and distribute that load across the clients.
If you want to post-process the files after upload, you can easily integrate this with serverless platforms and offload that work as well.
You can easily offer a retry mechanism when a file upload fails; users won't need to re-enter their data, just upload the image/file again.
You don't expose a permanent upload URL to the client, since you are using temporary credentials.
If images are highly significant in your app, then ideally you should not complete the transaction until the image is saved. The approach should be: create the object in your code that you will eventually insert into MongoDB, start the image upload to the cloud, and then add the resulting link to that object. Finally, insert the object into MongoDB in one go; do not make repeated calls. If anything fails before that, raise an error and catch the exception.
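A small sketch of that idea, assuming the official MongoDB driver and a hypothetical uploadToCloud() helper for whatever storage service is in use:

```js
// Build the document, upload the image, attach the link, then write once.
async function createListing(collection, data, imageFile) {
  const listing = { title: data.title, createdAt: new Date() };
  try {
    listing.imageUrl = await uploadToCloud(imageFile); // hypothetical S3/Cloudinary helper
    await collection.insertOne(listing);               // single insert, no follow-up update
  } catch (err) {
    // Upload or insert failed: nothing half-finished was written to the database.
    throw new Error(`Could not save listing: ${err.message}`);
  }
}
```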
There can be many answers here.
If you are working with big files, greater than 16 MB, go with GridFS and Multer (a minimal sketch follows this answer)
(converting the images to a different format and saving them to MongoDB).
If your files are actually less than 16 MB, try using this converter, which changes a JPEG/PNG image into a format that can be saved directly to MongoDB; you can see it as an easy alternative to GridFS.
Please check this GitHub repo for more details.
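For the GridFS route mentioned above, a minimal sketch with Multer and the MongoDB driver's GridFSBucket; the route, field name, and db handle are assumed to already exist:

```js
const multer = require('multer');
const { GridFSBucket } = require('mongodb');

const upload = multer({ storage: multer.memoryStorage() }); // buffer the upload, then stream it into GridFS

app.post('/files', upload.single('file'), (req, res) => {
  const bucket = new GridFSBucket(db, { bucketName: 'uploads' });
  const uploadStream = bucket.openUploadStream(req.file.originalname, {
    metadata: { contentType: req.file.mimetype },
  });
  uploadStream.end(req.file.buffer, () => {
    res.json({ fileId: uploadStream.id }); // store this id as the reference to the file
  });
});
```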
Every search result says something about storing the images in the file system but storing the paths in the database, but I'm not sure exactly what "file system" means. Would that mean you have something like:
/public (assets)
  /js
  /css
  /img
/app (frontend)
/server (backend)
and you'd upload directly to that /public/img directory?
I remember trying something like that in the past with a Node.js app hosted on Heroku, and it wouldn't let me. I had to set up Amazon S3 and upload the images THERE, which leads to my confusion.
Is using something like Amazon S3 the usual practice or do people upload directly to the /img directory (assuming this is the "file system"?) and it just happened to be the case that Heroku doesn't allow this but other hosts do?
I'd characterize the pattern as "store the data in a blob storage service, store a pointer in your database". The uploaded file is the "blob" - once it has left the user's computer and filesystem, is it really a file anymore? :) On the server, a file system can store that "blob". S3 can store that blob. In the first case, you are storing a path. In the second case, you are storing the URL to the S3 object. A database could even store that blob (not at all recommended, though...)
In any case, the question to ask is: "what happens when I need two app servers to support my traffic?". Wherever that blob goes, both app servers need access to it.
In a data center under your control, there are many ways to share a filesystem across servers - network attached storage (NFS- or SMB-mounted volumes), or storage area networks (iSCSI, Fibre Channel). With more limited network/hardware configuration options in cloud-based Infrastructure/Platform-as-a-Service providers, the de facto standard is S3 because it is inexpensive, reliable, easy to use, and can completely offload serving the file from your servers.
For Heroku, though, you don't have much control over the file system. And, know that the file system for each of your dynos is "ephemeral" - it goes away when the dyno restarts. Which will happen when your app goes idle, or every 24 hours, whichever comes first. So that forces the choice a little.
Final point - S3 comes with the ancillary benefit of taking the burden of serving the blob off of your servers. You can also store files directly to S3 from the browser, without routing it through your app (see https://devcenter.heroku.com/articles/s3-upload-node). The benefit in both cases is that those downloads/uploads can take up lots of your application's precious time for stuff that's pretty rote.
Uploading directly to a host file system is generally not a best practice. This is one reason services like S3 are so popular.
If you're using the host file system and ever need more than one instance of a server, the file systems will grow out of sync. Imagine one user uploads 'foo.jpg' to server A (A/app/uploads) and another uploads 'bar.jpg' to server B (B/app/uploads). When either of these images is later requested, the request has a 50% chance of failing, depending on whether the load balancer routes the request to server A or server B.
There are several ancillary benefits to avoiding the host filesystem. For instance, you can set the filesystem serving your app to read-only for increased security. Files are a form of state, and stateless web servers allow you to do things like blow away one instance and deploy another instance to take over its work.
You might find this of help:
https://codeforgeek.com/2014/11/file-uploads-using-node-js/
I used multer in my Node.js server to handle uploads from the front end. Basically, I had an HTML form that would submit the image to the server, where it would be handled by multer. This led to it being saved in the file system (to answer your question concretely: yes, to something like the /img directory right in your project's file structure). My application is running on Heroku, and this feature works there as well. However, I would not recommend using the file system to store your images like this (I doubt you will have enough space for a large number of images/files); using AWS storage or a DB would be better.
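Roughly what that setup looks like; the destination folder and field name below are just examples:

```js
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ dest: 'public/img/' }); // multer writes uploads to the local file system

app.post('/upload', upload.single('image'), (req, res) => {
  // req.file.path is where multer stored the file, under a generated filename
  res.json({ path: req.file.path });
});

app.listen(3000);
```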
My Heroku web app has a feature to download images from S3. It works like this:
There is one endpoint (A) that requests the download of an array of images and returns a task id.
Endpoint A downloads those images to my app's tmp folder on Heroku, and when all images are downloaded, a zip file is created.
While images may still be downloading, the web client calls another endpoint (B) with the task id from point 1. This second endpoint checks how many images have already been downloaded and returns a progress percentage. Once the zip has been created, it "returns" the zip file and the images are downloaded by the client.
This approach worked fine on Heroku with 1 dyno. Unfortunately, after scaling to 2 dynos, we realised that it no longer works. The reason is that Heroku dynos don't share the same file system, and endpoints A and B may be handled by different dynos. Therefore, the dyno serving endpoint B doesn't find any files.
Is there an easy way to make my approach work with multiple dynos?
If not, how should I implement the feature described? (downloading multiple images from S3 in a zip file)
You could create a second S3 bucket and push the zip file to that bucket once the downloads are finished. Then you can redirect the client to download the zip file directly from S3.
Then set up a periodic process to clean out anything old in that S3 bucket.
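A sketch of that suggestion, assuming the archiver package and the AWS SDK v3; the bucket name, key pattern, and tmp path are placeholders:

```js
const archiver = require('archiver');
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' });

async function publishZip(taskId) {
  // Zip everything that was downloaded into this task's tmp folder on the dyno.
  const archive = archiver('zip');
  archive.directory(`/tmp/${taskId}`, false);
  archive.finalize();

  // Stream the archive into the second bucket without writing the zip to local disk.
  await new Upload({
    client: s3,
    params: { Bucket: 'my-zips-bucket', Key: `${taskId}.zip`, Body: archive },
  }).done();

  // Endpoint B can now redirect the browser to a short-lived link served by S3 itself.
  return getSignedUrl(
    s3,
    new GetObjectCommand({ Bucket: 'my-zips-bucket', Key: `${taskId}.zip` }),
    { expiresIn: 3600 }
  );
}
```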
I think the solution is here; it may help:
http://technomile.github.io/wordpress/setup.html
I am building a MEAN application (Angular + Node + Express + Mongo).
In this app, users can upload a limited number of pictures (let's say 5).
I really want to avoid storing too much data on my server.
So I am looking for a module that lets users upload the images to a service such as Picasa or ImageShack... The service should be transparent to the user.
When that's done, I save the picture URL in my DB so I can retrieve and display the pictures easily.
Do you know of such a module, or a tutorial for doing that? Does it even exist?
I have been looking, but it doesn't seem to exist.
The easiest way to build a file upload service with AngularJS as the front end and Node.js as the backend is to use jQuery File Upload for AngularJS, which can be found here.
It makes use of a directive that you can use to upload your file.
You need to specify a route to which the uploaded file should be POSTed.
In that route handler, in your Node.js server, you can then post the file to the external image hosting service. You can write this yourself, or use the Node.js modules for the respective hosts (if they exist).
I found a service that does this:
http://cloudinary.com/
With a Node.js integration:
http://cloudinary.com/documentation/node_integration
It seems perfect (free up to 500 MB and 50,000 pictures, more than enough for my needs).
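For completeness, a minimal sketch of that Node.js integration, assuming credentials are supplied via the CLOUDINARY_URL environment variable and a Multer-handled form field named picture:

```js
const express = require('express');
const multer = require('multer');
const cloudinary = require('cloudinary').v2;

cloudinary.config(); // picks up CLOUDINARY_URL from the environment

const app = express();
const upload = multer({ dest: '/tmp' }); // temporary local copy before pushing to Cloudinary

app.post('/pictures', upload.single('picture'), async (req, res) => {
  const result = await cloudinary.uploader.upload(req.file.path);
  // Persist result.secure_url in the database; that link is all the app needs later.
  res.json({ url: result.secure_url });
});

app.listen(3000);
```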