Torn between a couple practices for file upload to CDN - node.js

The platform I'm working on involves the client(ReactJS) and server(NodeJs, Express), of course.
The major feature of this platform involves users uploading images constantly.
Everything has been setup successfully using multer to receive images in formdata on my api server and now its time to create an "image management system".
The main problem I'll be tackling is the unpredictable file size of users. The files are images and the depend on the OS of the user i.e user taking pictures, users taking screenshots.
The first solution is to determine the max file size, and to transport it using a compression algorithm to the api server. When the backend receives it successfully, images are uploaded to a CDN (Cloudinary) and then the link is stored in the database along with other related records.
The second which im strongly leaning towards is shifting this "upload to CDN" system to the client side and make the client connect to cloudinary directly and after grab the secure link and insert into the JSON that would be sent to the server.
This eliminates the problem of grappling with file sizes which is progress, but I'll like to know if it is a good practice.

Restricting the file size is possible when using the Cloudinary Upload Widget for client-side uploads.
You can include the 'maxFileSize' parameter when calling the widget, setting its value to 500000 (value should be provided in bytes).
https://cloudinary.com/documentation/upload_widget
If the client will try and upload a larger file he/she will get back an error stating the max file size is exceeded and the upload will fail.
Alternatively, you can choose to limit the dimensions of the image, so if exceeded, instead of failing the upload with an error, it will automatically scale down to the given dimensions while retaining aspect ratio and the upload request will succeed.
However, using this method doesn't guarantee the uploaded file will be below a certain desired size (e.g. 500kb) as each image is different and one image that is scaled-down to given dimensions can result in a file size that is smaller than your threshold, while another image may slightly exceed it.
This can be achieved using the limit cropping method as part of an incoming transformation.
https://cloudinary.com/documentation/image_transformations#limit
https://cloudinary.com/documentation/upload_images#incoming_transformations

Related

NodeJS, how to handle image uploading with MongoDB?

I would like to know what is the best way to handle image uploading and saving the reference to the database. What I'm mostly interested is what order do you do the process in?
Should you upload the images first in the front-end (say Cloudinary), and then call the API with result links to the images and save it to the database?
Or should you upload the images to the server first, and upload them from the back-end and save the reference afterwards?
OR, should you do the image uploading after you save the record in the database and then update it once the images were uploaded?
It really depends on the resources, timeline, and number of images you need to upload daily.
So basically if you have very few images to upload then you can upload that image to your server then upload it to any cloud storage(s3, Cloudinary,..) you are using. As this will be very easy to implement(you can find code snippet over the internet) and you can securely maintain your secret keys/credential to your cloud platform on the server side.
But, according to me best way of doing this will be something like this. I am taking user registration as an example
Make server call to get a temporary credential to upload files on the cloud(Generally, all the providers give this functionality i.e. STS/Signed URL in AWS).
The user will fill up the form and select the image on the client side. When the user clicks the submit button make one call to save the user in the database and start upload with credentials. If possible keep a predictable path for upload. Like for user upload /users/:userId or something like that. this highly depends on your use case.
Now when upload finishes make a server call for acknowledgment and store some flag in the database.
Now advantages of this approach are:
You are completely offloading your server from handling file operations which are pretty heavy and I/O blocking and you are distributing that load to all clients.
If you want to post process the files after upload you can easily integrate this with serverless platforms and do that on there and again offload that.
You can easily provide retry mechanism to your users in case of file upload fails but they won't need to refill the data, just upload the image/file again
You don't need to expose the URL directly to the client for file upload as you are using temporary Creds.
If the significance of the images in your app is high then ideally, you should not complete the transaction until the image is saved. The approach should be to create an object in your code which you will eventually insert into mongodb, start upload of image to cloud and then add the link to this object. Finally then insert this object into mongodb in one go. Do not make repeated calls. Anything before that, raise an error and catch the exception
You can have many answers,
if you are working with big files greater than 16mb please go with gridfs and multer,
( changing the images to a different format and save them to mongoDB)
If your files are actually less than 16 mb, please try using this Converter that changes the image of format jpeg / png to a format of saving to mongodb, and you can see this as an easy alternative for gridfs ,
please check this github repo for more details..

Is this logic for a profile image uploader be scalable?

Would the following generic logic for implementing a profile upload feature be scalable in production?
1. Inside a web app user selects an image to upload
2. Image gets sent to the server where it gets stored in memory and validated using nodejs package called: multer
3. If the image file is valid, a unique name is generated for the file
4. The image is resized to 150 x 150 using nodejs package called: sharp
5. Image is streamed to google cloud storage
6. Once image is saved a public URL of the image is saved under the user’s profile inside of the database
7. The image ULR is sent back to the client and the image gets displayed
Languages used to implement the above
This would be implemented using:
firebase cloud functions running on Nodejs as a backend
Google Cloud Storage for holding images
firebase database for saving image url for the user uploading the image
My current concerns with this are:
Holding an image in memory while it gets validated and processed will potentially lead to servers clogging up during heavy load
How would this way of generating a unique name for the image scale?: uuidV4 + current date in milliseconds + last 5 characters of the image's original file name
Both multer and sharp are efficient enough for your needs:
Uploaded 2MB pictures downsized to 150x150 resolution.
Let's say you'll have 100,000 users. Every user uploads a picture that gets downsized to 20kb image. That's 1.907GB ~ 2GB of storage.
Google Cloud Storage:
One instance in Frankfurt
2GB regional storage
= $0.55 a year
Google Cloud Functions:
With 100k users, for simplicity, let's assume it will be evenly distributed and so we'll have 8334 per month users uploading their images. And let's be extremely pessimistic and say that one resize function will take 3s.
8334 resize operations per month
3s per function
Networking throughput 2MB per function
= $16.24 a year
So you're good to go. Happy coding!
This answer will get outdated eventually, so I'm including screenshot of google price calculator:

Manage security on file upload to nodejs

I have an image upload view on my client (ember.js) that send the resized image to nodejs rest api;
it works well but it is easy for someone expert to force upload of a non-resized image;
I would like to keep the resize process on the client because this allows users to select heavy-weight images, that are resized locally and uploaded only after that, when they are lightweight;
If someone else uses something like this, I'm interested on how it is possible to make this as safe as possible;
As a rule of thumb when developing web applications is never ever trust any data coming from the client side, always try to do a check in your server side!
Use authentication, this ensures that user only allow to upload data to their own account and not fiddling others files.
Add a special message passing between your server and client, a simple example would be
i. send a post API request first (that contains the image information and targeted compressed size) to your server indicating that your client is starting to compress the picture
ii. when uploading, add a metadata to include the complete compressed image, and check the uploaded image with your server if it is within the accepted threshold, else discard it
You could enhance the security of the message passing to be more complicated!
This would be my simple security, anyone else got better solution? :)
Approaches here also work for file uploads. You can use a combination of checking:
content-length header and/or (i.e. req.headers['content-length'] > x)
reading stream size as it's being read by server. (i.e req.on('data'))
If the stream data exceeds a certain size you can respond accordingly. Check out something like Multer for file uploads, specifically the limits section. Best approach would probably the second option.

Expressjs File Upload Customization

Expressjs has bodyParser middleware which can handle file-uploads and can even store them in a directory given in options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without having to store it locally at all.
But the problem is validation of the file. How can I be sure that these files are all images. Checking the content-type isn't good enough option coz that can be faked. I want to know is it ok if I do the validation after streaming the file to S3?? I am asking from the security point of view.
After storing the image, I need to retrieve it for creating thumbnails, How can I do it asynchronously after giving the response after file upload?
You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and validate for the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code and that is just as easily faked at the Content-Type header. Sounds like your security concerns are mostly driven by FUD as opposed to specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error or perhaps crash in an extreme case).
Regarding validation can I just try to create a thumbnail and if I can't then its not a valid image and delete it. Is this way fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
And for uploads part, can I send a response to the user and than carry on with the thumbnail creation and storing it in S3?
Yes, that is generally the best approach so the user knows their upload succeeded. Generally image processing is usually architected as a "work queue" model where you just record that there's work to do and then proceed and a separate process or processes take work off the queue and complete it.

Securing anonymous image uploads

Sites like www.ebayclassifieds.com let users upload images in order to see thumbnail previews and make image adjustments before posting content. Visitors are able to upload images to those sites anonymously without any authorization beforehand.
Can the same type of image previews be done for a smaller site that has bandwidth and disk space constraints? I'd guess that one would set up a cron job to periodically delete images that were anonymously uploaded. But what are other measures that can be taken so that bandwidth usage and disk space don't get out of hand, in case someone tries to spam your site with bogus image uploads?
Here are some ideas off the top of my head:
Use session state to keep track of uploaded files and delete them automatically when the session expires.
Limit uploads per session/visitor (ie. one per anonymous visitor)
Limit the maximum size of a file that can be uploaded.
Limit image types to only those that are compressed (ie. don't allow BMPs)
Scale the images down to a reasonable size as soon as they are uploaded. You probably don't need full size.
Besides madisonw's answer I would just add, use CAPTCHA for the upload as well, so users can't use automated tools to upload the images at large...

Resources