Expressjs File Upload Customization - node.js

Express.js has the bodyParser middleware, which can handle file uploads and can even store them in a directory given in the options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without having to store it locally at all.
But the problem is validation of the file. How can I be sure that these files are all images? Checking the Content-Type isn't a good enough option because that can be faked. I want to know: is it OK if I do the validation after streaming the file to S3? I am asking from a security point of view.
After storing the image, I need to retrieve it to create thumbnails. How can I do that asynchronously, after sending the response for the file upload?

You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
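For the simultaneous-streaming route, here is a minimal sketch of teeing one incoming stream to both destinations. It assumes aws-sdk v2 and that you already have a readable stream for the file (e.g. from a multipart parser like busboy); the bucket name is a placeholder.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');
const { PassThrough } = require('stream');
const AWS = require('aws-sdk');

const s3 = new AWS.S3();

// Tee one readable stream into two destinations: a local temp file
// (for thumbnailing) and S3. A readable can be piped to any number
// of writables; each receives the same chunks.
function uploadAndKeepLocalCopy(fileStream, key) {
  const tmpPath = path.join(os.tmpdir(), path.basename(key));
  const toS3 = new PassThrough();

  fileStream.pipe(fs.createWriteStream(tmpPath));
  fileStream.pipe(toS3);

  return s3
    .upload({ Bucket: 'my-bucket', Key: key, Body: toS3 }) // placeholder bucket
    .promise()
    .then(() => tmpPath); // thumbnail from tmpPath; no need to re-download from S3
}
```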
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and validate the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code, and that is just as easily faked as the Content-Type header. It sounds like your security concerns are mostly driven by FUD as opposed to specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable, and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error or perhaps crash in an extreme case).
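If you do want the magic-number check, here is a minimal sketch that inspects the first chunk of the stream. The JPEG/PNG/GIF signatures below are the standard ones; as noted, this is a sanity check, not proof that the rest of the file is a valid image.

```js
// Returns true if the first chunk starts with a known image signature.
function looksLikeImage(firstChunk) {
  if (firstChunk.length < 4) return false;
  const jpeg = firstChunk[0] === 0xff && firstChunk[1] === 0xd8 && firstChunk[2] === 0xff;
  const png = firstChunk.slice(0, 4).equals(Buffer.from([0x89, 0x50, 0x4e, 0x47]));
  const gif = firstChunk.slice(0, 3).toString('ascii') === 'GIF';
  return jpeg || png || gif;
}
```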
Regarding validation: can I just try to create a thumbnail, and if I can't, conclude it's not a valid image and delete it? Is this way fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can, as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser, but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
And for the uploads part, can I send a response to the user and then carry on with the thumbnail creation and storing it in S3?
Yes, that is generally the best approach so the user knows their upload succeeded. Image processing is usually architected as a "work queue" model, where you just record that there's work to do and then proceed, and a separate process or processes take work off the queue and complete it.
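A minimal sketch of that work-queue model, using the Bull package (a Redis-backed queue) as one example; saveOriginalToS3 and makeThumbnails are hypothetical helpers standing in for your own code.

```js
const express = require('express');
const Queue = require('bull'); // Redis-backed job queue

const app = express();
const thumbnailQueue = new Queue('thumbnails', 'redis://127.0.0.1:6379');

// Upload route: record that there's work to do, then respond right away.
app.post('/upload', async (req, res) => {
  const key = await saveOriginalToS3(req); // hypothetical helper
  await thumbnailQueue.add({ key });       // enqueue; a worker picks it up later
  res.status(202).json({ status: 'processing', key });
});

// Worker (typically a separate process): take work off the queue and complete it.
thumbnailQueue.process(async (job) => {
  await makeThumbnails(job.data.key); // hypothetical helper
});

app.listen(3000);
```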

Related

Torn between a couple of practices for file upload to CDN

The platform I'm working on involves the client (ReactJS) and server (NodeJS, Express), of course.
The major feature of this platform involves users constantly uploading images.
Everything has been set up successfully using multer to receive images as form data on my API server, and now it's time to create an "image management system".
The main problem I'll be tackling is the unpredictable file size of user uploads. The files are images, and their size depends on how the user produces them, e.g. users taking pictures or taking screenshots.
The first solution is to determine a max file size and to transport the file to the API server using a compression algorithm. When the backend receives it successfully, the image is uploaded to a CDN (Cloudinary) and then the link is stored in the database along with other related records.
The second, which I'm strongly leaning towards, is shifting this "upload to CDN" system to the client side, making the client connect to Cloudinary directly, then grabbing the secure link and inserting it into the JSON that is sent to the server.
This eliminates the problem of grappling with file sizes, which is progress, but I'd like to know if it is good practice.
Restricting the file size is possible when using the Cloudinary Upload Widget for client-side uploads.
You can include the 'maxFileSize' parameter when calling the widget, setting its value to 500000 (value should be provided in bytes).
https://cloudinary.com/documentation/upload_widget
If the client tries to upload a larger file, they will get back an error stating that the max file size is exceeded, and the upload will fail.
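A minimal sketch of the widget call with 'maxFileSize', assuming the widget script is already loaded on the page; the cloud name and upload preset are placeholders.

```js
const widget = cloudinary.createUploadWidget(
  {
    cloudName: 'demo',               // placeholder
    uploadPreset: 'unsigned_preset', // placeholder
    maxFileSize: 500000,             // bytes; larger files are rejected client-side
  },
  (error, result) => {
    if (!error && result && result.event === 'success') {
      console.log('Uploaded:', result.info.secure_url);
    }
  }
);
widget.open();
```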
Alternatively, you can choose to limit the dimensions of the image, so if exceeded, instead of failing the upload with an error, it will automatically scale down to the given dimensions while retaining aspect ratio and the upload request will succeed.
However, this method doesn't guarantee the uploaded file will be below a certain desired size (e.g. 500 KB), as each image is different: one image scaled down to the given dimensions can result in a file size smaller than your threshold, while another may slightly exceed it.
This can be achieved using the limit cropping method as part of an incoming transformation.
https://cloudinary.com/documentation/image_transformations#limit
https://cloudinary.com/documentation/upload_images#incoming_transformations
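For the server-side equivalent, a minimal sketch of an incoming transformation with the 'limit' crop via the Cloudinary Node SDK; the dimensions are placeholders, and credentials are assumed to come from the CLOUDINARY_URL environment variable.

```js
const cloudinary = require('cloudinary').v2; // reads CLOUDINARY_URL from the environment

// 'limit' scales the image down to fit within 1600x1600 while keeping the
// aspect ratio; images already within bounds are stored untouched.
cloudinary.uploader
  .upload('input.jpg', {
    transformation: [{ width: 1600, height: 1600, crop: 'limit' }],
  })
  .then((result) => console.log(result.secure_url));
```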

NodeJS, how to handle image uploading with MongoDB?

I would like to know the best way to handle image uploading and saving the reference to the database. What I'm mostly interested in is what order you do the process in.
Should you upload the images first in the front-end (say Cloudinary), and then call the API with result links to the images and save it to the database?
Or should you upload the images to the server first, and upload them from the back-end and save the reference afterwards?
OR, should you do the image uploading after you save the record in the database and then update it once the images were uploaded?
It really depends on the resources, timeline, and number of images you need to upload daily.
So basically, if you have very few images to upload, you can upload the image to your server and then upload it to whatever cloud storage (S3, Cloudinary, ...) you are using. This is very easy to implement (you can find code snippets all over the internet), and you can securely keep the secret keys/credentials for your cloud platform on the server side.
But in my opinion, the best way of doing this is something like the following. I am taking user registration as an example:
Make a server call to get temporary credentials for uploading files to the cloud (generally, all the providers offer this functionality, e.g. STS/signed URLs in AWS); see the sketch after these steps.
The user fills out the form and selects the image on the client side. When the user clicks the submit button, make one call to save the user in the database and start the upload with those credentials. If possible, keep a predictable upload path, like /users/:userId or something like that; this highly depends on your use case.
When the upload finishes, make a server call for acknowledgment and store a flag in the database.
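A minimal sketch of the first step using S3 presigned URLs with aws-sdk v2; the bucket, route, and key layout are placeholders. The client then PUTs the file directly to the returned URL, so the bytes never pass through your server.

```js
const express = require('express');
const AWS = require('aws-sdk');

const app = express();
const s3 = new AWS.S3();

// Step 1: hand the client a short-lived URL it can upload to directly.
app.get('/upload-credentials/:userId', async (req, res) => {
  const key = `users/${req.params.userId}/avatar.jpg`; // predictable path, as suggested above
  const url = await s3.getSignedUrlPromise('putObject', {
    Bucket: 'my-bucket', // placeholder
    Key: key,
    Expires: 300,        // temporary: valid for 5 minutes
  });
  res.json({ url, key });
});

app.listen(3000);
```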
Now the advantages of this approach are:
You completely offload your server from handling file operations, which are pretty heavy and I/O-blocking, and you distribute that load across all the clients.
If you want to post-process the files after upload, you can easily integrate this with serverless platforms and offload that work there as well.
You can easily provide a retry mechanism to your users in case a file upload fails, and they won't need to refill the data, just upload the image/file again.
You don't need to expose the upload URL to the client permanently, as you are using temporary credentials.
If the significance of the images in your app is high, then ideally you should not complete the transaction until the image is saved. The approach should be to create the object in your code that you will eventually insert into MongoDB, start the upload of the image to the cloud, and then add the link to this object. Finally, insert this object into MongoDB in one go; do not make repeated calls. If anything fails before that, raise an error and catch the exception.
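A minimal sketch of that order of operations, assuming the Cloudinary Node SDK and the MongoDB driver; the collection and field names are placeholders.

```js
const cloudinary = require('cloudinary').v2; // reads CLOUDINARY_URL from the environment

// Upload first, then one insert with the link already attached.
// If the upload throws, nothing is ever written to the database.
async function createPost(db, localPath, fields) {
  const upload = await cloudinary.uploader.upload(localPath); // throws on failure
  const doc = { ...fields, imageUrl: upload.secure_url, createdAt: new Date() };
  await db.collection('posts').insertOne(doc); // single call, no follow-up update
  return doc;
}
```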
You can have many answers.
If you are working with big files, greater than 16 MB, please go with GridFS and multer
(changing the images to a different format and saving them to MongoDB).
If your files are actually less than 16 MB, please try using this Converter, which changes an image in JPEG/PNG format into a format suitable for saving to MongoDB; you can see this as an easy alternative to GridFS.
Please check this GitHub repo for more details.
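For the over-16 MB case, a minimal sketch of the GridFS + multer combination, using multer's memory storage and the driver's GridFSBucket; the connection string, database, bucket, and route names are placeholders.

```js
const express = require('express');
const multer = require('multer');
const { MongoClient, GridFSBucket } = require('mongodb');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

MongoClient.connect('mongodb://localhost:27017').then((client) => {
  const bucket = new GridFSBucket(client.db('app'), { bucketName: 'images' });

  app.post('/images', upload.single('image'), (req, res) => {
    // GridFS chunks the file, so the 16 MB document limit does not apply.
    const uploadStream = bucket.openUploadStream(req.file.originalname, {
      contentType: req.file.mimetype,
    });
    uploadStream.on('error', () => res.status(500).end());
    uploadStream.on('finish', () => res.json({ fileId: uploadStream.id }));
    uploadStream.end(req.file.buffer);
  });

  app.listen(3000);
});
```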

Upload to AWS S3 Best Practice

I'm creating a node backend application and I have an entity which can have files assigned.
I have the following options:
Make a request and upload the files as soon as the user selects them in the frontend form and assign them to the entity when the user makes the request to create / update it
Upload the files in the same request which creates / updates the entity
I was wondering if there is a best practice for this scenario. I can't really decide what's better.
This is one of those "it depends" answers, and it depends on how you are doing uploads and whether you plan to clean up your S3 buckets.
I'd suggest creating the entity first (option #2), because then you can store which S3 files belong to that entity. If you tried option #1, you might have untracked files (or some kind of staging area), which could require cleanup at some point in the future. (If your files are small, it may never matter, and you just eat that $0.03/GB fee each month : )
I've been seeing some web sites that look like option #1, where files are included in my form/document as I'm "editing". Pasting an image from my paste buffer is particularly sweet, and sometimes I see a placeholder of text while it is uploaded, showing the picture when complete. I think these "documents" are actually saved on their servers in some kind of draft status, so it might be your option #2 anyway. You could do the same: create a draft entity and finalize it later on (and then have a way to clean out drafts and their attachments at some point).
Also, depending on the bucket privacy you need to achieve, have a look at AWS Cognito to upload directly from the browser. You could save your server bandwidth, and reduce your request time, by not using your server as a pass-through.
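A minimal browser-side sketch of the Cognito route with aws-sdk v2 loaded in the page; the region, identity pool ID, and bucket are placeholders. The browser trades the pool ID for temporary credentials and uploads straight to S3.

```js
// Assumes the aws-sdk v2 browser build is loaded, exposing the AWS global.
AWS.config.region = 'us-east-1'; // placeholder
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:00000000-0000-0000-0000-000000000000', // placeholder
});

const s3 = new AWS.S3();

// Upload a File object (e.g. from an <input type="file">) directly to S3.
function uploadFile(file) {
  return s3
    .upload({ Bucket: 'my-bucket', Key: `uploads/${file.name}`, Body: file })
    .promise();
}
```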

Uploading and requesting images from Meteor app

I want to upload images from the client to the server. The client must see a list of all images he or she has and see the image itself (a thumbnail or something like that).
I saw people using two methods (generically speaking)
1- Upload image and save the binaries to MongoDB
2- Upload an image and move it to a folder, save the path somewhere (the classic method, and the one I implemented so far)
What are the pros and cons of each method, and how can I retrieve the data and show it in a template in each case (getting the path and writing it to the src attribute of an img tag, versus sending the binaries)?
Problems found so far: when I request foo.jpg (localhost:3000/uploads/foo.jpg) that I uploaded and the server moved to a known folder, my router (iron router) doesn't know how to deal with the request.
1- Upload image and save the binaries to MongoDB
Either you limit the file size to 16 MB and use only basic MongoDB, or you use GridFS and can store anything (no size limit). There are several pros and cons to using this method, but IMHO it is much better than storing on the file system:
Files don't touch your file system, they are piped to you database
You get back all the benefits of mongo and you can scale up without worries
Files are chunked and you can only send a specific byte range (useful for streaming or download resuming)
Files are accessed like any other mongo document, so you can use the allow/deny function, pub/sub, etc.
2- Upload an image and move it to a folder, save the path somewhere
(the classic method, and the one I implemented so far)
In this case, either you store everything in your public folder and make everything publicly accessible using the file names + paths, or you use a dedicated asset delivery system, such as an nginx server. Either way, you will be using something less secure and maintainable than the first option.
This being said, have a look at the file collection package. It is much simpler than collection-fs and will offer you everything you are looking for out of the box (including a file API, GridFS storage, resumable uploads, and many other things).
Problems found so far: when I request foo.jpg (localhost:3000/uploads/foo.jpg) that I uploaded and the server moved to a known folder, my router (iron router) fails to find how to deal with the request.
Does this path lead to your project's public folder, i.e. public/uploads/foo.jpg? If you put the file there, you should be able to request it.

Manage security on file upload to nodejs

I have an image upload view on my client (Ember.js) that sends the resized image to a Node.js REST API.
It works well, but it is easy for an expert user to force the upload of a non-resized image.
I would like to keep the resize process on the client, because this allows users to select heavyweight images that are resized locally and uploaded only after that, when they are lightweight.
If someone else uses something like this, I'm interested in how it is possible to make it as safe as possible.
A rule of thumb when developing web applications: never, ever trust any data coming from the client side; always check it on your server side!
Use authentication; this ensures that users are only allowed to upload data to their own account and cannot fiddle with others' files.
Add a special message-passing step between your server and client. A simple example would be:
i. send a POST API request first (containing the image information and targeted compressed size) to your server, indicating that your client is starting to compress the picture
ii. when uploading, add metadata to include the complete compressed image, and have your server check whether the uploaded image is within the accepted threshold, else discard it (see the sketch below)
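A minimal sketch of that server-side check, using the sharp package to re-inspect the upload; the size and dimension thresholds are placeholders for whatever your client-side resizer is supposed to produce.

```js
const sharp = require('sharp');

// Re-validate the upload on the server: the client's claims are ignored
// and the actual bytes are inspected.
async function validateResizedImage(buffer) {
  if (buffer.length > 500 * 1024) {
    throw new Error('File too large'); // placeholder threshold
  }
  const meta = await sharp(buffer).metadata(); // throws if not a decodable image
  if (meta.width > 1600 || meta.height > 1600) {
    throw new Error('Image was not resized on the client'); // placeholder dimensions
  }
  return meta;
}
```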
You could enhance the security of the message passing to make it more sophisticated!
This would be my simple security, anyone else got better solution? :)
Approaches here also work for file uploads. You can use a combination of checking:
the content-length header (i.e. req.headers['content-length'] > x), and/or
reading the stream size as it's being read by the server (i.e. req.on('data'))
If the stream data exceeds a certain size, you can respond accordingly. Check out something like Multer for file uploads, specifically the limits section. The best approach would probably be the second option.
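A minimal sketch of the Multer 'limits' option mentioned above; the 500 KB cap and route are placeholders.

```js
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({
  limits: { fileSize: 500 * 1024 }, // bytes; Multer aborts the upload beyond this
});

app.post('/upload', upload.single('image'), (req, res) => {
  res.json({ ok: true });
});

// When the cap is exceeded, Multer passes a MulterError with code
// 'LIMIT_FILE_SIZE' to the error handler.
app.use((err, req, res, next) => {
  if (err.code === 'LIMIT_FILE_SIZE') {
    return res.status(413).json({ error: 'File too large' });
  }
  next(err);
});

app.listen(3000);
```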
