I have an image upload view on my client (ember.js) that send the resized image to nodejs rest api;
it works well but it is easy for someone expert to force upload of a non-resized image;
I would like to keep the resize process on the client because this allows users to select heavy-weight images, that are resized locally and uploaded only after that, when they are lightweight;
If someone else uses something like this, I'm interested on how it is possible to make this as safe as possible;
As a rule of thumb when developing web applications is never ever trust any data coming from the client side, always try to do a check in your server side!
Use authentication, this ensures that user only allow to upload data to their own account and not fiddling others files.
Add a special message passing between your server and client, a simple example would be
i. send a post API request first (that contains the image information and targeted compressed size) to your server indicating that your client is starting to compress the picture
ii. when uploading, add a metadata to include the complete compressed image, and check the uploaded image with your server if it is within the accepted threshold, else discard it
You could enhance the security of the message passing to be more complicated!
This would be my simple security, anyone else got better solution? :)
Approaches here also work for file uploads. You can use a combination of checking:
content-length header and/or (i.e. req.headers['content-length'] > x)
reading stream size as it's being read by server. (i.e req.on('data'))
If the stream data exceeds a certain size you can respond accordingly. Check out something like Multer for file uploads, specifically the limits section. Best approach would probably the second option.
Related
The platform I'm working on involves the client(ReactJS) and server(NodeJs, Express), of course.
The major feature of this platform involves users uploading images constantly.
Everything has been setup successfully using multer to receive images in formdata on my api server and now its time to create an "image management system".
The main problem I'll be tackling is the unpredictable file size of users. The files are images and the depend on the OS of the user i.e user taking pictures, users taking screenshots.
The first solution is to determine the max file size, and to transport it using a compression algorithm to the api server. When the backend receives it successfully, images are uploaded to a CDN (Cloudinary) and then the link is stored in the database along with other related records.
The second which im strongly leaning towards is shifting this "upload to CDN" system to the client side and make the client connect to cloudinary directly and after grab the secure link and insert into the JSON that would be sent to the server.
This eliminates the problem of grappling with file sizes which is progress, but I'll like to know if it is a good practice.
Restricting the file size is possible when using the Cloudinary Upload Widget for client-side uploads.
You can include the 'maxFileSize' parameter when calling the widget, setting its value to 500000 (value should be provided in bytes).
https://cloudinary.com/documentation/upload_widget
If the client will try and upload a larger file he/she will get back an error stating the max file size is exceeded and the upload will fail.
Alternatively, you can choose to limit the dimensions of the image, so if exceeded, instead of failing the upload with an error, it will automatically scale down to the given dimensions while retaining aspect ratio and the upload request will succeed.
However, using this method doesn't guarantee the uploaded file will be below a certain desired size (e.g. 500kb) as each image is different and one image that is scaled-down to given dimensions can result in a file size that is smaller than your threshold, while another image may slightly exceed it.
This can be achieved using the limit cropping method as part of an incoming transformation.
https://cloudinary.com/documentation/image_transformations#limit
https://cloudinary.com/documentation/upload_images#incoming_transformations
Using Twitter as an example: Twitter has an endpoint for uploading file data. https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload-append
Can anyone provide an example of a real HTTP message containing, for example, image file data, showing how it is supposed to be structured? I'm fairly sure Twitter's documentation is nonsense, as their "example request" is the following:
POST https://upload.twitter.com/1.1/media/upload.json?command=APPEND&media_id=123&segment_index=2&media_data=123
Is the media_data really supposed to go in the URL? What if you have raw binary media data? Would it go in the body? How is the REST service to know how the data is encoded?
You're looking at the chunked uploader - it's intended for sending large files, breaking them into chunks, so a network failure doesn't mean you have to re-upload a 100 MB .mp4. It is, as a result, fairly complicated. (Side note: The file data goes in the request body, not the URL as a GET parameter... as indicated by "Requests should be multipart/form-data POST format.")
There's a far less complicated unchunked uploader that'll be easier to work with if you're just uploading a regular old image.
All of this gets a lot easier if you use one of Twitter's recommended libraries for your language.
to upload a file, you need to send it in a form, in node.js server you save accept the incoming file using formidable.
You can also use express-fileupload or multer
I would like to know what is the best way to handle image uploading and saving the reference to the database. What I'm mostly interested is what order do you do the process in?
Should you upload the images first in the front-end (say Cloudinary), and then call the API with result links to the images and save it to the database?
Or should you upload the images to the server first, and upload them from the back-end and save the reference afterwards?
OR, should you do the image uploading after you save the record in the database and then update it once the images were uploaded?
It really depends on the resources, timeline, and number of images you need to upload daily.
So basically if you have very few images to upload then you can upload that image to your server then upload it to any cloud storage(s3, Cloudinary,..) you are using. As this will be very easy to implement(you can find code snippet over the internet) and you can securely maintain your secret keys/credential to your cloud platform on the server side.
But, according to me best way of doing this will be something like this. I am taking user registration as an example
Make server call to get a temporary credential to upload files on the cloud(Generally, all the providers give this functionality i.e. STS/Signed URL in AWS).
The user will fill up the form and select the image on the client side. When the user clicks the submit button make one call to save the user in the database and start upload with credentials. If possible keep a predictable path for upload. Like for user upload /users/:userId or something like that. this highly depends on your use case.
Now when upload finishes make a server call for acknowledgment and store some flag in the database.
Now advantages of this approach are:
You are completely offloading your server from handling file operations which are pretty heavy and I/O blocking and you are distributing that load to all clients.
If you want to post process the files after upload you can easily integrate this with serverless platforms and do that on there and again offload that.
You can easily provide retry mechanism to your users in case of file upload fails but they won't need to refill the data, just upload the image/file again
You don't need to expose the URL directly to the client for file upload as you are using temporary Creds.
If the significance of the images in your app is high then ideally, you should not complete the transaction until the image is saved. The approach should be to create an object in your code which you will eventually insert into mongodb, start upload of image to cloud and then add the link to this object. Finally then insert this object into mongodb in one go. Do not make repeated calls. Anything before that, raise an error and catch the exception
You can have many answers,
if you are working with big files greater than 16mb please go with gridfs and multer,
( changing the images to a different format and save them to mongoDB)
If your files are actually less than 16 mb, please try using this Converter that changes the image of format jpeg / png to a format of saving to mongodb, and you can see this as an easy alternative for gridfs ,
please check this github repo for more details..
I'm trying to limit data usage when serving images to ensure the user isn't loading bloated pages on mobile while still maintaining the ability to serve larger images on desktop.
I was looking at Twitter and noticed they append :large to the end of the url
e.g. https://pbs.twimg.com/media/CDX2lmOWMAIZPw9.jpg:large
I'm really just curious how this request is being handled, if you go directly to that link there is no scripts on the page so I'm assuming it's done serverside.
Can this be done using something like Restify/Express on a Node instance? More than anything I'm really just curious how it is done.
Yes, it can be done using Express in Node. However, it can't be done using express.static() since it is not a static request. Rather, the receiving function must handle the request by parsing the querystring (or whatever :large is) in order to dynamically respond with the appropriate image.
Generally the images will have already been pre-generated during the user-upload phase for a set of varying sizes (e.g. small, medium, large, original), and the function checks the querystring to determine which static request to respond with.
That is a much higher-performing solution than generating the appropriately-sized image server-side on every request from the original image, though sometimes that dynamic approach is necessary if the server is required to generate a non-finite set of image sizes.
Expressjs has bodyParser middleware which can handle file-uploads and can even store them in a directory given in options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without having to store it locally at all.
But the problem is validation of the file. How can I be sure that these files are all images. Checking the content-type isn't good enough option coz that can be faked. I want to know is it ok if I do the validation after streaming the file to S3?? I am asking from the security point of view.
After storing the image, I need to retrieve it for creating thumbnails, How can I do it asynchronously after giving the response after file upload?
You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and validate for the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code and that is just as easily faked at the Content-Type header. Sounds like your security concerns are mostly driven by FUD as opposed to specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error or perhaps crash in an extreme case).
Regarding validation can I just try to create a thumbnail and if I can't then its not a valid image and delete it. Is this way fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
And for uploads part, can I send a response to the user and than carry on with the thumbnail creation and storing it in S3?
Yes, that is generally the best approach so the user knows their upload succeeded. Generally image processing is usually architected as a "work queue" model where you just record that there's work to do and then proceed and a separate process or processes take work off the queue and complete it.