Owin Selfhost: deal with large files

I found this article:
http://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/
It explains how to set the TransferMode when self-hosting with System.Web.Http.SelfHost.
Is there a similar way to set the TransferMode to Streaming when self-hosting with OWIN instead?

As far as I can tell, Microsoft's OWIN server already streams all uploads and downloads. How you handle this data in your Web API (or equivalent) determines whether it gets fully buffered or stays a stream.
For serving file downloads, make sure the response content is a StreamContent. For file uploads, it depends on what format the message body is in; for example, this answer covers streaming multipart content to disk.

How do I upload a file to a REST endpoint?

Using Twitter as an example: Twitter has an endpoint for uploading file data. https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload-append
Can anyone provide an example of a real HTTP message containing, for example, image file data, showing how it is supposed to be structured? I'm fairly sure Twitter's documentation is nonsense, as their "example request" is the following:
POST https://upload.twitter.com/1.1/media/upload.json?command=APPEND&media_id=123&segment_index=2&media_data=123
Is the media_data really supposed to go in the URL? What if you have raw binary media data? Would it go in the body? How is the REST service to know how the data is encoded?
You're looking at the chunked uploader - it's intended for sending large files, breaking them into chunks, so a network failure doesn't mean you have to re-upload a 100 MB .mp4. It is, as a result, fairly complicated. (Side note: The file data goes in the request body, not the URL as a GET parameter... as indicated by "Requests should be multipart/form-data POST format.")
There's a far less complicated unchunked uploader that'll be easier to work with if you're just uploading a regular old image.
All of this gets a lot easier if you use one of Twitter's recommended libraries for your language.
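For reference, here is a rough sketch (not taken verbatim from Twitter's docs) of what an APPEND request can look like when built with the form-data npm package in Node.js: only command/media_id/segment_index are plain form fields, and the binary chunk is a file part in the multipart body. The media_id and chunk path are placeholders, and the OAuth signing a real request needs is omitted.

    // Sketch: a multipart/form-data APPEND request via the "form-data" package.
    // Real requests also need OAuth 1.0a headers, omitted here.
    const FormData = require('form-data');
    const fs = require('fs');

    const form = new FormData();
    form.append('command', 'APPEND');
    form.append('media_id', '123');      // placeholder: returned by the INIT call
    form.append('segment_index', '0');
    // The raw binary goes in the body as a file part, not in the URL:
    form.append('media', fs.createReadStream('./chunk0.bin')); // placeholder path

    form.submit('https://upload.twitter.com/1.1/media/upload.json', (err, res) => {
      if (err) throw err;
      console.log('status:', res.statusCode);
      res.resume(); // drain the response
    });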
To upload a file, you need to send it in a form; on a Node.js server you can accept the incoming file using formidable.
You can also use express-fileupload or multer; a minimal multer route is sketched below.
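In the sketch, the field name "file" and the uploads directory are arbitrary choices:

    const express = require('express');
    const multer = require('multer');

    const app = express();
    const upload = multer({ dest: 'uploads/' }); // files are written to ./uploads

    // Expects a multipart/form-data POST with the file in a field named "file"
    app.post('/upload', upload.single('file'), (req, res) => {
      // multer fills in req.file: path, originalname, mimetype, size, ...
      res.json({ name: req.file.originalname, size: req.file.size });
    });

    app.listen(3000);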

Is there a way to know how large a file being streamed via http is?

I'm sending a fairly large file via express in a little node app of mine. As the file is so large, I send the file in a stream.
On the receiving node app that is downloading the file, I would really like the ability to track the progress of the download.
This SO question uses the 'content-length' header to determine how big a file is. However, after a little more research, it seems that when streaming a file the content-length cannot be known ahead of time.
Is there something obvious I'm missing here? I'm a bit surprised that there is no way to know how big a file being streamed is before it's finished downloading...
If the sender knows the size up front, the header can still be set: for example, when you upload via multipart/form-data from a file on disk, the Content-Length is included even though the body is streamed. It is only genuinely unknown when the content is generated on the fly, in which case chunked transfer encoding is used instead.
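To make that concrete, here is a minimal sketch of both sides in plain Node/Express; the file path and port are placeholders. The sender knows the size from fs.statSync, sets Content-Length, and still streams the body; the receiver reads the header and counts bytes as they arrive.

    const express = require('express');
    const fs = require('fs');
    const http = require('http');

    // Sending side: the file is on disk, so its size is known up front.
    const app = express();
    app.get('/download', (req, res) => {
      const path = './big-file.zip'; // placeholder
      res.setHeader('Content-Length', fs.statSync(path).size);
      fs.createReadStream(path).pipe(res); // body is still streamed
    });

    app.listen(3000, () => {
      // Receiving side: track progress against the declared total.
      http.get('http://localhost:3000/download', (res) => {
        const total = Number(res.headers['content-length']);
        let received = 0;
        res.on('data', (chunk) => {
          received += chunk.length;
          console.log(`${((received / total) * 100).toFixed(1)}%`);
        });
        res.on('end', () => console.log('done'));
      });
    });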

Creating an HTML or PDF "file" in memory and streaming it in Node.js

I have a need to create a pdf or html document within a Node.js express API which then sends that document over HTTP to an API managing our CMS.
So, functionally, I would like to create the document and POST it as part of a multipart/form-data upload request to an external service.
I see how to do this if, after creating the file, I turn around and write it to disk. From that point I can open a read stream on the file's path to build the POST request with it.
However, I'm wondering how I can perform this action without writing the file to disk and then reading it back into a read stream. It seems I should be able to accomplish this without that IO.
Anybody able to point me to a good example or library that does something along these lines?
You can extend Writable and/or Readable streams. At first glance, this library does what you need, in the same way: by extending the built-in streams.
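As a sketch of skipping the disk entirely: the form-data npm package accepts Buffers (and streams) directly, so you can build the document in memory and POST it as a multipart upload. The CMS URL, field name, and file name below are placeholders.

    const FormData = require('form-data');

    // Build the "file" in memory; here it's just an HTML string in a Buffer.
    const html = '<html><body><h1>Report</h1></body></html>';
    const buffer = Buffer.from(html, 'utf8');

    const form = new FormData();
    // filename/contentType make the part look like a regular file upload
    form.append('file', buffer, {
      filename: 'report.html',
      contentType: 'text/html',
    });

    form.submit('http://localhost:4000/cms/upload', (err, res) => { // placeholder URL
      if (err) throw err;
      console.log('CMS responded with', res.statusCode);
      res.resume();
    });

The same append call accepts a readable stream, so a generator that emits a stream (a PDF library, say) can be wired in without a temporary file either.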

Manage security on file upload to nodejs

I have an image upload view in my client (Ember.js) that sends the resized image to a Node.js REST API.
It works well, but it is easy for an expert to force the upload of a non-resized image.
I would like to keep the resize process on the client, because that allows users to select heavyweight images which are resized locally and uploaded only after that, when they are lightweight.
If someone else uses something like this, I'm interested in how it is possible to make it as safe as possible.
A rule of thumb when developing web applications: never, ever trust any data coming from the client side; always re-check it on the server side!
Use authentication; this ensures that users are only allowed to upload data to their own account and can't fiddle with other users' files.
Add some message passing between your server and client. A simple example would be:
i. first send a POST API request (containing the image information and the targeted compressed size) to your server, indicating that your client is starting to compress the picture
ii. when uploading, add metadata to include the complete compressed image, and have your server check whether the uploaded image is within the accepted threshold, else discard it (a sketch of such a check follows below)
You could make the message passing more elaborate to strengthen the security further!
That would be my simple take on security; anyone else got a better solution? :)
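A sketch of the server-side check in step ii, assuming multer for the upload and sharp for reading image metadata (the 500 KB and 1024 px thresholds are made-up examples):

    const express = require('express');
    const multer = require('multer');
    const sharp = require('sharp');

    const app = express();
    const upload = multer({
      storage: multer.memoryStorage(),
      limits: { fileSize: 500 * 1024 }, // hard cap: oversized bodies are rejected
    });

    app.post('/upload', upload.single('image'), async (req, res) => {
      try {
        // Never trust the client's claim that it resized; measure it yourself.
        const { width, height } = await sharp(req.file.buffer).metadata();
        if (width > 1024 || height > 1024) {
          return res.status(400).send('image was not resized, discarding');
        }
        // ...persist req.file.buffer here...
        res.send('ok');
      } catch (err) {
        res.status(400).send('not a valid image');
      }
    });

    app.listen(3000);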
The approaches here also work for file uploads. You can use a combination of checks:
the Content-Length header (i.e. req.headers['content-length'] > x), and/or
reading the stream's size as the server receives it (i.e. req.on('data'))
If the stream data exceeds a certain size, you can respond accordingly. Check out something like Multer for file uploads, specifically the limits section. The best approach would probably be the second option, sketched below.
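A sketch of that second option with a plain Node HTTP server (the 5 MB limit is an arbitrary example):

    const http = require('http');

    const MAX_BYTES = 5 * 1024 * 1024; // example limit

    http.createServer((req, res) => {
      // First check: the declared Content-Length (may be absent, or a lie).
      if (Number(req.headers['content-length']) > MAX_BYTES) {
        res.writeHead(413);
        res.end('too large');
        req.destroy();
        return;
      }

      // Second check: count the bytes that actually arrive.
      let received = 0;
      let aborted = false;
      req.on('data', (chunk) => {
        received += chunk.length;
        if (received > MAX_BYTES && !aborted) {
          aborted = true;
          res.writeHead(413);
          res.end('too large');
          req.destroy(); // abort the transfer instead of buffering the rest
        }
      });
      req.on('end', () => {
        if (!aborted) res.end('ok');
      });
    }).listen(3000);

Multer's limits option does essentially this for multipart uploads.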

Expressjs File Upload Customization

Expressjs has the bodyParser middleware, which can handle file uploads and can even store them in a directory given in the options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without having to store it locally at all.
The problem is validation of the file. How can I be sure that these files are all images? Checking the Content-Type isn't a good enough option because that can be faked. From a security point of view, is it OK if I do the validation after streaming the file to S3?
After storing the image, I need to retrieve it to create thumbnails. How can I do that asynchronously, after sending the response for the file upload?
You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
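The simultaneous version can be sketched with a PassThrough: one readable stream can feed several writables at once. This assumes the aws-sdk v2 S3 client (whose upload call accepts a stream body); the bucket name and /tmp path are placeholders, and fileStream stands in for the incoming upload stream (from busboy or similar).

    const fs = require('fs');
    const { PassThrough } = require('stream');
    const AWS = require('aws-sdk');

    const s3 = new AWS.S3();

    function saveEverywhere(fileStream, key) {
      const toDisk = fs.createWriteStream(`/tmp/${key}`); // local copy for thumbnailing
      const toS3 = new PassThrough();

      // One readable can be piped to several writables at once.
      fileStream.pipe(toDisk);
      fileStream.pipe(toS3);

      return s3
        .upload({ Bucket: 'my-bucket', Key: key, Body: toS3 }) // placeholder bucket
        .promise();
    }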
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and check for the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code, and that is just as easily faked as the Content-Type header. It sounds like your security concerns are mostly driven by FUD as opposed to specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable, and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error, or perhaps crash in an extreme case).
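If you do want the warm fuzzies, the magic-number check is only a few lines; note it merely proves the file starts like an image, while the rest of the bytes remain untrusted:

    // Magic-number check on the first bytes of a file buffer.
    function looksLikeImage(buf) {
      if (buf.length < 4) return false;
      if (buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) return true; // JPEG
      if (buf[0] === 0x89 && buf[1] === 0x50 && buf[2] === 0x4e && buf[3] === 0x47) return true; // PNG
      if (buf.toString('ascii', 0, 4) === 'GIF8') return true; // GIF87a/GIF89a
      return false;
    }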
Regarding validation: can I just try to create a thumbnail, and if I can't, treat the file as an invalid image and delete it? Is that approach fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
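A sketch of that validate-by-thumbnailing flow with the gm package (which shells out to GraphicsMagick, so it must be installed on the host; the paths and the 200x200 size are placeholders):

    const gm = require('gm');
    const fs = require('fs');

    function thumbnailOrReject(srcPath, thumbPath, cb) {
      gm(srcPath)
        .resize(200, 200)
        .write(thumbPath, (err) => {
          if (err) {
            // Not a usable image (or one of the rare edge cases): discard it.
            fs.unlink(srcPath, () => cb(err));
            return;
          }
          cb(null, thumbPath);
        });
    }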
And for the upload part, can I send a response to the user and then carry on with creating the thumbnail and storing it in S3?
Yes, that is generally the best approach, so the user knows their upload succeeded. Image processing is usually architected as a "work queue" model: you just record that there's work to do and move on, and a separate process (or processes) takes work off the queue and completes it.
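A toy sketch of that work-queue shape (a real system would use a durable, e.g. Redis-backed, queue so work survives restarts; makeThumbnails is a placeholder for the actual processing step):

    // Respond to the upload immediately; a separate loop does the slow work.
    const queue = [];

    function enqueueThumbnailJob(filePath) {
      queue.push(filePath);
    }

    // In the upload handler:
    //   enqueueThumbnailJob(req.file.path);
    //   res.send('upload received'); // the user gets a response right away

    // Worker: drains the queue independently of request handling.
    setInterval(() => {
      const job = queue.shift();
      if (job) makeThumbnails(job);
    }, 1000);

    function makeThumbnails(filePath) {
      console.log('processing', filePath); // placeholder
    }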
