I'm trying to download a file from a browser using socket.io-stream. In a basic form, this is actually doable and there is a working example here.
However, that solution:
First streams the file contents to the browser using socket.io-stream.
Assembles the chunks in the client as a blob.
Creates a hidden link to the blob location.
This forces the browser to hold the whole blob in memory before it can initiate the download (a rough sketch of that approach follows). I'm working with really large blobs, so that is not advisable.
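For reference, here is a minimal sketch of that buffering approach on the client; the `file` event name and the meta fields are my own assumptions, not taken from the linked example:

```javascript
// Client-side sketch: buffer every chunk, assemble a Blob, then click a hidden link.
// `socket` is assumed to be an existing socket.io client connection.
const ss = require('socket.io-stream'); // bundled for the browser (e.g. via browserify)

ss(socket).on('file', (stream, meta) => {
  const chunks = [];
  stream.on('data', (chunk) => chunks.push(chunk)); // every chunk stays in memory
  stream.on('end', () => {
    const blob = new Blob(chunks, { type: meta.type }); // whole file assembled as one blob
    const link = document.createElement('a');           // hidden link pointing at the blob
    link.href = URL.createObjectURL(blob);
    link.download = meta.name;
    link.click();                                       // only now does the download start
    URL.revokeObjectURL(link.href);
  });
});
```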
I would prefer to download the stream directly, instead of buffering it in memory in the browser.
Is that possible?
I know this is easy to do just with plain HTTP, but there are some reasons that make this simplest option not available in my case.
I'm trying to download an Excel file from a web site (specifically DataCamp) in order to use its data in an automated process, but before getting the file it is necessary to sign in on the page. I was thinking this would be possible with a JSON query on the HTTP action, but to be honest I don't know where to start (I'm new to Azure).
The process I need to emulate to extract the file would be as follows (I know this could be done with an API or RPA, but I don't have either available for now):
Could you give me some advice (how to get the desired result, or at least where to start researching)? Is this even possible?
If you don't have other options (e.g. your source being on an SFTP server), then using an HTTP action should work; pass the Body to your next action (e.g. you might want to persist it to a blob if the content is binary).
If your content is "readable" (e.g. JSON or CSV) and you want to load it for processing, then for large files you need to make sure you read it in chunks so it is loaded completely before processing.
Detailed explanation at https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-handle-large-messages#download-content-in-chunks
In the Azure Blob Storage SDK for C#, there are multiple methods for downloading and uploading blobs.
Download methods: DownloadText, DownloadToByteArray, DownloadToStream, DownloadToFile.
Upload methods: UploadText, UploadFromByteArray, UploadFromStream, UploadFromFile.
How do I choose among these methods? For example, when the file is large during download/upload, and would some methods cause encoding issues, etc.?
Thanks.
You choose based on what you have or what you want; these things are here to make your life easy.
If you have/want a file, use the File methods (so you don't have to, e.g., read your file into a byte array or attach a stream before uploading it, or so you can just download a blob straight to a file on your server).
If you have/want a stream, use the Stream methods (imagine you want to send the blob data to a client down a TCP socket: there is no point writing it to a file on your server, then reading the file and sending it to the client; you should just open a stream from the blob, read from it, and write to the socket that goes to the client. This minimizes server resource use).
If you have/want an array, use the array methods (maybe you want to process it in memory somehow).
See the docs for more info https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.storage.blob.cloudblockblob?view=azure-dotnet
I need to create a PDF or HTML document within a Node.js Express API, which then sends that document over HTTP to an API managing our CMS.
So, functionally, I would like to create the document and send it as part of a multipart form-data POST request to an external service.
I see how to do this if, after I create the file, I then turn around and write it to disk. After that, I can create a read stream of the file from that path to build the POST request with the file.
However, I'm wondering how I can do this without writing the file to disk and then reading it back into a read stream. It seems I should be able to accomplish this without that I/O.
Anybody able to point me to a good example or library that does something along these lines?
You can extend Writable and/or Readable streams. At first glance, this library does what you need, in the same way: by extending the built-in streams.
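As a hedged illustration of the general idea (not necessarily the library mentioned above), this sketch builds the document in memory and streams it straight into a multipart POST using the form-data package; the URL, field name, and file name are assumptions:

```javascript
// Build the document in memory and POST it as multipart/form-data, no disk I/O.
const FormData = require('form-data');

const html = '<html><body><h1>Report</h1></body></html>'; // document generated in memory
const form = new FormData();
form.append('file', Buffer.from(html), {   // a Buffer or Readable stream works here
  filename: 'report.html',
  contentType: 'text/html',
});

// form-data can submit itself as a multipart/form-data POST request
form.submit('https://cms.example.com/upload', (err, res) => {
  if (err) throw err;
  console.log('CMS responded with status', res.statusCode);
  res.resume(); // drain the response
});
```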
I want to upload images from the client to the server. The client must be able to see a list of all the images he or she has, and view each image itself (a thumbnail or something like that).
I saw people using two methods (generically speaking)
1- Upload image and save the binaries to MongoDB
2- Upload an image and move it to a folder, save the path somewhere (the classic method, and the one I implemented so far)
What are the pros and cons of each method, and how can I retrieve the data and show it in a template in each case (getting the path and writing it to the src attribute of an img tag, versus sending the binaries)?
Problems found so far: when I request foo.jpg (localhost:3000/uploads/foo.jpg), which I uploaded and the server moved to a known folder, my router (Iron Router) fails to work out how to handle the request.
1- Upload image and save the binaries to MongoDB
Either you limit the file size to 16 MB and use only basic MongoDB documents, or you use GridFS and can store anything (no size limit). There are several pros and cons to this method, but IMHO it is much better than storing on the file system (see the GridFS sketch after this list):
Files don't touch your file system; they are piped to your database.
You get all the benefits of Mongo, and you can scale up without worries.
Files are chunked, and you can send only a specific byte range (useful for streaming or resuming downloads).
Files are accessed like any other mongo document, so you can use the allow/deny function, pub/sub, etc.
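To make the chunking and byte-range points concrete, here is a minimal sketch using the plain MongoDB Node.js driver's GridFSBucket; it is not Meteor-specific, and the connection string, database, bucket, and file names are illustrative assumptions:

```javascript
// Minimal GridFS sketch: upload a file into chunked storage, then read back a byte range.
const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('app'), { bucketName: 'images' });

  // Upload: the file is split into chunks inside MongoDB, never parked on your file system.
  await new Promise((resolve, reject) => {
    fs.createReadStream('./foo.jpg')
      .pipe(bucket.openUploadStream('foo.jpg'))
      .on('finish', resolve)
      .on('error', reject);
  });

  // Download only a specific byte range (useful for streaming or resuming downloads).
  bucket
    .openDownloadStreamByName('foo.jpg', { start: 0, end: 1024 })
    .pipe(fs.createWriteStream('./foo-first-kb.jpg'))
    .on('finish', () => client.close());
}

main().catch(console.error);
```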
2- Upload an image and move it to a folder, save the path somewhere (the classic method, and the one I implemented so far)
In this case, either you store everything in your public folder and make it publicly accessible using file names + paths, or you use a dedicated asset delivery system, such as an nginx server. Either way, you will be using something less secure and maintainable than the first option.
This being said, have a look at the file collection package. It is much simpler than collection-fs and will offer you everything you are looking for out of the box (including a file API, GridFS storage, resumable uploads, and many other things).
Problems found so far: when I request foo.jpg (localhost:3000/uploads/foo.jpg) that I uploaded and the server moved to a known folder, my router (Iron Router) fails to find how to deal with the request.
Are you aware that this path maps to public/uploads/foo.jpg under your project's root folder? If you put the file there, you should be able to request it.
Express.js has the bodyParser middleware, which can handle file uploads and can even store them in a directory given in the options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without storing it locally at all.
But the problem is validating the file. How can I be sure that these files are all images? Checking the Content-Type isn't a good enough option because that can be faked. Is it OK if I do the validation after streaming the file to S3? I am asking from a security point of view.
After storing the image, I need to retrieve it to create thumbnails. How can I do that asynchronously, after sending the response for the file upload?
You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
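For instance, here is a hedged sketch of the "stream simultaneously to a temp file and to S3" idea, assuming the raw request body is the file, the AWS SDK v2 (aws-sdk), and an Express handler; the bucket, key, temp path, and enqueueThumbnailJob helper are illustrative assumptions:

```javascript
// Stream the incoming upload to S3 and to a local temp file at the same time.
const fs = require('fs');
const AWS = require('aws-sdk');
const { PassThrough } = require('stream');

const s3 = new AWS.S3();

function handleUpload(req, res) {
  const tempPath = '/tmp/upload-' + Date.now();
  const toS3 = new PassThrough();
  const toDisk = fs.createWriteStream(tempPath);

  req.pipe(toS3);   // one copy of the request body streams straight to S3...
  req.pipe(toDisk); // ...another copy lands in a local temp file for thumbnailing

  const diskDone = new Promise((resolve) => toDisk.on('finish', resolve));

  s3.upload({ Bucket: 'my-bucket', Key: 'uploads/original', Body: toS3 }, async (err) => {
    if (err) return res.status(500).end('upload failed');
    res.status(202).end('received'); // respond as soon as S3 has the original
    await diskDone;
    enqueueThumbnailJob(tempPath);   // hypothetical helper; see the work-queue sketch below
  });
}
```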
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and validate the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code, and that is just as easily faked as the Content-Type header. It sounds like your security concerns are mostly driven by FUD rather than specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable, and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error, or perhaps crash in an extreme case).
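If you do want that magic-number check on the first chunk, a minimal sketch looks like this; the signatures below are the standard JPEG/PNG/GIF prefixes, and where you call it (e.g. on the first 'data' event) is up to you:

```javascript
// Inspect the first few bytes of an upload to see whether it looks like a known image type.
function looksLikeImage(firstChunk) {
  const signatures = [
    Buffer.from([0xff, 0xd8, 0xff]),       // JPEG
    Buffer.from([0x89, 0x50, 0x4e, 0x47]), // PNG
    Buffer.from('GIF8', 'ascii'),          // GIF87a / GIF89a
  ];
  return signatures.some((sig) => firstChunk.slice(0, sig.length).equals(sig));
}
```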
Regarding validation: can I just try to create a thumbnail, and if I can't, treat it as an invalid image and delete it? Is that approach fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
And for the upload part, can I send a response to the user and then carry on with creating the thumbnail and storing it in S3?
Yes, that is generally the best approach so the user knows their upload succeeded. Image processing is usually architected as a "work queue" model, where you just record that there's work to do and then proceed, and a separate process or processes take work off the queue and complete it.
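A minimal in-process sketch of that work-queue idea, just to make the shape concrete; in production this would usually be a real queue (Redis, SQS, etc.), and makeThumbnailsAndUploadToS3 is a hypothetical worker function:

```javascript
// The request handler only records that there is work to do; a separate loop does it.
const queue = [];

function enqueueThumbnailJob(tempPath) {
  queue.push({ tempPath, createdAt: Date.now() });
}

// Worker: pull jobs off the queue and process them independently of any request.
setInterval(() => {
  const job = queue.shift();
  if (!job) return;
  makeThumbnailsAndUploadToS3(job.tempPath) // hypothetical helper
    .catch((err) => console.error('thumbnail job failed', err));
}, 1000);
```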