How to download a file from a website using a Logic App? - azure

How are you doing?
I'm trying to download an Excel file from a website (specifically DataCamp) in order to use its data in an automated process, but before getting the file it is necessary to sign in on the page. I was thinking this might be possible with a JSON query on the HTTP action, but to be honest I don't know where to start (I'm new to Azure).
The process I need to emulate to get the file extracted would be as follows (I know this could be possible with an API or RPA, but I don't have either available for now):
Could you give me some advice (how to get the desired result, or at least where to research)? Is this even possible?
Best regards.

If you don't have other options (e.g. your source is on an SFTP server, etc.), then using an HTTP action should work; pass the BODY to your next action (e.g. you might want to persist it to a blob if the content is binary).
If your content is "readable", e.g. JSON or CSV, and you want to load it for processing, you need to ensure, for large files, that you read it in chunks so it is loaded completely before processing.
Detailed explanation at https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-handle-large-messages#download-content-in-chunks
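As a rough sketch of what the HTTP action might look like in the workflow's code view (the URI, action name, and authorization header below are placeholders, not DataCamp specifics), with the response body then referenced by a follow-up "Create blob" action via @body('Download_file'):

```json
{
  "Download_file": {
    "type": "Http",
    "inputs": {
      "method": "GET",
      "uri": "https://example.com/reports/export.xlsx",
      "headers": {
        "Authorization": "Bearer @{variables('sessionToken')}"
      }
    },
    "runAfter": {}
  }
}
```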

Related

Risks involved in allowing users to upload JSON files to the server side

I'm working on a project that has a feature that allows users to submit and upload their work to the server side as JSON files. Are there any security risks involved with that, like hijacking/injection? And how can I prevent them?
I'm using Node.js on the server side; are there any packages that can help with this issue?
Thanks
From my own experience, whenever we ask users to upload anything it is dangerous. We write manual serialization: after receiving the user's input we convert all risky characters into something else (for example, convert { to %br% and so on) and then deserialize them later.
It also depends on how you implement your code; you may also use URL encoding. And you will need to test your code with some invalid data and add proper exception handlers in your code.
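As a minimal Node.js/Express sketch of those precautions (the route, size limit, and expected fields are made-up examples): cap the body size, let the JSON parser's error handler catch malformed input, and validate the shape before using anything. Note that JSON.parse itself never executes code; the main risks are oversized payloads, unexpected shapes, and re-emitting the data unescaped into HTML later.

```typescript
import express, { NextFunction, Request, Response } from "express";

const app = express();

// Reject bodies over 1 MB so a single upload can't exhaust memory.
app.use(express.json({ limit: "1mb" }));

app.post("/work", (req: Request, res: Response) => {
  const work = req.body;

  // Validate the shape before trusting anything in it.
  if (typeof work !== "object" || work === null || typeof work.title !== "string") {
    return res.status(400).json({ error: "Invalid payload" });
  }

  res.status(201).json({ ok: true });
});

// express.json() forwards malformed JSON here instead of crashing the process.
app.use((err: Error, _req: Request, res: Response, _next: NextFunction) => {
  res.status(400).json({ error: "Malformed JSON" });
});

app.listen(3000);
```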

Creating an HTML or PDF "file" in memory and streaming it in Node.js

I have a need to create a PDF or HTML document within a Node.js Express API, which then sends that document over HTTP to an API managing our CMS.
So functionally I would like to create the document and POST it as part of a multipart-form upload POST request to an external service.
I see how to do this if, after I create the file, I then turn around and write it to disk. After that point I can open a read stream of the file from that path to build the POST request with the file.
However I'm wondering how I can perform this action without writing the file to disk and then reading it into a read stream. It seems I should be able to accomplish this without that IO.
Anybody able to point me to a good example or library that does something along these lines?
You can extend Writable and/or Readable streams. At first glance this library does what you need, in the same way - by extending the built-in streams.
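On recent Node versions you can also skip custom streams entirely and keep the document purely in memory; here is a minimal sketch (Node 18+ for the global Blob/FormData/fetch; the URL, field name, and file name are placeholders). For a PDF you would do the same thing with whatever Buffer your PDF library produces.

```typescript
// Build the document in memory and POST it as multipart/form-data
// without ever touching the filesystem.
async function sendReport(html: string): Promise<void> {
  // The "file" only ever exists as a Buffer/Blob in memory.
  const fileBlob = new Blob([Buffer.from(html, "utf8")], { type: "text/html" });

  const form = new FormData();
  form.append("document", fileBlob, "report.html");

  const response = await fetch("https://cms.example.com/api/documents", {
    method: "POST",
    body: form, // fetch sets the multipart boundary header for us
  });

  if (!response.ok) {
    throw new Error(`Upload failed: ${response.status}`);
  }
}

sendReport("<html><body><h1>Hello</h1></body></html>").catch(console.error);
```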

Is there any reliable way to capture file upload event in a chrome extension?

In my content script, I want to monitor which file is getting uploaded to a web application.
I monitor the "change" event for any "input:file" element. It works on any site (such as Gmail) that uses "input:file".
However, sites like imgur use the SWFUpload mechanism. I tried to capture the "fileQueued" event on the element that I suspected to be the SWFUpload one, but that did not work.
How can I capture file upload event for sites that use swfupload?
Are there any other plugins that manage file uploading that I would need to take care in my content script?
Is there any generic mechanism to tackle this problem?
(I am aware of the drag-and-drop mechanism, but I have not handled that case so far.
I have also read the following relevant question on SO:
Grab file with chrome extension before upload)
It's probably worth your time to experiment with the chrome.webRequest API; it appears that the onBeforeRequest event contains info about file uploads. It's a complex API with extra parameters to addListener; read the docs thoroughly.
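A rough background-script sketch of that idea (it assumes the "webRequest" permission plus matching host permissions in the manifest, and hedges on exactly what requestBody exposes per site):

```typescript
// Requires "webRequest" and host permissions in manifest.json
// (and @types/chrome if you compile this as TypeScript).
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    if (details.method === "POST" && details.requestBody) {
      if (details.requestBody.formData) {
        // For form-encoded/multipart posts, keys are the form field names;
        // file fields generally carry the file name rather than its contents.
        console.log("Form upload detected:", details.requestBody.formData);
      } else if (details.requestBody.raw) {
        // Other media types show up as raw UploadData chunks.
        console.log("Raw upload detected, parts:", details.requestBody.raw.length);
      }
    }
  },
  { urls: ["<all_urls>"] },
  ["requestBody"]
);
```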

Expressjs File Upload Customization

Express.js has the bodyParser middleware which can handle file uploads and can even store them in a directory given in the options. But in my app I want to store the files in Amazon S3, so I basically want to stream the file straight to S3 without having to store it locally at all.
But the problem is validation of the file. How can I be sure that these files are all images? Checking the content-type isn't a good enough option because that can be faked. I want to know: is it OK if I do the validation after streaming the file to S3? I am asking from a security point of view.
After storing the image, I need to retrieve it to create thumbnails. How can I do that asynchronously, after sending the response to the file upload?
You have contradictory goals of not wanting to store it locally during upload but then also wanting to download it needlessly again to make thumbnails. If you want to go for technical slickness awards, you can simultaneously stream the file upload request body to a local temporary file as well as S3. Or you can do what the rest of the industry does and store it in a local temporary file and then thumbnail it, and then upload all sizes to S3. Either of these approaches alleviates any need to immediately download it from S3 to make thumbnails.
How exactly do you intend to validate that it's really an image? You could look at the first chunk of file data and validate for the file type's magic number if that gives you warm fuzzies, but ultimately it's untrusted user data. The second half of the supposed image file could be virus code, and that is just as easily faked as the Content-Type header. Sounds like your security concerns are mostly driven by FUD as opposed to specific threats you intend to defend against. As long as you don't take the user's uploaded data, mark it executable and run it as root on your server, any non-image data is just going to be corrupt and fail to render correctly in a browser (and/or cause your thumbnailer program to exit with an error or perhaps crash in an extreme case).
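If you do want that first-chunk check, a minimal sketch might look like this (signatures for JPEG/PNG/GIF only; it only proves the file starts like an image, the remaining bytes stay untrusted):

```typescript
// Compare the leading bytes of the upload against known image signatures.
function sniffImageType(firstChunk: Buffer): "jpeg" | "png" | "gif" | null {
  if (firstChunk.length >= 3 && firstChunk[0] === 0xff && firstChunk[1] === 0xd8 && firstChunk[2] === 0xff) {
    return "jpeg";
  }
  const pngMagic = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (firstChunk.length >= 8 && firstChunk.subarray(0, 8).equals(pngMagic)) {
    return "png";
  }
  const head = firstChunk.subarray(0, 6).toString("ascii");
  if (head === "GIF87a" || head === "GIF89a") {
    return "gif";
  }
  return null;
}
```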
Regarding validation, can I just try to create a thumbnail, and if I can't, treat it as an invalid image and delete it? Is that approach fine?
Most of the time, yes. There will be edge cases where your thumbnailer cannot process an image but a browser can as thumbnailers are not perfect and some images are partially corrupt. For example, I have found some animated GIFs that render and animate fine in a web browser but graphicsmagick crashes trying to process them. Not sure there's anything that can be done about those 0.01% edge cases.
And for the upload part, can I send a response to the user and then carry on with creating the thumbnail and storing it in S3?
Yes, that is generally the best approach so the user knows their upload succeeded. Image processing is usually architected as a "work queue" model where you just record that there's work to do and then proceed, and a separate process or processes take work off the queue and complete it.
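A compressed sketch of that respond-first flow, spooling to a temp file as suggested above (the bucket, field, and route names are placeholders, setImmediate stands in for a real work queue such as SQS or BullMQ, and multer/sharp/@aws-sdk/client-s3 are just one possible toolset):

```typescript
import express from "express";
import multer from "multer";
import sharp from "sharp";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { createReadStream } from "node:fs";

const app = express();
const upload = multer({ dest: "/tmp/uploads" }); // spool to a local temp file
const s3 = new S3Client({});

app.post("/images", upload.single("image"), (req, res) => {
  if (!req.file) return res.status(400).send("No file");

  // Tell the user the upload succeeded right away...
  res.status(202).json({ status: "processing" });

  // ...then do the slow work off the request path.
  setImmediate(async () => {
    try {
      const thumb = await sharp(req.file!.path).resize(200, 200).toBuffer();
      await s3.send(new PutObjectCommand({
        Bucket: "my-bucket",
        Key: `orig/${req.file!.filename}`,
        Body: createReadStream(req.file!.path),
      }));
      await s3.send(new PutObjectCommand({
        Bucket: "my-bucket",
        Key: `thumb/${req.file!.filename}.jpg`,
        Body: thumb,
      }));
    } catch (err) {
      // sharp failing here is the "not a valid image" signal; delete/flag the upload.
      console.error("Processing failed", err);
    }
  });
});

app.listen(3000);
```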

Can I capture JSON data already being sent with a userscript/Chrome extension?

I'm trying to write a userscript/Chrome extension to capture JSON data being sent while using a web service, so that I can reformat it and display a selected portion on the page. Currently the JSON is sent as the application loads (as I've observed by watching traffic with Fiddler 2). Is my only option to request the JSON again, or is capturing it possible? As I'm not providing a code example, even some guidance on what method/topic to research, or whether I'm barking up the wrong tree, would be a welcome answer.
No easy way.
If it is for a specific site you might look into intercepting and overwriting the part of the code which sends the request. For example, if it is sent on a button click you can replace the existing click handler with your own implementation.
You can also try to make a proxy for XMLHttpRequest. Not sure if this is even possible; I have never seen a working example. You can look at some attempts here.
For all these tasks you would probably need to run your JavaScript code outside the sandboxed content script to be able to access parent-page variables, so you would need to inject a <script> tag with your code right into the page from the content script:
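(A minimal sketch of that injection, using an XMLHttpRequest wrapper as one illustrative take on the "proxy" idea above; the message names are placeholders, and a strict page CSP can block inline scripts.)

```typescript
// Content-script sketch: inject page-level code that wraps XMLHttpRequest so
// responses can be inspected outside the extension sandbox.
function injectPageScript(): void {
  const code = `
    (function () {
      var originalOpen = XMLHttpRequest.prototype.open;
      XMLHttpRequest.prototype.open = function (method, url) {
        this.addEventListener("load", function () {
          // Forward interesting JSON responses back to the content script.
          window.postMessage({ source: "my-extension", url: url, body: this.responseText }, "*");
        });
        return originalOpen.apply(this, arguments);
      };
    })();
  `;

  const script = document.createElement("script");
  script.textContent = code;
  (document.head || document.documentElement).appendChild(script);
  script.remove(); // the code has already run; the tag itself is no longer needed
}

// Back in the content script, listen for the forwarded data.
window.addEventListener("message", (event) => {
  if (event.data && event.data.source === "my-extension") {
    console.log("Captured JSON:", event.data.url, event.data.body);
  }
});

injectPageScript();
```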
