Sending excel file from a Blob Storage to a REST Endpoint in Azure Functions with Node - node.js

Working on a small personal project where I can drop an .xlsx file on Azure Blob Storage and it'll trigger a Node.js Blob Storage trigger function and send the file to a REST endpoint to be parsed and worked with, etc.
I've been able to set it up and have the file moved to another blob (I intend to add logic on the HTTP response from the REST endpoint to then archive said file).
I'm not exactly sure how to set up the correct code and bindings to take the ingested .xlsx file and send the whole thing to an endpoint.
Bonus Question: is it better practice to zip the file or convert to binary or anything before sending? Performance isn't too big of a concern currently.
Thanks for any information or any pointers.

The recommended approach is to use the Event Grid events emitted by Blob Storage to trigger an Event Grid-based Azure Function. Please refer to these, which seem to meet your requirement:
Blob Event and Event Grid Trigger For Azure Function.
Note: the blob trigger for Azure Functions may not be as reliable as the Event Grid trigger in high-volume scenarios.
To answer your bonus question: zipping your .xlsx files won't bring much benefit, since they are already compressed behind the scenes (.xlsx is a ZIP container).
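Coming back to the bindings part of the question, here is a minimal sketch of a blob-triggered Node.js function that forwards the whole .xlsx to a REST endpoint. It assumes a blobTrigger binding named inputBlob with path incoming/{name}.xlsx in function.json, the axios package, and a hypothetical REST_ENDPOINT_URL app setting; an Event Grid-triggered variant would post the same way once it has downloaded the blob's bytes.

    // index.js -- sketch only; binding, container and endpoint names are assumptions.
    const axios = require("axios");

    module.exports = async function (context, inputBlob) {
      // inputBlob is a Buffer holding the raw .xlsx bytes.
      const fileName = `${context.bindingData.name}.xlsx`;
      context.log(`Received ${fileName} (${inputBlob.length} bytes)`);

      // Send the whole file as the request body; the xlsx MIME type tells the
      // receiver what it is, and a custom header carries the original name.
      const response = await axios.post(process.env.REST_ENDPOINT_URL, inputBlob, {
        headers: {
          "Content-Type":
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
          "x-file-name": fileName,
        },
        maxBodyLength: Infinity, // allow bodies above axios' default limit
      });

      context.log(`Endpoint responded with ${response.status}`);
      // On a 2xx response you could archive (copy + delete) the blob here.
    };

Posting the raw bytes like this (or wrapping them in multipart/form-data) is usually all the endpoint needs; as noted above, compressing the file first buys you very little.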

Related

Email notification about new files on Azure File share

What are possible ways to implement such a scenario?
I can think of an Azure Function that periodically checks the share for new files. Are there any other possibilities?
I have also been thinking about duplicating the files to Blob Storage and generating the notifications from there.
A storage content trigger is available by default for blobs, so if you consider migrating to Blob Storage you can use a BlobTrigger Azure Function. For triggering on new files in a File Share, my suggestions are:
A TimerTrigger Azure Function that polls for files added since the previous trigger occurred (see the sketch below).
A Recurrence trigger in a Logic App to poll and check for new content.
A continuous WebJob that keeps polling the File Share for new content.
In my opinion, duplicating the files to Blob Storage just to drive notifications is not a great option, because that duplication would itself require a polling mechanism like the ones mentioned above, making the extra copy unnecessary.
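For the first suggestion, this is a rough sketch of what the TimerTrigger poll could look like in Node.js with the @azure/storage-file-share SDK. The FILE_SHARE_CONNECTION and FILE_SHARE_NAME app settings, the 5-minute schedule, and the "anything modified since the last run is new" rule are all assumptions to adapt.

    // index.js -- timer-triggered poll of an Azure File Share (sketch only).
    const { ShareServiceClient } = require("@azure/storage-file-share");

    module.exports = async function (context, myTimer) {
      const service = ShareServiceClient.fromConnectionString(
        process.env.FILE_SHARE_CONNECTION
      );
      const rootDir = service
        .getShareClient(process.env.FILE_SHARE_NAME)
        .rootDirectoryClient;

      // Treat anything modified since the previous run (5 minutes ago) as new.
      const windowStart = new Date(Date.now() - 5 * 60 * 1000);

      for await (const item of rootDir.listFilesAndDirectories()) {
        if (item.kind !== "file") continue;
        const props = await rootDir.getFileClient(item.name).getProperties();
        if (props.lastModified >= windowStart) {
          context.log(`New file detected: ${item.name}`);
          // Send the e-mail notification here (e.g. via SendGrid or Graph).
        }
      }
    };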

Security concerns when uploading data to Azure Blob Storage directly from frontend

I am working on a project where we have to store some audio/video files on Azure Blob Storage, and after a file is uploaded we need to calculate a price based on the length of the file in minutes. We have an Angular frontend, and the idea was to upload the file directly from the frontend, get the response from Azure with the file stats, then call a backend API to put that data in the database.
What I am wondering is what are the chances of manipulation of data in between getting the file data back from Azure and calling our backend API. Is there any chance the length could be modified before sending it to our API?
One possible solution would be to make use of Azure Event Grid with Blob integration. Whenever a blob is uploaded, an event will be raised automatically that you can consume in an Azure Function and save the data in your database.
There's a possibility that a user might re-upload the same file with a different size. If that happens, you will get another event (in addition to the original event raised when the blob was first created). How you handle such updates is entirely up to you.
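As a rough illustration, an Event Grid-triggered Azure Function (Node.js here) can read the blob URL and size straight from the BlobCreated payload, so none of the pricing inputs come from the browser; the saveToDatabase helper is hypothetical, and the audio/video duration would still have to be measured server-side from the blob itself.

    // index.js -- Event Grid-triggered function (sketch only).
    module.exports = async function (context, eventGridEvent) {
      if (eventGridEvent.eventType !== "Microsoft.Storage.BlobCreated") return;

      // These values come from the storage service, not from the client,
      // so the frontend cannot tamper with them.
      const blobUrl = eventGridEvent.data.url;
      const sizeInBytes = eventGridEvent.data.contentLength;
      context.log(`Blob created: ${blobUrl} (${sizeInBytes} bytes)`);

      // Download the blob here and measure its duration server-side before
      // calculating the price, then persist the result.
      // await saveToDatabase({ blobUrl, sizeInBytes });
    };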

Triggered copy data from blob to ADLS extracting path from filename

I am trying to centralize our data in an ADLS Gen2 data lake. One of our datasets is 'dumped' into a blob storage account and I want a triggered copy.
The files stored in the blob storage have a date as the filename (it can be an arbitrary date) and are in JSON format. What I want is for new files to be (binary) copied to a folder on the data lake, with the path built from pieces of the date present in the filename.
2020-01-01.json → raw/blob/2020/01/raw_reports_blob_2020-01-01.json
First I tried a copy data job and a pipeline in Azure Synapse, but I am not sure how to set the sink path using details from the source filename. It seems that the copy-data tool cannot be triggered by new blob files. The pipeline method looks pretty powerful and I guess it is possible. What I want is not that difficult on Linux, so I guess it must be possible in Azure as well.
Second, I tried to create an Azure Function, as I am pretty comfortable with Python, but here I have a similar problem: I need to define in/out bindings, and the output bindings are defined at design time, which does not give me the freedom to build the kind of path I need from the source filename. It also feels somewhat overkill for a simple binary copy action. I can have the function triggered by new files in the blob, and reading them is no problem.
I am relatively new to Azure and any help towards a solution is more than welcome.
See this answer as well: https://stackoverflow.com/a/66393471/496289
There is no concept of "copy" per se in ADLS: you read/download from the source and write/upload to the target.
As someone mentioned, Data Factory can do this.
You can also use:
azcopy from a PowerShell Azure Function: azcopy cp "https://[srcaccount].blob.core.windows.net/[container]/[path/to/blob]?[SAS]" "https://[destaccount].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"
A Python/Java/... Azure Function. You'll have to download the file and upload it to the destination (in chunks if it's big); a sketch of this option follows after this answer.
Databricks. This would be a similar misuse of a tool, much like using Azure Synapse Analytics just to copy data between storage accounts.
Azure Logic Apps. See this and this. I've never used them, but I believe they involve less code than an Azure Function and have some programming capabilities, which may help you build the destination path programmatically.
Things to remember:
Data Factory can be expensive, especially compared to Azure Functions on a Consumption plan.
Azure Functions on a Consumption plan have a 10-minute maximum before they time out, so you can't use them if your files run into GBs/TBs.
You'll be paying egress costs if applicable.
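As a sketch of the Azure Function option above (shown in Node.js with @azure/storage-blob; the same idea works in Python with azure-storage-blob), assuming a blobTrigger binding named inputBlob with path dump/{name}.json and hypothetical DEST_CONNECTION / DEST_FILESYSTEM app settings for the ADLS Gen2 account:

    // index.js -- copy a dated JSON blob into a date-derived lake path (sketch).
    const { BlobServiceClient } = require("@azure/storage-blob");

    module.exports = async function (context, inputBlob) {
      const fileName = `${context.bindingData.name}.json`; // e.g. 2020-01-01.json
      const [year, month] = fileName.split("-");            // "2020", "01"

      // 2020-01-01.json -> raw/blob/2020/01/raw_reports_blob_2020-01-01.json
      const destPath = `raw/blob/${year}/${month}/raw_reports_blob_${fileName}`;

      const destClient = BlobServiceClient.fromConnectionString(
        process.env.DEST_CONNECTION
      )
        .getContainerClient(process.env.DEST_FILESYSTEM)
        .getBlockBlobClient(destPath);

      // The trigger already hands us the bytes as a Buffer, so this is a plain
      // download-then-upload copy (fine for small JSON files; chunk larger ones).
      await destClient.upload(inputBlob, inputBlob.length);
      context.log(`Copied ${fileName} to ${destPath}`);
    };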

Process event files into Azure EventHub

I am fairly new to Azure.
I have a requirement where the source will send event data in flat files. Each file will contain header and trailer records, with the events as data records. Each file will be about 10 MB in size and can contain about 50,000-60,000 events.
I want to process these files using Python/Scala and send the data to Azure Event Hubs. Can someone suggest whether this is the best solution and how I can achieve it?
It's an architectural question, but you can use either Azure Logic Apps or Azure Functions.
First of all, you should trigger whichever you choose by uploading the file to Blob Storage. The file will get picked up, processed, and then sent.
Use Azure Logic Apps if the files are simple to parse, for instance because they are JSON: simply loop over each event and direct it to the event hub you want.
If parsing the files is more complex, use Azure Functions: write up the code and output the events to an event hub (see the sketch below).
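Here is a minimal sketch of the Azure Functions route in Node.js with @azure/event-hubs (the Python azure-eventhub SDK follows the same batch pattern). It assumes a blobTrigger binding named inputBlob, hypothetical EVENTHUB_CONNECTION / EVENTHUB_NAME app settings, and a layout where the first line is the header and the last line is the trailer.

    // index.js -- parse a flat file from blob storage and batch it to Event Hubs.
    const { EventHubProducerClient } = require("@azure/event-hubs");

    module.exports = async function (context, inputBlob) {
      const lines = inputBlob.toString("utf8").split(/\r?\n/).filter(Boolean);
      const events = lines.slice(1, -1); // drop the header and trailer records

      const producer = new EventHubProducerClient(
        process.env.EVENTHUB_CONNECTION,
        process.env.EVENTHUB_NAME
      );

      // Pack as many events as fit into each batch (roughly 1 MB per batch),
      // sending and starting a new batch whenever the current one is full.
      let batch = await producer.createBatch();
      for (const line of events) {
        if (!batch.tryAdd({ body: line })) {
          await producer.sendBatch(batch);
          batch = await producer.createBatch();
          batch.tryAdd({ body: line });
        }
      }
      if (batch.count > 0) await producer.sendBatch(batch);
      await producer.close();

      context.log(`Sent ${events.length} events from a ${inputBlob.length}-byte file`);
    };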

Can Azure Data Factory write to FTP

I want to write the output of a pipeline to an FTP folder. ADF seems to support on-premises file systems but not FTP folders.
How can I write the output in text format to an FTP folder?
Unfortunately FTP Servers are not a supported data store for ADF as of right now. Therefore there is no OOTB way to interact with an FTP Server for either reading or writing.
However, you can use a custom activity to make it possible, but it will require some custom development to make this happen. A fellow Cloud Solution Architect within MS put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob storage account and then either
1) manually: downloading the file from the Blob storage account and pushing it to the "FTP" site, or
2) automatically: using the Azure CLI to pull the file locally and then push it to the "FTP" site with a batch or shell script as appropriate.
As a lighter-weight approach than custom activities (which are certainly the better option for heavy work), you may wish to consider using Azure Functions to write to FTP (note there is a timeout when using a Consumption plan, but not in other plans, so it will depend on how big the files are).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary blob storage account,
and use a blob storage trigger in Azure Functions to upload the files to the FTP site as soon as they appear in blob storage (a sketch follows at the end of this answer).
Alternatively, write to blob storage and then use a timer in Logic Apps to upload from blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
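For the blob-trigger route, here is a minimal sketch in Node.js using the basic-ftp package; the blobTrigger binding named inputBlob on the intermediary container and the FTP_HOST / FTP_USER / FTP_PASSWORD app settings are assumptions, and small-to-medium files fit comfortably within the Consumption-plan timeout mentioned above.

    // index.js -- push a newly arrived blob to an FTP/FTPS server (sketch only).
    const { Client } = require("basic-ftp");
    const { Readable } = require("stream");

    module.exports = async function (context, inputBlob) {
      const fileName = context.bindingData.name;
      const client = new Client();
      try {
        await client.access({
          host: process.env.FTP_HOST,
          user: process.env.FTP_USER,
          password: process.env.FTP_PASSWORD,
          secure: true, // use FTPS if the server supports it
        });
        // Stream the blob's bytes straight to the FTP server.
        await client.uploadFrom(Readable.from([inputBlob]), fileName);
        context.log(`Uploaded ${fileName} to FTP`);
      } finally {
        client.close();
      }
    };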
You can write a Logic app that will pick your file up from Azure storage and send it to an FTP site. Then call the Logic App using a Data Factory Web Activity.
Make sure you do some error handling in your Logic App so it returns a 400 if the FTP transfer fails.
