Logic Apps - Get Blob Content Using Path - Azure

I have an event-driven logic app (blob event) which reads a block blob using the path and uploads the content to Azure Data Lake. I noticed the logic app is failing with 413 (RequestEntityTooLarge) when reading a large file (~6 GB). I understand that the Logic Apps blob connector has a 1024 MB limit - https://learn.microsoft.com/en-us/connectors/azureblob/ - but is there any workaround to handle this type of situation? The alternative solution I am working on is moving this step to an Azure Function and getting the content from the blob there. Thanks for your suggestions!

If you want to use an Azure Function, I would suggest you have a look at this article:
Copy data from Azure Storage Blobs to Data Lake Store
There is a standalone version of the AdlCopy tool that you can deploy to your Azure function.
So your logic app will call this function, which will run a command to copy the file from Blob storage to your Data Lake Store. I would suggest you use a PowerShell function.
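The suggestion above is PowerShell; purely to illustrate the idea in C# (matching the other snippets on this page), a sketch could look like the following. The AdlCopy.exe path, the /Source, /Dest and /SourceKey arguments, the blobUrl parameter, and the app-setting names are all assumptions to adapt:

using System;
using System.Diagnostics;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class CopyBlobToAdl
{
    [FunctionName("CopyBlobToAdl")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        ILogger log)
    {
        // The logic app passes the blob URL it received from the blob event (illustrative parameter name).
        string blobUrl = req.Query["blobUrl"];

        // Assumption: AdlCopy.exe is deployed with the function app and the
        // destination URI / storage key are stored in app settings.
        var psi = new ProcessStartInfo
        {
            FileName = @"D:\home\site\wwwroot\tools\AdlCopy.exe",   // illustrative path
            Arguments = $"/Source {blobUrl} " +
                        $"/Dest {Environment.GetEnvironmentVariable("AdlDestination")} " +
                        $"/SourceKey {Environment.GetEnvironmentVariable("StorageKey")}",
            UseShellExecute = false,
            RedirectStandardOutput = true
        };

        using (var process = Process.Start(psi))
        {
            log.LogInformation(process.StandardOutput.ReadToEnd());
            process.WaitForExit();
        }

        return new OkResult();
    }
}

Keep the Azure Functions timeout in mind; for a ~6 GB file a Consumption-plan function will likely not finish in time, so an App Service or Premium plan would probably be needed.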
Another option would be to use Azure Data Factory to copy file to Data Lake:
Copy data to or from Azure Data Lake Store by using Azure Data Factory
You can create a job that copy file from blob storage:
Copy data to or from Azure Blob storage by using Azure Data Factory
There is a connector to trigger a Data Factory run from a logic app, so you may not need an Azure Function, but it seems there are still some limitations:
Trigger Azure Data Factory Pipeline from Logic App w/ Parameter

You should consider using the Azure Files connector: https://learn.microsoft.com/en-us/connectors/azurefile/
It is currently in preview; the advantage it has over the Blob connector is that it doesn't have a size limit. The above link includes more information about it.

For the benefit of others who might be looking for a solution of this sort.
I ended up creating an Azure Function in C#, as my design dynamically parses the blob name and creates the ADL folder structure based on it. I used chunked memory streaming for reading the blob and writing it to ADL, with multithreading to address the Azure Functions timeout of 10 minutes.
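For anyone looking for a starting point, below is a minimal sketch of just the chunked-streaming part, not the exact function described above (it leaves out the blob-name parsing and the multithreading). It assumes the Azure.Storage.Blobs and Azure.Storage.Files.DataLake (ADLS Gen2) SDKs, connection strings in app settings, and illustrative container, file-system, and chunk-size values:

using System;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Files.DataLake;

public static class BlobToDataLake
{
    public static void Copy(string blobName)
    {
        // Source blob (illustrative container name).
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("BlobConnection"), "input", blobName);

        // Destination file in Data Lake (Gen2 SDK shown here as an example).
        var fileSystem = new DataLakeServiceClient(
            Environment.GetEnvironmentVariable("AdlsConnection")).GetFileSystemClient("landing");
        var file = fileSystem.GetFileClient(blobName);
        file.Create();

        const int chunkSize = 4 * 1024 * 1024; // 4 MB per append, illustrative
        var buffer = new byte[chunkSize];
        long offset = 0;

        using (Stream source = blob.OpenRead())
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Append each chunk at the current offset, then advance the offset.
                using (var chunk = new MemoryStream(buffer, 0, read))
                {
                    file.Append(chunk, offset);
                }
                offset += read;
            }
        }

        // Commit all appended data to the Data Lake file.
        file.Flush(offset);
    }
}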

Related

Automatically pickup uploaded text file from Blob Storage and import data to Azure SQL

I have a Blob Storage, and an Azure SQL DB.
When I upload a text file to my Blob Storage, say users.txt, it contains a list of users I need to import into the User table in my SQL DB.
Is there a way that whenever a file arrives in Blob Storage, it triggers an event, and that event triggers something to import the data into the SQL DB (I don't know, maybe an Azure Function, Logic App...)? Ideally I wouldn't need to write any code. Is that possible? If so, could you please let me know step by step how to do it?
Any help would be highly appreciated!
Thanks!
Take a look at Azure Blob storage trigger for Azure Functions, which describes how you can use a "blob added" event to trigger an Azure Function. You can do something like below.
using System.IO;
using Microsoft.Azure.WebJobs;

[FunctionName("SaveTextBlobToDb")]
public static void Run(
    [BlobTrigger("container-with-text-files/{name}", Connection = "StorageConnectionAppSetting")] Stream streamWithTextFile)
{
    // your logic for handling the new blob (streamWithTextFile)
}
In the implementation, you can save the blob content to your SQL database. If you want to make sure that the blob is not lost due to any transient errors (like issues with db connectivity), you can first put the info about the new blob into an Azure Storage queue, and then have a separate Azure Function take each blob-info message from the queue and transfer the content to the database.
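A minimal sketch of that queue-based hand-off, assuming the in-process WebJobs Queue/Blob bindings; the queue name, container name, connection settings, and the one-user-per-line file layout are illustrative, and the SQL insert is just a placeholder:

using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Data.SqlClient;

public static class BlobToSqlViaQueue
{
    // Step 1: when a blob lands, enqueue its name instead of processing it directly.
    [FunctionName("EnqueueNewBlob")]
    public static void EnqueueNewBlob(
        [BlobTrigger("container-with-text-files/{name}", Connection = "StorageConnectionAppSetting")] Stream blob,
        string name,
        [Queue("blobs-to-import", Connection = "StorageConnectionAppSetting")] out string queueMessage)
    {
        queueMessage = name;
    }

    // Step 2: process queue messages; failed messages are retried (and eventually poisoned),
    // so a transient DB outage doesn't lose the blob.
    [FunctionName("ImportBlobToSql")]
    public static void ImportBlobToSql(
        [QueueTrigger("blobs-to-import", Connection = "StorageConnectionAppSetting")] string blobName,
        [Blob("container-with-text-files/{queueTrigger}", FileAccess.Read, Connection = "StorageConnectionAppSetting")] Stream blob)
    {
        using var reader = new StreamReader(blob);
        using var connection = new SqlConnection(Environment.GetEnvironmentVariable("SqlConnectionString"));
        connection.Open();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Illustrative insert: one user name per line in users.txt.
            using var command = new SqlCommand("INSERT INTO [User] (Name) VALUES (@name)", connection);
            command.Parameters.AddWithValue("@name", line);
            command.ExecuteNonQuery();
        }
    }
}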
One solution that comes to mind, other than the options you already know, is Azure Data Factory. It is a kind of ETL tool for the cloud. It allows you to set up pipelines for data processing with defined inputs and outputs. In your scenario the input would be a blob and the output would be a SQL Server database record.
You can trigger the pipeline to be executed when a new blob is added. The docs even have an example showing just that; you can find it here.
In your case you can probably use the Copy Activity to copy the data from the blob to SQL Server. A tutorial titled "Copy data from Azure Blob storage to a SQL Database by using the Copy Data tool" can be found here.
An Azure Function will do the job as well but will involve coding. A Logic App is also a good option.
You answered your own question: an Azure Function or a Logic App. You can declaratively bind to your blob within an Azure Function, and you can use the blob trigger in a Logic App as well. Someone suggested Data Factory (this would necessarily be the most expensive option).

SQL to Azure Blob to Logic App

I am new to Azure Functions. One of my tasks is to read data from a SQL database and upload that data as a CSV file to Azure Blob storage using Azure Functions, and then use Logic Apps to retrieve it. I am stuck on the SQL-to-file-to-Azure-Blob part.
I would start with the Azure Functions documentation. I did a quick internet search and found this article on how to access a SQL database from an Azure Function: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scenario-database-table-cleanup
Here is another article which shows how to upload content to blob storage: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob#output
Apply your learnings from both and you should be able to accomplish this task.
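Combining the two articles, a rough sketch could look like the following. It assumes a timer-triggered function, a dbo.Users table, and illustrative connection-string, container, and column names; adapt the query and CSV layout to your data:

using System;
using System.IO;
using System.Text;
using Microsoft.Azure.WebJobs;
using Microsoft.Data.SqlClient;

public static class ExportUsersToCsv
{
    [FunctionName("ExportUsersToCsv")]
    public static void Run(
        [TimerTrigger("0 0 1 * * *")] TimerInfo timer, // daily at 01:00, illustrative schedule
        [Blob("exports/users.csv", FileAccess.Write, Connection = "StorageConnection")] out string csvBlob)
    {
        var csv = new StringBuilder();
        csv.AppendLine("Id,Name,Email"); // header row, illustrative columns

        using var connection = new SqlConnection(Environment.GetEnvironmentVariable("SqlConnectionString"));
        connection.Open();

        using var command = new SqlCommand("SELECT Id, Name, Email FROM dbo.Users", connection);
        using var reader = command.ExecuteReader();
        while (reader.Read())
        {
            csv.AppendLine($"{reader["Id"]},{reader["Name"]},{reader["Email"]}");
        }

        // The blob output binding writes the string to exports/users.csv,
        // which the logic app can then pick up.
        csvBlob = csv.ToString();
    }
}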
What if, instead, you create a trigger to start the logic app when something happens in your DB? There's an interesting article here: https://flow.microsoft.com/en-us/blog/introducing-triggers-in-the-sql-connector/
You can then pass the information to a function to process the data and push the new CSV file to storage: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet?tabs=windows
Optionally, you might need to transform what the SQL trigger returns; for that you can use Logic Apps to transform the input: https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-enterprise-integration-transform
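For the upload itself, the quickstart essentially boils down to something like this (a small sketch with the Azure.Storage.Blobs SDK; the container/blob names and the connection-string setting are illustrative):

using System;
using System.IO;
using System.Text;
using Azure.Storage.Blobs;

public static class CsvUploader
{
    public static void UploadCsv(string csvContent)
    {
        // Illustrative container/blob names; the connection string comes from app settings.
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("StorageConnection"), "exports", "users.csv");

        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvContent));
        blob.Upload(stream, overwrite: true); // overwrite an existing file with the same name
    }
}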

Is there a way to continuously pipe data from Azure Blob into BigQuery?

I have a bunch of files in Azure Blob storage and it's constantly getting new ones. I was wondering if there is a way for me to first take all the data I have in Blob and move it over to BigQuery and then keep a script or some job running so that all new data in there gets sent over to BigQuery?
BigQuery offers support for querying data directly from these external data sources: Google Cloud Bigtable, Google Cloud Storage, and Google Drive. Azure Blob storage is not included. As Adam Lydick mentioned, as a workaround you could copy data/files from Azure Blob storage to Google Cloud Storage (or another BigQuery-supported external data source).
To copy data from Azure Blob storage to Google Cloud Storage, you can run WebJobs (or Azure Functions). A blob-triggered WebJob can fire a function when a blob is created or updated; in that function you can access the blob content and write/upload it to Google Cloud Storage.
Note: you can install the Google.Cloud.Storage library to perform common operations in client code, and this blog explains how to use the Google.Cloud.Storage SDK in Azure Functions.
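A minimal sketch of that blob-triggered copy, assuming the Google.Cloud.Storage.V1 package and Google credentials available to the function (for example via GOOGLE_APPLICATION_CREDENTIALS); the container, bucket, and connection names are illustrative:

using System.IO;
using Google.Cloud.Storage.V1;
using Microsoft.Azure.WebJobs;

public static class CopyBlobToGcs
{
    [FunctionName("CopyBlobToGcs")]
    public static void Run(
        [BlobTrigger("incoming/{name}", Connection = "AzureStorageConnection")] Stream blob,
        string name)
    {
        // StorageClient.Create() picks up credentials from the environment
        // (e.g. GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key).
        var gcs = StorageClient.Create();

        // Upload the Azure blob stream straight into the GCS bucket.
        gcs.UploadObject("my-gcs-bucket", name, null, blob);
    }
}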
I'm not aware of anything out-of-the-box (on Google's infrastructure) that can accomplish this.
I'd probably set up a tiny VM to:
Scan your Azure blob storage looking for new content.
Copy new content into GCS (or local disk).
Kick off a LOAD job periodically to add the new data to BigQuery.
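For the last step in that list, a minimal sketch of the periodic LOAD job with the Google.Cloud.BigQuery.V2 client might look like this; the project, bucket, dataset, and table names are illustrative, and the destination table is assumed to already exist:

using Google.Cloud.BigQuery.V2;

public static class LoadCsvIntoBigQuery
{
    public static void Run()
    {
        var client = BigQueryClient.Create("my-gcp-project"); // illustrative project id

        // Load every CSV copied into this GCS prefix into an existing table.
        var job = client.CreateLoadJob(
            "gs://my-gcs-bucket/azure-exports/*.csv",            // source files in GCS
            client.GetTableReference("analytics", "azure_blob_data"),
            null,                                                // table already exists, no schema needed
            new CreateLoadJobOptions
            {
                SourceFormat = FileFormat.Csv,
                SkipLeadingRows = 1,                             // skip the header row
                WriteDisposition = WriteDisposition.WriteAppend
            });

        job.PollUntilCompleted(); // wait for the load to finish (and surface any errors)
    }
}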
If you used GCS instead of Azure Blob Storage, you could eliminate the VM and just have a Cloud Function that is triggered on new items being added to your GCS bucket (assuming your blob is in a form that BigQuery knows how to read). I presume this is part of an existing solution that you'd prefer not to modify though.

Query blobs in Blob storage

I have serialized text data that is stored in a blob inside Azure blob storage. The text is basically key/value data. I am wondering if there is a way to easily query the blob without exploding the data into another table/database or pulling the blob into memory?
Azure Blob storage has no API to query data within the blob - it's just dumb storage. See here for the Blob Storage API. You're essentially stuck reading, deserializing and grabbing your value(s).
Perhaps Azure table storage would be a better fit for this application? That at least keeps things in the realm of an Azure storage account rather than needing to pull in a SQL Server instance.
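If you do stay with Blob storage, the read-and-deserialize route looks roughly like this (a sketch assuming the key/value data is serialized as JSON and using the Azure.Storage.Blobs SDK; the container name and setting names are illustrative):

using System;
using System.Collections.Generic;
using System.Text.Json;
using Azure.Storage.Blobs;

public static class KeyValueBlobReader
{
    public static string GetValue(string blobName, string key)
    {
        // Illustrative container name; connection string from configuration.
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("StorageConnection"), "data", blobName);

        // Download the whole blob and deserialize it; there is no server-side query,
        // so the blob content has to come down to the client.
        string json = blob.DownloadContent().Value.Content.ToString();
        var pairs = JsonSerializer.Deserialize<Dictionary<string, string>>(json);

        return pairs != null && pairs.TryGetValue(key, out var value) ? value : null;
    }
}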
One option you could consider is Data Lake Analytics, as it supports Azure Blobs as a data source.
Depending on your preferred way of accessing the data, you can use PowerShell, the .NET SDK, etc. to query the data...

How to transfer csv files from Google Cloud Storage to Azure Datalake Store

I'd like to have our daily CSV log files transferred from GCS to Azure Data Lake Store, but I can't really figure out the easiest way to do it.
Is there a built-in solution for that?
Can I do that with Data Factory?
I'd rather avoid running a VM scheduled to do this with the APIs. The idea comes from the GCS->(Dataflow->)BigQuery solution.
Thanks for any ideas!
Yes, you can move data from Google Cloud Storage to Azure Data Lake Store using Azure Data Factory by developing a custom copy activity. However, in this activity you will be using the APIs to transfer the data. See the details in this article.
