Parsing DAT, CSV and image files using Azure services

I have five types of EDI files - *.DAT, *.XML, *.TXT, *.CSV and image files - whose contents are not in a standard format.
I need to parse them, extract the required data, and persist it in a SQL database.
Currently I'm spending time writing parser class libraries for each type of EDI file, which is not scalable.
I need to know if there are any Azure services that can do the parsing work for me and that scale.
Any suggestions would be appreciated.

Yes, you can use Azure Functions to process files like CSV and import the data into Azure SQL. Azure Data Factory is also helpful for reading or copying many file formats and storing them in a SQL Server database in the specified format; there is a practical example provided by Microsoft, please refer here.
To do this with Azure Functions, the steps are:
Create an Azure Function (stack: .NET Core 3.1) with a Blob trigger and define the local storage account connection string in local.settings.json.
In Function.cs there will be some boilerplate code that logs the name and size of the uploaded blob.
In the Run method, define your parsing logic for the uploaded blob files (a minimal sketch follows after these steps).
Create the Azure SQL database and configure the server with the location, pricing tier and other required settings. Then select Set server firewall on the database overview page, click Add client IP to add your IP address, and save. Verify that you can connect to the database.
Deploy the project to the Azure Function App from Visual Studio.
Open your Azure SQL Database in the Azure portal and navigate to Connection Strings. Copy the connection string for ADO.NET.
Paste that connection string into the Function App's application settings in the portal.
Test the function app from the portal. The remaining steps for uploading files from storage to the SQL database are available in this GitHub documentation.
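As referenced in the Run step above, here is a minimal sketch of such a blob-triggered function that parses a CSV blob and writes rows to Azure SQL. The container name "edi-files", the "SqlConnectionString" app setting and the EdiRecords table/columns are illustrative assumptions, not fixed names:

// Blob-triggered function: parses each CSV line and inserts it into Azure SQL.
using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.Data.SqlClient;

public static class EdiCsvParser
{
    [FunctionName("EdiCsvParser")]
    public static void Run(
        [BlobTrigger("edi-files/{name}")] Stream blob,
        string name,
        ILogger log)
    {
        log.LogInformation($"Processing blob: {name}, size: {blob.Length} bytes");

        var connectionString = Environment.GetEnvironmentVariable("SqlConnectionString");
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var reader = new StreamReader(blob);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Naive CSV split; use a proper parser (e.g. CsvHelper) if fields can contain quoted commas.
            var fields = line.Split(',');

            using var cmd = new SqlCommand(
                "INSERT INTO EdiRecords (SourceFile, Field1, Field2) VALUES (@file, @f1, @f2)",
                connection);
            cmd.Parameters.AddWithValue("@file", name);
            cmd.Parameters.AddWithValue("@f1", fields[0]);
            cmd.Parameters.AddWithValue("@f2", fields.Length > 1 ? fields[1] : (object)DBNull.Value);
            cmd.ExecuteNonQuery();
        }
    }
}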
Also, for parsing files such as CSV to JSON format with Azure Functions, please refer here.
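As a rough, self-contained illustration of that CSV-to-JSON conversion (not the code from the linked article), the following assumes the first line of the file is a header row:

// Convert CSV text (first line = header) into an indented JSON array string.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class CsvToJson
{
    public static string Convert(string csvText)
    {
        var lines = csvText.Split('\n', StringSplitOptions.RemoveEmptyEntries);
        var headers = lines[0].Trim().Split(',');

        // Map each data row to a dictionary keyed by the header names.
        var rows = lines.Skip(1)
            .Select(line => line.Trim().Split(','))
            .Select(fields => headers
                .Zip(fields, (h, v) => (h, v))
                .ToDictionary(p => p.h, p => p.v));

        return JsonSerializer.Serialize(rows, new JsonSerializerOptions { WriteIndented = true });
    }
}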

Consider using Azure Data Factory. It supports a range of file types.

Related

Ingest Data From On-Premise SFTP Folder To Azure SQL Database (Azure Data Factory)

Use case: I have data files of varying size copied to a specific SFTP folder periodically (daily/weekly). All these files need to be validated, processed, and then written to the related tables in Azure SQL. The files are CSV-format flat text files, each corresponding directly to a specific table in Azure SQL.
Implementation:
I'm planning to use Azure Data Factory. From my reading so far, I can see that a Copy pipeline can copy data from the on-premises SFTP server to Azure Blob storage, and an SSIS pipeline can copy data from an on-premises SQL Server to Azure SQL.
But I don't see an existing solution that achieves what I'm looking for. Can someone provide some insight on how I can achieve this?
I would try to use Data Factory with a Data Flow to validate/process the files (if possible for your case). If the validation is too complex or depends on other components, then I would use Functions and put the resulting files in blob storage (a sketch follows below). The Copy activity is also able to import the resulting CSV files into SQL Server.
You can create a pipeline that does the following:
Copy data - Copy Files from SFTP to Blob Storage
Do Data processing/validation via Data Flow
Sink the results directly to the SQL table (via the Data Flow sink)
Of course, if the on-prem server is not publicly accessible you need an integration runtime that can reach it - either via VNet integration or a self-hosted IR.
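If the validation turns out to be too complex for a Data Flow and you go the Functions route mentioned above, here is a minimal sketch; the container names ("sftp-landing", "validated") and the expected column count are illustrative assumptions:

// Blob-triggered function: keeps only CSV rows with the expected column count and
// writes the cleaned file to an output container for the Copy activity to pick up.
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ValidateCsv
{
    [FunctionName("ValidateCsv")]
    public static void Run(
        [BlobTrigger("sftp-landing/{name}")] Stream input,
        [Blob("validated/{name}", FileAccess.Write)] Stream output,
        string name,
        ILogger log)
    {
        const int expectedColumns = 5; // assumption: the target table has 5 columns
        using var reader = new StreamReader(input);
        using var writer = new StreamWriter(output);

        string line;
        int lineNumber = 0, rejected = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Split(',').Length == expectedColumns)
            {
                writer.WriteLine(line);
            }
            else
            {
                rejected++;
                log.LogWarning("{File} line {Line}: unexpected column count, skipped", name, lineNumber);
            }
        }
        log.LogInformation("{File}: validated, {Rejected} row(s) rejected", name, rejected);
    }
}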

How to access an Azure database containing data from an Azure Log Analytics query

I have a working query for analyzing my app data.
Currently it analyzes the last two weeks of data using ago(14d).
Now I want to use a value containing the release date of the app's current version. Since I haven't found a way to add a new table to the existing database that holds the log data in Azure Log Analytics, I created a new database in Azure and entered my data there.
Now I just don't know whether I can access that database at all from within the web query interface of Azure Log Analytics, or whether I have to use some other tool for that.
I hope somebody can help me with this.
As always with Azure there is a lot to read, but nothing concrete for my issue (or at least I haven't found it yet).
And yes, I know how to insert the data into the query with a let statement, but since I want to use the same data in different queries, I'd prefer an external location that can be accessed from all of them.
Thx in advance.
Maverick
You cannot access a database directly. You are better off using a CSV/JSON file in blob storage. In the following example I uploaded a txt file with CSV data like this:
2a6c024f-9093-434c-b3b1-000821a15b1a,"Customer 1"
28a658a8-5466-45ea-862c-003b20507dd4,"Customer 2"
c46fb949-d807-4eea-8de4-005dd4beb39a,"Customer 3"
e05b67ee-ff83-4805-b004-0064449f196c,"Customer 4"
Then I can reference this data from Log Analytics / Application Insights using the externaldata operator in a query like this:
let customers = externaldata(id:string, companyName:string) [
    h@"https://xxx.blob.core.windows.net/myblob.txt?sv=2019-10-10&st=2020-09-29T11%3A39%3A22Z&se=2050-09-30T11%3A39%3A00Z&sr=b&sp=r&sig=xxx"
] with(format="csv");
requests
| extend CompanyId = tostring(customDimensions.CustomerId)
| join kind=leftouter (
    customers
) on $left.CompanyId == $right.id
The URL https://xxx.blob.core.windows.net/myblob.txt?sv=2019-10-10&st=2020-09-29T11%3A39%3A22Z&se=2050-09-30T11%3A39%3A00Z&sr=b&sp=r&sig=xxx was created by generating a URL that includes a SAS token using Microsoft Azure Storage Explorer: select a blob, right-click -> Get Shared Access Signature, create a SAS in the popup, and copy the URI.
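If you would rather generate the SAS URL in code than through Storage Explorer, here is a minimal sketch using the Azure.Storage.Blobs SDK; the connection string setting, container and blob names are placeholders:

// Build a read-only, long-lived SAS URL for a single blob.
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public static class SasUrlGenerator
{
    public static Uri GetReadOnlySasUrl()
    {
        // Placeholders: use your own storage connection string, container and blob name.
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"),
            "mycontainer",
            "myblob.txt");

        // Read-only SAS with a far-future expiry, matching the long-lived token in the example URL.
        var sasBuilder = new BlobSasBuilder(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddYears(10))
        {
            BlobContainerName = blob.BlobContainerName,
            BlobName = blob.Name,
            Resource = "b" // "b" = blob-level SAS
        };

        // Requires a shared-key credential (i.e. a connection string with the account key).
        return blob.GenerateSasUri(sasBuilder);
    }
}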
I know Log Analytics uses Azure Data Explorer in the back end, and Azure Data Explorer has a feature to use external tables within queries, but I am not sure whether Log Analytics supports external tables.
External Tables in Azure Data Explorer: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/externaltables

Azure Data Factory and SharePoint

I have some Excel files stored in SharePoint Online. I want to copy the files stored in SharePoint folders to Azure Blob storage.
To achieve this, I am creating a new pipeline in Azure Data Factory using the Azure portal. What are the possible ways to copy files from SharePoint to Azure Blob storage using Azure Data Factory pipelines?
I have looked at all the linked service types in Azure Data Factory but couldn't find any suitable type to connect to SharePoint.
Rather than directly accessing the file in SharePoint from Data Factory, you might have to use an intermediate technology and have Data Factory call that. You have a few options:
Use a Logic App to move the file
Use an Azure Function
Use a custom activity and write your own C# to copy the file.
To call a Logic App from ADF, you use a web activity.
You can directly call an Azure Function now.
We can create a linked service of type 'File system' by providing the directory URL as the 'Host' value. To authenticate the user, provide the username and password/Azure Key Vault details.
Note: use a self-hosted IR.
You can use a Logic App to fetch data from SharePoint and load it into Azure Blob storage, and then use Azure Data Factory to fetch the data from blob storage. You can even set an event trigger so that whenever a file lands in the blob container, the pipeline runs automatically.
You can use Power Automate (https://make.powerautomate.com/) to do this task automatically:
Create an automated cloud flow that triggers whenever a new file is dropped in a SharePoint folder
Use any mentioned trigger as per your requirement and fill in the SharePoint details
Add an action to create a blob and fill in the details as per your use case
By using this you will be copying the SharePoint files to blob storage without even using ADF.
My previous answer was true at the time, but in the last few years Microsoft has published guidance on how to copy documents from a SharePoint library. You can copy a file from SharePoint Online by using a Web activity to authenticate and grab an access token from SPO, then passing it to a subsequent Copy activity that copies the data with the HTTP connector as the source.
I ran into some issues with large files and Logic Apps. It turned out there were some extremely large files to be copied from that SharePoint library. SharePoint has a default limit of 100 MB buffer size, and the Get File Content action doesn’t natively support chunking.
I successfully pulled the files with the web activity and copy activity. But I found the SharePoint permissions configuration to be a bit tricky. I blogged my process here.
You can use a binary dataset if you just want to copy the full file rather than read the data.
If my file is located at https://mytenant.sharepoint.com/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV, the URL I need to retrieve the file is https://mytenant.sharepoint.com/sites/site1/_api/web/GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value.
Be careful about when you get your auth token. Your auth token is valid for 1 hour. If you copy a bunch of files sequentially, and it takes longer than that, you might get a timeout error.
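For reference, here is a rough sketch, outside of ADF, of what the Web activity and Copy activity described above are doing: request an app-only token from the SharePoint ACS endpoint, then download the file through the /$value URL. The tenant ID, client ID, client secret, site name and file path are placeholders you must supply from your own app registration:

// Get an app-only SPO token and download a file's content via the REST /$value endpoint.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

public static class SharePointFileDownload
{
    public static async Task<byte[]> DownloadAsync(
        string tenantId, string clientId, string clientSecret, string tenant /* e.g. "mytenant" */)
    {
        using var http = new HttpClient();

        // 1. Request an app-only access token from the SharePoint ACS endpoint.
        var tokenResponse = await http.PostAsync(
            $"https://accounts.accesscontrol.windows.net/{tenantId}/tokens/OAuth/2",
            new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["grant_type"] = "client_credentials",
                ["client_id"] = $"{clientId}@{tenantId}",
                ["client_secret"] = clientSecret,
                ["resource"] = $"00000003-0000-0ff1-ce00-000000000000/{tenant}.sharepoint.com@{tenantId}"
            }));
        tokenResponse.EnsureSuccessStatusCode();
        var token = JsonDocument.Parse(await tokenResponse.Content.ReadAsStringAsync())
                                .RootElement.GetProperty("access_token").GetString();

        // 2. Fetch the file content via the /$value endpoint (the same URL the Copy activity uses).
        var fileUrl = $"https://{tenant}.sharepoint.com/sites/site1/_api/web/" +
                      "GetFileByServerRelativeUrl('/sites/site1/libraryname/folder1/folder2/folder3/myfile.CSV')/$value";
        using var request = new HttpRequestMessage(HttpMethod.Get, fileUrl);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        var fileResponse = await http.SendAsync(request);
        fileResponse.EnsureSuccessStatusCode();
        return await fileResponse.Content.ReadAsByteArrayAsync();
    }
}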

SQL to Azure Blob to Logic App

I am new to Azure Functions. One of my tasks is to read data from a SQL database, upload that data as a CSV file to Azure Blob storage using an Azure Function, and then use a Logic App to retrieve it. I am stuck on the SQL-to-file-to-blob part.
I would start with the Azure Functions documentation. I did a quick internet search and found this article on how to access a SQL database from an Azure Function: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scenario-database-table-cleanup
Here is another article which shows how to upload content to blob storage: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob#output
Apply your learnings from both and you should be able to accomplish this task.
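Putting the two articles together, a minimal sketch might look like the following timer-triggered function; the Orders table, schedule, setting names and "exports" container are assumptions for illustration:

// Timer-triggered function: query an Azure SQL table and write the result as a CSV blob.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.Data.SqlClient;
using Azure.Storage.Blobs;

public static class SqlToCsvBlob
{
    [FunctionName("SqlToCsvBlob")]
    public static async Task Run(
        [TimerTrigger("0 0 1 * * *")] TimerInfo timer, // daily at 01:00
        ILogger log)
    {
        var csv = new StringBuilder("Id,CustomerName,Total\n");

        var sqlConnection = Environment.GetEnvironmentVariable("SqlConnectionString");
        using (var conn = new SqlConnection(sqlConnection))
        {
            await conn.OpenAsync();
            using var cmd = new SqlCommand("SELECT Id, CustomerName, Total FROM Orders", conn);
            using var reader = await cmd.ExecuteReaderAsync();
            while (await reader.ReadAsync())
            {
                csv.AppendLine($"{reader.GetInt32(0)},{reader.GetString(1)},{reader.GetDecimal(2)}");
            }
        }

        // Upload the CSV to blob storage so a Logic App can pick it up later.
        var blobService = new BlobServiceClient(Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var container = blobService.GetBlobContainerClient("exports");
        await container.CreateIfNotExistsAsync();
        var blob = container.GetBlobClient($"orders-{DateTime.UtcNow:yyyyMMdd}.csv");
        await blob.UploadAsync(BinaryData.FromString(csv.ToString()), overwrite: true);

        log.LogInformation("Uploaded {Blob}", blob.Name);
    }
}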
Alternatively, you could create a trigger that starts the Logic App when something happens in your database. There is an interesting article here: https://flow.microsoft.com/en-us/blog/introducing-triggers-in-the-sql-connector/
You can then pass the information to a function that processes the data and pushes the new CSV file to storage: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-dotnet?tabs=windows
Optionally, you might need to transform what the SQL trigger returns; for that you can use Logic Apps to transform the input: https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-enterprise-integration-transform

Azure Data Factory - moving data from On-Premise SQL to Azure SQL

A simple question: can this be achieved directly, i.e. without Azure Blob storage in between (as shown in all the examples)? Can someone provide a code example, please?
Yes, you can do this directly. In fact, you can do direct copies between any of our supported sources and sinks; you don't have to pass through blob storage. To go from on-prem SQL Server to Azure SQL, you will need to set up a Data Management Gateway connector on your on-prem server. Then, instead of the AzureBlob output dataset shown in the example, you use an Azure SQL Database linked service and an output dataset of type AzureSqlTable. The exact steps to set up the DMG and the JSON for the linked services, datasets, and pipelines can be found in our documentation. We are also improving our UI in the near future to make these kinds of copy setups an easy, code-free experience.
https://azure.microsoft.com/en-us/documentation/articles/data-factory-sqlserver-connector/
