Azure Data Factory: read from CSV and copy row by row to a Cosmos DB

I'm new to Azure Data Factory. I'm trying to solve the following problem:
Read a CSV file from Azure Blob Storage
Parse it row by row and write each row into an existing Cosmos DB
I am currently looking into a solution that does:
Copy data from the source (CSV) to a sink (Azure Storage Table)
A ForEach activity that iterates over the table and copies the rows into the DB
Is this a correct approach, and if it is, how should I set up the dynamic content of the ForEach activity?
Note:
I've tried this solution (link) but I get an error message saying
Reading or replacing offers is not supported for serverless accounts
which means that
Cosmos DB Serverless is not currently supported as a sink for Data Flow in Azure Data Factory.

If you use Lookup + ForEach activities, the ForEach Items should be:
@activity('Lookup1').output.value
Your proposed solution may be hard to achieve.
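For reference, here is a minimal sketch of how that wiring could look in pipeline JSON; the activity name Lookup1 matches the expression above, the inner activities are left empty for brevity, and the Lookup must be configured with "firstRowOnly": false so that output.value contains every row:

```json
{
    "name": "ForEachRow",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup1').output.value",
            "type": "Expression"
        },
        "isSequential": false,
        "activities": []
    }
}
```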
Since you have found that Data Flow doesn't support Cosmos DB Serverless, you may want to refer to this tutorial instead: Copy Data From Blob Storage To Cosmos DB Using Azure Data Factory
It uses a Copy activity to copy data from a CSV file in Blob Storage to Azure Cosmos DB directly.
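A minimal sketch of that Copy activity in pipeline JSON, assuming hypothetical dataset names BlobCsvDataset (delimited text in Blob Storage) and CosmosDbCollectionDataset (a Cosmos DB SQL API collection):

```json
{
    "name": "CopyCsvToCosmos",
    "type": "Copy",
    "inputs": [ { "referenceName": "BlobCsvDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "CosmosDbCollectionDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "CosmosDbSqlApiSink", "writeBehavior": "insert" }
    }
}
```

Each CSV row becomes one document in the collection, so no per-row ForEach loop is needed.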

Related

How to incrementally load data from Azure Blob storage to Azure SQL Database using Data Factory?

I have a json file stored in Azure Blob Storage and I have loaded it into Azure SQL DB using Data Factory.
Now I would like to find a way in order to load only new records from the file to my database (as the file is being updated every week or so). Is there a way to do it?
Thanks!
You can use upsert (slowly changing dimension type 1), which is already implemented in Azure Data Factory.
It will add new records and update old records that have changed.
Here is a quick tutorial:
https://www.youtube.com/watch?v=MzHWZ5_KMYo
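As a rough illustration, a Copy activity sink can request this behavior directly; a sketch assuming an Azure SQL sink and a hypothetical key column OrderId:

```json
"sink": {
    "type": "AzureSqlSink",
    "writeBehavior": "upsert",
    "upsertSettings": {
        "useTempDB": true,
        "keys": [ "OrderId" ]
    }
}
```

Rows whose key already exists are updated and all others are inserted, which is the slowly changing dimension type 1 behavior described above.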
I would suggest you use a Data Flow activity.
In a Data Flow activity, you have the option of an Alter Row transformation.
In Alter Row you can use an Upsert if condition.
Set the condition to 1 == 1 so that every row is treated as an upsert.

Can we pass the whole CSV data from Blob Storage to an on-premises SQL database using Azure Data Factory?

I am trying to pass all records of a CSV file from Blob Storage to an on-premises SQL database using Azure Data Factory. I know how to pass data one row at a time using Lookup and Copy activities, but I don't know how to pass all records of the CSV.
You can directly use a Copy activity with the blob file as the source and the SQL table as the sink, wherein all records of the file will be copied into the table.
There is no need for a Lookup activity.
You will have to use a Copy activity to copy data from Azure Blob Storage to the on-premises SQL database.
You can follow the steps below:
Step 1: Select the Copy activity in Data Factory.
Step 2: Select an Azure Blob Storage dataset as the source.
Step 3: Select the on-premises SQL database as the sink.
Step 4: Click on import schema to do the mapping.
Step 5: Finally, execute the Copy activity. There is no need to use a Lookup activity here.
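Put together, the activity could look roughly like the sketch below; the dataset names are hypothetical, and the on-premises connection is assumed to go through a self-hosted integration runtime referenced by the sink dataset's linked service:

```json
{
    "name": "CopyCsvToOnPremSql",
    "type": "Copy",
    "inputs": [ { "referenceName": "BlobCsvDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "OnPremSqlTableDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "sink": { "type": "SqlSink" }
    }
}
```

The column mapping produced by the import schema step is stored on the activity as a translator, so no row-by-row handling is required.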

I have about 20 files of type Excel/PDF which can be downloaded from an HTTP server. I need to load these files into Azure Storage using Data Factory

I have 20 files of type Excel/PDF located on different HTTPS servers. I need to validate these files and load them into Azure Storage using Data Factory. I then need to apply some business logic to this data and load it into an Azure SQL Database. I need to know whether I have to create a pipeline that stores this data in Azure Blob Storage and then loads it into the Azure SQL Database.
I have tried creating a Copy Data activity in Data Factory.
My idea as below:
No.1
Step 1: Use Copy Activity to transfer data from http connector source into blob storage connector sink.
Step 2: Meanwhile, configure a blob storage trigger to execute your logic code so that the blob data will be processed as soon as it's collected into blob storage.
Step 3: Use Copy Activity to transfer data from blob storage connector source into SQL database connector sink.
No.2:
Step 1: Use a Copy activity to transfer data from the HTTP connector source into the SQL database connector sink.
Step 2: Meanwhile, you could configure a stored procedure to add your logic steps. The logic will be executed before the data is inserted into the table.
I think both methods are feasible. With No.1, the business logic is freer and more flexible. With No.2, it is more convenient, but it is limited by the syntax of stored procedures. You can pick whichever solution you prefer.
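For Step 1 of option No.1, a binary pass-through copy is one way to move the files unchanged; here is a sketch, assuming hypothetical binary datasets HttpBinaryDataset and BlobBinaryDataset built on HTTP and Blob Storage linked services:

```json
{
    "name": "CopyHttpFileToBlob",
    "type": "Copy",
    "inputs": [ { "referenceName": "HttpBinaryDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "BlobBinaryDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": { "type": "AzureBlobStorageWriteSettings" }
        }
    }
}
```

Copying as binary moves the Excel/PDF bytes untouched, which avoids the parsing problem described below.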
Excel and PDF are not supported yet. Based on the link, only a limited set of formats (such as delimited text, JSON, Avro, ORC, and Parquet) are supported by ADF directly.
I tested reading such a file as CSV and got random characters back.
You could refer to this case for reading Excel files in ADF: How to read files with .xlsx and .xls extension in Azure data factory?

Is there any way to filter out some data in copy activity when source is Blob storage and sink is SQL database?

I am trying to copy data from Azure Blobs to an Azure SQL database using Azure Data Factory.
The Azure blobs are stored incrementally in the storage account each time. They are just JSON with key-value pairs. So I want to filter the data on the basis of one key-value pair before it gets copied into the SQL database.
You could use a stored procedure to do some filtering before writing into your DB.
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#invoking-stored-procedure-for-sql-sink
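Following the linked document, the Copy activity sink could be configured along these lines; the stored procedure name spFilterJson, the table type JsonRecordType, and the parameter name records are all hypothetical, and the filtering logic itself would live inside the procedure:

```json
"sink": {
    "type": "AzureSqlSink",
    "sqlWriterStoredProcedureName": "spFilterJson",
    "sqlWriterTableType": "JsonRecordType",
    "storedProcedureTableTypeParameterName": "records"
}
```

The copied rows are handed to the procedure as a table-valued parameter, so only the rows your procedure chooses to insert reach the target table.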

Azure Data Factory: Moving data from Table Storage to SQL Azure

While moving data from Table Storage to SQL Azure, is it possible to obtain only the delta (the data that hasn't already been moved) using Azure Data Factory?
A more detailed explanation:
There is an Azure Storage Table, which contains some data, which will be updated periodically. And I want to create a Data Factory pipeline which moves this data to an SQL Azure Database. But during each move I only want the newly added data to be written to SQL DB. Is it possible with Azure Data Factory?
See more information on azureTableSourceQuery and copy activity at this link : https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-table-connector/#azure-table-copy-activity-type-properties.
Also see this link for invoking stored procedure for sql: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink
You can filter on the Timestamp column in each run to achieve something similar to a delta copy, but this is not a true delta copy.
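A sketch of what such a filter could look like on the Copy activity source, with a hypothetical hard-coded cutoff; in practice you would compute the watermark from the previous run (for example via a Lookup activity or a pipeline parameter):

```json
"source": {
    "type": "AzureTableSource",
    "azureTableSourceQuery": "Timestamp ge datetime'2024-01-01T00:00:00Z'"
}
```

Note that Timestamp tracks the last modification, so updated rows are re-copied as well, which is why this is only an approximation of a delta copy.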
