Azure Data Factory parsing XML in copy activity - string

I am using Azure Data Factory to transfer data from a SOAP API connection to Snowflake. I understand that Snowflake has to receive the data in a VARIANT column or as CSV, or that we need intermediate storage in Azure to finally land the data in Snowflake. The problem I faced is that the data from the API is a string, and within that string there is XML data. So when I put the data in Blob storage, it is just a string. How do I avoid this and get proper columns when writing the data?
Here, the column is read as a string. Is there a way to parse it into its respective rows and columns? I tried setting the collection reference, but it still does not recognize individual columns. Any input is highly appreciated.

You need to switch to the Advanced editor in the Mapping section of the copy activity. I took sample data and reproduced this. Below are the steps.
Img:1 Source dataset preview
In the Mapping section of the copy activity:
Click Import schema.
Switch to the Advanced editor.
Give the collection reference value (the path to the repeating XML element).
Img:2 Mapping settings
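For reference, the advanced mapping this produces in the copy activity JSON looks roughly like the sketch below. The element and column names (Orders, Order, OrderId, Amount) are hypothetical examples, not taken from the question; the collection reference points at the repeating XML element, and the mapped paths are relative to it.

{
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "['OrderId']" }, "sink": { "name": "OrderId" } },
        { "source": { "path": "['Amount']" },  "sink": { "name": "Amount" } }
    ],
    "collectionReference": "$['Orders']['Order']"
}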

Related

Cannot convert Excel to CSV: Azure Synapse Analytics

I want to convert Excel to CSV in Azure Synapse Analytics but I got an error.
The error message is "Invalid excel header with empty value".
The Excel file I want to convert looks like this (created for the question), and I need to remove the blank column A when converting to CSV.
I have never used ADF before, so I don't know how.
Can someone please tell me how to do this?
Any help would be appreciated.
sample.excel
You have to use Data Flows to do that in ADF.
First, create a linked service for your source dataset.
Create a linked service for your target folder.
My input looks like this (taken from your attached sheet).
Go to the Author tab of Data Factory and select New data flow.
Source settings should look like this.
Source options: point to the location where you have stored the Excel sheet and also select the sheet name; in my case it is sheet1 (for this example I have used Azure Blob storage).
Keep the rest of the tabs at their defaults and add a sink to your data flow.
Sink settings should look like below.
Point to the target location where you want to store your CSV file (I have used Azure Blob storage). Keep the rest of the settings at their defaults.
Go to a new pipeline, drag a Data Flow activity onto the canvas, and trigger your data flow.
And my output in CSV looks like this:
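For reference, here is a minimal sketch of the Excel source dataset behind those source settings, assuming Azure Blob storage (the dataset, linked service, container, file and sheet names are placeholders). The blank column A can then be dropped with a Select transformation in the data flow, or skipped by setting a range on the dataset.

{
    "name": "ExcelSourceDataset",
    "properties": {
        "type": "Excel",
        "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": "sample.xlsx"
            },
            "sheetName": "sheet1",
            "firstRowAsHeader": true
        }
    }
}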

Add file name to Copy activity in Azure Data Factory

I want to copy data from a CSV file (source) on Blob storage to an Azure SQL Database table (sink) via a regular Copy activity, but I also want to copy the file name alongside every entry into the table. I am new to ADF, so the solution is probably easy, but I have not been able to find the answer in the documentation or on the internet so far.
My mapping currently looks like this (I have created an output table with a file name column, but this data is not explicitly defined at the column level in the CSV file, so I need to extract it from the metadata and pair it to the column):
At first, I thought I would put dynamic content in there and solve the problem that way, but there is no option to use dynamic content in each individual box, so I do not know how to implement that. My next thought was to use a pre-copy script, but I have not seen how I could use it for this purpose. What is the best way to solve this issue?
In the mapping columns of the copy activity, you cannot add dynamic content from the Get Metadata activity.
First give the source CSV dataset to the Get Metadata activity, then chain it to the copy activity like below.
You can add the file name column through Additional columns in the copy activity source itself, by giving it the dynamic content from the Get Metadata activity (which is given the same source CSV dataset):
@activity('Get Metadata1').output.itemName
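As a rough sketch, the additional column then shows up in the copy activity source definition like this (the column name FileName is just an example):

"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "FileName",
            "value": {
                "value": "@activity('Get Metadata1').output.itemName",
                "type": "Expression"
            }
        }
    ]
}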
If you are sure about the data types of your data, there is no need to configure the mapping; you can just execute your pipeline.
Here I am copying the contents of the samplecsv.csv file to a SQL table named output.
My output for your reference:

Is there any easy way to move multiple tables in Oracle with around 5 TB of data to ADLS?

I have a requirement where I need to move data from multiple tables in Oracle to ADLS.
The size of the data is around 5 TB. I might use these files in ADLS in the future to connect Power BI.
Is there any easy and efficient way to do this?
Thanks in advance!
You can do this by using a Lookup activity and a ForEach activity in Azure Data Factory.
Create a table or file to store the list of table names which need to be extracted.
Use a Lookup activity to get the list of tables.
Pass the list to a ForEach activity and, looping over each table, copy the current item() from Oracle to ADLS.
In the ForEach activity, under Settings > Items, add the following expression via Add dynamic content:
@activity('Get-Tables').output.value
Add a Copy activity inside the ForEach activity.
In the Copy data activity, under Source > Query, input the following code:
SELECT * FROM @{item().Table_Name}
Now add the sink dataset (ADLS) and execute your pipeline.
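Put together, the ForEach with the inner Copy activity ends up roughly like the sketch below. The activity names and the Parquet sink format are assumptions, and the dataset references are omitted for brevity.

{
    "name": "ForEachTable",
    "type": "ForEach",
    "dependsOn": [ { "activity": "Get-Tables", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": { "value": "@activity('Get-Tables').output.value", "type": "Expression" },
        "activities": [
            {
                "name": "CopyOracleToADLS",
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "OracleSource",
                        "oracleReaderQuery": "SELECT * FROM @{item().Table_Name}"
                    },
                    "sink": { "type": "ParquetSink" }
                }
            }
        ]
    }
}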
Please refer to the Microsoft documentation to learn how to create linked services for Oracle.
Please go through this article by Sean Forgatch in MODERN DATA ENGINEERING if you face any issues in the process.

Azure Data Factory V2 - Process One Array Field on Row as a string

I created an Azure Data Factory pipeline that uses a REST data source to pull data from a REST API and copy it to an Azure SQL database. Each row in the REST data source contains approx. 8 fields, but one of those fields contains an array of values. I'm using a Copy Data task. How do I get all values from that field to map into one of my database fields, possibly as a string? I've tried clicking on "Collection Reference" for that field, but if the array field has 5 values, it creates 5 different records in my SQL table for the one source row. If I don't select "Collection Reference", it only grabs the first value in the array.
I looked into using a Mapping Data Flow instead, but that one doesn't seem to support a REST API dataset as a data source.
Please help.
You can store the output of the REST API as a JSON file in Azure Blob storage with a Copy Data activity. Then you can use that file as the source and do the transformation in a Data Flow. Alternatively, you can use a Lookup activity to get the JSON data and invoke a stored procedure to store the data in Azure SQL Database (this way is cheaper and performs better).
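If you go the Lookup activity + stored procedure route, a minimal sketch of such a stored procedure in T-SQL could look like the one below. The table, column and JSON property names are hypothetical; the point is that OPENJSON ... AS JSON keeps the whole array field as a single JSON string in one column.

CREATE PROCEDURE dbo.usp_LoadRestRows
    @json NVARCHAR(MAX)   -- JSON payload passed in from the pipeline
AS
BEGIN
    INSERT INTO dbo.TargetTable (Id, Name, Tags)
    SELECT Id, Name, Tags
    FROM OPENJSON(@json)
    WITH (
        Id   INT            '$.id',
        Name NVARCHAR(200)  '$.name',
        Tags NVARCHAR(MAX)  '$.tags' AS JSON   -- the array field kept as its raw JSON text
    );
END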

Copy blob data to SQL Database in Azure Data Factory with conditions

I am running a trigger-based pipeline to copy data from Blob storage to a SQL database. In every blob file there are a bunch of JSON documents, of which I need to copy just a few, and I can differentiate them on the basis of a key-value pair present in every JSON.
So how do I filter the JSON documents that contain a particular value for a common key?
One blob file looks like this. While the copy activity is happening, it should filter the data according to the Event-Name: "...".
Data Factory in general only moves data; it doesn't modify it. What you are trying to do might be done using a staging table in the sink SQL database.
You should first load the JSON values as-is from Blob storage into the staging table, then copy them from the staging table to the real table where you need them, applying your filter logic in the SQL command used to extract them.
Remember that SQL databases have built-in functions to handle JSON values: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-2017
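For example, if the raw documents are staged in a single NVARCHAR(MAX) column, the filtering step could look roughly like the T-SQL below. The table and column names and the event value are placeholders for whatever your real schema uses.

INSERT INTO dbo.Events (EventName, Payload)
SELECT JSON_VALUE(RawJson, '$."Event-Name"'), RawJson
FROM dbo.StagingEvents
WHERE JSON_VALUE(RawJson, '$."Event-Name"') = 'SomeEvent';   -- 'SomeEvent' is a placeholder for the value you filter on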
Hope this helped!
At this time there is no option for the copy activity to filter the content (with the exception of a SQL source).
In your scenario it looks like you already know which values need to be omitted. One way to go is to have a Stored Procedure activity after the copy activity that simply deletes the values you don't want from the table; this should be easy to implement, but depending on the volume of data it may lead to performance issues. The other option is to have the JSON file cleaned on the storage side before it is ingested.
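A minimal sketch of what that cleanup stored procedure might contain, assuming the event name was copied into its own column (all names and values here are placeholders):

CREATE PROCEDURE dbo.usp_RemoveUnwantedEvents
AS
BEGIN
    -- keep only the events you care about; everything else that was copied in is removed
    DELETE FROM dbo.Events
    WHERE EventName NOT IN ('WantedEventA', 'WantedEventB');   -- placeholder values
END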
