I have an ADF pipeline with an OData source.
I'm copying it into a SQL database using the auto-create table option.
I would like to auto-generate the table name as well, but I can't figure out how it's done.
The name I would like to use is the Path name from the source:
It should somehow be used as the table name:
Is this possible in ADF?
Can I get the table name from the source path in ADF?
We can't access the dataset name directly in the pipeline; to work around this, we can pass the table name through parameters.
To get the source table name into the sink, first create a parameter (or variable) in the pipeline named tablename of type String.
Then create a parameter named tablename of type String in the source dataset.
In the source dataset path, add the dynamic value @dataset().tablename to read the parameter created in the step above.
In the copy activity's source dataset properties, set tablename to the dynamic value @pipeline().parameters.tablename so the pipeline parameter is passed to the source dataset.
Then create a parameter named tablename of type String in the sink dataset.
In the sink dataset, set the schema name to dbo and the table name to the dynamic value @dataset().tablename to read the parameter created in the step above.
In the copy activity's sink dataset properties, set tablename to the dynamic value @pipeline().parameters.tablename so the pipeline parameter is passed to the sink dataset.
The table was successfully copied with the same name from the OData source to the SQL database.
Note: I have a table named Airlines in my OData source.
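As a rough sketch of the wiring in the JSON view (the dataset and linked service names here are placeholders, not from the original post), the parameterized Azure SQL sink dataset and the copy activity's sink dataset reference would look roughly like this:

{
    "name": "AzureSqlSinkDataset",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": { "referenceName": "AzureSqlLinkedService", "type": "LinkedServiceReference" },
        "parameters": { "tablename": { "type": "String" } },
        "typeProperties": {
            "schema": "dbo",
            "table": { "value": "@dataset().tablename", "type": "Expression" }
        }
    }
}

And inside the copy activity, the sink dataset reference passes the pipeline parameter down:

"outputs": [
    {
        "referenceName": "AzureSqlSinkDataset",
        "type": "DatasetReference",
        "parameters": {
            "tablename": { "value": "@pipeline().parameters.tablename", "type": "Expression" }
        }
    }
]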
Related
I have metadata in my Azure SQL DB / CSV file as below, which has the old column names, their data types, and the new column names.
I want to rename the old field names and change their data types based on that metadata in ADF.
The idea is to store the metadata file in cache and use it in a lookup, but I am not able to do it in the data flow expression builder. Any idea which transform to use or how I should do it?
I have reproduced the above and was able to change the column names and data types as below.
This is the sample CSV file, taken from blob storage, which has the metadata of the table.
In your case, take care with the new data types: if we don't give the correct types, the change will fail because of the data already inside the table.
Create a dataset for it, give it to a Lookup activity, and leave the First row only option unchecked so that all rows are returned.
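For illustration, a metadata file of this shape would work; the values below are hypothetical, but the column names must match what the script further down expects (OldName, NewName, Newtype):

OldName,NewName,Newtype
empid,employee_id,INT
empname,employee_name,VARCHAR(100)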
This is my sample SQL table:
Give the Lookup output array to a ForEach activity.
Inside the ForEach, use a Script activity to execute the script that changes the column name and data type.
Script:
EXEC sp_rename 'mytable2.@{item().OldName}', '@{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN @{item().NewName} @{item().Newtype};
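For instance, with the hypothetical metadata row above (empid, employee_id, INT), the string interpolation in the Script activity would produce:

EXEC sp_rename 'mytable2.empid', 'employee_id', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN employee_id INT;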
Execute this and below is my SQL table with changes.
I'm querying an API using Azure Data Factory and the data I receive from the API looks like this.
{
"96":"29/09/2022",
"95":"31/08/2022",
"93":"31/07/2022"
}
When I come to write this data to a table, ADF assumes the column names are the numbers and the dates are stored as a row, like this:

96          95          93
29/09/2022  31/08/2022  31/07/2022
when I would like it to look like this:

Date        ID
29/09/2022  96
31/08/2022  95
31/07/2022  93
Does anyone have any suggestions on how to handle this? I ideally want to avoid stored procedures and dynamic SQL. I really only need the ID for the month before the one we're currently in.
PS - API doesn't support any filtering on this object
Updates
I'm querying the API using a Web activity, and if I try to store the response in an Array variable the activity fails because the output is an object.
When I use a Copy data activity, I've set the sink to automatically create the table, and the mapping looks like this
mapping image
Thanks
Instead of trying to copy the JSON response to the SQL table directly, convert the response to a string, extract the required values, and insert them into the SQL table.
Look at the following demonstration. I have taken the sample response provided as a pipeline parameter (object type) and used a Set variable activity to extract the values.
My parameter:
{"93":"31/07/2022","95":"31/08/2022","96":"29/09/2022"}
Dynamic content used in the Set variable activity:
@split(replace(replace(replace(replace(string(pipeline().parameters.api_op),' ',''),'"',''),'{',''),'}',''),',')
The output of the Set variable activity will be an array of key:value strings:
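["93:31/07/2022","95:31/08/2022","96:29/09/2022"]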
Now, inside a ForEach activity (pass the previous variable value as the items value of the ForEach), I used a Copy data activity to copy each row separately to my sink table (with Auto create table enabled). I have taken a sample JSON file as my source (we are going to ignore all of its columns anyway).
Create the required 2 additional columns, called id and date, with the following dynamic content:
id: @split(item(),':')[0]
date: @split(item(),':')[1]
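For example, for the item 96:29/09/2022, these evaluate to id = 96 and date = 29/09/2022.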
Configure the sink. Select the database, create the dataset, give a name for the table (I have given dbo.opTable), and select Auto create table under the sink settings.
The following is an image of the mapping. Delete the column mappings that are not required and keep only the additional columns created above.
When I debug the pipeline, it runs successfully and the required values are inserted into the table. The following is the output sink table for reference.
I want the schema through the Get Metadata activity so it can be passed on to a stored procedure.
If you give only a folder in the dataset that you provide to the Get Metadata activity, it won't show the Structure property in the field list.
It will only show folder properties like Item name and Item type.
To get the Structure property, you need to give the file name in the dataset and check First row as header.
Now you can see file properties like Structure and Column count in the activity.
Sample Output of a Blob file:
You can pass this JSON string to your stored procedure activity.
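For reference, the Structure property comes back as an array of name/type pairs, roughly like this (the column names here are hypothetical):

[{"name":"Id","type":"String"},{"name":"Name","type":"String"},{"name":"Date","type":"String"}]

Assuming the Get Metadata activity is named Get Metadata1, it can be passed to a stored procedure parameter with dynamic content such as:

@string(activity('Get Metadata1').output.structure)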
I am trying to parse the $$FILEPATH value in the "Additional columns" section of the Copy activity.
The file paths have a format of time_period=202105/part-12345.parquet. I would like just the "202105" portion of the file path. I cannot hardcode it because there are other time_period folders.
I've tried this (from the link below): @{substring($$FILEPATH, add(indexOf($$FILEPATH, '='),1),sub(indexOf($$FILEPATH, '/'),6))} but I get an error saying Unrecognized expression: $$FILEPATH
The only other things I can think of are using: 1) Get Metadata Activity + For each Activity or 2) possibly trying to do this in DataFlow
$$FILEPATH is a reserved variable that stores the file path; you cannot use it inside a dynamic expression.
You have to create a variable to store the folder name as required and then pass it dynamically as an additional column.
Below is what I have tried.
As your folder name is not static, get the folder names using the Get Metadata activity.
Get Metadata Output:
Pass the output to a ForEach activity to loop over all the folders.
Add a variable at the pipeline level to store the folder name.
In the ForEach activity, add a Set variable activity to extract the date part from the folder name and assign it to the variable.
@substring(item().name, add(indexof(item().name, '='),1), sub(length(item().name), add(indexof(item().name, '='),1)))
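For example, for the folder name time_period=202105, indexof returns 11, so the expression takes sub(18, 12) = 6 characters starting at position 12 and evaluates to 202105.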
Output of Set variable:
In the source dataset, parameterize the path/filename so they can be passed dynamically.
Add a Copy data activity after the Set variable activity and select the source dataset.
a) Pass the current item name of the ForEach activity as the file path. Here I set the filename to the wildcard *.parquet to copy all files from that path (this works only when all files have the same structure).
b) Under Additional columns, add a new column, give it a name, and under Value select Add dynamic content and add the existing variable.
Add the sink dataset in the Copy data sink. I have added an Azure SQL table as my sink.
In Mapping, add the filename (new column) to the mappings.
When you run the pipeline, the ForEach activity runs once for each item returned by the Get Metadata activity.
Output:
So basically my issue is this: I will use Get Metadata to get the names of the files from a source folder in the storage account in Azure. I need to parse each name and insert the data into the respective table. Example below.
The file names are in this format: customer_GUID_TypeOfData_other information.csv
e.g. 1c56d6s4s33s4_Sales_09112021.csv
156468a5s5s54_Inventory_08022021.csv
So these are two different customers and two different types of information.
The tables in SQL will be named exactly that, without the date: 156468a5s5s54_Inventory or 1c56d6s4s33s4_Sales.
How can I copy the data from the CSV to the respective table depending on the file name? I will also need to insert or update existing rows in the destination table, based on a unique identifier in the file dataset, using Azure Data Factory.
Get the file name using the Get Metadata activity and copy the data from CSV to the Azure SQL table using a Data Flow activity with upsert enabled.
Input blob files:
Step1:
• Create a DelimitedText source dataset. Create a parameter for the file name to pass it dynamically.
• Create an Azure SQL Database sink dataset and create a parameter to pass the table name dynamically.
Source dataset:
Sink dataset:
Step2:
• Connect the Source dataset to a Get Metadata activity and pass "*.csv" as the file name to get a list of all file names from the blob folder.
Output of Get Metadata:
Step3:
• Connect the output of the Get Metadata activity to a ForEach loop, to load all the incoming source files/data to the sink.
• Add an expression to the items field to get the child items from the previous activity.
@activity('Get Metadata1').output.childItems
Step4:
• Add a Data Flow activity inside the ForEach loop.
• Connect the data flow source to the Source dataset.
Dataflow Source:
Step5:
• Connect the data flow sink to the Sink dataset.
• Enable Allow upsert to update the record if it already exists, based on the unique key column.
Step6:
• Add an AlterRow transformation between the source and sink to add the condition for the upsert.
• Upsert when the unique key column is not null (i.e., the row has a key).
Upsert if: isNull(id)==false()
Step7:
• In the ForEach loop's Data Flow activity settings, add expressions to pass the source file name and sink table name dynamically.
Src_file_name: @item().name
• As we are extracting the sink table name from the source file name, split the file name on the underscore "_" and then concatenate the first 2 parts to eliminate the date part.
Sink_tbname: @concat(split(item().name, '_')[0],'_',split(item().name, '_')[1])
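For example, for the file 1c56d6s4s33s4_Sales_09112021.csv, split(item().name, '_') returns [1c56d6s4s33s4, Sales, 09112021.csv], so the expression evaluates to 1c56d6s4s33s4_Sales, which matches the target table name.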
Step8:
When the pipeline is run, you can see the loop execute once per source file in the blob folder and load the data into the respective table based on the file name.