I'm querying an API using Azure Data Factory and the data I receive from the API looks like this.
{
"96":"29/09/2022",
"95":"31/08/2022",
"93":"31/07/2022"
)
When I come to write this data to a table, ADF assumes the column names are the numbers and the dates are stored as rows like this
96
95
93
29/09/2022
31/08/2022
31/07/2022
when i would like it to look like this
Date
ID
29/09/2022
96
31/08/2022
95
31/07/2022
93
Does any one have any suggestions on how to handle this, I ideally want to avoid using USP's and dynamic SQL. I really only need the ID for the month of the previous one we're in.
PS - API doesn't support any filtering on this object
Updates
I'm querying the API using a web activity and if i try to store the data to an Array variable the activity fails as the output is an object.
When I use a copy data activity I've set the sink to automatically create the table and the mapping looks likes this
mapping image
Thanks
Instead of directly trying to copy the JSON response to SQL table, convert it the response to a string, extract the required values and insert them into the SQL table.
Look at the following demonstration. I have taken the sample response provided as a parameter (object type). I used set variable activity for extracting the values.
My parameter:
{"93":"31/07/2022","95":"31/08/2022","96":"29/09/2022"}
Dynamic content used in set variable activity:
#split(replace(replace(replace(replace(string(pipeline().parameters.api_op),' ',''),'"',''),'{',''),'}',''),',')
The output for set variable activity will be:
Now inside For each activity (pass the previous variable value as items value in for each), I used copy data to copy each row separately to my sink table (Auto create option enabled). I have taken a sample json file as my source (We are going to ignore all the columns anyway.)
Create the required 2 additional columns called id and date with the following dynamic content:
#id
#split(item(),':')[0]
#date
#split(item(),':')[1]
Configure the sink. Select the database, create dataset, give a name for table (I have given dbo.opTable) and select Auto create table under sink settings.
The following is an image of mapping. Delete the column mappings which are not required and only use additional columns created above.
When I debug the pipeline, it will run successfully, and the required values are inserted into the table. The following is output sink table for reference.
Related
I have metadata in my Azure SQL db /csv file as below which has old column name and datatypes and new column names.
I want to rename and change the data type of oldfieldname based on those metadata in ADF.
The idea is to store the metadata file in cache and use this in lookup but I am not able to do it in data flow expression builder. Any idea which transform or how I should do it?
I have reproduced the above and able to change the column names and datatypes like below.
This is the sample csv file I have taken from blob storage which has meta data of table.
In your case, take care of new Data types because if we don't give correct types, it will generate error because of the data inside table.
Create dataset and give this to lookup and don't check first row option.
This is my sample SQL table:
Give the lookup output array to ForEach.
Inside ForEach use script activity to execute the script for changing column name and Datatype.
Script:
EXEC SP_RENAME 'mytable2.#{item().OldName}', '#{item().NewName}', 'COLUMN';
ALTER TABLE mytable2
ALTER COLUMN #{item().NewName} #{item().Newtype};
Execute this and below is my SQL table with changes.
I'm facing a pretty interesting task to convert an arbitrary CSV file to a JSON structure following this schema:
{
"Data": [
["value_1", "value_2"],
["value_3", "value_4"]
]
}
In this case, the input file will look like this:
value_1,value_2
value_3,value_4
The requirement is to use Azure Data Factory and I won't be able to delegate this task to Azure Functions or other services.
I'm thinking about using 'Copy data' activity but can't get my mind around the configuration. TabularTranslator seems to only work with a definite number of columns but the CSV that I can receive can contain any number of columns.
Maybe DataFlows can help me but their setup doesn't look to be an easy one either. Plus, if I get it correctly, DataFlows take more time to start up.
So, basically, I just need to take the CSV content and put it into "Data" 2d array.
Any ideas on how to accomplish this?
To achieve this requirement, using Copy data or TabularTranslator is complicated. This can be achieved using dataflows in the following way.
First create a source dataset using the following configurations. This allows us to read entire row as a single column value (string):
Import the projection and name the column as data. The following is how the data preview looks like:
Now, first split these column values using split function in derived column transformations. I am replacing the same column using split(data,',').
Then, I have added a key column with a constant value 'x' so that I can group all rows and covert the grouped data into array of arrays.
The data would look like this after the above step:
Use aggregate transformation to group by the above created column and use collect aggregate function to create array of arrays (collect(data)).
Use select transformation to select only the above created column Data.
Finally, in the sink, select your destination and create a sink JSON dataset. Choose output to single file in settings and give a file name.
Create dataflow pipeline activity and run the above dataflow. The file will be created, and it looks like the following:
I am creating a pipeline using ADF to copy the data in a XML file to a SQL database. I want this pipeline to be triggered when the XML file is uploaded to Blob Storage. Therefore, here I will be using a parameter with the input Dataset.
Now, in the Copy Data activity that I am using, I want to be able to define the mappings. This is usually quite easy when the path to the file is given, however, in this situation, where a parameter is being used, how can I do this?
From what I have gathered, the mappings can be defined as a JSON schema and assigned to the activity, but is there perhaps an easier way to do this? Maybe by uploading a demo file from which the schema can be imported?
When you want to load a xml file into sql DB you are using a Hierarchical source to tabular sink method.
When copying data from hierarchical source to tabular sink, copy activity supports the following capabilities:
Extract data from objects and arrays.
Cross apply multiple objects with the same pattern from an array, in which case to convert one JSON object into multiple records in tabular result.
You can define such mapping on Data Factory authoring UI:
On copy activity -> mapping tab, click Import schemas button to import both source and sink schemas. As Data Factory samples the top few objects when importing schema, if any field doesn't show up, you can add it to the correct layer in the hierarchy - hover on an existing field name and choose to add a node, an object, or an array.
Select the array from which you want to iterate and extract data. It will be auto populated as Collection reference. Note only single array is supported for such operation.
Map the needed fields to sink. Data Factory automatically determines the corresponding JSON paths for the hierarchical side.
Note: For records where the array marked as collection reference is empty and the check box is selected, the entire record is skipped.
Here I am using a sample XML file at source
If you notice here I have used a dataset parameter to which I will be assigning the file name value as obtained from trigger. And now I have placed it in the file name field for file path property in dataset connection.
Next I have created a pipeline parameter to hold the input obtained from trigger before assigning it to the dataset parameter.
Create storage event trigger
Click continue and you fill find a preview of all the files that are applicable for trigger conditions
When you have moved to next slide, if you have created pipeline parameter, which we have, you will see them there
Fill in the value as per your need. See the available system variables here Storage event trigger scope
Now, lets move to copy data activity, here you will find the dataset parameter, assign the pipeline parameter value to it.
Now move to sink tab in copy activity, since you want the source schema to be followed into sink, best way is to select to Auto create a table.
For which you have to make appropriate changes in sink dataset. Now, to configure sink dataset, for table choose edit and manually enter a name for table which does not already exist in your server i.e a new table will be created in this name in the sql server mentioned in sink. Make sure you clear all schema as you will be getting source schema in copy activity.
Back to mapping tab in copy activity, click on import schema and select the fields you want to copy to table. Additionally you can specify the data types and Collection reference is necessary.
Refer: Parameterize mapping
You can also switch to Advanced editor, in which case you can directly see and edit the fields' JSON paths. If you choose to add new mapping in this view, specify the JSON path.
So when a file is created in the storage a blob created event is triggered and pipeline runs
You can see the new table "dbo.NewTable" created under ktestsql and it has the data from xml as row.
I have a requirement where in I have a source file containing the Table Name(s) in Mapping Data Flow. Based on the Table Name in the file - there needs to be a dynamic query where column metadata, along with some other properties is retrieved from the data dictionary tables and inserted into a different sink table. The table name from the file would be used as a where condition filter.
Since there can be multiple tables listed in the input file (lets assume its a csv with only one column containing the table names), if we decide to use a cache sink for the file :
Is it possible to use the results of that cached sink in the Source transformation query in the same mapping data flow - as a lookup (from where the column metadata is being retrieved) and if Yes, how
What would be the best way to restrict data from the metadata table query based on this table name
Though of alternatively achieving this with a pipeline using For Each passing the table name as parameter to data flow, but in this case if there are 100 tables in the file, there would be 100 iterations and 100 times the cluster would need to be spun up. Please advise if this is wronf or there are better ways to achieve this
You would need to use option 3. Loop through the table names and pass each in as a parameter to the data flow to set the table name in the dataset.
ADF handles the cluster creation and teardown. All you have to worry about is whether you want to execute each sequentially or in parallel and how many. There are concurrency limits in ADF, so you should consider a batch count of 20 if you run in parallel.
I am trying to read data from csv to azure sql db using copy activity. I have selected auto create option for destination table in sink dataset properties.
copy activity is working fine but all columns are getting created with nvarchar(max) or length -1. I dont want -1 length as default length for my auto created sink table columns.
Does any one know how to change column length or create fixed length columns while auto table creation in azure data factory?
As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definition(s) must match the data that has been initially copied via the first copy activity but this is how you would combine the auto create table option and resulting in clearly defined column definitions.
It would be useful if you created a Azure Data Factory UserVoice entry to request that the auto create table functionality include creation of column definitions as a means of saving the time needed to go back an manually alter the columns to meet specific requirements, etc.