I'm trying to create a simple copy activity that copies data from an Azure Table source to a Cosmos DB (MongoDB API) sink. I also want to output an extra column to the sink data whose content is the run ID (or something else that is set dynamically per run).
I can add the extra column easily by defining an additional column in the source schema, but I can't work out how to set its content (presumably it should be set in the activity), so the value of the added field is always NULL in the output DB.
Thanks
You can do the same, or something similar, and create a dynamic SELECT statement in your copy activity. Something like: SELECT @{item().sourceTableCustomColumnList}, '@{pipeline().RunId}' AS RunId FROM @{item().sourceTableName}
You may refer to the MSDN thread that addresses a similar issue.
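To make the string interpolation concrete, here is a minimal Python sketch of what the dynamic SELECT above resolves to at run time, assuming a source that accepts a SQL query. The ForEach item fields come from the answer; the sample values and the run ID are made up for illustration, and the alias RunId is an arbitrary choice. The run ID is a GUID, so it is wrapped in quotes to reach the source as a string literal.

```python
def resolve_query(item: dict, run_id: str) -> str:
    """Mimic how ADF expands the @{...} placeholders in the dynamic SELECT.

    `item` stands in for the ForEach item(); `run_id` for pipeline().RunId.
    """
    # Quote the run id so the SQL engine sees a string literal, not an expression.
    return (
        f"SELECT {item['sourceTableCustomColumnList']}, "
        f"'{run_id}' AS RunId "
        f"FROM {item['sourceTableName']}"
    )


if __name__ == "__main__":
    # Hypothetical item and run id, for illustration only.
    item = {"sourceTableCustomColumnList": "Col1, Col2", "sourceTableName": "dbo.Source"}
    print(resolve_query(item, "00000000-0000-0000-0000-000000000000"))
```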
Hope this helps.
Related
I am setting up a pipeline in Data Factory where the first part of the pipeline needs some pre-processing (cleaning). I currently have a script set up to query the rows that need to be deleted and export the results to a CSV file.
What I am looking for is essentially the opposite of an upsert copy activity. I would like the procedure to delete the rows in my table that match a given row.
Apologies in advance if this is an easy solution; I am fairly new to Data Factory and just need help looking in the right direction.
Assuming the source from which you initially get the rows is different from the sink, there are multiple ways to achieve this:
If the number of rows is small, you can use a Script activity or a Lookup activity to delete the records from the destination table.
For a larger dataset, where the Lookup activity's limits get in the way (it returns at most 5,000 rows), copy the data into a staging table within the destination and use a Script activity to delete the matching rows.
If your organization allows Data Flows, you can use one to achieve the same result (for example, with an Alter Row transformation that marks matching rows for deletion).
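The staging-table option boils down to a single set-based DELETE. Here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the destination database. The table names target and staging and the id key are hypothetical. The DELETE ... WHERE EXISTS form is standard SQL that Azure SQL also accepts, so the same statement could go into a Script activity.

```python
import sqlite3


def delete_matching_rows(conn: sqlite3.Connection) -> int:
    """Delete every row in `target` whose id also appears in `staging`.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM target
        WHERE EXISTS (
            SELECT 1 FROM staging s WHERE s.id = target.id
        )
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE staging (id INTEGER)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    # The staging table holds the keys that the copy activity would land there.
    conn.executemany("INSERT INTO staging VALUES (?)", [(2,), (3,)])
    print(delete_matching_rows(conn))  # prints 2
    print(conn.execute("SELECT id FROM target").fetchall())  # prints [(1,)]
```

A set-based delete against a staging table avoids looping row by row in the pipeline, which is why it scales past the Lookup limits.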
I'm querying an API using Azure Data Factory and the data I receive from the API looks like this.
{
"96":"29/09/2022",
"95":"31/08/2022",
"93":"31/07/2022"
}
When I write this data to a table, ADF treats the numbers as column names and stores the dates as a single row, like this:
96         | 95         | 93
29/09/2022 | 31/08/2022 | 31/07/2022
when I would like it to look like this:
Date       | ID
29/09/2022 | 96
31/08/2022 | 95
31/07/2022 | 93
Does anyone have any suggestions on how to handle this? Ideally I want to avoid using stored procedures (USPs) and dynamic SQL. I really only need the ID for the month before the current one.
PS: the API doesn't support any filtering on this object.
Updates
I'm querying the API using a Web activity, and if I try to store the data in an Array variable, the activity fails because the output is an object.
When I use a Copy data activity, I've set the sink to automatically create the table, and the mapping looks like this:
[Screenshot: copy activity mapping]
Thanks
Instead of trying to copy the JSON response directly to a SQL table, convert the response to a string, extract the required values, and insert them into the SQL table.
Look at the following demonstration. I passed the sample response in as a pipeline parameter (of type object) and used a Set variable activity to extract the values.
My parameter:
{"93":"31/07/2022","95":"31/08/2022","96":"29/09/2022"}
Dynamic content used in set variable activity:
@split(replace(replace(replace(replace(string(pipeline().parameters.api_op),' ',''),'"',''),'{',''),'}',''),',')
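The output of the Set variable activity is an array of "id:date" strings. For the parameter above it is: ["93:31/07/2022","95:31/08/2022","96:29/09/2022"]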
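If you want to check the Set variable expression outside ADF, this Python function mirrors it step by step: serialize the object, strip spaces, quotes and braces, then split on commas. It is only a local analogy and not ADF itself. The trick relies on the keys and values containing none of the stripped characters, and no commas. A value with a space or a comma (a full timestamp, for instance) would be mangled.

```python
import json


def flatten_response(api_op: dict) -> list[str]:
    """Python mirror of
    split(replace(replace(replace(replace(string(x),' ',''),'"',''),'{',''),'}',''),',')
    """
    text = json.dumps(api_op)  # like string(): serialize the object to JSON text
    for ch in (" ", '"', "{", "}"):
        text = text.replace(ch, "")  # same order as the nested replace() calls
    return text.split(",")  # one "id:date" string per key


if __name__ == "__main__":
    sample = {"93": "31/07/2022", "95": "31/08/2022", "96": "29/09/2022"}
    print(flatten_response(sample))
    # prints ['93:31/07/2022', '95:31/08/2022', '96:29/09/2022']
```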
Now, inside a ForEach activity (pass the previous variable as the ForEach items value), I used a Copy data activity to copy each row separately to my sink table (with the auto create option enabled). I used a sample JSON file as my source; all of its columns are ignored anyway.
Create the two required additional columns, id and date, with the following dynamic content:
id: @split(item(),':')[0]
date: @split(item(),':')[1]
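The two additional-column expressions split each "id:date" string on the colon and take the first and second parts. Here is a small Python equivalent of what each ForEach iteration produces. It is a local illustration only. The dates contain no colon, so index 1 is the whole date. A value with a colon (a time of day, for example) would need a different separator.

```python
def to_row(item: str) -> dict:
    """Mirror of the additional-column expressions:
    id   = split(item(),':')[0]
    date = split(item(),':')[1]
    """
    parts = item.split(":")
    return {"id": parts[0], "date": parts[1]}


if __name__ == "__main__":
    for item in ["93:31/07/2022", "95:31/08/2022", "96:29/09/2022"]:
        print(to_row(item))  # e.g. {'id': '93', 'date': '31/07/2022'}
```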
Configure the sink: select the database, create the dataset, give the table a name (I used dbo.opTable), and select Auto create table under the sink settings.
In the mapping, delete the column mappings that are not required and keep only the additional columns created above.
When I debug the pipeline, it runs successfully and the required values are inserted into the sink table.
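The question also says only the ID for the previous month is needed, which the steps above do not filter for. As a hedged Python sketch of that selection, assuming the dates are always dd/mm/yyyy and at most one entry falls in any month, the logic could look like this (the function name is hypothetical):

```python
from datetime import date, datetime, timedelta
from typing import Optional


def previous_month_id(api_op: dict, today: date) -> Optional[str]:
    """Return the key whose dd/mm/yyyy value falls in the month before `today`.

    Returns None when no entry matches.
    """
    # The day before the 1st of this month is always in the previous month.
    prev = today.replace(day=1) - timedelta(days=1)
    for key, value in api_op.items():
        d = datetime.strptime(value, "%d/%m/%Y").date()
        if (d.year, d.month) == (prev.year, prev.month):
            return key
    return None


if __name__ == "__main__":
    sample = {"96": "29/09/2022", "95": "31/08/2022", "93": "31/07/2022"}
    print(previous_month_id(sample, date(2022, 10, 15)))  # prints 96
```

Taking the day before the 1st of the month avoids special-casing January, where the previous month is in the prior year.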
I want to copy data from a CSV file (source) on Blob storage to an Azure SQL Database table (sink) via a regular Copy activity, but I also want to copy the file name alongside every entry in the table. I am new to ADF, so the solution is probably easy, but so far I have not been able to find the answer in the documentation or on the internet.
In my current mapping, I have created an output table with a file name column, but the file name is not a column in the CSV file itself, so I need to extract it from the metadata and map it to that column.
At first, I thought I could put dynamic content in the mapping and solve the problem that way, but there is no option to use dynamic content in the individual mapping boxes, so I do not know how to implement that. My next thought was to use a Pre-copy script, but I don't see how it could be used for this purpose. What is the best way to solve this?
You cannot add dynamic content, such as Get Metadata output, in the column mappings of a copy activity.
First, add a Get Metadata activity that uses the same source CSV dataset (with Item name selected in its field list), then connect it to the copy activity.
Then, in the copy activity source, add the file name column under Additional columns, using the Get Metadata activity's output as dynamic content:
@activity('Get Metadata1').output.itemName
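Conceptually, the Additional columns setting stamps the same value onto every row read from the file. The Python sketch below illustrates that effect with the csv module. It is an analogy, not how ADF implements it, and the column name FileName is hypothetical (it must match whatever column you created in the sink table).

```python
import csv
import io


def add_file_name_column(csv_text: str, file_name: str, column: str = "FileName") -> list[dict]:
    """Return the CSV rows with one extra column holding the source file name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[column] = file_name  # the same value goes on every row of the file
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = "id,name\n1,Alice\n2,Bob\n"
    for r in add_file_name_column(data, "samplecsv.csv"):
        print(r)  # e.g. {'id': '1', 'name': 'Alice', 'FileName': 'samplecsv.csv'}
```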
If you are sure about the data types of your data, there is no need to edit the mapping; you can run the pipeline as is.
Here I am copying the contents of samplecsv.csv file to SQL table named output.
My output for your reference:
I am trying to load data from a CSV file into Azure SQL Database using a copy activity. I have selected the auto create option for the destination table in the sink dataset properties.
The copy activity works fine, but all columns are created as nvarchar(max) (length -1). I don't want -1 as the default length for my auto-created sink table columns.
Does anyone know how to change the column length, or create fixed-length columns, when a table is auto-created in Azure Data Factory?
As you have clearly detailed, the auto create table option in the copy data activity is going to create a table with generic column definitions. You can run the copy activity initially in this way and then return to the target table and run T-SQL statements to further define the desired column definitions. There is no option to define table schema with the auto create table option.
ALTER TABLE table_name ALTER COLUMN column_name new_data_type(size);
Clearly, the new column definitions must fit the data that was copied by the first copy activity, but this is how you can use the auto create table option and still end up with clearly defined column definitions.
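One way to choose the new sizes is to measure the longest value per column in the source file. The Python sketch below does that with the csv module and emits one ALTER COLUMN statement per column. It rests on a few assumptions: every column stays NVARCHAR, the sampled file is representative (later, longer values would fail to load), and the table name dbo.output is a placeholder. Python's len() counts code points, while NVARCHAR(n) counts UTF-16 code units, so characters outside the Basic Multilingual Plane need extra room.

```python
import csv
import io


def alter_statements(table: str, csv_text: str) -> list[str]:
    """Build one ALTER COLUMN statement per CSV column, sized to its longest value."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    max_len = [1] * len(header)  # NVARCHAR(0) is not valid, so the minimum is 1
    for row in reader:
        for i, value in enumerate(row[: len(header)]):
            max_len[i] = max(max_len[i], len(value))
    statements = []
    for name, n in zip(header, max_len):
        # NVARCHAR(n) only allows n up to 4000; anything longer has to stay MAX.
        size = str(n) if n <= 4000 else "MAX"
        statements.append(f"ALTER TABLE {table} ALTER COLUMN [{name}] NVARCHAR({size});")
    return statements


if __name__ == "__main__":
    for stmt in alter_statements("dbo.output", "id,name\n1,Alice\n22,Bob\n"):
        print(stmt)
```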
It would be useful to create an Azure Data Factory UserVoice entry requesting that the auto create table functionality support explicit column definitions. That would save the time needed to go back and manually alter the columns to meet specific requirements.
I am trying to copy data from SAP Hana to Azure Data Lake Store (DLS) using a Copy Activity in a Data Pipeline via Azure Data Factory.
Our copy activity runs fine and we can see that rows made it from Hana to the DLS, but they don't appear to have column names (instead they are just given 0-indexed numbers).
This link says “For structured data sources, specify the structure section only if you want map source columns to sink columns, and their names are not the same.”
We are fine using the original column names from the SAP Hana table, so it seems like we shouldn't need to specify the structure section in our dataset. However, even when we do, we still just see numbers for column names.
We have also seen the translator property at this link, but are not sure if that is the route we need to go.
Can anyone tell me why we aren't seeing the original column names copied into DLS and how we can change that? Thank you!
UPDATE
Setting the firstRowAsHeader property of the format section in our dataset to true basically solved the problem. The console still shows the numerical indices, but it now includes the headers we were after as the first row. When we download and open the file, the numbers are not there (the console just shows them for whatever reason); it is a standard comma-delimited file with a header row and one entry per line.
Example:
COLUMNA,COLUMNB
aVal1,bVal1
aVal2,bVal2
We can now tell our sources and sinks to write and expect this format when reading.
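The difference firstRowAsHeader makes is similar to reading the file with and without a header in Python's csv module. This is only an analogy for illustration, not ADF's implementation. Without a header, columns can only be addressed by position, which matches the 0-indexed names seen in the console. With a header, the first row supplies the names.

```python
import csv
import io


def read_without_header(csv_text: str) -> list[dict]:
    # No header flag: columns can only be addressed by position (0, 1, ...).
    return [dict(enumerate(row)) for row in csv.reader(io.StringIO(csv_text))]


def read_with_header(csv_text: str) -> list[dict]:
    # Header flag on: the first row supplies the column names.
    return list(csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    text = "COLUMNA,COLUMNB\naVal1,bVal1\naVal2,bVal2\n"
    print(read_without_header(text)[0])  # {0: 'COLUMNA', 1: 'COLUMNB'}
    print(read_with_header(text)[0])     # {'COLUMNA': 'aVal1', 'COLUMNB': 'bVal1'}
```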
BONUS UPDATE:
To get rid of the numerical indices and see the proper column headers in the console, click Format in the top-left corner, then check the "First row is a header" box toward the bottom of the resulting blade.
See the update above: the format.firstRowAsHeader property needed to be set to true.