ForEach activity to loop through an SQL parameters table? - azure

I'm working on an ETL pipeline in Azure Synapse.
In the previous version I used an array set as a parameter of the pipeline, and it contained JSON objects. For example:
[{"source":{"table":"Address"},"destination":{"filename":"Address.parquet"},"source_system":"SQL","loadtype":"full"}]
This array was later used as the item() in a ForEach, with Switch, If and nested pipelines to process all the tables. I just passed the item properties down to the sub-pipelines and it worked fine.
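For context, in that old setup the ForEach Items setting simply pointed at the array parameter, and the nested objects were read with item(). A rough sketch, assuming the parameter was called TableList (the name is made up):

@pipeline().parameters.TableList
@item().source.table
@item().destination.filename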
My task is now to create a dedicated SQL pool and a table which stores parameters as columns. The columns are: source_table, destination_file, source_system and loadtype.
Example:
source_table | destination_file    | source_system | loadtype
"Address"    | "Address.parquet"   | "SQL"         | "full"
I don't know how to use this table in the ForEach activity and how to process the tables this way since this is not an Array.
What I've done so far:
I created the dedicated SQL pool and the following stored procedures:
create_parameters_table
insert_parameters
get_parameters
The get_parameters procedure is a SQL SELECT statement, but I don't know how to convert its result into something that can be used in the ForEach activity.
CREATE PROCEDURE get_parameters
AS
BEGIN
    SELECT source_table, destination_filename, source_system, load_type
    FROM parameters;
END
All these procedures are called in the pipeline as SQL pool Stored procedure activities. I don't know how to loop through the tables this way; I need to have every row as one item or object, like in the array.

Take a Lookup activity in your Azure Data Factory / Synapse pipeline and, in the source dataset of the Lookup activity, point to the table that has the required parameter details.
Make sure to uncheck the First row only checkbox.
Then take a ForEach activity and connect it to the Lookup activity.
In the settings of the ForEach activity, click Add dynamic content on Items and enter
@activity('Lookup1').output.value
Then you can add other activities like Switch/If inside the ForEach activity.
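For reference, with First row only unchecked the Lookup returns its rows as a JSON array under output.value, so each row plays the same role as one object in the old array parameter. A minimal sketch of the output shape, assuming the table and columns from the question:

{
  "count": 1,
  "value": [
    {
      "source_table": "Address",
      "destination_file": "Address.parquet",
      "source_system": "SQL",
      "loadtype": "full"
    }
  ]
}

Inside the ForEach you can then reference the columns with expressions such as @item().source_table and @item().loadtype and pass them down to the sub-pipelines as before.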

Related

How can Azure Data Factory Data Flow create CRM related entity records in a single transaction?

I am trying to implement an Azure Data Factory Data Flow to create CRM entity records across multiple entities in a single transaction. If any error occurs in the second entity, then the first entity's record should be rolled back. Please share your ideas.
I tried a JSON file with multiple hierarchies as input, representing multiple CRM entities. I used a Data Flow source JSON dataset and 3 CRM sinks, but I am unable to achieve a single transaction when an error occurs.
ADF does not support a rollback option. You can have a watermark column or a flag in the target table which indicates the records that were inserted during the current pipeline run, and delete only those records if any error occurs.
The watermark column is a column which holds the timestamp at which the row was inserted, or it can be an incrementing key. Before running the pipeline, the maximum value of the watermark column is noted. Whenever the pipeline fails, rows inserted after that maximum watermark value can be deleted.
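As a rough illustration of the idea, assuming (hypothetically) a SQL-accessible target table named entity1 with a LoadTimestamp watermark column; for CRM entities the equivalent delete would have to go through the CRM API instead:

-- Hypothetical names: entity1, LoadTimestamp
DECLARE @max_watermark DATETIME2;

-- Captured and stored before the pipeline run
SELECT @max_watermark = MAX(LoadTimestamp) FROM entity1;

-- Run after a failed pipeline: remove rows inserted during that run
DELETE FROM entity1 WHERE LoadTimestamp > @max_watermark;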
Instead of deleting all records from the current pipeline run, if only the records that were not copied to some entities need to be deleted, the rows can be deleted based on the key field. Below is the approach.
Source1 and source2 are added with entity1 and entity2 data respectively.
img1: entity1 data
img2: entity2 data
Id=6 is not copied to entity2. So, this should be deleted from entity1.
An Exists transformation is added, with the left and right streams set to source1 and source2 respectively. The Exist type is "Doesn't exist". Exists condition: source1@id == source2@id.
img3: exists transformation settings
An Alter Row transformation is added, with the condition set to Delete if true().
img4: Alter Row transformation settings
In the sink settings, Allow delete is selected and the key column is set to id.
Img5: Sink settings
img6: Sink data preview.
When the pipeline with this data flow is run, all rows that are in entity1 but not in entity2 are deleted.

How to terminate pipelines in Azure Synapse when a query returns no rows

I have a pipeline A that is invoked by a main pipeline D. It invokes 2 other pipelines, B and C. When pipeline A is invoked, an extraction query is executed that can return rows or nothing.
In case it returns no rows I would like it to terminate without sending an error message.
It should also terminate the main pipeline D. In other words pipelines B and C shouldn’t be invoked.
How can I invoke such a terminal activity in Azure Synapse? I would like to avoid using a Fail activity as it would be a false negative.
Since your child pipeline has the lookup output count, and there is no direct way to pass that count to the master pipeline, you can consider changing the pipeline configuration.
Instead of using a lookup to get the count, you can directly use a Copy Data activity and write the count of records to a new table.
You can read this data (the new table) using a Lookup in the master pipeline and perform the check (whether the count is 0 or not).
Look at the following demonstration. I have a table with no records in my Azure SQL database. In pipeline A, I used the following query as the source of the Copy Data activity and auto-created a table in the sink.
-- in source. Querying the required table for count
select count(*) as count from demo
Now, in the master pipeline, use an additional Lookup to read the records from the count_val table created above. The output will be as follows:
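For reference, the Lookup over the count_val table returns something roughly like this (values illustrative):

{
  "count": 1,
  "value": [
    { "count": 0 }
  ]
}

which is why the expression below reads output.value[0].count.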
Now you can use this count in if condition using the following dynamic content:
@equals(activity('Lookup1').output.value[0].count, 0)
The condition will be true when the count is 0 (as shown below), and hence the flow stops (there are no activities inside the true case). If there are records, then the next pipelines will be executed (false case).

Substring of a global parameter ADF dynamic content

"#{concat(split(string(pipeline().globalParameters.DATABASE), 'JERICHO_'), ' Data Warehouse Load', ' ',substring(utcNow(),0 ,10 ))}"
"#{concat(substring(string(pipeline().globalParameters.DATABASE), 8), ' Data Warehouse Load', ' ',substring(utcNow(),0 ,10 ))}"
The full global parameter is JERICHO_DEV. However I will be publishing this to different environments with different database names (although JERICHO_ will be common in all). Is there anyway to standardise the database name above so that it takes the part after the _ regardless of how many characters it is?
If you want to concatenate the global parameter substring with custom names like that, you can use an array variable for the custom names and generate the different database names by using a ForEach activity.
Please follow the steps below after creating the global parameter:
First, create an array variable with a Set Variable activity and give the list of all the custom names in that array, for example:
["Data Warehouse Load","AZURE SQL DB","SERVERLESS SQL"]
Set variable activity:
Then, connect this to a ForEach activity and give the Items value as
@variables('dbnames'), and check the Sequential checkbox.
ForEach activity:
Now, go to the activities inside the ForEach and drag in an Append Variable activity. Click on it, create a new array variable in the Variables section, and give it your dynamic content.
@concat(substring(string(pipeline().globalParameters.DATABASE), 0, 8), item(), ' ', substring(utcNow(), 0, 10))
Append variable activity dynamic content:
Now create another Set Variable activity for the result array output and connect it to the ForEach, creating a new array variable with the value below. This is optional, as I am creating this array only to show the output; you can use the array variable created in the Append Variable activity as the result.
@variables('res_variable')
Set variable activity for output:
Execute the pipeline and you can see the common part of the global parameter DATABASE (JERICHO_) in all the database names in the output.
Output:
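With the three custom names above and JERICHO_DEV as the global parameter, the resulting array would look roughly like this (the date part comes from utcNow() and is only illustrative):

[
  "JERICHO_Data Warehouse Load 2022-10-10",
  "JERICHO_AZURE SQL DB 2022-10-10",
  "JERICHO_SERVERLESS SQL 2022-10-10"
]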

Azure Data Factory passing a parameter into a function (string replace)

I'm trying to use ADF to create azure table storage tables from one source SQL table.
Within my pipeline:
I can query a distinct list of customers and pass this into a ForEach task.
Inside the ForEach, I select the data for each customer.
But when I try to create an Azure table for each customer's data, with the table name based on the customer ID, I hit errors.
The customer ID is a GUID so I'm trying to format this to remove the dashes which are invalid in a table name...
Something along the lines of
@replace('@{item().orgid}','-','')
So 7fb6d90f-2cc0-40f5-b4d0-00c82b9935c4 becomes 7fb6d90f2cc040f5b4d000c82b9935c4
I can't seem to get the syntax right
Any ideas?
Try this: @replace(item().orgid,'-','').
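Note that @replace(item().orgid,'-','') is the form to use when the expression is the entire value of a field; if the table name needs a fixed prefix around it, string interpolation is used instead. A small sketch, with the prefix cust being a made-up example:

@replace(item().orgid,'-','')
cust@{replace(item().orgid,'-','')}

Either way, 7fb6d90f-2cc0-40f5-b4d0-00c82b9935c4 becomes 7fb6d90f2cc040f5b4d000c82b9935c4 (with the prefix prepended in the second form).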

How to achieve dynamic column mapping in Azure Data Factory when Dynamics CRM is used as the sink

I have a requirement where I need to pass the column mapping dynamically from a stored procedure to the copy activity. This copy activity will perform an update operation in Dynamics CRM. The source is SQL Server (2014) and the sink is Dynamics CRM.
I am fetching the column mapping from a stored procedure using a Lookup activity and passing this parameter to the copy activity.
When I directly provide the JSON value below as the default value of the parameter, the copy activity updates the mapped fields correctly.
{"type":"TabularTranslator","columnMappings":{"leadid":"leadid","StateCode":"statecode"}}
But when the JSON value is fetched from the SP, it does not work; I get the error "ColumnName is read only".
Please suggest if any conversion is required on the output of the lookup activity before passing the parameter to the copy activity. Below is the output of the lookup activity.
{\"type\":\"TabularTranslator\",\"columnMappings\":{\"leadid\":\"leadid\",\"StateCode\":\"statecode\"}}
Appreciate a quick turnaround.
Using the parameter directly and using the lookup output are different. Can you share how you wrote the parameter from the output of the lookup activity?
You can refer to this doc: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
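One thing worth checking (not confirmed in this thread): the Lookup returns the mapping as an escaped JSON string, while the copy activity's translator property expects an object, so the string usually needs to be converted with json() when it is assigned to the parameter. A minimal sketch, assuming First row only is checked and the stored procedure returns the mapping in a column named columnmapping (both assumptions):

@json(activity('Lookup1').output.firstRow.columnmapping)

Passing the result of this expression to the copy activity parameter gives it an actual object rather than a quoted string.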
