How to terminate pipelines in Azure Synapse when a query returns no rows - azure

I have a pipeline A that is invoked by a main pipeline D, which also invokes two other pipelines, B and C. When pipeline A is invoked, an extraction query is executed that can return rows or nothing.
In case it returns no rows, I would like pipeline A to terminate without raising an error.
It should also terminate the main pipeline D; in other words, pipelines B and C shouldn't be invoked.
How can I achieve such a terminating behaviour in Azure Synapse? I would like to avoid using a Fail activity, as it would report a false failure.

Since your child pipeline has the lookup output count, and there is no direct way to pass that count back to the master pipeline, you can consider changing the pipeline configuration.
Instead of using a Lookup to get the count, you can directly use a Copy data activity and write the count of records to a new table.
You can then read this new table with a Lookup in the master pipeline and perform the check (whether the count is 0 or not).
Look at the following demonstration. I have a table with no records in my Azure SQL database. In pipeline A, I used the following query as the source of the Copy data activity and auto-created a table in the sink.
-- in source. Querying the required table for count
select count(*) as count from demo
Now, in the master pipeline, use an additional Lookup activity to read the record from the count_val table created above.
You can then use this count in an If Condition activity with the following dynamic content:
@equals(activity('Lookup1').output.value[0].count,0)
The condition will be true when the count is 0, and the flow stops because there are no activities inside the True case. If there are records, the next pipelines are executed from the False case.
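As a rough sketch, the extra Lookup in the master pipeline only needs to read that single row back (assuming the auto-created sink table is named count_val, as above, and First row only is unchecked so the output exposes a value array):
-- query for the master pipeline's Lookup activity
select * from count_val
The If Condition expression above then decides whether the Execute Pipeline activities for B and C (placed in the False case) run at all.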

Related

ForEach activity to loop through an SQL parameters table?

I'm working on an ETL pipeline in Azure Synapse.
In the previous version I used an Array set as a parameter of the pipeline and it contained JSON objects. For example:
[{"source":{"table":"Address"},"destination {"filename:"Address.parquet"},"source_system":"SQL","loadtype":"full"}}]
This was later used as item(), and I used a ForEach, Switch, If activities and nested pipelines to process all the tables. I just passed the item parameters down to the sub-pipelines and it worked fine.
My task is now to create a dedicated SQL pool and a table which stores parameters as columns. The columns are: source_table, destination_file, source_system and loadtype.
Example:
source_table | destination_file  | source_system | loadtype
"Address"    | "Address.parquet" | "SQL"         | "full"
I don't know how to use this table in the ForEach activity and how to process the tables this way since this is not an Array.
What I've done so far:
I created the dedicated SQL pool and the following stored procedures:
create_parameters_table
insert_parameters
get_parameters
get_parameters is a SQL SELECT statement, but I don't know how to convert its output into a form that can be used in the ForEach activity.
CREATE PROCEDURE get_parameters AS
BEGIN
    SELECT source_table, destination_filename, source_system, load_type FROM parameters
END
All these procedures are called in the pipeline as SQL pool stored procedure activities. I don't know how to loop through the rows. I need every row as one object, like in the Array.
Take a Lookup activity in the Azure Data Factory / Synapse pipeline, and in the source dataset of the Lookup activity, point to the table that has the required parameter details.
Make sure to uncheck the First row only checkbox.
Then take the ForEach activity and connect it to the Lookup activity.
In the settings of the ForEach activity, click Add dynamic content in Items and type:
@activity('Lookup1').output.value
Then you can add other activities like Switch/If inside the ForEach activity.
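Inside the ForEach, each row from the Lookup is then available as item(), so the column values can be referenced with dynamic content such as the following (assuming the column names from the question):
@item().source_table
@item().destination_file
@item().source_system
@item().loadtype
These can be passed down to the nested pipelines as parameters, just as the elements of the JSON array were before.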

How can an Azure Data Factory data flow create CRM-related entity records in a single transaction?

I am trying to implement an Azure Data Factory Data Flow to create CRM entity records in multiple entities in a single transaction. If any error occurs in the second entity, then the first entity's record should be rolled back. Please share your ideas.
I tried a JSON file with multiple hierarchies as input, representing multiple CRM entities. I used a Data Flow with a JSON source dataset and 3 CRM sinks, but I am unable to achieve a single transaction when an error occurs.
ADF does not support a rollback option. You can have a watermark column or flag in the target table which indicates the records inserted during the current pipeline run, and delete only those records if any error occurs.
A watermark column is a column which holds the timestamp at which the row got inserted, or it can be an incrementing key. Before running the pipeline, the maximum value of the watermark column is noted. Whenever the pipeline fails, rows inserted after that maximum watermark value can be deleted, as in the sketch below.
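A minimal sketch of that cleanup in SQL, assuming a hypothetical target table entity1 with a watermark column named inserted_at:
-- value of the watermark captured before the pipeline run (illustrative)
DECLARE @max_watermark datetime2 = '2023-01-01T00:00:00';
-- on failure, remove only the rows inserted by the current run
DELETE FROM entity1 WHERE inserted_at > @max_watermark;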
Instead of deleting all records from the current pipeline run, if only the records that were not copied to some entities need to be deleted, we can delete those rows based on the key field. Below is the approach.
Source1 and source2 are taken with entity1 and entity2 data respectively.
img1: entity1 data
img2: entity2 data
Id=6 is not copied to entity2, so it should be deleted from entity1.
An Exists transformation is added, with the left and right streams set to source1 and source2 respectively. The Exists type is Doesn't exist, and the Exists condition is source1@id == source2@id.
img3: exists transformation settings
An Alter row transformation is added, and the condition is given as Delete if true().
img4: Alter Row transformation settings
In the sink settings, Allow delete is selected and the key column is set to id.
Img5: Sink settings
img6: Sink data preview.
When the pipeline containing this data flow is run, all rows which are in entity1 but not in entity2 are deleted from entity1.
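For reference, the same cleanup expressed directly in SQL (using the illustrative table names entity1 and entity2 and the key column id from above) would be:
DELETE FROM entity1 WHERE id NOT IN (SELECT id FROM entity2);
The data flow simply performs this with the Exists and Alter row transformations and a delete-enabled sink instead of a query.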

Azure Databricks notebook call from Azure Data Factory based on an if/else flag

I am executing an ADB (Azure Databricks) notebook in an If/Else condition from ADF.
I have a Lookup which checks a flag condition in a Delta Lake table:
SELECT COUNT(*) AS cnt FROM db.check WHERE job_status = 2 AND site = 'xxx-xxx-xxx'
This gives me a count of 2, and I used it in the If condition as @equals(activity('select job status').output.value[0],2); if true it should call the ADB notebook, else a Logic App.
Issue: after the Lookup, the pipeline is not going into the If (true) branch.
Thank you @anuj for your solution. Posting it as an answer to help other community members.
To refer to a column value from the Lookup activity output in later activities, include the column name in the expression, as below.
@equals(activity('select job status').output.value[0].cnt,2)
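For context, the Lookup output for that query has roughly this shape (values illustrative):
{ "count": 1, "value": [ { "cnt": 2 } ] }
So activity('select job status').output.value[0] is a row object, not the number itself, and the cnt property has to be referenced explicitly.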

How to create an audit table in Azure Data Factory which will hold the status of the pipeline run

I have a requirement where an Azure Data Factory pipeline is running, and inside it we have a data flow where different tables are loaded from ADLS to an Azure SQL Database. The issue is that I want to store the status of the pipeline (success or failure) in an audit table, along with the primary key column ID that is present in the Azure SQL Database table, so that I can filter jobs on the primary key, e.g. find from the audit table which ID's job succeeded. I managed to do something with a stored procedure and store the status in a table, but I am unable to add a column like ID. Below is a screenshot of the pipeline.
The Report_id column is from the table which is loaded by the Dataload pipeline. How do I add that to the audit table so that every time the pipeline runs, Report_id is captured and stored in the audit table?
Audit table where I want to add Report_id
Any help will be appreciated. Thanks.
The Data Flow must have a sink. So, after the Data Flow completes, you need to use a Lookup activity to get the value of that Report_Id from the sink. Then, you can set that to a variable and pass that into your Stored Procedure. (You could also just pass it directly to the Stored Procedure from the Lookup using the same expression you would use to set the variable.)
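A minimal sketch of what the audit side could look like, assuming hypothetical object names (audit_table, usp_insert_audit) and that Report_Id is read back from the sink with a Lookup:
-- audit table holding the run status and the Report_Id from the loaded table
CREATE TABLE audit_table (
    Report_Id INT,
    Pipeline_Name VARCHAR(200),
    Run_Status VARCHAR(20),
    Run_Time DATETIME2 DEFAULT SYSUTCDATETIME()
);
-- stored procedure called from the pipeline after the data flow
CREATE PROCEDURE usp_insert_audit
    @Report_Id INT,
    @Pipeline_Name VARCHAR(200),
    @Run_Status VARCHAR(20)
AS
BEGIN
    INSERT INTO audit_table (Report_Id, Pipeline_Name, Run_Status)
    VALUES (@Report_Id, @Pipeline_Name, @Run_Status);
END
In the Stored procedure activity, the Report_Id parameter can then be set with dynamic content such as @activity('Lookup1').output.firstRow.Report_Id (assuming the Lookup is named Lookup1 and has First row only checked), and Pipeline_Name with @pipeline().Pipeline.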

ForEach with internal activities in Azure Data Factory

Is there a way to send the number of times the loop was executed?
That is, I have a ForEach that executes an Execute Pipeline activity; that pipeline has 6 activities, and only the last activity needs to receive the number of times the ForEach was executed.
At the end of the ForEach run it shows how many items went in ("ItemsCount"), but I couldn't reference that value in the last activity of the pipeline.
Could someone help me? Thanks.
It may depend on what you're using for your items in the foreach activity.
But, for example, if your ForEach activity is looping over the contents of a file that you retrieved with a previous Lookup activity, you can get the count of those items with:
@activity('Lookup activity name').output.count
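If the last activity sits inside the invoked pipeline, that count can be handed over as a pipeline parameter (hypothetical parameter name items_count): in the Execute Pipeline activity's parameters, set items_count to @activity('Lookup activity name').output.count, and read it inside the child pipeline with @pipeline().parameters.items_count.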
