Azure Datafactory: How to implement nested sql query in transformation data flow - azure

I have two streams, customer and customercontact. I am new to Azure Data Factory. I just want to know which data flow transformation will achieve the result of the SQL query below.
(SELECT *
FROM customercontact
WHERE customerid IN
(SELECT customerid
FROM customer)
ORDER BY timestamp DESC
LIMIT 1)
I can use the Exists transformation for the inner query, but I need some help on how to fetch the first row after sorting the customercontact data. So, basically, I am looking for a way to add a LIMIT/TOP/OFFSET clause in a data flow.

You can achieve the result of the given query in a data flow by combining a few transformations.
For sorting you can use the Sort transformation, where you can order ascending or descending.
For the top few records you can use the Rank transformation.
For the “IN” clause you can use the Exists transformation.
Refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-rank
Here is my sample data in SQL as the source.
I have used the Rank transformation.
After the Rank transformation one more column, i.e. RankColumn, got added.
Now, to select only the top 1 record, I have used the Filter row modifier with the expression equals(RankColumn,1).
Finally, use a Sink and run the pipeline.
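For reference, the Data Flow Script behind this pattern would look roughly like the sketch below. This is only a sketch, not the exact script: it assumes the two source streams are named customercontact and customer, that the sort column is called timestamp, and the transformation names (Exists1, Rank1, Filter1) are made up.
customercontact, customer exists(customercontact@customerid == customer@customerid,
    negate: false,
    broadcast: 'auto') ~> Exists1
Exists1 rank(desc(timestamp, true),
    output(RankColumn as long),
    dense: false) ~> Rank1
Rank1 filter(equals(RankColumn, 1)) ~> Filter1
Here Exists1 plays the role of the IN subquery, Rank1 of ORDER BY timestamp DESC, and Filter1 of LIMIT 1.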

Related

Count of orders - ADF - aggregation

How to convert this CSV file input to the given output in ADLS using ADF?
Input data:
order_id,city,country
L10,Sydney,Australia
L11,Annecy,France
L12,Montceau,France
L13,Paris,France
L14,Montceau,Canada
L15,Ste-Hyacinthe,Canada
Output data:
COUNTRY,CITY,TOTAL_Order
Australia,Sydney,1
Australia,Total,1
Canada,Montceau,1
Canada,Ste-Hyacinthe,1
Canada,Total,2
France,Annecy,1
France,Montceau,1
France,Paris,1
France,Total,3
Total,Total,6
I want to find the count of order IDs city-wise and country-wise using a Data Flow. This is similar to a roll-up aggregation.
Use three Aggregate transformations in the dataflow to do this. The first calculates the count of order_id for every country and city combination, the second calculates the count of order_id for every country, and the third calculates the count of order_id for the full table. Below are the detailed steps.
Same input data is taken as source.
img:1 source data preview
Create two additional branches by clicking the + symbol next to the Source transformation and selecting New branch.
In each branch, add an Aggregate transformation.
Aggregate transformation1 settings:
group by : country, city
aggregates: total_order=count(order_id)
img:2 aggregate transform1 data preview
Aggregate transform2 settings:
group by: country
aggregates: total_order=count(order_id)
img:3 aggregate transform 2 data preview.
Aggregate transform3 settings: no column in group by.
group by:
aggregates: total_order=count(order_id)
img:4 aggregate transform3 data preview.
The next step is to union all these tables. Since they do not all have the same structure, add a Derived Column transformation after aggregate2 and aggregate3 to create the missing columns with empty strings.
Combine the aggregate1, derived1 and derived2 streams using a Union transformation, as sketched after the image references below.
img:5 Data preview after all transformations.
img: 6 Complete dataflow with all transformations.
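Putting the steps together, the Data Flow Script behind this flow looks roughly like the following sketch (source1 stands for the CSV source; all transformation names are illustrative, not the exact generated script):
source1 aggregate(groupBy(country, city),
    total_order = count(order_id)) ~> Aggregate1
source1 aggregate(groupBy(country),
    total_order = count(order_id)) ~> Aggregate2
source1 aggregate(total_order = count(order_id)) ~> Aggregate3
Aggregate2 derive(city = '') ~> DerivedColumn1
Aggregate3 derive(country = '', city = '') ~> DerivedColumn2
Aggregate1, DerivedColumn1, DerivedColumn2 union(byName: true) ~> Union1
To match the expected output in the question exactly, use the literal 'Total' instead of the empty string in the derived columns.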

Pivot all table in Azure Data Factory - Mapping Data Flow

I am new to Azure and am trying to see if the below request is achievable with Data Factory. I have my CSV file with this sample data:
I have this result
My expected data/result:
Which transformations would be helpful to achieve this?
Thanks.
Use the pivot transformation to create multiple columns from the unique row values of a single column. Pivot is an aggregation transformation where you select group by columns and generate pivot columns using aggregate functions.
The pivot transformation requires three different inputs: group by columns, the pivot key, and how to generate the pivoted columns.
Refer to the Microsoft official document: Pivot transformation in mapping data flow
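As a rough Data Flow Script sketch of that idea (the column names here are hypothetical placeholders, since the actual CSV columns are only visible in the screenshots): group by the row identifier, pivot on the column that holds the attribute names, and aggregate the value column.
source1 pivot(groupBy(id),
    pivotBy(attribute_name),
    {} = max(attribute_value)) ~> Pivot1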

Pivoting based on Row Number in Azure Data Factory - Mapping Data Flow

I am new to Azure and am trying to see if the below result is achievable with data factory / mapping data flow without Databricks.
I have my CSV file with this sample data:
I have the following data in my table:
My expected data/result:
Which transformations would be helpful to achieve this?
Thanks.
Now that you have the RowNumber column, you can use the Pivot transformation to do row-to-column pivoting.
I used your sample data to make a test as follows:
My Projection tab is like this:
My DataPreview is like this:
In the Pivot1 transformation, we select the Table_Name and Row_Number columns to group by. If you don't want the Table_Name column, you can delete it here.
In the Pivot key tab, we select the Col_Name column.
In Pivoted columns, we must select an aggregate function to aggregate the Value column; here I use max().
The result shows:
Please correct me if I have misunderstood you in the answer.
Update:
The data source is like this:
The result shows that, as you said, ADF sorts the columns alphabetically. There seems to be no way to customize the sorting:
But when we add the Sink, it will auto-map into your SQL result table.
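In Data Flow Script form, the Pivot1 transformation described above would look roughly like this sketch (SourceWithRowNumber is a made-up name for whichever stream carries the Row_Number column):
SourceWithRowNumber pivot(groupBy(Table_Name, Row_Number),
    pivotBy(Col_Name),
    {} = max(Value)) ~> Pivot1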

Return count in ADF Data Flow

I have an ADF Data Flow that outputs 2 sets of values (Name, Location) as shown below:
Is there a way to output the count of Names in each Location via ADF Data Flow?
You can do it with the Aggregate transformation. I tested it with your data.
Start with the Aggregate transformation's Group by section and add Location as the group by column.
Specify the aggregated column name under Columns and count(Name) as the aggregate expression.
Verify the result in the Aggregate's Data preview.
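A minimal Data Flow Script sketch of that aggregate, assuming the incoming stream is called source1 and naming the output column NameCount (both names are illustrative):
source1 aggregate(groupBy(Location),
    NameCount = count(Name)) ~> Aggregate1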

How do we create a generic mapping dataflow in datafactory that will dynamically extract data from different tables with different schema?

I am trying to create an Azure Data Factory mapping dataflow that is generic for all tables. I am going to pass the table name, the primary column for join purposes, and the other columns to be used in groupBy and aggregate functions as parameters to the DF.
parameters to df
I am unable to reference this parameter in groupBy.
Error: DF-AGG-003 - Groupby should reference atleast one column -
MapDrifted1 aggregate(
) ~> Aggregate1,[486 619]
Has anyone tried this scenario? Please help if you have some knowledge on this or if it can be handled in a U-SQL script.
We first need to look up your parameter's column name in your incoming source data to locate the metadata and assign it.
Just add a Derived Column before your Aggregate and it will work. Call the column 'groupbycol' in your Derived Column and use this formula: byName($group1).
In your Aggregate, select 'groupbycol' as your group by column.
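A minimal Data Flow Script sketch of this, assuming the drifted stream is MapDrifted1 (as in the error message) and the dataflow parameter is $group1; the aggregate column rowcount = count() is just a placeholder for whatever aggregate expressions you pass in as parameters:
MapDrifted1 derive(groupbycol = byName($group1)) ~> DerivedColumn1
DerivedColumn1 aggregate(groupBy(groupbycol),
    rowcount = count()) ~> Aggregate1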
