How do you select just one item from an object within an object via Select in Azure Data Factory?
{
  "CorrelationId": 123,
  "ComponentInfo": {
    "ComponentId": "1",
    "ComponentName": "testC"
  }
}
I have a join1 step in my ADF data flow, and when I Inspect that step I can see the results:
But when I select just the two columns I need, the Data Preview errors out:
Column source1@ComponentInfo not found. The stream is either not connected or column is unavailable
The Select is set as such:
source1@{source1@ComponentInfo}.ComponentName
What is wrong with how I am selecting ComponentName, given that ComponentInfo is an object? The expression was picked from a drop down. I have tried to flatten the data, but it is not an array, and to modify the schema, but I am not sure I am researching the right method for selecting from an object.
I reproduced this with the above sample data and used a select transformation after the join. I got the same error as above.
Here, the select transformation may be treating source1@ComponentInfo as a column, while it is an object in this case.
You can get the desired result using a derived column transformation.
After the join, use a derived column transformation, create two new columns for the required values, and build the expressions from the input schema as shown below.
ComponentName column:
CorrelationId column:
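The exact expressions depend on how the join exposes the incoming columns; assuming the nested object comes through from the source1 stream and there is no naming conflict after the join, expressions along these lines should work:
ComponentName: ComponentInfo.ComponentName
CorrelationId: CorrelationId
If a column name is ambiguous after the join, the stream-qualified form source1@ComponentInfo.ComponentName can be used instead.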
You can see the result in the Data preview.
Then, you can filter down to the required columns using a select transformation.
Result:
I have a source table and I want to use an update policy to copy the data from the source table to the target table, but with an additional column whose value is obtained from an external table.
In the update policy I should copy the row from the origin table and look up the value I need to add in the external table through an Id.
I could easily use a view (join), but the problem is performance, so I would like to have a table in ADX with the values I need.
The question is what the function should look like to copy the row from the source table to the target table while adding the column from the external table.
Update: it seems that an external table cannot be used in an update policy. I get an error when I try to define it:
Exception: Request is invalid and cannot be processed: Semantic error: SEM0457: external_table(): usage is not allowed in this context.
Your update policy command might look like this:
.alter table table2 policy update
@'[{"IsEnabled": true, "Source": "table1", "Query": "MyFunctionThatDoesTheLookup(table1, table2)", "IsTransactional": true, "PropagateIngestionProperties": false}]'
This means the table gets its data from table1. In between, you use a function. The function would look like this:
.create-or-alter function
MyFunctionThatDoesTheLookup
(
    raw_table:(id:string, col1:string, col2:int),
    target_table:(id:string, col1:string, col2:int, col3:int)
)
{
    // lookuptablename is the regular ADX table holding the lookup values
    let mylookuptable = lookuptablename;
    raw_table
    | join kind=leftouter mylookuptable on id
    | project id, col1, col2, col3
}
I created an Azure Table storage table with this format:
ID | Table
1  | table1
2  | table2
3  | table3
4  | table4
and in Data Factory I want to create a query against this table, filtering with the IN operator, for example: ID in ('1', '2')
I'm using a Lookup Activity and I created this array parameter:
array parameter
and I want to use it in the query, for example: #ID in pipeline().parameters.Id
image:
data factory - lookup activity query
Does anyone know a way to do this filtering? Thanks!
AFAIK, IN and CONTAINS are currently not supported in Azure Table storage queries.
If your array has only a few values, you can directly give a query like Id eq '1' or Id eq '2' in the Lookup, which will give the following result.
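If you prefer not to hard-code the filter, you can also build it dynamically from the array parameter in the Lookup query. A minimal sketch, assuming the parameter is named arr and holds string IDs:
@concat('Id eq ''', join(pipeline().parameters.arr, ''' or Id eq '''), '''')
For ["1","2"] this resolves to Id eq '1' or Id eq '2'.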
If your array is large, you can try the below approach using a ForEach.
Lookup activity:
ForEach:
Give the Lookup output value.
Inside the ForEach, store item() in a string variable:
Inside an If Condition, give the following expression:
@contains(pipeline().parameters.arr,item().Id)
Here arr is an array parameter with ["1","2"] values.
In the True activities, append the variable value to an array variable:
Result stored in another variable from the res1 variable (optional, only for showing the result here):
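For reference, the key expressions in this pattern are roughly the following (a sketch; current is an assumed name for the string variable, while arr and res1 are the parameter and variable names used in this example):
Set variable (string variable current, inside ForEach): @string(item())
If Condition expression: @contains(pipeline().parameters.arr, item().Id)
Append variable (array variable res1, in the True activities): @variables('current')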
I have a data flow that is doing the following:
Setting date to today > Copy from API to temp table > Lookup ID in source table > Lookup ID in temp table against source table to only pass DeDupe values > ForEach for each ID that passes DeDupe validation > Copy task in ForEach that sequentially picks up each ID from the second lookup and stores API data into Azure SQL.
I have all of my logic working, except the final task, which needs to use a select statement equal to the ID of the DeDupe lookup:
Overall data pipeline
ForEach value being passed
Copy task in ForEach picking up the current item
Input for the copy task each sequence
Output for the copy task each sequence
Notice in the last images that the input is putting the column name, instead of just the value, into the SQL statement. How do I prevent this so I can pass just the value that is already defined in literals to the SQL statement? The idea here is to DeDupe values on each run, since this will be scripted to look into the future for reservation data.
Add the column name along with the current item in the copy activity, like @item().UID, and use the concat() function to combine the current item with the string value as shown below.
Output of lookup:
Copy data activity source query inside ForEach activity:
@concat('select * from tb1 where id = ',item().id)
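Alternatively, ADF string interpolation can express the same query in the dynamic content field; a sketch, assuming the same hypothetical table tb1 and a numeric id column:
select * from tb1 where id = @{item().id}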
Copy data Source Input:
Context: I have a data flow that extracts data from a SQL DB. The data comes in as just one column with a tab-separated string, so in order to manipulate the data properly, I have tried to separate it into individual columns with their corresponding data:
Firstly, to 'rebuild' the table properly I used a 'Derived Column' transformation, replacing tabs with semicolons instead (1):
dropLeft(regexReplace(regexReplace(regexReplace(descripcion,'[\t]',';'),'[\n]',';'),'[\r]',';'),1)
So, after that, I used the 'split()' function to get an array and build the columns (2):
split(descripcion, ';')
Problem: When I try to use the 'Flatten' transformation (as described here: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten), it just does not work: the data flow gives me just one column, or if I add an additional column in the 'Flatten' transformation I just get another column with the same data as the first one:
Expected output:
column2 | column1                            | column3
2000017 | ENVASE CORONA CLARA 24/355 ML GRAB | PC13
2004297 | ENVASE V FAM GRAB 12/940 ML USADO  | PC15
Could you tell me what I'm doing wrong, guys? Thanks by the way.
You can use the derived column transformation itself; try as below.
After the first derived column, what you have is a semicolon-delimited string, which can just be split again using another Derived Column schema modifier.
Here firstc represents the source column equivalent to your column descripcion:
Column1: split(firstc, ';')[1]
Column2: split(firstc, ';')[2]
Column3: split(firstc, ';')[3]
Optionally, you can select the columns you need to write to the SQL sink.
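Put together in data flow script form, the second derived column might look like the sketch below, where DerivedColumn1 and SplitColumns are assumed stream names and firstc is the column holding the semicolon-delimited string:
DerivedColumn1 derive(Column1 = split(firstc, ';')[1],
    Column2 = split(firstc, ';')[2],
    Column3 = split(firstc, ';')[3]) ~> SplitColumns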
I am working on transforming data in Azure Data Factory.
I have a source file that contains data like this:
ABC Code-01
DEF
GHI
JKL Code-02
MNO
I need to make the data look like this in the sink file:
ABC Code-01
DEF Code-01
GHI Code-01
JKL Code-02
MNO Code-02
You can achieve this using the fill down concept available in Azure Data Factory data flows. The code snippet is available here.
Note: The code snippet assumes that you have already added a source transformation in the data flow.
Steps:
Add a source and link it with the source file (I have generated a file with your sample data).
Edit the data flow script (available in the top right corner) to add the code.
Add the code snippet after the source as shown.
source1 derive(dummy = 1) ~> DerivedColumn
DerivedColumn keyGenerate(output(sk as long),
startAt: 1L) ~> SurrogateKey
SurrogateKey window(over(dummy),
asc(sk, true),
Rating2 = coalesce(Rating, last(Rating, true()))) ~> Window1
After adding the code in the script, the data flow generates 3 transformations:
a. Derived column transformation with a new dummy column with the constant “1”.
b. SurrogateKey transformation to generate a key value for each row, starting at 1.
c. Window transformation to perform window-based aggregation. Here the code adds the last() function to take the previous row's non-NULL value when the current row's value is NULL.
For more information on the Window transformation, refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-window
As I am getting the values as a single column in the source, I added additional columns in the Derived column transformation to split the single source column into 2 columns.
Substitute NULL values if the column value is blank; if it is left blank, the last() function will not recognize it as NULL and will not substitute the previous value.
case(length(dropLeft(Column_1,4)) >1, dropLeft(Column_1,4), toString(null()))
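The companion Column1Left column can be derived in a similar way. A minimal sketch, assuming the left-hand value is always three characters followed by a space (as in the sample data):
left(Column_1, 3)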
Preview of the Derived column: Column_1 is the source raw data, dummy is the constant-1 column generated by the code snippet, and Column1Left & Column1Right store the values after splitting the raw (Column_1) data.
Note: Column1Right blank values are replaced with NULLs.
In the Window transformation:
a. Over – This partitions the source data based on the column provided. As there are no other columns to use as the partition column, add the dummy column generated using the derived column.
b. Sort – Sorts the source data based on the sort column. Add the Surrogate Key column to sort the incoming source data.
c. Window columns – Here, provide the expression to copy the non-NULL value from previous rows only when the current value is NULL:
coalesce(Column1Right, last(Column1Right,true()))
d. Data preview of the Window transformation: here, NULL values in Column1Right are replaced by the previous non-NULL values based on the expression added in Window columns.
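Putting items a–c together, the adapted Window transformation in data flow script form might look like this (a sketch; Column1Right2 is an assumed name for the filled-down output column):
SurrogateKey window(over(dummy),
    asc(sk, true),
    Column1Right2 = coalesce(Column1Right, last(Column1Right, true()))) ~> Window1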
A second derived column is added to concat Column1Left and the filled-down Column1Right into a single column.
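The expression can be something like the following (a sketch, assuming a single space separator and the Column1Right2 name from the Window sketch above):
concat(Column1Left, ' ', Column1Right2)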
Second Derived column preview:
A select transformation is added to pass only the required columns to the sink and remove the unwanted ones (this is optional).
Sink data output after the fill down process: