How to get the data from the previous row in Azure Data Factory

I am working on transforming data in Azure Data Factory. I have a source file that contains data like this:
ABC Code-01
DEF
GHI
JKL Code-02
MNO
I need the data to look like this in the sink file:
ABC Code-01
DEF Code-01
GHI Code-01
JKL Code-02
MNO Code-02

You can achieve this using the fill-down concept available in Azure Data Factory. The code snippet is shown below.
Note: The code snippet assumes that you have already added a source transformation in the data flow.
Steps:
Add a source and link it to the source file (I generated a file with your sample data).
Edit the data flow script (available at the top-right corner) to add code.
Add the code snippet after the source as shown.
source1 derive(dummy = 1) ~> DerivedColumn
DerivedColumn keyGenerate(output(sk as long),
startAt: 1L) ~> SurrogateKey
SurrogateKey window(over(dummy),
asc(sk, true),
Rating2 = coalesce(Rating, last(Rating, true()))) ~> Window1
After adding the code in the script, the data flow generates 3 transformations:
a. A Derived Column transformation with a new dummy column holding the constant 1.
b. A Surrogate Key transformation to generate a key value for each row, starting at 1.
c. A Window transformation to perform window-based aggregation. Here the expression uses the predefined last() function to take the previous row's non-null value when the current row's value is NULL.
For more information on the Window transformation, see https://learn.microsoft.com/en-us/azure/data-factory/data-flow-window
As the source delivers the values in a single column, I added additional columns in the Derived Column transformation to split the single source column into 2 columns.
Substitute NULL values where the column value is blank; if it stays blank, the last() function will not recognize it as NULL and will not substitute the previous value.
case(length(dropLeft(Column_1,4)) >1, dropLeft(Column_1,4), toString(null()))
Preview of the Derived Column: Column_1 is the raw source data, dummy is the constant-1 column generated by the code snippet, and Column1Left & Column1Right store the values after splitting the raw data (Column_1).
Note: Blank values in Column1Right are replaced with NULLs.
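The splitting logic above can be sketched in plain Python (a simulation of the ADF expressions, not data flow code; the function name is mine):

```python
def split_row(value):
    # Mimics the Derived Column expressions: the left part is the first 3
    # characters; dropLeft(Column_1, 4) removes the first 4 characters, and
    # blank results become None so the later fill-down step can detect them.
    left = value[:3]
    rest = value[4:]
    right = rest if len(rest) > 1 else None
    return left, right

print(split_row("ABC Code-01"))  # ('ABC', 'Code-01')
print(split_row("DEF"))          # ('DEF', None)
```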
In the Window transformation:
a. Over – partitions the source data based on the column provided. As there are no other columns to use as a partition column, add the dummy column generated in the Derived Column.
b. Sort – sorts the source data based on the sort column. Add the surrogate key column to sort the incoming source data.
c. Window columns – provide the expression to copy the non-null value from previous rows only when the current value is null:
coalesce(Column1Right, last(Column1Right,true()))
d. Data preview of the Window transformation: here, the null values in Column1Right are replaced by the previous non-null values, based on the expression added in Window columns.
A second Derived Column is added to concatenate Column1Left and Column1Right into a single column.
Second Derived Column preview:
A Select transformation is added to pass only the required columns to the sink and remove unwanted ones (this is optional).
Sink data output after the fill-down process:
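The fill-down behaviour performed by the Window transformation can be sketched in plain Python (an illustration of the concept, not ADF code):

```python
def fill_down(values):
    # Carry the last non-null value forward, mirroring
    # coalesce(col, last(col, true())) in the Window transformation.
    last_seen = None
    result = []
    for v in values:
        if v is not None:
            last_seen = v
        result.append(last_seen)
    return result

codes = ["Code-01", None, None, "Code-02", None]
print(fill_down(codes))
# ['Code-01', 'Code-01', 'Code-01', 'Code-02', 'Code-02']
```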

Related

How to Flatten a semicolon Array properly in Azure Data Factory?

Context: I have a data flow that extracts data from a SQL DB. The data arrives as a single column containing a tab-separated string, so in order to manipulate the data properly I tried to separate it into its corresponding columns:
Firstly, to 'rebuild' the table properly I used a Derived Column activity to replace the tabs with semicolons (1):
dropLeft(regexReplace(regexReplace(regexReplace(descripcion,`[\t]`,';'),`[\n]`,';'),`[\r]`,';'),1)
Then I used the split() function to get an array from which to build the columns (2):
split(descripcion, ';')
Problem: when I try to use the Flatten activity (as described at https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten), it just doesn't work: the data flow gives me just one column, or if I add an additional column in the Flatten activity I just get another column with the same data as the first one.
Expected output:
column1 | column2 | column3
2000017 | ENVASE CORONA CLARA 24/355 ML GRAB | PC13
2004297 | ENVASE V FAM GRAB 12/940 ML USADO | PC15
Could you tell me what I'm doing wrong? Thanks, by the way.
You can use the Derived Column activity itself; try it as below.
After the first Derived Column, what you have is a string array, which can simply be split again using the Derived Column schema modifier.
Here firstc represents the source column equivalent to your column descripcion:
Column1: split(firstc, ';')[1]
Column2: split(firstc, ';')[2]
Column3: split(firstc, ';')[3]
Optionally, you can select only the columns you need to write to the SQL sink.
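The two steps (replace the separators, then split and index) can be illustrated in Python; the sample string is built from the expected output above. Note that ADF data flow array indexing is 1-based, while Python's is 0-based:

```python
import re

# One source row: a single string of tab-separated values.
descripcion = "2000017\tENVASE CORONA CLARA 24/355 ML GRAB\tPC13"

# Step 1: replace tabs/newlines with semicolons (what the first Derived Column does).
cleaned = re.sub(r"[\t\n\r]", ";", descripcion)

# Step 2: split on the semicolon and pick each part by index.
parts = cleaned.split(";")
column1, column2, column3 = parts[0], parts[1], parts[2]
print(column1, column3)  # 2000017 PC13
```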

Need to add header and trailer records in a CSV file - Azure Data Factory

I am new to Azure Data Factory and need to implement the below logic, transferring a CSV file from source to destination with some transformation of the file.
The input file contains the below data:
111|101|2019-02-04 21:04:57
222|202|2019-02-04 21:33:54
333|202|2019-02-04 20:23:55
Expected Output :
H|TestFile|currentDateTime ------------ need to add this header record; H and TestFile would be static
111|101|2019-02-04 21:04:57
222|202|2019-02-04 21:33:54
333|202|2019-02-04 20:23:55
T|03 -------------------------------------- T is a static value; need to add the total number of records here.
Can someone please help with this?
Update:
After a series of tests, the final result I can get is as follows.
The structure overview is as follows:
I saved the header into a txt file.
source1 stores the source CSV file; I set the column name to Column_1 on the Projection tab.
The source1 data preview is as follows:
At the SurrogateKey1 activity, I entered Row_No as the Key column and 1 as the Start value.
At the Window1 activity, select Row_No as the Window column, then enter the expression max(Row_No).
The Window1 data preview is as follows; we get the max value of Row_No.
Use the Pivot1 activity to switch from columns to rows; enter the expression concat('T|',toString(max(Row_No),'00')) to get T|03.
The Pivot1 activity data preview is as follows:
The settings of source2 are the same as those of source1.
At DerivedColumn1,
set the column name: Column1,
set the expression: concat(Column_1,'|',toString(currentTimestamp())).
At the SurrogateKey2 activity, I entered Row_No as the Key column and 2 as the Start value.
The SurrogateKey2 activity data preview is as follows:
At the Select2 activity, filter down to the column we want and give it an alias.
The data preview is as follows:
headers stores the header info in a CSV file; set Column_1 as the column name.
At the SurrogateKey3 activity, I entered Row_No as the Key column and 1 as the Start value.
Union the SurrogateKey3 activity with the Select2 activity.
It will sort by the Row_No column, so the title will be on the first line.
Then we can select only what we need via the Select1 activity.
The Select1 activity data preview is as follows:
Union the Pivot1 activity and the Select1 activity via the Union2 activity.
The Union2 activity data preview is as follows:
After running debug, the final CSV file is as follows:
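Outside ADF, the header/trailer logic itself is simple; here is a Python sketch under the same rules (static 'H|TestFile', a current timestamp, and 'T|' plus a zero-padded record count):

```python
from datetime import datetime

rows = [
    "111|101|2019-02-04 21:04:57",
    "222|202|2019-02-04 21:33:54",
    "333|202|2019-02-04 20:23:55",
]

# Header: static prefix plus the current timestamp.
header = "H|TestFile|" + datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Trailer: static 'T' plus the record count, zero-padded to two digits.
trailer = "T|" + format(len(rows), "02d")  # T|03

output_lines = [header, *rows, trailer]
print("\n".join(output_lines))
```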

Tableau: Multiple columns in a filter

I have three numeric fields named A, B and C and want them in a single filter in Tableau; based on the one selected in that filter, a line chart will be shown. For example, if column B is selected in the Stages filter, the line chart of B is shown; had column A been selected, the line chart of A would be displayed.
Pardon my way of asking the question by showing an image. I have just picked up learning Tableau and am not getting this trick anywhere.
Here is a snapshot of the data:
Create a (list) parameter named 'ABC' with the values
A
B
C
Then create a calculated field:
IF [ABC] = 'A' THEN [column_a]
ELSEIF [ABC] = 'B' THEN [column_b]
ELSEIF [ABC] = 'C' THEN [column_c]
END
Something like that should work for you. Check out Tableau's free training; you have to sign up for an account.
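The calculated field is just a parameter-driven column switch; the same idea expressed in Python (the names and sample values are illustrative):

```python
def stage_value(abc, row):
    # Pick which column to plot based on the 'ABC' parameter,
    # like the IF/ELSEIF calculated field above.
    mapping = {"A": "column_a", "B": "column_b", "C": "column_c"}
    return row[mapping[abc]]

row = {"column_a": 10, "column_b": 30, "column_c": 50}
print(stage_value("B", row))  # 30
```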
Another way, without creating a calculated field: just pivot the three columns to rows, and the field on which you can apply a filter is created. Let me show you.
This is a screenshot of the input data.
I pivoted the three columns to reshape the data like this.
After renaming the pivoted-fields column to Stages, I can add it directly to the view and get my desired result.
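The pivot reshapes the wide columns A/B/C into one (Stages, Value) pair per measure; a minimal Python sketch of that reshape (the sample data is made up):

```python
wide_rows = [
    {"Month": "Jan", "A": 10, "B": 30, "C": 50},
    {"Month": "Feb", "A": 20, "B": 40, "C": 60},
]

# Pivot the three measure columns to rows: each wide row
# produces one long row per measure.
long_rows = [
    {"Month": r["Month"], "Stages": stage, "Value": r[stage]}
    for r in wide_rows
    for stage in ("A", "B", "C")
]
print(long_rows[0])  # {'Month': 'Jan', 'Stages': 'A', 'Value': 10}
```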

Dynamically rename/map columns according to Azure Table data model convention

How would you dynamically rename/map columns to follow the Azure Table data model convention, where property key names should follow C# identifier rules? We cannot guarantee that the columns coming to us conform to the standard, or that new columns coming in are automatically fixed.
Example:
column_1 (something_in_parens), column with spaces, ...
returned...
column_1 something_in_parens, column_with_spaces, ...
The obvious solution might be to run a Databricks Python step in front of the Copy Data activity, but maybe Copy Data is able to infer the right schema?
columns = ["some Not so nice column Names", "Another ONE", "Last_one"]

new_columns = [x.lower().replace(" ", "_") for x in columns]
# returns ['some_not_so_nice_column_names', 'another_one', 'last_one']
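If stricter C# identifier rules are needed (no parentheses, no leading digit), the list comprehension can be extended along these lines (a sketch; adjust the rules to your own convention):

```python
import re

def to_identifier(col):
    # Drop parentheses/brackets, collapse any remaining non-identifier
    # characters into underscores, and avoid a leading digit
    # (C# identifiers cannot start with a digit).
    name = re.sub(r"[()\[\]]", "", col)
    name = re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_")
    if name and name[0].isdigit():
        name = "_" + name
    return name.lower()

print(to_identifier("column_1 (something_in_parens)"))  # column_1_something_in_parens
print(to_identifier("Another ONE"))                     # another_one
```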

How to keep Rows and Columns headers when applying operation using Matlab

I have a data set stored in an Excel file. When I import the data using the MATLAB function
A = xlsread(xls-filename)
matrix A stores only the numeric values of my table. When I use another function such as
B = readtable(xls-filename)
the table shows the complete data, including the row and column headers, but when I apply an operation to it like
Bnorm = normc(B)
it is unable to perform the normalization due to the row and column headers.
My questions are:
Is there any way to avoid the row and column headers in table B?
Is there any way to store the row and column headers when reading the table using the xlsread function, such that
column headers = the first row in (xls-filename)
row headers = the first column in (xls-filename)?
Thanks for any suggestions.
Dataset table:
Normalized matrix after applying xlsread(xls-filename):
The answers to your specific questions are:
With a table, you can avoid row labels, but column labels always exist.
As per the documentation for xlsread, the first output is the numeric data and the second output is the text data, which in this case would include your header information.
But in this case, you just need to learn how to work with tables properly. You want something like
>> Bnorm = normc(B{:,2:end});
which extracts all the numeric elements of table B and uses them as input to normc.
If you want the result to be a table, then use
Bnorm = B;
Bnorm{:,2:end} = normc(B{:,2:end});
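For comparison, the same pattern (keep the label column, normalize only the numeric part) can be written in Python; normc scales each column to unit Euclidean norm, and the data here is made up for illustration:

```python
import math

headers = ["Name", "X", "Y"]
table = [["r1", 3.0, 1.0],
         ["r2", 4.0, 2.0]]

# Normalize each numeric column to unit Euclidean norm, as MATLAB's
# normc does, leaving the first (label) column untouched.
for j in range(1, len(headers)):
    col_norm = math.sqrt(sum(row[j] ** 2 for row in table))
    for row in table:
        row[j] /= col_norm

print(table)  # the X column [3, 4] becomes [0.6, 0.8]
```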
