Splitting a column into rows on ADF - azure

Hello, I am getting rows like the ones below.
ID, NAME,EMAIL,PHONENUMBER
123,ABC, qwe#poi.com|asd#lkj.com, 3636|7363
234,DEF,sjs#djd.com|sndir#fmei.com|cmrjje#fmcj.com,5845|4958|5959
Each person can have multiple emails and phone numbers, separated by |. The first email and the first phone are linked, the second email and the second phone are linked, and so on, so each pair needs to end up in the same record. Can I split this record into multiple rows with one email and one phone per row?

You can do this with a data flow. I built a test data flow; the steps are as follows.
My source dataset is a text file in Azure Data Lake Gen2. Source1 and Source2 both use this same dataset.
In the DerivedColumn1 transformation, select the EMAIL column and enter the expression split(EMAIL,'|') to split the column into an array.
In the Flatten1 transformation, select EMAIL[] for both Unroll by and Unroll root.
In the SurrogateKey1 transformation, enter ROW_NO as the key column and 1 as the start value.
Source2 is the same as Source1, so skip ahead to the DerivedColumn2 transformation: select the PHONENUMBER column and enter the expression split(PHONENUMBER,'|') to split the column into an array.
In the Flatten2 transformation, select PHONENUMBER[] for both Unroll by and Unroll root.
In the SurrogateKey2 transformation, enter ROW_NO as the key column and 1 as the start value.
In the Join1 transformation, inner join the two streams on the key column ROW_NO. This pairs the first email with the first phone and so on, provided every record has the same number of emails as phone numbers and both streams keep the source row order.
In the Select1 transformation, select the columns you need.
Then we can sink the result to our destination.
That's all.
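For readers who want to check the expected output outside ADF, here is a minimal Python sketch of the same split-and-pair logic. It is not data flow code: the function name explode_contacts is hypothetical, and it pairs emails and phones by position within each record instead of joining two flattened streams on a surrogate key. It assumes every record has the same number of emails and phone numbers.

```python
from typing import Dict, List


def explode_contacts(row: Dict[str, str], sep: str = "|") -> List[Dict[str, str]]:
    """Turn one record with delimited EMAIL/PHONENUMBER values into one row per pair."""
    emails = [e.strip() for e in row["EMAIL"].split(sep)]
    phones = [p.strip() for p in row["PHONENUMBER"].split(sep)]
    # Assumption: the n-th email belongs to the n-th phone, so the counts must match.
    if len(emails) != len(phones):
        raise ValueError(f"ID {row['ID']}: {len(emails)} emails vs {len(phones)} phones")
    return [
        {"ID": row["ID"].strip(), "NAME": row["NAME"].strip(), "EMAIL": e, "PHONENUMBER": p}
        for e, p in zip(emails, phones)
    ]


# The two sample records from the question.
rows = [
    {"ID": "123", "NAME": "ABC", "EMAIL": " qwe#poi.com|asd#lkj.com", "PHONENUMBER": " 3636|7363"},
    {"ID": "234", "NAME": "DEF", "EMAIL": "sjs#djd.com|sndir#fmei.com|cmrjje#fmcj.com",
     "PHONENUMBER": "5845|4958|5959"},
]

result = [out for row in rows for out in explode_contacts(row)]
for record in result:
    print(record)
```

Raising an error on mismatched counts is a design choice of this sketch; the inner join in the data flow would instead pair rows silently by their global row number.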

Related

Delete bottom two rows in Azure Data Flow

I would like to delete the bottom two rows of an Excel file in ADF, but I don't know how to do it.
The flow I am thinking of is this.
*I intend to filter -> delete the rows to be deleted in yellow.
The file has over 40,000 rows of data and is updated once a month. (The number of rows changes with each update, so the condition must be specified with a function.)
The contents of the file are also shown here.
The bottom two lines contain spaces and asterisks.
Any help would be appreciated.
I'm new to Azure and having trouble.
I need your help.
Add a surrogate key transformation to put a row number on each row. Add a new branch to duplicate the stream and in that new branch, add an aggregate.
Use the aggregate transformation to find the max() value of the surrogate key counter.
Then subtract 2 from that max number and filter for just the rows up to that max-2.
Let me provide a more detailed answer here ... I think I can get it in here without writing a separate blog.
The simplest way to filter out the final 2 rows is a pattern depicted in the screenshot here. Instead of the new branch, I just created 2 sources, both pointing to the same data source. The 2nd stream is there just to get a row count and store it in a cached sink. In its Aggregate transformation, I named the column rowcount and used count(1) as the aggregate expression.
In the first stream, that is the primary data processing stream, I add a Surrogate Key transformation so that I can have a row number for each row. I called my key column "sk".
Finally, set the Filter transformation to only allow rows with a row number <= the max row count from the cached sink minus 2.
The Filter expression looks like this: sk <= cachedSink#output().rowcount-2
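The same count-then-filter idea can be sketched in plain Python to make the arithmetic explicit. This is only an illustration of the logic, not ADF code; the function name drop_trailing_rows and the sample rows are hypothetical.

```python
from typing import List, Sequence


def drop_trailing_rows(rows: Sequence[str], n_trailing: int = 2) -> List[str]:
    """Keep every row whose 1-based row number is <= (row count - n_trailing)."""
    row_count = len(rows)               # what the second stream computes with count(1)
    keep_up_to = row_count - n_trailing
    # enumerate(start=1) plays the role of the surrogate key column "sk".
    return [row for sk, row in enumerate(rows, start=1) if sk <= keep_up_to]


sample = ["row 1", "row 2", "row 3", "   ", "***"]
kept = drop_trailing_rows(sample)
print(kept)
```

Because the cut-off is derived from the row count at run time, the same logic keeps working when the number of rows changes with each monthly update.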

Can we select columns which start with a particular string in ADF Dataflow?

I have some data and I need to select the columns whose names start with "Rain" using a data flow.
Is there a way we can do that?
You can achieve this using a select transformation. The following is a demonstration.
The following columns are taken as the source.
After adding the source, add a select transformation and remove all the mapped columns. Then click Add Mapping -> Rule-based Mapping.
In the rule-based mapping, use a condition that matches column names starting with Rain. The rule is startsWith(name,'Rain') and the output column name is $$ (which means the same name as the source column).
You can inspect the output to confirm that only the columns whose names start with Rain are selected.
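As a rough analogy outside ADF, the rule-based mapping behaves like filtering a record's keys by prefix while keeping each matching name unchanged (the role of $$). The Python sketch below assumes hypothetical column names, since the original source columns are not listed; note that Python's str.startswith is case-sensitive.

```python
from typing import Any, Dict


def select_columns_by_prefix(record: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep only the columns whose name starts with the given prefix."""
    return {name: value for name, value in record.items() if name.startswith(prefix)}


# Hypothetical column names, for illustration only.
record = {"Date": "2023-01-01", "RainToday": "No", "RainTomorrow": "Yes", "Temp": 21}
selected = select_columns_by_prefix(record, "Rain")
print(selected)
```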

Excel2016: Generate ID based on multiple criteria (no VBA)

I am trying to generate a Batch ID based on Course, Date, and Time. All rows that have the same Course+Date+Time combination should have the same Batch ID. Each subsequent new combination should get the next incremental ID.
Batch ID = LEFT(C2,3)&TEXT(<code formula>,"000")
No VBA, only Excel 2016 formula, please.
Sample data snapshot
Bit of a stretch but try in F2:
=IF(COUNTIFS(C$2:C2,C2,D$2:D2,D2,E$2:E2,E2)>1,LOOKUP(2,1/((C$1:C1=C2)*(D$1:D1=D2)*(E$1:E1=E2)),F$1:F1),UPPER(LEFT(C2,3))&TEXT(MAX(IFERROR((LEFT(F$1:F1,3)=LEFT(C2,3))*RIGHT(F$1:F1,3),0))+1,"000"))
Enter it as an array formula with Ctrl+Shift+Enter.
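To make the intent of that formula easier to follow, here is a short Python sketch of the same idea: reuse the ID when the Course+Date+Time combination has been seen before, otherwise take the next number for that three-letter course prefix. The function name assign_batch_ids and the sample rows are hypothetical, and a per-prefix counter stands in for the formula's MAX(...)+1 step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (course, date, time)


def assign_batch_ids(rows: List[Row]) -> List[str]:
    """Return one Batch ID per row, shared by rows with the same course/date/time."""
    seen: Dict[Row, str] = {}                      # combination -> ID already issued
    counters: Dict[str, int] = defaultdict(int)    # prefix -> last number used
    ids: List[str] = []
    for course, date, time in rows:
        key = (course, date, time)
        if key not in seen:
            prefix = course[:3].upper()            # UPPER(LEFT(C2,3))
            counters[prefix] += 1                  # next number for this prefix
            seen[key] = f"{prefix}{counters[prefix]:03d}"  # TEXT(n,"000")
        ids.append(seen[key])
    return ids


# Hypothetical sample rows, for illustration only.
sample = [
    ("Excel Basics", "2021-01-05", "09:00"),
    ("Excel Basics", "2021-01-05", "09:00"),
    ("Excel Basics", "2021-01-06", "09:00"),
    ("Python Intro", "2021-01-05", "14:00"),
    ("Excel Basics", "2021-01-05", "09:00"),
]
batch_ids = assign_batch_ids(sample)
print(batch_ids)
```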
It would be easier and more readable to meet your requirement in three steps, rather than a single formula.
Create a unique ID based on the Course, Date and Time.
Formula:
=CONCATENATE(UPPER(LEFT($C3,3)),TEXT($D3,"ddmmyy"),TEXT($E3,"hhmm"))
Breakdown:
LEFT($C3,3) = take the first three characters of the Course
UPPER() = make the first three characters of the Course uppercase
TEXT($D3,"ddmmyy") = take the date, turn it into text and apply a format
TEXT($E3,"hhmm") = take the time, turn it into text and apply a format
Create a lookup table of the unique ID and Batch ID
Copy all the unique Ids that have been created in step 1
Paste them into a new column separate to your data
On the Data menu tab, select Remove Duplicates in the data tools
Add the Batch ID to lookup.
This way the Batch ID can be generated by a formula, provided the unique IDs are sorted using Sort A to Z.
See the attached image.
Lookup the unique ID to get the Batch ID
=VLOOKUP($F3,$I$3:$J$7,2,FALSE)

Return count in ADF Data Flow

I have an ADF Data Flow that outputs 2 sets of values (Name, Location) as shown below:
Is there a way to output the count of Names in each Location via ADF Data Flow?
You can do it with the Aggregate transformation. I tested it with your data.
In the Aggregate transformation's Group by section, add location as the group-by column.
Under Columns, enter a name for the aggregated column and use count(name) as the aggregate expression.
Verify the result in the Aggregate transformation's Data preview.
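For comparison, the group-by-and-count step can be sketched in Python as follows. The sample (Name, Location) pairs are hypothetical because the original data was only shown as an image, and the function name count_names_by_location is likewise made up for this illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_names_by_location(records: Iterable[Tuple[Optional[str], str]]) -> Dict[str, int]:
    """Group by location and count the non-null names in each group."""
    counts = Counter(location for name, location in records if name is not None)
    return dict(counts)


# Hypothetical sample data, for illustration only.
records = [
    ("Alice", "London"),
    ("Bob", "London"),
    ("Carol", "Paris"),
    (None, "Paris"),   # a missing name is skipped by this sketch
]
counts = count_names_by_location(records)
print(counts)
```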

How to make a dynamic list from a multi variable table with boolean values?

Please help, very stumped with this one.
I've been provided with a matrix table as seen in image attached. I have a list of users, as well as the courses that they need to be enrolled in.
I'm trying to create a list that repeats each username for every course they're enrolled in (column A), alongside the course itself (column B).
So far, I've been able to create such a list by creating a pivot table from this data, double clicking the grand total, and sorting the results, but this is a very manual process, and it needs to be replicated by others.
Is that possible?
Google Drive link: https://drive.google.com/file/d/1zXsWZCguia-SLaYAP-81kMX819879zzX/view?usp=sharing
This is just something to get you started:
Convert your data to an Excel table
Steps:
In Excel:
Select the data range (starting from row 3)
Press Ctrl + T
Select Data | Get data from table
Transform your data in Power Query
In Power Query:
Select the column headers (from FirstName to User)
Right-click them
Select Unpivot Other Columns
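The effect of the unpivot step can be sketched in Python as below. The course names and the way enrolment is flagged (True/False here) are assumptions, since the original workbook is only linked, and the function name unpivot_enrolments is hypothetical. The sketch drops cells that are not flagged; in Power Query that may need an extra filter step if the cells hold FALSE instead of being empty.

```python
from typing import Any, Dict, Iterable, List, Tuple


def unpivot_enrolments(
    rows: Iterable[Dict[str, Any]], id_columns: Tuple[str, ...]
) -> List[Tuple[str, str]]:
    """Return one (User, course) pair for every flagged course column."""
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue          # identity columns are kept as-is, not unpivoted
            if value:             # assumption: a truthy cell means "enrolled"
                pairs.append((row["User"], column))
    return pairs


# Hypothetical matrix rows, for illustration only.
matrix = [
    {"FirstName": "Ann", "User": "ann01", "Course A": True, "Course B": False, "Course C": True},
    {"FirstName": "Ben", "User": "ben02", "Course A": False, "Course B": True, "Course C": False},
]
pairs = unpivot_enrolments(matrix, id_columns=("FirstName", "User"))
print(pairs)
```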
