POWERBI, DAX : concatenate strings, split, and keep substrings only once

I am trying to do the following:
I have a column of strings; each string can contain several substrings separated by the delimiter ":".
I need to concatenate the column's strings (I apply a filter here to keep only the interesting rows),
then split the result on the delimiter ":",
and keep each substring only once if it is repeated.
Example:
ColumnHeader
AA:BB:CC
BB:DD
DD:AA:EE
EE:AA:DD:BB
BB:EE
...
The expected result is a single string:
"AA:BB:CC:DD:EE"
How would you do this in DAX to fill a new column?
I expected to find for/while loops in DAX, as in Python, but found none.
I tried this:
List =
VAR SIn = ""
VAR SOut = ""
VAR Cursor = 0
VAR SList =
CONCATENATEX(
FILTER(ATable, ATable[Name] = CTable[Name]),
[ColumnHeader],
":")
VAR pos1 = FIND(":", SList, Cursor, len(SList))
VAR pos2 = FIND(":", SList, pos1, len(SList))
VAR elem = TRIM(MID(SList, pos1+1, pos2-pos1))
// following is not good but is what I would like to do:
VAR SOut = CONCATENATE(SOut, elem)
VAR SList = MID(SList, pos2, len(SList)-pos2)
VAR Cursor = pos2
// I need to loop here, but how, given that DAX has no for/while loops?
Thanks for your help.
=====================================
I managed to solve this thanks to the answers below.
I am still adding a bigger data set to make the overall problem easier to understand.
I have 2 tables:
TABLE_BY_ELEMENT
KEY  GROUP  LIST      KEY_DATA
1    G1     AA:BB:FF  11
2    G1     CC:AA     22
3    G1     FF:DD:AA  33
4    G1     CC:DD:AA  44
5    G2     CC:FF:GG  55
6    G2     BB:AA     66
TABLE_BY_GROUP
GROUP  GROUP_DATA
G1     1111
G2     2222
And I want to view the data like this:
RESULT_BY_GROUP
GROUP  GROUP_DATA  NewList
G1     1111        AA:BB:FF:CC:DD
G2     2222        CC:FF:GG:BB:AA
and also:
RESULT_ELEMENT
KEY  LIST      KEY_DATA
1    AA:BB:FF  11
2    CC:AA     22
3    FF:DD:AA  33
4    CC:DD:AA  44
5    CC:FF:GG  55
6    BB:AA     66
I hope it is easier to understand with this.

DAX isn't well suited to this. If you need DAX in order to make the result a dynamic measure, you'll probably need to reshape your data into a more usable form first. For example,
ID ColumnHeader
1 AA
1 BB
1 CC
2 BB
2 DD
3 DD
3 AA
3 EE
...
You can do this split in the query editor using the Split Column > By Delimiter tool and choosing to split on the colon and expand into rows.
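As a rough sketch of what that query-editor step amounts to, the following M query splits each string on the colon and expands the pieces into rows. The inline sample table and the step names are assumptions for illustration; in a real model, Source would be your own table.

```powerquery
let
    // Hypothetical sample data standing in for the real table
    Source = Table.FromRecords({
        [ID = 1, ColumnHeader = "AA:BB:CC"],
        [ID = 2, ColumnHeader = "BB:DD"],
        [ID = 3, ColumnHeader = "DD:AA:EE"]
    }),
    // Turn each delimited string into a list of substrings
    SplitToLists = Table.TransformColumns(
        Source,
        {{"ColumnHeader", each Text.Split(_, ":")}}
    ),
    // Expand the lists: one row per substring, with the ID repeated
    ExpandedToRows = Table.ExpandListColumn(SplitToLists, "ColumnHeader")
in
    ExpandedToRows
```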
Once it's in this more usable format, you can work with it in DAX like this:
List = CONCATENATEX( VALUES('Table'[ColumnHeader]), 'Table'[ColumnHeader], ":" )
Borrowing logic from here, it's possible to do this purely in DAX, but I don't recommend this route.
List =
VAR LongString =
    CONCATENATEX ( VALUES ( 'Table1'[ColumnHeader] ), Table1[ColumnHeader], ":" )
VAR StringToPath =
    SUBSTITUTE ( LongString, ":", "|" )
VAR PathToTable =
    ADDCOLUMNS (
        GENERATESERIES ( 1, LEN ( StringToPath ) ),
        "Item", PATHITEM ( StringToPath, [Value] )
    )
VAR GroupItems =
    FILTER (
        SUMMARIZE ( PathToTable, [Item] ),
        NOT ISBLANK ( [Item] )
    )
RETURN
    CONCATENATEX ( GroupItems, [Item], ":" )

Assume your table looks like the sample in the question (a single ColumnHeader column).
Now try the following code in the Advanced Editor of the Power Query Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnS0cnKycnZWitWJVgKyXFzALBcXK6CMqyuY4+oK4gCFnJxgykAysQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ColumnHeader = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ColumnHeader", type text}}),
//--NEW STEPS STARTS FROM HERE
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "ColumnHeader"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Reordered Columns", "ColumnHeader", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"ColumnHeader.1", "ColumnHeader.2", "ColumnHeader.3", "ColumnHeader.4"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"ColumnHeader.1", type text}, {"ColumnHeader.2", type text}, {"ColumnHeader.3", type text}, {"ColumnHeader.4", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Index"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute", "Index"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Sorted Rows" = Table.Sort(#"Removed Duplicates",{{"Value", Order.Ascending}}),
#"Added Index1" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1, Int64.Type),
#"Reordered Columns1" = Table.ReorderColumns(#"Added Index1",{"Index", "Value"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Reordered Columns1", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Reordered Columns1", {{"Index", type text}}, "en-US")[Index]), "Index", "Value", List.Max),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",{"1", "2", "3", "4", "5"},Combiner.CombineTextByDelimiter(":", QuoteStyle.None),"Merged")
in
#"Merged Columns"
The final output is a single row with one column, Merged, holding the distinct values joined by ":".
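The query above hard-codes the number of split columns and the merged column names "1" to "5". As a hedged alternative sketch, list functions can do the same job without fixed column counts. The sample table below is an assumption built from the question's example rows, and the step names are illustrative only.

```powerquery
let
    // Hypothetical sample data taken from the question's example
    Source = Table.FromRecords({
        [ColumnHeader = "AA:BB:CC"],
        [ColumnHeader = "BB:DD"],
        [ColumnHeader = "DD:AA:EE"],
        [ColumnHeader = "EE:AA:DD:BB"],
        [ColumnHeader = "BB:EE"]
    }),
    // Split every string and flatten the pieces into one list
    AllParts = List.Combine(
        List.Transform(Source[ColumnHeader], each Text.Split(_, ":"))
    ),
    // Keep each substring once, sorted ascending like the query above
    UniqueSorted = List.Sort(List.Distinct(AllParts)),
    // Join back into a single delimited string
    Result = Text.Combine(UniqueSorted, ":")
in
    Result
```

For these sample rows the query evaluates to the text AA:BB:CC:DD:EE.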

Here is Power Query code that takes the GROUP column into account.
Create a new table RESULT_BY_GROUP with the code below:
let
Source = TABLE_BY_ELEMENT,
#"Removed Columns" = Table.RemoveColumns(Source,{"KEY", "KEY_DATA"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns", "LIST", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"LIST.1", "LIST.2", "LIST.3"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"LIST.1", type text}, {"LIST.2", type text}, {"LIST.3", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"GROUP"}, "Attribute", "Value"),
#"Removed Columns1" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns1"),
#"Sorted Rows" = Table.Sort(#"Removed Duplicates",{{"GROUP", Order.Ascending}, {"Value", Order.Ascending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"GROUP"}, {{"all", each _, type table [GROUP=nullable text, Value=text]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "NewList", each [all][Value]),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"NewList", each Text.Combine(List.Transform(List.Sort(_), Text.From), ":"), type text}),
#"Removed Columns2" = Table.RemoveColumns(#"Extracted Values",{"all"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns2", {"GROUP"}, TABLE_BY_GROUP, {"GROUP"}, "TABLE_BY_GROUP", JoinKind.LeftOuter),
#"Expanded TABLE_BY_GROUP" = Table.ExpandTableColumn(#"Merged Queries", "TABLE_BY_GROUP", {"GROUP_DATA "}, {"TABLE_BY_GROUP.GROUP_DATA "}),
#"Renamed Columns" = Table.RenameColumns(#"Expanded TABLE_BY_GROUP",{{"TABLE_BY_GROUP.GROUP_DATA ", "GROUP_DATA"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"GROUP", type text}, {"NewList", type text}, {"GROUP_DATA", Int64.Type}})
in
#"Changed Type1"
The resulting table RESULT_BY_GROUP has the columns GROUP, NewList and GROUP_DATA.
Your second requirement, RESULT_ELEMENT, can be displayed directly from your base table TABLE_BY_ELEMENT.
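The grouping step can also be sketched more compactly by doing the split, de-duplication and join inside the Table.Group aggregation. This is only an illustration: the inline table is a hypothetical copy of TABLE_BY_ELEMENT from the question, and the merge with TABLE_BY_GROUP is left out.

```powerquery
let
    // Hypothetical copy of TABLE_BY_ELEMENT from the question
    TABLE_BY_ELEMENT = Table.FromRecords({
        [KEY = 1, GROUP = "G1", LIST = "AA:BB:FF", KEY_DATA = 11],
        [KEY = 2, GROUP = "G1", LIST = "CC:AA", KEY_DATA = 22],
        [KEY = 3, GROUP = "G1", LIST = "FF:DD:AA", KEY_DATA = 33],
        [KEY = 4, GROUP = "G1", LIST = "CC:DD:AA", KEY_DATA = 44],
        [KEY = 5, GROUP = "G2", LIST = "CC:FF:GG", KEY_DATA = 55],
        [KEY = 6, GROUP = "G2", LIST = "BB:AA", KEY_DATA = 66]
    }),
    // For each group: split every LIST value, flatten, de-duplicate, re-join
    Grouped = Table.Group(
        TABLE_BY_ELEMENT,
        {"GROUP"},
        {{
            "NewList",
            each Text.Combine(
                List.Distinct(
                    List.Combine(List.Transform([LIST], each Text.Split(_, ":")))
                ),
                ":"
            ),
            type text
        }}
    )
in
    Grouped
```

Because List.Distinct keeps the first occurrence of each item, this sketch returns AA:BB:FF:CC:DD for G1 and CC:FF:GG:BB:AA for G2, which is the order shown in the question. The longer query above sorts the values instead; wrap the list in List.Sort if sorted output is preferred.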

Related

Unpivot Several Columns to Result in Two

I receive data that looks like this:
Name   01/01/2023  01/02/2023  Revenue  Revenue
Chris  1           3           £100     £300
Colin  5           8           £500     £800
Pete   2           5           £200     £500
Name is self-explanatory, the next two columns are dates (in UK format) holding the number of days worked in that period, and the final two columns are the revenue for each period.
I want to modify this data in Power Query so it looks like this:
Name   Date        Work Days  Revenue
Chris  01/01/2023  1          £100
Chris  01/02/2023  3          £300
Colin  01/01/2023  5          £500
Colin  01/02/2023  8          £800
Pete   01/01/2023  2          £200
Pete   01/02/2023  5          £500
I thought this would be some kind of a pivot operation but I can't figure it out.
Any assistance will be gratefully received.
Thanks,
Chris

One simple way
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Set0=List.FirstN(Table.ColumnNames(Source),1),
Set1= List.Combine({Set0,List.Alternate(Table.ColumnNames(Source),1,1)}),
Set2=List.Combine({Set0,List.Alternate(List.RemoveFirstN(Table.ColumnNames(Source),1),1,1)}),
Part1 = Table.SelectColumns(Source,Set1),
Part2 = Table.SelectColumns(Source,Set2),
Date1 = Table.AddColumn(Part1,"Date" , each Table.ColumnNames(Part1){1}),
Date2 = Table.AddColumn(Part2,"Date" , each Table.ColumnNames(Part2){1}),
Rename1 = Table.RenameColumns(Date1,{{Table.ColumnNames(Part1){2}, "Revenue"}, {Table.ColumnNames(Part1){1}, "Work Days"}}),
Rename2 = Table.RenameColumns(Date2,{{Table.ColumnNames(Part2){2}, "Revenue"}, {Table.ColumnNames(Part2){1}, "Work Days"}}),
combined = Rename1 & Rename2
in combined
or
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index", "Name"}, "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Date", each if Text.Start([Attribute],3)="Rev" then null else [Attribute]),
#"Added Custom2" = Table.AddColumn(#"Added Custom", "count", each if Text.Start([Attribute],3)="Rev" then null else [Value]),
#"Added Index1" = Table.AddIndexColumn(#"Added Custom2", "Index.1", 0, 1, Int64.Type),
#"Inserted Modulo" = Table.AddColumn(#"Added Index1", "Modulo", each Number.Mod([Index.1], 2), type number),
#"Sorted Rows" = Table.Sort(#"Inserted Modulo",{{"Index", Order.Ascending}, {"Modulo", Order.Ascending}, {"Attribute", Order.Ascending}}),
#"Filled Down" = Table.FillDown(#"Sorted Rows",{"Date", "count"}),
x=Table.AlternateRows(#"Filled Down",0,1,1),
#"Removed Other Columns" = Table.SelectColumns(x,{"Name", "Value", "Date", "count"})
in #"Removed Other Columns"

Here's another way:
Using List.Generate, create a list of tables, one for each Date/Revenue pair.
For each of the tables, ensure the Revenue column is named Revenue (and not Revenue2, Revenue3, etc.) and then unpivot the table.
Then expand the column that holds the list of tables.
The rest is "housekeeping".
The code was edited to allow for a varying number of "first columns" to be retained before the sets of Date and Revenue columns.
*Change #"Retained Column Count" to reflect that number of columns.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
colNames = List.Buffer(Table.ColumnNames(Source)),
//How many columns at the beginning are non data pairs
#"Retained Column Count" = 4,
#"First Columns" = List.FirstN(colNames,#"Retained Column Count"),
#"Date Columns" = List.Range(colNames,#"Retained Column Count",(List.Count(colNames)-#"Retained Column Count")/2),
#"Revenue Columns" = List.LastN(colNames,List.Count(#"Date Columns")),
//set data types
types = List.Transform(#"First Columns", each {_, type text}) &
List.Transform(#"Date Columns", each {_, Int64.Type}) &
List.Transform(#"Revenue Columns", each {_, Currency.Type}),
#"Changed Type" = Table.TransformColumnTypes(Source, types, "en-GB"),
//create a list of tables consisting of each date/revenue pair
// then unpivot each table
// ensure Revenue column has the same name throughout
#"Data Pairs" = List.Generate(
()=>[t=Table.SelectColumns(#"Changed Type",#"First Columns" & {#"Date Columns"{0}} & {#"Revenue Columns"{0}}), idx=0],
each [idx] < List.Count(#"Date Columns"),
each [t=Table.SelectColumns(#"Changed Type",#"First Columns" & {#"Date Columns"{[idx]+1}} & {#"Revenue Columns"{[idx]+1}}), idx=[idx]+1],
each Table.Unpivot(
Table.RenameColumns([t], {Table.ColumnNames([t]){#"Retained Column Count"+1},"Revenue"}),
{#"Date Columns"{[idx]}},"Date","Work Days")),
//to a table
// then combine the tables with column names in desired order
#"Converted to Table" = Table.FromList(#"Data Pairs", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandTableColumn(#"Converted to Table", "Column1", #"First Columns" & {"Date","Work Days","Revenue"}),
#"Changed Type with Locale" = Table.TransformColumnTypes(#"Expanded Column1",
List.Transform(#"First Columns", each {_, type text}) &
{{"Date", type date},
{"Work Days", Int64.Type},
{"Revenue", Currency.Type}}, "en-GB"),
#"Sorted Rows" = Table.Sort(#"Changed Type with Locale",{{"Name", Order.Ascending}, {"Date", Order.Ascending}})
in
#"Sorted Rows"

Dynamically pivot n rows with new row for multiple values

I wish to transform the input to the output shown below:
However, columns D and K each have multiple values, which causes an error:
M Code to replicate the above:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Column1]), "Column1", "Column2")
in
#"Pivoted Column"
I have also attempted to add an index column so that each data point is unique but this leads to further problems.
So far I have actually grouped the data so that everything is contained in a single cell:
Current M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type1" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type1", {"Column1"}, {{"Column2", each Text.Combine([Column2],"#(lf)"), type text}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column1]), "Column1", "Column2")
in
#"Pivoted Column"
However, this requires the data to be split afterwards. That is fine, but I want to do this dynamically for n columns (e.g. A-B or A-AZ), and this just shifts the problem: I then have to split n columns dynamically.
Input data:
Column1 Column2
A 1
B 1
C 2
D 3
D 3
D 1
E 2
F 1
G 2
H 1
I 2
J 3
K 1
K 2
L 1
M 2
N 3

This is a fairly common problem that can be solved with a custom function. Check the link in the credits for an explanation:
Custom Function
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
//Rename: fnPivotAll
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Main Code
let
//change next line to reflect your actual data source
Source = Excel.CurrentWorkbook(){[Name="Table24"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", Int64.Type}}),
pivot = fnPivotAll(#"Changed Type","Column1","Column2")
in
pivot

This seems to work fine: (a) group and add an index inside each group, (b) expand, (c) pivot.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Column1"}, {{"data", each Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type), type table}}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Column2", "Index"}, {"Column2", "Index"}),
#"Pivoted Column" = Table.Pivot(#"Expanded data", List.Distinct(#"Expanded data"[Column1]), "Column1", "Column2"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns"

Transpose data based on the proper pattern

This is what I want the data to look like when everything is done and I have transposed it.
Data
2 Witches Winery and Brewing Company
209 Trade Street
Danville, VA 24541-3545
Phone: (434) 549-2739
Type: Taproom
www.2witcheswinebrew.com
View Map
36 Fifty Brewing
120 N Chestnut St
Marion, VA 24354
Type: Taproom
View Map
6 Bears & A Goat Brewing Company, LLC
1140 International Pkwy
Fredericksburg, VA 22406-1126
Phone: 540-356-9056 Ext. 2
Type: Brewpub
www.6bgbrewingco.com
View Map
Each block of cells represents ONE brewery. I am trying to transpose the data so that each brewery becomes one row. Here is the problem: not all the values are in the same position. The first 3 rows are always the same for every brewery. The 4th row of each brewery is where it gets tricky. Not all breweries have a phone number, so after transposing, the data does not end up in the right columns. The type should normally be in the 5th row, but when there is no phone number it is in the 4th row. About 20% of the data is like this. Does anyone have any recommendations?
Apologies, I forgot to add what I have tried; it doesn't work as expected.
// Table2
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Helper_Column", each if Text.Contains([Column1],"Phone:") then "1 #1" else null),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Added Custom", {"Helper_Column"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Removed Errors", {{"Helper_Column", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Helper_Column"),
#"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "Helper Column 1", each if [Helper_Column] = "#1" then null else [Column1]),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom1",{"Helper Column 1"}),
#"Added Custom2" = Table.AddColumn(#"Removed Other Columns", "Helper Column", each if Text.Contains([Helper Column 1],"View Map") then "1 #1" else null),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Added Custom2", {{"Helper Column", null}}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Errors", {{"Helper Column", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Helper Column"),
#"Added Custom3" = Table.AddColumn(#"Split Column by Delimiter1", "Helper", each if [Helper Column] = "#1" then null else [Helper Column 1]),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"Helper"}),
#"Added Index" = Table.AddIndexColumn(#"Removed Other Columns1", "Index", 0, 1, Int64.Type),
#"Inserted Modulo" = Table.AddColumn(#"Added Index", "Modulo", each Number.Mod([Index], 8), type number),
#"Integer-Divided Column" = Table.TransformColumns(#"Inserted Modulo", {{"Index", each Number.IntegerDivide(_, 8), Int64.Type}}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Integer-Divided Column", {{"Modulo", type text}}, "en-IN"), List.Distinct(Table.TransformColumnTypes(#"Integer-Divided Column", {{"Modulo", type text}}, "en-IN")[Modulo]), "Modulo", "Helper")
in
#"Pivoted Column"

It depends on how realistic your example is. But the code below may help. It works on your posted data.
But you need to have unambiguous rules.
I derived some from your data and what you wrote, and noted them in the code comments. Of course, if your actual data doesn't follow these rules, the algorithm will not work. And if that is the case, you will have to modify the rules.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//assuming each group is contiguous lines
// with a blank line inbetween each group
// the below few lines will create a column on which to group
// then remove the "blank line between"
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "group", each if [Column1] = null then [Index] else null, Int64.Type),
#"Filled Up" = Table.FillUp(#"Added Custom",{"group"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Up",{"Index"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([Column1] <> null)),
//Group, then extract the data
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"group"}, {
//Line one is always the brewery name
{"Brewery Name", each [Column1]{0}, type text},
//Lines 2 and 3 are always the address
{"Address Part 1", each [Column1]{1}, type text},
{"Address Part 2", each [Column1]{2}, type text},
//Phone number starts with "Phone:"
{"Phone", each List.Accumulate([Column1], "", (state, current)=>
if Text.StartsWith(current,"Phone:") then state & current else state), type text},
//Type starts with "Type:"
{"Type", each List.Accumulate([Column1], "", (state, current)=>
if Text.StartsWith(current,"Type:") then state & current else state), type text},
//Other 1 starts with "www."
{"Other 1", each List.Accumulate([Column1], "", (state, current)=>
if Text.StartsWith(current,"www.") then state & current else state), type text},
//Other 2 is the last line
{"Other 2", each List.Last([Column1]), type text}
}),
//Remove the grouper column
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"group"})
in
#"Removed Columns1"
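The prefix-based extraction used in the grouping step can also be sketched with List.Select and List.First, which returns null instead of an empty string when a line is missing. FirstWithPrefix is a hypothetical helper name, and the sample list is one brewery block from the question that has no phone line.

```powerquery
let
    // Hypothetical lines for one brewery (this block has no "Phone:" line)
    Lines = {
        "36 Fifty Brewing",
        "120 N Chestnut St",
        "Marion, VA 24354",
        "Type: Taproom",
        "View Map"
    },
    // Hypothetical helper: first line starting with the prefix, or null if none
    FirstWithPrefix = (lines as list, prefix as text) as nullable text =>
        List.First(List.Select(lines, each Text.StartsWith(_, prefix)), null),
    Result = [
        Phone = FirstWithPrefix(Lines, "Phone:"),
        Type = FirstWithPrefix(Lines, "Type:")
    ]
in
    Result
```

For this block, Phone is null and Type is "Type: Taproom", so a missing phone number no longer shifts the remaining fields.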

How do I remove duplicates WITHIN a row using Power Query?

I have data that looks like this:
Wire  Point1  Point2  Point3  Point4  Point5  Point6
A     WP1     WP1     WP2     WP2
B     WP3     WP4     WP3     WP4
C     WP5     WP5     WP6     WP7     WP6     WP7
(note the varying lengths of each row, and the duplicates)
I would like to have the end result be:
Wire  Point1  Point2  Point3
A     WP1     WP2
B     WP3     WP4
C     WP5     WP6     WP7
Duplicates removed, and blank spaces removed.
This would be VERY similar to the =UNIQUE() function, but that is not available in power query.

It's a lot easier to work with columns, so I'd recommend unpivoting the Point columns, removing duplicates, and then putting it into the shape you want.
Here's a full query you can paste into your Advanced Editor to look at each step more closely:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUQoPMEQijYAkBIHYsTrRSk5gtjGYNIGzYWpMwGqcwWxTJNIMTJqjsGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Wire = _t, Point1 = _t, Point2 = _t, Point3 = _t, Point4 = _t, Point5 = _t, Point6 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Wire", type text}, {"Point1", type text}, {"Point2", type text}, {"Point3", type text}, {"Point4", type text}, {"Point5", type text}, {"Point6", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Wire"}, "Attribute", "Value"),
#"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"Wire", "Value"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Wire"}, {{"Point", each Text.Combine([Value],","), type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Point", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Point1", "Point2", "Point3"})
in
#"Split Column by Delimiter"
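If unpivoting is not wanted, the de-duplication can also be sketched row by row with list functions. This is a minimal illustration, assuming blank cells arrive as null (if they are empty strings, they would need to be filtered out as in the query above); the inline table and the UniquePoints column name are hypothetical.

```powerquery
let
    // Hypothetical copy of the question's data, with blanks as null
    Source = #table(
        {"Wire", "Point1", "Point2", "Point3", "Point4", "Point5", "Point6"},
        {
            {"A", "WP1", "WP1", "WP2", "WP2", null, null},
            {"B", "WP3", "WP4", "WP3", "WP4", null, null},
            {"C", "WP5", "WP5", "WP6", "WP7", "WP6", "WP7"}
        }
    ),
    // For each row: take all values after Wire, drop nulls, keep each point once
    WithUnique = Table.AddColumn(
        Source,
        "UniquePoints",
        each List.Distinct(List.RemoveNulls(List.Skip(Record.ToList(_), 1))),
        type list
    ),
    Result = Table.SelectColumns(WithUnique, {"Wire", "UniquePoints"})
in
    Result
```

Each row then carries a list such as {"WP1", "WP2"} for wire A, which can be joined with Text.Combine and split into columns as in the query above.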

Unpivot
Group by Wire
Aggregate into sorted List of Unique Points
Calculate Max number of items in all the Lists to use in the later Column Splitter
Extract the List of points into semicolon separated string
Split into new columns
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source, List.Transform(Table.ColumnNames(Source), each {_, Text.Type})),
//Unpivot
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Wire"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Wire
//Aggregate by sorted, unique list of Points for each Wire
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Wire"}, {
{"Point", each List.Sort(List.Distinct([Value]))}}),
//Calculate the max number of unique Points for any Wire (for subsequent splitting)
maxPoints = List.Max(List.Transform(#"Grouped Rows"[Point], each List.Count(_))),
//Extract the List values into a semicolon separated list
#"Extracted Values" = Table.TransformColumns(#"Grouped Rows",
{"Point", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
//Then split into new columns using the semicolon delimiter
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Point",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv),maxPoints)
in
#"Split Column by Delimiter"

Transpose table partially

I have a CSV file containing information about some production batches. When loaded into Excel's Power Query editor, the table looks like this:
Batch  Date        RawMaterial1  RawMaterial2  RawMaterial3  Amount1  Amount2  Amount3
123    01.01.2020  Fe            Cr            Ni            70       19       11
234    01.02.2020  Fe            Cr            Ni            72       17       9
To make this table more readable, I'm looking for a way to transpose it just partially to transform it into a format like this:
Batch  Date        RawMaterials  Amounts
123    01.01.2020  Fe            70
                   Cr            19
                   Ni            11
234    01.02.2020  Fe            72
                   Cr            17
                   Ni            9
Is there a way to realize this with PowerQueryM alone?

This can be done quite a bit more simply:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Batch", Int64.Type}, {"Date", type date}, {"RawMaterial1", type text}, {"RawMaterial2", type text}, {"RawMaterial3", type text}, {"Amount1", Int64.Type}, {"Amount2", Int64.Type}, {"Amount3", Int64.Type}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Batch", "Date"}, "Attribute", "Value"),
#"Split Column by Character Transition" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Type", "Index"}),
#"Pivoted Column" = Table.Pivot(#"Split Column by Character Transition", List.Distinct(#"Split Column by Character Transition"[Type]), "Type", "Value")
in
#"Pivoted Column"
Unpivot all but the first two columns.
Split the Attribute column into the text part and index part (in the GUI: Transform > Split Column > By Non-Digit to Digit).
Pivot back on the text part column (choose Don't Aggregate in the Pivot Column Advanced options).
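To see what the character-transition splitter in the second step does in isolation, here is a small sketch that applies the same splitter to two of the column names. The step names are illustrative only.

```powerquery
let
    // Split where a non-digit character is followed by a digit
    SplitAtDigit = Splitter.SplitTextByCharacterTransition(
        (c) => not List.Contains({"0".."9"}, c),
        {"0".."9"}
    ),
    // Apply the splitter to two of the original column names
    Result = List.Transform({"RawMaterial1", "Amount3"}, SplitAtDigit)
in
    Result
```

This returns {{"RawMaterial", "1"}, {"Amount", "3"}}: the text part becomes the Type value that is pivoted back into columns, and the digit becomes the Index that keeps each material paired with its amount.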

This seems to work
Unpivots all columns but first two
Duplicate the data column
Change the column type to number to force an error on the non numerical columns
Change all errors into something recognizable, like 999999999999
Filter based on that into two tables, and add an index to each table
Merge the two tables together
Add new column, using index to see if Batch is same as prior row to eliminate duplicates
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Date", "Batch"}, "Attribute", "Value"),
#"Duplicated Column" = Table.DuplicateColumn(#"Unpivoted Other Columns", "Value", "Value - Copy"),
#"Changed Type" = Table.TransformColumnTypes(#"Duplicated Column",{{"Value - Copy", type number}}),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type", {{"Value - Copy", 999999999999999}}),
#"Filtered Rows" = Table.SelectRows(#"Replaced Errors", each ([#"Value - Copy"] = 999999999999999)),
#"Filtered Rows2" = Table.SelectRows(#"Replaced Errors", each ([#"Value - Copy"] <> 999999999999999)),
Index1 = Table.AddIndexColumn(#"Filtered Rows", "Index", 0, 1),
Index2 = Table.AddIndexColumn(#"Filtered Rows2", "Index", 0, 1),
#"Merged Queries" = Table.NestedJoin(Index2,{"Index"},Index1,{"Index"},"Index3",JoinKind.LeftOuter),
#"Expanded Index3" = Table.ExpandTableColumn(#"Merged Queries", "Index3", {"Value"}, {"Value.1"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Index3",{"Attribute", "Value - Copy"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Batch.1", each if [Index] = 0 then [Batch] else if #"Removed Columns"{[Index]-1}[Batch] = [Batch] then null else [Batch]),
#"Added Custom2" = Table.AddColumn(#"Added Custom", "Date.1", each if [Index] = 0 then [Date] else if #"Removed Columns"{[Index]-1}[Batch] = [Batch] then null else [Date]),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom2",{"Batch", "Date", "Index"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"Batch.1", "Date.1", "Value.1", "Value"})
in #"Reordered Columns"

First of all, a big thank you to @horseyride. I learned a lot from your suggested code. Sadly, when I tried to add the Date column to the unpivot area as well, I found a small flaw in the code. Thanks to what I learned from it, though, I was able to produce a slightly more generic version that basically follows the same algorithm.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}),
#"Unpivot Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Batch"}, "Attribut", "Wert"),
Filter1 = Table.SelectRows(#"Unpivot Columns", each ([Attribut] <> "Amount1" and [Attribut] <> "Amount2" and [Attribut] <> "Amount3")),
ModFilter1 = Table.AddColumn(Filter1, "Benutzerdefiniert", each if [Attribut] = "Date" then [Attribut] else [Wert], type text),
Filter2 = Table.SelectRows(#"Unpivot Columns", each ([Attribut] <> "RawMaterial1" and [Attribut] <> "RawMaterial2" and [Attribut] <> "RawMaterial3")),
#"IndexFilter1" = Table.AddIndexColumn(ModFilter1, "Index", 0, 1),
#"IndexFilter2" = Table.AddIndexColumn(Filter2, "Index", 0, 1),
#"Join Filtered Indexes" = Table.NestedJoin(IndexFilter1,{"Index"},IndexFilter2,{"Index"},"IndexFilter2",JoinKind.LeftOuter),
#"Expand Joined Column" = Table.ExpandTableColumn(#"Join Filtered Indexes", "IndexFilter2", {"Wert"}, {"IndexFilter2.Wert"}),
#"Remove Columns" = Table.RemoveColumns(#"Expand Joined Column",{"Index", "Attribut", "Wert"}),
#"Rename Columns" = Table.RenameColumns(#"Remove Columns",{{"Benutzerdefiniert", "Attribut"}, {"IndexFilter2.Wert", "Wert"}})
in
#"Rename Columns"
I am keeping horseyride's answer marked as the accepted answer, since it solves my initial question as originally posed.
