How do I remove duplicates WITHIN a row using Power Query? - excel

I have data that looks like this:
Wire
Point1
Point2
Point3
Point4
Point5
Point6
A
WP1
WP1
WP2
WP2
B
WP3
WP4
WP3
WP4
C
WP5
WP5
WP6
WP7
WP6
WP7
(note the varying lengths of each row, and the duplicates)
I would like to have the end result be:
Wire
Point1
Point2
Point3
A
WP1
WP2
B
WP3
WP4
C
WP5
WP6
WP7
Duplicates removed, and blank spaces removed.
This would be VERY similar to the =UNIQUE() function, but that is not available in power query.

It's a lot easier to work with columns, so I'd recommend unpivoting the Point columns, removing duplicates, and then putting it into the shape you want.
Here's a full query you can past into your Advanced Editor to look at each step more closely:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUQoPMEQijYAkBIHYsTrRSk5gtjGYNIGzYWpMwGqcwWxTJNIMTJqjsGNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Wire = _t, Point1 = _t, Point2 = _t, Point3 = _t, Point4 = _t, Point5 = _t, Point6 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Wire", type text}, {"Point1", type text}, {"Point2", type text}, {"Point3", type text}, {"Point4", type text}, {"Point5", type text}, {"Point6", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Wire"}, "Attribute", "Value"),
#"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Value] <> "")),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"Wire", "Value"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Wire"}, {{"Point", each Text.Combine([Value],","), type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Point", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Point1", "Point2", "Point3"})
in
#"Split Column by Delimiter"

Unpivot
Group by Wire
Aggregate into sorted List of Unique Points
Calculate Max number of items in all the Lists to use in the later Column Splitter
Extract the List of points into semicolon separated string
Split into new columns
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source, List.Transform(Table.ColumnNames(Source), each {_, Text.Type})),
//Unpivot
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Wire"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Group by Wire
//Aggregate by sorted, unique list of Points for each Wire
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Wire"}, {
{"Point", each List.Sort(List.Distinct([Value]))}}),
//Calculate the Max unique Points for any Wire (for subsequent splitting
maxPoints = List.Max(List.Transform(#"Grouped Rows"[Point], each List.Count(_))),
//Extract the List values into a semicolon separated list
#"Extracted Values" = Table.TransformColumns(#"Grouped Rows",
{"Point", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
//Then split into new columns using the semicolon delimiter
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Point",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv),maxPoints)
in
#"Split Column by Delimiter"

Related

Dynamically pivot n rows with new row for multiple values

I wish to transform the input to the output shown below:
However for Columns D and K there are multiple values for D and K which causes and error:
M Code to replicate the above:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[Column1]), "Column1", "Column2")
in
#"Pivoted Column"
I have also attempted to add an index column so that each data point is unique but this leads to further problems.
So far I have actually grouped the data so that everything is contained in a single cell:
Current M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type1" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type1", {"Column1"}, {{"Column2", each Text.Combine([Column2],"#(lf)"), type text}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column1]), "Column1", "Column2")
in
#"Pivoted Column"
However this requires the data to be split which is fine but I want to do this dynamically for n Columns i..e A-B or A-AZ and this kind of shifts the problems so that I have to dynamically Split n columns.
Input data:
Column1 Column2
A 1
B 1
C 2
D 3
D 3
D 1
E 2
F 1
G 2
H 1
I 2
J 3
K 1
K 2
L 1
M 2
N 3
A not unusual problem solved by a custom function. Check the link in the credits for an explanation:
Custom Function
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
//Rename: fnPivotAll
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Main Code
let
//change next line to reflect your actual data source
Source = Excel.CurrentWorkbook(){[Name="Table24"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", Int64.Type}}),
pivot = fnPivotAll(#"Changed Type","Column1","Column2")
in
pivot
Results from your data
This seems to work fine (a) group and add index inside group (b) expand (c) pivot
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Column1"}, {{"data", each Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type), type table}}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Column2", "Index"}, {"Column2", "Index"}),
#"Pivoted Column" = Table.Pivot(#"Expanded data", List.Distinct(#"Expanded data"[Column1]), "Column1", "Column2"),
#"Removed Columns" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns"

Compare columns return maximum power query

I have data from multiple suppliers which I wish to compare. The data shown in the image below has been previously transformed via a series of steps using power query. The final step was to pivot the Supplier column (in this example consisting of X,Y,Z) so that these new columns can be compared and the maximum value is returned.
How can I compare the values in columns X, Y and Z to do this? Importantly, X Y and Z arent necessarily the only suppliers. If I Add Say A as a new supplier to the original data, a new column A will be generated and I wish to include this in the comparison so that at column at the end outputs the highest value found for each row. So reading from the top down it would read in this example: 3,3,1,1,5,0.04,10 etc.
Thanks
Link to file https://onedrive.live.com/?authkey=%21AE_6NgN3hnS6MpA&id=8BA0D02D4869CBCA%21763&cid=8BA0D02D4869CBCA
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Residual Solvents", type text}, {"RMQ", type text}, {"COA", type text}, {"Technical Data Sheet", type text}}),
//Replace Time and null with blank
#"Replaced Value" = Table.ReplaceValue(#"Changed Type","00:00:00","",Replacer.ReplaceText,{"Material", "RMQ", "Residual Solvents", "Technical Data Sheet", "COA"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",null,"",Replacer.ReplaceValue,{"Material", "RMQ", "Residual Solvents", "Technical Data Sheet", "COA"}),
//Trims all whitespace from user
#"Power Trim" = Table.TransformColumns(#"Replaced Value1",{{"Material", #"PowerTrim", type text}, {"Residual Solvents", #"PowerTrim", type text}, {"RMQ", #"PowerTrim", type text}, {"COA", #"PowerTrim", type text}, {"Technical Data Sheet",#"PowerTrim", type text}}),
//Unpivot to develop a single column of solvent/metals/date data
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Power Trim", {"Material", "Supplier"}, "Attribute", "Value"),
//split into rows by line feed
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Unpivoted Other Columns",
{{"Value", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Value"),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Value", Text.Trim, type text}}),
//filter out the blank rows
#"Filtered Rows" = Table.SelectRows(#"Trimmed Text", each ([Value] <> "" and [Value] <> "Not Provided")),
//Add custom column for separating the tables
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each try Date.FromText([Value]) otherwise
if [Value] = "Heavy Metals" or [Value] = "Residual Solvents" or [Value] = "Other" then [Value] else null),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", type text}}),
#"Filled Down" = Table.FillDown(#"Changed Type1",{"Custom"}),
//Filter the value and custom columns to remove contaminant type from Value column and remove dates from Custom column
#"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([Custom] = "Heavy Metals" or [Custom] = "Residual Solvents") and ([Value] <> "Heavy Metals" and [Value] <> "Residual Solvents")),
//split substance from amount
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows1", "Value",
Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Substance", "Amount"}),
//Filter for Solvents Table
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter1", each ([Custom] = "Heavy Metals")),
#"Changed Type2" = Table.TransformColumnTypes(#"Filtered Rows2",{{"Amount", type number}}),
//Group by Material and Substance, then extract the Max contaminant and Source
#"Grouped Rows" = Table.Group(#"Changed Type2", {"Substance","Material", "Supplier"}, {
{"Amount", each List.Max([Amount]), type number},
{"Source", (t) => t[Attribute]{List.PositionOf(t[Amount],List.Max(t[Amount]))}, type text}
}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Substance", Order.Ascending}}),
//PIVOT to compare suppliers
#"Pivoted Column" = Table.Pivot(#"Sorted Rows", List.Distinct(#"Sorted Rows"[Supplier]), "Supplier", "Amount", List.Sum)
in
#"Pivoted Column"
Add an Index Column starting with zero (0).
Add a Custom Column:
=List.Max(
Record.ToList(
Table.SelectColumns(#"Added Index",
List.RemoveFirstN(
Table.ColumnNames(#"Pivoted Column"),3)){[Index]}))
Then remove the Index column
Algorithm
Generate a list of the relevant column names to sum.
We exclude the first three column names from this list
Note that we refer to the step PRIOR to adding the Index column for the list of column names. If we referred to the actual previous step where we added the Index column, we'd also have to remove the Last column name
Select the relevant columns
{[Index]} will return a record corresponding to the Index number.
Convert the record to a list and use List.Max

Arranging values in a specific order using power query

I wish to clean up a table I have been working on removing complex formulas and opting to sort my data using PQE.
Below is a made up example of the data I am working with.
I wish to have the output column alternate between material and sub values akin to headers and sub headers in a book.
M Code so far:
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material ", type text}, {"Sub ", type text}, {"CAS", type text}}),
#"Trimmed Text" = Table.TransformColumns(#"Changed Type",{{"Material ", Text.Trim, type text}, {"Sub ", Text.Trim, type text}}),
#"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"Sub ", Text.Clean, type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Cleaned Text","-","",Replacer.ReplaceText,{"Sub "}),
#"Merged Columns" = Table.CombineColumns(#"Replaced Value",{"Material ", "Sub "},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged"),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Merged Columns", {{"Merged", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Merged"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Merged", type text}})
in
#"Changed Type1"
The problem with this code is that it doesn't properly list the materials when combined and I also don't know how to get the CAS numbers to be correctly sorted.
If anyone has any thoughts how to achieve the desired output it would be appreciated.
You can get your output from your input using the same technique I showed you in your last, similar question.
create two lists of the columns; then List.Zip to combine them.
The trick is that for List 1, (eg output column 1), you may need to add the contents of the Material column at the top of the Sub column list (or replace the - if that's all there is; and, if that is the case, add a - at the start of the CAS list; so things will line up at the end.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Material", type text}, {"Sub", type text}, {"CAS", type text}}),
//combine the columns
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
let
L1 = if [Sub] = "-" then {[Material]}
else List.Combine({{[Material]},Text.Split([Sub],"#(lf)")}),
L2 = if [Sub] = "-" then {[CAS]}
else List.Combine({{"-"},Text.Split([CAS],"#(lf)")})
in
List.Zip({L1,L2})),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Material", "Sub", "CAS"}),
//split the combined columns
#"Expanded Custom" = Table.ExpandListColumn(#"Removed Columns1", "Custom"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Custom",
{"Custom", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values",
"Custom", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Material", "CAS"})
in
#"Split Column by Delimiter"

Is there a Power Query function that seperates a comma seperate list into value columns?

Let's say that I have a column that has a comma seperated list of values in each row
Row ID | Symptoms
1 | Vomiting,Diarrhoea
2 | Diarrhoea
3 | Cough
Would it be possible to transform this into the following in Power Query?:
Row ID | Symptoms - Cough | Symptoms - Diarrhoea | Symptom - Vomiting
1 | No | Yes | Yes
2 | No | Yes | No
3 | Yes| No | No
Unfortunately could not a find a specific question and answers that tackles my challenge on Stack Overflow and tried looking up multiple videos on YouTube with limited success.
I suggest:
Split the symptom column by comma
Select the Row ID column and unpivot other columns
Add a custom column consisting of the word "Yes"
Pivot on the Value column (containing the symptoms) and Don't Aggregate
Replace the null with "No"
The code as written should adjust as you add/remove rows or symptoms from your orginal data table.
If you need to prefix all of the symptom columns with Symptoms - , that can be done with a few more lines of code.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Row ID", Int64.Type}, {"Symptoms", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Symptoms", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv)),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Row ID"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Trim, type text}}),
#"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each "Yes"),
#"Pivoted Column" = Table.Pivot(#"Added Custom", List.Distinct(#"Added Custom"[Value]), "Value", "Custom"),
#"Replaced Value" = Table.ReplaceValue(#"Pivoted Column",null,"No",Replacer.ReplaceValue,Table.ColumnNames(#"Pivoted Column"))
in
#"Replaced Value"
EDIT: Provoked by #AlexisOlson comment, I added a few lines of code to both alphabetize the symptom list, and also to prefix each symptom name with the string Symptoms - .
I also added numerous comments to the code, which you will also be able to see in the Applied Steps window.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Row ID", Int64.Type}, {"Symptoms", type text}}),
//Split the different symptoms into separate columns
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Symptoms", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv)),
//Unpivot other than "Row ID" so as to have all the symptoms in a single column
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Row ID"}, "Attribute", "Value"),
//Remove the Attribute column
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
//Trim the symptoms column to get rid of any leading spaces that might be there
#"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Value", Text.Trim, type text}}),
//Sort symptoms alphabetically
#"Sorted Rows" = Table.Sort(#"Trimmed Text",{{"Value", Order.Ascending}}),
//Create a new column with the "Yes"'s
#"Added Custom" = Table.AddColumn(#"Sorted Rows", "Custom", each "Yes"),
//Pivot on the symptoms column to create new table with each symptom as a column header
#"Pivoted Column" = Table.Pivot(#"Added Custom", List.Distinct(#"Added Custom"[Value]), "Value", "Custom"),
#"Changed Type1" = Table.TransformColumnTypes(#"Pivoted Column",{{"Cough", type text}, {"Diarrhoea", type text}, {"Headache", type text}, {"Myalgia", type text}, {"Vomiting", type text}}),
//Replace the resultant nulls with "No"'s
#"Replaced Value" = Table.ReplaceValue(#"Changed Type1",null,"No",Replacer.ReplaceValue,Table.ColumnNames(#"Changed Type1")),
firstColName = Table.ColumnNames(#"Replaced Value"){0},
//Prefix each column header with "Symptoms - " except first
rename = Table.TransformColumnNames(#"Replaced Value", each if _ <> firstColName then "Symptoms - " & _ else _)
in
rename

Power Query: Concatenate All Rows

I am looking to concatenate rows instead of columns. I can do this by pivoting and merging columns, but that process is not repeatable as the rows change (the column merging then hardcodes the values). How would I do a GroupBy function but concatenate rather than sum/count/etc?
NOTE: It's worth noting that the text must stay as a Text type. The prices are sometimes $275pp or something and I need to maintain the letters
A visual of what I am trying to do is below:
You just need to Group by Group and do some manipulations to add the quotation marks and brackets:
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Price", type text}, {"Group", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "quotPrice", each """" & [Price] & """"),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Group"}, {{"Grouped", each _, type table [Price=text, Group=number, quotPrice=text]}}),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Column([Grouped],"quotPrice")),
#"Extracted Values" = Table.TransformColumns(#"Added Custom1", {"Custom", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
#"Added Custom2" = Table.AddColumn(#"Extracted Values", "Text", each "[" & [Custom] & "]"),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Grouped", "Custom"})
in
#"Removed Columns"
The answer you're looking for is to combine a Table.Group() function with a Text.Combine() function.
This will work for you:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WUjEyN1XSUTJUitUBcowNDBAcQyOQjBGEYwSWgXIMzZFkLMEyxkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Price = _t, Group = _t]),
#"Grouped rows" = Table.Group(Source, {"Group"}, {{"Count", each "[""" & Text.Combine([Price], """, """) & """]", type text}})
in
#"Grouped rows"

Resources