Enumerate Text Values in Power-Query - excel

I have a column in my table that has some text values (input) which I would like to convert to numbers (output) for each unique text value, so that I can do some regression analysis:
Input | Output
AOP | 1
AOS | 2
AOS | 2
AOS | 2
AOP | 1
null | 0 or null
AOP | 1
I initially tried to do this with several Transform: Replace Values steps, but I don't know how to:
- make this flexible to different numbers of unique values (not hardcode 3 replacements, but handle n, where n is the number of unique values in the input)
- repeat this for many columns of my table
- avoid looping as far as possible
What's a better approach?

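One way is to add a custom column with the formula below, and to do that for each column you want to apply it to. It uses the character code of each character in the text to build a unique string of digits (the result is a text value made of digits, not a sequential number like 1, 2, 3):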
= try
List.Accumulate(Text.ToList([Input]), "", (state, current)=>
state&Number.ToText(Character.ToNumber(current), "0000")) otherwise null
This transforms the text in every column into unique digit strings, replacing the original data:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Function = (x) => try List.Accumulate(Text.ToList(x), "", (state, current)=> state&Number.ToText(Character.ToNumber(current), "0000")) otherwise null,
TransformList = List.Transform(Table.ColumnNames(Source), each {_ , Function}),
Output = Table.TransformColumns(Source, TransformList)
in Output
This transforms the text in every column into unique digit strings, appending the new columns to the existing columns:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Function = (x) => try List.Accumulate(Text.ToList(x), "", (state, current)=> state&Number.ToText(Character.ToNumber(current), "0000")) otherwise null,
TransformList = List.Transform(Table.ColumnNames(Source), each {_ , Function}),
Output = Table.TransformColumns(Source, TransformList),
Numericals=Table.RenameColumns( Output, List.Zip( { Table.ColumnNames( Output), List.Transform(Table.ColumnNames(Output), each _ &"number") } ) ),
#"Merged Queries" = Table.NestedJoin(Table.AddIndexColumn(Source, "Index", 0, 1),{"Index"},Table.AddIndexColumn(Numericals, "Index2", 0, 1),{"Index2"},"Tabl2",JoinKind.LeftOuter),
#"Expanded Tabl2" = Table.ExpandTableColumn(#"Merged Queries", "Tabl2", Table.ColumnNames( Numericals),Table.ColumnNames( Numericals)),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Tabl2",{"Index"})
in #"Removed Columns"
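If the goal is the 1, 2, ... numbering shown in the question rather than character-code strings, one possible sketch is to number each value by its position among the column's distinct values. This is a different technique from the formula above, not part of the original answer. The sample table and the names EncodeColumn and Distinct are made up for illustration.

```powerquery
let
    // Hypothetical sample data standing in for the question's table
    Source = Table.FromColumns(
        {{"AOP", "AOS", "AOS", "AOS", "AOP", null, "AOP"}},
        {"Input"}
    ),
    // Replace one column's values with their 1-based position
    // among that column's distinct non-null values; nulls stay null
    EncodeColumn = (tbl as table, name as text) as table =>
        let
            Distinct = List.Distinct(List.RemoveNulls(Table.Column(tbl, name)))
        in
            Table.TransformColumns(
                tbl,
                {{name,
                  each if _ = null then null else List.PositionOf(Distinct, _) + 1,
                  type nullable number}}
            ),
    // Fold the encoder over every column name, so no column is hardcoded
    Output = List.Accumulate(Table.ColumnNames(Source), Source, EncodeColumn)
in
    Output
```

With the sample data, AOP maps to 1 and AOS to 2, because values are numbered in order of first appearance. List.Accumulate still visits each column once, but there is no hand-written recursion.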

Related

Restrict transformation to Header row or Row 1 Power Query

I need to find and replace the headers of my Source table in Power Query.
I am able to do this with BulkReplace.
But this searches the entire table. Is there a way to restrict BulkReplace to only the headers? If not, I can demote the headers and run BulkReplace on just row 1 of the Source table.
Thank you
sumAppHeads (Find Replace Table)
In my Power Query, I have
BulkReplaceStepHeaders = fBulkReplaceStep(#"Demoted Headers", sumAppHeaders, Table.ColumnNames(#"Demoted Headers")),
let BulkReplace = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
let
//Convert the FindReplaceTable to a list using the Table.ToRows function
//so we can reference the list with an index number
FindReplaceList = Table.ToRows(FindReplaceTable),
//Count number of rows in the FindReplaceTable to determine
//how many iterations are needed
Counter = Table.RowCount(FindReplaceTable),
//Define a function to iterate over our list
//with the Table.ReplaceValue function
BulkReplaceValues = (DataTableTemp, n) =>
let
//Replace values using nth item in FindReplaceList
ReplaceTable = Table.ReplaceValue(
DataTableTemp,
//replace null with empty string in nth item
if FindReplaceList{n}{0} = null then "" else FindReplaceList{n}{0},
if FindReplaceList{n}{1} = null then "" else FindReplaceList{n}{1},
Replacer.ReplaceValue,
DataTableColumn
)
in
//if we are not at the end of the FindReplaceList
//then iterate through Table.ReplaceValue again
if n = Counter - 1
then ReplaceTable
else @BulkReplaceValues(ReplaceTable, n + 1),
//Evaluate the sub-function at the first row
Output = BulkReplaceValues(DataTable, 0)
in
Output
in
BulkReplace
1. Demote the headers
2. Transpose the table
3. Replace the old column names that are now all in Column1
4. Transpose the table back
5. Promote the headers
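The five steps above could look roughly like this in M. It is only a sketch: the sample table and the OldA/NewA names are invented, and a single Table.ReplaceValue call stands in for the bulk replace function.

```powerquery
let
    // Hypothetical sample table with headers to rename
    Source = Table.FromRecords({[OldA = 1, OldB = 2], [OldA = 3, OldB = 4]}),
    // 1. Headers become the first data row; columns are now Column1, Column2
    Demoted = Table.DemoteHeaders(Source),
    // 2. After transposing, the old header names all sit in Column1
    Transposed = Table.Transpose(Demoted),
    // 3. Replace only within Column1, so data values are never touched
    Replaced = Table.ReplaceValue(Transposed, "OldA", "NewA", Replacer.ReplaceValue, {"Column1"}),
    // 4. Transpose back so the names are in the first row again
    Back = Table.Transpose(Replaced),
    // 5. Promote the first row back to headers
    Promoted = Table.PromoteHeaders(Back, [PromoteAllScalars = true])
in
    Promoted
```

Restricting the replacement to Column1 in step 3 is what keeps it away from the data cells.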
Try this
BulkReplaceStepHeaders = fBulkReplaceStep(Table.FirstN(#"Demoted Headers", 1),sumAppHeaders,Table.ColumnNames(#"Demoted Headers")) & Table.Skip(#"Demoted Headers", 1),
This grabs the column names, merges against the replace table to find new names, then does a rename to use the new names
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Merged Queries" = Table.NestedJoin(Table.FromList(Table.ColumnNames(Source)), {"Column1"}, ReplaceTable, {"Find"}, "Table2", JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"Replace"}, {"Replace"}),
#"NewNames" = Table.AddColumn(#"Expanded Table2", "Custom", each if [Replace]=null then [Column1] else [Replace])[Custom],
#"Rename"=Table.RenameColumns( Source, List.Zip( { Table.ColumnNames( Source ), #"NewNames" } ) )
in #"Rename"

Apply Power Query to all columns

I have a Power Query that finds and replaces values listed in a table. I worked through it from the article "Bulk Find And Replace In Power Query".
But I need to apply it to all columns.
How can I do this without listing all the columns, as they are dynamic and keep changing?
Thanks
What I have so far
let
Source = Excel.CurrentWorkbook(){[Name="MyData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Job Title", type text}}),
BulkReplaceStep = fBulkReplace(#"Changed Type", MyFindReplace, {"Job Title","Job Title2"})
in
BulkReplaceStep
The find/replace data table
let
Source = Excel.CurrentWorkbook(){[Name="MyFindReplace"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Find", type text}, {"Replace", type text}})
in
#"Changed Type"
Bulkreplace
let BulkReplace = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
let
//Convert the FindReplaceTable to a list using the Table.ToRows function
//so we can reference the list with an index number
FindReplaceList = Table.ToRows(FindReplaceTable),
//Count number of rows in the FindReplaceTable to determine
//how many iterations are needed
Counter = Table.RowCount(FindReplaceTable),
//Define a function to iterate over our list
//with the Table.ReplaceValue function
BulkReplaceValues = (DataTableTemp, n) =>
let
//Replace values using nth item in FindReplaceList
ReplaceTable = Table.ReplaceValue(
DataTableTemp,
//replace null with empty string
if FindReplaceList{n}{0} = null then "" else FindReplaceList{n}{0},
if FindReplaceList{n}{1} = null then "" else FindReplaceList{n}{1},
Replacer.ReplaceText,
DataTableColumn
)
in
//if we are not at the end of the FindReplaceList
//then iterate through Table.ReplaceValue again
if n = Counter - 1
then ReplaceTable
else @BulkReplaceValues(ReplaceTable, n + 1),
//Evaluate the sub-function at the first row
Output = BulkReplaceValues(DataTable, 0)
in
Output
in
BulkReplace
This works
Change this:
BulkReplaceStep = fBulkReplace(#"Changed Type", MyFindReplace, {"Job Title","Job Title2"})
To This:
BulkReplaceStep = fBulkReplace(#"Changed Type", MyFindReplace, Table.ColumnNames(#"Changed Type"))

Split decimal from text as batch

I am attempting to split decimal numbers in batch using a previous formula provided on here. However, the result is an error stating that null or "" or "x" (where x is a number) can't be converted to the type list.
The formula:
=try Text.Remove([Column1],Text.ToList(Text.Remove([Column1],{"0".."9","."}))) otherwise null works when applied to a single column. However, when trying to create a table from these columns I get the following errors:
Desired Output:
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table19"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each Table.FromColumns({
(try Text.Remove([Column1],Text.ToList(Text.Remove([Column1],{"0".."9","."}))) otherwise null),
(try Text.Remove([Column2],Text.ToList(Text.Remove([Column2],{"0".."9","."}))) otherwise null)
}))
in
#"Added Custom"
I would like to be able to generate a Table.FromColumns, for n columns which I can then expand. This is just an example and in reality, the number of columns can vary quite a lot.
Update
To better visualise what I am trying to do in power query I wish to create this scenario:
Such that this table can be expanded to:
Probably something obvious but any help appreciated.
I would just:
- split the columns based on character transition, including the decimal in the character list
- then trim the resultant columns to remove any leading/trailing spaces
Note: Code edited to allow for any number of columns to be split in two. Column names can be dynamic also
let
Source = Excel.CurrentWorkbook(){[Name="Table21"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
//Generate new table from all the columns
//create List of columns
colList = Table.ToColumns(#"Changed Type"),
colNames = Table.ColumnNames(#"Changed Type"),
//convert each column
splitCols = List.Generate(
()=>[colPair=
List.Transform(colList{0},(li)=>
Splitter.SplitTextByCharacterTransition(
{"0".."9","."}, (c) => not List.Contains({"0".."9","."}, c))
(li)),
cn = colNames{0},
idx=0],
each [idx] < List.Count(colList),
each [colPair=
List.Transform(colList{[idx]+1},(li)=>
Splitter.SplitTextByCharacterTransition(
{"0".."9","."}, (c) => not List.Contains({"0".."9","."}, c))
(li)),
cn=colNames{[idx]+1},
idx=[idx]+1],
each List.Zip([colPair]) & {List.Transform({1..2}, (n)=> [cn] & "." & Text.From(n))}),
newCols = List.Combine(List.Transform(splitCols, each List.RemoveLastN(_,1))),
newColNames = List.Combine(List.Transform(splitCols, each List.Last(_))),
newTable = Table.FromColumns(newCols,newColNames),
//trim the excess spaces
trimOps = List.Transform(Table.ColumnNames(newTable), each {_, Text.Trim}),
trimAll = Table.TransformColumns(newTable, trimOps)
in
trimAll
Example with three columns
Again, if you want to retain the original columns in your result table, you need to change three lines in the code:
...
newCols = Table.ToColumns(#"Changed Type") & List.Combine(List.Transform(splitCols, each List.RemoveLastN(_,1))),
newColNames = Table.ColumnNames(#"Changed Type") & List.Combine(List.Transform(splitCols, each List.Last(_))),
newTable = Table.FromColumns(newCols,newColNames),
...
Edited to be usable for multiple columns
let Source =Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
#"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Other Columns", "Value", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Value.1", "Value.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter",{"Value.2"}),
#"rename1" = Table.TransformColumns(#"Removed Columns",{{"Attribute", each _&"a", type text}}),
#"Pivoted Column" = Table.RemoveColumns(Table.Pivot(#"rename1", List.Distinct(#"rename1"[Attribute]), "Attribute", "Value.1"),{"Index"}),
#"Removed Columns2" = Table.RemoveColumns(#"Split Column by Delimiter",{"Value.1"}),
rename = Table.TransformColumns(#"Removed Columns2",{{"Attribute", each _ & "b", type text}}),
#"Pivoted Column1" = Table.RemoveColumns(Table.Pivot(rename, List.Distinct(rename[Attribute]), "Attribute", "Value.2"),{"Index"}),
TFC = Table.FromColumns(Table.ToColumns(Source)&Table.ToColumns(#"Pivoted Column")&Table.ToColumns(#"Pivoted Column1"),Table.ColumnNames(Source)&Table.ColumnNames(#"Pivoted Column")&Table.ColumnNames(#"Pivoted Column1"))
in TFC
I would just duplicate the two original columns (Add Column > Duplicate column) and then split the resulting columns on the leftmost " " delimiter. No M code needed.
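If only the numeric part of each cell is needed, the single-column formula from the question can also be applied to every column without naming any of them. The following is a sketch under assumptions: the sample data and the names KeepNumeric and Ops are invented, and Text.Select is swapped in for the nested Text.Remove calls. It overwrites the columns rather than adding new ones.

```powerquery
let
    // Hypothetical sample data: a number followed by a unit in each cell
    Source = Table.FromRecords({
        [Column1 = "12.5 kg", Column2 = "7 m"],
        [Column1 = "3 kg", Column2 = null]
    }),
    // Keep only digits and the decimal point; non-text values fall back to null
    KeepNumeric = (t) => try Text.Select(t, {"0".."9", "."}) otherwise null,
    // One {column name, function} pair per column, built from the actual column names
    Ops = List.Transform(Table.ColumnNames(Source), (name) => {name, KeepNumeric}),
    Output = Table.TransformColumns(Source, Ops)
in
    Output
```

Because Ops is built from Table.ColumnNames, the same query keeps working when the number of columns changes.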

Power Query Applying a Function Across Every Column

I am trying to write a query that takes a table and multiplies every number in the table by 100. I've gotten close, but I am having trouble applying it correctly to every column. Below is the code I have so far. The line starting with ReplaceTable is the line I have working for one column, and the line below was my attempt at getting it to work for other columns. I am dealing with a small subset currently, but the real data will potentially have ~100 columns, so I do not want to do this by hand. If there's a better way to do this task, please let me know. I am new to Power Query, so if able please explain my error/the solution so I can learn. Thanks!
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
//Organization will always be of type text. The others should be numbers, unless user error
#"Changed Type" = Table.TransformColumnTypes(Source, {{"Organization", type text}, {"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}}),
//function to replace all values in all columns with multiplied values
MultiplyReplace = (DataTable as table, DataTableColumns as list) =>
let
Counter = Table.ColumnCount(DataTable),
ReplaceCol = (DataTableTemp, i) =>
let
colName = {DataTableColumns{i}},
col = Table.Column(DataTableTemp, colName),
//LINE THAT WORKS- want this functionality for ALL columns
ReplaceTable = Table.ReplaceValue(DataTableTemp, each[A], each if [A] is number then [A]*100 else [A], Replacer.ReplaceValue, colName)
//ReplaceTable = Table.ReplaceValue(DataTableTemp, each col, each if col is number then col*100 else col, Replace.ReplaceValue, colName)
in
if i = Counter-1 then ReplaceTable else @ReplaceCol(ReplaceTable, i+1)
in
ReplaceCol(DataTable, 0),
allColumns = Table.ColumnNames(#"Changed Type"),
#"Multiplied Numerics" = MultiplyReplace(#"Changed Type", allColumns)
//#"Restored Type" = Value.ReplaceTypes(#"Multiplied Numerics", #"Changed Type")
in
#"Multiplied Numerics"
The issue involves the scope of the functions and the variables.
With a hard-coded column name (such as [A]), the code is understanding the shorthand to actually mean _[A]. Within a Table.ReplaceValue function, that _ is referencing the current Record or row. However, the col variable is referencing the entire table column. When used in the replacer function, it causes an error. (Unfortunately(?), errors in replacer functions are just ignored with no error message, so issues can be hard to trace.)
In the corrected code, I got rid of the col variable, since it was being determined at the wrong scope level. I changed colName to text instead of a list, and then used the Record.Field function with _ (the current record within the Table.ReplaceValue function) and the text value colName to extract the desired field value for the calculations within the Table.ReplaceValue function itself.
Corrected Code
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
//Organization will always be of type text. The others should be numbers, unless user error
#"Changed Type" = Table.TransformColumnTypes(Source, {{"Organization", type text}, {"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}}),
//function to replace all values in all columns with multiplied values
MultiplyReplace = (DataTable as table, DataTableColumns as list) =>
let
Counter = Table.ColumnCount(DataTable),
ReplaceCol = (DataTableTemp, i) =>
let
colName = DataTableColumns{i},
//LINE THAT WORKS- want this functionality for ALL columns
ReplaceTable = Table.ReplaceValue(DataTableTemp,each Record.Field(_, colName), each if Record.Field(_, colName) is number then Record.Field(_, colName)*100 else Record.Field(_, colName),Replacer.ReplaceValue,{colName})
//ReplaceTable = Table.ReplaceValue(DataTableTemp, each col, each if col is number then col*100 else col, Replace.ReplaceValue, colName)
in
if i = Counter-1 then ReplaceTable else @ReplaceCol(ReplaceTable, i+1)
in
ReplaceCol(DataTable, 0),
allColumns = Table.ColumnNames(#"Changed Type"),
#"Multiplied Numerics" = MultiplyReplace(#"Changed Type", allColumns)
//#"Restored Type" = Value.ReplaceTypes(#"Multiplied Numerics", #"Changed Type")
in
#"Multiplied Numerics"
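For comparison, the same result can probably be reached without recursion by handing Table.TransformColumns one transformation per column. This is a sketch, not part of the answer above. The sample table and the name Ops are made up for illustration.

```powerquery
let
    // Hypothetical sample data: one text column and two numeric columns
    Source = Table.FromRecords({
        [Organization = "Org1", A = 1, B = 2],
        [Organization = "Org2", A = 3, B = 4]
    }),
    // Build a {column name, function} pair for every column;
    // inside each function, _ is the single cell value, not the row
    Ops = List.Transform(
        Table.ColumnNames(Source),
        (name) => {name, each if _ is number then _ * 100 else _}
    ),
    Output = Table.TransformColumns(Source, Ops)
in
    Output
```

Here the function receives the cell value directly, so the row-versus-column scoping problem described above does not arise. Text values such as Organization pass through unchanged.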

Convert column to cell string Power Query

I need to fit all the values of a column in Power Query into a 1-cell string separated by commas, as the example below:
To do this, I have the following piece of code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Transposed Table" = Table.Transpose(Source),
#"Merged Columns" = Table.CombineColumns(#"Transposed Table",{"Column1", "Column2", "Column3"},Combiner.CombineTextByDelimiter(",", QuoteStyle.None),"Merged"),
#"KeepString" = #"Merged Columns"[Merged]{0}
in
#"KeepString"
The problem with this code is that it assumes there will always be 3 columns, which is not always the case. How can I merge all columns (regardless of how many there are) into one?
You can do this with List.Accumulate:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
KeepString = List.Accumulate(Source[User], "", (state, current) => if state = "" then current else state & "," & current)
in
KeepString
You can also use Table.ColumnNames to get the list of all the column names. You can pass this into Table.CombineColumns, so your modified solution would be:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Transposed Table" = Table.Transpose(Source),
#"Merged Columns" = Table.CombineColumns(#"Transposed Table", Table.ColumnNames(#"Transposed Table"),Combiner.CombineTextByDelimiter(",", QuoteStyle.None),"Merged"),
#"KeepString" = #"Merged Columns"[Merged]{0}
in
#"KeepString"
You can also use a shorter code, like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Result = Text.Combine(Source[User], ",")
in
Result
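Text.Combine expects text values, so if the column might hold numbers, a conversion step can be added first. This is a sketch with invented sample data. The User column name follows the answers above, and AsText is a hypothetical step name.

```powerquery
let
    // Hypothetical sample column mixing text, a null, and a number
    Source = Table.FromColumns({{"Ann", null, 42}}, {"User"}),
    // Drop nulls, then convert every remaining value to text
    AsText = List.Transform(List.RemoveNulls(Source[User]), Text.From),
    // Join everything into one comma-separated string
    Result = Text.Combine(AsText, ",")
in
    Result
```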

Resources