Power Query - Merging rows of data based on unique ID - pivot

I have a problem similar to the one detailed here and am employing the pivot-unpivot solution, which is working well so far. My data is more complex though, and as it is drawing from multiple sources sometimes there are discrepant values.
Essentially - after applying the pivot/unpivot, the grouping works perfectly but I end up getting a lot of errors. All of them appear to be the same:
Expression.Error: There were too many elements in the enumeration to complete the operation.
Details: List
In an effort to resolve this, I added a fifth parameter to my Table.Pivot call:
each Text.Combine(_, "#(lf)")
With this, the cells that previously showed errors show the combined values instead. However, sometimes the combined values are exactly the same. How can I get these to actually merge, while showing an error (or the combined values) only in the cells with discrepant data? I am new to Power Query and not sure whether there is a better solution than Text.Combine.
Some examples below... Thanks for your help
Merged table looks something like:
| Unique ID | Data A | Data B | Data C |
| ABC | 123 | 789 | null |
| ABC | 123 | null | name2 |
| BCD | 234 | null | null |
| BCD | null | null | null |
| BCD | 1234 | null | name2 |
| EFG | 333 | 222 | name1 |
| EFG | null | 222 | null |
| ABC | null | null | null |
Following pivot/unpivot with text combine (I am not sure how to show line breaks here, so have delineated using a comma):
| Unique ID | Data A | Data B | Data C |
| ABC | 123, 123 | 789 | name2 |
| BCD | 234, 1234 | null | name2 |
| EFG | 333 | 222, 222 | name1 |
What I want:
| Unique ID | Data A | Data B | Data C |
| ABC | 123 | 789 | name2 |
| BCD | 234, 1234 | null | name2 |
| EFG | 333 | 222 | name1 |
Where the Data A point for BCD would be an error, so I can see that there's something that needs to be fixed in the source data tables.

With your data like this:
| Unique ID | Data A | Data B | Data C |
| ABC | 123 | 789 | null |
| ABC | 123 | null | name2 |
| BCD | 234 | null | null |
| BCD | null | null | null |
| BCD | 1234 | null | name2 |
| EFG | 333 | 222 | name1 |
| EFG | null | 222 | null |
| ABC | null | null | null |
Right click the Unique ID column, select "Unpivot Other Columns"
Change the resulting Value column type to "Text"
Select all Columns. Right click, choose "Remove Duplicates".
Select the Attribute column. Choose Pivot Column from the Transform tab. Choose the Value column from the drop-down. Under Advanced options, choose Don't Aggregate. Then, in the formula bar, add your existing code as the fifth parameter: each Text.Combine(_, "#(lf)")
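For reference, the four steps above roughly correspond to the M below. This is only a sketch: it assumes the data is loaded from an Excel table named Table1 (a hypothetical name), that the blank cells are real nulls rather than the text "null", and the step names are illustrative.

```powerquery
let
    // Assumption: the source is an Excel table called "Table1"
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // Unpivoting drops null cells, leaving one row per (Unique ID, Attribute, Value)
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Unique ID"}, "Attribute", "Value"),
    // Convert values to text so 123 and "123" compare as equal
    Typed = Table.TransformColumnTypes(Unpivoted, {{"Value", type text}}),
    // Identical values for the same ID and attribute collapse to a single row
    Deduped = Table.Distinct(Typed),
    // Pivot back; the fifth argument joins any remaining (discrepant) values
    Pivoted = Table.Pivot(
        Deduped,
        List.Distinct(Deduped[Attribute]),
        "Attribute",
        "Value",
        each Text.Combine(_, "#(lf)")
    )
in
    Pivoted
```

Because Table.Distinct runs before the pivot, only genuinely discrepant cells still hold more than one value. Leaving out the fifth argument should make those cells show the original "too many elements" error instead of the combined text, which may be closer to what the question asks for.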

Right-click the Unique ID column and choose Unpivot Other Columns.
Transform ... Data Type ... Text for the 3 columns.
Select all 3 columns, right-click, Remove Duplicates.
Select the Unique ID and Attribute columns, right-click, Group By ... keep the default options, click OK.
In the formula bar, change the end of the grouping formula
from
each Table.RowCount(_), Int64.Type}})
to
each Text.Combine(List.Transform([Value], Text.From), ","), type text}})
or
each Text.Combine(List.Transform([Value], Text.From), "#(lf)"), type text}})
Select the Attribute column.
Transform ... Pivot Column ... Values Column: Count, Advanced options: Don't Aggregate.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Unique ID"}, "Attribute", "Value"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Unique ID", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type1"),
    #"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Unique ID", "Attribute"}, {{"Count", each Text.Combine(List.Transform([Value], Text.From), ", "), type text}}),
    #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count")
in
    #"Pivoted Column"

Related

How to separate Phone Numbers in the Name Column

I have a few problems that I have been stuck with for a few days now.
I have a table as below:
| Full Name | Atlanta_Email_Only
| 16788889999 | random#gmail.com
| 14045556666 | notreal#gmail.com
| John Harris | johnharris#atlanta.com
| Sarah Smith | sarahsmith#atlanta.com
How can I use Power Query Editor to separate the Full Name column into 2 columns: one called Join By Phone and one called Full Name?
And for the email column, how can I delete all the emails that do not contain the word Atlanta?
I have tried Split Column -> By Digit to Non-Digit / By Non-Digit to Digit on the Full Name column, but it didn't work.
I also tried Add Column -> Conditional Column to drop the emails that do not contain the word Atlanta, but that didn't work either.
Thank you for your help.
In Power Query, right-click the Full Name column and duplicate it.
Click the new column, Transform ... Data Type ... Whole Number.
Right-click the new column, Replace Errors, null.
That column now holds the phone numbers.
Add Column ... Custom Column to compare the new column with the original column, using a formula similar to:
= if [#"Full Name - Copy"] = null then [Full Name] else null
That column holds the names.
Right-click and remove the original Full Name column.
To filter the emails, right-click the email column, Transform ... lowercase.
Edit the code in the formula bar (or in Home ... Advanced Editor) from
, Text.Lower, type text}})
to
, each if Text.Contains(_,"atlanta") then _ else null , type text}})
Full code sample below:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Full Name", "Join By Phone"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Duplicated Column",{{"Join By Phone", Int64.Type}}),
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Changed Type1", {{"Join By Phone", null}}),
    #"Added Custom" = Table.AddColumn(#"Replaced Errors", "FullName2", each if [#"Join By Phone"] = null then [Full Name] else null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Full Name"}),
    #"FilterEmail" = Table.TransformColumns(#"Removed Columns",{{"Atlanta_Email_Only", each if Text.Contains(_,"atlanta") then _ else null , type text}})
in
    #"FilterEmail"
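A more compact variant of the same idea is sketched below, for comparison only. It swaps the duplicate / change type / replace errors sequence for a single try ... otherwise expression, and makes the email test null-safe and case-insensitive. The inline #table sample and the step names are assumptions made for illustration, not part of the original workbook.

```powerquery
let
    // Hypothetical inline sample standing in for Table1
    Source = #table(
        {"Full Name", "Atlanta_Email_Only"},
        {
            {"16788889999", "random#gmail.com"},
            {"John Harris", "johnharris#atlanta.com"}
        }
    ),
    // If the value converts to a whole number, treat it as a phone number; otherwise null
    AddPhone = Table.AddColumn(Source, "Join By Phone",
        each try Int64.From([Full Name]) otherwise null, Int64.Type),
    // Keep the name only on rows where no phone number was found
    AddName = Table.AddColumn(AddPhone, "FullName2",
        each if [Join By Phone] = null then [Full Name] else null, type text),
    RemoveOriginal = Table.RemoveColumns(AddName, {"Full Name"}),
    // Blank out emails that do not contain "atlanta", ignoring case and skipping nulls
    FilterEmail = Table.TransformColumns(RemoveOriginal, {{"Atlanta_Email_Only",
        each if _ <> null and Text.Contains(_, "atlanta", Comparer.OrdinalIgnoreCase) then _ else null,
        type nullable text}})
in
    FilterEmail
```

Like the code above it, this replaces non-matching emails with null rather than removing the rows.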

Excel Power Query Merge and Transform columns

I am trying to combine 2 columns into one and then replace the "Null" rows with the value from an adjacent column using Excel Power Query. So far I haven't been able to resolve this issue.
These are my unsuccessful attempts:
Attempt 1:
= Table.ExpandTableColumn(Source, "F 52 AGR_1016", {"AGR_NAME"}, {"F 52 AGR_1016.AGR_NAME"}),
else Table.ReplaceValue(Source, each "", Replacer.ReplaceValue{"New value of line"})
Attempt 2:
= Table.ExpandTableColumn(Source, "F 52 AGR_1016", {"AGR_NAME"}, {"F 52 AGR_1016.AGR_NAME"}),
if #"F 52 AGR_1016" = "" then Replacer.ReplaceValue("","",{"New Value of line"}), else
I get the following error message, however Excel does not show me where exactly that error is:
Expression.SyntaxError: Token Eof expected.
It is a bit hard to tell what you are doing, and the code format is incorrect. You can't append if ... then ... else to a Table.ExpandTableColumn call.
To merge two columns: select them, right-click, then choose Merge Columns.
To add a column that tests other column values: Add Column ... Custom Column ... and use = if xxx then yyy else zzz
= if [Col1] = null then [Col2] else [Col3]
This code expands three columns; merges the text values of Column2 and Column1 into a new column called Merged; and creates Custom4, whose value is Merged if Column3 is null and Column3 otherwise.
...
#"Expanded" = Table.ExpandTableColumn(Source, "AllRows", {"Column1", "Column2", "Column3"}, {"Column1", "Column2", "Column3"}),
#"Merged Columns" = Table.CombineColumns(#"Expanded",{"Column2", "Column1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Added Custom" = Table.AddColumn(#"Merged Columns", "Custom4", each if [Column3]=null then [Merged] else [Column3])
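To try the merge and fallback steps in isolation, here is a self-contained sketch. The inline #table stands in for the expanded table and its values are made up for illustration; only the last two steps mirror the snippet above.

```powerquery
let
    // Hypothetical sample data in place of the expanded table
    Source = #table(
        {"Column1", "Column2", "Column3"},
        {
            {"a", "b", null},
            {"c", "d", "x"}
        }
    ),
    // Merge Column2 and Column1 (in that order) into a single text column "Merged"
    Merged = Table.CombineColumns(
        Source,
        {"Column2", "Column1"},
        Combiner.CombineTextByDelimiter("", QuoteStyle.None),
        "Merged"
    ),
    // Fall back to the merged text wherever Column3 is null
    Added = Table.AddColumn(Merged, "Custom4",
        each if [Column3] = null then [Merged] else [Column3])
in
    Added
```

Pasting this into a blank query in the Advanced Editor is a quick way to check the if ... then ... else logic before wiring it into the real query.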

How to number each occurrence of a substring in a cell in Power Query?

I'm fairly new to Power Query and have hit a hiccup that's been bothering me all day. I've read multiple threads here and on the Power BI community and none has really cleared my question, and my logic suggests a few different options to achieve what I want, but my lack of experience blocks any solution I attempt.
Context:
I'm building a database for product import/export into WooCommerce, eBay and other channels, which takes some inputs from the (non-tech-savvy) users in Excel and derives several of the required fields. One of those is the image file names for each product.
I have these columns (in a much larger query table):
| ImageBaseName | ImageQTY | ImageIDs |
| product-name.jpg | 3 | product-name.jpg product-name.jpg product-name.jpg |
| other-product.jpg| 5 |other-product.jpg other-product.jpg...other-product.jpg |
And my desired output would be:
| ImageBaseName | ImageQTY | ImageIDs |
| product-name.jpg | 3 | product-name-1.jpg product-name-2.jpg product-name-3.jpg |
| other-product.jpg| 5 |other-product-1.jpg other-product-2.jpg...other-product-5.jpg |
In fact I don't need the two first columns if I get the ImageIDs like that.
The ImageBaseName column is generated from the input product name.
The ImageQTY column is direct input by the user.
The ImageIDs column I got so far is from using:
= Table.AddColumn(#"previous step", "ImageIDs", each Text.Trim(Text.Repeat([ImageBaseName] & " ", [ImageQTY])))
And these are the options I've considered thus far:
Option 1: Text.Combine(Text.Split ImageIDs and (somehow) count and number each item in the list) and concatenate it all back... Which would probably start like this: Text.Combine(Text.Split,,,
Option 2: Using the UI, split the ImageIDs by each space into a high number of columns (I don't know how many images each product will have, but probably no more than 12), assign a number suffix to each of those columns, and then put it all back together. But it feels messy as hell.
Option 3: There's probably a clean calculated way to generate the numbered image base names from the number in the second column and then attach the .jpg at the end of each, but honestly I don't know how.
I'd like it to be on the same table as I am already dealing with different queries...
Any help would be gladly accepted.
Starting with this as Table1:
This M code...
let
Source = Table1,
SplitAndIndexImageIDs = Table.AddColumn(Source, "Custom", each Table.AddIndexColumn(Table.FromColumns({Text.Split([ImageIDs]," ")}),"Index",1)),
RenameImageIDs = Table.AddColumn(SplitAndIndexImageIDs, "NewImageIDs", each Text.Combine(Table.AddColumn([Custom],"newcolumn",each Text.BeforeDelimiter([Column1], ".") & "-" &Text.From([Index]) & "." & Text.AfterDelimiter([Column1], "."))[newcolumn],", ")),
#"Removed Other Columns1" = Table.SelectColumns(RenameImageIDs,{"ImageBaseName", "ImageQTY", "NewImageIDs"})
in
#"Removed Other Columns1"
Should give you this result:
Here's a chunky "uber step" piece of code you could put in a custom column, given the ImageBaseName and ImageQTY columns:
Text.Combine
(
    List.Transform
    (
        List.Zip
        (
            {
                List.Repeat({Text.BeforeDelimiter([ImageBaseName], ".", {0, RelativePosition.FromEnd})}, [ImageQTY])
                ,
                List.Transform({1..[ImageQTY]}, each "-" & Number.ToText(_) & ".")
                ,
                List.Repeat({Text.AfterDelimiter([ImageBaseName], ".", {0, RelativePosition.FromEnd})}, [ImageQTY])
            }
        )
        , each Text.Combine(_)
    )
    , " "
)
Summary is you create the components of your string as 3 lists (text before file type, numbers 1 through qty, text after file type). Then you use List.Zip which combines the three text components into their own lists. Then we convert those lists back to a single piece of text with List.Transform and Text.Combine.
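As a small worked example of the same idea, the sketch below builds the names for one hard-coded row. It is a simplified variant rather than the expression above: it skips List.Zip and concatenates inside a single List.Transform. The names Base, Qty, Stem, Ext and Names are illustrative.

```powerquery
let
    // Illustrative inputs, matching the first row of the question's table
    Base = "product-name.jpg",
    Qty = 3,
    // Text before and after the last "." in the file name
    Stem = Text.BeforeDelimiter(Base, ".", {0, RelativePosition.FromEnd}),
    Ext = Text.AfterDelimiter(Base, ".", {0, RelativePosition.FromEnd}),
    // One numbered file name per value in 1..Qty
    Names = List.Transform({1..Qty}, each Stem & "-" & Number.ToText(_) & "." & Ext),
    // Join the names with a space, as in the desired output
    Result = Text.Combine(Names, " ")
in
    Result
```

Evaluated as a blank query, this returns product-name-1.jpg product-name-2.jpg product-name-3.jpg. In a custom column, Base and Qty would be replaced by [ImageBaseName] and [ImageQTY].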
Let's assume the table Table1 contains two columns, ImageBaseName and Quantity.
Add column ... Index column...
Right Click ImageBaseName Split Column...By Delimiter... --Custom--, use a period as the delimiter and split at Right-most delimiter. That will pull the image suffix off
Add Column ... Custom Column ... name it list and use formula ={1..[Quantity]} which will create a list of values from 1 to the Quantity
Click the double arrow at the top of the new list column and choose expand to new rows
Click-Select the list, Quantity, ImageBaseName.2, ImageBaseName.1 columns and Transform ... Data Type...Text
Add Column .. Custom Column .. name it Custom and use formula =[ImageBaseName.1]&"-"&[list]&"."&[ImageBaseName.2] to put together all the parts
Right-click Index Group By ... [x] Basic, Group By index, new column name ImageIDs, Operation count rows
That will generate code like this:
Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Table.RowCount(_), type number}})
Use formula bar to change the formula as shown below. It will combine rows using , as a separator
Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Text.Combine([Custom], ", "), type text}})
Full sample code is below that you can paste into Home .. Advanced Editor...
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Added Index", "ImageBaseName", Splitter.SplitTextByEachDelimiter({"."}, QuoteStyle.Csv, true), {"ImageBaseName.1", "ImageBaseName.2"}),
    #"Added Custom" = Table.AddColumn(#"Split Column by Delimiter", "list", each {1..[Quantity]}),
    #"Expanded list" = Table.ExpandListColumn(#"Added Custom", "list"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded list",{{"list", type text}, {"Quantity", type text}, {"ImageBaseName.2", type text}, {"ImageBaseName.1", type text}}),
    #"Added Custom1" = Table.AddColumn(#"Changed Type1", "Custom", each [ImageBaseName.1]&"-"&[list]&"."&[ImageBaseName.2]),
    #"Grouped Rows" = Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Text.Combine([Custom], ", "), type text}})
in
    #"Grouped Rows"
There are probably many ways to combine all this into one uber step, but I thought I'd show the parts

Generate all Possible Unique Combinations of two Excel Columns

I have the following simplified data set which I need to create a unique list from and transpose the data from column B at the same time. I think I need to use INDEX, but I am unsure on the correct syntax for this scenario.
The data in column B is delimited by a space.
This is what my data looks like:
|---------------------|------------------|
| Column A | Column B |
|---------------------|------------------|
| 1 | AA BB |
|---------------------|------------------|
| 2 | BB CC |
|---------------------|------------------|
| 3 | DD EE |
|---------------------|------------------|
Required result
|---------------------|------------------|
| Column A | Column B |
|---------------------|------------------|
| 1 | AA |
|---------------------|------------------|
| 1 | BB |
|---------------------|------------------|
| 2 | BB |
|---------------------|------------------|
| 2 | CC |
|---------------------|------------------|
| 3 | DD |
|---------------------|------------------|
| 3 | EE |
|---------------------|------------------|
To get your output table given your input table, you can use Power Query, from the UI, in just a few steps:
Split Column B by the space delimiter.
Select Column A and then select to unpivot other columns
Delete the extra column Attribute that appears when you unpivot.
This is the M code for that operation
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Column1"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"})
in
#"Removed Columns"
And the results:
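A shorter route to the same shape, sketched here as an alternative rather than as the method above, is to split each cell into a list and expand the list into rows. That avoids fixing the number of split columns in advance. The inline #table reproduces the question's sample data and the step names are illustrative.

```powerquery
let
    // Sample data from the question, typed inline for illustration
    Source = #table(
        {"Column A", "Column B"},
        {
            {1, "AA BB"},
            {2, "BB CC"},
            {3, "DD EE"}
        }
    ),
    // Turn each space-delimited cell into a list of its parts
    SplitToLists = Table.TransformColumns(Source, {{"Column B", each Text.Split(_, " ")}}),
    // Expand each list so every part gets its own row
    Expanded = Table.ExpandListColumn(SplitToLists, "Column B")
in
    Expanded
```

In the UI, the closest equivalent is Split Column by Delimiter with the advanced option to split into Rows.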
Ron Rosenfeld's answer unpivots the data as the OP indicated by the required result.
If you need to create all combinations of data from two columns (rather than unpivoting), normalize the data by placing each set of values in its own column. In this example, Column B has two data entries per cell, which can be split using Data > Text to Columns. To work with unique entries, either use the standard Excel tool Data > Remove Duplicates, or in Excel Power Query Editor, right click the data column header and click Remove Duplicates.
Create separate queries for each column to be included in the combinations. Adding a custom column whose formula refers to the first data query, and then expanding it, makes Power Query effectively perform a cross join (Cartesian product) of the two columns, resulting in all combinations.
Final Table Result
Step 1: Data > Text to Columns
(a) Select Column B. In the ribbon, go to Data > Text to Columns.
(b) Split the data on the appropriate delimiter (Space, Tab, etc.).
Step 2: Combine data and remove duplicates
(a) Cut the data from Column C.
(b) Paste it below the existing data in Column B.
(c) Select Column B and then click Data > Remove Duplicates
(d) If warning pops up about data found next to selection, click "Continue with the current selection"
(e) Select checkbox for Column B and click OK.
Step 3: Create data query for Column A
(a) Select Column A and click Data > From Table/Range
(b) Query Settings > PROPERTIES > Name and enter name "ColumnA"
(c) Home > Close & Load > Close & Load To...
(d) Select: Only Create Connection
Step 4: Create data query for Column B
(a) Select Column B
(b) Data > From Table/Range
(c) Query Settings > PROPERTIES > Name and enter name "ColumnB"
(d) Add Column > Custom Column
(e) New column name: Combinations
(f) Custom column formula: =ColumnA
(g) Expand the new "Combinations" column (icon with left/right arrows)
(h) Drag the "Combinations" column to the left side
(i) Home > Close & Load
Step 5: Sort the output data table
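Steps 3 to 5 can be sketched as a single M query for anyone who prefers the Advanced Editor. The two inline tables below are stand-ins for the ColumnA and ColumnB connection queries, with values taken from the question's sample data; the step names are illustrative.

```powerquery
let
    // Stand-ins for the "ColumnA" and "ColumnB" queries created in Steps 3 and 4
    ColumnA = #table({"Column A"}, {{1}, {2}, {3}}),
    ColumnB = #table({"Column B"}, {{"AA"}, {"BB"}, {"CC"}, {"DD"}, {"EE"}}),
    // Step 4 (d)-(f): every row of ColumnB gets the whole ColumnA table as a nested value
    WithNested = Table.AddColumn(ColumnB, "Combinations", each ColumnA),
    // Step 4 (g): expanding the nested tables yields one row per (Column A, Column B) pair
    Expanded = Table.ExpandTableColumn(WithNested, "Combinations", {"Column A"}),
    // Step 4 (h): move Column A to the left
    Reordered = Table.ReorderColumns(Expanded, {"Column A", "Column B"}),
    // Step 5: sort the output
    Sorted = Table.Sort(Reordered, {{"Column A", Order.Ascending}, {"Column B", Order.Ascending}})
in
    Sorted
```

With 3 values in Column A and 5 unique values in Column B, the result has 15 rows.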

Filter companies that have at least 3 specific products

I have an Excel pivot table (with a table dataset behind it) structured like the one below. How can I filter to show only the companies (Col A) that have Products (Col B) 1 AND 2 AND 3? It sounds easy, but I can't find a way to do it. A Power Query solution (in Power BI or Excel) would be fine.
A1: Company 1 | B1: Product 1
A2: Company 1 | B2: Product 2
A3: Company 1 | B3: Product 3
A4: Company 1 | B4: Product 4
A5: Company 2 | B5: Product 1
A6: Company 3 | B6: Product 1
A7: Company 4 | B7: Product 1
A8: Company 4 | B8: Product 2
A9: Company 4 | B9: Product 3
A10: Company 4 | B10: Product 4
A11: Company 4 | B11: Product 5
Here's an approach using Power Query.
Starting with this brought into Power Query from the table in Excel:
I then group on Company (Transform > Group By):
Then I add a new custom column (Add Column > Custom Column) to flag whether each company has the 3 products included in its associated grouped table's Product column:
Then I filter out the FALSE entries from the new custom column (use button at top right of Custom column):
Then I expand the Products column from the embedded table in the AllData column (use button at top right of AllData column).
Then I remove the Custom column:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Company", type text}, {"Product", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Company"}, {{"AllData", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each List.ContainsAll([AllData][Product], {"Product 1","Product 2","Product 3"})),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
#"Expanded AllData" = Table.ExpandTableColumn(#"Filtered Rows", "AllData", {"Product"}, {"Product"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded AllData",{"Custom"})
in
#"Removed Columns"
Basically, you'll need to do a couple of things to do this entirely in Excel:
Add a new table that lists the products, with a column indicating whether that product is included/flagged:
Update your company/product table to have 2 helper columns: One to VLOOKUP whether the product is flagged, and one to indicate whether a company has all 3 flagged products:
The first helper column would use a formula like =VLOOKUP([@Product],tProducts,2,FALSE).
The second helper column would use a formula like =COUNTIFS([Company],[@Company],[Product Flagged],TRUE)>=3.
Rows with a TRUE in Column D have 1 each of Products 1, 2, and 3 (unless you have rows with duplicate company/product combinations, where it gets a bit trickier):
In your pivot table, you can filter by this helper column:

Resources