Generate all Possible Unique Combinations of two Excel Columns

I have the following simplified data set, from which I need to create a unique list while transposing the data in column B. I think I need to use INDEX, but I am unsure of the correct syntax for this scenario.
The data in column B is delimited by a space.
This is what my data looks like:
|---------------------|------------------|
| Column A | Column B |
|---------------------|------------------|
| 1 | AA BB |
|---------------------|------------------|
| 2 | BB CC |
|---------------------|------------------|
| 3 | DD EE |
|---------------------|------------------|
Required result
|---------------------|------------------|
| Column A | Column B |
|---------------------|------------------|
| 1 | AA |
|---------------------|------------------|
| 1 | BB |
|---------------------|------------------|
| 2 | BB |
|---------------------|------------------|
| 2 | CC |
|---------------------|------------------|
| 3 | DD |
|---------------------|------------------|
| 3 | EE |
|---------------------|------------------|

To get your output table from your input table, you can use Power Query from the UI in just a few steps:
Split Column B by the space delimiter.
Select Column A, then choose Unpivot Other Columns.
Delete the extra Attribute column that appears when you unpivot.
This is the M code for that operation:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Column1"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"})
in
#"Removed Columns"
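The query above assumes every cell in the second column holds exactly two values. As a minimal sketch for cells with any number of space-separated values, the column can instead be split into a list and expanded to rows. The inline #table is a hypothetical stand-in for Table1, reusing the Column1/Column2 names from the query above.

```powerquery
let
    // Hypothetical sample data standing in for Table1
    Source = #table({"Column1", "Column2"}, {{1, "AA BB"}, {2, "BB CC"}, {3, "DD EE"}}),
    // Replace each text cell with the list of its space-separated parts
    SplitToList = Table.TransformColumns(Source, {{"Column2", each Text.Split(_, " ")}}),
    // Produce one row per list item, repeating Column1 alongside it
    Expanded = Table.ExpandListColumn(SplitToList, "Column2")
in
    Expanded
```

Because the list is expanded directly to rows, no Attribute column is created and nothing has to be unpivoted afterwards.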

Ron Rosenfeld's answer unpivots the data as the OP indicated by the required result.
If you need to create all combinations of data from two columns (rather than unpivoting), normalize the data by placing each set of values in its own column. In this example, Column B has two data entries per cell, which can be split using Data > Text to Columns. To work with unique entries, either use the standard Excel tool Data > Remove Duplicates, or in Excel Power Query Editor, right click the data column header and click Remove Duplicates.
Create separate queries for each column to be included in the combinations. By adding a custom column with a formula referring to the first data query, Power Query pairs every row of one query with every row of the other (a cross join, or Cartesian product, rather than a full outer join, since no matching key is involved), resulting in all combinations.
Step 1: Data > Text to Columns
(a) Select Column B. In the ribbon, go to Data > Text to Columns.
(b) Split the data on the appropriate delimiter (Space, Tab, etc.).
Step 2: Combine data and remove duplicates
(a) Cut the data from Column C.
(b) Paste it into Column B, below the existing Column B data.
(c) Select Column B and then click Data > Remove Duplicates
(d) If warning pops up about data found next to selection, click "Continue with the current selection"
(e) Select checkbox for Column B and click OK.
Step 3: Create data query for Column A
(a) Select Column A and click Data > From Table/Range
(b) Query Settings > PROPERTIES > Name and enter name "ColumnA"
(c) Home > Close & Load > Close & Load To...
(d) Select: Only Create Connection
Step 4: Create data query for Column B
(a) Select Column B
(b) Data > From Table/Range
(c) Query Settings > PROPERTIES > Name and enter name "ColumnB"
(d) Add Column > Custom Column
(e) New column name: Combinations
(f) Custom column formula: =ColumnA
(g) Expand the new "Combinations" column (icon with left/right arrows)
(h) Drag the "Combinations" column to the left side
(i) Home > Close & Load
Step 5: Sort the output data table
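As a rough sketch of what Steps 3 to 5 amount to in M, the two queries can be combined in a single query. The inline tables are hypothetical stand-ins for the ColumnA and ColumnB queries, and the column names "Column A" and "Column B" are assumptions.

```powerquery
let
    // Hypothetical stand-ins for the ColumnA and ColumnB queries
    ColumnA = #table({"Column A"}, {{1}, {2}, {3}}),
    ColumnB = #table({"Column B"}, {{"AA"}, {"BB"}, {"CC"}, {"DD"}, {"EE"}}),
    // The custom column "=ColumnA": every row of ColumnB gets the whole ColumnA table
    WithNested = Table.AddColumn(ColumnB, "Combinations", each ColumnA),
    // Expanding the nested table yields one row per (Column B, Column A) pair
    Expanded = Table.ExpandTableColumn(WithNested, "Combinations", {"Column A"}),
    // Move Column A to the left and sort the output
    Reordered = Table.ReorderColumns(Expanded, {"Column A", "Column B"}),
    Sorted = Table.Sort(Reordered, {{"Column A", Order.Ascending}, {"Column B", Order.Ascending}})
in
    Sorted
```

With 3 values in one column and 5 unique values in the other, the result has 15 rows, one per combination.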


How to number each occurrence of a substring in a cell in Power Query?

I'm fairly new to Power Query and have hit a hiccup that's been bothering me all day. I've read multiple threads here and on the Power BI community and none has really cleared my question, and my logic suggests a few different options to achieve what I want, but my lack of experience blocks any solution I attempt.
Context:
I'm building a database for product import/export into WooCommerce, eBay and other channels, which takes some inputs from the (non-tech-savvy) users in Excel and derives several of the required fields. One of those is the image file names for each product.
I have these columns (in a much larger query table):
| ImageBaseName | ImageQTY | ImageIDs |
| product-name.jpg | 3 | product-name.jpg product-name.jpg product-name.jpg |
| other-product.jpg| 5 |other-product.jpg other-product.jpg...other-product.jpg |
And my desired output would be:
| ImageBaseName | ImageQTY | ImageIDs |
| product-name.jpg | 3 | product-name-1.jpg product-name-2.jpg product-name-3.jpg |
| other-product.jpg| 5 |other-product-1.jpg other-product-2.jpg...other-product-5.jpg |
In fact I don't need the two first columns if I get the ImageIDs like that.
The ImageBaseName column is generated from the input product name.
The ImageQTY column is direct input by the user.
The ImageIDs column I got so far is from using:
= Table.AddColumn(#"previous step", "ImageIDs", each Text.Trim(Text.Repeat([ImageBaseName] & " ", [ImageQTY])))
And these are the options I've considered thus far:
Option 1: Split ImageIDs with Text.Split, (somehow) count and number each item in the list, and concatenate it all back with Text.Combine. It would probably start like this: Text.Combine(Text.Split,,,
Option 2: Using the UI, split ImageIDs at each space into a high number of columns (I don't know how many images each product will have, but probably no more than 12), assign a number suffix to each of those columns, and then put it all back together. It feels messy as hell.
Option 3: There's probably a clean calculated way to generate the numbered image base names from the number in the second column and then attach the .jpg at the end of each, but honestly I don't know how.
I'd like it to be on the same table as I am already dealing with different queries...
Any help would be gladly accepted.
Starting with this as Table1:
This M code...
let
Source = Table1,
SplitAndIndexImageIDs = Table.AddColumn(Source, "Custom", each Table.AddIndexColumn(Table.FromColumns({Text.Split([ImageIDs]," ")}),"Index",1)),
RenameImageIDs = Table.AddColumn(SplitAndIndexImageIDs, "NewImageIDs", each Text.Combine(Table.AddColumn([Custom],"newcolumn",each Text.BeforeDelimiter([Column1], ".") & "-" &Text.From([Index]) & "." & Text.AfterDelimiter([Column1], "."))[newcolumn],", ")),
#"Removed Other Columns1" = Table.SelectColumns(RenameImageIDs,{"ImageBaseName", "ImageQTY", "NewImageIDs"})
in
#"Removed Other Columns1"
Should give you this result:
Here's a chunky "uber step" piece of code you could put in a custom column, given the ImageBaseName and ImageQTY columns:
Text.Combine
(
List.Transform
(
List.Zip
(
{
List.Repeat({Text.BeforeDelimiter([ImageBaseName], ".", {0, RelativePosition.FromEnd})},[ImageQTY])
,
List.Transform({1..[ImageQTY]}, each "-" & Number.ToText(_) &".")
,
List.Repeat({Text.AfterDelimiter([ImageBaseName], ".", {0, RelativePosition.FromEnd})}, [ImageQTY])
}
)
, each Text.Combine(_)
)
, " "
)
In summary, you create the components of the string as three lists (the text before the file extension, the numbers 1 through the quantity, and the text after the file extension). List.Zip then groups the three components for each image into their own list. Finally, List.Transform with Text.Combine joins each group into one file name, and the outer Text.Combine joins the file names with spaces.
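For comparison, here is a minimal sketch of the same idea without List.Zip, building each numbered name directly from the list 1 through the quantity. The inline #table and the helper name NumberedNames are hypothetical; the column names follow the question.

```powerquery
let
    // Hypothetical sample rows using the question's column names
    Source = #table({"ImageBaseName", "ImageQTY"}, {{"product-name.jpg", 3}, {"other-product.jpg", 5}}),
    // Hypothetical helper: builds "stem-1.ext stem-2.ext ..." for one base name
    NumberedNames = (baseName as text, qty as number) as text =>
        let
            // Split at the last period so only the file extension is removed
            stem = Text.BeforeDelimiter(baseName, ".", {0, RelativePosition.FromEnd}),
            ext = Text.AfterDelimiter(baseName, ".", {0, RelativePosition.FromEnd}),
            numbered = List.Transform({1..qty}, (n) => stem & "-" & Text.From(n) & "." & ext)
        in
            Text.Combine(numbered, " "),
    AddIDs = Table.AddColumn(Source, "ImageIDs", each NumberedNames([ImageBaseName], [ImageQTY]), type text)
in
    AddIDs
```

Splitting at the last period rather than the first keeps base names that contain extra periods intact.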
Let's assume the range Table1 contains two columns, ImageBaseName and Quantity.
Add Column ... Index Column...
Right-click ImageBaseName, Split Column ... By Delimiter ... --Custom--, use a period as the delimiter and split at the right-most delimiter. That will pull the file extension off.
Add Column ... Custom Column ... name it list and use the formula ={1..[Quantity]}, which will create a list of values from 1 to the Quantity.
Click the double arrow at the top of the new list column and choose Expand to New Rows.
Click-select the list, Quantity, ImageBaseName.2 and ImageBaseName.1 columns, then Transform ... Data Type ... Text.
Add Column ... Custom Column ... name it Custom and use the formula =[ImageBaseName.1]&"-"&[list]&"."&[ImageBaseName.2] to put all the parts together.
Right-click Index, Group By ... [x] Basic, group by Index, new column name ImageIDs, operation Count Rows.
That will generate code like this:
Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Table.RowCount(_), type number}})
Use formula bar to change the formula as shown below. It will combine rows using , as a separator
Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Text.Combine([Custom], ", "), type text}})
Full sample code is below that you can paste into Home .. Advanced Editor...
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
#"Split Column by Delimiter" = Table.SplitColumn(#"Added Index", "ImageBaseName", Splitter.SplitTextByEachDelimiter({"."}, QuoteStyle.Csv, true), {"ImageBaseName.1", "ImageBaseName.2"}),
#"Added Custom" = Table.AddColumn(#"Split Column by Delimiter", "list", each {1..[Quantity]}),
#"Expanded list" = Table.ExpandListColumn(#"Added Custom", "list"),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded list",{{"list", type text}, {"Quantity", type text}, {"ImageBaseName.2", type text}, {"ImageBaseName.1", type text}}),
#"Added Custom1" = Table.AddColumn(#"Changed Type1", "Custom", each [ImageBaseName.1]&"-"&[list]&"."&[ImageBaseName.2]),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Index"}, {{"ImageIDs", each Text.Combine([Custom], ", "), type text}})
in #"Grouped Rows"
There are probably many ways to combine all this into one uber step, but I thought I'd show the parts

How to copy a data row from column A to column B, between each data row

Good afternoon,
I have column B, with descriptions in Portuguese, row by row and column D with the translations in English:
I'm trying to insert, in column D, the corresponding Portuguese text under each English row.
But I can't find any formula to do that, and I didn't find any question like this on the forum.
The nearest question I found is about inserting a blank row between data rows, using the formula =MOD(ROW(D2),2)=0 or a filter with an added series, and then retrieving the data with VLOOKUP, as in the attached image.
You can use power query to tackle this task.
I have used the following data for demonstration. Please note I am using Excel 365 English version.
| Portuguese | English |
|------------|---------|
| um | one |
| dois | two |
| três | three |
| quatro | four |
| cinco | five |
| seis | six |
| Sete | seven |
| oito | eight |
| nove | nine |
| dez | ten |
Steps are:
Load/Add the data set to Power Query Editor;
Make a duplicate of the Portuguese column, then add an Index column starting from 1.
Use the Merge Columns function under the Transform tab to merge the English column with the Portuguese - Copy column, using a custom delimiter such as the hash sign # (as long as this delimiter is not part of your original texts).
Use the Split Column function under the Transform tab to split the merged column by the same delimiter #, and in the Advanced options choose to split into Rows.
The Merged column then alternates between each English term and its Portuguese counterpart.
You can choose to remove the Portuguese column if you do not want to show it in the final output, then Close & Load the table to a new worksheet (by default).
Here is the Power Query M code behind the scenes. All the functions used are available in the GUI, so it should be easy to follow and execute.
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Portuguese", type text}, {"English", type text}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Portuguese", "Portuguese - Copy"),
#"Added Index" = Table.AddIndexColumn(#"Duplicated Column", "Index", 1, 1),
#"Merged Columns" = Table.CombineColumns(#"Added Index",{"English", "Portuguese - Copy"},Combiner.CombineTextByDelimiter("#", QuoteStyle.None),"Merged"),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Merged Columns", {{"Merged", Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Merged"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Merged", type text}})
in
#"Changed Type1"
Let me know if you have any questions. Cheers :)
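If no delimiter is guaranteed to be absent from the texts, the merge-and-split step can be avoided. The following is a minimal sketch, not the approach used above: it pairs the two columns with List.Zip and flattens the pairs with List.Combine. The inline #table is a hypothetical three-row sample, and only the interleaved column is kept.

```powerquery
let
    // Hypothetical sample standing in for Table2
    Source = #table({"Portuguese", "English"}, {{"um", "one"}, {"dois", "two"}, {"três", "three"}}),
    // Pair each English term with its Portuguese counterpart: {{"one","um"}, {"two","dois"}, ...}
    Pairs = List.Zip({Source[English], Source[Portuguese]}),
    // Flatten the pairs into one alternating list
    Interleaved = List.Combine(Pairs),
    // Turn the list back into a single-column table
    Result = Table.FromColumns({Interleaved}, {"Merged"})
in
    Result
```

Each English term is followed directly by its Portuguese counterpart, matching the layout produced by the merge-and-split steps.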
Put this formula in the result column and adjust $A$1 to your first Portuguese term and $B$1 to your first translated term:
=OFFSET($A$1;(ROW()-ROW($B$1))/2;0)
You should get a column where every Portuguese term is repeated. Now you can overwrite the formulas in the upper cell of each pair with the English translations.
The formula calculates the difference between the current row and the first translated row and cuts it in half (OFFSET drops the fractional part): that is the position of the Portuguese term to associate with this cell. It then uses that number as the offset from the first row of Portuguese terms.
Now, if you want to have the first row of each pair empty, you can of course wrap the whole formula in the false part of an IF formula:
=IF(MOD(ROW()-ROW($B$1);2)=0;"";OFFSET($A$1;(ROW()-ROW($B$1))/2;0))
That is something you will often do in Excel, and I assume you know that trick. It makes the core formula a little harder to read, but it basically says: if the current row's offset from the first row of this block is even, leave the cell empty; otherwise set it to the formula presented above.

How to update values in more than one column based on another column with M in PowerQuery [duplicate]

I want to do something similar to Power Query Transform a Column based on Another Column, but I'm getting stuck on how to modify the syntax for my particular goal.
Similar to the linked question, assume that I have the following table:
Table 1:
Column A | Column B | Column C
------------------------------
1 | 4 | 7
2 | 5 | 8
3 | 6 | 9
Instead of changing the value of the Column A conditional on Column B, I want to multiply the values in multiple columns (Column B and Column C) by those in Column A and replace the values in the initial columns so that I can get the following:
Table 1:
Column A | Column B | Column C
------------------------------
1 | 4 | 7
2 | 10 | 16
3 | 18 | 27
Is this possible to do without using multiple sequences of Table.AddColumn followed by Table.RemoveColumns?
I have also tried Table.TransformColumns based on this, but not been able to get the syntax right to achieve this.
Table.TransformColumns won't give you Column A unless you can index back into the table, which will only be possible if your columns only have unique data.
Table.TransformRows will let you build new rows with whatever logic you want:
let
Source = Csv.Document("Column A,Column B,Column C
1,4,7
2,5,8
3,6,9"),
PromotedHeaders = Table.PromoteHeaders(Source),
ChangedType = Table.TransformColumnTypes(PromotedHeaders,{{"Column A", type number}, {"Column B", type number}, {"Column C", type number}}),
MultipliedRows = Table.FromRecords(Table.TransformRows(ChangedType,
each [
Column A = [Column A],
Column B = [Column A] * [Column B],
Column C = [Column A] * [Column C]
]))
in
MultipliedRows
This works well for columns B and C, but if you need B through Z you might want fancier logic to avoid repeating yourself.
EDIT: A more general solution for many columns is to use Record.TransformFields on a list of transforms for all column names except "Column A".
let
Source = Csv.Document("Column A,Column B,Column C,D,E,F
1,4,7,1,2,3
2,5,8,4,5,6
3,6,9,7,8,9"),
PromotedHeaders = Table.PromoteHeaders(Source),
ChangedType = Table.TransformColumnTypes(PromotedHeaders,{{"Column A", type number}, {"Column B", type number}, {"Column C", type number}, {"D", type number}, {"E", type number}, {"F", type number}}),
MultipliedRows = Table.FromRecords(Table.TransformRows(ChangedType, (row) =>
let
ColumnA = row[Column A],
OtherColumns = List.RemoveItems(Record.FieldNames(row), {"Column A"}),
Transforms = List.Transform(OtherColumns, (name) => { name, (cell) => cell * ColumnA })
in
Record.TransformFields(row, Transforms)))
in
MultipliedRows
I think Table.AddColumn followed by Table.RemoveColumns is the usual and clearest way to do this transformation. I'm also not happy that it results in so many steps in Power Query.
But because of Power Query's query folding, collapsing them into fewer steps will usually not result in better performance. (Power Query tries to hand the main work back to the queried database, if available.)
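For reference, a minimal sketch of that add-then-remove pattern on the question's sample table. The temporary column names "B new" and "C new" are hypothetical, and the final rename step is an assumption about how the original names would be restored.

```powerquery
let
    // Sample table from the question
    Source = #table({"Column A", "Column B", "Column C"}, {{1, 4, 7}, {2, 5, 8}, {3, 6, 9}}),
    // Add the multiplied values as new columns
    AddB = Table.AddColumn(Source, "B new", each [Column A] * [Column B], type number),
    AddC = Table.AddColumn(AddB, "C new", each [Column A] * [Column C], type number),
    // Remove the original columns, then give the new ones the original names
    RemovedOld = Table.RemoveColumns(AddC, {"Column B", "Column C"}),
    Renamed = Table.RenameColumns(RemovedOld, {{"B new", "Column B"}, {"C new", "Column C"}})
in
    Renamed
```

Each additional column needs its own Table.AddColumn step, which is the step count the paragraph above complains about.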
Assuming this doesn't need to be VBA and/or programmatic, you can just copy values in the first column, then highlight the values in the second column, and "Paste Special..." > Multiply.
That will produce the results in the same place you paste the multiplier.

Filter companies that have at least 3 specific products

I have an Excel pivot table (with a table dataset behind it) structured like the one below. How can I filter/show only companies (Col A) that have Products (Col B) 1 AND 2 AND 3? It sounds easy, but I can't find a way to do it. A solution using Power Query (available in Power BI or Excel) would be fine.
A1: Company 1 | B1: Product 1
A2: Company 1 | B2: Product 2
A3: Company 1 | B3: Product 3
A4: Company 1 | B4: Product 4
A5: Company 2 | B5: Product 1
A6: Company 3 | B6: Product 1
A7: Company 4 | B7: Product 1
A8: Company 4 | B8: Product 2
A9: Company 4 | B9: Product 3
A10: Company 4 | B10: Product 4
A11: Company 4 | B11: Product 5
Here's an approach using Power Query.
Starting with this brought into Power Query from the table in Excel:
I then group on Company (Transform > Group By):
Then I add a new custom column (Add Column > Custom Column) to flag whether each company has the 3 products included in its associated grouped table's Product column:
Then I filter out the FALSE entries from the new custom column (use button at top right of Custom column):
Then I expand the Products column from the embedded table in the AllData column (use button at top right of AllData column).
Then I remove the Custom column:
Here's the M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Company", type text}, {"Product", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Company"}, {{"AllData", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each List.ContainsAll([AllData][Product], {"Product 1","Product 2","Product 3"})),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
#"Expanded AllData" = Table.ExpandTableColumn(#"Filtered Rows", "AllData", {"Product"}, {"Product"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded AllData",{"Custom"})
in
#"Removed Columns"
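A variation on the same idea, as a sketch only: instead of expanding the grouped tables, collect the names of the qualifying companies and filter the original rows against that list. The inline #table is a hypothetical copy of the question's data, and the step names are assumptions.

```powerquery
let
    // Hypothetical copy of the question's data
    Source = #table({"Company", "Product"}, {
        {"Company 1", "Product 1"}, {"Company 1", "Product 2"}, {"Company 1", "Product 3"},
        {"Company 1", "Product 4"}, {"Company 2", "Product 1"}, {"Company 3", "Product 1"},
        {"Company 4", "Product 1"}, {"Company 4", "Product 2"}, {"Company 4", "Product 3"},
        {"Company 4", "Product 4"}, {"Company 4", "Product 5"}}),
    Required = {"Product 1", "Product 2", "Product 3"},
    // One row per company, holding the list of its products
    Grouped = Table.Group(Source, {"Company"}, {{"Products", each [Product], type list}}),
    // Names of the companies whose product list contains every required product
    Qualifying = Table.SelectRows(Grouped, each List.ContainsAll([Products], Required))[Company],
    // Keep only the original rows of those companies
    Result = Table.SelectRows(Source, each List.Contains(Qualifying, [Company]))
in
    Result
```

With this sample data only the Company 1 and Company 4 rows remain, and any other columns in the source table are kept without having to list them in an expand step.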
Basically, you'll need to do a couple of things to do this entirely in Excel:
Add a new table that lists the products, with a column indicating whether that product is included/flagged:
Update your company/product table to have 2 helper columns: One to VLOOKUP whether the product is flagged, and one to indicate whether a company has all 3 flagged products:
The first helper column would use a formula like =VLOOKUP([@Product],tProducts,2,FALSE).
The second helper column would use a formula like =COUNTIFS([Company],[@Company],[Product Flagged],TRUE)>=3.
Rows with a TRUE in Column D have 1 each of Products 1, 2, and 3 (unless you have rows with duplicate company/product combinations, where it gets a bit trickier):
In your pivot table, you can filter by this helper column:
