Excel: How to analyze data in a table that contains multivalue cells

I am working on a science project about insects, and I have been logging information about the insects I find along the way. I now realize that it was a bad decision to record the names of all the insects found in each observation in a single cell.
I am not allowed to provide too much information because it is confidential, but the following table is a similar example of my case:
# of sample | insect (family)
1 | Dermestidae, Histeridae
2 | Histeridae, Dichotumius
3 | Histeriade
4 | Dermestidae, Histeridae
5 | Cleridae, Dichotumius
485 | Histeriade
486 | Dermestidae, Histeridae
487 | Dermestidae, Cleridae
488 | Histeriade
Something like the above table. In my actual table, I have cells with 5 or 6 different insects. The thing is:
How can I search for all the different values? I mean, I want to create a table that contains all the different values and how many times each one appears, something like the following table:
Insect (family) | Count
Cleridae | 54
Histeridae | 154
Dermestidae | 34
(There are at least 100 different insects, and some of them appear just once, so it is impossible for me to search for all the different names manually.)
Furthermore, I was thinking about converting my table to a long structure, something like the following.
Instead of this:
# of sample | insect (family)
1 | Dermestidae, Histeridae
2 | Histeridae, Dichotumius
3 | Histeriade
4 | Dermestidae, Histeridae
5 | Cleridae, Dichotumius
I want this:
# of sample | insect (family)
1 | Dermestidae
1 | Histeridae
2 | Histeridae
2 | Dichotumius
3 | Histeriade
4 | Dermestidae
4 | Histeridae
5 | Cleridae
5 | Dichotumius
I think this arrangement would be better than the one I have now.
I hope someone can help me with this issue. Thanks so much.
I tried the above, but I didn't get it to work. That's the reason I am asking for help.

What you are trying to accomplish is called unpivoting data. Power Query is best for this case. If you want to do it with a formula, you can try the following:
=DROP(REDUCE(0,REDUCE(0,B2:B6,LAMBDA(a,x,VSTACK(a,CONCAT(CHOOSEROWS(A2:A6,ROW(x)-1)&"|")&TEXTSPLIT(x,,",")))),LAMBDA(p,q,VSTACK(p,TEXTSPLIT(q,"|")))),2)

This can be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
I am uncertain if you want just the unpivoted table, the Counts of each family, or something else, but I have shown the results at each of the last two steps in the query. You can use what you need.
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
    //Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Insects"]}[Content],
    //set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"# of sample", Int64.Type}, {"insect (family)", type text}}),
    //Split Insect Family column by the comma, into rows
    #"Split Column by Delimiter" =
        Table.ExpandListColumn(
            Table.TransformColumns(#"Changed Type", {{"insect (family)",
                Splitter.SplitTextByDelimiter(
                    ",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
            "insect (family)"),
    //Remove any leading and trailing spaces => your unpivoted table
    #"Trim Spaces" = Table.TransformColumns(#"Split Column by Delimiter", {"insect (family)", each Text.Trim(_), type text}),
    //To create your table with counts, merely Group by the insect (family) column and aggregate with Count
    #"Grouped Rows" = Table.Group(#"Trim Spaces", {"insect (family)"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"
Source Data
Next to last step showing unpivoted table
Last Step

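To answer 'How can I search for all the different values?': the formula below creates a unique list of the insect families (where the insect families are in range B2:B100).
=UNIQUE(TEXTSPLIT(TEXTJOIN(", ",TRUE,B2:B100),"|",", ",TRUE))
You can then use a COUNTIF() formula to find how many samples contain each family. For example, if the unique list starts in D2 (an assumed location), =COUNTIF($B$2:$B$100,"*"&D2&"*") counts the cells that contain that name; the wildcards are needed because a cell can hold several names.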
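For readers who want to check the result outside Excel, here is a minimal Python sketch of the same split-and-count idea. It is not part of any answer above; the sample rows are copied from the question, and the names to_long, long_rows and counts are made up for illustration.

```python
from collections import Counter

# Sample rows copied from the question: (sample number, cell text).
rows = [
    (1, "Dermestidae, Histeridae"),
    (2, "Histeridae, Dichotumius"),
    (3, "Histeriade"),
    (4, "Dermestidae, Histeridae"),
    (5, "Cleridae, Dichotumius"),
]


def to_long(rows):
    """Split each comma-separated cell into one (sample, family) pair per name."""
    long_rows = []
    for sample, cell in rows:
        for name in cell.split(","):
            name = name.strip()  # drop the space that follows each comma
            if name:
                long_rows.append((sample, name))
    return long_rows


# Long ("unpivoted") table, then the number of rows per family.
long_rows = to_long(rows)
counts = Counter(family for _, family in long_rows)

for family, n in sorted(counts.items()):
    print(family, n)
```

Note that Histeriade and Histeridae are counted as different families here, so spelling variants in the source data would need to be cleaned up first, whichever tool is used.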

Related

How to convert categorical values into columns in Excel?

I am working with a dataset that is structured like the one below. As you can see, the indicator column contains binary categorical data.
country_code indicator cumulative_count
AFG cases 52909
AFG deaths 2230
... ... ...
I would like to turn the indicator column into two separate columns (corresponding with the values of indicator: cases and deaths). I.e. I'm expecting the final result to be like this:
country_code cases deaths
AFG 52909 2230
... ... ...
Notes:
The original dataset is publicly accessible from the ECDC website.
I am only interested in the cumulative_count of one specific year_week (2020-53).
Here is a screenshot of the dataset:
This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
To use Power Query
Load your data table into Excel
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
    //Read in the table
    //Change table name in next line to your actual table name
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    //Remove the unneeded columns
    #"Removed Other Columns" = Table.SelectColumns(Source,{"country_code", "indicator", "year_week", "cumulative_count"}),
    //Set the data types for those columns
    #"Set Data Type" = Table.TransformColumnTypes(#"Removed Other Columns",{
        {"country_code", type text}, {"indicator", type text},{"year_week", type text},{"cumulative_count", Int64.Type}
        }),
    //Pivot the Indicator column and aggregate by Sum
    #"Pivoted Column" = Table.Pivot(#"Set Data Type",
        List.Distinct(#"Removed Other Columns"[indicator]), "indicator", "cumulative_count", List.Sum),
    //Filter to show only the relevant year-week for rows where there is a country_code
    // (the others refer to continents)
    #"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([country_code] <> null) and ([year_week] = "2020-53"))
in
    #"Filtered Rows"
filtered to show just 2020-53
If I'm understanding your question correctly, here is one way:
Add a new column F
Formula in $F$2: =SUMIFS($D$2:$D$9999, $B$2:$B$9999, $B2, $E$2:$E$9999, "deaths")
Copy the formula down through the last record
Filter column E for "cases"
If you then insert rows above the header row, you can use SUBTOTAL(109, ...) to view cumulative counts for a specific year, or alternatively add another column with SUMIF as shown above
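The pivot step can also be sketched in plain Python, as a rough cross-check of the logic rather than a replacement for either answer. The two AFG rows for 2020-53 come from the question; the third row and the function name pivot_indicator are made up for illustration. The sketch filters before pivoting, whereas the M code pivots first, but the result for a single year_week should be the same.

```python
# Long-format rows: (country_code, indicator, year_week, cumulative_count).
rows = [
    ("AFG", "cases", "2020-53", 52909),
    ("AFG", "deaths", "2020-53", 2230),
    ("AFG", "cases", "2020-52", 1),  # made-up row for another week, should be ignored
]


def pivot_indicator(rows, year_week):
    """Return {country_code: {indicator: summed count}} for one year_week."""
    wide = {}
    for country, indicator, week, count in rows:
        # Skip rows without a country code and rows from other weeks.
        if country is None or week != year_week:
            continue
        by_indicator = wide.setdefault(country, {})
        by_indicator[indicator] = by_indicator.get(indicator, 0) + count
    return wide


result = pivot_indicator(rows, "2020-53")
print(result)
```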

VERY messy data: How do I clean up horizontal data that is all inconsistent? [VBA] [closed]

I keep trying ways to fix a dataset, but keep running into problems because of how inconsistent it is.
Here's what the data looks like:
Entry1
Age
45
Occupation
Scientist
Phone Number
408-283-3721
User I.D.
390842
Housing Type
Condo
Square Footage
1073.29
Floors
2
Bathrooms
2.5
Budget Max
$289,287
Household Size
3
Pets?
Yes
Entry2
Floors
2
Square Footage
1974.19
User I.D.
379733
Phone Number
312-246-9121
Pets?
No
Budget Max
$481,621
Household Size
4
Bathrooms
3
Housing Type
Apartment
Occupation
Pilot
Age
32
Entry3
User I.D.
379621
Floors
1
Square Footage
1223.12
Pets?
No
Occupation
Managing Director
Budget Max
$402,342
Phone Number
714-343-1358
Household Size
2
Age
31
Bathrooms
2
Housing Type
House
I want to create a new, cleaned dataset with headers along the top (e.g. "Age", "Occupation", etc) and the values associated (to the right of each variable name cell) as the row, underneath each column.
The variable names are all mixed up, not always on the same column or relative row, so it's not only transposing into a clean new dataset but finding the appropriate values depending on where the variable is (so, I'm thinking something like .Cells.Find(What:="the variable name") for each one and somehow returning the value next to it in a loop). Then, there's the issue where some entries have 3 rows and 8 columns and others 4 rows and 6 columns (not all rows being full too). I also struggle with placing the values under the appropriate column header and not replace the former value. (i.e not just changing one cell but adding to the one below and so on)
There are over 400 records like this, so doing it manually would be super tedious. I'm fairly certain these are all the variations though.
Loop through the data row by row.
If only the first column has data, it is the header of an entry. Write that to column A of a new workbook.
Entry Name
Entry1
Then go to the next row. If more than 2 columns have data, it is a data row belonging to the previous entry. Data rows contain data in blocks of 2 cells, where the first cell is the data description and the second cell is the data value.
So you need to loop through the columns of the data rows in blocks of 2:
Take the first block, which is Age | 45
Check if the column Age exists. Here it does not, so we name the next free column Age and fill in the data for the last entry
Entry Name | Age
Entry1 | 45
Then we move on to the next block, Occupation | Scientist, and do the same. Does a column Occupation exist? No, so insert the next free column:
Entry Name | Age | Occupation
Entry1 | 45 | Scientist
We do this until the entire row is done, then we move over to the next one and if this is a data row too, we keep going until we find a new entry header.
So after the first entry your data would look like this:
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | Yes
Then you move over to the next entry
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | Yes
Entry2
The first data set here is Floors | 2, so you search the first row for Floors; it is found in column 8. So we write 2 into column 8.
Entry Name | Age | Occupation | Phone Number | User I.D. | Housing Type | Square Footage | Floors | Bathrooms | Budget Max
Entry1 | 45 | Scientist | 408-283-3721 | 390842 | Condo | 1073.29 | 2 | 2.5 | $289,287 | Yes
Entry2 | | | | | | | 2
If you keep that going, you have cleaned-up data in the end.
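The loop described above can be sketched as follows. This is a minimal Python illustration of the algorithm, not VBA and not the answerer's code; it assumes the sheet has already been read into a list of rows (each a list of cell values, with None for empty cells), and the name tidy is made up. The sample values are taken from the question, but their arrangement into rows is an assumption.

```python
def tidy(grid):
    """Turn entry-header rows followed by label/value rows into one record per entry."""
    headers = ["Entry Name"]  # column order of the cleaned table
    records = []              # one dict per entry
    for row in grid:
        filled = [c for c in row if c not in (None, "")]
        if not filled:
            continue  # skip completely empty rows
        # Only the first column has data: this row starts a new entry.
        if len(filled) == 1 and row[0] not in (None, ""):
            records.append({"Entry Name": row[0]})
            continue
        if not records:
            continue  # data row before any entry header: nothing to attach it to
        # Data row: walk the cells in blocks of 2 (label, value).
        for i in range(0, len(row) - 1, 2):
            label, value = row[i], row[i + 1]
            if label in (None, ""):
                continue
            if label not in headers:
                headers.append(label)  # next free column gets this name
            records[-1][label] = value
    return headers, records


grid = [
    ["Entry1", None, None, None],
    ["Age", 45, "Occupation", "Scientist"],
    ["Floors", 2, None, None],
    ["Entry2", None, None, None],
    ["Floors", 2, "Age", 32],
]
headers, records = tidy(grid)
for rec in records:
    print([rec.get(h) for h in headers])
```

Looking each value up by its label, rather than by position, is what lets the entries list their parameters in any order.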
If your real data corresponds to your example, where all the parameters are spelled identically, you can do this using Power Query.
If there are variations in your data that this table doesn't show, examples of these variations would be needed to craft a better solution.
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
Select My data does NOT have headers
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code (Modified to deal with missing Parameter Values)
let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    //Add grouping column Entries and Unpivot
    #"Added Custom" = Table.AddColumn(Source, "Entry", each
        if Text.StartsWith([Column1],"entry",Comparer.OrdinalIgnoreCase) then [Column1] else null),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Entry"}),
    //Remove extra entry rows
    remRows = Table.SelectRows(#"Filled Down", each [Entry] <> [Column1]),
    //Table.ReplaceValue(#"Removed Columns1"," ",null,Replacer.ReplaceValue,{"Value"})
    //Replace nulls with space so we don't lose one item of a "pair"
    #"Replaced Value" = Table.ReplaceValue(remRows,null," ",Replacer.ReplaceValue,Table.ColumnNames(remRows)),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"Entry"}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
    #"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 0, 1, Int64.Type),
    #"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "Integer-Division", each Number.IntegerDivide([Index], 2), Int64.Type),
    #"Removed Columns1" = Table.RemoveColumns(#"Inserted Integer-Division",{"Index"}),
    //Group in pairs
    //Mark blank subTables
    //Extract Entry, Parameter and Value
    #"Grouped Rows" = Table.Group(#"Removed Columns1", {"Integer-Division"}, {
        {"Empties", each List.NonNullCount(List.ReplaceValue(_[Value]," ", null,Replacer.ReplaceValue))},
        {"Entry", each _[Entry]{0}},
        {"Parameter", each _[Value]{0}},
        {"Value", each _[Value]{1}}
        }),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Empties] <> 0)),
    #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows",{"Integer-Division", "Empties"}),
    //Group by Entry, Pivot and expand
    #"Grouped Rows1" = Table.Group(#"Removed Columns2", {"Entry"}, {
        {"Pivot", each Table.Pivot(_, _[Parameter], "Parameter","Value")}
        }),
    //Column name list for the Pivot table.
    //Could sort or order them any way you want
    colNames = List.Distinct(#"Removed Columns2"[Parameter]),
    #"Expanded Pivot" = Table.ExpandTableColumn(#"Grouped Rows1", "Pivot", colNames,colNames)
in
    #"Expanded Pivot"
Original Data
Transformed

Sum multiple rows based on duplicate column data without formula

Based on the data in columns A to D (there can be hundreds of columns), I want to sum all the rows for columns E to K (again, there can be hundreds of columns).
The rows should be summed based on duplicate data in columns A to D; the required result is as below.
This is easy to do with SUMIF, but I would like to know whether it is possible natively in Excel or Power Query without creating a unique ID for each column or using SUMIF or a formula of any sort.
In powerquery .. unpivot, group, pivot, done.
More detail:
Click select first 4 columns, right click, unpivot other columns
Click select first 4 columns and the new Attribute column, right click, group by
Use Operation:Sum on Column:Value name:count and hit OK
Click select Attribute column and transform .. pivot column... , for value column choose count
File Close and load
Full sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Code1", "Code2", "Code3", "Code4"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Code1", "Code2", "Code3", "Code4", "Attribute"}, {{"Count", each List.Sum([Value]), type number}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count", List.Sum)
in #"Pivoted Column"
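The unpivot, group, pivot sequence can be traced in a few lines of Python. This is only a sketch of the idea for readers who want to follow the three steps outside Power Query; the function name sum_duplicates and the tiny sample table are made up for illustration. Dropping empty cells during the unpivot mirrors what Table.UnpivotOtherColumns does with nulls.

```python
def sum_duplicates(rows, key_cols):
    """rows: list of dicts (one per table row); key_cols: columns that identify duplicates."""
    # 1. Unpivot: one (key, column name, value) triple per non-empty value cell.
    long_rows = [
        (tuple(r[k] for k in key_cols), col, val)
        for r in rows
        for col, val in r.items()
        if col not in key_cols and val is not None
    ]
    # 2. Group by key and column name, summing the values.
    totals = {}
    for key, col, val in long_rows:
        totals[(key, col)] = totals.get((key, col), 0) + val
    # 3. Pivot: spread the column names back out into one row per key.
    wide = {}
    for (key, col), total in totals.items():
        wide.setdefault(key, dict(zip(key_cols, key)))[col] = total
    return list(wide.values())


# Hypothetical sample table with two key columns and two value columns.
rows = [
    {"Code1": "ERT", "Code2": "EXC", "2-Jul-20": 10, "3-Jul-20": None},
    {"Code1": "ERT", "Code2": "EXC", "2-Jul-20": 2, "3-Jul-20": 3},
    {"Code1": "CON", "Code2": "HOR", "2-Jul-20": 6, "3-Jul-20": None},
]
result = sum_duplicates(rows, ["Code1", "Code2"])
print(result)
```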
To solve a problem like this, I first do a concrete example and then generalize it. I made a small table in Excel like so:
Code1 Code2 2-Jul-20 3-Jul-20 4-Jul-20 5-Jul-20 6-Jul-20
ERT EXC 10 6 15 2
ERT EXC 2 3 23 1
CON HOR 3
CON HOR 6 2 356 3
Then I clicked within the table and created a Power Query referencing it. After opening the Power Query Editor, there is a Group By function on the Home tab. It's pretty straightforward to choose the columns you want and the Sum function in a toy example like this.
Then, I opened the Advanced Editor to see what code was auto-generated. It looked something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows orig" = Table.Group(Source, {"Code1", "Code2"}, {{"2-Jul-20", each List.Sum([#"2-Jul-20"]), type nullable number}, {"3-Jul-20", each List.Sum([#"3-Jul-20"]), type nullable number}, {"4-Jul-20", each List.Sum([#"4-Jul-20"]), type nullable number}, {"5-Jul-20", each List.Sum([#"5-Jul-20"]), type nullable number}, {"6-Jul-20", each List.Sum([#"6-Jul-20"]), type nullable number}})
in
#"Grouped Rows orig"
Typically, a Power Query expression is a series of transformations applied to a table, where each one operates on the table as returned from the previous. Here, we start with the original table as "Source" and then do the grouping. The parameters are a little messy, but what we have is: (1) the input table, (2) a list of the column names to group by, and (3) a list of 3-item lists, each of which describes an aggregated column. The sublists have the output column name, the function that does the aggregation, and the data type.
In Power Query, "each" is syntactic sugar for a single parameter function whose parameter is just an underscore. But also, when you have a record or row, you can just use [column] instead of _[column].
So how to generalize the operation you want to do? My first thought is that a convenient grouping function should have two parameters, based on your description. The first is the table to group, and the second is the number of columns starting from the left to group by. If you don't have them arranged contiguously, of course, you could do something else.
sumFromColumn = (t, n) => let
        cList = Table.ColumnNames(t),
        toGroup = List.FirstN(cList, n),
        toSum = List.RemoveFirstN(cList, n),
        // each group arrives as a table, so the column is read with Table.Column
        sumFunc = (cName) => {cName, each List.Sum(Table.Column(_, cName)), type nullable number}
    in Table.Group(t, toGroup, List.Transform(toSum, each sumFunc(_))),
#"Grouped Rows" = sumFromColumn(Source, 2), // Group by the first 2 columns and sum the rest
Here is the generalized function I made, which appears to match the original Table.Group operation that was generated by the interface.
The let statement arranges things for readability but does not imply a particular sequence that they happen in. Power Query figures out the dependencies and executes the statements in whatever order is needed.
The list of column names of the table is defined as cList, and split into toGroup and toSum. Then, sumFunc is defined as a function taking a column name and returning the 3-item list needed to define an aggregation operation. In Power Query, functions can return other functions any which way. So here we are defining a function that returns a list, with a function in it. Then we can use List.Transform to take the list of aggregated columns and turn it into the appropriate parameters for Table.Group.
Finally, the actual group by is done with a call like sumFromColumn(Source, 2), which is equivalent to the original statement that hard-codes the column names.
Code1 Code2 2-Jul-20 3-Jul-20 4-Jul-20 5-Jul-20 6-Jul-20
ERT EXC 12 3 6 38 3
CON HOR 6 5 356 3
This can easily be changed to sumFromColumn(Source, 1), in which case it will reduce to two rows, but the second column, being non-numeric, will become error values.
Or, you can use sumFromColumn(Source, 3), which will not add things up because the group by columns taken together are distinct.
This way you can easily aggregate any number of columns without caring about their names. I recommend both the Power Query M documentation on microsoft.com and reading about functional programming in general.
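For comparison, the same "group by the first n columns, sum the rest" idea can be written in Python. This is a rough analogue rather than a translation of the M function; the name sum_from_column is made up, and the positions of the empty cells in the two ERT rows are inferred from the totals shown above, not given in the original table.

```python
def sum_from_column(rows, n):
    """Group rows (dicts sharing one column order) by the first n columns and sum the rest."""
    if not rows:
        return []
    cols = list(rows[0].keys())
    to_group, to_sum = cols[:n], cols[n:]
    grouped = {}
    for r in rows:
        key = tuple(r[c] for c in to_group)
        # Start every sum as None so a column with no values stays empty.
        sums = grouped.setdefault(key, {c: None for c in to_sum})
        for c in to_sum:
            if r[c] is not None:
                sums[c] = r[c] if sums[c] is None else sums[c] + r[c]
    return [{**dict(zip(to_group, key)), **sums} for key, sums in grouped.items()]


# The two ERT rows from the example; None marks an (inferred) empty cell.
rows = [
    {"Code1": "ERT", "Code2": "EXC", "2-Jul-20": 10, "3-Jul-20": None,
     "4-Jul-20": 6, "5-Jul-20": 15, "6-Jul-20": 2},
    {"Code1": "ERT", "Code2": "EXC", "2-Jul-20": 2, "3-Jul-20": 3,
     "4-Jul-20": None, "5-Jul-20": 23, "6-Jul-20": 1},
]
result = sum_from_column(rows, 2)
print(result)
```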

Consolidate entries in Excel

I'm attempting to consolidate the unique dates that some students do their homework. Goal is to get the unique number of entries of date&name (i.e. multiple entries of one person on same date counts as one), and ideally the output can be the following. I thought about using arrays or pivot table but I can't think of any non-manual way to do this. Thanks a lot (pardon my poor formatting...).
(Note: the actual problem involves a wide range of dates and >100 names).
Input
Date Name Quan
10/22/2019 Amy 4
10/10/2019 Amy 3
10/23/2019 Amy 1
10/23/2019 Amy 3
10/10/2019 Amy 5
1/31/2011 Cathy 5
1/31/2011 Cathy 2
10/23/2019 Cathy 1
1/31/2011 Cathy 4
1/31/2011 Cathy 5
Output
Date Name Quan
10/23/2019 Amy 4
10/22/2019 Amy 4
10/10/2019 Amy 8
10/23/2019 Cathy 1
1/31/2011 Cathy 16
If you have O365 with dynamic arrays, you can use all formulas
To get the unique list of Date/Name, sorted in the order you show:
eg: F1: =SORT(UNIQUE(INDEX(Table1,SEQUENCE(ROWS(Table1)),{1,2})),{2,1},{1,-1})
Results will spill to the appropriate rows
To get the sums:
eg: H2: =SUMIFS(Table1[Quan],Table1[Date],F2,Table1[Name],G2)
Note that I used a Table and structured references, but you can use regular range references if you prefer. Structured references have the ability of automatically expanding/contracting if you add/remove data from your table.
Note:
If you don't have the latest functions, you can still use formulas:
To create a unique list, you'll need a helper column in, eg: K
K2: =IFERROR(INDEX(Table1[Date]&Table1[Name],MATCH(0,COUNTIF($K$1:K1,Table1[Date]&Table1[Name]),0)),"")
Fill down until Blanks
Then, for the Date Column:
L2: =--LEFT(K2,5)
Name column:
M2: =MID(K2,6,99)
And use SUMIFS as before to get the SUM.
You'll need to sort using the Data/Sort tab.
Another method is by using Power Query aka Get & Transform (available in Excel 2010+) to
Group By Date and Name
Aggregate with Sum
MCode
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Name", type text}, {"Quan", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Date", "Name"}, {{"Quant", each List.Sum([Quan]), type number}}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Date", Order.Descending}, {"Name", Order.Ascending}})
in
    #"Sorted Rows"
I assume you have three columns: Date, Name, and Quan?
If that's the case, you can try using the CONCATENATE formula to combine Date and Name, then use the new "Date & Name" column and the "Quan" column to make a pivot table.
CONCATENATE result should look like this
Pivot table will look for the same "Date & Name" and give you the "Quan" sum you want.
If you use Office 2016 or higher (or you have installed the Power Query add-in) and you have a lot of data rows, group the data with Power Query, without any formula.
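The grouping that all of these answers perform can be checked against the question's sample with a short Python sketch. It is an illustration only; the function name consolidate is made up, and the sort order (Name ascending, then Date descending) is chosen to match the output shown in the question.

```python
from collections import defaultdict
from datetime import date

# Input rows copied from the question: (Date, Name, Quan).
rows = [
    (date(2019, 10, 22), "Amy", 4),
    (date(2019, 10, 10), "Amy", 3),
    (date(2019, 10, 23), "Amy", 1),
    (date(2019, 10, 23), "Amy", 3),
    (date(2019, 10, 10), "Amy", 5),
    (date(2011, 1, 31), "Cathy", 5),
    (date(2011, 1, 31), "Cathy", 2),
    (date(2019, 10, 23), "Cathy", 1),
    (date(2011, 1, 31), "Cathy", 4),
    (date(2011, 1, 31), "Cathy", 5),
]


def consolidate(rows):
    """Sum Quan for each unique (Date, Name) pair."""
    totals = defaultdict(int)
    for day, name, quan in rows:
        totals[(day, name)] += quan
    # Name ascending, then Date descending, as in the question's output.
    return sorted(
        ((day, name, quan) for (day, name), quan in totals.items()),
        key=lambda r: (r[1], -r[0].toordinal()),
    )


result = consolidate(rows)
for day, name, quan in result:
    print(day.isoformat(), name, quan)
```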

How to use Countifs,Or and Sumproduct efficiently

I have a list of accounts with 2-digit modifiers. Some accounts will have more than one modifier. I am looking for accounts with certain combinations of modifiers.
So I have a list of accounts in the B column.
I have the modifiers in C Column
Example
Act # Modifier
111 80
111 56
111
222 55
222
333 51
333 50
333
I have some working code that works great until I get too many rows.
In this sample formula I have 8 Modifier groups.
50,22,51,62
51,22,62
54,50,51
55,50,51
56,50,51
80,50,51
"AS",50,51
59,50
=IF(OR(SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{50,22,51,62}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{51,22,62}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{54,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{55,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{56,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{80,50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{"AS",50,51}))>=2,SUMPRODUCT(COUNTIFS(B:B,B3,C:C,{59,50}))>=2),"Check","")
This code will put check by any account that has 2 or more of the modifiers from any of the 8 groups. It has to be 2 modifiers from the same group though.
I was just wondering if there is a better way to write this? Instead of doing all these or can I just do OR for the different modifier criteria I am looking for?
Something like
=COUNTIFS(H:H,H5,I:I,OR({59,50},{"AS",50,51}))
As requested by #SkysLastChance, I will post my solution using Power Query (PQ) even though the question was tagged to Excel-Formula.
Please note you MUST use Excel 2010 or a later version, otherwise you will not be able to use Power Query. My answer might not be robust enough for people who have not used PQ before, so feel free to leave a question if you are unclear about any particular step.
Step 1
Convert the Account List and Modifier Group in the example into Table in your excel worksheet. One way of doing that is to highlight the data including headers and press Ctrl+T. Then you should get two tables as shown below. I have named the first table as Tbl_ActList, and named the second one as Tbl_MoGrp.
Please note I have added some data to the Account List table for result testing purpose.
Step 2
Select any cell within a table, go to the Data tab on top of your excel (mine is Excel 2016), click From Table in the Get & Transform section. It will load and add the table to the built-in PQ Editor. You can exit the editor (and keep the changes), and repeat this step to add the second table to the PQ Editor. Alternatively you can add a new query in the PQ Editor and find the second table from your excel worksheet. I will not demonstrate this process as you can google the know-how later on.
Step 3
Once you have added both tables to the editor, you can start editing/transforming data in each table/query using built-in functions and/or advanced coding. In this case I only used built-in functions.
For the Modifier Group table, I want to transform the original data into a 2-Column list with one column showing which Group the modifier belongs to, and the other column showing a single modifier.
Firstly, use the Split Column function in the Transform tab to split the original modifier groups into single value by using , (comma) as the delimiter.
The new table is in a matrix structure, which is not ideal for lookup purposes, so I used the Unpivot Columns function in the Transform tab to convert it into a list structure. What I actually did was highlight the Grp column and select Unpivot Other Columns to get the list. Alternatively you can highlight the first four columns and use Unpivot Columns to get the same list.
Then I renamed Value column as Modifier, and removed the Attribute column to end up a 2-Column table.
Please note all data in each table/query in this example have been set to 'Text' format (aka data type). Data type is very sensitive and specific in PQ, and incorrect data type may lead to error.
Here is the full code behind the scene. All steps are performed using the built-in functions without any advanced coding:
let
    Source = Excel.CurrentWorkbook(){[Name="Tbl_MoGrp"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Modifier", type text}, {"Grp", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Modifier", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Modifier.1", "Modifier.2", "Modifier.3", "Modifier.4"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Modifier.1", type text}, {"Modifier.2", type text}, {"Modifier.3", type text}, {"Modifier.4", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Grp"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Value", "Modifier"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Attribute"})
in
    #"Removed Columns"
Step 4
With the Modifier Group list ready, we can look up the modifier group in the Account List table for each modifier using Merge Queries function in the Home tab. The logic is to find the link between two tables to conduct a look up.
Firstly, select/highlight the column (Modifier) that contains the look up value in the origin table (Tbl_ActList), then select the table (Tbl_MoGrp) that you want to look up from, then select/highlight the corresponding column (Modifier) in the second table, and then click OK to continue.
Please note before merging I have filtered the Modifier column in the Account List table to get rid of cells showing null (blank) as they are not useful for the look up.
After merging the queries there is a new column added to the Account List table. It may look like a column but it contains all data from the Modifier Group table stored in Grp column and Modifier column. As we want to look up the modifier group only, we can Expand the column to show the Grp column only.
Click on the little square box on the right hand side of the header of the last column to trigger the Expand function, then select the Grp column only and click OK to continue.
Now we have a table showing account number, modifier, and modifier group. We can then use the Group By function in the Home tab to find out for each account number how many modifiers have appeared in each applicable modifier group.
Please See below screenshot for the settings for the Group By function.
Then I sorted the table ascending by the Act # column, and filtered the Count column to show values greater than or equal to 2, i.e. at least 2 modifiers linked to that account number have appeared in a modifier group.
Here is the full code behind the scene:
let
    Source = Excel.CurrentWorkbook(){[Name="Tbl_ActList"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Act #", type text}, {"Modifier", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Modifier] <> null)),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows", {"Modifier"}, Tbl_MoGrp, {"Modifier"}, "Tbl_Grp", JoinKind.LeftOuter),
    #"Expanded Tbl_Grp" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Grp", {"Grp"}, {"Grp"}),
    #"Grouped Rows" = Table.Group(#"Expanded Tbl_Grp", {"Act #", "Grp"}, {{"Count", each Table.RowCount(_), type number}}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Act #", Order.Ascending}}),
    #"Filtered Rows1" = Table.SelectRows(#"Sorted Rows", each [Count] >= 2)
in
    #"Filtered Rows1"
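The merge, group and filter logic of Steps 3 and 4 can be summarised in a short Python sketch, which may help when checking the query's output on a small sample. The account rows and the eight modifier groups come from the question; the group labels Grp1 to Grp8 and the function name flag_accounts are made up, since the actual Grp values are not shown. Modifiers are kept as text, as in the query.

```python
from collections import Counter

# Modifier groups from the question; the labels are hypothetical.
groups = {
    "Grp1": ["50", "22", "51", "62"],
    "Grp2": ["51", "22", "62"],
    "Grp3": ["54", "50", "51"],
    "Grp4": ["55", "50", "51"],
    "Grp5": ["56", "50", "51"],
    "Grp6": ["80", "50", "51"],
    "Grp7": ["AS", "50", "51"],
    "Grp8": ["59", "50"],
}

# Account rows from the question: (Act #, Modifier); None marks a blank modifier.
account_rows = [
    ("111", "80"), ("111", "56"), ("111", None),
    ("222", "55"), ("222", None),
    ("333", "51"), ("333", "50"), ("333", None),
]


def flag_accounts(account_rows, groups):
    """Return accounts that have 2 or more rows whose modifier falls in the same group."""
    counts = Counter()
    for act, modifier in account_rows:
        if modifier is None:
            continue  # blank modifiers are filtered out before the merge
        for grp, members in groups.items():
            if modifier in members:
                counts[(act, grp)] += 1  # one row per (account, matching group)
    return sorted({act for (act, grp), n in counts.items() if n >= 2})


flagged = flag_accounts(account_rows, groups)
print(flagged)
```

As in the query, rows are counted rather than distinct modifiers, so an account that lists the same modifier twice would also reach a count of 2.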
Step 5
The answer could stop at Step 4 as the table has shown the account number that we are looking for. However if there are thousands of account numbers, then it is better to Remove Other Columns except the Act # column, and Remove Duplicates within the column, and then Close & Load the result to a new worksheet. The final result may look like this:
A tip here: before you Close & Load any query for the first time, it is better to set the following in your Query Options. It will prevent the PQ Editor from loading each of your queries to a separate worksheet by default. Just imagine how long it will take if you have 20 queries in your PQ Editor and each of them has more than a thousand lines of data.
Once you change the default option, PQ Editor will only create connections for your queries after you click Close & Load, and you can manually load a specific query result to a worksheet as shown below:
Conclusion
I believe if this question had been tagged as PowerQuery, there might be more concise or 'fancier' answers than mine. Regardless, the things I like most about PQ are that it is a built-in function of Excel (2010 and later versions), and that it is scalable, replicable and more powerful when it comes to data cleansing and transforming.
Cheers :)

Resources