Comparing Multiple Lists of Data in Excel to find gaps - excel

I am conducting a product analysis. I have data on all the products my company offers and need to compare it against the products our competitors offer. I have a separate product list for each competitor, and I need to find the gaps: in particular, which products do they have that we don't?
What is the best formula or method to find and interpret this data in Excel?
Thanks

Join the tables using Power query.
Table 1 = your company
key = Product
Table2 through TableN= competitors
key = Product
Combine the competitors tables into a single table.
Do a Table.NestedJoin with JoinKind.RightAnti, which returns all the products in the combined competitors table that do not exist in Table 1.
We use a nested join since the keys have the same column header.
M Code
You can paste this code into the Power Query Advanced Editor, and change the Name= argument in lines 2 through N to reflect your actual table names
If you have many competitors, it is possible to create a function to gather all the table names, but overkill for just a few
Step through Applied Steps in the Power Query UI to see what each line does.
let
myCompany = Excel.CurrentWorkbook(){[Name="myCompany"]}[Content],
otherCompany = Excel.CurrentWorkbook(){[Name="otherCompany"]}[Content],
company3 = Excel.CurrentWorkbook(){[Name="company3"]}[Content],
//Join the competitor tables
competitors = Table.Combine({otherCompany,company3}),
//find the missing products
missing = Table.NestedJoin(myCompany,"Product",competitors,"Product", "Missing", JoinKind.RightAnti),
#"Removed Columns" = Table.RemoveColumns(missing,{"Product", "Description"}),
#"Expanded Missing" = Table.ExpandTableColumn(#"Removed Columns", "Missing", {"Product", "Description"}, {"Missing.Product", "Missing.Description"}),
#"Sorted Rows" = Table.Sort(#"Expanded Missing",{{"Missing.Product", Order.Ascending}})
in
#"Sorted Rows"
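For readers who want to check the logic of the anti join above outside Power Query, it can be sketched in plain Python. This is only an illustration: the product names and the `find_gaps` helper are made up, and it assumes product keys match once trimmed and lower-cased (the M code compares the keys as they are).

```python
def find_gaps(my_products, competitor_lists):
    """Return competitor products that are absent from my_products.

    competitor_lists maps a (hypothetical) competitor name to its product list.
    The result maps each missing product key to the sorted competitors selling it.
    """
    def key(name):
        # Assumption: names match once trimmed and lower-cased
        return name.strip().lower()

    mine = {key(p) for p in my_products}
    gaps = {}
    for competitor, products in competitor_lists.items():
        for product in products:
            k = key(product)
            if k not in mine:
                gaps.setdefault(k, set()).add(competitor)
    # Sort by product key, mirroring the final Table.Sort step
    return {k: sorted(v) for k, v in sorted(gaps.items())}


# Made-up sample data
my_company = ["Widget", "Gadget"]
competitors = {
    "otherCompany": ["Widget", "Sprocket"],
    "company3": ["sprocket ", "Gizmo"],
}
gaps = find_gaps(my_company, competitors)
```

Keeping the competitor name next to each missing product is an addition to what the query returns; drop it if only the product list matters.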

You should definitely do this type of comparison in a different system than Excel.
However, if you definitely want to do it in Excel and only want to compare the products, you can do it like this:
One sheet contains a table with all the competitors' products.
If it is also important to know which competitor has the product, add that as a second column:
    A           B
1   product a   competitor a
2   product b   competitor b
3   product c   competitor a
Then use another sheet with a list of your products and a VLOOKUP to check whether a competitor has each one. Add a VLOOKUP to the competitor list as well, to see whether you have what they have.
Your products list:
    A           B
1   product a   =VLOOKUP(A1;Sheet1!A:B;2;FALSE)
2   product b   =VLOOKUP(A2;Sheet1!A:B;2;FALSE)
3   product c   =VLOOKUP(A3;Sheet1!A:B;2;FALSE)
Column B will show which competitor has the product, or #N/A if none does.
Add the same kind of VLOOKUP to the competitor list too.
Depending on your regional settings, you may need to use , instead of ; as the argument separator, and VLOOKUP may have a different name in a localized version of Excel.
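The lookup logic behind this approach can be sketched in Python as follows. The data is invented for illustration, and `vlookup` is a hypothetical helper that mimics an exact-match VLOOKUP, returning "#N/A" when nothing is found.

```python
def vlookup(value, table, col_index):
    """Exact-match lookup: return column col_index (1-based) of the first
    row whose first column equals value, or "#N/A" if there is none."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"


# Sheet1: competitor products (made-up data)
competitor_sheet = [
    ("product a", "competitor a"),
    ("product b", "competitor b"),
    ("product c", "competitor a"),
]
# Our own product list (made-up data)
our_products = ["product a", "product d"]

# Which competitor has each of our products?
ours_checked = {p: vlookup(p, competitor_sheet, 2) for p in our_products}

# Reverse check: competitor products we do not carry (the gaps)
our_sheet = [(p,) for p in our_products]
gaps = [p for p, _ in competitor_sheet if vlookup(p, our_sheet, 1) == "#N/A"]
```

The reverse check is the one that answers the original question: every competitor product whose lookup fails is a gap.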

Related

Summing, Lookups and multiple sheets in Excel

I'm not a native Excel user (much more of a SQL man) and I have the following scenario that is doing my head in. Mainly because I'm sure it's relatively simple, but because I'm not super-familiar with all the advanced functions of Excel.
I have two sheets in question.
Sheet One has the following columns:
SKU    Price
1234   $10
1235   $20
Sheet Two has the following columns:
SKU    Business Unit
1234   BU1
1235   BU1
1234   BU1
1234   BU2
1234   BU2
1234   BU2
And I have the following Formula:
=SUMIF('Sheet1'[SKU], VLOOKUP($F$2, sheet2, 2, FALSE), 'Sheet1'[Price])
(Which admittedly is copy-pasta from the Internets and then I've tried to mash together to get it to do what I want)
What I am trying to do is group by Business Unit, look up the price of each SKU, and total the prices per Business Unit, so the result would look like the following:
Business Unit   Total Value
BU1             $40
BU2             $30
And my limitations in Excel are causing my hair to fall out as I bang my head against my keyboard - as I'm sure it's relatively simple - but I'm missing something key.
You may also try the approach shown below.
• Formula used in cell G2
=LET(_merge,DROP(HSTACK(A3:B8,XLOOKUP(A3:A8,D3:D4,E3:E4)),,1),
_uBUnit,UNIQUE(INDEX(_merge,,1)),
_tValue,BYROW(_uBUnit,LAMBDA(x,SUM(INDEX(_merge,,2)*(INDEX(_merge,,1)=x)))),
VSTACK({"Business Unit","Total Value"},HSTACK(_uBUnit,_tValue)))
Notes: breakdown and explanation of each part.
• _merge --> Combines both tables by looking up the price for each SKU, then drops the SKU column, keeping only the columns required for the output, i.e., Business Unit and Price
XLOOKUP() --> Looks Up On SKU To Return The Price.
HSTACK() --> Used To Combine Both The Arrays.
=HSTACK(A3:B8,XLOOKUP(A3:A8,D3:D4,E3:E4))
Using DROP() --> To Exclude The SKU Col.
DROP(HSTACK(A3:B8,XLOOKUP(A3:A8,D3:D4,E3:E4)),,1)
• _uBUnit --> Returns the unique value of each Business Unit.
UNIQUE(INDEX(_merge,,1))
• _tValue --> Returns the Total Values of each Business Unit
BYROW(_uBUnit,LAMBDA(x,SUM(INDEX(_merge,,2)*(INDEX(_merge,,1)=x))))
• Lastly we are packing the whole thing, within a VSTACK() & HSTACK() to get the required output.
VSTACK({"Business Unit","Total Value"},HSTACK(_uBUnit,_tValue))
Please adjust the ranges to suit your data.
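To make the steps of the LET formula easier to follow, here is a rough Python equivalent using the data from the question. The variable names are chosen to echo `_merge`, `_uBUnit` and `_tValue`; this is a sketch of the logic, not of how Excel evaluates the formula.

```python
# Data from the question: (SKU, Business Unit) rows and a SKU -> price list
sku_rows = [
    (1234, "BU1"), (1235, "BU1"), (1234, "BU1"),
    (1234, "BU2"), (1234, "BU2"), (1234, "BU2"),
]
prices = {1234: 10, 1235: 20}

# _merge: look up the price for each SKU, then keep only (Business Unit, Price)
merged = [(unit, prices[sku]) for sku, unit in sku_rows]

# _uBUnit: unique business units in order of first appearance
units = list(dict.fromkeys(unit for unit, _ in merged))

# _tValue: total price per business unit
totals = [sum(price for unit, price in merged if unit == u) for u in units]

# Final VSTACK/HSTACK: header row followed by one row per business unit
report = [("Business Unit", "Total Value")] + list(zip(units, totals))
```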
You can also do this quite easily with Power Query.
To accomplish the task using Power Query, follow these steps:
• Select some cell in your Data Table,
• Data Tab => Get & Transform => From Table/Range,
• When the PQ Editor opens: Home => Advanced Editor,
• Make a note of the names of both tables,
• Paste the M Code below in place of what you see.
• And refer to the comments in the code
let
//Source Table -- SKUtbl
SourceOne = Excel.CurrentWorkbook(){[Name="SKUtbl"]}[Content],
DataTypeSourceOne = Table.TransformColumnTypes(SourceOne,{{"SKU", Int64.Type}, {"Business Unit", type text}}),
//Source Table -- Pricetbl
SourceTwo = Excel.CurrentWorkbook(){[Name="Pricetbl"]}[Content],
DataTypeSourceTwo = Table.TransformColumnTypes(SourceTwo,{{"SKU", Int64.Type}, {"Price", Int64.Type}}),
//Merging Both Tables
MergeTables = Table.NestedJoin(DataTypeSourceOne, {"SKU"}, DataTypeSourceTwo, {"SKU"}, "Pricetbl", JoinKind.LeftOuter),
Expanded = Table.ExpandTableColumn(MergeTables, "Pricetbl", {"Price"}, {"Price"}),
//Removing the SKU Column
#"Removed Columns" = Table.RemoveColumns(Expanded,{"SKU"}),
//Grouping By Business Unit
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Business Unit"}, {{"Total Value", each List.Sum([Price]), type nullable number}})
in
#"Grouped Rows"
• Rename the query to BusinessUnittbl before loading it back into Excel.
• When importing, you can either select Existing Sheet with the cell reference you want to place the table or you can simply click on NewSheet
There is more than one way to achieve this. If you have Office 365, then you can try the following. I have set it up on one sheet as below.
Formula in Blue Cell G2 is
=UNIQUE($B$2:$B$7,FALSE)
Formula in Gold Cell H2 is
=SUM(LOOKUP(FILTER($A$2:$A$7,$B$2:$B$7=$G2),$D$2:$D$3,$E$2:$E$3))
You will have to adapt this to suit your sheet/data structure.

How to combine multiple columns from a table

My issue is the following: I have a table where I have multiple columns that have date and values but represent different things. Here is an example for my headers:
| Customer name | Type of Service | Payment 1 date | Payment 1 amount | Payment 2 date | Payment 2 amount | Payment 3 date | Payment 3 amount | Payment 4 date | Payment 4 amount |
What I want to do is apply SUMIFS to the table based on multiple criteria. For example:
| Type of Service | Month 1 | Month 2 | Month 3 | Month 4 |
Service 1
Service 2
Service 3
The thing is that I do not want to write 4 SUMIFS (4 in this case, but in fact I have more than 4 sets of date:value columns).
I was thinking of creating a new table where I could put all the columns below each other (in one table with 4 columns - Customer name, Type of Service, Date and Payment) but the table should be dynamically created, meaning that it should be expanded dynamically with the new entries in the original table (i.e. if the original table has 200 entries, this would make the new table with 4x200=800 entries, if the original table has one more record then the new table should have 4x201=804 records).
I also checked the PowerQuery option but could not get my head around it.
So any help on the matter will be highly appreciated.
Thank you.
You can certainly create your four column table using Power Query. However, I suspect you may be able to also generate your final report using PQ, so you could add that to this code, if you wish.
And it will update but would require a "Refresh" to do the updating.
The "Refresh" could be triggered by
User selecting the Data/Refresh option
A button on the worksheet which user would have to press.
A VBA event-triggered macro
In any event, making the query adaptable to different numbers of columns requires more M code than can be generated from the UI, as well as a custom function.
The algorithm below depends on the data being in this format:
Columns 1 and 2 would be Customer | Type of Service
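Remaining columns would alternate between Date | Amount and be labelled: Payment N date | Payment N amount, where N is some number (the last typing step refers to the resulting columns as "date" and "amount", and M is case-sensitive, so the capitalization must match)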
If the real data is not in that format, some changes to the code may be necessary.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
To enter the Custom Function, while in the PQ Editor
Right click in the Queries Pane
Add New Query from Blank Query
Paste the custom function code into the Advanced Editor
rename the Query fnPivotAll
M Code
let
//Change Table name in next line to be the Actual table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
/*set datatypes dynamically with
first two columns as Text
and subsequent columns alternating as Date and Currency*/
textType = List.Transform(List.FirstN(Table.ColumnNames(Source),2), each {_,Text.Type}),
otherType = List.RemoveFirstN(Table.ColumnNames(Source),2),
dateType = List.Transform(
List.Alternate(otherType,1,1,1), each {_, Date.Type}),
currType = List.Transform(
List.Alternate(otherType,1,1,0), each {_, Currency.Type}),
colTypes = List.Combine({textType, dateType, currType}),
typeIt = Table.TransformColumnTypes(Source,colTypes),
//Unpivot all except first two columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(typeIt, List.FirstN(Table.ColumnNames(Source),2), "Attribute", "Value"),
//Remove "Payment n " from attribute column
remPmtN = Table.TransformColumns(#"Unpivoted Other Columns",{{"Attribute", each Text.Split(_," "){2}, Text.Type}}),
//Pivot on the Attribute column without aggregation using Custom Function
pivotAll = fnPivotAll(remPmtN,"Attribute","Value"),
typeIt2 = Table.TransformColumnTypes(pivotAll,{{"date", Date.Type},{"amount", Currency.Type}})
in
typeIt2
Custom Function: fnPivotAll
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Sample Data
Output
If this does not give you what you require, or if you have issues going further with it to generate your desired reports, post back.
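The reshaping performed by the query above (wide date/amount pairs into one row per payment) can be sketched in Python. This is an illustration only: the customer names, dates and the `unpivot_payments` helper are made up, and it assumes the layout described in the answer, with two leading columns followed by date/amount pairs.

```python
def unpivot_payments(rows):
    """Turn Customer | Service | (date, amount) pairs into one row per payment."""
    long_rows = []
    for row in rows:
        customer, service = row[0], row[1]
        pairs = row[2:]
        # Walk the remaining cells two at a time: (date, amount)
        for i in range(0, len(pairs) - 1, 2):
            date, amount = pairs[i], pairs[i + 1]
            if date is None and amount is None:
                continue  # skip empty payment slots
            long_rows.append((customer, service, date, amount))
    return long_rows


# Made-up sample data: two payment slots per customer
wide = [
    ("Acme", "Service 1", "2021-01-15", 100.0, "2021-02-15", 150.0),
    ("Bolt", "Service 2", "2021-01-20", 80.0, None, None),
]
long_table = unpivot_payments(wide)
```

Because the pairs are read positionally, adding a fifth or sixth payment pair to the source needs no change to the function, which is the property the original question asked for.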

Sum multiple rows based on duplicate column data without formula

Based on the data in columns A to D (there could be hundreds of such columns), I want to sum all the rows for columns E to K (again, there could be hundreds of columns).
Rows should be summed wherever the data in columns A to D is duplicated; the required result is shown below.
This is easy to do with SUMIF, but I would like to know whether it is possible natively in Excel or Power Query, without creating a unique id for each column and without using SUMIF or a formula of any sort.
In powerquery .. unpivot, group, pivot, done.
More detail:
Click select first 4 columns, right click, unpivot other columns
Click select first 4 columns and the new Attribute column, right click, group by
Use Operation: Sum on Column: Value, name the new column Count, and hit OK
Click select Attribute column and transform .. pivot column... , for value column choose count
File Close and load
Full sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Code1", "Code2", "Code3", "Code4"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Code1", "Code2", "Code3", "Code4", "Attribute"}, {{"Count", each List.Sum([Value]), type number}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count", List.Sum)
in #"Pivoted Column"
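The unpivot, group, pivot sequence can be illustrated in Python as below. The data and the `sum_duplicates` helper are invented for the example, and only two key columns and two value columns are used to keep it short.

```python
from collections import defaultdict


def sum_duplicates(rows, key_columns):
    """rows: list of dicts. Sum every non-key column over rows sharing a key."""
    # Unpivot + group: accumulate (key values, column name) -> running sum
    sums = defaultdict(float)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        for column, value in row.items():
            if column not in key_columns and value is not None:
                sums[(key, column)] += value
    # Pivot: rebuild one output row per distinct key
    result = {}
    for (key, column), total in sums.items():
        result.setdefault(key, dict(zip(key_columns, key)))[column] = total
    return list(result.values())


# Made-up sample data
rows = [
    {"Code1": "ERT", "Code2": "EXC", "E": 10, "F": None},
    {"Code1": "ERT", "Code2": "EXC", "E": 2, "F": 3},
    {"Code1": "CON", "Code2": "HOR", "E": 6, "F": 2},
]
summed = sum_duplicates(rows, ["Code1", "Code2"])
```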
To solve a problem like this, I first do a concrete example and then generalize it. I made a small table in Excel like so:
Code1
Code2
2-Jul-20
3-Jul-20
4-Jul-20
5-Jul-20
6-Jul-20
ERT
EXC
10
6
15
2
ERT
EXC
2
3
23
1
CON
HOR
3
CON
HOR
6
2
356
3
Then I clicked within the table and created a Power Query referencing it. After opening the Power Query Editor, there is a Group By function on the Home tab. It's pretty straightforward to choose the columns you want and the Sum function in a toy example like this.
Then, I opened the Advanced Editor to see what code was auto-generated. It looked something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows orig" = Table.Group(Source, {"Code1", "Code2"}, {{"2-Jul-20", each List.Sum([#"2-Jul-20"]), type nullable number}, {"3-Jul-20", each List.Sum([#"3-Jul-20"]), type nullable number}, {"4-Jul-20", each List.Sum([#"4-Jul-20"]), type nullable number}, {"5-Jul-20", each List.Sum([#"5-Jul-20"]), type nullable number}, {"6-Jul-20", each List.Sum([#"6-Jul-20"]), type nullable number}})
in
#"Grouped Rows orig"
Typically, a Power Query expression is a series of transformations applied to a table, where each one operates on the table as returned from the previous. Here, we start with the original table as "Source" and then do the grouping. The parameters are a little messy, but what we have is: (1) the input table, (2) a list of the column names to group by, and (3) a list of 3-item lists, each of which describe an aggregated column. The sublists have the output column name, the function that does the aggregation, and the data type.
In Power Query, "each" is syntactic sugar for a single parameter function whose parameter is just an underscore. But also, when you have a record or row, you can just use [column] instead of _[column].
So how to generalize the operation you want to do? My first thought is that a convenient grouping function should have two parameters, based on your description. The first is the table to group, and the second is the number of columns starting from the left to group by. If you don't have them arranged contiguously, of course, you could do something else.
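//Group table t by its first n columns and sum every remaining column
sumFromColumn = (t as table, n as number) as table => let
cList = Table.ColumnNames(t),
toGroup = List.FirstN(cList, n),
toSum = List.RemoveFirstN(cList, n),
//Table.Group hands each group to the aggregation function as a table, so read the column with Table.Column
sumFunc = (cName) => {cName, each List.Sum(Table.Column(_, cName)), type nullable number}
in Table.Group(t, toGroup, List.Transform(toSum, each sumFunc(_))),
#"Grouped Rows" = sumFromColumn(Source, 2), // Group by the first 2 columns and sum the rest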
Here is the generalized function I made, which appears to match the original Table.Group operation that was generated by the interface.
The let statement arranges things for readability but does not imply a particular sequence that they happen in. Power Query figures out the dependencies and executes the statements in whatever order is needed.
The list of column names of the table is defined as cList, and split into toGroup and toSum. Then, sumFunc is defined as a function taking a column name and returning the 3-item list needed to define an aggregation operation. In Power Query, functions can return other functions any which way. So here we are defining a function that returns a list, with a function in it. Then we can use List.Transform to take the list of aggregated columns and turn it into the appropriate parameters for Table.Group.
Finally, the actual group by is done with a call like sumFromColumn(Source, 2), which is equivalent to the original statement that hard-codes the column names.
Code1
Code2
2-Jul-20
3-Jul-20
4-Jul-20
5-Jul-20
6-Jul-20
ERT
EXC
12
3
6
38
3
CON
HOR
6
5
356
3
This can easily be changed to sumFromColumn(Source, 1), in which case the result reduces to two rows, but the second column, being non-numeric, will then produce error values.
Or, you can use sumFromColumn(Source, 3), which will not add things up because the group by columns taken together are distinct.
This way you can easily aggregate any number of columns without caring about their names. I recommend both the Power Query M documentation on microsoft.com and reading about functional programming in general.
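For readers more at home in a general-purpose language, the same idea (a function that builds one aggregation per column and hands the list to a generic group-by) might look like this in Python. It is a sketch with illustrative data, not a translation of the M engine's behavior: for instance, an all-blank column sums to 0 here, whereas List.Sum returns null.

```python
def sum_from_column(header, rows, n):
    """Group rows by the first n columns and sum each remaining column."""
    to_sum = header[n:]

    def sum_func(name):
        # Returns (output name, aggregation function), like the list built in M
        index = header.index(name)
        return name, lambda group: sum(
            r[index] for r in group if r[index] is not None
        )

    aggregations = [sum_func(name) for name in to_sum]

    # Collect the rows of each group, keyed by the first n cell values
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[:n]), []).append(row)

    return [list(key) + [agg(group) for _, agg in aggregations]
            for key, group in groups.items()]


# Illustrative data only
header = ["Code1", "Code2", "2-Jul-20", "3-Jul-20"]
rows = [
    ["ERT", "EXC", 10, None],
    ["ERT", "EXC", 2, 3],
    ["CON", "HOR", 6, 2],
]
by_two = sum_from_column(header, rows, 2)
by_three = sum_from_column(header, rows, 3)
```

With n = 2 the two ERT rows collapse into one; with n = 3 every key is distinct, so nothing is added up.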

A Frustrating Set of Variables in Excel

My industry (aftermarket auto components) utilizes a data standard for digital distribution, and I am currently attempting to create a living reference document, formatted with the correct information in the correct way, to make updating our standard database a less time consuming process.
My company has a 'Master Data Sheet' which contains every piece of data for all of the 20k+ products that we sell. All of our pricing and tracking sheets call cells or ranges from the Master Sheet, in addition to most of our front-facing web presence.
Here's my problem. The standard requires that our marketing descriptions be broken into separate lines with a specific identifier code and grouped by item ID:
Item ID Desc Code Desc
CHD001A AAA Brake Kit
CHD001A BAA Cross-drilled...
CHD001A BAA All of our...
CAE221B AAA Replacement Part
CAE221B BAA Reinforced with...
Our Master Data sheet has a different structure:
Item ID Desc - AAA Desc - BAA Desc - BAA
CHD001A Brake Kit Cross drilled... All of our...
CAE221B Replacement Part Reinforced with...
I'm completely stuck on how to get the right info into the right slots. I CANNOT alter the structure of the Master Sheet or I will have to remap at least thirty other spreadsheets. A VLOOKUP won't work in the horizontal way it needs to, and IF statements will get 20 nests in and then lack a good way to group things. Please help.
Assuming that your task is to find the description of item CHD001A in the master db, and that you know the description code, you can use INDEX/MATCH. Here's the setup I used for developing the formula.
I created a simulation of your master in A1:D4 (one row more than the example in your question).
I assigned G2 as the cell where I would enter the Item ID and G3 to enter the Desc Code.
Now the formula =IFERROR(MATCH(G2,A1:A4,0), 1) finds the sheet row number by Item ID and =IFERROR(MATCH("Desc - " & G3,A1:D1,0),1) finds the sheet column number by Desc code. Note that both formulas default to 1 if not found.
Now the formula below will return the description.
=INDEX(A1:D4,IFERROR(MATCH(G2,A1:A4,0), 1),IFERROR(MATCH("Desc - " & G3,A1:D1,0),1))
Observe that the db range A1:D4 includes the captions and both range A1:A4 and A1:D1 start from the extreme top or left. This enables a column or row caption to be displayed in case of error (when an Item ID or Desc Code isn't found).
The formula isn't perfect yet but the method is. I take it that you will be able to tweak it to optimize adaptation to your needs. Let me know if you need help or advice with that.
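The two-way lookup can be mirrored in Python to see how the default-to-caption behavior works. This is a sketch only: `match` and `describe` are hypothetical helpers, indexes are 0-based rather than 1-based, and the grid reuses the sample rows from the question.

```python
def match(value, values):
    """Like IFERROR(MATCH(value, values, 0), 1), but 0-based:
    the index of value, or 0 (the caption position) if it is not found."""
    try:
        return values.index(value)
    except ValueError:
        return 0


def describe(master, item_id, desc_code):
    """Two-way lookup on a grid whose first row and first column hold captions."""
    row = match(item_id, [r[0] for r in master])
    col = match("Desc - " + desc_code, master[0])
    return master[row][col]


master = [
    ["Item ID", "Desc - AAA", "Desc - BAA", "Desc - BAA"],
    ["CHD001A", "Brake Kit", "Cross drilled...", "All of our..."],
    ["CAE221B", "Replacement Part", "Reinforced with...", ""],
]
```

As with MATCH, only the first column captioned "Desc - BAA" is ever found, so a second BAA description needs a different approach.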
Very simple to do with Power Query, available in Excel 2010+
Select some cell in the Master Table
If this is not a real Table, it will be changed into one
Data / Get&Transform / From Table/Range
In the PQ editor window that opens, select the Item Id column
Unpivot other columns
Split the resultant Attribute column by Transition from non-digit to digit
This will get rid of the automatically created suffixes caused by creating a table with initially identical column headers
If there are digits in the code itself, you'll need to remove the terminal digit(s) using a custom column with a formula.
The best way to do that would depend on the actual structure of your values
Delete the Attribute.2 column (the one with the terminal digits)
Rename the columns appropriately
Here is the generated M Code.
You can just paste this into the Advanced Editor of PQ. If you do, be sure to change the Table Name in Line 2 to whatever your real table name for the Master Data turns out to be.
let
Source = Excel.CurrentWorkbook(){[Name="Master"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Item ID", type text}, {"Desc - AAA", type text}, {"Desc - BAA", type text}, {"Desc - BAA2", type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Item ID"}, "Attribute", "Value"),
#"Split Column by Character Transition" = Table.SplitColumn(#"Unpivoted Other Columns", "Attribute", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Attribute.1", "Attribute.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Character Transition",{"Attribute.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute.1", "Desc Code"}, {"Value", "Desc"}}),
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Desc] <> ""))
in
#"Filtered Rows"
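The same unpivot can be sketched in Python to show what each step contributes. The `unpivot_descriptions` helper is hypothetical, and it goes one small step further than the query above by also stripping the assumed "Desc - " prefix so that only the code remains. Like the split step, it would wrongly trim a code that itself ends in a digit.

```python
import re


def unpivot_descriptions(header, rows):
    """Return (item_id, desc_code, desc) rows from a wide master table."""
    out = []
    for row in rows:
        item_id = row[0]
        for column, value in zip(header[1:], row[1:]):
            if not value:
                continue  # skip empty descriptions
            # Drop the numeric suffix Excel adds to duplicate headers
            attribute = re.sub(r"\d+$", "", column)
            # Assumption: headers look like "Desc - <code>"
            code = attribute.split(" - ", 1)[-1]
            out.append((item_id, code, value))
    return out


header = ["Item ID", "Desc - AAA", "Desc - BAA", "Desc - BAA2"]
rows = [
    ["CHD001A", "Brake Kit", "Cross drilled...", "All of our..."],
    ["CAE221B", "Replacement Part", "Reinforced with...", ""],
]
long_rows = unpivot_descriptions(header, rows)
```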

Excel - Power Query 2016

I got data from two tables.
Customers (containing customer ID and the total value of orders/funding)
Orders (containing customer ID and each order)
I created a Power Query, then chose the option "Merge Queries as New". I selected the matching columns (Customer ID) and chose the option Left Outer (all from the first, matching from the second, i.e. all from the customer table, matching from the order table). Then I expanded the last column of the query to include what I wanted from the Order table, resulting in the table below on the left. The one on the right is what I'm after. The problem is that funding amounts are already totals per customer. I don't need the value of each order broken down. I still need the orders displayed but I don't need their values (just the total per customer). Is it possible to do it like the one below on the right? Otherwise, the grand total is way off.
I think what you're trying to do is join with only the first instance of each value in your Customer column. There doesn't appear to be any feature or GUI element that allows you to do that (I had a look at the reference documentation for Power Query M, maybe I missed something).
To replicate your data, I'm starting off with some tables (left table is named Customers, right table is named Orders):
I then use the M code below (the first few lines are just to get my tables from the sheet):
let
customers = Excel.CurrentWorkbook(){[Name = "Customers"]}[Content],
orders = Excel.CurrentWorkbook(){[Name = "Orders"]}[Content],
merged = Table.NestedJoin(orders, {"CUSTOMER"}, customers, {"CUSTOMER"}, "merged", JoinKind.LeftOuter),
indexColumn = Table.AddIndexColumn(merged, "Temporary", 0, 1),
indexes =
let
uniqueCustomers = Table.Distinct(Table.SelectColumns(indexColumn, {"CUSTOMER"})), // Want to keep as table
listOfRecords = Table.ToRecords(uniqueCustomers),
firstOccurenceIndexes = List.Accumulate(listOfRecords, {}, (listState, currentItem) =>
List.Combine({listState, {Table.PositionOf(indexColumn, currentItem, Occurrence.First, "CUSTOMER")}})
)
in
firstOccurenceIndexes,
expandSelectively =
let
toBoolean = Table.TransformColumns(indexColumn, {{"Temporary", each List.Contains(indexes, _), type logical}}),
tableOrNull = Table.AddColumn(toBoolean, "toExpand", each if [Temporary] then [merged] else null),
dropRedundantColumns = Table.RemoveColumns(tableOrNull, {"merged", "Temporary"}),
expand = Table.ExpandTableColumn(dropRedundantColumns, "toExpand", {"FUNDING"})
in
expand
in
expandSelectively
If your table names and column names match mine (including case sensitivity), then you might just be able to copy-paste all of the M code above into the Advanced Editor and have it work for you. Otherwise, you may need to tweak as necessary.
This is what I get when I load the query to the worksheet.
There might be better (more efficient) ways of doing this, but this is what I have for now.
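The core idea, showing the funding total only on the first order row of each customer, can be expressed compactly in Python. The sketch below uses made-up order IDs and a hypothetical `funding_on_first_order` helper; it illustrates the logic rather than the M implementation above.

```python
def funding_on_first_order(orders, funding):
    """orders: list of (customer, order_id); funding: dict customer -> total.
    Returns (customer, order_id, funding or None), with each total shown once."""
    seen = set()
    result = []
    for customer, order_id in orders:
        if customer in seen:
            # Later rows for the same customer carry no funding value
            result.append((customer, order_id, None))
        else:
            seen.add(customer)
            result.append((customer, order_id, funding.get(customer)))
    return result


# Made-up sample data
orders = [("A", 1), ("A", 2), ("B", 3)]
funding = {"A": 2394, "B": 4323}
rows = funding_on_first_order(orders, funding)

# The grand total now counts each customer's funding exactly once
grand_total = sum(f for _, _, f in rows if f is not None)
```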
If you're not using the order ID column, then I would suggest doing a Group By on the OrderTable before merging in the funding so that you'd end up with a table like this instead:
Region Customer OrderCount Funding
South A 3 2394
South B 2 4323
South C 1 1234
South D 2 3423
This way you don't have mixed levels of granularity that cause problems like you are seeing with the totals.
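A minimal Python sketch of that alternative: count the orders per customer first, then attach the funding, so every output row sits at customer level. The order IDs are made up and `summarize` is a hypothetical helper; the customer rows reuse the figures from the table above.

```python
from collections import Counter


def summarize(orders, customers):
    """orders: list of (customer, order_id);
    customers: list of (region, customer, funding).
    Returns one (region, customer, order_count, funding) row per customer."""
    order_counts = Counter(customer for customer, _ in orders)
    return [(region, customer, order_counts.get(customer, 0), funding)
            for region, customer, funding in customers]


customers = [
    ("South", "A", 2394),
    ("South", "B", 4323),
    ("South", "C", 1234),
    ("South", "D", 3423),
]
# Made-up order IDs
orders = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5),
          ("C", 6), ("D", 7), ("D", 8)]
summary = summarize(orders, customers)
```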
