Aggregating data using INDEX MATCH MATCH or SUMIFS - excel

I'm trying to create an Excel formula that is able to sum multiple rows in a table, where the rows and column to be summed are determined by the contents of other cells.
Ordinarily I would use Index Match Match to achieve this, but the multiple rows summation has left me stumped.
I've seen a couple of examples on here of Index Match with a SUMIFS formula, but nothing that pairs this with Index Match Match.
I have two tables on different Excel sheets. The first one looks a little like this (the actual table is 105 columns x 200 rows):
That is from a sheet called "Firm Cost Summary". Row 4 contains a list of unique employee numbers. Column A is the expense category per our accounting system and Column B is a broader category that should be used in Excel to group similar items. Column E onwards then contains the numerical information to be aggregated.
What I would then like to do is summarise that table in a more presentable format that can then be manipulated in other ways. The table looks like this:
That is on a sheet called "Staff Cost Summary". I would like to fill out the info in the yellow cells, i.e. total the salary, bonus, benefits, etc, of each staff member. Ideally this would be a formula I input in cell E6 that I can then drag right and downwards to fill the table.
To give an example, to fill out cell I6 in the second table, the formula should look in cell A6 to find the employee number (1 in this case) and look this up in the first row of the first table (row 4 of the sheet) to find the appropriate column of the first table (column E in this case).
The formula should then look in cell I5 of the second table to see that we are looking to aggregate benefits, then look down column B of the first table to find each row that should be summed (rows 7-10 in this case).
With that in mind, here's what I've got:
=INDEX('Firm Cost Summary'!$A$4:$G$10,MATCH('Staff Cost Summary'!$A6,'Firm Cost Summary'!$A$4:$G$10,0),MATCH('Staff Cost Summary'!E$5,'Firm Cost Summary'!$B$4:$B$10,0))
Total benefits for Joe Bloggs are the sum of cells E7:E10 of table 1, i.e. 5 + 10 + 50 + 100 = 165.
Clearly there are multiple matches in column B of that table, so the above formula gives an answer of 0. Any ideas how I can tweak that to make it work?

Put this in E6 and copy over and down
=SUMIFS(INDEX('Firm Cost Summary'!$D:$DD,0,MATCH($A6,'Firm Cost Summary'!$D$4:$DD$4,0)),'Firm Cost Summary'!$B:$B,E$5)
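The INDEX/MATCH returns the correct column to be added: MATCH finds the employee number from $A6 in row 4 of the cost sheet, INDEX with a row argument of 0 returns that whole column, and SUMIFS then adds the entries in it whose column B category equals the heading in E$5.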
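To make the logic of that formula concrete, here is a small Python sketch (not Excel) of what SUMIFS over an INDEX/MATCH-selected column does. It assumes the cost sheet is represented as a header of employee numbers plus rows of (category, amounts); all names and numbers are hypothetical except the 5 + 10 + 50 + 100 = 165 benefits total from the question.

```python
# Hypothetical miniature of the "Firm Cost Summary" sheet.
header = [1, 2, 3]  # employee numbers, like row 4 of the cost sheet
rows = [
    ("Salary", [1000, 2000, 1500]),
    ("Bonus", [200, 300, 250]),
    ("Benefits", [5, 7, 9]),
    ("Benefits", [10, 11, 12]),
    ("Benefits", [50, 60, 70]),
    ("Benefits", [100, 80, 90]),
]

def sum_for(employee, category):
    """Mimic SUMIFS(INDEX(data, 0, MATCH(employee, header, 0)), categories, category)."""
    # MATCH(..., 0) is an exact match; Excel shows #N/A when the number is
    # missing, whereas list.index raises ValueError here.
    col = header.index(employee)
    # SUMIFS: add the chosen column wherever the category matches.
    # (Excel compares text case-insensitively; this sketch compares exactly.)
    return sum(amounts[col] for cat, amounts in rows if cat == category)

print(sum_for(1, "Benefits"))  # 165
```

Because the column is chosen once and the rows are filtered by category, the same single formula works for every cell when dragged right and down.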

Related

Using LOOKUP Functions with <= and >=

I am attempting to use the LOOKUP functions in Excel in a nested(?) fashion and with ranges of data. In the attached picture, the left-hand table is my data, which extends for another 360 rows or so. Each row has a unique ID (I've taken this data from a larger set so I wanted to retain it), a State postal abbreviation, and the income level for that data point (each row is data from a different zipcode).
The table on the right is the metadata - quintile levels for income in each state. For each row on the left, I want to look up the state abbreviation from the metadata, then use the adjacent income level to determine and print out the appropriate quintile based on that row in the metadata. I anticipate that the solution would use some form of the lookup functions and inequalities, but I'll take any solution.
For this approach you need Office 365, with the new XMATCH function, which can do an approximate match for the next larger value without requiring the data to be sorted.
The formula is
=INDEX($J$1:$N$1,XMATCH(C2,INDEX(J:J,MATCH(B2,H:H,0),0):INDEX(N:N,MATCH(B2,H:H,0)),1))
If you don't have XMATCH, you would need to re-arrange the lookup columns from highest to lowest, because MATCH with match type -1 expects values in descending order. Then you can use
=INDEX($J$1:$N$1,MATCH(C2,INDEX(J:J,MATCH(B2,H:H,0),0):INDEX(N:N,MATCH(B2,H:H,0)),-1))
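As a rough illustration of the XMATCH version, the Python sketch below finds the position of the smallest threshold that is greater than or equal to the income, which is what XMATCH's match mode 1 ("exact match or next larger item") does, and then picks the matching header. The state thresholds and the Q1 to Q5 labels are hypothetical, and treating each quintile value as an upper bound is an assumption about the metadata table.

```python
# Hypothetical metadata: state -> quintile upper bounds (columns J:N),
# which XMATCH does not require to be sorted.
quintile_bounds = {
    "CA": [30000, 55000, 80000, 120000, 500000],
    "OH": [20000, 40000, 60000, 90000, 400000],
}
labels = ["Q1", "Q2", "Q3", "Q4", "Q5"]  # stands in for the header row $J$1:$N$1

def xmatch_next_larger(value, values):
    """1-based position of the smallest item >= value, like XMATCH(value, values, 1).
    Returns None where Excel would show #N/A."""
    best = None
    for pos, v in enumerate(values, start=1):
        if v >= value and (best is None or v < values[best - 1]):
            best = pos
    return best

def quintile(state, income):
    # MATCH(B2, H:H, 0) selects the state's row; XMATCH then scans that row.
    pos = xmatch_next_larger(income, quintile_bounds[state])
    return None if pos is None else labels[pos - 1]

print(quintile("OH", 45000))  # Q3
print(quintile("CA", 30000))  # Q1 (exact match)
```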
If you paste this formula into cell D4, would the result be your expected output?
(Highest Quintile for California)
=INDEX(N:N,MATCH(B4,B:B,0))
Paste this into D5
(Lowest Quintile for Ohio)
=INDEX(J:J,MATCH(B5,B:B,0))
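The last 0 (zero) in the formula is MATCH's match type. It can be replaced with:
1 - largest value less than or equal to the lookup value (data sorted ascending)
0 - exact match
-1 - smallest value greater than or equal to the lookup value (data sorted descending)
depending on your needs. Did I get your point?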
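The three match types can be sketched in Python as below. This is only an approximation of MATCH's behaviour: it scans linearly and assumes the data are sorted as each match type requires, while Excel actually uses a binary search for types 1 and -1, so results on unsorted data can differ. `excel_match` is a hypothetical name.

```python
def excel_match(value, values, match_type=1):
    """Approximate MATCH(value, values, match_type); returns a 1-based
    position, or None where Excel would show #N/A.

    match_type  1: largest item <= value (values assumed ascending)
    match_type  0: first item equal to value (any order)
    match_type -1: smallest item >= value (values assumed descending)
    """
    if match_type == 0:
        for pos, v in enumerate(values, start=1):
            if v == value:
                return pos
        return None
    result = None
    for pos, v in enumerate(values, start=1):
        if (match_type == 1 and v <= value) or (match_type == -1 and v >= value):
            result = pos  # keep going: a later item may be a closer fit
        else:
            break  # sorted data, so nothing further can qualify
    return result

print(excel_match(25, [10, 20, 30], 1))   # 2
print(excel_match(15, [30, 20, 10], -1))  # 2
print(excel_match(5, [10, 20, 30], 1))    # None (#N/A)
```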
Your information is a bit sparse, so I'll tell you what I did and you can take it from there.
First I created a named range comprising 2 columns of your median income table, state and income. Then I created this formula to extract the income by state.
=VLOOKUP($B2,Income,2,FALSE)
Observe that the state name is in column B and the income in the 2nd column of the Income range. Your list may be structured differently. The key to it is that the Income range must have the State in its first column and the 2 in the formula just counts columns from State to Income.
If you place this formula on the same sheet as the Income range it will just produce a copy of the Income column. But that isn't what I did. I placed it in a blank column on the Quintile tab. That happened to be column J, since A:G is taken up with your data, notably, C:G with the columns for quintile numbers. Observe in this transfer that column B displays the state abbreviations. By coincidence it's column B in both sheets. The relevant column is the one on the Quintile sheet. So, the formula still shows the median income for each state but the sequence is determined by the sequence of state names on the tab where the formula resides, and that is the requirement here.
Next, I created this formula and placed it in column K of the Quintile sheet.
=MATCH(J2,C2:G2,1)
This formula determines the column in C:G where the value in J2 is matched. J2, of course, contains the Median income drawn by the VLOOKUP. If that number is nothing it will be interpreted as zero and the lowest quintile returned. Read up on the precise method of the MATCH function.
Now J2 can be integrated into the formula. I did that in a copy of the MATCH formula in column L.
[L2] =MATCH(VLOOKUP($B2,Income,2,FALSE),C2:G2,1)
Observe that the formula in K and L have the same result. I copied them down a few rows to make sure. I got a lot of #N/A errors in this exercise resulting from state abbreviations in the Quintile sheet not being found in the Income range. I think that information is useful. Therefore I didn't suppress it.
The result so far is the quintile, numbered from 1 to 5. I wanted to make sure that the numbers are correct. Therefore I "translated" them to the number in the Quintile table.
For this purpose I created another named range, this one called "Quintiles", comprising columns C:G. It's important that this range should start in row 1, because the formula uses ROW() as the row index. The columns could be any other columns (not C:G), but they must be the same as specified in the formula, with the lowest quintile being the first. And this became my formula in column M.
[M2] =INDEX(Quintiles,ROW(),L2)
If you actually need this number you can replace the reference to L2 in the formula with the formula in L2.
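The chain of helper columns (J: VLOOKUP, L: MATCH type 1, M: INDEX) can be mirrored in a short Python sketch. The incomes and thresholds are hypothetical; the thresholds are listed lowest first, as MATCH type 1 expects, and `None` stands in for #N/A.

```python
# Hypothetical named range "Income": state abbreviation -> median income
income = {"CA": 75000, "OH": 30000}

# Hypothetical quintile thresholds per state on the Quintile sheet (columns C:G),
# lowest quintile first, because MATCH type 1 expects ascending values
quintile_rows = {
    "CA": [20000, 45000, 70000, 100000, 150000],
    "OH": [15000, 35000, 55000, 80000, 120000],
}

def match_le(value, ascending):
    """MATCH(value, range, 1): 1-based position of the largest item <= value."""
    pos = None
    for i, v in enumerate(ascending, start=1):
        if v > value:
            break  # sorted ascending, so no later item can qualify
        pos = i
    return pos

def quintile_for(state):
    """Return (quintile number, threshold) like columns L and M, or None for #N/A."""
    median = income.get(state)          # column J: VLOOKUP($B2, Income, 2, FALSE)
    bounds = quintile_rows.get(state)   # the state's row of C:G
    if median is None or bounds is None:
        return None
    q = match_le(median, bounds)        # column L: MATCH(VLOOKUP(...), C2:G2, 1)
    if q is None:
        return None                     # median is below the lowest threshold
    return q, bounds[q - 1]             # column M: INDEX(Quintiles, ROW(), L2)

print(quintile_for("CA"))  # (3, 70000)
print(quintile_for("OH"))  # (1, 15000)
```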

Sum column based on conditions for subsums

So I have a table which basically looks as follows:
Criterion Value
1 -5
1 1
2 5
2 5
3 2
3 -1
I want to sum the values in column B based on the criteria in column A, but only if the sum for an individual criterion is not negative. So, for example, if I ask for the sum of all values where the criterion is between 1 and 3, the result should be 11 (the values for criterion 1 are not included in the sum because they add up to a negative number).
My first idea was to add a third column with a sumif([criterion];[#criterion];[value]) and then use a sumifs function which checks whether that third column is negative. However, my table has 100k+ rows, and with that many sumif functions it becomes intolerably slow.
I know I could create a pivot table to the same effect, but that has two drawbacks: I would have to create a separate sheet, which would add complexity, and my table is frequently updated, which means I would have to manually refresh that pivot table every time to allow for downstream calculations. Not a big deal, and I could do that as a last resort, but I wonder whether there is a more elegant way to solve this problem.
I would like to avoid VBA to keep things simple (the sheet will be used by other people).
Thank you
This can be done easily using UNIQUE() and two forms of SUMIF() (with and without a separate sum range) in this way:
First collect all the criteria with =UNIQUE(A2:A7) -- Assuming your data are in columns A and B starting from row 2, this goes in cell C2, with "Criteria" in C1
Compute the subtotals for all criteria using =SUMIF($A$2:$A$7, C2, $B$2:$B$7) -- This goes in cell D2 and extends as the criteria do, "Partials" in cell D1
Sum only the positive subtotals from step 2 with =SUMIF(D2:D7, ">0") in cell E2
If you have a lot of data, I suggest using whole-column references to avoid absolute references and the need to adjust the formulas as the number of rows changes:
The first formula becomes =UNIQUE(A:A) -- Don't worry about the heading being included (text and empty cells are not summed)
For the second formula use =SUMIF(A:A, C2, B:B)
Use =SUMIF(D:D, ">0") for the last step
This should be reasonably fast, using only two extra cells per distinct criterion.
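The UNIQUE, SUMIF, SUMIF(">0") chain amounts to "group, subtotal, keep positive subtotals". A minimal Python sketch, using the example table from the question, shows why it is cheap: each row is visited once to build the subtotals, instead of one full-table SUMIF per row as in the helper-column idea. The function name is hypothetical.

```python
from collections import defaultdict

def sum_positive_subtotals(rows):
    """Subtotal values per criterion (UNIQUE + SUMIF), then add up only
    the subtotals greater than zero (SUMIF(subtotals, ">0"))."""
    subtotals = defaultdict(int)
    for criterion, value in rows:
        subtotals[criterion] += value  # single pass over the data
    return sum(s for s in subtotals.values() if s > 0)

data = [(1, -5), (1, 1), (2, 5), (2, 5), (3, 2), (3, -1)]
print(sum_positive_subtotals(data))  # 11  (criterion 1 totals -4 and is skipped)
```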

Getting the sum of entries that match the conditions from 2 columns

I have this setup in my worksheet
where column A contains the conditions I need to look out for,
column B contains the total number of records that fit the conditions,
and columns D and E contain the records I need to count
What I want is a formula for column B that counts the total records meeting the 2 conditions I need.
For example, in the case of B3 there are 2 instances of Dog Red in column E where column D contains the word small, and for B4 there is only 1 instance of Dog Red where column D contains the word large.
The same conditions apply for the rest of column B.
Is there a way to compute the total records that fit the 2 conditions in a single cell?
The formula below will work for B3:B4. For B6:B7 the reference to A$2 would have to be changed manually.
=COUNTIFS($E$3:$E$9,A$2,$D$3:$D$9,"*"&A3&"*")
It should be possible to extend the formula to be able to find the right color dog in column A so that you can copy the formula down all the way without change. I haven't done this here because I feel your example isn't representative of your final worksheet. For the moment, please just take note that your arrangement of count criteria makes referencing them difficult. Perhaps a better way of displaying them can be found.
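For reference, the COUNTIFS above counts rows where column E equals the animal/colour in A$2 and column D contains the size word from A3 (the `"*"&A3&"*"` wildcard). The Python sketch below mimics that on hypothetical data laid out to match the counts described in the question; it lower-cases both sides because COUNTIFS criteria are case-insensitive, and uses a plain substring test in place of the `*word*` wildcard.

```python
def countifs_contains(col_e, col_d, animal_color, size_word):
    """Count rows where E equals animal_color and D contains size_word,
    like COUNTIFS(E, A$2, D, "*"&A3&"*")."""
    target = animal_color.lower()
    word = size_word.lower()
    return sum(
        1
        for e, d in zip(col_e, col_d)
        if e.lower() == target and word in d.lower()
    )

# Hypothetical contents of D3:D9 and E3:E9
col_d = ["small", "very small", "large", "small", "large", "medium", "large"]
col_e = ["Dog Red", "Dog Red", "Dog Red", "Dog Blue", "Dog Blue", "Dog Red", "Cat Red"]

print(countifs_contains(col_e, col_d, "Dog Red", "small"))  # 2
print(countifs_contains(col_e, col_d, "Dog Red", "large"))  # 1
```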

How to get the highest values from 2 columns in Excel?

I have design software which extracts data into an Excel sheet format.
The output is divided into 2 columns, each of these columns has more than 1000 rows.
To make use of this data I need to reduce it to the 5 highest values from each of the 2 columns. This doesn't necessarily mean the maximum of one column and its corresponding value; it may mean, for example, the 2nd largest value of column 1 and the 4th largest value of column 2.
For example (quoting some of the output data):
The values I should pick here are:
If there is any way to achieve that, it would be great.
Thanks.
example file: http://goo.gl/UIEFEv
example file 2: http://goo.gl/VSvuVf
Here's a formula solution. I used 20 rows and extracted the rows which contain the top 5 for each column - you can extend to as many rows as required.
With data in A1:B20 use this formula in D1 confirmed with CTRL+SHIFT+ENTER and copied across to E1 and down both columns:
=IFERROR(INDEX(A$1:A$20,SMALL(IF(($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5)),ROW(A$1:A$20)-ROW(A$1)+1),ROWS(D$1:D1))),"")
Note: there are only eight rows extracted because some of the rows contain values in the top 5 for both columns. I added the highlighting in columns A and B to illustrate this more clearly.
see screenshot below
Edit:
From the comments below it seems that you want a combination of rows which contain the highest value for that column....and rows which contain the highest total for both columns.
In the original formula there are two conditions joined with "+", i.e.
($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))
The "+" gives you an "OR" type functionality, e.g. in this case rows are included if individual values are in the top 5 in that particular column. You can add other conditions, so if you want to also add any rows which are in the top 5 considering the total of both columns then you can add another "clause", i.e.
($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))+($A$1:$A$20+$B$1:$B$20>=LARGE($A$1:$A$20+$B$1:$B$20,5))
....and including that in the complete formula you get this version:
=IFERROR(INDEX(A$1:A$20,SMALL(IF(($A$1:$A$20>=LARGE($A$1:$A$20,5))+($B$1:$B$20>=LARGE($B$1:$B$20,5))+($A$1:$A$20+$B$1:$B$20>=LARGE($A$1:$A$20+$B$1:$B$20,5)),ROW(A$1:A$20)-ROW(A$1)+1),ROWS(D$1:D1))),"")
You could refine that further by using combinations of + and * (for AND), e.g. for the new condition you might only want to include rows with a total in the top 5 if one of the single values is in the top 10 for that column...
Explanation:
The above part shows how you can use + for the OR conditions. In the formula if those conditions are TRUE then the IF function returns the "relative row number" of the range (using ROW(A$1:A$20)-ROW(A$1)+1).
The SMALL function then extracts the kth smallest value, k being defined by ROWS(D$1:D1), which starts at 1 in D1 (or E1) and increments by 1 each row.
The INDEX function then takes the actual value from that row.
When you run out of qualifying rows, the SMALL function returns a #NUM! error, which IFERROR here converts to a blank.
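The same selection logic can be sketched in Python: compute each column's kth largest value (LARGE), keep a row if any condition holds (the "+" acting as OR, since TRUE/FALSE add up as 1/0), and return the qualifying rows in sheet order (what SMALL over relative row numbers achieves). The function names and the test data are hypothetical; unlike Excel, which returns #NUM! from LARGE, this raises IndexError if a column has fewer than k values.

```python
def kth_largest(values, k):
    """LARGE(values, k): the k-th largest value, counting duplicates."""
    return sorted(values, reverse=True)[k - 1]

def top_rows(a, b, k=5, include_total=False):
    """Rows whose A value is in A's top k OR whose B value is in B's top k;
    optionally also rows whose A+B total is in the top k (the edited formula)."""
    cut_a = kth_largest(a, k)
    cut_b = kth_largest(b, k)
    totals = [x + y for x, y in zip(a, b)]
    cut_t = kth_largest(totals, k)
    picked = []
    for x, y, t in zip(a, b, totals):
        # Booleans add like TRUE/FALSE in the array formula: any non-zero sum keeps the row
        hits = (x >= cut_a) + (y >= cut_b) + (include_total and t >= cut_t)
        if hits:
            picked.append((x, y))  # INDEX on each column, in row order
    return picked

# A row in the top k of both columns is only listed once, which is why
# fewer than 2*k rows can come back.
print(top_rows([5, 1, 4, 2, 3], [5, 4, 1, 2, 3], k=2))  # [(5, 5), (1, 4), (4, 1)]
```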
The question is a little unclear, but if you mean getting the 5 highest values of column A with their corresponding values in column B, and then the 5 highest values of column B with their corresponding values in column A, the (non-automated) solution is pretty simple.
Click on a cell with a header title in it.
Click on 'Data' in the top menu.
Click on 'Filter' in the 'Sort & Filter' section.
Click the filter button on column A and select 'Sort Largest to Smallest'.
Copy the top five rows (both columns), then click the filter button on column B and repeat.

Compare two data sheets

The issue I'm faced with is that I have two sheets of data in Excel. They are stock sheet lists, listing items that have a variance from a stocktake. The items are placed in random order in both documents, so a side-by-side view is almost impossible even if I were to sort them (which I already have). For example, it would be like this:
Sheet 1:
A1 (Apple) (1)
A2 (Carrot) (-3)
A3 (Banana) (4)
A4 (Chocolate) (-7)
Whereas Sheet 2 may be:
A1 (Orange) (-2)
A2 (Apple) (3)
A3 (Muffin) (-8)
A4 (Carrot) (3)
So as you can see, the same items may appear in both sheets, and if they do I want to compare the two entries to know the variance, i.e. Sheet 1 said -3 whereas Sheet 2 said +1. I would prefer to do this in a batch if possible, as there are over 800 cells to go through.
Just so that you can see what I'm dealing with, here's links to pastebins of both sheets;
Sheet 1: http://pastebin.com/6i7QKJ6N
Sheet 2: http://pastebin.com/zjtC2U7q
Is there anything anyone can think of that would be able to assist me, other than me going through this one by one which I am considering doing?
Excuse me for ignoring the real data and sticking with your example. Assuming the values are in column B in the corresponding rows, then:
in Sheet1: =VLOOKUP(A1,Sheet2!A:B,2,FALSE)
in Sheet2: =VLOOKUP(A1,Sheet1!A:B,2,FALSE)
placed in, say, column C should 'align' the entries (where both exist, otherwise #N/A). =B1=C1 in D1 copied down should then help to identify the mismatches, and, say, =B1-C1 in E1 copied down should quantify the discrepancies between the sheets, by 'vegetable'.
There should be no need for a batch mode for this.
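The VLOOKUP-based comparison can be sketched in Python with a dictionary built from the other sheet. Each output row mirrors columns B (own value), C (VLOOKUP result, `None` in place of #N/A), D (=B1=C1) and E (=B1-C1). The sheet contents are the example from the question; the function name is hypothetical, and the lookup is case-sensitive whereas VLOOKUP is not.

```python
sheet1 = [("Apple", 1), ("Carrot", -3), ("Banana", 4), ("Chocolate", -7)]
sheet2 = [("Orange", -2), ("Apple", 3), ("Muffin", -8), ("Carrot", 3)]

def compare(this_sheet, other_sheet):
    """For each item: (item, own value, other value, equal?, difference)."""
    lookup = {}
    for item, value in other_sheet:
        lookup.setdefault(item, value)  # VLOOKUP returns the first match
    out = []
    for item, value in this_sheet:
        other = lookup.get(item)  # None plays the role of #N/A
        if other is None:
            out.append((item, value, None, None, None))
        else:
            out.append((item, value, other, value == other, value - other))
    return out

for row in compare(sheet1, sheet2):
    print(row)
```

Running the same function with the sheets swapped gives the view from Sheet 2, matching the two VLOOKUP formulas above.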
I'm assuming that the unique identifier for the stock items is the column labelled CYSKU, right?
If that's so, then there are only 192 items common to both sheets. I ran a VLOOKUP in both sheets, similar to the one pnuts used, and applied a filter.
There are more variances in CYCOST than in CYRETL as far as I can see (I haven't compared the other columns).
To perform the comparison, you can do the following:
Insert a column between columns C and D (just after CYSKU), put a VLOOKUP formula in row 2 of this column and fill it down:
=VLOOKUP(C2, Sheet2!C:C, 1, 0)
Insert a filter and filter out #N/A from this column to get only those that are common between the two sheets.
In column M (after CYDVAR), insert another vlookup and fill it down:
=VLOOKUP(C2, Sheet2!C:F, 4, 0)
This will give you the corresponding CYRETL from Sheet2. You can then compare the two CYRETL.
How VLOOKUP works:
The first parameter is what VLOOKUP will be looking for.
The second parameter is the table range in which to look for the first parameter; the search is done in the first column of that range.
The third parameter is the nth column from which a match will be returned, limited to the table (if the table is in column A:A, only 1 column is available, if the table is A:B, 2 columns are available, etc).
The last parameter is for either exact or approximate match. Exact is 0 (or FALSE) and approximate is 1 (or TRUE).
You can just change the table range and the column number to change the value you're looking for from Sheet2.
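The four VLOOKUP parameters described above can be illustrated with a small Python sketch. It is an approximation, not Excel's implementation: `None` stands in for #N/A, exceptions stand in for #VALUE! and #REF!, and the approximate branch assumes the first column is sorted ascending, as Excel requires. The table contents are hypothetical.

```python
def vlookup(key, table, col_index, exact=True):
    """Sketch of VLOOKUP(key, table, col_index, range_lookup).
    The key is searched in the first column; col_index counts columns
    within the table, starting at 1."""
    if col_index < 1:
        raise ValueError("col_index below 1 (#VALUE! in Excel)")
    if col_index > len(table[0]):
        raise IndexError("col_index outside the table (#REF! in Excel)")
    if exact:  # last parameter 0 / FALSE
        for row in table:
            if row[0] == key:
                return row[col_index - 1]
        return None
    # last parameter 1 / TRUE: last row whose key is <= the lookup key
    found = None
    for row in table:
        if row[0] > key:
            break
        found = row
    return None if found is None else found[col_index - 1]

# Hypothetical rows: (SKU, description, retail price)
items = [("A100", "Apple", 0.5), ("B200", "Banana", 0.3)]
print(vlookup("B200", items, 3))  # 0.3
print(vlookup("Z999", items, 2))  # None (#N/A)
```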
