Excel Sum Index Match Across Multiple Columns

I am having significant issues trying to resolve my problem. Essentially I need an Excel formula that replicates a SUMIFS function, as it appears that SUMIFS doesn't work in my scenario. Effectively I need to SUM across a horizontal axis, based on the date and header parameters. I have tried summing INDEX/MATCH results, SUMIFS, AGGREGATE, summing SUMIF, and summing VLOOKUPs and HLOOKUPs, and I either get errant values or only the first value (for example, Store A would return 0 for 7/8 and Store G would return -3,291).
=SUMIF($1:$1,B22,INDEX($C$2:$AQ$1977,1,MATCH($A982,$A$2:$A$9977,0)))
=SUMIFS(B2:N2,1:1,B22,A:A,A23)
=SUM(SUMIF($B$1:$N$1,$B23,INDEX($B$2:$N$12,1,MATCH($A23,$A$2:$A$12,0))),SUMIF($B$1:$N$1,$B23,INDEX($B$2:$N$12,2,MATCH($A23,$A$2:$A$12,0))),SUMIF($B$1:$N$1,$B23,INDEX($B$2:$N$23,3,MATCH($A23,$A$2:$A12,0)))).
I'm sure the sum range is what is killing me, but I would ideally like the formula to be dynamic enough to locate and sum the cells via references, in case the input data changes at some point. I am working with thousands of rows, so the sum range is B2:AQ10000.
The formula is on a different sheet than this, but I entered it here as an example.
What am I missing? Is there a way to do this with Excel?

Use:
=SUMIFS(INDEX($B$2:$N$12,MATCH($A23,$A$2:$A$12,0),0),$B$1:$N$1,B$22)
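To make the logic of that formula easier to follow, here is a minimal Python sketch of the same idea: pick one row by its key (the INDEX/MATCH part), then sum only the cells whose header matches (the SUMIFS part). The store names, dates, values, and the function name `sumifs_row` are hypothetical and only illustrate the lookup; they are not taken from the asker's workbook.

```python
# Hypothetical data: a header row (B1:N1 in the answer) and data rows keyed by date (column A).
headers = ["Store A", "Store B", "Store A", "Store B"]
rows = {
    "7/8": [10, 20, 5, 1],
    "7/9": [3, 4, 6, 8],
}


def sumifs_row(rows, headers, row_key, header_key):
    """Sum the cells of the row matching row_key whose header equals header_key."""
    row = rows[row_key]  # plays the role of INDEX(data, MATCH(row_key, keys, 0), 0)
    # Plays the role of SUMIFS(row, header_row, header_key).
    return sum(value for header, value in zip(headers, row) if header == header_key)


total = sumifs_row(rows, headers, "7/8", "Store A")
print(total)
```

Because the row is selected first, the sum range and the header range always have the same width, which is the condition SUMIFS needs.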

Related

Excel - Replicating a SUMPRODUCT formula with already summed up values

I have a sheet that uses the below formula to arrive at a figure:
=SUMPRODUCT(E12:K12,E23:K23)/M10
To my understanding, it's getting the sum of E12:K12(593+622+636+620+595+583+589) and multiplying that together with the sum of E23:K23(5740+5160+5432+4640+4716+7372+6696); then dividing the result by M10(39,756).
I have 2 new cells that contain the results of the two summed ranges but when I try to replicate the formula with a regular sum, the result is different:
=SUM(IntradayPlan2!D346*IntradayPlan2!D347)/P12
The result should be 604 but it's coming out at 4383. P12 contains the same number as M10 and the 2 cells in the SUM are summing the same values as the original SUMPRODUCT formula is.
For clarity, I'm currently replicating the main sheet a second time to generate this result. It's slowing down the workbook and most of the other detail on the replicated sheet isn't needed, so I'm trying to get rid of it.
I'm sure I'm missing something when it comes to SUMPRODUCT but after an hour of Googling around I can't work it out. Is there a way of replicating the same maths procedure with my new totals instead?
After the great explanation by Gowtham I don't think it's possible to recreate the result of a SUMPRODUCT by using total values of the 2 used arrays.
Those cells actually get their values from another sheet, which I need to keep anyway. So I've created a couple of new rows which put those figures in a line (they're otherwise spread out across the sheet) and have referenced those new groups instead.
=SUMPRODUCT(IntradayPlan2!D355:J355,IntradayPlan2!D356:J356)/O12
And I'm getting the expected result :)
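As a rough check on why the two totals cannot stand in for the arrays, the following Python sketch uses the figures quoted in the question. SUMPRODUCT multiplies the ranges pair by pair and then sums, whereas the totals-only version multiplies the two sums. The variable names are illustrative only.

```python
# Figures quoted in the question for E12:K12, E23:K23 and M10.
e12_k12 = [593, 622, 636, 620, 595, 583, 589]
e23_k23 = [5740, 5160, 5432, 4640, 4716, 7372, 6696]
m10 = 39756

# SUMPRODUCT: multiply matching positions, then add the products.
sumproduct = sum(a * b for a, b in zip(e12_k12, e23_k23))
weighted = sumproduct / m10

# Totals-only version: multiply the two sums instead.
product_of_sums = sum(e12_k12) * sum(e23_k23) / m10

print(weighted)         # roughly 603.5, close to the expected figure
print(product_of_sums)  # in the thousands, nowhere near it
```

With these figures M10 equals the sum of E23:K23, so the SUMPRODUCT version is a weighted average of E12:K12, which is why it needs the individual values and not just the totals.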

Excel calculation process / INDIRECT in an array formula

I have a spreadsheet into which I will paste data daily, ranging from hundreds to thousands of rows, and it is heavily loaded with functions. I would like to cut calculation time as much as possible, but I'm not sure how Excel is designed to process data. Is there a difference between two functions whose ranges look like these:
SUM(A2:A50) and SUM(A2:A99999)
The way I see it, SUM in the first example stops once it reaches cell A50, while in the second it keeps running until it reaches A99999. From this I'd say the first is more efficient to have in a spreadsheet.
Please advise.
I also have this formula, which returns #N/A, and I believe this is because INDIRECT is included in an array formula (Ctrl + Shift + Enter).
Whole formula itself:
=SUM(IF((INDIRECT("'" & $K$6 & "'!" & $K$7)>CODES!$K$2)+(MMULT(--('Data paste'!AD2:AF5>CODES!$K$1),{1;1;1})>0)+(MMULT(--('Data paste'!AD2:AF5>CODES!$K$3),{1;1;1})>1)+--('Data paste'!AD2:AD5*'Data paste'!AE2:AE5*'Data paste'!AF2:AF5>CODES!$K$4), 1, 0)*--('Data paste'!Q2:Q5=CODES!C2))
Please find the spreadsheet uploaded to Google Drive so it is easier to understand what I mean. The formula is located in cell E2 of the CODES sheet.
Thank you.
Obviously, referencing smaller ranges will always be faster or at least equally fast. However, Excel does have some optimization:
Many Excel built-in functions (SUM, SUMIF) calculate whole column references efficiently because they automatically recognize the last used row in the column. However, array calculation functions like SUMPRODUCT either cannot handle whole column references or calculate all the cells in the column.
so that shouldn't be terribly important for SUM(A2:A99999).
As for question number 2: yes, you may use INDIRECT() within an array formula. It currently returns #N/A only because you're trying to add two arrays of different lengths:
{FALSE; TRUE} + {FALSE; FALSE; FALSE; FALSE} = {0; 1; #N/A; #N/A}
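The padding behaviour shown above can be mimicked in a few lines of Python. This is only a sketch of how Excel treats mismatched array lengths, not Excel's actual implementation; `add_arrays` and the `NA` marker are names made up for the illustration.

```python
from itertools import zip_longest

NA = "#N/A"  # stand-in for Excel's #N/A error value


def add_arrays(left, right):
    """Add two column arrays element-wise; positions missing from either side become #N/A."""
    result = []
    for a, b in zip_longest(left, right, fillvalue=NA):
        if NA in (a, b):
            result.append(NA)
        else:
            result.append(int(a) + int(b))  # booleans are coerced to 0/1, as in Excel arithmetic
    return result


print(add_arrays([False, True], [False, False, False, False]))
```

A single #N/A anywhere in the array is enough to make the surrounding SUM return #N/A, which is why all the ranges need to cover the same number of rows.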
How to fix your issue:
change CODES!K7 to
=COUNTA('Data paste'!A:A)
and change CODES!E2 to
=SUM(IF(('Data paste'!AA2:INDEX('Data paste'!AA:AA,$K$7)>CODES!$K$2)
+(MMULT(--('Data paste'!AD2:INDEX('Data paste'!AF:AF,$K$7)>CODES!$K$1),{1;1;1})>0)
+(MMULT(--('Data paste'!AD2:INDEX('Data paste'!AF:AF,$K$7)>CODES!$K$3),{1;1;1})>1)
+--('Data paste'!AD2:INDEX('Data paste'!AD:AD,$K$7)*'Data paste'!AE2:INDEX('Data paste'!AE:AE,$K$7)*'Data paste'!AF2:INDEX('Data paste'!AF:AF,$K$7)>CODES!$K$4), 1, 0)
*--('Data paste'!Q2:INDEX('Data paste'!Q:Q,$K$7)=CODES!C2))
I have replaced the INDIRECT() calls with INDEX(), which should benefit performance.

Sumproduct or Countif on a 2D matrix

I'm working on data from a population of people with allergies. Each person has a unique ExceptionID, and each allergen has a unique AllergenID (451 in total).
I have a data table with 2 columns (ExceptionID and AllergenID), where each person's allergies are listed row by row. This means that the ExceptionID column has repeated values for people with multiple allergies, and the AllergenID column has repeated values for the different people who have that allergy.
I am trying to count how many times each pair of allergies is present in this population (e.g. Allergen#107 & Allergen#108, Allergen#107 & Allergen#109,etc). To keep it simple I've created a matrix of 451 rows X 451 columns, representing every pair (twice actually because A/B and B/A are equivalent).
I somehow need to use the row name (allergenID) to lookup the ExceptionID in my data table, and count the cases where that matches the ExceptionIDs from the column name (also AllergenID). I have no problem using Vlookup or Index/Match, but I'm struggling with the correct combination of a lookup and Sumproduct or Countif formula.
Any help is greatly appreciated!
Mike
PS I'm using Excel 2016 if that changes anything.
-=UPDATE=-
So the methods suggested by Dirk and MacroMarc both worked, though I couldn't apply the latter to my full data set (17,000+ rows) because it was taking a long time.
I've since decided to turn this into a VBA macro because we now want to see the counts of triplets instead of pairs.
With the 2 columns you start with, it is as good as impossible... You would need to check every ExceptionID to have 2 different specific AllergenID. Better use a helper-table with ExceptionID as rows and AllergenID as columns (or the opposite... whatever you like). The helper table needs a formula like:
=COUNTIFS($A:$A,$D2,$B:$B,E$1)
Which then can be auto-filled. (The ranges are from my example, you need to change them to your needs).
With this helper-matrix you can easily go for your bigger matrix like this:
=COUNTIFS(E:E,1,INDEX($E:$G,,MATCH($I2,$E$1:$G$1,0)),1)
Again, you can auto-fill with this formula, but you need to change it, so it fits your needs.
Because the columns have the same ID2 (which would be your AllergenID), there is no need to look them up: E:E changes automatically with the auto-fill.
The most important part of the formulas is the $ signs, which must not be messed up, or you cannot auto-fill.
Picture of my self-made example (formulas are from the upper left cell in each table):
If you still have any questions, just ask :)
It can be done straight from your original set-up with array formulas:
Please note that array formulas MUST be entered with Ctrl-Shift-Enter, before copying across and down:
In the example pic, I have NAMED the data ranges $A$2:$A$21 as 'People' and $B$2:$B$21 as 'Allergens' to make it a nicer set-up. You can see in the formula bar how that looks as a formula. However you could use the standard references like this in your first matrix cell:
EDIT: silly me, N function is not needed to turn the booleans into 1's and 0's, since multiplying booleans will do the trick. Below formula works...
=SUM(IF(MATCH($A$2:$A$21,$A$2:$A$21,0)=ROW($A$2:$A$21)-1, NOT(ISERROR(MATCH($A$2:$A$21&$E2,$A$2:$A$21&$B$2:$B$21,0)))*NOT(ISERROR(MATCH($A$2:$A$21&F$1, $A$2:$A$21&$B$2:$B$21,0))), 0))
Then copy from F2 across and down. It can be perhaps improved in technique with sumproduct or whatever, but it's just a rough example of the technique....
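Since the update mentions moving to a macro and counting triplets, here is a hedged Python sketch of the underlying counting logic, independent of Excel: group the allergens by person, then count every combination of a given size. The sample records and the name `count_groups` are hypothetical; a VBA version would follow the same steps.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (ExceptionID, AllergenID) rows standing in for the two-column data table.
records = [(1, 107), (1, 108), (1, 109), (2, 107), (2, 108), (3, 108), (3, 109)]


def count_groups(records, size=2):
    """Count how many people have each combination of `size` allergens."""
    by_person = defaultdict(set)
    for person, allergen in records:
        by_person[person].add(allergen)

    counts = Counter()
    for allergens in by_person.values():
        # Sorting makes (107, 108) and (108, 107) the same key, so A/B and B/A are counted once.
        counts.update(combinations(sorted(allergens), size))
    return counts


pair_counts = count_groups(records)
triplet_counts = count_groups(records, size=3)
print(pair_counts[(107, 108)], triplet_counts[(107, 108, 109)])
```

This only visits combinations that actually occur for some person, so it avoids evaluating all 451 x 451 cells of the full matrix.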

Excel Pivot - Count of distinct values in a given range

I have the following data set, from which I need the count of distinct values in a pivot. I've tried a few functions like FREQUENCY, COUNTIFS, etc., but could not make it work.
Input: (image: Input Data)
Output: (image: Expected Output)
=SUM(IF((B2:D4=C10),1,0))
After entering the formula, press Ctrl+Shift+Enter to get the result.
I think it's an awkward case because the data values are in more than one column and because they are text not numbers.
The only way I could come up with would be to repeat a standard method of getting the distinct values and then use COUNTIF to get the counts.
So starting in F2 I have:-
=IFERROR(INDEX($B$2:$B$4,MATCH(0,COUNTIFS($F$1:$F1,$B$2:$B$4),0)),
IFERROR(INDEX($C$2:$C$4,MATCH(0,COUNTIFS($F$1:$F1,$C$2:$C$4),0)),
IFERROR(INDEX($D$2:$D$4,MATCH(0,COUNTIFS($F$1:$F1,$D$2:$D$4),0)),"")))
(It's an array formula and must be entered with Ctrl+Shift+Enter)
And starting in G2:-
=COUNTIF($B$2:$D$4,F2)
To avoid having to specify an exact range (e.g. $B2:$B4), you could use the following in F2 and adjust it to the maximum number of rows you are likely to use:-
=IFERROR(INDEX($B$2:$B$10,MATCH(0,IF(ISTEXT($B$2:$B$10),COUNTIFS($F$1:$F1,$B$2:$B$10),1),0)),
IFERROR(INDEX($C$2:$C$10,MATCH(0,IF(ISTEXT($C$2:$C$10),COUNTIFS($F$1:$F1,$C$2:$C$10),1),0)),
IFERROR(INDEX($D$2:$D$10,MATCH(0,IF(ISTEXT($D$2:$D$10),COUNTIFS($F$1:$F1,$D$2:$D$10),1),0)),"")))
and this in G2:-
=IF(F2="","",COUNTIF($B$2:$D$10,F2))
but of course it's restricted to three columns and anything beyond this I think may point to a VBA solution.
There is also a general formula for distinct values from a 2D array here, but the output includes a zero when blank rows and columns are included, so it would need some modification.
So here is the modified formula from the reference above with error handling starting in I2:-
=IFERROR(INDEX(tbl_text, MIN(IF( IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)),
MATCH(0, COUNTIF($I$1:$I1, INDEX(tbl_text, MIN(IF(IF(ISTEXT(tbl_text),COUNTIF($I$1:$I1, tbl_text),1)=0, ROW(tbl_text)-MIN(ROW(tbl_text))+1)), , 1)), 0), 1),"")
With the counts starting in J2:-
=IF(I2="","",COUNTIF(tbl_text,I2))
where tbl_text is a named range defined (when I tested it) as $B$2:$E$10
This I think should meet your additional criterion of it being more generalized because you can set tbl_text to include the maximum number of rows and columns that you are likely to use.
Will need a slight further modification to ignore blanks within the table.
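For comparison, and as a hint at what a scripted solution would have to do, the same result (distinct text values from a 2D block plus their counts, ignoring blanks) can be sketched in Python. The sample table and the name `distinct_counts` are assumptions made for the illustration, not data from the question.

```python
from collections import Counter

# Hypothetical 3x3 block of text values with some blank cells, standing in for tbl_text.
table = [
    ["apple", "pear", ""],
    ["pear", "apple", "plum"],
    ["", "apple", None],
]


def distinct_counts(table):
    """Count each non-blank text value across every row and column of the table."""
    counts = Counter()
    for row in table:
        for cell in row:
            # Mirrors the ISTEXT test in the formulas: skip blanks and non-text cells.
            if isinstance(cell, str) and cell != "":
                counts[cell] += 1
    return counts


counts = distinct_counts(table)
for value, count in counts.items():
    print(value, count)
```

The keys of `counts` correspond to the distinct-value column and the values to the COUNTIF column, with no limit on the number of columns in the source block.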

Cross table comparisons, sumproduct

I am trying to compare two different Excel (2010/xlsx) tables with related data to find matches. They would be on different sheets but in the same workbook (not that it should affect the problem).
I think the best route is some combination of sumproduct, match, and index... but I haven't been able to get them to work so far. I see the main question (cell G17) being solved by creating a subset of rows from Table 2 to compare against their corresponding data in Table 1 (index/match), then using arrays to do a multiple criteria selection to count how many match the criteria I chose (sumproduct).
I have played around with vlookup, countif(s), and sumif(s) but haven't seen a good way to apply them to this problem.
You can use SUMIF as a "quasi-lookup" like this
=SUMPRODUCT((file="doc")*(modified < SUMIF(user,creator,create)))
I'm not sure how to do it in a single cell as you've asked, but I would create an extra column in the second table which uses vlookup to find the created date, and another column containing whether or not the created date is greater than the modified date. Finally, you could use countif to combine them.
To be more concrete, in your example, I would put =vlookup(F3,A$3:D$5,2,FALSE) in cell I3, and =I3>H3 into cell J3, and expand both of these down. Then cell G17 could be given by =countif(J3:J5,TRUE).
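Both answers rely on the same two steps: look up each file's creation date by its creator, then count the rows that satisfy every condition. A minimal Python sketch of those steps follows; the user names, dates, and file types are hypothetical, and the comparison direction follows the SUMPRODUCT formula above (modified earlier than created).

```python
from datetime import date

# Hypothetical Table 1: user -> creation date (the lookup that SUMIF or VLOOKUP performs).
created = {
    "ann": date(2012, 1, 10),
    "bob": date(2012, 2, 1),
}

# Hypothetical Table 2: (file type, creator, modified date).
files = [
    ("doc", "ann", date(2012, 1, 5)),
    ("doc", "bob", date(2012, 3, 1)),
    ("xls", "ann", date(2012, 1, 1)),
    ("doc", "bob", date(2012, 1, 20)),
]

# Count rows that are "doc" files and were modified before their creator's creation date.
matches = sum(
    1
    for kind, creator, modified in files
    if kind == "doc" and modified < created[creator]
)
print(matches)
```

As with the SUMIF "quasi-lookup", this assumes each user appears only once in the first table; duplicate users would need a rule for which date to use.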
