How does the SUMPRODUCT command works in this example? - excel

The following code allows me to determine distinct values in a pivot table in Excel:
=SUMPRODUCT(($A$A:$A2=A2)*($B$2:$B2=B2))
See also: Simple Pivot Table to Count Unique Values
The code runs perfectly fine. However, can somebody help me understand how this code actually works?

You write: the following code allows me to determine distinct values in a pivot table in Excel
No. That formula alone does not do that. Read on for the explanation of what does.
There's a typo in the formula. It should be
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
See the difference?
The formula starts in row 2 and is copied down. In each row, the $A$2 reference and the $B$2 reference will stay the same. The $ signs make them absolute references. The relative references $A2 and A2 will change their row numbers when copied down, so in row 3 the A2 will change to A3 and B2 will change to B3. In the next row it will be A4 and B4, and so on.
You may want to create a sample scenario with data similar to that in the thread you link to. Then use the "Evaluate Formula" tool on the Formulas ribbon to see step by step what is calculated. The formula evaluates from the inside out. Let's assume the formula has been copied down to row 5 and we are now looking at
=SUMPRODUCT(($A$2:$A5=A5)*($B$2:$B5=B5))
($A$2:$A5=A5) this bit compares all the cells from A2 to A5 with the value in A5. The result is an array of four values, either true or false. The next bit ($B$2:$B5=B5) also returns an array of true or false values.
These two arrays are multiplied and the result is an array of 1 or 0 values. Each array has the same number of values.
The first value of the first array will be multiplied with the first value of the second array. (see the red arrows)
The second value of the first array will be multiplied with the second value of the second array. (see the blue arrows)
and so on.
True * True will return 1, everything else will return 0. The result of the multiplication is:
The nature of the SumProduct function is to sum the result of the multiplications (the product), so that is what it does.
This function alone does not do anything at all to establish distinct values in Excel. In the thread you link to, the Sumproduct is wrapped in an IF statement and THAT is where the distinct values are identified.
=IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)
In plain words: If the combination of the value in column A of the current row and column B of the current row has already appeared above, return a zero, otherwise, return a 1.
This marks distinct values of the combined columns A and B.

Firts, i think you made a type here, as the formula should be :
=SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))
Let's decompose it in 2 parts:
First, we check the cells between A2 and A2, so only one cell, and we check the number of cells wich are equals to A2. In this case, the output should be 1, as you're comparing A2 with A2. However, you're not limited to compare A2 with A2. If you had chosen 2 cells equals to A2, the results would have been 2.You can compare as many cells as you want with A2 (replace the characters after the $ to modulate).
We do the same for the second bracket, except the pivot value is B2.
After that, you need to understand what the function SUMPRODUCT does. It sum the value of the product for a range of array. For example, say you have the value 1 on A1, 1 on A2, 2 on B1 and 3 on B2, if you make SUMPRODUCT((A1:A2)*(B1:B2)) , you will obtain (1*2) + (1*3) = 5. So, in the example you gave us, it will give the sum of (A2=A2)*(B2=B2) = 1.
So, it will output the number of pair (Ax,Bx) which is equals to (A2,B2). With the link, you can see that, if you select the first line only, the function will output 1 (and so the IF will output 1), but if you select the first 2 lines, the function will output 2, (and so the IF will output 0).
I hope this made sense to you, as i hoped i didn't make any mistakes along the explanation.

Related

Using nested IF function in excel: is there a way to check next row for same IF function?

Let's say I have a simplified table of only 2 columns. Column one is Group number, with groups of different sizes. Column 2 is also numbered but has a lot of blanks.
I basically want to fill these blanks with the next NOT blank, if it is still in the same first-column-group.
I don't know if it works to show a table here, but this is what it is now and what I want to achieve with a simplified dataset:
What I tried is to use an If function. First, it checks if B1 is blank. If it is blank, I want it to check if A1 is the same "group" value as A2. If that is also true, then I want the formula to return me the value of B2. However, if B2 is also blank I want it to loop: Check if A1/A2 are the same as A3, and then give the value of B3. And continue. Plus! If B1 is not blank to begin with it just needs to return the value B1. And if B1 is blank but A1 is not the same group as A2, I want it to remain blank (or 0 is also fine).
The only way I did this was making an extremely nested if function that kept repeating for the [if true] part, which looks like this:
=IF(B2="";(IF(A2=A3;
(IF(B3="";(IF(A3=A4;
(IF(B4="";(IF(A4=A5;
(IF(B5="";(IF(A5=A6;
(IF(B6="";(IF(A6=A7;
(IF(B7="";(IF(A7=A8;
(IF(B8="";(IF(A8=A9;"xx";""))
;B8));""));B7));""));B6));""));B5));""));B4));""));B3));""));B2)
However, sometimes there are 100+ blanks within one group, and I'm trying to avoid looping 100+ repetitions of that formula.Right now, when there are not enough loops for within a group, it shows the xx. Is there no way to just replace the B1 by a continuous string of cells of column B? I looked at the Row function but it only works when you pull it down if I understand it correctly.
Would really appreciate the help!
If you did want to try a formula, something like this will work:
=IF(B2<>"",B2,IFERROR(INDEX(B2:B$19,MATCH(A2,IF(B2:B$19<>"",A2:A$19),0)),""))
entered as an array formula using CtrlShiftEnter in my version of Excel (2019).
The inner IF statement produces an array where only the rows that have a number in column B will have a value equal to the group value in column A, otherwise it will default to FALSE, so you get
False
False
1
False
1
False
False
2
False
...
then you do a Match on the value of A which is in the current row in the array to find the next row where there is a non-blank value in column B (3 in the first case). Then Index produces the corresponding value from column B in the row found by the Match (1 in the first case).
If the match fails because there are no further entries in the current group (like in rows 15 and 16 of the screenshot) then it will call in the "" from the IFERROR statement, so you get a blank in that row.
You could change the IFERROR to zero if you preferred to have zeroes in those cells rather than blanks:
=IF(B2<>"",B2,IFERROR(INDEX(B2:B$19,MATCH(A2,IF(B2:B$19<>"",A2:A$19),0)),0))

Excel Formula to find the first higher value

In column D (Result), I would like to have the following formula.
For each Cell in column C, find in column B the first value higher than the value of the cell of column C (starting from the same row) and gives as output the difference between the values found in column A (Count).
Example:
the value in C2 is 40. the first cell of B that has a value higher than 40 is B6. So D2 takes A6.value - A2.value = 5 - 1 = 4.
Can it be done without the use of VBA?
It can easily be accomplished with an array formula (so you have to enter the formula with Ctrl+Shift+Enter ) :
{=MATCH(TRUE;IF(B2:$B$7>C2;TRUE;FALSE);0)-1}
Put this formula in cell D2, and just drag down. You only have to change the end of your data set (change $B$7 into the real last cell of the column with data)
The formula works as follows :
The IF statement results in an array with TRUE/FALSE values that meet your criteria : {FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}
The MATCH (with the 0 switch) searches the array for the index of the first match, which is 5 in our case
And you have to subtract 1 to get the offset to the cell where the function is placed, so this gives you 4
So although you have to enter it as an array formula (you will get an N/A error without the ctrl+shift+enter), the result is just a single number.
Also, depending on your data set, you might want to add some ERROR handling in case no match has been found, e.g. just using the example data set in your question, the result in cell D5 will be N/A so you have to decide what value you want the result to be in such case.
And finally, I did not use the values in column A, as I assumed this is just a sequential ascending counter. If this is not the case, and you specifically want to find the difference between the corresponding values in that column, you can use the variant mentioned by Foxfire... in one of the other answers: =MIN(IF(B2:$B$6>C2;A2:$A$6))-A2
A slightly adjusted and shortened answer on Peter K.'s suggestion:
In D2:
=MATCH(TRUE,$B3:B$7>C2,0)
enter the formula with ctrl+shift+enter
Something like this should work for you. First transform your range to a table.
=IFERROR(AGGREGATE(15,6,--([#Second]<[First])*(ROW([#Second])<=ROW([First]))/--([#Second]<[First]*(ROW([#Second])<=ROW([First])))*[Count],1) - [#Count],"")
In this formula the comparison between [second] and [First] starts at the same row. That means that the value in D5 is 0 and not 1. (Like #Foxfire And Burns And Burnslike stated in the comments).
Ok, I did not post the answer waiting until OP answered why D5 is 1 instead of 0, but my formula is also an array formula. It would be:
=MIN(IF(B2:$B$6>C2;A2:$A$6))-A2
To type this formula in array mode, you need to type it as usual, but instead of pressing ENTER, you need to press CTRL+SHIFT+ENTER

IF Function/VLOOKUP Combo

So this seems like it should be pretty easy. I could just concatenate and make another column in the data to make a unique combo and get my answer. But that just seems so messy. So here I am reaching out to you fine folks to pick your brains.
I want to look up HQLine and Description in the MPCC tab to return the correct MPCC Code. I tried a couple IF statements with VLOOKUPS but couldn't get it right.
So I need to look up BK3 Positive Crankcase Ventilation (PCV) Connector in the MPCC tab. So it needs to match BK3 and the Long description and then give me the correct code.
Here is the missing data file
Here is the MPCC export list that I want to search
Use SUMIFS.
SUMIFS will find the sum in a table of rows that meet certain criteria. If the MPCC is always a number, and the MQAb-LongDescription is always unique, SUMIFS will find the correct ID.
=SUMIFS(Sheet1!C$2:C$100,Sheet1!A$2:A$100,A2,Sheet1!B$2:B$100,B2)
where Sheet1!A$2:A$100 is the HQAb data, Sheet1!B$2:B$100 is the Long Description data, Sheet1!C$2:C$100 is the MPCC Number data, A2 is the HQLine, and B2 is the Description.
The formula would go in C1.
More information on VLookup with Multiple Criteria
You can use an Index/Match with multiple criteria.
I'm assuming that you will put this formula in "Sheet1", cell C2, and your lookup data is in a sheet called "Sheet2", columns A, B, C from row 2 to 30.
Put this in Sheet1, C2:
=INDEX(Sheet2!$C$2:$C$30,MATCH(A2&B2,Sheet2!$A$2:$A$30&Sheet2!$B$2:$B$30,0))
(Enter with CTRL+SHIFT+ENTER) and drag down.)
Adjust the ranges as necessary.
lets assume your first Table is on sheet 1 in the range A1:C11 and the MPCC codes are located on Sheet 2 in the range A1:C32. Each table has a header row so your data really starts in row 2.
Similar to BruceWayne's answer of using an array formula, you can bring the array calculation inside of a formula and avoid the special array treatment. There are a few functions that can do this. I will demonstrate with the SUMPRODUCT function;
On Sheet 1, Cell C2, use the following formula:
=INDEX('Sheet 2'!$C$1:C$32,SUMPRODUCT((A2='Sheet 2'!$A$2:A$32)*(B2='Sheet 2'!$B$2:B$32)*row('Sheet 2'!$A$2:A$32))
Explanation:
When the value in A2 matches the value in the range in the second sheet it will be true and false when it does not. when True False get used in math operations they are treated at 1 and 0 respectively. Therefore the only result from your two search criteria will be the rows where A2 match is true and B2 match is true and this will have a value of 1. The 1 will then be multiplied by the row number. Since all other results will be 0 since your list is a unique combination, the sum part of sumproduct will sum up to the row number where your unique row is located. This in turn is used by the indext function to return the row to give your unique number.

In Excel, how do I get the header corresponding to the max value from a subset of a range?

I'm pushing beyond my Excel knowledge here. I'm trying to do a poll like thing in Excel. My problem lies on showing the selected result. Here's what I have so far:
I need to select the header corresponding to the cell with the highest value in the range B2:G2 (type 1). However, if there's a tie, I need to select the header corresponding to the highest value in the range B3:G3 amongst the cells with highest values in the range B2:G2.
In my sample, column "bb" and "cc" both share highest value on type 1 (5). So, in order to determine the winner, I need to compare the highest value for type 2 between them. Since "bb" is 0 and "cc" is 1, I expect "cc" as final result.
Components for formula are below:
J2: Displays the count of cells on line 2 with the highest value in the range. So, 2. I did that with COUNTIF comparing with MAX.
K2: Displays the first header it finds with the highest value on line 2. I managed with the following formula:
=INDEX($B$1:$G$1;0;MATCH(MAX($B$2:$G$2);$B$2:$G$2;0))
To be honest, I don't fully understand that formula. Did it with help of tutorials from the internet.
I2: Displays "TIE" when there's a tie on range B2:G2. Otherwise display the winning header (K2).
J3: Displays the number of cells with the maximum value on range B3:G3 but only considering winning cells from line 2. I did that with COUNTIFS.
=COUNTIFS(B3:G3;LARGE(B3:G3;1);B2:G2;MAX(B2:G2))
Edit: Just found out by entering number "4" on B3 that this formula above is also not working...
I3: Should follow the same pattern as the cell above. Displays "TIE" when there's still a TIE. Otherwise would display winning header (to be presented on K3).
K3: I don't know what to put here. Probably because I don't quite understand that formula with INDEX, MATCH and so on, I can't figure out a way to check the highest value between the two "winning" columns from the line above and get the header.
Could somebody help me with this?
First, let's establish if there is a tie. As you have discovered, you can do this by counting how many times the highest number appears in the range.
=COUNTIF($B2:$G2;MAX($B2:$G2))
If that count is more than 1, then there is a tie.
=IF(COUNTIF($B2:$G2;MAX($B2:$G2))>1;"TIE";"no tie")
In case of a tie you want to involve the values in row 3 as a tie breaker. We could add them to the values in row 2 using this array formula. You must confirm the array formula with Ctrl+Shift+Enter, not just Enter, otherwise it won't work.
=INDEX($B$1:$G$1,MATCH(MAX(((IF(B2:G2=MAX(B2:G2),MAX(B2:G2),0))+B3:G3)),INDEX((B2:G2+B3:G3),0)))
You only want to factor in row 3 if there is a tie, though, so you can re-use the IF statement from above and replace the "tie" in the formula above with the array formula and remember to press Ctrl+Shift+Enter!!
=IF(COUNTIF($B$2:$G$2,MAX($B$2:$G$2))>1,INDEX($B$1:$G$1,MATCH(MAX(((IF(B2:G2=MAX(B2:G2),MAX(B2:G2),0))+B3:G3)),INDEX((B2:G2+B3:G3),0))),"no tie")
You already have the formula to look up the value if there is no tie.
My system uses the comma as the list separator. I have manually replaced these with semicolons in the formulas I posted, but please bear with me if I may have missed one.
Now you can copy these formulas down to row 3. If there is a tie in the data in row 3, you will need data in row 4 to break the tie.
To understand the Index/Match combo, start with your first formula and read it from the inside out. The Max() finds the largest number. The Match() returns the position, i.e. column number, of the largest number in the range B2 to G2, i.e. 2 (the second column in the range). Index looks at B1 to G1 and returns the column value from the position that the Match returned, i.e. the 2nd column, which is the text bb.
Using row 3 as the tie breaker, the formula works pretty much the same, only that rows 2 and 3 are added together when the value in row 2 is the Max value and then that number is used to find the Max and the Match.
Here is an approach with sumproducts. I dont really inderstand what results you want in I3, J3, and K3. will try to workout.
I2:
=IF(SUMPRODUCT(--(B2:G2=MAX(B2:G2)))>1,"TIE","")
J2:
=SUMPRODUCT(--(B2:G2=MAX(B2:G2)))
K2:
=IF(B7>1,OFFSET(B1,0,SUMPRODUCT(--(B3:G3=MAX(B3:G3))*--(B2:G2=MAX(B2:G2))*{0,1,2,3,4,5})),OFFSET(B1,0,SUMPRODUCT(--(B2:G2=MAX(B2:G2))*{0,1,2,3,4,5})))
the {0,1,2,3,4,5} refers to the number of headers, if there are more, this array needs to changed

Set condition before running Vlookup VBA

I created a Vlookup but first want a condition to be met to determine what Vlookup to use.
Such as
If cell = 1 then
run Vlookup #1
If cell = 2 then
run Vlookup #2
There are only 4 possible variables that the cell could equal. This would have to be on a loop, because the entries in the worksheet are multiple. It would first see what the specific cell equaled, then determine what Vlookup to use.
Any insight?
I doubt you need a loop because this might be handled within the VLOOKUP, though I am assuming the conditional cell (one of four values) is not the trigger cell for the VLOOKUP:
Here A1 is the condition, C1 the trigger value and D1 contains the formula for the lookup:
=VLOOKUP(C1,$H$1:$L$3,A1+1,0)
and hence also the result k in the example.
A2 merely determines how many columns across to step when looking for the appropriate value in the lookup table (here in the box).
We know from ColumnH that 20 is in the second row of the table and A1+1 says take the value A1 (ie four) columns further to the right.
Change the blue to 2 and the yellow to 30, for example, and the result is f.
If values are for sure integers 1 to N, just like #Tim pointed out you can use CHOOSE:
=CHOOSE(A1;Vlookup1;Vlookup2;...;VlookupN)
If A1=1 Vlookup1 will be executed, A1=2 Vlookup2....A1=N... VlookupN

Resources