I'm trying to calculate the conditional median of a chart that looks like this:
A | B
-------
x | 1
x | 1
x | 3
x |
y | 4
z | 5
I'm using MS Excel 2007. I am aware of the AVERAGEIF() function, but there is no equivalent for the median. The main trick is that there are rows with no data, such as the 4th "x" row above. I don't want such rows considered at all in the calculations.
Googling has suggested the following, but Excel won't accept the formula format (maybe because it's 2007?)
=MEDIAN(IF((A:A="x")*(A:A<>"")), B:B)
Excel gives an error saying there is something wrong with my formula (something to do with the * in the condition). I had also tried the following, but it counts blank cells as 0's in the calculations:
=MEDIAN(IF(A:A = "x", B:B, ""))
I am aware that those formulas return Excel "arrays", which means one must enter "Ctrl-shift-enter" to get it to work correctly.
How can I do a conditional evaluation and not consider blank cells?
Nested if statements.
=MEDIAN(IF(A:A = "x",IF(B:B<>"",B:B, ""),""))
Not much to explain - it checks if A is x. If it is, it checks if B is non-blank. Anything that matches both conditions gets calculated as part of the median.
Given the following data set:
A | B
------
x |
x |
x | 2
x | 3
x | 4
x | 5
The above formula returns 3.5, which is what I believe you wanted.
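For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same two nested tests. It is not an Excel formula, and the names `rows` and `conditional_median` are made up for illustration; `None` stands in for a blank cell.

```python
from statistics import median

# Rows mirror the sample sheet above: (column A, column B).
# None stands in for a blank cell (an assumption of this sketch).
rows = [("x", None), ("x", None), ("x", 2), ("x", 3), ("x", 4), ("x", 5)]

def conditional_median(rows, key):
    """Median of column B where column A equals key, skipping blank B cells."""
    values = [b for a, b in rows if a == key and b is not None]
    return median(values)

print(conditional_median(rows, "x"))  # 3.5
```

The two blank rows are dropped before the median is taken, so they do not count as zeros.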
Use the Googled formula, but instead of hitting Enter after you type it into the formula bar, press Ctrl+Shift+Enter. This places braces around the formula, and Excel will treat it as an array formula.
Be warned: if you edit the formula, you cannot confirm it with Enter alone or it will no longer be valid. After editing, confirm it the same way (Ctrl+Shift+Enter).
There is another way that does not involve an array formula and so does not require Ctrl+Shift+Enter.
It uses the AGGREGATE() function offered in Excel 2010, 2011 and later. The method also works for min, max and various percentiles.
AGGREGATE() allows errors to be ignored, so the trick is to make every value that is not required produce an error. The easiest way to do the task set above is:
=Aggregate(16,6,(B:B)/((A:A = "x")*(B:B<>"")),0.5)
The first and last parameters set up a 50% percentile, which is the median. The second says to ignore all errors (including #DIV/0!). The third selects the B column data and divides it by a number that is one for every non-empty value with an x in the A column, and zero otherwise.
The zeros cause divide-by-zero errors, which are ignored: a/1 = a, while a/0 = #DIV/0!.
The technique works for quartiles (with an appropriate p value) and all other percentiles, and for max and min by using the LARGE or SMALL function number with appropriate arguments.
This is a similar construct to the SUMPRODUCT() tricks that are so popular, but those cannot be used for quantiles or max/min values, because they produce zeros, which look like numbers to these functions.
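The divide-by-zero trick can be imitated in Python to see why the unwanted rows drop out. This is only a sketch: `percentile_inc` and `aggregate_percentile` are hypothetical helpers, and the interpolation rule is my assumption of how an inclusive percentile behaves, not something taken from Excel's documentation.

```python
def percentile_inc(values, k):
    """Linear-interpolation percentile (assumed to behave like an inclusive percentile)."""
    data = sorted(values)
    pos = k * (len(data) - 1)          # zero-based fractional position
    lower = int(pos)
    frac = pos - lower
    if lower + 1 < len(data):
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return data[lower]

def aggregate_percentile(col_a, col_b, key, k):
    """Divide each B value by a 0/1 mask and ignore the divisions that fail."""
    kept = []
    for a, b in zip(col_a, col_b):
        mask = (a == key) * (b is not None)   # 1 when both tests pass, else 0
        try:
            kept.append((b or 0) / mask)      # a/1 = a, a/0 raises an error
        except ZeroDivisionError:
            continue                          # plays the role of "ignore errors"
    return percentile_inc(kept, k)

col_a = ["x", "x", "x", "x", "y", "z"]
col_b = [1, 1, 3, None, 4, 5]                 # None stands in for the blank cell
print(aggregate_percentile(col_a, col_b, "x", 0.5))  # 1.0
```

Passing 0 or 1 instead of 0.5 gives the conditional min or max in this sketch.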
Bob Jordan
Perhaps to generalize it a little more, instead of this...
{=MEDIAN(IF(A:A="x",IF(B:B<>"",B:B)))}
... you could use the following:
{=QUARTILE.EXC(IF(A:A="x",IF(B:B<>"",B:B)),2)}
Note that the curly brackets denote an array formula; you should not type the brackets into your formula but press CTRL+SHIFT+ENTER (or CMD+SHIFT+ENTER on macOS) when entering the formula.
Then you could easily get the first and third quartile by altering the last number from 2 to 1 or 3 respectively. QUARTILE.EXC is what most commercial statistical software (e.g. Minitab) uses, by the way. The "regular" function is QUARTILE.INC, or for the older versions of Excel, just QUARTILE.
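As a cross-check outside Excel, Python's `statistics.quantiles` has an "exclusive" method that, as far as I can tell, uses the same (n+1)-based positions as QUARTILE.EXC. Treat that equivalence as an assumption. The sketch below reuses the six-row sample from the first answer, with `None` standing in for a blank cell.

```python
from statistics import quantiles

# Same sample as the earlier answer: (column A, column B), None = blank cell.
rows = [("x", None), ("x", None), ("x", 2), ("x", 3), ("x", 4), ("x", 5)]

# Keep only the B values that pass both tests, as the nested IFs do.
values = [b for a, b in rows if a == "x" and b is not None]

# n=4 asks for the three quartile cut points.
q1, q2, q3 = quantiles(values, n=4, method="exclusive")
print(q1, q2, q3)  # 2.25 3.5 4.75
```

The middle value is the same 3.5 that the MEDIAN version returns.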
Related
I need to generate a list of values found in Column B, but I only want to include rows where the formula in Column F tests TRUE. This would ideally be a list contained within one cell where all the values from Column B are listed, separated by commas.
As an example:
| B | ... | F
----------------------------------
1 | 15 | | TRUE
2 | 10 | | TRUE
EXPECTED RESULT: "15,10"
I've tried VLOOKUP and INDEX/MATCH, but have thus far gotten nowhere.
If you have the TEXTJOIN function (Office 365, Excel 2016+), you can do it with a single formula:
=TEXTJOIN(",",TRUE,IF(F:F=TRUE,B:B,""))
This is an array formula, and you need to "confirm" it by holding down Ctrl + Shift while hitting Enter. If you do this correctly, Excel will place braces {...} around the formula, as observed in the formula bar.
If your Excel does not have TEXTJOIN you will likely need VBA.
And you should shorten the whole column ranges that I used. Smaller ranges will improve computation speed. You could use either a dynamic range reference, or some size that is sure to encompass the entire data set.
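To make the filter-then-join idea concrete, here is a small Python sketch of what the TEXTJOIN/IF combination does. The third row and the name `join_where_true` are made up for illustration.

```python
# (column B, column F) pairs; the third row is a hypothetical non-matching row.
rows = [(15, True), (10, True), (7, False)]

def join_where_true(rows, sep=","):
    """Join the B values whose F flag is TRUE, skipping everything else."""
    return sep.join(str(b) for b, flag in rows if flag is True)

print(join_where_true(rows))  # 15,10
```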
I want to know how many cells it takes to sum to N. Please see the following example:
number | cells to sum of 100
100 | 1
50 | 2
20 | 3
25 | 4
15 | 4
90 | 2
10 | 2
See the last column: it finds the minimum number of cells (the current cell plus the preceding ones) needed to sum to at least 100.
Is there a way to do so?
Thanks.
In B2, array formula**:
=IFERROR(1+ROWS(A$2:A2)-MATCH(100,MMULT(TRANSPOSE(A$2:A2),0+(ROW(A$2:A2)>=TRANSPOSE(ROW(A$2:A2)))),-1),"Not Possible")
Copy down as required.
Change the hard-coded threshold value (100 here) as required.
By way of explanation of the part:
MMULT(TRANSPOSE(A$2:A2),0+(ROW(A$2:A2)>=TRANSPOSE(ROW(A$2:A2))))
using the data provided and taking the version of the above from B5, i.e.:
MMULT(TRANSPOSE(A$2:A5),0+(ROW(A$2:A5)>=TRANSPOSE(ROW(A$2:A5))))
the first part of which, i.e.:
TRANSPOSE(A$2:A5)
returns:
{100,50,20,25}
and the second part of which, i.e.:
0+(ROW(A$2:A5)>=TRANSPOSE(ROW(A$2:A5)))
resolves to:
0+({2;3;4;5}>=TRANSPOSE({2;3;4;5}))
i.e.:
0+({2;3;4;5}>={2,3,4,5})
which is:
0+{TRUE,FALSE,FALSE,FALSE;TRUE,TRUE,FALSE,FALSE;TRUE,TRUE,TRUE,FALSE;TRUE,TRUE,TRUE,TRUE}
which is:
{1,0,0,0;1,1,0,0;1,1,1,0;1,1,1,1}
An understanding of matrix multiplication will tell us that:
MMULT(TRANSPOSE(A$2:A5),0+(ROW(A$2:A5)>=TRANSPOSE(ROW(A$2:A5))))
which is here:
MMULT({100,50,20,25},{1,0,0,0;1,1,0,0;1,1,1,0;1,1,1,1})
is:
{195,95,45,25}
i.e. an array whose four elements are equivalent to, respectively:
=SUM(A2:A5)
=SUM(A3:A5)
=SUM(A4:A5)
=SUM(A5:A5)
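The matrix step can be reproduced in plain Python to check the numbers above. This is a sketch of the same idea rather than a translation of the formula: the function names are hypothetical, and it assumes the numbers are non-negative so that the running sums come out in descending order, which is what MATCH with -1 relies on.

```python
def suffix_sums_via_matmul(values):
    """Row vector times a 0/1 triangular matrix, as the MMULT call does."""
    n = len(values)
    # mask[i][j] is 1 when row i >= column j, like 0+(ROW(...)>=TRANSPOSE(ROW(...)))
    mask = [[1 if i >= j else 0 for j in range(n)] for i in range(n)]
    return [sum(values[i] * mask[i][j] for i in range(n)) for j in range(n)]

def cells_to_reach(values, threshold):
    """Fewest trailing cells whose sum is at least threshold."""
    sums = suffix_sums_via_matmul(values)
    # Like MATCH(threshold, sums, -1): last position whose sum is still >= threshold
    positions = [j + 1 for j, s in enumerate(sums) if s >= threshold]
    if not positions:
        return "Not Possible"
    return 1 + len(values) - positions[-1]

numbers = [100, 50, 20, 25, 15, 90, 10]
print(suffix_sums_via_matmul(numbers[:4]))  # [195, 95, 45, 25]
print([cells_to_reach(numbers[:r], 100) for r in range(1, len(numbers) + 1)])
# [1, 2, 3, 4, 4, 2, 2]
```

The second printed list matches the expected column in the question.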
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
I did the first 3 with an Excel formula:
D3 holds the threshold (100).
C4 is where your numbers start, so C4=100, C5=50 etc.
The formula goes in D4, D5, D6 etc.
On D4:
=IF(C4>=D3;1;"False")
On D5:
=IF(C5>=D3;1;IF(C5+C4>=D3;2;"Error"))
On D6:
=IF(C6>=D3;1;IF(C6+C5>=D3;2;IF(C6+C5+C4>=D3;3;"Error")))
You can keep doing this, just keep replacing "Error" with a longer/updated version of IF(C6+C5+C4>=D3;3.
I don't know if this is the best way, but this will achieve it.
One way to solve this is to create an NxN matrix of equations instead of just a column. An example picture is provided. Columns E through I are hidden. The last column on the right determines the number required
Theoretically, you can also hard-code the equations if the number of rows needed to get to 100 is a known small number. For example, if the number of rows is always four or less, C8 would be =IFS(B8>=100,1,SUM(B7:B8)>=100,2,SUM(B6:B8)>=100,3,SUM(B5:B8)>=100,4). BTW, you'll run into sum boundary problems with this equation on the first, second, and third rows. Therefore, the first row will need to be =IF(B8>=100,1,""), the second row would be =IFS(B9>=100,1,SUM(B8:B9)>=100,2,TRUE,"") and so on.
I have this problem at work to populate the worksheet with the right case number.
Sheet 1: (Report)
SSN | Service Date
123456 | 10/01/2014
Sheet 2: (Data)
SSN | Case Number | Start Date | End Date
123456 | 0000000 | 01/01/2010 | 12/31/2012
123456 | 1111111 | 01/01/2013 | 05/31/2014
123456 | 2222222 | 06/01/2014 | 11/10/2015
How can I do a VLOOKUP based on the Service Date to be within the "range" of the Start and End Date of another sheet?
In this case I would like to lookup the SSN and return case number 2222222 because that is the case active for such date of service.
I was looking online and found MATCH. I am able to match the first case whose SSN matches, but how do I move on to the next case if its dates do not match?
=IF(E2>=INDEX('CASE NUMBERS'!A:F,MATCH(C2,'CASE NUMBERS'!A:A,0),4)&E2<=INDEX('CASE NUMBERS'!A:F,MATCH(C2,'CASE NUMBERS'!A:A,0),5),"YES","NO")
I am using Excel 2013 on Windows 7 at work.
You will need 3 conditions: a) is the Start Date on or before the Service Date, b) is the End Date on or after the Service Date, and c) do the SSN numbers match?
Use the newer AGGREGATE¹ function to force any non-matches into an error state while using the ignore errors option (e.g. 6) to discard errors.
=INDEX(Sheet2!$B$2:$B$9999, AGGREGATE(15, 6, ROW($1:$9998)/((Sheet2!C$2:C$9999<=B2)*(Sheet2!D$2:D$9999>=B2)*(Sheet2!A$2:A$9999=A2)), 1))
For all intents and purposes, a worksheet formula treats FALSE as zero (e.g. 0) and TRUE as one (e.g. 1). Any number multiplied by zero is zero and any number multiplied by one is the same number. The AGGREGATE function is retrieving the row position of the first match within Sheet2!B2:B9999. That row position will be a number somewhere within ROW(1:9998). Any row that does not match all three conditions will have at least one zero multiplied into the denominator. This makes the denominator zero. Anything divided by zero forces a #DIV/0! error and AGGREGATE will discard those from the result set. AGGREGATE's 15 option is SMALL and the last 1 is the k ordinal for SMALL (the very smallest). So of all the rows that match all three conditions, AGGREGATE returns the lowest one to the INDEX function, which retrieves the value from Sheet2!B2:B9999.
Tighten the ranges up to a maximum of 5 rows and use the Evaluate Formula command to step through the formula and gain a better understanding.
It may be worthwhile to note that it is very easy to convert this formula to retrieve the second, third, etc. matches as well since it only requires sequencing the k ordinal up.
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
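The same three-condition lookup can be sketched in Python to show how the zero denominators knock out the non-matching rows. This is an illustration only: `cases` and `find_case` are made-up names, and the dates are read as MM/DD/YYYY, which is my assumption about the sample data.

```python
from datetime import date

# Sheet2 sample rows: (SSN, case number, start date, end date)
cases = [
    ("123456", "0000000", date(2010, 1, 1), date(2012, 12, 31)),
    ("123456", "1111111", date(2013, 1, 1), date(2014, 5, 31)),
    ("123456", "2222222", date(2014, 6, 1), date(2015, 11, 10)),
]

def find_case(cases, ssn, service_date, k=1):
    """Return the case number of the k-th row matching all three conditions."""
    matches = []
    for row_number, (case_ssn, _case, start, end) in enumerate(cases, start=1):
        # TRUE/FALSE multiply as 1/0, so one failed test zeroes the denominator
        denominator = (start <= service_date) * (end >= service_date) * (case_ssn == ssn)
        try:
            matches.append(row_number / denominator)
        except ZeroDivisionError:
            continue                      # discarded, like an ignored #DIV/0!
    if len(matches) < k:
        return None
    row = int(sorted(matches)[k - 1])     # k-th smallest row position, like SMALL
    return cases[row - 1][1]

print(find_case(cases, "123456", date(2014, 10, 1)))  # 2222222
```

Raising `k` plays the part of sequencing the k ordinal up to fetch later matches.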
If SSN is in A1 of both sheets and your Case Numbers are numeric (other than 0000000) then you might try:
=SUMIFS(Sheet2!B:B,Sheet2!A:A,A2,Sheet2!C:C,"<="&B2,Sheet2!D:D,">="&B2)
SUMIFS is explained here (and elsewhere!).
This array-formula will always print the last match:
=INDEX(Sheet2!B:B,MAX((Sheet2!A:A=A2)*(Sheet2!C:C<=B2)*(Sheet2!D:D>=B2)*ROW(A:A)))
This is an array formula and must be confirmed with Ctrl+Shift+Enter.
It works if there are multiple solutions which fit the criteria
It also works with every kind of data you want to show (values/dates/strings)
However, you should cut the range as short as possible (it's a huge calculation for the entire sheet).
I have this, for example:
ColA ColB
X 1
Y 2
Z 3
X 4
I want to be able to sum all values in Column B for which
Column A=X or
Column A=Y.
The result should be 7 (1+2+4).
I did this:
SUM(IF(COUNTIF(A:A,"X"),VLOOOKUP("X",A:B,2,),"0"), IF(COUNTIF(A:A,"Y"),VLOOOKUP("Y",A:B,2,),"0"))
For some reason, it returns 3. It doesn't add the second value of X.
Any ideas why?
Thanks!
=SUMPRODUCT(((A2:A5="X")+(A2:A5="Y"))*(B2:B5))
If you select a portion of the formula and press Ctrl+=, you can see how it is evaluated.
=SUMPRODUCT((({TRUE;FALSE;FALSE;TRUE})+({FALSE;TRUE;FALSE;FALSE}))*(B2:B5))
Now when those two arrays are added together, the TRUE is coerced to a 1 and the FALSE to a zero.
=SUMPRODUCT(({1;1;0;1})*(B2:B5))
The resulting array of 1's and 0's is multiplied by the array from B2:B5.
=SUMPRODUCT({1;2;0;4})
And summed up to 7.
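The same evaluation steps can be followed in Python, where booleans also add up as 1 and 0. This is just a sketch of the arithmetic with made-up variable names, not an Excel formula.

```python
col_a = ["X", "Y", "Z", "X"]
col_b = [1, 2, 3, 4]

is_x = [a == "X" for a in col_a]                 # [True, False, False, True]
is_y = [a == "Y" for a in col_a]                 # [False, True, False, False]

# Adding the two boolean lists element-wise acts as an OR and coerces to 1/0
mask = [x + y for x, y in zip(is_x, is_y)]       # [1, 1, 0, 1]

# Multiply the mask by column B, then sum
products = [m * b for m, b in zip(mask, col_b)]  # [1, 2, 0, 4]
total = sum(products)
print(total)  # 7
```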
Your formula returns an error (tooo many o’s!), but with VLOOKUP spelled correctly it returns 3. Since the problem is not with Y, simplify the issue by taking out that part of the formula:
=IF(COUNTIF(A:A,"X"),VLOOKUP("X",A:B,2,),"0")
This results in 1. But so does:
=VLOOKUP("X",A:B,2,)
Hence COUNTIF(A:A,"X") (which returns 2 because there are two instances of X) does not actually help. Replace it with 7, or 103, or 5=5, and it makes no difference.
You are obviously aware that plain vanilla VLOOKUP stops ‘searching’ once it finds the first instance that meets its ‘rules’ but unfortunately inserting a 2 with COUNTIF is not enough to ‘tell’ VLOOKUP “after finding the first match, now go off and find the second as well”.
So an answer to your question as expressed is “Yes. VLOOKUP cannot be made aware of multiple instances with the =COUNTIF function.”
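A first-match lookup is easy to imitate in Python, which makes the 3 versus 7 difference visible. The helper `vlookup_first` is a hypothetical stand-in for an exact-match lookup, not Excel's implementation.

```python
table = [("X", 1), ("Y", 2), ("Z", 3), ("X", 4)]

def vlookup_first(key, table):
    """Return the value of the first row whose key matches, then stop searching."""
    for a, b in table:
        if a == key:
            return b
    return None

# The second X row is never reached, so only 1 and 2 are added
first_only = vlookup_first("X", table) + vlookup_first("Y", table)
# Summing over every matching row picks up the second X as well
all_matches = sum(b for a, b in table if a in ("X", "Y"))
print(first_only, all_matches)  # 3 7
```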
Hi All You Amazing People
Update
You know what, I should let you know that I am actually trying to do this with numbers and not letters. For instance, I have a field with a value like 225566 and I am trying to pick out fields which have 55 in them. It is only now I realize this might make a huge difference.
ColumnA | ColumnB |
225566 | 2
125589 | 3
95543 | 2
(Below is what I had asked first and later realized I wasn't asking the right question.)
*Lets say I have a table as
ColumnA | ColumnB |
AABBC | 2
AADDC | 3
ZZBBC | 2
Now how could I get a SUMIF for those rows where Column A has a field with BB in it? Assume that there are hundreds of rows. I realize that I have to borrow something conceptually from the way text to column is done. But I wonder if anyone would know how I could do this. Thanks a lot.*
Since you're trying to do this on numbers, you'll need to use an array formula.
If your test values are in A3:A5 and your values to sum are in B3:B5, this will work:
=SUM( IF(ISERROR(FIND("55", TEXT(A3:A5,"#"))), 0, 1) * B3:B5 )
When entering an array formula, use Ctrl-Shift-Enter rather than just hitting Enter.
This sums the product of the sum value and a 0 or 1 from the IF() statement, which tests whether or not each test value, after being converted to text, contains a "55".
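The convert-to-text-then-search idea looks like this in Python. It is only a sketch of the logic; the fourth row and the name `sum_where_contains` are made up to show a non-matching case.

```python
# (ColumnA, ColumnB); the last row is a hypothetical non-matching example.
rows = [(225566, 2), (125589, 3), (95543, 2), (120034, 10)]

def sum_where_contains(rows, needle):
    """Sum column B for rows whose column A, written as text, contains needle."""
    return sum(b for a, b in rows if needle in str(a))

print(sum_where_contains(rows, "55"))  # 7
```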
I think you will need a matrix/array formula to do this (FINDEN is the German-locale name of FIND, and ; is the argument separator in that locale):
{=SUM(IF(ISERROR(FINDEN("55";A2:A4;1));0;1))}
The curly braces {} indicate that it is an array formula; you get them by pressing SHIFT+CTRL+RETURN instead of Return when editing the formula.
This formula will cycle through the range A2:A4, check whether it finds "55" in each cell and, if so, add 1 to the sum.
Google array/matrix formulas, as they are not self-explanatory.
Best
Jan
In Excel 2003 and 2007 (and possibly earlier versions, I cannot test), you can use * as a wildcard character in the match. For example, with your sample data set C1 to
=SUMIF(A1:A3,"*BB*",B1:B3)
and you should see the value 4.
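For comparison, shell-style wildcards in Python's `fnmatch` module behave much like the * in the SUMIF criterion. One difference to be aware of: `fnmatchcase` is case-sensitive, while SUMIF criteria are not, as far as I know. The helper name `sumif` is just for illustration.

```python
from fnmatch import fnmatchcase

rows = [("AABBC", 2), ("AADDC", 3), ("ZZBBC", 2)]

def sumif(rows, pattern):
    """Sum column B for rows whose column A text matches the wildcard pattern."""
    return sum(b for a, b in rows if fnmatchcase(a, pattern))

print(sumif(rows, "*BB*"))  # 4
```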
Create a 3rd column (ColumnC) and put this formula in it:
=Text(A2,0)
Drag that column down to complete your column. This will format the value as text. Next, use SUMIF as DocMax explained, except with different columns:
=SUMIF(C1:C3,"*BB*",B1:B3)
The reason you do this is because you need to be reading a Text value, not a Number value when using the *BB* comparison of SUMIF. Great question.