Excel - dynamically average the top values of a column

I have a column that may contain X number of values. I am looking to get the average of the top Y values, based on the number of X values that are above 0. There are just two rules on how to come to this value.
Y must represent 49% or less of the values greater than 0. If I have 11 values above 0, then I can only average the top 5.39 values (11 × 0.49).
Y must be a whole number, rounded down. Taking the example further, 5.39 rounds down to 5, so I am essentially getting the average of the top 5 values.
I feel like I am getting close with different solutions but have struggled. For instance, the solution at Getting average of top 30% of the values in one column is close; however, it appears to find the values above the 0.7 number, and not necessarily the top 30%.
Here is my data:
92.593
88.889
45.679
88.889
0
88.889
87.654
0
69.136
41.975
49.383
0
40.741
50.617
0
Here I have 11 values that are above 0. I know that using the criteria I mentioned above, I can only average the top 5 values. Those top 5 values average out to be 89.383.
I have tried something like this:
=AVERAGE(LARGE(A1:A14,ROW($B$1:$B2)))
where B1 just holds the value 1 and B2 holds a formula that calculates how many values I can take, but with no luck.

I don't know your Excel version, but this also works with older versions (B1 holds 1 and B2 holds the number of values to take, as in your setup):
=AVERAGE(LARGE(IF(A1:A15>0,A1:A15),ROW(INDIRECT(B1&":"&B2))))
Remember to confirm it as an array formula using CTRL+SHIFT+ENTER.
I hope that's what you are looking for,
bye.
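To check the two rules outside Excel, here is a small Python sketch. The function name average_top_share and its percent parameter are my own, not from the thread. It counts the values above 0, rounds 49% of that count down to a whole number Y, and averages the Y largest values, which is what the LARGE(IF(...>0,...),ROW(INDIRECT(...))) formula does:

```python
def average_top_share(values, percent=49):
    """Average of the top Y values, where Y = floor(percent% of the count of values > 0)."""
    # Keep only values above 0, largest first (like IF(A1:A15>0,A1:A15) fed to LARGE)
    positives = sorted((v for v in values if v > 0), reverse=True)
    # Integer arithmetic gives an exact round-down, e.g. 11 * 49 // 100 == 5
    y = len(positives) * percent // 100
    if y == 0:
        # Excel's LARGE would return #NUM! here; signal it explicitly instead
        raise ValueError("too few positive values to take any")
    return sum(positives[:y]) / y


data = [92.593, 88.889, 45.679, 88.889, 0, 88.889, 87.654, 0,
        69.136, 41.975, 49.383, 0, 40.741, 50.617, 0]
print(average_top_share(data))
```

With the question's data this gives about 89.383, matching the hand calculation.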

Related

Excel - standard deviation where source cells are a count

I have some data that looks like this:
Condition 1 | Condition 2 | Condition 3 | Condition 4 | Condition 5
--- | --- | --- | --- | ---
0 | 0 | 0 | 70 | 0
0 | 50 | 10 | 0 | 0
120 | 0 | 0 | 5 | 5
The value in each cell is the number of meters of an asset that are in the given condition. In other words, the Condition 4 column is a count of the number of meters that are a '4'.
How do I calculate a standard deviation for this? Obviously the std.dev would be '0' for the first row, higher for row 2, and fairly low for row 3.
Something similar to REPT, but that repeats a value x times in a formula?
I considered a helper column but the number of meters that there are in total makes this impractical.
I am not a math expert, but I can show you how to "make a range of numbers" based on the criteria shown, using Excel 365.
Suppose your data is in the range B2:F4, with the condition numbers 1 to 5 (as numbers) in B1:F1. In cell G2, enter the following formula and drag it down:
=STDEV.P(--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]"))
The above will calculate the standard deviation using the STDEV.P function, but I am unsure if this is the right function to use, as there are many other variations of the original STDEV function.
Regardless, the following part of the formula is able to return a range of numbers as desired:
=--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]")
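The idea behind the REPT/FILTERXML formula is to turn each row of counts into one long list in which condition k appears once per meter in that condition. Here is a Python sketch of it. The helper name expand_counts and the hard-coded rows (a copy of the question's sample) are illustrative assumptions. statistics.pstdev corresponds to STDEV.P (population) and statistics.stdev to STDEV.S (sample):

```python
from statistics import pstdev, stdev


def expand_counts(counts):
    """Repeat condition number k (1-based) counts[k-1] times, like the REPT/FILTERXML trick."""
    return [cond for cond, n in enumerate(counts, start=1) for _ in range(n)]


# Sample rows from the question: meters per condition 1..5
rows = [
    [0, 0, 0, 70, 0],
    [0, 50, 10, 0, 0],
    [120, 0, 0, 5, 5],
]

for counts in rows:
    values = expand_counts(counts)
    print(pstdev(values), stdev(values))
```

Like a helper column, this materialises one entry per meter. The SUM/COLUMN formulas in the next answer avoid that.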
You can view this question and the answer by JvdV to understand the use of the FILTERXML function.
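Another way of doing it is to use the alternative SD formula, $\sigma = \sqrt{\left(\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 / n\right) / n}$, where $x_i$ is the condition number, $f_i$ the number of meters in that condition and $n = \sum f_i$,
which would give you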
=SQRT((SUM(A2:E2*COLUMN(A2:E2)^2)-SUM(A2:E2*COLUMN(A2:E2))^2/SUM(A2:E2))/SUM(A2:E2))
for the population standard deviation.
The Excel 365 version using LET is more readable, I think:
=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
sumxsq,SUM(mpy*x^2),
sumsqx,SUM(mpy*x)^2,
numerator,sumxsq-sumsqx/n,
SQRT(numerator/n)
)
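A bit less obviously, you could get it from the original definition, $\sigma = \sqrt{\sum f_i (x_i - \bar{x})^2 / n}$ (with $f_i$ meters in condition $x_i$, $n = \sum f_i$ and $\bar{x} = \sum f_i x_i / n$):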
=SQRT(SUM(A2:E2*(COLUMN(A2:E2)-SUM(A2:E2*COLUMN(A2:E2))/SUM(A2:E2))^2/SUM(A2:E2)))
Again, in Excel 365 you could write this as:
=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
xbar,SUM(mpy*x/n),
numerator,SUM(mpy*(x-xbar)^2),
SQRT(numerator/n)
)
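Change the final denominator (the division by n in SQRT(numerator/n), i.e. the last SUM(A2:E2), not the one used to compute the mean) to (SUM(A2:E2)-1) for the sample standard deviation.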
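The computational formula used in the SUM/COLUMN and LET versions can also be sketched in Python without expanding the counts. This is a minimal sketch that assumes the conditions are numbered 1 to 5 from left to right, as COLUMN(A2:E2) does when the data starts in column A. The name weighted_sd and its sample flag are hypothetical:

```python
import math


def weighted_sd(counts, sample=False):
    """SD of condition numbers 1..len(counts), each weighted by its count.

    Mirrors =SQRT((SUM(f*x^2) - SUM(f*x)^2/n) / n); sample=True divides by n-1 instead.
    """
    x = range(1, len(counts) + 1)            # condition numbers, like COLUMN(A2:E2)
    n = sum(counts)                           # total meters, like SUM(A2:E2)
    sum_fx = sum(f * xi for f, xi in zip(counts, x))
    sum_fx2 = sum(f * xi * xi for f, xi in zip(counts, x))
    numerator = sum_fx2 - sum_fx ** 2 / n     # same quantity as "numerator" in the LET version
    numerator = max(numerator, 0.0)           # clamp tiny negative rounding errors before sqrt
    denominator = n - 1 if sample else n      # only the final denominator changes for a sample
    return math.sqrt(numerator / denominator)


for row in ([0, 0, 0, 70, 0], [0, 50, 10, 0, 0], [120, 0, 0, 5, 5]):
    print(weighted_sd(row), weighted_sd(row, sample=True))
```

The clamp matters because the "sum of squares minus square of sum" form can go slightly below zero in floating point when all meters share one condition.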
I ended up figuring it out.
I added a column (say column F) that calculates the average condition for each row.
I then had a formula like this:
=SQRT(SUM(A2*POWER((1-F2),2),B2*POWER((2-F2),2),C2*POWER((3-F2),2),D2*POWER((4-F2),2),E2*POWER((5-F2),2))/SUM(A2:E2))
Essentially, this calculates the squared deviation from the mean for each condition value, multiplies it by the number of meters of the asset that are in that condition, and then does the usual remaining standard deviation steps (sum, divide by the total, square root).

Excel merge two lists

I have two Excel lists:
One is extensive, with 20 thousand lines, in which:
Two columns are important: first, a unique ID; second, a value (formatted as a number).
A value can appear several times, or only once.
I have to create the second list. It has only one column, containing the values I want to look up.
I need a formula that will look for values from List 2 in List 1 and then match a Unique ID to each value.
Importantly, when no exact value exists, it has to search for a value within about 3-5% deviation.
Example: there was no value 127, but within 3%, 125 was found.
I've tried INDEX and MATCH, but it does not seem to work.
VLOOKUP worked, but only for exact matches, without the 3-5% tolerance.
I am very grateful for the help.
Example: http://www.filedropper.com/excellist1and2
If the value exists in the list, you can use VLOOKUP or INDEX(MATCH to find it - that's the easy part. If the value is not in the list, then you need to find the nearest value.
The nearest "low" value will be the MAX value ≤ our input, and the nearest "high" value will be the MIN value ≥ our input.
If you have Office 365, you can use MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05)) and MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)) here. If not, you'll need an array formula. Either way, we can build the "±5%" limit in early to simplify the formula.
Starting with the Low values, we want the MAX value ≤ our input and ≥ 95% of our input. Putting an Array Formula in a SUMPRODUCT so that we can use it in a normal formula, we get =SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))
The High values are slightly harder, because we can't just multiply by 0 to cancel out anything too low, or over 105% of the target. We need to add a huge number like 1E+99 (a 1 with ninety-nine 0s after it) instead, so that the MIN will ignore them: SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05)))))
The last steps are to decide which of these numbers is closer to the target, and then to find the Unique ID to match. The %closeness calculations are (TARGET - LOW)/TARGET and (HIGH - TARGET)/TARGET, and subtracting the second from the first gives you 2-(HIGH + LOW)/TARGET. A positive number means "High" is closer, a negative number means "Low" is closer, and 0 means they are both the same distance (I'll default this to the Low number). We then use SIGN to change it to -1, 0 or 1, add 2 to get 1, 2 or 3, and finish up with CHOOSE to output our number. In pseudo-code, CHOOSE(2+SIGN(2-(HIGH+LOW)/TARGET),LOW,LOW,HIGH), and the full thing:
=CHOOSE(2+SIGN(2-(SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))+SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))/B1),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))
Now, we have a number. All we need to do is either use VLOOKUP, or use MATCH to get the row it is on, and INDEX to pull the data for that row:
Office 365:
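=IFERROR(VLOOKUP(B1,$D$1:$E$6,2,FALSE),VLOOKUP(CHOOSE(2+SIGN(2-(MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95))+MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05)))/B1),MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)),MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)),MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05))),$D$1:$E$6,2,FALSE))
Note that MINIFS and MAXIFS return 0, not an error, when nothing meets the criteria. So if the only value within ±5% is below the target, this version treats 0 as the "high" candidate and picks it, and the second VLOOKUP then usually returns #N/A. The SUMPRODUCT version below avoids this by using 1E+99 for a missing high value.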
Otherwise:
=IFERROR(VLOOKUP(B1,$D$1:$E$6,2,FALSE),VLOOKUP(CHOOSE(2+SIGN(2-(SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))+SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))/B1),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05)))))),$D$1:$E$6,2,FALSE))
(Obviously, change $D$1:$D$6 and $D$1:$E$6 to your actual data table ranges, and B1 to the input-value range)
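The same lookup logic is easier to test outside Excel. Here is a rough Python sketch of it. It assumes positive values and a table of (value, unique ID) pairs laid out like columns D:E. The names nearest_within and lookup_id and the sample table are made up for illustration. As in the CHOOSE step, ties go to the lower value:

```python
def nearest_within(target, values, tol=0.05):
    """Exact match if present, else the closest value within ±tol of target (ties go low)."""
    if target in values:
        return target
    # Nearest "low": the MAX of values in [target*(1-tol), target]
    low = max((v for v in values if target * (1 - tol) <= v <= target), default=None)
    # Nearest "high": the MIN of values in [target, target*(1+tol)]
    high = min((v for v in values if target <= v <= target * (1 + tol)), default=None)
    if low is None or high is None:
        return high if low is None else low  # None if nothing is in range at all
    # Same test as SIGN(2-(HIGH+LOW)/TARGET): positive means HIGH is closer
    return high if 2 - (high + low) / target > 0 else low


def lookup_id(target, table, tol=0.05):
    """Return the ID paired with the chosen value, or None (Excel would show #N/A)."""
    match = nearest_within(target, [value for value, _ in table], tol)
    if match is None:
        return None
    for value, uid in table:
        if value == match:
            return uid
    return None


# Hypothetical List 1 rows: (value, unique ID)
table = [(100, "ID-A"), (125, "ID-B"), (131, "ID-C"), (140, "ID-D")]
print(lookup_id(127, table))
```

For 127, the candidates are 125 (low) and 131 (high). 2 - 256/127 is negative, so 125 and "ID-B" are chosen, as in the question's example.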

lowest value in top 50%

I'm working on a King of the Hill/Elimination bracket spreadsheet & I have a cell that I want to return the cut (last score in the top 50%). Would anyone know how to go about this?
I have a formula for the average of the range, excluding 0s, but that isn't accurate, since it isn't actually showing the lowest score. =AVERAGEIF(F9:G667,"<>0")
What you're describing is "percentile". Excel's percentile function interpolates, so I'm not sure it's appropriate for your use case.
See https://support.office.com/en-us/article/percentile-function-91b43a53-543c-4708-93de-d626debdddca
At the very least, you can compute the percentile, then take the minimum score of all values filtered to be above the interpolated 50th percentile.
Here's a slightly clever implementation:
Assume your data is in the range C2:C11.
In C13, we'll compute the 50th percentile as =PERCENTILE(C2:C11, 0.5)
In column D, we'll use an IF statement to either select the adjacent value from column C or a very large number, depending on whether the value is greater than the percentile, e.g. =IF(C2 > $C$13,C2,400000) in D2, filled down to D11.
Now we can take the min of column d: =MIN(D2:D11)
The only clever bit is using a giant number (it just has to be larger than any real score) when the value in column C is not greater than the percentile, so that it effectively becomes invisible to the MIN operation.
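The helper-column approach can be sketched in Python as follows. The function name is hypothetical. PERCENTILE(range, 0.5) interpolates the same way as the median, so statistics.median stands in for it, and filtering the list plays the role of the 400000 sentinel. One consequence of the strict > comparison: with an odd number of scores, the median score itself is excluded, so [10, 20, 30] gives 30 rather than 20.

```python
from statistics import median


def cut_above_percentile(scores):
    """Smallest score strictly above the 50th percentile, like MIN over the IF(C2>$C$13,...) column."""
    p50 = median(scores)                     # same value as PERCENTILE(range, 0.5)
    above = [s for s in scores if s > p50]   # drop, rather than inflate, the other scores
    return min(above) if above else None


print(cut_above_percentile([10, 20, 30, 40]))
```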
The last score in the top 50% is the kth largest item, where k is the number of items divided by 2. Excel has a useful function called LARGE, which returns the kth largest item. It also has an even more useful function called AGGREGATE, which lets you do things like SUM, COUNT, AVERAGE, LARGE, SMALL, etc., but can skip hidden rows or error values.
So, to get the kth largest item of the list (ignoring error values), we would use =AGGREGATE(14, 6, F9:G667, k), where 14 selects LARGE and 6 tells it to ignore error values. Of course, this is not going to skip 0, because 0 is not an error. But you know what is? Dividing by 0. So, if we do F9*F9/F9, then we will either get F9 (for F9<>0) or the #DIV/0! error.
This now means our function is =AGGREGATE(14, 6, F9:G667*F9:G667/F9:G667, k), but we still need to decide on a value for k. If we have 3 items then we want the 2nd one, and if we have 6 items then we want the 3rd one, so for n items we want item n÷2, rounded up. That's what the ROUNDUP function is for!
Still, we need to know what n is - but that's simple. It's just the number of non-0 items in the list, or COUNTIF(F9:G667,"<>0") (Or ">0" if your list cannot contain negative numbers)
Plug it all together, and we get this:
=AGGREGATE(14, 6, F9:G667*F9:G667/F9:G667, ROUNDUP(COUNTIF(F9:G667,"<>0")/2, 0))
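Here is a Python sketch of the kth-largest idea described above, with k equal to the number of non-zero scores divided by 2, rounded up. The function name is hypothetical. Removing zeros in the list comprehension corresponds to turning them into #DIV/0! errors that AGGREGATE skips:

```python
import math


def cut_kth_largest(scores):
    """k-th largest non-zero score, with k = (number of non-zero scores) / 2, rounded up."""
    nonzero = [s for s in scores if s != 0]       # in Excel, zeros become #DIV/0! and are skipped
    if not nonzero:
        return None
    k = math.ceil(len(nonzero) / 2)               # ROUNDUP(n/2, 0)
    return sorted(nonzero, reverse=True)[k - 1]   # LARGE(..., k)


print(cut_kth_largest([0, 10, 20, 30, 0, 40, 50, 60]))
```

With 6 non-zero scores, k is 3, so the third largest score (40) is the cut.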

Compare 2 sets of cells and count

I'm trying to make a pattern-recognition counting cell that calculates the probability of occurrence of each outcome.
Set 1 is from a database that looks like this (consists of 6+1 columns):
ABABAB 1
CACACA 2
CACACA 2
CACACA 2
CACACA 1
ABABAB 1
Set 2 is what I input manually (only 6 columns, without the 7th column information)
CACACA
If I want to know the probability of the 7th column being "2", the pattern-recognition counting cell should return 75, and if it's "1" it should return 25.
I've been racking my brain using the COUNTIFS function, but nothing seems to work.
From my understanding, you have seven columns, A to G, with one letter in each of columns A to F and the outcome in column G.
You would like the ratio between the count of a combination across all seven columns (A to G) and the count of the same combination in the first six columns (A to F).
In this case you may use COUNTIFS as follows (I will use your same example CACACA):
=COUNTIFS(A1:A6,"C",B1:B6,"A",C1:C6,"C",D1:D6,"A",E1:E6,"C",F1:F6,"A",G1:G6,"2")/COUNTIFS(A1:A6,"C",B1:B6,"A",C1:C6,"C",D1:D6,"A",E1:E6,"C",F1:F6,"A")
The result will be 0.75, so if you want 75 you can just multiply the result by 100.
If you want to check the result also for CACACA 1, you just need to change the last value of the first COUNTIFS in the formula from "2" to "1" and you will get 0.25.
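The two-COUNTIFS ratio can be sketched in Python to check the numbers. The function name outcome_share is hypothetical. The rows reproduce Set 1 from the question, with one letter per column plus the 7th-column outcome:

```python
def outcome_share(rows, pattern, outcome):
    """COUNTIFS over all 7 columns divided by COUNTIFS over the first 6, for one pattern."""
    matching = [row for row in rows if row[:6] == pattern]
    if not matching:
        return None  # Excel would return #DIV/0! here
    hits = sum(1 for row in matching if row[6] == outcome)
    return hits / len(matching)


# Set 1 from the question: six letter columns plus the outcome column
set1 = [(*"ABABAB", 1), (*"CACACA", 2), (*"CACACA", 2),
        (*"CACACA", 2), (*"CACACA", 1), (*"ABABAB", 1)]
pattern = tuple("CACACA")
print(outcome_share(set1, pattern, 2) * 100, outcome_share(set1, pattern, 1) * 100)  # 75.0 25.0
```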
In case my understanding was wrong or you need more support do not hesitate to drop me a note!

Calculate the average of the 3 highest grades

I'm relatively new to Excel, but I am making a gradebook that calculates all of my grades.
One of my classes has an interesting way to calculate grades. There are 4 quizzes, and the lowest one will be dropped (essentially removed from the calculation completely).
How would I go about this? I tried using
=((SUM(D2:D5)-SUM(SMALL(D2:D5,4)))/(COUNT(D2:D5)*100))
on data like this
D2|75
D3|80
D4|83
D5|65
So in this case, I want the 65 to be removed and then the average calculated.
I am not getting any error, but the average is wrong.
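Subtract 1 from the count: in the example you are dividing by 4 instead of 3. Also, SMALL(D2:D5,4) returns the 4th smallest value, which is the largest one, so use MIN(D2:D5) (or SMALL(D2:D5,1)) to remove the lowest score.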
Like this:
=(SUM(D2:D5)-MIN(D2:D5))/((COUNT(D2:D5)-1)*100)
You can average the top 3 out of 4 with this formula
=AVERAGE(LARGE(D2:D5,{1,2,3}))
Add the four tests together, subtract the MIN() of the same range, and divide the total by 3.
If your scores are in B2, C2, D2, and E2 then something like:
=(SUM(B2:E2)-MIN(B2:E2))/3
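Both ideas in these answers, dropping the minimum and averaging the k largest, give the same result when exactly one score is dropped. Here is a short Python sketch with hypothetical function names, using the quiz scores from the question:

```python
def average_drop_lowest(scores):
    """(SUM - MIN) / (COUNT - 1): drop one lowest score and average the rest."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    return (sum(scores) - min(scores)) / (len(scores) - 1)


def average_top_k(scores, k=3):
    """Average of the k largest scores, like =AVERAGE(LARGE(D2:D5,{1,2,3}))."""
    return sum(sorted(scores, reverse=True)[:k]) / k


quizzes = [75, 80, 83, 65]
print(average_drop_lowest(quizzes), average_top_k(quizzes))
```

Both return 238/3, about 79.33. The first version adapts automatically if the number of quizzes changes, while the second needs k updated.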