Excel - standard deviation where source cells are a count - excel

I have some data that looks like this
Condition 1   Condition 2   Condition 3   Condition 4   Condition 5
0             0             0             70            0
0             50            10            0             0
120           0             0             5             5
The value in each cell is the number of meters of an asset that is in the given condition. In other words, the 70 in the first row is a count of the number of meters that are a '4'.
How do I calculate a standard deviation for this? Obviously the std.dev would be '0' for the first row, higher for row 2, and fairly low for row 3.
Something similar to REPT, but that repeats a value x times in a formula?
I considered a helper column but the number of meters that there are in total makes this impractical.

I am not a math expert, but I can show you how to "make a range of numbers" based on the criteria shown, using Excel 365.
Suppose your data is in the range B2:F4 as shown below. In cell G2, enter the following formula and drag it down:
=STDEV.P(--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]"))
The above will calculate the standard deviation using the STDEV.P function, but I am unsure if this is the right function to use as there are many other variations to the original STDEV function.
Regardless, the following part of the formula is able to return a range of numbers as desired:
=--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]")
You can view this question and the answer by JvdV to understand the use of the FILTERXML function.
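As a cross-check outside Excel, the "repeat each value by its count" idea can be sketched in Python. This is only an illustration: the helper name `expand_counts` is made up here, and it assumes the counts are whole numbers (fractional meters could not be expanded this way).

```python
from statistics import pstdev


def expand_counts(counts):
    """Repeat condition number 1..5 as many times as its count says.

    Mirrors what the REPT/TEXTJOIN/FILTERXML part builds: a flat list of
    condition values. Assumes whole-number counts (hypothetical helper).
    """
    values = []
    for condition, count in enumerate(counts, start=1):
        values.extend([condition] * count)
    return values


# The three rows from the question.
rows = [
    [0, 0, 0, 70, 0],
    [0, 50, 10, 0, 0],
    [120, 0, 0, 5, 5],
]

for counts in rows:
    # pstdev is the population standard deviation, like STDEV.P.
    print(round(pstdev(expand_counts(counts)), 4))
```

For a sample standard deviation (like STDEV.S), `statistics.stdev` could be swapped in.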

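Another way of doing it is to use the alternative SD formula, sqrt((Σf·x² − (Σf·x)²/n)/n), where x is the condition number, f is the count of meters in that condition and n = Σf,
which would give you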
=SQRT((SUM(A2:E2*COLUMN(A2:E2)^2)-SUM(A2:E2*COLUMN(A2:E2))^2/SUM(A2:E2))/SUM(A2:E2))
for the population standard deviation.
The Excel 365 version using Let is more readable I think:
=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
sumxsq,SUM(mpy*x^2),
sumsqx,SUM(mpy*x)^2,
numerator,sumxsq-sumsqx/n,
SQRT(numerator/n)
)
A bit less obviously, you could get it from the original formula
=SQRT(SUM(A2:E2*(COLUMN(A2:E2)-SUM(A2:E2*COLUMN(A2:E2))/SUM(A2:E2))^2/SUM(A2:E2)))
Again, in Excel 365 you could write this as:
=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
xbar,SUM(mpy*x/n),
numerator,SUM(mpy*(x-xbar)^2),
SQRT(numerator/n)
)
Change the denominator to
(SUM(A2:E2)-1)
for the sample standard deviation.
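The same weighted calculation can be sketched in Python without expanding the counts. The function name `sd_from_counts` and its `sample` flag are illustrative assumptions, not part of the answer above; condition numbers 1..5 play the role of COLUMN(A2:E2).

```python
from math import sqrt


def sd_from_counts(counts, sample=False):
    """Standard deviation of condition numbers 1..len(counts), weighted by counts.

    sample=False divides by n (population), sample=True divides by n - 1.
    Hypothetical helper for illustration only.
    """
    n = sum(counts)
    denominator = n - 1 if sample else n
    if denominator <= 0:
        raise ValueError("not enough data for a standard deviation")
    # Weighted mean: sum(count * x) / n
    mean = sum(c * x for x, c in enumerate(counts, start=1)) / n
    # Weighted sum of squared deviations from the mean
    numerator = sum(c * (x - mean) ** 2 for x, c in enumerate(counts, start=1))
    return sqrt(numerator / denominator)


print(sd_from_counts([0, 50, 10, 0, 0]))               # population
print(sd_from_counts([0, 50, 10, 0, 0], sample=True))  # sample
```

Because it never builds the repeated list, this version also works when the counts are very large or not whole numbers.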

I ended up figuring it out.
I added a column which calculated the average. (Say column F)
I then had a formula like this
=SQRT(SUM(A2*POWER((1-F2),2),B2*POWER((2-F2),2),C2*POWER((3-F2),2),D2*POWER((4-F2),2),E2*POWER((5-F2),2))/SUM(A2:E2))
Essentially this calculates the squared deviation from the mean for each condition value, multiplies it by the number of values (e.g. number of meters) of asset that are in that particular condition, then does the usual remaining standard deviation steps (sum, divide by total, square root).

Related

Excel - dynamically average the top values of a column

I have a column that may contain X number of values. I am looking to get the average of the top Y values, based on the number of X values that are above 0. There are just two rules on how to come to this value.
Y must represent 49% or less of the values greater than 0. If I have 11 total values that are above 0, then I can average at most the top 5.39 values.
Y must be a whole number and rounded down. To take the last piece of criteria further, 5.39 would round down to 5, so I am essentially getting the average of the top 5 values.
I feel like I am getting close with different solutions but have struggled. For instance, the solution at Getting average of top 30% of the values in one column is close; however, it appears to find the values above the 0.7 number, and not necessarily the top 30%.
Here is my data,
92.593
88.889
45.679
88.889
0
88.889
87.654
0
69.136
41.975
49.383
0
40.741
50.617
0
Here I have 11 values that are above 0. I know that using the criteria I mentioned above, I can only average the top 5 values. Those top 5 values average out to be 89.383.
I have tried something like this,
=AVERAGE(LARGE(A1:A14), ROW($B$1:$B2)))
Where B1 just has the value 1, and B2 has a formula that calculates the number of values I can have, with no such luck.
I don't know your Excel version, but this also works with older versions:
=AVERAGE(LARGE(IF(A1:A15>0,A1:A15),ROW(INDIRECT(B1&":"&B2))))
Remember to confirm it as an array formula using CTRL+SHIFT+ENTER.
I hope that's what you are looking for,
bye.
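To check the expected result independently of Excel, the two rules (at most 49% of the positive values, rounded down) can be sketched in Python. The function name `average_of_top_share` and the `share` parameter are assumptions made for this illustration.

```python
from math import floor


def average_of_top_share(values, share=0.49):
    """Average the top Y values, where Y = floor(share * count of values > 0)."""
    positives = sorted((v for v in values if v > 0), reverse=True)
    y = floor(len(positives) * share)  # e.g. floor(11 * 0.49) = floor(5.39) = 5
    if y == 0:
        raise ValueError("too few positive values to average")
    return sum(positives[:y]) / y


# The data from the question.
data = [92.593, 88.889, 45.679, 88.889, 0, 88.889, 87.654, 0,
        69.136, 41.975, 49.383, 0, 40.741, 50.617, 0]

print(round(average_of_top_share(data), 3))
```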

Excel merge two lists

I have two Excel lists:
The first is extensive, with 20 thousand lines. In it:
Two columns are important: the first holds a Unique ID, the second a value (number formatted).
A value can appear several times, or only once.
I have to create the second list. In this list I have only one column, containing the values that I would like to find.
I need a formula that will look for values from List 2 in List 1 and then match a Unique ID to each value.
It is important that, when no exact value exists, it searches for a value that is within about 3-5% deviation.
Example: there was no value 127, but within 3%, 125 was found.
I've tried indexing and comparison, but it does not seem to work.
VLOOKUP worked, but without 3-5% deviation
I am very grateful for the help.
Example: http://www.filedropper.com/excellist1and2
If the value exists in the list, you can use VLOOKUP or INDEX(MATCH to find it - that's the easy part. If the value is not in the list, then you need to find the nearest value.
The nearest "low" value will be the MAX value ≤ our input, and the nearest "high" value will be the MIN value ≥ our input.
If you have Office 365, you can use MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05)) and MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)) here. If not, you'll need an Array Formula; we can build that "±5%" in early, to simplify the formula.
Starting with the Low values, we want the MAX value ≤ our input and ≥ 95% of our input. Putting an Array Formula in a SUMPRODUCT so that we can use it in a normal formula, we get =SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))
The High values are slightly harder, because we can't just multiply by 0 to cancel out anything too low, or over 105% of the target. We need to add a huge number like 1E+99 (a 1 with ninety-nine 0s after it) instead, so that the MIN will ignore them: SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05)))))
The last steps are to decide which of these numbers is closer to the target, and then to find the Unique ID to match. The %closeness calculations are (TARGET - LOW)/TARGET and (HIGH - TARGET)/TARGET, and subtracting the second from the first gives you 2-(HIGH + LOW)/TARGET. A positive number means "High" is closer, a negative number means that "Low" is closer, and 0 means they are both the same distance (I'll default this to the Low number). We then use SIGN to change it to -1, 0 or +1, add 2 to get 1, 2 or 3, and finish up with CHOOSE to output our number. In pseudo-code, CHOOSE(2+SIGN(2-(HIGH+LOW)/TARGET),LOW,LOW,HIGH), and the full thing:
CHOOSE(2+SIGN(2-(SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))+SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))/B1),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))
Now, we have a number. All we need to do is either use VLOOKUP, or use MATCH to get the row it is on, and INDEX to pull the data for that row:
Office 365:
=IFERROR(VLOOKUP(B1,$D$1:$E$6,2,FALSE),VLOOKUP(CHOOSE(2+SIGN(2-(MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95))+MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05)))/B1),MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)),MAXIFS($D$1:$D$6,$D$1:$D$6,"<="&B1,$D$1:$D$6,">="&(B1*0.95)),MINIFS($D$1:$D$6,$D$1:$D$6,">="&B1,$D$1:$D$6,"<="&(B1*1.05))),$D$1:$E$6,2,FALSE))
Otherwise:
=IFERROR(VLOOKUP(B1,$D$1:$E$6,2,FALSE),VLOOKUP(CHOOSE(2+SIGN(2-(SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95))))+SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05))))))/B1),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MAX($D$1:$D$6*--($D$1:$D$6<=B1)*--($D$1:$D$6>=(B1*0.95)))),SUMPRODUCT(MIN($D$1:$D$6+1E+99*(--($D$1:$D$6<B1)+--($D$1:$D$6>(B1*1.05)))))),$D$1:$E$6,2,FALSE))
(Obviously, change $D$1:$D$6 and $D$1:$E$6 to your actual data table ranges, and B1 to the input-value range)
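The lookup logic (exact match first, otherwise the nearest value within ±5%, ties going to the lower value) can be sketched in Python to make the steps easier to follow. Everything named here (`find_id`, `table`, `tolerance`, and the sample IDs) is hypothetical; only the 127 vs. 125 case comes from the question, and positive targets are assumed.

```python
def find_id(target, table, tolerance=0.05):
    """Return the Unique ID for target, or for the nearest value within tolerance.

    table is a list of (value, unique_id) pairs. Returns None when nothing
    lies within the tolerance band. Assumes target > 0 (illustrative sketch).
    """
    # Step 1: exact match, the VLOOKUP(...,FALSE) part.
    for value, unique_id in table:
        if value == target:
            return unique_id

    # Step 2: nearest value below (MAX within band) and above (MIN within band).
    lows = [(v, uid) for v, uid in table if target * (1 - tolerance) <= v < target]
    highs = [(v, uid) for v, uid in table if target < v <= target * (1 + tolerance)]
    low = max(lows, key=lambda pair: pair[0]) if lows else None
    high = min(highs, key=lambda pair: pair[0]) if highs else None

    if low is None and high is None:
        return None
    if high is None:
        return low[1]
    if low is None:
        return high[1]
    # Step 3: pick the closer one; a tie defaults to the low value.
    return low[1] if (target - low[0]) <= (high[0] - target) else high[1]


# Made-up sample data for illustration.
table = [(100, "A"), (125, "B"), (130, "C"), (200, "D")]
print(find_id(127, table))  # no exact 127; 125 is nearest within 5%
```

Unlike the worksheet formulas, this sketch returns None explicitly when no candidate exists on either side, which is a design choice of the sketch rather than something the formulas above do.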

Excel: Dynamic SUM with OFFSET calculation

I need to create a dynamic sum of data chunks out of a large data set contained in a csv file (>100k rows). The data is planned to be displayed in PowerBI, but I have literally no knowledge of the DAX coding language or VBA, so I hope I can preformat the data in Excel.
The data subsets I want to sum up are distinguished by a counting column. The count restarts at 1 with every new subset, but the final number (>= 1) is totally 'random'.
The first column is the countingRow, the second column is the dataRow.
> 1 45
2 20
3 20
4 10 -> SUM 95
> 1 30
2 5 -> SUM 35
> 1 X -> new SUM
I think it is possible to work with the SUM, IF and OFFSET functions.
My plan was to check whether a cell contains a 1 or not, find the range between two such cells minus one cell, then calculate the offset sum in the other column.
But when I thought I had found the solution, I realized that I have no way to move my pointer to a new data subset.
Which function do I need to move my calculation through the column?
Is it even possible to do a calculation of this scale in Excel?
PS: I'd also be thankful for a DAX or VBA tutorial which could bring me to a solution.
With the counter in column A and the data in column B (both starting in row 2), use the following formula in D2.
=IF(OR(A3={1,""}), SUM(INDEX(B:B, AGGREGATE(14, 6, ROW($1:2)/(A$1:A2=1), 1)):INDEX(B:B, ROW())), TEXT(,))
Fill down.
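A slightly shorter formula, without array-type processing (it appears to be meant for C2, since it subtracts the totals already written above it in column C):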
=IF(OR(A3={1,""}),SUM($B$2:B2)-SUM($C$1:C1),"")
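Since the question also asks whether this can be done outside worksheet formulas, here is a minimal Python sketch of the same grouping rule: a new subset starts whenever the counter is 1. The function name `subset_sums` is made up for this illustration.

```python
def subset_sums(counters, data):
    """Sum data values per subset; a counter value of 1 starts a new subset."""
    totals = []
    for counter, value in zip(counters, data):
        if counter == 1 or not totals:
            totals.append(0)  # open a new running total
        totals[-1] += value
    return totals


# The sample rows from the question.
counters = [1, 2, 3, 4, 1, 2]
data = [45, 20, 20, 10, 30, 5]
print(subset_sums(counters, data))
```

A single pass like this scales linearly with the number of rows, so 100k rows should not be a problem.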

Rounding cell value to nearest thousand in excel

I have been trying to round a cell value up or down to the nearest thousand but can't get it to work. I'm trying to calculate my hourly rate based on the current exchange rate from USD to VND. If the resulting total is something like 22,325, then it should round down to 22,000; likewise, if the hundreds are 500 or more, it should round up to 23,000.
So where the hourly rate says 527,325 it should round down to 527,000. The cell already contains a formula to multiply the current exchange rate by the USD.
Using the ROUND function:
=ROUND(A1,-3)
Where A1 is the cell containing the number you wish to round.
The negative number specifies digits to the left of the decimal point to replace with zeros (the number of zeros at the end of the number).
Like so:
=ROUND( 34528, -3 ) = 35000
As for the OP's example:
=ROUND( 22325, -3 ) = 22000
The OP also stated:
likewise if the hundreds are 500 or more it should round up to 23,000
=ROUND( 22500, -3 ) = 23000
See ROUND Function Office support
You can also try this:
=MROUND(A1,1000)
This should give you what you want.
You've asked specifically for a formula that when it 'says 527,325 it should round down to 527,000'. For this you would need the FLOOR¹ function or ROUNDDOWN¹ function.
=FLOOR(A1, 1000)
=ROUNDDOWN(A1, -3)
There is also leaving the number alone but formatting with a custom number format of 0, K but this does not round down. If the number was 527,501 it would display 528 K not 527 K.
      
¹ The counterparts to FLOOR and ROUNDDOWN are the CEILING function and ROUNDUP function.
Divide by 1000, then round, then multiply by 1000.
The formula for rounding A1 in this way would be:
=ROUND(A1/1000,0)*1000
If a formula already exists in the A1 cell, just replace the A1 in the expression above with that formula.
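The "divide, round, multiply" idea can be checked with a short Python sketch. Note the assumption: Python's built-in `round` rounds halves to the nearest even number, so the sketch uses the `decimal` module with `ROUND_HALF_UP` to mimic Excel's ROUND, which rounds halves away from zero. The function name `round_to_thousand` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_thousand(value):
    """Round to the nearest 1000, with exact halves rounding away from zero."""
    thousands = (Decimal(str(value)) / 1000).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return int(thousands * 1000)


# The values used in the examples above.
for amount in (22325, 22500, 34528, 527325):
    print(amount, "->", round_to_thousand(amount))
```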

Excel: AVERAGE IF

Lets say I have an Excel sheet such that:
Column 1 contains salaries
Column 2 contains gender (M/F)
How can I calculate the average salary for females?
=AVERAGE(IF(B1:B10="F",A1:A10))
entered as an array function (ie using Shift-CTRL-Enter rather than just Enter)
Although the question is already answered/accepted, I can't resist adding my 2 cents:
Sums and averages normally are displayed at the bottom of a list. You can use the SUBTOTAL() function to calculate sum and average and specify to include or exclude "hidden" values, i.e. values suppressed by a filter. So the solution could be:
create a formula =SUBTOTAL(101,A2:A6) for the average
create a formula =SUBTOTAL(109,A2:A6) for the sum
create an autofilter on the Gender column
Now, when you filter for "F", "M" or all, the correct sum and average will always be computed.
Hope that helps - Good luck
To do it without an array formula just use this:
=SUMIF(B:B,"F",A:A)/COUNTIF(B:B,"F")
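The sum-divided-by-count idea can be sketched in Python as a sanity check on the result. The function name `average_if` and the three sample rows are illustrative assumptions.

```python
def average_if(salaries, genders, wanted="F"):
    """Average the salaries whose gender matches wanted (sum / count)."""
    selected = [s for s, g in zip(salaries, genders) if g == wanted]
    if not selected:
        raise ValueError("no rows match the condition")
    return sum(selected) / len(selected)


# Small made-up sample.
salaries = [2500, 2400, 2300]
genders = ["M", "F", "F"]
print(average_if(salaries, genders))  # average of 2400 and 2300
```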
Answered question, solid formulas, BUT beware of conditional averages based on numbers:
In a very similar situation I tried:
"=AVERAGE(IF(F696:F824<0;E696:E824))" (Shift-Ctrl-Enter). In English, this asked Excel to calculate an average of all numbers in column E where the result in column F was negative (a loss), e.g. "calculate an average for all items where a loss occurred" (no circular reference).
When asked to calculate an average for all items where a gain (x>0) occurred, Excel got it right. However, when the conditional average was based on a loss, Excel produced a huge error (7.53 instead of 28.27).
I then opened exactly the same document in Open Office, where Calc got the (correct) answer 28,27 from the same array formula.
Recalculating the whole thing in steps in Excel (first a new column of only losses, a new one for only gains, a new column for only the E-values where a loss/gain occurred, then a "clean" (unconditional) average calculation) produced the correct values.
Thus, it should be noted that Excel and Open Office produce different answers (and Excel 2007, Swedish language version, gets them wrong) in some cases of conditional averages.
(Sorry for my long cautionary tale, but my advice would be to be a bit careful when the condition is a number.)
You can put a filter on the top line of your sheet, filter only the Fs in Column 2, then calculate the average of the values.
Or you can add an additional column with female salaries only.
The formula would be like: =IF(B1 = "Female";A1)
Then you simply calculate the average of the newly created column.
Salaries   Gender   Flag (F=1)   Salary × flag
2500       M        0            2500×0
2400       F        1            2400×1
2300       F        1            2300×1
                    sum=2        sum=4700        average=4700/2
This may be more complex than necessary.
