excel calculate stdev given distribution - excel

I have a table which describes "Products Per User" distribution.
It has 2 columns nProducts and nUsers:
First row has values nProducts = 1, nUsers = 60000, meaning 60000 users bought 1 product.
Second row has values nProducts = 2, nUsers = 20000, meaning 20000 users bought 2 products, and so on...
I want to calculate its STDEV. How do I do it in excel?
In addition, could you tell me how to calculate in excel how many users bought:
nProducts > thresh?
Thanks
Li

Do you mean you want to calculate the standard deviation of the number of products bought? Try this "array formula"
=STDEV(IF(B2:B10>=TRANSPOSE(ROW(INDIRECT("1:"&MAX(B2:B10)))),A2:A10))
confirmed with CTRL+SHIFT+ENTER
So in a small example if you have 1 product bought by 4 people and 2 products bought by 3 people that will give you the standard deviation of the following values
1,1,1,1,2,2,2
Is that what you need?
Note that you can also use this "non array" version
=(SUMPRODUCT((A2:A10-SUMPRODUCT(A2:A10,B2:B10)/SUM(B2:B10))^2,B2:B10)/(SUM(B2:B10)-1))^(1/2)
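which calculates the sample standard deviation directly: it takes the squared difference of each value from the weighted average, weights it by the number of users, sums the results, divides by the total number of users minus 1, and takes the square root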
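As a cross-check outside Excel, here is a minimal Python sketch of the same calculation on the small example above (1 product bought by 4 people, 2 products bought by 3 people). The function names are made up for illustration. It also covers the second part of the question, counting users with nProducts above a threshold, which the formulas above do not address.

```python
import math

def weighted_sample_stdev(values, counts):
    """Sample standard deviation of a frequency table (divides by n - 1, like STDEV)."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    squared_dev = sum(c * (v - mean) ** 2 for v, c in zip(values, counts))
    return math.sqrt(squared_dev / (n - 1))

def users_above(values, counts, thresh):
    """Number of users whose product count is strictly greater than thresh."""
    return sum(c for v, c in zip(values, counts) if v > thresh)

n_products = [1, 2]   # column A in the example
n_users = [4, 3]      # column B in the example

print(weighted_sample_stdev(n_products, n_users))  # same data as 1,1,1,1,2,2,2
print(users_above(n_products, n_users, 1))         # users who bought more than 1 product
```

In Excel itself the threshold count can probably be done with SUMIF, for example =SUMIF(A2:A10,">"&D1,B2:B10), assuming the threshold sits in a cell such as D1 (that cell is my assumption, not part of the question).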

Just select all the second column values.
Then on the Ribbon bar:
go to Formulas
click More Functions
select Statistical
click on STDEV.P
At this point your second column should already be selected as the parameter, so just click OK and you will have your standard deviation calculated.
If you prefer to type it manually, the formula is =STDEV.P(B2:B10)
I assumed your values are in rows 2 to 10 of column B.
Thanks,
Mucio

Related

Calculating the representative/average group of categorical data

I have a summary dataset below with 3 variables:
Rating Frequency Total Value
A 2 $10000
B 15 $24003
C 5 $56789
...
There are 18 different rating categories, each with varying frequencies and values. I need to work out the average group that the data falls into, that is, which of the rating groups is the average in both frequency and total value. In short, the answer would be something like "the data is rating group B".
I'm sure there must be a proper way to do this but haven't been able to easily find the answer online.
I've tried calculating some kind of weighted average but struggling as each category would be equally weighted?
In Office 365 you could use:
=LET(range,A2:C4,
rating,INDEX(range,,1),
freq,INDEX(range,,2),
total,INDEX(range,,3),
w_avg,SUMPRODUCT(freq,total)/SUM(freq),
delta,BYROW(total,LAMBDA(t,
MAX(t,w_avg)-MIN(t,w_avg))),
INDEX(rating,XMATCH(MIN(delta),delta)))
It takes the three columns as input and names each one. Then it calculates the frequency-weighted average of the totals, as in my comment.
Next it computes, per row, the absolute difference between the total and the weighted average. Finally it indexes the rating column and returns the rating with the smallest difference (the closest match to the weighted average).
Note: in case of ties this returns the first one listed. To return all tied ratings we would need to use FILTER instead.
For older Excel I have a solution including a helper cell & column:
In cell E2 use: =SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4) to calculate the weighted average.
In cell D2 use: =MAX(C2,$E$2)-MIN(C2,$E$2) and drag this down to the last used row in your range to calculate the difference between the total of the rating and the weighted average.
In cell F2 use: =INDEX(A2:A4,MATCH(MIN(D2:D4),D2:D4,0))
to return the rating with the smallest difference. Note that the 0 (exact match) belongs inside MATCH, not INDEX.
Or this one line monster:
=INDEX(A2:A4,
MATCH(
MIN(
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))),
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4)),
0))
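To make the logic of the formulas above easier to follow, here is a small Python sketch of the same idea, using the three sample rows from the question. The function name is hypothetical, and it keeps the same tie behaviour (first listed wins).

```python
def closest_to_weighted_average(ratings, freqs, totals):
    """Return (rating, weighted_average), where rating has the total closest to the average."""
    # Same as SUMPRODUCT(freq, total) / SUM(freq)
    w_avg = sum(f * t for f, t in zip(freqs, totals)) / sum(freqs)
    # Same as MAX(t, w_avg) - MIN(t, w_avg), i.e. the absolute difference
    deltas = [abs(t - w_avg) for t in totals]
    # list.index returns the first minimum, like XMATCH / MATCH(..., 0)
    return ratings[deltas.index(min(deltas))], w_avg

ratings = ["A", "B", "C"]
freqs = [2, 15, 5]
totals = [10000, 24003, 56789]

rating, w_avg = closest_to_weighted_average(ratings, freqs, totals)
print(rating, round(w_avg, 2))
```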

Excel - getting a value based on the max value off another row in a Table

I'm looking for a solution for a problem I'm facing in Excel. This is my table simplified:
Every sale has a unique ID, but more than one person can have contributed to a sale. The columns "Name" and "Share of sales(%)" show who contributed and what their percentage was.
Sale_ID Name Share of sales(%)
1 Person A 100
2 Person B 100
3 Person A 30
3 Person C 70
Now I want to add a column to my table that shows the name of the person with the highest share of sales percentage per Sale_ID. Like this:
Sale_ID Name Share of sales(%) Highest sales
1 Person A 100 Person A
2 Person B 100 Person B
3 Person A 30 Person C
3 Person C 70 Person C
So when multiple people have contributed the new column shows only the one with the highest value.
I hope someone can help me, thanks in advance!
You can try this on cell D2:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
INDEX(B2:B5, XMATCH(A2:A5&maxSales,A2:A5&C2:C5)))
or just removing the LET since maxSales is used only one time:
=INDEX(B2:B5, XMATCH(A2:A5&MAXIFS(C2:C5,A2:A5,A2:A5),A2:A5&C2:C5))
In cell E2, here is another solution using MAP/XLOOKUP:
=LET(maxSales, MAXIFS(C2:C5,A2:A5,A2:A5),
MAP(A2:A5, maxSales, LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5))))
similarly without LET:
=MAP(A2:A5, MAXIFS(C2:C5,A2:A5,A2:A5),
LAMBDA(a,b, XLOOKUP(a&b, A2:A5&C2:C5, B2:B5)))
and here is the output:
Explanation
The trick here is to identify the max share of sales per each group and this can be done via MAXIFS(max_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...). The size and shape of the max_range and criteria_rangeN arguments must be the same.
MAXIFS(C2:C5,A2:A5,A2:A5)
it produces the following output:
maxSales
100
100
70
70
MAXIFS will provide an output of the same size as criteria1, so it returns for each row the corresponding maximum sales for each Sale_ID column value.
It is the array version equivalent to the following formula expanding it down:
MAXIFS($C$2:$C$5,$A$2:$A$5,A2)
INDEX/XMATCH Solution
With the array of maximum shares of sales in hand, we just need to find the row position via XMATCH and return the corresponding B2:B5 cell via INDEX. We use concatenation (&) in the XMATCH input arguments to match on more than one criterion at once.
MAP/XLOOKUP Solution
We use MAP to take, row by row, each pair of values (a,b) from its first two input arguments, find where the maximum value for that group occurs, and return the corresponding Name column value. To look up on an additional criterion, we use concatenation (&) in the first two XLOOKUP input arguments.
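For readers who want to see the per-group maximum logic spelled out, here is a rough Python equivalent using the four rows from the question. The function name is my own, and ties go to the first row listed, which should match what XMATCH/XLOOKUP return.

```python
def top_name_per_sale(rows):
    """For each row, return the name with the highest share within that row's Sale_ID."""
    best = {}  # sale_id -> (name, share) of the best contributor seen so far
    for sale_id, name, share in rows:
        # Strict '>' keeps the first row on ties, like a first-match lookup
        if sale_id not in best or share > best[sale_id][1]:
            best[sale_id] = (name, share)
    # One output per input row, like the spilled array in column D
    return [best[sale_id][0] for sale_id, _, _ in rows]

rows = [
    (1, "Person A", 100),
    (2, "Person B", 100),
    (3, "Person A", 30),
    (3, "Person C", 70),
]
print(top_name_per_sale(rows))
```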

How to get weighted sum depending on two conditions in Excel?

I have this table in Excel:
I am trying to get weighted sum depending on two conditions:
Whether it is Company 1 or Company 2 (shares quantity differ)
Whether column A (Company 1) and column B (Company 2) has 0 or 1 (multipliers differ)
Example:
Let's calculate the weighted sum for row 2:
Sum = 2 (multiplier 1) * 50 (1 share price) * 3 (shares quantity for Company 1)
+ 0.5 (multiplier 0) * 50 (1 share price) * 6 (shares quantity for Company 2) = 450
So, Sum for Row 2 = 450.
For now I am checking only for multipliers (1 or 0) using this code:
=COUNTIF(A2:B2,0)*$B$9*$B$8 + COUNTIF(A2:B2,1)*$B$9*$B$7
But it does not take into account the shares quantities for Company 1 or Company 2. I only multiply the 1 share price by the multipliers, but not by the shares quantity.
How can I also check whether it is Company 1 or Company 2 in order to multiply by corresponding Shares quantity?
Upd:
Rasmus0607 gave a solution when there are only two companies:
=$B$9*$E$8*IF(A2=1;$B$7;$B$8)+$B$9*$E$9*IF(B2=1;$B$7;$B$8)
Tom Sharpe gave a more general solution (number of companies can be greater than 2)
I uploaded my Excel file to DropBox:
Excel file
I can offer a more general way of doing it with the benefit of hindsight that you can apply to more than two columns by altering the second CHOOSE statement:-
=SUM(CHOOSE(2-A2:B2,$B$7,$B$8)*CHOOSE(COLUMN(A:B),$E$8,$E$9))*$B$9
Unfortunately it's an array formula that you have to enter with Ctrl+Shift+Enter. But it's a moot point whether or not it would be better just to use one of the other answers with some repetition and keep it simple.
You could also try this:-
=SUMPRODUCT(N(OFFSET($B$6,2-A2:B2,0)),N(OFFSET($E$7,COLUMN(A:B),0)))*$B$9
Here's how it would be for three companies
=SUM(CHOOSE(2-A2:C2,$B$7,$B$8)*CHOOSE(COLUMN(A:C),$F$8,$F$9,$F$10))*$B$9
(array formula) or
=SUMPRODUCT(N(OFFSET($B$6,2-A2:C2,0)),N(OFFSET($F$7,COLUMN(A:C),0)))*$B$9
=$B$9*$E$8*IF(A2=1;$B$7;$B$8)+$B$9*$E$9*IF(B2=1;$B$7;$B$8)
Since in the COUNTIF function, you don't know beforehand, which company column contains a 0, or a 1, I would suggest a longer, but more systematic solution using IF:
=$B$9*$E$8*IF(A2=1;2;0,5)+$B$9*$E$9*IF(B2=1;2;0,5)
This is a bit less general, but should produce the result you expect in this case.
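The arithmetic behind these formulas can be checked with a short Python sketch. It assumes the values from the worked example in the question (multiplier 2 for a 1, multiplier 0.5 for a 0, share price 50, shares quantities 3 and 6); the function and parameter names are hypothetical.

```python
def weighted_sum(flags, shares, mult_one=2.0, mult_zero=0.5, price=50.0):
    """Sum over companies of multiplier * price * shares quantity.

    flags  -- the 0/1 values in the row (one per company column)
    shares -- shares quantity for each company, in the same order
    """
    return price * sum(
        (mult_one if flag == 1 else mult_zero) * qty
        for flag, qty in zip(flags, shares)
    )

# Row 2 from the question: Company 1 has a 1, Company 2 has a 0
print(weighted_sum([1, 0], [3, 6]))  # 2*50*3 + 0.5*50*6 = 450.0
```

Adding a third company only means passing one more flag and one more quantity, which is the same generalisation the CHOOSE and OFFSET versions aim for.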

Excel IF OR Statement

I am having trouble determining the correct way to calculate a final rank order for four categories. Each of the four metrics make up a higher group. A Top 10 of each category is applied to the respective product to risk analysis.
CURRENT LOGIC - Assignment of 25% max per category.
Columns - Y4
Parts
0.25
25
=IF(L9=1,$Y$4,IF(L9=2,$Y$4*0.9, IF(L9=3,$Y$4*0.8, IF(L9=4,$Y$4*0.7, IF(L9=5,$Y$4*0.6, IF(L9=6,$Y$4*0.5, IF(L9=7,$Y$4*0.4, IF(L9=8,$Y$4*0.3, IF(L9=9,$Y$4*0.2, IF(L9=10,$Y$4*0.1,0))))))))))
DESIRED...
I would like to use a statement to determine three criteria in order to apply a score (1=100, 2=90, 3=80, etc..).
SUM the rank positions of each of the four categories-apply product rank ascending (not including NULL since it's not in the Top 10)
IF a product is identified in more than one metric-apply a significant contribution weight of (*.75),
IF a product has the number 1 rank in any of the four metrics-apply a score of (100).
Data - UPDATED EXAMPLE
(Product) Parts Labor Overhead External Final Score
"XYZ" 3 1 7 7 100
"ABC" NULL 6 NULL 2 100
"LMN" 4 NULL NULL NULL 70
This is way beyond my capability. ANY assistance is appreciated greatly!!!
Jim
I figured this is a good start and I can alter the weight as needed to reflect the reality of the situation.
=AVERAGE(G28:I28)+SUM(G28:I28)*0.25
However, I couldn't figure out how to put a cap on the score of no more than 100 points.
I am still unclear of what exactly you are attempting and if this will work, but how about this simple matrix using an array formula and some conditional formatting.
Array Formula in F2 (make sure to press Ctrl+Shift+Enter when exiting formula edit mode)
=MIN(100,SUM(IF(B2:E2<>"NULL",CHOOSE(B2:E2,100,90,80,70,60,50,40,30,20,10))))
Conditional Formatting defined as shown below.
Red = 100 value where it comes from a 1
Yellow = 100 value where it comes from more than 1 factor, but without a 1.
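Here is a small Python sketch of what the array formula in F2 does, in case it helps to test other weightings. It assumes ranks are whole numbers from 1 to 10 and that NULL is represented as None; the function name is made up.

```python
def final_score(ranks, cap=100):
    """Sum the per-category scores (rank 1 = 100, 2 = 90, ... 10 = 10), capped at cap.

    ranks -- one entry per category; None stands for NULL (not in the Top 10)
    """
    # 110 - 10 * rank reproduces CHOOSE(rank, 100, 90, ..., 10) for ranks 1..10
    total = sum(110 - 10 * r for r in ranks if r is not None)
    return min(cap, total)

products = {
    "XYZ": [3, 1, 7, 7],
    "ABC": [None, 6, None, 2],
    "LMN": [4, None, None, None],
}
for name, ranks in products.items():
    print(name, final_score(ranks))
```

On the three example rows this gives 100, 100 and 70, the same as the Final Score column in the question.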

Find the top n values in a range while keeping the sum of values in another range under x value

I'd like to accomplish the following task. There are three columns of data. Column A represents price, where the sum needs to be kept under $100,000. Column B represents a value. Column C represents a name tied to columns A & B.
Out of >100 rows of data, I need to find the highest 8 values in column B while keeping the sum of the prices in column A under $100,000. And then return the 8 names from column C.
Can this be accomplished?
EDIT:
I attempted the Solver solution w/ no luck. 200 rows looks to be the max w/ Solver, and that is what I'm using now. Here are the steps I've taken:
Create a column called rank RANK(B2,$B$2:$B$200) (used column D -- what is the purpose of this?)
Create a column called flag just put in zeroes (used column E)
Create 3 total cells total_price (=SUM(A2:A200)), total_value (=SUM(B2:B200)) and total_flag (=(E2:E200))
Use solver to minimize total_value (shouldn't this be maximize??)
Add constraints -Total_price<=100000 -Total_flag=8 -Flag cells are binary
Using Simplex LP, it simply changes the flags for the first 8 values. However, the total price for the first 8 values is >$100,000 ($140k). I've tried changing some options in the Solver Parameters as well as using different solving methods to no avail. I'd like to post an image of the parameter settings, but don't have enough "reputation".
EDIT #2:
The first 5 rows looks like this, price goes down to ~$6k at the bottom of the table.
Price Value Name Rank Flag
$22,538 42.81905675 Blow, Joe 1 0
$22,427 37.36240932 Doe, Jane 2 0
$17,158 34.12127693 Hall, Cliff 3 0
$16,625 33.97654031 Povich, John 4 0
$15,631 33.58212402 Cow, Holy 5 0
I'll give you the solver solution as a starting point. It involves the creation of some extra columns and total cells. Note solver is limited in the amount of cells it can handle but will work with 100 anyway.
Create a column called rank RANK(B2,$B$2:$B$100)
Create a column called flag just put in zeroes
Create 3 total cells total_price, total_value and total_flag
Use solver to minimize total_value
Add constraints
-Total_price<=100000
-Total_flag=8
-Flag cells are binary
This will flag the rows you want and you can grab the names however you want.
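To illustrate what the Solver model is meant to find, here is a brute-force Python sketch. It is only an illustration under my own assumptions, not the Solver method itself: it tries every combination of k rows, so it is practical for small tables only (8 out of 200 rows is far too many combinations), and it treats the goal as maximizing the summed value subject to total price <= budget. The sample uses the five rows posted in the question, with a hypothetical k of 3 and budget of 60000.

```python
from itertools import combinations

def best_selection(rows, k, budget):
    """Pick exactly k rows maximizing total value with total price <= budget.

    rows -- list of (price, value, name) tuples
    Returns (total_value, [names]) or None if no combination fits the budget.
    """
    best = None
    for combo in combinations(rows, k):
        price = sum(r[0] for r in combo)
        value = sum(r[1] for r in combo)
        if price <= budget and (best is None or value > best[0]):
            best = (value, [r[2] for r in combo])
    return best

rows = [
    (22538, 42.81905675, "Blow, Joe"),
    (22427, 37.36240932, "Doe, Jane"),
    (17158, 34.12127693, "Hall, Cliff"),
    (16625, 33.97654031, "Povich, John"),
    (15631, 33.58212402, "Cow, Holy"),
]
print(best_selection(rows, 3, 60000))
```

Note that the price total has to count only the selected rows. In the spreadsheet that would mean something like SUMPRODUCT of the price column and the flag column rather than a plain SUM of all prices.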
