Calculating the representative/average group of categorical data - excel-formula

I have a summary dataset below with 3 variables:
Rating | Frequency | Total Value
A      | 2         | $10000
B      | 15        | $24003
C      | 5         | $56789
...
There are 18 different rating categories, each with varying frequencies and values. I need to work out which rating group the data falls into on average, i.e. which rating group is the average in terms of both frequency and total value, so that in short I can say "the data is rating group B", for example.
I'm sure there must be a proper way to do this but haven't been able to easily find the answer online.
I've tried calculating some kind of weighted average, but I'm struggling, since each category would be weighted equally?

In Office 365 you could use:
=LET(range,A2:C4,
rating,INDEX(range,,1),
freq,INDEX(range,,2),
total,INDEX(range,,3),
w_avg,SUMPRODUCT(freq,total)/SUM(freq),
delta,BYROW(total,LAMBDA(t,
MAX(t,w_avg)-MIN(t,w_avg))),
INDEX(rating,XMATCH(MIN(delta),delta)))
It takes the three columns as input and names them. Then it calculates the weighted average, as in my comment.
Then it calculates the difference between the total and the weighted average for each row. Finally, it indexes the rating and matches the one with the smallest difference (the closest match to the weighted average).
Note: in case of ties, this returns the first rating listed. To return all tied ratings, you would need to use FILTER instead.
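To make the logic of the LET formula easier to follow, here is a rough Python sketch of the same steps. It assumes the data is a list of (rating, frequency, total value) rows and uses only the three sample rows from the question; the function name closest_rating is purely illustrative.

```python
def closest_rating(rows):
    """Return the rating whose total is closest to the weighted average.

    rows: list of (rating, frequency, total_value) tuples (hypothetical layout).
    Mirrors the LET formula: w_avg = SUMPRODUCT(freq, total) / SUM(freq),
    then the first row with the smallest |total - w_avg| wins, so ties go
    to the rating listed first, as with XMATCH.
    """
    total_freq = sum(freq for _, freq, _ in rows)
    w_avg = sum(freq * total for _, freq, total in rows) / total_freq
    # abs(total - w_avg) is what MAX(t, w_avg) - MIN(t, w_avg) computes per row
    deltas = [abs(total - w_avg) for _, _, total in rows]
    best = deltas.index(min(deltas))  # first position of the minimum
    return rows[best][0], w_avg


sample = [("A", 2, 10000), ("B", 15, 24003), ("C", 5, 56789)]
rating, w_avg = closest_rating(sample)
print(rating, round(w_avg, 2))
```

For the three sample rows, the weighted average is about 30181.36 and rating B is the closest.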
For older Excel versions, here is a solution that uses a helper cell and a helper column:
In cell E2 use: =SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4) to calculate the weighted average.
In cell D2 use: =MAX(C2,$E$2)-MIN(C2,$E$2) and drag this down to the last used row in your range to calculate the difference between the total of the rating and the weighted average.
In cell F2 use: =INDEX(A2:A4,MATCH(MIN(D2:D4),D2:D4,0))
to return the rating with the smallest difference.
Or this one-line monster (in versions before Office 365 it is an array formula, so confirm it with Ctrl+Shift+Enter):
=INDEX(A2:A4,
MATCH(
MIN(
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))),
IF(C2:C4>=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4))-
IF(C2:C4<=SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4),
C2:C4,
SUMPRODUCT(B2:B4,C2:C4)/SUM(B2:B4)),
0))

Related

What dynamic array formula can I use to calculate runway or burnrate in Excel?

I'm using dynamic array functions in Excel (SCAN, MAP, LET, BYCOL, etc.), without VBA or regular SUMIF formulas, to create a runway or burn-rate table. I start with a $10,000 budget. In month 1, $2,000 is spent, so $2,000 comes out of the budget; in month 2, $3,000; and so on, until the available cash is 0 for the remaining months of the year. The result should be a table showing how much cash was taken from the budget each month, as in the Desired outcome row below.
  | A               | B       | C       | D       | E       | F
1 | Budget          | $10,000 |         |         |         |
2 | Month           | 1       | 2       | 3       | 4       | 5
3 | Expense         | -$2,000 | -$3,000 | -$7,000 | -$4,000 | -$2,000
4 | Desired outcome | -$2,000 | -$3,000 | -$5,000 | $0      | $0
Note that the Desired outcome amount is how much of the budget was used to cover the expenses.
Notice that in month 3 I spent $7,000, but only $5,000 was left in the budget, so that's what I show.
I studied all the dynamic array (spill) functions and LAMBDA functions I could find on the internet (this video by excelisfun is great), but I couldn't make it work. I would think some combination of SCAN or MAP is the go-to solution.
The solution should be a single formula that leverages MS365 dynamic array functions.
I used REDUCE to get your result:
=LET(budget, -B1,
expenses, B3:F3,
DROP(REDUCE( 0, expenses,
LAMBDA( x, y,
HSTACK( x,
IF( SUM(x,y)>=budget,
y,
budget-SUM(x))))),
,1))
It keeps a running total of the amounts already taken from the budget and checks whether that total plus the current expense is still greater than or equal to the budget (since your expenses are negative, I converted the budget to a negative number).
If it is, the expense itself is returned; otherwise, the remainder of the budget (the budget minus the running total) is returned, which is 0 once the budget is used up.
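A step-by-step Python sketch of what the REDUCE formula does may help. It assumes expenses are negative numbers, as in the sheet, and the name budget_used is made up for illustration. The list used plays the role of the accumulator x, minus the leading 0 that DROP removes.

```python
def budget_used(budget, expenses):
    """How much of a positive budget covers each (negative) expense.

    Mirrors the REDUCE/HSTACK formula: the budget is negated so it has
    the same sign as the expenses, and `used` is the accumulator x.
    """
    limit = -budget          # like budget = -B1 in the LET
    used = []                # like x, without the leading 0 that DROP removes
    for y in expenses:
        if sum(used) + y >= limit:           # SUM(x, y) >= budget
            used.append(y)                   # the whole expense still fits
        else:
            used.append(limit - sum(used))   # only what's left (0 once exhausted)
    return used


print(budget_used(10000, [-2000, -3000, -7000, -4000, -2000]))
# [-2000, -3000, -5000, 0, 0]
```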

Return three largest values from distribution in one formula

Suppose I have the following data points. I would like to get the cumulative percentage share of the three largest values in this set.
The first step would be to transform the data into a distribution that sums to 100%, and the second to sum the three largest values of that new distribution.
Data
0.00
1.35
11.05
24.85
37.85
15.40
6.95
1.65
0.25
I can calculate each individual percentage with a simple data point / sum of data points per row, and then use LARGE with k = 1, 2, 3 on the new column to sum up the values. However, the challenge is to do all the calculations in a single cell and return just the final value.
In this case, the target value would be: 0.2494 + 0.3804 + 0.1548 = 0.7849 or 78.48%
Thanks for the help
Wrap a LARGE in SUMPRODUCT:
=SUMPRODUCT(LARGE(A2:A10,{1,2,3}))/SUM(A2:A10)
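The same calculation as a short Python sketch (the function name top_k_share is illustrative). Normalising each value by the total and then summing the k largest gives the same result as summing the k largest first and dividing once, because dividing by a positive total does not change their order; that is why the one-cell formula works.

```python
def top_k_share(values, k=3):
    """Share of the total taken by the k largest values.

    Same idea as =SUMPRODUCT(LARGE(A2:A10,{1,2,3}))/SUM(A2:A10).
    """
    return sum(sorted(values, reverse=True)[:k]) / sum(values)


data = [0.00, 1.35, 11.05, 24.85, 37.85, 15.40, 6.95, 1.65, 0.25]
share = top_k_share(data)
print(f"{share:.4f}")
```

With the nine values exactly as listed above, this evaluates to 78.10 / 99.35, roughly 0.786.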

How to find row and column names of the n-th highest values in a 2D array in Excel?

I want to find the row and column names of the n-th highest values in a 2D array in Excel.
My array has a header row (the coins) and a header column (the markets). The data itself shows whether a coin is supported on the market and, if so, what the approximate return on investment (ROI) will be in percent.
Example
An example of the array could look like this:
ROI      | Coin A | Coin B | Coin C
Market 1 | N/A    | 7.8%   | 5.7%
Market 2 | 0.4%   | 6.8%   | N/A
Market 3 | 0.45%  | 7.6%   | 12.3%
Note that some values are set to N/A (or is there a better way to show that a market doesn't support a specific coin? I don't want to enter 0%, as that makes it harder to spot whether a coin is supported by the market. I also don't want to leave the cell blank, because then I don't know whether I have already checked that market for that coin.)
Preferred output
The output for the example table from above with n=3 should then look like this (from high ROI to low):
Coin | Market | ROI
C    | 3      | 12.3%
B    | 1      | 7.8%
A    | 3      | 0.45%
Requirements
Each coin must only be shown once. So, for example, Coin B must not be listed twice in the Top3 output (once for Market 1: 7.8% and once for Market 3: 7.6%)
What I tried
I thought about how to split the problem into smaller parts. I think it comes down to these main parts:
1. Find the header/row name: here I found a way to find the column name of the highest value per row, but I wasn't able to adapt it into a working solution for a 2D array.
2. Find the max in a 2D array: here they describe how to find the max value in a 2D array, but not how to find the n-th highest values.
3. Find the n-th highest values: here is a good explanation of how to find the n highest values of a 1D array, but not how to apply it to a 2D array.
4. Only include each coin once.
I really tried to solve this myself, but I'm struggling to put these parts together.
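One way to combine the parts listed above, sketched in Python rather than as an Excel formula: keep each coin's best market first (which handles the "each coin only once" requirement), then sort by ROI and take the top n. The dictionary layout, the use of None for N/A cells and the name top_n_coins are assumptions made for the sketch.

```python
def top_n_coins(table, n):
    """table: {market: {coin: roi or None}}; None stands in for "N/A".

    For each coin keep only its best market (so a coin appears once),
    then sort coins by that ROI, highest first, and keep the first n.
    """
    best = {}  # coin -> (roi, market)
    for market, row in table.items():
        for coin, roi in row.items():
            if roi is None:
                continue  # unsupported market, like an N/A cell
            if coin not in best or roi > best[coin][0]:
                best[coin] = (roi, market)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(coin, market, roi) for coin, (roi, market) in ranked[:n]]


roi_table = {
    "Market 1": {"Coin A": None, "Coin B": 7.8, "Coin C": 5.7},
    "Market 2": {"Coin A": 0.4, "Coin B": 6.8, "Coin C": None},
    "Market 3": {"Coin A": 0.45, "Coin B": 7.6, "Coin C": 12.3},
}
for row in top_n_coins(roi_table, 3):
    print(row)
```

For the example table with n = 3, this gives Coin C / Market 3 / 12.3, Coin B / Market 1 / 7.8 and Coin A / Market 3 / 0.45, matching the preferred output.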

Calculating Total in excel with just % values

I have the following sales data:
I would like to calculate the "Total" row, but in my actual spreadsheet I do not have access to "LY", and it is not possible to add this column. Is there a way to calculate the total YoY value of -33% just by knowing the -50% and +50% YoY values for each product?
Many thanks, Alan.
You can calculate LY given the Amt and YoY.
LY = Amt/(1+YoY)
So, in your example, you can calculate your result (total Amt in B6, compared with the sum of the recovered LY values) with:
C6: =(B6-SUMPRODUCT(B4:B5,1/(1+C4:C5)))/SUMPRODUCT(B4:B5,1/(1+C4:C5))
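A small Python sketch of the same reasoning: recover each product's LY from Amt and YoY, then compare the totals. The amounts 50 and 30 below are hypothetical (the actual sales figures aren't shown here); they are simply chosen so that -50% and +50% combine to about -33%. The name total_yoy is illustrative.

```python
def total_yoy(amounts, yoys):
    """Total YoY change from this year's amounts and each product's YoY.

    LY per product = Amt / (1 + YoY); total YoY = (sum(Amt) - sum(LY)) / sum(LY).
    """
    last_year = sum(amt / (1 + yoy) for amt, yoy in zip(amounts, yoys))
    this_year = sum(amounts)
    return (this_year - last_year) / last_year


# Hypothetical figures: 50 at -50% (LY 100) and 30 at +50% (LY 20)
print(round(total_yoy([50, 30], [-0.5, 0.5]), 4))  # -0.3333
```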

excel calculate stdev given distribution

I have a table which describes "Products Per User" distribution.
It has 2 columns nProducts and nUsers:
First row has values nProducts = 1, nUsers = 60000, meaning 60000 users bought 1 product.
Second row has values nProducts = 2, nUsers = 20000, meaning 20000 users bought 2 products, and so on...
I want to calculate its STDEV. How do I do it in excel?
In addition, could you tell me how to calculate in Excel how many users bought more than thresh products (nProducts > thresh)?
Thanks
Li
Do you mean you want to calculate the standard deviation of the number of products bought? Try this "array formula"
=STDEV(IF(B2:B10>=TRANSPOSE(ROW(INDIRECT("1:"&MAX(B2:B10)))),A2:A10))
confirmed with CTRL+SHIFT+ENTER
So, in a small example, if you have 1 product bought by 4 people and 2 products bought by 3 people, that will give you the standard deviation of the following values:
1,1,1,1,2,2,2
Is that what you need?
Note that you can also use this "non array" version
=(SUMPRODUCT((A2:A10-SUMPRODUCT(A2:A10,B2:B10)/SUM(B2:B10))^2,B2:B10)/(SUM(B2:B10)-1))^(1/2)
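which computes the (sample) standard deviation directly: the square root of the frequency-weighted sum of squared differences between the values and their weighted average, divided by the total number of users minus 1 (the same result as STDEV above).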
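The weighted calculation can be checked with a Python sketch that works directly on the (nProducts, nUsers) table without expanding it; the function names are illustrative. The second function covers the nProducts > thresh part of the question; in Excel, =SUMIF(A2:A10,">"&E1,B2:B10) should do the same, with E1 being a hypothetical cell holding the threshold.

```python
import math


def stdev_from_distribution(values, counts):
    """Sample standard deviation of `values`, each occurring `counts` times.

    Same arithmetic as the non-array SUMPRODUCT formula (and as STDEV on
    the expanded list): weighted squared deviations from the weighted
    mean, divided by (total count - 1), then square-rooted.
    """
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    ss = sum(c * (v - mean) ** 2 for v, c in zip(values, counts))
    return math.sqrt(ss / (n - 1))


def users_above(values, counts, thresh):
    """Number of users who bought more than `thresh` products."""
    return sum(c for v, c in zip(values, counts) if v > thresh)


print(round(stdev_from_distribution([1, 2], [4, 3]), 4))  # 0.5345
print(users_above([1, 2], [60000, 20000], 1))             # 20000
```

For the small example above, the first function returns sqrt(2/7), the same as the sample standard deviation of 1,1,1,1,2,2,2.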
Just select all the second column values.
Then on the Ribbon bar:
go to Formulas
click More Functions
select Statistical
click on STDEV.P
At this point your second column should already be selected as the parameter; just click OK and you will have your standard deviation calculated.
If you'd rather type it manually, the formula is =STDEV.P(B2:B10)
I assumed your values are in rows 2 to 10 of column B.
Thanks,
Mucio
