How to calculate average of a column of numbers linked to each frequency bin making up a histogram, Excel 2010? - excel

I have three columns. Column A consists of numbers, column B consists of bin ranges, and column C consists of number data relevant to the individual data in column A.
Using columns A and B, I created a frequency histogram where all the data in column A have been grouped into the bins of column B. I would like to calculate the average value of each bin using the data from column C (i.e., calculate a mean value for each bin using data from column C that is associated to each value (from column A) that made up each bin).
Can anybody help?
Thanks for the replies. Here is an example of the data (Unfortunately I can not paste in images):
Below are three columns with headers Jar Type (in volume (ml)), Cookies (he number of chocolate chip cookies in the jar), and Interval for bins (bins to count the jar types):
Jar type-cookies-intervals for bins
500 3 100
500 1 150
500 0.5 200
250 3 250
150 1 300
500 1 350
150 2 400
250 2 450
### # 500
Making a histogram of the frequency of jar types gives this grouping:
Bin-Frequency
100 0
150 2
200 0
250 2
300 0
350 0
400 0
450 0
500 4
More 0
Now what I am trying to do is to find out what is the mean number of cookies that can be found in each type of jar. For example, for the 500ml we know that there are 4x500ml jars, and that in each of the 500ml we have 3+1+0.5+1 = 5.5 cookies in total. the mean would be 1.735 cookies.
My issue is that I have 5000+ numbers that separate into 100 bins.

The question calls for a "wandering trace" of a scatterplot: the values of column A (plot them on the horizontal axis) are placed into bins, which therefore comprise vertical strips in the scatterplot. The values of column C (plotted on the vertical axis) are averaged within each strip. This technique smooths out and summarizes apparent trends in the scatterplot.
In this example with 100 records the original data are in black and computed values are in green. Here is the wandering trace of means:
The open circles plot column C (associated values) against column A (data) while the solid squares, connected with a dashed red trace, plot the bin means (column G) against the midpoints (column F).
Any statistical package will provide functions for grouping data and performing operations on those groups. Excel does this to a limited extent with its SUMIF and COUNTIF functions. To use them, create a column (D in the spreadsheet) showing the grouping factor. (That's a simple lookup in the sorted BINS vector using the VLOOKUP function with its "range" option set to true.) SUMIF computes sums by group factor and COUNTIF counts by group factor. Their ratios are the bin means.
Here is what the formulas look like:
Only three formulas were actually entered and then copied down as needed:
=VLOOKUP(A2, Bins, 1, TRUE) computes the group for the value in cell A2. Bins a name for the array $(-2,-3, \ldots, 3)$ in column B.
=AVERAGE(B3:B4) computes the midpoint of the first bin. This was used as a horizontal plotting position in the scatterplot.
=SUMIF(Bin,"="&B3,NewValues)/COUNTIF(Bin, "="&B3) is where all the work is done. Bin refers to the group codes in column D and NewValues refers to the associated values in column C. The tricky parts are the constructs "="&B3: these form a text value instructing the data to be grouped by comparison to the number in cell B3, which is the first endpoint. Because this is a formula, copying it down automatically updates the B3 to B4, then B5, etc.

Related

Excel Array formula to count moving average outliers

I've tried a few things on this and settled on a 'cheap' solution. Wanted to know if this can be done directly and more elegantly.
Problem Statement and Sample Data
Assume we have a table in excel with ~200 columns and a large number of rows (~10k).
Sample Data:
identifier
val1
val2
val3
...
val200
ID_1
100
102
34
...
89
We want to add a column at the end that shows us how many "moving average" outliers exist. A moving average outlier is defined as a point that is outside the range (mean - 2 * std deviations, mean + 2 * std deviations), where the mean and std dev is calculated using the previous 10 values (therefore its a moving average outlier).
We will not test the first 10 values. But from val11, the previous 10 values will be used to form the window and we want to test if the value is an outlier.
My Solution so far
I created another table of same dimensions as the original. In cells from val11 (to val200, for all columns), I put in the formula below in the new table. And then, I can simply sum the columns in each row in the new table.
Assume val11 is on X2 in the "shocks" worksheet (for first row):
=IF(OR(shocks!X2<AVERAGEA(shocks!D2:W2)-2STDEVA(shocks!D2:W2),shocks!X2>AVERAGEA(shocks!D2:W2)+2STDEVA(shocks!D2:W2)),1,"")
But if possible, I want to avoid having a second table since it bloats and slows down the file. Any help would be greaty appreciated

Google Sheets Arithmetic Search

I have two Google sheets tabs:
I.)
--A-- --B--
--1-- type lessThan10Apart
--2-- Car 1
--3-- Plane 0
II.)
--A-- --B-- --C--
--1-- type sourceA sourceB
--2-- Car 1 100
--3-- Plane 10 100
--4-- Car 2 4
My question is how to create the lessThan10Apart formula above. lessThan10Apart should match up the type from sheet I to sheet II and only count the rows that: Are less than 10 units between A and B. But you can also imagine wanting to do any kind of arithmetic between columns B and C and running a COUNT.
My first attempt is something along the lines of:
=COUNTIFS('sheetII'!A:A),$A2, //Match column A
ABS('sheetII'!C:C-'sheetII'!B:B)<10 //Doesn't work!
)
The problem is that you can't seem to be able to do range calculations like this in COUNTIFS.
For the count (per F4 in supplied image),
=SUMPRODUCT(--(ABS(B2:B4-C2:C4)<10))
For the validSum (sum of absolute difference between B & C; per G4 in supplied image),
=SUMPRODUCT(--(ABS(B2:B4-C2:C4)<10), ABS(B2:B4-C2:C4))
Do not use full column references. Minimize your referenced ranges.
Discard the Car text in E4 in the above image.

Excel IF OR Statement

I am having trouble determining the correct way to calculate a final rank order for four categories. Each of the four metrics make up a higher group. A Top 10 of each category is applied to the respective product to risk analysis.
CURRENT LOGIC - Assignment of 25% max per category.
Columns - Y4
Parts
0.25
25
=IF(L9=1,$Y$4,IF(L9=2,$Y$4*0.9, IF(L9=3,$Y$4*0.8, IF(L9=4,$Y$4*0.7, IF(L9=5,$Y$4*0.6, IF(L9=6,$Y$4*0.5, IF(L9=7,$Y$4*0.4, IF(L9=8,$Y$4*0.3, IF(L9=9,$Y$4*0.2, IF(L9=10,$Y$4*0.1,0))))))))))
DESIRED...
I would like to use a statement to determine three criteria in order to apply a score (1=100, 2=90, 3=80, etc..).
SUM the rank positions of each of the four categories-apply product rank ascending (not including NULL since it's not in the Top 10)
IF a product is identified in more than one metric-apply a significant contribution weight of (*.75),
IF a product has the number 1 rank in any of the four metrics-apply a score of (100).
Data - UPDATED EXAMPLE
(Product) Parts Labor Overhead External Final Score
"XYZ" 3 1 7 7 100
"ABC" NULL 6 NULL 2 100
"LMN" 4 NULL NULL NULL 70
This is way beyond my capability. ANY assistance is appreciated greatly!!!
Jim
I figured this is a good start and I can alter the weight as needed to reflect the reality of the situation.
=AVERAGE(G28:I28)+SUM(G28:I28)*0.25
However, I couldn't figure out how to put a cap on the score of no more than 100 points.
I am still unclear of what exactly you are attempting and if this will work, but how about this simple matrix using an array formula and some conditional formatting.
Array Formula in F2 (make sure to press Ctrl+Shift+Enter when exiting formula edit mode)
=MIN(100,SUM(IF(B2:E2<>"NULL",CHOOSE(B2:E2,100,90,80,70,60,50,40,30,20,10))))
Conditional Formatting defined as shown below.
Red = 100 value where it comes from a 1
Yellow = 100 value where it comes from more than 1 factor, but without a 1.

Excel - Multiply Until Total Reached

I want to multiply x*y until x>=20, then multiply z that value and have the results displayed as two values, the multiple and multiple*z
The question behind the formula is, how many boxes of x capacity do I need to have a total capacity of 20 liters and how much does that cost.
x = volume of bottle
y = number of bottles in a box
z = price per box
This could be done very easily by hand, but I've been playing (with little effect) in excel for a while and would like a solution.
I hope that makes sense
I rather think what you would like is the formula provided by #Jeeped but for:
I want to multiply x*y until x>=20, then multiply z that value and have the results displayed as two values, the multiple and multiple*z
label two arrays from 1 to 20 for columns and rows as shown, populate V1 with the price per box and in B2:
=IF(AND($A2*B$1>20,A2>=20),"",$A2*B$1)
and in X2:
=IF(B2="","",$V$1*B2)
with both formulae copied across 19 columns and those two sets of 20 formulae then copied down 19 rows. The result should be similar to:

Find the top n values in a range while keeping the sum of values in another range under x value

I'd like to accomplish the following task. There are three columns of data. Column A represents price, where the sum needs to be kept under $100,000. Column B represents a value. Column C represents a name tied to columns A & B.
Out of >100 rows of data, I need to find the highest 8 values in column B while keeping the sum of the prices in column A under $100,000. And then return the 8 names from column C.
Can this be accomplished?
EDIT:
I attempted the Solver solution w/ no luck. 200 rows looks to be the max w/ Solver, and that is what I'm using now. Here are the steps I've taken:
Create a column called rank RANK(B2,$B$2:$B$200) (used column D -- what is the purpose of this?)
Create a column called flag just put in zeroes (used column E)
Create 3 total cells total_price (=SUM(A2:A200)), total_value (=SUM(B2:B200)) and total_flag (=(E2:E200))
Use solver to minimize total_value (shouldn't this be maximize??)
Add constraints -Total_price<=100000 -Total_flag=8 -Flag cells are binary
Using Simplex LP, it simply changes the flags for the first 8 values. However, the total price for the first 8 values is >$100,000 ($140k). I've tried changing some options in the Solver Parameters as well as using different solving methods to no avail. I'd like to post an image of the parameter settings, but don't have enough "reputation".
EDIT #2:
The first 5 rows looks like this, price goes down to ~$6k at the bottom of the table.
Price Value Name Rank Flag
$22,538 42.81905675 Blow, Joe 1 0
$22,427 37.36240932 Doe, Jane 2 0
$17,158 34.12127693 Hall, Cliff 3 0
$16,625 33.97654031 Povich, John 4 0
$15,631 33.58212402 Cow, Holy 5 0
I'll give you the solver solution as a starting point. It involves the creation of some extra columns and total cells. Note solver is limited in the amount of cells it can handle but will work with 100 anyway.
Create a column called rank RANK(B2,$B$2:$B$100)
Create a column called flag just put in zeroes
Create 3 total cells total_price, total_value and total_flag
Use solver to minimize total_value
Add constraints
-Total_price<=100000
-Total_flag=8
-Flag cells are binary
This will flag the rows you want and you can grab the names however you want.

Resources