Histogram in Excel with bins - excel

I have an excel spreadsheet with score and frequency of scores, as such:
Score Count
0 2297802
1 2392803
2 1258527
3 969550
4 818579
5 675646
6 591326
7 598960
8 506268
9 448232
10 414830
11 382808
...
I'm looking for a way to 'bucket' these scores in intervals of (say) 3 and plot them to show the distribution:
Score Count
0-2 5949132
3-5 2463775
...
And so on
I'm using Excel for Mac and I tried defining a 3 interval bin in the Analysis ToolPak but that appears to work only on raw data as opposed to the counts that I already have.

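In cells D2 downwards, enter your upper (inclusive) bin limits.
In cell E2, enter =SUMIF(A:A,"<="&D2,B:B)-SUM(E$1:E1)
Copy E2 downwards alongside the bin limits.
Example result: with bin limits 2 and 5 and the counts above, E2 and E3 return 5949132 and 2463775.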
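The same cumulative-sum-minus-previous-bins logic can be sketched outside Excel. This is only an illustration in Python, not part of the spreadsheet solution: the function name `bucket_counts` and the bin limits 8 and 11 are assumptions added here, while the counts are the ones from the question.

```python
# Score -> count pairs copied from the question.
counts = {
    0: 2297802, 1: 2392803, 2: 1258527,
    3: 969550, 4: 818579, 5: 675646,
    6: 591326, 7: 598960, 8: 506268,
    9: 448232, 10: 414830, 11: 382808,
}

def bucket_counts(counts, upper_limits):
    """Mirror =SUMIF(A:A,"<="&D2,B:B)-SUM(E$1:E1) for each upper limit."""
    totals = []
    for limit in upper_limits:
        # Cumulative count of all scores up to and including this limit.
        cumulative = sum(c for score, c in counts.items() if score <= limit)
        # Subtract what earlier bins already claimed.
        totals.append(cumulative - sum(totals))
    return totals

binned = bucket_counts(counts, [2, 5, 8, 11])
print(binned)  # [5949132, 2463775, 1696554, 1245870]
```

As in the formula, the limits must be listed in ascending order, because each bin subtracts the totals of the bins before it.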

Related

How does excel calculate values when you drag out a range?

I have been trying to find an answer online but haven't been able to find one.
When given a range of values, selecting this range and dragging out the cells will generate more values. How are these values calculated? In certain cases it is easy to figure, like when all values are the same or when they are increasing by a steady interval, but how are values calculated when more random sequences of values are given?
For example, given the range
Val 1 Val 2 Val 3 Val 4 Val 5 Val 6
5 5 6 54 5 2
when selecting all values and dragging out to the right, I end up with the following range:
Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Dragged out 1 Dragged out 2 Dragged out 3
5 5 6 54 5 2 16.133 17.076 18.019
How are the three dragged out values calculated?
This is done using linear regression, calculated by the least squares method (see the Wikipedia article on least squares).
As an illustration, I created an Excel sheet containing the numbers 1 to 6 alongside your values, then added the numbers 7 to 9, estimated their values with the least squares method (as supported by Excel), and plotted everything in a graph. Note that in the graph the original values are drawn but overlaid by the estimated values (in the sheet, the yellow cells show the formula of the cell to their left):
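The least squares fit can also be reproduced by hand. The following is a minimal Python sketch, assuming the six values sit at positions 1 to 6 and the dragged-out cells at positions 7 to 9; the function name `linear_fill` is made up for this illustration.

```python
def linear_fill(values, extra):
    """Fit y = intercept + slope * x by least squares and extrapolate."""
    n = len(values)
    xs = range(1, n + 1)  # positions 1..n of the selected cells
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line at positions n+1 .. n+extra.
    return [intercept + slope * (n + k) for k in range(1, extra + 1)]

predicted = linear_fill([5, 5, 6, 54, 5, 2], 3)
print([round(v, 3) for v in predicted])  # [16.133, 17.076, 18.019]
```

For this data the fitted line has a slope of about 0.943 and an intercept of about 9.533, so each dragged-out value is roughly 0.943 larger than the one before it.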

Feeding Multiple Distant Cells as One Array in Excel?

I have the following table in Excel.
A B C D E
1 i x y mu sigma
2 0 1 2 =average(b$1:b1,b3:b$12) =stdev.s(b$1:b1,b3:b$12)
3 1 3 4 =average(b$1:b2,b4:b$12) =stdev.s(b$1:b2,b4:b$12)
4 2 2 1 =average(b$1:b3,b5:b$12) =stdev.s(b$1:b3,b5:b$12)
5 3 1 2 ... ...
6 4 2 5
7 5 4 7
8 6 8 1
9 7 2 3
10 8 5 9
11 9 1 3
The ith mu calculates the average without the ith observation—the leave-one-out average. I can also calculate the leave-one-out standard deviations, but how can I do this for the leave-one-out correlations then? correl requires two arrays and I cannot feed two or more distant cells as one array using commas. Can I input non-consecutive cells as one array? For example, I tried =correl((b$1:b1,b3:b$12),(c$1:c1,c3:c$12)), but failed. Thanks for your reading.
Because CORREL takes arrays we can use an array formula:
=CORREL(INDEX(B:B,N(IF({1},MODE.MULT(IF(ROW($B$2:$B$11)<>ROW(),ROW($B$2:$B$11)*{1,1}))))),INDEX(C:C,N(IF({1},MODE.MULT(IF(ROW($C$2:$C$11)<>ROW(),ROW($C$2:$C$11)*{1,1}))))))
Alternatively, use this simpler array formula, which follows the OP's approach more directly (I overthought the first one):
=CORREL(IF(ROW($B$2:$B$11)=ROW(),"",$B$2:$B$11),IF(ROW($C$2:$C$11)=ROW(),"",$C$2:$C$11))
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
This passes two arrays containing only the numbers that are not on the row where the formula is placed.
In versions of Excel that support dynamic array formulas (Office 365), this can be simplified using FILTER():
=CORREL(FILTER($B$2:$B$11,ROW($B$2:$B$11)<>ROW()),FILTER($C$2:$C$11,ROW($C$2:$C$11)<>ROW()))
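For comparison, the leave-one-out idea is easy to state outside Excel: drop observation i from both series, then correlate what remains. This Python sketch is an illustration only; the helper names `pearson` and `leave_one_out_correlations` are assumptions, and the x and y values are the ones from the question's table.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def leave_one_out_correlations(xs, ys):
    """Correlation of xs and ys with observation i removed, for every i."""
    return [
        pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]

x = [1, 3, 2, 1, 2, 4, 8, 2, 5, 1]
y = [2, 4, 1, 2, 5, 7, 1, 3, 9, 3]
loo = leave_one_out_correlations(x, y)
for i, r in enumerate(loo):
    print(i, round(r, 4))
```

The list slicing plays the same role as the ROW()<>ROW() test in the formulas: it removes the current row from both arrays before they reach the correlation.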

Excel using SUMIF to calculate totals of multiple columns

I'm trying to use Excel's SUMIF to calculate totals of Col 1 to Col 5 for rows that share the same date.
My formula is =SUMIF($A2:$A7,A10,$B2:$F7), but this only gives me the total of a single column.
How can I get the totals of all the columns based on the date, as shown in my desired results?
Date Col 1 Col 2 Col 3 Col 4 Col 5
1/5/2017 1 2 2
1/5/2017 5 3 1
1/5/2017 9 5 5
2/5/2017 10 5 3
2/5/2017 20 10 3
2/5/2017 6 8 1 5
Desired Results
1/5/2017 15 7 7 3 1
2/5/2017 30 11 11 11 8
Use the formula below in cell B11, then fill it across and down:
=SUMIF($A$2:$A$7,$A11,B$2:B$7)
Per the example you provided, one solution is to use SUMPRODUCT, which Microsoft's documentation describes as follows:
Multiplies corresponding components in the given arrays, and returns the sum of those products
Microsoft Docs give a thorough example, but per SO etiquette, here is an example in case of link rot (FYI, I used absolute references for easier filling across; how you get it done is arbitrary, though).
The formula used:
=SUMPRODUCT(($B$3:$B$8=$B$11)*C3:C8)
This breaks down as follows: it searches the range $B$3:$B$8 for a match, which returns TRUE or FALSE for each row (equivalent to 1 or 0), and multiplies that by the number found in the column to the right (C3:C8). Each term is therefore either 1 * # = # or 0 * # = 0, and SUMPRODUCT adds up the terms.
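The mask-and-multiply behaviour can be sketched in Python, where a boolean also multiplies as 0 or 1. This is only an illustration: the function name `sumproduct_if` is made up, and the Col 1 values assume the blank cell in the last row counts as 0, which is consistent with the desired totals of 15 and 30.

```python
dates = ["1/5/2017", "1/5/2017", "1/5/2017",
         "2/5/2017", "2/5/2017", "2/5/2017"]
col1 = [1, 5, 9, 10, 20, 0]  # assumption: the blank Col 1 cell is 0

def sumproduct_if(keys, target, values):
    """Sum of (key == target) * value, like SUMPRODUCT((range=cell)*values)."""
    # (key == target) is True/False, which multiplies as 1/0.
    return sum((key == target) * value for key, value in zip(keys, values))

print(sumproduct_if(dates, "1/5/2017", col1))  # 15
print(sumproduct_if(dates, "2/5/2017", col1))  # 30
```

Running the same call once per column gives one total per column, which is what filling the formula across achieves in the sheet.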

Manipulating function sample sizes in Excel

Suppose I had two time series consisting of weekly data points, and I want to compute the covariance of the time series for the last n weeks using the covariance function in Excel.
Would it be possible to set this scenario up in such a way that a certain cell contains the number of weeks of data I want to compute the covariance for?
That is, changing the cell element to k would lead to the already computed covariance for n weeks to change to the covariance of the data series for the last k weeks?
You decided that sample data was not important so here is some.
date nmbr
03-30-2017 4
04-04-2017 4
04-07-2017 2
04-09-2017 2
04-12-2017 1
04-15-2017 4
04-18-2017 1
04-21-2017 2
04-24-2017 1
04-26-2017 3
04-30-2017 4
05-02-2017 5
05-07-2017 4
05-09-2017 2
05-10-2017 1
05-12-2017 5
05-14-2017 4
My crystal ball tells me that this question is not so much about Excel's COVARIANCE.P or COVARIANCE.S but about limiting date related data. To this end, I'll simply SUM 4 weeks of data.
The formulas needed in E2:H2 are, from left to right:
E2: =TODAY()
F2: 4
G2: =FLOOR(E2-(F2*7), 7)+1
H2: =SUM(INDEX(B:B, MATCH(G2, A:A)+ISNA(MATCH(G2, A:A, 0))):INDEX(B:B, MATCH(1E+99, A:A)))
Note that the dates are in ascending order.
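The idea of driving the window size from a single parameter can be sketched in Python. This is a simplified illustration, not a translation of the formulas: it uses a plain cutoff of weeks * 7 days before a reference date instead of the FLOOR-based week alignment, and the function name `sum_last_weeks` and the reference date of 05-14-2017 are assumptions. The data is the sample data above.

```python
from datetime import date, timedelta

# (date, nmbr) rows from the sample data, in ascending date order.
rows = [
    (date(2017, 3, 30), 4), (date(2017, 4, 4), 4), (date(2017, 4, 7), 2),
    (date(2017, 4, 9), 2), (date(2017, 4, 12), 1), (date(2017, 4, 15), 4),
    (date(2017, 4, 18), 1), (date(2017, 4, 21), 2), (date(2017, 4, 24), 1),
    (date(2017, 4, 26), 3), (date(2017, 4, 30), 4), (date(2017, 5, 2), 5),
    (date(2017, 5, 7), 4), (date(2017, 5, 9), 2), (date(2017, 5, 10), 1),
    (date(2017, 5, 12), 5), (date(2017, 5, 14), 4),
]

def sum_last_weeks(rows, weeks, reference):
    """Sum the values dated within `weeks` weeks up to `reference`."""
    cutoff = reference - timedelta(weeks=weeks)
    return sum(value for day, value in rows if cutoff <= day <= reference)

reference = date(2017, 5, 14)  # hypothetical stand-in for TODAY()
print(sum_last_weeks(rows, 4, reference))  # 32
print(sum_last_weeks(rows, 2, reference))  # 25
```

Changing the `weeks` argument plays the role of editing cell F2; swapping the sum for a covariance over two filtered series would follow the same pattern.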

Excel - allocate weight based on text

I have a risk control assessment where some controls are key and hold greater weight than non-key controls.
Key Value (1-4)
Y 4
Y 3
N 2
N 2
I want the keys with a "Y" to be summed at a weight of 70% and the non-keys with an "N" to be summed at a weight of 30%.
If we add the column we get 11. However, I want the 7 (4+3) to be multiplied by 70% and the 4 (2+2) be multiplied by 30%.
There may be 4 rows or 40. There generally are only 1 or 2 key controls ("Y"), but, if there are 40 rows or controls, there may be up to 5 "Y"s.
Any thoughts?
A simple way to do what I think you want would be to create a third column that had formulas like this one: =IF(A1="Y",B1*0.7,B1*0.3). Then, you could use the SUM function to add up all of the results. See the cells with formulas below.
Key Value Weighted Value
Y 1 =IF(A2="Y",B2*0.7,B2*0.3)
N 2 =IF(A3="Y",B3*0.7,B3*0.3)
N 3 =IF(A4="Y",B4*0.7,B4*0.3)
Y 4 =IF(A5="Y",B5*0.7,B5*0.3)
=SUM(C2:C5)
Here would be the result...
Key Value Weighted Value
Y 1 0.7
N 2 0.6
N 3 0.9
Y 4 2.8
5
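The same weighting can be checked in a few lines of Python. This sketch is an illustration under the assumption that the requirement is simply 70% of the "Y" total plus 30% of the "N" total; the function name `weighted_total` and the `key_weight` parameter are made up here.

```python
def weighted_total(rows, key_weight=0.7):
    """Weight the sum of "Y" rows by key_weight and the rest by 1 - key_weight."""
    key_sum = sum(value for flag, value in rows if flag == "Y")
    other_sum = sum(value for flag, value in rows if flag != "Y")
    return key_weight * key_sum + (1 - key_weight) * other_sum

# Data from the question: 7 * 0.7 + 4 * 0.3
question_rows = [("Y", 4), ("Y", 3), ("N", 2), ("N", 2)]
print(round(weighted_total(question_rows), 2))  # 6.1

# Data from the answer's table: 5 * 0.7 + 5 * 0.3
answer_rows = [("Y", 1), ("N", 2), ("N", 3), ("Y", 4)]
print(round(weighted_total(answer_rows), 2))  # 5.0
```

Weighting the two subtotals gives the same result as weighting each row and then summing, so it matches the helper-column approach regardless of how many rows are marked "Y".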
As you can see from @TimWilliams' comment, there is some uncertainty about your requirement, but if it is weighting factors you are after, then the following formula might suit:
=IF(A2="Y",C$1*SUM(B:B)*B2/SUMIF(A:A,"=Y",B:B)/SUM(B:B),(1-C$1)*SUM(B:B)*B2/SUMIF(A:A,"=n",B:B)/SUM(B:B))
copied down to suit and assuming a layout as shown:
