How does Excel calculate values when you drag out a range? - excel

I have been trying to find an answer online but haven't been able to find one.
When given a range of values, selecting this range and dragging out the cells will generate more values. How are these values calculated? In certain cases it is easy to figure, like when all values are the same or when they are increasing by a steady interval, but how are values calculated when more random sequences of values are given?
For example, given the range
Val 1   Val 2   Val 3   Val 4   Val 5   Val 6
5       5       6       54      5       2
when selecting all values and dragging out to the right, I will end up with the following range:
Val 1   Val 2   Val 3   Val 4   Val 5   Val 6   Dragged out 1   Dragged out 2   Dragged out 3
5       5       6       54      5       2       16.133          17.976          18.019
How are the three dragged out values calculated?

This is done using linear regression, calculated by the least squares method, as explained in the Wikipedia article on that method.
As an illustration, I created an Excel sheet containing the numbers 1 to 6 alongside your values. I then added the numbers 7 to 9, applied the least squares method (which Excel supports), and plotted everything in a graph. Please note that in the attached graph the original values are shown but are drawn over by the estimated values (the yellow cells contain the formula of the cell to their left).
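The calculation can be reproduced outside Excel. The following is a minimal sketch, assuming (as the answer states) that the fill is an ordinary least-squares line fitted to the values at positions 1 to 6 and then evaluated at positions 7 to 9. The function names are made up for illustration.

```python
def linear_fit(ys):
    """Least-squares line through the points (1, ys[0]), (2, ys[1]), ..."""
    n = len(ys)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def extend_series(ys, extra):
    """Evaluate the fitted line at the next `extra` positions."""
    slope, intercept = linear_fit(ys)
    start = len(ys) + 1
    return [intercept + slope * x for x in range(start, start + extra)]


filled = extend_series([5, 5, 6, 54, 5, 2], 3)
print([round(v, 3) for v in filled])
```

For the sample values this gives 16.133, 17.076 and 18.019. The first and third match the question; the second differs from the 17.976 quoted there, which suggests that figure may be a typo, since a straight line cannot produce steps of 1.843 and 0.043.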

Related

Arranging a list of values randomly

I would like to arrange a list of values randomly with a formula automatically.
My data has repetitive values.
Sample data in columns A:E
Data
4 6 8 0 0
1 5 5 7 9
So when randomized, in columns I:M, it would look something like this:
Randomized Data
6 0 8 4 0
5 1 7 5 9
I tried something with RAND and RANDBETWEEN, and found a formula, but it doesn't work with repeated numbers. Any suggestions? Thanks.
You can use a combination of RAND, RANK and INDEX:
The strategy is to create a vector of random numbers, calculate their rank, and then use this rank to read off the columns on the data set.
In the above screenshot I used:
=RANK(A4,$A$4:$E$4)
in A5,
=INDEX($A$1:$E$2,1,A6)
in A8, and
=INDEX($A$1:$E$2,2,A6)
in A9.
It might be possible to get this down to a single array formula, though the result then wouldn't be very readable.
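The same strategy can be sketched in Python to show why repeated values are not a problem: the shuffle is driven by the ranks of the random keys, not by the data values. This is only an illustration of the idea; the function names are hypothetical, and like the formulas above it applies one column permutation to every row.

```python
import random


def rank_descending(values):
    """Mimic Excel's default RANK: 1 for the largest value, 2 for the next, ..."""
    return [1 + sum(other > v for other in values) for v in values]


def shuffle_columns(rows, rng=random):
    """Reorder the columns of `rows` using the ranks of random keys."""
    width = len(rows[0])
    keys = [rng.random() for _ in range(width)]  # like a row of =RAND()
    ranks = rank_descending(keys)                # like =RANK(A4,$A$4:$E$4)
    # like =INDEX($A$1:$E$2, row, rank): read column number `rank` of each row
    return [[row[r - 1] for r in ranks] for row in rows]


data = [[4, 6, 8, 0, 0], [1, 5, 5, 7, 9]]
shuffled = shuffle_columns(data, random.Random(0))
print(shuffled)
```

Ties among the random keys would give duplicate ranks, but with floating-point random numbers that is vanishingly unlikely.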

Does the order of data points influence the regression results in Excel

I tried to do a regression analysis with some 91 data points. When I first ran the regression, I got an R value of 0.366733. Later I sorted the data points from smallest to largest and ran the regression again. My new R value is 0.04323. Does the order in which the original data points are arranged influence the regression analysis?
The ordering of paired data points does not matter in regression.
For example:
5 9
6 1
3 7
9 5
6 4
This gives a correlation (which is the same as the standardized regression coefficient) of -0.37.
If I reorder the entire data based on column 1 values:
3 7
5 9
6 1
6 4
9 5
I get the same correlation of -0.37. Notice that the pairs are still aligned, i.e. both columns are being sorted together.
But in Excel it's very easy to get into a situation like the following, where you sort by only a single column. One column ends up ordered, but the pair alignment is broken because the second column doesn't change:
3 9
5 1
6 7
6 5
9 4
Now I get a correlation of -0.41. The pairs are no longer aligned, which effectively makes this a completely different dataset than before.
Bottom line: when you're sorting in Excel, make sure you've selected all of your data for the sort and not just a single column.
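The three cases can be checked with a few lines of Python. This is a small sketch using the five example pairs; `pearson` is a hand-written helper, not an Excel or library function.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


pairs = [(5, 9), (6, 1), (3, 7), (9, 5), (6, 4)]

# Original order, then both columns sorted together (pairs stay intact).
original = pearson(*zip(*pairs))
sorted_together = pearson(*zip(*sorted(pairs)))

# Only the first column sorted: the second column keeps its old order.
xs_sorted_alone = sorted(x for x, _ in pairs)
ys_unchanged = [y for _, y in pairs]
misaligned = pearson(xs_sorted_alone, ys_unchanged)

print(round(original, 2), round(sorted_together, 2), round(misaligned, 2))
# -0.37 -0.37 -0.41
```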

How to match two sets of data by dates which do not synchronise and include missing values in Excel

Please forgive any errors or shortcomings in this question, it's my first on stackoverflow.
I have two sets of data in Excel of differing lengths and frequency, and would like to be able to place a value of 0 for where they don't synchronise, and match the rest.
For example, dataset 1 could be:
Date Set1
01-01-2010 10
01-03-2010 4
01-04-2010 8
01-05-2010 5
01-06-2010 10
01-09-2010 12
01-10-2010 9
01-11-2010 4
And dataset 2 could be:
Date Set2
01-03-2010 102
01-06-2010 104
01-10-2010 102
I'm looking for an output table that displays the values alongside each other for dates matching, 0 otherwise, like so:
Date Set1 Set2
01-01-2010 10 0
01-03-2010 4 102
01-04-2010 8 0
01-05-2010 5 0
01-06-2010 10 104
01-09-2010 12 0
01-10-2010 9 102
01-11-2010 4 0
I can't seem to be able to crack this with my limited knowledge and the lack of synchronisation in the data. Any help would be much appreciated, thanks.
You can do this using a VLOOKUP nested in an IFERROR statement.
The two equations used (and dragged down to last unique date row) are:
H3: =IFERROR(VLOOKUP(G3,A:B,2,0),0)
I3: =IFERROR(VLOOKUP(G3,D:E,2,0),0)
This will not work if you have duplicate dates in the same data set with varying values since VLOOKUP will always return the first matched value (reading top down).
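For comparison, the lookup-with-default logic of IFERROR(VLOOKUP(...),0) can be sketched in Python with dictionaries. This is an illustrative analogue only; the variable and function names are invented, and the list of dates stands in for the column of unique dates the formulas are dragged down.

```python
set1 = {
    "01-01-2010": 10, "01-03-2010": 4, "01-04-2010": 8, "01-05-2010": 5,
    "01-06-2010": 10, "01-09-2010": 12, "01-10-2010": 9, "01-11-2010": 4,
}
set2 = {"01-03-2010": 102, "01-06-2010": 104, "01-10-2010": 102}


def lookup_or_zero(table, key):
    """Return the value for `key`, or 0 when the key is missing."""
    return table.get(key, 0)


# Unique dates from both sets, in first-seen order.
dates = list(dict.fromkeys(list(set1) + list(set2)))

merged = [(d, lookup_or_zero(set1, d), lookup_or_zero(set2, d)) for d in dates]
for row in merged:
    print(*row)
```

As with VLOOKUP, a dictionary holds one value per date, so duplicate dates with different values would need separate handling.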
Alternatively, place Set1 in A1:B9 (header in row 1). Add a column of zeros next to it in column C, so A2:A9 is dates, B2:B9 is values and C2:C9 is zeros.
Place Set2 (without the header) in A10:B12; move the Set2 data to column C and put zeros in column B, so A10:A12 is dates, B10:B12 is zeros, C10:C12 is values.
Sort the range A2:C12 by Date (column A).
Easier to show with a screenshot but newbies are not allowed to post images.

Histogram in Excel with bins

I have an excel spreadsheet with score and frequency of scores, as such:
Score Count
0 2297802
1 2392803
2 1258527
3 969550
4 818579
5 675646
6 591326
7 598960
8 506268
9 448232
10 414830
11 382808
...
I'm looking for a way to 'bucket' these scores in intervals of (say) 3 and plot them to show the distribution:
Score Count
0-2 5949132
3-5 2463775
...
And so on
I'm using Excel for Mac and I tried defining a 3 interval bin in the Analysis ToolPak but that appears to work only on raw data as opposed to the counts that I already have.
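In cells D2 downwards, enter your upper (inclusive) bin limits.
In cell E2, enter =SUMIF(A:A,"<="&D2,B:B)-SUM(E$1:E1)
Copy E2 downwards.
With the sample data and upper limits of 2 and 5, the first two bins come to 5949132 and 2463775, matching the totals in the question.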
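The formula works by taking the cumulative count up to each upper limit and subtracting the bins already filled in above it. A rough Python equivalent of that logic, using the twelve rows shown in the question (the function name is made up for illustration):

```python
def bin_counts(scores, counts, upper_limits):
    """Sum pre-aggregated counts into bins with inclusive upper limits."""
    totals = []
    for limit in upper_limits:
        # like SUMIF(A:A,"<="&D2,B:B): everything up to this limit
        cumulative = sum(c for s, c in zip(scores, counts) if s <= limit)
        # like -SUM(E$1:E1): remove what earlier bins already cover
        totals.append(cumulative - sum(totals))
    return totals


scores = list(range(12))
counts = [2297802, 2392803, 1258527, 969550, 818579, 675646,
          591326, 598960, 506268, 448232, 414830, 382808]

totals = bin_counts(scores, counts, upper_limits=[2, 5, 8, 11])
print(totals)
```

The upper limits must be listed in increasing order, just as they would be in column D.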

Excel: select a changing range from a column and copy it

I have a very long column (50000 rows) and I want to select ranges (samples) of cells from it in order to apply certain calculations. You don't have to worry about the calculation formula; I just need help with the sampling. Each sample contains as many rows as the window size, which is a number I can choose (4 in the example below). Each sample is shifted by one row from the previous one along the main column. The samples should go in separate columns, which is why I need to copy each selection into its own column. For illustration, assume the example below:
Let's assume the window size (number of rows) = 4
test
1
2
3
4
5
6
7
8
9
10
The expected output should be :
main col sample1 sample2 sample3 sample4 sample5 sample6
1 1 2 3 4 5 6
2 2 3 4 5 6 7
3 3 4 5 6 7 8
4 4 5 6 7 8 9
5
6
7
8
9
10
Each sample has 4 rows, and each new sample shifts by 1 along the main column. Note we get 6 samples to cover the whole number of rows in the main column. What basically should be done: sample1 will be row 1 to row 4 of the main column, sample2 will be 4 values from row 2 to row 5, sample3 will be 4 values from row 3 to row 6, and so on until we cover the whole range of the main column. So there are two main processes: selection, and copying the selection.
I have tried to use OFFSET and other logical functions, but it didn't work. I don't want to use macros or VBA. Are there any built-in functions that solve the problem?
This is basically a variation of a range transpose. Use the formula:
=INDEX($A:$A,COLUMN()+ROW()-2,1)
Then copy it down as many rows as the window size and across as many columns as you need samples. The -2 offset assumes a header in row 1, the main column in column A and the first sample in column B. Each column to the right automatically starts one row further down the main column (you are responsible for copying the formula to the right size).
Bonus, you can automate the column header "Sample N" with:
="Sample " & COLUMN()-1
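To see what the formula produces, here is a short Python sketch of the same sliding-window idea. The function names are hypothetical, and `window_cell` assumes the layout described for the formula: header in row 1, data from row 2 of column A, first sample in column B.

```python
def sliding_windows(column, size):
    """All consecutive windows of `size` rows, each shifted by one row."""
    return [column[start:start + size]
            for start in range(len(column) - size + 1)]


def window_cell(column, sheet_row, sheet_col):
    """Value that =INDEX($A:$A,COLUMN()+ROW()-2,1) would return in a cell.

    `column` holds the data from A2 downwards; rows and columns are
    1-based sheet coordinates.
    """
    source_row = sheet_col + sheet_row - 2  # row in column A being read
    return column[source_row - 2]           # A2 is column[0]


main = list(range(1, 11))
windows = sliding_windows(main, 4)
for number, window in enumerate(windows, start=1):
    print(f"Sample {number}:", window)
```

Note that 10 rows with a window of 4 give 10 - 4 + 1 = 7 full windows; the table in the question shows the first 6, and a seventh (rows 7 to 10) is needed to reach the last row.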
