Does the order of data points influence regression results in Excel?

I tried to do a regression analysis with some 91 data points. When I first ran the regression, I got an R value of 0.366733. I then sorted the data points from smallest to largest and ran the regression again, and the new R value is 0.04323. Does the order in which the original data points are arranged influence the regression analysis?

The ordering of paired data points does not matter in regression.
For example:
5 9
6 1
3 7
9 5
6 4
Gives a correlation of -0.37 (with a single predictor, the correlation equals the standardized regression slope).
If I reorder the entire dataset by the values in column 1:
3 7
5 9
6 1
6 4
9 5
I get the same correlation of -0.37. Notice that the pairs are still aligned, i.e. both columns are sorted together.
But in Excel it's very easy to end up in a situation like the following, where you sort only a single column. That column becomes ordered, but the pairing is broken because the second column doesn't change:
3 9
5 1
6 7
6 5
9 4
Now I get a correlation of -0.41. The pairs are no longer aligned, which effectively makes this a completely different dataset.
Bottom line: when you're sorting in Excel, make sure you've selected all of your data for the sort, not just a single column.
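A minimal Python sketch (plain Pearson correlation computed by hand, no Excel involved) that reproduces the three numbers above: sorting whole rows leaves r unchanged, while sorting only the first column changes it. The helper name pearson is just for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


pairs = [(5, 9), (6, 1), (3, 7), (9, 5), (6, 4)]

# Original order.
r_original = pearson([x for x, _ in pairs], [y for _, y in pairs])

# Sort whole rows by column 1: each x keeps its own y.
rows_sorted = sorted(pairs, key=lambda p: p[0])
r_paired = pearson([x for x, _ in rows_sorted], [y for _, y in rows_sorted])

# Sort column 1 only: the y column keeps its original order, so the pairs break.
xs_only_sorted = sorted(x for x, _ in pairs)
ys_untouched = [y for _, y in pairs]
r_broken = pearson(xs_only_sorted, ys_untouched)

print(f"{r_original:.2f} {r_paired:.2f} {r_broken:.2f}")  # -0.37 -0.37 -0.41
```

The same check works for any dataset: if the correlation changes after a sort, the sort most likely moved one column without the other.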

Related

How does excel calculate values when you drag out a range?

I have been trying to find an answer online but haven't been able to find one.
When given a range of values, selecting this range and dragging out the cells generates more values. How are these values calculated? In some cases it is easy to figure out, like when all values are the same or when they increase by a steady interval, but how are the values calculated when a more random sequence is given?
For example, given the range

Val 1  Val 2  Val 3  Val 4  Val 5  Val 6
5      5      6      54     5      2

when selecting all values and dragging out to the right, I end up with the following range:
Val 1  Val 2  Val 3  Val 4  Val 5  Val 6  Dragged out 1  Dragged out 2  Dragged out 3
5      5      6      54     5      2      16.133         17.076         18.019
How are the three dragged out values calculated?
This is done using linear regression, calculated by the least squares method, as explained in this Wikipedia article.
As an illustration, I have created an Excel sheet containing the numbers 1 to 6 and added your values. I then added the numbers 7 to 9, applied the least squares method (as supported by Excel), and put everything in a graph. Note that in the attached graph the original values are shown but overwritten by the estimated values (the yellow cells contain the same formula as the cell to their left):
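The extrapolation can be checked outside Excel with a small least-squares line fit. This Python sketch assumes the six values sit at x = 1..6 (as in the sheet described above) and evaluates the fitted line at x = 7, 8, 9; inside Excel, the TREND function should give the same numbers. The helper name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


xs = [1, 2, 3, 4, 5, 6]      # positions of Val 1 .. Val 6
ys = [5, 5, 6, 54, 5, 2]     # the selected values
slope, intercept = least_squares_line(xs, ys)

# Values produced by dragging the selection three cells further.
dragged = [round(slope * x + intercept, 3) for x in (7, 8, 9)]
print(dragged)  # [16.133, 17.076, 18.019]
```

Because the fill follows a straight line, consecutive dragged values always differ by the same slope (about 0.943 here).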

Excel rotate radar chart

I have been trying to create a windrose that displays the occurrence of multiple wind speeds and their respective wind directions. Using other very helpful posts on here, I've gotten pretty close to what I want. There is just one thing I can't seem to fix.
As you can see in the figure below, the graph starts at 0 degrees, while I want the "North" wind direction to start at -11.25 (or +348.75) degrees.
Currently, the radial axis labels are added using a pie chart, while the rest of the data is plotted in a filled radar chart. It is easy to rotate the pie chart, but I can't seem to find a similar function for rotating the radar chart. Any help would be much appreciated. The Excel file is attached beneath the figure.
EDIT: Locked the Excel file against editing.
Excel file
I haven't fully digested the netiquette of this website, and I'm not sure whether it is a good idea to try giving you an answer 6+ months after you posted. I also hope that by now you have found an answer.
If not, this link should be of help:
https://superuser.com/questions/687036/how-to-make-a-pie-radar-chart
In the example, the creator made one field for each degree and started the first series at 0°, which would be equivalent to your North. However, nothing prevents you from starting at 348°.
I have not tested it, but I also think nothing prevents you from adding even more "resolution", e.g. half-degree steps, or finer still at your discretion.
EDIT: following L.Guthardt's feedback.
To provide you with an answer, I opted to simplify your table and chart, mostly for convenience but also because I struggle to fully understand the original "architecture". Still, the solution should work at any level and is based on two key elements:
First, you have to double the number of rows from 16 to 32, so that each direction appears twice (e.g. ... NNE, NNE, NE, NE ...).
Second, you have to start and finish with N, as shown here:
Direction Cat6
N 6
NNE 4 4
NNE 6
NE 4 4
NE 6
ENE 4 4
ENE 6
E 4 4
E 6
ESE 4 4
ESE 6
SE 4 4
SE 6
SSE 4 4
SSE 6
S 4 4
S 6
SSW 4 4
SSW 6
SW 4 4
SW 6
WSW 4 4
WSW 6
W 4 4
W 6
WNW 4 4
WNW 6
NW 4 4
NW 6
NNW 4 4
NNW 6
N 4 4
which will generate
For the pie chart, I used a separate range in which every other label is left blank:
Direction Dummy
N 1
1
NNE 1
1
NE 1
1
ENE 1
1
E 1
1
ESE 1
1
SE 1
1
SSE 1
1
S 1
1
SSW 1
1
SW 1
1
WSW 1
1
W 1
1
WNW 1
1
NW 1
1
NNW 1
1
Rotating radar charts in Excel can be achieved by building a separate table for plotting the chart. It would have three columns:
Column A: New categories
Column B: Original categories (calculated from A)
Column C: Original data using VLOOKUP() on B
The chart will be plotted using columns B and C. Column B category numbers are offset by the desired number of categories.
If the chart needs to be rotated by an angle that is not a multiple of the category angle (e.g., 30 degrees for 12 categories), you need to add rows in between, in proportion to the amount of rotation relative to the category angle. For example, to rotate a 12-category radar chart by multiples of 15 degrees, one extra row is needed between each pair of original category rows (creating 24 new categories). In this case, you calculate the intermediate values by linearly interpolating between the actual data points.
The trick is that blank category labels are not displayed on the chart, and the values for these categories blend in smoothly with the real data because they are interpolated.
I will post an example if the above is not clear enough.
P.S. I cannot look at your new Excel file (in Answers) because it exceeds 5 MB (see screenshot 1).
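A rough Python sketch of the rotation table described above, assuming the chart data is a list of category labels with one value each. The function rotate_radar_table and its parameters are hypothetical: shift_steps is the rotation counted in rows, and steps_per_category is how many rows each original category is split into (2 for the 15-degree example). Intermediate rows get a blank label and a linearly interpolated value, wrapping from the last category back to the first.

```python
def rotate_radar_table(labels, values, shift_steps, steps_per_category=1):
    """Build (label, value) rows for a rotated radar chart.

    Row i shows the original position (i + shift_steps) / steps_per_category,
    measured in categories. Rows between two original categories get a blank
    label and a value interpolated linearly between those two categories.
    """
    n = len(values)
    m = n * steps_per_category          # number of rows in the new table
    rows = []
    for i in range(m):
        sub = (i + shift_steps) % m     # position in sub-steps, wrapped around
        lo, frac = divmod(sub, steps_per_category)
        hi = (lo + 1) % n               # next category, wrapping to the first
        t = frac / steps_per_category   # 0 on a real category, else between 0 and 1
        value = values[lo] * (1 - t) + values[hi] * t
        label = labels[lo] if frac == 0 else ""
        rows.append((label, value))
    return rows


# Rotate a 4-category chart by one whole category.
print(rotate_radar_table(["N", "E", "S", "W"], [4, 2, 6, 8], shift_steps=1))
# Rotate by half a category: 8 rows, blank labels on the interpolated rows.
print(rotate_radar_table(["N", "E", "S", "W"], [4, 2, 6, 8], shift_steps=1, steps_per_category=2))
```

In spreadsheet terms, the loop index plays the role of column A, lo is the original category from column B, and value is the (possibly interpolated) lookup in column C.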
So I kept working on this problem, and the best solution I've come up with (using Microsoft Excel) looks as follows:
Currently, the number of sectors in the plot is fixed at 16. If I want to make this number variable, the table holding the plot data requires a very large number of lookup functions, which makes the spreadsheet too slow to work with.
I've uploaded the new Excel file here for you to take a look at:
Excel file

Manipulating function sample sizes in Excel

Suppose I had two time series consisting of weekly data points, and I want to compute the covariance of the time series for the last n weeks using the covariance function in Excel.
Would it be possible to set this scenario up in such a way that a certain cell contains the number of weeks of data I want to compute the covariance for?
That is, changing the value in that cell to k would change the already computed covariance for the last n weeks into the covariance of the data series for the last k weeks?
You decided that sample data was not important so here is some.
date nmbr
03-30-2017 4
04-04-2017 4
04-07-2017 2
04-09-2017 2
04-12-2017 1
04-15-2017 4
04-18-2017 1
04-21-2017 2
04-24-2017 1
04-26-2017 3
04-30-2017 4
05-02-2017 5
05-07-2017 4
05-09-2017 2
05-10-2017 1
05-12-2017 5
05-14-2017 4
My crystal ball tells me that this question is not so much about Excel's COVARIANCE.P or COVARIANCE.S but about limiting date related data. To this end, I'll simply SUM 4 weeks of data.
The formulas needed in E2:H2 (see supplied image) are:
E2: =TODAY()
F2: 4
G2: =FLOOR(E2-(F2*7), 7)+1
H2: =SUM(INDEX(B:B, MATCH(G2, A:A)+ISNA(MATCH(G2, A:A, 0))):INDEX(B:B, MATCH(1E+99, A:A)))
Note that the dates must be in ascending order, because the approximate MATCH lookups rely on sorted data.
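For checking the date logic outside Excel, here is a Python sketch that mirrors the formulas: convert to an Excel serial date, apply FLOOR(serial, 7)+1 to snap the window start to a Sunday, then sum the values dated on or after that start. It also includes a sample covariance (same definition as COVARIANCE.S) over the same window, since that is what the question asked for; feeding it two series is an assumption, because the sample data has only one column. Function names are hypothetical, and the serial conversion is only valid for dates after 28 February 1900.

```python
from datetime import date, timedelta

# Excel's serial day 0 for dates after 1900-02-28 (because of its 1900 leap-year bug).
EXCEL_EPOCH = date(1899, 12, 30)


def window_start(today, weeks):
    """Python version of =FLOOR(today - weeks*7, 7) + 1 on Excel serial dates."""
    serial = (today - EXCEL_EPOCH).days - weeks * 7
    # Serial 0 is a Saturday, so FLOOR(serial, 7) + 1 lands on a Sunday.
    # If serial is itself a Saturday, this moves forward one day, like the formula.
    start_serial = (serial // 7) * 7 + 1
    return EXCEL_EPOCH + timedelta(days=start_serial)


def window_sum(dates, values, start):
    """Sum of the values whose date is on or after start."""
    return sum(v for d, v in zip(dates, values) if d >= start)


def window_cov(dates, xs, ys, start):
    """Sample covariance (n - 1 denominator) of the pairs dated on or after start."""
    pairs = [(x, y) for d, x, y in zip(dates, xs, ys) if d >= start]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)


dates = [date(2017, m, d) for m, d in [
    (3, 30), (4, 4), (4, 7), (4, 9), (4, 12), (4, 15), (4, 18), (4, 21), (4, 24),
    (4, 26), (4, 30), (5, 2), (5, 7), (5, 9), (5, 10), (5, 12), (5, 14)]]
nmbr = [4, 4, 2, 2, 1, 4, 1, 2, 1, 3, 4, 5, 4, 2, 1, 5, 4]

start = window_start(date(2017, 5, 14), weeks=4)  # fixed "today" instead of TODAY()
print(start, window_sum(dates, nmbr, start))       # 2017-04-16 32
```

Changing weeks plays the same role as changing the value in F2.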

Weighted Trendline in Excel

This is an extension to the question asked in the forums a few years ago:
Excel produces scatter diagrams for sets of paired values. It also gives the option of producing a best-fit trendline and a formula for that trendline. It also produces bubble diagrams, which take into account a weight provided with each value. However, the weight has no influence on the trendline or its formula. Here is an example set of values, with their mappings and weights.
Value Map Weight
0 1 10
1 2 10
2 5 10
3 5 20
4 6 20
5 1 1
I have used the formula that brettDJ offered:
=INDEX(LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,TRUE,TRUE),3,1)
However, I could not understand why the weights are raised to the power 0.5 (i.e. square-rooted) in this formula.
The original question is here
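The square root comes from the weighted least-squares objective. Minimising Σ wᵢ(yᵢ - a - b·xᵢ)² is the same as minimising Σ(√wᵢ·yᵢ - a·√wᵢ - b·√wᵢ·xᵢ)², which is an ordinary least-squares problem on rescaled data. That is why y and both regressor columns (the column of 1s and the x column) are multiplied by √w. The √w column then acts as the intercept, so LINEST's own constant should normally be switched off (const = FALSE) for the coefficients to match the weighted fit. The Python sketch below, with hypothetical function names, fits the example data both ways and shows they agree.

```python
import math


def wls_direct(xs, ys, ws):
    """Weighted least-squares line via weighted means; returns (a, b) for y = a + b*x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def wls_via_sqrt(xs, ys, ws):
    """Ordinary least squares with no extra constant on sqrt(w)-scaled data."""
    u = [math.sqrt(w) for w in ws]                   # scaled column of 1s
    v = [math.sqrt(w) * x for w, x in zip(ws, xs)]   # scaled x column
    t = [math.sqrt(w) * y for w, y in zip(ws, ys)]   # scaled y
    # Normal equations [[uu, uv], [uv, vv]] @ [a, b] = [ut, vt], solved by Cramer's rule.
    uu = sum(p * p for p in u)
    uv = sum(p * q for p, q in zip(u, v))
    vv = sum(q * q for q in v)
    ut = sum(p * r for p, r in zip(u, t))
    vt = sum(q * r for q, r in zip(v, t))
    det = uu * vv - uv * uv
    return (ut * vv - uv * vt) / det, (uu * vt - uv * ut) / det


values = [0, 1, 2, 3, 4, 5]
mapped = [1, 2, 5, 5, 6, 1]
weights = [10, 10, 10, 20, 20, 1]
print(wls_direct(values, mapped, weights))    # both give a ≈ 1.4412, b ≈ 1.1353
print(wls_via_sqrt(values, mapped, weights))
```

Scaling by w instead of √w would effectively weight each squared residual by w², which is why the square root is used.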

Ignore #N/As in Excel LINEST function with multiple independent variables (known_x's)

I am trying to find the equation of a plane of best fit to a set of x,y,z data using the LINEST function. Some of the z data is missing, meaning that there are #N/As in the z column. For example:
A B C
(x) (y) (z)
1 1 1 5.1
2 2 1 5.4
3 3 1 5.7
4 1 2 #N/A
5 2 2 5.2
6 3 2 5.5
7 1 3 4.7
8 2 3 5
9 3 3 5.3
I would like to do =LINEST(C1:C9,A1:B9), but the #N/A causes this to return a value error.
I found a solution for a single independent variable (one column of known_x's, i.e. fitting a line to x,y data), but I have not been able to extend it for two independent variables (two known_x's columns, i.e. fitting a plane to x,y,z data). The solution I found is here: http://www.excelforum.com/excel-general/647448-linest-question.html, and the formula (slightly modified for my application) is:
=LINEST(
N(OFFSET(C1:C9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
N(OFFSET(A1:A9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
)
which is equivalent to =LINEST(C1:C9,A1:A9), ignoring the row containing the #N/A.
The formula from the posted link could probably be adapted, but it is unwieldy. Least squares with missing data can be viewed as a regression with weight 1 for numeric values and weight 0 for non-numeric values. Based on this observation, you could try this (entered with Ctrl+Shift+Enter in a 1x3 range):
=LINEST(IF(ISNUMBER(C1:C9),C1:C9,),IF(ISNUMBER(C1:C9),CHOOSE({1,2,3},1,A1:A9,B1:B9),),)
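This returns {-0.2, 0.3, 5}. Because LINEST lists the coefficients in reverse order of the known_x's columns, the equation of the plane is z=0.3x-0.2y+5 (x from column A, y from column B), which can be checked against the results of using LINEST(C1:C8,A1:B8) with the error row removed.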
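The weight-0 idea can be checked outside Excel. This Python sketch (hypothetical helper names; plain normal equations solved by Gaussian elimination) zeroes both the z value and the whole design row (1, x, y) wherever z is missing, as the IF(ISNUMBER(...)) wrappers do, and compares the result with a fit on only the rows that have data. Unlike LINEST, it returns the coefficients in natural order: constant, x coefficient, y coefficient.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_plane(xs, ys, zs):
    """Least-squares z = c + bx*x + by*y; rows where z is None (#N/A) get weight 0."""
    design, target = [], []
    for x, y, z in zip(xs, ys, zs):
        if z is None:
            design.append([0.0, 0.0, 0.0])  # IF(ISNUMBER(z), {1, x, y}, 0)
            target.append(0.0)              # IF(ISNUMBER(z), z, 0)
        else:
            design.append([1.0, x, y])
            target.append(z)
    # Normal equations (X^T X) beta = X^T z; zeroed rows add nothing to either side.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xtz = [sum(row[i] * t for row, t in zip(design, target)) for i in range(3)]
    return solve(xtx, xtz)


xs = [1, 2, 3, 1, 2, 3, 1, 2, 3]                      # column A
ys = [1, 1, 1, 2, 2, 2, 3, 3, 3]                      # column B
zs = [5.1, 5.4, 5.7, None, 5.2, 5.5, 4.7, 5.0, 5.3]   # column C, None = #N/A

with_weights = fit_plane(xs, ys, zs)
kept = [(x, y, z) for x, y, z in zip(xs, ys, zs) if z is not None]
dropped = fit_plane(*zip(*kept))
print([round(c, 6) for c in with_weights])  # [5.0, 0.3, -0.2]
print([round(c, 6) for c in dropped])       # [5.0, 0.3, -0.2]
```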
