Excel LINEST formula for weighted polynomial fit

How do I specify a weighted polynomial fit with Excel's LINEST? I have something like
LINEST(y*w^0.5,IF({1,0},1,x)*w^0.5,FALSE,TRUE), but that is for a linear fit. I'm looking for a similar formula for 2nd-order and 3rd-order polynomial regression.

A reply to the related post "Weighted trendline" already suggests an approach for weighted polynomials. For example, for a cubic fit, enter the following with CTRL+SHIFT+ENTER in a 4x1 range (four cells in one row):
=LINEST(y*w^0.5,(x-1E-99)^{0,1,2,3}*w^0.5,FALSE)
(Subtracting 1E-99 avoids the error Excel returns for 0^0, so the constant column evaluates to 1 even where x = 0.) As in the linear case, for R^2 try:
=INDEX(LINEST((y-SUMPRODUCT(y,w)/SUM(w))*w^0.5,(x-1E-99)^{0,1,2,3}*w^0.5,FALSE,TRUE),3,1)
Derivation
In standard least squares we find the vector b that minimises |y-Xb|² = (y-Xb)'(y-Xb).
In the weighted case b is instead chosen to minimise |W(y-Xb)|² = (y-Xb)'W'W(y-Xb).
So the weighted regression is a regression of Wy on WX, where W'W = W² is the diagonal matrix of the weights, i.e. W is diagonal with entries w^0.5.
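The derivation can be checked outside Excel. Below is a minimal Python sketch, not the answerer's method: it scales each row of X and y by the square root of its weight and solves the normal equations with a small hand-written Gaussian elimination. The names `solve` and `weighted_polyfit` and the sample numbers are made up for illustration.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def weighted_polyfit(x, y, w, degree):
    """Return [b0, b1, ..., b_degree] minimising sum(w_i * (y_i - p(x_i))**2)."""
    # Regress Wy on WX: every row is multiplied by sqrt(w_i).
    rows = [[math.sqrt(wi) * xi ** k for k in range(degree + 1)]
            for xi, wi in zip(x, w)]
    rhs = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical sample data: a noisy cubic with unequal weights.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1.1, 0.4, -2.9, -5.2, -3.1, 8.7, 33.2]
ws = [1, 1, 2, 2, 4, 4, 1]
print(weighted_polyfit(xs, ys, ws, 3))
```

The coefficients come back in ascending powers (constant first), which is the reverse of the order LINEST uses for its x columns. Python evaluates `0 ** 0` as 1, so no `1E-99` offset is needed here.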

Related

How to compute linear regression using multivariate least squares method without using scikit-learn library?

My question is about classifying the iris dataset using multivariate linear regression without the scikit-learn library.
I have this formula, which is needed to find the beta values for the dataset:
β̂ = (X′X)⁻¹X′Y
This is the dataset in question: http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
How do I compute the linear regression using this formula? I understand that linear regression is
Yi = β0 + β1X1i + ... + βkXki + ϵi
I have computed the beta values with the above formula using matrix multiplication. How do I find the linear regression equation now? I took the first 4 columns as the A matrix and the label column as the Y matrix, with the values 1, 2, 3 for the three classes.
How do I compute the ϵi values? Do I assume them to be zero? Any help is appreciated. Thanks in advance.
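For illustration, here is a minimal Python sketch of the normal-equation formula quoted in the question, using only the standard library. The feature rows and labels are made-up stand-ins, not the iris data, and `solve`, `fit_ols` and `predict` are hypothetical helper names. In this formulation the ϵi are not assumed to be zero: they are estimated after the fit as the residuals, observed value minus fitted value.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(features, y):
    """Return [beta0, beta1, ..., betak] from (X'X) beta = X'y."""
    X = [[1.0] + list(row) for row in features]  # leading 1 gives the intercept beta0
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, row):
    """Evaluate beta0 + beta1*x1 + ... + betak*xk for one feature row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Made-up feature rows and numeric class labels, for illustration only.
features = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1, 1, 2, 2, 3]
beta = fit_ols(features, y)
residuals = [yi - predict(beta, row) for row, yi in zip(features, y)]
print(beta)
print(residuals)
```

The regression equation is then just `predict` with the fitted `beta`. One possible way to turn the fitted value into a class label, assuming labels 1, 2, 3, is to round it to the nearest label.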

Confusion matrix 5x5 formula for finding accuracy, precision, recall, and f1-score

I'm trying to learn about confusion matrices. I understand the 2x2 confusion matrix, but I still don't understand how to use a 5x5 confusion matrix to find accuracy, precision, recall, and F1 score. Can anyone help me with this? I appreciate any help.
See my answer here: Calculating Equal error rate(EER) for a multi class classification problem
In short, one strategy is to split the multiclass problem into a set of binary classifications: for each class, a "one vs. all others" classification. Then for each binary problem you can calculate F1, precision and recall, and if you want you can average (uniformly or weighted) the scores of each class to get one F1 score which will represent the multiclass problem.
As for confusion matrices larger than 2x2: the rows are the true labels and the columns are the predicted labels. The number in cell (i,j) is then the number of samples from class i which were classified as class j (note that i=j corresponds to a correct prediction). The accuracy is the trace of the confusion matrix divided by the number of samples.
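The one-vs-all strategy described above can be sketched in a few lines of Python. This is a minimal illustration assuming the row = true label, column = predicted label convention; the function name `confusion_metrics` and the 5x5 counts are made up.

```python
def confusion_metrics(cm):
    """Return (accuracy, per-class [(precision, recall, f1)], macro-averaged F1)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total  # trace / number of samples
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k: class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k: wrongly called k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    macro_f1 = sum(m[2] for m in per_class) / n     # uniform average over classes
    return accuracy, per_class, macro_f1

# Hypothetical 5x5 confusion matrix (rows: true class, columns: predicted class).
cm = [
    [12, 1, 0, 0, 2],
    [2, 9, 3, 0, 0],
    [0, 2, 14, 1, 0],
    [1, 0, 2, 10, 1],
    [0, 0, 0, 3, 11],
]
accuracy, per_class, macro_f1 = confusion_metrics(cm)
print(accuracy, macro_f1)
for k, (p, r, f) in enumerate(per_class):
    print(k, p, r, f)
```

A weighted average would replace the uniform mean with one weighted by each class's row total (its number of true samples).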

Find the best fitting distribution using excel

I estimated the theta of an exponential distribution and the theta and tau of a Weibull distribution. I want to compare the two distributions to see which one fits my data best. How can I do that in Excel? Can I find the R-squared value in Excel?
You can simply use the CORREL(x range, y range) function and square the result.
For example:
=CORREL(A1:A10,B1:B10)^2
For more information go to https://www.youtube.com/watch?v=RYcYyxoKq0U
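For comparison, the same quantity outside Excel: a minimal Python sketch of the squared Pearson correlation, which is what =CORREL(...)^2 computes. It assumes the two ranges hold paired values, for example observed values in one column and the values predicted by a fitted distribution in the other. That pairing is an assumption, and the function name `r_squared` and the numbers are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Hypothetical observed vs. model-predicted values.
observed = [0.10, 0.25, 0.40, 0.62, 0.81, 0.95]
predicted = [0.12, 0.22, 0.43, 0.60, 0.78, 0.97]
print(r_squared(observed, predicted))
```

Under that pairing, computing the value once per candidate distribution and comparing the two results is one way to rank the fits.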

Quadratic and cubic regression in Excel using LINEST

I am trying to use LINEST in Excel 2013 to get the coefficients for a cubic function, but LINEST does not work well with non-linear functions according to MS KB828533. Apparently this is because of the way collinearity is handled. A similar question, "Quadratic and cubic regression in Excel", has been asked, but it does not address the problem.
Excel's built-in Column Chart | Trendline (3rd degree poly) produces correct coefficients. However, LINEST and Data Analysis | Regression both give wrong coefficients.
EDIT: Excel's built-in Column Chart does NOT produce correct coefficients for polynomials. Only use a Column Chart trendline for linear data! (Please see the answer.)
This is my data:
x y
2006 7798
2007 8027
2008 9526
2009 11661
2010 16014
2011 18731
2012 23405
2013 25294
2014 28578
I can only get the third coefficient (here x3) using this:
={LINEST(y;(x-AVERAGE(x))^{1,2,3})}
Results:
Coef     Chart       LINEST
x3       -62.295     -62.295
x2       1098.254    163.834
x1       -2746.214   3564.226
intcpt   9528.659    15467.104
         CORRECT     x3 correct, rest WRONG
I have also tried a more complex LINEST like this:
={MMULT(LINEST(y;(x-AVERAGE(x))^{1,2,3});
IFERROR(COMBIN({3;2;1;0};{3,2,1,0})*(-AVERAGE(x))^({3;2;1;0}-{3,2,1,0});0))}
But in similar fashion only x3 is correct and the rest is wrong.
Any help is appreciated.
Problem solved. It turns out that using anything other than an XY Scatter Plot to calculate regression coefficients for polynomials (or a trendline) will produce wrong coefficients.
In conclusion, do not use Line, Bar or Column charts to calculate regression coefficients for polynomials. The following image shows the difference in the calculated coefficients: the top figure uses an XY Scatter Plot and produces correct coefficients, while the bottom figure was created with a Column chart. Both figures use the same data.
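One way to read the two coefficient columns in the question's table is sketched below in Python. It rests on an assumption not stated in the thread: that a Column chart fits its trendline against the category positions 1 to 9 rather than the year values. The sketch fits the question's data twice, once against those positions and once against x-AVERAAGE-style centred years (x minus the mean, 2010), using a plain normal-equation solver. The helper names `solve` and `polyfit` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def polyfit(x, y, degree):
    """Ordinary least-squares polynomial fit; returns [c0, c1, ..., c_degree]."""
    rows = [[float(xi) ** k for k in range(degree + 1)] for xi in x]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

years = [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
y = [7798, 8027, 9526, 11661, 16014, 18731, 23405, 25294, 28578]

# Assumption: a Column chart treats x as category positions 1, 2, ..., 9.
positions = list(range(1, len(years) + 1))
chart_like = polyfit(positions, y, 3)

# The question's LINEST call uses x - AVERAGE(x), i.e. the year minus 2010.
mean_year = sum(years) / len(years)
centred = [yr - mean_year for yr in years]
linest_like = polyfit(centred, y, 3)

print("positions 1..9 :", chart_like)
print("centred years  :", linest_like)
```

With the question's data, the two results should agree with the Chart and LINEST columns of the table respectively. Both describe the same cubic curve, written in two different x variables, which is why only the leading coefficient coincides. Neither is expressed in raw year values.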

How do I calculate the standard deviation between weighted measurements?

I have several weighted values for which I am taking a weighted average. I want to calculate a weighted standard deviation using the weighted values and weighted average. How would I modify the typical standard deviation to include weights on each measurement?
This is the standard deviation formula I am using.
When I simply use each weighted value for 'x' and the weighted average for '\bar{x}', the result seems smaller than it should be.
I just found this Wikipedia page discussing data of equal significance vs. weighted data. The correct way to calculate the biased weighted estimator of variance is
$$\hat{\sigma}_w^2 = \frac{\sum_i w_i (x_i - \mu^*)^2}{\sum_i w_i},$$
though the following on-the-fly implementation is computationally more efficient, as it does not require calculating the weighted average before looping over the sum of the weighted squared differences:
$$\hat{\sigma}_w^2 = \frac{\sum_i w_i x_i^2}{\sum_i w_i} - (\mu^*)^2.$$
Despite my skepticism, I tried both and got the exact same results.
Note: be sure to use the weighted average
$$\mu^* = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$
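A minimal Python sketch of the two computations, assuming the usual definitions: the weighted mean is the sum of w·x divided by the sum of w, and the biased weighted variance is the weighted mean of the squared deviations from it. The function names and sample numbers are made up for illustration.

```python
import math

def weighted_mean(x, w):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_var_two_pass(x, w):
    """Biased weighted variance: weighted mean of squared deviations."""
    mu = weighted_mean(x, w)
    return sum(wi * (xi - mu) ** 2 for xi, wi in zip(x, w)) / sum(w)

def weighted_var_one_pass(x, w):
    """Same quantity from running sums, without computing the mean first."""
    sw = swx = swxx = 0.0
    for xi, wi in zip(x, w):
        sw += wi
        swx += wi * xi
        swxx += wi * xi * xi
    mu = swx / sw
    return swxx / sw - mu * mu

# Hypothetical measurements and weights.
x = [10.2, 9.8, 10.5, 10.1]
w = [1.0, 2.0, 0.5, 4.0]
print(math.sqrt(weighted_var_two_pass(x, w)))
print(math.sqrt(weighted_var_one_pass(x, w)))
```

The two forms are algebraically identical. The single-pass form subtracts two similar numbers, so it can lose precision in floating point when the spread is tiny relative to the mean.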
