I need help calculating a polynomial approximation function with the intercept at the point (0, 0).
I have a few points in Excel like this:
Point with chart
As you can see, I made a chart of these points and then added a trend line with its equation.
Then I set the intercept to the point (0, 0) using the "Set intercept" option in the chart's trend line settings.
chart with intercept at 0, 0 point
Of course, the equation changed. Can anyone tell me how to solve this mathematically?
I have a C# application where I calculate the approximation, but now I need the same functionality as in Excel: calculating the approximation with the intercept at the point (0, 0).
For any fixed degree, you could use the LINEST function and the fact that if y = b*x + a*x^2, then y is linear in the two (correlated) variables x and x^2:
In A12:C12 I entered the array formula (Ctrl + Shift + Enter to accept)
=LINEST(A2:A10,B2:C10)
and in A14:B14 I entered
=LINEST(A2:A10,B2:C10,FALSE)
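Note that in this approach it was more natural to place the y column before the x columns. The third argument, FALSE, forces the intercept to 0, which is why the second formula needs only two cells.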
Similar approaches work for higher, though fixed, degree. For something more flexible, you might want to port your C# code to VBA.
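Outside Excel, the same idea can be sketched as an ordinary least-squares fit whose design matrix simply has no constant column. The snippet below is a minimal illustration in Python with NumPy rather than the asker's C#; the function name `fit_through_origin` and the sample data are hypothetical and not taken from the question.

```python
import numpy as np

def fit_through_origin(x, y, degree=2):
    """Least-squares polynomial fit with the constant term forced to 0.

    Returns the coefficients [c1, c2, ..., c_degree] of
    y = c1*x + c2*x**2 + ... (lowest power first).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns x, x^2, ..., x^degree and NO column of ones:
    # leaving out the constant column is what pins the curve to (0, 0).
    design = np.column_stack([x**k for k in range(1, degree + 1)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Hypothetical data generated from y = 2*x + 0.5*x^2
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = 2.0 * xs + 0.5 * xs**2
print(fit_through_origin(xs, ys, degree=2))
```

The same normal-equations or least-squares routine in any language should work, as long as the column of ones is omitted.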
Related
I'm hoping someone can tell me why the equation that Excel generated does not give the correct results, even though the trendline is graphed correctly.
I have some X and Y points that I will list below. I plotted those points in Excel and then plotted the trend line, and had it show me the equation of the trendline. When I take the equation and then plug in the X values I get very different answers back.
X and Y Values
X Y
0 3
3 2
5 1.4
7 1
10 0.5
18 0.1
When I set the intercept to 3, the equation of the trendline is y = 0.0088x^5 - 0.1457x^4 + 0.8753x^3 - 2.224x^2 + 1.4798x + 3
Screenshot of Excel window with equation
Any help is greatly appreciated.
I suspect you didn't set up your graph correctly.
Select a single cell in your table
Insert/Scatter (and decide which you want with regard to markers, etc)
Select the line and add Trendline
Set your parameters for the trendline
If you want to get the formula for the trendline from the "show formula" option, be sure to format the trendline label to be numeric with 15 decimals. Otherwise the equation will certainly not work, even if it appears to be correct.
Note that you can obtain the formula directly using the LINEST worksheet function.
=LINEST(Y,X^{1,2,3,4,5}) returns the array:
{0.0000399230399230442,-0.00152188552188569,0.0192991822991846,-0.0840134680134806,-0.217128427128402,2.99999999999999}
The last value in the array is the y-intercept
The slight differences are due to the use of different algorithms for the two methods.
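As a cross-check outside Excel, here is a minimal sketch using NumPy's `polyfit` on the six points from the question. It also shows why the displayed precision matters: rounding the coefficients to four decimals (an assumed stand-in for a default chart label) changes the value at x = 18 drastically.

```python
import numpy as np

# Data from the question
x = np.array([0, 3, 5, 7, 10, 18], dtype=float)
y = np.array([3, 2, 1.4, 1, 0.5, 0.1])

# Degree-5 fit through 6 points. Coefficients come back highest power first,
# which is the same order as the LINEST array above.
coeffs = np.polyfit(x, y, 5)
fitted = np.polyval(coeffs, x)

# The same polynomial with coefficients rounded to 4 decimals,
# roughly what a trendline label with few decimals would show.
rounded = np.round(coeffs, 4)
value_full = np.polyval(coeffs, 18.0)
value_rounded = np.polyval(rounded, 18.0)

print(coeffs)
print(value_full, value_rounded)
```

The x^5 coefficient is so small that it rounds to zero at four decimals, yet at x = 18 it is multiplied by roughly 1.9 million, which is why the rounded equation is far off.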
Most of us are familiar with normal distribution curves. For those who are new to front-loaded and back-loaded normal distributions, I will give some background first and then state my problem.
Front-Loaded Distribution: As demonstrated below, it has a rapid start. For example, when more resources are assumed to be consumed early in a project, cost/hours are distributed aggressively at the start of the project.
Back-Loaded Distribution: In contrast to the front-loaded distribution, it starts out with a lower slope and becomes increasingly steep towards the end of the project. For example, when most resources are assumed to be consumed late in the project.
In the above charts, the green line is the S-curve, which represents the cumulative distribution (utilization of resources over the proposed time), and the blue columns represent the isolated distribution of resources (cost/hours) in each period.
For reference, I am providing the bell curve / standard normal distribution (where mean = median) chart (below) and the associated formula to begin with.
Problem Statement: I was able to generate the normal distribution curve (see below with formulae), but I am unable to find a solution for front-loaded or back-loaded curves.
How do I skew a normal distribution to the right (front-loaded / positively skewed, meaning the mean is greater than the median) or to the left (back-loaded / negatively skewed, meaning the mean is less than the median)?
Formula Explained:
Cell B8 denotes an arbitrarily chosen standard deviation. It affects the kurtosis of the normal distribution. In the above screenshot, I chose the range of the normal distribution to be from -3SD to 3SD.
Cells B9 to B18 denote the even distribution of the Z-score using the formula:
=B8-((2*$B$8)/Period)
Cells C9 to C18 denote the normal distribution based on the Z-score and the Amount using the formula:
=(NORMSDIST(B9)-NORMSDIST(B8))*Amount/(1-2*NORMSDIST($B$8))
Update: Following one of the links in the comments, the closest I got is the situation below. The issue is highlighted in the yellow pattern: because of the volatile RAND() function, the charts are not as smooth as they should be. Since my formula above does not create a zigzag pattern, I am sure a skewed normal distribution can be smooth too!
Note:
I am using Excel 2016, so any newly introduced function that solves my problem is welcome. I am also open to using UDFs.
The numbers for the front-loaded and back-loaded distributions are notional and could vary. I am only interested in the shape of the resulting chart.
Kindly help!
You can generate the curve using the methods below and use the resulting numbers for your requirement.
With formulae
The curve
Notes:
If you want to change the bins, drag the cells down or up to complete the series.
If you want to change the total cost, change the multiplier.
If you want to change the tilt of the curve, change the divider in column C, which is currently set to 2. If it is -2, the tilt changes direction. You can experiment with different numbers; the direction depends on whether the divider is less than or greater than zero.
For copy-paste:
=A2+180/($G$3-1)
=RADIANS(A2)
=$G$4*SIN(B2 + SIN(B2)/2)
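For readers who want to try the shape outside Excel, the three formulas above can be sketched in Python as follows. The function name and parameter names are hypothetical; `n_bins` is assumed to play the role of $G$3 and `multiplier` the role of $G$4, with the first angle cell assumed to be 0.

```python
import math

def skewed_sine_curve(n_bins, multiplier=1.0, divider=2.0):
    """Half sine wave over 0..180 degrees, tilted by sin(theta)/divider."""
    values = []
    for i in range(n_bins):
        angle_deg = i * 180.0 / (n_bins - 1)   # column A: evenly spaced angles
        theta = math.radians(angle_deg)        # column B: =RADIANS(A2)
        # column C: =$G$4*SIN(B2 + SIN(B2)/2), with the divider as a parameter
        values.append(multiplier * math.sin(theta + math.sin(theta) / divider))
    return values

front_loaded = skewed_sine_curve(11, multiplier=100.0, divider=2.0)
back_loaded = skewed_sine_curve(11, multiplier=100.0, divider=-2.0)
print(front_loaded)
print(back_loaded)
```

With a positive divider the peak lands before the midpoint (front-loaded); with a negative divider it lands after the midpoint (back-loaded).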
I used the actual mathematical formulas to arrive at the result. It looks to me like what you wanted to achieve. The orange cells in the 'Skewed' section can be changed to vary the degree and direction of the skew. Some pictures for demonstration are below, followed by the formulas used.
Formulas in row 5, column
B:=(A5*$A$2)+0 (0 is the mean, you can change as you like)
C:=(1/($A$2* SQRT(2*PI())))*EXP(-(B5^2)/2)
D:=0.5*(1+ERF(B5/SQRT(2)))
E:=$A$1*C5
F: =(A5*$A$2*(1+$F$2*SIN((F4*PI())/(2*$F$4))))+0 (0 is the mean, you can change as you like)
G:=(1/($A$2* SQRT(2*PI())))*EXP(-((F5+$G$2)^2)/2)
H:=0.5*(1+ERF((B5+$G$2)/SQRT(2)))
I:=$A$1*G5
If you want to make sure the bins always have a value in them, you can use the following approach, which uses normal distributions and simply changes the mean and the standard deviation to get a curve that you want.
Changing the mean moves the peak to the left or right. Changing the standard deviation makes the quantities more uniform or more variable. I've used 0-1000 as my default range in the example below, but it should be easy to modify the formula to bring any value you want. NOTE in order to fulfill your requirement that all bins must be non-zero, you need to manually adjust the numbers till you get a curve that suits.
Yellow cells are for data entry, green cells are a count (so if you add bins, they would need to be numbered according to the sequence).
Formula in cell B7 (copied down to cell B16):
=NORMDIST($A7*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)-NORMDIST($A6*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)
Formula in cell C7 (copied down to cell C16):
=IF(A7=MAX($A$6:$A$17),$C$5-SUM(C$6:C6),ROUND(B7/SUM($B$7:$B$17)*$C$5,0))
Adding new bins is simple enough and is still based on a 0-1000 range, so you don't need to change any numbers other than adding rows and copying down the formulae:
The above example is also showing how a narrow standard deviation and a high mean combine to make the starting bins have very little quantity. But there is still a value (as long as count is big enough).
You may want to pre-define the different skewness selections if this is going to be used by other people (make column B dependent on a lookup, for example) but hopefully this is extensible enough for your needs.
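The two worksheet formulas above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the bins are assumed to split the 0-1000 range evenly, and Python's `round` handles exact halves differently from Excel's ROUND, so individual bins may occasionally differ by one.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative normal distribution, like NORMDIST(x, mean, sd, TRUE)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def allocate_to_bins(total, n_bins, mean, sd, scale=1000.0):
    """Spread `total` over n_bins following a normal curve on 0..scale."""
    edges = [i * scale / n_bins for i in range(n_bins + 1)]
    # Column B: probability mass that falls inside each bin
    probs = [normal_cdf(edges[i + 1], mean, sd) - normal_cdf(edges[i], mean, sd)
             for i in range(n_bins)]
    prob_sum = sum(probs)
    # Column C: rounded share of the total for every bin except the last...
    amounts = [round(p / prob_sum * total) for p in probs[:-1]]
    # ...and the last bin takes whatever is left so the sum is exact
    amounts.append(total - sum(amounts))
    return amounts

amounts = allocate_to_bins(total=1000, n_bins=10, mean=250, sd=250)
print(amounts, sum(amounts))
```

Lowering `mean` pulls the peak towards the early bins, and raising `sd` flattens the allocation, mirroring the two yellow input cells.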
If you are open to a Python answer, then I can give you code that uses SciPy and the pandas library to generate random observations from a skewed normal distribution and then bin (bucket) them for you. The following Python script captures the use case and is registered as a COM server, so it can also be created from VBA.
import numpy as np
import pandas as pd
from scipy.stats import skewnorm


class PythonSkewedNormal(object):
    _reg_clsid_ = "{1583241D-27EA-4A01-ACFB-4905810F6B98}"
    _reg_progid_ = 'SciPyInVBA.PythonSkewedNormal'
    _public_methods_ = ['GeneratePopulation', 'BinnedSkewedNormal']

    def GeneratePopulation(self, a, sz):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        return skewnorm.rvs(a, size=sz).tolist()

    def BinnedSkewedNormal(self, a, sz, bins):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        pop = skewnorm.rvs(a, size=sz)
        # Bucket the observations into the supplied bin edges and count them
        bins2 = np.array(bins)
        bins3 = pd.cut(pop, bins2)
        table = pd.value_counts(bins3, sort=False)
        # Convert the interval labels to strings so they can cross the COM boundary
        table.index = table.index.astype(str)
        return table.reset_index().values.tolist()


if __name__ == '__main__':
    print("Registering COM server...")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PythonSkewedNormal)
And the VBA client code
Option Explicit

Sub TestPythonSkewedNormal()

    Dim skewedNormal As Object
    Set skewedNormal = CreateObject("SciPyInVBA.PythonSkewedNormal")

    Dim lSize As Long
    lSize = 100

    Dim shtData As Excel.Worksheet
    Set shtData = ThisWorkbook.Worksheets.Item("Sheet3") '<--- change sheet to your circumstances
    shtData.Cells.Clear

    Dim vBins
    vBins = Array(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)

    'Stop
    Dim vBinnedData
    vBinnedData = skewedNormal.BinnedSkewedNormal(-5, lSize, vBins)

    Dim rngData As Excel.Range
    Set rngData = shtData.Cells(2, 1).Resize(UBound(vBins) - LBound(vBins), 2)
    rngData.Value2 = vBinnedData

    'Stop

End Sub
Sample output
(-5, -4] 0
(-4, -3] 0
(-3, -2] 4
(-2, -1] 32
(-1, 0] 57
(0, 1] 7
(1, 2] 0
(2, 3] 0
(3, 4] 0
(4, 5] 0
Original code deposited on my blog
Based on #usmanhaq's answer, I made a VBA macro for distribution curve simulation. It is corrected for 100% scaling of the front-loaded and back-loaded curves.
click here to go Github Lib
This is really weird. I calculate R^2 values with Excel in two different ways and the results differ hugely. Why?
1) First I use Excel to do a linear regression via a graph, and use the "Add Trendline..." right mouse button functionality to specify Intercept = 0. The R square value shows -3.253. The regressed equation is Y = -0.1321 * X
2) Then I use Excel to do a linear regression via the LINEST function. I highlight 5x2 cells, and in the top-left cell I type "=LINEST([Y vector]; [X vector], FALSE, TRUE)". The FALSE means the intercept is 0, and the TRUE means Excel should return additional regression statistics. Then I press CTRL + SHIFT + Enter. This shows additional statistics, such as the R^2 value in the third row of the left column, which turns out to be 0.11166. The regressed equation is Y = -0.1321 * X
My question is: what am I doing wrong when calculating R^2 with the graph? Python's statsmodels.api confirms that R^2 is 0.11166 and that the regressed equation is Y = -0.1321 * X.
Y =
0.0291970802919708
0.141801551718973
0.145668034655723
0.0691229530946433
0.0431577486597426
0.133618351873374
X =
-0.35551988
-0.20577599
0.10780785
-0.25028796
-0.42762184
0.02442197
Your calculation is correct. The scatter plot does not return the correct R^2 when the intercept is 0. This is the formula for R^2:
R^2 = 1 - SSres / SStot
where
SSres = Σ (y_i - ŷ_i)^2 and SStot = Σ (y_i - y̅)^2
If you use the standard regression model, you use the average value of y as y̅. But when you assume that the intercept equals 0, you need to set y̅ to zero. If you use the average value of y instead of zero, you get R^2 = -3.252767.
You can see the calculation here. The "SStot wrong" column uses the average value of y as y̅, which gives R^2 = -3.252767. If you use 0 (as I did in the "SStot right" column), you get 0.111.
It is an old bug described by Microsoft here: https://support.microsoft.com/en-us/help/829249/you-will-receive-an-incorrect-r-squared-value-in-the-chart-tool-in-excel-2003
You need to use the LINEST function to get correct R^2 value.
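The two R^2 values can be reproduced with a few lines of NumPy using the data from the question. This is a minimal sketch of the calculation described above; the variable names are my own.

```python
import numpy as np

# Data from the question
y = np.array([0.0291970802919708, 0.141801551718973, 0.145668034655723,
              0.0691229530946433, 0.0431577486597426, 0.133618351873374])
x = np.array([-0.35551988, -0.20577599, 0.10780785,
              -0.25028796, -0.42762184, 0.02442197])

# Least-squares slope for a line forced through the origin
slope = np.sum(x * y) / np.sum(x * x)
ss_res = np.sum((y - slope * x) ** 2)

# Total sum of squares measured around the mean of y (what the chart uses)
ss_tot_centered = np.sum((y - y.mean()) ** 2)
# Total sum of squares measured around zero (what LINEST uses without intercept)
ss_tot_uncentered = np.sum(y ** 2)

r2_chart = 1 - ss_res / ss_tot_centered
r2_linest = 1 - ss_res / ss_tot_uncentered
print(slope, r2_chart, r2_linest)
```

Both numbers share the same slope and the same residuals; only the baseline used for the total sum of squares differs.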
My fellow engineers and I just got tangled up in this. Based on this discussion and what we observed, the chart's R^2 is wrong in every case except when Excel calculates the best-fit y-intercept itself. With any other y-intercept (either forced through zero or user-defined), it is wrong.
I am calculating the Euclidean distance between points in an Excel application, and I also need to be able to specify the direction of the difference in two-dimensional location for each pair of points.
Does anyone know how to implement this in Excel?
Below is a simplified illustration of my current Euclidean distance calculation. I have two points and calculate how far apart Point1 is from Point2. But I would also like to find the direction (preferably in degrees) between Point1 and Point2.
For direction, you could use the angle that the vector from point one to point two makes with respect to the positive x axis:
=DEGREES(ATAN2(B3-B2,C3-C2))
This will return a number between -180 and +180 degrees. Excel's ATAN2(x,y) equals arctan(y/x), with the refinements that it returns pi/2 or -pi/2 (depending on the sign of y) rather than a division-by-zero error if x = 0, and that it gives an answer in the appropriate quadrant.
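For comparison, here is a minimal Python sketch of the same distance and direction calculation. Note that Python's `math.atan2` takes its arguments as (y, x), the reverse of Excel's ATAN2(x, y); the function name and the sample points are hypothetical.

```python
import math

def distance_and_direction(p1, p2):
    """Return (Euclidean distance, direction in degrees) from p1 to p2.

    The direction is the angle of the vector p1 -> p2 measured
    counter-clockwise from the positive x axis, in the range -180..180.
    """
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    distance = math.hypot(dx, dy)
    # math.atan2 takes (y, x), the opposite order of Excel's ATAN2(x, y)
    direction = math.degrees(math.atan2(dy, dx))
    return distance, direction

print(distance_and_direction((0.0, 0.0), (3.0, 4.0)))
```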
When Excel determines the axis values it will use to represent your data in a chart, the values are 'evenly distributed'.
For Example:
If you plot the following series in an Excel Line Chart.
[0.22,0.33,0.44,0.55,0.66,0.77,0.88,0.99,1.1,1.21,1.32,1.43,1.54,1.65,1.76,1.87,1.98,2.09,2.2]
Excel determines that the y-axis values should be [0,0.5,1,1.5,2,2.5].
What technique or formula is used to determine these values ?
After doing some experimenting, I conclude that Excel:
1) Keeps the Y axis starting at zero unless you explicitly tell it otherwise
2) Sets the Y axis maximum one major tick higher than your largest value
3) Chooses the major unit in a way that seems more arbitrary: it clearly has "preferred" units (.1, .2, .5, 1, 2, 5, 10, 50, 100, etc.) for the major tick, and it uses the smallest preferred unit that results in between 5 and 10 major ticks while meeting the requirements above.
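The three observations above can be turned into a small sketch. To be clear, this only encodes the heuristics guessed in this answer, not Excel's actual (undocumented here) algorithm; the function name is hypothetical, and the preferred units are assumed to follow a 1-2-5 pattern per decade.

```python
import math

def nice_axis(data_max, min_ticks=5, max_ticks=10):
    """Pick a major unit and tick values for an axis that starts at 0."""
    if data_max <= 0:
        raise ValueError("data_max must be positive")
    base_exp = math.floor(math.log10(data_max))
    # Try preferred units in ascending order: ..., 0.1, 0.2, 0.5, 1, 2, 5, ...
    for exp in range(base_exp - 2, base_exp + 2):
        for mult in (1, 2, 5):
            step = mult * 10.0 ** exp
            # Axis maximum is one major tick above the largest value;
            # the small epsilon guards against floating-point noise.
            intervals = math.floor(data_max / step + 1e-9) + 1
            n_ticks = intervals + 1  # count the tick at zero as well
            if min_ticks <= n_ticks <= max_ticks:
                ticks = [round(i * step, 10) for i in range(n_ticks)]
                return step, ticks
    raise ValueError("no suitable step found")

print(nice_axis(2.2))
```

For the series in the question (largest value 2.2), this picks a major unit of 0.5 and the ticks 0 to 2.5, which matches the axis Excel produced.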
Please have a look at:
Reasonable optimized chart scaling
And
Algorithm for "nice" grid line intervals on a graph
There are more questions like this.