How to restore (predict) data based on correlation/regression in Excel? - excel

I have some data in which a feature (height) is correlated with the output variable (price). How can I restore missing values (nulls) in the height feature based on the existing dependency (correlation) between these variables?
To be more clear:
The input and output variables are clearly correlated. I guess that predicting missing values is not a difficult procedure in Excel, but I need some directions on how to implement it.

If you put the slope (m) and intercept (c) of the regression line in E2 and E3 (say):-
=SLOPE(C2:C9,B2:B9)
=INTERCEPT(C2:C9,B2:B9)
you could re-arrange the simple regression equation y=mx+c to predict the x-values
x=(y-c)/m
So your predicted heights would be:-
=IF(ISBLANK(B2),(C2-E$3)/E$2,B2)
starting in D2.
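For checking the spreadsheet result outside Excel, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the sample numbers are made up, the helper names fit_line and fill_heights are hypothetical, and missing heights are represented as None.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fill_heights(heights, prices):
    """Replace None heights with (price - c) / m, fitted on the complete rows."""
    known = [(h, p) for h, p in zip(heights, prices) if h is not None]
    m, c = fit_line([h for h, _ in known], [p for _, p in known])
    return [h if h is not None else (p - c) / m
            for h, p in zip(heights, prices)]


# Made-up sample data (assumption): price = 2*height + 10 exactly
heights = [1.0, 2.0, None, 4.0, None]
prices = [12.0, 14.0, 16.0, 18.0, 20.0]
filled = fill_heights(heights, prices)
```

Rows with a known height are passed through unchanged, which mirrors the IF(ISBLANK(...)) wrapper in the formula.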

You might try the FORECAST¹ function. The first blank does not have enough preceding data to generate a forecast result so a simple ratio will have to suffice but the remaining values can be generated and take previously generated FORECAST results into consideration for their own result(s).
        
The formula in E2 is,
=IF(ISBLANK(B2), FORECAST(C2, B$2:B$9, C$2:C$9), B2)
¹ See Forecasting functions for alternative algorithms in data prediction.
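Note that FORECAST(C2, heights, prices) regresses height on price directly, so unless the correlation is perfect it will generally give slightly different numbers from inverting a price-on-height regression. A rough Python sketch of that calculation follows. The function name forecast is hypothetical, the data is made up, and skipping rows with a missing height is my own assumption, not a statement about how Excel treats blanks.

```python
def forecast(x, known_ys, known_xs):
    """Least-squares prediction of y at x, using the argument order of
    FORECAST(x, known_y's, known_x's). Pairs with a None are skipped (assumption)."""
    pairs = [(a, b) for a, b in zip(known_xs, known_ys)
             if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    return mean_y + slope * (x - mean_x)


# Made-up sample data (assumption): height is the value being predicted
heights = [1.0, 2.0, None, 4.0]
prices = [12.0, 14.0, 16.0, 18.0]
predicted = [h if h is not None else forecast(p, heights, prices)
             for h, p in zip(heights, prices)]
```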

Related

Finding Distance From List Of Coordinates

Looking to see if someone can suggest a site or Excel method to find the distance of multiple lat,long coordinates from one point.
Example: I have a starting point and 7 other coordinates. Is there a way to find how far away (in KM/MI) each point is from the starting point?
Starting location : 33.17261,-117.14571
list of coordinates
32.75827,-117.17577
32.76079,-117.18589
32.76444,-117.20174
32.59815,-117.01685
32.66387,-117.05577
32.59811,-117.01681
32.66381,-117.05571
Let's assume the column they are stored in is column A, starting in row 2 (assuming you have a header row). The first thing you are going to want to do is split them into their own columns so you can work with the numbers. There are a couple of ways to do this. The simplest is using the built-in Text-to-Columns feature located on the Data ribbon.
Have the whole column selected when you start the process and on the first page, select Delimited.
On the second step choose "," as the delimiter and then press Finish. You do not need the third step.
Once that is done your data should be sitting in two columns.
I will place the point you are referring to in C2 and D2; you can use Text-to-Columns for this part too or just type it in.
Based on information from another site (assuming it's correct), take the earth as a sphere with a radius of 6371 km. Place this value in E2.
Next convert your list into X, Y and Z values using the following equations (column A holds the latitude and column B the longitude):
F2
=$E$2*COS(RADIANS($A2))*COS(RADIANS($B2))
G2
=$E$2*COS(RADIANS($A2))*SIN(RADIANS($B2))
H2
=$E$2*SIN(RADIANS($A2))
Note that the degrees were converted to radians for use in Excel's trig functions.
Repeat the process for your starting point coordinates (latitude in C2, longitude in D2) and place the formulas in I2, J2 and K2.
I2
=$E$2*COS(RADIANS($C2))*COS(RADIANS($D2))
J2
=$E$2*COS(RADIANS($C2))*SIN(RADIANS($D2))
K2
=$E$2*SIN(RADIANS($C2))
Finally in L2 use the following formula:
=SQRT((F2-$I$2)^2+(G2-$J$2)^2+(H2-$K$2)^2)
Copy the formulas in F2, G2, H2 and L2 down as far as your source list goes. The values in L represent the straight-line distance in km between the X Y Z points. It does not account for the curvature of the earth though. Apparently there are many different models to predict this. You need one that works for your area if you want something more refined.
For geodesic-grade accuracy (a fraction of a mm) you can use my Excel add-in available on GitHub: https://github.com/tdjastrzebski/Vincenty-Excel, in particular the VincentyInvDistance() function. The solution implements Vincenty's formulae.
Otherwise use the Haversine formula, but it does not provide geodesic-grade results.
The topic is not that trivial, see Geodesics on an ellipsoid for more details.
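As a rough cross-check of spreadsheet results, the Haversine formula mentioned above can be sketched in Python. This treats the earth as a sphere of radius 6371 km (the same assumption as the earlier answer), the function name haversine_km is hypothetical, and the coordinates are the ones from the question, read as latitude,longitude.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


start = (33.17261, -117.14571)
points = [
    (32.75827, -117.17577),
    (32.76079, -117.18589),
    (32.76444, -117.20174),
    (32.59815, -117.01685),
    (32.66387, -117.05577),
    (32.59811, -117.01681),
    (32.66381, -117.05571),
]
distances_km = [haversine_km(start[0], start[1], lat, lon) for lat, lon in points]
distances_mi = [d / 1.609344 for d in distances_km]  # 1 mile = 1.609344 km
```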

Excel Interpolate with logarithmic prediction

Is there a function within Excel to Interpolate while taking into account a logarithmic prediction?
At the moment I am using linear interpolation but would like to find a better way to fill in the blanks if possible.
There's no logarithmic regression or interpolation in Excel, even in the Analysis ToolPak. You'll need much more advanced software for that, such as MATLAB.
If you're stuck working in Excel... here's a possible mathematical solution:
Rather than working with the raw data x and y, try plotting x against a^y, where a is the base (or plotting log(x,a) against y). If you have the correct base a (and there's no vertical offset), you will then have a linear relationship on which you can perform a linear interpolation as normal, and then convert the interpolated values back to actual values by taking the log of them.
If you don't know what a is, you can instead calculate a line of best fit for an arbitrary a, calculate the standard residuals, and then use the Solver add-in to modify a until you get the lowest possible standard residuals, at which point you have the best estimate of a.
Similarly, if there is a vertical offset b, you'll need to test values of b that also result in a linear relationship: plot x against a^(y-b).
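One simplification worth noting: if the data follows y = log(x,a) + b, then y is a straight line in ln(x) whatever a and b are, so interpolating between two neighbouring points in (ln(x), y) space needs neither value. A minimal Python sketch of that, assuming all x values are positive; the function name log_interpolate and the sample points are made up for illustration.

```python
import math


def log_interpolate(x, x0, y0, x1, y1):
    """Interpolate y at x, linearly in ln(x), between (x0, y0) and (x1, y1).
    Assumes x, x0 and x1 are positive and x0 != x1."""
    t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
    return y0 + t * (y1 - y0)


# Made-up points on y = log2(x) + 3: (2, 4) and (8, 6)
y_mid = log_interpolate(4.0, 2.0, 4.0, 8.0, 6.0)
```

In a worksheet the same calculation would use LN() on the x values before applying the usual linear interpolation formula.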

VBA EXCEL Fitting Curve with freely chosen function

As a starting point, I have an xy-chart with some values on it whose progression resembles an exponential function. I need to write code that draws a fitting curve on the chart, but I have to use a particular function which is not exponential (because I need to get some coefficients from it).
One of the functions I need to use is K(C-x)²/(1+x), where K and C are the parameters I need. (There are two of them, which makes it a lot more complicated.) Obviously you can't find this kind of structure in the fitting curve tool in Excel. Is there any possibility to fit a curve to a chart where you can write the structure of the function yourself?
Sorry that I haven't added any written code, but I just need a hint to start writing.
Thank you
I did something similar to this a while ago. The approach I took was to use the solver (as gary's student suggests). I think it was fired from VBA but that's unimportant.
Basically you'd have two input cells holding your variables K and C. Then, for each row of data, you need to find the difference (error) between the value the function produces with the values in the input cells and the actual value (I think using errors^2 gives quicker convergence). You then sum the differences in another cell. When running the solver, you ask it to minimise the sum of differences by changing K and C.
Does that make sense...?
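To illustrate what the solver is being asked to do, here is a minimal Python sketch that minimises the sum of squared errors for y = K(C-x)²/(1+x). It is not how Excel's Solver works internally. It uses a simple trick that I am swapping in: for a fixed C the best K has a closed form, so only C needs to be searched, here by repeatedly scanning a grid and zooming in. The function name fit_k_c, the search bounds and the sample data are all assumptions, and x = -1 must not occur in the data.

```python
def fit_k_c(xs, ys, c_lo, c_hi, rounds=6, grid=200):
    """Least-squares fit of y = K*(C - x)**2 / (1 + x).
    Scans C over [c_lo, c_hi], zooming in each round; K is solved exactly per C."""

    def best_k_and_sse(c):
        g = [(c - x) ** 2 / (1 + x) for x in xs]
        denom = sum(gi * gi for gi in g)
        k = sum(gi * yi for gi, yi in zip(g, ys)) / denom if denom else 0.0
        sse = sum((k * gi - yi) ** 2 for gi, yi in zip(g, ys))
        return k, sse

    c_best = c_lo
    for _ in range(rounds):
        step = (c_hi - c_lo) / grid
        candidates = [c_lo + i * step for i in range(grid + 1)]
        c_best = min(candidates, key=lambda c: best_k_and_sse(c)[1])
        c_lo, c_hi = c_best - step, c_best + step  # zoom in around the best C
    k_best, sse = best_k_and_sse(c_best)
    return k_best, c_best, sse


# Made-up sample data (assumption): generated from K = 2, C = 5
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
ys = [2.0 * (5.0 - x) ** 2 / (1.0 + x) for x in xs]
k_fit, c_fit, sse_fit = fit_k_c(xs, ys, c_lo=0.0, c_hi=10.0)
```

With noisy real data the grid scan can land in a local minimum, so the bounds for C should be chosen from what is physically plausible.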

Excel average every 0.5 meters, irregular distances between data points

I have a data set that has height values every so often, like topography data in a straight line with GPS coordinates. I used the GPS coordinates and trigonometry to make a cumulative distance column. However, the distance between points varies. Sometimes it's 10 cm, sometimes it's 13, sometimes it's 40.
I would like to take the average height every 0.5 meters, but sometimes the distance column doesn't even land on a multiple of 0.5! This would mean my output column would be significantly shorter than my raw data column.
I think my main problem is I do not know what this process is called in order to Google it. Another problem is that the distances are irregular as mentioned above. Things I think may have something to do with it:
averageif?
binning? I do not want a histogram though, just the data.
Thanks for the help and if you do not know the answer but at least know what I should be writing in the search bars that would be helpful as well. Thanks!
Perhaps this will work for you. I made up a series of distance vs height measurements and determined that a third order polynomial curve fit pretty well. (A different curve might best fit your real data, so you would have to alter the formula accordingly.) I then used that formula to derive a set of new heights for the desired distances, which in my example are five units apart.
The formula under Extrapolated heights is an ARRAY formula entered into all the cells at once. You select D2:D12, enter the formula in D2 and hold down CTRL-SHIFT while hitting ENTER. If you did this correctly, you will have the same formula in each cell surrounded by curly braces {...}
Then you can decide how you want to "Average" the heights.
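If plain binning is what is wanted instead of a curve fit, the process can be sketched as follows in Python: each point is assigned to the 0.5 m interval its cumulative distance falls in, and the heights in each interval are averaged. This is only an illustration; the function name bin_average and the sample data are made up, and intervals that contain no points are simply left out. In Excel something similar should be possible with AVERAGEIFS and a column of bin boundaries.

```python
import math
from collections import defaultdict


def bin_average(distances, heights, width=0.5):
    """Average heights within consecutive bins [k*width, (k+1)*width).
    Returns a list of (bin_start, mean_height); empty bins are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, h in zip(distances, heights):
        k = math.floor(d / width)  # index of the bin this point falls in
        sums[k] += h
        counts[k] += 1
    return [(k * width, sums[k] / counts[k]) for k in sorted(sums)]


# Made-up sample data (assumption): irregular spacing along the line, in metres
distances = [0.00, 0.10, 0.23, 0.63, 0.76, 1.16]
heights = [10.0, 10.2, 10.4, 11.0, 11.4, 12.0]
averaged = bin_average(distances, heights)
```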

Direct Power Regression in Excel

I want to obtain the R^2 values for several pairs of X v/s Y data.
It can be easily done in Matlab.
But in excel, I believe one needs to create new columns with logarithmic values or something.
Is there a direct, neat, formulas-based, Matlab-esque way to do this in Excel?
Matlab is the next generation of Excel ;) So definitely Excel is dull compared to Matlab. But don't get demotivated, because it's still a matrix-based (Row,Col) arena...
Here is a function to try out:
RSQ function.
RSQ(known_y's,known_x's)
References for different ways:
calculate R-square in Excel
Edit:
If you need the logarithmic then you may have to use the following:
=RSQ(y-range,LN(x-range))
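For a power model y = a*x^b specifically, one common approach is to take logs of both variables, since ln(y) = ln(a) + b*ln(x) is linear, and compute R^2 on the transformed data. In Excel that would presumably be =RSQ(LN(y-range),LN(x-range)), possibly entered as an array formula in older versions. A small Python sketch of the same calculation, for cross-checking; the function names and sample data are made up, and all x and y values are assumed positive.

```python
import math


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient, like Excel's RSQ."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def power_r_squared(xs, ys):
    """R^2 of the linearised power model: ln(y) against ln(x)."""
    return r_squared([math.log(x) for x in xs], [math.log(y) for y in ys])


# Made-up sample data (assumption): an exact power law y = 3*x^2
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * x ** 2 for x in xs]
r2_power = power_r_squared(xs, ys)
r2_linear = r_squared(xs, ys)
```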

Resources