Linear regression library for Go language - statistics
I'm looking for a Go library that implements linear regression with MLE or LSE.
Has anyone seen one?
There is this stats library, but it doesn't seem to have what I need:
https://github.com/grd/statistics
Thanks!
Implementing an LSE (Least Squared Error) linear regression is fairly simple.
Here's an implementation in JavaScript - it should be trivial to port to Go.
Here's an (untested) port:
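package main

import "fmt"

// Point is a single (X, Y) observation.
type Point struct {
	X float64
	Y float64
}

// linearRegressionLSE fits y = m*x + b by least squares and returns
// the fitted point for every X in series.
func linearRegressionLSE(series []Point) []Point {
	q := len(series)
	if q == 0 {
		return []Point{}
	}
	n := float64(q)

	sumX, sumY, sumXX, sumXY := 0.0, 0.0, 0.0, 0.0
	for _, p := range series {
		sumX += p.X
		sumY += p.Y
		sumXX += p.X * p.X
		sumXY += p.X * p.Y
	}

	// Note: the denominator is zero when all X values are equal,
	// in which case m and b are not finite numbers.
	m := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	b := (sumY / n) - (m * sumX / n)

	r := make([]Point, q)
	for i, p := range series {
		r[i] = Point{p.X, p.X*m + b}
	}
	return r
}

func main() {
	// Minimal usage, so that the "fmt" import is used and the file compiles.
	fmt.Println(linearRegressionLSE([]Point{{0, 1}, {1, 3}, {2, 5}}))
}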
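If you need the regression coefficients rather than the fitted points, the same sums can be returned directly. The following is only a sketch under that assumption: `fitLine` and its signature are made up for illustration and do not come from any library.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine returns the least-squares slope and intercept of
// y = slope*x + intercept. The name and signature are illustrative only.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two (x, y) pairs of equal length")
	}
	n := float64(len(xs))
	var sumX, sumY, sumXX, sumXY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXX += xs[i] * xs[i]
		sumXY += xs[i] * ys[i]
	}
	den := n*sumXX - sumX*sumX
	if den == 0 {
		// All x values are identical, so the slope is undefined.
		return 0, 0, errors.New("all x values are identical")
	}
	slope = (n*sumXY - sumX*sumY) / den
	intercept = (sumY - slope*sumX) / n
	return slope, intercept, nil
}

func main() {
	slope, intercept, err := fitLine([]float64{0, 1, 2}, []float64{1, 3, 5})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(slope, intercept)
}
```

Returning an error for the degenerate case avoids silently producing NaN or Inf coefficients.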
I implemented the following using gradient descent. It only returns the coefficients, but it takes any number of explanatory variables and is reasonably accurate:
package main

import "fmt"

func calc_ols_params(y []float64, x [][]float64, n_iterations int, alpha float64) []float64 {
	thetas := make([]float64, len(x))
	for i := 0; i < n_iterations; i++ {
		my_diffs := calc_diff(thetas, y, x)
		my_grad := calc_gradient(my_diffs, x)
		for j := 0; j < len(my_grad); j++ {
			thetas[j] += alpha * my_grad[j]
		}
	}
	return thetas
}

func calc_diff(thetas []float64, y []float64, x [][]float64) []float64 {
	diffs := make([]float64, len(y))
	for i := 0; i < len(y); i++ {
		prediction := 0.0
		for j := 0; j < len(thetas); j++ {
			prediction += thetas[j] * x[j][i]
		}
		diffs[i] = y[i] - prediction
	}
	return diffs
}

func calc_gradient(diffs []float64, x [][]float64) []float64 {
	gradient := make([]float64, len(x))
	for i := 0; i < len(diffs); i++ {
		for j := 0; j < len(x); j++ {
			gradient[j] += diffs[i] * x[j][i]
		}
	}
	for i := 0; i < len(x); i++ {
		gradient[i] = gradient[i] / float64(len(diffs))
	}
	return gradient
}

func main() {
	y := []float64{3, 4, 5, 6, 7}
	x := [][]float64{{1, 1, 1, 1, 1}, {4, 3, 2, 1, 3}}
	thetas := calc_ols_params(y, x, 100000, 0.001)
	fmt.Println("Thetas : ", thetas)

	y_2 := []float64{1, 2, 3, 4, 3, 4, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 6, 5, 4, 5, 4, 3, 4}
	x_2 := [][]float64{
		{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
		{4, 2, 3, 4, 5, 4, 5, 6, 7, 4, 8, 9, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 5},
		{4, 1, 2, 3, 4, 5, 6, 7, 5, 8, 7, 8, 7, 8, 7, 8, 7, 7, 7, 7, 7, 6, 5},
		{4, 1, 2, 5, 6, 7, 8, 9, 7, 8, 7, 8, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4},
	}
	thetas_2 := calc_ols_params(y_2, x_2, 100000, 0.001)
	fmt.Println("Thetas_2 : ", thetas_2)
}
Result:
Thetas : [6.999959251448524 -0.769216974483968]
Thetas_2 : [1.5694174539341945 -0.06169183063112409 0.2359981255871977 0.2424327101610395]
I checked my results against pandas in Python and they were very close:
In [26]: from pandas.stats.api import ols
In [27]: x = [
[4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
[4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
[4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
]
In [28]: y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]
In [29]: x.append(y)
In [30]: df = pd.DataFrame(np.array(x).T, columns=['x1','x2','x3','y'])
In [31]: ols(y=df['y'], x=df[['x1', 'x2', 'x3']])
Out[31]:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x1> + <x2> + <x3> + <intercept>
Number of Observations: 23
Number of Degrees of Freedom: 4
R-squared: 0.5348
Adj R-squared: 0.4614
Rmse: 0.8254
F-stat (3, 19): 7.2813, p-value: 0.0019
Degrees of Freedom: model 3, resid 19
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
            x1    -0.0618     0.1446      -0.43     0.6741    -0.3453     0.2217
            x2     0.2360     0.1487       1.59     0.1290    -0.0554     0.5274
            x3     0.2424     0.1394       1.74     0.0983    -0.0309     0.5156
     intercept     1.5704     0.6331       2.48     0.0226     0.3296     2.8113
---------------------------------End of Summary---------------------------------
and
In [34]: df_1 = pd.DataFrame(np.array([[3,4,5,6,7], [4,3,2,1,3]]).T, columns=['y', 'x'])
In [35]: df_1
Out[35]:
y x
0 3 4
1 4 3
2 5 2
3 6 1
4 7 3
[5 rows x 2 columns]
In [36]: ols(y=df_1['y'], x=df_1['x'])
Out[36]:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 5
Number of Degrees of Freedom: 2
R-squared: 0.3077
Adj R-squared: 0.0769
Rmse: 1.5191
F-stat (1, 3): 1.3333, p-value: 0.3318
Degrees of Freedom: model 1, resid 3
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.7692     0.6662      -1.15     0.3318    -2.0749     0.5365
     intercept     7.0000     1.8605       3.76     0.0328     3.3534    10.6466
---------------------------------End of Summary---------------------------------
There's a project called gostat which has a bayes package which should be able to do linear regressions.
Unfortunately the documentation is somewhat lacking, so you'll probably have to read the code to learn how to use it. I dabbled with it a bit myself but haven't touched the bayes package.
Related
Calculating the avg point of students from a txt file and return a dict
I'm quite new to Python and have been working on this problem for a week without figuring it out. Please help.
The input text file looks like this (the first number in each line is the student ID; Math, Physics, Chemistry and Biology each have 4 scores, the rest have 5; subjects are separated by ';'):
StudentID, Math, Physics, Chemistry, Biology, Literature, Language, History, Geography
1; 5,7,7,8;5,5,6,6;8,6,7,7;4,8,5,7;7,7,6,7,9;7,5,8,6,7;7,8,8,5,9;5,8,6,8,7
2; 8,6,8,6;5,5,8,4;4,9,9,7;4,9,3,4;6,7,7,7,4;8,9,6,7,5;5,7,7,9,6;6,6,4,4,7
3; 5,8,9,8;7,8,8,7;6,6,7,6;5,7,9,7;6,3,5,8,8;5,6,6,6,8;7,7,6,6,7;8,5,3,6,4
4; 7,9,9,8;7,9,7,6;10,7,6,7;7,9,8,7;6,8,8,5,7;8,6,6,4,8;7,5,8,6,7;7,6,8,6,8
5; 9,7,4,6;4,6,5,5;7,5,6,7;6,9,7,6;7,9,7,6,6;6,7,7,8,8;7,9,6,8,6;8,6,8,8,5
6; 6,7,7,7;4,6,9,7;5,5,7,7;7,6,5,7;7,9,7,8,7;8,7,7,8,9;9,9,8,8,9;8,7,9,7,5
Math, Physics, Chemistry and Biology have 4 weights, one per score: 5%, 10%, 15%, 70%. For example, the average point in Math for Student 1 = 5x5% + 7x10% + 7x15% + 8x70%.
Literature, Language, History and Geography have 5 weights: 5%, 10%, 10%, 15%, 60%.
Requirement: calculate the average point of each student for each subject and output a dict like this:
{'Student 1': {'Math': 9.00, 'Physics': 8.55, ...}, 'Student 2': {..., 'History': 9.00, 'Geography': 8.55}}
Thank you.
Assuming that script.py and your text file student.txt are in the same directory:
final_dict = {}
with open("student.txt", "r") as f:
    for idx, l in enumerate(f.readlines()):
        if l != "\n":
            if idx == 0:
                # Header line: keep the subject names, drop "StudentID".
                l = l.replace("\n", "")
                header = l.split(", ")[1:]
            else:
                # Take everything before the first ';' as the ID, so that
                # IDs with more than one digit also work.
                student_id = l.split(";")[0].strip()
                final_dict.update({f"Student {student_id}": {}})
                marks = l.split("; ")[1].replace("\n", "").split(";")
                for i, mark in enumerate(marks):
                    current_subject_int_marks = tuple(map(int, mark.split(",")))
                    len_marks = len(current_subject_int_marks)
                    if len_marks < 5:
                        avr = (
                            current_subject_int_marks[0] * 0.05
                            + current_subject_int_marks[1] * 0.10
                            + current_subject_int_marks[2] * 0.15
                            + current_subject_int_marks[3] * 0.70
                        )
                    else:
                        avr = (
                            current_subject_int_marks[0] * 0.05
                            + current_subject_int_marks[1] * 0.10
                            + current_subject_int_marks[2] * 0.10
                            + current_subject_int_marks[3] * 0.15
                            + current_subject_int_marks[4] * 0.60
                        )
                    final_dict[f"Student {student_id}"].update({header[i]: avr})
print(final_dict)
Get a certain combination of numbers in Python
Is there an efficient and convenient solution in Python to do something like this: find the largest combination of two numbers x and y that satisfies the following conditions?
0 < x < 1000
0 < y < 2000
x/y = 0.75
x and y are integers
It's easy to do with a simple graphing calculator, but I'm trying to find the best way to do it in Python.
import pulp

My_optimization_prob = pulp.LpProblem('My_Optimization_Problem', pulp.LpMaximize)

# Creating the variables
x = pulp.LpVariable("x", lowBound = 1, cat='Integer')
y = pulp.LpVariable("y", lowBound = 1, cat='Integer')

# Adding the objective and the constraints
My_optimization_prob += x + y  # Maximize X and Y
My_optimization_prob += x <= 999  # x < 1000
My_optimization_prob += y <= 1999  # y < 2000
My_optimization_prob += x - 0.75*y == 0  # x/y = 0.75

# Printing the problem and constraints
print(My_optimization_prob)
My_optimization_prob.solve()

# Printing x and y
print('x = ', pulp.value(x))
print('y = ', pulp.value(y))
Probably just:
z = [(x, y) for x in range(1, 1000) for y in range(1, 2000) if x/y==0.75]
z.sort(key=lambda x: sum(x), reverse=True)
z[0]  # Returns (999, 1332)
This is convenient, but I'm not sure it is the most efficient way.
Another possible, relatively efficient solution is:
x_upper_limit = 1000
y_upper_limit = 2000
x = 0
y = 0
temp_variable = 0
ratio = 0.75
# Start below the upper limit, because x < 1000 is a strict bound.
for i in range(x_upper_limit - 1, 0, -1):
    temp_variable = i/ratio
    if temp_variable.is_integer() and temp_variable < y_upper_limit:
        x = i
        y = int(temp_variable)
        break
print(x,y)
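The search can also be avoided entirely. With the ratio written as the reduced fraction 3/4, every integer solution has the form x = 3k, y = 4k, so the largest pair comes from the largest k that keeps both values under their limits. Below is a sketch of that idea in Go; `largestPair` and its parameters are names invented for this example.

```go
package main

import "fmt"

// largestPair returns the largest integers x < xLimit and y < yLimit
// with x/y == num/den. It assumes num/den is a positive fraction in
// lowest terms (3/4 for the ratio 0.75). The name is illustrative.
func largestPair(xLimit, yLimit, num, den int) (x, y int, ok bool) {
	// Every solution is x = k*num, y = k*den for some integer k >= 1,
	// so pick the largest k allowed by both strict upper bounds.
	k := (xLimit - 1) / num
	if ky := (yLimit - 1) / den; ky < k {
		k = ky
	}
	if k < 1 {
		return 0, 0, false
	}
	return k * num, k * den, true
}

func main() {
	x, y, ok := largestPair(1000, 2000, 3, 4)
	fmt.Println(x, y, ok)
}
```

Working with the integer numerator and denominator also sidesteps floating-point equality checks such as `x/y == 0.75`.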
tkinter how do i calculate the normal vector and the conservation of the kinetic energy of all particles in python?
I was trying to calculate the normal vector n [image: formula for normal vector] and the tangential vector t [image: tangential vector n=v] of two particles p1 and p2, in order to conserve kinetic energy [image: conservation of energy]. Here's another way to write the formula: [image: formula 2]. But I don't really know where and how in the code to implement this.
from tkinter import *
from random import *
from math import *

myHeight = 250  # 400
myWidth = 400  # 800
mySpeed = 20  # 100

col = randint(0, 255)
radius = randint(0, 50)
print(col)
# x = 60

global particules
particules = []

def initialiseParticule(dx, dy, radius, color):
    x, y = randint(0, myWidth), randint(0, myHeight)  # 100
    radius = randint(0, 10)
    # color = randint(0, 255)
    # col1 = str(color)
    k = myCanvas.create_oval(x - radius, y - radius,
                             x + radius, y + radius,
                             width=2, fill=color)
    b = [x, y, dx, dy, radius]
    particules.append(b)
    # print(k)

def updateParticules():
    N = len(particules)
    for i in range(N):
        # update displacement
        particules[i][0] += particules[i][2]
        particules[i][1] += particules[i][3]
        # xi += vxi
        # yi += vyi
        # collision with walls
        if particules[i][0] < particules[i][4] or particules[i][0] >= myWidth - particules[i][4]:
            particules[i][2] *= -1
        if particules[i][1] < particules[i][4] or particules[i][1] >= myHeight - particules[i][4]:
            particules[i][3] *= -1
        # collision with other particles
        for j in range(N):
            if i != j:
                xi, yi = particules[i][0], particules[i][1]
                vxi, vyi = particules[i][2], particules[i][3]
                xj, yj = particules[j][0], particules[j][1]
                vxj, vyj = particules[j][2], particules[j][3]
                dij = sqrt((xi - xj)**2 + (yi - yj)**2)
                # print(dij)
                # collision !!!
                if dij <= particules[i][4] + particules[j][4]:
                    particules[i][2] *= -1
                    particules[j][2] *= -1
                    particules[i][3] *= -1
                    particules[j][3] *= -1
        r = particules[i][4]
        myCanvas.coords(i + 1, particules[i][0] - r, particules[i][1] - r,
                        particules[i][0] + r, particules[i][1] + r)

def animation():
    # was miseAJourBalles(), which is not defined anywhere in this script
    updateParticules()
    myCanvas.after(mySpeed, animation)

mainWindow = Tk()
mainWindow.title('Pong')
# mainWindow.geometry(str(myWidth)+'x'+str(myHeight+100))
myCanvas = Canvas(mainWindow, bg='dark grey', height=myHeight, width=myWidth)
myCanvas.pack(side=TOP)

N = 3
for n in range(N):
    # initialiseParticule(-1, -1, radius, 'randint(0,10)')
    # was initialiseParticules(...), which does not match the function defined above
    initialiseParticule(-1, -1, radius, 'pink')

animation()
# bou = Button(mainWindow, text="Leave", command=mainWindow.destroy)
# bou.pack()
mainWindow.mainloop()
Geometry Arc Algorithm
I searched all over the internet and didn't find any pseudocode that solves this problem. I want to find an arc between two points, A and B, using 5 arguments:
Start Point
End Point
Radius (don't know if this is needed)
Angle
Quality
Example:
StartPoint = the green point on the left is the start point set in the arguments
EndPoint = the green point on the right is the end point set in the arguments
Angle = the angle of the arc (semicircle)
Quality = how many red circles to create
I would like pseudocode that solves this problem.
Thanks in advance :D
Let the start point be P0, the end point P1 and the angle Fi. R is not needed.
First find the arc center. Get the middle of the P0-P1 segment:
M = (P0 + P1) / 2 // M.x = (P0.x + P1.x) / 2, same for y
and the direction vector:
D = (P1 - P0) / 2
Get the length of D:
lenD = Math.Hypot(D.x, D.y) // Vector.Length, sqrt of sum of squares
Get the unit vector:
uD = D / lenD
Get the (left) perpendicular vector:
(P.x, P.y) = (-uD.y, uD.x)
Now the circle center:
if Fi = Pi then
  C.x = M.x
  C.y = M.y
else
  C.x = M.x + P.x * lenD / Tan(Fi/2)
  C.y = M.y + P.y * lenD / Tan(Fi/2)
Vector from the center to the start point:
CP0.x = P0.x - C.x
CP0.y = P0.y - C.y
Then you can calculate the coordinates of the N intermediate points on the arc by rotating the vector CP0 around the center point:
an = i * Fi / (NSeg + 1);
X[i] = C.x + CP0.x * Cos(an) - CP0.y * Sin(an)
Y[i] = C.y + CP0.x * Sin(an) + CP0.y * Cos(an)
Working Delphi code:
procedure ArcByStartEndAngle(P0, P1: TPoint; Angle: Double; NSeg: Integer);
var
  i: Integer;
  len, dx, dy, mx, my, px, py, t, cx, cy, p0x, p0y, an: Double;
  xx, yy: Integer;
begin
  mx := (P0.x + P1.x) / 2;
  my := (P0.y + P1.y) / 2;
  dx := (P1.x - P0.x) / 2;
  dy := (P1.y - P0.y) / 2;
  len := Math.Hypot(dx, dy);
  px := -dy / len;
  py := dx / len;
  if Angle = Pi then
    t := 0
  else
    t := len / Math.Tan(Angle / 2);
  cx := mx + px * t;
  cy := my + py * t;
  p0x := P0.x - cx;
  p0y := P0.y - cy;
  for i := 0 to NSeg + 1 do begin
    an := i * Angle / (NSeg + 1);
    xx := Round(cx + p0x * Cos(an) - p0y * Sin(an));
    yy := Round(cy + p0x * Sin(an) + p0y * Cos(an));
    Canvas.Ellipse(xx - 3, yy - 3, xx + 4, yy + 4);
  end;
end;
Result for (Point(100, 0), Point(0, 100), Pi / 2, 8) (the Y-axis points down in the picture)
Extrapolation -- awk based
I need help with the following. I have a data file (columns separated by tabs, "\t") like this:
data.dat
# y1 y2 y3 y4
17.1685 21.6875 20.2393 26.3158
These are the x values of 4 points for a linear fit. The four y values are constant: 0, 200, 400, 600.
I can create a linear fit of the point pairs (x,y): (x1,y1)=(17.1685,0), (x2,y2)=(21.6875,200), (x3,y3)=(20.2393,400), (x4,y4)=(26.3158,600).
Now I would like to make a linear fit on three of these point pairs: (x1,y1), (x2,y2), (x3,y3) and (x2,y2), (x3,y3), (x4,y4) and (x1,y1), (x3,y3), (x4,y4) and (x1,y1), (x2,y2), (x4,y4).
For each linear fit through three points, I would like to know the extrapolated x value of the point that was left out of the fit.
So far I have this awk code:
#!/usr/bin/awk -f
BEGIN{
    z[1] = 0;
    z[2] = 200;
    z[3] = 400;
    z[4] = 600;
}
{
    split($0,str,"\t");
    n = 0.0;
    for(i=1; i<=NF; i++)
    {
        centr[i] = str[i];
        n += 1.0;
        # printf("%d\t%f\t%.1f\t",i,centr[i],z[i]);
    }
    # print "";
    if (n > 2)
    {
        lsq(n,z,centr);
    }
}
function lsq(n,x,y)
{
    sx = 0.0
    sy = 0.0
    sxx = 0.0
    syy = 0.0
    sxy = 0.0
    eps = 0.0
    for (i=1;i<=n;i++)
    {
        sx += x[i]
        sy += y[i]
        sxx += x[i]*x[i]
        sxy += x[i]*y[i]
        syy += y[i]*y[i]
    }
    if ( (n==0) || ((n*sxx-sx*sx)==0) )
    {
        next;
    }
    # print "number of data points = " n;
    a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
    b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
    for(i=1;i<=n;i++)
    {
        ycalc[i] = a+b*x[i]
        dy[i] = y[i]-ycalc[i]
        eps += dy[i]*dy[i]
    }
    print "# Intercept =\t" a
    print "# Slope =\t" b
    for (i=1;i<=n;i++)
    {
        printf("%8g %8g %8g \n",x[i],y[i],ycalc[i])
    }
} # function lsq()
So, if we extrapolate to the place of the 4th point:
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 20.2393 <--(x3,y3)
600 22.7692 <<< (x4 = 600,y4 = 22.7692)
If we extrapolate to the place of the 3rd point:
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 23.6867 <<< (x3 = 400,y3 = 23.6867)
600 26.3158 <--(x4,y4)
And likewise for the 2nd point:
0 17.1685
200 19.35266 <<<
400 20.2393
600 26.3158
and for the 1st point:
0 18.1192 <<<
200 21.6875
400 20.2393
600 26.3158
My current output is the following:
$> ./prog.awk data.dat
# Intercept = 17.4537
# Slope = 0.0129968
0 17.1685 17.4537
200 21.6875 20.0531
400 20.2393 22.6525
600 26.3158 25.2518
Assuming the core calculation in the lsq function is OK (it looks about right, but I haven't scrutinized it), then that gives you the slope and intercept for the least sum of squares line of best fit for the input data set (parameters x, y, n). I'm not sure I understand the tail end of the function.
For your 'take three points and calculate the fourth' problem, the simplest way is to generate the 4 subsets (logically, by deleting one point from the set of four on each of four calls), and redo the calculation.
You need to call another function that takes the line data (slope, intercept) from lsq and interpolates (extrapolates) the value at another y value. That's a straightforward calculation (x = m * y + c), but you need to determine which y value is missing from the set of 3 you pass in.
You could 'optimize' (meaning 'complicate') this scheme by dropping one value at a time from the 'sums of squares' and 'sums' and 'sum of products' values, recalculating the slope and intercept, and then calculating the missing point again.
(I'll also observe that normally it would be the x-coordinates with the fixed values 0, 200, 400, 600 and the y-coordinates would be the values read. However, that's just a matter of orientation, so it is not crucial.)
Here's at least plausibly working code. Since awk automatically splits on white space, there's no need for you to split on tabs specifically; the read loop takes this into account. The code needs serious refactoring; there is a ton of repetition in it. However, I also have a job that I'm supposed to do.
#!/usr/bin/awk -f
BEGIN{
    z[1] = 0;
    z[2] = 200;
    z[3] = 400;
    z[4] = 600;
}
{
    for (i = 1; i <= NF; i++)
    {
        centr[i] = $i
    }
    if (NF > 2)
    {
        lsq(NF, z, centr);
    }
}
function lsq(n, x, y)
{
    if (n == 0) return
    sx = 0.0
    sy = 0.0
    sxx = 0.0
    syy = 0.0
    sxy = 0.0
    for (i = 1; i <= n; i++)
    {
        print "x[" i "] = " x[i] ", y[" i "] = " y[i]
        sx += x[i]
        sy += y[i]
        sxx += x[i]*x[i]
        sxy += x[i]*y[i]
        syy += y[i]*y[i]
    }
    if ((n*sxx - sx*sx) == 0) return
    # print "number of data points = " n;
    a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
    b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
    for (i = 1; i <= n; i++)
    {
        ycalc[i] = a+b*x[i]
    }
    print "# Intercept = " a
    print "# Slope = " b
    print "Line: x = " a " + " b " * y"
    for (i = 1; i <= n; i++)
    {
        printf("x = %8g, yo = %8g, yc = %8g\n", x[i], y[i], ycalc[i])
    }
    print ""
    print "Different subsets\n"
    for (drop = 1; drop <= n; drop++)
    {
        print "Subset " drop
        sx = sy = sxx = sxy = syy = 0
        j = 1
        for (i = 1; i <= n; i++)
        {
            if (i == drop) continue
            print "x[" j "] = " x[i] ", y[" j "] = " y[i]
            sx += x[i]
            sy += y[i]
            sxx += x[i]*x[i]
            sxy += x[i]*y[i]
            syy += y[i]*y[i]
            j++
        }
        if (((n-1)*sxx - sx*sx) == 0) continue
        a = (sxx*sy-sxy*sx)/((n-1)*sxx-sx*sx)
        b = ((n-1)*sxy-sx*sy)/((n-1)*sxx-sx*sx)
        print "Line: x = " a " + " b " * y"
        xt = x[drop]
        yt = a + b * xt;
        print "Interpolate: x = " xt ", y = " yt
    }
}
Since awk doesn't provide an easy way to pass back multiple values from a function, nor does it provide structures other than arrays (sometimes associative), it is perhaps not the best language for this task. On the other hand, it can be made to do the job. You might be able to bundle the least squares calculation in a function that returns an array containing the slope and intercept, and then use that. Your turn to explore options.
Given the script lsq.awk and the input file lsq.data shown, I get the output shown:
$ cat lsq.data
17.1685 21.6875 20.2393 26.3158
$ awk -f lsq.awk lsq.data
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
x[4] = 600, y[4] = 26.3158
# Intercept = 17.4537
# Slope = 0.0129968
Line: x = 17.4537 + 0.0129968 * y
x =        0, yo =  17.1685, yc =  17.4537
x =      200, yo =  21.6875, yc =  20.0531
x =      400, yo =  20.2393, yc =  22.6525
x =      600, yo =  26.3158, yc =  25.2518

Different subsets

Subset 1
x[1] = 200, y[1] = 21.6875
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 18.1192 + 0.0115708 * y
Interpolate: x = 0, y = 18.1192
Subset 2
x[1] = 0, y[1] = 17.1685
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 16.5198 + 0.0141643 * y
Interpolate: x = 200, y = 19.3526
Subset 3
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 600, y[3] = 26.3158
Line: x = 17.7985 + 0.0147205 * y
Interpolate: x = 400, y = 23.6867
Subset 4
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
Line: x = 18.163 + 0.007677 * y
Interpolate: x = 600, y = 22.7692
$
Edit: In the previous version of the answer, the subsets were multiplying by n instead of (n-1). The values in the revised output seem to agree with what you expect. The residual issues are presentational, not computational.