How to solve non linear optimization problem with scipy - python-3.x

I need to solve a nonlinear optimization problem in Python. I found that scipy can solve optimization problems, but I don't know what I am doing wrong: for some example input it cannot find the correct solution, which I did obtain on the NEOS server with the Knitro solver (AMPL).
My problem is this: given a set of points, find the largest ellipse that at most touches those points, so that no point ever lies inside it.
Theory
Formulating the optimization problem, I have a and b the semiaxis, phi the rotation, xc and yc the coordinates of the centre and points the list of points with each element in the form of [x, y] -> [0, 1] indices.
On paper the problem and the constraints are these, a, b, phi, xc, yc are real, the points are integers:
NEOS
The files I used in NEOS are these:
mod
dat
run
With successful results (complete):
xc = 143.012
yc = 262.634
a = 181.489
b = 140.429
phi = 1.43575
Python
So, my python code is this, it is my first time using scipy for optimization, so I don't exclude errors of understanding how it works from the documentation.
from typing import List
import numpy as np
from scipy.optimize import *
def ellipse_calc(
    points: List[List[int]],
    verbose: bool = False
):
    centre = [0, 0]
    for i in range(len(points)):
        centre[0] += points[i][0]
        centre[1] += points[i][1]
    centre[0] /= len(points)
    centre[1] /= len(points)
    if verbose:
        print(f'centre: {centre[0]:.2f}, {centre[1]:.2f}')
    max_x = max([p[0] for p in points])
    max_y = max([p[1] for p in points])
    min_x = min([p[0] for p in points])
    min_y = min([p[1] for p in points])
    initial_axis = 0.25 * (max_x - min_x + max_y - min_y)
    if verbose:
        print(initial_axis)
    constraints = [
        NonlinearConstraint(lambda x: x[0], 1, np.inf),
        NonlinearConstraint(lambda x: x[1], 1, np.inf),
        NonlinearConstraint(lambda x: x[2], 0, np.inf),
    ]
    for i in range(len(points)):
        constraints += [NonlinearConstraint(
            lambda x:
            (points[i][0] - x[3]) ** 2 * (np.cos(x[2]) ** 2 / x[0]**2 + np.sin(x[2]) ** 2 / x[1]**2) +
            (points[i][1] - x[4]) ** 2 * (np.sin(x[2]) ** 2 / x[0]**2 + np.cos(x[2]) ** 2 / x[1]**2) +
            2 * (points[i][0] - x[3]) * (points[i][1] - x[4]) *
            np.cos(x[2]) * np.sin(x[2]) * (1 / x[1]**2 - 1 / x[0]**2), 1, np.inf)]
    result = minimize(
        lambda x: -np.pi * x[0] * x[1],
        [initial_axis, initial_axis, 0, centre[0], centre[1]],
        constraints=constraints
    )
    print(result)
if __name__ == '__main__':
    points = [[50,44],[91,44],[161,44],[177,44],[44,88],[189,88],[239,88],[259,88],[2,132],[250,132],[2,176],[329,176],[2,220],[289,220],[2,264],[288,264],[2,308],[277,308],[2,352],[285,352],[2,396],[25,396],[35,396],[231,396],[284,396],[298,396],[36,440],[76,440],[106,440],[173,440]]
    ellipse_calc(points, True)
This attempt, which uses the same data I tried on NEOS, gives the following output:
fun: -8.992626773255127e+40
jac: array([-5.68832805e+20, -4.96651566e+20, -0.00000000e+00, -0.00000000e+00,
-0.00000000e+00])
message: 'Inequality constraints incompatible'
nfev: 54
nit: 10
njev: 9
status: 4
success: False
x: array([ 1.58089104e+20, 1.81065104e+20, -1.24564497e+15, -1.55647883e+10,
-2.76654483e+10])
Does anyone know what I am doing wrong and how to fix it? Also, I don't really know if it is possible to solve this problem with scipy, in that case I am looking for a free library to solve it or even to alternative methods of finding that ellipse equation

This isn't a complete answer, but it should help you to get started. Here are two hints:
Pass simple box constraints on the variables as boundaries, not as constraints. That is, use
bounds = [(1, None), (1, None), (0, None), (None, None), (None, None)]
and pass it to minimize via the bounds parameter.
You need to be really careful when defining constraints through lambda expressions inside a loop, see here. You need to capture the loop variable i by lambda x, i=i: your_fun. Otherwise, each of your constraints uses i=29 and thus evaluates the last point. This can easily be observed by evaluating all constraints for a specific value.
Then you should at least get a feasible solution with an objective value of 79384. Note also that you can shorten your code significantly by using numpy functions instead of loops.
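The two hints above can be sketched as follows. This is only a sketch under stated assumptions: the helper names `ellipse_value`, `build_constraints` and `fit_ellipse` are made up for illustration, the constraint formula is copied from the question as-is, and no particular solver result is claimed.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize


def ellipse_value(x, px, py):
    """Left-hand side of the per-point constraint from the question."""
    a, b, phi, xc, yc = x
    dx, dy = px - xc, py - yc
    c, s = np.cos(phi), np.sin(phi)
    return (dx ** 2 * (c ** 2 / a ** 2 + s ** 2 / b ** 2)
            + dy ** 2 * (s ** 2 / a ** 2 + c ** 2 / b ** 2)
            + 2 * dx * dy * c * s * (1 / b ** 2 - 1 / a ** 2))


def build_constraints(points):
    # p=p binds the current point as a default argument. A plain
    # "lambda x: ... p ..." would look p up when called and therefore
    # see only the last point of the loop in every constraint.
    return [
        NonlinearConstraint(lambda x, p=p: ellipse_value(x, p[0], p[1]), 1, np.inf)
        for p in points
    ]


def fit_ellipse(points):
    points = np.asarray(points, dtype=float)
    centre = points.mean(axis=0)
    initial_axis = 0.25 * np.sum(points.max(axis=0) - points.min(axis=0))
    x0 = [initial_axis, initial_axis, 0.0, centre[0], centre[1]]
    # Simple limits on single variables go into bounds, not constraints.
    bounds = [(1, None), (1, None), (0, None), (None, None), (None, None)]
    return minimize(lambda x: -np.pi * x[0] * x[1], x0,
                    bounds=bounds, constraints=build_constraints(points))
```

Evaluating `[c.fun(x) for c in build_constraints(points)]` at some fixed `x` is a quick way to confirm that every constraint really refers to its own point.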


Numba jit and Scipy

I have found a few posts on the subject here, but most of them did not have a useful answer.
I have a 3D NumPy dataset [images number, x, y] in which the probability that the pixel belongs to a class is stored as a float (0-1). I would like to correct the wrong segmented pixels (with high performance).
The probabilities are part of a movie in which objects are moving from right to left and possibly back again. The basic idea is that I fit the pixels with a Gaussian function or comparable function and look at around 15-30 images ( [i-15 : i+15 ,x, y] ). It is very probable that if the previous 5 pixels and the following 5 pixels are classified in this class, this pixel also belongs to this class.
To illustrate my problem I add a sample code, the results were calculated without the usage of numba:
from scipy.optimize import curve_fit
from scipy import exp
import numpy as np
from numba import jit
@jit
def fit(size_of_array, outputAI, correct_output):
    x = range(size_of_array[0])
    for i in range(size_of_array[1]):
        for k in range(size_of_array[2]):
            args, cov = curve_fit(gaus, x, outputAI[:, i, k])
            correct_output[2, i, k] = gaus(2, *args)
    return correct_output
@jit
def gaus(x, a, x0, sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))
if __name__ == '__main__':
    # output_AI = [imageNr, x, y] example 5, 2, 2
    # At position [2][1][1] is the error, the pixels before and after were classified to the class but not this pixel.
    # The objects do not move in such a speed, so the probability should be corrected.
    outputAI = np.array([[[0.1, 0], [0, 0]], [[0.8, 0.3], [0, 0.2]], [[1, 0.1], [0, 0.2]],
                         [[0.1, 0.3], [0, 0.2]], [[0.8, 0.3], [0, 0.2]]])
    correct_output = np.zeros(outputAI.shape)
    # I correct now in this example only all pixels in image 3, in the code a loop runs over the whole 3D array and
    # corrects every image and every pixel separately
    size_of_array = outputAI.shape
    correct_output = fit(size_of_array, outputAI, correct_output)
    # numba error: Compilation is falling back to object mode WITH looplifting enabled because Function "fit" failed
    # type inference due to: Untyped global name 'curve_fit': cannot determine Numba type of <class 'function'>
    print(correct_output[2])
    # [[9.88432346e-01 2.10068763e-01]
    # [6.02428922e-20 2.07921125e-01]]
    # The wrong pixel at position [0][0] was corrected from 0.2 to almost 1, the others are still not assigned
    # to the class.
Unfortunately numba does NOT work. I always get the following error:
Compilation is falling back to object mode WITH looplifting enabled because Function "fit" failed type inference due to: Untyped global name 'curve_fit': cannot determine Numba type of <class 'function'>
Update 04.08.2020
Currently I have this solution for my problem in mind. But I am open for further suggestions.
from scipy.optimize import curve_fit
from scipy import exp
import numpy as np
import time
def fit_without_scipy(input):
    x = range(input.size)
    # use the argument instead of the global outputAI[i]
    x0 = input.argmax()
    a = input.max()
    var = (input - input.mean())**2
    return a * np.exp(-(x - x0) ** 2 / (2 * var.mean()))
def fit(input):
    x = range(len(input))
    try:
        # use the argument instead of the global outputAI[i]
        args, cov = curve_fit(gaus, x, input)
        return gaus(x, *args)
    except:
        return input
def gaus(x, a, x0, sigma):
    return a * exp(-(x - x0) ** 2 / (2 * sigma ** 2))
if __name__ == '__main__':
    nr = 31
    N = 100000
    x = np.linspace(0, 30, nr)
    outputAI = np.zeros((N, nr))
    correct_output = outputAI.copy()
    correct_output_numba = outputAI.copy()
    perfekt_result = outputAI.copy()
    for i in range(N):
        perfekt_result[i] = gaus(x, np.random.random(), np.random.randint(-N, 2*N), np.random.random() * np.random.randint(0, 100))
        outputAI[i] = perfekt_result[i] + np.random.normal(0, 0.5, nr)
    start = time.time()
    for i in range(N):
        correct_output[i] = fit(outputAI[i])
    print("Time with scipy: " + str(time.time() - start))
    start = time.time()
    for i in range(N):
        correct_output_numba[i] = fit_without_scipy(outputAI[i])
    print("Time without scipy: " + str(time.time() - start))
    for i in range(N):
        correct_output[i] = abs(correct_output[i] - perfekt_result[i])
        correct_output_numba[i] = abs(correct_output_numba[i] - perfekt_result[i])
    print("Mean deviation with scipy: " + str(correct_output.mean()))
    print("Mean deviation without scipy: " + str(correct_output_numba.mean()))
Output [with nr = 31 and N = 100000]:
Time with scipy: 193.27853846549988 secs
Time without scipy: 2.782526969909668 secs
Mean deviation with scipy: 0.03508043754489116
Mean deviation without scipy: 0.0419951370808896
In the next step I would try to speed up the code even more with numba. Currently this does not work because of the argmax function.
curve_fit eventually calls into either least_squares (pure Python) or leastsq (C extension). You have three options:
figure out how to make numba-jitted code talk to a C extension which powers leastsq
extract relevant parts of least_squares and numba.jit them
implement the LowLevelCallable support for least_squares or minimize.
None of these is easy. OTOH all of these would be interesting to a wider audience if successful.
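Independently of the three options above, the moment-based estimate from the question's update can be vectorized with plain NumPy, which removes the Python loop without needing numba at all. The sketch below assumes a 2D array with one signal per row; the name `gaussian_estimate` is made up for illustration. It reproduces the question's heuristic (peak position from argmax, amplitude from max, variance of the sample values as the width) and is not a least-squares fit.

```python
import numpy as np


def gaussian_estimate(signals):
    """Row-wise version of the question's fit_without_scipy heuristic."""
    signals = np.asarray(signals, dtype=float)
    x = np.arange(signals.shape[1])
    # Keep the per-row quantities as column vectors so they broadcast over x.
    x0 = signals.argmax(axis=1)[:, None]
    a = signals.max(axis=1)[:, None]
    # np.var with its default ddof=0 equals ((s - s.mean()) ** 2).mean().
    var = signals.var(axis=1)[:, None]
    # Note: a constant row has zero variance and would divide by zero here.
    return a * np.exp(-(x - x0) ** 2 / (2 * var))
```

With the question's data this would be called once as `gaussian_estimate(outputAI)` instead of looping over the `N` rows.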

Tolerance for termination is ignored in scipy optimize minimize

I have a simple optimization problem that, with some specific data, makes scipy.optimize.minimize ignore the tol argument. From the documentation, tol determines the "tolerance for termination", that is, the maximum error accepted for the objective function, in my understanding (am I wrong?). However, in the following working example, when tol is set to 0.1 (or another small number), the optimization finishes with an "Optimization terminated successfully" message even though the objective function is > tol. Is this a bug in SciPy's method or am I misunderstanding something here?
The optimization problem: I need to make a linear combination of var1 and var2, which are two time series, scaling them by parameters Btd and Bta. I need that the mean of the linear combination approximates to a target value Target, a scalar. So I simply minimize the absolute difference between np.mean(Btd*var1 + Bta*var2) and Target. The constraints are that the scaling coefficients must be >0 and that the ratio of means np.mean(Btd*var1)/np.mean(Bta*var2) should approximate to the function gi/(1-gi), where gi is a scalar in the interval [0,1].
Reproducible code:
import numpy as np
import scipy.optimize as opt
# The data that exactly reproduce the error:
time = np.arange(1979,2011)
var2=np.array([ 88.95705521, 74.5398773 , 72.08588957, 65.64417178,
50. , 72.39263804, 77.3006135 , 72.08588957,
64.41717791, 96.62576687, 69.93865031, 84.96932515,
86.50306748, 82.20858896, 80.98159509, 73.00613497,
66.25766871, 67.48466258, 79.75460123, 65.64417178,
70.24539877, 84.66257669, 76.3803681 , 83.74233129,
83.74233129, 78.2208589 , 88.03680982, 87.73006135,
100. , 71.16564417, 73.6196319 , 85.58282209])
var1=np.array([300. , 420.89552239, 333.58208955, 355.97014925,
376.11940299, 510.44776119, 420.89552239, 434.32835821,
333.58208955, 394.02985075, 523.88059701, 411.94029851,
353.73134328, 434.32835821, 355.97014925, 398.50746269,
476.86567164, 371.64179104, 445.52238806, 544.02985075,
416.41791045, 427.6119403 , 541.79104478, 579.85074627,
429.85074627, 414.17910448, 420.89552239, 528.35820896,
577.6119403 , 490.29850746, 600. , 454.47761194])
X=np.transpose([var1, var2])
# Global parameters
Target = 3.0
gi = 0.7
# This model is a simple linear combination of the two time series.
def MyModel(modelparams, X, gi):
    Bta, Btd = modelparams
    Eta = Bta*X[:,0]
    Etd = Btd*X[:,1]
    Etot = Eta + Etd
    return Etot, Eta, Etd
# Objective function
def Obj(modelparams):
    Bta, Bdt = modelparams
    Etot, Eta, Etd = MyModel([Bta, Bdt], X, gi)
    return abs(np.mean(Etot)-Target)
# Ratio constraint
def Ratio(modelparams):
    import numpy as np
    Bta, Btd = modelparams
    Etot, Eta, Etd = MyModel([Bta, Btd], X, gi)
    A = np.mean(Etd)/np.mean(Eta)
    B = gi/(1-gi)
    # The epsilon comes in to loosen a bit only this constraint
    epsilon = 0.1
    return -abs(abs(A-B)-epsilon)
# This is my solution to make the parameters different from zero.
# The ineq-type constraint makes them >=0.
def TDPos(modelparams):
    Bta, Btd = modelparams
    return Btd - 10**(-5)
def TAPos(modelparams):
    Bta, Btd = modelparams
    return Bta - 10**(-5)
constraints=[{'type': 'ineq', 'fun': Ratio},
             {'type': 'ineq', 'fun': TDPos},
             {'type': 'ineq', 'fun': TAPos}]
# Bounds or Model Parameters
bounds=((0, None), (0, None))
# Minimize
modelparams0=[Target/np.nanmean(var1), Target/np.nanmean(var2)]
result = opt.minimize(Obj, modelparams0,
                      tol=0.1,
                      method='SLSQP',
                      options={'maxiter': 40000 }, #,'ftol': 0.1},
                      bounds=bounds,
                      constraints=constraints)
print(result)
Prints out:
fun: 3.0
jac: array([439.92537314, 77.31019938])
message: 'Optimization terminated successfully.'
nfev: 20
nit: 4
njev: 4
status: 0
success: True
x: array([0., 0.])
My problem:
fun: 3.0 > tol: 0.1
which is not desired.
TL;DR: scipy.optimize.minimize ignores the stop argument tol. Why?
EDIT: Moreover, the optimal solution [0, 0] ignores two of the ineq constraints, designed to make this couple of parameters > 10**(-5). Is this part of the same problem?
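One point can be checked independently of the data above. According to the SciPy documentation, tol makes the selected solver set its own solver-specific tolerance(s), and for SLSQP that is ftol, the "precision goal for the value of f in the stopping criterion". It is therefore a convergence tolerance, not a threshold that the objective value itself has to fall below. A minimal sketch with a made-up objective whose smallest possible value is 5:

```python
from scipy.optimize import minimize


def objective(x):
    # Smallest possible value is 5, reached at x = 1.
    return (x[0] - 1.0) ** 2 + 5.0


result = minimize(objective, x0=[0.0], method='SLSQP', tol=0.1)
print(result.success, result.message)
print(result.fun)
```

Here `result.fun` can never be smaller than 5, far above `tol=0.1`, yet the solver is still free to report success, because its stopping test looks at how much progress the iterations are still making rather than at the size of the objective. This sketch does not address the constraint violation mentioned in the EDIT.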

PyTorch doesn't seem to be optimizing correctly

I have posted this question on Data Science StackExchange site since StackOverflow does not support LaTeX. Linking it here because this site is probably more appropriate.
The question with correctly rendered LaTeX is here: https://datascience.stackexchange.com/questions/48062/pytorch-does-not-seem-to-be-optimizing-correctly
The idea is that I am considering sums of sine waves with different phases. The waves are sampled with some sample rate s in the interval [0, 2pi]. I need to select phases in such a way, that the sum of the waves at any sample point is minimized.
Below is the Python code. Optimization does not seem to be computed correctly.
import numpy as np
import torch
def phaseOptimize(n, s = 48000, nsteps = 1000):
    learning_rate = 1e-3
    theta = torch.zeros([n, 1], requires_grad=True)
    l = torch.linspace(0, 2 * np.pi, s)
    t = torch.stack([l] * n)
    T = t + theta
    for jj in range(nsteps):
        loss = T.sin().sum(0).pow(2).sum() / s
        loss.backward()
        theta.data -= learning_rate * theta.grad.data
    print('Optimal theta: \n\n', theta.data)
    print('\n\nMaximum value:', T.sin().sum(0).abs().max().item())
Below is a sample output.
phaseOptimize(5, nsteps=100)
Optimal theta:
tensor([[1.2812e-07],
[1.2812e-07],
[1.2812e-07],
[1.2812e-07],
[1.2812e-07]], requires_grad=True)
Maximum value: 5.0
I am assuming this has something to do with broadcasting in
T = t + theta
and/or the way I am computing the loss function.
One way to verify that optimization is incorrect, is to simply evaluate the loss function at random values for the array $\theta_1, \dots, \theta_n$, say uniformly distributed in $[0, 2\pi]$. The maximum value in this case is almost always much lower than the maximum value reported by phaseOptimize(). Much easier in fact is to consider the case with $n = 2$, and simply evaluate at $\theta_1 = 0$ and $\theta_2 = \pi$. In that case we get:
phaseOptimize(2, nsteps=100)
Optimal theta:
tensor([[2.8599e-08],
[2.8599e-08]])
Maximum value: 2.0
On the other hand,
theta = torch.FloatTensor([[0], [np.pi]])
l = torch.linspace(0, 2 * np.pi, 48000)
t = torch.stack([l] * 2)
T = t + theta
T.sin().sum(0).abs().max().item()
produces
3.2782554626464844e-07
You have to move computing T inside the loop, or it will always have the same constant value, thus constant loss.
Another thing is to initialize theta to different values at indices, otherwise because of the symmetric nature of the problem the gradient is the same for every index.
Another thing is that you need to zero gradient, because backward just accumulates them.
This seems to work:
def phaseOptimize(n, s = 48000, nsteps = 1000):
    learning_rate = 1e-1
    theta = torch.zeros([n, 1], requires_grad=True)
    theta.data[0][0] = 1
    l = torch.linspace(0, 2 * np.pi, s)
    t = torch.stack([l] * n)
    for jj in range(nsteps):
        T = t + theta
        loss = T.sin().sum(0).pow(2).sum() / s
        loss.backward()
        theta.data -= learning_rate * theta.grad.data
        theta.grad.zero_()
You're being bitten by both PyTorch and math. Firstly, you need to
Zero out the gradient by setting theta.grad = None before each backward step. Otherwise the gradients accumulate instead of overwriting the previous ones
You need to recalculate T at each step. PyTorch is not symbolic, unlike TensorFlow and T = t + theta means "T equals the sum of current t and current theta" and not "T equals the sum of t and theta, whatever their values may be at any time in the future".
With those fixes you get the following code:
def phaseOptimize(n, s = 48000, nsteps = 1000):
    learning_rate = 1e-3
    theta = torch.zeros(n, 1, requires_grad=True)
    l = torch.linspace(0, 2 * np.pi, s)
    t = torch.stack([l] * n)
    T = t + theta
    for jj in range(nsteps):
        T = t + theta
        loss = T.sin().sum(0).pow(2).sum() / s
        theta.grad = None
        loss.backward()
        theta.data -= learning_rate * theta.grad.data
    T = t + theta
    print('Optimal theta: \n\n', theta.data)
    print('\n\nMaximum value:', T.sin().sum(0).abs().max().item())
which will still not work as you expect because of math.
One can easily see that the minimum to your loss function is when theta are also uniformly spaced over [0, 2pi). The problem is that you are initializing your parameters as torch.zeros, which leads to all those values being equal (this is the polar opposite of equispaced!). Since your loss function is symmetrical with respect to permutations of theta, the computed gradients are equal and the gradient descent algorithm can never "differentiate them". In more mathematical terms, you're unlucky enough to initialize your algorithm exactly on a saddle point, so it cannot continue. If you add any noise, it will converge. For instance with
theta = torch.zeros(n, 1) + 0.001 * torch.randn(n, 1)
theta.requires_grad_(True)
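The symmetry argument can be checked without PyTorch. The sketch below swaps autograd for a hand-derived gradient in plain NumPy, purely so that it runs without torch; the names `loss_and_grad` and `descend` are made up for illustration, and the sample count and learning rate are arbitrary choices rather than values from the question.

```python
import numpy as np


def loss_and_grad(theta, t):
    # angles[i, k] = t[k] + theta[i]; wave_sum[k] = sum_i sin(angles[i, k])
    angles = t[None, :] + theta[:, None]
    wave_sum = np.sin(angles).sum(axis=0)
    loss = np.mean(wave_sum ** 2)
    # d loss / d theta[j] = (2 / s) * sum_k wave_sum[k] * cos(t[k] + theta[j])
    grad = 2.0 * (np.cos(angles) * wave_sum).mean(axis=1)
    return loss, grad


def descend(theta0, s=1000, nsteps=500, learning_rate=0.1):
    t = np.linspace(0.0, 2.0 * np.pi, s)
    theta = np.array(theta0, dtype=float)
    for _ in range(nsteps):
        # The gradient is recomputed from the current theta at every step.
        _, grad = loss_and_grad(theta, t)
        theta -= learning_rate * grad
    return theta, loss_and_grad(theta, t)[0]


# Identical starting phases: every component gets the same gradient.
stuck_theta, stuck_loss = descend(np.zeros(2))
# A small asymmetry is enough for the phases to separate.
free_theta, free_loss = descend([0.0, 0.01])
print(stuck_theta, stuck_loss)
print(free_theta, free_loss)
```

With identical starting values the two phases receive identical updates and stay equal, while the slightly perturbed start lets them drift apart until the two waves cancel.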

How to avoid NaN in numpy implementation of logistic regression?

EDIT: I already made significant progress. My current question is written after my last edit below and can be answered without the context.
I currently follow Andrew Ng's Machine Learning Course on Coursera and tried to implement logistic regression today.
Notation:
X is a (m x n)-matrix with vectors of input variables as rows (m training samples of n-1 variables, the entries of the first column are equal to 1 everywhere to represent a constant).
y is the corresponding vector of expected output samples (column vector with m entries equal to 0 or 1)
theta is the vector of model coefficients (row vector with n entries)
For an input row vector x the model will predict the probability sigmoid(x * theta.T) for a positive outcome.
This is my Python3/numpy implementation:
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
vec_sigmoid = np.vectorize(sigmoid)
def logistic_cost(X, y, theta):
    summands = np.multiply(y, np.log(vec_sigmoid(X*theta.T))) + np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
    return - np.sum(summands) / len(y)
def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1] # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)]) # init theta
    cost = [0.0 for i in range(num_iterations)]
    for it in range(num_iterations):
        error = np.repeat(vec_sigmoid(X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = logistic_cost(X, y, theta)
    return theta, cost
This implementation seems to work fine, but I encountered a problem when calculating the logistic-cost. At some point the gradient descent algorithm converges to a pretty good fitting theta and the following happens:
For some input row X_i with expected outcome 1, X_i * theta.T will become positive with a good margin (for example 23.207). This will lead to sigmoid(X_i * theta.T) becoming exactly 1.0000 (this is because of lost precision, I think). This is a good prediction (since the expected outcome is equal to 1), but it breaks the calculation of the logistic cost, since np.log(1 - vec_sigmoid(X*theta.T)) then evaluates log(0), which gives -inf. This shouldn't be a problem, since the term is multiplied with 1 - y = 0, but 0 * -inf = NaN, and once a value of NaN occurs, the whole calculation is broken.
How should I handle this in the vectorized implementation, since np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T))) is calculated in every row of X (not only where y = 0)?
Example input:
X = np.matrix([[1. , 0. , 0. ],
[1. , 1. , 0. ],
[1. , 0. , 1. ],
[1. , 0.5, 0.3],
[1. , 1. , 0.2]])
y = np.matrix([[0],
[1],
[1],
[0],
[1]])
Then theta, _ = gradient_descent(X, y, 10000, 10000) (yes, in this case we can set the learning rate this large) will set theta as:
theta = np.matrix([[-3000.04008972, 3499.97995514, 4099.98797308]])
This will lead to vec_sigmoid(X * theta.T) to be the really good prediction of:
np.matrix([[0.00000000e+00], # 0
[1.00000000e+00], # 1
[1.00000000e+00], # 1
[1.95334953e-09], # nearly zero
[1.00000000e+00]]) # 1
but logistic_cost(X, y, theta) evaluates to NaN.
EDIT:
I came up with the following solution. I just replaced the logistic_cost function with:
def new_logistic_cost(X, y, theta):
    term1 = vec_sigmoid(X*theta.T)
    term1[y == 0] = 1
    term2 = 1 - vec_sigmoid(X*theta.T)
    term2[y == 1] = 1
    summands = np.multiply(y, np.log(term1)) + np.multiply(1 - y, np.log(term2))
    return - np.sum(summands) / len(y)
By using the mask I just calculate log(1) at the places at which the result will be multiplied with zero anyway. Now log(0) will only happen in wrong implementations of gradient descent.
Open questions: How can I make this solution more clean? Is it possible to achieve a similar effect in a cleaner way?
If you don't mind using SciPy, you could import expit and xlog1py from scipy.special:
from scipy.special import expit, xlog1py
and replace the expression
np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
with
xlog1py(1 - y, -expit(X*theta.T))
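A sketch of how this could look for the whole cost function, using the example data from the question as plain 1-D/2-D arrays instead of np.matrix. The function names are made up for illustration, and the use of `xlogy` for the first term is an addition that is not part of the suggestion above; both `xlogy(a, b)` and `xlog1py(a, b)` return 0 when `a == 0`, which is what avoids the `0 * -inf` case.

```python
import numpy as np
from scipy.special import expit, xlog1py, xlogy


def logistic_cost_naive(X, y, theta):
    # Same formula as in the question; warnings are silenced to show the NaN.
    with np.errstate(over='ignore', divide='ignore', invalid='ignore'):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        summands = y * np.log(p) + (1 - y) * np.log(1 - p)
    return -np.sum(summands) / len(y)


def logistic_cost_stable(X, y, theta):
    p = expit(X @ theta)
    # xlogy(y, p) = y * log(p), xlog1py(1 - y, -p) = (1 - y) * log(1 - p),
    # each defined as 0 wherever the first argument is 0.
    summands = xlogy(y, p) + xlog1py(1 - y, -p)
    return -np.sum(summands) / len(y)


X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.5, 0.3],
              [1.0, 1.0, 0.2]])
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
theta = np.array([-3000.04008972, 3499.97995514, 4099.98797308])

print(logistic_cost_naive(X, y, theta))
print(logistic_cost_stable(X, y, theta))
```

The cost still becomes infinite when a sample is predicted with probability exactly 0 or 1 on the wrong side, which is the mathematically correct value in that case.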
I know it is an old question but I ran into the same problem, and maybe it can help others in the future, I actually solved it by implementing normalization on the data before appending X0.
def normalize_data(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X-mean) / std
After this all worked well!

Linear Regression algorithm works with one data-set but not on another, similar data-set. Why?

I created a linear regression algorithm following a tutorial and applied it to the data-set provided and it works fine. However the same algorithm does not work on another similar data-set. Can somebody tell me why this happens?
def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))
    params = int(theta.ravel().shape[1])
    cost = np.zeros(iters)
    for i in range(iters):
        err = (X * theta.T) - y
        for j in range(params):
            term = np.multiply(err, X[:,j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp
        cost[i] = computeCost(X, y, theta)
    return theta, cost
alpha = 0.01
iters = 1000
g, cost = gradientDescent(X, y, theta, alpha, iters)
print(g)
On running the algo through this dataset I get the output as matrix([[ nan, nan]]) and the following errors:
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: RuntimeWarning: overflow encountered in power
from ipykernel import kernelapp as app
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:11: RuntimeWarning: invalid value encountered in double_scalars
However this data set works just fine and outputs matrix([[-3.24140214, 1.1272942 ]])
Both the datasets are similar, I have been over it many times but can't seem to figure out why it works on one dataset but not on other. Any help is welcome.
Edit: Thanks Mark_M for editing tips :-)
[Much better question, btw]
It's hard to know exactly what's going on here, but basically your cost is going the wrong direction and spiraling out of control, which results in an overflow when you try to square the value.
I think in your case it boils down to your step size (alpha) being too big, which can cause gradient descent to overshoot and go the wrong way. You need to watch the cost in gradient descent and make sure it's always going down; if it's not, either something is broken or alpha is too large.
Personally, I would reevaluate the code and try to get rid of the loops. It's a matter of preference, but I find it easier to work with X and Y as column vectors. Here is a minimal example:
import numpy as np
from numpy import genfromtxt
# this is your 'bad' data set from github
my_data = genfromtxt('testdata.csv', delimiter=',')
def computeCost(X, y, theta):
    inner = np.power(((X @ theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))
def gradientDescent(X, y, theta, alpha, iters):
    for i in range(iters):
        # you don't need the extra loop - this can be vectorized
        # making it much faster and simpler
        theta = theta - (alpha/len(X)) * np.sum((X @ theta.T - y) * X, axis=0)
        cost = computeCost(X, y, theta)
        if i % 10 == 0: # just look at cost every ten loops for debugging
            print(cost)
    return (theta, cost)
# notice small alpha value
alpha = 0.0001
iters = 100
# here x is columns
X = my_data[:, 0].reshape(-1,1)
ones = np.ones([X.shape[0], 1])
X = np.hstack([ones, X])
# theta is a row vector
theta = np.array([[1.0, 1.0]])
# y is a columns vector
y = my_data[:, 1].reshape(-1,1)
g, cost = gradientDescent(X, y, theta, alpha, iters)
print(g, cost)
Another useful technique is to normalize your data before doing regression. This is especially useful when you have more than one feature.
As a side note: if your step size is right you shouldn't get overflows no matter how many iterations you do, because the cost will decrease with every iteration and the rate of decrease will slow.
After 1000 iterations I arrived at a theta and cost of:
[[ 1.03533399 1.45914293]] 56.041973778
after 100:
[[ 1.01166889 1.45960806]] 56.0481988054
You can use this to look at the fit in an iPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(my_data[:, 0].reshape(-1,1), y)
axes = plt.gca()
x_vals = np.array(axes.get_xlim())
y_vals = g[0][0] + g[0][1]* x_vals
plt.plot(x_vals, y_vals, '--')
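The points about step size and normalization can be illustrated on synthetic data. Everything in the sketch below is made up for illustration (the data, the function names, and the alpha values); it is not the data set from the question. The same gradient descent diverges on the raw feature with alpha = 0.01 and converges once the feature is standardized.

```python
import numpy as np


def compute_cost(X, y, theta):
    return np.sum((X @ theta - y) ** 2) / (2 * len(X))


def gradient_descent(X, y, alpha, iters):
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(iters):
        theta = theta - (alpha / len(X)) * (X.T @ (X @ theta - y))
        costs.append(compute_cost(X, y, theta))
    return theta, costs


# Synthetic data, made up for this sketch: y = 2x + 1 with x in [0, 100].
x = np.linspace(0.0, 100.0, 50)
y = 2.0 * x + 1.0

# Raw feature: alpha = 0.01 is too large for values of this size.
X_raw = np.column_stack([np.ones_like(x), x])
_, costs_raw = gradient_descent(X_raw, y, alpha=0.01, iters=20)

# Standardized feature: zero mean, unit standard deviation.
x_scaled = (x - x.mean()) / x.std()
X_scaled = np.column_stack([np.ones_like(x), x_scaled])
theta_scaled, costs_scaled = gradient_descent(X_scaled, y, alpha=0.1, iters=1000)

print(costs_raw[0], costs_raw[-1])
print(costs_scaled[0], costs_scaled[-1])
```

Note that the fitted theta for the standardized feature refers to the scaled x, so it has to be converted back (or new inputs scaled the same way) before making predictions.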
