I'm trying to estimate the parameters (v, n, k) defined in fit_func. I tried the default least squares fit but I couldn't find the parameters successfully.
def fit_func(x, v, n, k):
return v * x ** n / (k ** n + x ** n)
x = [2.5, 2.71317829, 4.08, 4.18604651, 5.19379845, 6.92,
7.98449612, 8.94, 9.92248062, 9.94, 12.36, 13.48837209]
y = [0.16054661, 0.14643943, 0.11639118, 0.11796543, 0.15609638, 0.29527088,
0.40774818, 0.51331307, 0.6163489, 0.61807529, 0.78372639, 0.78643515]
popt, pcov = curve_fit(fit_func, x, y)
print(popt)
plt.plot(x, y, '*')
plt.plot(x, fit_func(x, *popt), 'r')
plt.show()
I get the following error:
raise RuntimeError("Optimal parameters not found: " + errmsg)
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
I'm not sure if I have selected the right method.
Suggestions on alternate methods that I could use to estimate the parameters will be really helpful.
The function y(x) = v * x ** n / (k ** n + x ** n) = v / (k ** n * x ** (-n) + 1) is strictly increasing or decreasing not both. This is not the shape convenient for the data : See the figure below. This can be a cause of bad fitting.
Another possible cause of failure might be the initial values of the parameters v , k , n which have to be set in order to start the iterative computation for non-linear regression.
Just looking at the graph of the points distribution one can see that a cubic function would be more convenient. This is much simpler because the regression is linear and doesn't require initial guess of the parameters. The fitting is very good.
Related
trying to implement custom MLE for binomial distribution (for learning purpose) stuck with implantation of binomial coefficient in google JAX . there is no analog for scipy.special.binom() implemented.
what shall i use instead ?
The binomial coefficient for general real-valued inputs can be computed in terms of the gamma function, which is available in JAX via jax.scipy.special.gammaln. Here's one way you could define it:
def binom(x, y):
return jnp.exp(gammaln(x + 1) - gammaln(y + 1) - gammaln(x - y + 1))
Here is a (sequential) integer implementation using JAX.
def binom_int_seq(x : int, y : int):
def scan_body(carry, values):
n, d = values
carry = (carry*n)//d
return carry, None
y = max(y, x-y)
nd = jnp.concatenate(
(jnp.arange(y+2, x+1, dtype = 'u8')[:,None],
jnp.arange(2, x-y+1, dtype = 'u8')[:,None],),
axis = 1
)
bc, *_ = jax.lax.scan(scan_body, jnp.array(y+1, dtype = 'u8'), nd)
return bc
binom_int_seq_jit = jax.jit(binom_int_seq, static_argnums = (0, 1))
which gives
x, y = 60, 31
bc_ref = sp.special.comb(x, y, exact=True)
# 114449595062769120
binom_int_seq(x, y)-bc_ref
# DeviceArray(0, dtype=uint64)
# Using above logarithmic gamma function based implementation
binom(x, y)-bc_ref
# DeviceArray(496., dtype=float64, weak_type=True)
Keep in mind the binom_int_seq implementation is only correct if
(x-max(x-y, y))*sp.special.comb(x, y, exact=True) < jnp.iinfo(jnp.uint64).max
Unlike the real-valued version, the error will be sudden and catastrophic if this condition is not satisfied.
There may be other ways to increase this constraint, such as running cancellations based upon prime factorisation, without resorting to larger unsigned integers (/arbitrary precision).
A monoidal version could be implemented which computes the binomial coefficient numerator and denominator reductions then integer divides, but this places stricter constraints on the maximum arguments.
I'm trying to implement vectorized logistic regression on the Iris dataset. This is the implementation from Andrew Ng's youtube series on deep learning. My best predictions using this method have been 81% accuracy while sklearn's implementation achieves 100% with completely different values for coefficients and bias. Also, I cant seem to get get proper outputs from my cost function. I suspect it is an issue with computing the gradients of the weights and bias with respect to the cost function though in the course he provides all of the necessary equations ( unless there is something in the actual exercise which I don't have access to being left out.) My code is as follows.
n = 4
m = 150
y = y.reshape(1, 150)
X = X.reshape(4, 150)
W = np.zeros((4, 1))
b = np.zeros((1,1))
for epoch in range(1000):
Z = np.dot(W.T, X) + b
A = sigmoid(Z) # 1/(1 + e **(-Z))
J = -1/m * np.sum(y * np.log(A) + (1-y) * (1 - np.log(A))) #cost function
dz = A - y
dw = 1/m * np.dot(X, dz.T)
db = np.sum(dz)
W = W - 0.01 * dw
b = b - 0.01 * db
if epoch % 100 == 0:
print(J)
My output looks something like this.
-1.6126604413879289
-1.6185960074767125
-1.6242504226045396
-1.6296400635926438
-1.6347800862216104
-1.6396845400653066
-1.6443664703028427
-1.648838008214648
-1.653110451818512
-1.6571943378913891
W and b values are:
array([[-0.68262679, -1.56816916, 0.12043066, 1.13296948]])
array([[0.53087131]])
Where as sklearn outputs:
(array([[ 0.41498833, 1.46129739, -2.26214118, -1.0290951 ]]),
array([0.26560617]))
I understand sklearn uses L2 regularization but even when turned off it's still far from the correct values. Any help would be appreciated. Thanks
You are likely getting strange results because you are trying to use logistic regression where y is not a binary choice. Categorizing the iris data is a multiclass problem, y can be one of three values:
> np.unique(iris.target)
> array([0, 1, 2])
The cross entropy cost function expects y to either be one or zero. One way to handle this is the one vs all method.
You can check each class by making y a boolean of whether the iris in in one class or not. For example here you can make y a data set of either class 1 or not:
y = (iris.target == 1).astype(int)
With that your cost function and gradient descent should work, but you'll need to run it multiple times and pick the best score for each example. Andrew Ng's class talks about this method.
EDIT:
It's not clear what you are starting with for data. When I do this, don't reshape the inputs. So you should double check that all your multiplication is delivering the shapes you want. On thing I notice that's a little odd, is the last term in your cost function. I generally do this:
cost = -1/m * np.sum(Y*np.log(A) + (1-Y) * np.log(1-A))
not:
-1/m * np.sum(y * np.log(A) + (1-y) * (1 - np.log(A)))
Here's code that converges for me using the dataset from sklearn:
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
# Iris is a multiclass problem. Here, just calculate the probabily that
# the class is `iris_class`
iris_class = 0
Y = np.expand_dims((iris.target == iris_class).astype(int), axis=1)
# Y is now a data set of booleans indicating whether the sample is or isn't a member of iris_class
# initialize w and b
W = np.random.randn(4, 1)
b = np.random.randn(1, 1)
a = 0.1 # learning rate
m = Y.shape[0] # number of samples
def sigmoid(Z):
return 1/(1 + np.exp(-Z))
for i in range(1000):
Z = np.dot(X ,W) + b
A = sigmoid(Z)
dz = A - Y
dw = 1/m * np.dot(X.T, dz)
db = np.mean(dz)
W -= a * dw
b -= a * db
cost = -1/m * np.sum(Y*np.log(A) + (1-Y) * np.log(1-A))
if i%100 == 0:
print(cost)
I would like to change the distance used by KNeighborsClassifier from sklearn. By distance I mean the one in the feature space to see who are the neighbors of a point. More specifically, I want to use the following distance:
d(X1,X2) = 0.1 * |X1[0] - X2[0]| + 0.9*|X1[1] - X2[1]|
Thank you.
Just define your custom metric like this:
def mydist(X1, X2):
return 0.1 * abs(X1[0] - X2[0]) + 0.9*abs(X1[1] - X2[1])
Then initialize your KNeighboursClassifier using the metric parameter like this
clf = KNeighborsClassifier(n_neighbors=3,metric=mydist,)
You can read more about distances available in sklearn and custom distance measures here
Just be sure that according to the official documentation, your custom metric should follow the following properties
Non-negativity: d(x, y) >= 0
Identity: d(x, y) = 0 if and only if x == y
Symmetry: d(x, y) = d(y, x)
Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
Here is an example of custom metric as well.
I'm new to python (and programming in general) and want to make a polynomial fit using curve_fit, where the order of the polynomials (or the number of fit parameters) is variable.
I made this code which is working for a fixed number of 3 parameters a,b,c
# fit function
def fit_func(x, a,b,c):
p = np.polyval([a,b,c], x)
return p
# do the fitting
popt, pcov = curve_fit(fit_func, x_data, y_data)
But now I'd like to have my fit function to only depend on a number N of parameters instead of a,b,c,....
I'm guessing that's not a very hard thing to do, but because of my limited knowledge I can't get it work.
I've already looked at this question, but I wasn't able to apply it to my problem.
You can define the function to be fit to your data like this:
def fit_func(x, *coeffs):
y = np.polyval(coeffs, x)
return y
Then, when you call curve_fit, set the argument p0 to the initial guess of the polynomial coefficients. For example, this plot is generated by the script that follows.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# Generate a sample input dataset for the demonstration.
x = np.arange(12)
y = np.cos(0.4*x)
def fit_func(x, *coeffs):
y = np.polyval(coeffs, x)
return y
fit_results = []
for n in range(2, 6):
# The initial guess of the parameters to be found by curve_fit.
# Warning: in general, an array of ones might not be a good enough
# guess for `curve_fit`, but in this example, it works.
p0 = np.ones(n)
popt, pcov = curve_fit(fit_func, x, y, p0=p0)
# XXX Should check pcov here, but in this example, curve_fit converges.
fit_results.append(popt)
plt.plot(x, y, 'k.', label='data')
xx = np.linspace(x.min(), x.max(), 100)
for p in fit_results:
yy = fit_func(xx, *p)
plt.plot(xx, yy, alpha=0.6, label='n = %d' % len(p))
plt.legend(framealpha=1, shadow=True)
plt.grid(True)
plt.xlabel('x')
plt.show()
The parameters of polyval specify p is an array of coefficients from the highest to lowest. With x being a number or array of numbers to evaluate the polynomial at. It says, the following.
If p is of length N, this function returns the value:
p[0]*x**(N-1) + p[1]*x**(N-2) + ... + p[N-2]*x + p[N-1]
def fit_func(p,x):
z = np.polyval(p,x)
return z
e.g.
t= np.array([3,4,5,3])
y = fit_func(t,5)
503
which is if you do the math here is right.
I created a linear regression algorithm following a tutorial and applied it to the data-set provided and it works fine. However the same algorithm does not work on another similar data-set. Can somebody tell me why this happens?
def computeCost(X, y, theta):
inner = np.power(((X * theta.T) - y), 2)
return np.sum(inner) / (2 * len(X))
def gradientDescent(X, y, theta, alpha, iters):
temp = np.matrix(np.zeros(theta.shape))
params = int(theta.ravel().shape[1])
cost = np.zeros(iters)
for i in range(iters):
err = (X * theta.T) - y
for j in range(params):
term = np.multiply(err, X[:,j])
temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
theta = temp
cost[i] = computeCost(X, y, theta)
return theta, cost
alpha = 0.01
iters = 1000
g, cost = gradientDescent(X, y, theta, alpha, iters)
print(g)
On running the algo through this dataset I get the output as matrix([[ nan, nan]]) and the following errors:
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: RuntimeWarning: overflow encountered in power
from ipykernel import kernelapp as app
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:11: RuntimeWarning: invalid value encountered in double_scalars
However this data set works just fine and outputs matrix([[-3.24140214, 1.1272942 ]])
Both the datasets are similar, I have been over it many times but can't seem to figure out why it works on one dataset but not on other. Any help is welcome.
Edit: Thanks Mark_M for editing tips :-)
[Much better question, btw]
It's hard to know exactly what's going on here, but basically your cost is going the wrong direction and spiraling out of control, which results in an overflow when you try to square the value.
I think in your case it boils down to your step size (alpha) being too big which can cause gradient descent to go the wrong way. You need to watch the cost in gradient descent and makes sure it's always going down, if it's not either something is broken or alpha is to large.
Personally, I would reevaluate the code and try to get rid of the loops. It's a matter of preference, but I find it easier to work with X and Y as column vectors. Here is a minimal example:
from numpy import genfromtxt
# this is your 'bad' data set from github
my_data = genfromtxt('testdata.csv', delimiter=',')
def computeCost(X, y, theta):
inner = np.power(((X # theta.T) - y), 2)
return np.sum(inner) / (2 * len(X))
def gradientDescent(X, y, theta, alpha, iters):
for i in range(iters):
# you don't need the extra loop - this can be vectorize
# making it much faster and simpler
theta = theta - (alpha/len(X)) * np.sum((X # theta.T - y) * X, axis=0)
cost = computeCost(X, y, theta)
if i % 10 == 0: # just look at cost every ten loops for debugging
print(cost)
return (theta, cost)
# notice small alpha value
alpha = 0.0001
iters = 100
# here x is columns
X = my_data[:, 0].reshape(-1,1)
ones = np.ones([X.shape[0], 1])
X = np.hstack([ones, X])
# theta is a row vector
theta = np.array([[1.0, 1.0]])
# y is a columns vector
y = my_data[:, 1].reshape(-1,1)
g, cost = gradientDescent(X, y, theta, alpha, iters)
print(g, cost)
Another useful technique is to normalize your data before doing regression. This is especially useful when you have more than one feature you're trying to minimize.
As a side note - if you're step size is right you shouldn't get overflows no matter how many iterations you do because the cost will will decrease with every iteration and the rate of decrease will slow.
After 1000 iterations I arrived at a theta and cost of:
[[ 1.03533399 1.45914293]] 56.041973778
after 100:
[[ 1.01166889 1.45960806]] 56.0481988054
You can use this to look at the fit in an iPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(my_data[:, 0].reshape(-1,1), y)
axes = plt.gca()
x_vals = np.array(axes.get_xlim())
y_vals = g[0][0] + g[0][1]* x_vals
plt.plot(x_vals, y_vals, '--')