I am trying to train a multi-class classifier with multinomial logistic regression and gradient descent. Specifically, the model will have a trained weights matrix w with shape (C, D) where C is the number of classes and D is the number of features of each input. Also, we will have a bias vector b with dimension (C,). We have an (N, D) input matrix X, where N is the number of training inputs, and a vector y with shape (N,), where each entry in y is a number from 0 to C - 1, indicating which class the input belongs to. I have written the following code:
for _ in range(max_iterations):
    z = np.apply_along_axis(lambda v: v - max(v), 1, X @ w.T + b)
    probs = np.exp(z)
    denom = np.sum(probs, axis=1)
    for i in range(C):
        for j in range(N):
            if i == y[j]:
                w[i] -= (step_size / N) * ((probs[j][i] / denom[j]) - 1) * X[j]
                b[i] -= (step_size / N) * ((probs[j][i] / denom[j]) - 1)
            else:
                w[i] -= (step_size / N) * (probs[j][i] / denom[j]) * X[j]
                b[i] -= (step_size / N) * (probs[j][i] / denom[j])
This produces the correct weights and bias that I want, but clearly it doesn't take advantage of numpy's operations to speed things up. So I tried to speed some of it up with the following code:
for _ in range(max_iterations):
    z = np.apply_along_axis(lambda v: v - max(v), 1, X @ w.T + b)
    probs = np.exp(z)
    denom = np.sum(probs, axis=1)
    s = np.zeros((N, C))
    for i in range(N):
        s[i] = probs[i] / denom[i]
    for i in range(N):
        s[i][y[i]] += -1
    for c in range(C):
        grad_w = s.T[c] @ X
        w[c] += (step_size / N) * grad_w
        b[c] += (step_size / N) * sum(s.T[c])
I was hoping that this would produce the same results as in the previous part while being faster... and it managed to be faster, but with incorrect results.
So I have a couple of questions. First, why is my second piece of code not producing the right results, and what would be a fix for it? Second, and more importantly, how would I optimize this further? This is mainly for me to learn how to take advantage of numpy's vectorized operations.
This may help with some of the iterations.
Start with a small 2d array:
In [251]: probs = np.arange(12).reshape(3,4)
In [252]: denom = np.sum(probs, axis=1)
In [253]: denom
Out[253]: array([ 6, 22, 38])
To divide a (3,4) array by a (3,), we need to make the latter (3,1):
In [254]: probs/denom[:,None]
Out[254]:
array([[0. , 0.16666667, 0.33333333, 0.5 ],
[0.18181818, 0.22727273, 0.27272727, 0.31818182],
[0.21052632, 0.23684211, 0.26315789, 0.28947368]])
Read, and reread, the numpy documentation on broadcasting if that doesn't make sense.
Another way to get the required 2d denom is:
In [255]: denom = np.sum(probs, axis=1, keepdims=True)
In [256]: denom
Out[256]:
array([[ 6],
[22],
[38]])
In [257]: probs/denom
Out[257]:
array([[0. , 0.16666667, 0.33333333, 0.5 ],
[0.18181818, 0.22727273, 0.27272727, 0.31818182],
[0.21052632, 0.23684211, 0.26315789, 0.28947368]])
The same should work for the max subtraction that you do with apply_along_axis. apply_along_axis is not a speed tool; it iterates at Python level and is not faster than simple iteration.
In [258]: np.max(probs, axis=1, keepdims=True)
Out[258]:
array([[ 3],
[ 7],
[11]])
In [259]: probs - _
Out[259]:
array([[-3, -2, -1, 0],
[-3, -2, -1, 0],
[-3, -2, -1, 0]])
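Putting these pieces together, one gradient-descent step can be written without any Python loops. The following is only a sketch: the helper names softmax_gd_step and loop_gd_step, the random test data and the seed are assumptions made for illustration, not part of the question. Note also that the second snippet in the question adds the gradient (+=) where the first one subtracts it (-=); since s already holds "probability minus one-hot", that sign flip is the likely reason for the incorrect results.

```python
import numpy as np

def softmax_gd_step(w, b, X, y, step_size):
    """One vectorized gradient-descent step (hypothetical helper)."""
    N = X.shape[0]
    z = X @ w.T + b                                # (N, C) class scores
    z = z - z.max(axis=1, keepdims=True)           # subtract row max for stability
    probs = np.exp(z)
    s = probs / probs.sum(axis=1, keepdims=True)   # (N, C) softmax probabilities
    s[np.arange(N), y] -= 1                        # subtract 1 at each true class
    w_new = w - (step_size / N) * (s.T @ X)        # (C, D); note the minus sign
    b_new = b - (step_size / N) * s.sum(axis=0)    # (C,)
    return w_new, b_new

def loop_gd_step(w, b, X, y, step_size):
    """Reference step using the double loop from the question."""
    N, C = X.shape[0], w.shape[0]
    z = X @ w.T + b
    z = z - z.max(axis=1, keepdims=True)
    probs = np.exp(z)
    denom = probs.sum(axis=1)
    w, b = w.copy(), b.copy()
    for i in range(C):
        for j in range(N):
            g = probs[j, i] / denom[j] - (1.0 if i == y[j] else 0.0)
            w[i] -= (step_size / N) * g * X[j]
            b[i] -= (step_size / N) * g
    return w, b

# Small random problem (assumed sizes) to compare the two versions.
rng = np.random.default_rng(0)
N, D, C = 20, 4, 3
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)
w0, b0 = rng.normal(size=(C, D)), rng.normal(size=C)

w_vec, b_vec = softmax_gd_step(w0, b0, X, y, step_size=0.1)
w_ref, b_ref = loop_gd_step(w0, b0, X, y, step_size=0.1)
print(np.allclose(w_vec, w_ref), np.allclose(b_vec, b_ref))
```

The integer-array index s[np.arange(N), y] replaces the loop that subtracts 1 at each sample's true class, and s.T @ X computes all C per-class gradients in one matrix product.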
I need to solve a nonlinear optimization problem in Python. I found that scipy can solve optimization problems, but I don't know what I am doing wrong: for some example input it fails to find the correct solution, which I do get from the Knitro solver (via AMPL) on the NEOS server.
My problem is: given a set of points, find the largest inscribed ellipse that at most touches those points, so that no point ever lies inside it.
Theory
In the formulation of the optimization problem, a and b are the semi-axes, phi is the rotation, xc and yc are the coordinates of the centre, and points is the list of points, each element having the form [x, y] (indices 0 and 1).
On paper, the problem and the constraints are as follows, where a, b, phi, xc and yc are real and the point coordinates are integers:
NEOS
The files I used in NEOS are these:
mod
dat
run
With successful results (complete):
xc = 143.012
yc = 262.634
a = 181.489
b = 140.429
phi = 1.43575
Python
This is my Python code. It is my first time using scipy for optimization, so I may have misunderstood how it works from the documentation.
from typing import List
import numpy as np
from scipy.optimize import *

def ellipse_calc(
    points: List[List[int]],
    verbose: bool = False
):
    centre = [0, 0]
    for i in range(len(points)):
        centre[0] += points[i][0]
        centre[1] += points[i][1]
    centre[0] /= len(points)
    centre[1] /= len(points)
    if verbose:
        print(f'centre: {centre[0]:.2f}, {centre[1]:.2f}')
    max_x = max([p[0] for p in points])
    max_y = max([p[1] for p in points])
    min_x = min([p[0] for p in points])
    min_y = min([p[1] for p in points])
    initial_axis = 0.25 * (max_x - min_x + max_y - min_y)
    if verbose:
        print(initial_axis)
    constraints = [
        NonlinearConstraint(lambda x: x[0], 1, np.inf),
        NonlinearConstraint(lambda x: x[1], 1, np.inf),
        NonlinearConstraint(lambda x: x[2], 0, np.inf),
    ]
    for i in range(len(points)):
        constraints += [NonlinearConstraint(
            lambda x:
            (points[i][0] - x[3]) ** 2 * (np.cos(x[2]) ** 2 / x[0]**2 + np.sin(x[2]) ** 2 / x[1]**2) +
            (points[i][1] - x[4]) ** 2 * (np.sin(x[2]) ** 2 / x[0]**2 + np.cos(x[2]) ** 2 / x[1]**2) +
            2 * (points[i][0] - x[3]) * (points[i][1] - x[4]) *
            np.cos(x[2]) * np.sin(x[2]) * (1 / x[1]**2 - 1 / x[0]**2), 1, np.inf)]
    result = minimize(
        lambda x: -np.pi * x[0] * x[1],
        [initial_axis, initial_axis, 0, centre[0], centre[1]],
        constraints=constraints
    )
    print(result)

if __name__ == '__main__':
    points = [[50,44],[91,44],[161,44],[177,44],[44,88],[189,88],[239,88],[259,88],[2,132],[250,132],[2,176],[329,176],[2,220],[289,220],[2,264],[288,264],[2,308],[277,308],[2,352],[285,352],[2,396],[25,396],[35,396],[231,396],[284,396],[298,396],[36,440],[76,440],[106,440],[173,440]]
    ellipse_calc(points, True)
This attempt, which uses the same data I tried on NEOS, gives the following output:
fun: -8.992626773255127e+40
jac: array([-5.68832805e+20, -4.96651566e+20, -0.00000000e+00, -0.00000000e+00,
-0.00000000e+00])
message: 'Inequality constraints incompatible'
nfev: 54
nit: 10
njev: 9
status: 4
success: False
x: array([ 1.58089104e+20, 1.81065104e+20, -1.24564497e+15, -1.55647883e+10,
-2.76654483e+10])
Does anyone know what I am doing wrong and how to fix it? Also, I don't know whether this problem can be solved with scipy at all. If it can't, I am looking for a free library that can solve it, or for alternative methods of finding that ellipse equation.
This isn't a complete answer, but it should help you to get started. Here are two hints:
Pass simple box constraints on the variables as boundaries, not as constraints. That is, use
bounds = [(1, None), (1, None), (0, None), (None, None), (None, None)]
and pass it to minimize via the bounds parameter.
You need to be really careful when defining constraints through lambda expressions inside a loop, see here. You need to capture the loop variable i by lambda x, i=i: your_fun. Otherwise, each of your constraints uses i=29 and thus evaluates the last point. This can easily be observed by evaluating all constraints for a specific value.
Then you should at least get a feasible solution with an objective value of 79384. Note also that you can shorten your code significantly by using numpy functions instead of loops.
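The loop-variable pitfall from the second hint can be reproduced in a few lines of plain Python. This is only an illustrative sketch: it uses the first three points from the question and a simplified stand-in expression (points[i][0] - x[3]) rather than the full ellipse constraint, and the names late and bound are made up for the demonstration.

```python
points = [[50, 44], [91, 44], [161, 44]]   # first three points from the question
x = [1.0, 1.0, 0.0, 0.0, 0.0]              # a, b, phi, xc, yc (arbitrary test value)

# Late binding: each lambda looks up i only when it is called,
# i.e. after the loop has finished and i holds its last value.
late = []
for i in range(len(points)):
    late.append(lambda x: points[i][0] - x[3])

# Default argument: i=i stores the current value of i inside each lambda.
bound = []
for i in range(len(points)):
    bound.append(lambda x, i=i: points[i][0] - x[3])

late_values = [f(x) for f in late]
bound_values = [f(x) for f in bound]
print(late_values)    # [161.0, 161.0, 161.0] -> every function sees the last point
print(bound_values)   # [50.0, 91.0, 161.0]   -> one function per point

# Box constraints from the first hint, in the order a, b, phi, xc, yc.
bounds = [(1, None), (1, None), (0, None), (None, None), (None, None)]
```

Evaluating all the constraint functions at one test value, as done here, is a quick way to see whether they really refer to different points.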
I have a non-generated 1D NumPy array. For now, we will use a generated one.
import numpy as np
arr1 = np.random.uniform(0, 100, 1_000)
I need a second array whose correlation with it is 0.3:
arr2 = '?'
print(np.corrcoef(arr1, arr2))
Out[1]: 0.3
I've adapted this answer by whuber on stats.SE to NumPy. The idea is to generate a second array noise randomly, and then compute the residuals of a least-squares linear regression of noise on arr1. The residuals necessarily have a correlation of 0 with arr1, and of course arr1 has a correlation of 1 with itself, so an appropriate linear combination a*arr1 + b*residuals can have any desired correlation.
import numpy as np
def generate_with_corrcoef(arr1, p):
    n = len(arr1)
    # generate noise
    noise = np.random.uniform(0, 1, n)
    # least squares linear regression for noise = m*arr1 + c
    m, c = np.linalg.lstsq(np.vstack([arr1, np.ones(n)]).T, noise)[0]
    # residuals have 0 correlation with arr1
    residuals = noise - (m*arr1 + c)
    # the right linear combination a*arr1 + b*residuals
    a = p * np.std(residuals)
    b = (1 - p**2)**0.5 * np.std(arr1)
    arr2 = a*arr1 + b*residuals
    # return a scaled/shifted result to have the same mean/sd as arr1
    # this doesn't change the correlation coefficient
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)
The last line scales the result so that the mean and standard deviation are the same as arr1's. However, arr1 and arr2 will not be identically distributed.
Usage:
>>> arr1 = np.random.uniform(0, 100, 1000)
>>> arr2 = generate_with_corrcoef(arr1, 0.3)
>>> np.corrcoef(arr1, arr2)
array([[1. , 0.3],
[0.3, 1. ]])
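The two facts the method relies on can be checked numerically. The sketch below is an illustration under assumptions: it uses a seeded default_rng generator for reproducibility and passes rcond=None to lstsq, neither of which appears in the function above.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator (assumption, for reproducibility)
arr1 = rng.uniform(0, 100, 1000)
noise = rng.uniform(0, 1, 1000)
p = 0.3

# Regress noise on arr1 (with an intercept column) and keep the residuals.
A = np.column_stack([arr1, np.ones_like(arr1)])
coef, *_ = np.linalg.lstsq(A, noise, rcond=None)
residuals = noise - A @ coef

# Fact 1: the residuals are uncorrelated with arr1 (up to rounding error).
r_resid = np.corrcoef(arr1, residuals)[0, 1]

# Fact 2: mixing with weights p*std(residuals) and sqrt(1 - p^2)*std(arr1)
# gives a correlation of p with arr1.
arr2 = p * np.std(residuals) * arr1 + np.sqrt(1 - p**2) * np.std(arr1) * residuals
r_mix = np.corrcoef(arr1, arr2)[0, 1]

print(abs(r_resid) < 1e-8, abs(r_mix - p) < 1e-8)
```

The weights work because the two components a*arr1 and b*residuals are uncorrelated and end up with variances in the ratio p**2 : (1 - p**2).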
I am not very experienced with sympy so sorry if this is a simple question.
How can I use sympy to expand binomial expressions? For example say I want to have sympy compute the coefficient of $x^2$ in the polynomial $(x^2 + x + 1)^n$ (where I would expect the answer to be $n + \binom{n}{2}$).
I tried the following code:
x = symbols('x')
n = symbols('n', integer=True, nonnegative = True)
expand((x**2+x+1)**n)
but the result is just $(x^2+x+1)^n$, whereas I want the binomial expansion.
Thanks in advance.
If the exponent is not symbolic then the following gives the coefficient very quickly for the power of an arbitrary polynomial with integer coefficients, e.g.,
>>> eq
x**3 + 3*x + 2
>>> (Poly(eq)**42).coeff_monomial(x**57)
2294988464559317378977138572972
But there is currently no routine that gives the coefficient when the exponent of the polynomial is symbolic. If a pattern can be seen in the coefficients, rsolve can be used to express the closed form:
>>> print([((x**2+x+1)**i).expand().coeff(x**2) for i in range(8)])
[0, 1, 3, 6, 10, 15, 21, 28]
>>> from sympy.abc import n
>>> f=Function('f') # f(n) represents the coefficient of x**2 for a given n
The coefficient for x^2 for a given n is n more than the last value:
>>> rsolve(f(n)-f(n-1)-n, f(n),{f(0):0,f(1):1})
n*(n + 1)/2
This final expression is the coefficient of x^2 for arbitrary n.
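As a sanity check, the closed form can be compared against direct expansion for a range of concrete exponents, and against the n + binomial(n, 2) guess from the question. This is a small sketch; the range of 12 exponents is an arbitrary choice.

```python
from sympy import Function, binomial, expand, rsolve
from sympy.abc import n, x

f = Function('f')   # f(n) represents the coefficient of x**2 for a given n

# Closed form from the recurrence f(n) = f(n-1) + n with f(0) = 0, f(1) = 1.
closed_form = rsolve(f(n) - f(n - 1) - n, f(n), {f(0): 0, f(1): 1})

# Coefficient of x**2 obtained by expanding for concrete exponents.
direct = [expand((x**2 + x + 1)**k).coeff(x, 2) for k in range(12)]

# Values predicted by the closed form and by the guess from the question.
from_formula = [closed_form.subs(n, k) for k in range(12)]
from_guess = [k + binomial(k, 2) for k in range(12)]

print(direct == from_formula, direct == from_guess)
```

Agreement on finitely many exponents is not a proof, but it guards against a mis-stated recurrence or wrong initial conditions.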
Issue 17889 gives a routine that will compute the coefficient of a term in a univariate polynomial (with arbitrary coefficients for each term) raised to the power of n:
>>> eq = 2 + x + x**2
>>> unicoeff(eq, 4).simplify()
Piecewise(
(0, n < 2),
(2**(n - 3)*n*(n - 1), n < 3),
(2**(n - 4)*n**2*(n - 1), n < 4),
(2**n*n*(n - 1)*(n**2 + 19*n + 6)/384, True))
>>> _.subs(n, 5)
210
>>> (eq**5).expand().coeff(x**4)
210
For your expression (where the constant is 1):
>>> unicoeff(1+x+x**2,2).simplify()
Piecewise((0, n < 1), (n, n < 2), (n*(n + 1)/2, True))
I'm trying to convert some code I have written in numpy which contains a nested-loop into tensor operations found in PyTorch. However, after trying to implement my own version I'm not getting the same value on the output. I have managed to do the same with a single loop, so I'm not entirely sure what I'm doing wrong.
#(Numpy Version)
#calculate Kinetic Energy
summation = 0.0
for i in range(0,len(k_values)-1):
    summation += (k_values[i]**2.0)*wavefp[i]*(((self.hbar*kp_values[i])**2.0)/(2.0*self.mu))*wavef[i]
Ek = step*(4.0*np.pi)*summation
#(Numpy Version)
#calculate Potential Energy
summation = 0.0
for i in range(0,len(k_values)-1):
    for j in range(0,len(kp_values)-1):
        summation+= (k_values[i]**2.0)*wavefp[i]*(kp_values[j]**2.0)*wavef[j]*self.MTV[i,j]
Ep = (step**2.0)*(4.0*np.pi)*(2.0/np.pi)*summation
#####################################################
#(PyTorch Version)
#calculate Kinetic Energy
Ek = step*(4.0*np.pi)*torch.sum( k_values.pow(2)*wavefp.mul(wavef)*((kp_values.mul(self.hbar)).pow(2)/(2.0*self.mu)) )
#(PyTorch Version)
#calculate Potential Energy
summation = 0.0
for i in range(0,len(k_values)-1):
    summation += ((k_values[i].pow(2)).mul(wavefp[i]))*torch.sum( (kp_values.pow(2)).mul(wavef).mul(self.MTV[i,:]) )
Ep = (step**2.0)*(4.0*np.pi)*(2.0/np.pi)*summation
The arrays/tensors k_values, kp_values, wavef, and wavefp have dimensions of (1000,1). The values self.hbar, and self.mu, and step are scalars. The variable self.MTV is a matrix of size (1000,1000).
I would expect both methods to give the same output, but they don't. The code for calculating the Kinetic Energy (in both Numpy and PyTorch) gives the same value. However, the potential energy calculations differ, and I'm not entirely sure why.
Many Thanks in advance!
The problem is in the shapes. kp_values and wavef have shape (1000, 1) and need to be converted to (1000,) before the multiplications. Otherwise (kp_values.pow(2)).mul(wavef).mul(MTV[i,:]) multiplies a (1000, 1) tensor by a (1000,) tensor, which broadcasts to a (1000, 1000) matrix, whereas you assumed it is a vector.
So, the following should work.
summation += ((k_values[i].pow(2)).mul(wavefp[i]))*torch.sum((kp_values.squeeze(1).pow(2)).mul(wavef.squeeze(1)).mul(MTV[i,:]))
And a loop-free Numpy and PyTorch solution would be:
import numpy as np
import torch

step = 1.0
k_values = np.random.randint(0, 100, size=(1000, 1)).astype("float") / 100
kp_values = np.random.randint(0, 100, size=(1000, 1)).astype("float") / 100
wavef = np.random.randint(0, 100, size=(1000, 1)).astype("float") / 100
wavefp = np.random.randint(0, 100, size=(1000, 1)).astype("float") / 100
MTV = np.random.randint(0, 100, size=(1000, 1000)).astype("float") / 100
# Numpy solution
term1 = k_values**2.0 * wavefp    # 1000 x 1, indexed by i
temp = kp_values**2.0 * wavef     # 1000 x 1, indexed by j
term2 = np.matmul(MTV, temp)      # 1000 x 1, row i holds sum_j MTV[i, j] * temp[j]
summation = np.sum(term1 * term2)
print(summation)
# PyTorch solution (convert the arrays to tensors first)
k_values, kp_values, wavef, wavefp, MTV = (
    torch.from_numpy(a) for a in (k_values, kp_values, wavef, wavefp, MTV))
term1 = k_values.pow(2).mul(wavefp)               # 1000 x 1
term2 = MTV.matmul(kp_values.pow(2).mul(wavef))   # 1000 x 1
summation = torch.sum(term1.mul(term2))           # scalar
print(summation.item())
Output
12660.407492918514
12660.407492918514
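Both the shape pitfall and the loop-free form can be checked on a small example. The sketch below uses NumPy only, since PyTorch follows the same broadcasting rules; the size n = 6, the seed and the random stand-in arrays are assumptions for illustration. Unlike the loops in the question, which stop at len(...) - 1 and therefore skip the last element, this sums over every index.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                              # small size so the double loop is cheap
term1 = rng.random((n, 1))         # stands in for k_values**2 * wavefp
temp = rng.random((n, 1))          # stands in for kp_values**2 * wavef
MTV = rng.random((n, n))           # deliberately not symmetric

# Shape pitfall: (n, 1) * (n,) broadcasts to (n, n), not to (n,).
pitfall_shape = (temp * MTV[0, :]).shape

# Reference double loop over all i, j.
loop_sum = 0.0
for i in range(n):
    for j in range(n):
        loop_sum += term1[i, 0] * temp[j, 0] * MTV[i, j]

# Loop-free form: sum_i term1[i] * (MTV @ temp)[i].
vec_sum = float(np.sum(term1 * (MTV @ temp)))

print(pitfall_shape, np.isclose(loop_sum, vec_sum))
```

Using a non-symmetric MTV in the check matters: with a symmetric matrix, multiplying from the wrong side would give the same number and hide an index mix-up.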
EDIT: I already made significant progress. My current question is written after my last edit below and can be answered without the context.
I currently follow Andrew Ng's Machine Learning Course on Coursera and tried to implement logistic regression today.
Notation:
X is a (m x n)-matrix with vectors of input variables as rows (m training samples of n-1 variables, the entries of the first column are equal to 1 everywhere to represent a constant).
y is the corresponding vector of expected output samples (column vector with m entries equal to 0 or 1)
theta is the vector of model coefficients (row vector with n entries)
For an input row vector x the model will predict the probability sigmoid(x * theta.T) for a positive outcome.
This is my Python3/numpy implementation:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

vec_sigmoid = np.vectorize(sigmoid)

def logistic_cost(X, y, theta):
    summands = np.multiply(y, np.log(vec_sigmoid(X*theta.T))) + np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
    return - np.sum(summands) / len(y)

def gradient_descent(X, y, learning_rate, num_iterations):
    num_parameters = X.shape[1]                                 # dim theta
    theta = np.matrix([0.0 for i in range(num_parameters)])     # init theta
    cost = [0.0 for i in range(num_iterations)]
    for it in range(num_iterations):
        error = np.repeat(vec_sigmoid(X * theta.T) - y, num_parameters, axis=1)
        error_derivative = np.sum(np.multiply(error, X), axis=0)
        theta = theta - (learning_rate / len(y)) * error_derivative
        cost[it] = logistic_cost(X, y, theta)
    return theta, cost
This implementation seems to work fine, but I encountered a problem when calculating the logistic-cost. At some point the gradient descent algorithm converges to a pretty good fitting theta and the following happens:
For some input row X_i with expected outcome 1, X_i * theta.T will become positive with a good margin (for example 23.207). This will lead to sigmoid(X_i * theta.T) becoming exactly 1.0000 (this is because of lost precision, I think). This is a good prediction (since the expected outcome is equal to 1), but it breaks the calculation of the logistic cost, since np.log(1 - vec_sigmoid(X*theta.T)) then evaluates to log(0), which is -inf. This shouldn't be a problem, since the term is multiplied by 1 - y = 0, but 0 * -inf is NaN, and once a NaN occurs the whole calculation is broken.
How should I handle this in the vectorized implementation, since np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T))) is calculated in every row of X (not only where y = 0)?
Example input:
X = np.matrix([[1. , 0. , 0. ],
[1. , 1. , 0. ],
[1. , 0. , 1. ],
[1. , 0.5, 0.3],
[1. , 1. , 0.2]])
y = np.matrix([[0],
[1],
[1],
[0],
[1]])
Then theta, _ = gradient_descent(X, y, 10000, 10000) (yes, in this case we can set the learning rate this large) will set theta as:
theta = np.matrix([[-3000.04008972, 3499.97995514, 4099.98797308]])
This will lead to vec_sigmoid(X * theta.T) to be the really good prediction of:
np.matrix([[0.00000000e+00], # 0
[1.00000000e+00], # 1
[1.00000000e+00], # 1
[1.95334953e-09], # nearly zero
[1.00000000e+00]]) # 1
but logistic_cost(X, y, theta) evaluates to NaN.
EDIT:
I came up with the following solution. I just replaced the logistic_cost function with:
def new_logistic_cost(X, y, theta):
    term1 = vec_sigmoid(X*theta.T)
    term1[y == 0] = 1
    term2 = 1 - vec_sigmoid(X*theta.T)
    term2[y == 1] = 1
    summands = np.multiply(y, np.log(term1)) + np.multiply(1 - y, np.log(term2))
    return - np.sum(summands) / len(y)
By using the mask I just calculate log(1) at the places at which the result will be multiplied with zero anyway. Now log(0) will only happen in wrong implementations of gradient descent.
Open questions: How can I make this solution more clean? Is it possible to achieve a similar effect in a cleaner way?
If you don't mind using SciPy, you could import expit and xlog1py from scipy.special:
from scipy.special import expit, xlog1py
and replace the expression
np.multiply(1 - y, np.log(1 - vec_sigmoid(X*theta.T)))
with
xlog1py(1 - y, -expit(X*theta.T))
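A complete cost function along these lines might look as follows. This is a sketch under assumptions: the name stable_logistic_cost is made up, plain ndarrays are used instead of np.matrix, and xlogy (also from scipy.special) is applied to the first term in the same way, which goes beyond the replacement suggested above.

```python
import numpy as np
from scipy.special import expit, xlogy, xlog1py

def stable_logistic_cost(X, y, theta):
    """Cross-entropy cost that treats 0 * log(0) as 0 (hypothetical helper)."""
    p = expit(X @ theta)    # predicted probabilities, shape (m,)
    # xlogy(a, b) = a*log(b) and xlog1py(a, b) = a*log1p(b);
    # both return 0 whenever a == 0, even if the log would be -inf.
    summands = xlogy(y, p) + xlog1py(1 - y, -p)
    return -np.sum(summands) / len(y)

# Example input and the converged theta from the question, as plain arrays.
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.5, 0.3],
              [1.0, 1.0, 0.2]])
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
theta = np.array([-3000.04008972, 3499.97995514, 4099.98797308])

cost = stable_logistic_cost(X, y, theta)
print(np.isfinite(cost))
```

With this input the only row that contributes noticeably is the fourth one (prediction about 1.95e-09 for y = 0), so the cost is tiny but finite instead of NaN.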
I know this is an old question, but I ran into the same problem and this may help others in the future. I solved it by normalizing the data before appending X0.
def normalize_data(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X-mean) / std
After this all worked well!
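For illustration, here is how the normalization could be combined with appending the constant column. This is a sketch: the feature values are taken from the example input in the question, and the variable names features, X_norm and X are assumptions. One presumable reason for normalizing first is that a constant column has standard deviation 0, so normalizing after appending X0 would divide by zero.

```python
import numpy as np

def normalize_data(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    return (X - mean) / std

# Raw features only, without the constant column.
features = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.3],
                     [1.0, 0.2]])

X_norm = normalize_data(features)    # each column now has mean 0 and std 1

# Append X0 = 1 as the first column only after normalizing.
X = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])
print(X.shape)
```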