how to fix an 'invalid value encountered in power' error? - python-3.x

I am writing a function to evaluate and return a non linear system of equations, and give the jacobian. I then plan to call the function in a while loop to use the newton method to solve the system of equations.
I used the numpy package and read over its documentation, tried to limit the number of iterations, changed the dtype in the array and searched online to see if someone else had a similar problem.
This function is meant to solve a neoclassical growth model (a problem in macroeconomics) in finite time , T. The set of equations include T euler equations, T constraints, and one terminal condition. Thus the result should be an array of length 2T+1 containing the values of the equations, and a (2T+1)x(2T+1) jacobian matrix.
When I try to run the function for small array (arrays of length 1, and 3) it works perfectly. As soon as I try an array of length 5 or more, I start encountering RuntimeWarnings.
import numpy as np
def solver(args, params):
b,s,a,d = params[0], params[1], params[2], params[3]
guess = np.copy(args)
#Euler
euler = guess[:len(guess)//2]**(-sigma) - beta*guess[1:len(guess)//2+1]**(-sigma)*(1-delta+alpha*guess[len(guess)//2+1:]**(alpha-1))
#Budget Constraint
kzero_to_T = np.concatenate(([k0], guess[len(guess)//2+1:]))
bc_t = guess[:len(guess)//2] + guess[len(guess)//2+1:] - kzero_to_T[:-1]**alpha - (1-delta)*kzero_to_T[:-1]
bc_f = guess[len(guess)//2] -kzero_to_T[-1]**alpha - kzero_to_T[-1]*(1-delta)
bc = np.hstack((bc_t, bc_f))
Evals = np.concatenate((euler, bc))
# top half of the jacobian
jac_dot_5 = np.zeros((len(args)//2, len(args)))
for t in range(len(args)//2):
for i in range(len(args)):
if t == i and len(args)//2+(i+1)<=len(args):
jac_dot_5[t][t] = -sigma*args[t]**(-sigma-1)
jac_dot_5[t][t+1] = sigma*beta*args[t+1]*(1-delta+alpha*args[len(args)//2+(t+1)]**(alpha-1))
jac_dot_5[t][len(args)//2+(t+1)] = beta*args[t+1]**(-sigma)*alpha*(alpha-1)*args[len(args)//2+(t+1)]
# bottom half of the jacobian
jac_dot_1 = np.zeros((len(args)//2, len(args)))
for u in range(len(args)//2):
for v in range(len(args)):
if u==v and u>=1 and (len(args)//2 + u+1 < len(args)):
jac_dot_1[u][u] = 1
jac_dot_1[u][len(args)//2+(u)] = 1
jac_dot_1[u][len(args)//2+(u+1)] = -alpha*args[len(args)//2 + (u+1)]**(alpha-1) -(1-delta)
jac_dot_1[0][0] = 1
jac_dot_1[0][len(args)//2 +1] = 1
# last row of the jacobian
final_bc = np.zeros((1,len(args)))
final_bc[0][len(args)//2] = 1
final_bc[0][-1] = -alpha*args[-1]**(alpha-1) -(1-delta)
jac2Tn1 = np.concatenate((jac_dot_5, jac_dot_1, final_bc), axis=0)
point = coll.namedtuple('point', ['Output', 'Jacobian', 'Euler', 'BC'])
result = point(Output = Evals, Jacobian = jac2Tn1, Euler=euler, BC=bc )
return result
The code for implementing the algorithm:
p = (beta, sigma, alpha, delta)
for i in range(20):
k0 = np.linspace(2.49, 9.96, 20)[i]
vars0 = np.array([1,1,1,1,1], dtype=float)
vars1 = np.array([20,20,20,20,20], dtype=float)
Iter2= 0
while abs(solver(vars1,p).Output).max()>1e-8 and Iter2<300:
Iter2+=1
inv_jac1 = np.linalg.inv(solver(vars0,p).Jacobian)
vars1 = vars0 - inv_jac1#solver(vars0,p).Output
vars0=vars1
if Iter2 == 100:
break
I expect the output to be vars1 containing the updated values. The actual output is array([nan, nan, nan, nan, nan]). The way the function has been written, it should be able to give the output for inputs of arbitrary guesses of length 2T+1, where T is number of periods of time.
I get three error messages during the execution of the loop:
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:19: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:23: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:41: RuntimeWarning: invalid value encountered in double_scalars
I tried to code my issue from scratch and I couldn't make it any shorter- I need both, the evaluations of the equations and the jacobian to implement the algorithm. From my testing it looks like at some point the equation results (the solver(vars0,p).Output entry) become nan, but I am not sure why that would happen, the array should get close to 0, per the condition abs(solver(vars1,p).Output).max()>1e-8 and then just break out of the loop.

Related

Scipy Optimize Basin Hopping fails

I am working on a cost minimizing function to help with allocation/weights in a portfolio of stocks. I have the following code for the "Objective Function". This works when I tried it with 15 variables(stocks). However, when I tried it with 55 stocks it failed.
I have tried it with a smaller sample of stocks(15) and it works fine. The num_assets variable below is the number of stocks in the portfolio.
def get_metrics(weights):
weights = np.array(weights)
returnsR = np.dot(returns_annualR, weights )
volatilityR = np.sqrt(np.dot(weights.T, np.dot(cov_matrixR, weights)))
sharpeR = returnsR / volatilityR
drawdownR = np.multiply(weights, dailyDD).sum(axis=1, skipna =
True).min()
drawdownR = f(drawdownR)
calmarR = returnsR / drawdownR
results = (sharpeR * 0.3) + (calmarR * 0.7)
return np.array([returnsR, volatilityR, sharpeR, drawdownR, calmarR,
results])
def objective(weights):
# the number 5 is the index from the get_metrics array
return get_metrics(weights)[5] * -1
def check_sum(weights):
#return 0 if sum of the weights is 1
return np.sum(weights)-1
bound = (0.0,1.0)
bnds = tuple(bound for x in range (num_assets))
bx = list(bnds)
""" Custom step-function """
class RandomDisplacementBounds(object):
"""random displacement with bounds: see: https://stackoverflow.com/a/21967888/2320035
Modified! (dropped acceptance-rejection sampling for a more specialized approach)
"""
def __init__(self, xmin, xmax, stepsize=0.5):
self.xmin = xmin
self.xmax = xmax
self.stepsize = stepsize
def __call__(self, x):
"""take a random step but ensure the new position is within the bounds """
min_step = np.maximum(self.xmin - x, -self.stepsize)
max_step = np.minimum(self.xmax - x, self.stepsize)
random_step = np.random.uniform(low=min_step, high=max_step, size=x.shape)
xnew = x + random_step
return xnew
bounded_step = RandomDisplacementBounds(np.array([b[0] for b in bx]), np.array([b[1] for b in bx]))
minimizer_kwargs = {"method":"L-BFGS-B", "bounds": bnds}
globmin = sco.basinhopping(objective,
x0=num_assets*[1./num_assets],
minimizer_kwargs=minimizer_kwargs,
take_step=bounded_step,
disp=True)
The output should be an array of numbers that add up to 1 or 100%. However, this is not happening.
This function is a failure on my end as well. It failed to choose values which were lower -- ie., regardless of output from optimization function (negative or positive), it persisted until the parameter I was optimizing was as bad as it could possibly be. I suspect that since the function violates function encapsulation and relies on "function attributes" to adjust stepsize, the developer may not have respected encapsulated function scope elsewhere, and surprising behavior is happening as a result.
Regardless, in terms of theory, anything else is just a (dubious) estimated numerical partial second derivative (numerical Hessian, or "estimated curvature" for us mere mortals) based "performance" "gain", which reduces to a randomly-biased annealer in discrete, chaotic (phase space, continuous) or mixed (continuous and discrete) search spaces with volatile curvatures or planar areas (due to numerical underflow and loss of precision).
Anyways, import the following:
scipy.optimize.dual_anneal
dual anneal

Iterations over 2d numpy arrays with while and for statements

In the code supplied below I am trying to iterate over 2D numpy array [i][k]
Originally it is a code which was written in Fortran 77 which is older than my grandfather. I am trying to adapt it to python.
(for people interested whatabouts: it is a simple hydraulics transients event solver)
Bear in mind that all variables are introduced in my code which I don't paste here.
H = np.zeros((NS,50))
Q = np.zeros((NS,50))
Here I am assigning the first row values:
for i in range(NS):
H[0][i] = HR-i*R*Q0**2
Q[0][i] = Q0
CVP = .5*Q0**2/H[N]
T = 0
k = 0
TAU = 1
#Interior points:
HP = np.zeros((NS,50))
QP = np.zeros((NS,50))
while T<=Tmax:
T += dt
k += 1
for i in range(1,N):
CP = H[k][i-1]+Q[k][i-1]*(B-R*abs(Q[k][i-1]))
CM = H[k][i+1]-Q[k][i+1]*(B-R*abs(Q[k][i+1]))
HP[k][i-1] = 0.5*(CP+CM)
QP[k][i-1] = (HP[k][i-1]-CM)/B
#Boundary Conditions:
HP[k][0] = HR
QP[k][0] = Q[k][1]+(HP[k][0]-H[k][1]-R*Q[k][1]*abs(Q[k][1]))/B
if T == Tc:
TAU = 0
CV = 0
else:
TAU = (1.-T/Tc)**Em
CV = CVP*TAU**2
CP = H[k][N-1]+Q[k][N-1]*(B-R*abs(Q[k][N-1]))
QP[k][N] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
HP[k][N] = CP-B*QP[k][N]
for i in range(NS):
H[k][i] = HP[k][i]
Q[k][i] = QP[k][i]
Remember i is for rows and k is for columns
What I am expecting is that for all k number of columns the values should be calculated until T<=Tmax condition is met. I cannot figure out what my mistake is, I am getting the following errors:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
RuntimeWarning: invalid value encountered in multiply
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.
Looking at your first iteration:
H = np.zeros((NS,50))
Q = np.zeros((NS,50))
for i in range(NS):
H[0][i] = HR-i*R*Q0**2
Q[0][i] = Q0
The shape of H is (NS,50), but when you iterate over a range(NS) you apply that index to the 2nd dimension. Why? Shouldn't it apply to the dimension with size NS?
In numpy arrays have 'C' order by default. Last dimension is inner most. They can have a F (fortran) order, but let's not go there. Thinking of the 2d array as a table, we typically talk of rows and columns, though they don't have a formal definition in numpy.
Lets assume you want to set the first column to these values:
for i in range(NS):
H[i, 0] = HR - i*R*Q0**2
Q[i, 0] = Q0
But we can do the assignment whole rows or columns at a time. I believe new versions of Fortran also have these 'whole-array' functions.
Q[:, 0] = Q0
H[:, 0] = HR - np.arange(NS) * R * Q0**2
One point of caution when translating to Python. Indexing starts with 0; so does ranges and np.arange(...).
H[0][i] is functionally the same as H[0,i]. But when using slices you have to use the H[:,i] format.
I suspect your other iterations have similar problems, but I'll stop here for now.
Regarding the errors:
The first:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
You initialize H as zeros so it is normal that it complains of division by zero. Maybe you should add a conditional.
The third:
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.
You define CVP = .5*Q0**2/H[N] and then CV = CVP*TAU**2 which is a sequence. And then you try to assign a derivate form it to QP[N][K] which is an element. You are trying to insert an array to a value.
For the second error I think it might be related to the third. If you could provide more information I would like to try to understand what happens.
Hope this has helped.

python scipy fmin not completing succesfully

I have a function that I am attempting to minimize for multiple values. For some values it terminates successfully however for others the error
Warning: Maximum number of function evaluations has been exceeded.
Is the error that is given. I am unsure of the role of maxiter and maxfun and how to increase or decrease these in order to successfully get to the minimum. My understanding is that these values are optional so I am unsure of what the default values are.
# create starting parameters, parameters equal to sin(x)
a = 1
k = 0
h = 0
wave_params = [a, k, h]
def wave_func(func_params):
"""This function calculates the difference between a sinewave (sin(x)) and raw_data (different sin wave)
This is the function that will be minimized by modulating a, b, k, and h parameters in order to minimize
the difference between curves."""
a = func_params[0]
b = 1
k = func_params[1]
h = func_params[2]
y_wave = a * np.sin((x_vals-h)/b) + k
error = np.sum((y_wave - raw_data) * (y_wave - raw_data))
return error
wave_optimized = scipy.optimize.fmin(wave_func, wave_params)
You can try using scipy.optimize.minimize with method='Nelder-Mead' https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
Then you can just do
minimum = scipy.optimize.minimize(wave_func, wave_params, method='Nelder-Mead')
n_function_evaluations = minimum.nfev
n_iterations = minimum.nit
or you can customize the search algorithm like this:
minimum = scipy.optimize.minimize(
wave_func, wave_params, method='Nelder-Mead',
options={'maxiter': 10000, 'maxfev': 8000}
)
I don't know anything about fmin, but my guess is that it behaves extremely similarly.

Euler beam, solving differential equation in python

I must solve the Euler Bernoulli differential beam equation which is:
w’’’’(x) = q(x)
and boundary conditions:
w(0) = w(l) = 0
and
w′′(0) = w′′(l) = 0
The beam is as shown on the picture below:
beam
The continious force q is 2N/mm.
I have to use shooting method and scipy.integrate.odeint() func.
I can't even manage to start as i do not understand how to write the differential equation as a system of equation
Can someone who understands solving of differential equations with boundary conditions in python please help!
Thanks :)
The shooting method
To solve the fourth order ODE BVP with scipy.integrate.odeint() using the shooting method you need to:
1.) Separate the 4th order ODE into 4 first order ODEs by substituting:
u = w
u1 = u' = w' # 1
u2 = u1' = w'' # 2
u3 = u2' = w''' # 3
u4 = u3' = w'''' = q # 4
2.) Create a function to carry out the derivation logic and connect that function to the integrate.odeint() like this:
function calc(u, x , q)
{
return [u[1], u[2], u[3] , q]
}
w = integrate.odeint(calc, [w(0), guess, w''(0), guess], xList, args=(q,))
Explanation:
We are sending the boundary value conditions to odeint() for x=0 ([w(0), w'(0) ,w''(0), w'''(0)]) which calls the function calc which returns the derivatives to be added to the current state of w. Note that we are guessing the initial boundary conditions for w'(0) and w'''(0) while entering the known w(0)=0 and w''(0)=0.
Addition of derivatives to the current state of w occurs like this:
# the current w(x) value is the previous value plus the current change of w in dx.
w(x) = w(x-dx) + dw/dx
# others are calculated the same
dw(x)/dx = dw(x-dx)/dx + d^2w(x)/dx^2
# etc.
This is why we are returning values [u[1], u[2], u[3] , q] instead of [u[0], u[1], u[2] , u[3]] from the calc function, because u[1] is the first derivative so we add it to w, etc.
3.) Now we are able to set up our shooting method. We will be sending different initial boundary values for w'(0) and w'''(0) to odeint() and then check the end result of the returned w(x) profile to determine how close w(L) and w''(L) got to 0 (the known boundary conditions).
The program for the shooting method:
# a function to return the derivatives of w
def returnDerivatives(u, x, q):
return [u[1], u[2], u[3], q]
# a shooting funtion which takes in two variables and returns a w(x) profile for x=[0,L]
def shoot(u2, u4):
# the number of x points to calculate integration -> determines the size of dx
# bigger number means more x's -> better precision -> longer execution time
xSteps = 1001
# length of the beam
L= 1.0 # 1m
xSpace = np.linspace(0, L, xSteps)
q = 0.02 # constant [N/m]
# integrate and return the profile of w(x) and it's derivatives, from x=0 to x=L
return odeint(returnDerivatives, [ 0, u2, 0, u4] , xSpace, args=(q,))
# the tolerance for our results.
tolerance = 0.01
# how many numbers to consider for u2 and u4 (the guess boundary conditions)
u2_u4_maxNumbers = 1327 # bigger number, better precision, slower program
# you can also divide into separate variables like u2_maxNum and u4_maxNum
# these are already tested numbers (the best results are somewhere in here)
u2Numbers = np.linspace(-0.1, 0.1, u2_u4_maxNumbers)
# the same as above
u4Numbers = np.linspace(-0.5, 0.5, u2_u4_maxNumbers)
# result list for extracted values of each w(x) profile => [u2Best, u4Best, w(L), w''(L)]
# which will help us determine if the w(x) profile is inside tolerance
resultList = []
# result list for each U (or w(x) profile) => [w(x), w'(x), w''(x), w'''(x)]
resultW = []
# start generating numbers for u2 and u4 and send them to odeint()
for u2 in u2Numbers:
for u4 in u4Numbers:
U = []
U = shoot(u2,u4)
# get only the last row of the profile to determine if it passes tolerance check
result = U[len(U)-1]
# only check w(L) == 0 and w''(L) == 0, as those are the known boundary cond.
if (abs(result[0]) < tolerance) and (abs(result[2]) < tolerance):
# if the result passed the tolerance check, extract some values from the
# last row of the w(x) profile which we will need later for comaprisons
resultList.append([u2, u4, result[0], result[2]])
# add the w(x) profile to the list of profiles that passed the tolerance
# Note: the order of resultList is the same as the order of resultW
resultW.append(U)
# go through the resultList (list of extracted values from last row of each w(x) profile)
for i in range(len(resultList)):
x = resultList[i]
# both boundary conditions are 0 for both w(L) and w''(L) so we will simply add
# the two absolute values to determine how much the sum differs from 0
y = abs(x[2]) + abs(x[3])
# if we've just started set the least difference to the current
if i == 0:
minNum = y # remember the smallest difference to 0
index = 0 # remember index of best profile
elif y < minNum:
# current sum of absolute values is smaller
minNum = y
index = i
# print out the integral for w(x) over the beam
sum = 0
for i in resultW[index]:
sum = sum + i[0]
print("The integral of w(x) over the beam is:")
print(sum/1001) # sum/xSteps
This outputs:
The integral of w(x) over the beam is:
0.000135085272117
To print out the best profile for w(x) that we found:
print(resultW[index])
which outputs something like:
# w(x) w'(x) w''(x) w'''(x)
[[ 0.00000000e+00 7.54147813e-04 0.00000000e+00 -9.80392157e-03]
[ 7.54144825e-07 7.54142917e-04 -9.79392157e-06 -9.78392157e-03]
[ 1.50828005e-06 7.54128237e-04 -1.95678431e-05 -9.76392157e-03]
...,
[ -4.48774290e-05 -8.14851572e-04 1.75726275e-04 1.01560784e-02]
[ -4.56921910e-05 -8.14670764e-04 1.85892353e-04 1.01760784e-02]
[ -4.65067671e-05 -8.14479780e-04 1.96078431e-04 1.01960784e-02]]
To double check the results from above we will also solve the ODE using the numerical method.
The numerical method
To solve the problem using the numerical method we first need to solve the differential equations. We will get four constants which we need to find with the help of the boundary conditions. The boundary conditions will be used to form a system of equations to help find the necessary constants.
For example:
w’’’’(x) = q(x);
means that we have this:
d^4(w(x))/dx^4 = q(x)
Since q(x) is constant after integrating we have:
d^3(w(x))/dx^3 = q(x)*x + C
After integrating again:
d^2(w(x))/dx^2 = q(x)*0.5*x^2 + C*x + D
After another integration:
dw(x)/dx = q(x)/6*x^3 + C*0.5*x^2 + D*x + E
And finally the last integration yields:
w(x) = q(x)/24*x^4 + C/6*x^3 + D*0.5*x^2 + E*x + F
Then we take a look at the boundary conditions (now we have expressions from above for w''(x) and w(x)) with which we make a system of equations to solve the constants.
w''(0) => 0 = q(x)*0.5*0^2 + C*0 + D
w''(L) => 0 = q(x)*0.5*L^2 + C*L + D
This gives us the constants:
D = 0 # from the first equation
C = - 0.01 * L # from the second (after inserting D=0)
After repeating the same for w(0)=0 and w(L)=0 we obtain:
F = 0 # from first
E = 0.01/12.0 * L^3 # from second
Now, after we have solved the equation and found all of the integration constants we can make the program for the numerical method.
The program for the numerical method
We will make a FOR loop to go through the entire beam for every dx at a time and sum up (integrate) w(x).
L = 1.0 # in meters
step = 1001.0 # how many steps to take (dx)
q = 0.02 # constant [N/m]
integralOfW = 0.0; # instead of w(0) enter the boundary condition value for w(0)
result = []
for i in range(int(L*step)):
x= i/step
w = (q/24.0*pow(x,4) - 0.02/12.0*pow(x,3) + 0.01/12*pow(L,3)*x)/step # current w fragment
# add up fragments of w for integral calculation
integralOfW += w
# add current value of w(x) to result list for plotting
result.append(w*step);
print("The integral of w(x) over the beam is:")
print(integralOfW)
which outputs:
The integral of w(x) over the beam is:
0.00016666652805511192
Now to compare the two methods
Result comparison between the shooting method and the numerical method
The integral of w(x) over the beam:
Shooting method -> 0.000135085272117
Numerical method -> 0.00016666652805511192
That's a pretty good match, now lets see check the plots:
From the plots it's even more obvious that we have a good match and that the results of the shooting method are correct.
To get even better results for the shooting method increase xSteps and u2_u4_maxNumbers to bigger numbers and you can also narrow down the u2Numbers and u4Numbers to the same set size but a smaller interval (around the best results from previous program runs). Keep in mind that setting xSteps and u2_u4_maxNumbers too high will cause your program to run for a very long time.
You need to transform the ODE into a first order system, setting u0=w one possible and usually used system is
u0'=u1,
u1'=u2,
u2'=u3,
u3'=q(x)
This can be implemented as
def ODEfunc(u,x): return [ u[1], u[2], u[3], q(x) ]
Then make a function that shoots with experimental initial conditions and returns the components of the second boundary condition
def shoot(u01, u03): return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1,[0,2]]
Now you have a function of two variables with two components and you need to solve this 2x2 system with the usual methods. As the system is linear, the shooting function is linear as well and you only need to find the coefficients and solve the resulting linear system.

matrices are not aligned Error: Python SciPy fmin_bfgs

Problem Synopsis:
When attempting to use the scipy.optimize.fmin_bfgs minimization (optimization) function, the function throws a
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
error. According to my error checking this occurs at the very end of the first iteration through fmin_bfgs--just before any values are returned or any calls to callback.
Configuration:
Windows Vista
Python 3.2.2
SciPy 0.10
IDE = Eclipse with PyDev
Detailed Description:
I am using the scipy.optimize.fmin_bfgs to minimize the cost of a simple logistic regression implementation (converting from Octave to Python/SciPy). Basically, the cost function is named cost_arr function and the gradient descent is in gradient_descent_arr function.
I have manually tested and fully verified that *cost_arr* and *gradient_descent_arr* work properly and return all values properly. I also tested to verify that the proper parameters are passed to the *fmin_bfgs* function. Nevertheless, when run, I get the ValueError: matrices are not aligned. According to the source review, the exact error occurs in the
def line_search_wolfe1
function in # Minpack's Wolfe line and scalar searches as supplied by the scipy packages.
Notably, if I use scipy.optimize.fmin instead, the fmin function runs to completion.
Exact Error:
File
"D:\Users\Shannon\Programming\Eclipse\workspace\SBML\sbml\LogisticRegression.py",
line 395, in fminunc_opt
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
File
"C:\Python32x32\lib\site-packages\scipy\optimize\optimize.py", line
533, in fmin_bfgs old_fval,old_old_fval)
File "C:\Python32x32\lib\site-packages\scipy\optimize\linesearch.py", line
76, in line_search_wolfe1
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
I call the optimization function with:
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
I have spent a few days trying to fix this and cannot seem to determine what is causing the matrices are not aligned error.
ADDENDUM: 2012-01-08
I worked with this a lot more and seem to have narrowed the issues (but am baffled on how to fix them). First, fmin (using just fmin) works using these functions--cost, gradient. Second, the cost and the gradient functions both accurately return expected values when tested in a single iteration in a manual implementation (NOT using fmin_bfgs). Third, I added error code to optimize.linsearch and the error seems to be thrown at def line_search_wolfe1 in line: derphi0 = np.dot(gfk, pk).
Here, according to my tests, scipy.optimize.optimize pk = [[ 12.00921659]
[ 11.26284221]]pk type = and scipy.optimize.optimizegfk = [[-12.00921659] [-11.26284221]]gfk type =
Note: according to my tests, the error is thrown on the very first iteration through fmin_bfgs (i.e., fmin_bfgs never even completes a single iteration or update).
I appreciate ANY guidance or insights.
My Code Below (logging, documentation removed):
Assume theta = 2x1 ndarray (Actual: theta Info Size=(2, 1) Type = )
Assume X = 100x2 ndarray (Actual: X Info Size=(2, 100) Type = )
Assume y = 100x1 ndarray (Actual: y Info Size=(100, 1) Type = )
def cost_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
logging.info(__name__ + "cost_arr reports m = " + str(m))
z = scipy.dot(theta.T, X) # Must transpose the vector theta
hypthetax = self.sigmoid(z)
yones = scipy.ones(scipy.shape(y))
hypthetaxones = scipy.ones(scipy.shape(hypthetax))
costright = scipy.dot((yones - y).T, ((scipy.log(hypthetaxones - hypthetax)).T))
costleft = scipy.dot((-1 * y).T, ((scipy.log(hypthetax)).T))
def gradient_descent_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
x = scipy.dot(theta.T, X) # Must transpose the vector theta
sig = self.sigmoid(x)
sig = sig.T - y
grad = scipy.dot(X,sig)
grad = m * grad
return grad
def fminunc_opt_bfgs(self, initialtheta, X, y, maxnumit):
myargs= (X,y)
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, retall=True, full_output=True)
return optcost
In case anyone else encounters this problem ....
1) ERROR 1: As noted in the comments, I incorrectly returned the value from my gradient as a multidimensional array (m,n) or (m,1). fmin_bfgs seems to require a 1d array output from the gradient (that is, you must return a (m,) array and NOT a (m,1) array. Use scipy.shape(myarray) to check the dimensions if you are unsure of the return value.
The fix involved adding:
grad = numpy.ndarray.flatten(grad)
just before returning the gradient from your gradient function. This "flattens" the array from (m,1) to (m,). fmin_bfgs can take this as input.
2) ERROR 2: Remember, the fmin_bfgs seems to work with NONlinear functions. In my case, the sample that I was initially working with was a LINEAR function. This appears to explain some of the anomalous results even after the flatten fix mentioned above. For LINEAR functions, fmin, rather than fmin_bfgs, may work better.
QED
As of current scipy version you need not pass fprime argument. It will compute the gradient for you without any issues. You can also use 'minimize' fn and pass method as 'bfgs' instead without providing gradient as argument.

Resources