I am working on a cost minimizing function to help with allocation/weights in a portfolio of stocks. I have the following code for the "Objective Function". This works when I tried it with 15 variables(stocks). However, when I tried it with 55 stocks it failed.
I have tried it with a smaller sample of stocks(15) and it works fine. The num_assets variable below is the number of stocks in the portfolio.
def get_metrics(weights):
weights = np.array(weights)
returnsR = np.dot(returns_annualR, weights )
volatilityR = np.sqrt(np.dot(weights.T, np.dot(cov_matrixR, weights)))
sharpeR = returnsR / volatilityR
drawdownR = np.multiply(weights, dailyDD).sum(axis=1, skipna =
True).min()
drawdownR = f(drawdownR)
calmarR = returnsR / drawdownR
results = (sharpeR * 0.3) + (calmarR * 0.7)
return np.array([returnsR, volatilityR, sharpeR, drawdownR, calmarR,
results])
def objective(weights):
# the number 5 is the index from the get_metrics array
return get_metrics(weights)[5] * -1
def check_sum(weights):
#return 0 if sum of the weights is 1
return np.sum(weights)-1
bound = (0.0,1.0)
bnds = tuple(bound for x in range (num_assets))
bx = list(bnds)
""" Custom step-function """
class RandomDisplacementBounds(object):
"""random displacement with bounds: see: https://stackoverflow.com/a/21967888/2320035
Modified! (dropped acceptance-rejection sampling for a more specialized approach)
"""
def __init__(self, xmin, xmax, stepsize=0.5):
self.xmin = xmin
self.xmax = xmax
self.stepsize = stepsize
def __call__(self, x):
"""take a random step but ensure the new position is within the bounds """
min_step = np.maximum(self.xmin - x, -self.stepsize)
max_step = np.minimum(self.xmax - x, self.stepsize)
random_step = np.random.uniform(low=min_step, high=max_step, size=x.shape)
xnew = x + random_step
return xnew
bounded_step = RandomDisplacementBounds(np.array([b[0] for b in bx]), np.array([b[1] for b in bx]))
minimizer_kwargs = {"method":"L-BFGS-B", "bounds": bnds}
globmin = sco.basinhopping(objective,
x0=num_assets*[1./num_assets],
minimizer_kwargs=minimizer_kwargs,
take_step=bounded_step,
disp=True)
The output should be an array of numbers that add up to 1 or 100%. However, this is not happening.
This function is a failure on my end as well. It failed to choose values which were lower -- ie., regardless of output from optimization function (negative or positive), it persisted until the parameter I was optimizing was as bad as it could possibly be. I suspect that since the function violates function encapsulation and relies on "function attributes" to adjust stepsize, the developer may not have respected encapsulated function scope elsewhere, and surprising behavior is happening as a result.
Regardless, in terms of theory, anything else is just a (dubious) estimated numerical partial second derivative (numerical Hessian, or "estimated curvature" for us mere mortals) based "performance" "gain", which reduces to a randomly-biased annealer in discrete, chaotic (phase space, continuous) or mixed (continuous and discrete) search spaces with volatile curvatures or planar areas (due to numerical underflow and loss of precision).
Anyways, import the following:
scipy.optimize.dual_anneal
dual anneal
Related
I'm attempting to solve the differential equation:
m(t) = M(x)x'' + C(x, x') + B x'
where x and x' are vectors with 2 entries representing the angles and angular velocity in a dynamical system. M(x) is a 2x2 matrix that is a function of the components of theta, C is a 2x1 vector that is a function of theta and theta' and B is a 2x2 matrix of constants. m(t) is a 2*1001 array containing the torques applied to each of the two joints at the 1001 time steps and I would like to calculate the evolution of the angles as a function of those 1001 time steps.
I've transformed it to standard form such that :
x'' = M(x)^-1 (m(t) - C(x, x') - B x')
Then substituting y_1 = x and y_2 = x' gives the first order linear system of equations:
y_2 = y_1'
y_2' = M(y_1)^-1 (m(t) - C(y_1, y_2) - B y_2)
(I've used theta and phi in my code for x and y)
def joint_angles(theta_array, t, torques, B):
phi_1 = np.array([theta_array[0], theta_array[1]])
phi_2 = np.array([theta_array[2], theta_array[3]])
def M_func(phi):
M = np.array([[a_1+2.*a_2*np.cos(phi[1]), a_3+a_2*np.cos(phi[1])],[a_3+a_2*np.cos(phi[1]), a_3]])
return np.linalg.inv(M)
def C_func(phi, phi_dot):
return a_2 * np.sin(phi[1]) * np.array([-phi_dot[1] * (2. * phi_dot[0] + phi_dot[1]), phi_dot[0]**2])
dphi_2dt = M_func(phi_1) # (torques[:, t] - C_func(phi_1, phi_2) - B # phi_2)
return dphi_2dt, phi_2
t = np.linspace(0,1,1001)
initial = theta_init[0], theta_init[1], dtheta_init[0], dtheta_init[1]
x = odeint(joint_angles, initial, t, args = (torque_array, B))
I get the error that I cannot index into torques using the t array, which makes perfect sense, however I am not sure how to have it use the current value of the torques at each time step.
I also tried putting odeint command in a for loop and only evaluating it at one time step at a time, using the solution of the function as the initial conditions for the next loop, however the function simply returned the initial conditions, meaning every loop was identical. This leads me to suspect I've made a mistake in my implementation of the standard form but I can't work out what it is. It would be preferable however to not have to call the odeint solver in a for loop every time, and rather do it all as one.
If helpful, my initial conditions and constant values are:
theta_init = np.array([10*np.pi/180, 143.54*np.pi/180])
dtheta_init = np.array([0, 0])
L_1 = 0.3
L_2 = 0.33
I_1 = 0.025
I_2 = 0.045
M_1 = 1.4
M_2 = 1.0
D_2 = 0.16
a_1 = I_1+I_2+M_2*(L_1**2)
a_2 = M_2*L_1*D_2
a_3 = I_2
Thanks for helping!
The solver uses an internal stepping that is problem adapted. The given time list is a list of points where the internal solution gets interpolated for output samples. The internal and external time lists are in no way related, the internal list only depends on the given tolerances.
There is no actual natural relation between array indices and sample times.
The translation of a given time into an index and construction of a sample value from the surrounding table entries is called interpolation (by a piecewise polynomial function).
Torque as a physical phenomenon is at least continuous, a piecewise linear interpolation is the easiest way to transform the given function value table into an actual continuous function. Of course one also needs the time array.
So use numpy.interp1d or the more advanced routines of scipy.interpolate to define the torque function that can be evaluated at arbitrary times as demanded by the solver and its integration method.
I am writing a function to evaluate and return a non linear system of equations, and give the jacobian. I then plan to call the function in a while loop to use the newton method to solve the system of equations.
I used the numpy package and read over its documentation, tried to limit the number of iterations, changed the dtype in the array and searched online to see if someone else had a similar problem.
This function is meant to solve a neoclassical growth model (a problem in macroeconomics) in finite time , T. The set of equations include T euler equations, T constraints, and one terminal condition. Thus the result should be an array of length 2T+1 containing the values of the equations, and a (2T+1)x(2T+1) jacobian matrix.
When I try to run the function for small array (arrays of length 1, and 3) it works perfectly. As soon as I try an array of length 5 or more, I start encountering RuntimeWarnings.
import numpy as np
def solver(args, params):
b,s,a,d = params[0], params[1], params[2], params[3]
guess = np.copy(args)
#Euler
euler = guess[:len(guess)//2]**(-sigma) - beta*guess[1:len(guess)//2+1]**(-sigma)*(1-delta+alpha*guess[len(guess)//2+1:]**(alpha-1))
#Budget Constraint
kzero_to_T = np.concatenate(([k0], guess[len(guess)//2+1:]))
bc_t = guess[:len(guess)//2] + guess[len(guess)//2+1:] - kzero_to_T[:-1]**alpha - (1-delta)*kzero_to_T[:-1]
bc_f = guess[len(guess)//2] -kzero_to_T[-1]**alpha - kzero_to_T[-1]*(1-delta)
bc = np.hstack((bc_t, bc_f))
Evals = np.concatenate((euler, bc))
# top half of the jacobian
jac_dot_5 = np.zeros((len(args)//2, len(args)))
for t in range(len(args)//2):
for i in range(len(args)):
if t == i and len(args)//2+(i+1)<=len(args):
jac_dot_5[t][t] = -sigma*args[t]**(-sigma-1)
jac_dot_5[t][t+1] = sigma*beta*args[t+1]*(1-delta+alpha*args[len(args)//2+(t+1)]**(alpha-1))
jac_dot_5[t][len(args)//2+(t+1)] = beta*args[t+1]**(-sigma)*alpha*(alpha-1)*args[len(args)//2+(t+1)]
# bottom half of the jacobian
jac_dot_1 = np.zeros((len(args)//2, len(args)))
for u in range(len(args)//2):
for v in range(len(args)):
if u==v and u>=1 and (len(args)//2 + u+1 < len(args)):
jac_dot_1[u][u] = 1
jac_dot_1[u][len(args)//2+(u)] = 1
jac_dot_1[u][len(args)//2+(u+1)] = -alpha*args[len(args)//2 + (u+1)]**(alpha-1) -(1-delta)
jac_dot_1[0][0] = 1
jac_dot_1[0][len(args)//2 +1] = 1
# last row of the jacobian
final_bc = np.zeros((1,len(args)))
final_bc[0][len(args)//2] = 1
final_bc[0][-1] = -alpha*args[-1]**(alpha-1) -(1-delta)
jac2Tn1 = np.concatenate((jac_dot_5, jac_dot_1, final_bc), axis=0)
point = coll.namedtuple('point', ['Output', 'Jacobian', 'Euler', 'BC'])
result = point(Output = Evals, Jacobian = jac2Tn1, Euler=euler, BC=bc )
return result
The code for implementing the algorithm:
p = (beta, sigma, alpha, delta)
for i in range(20):
k0 = np.linspace(2.49, 9.96, 20)[i]
vars0 = np.array([1,1,1,1,1], dtype=float)
vars1 = np.array([20,20,20,20,20], dtype=float)
Iter2= 0
while abs(solver(vars1,p).Output).max()>1e-8 and Iter2<300:
Iter2+=1
inv_jac1 = np.linalg.inv(solver(vars0,p).Jacobian)
vars1 = vars0 - inv_jac1#solver(vars0,p).Output
vars0=vars1
if Iter2 == 100:
break
I expect the output to be vars1 containing the updated values. The actual output is array([nan, nan, nan, nan, nan]). The way the function has been written, it should be able to give the output for inputs of arbitrary guesses of length 2T+1, where T is number of periods of time.
I get three error messages during the execution of the loop:
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:19: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:23: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:41: RuntimeWarning: invalid value encountered in double_scalars
I tried to code my issue from scratch and I couldn't make it any shorter- I need both, the evaluations of the equations and the jacobian to implement the algorithm. From my testing it looks like at some point the equation results (the solver(vars0,p).Output entry) become nan, but I am not sure why that would happen, the array should get close to 0, per the condition abs(solver(vars1,p).Output).max()>1e-8 and then just break out of the loop.
I have a function that I am attempting to minimize for multiple values. For some values it terminates successfully however for others the error
Warning: Maximum number of function evaluations has been exceeded.
Is the error that is given. I am unsure of the role of maxiter and maxfun and how to increase or decrease these in order to successfully get to the minimum. My understanding is that these values are optional so I am unsure of what the default values are.
# create starting parameters, parameters equal to sin(x)
a = 1
k = 0
h = 0
wave_params = [a, k, h]
def wave_func(func_params):
"""This function calculates the difference between a sinewave (sin(x)) and raw_data (different sin wave)
This is the function that will be minimized by modulating a, b, k, and h parameters in order to minimize
the difference between curves."""
a = func_params[0]
b = 1
k = func_params[1]
h = func_params[2]
y_wave = a * np.sin((x_vals-h)/b) + k
error = np.sum((y_wave - raw_data) * (y_wave - raw_data))
return error
wave_optimized = scipy.optimize.fmin(wave_func, wave_params)
You can try using scipy.optimize.minimize with method='Nelder-Mead' https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
Then you can just do
minimum = scipy.optimize.minimize(wave_func, wave_params, method='Nelder-Mead')
n_function_evaluations = minimum.nfev
n_iterations = minimum.nit
or you can customize the search algorithm like this:
minimum = scipy.optimize.minimize(
wave_func, wave_params, method='Nelder-Mead',
options={'maxiter': 10000, 'maxfev': 8000}
)
I don't know anything about fmin, but my guess is that it behaves extremely similarly.
I must solve the Euler Bernoulli differential beam equation which is:
w’’’’(x) = q(x)
and boundary conditions:
w(0) = w(l) = 0
and
w′′(0) = w′′(l) = 0
The beam is as shown on the picture below:
beam
The continious force q is 2N/mm.
I have to use shooting method and scipy.integrate.odeint() func.
I can't even manage to start as i do not understand how to write the differential equation as a system of equation
Can someone who understands solving of differential equations with boundary conditions in python please help!
Thanks :)
The shooting method
To solve the fourth order ODE BVP with scipy.integrate.odeint() using the shooting method you need to:
1.) Separate the 4th order ODE into 4 first order ODEs by substituting:
u = w
u1 = u' = w' # 1
u2 = u1' = w'' # 2
u3 = u2' = w''' # 3
u4 = u3' = w'''' = q # 4
2.) Create a function to carry out the derivation logic and connect that function to the integrate.odeint() like this:
function calc(u, x , q)
{
return [u[1], u[2], u[3] , q]
}
w = integrate.odeint(calc, [w(0), guess, w''(0), guess], xList, args=(q,))
Explanation:
We are sending the boundary value conditions to odeint() for x=0 ([w(0), w'(0) ,w''(0), w'''(0)]) which calls the function calc which returns the derivatives to be added to the current state of w. Note that we are guessing the initial boundary conditions for w'(0) and w'''(0) while entering the known w(0)=0 and w''(0)=0.
Addition of derivatives to the current state of w occurs like this:
# the current w(x) value is the previous value plus the current change of w in dx.
w(x) = w(x-dx) + dw/dx
# others are calculated the same
dw(x)/dx = dw(x-dx)/dx + d^2w(x)/dx^2
# etc.
This is why we are returning values [u[1], u[2], u[3] , q] instead of [u[0], u[1], u[2] , u[3]] from the calc function, because u[1] is the first derivative so we add it to w, etc.
3.) Now we are able to set up our shooting method. We will be sending different initial boundary values for w'(0) and w'''(0) to odeint() and then check the end result of the returned w(x) profile to determine how close w(L) and w''(L) got to 0 (the known boundary conditions).
The program for the shooting method:
# a function to return the derivatives of w
def returnDerivatives(u, x, q):
return [u[1], u[2], u[3], q]
# a shooting funtion which takes in two variables and returns a w(x) profile for x=[0,L]
def shoot(u2, u4):
# the number of x points to calculate integration -> determines the size of dx
# bigger number means more x's -> better precision -> longer execution time
xSteps = 1001
# length of the beam
L= 1.0 # 1m
xSpace = np.linspace(0, L, xSteps)
q = 0.02 # constant [N/m]
# integrate and return the profile of w(x) and it's derivatives, from x=0 to x=L
return odeint(returnDerivatives, [ 0, u2, 0, u4] , xSpace, args=(q,))
# the tolerance for our results.
tolerance = 0.01
# how many numbers to consider for u2 and u4 (the guess boundary conditions)
u2_u4_maxNumbers = 1327 # bigger number, better precision, slower program
# you can also divide into separate variables like u2_maxNum and u4_maxNum
# these are already tested numbers (the best results are somewhere in here)
u2Numbers = np.linspace(-0.1, 0.1, u2_u4_maxNumbers)
# the same as above
u4Numbers = np.linspace(-0.5, 0.5, u2_u4_maxNumbers)
# result list for extracted values of each w(x) profile => [u2Best, u4Best, w(L), w''(L)]
# which will help us determine if the w(x) profile is inside tolerance
resultList = []
# result list for each U (or w(x) profile) => [w(x), w'(x), w''(x), w'''(x)]
resultW = []
# start generating numbers for u2 and u4 and send them to odeint()
for u2 in u2Numbers:
for u4 in u4Numbers:
U = []
U = shoot(u2,u4)
# get only the last row of the profile to determine if it passes tolerance check
result = U[len(U)-1]
# only check w(L) == 0 and w''(L) == 0, as those are the known boundary cond.
if (abs(result[0]) < tolerance) and (abs(result[2]) < tolerance):
# if the result passed the tolerance check, extract some values from the
# last row of the w(x) profile which we will need later for comaprisons
resultList.append([u2, u4, result[0], result[2]])
# add the w(x) profile to the list of profiles that passed the tolerance
# Note: the order of resultList is the same as the order of resultW
resultW.append(U)
# go through the resultList (list of extracted values from last row of each w(x) profile)
for i in range(len(resultList)):
x = resultList[i]
# both boundary conditions are 0 for both w(L) and w''(L) so we will simply add
# the two absolute values to determine how much the sum differs from 0
y = abs(x[2]) + abs(x[3])
# if we've just started set the least difference to the current
if i == 0:
minNum = y # remember the smallest difference to 0
index = 0 # remember index of best profile
elif y < minNum:
# current sum of absolute values is smaller
minNum = y
index = i
# print out the integral for w(x) over the beam
sum = 0
for i in resultW[index]:
sum = sum + i[0]
print("The integral of w(x) over the beam is:")
print(sum/1001) # sum/xSteps
This outputs:
The integral of w(x) over the beam is:
0.000135085272117
To print out the best profile for w(x) that we found:
print(resultW[index])
which outputs something like:
# w(x) w'(x) w''(x) w'''(x)
[[ 0.00000000e+00 7.54147813e-04 0.00000000e+00 -9.80392157e-03]
[ 7.54144825e-07 7.54142917e-04 -9.79392157e-06 -9.78392157e-03]
[ 1.50828005e-06 7.54128237e-04 -1.95678431e-05 -9.76392157e-03]
...,
[ -4.48774290e-05 -8.14851572e-04 1.75726275e-04 1.01560784e-02]
[ -4.56921910e-05 -8.14670764e-04 1.85892353e-04 1.01760784e-02]
[ -4.65067671e-05 -8.14479780e-04 1.96078431e-04 1.01960784e-02]]
To double check the results from above we will also solve the ODE using the numerical method.
The numerical method
To solve the problem using the numerical method we first need to solve the differential equations. We will get four constants which we need to find with the help of the boundary conditions. The boundary conditions will be used to form a system of equations to help find the necessary constants.
For example:
w’’’’(x) = q(x);
means that we have this:
d^4(w(x))/dx^4 = q(x)
Since q(x) is constant after integrating we have:
d^3(w(x))/dx^3 = q(x)*x + C
After integrating again:
d^2(w(x))/dx^2 = q(x)*0.5*x^2 + C*x + D
After another integration:
dw(x)/dx = q(x)/6*x^3 + C*0.5*x^2 + D*x + E
And finally the last integration yields:
w(x) = q(x)/24*x^4 + C/6*x^3 + D*0.5*x^2 + E*x + F
Then we take a look at the boundary conditions (now we have expressions from above for w''(x) and w(x)) with which we make a system of equations to solve the constants.
w''(0) => 0 = q(x)*0.5*0^2 + C*0 + D
w''(L) => 0 = q(x)*0.5*L^2 + C*L + D
This gives us the constants:
D = 0 # from the first equation
C = - 0.01 * L # from the second (after inserting D=0)
After repeating the same for w(0)=0 and w(L)=0 we obtain:
F = 0 # from first
E = 0.01/12.0 * L^3 # from second
Now, after we have solved the equation and found all of the integration constants we can make the program for the numerical method.
The program for the numerical method
We will make a FOR loop to go through the entire beam for every dx at a time and sum up (integrate) w(x).
L = 1.0 # in meters
step = 1001.0 # how many steps to take (dx)
q = 0.02 # constant [N/m]
integralOfW = 0.0; # instead of w(0) enter the boundary condition value for w(0)
result = []
for i in range(int(L*step)):
x= i/step
w = (q/24.0*pow(x,4) - 0.02/12.0*pow(x,3) + 0.01/12*pow(L,3)*x)/step # current w fragment
# add up fragments of w for integral calculation
integralOfW += w
# add current value of w(x) to result list for plotting
result.append(w*step);
print("The integral of w(x) over the beam is:")
print(integralOfW)
which outputs:
The integral of w(x) over the beam is:
0.00016666652805511192
Now to compare the two methods
Result comparison between the shooting method and the numerical method
The integral of w(x) over the beam:
Shooting method -> 0.000135085272117
Numerical method -> 0.00016666652805511192
That's a pretty good match, now lets see check the plots:
From the plots it's even more obvious that we have a good match and that the results of the shooting method are correct.
To get even better results for the shooting method increase xSteps and u2_u4_maxNumbers to bigger numbers and you can also narrow down the u2Numbers and u4Numbers to the same set size but a smaller interval (around the best results from previous program runs). Keep in mind that setting xSteps and u2_u4_maxNumbers too high will cause your program to run for a very long time.
You need to transform the ODE into a first order system, setting u0=w one possible and usually used system is
u0'=u1,
u1'=u2,
u2'=u3,
u3'=q(x)
This can be implemented as
def ODEfunc(u,x): return [ u[1], u[2], u[3], q(x) ]
Then make a function that shoots with experimental initial conditions and returns the components of the second boundary condition
def shoot(u01, u03): return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1,[0,2]]
Now you have a function of two variables with two components and you need to solve this 2x2 system with the usual methods. As the system is linear, the shooting function is linear as well and you only need to find the coefficients and solve the resulting linear system.
Problem Synopsis:
When attempting to use the scipy.optimize.fmin_bfgs minimization (optimization) function, the function throws a
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
error. According to my error checking this occurs at the very end of the first iteration through fmin_bfgs--just before any values are returned or any calls to callback.
Configuration:
Windows Vista
Python 3.2.2
SciPy 0.10
IDE = Eclipse with PyDev
Detailed Description:
I am using the scipy.optimize.fmin_bfgs to minimize the cost of a simple logistic regression implementation (converting from Octave to Python/SciPy). Basically, the cost function is named cost_arr function and the gradient descent is in gradient_descent_arr function.
I have manually tested and fully verified that *cost_arr* and *gradient_descent_arr* work properly and return all values properly. I also tested to verify that the proper parameters are passed to the *fmin_bfgs* function. Nevertheless, when run, I get the ValueError: matrices are not aligned. According to the source review, the exact error occurs in the
def line_search_wolfe1
function in # Minpack's Wolfe line and scalar searches as supplied by the scipy packages.
Notably, if I use scipy.optimize.fmin instead, the fmin function runs to completion.
Exact Error:
File
"D:\Users\Shannon\Programming\Eclipse\workspace\SBML\sbml\LogisticRegression.py",
line 395, in fminunc_opt
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
File
"C:\Python32x32\lib\site-packages\scipy\optimize\optimize.py", line
533, in fmin_bfgs old_fval,old_old_fval)
File "C:\Python32x32\lib\site-packages\scipy\optimize\linesearch.py", line
76, in line_search_wolfe1
derphi0 = np.dot(gfk, pk)
ValueError: matrices are not aligned
I call the optimization function with:
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)
I have spent a few days trying to fix this and cannot seem to determine what is causing the matrices are not aligned error.
ADDENDUM: 2012-01-08
I worked with this a lot more and seem to have narrowed the issues (but am baffled on how to fix them). First, fmin (using just fmin) works using these functions--cost, gradient. Second, the cost and the gradient functions both accurately return expected values when tested in a single iteration in a manual implementation (NOT using fmin_bfgs). Third, I added error code to optimize.linsearch and the error seems to be thrown at def line_search_wolfe1 in line: derphi0 = np.dot(gfk, pk).
Here, according to my tests, scipy.optimize.optimize pk = [[ 12.00921659]
[ 11.26284221]]pk type = and scipy.optimize.optimizegfk = [[-12.00921659] [-11.26284221]]gfk type =
Note: according to my tests, the error is thrown on the very first iteration through fmin_bfgs (i.e., fmin_bfgs never even completes a single iteration or update).
I appreciate ANY guidance or insights.
My Code Below (logging, documentation removed):
Assume theta = 2x1 ndarray (Actual: theta Info Size=(2, 1) Type = )
Assume X = 100x2 ndarray (Actual: X Info Size=(2, 100) Type = )
Assume y = 100x1 ndarray (Actual: y Info Size=(100, 1) Type = )
def cost_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
logging.info(__name__ + "cost_arr reports m = " + str(m))
z = scipy.dot(theta.T, X) # Must transpose the vector theta
hypthetax = self.sigmoid(z)
yones = scipy.ones(scipy.shape(y))
hypthetaxones = scipy.ones(scipy.shape(hypthetax))
costright = scipy.dot((yones - y).T, ((scipy.log(hypthetaxones - hypthetax)).T))
costleft = scipy.dot((-1 * y).T, ((scipy.log(hypthetax)).T))
def gradient_descent_arr(self, theta, X, y):
theta = scipy.resize(theta,(2,1))
m = scipy.shape(X)
m = 1 / m[1] # Use m[1] because this is the length of X
x = scipy.dot(theta.T, X) # Must transpose the vector theta
sig = self.sigmoid(x)
sig = sig.T - y
grad = scipy.dot(X,sig)
grad = m * grad
return grad
def fminunc_opt_bfgs(self, initialtheta, X, y, maxnumit):
myargs= (X,y)
optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, retall=True, full_output=True)
return optcost
In case anyone else encounters this problem ....
1) ERROR 1: As noted in the comments, I incorrectly returned the value from my gradient as a multidimensional array (m,n) or (m,1). fmin_bfgs seems to require a 1d array output from the gradient (that is, you must return a (m,) array and NOT a (m,1) array. Use scipy.shape(myarray) to check the dimensions if you are unsure of the return value.
The fix involved adding:
grad = numpy.ndarray.flatten(grad)
just before returning the gradient from your gradient function. This "flattens" the array from (m,1) to (m,). fmin_bfgs can take this as input.
2) ERROR 2: Remember, the fmin_bfgs seems to work with NONlinear functions. In my case, the sample that I was initially working with was a LINEAR function. This appears to explain some of the anomalous results even after the flatten fix mentioned above. For LINEAR functions, fmin, rather than fmin_bfgs, may work better.
QED
As of current scipy version you need not pass fprime argument. It will compute the gradient for you without any issues. You can also use 'minimize' fn and pass method as 'bfgs' instead without providing gradient as argument.