Issue when creating a custom loss function. Sorry, I am a bit new to MatConvNet.
So essentially the output of my neural network is meant to be a vector with 2 elements (e.g. [1, 2]), with an error function based on the RMSE.
So I changed cnn_train so that the labels would instead be a 2 x (number of training examples) array. In the code below, x is a 1 x 1 x 2 x batchSize array and c is the labels.
function y = wf_rmse(x, c, varargin)
% Custom loss function for a squared-error (MSE-style) loss
org = size(x) ;
x = reshape(x, size(c)) ;
if ~isempty(varargin) && ~ischar(varargin{1})   % passed in dzdy
    dzdy = varargin{1} ;
    varargin(1) = [] ;
else
    dzdy = [] ;
end
if isempty(dzdy)
    % forward pass: half the sum of squared errors
    y = sum(sum((x - c).^2)) / 2 ;
else
    % backward pass: the gradient of the forward expression is (x - c)
    y = dzdy * (x - c) ;
    y = reshape(y, org) ;
end
When I include this as part of the network, the network initializes fine, but I get the following error during training, even though the dimensions of the gradient output should match the dimensions of the other layers:
Error using vl_nnconv
DEROUTPUT dimensions are incompatible with X and FILTERS.
Any suggestions on resolving the issue?
In PyTorch, I want to do the following calculation:
l1 = f(x.detach(), y)
l1.backward(retain_graph=True)
l2 = -1*f(x, y.detach())
l2.backward()
where f is some function, and x and y are tensors that require gradient. Notice that x and y may both be the results of previous calculations which utilize shared parameters (for example, maybe x=g(z) and y=g(w) where g is an nn.Module).
The issue is that l1 and l2 are both numerically identical, up to the minus sign, and it seems wasteful to repeat the calculation f(x,y) twice. It would be nicer to be able to calculate it once, and apply backward twice on the result. Is there any way of doing this?
One possibility is to manually call autograd.grad and update the w.grad field of each nn.Parameter w. But I'm wondering if there is a more direct and clean way to do this, using the backward function.
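For reference, a minimal sketch of that autograd.grad route (the Linear layer and the absolute-difference loss here are placeholder assumptions mirroring the demo further down): f is evaluated once, the per-path gradients with respect to x and y are extracted, and then recombined by hand with the sign flip on the x path.
import torch

lin = torch.nn.Linear(1, 1, bias=False)
x = lin(torch.tensor([1.0]))
y = lin(torch.tensor([2.0]))
loss = (x - y).abs()               # f(x, y), computed once

# per-path gradients of the loss w.r.t. x and y
gx, gy = torch.autograd.grad(loss, (x, y), retain_graph=True)
# backpropagate +grad through y and -grad through x into lin.weight.grad
torch.autograd.backward([x, y], grad_tensors=[-gx, gy])
print(lin.weight.grad)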
I took this answer from here.
We can calculate f(x,y) once, without detaching either x or y, if we ensure that we multiply by -1 the gradient flowing through x. This can be done using register_hook:
x.register_hook(lambda t: -t)
l = f(x,y)
l.backward()
Here is code demonstrating that this works:
import torch
lin = torch.nn.Linear(1, 1, bias=False)
lin.weight.data[:] = 1.0
a = torch.tensor([1.0])
b = torch.tensor([2.0])
loss_func = lambda x, y: (x - y).abs()
# option 1: this is the inefficient option, presented in the original question
lin.zero_grad()
x = lin(a)
y = lin(b)
loss1 = loss_func(x.detach(), y)
loss1.backward(retain_graph=True)
loss2 = -1 * loss_func(x, y.detach()) # second invocation of `loss_func` - not efficient!
loss2.backward()
print(lin.weight.grad)
# option 2: this is the efficient method, suggested in this answer.
lin.zero_grad()
x = lin(a)
y = lin(b)
x.register_hook(lambda t: -t)
loss = loss_func(x, y) # only one invocation of `loss_func` - more efficient!
loss.backward()
print(lin.weight.grad) # the output of this is identical to the previous print, which confirms the method
# option 3 - this should not be equivalent to the previous options, used just for comparison
lin.zero_grad()
x = lin(a)
y = lin(b)
loss = loss_func(x, y)
loss.backward()
print(lin.weight.grad)
I am working on a cost-minimizing function to help with allocation/weights in a portfolio of stocks. I have the following code for the objective function. It works when I try it with 15 variables (stocks); however, when I try it with 55 stocks it fails.
I have tried it with a smaller sample of stocks (15) and it works fine. The num_assets variable below is the number of stocks in the portfolio.
def get_metrics(weights):
    # returns_annualR, cov_matrixR, dailyDD, and f are defined elsewhere in the script
    weights = np.array(weights)
    returnsR = np.dot(returns_annualR, weights)
    volatilityR = np.sqrt(np.dot(weights.T, np.dot(cov_matrixR, weights)))
    sharpeR = returnsR / volatilityR
    drawdownR = np.multiply(weights, dailyDD).sum(axis=1, skipna=True).min()
    drawdownR = f(drawdownR)
    calmarR = returnsR / drawdownR
    results = (sharpeR * 0.3) + (calmarR * 0.7)
    return np.array([returnsR, volatilityR, sharpeR, drawdownR, calmarR, results])
def objective(weights):
    # the number 5 is the index from the get_metrics array
    return get_metrics(weights)[5] * -1

def check_sum(weights):
    # return 0 if sum of the weights is 1
    return np.sum(weights) - 1
bound = (0.0, 1.0)
bnds = tuple(bound for x in range(num_assets))
bx = list(bnds)

""" Custom step-function """
class RandomDisplacementBounds(object):
    """random displacement with bounds: see https://stackoverflow.com/a/21967888/2320035
    Modified! (dropped acceptance-rejection sampling for a more specialized approach)
    """
    def __init__(self, xmin, xmax, stepsize=0.5):
        self.xmin = xmin
        self.xmax = xmax
        self.stepsize = stepsize

    def __call__(self, x):
        """take a random step but ensure the new position is within the bounds"""
        min_step = np.maximum(self.xmin - x, -self.stepsize)
        max_step = np.minimum(self.xmax - x, self.stepsize)
        random_step = np.random.uniform(low=min_step, high=max_step, size=x.shape)
        xnew = x + random_step
        return xnew

bounded_step = RandomDisplacementBounds(np.array([b[0] for b in bx]),
                                        np.array([b[1] for b in bx]))
minimizer_kwargs = {"method": "L-BFGS-B", "bounds": bnds}
globmin = sco.basinhopping(objective,
                           x0=num_assets * [1. / num_assets],
                           minimizer_kwargs=minimizer_kwargs,
                           take_step=bounded_step,
                           disp=True)
The output should be an array of weights that adds up to 1, i.e. 100%. However, this is not happening.
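Worth noting: check_sum is defined above but never passed to the minimizer, and L-BFGS-B only supports bounds, not general constraints. A sketch of how the sum-to-1 equality constraint could be wired in (an assumption about the intended setup, switching the local minimizer to SLSQP, which does accept constraint dicts):
# hypothetical rewiring of the question's minimizer_kwargs
cons = ({'type': 'eq', 'fun': check_sum},)
minimizer_kwargs = {"method": "SLSQP", "bounds": bnds, "constraints": cons}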
This function is a failure on my end as well. It failed to choose values which were lower -- i.e., regardless of the output of the optimization function (negative or positive), it persisted until the parameter I was optimizing was as bad as it could possibly be. I suspect that since the step function breaks encapsulation and relies on function attributes to adjust the step size, the developer may not have respected encapsulated function scope elsewhere either, and surprising behavior happens as a result.
Regardless, in terms of theory, anything else is just a dubious "performance gain" built on an estimated numerical partial second derivative (a numerical Hessian, or "estimated curvature" for us mere mortals), which reduces to a randomly biased annealer in discrete, chaotic (continuous phase-space), or mixed (continuous and discrete) search spaces with volatile curvature or planar regions (due to numerical underflow and loss of precision).
Anyway, use the following instead:
scipy.optimize.dual_annealing
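A minimal sketch of swapping it in for basinhopping, reusing objective and bnds from the question (it assumes num_assets and the data behind get_metrics are already defined; like basinhopping, dual_annealing only enforces bounds, so the sum-to-1 condition still needs separate handling):
import scipy.optimize as sco

result = sco.dual_annealing(objective, bounds=list(bnds))
print(result.x, result.fun)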
I am writing a function to evaluate and return a nonlinear system of equations along with its Jacobian. I then plan to call the function in a while loop to use Newton's method to solve the system.
I used the numpy package and read over its documentation, tried to limit the number of iterations, changed the dtype in the array, and searched online to see if someone else had a similar problem.
This function is meant to solve a neoclassical growth model (a problem in macroeconomics) in finite time, T. The set of equations includes T Euler equations, T constraints, and one terminal condition. Thus the result should be an array of length 2T+1 containing the values of the equations, and a (2T+1)x(2T+1) Jacobian matrix.
When I run the function on small arrays (of length 1 and 3) it works perfectly. As soon as I try an array of length 5 or more, I start encountering RuntimeWarnings.
import numpy as np
import collections as coll

def solver(args, params):
    beta, sigma, alpha, delta = params   # model parameters (matches p below)
    guess = np.copy(args)
    T = len(guess) // 2                  # number of time periods
    # Euler equations
    euler = (guess[:T]**(-sigma)
             - beta * guess[1:T+1]**(-sigma) * (1 - delta + alpha * guess[T+1:]**(alpha - 1)))
    # Budget constraints (k0 is a global set by the calling loop)
    kzero_to_T = np.concatenate(([k0], guess[T+1:]))
    bc_t = guess[:T] + guess[T+1:] - kzero_to_T[:-1]**alpha - (1 - delta) * kzero_to_T[:-1]
    bc_f = guess[T] - kzero_to_T[-1]**alpha - kzero_to_T[-1] * (1 - delta)
    bc = np.hstack((bc_t, bc_f))
    Evals = np.concatenate((euler, bc))
    # top half of the Jacobian
    jac_dot_5 = np.zeros((T, len(args)))
    for t in range(T):
        for i in range(len(args)):
            if t == i and T + (i + 1) <= len(args):
                jac_dot_5[t][t] = -sigma * args[t]**(-sigma - 1)
                jac_dot_5[t][t+1] = sigma * beta * args[t+1] * (1 - delta + alpha * args[T + (t+1)]**(alpha - 1))
                jac_dot_5[t][T + (t+1)] = beta * args[t+1]**(-sigma) * alpha * (alpha - 1) * args[T + (t+1)]
    # bottom half of the Jacobian
    jac_dot_1 = np.zeros((T, len(args)))
    for u in range(T):
        for v in range(len(args)):
            if u == v and u >= 1 and (T + u + 1 < len(args)):
                jac_dot_1[u][u] = 1
                jac_dot_1[u][T + u] = 1
                jac_dot_1[u][T + (u+1)] = -alpha * args[T + (u+1)]**(alpha - 1) - (1 - delta)
    jac_dot_1[0][0] = 1
    jac_dot_1[0][T + 1] = 1
    # last row of the Jacobian
    final_bc = np.zeros((1, len(args)))
    final_bc[0][T] = 1
    final_bc[0][-1] = -alpha * args[-1]**(alpha - 1) - (1 - delta)
    jac2Tn1 = np.concatenate((jac_dot_5, jac_dot_1, final_bc), axis=0)
    point = coll.namedtuple('point', ['Output', 'Jacobian', 'Euler', 'BC'])
    return point(Output=Evals, Jacobian=jac2Tn1, Euler=euler, BC=bc)
The code for implementing the algorithm:
p = (beta, sigma, alpha, delta)
for i in range(20):
    k0 = np.linspace(2.49, 9.96, 20)[i]
    vars0 = np.array([1, 1, 1, 1, 1], dtype=float)
    vars1 = np.array([20, 20, 20, 20, 20], dtype=float)
    Iter2 = 0
    while abs(solver(vars1, p).Output).max() > 1e-8 and Iter2 < 300:
        Iter2 += 1
        inv_jac1 = np.linalg.inv(solver(vars0, p).Jacobian)
        vars1 = vars0 - inv_jac1 @ solver(vars0, p).Output  # Newton step
        vars0 = vars1
        if Iter2 == 100:
            break
I expect the output to be vars1 containing the updated values. The actual output is array([nan, nan, nan, nan, nan]). As written, the function should handle input guesses of arbitrary length 2T+1, where T is the number of time periods.
I get three error messages during the execution of the loop:
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:19: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:23: RuntimeWarning: invalid value encountered in power
C:\Users\Peter\Anaconda3\lib\site-packages\ipykernel_launcher.py:41: RuntimeWarning: invalid value encountered in double_scalars
I tried to reduce my issue to a minimal example but couldn't make it any shorter -- I need both the evaluations of the equations and the Jacobian to implement the algorithm. From my testing it looks like at some point the equation results (the solver(vars0,p).Output entry) become nan, but I am not sure why that would happen; the array should get close to 0 per the condition abs(solver(vars1,p).Output).max() > 1e-8 and then break out of the loop.
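For what it's worth, the "invalid value encountered in power" warnings typically mean a Newton step has overshot into negative territory: np.power(-0.5, alpha - 1) with a fractional exponent returns nan, which then propagates through every later iteration. A minimal sketch (using a generic F and J, not the model above) of the usual guards -- clipping iterates to stay positive, and using np.linalg.solve instead of forming the inverse Jacobian:
import numpy as np

def newton(F, J, x0, tol=1e-8, maxit=300):
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = F(x)
        if np.abs(fx).max() < tol:
            break
        step = np.linalg.solve(J(x), fx)   # cheaper and more stable than inv(J) @ fx
        x = np.maximum(x - step, 1e-10)    # keep iterates positive: avoids nan from x**fraction
    return x

# toy system x**2 = 2, solved componentwise
print(newton(lambda x: x**2 - 2, lambda x: np.diag(2 * x), np.full(3, 5.0)))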
I have a function that I am attempting to minimize for multiple values. For some values it terminates successfully; for others the following warning is given:
Warning: Maximum number of function evaluations has been exceeded.
I am unsure of the role of maxiter and maxfun and how to increase or decrease these in order to successfully reach the minimum. My understanding is that these values are optional, so I am also unsure what the defaults are.
import numpy as np
import scipy.optimize

# create starting parameters, parameters equal to sin(x)
a = 1
k = 0
h = 0
wave_params = [a, k, h]

def wave_func(func_params):
    """This function calculates the difference between a sine wave (sin(x)) and raw_data
    (a different sine wave). This is the function that will be minimized by modulating the
    a, b, k, and h parameters in order to minimize the difference between curves."""
    a = func_params[0]
    b = 1
    k = func_params[1]
    h = func_params[2]
    # x_vals and raw_data are defined elsewhere in the script
    y_wave = a * np.sin((x_vals - h) / b) + k
    error = np.sum((y_wave - raw_data) * (y_wave - raw_data))
    return error

wave_optimized = scipy.optimize.fmin(wave_func, wave_params)
You can try using scipy.optimize.minimize with method='Nelder-Mead' https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
Then you can just do
minimum = scipy.optimize.minimize(wave_func, wave_params, method='Nelder-Mead')
n_function_evaluations = minimum.nfev
n_iterations = minimum.nit
or you can customize the search algorithm like this:
minimum = scipy.optimize.minimize(
wave_func, wave_params, method='Nelder-Mead',
options={'maxiter': 10000, 'maxfev': 8000}
)
fmin is the older interface to the same Nelder-Mead algorithm, so it behaves very similarly; it accepts maxiter and maxfun as keyword arguments directly.
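For example, a sketch reusing wave_func and wave_params from the question:
# raise both limits on the legacy interface directly
wave_optimized = scipy.optimize.fmin(wave_func, wave_params, maxiter=10000, maxfun=8000)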
I must solve the Euler-Bernoulli differential beam equation, which is:
w''''(x) = q(x)
with boundary conditions:
w(0) = w(l) = 0
and
w''(0) = w''(l) = 0
The beam is as shown in the picture below:
[image: beam]
The continuous force q is 2 N/mm.
I have to use the shooting method and the scipy.integrate.odeint() function.
I can't even manage to start, as I do not understand how to write the differential equation as a system of equations.
Can someone who understands solving differential equations with boundary conditions in Python please help?
Thanks :)
The shooting method
To solve the fourth order ODE BVP with scipy.integrate.odeint() using the shooting method you need to:
1.) Separate the 4th order ODE into 4 first order ODEs by substituting:
u = w
u1 = u' = w' # 1
u2 = u1' = w'' # 2
u3 = u2' = w''' # 3
u4 = u3' = w'''' = q # 4
2.) Create a function to carry out the derivative logic and connect it to integrate.odeint(), like this:
def calc(u, x, q):
    return [u[1], u[2], u[3], q]

# initial state: [w(0), w'(0), w''(0), w'''(0)], with w'(0) and w'''(0) guessed
w = integrate.odeint(calc, [0, guess1, 0, guess2], xList, args=(q,))
Explanation:
We are sending the boundary value conditions to odeint() for x=0 ([w(0), w'(0) ,w''(0), w'''(0)]) which calls the function calc which returns the derivatives to be added to the current state of w. Note that we are guessing the initial boundary conditions for w'(0) and w'''(0) while entering the known w(0)=0 and w''(0)=0.
Addition of derivatives to the current state of w occurs like this:
# the current w(x) value is the previous value plus the change of w over dx:
w(x) = w(x-dx) + dw/dx * dx
# the others are calculated the same way:
dw(x)/dx = dw(x-dx)/dx + d^2w(x)/dx^2 * dx
# etc.
This is why we return the values [u[1], u[2], u[3], q] instead of [u[0], u[1], u[2], u[3]] from the calc function: u[1] is the first derivative, so it updates w, and so on.
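To make the bookkeeping concrete, here is the same update written as a single explicit Euler step (odeint uses a more accurate adaptive scheme internally, but the roles of the state vector and the returned derivatives are the same; the numbers are made up for illustration):
dx, q = 0.001, 0.02
u = [0.0, 0.1, 0.0, -0.01]        # state: [w, w', w'', w''']
du = [u[1], u[2], u[3], q]        # what calc() returns
u = [ui + dui * dx for ui, dui in zip(u, du)]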
3.) Now we are able to set up our shooting method. We will be sending different initial boundary values for w'(0) and w'''(0) to odeint() and then check the end result of the returned w(x) profile to determine how close w(L) and w''(L) got to 0 (the known boundary conditions).
The program for the shooting method:
import numpy as np
from scipy.integrate import odeint

# a function to return the derivatives of w
def returnDerivatives(u, x, q):
    return [u[1], u[2], u[3], q]

# a shooting function which takes in two variables and returns a w(x) profile for x=[0,L]
def shoot(u2, u4):
    # the number of x points to calculate the integration -> determines the size of dx;
    # a bigger number means more x's -> better precision -> longer execution time
    xSteps = 1001
    # length of the beam
    L = 1.0  # 1m
    xSpace = np.linspace(0, L, xSteps)
    q = 0.02  # constant [N/m]
    # integrate and return the profile of w(x) and its derivatives, from x=0 to x=L
    return odeint(returnDerivatives, [0, u2, 0, u4], xSpace, args=(q,))
# the tolerance for our results
tolerance = 0.01

# how many numbers to consider for u2 and u4 (the guessed boundary conditions)
u2_u4_maxNumbers = 1327  # bigger number, better precision, slower program
# you can also divide into separate variables like u2_maxNum and u4_maxNum

# these are already tested numbers (the best results are somewhere in here)
u2Numbers = np.linspace(-0.1, 0.1, u2_u4_maxNumbers)
# the same as above
u4Numbers = np.linspace(-0.5, 0.5, u2_u4_maxNumbers)

# result list of extracted values from each w(x) profile => [u2Best, u4Best, w(L), w''(L)]
# which will help us determine if the w(x) profile is inside tolerance
resultList = []
# result list for each U (or w(x) profile) => [w(x), w'(x), w''(x), w'''(x)]
resultW = []

# start generating numbers for u2 and u4 and send them to odeint()
for u2 in u2Numbers:
    for u4 in u4Numbers:
        U = shoot(u2, u4)
        # get only the last row of the profile to determine if it passes the tolerance check
        result = U[len(U) - 1]
        # only check w(L) == 0 and w''(L) == 0, as those are the known boundary conditions
        if (abs(result[0]) < tolerance) and (abs(result[2]) < tolerance):
            # if the result passed the tolerance check, extract some values from the
            # last row of the w(x) profile which we will need later for comparisons
            resultList.append([u2, u4, result[0], result[2]])
            # add the w(x) profile to the list of profiles that passed the tolerance
            # Note: the order of resultList is the same as the order of resultW
            resultW.append(U)

# go through resultList (the values extracted from the last row of each w(x) profile)
for i in range(len(resultList)):
    x = resultList[i]
    # both boundary conditions are 0 for w(L) and w''(L), so we simply add
    # the two absolute values to determine how much the sum differs from 0
    y = abs(x[2]) + abs(x[3])
    # if we've just started, set the least difference to the current one
    if i == 0:
        minNum = y   # remember the smallest difference to 0
        index = 0    # remember the index of the best profile
    elif y < minNum:
        # current sum of absolute values is smaller
        minNum = y
        index = i

# print out the integral of w(x) over the beam
total = 0
for i in resultW[index]:
    total = total + i[0]
print("The integral of w(x) over the beam is:")
print(total / 1001)  # total/xSteps
This outputs:
The integral of w(x) over the beam is:
0.000135085272117
To print out the best profile for w(x) that we found:
print(resultW[index])
which outputs something like:
# w(x) w'(x) w''(x) w'''(x)
[[ 0.00000000e+00 7.54147813e-04 0.00000000e+00 -9.80392157e-03]
[ 7.54144825e-07 7.54142917e-04 -9.79392157e-06 -9.78392157e-03]
[ 1.50828005e-06 7.54128237e-04 -1.95678431e-05 -9.76392157e-03]
...,
[ -4.48774290e-05 -8.14851572e-04 1.75726275e-04 1.01560784e-02]
[ -4.56921910e-05 -8.14670764e-04 1.85892353e-04 1.01760784e-02]
[ -4.65067671e-05 -8.14479780e-04 1.96078431e-04 1.01960784e-02]]
To double check the results from above we will also solve the ODE using the numerical method.
The numerical method
To solve the problem using the numerical method, we first solve the differential equation analytically. Integration produces four constants, which we find with the help of the boundary conditions: the boundary conditions give a system of equations that determines the constants.
For example:
w''''(x) = q(x)
means that we have this:
d^4(w(x))/dx^4 = q(x)
Since q(x) is constant, after integrating we have:
d^3(w(x))/dx^3 = q(x)*x + C
After integrating again:
d^2(w(x))/dx^2 = q(x)*0.5*x^2 + C*x + D
After another integration:
dw(x)/dx = q(x)/6*x^3 + C*0.5*x^2 + D*x + E
And finally the last integration yields:
w(x) = q(x)/24*x^4 + C/6*x^3 + D*0.5*x^2 + E*x + F
Then we look at the boundary conditions (using the expressions above for w''(x) and w(x)) to form a system of equations from which we solve for the constants.
w''(0) => 0 = q(x)*0.5*0^2 + C*0 + D
w''(L) => 0 = q(x)*0.5*L^2 + C*L + D
This gives us the constants:
D = 0 # from the first equation
C = - 0.01 * L # from the second (after inserting D=0)
After repeating the same for w(0)=0 and w(L)=0 we obtain:
F = 0 # from first
E = 0.01/12.0 * L^3 # from second
Now, after we have solved the equation and found all of the integration constants we can make the program for the numerical method.
The program for the numerical method
We will make a FOR loop to go through the entire beam for every dx at a time and sum up (integrate) w(x).
L = 1.0        # in meters
step = 1001.0  # how many steps to take (dx)
q = 0.02       # constant [N/m]
integralOfW = 0.0  # instead of 0.0, enter the boundary condition value for w(0)
result = []

for i in range(int(L * step)):
    x = i / step
    # current w fragment
    w = (q/24.0*pow(x, 4) - 0.02/12.0*pow(x, 3) + 0.01/12*pow(L, 3)*x) / step
    # add up fragments of w for the integral calculation
    integralOfW += w
    # add the current value of w(x) to the result list for plotting
    result.append(w * step)

print("The integral of w(x) over the beam is:")
print(integralOfW)
which outputs:
The integral of w(x) over the beam is:
0.00016666652805511192
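As a sanity check, the exact value follows from the constants derived above (L = 1, q = 0.02): integrating w term by term over [0, 1] gives q/120 + C/24 + E/2, which matches the loop's result:
# exact integral of w over [0, 1] with q = 0.02, C = -0.01, D = 0, E = 0.01/12
print(0.02/120 - 0.01/24 + (0.01/12)/2)   # 0.00016666..., i.e. 1/6000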
Now to compare the two methods
Result comparison between the shooting method and the numerical method
The integral of w(x) over the beam:
Shooting method -> 0.000135085272117
Numerical method -> 0.00016666652805511192
That's a pretty good match; now let's check the plots:
[plots of w(x) from the shooting method and the numerical method]
From the plots it's even more obvious that we have a good match and that the results of the shooting method are correct.
To get even better results for the shooting method increase xSteps and u2_u4_maxNumbers to bigger numbers and you can also narrow down the u2Numbers and u4Numbers to the same set size but a smaller interval (around the best results from previous program runs). Keep in mind that setting xSteps and u2_u4_maxNumbers too high will cause your program to run for a very long time.
You need to transform the ODE into a first-order system. Setting u0 = w, one possible and commonly used system is
u0'=u1,
u1'=u2,
u2'=u3,
u3'=q(x)
This can be implemented as
def ODEfunc(u, x):
    return [u[1], u[2], u[3], q(x)]
Then make a function that shoots with experimental initial conditions and returns the components of the second boundary condition
def shoot(u01, u03):
    return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1, [0, 2]]
Now you have a function of two variables returning two components, and you need to solve this 2x2 system with the usual methods. As the system is linear, the shooting function is linear as well, so you only need to find its coefficients and solve the resulting linear system.
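Concretely, a minimal sketch of that linear solve (the unit beam length and the constant load q = 0.02 are illustrative values borrowed from the other answer): because shoot is affine in (u01, u03), three evaluations pin down the 2x2 system, and np.linalg.solve yields the exact initial slopes.
import numpy as np
from scipy.integrate import odeint

l = 1.0
def q(x):
    return 0.02

def ODEfunc(u, x):
    return [u[1], u[2], u[3], q(x)]

def shoot(u01, u03):
    return odeint(ODEfunc, [0, u01, 0, u03], [0, l])[-1, [0, 2]]

r0 = shoot(0.0, 0.0)                          # offset of the affine map
M = np.column_stack([shoot(1.0, 0.0) - r0,    # response to a unit w'(0)
                     shoot(0.0, 1.0) - r0])   # response to a unit w'''(0)
u01, u03 = np.linalg.solve(M, -r0)            # enforce w(l) = w''(l) = 0
print(u01, u03)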