gradient for fmin_tnc not working - python-3.x

I am training multiclass logistic regression for handwriting recognition. For function minimization I am using fmin_tnc.
I have implemented the gradient function as follows:
def gradient(theta, *args):
    X, y, lamda = args
    m = np.size(X, 0)
    h = X.dot(theta)
    grad = (1/m) * X.T.dot(sigmoid(h) - y)
    grad[1:np.size(grad),] = grad[1:np.size(grad),] + (lamda/m)*theta[1:np.size(theta),]
    return grad.flatten()
    # flattened because fmin_tnc expects the gradient as a flat 1-D array
This yields correct gradient values for the small test example provided below:
theta_t = np.array([[-2],[-1],[1],[2]])
X_t = np.array([[1,0.1,0.6,1.1],[1,0.2,0.7,1.2],[1,0.3,0.8,1.3],
                [1,0.4,0.9,1.4],[1,0.5,1,1.5]])
y_t = np.array([[1],[0],[1],[0],[1]])
lamda_t = 3
But when using the check_grad function from scipy, it reports an error of 0.6222474393497573.
I am not able to trace why this is happening. Perhaps because of this, fmin_tnc is not performing any optimization and always returns optimized parameters equal to the initial parameters given.

The fmin_tnc function call is as follows:
optimize.fmin_tnc(func=lrcostfunction, x0=initial_theta, fprime=gradient,
                  args=(X, tmp_y.flatten(), lamda))
Since the y and theta passed in are 1-D arrays of shape (n,), they should be converted to 2-D arrays of shape (n, 1), because the gradient implementation assumes the 2-D form.
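To see why the shapes matter: subtracting an (m, 1) column from an (m,) vector does not raise an error in numpy; it broadcasts to an (m, m) matrix and silently corrupts the gradient. A minimal sketch of the failure mode:

import numpy as np

h = np.zeros(5)       # shape (5,), the 1-D form fmin_tnc passes around
y = np.zeros((5, 1))  # shape (5, 1), the 2-D form the implementation assumes

print((h - y).shape)          # (5, 5) -- broadcast, not element-wise subtraction
print((h - y.ravel()).shape)  # (5,)   -- the intended result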
The correct implementation is as follows:
def gradient(theta, *args):
    # y and theta reshaped to 2-D for the reason above
    X, y, lamda = args
    l = np.size(X, 1)
    theta = np.reshape(theta, (l, 1))
    m = np.size(X, 0)
    y = np.reshape(y, (m, 1))
    h = sigmoid(X.dot(theta))
    grad = (1/m) * X.T.dot(h - y)
    grad[1:np.size(grad),] = grad[1:np.size(grad),] + (lamda/m)*theta[1:np.size(theta),]
    return grad.ravel()
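With the reshaping in place, the gradient can be re-verified on the small test set above. A minimal sketch, assuming lrcostfunction is the cost function from the question with the matching (theta, X, y, lamda) signature; check_grad should now report an error near zero:

from scipy.optimize import check_grad

# error should now be on the order of 1e-6 or smaller
err = check_grad(lrcostfunction, gradient, theta_t.ravel(),
                 X_t, y_t.ravel(), lamda_t)
print(err)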

Related

Linear spline loop to quadratic spline loop

import numpy as np
import numpy.linalg as la

def LinearSpline(x, fx):  # to determine the coefficients
    '''
    Valid call:
        coeffs = LinearSpline(x, fx)
    Inputs:
        x  : (array) x values at which we have f(x) values.
        fx : (array) f(x) values associated with x values.
    Output:
        coeffs : (array) coefficients of the linear spline.
    Assumption:
        All inputs are given correctly.
    '''
    nsegs = len(x) - 1
    A = np.zeros((2*nsegs, 2*nsegs))
    b = np.zeros((2*nsegs, 1))
    for i in range(nsegs):
        # segment i: a_i*x + b_i through (x[i], fx[i]) and (x[i+1], fx[i+1])
        A[2*i, 2*i] = x[i]
        A[2*i, 2*i+1] = 1.0
        A[2*i+1, 2*i] = x[i+1]
        A[2*i+1, 2*i+1] = 1.0
        b[2*i] = fx[i]
        b[2*i+1] = fx[i+1]
    # solve the system
    coeffs = la.solve(A, b)
    print(A)
    print(b)
    return coeffs
I created a linear spline loop and need to also create a quadratic one, but I am having trouble filling in the entries for the c coefficient. Any help would be greatly appreciated.
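One way to extend the same pattern to a quadratic spline: each segment i now carries three coefficients a_i, b_i, c_i for a_i*x^2 + b_i*x + c_i, so the system needs 3*nsegs equations. A minimal sketch under the standard textbook conditions (interpolation at both endpoints of every segment, first-derivative continuity at interior knots, and a_0 = 0 to close the system); adapt the closing condition if your assignment specifies a different one:

import numpy as np
import numpy.linalg as la

def QuadraticSpline(x, fx):
    nsegs = len(x) - 1
    n = 3 * nsegs
    A = np.zeros((n, n))
    b = np.zeros((n, 1))
    row = 0
    for i in range(nsegs):
        # columns 3*i, 3*i+1, 3*i+2 hold a_i, b_i, c_i for segment i
        A[row, 3*i:3*i+3] = [x[i]**2, x[i], 1.0]      # passes through (x[i], fx[i])
        b[row] = fx[i]
        row += 1
        A[row, 3*i:3*i+3] = [x[i+1]**2, x[i+1], 1.0]  # passes through (x[i+1], fx[i+1])
        b[row] = fx[i+1]
        row += 1
    for i in range(nsegs - 1):
        # derivative continuity at interior knot x[i+1]:
        # 2*a_i*x + b_i - 2*a_{i+1}*x - b_{i+1} = 0
        A[row, 3*i:3*i+2] = [2*x[i+1], 1.0]
        A[row, 3*(i+1):3*(i+1)+2] = [-2*x[i+1], -1.0]
        row += 1
    A[row, 0] = 1.0  # a_0 = 0: first segment is linear, closing the system
    coeffs = la.solve(A, b)
    return coeffs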

How to use scipy's least_squares

I am trying to implement a simple model estimation in Python.
I have an ARCH model:
logR_t = u + theta_1 * logR_{t-1} + \epsilon_t
where logR_t is my log-returns vector, u and theta_1 are the two parameters to be estimated, and \epsilon_t are my residuals.
In Matlab, I have the following lines to call the optimiser on the function Error_ARCH. The initial guess for the parameters is 1, their lower bounds are -10 and upper bounds are 10.
ARCH.param = lsqnonlin( @(param) Error_ARCH(param, logR), [1 1], [-10 -10], [10 10]);
[ARCH.Error, ARCH.Residuals] = Error_ARCH(ARCH.param, logR);
Where the error to minimise is given as:
function [error, residuals] = Error_ARCH(param, logreturns)
% Initialisation
y_hat = zeros(length(logreturns), 1);
% Parameters
u = param(1);
theta1 = param(2);
% Define model
ARCH = @(z) u + theta1.*z;
for i = 2:length(logreturns)
    y_hat(i) = ARCH(logreturns(i-1));
end
error = abs(logreturns - y_hat);
residuals = logreturns - y_hat;
end
I would like a similar thing in Python but I am stuck since I do not know where to specify the arguments to the least_squares function in SciPy. So far I have:
from scipy.optimize import least_squares

def model(param, z):
    """The model equation we are trying to estimate"""
    u = param[0]
    theta1 = param[1]
    return u + theta1*z

def residuals_ARCH(param, z):
    return z - model(param, z)
When I call the least_squares optimiser:
guess = [1, 1]
result = least_squares(residuals_ARCH, x0=guess, verbose=1, bounds=(-10, 10))
I get the error:
residuals_ARCH() missing 1 required positional argument: 'z'
Thank you for all your help
The least_squares method expects a function with signature fun(x, *args, **kwargs). Hence, you can use a lambda expression similar to your Matlab function handle:
# logR = your log-returns vector
result = least_squares(lambda param: residuals_ARCH(param, logR), x0=guess, verbose=1, bounds=(-10, 10))
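Alternatively, least_squares can forward extra positional arguments to the residual function through its args parameter, which avoids the lambda:

# logR = your log-returns vector
result = least_squares(residuals_ARCH, x0=guess, args=(logR,),
                       verbose=1, bounds=(-10, 10))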

How to resolve a ValueError in the SciPy function fmin_tnc?

I am trying to implement the Coursera assignments in Python, using SciPy's optimizer for logistic regression. However, I am getting the error below.
Can anyone help?
Note: the cost and gradient functions are working fine on their own.
# Sigmoid function
def sigmoid(z):
    h_of_z = np.zeros([z.shape[0]])
    h_of_z = np.divide(1, (1 + np.exp(-z)))
    return h_of_z

def cost(x, y, theta):
    m = y.shape[0]
    h_of_x = sigmoid(np.matmul(x, theta))
    term1 = sum(-1 * y.T @ np.log(h_of_x) - (1-y.T) @ np.log(1-h_of_x))
    J = 1/m * term1
    return J

def grad(x, y, theta):
    grad = np.zeros_like(theta)
    m = y.shape[0]
    h_of_x = sigmoid(x @ theta)
    grad = (x.T @ (h_of_x - y)) * (1/m)
    return grad

# add intercept term for X
x = np.hstack([np.ones_like(y), X[:, 0:2]])

# initialise theta
[m, n] = np.shape(x)
initial_theta = np.zeros([n, 1])

# optimising theta from given theta and gradient
result = opt.fmin_tnc(func=cost, x0=initial_theta, args=(x, y))
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 99 is different from 3)
I got it!
The problem is that fmin_tnc calls the objective with the parameter vector 'theta' as the first argument, followed by the extra arguments x and y.
Since my function 'cost' took x and y first, the values were passed in the wrong positions, which threw the ValueError.
Below is the corrected code:
def sigmoid(x):
    return 1/(1 + np.exp(-x))

def cost(theta, x, y):
    J = (-1/m) * np.sum(np.multiply(y, np.log(sigmoid(x @ theta)))
                        + np.multiply((1-y), np.log(1 - sigmoid(x @ theta))))
    return J

def gradient(theta, x, y):
    h_of_x = sigmoid(x @ theta)
    grad = 1/m * (x.T @ (h_of_x - y))
    return grad
# initialise theta
init_theta = np.zeros([n+1, 1])

# optimise theta
from scipy import optimize as op
result = op.fmin_tnc(func=cost,
                     x0=init_theta.flatten(),
                     fprime=gradient,
                     args=(x, y.flatten()))
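For reference, fmin_tnc returns a tuple of the solution array, the number of function evaluations, and a return code, so the optimised parameters can be unpacked like this:

theta_opt, nfeval, rc = result
print(theta_opt)  # optimised parameter vector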

How to calculate backpropagation through tf.while_loop to use as loss function

I want to implement a Fourier Ring Correlation loss for two images to train a GAN. Therefore I'd like to loop a specific number of times and calculate the loss. This works fine for a normal Python loop. To speed up the process I want to use tf.while_loop, but unfortunately I am not able to track the gradients through the while loop. I constructed a dummy example just to calculate gradients during a while loop, but it doesn't work. First, the working Python loop:
x = tf.constant(3.0)
y = tf.constant(2.0)

for i in range(3):
    y = y * x

grad = tf.gradients(y, x)

with tf.Session() as ses:
    print("output : ", ses.run(grad))
This works and gives the output
[54]
If I do the same with a tf.while_loop, it doesn't work:
a = tf.constant(0, dtype=tf.int64)
b = tf.constant(3, dtype=tf.int64)
x = tf.constant(3.0)
y = tf.constant(2.0)

def cond(a, b, x, y):
    return tf.less(a, b)

def body(a, b, x, y):
    y = y * x
    with tf.control_dependencies([y]):
        a = a + 1
    return [a, b, x, y]

results = tf.while_loop(cond, body, [a, b, x, y], back_prop=True)
grad = tf.gradients(y, results[2])

with tf.Session() as ses:
    print("grad : ", ses.run(grad))
The output is:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
So I guess TensorFlow is somehow not able to do the backpropagation. The problem still occurs if you use tf.GradientTape() instead of tf.gradients().
I changed the code so that it now outputs the gradients:
import tensorflow as tf

a = tf.constant(0, dtype=tf.int64)
b = tf.constant(3, dtype=tf.int64)
x = tf.Variable(3.0, tf.float32)
y = tf.Variable(2.0, tf.float32)
dy = tf.Variable(0.0, tf.float32)

def cond(a, b, x, y, dy):
    return tf.less(a, b)

def body(a, b, x, y, dy):
    y = y * x
    dy = tf.gradients(y, x)[0]
    with tf.control_dependencies([y]):
        a = a + 1
    return [a, b, x, y, dy]

init = tf.global_variables_initializer()

with tf.Session() as ses:
    ses.run(init)
    results = ses.run(tf.while_loop(cond, body, [a, b, x, y, dy], back_prop=True))
    print("grad : ", results[-1])
The things I modified:

- I made x and y into variables and added their initialisation init.
- I added a variable called dy which will contain the gradient of y.
- I moved the tf.while_loop inside the session.
- I put the evaluation of the gradient inside the body function.
I think the problem before was that when you define grad = tf.gradients(y, results[2]) the loop has not run yet, so y is not a function of x. Therefore, there is no gradient.
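An alternative that keeps the original constants is to differentiate the loop's output tensor instead of the pre-loop y, since the tensor returned by tf.while_loop is the one that actually depends on x. A minimal sketch, reusing cond and body exactly as defined in the question:

results = tf.while_loop(cond, body, [a, b, x, y], back_prop=True)
# results[3] is the final value of y after the loop
grad = tf.gradients(results[3], x)  # d(2*x^3)/dx = 6*x^2 = 54 at x = 3

with tf.Session() as ses:
    print("grad : ", ses.run(grad))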
Hope this helps.

Neural network numerical gradient check not working with matrices using Python-numpy

I'm trying to implement a simple numerical gradient check using Python 3 and numpy, to be used for a neural network.
It works well for simple 1D functions but fails when applied to matrices of parameters.
My guess is that either my cost function is not calculated well for a matrix or that the way I do the numerical gradient check is wrong somehow.
See code below and thanks for your help!
import numpy as np
import random
import copy

def gradcheck_naive(f, x):
    """ Gradient check for a function f.

    Arguments:
    f -- a function that takes a single argument (x) and outputs the
         cost (fx) and its gradient grad
    x -- the point (numpy array) to check the gradient at
    """
    rndstate = random.getstate()
    random.setstate(rndstate)
    fx, grad = f(x)  # Evaluate function value at original point
    # fx = cost
    # grad = gradient
    h = 1e-4

    # Iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index  # multi-index number

        random.setstate(rndstate)
        xp = copy.deepcopy(x)
        xp[ix] += h
        fxp, gradp = f(xp)

        random.setstate(rndstate)
        xn = copy.deepcopy(x)
        xn[ix] -= h
        fxn, gradn = f(xn)

        numgrad = (fxp - fxn) / (2*h)

        # Compare gradients
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (
                grad[ix], numgrad))
            return

        it.iternext()  # Step to next dimension

    print("Gradient check passed!")

# sanity check with 1D function
exp_f = lambda x: (np.sum(np.exp(x)), np.exp(x))
gradcheck_naive(exp_f, np.random.randn(4, 5))  # this works fine

# sanity check with matrices
# forward pass
W = np.random.randn(5, 10)
x = np.random.randn(10, 3)
D = W.dot(x)

# backpropagation pass
gradx = W
func_f = lambda x: (np.sum(W.dot(x)), gradx)
gradcheck_naive(func_f, np.random.randn(10, 3))  # this does not work (grad check fails)
I figured it out! (My math teacher would be so proud...)
The short answer is that I was mixing up the matrix dot product and the element-wise product.
When using an element-wise product, the gradient is:
W = np.array([[2,4],[3,5],[3,1]])
x = np.array([[1,7],[5,-1],[4,7]])
D = W*x  # element-wise multiplication

gradx = W
func_f = lambda x: (np.sum(W*x), gradx)
gradcheck_naive(func_f, np.random.randn(3, 2))
When using the dot product, the gradient becomes:
W = np.array([[2,4],[3,5]])
x = np.array([[1,7],[5,-1],[5,1]])
D = x.dot(W)

unitary = np.array([[1,1],[1,1],[1,1]])
gradx = unitary.dot(np.transpose(W))
func_f = lambda x: (np.sum(x.dot(W)), gradx)
gradcheck_naive(func_f, np.random.randn(3, 2))
I was also wondering how the element-wise product behaves with arrays of unequal dimensions, like below:
x = np.random.randn(10)
W = np.random.randn(3,10)
D1 = x*W
D2 = W*x
It turns out that D1 == D2 (same shape as W, 3x10), and my understanding is that x is broadcast by numpy to a 3x10 matrix to allow the element-wise multiplication.
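That matches numpy's broadcasting rule: the (10,) vector is treated as shape (1, 10) and repeated along the first axis. A quick way to confirm:

print(np.allclose(D1, D2))              # True -- element-wise product commutes
print(D1.shape)                         # (3, 10)
print(np.allclose(D1, x[None, :] * W))  # True -- explicit (1, 10) broadcast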
Conclusion: when in doubt, write it out with small matrices to figure out where the error is.
