What's the correct way to do gradient descent on an arbitrary function with no input using Pytorch?
x = torch.tensor(x_init, requires_grad=True)
opt = torch.optim.Adam([x])
cost_fnx = cost(x)
for iteration_count in range(100):
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
When I tried the above, I got this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
The error occurs because you are trying to backpropagate through the same graph multiple times. You most likely need to recompute the cost value inside the loop (your cost function only has the model's parameters as input) so that a fresh graph is built before each backward pass. Something like:
x = x_init.requires_grad_(True)
opt = torch.optim.Adam([x])
for iteration_count in range(100):
    cost_fnx = cost(x)  # recompute the cost so each iteration builds a fresh graph
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
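For completeness, here is a minimal runnable sketch of the same loop; the quadratic cost and x_init below are made up purely for illustration (they are assumptions, not from the question):

import torch

# A made-up quadratic cost, purely for illustration; substitute your own.
def cost(x):
    return ((x - 2.0) ** 2).sum()

x_init = torch.randn(5)
x = x_init.clone().requires_grad_(True)
opt = torch.optim.Adam([x])
for iteration_count in range(100):
    cost_fnx = cost(x)  # rebuild the graph every iteration
    opt.zero_grad()
    cost_fnx.backward()
    opt.step()
print(x)  # entries move toward 2.0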
I'm trying to create a matrix of gradients containing the gradient of each observation with respect to the parameters, per epoch. If my model has 100 observations, 1000 parameters, and 10 epochs, the matrix should have shape (100, 1000, 10).
The problem is that I'm not able to get those gradients. The parameters and the observations are set with requires_grad=True.
I've tried to run this after each observation passes through the net:
for p in net.parameters():
    paramgradlist.append(p.grad)
But the gradients of each parameter stay the same for all observations.
Thank you
You are not copying the data; you are storing a reference to the gradient tensors. In the end, this means all your stored entries will be identical (namely, the gradients' final values).
Instead, you could clone the gradients before appending them to the list:
for p in net.parameters():
    paramgradlist.append(p.grad.clone())
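Building on that, here is a minimal sketch of collecting per-observation gradients across epochs into a single tensor; the names net, loss_fn, data, and num_epochs are assumptions for illustration, not from the question:

import torch

# Hypothetical sketch: `net`, `loss_fn`, `data` (a list of (x, y) pairs),
# and `num_epochs` are assumed to exist.
all_grads = []
for epoch in range(num_epochs):
    epoch_grads = []
    for x, y in data:
        net.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        # clone() so later backward passes don't overwrite the stored values
        flat = torch.cat([p.grad.clone().flatten() for p in net.parameters()])
        epoch_grads.append(flat)
    all_grads.append(torch.stack(epoch_grads))
grad_matrix = torch.stack(all_grads, dim=-1)  # shape: (obs, n_params, epochs)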
I have some losses computed in a loop and stored in a tensor loss. Now I want to multiply a weight tensor by the loss tensor to get the final loss, but after torch.dot(), the resulting scalar, ll_new, has requires_grad=False. The following is my code:
loss_vector = torch.FloatTensor(total_loss_q)
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector, w_norm)
How can I have requires_grad=True for ll_new after doing the above?
I think the issue is in the line loss_vector = torch.FloatTensor(total_loss_q), as requires_grad for loss_vector is False (the default). So you should do:
loss_vector = torch.tensor(total_loss_q, dtype=torch.float, requires_grad=True)
The issue most likely lies within this part:
I have some losses in a loop storing them in a tensor loss
You are most likely losing requires_grad somewhere in the process before torch.dot, e.g. if you use something like .item() on the individual losses when constructing the total_loss_q tensor.
What type is your total_loss_q? If it is a list of plain Python numbers, then there is no way your gradients will propagate through it. You need to construct total_loss_q in such a way that it is a tensor which knows how each individual loss was constructed (i.e. can propagate gradients to your trainable weights).
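To make that concrete, here is a minimal sketch of keeping the losses differentiable by stacking the loss tensors instead of rebuilding a new tensor from them; the per-step loss below is a stand-in, purely for illustration:

import torch
import torch.nn.functional as F

# Keep the individual losses as tensors (no .item()!) and stack them,
# so the weighted sum stays connected to the graph.
params = torch.randn(3, requires_grad=True)
total_loss_q = [(params * q).pow(2).mean() for q in range(1, 6)]  # stand-in losses

loss_vector = torch.stack(total_loss_q)  # preserves requires_grad=True
w_norm = F.softmax(loss_vector, dim=0)
ll_new = torch.dot(loss_vector, w_norm)
print(ll_new.requires_grad)              # True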
Suppose I have a simple one-hidden-layer network that I'm training in the typical way:
for x, y in trainData:
    optimizer.zero_grad()
    out = self(x)
    loss = self.lossfn(out, y)
    loss.backward()
    optimizer.step()
This works as expected, but if I instead pre-allocate and update the output array, I get an error:
out = torch.empty_like(trainData.tensors[1])
for i, (x, y) in enumerate(trainData):
    optimizer.zero_grad()
    out[i] = self(x)
    loss = self.lossfn(out[i], y)
    loss.backward()
    optimizer.step()
RuntimeError: Trying to backward through the graph a second time, but
the buffers have already been freed. Specify retain_graph=True when
calling backward the first time.
What's happening here, such that in the second version PyTorch attempts to backward through the graph again? Why is this not an issue in the first version? (Note that this error occurs even if I don't zero_grad().)
The error implies that the program is trying to backpropagate through a set of operations a second time. The first time you backpropagate through a set of operations, pytorch deletes the computational graph to free memory. Therefore, the second time you try to backpropagate it fails as the graph has already been deleted.
Short answer
Use loss.backward(retain_graph=True). This will not delete the computational graph.
Detailed answer
In the first version, each loop iteration generates a new computational graph when out = self(x) is run. Every iteration's graph is simply:
out = self(x) -> loss = self.lossfn(out, y)
In the second version, since out is declared outside the loop, the computational graphs of all iterations share a parent node outside:
      |- out[i] = self(x) -> loss = self.lossfn(out[i], y)
out - |- out[i] = self(x) -> loss = self.lossfn(out[i], y)
      |- out[i] = self(x) -> loss = self.lossfn(out[i], y)
Therefore, here's a timeline of what happens:
1. The first iteration runs and backpropagates.
2. The computational graph is deleted, including the shared parent node.
3. The second iteration attempts to backpropagate, but fails since it cannot find the parent node.
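As an alternative to retain_graph=True (my own suggestion, not part of the answer above): if the pre-allocated out is only needed for recording predictions, detaching before storing keeps each iteration's graph independent, so nothing links the iterations together:

out = torch.empty_like(trainData.tensors[1])
for i, (x, y) in enumerate(trainData):
    optimizer.zero_grad()
    pred = self(x)
    out[i] = pred.detach()       # store a graph-free copy
    loss = self.lossfn(pred, y)  # backprop through this iteration's graph only
    loss.backward()
    optimizer.step()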
I've been trying to implement a custom objective function in Keras (the negative log likelihood of the normal distribution).
Keras expects one argument for the ground-truth tensor and one for the predictions tensor; for y_pred, I'm passing a tensor that should represent an n x 2 matrix where the first column is the mean of the distribution and the second the precision.
My problem is that I haven't been able to figure out how to properly slice y_pred before passing it into the likelihood function without yielding the error:
'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?'
While I understand that I'm feeding l_func arguments of the variable type when it expects an array, I don't seem to be able to grok how to properly split the input y_pred variable into its mean and precision components to plug into the likelihood function. Here are some attempts; if someone could enlighten me about how to proceed, I would greatly appreciate it.
def log_likelihood(y_true, y_pred):
    mu = T.vector('mu')
    beta = T.vector('beta')
    x = T.vector('x')
    likelihood = .5*(beta*(x - mu)**2) - T.log(beta/(2*np.pi))
    l_func = function([mu, beta, x], likelihood)
    return l_func(y_pred[:,0], y_pred[:,1], y_true)
def log_likelihood(y_true, y_pred):
    likelihood = .5*(y_pred[:,1]*(y_true - y_pred[:,0])**2) - T.log(y_pred[:,1]/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
def log_likelihood(y_true, y_pred):
    mu = y_pred[:,0]
    beta = y_pred[:,1]
    x = y_true
    mu_function = function([y_pred], mu)
    beta_function = function([y_pred], beta)
    id_function = function([y_true], x)
    likelihood = .5*(beta_function(y_pred)*(id_function(y_true) - mu_function(y_pred))**2) - T.log(beta_function(y_pred)/(2*np.pi))
    l_func = function([y_true, y_pred], likelihood)
    return l_func(y_true, y_pred)
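For what it's worth, here is a sketch of what Keras's Theano backend expects (this is my assumption, not a confirmed answer): the loss must remain a symbolic expression of y_true and y_pred, with no theano.function compiled inside it, so the slicing is done directly on the symbolic tensors:

import numpy as np
import theano.tensor as T

# Sketch under the assumption that y_pred is an (n, 2) symbolic matrix
# holding [mean, precision]; no theano.function is compiled here.
def log_likelihood(y_true, y_pred):
    mu = y_pred[:, 0]
    beta = y_pred[:, 1]
    x = T.flatten(y_true)
    # same expression as in the attempts above, reduced to a scalar
    likelihood = .5 * (beta * (x - mu) ** 2) - T.log(beta / (2 * np.pi))
    return T.mean(likelihood)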
For diagnostic purposes, I am grabbing the gradients of the network periodically. One way to do this is to return the gradients as output of the theano function. However, copying the gradients from the GPU to CPU memory every time may be costly so I would prefer to do it only periodically. At the moment, I am achieving this by creating two function objects, one which returns the gradient and one which doesn't.
However, I do not know whether this is optimal and am looking for a more elegant way to achieve the same thing.
Your first function obviously executes a training step and updates all your parameters.
The second function must return the gradients of your parameters.
The fastest way to do what you are asking is to add the training-step updates to the second function as well; then, when logging the gradients, call only the second function instead of the first:
gradients = [ ... ]
train_f = theano.function([x, y], [], updates=updates)
train_grad_f = theano.function([x, y], gradients, updates=updates)

num_iters = 1000
grad_array = []
for i in range(num_iters):
    # every 10 training steps, keep a log of the gradients
    if i % 10 == 0:
        grad_array.append(train_grad_f(...))
    else:
        train_f(...)
Update
If you wish to have a single function to do this, you can do the following:
from theano.ifelse import ifelse

no_grad = T.iscalar('no_grad')
example_gradient = T.grad(example_cost, example_variable)

# if no_grad is > 0, return the gradient; otherwise return a zeros array
out_grad = ifelse(T.gt(no_grad, 0), example_gradient, T.zeros_like(example_variable))

train_f = theano.function([x, y, no_grad], [out_grad], updates=updates)
So, when you want to retrieve the gradients, you call:
train_f(x_data, y_data, 1)
otherwise
train_f(x_data, y_data, 0)