I am trying to compute the gradient of
out = x.sign()*torch.pow(x.abs(), alpha)
with respect to alpha.
I tried the following so far:
class Power(nn.Module):
def __init__(self, alpha=2.):
super(Power, self).__init__()
self.alpha = nn.Parameter(torch.tensor(alpha))
def forward(self, x):
return x.sign()*torch.abs(x)**self.alpha
but this class keeps giving me nan in training of my network. I expect to see something like grad=out*torch.log(x) but cannot get to it. This code, for instance, returns nothing:
alpha_rooting = Power()
x = torch.randn((1), device='cpu', dtype=torch.float)
out = (alpha_rooting(x)).sum()
out.backward()
print(out.grad)
I am trying to use autograd for this by no luck either. How should I go about solving this? Thanks.
The Power() class that you have written works as expected. There is a problem in how you are using it actually. Gradients are stored in .grad of that variable and not the out variable as you have used above. You can change the code as below.
alpha_rooting = Power()
x = torch.randn((1), device='cpu', dtype=torch.float)
out = (alpha_rooting(x)).sum()
# compute gradients of all parameters with respect to out (dout/dparam)
out.backward()
# print gradient of alpha
# Note that gradients are store in .grad of parameter not out variable
print(alpha_rooting.alpha.grad)
# compare if it is approximately correct to exact grads
err = (alpha_rooting.alpha.grad - out*torch.log(x))**2
if (err <1e-8):
print("Gradients are correct")
Related
I was reading this blog from PyTorch. Just before the AutoGrad in training Section , it is mentioned
Be aware that only leaf nodes of the computation have their gradients computed. If you tried, for example, print(c.grad) you’d get back None. In this simple example, only the input is a leaf node, so only it has gradients computed.
Then weights are also considered to be leaf nodes. In the subsequent AutoGrad in training Section this below code block is executed.
BATCH_SIZE = 16
DIM_IN = 1000
HIDDEN_SIZE = 100
DIM_OUT = 10
class TinyModel(torch.nn.Module):
def __init__(self):
super(TinyModel, self).__init__()
self.layer1 = torch.nn.Linear(1000, 100)
self.relu = torch.nn.ReLU()
self.layer2 = torch.nn.Linear(100, 10)
def forward(self, x):
x = self.layer1(x)
x = self.relu(x)
x = self.layer2(x)
return x
some_input = torch.randn(BATCH_SIZE, DIM_IN, requires_grad=False)
ideal_output = torch.randn(BATCH_SIZE, DIM_OUT, requires_grad=False)
model = TinyModel()
When
print(model.layer2.weight.grad)
is executed it is shown as None.
But after training with the below code snippet, the weights have gradients,
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
prediction = model(some_input)
loss = (ideal_output - prediction).pow(2).sum()
loss.backward()
print(model.layer2.weight.grad[0][0:10])
So is my understanding correct? i.e when weights are initialised by calling TinyModel(), the requires_autograd is set to False. Only when the training starts happening with loss.backward() then the requires_autograd is set to True and the gradient is kept track?
But in other examples, when we create PyTorch models from scratch, where we initialise the weights randomly along with requires_grad=True, the gradient is tracked from beginning.
Or is the gradient tracking generally enabled only when it is started to be trained? If so why initially it was returning None in the above example.
Thank You in advance
I assume by requires_autograd you mean requires_grad.
when weights are initialised by calling TinyModel(), the requires_autograd is set to False.
No, this isn't true. The attribute model.layer2.weight is an instance of nn.Parameter which has requires_grad == True by default. You can verify this yourself:
model = TinyModel()
assert isinstance(model.layer2.weight, nn.Parameter)
assert model.layer2.weight.requires_grad
assert all(p.requires_grad for p in model.parameters())
why initially it was returning None in the above example
The value of model.layer2.weight.grad is None because at that point no gradient is computed yet. In fact, no forward computation is even computed yet. When loss.backward() is executed, the autograd engine computes the gradient of all tensor p with p.requires_grad == True and stores this gradient in p.grad. That's why model.layer2.weight.grad is no longer None after loss.backward().
i'm trying to define the loss function of a two-class classification problem. However, the target label is not hard label 0,1, but a float number between 0~1.
torch.nn.CrossEntropy in Pytorch do not support soft label so i'm trying to write a cross entropy function by my self.
My function looks like this
def cross_entropy(self, pred, target):
loss = -torch.mean(torch.sum(target.flatten() * torch.log(pred.flatten())))
return loss
def step(self, batch: Any):
x, y = batch
logits = self.forward(x)
loss = self.criterion(logits, y)
preds = logits
# torch.argmax(logits, dim=1)
return loss, preds, y
however it does not work at all.
Can anyone give me a suggestion is there any mistake in my loss function?
It seems like BCELoss and the robust version BCEWithLogitsLoss are working with fuzzy targets "out of the box". They do not expect target to be binary" any number between zero and one is fine.
Please read the doc.
I'm exporting a PyTorch model via TorchScript tracing, but I'm facing issues. Specifically, I have to perform some operations on tensor sizes, but the JIT compilers hardcodes the variable shapes as constants, braking compatibility with tensor of different sizes.
For example, create the class:
class Foo(nn.Module):
"""Toy class that plays with tensor shape to showcase tracing issue.
It creates a new tensor with the same shape as the input one, except
for the last dimension, which is doubled. This new tensor is filled
based on the values of the input.
"""
def __init__(self):
nn.Module.__init__(self)
def forward(self, x):
new_shape = (x.shape[0], 2*x.shape[1]) # incriminated instruction
x2 = torch.empty(size=new_shape)
x2[:, ::2] = x
x2[:, 1::2] = x + 1
return x2
and run the test code:
x = torch.randn((3, 5)) # create example input
foo = Foo()
traced_foo = torch.jit.trace(foo, x) # trace
print(traced_foo(x).shape) # obviously this works
print(traced_foo(x[:, :4]).shape) # but fails with a different shape!
I could solve the issue by scripting, but in this case I really need to use tracing. Moreover, I think that tracing should be able to handle tensor size manipulations correctly.
but in this case I really need to use tracing
You can freely mix torch.script and torch.jit wherever needed. For example one could do this:
import torch
class MySuperModel(torch.nn.Module):
def __init__(self, *args, **kwargs):
super().__init__()
self.scripted = torch.jit.script(Foo(*args, **kwargs))
self.traced = Bar(*args, **kwargs)
def forward(self, data):
return self.scripted(self.traced(data))
model = MySuperModel()
torch.jit.trace(model, (input1, input2))
You could also move part of the functionality dependent on shape to separate function and decorate it with #torch.jit.script:
#torch.jit.script
def _forward_impl(x):
new_shape = (x.shape[0], 2*x.shape[1]) # incriminated instruction
x2 = torch.empty(size=new_shape)
x2[:, ::2] = x
x2[:, 1::2] = x + 1
return x2
class Foo(nn.Module):
def forward(self, x):
return _forward_impl(x)
There is no other way than script for that as it has to understand your code. With tracing it merely records operations you perform on the tensor and has no knowledge of control flow dependent on data (or shape of data).
Anyway, this should cover most of the cases and if it doesn't you should be more specific.
This bug has been fixed on 1.10.2
I am using U - Net and implementing the weighting technique described in the papers from 2015 (U-Net: Convolutional Networks for Biomedical
Image Segmentation) and 2019 (U-Net – Deep Learning for Cell Counting, Detection, and Morphometry). In that technique there is a variance σ and a weight w_0. I would like, especially the σ, to be a learnable parameter instead of guessing which value is best from dataset to dataset.
From what I found, I can do this using nn.Parameter.
To use the learned σ from epoch to epoch, I need somehow to pass this new value to the get_item function of the DataSet through the DataLoader.
My current take on this, is to extend torch.utils.data.DataLoader where the new init has an extra parameter accepting the user specified/learnable parameters. Given the source code of torch.utils.data.DataLoader, I do not understand where and how the DataLoader calls the DataSet instance and hence to pass these parameters.
Code wise, in the DataSet definition there is the function
def __getitem__(self, index):
that I can change as
def __getitem__(self, index, sigma):
and to make use of the updated, newly learned σ.
My problem is that during training, I iterate through training dataset as
for epoch in range( checkpoint[ 'epoch'], num_epochs):
....
for ii, ( X, y, y_weight, fname) in enumerate( dataLoader[ phase]):
In that enumeration of DataLoader, how can I pass the new σ to the DataLoader such that the DataLoader will pass it to the DataSet getitem function mentioned above?
EDIT
Currently, I define inside the DataSet class a parameter sigma
class MedicalImageDataset( Dataset):
def __init__(self, fname, img_transform = None, mask_transform = None, weight_transform = None, sigma = 8):
...
self.sigma = sigma
def __getitem__(self, index):
sigma = self.sigma
...
which I update through the DataLoader as
dataLoader[ 'train'].dataset.sigma = model.sigma
where,
model.sigma
is a custom parameter defined as
model.register_parameter( name = 'sigma', param = torch.nn.Parameter( torch.tensor( 16, dtype = torch.float16), requires_grad = True))
after creating the model.
My problem is, that model.sigma doesn't look being updated from epoch to epoch. Specifically, is the same as the initial value. Why is this?
Having a look at optimizer.state_dict() I couldn't find any parameter named 'sigma', whereas I can find one in model.named_parameters().
Finally, this parameter sigma is not attached to any layer, it's kinda "free".
What you need to do is to set sigma as an attribute of the Dataset and change it between epochs.
For the dataset definition
class UNetDataset(object):
def __init__(self, ..., sigma=5):
self.sigma = sigma
Now, within __getitem__, you can use the sigma value using self.sigma
Now within your training cycle, after every epoch, you can change the sigma value by setting the sigma attribute of the Dataset
for epoch in range(num_epochs):
dataset.sigma = #whatever value you want
for i,(x,y) in enumarate(DataLoader):
Given a simple 2 layer neural network, the traditional idea is to compute the gradient w.r.t. the weights/model parameters. For an experiment, I want to compute the gradient of the error w.r.t the input. Are there existing Pytorch methods that can allow me to do this?
More concretely, consider the following neural network:
import torch.nn as nn
import torch.nn.functional as F
class NeuralNet(nn.Module):
def __init__(self, n_features, n_hidden, n_classes, dropout):
super(NeuralNet, self).__init__()
self.fc1 = nn.Linear(n_features, n_hidden)
self.sigmoid = nn.Sigmoid()
self.fc2 = nn.Linear(n_hidden, n_classes)
self.dropout = dropout
def forward(self, x):
x = self.sigmoid(self.fc1(x))
x = F.dropout(x, self.dropout, training=self.training)
x = self.fc2(x)
return F.log_softmax(x, dim=1)
I instantiate the model and an optimizer for the weights as follows:
import torch.optim as optim
model = NeuralNet(n_features=args.n_features,
n_hidden=args.n_hidden,
n_classes=args.n_classes,
dropout=args.dropout)
optimizer_w = optim.SGD(model.parameters(), lr=0.001)
While training, I update the weights as usual. Now, given that I have values for the weights, I should be able to use them to compute the gradient w.r.t. the input. I am unable to figure out how.
def train(epoch):
t = time.time()
model.train()
optimizer.zero_grad()
output = model(features)
loss_train = F.nll_loss(output[idx_train], labels[idx_train])
acc_train = accuracy(output[idx_train], labels[idx_train])
loss_train.backward()
optimizer_w.step()
# grad_features = loss_train.backward() w.r.t to features
# features -= 0.001 * grad_features
for epoch in range(args.epochs):
train(epoch)
It is possible, just set input.requires_grad = True for each input batch you're feeding in, and then after loss.backward() you should see that input.grad holds the expected gradient. In other words, if your input to the model (which you call features in your code) is some M x N x ... tensor, features.grad will be a tensor of the same shape, where each element of grad holds the gradient with respect to the corresponding element of features. In my comments below, I use i as a generalized index - if your parameters has for instance 3 dimensions, replace it with features.grad[i, j, k], etc.
Regarding the error you're getting: PyTorch operations build a tree representing the mathematical operation they are describing, which is then used for differentiation. For instance c = a + b will create a tree where a and b are leaf nodes and c is not a leaf (since it results from other expressions). Your model is the expression, and its inputs as well as parameters are the leaves, whereas all intermediate and final outputs are not leaves. You can think of leaves as "constants" or "parameters" and of all other variables as of functions of those. This message tells you that you can only set requires_grad of leaf variables.
Your problem is that at the first iteration, features is random (or however else you initialize) and is therefore a valid leaf. After your first iteration, features is no longer a leaf, since it becomes an expression calculated based on the previous ones. In pseudocode, you have
f_1 = initial_value # valid leaf
f_2 = f_1 + your_grad_stuff # not a leaf: f_2 is a function of f_1
to deal with that you need to use detach, which breaks the links in the tree, and makes the autograd treat a tensor as if it was constant, no matter how it was created. In particular, no gradient calculations will be backpropagated through detach. So you need something like
features = features.detach() - 0.01 * features.grad
Note: perhaps you need to sprinkle a couple more detaches here and there, which is hard to say without seeing your whole code and knowing the exact purpose.