I'm exporting a PyTorch model via TorchScript tracing, but I'm running into issues. Specifically, I have to perform some operations on tensor sizes, but the JIT compiler hardcodes the resulting shapes as constants, breaking compatibility with tensors of different sizes.
For example, create the class:
import torch
import torch.nn as nn

class Foo(nn.Module):
    """Toy class that plays with tensor shape to showcase tracing issue.

    It creates a new tensor with the same shape as the input one, except
    for the last dimension, which is doubled. This new tensor is filled
    based on the values of the input.
    """
    def __init__(self):
        nn.Module.__init__(self)

    def forward(self, x):
        new_shape = (x.shape[0], 2 * x.shape[1])  # incriminated instruction
        x2 = torch.empty(size=new_shape)
        x2[:, ::2] = x
        x2[:, 1::2] = x + 1
        return x2
and run the test code:
x = torch.randn((3, 5)) # create example input
foo = Foo()
traced_foo = torch.jit.trace(foo, x) # trace
print(traced_foo(x).shape) # obviously this works
print(traced_foo(x[:, :4]).shape) # but fails with a different shape!
I could solve the issue by scripting, but in this case I really need to use tracing. Moreover, I think that tracing should be able to handle tensor size manipulations correctly.
"but in this case I really need to use tracing"
You can freely mix torch.jit.script and torch.jit.trace wherever needed. For example, one could do this:
import torch

class MySuperModel(torch.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # Foo (shape-dependent) is scripted; Bar stands in for any other module
        # whose behaviour can safely be traced.
        self.scripted = torch.jit.script(Foo(*args, **kwargs))
        self.traced = Bar(*args, **kwargs)

    def forward(self, data):
        return self.scripted(self.traced(data))

model = MySuperModel()
torch.jit.trace(model, (example_input,))  # example_input: a representative tensor
You could also move the part of the functionality that depends on shape to a separate function and decorate it with @torch.jit.script:
@torch.jit.script
def _forward_impl(x):
    new_shape = (x.shape[0], 2 * x.shape[1])  # incriminated instruction
    x2 = torch.empty(size=new_shape)
    x2[:, ::2] = x
    x2[:, 1::2] = x + 1
    return x2

class Foo(nn.Module):
    def forward(self, x):
        return _forward_impl(x)
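A quick sanity check of this split (a sketch, assuming the definitions above and the same kind of example input as in the question): because the scripted _forward_impl keeps the size computation symbolic, the traced module now follows the input shape.

foo = Foo()
traced_foo = torch.jit.trace(foo, torch.randn(3, 5))
print(traced_foo(torch.randn(3, 5)).shape)  # torch.Size([3, 10])
print(traced_foo(torch.randn(3, 4)).shape)  # torch.Size([3, 8]) instead of a shape mismatch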
There is no way around scripting here, as the compiler has to understand your code. Tracing merely records the operations you perform on the tensor and has no knowledge of control flow that depends on data (or on the shape of data).
Anyway, this should cover most of the cases, and if it doesn't, you should be more specific.
This bug has been fixed in 1.10.2.
Related
Let's say I have a DataLoader
dataloader = DataLoader(dataset, batch_size=32)
I want to define a neural network that feeds forward through my custom function. I know that in a traditional fully-connected network I could just use nn.Linear or other existing layers, and, importantly, those layers can be fed from a DataLoader and automatically run on a whole batch.
Now I want to define my own function, but I don't know how to write it without a for loop. For example (f(x) below is just an arbitrary custom function I wrote):
import numpy as np
import torch
import torch.nn as nn

def f(x):
    x = np.sin(np.exp(x)) + np.log(x) - 1/x
    return x

class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        result = torch.zeros(len(x_batch))
        for i in range(len(x_batch)):
            result[i] = self.net(x_batch[i])
        return result
Or, with the for loop inside f:
def f(x_batch):
    result = torch.zeros(len(x_batch))
    for i in range(len(x_batch)):
        result[i] = np.sin(np.exp(x_batch[i])) + np.log(x_batch[i]) - 1/x_batch[i]
    return result

class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        return self.net(x_batch)
Is there any way to get rid of the for loop, so the calculation can be done in parallel on the GPU? I think a for loop is not going to take advantage of the GPU. Or did I misunderstand something?
Basic mathematical operators implemented in PyTorch work with n-dimensional tensors out of the box. If you define your f function this way, you can feed the whole batch in one go.
In your case you could do:
def f(x):
    x = torch.sin(torch.exp(x)) + torch.log(x) - 1/x
    return x

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        result = self.net(x_batch)
        return result
>>> net = NeuralNet()
>>> net(torch.tensor([[1., 2., 3.]]))
If you stick with PyTorch operators (most have a backward function implemented) and don't switch to NumPy at some point, you will be able to call the backward pass on your model.
Of course, all of these computations can be carried out on the GPU!
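For instance, a minimal sketch of feeding whole batches to the function on the GPU (assuming `dataset` is some Dataset yielding float tensors; the names here are placeholders, not from the original question):

import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = NeuralNet().to(device)

dataloader = DataLoader(dataset, batch_size=32)
for x_batch in dataloader:
    out = net(x_batch.to(device))  # f is applied element-wise to the whole batch at once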
PennyLane provides a Torch layer:
https://pennylane.readthedocs.io/en/stable/code/api/pennylane.qnn.TorchLayer.html
I am re-implementing the Invertible Residual Networks architecture.
class iResNetBlock(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.bottleneck = nn.Sequential(
            LinearContraction(input_size, hidden_size),
            LinearContraction(hidden_size, input_size),
            nn.ReLU(),
        )

    def forward(self, x):
        return x + self.bottleneck(x)

    def inverse(self, y):
        x = y.clone()
        while not converged:
            # fixed point iteration
            x = y - self.bottleneck(x)
        return x
I want to add a custom backward pass to the inverse function. Since it is a fixed point iteration, one can make use of the implicit function theorem to avoid unrolling of the loop, and instead compute the gradient by solving a linear system. This is for example done in the Deep Equilibrium Models architecture.
def inverse(self, y):
    with torch.no_grad():
        x = y.clone()
        while not converged:
            # fixed point iteration
            x = y - self.bottleneck(x)
    return x

def custom_backward_inverse(self, grad_output):
    pass
How do I register my custom backward pass for this function? I want that, when I later define some loss such as r = loss(y, model.inverse(other_model(model(x)))), calling r.backward() correctly uses my custom gradient for the inverse call.
Ideally the solution should be torchscript-compatible.
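For context, the usual mechanism for attaching a hand-written gradient in PyTorch is a torch.autograd.Function subclass. The sketch below only shows where such a backward would live; the fixed iteration count and the pass-through backward are placeholders, not the actual implicit-function-theorem linear solve, and TorchScript compatibility is not addressed here.

import torch

class FixedPointInverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, y, bottleneck):
        with torch.no_grad():
            x = y.clone()
            for _ in range(100):  # stand-in for the 'while not converged' check
                x = y - bottleneck(x)
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Placeholder: the real implementation would solve the linear system
        # given by the implicit function theorem here instead of passing
        # grad_output straight through.
        return grad_output, None

Inside inverse one would then call FixedPointInverse.apply(y, self.bottleneck).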
I'm trying to develop a layer in Keras which works with 3D tensors. To make it flexible, I would like to postpone the code that relies on the input's exact shape as much as possible.
My layer is overriding 5 methods:
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Layer

class MyLayer(Layer):
    def __init__(self, **kwargs):
        pass

    def build(self, input_shape):
        pass

    def call(self, inputs, verbose=False):
        second_dim = K.int_shape(inputs)[-2]
        # Do something with the second_dim

    def compute_output_shape(self, input_shape):
        pass

    def get_config(self):
        pass
And I'm using this layer like this:
input = Input(batch_shape=(None, None, 128), name='input')
x = MyLayer(name='my_layer')(input)
model = Model(input, x)
But I'm facing an error since second_dim is None. How can I develop a layer that relies on the dimensions of the input, but is fine with the dimension being provided by the actual data rather than by the input layer?
I ended up asking the same question differently, and I've got a perfect answer:
What is the right way to manipulate the shape of a tensor when there are unknown elements in it?
The gist of it is: don't treat the dimensions directly. Use them by reference and not by value. So, do not use K.int_shape; instead use K.shape, and use Keras operations to compose a new shape:
shape = K.shape(x)
newShape = K.concatenate([
    shape[0:1],
    shape[1:2] * shape[2:3],
    shape[3:4]
])
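The symbolic newShape can then be fed to shape-consuming ops; for instance (a hypothetical usage, here merging the second and third dimensions of a 4-D tensor):

x_reshaped = K.reshape(x, newShape)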
I was following this method (https://discuss.pytorch.org/t/dynamic-parameter-declaration-in-forward-function/427) to dynamically assign parameters in the forward function. However, my parameter is not just a single weight tensor; it is an nn.Sequential.
When I implement the code below:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, input):
        if self.W_di is None:
            self.W_di = nn.Sequential(
                nn.Linear(mL_n * 2, 1024),
                nn.ReLU(),
                nn.Linear(1024, self.hS)).to(device)
I get the following error.
TypeError: cannot assign 'torch.nn.modules.container.Sequential' as parameter 'W_di' (torch.nn.Parameter or None expected)
Is there any way that I can register nn.Sequential as a whole param? Thanks!
If you or other users still have this problem, one solution to consider is using nn.ModuleList instead of nn.Sequential.
While nn.Sequential is useful for defining a fixed sequence of layers in PyTorch, nn.ModuleList is a more flexible container that allows direct access and modification of individual layers within the list. This can be especially helpful when dealing with dynamic models or architectures that require more complex layer arrangements.
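A minimal sketch of that idea, assuming the same layer sizes as in the question (mL_n and hS are the placeholders from the question's snippet):

import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, mL_n, hS):
        super().__init__()
        # nn.ModuleList is a registered submodule, so the parameters of any
        # layers placed in it are tracked by the parent module; it starts empty.
        self.W_di = nn.ModuleList()
        self.mL_n = mL_n
        self.hS = hS

    def forward(self, input):
        if len(self.W_di) == 0:
            # build the layers lazily on first use
            self.W_di.append(nn.Linear(self.mL_n * 2, 1024).to(input.device))
            self.W_di.append(nn.ReLU())
            self.W_di.append(nn.Linear(1024, self.hS).to(input.device))
        out = input
        for layer in self.W_di:
            out = layer(out)
        return out

Keep in mind that layers built lazily like this only exist after the first forward pass, so an optimizer constructed before that call will not see their parameters.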
My gut feeling is that you cannot do it. Even in the static model declaration, nn.Module specifies the parameters of every sub-module (e.g., nn.Conv2d or nn.Linear) in a nested way. That is, every kernel or bias is registered one by one and independently.
One workaround might be to introduce dynamic sub-modules. Here is my brief implementation. One can define the desired dynamic behavior inside the DynamicLinear module.
import torch
import torch.nn as nn

class DynamicLinear(nn.Module):
    def __init__(self):
        super(DynamicLinear, self).__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, x):
        if self.W_di is None:
            # dynamically define a linear function here
            self.W_di = nn.Parameter(torch.ones(1, 1, device=x.device))
        return self.W_di @ x

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.net = nn.Sequential(
            DynamicLinear(),
            nn.ReLU(),
            DynamicLinear())

    def forward(self, x):
        return self.net(x)

m = MyModule()
x = torch.ones(1, 1)
y = m(x)
# output: 1
print(y)
Given a simple 2-layer neural network, the traditional idea is to compute the gradient w.r.t. the weights/model parameters. For an experiment, I want to compute the gradient of the error w.r.t. the input. Are there existing PyTorch methods that allow me to do this?
More concretely, consider the following neural network:
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, n_features, n_hidden, n_classes, dropout):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(n_hidden, n_classes)
        self.dropout = dropout

    def forward(self, x):
        x = self.sigmoid(self.fc1(x))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
I instantiate the model and an optimizer for the weights as follows:
import torch.optim as optim

model = NeuralNet(n_features=args.n_features,
                  n_hidden=args.n_hidden,
                  n_classes=args.n_classes,
                  dropout=args.dropout)
optimizer_w = optim.SGD(model.parameters(), lr=0.001)
While training, I update the weights as usual. Now, given that I have values for the weights, I should be able to use them to compute the gradient w.r.t. the input. I am unable to figure out how.
def train(epoch):
    t = time.time()
    model.train()
    optimizer_w.zero_grad()
    output = model(features)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train])
    loss_train.backward()
    optimizer_w.step()
    # grad_features = loss_train.backward() w.r.t to features
    # features -= 0.001 * grad_features

for epoch in range(args.epochs):
    train(epoch)
It is possible, just set input.requires_grad = True for each input batch you're feeding in, and then after loss.backward() you should see that input.grad holds the expected gradient. In other words, if your input to the model (which you call features in your code) is some M x N x ... tensor, features.grad will be a tensor of the same shape, where each element of grad holds the gradient with respect to the corresponding element of features. In my comments below, I use i as a generalized index - if your features has, for instance, 3 dimensions, replace it with features.grad[i, j, k], etc.
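A minimal sketch of that first point, using the names from the question (model, features, labels, idx_train, and F are assumed to exist as in the posted training loop):

features.requires_grad_(True)   # mark the input as a leaf that needs gradients
output = model(features)
loss_train = F.nll_loss(output[idx_train], labels[idx_train])
loss_train.backward()
print(features.grad.shape)      # same shape as features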
Regarding the error you're getting: PyTorch operations build a tree representing the mathematical operation they are describing, which is then used for differentiation. For instance c = a + b will create a tree where a and b are leaf nodes and c is not a leaf (since it results from other expressions). Your model is the expression, and its inputs as well as parameters are the leaves, whereas all intermediate and final outputs are not leaves. You can think of leaves as "constants" or "parameters" and of all other variables as functions of those. This message tells you that you can only set requires_grad on leaf variables.
Your problem is that at the first iteration, features is random (or however else you initialize) and is therefore a valid leaf. After your first iteration, features is no longer a leaf, since it becomes an expression calculated based on the previous ones. In pseudocode, you have
f_1 = initial_value # valid leaf
f_2 = f_1 + your_grad_stuff # not a leaf: f_2 is a function of f_1
To deal with that you need to use detach, which breaks the links in the tree and makes autograd treat a tensor as if it were constant, no matter how it was created. In particular, no gradient calculations will be backpropagated through detach. So you need something like
features = features.detach() - 0.01 * features.grad
Note: perhaps you need to sprinkle a couple more detaches here and there, which is hard to say without seeing your whole code and knowing the exact purpose.
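For completeness, a hedged sketch of what one full input-update step could look like under these constraints (the 0.001 step size mirrors the commented-out line in the question):

grad_features = features.grad   # populated by loss_train.backward()
# detach so the updated tensor is a fresh leaf, then re-enable gradients for the next pass
features = (features.detach() - 0.001 * grad_features).requires_grad_(True)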