I was following this method
(https://discuss.pytorch.org/t/dynamic-parameter-declaration-in-forward-function/427) to dynamically assign parameters in the forward function.
However, my parameter is not just a single weight tensor but an nn.Sequential.
When I implement the code below:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, input):
        if self.W_di is None:
            self.W_di = nn.Sequential(
                nn.Linear(mL_n * 2, 1024),
                nn.ReLU(),
                nn.Linear(1024, self.hS)).to(device)
I get the following error.
TypeError: cannot assign 'torch.nn.modules.container.Sequential' as parameter 'W_di' (torch.nn.Parameter or None expected)
Is there any way that I can register nn.Sequential as a whole param? Thanks!
If you or other users still have this problem, one solution to consider is using nn.ModuleList instead of nn.Sequential.
While nn.Sequential is useful for defining a fixed sequence of layers in PyTorch, nn.ModuleList is a more flexible container that allows direct access and modification of individual layers within the list. This can be especially helpful when dealing with dynamic models or architectures that require more complex layer arrangements.
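For illustration, a minimal sketch of this idea (the mL_n and hS sizes from the question are passed in as constructor arguments here, since they are defined elsewhere in the original code): the sub-layers are created lazily on the first forward pass and stored in an nn.ModuleList, so they are registered as real sub-modules rather than a single parameter.

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, mL_n, hS):
        super().__init__()
        self.mL_n = mL_n
        self.hS = hS
        # empty container; the actual layers are created on the first forward pass
        self.W_di = nn.ModuleList()

    def forward(self, input):
        if len(self.W_di) == 0:
            # build the layers lazily, once the first batch is seen
            self.W_di.extend([
                nn.Linear(self.mL_n * 2, 1024),
                nn.ReLU(),
                nn.Linear(1024, self.hS),
            ])
            self.W_di.to(input.device)
        out = input
        for layer in self.W_di:
            out = layer(out)
        return out

Note that parameters created this way only exist after the first forward pass, so the optimizer has to be created (or updated) after that first call.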
My gut feeling is that you cannot do it. Even in a static model declaration, nn.Module registers the parameters of every sub-module (e.g., nn.Conv2d or nn.Linear) in a nested way. That is, every kernel or bias is registered one by one, independently.
One workaround might be to introduce dynamic sub-modules. Here is my brief implementation. The desired dynamic behavior can be defined inside the DynamicLinear module.
import torch
import torch.nn as nn

class DynamicLinear(nn.Module):
    def __init__(self):
        super(DynamicLinear, self).__init__()
        # you need to register the parameter names earlier
        self.register_parameter('W_di', None)

    def forward(self, x):
        if self.W_di is None:
            # dynamically create the weight on the first forward pass
            self.W_di = nn.Parameter(torch.ones(1, 1, device=x.device))
        return self.W_di * x

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.net = nn.Sequential(
            DynamicLinear(),
            nn.ReLU(),
            DynamicLinear())

    def forward(self, x):
        return self.net(x)

m = MyModule()
x = torch.ones(1, 1)
y = m(x)
print(y)  # output: 1
Related
So I want to try out an adaptive activation function for my neural network. This means I want a custom activation that is similar to a standard one (like tanh or relu), but with some trainable parameters added.
Currently, I am trying to add this trainable parameter by creating the activation function as a custom layer:
class AdaptiveActivation(keras.layers.Layer):
    """
    Adaptive activation function that is changed in training process.
    """
    def __init__(self, act="tanh"):
        super(AdaptiveActivation, self).__init__()
        self.a = tf.Variable(0.1, dtype=tf.float32, trainable=True)
        self.n = tf.constant(10.0, dtype=tf.float32)
        self.act = act

    def call(self, x):
        if self.act == "tanh":
            return keras.activations.tanh(self.a*self.n*x)
        elif self.act == "relu":
            return keras.activations.relu(self.a*self.n*x)
However, if I understood my test outputs correctly, every instance of this layer gets its own parameter a. This means that for every hidden layer I get a different a. What I want is one single a shared by all my activation functions: instead of, say, 9 different values of a per epoch, always just one a that can change between epochs.
Furthermore, is there an easy way to obtain the a from this layer for output during training?
OK, the solution was stupidly easy: I can just pass a trainable TensorFlow variable to the layer from outside and assign it to self.a there.
class AdaptiveActivation(keras.layers.Layer):
    """
    Adaptive activation function that is changed in training process.
    """
    def __init__(self, a, act="tanh"):
        super(AdaptiveActivation, self).__init__()
        self.a = a
        self.n = tf.constant(5.0, dtype=tf.float32)
        self.act = act

    def call(self, x):
        if self.act == "tanh":
            return keras.activations.tanh(self.a*self.n*x)
        elif self.act == "relu":
            return keras.activations.relu(self.a*self.n*x)
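For illustration, a small usage sketch (layer sizes and names are made up) where a single shared variable is passed to every activation layer, so all of them update the same a:

import tensorflow as tf
from tensorflow import keras

# one shared trainable parameter for all activation layers
shared_a = tf.Variable(0.1, dtype=tf.float32, trainable=True)

inputs = keras.Input(shape=(16,))
x = keras.layers.Dense(32)(inputs)
x = AdaptiveActivation(shared_a, act="tanh")(x)
x = keras.layers.Dense(32)(x)
x = AdaptiveActivation(shared_a, act="tanh")(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# the shared value can be read out at any point, e.g. inside a callback
print(shared_a.numpy())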
This also solves the "issue" of tracking it.
It still feels very unnecessary, though: why couldn't I just do this without having to implement a new layer first?
I'm trying to access model parameters using the internal ._parameters attribute. When I define the model as below, I get the model parameters without any issue:
model = nn.Linear(10, 10)
print(model._parameters)
However, when I use this method to get parameters of a model defined as a class, I get an empty OrderedDict().
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

model = MyModel()
print(model._parameters)
Is there a solution to this using ._parameters?
NOTE: I understand that using internal methods is frowned upon.
There are three kinds of objects stored inside an nn.Module: tensors inside _parameters, buffers inside _buffers, and sub-modules inside _modules. All three are private (indicated by the _ prefix), and as such they are not meant to be used by the end user.
The private nn.Module attribute _parameters is an OrderedDict containing only the parameters registered directly on that module ("parameters" as in nn.Parameter, not nn.Module). That is why it is empty in your example: your model registers a sub-module (self.fc), not a parameter. Have a look at the following module instead:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.p = nn.Parameter(torch.rand(10))

    def forward(self, x):
        return x * self.p
>>> model = MyModel()
>>> print(model._parameters)
OrderedDict([('p', Parameter containing:
tensor([8.5576e-01, 1.4343e-01, 3.2866e-04, 9.4876e-01, 4.4837e-01, 9.7725e-02,
2.7249e-01, 6.7258e-01, 5.6823e-01, 4.0484e-01], requires_grad=True))])
I understand that using internal methods is frowned upon.
Do not use _parameters. You should instead use the appropriate API for this use case, which is nn.Module.parameters:
for p in model.parameters():
    print(p)
Each linear layer in the MyModel class is a module, and can be accessed using _modules. So you can access the parameters of your model using:
for module_key in model._modules.keys():
    for param_key in model._modules[module_key]._parameters:
        p = model._modules[module_key]._parameters[param_key]
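For completeness, the same nested traversal is available through the public API via nn.Module.named_parameters, which yields every parameter of every sub-module together with its qualified name:

# public-API equivalent of the nested _modules/_parameters loops above
for name, p in model.named_parameters():
    print(name, p.shape)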
Let's say I have a DataLoader
dataloader = DataLoader(dataset, batch_size=32)
I want to define a neural network that feeds forward my custom function. I know that in a traditional fully-connected network I could just use nn.Linear or other existing modules, and, importantly, those modules accept a batch from the DataLoader and process it automatically.
Now I want to define my own function, but I don't know how to write it without a for loop. For example (I made up some custom function f(x)):
def f(x):
    x = np.sin(np.exp(x)) + np.log(x) - 1/x
    return x

class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        result = torch.zeros(len(x_batch))
        for i in range(len(x_batch)):
            result[i] = self.net(x_batch[i])
        return result
Or with the for loop inside f:
def f(x_batch):
    result = torch.zeros(len(x_batch))
    for i in range(len(x_batch)):
        result[i] = np.sin(np.exp(x_batch[i])) + np.log(x_batch[i]) - 1/x_batch[i]
    return result

class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        return self.net(x_batch)
Is there any way to get rid of the for loop so the calculation runs in parallel on the GPU? I think a for loop will not take advantage of the GPU. Or did I misunderstand something?
Basic mathematical operators implemented in PyTorch work with n-dimensional tensors out of the box. If you define your f function with PyTorch operators, you can feed it the whole batch in one go.
In your case you could do:
def f(x):
    x = torch.sin(torch.exp(x)) + torch.log(x) - 1/x
    return x

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.net = f

    def forward(self, x_batch):
        result = self.net(x_batch)
        return result
>>> net = NeuralNet()
>>> net(torch.tensor([[1., 2., 3.]]))
As long as you stick with PyTorch operators (most have a backward function implemented) and don't switch to NumPy at some point, you will be able to call the backward pass on your model.
Of course, all of these computations can be carried out on the GPU!
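For instance, a minimal sketch of running the whole batch on the GPU (assuming a CUDA device is available):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = NeuralNet().to(device)  # no parameters here, but kept for consistency
x_batch = torch.rand(32, 3, device=device) + 0.1  # keep values positive for log and 1/x
y = net(x_batch)  # f is applied element-wise to the whole batch on the GPU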
PennyLane provides a Torch layer:
https://pennylane.readthedocs.io/en/stable/code/api/pennylane.qnn.TorchLayer.html
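For reference, a minimal sketch of how such a layer plugs into a PyTorch model, based on the linked documentation (the particular circuit, AngleEmbedding followed by BasicEntanglerLayers, is only an illustrative choice):

import pennylane as qml
import torch

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    # encode the classical inputs, then apply a trainable entangling circuit
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (3, n_qubits)}  # 3 entangling layers
qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)

# the quantum layer composes with ordinary torch.nn modules
model = torch.nn.Sequential(torch.nn.Linear(4, n_qubits), qlayer, torch.nn.Linear(n_qubits, 1))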
I'm exporting a PyTorch model via TorchScript tracing, but I'm facing issues. Specifically, I have to perform some operations on tensor sizes, but the JIT compiler hardcodes the variable shapes as constants, breaking compatibility with tensors of different sizes.
For example, create the class:
class Foo(nn.Module):
    """Toy class that plays with tensor shape to showcase the tracing issue.

    It creates a new tensor with the same shape as the input one, except
    for the last dimension, which is doubled. This new tensor is filled
    based on the values of the input.
    """
    def __init__(self):
        nn.Module.__init__(self)

    def forward(self, x):
        new_shape = (x.shape[0], 2*x.shape[1])  # incriminated instruction
        x2 = torch.empty(size=new_shape)
        x2[:, ::2] = x
        x2[:, 1::2] = x + 1
        return x2
and run the test code:
x = torch.randn((3, 5)) # create example input
foo = Foo()
traced_foo = torch.jit.trace(foo, x) # trace
print(traced_foo(x).shape) # obviously this works
print(traced_foo(x[:, :4]).shape) # but fails with a different shape!
I could solve the issue by scripting, but in this case I really need to use tracing. Moreover, I think that tracing should be able to handle tensor size manipulations correctly.
but in this case I really need to use tracing
You can freely mix torch.jit.script and torch.jit.trace wherever needed. For example, one could do this:
import torch

class MySuperModel(torch.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # script the shape-dependent part (Foo from the question)
        self.scripted = torch.jit.script(Foo(*args, **kwargs))
        # Bar stands in for any sub-module that traces fine
        self.traced = Bar(*args, **kwargs)

    def forward(self, data):
        return self.scripted(self.traced(data))

model = MySuperModel()
torch.jit.trace(model, (input1,))  # input1: an example input tensor
You could also move the part of the functionality that depends on shape into a separate function and decorate it with @torch.jit.script:

@torch.jit.script
def _forward_impl(x):
    new_shape = (x.shape[0], 2*x.shape[1])  # incriminated instruction
    x2 = torch.empty(size=new_shape)
    x2[:, ::2] = x
    x2[:, 1::2] = x + 1
    return x2

class Foo(nn.Module):
    def forward(self, x):
        return _forward_impl(x)
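As a quick check (a sketch reusing the test code from the question), tracing the wrapper now generalises across shapes, because the shape arithmetic lives inside the scripted helper and stays symbolic:

x = torch.randn((3, 5))
foo = Foo()
traced_foo = torch.jit.trace(foo, x)

print(traced_foo(x).shape)         # torch.Size([3, 10])
print(traced_foo(x[:, :4]).shape)  # torch.Size([3, 8]) -- no longer hardcoded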
There is no way around scripting for this part, as the compiler has to understand your code. Tracing merely records the operations you perform on tensors and has no knowledge of control flow that depends on the data (or on its shape).
Anyway, this should cover most cases, and if it doesn't, you should be more specific.
This bug has been fixed in PyTorch 1.10.2.
I'm trying to develop a layer in Keras which works with 3D tensors. To make it flexible, I would like to postpone the code that relies on the input's exact shape as much as possible.
My layer is overriding 5 methods:
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras import backend as K

class MyLayer(Layer):
    def __init__(self, **kwargs):
        pass

    def build(self, input_shape):
        pass

    def call(self, inputs, verbose=False):
        second_dim = K.int_shape(inputs)[-2]
        # Do something with the second_dim

    def compute_output_shape(self, input_shape):
        pass

    def get_config(self):
        pass
And I'm using this layer like this:
input = Input(batch_shape=(None, None, 128), name='input')
x = MyLayer(name='my_layer')(input)
model = Model(input, x)
But I'm facing an error since second_dim is None. How can I develop a layer that relies on the dimensions of the input while accepting that those dimensions are only provided by the actual data, not by the input layer?
I ended up asking the same question differently, and I've got a perfect answer:
What is the right way to manipulate the shape of a tensor when there are unknown elements in it?
The gist of it is: don't manipulate the dimensions directly as values. Use them by reference, not by value. So do not use K.int_shape; use K.shape instead, and use Keras operations to compose the new shape:
shape = K.shape(x)
newShape = K.concatenate([
    shape[0:1],
    shape[1:2] * shape[2:3],
    shape[3:4]
])
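As a small follow-up sketch (not part of the original answer), the symbolic shape can be fed straight into K.reshape, so the concrete dimension values are only resolved at run time; the helper name here is made up:

from tensorflow.python.keras import backend as K

def merge_middle_dims(x):
    # expects a 4D tensor; dimensions 1 and 2 are merged symbolically
    shape = K.shape(x)
    new_shape = K.concatenate([
        shape[0:1],
        shape[1:2] * shape[2:3],
        shape[3:4]
    ])
    return K.reshape(x, new_shape)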