I want to square the result of a maxpool layer.
I tried the following:
class CNNClassifier(Classifier):  # nn.Module
    def __init__(self, in_channels):
        super().__init__()
        self.save_hyperparameters('in_channels')
        self.cnn = nn.Sequential(
            # maxpool
            nn.MaxPool2d((1, 5), stride=(1, 5)),
            torch.square(),
            # layer1
            nn.Conv2d(in_channels=in_channels, out_channels=32, kernel_size=5),
        )
Which, to the experienced PyTorch user, surely makes no sense.
Indeed, the error is quite clear:
TypeError: square() missing 1 required positional arguments: "input"
How can I feed in to square the tensor from the preceding layer?
You can't put a bare PyTorch function in an nn.Sequential pipeline; every element needs to be an nn.Module.
You could wrap it like this:
class Square(nn.Module):
    def forward(self, x):
        return torch.square(x)
Then use it inside your sequential layer like so:
class CNNClassifier(Classifier):  # nn.Module
    def __init__(self, in_channels):
        super().__init__()
        self.save_hyperparameters('in_channels')
        self.cnn = nn.Sequential(
            nn.MaxPool2d((1, 5), stride=(1, 5)),
            Square(),
            nn.Conv2d(in_channels=in_channels, out_channels=32, kernel_size=5))
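As a quick sanity check (just a sketch), the wrapper simply applies torch.square element-wise to whatever tensor the previous layer produces:

import torch

square = Square()
print(square(torch.tensor([2.0, -3.0])))  # tensor([4., 9.])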
Related
I use PyTorch for training neural networks. When saving the model, the weights of the network are saved, but the activation functions are not captured. Now, if I reload the model from the saved weights with the activation functions changed, the load still does not throw an error. Further, the network outputs incorrect values (obviously). Is there a way to save the structure of the neural network along with the weights? An MWE is presented below.
import torch
from torch import nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self.fc1 = nn.Linear(10, 25)
        self.fc2 = nn.Linear(25, 10)
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()

    def forward(self, inputs):
        return self.tanh(self.fc2(self.relu(self.fc1(inputs))))
To save
test = Test().float()
torch.save(test.state_dict(), "test.pt")
To load
import torch
from torch import nn

class Test1(nn.Module):
    def __init__(self):
        super(Test1, self).__init__()
        self.fc1 = nn.Linear(10, 25)
        self.fc2 = nn.Linear(25, 10)
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()

    def forward(self, inputs):
        return self.relu(self.fc2(self.tanh(self.fc1(inputs))))

test1 = Test1().float()
test1.load_state_dict(torch.load("test.pt"))  # Loads without error. However, the activation functions tanh and relu are interchanged, and the network outputs incorrect values.
Is there a way to also capture the activation functions, while saving? Thanks.
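One possible approach (a sketch, not from the original thread) is to export the model with TorchScript, which serializes the forward computation, including which activations are applied where, together with the weights, so the file can be reloaded without the original class definition:

import torch

# Script and save the model defined above; the scripted module records
# the forward graph (relu then tanh) as well as the weights.
scripted = torch.jit.script(Test().float())
scripted.save("test_scripted.pt")

# Loading does not require the Test class to be defined.
restored = torch.jit.load("test_scripted.pt")
out = restored(torch.randn(1, 10))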
I have two neural networks, Child1 and Child2. The output of Child1 is fed to Child2, and the output of Child2 is fed back to Child1. Each one has its own cost function. I have created a Parent module to pack both modules into a single module. I did this because I have a requirement that the backpropagation (derivative calculation) of each of the loss functions of the neural networks (Child1 and Child2) should be done with respect to the weights of both neural networks combined (Child1 + Child2). Could I do it this way? Could I backpropagate through my Parent module and get the gradient of each of the loss functions with respect to the combined weights?
class Parent(nn.Module):
    def __init__(self, in_features, z_dim):
        super().__init__()
        self.my_child1 = Child1(z_dim)
        self.my_child2 = Child2(in_features)

    def forward(self, input):
        input = self.my_child1(input)
        input = self.my_child2(input)
        return input

class Child1(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.child1 = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.LeakyReLU(0.01),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.child1(x)

class Child2(nn.Module):
    def __init__(self, z_dim, img_dim):
        super().__init__()
        self.child2 = nn.Sequential(
            nn.Linear(z_dim, 256),
            nn.LeakyReLU(0.01),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.child2(x)
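For what it's worth, autograd does track the whole graph through the parent. A minimal toy sketch (layer sizes are illustrative, not the original Child1/Child2) showing that a single backward pass through a parent module populates .grad on the parameters of both children:

import torch
from torch import nn

class ToyChild(nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.net = nn.Linear(in_f, out_f)

    def forward(self, x):
        return self.net(x)

class ToyParent(nn.Module):
    def __init__(self):
        super().__init__()
        self.my_child1 = ToyChild(4, 3)
        self.my_child2 = ToyChild(3, 2)

    def forward(self, x):
        return self.my_child2(self.my_child1(x))

parent = ToyParent()
loss = parent(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# Both children's weights received gradients from the single loss.
print(parent.my_child1.net.weight.grad is not None)  # True
print(parent.my_child2.net.weight.grad is not None)  # True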
In the torch.optim documentation, it is stated that model parameters can be grouped and optimized with different optimization hyperparameters. It says that
For example, this is very useful when one wants to specify per-layer
learning rates:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
This means that model.base’s parameters will use the default
learning rate of 1e-2, model.classifier’s parameters will use a
learning rate of 1e-3, and a momentum of 0.9 will be used for all
parameters.
I was wondering how to define such groups that have a parameters() attribute. What came to my mind was something of the form:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base()
        self.classifier()
        self.relu = nn.ReLU()

    def base(self):
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)

    def classifier(self):
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
How should I modify the snippet above to be able to get model.base.parameters()? Is the only way to define a nn.ParameterList and explicitly add weights and biases of the desired layers to that list? What is the best practice?
I will show three approaches to solving this. In the end, though, it comes down to personal preference.
- Grouping parameters with nn.ModuleDict.
I noticed another answer here uses nn.Sequential to group the layers, which lets you target different sections of the model via the parameters attribute of nn.Sequential. Indeed, base and classifier might be more than just sequential layers. I believe a more general approach is to leave the module as is and instead initialize an additional nn.ModuleDict module that contains all parameters, ordered by optimization group, in separate nn.ModuleLists:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        self.relu = nn.ReLU()  # needed by forward
        self.params = nn.ModuleDict({
            'base': nn.ModuleList([self.fc1, self.fc2]),
            'classifier': nn.ModuleList([self.fc3, self.fc4])})

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)
Then you can define your optimizer with:
optim.SGD([
    {'params': model.params.base.parameters()},
    {'params': model.params.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
Do note that MyModel's parameters() generator won't contain duplicate parameters, even though fc1 through fc4 are registered both directly and inside the nn.ModuleDict.
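As a quick check (a sketch), each weight and bias is yielded exactly once:

model = MyModel()
# 4 Linear layers x (weight + bias) = 8 parameters, no duplicates.
print(sum(1 for _ in model.parameters()))  # 8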
- Creating an interface for accessing parameter groups.
A different solution is to provide an interface in the nn.Module to separate the parameters into groups:
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        self.relu = nn.ReLU()  # needed by forward

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)

    def base_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc1, self.fc2])

    def classifier_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc3, self.fc4])
This assumes chain has been imported from itertools (from itertools import chain); chain.from_iterable flattens the per-layer parameter generators into a single iterable.
Then define your optimizer with:
optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
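A quick usage check (a sketch, assuming the model above): each helper yields the weight and bias of its two layers:

model = MyModel()
print(len(list(model.base_params())))        # 4: fc1/fc2 weights and biases
print(len(list(model.classifier_params())))  # 4: fc3/fc4 weights and biases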
- Using child nn.Modules.
Lastly, you can define your module sections as submodules (this comes down to the same approach as the nn.Sequential one, but you can generalize it to arbitrary submodules).
class Base(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(1, 512),
                         nn.ReLU(),
                         nn.Linear(512, 264),
                         nn.ReLU())

class Classifier(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(264, 128),
                         nn.ReLU(),
                         nn.Linear(128, 964))

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = Base()
        self.classifier = Classifier()

    def forward(self, y0):
        features = self.base(y0)
        out = self.classifier(features)
        return out
Here again you can use the same interface as the first method:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
I would argue this is the best practice. However, it forces you to define each of your components as a separate nn.Module, which can be a hassle when experimenting with more complex models.
You can use torch.nn.Sequential to define base and classifier. Your class definition can then be:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 264), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(264, 128), nn.ReLU(), nn.Linear(128, 964))

    def forward(self, y0):
        return self.classifier(self.base(y0))
Then, you can access parameters using model.base.parameters() and model.classifier.parameters().
I was wondering: if there are multiple convolutional layers (conv1 --> conv2), how can we get the in_channels parameter for conv2 from conv1's output channels?
class MyModel(nn.Module):
    def __init__(self, in_ch, num_features, out_ch2):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(in_ch, num_features)
        self.conv2 = nn.Conv2d(in_channels_from_out_channels_of_conv1, out_ch2)
Can I get the out_channels from the conv1 layer and use it as in_ch for conv2?
The second parameter of the nn.Conv2d constructor is the number of output channels:
self.conv1 = nn.Conv2d(in_channels, conv1_out_channels)
self.conv2 = nn.Conv2d(conv1_out_channels, out_ch2)
as described in the docs (note that nn.Conv2d also requires a kernel_size argument, omitted here for brevity).
It is also available as a property:
self.conv1.out_channels
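For example, a minimal sketch (the kernel sizes here are illustrative, not from the question) that reuses conv1's out_channels property when constructing conv2:

import torch
from torch import nn

class MyModel(nn.Module):
    def __init__(self, in_ch, num_features, out_ch2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, num_features, kernel_size=3)
        # Reuse conv1's out_channels instead of repeating the number.
        self.conv2 = nn.Conv2d(self.conv1.out_channels, out_ch2, kernel_size=3)

    def forward(self, x):
        return self.conv2(self.conv1(x))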
I'm using the ConvGRU from here, and it works OK when I use it in the Sequential mode, but it does not when I use it in the Functional mode. When I say it does not work, I mean that I get black predictions from the Functional version, while from the Sequential one the outputs are similar to the inputs. Everything else in the code remains the same.
Below is an example of what I consistently get with one and the other (the first row being the target and the second the prediction):
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.rnn1 = ConvGRU(
            input_size=(64, 64),
            input_dim=1,
            hidden_dim=1,
            kernel_size=(3, 3),
            num_layers=1,
            dtype=dtype,
            batch_first=True,
            bias=True,
            return_all_layers=False,
        )

    def forward(self, x):
        x = self.rnn1(x)
        return x
versus
model = nn.Sequential(
    ConvGRU(
        input_size=(64, 64),
        input_dim=1,
        hidden_dim=1,
        kernel_size=(3, 3),
        num_layers=1,
        dtype=dtype,
        batch_first=True,
        bias=True,
        return_all_layers=False,
    )
)
Any idea whether there should be different considerations when programming custom layers, depending on whether they are intended to be used in the Functional or the Sequential style?
Thanks!