I'm using ConvGRU from here, and it works fine when I use it in the Sequential style, but it does not in the functional (custom nn.Module) style. By "does not work" I mean that I get black predictions from the functional version, while the Sequential version's outputs are similar to the inputs. Everything else in the code stays the same.
Below is an example of what I consistently get with each (the first row is the target, the second the prediction):
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.rnn1 = ConvGRU(
            input_size=(64, 64),
            input_dim=1,
            hidden_dim=1,
            kernel_size=(3, 3),
            num_layers=1,
            dtype=dtype,
            batch_first=True,
            bias=True,
            return_all_layers=False,
        )

    def forward(self, x):
        x = self.rnn1(x)
        return x
versus
model = nn.Sequential(
    ConvGRU(
        input_size=(64, 64),
        input_dim=1,
        hidden_dim=1,
        kernel_size=(3, 3),
        num_layers=1,
        dtype=dtype,
        batch_first=True,
        bias=True,
        return_all_layers=False,
    )
)
Any idea whether custom layers need different considerations depending on whether they are meant to be used functionally or inside nn.Sequential?
Thanks!
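One way to narrow this down: nn.Sequential simply calls each submodule's forward in turn, so the two wrappers should be mathematically identical once they hold the same weights. A minimal parity-check sketch (assuming the MyModule class and Sequential model defined above, with dtype in scope):

import torch

functional_model = MyModule()
sequential_model = model  # the nn.Sequential version above
# copy the weights so both wrappers hold identical parameters
sequential_model[0].load_state_dict(functional_model.rnn1.state_dict())
x = torch.randn(2, 5, 1, 64, 64)  # (batch, time, channel, H, W) since batch_first=True
out_f, out_s = functional_model(x), sequential_model(x)
# with shared weights the outputs should match exactly; note that many
# ConvGRU implementations return (layer_outputs, last_states), in which
# case compare the matching tensors rather than the tuples directly

If the outputs agree here, the difference you see must come from elsewhere in the training pipeline.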
I want to square the result of a maxpool layer.
I tried the following:
class CNNClassifier(Classifier):  # nn.Module
    def __init__(self, in_channels):
        super().__init__()
        self.save_hyperparameters('in_channels')
        self.cnn = nn.Sequential(
            # maxpool
            nn.MaxPool2d((1, 5), stride=(1, 5)),
            torch.square(),
            # layer1
            nn.Conv2d(in_channels=in_channels, out_channels=32, kernel_size=5),
        )
Which, to the experienced PyTorch user, surely makes no sense.
Indeed, the error is quite clear:
TypeError: square() missing 1 required positional arguments: "input"
How can I feed the tensor from the preceding layer into square?
You can't put a bare PyTorch function in an nn.Sequential pipeline; every element needs to be an nn.Module.
You could wrap it like this:
class Square(nn.Module):
    def forward(self, x):
        return torch.square(x)
Then use it inside your sequential layer like so:
class CNNClassifier(Classifier):  # nn.Module
    def __init__(self, in_channels):
        super().__init__()
        self.save_hyperparameters('in_channels')
        self.cnn = nn.Sequential(
            nn.MaxPool2d((1, 5), stride=(1, 5)),
            Square(),
            nn.Conv2d(in_channels=in_channels, out_channels=32, kernel_size=5))
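A quick sanity check of the wrapped op (a sketch; the tensor sizes are arbitrary):

import torch
import torch.nn as nn

pipeline = nn.Sequential(nn.MaxPool2d((1, 5), stride=(1, 5)), Square())
x = torch.randn(2, 3, 8, 20)
out = pipeline(x)
print(out.shape)         # torch.Size([2, 3, 8, 4])
print((out >= 0).all())  # squaring makes everything non-negative -> tensor(True)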
Below is my demo code, just to show the problem simply: I've written a model with a batch-norm layer, but when I export it to an ONNX file and render the network with Netron, the BN layer is missing, and even though I disabled the bias, a bias still shows up.
After a few modifications of the code I confirmed that the bias shown in Netron comes from the BN: when I delete the BN layer (with bias still disabled), the B section disappears.
Netron renders models I downloaded from the internet correctly, so it can't be the app's problem. What's wrong in my code?
class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 20, 3, stride=2, bias=False),
            nn.Conv2d(20, 40, 3, stride=2, bias=False),
            nn.BatchNorm2d(40),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(1000, 8)  # 24x24x3 -> 11x11x20 -> 5x5x40 = 1000
        )

    def forward(self, x):
        return self.layers(x)

m = myModel()
torch.onnx.export(m, (torch.ones(1, 3, 24, 24),), 'test.onnx')
Here is the capture: BatchNorm disappeared and a bias shows.
Update: when I delete all conv layers, the batch norm shows.
The exporter is responsible: when exporting in eval mode (the default), torch.onnx.export fuses each BatchNorm that directly follows a Conv into that Conv, folding the BN statistics into the conv weights and a new bias term. That's why the BN node vanishes and a bias appears even though bias=False. It also explains the other observations: with no conv in front there is nothing to fuse into, and if you swap the order of BN and ReLU so the BN no longer directly follows the Conv, the BN node renders normally.
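If you want the BatchNorm node to stay visible in the exported graph, one option (a sketch; the training argument to torch.onnx.export exists in recent PyTorch versions, and training-mode BN export may need a recent opset) is to export in training mode, which skips the eval-mode Conv+BN fusion:

m = myModel().train()
torch.onnx.export(
    m,
    (torch.ones(1, 3, 24, 24),),
    'test_with_bn.onnx',
    training=torch.onnx.TrainingMode.TRAINING,
    do_constant_folding=False,
    opset_version=12,
)

Note that the fused eval-mode graph is numerically equivalent and faster for inference, so for deployment the fusion is usually what you want.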
I'm trying to implement the following ResNet block; the ResNet consists of blocks with two convolutional layers and a skip connection. For some reason my block doesn't add the output of the skip connection (if applied) or the input to the output of the convolutional layers.
The ResNet block has:
Two convolutional layers with:
3x3 kernel
no bias terms
padding with one pixel on both sides
2d batch normalization after each convolutional layer
The skip connection:
simply copies the input if the resolution and the number of channels do not change.
if either the resolution or the number of channels change, the skip connection should have one convolutional layer with:
1x1 convolution without bias
change of the resolution with stride (optional)
different number of input channels and output channels (optional)
the 1x1 convolutional layer is followed by 2d batch normalization.
The ReLU nonlinearity is applied after the first convolutional layer and at the end of the block.
My code:
class Block(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        """
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            stride (int): Controls the stride.
        """
        super(Block, self).__init__()
        if stride != 1 or in_channels != out_channels:
            self.skip = nn.Sequential(
                nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))
        else:
            self.skip = None
        self.block = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=3, padding=1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=3, padding=1, stride=1, bias=False),
            nn.BatchNorm2d(out_channels))

    def forward(self, x):
        out = self.block(x)
        if self.skip is not None:
            out = self.skip(x)
        else:
            out = x
        out += x
        out = F.relu(out)
        return out
The problem is in the reuse of the out variable. Normally, you'd implement it like this:
def forward(self, x):
    identity = x
    out = self.block(x)
    if self.skip is not None:
        identity = self.skip(x)
    out += identity
    out = F.relu(out)
    return out
If you like "one-liners":
def forward(self, x):
    out = self.block(x)
    out += (x if self.skip is None else self.skip(x))
    out = F.relu(out)
    return out
If you really like one-liners (please, that is too much, do not choose this option :))
def forward(self, x):
    return F.relu(self.block(x) + (x if self.skip is None else self.skip(x)))
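With the corrected forward, a quick shape check (a sketch; the tensor sizes are arbitrary):

import torch

x = torch.randn(8, 16, 32, 32)
same = Block(16, 16)            # same resolution and channels: skip is None (identity)
down = Block(16, 32, stride=2)  # channels/resolution change: 1x1 conv + BN on the skip
print(same(x).shape)  # torch.Size([8, 16, 32, 32])
print(down(x).shape)  # torch.Size([8, 32, 16, 16])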
I am a beginner and I am trying to implement AlexNet for image classification. The PyTorch implementation of AlexNet is as follows:
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
However, I am trying to adapt the network for an input size of (3, 448, 224) with num_classes = 8.
I have no idea how to change x.view in the forward method or how many layers I should drop to get optimum performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
most of the pretrained models provided in recent torchvision versions already include self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with input size, so you don't have to worry about it much.
Below is the code, very short.
import torchvision
import torch.nn as nn
num_classes = 8
model = torchvision.models.alexnet(pretrained=True)
# replace the last classifier
model.classifier[6] = nn.Linear(4096, num_classes)
# now you can train it with your dataset of size (3, 448, 224)
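Thanks to the adaptive pooling, a quick shape check (a sketch) confirms the non-square input flows through:

import torch
out = model(torch.randn(1, 3, 448, 224))
print(out.shape)  # torch.Size([1, 8])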
Transfer learning
There are two popular ways to do transfer learning. Suppose we trained a model M on a very large dataset D_large, and now we would like to transfer the "knowledge" learned by M to a new model, M', on another dataset D_other (smaller than D_large).
Use (most) parts of M as the architecture of our new M' and initialize those parts with the weights trained on D_large. We then train M' on D_other, letting it adjust the weights inherited from M to find the optimum on the new dataset. This is usually referred to as fine-tuning M'.
The same as the above method, except that before training M' we freeze all the parameters of the inherited parts and train only the rest on D_other. In both cases, the parts taken from M are usually the first components of M' (the base), but in this case we treat them as a fixed feature extractor for the input. The accuracy of the two methods may differ somewhat, but freezing guards against overfitting on the small dataset, which is a good point in terms of accuracy. Moreover, with the weights of M frozen, we don't need to store the intermediate values (the hidden outputs of each layer) during the forward pass or compute their gradients during the backward pass. This speeds up training and reduces the memory required.
The implementation
Along with AlexNet, many models pretrained on ImageNet are already provided by torchvision, such as ResNet and VGG.
To best fit your requirements in terms of model size, it would be nice to use VGG11 or the ResNet with the fewest parameters in its family.
I'll just pick VGG11 as an example:
Obtain the pretrained model from torchvision.
Freeze all the parameters of this model.
Replace the last layer in the model with your new Linear layer to perform the classification. This means you can reuse almost everything from M in M'.
import torchvision
import torch.nn as nn

# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)
# freeze the params
for param in model.parameters():
    param.requires_grad = False
# replace the classifier with your own
num_classes = 8
model.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)
# start training with your dataset
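A quick check that the freeze worked as intended (a sketch; the learning rate is a placeholder):

# after freezing, only the new classifier layer should be trainable
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.6.weight', 'classifier.6.bias']

import torch.optim as optim
# hand only the trainable parameters to the optimizer
optimizer = optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=1e-3, momentum=0.9)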
Warnings
In old versions of the torchvision package there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)), which makes it harder to train on an input size different from the [3, 224, 224] used for ImageNet. In that case you can put in a little effort, as below:
class OurVGG11(nn.Module):
    def __init__(self, num_classes=8):
        super(OurVGG11, self).__init__()
        self.vgg11 = torchvision.models.vgg11(pretrained=True)
        for param in self.vgg11.parameters():
            param.requires_grad = False
        # add an avgpool here
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # replace the classifier layer
        self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = self.vgg11.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.vgg11.classifier(x)
        return x

model = OurVGG11()
# now start training `model` on our dataset.
Try out different models in torchvision.models.
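For completeness, a minimal training-step sketch with fake data (in practice, iterate over a DataLoader of your real (3, 448, 224) images; all hyperparameters here are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model = OurVGG11()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

images = torch.randn(4, 3, 448, 224)  # fake batch of your input size
labels = torch.randint(0, 8, (4,))    # fake targets for 8 classes

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()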
The following is a feed-forward network using the torch.nn.functional module in PyTorch:
import torch.nn as nn
import torch.nn.functional as F

class newNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)
        return x

model = newNetwork()
model
The following uses the nn.Sequential module to build essentially the same feed-forward network. What is the difference between the two, and when would I use one instead of the other?
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)
There is no difference between the two; the latter is arguably more concise and easier to write. The reason "object" (module) versions of pure, i.e. non-stateful, functions like ReLU and Sigmoid exist is to allow their use in constructs like nn.Sequential.
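Where the nn.Module-subclass style does pay off is anything nn.Sequential cannot express, such as data-dependent control flow. A sketch (the layer sizes and the norm threshold are arbitrary placeholders):

import torch.nn as nn
import torch.nn.functional as F

class GatedNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 128)
        self.fc2 = nn.Linear(128, 128)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # data-dependent branching: a fixed pipeline like nn.Sequential
        # cannot express this
        if x.norm() > 1.0:
            x = self.fc2(x)
        return x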