Hi, I'm trying to build this model using PyTorch.
Each input consists of 20 images of size 28 x 28, which are C1 ~ Cp in the image.
Each image goes through a CNN of the same structure, but their outputs are eventually concatenated.
I'm currently struggling with feeding the multiple inputs to their respective CNN models.
Each model in the first box, with three convolutional layers, will look like the code below, but I'm not quite sure how I can feed 20 different inputs into separate models of the same structure and eventually concatenate them.
self.features = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(10, 14, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(14, 18, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(28 * 28 * 18, 256)  # padding=1 keeps the 28x28 spatial size
)
I've tried passing a list of inputs to the forward function, but it ended up with an error and wouldn't go through.
I'll be more than happy to explain further if anything is unclear.
Simply define forward as taking a list of tensors as input, then process each input with the corresponding CNN (in the example snippet, the CNNs share the same structure but don't share parameters, which is what I assume you need). You'll need to fill in the dots ... according to your specifications.
class MyModel(torch.nn.Module):
    def __init__(self, ...):
        ...
        self.cnns = torch.nn.ModuleList([torch.nn.Sequential(...) for _ in range(20)])

    def forward(self, xs: list[Tensor]):
        return torch.cat([cnn(x) for x, cnn in zip(xs, self.cnns)], dim=...)
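For concreteness, here is a minimal runnable sketch of this pattern that plugs in the three-conv feature extractor from your snippet; the class name MultiCnnModel and the choice of dim=1 for the concatenation are my own assumptions, not something in your post:

import torch
import torch.nn as nn

class MultiCnnModel(nn.Module):
    def __init__(self, num_paths=20):
        super().__init__()
        # one independent CNN per input image: same structure, separate weights
        self.cnns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, 10, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(10, 14, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(14, 18, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Flatten(),
                nn.Linear(28 * 28 * 18, 256),
            )
            for _ in range(num_paths)
        ])

    def forward(self, xs):
        # xs: list of 20 tensors, each of shape (N, 1, 28, 28)
        return torch.cat([cnn(x) for cnn, x in zip(self.cnns, xs)], dim=1)

model = MultiCnnModel()
xs = [torch.randn(4, 1, 28, 28) for _ in range(20)]
out = model(xs)
print(out.shape)  # torch.Size([4, 5120]) = (N, 20 * 256)

If the 20 images arrive as a single tensor of shape (N, 20, 28, 28) instead of a list, you can split it first, e.g. xs = [t.unsqueeze(1) for t in x.unbind(dim=1)].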
Assuming each path has its own weights, maybe this could be done with grouped convolutions, although the pre-fusion Linear can cause some trouble.
P = 20
self.features = nn.Sequential(
    nn.Conv2d(1 * P, 10 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(10 * P, 14 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(14 * P, 18 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(18 * P, 256 * P, kernel_size=28, groups=P),  # not sure about this one
    nn.Flatten(),
    nn.Linear(256 * P, 1024)
)
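A quick shape check for this grouped variant (a sketch; it assumes the 20 single-channel images are stacked along the channel dimension, so the input has shape (N, 20, 28, 28)). The kernel_size=28 grouped convolution plays the role of the per-path Linear(28*28*18, 256), and only the final Linear mixes information across paths:

import torch
import torch.nn as nn

P = 20
features = nn.Sequential(
    nn.Conv2d(1 * P, 10 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(10 * P, 14 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(14 * P, 18 * P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(18 * P, 256 * P, kernel_size=28, groups=P),  # 28x28 -> 1x1 per path
    nn.Flatten(),
    nn.Linear(256 * P, 1024),
)

x = torch.randn(4, P, 28, 28)   # 20 single-channel images stacked as channels
print(features(x).shape)        # torch.Size([4, 1024])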
Related
I am passing a batch of 256 images x_s with the following dimensions [256, 3, 560, 448]. However, whenever I try to feed my images to the CNN I get the following error:
TypeError: __init__() takes 1 positional argument but 2 were given
Not sure what it means here by '1 positional argument'. I am passing in the images using an iterator which I created. Below is my code for the training loop up until the point where it breaks:
for e in range(num_epochs):
    print(f'Epoch {e+1:04d} / {num_epochs:04d}', end='\n================\n')
    dl_source_iter = iter(dl_source)
    dl_target_iter = iter(dl_target)
    for batch in range(max_batches):
        optimizer.zero_grad()
        p = float(batch + e * max_batches) / (num_epochs * max_batches)
        grl_lambda = 2. / (1. + np.exp(-10 * p)) - 1
        x_s, y_s = next(dl_source_iter)
        y_s_domain = torch.zeros(256, dtype=torch.long)
        class_pred, domain_pred = Cnn(x_s, grl_lambda)  # this is the line which throws the error
Here is my convolutional neural network:
class Cnn(nn.Module):
    def __init__(self):
        super(Cnn, self).__init__()
        self.feature_extract = nn.Sequential(
            nn.Conv2d(3, 64, 5, 1, 1),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2),
            nn.ReLU(True),
            nn.Conv2d(64, 50, 5, 1, 1),
            nn.BatchNorm2d(50),
            nn.MaxPool2d(2),
            nn.ReLU(True),
            nn.Dropout2d(),
        )
        self.num_cnn_features = 50 * 5 * 5
        self.class_classifier = nn.Sequential(
            nn.Linear(self.num_cnn_features, 200),
            nn.BatchNorm1d(200),
            nn.Dropout2d(),
            nn.ReLU(True),
            nn.Linear(200, 200),
            nn.BatchNorm1d(200),
            nn.ReLU(True),
            nn.Linear(200, 182),
            nn.LogSoftmax(dim=1),
        )
        self.DomainClassifier = nn.Sequential(
            nn.Linear(self.num_cnn_features, 100),
            nn.BatchNorm1d(100),
            nn.ReLU(True),
            nn.Linear(100, 2),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, x, grl_lambda=1.0):
        features = self.feature_extract(x)
        features = features.view(-1, self.num_cnn_features)
        features_grl = GradientReversalFn(features, grl_lambda)
        class_pred = self.class_classifier(features)
        domain_pred = self.DomainClassifier(features_grl)
        return class_pred, domain_pred
Does anyone have any guesses as to why this might be happening? I can't seem to figure out what is going wrong. Any help would be greatly appreciated.
You need to create a Cnn object before you can pass data to it. You are calling the Cnn class constructor __init__, which takes no arguments besides self, rather than the forward method of a Cnn instance, which is what you actually want to do.
# outside of loop
model = Cnn()
# inside loop
class_pred, domain_pred = model(x_s, grl_lambda)
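As an aside, if GradientReversalFn in the question is a custom torch.autograd.Function (as in typical DANN implementations), it has to be invoked through .apply rather than called directly. A minimal sketch of such a function, assuming that is what is intended:

import torch

class GradientReversalFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, grl_lambda):
        ctx.grl_lambda = grl_lambda
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # flip (and scale) the gradient on the way back; None for grl_lambda
        return -ctx.grl_lambda * grad_output, None

# inside Cnn.forward:
# features_grl = GradientReversalFn.apply(features, grl_lambda)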
Below is my demo code. It is just meant to show that I've written a batch-norm layer, but when I export the corresponding model to an ONNX file and use Netron to render the network, I found that the BN layer is missing; even though I disabled the bias, I can see that a bias still exists.
After a few modifications of the code I confirmed that the bias shown in the Netron app is the BN, because when I delete the BN layer and disable the bias, the B section disappears.
The Netron app renders models I downloaded from the internet correctly, so it can't be the app's problem. What's wrong in my code?
class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 20, 3, stride=2, bias=False),
            nn.Conv2d(20, 40, 3, stride=2, bias=False),
            nn.BatchNorm2d(40),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(1000, 8)  # 24x24x3 -> 11x11x20 -> 5x5x40 = 1000 features
        )

    def forward(self, x):
        return self.layers(x)

m = myModel()
torch.onnx.export(m, (torch.ones(1, 3, 24, 24),), 'test.onnx')
Here is the capture: the BatchNorm disappears and a bias shows. [screenshot omitted]
Update:
When I delete all the conv layers, the BatchNorm shows. [screenshot omitted]
It seems to be a version-specific problem; if I switch the order of the BN and the ReLU, the BN layer renders normally.
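A likely explanation (an assumption on my part, not confirmed in the question): when exporting in eval mode, the ONNX exporter folds the BatchNorm into the preceding convolution, and a folded Conv+BN is exactly a convolution with an added bias. That would explain why the BN node disappears and a bias appears. A hedged sketch of how to keep the BN node visible, by exporting in training mode with constant folding disabled:

import torch

torch.onnx.export(
    m,
    (torch.ones(1, 3, 24, 24),),
    'test_bn_visible.onnx',
    training=torch.onnx.TrainingMode.TRAINING,
    do_constant_folding=False,
)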
def __init__(self):
    super().__init__()
    self.conv = nn.Sequential(
        nn.Conv2d(32, 64, kernel_size=5, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=2),
        nn.BatchNorm2d(64),
        nn.AvgPool2d(2)  # note: AvgPool2d requires a kernel_size; 2 is a placeholder
    )
    conv_out_size = self._get_conv_out((32, 110, 110))
    self.fc = nn.Sequential(
        nn.Linear(conv_out_size, 1),
        nn.Sigmoid(),
    )
I have this model where everything looks fine to my eyes. However, it says that I have to remove the bias from a convolution if the convolution is followed by a normalization layer, because the normalization already contains a parameter for the bias. Can you explain why, and how I can do that?
Batch normalization computes gamma * (x - mean) / sqrt(var + eps) + beta, so any constant bias added by the preceding convolution is subtracted right back out in the mean-subtraction step: the conv bias is redundant.
You can just set bias=False in your convolution layer to avoid this conflict, as the default value for bias is True in PyTorch.
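A quick numeric check of this (a sketch with arbitrary shapes): zeroing the conv bias does not change the BatchNorm output, because the per-channel batch mean absorbs it:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 24, 24)
conv = nn.Conv2d(3, 16, 3, bias=True)
bn = nn.BatchNorm2d(16).train()  # normalize with batch statistics

y_with_bias = bn(conv(x))
with torch.no_grad():
    conv.bias.zero_()            # remove the conv bias
y_without_bias = bn(conv(x))

print(torch.allclose(y_with_bias, y_without_bias, atol=1e-5))  # True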
The answer is already accepted, but I would still like to add a point here. One of the advantages of batch normalization is that it can be folded into a convolution layer: we can replace a convolution followed by a batch normalization with a single convolution with different weights. Folding batch norm is good practice for inference, and you can refer to the link here: Folding Batch Norm.
I have also written a Python script (for Keras) for your understanding. Kindly check it:
import numpy as np

def fold_batch_norm(conv_layer, bn_layer):
    """Fold the batch normalization parameters into the weights for
    the previous layer."""
    conv_weights = conv_layer.get_weights()[0]
    # Keras stores the learnable weights for a BatchNormalization layer
    # as four separate arrays:
    #   0 = gamma (if scale == True)
    #   1 = beta (if center == True)
    #   2 = moving mean
    #   3 = moving variance
    bn_weights = bn_layer.get_weights()
    gamma = bn_weights[0]
    beta = bn_weights[1]
    mean = bn_weights[2]
    variance = bn_weights[3]
    epsilon = 1e-7
    new_weights = conv_weights * gamma / np.sqrt(variance + epsilon)
    param = conv_layer.get_config()
    # handle both cases: conv with and without its own bias
    if param['use_bias']:
        bias = conv_layer.get_weights()[1]
        new_bias = beta + (bias - mean) * gamma / np.sqrt(variance + epsilon)
    else:
        new_bias = beta - mean * gamma / np.sqrt(variance + epsilon)
    return new_weights, new_bias
You can consider this idea in your future projects as well. Cheers :)
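For PyTorch models like the one in the question, recent versions ship a helper that performs the same folding in eval mode; a small sketch with arbitrary shapes (this assumes torch.nn.utils.fusion is available in your version):

import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

conv = nn.Conv2d(3, 16, 3, bias=False).eval()
bn = nn.BatchNorm2d(16).eval()
fused = fuse_conv_bn_eval(conv, bn)  # a single Conv2d with folded weights and bias

x = torch.randn(1, 3, 24, 24)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-6))  # True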
If the pretrained network doesn't have a bias in a conv2d layer (use_bias=False), folding the batchnorm requires it to use a bias.
Is there an easy way to change the use_bias config in a pretrained Keras network?
layer.set_weights(fold_batch_norm(..)) won't work since the original weights didn't have a bias.
I'm using the ConvGRU from here, and it works OK when I use it inside nn.Sequential, but it does not when I wrap it in a custom Module (the "functional" style below). When I say it does not work, I mean that I get black predictions from the custom-Module version, while from the Sequential one the outputs are similar to the inputs. Everything else in the code remains the same.
Below is an example of what I consistently get with one and the other (the first row being the target and the second the prediction). [images omitted]
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.rnn1 = ConvGRU(
            input_size=(64, 64),
            input_dim=1,
            hidden_dim=1,
            kernel_size=(3, 3),
            num_layers=1,
            dtype=dtype,
            batch_first=True,
            bias=True,
            return_all_layers=False,
        )

    def forward(self, x):
        x = self.rnn1(x)
        return x
versus
model = nn.Sequential(
    ConvGRU(
        input_size=(64, 64),
        input_dim=1,
        hidden_dim=1,
        kernel_size=(3, 3),
        num_layers=1,
        dtype=dtype,
        batch_first=True,
        bias=True,
        return_all_layers=False,
    )
)
Any idea whether, when programming custom layers, there are different considerations depending on whether the layer is used inside a custom Module or inside nn.Sequential?
Thanks!
I am a beginner and I am trying to implement AlexNet for image classification. The PyTorch implementation of AlexNet is as follows:
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
However, I am trying to use the network for an input size of (3, 448, 224) with number of classes = 8.
I have no idea how to change x.view in the forward method and how many layers I should drop to get optimum performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
most of the pretrained models provided in torchvision (the newest version) already add self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with different input sizes, so you don't have to care about it so much.
Below is the code, very short.
import torchvision
import torch.nn as nn
num_classes = 8
model = torchvision.models.alexnet(pretrained=True)
# replace the last classifier layer
model.classifier[6] = nn.Linear(4096, num_classes)
# now you can train it with your dataset of size (3, 448, 224)
Transfer learning
There are two popular ways to do transfer learning. Suppose we trained a model M on a very large dataset D_large; now we would like to transfer the "knowledge" learned by M to a new model, M', on another dataset, D_other (smaller than D_large).
Use (most) parts of M as the architecture of the new M' and initialize those parts with the weights trained on D_large. We then train M' on D_other and let it adapt the weights of those parts taken from M to find the optimal weights for our new dataset. This is usually referred to as fine-tuning M'.
Same as the above method, except that before training M' we freeze all the parameters of those parts, and then start training M' on D_other. In both cases, the parts from M are mostly the first components of M' (the base); in this case we refer to those parts of M as a feature extractor. The accuracy obtained by the two methods may differ somewhat, but freezing makes the model much less likely to overfit on the small dataset, which is a good point in terms of accuracy. On the other hand, when we freeze the weights of M, we don't need to store the intermediate activations of the frozen layers for the forward pass, and we don't need to compute their gradients during the backward pass. This speeds up training and reduces the memory required.
The implementation
Along with AlexNet, a lot of models pretrained on ImageNet are already provided by the PyTorch team, such as ResNet and VGG.
To best fit your requirements in terms of model size, it would be reasonable to use VGG11 or one of the smaller ResNets, which have the fewest parameters in their model families.
I'll just pick VGG11 as an example:
Obtain the pretrained model from torchvision.
Freeze all the parameters of this model.
Replace the last layer in the model with your new Linear layer to perform the classification. This means you can reuse almost everything from M in M'.
import torchvision
import torch.nn as nn

# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)
# freeze the params
for param in model.parameters():
    param.requires_grad = False
# replace the last layer with your classifier
num_classes = 8
model.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)
# start training with your dataset
Warnings
In old torchvision versions there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)), which makes it harder to train on an input size different from the [3, 224, 224] used for ImageNet. In that case you can do a little extra work, as below:
class OurVGG11(nn.Module):
    def __init__(self, num_classes=8):
        super(OurVGG11, self).__init__()
        self.vgg11 = torchvision.models.vgg11(pretrained=True)
        for param in self.vgg11.parameters():
            param.requires_grad = False
        # add an avgpool here
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # replace the classifier layer
        self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = self.vgg11.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.vgg11.classifier(x)
        return x

model = OurVGG11()
# now start training `model` on our dataset.
Try out different models from torchvision.models.
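As a quick sanity check (a sketch assuming the OurVGG11 model instantiated above), the non-square input from the question passes through cleanly thanks to the adaptive pooling:

import torch

x = torch.randn(2, 3, 448, 224)  # batch of 2 images at the question's input size
out = model(x)
print(out.shape)  # torch.Size([2, 8])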