Why does Netron render a BatchNorm2d layer as a bias in my model? - conv-neural-network

Below is my demo code. It simply shows that I have written a batch-norm layer. When I export the model to an ONNX file and render the network in Netron, the BN layer is missing, and even though I disabled the bias, a bias still shows up.
After a few modifications to the code, I confirmed that the bias shown in Netron is the BN: when I delete the BN layer and keep the bias disabled, the b section disappears.
Netron correctly renders models I downloaded from the internet, so it can't be the app's problem. What is wrong in my code?
import torch
import torch.nn as nn
class myModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 20, 3, stride=2, bias=False),
            nn.Conv2d(20, 40, 3, stride=2, bias=False),
            nn.BatchNorm2d(40),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(1000, 8)  # 24x24x3 -> 11x11x20 -> 5x5x40 = 1000
        )
    def forward(self, x):
        return self.layers(x)
m = myModel()
torch.onnx.export(m, (torch.ones(1, 3, 24, 24),), 'test.onnx')
Here is the capture: the BatchNorm has disappeared and a bias is shown.
Update:
When I delete all the conv layers, the BatchNorm shows up:

It's a version-specific problem. If I swap the order of the BN and ReLU layers, the BN layer renders normally.
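One plausible reading of the capture, which the thread does not confirm, is that the exporter folded the BatchNorm into the preceding convolution. In inference mode a batch norm is a per-channel affine map, so it can be absorbed into the conv weights and then appears as a bias even though the conv was created with bias=False. The sketch below only checks that arithmetic for a single channel; the function names and all numbers are made up for illustration.

```python
import math

def batchnorm_eval(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm for one channel."""
    return gamma * (z - mean) / math.sqrt(var + eps) + beta

def fold_bn_into_conv(w, gamma, beta, mean, var, eps=1e-5):
    """Return (w_folded, b_folded) for a bias-free conv weight w followed by BN."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta - mean * scale

# Illustrative numbers only (not taken from the exported model).
w, x = 0.8, 1.5
gamma, beta, mean, var = 1.2, -0.3, 0.4, 2.0

# Conv without bias, then BN.
unfused = batchnorm_eval(w * x, gamma, beta, mean, var)

# Single conv with rescaled weight and a new bias term.
w_f, b_f = fold_bn_into_conv(w, gamma, beta, mean, var)
fused = w_f * x + b_f
```

Both paths give the same value, and the folded layer carries a non-zero bias, which would match a "b" input appearing on the Conv node.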

Related

pytorch multiple branches of a model

Hi, I'm trying to make this model using PyTorch.
Each input consists of 20 images of size 28 x 28, shown as C1 ~ Cp in the image.
Each image goes to a CNN of the same structure, and their outputs are eventually concatenated.
I'm currently struggling with feeding the multiple inputs to their respective CNN models.
Each model in the first box, with three convolutional layers, looks like this in code, but I'm not quite sure how to feed 20 different inputs to separate models of the same structure and then concatenate the results.
self.features = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(10, 14, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(14, 18, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(28*28*18, 256)
)
I've tried passing a list of inputs to the forward function, but it ended with an error and won't go through.
I'll be more than happy to explain further if anything is unclear.
Simply define forward as taking a list of tensors as input, then process each input with the corresponding CNN. In the example snippet the CNNs share the same structure but don't share parameters, which is what I assume you need. You'll need to fill in the dots (...) according to your specifications.
import torch
from torch import Tensor
class MyModel(torch.nn.Module):
    def __init__(self, ...):
        super().__init__()
        ...
        self.cnns = torch.nn.ModuleList([torch.nn.Sequential(...) for _ in range(20)])
    def forward(self, xs: list[Tensor]):
        return torch.cat([cnn(x) for x, cnn in zip(xs, self.cnns)], dim=...)
Assuming each path has its own weights, maybe this could be done with grouped convolution, although the Linear layer before the fusion can cause some trouble.
P = 20
self.features = nn.Sequential(
    nn.Conv2d(1*P, 10*P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(10*P, 14*P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(14*P, 18*P, kernel_size=3, padding=1, groups=P),
    nn.ReLU(),
    nn.Conv2d(18*P, 256*P, kernel_size=28, groups=P),  # not sure about this one
    nn.Flatten(),
    nn.Linear(256*P, 1024)
)
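To see why a grouped convolution behaves like 20 independent branches, it can help to count parameters. The helper below is a hypothetical illustration (conv2d_param_count is not a PyTorch function). It relies on the fact that a Conv2d with groups=g stores a weight of shape (out_channels, in_channels // g, k, k).

```python
def conv2d_param_count(in_ch, out_ch, k, groups=1, bias=True):
    """Number of parameters of a square-kernel 2D convolution."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    weights = out_ch * (in_ch // groups) * k * k
    return weights + (out_ch if bias else 0)

P = 20  # number of branches, as in the question

# One grouped layer versus P separate Conv2d(1, 10, 3) layers.
grouped = conv2d_param_count(1 * P, 10 * P, 3, groups=P)
separate = P * conv2d_param_count(1, 10, 3)

# The same layer without groups would mix all branches together.
dense = conv2d_param_count(1 * P, 10 * P, 3)
```

The grouped layer has exactly as many parameters as 20 separate small convolutions, while the ungrouped one has far more. With this layout the 20 images have to be stacked along the channel dimension, i.e. one tensor of shape (batch, 20, 28, 28), rather than passed as a list.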

Converting .npz model from ChainerRL to Keras model, or alternative methods?

I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment, let's call this file model.npz. I have some analysis software written in Keras, which uses a Keras network and loads into that network a model.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I figured out how to make sure the Keras model matches the Chainer RL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
    links.NatureDQNHead(),
    L.Linear(512, n_actions),
    DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
    """DQN's head (Nature version)"""
    def __init__(self, n_input_channels=4, n_output_channels=512,
                 activation=F.relu, bias=0.1):
        self.n_input_channels = n_input_channels
        self.activation = activation
        self.n_output_channels = n_output_channels
        layers = [
            #L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(n_input_channels, 32, 8, stride=4,
                            initial_bias=bias),
            #L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
            #L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
            #L.Convolution2D(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
            L.Linear(3136, n_output_channels, initial_bias=bias),
        ]
        super(NatureDQNHead, self).__init__(*layers)
    def __call__(self, state):
        h = state
        for layer in self:
            h = self.activation(layer(h))
        return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .H5 format, and then tried to run it with the evaluation code in the Ms Pacman Atari environment and produce a video. When I do this, Pacman always follows the exact same short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therefore, that I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.
I am the author of ChainerRL. I have no experience with Keras, but apparently the formats of the weight parameters seem different between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as you can find in the document (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
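A minimal NumPy sketch of that advice, assuming the Keras convolution kernel layout is (h_K, w_K, c_I, c_O) and the Dense kernel layout is (inputs, outputs). The arrays are random stand-ins, not the real model.npz weights. The last step goes beyond what the answer states and is an assumption worth verifying: since Chainer flattens the conv output in (channel, height, width) order and a channels_last Keras model flattens in (height, width, channel) order, the rows of the first fully connected layer likely need re-ordering too, not just a plain transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a Chainer conv weight, layout (c_O, c_I, h_K, w_K).
w_chainer = rng.standard_normal((32, 4, 8, 8)).astype(np.float32)

# Re-order the axes to the Keras layout (h_K, w_K, c_I, c_O).
w_keras = np.transpose(w_chainer, (2, 3, 1, 0))

# reshape produces the same shape but scrambles which value sits where.
w_wrong = w_chainer.reshape(8, 8, 4, 32)

# Stand-in for the first Linear weight, layout (out, c * h * w) = (512, 64 * 7 * 7).
fc_chainer = rng.standard_normal((512, 3136)).astype(np.float32)

# Undo the (c, h, w) flattening, move to (h, w, c, out), then flatten again.
fc_keras = (
    fc_chainer.reshape(512, 64, 7, 7)
    .transpose(2, 3, 1, 0)
    .reshape(3136, 512)
)
```

Here w_keras[h, w, i, o] holds the same number as w_chainer[o, i, h, w], whereas w_wrong has the right shape but the wrong contents, which is the kind of mismatch that produces a model that loads fine and plays badly.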

Pytorch, custom layer works in Sequential but not in Functional

I'm using convGRU from here and it works OK when I use it in the Sequential mode but it does not with the Functional. When I say it does not work, I mean that I'm getting black predictions from the Functional, while from the Sequential the outputs are similar to the inputs. Everything else in the code remains the same.
Below is an example of what I consistently get with one and the other (the first row is the target and the second row is the prediction).
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.rnn1 = ConvGRU(
            input_size=(64, 64),
            input_dim=1,
            hidden_dim=1,
            kernel_size=(3, 3),
            num_layers=1,
            dtype=dtype,
            batch_first=True,
            bias=True,
            return_all_layers=False,
        )
    def forward(self, x):
        x = self.rnn1(x)
        return x
versus
model = nn.Sequential(
    ConvGRU(
        input_size=(64, 64),
        input_dim=1,
        hidden_dim=1,
        kernel_size=(3, 3),
        num_layers=1,
        dtype=dtype,
        batch_first=True,
        bias=True,
        return_all_layers=False,
    )
)
When programming custom layers, are there different considerations depending on whether the layer is meant to be used in Functional or Sequential mode?
Thanks!

Changing input dimension for AlexNet

I am a beginner and I am trying to implement AlexNet for image classification. The PyTorch implementation of AlexNet is as follows:
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )
    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
However, I am trying to implement the network for an input size of (3, 448, 224) with 8 classes.
I have no idea how to change x.view in the forward method, or how many layers I should drop to get optimum performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
Most of the pretrained models provided in the newest version of torchvision already include self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with input size, so you don't have to worry about it much.
Below is the code, very short.
import torchvision
import torch.nn as nn
num_classes = 8
model = torchvision.models.alexnet(pretrained=True)
# replace the last classifier
model.classifier[6] = nn.Linear(4096, num_classes)
# now you can train it with your dataset of size (3, 448, 224)
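To see why x.view does not need to change, the spatial sizes can be traced by hand with the usual output-size formula floor((n + 2p - k) / s) + 1. The helper below is a hypothetical illustration, not part of torchvision. It walks through the conv and max-pool layers of the features stack from the question for an input of height 448 and width 224.

```python
def out_size(n, k, s=1, p=0):
    """Output length of a conv or pooling layer along one spatial axis."""
    return (n + 2 * p - k) // s + 1

# (kernel, stride, padding) of every conv and max-pool layer in AlexNet.features
ALEXNET_FEATURES = [
    (11, 4, 2), (3, 2, 0),             # conv1, pool
    (5, 1, 2), (3, 2, 0),              # conv2, pool
    (3, 1, 1), (3, 1, 1), (3, 1, 1),   # conv3, conv4, conv5
    (3, 2, 0),                         # pool
]

def trace(n):
    """Spatial size after the whole features stack."""
    for k, s, p in ALEXNET_FEATURES:
        n = out_size(n, k, s, p)
    return n

h, w = trace(448), trace(224)
```

The feature map comes out as 256 x 13 x 6 for this input, and nn.AdaptiveAvgPool2d((6, 6)) brings it back to 256 x 6 x 6, so the classifier input stays 256 * 6 * 6.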
Transfer learning
There are two popular ways to do transfer learning. Suppose we trained a model M on a very large dataset D_large, and now we would like to transfer the "knowledge" learned by M to a new model, M', on another dataset D_other (which is smaller than D_large).
Use (most) parts of M as the architecture of the new M' and initialize those parts with the weights trained on D_large. We then train M' on D_other and let it update the weights of those parts to find the optimal weights for the new dataset. This is usually referred to as fine-tuning the model M'.
Same as the above method, except that before training M' we freeze all the parameters of those parts and then train M' on D_other. In both cases, the parts taken from M are mostly the first components of M' (the base). In this case, however, those parts of M act as a feature extractor for the input dataset. The accuracy obtained from the two methods may differ a little. Freezing reduces the risk that the model overfits the small dataset, which is good for accuracy. In addition, when we freeze the weights of M, we don't need to store the intermediate values (the hidden outputs of each hidden layer) in the forward pass or compute their gradients in the backward pass. This speeds up training and reduces the memory it requires.
The implementation
Along with AlexNet, many models pretrained on ImageNet, such as ResNet and VGG, are already provided by the Facebook team.
To best fit your requirements in terms of model size, it would be nice to use the members of the VGG and ResNet families with the fewest parameters, such as VGG11.
I just pick VGG11 as an example:
Obtain a pretrained model from torchvision.
Freeze all the parameters of this model.
Replace the last layer of the model with your new Linear layer to perform your classification. This means that you can reuse almost everything of M in M'.
import torchvision
import torch.nn as nn
# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)
# freeze the params
for param in model.parameters():
    param.requires_grad = False
# replace with your classifier (the new layer is trainable by default)
num_classes = 8
model.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)
# start training with your dataset
Warnings
In old versions of the torchvision package there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)), which makes it harder to train on an input size different from the [3, 224, 224] used for ImageNet training. You can work around it with a little effort, as below:
class OurVGG11(nn.Module):
    def __init__(self, num_classes=8):
        super(OurVGG11, self).__init__()
        self.vgg11 = torchvision.models.vgg11(pretrained=True)
        for param in self.vgg11.parameters():
            param.requires_grad = False
        # Add an avgpool here
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # Replace the classifier layer
        self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)
    def forward(self, x):
        x = self.vgg11.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.vgg11.classifier(x)
        return x
model = OurVGG11()
# now start training `model` on our dataset.
Try out with different models in torchvision.models.

Fractional max-pooling in keras

The existing pooling functions in the Keras library include max pooling, average pooling, etc.
However, I would like to implement fractional max-pooling in Keras based on the paper https://arxiv.org/abs/1412.6071.
My current model is as follows:
model = Sequential()
......
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
So, instead of model.add(MaxPooling2D(pool_size=(2, 2))), I would like to implement something like the following:
model.add(fractionalMaxpool2D(..............))
Is it possible?
I am currently using Keras with TensorFlow as the backend.
I would appreciate it if someone could provide the algorithm or code.
I am quite new to this and haven't written a custom layer before, so could anyone kindly help out? Thanks!
In my opinion, you can do that by implementing your own custom layer:
class FractionalMaxpool2D(Layer):
    def __init__(self, output_dim):
        super(FractionalMaxpool2D, self).__init__()
        self.output_dim = output_dim
    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        # This kind of layer doesn't have any variable
        pass
    def call(self, x):
        # Handle your algorithm here
        return ....
    def compute_output_shape(self, input_shape):
        # return the output shape
        return (input_shape[0], self.output_dim)
The problem is it's difficult to implement the core function for the Fractional max pooling that uses GPU.
Please check this discussion from Keras's Github.
You can use a Keras Lambda layer to wrap tf.nn.fractional_max_pool, like
pooling_ratio = [1.0, 1.44, 1.73, 1.0]  # example values only
FMP = Lambda(lambda img: tf.nn.fractional_max_pool(img, pooling_ratio)[0])
Now you can use FMP in your Keras code like other layers. The two things to get right are:
img: a tensor with dimensions [batch, height, width, channels]
pooling_ratio: [1.0, row_ratio_you_want, col_ratio_you_want, 1.0]
The first and last entries are 1.0 because TensorFlow does not pool over the batch and channel dimensions, only over height and width. The call returns a tuple whose first element is the pooled tensor, hence the [0].
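For intuition about what fractional max-pooling computes, here is a pure-Python sketch of the disjoint, random-sequence variant along one axis: the input is cut into n_out regions of length 1 or 2 in random order, and the maximum of each region is kept. The function names are made up for illustration. This is not how tf.nn.fractional_max_pool is implemented, and it is far too slow for real training.

```python
import random

def pooling_boundaries(n_in, n_out, rng):
    """Random region boundaries: n_out steps of size 1 or 2 that sum to n_in."""
    if not n_out <= n_in <= 2 * n_out:
        raise ValueError("need n_out <= n_in <= 2 * n_out")
    # (n_in - n_out) steps of 2 and the rest steps of 1 add up to n_in.
    steps = [2] * (n_in - n_out) + [1] * (2 * n_out - n_in)
    rng.shuffle(steps)
    bounds = [0]
    for step in steps:
        bounds.append(bounds[-1] + step)
    return bounds

def fractional_max_pool_1d(values, n_out, rng):
    """Max over each randomly sized, non-overlapping region."""
    bounds = pooling_boundaries(len(values), n_out, rng)
    return [max(values[a:b]) for a, b in zip(bounds, bounds[1:])]

rng = random.Random(0)
row = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]          # 10 inputs
pooled = fractional_max_pool_1d(row, 7, rng)  # 7 outputs, ratio 10/7
```

A 2D version would draw one such sequence for the rows and another for the columns, then take the max over each resulting rectangle.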
