Converting .npz model from ChainerRL to Keras model, or alternative methods? - python-3.x

I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment, let's call this file model.npz. I have some analysis software written in Keras, which uses a Keras network and loads into that network a model.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I figured out how to make sure the Keras model matches the Chainer RL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
links.NatureDQNHead(),
L.Linear(512, n_actions),
DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
"""DQN's head (Nature version)"""
def __init__(self, n_input_channels=4, n_output_channels=512,
activation=F.relu, bias=0.1):
self.n_input_channels = n_input_channels
self.activation = activation
self.n_output_channels = n_output_channels
layers = [
#L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(n_input_channels, 32, 8, stride=4,
initial_bias=bias),
#L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
#L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
#L.Convolution2D(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
L.Linear(3136, n_output_channels, initial_bias=bias),
]
super(NatureDQNHead, self).__init__(*layers)
def __call__(self, state):
h = state
for layer in self:
h = self.activation(layer(h))
return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .H5 format, and then tried to run it on the evaluation code in the Ms Pacman Atari environment, and produces a video. When I do this, Pacman follows the exact same, short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therfore, like I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.

I am the author of ChainerRL. I have no experience with Keras, but apparently the formats of the weight parameters seem different between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as you can find in the document (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.

Related

How to access intermediate layers in a pretrained model after retraining it on a new image dataset

I trained a VGG16 model on a labeled image dataset using the categorical crossentropy. I removed the fully connected layer and replaced it with new layers, as follows:
vgg16_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
model = Sequential()
model.add(vgg16_model)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(70, activation='softmax'))
Then I trained the full model on my dataset. I want to use it now as a feature extraction model by extracting image features using any of the intermediate layers that belong to vgg16_model.
After training and saving the model, I can only access the layers that I added Dense and Dropout using pop() function to remove them and only keep the trained feature extractor model (vgg16).
i = 0
while i < 4:
model.pop()
This keeps only VGG16 model. However, the layers inside are not accessible, I tried:
new_model = Model(model.input,model.layers[-1].output)
But I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name='input_9'), name='input_9', description="created by layer 'input_9'") at layer "block1_conv1". The following previous layers were accessed without issue: []
How can I modify my model to consider early k layers at a given time, then use the model for prediction?
Defining the keras model in the way you did, unfortunately has many complications when we tried to features from intermediate layer. I've raised a ticket, see here.
As you want to extract image features using any of the intermediate layers that belong to vgg16_model, you can try the following approach.
# [good practice]: check first to know name and shape
# for layer in model.layers:
# print(layer.name, layer.output_shape)
# vgg16 (None, 7, 7, 512)
# flatten (None, 25088)
# dense (None, 256)
# dropout (None, 256)
# dense_1 (None, 70)
# get the trained model first
trained_vgg16 = keras.Model(
inputs=model.get_layer(name="vgg16").inputs,
outputs=model.get_layer(name="vgg16").outputs,
)
x = tf.ones((1, 224, 224, 3))
y = trained_vgg16(x)
y.shape
TensorShape([1, 7, 7, 512])
Next, use this trained_vgg16 model to build the target model. For example,
# extract only 1 intermediate layer
feature_extractor_block3_pool = keras.Model(
inputs=trained_vgg16.inputs,
outputs=trained_vgg16.get_layer(name="block3_pool").output,
)
# or, 2 based on purpose.
feature_extractor_block3_pool_block4_conv3 = keras.Model(
inputs=trained_vgg16.inputs,
outputs=[
trained_vgg16.get_layer(name="block3_pool").output,
trained_vgg16.get_layer(name="block4_conv3").output,
],
)
# or, all
feature_extractor = keras.Model(
inputs=trained_vgg16.inputs,
outputs=[layer.output for layer in trained_vgg16.layers],
)

Predict class and bounding rectangle

I try to train a neural network image classifier for two classes (3 in fact -> the 2 classes and all the rest). After the network training, I would like to be able to add a bounding rectangle on the detected area if it is one of the two wanted classes.
My model is created like that:
model = Sequential([
data_augmentation,
layers.Rescaling(1./255),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Dropout(0.1),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
When I use the predict method, I just have a list of 1 input per class:
predictions = model.predict(img_array, verbose=0)
# [[1.12, 4.10, 6.21]]
score = tf.nn.softmax(predictions[0])
prediction = class_names[np.argmax(score)]
So I can have the class name of the tested image but can't have the bounding rectangle of the ROI in the image that trigger the detection of the class in order to pixelate/blur this area. Did I miss something in the output of my model ?
Sequential model is for simple network design.
Suppose you have object coordinates information in your dataset.
For more complex network with multiple I/O, in your case (class probability with object coordinates), you should use Keras Functional API instead.

Error in compiling CNN LSTM neural network

I am trying to create a neural network with keras that will have as input N multivariate time-series and as target output N time-series. I converted the time-series to a supervised problem with the window or lag method. As input I have a 4D matrix (samples, variables, sequence, lag) and as output a 2D matrix (samples, sequence). I found similar examples that used CNN+LSTM models, but I have difficulties applying them. In case it helps, I have train_X, train_y, test_X, test_y with dimensions (112, 5, 7998, 2) (112, 7998) (29, 5, 7998, 2) (29, 7998)
I have tried applying and removing the TimeDistributed Keras wrapper to the CNN part only and the whole model. The relevant part of the code is bellow.
model = Sequential()
model.add(TimeDistributed(Conv2D(filters=32, kernel_size=(1, 80), activation='relu', padding='same', input_shape=(train_X.shape[1], train_X.shape[2], train_X.shape[3]))))
model.add(TimeDistributed(MaxPool2D(pool_size=(1, 2),strides=1)))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(LSTM(100, return_sequences=True)))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Dense(units=1)))
model.compile(loss='mean_squared_error', optimizer='adam')
I get an index error.
IndexError: list index out of range

Changing input dimension for AlexNet

I am beginner and I am trying to implement AlexNet for image classification. The pytorch implementation of AlexNet is as follows:
class AlexNet(nn.Module):
def __init__(self, num_classes=1000):
super(AlexNet, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(64, 192, kernel_size=5, padding=2),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(192, 384, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(384, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=3, stride=2),
)
self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
self.classifier = nn.Sequential(
nn.Dropout(),
nn.Linear(256 * 6 * 6, 4096),
nn.ReLU(inplace=True),
nn.Dropout(),
nn.Linear(4096, 4096),
nn.ReLU(inplace=True),
nn.Linear(4096, num_classes),
)
def forward(self, x):
x = self.features(x)
x = self.avgpool(x)
x = x.view(x.size(0), 256 * 6 * 6)
x = self.classifier(x)
return x
However I am trying to implement the network for a input size of (3,448,224) with num of classes = 8.
I have no idea on how to change x.view in the forward method and how many layers I should drop to get optimum performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
Since, most of the pretrained models provided in torchvision (the newest version) already added self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with input size. So you don't have to care about it so much.
Below is the code, very short.
import torchvision
import torch.nn as nn
num_classes = 8
model = torchvision.models.alexnet(pretrained=True)
# replace the last classifier
model.classifier[6] = nn.Linear(4096, num_classes)
# now you can trained it with your dataset of size (3, 448, 224)
Transfer learning
There are two popular ways to do transfer learning. Suppose that we trained a model M in very large dataset D_large, now we would like to transfer the "knowledge" learned by the model M to our new model, M', on other datasets such as D_other (which has a smaller size than that of D_large).
Use (most) parts of M as the architecture of our new M' and initialize those parts with the weights trained on D_large. We can start training the model M' on the dataset D_other and let it learn the weights of those above parts from M to find the optimal weights on our new dataset. This is usually referred as fine-tuning the model M'.
Same as the above method except that before training M' we freeze all the parameters of those parts and start training M' on our dataset D_other. In both cases, those parts from M are mostly the first components in the model M' (the base). However, in this case, we refer those parts of M as the model to extract the features from the input dataset (or feature extractor). The accuracy obtained from the two methods may differ a little to some extent. However, this method guarantees the model doesn't overfit on the small dataset. It's a good point in terms of accuracy. On the other hands, when we freeze the weights of M, we don't need to store some intermediate values (the hidden outputs from each hidden layer) in the forward pass and also don't need to compute the gradients during the backward pass. This improves the speed of training and reduces the memory required during training.
The implementation
Along with Alexnet, a lot of pretrained models on ImageNet is already provided by Facebook team such as ResNet, VGG.
To fit your requirements the most in the aspect of model size, it would be nice to use VGG11, and ResNet which have fewest parameters in their model family.
I just pick VGG11 as an example:
Obtain a pretrained model from torchvision.
Freeze the all the parameters of this model.
Replace the last layer in the model by your new Linear layer to perform your classification. This means that you can reuse all most everything of M to M'.
import torchvision
# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)
# freeze the params
for param in net.parameters():
param.requires_grad = False
# replace with your classifier
num_classes = 8
net.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)
# start training with your dataset
Warnings
In the old torchvision package version, there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)) which makes harder to train on our input size which is different from [3, 224, 224] used in training ImageNet. You can do a little effort as below:
class OurVGG11(nn.Module):
def __init__(self, num_classes=8):
super(OurVGG11, self).__init__()
self.vgg11 = torchvision.models.vgg11(pretrained=True)
for param in self.vgg11.parameters():
param.requires_grad = False
# Add a avgpool here
self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
# Replace the classifier layer
self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)
def forward(self, x):
x = self.vgg11.features(x)
x = self.avgpool(x)
x = x.view(x.size(0), 512 * 7 * 7)
x = self.vgg11.classifier(x)
return x
model = OurVGG11()
# now start training `model` on our dataset.
Try out with different models in torchvision.models.

TypeError when trying to create a BLSTM network in Keras

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the second model (with the LSTMs) I get the following error:
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim= T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
def lstm_net(T):
input_layer = Input(shape=(T,1))
branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
merge_layer = layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
print(merge_layer.shape)
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
dense_layer = layers.Dense(128, activation='relu')(BLSTM2)
output_dense = layers.Dense(1, activation='linear')(dense_layer)
model = Model(input_layer, output_dense)
model.name = "lstm_net"
return model
model = lstm_net(40)
After that I get the above error. My goal is to give as input a batch of 8 sequences of length 40 and get as output a batch of 8 sequences of length 40 too. I found this issue on Keras Github LSTM layer cannot connect to Dense layer after Flatten #818 and there #fchollet suggests that I should specify the 'input_shape' in the first layer which I did but probably not correctly. I put the two print statements to see how the shape is changing and the output is:
(?, 40, 96)
(?, 256)
The error occurs on the line BLSTM2 is defined and can be seen in full here
Your problem lies in these three lines:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
As a default, LSTM is returning only the last element of computations - so your data is losing its sequential nature. That's why the proceeding layer raises an error. Change this line to:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
In order to make the input to the second LSTM to have sequential nature also.
Aside of this - I'd rather not use input_shape in middle model layer as it's automatically inferred.

Resources