Related
I have a Sequential model in PyTorch:
model = nn.Sequential(
nn.Embedding(alphabet_size, 64),
nn.LSTM(64, ...),
nn.Flatten(),
nn.Linear(...),
nn.Softmax()
)
I would like to force the batch size to Embedding layer:
nn.Embedding(alphabet_size, 64)
as in Keras:
Embedding(alphabet_size, 64, batch_input_shape=(batch_size, time_steps)))
How to do it?
Here is the answer to your question.
The example already available below
PyTorch sequential model and batch_size
I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment, let's call this file model.npz. I have some analysis software written in Keras, which uses a Keras network and loads into that network a model.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I figured out how to make sure the Keras model matches the Chainer RL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
links.NatureDQNHead(),
L.Linear(512, n_actions),
DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
"""DQN's head (Nature version)"""
def __init__(self, n_input_channels=4, n_output_channels=512,
activation=F.relu, bias=0.1):
self.n_input_channels = n_input_channels
self.activation = activation
self.n_output_channels = n_output_channels
layers = [
#L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(n_input_channels, 32, 8, stride=4,
initial_bias=bias),
#L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
#L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
#L.Convolution2D(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
L.Linear(3136, n_output_channels, initial_bias=bias),
]
super(NatureDQNHead, self).__init__(*layers)
def __call__(self, state):
h = state
for layer in self:
h = self.activation(layer(h))
return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .H5 format, and then tried to run it on the evaluation code in the Ms Pacman Atari environment, and produces a video. When I do this, Pacman follows the exact same, short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therfore, like I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.
I am the author of ChainerRL. I have no experience with Keras, but apparently the formats of the weight parameters seem different between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as you can find in the document (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
I am trying to train a convolutional neural network on google colab for a medical classification problem. The data set is 89 256x256x256 images for training and 11 for testing. When I try to make my model train it gives me the following error:
import keras
from keras import optimizers
import keras.models
from keras.models import Sequential
import keras.layers
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional import MaxPooling3D
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Dense
from keras import metrics
model = Sequential()
model.add(Conv3D(64, kernel_size=(3,3,3),
activation='relu',
input_shape=(10,1,256,256,256)))
model.add(Conv3D(64, (2,2,2), activation='relu'))
model.add(MaxPooling3D(pool_size=(2,2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
opt=keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(opt, loss='categorical_crossentropy', metrics=['mae','acc'])
model.fit(x=train_data, y=train_labels,epochs=100, batch_size=10, verbose=2 ,callbacks=None, validation_split=0.0, validation_data=(validation_data,validation_labels), shuffle=True)
This is the error i get:
ValueError: Input 0 is incompatible with layer conv3d_56: expected ndim=5, found ndim=6
Assuming you are using channels first data_format, your input_shape argugment to the first Conv3D layer should be (CHANNELS, HEIGHT, WIDTH, DEPTH). But your input shape tuple has length of 5, and that is not what Conv3D layer expecting. Assuming the batch_size(of 10) is specified by mistake, making the following changes should fix the problem
model.add(Conv3D(64, kernel_size=(3,3,3),
activation='relu',
input_shape=(1,256,256,256)))
Edit
If you are using channels_last data-format your input_shape should be (HEIGHT, WIDTH, DEPTH, CHANNELS). And assuming your images have 1 channels, the above line should be,
model.add(Conv3D(64, kernel_size=(3,3,3),
activation='relu',
input_shape=(256,256,256, 1)))
I was trying to implement a simple Keras cat vs dog classifier, but while adding a dense layer, it returns an value error.
I'm using theano as backend.
Here's the code:
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
Here's the summary of the model
While executing the last line (adding Dense layer), I'm getting the following error:
ValueError: ('The specified size contains a dimension with value <= 0', (-448, 128))
Here's my keras.json file content
{
"backend": "theano",
"image_data_format": "channels_first",
"floatx": "float32",
"epsilon": 1e-07
}
I'm not able to find the problem.
Thanks in advance!
You're convolving across the channels dimension, try to explicitly set the data_format parameter in convolutions and pooling like this:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu', data_format='channels_last'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size = (2, 2), data_format='channels_last'))
Or reshape your data to have shape (3, 64, 64).
In simple terms, convolution is supposed to work roughly as shown in this gif:
You see that the gray-ish filter is strided across the pixels of your image (blue) in order to extract what are called local patterns (in green-ish). The application of this filter should ideally happen along the width and height of your image, namely the two 64-dimensions in your data.
This is also especially useful when, as it is customary, we split images in channels, usually to represent their RGB components. In this case, the same process shown in the gif is applied in parallel to the three channels, and in general can be applied to N arbitrary channels. This image should help clarify:
To cut a long story short, when you call:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
Keras by default thinks that you're passing it a 64 x 3 image with 64 channels, and tries to convolve accordingly. This is obvioulsy wrong and results in negative dimensions (note how convolutions shrink the size of the image). By specifying the 'channels_last' format, you're telling Keras how the image is oriented (with the components dimension in the last "place"), so that it is able to convolve properly across the 64 x 64 images.
I run your code above and the summary i get is different than yours.
You should provide more information such as the keras version and backend you use...
I suspect there is something wrong within your keras.json file
as from official keras page, check your keras.json file (located in your home directory .keras/keras.json)
It should look like
{
"image_data_format": "channels_last",
"image_dim_ordering": "tf",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
or
{
"image_data_format": "channels_last",
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
I'm trying to replicate the example on Keras's website:
# as the first layer in a Sequential model
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
# now model.output_shape == (None, 32)
# note: `None` is the batch dimension.
# for subsequent layers, no need to specify the input size:
model.add(LSTM(16))
But when I run the following:
# only lines I've added:
from keras.models import Sequential
from keras.layers import Dense, LSTM
# all else is the same:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(LSTM(16))
However, I get the following:
ValueError: Input 0 is incompatible with layer lstm_4: expected ndim=3, found ndim=2
Versions:
Keras: '2.0.5'
Python: '3.4.3'
Tensorflow: '1.2.1'
LSTM layer as their default option has to return only the last output from a sequence. That's why your data loses its sequential nature. In order to change that try:
model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
What makes LSTM to return a whole sequence of predictions.