So I'm trying to implement this paper about a Siamese neural network: Learning a similarity metric discriminatively, with application to face verification, by Sumit Chopra, Raia Hadsell and Yann LeCun (2005). I'm using the CIFAR10 dataset instead, though, with 10 classes.
The specifications of one of the legs is reproduced for convenience. Notation: C_x is a convolution layer, S_x is a subsampling layer and F_x is a fully connected layer; with a shared index x:
C1: feature maps: 15, kernel size = (7, 7)
S2: feature maps: 15, field-of-view = (2, 2)
C3: feature maps: 45, kernel size = (6, 6)
S4: feature maps: 45, field-of-view = (4, 3)
C5: feature maps: 250, kernel size = (5, 5)
F6 (fully connected layer): no. of units = 50
What I've Tried
model = Sequential()
#C1
model.add(Convolution2D(15, 7, 7,
activation='relu',
border_mode='same',
input_shape=input_img_shape))
print("C1 shape: ", model.output_shape)
#S2
model.add(MaxPooling2D((2,2), border_mode='same'))
print("S2 shape: ", model.output_shape)
#...
#C5
model.add(Convolution2D(250, 5, 5,
activation='relu',
border_mode='same'))
print("C5 shape: ", model.output_shape)
#F6
model.add(Dense(50))
This throws a long error message, which I believe is a reshape error. A snippet of the error:
Exception: Input 0 is incompatible with layer dense_13: expected
ndim=2, found ndim=4
I know that the problem is isolated in that final Dense layer, because the code proceeds smoothly if I comment it out. But I'm not sure exactly how I should then shape/specify my final fully connected layer so that it's compatible with the prior convolution layer?
Some Places I've Looked
This is a related problem, though the implementation is slightly different (it seems that there isn't a 'Siamese' core layer in keras at the time of this writing). I'm aware that there are also implementations in Theano, which I'll bear in mind if I'm just not able to do it in keras.
Thanks!
As mentioned by Matias Valdenegro, Keras already has an example of Siamese network. The example uses only dense layers, though.
Your problem is that you need to add a Flatten layer between the convolutional layers and the dense layers to have a correct shape, see this Keras CNN example
These 2 examples should help you build your Siamese network.
You don't need a Siamese Layer, you just need to use the Keras functional API to create a model with two inputs and one output.
Seems that Keras examples already contain a model that is very similar to the one you are implementing.
Related
I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment, let's call this file model.npz. I have some analysis software written in Keras, which uses a Keras network and loads into that network a model.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I figured out how to make sure the Keras model matches the Chainer RL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
links.NatureDQNHead(),
L.Linear(512, n_actions),
DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
"""DQN's head (Nature version)"""
def __init__(self, n_input_channels=4, n_output_channels=512,
activation=F.relu, bias=0.1):
self.n_input_channels = n_input_channels
self.activation = activation
self.n_output_channels = n_output_channels
layers = [
#L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(n_input_channels, 32, 8, stride=4,
initial_bias=bias),
#L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
#L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
#L.Convolution2D(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
L.Linear(3136, n_output_channels, initial_bias=bias),
]
super(NatureDQNHead, self).__init__(*layers)
def __call__(self, state):
h = state
for layer in self:
h = self.activation(layer(h))
return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .H5 format, and then tried to run it on the evaluation code in the Ms Pacman Atari environment, and produces a video. When I do this, Pacman follows the exact same, short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therfore, like I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.
I am the author of ChainerRL. I have no experience with Keras, but apparently the formats of the weight parameters seem different between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as you can find in the document (https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d), the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
I am trying to implement the artificial convolutional neural network in order to perform a two-class pixel-wise classification as seen in the figure attached (from Chen et al. Nature 2017).
Can you give me a hint on what the third and fourth layers should look like?
This is how far I've got already:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(40, (15, 15), activation='relu',
padding='same', input_shape = (64, 64, 1))) # first layer
model.add(MaxPooling2D((2, 2), padding='same')) # second layer
# model.add(...) # third layer <-- how to implement this?
# model.add(...) # fourth layer <-- how to implement this?
print(model.summary())
How many kernels did they use for the remaining layers and how should I interpret the summation symbols in the image?
Thanks in advance!
The actual question is rather ambiguous.
I am guessing correctly, that you want someone to implement the missing two lines of code for the network?
model = Sequential()
model.add(Conv2D(40, (15, 15), activation='relu',
padding='same', input_shape=(64, 64, 1)))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(40, (15, 15), activation='relu', padding='same')) # layer 3
model.add(Conv2D(1, (15, 15), activation='linear', padding='same')) # layer 4
print(model.summary())
To get 40 feature maps after layer 3, we just convolve with 40 different kernels.
After layer 4, there should be only one feature map / channel, so 1 kernel is enough here.
By the way, the figure seems to be from Convolutional neural networks for automated annotation of cellular cryo-electron tomograms (PDF) by Chen et al., a Nature article from 2017.
Update:
Comment: [...] why the authors say 1600 kernels in total and there is a summation?
Actually, the authors seem to follow a rather strange notation here. They have an (imho) incorrect way to count kernels. What they rather mean is weights (if given 1x1 kernels...).
Maybe they did not understand that the shape of the kernels are in fact 3-D, due to the last dimension equal to the number of feature maps.
When we break it down there are
40 kernels of size 15x15x1 for the 1st layer (which makes 40 * 15 ** 2 trainable weights)
No kernels in the 2nd layer
40 kernels of size 15x15x40 in the 3rd layer (which makes 1600 * 15 ** 2 trainable weights)
1 kernel of size 15x15x40 for the 4th layer (which makes 40 * 15 ** 2 trainable weights)
import keras.layers as KL
input_image = KL.Input([None, None, 3], name = 'input_image')
x = KL.Conv2D(64, (3,3), padding='same')(input_image)
after Conv, I want to add a dense as below:
KL.Dense(2)(KL.Flatten()(x))
but there will be an error:
ValueError: The shape of the input to "Flatten" is not fully defined
(got (None, None, 64). Make sure to pass a complete "input_shape" or
"batch_input_shape" argument to the first layer in your model.
So if I want a model contained conv followed by dense which can accept any size of input, how should I do?
Neural networks don't work with variable sized inputs. Unless you are dealing with recurrent neural networks.
With a network with variable sized input, what would the weights of the network look like?
Typically, you will pick a size for your input layer and resize or pad your input to match that size.
Although it's not the same as flattening your input you could use Global Max Pooling:
x = KL.GlobalMaxPooling2D()(x)
This will change your dimension from (None, None, None 64) to (None, 64) (including batch dimension). Global Max Pooling is a common way to close up convultional Networks and feed the output into a Dense Neural Network.
To build a CNN model you should use a pooling layer and then a flatten one, as you can see in the example below.
The pooling layer will reduce the number of data to be analysed in the convolutional network, and then we use Flatten to have the data as a "normal" input to a Dense layer. Moreover, after a convolutional layer, we always add a pooling one.
The example below is for 1D CNN but has the same structure as the 2D ones. Again, Flatten() changes the shape of the output to use properly in the last Dense layer.
model = Sequential()
model.add(Conv1D(num_filters_to_use, (filters_size_tuple), input_shape=features_array_shape, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the second model (with the LSTMs) I get the following error:
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim= T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
def lstm_net(T):
input_layer = Input(shape=(T,1))
branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
merge_layer = layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
print(merge_layer.shape)
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
dense_layer = layers.Dense(128, activation='relu')(BLSTM2)
output_dense = layers.Dense(1, activation='linear')(dense_layer)
model = Model(input_layer, output_dense)
model.name = "lstm_net"
return model
model = lstm_net(40)
After that I get the above error. My goal is to give as input a batch of 8 sequences of length 40 and get as output a batch of 8 sequences of length 40 too. I found this issue on Keras Github LSTM layer cannot connect to Dense layer after Flatten #818 and there #fchollet suggests that I should specify the 'input_shape' in the first layer which I did but probably not correctly. I put the two print statements to see how the shape is changing and the output is:
(?, 40, 96)
(?, 256)
The error occurs on the line BLSTM2 is defined and can be seen in full here
Your problem lies in these three lines:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
As a default, LSTM is returning only the last element of computations - so your data is losing its sequential nature. That's why the proceeding layer raises an error. Change this line to:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
In order to make the input to the second LSTM to have sequential nature also.
Aside of this - I'd rather not use input_shape in middle model layer as it's automatically inferred.
my question is quite closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps be nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
nb_samples, 10, 40
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However I get an exception class_weight not supported for 3+ dimensional targets. This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1) I get the exception Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1).
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used
for scaling the loss function (during training only). You can either
pass a flat (1D) Numpy array with the same length as the input samples
(1:1 mapping between weights and samples), or in the case of temporal
data, you can pass a 2D array with shape (samples, sequence_length),
to apply a different weight to every timestep of every sample. In this
case you should make sure to specify sample_weight_mode="temporal" in
compile().
In your case you would need to supply a 2D array with the same shape as your labels.
If this is still an issue.. I think the TimeDistributed Layer expects and returns a 3D array (kind of similar to if you have return_sequences=True in the regular LSTM layer). Try adding a Flatten() layer or another LSTM layer at the end before the prediction layer.
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using temporal is a workaround. Check out this stack. The issue is also documented on github.