Getting an error while adding a dense layer in keras - python-3.x

I was trying to implement a simple Keras cat-vs-dog classifier, but adding a dense layer raises a ValueError.
I'm using Theano as the backend.
Here's the code:
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
Here's the summary of the model
While executing the last line (adding Dense layer), I'm getting the following error:
ValueError: ('The specified size contains a dimension with value <= 0', (-448, 128))
Here's my keras.json file content
{
"backend": "theano",
"image_data_format": "channels_first",
"floatx": "float32",
"epsilon": 1e-07
}
I'm not able to find the problem.
Thanks in advance!

You're convolving across the channels dimension. Try explicitly setting the data_format parameter in the convolution and pooling layers, like this:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu', data_format='channels_last'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size = (2, 2), data_format='channels_last'))
Or reorder your data to have shape (3, 64, 64), matching the channels_first format.
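If you take the data-reordering route, note that moving the channel axis is a transpose, not a plain reshape. A minimal sketch, assuming the training data lives in a NumPy array called train_images (a hypothetical name) with shape (N, 64, 64, 3):

import numpy as np

# Hypothetical batch of N images in channels_last layout: (N, 64, 64, 3)
train_images = np.random.rand(16, 64, 64, 3).astype('float32')

# Move the channel axis to the front: (N, 3, 64, 64).
# np.transpose reorders axes; a plain reshape would scramble the pixels.
train_images_cf = np.transpose(train_images, (0, 3, 1, 2))
print(train_images_cf.shape)  # (16, 3, 64, 64)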
In simple terms, convolution is supposed to work roughly as shown in this gif:
You see that the grey-ish filter is slid across the pixels of your image (blue) in order to extract what are called local patterns (shown in green). The application of this filter should ideally happen along the width and height of your image, namely the two 64-sized dimensions in your data.
This also applies when, as is customary, we split images into channels, usually to represent their RGB components. In this case, the same process shown in the gif is applied in parallel to the three channels, and in general it can be applied to N arbitrary channels. This image should help clarify:
To cut a long story short, when you call:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
Because your configuration uses channels_first, Keras thinks you're passing it a 64 x 3 image with 64 channels, and tries to convolve accordingly. This is obviously wrong and results in negative dimensions (note how convolutions shrink the size of the image). By specifying the 'channels_last' format, you're telling Keras how the image is laid out (with the channels dimension in the last place), so that it is able to convolve properly across the 64 x 64 images.
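An alternative to passing data_format to every layer is to flip the session-wide default, which is what keras.json controls. A minimal sketch using the standard backend API:

from keras import backend as K

# Equivalent to editing "image_data_format" in keras.json for this session.
K.set_image_data_format('channels_last')
print(K.image_data_format())  # 'channels_last'
# Any Conv2D / MaxPooling2D created after this call defaults to channels_last.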

I ran your code above and the summary I get is different from yours.
You should provide more information, such as the Keras version and backend you use.
I suspect there is something wrong in your keras.json file.
As described on the official Keras page, check your keras.json file (located in your home directory under .keras/keras.json).
It should look like this:
{
"image_data_format": "channels_last",
"image_dim_ordering": "tf",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
or
{
"image_data_format": "channels_last",
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
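A quick way to report that information (version, backend, data format) from Python, using the standard Keras backend helpers:

import keras
from keras import backend as K

print(keras.__version__)       # Keras version
print(K.backend())             # 'theano' or 'tensorflow'
print(K.image_data_format())   # 'channels_first' or 'channels_last'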

Related

Converting .npz model from ChainerRL to Keras model, or alternative methods?

I have a DQN reinforcement learning model which was trained using ChainerRL's built-in DQN experiment on the Ms Pacman Atari game environment; let's call this file model.npz. I have some analysis software written in Keras, which builds a Keras network and loads a model into it.
I am having trouble getting the .npz exported from ChainerRL to play nice with the Keras network.
I have figured out how to load the weights from the .npz file. I think I figured out how to make sure the Keras model matches the ChainerRL model in terms of kernel size, stride, and activation.
Here is the code which calls the function that builds the network in ChainerRL:
return links.Sequence(
    links.NatureDQNHead(),
    L.Linear(512, n_actions),
    DiscreteActionValue)
And the code which gets called by this, and builds a Chainer DQN network, is:
class NatureDQNHead(chainer.ChainList):
    """DQN's head (Nature version)"""

    def __init__(self, n_input_channels=4, n_output_channels=512,
                 activation=F.relu, bias=0.1):
        self.n_input_channels = n_input_channels
        self.activation = activation
        self.n_output_channels = n_output_channels
        layers = [
            # L.Convolution2D(n_input_channels, out_channel=32, ksize=8, stride=4, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(n_input_channels, 32, 8, stride=4,
                            initial_bias=bias),
            # L.Convolution2D(n_input_channels=32, out_channel=64, ksize=4, stride=2, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(32, 64, 4, stride=2, initial_bias=bias),
            # L.Convolution2D(n_input_channels=64, out_channel=64, ksize=3, stride=1, pad=0, nobias=False, initialW=None, initial_bias=bias, *, dilate=1, groups=1),
            L.Convolution2D(64, 64, 3, stride=1, initial_bias=bias),
            # L.Linear(in_size=3136, out_size=n_output_channels, nobias=False, initialW=None, initial_bias=bias),
            L.Linear(3136, n_output_channels, initial_bias=bias),
        ]
        super(NatureDQNHead, self).__init__(*layers)

    def __call__(self, state):
        h = state
        for layer in self:
            h = self.activation(layer(h))
        return h
So I wrote the following Keras code to build an equivalent network in Keras:
# Keras Model
hidden = 512
#bias initializer to match the chainerRL one
initial_bias = tf.keras.initializers.Constant(0.1)
#matches default "channels_last" data format for Keras layers
inputs = Input(shape=(84, 84, 4))
#First call to Conv2D including all defaults for easy reference
x = Conv2D(filters=32, kernel_size=(8, 8), strides=4, padding='valid', data_format=None, dilation_rate=(1, 1), activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, name='deepq/q_func/convnet/Conv')(inputs)
x1 = Conv2D(filters=64, kernel_size=(4, 4), strides=2, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_1')(x)
x2 = Conv2D(filters=64, kernel_size=(3, 3), strides=1, activation='relu', padding='valid', bias_initializer=initial_bias, name='deepq/q_func/convnet/Conv_2')(x1)
#Flatten for move to linear layers
conv_out = Flatten()(x2)
action_out = Dense(hidden, activation='relu', name='deepq/q_func/action_value/fully_connected')(conv_out)
action_scores = Dense(units = 9, name='deepq/q_func/action_value/fully_connected_1', activation='linear', use_bias=True, kernel_initializer="glorot_uniform", bias_initializer=initial_bias, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None,)(action_out) # num_actions in {4, .., 18}
#Now create model using the above-defined layers
modelArchitecture = Model(inputs, action_scores)
I have examined the structure of the initial weights for the Keras model and found them to be as follows:
Layer 0: no weights
Layer 1: (8,8,4,32)
Layer 2: (4,4,32,64)
Layer 3: (4,4,64,64)
Layer 4: no weights
Layer 5: (3136,512)
Layer 6: (9,512)
Then, I examined the weights in the .npz model which I am trying to import and found them to be as follows:
Layer 0: (32,4,8,8)
Layer 1: (64,32,4,4)
Layer 2: (64,64,4,4)
Layer 3: (512,3136)
Layer 4: (9,512)
So, I reshaped the weights from Layer 0 of model.npz with numpy.reshape and applied them to Layer 1 of the Keras network. I did the same with the model.npz weights for Layer 1, and applied them to Layer 2 of the Keras network. Then, I reshaped the weights from Layer 2 of model.npz, and applied them to Layer 3 of the Keras network. I transposed the weights of Layer 3 from model.npz, and applied them to Layer 5 of the Keras model. Finally, I transposed the weights of Layer 4 of model.npz and applied them to Layer 6 of the Keras model.
I saved the model in .h5 format and then ran it with the evaluation code in the Ms Pacman Atari environment, which produces a video. When I do this, Pacman follows the exact same short path, runs face-first into a wall, and then keeps trying to walk through the wall until a ghost kills it.
It seems, therefore, like I am doing something wrong in my translation between the Chainer DQN network and the Keras DQN network. I am not sure if maybe they process color in a different order or something?
I also attempted to export the ChainerRL model.npz file to ONNX, but got several errors to the point where it didn't seem possible without rewriting a lot of the ChainerRL code base.
Any help would be appreciated.
I am the author of ChainerRL. I have no experience with Keras, but the formats of the weight parameters appear to differ between Chainer and Keras. You should check the meaning of each dimension of the weight parameters for each deep learning framework. In Chainer, as documented at https://docs.chainer.org/en/stable/reference/generated/chainer.functions.convolution_2d.html#chainer.functions.convolution_2d, the weight parameter of Convolution2D is stored as (c_O, c_I, h_K, w_K).
Once you find the meaning of each dimension, I guess what you need is always numpy.transpose, not numpy.reshape, to re-order dimensions to match the order of Keras.
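To make the transpose concrete, here is a minimal sketch using the shapes listed in the question (the mapping is an assumption based on the documented layouts, not something tested against model.npz):

import numpy as np

# Hypothetical Chainer Convolution2D weight: (c_O, c_I, h_K, w_K) = (32, 4, 8, 8)
w_chainer = np.zeros((32, 4, 8, 8), dtype=np.float32)

# Keras Conv2D (channels_last) stores its kernel as (h_K, w_K, c_I, c_O),
# so reorder the axes rather than reshaping:
w_keras = np.transpose(w_chainer, (2, 3, 1, 0))
print(w_keras.shape)  # (8, 8, 4, 32)

# Chainer's Linear stores (out_features, in_features); Keras Dense expects
# (in_features, out_features), so a plain transpose suffices there:
w_lin_chainer = np.zeros((512, 3136), dtype=np.float32)
w_lin_keras = w_lin_chainer.T  # (3136, 512)

# Caveat (untested): the conv-to-dense boundary may also need its 3136 inputs
# reordered, since Chainer flattens in NCHW order while Keras flattens NHWC.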

Stacking fully connected layers on top of two autoencoders for classification

I'm training autoencoders on 2D images using convolutional layers and would like to put fully connected layers on top of encoder part for classification. My autoencoder is defined as follows (just a simple one for illustration):
def encoder(input_img):
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    return conv2
def decoder(conv2):
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv2)
    conv3 = BatchNormalization()(conv3)
    up1 = UpSampling2D((2, 2))(conv3)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up1)
    return decoded
autoencoder = Model(input_img, decoder(encoder(input_img)))
My input images are of size (64,80,1). Now when stacking fully connected layers on top of the encoder I'm doing the following:
def fc(enco):
    flat = Flatten()(enco)
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out
encode = encoder(input_img)
full_model = Model(input_img,fc(encode))
for l1, l2 in zip(full_model.layers[:19], autoencoder.layers[0:19]):
    l1.set_weights(l2.get_weights())
For a single autoencoder this works, but now the problem is that I have two autoencoders, trained on two sets of images, all of size (64, 80, 1).
For every label I have as input two images of size (64, 80, 1) and one label (0 or 1). I need to feed image 1 into the first autoencoder and image 2 into the second autoencoder. But how can I combine both autoencoders in the full_model in above code?
Another problem is also the input to the fit() method. Until now with only one autoencoder the input consisted just of numpy arrays of images (e.g. (1000,64,80,1)) but with two autoencoders I would have two sets of images as input. How can I feed this into the fit() method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
Q: How can I combine both autoencoders in full_model?
A: You could concatenate the bottleneck layers enco_1 and enco_2 of both autoencoders within fc:
def fc(enco_1, enco_2):
    flat_1 = Flatten()(enco_1)
    flat_2 = Flatten()(enco_2)
    flat = Concatenate()([flat_1, flat_2])
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out
encode_1 = encoder_1(input_img_1)
encode_2 = encoder_2(input_img_2)
full_model = Model([input_img_1, input_img_2], fc(encode_1, encode_2))
Note that the last part where you manually set the weights of the encoder is unnecessary - see https://keras.io/getting-started/functional-api-guide/#shared-layers
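As a rough sketch of that shared-layers idea (untested, and assuming the encoder/decoder functions from the question and the two-input fc above): if the classifier head is built on the same tensors that feed the decoders, the encoder layers are shared automatically and no set_weights loop is needed.

from keras.layers import Input
from keras.models import Model

input_img_1 = Input(shape=(64, 80, 1))
input_img_2 = Input(shape=(64, 80, 1))

# Each call to encoder() creates one encoder; reusing its output tensor in
# both the autoencoder and the classifier shares the layers (and weights).
enco_1 = encoder(input_img_1)
enco_2 = encoder(input_img_2)

autoencoder_1 = Model(input_img_1, decoder(enco_1))
autoencoder_2 = Model(input_img_2, decoder(enco_2))
full_model = Model([input_img_1, input_img_2], fc(enco_1, enco_2))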
Q: How can I feed this into the fit method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
A: In the code above, note that the two encoders are fed with different inputs (one for each image set). Now, provided that the model is defined in this way, you can call full_model.fit as follows:
full_model.fit(x=[images_set_1, images_set_2],
               y=label,
               ...)
NOTE: Not tested.

How to add a ConvLSTM2D layer after a Conv2D layer?

I'm making an autoencoder for depth estimation from monocular images. The first layer is a convolutional layer and the second layer is a convolutional LSTM layer. How do I add the ConvLSTM2D layer after the Conv2D layer?
This is the code I've tried but it gives an error.
autoencoder = Sequential()
autoencoder.add(Conv2D(64, (3, 3),strides = 2 , input_shape = (640, 480, 3), activation = 'linear'))
autoencoder.add(LeakyReLU(alpha = 0.1))
autoencoder.add(ConvLSTM2D(256, (3,3), strides = 2, input_shape = (None, 32), return_sequences = True))
I get the following error
ValueError: Input 0 is incompatible with layer conv_gr_u2d_1: expected ndim=5, found ndim=4
You may have misunderstood what ConvLSTM2D is good for. It is designed for the scenario where you have a series of data points, each of which is a picture. So a movie would be a typical use case.
So, whatever you feed into it must have the shape (batch_size, timesteps, rows, cols, channels). On the other hand, Conv2D has an output shape of (batch_size, rows, cols, features). This is what the error is telling you.
Technically, you could just add a Reshape layer between those and generate whatever shape you want, but I don't see how this would make any sense in your scenario.
Having it vice versa (ConvLSTM2D first, then Conv2D) would make much more sense. But then you need "movie-like" input data. If I understand you correctly, you don't have that.
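For completeness, a minimal sketch of what "movie-like" input could look like if sequences of frames were available (the shapes here are illustrative assumptions, not the questioner's data): wrap the Conv2D in TimeDistributed so it is applied frame by frame, which keeps the 5-D shape that ConvLSTM2D expects.

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, LeakyReLU, ConvLSTM2D

model = Sequential()
# Per-sample input: (timesteps, rows, cols, channels); batch size is implicit.
model.add(TimeDistributed(Conv2D(64, (3, 3), strides=2, activation='linear'),
                          input_shape=(10, 64, 64, 3)))
model.add(LeakyReLU(alpha=0.1))
# ConvLSTM2D consumes the 5-D sequence of feature maps produced above.
model.add(ConvLSTM2D(32, (3, 3), strides=2, return_sequences=True))
model.summary()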
The full input tensor to Conv2D has shape (batch_size, img_wd, img_hg, channels), e.g. (None, 640, 480, 3) in your case; the input_shape argument omits the batch dimension, i.e. input_shape=(640, 480, 3).
And you don't have to repeat the input_shape argument on the ConvLSTM2D layer.

Creating a CNN Model in Keras with feature maps from each of the previous filtered images

I am trying to implement the convolutional neural network shown in the attached figure (from Chen et al., Nature 2017) in order to perform a two-class pixel-wise classification.
Can you give me a hint on what the third and fourth layers should look like?
This is how far I've got already:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(40, (15, 15), activation='relu',
                 padding='same', input_shape=(64, 64, 1)))  # first layer
model.add(MaxPooling2D((2, 2), padding='same')) # second layer
# model.add(...) # third layer <-- how to implement this?
# model.add(...) # fourth layer <-- how to implement this?
print(model.summary())
How many kernels did they use for the remaining layers and how should I interpret the summation symbols in the image?
Thanks in advance!
The actual question is rather ambiguous. Am I guessing correctly that you want someone to implement the missing two lines of code for the network?
model = Sequential()
model.add(Conv2D(40, (15, 15), activation='relu',
                 padding='same', input_shape=(64, 64, 1)))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(40, (15, 15), activation='relu', padding='same')) # layer 3
model.add(Conv2D(1, (15, 15), activation='linear', padding='same')) # layer 4
print(model.summary())
To get 40 feature maps after layer 3, we just convolve with 40 different kernels.
After layer 4, there should be only one feature map / channel, so 1 kernel is enough here.
By the way, the figure seems to be from Convolutional neural networks for automated annotation of cellular cryo-electron tomograms (PDF) by Chen et al., a Nature article from 2017.
Update:
Comment: [...] why the authors say 1600 kernels in total and there is a summation?
Actually, the authors seem to follow a rather unusual notation here: their figure of 1600 counts 2-D 15x15 slices rather than whole kernels. Each third-layer kernel is in fact 3-D (15x15x40, the last dimension matching the number of incoming feature maps), and 40 kernels times 40 input channels gives the 1600 slices. The summation symbols express that each output feature map is the sum of the per-channel convolutions of one such 3-D kernel.
When we break it down there are
40 kernels of size 15x15x1 for the 1st layer (which makes 40 * 15 ** 2 trainable weights)
No kernels in the 2nd layer
40 kernels of size 15x15x40 in the 3rd layer (which makes 1600 * 15 ** 2 trainable weights)
1 kernel of size 15x15x40 for the 4th layer (which makes 40 * 15 ** 2 trainable weights)
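These numbers can be cross-checked against Keras itself. A quick sketch, assuming the four-layer model from the snippet above has been built (layer.count_params() is standard Keras API):

for layer in model.layers:
    print(layer.name, layer.count_params())
# Expected counts (weights + biases):
#   layer 1: 40 * 15*15*1  + 40 =   9040
#   layer 2 (pooling):                  0
#   layer 3: 40 * 15*15*40 + 40 = 360040
#   layer 4:  1 * 15*15*40 +  1 =   9001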

TypeError when trying to create a BLSTM network in Keras

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the second model (with the LSTMs) I get the following error:
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim= T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
def lstm_net(T):
    input_layer = Input(shape=(T, 1))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
    print(merge_layer.shape)
    BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8, 40, 96)))(merge_layer)
    print(BLSTM1.shape)
    BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
    dense_layer = layers.Dense(128, activation='relu')(BLSTM2)
    output_dense = layers.Dense(1, activation='linear')(dense_layer)
    model = Model(input_layer, output_dense)
    model.name = "lstm_net"
    return model
model = lstm_net(40)
After that I get the above error. My goal is to give as input a batch of 8 sequences of length 40 and get as output a batch of 8 sequences of length 40 too. I found the issue LSTM layer cannot connect to Dense layer after Flatten #818 on the Keras GitHub, where @fchollet suggests that I should specify the input_shape in the first layer, which I did but probably not correctly. I put in the two print statements to see how the shape is changing, and the output is:
(?, 40, 96)
(?, 256)
The error occurs on the line where BLSTM2 is defined, and the full traceback can be seen here.
Your problem lies in these three lines:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
By default, LSTM returns only the last element of its computations, so your data loses its sequential nature. That's why the following layer raises an error. Change this line to:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
This makes the input to the second LSTM sequential as well.
Aside from this, I'd rather not use input_shape in an intermediate model layer, as it is inferred automatically.
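For reference, a sketch of the shapes the two print statements should then report (assuming the same TensorFlow-style shape printing as above; the batch dimension stays unknown until fit time):

BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
print(BLSTM1.shape)  # (?, 40, 256) -- the time axis is preserved now
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
print(BLSTM2.shape)  # (?, 256)     -- the last BLSTM collapses the sequence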
