convLSTM2D dilation_rate seems not to work - keras

I'm trying to do upsampling using the dilation_rate of ConvLSTM2D (Keras with TensorFlow as backend):
from keras.layers import Input, ConvLSTM2D
from keras.models import Model
from keras.utils import plot_model
input = Input(shape=(10, 64, 64, 1), name='encoder_input')
layer1 = ConvLSTM2D(filters=33, kernel_size=(5, 5), dilation_rate=(2, 2))
model = Model(input, layer1(input))
plot_model(model, show_shapes=True, show_layer_names=True)
I would expect the output shape to be (None,128,128,33) but I got (None,64,64,33).
Wouldn't dilation_rate=(2, 2) be the opposite of strides=(2, 2)?

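Dilation, unlike stride, does not change the shape of the data. It simply increases the "spread" of the kernel by leaving gaps between the positions it samples.
The only change in the shape of the data comes from the border that is cut off because no padding is used (ConvLSTM2D defaults to padding='valid'). For a plain 5x5 kernel that is 2 pixels on each side; with dilation_rate=(2, 2) the 5x5 kernel spans a 9x9 window, so 4 pixels are cut from each side. With padding='same', the spatial size stays 64x64.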
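To make this shape arithmetic concrete, here is a small pure-Python sketch of the per-axis output length of a (possibly dilated) convolution. It is modelled on the formula Keras uses internally (`conv_utils.conv_output_length`), but the helper below is our own and only covers 'valid' and 'same' padding.

```python
def conv_output_length(input_length, kernel_size, padding, stride=1, dilation=1):
    """Output length along one spatial axis of a convolution (Keras-style sketch)."""
    # A dilated kernel covers kernel_size + (kernel_size - 1) * (dilation - 1) inputs.
    dilated_kernel = kernel_size + (kernel_size - 1) * (dilation - 1)
    if padding == "same":
        length = input_length
    elif padding == "valid":
        length = input_length - dilated_kernel + 1
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    # Striding divides the length (rounding up); dilation never multiplies it.
    return (length + stride - 1) // stride


# 64x64 frames and a 5x5 kernel, as in the question
print(conv_output_length(64, 5, "valid"))               # 60
print(conv_output_length(64, 5, "valid", dilation=2))   # 56
print(conv_output_length(64, 5, "same", dilation=2))    # 64
print(conv_output_length(64, 5, "same", stride=2))      # 32
```

In none of these cases does the length grow beyond 64: dilation widens the receptive field, while stride shrinks the output; neither one upsamples.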


Stacking fully connected layers on top of two autoencoders for classification

I'm training autoencoders on 2D images using convolutional layers and would like to put fully connected layers on top of encoder part for classification. My autoencoder is defined as follows (just a simple one for illustration):
from keras.layers import Input, Conv2D, BatchNormalization, MaxPooling2D, UpSampling2D
from keras.models import Model
input_img = Input(shape=(64, 80, 1))
def encoder(input_img):
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    return conv2
def decoder(conv2):
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv2)
    conv3 = BatchNormalization()(conv3)
    up1 = UpSampling2D((2, 2))(conv3)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up1)
    return decoded
autoencoder = Model(input_img, decoder(encoder(input_img)))
My input images are of size (64,80,1). Now when stacking fully connected layers on top of the encoder I'm doing the following:
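from keras.layers import Flatten, Dense
def fc(enco):
    flat = Flatten()(enco)
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out
encode = encoder(input_img)
full_model = Model(input_img, fc(encode))
# copy the trained encoder weights from the autoencoder into full_model
for l1, l2 in zip(full_model.layers[:19], autoencoder.layers[0:19]):
    l1.set_weights(l2.get_weights())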
For a single autoencoder this works, but the problem now is that I have two autoencoders trained on sets of images, all of size (64, 80, 1).
For every label I have as input two images of size (64, 80, 1) and one label (0 or 1). I need to feed image 1 into the first autoencoder and image 2 into the second autoencoder. But how can I combine both autoencoders in the full_model in the above code?
Another problem is the input to the fit() method. Until now, with only one autoencoder, the input consisted of just one numpy array of images (e.g. (1000, 64, 80, 1)), but with two autoencoders I have two sets of images as input. How can I feed them into the fit() method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
Q: How can I combine both autoencoders in full_model?
A: You could concatenate the bottleneck layers enco_1 and enco_2 of both autoencoders within fc:
from keras.layers import Concatenate, Dense, Flatten
def fc(enco_1, enco_2):
    flat_1 = Flatten()(enco_1)
    flat_2 = Flatten()(enco_2)
    # concatenate the flattened vectors, not the raw 4-D feature maps
    flat = Concatenate()([flat_1, flat_2])
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out
encode_1 = encoder_1(input_img_1)
encode_2 = encoder_2(input_img_2)
full_model = Model([input_img_1, input_img_2], fc(encode_1, encode_2))
Note that the last part where you manually set the weights of the encoder is unnecessary - see
Q: How can I feed this into the fit method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?
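A: In the code above, note that the two encoders are fed with different inputs (one for each image set). Provided that the model is defined in this way, you can pass the two image sets as a list, in the same order as the model's inputs, and call fit as follows (labels holds one label per image pair):
full_model.fit([images_set_1, images_set_2], labels)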
NOTE: Not tested.
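As a sanity check on why each bottleneck is flattened before concatenation, here is a pure-Python sketch that traces the shapes (batch axis omitted) of the question's encoder for a (64, 80, 1) image. The helper names are hypothetical; it assumes MaxPooling2D's default 'valid' padding and that Keras' Concatenate joins along the last axis (its default, axis=-1).

```python
from math import prod


def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1 keeps rows/cols, changes channels."""
    rows, cols, _ = shape
    return (rows, cols, filters)


def max_pool(shape, pool=2):
    """MaxPooling2D with the default 'valid' padding floors rows/cols."""
    rows, cols, channels = shape
    return (rows // pool, cols // pool, channels)


def encoder_shape(image_shape):
    """Trace the question's encoder (BatchNormalization keeps the shape)."""
    x = conv_same(image_shape, 32)
    x = max_pool(x)
    return conv_same(x, 64)


enco = encoder_shape((64, 80, 1))              # (32, 40, 64)

# Concatenating the raw 4-D outputs joins the channel axis, so a following
# Dense layer would act on every pixel separately instead of once per sample.
raw_concat = enco[:-1] + (enco[-1] * 2,)       # (32, 40, 128)

# Flattening first gives one feature vector per image pair.
flat_concat = (prod(enco) * 2,)                # (163840,)
print(enco, raw_concat, flat_concat)
```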

How to add a ConvLSTM2D layer after a Conv2D layer?

I'm making an autoencoder for depth estimation from monocular images. The first layer is a convolutional layer and the second layer is a convolutional LSTM layer. How do I add the ConvLSTM2D layer after the Conv2D layer?
This is the code I've tried but it gives an error.
from keras.models import Sequential
from keras.layers import Conv2D, LeakyReLU, ConvLSTM2D
autoencoder = Sequential()
autoencoder.add(Conv2D(64, (3, 3), strides=2, input_shape=(640, 480, 3), activation='linear'))
autoencoder.add(LeakyReLU(alpha=0.1))
autoencoder.add(ConvLSTM2D(256, (3, 3), strides=2, input_shape=(None, 32), return_sequences=True))
I get the following error
ValueError: Input 0 is incompatible with layer conv_gr_u2d_1: expected ndim=5, found ndim=4
You may have misunderstood what ConvLSTM2D is good for. It is designed for the scenario where you have a series of data in which each data point is a picture, so a movie would be a typical use case.
So, whatever you feed into it must have the shape (batch_size, timesteps, rows, cols, channels). On the other hand, Conv2D has an output shape of (batch_size, rows, cols, features). This is what the error is telling you.
Technically, you could just add a Reshape layer between those and generate whatever shape you want, but I don't see how this would make any sense in your scenario.
Having it vice versa (ConvLSTM2D first, then Conv2D) would make much more sense. But then you need "movie-like" input data. If I understand you correctly, you don't have that.
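Counting dimensions makes the error message concrete. The pure-Python sketch below (helper name is our own, mirroring Keras' per-axis length formula without dilation) computes the Conv2D output for a (640, 480, 3) image with a 3x3 kernel, strides=2 and the default padding='valid', then shows the 5-D shape ConvLSTM2D would need. Inserting a single time step, e.g. with Reshape((1, rows, cols, 64)), is one way to get there, but as the answer says it is probably not meaningful for single images.

```python
def conv_output_length(input_length, kernel_size, padding, stride=1):
    """Keras-style output length along one spatial axis (no dilation)."""
    if padding == "valid":
        length = input_length - kernel_size + 1
    elif padding == "same":
        length = input_length
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    return (length + stride - 1) // stride


# Conv2D(64, (3, 3), strides=2) on a (640, 480, 3) image, default padding='valid'
rows = conv_output_length(640, 3, "valid", stride=2)
cols = conv_output_length(480, 3, "valid", stride=2)
conv2d_out = (None, rows, cols, 64)          # 4-D: (batch, rows, cols, channels)

# ConvLSTM2D expects (batch, timesteps, rows, cols, channels): 5-D.
convlstm_in = (None, 1, rows, cols, 64)      # one time step inserted
print(len(conv2d_out), len(convlstm_in))     # 4 5
```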
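The input to Conv2D has the shape:
(batch_size, img_wd, img_hg, channels)
i.e. (None, 640, 480, 3). The batch dimension is left out of the input_shape argument itself, so you pass input_shape=(640, 480, 3).
And you don't have to add the input_shape argument to ConvGRU2D, since it is only needed on the first layer.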

understanding output shape of keras Conv2DTranspose

I am having a hard time understanding the output shape of keras.layers.Conv2DTranspose
Here is the prototype:
strides=(1, 1),
dilation_rate=(1, 1),
In the documentation, I read:
If output_padding is set to None (default), the output shape is inferred.
In the code, I read:
out_height = conv_utils.deconv_length(height,
                                      stride_h, kernel_h,
                                      ...)
out_width = conv_utils.deconv_length(width,
                                     stride_w, kernel_w,
                                     ...)
if self.data_format == 'channels_first':
    output_shape = (batch_size, self.filters, out_height, out_width)
else:
    output_shape = (batch_size, out_height, out_width, self.filters)
and in conv_utils:
def deconv_length(dim_size, stride_size, kernel_size, padding, output_padding, dilation=1):
    """Determines output length of a transposed convolution given input length.
    # Arguments
        dim_size: Integer, the input length.
        stride_size: Integer, the stride along the dimension of `dim_size`.
        kernel_size: Integer, the kernel size along the dimension of `dim_size`.
        padding: One of `"same"`, `"valid"`, `"full"`.
        output_padding: Integer, amount of padding along the output dimension, can be set to `None` in which case the output length is inferred.
        dilation: dilation rate, integer.
    # Returns
        The output length (integer).
    """
    assert padding in {'same', 'valid', 'full'}
    if dim_size is None:
        return None
    # Get the dilated kernel size
    kernel_size = kernel_size + (kernel_size - 1) * (dilation - 1)
    # Infer length if output padding is None, else compute the exact length
    if output_padding is None:
        if padding == 'valid':
            dim_size = dim_size * stride_size + max(kernel_size - stride_size, 0)
        elif padding == 'full':
            dim_size = dim_size * stride_size - (stride_size + kernel_size - 2)
        elif padding == 'same':
            dim_size = dim_size * stride_size
    else:
        if padding == 'same':
            pad = kernel_size // 2
        elif padding == 'valid':
            pad = 0
        elif padding == 'full':
            pad = kernel_size - 1
        dim_size = ((dim_size - 1) * stride_size + kernel_size - 2 * pad + output_padding)
    return dim_size
I understand that Conv2DTranspose is kind of a Conv2D, but reversed.
Since applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 200x200 image will output a 20x20 image,
I assume that applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 20x20 image will output a 200x200 image.
Also, applying a Conv2D with kernel_size = (3, 3), strides = (10, 10) and padding = "same" to a 195x195 image will also output a 20x20 image.
So, I understand that there is kind of an ambiguity on the output shape when applying a Conv2DTranspose with kernel_size = (3, 3), strides = (10, 10) and padding = "same" (user might want output to be 195x195, or 200x200, or many other compatible shapes).
I assume that "the output shape is inferred." means that a default output shape is computed according to the parameters of the layer, and I assume that there is a mechanism to specify an output shape different from the default one, if necessary.
This said, I do not really understand
the meaning of the "output_padding" parameter
the interactions between parameters "padding" and "output_padding"
the various formulas in the function keras.conv_utils.deconv_length
Could someone explain this?
Many thanks,
I may have found a (partial) answer.
I found it in the PyTorch documentation, which appears to be much clearer than the Keras documentation on this topic.
When applying Conv2D with a stride greater than 1 to images whose dimensions are close, we can get output images with the same dimensions.
For instance, when applying a Conv2D with a kernel size of 3x3, a stride of 7x7 and padding "same", the following image dimensions
22x22, 23x23, ..., 28x28, 22x28, 28x22, 27x24, etc. (7 x 7 = 49 combinations)
will ALL yield an output dimension of 4x4.
That is because output_dimension = ceiling(input_dimension / stride).
As a consequence, when applying a Conv2DTranspose with kernel size of 3x3, stride of 7x7 and padding "same", there is an ambiguity about the output dimension.
Any of the 49 possible output dimensions would be correct.
The parameter output_padding is a way to resolve the ambiguity by choosing explicitly the output dimension.
In my example, the minimum output size is 22x22, and output_padding provides a number of lines (between 0 and 6) to add at the bottom of the output image and a number of columns (between 0 and 6) to add at the right of the output image.
So I can get output_dimensions = 24x25 if I use output_padding = (2, 3).
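The example above can be checked with a few lines of pure Python. The sketch uses ceil(n / stride) for the 'same' Conv2D and the explicit-output_padding branch of deconv_length quoted in the question (padding='same', so pad = kernel_size // 2); the function names are our own.

```python
import math


def conv2d_same_length(n, stride):
    """Conv2D with padding='same': output = ceil(n / stride)."""
    return math.ceil(n / stride)


def conv2d_transpose_length(n, kernel, stride, output_padding):
    """deconv_length with an explicit output_padding and padding='same'."""
    pad = kernel // 2
    return (n - 1) * stride + kernel - 2 * pad + output_padding


# Every input size from 22 to 28 maps to 4 under a 3x3, stride-7, 'same' Conv2D...
forward = {n: conv2d_same_length(n, 7) for n in range(22, 29)}
# ...and output_padding 0..6 selects which of those sizes the transpose produces.
backward = [conv2d_transpose_length(4, 3, 7, p) for p in range(7)]
print(backward)  # [22, 23, 24, 25, 26, 27, 28]
```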
What I still do not understand, however, is the logic that Keras uses to choose a certain output image dimension when output_padding is not specified (when it "infers" the output shape).
So to answer my own questions:
the meaning of the "output_padding" parameter: see above
the interactions between parameters "padding" and "output_padding": these parameters are independent
the various formulas in the function keras.conv_utils.deconv_length
For now, I do not understand the part when output_padding is None;
I ignore the case when padding == 'full' (not supported by Conv2DTranspose);
The formula for padding == 'valid' seems correct (can be computed by reversing the formula of Conv2D)
The formula for padding == 'same' seems incorrect to me, in case kernel_size is even. (As a matter of fact, keras crashes when trying to build a Conv2DTranspose layer with input_dimension = 5x5, kernel_size = 2x2, stride = 7x7 and padding = 'same'. It appears to me that there is a bug in keras, I will start another thread for this topic...)
Output padding in Conv2DTranspose is also what I am concerned about when designing an autoencoder.
Assume the stride is always 1. Along the encoder path, for each convolution layer, I chose padding='valid', which means that if my input image is HxW and the filter is mxn, the output of the layer will be (H-(m-1))x(W-(n-1)).
In the corresponding Conv2DTranspose layer along the decoder path, if I use Theano, in order to recover the input size of its corresponding Conv2D, I have to choose padding='full' and output_padding = None or 0 (no difference), which implies the input is expanded by [m-1, n-1] around it, that is, (m-1)/2 at the top and bottom, and (n-1)/2 at the left and right.
If I use TensorFlow, I have to choose padding='same' and output_padding = 2*((filter_size-1)//2); I think that is Keras' intended behaviour.
If the stride is not 1, then you will have to calculate carefully how much output padding to add.
In Conv2D, out_size = floor((in_size + 2*padding_size - filter_size)/stride) + 1
If we choose padding='same', Keras will automatically set padding_size = (filter_size-1)/2, whilst if we choose 'valid', padding_size will be set to 0, which is the convention for any N-D convolution.
Conversely, in Conv2DTranspose, out_size = (in_size - 1)*stride + filter_size - 2*padding_size
where padding_size refers to how many pixels will actually be padded due to the 'padding' option and output_padding together. Based on the discussion above, there is no 'full' option in TensorFlow, so we have to use output_padding to recover the input size of the corresponding Conv2D.
Could you try it, see if it works properly and let me know, please?
So, in summary, I think output_padding is there to accommodate the different backends.
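The Conv2D and Conv2DTranspose length formulas in this answer can be checked directly. In this pure-Python sketch (function names are ours), padding_size is the explicit number of padded pixels per side. It shows the stride-1, 'valid' round trip recovering the input size, and a stride-2 case where two input sizes collapse to the same Conv2D output, so output_padding is what picks between them.

```python
def conv2d_out(in_size, filter_size, stride, padding_size):
    """out = floor((in + 2*pad - filter) / stride) + 1"""
    return (in_size + 2 * padding_size - filter_size) // stride + 1


def conv2d_transpose_out(in_size, filter_size, stride, padding_size, output_padding=0):
    """out = (in - 1)*stride + filter - 2*pad + output_padding"""
    return (in_size - 1) * stride + filter_size - 2 * padding_size + output_padding


# Stride 1, 'valid' (pad = 0): H -> H - (m - 1) -> H again.
H, m = 64, 5
down = conv2d_out(H, m, 1, 0)               # 60
up = conv2d_transpose_out(down, m, 1, 0)    # 64

# Stride 2, 'valid': sizes 63 and 64 both map to 30 ...
print(conv2d_out(63, 5, 2, 0), conv2d_out(64, 5, 2, 0))   # 30 30
# ... so output_padding decides which one the transpose reproduces.
print(conv2d_transpose_out(30, 5, 2, 0, 0), conv2d_transpose_out(30, 5, 2, 0, 1))  # 63 64
```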
When output_padding=None, Keras uses the deconv_output_length method to compute the output length, which sets it to:
if padding == 'valid':
    length = input_length * stride + max(filter_size - stride, 0)
elif padding == 'same':
    length = input_length * stride
Now in the documentation it says that if output_padding is set, the output length will be
(input_length - 1) * stride + filter_size - 2 * padding + output_padding
So using this we can figure out what the default output_padding is.
In the padding='valid' case, padding = 0 in the above, so solving for output_padding:
output_padding = max(stride - filter_size, 0)
and one can check that setting this results in the same as setting it to None
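That check can be done by brute force in plain Python. The sketch below (function names are ours; kernel stands for the dilated kernel size) compares the output_padding=None length for padding='valid' with the explicit formula using output_padding = max(stride - kernel, 0). This value is always smaller than the stride, which Keras requires of output_padding.

```python
def default_valid(n, stride, kernel):
    """Length used when output_padding=None and padding='valid'."""
    return n * stride + max(kernel - stride, 0)


def explicit(n, stride, kernel, padding, output_padding):
    """Length used when output_padding is given explicitly."""
    return (n - 1) * stride + kernel - 2 * padding + output_padding


for n in range(1, 20):
    for stride in range(1, 6):
        for kernel in range(1, 8):
            op = max(stride - kernel, 0)
            assert op < stride
            assert default_valid(n, stride, kernel) == explicit(n, stride, kernel, 0, op)
print("valid: default and explicit output_padding agree")
```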
padding = 'same'
This case is much more mysterious, and in fact it seems to be impossible to get the same as output_padding=None by setting it to any integer. For example with strides=2 and kernel_size=2, for an output_padding larger than 1, it gives a warning that the stride must be larger than the output padding. For anything smaller than 1 it gives a warning that the size of out_backprop doesn't match computed. So the only value that works is 1, but this results in a different output shape from None.
In fact it is not implemented by setting output_padding to some default value, it is only used to compute the output shape, which then is used in the convolution method.
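The arithmetic behind this can be sketched in plain Python (function names are ours; it only checks the two length formulas, not what the backend does at run time). Matching output_padding=None for padding='same' needs output_padding = stride - (kernel % 2): for an even kernel that equals the stride, which is outside the allowed range 0 <= output_padding < stride, consistent with the strides=2, kernel_size=2 experiment above.

```python
def default_same(n, stride):
    """Length used when output_padding=None and padding='same'."""
    return n * stride


def explicit_same(n, stride, kernel, output_padding):
    """Length used when output_padding is given and padding='same'."""
    pad = kernel // 2
    return (n - 1) * stride + kernel - 2 * pad + output_padding


def matching_output_padding(n, stride, kernel):
    """Allowed output_padding values (0 <= p < stride) reproducing the default."""
    target = default_same(n, stride)
    return [p for p in range(stride) if explicit_same(n, stride, kernel, p) == target]


print(matching_output_padding(5, 2, 2))  # [] (would need output_padding=2)
print(matching_output_padding(5, 2, 3))  # [1]
```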

Getting an error while adding a dense layer in keras

I was trying to implement a simple Keras cat vs. dog classifier, but when adding a Dense layer, it raises a ValueError.
I'm using Theano as the backend.
Here's the code:
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dense(units = 128, activation = 'relu'))
Here's the summary of the model
While executing the last line (adding Dense layer), I'm getting the following error:
ValueError: ('The specified size contains a dimension with value <= 0', (-448, 128))
Here's the content of my keras.json file:
{
    "backend": "theano",
    "image_data_format": "channels_first",
    "floatx": "float32",
    "epsilon": 1e-07
}
I'm not able to find the problem.
Thanks in advance!
You're convolving across the channels dimension; try to explicitly set the data_format parameter in the convolutions and pooling like this:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu', data_format='channels_last'))
classifier.add(Conv2D(32, (3, 3), activation = 'relu', data_format='channels_last'))
classifier.add(MaxPooling2D(pool_size = (2, 2), data_format='channels_last'))
Or reshape your data to have shape (3, 64, 64).
In simple terms, a convolution slides a small filter (kernel) across the pixels of your image in order to extract what are called local patterns. The filter should ideally slide along the width and height of your image, namely the two 64-sized dimensions of your data.
This is especially useful when, as is customary, images are split into channels, usually to represent their RGB components. In this case, the same sliding process is applied in parallel to the three channels, and in general it can be applied to N arbitrary channels.
To cut a long story short, when you call:
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
With the channels_first setting from your keras.json, Keras thinks that you're passing it a 64 x 3 image with 64 channels, and tries to convolve accordingly. This is obviously wrong and results in negative dimensions (note how convolutions shrink the size of the image). By specifying the 'channels_last' format, you're telling Keras how the image is laid out (with the channels dimension in the last place), so that it can convolve properly across the 64 x 64 images.
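A pure-Python trace of the two spatial axes shows where the negative size comes from. The helpers are our own; they assume stride 1 and 'valid' padding for both Conv2D layers and the default 'valid' padding for MaxPooling2D. The sketch only reproduces the sign of the problem, not the exact number in the error message.

```python
def valid_conv(length, kernel=3):
    """Conv2D with padding='valid' and stride 1 along one axis."""
    return length - kernel + 1


def pool(length, size=2):
    """MaxPooling2D with the default 'valid' padding along one axis."""
    return length // size


def trace(rows, cols):
    """Spatial size after Conv2D(3x3) -> Conv2D(3x3) -> MaxPooling2D(2x2)."""
    rows, cols = valid_conv(rows), valid_conv(cols)
    rows, cols = valid_conv(rows), valid_conv(cols)
    return pool(rows), pool(cols)


# channels_last: (64, 64, 3) is a 64x64 image with 3 channels
print(trace(64, 64))   # (30, 30)
# channels_first: (64, 64, 3) is read as 64 channels of a 64x3 image
print(trace(64, 3))    # (30, -1): a non-positive dimension
```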
I ran your code above and the summary I get is different from yours.
You should provide more information, such as the Keras version and backend you use.
I suspect there is something wrong with your keras.json file.
As described on the official Keras page, check your keras.json file (located in your home directory, at .keras/keras.json).
It should look like this with the TensorFlow backend:
{
    "image_data_format": "channels_last",
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
or like this with the Theano backend:
{
    "image_data_format": "channels_last",
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}

TypeError when trying to create a BLSTM network in Keras

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the second model (with the LSTMs) I get the following error:
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'"
The description of the model is this:
Input (length T is the appliance-specific window size)
Parallel 1D convolutions with filter sizes 3, 5, and 7 respectively, stride=1, number of filters=32, activation type=linear, border mode=same
Merge layer which concatenates the outputs of the parallel 1D convolutions
Bidirectional LSTM consisting of a forward LSTM and a backward LSTM, output_dim=128
Bidirectional LSTM consisting of a forward LSTM and a backward LSTM, output_dim=128
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=T, activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
def lstm_net(T):
    input_layer = Input(shape=(T, 1))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
    BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8, 40, 96)))(merge_layer)
    BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
    dense_layer = layers.Dense(128, activation='relu')(BLSTM2)
    output_dense = layers.Dense(1, activation='linear')(dense_layer)
    model = Model(input_layer, output_dense, name="lstm_net")
    return model
model = lstm_net(40)
After that I get the above error. My goal is to give as input a batch of 8 sequences of length 40 and get as output a batch of 8 sequences of length 40 too. I found the Keras GitHub issue "LSTM layer cannot connect to Dense layer after Flatten #818", where @fchollet suggests that I should specify the 'input_shape' in the first layer, which I did, but probably not correctly. I put in two print statements to see how the shape changes, and the output is:
(?, 40, 96)
(?, 256)
The error occurs on the line where BLSTM2 is defined.
Your problem lies in these two lines:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
By default, LSTM returns only the output of the last time step, so your data loses its sequential nature. That's why the following layer raises an error. Change these lines to:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
This makes the input to the second LSTM sequential as well.
Aside from this, I'd rather not use input_shape in a middle layer of the model, as it is inferred automatically.
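The shape change can be traced without Keras. This pure-Python sketch (the helper is our own) assumes Bidirectional's default merge_mode='concat', which doubles the number of units, and that an LSTM needs a (timesteps, features) input per sample. It reproduces the (40, 96) and (256,) shapes printed in the question and shows how return_sequences=True keeps the time axis for the second BLSTM.

```python
def lstm_output_shape(input_shape, units, return_sequences=False, bidirectional=False):
    """Per-sample output shape of an LSTM or Bidirectional(LSTM) layer."""
    if len(input_shape) != 2:
        raise ValueError(f"LSTM expects (timesteps, features), got {input_shape}")
    timesteps, _ = input_shape
    features = units * 2 if bidirectional else units  # 'concat' merge doubles units
    return (timesteps, features) if return_sequences else (features,)


T = 40
merged = (T, 32 * 3)  # three Conv1D branches of 32 filters, concatenated

# Original code: the first BLSTM drops the time axis, so the second one fails.
dropped = lstm_output_shape(merged, 128, bidirectional=True)  # (256,)
try:
    lstm_output_shape(dropped, 128, bidirectional=True)
except ValueError as err:
    print(err)

# Fixed code: return_sequences=True keeps the time axis for the second BLSTM.
blstm1 = lstm_output_shape(merged, 128, return_sequences=True, bidirectional=True)
blstm2 = lstm_output_shape(blstm1, 128, bidirectional=True)
print(blstm1, blstm2)  # (40, 256) (256,)
```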
