MaxPooling layer causes an error in Keras

I have created a CNN in Keras with 12 convolutional layers, each followed by BatchNormalization, Activation and MaxPooling. A sample block looks like this:
model.add(Conv2D(256, (3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2))
I start with 32 feature maps and end with 512. If I add MaxPooling after every Conv layer, as in the code above, I get an error in the final layer:
ValueError: Negative dimension size caused by subtracting 2 from 1 for 'max_pooling2d_11/MaxPool' (op: 'MaxPool') with input shapes: [?,1,1,512].
If I omit one MaxPooling in any layer, the model compiles and starts training. I am using TensorFlow as the backend, and I have set the correct input shape of the image in the first layer.
Are there any suggestions as to why this may be happening?

If your spatial dimensions are 256x256, then you cannot have more than 8 max-pooling layers in your network. Since 2 ** 8 == 256, after downsampling by a factor of two eight times, your feature maps will be 1x1 in the spatial dimensions, so you cannot pool any further: the output would have zero or negative dimensions.
It's an obvious limitation of max pooling, but one that is not always discussed in papers.
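To make the arithmetic concrete, here is a minimal sketch (assuming a 256x256 input, which the question does not state) that tracks the spatial size through repeated pool_size=2 pooling:

size = 256  # assumed input resolution, for illustration only
for i in range(12):
    if size < 2:
        print(f"pooling layer {i + 1}: feature maps are {size}x{size}, cannot pool further")
        break
    size //= 2  # MaxPooling2D(pool_size=2) halves each spatial dimension
    print(f"after pooling layer {i + 1}: {size}x{size}")

After the eighth pooling layer the maps are 1x1, and the ninth fails, which matches the [?,1,1,512] shape in the error.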

This can also be caused by having your input image in the wrong format: if you pass images as (3, X, Y) while the model expects (X, Y, 3), the downsampling is applied along the colour channels instead of the spatial dimensions and causes the same issue.
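A sketch of two possible fixes, assuming channels-first arrays of shape (3, 256, 256) (the shapes are illustrative, not from the question): either transpose the data to channels-last, or declare the layout explicitly on the layer.

import numpy as np
from keras.layers import Conv2D

x = np.zeros((10, 3, 256, 256))                  # channels-first batch (illustrative)
x_channels_last = np.transpose(x, (0, 2, 3, 1))  # -> (10, 256, 256, 3)

# Alternatively, tell Keras which layout to expect, so pooling
# acts on the spatial dimensions rather than the colour channels:
conv = Conv2D(32, (3, 3), padding='same',
              data_format='channels_first',
              input_shape=(3, 256, 256))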

Related

How to use Conv1D as an input layer

I am trying to use Conv1D as the first layer. I read the documentation at https://keras.io/api/layers/convolution_layers/convolution1d/ but failed to understand it.
If I use a Dense layer first, like
model.add(keras.layers.Dense(12, input_dim=232, activation='relu'))
it works fine, but if I try to use Conv1D as the first layer, like
model.add(keras.layers.Conv1D(32, 5, activation = 'relu'))
I get the following error:
Input 0 of layer "conv1d" is incompatible with the layer: expected min_ndim=3, found ndim=2. Full shape received: (1, 232)
My input size is 3000 x 232, and I am trying to learn whether a particular vector is present or not. The output is 1 if the vector is present in the input and 0 if it is absent, so I am learning a simple two-class classifier.
Can anyone help? Is it that there must be at least 2 spatial dimensions to use Conv1D?
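For reference, the error says Conv1D expects a 3D input of (batch, steps, channels), so the 232-feature vectors need an explicit channels axis. A minimal sketch of one way to do this (the layer sizes after the Conv1D are illustrative choices, not from the question):

import numpy as np
from keras import models, layers

x = np.zeros((3000, 232))   # illustrative data matching the stated shape
x = x[..., np.newaxis]      # -> (3000, 232, 1): add a channels axis

model = models.Sequential()
model.add(layers.Conv1D(32, 5, activation='relu', input_shape=(232, 1)))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1, activation='sigmoid'))  # binary present/absent output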

Visualizing convolutional layers in an autoencoder

I have built a variational autoencoder in Keras using 2D convolutions (Conv2D) in the encoder and decoder. In total I have 2 layers, with 32 and 64 filters, a kernel size of 4x4 and a stride of 2x2 each. My input images are (64, 80, 1). I'm using the MSE loss. Now, I would like to visualize the individual convolutional layers (i.e. what they learn), as done here.
So, first I load my model using the load_weights() function, and then I call visualize_layer(encoder, 'conv2d_1') from the above-mentioned code, where conv2d_1 is the layer name of the first convolutional layer in my encoder.
When I do so, I get the error message
tensorflow.python.framework.errors_impl.UnimplementedError: Fused conv implementation does not support grouped convolutions for now.
[[{{node conv2d_1/BiasAdd}}]]
When I use the VGG16 model as in the example code it works. Does somebody know how I can adapt the code to work for my case?
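One possible cause (an assumption, not confirmed in the question): TensorFlow raises this grouped-convolutions error when the channel count of the tensor fed to a Conv2D layer does not match the channel count its kernels were built for. The VGG16 example optimises a random 3-channel seed image, whereas this encoder expects single-channel (64, 80, 1) inputs, so the seed image would need to match:

import numpy as np

# Hypothetical adaptation: build the gradient-ascent seed image with the
# encoder's own input shape (64, 80, 1) instead of VGG16's 3-channel shape.
input_img_data = np.random.random((1, 64, 80, 1)) * 20 + 128.0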

In 'ResNet50 model for Keras', why use 1x1 convolution with stride = 2?

ResNet50 is here: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
In a 'conv_block', the first layer is like this:
x = Conv2D(filters1,           # number of filters (64 here)
           kernel_size=(1, 1), # height/width of filters
           strides=(2, 2)      # stride
           )(input_tensor)
My question is:
Isn't this layer going to miss some pixels?
A 1x1 convolution only looks at a single pixel, and then moves 2 pixels (stride = 2), so half of the positions are never seen.
It was mentioned in the original ResNet paper:
The convolutional layers mostly have 3×3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2.
So you can consider it a replacement for a pooling layer, and it also reduces the computational cost of the whole model compared to computing the whole activation map and then pooling it.
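A quick sketch to confirm the shape arithmetic (the 56x56x64 feature map is an illustrative choice, not from the question): a 1x1 convolution with stride 2 halves the spatial dimensions just as a pooling layer would, while skipping half of the positions entirely.

from keras.layers import Input, Conv2D, MaxPooling2D

inp = Input(shape=(56, 56, 64))                   # illustrative feature map
strided = Conv2D(64, (1, 1), strides=(2, 2))(inp)
pooled = MaxPooling2D(pool_size=2)(inp)
print(strided.shape)  # (None, 28, 28, 64): same downsampling as pooling
print(pooled.shape)   # (None, 28, 28, 64)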

Why return sequences in stacked RNNs?

When stacking RNN layers in Keras, it is mandatory to set return_sequences=True on every layer except the last.
For instance in Keras,
lstm1 = LSTM(1, return_sequences=True)(inputs1)
lstm2 = LSTM(1)(lstm1)
It is somewhat intuitive that each stacked RNN layer should preserve the dimensionality of its input, but I am not thoroughly convinced.
Can someone (mathematically) explain the reason?
Thanks.
The input shape for recurrent layers is:
(number_of_sequences, time_steps, input_features).
This is absolutely required for recurrent layers, because there can only be recurrence if there are time steps.
Now, compare the "outputs" of the recurrent layers in each case:
with return_sequences=True - (number_of_sequences, time_steps, output_features)
with return_sequences=False - (number_of_sequences, output_features)
Without return_sequences=True, you eliminate the time steps, so the output cannot be fed into another recurrent layer: there aren't enough dimensions, and the most important one, time_steps, is not present.
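A minimal sketch that prints the two output shapes (the sizes are illustrative):

from keras.layers import Input, LSTM

inputs1 = Input(shape=(10, 8))              # 10 time steps, 8 features (illustrative)
lstm1 = LSTM(4, return_sequences=True)(inputs1)
lstm2 = LSTM(4)(lstm1)
print(lstm1.shape)  # (None, 10, 4): time steps preserved, can feed another RNN
print(lstm2.shape)  # (None, 4): time dimension gone, suitable as a final output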

How to change batch size of an intermediate layer in Keras?

My problem is to take all hidden outputs from an LSTM and use them as training examples for a single dense layer. Flattening the output of the hidden layers and feeding them to a dense layer is not what I am looking to do. I have tried the following things:
I have considered the TimeDistributed wrapper for the dense layer (https://keras.io/layers/wrappers/). But this applies the same layer to every time slice, which is not what I want. In other words, the TimeDistributed wrapper takes a 3D tensor (number of samples, number of timesteps, number of features) as input and produces another 3D tensor of the same kind: (number of samples, number of timesteps, number of features). Instead, what I want as output is a 2D tensor that looks like (number of samples * number of timesteps, number of features).
There was a pull request for an AdvancedReshapeLayer on GitHub: https://github.com/fchollet/keras/pull/36. This seems to be exactly what I am looking for. Unfortunately, it appears that the pull request was closed with no conclusive outcome.
I tried to build my own lambda layer to accomplish what I want as follows:
A) model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh'))
B) model.add(Lambda(lambda x: x, output_shape=lambda x: (x[0]*x[1], x[2])))
C) model.add(Dense(NUM_CLASSES, input_dim=NUM_LSTM_UNITS))
model.output_shape after (A) prints: (BATCH_SIZE, NUM_TIME_STEPS, NUM_LSTM_UNITS) and model.output_shape after (B) prints: (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS)
Which is exactly what I am trying to achieve.
Unfortunately, when I try to run step (C), I get the following error:
Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
This is baffling since when I print model.output_shape after (B), I do indeed see (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS), which is of ndim=2.
Really appreciate any help with this.
EDIT: When I try to use the functional API instead of a sequential model, I still get the same error on step (C).
You can use the backend reshape, which includes the batch_size dimension:
from keras import backend

def backend_reshape(x):
    # Collapse (batch_size, time_steps, units) into (batch_size * time_steps, units);
    # -1 lets the batch dimension absorb the time steps at run time.
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
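Unlike the identity Lambda in step (B), which only changed the declared output_shape while leaving the tensor untouched, backend.reshape actually reshapes the underlying tensor, so the following Dense layer really does receive ndim=2 input.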
