How to add a ConvLSTM2D layer after a Conv2D layer? - keras

I'm making an autoencoder for depth estimation from monocular images. The first layer is a convolutional layer and the second layer is a convolutional LSTM layer. How do I add the ConvLSTM2D layer after the Conv2D layer?
This is the code I've tried, but it gives an error:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, ConvLSTM2D, LeakyReLU

autoencoder = Sequential()
autoencoder.add(Conv2D(64, (3, 3), strides=2, input_shape=(640, 480, 3), activation='linear'))
autoencoder.add(LeakyReLU(alpha=0.1))
autoencoder.add(ConvLSTM2D(256, (3, 3), strides=2, input_shape=(None, 32), return_sequences=True))
I get the following error:
ValueError: Input 0 is incompatible with layer conv_gr_u2d_1: expected ndim=5, found ndim=4

You may have misunderstood what ConvLSTM2D is for. It is designed for the scenario where you have a series of data points, each of which is a picture. So a movie would be a typical use case.
So, whatever you feed into it must have the shape (batch_size, timesteps, rows, cols, channels). On the other hand, Conv2D has an output shape of (batch_size, rows, cols, features). This is what the error is telling you.
Technically, you could just add a Reshape layer between those and generate whatever shape you want, but I don't see how this would make any sense in your scenario.
Having it vice versa (ConvLSTM2D first, then Conv2D) would make much more sense. But then you need "movie-like" input data. If I understand you correctly, you don't have that.
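For completeness, if you did have such "movie-like" data, a minimal sketch might look like the following (the 10-frame sequence length is purely illustrative):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, ConvLSTM2D, LeakyReLU, TimeDistributed

model = Sequential()
# TimeDistributed applies the same Conv2D to every frame and keeps the time axis,
# so the output stays 5D: (batch, timesteps, rows, cols, filters).
model.add(TimeDistributed(Conv2D(64, (3, 3), strides=2, activation='linear'),
                          input_shape=(10, 640, 480, 3)))
model.add(LeakyReLU(alpha=0.1))
# ConvLSTM2D now receives the 5D tensor it expects.
model.add(ConvLSTM2D(256, (3, 3), strides=2, return_sequences=True))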

The full input tensor to Conv2D has the shape (batch_size, rows, cols, channels),
e.g.:
(None, 640, 480, 3)
where None stands for the batch size. The input_shape argument itself excludes the batch dimension, so here it would be input_shape = (640, 480, 3). And you don't have to add an input_shape argument to the ConvGRU2D/ConvLSTM2D layer; Keras infers it from the previous layer's output.

Related

ValueError Input 0 of layer sequential_13 is incompatible with the layer: expected ndim=3, found ndim=4 Full shape received: (None, None, None, None)

I am trying to use a SimpleRNN to predict Parkinson's gait using the PhysioNet database. I am feeding the RNN with images of height 240 and width 16 pixels. I am also using ModelCheckpoint and monitoring validation accuracy to save the best weights. While setting the input shape of the RNN I get the following error:
ValueError: Input 0 of layer sequential_13 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, None, None, None)
RNN model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Activation, Dropout, Flatten, Dense
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import optimizers

model = Sequential()
model.add(SimpleRNN(24, kernel_initializer='glorot_uniform', input_shape=(64, 240), return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('softmax'))

opt = optimizers.RMSprop(learning_rate=0.001, decay=1e-6)
epoch = 10
early_stopping = EarlyStopping(monitor='val_accuracy', patience=60, verbose=1, mode='auto')
checkpoint = ModelCheckpoint("model_parkinsons.h5",
                             monitor='val_accuracy', verbose=0, save_best_only=True,
                             save_weights_only=False, mode='auto', save_freq='epoch')
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
Batch size: 64
Height of the image: 240
a.shape
Output: (64, 16, 240, 1)
I tried to feed the input shape as a.shape[1:], but I get an error saying 3 dimensions were expected but 4 were found.
Please help me resolve this.
In your first layer, you specified the input shape of your network. This shape does not include your batch size. So, if you specify input_shape=(64, 240), your final input would need to have the shape (batch_size, 64, 240). Since 64 is your batch size, something definitely went wrong there. Additionally, your input has four dimensions, (64, 16, 240, 1), but your first layer takes three-dimensional inputs.
I do not quite understand what you want to achieve with your model, but it should work if you feed a[:, :, :, 0] into your model instead of a, and additionally set input_shape=(16, 240) in your first layer. If you do these two things, your model uses the RNN to process one column of the image at a time. This approach does not make much sense to me (RNNs are not normally used for image processing, at least not in this form), but I do not see any other way to interpret what you already did.
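Put concretely, a minimal sketch of those two changes (the zeros array is just a stand-in for a real batch a of shape (64, 16, 240, 1)):
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Activation, Dropout, Flatten, Dense

a = np.zeros((64, 16, 240, 1))          # stand-in for one batch of images
x = a[:, :, :, 0]                       # drop the channel axis -> (64, 16, 240)

model = Sequential()
model.add(SimpleRNN(24, kernel_initializer='glorot_uniform',
                    input_shape=(16, 240),   # per-sample shape, batch size excluded
                    return_sequences=True))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('softmax'))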

Unable to add CNN Layer after Embedding Layer Keras

I have 7 categorical features, and I am trying to add a CNN layer after the Embedding layer.
My first layer is an Input layer.
The second layer is an Embedding layer.
As the third layer I want to add a Conv2D layer.
I've tried input_shape=(7, 36, 1) in Conv2D, but that didn't work.
from tensorflow.keras.layers import Input, Embedding, Conv2D, Flatten

input2 = Input(shape=(7,))
embedding2 = Embedding(76474, 36)(input2)
# 76474 is the number of datapoints (rows)
# 36 is the output dim of the Embedding layer
cnn1 = Conv2D(64, (3, 3), activation='relu')(embedding2)
flat2 = Flatten()(cnn1)
But I'm getting this error:
Input 0 of layer conv2d is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, 7, 36]
The output of an embedding layer is 3D, namely (samples, seq_length, features), where features = 36 is the dimensionality of the embedding space, and seq_length = 7 is the sequence length. A Conv2D layer requires an image, which is usually represented as a 4D tensor (samples, width, height, channels).
Only a Conv1D layer would make sense, as it also takes 3D-shaped data, typically (samples, width, channels). You then need to decide whether you want to do the convolution across the sequence length or across the features dimension. That's something you need to experiment with; in the end it comes down to deciding which axis of the embedding output is the "spatial" dimension.
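For illustration, a minimal sketch with Conv1D convolving along the 7 embedded positions (the final sigmoid output head is an assumption, not something taken from the question):
from tensorflow.keras.layers import Input, Embedding, Conv1D, Flatten, Dense
from tensorflow.keras.models import Model

input2 = Input(shape=(7,))
embedding2 = Embedding(input_dim=76474, output_dim=36)(input2)   # output shape: (None, 7, 36)
cnn1 = Conv1D(64, kernel_size=3, activation='relu')(embedding2)  # convolve along the sequence axis
flat2 = Flatten()(cnn1)
out = Dense(1, activation='sigmoid')(flat2)                      # assumed binary output head
model = Model(inputs=input2, outputs=out)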

Struggle with LSTM and RNN using Keras

I'm working on a speech recognition problem running on Colab using LSTM. The audio files were converted into spectrograms and then normalized. There are 6840 spectrograms in total and the shape of each one is (288, 864, 4).
I already tried a few examples with RNNs and CNNs and they worked, but when I try an example using an LSTM I get shape errors; every time there is either one dimension more or one less than expected. Here are some of these cases:
from tensorflow import keras

rnn = keras.Sequential()
rnn.add(keras.layers.SimpleRNN(500, input_shape = (864, 4)))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.Dense(212, activation = 'softmax'))
rnn.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',metrics = ['accuracy'])
rnn.fit(X_train, y_train, epochs = 5, validation_data=(X_test, y_test))
scores = rnn.evaluate(X_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', '%.2f' % (scores[1] * 100), '%')
The following error is raised on the first LSTM layer: ValueError: Input 0 of layer lstm_54 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 500]
If I remove the SimpleRNN line and feed the input directly to the first LSTM like this:
rnn.add(keras.layers.LSTM(500, return_sequences = True, input_shape = (288, 864, 4)))
I get : ValueError: Input 0 of layer lstm_56 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 288, 864, 4]
I tried reshaping the images to (4, 288 * 864) and got the same error when trying to use the RNN layer, but with just the LSTM I got InvalidArgumentError: Incompatible shapes: [32] vs. [32,4].
No idea where the 32 came from, though.
One last thing, not really an issue but more of a request: is there any library that can resize images in a simple way? 288x864 is too big for Colab, so I'll have to resize eventually to be able to load all 6840 images and feed them to the neural network. Right now I'm just using 100 samples to test.
Feel free to leave suggestions about other methods, a cabalistic number of nodes/layers or anything like that.
LSTM input has 3 dimensions, [n_samples, n_timesteps, n_features], so your first layer also needs to enable return_sequences:
rnn.add(keras.layers.SimpleRNN(500, return_sequences = True, input_shape = (864, 4)))
Next, your Dense layer will complain about the wrong input size, so you want to remove return_sequences on the last LSTM layer:
rnn.add(keras.layers.LSTM(500))
If you still want to keep return_sequences = True on the last LSTM layer, you might want to wrap the Dense layer in a TimeDistributed layer.
I tried it on the following input and it seems to work:
X_train = np.random.rand(100, 864, 4)
y_train = np.random.rand(100, 1)
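Putting the suggested changes together, a minimal end-to-end sketch (the integer labels for the 212-way softmax are an assumption; your real spectrograms and labels would go here instead):
import numpy as np
from tensorflow import keras

X_train = np.random.rand(100, 864, 4)
y_train = np.random.randint(0, 212, size=(100,))   # assumed integer class labels

rnn = keras.Sequential()
rnn.add(keras.layers.SimpleRNN(500, return_sequences=True, input_shape=(864, 4)))
rnn.add(keras.layers.LSTM(500, return_sequences=True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500))                    # last LSTM returns only the final timestep
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.Dense(212, activation='softmax'))
rnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
rnn.fit(X_train, y_train, epochs=1)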
PIL, from the Pillow package, has plenty of image-manipulation methods, including resizing.
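For example, a tiny sketch (the file name and the target size are made up):
from PIL import Image

img = Image.open('spectrogram.png')     # hypothetical file name
small = img.resize((216, 72))           # resize takes (width, height)
small.save('spectrogram_small.png')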

Enquiry about the input & output shape of LSTMs in Keras

I am trying to predict a time series with an LSTM, writing my code in Python using Keras.
I have 30 features as input (continuous values) and a binary output.
I would like to use the 20 previous timesteps (t-20, t-19, ..., t-1) of each input feature in order to predict the output at the next timestep (t+1).
My batch size is fixed at 52. What exactly does this mean?
I don't understand how to define the shape of the input layer.
The stacked LSTM example in the Keras documentation says that the last dimension of the 3D tensor will be 'data_dim'.
Is that the input dimension or the output dimension?
If it is the output dimension, then I can't use more than one input feature, since in my case the input_shape would be (batch_size=52, time_step=20, data_dim=1).
Also, assuming data_dim is the input dimension, I have tried to define a four-layer LSTM model and the model summary comes out like this:
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (52, 20, 30)              0
_________________________________________________________________
lstm_3 (LSTM)                (52, 20, 128)             81408
_________________________________________________________________
lstm_4 (LSTM)                (52, 128)                 131584
_________________________________________________________________
dense_2 (Dense)              (52, 1)                   129
=================================================================
Total params: 213,121
Trainable params: 213,121
Non-trainable params: 0
Does this architecture make sense? Am I making some obvious mistakes?
My snippet of code is the one below:
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# batch_size, input_timesteps, input_dims and num_neurons are defined elsewhere
input_layer = Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1 = LSTM(num_neurons, activation='relu', dropout=0.0, stateful=False, return_sequences=True)(input_layer)
lstm2 = LSTM(num_neurons, activation='relu', dropout=0.0, stateful=False, return_sequences=False)(lstm1)
output_layer = Dense(1, activation='sigmoid')(lstm2)
model = Model(inputs=input_layer, outputs=output_layer)
I am getting very poor results and thus trying to debug each step.
If you want to use deep learning techniques, you should try to overfit first and then reduce the complexity until you reach a break-even point between model complexity, training error and test error.
You are actually using a larger feature space in the hidden layer; are you sure your data can support this?
Do you have enough rows to let the model learn such a complex representation?
Otherwise I would suggest something like this, in order to extract the most important dimensions:
num_neurons1 = int(input_dims / 2)
num_neurons2 = int(input_dims / 4)

input_layer = Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1 = LSTM(num_neurons1, activation='relu', dropout=0.0, stateful=False, return_sequences=True, kernel_initializer='he_normal')(input_layer)
lstm2 = LSTM(num_neurons2, activation='relu', dropout=0.0, stateful=False, return_sequences=False, kernel_initializer='he_normal')(lstm1)
output_layer = Dense(1, activation='sigmoid')(lstm2)
model = Model(inputs=input_layer, outputs=output_layer)
Also, you are using relu as the activation function.
Does it fit your data? It would be better if you only have positive data after rescaling and normalization.
If it does fit, you can also use a suitable kernel initialization (e.g. he_normal, as above).
To better understand the problem, you could also post the optimizer parameters and the training behaviour across epochs.
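For reference, one way to dump those (a small sketch, assuming the model above has already been compiled and X_train / y_train exist):
print(model.optimizer.get_config())     # optimizer parameters (learning rate, etc.)
history = model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=52)
print(history.history['loss'])          # training loss per epoch
print(history.history['val_loss'])      # validation loss per epoch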

Add a Dense after conv layer

import keras.layers as KL
input_image = KL.Input([None, None, 3], name = 'input_image')
x = KL.Conv2D(64, (3,3), padding='same')(input_image)
After the Conv layer, I want to add a Dense layer as below:
KL.Dense(2)(KL.Flatten()(x))
But I get this error:
ValueError: The shape of the input to "Flatten" is not fully defined (got (None, None, 64)). Make sure to pass a complete "input_shape" or "batch_input_shape" argument to the first layer in your model.
So if I want a model containing a conv layer followed by a dense layer that can accept any size of input, how should I do it?
Neural networks don't work with variable-sized inputs, unless you are dealing with recurrent neural networks.
With a variable-sized input, what would the weights of the network look like?
Typically, you will pick a size for your input layer and resize or pad your input to match that size.
Although it's not the same as flattening your input, you could use global max pooling:
x = KL.GlobalMaxPooling2D()(x)
This will change your dimensions from (None, None, None, 64) to (None, 64) (including the batch dimension). Global max pooling is a common way to cap off convolutional networks and feed the output into a dense network.
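A small end-to-end sketch of that idea (the two-unit Dense head is just an example):
import keras.layers as KL
from keras.models import Model

input_image = KL.Input([None, None, 3], name='input_image')
x = KL.Conv2D(64, (3, 3), padding='same')(input_image)
x = KL.GlobalMaxPooling2D()(x)          # (None, None, None, 64) -> (None, 64)
output = KL.Dense(2)(x)                 # Dense now works for any input image size
model = Model(inputs=input_image, outputs=output)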
To build a CNN model you should use a pooling layer and then a flatten one, as you can see in the example below.
The pooling layer reduces the amount of data to be processed in the convolutional network, and then we use Flatten to turn the data into a "normal" input for a Dense layer. Moreover, after a convolutional layer we usually add a pooling one.
The example below is for a 1D CNN but has the same structure as the 2D ones. Again, Flatten() changes the shape of the output so it can be used properly in the final Dense layer.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dropout, Dense

# num_filters_to_use, filters_size_tuple, features_array_shape and num_classes
# are placeholders for your own values.
model = Sequential()
model.add(Conv1D(num_filters_to_use, filters_size_tuple, input_shape=features_array_shape, activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
