LSTM for Video Input - keras

I am a newbie trying out LSTM.
I am basically using LSTM to determine action type (5 different actions) like running, dancing etc. My input is 60 frames per action and roughly let's say about 120 such videos
train_x.shape = (120,192,192,60)
where 120 is the number of sample videos for training, 192X192 is the frame size and 60 is the # frames.
train_y.shape = (120*5) [1 0 0 0 0 ..... 0 0 0 0 1] one hot-coded
I am not clear as to how to pass 3d parameters to lstm (timestamp and features)
model.add(LSTM(100, input_shape=(train_x.shape[1],train_x.shape[2])))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(len(uniquesegments), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, batch_size=batch_size, verbose=1)
i get the following error
Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 192, 192, 60)
training data algorithm
Loop through videos
Loop through each frame of a video
logic
append to array
convert to numpy array
roll axis to convert 60 192 192 to 192 192 60
add to training list
convert training list to numpy array
training list shape <120, 192, 192, 60>

First you should know, method of solving video classification task is better suit for Convolutional RNN than LSTM or any RNN Cell, just as CNN is better suit for image classification task than MLP
Those RNN cell (e.g LSTM, GRU) is expect inputs with shape (samples, timesteps, channels), since you are deal inputs with shape (samples, timesteps, width, height, channels), so you should using tf.keras.layers.ConvLSTM2D instead
Following example code will show you how to build a model that can deal your video classification task:
import tensorflow as tf
from tensorflow.keras import models, layers
timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5
model = models.Sequential(
[
layers.Input(
shape=(timesteps, width, height, channels)
),
layers.ConvLSTM2D(
filters=64, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool3D(
pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
),
layers.BatchNormalization(),
layers.ConvLSTM2D(
filters=32, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool3D(
pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
),
layers.BatchNormalization(),
layers.ConvLSTM2D(
filters=16, kernel_size=(3, 3), padding="same", return_sequences=False, dropout=0.1, recurrent_dropout=0.1
),
layers.MaxPool2D(
pool_size=(2, 2), strides=(2, 2), padding="same"
),
layers.BatchNormalization(),
layers.Flatten(),
layers.Dense(256, activation='relu'),
layers.Dense(action_num, activation='softmax')
]
)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Outputs:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d (ConvLSTM2D) (None, 60, 192, 192, 64) 150016
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 60, 96, 96, 64) 256
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D) (None, 60, 96, 96, 32) 110720
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 60, 48, 48, 32) 128
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, 48, 48, 16) 27712
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 24, 24, 16) 64
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 256) 2359552
_________________________________________________________________
dense_1 (Dense) (None, 5) 1285
=================================================================
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
_________________________________________________________________
Beware you should reorder your data to the shape (samples, timesteps, width, height, channels) before feed in above model (i.e not like np.reshape, but like np.moveaxis), in your case the shape should be (120, 60, 192, 192, 1), then you can split your 120 video to batchs and feed to model

From the docs, it seems like LSTM isn't even intended to take an input_shape argument. And that makes sense because typically you should be feeding it a 1d feature per timestep. That's why in the docs it says:
inputs: A 3D tensor with shape [batch, timesteps, feature]
What you're trying to do won't work (I've also left you a comment explaining why you probably shouldn't be trying to do it that way).

Related

Shapes Incompatible in Keras with CNN

I am implementing a network that takes a 2d image and outputs a 3D binary voxels for it.
I am using an autoencoder with LSTM module.
The current shape of images and voxels are as follows:
print(x_train.shape)
print(y_train.shape)
>>> (792, 127, 127, 3)
>>> (792, 32, 32, 32)
792 RGB images 127 x 127
792 corresponding voxels with 3D Binary Tensor (32 x 32 x 32)
Running the following encoder model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU, MaxPooling2D, Dense, Flatten, Conv3D, MaxPool3D, GRU, Reshape, UpSampling3D
from tensorflow import keras
enc_filter = [96, 128, 256, 256, 256, 256]
fc_filters = [1024]
model = Sequential()
epochs = 5
batch_size = 24
input_shape=(127,127,3)
model.add(Conv2D(enc_filter[0], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(lr=0.01),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs)
yields the following:
ValueError: Shapes (24, 32, 32, 32) and (24, 1024) are incompatible
Can someone address why the shapes are incompatible? I tried removing layers and test others but all yields compatibility issues.
Your model has a dense layer with 1024 output, but you are passing 32,32,32 shaped array.
You need to reshape your model output so that it has proper shape.
This is a dummy model, you need to change the parameters to find the suitable architecture.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU, MaxPooling2D, Dense, Flatten, Conv3D, MaxPool3D, GRU, Reshape, UpSampling3D
from tensorflow import keras
import numpy as np
# dummy data
x_train = np.random.randn(792, 127, 127, 3)
y_train = np.random.randn(792, 32, 32, 32)
enc_filter = [96, 128, 256, 2]
fc_filters = [1024]
model = Sequential()
epochs = 5
batch_size = 24
input_shape=(127,127,3)
model.add(Conv2D(enc_filter[0], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[1], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[2], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[3], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape)) # bottolneck
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Flatten())
model.add(Dense(32*32*32, activation='relu'))
model.add(Reshape((32,32,32)))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(lr=0.01),
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs)
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_24 (Conv2D) (None, 121, 121, 96) 14208
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 60, 60, 96) 0
_________________________________________________________________
leaky_re_lu_24 (LeakyReLU) (None, 60, 60, 96) 0
_________________________________________________________________
conv2d_25 (Conv2D) (None, 54, 54, 128) 602240
_________________________________________________________________
max_pooling2d_25 (MaxPooling (None, 27, 27, 128) 0
_________________________________________________________________
leaky_re_lu_25 (LeakyReLU) (None, 27, 27, 128) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 21, 21, 256) 1605888
_________________________________________________________________
max_pooling2d_26 (MaxPooling (None, 10, 10, 256) 0
_________________________________________________________________
leaky_re_lu_26 (LeakyReLU) (None, 10, 10, 256) 0
_________________________________________________________________
conv2d_27 (Conv2D) (None, 4, 4, 2) 25090
_________________________________________________________________
max_pooling2d_27 (MaxPooling (None, 2, 2, 2) 0
_________________________________________________________________
leaky_re_lu_27 (LeakyReLU) (None, 2, 2, 2) 0
_________________________________________________________________
flatten_10 (Flatten) (None, 8) 0
_________________________________________________________________
dense_1 (Dense) (None, 32768) 294912
_________________________________________________________________
reshape_10 (Reshape) (None, 32, 32, 32) 0
=================================================================
Total params: 2,542,338
Trainable params: 2,542,338
Non-trainable params: 0
In the summary, you can see I add a dense layer with 32x32x32 neurons and then reshape it.

logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]

I am trying to create a CNN with tensorflow, my images are 64x64x1 images and I have an array of 3662 images which I am using for training. I have total 5 labels which I have one-hot encoded. I am getting this error everytime:
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [3662,5] and labels shape [18310]
[[{{node loss_2/dense_5_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
my neural network structure is this:
def cnn_model():
model = models.Sequential()
# model.add(layers.Dense(128, activation='relu', ))
model.add(layers.Conv2D(128, (3, 3), activation='relu',input_shape=(64, 64, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu',padding = 'same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
print(model.summary())
return model
My model summary is this:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_9 (Conv2D) (None, 62, 62, 128) 1280
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 31, 31, 128) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 31, 31, 64) 73792
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 15, 15, 64) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 15, 15, 64) 36928
_________________________________________________________________
dense_4 (Dense) (None, 15, 15, 64) 4160
_________________________________________________________________
flatten_2 (Flatten) (None, 14400) 0
_________________________________________________________________
dense_5 (Dense) (None, 5) 72005
=================================================================
Total params: 188,165
Trainable params: 188,165
Non-trainable params: 0
my output array is of the shape (3662,5,1). I have seen other answers to same questions but I can't figure out the problem with mine. Where am I wrong?
Edit: My labels are stored in one hot encoded form using these:
df = pd.get_dummies(df)
diag = np.array(df)
diag = np.reshape(diag,(3662,5,1))
I have tried as numpy array and after converting to tensor(same for input as per documentation)
The problem lines within the choice of the loss function tf.keras.losses.SparseCategoricalCrossentropy(). According to what you are trying to achieve you should use tf.keras.losses.CategoricalCrossentropy(). Namely, the documentation of tf.keras.losses.SparseCategoricalCrossentropy() states:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers.
On the other hand, the documentation of tf.keras.losses.CategoricalCrossentropy() states:
We expect labels to be provided in a one_hot representation.
And because your labels are encoded as one-hot, you should use tf.keras.losses.CategoricalCrossentropy().

How to have a 3D filter for Conv2D in Keras?

My input images have 8 channels and my output (label) has 1 channel and my CNN in keras is like below:
def set_model(ks1=5, ks2=5, nf1=64, nf2=1):
model = Sequential()
model.add(Conv2D(nf1, padding="same", kernel_size=(ks1, ks1),
activation='relu', input_shape=(62, 62, 8)))
model.add(Conv2D(nf2, padding="same", kernel_size=(ks2, ks2),
activation='relu'))
model.compile(loss=keras.losses.binary_crossentropy,
optimizer=keras.optimizers.Adadelta())
return model
The filter I have here is the same for all 8 channels. What I would like to have is a 3D filter, something like (8, 5, 5) such that every channel has a separate filter because these channels have not the same importance.
Below is the summary of the model implemented above:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 62, 62, 64) 12864
_________________________________________________________________
conv2d_2 (Conv2D) (None, 62, 62, 1) 1601
=================================================================
Total params: 14,465
Trainable params: 14,465
Non-trainable params: 0
_________________________________________________________________
And when I get the shape of weights for the first layer I have the following results:
for layer in model.layers:
weights = layer.get_weights()
len(weights)
2
a = np.array(weights[0])
a.shape
(5, 5, 64, 1)
And I am wondering where is 8 in the shape of weights of the first layer?

Keras to Pytorch model translation and input size

I am following a Keras tutorial and want to shadow it in Pytorch, so am translating. I'm not strongly familiar with either and am coming unstuck on the input size parameter especially, but also the final layer - do I need another Linear layer? Can anyone translate the following to a Pytorch sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
_________________________________________________________________
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
What I have worked out lacks a specification for the shape of the input, and I am also a bit perplexed at the translation of stride in the specified Keras model as it uses stride 2 in the MaxPooling2D but doesn't specify this elsewhere - it is perhaps a toy example.
model = nn.Sequential(
nn.Conv2d(1, 32, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Conv2d(1, 16, 4),
nn.ReLU(),
nn.MaxPool2d(2, 2),
nn.Linear(10, 1),
nn.Sigmoid(),
)

How to fix expected dense_11 to have 4 dimensions error

I got some error when I was building a convolutional neural network with Keras:
Error when checking target: expected dense_11 to have 4 dimensions,
but got array with shape (48986, 12)
Since I lack knowledge, I have no idea what to fix. Can someone explain the reason and also suggest the solution?
input_shape = (99, 81, 1)
nclass = 12
model = Sequential()
model.add(Dense(32, input_shape=input_shape))
model.add(Convolution2D(8,3,3,activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(nclass, activation='softmax'))
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.1, random_state=2017)
#vgg
batch_size = 128
nb_epoch = 1
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
#model.fit(x_train,y_train,nb_epoch= nb_epoch,batch_size = batch_size , validation_split=0.1)
model.fit(x_train, y_train, batch_size=16, validation_data=(x_valid, y_valid), epochs=3, shuffle=True, verbose=2)
model.save(os.path.join(model_path, 'vgg16.model'))
x_train has a shape of (99, 81, 1) and the nclass output should be 12.
Look at the error again:
"Error when checking target: expected dense_11 to have 4 dimensions, but got array with shape (48986, 12)" - target=labels/output
Meaning, there is some kind of problem with your output shape.
Lets print the model summary to check what is the expected output shape:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 99, 81, 32) 64
_________________________________________________________________
conv2d_1 (Conv2D) (None, 97, 79, 8) 2312
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 48, 39, 8) 0
_________________________________________________________________
dense_2 (Dense) (None, 48, 39, 128) 1152
_________________________________________________________________
dense_3 (Dense) (None, 48, 39, 128) 16512
_________________________________________________________________
dense_4 (Dense) (None, 48, 39, 12) 1548
=================================================================
Total params: 21,588
Trainable params: 21,588
Non-trainable params: 0
_________________________________________________________________
The final layer outputs predictions with shape: (None, 48,39,12).
You can see that this is happening because the Dense layer get input with shape (None, 48,39,8) and according to Keras implementation, Dense layer is places on top of the last dimension -> meaning: Dense layer with 128 nodes that gets input with shape (None,48,39,8) will outputs (None,48,39,128).
The solution depends on what you want to do and what is the shape of your labels (what the output should be).
For example, if the output shape of your model should be (nclass,1) than maybe you can Flatten the data after the MaxPool layer.
If it should be something else that change your labels shape to be (None, 48, 39, 12).
Good luck :)

Resources