Data preparation for variable length video classification - keras

I am doing video classification for action detection using Keras (v.2.3.1). My model is CNN and LSTM. My dataset consists of videos of 3-7 seconds, each representing specific action. I use OpenCv to get frames from each video. But since video lengths are different, I get different number of frames for each video. However, I need to have the same number of frames for the LSTM layer. I searched a little bit and looks like padding along with a masking layer should do it, but I can’t figure out how to do this with Keras. Any help is appreciated. Here is my model:
conv_base = VGG16(weights= 'imagenet', include_top=False)
model = models.Sequential()
model.add(TimeDistributed(conv_base, input_shape = (n_timesteps, img_length, img_height, channel)))
model.add(TimeDistributed((Flatten())))
model.add(LSTM(units = lstm_cells))
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Related

GradCam applied to video sequence classification with TimeDistributed CNN and LSTM

after days working on it, I have found any reasonable way of doing, so here I am.
I have a network that aims to predict the next video class, given the features of the current one. Each video is composed by 30 frames. The idea is to apply a feature extraction method to each input, then feed into an LSTM + Dense layer to make prediction.
Here the code:
video = Input(shape=(30,299,299,3))
inc = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
cnn_out = GlobalAveragePooling2D()(inc.output)
cnn = Model(inputs=inc.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(128, activation='relu', return_sequences=False, kernel_initializer=he_uniform(), bias_initializer='zeros', dropout=0.5)(encoded_frames)
hidden_layer = Dense(1024, activation='relu', kernel_initializer=he_uniform(), bias_initializer='zeros')(encoded_sequence)
outputs = Dense(4, activation="softmax", kernel_initializer=glorot_normal(), bias_initializer='zeros')(hidden_layer)
model = Model(inputs=[video], outputs=outputs)
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
I would like to visualize the feature activations at the CNN stage for each image. So if I look at the saliency map for each input image I can understand which features are more importante than others to make this kind of prediction.
All the examples on internet are facing with just one CNN and one input image, is there any way of doing this?
Any help is really appreciated, thanks!

Why does CNN only predict one class

I have a model that needs to detect if a plant is dead or alive. It is only predicting one class, that data is imbalanced, but i have used weights to counter the imbalance.
I have looked at loads of questions about this problem, but none seem to work, apparently this problem occurs when overfitting, so I have used dropout. But the model still only predicts one class.
Heres the model:
model=Sequential()
# Convolutional layer / input layer
model.add(Conv2D(60, 5,5, activation='relu', input_shape=np.shape(X[1])))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Dropout(0.8))
model.add(Flatten())
model.add(Dropout(0.7))
model.add(Dense(130, activation='relu'))
model.add(Dropout(0.6))
# Output layer
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X, y, epochs=6, batch_size=32, class_weight=class_weight, validation_data=(X_test, y_test))
Usually it should predict both classes with 1: a healthy plant and 0:
an unhealthy plant
Since your problem is a binary classification and your output dimensionality is 2, you should change your activation to softmax.
model.add(Dense(2, activation='softmax'))
However, if you want to keep sigmoid just change your output layer units to 1, this way you will output how likely your input is one of the two classes with only one unit.
model.add(Dense(1, activation='sigmoid'))

Why is the validation accuracy constant at 20%?

I am trying to implement a 5 class animal classifier using Keras. I am building the CNN from scratch and the weird thing is, the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders. And each of the folders contains 5 folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizer but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contigious array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the ouput layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
classes=['frog', 'giraffe', 'horse', 'tiger','dog'],
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
classes=['frog', 'giraffe', 'horse', 'tiger','dog'],
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle=False)
hist=History()
model.fit_generator(train_generator,
steps_per_epoch=nb_train_samples // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps=nb_validation_samples // batch_size,
callbacks=[hist])
model.save('models/basic_cnn_from_scratch_model.h5') #Save the model weights #Load using: model = load_model('cnn_from_scratch_weights.h5') from keras.models import load_model
print("Time taken to train the baseline model from scratch: ",datetime.now()-global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere)
Properly normalize all data (I see you are doing rescale=1./255, maybe okay)
Proper train/val split (you seem to be doing that too)
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense. That's what works best for image classification problems. You can also look at popular architectures that are tried and tested; e.g. AlexNet
Can change the optimizer to Adam and try with different learning rates
Have a look at your training and validation loss graphs and see if they look as expected
Also, I guess you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
It looks as if your output is always the same animal, thus you have a 20% accuracy. I highly recommend you to check your testing outputs to see if they are all the same.
Also you said that you were building a CNN but in the code snipet you posted I see only dense layers, it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
The models seems to be working now. I have removed shuffle=False attribute. Corrected the input shape for the 2nd convolution layer. Changed the optimizer to adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting in the model. I will have to use some aggressive dropouts to reduce them. Thanks!

Input of RNN model layer how it works?

I don't understand input of RNN model. Why it show None before node size every layer. Why it is (None,1) (None,12)
This is my code.
K.clear_session()
model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
This is not a RNN, it's just a fully connected network (FC or Dense).
The first dimension of every tensor in a Keras network is the batch_size, which represents the number of "samples" or "examples" you are passing to the model. The value is None because this dimension is not fixed, you can have batches of any size you want.

deep learning data preparation

I have a text dataset, that contains 6 classes. for each sample, I have the percent value and sum of the 6 percent values is 100% (features are related to each other). For example :
{A:16, B:35, C:7, D:0, E:3, F:40}
how can I feed a deep learning algorithm with this dataset?
I actually want the prediction to be exactly in the shape of training data.
Here is what you can do:
First of all, normalize all of your labels and scale them between 0-1.
Use a softmax layer for prediction.
Here is some code in Keras for intuition:
model = Sequential()
model.add(Dense(100, input_dim = x.shape[1], activation='relu'))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Resources