CNN alternates between good performance and chance - keras

I have a binary classification problem I am trying to solve with a CNN written in Keras. The inputs are very sparse 200x125x2 tensors (they can be thought of as two images stacked together), and their nonzero elements are only ones (representing neuron spike trains). The input is generated using a data generator I have built, so the model is trained using the fit_generator function.
I have tried various architectures, and some show decent performance (~88%), but sometimes when I train new models they don't seem to work at all, giving a chance-level (50%) result every epoch. The weird thing is that this sometimes happens to the same architectures that worked well before. I am running the code on Google Colab (GPU) with TensorFlow 2.0. I have checked multiple times that I haven't changed anything in the code. I know that random initialization of the weights and biases may cause slight changes in performance, but this looks like something else.
Any ideas will be very helpful. Thanks!
Here is the relevant code for one of the models that had this problem (I am using unusual kernels, I know):
# Imports (assumed: tf.keras on TensorFlow 2.0, as stated above)
import time
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SeparableConv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

# General settings
x_max = 10
x_size, t_size, N_features = parameters(x_max)
batch_size = 64
N_epochs = 10
N_final = 10*N_features
N_final = int(N_final - N_final%(batch_size))
N_val = 100*batch_size
N_test = N_final/5
# Setting up the architecture of the network and compiling
model = Sequential()
model.add(SeparableConv2D(50, (50,30), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10,6), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fitiing the model on generated data
filepath="......hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
start = time.time()
fit_history = model.fit_generator(generator=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  steps_per_epoch=N_final//batch_size,
                                  validation_data=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  validation_steps=N_val//batch_size,
                                  callbacks=[checkpoint],
                                  epochs=N_epochs)
end = time.time()

The most suspicious thing I see is the 'relu' near the end of the model. Depending on the initialization and the learning rate, ReLU units can get unlucky and collapse into an all-zeros state (the "dying ReLU" problem). When that happens, they pass no gradient and stop training entirely.
By the looks of your problem (sometimes it works, sometimes it doesn't), it seems very plausible that the relu is the culprit.
So the first suggestion (this usually solves it) is to add a batch normalization before the activation:
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
Hint: if you are going to use it on the 4D tensors before the Flatten, remember to pass the channels dimension, i.e. BatchNormalization(axis=1) for your channels_first layout.
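For reference, here is a minimal sketch of the same normalize-then-activate pattern applied to the full model above (same channels_first layout and kernel sizes as in the question; treat it as an illustration, not a tuned architecture):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (SeparableConv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Flatten, Dense)

model = Sequential()
model.add(SeparableConv2D(50, (50, 30), data_format='channels_first',
                          input_shape=(2, x_size, t_size)))
model.add(BatchNormalization(axis=1))   # axis=1 because channels come first
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10, 6), data_format='channels_first'))
model.add(BatchNormalization(axis=1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100))
model.add(BatchNormalization())         # default axis is fine after Flatten
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])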

Related

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The test data consists of both anomalies and normal instances. During training, the model loss looks good. However, on the test data the model produces poor accuracy, i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
cp = ModelCheckpoint(filepath="lstm_classifier.h5",
                     save_best_only=True,
                     verbose=0)
tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)
lstm_model_history = lstm_model.fit(X_train, X_train,
                                    epochs=epochs,
                                    batch_size=batch,
                                    shuffle=False,
                                    verbose=1,
                                    validation_data=(X_valid, X_valid),
                                    callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would check the following:
The training set is not contaminated with anomalies or with any information from the test set. If you use scaling, make sure you did not fit the scaler on the training and test sets combined.
Based on my experience, if an autoencoder has low training loss but cannot discriminate well enough on the test data (provided your training set is pure), it has learned the specific details of the training set rather than the generalized idea behind it.
Your threshold value might be off, and you may need a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is the second one, the solution is to increase generalization. With autoencoders, one of the most effective generalization levers is the dimension of the bottleneck. Based on my experience with anomaly detection in flight radar data, lowering the bottleneck dimension significantly increased my multi-class classification accuracy: I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 gave even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you classify by thresholding the reconstruction error, a more robust thresholding procedure may improve accuracy, as in the paper I've shared.
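As a rough illustration of a more data-driven threshold (not the method from the paper above, just a common baseline): compute the reconstruction error on a clean validation set and use a high percentile of it as the cut-off. This sketch assumes X_valid contains only normal instances, that the autoencoder's output has the same shape as its input, and that the variable names match the snippet above:

import numpy as np

# Per-sample reconstruction error on the (assumed anomaly-free) validation set
valid_pred = lstm_model.predict(X_valid)
valid_mse = np.mean(np.square(X_valid - valid_pred), axis=(1, 2))

# Flag anything worse than the 99th percentile of "normal" error as an anomaly
threshold = np.percentile(valid_mse, 99)
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]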

Why is the validation accuracy constant at 20%?

I am trying to implement a 5-class animal classifier using Keras. I am building the CNN from scratch, and the weird thing is that the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders, and each of them contains 5 sub-folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizers but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contiguous array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the ouput layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
                                                        classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                        target_size=(img_width, img_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        shuffle=False)
hist = History()
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    callbacks=[hist])
#Save the model weights
#Load later using: from keras.models import load_model; model = load_model('cnn_from_scratch_weights.h5')
model.save('models/basic_cnn_from_scratch_model.h5')
print("Time taken to train the baseline model from scratch: ",datetime.now()-global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere).
Properly normalize all data (you are doing rescale=1./255, which is probably okay).
Use a proper train/val split (you seem to be doing that too).
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense head; that's what works best for image classification problems. You can also look at popular tried-and-tested architectures, e.g. AlexNet.
Change the optimizer to Adam and try different learning rates (see the sketch below).
Have a look at your training and validation loss curves and check that they look as expected.
Also, I assume you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
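A minimal sketch of the two most common fixes (shuffling the training generator and switching to Adam), using the same variables as in the question; the learning rate is only a starting point, and the import assumes standalone Keras as in your snippet:

from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])

train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=True)  # shuffle training data; keep shuffle=False only for validation/prediction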
It looks as if your output is always the same animal, which is why you get 20% accuracy. I highly recommend checking your test outputs to see if they are all the same.
Also, you said you were building a CNN, but in the code snippet you posted I see only dense layers; it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
The model seems to be working now. I have removed the shuffle=False attribute, corrected the input shape for the 2nd convolution layer, and changed the optimizer to Adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting in the model, so I will have to use some aggressive dropout to reduce it. Thanks!
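(For what it's worth, a minimal sketch of adding dropout to the dense head of the model above; the 0.5 rate is only a starting point to tune:)

from keras.layers import Dropout

model.add(Flatten())
model.add(Dense(256, activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))   # drop half the activations during training to fight overfitting
model.add(Dense(256, activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax', kernel_initializer='glorot_uniform'))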

Emotion detection on text

I am a newbie in ML and was experimenting with emotion detection on text.
I have the ISEAR dataset, which contains tweets labeled with their emotion.
My current accuracy is 63% and I want to increase it to at least 70%, or maybe even more.
Here's the code:
inputs = Input(shape=(MAX_LENGTH, ))
embedding_layer = Embedding(vocab_size,
                            64,
                            input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32, input_shape=(32, 32))(embedding_layer)
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()
filepath = "weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
history = model.fit([X_train], batch_size=64, y=to_categorical(y_train), verbose=1, validation_split=0.1,
                    shuffle=True, epochs=10, callbacks=[checkpointer])
That's a pretty general question; optimizing the performance of a neural network may require tuning many factors.
For instance:
The optimizer chosen: in NLP tasks rmsprop is also a popular optimizer
Tweaking the learning rate
Regularization - e.g. dropout, recurrent_dropout, batch norm. This may help the model generalize better
More units in the LSTM
More dimensions in the embedding
You can try a grid search, e.g. trying different optimizers and evaluating on a validation set.
The data may also need some tweaking, such as:
Text normalization - a better representation of the tweets - remove unnecessary tokens (e.g. '#', '@')
Shuffle the data before the fit - Keras' validation_split takes the validation set from the last records of the data
There is no simple answer to your question.
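Still, as a rough illustration of a few of the model-side tweaks above (more LSTM units, a larger embedding, dropout and recurrent dropout), here is a sketch using the same variables and layers as in your snippet; the exact numbers are only examples to tune:

# (same imports as in your snippet: Input, Embedding, LSTM, Dense, Model)
inputs = Input(shape=(MAX_LENGTH, ))
x = Embedding(vocab_size, 128, input_length=MAX_LENGTH)(inputs)   # larger embedding
x = LSTM(64, dropout=0.2, recurrent_dropout=0.2)(x)               # more units plus regularization
x = Dense(32, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])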

Keras/Tensorflow Model Works with Validation Images but Not Real World Data

I have a model with very high validation accuracy (> 99%) that fails when run against images that are not in the original training or validation set, namely photos taken with my smartphone.
I've always felt that to learn any new technology one has to suffer with it. To that end (and after reading and watching a bunch of machine learning tutorials) I created and labeled around 25,000 images and fed them into my CNN (mostly cribbed from the CIFAR-10 examples).
The images (buildings on my block in NY) were harvested from video taken with both a GoPro and my Android phone. Each frame was converted to a full-sized (original resolution) jpeg file.
The images were labeled and organized into a directory structure where each sub-directory corresponded to the address (label) of the image (100MainSt, 102MainSt, etc). This is to allow seamless integration with the Keras 'flow_from_directory' functionality. Note that a given directory/label contains both Android and GoPro images.
The data was then divided (80/20) into training and validation data using the sklearn train_test_split function.
I ran my model with the Adam optimizer, the loss function was categorical_crossentropy, the learning rate was 1e-6, and each image was shrunk to 300x300 (due to memory limitations on my GPU). After 70+ epochs my validation accuracy was 99.2% and my loss was 0.0383. Not bad (or so I thought).
Now my problem: When I take photos with my phone (stills, not frames from a video as above) and feed them through my model the performance is terrible, with 7 out of 12 images incorrectly classified. When I run randomly selected (by me) training or validation images (from above) through the model it works very well, which is what I would expect. This indicates to me that the transformations I do to the input image (shrinking, transposing, converting to numpy array, etc) are the same and correct in all cases.
The only salient difference I can see between the video harvested images that I used for training and validation and the still images (aka snapshots) is the resolution. The snapshots have a significantly higher resolution, although I would think that wouldn't matter given that all images are reduced to 300x300.
Any insight or ideas would be much appreciated (and probably helpful for future travelers) as I'm completely mystified as to why this isn't working.
The guts of my code:
model = Sequential()
filters = 32
model.add(Conv2D(filters, (3, 3), padding='same', input_shape=(image_width, image_height, 3)))
model.add(Activation('relu'))
model.add(Conv2D(filters, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters*2, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(filters*2, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(filters*16))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(len(classes)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer=opts[opt],
              metrics=['accuracy'])
Fitting, normalization, and prediction:
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    training_dir,
    target_size=(image_width, image_height),
    batch_size=batch_size,
    class_mode='categorical',
    follow_links=True
)
validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(image_width, image_height),
    batch_size=batch_size,
    class_mode='categorical',
    follow_links=True
)
hist = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size,
    use_multiprocessing=True,
    workers=8,
    callbacks=[early_stopping, time_callback]
)
pred = model.predict_generator(validation_generator, workers=8, use_multiprocessing=True, verbose=1)
And the code I use to test with individual (snapshot) images:
#... Use with snapshots (may need to be rotated)
# image = Image.open(image_file).convert("RGB").rotate(-90).resize((width, height))
#... Use with images scraped from video (either GoPro or Android)
image = Image.open(image_file).convert("RGB").resize((width, height))
img = np.array(image)
r = img[:,:,0]
g = img[:,:,1]
b = img[:,:,2]
npimages = np.array([[r] + [g] + [b]], np.uint8)
npimages = npimages.transpose(0,2,3,1)
classes = model.predict_classes(npimages)
prediction = model.predict(npimages, verbose=2)
print(prediction)
print(classes)
print(label_map[classes[0]])
plt.imshow(img)
I’m afraid video data tends to be strongly correlated. That is, although a 1 minute video translates to 60 seconds of 30 fps (1800) images, most of them are very similar. It’s the same scene, "same" lighting conditions, same cars or people passing by.
If your validation image data comes from the same video sequence as your test data, you’ll get a great accuracy (nearly 100%!) but it’s similar to testing on your training data. It’s overfitting and the validation set is similar enough to the test set so the accuracy is high. Drop out might help a little, but not if your dataset is strongly correlated.
On the plus side, your coding is probably valid!
Fixes? More data is probably your best route. Even if it’s more video data taken on a different day (different cars, people, weather etc). Sorry - I know there’s a lot of work to label things. Alternatively, maybe try fine tuning a pre-trained network on your data.
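One practical way to check for the correlation issue above is to split train/validation by source video (or recording session) rather than by individual frame, so that frames from the same clip never end up on both sides. A minimal sketch, assuming hypothetical filenames, labels and video_ids lists with one entry per frame:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# video_ids: one identifier per frame, e.g. the source clip it was extracted from (hypothetical)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(filenames, labels, groups=video_ids))

train_files = [filenames[i] for i in train_idx]
val_files = [filenames[i] for i in val_idx]

If validation accuracy drops sharply under this kind of split, the 99% figure was mostly the frame correlation talking.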

Confusion about Keras RNN Input shape requirement

I have read plenty of posts on this point. They are inconsistent with each other, and every answer seems to have a different explanation, so I thought I would ask based on my reading of all of them.
As the Keras RNN documentation states, the input shape is always of the form (batch_size, timesteps, input_dim). I am a bit confused about that, but my guess (though I'm not sure) is that input_dim is always 1, while timesteps depends on your problem (it could be the data dimension as well). Is that roughly correct?
The reason for this question is that I always get an error when I try to set input_dim to my dataset's dimension (as the name input_dim suggests!), so I made the assumption that input_dim represents the shape of the input vector fed to the LSTM at each time step. Am I wrong again?
C = C.reshape((C.shape[0], C.shape[1], 1))
tr_C, ts_C, tr_r, ts_r = train_test_split(C, r, train_size=.8)
batch_size = 1000
print('Build model...')
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, C.shape[1], 1), stateful=True, activation='relu'))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(tr_C, tr_r,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_C, ts_r))
Thanks!
Indeed, input_dim is the shape of the input vector at a single time step. In other words, input_dim is the number of input features.
It's not necessarily 1, though. If you're working with more than one variable, it can be any number.
Suppose you have 10 sequences, each with 200 time steps, and you're measuring just temperature. Then you have one feature:
input_shape = (200, 1) -- notice that the batch size (number of sequences) is not included here
batch_input_shape = (10, 200, 1) -- only in specific cases, such as stateful=True, do you need a batch input shape
Now suppose you're measuring not only temperature, but also pressure and volume. Now you've got three input features:
input_shape = (200, 3)
batch_input_shape = (10, 200, 3)
In other words, the first dimension is the number of different sequences, the second is the length of a sequence (how many measurements along time), and the last is how many variables you measure at each time step.
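For illustration, here is a minimal sketch of the three-feature case described above (10 sequences of 200 time steps, measuring temperature, pressure and volume); the data is random, just to show which shapes Keras expects:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# 10 sequences, 200 time steps each, 3 features per time step
X = np.random.rand(10, 200, 3)
y = np.random.randint(0, 2, size=(10, 1))

model = Sequential()
model.add(LSTM(8, input_shape=(200, 3)))   # batch size is not part of input_shape
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=2, epochs=1)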
