EfficientNetV2 doesn't converge while EfficientNet does (Keras)

Using transfer learning with EfficientNet (B4) for image classification yielded decent results. Trying to run the same setup with the V2 model gets stuck with no learning.
Any idea what should be done to solve it?
Thanks
This converges just fine starting from epoch 1:
efficientnetB4 = tf.keras.applications.EfficientNetB4(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling=None
)
This gets stuck with no accuracy improvement for several epochs.
efficientnetV2S = tf.keras.applications.EfficientNetV2S(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling=None
)

It appears that reducing the initial learning rate from 1e-3 to 1e-4 solves the problem. Training then starts converging from epoch 1.
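A minimal sketch of what that can look like (assuming the Adam optimizer and a simple classification head on top of the frozen V2-S base; num_classes is a placeholder, not from the question):

import tensorflow as tf

# Frozen EfficientNetV2-S base, as in the snippet above
base = tf.keras.applications.EfficientNetV2S(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling=None
)
base.trainable = False  # assumption: the base stays frozen for transfer learning

num_classes = 10  # placeholder
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

# The key change: start from a lower initial learning rate (1e-4 instead of 1e-3)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)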

Related

Transfer Learning on Resnet50 doesn't go beyond 40%

I'm using Kaggle - Animal-10 dataset for experimenting transfer learning with FastAI and Keras.
Base model is Resnet-50.
With FastAI I'm able to get an accuracy of 95% in 3 epochs.
learn.fine_tune(3, base_lr=1e-2, cbs=[ShowGraphCallback()])
I believe it only trains the top layers.
With Keras
Only if I train the complete ResNet am I able to achieve an accuracy of 96%.
If I use the code below for transfer learning, I can reach 40% at most.
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.applications import ResNet50

num_classes = 10
# Number of layers to retrain from the pretrained model (conv5 block)
fine_tune = 33

model = Sequential()
base_layer = ResNet50(include_top=False, pooling='avg', weights="imagenet")
# base_layer.trainable = False
# Make only the last `fine_tune` layers trainable; freeze everything before them
for layer in base_layer.layers[:-fine_tune]:
    layer.trainable = False
model.add(base_layer)
model.add(layers.Flatten())
# model.add(layers.BatchNormalization())
# model.add(layers.Dense(2048, activation='relu'))
# model.add(layers.Dropout(rate=0.2))
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(num_classes, activation='softmax'))
I assume the cause is the one described in "Transfer learning with Keras, validation accuracy does not improve from outset (beyond naive baseline) while train accuracy improves",
and that's the reason I'm now re-training the complete conv5 block of ResNet, yet it still doesn't add any value.
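One quick sanity check before digging deeper (a sketch reusing base_layer from the snippet above) is to confirm which ResNet50 layers are actually left trainable after the freezing loop:

# List how many ResNet50 layers end up frozen vs. trainable
trainable = [l.name for l in base_layer.layers if l.trainable]
frozen = [l.name for l in base_layer.layers if not l.trainable]
print(f"trainable: {len(trainable)}, frozen: {len(frozen)}")
print("last trainable layers:", trainable[-5:])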

CNN classifier with keras shows accuracy (and categorical accuracy) of 1 but still many predictions are false

I have trained a CNN classifier and get weird results. When training, it reaches an accuracy of 1 (and also categorical accuracy, whatever the difference may be). However, when I predict on training samples manually, I rarely get the right class after np.argmax(), which seems very odd. I figured it might be a bad mapping of classes, but after checking the generator's class mapping it looks ok.
I suspect the way I input the images for testing is different from the way the data generator feeds the images for training; it's the only possible explanation. Here's some code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense, Dropout

datagen = ImageDataGenerator(rescale=1./255)
train_classif_generator = datagen.flow_from_directory(
    'full_ae_output/classifier_classes',
    target_size=image_dims_original,
    batch_size=batch_size,
    shuffle=True,
    color_mode='grayscale')

classifier = Sequential()
# 1st convolution layer
classifier.add(Conv2D(8, (3, 3), padding='same', input_shape=image_input_dims))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
# 2nd convolution layer
classifier.add(Conv2D(8, (3, 3), padding='same'))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
# 3rd convolution layer
classifier.add(Conv2D(16, (3, 3), padding='same'))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
# Classifier head
classifier.add(Flatten())
classifier.add(Dense(n_classes*2, activation='relu'))
# classifier.add(Dense(256, activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(n_classes, activation='softmax'))
classifier.summary()
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
Epoch 1/3
92/92 [==============================] - 108s 1s/step - loss: 0.0638 - categorical_accuracy: 0.9853
Epoch 2/3
92/92 [==============================] - 107s 1s/step - loss: 0.0141 - categorical_accuracy: 0.9969
Epoch 3/3
92/92 [==============================] - 108s 1s/step - loss: 0.0188 - categorical_accuracy: 0.9938
import glob
import numpy as np
from PIL import Image

# Manual prediction on one training sample
input_class = 10
i = 0
image_path = glob.glob("full_ae_output/classifier_classes/class"
                       + "{0:0=3d}".format(input_class) + "/*")[i]
input_img = np.array([np.array(Image.open(image_path).convert('L')
                               .resize(image_dims_original[::-1])) / 255])
pred = classifier.predict(np.expand_dims(input_img, axis=3))
print("Predicted class = ", np.argmax(pred[0]))
I didn't recompute the actual accuracy but I suspect it to be lower than 50% since every sample I try I never get the right class.
Any idea what might be the bug? Is the training accuracy computed by Keras wrong?
Found it! It's just a matter of interpolation: the data generator uses nearest-neighbor interpolation by default, while cv2.resize uses bilinear interpolation. It's unbelievable how much this difference messes up the classifier and changes all the predictions. Fixed it, and I get my 100% accuracy now. Issue solved!
The training accuracy computed by Keras is definitely not wrong.
Your intuition is good: it is indeed the preprocessing of the test images that causes this problem.
I would recommend loading one image, checking how ImageDataGenerator works behind the curtains, and verifying that the exact same preprocessing steps are applied when you use Pillow.
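A minimal sketch of that comparison (reusing datagen, image_dims_original, batch_size and image_path from the question; flow_from_directory and load_img default to interpolation='nearest', so the PIL path is made to match explicitly):

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Make the interpolation explicit on the generator side ...
train_classif_generator = datagen.flow_from_directory(
    'full_ae_output/classifier_classes',
    target_size=image_dims_original,
    batch_size=batch_size,
    shuffle=True,
    color_mode='grayscale',
    interpolation='nearest')

# ... and match it on the manual PIL side
img = Image.open(image_path).convert('L')
img = img.resize(image_dims_original[::-1], resample=Image.NEAREST)
manual = np.array(img) / 255.0

# Load the same file through the Keras utilities and compare
keras_img = img_to_array(load_img(image_path, color_mode='grayscale',
                                  target_size=image_dims_original)) / 255.0
print(np.abs(manual - keras_img.squeeze()).max())  # ~0 if both paths preprocess identically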

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The test data consists of both anomalies and normal instances. During training, the model loss looks good. However, on the test data the model performs poorly, i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
cp = ModelCheckpoint(filepath="lstm_classifier.h5",
                     save_best_only=True,
                     verbose=0)
tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)
lstm_model_history = lstm_model.fit(X_train, X_train,
                                    epochs=epochs,
                                    batch_size=batch,
                                    shuffle=False,
                                    verbose=1,
                                    validation_data=(X_valid, X_valid),
                                    callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would make sure to check a few things:
1. The training set is not contaminated with anomalies or any information from the test set. If you use scaling, make sure you did not fit the scaler to the training and test set combined.
2. Based on my experience: if an autoencoder has low training loss but cannot discriminate well enough on the test data, and your training set is pure, it means the autoencoder learned the particular details of the training set but not the generalized idea.
3. Your threshold value might be off and you may need a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is the 2nd one, the solution is to increase generalization. With autoencoders, one of the most effective generalization tools is the dimension of the bottleneck. Based on my experience with anomaly detection in flight radar data, lowering the bottleneck dimension significantly increased my multi-class classification accuracy. I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 gave even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you are classifying with a threshold on the reconstruction error, a more robust thresholding procedure may improve accuracy, as in the paper I've shared.
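One simple thresholding heuristic, as a sketch (this is not the procedure from the linked paper; it reuses lstm_model, X_valid, preprocess_data and error_df from the question and picks the cut-off from the reconstruction error of held-out normal data):

import numpy as np

# Reconstruction error on the (normal-only) validation set
valid_pred = lstm_model.predict(X_valid)
valid_mse = np.mean(np.power(preprocess_data.flatten(X_valid)
                             - preprocess_data.flatten(valid_pred), 2), axis=1)

# Flag anything above, e.g., the 99th percentile of normal reconstruction error
threshold = np.percentile(valid_mse, 99)
pred_y = (error_df.Reconstruction_error.values > threshold).astype(int)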

CNN alternates between good performance and chance

I have a binary classification problem I am trying to solve with a CNN written in Keras. The inputs are very sparse 200x125x2 tensors (they can be thought of as two images stacked together), and their nonzero elements are only ones (representing neuron spike trains). The input is generated using a data generator that I have built, so the model is trained using the fit_generator function.
I have tried various architectures, and some show decent performance (~88%), but the thing is that sometimes when I train new models, they don't seem to work at all, giving a chance-level (50%) result every epoch. The weird thing is that this sometimes happens to the same architectures that worked well before. I am running the code on Google Colab (GPU) with TensorFlow 2.0. I have checked multiple times that I haven't changed anything in the code. I know that random initialization of the weights and biases may cause slight changes in performance, but it looks like something else.
Any ideas will be very helpful. Thanks!
Here is the relevant code for one of the models that had this problem (I am using unusual kernels, I know):
# General settings
x_max = 10
x_size, t_size, N_features = parameters(x_max)
batch_size = 64
N_epochs = 10
N_final = 10*N_features
N_final = int(N_final - N_final%(batch_size))
N_val = 100*batch_size
N_test = N_final/5
# Setting up the architecture of the network and compiling
model = Sequential()
model.add(SeparableConv2D(50, (50,30), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10,6), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fitting the model on generated data
filepath = "......hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
start = time.time()
fit_history = model.fit_generator(generator=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  steps_per_epoch=N_final//batch_size,
                                  validation_data=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  validation_steps=N_val//batch_size,
                                  callbacks=[checkpoint],
                                  epochs=N_epochs)
end = time.time()
The most suspicious thing I see is a 'relu' near the end of the model. Depending on the initialization and on the learning rate, ReLUs can be unlucky and fall into an all-zeros case. When this happens, they completely stop gradients and don't train anymore.
By the looks of your problem (sometimes it works, sometimes it doesn't), it seems very plausible that it's the relu.
So, the first suggestion (this always solves it) is to add a batch normalization before the activation:
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
Hint: if you are going to use it on the 4D tensors before the Flatten, remember to pass the channels dimension, since your data is channels_first: BatchNormalization(axis=1).
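A minimal sketch of how that maps onto the model from the question (same layers and sizes, with x_size and t_size taken from the question's setup; the BatchNormalization layers, and the split of Dense(100) plus Activation('relu') shown above, are the only changes):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (SeparableConv2D, MaxPooling2D, Flatten,
                                     Dense, BatchNormalization, Activation)

model = Sequential()
model.add(SeparableConv2D(50, (50, 30), data_format='channels_first',
                          input_shape=(2, x_size, t_size)))
model.add(BatchNormalization(axis=1))   # 4D tensor, channels_first -> normalize axis 1
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10, 6), data_format='channels_first'))
model.add(BatchNormalization(axis=1))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100))
model.add(BatchNormalization())         # before the ReLU, as suggested above
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])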

LSTM with Keras for mini-batch training and online testing

I would like to implement an LSTM in Keras for streaming time-series prediction -- i.e., running online, getting one data point at a time. This is explained well here, but as one would assume, the training time for an online LSTM can be prohibitively slow. I would like to train my network on mini-batches, and test (run prediction) online. What is the best way to do this in Keras?
For example, a mini-batch could be a sequence of 1000 data values ([33, 34, 42, 33, 32, 33, 36, ... 24, 23]) that occur at consecutive time steps. To train the network I've specified an array X of shape (900, 100, 1), where there are 900 sequences of length 100, and an array y of shape (900, 1). E.g.,
X[0] = [[33], [34], [42], [33], ...]
X[1] = [[34], [42], [33], [32], ...]
...
X[899] = [..., [24]]
y[899] = [23]
So for each sequence X[i], there is a corresponding y[i] that represents the next value in the time-series -- what we want to predict.
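For concreteness, windows like these can be built from a 1-D series with a small helper (a sketch; series and look_back are placeholders, and the shapes match the (900, 100, 1) / (900, 1) description above):

import numpy as np

def make_windows(series, look_back=100):
    # series: 1-D array of consecutive values, e.g. the 1000-value mini-batch above
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    X = np.array(X).reshape(-1, look_back, 1)   # (n_windows, look_back, 1)
    y = np.array(y).reshape(-1, 1)              # (n_windows, 1)
    return X, y

# With a 1000-value series this yields X.shape == (900, 100, 1) and y.shape == (900, 1)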
At test time I want to predict the next data values, 1000 to 1999. I do this by feeding an array of shape (1, 100, 1) for each step from 1000 to 1999, and the model tries to predict the value at the next step.
Is this the recommended approach and setup for my problem? Enabling statefulness may be the way to go for a purely online implementation, but in Keras this requires a consistent batch_input_shape in training and testing, which would not work for my intent of training on mini-batches and then testing online. Or is there a way I can do this?
UPDATE: Trying to implement the network as @nemo recommended
I ran my own dataset on an example network from a blog post "Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras", and then tried implementing the prediction phase as a stateful network.
The model building and training is the same for both:
# Create and fit the LSTM network
numberOfEpochs = 10
look_back = 30
model = Sequential()
model.add(LSTM(4, input_dim=1, input_length=look_back))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, nb_epoch=numberOfEpochs, batch_size=1, verbose=2)
# trainX.shape = (6883, 30, 1)
# trainY.shape = (6883,)
# testX.shape = (3375, 30, 1)
# testY.shape = (3375,)
Batch prediction is done with:
trainPredict = model.predict(trainX, batch_size=batch_size)
testPredict = model.predict(testX, batch_size=batch_size)
To try a stateful prediction phase, I ran the same model setup and training as before, but then the following:
w = model.get_weights()
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
trainPredictions, testPredictions = [], []
for trainSample in trainX:
    trainPredictions.append(model.predict(trainSample.reshape((1, look_back, 1)), batch_size=batch_size))
trainPredict = numpy.concatenate(trainPredictions).ravel()
for testSample in testX:
    testPredictions.append(model.predict(testSample.reshape((1, look_back, 1)), batch_size=batch_size))
testPredict = numpy.concatenate(testPredictions).ravel()
To inspect the results, the plots below show the actual (normalized) data in blue, the predictions on the training set in green, and the predictions on the test set in red.
The first figure is from using batch prediction, and the second from stateful. Any ideas what I'm doing incorrectly?
If I understand you correctly, you are asking whether you can enable statefulness after training. This should be possible, yes. For example:
inputs = Input(shape=(timesteps, features))
net = Dense(1)(SimpleRNN(units, stateful=False)(inputs))
model = Model(inputs=inputs, outputs=net)
model.fit(...)
w = model.get_weights()

stateful_inputs = Input(batch_shape=(1, timesteps, features))
net = Dense(1)(SimpleRNN(units, stateful=True)(stateful_inputs))
model = Model(inputs=stateful_inputs, outputs=net)
model.set_weights(w)
After that you can predict in a stateful way.
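Building on that, one possible online setup is a stateful copy that takes a single timestep per call, since LSTM weights do not depend on the sequence length (a sketch in the spirit of the update above; w holds the trained weights and incoming_stream is a placeholder for the live data source):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Stateful copy for online prediction: batch size 1, one timestep per call
online_model = Sequential()
online_model.add(LSTM(4, batch_input_shape=(1, 1, 1), stateful=True))
online_model.add(Dense(1))
online_model.compile(loss='mean_squared_error', optimizer='adam')
online_model.set_weights(w)          # weight shapes match regardless of sequence length

online_model.reset_states()          # clear the carried state when a new series starts
for value in incoming_stream:        # one data point at a time
    x = np.array(value, dtype='float32').reshape((1, 1, 1))
    next_value = online_model.predict(x, batch_size=1)[0, 0]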
