(keras) I want to decay learning rate when val_acc stops improving - keras

I'm trying to make CNN model, but having low accuarcy :(
So, I want to decay SGD learning rate when validation accuracy stops improving.
How can I make and compile it??

If you loop on model.train_on_batch you can change the learning rate manually:
import keras.backend as K
from keras.optimizers import Adam
import sys
epochs = 50
batch_size = 32
iterations_per_epoch = len(x_train) // batch_size
lr = 0.01
model.compile(optimizer=Adam(lr), loss='some loss')
min_val_loss = sys.float_info.max
for epoch in range(epochs):
for batch in range(iterations_per_epoch):
model.train_on_batch(x_train, y_train)
val_loss = model.evaluate(x_val, y_val)
if val_loss >= min_val_loss:
K.set_value(model.optimizer.lr, lr / 2.)
lr /= 2.
else:
min_val_loss = val_loss
This is a very naive way to decrease the learning rate once the validation loss had stopped decreasing. I would suggest implementing a bit more sophisticated rule such as validation loss had not decreased for the last X batches or so.

Related

Why is testing accuracy so low, could there be a bug in my code?

I've been training an image classification model using object detection and then applying image classification to the images. I have 87 custom classes in my data(not ImageNet classes), and just over 7000 images altogether(around 60 images per class). I am happy with my object detection code and I think it works quite well, however, for classification I have been using ResNet and AlexNet. I have tried AlexNet, ResNet18, ResNet50 and ResNet101 for training however, I am getting very low testing accuracies(around 10%), and my training accuracies are high for all models. I've also attempted regularisation and changing the learning rates, but I am not getting the higher accuracies(>80%) that I require. I wonder if there is a bug in my code, although I haven't been able to figure it out.
Here is my training code, I have also processed images in the way that Pytorch pretrained models expect:
import torch.nn as nn
import torch.optim as optim
from typing import Callable
import numpy as np
EPOCHS=100
resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50')
resnet.eval()
resnet.fc = nn.Linear(2048, 87)
res_loss = nn.CrossEntropyLoss()
res_optimiser = optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)
def train_model(model, loss_fn, optimiser, modelsavepath):
train_acc = 0
for j in range(EPOCHS):
running_loss = 0.0
correct = 0
total = 0
for i, data in enumerate(training_generator, 0):
model.train()
inputs, labels, paths = data
total += 1
optimizer.zero_grad()
outputs = model(inputs)
_, predicted = torch.max(outputs, 1)
if(predicted.int() == labels.int()):
correct += 1
loss = loss_fn(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
train_acc = train_correct / len(training_generator)
print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG Training Acc {:.2f}% ".format(j + 1, EPOCHS, train_loss, train_acc))
torch.save(model, modelsavepath)
train_model(resnet, res_loss, res_optimiser, 'resnet.pth')
Here is the testing code used for a single image, it is part of a class:
self.model.eval()
outputs = self.model(img[None, ...]) #models expect batches, so give it a singleton batch
scores, predictions = torch.max(outputs, 1)
predictions = predictions.numpy()[0]
possible_scores= np.argmax(scores.detach().numpy())
Is there a bug in my code, either testing or training, or is my model just overfitting? Additionally, is there a better image classification model that I could try?
Your dataset is very small, so you're most likely overfitting. Try:
decrease learning rate (try 0.001, 0.0001, 0.00001)
increase weight_decay (try 1e-4, 1e-3, 1e-2)
if you don't already, use image augmentations (at least the default ones, like random crop and flip).
Watch train/test loss curves when finetuning your model and stop training as soon as you see test accuracy going down while train accuracy goes up.

SDG with batch size >1?

I'm taking the "Deep NNs with PyTorch" course by IBM and I encountered lab examples where SDG is used for optimizer while batch size is >1 in DataLoader.
If I understand correctly, SGD would perform gradient descent with only 1 training example in each step, so it this case how would the SGD interact with each batch of training example?
For example, if batch size = 20, would the SGD optimizer perform 20 GD steps in each batch? If this is the case, then does that mean no matter what batch size I set for DataLoader, the SGD optimizer would just perform (# of training example) GD steps in one epoch?
Layers = [2, 50, 3]
model = Net(Layers)
learning_rate = 0.10
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
train_loader = DataLoader(dataset=data_set, batch_size=20)
criterion = nn.CrossEntropyLoss()
LOSS = train(data_set, model, criterion, train_loader, optimizer, epochs=100)
def train(data_set, model, criterion, train_loader, optimizer, epochs=100):
LOSS = []
ACC = []
for epoch in range(epochs):
for x, y in train_loader:
print(x, y)
optimizer.zero_grad()
yhat = model(x)
loss = criterion(yhat, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
LOSS.append(loss.item())
ACC.append(accuracy(model, data_set))
...
if batch size = 20, would the SGD optimizer perform 20 GD steps in
each batch?
No. Batch size = 20 means, it would process all the 20 samples and then get the scalar loss. Based on that it would backpropagate the error. And that is one step of GD.
This is known as minibatch SGD, instead of taking 1 input like in SGD, it considers 20 and then everything else stays the same.

InceptionV3 transfer learning with Keras overfitting too soon

I'm using a pre trained InceptionV3 on Keras to retrain the model to make a binary image classification (data labeled with 0's and 1's).
I'm reaching about 65% of accuracy on my k-fold validation with never seen data, but the problem is the model is overfitting to soon. I need to improve this average accuracy, and I guess there is something related to this overfitting problem.
Here are the loss values on epochs:
Here is the code. The dataset and label variables are Numpy Arrays.
dataset = joblib.load(path_to_dataset)
labels = joblib.load(path_to_labels)
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, 2)
X_train, X_test, y_train, y_test = sk.train_test_split(dataset, labels, test_size=0.2)
X_train, X_val, y_train, y_val = sk.train_test_split(X_train, y_train, test_size=0.25) # 0.25 x 0.8 = 0.2
X_train = np.array(X_train)
y_train = np.array(y_train)
X_val = np.array(X_val)
y_val = np.array(y_val)
X_test = np.array(X_test)
y_test = np.array(y_test)
aug = ImageDataGenerator(
rotation_range=20,
zoom_range=0.15,
horizontal_flip=True,
fill_mode="nearest")
pre_trained_model = InceptionV3(input_shape = (299, 299, 3),
include_top = False,
weights = 'imagenet')
for layer in pre_trained_model.layers:
layer.trainable = False
x = layers.Flatten()(pre_trained_model.output)
x = layers.Dense(1024, activation = 'relu')(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(2, activation = 'softmax')(x) #already tried with sigmoid activation, same behavior
model = Model(pre_trained_model.input, x)
model.compile(optimizer = RMSprop(lr = 0.0001),
loss = 'binary_crossentropy',
metrics = ['accuracy']) #Already tried with Adam optimizer, same behavior
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)
mc = ModelCheckpoint('best_model_inception_rmsprop.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
history = model.fit(x=aug.flow(X_train, y_train, batch_size=32),
validation_data = (X_val, y_val),
epochs = 100,
callbacks=[es, mc])
The training dataset has 2181 images and validation has 727 images.
Something is wrong, but I can't tell what...
Any thoughts of what can be done to improve it?
One way to avoid overfitting is to use a lot of data. The main reason overfitting happens is because you have a small dataset and you try to learn from it. The algorithm will have greater control over this small dataset and it will make sure it satisfies all the datapoints exactly. But if you have a large number of datapoints, then the algorithm is forced to generalize and come up with a good model that suits most of the points.
Suggestions:
Use a lot of data.
Use less deep network if you have a small number of data samples.
If 2nd satisfies then don't use huge number of epochs - Using many epochs leads is kinda forcing your model to learn that and your model will learn it well but can not generalize.
From your loss graph , i see that the model is generalized at early epoch ( where there is intersection of both the train & val score) so plz try to use the model saved at that epoch ( and not the later epochs which seems to overfit)
Second option what you have is use lot of training samples..
If you have less no. of training samples then use data augmentations
Have you tried following?
Using a higher dropout value
Lower Learning Rate (lr=0.00001 or lr=0.000001 ...)
More data augmentation you can use.
It seems to me your data amount is low. You may use a lower ratio for test and validation (10%, 10%).

TensorFlow 2.0 learning rate scheduler with tf.GradientTape

I am using TensorFlow 2.0 and Python 3.8 and I want to use a learning rate scheduler for which I have a function. I have to train a neural network for 160 epochs with the following where the learning rate is to be decreased by a factor of 10 at 80 and 120 epochs, where the initial learning rate = 0.01.
def scheduler(epoch, current_learning_rate):
if epoch == 79 or epoch == 119:
return current_learning_rate / 10
else:
return min(current_learning_rate, 0.001)
How can I use this learning rate scheduler function with 'tf.GradientTape()'? I know how to use this using "model.fit()" as a callback:
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
How do I use this while using custom training loops with "tf.GradientTape()"?
Thanks!
The learning rate for different epochs can be set using lr attribute of tensorflow keras optimizer. lr attribute of the optimizer still exists since tensorflow 2 has backward compatibility for keras (For more details refer the source code here).
Below is a small snippet of how the learning rate can be varied across different epochs. self._train_step is similar to the train_step function defined here.
def set_learning_rate(epoch):
if epoch > 180:
optimizer.lr = 0.5e-6
elif epoch > 160:
optimizer.lr = 1e-6
elif epoch > 120:
optimizer.lr = 1e-5
elif epoch > 3:
optimizer.lr = 1e-4
def train(epochs, train_data, val_data):
prev_val_loss = float('inf')
for epoch in range(epochs):
self.set_learning_rate(epoch)
for images, labels in train_data:
self._train_step(images, labels)
for images, labels in val_data:
self._test_step(images, labels)
Another alternative would be to use tf.keras.optimizers.schedules
learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
[80*num_steps, 120*num_steps, 160*num_steps, 180*num_steps],
[1e-3, 1e-4, 1e-5, 1e-6, 5e-6]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
Note that here one cant directly provide the epochs, instead the number of steps have to be given, where each step is len(train_data)/batch_size.
A learning rate schedule needs a step value that can not be specified when using GradientTape followed by optimizer.apply_gradient().
So you should not pass directly the schedule as the learning_rate of the optimizer.
Instead, you can first call the schedule function to get the value for current step and then update the learning rate value in the optimizer:
optim = tf.keras.optimizers.SGD()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-2,1000,.9)
for step in range(0,1000):
lr = lr_schedule(step)
optim.learning_rate = lr
with GradientTape() as tape:
call func to differentiate
optim.apply_gradient(func,...)

Keras batch normalization stops convergence

I'm new to keras and have been experimenting with various things such as BatchNormalization but it is not working at all. When the BatchNormalization line is commented out it will converge to around 0.04 loss or better, but with it as it is it will converge to 0.71 and get stuck around there, I'm not sure what's wrong.
from sklearn import preprocessing
from sklearn.datasets import load_boston
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.normalization import BatchNormalization
import keras.optimizers
boston = load_boston()
x = boston.data
y = boston.target
normx = preprocessing.scale(x)
normy = preprocessing.scale(y)
# doesnt construct output layer
def layer_looper(inputs, number_of_loops, neurons):
inputs_copy = inputs
for i in range(number_of_loops):
inputs_copy = Dense(neurons, activation='relu')(inputs_copy)
inputs_copy = BatchNormalization()(inputs_copy)
return inputs_copy
inputs = Input(shape = (13,))
x = layer_looper(inputs, 40, 20)
predictions = Dense(1, activation='linear')(x)
model = Model(inputs=inputs, outputs=predictions)
opti = keras.optimizers.Adam(lr=0.0001)
model.compile(loss='mean_absolute_error', optimizer=opti, metrics=['acc'])
print(model.summary())
model.fit(normx, normy, epochs=5000, verbose=2, batch_size=128)
I have tried experimenting with batch sizes and the optimizer but it doesn't seem very effective. Am I doing something wrong?
I've increased learning rate to 0.01 and it seems like the network is able to learn something (I get Epoch 1000/5000- 0s - loss: 0.2330) .
I think it's worth to note the following from the abstract of original Batch Normalization paper:
Batch Normalization allows us to use much higher learning rates and
be less careful about initialization. It also acts as a regularizer (...)
That hinted to increased learning rate (that's something you might want to experiment with).
Be aware that since it works like regularization, BatchNorm should make your training loss worse - it's supposed to prevent overfitting and thus close the gap between the train and test/valid errors.

Resources