I am using TensorFlow 2.0 and Python 3.8, and I want to use a learning rate scheduler for which I have a function. I have to train a neural network for 160 epochs, with the learning rate decreased by a factor of 10 at epochs 80 and 120, starting from an initial learning rate of 0.01:
def scheduler(epoch, current_learning_rate):
    if epoch == 79 or epoch == 119:
        return current_learning_rate / 10
    else:
        return min(current_learning_rate, 0.001)
How can I use this learning rate scheduler function with tf.GradientTape()? I know how to use it with model.fit() as a callback:
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
How do I use it in a custom training loop with tf.GradientTape()?
Thanks!
The learning rate for different epochs can be set using the lr attribute of the TensorFlow Keras optimizer. The lr attribute still exists because TensorFlow 2 keeps backward compatibility with Keras (for more details, refer to the source code here).
Below is a small snippet showing how the learning rate can be varied across epochs. self._train_step is similar to the train_step function defined here.
def set_learning_rate(self, epoch):
    if epoch > 180:
        self.optimizer.lr = 0.5e-6
    elif epoch > 160:
        self.optimizer.lr = 1e-6
    elif epoch > 120:
        self.optimizer.lr = 1e-5
    elif epoch > 3:
        self.optimizer.lr = 1e-4

def train(self, epochs, train_data, val_data):
    prev_val_loss = float('inf')
    for epoch in range(epochs):
        self.set_learning_rate(epoch)
        for images, labels in train_data:
            self._train_step(images, labels)
        for images, labels in val_data:
            self._test_step(images, labels)
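Applied to the schedule from the question, the same pattern could look like the following sketch (model, loss_fn and train_dataset stand in for your own objects):

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
lr = 0.01

for epoch in range(160):
    # Divide the learning rate by 10 at epochs 80 and 120, as in the question.
    if epoch == 80 or epoch == 120:
        lr = lr / 10
        optimizer.lr = lr
    for images, labels in train_dataset:
        with tf.GradientTape() as tape:
            logits = model(images, training=True)  # model is a placeholder
            loss = loss_fn(labels, logits)         # loss_fn is a placeholder
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))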
Another alternative would be to use tf.keras.optimizers.schedules:
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80 * num_steps, 120 * num_steps, 160 * num_steps, 180 * num_steps],
    [1e-3, 1e-4, 1e-5, 1e-6, 5e-6]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
Note that here one can't directly provide the epochs; instead, the boundaries have to be given as numbers of steps, where one step is one batch, so an epoch corresponds to len(train_data) / batch_size steps.
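For example, num_steps in the snippet above could be computed like this (a sketch; the batch size value is an assumption, and train_data is assumed to be your training set):

batch_size = 128                           # assumption: your batch size
num_steps = len(train_data) // batch_size  # steps (i.e. batches) per epoch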
A learning rate schedule needs a step value that cannot be specified when using GradientTape followed by optimizer.apply_gradients().
So you should not pass the schedule directly as the learning_rate of the optimizer.
Instead, you can first call the schedule function to get the value for the current step and then update the learning rate value in the optimizer:
import tensorflow as tf

optim = tf.keras.optimizers.SGD()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-2, 1000, 0.9)

for step in range(1000):
    optim.learning_rate = lr_schedule(step)
    with tf.GradientTape() as tape:
        loss = compute_loss()  # call the function you want to differentiate
    grads = tape.gradient(loss, model.trainable_variables)
    optim.apply_gradients(zip(grads, model.trainable_variables))
I've been training an image classification model: I run object detection and then apply image classification to the detected images. I have 87 custom classes in my data (not ImageNet classes) and just over 7000 images altogether (around 60 images per class). I am happy with my object detection code and I think it works quite well. For classification, however, I have been using ResNet and AlexNet: I have tried AlexNet, ResNet18, ResNet50 and ResNet101 for training, but I am getting very low testing accuracies (around 10%) while my training accuracies are high for all models. I've also attempted regularisation and changing the learning rates, but I am not getting the higher accuracies (>80%) that I require. I wonder if there is a bug in my code, although I haven't been able to figure it out.
Here is my training code; I have also preprocessed the images in the way that PyTorch pretrained models expect:
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Callable
import numpy as np

EPOCHS = 100

resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50')
resnet.eval()
resnet.fc = nn.Linear(2048, 87)
res_loss = nn.CrossEntropyLoss()
res_optimiser = optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)
def train_model(model, loss_fn, optimiser, modelsavepath):
    train_acc = 0
    for j in range(EPOCHS):
        running_loss = 0.0
        correct = 0
        total = 0
        for i, data in enumerate(training_generator, 0):
            model.train()
            inputs, labels, paths = data
            total += 1
            optimiser.zero_grad()
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            if predicted.int() == labels.int():
                correct += 1
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimiser.step()
            running_loss += loss.item()
        train_acc = correct / len(training_generator)
        print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG Training Acc {:.2f}% ".format(j + 1, EPOCHS, running_loss / total, train_acc))
    torch.save(model, modelsavepath)

train_model(resnet, res_loss, res_optimiser, 'resnet.pth')
Here is the testing code used for a single image; it is part of a class:
self.model.eval()
outputs = self.model(img[None, ...])  # models expect batches, so give it a singleton batch
scores, predictions = torch.max(outputs, 1)
predictions = predictions.numpy()[0]
possible_scores = np.argmax(scores.detach().numpy())
Is there a bug in my code, either testing or training, or is my model just overfitting? Additionally, is there a better image classification model that I could try?
Your dataset is very small, so you're most likely overfitting. Try:
- decreasing the learning rate (try 0.001, 0.0001, 0.00001),
- increasing weight_decay (try 1e-4, 1e-3, 1e-2),
- using image augmentations if you don't already (at least the default ones, like random crop and flip; see the sketch below).
Watch train/test loss curves when finetuning your model and stop training as soon as you see test accuracy going down while train accuracy goes up.
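A minimal augmentation sketch with torchvision (the exact transforms and sizes are assumptions; adjust them to your data):

from torchvision import transforms

# Training-time pipeline: random crop and flip, followed by the ImageNet
# normalisation that pretrained PyTorch models expect.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])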
I want this TensorFlow code to be converted to PyTorch:

lr = tf.train.exponential_decay(start_lr, global_step, 3000, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate=lr, epsilon=0.1)

where:
args.learning_rate = 0.001 (i.e. start_lr)
args.learning_rate_decay_factor = 0.96
args.learning_rate_decay_step = 3000
optim = torch.optim.Adam(params=model.parameters(), lr=args.learning_rate)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optim, gamma=args.learning_rate_decay_factor, last_epoch=-1)

for epoch in range(1, args.num_epoch + 1):
    # Do forward pass here
    optim.zero_grad()
    loss.backward()
    optim.step()
    if epoch % args.learning_rate_decay_step == 0:
        lr_scheduler.step()
        # debugging purpose
        print(lr_scheduler.get_last_lr())  # will print the last learning rate
The training process will update the learning rate every 3000 epochs.
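If you prefer stepping the scheduler once every epoch, the staircase behaviour of tf.train.exponential_decay can also be encoded in the schedule itself with LambdaLR (a sketch, reusing the same args values):

from torch.optim.lr_scheduler import LambdaLR

# lr = start_lr * decay_factor ** floor(epoch / decay_step), i.e. staircase decay
lr_scheduler = LambdaLR(
    optim,
    lr_lambda=lambda epoch: args.learning_rate_decay_factor ** (epoch // args.learning_rate_decay_step),
)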
Hope this answers your question; let me know otherwise. This is my first answer, so your feedback will help me improve future answers. :)
References:
PyTorch - How to get learning rate during training?
https://discuss.pytorch.org/t/how-to-use-torch-optim-lr-scheduler-exponentiallr/12444/6
I translated TensorFlow code to PyTorch code, but it takes a long time: training takes 6 hours in TensorFlow-Slim but about 42 hours in PyTorch (3200 epochs). I used the profiler to check the time each function takes, and it says the to function (maybe .to(device)) occupies a large portion of CPU time.
Result of the profiler (5 epochs):
The code below is part of my training code:
import torch.profiler as pf  # assumption: pf refers to torch.profiler

with pf.profile() as prof:
    for epoch in range(5):
        # Print epoch
        print(f'Starting epoch {epoch+1}')
        # Set current loss value
        current_loss = 0.0
        tot_loss = 0.0
        # Iterate over the DataLoader for training data
        for i, data in enumerate(trainloader, 0):
            # Get inputs
            inputs, targets = data
            inputs = inputs.cuda()
            targets = targets.cuda()
            # Zero the gradients
            optimizer.zero_grad()
            # Perform forward pass
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            # Perform backward pass
            loss.backward()
            # Perform optimization
            optimizer.step()
            # Print statistics
            if i % 10000 == 9999:
                break
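One common way to reduce the cost of such host-to-GPU copies is to use pinned host memory together with non-blocking transfers. A minimal sketch (batch size and worker count are assumptions):

from torch.utils.data import DataLoader

# Pinned (page-locked) host memory allows faster, asynchronous copies to the GPU.
trainloader = DataLoader(train_dataset, batch_size=64, shuffle=True,
                         num_workers=4, pin_memory=True)

for inputs, targets in trainloader:
    inputs = inputs.cuda(non_blocking=True)    # overlaps the copy with computation
    targets = targets.cuda(non_blocking=True)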
I have been seeing code that uses an Adam optimizer, and the way it decreases the learning rate is as follows:
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

# (training...
#  optimizer.step()...)

if iteration >= some_threshold:
    for param_group in optimizer.param_groups:
        param_group['lr'] = 0.001
I thought we have the same learning rate for all parameters, so why iterate over the param_groups and set the learning rate individually for each one?
Wouldn't the following be faster and have an identical effect?
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(optimizer, milestones=[some_threshold], gamma=0.1)

# (training...
#  optimizer.step()
#  scheduler.step())
Thank you
You need to iterate over param_groups because if you don't specify multiple groups of parameters in the optimiser, you automatically have a single group. That doesn't mean you set the learning rate for each parameter, but rather for each parameter group.
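A minimal sketch of the difference (net.backbone and net.head are hypothetical submodules):

import torch

# A single group is created automatically when net.parameters() is passed directly.
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
print(len(optimizer.param_groups))  # 1

# Two groups, each with its own learning rate.
optimizer = torch.optim.Adam([
    {'params': net.backbone.parameters(), 'lr': 1e-4},  # hypothetical submodule
    {'params': net.head.parameters(), 'lr': 1e-2},      # hypothetical submodule
])
print(len(optimizer.param_groups))  # 2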
In fact the learning rate schedulers from PyTorch do the same thing. From _LRScheduler (base class of learning rate schedulers):
with _enable_get_lr_call(self):
    if epoch is None:
        self.last_epoch += 1
        values = self.get_lr()
    else:
        warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
        self.last_epoch = epoch
        if hasattr(self, "_get_closed_form_lr"):
            values = self._get_closed_form_lr()
        else:
            values = self.get_lr()

for param_group, lr in zip(self.optimizer.param_groups, values):
    param_group['lr'] = lr
Yes, your MultiStepLR version has an identical effect in this case, but it wouldn't be faster.
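A quick sketch illustrating the equivalence on a toy model (some_threshold = 5 is an assumed milestone):

import torch
from torch.optim.lr_scheduler import MultiStepLR

net = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
scheduler = MultiStepLR(optimizer, milestones=[5], gamma=0.1)

for iteration in range(10):
    optimizer.step()
    scheduler.step()  # multiplies every group's lr by 0.1 when iteration 5 is reached

print(optimizer.param_groups[0]['lr'])  # 0.001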
For example: set lr = 0.01 for the first 100 epochs, lr = 0.001 from epoch 101 to epoch 1000, and lr = 0.0005 for epochs 1001-4000. Basically, my learning rate plan does not decay exponentially with a fixed number of steps. I know it can be achieved with self-defined functions; I am just curious if there are already-developed functions to do that.
torch.optim.lr_scheduler.LambdaLR is what you are looking for. It returns a multiplier of the initial learning rate, so you can specify any value for any given epoch. For your example it would be:
from torch.optim.lr_scheduler import LambdaLR

def lr_lambda(epoch: int):
    if epoch < 100:
        return 1.0   # lr = 0.01 * 1.0  = 0.01
    if epoch < 1000:
        return 0.1   # lr = 0.01 * 0.1  = 0.001
    return 0.05      # lr = 0.01 * 0.05 = 0.0005

# Optimizer has lr set to 0.01
scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(4000):
    train(...)
    validate(...)
    optimizer.step()
    scheduler.step()
In PyTorch there are common schedulers (like MultiStepLR or ExponentialLR), but for a custom use case like yours, LambdaLR is the easiest.
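A quick way to sanity-check the schedule (a sketch with a toy one-parameter model) is to print the learning rate around each boundary:

import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(4000):
    if epoch in (0, 99, 100, 999, 1000):
        print(epoch, optimizer.param_groups[0]['lr'])
    optimizer.step()
    scheduler.step()
# prints 0.01 for epochs 0-99, 0.001 for 100-999, 0.0005 from 1000 on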