lr = tf.train.exponential_decay(start_lr, global_step, 3000, 0.96, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate=lr, epsilon=0.1)
I want to convert this TensorFlow code to PyTorch.
args.learning_rate = 0.001  # i.e. start_lr
args.learning_rate_decay_factor = 0.96
args.learning_rate_decay_step = 3000
optim = torch.optim.Adam(params=model.parameters(), lr=args.learning_rate)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optim, gamma=args.learning_rate_decay_factor, last_epoch=-1)
for epoch in range(1, args.num_epoch + 1):
    # do the forward pass and compute `loss` here
    optim.zero_grad()
    loss.backward()
    optim.step()
    if epoch % args.learning_rate_decay_step == 0:
        lr_scheduler.step()
        # debugging purpose
        print(lr_scheduler.get_last_lr())  # will print the last computed learning rate
The training loop above updates the learning rate every 3000 epochs. Note that the TensorFlow snippet decays every 3000 global steps, so step the scheduler per batch instead of per epoch if you want to match that exactly.
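Equivalently, because the TensorFlow code uses staircase=True, torch.optim.lr_scheduler.StepLR gives the same behaviour without the manual epoch check. A minimal sketch using the same args as above:

# StepLR multiplies the lr by gamma once every `step_size` calls to scheduler.step()
lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optim,
    step_size=args.learning_rate_decay_step,
    gamma=args.learning_rate_decay_factor,
)

for epoch in range(1, args.num_epoch + 1):
    # forward pass, optim.zero_grad(), loss.backward(), optim.step() as above
    lr_scheduler.step()  # the decay itself is only applied every 3000 steps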
Hope this answers your question, let me know otherwise. This is my first answer so your feedback will help me improve future answers. :)
References:
PyTorch - How to get learning rate during training?
https://discuss.pytorch.org/t/how-to-use-torch-optim-lr-scheduler-exponentiallr/12444/6
Good morning.
I'm training an image captioning model and I'm wondering whether there is any difference between the two snippets below.
I'm training with the first snippet, and I found that for some samples the model keeps producing random values after the end of the sequence (where there should ideally be padding).
Is there any difference between the two snippets?
# index 0 is for pad token
criterion = nn.CrossEntropyLoss(ignore_index=0)
# ... compute the loss ...
loss = criterion(pred, target)
loss.backward()
optimizer.step()
criterion = nn.CrossEntropyLoss()
# ... compute the loss ...
pad_location = torch.ne(target, 0)
loss = criterion(pred, target)
loss *= pad_location
loss.backward()
optimizer.step()
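For reference, with the default reduction the criterion in the second snippet already returns a scalar mean over all positions, pads included, so the mask is applied after the averaging. A per-token masked version that should match ignore_index=0 needs reduction='none'; a minimal sketch, reusing the names from the snippets above:

criterion = nn.CrossEntropyLoss(reduction='none')  # keep the per-token losses
per_token_loss = criterion(pred, target)           # one loss value per target token
pad_location = torch.ne(target, 0).float()         # 1 for real tokens, 0 for pad
loss = (per_token_loss * pad_location).sum() / pad_location.sum()  # mean over non-pad tokens only
loss.backward()
optimizer.step()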
Thanks
I've been training an image classification pipeline: I run object detection first and then apply image classification to the detected images. I have 87 custom classes in my data (not ImageNet classes) and just over 7000 images altogether (around 60 images per class). I am happy with my object detection code and I think it works quite well; however, for classification I have been using ResNet and AlexNet. I have tried AlexNet, ResNet18, ResNet50 and ResNet101 for training, but I am getting very low testing accuracies (around 10%) while my training accuracies are high for all models. I've also attempted regularisation and changing the learning rates, but I am still not getting the higher accuracies (>80%) that I require. I wonder if there is a bug in my code, although I haven't been able to figure it out.
Here is my training code; I have also preprocessed the images in the way that PyTorch pretrained models expect:
import torch
import torch.nn as nn
import torch.optim as optim
from typing import Callable
import numpy as np

EPOCHS = 100

resnet = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50')
resnet.eval()
resnet.fc = nn.Linear(2048, 87)  # replace the 1000-class ImageNet head with 87 classes

res_loss = nn.CrossEntropyLoss()
res_optimiser = optim.SGD(resnet.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)

def train_model(model, loss_fn, optimiser, modelsavepath):
    train_acc = 0
    for j in range(EPOCHS):
        running_loss = 0.0
        correct = 0
        total = 0
        for i, data in enumerate(training_generator, 0):
            model.train()
            inputs, labels, paths = data
            total += 1
            optimiser.zero_grad()
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            if predicted.int() == labels.int():  # assumes a batch size of 1
                correct += 1
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimiser.step()
            running_loss += loss.item()
        train_acc = correct / total
        print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG Training Acc {:.2f}% ".format(
            j + 1, EPOCHS, running_loss / total, train_acc * 100))
    torch.save(model, modelsavepath)

train_model(resnet, res_loss, res_optimiser, 'resnet.pth')
Here is the testing code, used for a single image; it is part of a class:
self.model.eval()
outputs = self.model(img[None, ...]) #models expect batches, so give it a singleton batch
scores, predictions = torch.max(outputs, 1)
predictions = predictions.numpy()[0]
possible_scores = np.argmax(scores.detach().numpy())
Is there a bug in my code, either testing or training, or is my model just overfitting? Additionally, is there a better image classification model that I could try?
Your dataset is very small, so you're most likely overfitting. Try:
decrease the learning rate (try 0.001, 0.0001, 0.00001)
increase weight_decay (try 1e-4, 1e-3, 1e-2)
if you don't already, use image augmentations, at least the default ones like random crop and flip (see the sketch below).
Watch train/test loss curves when finetuning your model and stop training as soon as you see test accuracy going down while train accuracy goes up.
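A minimal augmentation pipeline with torchvision, assuming the standard ImageNet preprocessing that the pretrained ResNet weights expect:

from torchvision import transforms

# training-time augmentations plus the ImageNet normalization used by the pretrained weights
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# evaluation only resizes and center-crops, with the same normalization
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])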
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

x=np.linspace(0,20,100)
g=1+0.2*np.exp(-0.1*(x-7)**2)
y=np.sin(g*x)
plt.plot(x,y)
plt.show()
x=torch.from_numpy(x)
y=torch.from_numpy(y)
x=x.reshape((100,1))
y=y.reshape((100,1))
MM=nn.Sequential()
MM.add_module('L1',nn.Linear(1,128))
MM.add_module('R1',nn.ReLU())
MM.add_module('L2',nn.Linear(128,128))
MM.add_module('R2',nn.ReLU())
MM.add_module('L3',nn.Linear(128,128))
MM.add_module('R3',nn.ReLU())
MM.add_module('L4',nn.Linear(128,128))
MM.add_module('R5',nn.ReLU())
MM.add_module('L5',nn.Linear(128,1))
MM.double()
L=nn.MSELoss()
lr=3e-05 ######
opt=torch.optim.Adam(MM.parameters(),lr) #########
Epo=[]
COST=[]
for epoch in range(8000):
    opt.zero_grad()
    err=L(torch.sin(MM(x)),y)
    Epo.append(epoch)
    COST.append(err.item())  # store a plain float, not the graph-attached tensor
    err.backward()
    if epoch%100==0:
        print(err)
    opt.step()
Epo=np.array(Epo)/1000.
COST=np.array(COST)
pred=torch.sin(MM(x)).detach().numpy()
Trans=MM(x).detach().numpy()
x=x.reshape((100))
pred=pred.reshape((100))
Trans=Trans.reshape((100))
fig = plt.figure(figsize=(10,10))
#ax = fig.gca(projection='3d')
ax = fig.add_subplot(2,2,1)
surf = ax.plot(x,y,'r')
#ax.plot_surface(x_dat,y_dat,z_pred)
#ax.plot_wireframe(x_dat,y_dat,z_pred,linewidth=0.1)
fig.tight_layout()
#plt.show()
ax = fig.add_subplot(2,2,2)
surf = ax.plot(x,pred,'g')
fig.tight_layout()
ax = fig.add_subplot(2,2,3)
surff=ax.plot(Epo,COST,'y+')
plt.ylim(0,1100)
ax = fig.add_subplot(2,2,4)
surf = ax.plot(x,Trans,'b')
fig.tight_layout()
plt.show()
This is the original code 1.
To change the learning rate during training, I tried moving the definition of 'opt' inside the loop:
Epo=[]
COST=[]
for epoch in range(8000):
    lr=3e-05 ######
    opt=torch.optim.Adam(MM.parameters(),lr) #########
    opt.zero_grad()
    err=L(torch.sin(MM(x)),y)
    Epo.append(epoch)
    COST.append(err.item())
    err.backward()
    if epoch%100==0:
        print(err)
    opt.step()
This is code 2.
Code 2 also runs, but the result is quite different from code 1.
What is the difference, and if I want to change the learning rate during training (like lr=(1-epoch/10000 *0.99)), what should I do?
You shouldn't move the optimizer definition into the training loop, because the optimizer keeps a lot of information related to the training history; for example, Adam stores running averages of the gradients that are updated dynamically inside the optimizer.
So instantiating a new optimizer at every iteration makes you lose that history.
To update the learning rate dynamically, PyTorch provides many scheduler classes (exponential decay, cyclic schedules, cosine annealing, ...). You can check the documentation for the full list of schedulers, or implement your own if needed: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Example from the documentation: to decay the learning rate by multiplying it by 0.5 every 10 epochs, you can use the StepLR scheduler as follows:
opt = torch.optim.Adam(MM.parameters(), lr)
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)
And in your original code 1 you can do :
for epoch in range(8000):
    opt.zero_grad()
    err=L(torch.sin(MM(x)),y)
    Epo.append(epoch)
    COST.append(err.item())
    err.backward()
    if epoch%100==0:
        print(err)
    opt.step()
    scheduler.step()
As I said, there are many other types of lr schedulers, so you can choose one from the documentation or implement your own.
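If you want a custom rule like the one in your question, LambdaLR lets you scale the initial learning rate by an arbitrary function of the epoch. A minimal sketch, assuming the intended schedule is the initial lr multiplied by a factor that decays linearly with the epoch:

opt = torch.optim.Adam(MM.parameters(), lr=3e-05)
# the lambda returns a multiplicative factor applied to the initial lr;
# this assumes lr=(1-epoch/10000 *0.99) means a factor of (1 - 0.99*epoch/10000)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda epoch: 1 - 0.99 * epoch / 10000)

for epoch in range(8000):
    opt.zero_grad()
    err = L(torch.sin(MM(x)), y)
    err.backward()
    opt.step()
    scheduler.step()  # applies the lambda to update the lr each epoch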
I am using TensorFlow 2.0 and Python 3.8, and I want to use a learning rate scheduler for which I have a function. I have to train a neural network for 160 epochs, where the learning rate is to be decreased by a factor of 10 at epochs 80 and 120, starting from an initial learning rate of 0.01.
def scheduler(epoch, current_learning_rate):
    if epoch == 79 or epoch == 119:
        return current_learning_rate / 10
    else:
        return min(current_learning_rate, 0.001)
How can I use this learning rate scheduler function with 'tf.GradientTape()'? I know how to use this using "model.fit()" as a callback:
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
How do I use this while using custom training loops with "tf.GradientTape()"?
Thanks!
The learning rate for different epochs can be set using the lr attribute of the TensorFlow Keras optimizer. The lr attribute of the optimizer still exists because TensorFlow 2 keeps backward compatibility with Keras (for more details, refer to the optimizer source code).
Below is a small snippet showing how the learning rate can be varied across epochs; _train_step stands for a standard custom train step (forward pass under tf.GradientTape, then apply_gradients).
def set_learning_rate(epoch):
    if epoch > 180:
        optimizer.lr = 0.5e-6
    elif epoch > 160:
        optimizer.lr = 1e-6
    elif epoch > 120:
        optimizer.lr = 1e-5
    elif epoch > 3:
        optimizer.lr = 1e-4

def train(epochs, train_data, val_data):
    prev_val_loss = float('inf')
    for epoch in range(epochs):
        set_learning_rate(epoch)
        for images, labels in train_data:
            _train_step(images, labels)
        for images, labels in val_data:
            _test_step(images, labels)
Another alternative would be to use tf.keras.optimizers.schedules:
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80 * num_steps, 120 * num_steps, 160 * num_steps, 180 * num_steps],
    [1e-3, 1e-4, 1e-5, 1e-6, 5e-6]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
Note that you can't provide the epochs directly here; the boundaries have to be given in steps, where the number of steps per epoch is len(train_data) // batch_size.
A learning rate schedule needs a step value, which cannot be specified when using GradientTape followed by optimizer.apply_gradients().
So you should not pass the schedule directly as the learning_rate of the optimizer.
Instead, you can first call the schedule to get the value for the current step and then update the learning rate value in the optimizer:
optim = tf.keras.optimizers.SGD()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-2, 1000, 0.9)

for step in range(1000):
    # evaluate the schedule at the current step and update the optimizer's lr
    optim.learning_rate = lr_schedule(step)
    with tf.GradientTape() as tape:
        loss = compute_loss()  # your forward pass / loss computation goes here
    grads = tape.gradient(loss, model.trainable_variables)
    optim.apply_gradients(zip(grads, model.trainable_variables))
I'm trying to make a CNN model, but I'm getting low accuracy :(
So, I want to decay the SGD learning rate when the validation accuracy stops improving.
How can I build and compile this?
If you loop on model.train_on_batch you can change the learning rate manually:
import keras.backend as K
from keras.optimizers import Adam
import sys

epochs = 50
batch_size = 32
iterations_per_epoch = len(x_train) // batch_size
lr = 0.01

model.compile(optimizer=Adam(lr), loss='some loss')

min_val_loss = sys.float_info.max
for epoch in range(epochs):
    for batch in range(iterations_per_epoch):
        model.train_on_batch(x_train, y_train)  # in practice, pass the current batch here
        val_loss = model.evaluate(x_val, y_val)
        if val_loss >= min_val_loss:
            # validation loss did not improve: halve the learning rate
            K.set_value(model.optimizer.lr, lr / 2.)
            lr /= 2.
        else:
            min_val_loss = val_loss
This is a very naive way to decrease the learning rate once the validation loss has stopped decreasing. I would suggest implementing a slightly more sophisticated rule, such as reducing the learning rate only when the validation loss has not decreased for the last X batches or so.
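If you can train with model.fit instead, Keras also ships a ReduceLROnPlateau callback that does exactly this. A minimal sketch, assuming your compiled metrics expose the validation accuracy as 'val_accuracy' and your validation data is (x_val, y_val):

from keras.callbacks import ReduceLROnPlateau

# halve the learning rate when the monitored metric has not improved for `patience` epochs
reduce_lr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.5,
                              patience=3, min_lr=1e-6)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50, batch_size=32,
          callbacks=[reduce_lr])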