How to display the last used learning rate after an epoch - keras

I have tried several methods to display the learning rate that a Keras model actually used at the last epoch.
Some research showed it is possible to change the learning rate with callbacks, or to display it with a custom metric.
But whatever method I tried, the displayed learning rate was always the ORIGINAL learning rate.
Some answers suggest re-calculating what the rate should be from the decay formula. What I want is simply to read the learning rate that was actually used for the weight updates, without recomputing it from the algorithm.
Here is some code I used:
callback_list = []
metric_list = ['accuracy']

# Add checkpoints to save weights in case the test set acc improved
# ...

if show_learn_param:
    learn_param = Callback_show_learn_param()
    callback_list.append(learn_param)

# Add metric if needed
def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr  # K.eval(optimizer.lr)
    return lr

lr_metric = get_lr_metric(optimizer)
metric_list.append(lr_metric)
Here is the definition of the callback:
import keras.backend as K
from keras.callbacks import Callback

class Callback_show_learn_param(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        decay = self.model.optimizer.decay
        iterations = self.model.optimizer.iterations
        lr_with_decay = lr / (1. + decay * K.cast(iterations, K.dtype(decay)))
        # Beta values
        beta_1 = self.model.optimizer.beta_1
        beta_2 = self.model.optimizer.beta_2
        print("lr", K.eval(lr), "decay", K.eval(decay), "lr_with_decay", K.eval(lr_with_decay),
              "beta_1", K.eval(beta_1), "beta_2", K.eval(beta_2))
Basically, the displayed values are constant and do not change. That makes sense for the beta values and the decay, but the learning rate that is shown appears to be the initial one. I could not find a way to display the simple value I am after: the effective learning rate that was really used.
There is BTW an easier way to display this initial learning rate:
import keras.backend as K
print(K.eval(model.optimizer.lr))

You need to use K.get_value to obtain the learning rate. Have a look at LearningRateScheduler and how that callback obtains the learning rate from the model. In your case you should be able to print the learning rate:
def on_epoch_end(self, epoch, logs=None):
    lr = float(K.get_value(self.model.optimizer.lr))
    print("Learning rate:", lr)

Related

Why do genetic algorithms converge to end up with a population that is identical?

I was implementing a genetic algorithm with tf.keras, where I manually modify the weights, do the gene crossover, and so on. I've found that after a few dozen generations the predictions of all the networks are essentially identical, and after a few more generations the predictions are exactly the same. Trying to google the problem, I found this page
that mentions the problem on a conceptual level, but I can't understand how this would happen if I'm manually creating genetic diversity every generation.
def model_mutate(weights, var):
    for i in range(len(weights)):
        for j in range(len(weights[i])):
            if random.uniform(0, 1) < 0.2:  # mutation probability of 20%
                change = np.random.uniform(-var, var, weights[i][j].shape)
                weights[i][j] += change
    return weights

def crossover_brains(parent1, parent2):
    global brains
    weight1 = parent1.get_weights()
    weight2 = parent2.get_weights()
    new_weight1 = weight1
    new_weight2 = weight2
    gene = random.randint(0, len(new_weight1) - 1)  # we swap a random weight
                                                    # or set of weights
    new_weight1[gene] = weight2[gene]
    new_weight2[gene] = weight1[gene]
    q = np.asarray([new_weight1, new_weight2], dtype=object)
    return q

def evolve(best_fit1, best_fit2):
    global generation
    global best_brain
    global best_brain2
    mutations = []
    for i in range(total_brains // 2):
        cross_weights = model_crossover(best_fit1, best_fit2)
        mutation1 = model_mutate(cross_weights[0], 0.5)
        mutation2 = model_mutate(cross_weights[1], 0.5)
        mutations.append(mutation1)
        mutations.append(mutation2)
    for i in range(total_brains):
        brains[i].set_weights(mutations[i])
    generation += 1

def find_best_fit():
    fitness = np.loadtxt("fitness.txt")
    print(f"fitness average {np.mean(fitness)} in generation {generation}")
    print(f"fitness max is {np.max(fitness)} in generation {generation}")
    fitness_t.append(np.mean(fitness))
    maxfit1 = np.max(fitness)
    best_fit1 = np.where(fitness == maxfit1)[0]
    fitness[best_fit1] = 0
    maxfit2 = np.max(fitness)
    best_fit2 = np.where(fitness == maxfit2)[0]
    if len(best_fit1) > 1:  # band-aid for when several individuals have the same fitness,
                            # which would make best_fit1/best_fit2 arrays of indices
        best_fit1 = best_fit1[0]
    if len(best_fit2) > 1:
        best_fit2 = best_fit2[0]
    return int(best_fit1), int(best_fit2)

bf1, bf2 = find_best_fit()
evolve(bf1, bf2)
This is the code I'm using to assign the modified weights to the existing Keras models (mostly not mine; I don't understand it well enough to have written it myself).
If Keras is working the way I think it is, I don't see how this would converge to anything that does not maximize fitness; furthermore, the fitness even seems to be decreasing over time.
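One thing worth checking before anything else is whether weight diversity is really collapsing, independently of the predictions. A small diagnostic sketch (it assumes the brains list of Keras models from the code above) could look like this:
import numpy as np

def population_diversity(brains):
    # Flatten each network's weights into a single vector per individual
    flat = np.stack([np.concatenate([w.ravel() for w in b.get_weights()])
                     for b in brains])
    # Average distance of the individuals from the population centroid
    return float(np.mean(np.linalg.norm(flat - flat.mean(axis=0), axis=1)))

# e.g. print(population_diversity(brains)) once per generation, after evolve()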

What is the best way to resume training with a different learning rate in pytorch lightning

I want to train a model in two stages. The first is pre-training with teacher forcing, and the second is regular training (without teacher forcing). The difference is that the model is instantiated with use_teacher_forcing=True in the first case and use_teacher_forcing=False in the second.
To do so, I currently run two trainings, where the second training resumes from the first training's checkpoint by passing the last checkpoint to the Lightning Trainer.
Regarding the learning rate, I want to decay it over several milestones, both during pre-training and during regular training. For instance, if I use 5 epochs of pre-training and 5 epochs of regular training, I want the learning rate to be as follows:
epoch:  0     1     2     3     4     5     6     7     8     9
lr:     1e-4  1e-4  1e-5  1e-5  1e-6  1e-4  1e-4  1e-5  1e-5  1e-6
However, I cannot find a way to reset the learning rate to its initial value at the beginning of the regular training, since the scheduler is also loaded from the checkpoint.
Is there a way to do this?
I am using torch 1.9.0 and pytorch-lightning 1.3.8 and am not able to upgrade to later versions.
I came across the following solution.
Apparently, it's not that hard to implement and use a custom learning rate scheduler. I'll leave the code here in case anybody stumbles upon the same problem.
import warnings
from collections import Counter
from torch.optim.lr_scheduler import _LRScheduler

class MultiStepLRWithReset(_LRScheduler):
    def __init__(self, optimizer, milestones, reset_epochs, reset_lr_to=None, gamma=0.1, last_epoch=-1, verbose=False):
        full_milestones = list(milestones)  # copy, so the caller's list is not extended in place
        for reset_epoch in reset_epochs:
            full_milestones += [m + reset_epoch for m in milestones]
        self.milestones = Counter(full_milestones)
        self.reset_epochs = reset_epochs
        self.reset_lr_to = reset_lr_to
        self.gamma = gamma
        super(MultiStepLRWithReset, self).__init__(optimizer, last_epoch, verbose)

    def get_lr(self):
        if not self._get_lr_called_within_step:
            warnings.warn("To get the last learning rate computed by the scheduler, "
                          "please use `get_last_lr()`.", UserWarning)
        if self.last_epoch in self.reset_epochs:
            if self.reset_lr_to is None:
                return [group['initial_lr'] for group in self.optimizer.param_groups]
            else:
                return [self.reset_lr_to for _ in self.optimizer.param_groups]
        if self.last_epoch not in self.milestones:
            return [group['lr'] for group in self.optimizer.param_groups]
        return [group['lr'] * self.gamma ** self.milestones[self.last_epoch]
                for group in self.optimizer.param_groups]
You will have to create one LR scheduler that covers the entire training, because the scheduler will not be re-instantiated for the second training stage if all the PyTorch training components are loaded from their last checkpoint.
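For context, here is a minimal usage sketch inside the LightningModule; the optimizer choice, the base learning rate, and the milestone values are assumptions picked to reproduce the 10-epoch schedule from the question, not part of the original answer:
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
    scheduler = MultiStepLRWithReset(
        optimizer,
        milestones=[2, 4],   # decay points within each 5-epoch stage
        reset_epochs=[5],    # epoch at which regular training starts
        gamma=0.1,
    )
    # Lightning steps epoch-based schedulers once per epoch by default
    return [optimizer], [scheduler]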

Why does the learning rate change with the torch.optim.SGD method?

With SGD the learning rate should not change between epochs, but it does. Please help me understand why this happens and how to prevent the LR from changing.
import torch

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)

for epoch in range(5):
    print(scheduler.get_lr())
    scheduler.step()
Output is:
[0.9]
[0.7290000000000001]
[0.6561000000000001]
[0.5904900000000002]
[0.5314410000000002]
My torch version is 1.4.0
Since you are using torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9) (which actually means torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)), you are multiplying the learning rate by gamma=0.9 every step_size=1 step:
0.9 = 0.9
0.729 = 0.9*0.9*0.9
0.6561 = 0.9*0.9*0.9*0.9
0.59049 = 0.9*0.9*0.9*0.9*0.9
The only "strange" point is that 0.81 = 0.9*0.9 is missing at the second step (UPDATE: see Szymon Maszke's answer for an explanation).
To prevent the learning rate from decreasing too early: if you have N samples in your dataset and the batch size is D, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=N/D, gamma=0.9) to decrease it once per epoch. To decrease it every E epochs, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=E*N/D, gamma=0.9).
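As a concrete illustration of that arithmetic (the sample count and batch size below are made-up numbers, and it assumes scheduler.step() is called once per batch):
# N = 10000 samples, batch size D = 100  ->  100 scheduler steps per epoch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000 // 100, gamma=0.9)
# to decay every E = 3 epochs instead:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3 * (10000 // 100), gamma=0.9)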
This is just what torch.optim.lr_scheduler.StepLR is supposed to do. It changes the learning rate. From the pytorch documentation:
Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr
If you are trying to optimize params, your code should look more like this (just a toy example, the precise form of loss will depend on your application)
for epoch in range(5):
    optimizer.zero_grad()
    loss = (params[0]**2).sum()
    loss.backward()
    optimizer.step()
To expand upon xiawi's answer about the "strange" behavior (0.81 is missing): it has been PyTorch's default behavior since the 1.1.0 release; check the documentation, namely this part:
[...] If you use the learning rate scheduler (calling
scheduler.step()) before the optimizer’s update (calling
optimizer.step()), this will skip the first value of the learning rate
schedule.
Additionally, you should get a UserWarning from this function after the first get_lr() call, as you do not call optimizer.step() at all.
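Putting the two answers together, a corrected version of the toy loop (with the same toy loss as above) that actually performs optimizer updates, steps the scheduler after the optimizer, and prints the learning rate that was really applied could look like this:
import torch

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for epoch in range(5):
    optimizer.zero_grad()
    loss = (params[0] ** 2).sum()           # toy loss
    loss.backward()
    optimizer.step()                        # parameter update first
    print(optimizer.param_groups[0]['lr'])  # lr actually used during this epoch
    scheduler.step()                        # then decay for the next epoch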

How can I create a Keras Learning Rate Schedule that updates based upon batches rather than epochs

I'm working with Keras and trying to create a learning rate scheduler that schedules on the basis of the number of batches processed, instead of the number of epochs. To do this, I've inserted the scheduling code into the get_updates method of my Optimizer. For the most part, I've tried to use regular Python variables for values that remain constant during a given training run, and computational-graph nodes only for parameters that actually vary.
My two questions are:
Does the code below look like it should behave properly as a learning rate scheduler, if placed within the get_updates method of a Keras Optimizer?
How could one embed this code in a class similar to LearningRateScheduler, but which schedules based on the number of batches rather than the number of epochs?
# Copying graph node that stores original value of learning rate
lr = self.lr

# Checking whether learning rate schedule is to be used
if self.initial_lr_decay > 0:
    # this decay mimics exponential decay from
    # tensorflow/python/keras/optimizer_v2/exponential_decay

    # Get value of current number of processed batches from graph node
    # and convert to numeric value for use in K.pow()
    curr_batch = float(K.get_value(self.iterations))

    # Create graph node containing lr decay factor
    # Note: self.lr_decay_steps is a number, not a node
    #       self.lr_decay is a node, not a number
    decay_factor = K.pow(self.lr_decay, (curr_batch / self.lr_decay_steps))

    # Reassign lr to graph node formed by
    # product of graph node containing decay factor
    # and graph node containing original learning rate.
    lr = lr * decay_factor

# Get product of two numbers to calculate number of batches processed
# in warmup period
num_warmup_batches = self.steps_per_epoch_num * self.warmup_epochs

# Make comparisons between numbers to determine if we're in warmup period
if (self.warmup_epochs > 0) and (curr_batch < num_warmup_batches):
    # Create node with value of learning rate by multiplying a number
    # by a node, and then dividing by a number
    lr = (self.initial_lr *
          K.cast(self.iterations, K.floatx()) / curr_batch)
Easier than messing with the Keras source code (it's possible, but complex and fragile), you could use a callback.
import keras
from keras.callbacks import LambdaCallback

total_batches = 0

def what_to_do_when_batch_ends(batch, logs):
    global total_batches  # needed to modify the module-level counter
    total_batches += 1    # or use the "batch" variable,
                          # which is the batch index of the last finished batch

    # change learning rate at will
    if your_condition == True:
        keras.backend.set_value(model.optimizer.lr, newLrValueAsPythonFloat)
When training, use the callback:
lrUpdater = LambdaCallback(on_batch_end = what_to_do_when_batch_ends)
model.fit(........, callbacks = [lrUpdater, ...other callbacks...])
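To address the second question in the same spirit, the batch-driven logic can also be packaged as a small Callback subclass. This is only a rough sketch under the assumption that the exponential decay from the question is what you want; the class name and argument names are invented:
import keras.backend as K
from keras.callbacks import Callback

class BatchLearningRateScheduler(Callback):
    """Applies exponential decay to the learning rate after every batch."""
    def __init__(self, initial_lr, decay_rate, decay_steps):
        super(BatchLearningRateScheduler, self).__init__()
        self.initial_lr = initial_lr
        self.decay_rate = decay_rate
        self.decay_steps = decay_steps
        self.total_batches = 0

    def on_batch_end(self, batch, logs=None):
        self.total_batches += 1
        new_lr = self.initial_lr * self.decay_rate ** (self.total_batches / self.decay_steps)
        K.set_value(self.model.optimizer.lr, new_lr)

# usage: model.fit(..., callbacks=[BatchLearningRateScheduler(1e-3, 0.96, 1000)])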

How to apply random forest properly?

I am new to machine learning and Python. I am trying to apply a random forest to predict a binary target. My data has 24 predictors (1,000 observations), one of which is categorical (gender) and the rest numerical. Among the numerical ones there are two kinds of values: volumes of money in euros (very skewed and on a large scale) and counts (number of transactions at an ATM). I have transformed the large-scale features and done the imputation. Finally, I checked correlation and collinearity and removed some features based on that (as a result I had 24 features). Now, when I fit the random forest, it is always perfect on the training set, while the cross-validation scores are not nearly as good. And applying it to the test set gives very, very low recall values. How should I remedy this?
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.cross_validation import KFold  # older scikit-learn API matching this call signature
from sklearn.ensemble import RandomForestClassifier

def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])

    # Make predictions on training set:
    predictions = model.predict(data[predictors])

    # Print accuracy
    accuracy = metrics.accuracy_score(predictions, data[outcome])
    print("Accuracy : %s" % "{0:.3%}".format(accuracy))

    # Perform k-fold cross-validation with 5 folds
    kf = KFold(data.shape[0], n_folds=5)
    error = []
    for train, test in kf:
        # Filter training data
        train_predictors = (data[predictors].iloc[train, :])
        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]
        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record error from each cross-validation run
        error.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))
    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))

    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])

outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20)
predictor_var = train.drop('Sold', axis=1).columns.values
classification_model(model, train, predictor_var, outcome_var)

# Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)

outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20, max_depth=20, oob_score=True)
predictor_var = ['fet1', 'fet2', 'fet3', 'fet4']
classification_model(model, train, predictor_var, outcome_var)
With a random forest it is very easy to overfit. To resolve this, you need to search the hyperparameters a little more rigorously to find the best ones to use. Here is a link on how to do this (from the scikit-learn docs): http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html
Your model is overfitting, and you need to search for the parameters that work best. The link provides implementations of grid search and randomized search for hyperparameter estimation.
It is also worth going through this MIT Artificial Intelligence lecture to get a deeper theoretical orientation: https://www.youtube.com/watch?v=UHBmv7qCey4&t=318s
Hope this helps!
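For a rough idea of what that randomized search could look like for this problem, here is a sketch using the current scikit-learn model_selection API; the parameter ranges below are illustrative examples rather than recommendations, and scoring='recall' reflects the recall issue mentioned in the question:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [100, 200, 500],
    'max_depth': [5, 10, 20, None],
    'min_samples_leaf': [1, 5, 10],
    'max_features': ['sqrt', 'log2'],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=20,
    cv=5,
    scoring='recall',
    random_state=0,
)
search.fit(train[predictor_var], train[outcome_var])
print(search.best_params_)
print(search.best_score_)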
