Resume training with different loss function - keras

I want to implement a two-step learning process:
1. Pre-train a model for a few epochs using the loss function loss_1.
2. Change the loss function to loss_2 and continue training for fine-tuning.
Currently, my approach is:
model.compile(optimizer=opt, loss=loss_1, metrics=['accuracy'])
model.fit_generator(…)
model.compile(optimizer=opt, loss=loss_2, metrics=['accuracy'])
model.fit_generator(…)
Note that the optimizer remains the same, and only the loss function changes. I'd like to smoothly continue training, but with a different loss function. According to this post, re-compiling the model loses the optimizer state. Questions:
a) Will I lose the optimizer state even if I use the same optimizer, e.g. Adam?
b) If the answer to a) is yes, any suggestions on how to switch to a new loss function without resetting the optimizer state?
EDIT:
As suggested by Simon Caby and based on this thread, I created a custom loss function with two loss computations that depend on epoch number. However, it does not work for me. My approach:
def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        c_epoch = K.get_value(current_epoch)
        if c_epoch < t_change:
            # compute loss_1
        else:
            # compute loss_2
    return custom_loss
And I compile as follows, after initializing current_epoch:
current_epoch = K.variable(0.)
model.compile(optimizer=opt, loss=loss_wrapper(5, current_epoch), metrics=...)
To update the current_epoch, I create the following callback:
class NewCallback(Callback):
    def __init__(self, current_epoch):
        super().__init__()
        self.current_epoch = current_epoch
    def on_epoch_end(self, epoch, logs=None):
        K.set_value(self.current_epoch, epoch)
model.fit_generator(..., callbacks=[NewCallback(current_epoch)])
The callback updates self.current_epoch every epoch correctly. But the update does not reach the custom loss function. Instead, current_epoch keeps the initialization value forever, and loss_2 is never executed.
Any suggestion is welcome, thanks!

My answers:
a) Yes, and you should probably write your own learning rate scheduler to keep control of the learning rate:
keras.callbacks.LearningRateScheduler(schedule, verbose=0)
b) Yes, you can create your own loss function, including one that fluctuates between two different loss methods. See "Advanced Keras — Constructing Complex Custom Losses and Metrics":
https://towardsdatascience.com/advanced-keras-constructing-complex-custom-losses-and-metrics-c07ca130a618
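For answer a), the schedule argument is a plain Python function that maps the epoch index (and, in recent Keras versions, the current learning rate) to the learning rate for that epoch. A minimal sketch, assuming a hypothetical switch epoch T_CHANGE and made-up rates; none of these values come from the question:

```python
T_CHANGE = 5          # hypothetical epoch at which loss_2 takes over
PRETRAIN_LR = 1e-3    # assumed rate for the loss_1 phase
FINETUNE_LR = 1e-4    # assumed rate for the loss_2 phase


def schedule(epoch, lr=None):
    """Return the learning rate for a 0-indexed epoch.

    `lr` (the current rate) is accepted but ignored, so the function fits
    both the one-argument and the two-argument schedule signature.
    """
    return PRETRAIN_LR if epoch < T_CHANGE else FINETUNE_LR


# Assumed usage, not executed here:
# lr_callback = keras.callbacks.LearningRateScheduler(schedule, verbose=0)
```

Because the rate is derived from the epoch index alone, it does not depend on whatever state the optimizer object holds.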

If you change:
def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        c_epoch = K.get_value(current_epoch)
        if c_epoch < t_change:
            # compute loss_1
        else:
            # compute loss_2
    return custom_loss
to:
def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        # compute loss_1 and loss_2
        bool_case_1 = K.less(current_epoch, t_change)
        num_case_1 = K.cast(bool_case_1, "float32")
        loss = num_case_1 * loss_1 + (1 - num_case_1) * loss_2
        return loss
    return custom_loss
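it works.
The original version fails because, in graph mode, K.get_value(current_epoch) and the Python if statement are evaluated only once, when the loss graph is built, so the branch taken at that moment is baked in. We essentially have to express the Python logic as a composition of backend functions so that the switch happens inside the graph and no second model.compile(...) call is needed. I am not satisfied with these hacks, and wish it were possible to set model.loss in a callback without calling model.compile(...) again (since that resets the optimizer state).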
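The arithmetic behind the working version can be checked without any framework. The sketch below is plain Python rather than Keras backend code: float(current_epoch < t_change) stands in for K.cast(K.less(...), "float32"), and the per-sample loss lists are made-up numbers.

```python
def blend(loss_1, loss_2, current_epoch, t_change):
    """Select loss_1 before t_change and loss_2 from t_change on, via a 0/1 mask."""
    mask = float(current_epoch < t_change)  # 1.0 while pre-training, else 0.0
    return [mask * a + (1.0 - mask) * b for a, b in zip(loss_1, loss_2)]


loss_1 = [0.25, 0.5]   # hypothetical per-sample values of the first loss
loss_2 = [2.0, 4.0]    # hypothetical per-sample values of the second loss

print(blend(loss_1, loss_2, current_epoch=3, t_change=5))  # [0.25, 0.5]
print(blend(loss_1, loss_2, current_epoch=5, t_change=5))  # [2.0, 4.0]
```

Both losses are evaluated on every batch, so the inactive one still costs compute, and if it ever produces inf or nan, multiplying it by a zero mask gives nan rather than 0.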

Related

Why does my cross entropy loss function not converge?

I tried to write a cross entropy loss function myself. My loss function gives the same loss value as the official one, but when I use it in place of the official cross entropy loss function, training does not converge. With the official cross entropy loss function, training converges. Here is my code; please give me some suggestions. Thanks very much.
The input 'out' is a tensor (B*C) and 'label' contains class indices (1 * B)
class MylossFunc(nn.Module):
    def __init__(self):
        super(MylossFunc, self).__init__()
    def forward(self, out, label):
        out = torch.nn.functional.softmax(out, dim=1)
        n = len(label)
        loss = torch.FloatTensor([0])
        loss = Variable(loss, requires_grad=True)
        tmp = torch.log(out)
        #print(out)
        torch.scalar_tensor(-100)
        for i in range(n):
            loss = loss - torch.max(tmp[i][label[i]], torch.scalar_tensor(-100)) / n
        loss = torch.sum(loss)
        return loss
Instead of using torch.softmax followed by torch.log, you should use torch.log_softmax; otherwise your training will become unstable, with nan values everywhere.
This happens because when you take the softmax of your logits using the following line:
out = torch.nn.functional.softmax(out, dim=1)
you might get a zero in one of the components of out, and applying torch.log to it then yields -inf (log(0) is undefined), which can turn into nan once it propagates through the loss and its gradients. That is why torch (and other common libraries) provide a single stable operation, log_softmax, to avoid the numerical instabilities that occur when you use torch.softmax and torch.log separately.
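The effect can be reproduced in plain Python. The sketch below is not the torch implementation; it only contrasts the two-step computation with the log-sum-exp form that a fused log-softmax is usually based on, using made-up and deliberately extreme logits.

```python
import math


def naive_log_softmax(logits):
    """log(softmax(x)) computed in two separate steps."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    # math.log raises ValueError on 0.0; tensor libraries return -inf instead.
    return [math.log(e / total) for e in exps]


def stable_log_softmax(logits):
    """x_i - max(x) - log(sum_j exp(x_j - max(x))), the log-sum-exp form."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    return [v - m - log_sum for v in logits]


logits = [0.0, -1000.0]  # exp(-1000.0) underflows to 0.0

print(stable_log_softmax(logits))  # [0.0, -1000.0]

try:
    naive_log_softmax(logits)
except ValueError:
    print("naive version hit log(0)")
```

The stable form never takes the logarithm of an individual probability, so an underflowing exp only removes a negligible term from the sum.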

Custom adaptive loss function with additional dynamic argument in Keras

I have to use an adaptive custom loss function that takes an additional dynamic argument (eps) in Keras. The argument eps is a scalar but changes from one sample to the next, so the loss function has to adapt during training. I use a generator and I can pass this argument through every call of the generator during training (generator_train[2]). Based on answers to similar questions I tried to write the following wrapper:
def custom_loss(eps):
    def square_err(y_true, y_pred):
        nom = K.sum(K.square(y_pred - y_true), axis=-1)
        denom = eps**2
        loss = nom/denom
        return loss
    return square_err
But I am struggling with implementing it since eps is a dynamic variable: I don't know how I should pass this argument to the loss function during training (model.fit). Here is a simple version of my model:
model = keras.Sequential()
model.add(layers.LSTM(units=32, input_shape=(32, 4)))
model.add(layers.Dense(units=1))
model.add_loss(custom_loss)
opt = keras.optimizers.Adam()
model.compile(optimizer=opt)
history = model.fit(x=generator_train[0], y=generator_train[1],
                    steps_per_epoch=100,
                    epochs=50,
                    validation_data=gen_vl,
                    validation_steps=n_vl)
Your help would be very appreciated.
Simply pass "sample weights", which will be 1/(eps**2) for each sample.
Your generator should just output x, y, sample_weights and that's all.
Your loss can be:
def loss(y_true, y_pred):
    return K.sum(K.square(y_pred - y_true), axis=-1)
In fit, you cannot index into the generator; pass generator_train itself, with no separate x or y arguments.
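A sketch of the generator side of this answer in plain Python. It assumes a hypothetical source of (x, y, eps) batches in which eps holds one value per sample; with Keras the three elements would normally be NumPy arrays rather than lists.

```python
def with_sample_weights(batches):
    """Turn (x, y, eps) batches into (x, y, sample_weight) batches.

    The weight 1 / eps**2 reproduces the division by eps**2 in the
    original square_err, applied per sample by the framework.
    """
    for x, y, eps in batches:
        yield x, y, [1.0 / e ** 2 for e in eps]


# Made-up batch: two samples with eps = 0.5 and eps = 2.0.
batches = [([[1.0], [2.0]], [1.0, 0.0], [0.5, 2.0])]
x, y, w = next(with_sample_weights(batches))
print(w)  # [4.0, 0.25]
```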

Custom backward/optimization steps in pytorch-lightning

I would like to implement the training loop below in pytorch-lightning (to be read as pseudo-code). The peculiarity is that the backward and optimization steps are not performed for every batch.
(Background: I am trying to implement a few-shot learning algorithm; although I need to make predictions at every step (the forward method), I need to perform the gradient updates at random (the if block).)
for batch in batches:
    x, y = batch
    loss = forward(x,y)
    optimizer.zero_grad()
    if np.random.rand() > 0.5:
        loss.backward()
        optimizer.step()
My proposed solution entails implementing the backward and the optimizer_step methods as follows:
def backward(self, use_amp, loss, optimizer):
    self.compute_grads = False
    if np.random.rand() > 0.5:
        loss.backward()
        nn.utils.clip_grad_value_(self.enc.parameters(), 1)
        nn.utils.clip_grad_value_(self.dec.parameters(), 1)
        self.compute_grads = True
    return
def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_i, second_order_closure=None):
    if self.compute_grads:
        optimizer.step()
        optimizer.zero_grad()
    return
Note: In this way I need to store a compute_grads attribute at the class level.
What is the "best-practice" way to implement it in pytorch-lightning? Is there a better way to use the hooks?
This is a good way to do it! That's what the hooks are for.
There is a new Callbacks module that might also be helpful:
https://pytorch-lightning.readthedocs.io/en/0.7.1/callbacks.html

Pytorch: Custom Loss involving Norm of End-to-End Jacobian

Cross posting from Pytorch discussion boards
I want to train a network using a modified loss function that has both a typical classification loss (e.g. nn.CrossEntropyLoss) as well as a penalty on the Frobenius norm of the end-to-end Jacobian (i.e. if f(x) is the output of the network, \nabla_x f(x)).
I’ve implemented a model that can successfully learn using nn.CrossEntropyLoss. However, when I try adding the second loss function (by doing two backwards passes), my training loop runs, but the model never learns. Furthermore, if I calculate the end-to-end Jacobian, but don’t include it in the loss function, the model also never learns. At a high level, my code does the following:
1. Forward pass to get predicted classes, yhat, from inputs x
2. Call yhat.backward(torch.ones(appropriate shape), retain_graph=True)
3. Jacobian norm = x.grad.data.norm(2)
4. Set loss equal to classification loss + scalar coefficient * jacobian norm
5. Run loss.backward()
I suspect that I’m misunderstanding how backward() works when run twice, but I haven’t been able to find any good resources to clarify this.
Too much is required to produce a working example, so I’ve tried to extract the relevant code:
def train_model(model, train_dataloader, optimizer, loss_fn, device=None):
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.train()
    train_loss = 0
    correct = 0
    for batch_idx, (batch_input, batch_target) in enumerate(train_dataloader):
        batch_input, batch_target = batch_input.to(device), batch_target.to(device)
        optimizer.zero_grad()
        batch_input.requires_grad_(True)
        model_batch_output = model(batch_input)
        loss = loss_fn(model_output=model_batch_output, model_input=batch_input, model=model, target=batch_target)
        train_loss += loss.item()  # sum up batch loss
        loss.backward()
        optimizer.step()
and
def end_to_end_jacobian_loss(model_output, model_input):
    model_output.backward(
        torch.ones(*model_output.shape),
        retain_graph=True)
    jacobian = model_input.grad.data
    jacobian_norm = jacobian.norm(2)
    return jacobian_norm
Edit 1: I swapped my previous implementation with .backward() to autograd.grad and it apparently works! What's the difference?
def end_to_end_jacobian_loss(model_output, model_input):
    jacobian = autograd.grad(
        outputs=model_output['penultimate_layer'],
        inputs=model_input,
        grad_outputs=torch.ones(*model_output['penultimate_layer'].shape),
        retain_graph=True,
        only_inputs=True)[0]
    jacobian_norm = jacobian.norm(2)
    return jacobian_norm
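On the question in Edit 1, one difference is well defined: Tensor.backward() accumulates gradients into the .grad attribute of every leaf that requires grad, including the model parameters, whereas torch.autograd.grad returns the requested gradients and leaves .grad untouched. With the .backward() version, the parameter gradients left behind by the Jacobian pass are still present when loss.backward() runs, and optimizer.step() then uses the sum; that would be consistent with the observation that merely computing the Jacobian stopped the model from learning. Separately, .grad.data (and autograd.grad without create_graph=True) yields a tensor detached from the graph, so the penalty contributes no gradient of its own. A minimal sketch with a made-up one-layer model, where create_graph=True is an assumption about what the penalty is meant to do:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 2)                     # made-up stand-in for the real network
x = torch.randn(4, 3, requires_grad=True)   # made-up input batch

# Variant 1: backward() accumulates into .grad of the input AND the parameters.
out = model(x)
out.backward(torch.ones_like(out), retain_graph=True)
leftover = model.weight.grad.clone()        # would be added to the loss gradient
print(leftover.abs().sum().item() > 0)

# Variant 2: autograd.grad hands the gradient back and leaves .grad alone.
model.zero_grad()
x.grad = None
out = model(x)
(jacobian,) = torch.autograd.grad(
    outputs=out,
    inputs=x,
    grad_outputs=torch.ones_like(out),
    create_graph=True,  # assumption: the penalty should be differentiable
)
grad_after = model.weight.grad              # None or zeros, depending on version
untouched = grad_after is None or bool((grad_after == 0).all())
print(untouched)

penalty = jacobian.norm(2)
penalty.backward()                          # the penalty now reaches the weights
print(model.weight.grad.abs().sum().item() > 0)
```

Note that with grad_outputs set to a tensor of ones, both variants compute the gradient of the summed outputs with respect to the input, which is a vector-Jacobian product rather than the full Jacobian.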

Getting loss for individual training samples

I am using Keras with TensorFlow backend, and I want to record the individual losses that are calculated during back propagation, for every training sample. This could be done by printing out each loss to the terminal, when it is calculated using the loss function.
But from what I have seen, there is no way to do this using the Keras API. So, my solution is to override one of the Keras loss functions, e.g.:
def mean_squared_error(y_true, y_pred):
    loss = K.mean(K.square(y_pred - y_true), axis=-1)
    print('Loss = ' + str(loss))
    return loss
This compiles ok; however, nothing is printed to my terminal.
Any suggestions on why nothing is being printed, or what a better solution to this might be?
You can use keras.backend.print_tensor, which is just an identity transform that has the side-effect of printing the value of the tensor, and optionally a message. For your example, you can try:
import keras.backend as K
def mean_squared_error(y_true, y_pred):
    loss = K.mean(K.square(y_pred - y_true), axis=-1)
    return K.print_tensor(loss, message='Loss: ')
See the documentation for print_tensor and this answer for another example.
