I am using Keras with TensorFlow backend, and I want to record the individual losses that are calculated during back propagation, for every training sample. This could be done by printing out each loss to the terminal, when it is calculated using the loss function.
But from what I have seen, there is no way to do this using the Keras API. So, my solution is to override one of the Keras loss functions, e.g.:
def mean_squared_error(y_true, y_pred):
loss = K.mean(K.square(y_pred - y_true), axis=-1)
print('Loss = ' + str(loss))
return loss
This compiles ok; however, nothing is printed to my terminal.
Any suggestions on why nothing is being printed, or what a better solution to this might be?
You can use keras.backend.print_tensor, which is just an identity transform that has the side-effect of printing the value of the tensor, and optionally a message. For your example, you can try:
import keras.backend as K
def mean_squared_error(y_true, y_pred):
loss = K.mean(K.square(y_pred - y_true), axis=-1)
return K.print_tensor(loss, message='Loss: ')
See the documentation for print_tensor and this answer for another example.
Related
I try to write a cross entropy loss function by myself. My loss function gives the same loss value as the official one, but when i use my loss function in the code instead of official cross entropy loss function, the code does not converge. When i use the official cross entropy loss function, the code converges. Here is my code, please give me some suggestions. Thanks very much
The input 'out' is a tensor (B*C) and 'label' contains class indices (1 * B)
class MylossFunc(nn.Module):
def __init__(self):
super(MylossFunc, self).__init__()
def forward(self, out, label):
out = torch.nn.functional.softmax(out, dim=1)
n = len(label)
loss = torch.FloatTensor([0])
loss = Variable(loss, requires_grad=True)
tmp = torch.log(out)
#print(out)
torch.scalar_tensor(-100)
for i in range(n):
loss = loss - torch.max(tmp[i][label[i]], torch.scalar_tensor(-100) )/n
loss = torch.sum(loss)
return loss
Instead of using torch.softmax and torch.log, you should use torch.log_softmax, otherwise your training will become unstable with nan values everywhere.
This happens because when you take the softmax of your logits using the following line:
out = torch.nn.functional.softmax(out, dim=1)
you might get a zero in one of the components of out, and when you follow that by applying torch.log it will result in nan (since log(0) is undefined). That is why torch (and other common libraries) provide a single stable operation, log_softmax, to avoid the numerical instabilities that occur when you use torch.softmax and torch.log individually.
i'm trying to define the loss function of a two-class classification problem. However, the target label is not hard label 0,1, but a float number between 0~1.
torch.nn.CrossEntropy in Pytorch do not support soft label so i'm trying to write a cross entropy function by my self.
My function looks like this
def cross_entropy(self, pred, target):
loss = -torch.mean(torch.sum(target.flatten() * torch.log(pred.flatten())))
return loss
def step(self, batch: Any):
x, y = batch
logits = self.forward(x)
loss = self.criterion(logits, y)
preds = logits
# torch.argmax(logits, dim=1)
return loss, preds, y
however it does not work at all.
Can anyone give me a suggestion is there any mistake in my loss function?
It seems like BCELoss and the robust version BCEWithLogitsLoss are working with fuzzy targets "out of the box". They do not expect target to be binary" any number between zero and one is fine.
Please read the doc.
I have to use an adaptive custom loss function that takes an additional dynamic argument (eps) in keras. The argument eps is a scalar but changes from one sample to the other : the loss function should be therefore adapted during training. I use a generator and I can pass this argument through every call of the generator during training (generator_train[2]). Based on answers to similar questions I tried to write the following wrapping:
def custom_loss(eps):
def square_err(y_true, y_pred):
nom = K.sum(K.square(y_pred - y_true), axis=-1)
denom = eps**2
loss = nom/denom
return loss
return square_err
But I am struggling with implementing it since eps is a dynamic variable: I don't know how I should pass this argument to the loss function during training (model.fit). Here is a simple version of my model:
model = keras.Sequential()
model.add(layers.LSTM(units=32, input_shape=(32, 4))
model.add(layers.Dense(units=1))
model.add_loss(custom_loss)
opt = keras.optimizers.Adam()
model.compile(optimizer=opt)
history = model.fit(x=generator_train[0], y=generator_train[1],
steps_per_epoch=100
epochs=50,
validation_data=gen_vl,
validation_steps=n_vl)
Your help would be very appreciated.
Simply pass "sample weights", which will be 1/(eps**2) for each sample.
Your generator should just output x, y, sample_weights and that's all.
Your loss can be:
def loss(y_true, y_pred):
return K.sum(K.square(y_pred - y_true), axis=-1)
In fit, you cannot use indexing in the generator, you will pass just generator_train, no x, no y, just generator_train.
Cross posting from Pytorch discussion boards
I want to train a network using a modified loss function that has both a typical classification loss (e.g. nn.CrossEntropyLoss) as well as a penalty on the Frobenius norm of the end-to-end Jacobian (i.e. if f(x) is the output of the network, \nabla_x f(x)).
I’ve implemented a model that can successfully learn using nn.CrossEntropyLoss. However, when I try adding the second loss function (by doing two backwards passes), my training loop runs, but the model never learns. Furthermore, if I calculate the end-to-end Jacobian, but don’t include it in the loss function, the model also never learns. At a high level, my code does the following:
Forward pass to get predicted classes, yhat, from inputs x
Call yhat.backward(torch.ones(appropriate shape), retain_graph=True)
Jacobian norm = x.grad.data.norm(2)
Set loss equal to classification loss + scalar coefficient * jacobian norm
Run loss.backward()
I suspect that I’m misunderstanding how backward() works when run twice, but I haven’t been able to find any good resources to clarify this.
Too much is required to produce a working example, so I’ve tried to extract the relevant code:
def train_model(model, train_dataloader, optimizer, loss_fn, device=None):
if device is None:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.train()
train_loss = 0
correct = 0
for batch_idx, (batch_input, batch_target) in enumerate(train_dataloader):
batch_input, batch_target = batch_input.to(device), batch_target.to(device)
optimizer.zero_grad()
batch_input.requires_grad_(True)
model_batch_output = model(batch_input)
loss = loss_fn(model_output=model_batch_output, model_input=batch_input, model=model, target=batch_target)
train_loss += loss.item() # sum up batch loss
loss.backward()
optimizer.step()
and
def end_to_end_jacobian_loss(model_output, model_input):
model_output.backward(
torch.ones(*model_output.shape),
retain_graph=True)
jacobian = model_input.grad.data
jacobian_norm = jacobian.norm(2)
return jacobian_norm
Edit 1: I swapped my previous implementation with .backward() to autograd.grad and it apparently works! What's the difference?
def end_to_end_jacobian_loss(model_output, model_input):
jacobian = autograd.grad(
outputs=model_output['penultimate_layer'],
inputs=model_input,
grad_outputs=torch.ones(*model_output['penultimate_layer'].shape),
retain_graph=True,
only_inputs=True)[0]
jacobian_norm = jacobian.norm(2)
return jacobian_norm
I want to implement a two-step learning process where:
pre-train a model for a few epochs using the loss function loss_1
change the loss function to loss_2 and continue the training for fine-tuning
Currently, my approach is:
model.compile(optimizer=opt, loss=loss_1, metrics=['accuracy'])
model.fit_generator(…)
model.compile(optimizer=opt, loss=loss_2, metrics=['accuracy'])
model.fit_generator(…)
Note that the optimizer remains the same, and only the loss function changes. I'd like to smoothly continue training, but with a different loss function. According to this post, re-compiling the model loses the optimizer state. Questions:
a) Will I lose the optimizer state even if I use the same optimizer, eg Adam?
b) if the answer to a) is yes, any suggestions on how to change the loss function to a new one without reseting the optimizer state?
EDIT:
As suggested by Simon Caby and based on this thread, I created a custom loss function with two loss computations that depend on epoch number. However, it does not work for me. My approach:
def loss_wrapper(t_change, current_epoch):
def custom_loss(y_true, y_pred):
c_epoch = K.get_value(current_epoch)
if c_epoch < t_change:
# compute loss_1
else:
# compute loss_2
return custom_loss
And I compile as follows, after initializing current_epoch:
current_epoch = K.variable(0.)
model.compile(optimizer=opt, loss=loss_wrapper(5, current_epoch), metrics=...)
To update the current_epoch, I create the following callback:
class NewCallback(Callback):
def __init__(self, current_epoch):
self.current_epoch = current_epoch
def on_epoch_end(self, epoch, logs={}):
K.set_value(self.current_epoch, epoch)
model.fit_generator(..., callbacks=[NewCallback(current_epoch)])
The callback updates self.current_epoch every epoch correctly. But the update does not reach the custom loss function. Instead, current_epoch keeps the initialization value forever, and loss_2 is never executed.
Any suggestion is welcome, thanks!
My answers :
a) yes, and you should probably make your own learning rate scheduler in order to keep control of it :
keras.callbacks.LearningRateScheduler(schedule, verbose=0)
b) yes you can create your own loss function, including one that flutuates between two different loss methods. see : "Advanced Keras — Constructing Complex Custom Losses and Metrics"
https://towardsdatascience.com/advanced-keras-constructing-complex-custom-losses-and-metrics-c07ca130a618
If you change:
def loss_wrapper(t_change, current_epoch):
def custom_loss(y_true, y_pred):
c_epoch = K.get_value(current_epoch)
if c_epoch < t_change:
# compute loss_1
else:
# compute loss_2
return custom_loss
to:
def loss_wrapper(t_change, current_epoch):
def custom_loss(y_true, y_pred):
# compute loss_1 and loss_2
bool_case_1=K.less(current_epoch,t_change)
num_case_1=K.cast(bool_case_1,"float32")
loss = (num_case_1)*loss_1 + (1-num_case_1)*loss_2
return loss
return custom_loss
it works.
We are essentially required to turn python code into compositions of backend functions for the loss to work without having to update in a re-compile of model.compile(...). I am not satisfied with these hacks, and wish it was possible to set model.loss in a callback without re-compiling model.compile(...) after (since then the optimizer states are reset).