When using a batch size of 64, I need the loss value for every single data point, not just the aggregated batch loss.
I know I can pass reduction='none' when creating the loss function object to get per-sample loss values. But I'd prefer to keep a regular loss object without setting reduction='none', for consistency with the rest of my code.
Is there any way to get per-sample loss values without reduction='none'?
Why don't you wrap the function with your predefined options?
def custom_loss(*args, **kwargs):
    return some_builtin(*args, **kwargs, reduction='none')
Where some_builtin would be a builtin PyTorch loss function, e.g. torch.nn.functional.l1_loss, torch.nn.functional.mse_loss, ...
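For instance, a minimal sketch with torch.nn.functional.mse_loss as the builtin (the tensor shapes here are just for illustration):

import torch
import torch.nn.functional as F

def custom_loss(*args, **kwargs):
    # same call signature as the builtin, but always returns per-element losses
    return F.mse_loss(*args, **kwargs, reduction='none')

pred = torch.randn(64, 10)
target = torch.randn(64, 10)
per_element = custom_loss(pred, target)   # shape (64, 10)
per_sample = per_element.mean(dim=1)      # one loss value per data point
batch_loss = per_sample.mean()            # same scalar the default reduction would give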
Related
In order to mimick a larger batch size, I want to be able to accumulate gradients every N batches for a model in PyTorch, like:
def train(model, optimizer, dataloader, num_epochs, N):
    for epoch_num in range(1, num_epochs + 1):
        for batch_num, data in enumerate(dataloader):
            ims = data.to('cuda:0')
            loss = model(ims)
            loss.backward()
            if batch_num % N == 0:
                optimizer.step()
                optimizer.zero_grad(set_to_none=True)
For this approach do I need to add the flag retain_graph=True, i.e.
loss.backward(retain_graph=True)
In this setup, are the gradients from each backward call simply summed per parameter?
You need to set retain_graph=True if you want to make multiple backward passes over the same computational graph, making use of the intermediate results from a single forward pass. This would have been the case, for instance, if you called loss.backward() multiple times after computing loss once, or if you had multiple losses from different parts of the graph to backpropagate from (a good explanation can be found here).
In your case, for each forward pass, you backpropagate exactly once. So you don't need to store the intermediate results from the computational graph once the gradients are computed.
In short:
Intermediate outputs in the graph are cleared after a backward pass, unless explicitly preserved using retain_graph=True.
Gradients accumulate by default, unless explicitly cleared using zero_grad.
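To illustrate the second point, here is a minimal sketch (toy linear model, arbitrary shapes) showing that a second backward call adds onto the stored gradients instead of replacing them, and that no retain_graph is needed because each backward follows its own forward pass:

import torch

model = torch.nn.Linear(2, 1)
x = torch.randn(4, 2)

loss1 = model(x).sum()
loss1.backward()                      # one backward per forward: no retain_graph needed
g1 = model.weight.grad.clone()

loss2 = model(x).sum()                # a fresh forward pass builds a fresh graph
loss2.backward()
g2 = model.weight.grad.clone()

print(torch.allclose(g2, 2 * g1))     # True: the second pass accumulated onto the first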
I have a script that performs a Gatys-like neural style transfer. It uses a style loss and a total variation loss. I'm using GradientTape() to compute my gradients. The losses that I have implemented seem to work fine, but a new loss that I added isn't being properly accounted for by the GradientTape(). I'm using TensorFlow with eager execution enabled.
I suspect it has something to do with how I compute the loss based on the input variable. The input is a 4D tensor (batch, h, w, channels). At the most basic level, the input is a floating-point image, and in order to compute this new loss I need to convert it to a binary image to compute the ratio of one pixel color to another. I don't want to actually change the image like that during every iteration, so I just make a copy of the tensor (in numpy form) and operate on that to compute the loss. I do not fully understand the limitations of the GradientTape, but I believe it "loses the thread" of how the input variable leads to the loss once it's converted to a numpy array.
Could I make a copy of the image tensor and perform the binarizing operations and the loss computation on that copy? Or am I asking TensorFlow to do something that it just cannot do?
My new loss function:
def compute_loss(self, **kwargs):
    loss = 0
    image = self.model.deprocess_image(kwargs['image'].numpy())
    binarized_image = self.image_decoder.binarize_image(image)
    volume_fraction = self.compute_volume_fraction(binarized_image)
    loss = np.abs(self.volume_fraction_target - volume_fraction)
    return loss
My implementation using the GradientTape:
def compute_grads_and_losses(self, style_transfer_state):
    """
    Computes gradients with respect to input image
    """
    with tf.GradientTape() as tape:
        loss = self.loss_evaluator.compute_total_loss(style_transfer_state)
        total_loss = loss['total_loss']
    return tape.gradient(total_loss, style_transfer_state['image']), loss
An example that I believe might illustrate my confusion. The strangest thing is that my code doesn't have any problem running; it just doesn't seem to minimize the new loss term whatsoever. But this example won't even run due to an attribute error: AttributeError: 'numpy.float64' object has no attribute '_id'.
Example:
import tensorflow.contrib.eager as tfe
import tensorflow as tf

def compute_square_of_value(x):
    a = turn_to_numpy(x['x'])
    return a**2

def turn_to_numpy(arg):
    return arg.numpy()  # just return arg to eliminate the error

tf.enable_eager_execution()

x = tfe.Variable(3.0, dtype=tf.float32)
data_dict = {'x': x}
with tf.GradientTape() as tape:
    tape.watch(x)
    y = compute_square_of_value(data_dict)
dy_dx = tape.gradient(y, x)  # Will compute to 6.0
print(dy_dx)
Edit:
From my current understanding, the issue is that my use of the .numpy() operation makes the GradientTape lose track of the variable it needs to compute the gradient from. My original reason for doing this is that my loss computation requires me to physically change values of the tensor, and I don't want to actually change the values of the tensor that is being optimized, hence the numpy() copy to work on when computing the loss. Is there any way around this? Or should I consider my loss calculation impossible to implement because of this constraint of having to perform essentially non-reversible operations on the input tensor?
The first issue here is that GradientTape only traces operations on tf.Tensor objects. When you call tensor.numpy() the operations executed there fall outside the tape.
The second issue is that your first example never calls tape.watch on the image you want to differentiate with respect to.
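As a hedged sketch, here is the toy example from the question rewritten so the computation stays on tensors the tape can trace (same TF 1.x eager API as in the question; no .numpy() conversion inside the taped function):

import tensorflow.contrib.eager as tfe
import tensorflow as tf

tf.enable_eager_execution()

def compute_square_of_value(x):
    # the result stays a tf.Tensor, so the tape can trace the operation
    return x['x'] ** 2

x = tfe.Variable(3.0, dtype=tf.float32)
data_dict = {'x': x}

with tf.GradientTape() as tape:
    tape.watch(x)
    y = compute_square_of_value(data_dict)

dy_dx = tape.gradient(y, x)
print(dy_dx)  # 6.0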
For example, I feed a set of images into a CNN, and the default weight of each image is 1. How can I re-weight some of these images so that they have different weights? Can DataLoader achieve this in PyTorch?
I learned two other possibilities:
Defining a custom loss function, providing weights for each sample as I require.
Repeating samples in the training set, which will result in more frequent samples having a higher weight in the final loss.
Is there any other way we can achieve this? Any suggestion would be appreciated.
I can think of two ways to achieve this.
Pass the weights explicitly when you backpropagate the gradients.
After you have computed the loss and are about to backpropagate, you can pass a Tensor to backward(), and all the subsequent gradients will be scaled by the corresponding elements, i.e. do something like
loss = torch.pow(out - target, 2)
loss.backward(my_weights)  # my_weights is a Tensor with the same shape as loss
However, if you want to assign individual weights within a batch, you can't use the built-in loss functions from the nn module with their default reduction, since they aggregate the loss over all samples in the batch.
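Putting these pieces together, a runnable sketch of the first approach (the model, shapes, and my_weights are illustrative assumptions):

import torch

model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.randn(8, 1)
my_weights = torch.rand(8, 1)       # one weight per sample, same shape as the loss tensor

out = model(x)
loss = torch.pow(out - target, 2)   # element-wise loss, not aggregated over the batch
loss.backward(my_weights)           # each sample's gradient contribution is scaled by its weight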
Use torch.utils.data.sampler.WeightedRandomSampler
If you use PyTorch's torch.utils.data anyway, this is simpler than duplicating samples in your training set. However, it doesn't apply exact weights, since the sampling is stochastic. But if you iterate over your training set enough times, it's probably close enough.
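A hedged sketch of this option (the toy dataset, weights, and batch size are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# toy stand-in for an image dataset
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 2, (100,))
dataset = TensorDataset(images, labels)

# one sampling weight per sample; a higher weight means the sample is drawn more often
sample_weights = torch.ones(100)
sample_weights[:10] = 5.0   # e.g. up-weight the first ten images

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)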
I'm writing a custom loss function in Keras and just tripped over the following:
Why do Keras loss functions have to return one scalar per batch item rather than just one scalar?
I care about the cumulative loss for the whole batch, not about the loss per item, don't I?
I think I figured it out: fit() has an argument sample_weight with which you can assign different weights to different samples in the batch. In order for this to work you need the loss function to return the loss per batch item.
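A minimal sketch of how that looks (tf.keras here; the model and data are placeholder assumptions, while sample_weight is the actual fit() argument):

import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='sgd', loss='mse')

x = np.random.rand(64, 4)
y = np.random.rand(64, 1)
weights = np.ones(64)
weights[:8] = 3.0   # give the first eight samples a larger weight

# per-sample losses are multiplied by these weights before being averaged
model.fit(x, y, sample_weight=weights, batch_size=32, epochs=1)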
How do I get the sample loss while training instead of the total loss? The loss history is available which gives the total batch loss but it doesn't provide the loss for individual samples.
If possible I would like to have something like this:
on_batch_end(batch, logs, sample_losses)
Is something like this available and if not can you provide some hints how to change the code to support this?
To the best of my knowledge it is not possible to get this information via callbacks since the loss is already computed once the callbacks are called (have a look at keras/engine/training.py). To simply inspect the losses you may override the loss function, e.g.:
def myloss(ytrue, ypred):
    x = keras.objectives.mean_squared_error(ytrue, ypred)
    return theano.printing.Print('loss for each sample')(x)
model.compile(loss=myloss)
Actually this can be done using a callback. This is now included in the keras documentation on callbacks. Define your own callback like this
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
Then pass this callback to your model's fit() call. You should get the per-batch losses appended to the callback object.
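Usage would look roughly like this (model, x_train and y_train are assumed to come from your own setup):

history_cb = LossHistory()
model.fit(x_train, y_train, batch_size=32, epochs=5, callbacks=[history_cb])
print(history_cb.losses)  # one (mean) loss value per batch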
I have also not found any existing functions in the Keras API that can return individual sample losses while still computing on a minibatch. It seems you have to hack Keras, or maybe access the TensorFlow graph directly.
Set the batch size to 1 and use callbacks in model.evaluate, or manually calculate the loss between the predictions (model.predict) and the ground truth.
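As a rough sketch of the second option (assuming a regression model trained with MSE; model, x_val and y_val are placeholders):

import numpy as np

preds = model.predict(x_val)                              # shape (num_samples, output_dim)
per_sample_loss = np.mean((preds - y_val) ** 2, axis=-1)  # MSE per sample
print(per_sample_loss[:5])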