gradients of custom loss - python-3.x

Suppose a model as in:
model = Model(inputs=[A, B], outputs=C)
With custom loss:
def actor_loss(y_true, y_pred):
log_lik = y_true * K.log(y_pred)
loss = -K.sum(log_lik * K.stop_gradient(B))
return loss
Now I'm trying to define a function that returns the gradients of the loss wrt to the weights for a given pair of input and target output and expose it as such.
Here is an idea of what I mean in pseudocode
def _get_grads(inputs, targets):
loss = model.loss(targets, model.output)
weights = model.trainable_weights
grads = K.gradients(loss, weights)
model.input[0] (aka 'A') <----inputs[0]
model.input[1] (aka 'B') <----inputs[1]
return K.function(model.input, grads)
self.get_grads = _get_grads
My question is how do I feed inputs argument to the graph inside said function.
(So far I've only worked with .fit and not with .gradients and I can't find any decent documentation with custom loss or multiple inputs)

If you call K.function, you get an actual callable function, so you should just call it with some parameter values. The format is exactly the same as model.fit, in your case it should be two arrays of values, including the batch dimension:
self.get_grads = _get_grads(inputs, targets)
grad_value = self.get_grads([input1, input2])
Where input1 and input2 are numpy arrays that include the batch dimension.

My understanding of K.function ,K.gradients and custom loss was fundamentally wrong. You use the function to construct a mini-graph that computes gradients of loss wrt to weights. No need for the function itself to have arguments.
def _get_grads():
targets = Input(shape=...)
loss = model.loss(targets, model.output)
weights = model.trainable_weights
grads = K.gradients(loss, weights)
return K.function(model.input + [targets], grads)
I was under the impression that _get_grads was itself K.function but that was wrong. _get_grads() returns K.function. And then you use that as
f = _get_grads() # constructs the mini-graph that gives gradients
grads = f([inputs, labels])
inputs is fed to model.inputs, labels to targets and it returns grads.

Related

Using weights in CrossEntropyLoss and BCELoss (PyTorch)

I am training a PyTorch model to perform binary classification. My minority class makes up about 10% of the data, so I want to use a weighted loss function. The docs for BCELoss and CrossEntropyLoss say that I can use a 'weight' for each sample.
However, when I declare CE_loss = nn.BCELoss() or nn.CrossEntropyLoss() and then do CE_Loss(output, target, weight=batch_weights), where output, target, and batch_weights are Tensors of batch_size, I get the following error message:
forward() got an unexpected keyword argument 'weight'
Another way you could accomplish your goal is to use reduction=none when initializing the loss and then multiply the resulting tensor by your weights before computing the mean.
e.g.
loss = torch.nn.BCELoss(reduction='none')
model = torch.sigmoid
weights = torch.rand(10,1)
inputs = torch.rand(10,1)
targets = torch.rand(10,1)
intermediate_losses = loss(model(inputs), targets)
final_loss = torch.mean(weights*intermediate_losses)
Of course for your scenario you still would need to calculate the weights tensor. But hopefully this helps!
Could it be that you want to apply separate fixed weights to all elements of class 0 and class 1 in your dataset? It is not clear what value you are passing for batch_weights here. If so, then that is not what the weight parameter in BCELoss does. The weight parameter expects you to pass a separate weight for every ELEMENT in the dataset, not for every CLASS. There are several ways around this. You could construct a weight table for every element. Alternatively, you could use a custom loss function that does what you want:
def BCELoss_class_weighted(weights):
def loss(input, target):
input = torch.clamp(input,min=1e-7,max=1-1e-7)
bce = - weights[1] * target * torch.log(input) - (1 - target) * weights[0] * torch.log(1 - input)
return torch.mean(bce)
return loss
Note that it is important to add a clamp to avoid numerical instability.
HTH Jeroen
the issue is wherein your providing the weight parameter. As it is mentioned in the docs, here, the weights parameter should be provided during module instantiation.
For example, something like,
from torch import nn
weights = torch.FloatTensor([2.0, 1.2])
loss = nn.BCELoss(weights=weights)
You can find a more concrete example here or another helpful PT forum discussion here.
you need to pass weights like below:
CE_loss = CrossEntropyLoss(weight=[…])
This is similar to the idea of #Jeroen Vuurens, but the class weights are determined by the target mean:
y_train_mean = y_train.mean()
bi_cls_w2 = 1/(1 - y_train_mean)
bi_cls_w1 = 1/y_train_mean - bi_cls_w2
bce_loss = nn.BCELoss(reduction='none')
loss_fun = lambda pred, target: ((bi_cls_w1*target + bi_cls_w2) * bce_loss(pred, target)).mean()

Custom adaptive loss function with additional dynamic argument in Keras

I have to use an adaptive custom loss function that takes an additional dynamic argument (eps) in keras. The argument eps is a scalar but changes from one sample to the other : the loss function should be therefore adapted during training. I use a generator and I can pass this argument through every call of the generator during training (generator_train[2]). Based on answers to similar questions I tried to write the following wrapping:
def custom_loss(eps):
def square_err(y_true, y_pred):
nom = K.sum(K.square(y_pred - y_true), axis=-1)
denom = eps**2
loss = nom/denom
return loss
return square_err
But I am struggling with implementing it since eps is a dynamic variable: I don't know how I should pass this argument to the loss function during training (model.fit). Here is a simple version of my model:
model = keras.Sequential()
model.add(layers.LSTM(units=32, input_shape=(32, 4))
model.add(layers.Dense(units=1))
model.add_loss(custom_loss)
opt = keras.optimizers.Adam()
model.compile(optimizer=opt)
history = model.fit(x=generator_train[0], y=generator_train[1],
steps_per_epoch=100
epochs=50,
validation_data=gen_vl,
validation_steps=n_vl)
Your help would be very appreciated.
Simply pass "sample weights", which will be 1/(eps**2) for each sample.
Your generator should just output x, y, sample_weights and that's all.
Your loss can be:
def loss(y_true, y_pred):
return K.sum(K.square(y_pred - y_true), axis=-1)
In fit, you cannot use indexing in the generator, you will pass just generator_train, no x, no y, just generator_train.

How to create a custom loss function in Keras that evaluates prediction after each epoch?

I'm working on a neural network in Keras that translates English sentences into a custom language. For this, I'd like to create a custom loss function that takes the prediction for each sentence and evaluates whether it complies with the grammar rules of the custom language and if not adds value to the standard loss function.
How can I evaluate a tensor after each epoch but not during compilation?
Below is my custom loss function. As during compilation of the model there is no batch yet, y_pred has the shape (None, x, y) and can't be evaluated to get the prediction. My idea to circumvent this was to assign a standard loss function during compilation and when batches arrive calculate the custom loss. Unfortunately the custom loss is never reached.
def custom_loss(tokenizer, punishment_rate):
def compile_loss(y_true, y_pred):
shape = K.int_shape(y_pred)
#standard loss function
loss = K.sparse_categorical_crossentropy(y_true, y_pred)
if shape[0] is not None:
#THIS is never reached and that's the problem
prediction = logits_to_text(K.eval(y_pred), tokenizer)
#test if prediction complies to grammar rules
compileable = compiles(prediction) ^ 1
compile_error = compileable * punishment_rate
loss = K.sparse_categorical_crossentropy(y_true, y_pred, axis=-1) * (1 + compile_error)
return loss
return compile_loss
Is there any workaround for evaluating a tensor only when it was filled with a batch? Or alternatively, change the loss function after compilation of the model via a callback without it having to recompile the model?
As per keras source, you can use a Loss Function Wrapper to create a Custom Loss Function class and then pass it to your model seamlessly.
As an example:
#Import the wrapper
from keras.losses import LossFunctionWrapper
#Create your class extending the wrapper
class MyLossFunction(LossFunctionWrapper):
#Implement the constructor - here you can give extended arguments to it.
def __init__(self,
tokenizer,
punishment_rate,
reduction=losses_utils.Reduction.SUM_OVER_BATCH_SIZE,
name='my_custom_text_function'):
super(MyLossFunction, self).__init__(
my_function,
name=name,
reduction=reduction,
tokenizer = tokenizer,
punishment_rate= punishment_rate)
#Now you have to define your function "my_function":
#Please, notice that ALL loss functions that follow keras model needs two arguments:
#y_true (correct result) and y_pred (the result obtained in the network).
def my_function(y_true, y_pred, tokenizer, punishment_rate):
shape = K.int_shape(y_pred)
if shape[0] is not None:
prediction = logits_to_text(K.eval(y_pred), tokenizer)
#test if prediction complies to grammar rules
compileable = compiles(prediction) ^ 1
compile_error = compileable * punishment_rate
return K.sparse_categorical_crossentropy(y_true, y_pred, axis=-1) * (1 + compile_error)
return K.sparse_categorical_crossentropy(y_true, y_pred)
You can then instantiate it and use in your compiler:
custom_loss= MyLossFunction(tokenizer = ..., punishment_rate = ...)
classifier.compile(optimizer=optimizer,
loss=custom_loss,
metrics= ['binary_accuracy'])

keras: unsupervised learning with external constraint

I have to train a network on unlabelled data of binary type (True/False), which sounds like unsupervised learning. This is what the normalised data look like:
array([[-0.05744527, -1.03575495, -0.1940105 , -1.15348956, -0.62664491,
-0.98484037],
[-0.05497629, -0.50935675, -0.19396862, -0.68990988, -0.10551919,
-0.72375012],
[-0.03275552, 0.31480204, -0.1834951 , 0.23724946, 0.15504367,
0.29810553],
...,
[-0.05744527, -0.68482282, -0.1940105 , -0.87534175, -0.23580062,
-0.98484037],
[-0.05744527, -1.50366446, -0.1940105 , -1.52435329, -1.14777063,
-0.98484037],
[-0.05744527, -1.26970971, -0.1940105 , -1.33892142, -0.88720777,
-0.98484037]])
However, I do have a constraint on the total number of True labels in my data. This doesn't mean I can build a classical custom loss function in Keras taking (y_true, y_pred) arguments as required: my external constraint is just on the predicted total of True and False, not on the individual labels.
My question is whether there is a somewhat "standard" approach to this kind of problems, and how that is implementable in Keras.
POSSIBLE SOLUTION
Should I assign y_true randomly as 0/1, have a network return y_pred as 1/0 with a sigmoid activation function, and then define my loss function as
sum_y_true = 500 # arbitrary constant known a priori
def loss_function(y_true, y_pred):
loss = np.abs(y_pred.sum() - sum_y_true)
return loss
In the end, I went with the following solution, which worked.
1) Define batches in your dataframe df with a batch_id column, so that in each batch Y_train is your identical "batch ground truth" (in my case, the total number of True labels in the batch). You can then pass these instances together to the network. This can be done with a generator:
def grouper(g,x,y):
while True:
for gr in g.unique():
# this assigns indices to the entire set of values in g,
# then subsects to all the rows in which g == gr
indices = g == gr
yield (x[indices],y[indices])
# train set
train_generator = grouper(df.loc[df['set'] == 'train','batch_id'], X_train, Y_train)
# validation set
val_generator = grouper(df.loc[df['set'] == 'val','batch_id'], X_val, Y_val)
2) define a custom loss function, to track how close the total number of instances predicted as true matches the ground truth:
def custom_delta(y_true, y_pred):
loss = K.abs(K.mean(y_true) - K.sum(y_pred))
return loss
def custom_wrapper():
def custom_loss_function(y_true, y_pred):
return custom_delta(y_true, y_pred)
return custom_loss_function
Note that here
a) Each y_true label is already the sum of the ground truth in our batch (cause we don't have individual values). That's why y_true is not summed over;
b) K.mean is actually a bit of an overkill to extract a single scalar from this uniform tensor, in which all y_true values in each batch are identical - K.min or K.max would also work, but I haven't tested whether their performance is faster.
3) Use fit_generator instead of fit:
fmodel = Sequential()
# ...your layers...
# Create the loss function object using the wrapper function above
loss_ = custom_wrapper()
fmodel.compile(loss=loss_, optimizer='adam')
history1 = fmodel.fit_generator(train_generator, steps_per_epoch=total_batches,
validation_data=val_generator,
validation_steps=df.loc[encs.df['set'] == 'val','batch_id'].nunique(),
epochs=20, verbose = 2)
This way the problem is basically addressed as one of supervised learning, although without individual labels, which means that notions like true/false positive are meaningless here.
This approach not only managed to give me a y_pred that closely matches the totals I know per batch. It actually finds two groups (True/False) that occupy the expected different portions of parameter space.

Keras Implementation of Customized Loss Function that need internal layer output as label

in keras, I want to customize my loss function which not only takes (y_true, y_pred) as input but also need to use the output from the internal layer of the network as the label for an output layer.This picture shows the Network Layout
Here, the internal output is xn, which is a 1D feature vector. in the upper right corner, the output is xn', which is the prediction of xn. In other words, xn is the label for xn'.
While [Ax, Ay] is traditionally known as y_true, and [Ax',Ay'] is y_pred.
I want to combine these two loss components into one and train the network jointly.
Any ideas or thoughts are much appreciated!
I have figured out a way out, in case anyone is searching for the same, I posted here (based on the network given in this post):
The idea is to define the customized loss function and use it as the output of the network. (Notation: A is the true label of variable A, and A' is the predicted value of variable A)
def customized_loss(args):
#A is from the training data
#S is the internal state
A, A', S, S' = args
#customize your own loss components
loss1 = K.mean(K.square(A - A'), axis=-1)
loss2 = K.mean(K.square(S - S'), axis=-1)
#adjust the weight between loss components
return 0.5 * loss1 + 0.5 * loss2
def model():
#define other inputs
A = Input(...) # define input A
#construct your model
cnn_model = Sequential()
...
# get true internal state
S = cnn_model(prev_layer_output0)
# get predicted internal state output
S' = Dense(...)(prev_layer_output1)
# get predicted A output
A' = Dense(...)(prev_layer_output2)
# customized loss function
loss_out = Lambda(customized_loss, output_shape=(1,), name='joint_loss')([A, A', S, S'])
model = Model(input=[...], output=[loss_out])
return model
def train():
m = model()
opt = 'adam'
model.compile(loss={'joint_loss': lambda y_true, y_pred:y_pred}, optimizer = opt)
# train the model
....
First of all you should be using the Functional API. Then you should define the network output as the output plus the result from the internal layer, merge them into a single output (by concatenating), and then make a custom loss function that then splits the merged output into two parts and does the loss computations on its own.
Something like:
def customLoss(y_true, y_pred):
#loss here
internalLayer = Convolution2D()(inputs) #or other layers
internalModel = Model(input=inputs, output=internalLayer)
tmpOut = Dense(...)(internalModel)
mergedOut = merge([tmpOut, mergedOut], mode = "concat", axis = -1)
fullModel = Model(input=inputs, output=mergedOut)
fullModel.compile(loss = customLoss, optimizer = "whatever")
I have my reservations regarding this implementation. The loss computed at the merged layer is propagated back to both the merged branches. Generally you would like to propagate it through just one layer.

Resources