How to define precision as a loss function in Keras? - keras

It’s known that sparse_categorical_crossentropy in Keras computes a loss that treats all categories equally. But what if I care most about one particular category? For example, if I want to use the precision (= TP/(TP+FP)) of this category as the loss function, how can I write it? Thanks!
My code looks like this:
from keras import backend as K

def my_loss(y_true, y_pred):
    y_true = K.cast(y_true, "float32")
    # argmax turns probabilities into hard class indices (this is the step that breaks gradients)
    y_pred = K.cast(K.argmax(y_pred), "float32")
    # true positives for class 0
    nominator = K.sum(K.cast(K.equal(y_true, y_pred) & K.equal(y_true, 0), "float32"))
    # predicted positives for class 0
    denominator = K.sum(K.cast(K.equal(y_pred, 0), "float32"))
    return -(nominator + K.epsilon()) / (denominator + K.epsilon())
And the error is like:
argmax is not differentiable

I don't recommend using precision as the loss function.
It is not differentiable, so it can't be used as a loss function for a neural network.
Moreover, you can maximize it trivially by predicting no instance as the class of interest (with the epsilon terms in your code, 0/0 becomes 1), which makes no sense.
One alternative is to use F1 (in a differentiable, "soft" form) as the loss function, then manually tune the probability cut-off to obtain the desired level of precision while keeping recall from getting too low.
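F1 computed from hard argmax predictions is no more differentiable than precision, so in practice a "soft" F1 is used, where TP, FP and FN are sums of predicted probabilities instead of counts. A minimal sketch, assuming tf.keras, sparse integer labels (as with sparse_categorical_crossentropy) and a softmax output; soft_f1_loss, class_id and num_classes are hypothetical names:

```python
import tensorflow as tf

def soft_f1_loss(class_id, num_classes, eps=1e-7):
    """Hypothetical helper: returns 1 - soft F1 for class `class_id`."""
    def loss(y_true, y_pred):
        # Sparse integer labels -> binary indicator for the class of interest
        labels = tf.reshape(tf.cast(y_true, tf.int32), [-1])
        t = tf.one_hot(labels, num_classes, dtype=y_pred.dtype)[:, class_id]
        # Predicted probability of the class of interest
        p = y_pred[:, class_id]
        # Soft confusion-matrix entries (sums of probabilities, not counts)
        tp = tf.reduce_sum(p * t)
        fp = tf.reduce_sum(p * (1.0 - t))
        fn = tf.reduce_sum((1.0 - p) * t)
        f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
        return 1.0 - f1
    return loss
```

It would be used as model.compile(optimizer='adam', loss=soft_f1_loss(class_id=0, num_classes=3)). The value is computed over the whole batch, so larger batches give a less noisy estimate; the cut-off can then be tuned afterwards on model.predict(...)[:, 0] against validation labels.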

You can pass a class_weight parameter to the fit method to specify which classes are more important.
It should be a dictionary mapping each class index to its weight:
{
    0: 1,    # class 0 has weight 1
    1: 0.5,  # class 1 has half the importance of class 0
    2: 0.7,  # ....
    ...
}
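A minimal sketch of passing such a dictionary to fit, assuming tf.keras and a toy 3-class problem with random data (the model and data are placeholders, not from the question):

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 60 samples, 4 features, integer labels 0..2
rng = np.random.default_rng(0)
x = rng.normal(size=(60, 4)).astype("float32")
y = rng.integers(0, 3, size=60)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Each sample's loss is multiplied by the weight of its class
class_weight = {0: 1.0, 1: 0.5, 2: 0.7}
history = model.fit(x, y, epochs=2, batch_size=16,
                    class_weight=class_weight, verbose=0)
```

Note that class_weight only rescales the per-sample crossentropy; it does not turn the loss into precision for one class.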
Custom loss
If that is not exactly what you need, you can create loss functions like this:
import keras.backend as K

def customLoss(yTrue, yPred):
    # Create operations with yTrue and yPred:
    # - yTrue = the true output data (equal to y_train in most examples)
    # - yPred = the model's calculated output
    # - yTrue and yPred have exactly the same shape: (batch_size, output_dimensions, ...)
    #   - according to the output shape of the last layer
    #   - also according to the shape of y_train
    # All operations must be differentiable, like +, -, *, / or operations from K (backend).
    # Placeholder example: mean squared error over the last axis
    someResultingTensor = K.mean(K.square(yPred - yTrue), axis=-1)
    return someResultingTensor

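You cannot use argmax because it is not differentiable, which means backpropagation will not work through the loss function.
Instead of using argmax, multiply y_true (one-hot encoded) by y_pred (the predicted probabilities): y_true * y_pred gives a soft, differentiable count of correct predictions.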
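Applied to the question, this gives a "soft precision": the true-positive count becomes the probability mass placed on the class where it is correct, and the predicted-positive count becomes all probability mass placed on the class. A minimal sketch, assuming tf.keras, sparse integer labels and a softmax output; soft_precision_loss is a hypothetical name:

```python
import tensorflow as tf

EPS = 1e-7  # avoids division by zero when no probability mass is on the class

def soft_precision_loss(class_id, num_classes):
    """Hypothetical helper: returns 1 - soft precision for class `class_id`."""
    def loss(y_true, y_pred):
        # Sparse integer labels -> one-hot, so that y_true * y_pred is defined
        labels = tf.reshape(tf.cast(y_true, tf.int32), [-1])
        y_true_oh = tf.one_hot(labels, num_classes, dtype=y_pred.dtype)
        # Soft true positives: probability mass on class_id where class_id is correct
        soft_tp = tf.reduce_sum(y_true_oh[:, class_id] * y_pred[:, class_id])
        # Soft predicted positives: all probability mass placed on class_id
        soft_pred_pos = tf.reduce_sum(y_pred[:, class_id])
        return 1.0 - soft_tp / (soft_pred_pos + EPS)
    return loss
```

Like hard precision, this can still be improved by predicting the class only where the model is most certain, so it is usually combined with a crossentropy term rather than used alone.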

Related

Using weights in CrossEntropyLoss and BCELoss (PyTorch)

I am training a PyTorch model to perform binary classification. My minority class makes up about 10% of the data, so I want to use a weighted loss function. The docs for BCELoss and CrossEntropyLoss say that I can use a 'weight' for each sample.
However, when I declare CE_loss = nn.BCELoss() or nn.CrossEntropyLoss() and then do CE_Loss(output, target, weight=batch_weights), where output, target, and batch_weights are Tensors of batch_size, I get the following error message:
forward() got an unexpected keyword argument 'weight'
Another way to accomplish your goal is to use reduction='none' when initializing the loss, and then multiply the resulting tensor by your weights before computing the mean.
e.g.
import torch

loss = torch.nn.BCELoss(reduction='none')
model = torch.sigmoid  # stand-in for a real model
weights = torch.rand(10, 1)
inputs = torch.rand(10, 1)
targets = torch.rand(10, 1)
intermediate_losses = loss(model(inputs), targets)  # one loss value per element
final_loss = torch.mean(weights * intermediate_losses)
Of course for your scenario you still would need to calculate the weights tensor. But hopefully this helps!
It is not clear what value you are passing for batch_weights here. Could it be that you want to apply separate fixed weights to all elements of class 0 and class 1 in your dataset? If so, that is not what the weight parameter in BCELoss does. The weight parameter expects a separate weight for every ELEMENT of the batch, not for every CLASS. There are several ways around this. You could construct a weight table for every element. Alternatively, you could use a custom loss function that does what you want:
import torch

def BCELoss_class_weighted(weights):
    # weights[0] scales the loss of negative targets, weights[1] that of positive targets
    def loss(input, target):
        input = torch.clamp(input, min=1e-7, max=1 - 1e-7)
        bce = - weights[1] * target * torch.log(input) - (1 - target) * weights[0] * torch.log(1 - input)
        return torch.mean(bce)
    return loss
Note that it is important to add a clamp to avoid numerical instability.
HTH Jeroen
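The "weight table for every element" option can also be done without a custom loss: the functional torch.nn.functional.binary_cross_entropy does accept a weight argument at call time, so per-class weights can be expanded into per-element weights for each batch. A sketch, assuming a 1:9 negative:positive weighting chosen for illustration:

```python
import torch
import torch.nn.functional as F

w_neg, w_pos = 1.0, 9.0  # illustrative class weights

output = torch.tensor([0.9, 0.2, 0.7, 0.4])  # sigmoid probabilities
target = torch.tensor([1.0, 0.0, 0.0, 1.0])

# Per-element weight table built from the class of each target
batch_weights = torch.where(target == 1, torch.tensor(w_pos), torch.tensor(w_neg))

# The functional API takes weight at call time, unlike the nn.BCELoss module
loss = F.binary_cross_entropy(output, target, weight=batch_weights)
```

With the default reduction='mean', the weighted per-element losses are averaged over the number of elements.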
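The issue is where you are providing the weight parameter. As mentioned in the docs (here), the weight parameter should be provided when the module is instantiated, not when it is called.
For example, something like:
import torch
from torch import nn

weights = torch.FloatTensor([2.0, 1.2])
# The keyword is weight (singular); for BCELoss it rescales each element
# and must be broadcastable to the input shape
loss = nn.BCELoss(weight=weights)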
You can find a more concrete example here or another helpful PT forum discussion here.
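For binary classification, per-class weighting is usually expressed through the pos_weight argument of nn.BCEWithLogitsLoss, which takes raw logits (no sigmoid layer) and multiplies the loss of positive targets by pos_weight. A sketch, assuming the 10% minority class from the question, so pos_weight = 0.9/0.1 = 9:

```python
import torch
from torch import nn

logits = torch.tensor([[0.5], [-1.0], [2.0]])   # raw model outputs
targets = torch.tensor([[1.0], [0.0], [1.0]])
pos_weight = torch.tensor([9.0])                 # one entry per output column

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, targets)

# Same value by hand: only the positive term is scaled by pos_weight
p = torch.sigmoid(logits)
manual = -(pos_weight * targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).mean()
```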
You need to pass the weights when constructing the loss, as a tensor with one weight per class:
CE_loss = nn.CrossEntropyLoss(weight=torch.tensor([…]))
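A sketch of what the weight tensor does, assuming two classes weighted 1 and 9 (illustrative values). With reduction='mean', PyTorch divides the weighted sum of per-sample losses by the sum of the weights of the target classes, not by the batch size:

```python
import torch
import torch.nn.functional as F
from torch import nn

class_weights = torch.tensor([1.0, 9.0])  # one entry per class
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.tensor([[2.0, 0.5], [0.1, 1.0], [1.5, -0.5]])  # raw scores, shape (N, C)
targets = torch.tensor([0, 1, 1])                             # class indices, shape (N,)
loss = criterion(logits, targets)

# Same value by hand: weighted sum divided by the sum of the applied weights
per_sample = F.cross_entropy(logits, targets, reduction="none")
w = class_weights[targets]
manual = (w * per_sample).sum() / w.sum()
```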
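This is similar to Jeroen Vuurens' idea, but the class weights are determined by the target mean: positive targets get weight bi_cls_w1 + bi_cls_w2 = 1/y_train_mean and negative targets get bi_cls_w2 = 1/(1 - y_train_mean), i.e. each class is weighted by its inverse frequency.
import torch.nn as nn

y_train_mean = y_train.mean()  # fraction of positive labels
bi_cls_w2 = 1/(1 - y_train_mean)
bi_cls_w1 = 1/y_train_mean - bi_cls_w2
bce_loss = nn.BCELoss(reduction='none')
loss_fun = lambda pred, target: ((bi_cls_w1*target + bi_cls_w2) * bce_loss(pred, target)).mean()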

Keras: Pixelwise class imbalance in binary image segmentation

I have a task in which I input a 500x500x1 image and get out a 500x500x1 binary segmentation. When working, only a small fraction of the 500x500 should be triggered (small "targets"). I'm using a sigmoid activation at the output. Since such a small fraction is desired to be positive, the training tends to stall with all outputs at zero, or very close. I've written my own loss function that partially deals with it, but I'd like to use binary cross entropy with a class weighting if possible.
My question is in two parts:
If I naively apply binary_crossentropy as the loss to my 500x500x1 output, will it apply on a per pixel basis as desired?
Is there a way for keras to apply class weighting with the single sigmoid output per pixel?
To answer your questions.
Yes, binary_crossentropy works on a per-pixel basis, provided you feed your image segmentation network pairs of the form (500x500x1 grayscale image, 500x500x1 mask corresponding to that image).
Class weighting can be applied by passing the class_weight parameter to model.fit() (but see the note below for the segmentation case).
Suppose you have 2 classes with a 90%-10% distribution. Then you may want to penalise your algorithm 9 times more when it makes a mistake on the less well represented class (the class with 10% in this case). Suppose you have 900 examples of class 0 and 100 examples of class 1.
Then your class weights dictionary would be as follows (there are multiple ways to compute it; what matters is assigning a greater weight to the less well represented class):
class_weights = {0: 1000/900, 1: 1000/100}
Example: model.fit(X_train, Y_train, epochs=30, batch_size=32, class_weight=class_weights)
NOTE: class_weight is only supported for 2D targets (one label per sample). For 3D or higher-dimensional targets, such as segmentation masks, use the sample_weight parameter instead.
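To use the same class weights per pixel through sample_weight, one option is to expand each mask into a weight map. A sketch, assuming masks of shape (N, H, W, 1) with values 0/1; pixel_weight_map is a hypothetical helper, not a Keras function:

```python
import numpy as np

def pixel_weight_map(masks, class_weights):
    """Hypothetical helper: per-pixel weights from binary masks.

    masks: array of shape (N, H, W, 1) with values 0/1.
    class_weights: dict {0: w_background, 1: w_target}.
    Returns an array of shape (N, H, W), matching the per-pixel loss
    that binary_crossentropy produces after reducing the channel axis.
    """
    masks = np.asarray(masks)[..., 0]  # drop the channel axis
    weights = np.where(masks > 0.5, class_weights[1], class_weights[0])
    return weights.astype("float32")
```

It would be passed as model.fit(X_train, Y_train, sample_weight=pixel_weight_map(Y_train, class_weights), ...). Recent tf.keras versions broadcast a sample_weight of shape (N, H, W) against the per-pixel loss; older Keras versions may require reshaping the outputs, so check the version you use.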
The biggest gain will probably come from using a different loss function. Some losses, apart from binary_crossentropy and categorical_crossentropy, inherently cope better with unbalanced datasets; Dice loss is one such loss function.
Keras implementation:
from keras import backend as K

smooth = 1.  # smoothing term: avoids division by zero when both masks are empty

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
You can also use the sum of binary_crossentropy and another loss as the loss function if it suits you, e.g. loss = dice_loss + bce.
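A minimal sketch of that combined loss, assuming tf.keras, a sigmoid output and float masks of the same shape; bce_dice_loss and soft_dice are hypothetical names, and the Dice term uses the same smoothing constant as the Keras snippet:

```python
import tensorflow as tf

SMOOTH = 1.0  # smoothing constant for the Dice coefficient

def soft_dice(y_true, y_pred, smooth=SMOOTH):
    # Flatten every pixel of every image in the batch into one vector
    y_true_f = tf.reshape(tf.cast(y_true, y_pred.dtype), [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    # Per-pixel binary crossentropy, averaged over the batch and all pixels
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    # Dice loss term: 1 - Dice coefficient
    return bce + (1.0 - soft_dice(y_true, y_pred))
```

The BCE term keeps per-pixel gradients well behaved, while the Dice term is insensitive to the large number of easy background pixels.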

keras: unsupervised learning with external constraint

I have to train a network on unlabelled data of binary type (True/False), which sounds like unsupervised learning. This is what the normalised data look like:
array([[-0.05744527, -1.03575495, -0.1940105 , -1.15348956, -0.62664491,
-0.98484037],
[-0.05497629, -0.50935675, -0.19396862, -0.68990988, -0.10551919,
-0.72375012],
[-0.03275552, 0.31480204, -0.1834951 , 0.23724946, 0.15504367,
0.29810553],
...,
[-0.05744527, -0.68482282, -0.1940105 , -0.87534175, -0.23580062,
-0.98484037],
[-0.05744527, -1.50366446, -0.1940105 , -1.52435329, -1.14777063,
-0.98484037],
[-0.05744527, -1.26970971, -0.1940105 , -1.33892142, -0.88720777,
-0.98484037]])
However, I do have a constraint on the total number of True labels in my data. This doesn't let me build a classical custom loss function in Keras taking (y_true, y_pred) arguments as required, because my external constraint is only on the predicted total of True and False, not on the individual labels.
My question is whether there is a somewhat "standard" approach to this kind of problems, and how that is implementable in Keras.
POSSIBLE SOLUTION
Should I assign y_true randomly as 0/1, have a network return y_pred as 1/0 with a sigmoid activation function, and then define my loss function as
sum_y_true = 500  # arbitrary constant known a priori

def loss_function(y_true, y_pred):
    loss = np.abs(y_pred.sum() - sum_y_true)
    return loss
In the end, I went with the following solution, which worked.
1) Define batches in your dataframe df with a batch_id column, so that within each batch Y_train holds the same "batch ground truth" for every row (in my case, the total number of True labels in the batch). You can then pass these instances to the network together. This can be done with a generator:
def grouper(g, x, y):
    # cycle over the batches forever, as Keras generators are expected to do
    while True:
        for gr in g.unique():
            # this compares every value in g with gr,
            # then subsets to all the rows in which g == gr
            indices = g == gr
            yield (x[indices], y[indices])

# train set
train_generator = grouper(df.loc[df['set'] == 'train', 'batch_id'], X_train, Y_train)
# validation set
val_generator = grouper(df.loc[df['set'] == 'val', 'batch_id'], X_val, Y_val)
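A sketch of how the batch-level targets could be laid out, assuming hypothetical toy data in which the number of True labels per batch is known a priori and stored in a dict (batch_totals is an illustrative name, not from the answer):

```python
import pandas as pd

# Hypothetical toy data: 6 rows split into 2 training batches
df = pd.DataFrame({
    "batch_id": [0, 0, 0, 1, 1, 1],
    "set": ["train"] * 6,
})
# Assumed a-priori knowledge: number of True labels in each batch
batch_totals = {0: 1, 1: 2}

# Every row of a batch gets the same target, the batch's known total,
# so any row of y_true carries the batch-level ground truth
Y_train = df["batch_id"].map(batch_totals).to_numpy(dtype="float32")

# One generator step per distinct batch id (mirrors validation_steps in step 3)
n_train_batches = df.loc[df["set"] == "train", "batch_id"].nunique()
```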
2) Define a custom loss function that measures how closely the total number of instances predicted as True matches the ground truth:
from keras import backend as K

def custom_delta(y_true, y_pred):
    loss = K.abs(K.mean(y_true) - K.sum(y_pred))
    return loss

def custom_wrapper():
    def custom_loss_function(y_true, y_pred):
        return custom_delta(y_true, y_pred)
    return custom_loss_function
Note that here
a) Each y_true label is already the sum of the ground truth in our batch (because we don't have individual values). That's why y_true is not summed over;
b) K.mean is actually a bit of overkill for extracting a single scalar from this uniform tensor, in which all y_true values in each batch are identical; K.min or K.max would also work, but I haven't tested whether they are faster.
3) Use fit_generator instead of fit (in TensorFlow 2.x, fit_generator is deprecated and fit accepts generators directly, with the same arguments):
from keras.models import Sequential

fmodel = Sequential()
# ...your layers...
# Create the loss function object using the wrapper function above
loss_ = custom_wrapper()
fmodel.compile(loss=loss_, optimizer='adam')
history1 = fmodel.fit_generator(train_generator, steps_per_epoch=total_batches,
                                validation_data=val_generator,
                                validation_steps=df.loc[df['set'] == 'val', 'batch_id'].nunique(),
                                epochs=20, verbose=2)
This way the problem is basically addressed as one of supervised learning, although without individual labels, which means that notions like true/false positive are meaningless here.
This approach not only managed to give me a y_pred that closely matches the totals I know per batch. It actually finds two groups (True/False) that occupy the expected different portions of parameter space.

How to use weighted categorical crossentropy on FCN (U-Net) in Keras?

I have built a Keras model for image segmentation (U-Net). However in my samples some misclassifications (areas) are not that important, while other are crucial, so I want to assign higher weight in loss function to them. To complicate things further, I would like some misclassifications (class 1 instead of 2) to have very high penalty while inverse (class 2 instead of 1) shouldn't be penalized that much.
The way I see it, I need to use a sum (across all of the pixels) of weighted categorical crossentropy, but the best I could find is this:
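from itertools import product
from keras import backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    # weights[c_t, c_p] is the cost of predicting class c_p when the true class is c_t
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    # one-hot mask of the predicted (argmax) class for each sample
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # Keras 2 argument order is (target, output); Keras 1 used (output, target)
    return K.categorical_crossentropy(y_true, y_pred) * final_mask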
However this code only works with a single prediction and my knowledge of Keras inner workings is lacking (and math side of it is not much better). Anyone know how I can adapt it, or even better, is there a ready-made loss function which would suit my case?
I would appreciate some pointers.
EDIT: my question is similar to How to do point-wise categorical crossentropy loss in Keras?, except that I would like to use weighted categorical crossentropy.
You can use weight maps (as proposed in the U-Net paper). In those weight maps, you can weight regions with more weight or less weight. Here is some pseudocode:
loss = compute_categorical_crossentropy()
weighted_loss = loss * weight_map # using element-wise multiplication
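A Keras loss only receives (y_true, y_pred), so one way to get a per-pixel weight map into it is to append the map to the one-hot target as an extra channel and split it off inside the loss. A sketch under that assumption (tf.keras, softmax output with num_classes channels; weight_map_categorical_crossentropy is a hypothetical name):

```python
import tensorflow as tf

def weight_map_categorical_crossentropy(num_classes):
    """Hypothetical helper: y_true carries num_classes one-hot channels
    followed by one extra channel holding the per-pixel weight."""
    def loss(y_true_and_weights, y_pred):
        y_true = y_true_and_weights[..., :num_classes]
        weight_map = y_true_and_weights[..., num_classes]  # shape (batch, H, W)
        # Per-pixel crossentropy, shape (batch, H, W)
        per_pixel = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        # Element-wise weighting, then average over all pixels
        return tf.reduce_mean(per_pixel * weight_map)
    return loss
```

The training targets would be built as np.concatenate([y_onehot, weight_map[..., None]], axis=-1). Asymmetric penalties (class 1 predicted as 2 costing more than the reverse) are not expressed by a weight map alone; this only scales each pixel's crossentropy.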

Can GridSearchCV use predict_proba when using a custom score function?

I am trying to use a custom scoring function that calculates multi-class log loss with the ground truth and predict_proba y array. Is there a way to make GridSearchCV use this scoring function?
import numpy as np

def multiclass_log_loss(y_true, y_pred):
    """
    Parameters
    ----------
    y_true : array, shape = [n_samples]
        true class, integers in [0, n_classes - 1]
    y_pred : array, shape = [n_samples, n_classes]

    Returns
    -------
    loss : float
    """
    eps = 1e-15
    predictions = np.clip(y_pred, eps, 1 - eps)
    # normalize row sums to 1
    predictions /= predictions.sum(axis=1)[:, np.newaxis]
    actual = np.zeros(y_pred.shape)
    n_samples = actual.shape[0]
    actual[np.arange(n_samples), y_true.astype(int)] = 1
    vectsum = np.sum(actual * np.log(predictions))
    loss = -1.0 / n_samples * vectsum
    return loss
I see that there are multiple options, score_func, loss_func and make_scorer. I tried using make_scorer with greater_is_better=False and also tried the loss_func parameter but it seems to still use the .predict method. How can I get around this problem?
UPDATE - if I set needs_threshold=True I get a multi-class error. Am I correct to understand multi-class is not supported in this case? If yes, can someone suggest a workaround?
Thanks.
The top answer to this question:
Pass estimator to custom score function via sklearn.metrics.make_scorer
might have what you need. One can define a scorer that takes as arguments a classifier clf, a feature array X, and targets y_true, and feeds the result of the clf.predict_proba() method to a scoring function that returns the error. As a hint, for binary classification, you probably need to use
clf.predict_proba(X)[:,1]
This worked for my needs (a normalized Gini score). For some reason, I couldn't get sklearn's metrics.make_scorer to work with my custom function that needs probabilities.
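GridSearchCV's scoring argument also accepts a plain callable with signature (estimator, X, y) that returns a number where greater is better, which sidesteps make_scorer entirely. A sketch, assuming the iris dataset and LogisticRegression as placeholders, with sklearn.metrics.log_loss standing in for the question's multiclass_log_loss (which could be swapped in, passing y_true as a NumPy array):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV

def neg_log_loss_scorer(estimator, X, y_true):
    # Use class probabilities, not predict(); negate because greater must be better
    proba = estimator.predict_proba(X)
    return -log_loss(y_true, proba, labels=estimator.classes_)

X, y = load_iris(return_X_y=True)
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.1, 1.0, 10.0]},
                      scoring=neg_log_loss_scorer, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

For plain log loss, scikit-learn also ships the built-in scoring="neg_log_loss", which should select the same parameters.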
