I am training a PyTorch model to perform binary classification. My minority class makes up about 10% of the data, so I want to use a weighted loss function. The docs for BCELoss and CrossEntropyLoss say that I can use a 'weight' for each sample.
However, when I declare CE_loss = nn.BCELoss() or nn.CrossEntropyLoss() and then do CE_Loss(output, target, weight=batch_weights), where output, target, and batch_weights are Tensors of batch_size, I get the following error message:
forward() got an unexpected keyword argument 'weight'
Another way you could accomplish your goal is to use reduction=none when initializing the loss and then multiply the resulting tensor by your weights before computing the mean.
e.g.
loss = torch.nn.BCELoss(reduction='none')
model = torch.sigmoid
weights = torch.rand(10,1)
inputs = torch.rand(10,1)
targets = torch.rand(10,1)
intermediate_losses = loss(model(inputs), targets)
final_loss = torch.mean(weights*intermediate_losses)
Of course for your scenario you still would need to calculate the weights tensor. But hopefully this helps!
Could it be that you want to apply separate fixed weights to all elements of class 0 and class 1 in your dataset? It is not clear what value you are passing for batch_weights here. If so, then that is not what the weight parameter in BCELoss does. The weight parameter expects you to pass a separate weight for every ELEMENT in the dataset, not for every CLASS. There are several ways around this. You could construct a weight table for every element. Alternatively, you could use a custom loss function that does what you want:
def BCELoss_class_weighted(weights):
def loss(input, target):
input = torch.clamp(input,min=1e-7,max=1-1e-7)
bce = - weights[1] * target * torch.log(input) - (1 - target) * weights[0] * torch.log(1 - input)
return torch.mean(bce)
return loss
Note that it is important to add a clamp to avoid numerical instability.
HTH Jeroen
the issue is wherein your providing the weight parameter. As it is mentioned in the docs, here, the weights parameter should be provided during module instantiation.
For example, something like,
from torch import nn
weights = torch.FloatTensor([2.0, 1.2])
loss = nn.BCELoss(weights=weights)
You can find a more concrete example here or another helpful PT forum discussion here.
you need to pass weights like below:
CE_loss = CrossEntropyLoss(weight=[…])
This is similar to the idea of #Jeroen Vuurens, but the class weights are determined by the target mean:
y_train_mean = y_train.mean()
bi_cls_w2 = 1/(1 - y_train_mean)
bi_cls_w1 = 1/y_train_mean - bi_cls_w2
bce_loss = nn.BCELoss(reduction='none')
loss_fun = lambda pred, target: ((bi_cls_w1*target + bi_cls_w2) * bce_loss(pred, target)).mean()
Related
I try to write a cross entropy loss function by myself. My loss function gives the same loss value as the official one, but when i use my loss function in the code instead of official cross entropy loss function, the code does not converge. When i use the official cross entropy loss function, the code converges. Here is my code, please give me some suggestions. Thanks very much
The input 'out' is a tensor (B*C) and 'label' contains class indices (1 * B)
class MylossFunc(nn.Module):
def __init__(self):
super(MylossFunc, self).__init__()
def forward(self, out, label):
out = torch.nn.functional.softmax(out, dim=1)
n = len(label)
loss = torch.FloatTensor([0])
loss = Variable(loss, requires_grad=True)
tmp = torch.log(out)
#print(out)
torch.scalar_tensor(-100)
for i in range(n):
loss = loss - torch.max(tmp[i][label[i]], torch.scalar_tensor(-100) )/n
loss = torch.sum(loss)
return loss
Instead of using torch.softmax and torch.log, you should use torch.log_softmax, otherwise your training will become unstable with nan values everywhere.
This happens because when you take the softmax of your logits using the following line:
out = torch.nn.functional.softmax(out, dim=1)
you might get a zero in one of the components of out, and when you follow that by applying torch.log it will result in nan (since log(0) is undefined). That is why torch (and other common libraries) provide a single stable operation, log_softmax, to avoid the numerical instabilities that occur when you use torch.softmax and torch.log individually.
I am trying to solve one multilabel problem with 270 labels and i have converted target labels into one hot encoded form. I am using BCEWithLogitsLoss(). Since training data is unbalanced, I am using pos_weight argument but i am bit confused.
pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.
Do i need to give total count of positive values of each label as a tensor or they mean something else by weights?
The PyTorch documentation for BCEWithLogitsLoss recommends the pos_weight to be a ratio between the negative counts and the positive counts for each class.
So, if len(dataset) is 1000, element 0 of your multihot encoding has 100 positive counts, then element 0 of the pos_weights_vector should be 900/100 = 9. That means that the binary crossent loss will behave as if the dataset contains 900 positive examples instead of 100.
Here is my implementation:
(new, based on this post)
pos_weight = (y==0.).sum()/y.sum()
(original)
def calculate_pos_weights(class_counts):
pos_weights = np.ones_like(class_counts)
neg_counts = [len(data)-pos_count for pos_count in class_counts]
for cdx, pos_count, neg_count in enumerate(zip(class_counts, neg_counts)):
pos_weights[cdx] = neg_count / (pos_count + 1e-5)
return torch.as_tensor(pos_weights, dtype=torch.float)
Where class_counts is just a column-wise sum of the positive samples. I posted it on the PyTorch forum and one of the PyTorch devs gave it his blessing.
Maybe is a little late, but here is how I calculate the same. Looking into the documentation:
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3.
So an easy way to calcule the positive weight is using the tensor methods with your label vector "y", in my case train_dataset.data.y. And then calculating the total negative labels.
num_positives = torch.sum(train_dataset.data.y, dim=0)
num_negatives = len(train_dataset.data.y) - num_positives
pos_weight = num_negatives / num_positives
Then the weights can be used easily as:
criterion = torch.nn.BCEWithLogitsLoss(pos_weight = pos_weight)
PyTorch solution
Well, actually I have gone through docs and you can simply use pos_weight indeed.
This argument gives weight to positive sample for each class, hence if you have 270 classes you should pass torch.Tensor with shape (270,) defining weight for each class.
Here is marginally modified snippet from documentation:
# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target) # -log(sigmoid(0.9))
Self-made solution
When it comes to weighting, there is no built-in solution, but you may code one yourself really easily:
import torch
class WeightedMultilabel(torch.nn.Module):
def __init__(self, weights: torch.Tensor):
self.loss = torch.nn.BCEWithLogitsLoss()
self.weights = weights.unsqueeze()
def forward(outputs, targets):
return self.loss(outputs, targets) * self.weights
Tensor has to be of the same length as the number of classes in your multilabel classification (270), each giving weight for your specific example.
Calculating weights
You just add labels of every sample in your dataset, divide by the minimum value and inverse at the end.
Sort of snippet:
weights = torch.zeros_like(dataset[0])
for element in dataset:
weights += element
weights = 1 / (weights / torch.min(weights))
Using this approach class occurring the least will give normal loss, while others will have weights smaller than 1.
It might cause some instability during training though, so you might want to experiment with those values a little (maybe log transform instead of linear?)
Other approach
You may think about upsampling/downsampling (though this operation is complicated as you would add/delete other classes as well, so advanced heuristics would be needed I think).
Just to provide a quick revision on #crypdick's answer, this implementation of the function worked for me:
def calculate_pos_weights(class_counts,data):
pos_weights = np.ones_like(class_counts)
neg_counts = [len(data)-pos_count for pos_count in class_counts]
for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
pos_weights[cdx] = neg_count / (pos_count + 1e-5)
return torch.as_tensor(pos_weights, dtype=torch.float)
Where data is the dataset you're trying to apply weights to.
I would like to define a custom cost function
def custom_objective(y_true, y_pred):
....
return L
that will depend not only on y_true and y_pred, but on some feature of the corresponding x that produced y_pred. The only way I can think of doing this is to "hide" the relevant features in y_true, so that y_true = [usual_y_true, relevant_x_features], or something like that.
There are two main problems I am having with implementing this:
1) Changing the shape of y_true means I need to pad y_pred with some garbage so that their shapes are the same. I can do this by modyfing the last layer of my model
2) I used data augmentation like so:
datagen = ImageDataGenerator(preprocessing_function=my_augmenter)
where my_augmenter() is the function that should also give me the relevant x features to use in custom_objective() above. However, training with
model.fit_generator(datagen.flow(x_train, y_train, batch_size=1), ...)
doesn't seem to give me access to the features calculated with my_augmenter.
I suppose I could hide the features in the augmented x_train, copy them right away in my model setup, and then feed them directly into y_true or something like that, but surely there must be a better way to do this?
Maybe you could create a two part model with:
Inner model: original model that predicts desired outputs
Outer model:
Takes y_true data as inputs
Takes features as inputs
Outputs the loss itself (instead of predicted data)
So, suppose you already have the originalModel defined. Let's define the outer model.
#this model has three inputs:
originalInputs = originalModel.input
yTrueInputs = Input(shape_of_y_train)
featureInputs = Input(shape_of_features)
#the original outputs will become an input for a custom loss layer
originalOutputs = originalModel.output
#this layer contains our custom loss
loss = Lambda(innerLoss)([originalOutputs, yTrueInputs, featureInputs])
#outer model
outerModel = Model([originalInputs, yTrueInputs, featureInputs], loss)
Now, our custom inner loss:
def innerLoss(x):
y_pred = x[0]
y_true = x[1]
features = x[2]
.... calculate and return loss here ....
Now, for this model that already contains a custom loss "inside" it, we don't actually want a final loss function, but since keras demands it, we will use the final loss as just return y_pred:
def finalLoss(true,pred):
return pred
This will allow us to train passing just a dummy y_true.
But of course, we also need a custom generator, otherwise we can't get the features.
Consider you already have originalGenerator =datagen.flow(x_train, y_train, batch_size=1) defined:
def customGenerator(originalGenerator):
while True: #keras needs infinite generators
x, y = next(originalGenerator)
features = ____extract features here____(x)
yield (x,y,features), y
#the last y will be a dummy output, necessary but not used
You could also, if you want the extra functionality of randomizing batch order and use multiprocessing, implement a class CustomGenerator(keras.utils.Sequence) following the same logic. The help page shows how.
So, let's compile and train the outer model (this also trains the inner model so you can use it later for predicting):
outerModel.compile(optimizer=..., loss=finalLoss)
outerModel.fit_generator(customGenerator(originalGenerator), batchesInOriginalGenerator,
epochs=...)
It’s known that sparse_categorical_crossentropy in keras can get the average loss function among each category. But what if only one certain category was I concerned most? Like if I want to define the precision(=TP/(TP+FP)) based on this category as loss function, how can I write it? Thanks!
My codes were like:
from keras import backend as K
def my_loss(y_true,y_pred):
y_true = K.cast(y_true,"float32")
y_pred = K.cast(K.argmax(y_pred),"float32")
nominator = K.sum(K.cast(K.equal(y_true,y_pred) & K.equal(y_true, 0),"float32"))
denominator = K.sum(K.cast(K.equal(y_pred,0),"float32"))
return -(nominator + K.epsilon()) / (denominator + K.epsilon())
And the error is like:
argmax is not differentiable
I don't recommend you to use precision as the loss function.
It is not differentiable that can't be set as a loss function for nn.
you can max it by predicting all the instance as class negative, that makes no sense.
One of the alternative solution is using F1 as the loss function, then tuning the probability cut-off manually for obtaining a desirable level of precision as well as recall is not too low.
You can pass to the fit method a parameter class_weight where you determine which classes are more important.
It should be a dictionary:
{
0: 1, #class 0 has weight 1
1: 0.5, #class 1 has half the importance of class 0
2: 0.7, #....
...
}
Custom loss
If that is not exactly what you need, you can create loss functions like:
import keras.backend as K
def customLoss(yTrue,yPred):
create operations with yTrue and yPred
- yTrue = the true output data (equal to y_train in most examples)
- yPred = the model's calculated output
- yTrue and yPred have exactly the same shape: (batch_size,output_dimensions,....)
- according to the output shape of the last layer
- also according to the shape of y_train
all operations must be like +, -, *, / or operations from K (backend)
return someResultingTensor
You cannot used argmax as it is not differentiable. That means that backprop will not work if loss function can't be differentiated.
Instead of using argmax, do y_true * y_pred.
I am trying to use a custom scoring function that calculates multi-class log loss with the ground truth and predict_proba y array. Is there a way to make GridSearchCV use this scoring function?
def multiclass_log_loss(y_true, y_pred):
Parameters
----------
y_true : array, shape = [n_samples]
true class, intergers in [0, n_classes - 1)
y_pred : array, shape = [n_samples, n_classes]
Returns
-------
loss : float
"""
eps=1e-15
predictions = np.clip(y_pred, eps, 1 - eps)
# normalize row sums to 1
predictions /= predictions.sum(axis=1)[:, np.newaxis]
actual = np.zeros(y_pred.shape)
n_samples = actual.shape[0]
actual[np.arange(n_samples), y_true.astype(int)] = 1
vectsum = np.sum(actual * np.log(predictions))
loss = -1.0 / n_samples * vectsum
return loss
I see that there are multiple options, score_func, loss_func and make_scorer. I tried using make_scorer with greater_is_better=False and also tried the loss_func parameter but it seems to still use the .predict method. How can I get around this problem?
UPDATE - if I set needs_threshold=True I get a multi-class error. Am I correct to understand multi-class is not supported in this case? If yes, can someone suggest a workaround?
Thanks.
The top answer to this question:
Pass estimator to custom score function via sklearn.metrics.make_scorer
might have what you need. One can define a scorer that takes as arguments a classifier clf, feature array X, and targets y_true, and feed the result of the clf.predict_proba() method to a scoring function that returns the error. As a hint, for binary classification, you probably need to use
clf.predict_proba(X)[:,1]
This worked for my needs (a normalized Gini score). For some reason, I couldn't get sklearn's metrics.make_scorer to work with my custom function that needs probabilities.