torchmetric calculate accuracy with threshold - pytorch

How does torchmetrics.Accuracy threshold keyword work? I have the following setup:
import torch, torchmetrics
preds = torch.tensor([[0.3600, 0.3200, 0.3200]])
target = torch.tensor([0])
torchmetrics.functional.accuracy(preds, target, threshold=0.5, num_classes=3)
None of the values in preds have a probability higher than 0.5, why is it giving 100% accuracy?

threshold=0.5 sets each probability under 0.5 to 0. It is only used when you are dealing with binary classification (which is not your case, since num_classes=3) or multilabel classification (which does not seem to be the case either, because multiclass is not set). Therefore threshold is not actually involved.
In your case, preds holds the prediction for a single observation. Its largest probability (0.36) is at position 0, so argmax(preds) and target are both equal to 0, and the accuracy is 1 because the prediction for the (only) observation is correct.
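To make the argmax behaviour concrete, here is a minimal sketch in plain PyTorch. It is not the torchmetrics implementation, only an illustration of the comparison described above, and the variable names predicted_class and accuracy are my own:

```python
import torch

preds = torch.tensor([[0.3600, 0.3200, 0.3200]])
target = torch.tensor([0])

# Multiclass accuracy compares the index of the largest probability
# with the target; no threshold enters this computation.
predicted_class = preds.argmax(dim=1)
accuracy = (predicted_class == target).float().mean()
print(predicted_class, accuracy)
```

The largest entry wins regardless of how small it is, which is why 0.36 is enough for a correct prediction here.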

You could use a multilabel target shape with subset_accuracy=True as explained here
import torch, torchmetrics
target = torch.tensor([0,1,2])
target = torch.nn.functional.one_hot(target)
preds = torch.tensor([[0.3600, 0.3200, 0.3200],
                      [0.3000, 0.4000, 0.3000],
                      [0.3000, 0.2000, 0.5000]])
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.6)
Decreasing threshold to 0.5:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.5)
Decreasing threshold to 0.35:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.35)
Unfortunately, this does not work if any other probability is higher than the threshold, because it will then think you are predicting multiple labels. For instance, a threshold of 0.0 will give:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.0)
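If what you want is top-1 accuracy that only counts a prediction when its probability clears the threshold, one option is to compute it by hand. The helper thresholded_accuracy below is a hypothetical name, not part of torchmetrics; it is a sketch that assumes preds holds one row of class probabilities per sample:

```python
import torch

def thresholded_accuracy(preds, target, threshold):
    """Fraction of samples whose top class is correct AND at least `threshold` confident."""
    top_prob, top_class = preds.max(dim=1)
    hits = (top_class == target) & (top_prob >= threshold)
    return hits.float().mean().item()

preds = torch.tensor([[0.3600, 0.3200, 0.3200],
                      [0.3000, 0.4000, 0.3000],
                      [0.3000, 0.2000, 0.5000]])
target = torch.tensor([0, 1, 2])

for t in (0.6, 0.5, 0.35, 0.0):
    print(t, thresholded_accuracy(preds, target, t))
```

Because only the top class is checked, a low threshold such as 0.0 simply falls back to plain argmax accuracy instead of being read as several predicted labels.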


binary_cross_entropy_with_logits: weight vs pos_weight, what are the differences?

According to Pytorch's documentation on binary_cross_entropy_with_logits, they are described as:
weight (Tensor, optional) – a manual rescaling weight if provided it’s
repeated to match input tensor shape
pos_weight (Tensor, optional) – a weight of positive examples. Must be
a vector with length equal to the number of classes.
What are their differences? The explanation is quite vague. If I understand correctly, weight is an individual weight for each pixel (class), whereas pos_weight is the weight for everything that is not background (negative pixel/zero)?
What if I set both parameters? For example:
import torch
from torch.nn.functional import binary_cross_entropy_with_logits
preds = torch.randn(4, 100, 50, 50)
target = torch.zeros((4, 100, 50, 50))
target[:, :, 10:20, 10:20] = 1
pos_weight = target * 100
pos_weight[pos_weight < 100] = 1
weight = target * 100
weight[weight < 100] = 1
loss1 = binary_cross_entropy_with_logits(preds, target, pos_weight=pos_weight, weight=weight)
loss2 = binary_cross_entropy_with_logits(preds, target, pos_weight=pos_weight)
loss3 = binary_cross_entropy_with_logits(preds, target, weight=weight)
loss1, loss2, and loss3, which one is the correct usage?
On the same subject, I was reading a paper that said:
To deal with the unbalanced negative and positive data, we dilate each
keypoint by 10 pixels and use weighted cross-entropy loss. The weight
for each keypoint is set to 100 while for non-keypoint pixels it is
set to 1.
Which one is the correct usage according to the paper?
Thanks in advance for any explanation!
The pos_weight parameter lets you weight the positive examples, and thus control the trade-off between recall and precision. A detailed explanation, along with the explicit math expression, can be found on this thread.
On the other hand, weight lets you weigh the individual elements of a given batch.
Here is a minimal example:
>>> target = torch.ones([10, 64], dtype=torch.float32)
>>> output = torch.full([10, 64], 1.5)
>>> criterion = torch.nn.BCEWithLogitsLoss() # w/o weight
>>> criterion(output, target)
tensor(0.2014) # all batch elements weighted equally
>>> weight = torch.rand(10,1)
>>> criterion = torch.nn.BCEWithLogitsLoss(weight=weight) # w/ weight
>>> criterion(output, target)
tensor(0.0908) # per element weighting
Which is identical to doing:
>>> criterion = torch.nn.BCEWithLogitsLoss(reduction='none')
>>> torch.mean(criterion(output, target)*weight)
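To see how the two arguments enter the loss, the sketch below re-implements the element-wise formula from the PyTorch documentation, -w * [p * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))], and compares it with the built-in function. The helper manual_bce and the small tensor shapes are illustrative assumptions, not library code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 3, 4, 4)
target = torch.zeros(2, 3, 4, 4)
target[:, :, 1:3, 1:3] = 1

# Weight map in the style of the question: 100 on positive pixels, 1 elsewhere.
weight_map = target * 99 + 1

def manual_bce(logits, target, weight=None, pos_weight=None):
    """Mean of the element-wise weighted BCE-with-logits terms."""
    w = torch.ones_like(logits) if weight is None else weight
    p = torch.ones_like(logits) if pos_weight is None else pos_weight
    pos_term = p * target * F.logsigmoid(logits)     # only this term is scaled by pos_weight
    neg_term = (1 - target) * F.logsigmoid(-logits)  # log(1 - sigmoid(x))
    return (-w * (pos_term + neg_term)).mean()

loss_both = F.binary_cross_entropy_with_logits(
    logits, target, weight=weight_map, pos_weight=weight_map)
loss_pos = F.binary_cross_entropy_with_logits(logits, target, pos_weight=weight_map)
loss_w = F.binary_cross_entropy_with_logits(logits, target, weight=weight_map)

print(loss_pos, manual_bce(logits, target, pos_weight=weight_map))
print(loss_w, manual_bce(logits, target, weight=weight_map))
print(loss_both)
```

With hard 0/1 targets and a map that is 100 exactly where the target is 1, the pos_weight-only and weight-only losses should agree up to floating-point error, since the negative term vanishes on positive pixels. Passing the map to both arguments scales positive pixels by 100 twice, which is probably not what the quoted paper intends.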

Change learning rate within minibatch - keras

I have a problem with imbalanced labels, for example 90% of the data have the label 0 and the rest 10% have the label 1.
I want to train the network with minibatches. So I want the optimizer to give the examples labeled 1 a learning rate (or somehow scale their gradients to be) 9 times greater than for those labeled 0.
Is there any way of doing that?
The problem is that the whole training process is done in this line:
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0)
Is there a way to change the fit method at a low level?
You can try using the class_weight parameter of keras.
From keras doc:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only).
Example of using it with imbalanced data (the keys must be the integer class indices):
class_weights = {0: 1, 1: 10}
history = model.fit(trainX, trainY, epochs=1, batch_size=minibatch_size, validation_data=(valX, valY), verbose=0, class_weight=class_weights)
Full example:
import numpy as np

# Examine the class label imbalance.
# You can use your_df['label_class_column'] or just the trainY values.
neg, pos = np.bincount(your_df['label_class_column'])
total = neg + pos
print('Examples:\n Total: {}\n Positive: {} ({:.2f}% of total)\n'.format(
    total, pos, 100 * pos / total))
# Scaling by total/2 helps keep the loss to a similar magnitude.
# The sum of the weights of all examples stays the same.
weight_for_0 = (1 / neg)*(total)/2.0
weight_for_1 = (1 / pos)*(total)/2.0
class_weight = {0: weight_for_0, 1: weight_for_1}
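As a quick sanity check of this weighting scheme, here is a small sketch on toy labels with the 90/10 split from the question. The array train_y is made-up illustration data, not the asker's dataset:

```python
import numpy as np

# Toy labels with a 90/10 imbalance (assumed for illustration).
train_y = np.array([0] * 90 + [1] * 10)

neg, pos = np.bincount(train_y)
total = neg + pos

# Same formula as above, written as total / (2 * count).
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}

ratio = class_weight[1] / class_weight[0]
weighted_total = neg * class_weight[0] + pos * class_weight[1]
print(class_weight, ratio, weighted_total)
```

The minority class ends up weighted about 9 times more heavily than the majority class, which matches the factor asked for, and the weighted example count stays equal to the original total.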

Set custom probability threshold for Keras CNN

Surprised I can't quickly find this info online--
After training my CNN, I grabbed the predictions by running:
predictions = model.predict_generator(test_generator, steps=num_test)
Rather than use
predicted_classes = np.argmax(predictions, axis=1)
I'd like to set a threshold of anything greater than 0.3 probability being labeled as class 1, rather than 0.5. Is there a quick and easy way to do this?
If it is a binary classification you could try:
while i < len(predictions):
This should "round" to class 1 if the predicted value is bigger than 0.3
There is no place in Keras to set such a threshold for predictions; Keras itself simply uses 0.5 to compute the binary_accuracy metric. Your only option is to threshold the predictions manually:
predictions = model.predict_generator(test_generator, steps=num_test)
classes = predictions > 0.3
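How the threshold is applied depends on the shape of the model output. The sketch below uses hand-written NumPy arrays in place of real model.predict output (an assumption for illustration) and covers both a single sigmoid unit and a two-way softmax:

```python
import numpy as np

threshold = 0.3

# Case 1: one sigmoid output per sample, shape (n_samples, 1).
sigmoid_preds = np.array([[0.10], [0.35], [0.80]])
classes_sigmoid = (sigmoid_preds[:, 0] > threshold).astype(int)

# Case 2: two-way softmax output, shape (n_samples, 2).
# Threshold the class-1 column instead of taking the argmax.
softmax_preds = np.array([[0.90, 0.10], [0.65, 0.35], [0.20, 0.80]])
classes_softmax = (softmax_preds[:, 1] > threshold).astype(int)

print(classes_sigmoid, classes_softmax)
```

In the softmax case, np.argmax would label the second sample as class 0, whereas the 0.3 threshold labels it as class 1.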

How to calculate unbalanced weights for BCEWithLogitsLoss in pytorch

I am trying to solve a multilabel problem with 270 labels, and I have converted the target labels into one-hot encoded form. I am using BCEWithLogitsLoss(). Since the training data is unbalanced, I am using the pos_weight argument, but I am a bit confused.
pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.
Do I need to give the total count of positive values of each label as a tensor, or do they mean something else by weights?
The PyTorch documentation for BCEWithLogitsLoss recommends the pos_weight to be a ratio between the negative counts and the positive counts for each class.
So, if len(dataset) is 1000 and element 0 of your multi-hot encoding has 100 positive counts, then element 0 of the pos_weights_vector should be 900/100 = 9. That means that the binary cross-entropy loss will behave as if the dataset contained 900 positive examples instead of 100.
Here is my implementation:
(new, based on this post)
pos_weight = (y==0.).sum()/y.sum()
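import numpy as np
import torch

def calculate_pos_weights(class_counts):
    # float dtype so the ratios are not truncated to integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    # data is the dataset the class counts were computed from
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)
    return torch.as_tensor(pos_weights, dtype=torch.float)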
Where class_counts is just a column-wise sum of the positive samples. I posted it on the PyTorch forum and one of the PyTorch devs gave it his blessing.
Maybe this is a little late, but here is how I calculate the same thing. Looking into the documentation:
For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3.
So an easy way to calculate the positive weight is to use tensor methods on your label tensor "y": count the positive labels per class, then derive the negative labels from the total.
num_positives = torch.sum(y, dim=0)
num_negatives = len(y) - num_positives
pos_weight = num_negatives / num_positives
Then the weights can be used easily as:
criterion = torch.nn.BCEWithLogitsLoss(pos_weight = pos_weight)
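Here is the same computation as a self-contained sketch on toy data. The label tensor y and the all-zero logits are made up for illustration, and the clamp is an added safeguard (my assumption, not from the documentation) against classes with no positive examples:

```python
import torch

# Toy multi-hot labels: 4 samples, 2 classes (illustrative data only).
y = torch.tensor([[1., 0.],
                  [0., 0.],
                  [1., 1.],
                  [0., 0.]])

num_positives = y.sum(dim=0)                # positives per class
num_negatives = y.shape[0] - num_positives  # negatives per class
# clamp avoids division by zero for a class that never occurs
pos_weight = num_negatives / num_positives.clamp(min=1)

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.zeros(4, 2)  # stand-in for raw network outputs
loss = criterion(logits, y)
print(pos_weight, loss)
```

The result has one entry per class, which is the shape pos_weight expects for a multilabel target of shape (batch, num_classes).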
PyTorch solution
Well, actually I have gone through the docs, and you can indeed simply use pos_weight.
This argument gives a weight to the positive samples of each class, so if you have 270 classes you should pass a torch.Tensor of shape (270,) defining the weight for each class.
Here is a marginally modified snippet from the documentation:
# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target) # -log(sigmoid(0.9))
Self-made solution
When it comes to weighting, there is no built-in solution, but you may code one yourself really easily:
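import torch

class WeightedMultilabel(torch.nn.Module):
    def __init__(self, weights: torch.Tensor):
        super().__init__()
        # reduction="none" keeps the per-element losses so each class can be weighted
        self.loss = torch.nn.BCEWithLogitsLoss(reduction="none")
        self.weights = weights.unsqueeze(0)  # shape (1, num_classes)

    def forward(self, outputs, targets):
        return (self.loss(outputs, targets) * self.weights).mean()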
The weights tensor has to have the same length as the number of classes in your multilabel classification (270), each entry giving the weight for that class.
Calculating weights
You just add up the labels of every sample in your dataset, divide by the minimum value, and invert at the end.
Sort of snippet:
weights = torch.zeros_like(dataset[0])
for element in dataset:
    weights += element
weights = 1 / (weights / torch.min(weights))
Using this approach, the class occurring the least will get the normal loss, while the others will have weights smaller than 1.
It might cause some instability during training though, so you might want to experiment with those values a little (maybe log transform instead of linear?)
Other approach
You may think about upsampling/downsampling (though this operation is complicated as you would add/delete other classes as well, so advanced heuristics would be needed I think).
Just to provide a quick revision of @crypdick's answer, this implementation of the function worked for me:
import numpy as np
import torch

def calculate_pos_weights(class_counts, data):
    # float dtype so the ratios are not truncated to integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)
    return torch.as_tensor(pos_weights, dtype=torch.float)
Where data is the dataset you're trying to apply weights to.

Micro F1 score in Scikit-Learn with Class imbalance

I have some class imbalance and a simple baseline classifier that assigns the majority class to every sample:
from sklearn.metrics import precision_score, recall_score, confusion_matrix
y_true = [0,0,0,1]
y_pred = [0,0,0,0]
confusion_matrix(y_true, y_pred)
This yields
[[3, 0],
[1, 0]]
This means TP=3, FP=1, FN=0.
So far, so good. Now I want to calculate the micro average of precision and recall.
precision_score(y_true, y_pred, average='micro') # yields 0.75
recall_score(y_true, y_pred, average='micro') # yields 0.75
I am Ok with the precision, but why is recall not 1.0? How can they ever be the same in this example, given that FP > 0 and FN == 0? I know it must have to do with the micro averaging, but I can't wrap my head around this one.
Yes, it's because of micro-averaging. See the documentation here to know how it's calculated:
Note that if all labels are included, “micro”-averaging in a
multiclass setting will produce precision, recall and f-score that are all
identical to accuracy.
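As you can see on the linked page, micro-averaged precision and recall are defined as P(y, y-hat) and R(y, y-hat), with y the set of true (sample, label) pairs and y-hat the set of predicted (sample, label) pairs,
where R(y, y-hat) is:
R(y, y-hat) = |y ∩ y-hat| / |y|
In a multiclass problem every sample has exactly one true and one predicted label, so |y| = |y-hat| and both quantities reduce to the fraction of correct predictions.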
So in your case, Recall-micro will be calculated as
R = number of correct predictions / total predictions = 3/4 = 0.75
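The same result can be reproduced by pooling the per-class counts by hand. This is a plain-Python sketch of micro-averaging, not scikit-learn's implementation, and the variable names are my own:

```python
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
labels = sorted(set(y_true) | set(y_pred))

tp = fp = fn = 0
for label in labels:
    # Treat each class in turn as the "positive" class and pool the counts.
    tp += sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp += sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn += sum(t == label and p != label for t, p in zip(y_true, y_pred))

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(tp, fp, fn)                               # 3 1 1
print(micro_precision, micro_recall, accuracy)  # 0.75 0.75 0.75
```

The misclassified sample counts as a false positive for class 0 and as a false negative for class 1, so the pooled FP and FN totals are always equal in a multiclass problem. That is why micro precision, micro recall and accuracy coincide.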
