I am having a hard time understanding the following scenario. I have an output probability of 0.0 for every class, which I would expect to mean that metrics such as F1 score, accuracy and recall are all zero. However, I get the following:
import torch, torchmetrics
preds = torch.tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
target = torch.tensor([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
print("F1: ", torchmetrics.functional.f1_score(preds, target))
print("Accuracy: ", torchmetrics.functional.accuracy(preds, target))
print("Recall: ", torchmetrics.functional.recall(preds, target))
print("Precision: ", torchmetrics.functional.precision(preds, target))
Output:
F1: tensor(0.)
Accuracy: tensor(0.6667)
Recall: tensor(0.)
Precision: tensor(0.)
Why is accuracy 0.6667? I would expect all outputs to be 0.0.
Your preds is an array of probabilities for a multi-label classification problem, so each label is scored independently as present or absent.
To make it simpler, consider a single sample:
preds = torch.tensor([[0., 0., 0.]]) # multi-labels [label1, label2, label3]
target = torch.tensor([[1, 0, 0]])
There are 2 true negatives: the classifier predicts that label2 and label3 are absent, and they are indeed absent.
There are 0 true positives: the classifier does not predict the presence of any label.
There is 1 false negative: the classifier predicts that label1 is absent, but it is actually present.
There are 0 false positives: the classifier never predicts a label that is actually absent.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (0 + 2) / (0 + 2 + 0 + 1) = 2/3 = 0.6667
You can read here more about different metrics and their calculations.
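The same counting can be checked by hand on the full 3x3 example from the question. The following is a minimal plain-Python sketch, not the torchmetrics implementation; the 0.5 threshold and the helper name confusion_counts are assumptions made for illustration.

```python
# Multi-label confusion counts for the 3x3 example from the question.
preds = [[0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
target = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

THRESHOLD = 0.5  # assumed cut-off for turning a probability into a 0/1 prediction


def confusion_counts(preds, target, threshold=THRESHOLD):
    """Count TP, TN, FP, FN over every (sample, label) pair."""
    tp = tn = fp = fn = 0
    for pred_row, target_row in zip(preds, target):
        for p, t in zip(pred_row, target_row):
            predicted = 1 if p >= threshold else 0
            if predicted == 1 and t == 1:
                tp += 1
            elif predicted == 0 and t == 0:
                tn += 1
            elif predicted == 1 and t == 0:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(preds, target)
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Guard against division by zero when nothing is predicted positive.
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy:", round(accuracy, 4))
print("Recall:", recall, "Precision:", precision)
```

Because the six correctly predicted "absent" labels count as true negatives, accuracy stays above zero even though no positive label is ever found, while recall and precision, which ignore true negatives, are zero.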
I'm working on a classification problem. The number of classes is 5. I have a ground truth vector that has the shape (3) instead of 1. The values in this target vector are the possible classes and the predicted vector is of the shape (1x5) which holds the softmax scores for all the classes.
For example:
predicted_vector = tensor([0.0669, 0.1336, 0.3400, 0.3392, 0.1203])
ground_truth = tensor([3,2,5])
For the above illustration, a typical argmax operation would result in declaring class 3 as the predicted class (0.34) but I want the model to reward even if the argmax class is any of 3,2, or 5.
Which loss function is recommended for such a use case?
As jodag pointed out in the comments you can try to treat it as a multi-label classification problem.
So [[0, 1, 2], [0, 2, 4], [3, 3, 3]] will be transformed into:
tensor([[1., 1., 1., 0., 0.],
[1., 0., 1., 0., 1.],
[0., 0., 0., 1., 0.]])
Here is an example of how this can be implemented:
import torch
from torch.nn import BCELoss
predicted_vector = torch.rand((3, 5))
ground_truth = torch.LongTensor([[0, 1, 2], [0, 2, 4], [3, 3, 3]])
labels_onehot = torch.zeros_like(predicted_vector)
labels_onehot.scatter_(1, ground_truth, 1)
loss_fn = BCELoss()
loss = loss_fn(predicted_vector, labels_onehot)
You can also assign different weights to different labels.
For this problem, a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay so the model isn't penalised that heavily.
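This is a typical single-label, multi-class problem, but with probabilistic (“soft”) labels. CrossEntropyLoss should be used, and it should be given the raw logits: do not apply softmax() to the model output first, because CrossEntropyLoss applies log-softmax internally.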
In this example, the (soft) target might be a probability of 0.7 for class 3, a probability of 0.2 for class 2, and a probability of 0.1 for class 5 (and zero for everything else).
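To make the soft-label idea concrete, here is a minimal plain-Python sketch of cross-entropy against a probability vector, loss = -sum_i t_i * log_softmax(z)_i. It is not PyTorch code: the helper names and the example logits are hypothetical, and classes 1 to 5 are assumed to map to indices 0 to 4. Recent PyTorch releases (1.10 and later, as far as I know) let torch.nn.CrossEntropyLoss take such class probabilities directly as the target.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy between raw logits and a soft (probabilistic) target."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_probs))


# Soft target from the text: 0.7 for class 3, 0.2 for class 2, 0.1 for class 5.
# Assumption: classes are numbered 1..5, so class k lives at index k - 1.
soft_target = [0.0, 0.2, 0.7, 0.0, 0.1]

# Hypothetical model outputs (raw logits, no softmax applied).
logits_favouring_class_3 = [0.0, 1.0, 3.0, 0.0, 0.5]
logits_favouring_class_1 = [3.0, 0.0, 0.0, 1.0, 0.0]

loss_good = soft_cross_entropy(logits_favouring_class_3, soft_target)
loss_bad = soft_cross_entropy(logits_favouring_class_1, soft_target)
print("loss when class 3 scores highest:", loss_good)
print("loss when class 1 scores highest:", loss_bad)
```

The loss is lowest when the predicted distribution puts most of its mass on class 3 and some on classes 2 and 5, which is the "reward the acceptable classes, but prefer the true one" behaviour described above.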
I am using Keras Tuner with RandomSearch() to tune the hyperparameters of my regression model. While I can tune over "relu" and "selu", I am unable to do the same for Leaky ReLU. I understand that "relu" and "selu" work because string aliases are available for them, and that no string alias is available for Leaky ReLU. I tried passing a callable Leaky ReLU object (see my example below), but it doesn't seem to work. Can you please advise me how to do that? I have the same issue with Parametric Leaky ReLU.
Thank you in advance!
def build_model(hp):
    model = Sequential()
    model.add(
        Dense(
            units = 18,
            kernel_initializer = 'normal',
            activation = 'relu',
            input_shape = (18, )
        )
    )
    for i in range(hp.Int( name = "num_layers", min_value = 1, max_value = 5)):
        model.add(
            Dense(
                units = hp.Int(
                    name = "units_" + str(i),
                    min_value = 18,
                    max_value = 180,
                    step = 18),
                kernel_initializer = 'normal',
                activation = hp.Choice(
                    name = 'dense_activation',
                    values=['relu', 'selu', LeakyReLU(alpha=0.01) ],
                    default='relu'
                )
            )
        )
    model.add( Dense( units = 1 ) )
    model.compile(
        optimizer = tf.keras.optimizers.Adam(
            hp.Choice(
                name = "learning_rate", values = [1e-2, 1e-3, 1e-4]
            )
        ),
        loss = 'mse'
    )
    return model
As a workaround, you can add another activation function to the tf.keras.activations.* module by modifying its source file (activations.py).
Here's the code for tf.keras.activations.relu which you'll see in activations.py,
@keras_export('keras.activations.relu')
@dispatch.add_dispatch_support
def relu(x, alpha=0., max_value=None, threshold=0):
    """Applies the rectified linear unit activation function.

    With default values, this returns the standard ReLU activation:
    `max(x, 0)`, the element-wise maximum of 0 and the input tensor.
    Modifying default parameters allows you to use non-zero thresholds,
    change the max value of the activation,
    and to use a non-zero multiple of the input for values below the threshold.

    For example:
    >>> foo = tf.constant([-10, -5, 0.0, 5, 10], dtype = tf.float32)
    >>> tf.keras.activations.relu(foo).numpy()
    array([ 0., 0., 0., 5., 10.], dtype=float32)
    >>> tf.keras.activations.relu(foo, alpha=0.5).numpy()
    array([-5. , -2.5, 0. , 5. , 10. ], dtype=float32)
    >>> tf.keras.activations.relu(foo, max_value=5).numpy()
    array([0., 0., 0., 5., 5.], dtype=float32)
    >>> tf.keras.activations.relu(foo, threshold=5).numpy()
    array([-0., -0., 0., 0., 10.], dtype=float32)

    Arguments:
        x: Input `tensor` or `variable`.
        alpha: A `float` that governs the slope for values lower than the
            threshold.
        max_value: A `float` that sets the saturation threshold (the largest value
            the function will return).
        threshold: A `float` giving the threshold value of the activation function
            below which values will be damped or set to zero.

    Returns:
        A `Tensor` representing the input tensor,
        transformed by the relu activation function.
        Tensor will be of the same shape and dtype of input `x`.
    """
    return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
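Copy this code and paste it just below the original. In the copy, change @keras_export('keras.activations.relu') to @keras_export('keras.activations.leaky_relu'), rename the function to leaky_relu so that it does not overwrite relu, and change the default value of alpha to 0.2, like this: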
@keras_export('keras.activations.leaky_relu')
@dispatch.add_dispatch_support
def leaky_relu(x, alpha=0.2, max_value=None, threshold=0):
    """Applies the rectified linear unit activation function.

    With default values, this returns the standard ReLU activation:
    `max(x, 0)`, the element-wise maximum of 0 and the input tensor.
    Modifying default parameters allows you to use non-zero thresholds,
    change the max value of the activation,
    and to use a non-zero multiple of the input for values below the threshold.

    For example:
    >>> foo = tf.constant([-10, -5, 0.0, 5, 10], dtype = tf.float32)
    >>> tf.keras.activations.relu(foo).numpy()
    array([ 0., 0., 0., 5., 10.], dtype=float32)
    >>> tf.keras.activations.relu(foo, alpha=0.5).numpy()
    array([-5. , -2.5, 0. , 5. , 10. ], dtype=float32)
    >>> tf.keras.activations.relu(foo, max_value=5).numpy()
    array([0., 0., 0., 5., 5.], dtype=float32)
    >>> tf.keras.activations.relu(foo, threshold=5).numpy()
    array([-0., -0., 0., 0., 10.], dtype=float32)

    Arguments:
        x: Input `tensor` or `variable`.
        alpha: A `float` that governs the slope for values lower than the
            threshold.
        max_value: A `float` that sets the saturation threshold (the largest value
            the function will return).
        threshold: A `float` giving the threshold value of the activation function
            below which values will be damped or set to zero.

    Returns:
        A `Tensor` representing the input tensor,
        transformed by the relu activation function.
        Tensor will be of the same shape and dtype of input `x`.
    """
    return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
You can use the String alias keras.activations.leaky_relu.
# Custom activation function
from keras.layers import Activation, LeakyReLU
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects
## Add leaky-relu so we can use it as a string
get_custom_objects().update({'leaky-relu': Activation(LeakyReLU(alpha=0.2))})
## Main activation functions available to use
activation_functions = ['sigmoid', 'relu', 'elu', 'leaky-relu', 'selu', 'gelu',"swish"]
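The idea behind this answer is that the tuner only picks a plain string, and the string is resolved to the real activation afterwards (Keras Tuner's hp.Choice accepts only primitive values such as strings and numbers, which is why a LeakyReLU object cannot be listed directly). Below is a dependency-free sketch of that name-to-callable lookup. The registry ACTIVATIONS, the helper resolve_activation and the scalar activation functions are hypothetical stand-ins for illustration, not Keras APIs; in a real build_model the dictionary values would be Keras layers or functions.

```python
def relu(x):
    """Scalar ReLU: max(x, 0)."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.2):
    """Scalar leaky ReLU: x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x


# Hypothetical registry: every tunable activation is addressed by a string,
# so the list of candidate values handed to the tuner contains only strings.
ACTIVATIONS = {
    "relu": relu,
    "leaky-relu": leaky_relu,
}


def resolve_activation(name):
    """Map the string chosen by the tuner to the actual callable."""
    try:
        return ACTIVATIONS[name]
    except KeyError:
        raise ValueError(f"unknown activation: {name!r}") from None


# Stand-in for the value a tuner would return from a choice over ACTIVATIONS keys.
chosen_name = "leaky-relu"
activation = resolve_activation(chosen_name)
print(chosen_name, "applied to -10.0 gives", activation(-10.0))
```

Keeping the search space as strings and resolving them in one place also means a new activation only needs one new dictionary entry.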
I have a tensor that looks like
coords = torch.Tensor([[0, 0, 1, 2],
[0, 2, 2, 2]])
The first row is the x-coordinates of objects on a grid and the second row is the corresponding y-coordinates.
I need a differentiable way (i.e. gradients can flow) to go from this tensor to the corresponding "grid" tensor, where a 1 represents the presence of an object in that location (row index, column index) and 0 represents no object:
grid = torch.Tensor([[1, 0, 1],
[0, 0, 1],
[0, 0, 1]])
In general, coords can be large (the grid size is 300x300). If coords was a sparse tensor I could simply call to_dense on it, but for various reasons specific to my application I cannot store coords as sparse. Additionally, I cannot create a new sparse tensor from coords and call to_dense on it because creating a new tensor is not differentiable.
Any help is appreciated!
I'm not sure what you mean by 'differentiable', but here's a simple way to do it using advanced indexing.
coords = coords.long()  # indices must be integer typed
grid = torch.zeros(3, 3)  # empty 3x3 grid to fill in
grid[coords[0],coords[1]] = 1
tensor([[1., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]])
I don't think PyTorch has detailed documentation about this, but NumPy documents advanced indexing, and PyTorch probably behaves very similarly.
This is also possible:
coords = coords.long()
grid[coords[0],coords[1]] = torch.Tensor([1,2,3,4])
tensor([[1., 0., 2.],
[0., 0., 3.],
[0., 0., 4.]])
Say
coords = [[0, 0, 1, 2],
[0, 2, 2, 2]]
Then:
torch.stack([torch.stack(x) for x in coords])
I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyper-parameters optimisation.
This is a multi-class problem, i.e. the target variable can have only one label, chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2' ... 'Classn'.
# self._arch creates my model
nn = KerasClassifier(build_fn=self._arch, verbose=0)
clf = GridSearchCV(
nn,
param_grid={ ... },
# I use f1 score macro averaged
scoring='f1_macro',
n_jobs=-1)
# self.fX is the data matrix
# self.fy_enc is the target variable encoded with one-hot format
clf.fit(self.fX.values, self.fy_enc.values)
The problem is that, when score is computed during cross-validation, the true label for validation samples is encoded one-hot, while the prediction for some reason collapses to binary label (when the target variable has only two classes). For instance, this is the last part of the stack trace:
...........................................................................
/Users/fbrundu/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/metrics/classification.py in _check_targets(y_true=array([[ 0., 1.],
[ 0., 1.],
[ 0... 0., 1.],
[ 0., 1.],
[ 0., 1.]]), y_pred=array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1]))
77 if y_type == set(["binary", "multiclass"]):
78 y_type = set(["multiclass"])
79
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
type_true = 'multilabel-indicator'
type_pred = 'binary'
83
84 # We can't have more than one value on y_type => The set is no more needed
85 y_type = y_type.pop()
86
ValueError: Can't handle mix of multilabel-indicator and binary
How can I instruct Keras/sklearn to give back predictions in one-hot encoding?
Following Vivek's comment, I used the original (not one-hot-encoded) target array, and I configured (in my Keras model, see code) the loss sparse_categorical_crossentropy, as per the comments to this issue.
arch.compile(
optimizer='sgd',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
Can someone explain why the OneVsRestClassifier gives different result than the out-of-the-box algorithm?
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
X = [[1,2],[1,3],[4,2],[2,3],[1,4]]
y = [1,2,3,2,1]
X_pred = [[2,4], [5,4], [3,7]]
dummy_clf = OneVsRestClassifier(SGDClassifier(verbose=0, class_weight="auto", loss='modified_huber', random_state=0)) # first case
#dummy_clf = SGDClassifier(verbose=0, class_weight="auto", loss='modified_huber', random_state=0) # second case
dummy_clf.fit(X, y)
dummy_clf.predict_proba(X_pred)
First case:
array([[ 0.5, 0.5, 0. ],
[ 0. , 1. , 0. ],
[ 0.5, 0.5, 0. ]])
Second case:
array([[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
OneVsRest gives you the probability of X_pred for all of the classes, thus the first and last test cases have a value for multiple classes (that sum to 1). The classifier is trained on all classes.
OneVsOne trains a classifier on all class pairs. For all class pairs, the class predicted most is the winner, so you only get one prediction per instance.