I'm working on a classification problem. The number of classes is 5. I have a ground truth vector that has the shape (3) instead of 1. The values in this target vector are the possible classes and the predicted vector is of the shape (1x5) which holds the softmax scores for all the classes.
For example:
predicted_vector = tensor([0.0669, 0.1336, 0.3400, 0.3392, 0.1203]
ground_truth = tensor([3,2,5])
For the above illustration, a typical argmax operation would result in declaring class 3 as the predicted class (0.34) but I want the model to reward even if the argmax class is any of 3,2, or 5.
Which loss function is recommended for such a use case?
As jodag pointed out in the comments you can try to treat it as a multi-label classification problem.
So [[0, 1, 2], [0, 2, 4], [3, 3, 3]] will be transformed into:
tensor([[1., 1., 1., 0., 0.],
[1., 0., 1., 0., 1.],
[0., 0., 0., 1., 0.]])
Here is an example of how this can be implemented:
import torch
from torch.nn import BCELoss
predicted_vector = torch.rand((3, 5))
ground_truth = torch.LongTensor([[0, 1, 2], [0, 2, 4], [3, 3, 3]])
labels_onehot = torch.zeros_like(predicted_vector)
labels_onehot.scatter_(1, ground_truth, 1)
loss_fn = BCELoss()
loss = loss_fn(predicted_vector, labels_onehot)
Also you can add different weights to different labels
For this problem, a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay so the model isn't penalised that heavily.
This is a typical single-label, multi-class problem, but with probabilistic (“soft”) labels, and CrossEntropyLoss should be used (and not use softmax()).
In this example, the (soft) target might be a probability of 0.7 for class 3, a probability of 0.2 for class 2, and a probability of 0.1 for class 5 (and zero for everything else).
Related
I am having a hard time understanding the following scenario. I have a output probability of 0.0 on each class which means value of metrics such as f1 score, accuracy and recall should be zero? However i get the following:
import torch, torchmetrics
preds = torch.tensor([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
target = torch.tensor([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
print("F1: ", torchmetrics.functional.f1_score(preds, target))
print("Accuracy: ", torchmetrics.functional.accuracy(preds, target))
print("Recall: ", torchmetrics.functional.recall(preds, target))
print("Precision: ", torchmetrics.functional.precision(preds, target))
Output:
F1: tensor(0.)
Accuracy: tensor(0.6667)
Recall: tensor(0.)
Precision: tensor(0.)
Why is accuracy 0.6667? I would expect all outputs to be 0.0.
Your preds is a probabilities array for multi-label classification problem:
To make it simpler, I will assume the example like that:
preds = torch.tensor([[0., 0., 0.]]) # multi-labels [label1, label2, label3]
target = torch.tensor([[1, 0, 0]])
The true negatives are 2 since classifier predicts not existence for label2 and label3 while label2 and label3 indeed should not be existed.
The true positives are 0 since classifier predicts the existence of any label while a label should be existed.
The false negative is 1 since classifier predicts no existence for label1 while label1 should be existed.
The false positives are 0 since classifier predicts any label while a label should not be existed.
According to the above equation, Accuracy = 2/3 = 0.6667
You can read here more about different metrics and their calculations.
I have several Pytorch tensors ranging from 1-dimensional (e.g. torch.Size([128]), to 4-dimensional (e.g. torch.Size([256, 128, 3, 3]). Each tensor represents a weight in a neural network.
For each of these tensors I need to upscale 1 or 2 dimensions, for example
torch.Size([128])to torch.Size([256]),
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]),
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]).
I've looked at torch.nn.Upsample or nn.functional.interpolate and similar functions but I can't find a good way to do this comprehensively for each of my problems other than hardcoding it.
In the case of the simple 1D example I'm looking for a scaled version of my original tensor, something like this:
torch.arange(0, 9, dtype=torch.float32)
t = torch.arange(0, 9, dtype=torch.float32)
# = tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])
t_up = upsample(factor=2)
# = tensor([0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6., 6.5 7., 7.5, 8.])
Any help would be appreciated.
Your pattern is very irregular as:
torch.Size([128]) to torch.Size([256]) - 1D and interpolate everything
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]) - 4D and upscale first two dimensions
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]) - 3D and upscale only second dimension without the first
There is no clear way around "hard coding" in this case and "clever" approaches would probably only raise eyebrows when someone is going over your code.
Your 1D example uses linear mode with align_corners=False, not sure about 4D examples, but those would require bilinear mode at least.
size for torch.nn.functional.interpolate flattens 1 dimensions for some reason, hence only scale_factor is an option.
Some of the data has to be reshaped for interpolate
All in all, hardcoding and some comments are the best option in this case as there is no clear way to group different ways of expanding tensors you are given (and trying to be smart in this case is probably a dead end).
I have a tensor that looks like
coords = torch.Tensor([[0, 0, 1, 2],
[0, 2, 2, 2]])
The first row is the x-coordinates of objects on a grid and the second row is the corresponding y-coordinates.
I need a differentiable way (i.e. gradients can flow) to go from this tensor to the corresponding "grid" tensor, where a 1 represents the presence of an object in that location (row index, column index) and 0 represents no object:
grid = torch.Tensor([[1, 0, 1],
[0, 0, 1],
[0, 0, 1]])
In general, coords can be large (the grid size is 300x300). If coords was a sparse tensor I could simply call to_dense on it, but for various reasons specific to my application I cannot store coords as sparse. Additionally, I cannot create a new sparse tensor from coords and call to_dense on it because creating a new tensor is not differentiable.
Any help is appreciated!
I'm not sure what you mean by 'differentiable', but here's a simple way to do it using advanced indexing.
coords = coords.long()
grid[coords[0],coords[1]] = 1
tensor([[1., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]])
I think Torch doesn't have a detailed documentation about this, but numpy has here. (probably very similar for torch)
this is also possible
coords = coords.long()
grid[coords[0],coords[1]] = torch.Tensor([1,2,3,4])
tensor([[1., 0., 2.],
[0., 0., 3.],
[0., 0., 4.]])
Say
coords = [[0, 0, 1, 2],
[0, 2, 2, 2]]
Then:
torch.stack([torch.stack(x) for x in coords])
I have a tensor of size 4 x 6 where 4 is batch size and 6 is sequence length. Every element of the sequence vectors are some index (0 to n). I want to create a 4 x 6 x n tensor where the vectors in 3rd dimension will be one hot encoding of the index which means I want to put 1 in the specified index and rest of the values will be zero.
For example, I have the following tensor:
[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]
Here, all the values are in between (0 to n) where n = 15. So, I want to convert the tensor to a 4 X 6 X 16 tensor where the third dimension will represent one hot encoding vector.
How can I do that using PyTorch functionalities? Right now, I am doing this with loop but I want to avoid looping!
NEW ANSWER
As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and a maximal index n, you can create a one_hot version as follows:
n = 5
indices = torch.randint(0,n, size=(4,7))
one_hot = torch.nn.functional.one_hot(indices, n) # size=(4,7,n)
Very old Answer
At the moment, slicing and indexing can be a bit of a pain in PyTorch from my experience. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert to a dense tensor. That would work as follows:
from torch.sparse import FloatTensor as STensor
batch_size = 4
seq_length = 6
feat_dim = 16
batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length))*batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)
my_stack = torch.stack([batch_idx, seq_idx, feat_idx]) # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
torch.Size([batch_size, seq_length, feat_dim])).to_dense()
print(my_final_array)
Note: PyTorch is undergoing some work currently, that will add numpy style broadcasting and other functionalities within the next two or three weeks and other functionalities. So it's possible, there'll be better solutions available in the near future.
Hope this helps you a bit.
The easiest way I found. Where x is a list of numbers and class_count is the amount of classes you have.
def one_hot(x, class_count):
return torch.eye(class_count)[x,:]
Use it like this:
x = [0,2,5,4]
class_count = 8
one_hot(x,class_count)
tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.]])
This can be done in PyTorch using the in-place scatter_ method for any Tensor object.
labels = torch.LongTensor([[[2,1,0]], [[0,1,0]]]).permute(0,2,1) # Let this be your current batch
batch_size, k, _ = labels.size()
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)
For num_classes=3 (the indices should vary from [0,3)), this will give you
(0 ,.,.) =
0 0 1
0 1 0
1 0 0
(1 ,.,.) =
1 0 0
0 1 0
1 0 0
[torch.FloatTensor of size 2x3x3]
Note that labels should be a torch.LongTensor.
PyTorch Docs Reference: torch.Tensor.scatter_
Can someone explain why the OneVsRestClassifier gives different result than the out-of-the-box algorithm?
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
X = [[1,2],[1,3],[4,2],[2,3],[1,4]]
y = [1,2,3,2,1]
X_pred = [[2,4], [5,4], [3,7]]
dummy_clf = OneVsRestClassifier(SGDClassifier(verbose=0, class_weight="auto", loss='modified_huber', random_state=0)) # first case
#dummy_clf = SGDClassifier(verbose=0, class_weight="auto", loss='modified_huber', random_state=0) # second case
dummy_clf.fit(X, y)
dummy_clf.predict_proba(X_pred)
First case:
array([[ 0.5, 0.5, 0. ],
[ 0. , 1. , 0. ],
[ 0.5, 0.5, 0. ]])
Second case:
array([[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
OneVsRest gives you the probability of X_pred for all of the classes, thus the first and last test cases have a value for multiple classes (that sum to 1). The classifier is trained on all classes.
OneVsOne trains a classifier on all class pairs. For all class pairs, the class predicted most is the winner, so you only get one prediction per instance.