Problem
I have a list of indices and a list of values like so:
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3])
I want to define a (3x3 for the example) matrix which contains the values v at the indices i (1 at position (2,2), 2 at position (2, 0) and 3 at position (1,2)):
tensor([[0, 0, 0],
[0, 0, 3],
[2, 0, 1]])
What I have tried
I can do it using a trick with torch.sparse and .to_dense(), but I feel that it's not the "pytorchic" way to do it, nor the most efficient:
f = torch.sparse.FloatTensor(i, v, torch.Size([3, 3]))
print(f.to_dense())
Any idea for a better solution?
Ideally, I would appreciate a solution at least as fast as the one above.
Of course, this is just an example: no particular structure is assumed for the tensors i and v (nor for their dimensions).
There is an alternative, as below:
import torch
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float) # enforcing same data-type
target = torch.zeros([3,3], dtype=torch.float) # enforcing same data-type
target.index_put_(tuple(i), v)  # unpack the rows of i as per-dimension index tensors
print(target)
The target tensor will be as follows:
tensor([[0., 0., 0.],
[0., 0., 3.],
[2., 0., 1.]])
This medium.com blog article provides a comprehensive list of all index functions for PyTorch Tensors.
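For what it's worth, a plain advanced-indexing assignment gives the same result here; this is a minimal sketch, not from the answer above, and it assumes every index pair in i is unique (repeated pairs would simply overwrite one another):
import torch

i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float)

target = torch.zeros(3, 3, dtype=torch.float)
target[i[0], i[1]] = v  # row indices come from the first row of i, column indices from the second
print(target)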
I have a tensor that looks like
coords = torch.Tensor([[0, 0, 1, 2],
[0, 2, 2, 2]])
The first row is the x-coordinates of objects on a grid and the second row is the corresponding y-coordinates.
I need a differentiable way (i.e. gradients can flow) to go from this tensor to the corresponding "grid" tensor, where a 1 represents the presence of an object in that location (row index, column index) and 0 represents no object:
grid = torch.Tensor([[1, 0, 1],
[0, 0, 1],
[0, 0, 1]])
In general, coords can be large (the grid size is 300x300). If coords was a sparse tensor I could simply call to_dense on it, but for various reasons specific to my application I cannot store coords as sparse. Additionally, I cannot create a new sparse tensor from coords and call to_dense on it because creating a new tensor is not differentiable.
Any help is appreciated!
I'm not sure what you mean by 'differentiable', but here's a simple way to do it using advanced indexing.
coords = coords.long()
grid = torch.zeros(3, 3)  # allocate the target grid first
grid[coords[0], coords[1]] = 1
print(grid)
tensor([[1., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]])
I don't think PyTorch has detailed documentation on this, but NumPy does (see its page on advanced/integer array indexing); indexing behaves very similarly in PyTorch.
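As a quick illustration of how closely the two match (a small sketch, not part of the original answer):
import numpy as np
import torch

a_np = np.zeros((3, 3))
a_np[[0, 0, 1, 2], [0, 2, 2, 2]] = 1  # NumPy integer-array indexing

a_pt = torch.zeros(3, 3)
a_pt[torch.tensor([0, 0, 1, 2]), torch.tensor([0, 2, 2, 2])] = 1  # the same idea in PyTorch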
This is also possible:
coords = coords.long()
grid[coords[0],coords[1]] = torch.Tensor([1,2,3,4])
tensor([[1., 0., 2.],
[0., 0., 3.],
[0., 0., 4.]])
Say
coords = [[0, 0, 1, 2],
          [0, 2, 2, 2]]
Then:
torch.stack([torch.tensor(x) for x in coords])
(torch.stack expects a sequence of tensors, so each inner list has to be converted first; torch.tensor(coords) would work just as well.)
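For the list above, this yields:
tensor([[0, 0, 1, 2],
        [0, 2, 2, 2]])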
I have to admit, I'm a bit confused by the scatter* and index* operations - I'm not sure any of them do exactly what I'm looking for, which is very simple:
Given some 2-D tensor
z = tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
And a list (or tensor?) of 2-d indexes:
inds = tensor([[0, 0],
[1, 1],
[1, 2]])
I want to add a scalar to z at those indexes (and do it efficiently):
znew = z.something_add(inds, 3)
->
znew = tensor([[4., 1., 1., 1.],
[1., 4., 4., 1.],
[1., 1., 1., 1.]])
If I have to I can make that scalar a tensor of whatever shape (where all elements = 3), but I'd rather not...
You must provide two lists to your indexing. The first having the row positions and the second the column positions. In your example, it would be:
z[[0, 1, 1], [0, 1, 2]] += 3
torch.Tensor indexing follows Numpy. See https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing for more details.
This code achieves what you want:
z_new = z.clone() # copy the tensor
z_new[inds[:, 0], inds[:, 1]] += 3 # modify selected indices of new tensor
In PyTorch, you can index each axis of a tensor with another tensor.
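One caveat worth noting (not raised in the answer above): if inds contains the same index pair more than once, the in-place += adds 3 only once at that position, because advanced-index assignment does not accumulate. Tensor.index_put_ with accumulate=True sums every contribution instead; here is a small sketch using a variant of inds with a repeated pair:
import torch

z = torch.ones(3, 4)
inds = torch.tensor([[0, 0], [1, 1], [1, 1]])  # (1, 1) appears twice

z_new = z.clone()
z_new.index_put_((inds[:, 0], inds[:, 1]),
                 torch.full((inds.size(0),), 3.),
                 accumulate=True)
# z_new[0, 0] == 4. and z_new[1, 1] == 7. (both contributions are summed)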
I have a list holding one batch of data, with multiple labels for every sample. How do I convert it into a torch.Tensor in one-hot encoding?
For example, with batch_size=5 and class_num=6,
label = [
    [1,2,3],
    [4,6],
    [1],
    [1,4,5],
    [4]
]
How can I turn it into the following one-hot encoding in PyTorch?
label_tensor=tensor([
[1,1,1,0,0,0],
[0,0,0,1,0,1],
[1,0,0,0,0,0],
[1,0,0,1,1,0],
[0,0,0,1,0,0]
])
If the batch size can be derived from len(labels):
import torch

def to_onehot(labels, n_categories, dtype=torch.float32):
    batch_size = len(labels)
    one_hot_labels = torch.zeros(size=(batch_size, n_categories), dtype=dtype)
    for i, label in enumerate(labels):
        # Subtract 1 from each LongTensor because your
        # indexing starts at 1 and tensor indexing starts at 0
        label = torch.LongTensor(label) - 1
        one_hot_labels[i] = one_hot_labels[i].scatter_(dim=0, index=label, value=1.)
    return one_hot_labels
and you have 6 categories and want the output to be a tensor of integers:
to_onehot(labels, n_categories=6, dtype=torch.int64)
tensor([[1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 1],
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 0, 0]])
I would stick to torch.float32 in case you want to use label smoothing, mix-up or something along those lines later.
To handle any situation (including string labels), I've extended karniol's answer:
def multihot_encoder(labels, dtype=torch.float32):
    """ Convert a list of label lists into a 2-D multihot Tensor """
    label_set = set()
    for label_list in labels:
        label_set = label_set.union(set(label_list))
    label_set = sorted(label_set)

    multihot_vectors = []
    for label_list in labels:
        multihot_vectors.append([1 if x in label_list else 0 for x in label_set])

    # To keep track of which columns are which, set dtype to None
    if dtype is None:
        import pandas as pd
        return pd.DataFrame(multihot_vectors, columns=label_set)
    return torch.Tensor(multihot_vectors).to(dtype)
Your use case:
label_lists = [[1,2,3], [4,6], [1], [1,4,5], [4]]
>>> multihot_encoder(label_lists)
tensor([[1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 1.],
[1., 0., 0., 0., 0., 0.],
[1., 0., 0., 1., 1., 0.],
[0., 0., 0., 1., 0., 0.]])
If you want to keep track of your labels (feature names) before converting your dataset to a Tensor, just set dtype to None:
label_lists = [
['happy', 'kind'], ['sad', 'mean'],
['loud', 'happy'], ['quiet', 'kind']
]
multihot_encoder(label_lists, dtype=None)
happy kind loud mean quiet sad
0 1 1 0 0 0 0
1 0 0 0 1 0 1
2 1 0 1 0 0 0
3 0 1 0 0 1 0
I have a tensor of size 4 x 6, where 4 is the batch size and 6 is the sequence length. Every element of the sequence vectors is an index (0 to n). I want to create a 4 x 6 x n tensor where the vectors in the 3rd dimension are one-hot encodings of those indices, meaning I want to put a 1 at the specified index and zeros everywhere else.
For example, I have the following tensor:
[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]
Here, all the values are between 0 and n, where n = 15. So, I want to convert the tensor to a 4 x 6 x 16 tensor where the third dimension holds the one-hot encoding vectors.
How can I do that using PyTorch functionality? Right now, I am doing this with a loop, but I want to avoid looping!
NEW ANSWER
As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and a maximal index n, you can create a one_hot version as follows:
n = 5
indices = torch.randint(0,n, size=(4,7))
one_hot = torch.nn.functional.one_hot(indices, n) # size=(4,7,n)
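Applied to the tensor from the question (a quick sketch; the variable name seq is mine), 16 classes gives the desired 4 x 6 x 16 result:
import torch
import torch.nn.functional as F

seq = torch.tensor([[5, 3, 2, 11, 15, 15],
                    [1, 4, 6, 7, 3, 3],
                    [2, 4, 7, 8, 9, 10],
                    [11, 12, 15, 2, 5, 7]])
one_hot = F.one_hot(seq, num_classes=16)  # shape (4, 6, 16)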
Very old Answer
At the moment, slicing and indexing can be a bit of a pain in PyTorch from my experience. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert to a dense tensor. That would work as follows:
from torch.sparse import FloatTensor as STensor
batch_size = 4
seq_length = 6
feat_dim = 16
batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length))*batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)
my_stack = torch.stack([batch_idx, seq_idx, feat_idx]) # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length),
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()
print(my_final_array)
Note: PyTorch is currently undergoing some work that will add numpy-style broadcasting and other functionality within the next two or three weeks. So it's possible there'll be better solutions available in the near future.
Hope this helps you a bit.
The easiest way I found, where x is a list of numbers and class_count is the number of classes you have:
def one_hot(x, class_count):
    return torch.eye(class_count)[x, :]
Use it like this:
x = [0,2,5,4]
class_count = 8
one_hot(x,class_count)
tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.]])
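The same trick extends to multi-dimensional index tensors such as the 4 x 6 one in the question, because indexing torch.eye keeps the index tensor's shape and appends the class dimension (a small sketch, reusing class_count=8 from above):
import torch

x = torch.tensor([[0, 2], [5, 4]])  # a 2-D index tensor
one_hot = torch.eye(8)[x]           # shape (2, 2, 8)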
This can be done in PyTorch using the in-place scatter_ method for any Tensor object.
labels = torch.LongTensor([[[2,1,0]], [[0,1,0]]]).permute(0,2,1) # Let this be your current batch
batch_size, k, _ = labels.size()
num_classes = 3  # the indices must lie in [0, num_classes)
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)
For num_classes=3 (the indices should lie in [0, 3)), this will give you
(0 ,.,.) =
0 0 1
0 1 0
1 0 0
(1 ,.,.) =
1 0 0
0 1 0
1 0 0
[torch.FloatTensor of size 2x3x3]
Note that labels should be a torch.LongTensor.
PyTorch Docs Reference: torch.Tensor.scatter_
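For the 4 x 6 sequence tensor from the question, the same scatter_ pattern becomes the following (a sketch, with the question's tensor hard-coded and the names seq and one_hot chosen for illustration):
import torch

seq = torch.tensor([[5, 3, 2, 11, 15, 15],
                    [1, 4, 6, 7, 3, 3],
                    [2, 4, 7, 8, 9, 10],
                    [11, 12, 15, 2, 5, 7]])
one_hot = torch.zeros(4, 6, 16).scatter_(2, seq.unsqueeze(-1), 1.)  # shape (4, 6, 16)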