Represent torch.einsum as torch.matmul operation - pytorch

I'm trying to implement an einsum operation as matrix multiplication in pytorch. In the example below,
degree_matrix_hat is a matrix with shape (64, 115) and edge_features_reduced is a tensor of shape (64, 115, 115).
Using torch.einsum I get the following output:
ein1 = torch.einsum("bi,bij->bij", degree_matrix_hat, edge_features_reduced)
ein1[0], ein1.size()
(tensor([[0.7071, 0.7071, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.5774, 0.5774, 0.5774, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.5774, 0.5774, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 0.0000, ..., 1.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 1.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 1.0000]]),
torch.Size([64, 115, 115]))
However, I get a different output using torch.matmul:
matmul1 = torch.matmul(degree_matrix_hat, edge_features_reduced)
matmul1[0], matmul1.size()
(tensor([[1.2845, 1.8618, 1.7321, ..., 1.0000, 1.0000, 1.0000],
[1.2845, 1.7845, 1.6547, ..., 1.0000, 1.0000, 1.0000],
[1.2071, 1.7845, 1.6547, ..., 1.0000, 1.0000, 1.0000],
...,
[1.2071, 1.7845, 1.6547, ..., 1.0000, 1.0000, 1.0000],
[1.2845, 1.7845, 1.6547, ..., 1.0000, 1.0000, 1.0000],
[1.2071, 1.7845, 1.6547, ..., 1.0000, 1.0000, 1.0000]]),
torch.Size([64, 64, 115]))
I'm quite new to linear algebra so I probably missed something here. What is the exact expression torch.einsum is solving in this case?
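For reference, "bi,bij->bij" has no summed index, so it is just a broadcasted elementwise product, ein1[b,i,j] = degree_matrix_hat[b,i] * edge_features_reduced[b,i,j], i.e. row i of each batch matrix is scaled by degree_matrix_hat[b,i]. torch.matmul on a (64, 115) and a (64, 115, 115) tensor instead broadcasts the 2D argument as a single matrix multiplied against every batch element, which is why the result has shape (64, 64, 115). A sketch of two equivalent ways to write the einsum (reusing the names from the question with random data):

import torch

degree_matrix_hat = torch.rand(64, 115)
edge_features_reduced = torch.rand(64, 115, 115)

ein1 = torch.einsum("bi,bij->bij", degree_matrix_hat, edge_features_reduced)

# 1) broadcasted elementwise product: a trailing dim scales each row
via_broadcast = degree_matrix_hat.unsqueeze(-1) * edge_features_reduced

# 2) batched matmul against a diagonal matrix per batch element
via_matmul = torch.matmul(torch.diag_embed(degree_matrix_hat), edge_features_reduced)

assert torch.allclose(ein1, via_broadcast) and torch.allclose(ein1, via_matmul)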

Related

torch suppress to kth largest values

I have the following function, which works, but not for half-precision values (I get a NotImplementedError for kthvalue).
def suppress_small_probabilities(probabilities: torch.FloatTensor, k: int) -> torch.FloatTensor:
    kth_largest, _ = (-probabilities).kthvalue(k, dim=-1, keepdim=True)
    return probabilities * (probabilities >= -kth_largest)
How would you do the equivalent without using kthvalue? I'm guessing topk has something to do with it, but I want to suppress the smaller values. probabilities is of size batch_size x 1000.
Implement your own topk, e.g.
def mytopk(xs: torch.Tensor, k: int) -> torch.Tensor:
    mask = torch.zeros_like(xs)
    batch_idx = torch.arange(0, len(xs))
    for _ in range(k):
        # hide already-selected entries, then take the row-wise argmax
        _, index = torch.where(mask == 0, xs, -1e4).max(-1)
        mask[batch_idx, index] = 1
    return mask
This returns a 0/1 mask tensor in which the row-wise top-k elements have value 1 and the rest 0.
Then use the mask to index your original tensor, e.g.
xs = torch.rand(3, 5, dtype=torch.float16)
# tensor([[0.0626, 0.9620, 0.5596, 0.4423, 0.1932],
# [0.5289, 0.0857, 0.7802, 0.7730, 0.4807],
# [0.8272, 0.5016, 0.1169, 0.4372, 0.1843]], dtype=torch.float16)
mask = mytopk(xs, 2)
# tensor([[0., 1., 1., 0., 0.],
# [0., 0., 1., 1., 0.],
# [1., 1., 0., 0., 0.]])
top_only = torch.where(mask == 1, xs, 0)
# tensor([[0.0000, 0.9620, 0.5596, 0.0000, 0.0000],
# [0.0000, 0.0000, 0.7802, 0.7730, 0.0000],
# [0.8271, 0.5016, 0.0000, 0.0000, 0.0000]], dtype=torch.float16)
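If torch.topk is implemented for your dtype and device, a shorter alternative is to scatter the top-k values into a zero tensor. A sketch (whether topk supports float16 depends on your PyTorch version and device):

def suppress_small_probabilities_topk(probabilities: torch.Tensor, k: int) -> torch.Tensor:
    # keep the k largest entries per row, zero out everything else
    values, indices = probabilities.topk(k, dim=-1)
    return torch.zeros_like(probabilities).scatter(-1, indices, values)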

Keeping only the row-wise maximum of a tensor and setting all the other entries to zero

Given a tensor like this
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0534, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0944, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[1.2780, 1.5430, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[1.1799, 1.2002, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
I want to transform it by keeping only the maximum element of each row and setting all the others to 0. I was trying to play around with torch.argmax(tensor, dim=1), but I'm not sure that is going to help. The desired output in this case would be
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0534, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0944, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.5430, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.2002, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
Seems to work:
a = torch.randn((8, 7))
max_a, ids = torch.max(a, 1, keepdim=True)
b = torch.zeros_like(a) # result tensor
b.scatter_(1, ids, max_a) # set max values on idx indices
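An equivalent masking formulation, as a sketch (note that, unlike scatter_, it keeps every tied maximum in a row rather than exactly one):

a = torch.randn(8, 7)
# compare each entry against its row maximum; the boolean mask zeroes the rest
b = a * (a == a.max(dim=1, keepdim=True).values)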

torch.masked_scatter result did not meet expectations

My PyTorch code:
import torch
x = torch.tensor([[0.3992, 0.2908, 0.9004, 0.4850, 0.6004],
                  [0.5735, 0.9006, 0.6797, 0.4152, 0.1732]])
print(x.shape)
mask = torch.tensor([[False, False, True, False, True],
                     [True, True, True, False, False]])
print(mask.shape)
y = torch.tensor([[0., 0., 0., 0., 0.],
                  [0., 0., 0., 0., 0.]])
print(y.shape)
y.masked_scatter_(mask, x)
print(y)
result is:
torch.Size([2, 5])
torch.Size([2, 5])
torch.Size([2, 5])
tensor([[0.0000, 0.0000, 0.3992, 0.0000, 0.2908],
[0.9004, 0.4850, 0.6004, 0.0000, 0.0000]])
I think the result should be:
tensor([[0.0000, 0.0000, 0.9004, 0.0000, 0.6004],
        [0.5735, 0.9006, 0.6797, 0.0000, 0.0000]])
My PyTorch version is 1.4.
You are right, this is confusing and there is virtually no documentation.
However, the way masked_scatter_ works (as you have discovered) is that the i-th True, counted in row-major order over the whole tensor, is given the i-th value from the flattened source, not the value at the corresponding position.
Luckily, what you are trying to do can easily be achieved with normal boolean indexing:
>>> y[mask] = x[mask]
>>> y
tensor([[0.0000, 0.0000, 0.9004, 0.0000, 0.6004],
[0.5735, 0.9006, 0.6797, 0.0000, 0.0000]])
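A quick illustration of that ordering, reusing the mask from above (a sketch):

src = torch.arange(1., 11.).reshape(2, 5)  # values 1..10 in row-major order
z = torch.zeros(2, 5)
z.masked_scatter_(mask, src)
# tensor([[0., 0., 1., 0., 2.],
#         [3., 4., 5., 0., 0.]])
# the five Trues receive 1..5 in order, not the values at their positions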

In pytorch, how to fill a tensor with another tensor?

I'm looking for a way to expand the size of an image by adding 0 values to its right and lower edges. My initial plan was to use torch.nn.functional.pad to add the edges, but I ran into this error:
File "/home/shared/virtualenv/dl-torch/lib/python3.7/site-packages/torch/nn/functional.py", line 2796, in pad
assert len(pad) % 2 == 0, 'Padding length must be divisible by 2'
AssertionError: Padding length must be divisible by 2
It appears that torch tries to pad the image from both sides! Is there an easy way to override this and fill the tensor into the upper-left corner of another image?
The only way I know is:
with torch.no_grad():  # assuming it's for init
    dist = torch.distributions.MultivariateNormal(loc=torch.zeros(2), covariance_matrix=torch.eye(2))
    w.data = dist.sample()
but I doubt it's recommended.
Answering the title of the question.
With nn.ConstantPad2d, you can specify the number of padding elements in all four directions separately.
>>> t = torch.randn(2,3)
>>> t
tensor([[ 0.1254, 0.6358, 0.3243],
[ 0.7005, -0.4931, 1.0582]])
>>> p = torch.nn.ConstantPad2d((0, 4, 0, 2), 0)
>>> p(t)
tensor([[ 0.1254, 0.6358, 0.3243, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.7005, -0.4931, 1.0582, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
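Note that torch.nn.functional.pad accepts the same (left, right, top, bottom) tuple, so it does not have to pad both sides; the assertion in the question only fires when the tuple has an odd length. A sketch:

import torch.nn.functional as F

t = torch.randn(2, 3)
padded = F.pad(t, (0, 4, 0, 2))  # 4 columns on the right, 2 rows at the bottom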
I had a similar problem and wanted to initialize an image tensor with a specific color. I solved it as follows:
Let X be a tensor of shape (h, w, dim) and let dim hold 3 values (r,g,b).
If you want to initialize your tensor X with the rgb color 226, 169, 41 you could do something like:
index_0 = torch.tensor([0]) # 226
index_1 = torch.tensor([1]) #169
index_2 = torch.tensor([2]) #41
X.index_fill_(2, index_0, 226)
X.index_fill_(2, index_1, 169)
X.index_fill_(2, index_2, 41)
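A broadcasted assignment achieves the same in one line; a sketch, with h and w standing for the image dimensions from above:

h, w = 4, 4  # example dimensions
X = torch.empty(h, w, 3)
X[:] = torch.tensor([226., 169., 41.])  # broadcast the color along the last dim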

groupby aggregate mean in pytorch

I have a 2D tensor:
samples = torch.Tensor([
    [0.1, 0.1],  # -> group / class 1
    [0.2, 0.2],  # -> group / class 2
    [0.4, 0.4],  # -> group / class 2
    [0.0, 0.0]   # -> group / class 0
])
and a label for each sample corresponding to a class:
labels = torch.LongTensor([1, 2, 2, 0])
so len(samples) == len(labels). Now I want to calculate the mean for each class / label. Because there are 3 classes (0, 1 and 2), the final tensor should have dimension [n_classes, samples.shape[1]], so the expected result is:
result == torch.Tensor([
[0.1, 0.1],
[0.3, 0.3], # -> mean of [0.2, 0.2] and [0.4, 0.4]
[0.0, 0.0]
])
Question: How can this be done in pure pytorch (i.e. no numpy so that I can autograd) and ideally without for loops?
All you need to do is form an m×n matrix (m = num classes, n = num samples) that selects the appropriate samples and weights them to form the mean. Then you can perform a matrix multiplication between your newly formed matrix and the samples matrix.
Given your labels, the matrix should be (each row corresponds to a class, each column to a sample, and the entries are the averaging weights):
[[0.0000, 0.0000, 0.0000, 1.0000],
[1.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 0.5000, 0.5000, 0.0000]]
Which you can form as follows:
M = torch.zeros(labels.max() + 1, len(samples))
M[labels, torch.arange(len(samples))] = 1
M = torch.nn.functional.normalize(M, p=1, dim=1)
torch.mm(M, samples)
Output:
tensor([[0.0000, 0.0000],
[0.1000, 0.1000],
[0.3000, 0.3000]])
Note that the output means are correctly sorted in class order.
Why does M[labels, torch.arange(len(samples))] = 1 work?
This is performing a broadcast operation between the labels and the number of samples. Essentially, we are generating a 2D index for every element in labels: the first specifies which of the m classes it belongs to, and the second simply specifies its index position (from 0 to N-1). Another way would be to explicitly generate all the 2D indices:
twoD_indices = []
for count, label in enumerate(labels):
    twoD_indices.append((label, count))
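Alternatively, the selection matrix can be built with one_hot; a sketch of an equivalent formulation:

import torch.nn.functional as F

num_classes = int(labels.max()) + 1
M = F.one_hot(labels, num_classes=num_classes).T.float()  # (classes, samples)
M = F.normalize(M, p=1, dim=1)                            # each row sums to 1
result = M @ samples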
Reposting here an answer from @ptrblck_de on the PyTorch forums:
# expand labels to the shape of samples so it can serve as a scatter index
labels = labels.view(labels.size(0), 1).expand(-1, samples.size(1))
unique_labels, labels_count = labels.unique(dim=0, return_counts=True)
# sum the samples per class, then divide by the per-class counts
res = torch.zeros_like(unique_labels, dtype=torch.float).scatter_add_(0, labels, samples)
res = res / labels_count.float().unsqueeze(1)
As the previous solutions do not work for sparse groups (i.e., when not all group labels appear in the data), I made one :)
def groupby_mean(value: torch.Tensor, labels: torch.LongTensor) -> (torch.Tensor, torch.LongTensor):
    """Group-wise average for (sparse) grouped tensors

    Args:
        value (torch.Tensor): values to average (# samples, latent dimension)
        labels (torch.LongTensor): labels for embedding parameters (# samples,)

    Returns:
        result (torch.Tensor): (# unique labels, latent dimension)
        new_labels (torch.LongTensor): (# unique labels,)

    Examples:
        >>> samples = torch.Tensor([
        ...     [0.15, 0.15, 0.15],  # -> group / class 1
        ...     [0.2, 0.2, 0.2],     # -> group / class 5
        ...     [0.4, 0.4, 0.4],     # -> group / class 5
        ...     [0.0, 0.0, 0.0]      # -> group / class 0
        ... ])
        >>> labels = torch.LongTensor([1, 5, 5, 0])
        >>> result, new_labels = groupby_mean(samples, labels)
        >>> result
        tensor([[0.0000, 0.0000, 0.0000],
                [0.1500, 0.1500, 0.1500],
                [0.3000, 0.3000, 0.3000]])
        >>> new_labels
        tensor([0, 1, 5])
    """
    uniques = labels.unique().tolist()
    labels = labels.tolist()
    # remap the (possibly sparse) labels to a dense range 0..len(uniques)-1
    key_val = {key: val for key, val in zip(uniques, range(len(uniques)))}
    val_key = {val: key for key, val in zip(uniques, range(len(uniques)))}
    labels = torch.LongTensor(list(map(key_val.get, labels)))
    labels = labels.view(labels.size(0), 1).expand(-1, value.size(1))
    unique_labels, labels_count = labels.unique(dim=0, return_counts=True)
    result = torch.zeros_like(unique_labels, dtype=torch.float).scatter_add_(0, labels, value)
    result = result / labels_count.float().unsqueeze(1)
    # map the dense ids back to the original label values
    new_labels = torch.LongTensor(list(map(val_key.get, unique_labels[:, 0].tolist())))
    return result, new_labels
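On recent PyTorch versions (scatter_reduce is available from roughly 1.12 on), the dense-label case from the question can also be written in a single call; a sketch:

idx = labels.unsqueeze(1).expand(-1, samples.size(1))
result = torch.zeros(int(labels.max()) + 1, samples.size(1)).scatter_reduce(
    0, idx, samples, reduce="mean", include_self=False
)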
For 3D Tensors:
For those who are interested: I expanded @yhenon's answer to the case where labels is a 2D tensor and samples is a 3D tensor. This might be useful if you want to execute this operation in batches (as I do). But it comes with a caveat (see at the end).
M = torch.zeros(labels.shape[0], labels.max() + 1, labels.shape[1])
M[torch.arange(len(labels))[:, None], labels, torch.arange(labels.size(1))] = 1
M = torch.nn.functional.normalize(M, p=1, dim=-1)
result = M @ samples
samples = torch.Tensor([[
    [0.1, 0.1],  # -> group / class 1
    [0.2, 0.2],  # -> group / class 2
    [0.4, 0.4],  # -> group / class 2
    [0.0, 0.0]   # -> group / class 0
], [
    [0.5, 0.5],  # -> group / class 0
    [0.2, 0.2],  # -> group / class 1
    [0.4, 0.4],  # -> group / class 2
    [0.1, 0.1]   # -> group / class 3
]])
labels = torch.LongTensor([[1, 2, 2, 0], [0, 1, 2, 3]])
Output:
>>> result
tensor([[[0.0000, 0.0000],
[0.1000, 0.1000],
[0.3000, 0.3000],
[0.0000, 0.0000]],
[[0.5000, 0.5000],
[0.2000, 0.2000],
[0.4000, 0.4000],
[0.1000, 0.1000]]])
Be careful: now result[0] has a length of 4 (instead of 3 in @yhenon's answer), because labels[1] contains a 3. The last row contains only 0s. If you don't expect 0s in the last rows of your resulting tensor, you can use this code and deal with the 0s later.
