Subtract the elements of every possible pair of a torch Tensor efficiently - pytorch

I have a huge torch Tensor and I'm looking for an efficient approach to subtract the elements of every pair of that Tensor.
Of course I could use two nested for loops, but that wouldn't be efficient.
For example, given
[1, 2, 3, 4]
The output I want is
[1-2, 1-3, 1-4, 2-3, 2-4, 3-4]

You can do this easily:
>>> x = torch.tensor([1, 2, 3, 4])
>>> x[:, None] - x[None, :]
tensor([[ 0, -1, -2, -3],
        [ 1,  0, -1, -2],
        [ 2,  1,  0, -1],
        [ 3,  2,  1,  0]])
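This relies on broadcasting: entry [i, j] of the result is x[i] - x[j], so the pairs asked for in the question are the entries above the diagonal.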
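If only the pairs with i < j are wanted, as in the question, one option is to select the strict upper triangle by index. The following is a minimal sketch, assuming a 1-D input tensor; the variable names are illustrative. It uses torch.triu_indices and, as a cross-check, torch.combinations.

```python
import torch

x = torch.tensor([1, 2, 3, 4])
n = x.numel()

# Row and column indices of the strict upper triangle (i < j), in row-major order.
i, j = torch.triu_indices(n, n, offset=1)
pair_diffs = x[i] - x[j]  # x[0]-x[1], x[0]-x[2], x[0]-x[3], x[1]-x[2], ...

# Alternative: enumerate the pairs directly, then subtract the two columns.
pairs = torch.combinations(x, r=2)
pair_diffs_alt = pairs[:, 0] - pairs[:, 1]

print(pair_diffs)  # tensor([-1, -2, -3, -1, -2, -1])
```

Both variants still need memory proportional to the number of pairs, so for a very large tensor the result itself may be the limiting factor.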

Related

How to pad the left side of a list of tensors in pytorch to the size of the largest list?

In pytorch, if you have a list of tensors, you can pad the right side using torch.nn.utils.rnn.pad_sequence
import torch
# for the collate function, pad the sequences
f = [
    [0, 1],
    [0, 3, 4],
    [4, 3, 2, 4, 3],
]
torch.nn.utils.rnn.pad_sequence(
    [torch.tensor(part) for part in f],
    batch_first=True
)
tensor([[0, 1, 0, 0, 0],
        [0, 3, 4, 0, 0],
        [4, 3, 2, 4, 3]])
How would I pad the left side? The desired solution is
tensor([[0, 0, 0, 0, 1],
        [0, 0, 0, 3, 4],
        [4, 3, 2, 4, 3]])
You can reverse the list, do the padding, and reverse the tensor. Would that be acceptable to you? If yes, you can use the code below.
torch.nn.utils.rnn.pad_sequence(
    [torch.tensor(i[::-1]) for i in f],  # reverse each list and create tensors
    batch_first=True,                    # pad (on the right)
).flip(dims=[1])                         # flip along dim 1, the sequence dimension, so the padding ends up on the left
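An alternative that avoids the double reversal is to left-pad each sequence individually and then stack the results. This is only a sketch: pad_left is a hypothetical helper name, not part of PyTorch, and it assumes every sequence is a 1-D tensor.

```python
import torch
import torch.nn.functional as F

def pad_left(sequences, padding_value=0):
    """Left-pad a list of 1-D tensors to a common length and stack them."""
    max_len = max(seq.size(0) for seq in sequences)
    padded = [
        # For a 1-D tensor the pad tuple is (left, right).
        F.pad(seq, (max_len - seq.size(0), 0), value=padding_value)
        for seq in sequences
    ]
    return torch.stack(padded)

f = [[0, 1], [0, 3, 4], [4, 3, 2, 4, 3]]
result = pad_left([torch.tensor(part) for part in f])
print(result)
```

Unlike the flip-based version, this keeps the original order of each sequence without reversing it first, at the cost of one F.pad call per sequence.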

Count Unique elements in pytorch Tensor

Suppose I have the following tensor: y = torch.randint(0, 3, (10,)). How would you go about counting the 0's 1's and 2's in there?
The only way I can think of is by using collections.Counter(y) but was wondering if there was a more "pytorch" way of doing this. A use case for example would be when building the confusion matrix for predictions.
You can use torch.unique with the return_counts option:
>>> x = torch.randint(0, 3, (10,))
>>> x
tensor([1, 1, 0, 2, 1, 0, 1, 1, 2, 1])
>>> x.unique(return_counts=True)
(tensor([0, 1, 2]), tensor([2, 6, 2]))
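For the confusion-matrix use case mentioned in the question, torch.bincount may be a closer fit, because with minlength it also reports a zero count for classes that never occur. The sketch below assumes non-negative integer class labels; num_classes, y_true and y_pred are illustrative names and made-up values.

```python
import torch

y = torch.tensor([1, 1, 0, 2, 1, 0, 1, 1, 2, 1])
num_classes = 3

# One count per class index, including classes with zero occurrences.
counts = torch.bincount(y, minlength=num_classes)

# Confusion matrix: encode each (true, predicted) pair as a single index,
# count the indices, then reshape to (num_classes, num_classes).
y_true = torch.tensor([0, 1, 2, 1])
y_pred = torch.tensor([0, 2, 2, 1])
flat = y_true * num_classes + y_pred
confusion = torch.bincount(flat, minlength=num_classes ** 2).reshape(
    num_classes, num_classes
)

print(counts)     # tensor([2, 6, 2])
print(confusion)  # rows are true classes, columns are predicted classes
```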

Using numba to randomly sample possible combinations of categories

I am trying to speed up a function that randomly samples combinations of categories across a number of records and ensures that the sampled combinations are unique (e.g. let's assume there are 3 records, each of which can be either 0 or 1, and I want 10 random samples of unique possible combinations of records).
If I did not use numba, I might do something like this:
import numpy as np
def myfunc(categories, NumberOfRecords, maxsamples):
    return np.unique( np.random.choice(np.arange(categories), size=(maxsamples*10, NumberOfRecords), replace=True), axis=0 )[0:maxsamples]
Annoyingly, numba does not support axis in np.unique, so I can do something like this, but some of the records may turn out to be non-unique.
from numba import njit, int64
import numpy as np
@njit(int64[:,:](int64, int64, int64), cache=True)
def myfunc(categories, NumberOfRecords, maxsamples):
    return np.random.choice(np.arange(categories), size=(maxsamples, NumberOfRecords), replace=True)
myfunc(categories=2, NumberOfRecords=3, maxsamples=10)
E.g. in one call (obviously there's some randomness here), I got the below (for which the indices 1 and 6, and 3 and 4, and 7 and 9 are identical rows):
array([[0, 1, 1],
       [1, 1, 0],
       [0, 1, 0],
       [1, 0, 1],
       [1, 0, 1],
       [1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0],
       [1, 0, 0]])
My questions are:
Is this something where I would even expect a speed up from numba?
If so, how can I get unique rows (this seems rather difficult with numba, but presumably there's a way)?
Perhaps there's a way to get at this more efficiently (perhaps without creating more random samples than I need in the end)?
In the following, I don't use numba, but all the operations use vectorized numpy functions.
Each row of the result that you generate can be interpreted as an integer expressed in base N, where N is the number of categories. With that interpretation, what you want is to sample without replacement from the integers [0, 1, ... N**R-1], where R is the number of "records". You can use the choice function for that, with the argument replace=False. Once you have that, you need to convert the chosen integers to base N. For that, I use the function int2base, which is a pared down version of a function that I wrote in a different answer.
Here's the code:
import numpy as np
def int2base(x, base, ndigits):
    # x = np.asarray(x) # Uncomment this line for general purpose use.
    powers = base ** np.arange(ndigits)
    digits = (x.reshape(x.shape + (1,)) // powers) % base
    return digits
def makesample(ncategories, nrecords, nsamples, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = ncategories ** nrecords
    choices = rng.choice(n, replace=False, size=nsamples)
    return int2base(choices, ncategories, nrecords)
In makesample, I included the optional argument rng. It allows you to specify the object that holds the choice function. If not provided, it uses np.random.default_rng().
Example:
In [118]: makesample(2, 3, 6)
Out[118]:
array([[0, 1, 1],
       [0, 0, 1],
       [1, 0, 1],
       [0, 0, 0],
       [1, 1, 0],
       [1, 1, 1]])
In [119]: makesample(5, 4, 12)
Out[119]:
array([[3, 4, 0, 1],
       [2, 0, 2, 0],
       [4, 2, 4, 3],
       [0, 1, 0, 4],
       [0, 2, 0, 1],
       [1, 2, 0, 1],
       [0, 3, 0, 4],
       [3, 3, 0, 3],
       [3, 4, 1, 4],
       [2, 4, 1, 1],
       [3, 4, 1, 0],
       [1, 1, 4, 4]])
makesample will raise an exception if you ask for too many samples:
In [120]: makesample(2, 3, 10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-80044e78a60a> in <module>
----> 1 makesample(2, 3, 10)
~/code_snippets/python/numpy/random_samples_for_so_question.py in makesample(ncategories, nrecords, nsamples, rng)
17 rng = np.random.default_rng()
18 n = ncategories ** nrecords
---> 19 choices = rng.choice(n, replace=False, size=nsamples)
20 return int2base(choices, ncategories, nrecords)
_generator.pyx in numpy.random._generator.Generator.choice()
ValueError: Cannot take a larger sample than population when 'replace=False'
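To make the base-N interpretation concrete, here is a small worked check of the digit conversion on known integers, followed by a uniqueness check on one sample. It is a sketch that restates int2base with the np.asarray line enabled so it accepts plain lists; the seed 0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

def int2base(x, base, ndigits):
    # Same logic as int2base in the answer, with np.asarray enabled.
    x = np.asarray(x)
    powers = base ** np.arange(ndigits)
    return (x.reshape(x.shape + (1,)) // powers) % base

# Digits come out least significant first: 5 = 1*1 + 0*2 + 1*4 -> [1, 0, 1].
digits = int2base([0, 5, 7], base=2, ndigits=3)
print(digits)

# Distinct integers map to distinct digit rows, so sampling the integers
# without replacement yields unique rows.
rng = np.random.default_rng(0)
choices = rng.choice(2 ** 3, size=6, replace=False)
sample = int2base(choices, 2, 3)
n_unique = len({tuple(row) for row in sample.tolist()})
print(n_unique)  # 6
```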

Bitwise operations in Pytorch

Could someone show me how to perform bitwise AND operations on two tensors in Pytorch 1.4?
Apparently I could only find NOT and XOR operations in the official documentation.
I don't see them in the docs, but it looks like &, |, __and__, __or__, __xor__, etc are bit-wise:
>>> torch.tensor([1, 2, 3, 4]).__xor__(torch.tensor([1, 1, 1, 1]))
tensor([0, 3, 2, 5])
>>> torch.tensor([1, 2, 3, 4]) | torch.tensor([1, 1, 1, 1])
tensor([1, 3, 3, 5])
>>> torch.tensor([1, 2, 3, 4]) & torch.tensor([1, 1, 1, 1])
tensor([1, 0, 1, 0])
>>> torch.tensor([1, 2, 3, 4]).__and__(torch.tensor([1, 1, 1, 1]))
tensor([1, 0, 1, 0])
See https://github.com/pytorch/pytorch/pull/1556
Check this. There is no bitwise and/or operation for tensors in Torch. There are element-wise operations implemented in Torch, but not bitwise ones.
However, if you can represent each bit as a separate Tensor dimension, you can use element-wise operations.
For example,
a = torch.Tensor{0,1,1,0}
b = torch.Tensor{0,1,0,1}
torch.cmul(a,b):eq(1)
0
1
0
0
[torch.ByteTensor of size 4]
torch.add(a,b):ge(1)
0
1
1
1
[torch.ByteTensor of size 4]
Hope this will help you.
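The snippet above appears to be written in Lua Torch syntax rather than PyTorch. A rough PyTorch sketch of the same element-wise trick, assuming tensors that hold only 0s and 1s, might look like this; it also compares the result against the & and | operators shown in the other answer.

```python
import torch

a = torch.tensor([0, 1, 1, 0])
b = torch.tensor([0, 1, 0, 1])

# AND: the product is 1 only where both inputs are 1.
and_mask = (a * b).eq(1)
# OR: the sum is at least 1 where either input is 1.
or_mask = (a + b).ge(1)

# For 0/1 integer tensors the bitwise operators give the same answer.
and_bits = a & b
or_bits = a | b

print(and_mask)  # tensor([False,  True, False, False])
print(or_mask)   # tensor([False,  True,  True,  True])
```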

Explanation for slicing in Pytorch

Why is the output the same every time?
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[[5, 6]]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[5, 6]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([5, 6])
a
tensor([0, 1, 2, 5, 6])
Pytorch follows Numpy here, which allows assignment to slices as long as the shapes are compatible, meaning that the two sides have the same shape or the right-hand side is broadcastable to the shape of the slice. Starting with the trailing dimensions, two shapes are broadcastable if, in every dimension, the sizes are equal, one of them is 1, or one of them does not exist. So in this case
a = torch.tensor([0, 1, 2, 3, 4])
b = torch.tensor([[[5, 6]]])
print(a[-2:].shape, b.shape)
>> torch.Size([2]) torch.Size([1, 1, 2])
Pytorch will perform the following comparisons:
a[-2:].shape[-1] and b.shape[-1] are equal so the last dimension is compatible
a[-2:].shape[-2] does not exist, but b.shape[-2] is 1 so they are compatible
a[-2:].shape[-3] does not exist, but b.shape[-3] is 1 so they are compatible
All dimensions are compatible, so b can be broadcast to the shape of a[-2:]
Finally, Pytorch will convert b to tensor([5, 6]) before performing the assignment thus producing the result:
a[-2:] = b
print(a)
>> tensor([0, 1, 2, 5, 6])
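The same rule can be probed with a few more right-hand sides. This is a small sketch with illustrative variable names: a length-1 tensor is stretched across the slice, while a length-3 tensor is incompatible with a slice of length 2 and raises an error.

```python
import torch

a = torch.tensor([0, 1, 2, 3, 4])

a[-2:] = torch.tensor([[[5, 6]]])  # shape (1, 1, 2): compatible with (2,)
after_3d = a.tolist()

a[-2:] = torch.tensor([7])         # shape (1,): stretched to shape (2,)
after_len1 = a.tolist()

try:
    a[-2:] = torch.tensor([5, 6, 7])  # shape (3,) cannot match shape (2,)
    failed = False
except RuntimeError:
    failed = True

print(after_3d)    # [0, 1, 2, 5, 6]
print(after_len1)  # [0, 1, 2, 7, 7]
print(failed)      # True
```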
