Alternating concatenation of tensors - pytorch

I have 2 tensors of shape [2, 1, 9] and [2, 1, 3]. I'd like to interleave them along the 3rd dimension, inserting one element of the second tensor after every three elements of the first (so every 4th element of the result comes from the second tensor).
For example:
a = [[[1,2,3,4,5,6,7,8,9]],[[11,12,13,14,15,16,17,18,19]]]
b = [[[10, 20, 30]], [[1, 2, 3]]]
result = [[[1,2,3,10,4,5,6,20,7,8,9,30]],[[11,12,13,1,14,15,16,2,17,18,19,3]]]
How can I do this in pytorch?

This would do the trick:
torch.cat([a.reshape(2, 1, 3, 3), b.reshape(2, 1, 3, 1)], dim=-1).reshape(2, 1, -1)
There's probably a smarter way to do this, but hey, it works.
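As a complete, runnable sketch with the example data from the question:

import torch

a = torch.tensor([[[1, 2, 3, 4, 5, 6, 7, 8, 9]],
                  [[11, 12, 13, 14, 15, 16, 17, 18, 19]]])  # shape (2, 1, 9)
b = torch.tensor([[[10, 20, 30]],
                  [[1, 2, 3]]])                             # shape (2, 1, 3)

# Split a into 3 groups of 3, append one element of b to each group,
# then flatten the last two dimensions back into one.
result = torch.cat([a.reshape(2, 1, 3, 3), b.reshape(2, 1, 3, 1)], dim=-1).reshape(2, 1, -1)
print(result)
# tensor([[[ 1,  2,  3, 10,  4,  5,  6, 20,  7,  8,  9, 30]],
#         [[11, 12, 13,  1, 14, 15, 16,  2, 17, 18, 19,  3]]])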

Related

Count Unique elements in pytorch Tensor

Suppose I have the following tensor: y = torch.randint(0, 3, (10,)). How would you go about counting the 0's, 1's, and 2's in there?
The only way I can think of is by using collections.Counter(y), but I was wondering if there is a more "pytorch" way of doing this. A use case, for example, would be building the confusion matrix for predictions.
You can use torch.unique with the return_counts option:
>>> x = torch.randint(0, 3, (10,))
tensor([1, 1, 0, 2, 1, 0, 1, 1, 2, 1])
>>> x.unique(return_counts=True)
(tensor([0, 1, 2]), tensor([2, 6, 2]))
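If the values are known to be small non-negative integers (as with class labels), torch.bincount is another option; passing minlength ensures classes that never occur still show up with a count of zero:

>>> y = torch.tensor([1, 1, 0, 2, 1, 0, 1, 1, 2, 1])
>>> torch.bincount(y, minlength=3)
tensor([2, 6, 2])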

Bitwise operations in Pytorch

Could someone help me with how to perform a bitwise AND operation on two tensors in PyTorch 1.4?
Apparently I could only find NOT and XOR operations in the official documentation.
I don't see them in the docs, but it looks like &, |, __and__, __or__, __xor__, etc. are bitwise:
>>> torch.tensor([1, 2, 3, 4]).__xor__(torch.tensor([1, 1, 1, 1]))
tensor([0, 3, 2, 5])
>>> torch.tensor([1, 2, 3, 4]) | torch.tensor([1, 1, 1, 1])
tensor([1, 3, 3, 5])
>>> torch.tensor([1, 2, 3, 4]) & torch.tensor([1, 1, 1, 1])
tensor([1, 0, 1, 0])
>>> torch.tensor([1, 2, 3, 4]).__and__(torch.tensor([1, 1, 1, 1]))
tensor([1, 0, 1, 0])
See https://github.com/pytorch/pytorch/pull/1556
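Newer releases (after 1.4, if I remember the version history correctly) also expose named functions such as torch.bitwise_and and torch.bitwise_or, which behave the same as the operators above:

>>> torch.bitwise_and(torch.tensor([1, 2, 3, 4]), torch.tensor([1, 1, 1, 1]))
tensor([1, 0, 1, 0])
>>> torch.bitwise_or(torch.tensor([1, 2, 3, 4]), torch.tensor([1, 1, 1, 1]))
tensor([1, 3, 3, 5])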
There is no dedicated bitwise and/or operation for tensors in (Lua) Torch: element-wise operations are implemented, but bitwise ones are not.
However, if you represent each bit as a separate tensor element, you can use element-wise operations instead.
For example:
a = torch.Tensor{0,1,1,0}
b = torch.Tensor{0,1,0,1}
torch.cmul(a,b):eq(1)
0
1
0
0
[torch.ByteTensor of size 4]
torch.add(a,b):ge(1)
0
1
1
1
[torch.ByteTensor of size 4]
Hope this will help you.
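For reference, the same element-wise trick translates directly to PyTorch, assuming 0/1-valued tensors:

import torch

a = torch.tensor([0, 1, 1, 0])
b = torch.tensor([0, 1, 0, 1])

print((a * b).eq(1))  # AND: tensor([False,  True, False, False])
print((a + b).ge(1))  # OR:  tensor([False,  True,  True,  True])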

How should I understand the nn.Embedding arguments num_embeddings and embedding_dim?

I'm trying to get used to the Embedding class in the PyTorch nn module.
I've noticed that quite a few other people have had the same problem as myself, and therefore posted questions on the PyTorch discussion forum and on Stack Overflow, but I'm still having some confusion.
According to the official documentation, the arguments that are passed are num_embeddings and embedding_dim, which refer to the size of our dictionary (or vocabulary) and the number of dimensions we want our embeddings to have, respectively.
What I'm confused about is how exactly I should interpret those. For example, the small practice code that I ran:
import torch
import torch.nn as nn
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # (2, 4)
b = torch.LongTensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
                      [6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
                      [2, 6, 3], [6, 5, 4], [2, 6, 5]]) # (13, 3)
c = torch.LongTensor([[1, 2, 3, 2, 1, 2, 3, 3, 3, 3, 3],
                      [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]]) # (2, 11)
When I run a, b, and c through the embedding variable, I get embedded results of shapes (2, 4, 3), (13, 3, 3), (2, 11, 3).
What's confusing me is this: I thought that if the number of samples exceeds the predefined number of embeddings, we should get an error. Since the embedding I've defined has 10 embeddings, shouldn't b give me an error, since it is a tensor containing 13 words of dimension 3?
In your case, here is how your input tensor is interpreted:
a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # 2 sequences of 4 elements
Moreover, this is how your embedding layer is interpreted:
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3) # 10 distinct elements, each of which will be embedded in a 3-dimensional space
So, it doesn't matter if your input tensor has more than 10 elements, as long as they are in the range [0, 9]. For example, if we create a tensor of two elements such as:
d = torch.LongTensor([[1, 10]]) # 1 sequence of 2 elements
We would get the following error when we pass this tensor through the embedding layer:
RuntimeError: index out of range: Tried to access index 10 out of table with 9 rows
To summarize, num_embeddings is the total number of unique elements in the vocabulary, and embedding_dim is the size of each embedded vector once it has passed through the embedding layer. Therefore, you can have a tensor of 10+ elements, as long as each element is in the range [0, 9], because you defined a vocabulary size of 10 elements.
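One way to see this is that nn.Embedding is just a lookup table: its weight is a (num_embeddings, embedding_dim) matrix, each input index selects a row, and the output shape is always the input shape plus a trailing embedding_dim. A minimal sketch:

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
print(embedding.weight.shape)  # torch.Size([10, 3]): one 3-d vector per index 0..9

a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]])  # (2, 4)
out = embedding(a)
print(out.shape)  # torch.Size([2, 4, 3]): input shape + (embedding_dim,)

# Each output vector is literally a row of the weight matrix:
print(torch.equal(out[0, 0], embedding.weight[1]))  # True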

Explanation for slicing in Pytorch

Why is the output the same every time?
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[[5, 6]]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[5, 6]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([5, 6])
a
tensor([0, 1, 2, 5, 6])
PyTorch follows NumPy here, which allows assignment to slices as long as the shapes are compatible, meaning that the two sides have the same shape or the right-hand side is broadcastable to the shape of the slice. Starting with the trailing dimensions, two arrays are broadcastable if they only differ in dimensions where one of them is 1. So in this case:
a = torch.tensor([0, 1, 2, 3, 4])
b = torch.tensor([[[5, 6]]])
print(a[-2:].shape, b.shape)
>> torch.Size([2]) torch.Size([1, 1, 2])
PyTorch will perform the following comparisons:
a[-2:].shape[-1] and b.shape[-1] are equal so the last dimension is compatible
a[-2:].shape[-2] does not exist, but b.shape[-2] is 1 so they are compatible
a[-2:].shape[-3] does not exist, but b.shape[-3] is 1 so they are compatible
All dimensions are compatible, so b can be broadcast to the shape of a[-2:]
Finally, PyTorch will convert b to tensor([5, 6]) before performing the assignment, thus producing the result:
a[-2:] = b
print(a)
>> tensor([0, 1, 2, 5, 6])
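To see the compatibility check fail, try a right-hand side whose trailing dimension does not match the slice (the exact error message varies by version):

import torch

a = torch.tensor([0, 1, 2, 3, 4])
try:
    a[-2:] = torch.tensor([5, 6, 7])  # shape (3,) cannot broadcast to shape (2,)
except RuntimeError as e:
    print(e)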

Iterating over distinct permutations of a vector in Pari/GP

I want to iterate over all distinct permutations of a vector. I have tried doing this by using vecextract() in combination with numtoperm() to create a vector of permutations, and vecsort(,,,8) to remove equivalent permutations.
Unfortunately, this doesn't scale well: with my current stack size of 4 GB, the largest such vector I can build has fewer than 12! entries, and my machine only has 16 GB of RAM.
Is there a way to do this without running out of memory, maybe by generating the k-th distinct permutation directly?
There is nothing built into PARI for this. I would suggest reading How to generate all the permutations of a multiset?.
Use forperm.
forperm([1,1,2,3], v, print(v))
Produces
Vecsmall([1, 1, 2, 3])
Vecsmall([1, 1, 3, 2])
Vecsmall([1, 2, 1, 3])
Vecsmall([1, 2, 3, 1])
Vecsmall([1, 3, 1, 2])
Vecsmall([1, 3, 2, 1])
Vecsmall([2, 1, 1, 3])
Vecsmall([2, 1, 3, 1])
Vecsmall([2, 3, 1, 1])
Vecsmall([3, 1, 1, 2])
Vecsmall([3, 1, 2, 1])
Vecsmall([3, 2, 1, 1])
Note that forperm iterates in lexicographic order starting from the vector it is given, so the input should be sorted (e.g. with vecsort) for it to produce all distinct permutations.
