Max-pooling with complex masks in PyTorch

Suppose I have a matrix src with shape (5, 3) and a boolean matrix adj with shape (5, 5) as follows:
src = tensor([[ 0,  1,  2],
              [ 3,  4,  5],
              [ 6,  7,  8],
              [ 9, 10, 11],
              [12, 13, 14]])
and
adj = tensor([[1, 0, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 1, 0, 1]])
We can take each row of src as a node embedding, and each row of adj as an indicator of which nodes belong to that node's neighborhood.
My goal is to max-pool over all neighborhood node embeddings for each node in src.
For example, the neighborhood of the 0-th node (including itself) is {0, 2, 3}, so we max-pool over [0, 1, 2], [6, 7, 8] and [9, 10, 11], which gives [9, 10, 11] as the updated embedding for the 0-th row of src_update.
A simple solution I wrote is
src_update = torch.zeros_like(src)
for index in range(adj.size(0)):
    list_of_non_zero = adj[index].nonzero().view(-1)
    mat_non_zero = torch.index_select(src, 0, list_of_non_zero)
    src_update[index] = torch.max(mat_non_zero, dim=0)[0]
And src_update is updated as:
tensor([[ 9, 10, 11],
        [ 9, 10, 11],
        [12, 13, 14],
        [ 6,  7,  8],
        [12, 13, 14]])
Although it works, it runs very slowly and doesn't look elegant!
Any suggestions for improving its efficiency?
In addition, if both src and adj get a batch dimension ((batch, 5, 3) and (batch, 5, 5)), how can I make this work?

I was experimenting with your code:
output = torch.zeros_like(src)
for index in range(adj.size(0)):
    nz = adj[index].nonzero().view(-1)
    output[index] = src.index_select(0, nz).max(0).values
The bottleneck is of course the for loop. What first comes to mind is to use some kind of scatter function. However, the main issue here is the fact that the number of neighbors can vary from row to row. This means we will be unable to construct a tensor containing the candidate nodes before max pooling.
One possible solution is to create a helper tensor similar to src where the first node would contain placeholder values (these should not get chosen by the max-pooling, i.e. we can use -inf). We can index this tensor using a tensor containing indices: compared to your method, instead of removing the zeros with torch.nonzero(), we will place an index value of 0 (referring to the placeholder row in the first position of modified-src).
In practice, here is how it looks.
For the helper tensor src_, I filled the first row with -inf placeholder values (this requires a floating-point tensor, so src is cast to float):
>>> src_ = torch.cat((torch.full((1, src.size(1)), float('-inf')), src.float()))
tensor([[-inf, -inf, -inf],
        [  0.,   1.,   2.],
        [  3.,   4.,   5.],
        [  6.,   7.,   8.],
        [  9.,  10.,  11.],
        [ 12.,  13.,  14.]])
We can convert the adj matrix into a tensor of indices:
>>> index = torch.arange(1, adj.size(1) + 1)*adj
tensor([[1, 0, 3, 4, 0],
        [0, 2, 3, 4, 0],
        [1, 2, 0, 4, 5],
        [1, 2, 3, 0, 0],
        [0, 0, 3, 0, 5]])
For easier indexing we will flatten index, index src_ on the first axis, and reshape right after:
>>> indexed = src_[index.flatten(), :].reshape(*adj.shape, 3)
tensor([[[  0.,   1.,   2.],
         [-inf, -inf, -inf],
         [  6.,   7.,   8.],
         [  9.,  10.,  11.],
         [-inf, -inf, -inf]],
        ...
        [[-inf, -inf, -inf],
         [-inf, -inf, -inf],
         [  6.,   7.,   8.],
         [-inf, -inf, -inf],
         [ 12.,  13.,  14.]]])
Finally you can max-pool:
>>> indexed.max(dim=1).values
tensor([[ 9., 10., 11.],
        [ 9., 10., 11.],
        [12., 13., 14.],
        [ 6.,  7.,  8.],
        [12., 13., 14.]])
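Putting these steps together, a self-contained version of this approach could look as follows (same src and adj as in the question; src is cast to float so it can be concatenated with the -inf placeholder row):
import torch

src = torch.tensor([[ 0,  1,  2],
                    [ 3,  4,  5],
                    [ 6,  7,  8],
                    [ 9, 10, 11],
                    [12, 13, 14]])
adj = torch.tensor([[1, 0, 1, 1, 0],
                    [0, 1, 1, 1, 0],
                    [1, 1, 0, 1, 1],
                    [1, 1, 1, 0, 0],
                    [0, 0, 1, 0, 1]])

# prepend a -inf placeholder row, turn adj into 1-based indices (0 -> placeholder),
# gather the candidate rows, then max-pool over the neighbourhood dimension
src_ = torch.cat((torch.full((1, src.size(1)), float('-inf')), src.float()))
index = torch.arange(1, adj.size(1) + 1) * adj
src_update = src_[index.flatten(), :].reshape(*adj.shape, src.size(1)).max(dim=1).values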

Ivan gave a pretty smart solution. The key idea is to transform the mask into an index tensor. I have tested it and wrapped it up in the function below:
def mask_max_pool(embeddings, mask):
    '''
    Inputs:
    ------------------
    embeddings: [B, D, E]
    mask: [B, R, D], 0s and 1s, where 1 indicates membership

    Outputs:
    ------------------
    max pooled embeddings: [B, R, E], the max pooled embeddings according to the membership in mask
    max pooled index: [B, R, E], the argmax indices (into the padded embeddings, so 0 refers to the placeholder row)
    '''
    B, D, E = embeddings.shape
    _, R, _ = mask.shape
    # extend embeddings with a placeholder row that never wins the max
    embeddings_ = torch.cat([-1e6 * torch.ones_like(embeddings[:, :1, :]), embeddings], dim=1)
    # transform mask to index (0 points at the placeholder row)
    index = torch.arange(1, D + 1).view(1, 1, -1).repeat(B, R, 1) * mask  # [B, R, D]
    # batch indices
    batch_indices = torch.arange(B).view(B, 1, 1).repeat(1, R, D)
    # retrieve embeddings by index
    indexed = embeddings_[batch_indices.flatten(), index.flatten(), :].view(B, R, D, E)  # [B, R, D, E]
    # max-pool over the membership dimension; returns (values, indices)
    return indexed.max(dim=-2)
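A quick usage check against the example from the question (reusing src and adj from above, treated as a batch of size 1):
embeddings = src.float().unsqueeze(0)   # [1, 5, 3]
mask = adj.unsqueeze(0)                 # [1, 5, 5]
pooled, argmax = mask_max_pool(embeddings, mask)
print(pooled[0])
# tensor([[ 9., 10., 11.],
#         [ 9., 10., 11.],
#         [12., 13., 14.],
#         [ 6.,  7.,  8.],
#         [12., 13., 14.]])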

Related

Indexing using pyTorch tensors along one specific dimension with 3 dimensional tensor

I have 2 tensors:
A with shape (batch, sequence, vocab)
and B with shape (batch, sequence).
A = torch.tensor([[[ 1.,  2.,  3.],
                   [ 5.,  6.,  7.]],
                  [[ 9., 10., 11.],
                   [13., 14., 15.]]])
B = torch.tensor([[0, 2],
                  [1, 0]])
I want to get the following:
C = torch.zeros_like(B)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = A[i, j, B[i, j]]
But in a vectorized way. I tried torch.gather and other stuff but I cannot make it work.
Can anyone please help me?
>>> import torch
>>> A = torch.tensor([[[ 1., 2., 3.],
... [ 5., 6., 7.]],
...
... [[ 9., 10., 11.],
... [13., 14., 15.]]])
>>> B = torch.tensor([[0, 2],
... [1, 0]])
>>> A.shape
torch.Size([2, 2, 3])
>>> B.shape
torch.Size([2, 2])
>>> C = torch.zeros_like(B)
>>> for i in range(B.shape[0]):
... for j in range(B.shape[1]):
... C[i,j] = A[i,j,B[i,j]]
...
>>> C
tensor([[ 1, 7],
[10, 13]])
>>> torch.gather(A, -1, B.unsqueeze(-1))
tensor([[[ 1.],
[ 7.]],
[[10.],
[13.]]])
>>> torch.gather(A, -1, B.unsqueeze(-1)).shape
torch.Size([2, 2, 1])
>>> torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)
tensor([[ 1., 7.],
[10., 13.]])
Hi, you can use torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1).
The first -1, between A and B.unsqueeze(-1), indicates the dimension along which you want to pick the elements.
The -1 in B.unsqueeze(-1) adds a dimension to B so that the two tensors have the same number of dimensions; otherwise you get RuntimeError: Index tensor must have the same number of dimensions as input tensor.
The last -1, in squeeze(-1), removes that extra dimension and reshapes the result from torch.Size([2, 2, 1]) back to torch.Size([2, 2]).
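For comparison, the same result can be obtained with explicit advanced indexing on the tensors above, which makes visible what gather is doing here (just a sketch; gather remains the idiomatic choice):
rows = torch.arange(A.size(0)).unsqueeze(1)  # shape (2, 1), broadcasts over columns
cols = torch.arange(A.size(1))               # shape (2,)
out = A[rows, cols, B]                       # out[i, j] == A[i, j, B[i, j]]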

Using Pytorch how to define a tensor with indices and corresponding values

Problem
I have a list of indices and a list of values like so:
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3])
I want to define a (3x3 for the example) matrix which contains the values v at the indices i (1 at position (2,2), 2 at position (2, 0) and 3 at position (1,2)):
tensor([[0, 0, 0],
[0, 0, 3],
[2, 0, 1]])
What I have tried
I can do it using a trick, with torch.sparse and .to_dense() but I feel that it's not the "pytorchic" way to do that nor the most efficient:
f = torch.sparse.FloatTensor(i, v.float(), torch.Size([3, 3]))
print(f.to_dense())
Any idea for a better solution?
Ideally I would appreciate a solution at least as fast as the one provided above.
Of course this was just an example; no particular structure in tensors i and v is assumed (nor any particular dimensions).
There is an alternative, as below:
import torch
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float) # enforcing same data-type
target = torch.zeros([3,3], dtype=torch.float) # enforcing same data-type
target.index_put_(tuple([k for k in i]), v)
print(target)
The target tensor will be as follows:
tensor([[0., 0., 0.],
[0., 0., 3.],
[2., 0., 1.]])
This medium.com blog article provides a comprehensive list of all index functions for PyTorch Tensors.
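If you want to avoid the legacy torch.sparse.FloatTensor constructor, the same dense matrix can also be built with torch.sparse_coo_tensor; a small sketch reusing i and v from the question:
f = torch.sparse_coo_tensor(i, v.float(), (3, 3))
print(f.to_dense())
# tensor([[0., 0., 0.],
#         [0., 0., 3.],
#         [2., 0., 1.]])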

Why does this tensor.unfold method in this example add a dimension?

I am familiarizing myself with the Pytorch unfold method from https://pytorch.org/docs/stable/tensors.html#torch.Tensor.unfold
I looked at their example which is
>>> x = torch.arange(1., 8)
>>> x
tensor([ 1., 2., 3., 4., 5., 6., 7.])
>>> x.unfold(0, 2, 1)
tensor([[ 1., 2.],
[ 2., 3.],
[ 3., 4.],
[ 4., 5.],
[ 5., 6.],
[ 6., 7.]])
From the above I understand that when we unfold in dimension 0, we take chunks of size 2 at a time with stride 1; the result is therefore an arrangement of the different chunks [1., 2.], [2., 3.] and so on. Since we end up with 6 chunks, they are put together and the final shape is (6, 2).
However, I have another example I ran as shown below.
In [115]: s = torch.arange(20).view(1,10,2)
In [116]: s
Out[116]:
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]]])
In [117]: s.unfold(0,1,1)
Out[117]:
tensor([[[[ 0],
[ 1]],
[[ 2],
[ 3]],
[[ 4],
[ 5]],
[[ 6],
[ 7]],
[[ 8],
[ 9]],
[[10],
[11]],
[[12],
[13]],
[[14],
[15]],
[[16],
[17]],
[[18],
[19]]]])
In [119]: s.unfold(0,1,1).shape
Out[119]: torch.Size([1, 10, 2, 1])
So you see my original tensor was of shape (1,10,2) and I asked for an unfolding operation with parameters s.unfold(0, 1, 1).
Going by my understanding from the previous example, I assumed this means that in dimension 0 we take chunks of size 1 at a time with stride 1. Since dimension 0 has size 1, there is only one chunk of size (10, 2). So I expected the output to just take this chunk, perhaps adding a dimension to wrap it, giving an output of size (1, 10, 2).
However, it gives me an output of size (1, 10, 2, 1). Why does it have an extra dimension at the end? Can someone explain intuitively please?
The documentation states:
An additional dimension of size size is appended in the returned tensor.
where size is the size of the chunks you specified (second argument). By definition, it always adds an additional dimension, which makes it consistent no matter what size you choose. Just because a dimension has size 1, doesn't mean it should be omitted automatically.
Regarding the intuition behind this, let's consider that instead of returning a tensor where the last dimension represents the chunks, we create a list of all chunks. For simplicity, we'll limit it to the first dimension with a step of 1.
import torch
from typing import List

def list_chunks(tensor: torch.Tensor, size: int) -> List[torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    return chunks
x = torch.arange(1.0, 8)
s = torch.arange(20).view(1, 10, 2)
# As expected, a list with 6 elements, as there are 6 chunks.
list_chunks(x, 2)
# => [tensor([1., 2.]),
# tensor([2., 3.]),
# tensor([3., 4.]),
# tensor([4., 5.]),
# tensor([5., 6.]),
# tensor([6., 7.])]
# The list has only a single element, as there is only a single chunk.
# But it's still a list.
list_chunks(s, 1)
# => [tensor([[[ 0, 1],
# [ 2, 3],
# [ 4, 5],
# [ 6, 7],
# [ 8, 9],
# [10, 11],
# [12, 13],
# [14, 15],
# [16, 17],
# [18, 19]]])]
I've deliberately included type annotations to make it clearer what we are expecting from the function. If there is only a single chunk, it will be a list with one element, as it is always a list of chunks.
You were expecting a different behaviour, namely that when there is a single chunk, you get the chunk itself instead of a list. That would change the implementation as follows.
from typing import List, Union

def list_chunks(tensor: torch.Tensor, size: int) -> Union[List[torch.Tensor], torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    # If it's a single chunk, return just the chunk itself
    if len(chunks) == 1:
        return chunks[0]
    else:
        return chunks
With that change, anyone who uses this function now needs to take two cases into consideration. If you don't distinguish between a list and a single chunk (tensor), you will get unexpected results; for example, looping over the chunks would instead loop over the first dimension of the tensor.
The programmatically intuitive approach is to always return a list of chunks, and torch.unfold does the same, except that instead of a list of chunks it returns a tensor where the last dimension can be seen as the listing of the chunks.
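As a quick sanity check of that analogy (reusing the list_chunks helper and the tensors defined above): for the 1-D example, stacking the list of chunks reproduces unfold exactly, while for s the only difference is that unfold appends the chunk dimension at the end:
stacked = torch.stack(list_chunks(x, 2))
print(torch.equal(stacked, x.unfold(0, 2, 1)))  # True
print(s.unfold(0, 1, 1).shape)                  # torch.Size([1, 10, 2, 1]) - chunk dim of size 1 appended last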

np.add.at indexing with array

I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that
x = [[0,4,1], [3,2,4]]
dW = np.zeros((5, 6))
dout = [[[ 1.19034710e-01 -4.65005990e-01 8.93743168e-01 -9.78047129e-01
-8.88672957e-01 -4.66605091e-01]
[ -1.38617461e-03 -2.64569728e-01 -3.83712733e-01 -2.61360826e-01
8.07072009e-01 -5.47607277e-01]
[ -3.97087458e-01 -4.25187949e-02 2.57931759e-01 7.49565950e-01
1.37707667e+00 1.77392240e+00]]
[[ -1.20692745e+00 -8.28111550e-01 6.53041092e-01 -2.31247762e+00
-1.72370321e+00 2.44308033e+00]
[ -1.45191870e+00 -3.49328154e-01 6.15445782e-01 -2.84190582e-01
4.85997687e-02 4.81590106e-01]
[ -1.14828583e+00 -9.69055406e-01 -1.00773809e+00 3.63553835e-01
-1.28078363e+00 -2.54448436e+00]]]
The operation they do is
np.add.at(dW, x, dout)
x is a two dimensional array. How does indexing work here? I went through np.ufunc.at documentation but they have simple examples with 1d array and constant:
np.add.at(a, [0, 1, 2, 2], 1)
In [226]: x = [[0,4,1], [3,2,4]]
...: dW = np.zeros((5,6),int)
In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]:
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0]])
With this x there aren't any duplicate entries, so add.at is the same as using += indexing. Equivalently we can read the changed values with:
In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
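To verify that claim, a small check reusing x and the dW just computed (with numpy imported as np):
dW2 = np.zeros((5, 6), int)
dW2[x[0], x[1]] += 1            # plain fancy-index +=, fine here because there are no duplicate (row, col) pairs
print(np.array_equal(dW, dW2))  # True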
The indices work the same either way, including broadcasting:
In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
possible values
The values have to be broadcastable, with respect to the indexes:
In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])
In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])
In [118]: dW
Out[118]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 3, 0, 9, 0],
[ 0, 0, 4, 0, 11, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
In this case the indices define a (2,3) shape, so (2,3),(3,), (2,1), and scalar values work. (6,) does not.
In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
Recently I also had a hard time understanding this line of code. I hope what I figured out can help you; correct me if I am wrong.
The three arrays in this line of code are the following:
x, whose shape is (N, T)
dW, whose shape is (V, D)
dout, whose shape is (N, T, D)
Then we come to the line of code we want to understand:
np.add.at(dW, x, dout)
If you don't want to follow the whole thinking procedure, the code above is equivalent to:
for row in range(N):
    for col in range(T):
        dW[x[row, col], :] += dout[row, col, :]
This is the thinking procedure:
Referring to this doc
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html
we know that x is the index array. So the key is to understand dW[x].
This is the concept of indexing an array (dW) using another array (x). If you are not familiar with this concept, check out this link:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.
dW[x] will give us an array whose shape is (N, T, D); the (N, T) part comes from x, and the (D) comes from dW's shape (V, D). Note that every element of x is inside the range [0, V).
Let's take some numbers as a concrete example:
x: np.array([[0,0],[0,0]]) ---- (2,2) N=2, T=2
dW: np.array([[0,0],[2,2]]) ---- (2,2) V=2, D=2
dout: np.arange(1,9).reshape(2,2,2) ----(2,2,2) N=2, T=2, D=2
dW[x] should be
[[[0 0]    # every row comes from dW's first row, because every entry of x is 0
  [0 0]]
 [[0 0]
  [0 0]]]
Adding dout to dW[x] means adding element-wise (there is a trick here, which is explained later).
np.add.at(dW, x, dout) gives
[[16 20]
 [ 2  2]]
Why? The procedure is:
It adds [1, 2] to the first row of dW, which is [0, 0]. Why the first row? Because x[0, 0] = 0, which points at the first row of dW: dW[0] = dW[0, :].
Then it adds [3, 4] = dout[0, 1, :] to the first row of dW again, because x[0, 1] = 0 also points at the first row.
Then it adds [5, 6] to the first row of dW.
Then it adds [7, 8] to the first row of dW.
So the result is [1+3+5+7, 2+4+6+8] = [16, 20]. Since we never touch the second row of dW, it remains unchanged.
The trick is that np.add.at is unbuffered: every addition is applied in place, so repeated indices accumulate, whereas the buffered dW[x] += dout would keep only one contribution per repeated index.
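A quick way to see that difference with the same toy example; note that for the buffered dW[x] += dout with duplicate indices, which single contribution survives is not guaranteed, only that the contributions do not accumulate:
import numpy as np

x = np.array([[0, 0], [0, 0]])
dout = np.arange(1, 9).reshape(2, 2, 2)

dW_buffered = np.array([[0, 0], [2, 2]])
dW_buffered[x] += dout               # buffered: only one contribution per duplicate index survives
print(dW_buffered)                   # e.g. [[7 8]
                                     #       [2 2]]

dW_unbuffered = np.array([[0, 0], [2, 2]])
np.add.at(dW_unbuffered, x, dout)    # unbuffered: all four contributions accumulate on row 0
print(dW_unbuffered)                 # [[16 20]
                                     #  [ 2  2]]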
Let's consider an example based on this assignment from cs231n. When several dimensions are involved, it's much easier to work with a concrete setting.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))
dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))
x
array([[5, 3, 4],
[0, 1, 3]])
dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]],
[[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
Which rows of dW_man should be updated, and with what? Well, [0, 1, ...] should be added to row 5, [6, 7, ...] to row 3, and [30, 31, ...] should also be added to row 3. So let's compute it manually. See more examples and explanation in this GitHub gist: link.
dW_man[5] = dout[0, 0]
dW_man[3] = dout[0, 1]
dW_man[4] = dout[0, 2]
dW_man[0] = dout[1, 0]
dW_man[1] = dout[1, 1]
dW_man[3] = dout[1, 2]
dW_man
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[30., 31., 32., 33., 34., 35.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Now let's use np.add.at, which accumulates instead of overwriting; note in the result below that row 3 receives both contributions (6 + 30 = 36, 7 + 31 = 38, and so on), whereas the manual assignment above simply overwrote the first one.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)
dW
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])

Thresholding in Theano

Is there a way to threshold the values in a Theano tensor? For instance, if v = t.vector(), I would like to create another tensor w which contains the same values as v, except that the ones that exceed a certain threshold T are replaced by T itself:
v = [1, 2, 3, 100, 200, 300]
T = 100
w = [1, 2, 3, 100, 100, 100]
More generally, is there a standard framework for creating your own operations on tensors?
Here is code that does that, using the clip function.
import theano
v = theano.tensor.vector()
f = theano.function([v], theano.tensor.clip(v, 0, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
If you don't want a min you can use a switch:
import theano
v = theano.tensor.vector()
f = theano.function([v], theano.tensor.switch(v<100, v, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
