I have 2 tensors:
A with shape (batch, sequence, vocab)
and B with shape (batch, sequence).
A = torch.tensor([[[ 1., 2., 3.],
[ 5., 6., 7.]],
[[ 9., 10., 11.],
[13., 14., 15.]]])
B = torch.tensor([[0, 2],
[1, 0]])
I want to get the following:
C = torch.zeros_like(B)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        C[i,j] = A[i,j,B[i,j]]
I want to do this in a vectorized way. I tried torch.gather and other things, but I cannot make it work.
Can anyone please help me?
>>> import torch
>>> A = torch.tensor([[[ 1., 2., 3.],
... [ 5., 6., 7.]],
...
... [[ 9., 10., 11.],
... [13., 14., 15.]]])
>>> B = torch.tensor([[0, 2],
... [1, 0]])
>>> A.shape
torch.Size([2, 2, 3])
>>> B.shape
torch.Size([2, 2])
>>> C = torch.zeros_like(B)
>>> for i in range(B.shape[0]):
...     for j in range(B.shape[1]):
...         C[i,j] = A[i,j,B[i,j]]
...
>>> C
tensor([[ 1, 7],
[10, 13]])
>>> torch.gather(A, -1, B.unsqueeze(-1))
tensor([[[ 1.],
[ 7.]],
[[10.],
[13.]]])
>>> torch.gather(A, -1, B.unsqueeze(-1)).shape
torch.Size([2, 2, 1])
>>> torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)
tensor([[ 1., 7.],
[10., 13.]])
Hi, you can use torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1).
The first -1, between A and B.unsqueeze(-1), is the dimension along which the elements are picked.
The second -1, in B.unsqueeze(-1), adds a trailing dimension to B so that both tensors have the same number of dimensions; otherwise you get RuntimeError: Index tensor must have the same number of dimensions as input tensor.
The last -1, in squeeze(-1), removes that trailing dimension, reshaping the result from torch.Size([2, 2, 1]) to torch.Size([2, 2]).
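As a quick check of the explanation above, here is a minimal sketch that compares the gather call with the double loop from the question. The advanced-indexing variant at the end is an alternative added for illustration and is not part of the answer; the names C_loop, C_gather, C_index, i_idx and j_idx are made up for this example.

```python
import torch

A = torch.tensor([[[1., 2., 3.],
                   [5., 6., 7.]],
                  [[9., 10., 11.],
                   [13., 14., 15.]]])
B = torch.tensor([[0, 2],
                  [1, 0]])

# Reference result from the explicit double loop in the question.
C_loop = torch.zeros(B.shape, dtype=A.dtype)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        C_loop[i, j] = A[i, j, B[i, j]]

# gather needs an index with as many dimensions as A, hence unsqueeze(-1);
# squeeze(-1) then drops the trailing dimension of size 1.
C_gather = torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)

# Alternative (an assumption, not from the answer): advanced indexing with
# index grids that broadcast against B.
i_idx = torch.arange(A.shape[0]).unsqueeze(1)  # shape (batch, 1)
j_idx = torch.arange(A.shape[1]).unsqueeze(0)  # shape (1, sequence)
C_index = A[i_idx, j_idx, B]                   # shape (batch, sequence)

print(torch.equal(C_gather, C_loop), torch.equal(C_index, C_loop))
```

Both vectorized versions keep A's float dtype, whereas the loop in the question writes into torch.zeros_like(B), which is an integer tensor.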
I am familiarizing myself with the PyTorch unfold method from https://pytorch.org/docs/stable/tensors.html#torch.Tensor.unfold
I looked at their example which is
>>> x = torch.arange(1., 8)
>>> x
tensor([ 1., 2., 3., 4., 5., 6., 7.])
>>> x.unfold(0, 2, 1)
tensor([[ 1., 2.],
[ 2., 3.],
[ 3., 4.],
[ 4., 5.],
[ 5., 6.],
[ 6., 7.]])
I understand above that when we unfold in dimension 0, we take chunks of size 2 at a time with stride 1 and therefore, the result is an arrangement of different chunks, which are [1., 2.], [2., 3.] and so on. As we have 6 chunks at the end, the chunks will be put together and the final shape is (6,2).
However, I have another example I ran as shown below.
In [115]: s = torch.arange(20).view(1,10,2)
In [116]: s
Out[116]:
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]]])
In [117]: s.unfold(0,1,1)
Out[117]:
tensor([[[[ 0],
[ 1]],
[[ 2],
[ 3]],
[[ 4],
[ 5]],
[[ 6],
[ 7]],
[[ 8],
[ 9]],
[[10],
[11]],
[[12],
[13]],
[[14],
[15]],
[[16],
[17]],
[[18],
[19]]]])
In [119]: s.unfold(0,1,1).shape
Out[119]: torch.Size([1, 10, 2, 1])
So my original tensor had shape (1,10,2), and I asked for the unfolding s.unfold(0, 1, 1).
Going by my understanding of the previous example, I assumed this means that along dimension 0 we take chunks of size 1 with stride 1. Along dimension 0 there is only one chunk, of size (10,2). So I expected the output to take this chunk and perhaps add a dimension to wrap it, giving an output of size (1, 10, 2).
However, the output has size (1, 10, 2, 1). Why is there an extra dimension at the end? Can someone please explain this intuitively?
The documentation states:
An additional dimension of size size is appended in the returned tensor.
where size is the size of the chunks you specified (second argument). By definition, it always adds an additional dimension, which makes it consistent no matter what size you choose. Just because a dimension has size 1, doesn't mean it should be omitted automatically.
Regarding the intuition behind this, let's consider that instead of returning a tensor where the last dimension represents the chunks, we create a list of all chunks. For simplicity, we'll limit it to the first dimension with a step of 1.
import torch
from typing import List
def list_chunks(tensor: torch.Tensor, size: int) -> List[torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    return chunks
x = torch.arange(1.0, 8)
s = torch.arange(20).view(1, 10, 2)
# As expected, a list with 6 elements, as there are 6 chunks.
list_chunks(x, 2)
# => [tensor([1., 2.]),
# tensor([2., 3.]),
# tensor([3., 4.]),
# tensor([4., 5.]),
# tensor([5., 6.]),
# tensor([6., 7.])]
# The list has only a single element, as there is only a single chunk.
# But it's still a list.
list_chunks(s, 1)
# => [tensor([[[ 0, 1],
# [ 2, 3],
# [ 4, 5],
# [ 6, 7],
# [ 8, 9],
# [10, 11],
# [12, 13],
# [14, 15],
# [16, 17],
# [18, 19]]])]
I've deliberately included type annotations to make it clearer what we are expecting from the function. If there is only a single chunk, it will be a list with one element, as it is always a list of chunks.
You were expecting different behaviour: when there is only a single chunk, you want the chunk itself instead of a list. That would change the implementation as follows.
from typing import List, Union
def list_chunks(tensor: torch.Tensor, size: int) -> Union[List[torch.Tensor], torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    # If it's a single chunk, return just the chunk itself
    if len(chunks) == 1:
        return chunks[0]
    else:
        return chunks
With that change, anyone who uses this function needs to handle two cases. If you don't distinguish between a list and a single chunk (a tensor), you will get unexpected results; for example, looping over the chunks would instead loop over the first dimension of the tensor.
The programmatically intuitive approach is to always return a list of chunks, and Tensor.unfold does the same, except that instead of a list it returns a tensor: the unfolded dimension indexes the chunks, and the appended last dimension, of size size, holds the elements of each chunk.
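To connect the list picture to the tensor that Tensor.unfold returns, here is a rough sketch for dimension 0 only. The helper unfold_via_chunks is hypothetical (it is not a PyTorch function): it stacks the chunks and then moves each chunk's length axis to the end, which is where unfold appends the extra dimension.

```python
import torch

def unfold_via_chunks(tensor: torch.Tensor, size: int, step: int = 1) -> torch.Tensor:
    # Hypothetical helper: collect sliding chunks along dim 0, like list_chunks.
    chunks = [tensor[i : i + size]
              for i in range(0, tensor.size(0) - size + 1, step)]
    # stack gives (n_chunks, size, ...); move the chunk-length axis to the end
    # to get (n_chunks, ..., size).
    return torch.stack(chunks).movedim(1, -1)

x = torch.arange(1.0, 8)
s = torch.arange(20).view(1, 10, 2)

print(unfold_via_chunks(x, 2).shape)  # 6 chunks of size 2
print(unfold_via_chunks(s, 1).shape)  # 1 chunk, trailing dimension of size 1
print(torch.equal(unfold_via_chunks(s, 1), s.unfold(0, 1, 1)))
```

For s there is a single chunk, so dimension 0 has size 1, and the chunk size is also 1, so the appended last dimension has size 1 as well. That is how (1, 10, 2) becomes (1, 10, 2, 1).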
I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that
x = [[0,4,1], [3,2,4]]
dW = np.zeros((5,6))
dout = [[[ 1.19034710e-01 -4.65005990e-01 8.93743168e-01 -9.78047129e-01
-8.88672957e-01 -4.66605091e-01]
[ -1.38617461e-03 -2.64569728e-01 -3.83712733e-01 -2.61360826e-01
8.07072009e-01 -5.47607277e-01]
[ -3.97087458e-01 -4.25187949e-02 2.57931759e-01 7.49565950e-01
1.37707667e+00 1.77392240e+00]]
[[ -1.20692745e+00 -8.28111550e-01 6.53041092e-01 -2.31247762e+00
-1.72370321e+00 2.44308033e+00]
[ -1.45191870e+00 -3.49328154e-01 6.15445782e-01 -2.84190582e-01
4.85997687e-02 4.81590106e-01]
[ -1.14828583e+00 -9.69055406e-01 -1.00773809e+00 3.63553835e-01
-1.28078363e+00 -2.54448436e+00]]]
The operation they do is
np.add.at(dW, x, dout)
x is a two dimensional array. How does indexing work here? I went through np.ufunc.at documentation but they have simple examples with 1d array and constant:
np.add.at(a, [0, 1, 2, 2], 1)
In [226]: x = [[0,4,1], [3,2,4]]
...: dW = np.zeros((5,6),int)
In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]:
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0]])
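Here x, a list of two lists, is treated as a pair of index arrays (rows, then columns); on recent NumPy versions you may need to pass tuple(x) to get that behaviour. With this x there aren't any duplicate entries, so add.at is the same as using += indexing. Equivalently we can read the changed values with: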
In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
The indices work the same either way, including broadcasting:
In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
possible values
The values have to be broadcastable, with respect to the indexes:
In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])
In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])
In [118]: dW
Out[118]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 3, 0, 9, 0],
[ 0, 0, 4, 0, 11, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
In this case the indices define a (2,3) shape, so (2,3),(3,), (2,1), and scalar values work. (6,) does not.
In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
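The broadcasting rules above can be replayed in a short script. This is only a sketch: the indices are written as an explicit tuple of arrays (the names rows, cols, idx and dW_buf are introduced here for illustration), and the last part contrasts add.at with buffered += on the repeated column.

```python
import numpy as np

rows = np.array([[1], [2]])  # shape (2, 1)
cols = np.array([2, 4, 4])   # shape (3,), column 4 appears twice
idx = (rows, cols)           # broadcasts to a (2, 3) set of positions

dW = np.zeros((5, 6), dtype=int)
np.add.at(dW, idx, 1)           # a scalar broadcasts to (2, 3)
np.add.at(dW, idx, [1, 2, 3])   # (3,) is repeated for both rows
np.add.at(dW, idx, [[1], [2]])  # (2, 1) is repeated across the columns
print(dW[1:3])

# Buffered in-place addition counts the repeated column only once.
dW_buf = np.zeros((5, 6), dtype=int)
dW_buf[idx] += 1
print(dW_buf[1:3])
```

Column 4 receives two contributions per call with add.at, which is why it ends up larger than column 2, while the buffered version increments it only once.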
I recently had a hard time understanding this line of code too. I hope what I worked out helps; correct me if I am wrong.
The three arrays in this line of code are:
x, with shape (N,T)
dW, with shape (V,D)
dout, with shape (N,T,D)
Now for the line we want to understand:
np.add.at(dW, x, dout)
If you don't care about the reasoning, the line above is equivalent to:
for row in range(N):
    for col in range(T):
        dW[x[row, col], :] += dout[row, col, :]
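The claimed equivalence is easy to check numerically. The sketch below uses small sizes and random data chosen purely for illustration (they are not taken from the assignment).

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, V, D = 2, 3, 5, 4                # illustrative sizes only
x = rng.integers(V, size=(N, T))       # indices in [0, V)
dout = rng.standard_normal((N, T, D))  # stand-in for the upstream gradient

# Unbuffered scatter-add in one call.
dW_at = np.zeros((V, D))
np.add.at(dW_at, x, dout)

# The explicit double loop from the answer.
dW_loop = np.zeros((V, D))
for row in range(N):
    for col in range(T):
        dW_loop[x[row, col], :] += dout[row, col, :]

print(np.allclose(dW_at, dW_loop))
```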
Here is the reasoning.
Referring to the documentation:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html
we see that x is the index array, so the key is to understand dW[x].
This is indexing one array (dW) with another array (x). If you are not familiar with this concept, see
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.
dW[x] gives an array of shape (N,T,D): the (N,T) part comes from x, and the (D) part comes from dW, whose shape is (V,D). Note that every element of x must lie in the range [0, V).
Let's take some concrete numbers as an example:
x: np.array([[0,0],[0,0]]) ---- (2,2) N=2, T=2
dW: np.array([[0,0],[2,2]]) ---- (2,2) V=2, D=2
dout: np.arange(1,9).reshape(2,2,2) ----(2,2,2) N=2, T=2, D=2
dW[x] should be [ [[0 0] # each of these rows is dW's first row
[0 0]]
[[0 0]
[0 0]] ]
Adding dout to dW[x] adds the corresponding elements, but there is a trick when indices repeat, which is explained below.
np.add.at(dW, x, dout) gives
[ [16 20]
[ 2 2] ]
Why? The procedure is:
It adds [1,2] = dout[0,0,:] to the first row of dW, which is [0,0].
Why the first row? Because x[0,0] = 0 selects the first row of dW: dW[0] = dW[0,:].
Then it adds [3,4] = dout[0,1,:] to the first row of dW.
Again the first row, because x[0,1] = 0 still selects dW[0].
Then it adds [5,6] to the first row of dW.
Then it adds [7,8] to the first row of dW.
So the first row becomes [1+3+5+7, 2+4+6+8] = [16,20]. The second row of dW is never touched, so it remains unchanged.
The trick is that the original row is counted only once: there is no buffer, and every addition is applied in place to the result of the previous one.
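The "no buffer" point can be seen by running the same numbers through np.add.at and through ordinary += indexing. This is a small sketch; dW_buf is a name introduced here for the buffered copy.

```python
import numpy as np

x = np.array([[0, 0], [0, 0]])
dout = np.arange(1, 9).reshape(2, 2, 2)

dW = np.array([[0, 0], [2, 2]])
np.add.at(dW, x, dout)  # unbuffered: all four additions land on row 0
print(dW)               # row 0 becomes [16, 20], row 1 stays [2, 2]

dW_buf = np.array([[0, 0], [2, 2]])
dW_buf[x] += dout       # buffered: row 0 is overwritten four times, not summed
print(dW_buf)
```

With +=, NumPy first computes dW_buf[x] + dout from the original row and then writes the results back, so row 0 ends up holding just one of the rows of dout instead of their sum.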
Let's consider an example based on this assignment from cs231n. When several dimensions are involved, it's much easier to work with a concrete setting.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))
dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))
x
array([[5, 3, 4],
[0, 1, 3]])
dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]],
[[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
Which rows of dW_man should receive which rows of dout? Well, [0, 1, ...] should be added to row 5 and [6, 7, ...] to row 3. [30, 31, ...] should also be added to row 3. So let's compute it manually. See more examples and explanation in this GitHub gist: link.
dW_man[5] += dout[0, 0]
dW_man[3] += dout[0, 1]
dW_man[4] += dout[0, 2]
dW_man[0] += dout[1, 0]
dW_man[1] += dout[1, 1]
dW_man[3] += dout[1, 2]
dW_man
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Now let's use np.add.at.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)
dW
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Is there a way to threshold the values in a Theano tensor? For instance, if v = t.vector(), I would like to create another tensor w which contains the same values as v, except that the ones that exceed a certain threshold T are replaced by T itself:
v = [1, 2, 3, 100, 200, 300]
T = 100
w = [1, 2, 3, 100, 100, 100]
More generally, is there a standard framework for creating your own operations on tensors?
Here is code that does that, using the clip function. Note that clip(v, 0, 100) also raises any value below 0 to 0.
import theano
v = theano.tensor.vector()
f = theano.function([v], theano.tensor.clip(v, 0, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
If you don't want a lower bound, you can use switch instead:
f = theano.function([v], theano.tensor.switch(v<100, v, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])