I have 2 tensors:
A with shape (batch, sequence, vocab)
and B with shape (batch, sequence).
A = torch.tensor([[[ 1., 2., 3.],
[ 5., 6., 7.]],
[[ 9., 10., 11.],
[13., 14., 15.]]])
B = torch.tensor([[0, 2],
[1, 0]])
I want to get the following:
C = torch.zeros_like(B)
for i in range(B.shape[0]):
for j in range(B.shape[1]):
C[i,j] = A[i,j,B[i,j]]
But in a vectorized way. I tried torch.gather and other stuff but I cannot make it work.
Can anyone please help me?
>>> import torch
>>> A = torch.tensor([[[ 1., 2., 3.],
... [ 5., 6., 7.]],
...
... [[ 9., 10., 11.],
... [13., 14., 15.]]])
>>> B = torch.tensor([[0, 2],
... [1, 0]])
>>> A.shape
torch.Size([2, 2, 3])
>>> B.shape
torch.Size([2, 2])
>>> C = torch.zeros_like(B)
>>> for i in range(B.shape[0]):
... for j in range(B.shape[1]):
... C[i,j] = A[i,j,B[i,j]]
...
>>> C
tensor([[ 1, 7],
[10, 13]])
>>> torch.gather(A, -1, B.unsqueeze(-1))
tensor([[[ 1.],
[ 7.]],
[[10.],
[13.]]])
>>> torch.gather(A, -1, B.unsqueeze(-1)).shape
torch.Size([2, 2, 1])
>>> torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)
tensor([[ 1., 7.],
[10., 13.]])
Hi, you can use torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1).
The first -1, passed between A and B.unsqueeze(-1), is the dimension along which the elements are picked (here the last one, vocab).
The second -1, in B.unsqueeze(-1), adds a trailing dimension to B so that both tensors have the same number of dimensions; otherwise you get RuntimeError: Index tensor must have the same number of dimensions as input tensor.
The last -1, in squeeze(-1), drops that trailing dimension again, turning the result from torch.Size([2, 2, 1]) into torch.Size([2, 2]).
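For comparison, the same selection can also be written with integer-array (advanced) indexing instead of torch.gather. This is only a sketch on the tensors from the question; the helper names i and j are my own and not part of the answer above.

```python
import torch

A = torch.tensor([[[1., 2., 3.], [5., 6., 7.]],
                  [[9., 10., 11.], [13., 14., 15.]]])
B = torch.tensor([[0, 2], [1, 0]])

# The approach from the answer: gather along the last (vocab) dimension.
via_gather = torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)

# Alternative sketch: build broadcastable batch and sequence indices,
# so that via_indexing[b, s] == A[b, s, B[b, s]].
i = torch.arange(A.shape[0]).unsqueeze(1)  # shape (batch, 1)
j = torch.arange(A.shape[1]).unsqueeze(0)  # shape (1, sequence)
via_indexing = A[i, j, B]

print(via_indexing)
```

Both variants keep the float dtype of A, unlike the loop in the question, which writes into an integer tensor created with zeros_like(B).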
I have two tensors: inputs_tokens is a batch of token ids of size 20x300,
and seq_A is my model output of size [20, 300, 512] (a 512-dimensional vector for each token in the batch)
seq_A.size()
Out[1]: torch.Size([20, 300, 512])
inputs_tokens.size()
torch.Size([20, 300])
I would like to get only the vectors of the token 101 (CLS) as follows:
cls_tokens = (inputs_tokens == 101)
cls_tokens
Out[4]:
tensor([[ True, False, False, ..., False, False, False],
[ True, False, False, ..., False, False, False],
[ True, False, False, ..., False, False, False], ...
How do I slice seq_A to get only the vectors which are true in cls_tokens for each batch?
When I do
seq_A[cls_tokens].size()
Out[7]: torch.Size([278, 512])
I get a flattened result, but I still need it to be of size [20 x N x 512] (otherwise I don't know which sample each vector belongs to).
TL;DR: you can't; all sequences must have the same size along a given axis.
Take this simplified example:
>>> inputs_tokens = torch.tensor([[ 1, 101, 18, 101, 9],
[ 1, 2, 101, 101, 101]])
>>> inputs_tokens.shape
torch.Size([2, 5])
>>> cls_tokens = inputs_tokens == 101
tensor([[False, True, False, True, False],
[False, False, True, True, True]])
Indexing a tensor with the cls_tokens mask reduces it to the positions where cls_tokens is True. In the general case, where the number of True values differs between batch elements, keeping the batch shape is impossible.
Following the above example, here is seq_A:
>>> seq_A = torch.rand(2, 5, 1)
tensor([[[0.4644],
[0.7656],
[0.3951],
[0.6384],
[0.1090]],
[[0.6754],
[0.0144],
[0.7154],
[0.5805],
[0.5274]]])
According to your example, you would expect an output shape of (2, N, 1). What would N be? 3? What about the first batch element, which only has 2 true values? The resulting tensor can't have different sizes (2 and 3 on axis=1). Hence: "all sequences on axis=1 must have the same size".
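If the only concern is knowing which sample each selected vector belongs to, the flattened result can be paired with the batch index of every True entry. The sketch below assumes the simplified tokens from above; the hidden size of 4 and the names batch_idx and pos_idx are chosen for illustration only.

```python
import torch

inputs_tokens = torch.tensor([[1, 101, 18, 101, 9],
                              [1, 2, 101, 101, 101]])
seq_A = torch.rand(2, 5, 4)  # hidden size 4 is an arbitrary choice here
cls_tokens = inputs_tokens == 101

# Flattened selection: one row per True entry in the mask.
selected = seq_A[cls_tokens]  # shape (5, 4)

# Coordinates of the True entries, in the same row-major order as `selected`.
batch_idx, pos_idx = cls_tokens.nonzero(as_tuple=True)

for row, (b, p) in enumerate(zip(batch_idx.tolist(), pos_idx.tolist())):
    print(f"selected[{row}] comes from sample {b}, position {p}")
```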
If, however, you expect each batch element to contain the same number of 101 tokens, you can get away with reshaping your indexed tensor:
>>> inputs_tokens = torch.tensor([[ 1, 101, 101, 101, 9],
[ 1, 2, 101, 101, 101]])
>>> cls_tokens = inputs_tokens == 101
>>> N = int(cls_tokens[0].sum())
>>> N
3
Here remember, I'm assuming you have:
>>> assert all(cls_tokens.sum(axis=1) == N)
Therefore the desired output (with shape (2, 3, 1)) is:
>>> seq_A[cls_tokens].reshape(seq_A.size(0), N, -1)
tensor([[[0.7656],
[0.3951],
[0.6384]],
[[0.7154],
[0.5805],
[0.5274]]])
Edit: if the counts differ and you still want the per-sample vectors, you need a list comprehension (shown here with the mask from the first example):
>>> [seq_A[i, cls_tokens[i]] for i in range(cls_tokens.size(0))]
[ tensor([[0.7656],
[0.6384]]),
tensor([[0.7154],
[0.5805],
[0.5274]]) ]
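If a single tensor is still needed when the counts differ, one option is to zero-pad the per-sample results to the longest one and keep the true lengths alongside. This is a sketch of that idea, not something the answer above proposes; it uses torch.nn.utils.rnn.pad_sequence, and the hidden size of 4 is again arbitrary.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

inputs_tokens = torch.tensor([[1, 101, 18, 101, 9],
                              [1, 2, 101, 101, 101]])
seq_A = torch.rand(2, 5, 4)  # hidden size 4 is an arbitrary choice here
cls_tokens = inputs_tokens == 101

# One (n_i, 4) tensor per sample, as in the list comprehension above.
per_sample = [seq_A[i, cls_tokens[i]] for i in range(cls_tokens.size(0))]

# Pad with zeros to the largest n_i: shape (batch, max_n, 4).
padded = pad_sequence(per_sample, batch_first=True)

# Number of real (non-padding) vectors in each sample.
lengths = cls_tokens.sum(dim=1)

print(padded.shape, lengths)
```

The padding rows carry no information, so downstream code has to use lengths (or an equivalent mask) to ignore them.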
I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that
x = [[0,4,1], [3,2,4]]
dW = np.zeros((5, 6))
dout = [[[ 1.19034710e-01 -4.65005990e-01 8.93743168e-01 -9.78047129e-01
-8.88672957e-01 -4.66605091e-01]
[ -1.38617461e-03 -2.64569728e-01 -3.83712733e-01 -2.61360826e-01
8.07072009e-01 -5.47607277e-01]
[ -3.97087458e-01 -4.25187949e-02 2.57931759e-01 7.49565950e-01
1.37707667e+00 1.77392240e+00]]
[[ -1.20692745e+00 -8.28111550e-01 6.53041092e-01 -2.31247762e+00
-1.72370321e+00 2.44308033e+00]
[ -1.45191870e+00 -3.49328154e-01 6.15445782e-01 -2.84190582e-01
4.85997687e-02 4.81590106e-01]
[ -1.14828583e+00 -9.69055406e-01 -1.00773809e+00 3.63553835e-01
-1.28078363e+00 -2.54448436e+00]]]
The operation they do is
np.add.at(dW, x, dout)
x is a two-dimensional array. How does indexing work here? I went through the np.ufunc.at documentation, but it only has simple examples with a 1-D array and a constant:
np.add.at(a, [0, 1, 2, 2], 1)
In [226]: x = [[0,4,1], [3,2,4]]
...: dW = np.zeros((5,6),int)
In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]:
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0]])
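With this x there aren't any duplicate entries, so add.at is the same as using += indexing. (Here x, a list of two lists, is interpreted as a tuple of row and column index arrays; recent NumPy releases treat a list of lists as a single index array instead, so pass tuple(x) there to get this result.) Equivalently we can read the changed values with: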
In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
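The difference only shows up once an index repeats. A minimal sketch, reusing the [0, 1, 2, 2] index from the documentation example quoted in the question (the array names are mine):

```python
import numpy as np

idx = np.array([0, 1, 2, 2])  # index 2 appears twice

# Buffered: a[idx] is read once, incremented, then written back,
# so the repeated index is only incremented once.
buffered = np.zeros(4, dtype=int)
buffered[idx] += 1

# Unbuffered: add.at applies the addition once per index entry.
unbuffered = np.zeros(4, dtype=int)
np.add.at(unbuffered, idx, 1)

print(buffered)    # [1 1 1 0]
print(unbuffered)  # [1 1 2 0]
```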
The indices work the same either way, including broadcasting:
In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
possible values
The values have to be broadcastable, with respect to the indexes:
In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])
In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])
In [118]: dW
Out[118]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 3, 0, 9, 0],
[ 0, 0, 4, 0, 11, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
In this case the indices define a (2,3) shape, so (2,3),(3,), (2,1), and scalar values work. (6,) does not.
In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
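The session above can be replayed with the index written as an explicit tuple of arrays, which avoids any ambiguity about how a nested list is interpreted. This is a sketch starting from a zeroed dW; the integer dtype of the ones array is my choice, to keep the values the same type as dW.

```python
import numpy as np

rows = np.array([[1], [2]])  # shape (2, 1)
cols = np.array([2, 4, 4])   # shape (3,), column 4 is repeated
index = (rows, cols)         # one index array per axis of dW

# The two index arrays broadcast against each other to shape (2, 3).
index_shape = np.broadcast(rows, cols).shape

dW = np.zeros((5, 6), dtype=int)
for values in (np.ones((2, 3), dtype=int), [1, 2, 3], [[1], [2]]):
    np.add.at(dW, index, values)  # shapes (2, 3), (3,), (2, 1) all broadcast

# A flat (6,) array does not broadcast to (2, 3) and is rejected.
try:
    np.add.at(dW.copy(), index, np.arange(6))
    flat_ok = True
except ValueError:
    flat_ok = False

print(index_shape, flat_ok)
print(dW)
```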
I recently had a hard time understanding this line of code as well. I hope what I worked out helps; correct me if I am wrong.
The three arrays in this line of code are the following:
x, whose shape is (N,T)
dW, whose shape is (V,D)
dout, whose shape is (N,T,D)
Then we come to the line of code we want to figure out:
np.add.at(dW, x, dout)
If you don't care about the reasoning, the above code is equivalent to:
for row in range(N):
for col in range(T):
dW[ x[row,col] , :] += dout[row,col, :]
Here is the reasoning:
Referring to this doc
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html
We know that x is the index array. So the key is to understand dW[x].
This is the concept of indexing an array (dW) using another array (x). If you are not familiar with this concept, check out this link
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.
dW[x] will give us an array whose shape is (N,T,D): the (N,T) part comes from x, and the (D) part comes from dW, whose shape is (V,D). Note that every element of x lies in the range [0, V).
Let's take some numbers as a concrete example
x: np.array([[0,0],[0,0]]) ---- (2,2) N=2, T=2
dW: np.array([[0,0],[2,2]]) ---- (2,2) V=2, D=2
dout: np.arange(1,9).reshape(2,2,2) ----(2,2,2) N=2, T=2, D=2
dW[x] should be [ [[0 0] #this comes from dW's first row
[0 0]]
[[0 0]
[0 0]] ]
Adding dout to dW[x] means adding the items element-wise (there is a trick here, explained below).
np.add.at(dW, x, dout) gives
[ [16 20]
[ 2 2] ]
Why? The procedure is:
It adds [1,2] = dout[0,0,:] to the first row of dW, which is [0,0].
Why the first row? Because x[0,0] = 0 selects the first row of dW: dW[0] = dW[0,:].
Then it adds [3,4] = dout[0,1,:] to the first row of dW.
The first row again, because x[0,1] = 0 still selects dW[0].
Then it adds [5,6] to the first row of dW.
Then it adds [7,8] to the first row of dW.
So the first row becomes [1+3+5+7, 2+4+6+8] = [16,20]. We never touch the second row of dW, so it remains unchanged.
The trick is that the original row is only counted once: think of it as having no buffer, so every step updates dW in place and later additions see the earlier ones.
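The walkthrough above can be checked directly by running both np.add.at and the explicit double loop on the same small arrays. A minimal sketch (the names dW_at and dW_loop are mine):

```python
import numpy as np

x = np.array([[0, 0], [0, 0]])           # (N, T) = (2, 2), every entry selects row 0
dout = np.arange(1, 9).reshape(2, 2, 2)  # (N, T, D) = (2, 2, 2)

# Unbuffered accumulation with add.at.
dW_at = np.array([[0, 0], [2, 2]])       # (V, D) = (2, 2)
np.add.at(dW_at, x, dout)

# The equivalent explicit loop.
dW_loop = np.array([[0, 0], [2, 2]])
N, T = x.shape
for row in range(N):
    for col in range(T):
        dW_loop[x[row, col], :] += dout[row, col, :]

print(dW_at)  # first row is [16 20], second row stays [2 2]
```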
Let's consider an example based on this assignment from cs231n. With multiple dimensions involved, it's much easier to use a concrete setting.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))
dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))
x
array([[5, 3, 4],
[0, 1, 3]])
dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]],
[[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
Which rows of dW_man should receive which rows of dout? Well, [0, 1, ...] should be added to row 5 and [6, 7, ...] to row 3. And [30, 31, ...] should also be added to row 3. So let's compute it manually.
dW_man[5] += dout[0, 0]
dW_man[3] += dout[0, 1]
dW_man[4] += dout[0, 2]
dW_man[0] += dout[1, 0]
dW_man[1] += dout[1, 1]
dW_man[3] += dout[1, 2]
dW_man
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Now let's use np.add.at.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)
dW
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
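Another way to see what np.add.at computes here is as a one-hot matrix product: each entry of x selects one row of dW, so summing the matching rows of dout is the same as multiplying a one-hot encoding of x (transposed) with the flattened dout. The sketch below hard-codes the x shown above instead of relying on the random seed; one_hot and dW_matmul are names I introduce for illustration.

```python
import numpy as np

N, T, V, D = 2, 3, 7, 6
x = np.array([[5, 3, 4],
              [0, 1, 3]])                    # same values as the seeded x above
dout = np.arange(N * T * D).reshape(N, T, D)

# Reference result with add.at.
dW = np.zeros((V, D))
np.add.at(dW, x, dout)

# One-hot formulation: row k of one_hot has a 1 in column x.ravel()[k].
one_hot = np.eye(V)[x.ravel()]                    # shape (N*T, V)
dW_matmul = one_hot.T @ dout.reshape(N * T, D)    # shape (V, D)

print(np.array_equal(dW, dW_matmul))
```

The matrix product materialises an (N*T, V) array, so it is mainly useful as a way of understanding the operation rather than as a replacement for add.at.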
Suppose I have a list of variables as follows:
v = [('d',0),('i',0),('g',0)]
What I want is a vector of booleans indicating whether each variable is present in another list.
So, if I have another list, say
g = [('g',0)]
The output of that should be
op(v,g) = [False, False, True]
P.S.
I have tried using np.in1d but it gives the following:
array([False, True, False, True, True, True], dtype=bool)
In Python you can use a list comprehension like the following:
>>> v=[('d', 0), ('i', 0), ('g', 0)]
>>> g=[('t', 0), ('g', 0),('d',0)]
>>> [i in g for i in v]
[True, False, True]
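For longer lists, the same comprehension can test membership against a set, since the tuples are hashable. This is a small sketch of that variation, wrapped in a function named op after the notation in the question; the function itself is my own illustration.

```python
def op(v, g):
    """Return, for each item of v, whether it also occurs in g."""
    g_set = set(g)  # set membership tests avoid rescanning g for every item
    return [item in g_set for item in v]


v = [('d', 0), ('i', 0), ('g', 0)]
print(op(v, [('g', 0)]))                      # [False, False, True]
print(op(v, [('t', 0), ('g', 0), ('d', 0)]))  # [True, False, True]
```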
You can convert those lists to numpy arrays and then use np.in1d like so -
import numpy as np
# Convert to numpy arrays
v_arr = np.array(v)
g_arr = np.array(g)
# Slice the first & second columns to get string & numeric parts.
# Use in1d to get matches between first columns of those two arrays;
# repeat for the second columns.
string_part = np.in1d(v_arr[:,0],g_arr[:,0])
numeric_part = np.in1d(v_arr[:,1],g_arr[:,1])
# Perform boolean AND to get the final boolean output
out = string_part & numeric_part
Sample run -
In [157]: v_arr
Out[157]:
array([['d', '0'],
['i', '0'],
['g', '0']],
dtype='<U1')
In [158]: g_arr
Out[158]:
array([['g', '1']],
dtype='<U1')
In [159]: string_part = np.in1d(v_arr[:,0],g_arr[:,0])
In [160]: string_part
Out[160]: array([False, False, True], dtype=bool)
In [161]: numeric_part = np.in1d(v_arr[:,1],g_arr[:,1])
In [162]: numeric_part
Out[162]: array([False, False, False], dtype=bool)
In [163]: string_part & numeric_part
Out[163]: array([False, False, False], dtype=bool)
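One caveat with matching the columns independently: a row of v can be reported as present when its string matches one tuple of g and its number matches a different one. The sketch below shows such a case and a row-wise comparison that avoids it. It uses np.isin, the newer spelling of np.in1d; the example data and the names columnwise and rowwise are mine.

```python
import numpy as np

v_arr = np.array([('d', 0), ('i', 0)])
g_arr = np.array([('d', 1), ('g', 0)])  # neither ('d', 0) nor ('i', 0) is in here

# Column-by-column matching, as in the approach above.
columnwise = np.isin(v_arr[:, 0], g_arr[:, 0]) & np.isin(v_arr[:, 1], g_arr[:, 1])

# Row-wise matching: compare every row of v with every row of g,
# require all columns to agree, then ask whether any row of g matched.
rowwise = (v_arr[:, None, :] == g_arr[None, :, :]).all(axis=-1).any(axis=-1)

print(columnwise)  # 'd' and '0' each occur somewhere in g, so the first entry is True
print(rowwise)     # no complete row of v occurs in g
```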
Is there a way to threshold the values in a Theano tensor? For instance, if v = t.vector(), I would like to create another tensor w which contains the same values as v, except that the ones that exceed a certain threshold T are replaced by T itself:
v = [1, 2, 3, 100, 200, 300]
T = 100
w = [1, 2, 3, 100, 100, 100]
More generally, is there a standard framework for creating your own operations on tensors?
Here is code that does that. Use the clip function.
import theano
v = theano.tensor.vector()
f = theano.function([v], theano.tensor.clip(v, 0, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
If you don't want a minimum, you can use a switch:
f = theano.function([v], theano.tensor.switch(v<100, v, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
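The two variants only differ for inputs below the lower bound passed to clip. The sketch below illustrates this with NumPy rather than Theano, purely as an analogy: np.clip and np.where stand in for theano.tensor.clip and theano.tensor.switch, and the input values are made up.

```python
import numpy as np

v = np.array([-5., 1., 100., 300.])
T = 100

# Clipping to [0, T] also raises negative values to 0.
clipped = np.clip(v, 0, T)

# Thresholding from above only: values below T pass through unchanged.
switched = np.where(v < T, v, T)

print(clipped)   # [  0.   1. 100. 100.]
print(switched)  # [ -5.   1. 100. 100.]
```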