I want to test my neural network.
For example, given: an input tensor input, an nn.Module with some submodules module, and an output tensor output,
I want to find which indices of input affected index (1, 2) of output.
More specifically, given:
Two input matrices of size (12, 12),
The operation is matmul,
The queried index of the output matrix is (0, 0),
the expected output is:
InputMatrix1: (0,0), (0, 1), ..., (0, 11)
InputMatrix2: (0,0), (1, 0), ..., (11, 0)
A visualization would also be fine.
Is there any method or libraries that can achieve this?
This is easy. You want to look at the non-zero entries of the grad of InputMatrix1 and InputMatrix2 w.r.t. the (0,0) element of the product:
x = torch.rand((12, 12), requires_grad=True) # explicitly asking for gradient for this tensor
y = torch.rand((12, 12), requires_grad=True) # explicitly asking for gradient for this tensor
# compute the product using the @ (matmul) operator:
out = x @ y
# use back-propagation to compute the gradient w.r.t. out[0, 0]:
out[0,0].backward()
Inspecting the non-zero elements of the inputs' gradients yields, as expected:
In []: x.grad.nonzero()
tensor([[ 0, 0],
[ 0, 1],
[ 0, 2],
[ 0, 3],
[ 0, 4],
[ 0, 5],
[ 0, 6],
[ 0, 7],
[ 0, 8],
[ 0, 9],
[ 0, 10],
[ 0, 11]])
In []: y.grad.nonzero()
tensor([[ 0, 0],
[ 1, 0],
[ 2, 0],
[ 3, 0],
[ 4, 0],
[ 5, 0],
[ 6, 0],
[ 7, 0],
[ 8, 0],
[ 9, 0],
[10, 0],
[11, 0]])
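The same trick should carry over to the asker's nn.Module case. Below is a minimal sketch of my own (the nn.Linear is just a hypothetical stand-in for the real module): back-propagate from the queried output element and inspect the non-zero entries of the input's gradient.
import torch
import torch.nn as nn
# hypothetical stand-in module; the asker's real module would go here
module = nn.Linear(12, 12, bias=False)
inp = torch.rand((12, 12), requires_grad=True)
out = module(inp)
out[1, 2].backward()        # gradient w.r.t. the queried output index (1, 2)
print(inp.grad.nonzero())   # input indices with non-zero gradient, i.e. row 1
Keep in mind this is a local, first-order check: for nonlinear modules a zero gradient (e.g. through a saturated ReLU) does not always mean the input had no influence.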
I am trying to figure out how to overwrite a torch.Tensor object, located inside a dict, with a new torch.Tensor that is a bit longer due to padding.
# pad the tensor
zeros = torch.zeros(55).long()
zeros[zeros == 0] = 100 # change to padding
temp_input = torch.cat([batch['input_ids'][0][0], zeros], dim=-1) # cat
temp_input.shape # [567]
batch['input_ids'][0][0].shape # [512]
batch['input_ids'][0][0] = temp_input
# The expanded size of the tensor (512) must match the existing size (567) at non-singleton dimension 0. Target sizes: [512]. Tensor sizes: [567]
I am struggling to find a way to extend the values of a tensor in-place or to overwrite them if the dimensions change.
The dict is emitted from torch's DataLoader and looks like this:
{'input_ids': tensor([[[ 101, 3720, 2011, ..., 25786, 2135, 102]],
[[ 101, 1017, 2233, ..., 0, 0, 0]],
[[ 101, 1996, 2899, ..., 14262, 20693, 102]],
[[ 101, 2197, 2305, ..., 2000, 1996, 102]]]),
'attn_mask': tensor([[[1, 1, 1, ..., 1, 1, 1]],
[[1, 1, 1, ..., 0, 0, 0]],
[[1, 1, 1, ..., 1, 1, 1]],
[[1, 1, 1, ..., 1, 1, 1]]]),
'cats': tensor([[-0.6410, 0.1481, -2.1568, -0.6976],
[-0.4725, 0.1481, -2.1568, 0.7869],
[-0.6410, -0.9842, -2.1568, -0.6976],
[-0.6410, -0.9842, -2.1568, -0.6976]], grad_fn=<StackBackward>),
'target': tensor([[1],
[0],
[1],
[1]]),
'idx': tensor([1391, 4000, 293, 830])}
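A minimal sketch of one possible workaround (not from the original post; it assumes the goal is simply to store the padded sequences): an in-place index assignment cannot change a tensor's shape, so build a new input_ids tensor at the new length and reassign the dict entry instead of writing into batch['input_ids'][0][0].
# assumption: every sequence in the batch gets the same 55 tokens of padding
pad_id = 100                                         # padding token id from the snippet above
old = batch['input_ids']                             # shape [4, 1, 512]
pad = torch.full((old.size(0), old.size(1), 55), pad_id, dtype=old.dtype)
batch['input_ids'] = torch.cat([old, pad], dim=-1)   # shape [4, 1, 567]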
Suppose I have a 3D tensor X
X = torch.arange(24).view(4, 3, 2)
print(X)
and need to mask it using a 2D tensor
mask = torch.zeros((4, 3), dtype=torch.int64) # or dtype=torch.uint8
mask[0, 0] = 1
mask[1, 1] = 1
mask[3, 0] = 1
print('Mask: ', mask)
Using masked_select functionality from PyTorch leads to the following error.
torch.masked_select(X, (mask == 1))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-72-fd6809d2c4cc> in <module>
12
13 # Select based on new mask
---> 14 Y = torch.masked_select(X, (mask == 1))
15 #Y = X * mask_
16 print(Y)
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 2
How can I mask a 3D tensor with a 2D mask and keep the dimensions of the original tensor? Any hints will be appreciated.
Essentially, we need to match the dimensions of the mask tensor with those of the tensor being masked.
There are two ways to do it.
Approach 1: Does not preserve original tensor dimensions.
X = torch.arange(24).view(4, 3, 2)
print(X)
mask = torch.zeros((4, 3), dtype=torch.int64) # or dtype=torch.uint8
mask[0, 0] = 1
mask[1, 1] = 1
mask[3, 0] = 1
print('Mask: ', mask)
# Add a dimension to the mask tensor and expand it to the size of original tensor
mask_ = mask.unsqueeze(-1).expand(X.size())
print(mask_)
# Select based on the new expanded mask
Y = torch.masked_select(X, (mask_ == 1)) # does not preserve the dims
print(Y)
The output for approach 1:
tensor([ 0, 1, 8, 9, 18, 19])
Approach 2: Preserves the original tensor dimensions (by padding).
X = torch.arange(24).view(4, 3, 2)
print(X)
mask = torch.zeros((4, 3), dtype=torch.int64) # or dtype=torch.uint8
mask[0, 0] = 1
mask[1, 1] = 1
mask[3, 0] = 1
print('Mask: ', mask)
# Add a dimension to the mask tensor and expand it to the size of original tensor
mask_ = mask.unsqueeze(-1).expand(X.size())
print(mask_)
# Select based on the new expanded mask
Y = X * mask_
print(Y)
The output for approach 2:
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
Mask: tensor([[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
[1, 0, 0]])
tensor([[[1, 1],
[0, 0],
[0, 0]],
[[0, 0],
[1, 1],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]],
[[1, 1],
[0, 0],
[0, 0]]])
tensor([[[ 0, 1],
[ 0, 0],
[ 0, 0]],
[[ 0, 0],
[ 8, 9],
[ 0, 0]],
[[ 0, 0],
[ 0, 0],
[ 0, 0]],
[[18, 19],
[ 0, 0],
[ 0, 0]]])
There is a simple way to preserve dims as follows:
torch.mul(X, mask.unsqueeze(-1))
The result is also:
tensor([[[ 0, 1],
[ 0, 0],
[ 0, 0]],
[[ 0, 0],
[ 8, 9],
[ 0, 0]],
[[ 0, 0],
[ 0, 0],
[ 0, 0]],
[[18, 19],
[ 0, 0],
[ 0, 0]]])
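As a side note (my own addition, assuming a reasonably recent PyTorch where masked_select accepts any boolean mask that is broadcastable with the input), the explicit expand in Approach 1 can also be skipped by unsqueezing and casting the mask:
# assumes masked_select broadcasts the (4, 3, 1) bool mask against the (4, 3, 2) input
Y = torch.masked_select(X, mask.unsqueeze(-1).bool())
print(Y)  # tensor([ 0, 1, 8, 9, 18, 19]), same as Approach 1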
I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that
x = [[0,4,1], [3,2,4]]
dW = np.zeros((5, 6))
dout = [[[ 1.19034710e-01 -4.65005990e-01 8.93743168e-01 -9.78047129e-01
-8.88672957e-01 -4.66605091e-01]
[ -1.38617461e-03 -2.64569728e-01 -3.83712733e-01 -2.61360826e-01
8.07072009e-01 -5.47607277e-01]
[ -3.97087458e-01 -4.25187949e-02 2.57931759e-01 7.49565950e-01
1.37707667e+00 1.77392240e+00]]
[[ -1.20692745e+00 -8.28111550e-01 6.53041092e-01 -2.31247762e+00
-1.72370321e+00 2.44308033e+00]
[ -1.45191870e+00 -3.49328154e-01 6.15445782e-01 -2.84190582e-01
4.85997687e-02 4.81590106e-01]
[ -1.14828583e+00 -9.69055406e-01 -1.00773809e+00 3.63553835e-01
-1.28078363e+00 -2.54448436e+00]]]
The operation they do is
np.add.at(dW, x, dout)
x is a two-dimensional array. How does indexing work here? I went through the np.ufunc.at documentation, but it only has simple examples with a 1D array and a constant:
np.add.at(a, [0, 1, 2, 2], 1)
In [226]: x = [[0,4,1], [3,2,4]]
...: dW = np.zeros((5,6),int)
In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]:
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0]])
With this x there aren't any duplicate entries, so add.at is the same as using += indexing. Equivalently we can read the changed values with:
In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
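A minimal sketch (my own illustration) of that equivalence, plus the case where the two differ: with duplicate indices, plain += applies each index only once, while add.at accumulates every occurrence.
import numpy as np
x = [[0, 4, 1], [3, 2, 4]]
dW2 = np.zeros((5, 6), int)
dW2[x[0], x[1]] += 1          # same result as np.add.at here: x has no duplicate (row, col) pairs
a = np.zeros(3, int)
a[[0, 0, 1]] += 1             # duplicates counted once      -> [1, 1, 0]
np.add.at(a, [0, 0, 1], 1)    # duplicates accumulated, too  -> [3, 2, 0]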
The indices work the same either way, including broadcasting:
In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
possible values
The values have to be broadcastable with respect to the indices:
In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])
In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])
In [118]: dW
Out[118]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 3, 0, 9, 0],
[ 0, 0, 4, 0, 11, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
In this case the indices define a (2,3) shape, so (2,3), (3,), (2,1), and scalar values work; (6,) does not.
In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
I also recently had a hard time understanding this line of code. I hope what I learned helps you; correct me if I am wrong.
The three arrays in this line of code are the following:
x, whose shape is (N, T)
dW --- (V, D)
dout --- (N, T, D)
Then we come to the line of code we want to figure out:
np.add.at(dW, x, dout)
If you don't want to follow the whole thinking procedure, the above code is equivalent to:
for row in range(N):
    for col in range(T):
        dW[x[row, col], :] += dout[row, col, :]
This is the thinking procedure:
Referring to this doc:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html
we know that x is the index array, so the key is to understand dW[x].
This is the concept of indexing an array (dW) using another array (x). If you are not familiar with this concept, check out this link:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.
dW[x] will give us an array whose shape is (N, T, D); the (N, T) part comes from x, and the D comes from dW's shape (V, D). Note that every element of x is inside the range [0, V).
Let's take some numbers as a concrete example:
x: np.array([[0,0],[0,0]]) ---- (2,2), so N=2, T=2
dW: np.array([[0,0],[2,2]]) ---- (2,2), so V=2, D=2
dout: np.arange(1,9).reshape(2,2,2) ---- (2,2,2), so N=2, T=2, D=2
dW[x] should be
[ [[0 0]   # this comes from dW's first row
   [0 0]]
  [[0 0]
   [0 0]] ]
Adding dout to dW[x] means adding the elements item by item (there is a trick here, explained later).
np.add.at(dW, x, dout) gives
[ [16 20]
  [ 2  2] ]
Why? The procedure is:
It adds [1,2] to the first row of dW, which is [0,0].
Why the first row? Because x[0,0] = 0, indicating the first row of dW: dW[0] = dW[0,:] = the first row.
Then it adds [3,4] to the first row of dW, where [3,4] = dout[0,1,:].
The first row again, because x[0,1] = 0, still pointing at dW[0].
Then it adds [5,6] to the first row of dW.
Then it adds [7,8] to the first row of dW.
So the result is [1+3+5+7, 2+4+6+8] = [16,20]. Because we never touch the second row of dW, it remains unchanged at [2,2].
The trick is that the original row's value is only counted once; you can think of it as having no buffer, with every addition happening in place on the original array.
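A quick, runnable check of the example above (my own sketch; the numbers are the ones used in this answer):
import numpy as np
x = np.array([[0, 0], [0, 0]])            # (N, T) = (2, 2), every index points at row 0
dW = np.array([[0, 0], [2, 2]])           # (V, D) = (2, 2)
dout = np.arange(1, 9).reshape(2, 2, 2)   # (N, T, D) = (2, 2, 2)
np.add.at(dW, x, dout)
print(dW)   # [[16 20]
            #  [ 2  2]]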
Let's consider an example based on this assignment from cs231n. When we are talking about multiple dimensions, it's much easier to use a concrete setting.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))
dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))
x
array([[5, 3, 4],
[0, 1, 3]])
dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]],
[[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
What should the rows of dW_man[x] be? Well, [0, 1, ...] should be added to row 5, [6, 7, ...] to row 3, and [30, 31, ...] also to row 3. So let's compute it manually. See more examples and explanation in this GitHub gist: link.
dW_man[5] = dout[0, 0]
dW_man[3] = dout[0, 1]
dW_man[4] = dout[0, 2]
dW_man[0] = dout[1, 0]
dW_man[1] = dout[1, 1]
dW_man[3] = dout[1, 2]
dW_man
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[30., 31., 32., 33., 34., 35.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Now let's use np.add.at. Unlike the manual assignment above, which simply overwrote row 3 with the last value [30, ..., 35], np.add.at accumulates both contributions into row 3: [6, ..., 11] + [30, ..., 35] = [36, 38, ..., 46].
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)
dW
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
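A small sketch (my own addition) verifying this against the explicit double loop from the earlier answer, reusing N, T, V, D, x, dout, and dW from the snippet above:
dW_loop = np.zeros((V, D))
for row in range(N):
    for col in range(T):
        dW_loop[x[row, col], :] += dout[row, col, :]
print(np.array_equal(dW_loop, dW))   # True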