I'm working on cs231n and I'm having a difficult time understanding how this indexing works. Given that
x = [[0,4,1], [3,2,4]]
dW = np.zeros((5,6))
dout = [[[ 1.19034710e-01 -4.65005990e-01 8.93743168e-01 -9.78047129e-01
-8.88672957e-01 -4.66605091e-01]
[ -1.38617461e-03 -2.64569728e-01 -3.83712733e-01 -2.61360826e-01
8.07072009e-01 -5.47607277e-01]
[ -3.97087458e-01 -4.25187949e-02 2.57931759e-01 7.49565950e-01
1.37707667e+00 1.77392240e+00]]
[[ -1.20692745e+00 -8.28111550e-01 6.53041092e-01 -2.31247762e+00
-1.72370321e+00 2.44308033e+00]
[ -1.45191870e+00 -3.49328154e-01 6.15445782e-01 -2.84190582e-01
4.85997687e-02 4.81590106e-01]
[ -1.14828583e+00 -9.69055406e-01 -1.00773809e+00 3.63553835e-01
-1.28078363e+00 -2.54448436e+00]]]
The operation they do is
np.add.at(dW, x, dout)
x is a two-dimensional array. How does indexing work here? I went through the np.ufunc.at documentation, but it only has simple examples with a 1-D array and a constant:
np.add.at(a, [0, 1, 2, 2], 1)
In [226]: x = [[0,4,1], [3,2,4]]
...: dW = np.zeros((5,6),int)
In [227]: np.add.at(dW,x,1)
In [228]: dW
Out[228]:
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0]])
With this x there aren't any duplicate entries, so add.at is the same as using += indexing. Equivalently we can read the changed values with:
In [229]: dW[x[0], x[1]]
Out[229]: array([1, 1, 1])
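To see where add.at and += indexing diverge, here is a minimal sketch with a deliberately duplicated index pair. The names rows, cols, buffered and unbuffered are made up for illustration and are not from the session above. The index is passed as an explicit tuple because newer NumPy versions may interpret a plain list of lists as a single index array rather than as one list per axis.

```python
import numpy as np

# Hypothetical index arrays; the pair (0, 2) appears twice on purpose.
rows = np.array([0, 0, 1])
cols = np.array([2, 2, 3])

# Buffered: the selected values are read once, incremented, and written back,
# so the duplicated position is only incremented once.
buffered = np.zeros((2, 4), dtype=int)
buffered[rows, cols] += 1

# Unbuffered: every occurrence of an index contributes to the result.
unbuffered = np.zeros((2, 4), dtype=int)
np.add.at(unbuffered, (rows, cols), 1)

print(buffered)
print(unbuffered)
```

Without duplicates the two results are identical, which is the situation in the session above.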
The indices work the same either way, including broadcasting:
In [234]: dW[...]=0
In [235]: np.add.at(dW,[[[1],[2]],[2,4,4]],1)
In [236]: dW
Out[236]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 1, 0, 2, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
possible values
The values have to be broadcastable, with respect to the indexes:
In [112]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)))
...
In [114]: np.add.at(dW,[[[1],[2]],[2,4,4]],np.ones((2,3)).ravel())
...
ValueError: array is not broadcastable to correct shape
In [115]: np.add.at(dW,[[[1],[2]],[2,4,4]],[1,2,3])
In [117]: np.add.at(dW,[[[1],[2]],[2,4,4]],[[1],[2]])
In [118]: dW
Out[118]:
array([[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 3, 0, 9, 0],
[ 0, 0, 4, 0, 11, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
In this case the indices define a (2,3) shape, so (2,3),(3,), (2,1), and scalar values work. (6,) does not.
In this case, add.at is mapping a (2,3) array onto a (2,2) subarray of dW.
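As a rough check of the broadcasting rule, the sketch below computes the shape defined by the two index arrays and then tries the value shapes listed above. The variable names (rows, cols, accepted, flat_rejected) are illustrative assumptions, and dW starts from zeros rather than from the state of the session above.

```python
import numpy as np

rows = np.array([[1], [2]])   # shape (2, 1)
cols = np.array([2, 4, 4])    # shape (3,)

# The two index arrays broadcast against each other to shape (2, 3).
index_shape = np.broadcast(rows, cols).shape
print(index_shape)

dW = np.zeros((5, 6), dtype=int)

# A scalar and arrays of shape (3,), (2, 1) and (2, 3) all broadcast to (2, 3).
accepted = [
    1,
    np.ones(3, dtype=int),
    np.ones((2, 1), dtype=int),
    np.ones((2, 3), dtype=int),
]
for values in accepted:
    np.add.at(dW, (rows, cols), values)

# A flat (6,) array has the right number of elements but the wrong shape.
try:
    np.add.at(dW, (rows, cols), np.ones(6, dtype=int))
    flat_rejected = False
except ValueError:
    flat_rejected = True

print(dW)
print(flat_rejected)
```

Column 4 is listed twice in cols, so each accepted call adds 2 there and 1 in column 2.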
I also recently had a hard time understanding this line of code. I hope what I worked out helps; correct me if I am wrong.
The three arrays in this line of code are the following:
x, whose shape is (N,T)
dW, whose shape is (V,D)
dout, whose shape is (N,T,D)
Then we come to the line of code whose behavior we want to figure out:
np.add.at(dW, x, dout)
If you don't want to follow the reasoning, the above code is equivalent to:
for row in range(N):
    for col in range(T):
        dW[x[row, col], :] += dout[row, col, :]
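The claimed equivalence can be checked numerically. The sketch below uses small made-up sizes and random data (the sizes, the seed, and the names dW_loop and dW_at are assumptions for illustration) and compares the explicit loop with np.add.at.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, V, D = 2, 3, 4, 5          # small illustrative sizes
x = rng.integers(0, V, size=(N, T))
dout = rng.standard_normal((N, T, D))

# Explicit loop: accumulate each dout[row, col] into the row of dW chosen by x.
dW_loop = np.zeros((V, D))
for row in range(N):
    for col in range(T):
        dW_loop[x[row, col], :] += dout[row, col, :]

# Unbuffered ufunc version.
dW_at = np.zeros((V, D))
np.add.at(dW_at, x, dout)

print(np.allclose(dW_loop, dW_at))
```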
This is the reasoning:
Referring to this doc
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ufunc.at.html
we know that x is the index array. So the key is to understand dW[x].
This is the concept of indexing an array (dW) using another array (x). If you are not familiar with this concept, check out this link
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed.
dW[x] gives us an array whose shape is (N,T,D): the (N,T) part comes from x, and the (D) part comes from dW (V,D). Note that every element of x must lie in the range [0, V).
Let's take some numbers as a concrete example:
x: np.array([[0,0],[0,0]]) ---- (2,2) N=2, T=2
dW: np.array([[0,0],[2,2]]) ---- (2,2) V=2, D=2
dout: np.arange(1,9).reshape(2,2,2) ----(2,2,2) N=2, T=2, D=2
dW[x] should be [ [[0 0] # this comes from dW's first row
[0 0]]
[[0 0]
[0 0]] ]
Adding dout to dW[x] means adding element by element (there is a trick here, which is explained further down).
np.add.at(dW, x, dout) gives
[ [16 20]
[ 2 2] ]
Why? The procedure is:
It adds [1,2] = dout[0,0,:] to the first row of dW, which is [0,0].
Why the first row? Because x[0,0] = 0, which selects the first row of dW: dW[0] = dW[0,:].
Then it adds [3,4] = dout[0,1,:] to the first row of dW.
The first row again, because x[0,1] = 0 still selects dW[0].
Then it adds [5,6] to the first row of dW.
Then it adds [7,8] to the first row of dW.
So the result is [1+3+5+7, 2+4+6+8] = [16,20]. We never touch the second row of dW, so it remains unchanged.
The trick is that the original row is counted only once: there is no buffer, and every step updates dW in place.
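The walkthrough can be reproduced directly. The sketch below reuses the arrays from the example; dW_buffered is an extra name introduced here only to contrast with ordinary fancy-index addition, where the selected rows are read once before any update is written back.

```python
import numpy as np

x = np.array([[0, 0], [0, 0]])
dout = np.arange(1, 9).reshape(2, 2, 2)

# Unbuffered: all four dout rows accumulate into dW[0].
dW = np.array([[0, 0], [2, 2]])
np.add.at(dW, x, dout)
print(dW)   # rows: [16 20] and [2 2]

# Buffered fancy-index addition, for contrast: dW[x] is read once, so the
# four updates to row 0 overwrite each other instead of accumulating.
dW_buffered = np.array([[0, 0], [2, 2]])
dW_buffered[x] += dout
print(dW_buffered)
```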
Let's consider an example based on this assignment from cs231n. When several dimensions are involved, it's much easier to work with a concrete setting.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW_man = np.zeros((V, D))
dW_man[x].shape, x.shape
((2, 3, 6), (2, 3))
x
array([[5, 3, 4],
[0, 1, 3]])
dout = np.arange(2*3*6).reshape(dW_man[x].shape)
dout
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]],
[[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]]])
What should the rows of dW_man look like after the update? [0, 1, ...] should be added to row 5 and [6, 7, ...] to row 3. [30, 31, ...] should also be added to row 3. So let's compute it manually. See more examples and explanation in this GitHub gist: link.
dW_man[5] += dout[0, 0]
dW_man[3] += dout[0, 1]
dW_man[4] += dout[0, 2]
dW_man[0] += dout[1, 0]
dW_man[1] += dout[1, 1]
dW_man[3] += dout[1, 2]
dW_man
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
Now let's use np.add.at.
np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dW = np.zeros((V, D))
dout = np.arange(2*3*6).reshape(dW[x].shape)
np.add.at(dW, x, dout)
dW
array([[18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29.],
[ 0., 0., 0., 0., 0., 0.],
[36., 38., 40., 42., 44., 46.],
[12., 13., 14., 15., 16., 17.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 0., 0., 0., 0., 0.]])
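Another way to read this result, offered here as a sketch rather than as part of the assignment: np.add.at(dW, x, dout) computes the same thing as multiplying a one-hot encoding of x, transposed, with the flattened dout. The names one_hot and dW_matmul are assumptions introduced for this illustration.

```python
import numpy as np

np.random.seed(1)
N, T, V, D = 2, 3, 7, 6
x = np.random.randint(V, size=(N, T))
dout = np.arange(N * T * D, dtype=float).reshape(N, T, D)

# Scatter-add version.
dW = np.zeros((V, D))
np.add.at(dW, x, dout)

# One-hot version: row k of one_hot has a 1 in column x.ravel()[k].
one_hot = np.eye(V)[x.ravel()]                  # shape (N*T, V)
dW_matmul = one_hot.T @ dout.reshape(N * T, D)  # shape (V, D)

print(np.allclose(dW, dW_matmul))
```

The matrix product sums every dout row that shares an index, which is exactly the accumulation that add.at performs without building the (N*T, V) one-hot matrix.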
I have 2 tensors:
A with shape (batch, sequence, vocab)
and B with shape (batch, sequence).
A = torch.tensor([[[ 1., 2., 3.],
[ 5., 6., 7.]],
[[ 9., 10., 11.],
[13., 14., 15.]]])
B = torch.tensor([[0, 2],
[1, 0]])
I want to get the following:
C = torch.zeros_like(B)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        C[i,j] = A[i,j,B[i,j]]
But in a vectorized way. I tried torch.gather and other stuff but I cannot make it work.
Can anyone please help me?
>>> import torch
>>> A = torch.tensor([[[ 1., 2., 3.],
... [ 5., 6., 7.]],
...
... [[ 9., 10., 11.],
... [13., 14., 15.]]])
>>> B = torch.tensor([[0, 2],
... [1, 0]])
>>> A.shape
torch.Size([2, 2, 3])
>>> B.shape
torch.Size([2, 2])
>>> C = torch.zeros_like(B)
>>> for i in range(B.shape[0]):
...     for j in range(B.shape[1]):
...         C[i,j] = A[i,j,B[i,j]]
...
>>> C
tensor([[ 1, 7],
[10, 13]])
>>> torch.gather(A, -1, B.unsqueeze(-1))
tensor([[[ 1.],
[ 7.]],
[[10.],
[13.]]])
>>> torch.gather(A, -1, B.unsqueeze(-1)).shape
torch.Size([2, 2, 1])
>>> torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)
tensor([[ 1., 7.],
[10., 13.]])
Hi, you can use torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1).
The first -1, between A and B.unsqueeze(-1), indicates the dimension along which you want to pick the elements.
The second -1, in B.unsqueeze(-1), adds one dimension to B so that the two tensors have the same number of dimensions; otherwise you get RuntimeError: Index tensor must have the same number of dimensions as input tensor.
The last -1, in squeeze(-1), reshapes the result from torch.Size([2, 2, 1]) to torch.Size([2, 2]).
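For comparison, the same selection can be written with advanced indexing instead of torch.gather. This is an alternative sketch, not part of the answer above; the helper index tensors i and j are names introduced here.

```python
import torch

A = torch.tensor([[[1., 2., 3.], [5., 6., 7.]],
                  [[9., 10., 11.], [13., 14., 15.]]])
B = torch.tensor([[0, 2], [1, 0]])

# gather along the last dimension, as in the answer.
gathered = torch.gather(A, -1, B.unsqueeze(-1)).squeeze(-1)

# Advanced indexing: i and j enumerate the batch and sequence positions,
# and B supplies the position along the last dimension.
i = torch.arange(A.shape[0]).unsqueeze(1)   # shape (batch, 1)
j = torch.arange(A.shape[1]).unsqueeze(0)   # shape (1, sequence)
indexed = A[i, j, B]

print(gathered)
print(torch.equal(gathered, indexed))
```

Both forms keep the dtype of A, whereas the loop in the question writes into an integer tensor created with zeros_like(B).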
Suppose I have a matrix src with shape (5, 3) and a boolean matrix adj with shape (5, 5) as follow,
src = tensor([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
and
adj = tensor([[1, 0, 1, 1, 0],
[0, 1, 1, 1, 0],
[1, 1, 0, 1, 1],
[1, 1, 1, 0, 0],
[0, 0, 1, 0, 1]])
We can treat each row of src as one node embedding, and each row of adj as an indicator of which nodes belong to that node's neighborhood.
My goal is to apply max-pooling over all neighborhood node embeddings for each node in src.
For example, the neighborhood nodes (including itself) of the 0-th node are 0, 2, 3, so we compute a max-pooling over [0, 1, 2], [6, 7, 8], [ 9, 10, 11] and obtain the updated embedding [ 9, 10, 11] for the 0-th node in src_update.
A simple solution I wrote is
src_update = torch.zeros_like(src)
for index in range(adj.size(0)):
    list_of_non_zero = adj[index].nonzero().view(-1)
    mat_non_zero = torch.index_select(src, 0, list_of_non_zero)
    # max-pool (not sum) over the neighborhood rows
    src_update[index] = torch.max(mat_non_zero, dim=0).values
And src_update is updated as:
tensor([[ 9, 10, 11],
[ 9, 10, 11],
[12, 13, 14],
[ 6, 7, 8],
[12, 13, 14]])
Although it works, it runs very slowly and doesn't look elegant!
Any suggestions to make it more efficient?
In addition, if both src and adj get a batch dimension ((batch, 5, 3), (batch, 5, 5)), how can I make it work?
I was experimenting with your code:
output = torch.zeros_like(src)
for index in range(adj.size(0)):
    nz = adj[index].nonzero().view(-1)
    output[index] = src.index_select(0, nz).max(0).values
The bottleneck is of course the for loop. What first comes to mind is to use some kind of scatter function. However, the main issue here is the fact that the number of neighbors can vary from row to row. This means we will be unable to construct a tensor containing the candidate nodes before max pooling.
One possible solution is to create a helper tensor similar to src where the first node would contain placeholder values (these should not get chosen by the max-pooling, i.e. we can use -inf). We can index this tensor using a tensor containing indices: compared to your method, instead of removing the zeros with torch.nonzero(), we will place an index value of 0 (referring to the placeholder row in the first position of modified-src).
In practice, here is how it looks:
For the helper tensor src_, I placed -inf as the placeholder value (src is cast to float so that -inf is representable).
>>> src_ = torch.cat((torch.full_like(src[:1], float('-inf'), dtype=torch.float), src.float()))
tensor([[-inf, -inf, -inf],
[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.],
[ 9., 10., 11.],
[ 12., 13., 14.]])
We can convert the adj matrix into a tensor of indices:
>>> index = torch.arange(1, adj.size(1) + 1)*adj
tensor([[1, 0, 3, 4, 0],
[0, 2, 3, 4, 0],
[1, 2, 0, 4, 5],
[1, 2, 3, 0, 0],
[0, 0, 3, 0, 5]])
For easier indexing we will flatten index, index src_ on the first axis, and reshape right after:
>>> indexed = src_[index.flatten(), :].reshape(*adj.shape, 3)
tensor([[[ 0., 1., 2.],
[-inf, -inf, -inf],
[ 6., 7., 8.],
[ 9., 10., 11.],
[-inf, -inf, -inf]],
...
[[-inf, -inf, -inf],
[-inf, -inf, -inf],
[ 6., 7., 8.],
[-inf, -inf, -inf],
[ 12., 13., 14.]]])
Finally you can max-pool:
>>> indexed.max(dim=1).values
tensor([[ 9., 10., 11.],
[ 9., 10., 11.],
[12., 13., 14.],
[ 6., 7., 8.],
[12., 13., 14.]])
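A different route to the same result, swapped in here as an alternative to the index-based approach above: expand src so that every node sees all candidate rows, overwrite the non-neighbors with -inf using masked_fill, and take the max. This is a minimal sketch; the names candidates, masked and pooled are assumptions.

```python
import torch

src = torch.arange(15, dtype=torch.float32).reshape(5, 3)
adj = torch.tensor([[1, 0, 1, 1, 0],
                    [0, 1, 1, 1, 0],
                    [1, 1, 0, 1, 1],
                    [1, 1, 1, 0, 0],
                    [0, 0, 1, 0, 1]]).bool()

# candidates[i, j] is the embedding of node j, seen from node i: shape (5, 5, 3).
candidates = src.unsqueeze(0).expand(adj.size(0), -1, -1)

# Replace non-neighbors with -inf so they can never win the max.
masked = candidates.masked_fill(~adj.unsqueeze(-1), float('-inf'))

pooled = masked.max(dim=1).values
print(pooled)
```

Like the index-based version, this materializes a (nodes, nodes, features) tensor, so memory grows quadratically with the number of nodes.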
Ivan gave a pretty smart solution. The key idea is to transform mask to index. I have tested it and wrapped it up to the function below
def mask_max_pool(embeddings, mask):
    '''
    Inputs:
    ------------------
    embeddings: [B, D, E],
    mask: [B, R, D], 0s and 1s, 1 indicates membership
    Outputs:
    ------------------
    max pooled embeddings: [B, R, E], the max pooled embeddings according to the membership in mask
    max pooled index: [B, R, E], the max pooled index
    '''
    B, D, E = embeddings.shape
    _, R, _ = mask.shape
    # extend embedding with placeholder
    embeddings_ = torch.cat([-1e6*torch.ones_like(embeddings[:, :1, :]), embeddings], dim=1)
    # transform mask to index
    index = torch.arange(1, D+1).view(1, 1, -1).repeat(B, R, 1) * mask  # [B, R, D]
    # batch indices
    batch_indices = torch.arange(B).view(B, 1, 1).repeat(1, R, D)
    # retrieve embeddings by index
    indexed = embeddings_[batch_indices.flatten(), index.flatten(), :].view(B, R, D, E)  # [B, R, D, E]
    # return
    return indexed.max(dim=-2)
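A usage sketch for the batched function, applied to the src and adj from the original question with a batch dimension of 1. The function body below is a condensed restatement so that the example runs on its own (torch.full_like and expand stand in for the ones_like and repeat calls); the toy inputs and the names values and indices are assumptions.

```python
import torch

def mask_max_pool(embeddings, mask):
    # embeddings: [B, D, E]; mask: [B, R, D] with integer 0/1 entries.
    B, D, E = embeddings.shape
    _, R, _ = mask.shape
    # Prepend a placeholder row of very negative values at position 0.
    placeholder = torch.full_like(embeddings[:, :1, :], -1e6)
    embeddings_ = torch.cat([placeholder, embeddings], dim=1)        # [B, D+1, E]
    # Masked-out entries become index 0, which points at the placeholder.
    index = torch.arange(1, D + 1).view(1, 1, -1) * mask             # [B, R, D]
    batch_indices = torch.arange(B).view(B, 1, 1).expand(B, R, D)    # [B, R, D]
    indexed = embeddings_[batch_indices, index]                      # [B, R, D, E]
    return indexed.max(dim=-2)

src = torch.arange(15, dtype=torch.float32).reshape(5, 3)
adj = torch.tensor([[1, 0, 1, 1, 0],
                    [0, 1, 1, 1, 0],
                    [1, 1, 0, 1, 1],
                    [1, 1, 1, 0, 0],
                    [0, 0, 1, 0, 1]])

values, indices = mask_max_pool(src.unsqueeze(0), adj.unsqueeze(0))
print(values[0])
print(indices[0])
```

The returned indices are positions along the D axis of the mask, so for node 0 they point at node 3, whose embedding [9, 10, 11] wins the max.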
I have a complicated nested numpy array which contains lists. I am trying to convert the elements to float32. However, it gives me the following error:
ValueError Traceback (most recent call last)
<ipython-input-225-22d2824961c2> in <module>
----> 1 x_train_single.astype(np.float32)
ValueError: setting an array element with a sequence.
Here is the code and sample input:
x_train_single.astype(np.float32)
array([[ list([[[0, 0, 0, 0, 0, 0]], [-1.0], [0]]),
list([[[0, 0, 0, 0, 0, 0], [173, 8, 172, 0, 0, 0]], [-1.0], [0]])
]])
As your array contains lists of different sizes and nesting depths, I doubt that there is a simple or fast solution.
Here is a "get-the-job-done-no-matter-what" approach. It comes in two flavors. One creates arrays for leaves, the other one lists.
>>> a
array([[list([[[0, 0, 0, 0, 0, 0]], [-1.0], [0]]),
list([[[0, 0, 0, 0, 0, 0], [173, 8, 172, 0, 0, 0]], [-1.0], [0]])]],
dtype=object)
>>> def mkarr(a):
...     try:
...         return np.array(a,np.float32)
...     except:
...         return [*map(mkarr,a)]
...
>>> def mklst(a):
...     try:
...         return [*map(mklst,a)]
...     except:
...         return np.float32(a)
...
>>> np.frompyfunc(mkarr,1,1)(a)
array([[list([array([[0., 0., 0., 0., 0., 0.]], dtype=float32), array([-1.], dtype=float32), array([0.], dtype=float32)]),
list([array([[ 0., 0., 0., 0., 0., 0.],
[173., 8., 172., 0., 0., 0.]], dtype=float32), array([-1.], dtype=float32), array([0.], dtype=float32)])]],
dtype=object)
>>> np.frompyfunc(mklst,1,1)(a)
array([[list([[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], [-1.0], [0.0]]),
list([[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [173.0, 8.0, 172.0, 0.0, 0.0, 0.0]], [-1.0], [0.0]])]],
dtype=object)
If the number of columns is fixed, then
np.array([np.asarray(l, dtype=np.float32) for l in x_train_single.ravel()]).squeeze()
But it will remove the redundant dimensions and convert everything into a NumPy array.
Before: (1, 1, 1, 11, 6)
After: (11,6)
Try this:
np.array(x_train_single.tolist())
Looks like you have a (1,1) shaped array, where the single element is a list. And the sublists are consistent in size.
I expect you will get an array with shape (1, 1, 1, 11, 6) and int dtype.
or:
np.array(x_train_single[0,0])
Again this extracts the list from the array, and then makes an array from that.
My answer so far was based on the display:
array([[list([[[173, 8, 172, 0, 0, 0], [512, 58, 57, 0, 0, 0],
...: [513, 514, 0, 0, 0, 0], [515, 189, 516, 0, 0, 0], [309, 266, 0, 0, 0,
...: 0],
...: [32, 310, 0, 0, 0, 0], [271, 58, 517, 0, 0, 0], [164, 40, 0, 0, 0, 0],
...: [38, 32, 60, 0, 0, 0], [38, 83, 60, 0, 0, 0], [149, 311, 0, 0, 0, 0]]
...: ])]])
The new display is more complicated
array([[ list([[[0, 0, 0, 0, 0, 0]], [-1.0], [0]]),
...: list([[[0, 0, 0, 0, 0, 0], [173, 8, 172, 0, 0, 0]], [-1.0], [0]])]])
because the inner lists differ in size. It can't be made into a numeric dtype array.
It can be turned into a (1,2,3) shape array, but still object dtype with 1d list elements.
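The difference between the two displays can be sketched as follows. The arrays regular and ragged are small stand-ins built by hand (the first is shortened to two rows), not the asker's actual data.

```python
import numpy as np

# Regular case: a (1, 1) object array holding one evenly nested list.
regular = np.empty((1, 1), dtype=object)
regular[0, 0] = [[[173, 8, 172, 0, 0, 0], [512, 58, 57, 0, 0, 0]]]
regular_arr = np.array(regular.tolist(), dtype=np.float32)
print(regular_arr.shape, regular_arr.dtype)

# Ragged case: the inner lists differ in length and nesting depth,
# so no rectangular numeric array can be built from them.
ragged = np.empty((1, 2), dtype=object)
ragged[0, 0] = [[[0, 0, 0, 0, 0, 0]], [-1.0], [0]]
ragged[0, 1] = [[[0, 0, 0, 0, 0, 0], [173, 8, 172, 0, 0, 0]], [-1.0], [0]]
try:
    np.array(ragged.tolist(), dtype=np.float32)
    ragged_failed = False
except ValueError:
    ragged_failed = True
print(ragged_failed)
```

In the ragged case each leaf has to be converted separately, which is what the recursive helpers in the other answer do.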
I have a list holding one batch of data, with multiple labels for every sample. How can I convert it into a torch.Tensor in one-hot encoding?
For example, with batch_size=5 and class_num=6,
label =[
[1,2,3],
[4,6],
[1],
[1,4,5],
[4]
]
How can I turn it into a one-hot encoding in PyTorch?
label_tensor=tensor([
[1,1,1,0,0,0],
[0,0,0,1,0,1],
[1,0,0,0,0,0],
[1,0,0,1,1,0],
[0,0,0,1,0,0]
])
If the batch size can be derived from len(labels):
def to_onehot(labels, n_categories, dtype=torch.float32):
    batch_size = len(labels)
    one_hot_labels = torch.zeros(size=(batch_size, n_categories), dtype=dtype)
    for i, label in enumerate(labels):
        # Subtract 1 from each LongTensor because your
        # indexing starts at 1 and tensor indexing starts at 0
        label = torch.LongTensor(label) - 1
        one_hot_labels[i] = one_hot_labels[i].scatter_(dim=0, index=label, value=1.)
    return one_hot_labels
and you have 6 categories and want the output to be a tensor of integers:
to_onehot(labels, n_categories=6, dtype=torch.int64)
tensor([[1, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 1],
[1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 0, 0]])
I would stick to torch.float32 in case you want to use label smoothing, mix-up or something along those lines later.
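The per-row scatter can also be replaced by a single indexed assignment. The sketch below is a variation on the function above, not the answerer's code; to_multihot, rows and cols are hypothetical names, and labels are assumed to be 1-based as in the question.

```python
import torch

def to_multihot(labels, n_categories, dtype=torch.float32):
    # One (row, column) pair per label; labels start at 1, columns at 0.
    rows = torch.tensor([i for i, sample in enumerate(labels) for _ in sample])
    cols = torch.tensor([label - 1 for sample in labels for label in sample])
    out = torch.zeros(len(labels), n_categories, dtype=dtype)
    out[rows, cols] = 1
    return out

label = [[1, 2, 3], [4, 6], [1], [1, 4, 5], [4]]
label_tensor = to_multihot(label, n_categories=6, dtype=torch.int64)
print(label_tensor)
```

The Python-level work is limited to building the two index lists; the write into the tensor happens in one call.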
To handle any situation (including string labels) I've extended @karniol's answer:
def multihot_encoder(labels, dtype=torch.float32):
    """ Convert list of label lists into a 2-D multihot Tensor """
    label_set = set()
    for label_list in labels:
        label_set = label_set.union(set(label_list))
    label_set = sorted(label_set)
    multihot_vectors = []
    for label_list in labels:
        multihot_vectors.append([1 if x in label_list else 0 for x in label_set])
    # To keep track of which columns are which, set dtype to None
    # to get a labeled pandas DataFrame instead of a Tensor
    if dtype is None:
        import pandas as pd
        return pd.DataFrame(multihot_vectors, columns=label_set)
    return torch.Tensor(multihot_vectors).to(dtype)
Your use case:
label_lists = [[1,2,3], [4,6], [1], [1,4,5], [4]]
>>> multihot_encoder(label_lists)
tensor([[1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 1.],
[1., 0., 0., 0., 0., 0.],
[1., 0., 0., 1., 1., 0.],
[0., 0., 0., 1., 0., 0.]])
If you want to keep track of your labels (feature names) before converting your dataset to a Tensor, just set dtype to None:
label_lists = [
['happy', 'kind'], ['sad', 'mean'],
['loud', 'happy'], ['quiet', 'kind']
]
multihot_encoder(label_lists, dtype=None)
happy kind loud mean quiet sad
0 1 1 0 0 0 0
1 0 0 0 1 0 1
2 1 0 1 0 0 0
3 0 1 0 0 1 0
Is there a way to threshold the values in a Theano tensor? For instance, if v = t.vector(), I would like to create another tensor w which contains the same values as v, except that the ones that exceed a certain threshold T are replaced by T itself:
v = [1, 2, 3, 100, 200, 300]
T = 100
w = [1, 2, 3, 100, 100, 100]
More generally, is there a standard framework for creating your own operations on tensors?
Here is code that does that. Use the clip function.
import theano
v = theano.tensor.vector()
f = theano.function([v], theano.tensor.clip(v, 0, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
If you don't want a min you can use a switch:
f = theano.function([v], theano.tensor.switch(v<100, v, 100))
f([1, 2, 3, 100, 200, 300])
# array([ 1., 2., 3., 100., 100., 100.])
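The arithmetic behind both answers is an elementwise minimum with the threshold. The sketch below illustrates this with NumPy rather than Theano, purely so that it runs without a Theano installation; the names w_min, w_where and w_clip are assumptions.

```python
import numpy as np

v = np.array([1, 2, 3, 100, 200, 300], dtype=float)
T = 100

# Elementwise minimum: every value above T is replaced by T.
w_min = np.minimum(v, T)

# Same result written as a conditional, mirroring the switch version.
w_where = np.where(v < T, v, T)

# clip with no lower bound, mirroring the clip version without forcing a minimum.
w_clip = np.clip(v, None, T)

print(w_min)
```

Unlike clip(v, 0, 100), none of these three forms touches values below zero.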