how to pad a text after build the vocab in pytorch - nlp

I used torchtext vocab to convert the text to index, but which function should I use to make all the index list be the same length before I send them to the net?
For example I have 2 texts:
I am a good man
I would like a coffee please
After vocab:
[1, 3, 2, 5, 7]
[1, 9, 6, 2, 4, 8]
And what I want is:
[1, 3, 2, 5, 7, 0]
[1, 9, 6, 2, 4, 8]

It is easy to understand by looking at the following example.
Code:
import torch
v = [
[0,2],
[0,1,2],
[3,3,3,3]
]
torch.nn.utils.rnn.pad_sequence([torch.tensor(p) for p in v], batch_first=True)
Result:
tensor([[0, 2, 0, 0],
[0, 1, 2, 0],
[3, 3, 3, 3]])

Related

Select on second dimension on a 3D pytorch tensor with an array of indexes

I am kind of new with numpy and torch and I am struggling to understand what to me seems the most basic operations.
For instance, given this tensor:
A = tensor([[[6, 3, 8, 3],
[1, 0, 9, 9]],
[[4, 9, 4, 1],
[8, 1, 3, 5]],
[[9, 7, 5, 6],
[3, 7, 8, 1]]])
And this other tensor:
B = tensor([1, 0, 1])
I would like to use B as indexes for A so that I get a 3 by 4 tensor that looks like this:
[[1, 0, 9, 9],
[4, 9, 4, 1],
[3, 7, 8, 1]]
Thanks!
Ok, my mistake was to assume this:
A[:, B]
is equal to this:
A[[0, 1, 2], B]
Or more generally the solution I wanted is:
A[range(B.shape[0]), B]
Alternatively, you can use torch.gather:
>>> indexer = B.view(-1, 1, 1).expand(-1, -1, 4)
tensor([[[1, 1, 1, 1]],
[[0, 0, 0, 0]],
[[1, 1, 1, 1]]])
>>> A.gather(1, indexer).view(len(B), -1)
tensor([[1, 0, 9, 9],
[4, 9, 4, 1],
[3, 7, 8, 1]])

numpy remove column by different value in batch

I want to ask how numpy remove columns in batch by list.
The value in the list corresponds to the batch is different from each other.
I know this problem can use the for loop to solve, but it is too slow ...
Can anyone give me some idea to speed up?
array (batch size = 3):
[[0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6]]
remove index in the list (batch size = 3)
[[2, 3, 4], [1, 2, 6], [0, 1, 5]]
output:
[[0, 1, 5, 6], [0, 3, 4, 5], [2, 3, 4, 6]]
Assuming the array is 2d, and the indexing removes equal number of elements per row, we can remove items with a boolean mask:
In [289]: arr = np.array([[0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6]]
...: )
In [290]: idx = np.array([[2, 3, 4], [1, 2, 6], [0, 1, 5]])
In [291]: mask = np.ones_like(arr, dtype=bool)
In [292]: mask[np.arange(3)[:,None], idx] = False
In [293]: arr[mask]
Out[293]: array([0, 1, 5, 6, 0, 3, 4, 5, 2, 3, 4, 6])
In [294]: arr[mask].reshape(3,-1)
Out[294]:
array([[0, 1, 5, 6],
[0, 3, 4, 5],
[2, 3, 4, 6]])

How to make a new matrix from another matrix by using its first column as new's first row?

I want to use a python function or library - if any - for creating a new matrix whose first row beginning from the right-below is created by using old matrix's first column beginning from the left-top. That matrix can have different columns and rows but of course my new matrix have to have same dimension as previous one. My will is something like that:
In keeping with the brief style of the question:
In [467]: alist = [5,6,4,3,4,5,3,2,5,3,1,2,2,3,2,1,3,1,1,1]
In [468]: arr = np.array(alist).reshape(4,5)
In [469]: arr
Out[469]:
array([[5, 6, 4, 3, 4],
[5, 3, 2, 5, 3],
[1, 2, 2, 3, 2],
[1, 3, 1, 1, 1]])
In [470]: arr.reshape(5,4)
Out[470]:
array([[5, 6, 4, 3],
[4, 5, 3, 2],
[5, 3, 1, 2],
[2, 3, 2, 1],
[3, 1, 1, 1]])
In [471]: arr.reshape(5,4,order='F')
Out[471]:
array([[5, 3, 2, 1],
[5, 2, 1, 4],
[1, 3, 3, 3],
[1, 4, 5, 2],
[6, 2, 3, 1]])
In [473]: np.rot90(_)
Out[473]:
array([[1, 4, 3, 2, 1],
[2, 1, 3, 5, 3],
[3, 2, 3, 4, 2],
[5, 5, 1, 1, 6]])

Adding two list in a list in python

Can anyone help me. This is what i want to do.
x = [[1,2,3,4,5],[6,7,8,9,10]]
y= [0,1]
desired output = [
[[1,2,3,4,5],[0,1]],
[[6,7,8,9,10],[0,1]]
]
I try putting it in a for loop
>>> x = [[1,2,3,4,5],[6,7,8,9,10]]
>>> for value in x:
... a = []
... a += ([x,y])
... print(a)
...
[[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [0, 1]]
[[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], [0, 1]]
I also tried doing this
>>> for value in x:
... a = []
... a += ([x,y])
... print(a)
...
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
[[1, 2, 3, 4, 5], [0, 1]]
Thank you for helping. I need it for putting label on my data for neural networks.
You can use a list comprehension, and iterate over each sublist in x. Since you're inserting y into different sublists, you might want to insert a copy of the list, not the original.
[[i, y[:]] for i in x]
Or,
[[i, y.copy()] for i in x]
[[[1, 2, 3, 4, 5], [0, 1]], [[6, 7, 8, 9, 10], [0, 1]]]
The copy is done as a safety precaution. To understand why, consider an example,
z = [[i, y] for i in x] # inserting y (reference copy)
y[0] = 12345
print(z)
[[[1, 2, 3, 4, 5], [12345, 1]], [[6, 7, 8, 9, 10], [12345, 1]]] # oops
Modifying the original y or the y in any other sublist will reflect changes across all sublists. You can prevent that by inserting a copy instead, which is what I've done at the top.
Try this:
for i in range(len(x)):
z[i] = [x[i],y];

numpy assignment doesn't work

Suppose I have the following numpy.array:
In[]: x
Out[]:
array([[1, 2, 3, 4, 5],
[5, 2, 4, 1, 5],
[6, 7, 2, 5, 1]], dtype=int16)
In[]: y
Out[]:
array([[-3, -4],
[-4, -1]], dtype=int16)
I want to replace a sub array of x by y and tried the following:
In[]: x[[0,2]][:,[1,3]]= y
Ideally, I wanted this to happen:
In[]: x
Out[]:
array([[1, -3, 3, -4, 5],
[5, 2, 4, 1, 5],
[6, -4, 2, -1, 1]], dtype=int16)
The assignment line doesn't give me any error, but when I check the output of x
In[]: x
I find that x hasn't changed, i.e. the assignment didn't happen.
How can I make that assignment? Why did the assignment didn't happen?
The the "fancy indexing" x[[0,2]][:,[1,3]] returns a copy of the data. Indexing with slices returns a view. The assignment does happen, but to a copy (actually a copy of a copy of...) of x.
Here we see that the indexing returns a copy:
>>> x[[0,2]]
array([[1, 2, 3, 4, 5],
[6, 7, 2, 5, 1]], dtype=int16)
>>> x[[0,2]].base is x
False
>>> x[[0,2]][:, [1, 3]].base is x
False
>>>
Now you can use fancy indexing to set array values, but not when you nest the indexing.
You can use np.ix_ to generate the indices and perform the assignment:
>>> x[np.ix_([0, 2], [1, 3])]
array([[2, 4],
[7, 5]], dtype=int16)
>>> np.ix_([0, 2], [1, 3])
(array([[0],
[2]]), array([[1, 3]]))
>>> x[np.ix_([0, 2], [1, 3])] = y
>>> x
array([[ 1, -3, 3, -4, 5],
[ 5, 2, 4, 1, 5],
[ 6, -4, 2, -1, 1]], dtype=int16)
>>>
You can also make it work with broadcasted fancy indexing (if that's even the term) but it's not pretty
>>> x[[0, 2], np.array([1, 3])[..., None]] = y
>>> x
array([[ 1, -3, 3, -4, 5],
[ 5, 2, 4, 1, 5],
[ 6, -4, 2, -1, 1]], dtype=int16)
By the way, there is some interesting discussion at the moment on the NumPy Discussion mailing list on better support for "orthogonal" indexing so this may become easier in the future.

Resources