Filled with numpy.zeros but getting nan instead - python-3.x

I have a class that contains various methods, which include the following:
    def _doc_mean(self, doc):
        doc_vector_values = []
        for w in doc:
            #print(w)
            if w.lower().strip() in self._E:
                Q = np.zeros((1, 200), dtype=np.float64) #this is a zero array for when a word doesn't have a vector representation in our pretrained embeddings
                doc_vector_values.append(self._E.get(w, Q))
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            return np.mean(np.array(doc_vector_values, dtype=np.float64), axis=0)
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return np.array([self._doc_mean(doc) for doc in X])
    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)
In _doc_mean, I compare w with the keys of a dictionary E_. If there is a match, I append the value of that key-value pair, a 1*200 vector, to a list; if there is no match, I append numpy.zeros((1,200)) instead. The list is then converted to an array and its mean is calculated.
When I instantiate the class and fit-transform my 'doc' data
    mc = MeanClass()
    X_ = mc.fit_transform(doc)
X_ is of dtype "object", and the places where there was no match contain nan instead of zeros.
This leads to multiple other problems in my code that I can't fix. What am I doing wrong?
EDIT:
The E_ dictionary looks like this :
{'hello': array([ 5.84850e-02, 6.20640e-02, ..... -2.08990e-02])
'good': array([ -4.80050e-02, 2.80610e-02, ..... -5.04991e-02])
while doc looks like this :
['hello', 'bye', 'good']
['good', 'bye', 'night']

Since you haven't given a [mcve], I'll create something simple:
In [125]: E_ = {'foo':np.arange(5), 'bar':np.arange(1,6), 'baz':np.arange(5,10)}
In [126]: doc = ['foo','bar','sub','baz','foo']
Now do the dictionary lookup:
In [127]: alist = []
In [128]: for w in doc:
     ...:     alist.append(E_.get(w,np.zeros((1,5),int)))
     ...:
In [129]: alist
Out[129]:
[array([0, 1, 2, 3, 4]),
array([1, 2, 3, 4, 5]),
array([[0, 0, 0, 0, 0]]),
array([5, 6, 7, 8, 9]),
array([0, 1, 2, 3, 4])]
In [130]: np.array(alist)
Out[130]:
array([array([0, 1, 2, 3, 4]), array([1, 2, 3, 4, 5]),
array([[0, 0, 0, 0, 0]]), array([5, 6, 7, 8, 9]),
array([0, 1, 2, 3, 4])], dtype=object)
The arrays in E_ all have shape (5,). The 'fill' array has shape (1,5). Because the shapes don't match, the arrays can't be combined into a 2d numeric array, so Out[130] is a 1d object dtype array.
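A quick way to catch this kind of mismatch before building the array is to look at the set of shapes in the list. The snippet below is only a sketch with made-up toy vectors (the names vectors, shapes and fixed are mine, not from the question); it also uses np.stack, which refuses mixed shapes instead of quietly producing something unexpected.

```python
import numpy as np

# Toy stand-ins: two "embedding" vectors of shape (5,) and a (1, 5) fill array.
vectors = [np.arange(5), np.zeros((1, 5), int), np.arange(5, 10)]

# More than one distinct shape means the list cannot become a 2d numeric array.
shapes = {v.shape for v in vectors}
print(shapes)

# np.stack requires identical shapes and raises ValueError otherwise.
try:
    stacked = np.stack(vectors)
except ValueError as exc:
    stacked = None
    print("cannot stack:", exc)

# Flattening every entry to 1d makes the shapes agree.
fixed = [v.reshape(-1) for v in vectors]
stacked = np.stack(fixed)
print(stacked.shape, stacked.dtype)
```

Depending on the NumPy version, calling np.array on the mixed list either builds an object array as shown above or raises an error, so checking the shapes first is the safer habit.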
I think you are trying to avoid the 'fill' case, but you test w.lower().strip() in self._E, and then use w in the get. So you might get the Q value sometimes. I got it with the 'sub' string.
If instead I make the 'fill' be (5,):
In [131]: alist = []
In [132]: for w in doc:
     ...:     alist.append(E_.get(w,np.zeros((5,),int)))
     ...:
In [133]: alist
Out[133]:
[array([0, 1, 2, 3, 4]),
array([1, 2, 3, 4, 5]),
array([0, 0, 0, 0, 0]),
array([5, 6, 7, 8, 9]),
array([0, 1, 2, 3, 4])]
In [134]: np.array(alist)
Out[134]:
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[0, 0, 0, 0, 0],
[5, 6, 7, 8, 9],
[0, 1, 2, 3, 4]])
The result is a (n,5) numeric array.
I can take two different means. One is the mean across all words, with a value for each 'attribute'. The other is the mean for each word, which I could just as well have gotten by taking the mean in E_.
In [135]: np.mean(_, axis=0)
Out[135]: array([1.2, 2. , 2.8, 3.6, 4.4])
In [137]: np.mean(__, axis=1)
Out[137]: array([2., 3., 0., 7., 2.]) # mean for each 'word'
The mean of the object array in Out[130]:
In [138]: np.mean(_130, axis=0)
Out[138]: array([[1, 2, 2, 3, 4]])
The result is (1,5) and looks like Out[135] truncated, but I'd have to dig a bit further to be sure.
Hopefully this gives you an idea of what to watch out for, and of the kind of 'minimal reproducible concrete example' that we find most useful.
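Applied to the method in the question, the two points above (look up the same normalized key that you test, and use a fill with the same shape as the stored vectors) might look like the sketch below. The function doc_mean, the toy dictionary E and the 4-dimensional vectors are assumptions for illustration; the real embeddings are 200-dimensional. Returning zeros for an empty document is also my choice, not something the question specifies.

```python
import numpy as np

# Toy embeddings of dimension 4 (hypothetical stand-in for the real dictionary).
E = {"hello": np.full(4, 1.0), "good": np.full(4, 3.0)}

def doc_mean(doc, embeddings, dim):
    """Mean word vector of a document; unknown words count as zero vectors."""
    fill = np.zeros(dim, dtype=np.float64)  # shape (dim,), same as the stored vectors
    # Use the same normalized key for the lookup that would be used for a membership test.
    rows = [embeddings.get(w.lower().strip(), fill) for w in doc]
    if not rows:
        # Assumption: an empty document maps to zeros instead of nan.
        return fill.copy()
    return np.mean(np.stack(rows), axis=0)

docs = [["hello", "bye", "good"], ["good", "bye", "night"]]
X = np.array([doc_mean(d, E, 4) for d in docs])
print(X.shape, X.dtype)
```

With consistent shapes every document yields a (dim,) float vector, so the stacked result is a plain 2d float array.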

Related

Subtract the elements of every possible pair of a torch Tensor efficiently

I have a huge torch Tensor and I'm looking for an efficient way to subtract the elements of every pair in that Tensor.
Of course I could use two nested for loops, but that wouldn't be efficient.
For example, given
[1, 2, 3, 4]
The output I want is
[1-2, 1-3, 1-4, 2-3, 2-4, 3-4]
You can do this easily:
>>> x = torch.tensor([1, 2, 3, 4])
>>> x[:, None] - x[None, :]
tensor([[ 0, -1, -2, -3],
[ 1, 0, -1, -2],
[ 2, 1, 0, -1],
[ 3, 2, 1, 0]])
This relies on broadcasting. The entries above the diagonal are the pairwise differences requested in the question.
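To get just the six values from the question rather than the full matrix, one option is to index the upper triangle. This is a sketch using torch.triu_indices; the variable names diff, i, j and pairs are mine.

```python
import torch

x = torch.tensor([1, 2, 3, 4])

# Full matrix of differences: diff[a, b] == x[a] - x[b].
diff = x[:, None] - x[None, :]

# Row and column indices of the entries strictly above the diagonal (a < b).
i, j = torch.triu_indices(len(x), len(x), offset=1)

# Pairwise differences in the order 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
pairs = diff[i, j]
print(pairs)
```

Note that this still materializes the full n-by-n matrix, which may matter for a very large tensor.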

how to pad a text after build the vocab in pytorch

I used the torchtext vocab to convert text to indices, but which function should I use to make all the index lists the same length before I send them to the net?
For example I have 2 texts:
I am a good man
I would like a coffee please
After vocab:
[1, 3, 2, 5, 7]
[1, 9, 6, 2, 4, 8]
And what I want is:
[1, 3, 2, 5, 7, 0]
[1, 9, 6, 2, 4, 8]
It is easy to understand by looking at the following example.
Code:
import torch
v = [
[0,2],
[0,1,2],
[3,3,3,3]
]
torch.nn.utils.rnn.pad_sequence([torch.tensor(p) for p in v], batch_first=True)
Result:
tensor([[0, 2, 0, 0],
[0, 1, 2, 0],
[3, 3, 3, 3]])

3D matrix addition python

I am trying to add two 3D matrices, but the third loop is not starting from 0.
Here the shape of each matrix is (2,3,3).
Code:
    for i in range(0,r):
        for j in range(0,c):
            for l in range(0,k):
                sum[i][j][k]=A1[i][j][k]+A2[i][j][k]
Output:
IndexError: index 3 is out of bounds for axis 0 with size 3
For element-wise addition of two matrices, you can simply use the + operator between two numpy arrays:
    import numpy as np
    #create two matrices of random integers
    matrix1 = np.random.randint(10, size=(2,3,3))
    matrix2 = np.random.randint(10, size=(2,3,3))
    #add the two matrices element-wise
    sum_matrix = matrix1 + matrix2
    print(matrix1, matrix2, sum_matrix, sep='\n__________\n')
I don't get an IndexError. Could you post your whole code?
This is my code:
arr1 = [[[2, 4, 8], [7, 7, 1], [4, 9, 0]], [[5, 0, 0], [3, 8, 6], [0, 5, 8]]]
arr2 = [[[3, 8, 0], [1, 5, 2], [0, 3, 9]], [[9, 7, 7], [1, 2, 5], [1, 1, 3]]]
sumArr = [[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0],[0, 0, 0]]]
    for i in range(2): #can also use range(0,2)
        for j in range(3):
            for k in range(3):
                sumArr[i][j][k]=arr1[i][j][k]+arr2[i][j][k]
    print(sumArr)
By the way, is it necessary to use a for loop?
If not, you can use the numpy library.
    import numpy as np
Convert your nested lists to numpy arrays, then add them.
arr1 = [[[2, 4, 8], [7, 7, 1], [4, 9, 0]], [[5, 0, 0], [3, 8, 6], [0, 5, 8]]]
arr2 = [[[3, 8, 0], [1, 5, 2], [0, 3, 9]], [[9, 7, 7], [1, 2, 5], [1, 1, 3]]]
m1 = np.array(arr1)
m2 = np.array(arr2)
print("M1: \n", m1)
print("M2: \n", m2)
print("Sum: \n", m1 + m2)
You iterate with 'l' in the third loop, but you index the lists with k. As a result, your code tries to access index k, which doesn't exist (k is the size of that axis), and you get an error.
Use this:
    for i in range(0, r):
        for j in range(0, c):
            for l in range(0, k):
                sum[i][j][l] = A1[i][j][l] + A2[i][j][l]
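Here is a small self-contained check of that explanation, assuming the arrays are numpy arrays of shape (2,3,3) (the data is borrowed from the other answer, and the name total is mine, chosen to avoid shadowing the built-in sum). Indexing with the bound k reproduces the IndexError, while indexing with the loop variable l gives the same result as A1 + A2.

```python
import numpy as np

A1 = np.array([[[2, 4, 8], [7, 7, 1], [4, 9, 0]], [[5, 0, 0], [3, 8, 6], [0, 5, 8]]])
A2 = np.array([[[3, 8, 0], [1, 5, 2], [0, 3, 9]], [[9, 7, 7], [1, 2, 5], [1, 1, 3]]])
r, c, k = A1.shape  # (2, 3, 3)

# Using the bound k as an index goes one past the end of the last axis.
try:
    A1[0][0][k]
except IndexError as exc:
    message = str(exc)
    print(message)

# Using the loop variable l stays in range.
total = np.zeros_like(A1)
for i in range(r):
    for j in range(c):
        for l in range(k):
            total[i][j][l] = A1[i][j][l] + A2[i][j][l]

print(np.array_equal(total, A1 + A2))
```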

Explanation for slicing in Pytorch

Why is the output the same every time?
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[[5, 6]]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([[5, 6]])
a
tensor([0, 1, 2, 5, 6])
a = torch.tensor([0, 1, 2, 3, 4])
a[-2:] = torch.tensor([5, 6])
a
tensor([0, 1, 2, 5, 6])
PyTorch follows NumPy here: assignment to a slice is allowed as long as the shapes are compatible, meaning that the two sides have the same shape or the right-hand side is broadcastable to the shape of the slice. Starting with the trailing dimensions, two shapes are broadcastable if, in every dimension, they are equal or one of them is 1. So in this case
a = torch.tensor([0, 1, 2, 3, 4])
b = torch.tensor([[[5, 6]]])
print(a[-2:].shape, b.shape)
>> torch.Size([2]) torch.Size([1, 1, 2])
PyTorch will perform the following comparisons:
a[-2:].shape[-1] and b.shape[-1] are equal, so the last dimension is compatible
a[-2:].shape[-2] does not exist, but b.shape[-2] is 1, so they are compatible
a[-2:].shape[-3] does not exist, but b.shape[-3] is 1, so they are compatible
All dimensions are compatible, so b can be broadcast to the shape of a[-2:]
Finally, PyTorch will effectively treat b as tensor([5, 6]) when performing the assignment, thus producing the result:
a[-2:] = b
print(a)
>> tensor([0, 1, 2, 5, 6])
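The comparison steps above can be written out as a few lines of plain Python. This is only a sketch of the rule as described here; can_assign is a hypothetical helper of mine, not a PyTorch or NumPy function, and it works on shape tuples rather than tensors.

```python
def can_assign(target_shape, value_shape):
    """Return True if a value of value_shape can be broadcast into target_shape."""
    target = list(reversed(target_shape))
    # Walk the value's dimensions from the trailing one backwards.
    for axis, size in enumerate(reversed(value_shape)):
        if axis < len(target):
            # Aligned dimensions must match, or the value's dimension must be 1.
            if size != target[axis] and size != 1:
                return False
        elif size != 1:
            # Extra leading dimensions on the value are only allowed if they are 1.
            return False
    return True

print(can_assign((2,), (1, 1, 2)))  # the three cases from the question
print(can_assign((2,), (1, 2)))
print(can_assign((2,), (2,)))
print(can_assign((2,), (3,)))       # mismatched trailing dimension
```

All three right-hand sides in the question pass this check against the slice shape (2,), which is why the three assignments give the same result.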

Error:setting an array element with a sequence

    if triangles is None:
        tridata = mesh['face'].data['vertex_indices']
        print(tridata)
        print(type(tridata))
        print(tridata.dtype)
        triangles = plyfile.make2d(tridata)
This raises an error: setting an array element with a sequence.
I checked the contents, type and dtype of tridata:
[array([ 0, 5196, 10100], dtype=int32)
array([ 0, 2850, 10103], dtype=int32)
array([ 0, 3112, 10102], dtype=int32) ...
array([ 2849, 10076, 5728], dtype=int32)
array([ 2849, 10099, 8465], dtype=int32)
array([ 2849, 10098, 8602], dtype=int32)]
<class 'numpy.ndarray'>
object
ValueError: setting an array element with a sequence.
I don't know what is wrong.
Here is the code of the function "make2d":
    def make2d(array, cols=None, dtype=None):
        '''
        Make a 2D array from an array of arrays. The `cols' and `dtype'
        arguments can be omitted if the array is not empty.
        '''
        if (cols is None or dtype is None) and not len(array):
            raise RuntimeError("cols and dtype must be specified for empty "
                               "array")
        if cols is None:
            cols = len(array[0])
        if dtype is None:
            dtype = array[0].dtype
        return _np.fromiter(array, [('_', dtype, (cols,))],
                            count=len(array))['_']
Where's this code from? The use of a compound dtype in fromiter is tricky.
In [102]: dt1=np.dtype([('_',int,(4,))])
In [103]: dt2=np.dtype('i,i,i,i')
In [104]: x = np.arange(12).reshape(3,4)
In [105]: np.fromiter(x, dt1)
....
ValueError: setting an array element with a sequence.
In [106]: np.fromiter(x, dt2)
...
ValueError: setting an array element with a sequence.
If I flatten the array, it works - except values are replicated:
In [107]: np.fromiter(x.ravel(), dt1)
Out[107]:
array([([ 0, 0, 0, 0],), ([ 1, 1, 1, 1],), ([ 2, 2, 2, 2],),
([ 3, 3, 3, 3],), ([ 4, 4, 4, 4],), ([ 5, 5, 5, 5],),
([ 6, 6, 6, 6],), ([ 7, 7, 7, 7],), ([ 8, 8, 8, 8],),
([ 9, 9, 9, 9],), ([10, 10, 10, 10],), ([11, 11, 11, 11],)],
dtype=[('_', '<i8', (4,))])
Converting the array to a nested list works:
In [108]: np.fromiter(x.tolist(), dt1)
Out[108]:
array([([ 0, 1, 2, 3],), ([ 4, 5, 6, 7],), ([ 8, 9, 10, 11],)],
dtype=[('_', '<i8', (4,))])
In [109]: np.fromiter(x.tolist(), dt2)
....
ValueError: setting an array element with a sequence.
But if I make it a list of tuples, I can create this structured array. List of tuples is the normal way of filling a structured array.
In [110]: np.fromiter([tuple(i) for i in x.tolist()], dt2)
Out[110]:
array([(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4')])
But with an object dtype array, none of these tricks work:
In [111]: a
Out[111]:
array([array([0, 1, 2, 3]), array([5, 6, 7, 8]), array([10, 11, 12, 13])],
dtype=object)
I can make an array with dt1 using assignment to an initialized array:
In [123]: b = np.zeros((3,), dt1)
In [124]: b
Out[124]:
array([([0, 0, 0, 0],), ([0, 0, 0, 0],), ([0, 0, 0, 0],)],
dtype=[('_', '<i8', (4,))])
In [125]: b['_']=x
In [126]: b
Out[126]:
array([([ 0, 1, 2, 3],), ([ 4, 5, 6, 7],), ([ 8, 9, 10, 11],)],
dtype=[('_', '<i8', (4,))])
I can also iteratively fill it from the array of arrays:
In [128]: for i in range(3):
     ...:     b['_'][i]=a[i]
     ...:
In [129]: b
Out[129]:
array([([ 0, 1, 2, 3],), ([ 5, 6, 7, 8],), ([10, 11, 12, 13],)],
dtype=[('_', '<i8', (4,))])
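If all you need is a plain 2d array from an object array of equal-length rows, which is what make2d appears to be aiming for, you may be able to skip fromiter and the compound dtype entirely. The following is a sketch of that alternative using np.stack, not what plyfile itself does; the names rows, a and b2d are mine.

```python
import numpy as np

# Build a 1d object array whose elements are equal-length integer arrays.
rows = [np.array([0, 1, 2, 3]), np.array([5, 6, 7, 8]), np.array([10, 11, 12, 13])]
a = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    a[i] = row

# Stack the inner arrays into an ordinary (3, 4) numeric array.
b2d = np.stack(list(a))
print(b2d.shape, b2d.dtype)
```

This only works when every inner array has the same length; np.stack raises a ValueError otherwise.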

Resources