I have a numpy array with two dimensions; for example, let's say:
a = np.array([[1,2,3,4,5],[4,6,5,8,9]])
I tried to do a = a[a[0]>2] but I got an error. I would like to obtain:
array([[3, 4, 5],
[5, 8, 9]])
Is it possible? Thanks!
Let's evaluate this step by step:
In [75]: a = np.array([[1,2,3,4,5],[4,6,5,8,9]])
first row, a 1d array
In [76]: a[0]
Out[76]: array([1, 2, 3, 4, 5])
where that first row is >2, a 1d boolean array of same size
In [77]: a[0]>2
Out[77]: array([False, False, True, True, True])
Using that mask directly produces an error:
In [78]: a[a[0]>2]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-78-631a57b67cdb> in <module>()
----> 1 a[a[0]>2]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 2 but corresponding boolean dimension is 5
The first dimension of a has size 2, but the boolean index (mask) has size 5 — it matches the 2nd dimension, not the 1st.
So we need to apply it to the 2nd dimension. 2d indexing syntax: x[i, j], or x[:, j] to select all rows but only a subset of columns:
In [79]: a[:,a[0]>2]
Out[79]:
array([[3, 4, 5],
[5, 8, 9]])
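For contrast, a boolean mask whose length matches the *first* dimension does work without the extra axis — a minimal sketch of both cases:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [4, 6, 5, 8, 9]])

# A length-2 mask matches the first dimension (rows), so plain
# boolean indexing selects whole rows:
row_mask = np.array([False, True])
print(a[row_mask])        # [[4 6 5 8 9]]

# A length-5 mask matches the second dimension (columns), so it
# must be applied along that axis explicitly:
col_mask = a[0] > 2       # [False False  True  True  True]
print(a[:, col_mask])     # [[3 4 5]
                          #  [5 8 9]]
```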
Related
I believe this is not a duplicate question, although there are questions that are fairly close to this one on the website. I would like to isolate a row from a numpy array given a set of conditions on some of its elements. Here is an example; consider the array Z:
>>> Z = [[1,0,3,4], [1,1,3,6], [1,2,3,9], [1,3,4,0], [2,1,4,5]]
>>> Z = np.array(Z)
>>> Z
array([[1, 0, 3, 4],
[1, 1, 3, 6],
[1, 2, 3, 9],
[1, 3, 4, 0],
[2, 1, 4, 5]])
and say I would like to isolate the row whose first and second elements are both 1. The command that does that should output the row
np.array([[1, 1, 3, 6]])
However, if I follow this popular question, and make an intuitive extension, such as:
Z[Z[:,0] == 1 & Z[:,1] == 1, :]
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Is there any quick fix to that? I do not want to iterate over my list. I was wondering if there is a quick "numpy" way for it.
An elegant option is np.equal:
Z[np.equal(Z[:, [0,1]], 1).all(axis=1)]
Or:
Z[np.equal(Z[:,0], 1) & np.equal(Z[:,1], 1)]
Simpler:
print (Z[(Z[:,0]==1)&(Z[:,1]==1)])
or
print (Z[(Z[:,0]==1)&(Z[:,1]==1),:])
Both output:
[[1 1 3 6]]
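The parentheses matter because `&` binds more tightly than `==`, so `Z[Z[:,0] == 1 & Z[:,1] == 1]` parses as `Z[Z[:,0] == (1 & Z[:,1]) == 1]`, a chained comparison on whole arrays — which is what triggers the "truth value is ambiguous" error. A quick sketch of the fix:

```python
import numpy as np

Z = np.array([[1, 0, 3, 4],
              [1, 1, 3, 6],
              [1, 2, 3, 9],
              [1, 3, 4, 0],
              [2, 1, 4, 5]])

# Parenthesize each comparison before combining with the
# elementwise & operator:
mask = (Z[:, 0] == 1) & (Z[:, 1] == 1)
print(Z[mask])            # [[1 1 3 6]]

# np.logical_and is an equivalent, precedence-safe spelling:
print(Z[np.logical_and(Z[:, 0] == 1, Z[:, 1] == 1)])
```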
I have a hard time understanding the difference between these two kinds of indexing.
Let's say I have a nested list, three levels deep:
x = np.array([[[1,2],[5,6]],[[9,7],[12,23]]])
if I did
x[:][:][1] and x[:,:,1]
I would get
[[9 7][12 23]]
[[5 6][12 23]]
respectively.
To be honest, I have no clue as to how I would get these results. Could someone explain the steps to me as to how I would get these arrays ?
This has to do with python's slice syntax. Essentially, obj[a:b:c] is syntactic shorthand for obj.__getitem__(slice(a,b,c)).
x[:] simply returns a 'full slice' of x. For a plain list that is a shallow copy; for a NumPy array it is a view of the whole array. Either way, doing x[:][:][1] is no different from doing x[1].
Meanwhile, doing x[:,:,1] equates to:
x.__getitem__((slice(None), slice(None), 1))
that is, using a 3-tuple as an index. For an ordinary python list, this would fail, but Numpy accepts it gracefully. To see how Numpy does so, let's look a bit closer at this example:
>>> x = np.array([[[1,2],[5,6]],[[9,7],[12,23]]])
>>> x[1]
array([[ 9, 7],
[12, 23]])
>>> x[:,1]
array([[ 5, 6],
[12, 23]])
>>> x[:,:,1]
array([[ 2, 6],
[ 7, 23]])
>>> x[:,:,:,1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
We can see a pattern.
When you give a Numpy array a tuple as an index, it maps each element of the tuple to a successive dimension of the array and applies that element (an index or a slice) along that dimension. In short:
x[1] just gets the element at index 1 from the first dimension of the array. This is the single element [[9, 7], [12, 23]]
x[:, 1] gets the element at index 1 from each element in the first dimension of the array. Which is to say, it gets the elements at index 1 from the second dimension of the array. This is two elements: [5, 6] and [12, 23]. Numpy groups them together in a list.
x[:, :, 1] follows the same pattern - it gets the elements at index 1 from the third dimension of the array. This time there are four unique elements: 2 and 6 from the first element in the second dimension, and 7 and 23 from the second element in the second dimension. Numpy groups them by dimension in a nested list.
x[:, :, :, 1] fails, because the array only has three dimensions - there's no way to further subdivide any of the third dimension's elements.
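To see concretely that the chained `[:]` adds nothing, a small sketch using the same `x` as above:

```python
import numpy as np

x = np.array([[[1, 2], [5, 6]],
              [[9, 7], [12, 23]]])

# x[:] is a full slice -- for a NumPy array, a view of the whole
# array -- so chaining it changes nothing before the final [1]:
assert (x[:][:][1] == x[1]).all()

# The tuple form indexes one dimension per element: take everything
# along dims 0 and 1, then index 1 along dim 2.
assert (x[:, :, 1] == np.array([[2, 6], [7, 23]])).all()
```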
I'm attempting to put together a number of 3D arrays with the same size on the first two dimensions but differing sizes on the 3rd dimensions. I'm using numpy.hstack().
import numpy as np
first = np.array([[[1,2], [3,4]],
[[5,6], [7,8]],
[[9,10],[11,12]]])
second = np.array([[[88],[88]],
[[88],[88]],
[[88],[88]]])
output = np.hstack((first,second))
print (output)
This results in an error:
Exception has occurred: ValueError
all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 2 and the array at index 1 has size 1
Now, if I try this on two 2D arrays with a mismatched second dimension, np.hstack() has no trouble. For instance:
import numpy as np
first= np.array([[1,2],[3,4],[5,6]])
second= np.array([[88],[88],[88]])
output = np.hstack((first,second))
print (output)
outputs, as expected:
[[ 1 2 88]
[ 3 4 88]
[ 5 6 88]]
The result I'm going for with the 3D concatenation is:
[[[ 1 2 88],[ 3 4 88]]
[[ 5 6 88],[ 7 8 88]]
[[ 9 10 88],[ 11 12 88]]]
Am I going about it the right way? Is there an alternative? Thanks for your help.
np.concatenate is what you're looking for:
>>> import numpy as np
>>> first = np.arange(1, 13).reshape(3, 2, 2); first
array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]]])
>>> second = np.repeat(88, 6).reshape(3, 2, 1); second
array([[[88],
[88]],
[[88],
[88]],
[[88],
[88]]])
>>> np.concatenate((first, second), axis=2)
array([[[ 1, 2, 88],
[ 3, 4, 88]],
[[ 5, 6, 88],
[ 7, 8, 88]],
[[ 9, 10, 88],
[11, 12, 88]]])
DataReader = pd.read_csv('Quality.csv')
...
ip = [DataReader.x1, DataReader.x2, DataReader.x3, DataReader.x4,........., DataReader.x12,
DataReader.x13]
op = DataReader.y
ip = np.matrix(ip).transpose()
op = np.matrix(op).transpose()
Please help me solve the error below. (Python 3.7, numpy 1.17.)
Traceback (most recent call last):
File "Quality.py", line xx, in <module>
ip = np.matrix(ip).transpose()
File "\\defmatrix.py", line 147, in __new__
arr = N.array(data, dtype=dtype, copy=copy)
**ValueError: cannot copy sequence with size 13 to array axis with dimension 200**
You start with a dataframe with 200 rows and 14 columns. Its .values attribute (or to_numpy() method result) will then be a (200,14) shape array.
In:
ip = [DataReader.x1, DataReader.x2, ...DataReader.x13]
DataReader.x1 is a column, a pandas Series. Its .values is a 1d array, (200,) shape. I would expect np.array(ip) to be a (13,200) array.
If instead you'd done
ip = DataReader[['x1','x2',...,'x13']].values
the result would be a (200,13) array.
With a simple dataframe, your code shouldn't produce an error:
In [61]: df = pd.DataFrame(np.arange(12).reshape(4,3), columns=['a','b','c'])
In [63]: df.values
Out[63]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [65]: ip = [df.a, df.b, df.c]
In [67]: np.array(ip)
Out[67]:
array([[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]])
np.matrix(ip).transpose() works just as well (though there's no need to use np.matrix instead of np.array).
I can't reproduce your error. Making an array from certain mixes of shaped arrays produces an error like
ValueError: could not broadcast input array from shape (3) into shape (1)
or in other cases an object array (of Series).
====
For a single column, I'd expect the resulting 1d array; reshape it if needed:
In [82]: df.a
Out[82]:
0 0
1 3
2 6
3 9
Name: a, dtype: int64
In [83]: df.a.values
Out[83]: array([0, 3, 6, 9])
In [84]: df.a.values[:,None]
Out[84]:
array([[0],
[3],
[6],
[9]])
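Selecting the columns by name in one step avoids building the intermediate list of Series entirely — a sketch with the same toy dataframe as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])

# Selecting columns by name returns a (rows, cols) array directly,
# already in the orientation the transpose was trying to produce:
ip = df[['a', 'b']].to_numpy()
print(ip.shape)           # (4, 2)

# A single column as an explicit column vector:
op = df['c'].to_numpy()[:, None]
print(op.shape)           # (4, 1)
```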
I want to add a new row to a numpy 2d array. Say array 1 has shape (2, 5) and array 2 is a kind of row (with 3 values, or columns) of shape (3,).
My resultant array should have shape (3, 5), and the last two entries in the 3rd row should be NaNs.
arr1 = array([[9, 4, 2, 6, 7],
[8, 5, 4, 1, 3]])
arr2 = array([3, 1, 5])
after some join or concat operation
arr1_arr2 = array([[9, 4, 2, 6, 7],
[8, 5, 4, 1, 3],
[3, 1, 5, np.nan, np.nan]])
I have tried numpy's append and concatenate functions, but they don't work this way.
You can only concatenate arrays with the same number of dimensions (which can be fixed by reshaping, e.g. adding an axis) and the same size along every axis except the concatenation axis.
Thus you need to append/concatenate an empty array of the correct shape and fill it with the values of arr2 afterwards.
# concatenate an array of the correct shape filled with np.nan
arr1_arr2 = np.concatenate((arr1, np.full((1, arr1.shape[1]), np.nan)))
# fill concatenated row with values from arr2
arr1_arr2[-1, :3] = arr2
But in general it is good advice NOT to repeatedly append/concatenate arrays. If possible, work out the final shape of the array in advance, create an empty array (or one filled with np.nan) of that shape, and fill it as you go. For example:
arr1_arr2 = np.full((3, 5), np.nan)
arr1_arr2[:-1, :] = arr1
arr1_arr2[-1, :arr2.shape[0]] = arr2
If it is only a single append/concatenate operation and it is not performance-critical, it is fine to concat/append; otherwise full preallocation in advance should be preferred.
If many arrays are to be concatenated and all of them have the same shape, this is the best way to do it:
arr1 = np.array([[9, 4, 2, 6, 7],
[8, 5, 4, 1, 3]])
# some arrays to concatenate:
arr2 = np.array([3, 1, 5])
arr3 = np.array([5, 7, 9])
arr4 = np.array([54, 1, 99])
# make array of all arrays to concatenate:
arrs_to_concat = np.vstack((arr2, arr3, arr4))
# preallocate result array filled with nan
arr1_arr2 = np.full((arr1.shape[0] + arrs_to_concat.shape[0], 5), np.nan)
# fill with values:
arr1_arr2[:arr1.shape[0], :] = arr1
arr1_arr2[arr1.shape[0]:, :arrs_to_concat.shape[1]] = arrs_to_concat
Performance-wise it may be a good idea for large arrays to use np.empty for preallocating the final array and filling only the remaining shape with np.nan.
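For the original single-row case, np.pad can also build the NaN-padded row in one expression — a sketch, noting the cast to float since np.nan does not fit in an integer array:

```python
import numpy as np

arr1 = np.array([[9, 4, 2, 6, 7],
                 [8, 5, 4, 1, 3]])
arr2 = np.array([3, 1, 5])

# Pad arr2 on the right up to arr1's width, then stack it on as a
# new row; the float cast is needed because np.nan is a float.
padded = np.pad(arr2.astype(float), (0, arr1.shape[1] - arr2.size),
                constant_values=np.nan)
arr1_arr2 = np.vstack((arr1, padded))
print(arr1_arr2)
# [[ 9.  4.  2.  6.  7.]
#  [ 8.  5.  4.  1.  3.]
#  [ 3.  1.  5. nan nan]]
```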