numpy structured array no shape information? - python-3.x

Why is the shape of a single-row numpy structured array not defined ('()'), and what's the common "workaround"?
import io
import numpy as np
fileWrapper = io.StringIO("-0.09469 0.032987 0.061009 0.0588")
dt = np.dtype([('min', (float, 2)), ('max', (float, 2))])
a = np.loadtxt(fileWrapper, dtype=dt, delimiter=" ", comments="#")
print(np.shape(a), a)
Output: () ([-0.09469, 0.032987], [0.061009, 0.0588])

Short answer: Add the argument ndmin=1 to the loadtxt call.
Long answer:
The shape is () for the same reason that reading a single floating point value with loadtxt returns an array with shape ():
In [43]: a = np.loadtxt(['1.0'])
In [44]: a.shape
Out[44]: ()
In [45]: a
Out[45]: array(1.0)
By default, loadtxt uses the squeeze function to eliminate trivial (i.e. length 1) dimensions in the array that it returns. In my example above, it means the result is a "scalar array"--an array with shape ().
When you give loadtxt a structured dtype, the structure defines the fields of a single element of the array. It is common to think of these fields as "columns", but structured arrays will make more sense if you consistently think of them as what they are: arrays of structures with fields. If your data file had two lines, the array returned by loadtxt would be an array with shape (2,). That is, it is a one-dimensional array with length 2. Each element of the array is a structure whose fields are defined by the given dtype. When the input file has only a single line, the array would have shape (1,), but loadtxt squeezes that to be a scalar array with shape ().
To force loadtxt to always return a one-dimensional array, even when there is a single line of data, use the argument ndmin=1.
For example, here's a dtype for a structured array:
In [58]: dt = np.dtype([('x', np.float64), ('y', np.float64)])
Read one line using that dtype. The result has shape ():
In [59]: a = np.loadtxt(['1.0 2.0'], dtype=dt)
In [60]: a.shape
Out[60]: ()
Use ndmin=1 to ensure that even an input with a single line results in a one-dimensional array:
In [61]: a = np.loadtxt(['1.0 2.0'], dtype=dt, ndmin=1)
In [62]: a.shape
Out[62]: (1,)
In [63]: a
Out[63]:
array([(1.0, 2.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
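As a rough sketch, the same fix can be applied to the dtype from the question. The variable names (`squeezed`, `kept`, `promoted`) are made up for illustration, and `np.atleast_1d` is shown only as an alternative for the case where the array has already been loaded without `ndmin=1`; it is not part of the original answer.

```python
import io
import numpy as np

# dtype from the question: two fields, each holding two floats
dt = np.dtype([('min', (float, 2)), ('max', (float, 2))])
line = "-0.09469 0.032987 0.061009 0.0588"

# Default behaviour: the single row is squeezed to a scalar array
squeezed = np.loadtxt(io.StringIO(line), dtype=dt)
print(squeezed.shape)        # ()

# ndmin=1 keeps the row dimension
kept = np.loadtxt(io.StringIO(line), dtype=dt, ndmin=1)
print(kept.shape)            # (1,)
print(kept['min'].shape)     # (1, 2)

# If the array was already loaded, np.atleast_1d restores the dimension
promoted = np.atleast_1d(squeezed)
print(promoted.shape)        # (1,)
```

With the row dimension in place, code that iterates over rows or indexes `a[0]` behaves the same for one-line and multi-line files.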

Related

Stacking up dataframes in a 3-dimensional numpy array

I have several pandas dataframes that I would like to stack into a three-dimensional numpy array. I could do the job manually using the following code:
arr = np.array([df1.values, df2.values], dtype="object")
However, since I have many dataframes, I can neither write this line for all the dataframes nor automate it.
I tried to use the append function (np.append(df1.values, df2['1002'].values)), but it flattens the dataframes and ignores their structure. What I want is a three-dimensional numpy array where the first dimension is the number of dataframes, the second is the number of rows in each dataframe, and the third is the number of columns. In the first example that I mentioned earlier, I get a three-dimensional numpy array. In fact, when I run arr.shape the result is (2,), and when I run arr[0].shape and arr[1].shape I get (26, 7) and (24, 7), respectively, which are the shapes of the corresponding dataframes.
I even ran np.append(df1.values, df2['1002'].values, axis=0), but I received the error ValueError: all the input array dimensions for the concatenation axis must match exactly. Is there any way I can fix this problem and stack all my dataframes into a 3-dimensional numpy array?
Looks like you start with 2 frames with 7 columns, but different numbers of rows. The equivalent of:
In [1]: arr1 = np.ones((26,7)); arr2 = np.zeros((24,7))
In [2]: arr = np.array([arr1, arr2], object)
In [3]: arr.shape
Out[3]: (2,)
In [4]: arr[0].shape
Out[4]: (26, 7)
You probably tried this without the object and got a 'ragged array' warning. In any case, this is not a 3d array. It is 1d (2,), with two arrays. It's roughly the same as the list
[arr1, arr2]
The np.append docs should make it clear that it flattens the arguments, when you don't specify an axis.
In [6]: np.append(arr1,arr2).shape
Out[6]: (350,)
You could specify an axis, and get a 2d array, where the 50 is the sum of 26 and 24.
In [7]: np.append(arr1,arr2,axis=0).shape
Out[7]: (50, 7)
This is the same as:
In [8]: np.concatenate((arr1,arr2), axis=0).shape
Out[8]: (50, 7)
np.append is a poorly named cover for np.concatenate. It is not a list append clone. Learn to use concatenate and its stack derivatives.
With different dataframe shapes, you cannot make a 3d array. Arrays cannot be 'ragged'.
As for working with more than 2 dataframes, if you can make a list of all the frames, you can use the initial syntax.
alist = []
for a in frame_list:
    alist.append(a.values)
arr = np.array(alist, object)
But making such an array doesn't do much for you.
If the frames are all the same size, then you can make a 3d array
In [10]: np.array([arr1[:10,:],arr2[:10,:]]).shape
Out[10]: (2, 10, 7)
In [11]: np.stack([arr1[:10,:],arr2[:10,:]]).shape
Out[11]: (2, 10, 7)
But if they differ, stack will complain about that:
In [12]: np.stack([arr1, arr2])
Traceback (most recent call last):
File "<ipython-input-12-23d05d0422dc>", line 1, in <module>
np.stack([arr1, arr2])
File "<__array_function__ internals>", line 180, in stack
File "/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py", line 426, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
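If a true 3d array is still wanted for frames with different row counts, one option that the answer above does not cover is to pad the shorter arrays before stacking. The sketch below assumes that NaN padding is acceptable for the data and that all arrays have the same number of columns; `stack_padded` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def stack_padded(arrays, fill_value=np.nan):
    """Pad 2-D arrays (same column count) to a common row count, then stack."""
    max_rows = max(a.shape[0] for a in arrays)
    padded = [
        np.pad(
            a.astype(float),                      # float so NaN can be stored
            ((0, max_rows - a.shape[0]), (0, 0)),  # pad rows at the bottom only
            constant_values=fill_value,
        )
        for a in arrays
    ]
    return np.stack(padded)

# Stand-ins for df1.values and df2.values from the question
arr1 = np.ones((26, 7))
arr2 = np.zeros((24, 7))

cube = stack_padded([arr1, arr2])
print(cube.shape)   # (2, 26, 7)
```

The padded rows then have to be ignored in later computations, for example with the NaN-aware reductions such as np.nanmean.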

Can we initialise a numpy array of numpy arrays with different shapes using some constructor?

I want an array that looks like this,
array([array([[1, 1], [2, 2]]), array([3, 3])], dtype=object)
I can make an empty array and then assign elements one by one like this,
z = [np.array([[1,1],[2,2]]), np.array([3,3])]
x = np.empty(shape=2, dtype=object)
x[0], x[1] = z
I thought that if this is possible, then so should this be: x = np.array(z, dtype=object), but that gets me the error: ValueError: could not broadcast input array from shape (2,2) into shape (2).
So is the way given above the only way to make a ragged numpy array? Or is there a nice one-line constructor/function we can call to make the array x from above?
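For reference, the element-by-element approach from the question can be wrapped in a small helper so that the call site becomes a one-liner. This is only a sketch: `ragged_array` is a made-up name, not a NumPy constructor.

```python
import numpy as np

def ragged_array(items):
    """Pack arrays of arbitrary shape into a 1-D object array."""
    out = np.empty(len(items), dtype=object)
    # Assign one element at a time so NumPy never tries to broadcast
    for i, item in enumerate(items):
        out[i] = item
    return out

z = [np.array([[1, 1], [2, 2]]), np.array([3, 3])]
x = ragged_array(z)
print(x.shape)                    # (2,)
print(x[0].shape, x[1].shape)     # (2, 2) (2,)
```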

slicing error in numpy array

I am trying to run the following code
fs = 1000
data = np.loadtxt("trainingdataset.txt", delimiter=",")
data1 = data[:,2]
data2 = data1.astype(int)
X,Y = data2['521']
but it gets me the following error
Traceback (most recent call last):
File "C:\Users\hadeer.elziaat\Desktop\testspec.py", line 58, in <module>
X,Y = data2['521']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
my dataset
1,4,6,10
2,100,125,10
3,100,7216,254
4,100,527,263
5,100,954,13
6,100,954,23
You're using the string '521' rather than the number 521 for indexing. Try X,Y = data2[521] instead.
If you are only given the string, you could cast it to an int first: X,Y = data2[int('521')]. Note that int() raises a ValueError if the string is not a valid integer, and the index must still be within the bounds of the array.
Next problem: you are unpacking into two variables, X and Y, yet the data2[521] selection provides only a single value (the number in the 3rd column, 522nd row).
You say you want all the data in the 3rd column.
I assume you also want some kind of x-axis, since you are attempting to do X, Y = .... How about using the first column for that? Then your code would be:
import numpy as np
data = np.loadtxt("trainingdataset.txt", delimiter=',', dtype='int')
x = data[:, 0]
y = data[:, 2]
What remains unclear from your question is why you tried to index your data with 521 - which failed because you cannot use strings as indices on plain arrays.
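A self-contained sketch of the points above, using the six rows shown in the question in place of trainingdataset.txt. Reading from an in-memory buffer is an assumption made so the example runs without the file.

```python
import io
import numpy as np

# The sample rows from the question, standing in for trainingdataset.txt
sample = io.StringIO(
    "1,4,6,10\n"
    "2,100,125,10\n"
    "3,100,7216,254\n"
    "4,100,527,263\n"
    "5,100,954,13\n"
    "6,100,954,23\n"
)
data = np.loadtxt(sample, delimiter=",", dtype=int)

x = data[:, 0]   # first column as the x-axis
y = data[:, 2]   # third column as the values

# A string index on a plain array raises IndexError
try:
    y["2"]
except IndexError as exc:
    print("string index rejected:", exc)

# Converting the string to an int first works
print(y[int("2")])   # 7216
```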

Tensorflow Expanding Placeholder Dimensions

I have a 2-d placeholder tensor with dimensions of (2,2). How can I expand the columns (same number dimensions) so that the new tensor is (2,3) and assign a constant value to the new column?
For example, the current data may look like
[[2,2], [2,2]]
And I want to transform through tensorflow to (prepending a constant of 1):
[[1,2,2], [1,2,2]]
You can use the tf.concat() op to concatenate a constant with your placeholder:
placeholder = tf.placeholder(tf.int32, shape=[2, 2])
prefix_column = tf.constant([[1], [1]])
expanded_placeholder = tf.concat([prefix_column, placeholder], axis=1)

Value error while generating indexes using PCA in scikit-learn

Using the following function, I am trying to generate an index from the data:
Function:
import numpy as np
from sklearn.decomposition import PCA
def pca_index(data,components=1,indx=1):
    corrs = np.asarray(data.cov())
    pca = PCA(n_components = components).fit(corrs)
    trns = pca.transform(data)
    index=np.dot(trns[0:indx],pca.explained_variance_ratio_[0:indx])
    return index
Index: generation from principal components
index = pca_index(data=mydata,components=3,indx=2)
The following error is generated when I call the function:
Traceback (most recent call last):
File "<ipython-input-411-35115ef28e61>", line 1, in <module>
index = pca_index(data=mydata,components=3,indx=2)
File "<ipython-input-410-49c0174a047a>", line 15, in pca_index
index=np.dot(trns[0:indx],pca.explained_variance_ratio_[0:indx])
ValueError: shapes (2,3) and (2,) not aligned: 3 (dim 1) != 2 (dim 0)
Can anyone help with the error?
As I understand it, the error occurs at the following point, where I pass the slice bound as a variable (indx):
trns[0:indx],pca.explained_variance_ratio_[0:indx]
In np.dot you are trying to multiply a matrix with dimensions (2,3) by an array with dimensions (2,), i.e. a vector.
However, you can only multiply NxM by MxP, e.g. (3,2) by (2,1) or (2,3) by (3,1).
In your example the second array has dimensions (2,), which in numpy terms is similar to, but not the same as, (2,1). You can reshape a vector into a matrix with vector.reshape([2,1]).
You might also transpose your first matrix, thus converting its dimensions from (2,3) to (3,2).
However, make sure that you multiply the appropriate matrices, as the result may differ from what you expect.
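The shape mismatch can be reproduced without scikit-learn. In the sketch below, `trns` and `ratio` are random or hand-picked stand-ins for pca.transform(data) and pca.explained_variance_ratio_, and the column slice in the second call is only a guess at the intent (weighting the first indx components for every sample); the question does not confirm this.

```python
import numpy as np

rng = np.random.default_rng(0)
trns = rng.normal(size=(10, 3))      # stand-in: 10 samples, 3 components
ratio = np.array([0.6, 0.3, 0.1])    # stand-in for explained_variance_ratio_
indx = 2

# trns[0:indx] slices ROWS, giving shape (2, 3), which does not align with (2,)
try:
    np.dot(trns[0:indx], ratio[0:indx])
except ValueError as exc:
    print(exc)

# Slicing COLUMNS gives (10, 2), which aligns with (2,) and yields one value per sample
index = np.dot(trns[:, 0:indx], ratio[0:indx])
print(index.shape)   # (10,)
```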
