numpy ndarray dtype conversion failed - python-3.x

I have a piece of code that does some ndarray transformations, and I'd like to convert the final output to np.uint8 and write it to a file. However, the conversion did not work. Here is the piece of code:
print("origin dtype:", image[0].dtype)
print(type(image[0]))
image[0] = image[0].astype(np.uint8)
print(image[0])
print("image datatype1:",image[0].dtype)
image[0].tofile(f'{image_name}_{org_h}_{org_w}_{dst_h}_{dst_w}.bin')
print("image datatype2:",image[0].dtype)
Here is what I got:
origin dtype: float32
<class 'numpy.ndarray'>
[[[ 71. 73. 73. ... 167. 170. 173.]
[ 62. 63. 64. ... 164. 168. 170.]
[ 54. 56. 57. ... 157. 163. 165.]
...
[142. 154. 138. ... 115. 91. 111.]
[158. 127. 123. ... 128. 130. 113.]
[133. 114. 106. ... 114. 110. 106.]]]
image datatype1: float32
image datatype2: float32
Can somebody help me figure out where this went wrong?

Rows of a 2D array cannot have different dtypes: when you assign a uint8 array to a row of a float32 array, it is cast back to float32. For example:
image = np.ones((4, 4), dtype='float32')
print(image[0].dtype)
# float32
image[0] = image[0].astype('uint8')
print(image[0].dtype)
# float32
Your options are either to convert the dtype of the entire array at once:
image = image.astype('uint8')
print(image[0].dtype)
# uint8
Or to convert your 2D array to a list of 1D arrays, each of which can then have its own dtype:
image = list(image)
print(image[0].dtype)
# float32
image[0] = image[0].astype('uint8')
print(image[0].dtype)
# uint8
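Applied back to the original snippet, a minimal sketch (using a stand-in array, since the original image data isn't shown) that converts the whole array before writing:
import numpy as np

# stand-in for the question's float32 image data
image = (np.random.rand(1, 224, 224) * 255).astype(np.float32)

# tofile() writes raw bytes in the array's current dtype, so convert first
image = image.astype(np.uint8)
image.tofile('image_out.bin')
print(image.dtype)  # uint8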

Related

How to set up the interpolation problem using ndimage.map_coordinates?

According to the documentation of scipy.ndimage.map_coordinates:

The array of coordinates is used to find, for each point in the output, the corresponding coordinates in the input. The value of the input at those coordinates is determined by spline interpolation of the requested order.

The shape of the output is derived from that of the coordinate array by dropping the first axis. The values of the array along the first axis are the coordinates in the input array at which the output value is found.
I have a discrete 3D function defined on a 3D grid (t, x, y); on every point of this grid, the function has a unique value unless its value is zero.
I have another set of arrays in the form of a pandas dataframe with three columns, t_new, x_new, and y_new.
I would like to use scipy.ndimage.map_coordinates to interpolate the function in order to calculate its value on the new dataset presented in the said dataframe.
Since I am getting the following error message, I am sure I am not setting up map_coordinates correctly:
File "D:\Users\username\Anaconda3\lib\site-packages\scipy\ndimage\interpolation.py", line 437, in map_coordinates
raise RuntimeError('invalid shape for coordinate array')
Here is my definition of the interpolation function:
from scipy.ndimage import map_coordinates

def interpolator_3d(df, func_values):
    # The coordinates at which input is evaluated
    coordinates = df[['t_new', 'x_new', 'y_new']].values.T  # (3, 1273)
    # list of input array [[t0, x0, y0, value0], [t1, x1, y1, value1], ...]
    input_arr = func_values  # (1780020000, 4)
    return map_coordinates(input_arr, coordinates)
There are at least two issues with how you are using map_coordinates. Keep in mind that this function was designed for image resampling.
If you have a 3D function, the array input_arr should be 3-dimensional. map_coordinates will use its indices as the t, x and y coordinates, and the value v of the function has to be stored at each respective position. If your original function has another base grid, then you have to normalize everything to the array's indices before and after. This requires an equidistant grid as input.
The coordinates have to be an array, e.g. of the form [[t_new_0, t_new_1, ...], [x_new_0, x_new_1, ...], [y_new_0, y_new_1, ...]]. The result will be a list of interpolated samples [v_new_0, v_new_1, ...]. Generally, if input_arr is n-dimensional, coordinates has to be a list containing n arrays of the same shape S, and the result will be an array of shape S.
Example with n=3 dimensions and 5 samples to interpolate in a 1-dimensional shape:
import numpy as np
from scipy import ndimage
a = np.arange(64.).reshape((4, 4, 4))
print(a)
out = ndimage.map_coordinates(a, [
    [0.5, 1.0, 1.5, 2.0, 2.5],  # t coordinates
    [0.1, 0.2, 0.3, 0.4, 0.5],  # x coordinates
    [2.0, 1.9, 1.8, 1.7, 1.6],  # y coordinates
])
print(out)
Output:
[[[ 0. 1. 2. 3.]
[ 4. 5. 6. 7.]
[ 8. 9. 10. 11.]
[12. 13. 14. 15.]]
[[16. 17. 18. 19.]
[20. 21. 22. 23.]
[24. 25. 26. 27.]
[28. 29. 30. 31.]]
[[32. 33. 34. 35.]
[36. 37. 38. 39.]
[40. 41. 42. 43.]
[44. 45. 46. 47.]]
[[48. 49. 50. 51.]
[52. 53. 54. 55.]
[56. 57. 58. 59.]
[60. 61. 62. 63.]]]
[ 7.6688, 18.148 , 26.3424, 34.6304, 45.3904]
Update:
That means that if your input array has the form [[t0, x0, y0, value0], [t1, x1, y1, value1], ...] with length 1780020000 = 19778 * 500 * 180, it has to be transformed into an array of shape (19778, 500, 180):
t_max, x_max, y_max, _ = np.max(func_values, axis=0).astype(int) + 1  # 19778, 500, 180
input_arr = np.zeros((t_max, x_max, y_max), dtype=float)
for t, x, y, v in func_values:
    input_arr[int(t), int(x), int(y)] = v
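If func_values is a single (N, 4) NumPy array, the fill can also be vectorized, which matters at this scale; a minimal sketch under that assumption:
# vectorized alternative to the loop above; a Python-level loop over
# ~1.78e9 rows would be extremely slow
idx = func_values[:, :3].astype(int)  # integer (t, x, y) grid positions
input_arr = np.zeros((t_max, x_max, y_max), dtype=float)
input_arr[idx[:, 0], idx[:, 1], idx[:, 2]] = func_values[:, 3]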

AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

I have a classification problem where I have the pixel values of an 8x8 image and the number the image represents, and my task is to predict the number (the 'Number' attribute) based on the pixel values using RandomForestClassifier. The number can be 0-9.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")
Here it throws an AxisError.
Traceback (most recent call last):
File "dap_hazi_4.py", line 44, in
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
multi_class, average, sample_weight)
File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
if not np.allclose(1, y_score.sum(axis=1)):
File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
AxisError: axis 1 is out of bounds for array of dimension 1
The error is due to the multi-class problem you are solving, as others have suggested. All you need to do is predict the probabilities instead of the class. I had this same problem before, and doing this solved it.
Here is how to do it:
# you might be predicting the class this way
pred = clf.predict(X_valid)
# change it to predict the probabilities which solves the AxisError problem.
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
0.8164900342274142
# shape before
pred.shape
(256,)
pred[:5]
array([1, 2, 1, 1, 2])
# shape after
pred_prob.shape
(256, 3)
pred_prob[:5]
array([[0.  , 1.  , 0.  ],
       [0.02, 0.12, 0.86],
       [0.  , 0.97, 0.03],
       [0.  , 0.8 , 0.2 ],
       [0.  , 0.42, 0.58]])
Actually, as your problem is multi-class, the labels must be one-hot encoded. When the labels are one-hot encoded, the multi_class argument works; providing one-hot encoded labels resolves the error.
Suppose you have 100 test samples with 5 unique classes; then your test-label matrix must have shape (100, 5), NOT (100, 1).
Are you sure the [:,1] in test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1] is right? It is probably a 1D array.
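Building on that comment, a minimal sketch of the fix (variable names taken from the question): pass the full probability matrix instead of slicing out a single column.
# keep all class probabilities: shape (n_samples, n_classes), here (n, 10)
proba = forest_model.predict_proba(test_df[input_var])
roc_auc_score(test_df['Number'], proba, average='macro', multi_class='ovr')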

In Python and Scipy.sparse, how to assign the value of a matrix to a block of another larger matrix with a loop?

This is for Python 3.x, or more specifically scipy.sparse. I wish to write:
import numpy as np, scipy.sparse as sp
A = sp.csc_matrix((4,3))
B = sp.csc_matrix(np.random.rand(2,1))
A[2:-1,0] = B
The last line does not work. I intend it to insert the matrix B into A as the block from row 3 to row 4 in column 0. What is the correct way to achieve this assignment without a loop?
The setup
In [219]: from scipy import sparse
In [220]: A = sparse.csr_matrix((4,3))
In [221]: A
Out[221]:
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>
In [222]: B = sparse.csr_matrix(np.random.rand(2,1))
In [223]: B
Out[223]:
<2x1 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [224]: B.A
Out[224]:
array([[0.04427272],
       [0.03421125]])
Your attempt, WITH ERROR
In [225]: A[2:-1, 0] = B
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-225-0dcce4b72635> in <module>
----> 1 A[2:-1, 0] = B
/usr/local/lib/python3.6/dist-packages/scipy/sparse/_index.py in __setitem__(self, key, x)
111 if not ((broadcast_row or x.shape[0] == i.shape[0]) and
112 (broadcast_col or x.shape[1] == i.shape[1])):
--> 113 raise ValueError('shape mismatch in assignment')
114 if x.size == 0:
115 return
ValueError: shape mismatch in assignment
So let's focus on shapes:
In [226]: A[2:-1, 0].shape
Out[226]: (1, 1)
In [227]: B.shape
Out[227]: (2, 1)
Well, duh! We can't put a (2,1) into a (1,1) slot, even with dense arrays.
If we drop the -1, we get a 2 element slot:
In [230]: A.A[2:,0].shape
Out[230]: (2,)
Now the assignment works - with a warning.
In [231]: A[2:, 0] = B
/usr/local/lib/python3.6/dist-packages/scipy/sparse/_index.py:118: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray_sparse(i, j, x)
In [232]: A
Out[232]:
<4x3 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
In [233]: A.A
Out[233]:
array([[0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.04427272, 0.        , 0.        ],
       [0.03421125, 0.        , 0.        ]])
As a general rule, we don't build sparse matrices by assigning values to an existing 'empty' one. We build a csr matrix from coo-style inputs: the row, col, and data arrays.
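For example, a minimal sketch of that coo-style construction for the 4x3 case above:
import numpy as np
from scipy import sparse

# build the matrix in one shot from row indices, column indices, and values,
# instead of assigning into an existing sparse matrix
rows = np.array([2, 3])
cols = np.array([0, 0])
data = np.random.rand(2)
A = sparse.csr_matrix((data, (rows, cols)), shape=(4, 3))
print(A.A)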

opencv python3 merge acting weird

I have been trying for hours to find an answer for this strange behavior of cv2.merge().
In short, I'm merging 3 single-channel uint8 images, each of size 960x1280, and get a merged image of shape 960x1280x3, but each channel appears to be 1280x3 instead of 960x1280.
As a result, I can't plot it.
I'm loading each image using:
img = cv2.imread(file).astype(np.uint8)
if len(img.shape) > 2: img = img[:,:,1]
Here is the code for merging (with additional information):
alg = (img1,img2,img3)
print('type: ',type(alg[0]),type(alg[1]),type(alg[2]))
print('dtype: ',alg[0].dtype, alg[1].dtype, alg[2].dtype)
print('shape: ',alg[0].shape, alg[1].shape, alg[2].shape)
PseudoRGB = cv2.merge(alg)
print('\nmerged type: ',type(PseudoRGB))
print('merged dtype: ',PseudoRGB.dtype)
print('merged shape: ',PseudoRGB.shape)
print('merged shape, each channel: ',PseudoRGB[0].shape, PseudoRGB[1].shape, PseudoRGB[2].shape)
That gives me:
type: <class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>
dtype: uint8 uint8 uint8
shape: (960, 1280) (960, 1280) (960, 1280)
merged type: <class 'numpy.ndarray'>
merged dtype: uint8
merged shape: (960, 1280, 3)
merged shape, each channel: (1280, 3) (1280, 3) (1280, 3)
Any help is much appreciated.
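For reference, a minimal sketch of what is likely happening: PseudoRGB[0] indexes axis 0 and returns the first row, of shape (1280, 3); the channels of an HxWx3 array live on the last axis.
import numpy as np

# a single integer index selects along axis 0 (a row), not a channel
PseudoRGB = np.zeros((960, 1280, 3), dtype=np.uint8)
print(PseudoRGB[0].shape)        # (1280, 3)  -- the shape seen in the question
print(PseudoRGB[:, :, 0].shape)  # (960, 1280) -- an actual channel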

Tree classifier to graphviz ERROR

I made a tree classifier named model and tried to use the export_graphviz function like this:
export_graphviz(decision_tree=model,
                out_file='NT_model.dot',
                feature_names=X_train.columns,
                class_names=model.classes_,
                leaves_parallel=True,
                filled=True,
                rotate=False,
                rounded=True)
For some reason my run raised this exception:
TypeError                                 Traceback (most recent call last)
<ipython-input-298-40fe56bb0c85> in <module>()
      6     filled=True,
      7     rotate=False,
----> 8     rounded=True)

C:\Users\yonatanv\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\tree\export.py in export_graphviz(decision_tree, out_file, max_depth, feature_names, class_names, label, filled, leaves_parallel, impurity, node_ids, proportion, rotate, rounded, special_characters)
    431         recurse(decision_tree, 0, criterion="impurity")
    432     else:
--> 433         recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)
    434
    435     # If required, draw leaf nodes at same depth as each other

C:\Users\yonatanv\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\tree\export.py in recurse(tree, node_id, criterion, parent, depth)
    319         out_file.write('%d [label=%s'
    320                        % (node_id,
--> 321                           node_to_str(tree, node_id, criterion)))
    322
    323         if filled:

C:\Users\yonatanv\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\tree\export.py in node_to_str(tree, node_id, criterion)
    289                 np.argmax(value),
    290                 characters[2])
--> 291         node_string += class_name
    292
    293     # Clean up any trailing newlines

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U90') dtype('<U90') dtype('<U90')
My hyperparameters for the visualization are these:
print(model)
DecisionTreeClassifier(class_weight={1.0: 10, 0.0: 1}, criterion='gini',
            max_depth=7, max_features=None, max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=50,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=0, splitter='best')
print(model.classes_)
[ 0. , 1. ]
Help would be most appreciated!
As specified in the documentation of export_graphviz, the class_names parameter works for strings, not float or int:
class_names : list of strings, bool or None, optional (default=None)
Try converting model.classes_ to a list of strings before passing it to export_graphviz.
Try class_names=['0', '1'] or class_names=['0.0', '1.0'] in the call to export_graphviz().
For a more general solution, use:
class_names=[str(x) for x in model.classes_]
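Putting that together, a sketch of the corrected call (model and X_train as in the question):
export_graphviz(decision_tree=model,
                out_file='NT_model.dot',
                feature_names=X_train.columns,
                class_names=[str(x) for x in model.classes_],  # stringified labels
                leaves_parallel=True,
                filled=True,
                rotate=False,
                rounded=True)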
But is there a specific reason you are passing float values as y in model.fit()? That is usually not required in a classification task. Do you have actual y labels like this, or are you converting string labels to numeric before fitting the model?
