I keep getting "TypeError: only integer scalar arrays can be converted to a scalar index" while using a custom-defined metric in KNeighborsClassifier - python-3.x

I am using a custom-defined metric in scikit-learn's KNeighborsClassifier. Here's my code:
def chi_squared(x,y):
    return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
The function above is my implementation of the chi-squared distance. I used NumPy functions because, according to the scikit-learn docs, the metric function takes two one-dimensional NumPy arrays.
I have passed the chi_squared function as an argument to KNeighborsClassifier().
knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
However, I keep getting the following error:
TypeError Traceback (most recent call last)
<ipython-input-29-d2a365ebb538> in <module>
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
----> 6 knn.fit(X_train, Y_train)
7 predictions = knn.predict(X_test)
8 print(accuracy_score(Y_test, predictions))
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
177 The fitted k-nearest neighbors classifier.
178 """
--> 179 return self._fit(X, y)
180
181 def predict(self, X):
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
497
498 if self._fit_method == 'ball_tree':
--> 499 self._tree = BallTree(X, self.leaf_size,
500 metric=self.effective_metric_,
501 **self.effective_metric_params_)
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.__init__()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree._recursive_build()
sklearn/neighbors/_ball_tree.pyx in sklearn.neighbors._ball_tree.init_node()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.DistanceMetric.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance.dist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance._dist()
<ipython-input-29-d2a365ebb538> in chi_squared(x, y)
1 def chi_squared(x,y):
----> 2 return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
3
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
<__array_function__ internals> in sum(*args, **kwargs)
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
2239 return res
2240
-> 2241 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
2242 initial=initial, where=where)
2243
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 return reduction(axis=axis, out=out, **passkwargs)
86
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: only integer scalar arrays can be converted to a scalar index

I can reproduce your error message with:
In [173]: x=np.arange(3); y=np.array([2,3,4])
In [174]: np.sum(x,y)
Traceback (most recent call last):
File "<ipython-input-174-1a1a267ebd82>", line 1, in <module>
np.sum(x,y)
File "<__array_function__ internals>", line 5, in sum
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2247, in sum
return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: only integer scalar arrays can be converted to a scalar index
Correct use(s) of np.sum:
In [175]: np.sum(x)
Out[175]: 3
In [177]: np.sum(np.arange(6).reshape(2,3), axis=0)
Out[177]: array([3, 5, 7])
In [178]: np.sum(np.arange(6).reshape(2,3), 0)
Out[178]: array([3, 5, 7])
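The second positional argument of np.sum is axis, so np.sum(x,y) tries to interpret the array y as an axis, which raises the error. (Re)read the np.sum docs if necessary!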
Using np.add instead of np.sum:
In [179]: np.add(x,y)
Out[179]: array([2, 4, 6])
In [180]: x+y
Out[180]: array([2, 4, 6])
The following should be equivalent:
np.divide(np.square(np.subtract(x,y)), np.add(x,y))
(x-y)**2/(x+y)
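Note that both expressions above return an array with one value per feature, whereas a callable metric passed to scikit-learn is expected to return a single number for each pair of rows. A minimal sketch of a chi-squared distance that sums the per-feature terms, assuming non-negative inputs such as histograms; the eps guard against division by zero is an assumption added here, not part of the question:

```python
import numpy as np

def chi_squared(x, y, eps=1e-10):
    """Chi-squared distance between two 1-D arrays, returned as a scalar.

    eps is a hypothetical safeguard for bins where x + y == 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Element-wise (x - y)^2 / (x + y), then reduce to one number
    return float(np.sum((x - y) ** 2 / (x + y + eps)))

x = np.arange(3)
y = np.array([2, 3, 4])
d = chi_squared(x, y)
```

A function like this could then be passed as metric=chi_squared, as in the question.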

Related

TypeError: No loop matching the specified signature and casting was found for ufunc add (Python)

I have a strange problem relating to a topic model I am running with BERTopic. The model runs without any errors in Colab and vscode venv. However, when I run the same model in Jupyter Notebook using the same venv as I have in the vscode venv, the model returns an error, half-way through the run.
The error is below:
TypeError Traceback (most recent call last)
<timed exec> in <module>
c:\python\python39\lib\site-packages\bertopic\_bertopic.py in fit_transform(self, documents, embeddings, y)
285 # Reduce dimensionality with UMAP
286 if self.seed_topic_list is not None and self.embedding_model is not None:
--> 287 y, embeddings = self._guided_topic_modeling(embeddings)
288 umap_embeddings = self._reduce_dimensionality(embeddings, y)
289
c:\python\python39\lib\site-packages\bertopic\_bertopic.py in _guided_topic_modeling(self, embeddings)
1424 for seed_topic in range(len(seed_topic_list)):
1425 indices = [index for index, topic in enumerate(y) if topic == seed_topic]
-> 1426 embeddings[indices] = np.average([embeddings[indices], seed_topic_embeddings[seed_topic]], weights=[3, 1])
1427 return y, embeddings
1428
<__array_function__ internals> in average(*args, **kwargs)
c:\python\python39\lib\site-packages\numpy\lib\function_base.py in average(a, axis, weights, returned)
405 wgt = wgt.swapaxes(-1, axis)
406
--> 407 scl = wgt.sum(axis=axis, dtype=result_dtype)
408 if np.any(scl == 0.0):
409 raise ZeroDivisionError(
c:\python\python39\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims, initial, where)
45 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
46 initial=_NoValue, where=True):
---> 47 return umr_sum(a, axis, dtype, out, keepdims, initial, where)
48
49 def _prod(a, axis=None, dtype=None, out=None, keepdims=False,
TypeError: No loop matching the specified signature and casting was found for ufunc add
Not sure what the source of the error could be, since the same code works in Colab and vscode venv. Any pointers in the right direction would be greatly appreciated.

TypeError: 'float' object cannot be interpreted as an integer in stride_tricks.as_strided

I am trying to replicate the code given here:
import numpy as np
n=4
m=5
a = np.arange(1,n*m+1).reshape(n,m)
sz = a.itemsize
h,w = a.shape
bh,bw = 2,2
shape = (h/bh, w/bw, bh, bw)
strides = sz*np.array([w*bh,bw,w,1])
blocks=np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
print(blocks)
I got the following error message. What might be the reason?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-0c3a23be3e7f> in <module>
12
13
---> 14 blocks=np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
15 print(blocks)
~\AppData\Local\Continuum\anaconda3\envs\dropletflow\lib\site-packages\numpy\lib\stride_tricks.py in as_strided(x, shape, strides, subok, writeable)
100 interface['strides'] = tuple(strides)
101
--> 102 array = np.asarray(DummyArray(interface, base=x))
103 # The route via `__interface__` does not preserve structured
104 # dtypes. Since dtype should remain unchanged, we set it explicitly.
~\AppData\Local\Continuum\anaconda3\envs\dropletflow\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
499
500 """
--> 501 return array(a, dtype, copy=False, order=order)
502
503
TypeError: 'float' object cannot be interpreted as an integer
Your shape is (2.0, 2.5, 2, 2), but the shape parameter expects a sequence of integers (see the API documentation for np.lib.stride_tricks.as_strided).
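In Python 3, / is true division and always produces floats, so one way to get an integer shape is floor division (//). The sketch below makes that change and also takes the strides from a.strides instead of computing them from itemsize; that second change is a stylistic choice here, not something the original code requires. Because w = 5 is not a multiple of bw = 2, the last column is left out of the blocks:

```python
import numpy as np

n, m = 4, 5
a = np.arange(1, n * m + 1).reshape(n, m)
h, w = a.shape
bh, bw = 2, 2

# Floor division keeps every entry of the shape an int: (2, 2, 2, 2)
shape = (h // bh, w // bw, bh, bw)

# Byte strides: jump bh rows / bw columns between blocks,
# then one row / one column inside a block
row_stride, col_stride = a.strides
strides = (row_stride * bh, col_stride * bw, row_stride, col_stride)

blocks = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
print(blocks)
```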

Problems with seaborn (ndim)

I am using seaborn to plot a very simple data set. Here is what I do:
import seaborn as sns
import pandas as pd
df = pd.read_excel('myfile.xlsx')
sns.set(style="white")
g = sns.PairGrid(df, diag_sharey=False)
g.map_lower(sns.kdeplot)
g.map_upper(sns.scatterplot)
g.map_diag(sns.kdeplot, lw=3)
I get the following error: AttributeError: 'NoneType' object has no attribute 'ndim'. Weirdly, the plot is only partly drawn (see below).
Any idea why that is the case and what I can do to solve the issue?
EDIT:
The dataframe has the following attributes:
plan_change int64
user_login float64
new_act_ratio float64
on_time int64
Unfortunately, I cannot upload the data set. However, I can say that plotting other seaborn graphs works just fine.
The full error message is the following:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-2dbc61abd2bd> in <module>()
3 g = sns.PairGrid(df, diag_sharey=False)
4 g.map_lower(sns.kdeplot)
----> 5 g.map_upper(sns.scatterplot)
6 g.map_diag(sns.kdeplot, lw=3)
7
/anaconda/lib/python3.5/site-packages/seaborn/axisgrid.py in map_upper(self, func, **kwargs)
1488 color = self.palette[k] if kw_color is None else kw_color
1489 func(data_k[x_var], data_k[y_var], label=label_k,
-> 1490 color=color, **kwargs)
1491
1492 self._clean_axis(ax)
/anaconda/lib/python3.5/site-packages/seaborn/relational.py in scatterplot(x, y, hue, style, size, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, markers, style_order, x_bins, y_bins, units, estimator, ci, n_boot, alpha, x_jitter, y_jitter, legend, ax, **kwargs)
1333 x_bins=x_bins, y_bins=y_bins,
1334 estimator=estimator, ci=ci, n_boot=n_boot,
-> 1335 alpha=alpha, x_jitter=x_jitter, y_jitter=y_jitter, legend=legend,
1336 )
1337
/anaconda/lib/python3.5/site-packages/seaborn/relational.py in __init__(self, x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, x_bins, y_bins, units, estimator, ci, n_boot, alpha, x_jitter, y_jitter, legend)
850
851 plot_data = self.establish_variables(
--> 852 x, y, hue, size, style, units, data
853 )
854
/anaconda/lib/python3.5/site-packages/seaborn/relational.py in establish_variables(self, x, y, hue, size, style, units, data)
155 units=units
156 )
--> 157 plot_data = pd.DataFrame(plot_data)
158
159 # Option 3:
/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
264 dtype=dtype, copy=copy)
265 elif isinstance(data, dict):
--> 266 mgr = self._init_dict(data, index, columns, dtype=dtype)
267 elif isinstance(data, ma.MaskedArray):
268 import numpy.ma.mrecords as mrecords
/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
400 arrays = [data[k] for k in keys]
401
--> 402 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
403
404 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5382
5383 # don't force copy because getting jammed in an ndarray anyway
-> 5384 arrays = _homogenize(arrays, index, dtype)
5385
5386 # from BlockManager perspective
/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in _homogenize(data, index, dtype)
5693 v = lib.fast_multiget(v, oindex.values, default=NA)
5694 v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5695 raise_cast_failure=False)
5696
5697 homogenized.append(v)
/anaconda/lib/python3.5/site-packages/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
2917
2918 # scalar like
-> 2919 if subarr.ndim == 0:
2920 if isinstance(data, list): # pragma: no cover
2921 subarr = np.array(data, dtype=object)
AttributeError: 'NoneType' object has no attribute 'ndim'

Issues storing and extracting arrays in a NumPy file

I am trying to store an array in a NumPy file. However, when I load it back and try to use it, I get an error about setting an array element with a sequence.
These are the two arrays; I am unsure which one is causing the issue.
X = [[1,2,3],[4,5,6],[7,8,9]]
y = [0,1,2,3,4,5,6....]
When I retrieve them and try to use them, I get the values as:
X: array(list[1,2,3],list[4,5,6],list[7,8,9])
y = array([0,1,2,3,4,5...])
Here is the code:
vectors = np.array(X)
labels = np.array(y)
After retrieving them, I run t-SNE:
visualisations = TSNE(n_components=2).fit_transform(X,y)
I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-11-244f99341167> in <module>()
----> 1 visualisations = TSNE(n_components=2).fit_transform(X,y)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\manifold\t_sne.py in fit_transform(self, X, y)
856 Embedding of the training data in low-dimensional space.
857 """
--> 858 embedding = self._fit(X)
859 self.embedding_ = embedding
860 return self.embedding_
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\manifold\t_sne.py in _fit(self, X, skip_num_points)
658 else:
659 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
--> 660 dtype=[np.float32, np.float64])
661 if self.method == 'barnes_hut' and self.n_components > 3:
662 raise ValueError("'n_components' should be inferior to 4 for the "
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: setting an array element with a sequence.
Assuming I understand you correctly, you need to package the first group in a list; something like this:
import numpy as np
#X = [[1,2,3],[4,5,6],[7,8,9]]
#y = [0,1,2,3,4,5,6, 7, 8, 9]
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
y = np.array([0,1,2,3,4,5, 6, 7, 8, 9])
array(list[1,2,3],list[4,5,6],list[7,8,9])
is a 1d object dtype array. To get that from
[[1,2,3],[4,5,6],[7,8,9]]
requires more than np.array([[1,2,3],[4,5,6],[7,8,9]]); either the list elements have to vary in size, or you have to initialize an object array and copy the list values to it.
In any case fit_transform cannot handle that kind of array. It expects a 2d numeric dtype. Notice the parameters to the check_array function.
If all the list elements of X are the same size, then
X = np.stack(X)
should turn it into a 2d numeric array.
I suspect X was that 1d object array type before saving. By itself save/load should not turn a 2d numeric array into an object one.
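To illustrate the difference, here is a small sketch that builds such a 1d object array by hand (the "initialize an object array and copy the list values to it" route) and then converts it with np.stack. The variable names rows and X_obj are made up for this example:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Build a 1d object array whose elements are Python lists
X_obj = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    X_obj[i] = row

print(X_obj.shape, X_obj.dtype)   # 1d, object dtype

# All elements have the same length, so stack yields a 2d numeric array
X = np.stack(X_obj)
print(X.shape, X.dtype)
```

The resulting X is the kind of 2d numeric array that fit_transform expects.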

Reconstruct Image from overlapping patches of image

I have used tf.extract_image_patches() to get a tensor of overlapping patches from the image, as described in this link. The answer in that link suggests using tf.space_to_depth() to reconstruct the image from the patches. However, this does not give the desired result in my case, and after some research I found that tf.space_to_depth() does not handle overlapping blocks. My code looks like this:
import tensorflow as tf
import numpy as np
c = 3
height = 3900
width = 6000
ksizes = [1, 150, 150, 1]
strides = [1, 75, 75, 1]
image = ...  # image of shape [1, height, width, 3]
patches = tf.extract_image_patches(image, ksizes=ksizes, strides=strides, rates=[1, 1, 1, 1], padding='VALID')
patches = tf.reshape(patches, [-1, 150, 150, 3])
reconstructed = tf.reshape(patches, [1, height, width, 3])
rec_new = tf.space_to_depth(reconstructed,75)
rec_new = tf.reshape(rec_new,[height,width,3])
This gives me the following error:
InvalidArgumentError Traceback (most recent call
last)
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\common_shapes.py
in _call_cpp_shape_fn_impl(op, input_tensors_needed,
input_tensors_as_shapes_needed, require_shape_fn)
653 graph_def_version, node_def_str, input_shapes, input_tensors,
--> 654 input_tensors_as_shapes, status)
655 except errors.InvalidArgumentError as err:
D:\AnacondaIDE\lib\contextlib.py in exit(self, type, value,
traceback)
87 try:
---> 88 next(self.gen)
89 except StopIteration:
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\errors_impl.py
in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
InvalidArgumentError: Dimension size must be evenly divisible by 70200000 but is 271957500 for 'Reshape_22' (op: 'Reshape') with input shapes: [4029,150,150,3], [4] and with input tensors computed as partial shapes: input1 = [?,3900,6000,3].
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call
last) in ()
----> 1 reconstructed = tf.reshape(features, [-1, height, width, channel])
2 rec_new = tf.space_to_depth(reconstructed,75)
3 rec_new = tf.reshape(rec_new,[h,h,c])
D:\AnacondaIDE\lib\site-packages\tensorflow\python\ops\gen_array_ops.py
in reshape(tensor, shape, name) 2617 """ 2618 result =
_op_def_lib.apply_op("Reshape", tensor=tensor, shape=shape,
-> 2619 name=name) 2620 return result 2621
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\op_def_library.py
in apply_op(self, op_type_name, name, **keywords)
765 op = g.create_op(op_type_name, inputs, output_types, name=scope,
766 input_types=input_types, attrs=attr_protos,
--> 767 op_def=op_def)
768 if output_structure:
769 outputs = op.outputs
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\ops.py in
create_op(self, op_type, inputs, dtypes, input_types, name, attrs,
op_def, compute_shapes, compute_device) 2630
original_op=self._default_original_op, op_def=op_def) 2631 if
compute_shapes:
-> 2632 set_shapes_for_outputs(ret) 2633 self._add_op(ret) 2634
self._record_op_seen_by_control_dependencies(ret)
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\ops.py in
set_shapes_for_outputs(op) 1909 shape_func =
_call_cpp_shape_fn_and_require_op 1910
-> 1911 shapes = shape_func(op) 1912 if shapes is None: 1913 raise RuntimeError(
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\ops.py in
call_with_requiring(op) 1859 1860 def
call_with_requiring(op):
-> 1861 return call_cpp_shape_fn(op, require_shape_fn=True) 1862 1863 _call_cpp_shape_fn_and_require_op =
call_with_requiring
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\common_shapes.py
in call_cpp_shape_fn(op, require_shape_fn)
593 res = _call_cpp_shape_fn_impl(op, input_tensors_needed,
594 input_tensors_as_shapes_needed,
--> 595 require_shape_fn)
596 if not isinstance(res, dict):
597 # Handles the case where _call_cpp_shape_fn_impl calls unknown_shape(op).
D:\AnacondaIDE\lib\site-packages\tensorflow\python\framework\common_shapes.py
in _call_cpp_shape_fn_impl(op, input_tensors_needed,
input_tensors_as_shapes_needed, require_shape_fn)
657 missing_shape_fn = True
658 else:
--> 659 raise ValueError(err.message)
660
661 if missing_shape_fn:
ValueError: Dimension size must be evenly divisible by 70200000 but is 271957500 for 'Reshape_22' (op: 'Reshape') with input shapes: [4029,150,150,3], [4] and with input tensors computed as partial shapes: input1 = [?,3900,6000,3].
I know this error is due to incompatible dimensions, but shouldn't the dimensions match? Please help me solve this.
I guess the problem is that, in the link you posted, the author uses the same value for strides and ksizes, while you are using strides equal to one half of ksizes. This is why the dimensions do not match. You need to write the logic that reduces the size of the patches before gluing them together (for instance, by selecting the central square of each patch).
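The numbers in the error message are consistent with this: a 3900 x 6000 image with 150 x 150 patches and a stride of 75 gives 51 x 79 = 4029 patches, which hold 4029 * 150 * 150 * 3 = 271957500 values, while the image itself has only 3900 * 6000 * 3 = 70200000. A minimal sketch of the "central square" idea, written in plain NumPy rather than TensorFlow and on a tiny image so it is easy to check; the helper names extract_patches and reconstruct_from_centers are hypothetical. It assumes k - s is even and recovers only the interior of the image, since the outer border of width (k - s) // 2 is not covered by any central square:

```python
import numpy as np

def extract_patches(image, k, s):
    """Overlapping k x k patches with stride s ('VALID'-style, no padding)."""
    H, W = image.shape[:2]
    return np.array([[image[i:i + k, j:j + k]
                      for j in range(0, W - k + 1, s)]
                     for i in range(0, H - k + 1, s)])

def reconstruct_from_centers(patches, k, s):
    """Glue together the central s x s square of every patch."""
    off = (k - s) // 2
    n_r, n_c = patches.shape[:2]
    centers = patches[:, :, off:off + s, off:off + s]  # (n_r, n_c, s, s, C)
    # Interleave block rows/columns with in-block rows/columns
    return centers.transpose(0, 2, 1, 3, 4).reshape(n_r * s, n_c * s, -1)

# Tiny stand-in for the real image: 8 x 12 pixels, 3 channels
image = np.arange(8 * 12 * 3).reshape(8, 12, 3)
k, s = 4, 2                      # stride is half the patch size, as in the question
patches = extract_patches(image, k, s)
rec = reconstruct_from_centers(patches, k, s)

off = (k - s) // 2
print(np.array_equal(rec, image[off:off + rec.shape[0], off:off + rec.shape[1]]))
```

An alternative would be to add all patches back at their positions and divide by the per-pixel overlap count, which also recovers the border.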

Resources