Below is the function I want to implement in Python. I get a TypeError when defining it. I tried defining it both with numpy.piecewise and as a plain function with if/elif branches. I want to be able to evaluate this function at different points, and also to evaluate expressions such as f(x-1).
This is my code:
from numpy import piecewise
from scipy import *
from sympy.abc import x
from sympy.utilities.lambdify import lambdify, implemented_function
from sympy import Function
from sympy import *
h = 0.5
a = -1
n = 2
x = Symbol('x')
expr = piecewise((0, x-a <= -2*h), ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h), (2*h**3/3-0.5*(x-a)**2*(2*h+(x-a)), -h<= x-a<= 0), (2*(h**3/3)-0.5*(x-a)**2*(2*h+(x-a)), 0<=x-a<=2*h), ((1/6)*(2*h-(x-a))**3, h<=x-a<=2*h), (0, x-a<=2*h))
p = lambdify((x, a,b,h), expr)
def basis(x,a,b, h):
    if x <= a-2*h:
        return 0;
    elif (x<=a-h) or (x >=2*h):
        return (1/6)*(2*h+(x-a))**3
    elif (x-a<= 0) or (x-a >= -h):
        return (2*h**3/3-0.5*(x-a)**2*(2*h+(x-a)));
    elif (x<=2*h+a) or (x >= 0):
        return (2*(h**3/3)-0.5*(x-a)**2*(2*h+(x-a)));
    elif (x<=a+2*h) or (x >= h):
        return (1/6)*(2*h-(x-a))**3;
    elif x-a<=2*h:
        return 0
basis(x, -1,0.5,0)
Both ways I get this :
raise TypeError("cannot determine truth value of Relational")
TypeError: cannot determine truth value of Relational
You can use sympy's lambdify function to generate the numpy piecewise function. This is a simpler example but shows the general idea:
In [15]: from sympy import symbols, Piecewise
In [16]: x, a = symbols('x, a')
In [17]: expr = Piecewise((x, x>a), (0, True))
In [18]: expr
Out[18]:
⎧x for a < x
⎨
⎩0 otherwise
In [19]: from sympy import lambdify
In [20]: fun = lambdify((x, a), expr)
In [21]: fun([1, 3], [4, 2])
Out[21]: array([0., 3.])
In [22]: import inspect
In [23]: print(inspect.getsource(fun))
def _lambdifygenerated(x, a):
    return (select([less(a, x),True], [x,0], default=nan))
Sorry about the length of this answer, but I think you need to see the full debugging process. I had to look at the tracebacks and test small pieces of your code to identify the exact problem. I've seen a lot of the numpy ambiguity error, but not this sympy relational error.
===
Let's look at the whole traceback, not just one line of it. At the very least we need to identify which line of your code is producing the problem.
In [4]: expr = np.piecewise((0, x-a <= -2*h), ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<
...: =-h), (2*h**3/3-0.5*(x-a)**2*(2*h+(x-a)), -h<= x-a<= 0), (2*(h**3/3)-0.5
...: *(x-a)**2*(2*h+(x-a)), 0<=x-a<=2*h), ((1/6)*(2*h-(x-a))**3, h<=x-a<=2*h)
...: , (0, x-a<=2*h))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-893bb4b36321> in <module>
----> 1 expr = np.piecewise((0, x-a <= -2*h), ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h), (2*h**3/3-0.5*(x-a)**2*(2*h+(x-a)), -h<= x-a<= 0), (2*(h**3/3)-0.5*(x-a)**2*(2*h+(x-a)), 0<=x-a<=2*h), ((1/6)*(2*h-(x-a))**3, h<=x-a<=2*h), (0, x-a<=2*h))
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
Although np.piecewise is a numpy function, x is a sympy.Symbol, so the arguments are sympy expressions. numpy and sympy are not well integrated: some things work, many others don't.
Did you try a small expression? Good programming practice is to start with small pieces, making sure those work first.
Let's try something smaller:
In [8]: expr = np.piecewise((0, x-a <= -2*h),
...: ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-37ff62e49efb> in <module>
1 expr = np.piecewise((0, x-a <= -2*h),
----> 2 ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h))
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
and smaller pieces:
In [10]: (0, x-a <= -2*h)
Out[10]: (0, x + 1 ≤ -1.0)
In [11]: ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-7bd9f95d077d> in <module>
----> 1 ((1/6)*(2*h+(x-a))**3, -2*h<=x-a<=-h)
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
In [12]: (1/6)*(2*h+(x-a))**3
Out[12]:
3
1.33333333333333⋅(0.5⋅x + 1)
But:
In [13]: -2*h<=x-a<=-h
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-5ffb419cd443> in <module>
----> 1 -2*h<=x-a<=-h
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
Simplify further:
In [14]: 0 < x < 3
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-59ba4ce00627> in <module>
----> 1 0 < x < 3
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
While a < b < c is allowed for regular Python variables and scalars, it does not work for numpy arrays, and evidently doesn't work for sympy variables either.
So the immediate problem has nothing to do with numpy. You are using invalid sympy expressions!
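As a minimal sketch of the fix (cond, inside and outside are names chosen only for illustration): sympy conditions are combined with And, or equivalently with the & operator, instead of being chained. Chaining fails because Python has to call bool() on the first relational.

```python
from sympy import And, Symbol

x = Symbol("x")

# Combine the two one-sided tests explicitly instead of writing 0 < x < 3.
cond = And(0 < x, x < 3)          # same as (0 < x) & (x < 3)

# Once a number is substituted, the condition collapses to sympy's true/false.
inside = cond.subs(x, 1)
outside = cond.subs(x, 5)
```

The parentheses matter in the operator form, since & binds more tightly than the comparisons.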
===
Your basis function reveals an aspect of the same problem. Again we need to look at the FULL traceback, and then test portions to identify the exact problem expression.
In [16]: basis(x, -1,0.5,0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-b328f95b3c79> in <module>
----> 1 basis(x, -1,0.5,0)
<ipython-input-15-c6436540e3f3> in basis(x, a, b, h)
1 def basis(x,a,b, h):
----> 2 if x <= a-2*h:
3 return 0;
4 elif (x<=a-h) or (x >=2*h):
5 return (1/6)*(2*h+(x-a))**3
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
This expression is a sympy relational:
In [17]: x <= -1
Out[17]: x ≤ -1
But we can't use such a relational in a Python if statement.
In [18]: if x <= -1: pass
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-b56148a48367> in <module>
----> 1 if x <= -1: pass
/usr/local/lib/python3.8/dist-packages/sympy/core/relational.py in __nonzero__(self)
382
383 def __nonzero__(self):
--> 384 raise TypeError("cannot determine truth value of Relational")
385
386 __bool__ = __nonzero__
TypeError: cannot determine truth value of Relational
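A Python if is a simple True/False switch; its condition must evaluate to one or the other. The error is telling us that a sympy.Relational cannot be reduced to a single True or False. The chained comparison 0 < x < 1 is a variation on the same problem: Python evaluates it as (0 < x) and (x < 1), and the and operator needs the truth value of the first relational.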
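A short sketch of the distinction, assuming x is a plain sympy Symbol (the names rel, numeric and branch are illustrative): the relational has no truth value while x is symbolic, but once a number is substituted it collapses to sympy's true or false, which an if can use.

```python
from sympy import Symbol

x = Symbol("x")
rel = x <= -1                 # a sympy Relational, not a Python bool

try:
    bool(rel)                 # this is what `if rel:` does behind the scenes
    symbolic_ok = True
except TypeError:
    symbolic_ok = False       # "cannot determine truth value of Relational"

numeric = rel.subs(x, -2)     # -2 <= -1, so this becomes sympy's `true`
branch = "taken" if numeric else "skipped"
```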
A variation on this that we often see in numpy (and pandas) is:
In [20]: 0 < np.array([0,1,2])
Out[20]: array([False, True, True])
In [21]: 0 < np.array([0,1,2])<1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-bc1039cec1fc> in <module>
----> 1 0 < np.array([0,1,2])<1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The numpy expression has multiple True/False values, and can't be used in a Python expression that requires a single True/False.
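For arrays, the usual way around this is an element-wise combination of the two one-sided tests. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([0, 1, 2])

# Element-wise AND of two boolean arrays; the parentheses are required
# because & binds more tightly than the comparison operators.
mask = (0 < arr) & (arr < 2)

# np.logical_and is the function form of the same operation.
same = np.logical_and(0 < arr, arr < 2)

selected = arr[mask]          # keep only the elements inside the interval
```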
edit
Correctly expanding the two sided tests:
In [23]: expr = np.piecewise((0, x-a <= -2*h),
...: ((1/6)*(2*h+(x-a))**3, (-2*h<=x-a)&(x-a<=-h)),
...: (2*h**3/3-0.5*(x-a)**2*(2*h+(x-a)), (-h<= x-a)&(x-a<= 0)),
...: (2*(h**3/3)-0.5*(x-a)**2*(2*h+(x-a)), (0<=x-a)&(x-a<=2*h)),
...: ((1/6)*(2*h-(x-a))**3, (h<=x-a)&(x-a<=2*h)), (0, x-a<=2*h))
In [24]: expr
Out[24]:
array([-0.5*(x + 1)**2*(x + 2.0) + 0.0833333333333333,
-0.5*(x + 1)**2*(x + 2.0) + 0.0833333333333333], dtype=object)
In [26]: p = lambdify((x,), expr)
x is the only sympy symbol in expr.
Looking at the resulting function:
In [27]: print(p.__doc__)
Created with lambdify. Signature:
func(x)
Expression:
[-0.5*(x + 1)**2*(x + 2.0) + 0.0833333333333333 -0.5*(x + 1)**2*(x + 2.0)...
Source code:
def _lambdifygenerated(x):
    return ([-0.5*(x + 1)**2*(x + 2.0) + 0.0833333333333333, -0.5*(x + 1)**2*(x + 2.0) + 0.0833333333333333])
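For comparison, here is a sketch that builds the function with sympy.Piecewise and then lambdifies it. It rests on assumptions that are not in the question: the pieces are taken to be the standard cubic B-spline, so the branch for 0 <= x-a <= h uses (2*h - (x-a)) and the last branch is simply "otherwise", and h is written as Rational(1, 2) to keep the arithmetic exact. Piecewise takes the first condition that holds, so each branch only needs its upper bound.

```python
import numpy as np
from sympy import Piecewise, Rational, Symbol, lambdify

x = Symbol("x", real=True)
h = Rational(1, 2)
a = -1
t = x - a

# Branches are tried in order, so earlier conditions act as the lower bounds.
expr = Piecewise(
    (0, t <= -2 * h),
    (Rational(1, 6) * (2 * h + t) ** 3, t <= -h),
    (2 * h**3 / 3 - Rational(1, 2) * t**2 * (2 * h + t), t <= 0),
    (2 * h**3 / 3 - Rational(1, 2) * t**2 * (2 * h - t), t <= h),   # assumed form
    (Rational(1, 6) * (2 * h - t) ** 3, t <= 2 * h),
    (0, True),
)

f = lambdify(x, expr, "numpy")            # numeric version for arrays
values = f(np.array([-3.0, -1.0, 3.0]))

shifted = expr.subs(x, x - 1)             # symbolic f(x - 1)
```

With this form, expr.subs(x, value) gives an exact value at a point, and f handles whole arrays at once.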
I want to use curve_fit for functions that involve case-splitting.
However, Python raises an error.
Does curve_fit not support such functions, or is there a problem with my function definition?
Example)
from scipy.optimize import curve_fit
import numpy as np
def slope_devided_by_cases(x,a,b):
    if x < 4:
        return a*x + b
    else:
        return 4*a + b
data_x = [1,2,3,4,5,6,7,8,9] # x
data_y = [45,46,42,36,27,23,21,13,11] # y
coef, cov = curve_fit(slope_devided_by_cases, data_x, data_y)
Error)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Users\Lisa~1\AppData\Local\Temp/ipykernel_1516/1012358816.py in <module>
10 data_x = [1,2,3,4,5,6,7,8,9] # x
11 data_y = [45,46,42,36,27,23,21,13,11] # y
---> 12 coef, cov = curve_fit(slope_devided_by_cases, data_x, data_y)
~\anaconda3\lib\site-packages\scipy\optimize\minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
787 # Remove full_output from kwargs, otherwise we're passing it in twice.
788 return_full = kwargs.pop('full_output', False)
--> 789 res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
790 popt, pcov, infodict, errmsg, ier = res
791 ysize = len(infodict['fvec'])
~\anaconda3\lib\site-packages\scipy\optimize\minpack.py in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
408 if not isinstance(args, tuple):
409 args = (args,)
--> 410 shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
411 m = shape[0]
412
~\anaconda3\lib\site-packages\scipy\optimize\minpack.py in _check_func(checker, argname, thefunc, x0, args, numinputs, output_shape)
22 def _check_func(checker, argname, thefunc, x0, args, numinputs,
23 output_shape=None):
---> 24 res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
25 if (output_shape is not None) and (shape(res) != output_shape):
26 if (output_shape[0] != 1):
~\anaconda3\lib\site-packages\scipy\optimize\minpack.py in func_wrapped(params)
483 if transform is None:
484 def func_wrapped(params):
--> 485 return func(xdata, *params) - ydata
486 elif transform.ndim == 1:
487 def func_wrapped(params):
C:\Users\Lisa~1\AppData\Local\Temp/ipykernel_1516/1012358816.py in slope_devided_by_cases(x, a, b)
3
4 def slope_devided_by_cases(x,a,b):
----> 5 if x < 4:
6 return a*x + b
7 else:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I want to use curve_fit for functions that involve case-splitting such as above example.
The problem is that x < 4 is not a boolean scalar value because curve_fit will evaluate your function with an np.ndarray x (your given x data points), not a scalar value. Consequently, x < 4 will give you an array of boolean values.
That said, you could rewrite your function by using NumPy's vectorized operations:
def slope_devided_by_cases(x,a,b):
    return (x < 4) * (a*x + b) + (x >= 4) * (4*a+b)
Alternatively, you could use np.where as a vectorized alternative to your if-else approach:
def slope_devided_by_cases(x,a,b):
    return np.where(x < 4, a*x + b, 4*a + b)
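The two vectorized forms can be checked against each other on a small array. This is only a sketch: by_masks and by_where are illustrative names, and the constant branch is taken to be 4*a + b, as in the original if/else.

```python
import numpy as np

def by_masks(x, a, b):
    # Boolean masks act as 0/1 weights that switch each branch on or off.
    x = np.asarray(x, dtype=float)
    return (x < 4) * (a * x + b) + (x >= 4) * (4 * a + b)

def by_where(x, a, b):
    # np.where picks, element by element, from the second or third argument.
    x = np.asarray(x, dtype=float)
    return np.where(x < 4, a * x + b, 4 * a + b)

xs = np.arange(1, 10)
masked = by_masks(xs, 2.0, 1.0)
picked = by_where(xs, 2.0, 1.0)
```

One difference worth knowing: in the mask form an inf or nan produced by the inactive branch still leaks into the result (0 * inf is nan), whereas np.where discards it.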
Another interesting approach could be using the piecewise function from numpy.
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
def f(x, a, b):
    return np.piecewise(
        x, [x < 4, x >= 4], [lambda x_: a * x_ + b, lambda x_: 4 * a + b]
    )
data_x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
data_y = np.array([45, 46, 42, 36, 27, 23, 21, 13, 11])
coeff, cov = curve_fit(f, data_x, data_y)
y_fit = f(data_x, *coeff)
plt.plot(data_x, data_y, "o")
plt.plot(data_x, y_fit, "-")
plt.show()
Here is the result of the optimization (maybe a better model could be chosen, but I don't know all the details of the problem at hand and I didn't even specify any initial value since this question is more about making the code work).
I'm trying to find the cosine similarity between two sets of documents in Python 3.x, so I wrote the following code:
count_vectorizer = CountVectorizer(stop_words=stopwords)
sparse_matrix = count_vectorizer.fit_transform(formatted0)
doc_term_matrix = sparse_matrix.todense()
sparse_matrix = count_vectorizer.fit_transform(formatted)
doc_term_matrix1 = sparse_matrix.todense()
z=cosine_similarity(doc_term_matrix,doc_term_matrix1)
The length of doc_term_matrix is 29982 and that of doc_term_matrix1 is 346, but I get the following warnings and error:
/opt/conda/lib/python3.9/site-packages/sklearn/utils/validation.py:593: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
/opt/conda/lib/python3.9/site-packages/sklearn/utils/validation.py:593: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
warnings.warn(
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_888/79579735.py in <module>
----> 1 z419=cosineSimilarity(splittedCosine419,doc_term_matrix)
2 z419
/tmp/ipykernel_888/2223236548.py in cosineSimilarity(splitted_german, doc_term_matrix)
8 sparse_matrix = count_vectorizer.fit_transform(formatted)
9 doc_term_matrix1 = sparse_matrix.todense()
---> 10 z=cosine_similarity(doc_term_matrix1,doc_term_matrix)
11 return z
/opt/conda/lib/python3.9/site-packages/sklearn/metrics/pairwise.py in cosine_similarity(X, Y, dense_output)
1249 # to avoid recursive import
1250
-> 1251 X, Y = check_pairwise_arrays(X, Y)
1252
1253 X_normalized = normalize(X, copy=True)
/opt/conda/lib/python3.9/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype, accept_sparse, force_all_finite, copy)
179 )
180 elif X.shape[1] != Y.shape[1]:
--> 181 raise ValueError(
182 "Incompatible dimension for X and Y matrices: "
183 "X.shape[1] == %d while Y.shape[1] == %d" % (X.shape[1], Y.shape[1])
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 1027 while Y.shape[1] == 10346
Can you suggest steps to resolve this issue?
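The error message says that the two matrices have different numbers of columns, which is what happens when fit_transform is called twice: each call learns its own vocabulary. One possible approach, sketched here with made-up documents (docs_a and docs_b are hypothetical stand-ins for formatted0 and formatted), is to fit a single vectorizer on both sets and only transform each set afterwards.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the two document collections.
docs_a = ["the cat sat on the mat", "dogs chase cats"]
docs_b = ["a cat on a mat", "birds sing", "dogs bark"]

# Fit once so that both matrices share the same vocabulary (same columns).
vectorizer = CountVectorizer()
vectorizer.fit(docs_a + docs_b)

X = vectorizer.transform(docs_a)   # sparse, shape (2, n_terms)
Y = vectorizer.transform(docs_b)   # sparse, shape (3, n_terms)

# cosine_similarity accepts sparse input, so todense() is not needed.
z = cosine_similarity(X, Y)
```

Keeping the matrices sparse also avoids the np.matrix FutureWarning, which comes from passing the result of todense().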
I am using a custom-defined metric in SKlearn's KNeighborsClassifier. Here's my code:
def chi_squared(x,y):
    return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
The function above implements the chi-squared distance. I used NumPy functions because, according to the scikit-learn docs, a metric function takes two one-dimensional NumPy arrays.
I have passed the chi_squared function as an argument to KNeighborsClassifier().
knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
However, I keep getting following error:
TypeError Traceback (most recent call last)
<ipython-input-29-d2a365ebb538> in <module>
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
----> 6 knn.fit(X_train, Y_train)
7 predictions = knn.predict(X_test)
8 print(accuracy_score(Y_test, predictions))
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
177 The fitted k-nearest neighbors classifier.
178 """
--> 179 return self._fit(X, y)
180
181 def predict(self, X):
~/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
497
498 if self._fit_method == 'ball_tree':
--> 499 self._tree = BallTree(X, self.leaf_size,
500 metric=self.effective_metric_,
501 **self.effective_metric_params_)
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.__init__()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree._recursive_build()
sklearn/neighbors/_ball_tree.pyx in sklearn.neighbors._ball_tree.init_node()
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.DistanceMetric.rdist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance.dist()
sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance._dist()
<ipython-input-29-d2a365ebb538> in chi_squared(x, y)
1 def chi_squared(x,y):
----> 2 return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
3
4
5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
<__array_function__ internals> in sum(*args, **kwargs)
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
2239 return res
2240
-> 2241 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
2242 initial=initial, where=where)
2243
~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 return reduction(axis=axis, out=out, **passkwargs)
86
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: only integer scalar arrays can be converted to a scalar index
I can reproduce your error message with:
In [173]: x=np.arange(3); y=np.array([2,3,4])
In [174]: np.sum(x,y)
Traceback (most recent call last):
File "<ipython-input-174-1a1a267ebd82>", line 1, in <module>
np.sum(x,y)
File "<__array_function__ internals>", line 5, in sum
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2247, in sum
return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: only integer scalar arrays can be converted to a scalar index
Correct use(s) of np.sum:
In [175]: np.sum(x)
Out[175]: 3
In [177]: np.sum(np.arange(6).reshape(2,3), axis=0)
Out[177]: array([3, 5, 7])
In [178]: np.sum(np.arange(6).reshape(2,3), 0)
Out[178]: array([3, 5, 7])
(re)read the np.sum docs if necessary!
Using np.add instead of np.sum:
In [179]: np.add(x,y)
Out[179]: array([2, 4, 6])
In [180]: x+y
Out[180]: array([2, 4, 6])
The following should be equivalent:
np.divide(np.square(np.subtract(x,y)), np.add(x,y))
(x-y)**2/(x+y)
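Note that both expressions return an array, while a distance function normally has to return a single number. A hedged sketch of a scalar version follows; the eps guard against division by zero is my own assumption, and some definitions of the chi-squared distance also include a factor of 0.5.

```python
import numpy as np

def chi_squared(x, y, eps=1e-12):
    # eps is an assumed safeguard for bins where x + y == 0.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sum the per-element terms so that the result is one number.
    return float(np.sum((x - y) ** 2 / (x + y + eps)))

d_same = chi_squared([1, 2, 3], [1, 2, 3])
d_diff = chi_squared([1, 0], [0, 1])
```

Whether this quantity is suitable for algorithm='ball_tree', which expects a true metric, is worth checking separately.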
I have tried to put this in sympy, but it doesn't work:
import sympy
sympy.init_printing(use_unicode=True)
sympy.Rational(3, sympy.sqrt(3))
It returns to me the following:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-25b0450ea0d0> in <module>()
1 root = sympy.sqrt(3)
2
----> 3 sympy.Rational(3, sympy.sqrt(3))
~\Anaconda3\lib\site-packages\sympy\core\numbers.py in __new__(cls, p, q, gcd)
1488 else:
1489 p = Rational(p)
-> 1490 q = Rational(q)
1491
1492 if isinstance(q, Rational):
~\Anaconda3\lib\site-packages\sympy\core\numbers.py in __new__(cls, p, q, gcd)
1483
1484 if not isinstance(p, SYMPY_INTS + (Rational,)):
-> 1485 raise TypeError('invalid input: %s' % p)
1486 q = q or S.One
1487 gcd = 1
TypeError: invalid input: sqrt(3)
Why does this happen? How can I put an irrational number inside a fraction in sympy, then?
To create the ratio, use 3/sqrt(3). To print it in that form you need an unevaluated expression: Mul(3, Pow(sqrt(3), -1, evaluate=False), evaluate=False).
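A small sketch of the difference between the two forms (ratio and unevaluated are illustrative names): the plain division is simplified automatically, while the unevaluated product keeps the structure 3 times 1/sqrt(3) until doit() is called.

```python
from sympy import Mul, Pow, sqrt

# Ordinary division: sympy simplifies the result automatically.
ratio = 3 / sqrt(3)

# Unevaluated form: keeps the "3 over sqrt(3)" structure for printing.
unevaluated = Mul(3, Pow(sqrt(3), -1, evaluate=False), evaluate=False)

# doit() evaluates the held expression.
evaluated = unevaluated.doit()
```

Rational itself only accepts integers and rationals, which is why Rational(3, sqrt(3)) raises the TypeError.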
I need to compute a percentile over my data, but the data covers several IDs and I want a separate result for each ID. Here is my code, which raises an error:
result_frame.groupby('ID').apply(percentile('rolling_mean', [25]))
I am getting the following error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-60-87a94290cfde> in <module>()
----> 1 result_frame.groupby('VoyageID').apply(percentile('rolling_mean', [25]))
~/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py in percentile(a, q, axis, out, overwrite_input, interpolation, keepdims)
4272 r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out,
4273 overwrite_input=overwrite_input,
-> 4274 interpolation=interpolation)
4275 if keepdims:
4276 if q.ndim == 0:
~/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py in _ureduce(a, func, **kwargs)
4014 keepdim = [1] * a.ndim
4015
-> 4016 r = func(a, **kwargs)
4017 return r, keepdim
4018
~/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py in _percentile(a, q, axis, out, overwrite_input, interpolation, keepdims)
4389 n = np.isnan(ap[-1:, ...])
4390
-> 4391 x1 = take(ap, indices_below, axis=axis) * weights_below
4392 x2 = take(ap, indices_above, axis=axis) * weights_above
4393
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
How about this?
import numpy as np
import pandas as pd
data = pd.DataFrame({'group': ['A', 'B'] * 100, 'value': np.random.randn(200)})
data.groupby('group')['value'].quantile([.25, .75])
You are correct to group, you just need to identify the column you'd like to summarise, and then apply the percentile using quantile.
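Applied to the column names from the question, that might look like the sketch below. The data is made up for illustration. The original call fails because percentile('rolling_mean', [25]) runs before groupby sees it and tries to take the percentile of the string itself.

```python
import numpy as np
import pandas as pd

# Made-up data using the question's column names.
result_frame = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "rolling_mean": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# 25th percentile of rolling_mean, computed separately for each ID.
q25 = result_frame.groupby("ID")["rolling_mean"].quantile(0.25)

# Equivalent with np.percentile: pass a function, not the result of a call.
q25_np = result_frame.groupby("ID")["rolling_mean"].apply(
    lambda s: np.percentile(s, 25)
)
```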