Difference in use of ** and pow function - python-3.x

While attempting to write a cost function for linear regression, an error arises when np.power is replaced with ** in cost_function:
Original cost function
def cost_function(x,y,theta):
    m = np.size(y)
    j = (1/(2*m))*np.sum(np.power(np.matmul(x,theta)-y, 2))
    return j
Cost function giving the error:
def cost_function(x,y,theta):
    m = np.size(y)
    j = (1/(2*m))*np.sum((np.matmul(x,theta)-y)**2)
    return j
Gradient Descent
def gradient_descent(x,y,theta,learn_rate,iters):
    x = np.mat(x); y = np.mat(y); theta = np.mat(theta)
    m = np.size(y)
    j_hist = np.zeros(iters)
    for i in range(0,iters):
        temp = theta - (learn_rate/m)*(x.T*(x*theta-y))
        theta = temp
        j_hist[i] = cost_function(x,y,theta)
    return (theta), j_hist
Variable values
theta = np.zeros((2,1))
learn_rate = 0.01
iters = 1000
x is a (97,2) matrix
y is a (97,1) matrix
On its own, the cost function evaluates fine, returning 32.0727.
The error arises when the same function is used inside gradient descent.
The error I am getting is: LinAlgError: Last 2 dimensions of the array must be square

First, let's distinguish between pow, ** and np.power. pow is the Python builtin, which according to the docs is equivalent to ** when used with 2 arguments.
Second, you apply np.mat to the arrays, making np.matrix objects. According to its docs:
It has certain special operators, such as *
(matrix multiplication) and ** (matrix power).
Matrix power:
In [475]: np.mat([[1,2],[3,4]])**2
Out[475]:
matrix([[ 7, 10],
        [15, 22]])
Elementwise square:
In [476]: np.array([[1,2],[3,4]])**2
Out[476]:
array([[ 1,  4],
       [ 9, 16]])
In [477]: np.power(np.mat([[1,2],[3,4]]),2)
Out[477]:
matrix([[ 1,  4],
        [ 9, 16]])
Matrix product:
In [478]: arr = np.array([[1,2],[3,4]])
In [479]: arr@arr   # np.matmul
Out[479]:
array([[ 7, 10],
       [15, 22]])
With a non-square matrix:
In [480]: np.power(np.mat([[1,2]]),2)
Out[480]: matrix([[1, 4]]) # elementwise
Attempting to do matrix_power on a non-square matrix:
In [481]: np.mat([[1,2]])**2
---------------------------------------------------------------------------
LinAlgError Traceback (most recent call last)
<ipython-input-481-18e19d5a9d6c> in <module>()
----> 1 np.mat([[1,2]])**2
/usr/local/lib/python3.6/dist-packages/numpy/matrixlib/defmatrix.py in __pow__(self, other)
226
227 def __pow__(self, other):
--> 228 return matrix_power(self, other)
229
230 def __ipow__(self, other):
/usr/local/lib/python3.6/dist-packages/numpy/linalg/linalg.py in matrix_power(a, n)
600 a = asanyarray(a)
601 _assertRankAtLeast2(a)
--> 602 _assertNdSquareness(a)
603
604 try:
/usr/local/lib/python3.6/dist-packages/numpy/linalg/linalg.py in _assertNdSquareness(*arrays)
213 m, n = a.shape[-2:]
214 if m != n:
--> 215 raise LinAlgError('Last 2 dimensions of the array must be square')
216
217 def _assertFinite(*arrays):
LinAlgError: Last 2 dimensions of the array must be square
Note that the whole traceback lists matrix_power. That's why we often ask to see the whole traceback.
Why are you setting x, y and theta to np.mat? The cost_function uses matmul. With that function, and its @ operator, there are few(er) good reasons for using np.matrix.
Despite the subject line, you did not actually try to use pow. That confused me and at least one other commenter; I went looking for an np.pow or a scipy version.
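For reference, a sketch of the same computation with plain ndarrays and the @ operator (no np.mat anywhere, so ** stays elementwise and matrix_power is never invoked):

import numpy as np

def cost_function(x, y, theta):
    m = y.size
    # (x @ theta - y) has shape (97, 1); ** squares it elementwise on ndarrays
    return (1 / (2 * m)) * np.sum((x @ theta - y) ** 2)

def gradient_descent(x, y, theta, learn_rate, iters):
    m = y.size
    j_hist = np.zeros(iters)
    for i in range(iters):
        theta = theta - (learn_rate / m) * (x.T @ (x @ theta - y))
        j_hist[i] = cost_function(x, y, theta)
    return theta, j_hist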

Related

Numba jit and Scipy

I have found a few posts on the subject here, but most of them did not have a useful answer.
I have a 3D NumPy dataset [image number, x, y] in which the probability that a pixel belongs to a class is stored as a float (0-1). I would like to correct the wrongly segmented pixels (with high performance).
The probabilities are part of a movie in which objects are moving from right to left and possibly back again. The basic idea is that I fit the pixels with a Gaussian or comparable function, looking at around 15-30 images ([i-15 : i+15, x, y]). It is very probable that if the previous 5 and the following 5 pixels are classified in this class, this pixel also belongs to it.
To illustrate my problem I add sample code; the results were calculated without numba:
from scipy.optimize import curve_fit
from scipy import exp
import numpy as np
from numba import jit

@jit
def fit(size_of_array, outputAI, correct_output):
    x = range(size_of_array[0])
    for i in range(size_of_array[1]):
        for k in range(size_of_array[2]):
            args, cov = curve_fit(gaus, x, outputAI[:, i, k])
            correct_output[2, i, k] = gaus(2, *args)
    return correct_output

@jit
def gaus(x, a, x0, sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))

if __name__ == '__main__':
    # output_AI = [imageNr, x, y] example 5, 2, 2
    # At position [2][1][1] is the error: the pixels before and after were
    # classified to the class, but not this pixel.
    # The objects do not move at such a speed, so the probability should be corrected.
    outputAI = np.array([[[0.1, 0], [0, 0]], [[0.8, 0.3], [0, 0.2]], [[1, 0.1], [0, 0.2]],
                         [[0.1, 0.3], [0, 0.2]], [[0.8, 0.3], [0, 0.2]]])
    correct_output = np.zeros(outputAI.shape)
    # Here only the pixels in image 3 are corrected; in the real code a loop runs
    # over the whole 3D array and corrects every image and every pixel separately.
    size_of_array = outputAI.shape
    correct_output = fit(size_of_array, outputAI, correct_output)
    # numba error: Compilation is falling back to object mode WITH looplifting
    # enabled because Function "fit" failed type inference due to:
    # Untyped global name 'curve_fit': cannot determine Numba type of <class 'function'>
    print(correct_output[2])
    # [[9.88432346e-01 2.10068763e-01]
    #  [6.02428922e-20 2.07921125e-01]]
    # The wrong pixel at position [0][0] was corrected from 0.2 to almost 1,
    # the others are still not assigned to the class.
Unfortunately numba does NOT work. I always get the following error:
Compilation is falling back to object mode WITH looplifting enabled because Function "fit" failed type inference due to: Untyped global name 'curve_fit': cannot determine Numba type of <class 'function'>
Update 04.08.2020
Currently I have this solution for my problem in mind, but I am open to further suggestions.
from scipy.optimize import curve_fit
import numpy as np
import time

def fit_without_scipy(input):
    # Moment-based Gaussian estimate: amplitude from the max,
    # center from the argmax, width from the mean squared deviation.
    x = np.arange(input.size)   # np.arange instead of range, so x - x0 works
    x0 = input.argmax()         # use the argument, not the global outputAI[i]
    a = input.max()
    var = (input - input.mean())**2
    return a * np.exp(-(x - x0) ** 2 / (2 * var.mean()))

def fit(input):
    x = np.arange(len(input))
    try:
        args, cov = curve_fit(gaus, x, input)  # fit the argument, not the global outputAI[i]
        return gaus(x, *args)
    except Exception:
        return input

def gaus(x, a, x0, sigma):
    # np.exp instead of the deprecated scipy top-level exp
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

if __name__ == '__main__':
    nr = 31
    N = 100000
    x = np.linspace(0, 30, nr)
    outputAI = np.zeros((N, nr))
    correct_output = outputAI.copy()
    correct_output_numba = outputAI.copy()
    perfekt_result = outputAI.copy()
    for i in range(N):
        perfekt_result[i] = gaus(x, np.random.random(), np.random.randint(-N, 2*N),
                                 np.random.random() * np.random.randint(0, 100))
        outputAI[i] = perfekt_result[i] + np.random.normal(0, 0.5, nr)
    start = time.time()
    for i in range(N):
        correct_output[i] = fit(outputAI[i])
    print("Time with scipy: " + str(time.time() - start))
    start = time.time()
    for i in range(N):
        correct_output_numba[i] = fit_without_scipy(outputAI[i])
    print("Time without scipy: " + str(time.time() - start))
    for i in range(N):
        correct_output[i] = abs(correct_output[i] - perfekt_result[i])
        correct_output_numba[i] = abs(correct_output_numba[i] - perfekt_result[i])
    print("Mean deviation with scipy: " + str(correct_output.mean()))
    print("Mean deviation without scipy: " + str(correct_output_numba.mean()))
Output [with nr = 31 and N = 100000]:
Time with scipy: 193.27853846549988 secs
Time without scipy: 2.782526969909668 secs
Mean deviation with scipy: 0.03508043754489116
Mean deviation without scipy: 0.0419951370808896
In the next step I would try to speed up the code even more with numba. Currently this does not work because of the argmax function.
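For what it's worth, array methods like max, argmax and mean are supported in numba's nopython mode in recent releases, so a jitted variant of fit_without_scipy may compile as-is; a sketch (assuming a reasonably current numba):

import numpy as np
from numba import njit

@njit
def fit_without_scipy_numba(y):
    # Same moment-based Gaussian estimate as fit_without_scipy above:
    # amplitude from the max, center from the argmax, width from the variance.
    x = np.arange(y.size)
    a = y.max()
    x0 = y.argmax()
    var = ((y - y.mean()) ** 2).mean()
    return a * np.exp(-(x - x0) ** 2 / (2.0 * var))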
Curve_fit eventually calls into either least_squares (pure Python) or leastsq (a C extension). You have three options:

- figure out how to make numba-jitted code talk to the C extension which powers leastsq
- extract the relevant parts of least_squares and numba.jit them
- implement LowLevelCallable support for least_squares or minimize

None of these is easy. OTOH, all of them would be interesting to a wider audience if successful.
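A fourth, more modest option is numba's objmode escape hatch: the surrounding loops compile, while each curve_fit call drops back to object mode (so the fits themselves are not sped up). A rough sketch:

import numpy as np
from numba import njit, objmode
from scipy.optimize import curve_fit

def gaus_py(x, a, x0, sigma):
    # plain-Python Gaussian model, only ever called inside objmode
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

@njit
def fit_one(y):
    x = np.arange(y.size, dtype=np.float64)
    with objmode(popt='float64[:]'):  # leave nopython mode just for the fit
        popt = curve_fit(gaus_py, x, y)[0]
    # evaluate the fitted model back in nopython mode
    return popt[0] * np.exp(-(x - popt[1]) ** 2 / (2 * popt[2] ** 2))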

numpy condition function for 2-D data

I have a synthetic dataset consisting of features (X) and labels (y), which is used for KMeans clustering with Python 3.8, sklearn 0.22.2 and numpy 1.19.
X.shape, y.shape
# ((100, 2), (100,))
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, init = 'random', n_init = 10, max_iter = 300)
# Train model on scaled features-
kmeans.fit(X)
After training KMeans on 'X', I want to replace the unique (continuous) values of 'X' with the cluster centers (discrete) obtained using KMeans.
for i in range(3):
    print("cluster number {0} has center = {1}".format(i + 1, kmeans.cluster_centers_[i, :]))
'''
cluster number 1 has center = [-0.7869159   1.14173859]
cluster number 2 has center = [ 1.28010442 -1.04663318]
cluster number 3 has center = [-0.54654735  0.0054752 ]
'''
set(kmeans.labels_)
# {0, 1, 2}
One way I have of doing it is:
X[np.where(clustered_labels == 0)] = val[0,:]
X[np.where(clustered_labels == 1)] = val[1,:]
X[np.where(clustered_labels == 2)] = val[2,:]
Can I do it using np.select()?
cond = [clustered_labels == i for i in range(3)]
val = kmeans.cluster_centers_[:,:]
But on executing the code:
np.select(cond, val)
I get the following error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
----> 1 np.select(cond, val)

<__array_function__ internals> in select(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/lib/function_base.py in select(condlist, choicelist, default)
    693             result_shape = condlist[0].shape
    694         else:
--> 695             result_shape = np.broadcast_arrays(condlist[0], choicelist[0])[0].shape
    696
    697     result = np.full(result_shape, choicelist[-1], dtype)

<__array_function__ internals> in broadcast_arrays(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/lib/stride_tricks.py in broadcast_arrays(subok, *args)
    256     args = [np.array(_m, copy=False, subok=subok) for _m in args]
    257
--> 258     shape = _broadcast_shape(*args)
    259
    260     if all(array.shape == shape for array in args):

~/.local/lib/python3.8/site-packages/numpy/lib/stride_tricks.py in _broadcast_shape(*args)
    187     # use the old-iterator because np.nditer does not handle size 0 arrays
    188     # consistently
--> 189     b = np.broadcast(*args[:32])
    190     # unfortunately, it cannot handle 32 or more arguments directly
    191     for pos in range(32, len(args), 31):

ValueError: shape mismatch: objects cannot be broadcast to a single shape
Suggestions?
Thanks!
A somewhat cleaner way to do it (though very similar to yours) is the following. Here's a simple example:
from sklearn.cluster import KMeans
import numpy as np
x1 = np.random.normal(0, 2, 100)
y1 = np.random.normal(0, 1, 100)
label1 = np.ones(100)
d1 = np.column_stack([x1, y1, label1])
x2 = np.random.normal(3, 1, 100)
y2 = np.random.normal(1, 2, 100)
label2 = np.ones(100) * 2
d2 = np.column_stack([x2, y2, label2])
x3 = np.random.normal(-3, 0.5, 100)
y3 = np.random.normal(0.5, 0.25, 100)
label3 = np.ones(100) * 3
d3 = np.column_stack([x3, y3, label3])
D = np.row_stack([d1, d2, d3])
np.random.shuffle(D)
X = D[:, :2]
y = D[:, 2]
print(f'X.shape = {X.shape}, y.shape = {y.shape}')
# X.shape = (300, 2), y.shape = (300,)
kmeans = KMeans(n_clusters = 3, init = 'random', n_init = 10, max_iter = 300)
# Train model on scaled features-
kmeans.fit(X)
preds = kmeans.predict(X)
X[preds==0] = kmeans.cluster_centers_[0]
X[preds==1] = kmeans.cluster_centers_[1]
X[preds==2] = kmeans.cluster_centers_[2]
Another tempting route is np.put; note, however, that np.put interprets its second argument as integer indices into the flattened array, not as a boolean mask, so calls like np.put(X, preds == 0, kmeans.cluster_centers_[0]) will not reproduce the boolean assignment above.
Frankly, I didn't initially see a way to accomplish the task by means of the np.select function, and I guess the way you do it is the best way, based on this answer.
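That said, np.select broadcasts each condition against the corresponding choice, so giving the conditions a trailing axis so they broadcast against the (2,)-shaped centers appears to work; a hedged sketch, reusing preds and kmeans from above:

conds = [(preds == i)[:, None] for i in range(3)]  # each (300, 1), broadcasts over columns
centers = list(kmeans.cluster_centers_)            # each center has shape (2,)
X_discrete = np.select(conds, centers)             # result has shape (300, 2)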
Cheers.

Squaring multi-dimensional array, including cross term, without for loop

I'm trying to square a particular axis of a multi-dimensional array without using a loop in Python.
Here I will present the code with loop.
First, let's define a simple array
x = np.random.randint(1, size=(2, 3))
Since the size of the second axis is 3, we have x₁, x₂, x₃. The square terms of this array are x₁², x₂², x₃², x₁x₂, x₁x₃, x₂x₃; together with the original x₁, x₂, x₃, that gives 9 terms in total.
Here is the full code:
import numpy as np
import time

x = np.random.randint(low=20, size=(2, 3))
print(x)
a, b = x.shape
for i in range(b):
    XiXj = np.einsum('i, ij->ij', x[:, i], x[:, i:b])
    x = np.concatenate((x, XiXj), axis=1)
print(x)
Print:
[[ 3 12 18]
[12 10 10]]
[[ 3 12 18 9 36 54 144 216 324]
[ 12 10 10 144 120 120 100 100 100]]
Of course, this won't take long to compute. However, the array may be of size [2000, 5000], and then it takes a while.
How would you do it without the for loop?
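One loop-free approach (a sketch that reproduces the same nine columns, in the same order as the loop above) is to form all pairwise products with einsum and keep only the upper triangle:

import numpy as np

x = np.random.randint(low=20, size=(2, 3))
a, b = x.shape

# all pairwise products x[:, i] * x[:, j] at once: shape (a, b, b)
outer = np.einsum('ij,ik->ijk', x, x)

# keep each unordered pair once: (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)
iu, ju = np.triu_indices(b)
square_terms = outer[:, iu, ju]  # shape (a, b*(b+1)//2)

result = np.concatenate((x, square_terms), axis=1)  # shape (a, 9) for b == 3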

Scipy optimization with matrix multiplication

I've tried to use scipy.optimize.minimize to solve a matrix multiplication optimization problem; however, the result gives me a dimension error. Can someone help me with it?
import numpy as np
from scipy.optimize import minimize

# define known variables, mu, sigma, rf
mu = np.matrix([[0.12],
                [0.08],
                [0.05]])
sigma = np.matrix([[0.5, 0.05, 0.03],
                   [0.05, 0.4, 0.01],
                   [0.03, 0.01, 0.2]])
rf = 0.02

def objective_fun(x):
    '''
    This is the objective function
    '''
    s = np.sqrt(x.T * sigma * x)/(mu.T * x - rf)
    return s

def constraint(x):
    con = 1
    for i in np.arange(0,3):
        con = con - x[i]
    return con

# set up the boundaries for x
bound_i = (0, np.Inf)
bnds = (bound_i, bound_i, bound_i)
# set up the constraints for x
con = {'type':'eq', 'fun':constraint}
# initial guess for variable x
x = np.matrix([[0.5],
               [0.3],
               [0.2]])
sol = minimize(objective_fun, x, method = 'SLSQP', bounds = bnds, constraints = con)
The error gives me:
ValueError Traceback (most recent call last)
<ipython-input-31-b8901077b164> in <module>
----> 1 sol = minimize(objective_fun, x, method = 'SLSQP', bounds = bnds, constraints = con)
e:\Anaconda3\lib\site-packages\scipy\optimize\_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
606 elif meth == 'slsqp':
607 return _minimize_slsqp(fun, x0, args, jac, bounds,
--> 608 constraints, callback=callback, **options)
609 elif meth == 'trust-constr':
610 return _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,
e:\Anaconda3\lib\site-packages\scipy\optimize\slsqp.py in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, **unknown_options)
397
398 # Compute objective function
--> 399 fx = func(x)
400 try:
401 fx = float(np.asarray(fx))
e:\Anaconda3\lib\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
324 def function_wrapper(*wrapper_args):
325 ncalls[0] += 1
--> 326 return function(*(wrapper_args + args))
327
328 return ncalls, function_wrapper
<ipython-input-28-b1fb2386a380> in objective_fun(x)
3 This is the objective function
4 '''
----> 5 s = np.sqrt(x.T * sigma * x)/(mu.T * x - rf)
6 return s
e:\Anaconda3\lib\site-packages\numpy\matrixlib\defmatrix.py in __mul__(self, other)
218 if isinstance(other, (N.ndarray, list, tuple)) :
219 # This promotes 1-D vectors to row vectors
--> 220 return N.dot(self, asmatrix(other))
221 if isscalar(other) or not hasattr(other, '__rmul__') :
222 return N.dot(self, other)
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
However, I tried every function I wrote individually and none of them raises an error. For example, after defining the x matrix as shown above, simply running objective_fun(x) in the console immediately gives an answer:
objective_fun(x)
matrix([[5.90897598]])
That means my function can do the matrix multiplication correctly, so what is wrong with the code here?
The docs for minimize() say that x0 should be an (n,)-shaped array, but you are trying to treat it like a (3,1) matrix. I'm not sure of the inner workings of minimize(), but I suspect that when it steps over different values of the fit parameters it converts x to the format it thinks it wants. Anyway, the following minor corrections make the code work.
import numpy as np
from scipy.optimize import minimize

# define known variables, mu, sigma, rf
mu = np.matrix([[0.12],
                [0.08],
                [0.05]])
sigma = np.matrix([[0.5, 0.05, 0.03],
                   [0.05, 0.4, 0.01],
                   [0.03, 0.01, 0.2]])
rf = 0.02

def objective_fun(x):
    '''
    This is the objective function
    '''
    x = np.expand_dims(x, 1)  # convert the (3,) shape to (3,1); then we can do our normal matrix math on it
    s = np.sqrt(x.T * sigma * x)/(mu.T * x - rf)  # transposes so the shapes are correct
    return s

def constraint(x):
    con = 1
    for i in np.arange(0,3):
        con = con - x[i]
    return con

# set up the boundaries for x
bound_i = (0, np.Inf)
bnds = (bound_i, bound_i, bound_i)
# set up the constraints for x
con = {'type':'eq', 'fun':constraint}
# initial guess for variable x, defined as a (3,) array
x = np.array([0.5, 0.3, 0.2])
sol = minimize(objective_fun, x, method = 'SLSQP', bounds = bnds, constraints = con)
print(sol)  # and the solution looks reasonable
Output
     fun: 5.86953830952583
     jac: array([-1.70555401, -1.70578796, -1.70573896])
 message: 'Optimization terminated successfully.'
    nfev: 32
     nit: 6
    njev: 6
  status: 0
 success: True
       x: array([0.42809911, 0.29522438, 0.27667651])
Take a look at the comments I put in for an explanation on what you need to do.
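As an aside, with plain 1-D arrays and the @ operator the transposes and reshaping become unnecessary; a sketch of the same objective (sigma_a and mu_a being np.asarray versions of sigma and mu):

sigma_a = np.asarray(sigma)    # shape (3, 3)
mu_a = np.asarray(mu).ravel()  # shape (3,)

def objective_fun(x):
    # x arrives from minimize() as a (3,) array; the quadratic form and the
    # dot product both reduce to scalars without any expand_dims
    return np.sqrt(x @ sigma_a @ x) / (mu_a @ x - rf)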

Using huber scale and location estimator in statsmodel

I want to use the Huber simultaneous scale and location estimator found here: http://www.statsmodels.org/dev/generated/statsmodels.robust.scale.Huber.html, but I get the following error:
In [1]: from statsmodels.robust.scale import huber
In [2]: huber([1,2,1000,3265,454])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-80c7d73a4467> in <module>()
----> 1 huber([1,2,1000,3265,454])
/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
132 scale = tools.unsqueeze(scale, axis, a.shape)
133 mu = tools.unsqueeze(mu, axis, a.shape)
--> 134 return self._estimate_both(a, scale, mu, axis, est_mu, n)
135
136 def _estimate_both(self, a, scale, mu, axis, est_mu, n):
/usr/local/lib/python3.5/dist-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
176 else:
177 return nmu.squeeze(), nscale.squeeze()
--> 178 raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
179
180 huber = Huber()
ValueError: joint estimation of location and scale failed to converge in 30 iterations
The weird thing is that it depends on the input:
In [3]: huber([1,2,1000,3265])
Out[3]: (array(1067.0), array(1744.3785635989168))
Is it a bug or did I do something wrong here?
Thanks
EDIT: I knew about the tol and maxiter parameters; what you say works in that case, but here is an example where it doesn't:
In [1]: a = [4.3498776644415429, 16.549773154535362, 4.6335866963356445, 8.2581784707468771,
   ...:      1.3508951981036594, 1.2918098244960199, 5.7349939516388453, 0.41663442483143953,
   ...:      4.5632532990486077, 8.1020487048604473, 1.3823829480004797, 1.7848176927929804,
   ...:      4.3058348043423473, 0.9427710734983884, 0.95646846668018171, 0.75309469901235238,
   ...:      8.4689505489677011, 0.77420558084543778, 0.76506022382450845, 1.5673666392992407,
   ...:      1.4109878442590897, 0.45592078018861532, 4.71748181503082, 0.65942167325205436,
   ...:      0.19099796838644958, 1.0979997466466069, 4.8145761128848106, 0.75417363824157768,
   ...:      5.0723603274833362, 0.30627007428414721, 4.8178689054947981, 1.5383475959362511,
   ...:      0.7971041296695851, 4.689826268915076, 8.6704498595703274, 0.56825576954483947,
   ...:      8.0383098149129708, 0.39400084281108422, 0.89827542590321019, 8.5160701523615785,
   ...:      9.0413284666560934, 1.3590549231652516, 8.355489609767794, 4.2413169378427682,
   ...:      4.8497143419119348, 4.8566372637376292, 0.80979444214378904, 0.26613505510736446,
   ...:      1.1525345100417608, 4.9784132426823824, 1.0766360391211101, 1.9604545887151259,
   ...:      0.77151237419054963, 1.2302626325699455, 0.846912462599126, 0.85852710339862037,
   ...:      0.38035542024830299, 4.7586522644359093, 0.46796412732813891, 0.52933680009769146,
   ...:      5.2521765047159708, 0.71915381047435945, 1.3502865819436387, 0.76647272458736559,
   ...:      1.1206637428992841, 0.72560665950851866, 4.4248008256265781, 4.7984989298357457,
   ...:      1.0696617588880453, 0.71104701759920497, 0.46986438176394463, 0.71008686283792688,
   ...:      0.40698839770374351, 1.0015132141773508, 1.3825224746094535, 0.93256270304709066,
   ...:      8.8896053101317687, 0.64148877800521564, 0.69250319745644506, 4.7187793763802919,
   ...:      5.0620089438920939, 5.1710564773987233, 9.5341720525579809, 0.43052713463119635,
   ...:      0.79288845392647533, 0.51059695992994469, 0.48295891743804287, 0.93370512281086504,
   ...:      1.7493284310512855, 0.62744557356984221, 5.0965146009791704, 0.12615625248684664,
   ...:      1.1064189602023351, 0.33183381198282491, 4.9032450273833179, 0.90296573725985785,
   ...:      1.2885647882049298, 0.84669066664867576, 1.1481783837280477, 0.94784483590946278,
   ...:      9.8019240792478755, 0.91501030105202807, 0.57121190468293803, 5.5511993201050887,
   ...:      0.66054793663263078, 9.6626055869916065, 5.2628061618536908, 9.5905100705465696,
   ...:      0.70369230764306401, 8.9747551552440186, 1.572014845182425, 1.9571634928868149,
   ...:      0.62030418652298325, 0.3395356767840213, 0.48287760518144929, 4.7937042347984198,
   ...:      0.74251393675618682, 0.87369567300592954, 4.5381205696031586, 5.2673192797619084]
In [2]: from statsmodels.robust.scale import huber, Huber
In [3]: Huber(maxiter=10000,tol=1e-1)(a)
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:168: RuntimeWarning: invalid value encountered in sqrt
/ (n * self.gamma - (a.shape[axis] - card) * self.c**2))
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py:164: RuntimeWarning: invalid value encountered in less_equal
subset = np.less_equal(np.fabs((a - mu)/scale), self.c)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-4b9929ff84bb> in <module>()
----> 1 Huber(maxiter=10000,tol=1e-1)(a)
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in __call__(self, a, mu, initscale, axis)
132 scale = tools.unsqueeze(scale, axis, a.shape)
133 mu = tools.unsqueeze(mu, axis, a.shape)
--> 134 return self._estimate_both(a, scale, mu, axis, est_mu, n)
135
136 def _estimate_both(self, a, scale, mu, axis, est_mu, n):
/usr/lib/python3.6/site-packages/statsmodels/robust/scale.py in _estimate_both(self, a, scale, mu, axis, est_mu, n)
176 else:
177 return nmu.squeeze(), nscale.squeeze()
--> 178 raise ValueError('joint estimation of location and scale failed to converge in %d iterations' % self.maxiter)
179
180 huber = Huber()
ValueError: joint estimation of location and scale failed to converge in 10000 iterations
Sorry, this was my original error, but because "a" is long I tried to recreate the error with a smaller array. In this case, I don't think maxiter and tol are to blame.
The number of iterations allowed, maxiter, can be changed when using the Huber class.
e.g. this works
>>> from statsmodels.robust.scale import huber, Huber
>>> Huber(maxiter=200)([1,2,1000,3265,454])
(array(925.6483958529737), array(1497.0624070525248))
It is also possible to change the threshold parameter for the norm function when using the class. In very small samples like this the estimate might be very sensitive to the threshold parameter.
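For example, a larger threshold c (default 1.5) makes the estimator treat more of the extreme values the way the mean/standard deviation would; a sketch of such a call:

>>> Huber(c=2.0, maxiter=200)([1, 2, 1000, 3265, 454])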
As an alternative we can use the RLM model and regress on a constant; both the thresholds and the algorithm are different, but it should produce similar robust results. In the new example the scale estimate is between the robust MAD and the standard deviation, while the location estimate is larger than the median but smaller than the mean.
>>> import numpy as np
>>> from statsmodels.api import RLM
>>> from statsmodels.robust import norms, scale
>>> res = RLM(a, np.ones(len(a)), M=norms.HuberT(t=1.5)).fit(scale_est=scale.HuberScale(d=1.5))
>>> res.params, res.scale
(array([ 2.47711987]), 2.5218278029435406)
>>> np.median(a), scale.mad(a)
(1.1503564468849041, 0.98954533464908301)
>>> np.mean(a), np.std(a)
(2.8650886010542269, 3.0657561979615977)
The resulting weights show that some of the high values are downweighted
>>> widx = np.argsort(res.weights)
>>> np.asarray(a)[widx[:10]]
array([ 16.54977315, 9.80192408, 9.66260559, 9.59051007,
9.53417205, 9.04132847, 8.97475516, 8.88960531,
8.67044986, 8.51607015])
I am not familiar with the details of the implementation of the Huber joint mean-scale estimator.
One possible reason for the convergence failure is that the distribution of the values is bunched into 3 groups with one extra outlier at 16, visible when plotting the histogram. This could result in a convergence cycle in the iterative solver where the third group is either included or excluded. But that is just a guess.
