How to solve this shap.waterfall_plot error? - python-3.x

I'm trying to draw a waterfall plot from the SHAP library to represent one instance of a model's predictions, like this:
ex = shap.Explanation(shap_values[0],
                      explainer.expected_value,
                      X.iloc[0],
                      columns)
ex
Displaying ex returns:
.values =
array([-2.27243590e-01, 5.41666667e-02, 3.33333333e-03, 2.21153846e-02,
1.92307692e-04, -7.17948718e-02])
.base_values =
0.21923076923076923
.data =
BMI 18.716444
ROM-PADF-KE_D 33
Asym-ROM-PHIR(≥8)_discr 1
Asym_SLCMJLanding-pVGRF(10percent)_discr 1
Asym_TJ_Valgus_FPPA(10percent)_discr 1
DVJ_Valgus_KneeMedialDisplacement_D_discr 0
Name: 0, dtype: object
But when I try to draw the waterfall plot, I get this error:
shap.waterfall_plot(ex)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipykernel_4785/3628025354.py in <module>
----> 1 shap.waterfall_plot(ex)
/usr/local/lib/python3.8/dist-packages/shap/plots/_waterfall.py in waterfall(shap_values, max_display, show)
120 yticklabels[rng[i]] = feature_names[order[i]]
121 else:
--> 122 yticklabels[rng[i]] = format_value(features[order[i]], "%0.03f") + " = " + feature_names[order[i]]
123
124 # add a last grouped feature to represent the impact of all the features we didn't show
/usr/local/lib/python3.8/dist-packages/shap/utils/_general.py in format_value(s, format_str)
232 s = format_str % s
233 s = re.sub(r'\.?0+$', '', s)
--> 234 if s[0] == "-":
235 s = u"\u2212" + s[1:]
236 return s
IndexError: string index out of range
Edit for a minimal reproducible example:
The explainer is a kernel explainer:
explainer_2 = shap.KernelExplainer(sci_Model_2.predict, X)
shap_values_2 = explainer.shap_values(X)
X and y come from DataFrames loaded like this:
y = data_modelo_1_2_csv_encoded['Soft-Tissue_injury_≥4days']
y_list = label_encoder.fit_transform(y)
X = data_modelo_1_2_csv_encoded.drop('Soft-Tissue_injury_≥4days',axis=1)
X_list = X.to_numpy()
The model is a small Python wrapper around a Weka model, so that Python libraries such as SHAP can be used with Weka models. It is defined like this:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from weka.core.dataset import Instance


class weka_classifier(BaseEstimator, ClassifierMixin):
    # index was referenced below but never defined, so it is now a parameter
    def __init__(self, classifier=None, dataset=None, index=None):
        if classifier is not None:
            self.classifier = classifier
        if dataset is not None:
            self.dataset = dataset
            self.dataset.class_is_last()
        if index is not None:
            self.index = index

    def fit(self, X, y):
        # X and y are ignored: the Weka classifier is trained on self.dataset
        return self.fit2()

    def fit2(self):
        return self.classifier.build_classifier(self.dataset)

    def predict_instance(self, x):
        x.append(0.0)  # placeholder value for the class attribute (last column)
        inst = Instance.create_instance(x, classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        return self.classifier.classify_instance(inst)

    def predict_proba_instance(self, x):
        x.append(0.0)  # placeholder value for the class attribute (last column)
        inst = Instance.create_instance(x, classname='weka.core.DenseInstance', weight=1.0)
        inst.dataset = self.dataset
        return self.classifier.distribution_for_instance(inst)

    def predict_proba(self, X):
        prediction = []
        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)  # placeholder value for the class attribute
            instance = Instance.create_instance(instance, classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset = self.dataset
            prediction.append(self.classifier.distribution_for_instance(instance))
        return np.asarray(prediction)

    def predict(self, X):
        prediction = []
        for i in range(X.shape[0]):
            instance = []
            for j in range(X.shape[1]):
                instance.append(X[i][j])
            instance.append(0.0)  # placeholder value for the class attribute
            instance = Instance.create_instance(instance, classname='weka.core.DenseInstance', weight=1.0)
            instance.dataset = self.dataset
            prediction.append(self.classifier.classify_instance(instance))
        return np.asarray(prediction)

    def set_data(self, dataset):
        self.dataset = dataset
        self.dataset.class_is_last()
The data is an ARFF file converted to CSV and loaded into a DataFrame with these variables:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 260 entries, 0 to 259
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 BMI 260 non-null float64
1 ROM-PADF-KE_D 260 non-null int64
2 Asym-ROM-PHIR(≥8)_discr 260 non-null int64
3 Asym_SLCMJLanding-pVGRF(10percent)_discr 260 non-null int64
4 Asym_TJ_Valgus_FPPA(10percent)_discr 260 non-null int64
5 DVJ_Valgus_KneeMedialDisplacement_D_discr 260 non-null int64
6 Soft-Tissue_injury_≥4days 260 non-null category
dtypes: category(1), float64(1), int64(5)

Most likely, your issue is that the 0 in your .data field is a string instead of a number.
I can reproduce the same error with format_value('0', "%0.03f").
Looking at the current format_value, we can see that it removes all trailing zeros from a string; in particular, format_value('100', "%0.03f") returns '1'.
This is a bug, and the regex should be replaced (for example with this: https://stackoverflow.com/a/26299205/4178189).
Note that when you supply a number (e.g. 100 or 0), the number is first converted to a string (100.000 or 0.000), so the function does not show its bug when called with a number (int or float).
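To see the failure outside SHAP, here is a standalone sketch of the formatting logic. The body of `format_value_buggy` is reconstructed from lines 232-236 of the traceback; the `isinstance` check that skips the `%` formatting for strings is an assumption based on the description above. `format_value_fixed` is a hypothetical replacement that only strips zeros after a decimal point, using a plain `rstrip` approach that is not necessarily the one in the linked answer.

```python
import re


def format_value_buggy(s, format_str):
    # Strings skip the % formatting (assumed), so "0" stays "0" ...
    if not isinstance(s, str):
        s = format_str % s
    # ... and this regex then strips every trailing zero, not just decimal ones.
    s = re.sub(r"\.?0+$", "", s)
    if s[0] == "-":  # IndexError when s has become ""
        s = "\u2212" + s[1:]
    return s


def format_value_fixed(s, format_str):
    if not isinstance(s, str):
        s = format_str % s
    # Only trim zeros that follow a decimal point, then a dangling ".".
    if "." in s:
        s = s.rstrip("0").rstrip(".")
    if s.startswith("-"):
        s = "\u2212" + s[1:]
    return s


print(format_value_buggy(100, "%0.03f"))    # 100 (numbers are formatted first)
print(format_value_buggy("100", "%0.03f"))  # 1
print(format_value_fixed("100", "%0.03f"))  # 100
print(format_value_fixed("0", "%0.03f"))    # 0
try:
    format_value_buggy("0", "%0.03f")
except IndexError as err:
    print("IndexError:", err)
```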
Also, the development version of shap (not yet released) would not suffer from this issue, since waterfall_plot would not call format_value for a non-numeric value; see: https://github.com/slundberg/shap/blob/8926cd0122d0a1b3cca0768f2c386de706090668/shap/plots/_waterfall.py#L127
Note: this question is also a GitHub issue; see https://github.com/slundberg/shap/issues/2581#issuecomment-1155134604
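Until a release with that change is available, one possible workaround (an assumption, not taken from the linked issue) is to make sure `.data` is numeric before building the Explanation. The row below is a hypothetical stand-in for `X.iloc[0]`, with one value stored as the string "0"; `pd.to_numeric` converts every entry to a number.

```python
import pandas as pd

# Hypothetical row mimicking the .data shown above: object dtype,
# with the last value stored as the string "0".
row = pd.Series(
    [18.716444, 33, 1, 1, 1, "0"],
    index=[
        "BMI",
        "ROM-PADF-KE_D",
        "Asym-ROM-PHIR(≥8)_discr",
        "Asym_SLCMJLanding-pVGRF(10percent)_discr",
        "Asym_TJ_Valgus_FPPA(10percent)_discr",
        "DVJ_Valgus_KneeMedialDisplacement_D_discr",
    ],
    dtype=object,
)

# errors="raise" surfaces any value that cannot be parsed as a number.
numeric_row = pd.to_numeric(row, errors="raise")
print(numeric_row.dtype)  # float64
```

The converted values could then be passed as `data=numeric_row.to_numpy()` together with `feature_names=list(numeric_row.index)` when constructing `shap.Explanation`. For a whole DataFrame with object columns, `X.apply(pd.to_numeric)` converts it column by column.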

Related

Too many copies? Poor comparison? Urn Probability Problem

full code: https://gist.github.com/QuantVI/79a1c164f3017c6a7a2d860e55cf5d5b
TL;DR: sum(a3) gives a number like 770 when it should be closer to 270, i.e. 270 of 1000 trials in which drawing 4 balls yields (at least) 2 blue balls and 1 green ball.
I've already rewritten both the way I create the sample output and the way I compare the results, twice. Python has the syntax `all(x in a for x in b)`, which I used initially; I then changed it to something more deliberate to see whether anything changed. I still get 750+ `True` evaluations. This is why I reassessed how I was selecting without replacement.
I tested the draw function on its own with different hats and was sure it worked.
The expected probability when drawing 4 balls without replacement from a hat containing (blue=3, red=2, green=6), with the outcome containing (blue=2, green=1), i.e. ['blue', 'blue', 'green'],
is around 27.2%. In my 1000 trials, I repeatedly get more than 700.
Is the error in Hat.draw() or in experiment()?
Note: some things are commented out because I am debugging. Use sum(a3) for now, since experiment is currently modified to return intermediate values instead of the probability.
import copy
import random
# Consider using the modules imported above.


class Hat:
    def __init__(self, **kwargs):
        self.d = kwargs
        self.contents = [
            key for key, val in kwargs.items() for num in range(val)
        ]

    def draw(self, num: int) -> list:
        if num >= len(self.contents):
            return self.contents
        else:
            indices = random.sample(range(len(self.contents)), num)
            chosen = [self.contents[idx] for idx in indices]
            #new_contents = [ v for i, v in enumerate(self.contents) if i not in indices]
            new_contents = [pair[1] for pair in enumerate(self.contents)
                            if pair[0] not in indices]
            self.contents = new_contents
            return chosen

    def __repr__(self): return str(self.contents)


def experiment(hat, expected_balls, num_balls_drawn, num_experiments):
    trials = []
    for n in range(num_experiments):
        copyn = copy.deepcopy(hat)
        result = copyn.draw(num_balls_drawn)
        trials.append(result)
    #trials = [ copy.deepcopy(hat).draw(num_balls_drawn) for n in range(num_experiments) ]
    expected_contents = [key for key, val in expected_balls.items() for num in range(val)]
    temp_eval = [[o for o in expected_contents if o in trial] for trial in trials]
    temp_compare = [evaled == expected_contents for evaled in temp_eval]
    return expected_contents, temp_eval, temp_compare, trials
    #evaluations = [ all(x in trial for x in expected_contents) for trial in trials ]
    #if evaluations: prob = sum(evaluations)/len(evaluations)
    #else: prob = 0
    #return prob, expected_contents


#hat3 = Hat(red=5, orange=4, black=1, blue=0, pink=2, striped=9)
#hat4 = Hat(red=1, orange=2, black=3, blue=2)
hat1 = Hat(blue=3, red=2, green=6)
a1, a2, a3, a4 = experiment(hat=hat1, expected_balls={"blue": 2, "green": 1}, num_balls_drawn=4, num_experiments=1000)
#actual = probability
#expected = 0.272
#self.assertAlmostEqual(actual, expected, delta = 0.01, msg = 'Expected experiment method to return a different probability.')
hat2 = Hat(yellow=5, red=1, green=3, blue=9, test=1)
b1, b2, b3, b4 = experiment(hat=hat2, expected_balls={"yellow": 2, "blue": 3, "test": 1}, num_balls_drawn=20, num_experiments=100)
#actual = probability
#expected = 1.0
#self.assertAlmostEqual(actual, expected, delta = 0.01, msg = 'Expected experiment method to return a different probability.')
The issue is temp_eval = [[o for o in expected_contents if o in trial] for trial in trials]. It always adds both 'blue' entries to the list, even if only one blue ball appears in the result of a trial.
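A minimal reproduction of that comparison, using one hypothetical trial that contains only a single blue ball:

```python
# Hypothetical single trial: only ONE blue ball was drawn.
expected_contents = ["blue", "blue", "green"]
trial = ["blue", "green", "red", "red"]

# The comparison only checks membership, so both "blue" entries in
# expected_contents match the same single "blue" in the trial.
temp_eval = [o for o in expected_contents if o in trial]
print(temp_eval)                       # ['blue', 'blue', 'green']
print(temp_eval == expected_contents)  # True, although only one blue was drawn
```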
However, I couldn't fix the error in a straightforward way. My first fix produced a much lower answer, below 0.1, when around 0.27 (270 of 1000 trials) is what I need.
The roundabout solution was to convert lists like ['red', 'green', 'blue', 'green'] into dictionaries by calling dict on a collections.Counter of that list, and then compare the values key by key, e.g. [y[key] <= x.get(key, 0) for key in y.keys()]. In this comparison, y is the expected_balls dictionary and x is the dictionary built from the Counter. If x doesn't have one of the keys, get returns 0, which is less than the count of any key in expected_balls.
From there, functools.reduce turns the list of comparisons into a single True or False value. We then map that functionality (compare all keys and get one True/False value) across all trials.
import collections
import copy
from functools import reduce


def experiment(hat, expected_balls, num_balls_drawn, num_experiments):
    trials = [copy.deepcopy(hat).draw(num_balls_drawn)
              for n in range(num_experiments)]
    trials_kvpairs = [dict(collections.Counter(trial)) for trial in trials]

    def contains(contained: dict, container: dict):
        each = [container.get(key, 0) >= contained[key]
                for key in contained.keys()]
        return reduce(lambda item0, item1: item0 and item1, each)

    trials_success = list(map(lambda t: contains(expected_balls, t), trials_kvpairs))
    # expected_contents = [pair[0] for pair in expected_balls.items() for num in range(pair[1])]
    # temp_eval = [[o for o in trial if o in expected_contents] for trial in trials]
    # temp_compare = [ evaled == expected_contents for evaled in temp_eval]
    # if temp_compare: prob = sum(temp_compare)/len(trials)
    # else: prob = 0
    return 'prob', trials_kvpairs, trials_success
When run as experiment(hat=hat1, expected_balls={"blue":2,"green":1}, num_balls_drawn=4, num_experiments=1000), the sum of the third element of the output was 276.
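As a cross-check, the probability can also be computed exactly by enumerating every equally likely 4-ball draw. The sketch below is a simplified stand-in for the Hat class (a plain list plus `random.sample`), uses `Counter` with `all()` instead of `reduce`, and compares a simulation with the exact value; the function names are illustrative.

```python
import random
from collections import Counter
from itertools import combinations


def contains(expected: dict, drawn) -> bool:
    # Multiset test: each colour must appear at least as often as expected.
    counts = Counter(drawn)
    return all(counts[colour] >= n for colour, n in expected.items())


def simulate(contents, expected, num_drawn, num_experiments, seed=None):
    rng = random.Random(seed)
    k = min(num_drawn, len(contents))
    # random.sample draws without replacement and leaves `contents` untouched,
    # so no deepcopy is needed between trials.
    hits = sum(contains(expected, rng.sample(contents, k)) for _ in range(num_experiments))
    return hits / num_experiments


def exact(contents, expected, num_drawn):
    # combinations() treats balls by position, so all C(11, 4) = 330 draws
    # are distinct and equally likely.
    draws = list(combinations(contents, num_drawn))
    return sum(contains(expected, d) for d in draws) / len(draws)


contents = ["blue"] * 3 + ["red"] * 2 + ["green"] * 6
expected = {"blue": 2, "green": 1}
print(exact(contents, expected, 4))  # 87/330, about 0.2636
print(simulate(contents, expected, 4, 1000, seed=1))
```

87 of the 330 draws qualify (36 with two blue, one green and one red; 45 with two blue and two green; 6 with three blue and one green), so roughly 264 successes per 1000 trials are expected, which is consistent with 276 and far from 770.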

What is the best way of solving a memory error without changing values in function?

I'm running some tests to check whether some choices in my sampling algorithm work better when I change its values.
Everything ran without a hitch until I tried to run a couple more tests to get more results, and I got a MemoryError:
MemoryError Traceback (most recent call last)
<ipython-input-66-1ab060bc6067> in <module>
22 for g in range(0,10000):
23 # sample
---> 24 sample_df = stratified_sample(df,test,size=38, keep_index=False)
25 pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
26 example = "exampleFCUL"
<ipython-input-10-7aba847839db> in stratified_sample(df, strata, size, seed, keep_index)
79 # final dataframe
80 if first:
---> 81 stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
82 first = False
83 else:
D:\Anaconda\lib\site-packages\pandas\core\frame.py in query(self, expr, inplace, **kwargs)
3182 kwargs["level"] = kwargs.pop("level", 0) + 1
3183 kwargs["target"] = None
-> 3184 res = self.eval(expr, **kwargs)
3185
3186 try:
D:\Anaconda\lib\site-packages\pandas\core\frame.py in eval(self, expr, inplace, **kwargs)
3298 kwargs["target"] = self
3299 kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
-> 3300 return _eval(expr, inplace=inplace, **kwargs)
3301
3302 def select_dtypes(self, include=None, exclude=None):
D:\Anaconda\lib\site-packages\pandas\core\computation\eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
325 eng = _engines[engine]
326 eng_inst = eng(parsed_expr)
--> 327 ret = eng_inst.evaluate()
328
329 if parsed_expr.assigner is None:
D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in evaluate(self)
68
69 # make sure no names in resolvers and locals/globals clash
---> 70 res = self._evaluate()
71 return _reconstruct_object(
72 self.result_type, res, self.aligned_axes, self.expr.terms.return_type
D:\Anaconda\lib\site-packages\pandas\core\computation\engines.py in _evaluate(self)
117 truediv = scope["truediv"]
118 _check_ne_builtin_clash(self.expr)
--> 119 return ne.evaluate(s, local_dict=scope, truediv=truediv)
120 except KeyError as e:
121 # python 3 compat kludge
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
814 expr_key = (ex, tuple(sorted(context.items())))
815 if expr_key not in _names_cache:
--> 816 _names_cache[expr_key] = getExprNames(ex, context)
817 names, ex_uses_vml = _names_cache[expr_key]
818 arguments = getArguments(names, local_dict, global_dict)
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in getExprNames(text, context)
705
706 def getExprNames(text, context):
--> 707 ex = stringToExpression(text, {}, context)
708 ast = expressionToAST(ex)
709 input_order = getInputOrder(ast, None)
D:\Anaconda\lib\site-packages\numexpr\necompiler.py in stringToExpression(s, types, context)
282 else:
283 flags = 0
--> 284 c = compile(s, '<expr>', 'eval', flags)
285 # make VariableNode's for the names
286 names = {}
MemoryError:
My question is: what is the best way of solving this memory error without changing the number of parameters? Despite searching here and on Google, I have found no clear answer.
Code:
import json

import pandas as pd


def transform(multilevelDict):
    # Recursively prefix every key with "t_"
    return {"t" + '_' + str(key): (transform(value) if isinstance(value, dict) else value)
            for key, value in multilevelDict.items()}


df = pd.read_csv('testingwebsitedata6.csv', sep=';')
df['Element_Count'] = df['Element_Count'].apply(json.loads)
df['Tag_Count'] = df['Tag_Count'].apply(json.loads)
# same as transforming each row in a loop, but without chained assignment
df['Tag_Count'] = df['Tag_Count'].apply(transform)
df1 = pd.DataFrame(df['Element_Count'].values.tolist())
df2 = pd.DataFrame(df['Tag_Count'].values.tolist())
df = pd.concat([df.drop('Element_Count', axis=1), df1], axis=1)
df = pd.concat([df.drop('Tag_Count', axis=1), df2], axis=1)
df = df.fillna(0)
df[df.select_dtypes(include=['float64']).columns] = df.select_dtypes(include=['float64']).astype(int)
df
test = ['link', 'document', 'heading', 'form', 'textbox', 'button', 'list', 'listitem', 'img', 'navigation', 'banner', 'main', 'article', 'contentinfo', 'checkbox', 'table', 'rowgroup', 'row', 'cell', 'listbox', 'presentation', 'figure', 'columnheader', 'separator', 'group', 'region', 't_html', 't_head', 't_title', 't_meta', 't_link', 't_script', 't_style', 't_body', 't_a', 't_div', 't_h1', 't_form', 't_label', 't_input', 't_ul', 't_li', 't_i', 't_img', 't_nav', 't_header', 't_span', 't_article', 't_p', 't_footer', 't_h3', 't_br', 't_noscript', 't_em', 't_strong', 't_button', 't_h2', 't_ol', 't_time', 't_center', 't_table', 't_tbody', 't_tr', 't_td', 't_font', 't_select', 't_option', 't_b', 't_figure', 't_figcaption', 't_u', 't_iframe', 't_caption', 't_thead', 't_th', 't_h5', 't_sup', 't_map', 't_area', 't_hr', 't_h4', 't_blockquote', 't_sub', 't_fieldset', 't_legend', 't_pre', 't_main', 't_section', 't_small', 't_tfoot', 't_textarea', 't_inserir', 't_s']
print('test1')
print('\n')
for g in range(0, 10000):
    # sample
    sample_df = stratified_sample(df, test, size=38, keep_index=False)
    pathaux = "C://Users//Pedro//Desktop//EscolhasAlgoritmos//Stratified//Stratified_Tests//"
    example = "exampleFCUL"
    randomnumber = g + 1
    csv = ".csv"
    path = pathaux + '26' + '//' + example + str(randomnumber) + csv
    chosencolumns = ["Uri"]
    sample_df.to_csv(path, sep=';', index=False, columns=chosencolumns, header=False)
Stratified sampling function used:
def stratified_sample(df, strata, size=None, seed=None, keep_index=True):
    '''
    It samples data from a pandas dataframe using strata. These functions use
    proportionate stratification:
    n1 = (N1/N) * n
    where:
        - n1 is the sample size of stratum 1
        - N1 is the population size of stratum 1
        - N is the total population size
        - n is the sampling size

    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) / e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
              attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / (1 + (cochran_n - 1) / N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size
    :seed: sampling seed
    :keep_index: if True, it keeps a column with the original population index indicator

    Returns
    -------
    A sampled pandas dataframe based in a set of strata.

    Examples
    --------
    >> df.head()
        id  sex  age  city
    0  123    M   20   XYZ
    1  456    M   25   XYZ
    2  789    M   21   YZX
    3  987    F   40   ZXY
    4  654    M   45   ZXY
    ...
    # This returns a sample stratified by sex and city containing 30% of the size of
    # the original data
    >> stratified = stratified_sample(df=df, strata=['sex', 'city'], size=0.3)

    Requirements
    ------------
    - pandas
    - numpy
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata].copy()  # copy avoids SettingWithCopyWarning on the next line
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size / population * tmp_grpd['size']).astype(int)

    # controlling variable to create the dataframe or append to it
    first = True
    for i in range(len(tmp_grpd)):
        # query generator for each iteration
        qry = ''
        for s in range(len(strata)):
            stratum = strata[s]
            value = tmp_grpd.iloc[i][stratum]
            n = tmp_grpd.iloc[i]['samp_size']

            if type(value) == str:
                value = "'" + str(value) + "'"

            if s != len(strata) - 1:
                qry = qry + stratum + ' == ' + str(value) + ' & '
            else:
                qry = qry + stratum + ' == ' + str(value)

        # final dataframe
        if first:
            stratified_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            first = False
        else:
            tmp_df = df.query(qry).sample(n=n, random_state=seed).reset_index(drop=(not keep_index))
            stratified_df = stratified_df.append(tmp_df, ignore_index=True)

    return stratified_df
def stratified_sample_report(df, strata, size=None):
    '''
    Generates a dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.

    Parameters
    ----------
    :df: pandas dataframe from which data will be sampled.
    :strata: list containing columns that will be used in the stratified sampling.
    :size: sampling size. If not informed, a sampling size will be calculated
        using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) / e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
              attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / (1 + (cochran_n - 1) / N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size

    Returns
    -------
    A dataframe reporting the counts in each stratum and the counts
    for the final sampled dataframe.
    '''
    population = len(df)
    size = __smpl_size(population, size)
    tmp = df[strata].copy()  # copy avoids SettingWithCopyWarning on the next line
    tmp['size'] = 1
    tmp_grpd = tmp.groupby(strata).count().reset_index()
    tmp_grpd['samp_size'] = round(size / population * tmp_grpd['size']).astype(int)
    return tmp_grpd
def __smpl_size(population, size):
    '''
    A function to compute the sample size. If not informed, a sampling
    size will be calculated using Cochran adjusted sampling formula:
        cochran_n = (Z**2 * p * q) / e**2
        where:
            - Z is the z-value. In this case we use 1.96 representing 95%
            - p is the estimated proportion of the population which has an
              attribute. In this case we use 0.5
            - q is 1-p
            - e is the margin of error
        This formula is adjusted as follows:
        adjusted_cochran = cochran_n / (1 + (cochran_n - 1) / N)
        where:
            - cochran_n = result of the previous formula
            - N is the population size

    Parameters
    ----------
    :population: population size
    :size: sample size (default = None)

    Returns
    -------
    Calculated sample size to be used in the functions:
        - stratified_sample
        - stratified_sample_report
    '''
    if size is None:
        cochran_n = round(((1.96)**2 * 0.5 * 0.5) / 0.02**2)
        n = round(cochran_n / (1 + ((cochran_n - 1) / population)))
    elif size >= 0 and size < 1:
        n = round(population * size)
    elif size < 0:
        raise ValueError('Parameter "size" must be an integer or a proportion between 0 and 0.99.')
    elif size >= 1:
        n = size
    return n
(If I have forgotten to mention anything you feel is important for understanding the problem, please say so and I will edit it in.)
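The traceback shows the MemoryError being raised inside `compile(...)` while numexpr parses the query string, and with the more than 90 columns in `test`, every query chains that many comparisons with `&`. Assuming that long expression is the cause (not a confirmed diagnosis), one way to avoid building query strings at all is to let `groupby` hand over each stratum directly. `stratified_sample_groupby` is a hypothetical name; it keeps the same proportional rounding as `stratified_sample` but expects `size` to already be an absolute sample size, as with `size=38`.

```python
import pandas as pd


def stratified_sample_groupby(df, strata, size, seed=None, keep_index=True):
    """Proportionate stratified sample built from groupby, without df.query().

    `size` is an absolute sample size here (hypothetical simplification).
    """
    population = len(df)
    parts = []
    # Each group is one stratum: a unique combination of values in `strata`.
    for _, group in df.groupby(strata, sort=False):
        n = int(round(size / population * len(group)))  # n1 = (N1/N) * n
        if n > 0:
            parts.append(group.sample(n=n, random_state=seed))
    if not parts:
        return df.iloc[0:0]
    # One concat at the end instead of repeated DataFrame.append calls.
    return pd.concat(parts).reset_index(drop=not keep_index)


# Hypothetical toy data in the shape of the docstring example.
toy = pd.DataFrame({
    "sex": ["M"] * 6 + ["F"] * 4,
    "city": ["XYZ"] * 10,
    "Uri": [f"page{i}" for i in range(10)],
})
sample = stratified_sample_groupby(toy, ["sex", "city"], size=5, seed=0, keep_index=False)
print(sample["sex"].value_counts())
```

In the loop from the question this would be called as `stratified_sample_groupby(df, test, size=38, keep_index=False)`. Because `sort=False` keeps strata in order of first appearance, the order of rows in the result may differ from the query-based version.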

Why doesn't exception handling print the text?

My question is why Python doesn't execute the print statement in the exception handling code below. I am trying to calculate the log of the volumes for a number of stocks; each stock has 1259 volume values. NumPy emits a RuntimeWarning, "divide by zero encountered in log", so I tried to use exception handling to locate where the log input is zero, but the print statement under except never runs. It is supposed to print the name of the stock and the index in the array where the volume is zero. Why?
Here is the code:
for i, stock in enumerate(df.columns):
    volumes = df[stock].to_numpy()
    for r in range(len(volumes)):  # len(volumes) = 1259
        try:
            v = np.log(volumes[r])
        except:
            print(stock, r)
Here is the error that follows the RuntimeWarning:
LinAlgError Traceback (most recent call last)
<ipython-input-6-6aa283671e2c> in <module>
13 closes = df_close[stock].to_numpy()
14 volumes = df_vol[stock].to_numpy()
---> 15 indicator_values_all_stocks[i] = indicator.price_volume_fit(volumes, closes, histLength)
16
17 indicator_values_all_stocks_no_NaN = indicator_values_all_stocks[:, ~np.isnan(indicator_values_all_stocks).any(axis=0)]
~\Desktop\Python Projects Organized\Finance\Indicator Statistics\B.57. Price Volume Fit\indicator.py in price_volume_fit(volumes, closes, histLength)
1259 x = log_volumes[i - histLength:i]
1260 y = log_prices[i - histLength:i]
-> 1261 model = np.polyfit(x, y, 1, full = True)
1262 slope[i] = model[0][0]
1263
<__array_function__ internals> in polyfit(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\lib\polynomial.py in polyfit(x, y, deg, rcond, full, w, cov)
629 scale = NX.sqrt((lhs*lhs).sum(axis=0))
630 lhs /= scale
--> 631 c, resids, rank, s = lstsq(lhs, rhs, rcond)
632 c = (c.T/scale).T # broadcast scale coefficients
633
<__array_function__ internals> in lstsq(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in lstsq(a, b, rcond)
2257 # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
2258 b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2259 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
2260 if m == 0:
2261 x[...] = 0
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_lstsq(err, flag)
107
108 def _raise_linalgerror_lstsq(err, flag):
--> 109 raise LinAlgError("SVD did not converge in Linear Least Squares")
110
111 def get_linalg_error_extobj(callback):
LinAlgError: SVD did not converge in Linear Least Squares
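The behaviour can be checked directly: `np.log(0)` does not raise an exception; it emits a RuntimeWarning and returns `-inf`, so a bare `except` never runs. Those `-inf` values reaching `np.polyfit` are a plausible cause of the later LinAlgError, although that is an assumption the traceback alone does not prove. A minimal sketch with hypothetical volumes, showing two ways to find the zeros: `np.errstate(divide="raise")` turns the warning into a `FloatingPointError`, or the zero entries can be located directly.

```python
import numpy as np

volumes = np.array([120.0, 0.0, 95.0, 0.0, 300.0])  # hypothetical data

# 1) np.log(0) only warns (silenced here) and returns -inf; nothing is raised.
with np.errstate(divide="ignore"):
    logs = np.log(volumes)

# 2) Ask NumPy to raise FloatingPointError instead of warning.
bad = []
with np.errstate(divide="raise"):
    for r, v in enumerate(volumes):
        try:
            np.log(v)
        except FloatingPointError:
            bad.append(r)

# 3) Vectorized alternative: find the zero entries without calling log at all.
zero_idx = np.flatnonzero(volumes == 0)
print(bad, zero_idx)
```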

Cannot cast ufunc subtract output from dtype('float64') to dtype('int64') with casting rule 'same_kind' despite forced conversion

I have a data Series ts:
0 2599.0
1 2599.0
2 3998.0
3 3998.0
4 1299.0
5 1499.0
6 1499.0
7 2997.5
8 749.5
Name: 0, dtype: float64
and I would like to predict the next period using ARIMA:
import statsmodels.tsa.api as smt

array = []
for i, row in test.iterrows():
    print("row['shop_id']: ", row['shop_id'], " row['item_id']: ", row['item_id'])
    ts = pd.DataFrame(sales_monthly.loc[pd.IndexSlice[:, [row['shop_id']], [row['item_id']]], :]['item_price'].values * sales_monthly.loc[pd.IndexSlice[:, [row['shop_id']], [row['item_id']]], :]['item_cnt_day'].values).T.iloc[0]
    rng = range(5)
    for i in rng:
        for j in rng:
            try:
                tmp_mdl = smt.ARMA(ts, order=(i, j)).fit(method='mle', trand='nc')
                tmp_aic = tmp_mdl.aic
                if tmp_aic < best_aic:
                    best_aic = tmp_aic
                    best_order = (i, j)
                    best_mdl = tmp_mdl
            except:
                continue
    if best_mdl.predict() < 0:
        y_pred = 0
    else:
        y_pred = best_mdl.predict()
    d = {'id': row['ID'], 'item_cnt_month': y_pred}
    array.append(d)
df = pd.DataFrame(array)
df
But I get:
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-104-85dfa2fa67c1> in <module>()
22 except:
23 continue
---> 24 if best_mdl.predict()<0:
25 y_pred = 0
26 else:
3 frames
/usr/local/lib/python3.6/dist-packages/statsmodels/tsa/arima_model.py in geterrors(self, params)
686 k = self.k_exog + self.k_trend
687 if k > 0:
--> 688 y -= dot(self.exog, params[:k])
689
690 k_ar = self.k_ar
UFuncTypeError: Cannot cast ufunc 'subtract' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
So I tried best_mdl.predict().astype('float32'), but it didn't change anything.
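The failing line in the traceback is the in-place `y -= dot(self.exog, params[:k])` inside statsmodels, which runs before `predict()` returns, so casting its result afterwards cannot help. NumPy refuses to store a float64 result back into an int64 array in place. A minimal reproduction with hypothetical arrays follows; whether the series the model holds is integer-typed for the failing shop/item pair is an assumption worth checking, for example with `ts.dtype`.

```python
import numpy as np

y = np.array([2599, 3998, 1299], dtype=np.int64)  # integer array, like an int-typed series
offset = np.array([0.5, 0.5, 0.5])                 # float64 result of a dot product

try:
    y -= offset  # in-place: the float64 result would have to be stored as int64
except TypeError as err:  # NumPy raises UFuncTypeError, a TypeError subclass
    message = str(err)
    print(message)

# Converting to float before the in-place subtraction avoids the error.
y_float = y.astype(np.float64)
y_float -= offset
```

If that is the cause here, converting the series with `ts = ts.astype(float)` before fitting would be the analogous change.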

Why does the same function give different errors in Python?

This is an assignment from Udacity's Linear Algebra Refresher course. A solution is available, but I am trying to solve it my own way, and I am getting an error while finding the angle between two vectors, v1 and v2.
The assignment asks for two problems to be solved with this program. The two programs below give different output even though their angle functions are the same.
import math
from decimal import Decimal, getcontext

getcontext().prec = 30


class Vector(object):
    def __init__(self, coordinates):
        try:
            if not coordinates:
                # if coordinates is not passed then it will raise ValueError
                raise ValueError
            self.coordinates = tuple([Decimal(x) for x in coordinates])
            # Outside class: Vector.coordinates will print the vector in tuple form
            # Inside class: self.coordinates will print the vector in tuple form
            self.dimension = len(coordinates)
            # Outside class: Vector.dimension will print the vector's dimension/size
            # Inside class: self.dimension will print the vector's dimension/size
        except ValueError:
            raise ValueError('The coordinates must be non empty')
        except TypeError:
            raise TypeError('The coordinates must be iterable')

    def __str__(self):
        return 'Vector:{}'.format(self.coordinates)

    def __eq__(self, v):
        return self.coordinates == v.coordinates

    def add(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] + v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def mul(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] * v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def sub(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] - v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def scal_mul(self, s):
        coordinates = []
        for i in self.coordinates:
            i = i * Decimal(s)
            coordinates.append(i)
        return coordinates

    def magnitude(self):
        mag = 0
        for i in self.coordinates:
            i = i * i
            mag = mag + i
        return math.sqrt(mag)

    def magnitude1(self):
        mag = 0
        coordinate_squre = [i * i for i in self.coordinates]
        return math.sqrt(sum(coordinate_squre))

    # def normalize(self):
    #     try:
    #         recip = Decimal(1)/self.magnitude()
    #         return Vector(self.scal_mul(recip))
    #     except ZeroDivisionError:
    #         raise Exception("Can not Normalize Zero Vector")

    def normalize(self):
        try:
            recip = 1 / self.magnitude()
            return self.scal_mul(recip)
        except ZeroDivisionError:
            raise Exception("Can not Normalize Zero Vector")

    def dot_product(self, v):
        mul = self.mul(v)
        return sum(mul)

    def dot_product2(self, v):
        # local name, so the mul() method is not overwritten on the instance
        products = [x * y for x, y in zip(self.coordinates, v.coordinates)]
        return sum(products)

    # -----------------angle function is giving wrong answer-------------
    def angle_rad(self, v):
        norm = self.normalize()
        angle = 1 / math.cos(norm.dot_product2(v))
        return angle

    def angle(self, v, in_degree=False):
        nrm_self = self.normalize()
        nrm_v = v.normalize()
        angle_rad = math.acos(nrm_self.dot_product2(nrm_v))
        if in_degree:
            angle_in_degree = angle_rad * 180. / math.pi
            return angle_in_degree
        else:
            return angle_rad


v1 = Vector([7.887, 4.138])
v2 = Vector([-8.802, 6.776])
print(v1.angle(v2))
v1 = Vector([-7.579, -7.88])
v2 = Vector([22.737, 23.64])
v2.angle(v1)
This code gives the following error:
AttributeError Traceback (most recent call last)
<ipython-input-44-2087e4f0ca26> in <module>()
101 v1 = Vector([7.887,4.138])
102 v2 = Vector([-8.802,6.776])
--> 103 print(v1.angle(v2))
104 v1 = Vector([-7.579,-7.88])
105 v2 = Vector([22.737,23.64])
<ipython-input-44-2087e4f0ca26> in angle(self, v, in_degree)
92 nrm_self = self.normalize()
93 nrm_v = v.normalize()
---> 94 angle_rad = math.acos(nrm_self.dot_product2(nrm_v))
95 if in_degree:
96 angle_in_degree = angle_rad * 180./math.pi
AttributeError: 'list' object has no attribute 'dot_product2'
And here is another program with exactly the same angle function:
import math


class Vector(object):
    def __init__(self, coordinates):
        try:
            if not coordinates:
                # if coordinates is not passed then it will raise ValueError
                raise ValueError
            self.coordinates = tuple(coordinates)
            # Outside class: Vector.coordinates will print the vector in tuple form
            # Inside class: self.coordinates will print the vector in tuple form
            self.dimension = len(coordinates)
            # Outside class: Vector.dimension will print the vector's dimension/size
            # Inside class: self.dimension will print the vector's dimension/size
        except ValueError:
            raise ValueError('The coordinates must be non empty')
        except TypeError:
            raise TypeError('The coordinates must be iterable')

    def __str__(self):
        return 'Vector:{}'.format(self.coordinates)

    def __eq__(self, v):
        return self.coordinates == v.coordinates

    def add(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] + v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def mul(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] * v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def sub(self, v):
        coordinates = []
        for i in range(0, self.dimension):
            i = self.coordinates[i] - v.coordinates[i]
            coordinates.append(i)
        return coordinates

    def scal_mul(self, s):
        coordinates = []
        for i in self.coordinates:
            i = i * s
            coordinates.append(i)
        return coordinates

    def magnitude(self):
        mag = 0
        for i in self.coordinates:
            i = i * i
            mag = mag + i
        return math.sqrt(mag)

    def magnitude1(self):
        mag = 0
        coordinate_squre = [i * i for i in self.coordinates]
        return math.sqrt(sum(coordinate_squre))

    def normalize(self):
        try:
            recip = 1 / self.magnitude()
            return Vector(self.scal_mul(recip))
        except ZeroDivisionError:
            raise Exception("Can not Normalize Zero Vector")

    def dot_product(self, v):
        mul = self.mul(v)
        return sum(mul)

    def dot_product2(self, v):
        # local name, so the mul() method is not overwritten on the instance
        products = [x * y for x, y in zip(self.coordinates, v.coordinates)]
        return sum(products)

    # -----------------angle function is giving wrong answer-------------
    def angle_rad(self, v):
        norm = self.normalize()
        angle = 1 / math.cos(norm.dot_product2(v))
        return angle

    def angle(self, v, in_degree=False):
        nrm_self = self.normalize()
        nrm_v = v.normalize()
        angle_rad = math.acos(nrm_self.dot_product2(nrm_v))
        if in_degree:
            angle_in_degree = angle_rad * 180. / math.pi
            return angle_in_degree
        else:
            return angle_rad


v1 = Vector([7.887, 4.138])
v2 = Vector([-8.802, 6.776])
print(v1.angle(v2))
v1 = Vector([-7.579, -7.88])
v2 = Vector([22.737, 23.64])
v2.angle(v1)
It gives the following output and error:
2.0023426999774925
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-214-2e3bee12967a> in <module>()
95 v1 = Vector([-7.579,-7.88])
96 v2 = Vector([22.737,23.64])
---> 97 v2.angle(v1)
98
<ipython-input-214-2e3bee12967a> in angle(self, v, in_degree)
82 nrm_self = self.normalize()
83 nrm_v = v.normalize()
---> 84 angle_rad = math.acos(nrm_self.dot_product2(nrm_v))
85 if in_degree:
86 angle_in_degree = angle_rad * 180./math.pi
ValueError: math domain error
As you can see, both programs have the same angle function.
The problem with your first program is that the normalize method returns a list of coordinates (return self.scal_mul(recip)) instead of converting that list into a Vector object, as the second program does with return Vector(self.scal_mul(recip)). When you call nrm_self.dot_product2(nrm_v), nrm_self is a list, not a Vector, so it has no dot_product2 method. Add an explicit call to the Vector() constructor, as in the second program, so that you can invoke methods on the nrm_self object.
The math domain error is raised because dot_product2 returns a value whose magnitude is slightly greater than 1, and the inverse cosine acos(x) is only defined for values between -1 and 1. Here the two vectors are anti-parallel, so the exact value is -1. This is due to floating-point rounding in the normalization, and it can be avoided by clamping the value passed to acos to the interval [-1, 1].
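A minimal sketch of an angle function that guards `math.acos` against rounding error by clamping its argument to [-1, 1]. It works on plain sequences of floats rather than the `Vector` class above, and `angle_between` is an illustrative name.

```python
import math


def angle_between(u, v, in_degrees=False):
    """Angle between two vectors given as sequences of floats."""
    mag_u = math.sqrt(sum(x * x for x in u))
    mag_v = math.sqrt(sum(x * x for x in v))
    if mag_u == 0 or mag_v == 0:
        raise ValueError("Cannot compute an angle with a zero vector")
    cos_theta = sum(a * b for a, b in zip(u, v)) / (mag_u * mag_v)
    # Clamp: rounding can push cos_theta just outside [-1, 1],
    # which would make math.acos raise "math domain error".
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    return math.degrees(theta) if in_degrees else theta


print(angle_between([7.887, 4.138], [-8.802, 6.776]))
print(angle_between([22.737, 23.64], [-7.579, -7.88]))  # anti-parallel, close to pi
```

The second pair is exactly anti-parallel ([22.737, 23.64] is -3 times [-7.579, -7.88]), so the true cosine is -1, which is where rounding most easily lands just outside the domain of acos.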
