Featuretools dfs runtime error

Working through the Featuretools "predict_next_purchase" demo against my own data. I've created the entity set and have also built a new pandas DataFrame containing the labels and times. I'm now at the point of calling ft.dfs for deep feature synthesis and am getting a RuntimeError: maximum recursion depth exceeded. Below are the call and the stack trace:
feature_matrix, features = ft.dfs(target_entity='projects',
cutoff_time=labels.reset_index().loc[:,['jobnumber','time']],
training_window=inst_defn['training_window'],
entityset=es,
verbose=True)
Stack Trace:
Building features: 0it [00:00, ?it/s]
RuntimeError: maximum recursion depth exceeded
RuntimeErrorTraceback (most recent call last)
<ipython-input-743-f05fc567dd1b> in <module>()
3 training_window=inst_defn['training_window'],
4 entityset=es,
----> 5 verbose=True)
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/dfs.pyc in dfs(entities, relationships, entityset, target_entity, cutoff_time, instance_ids, agg_primitives, trans_primitives, allowed_paths, max_depth, ignore_entities, ignore_variables, seed_features, drop_contains, drop_exact, where_primitives, max_features, cutoff_time_in_index, save_progress, features_only, training_window, approximate, verbose)
164 seed_features=seed_features)
165
--> 166 features = dfs_object.build_features(verbose=verbose)
167
168 if features_only:
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/deep_feature_synthesis.pyc in build_features(self, variable_types, verbose)
227 self.where_clauses = defaultdict(set)
228 self._run_dfs(self.es[self.target_entity_id], [],
--> 229 all_features, max_depth=self.max_depth)
230
231 new_features = list(all_features[self.target_entity_id].values())
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/deep_feature_synthesis.pyc in _run_dfs(self, entity, entity_path, all_features, max_depth)
353 entity_path=list(entity_path),
354 all_features=all_features,
--> 355 max_depth=new_max_depth)
356
357 """
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/deep_feature_synthesis.pyc in _run_dfs(self, entity, entity_path, all_features, max_depth)
338 if self._apply_traversal_filters(entity, self.es[b_id],
339 entity_path,
--> 340 forward=False) and
341 b_id not in self.ignore_entities]
342 for b_entity_id in backward_entities:
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/deep_feature_synthesis.pyc in _apply_traversal_filters(self, parent_entity, child_entity, entity_path, forward)
429 child_entity=child_entity,
430 target_entity_id=self.target_entity_id,
--> 431 entity_path=entity_path, forward=forward):
432 return False
433
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/dfs_filters.pyc in is_valid(self, feature, entity, target_entity_id, child_feature, child_entity, entity_path, forward, where)
53
54 if type(feature) != list:
---> 55 return func(*args)
56
57 else:
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/synthesis/dfs_filters.pyc in apply_filter(self, parent_entity, child_entity, target_entity_id, entity_path, forward)
76 if (parent_entity.id == target_entity_id or
77 es.find_backward_path(parent_entity.id,
---> 78 target_entity_id) is None):
79 return True
80 path = es.find_backward_path(parent_entity.id, child_entity.id)
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/entityset/base_entityset.pyc in find_backward_path(self, start_entity_id, goal_entity_id)
308 is returned if no path exists.
309 """
--> 310 forward_path = self.find_forward_path(goal_entity_id, start_entity_id)
311 if forward_path is not None:
312 return forward_path[::-1]
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/entityset/base_entityset.pyc in find_forward_path(self, start_entity_id, goal_entity_id)
287
288 for r in self.get_forward_relationships(start_entity_id):
--> 289 new_path = self.find_forward_path(r.parent_entity.id, goal_entity_id)
290 if new_path is not None:
291 return [r] + new_path
... last 1 frames repeated, from the frame below ...
/Users/nbernini/OneDrive/PSC/venv/ml20/lib/python2.7/site-packages/featuretools/entityset/base_entityset.pyc in find_forward_path(self, start_entity_id, goal_entity_id)
287
288 for r in self.get_forward_relationships(start_entity_id):
--> 289 new_path = self.find_forward_path(r.parent_entity.id, goal_entity_id)
290 if new_path is not None:
291 return [r] + new_path
RuntimeError: maximum recursion depth exceeded

The issue here is cyclical relationships in your entity set. Currently, Deep Feature Synthesis can only create features when there is one unique path between two entities. If you have an entity with a relationship to itself, you would also get this error.
A future release of Featuretools will offer better support for this use case.
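If it helps to see the failure mode concretely, here is a small illustrative sketch in plain Python (not the Featuretools API, and with hypothetical entity ids) that counts the forward traversal paths implied by a set of parent/child relationships. More than one path between two entities, or a self-relationship, is exactly the situation Deep Feature Synthesis cannot currently handle:
from collections import defaultdict

def count_forward_paths(relationships, start, goal):
    """Count distinct child -> parent traversal paths from start to goal.
    Assumes the relationship graph is acyclic; a cycle would recurse forever,
    which mirrors the RecursionError above."""
    parents = defaultdict(list)
    for parent, child in relationships:
        parents[child].append(parent)
    if start == goal:
        return 1
    return sum(count_forward_paths(relationships, p, goal) for p in parents[start])

# Hypothetical entity set: invoices can reach customers both directly and via projects.
relationships = [
    ('customers', 'projects'),
    ('customers', 'invoices'),
    ('projects', 'invoices'),
]

for parent, child in relationships:
    if parent == child:
        print('self-relationship on', parent)   # a self-relationship also triggers the error

print(count_forward_paths(relationships, 'invoices', 'customers'))   # 2 -> ambiguous paths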

Related

fitting ARIMA model including exogenous predictor, getting DLASCL error

I have two pandas DataFrames, named stationary_train and test_exog. Their shapes and some sample data are shown below. I'm trying to fit an ARIMA model to the data using ARIMA from statsmodels, forecasting the endogenous variable stationary_train with the exogenous variable test_exog, using the code below. I'm getting the error below and I'm unclear why: the two DataFrames are the same shape, and I don't see any missing values in either one. Can anyone see what the issue is and suggest how to fix it?
data:
test_exog[:-1].shape
(203, 1)
stationary_train.shape
(203, 1)
print(exog_auto_model.order)
(11, 0, 6)
print(test_exog[:-1].head())
exog_passengers
month
2000-01-01 46513.9
2000-02-01 48555.7
2000-03-01 58812.4
2000-04-01 56101.1
2000-05-01 58237.4
print(stationary_train.head())
passengers
month
2000-02-01 2034.0
2000-03-01 10238.0
2000-04-01 -2731.0
2000-05-01 2168.0
2000-06-01 2872.0
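A quick sanity check along these lines, using the two DataFrames above, is one way to confirm the shapes, index alignment, and absence of NaN or infinite values before fitting:
import numpy as np

endog = stationary_train        # endogenous series, expected shape (203, 1)
exog = test_exog[:-1]           # exogenous regressors, expected shape (203, 1)

print(endog.shape, exog.shape)
print(endog.isna().any().any(), exog.isna().any().any())            # any NaNs?
print(np.isinf(endog.values).any(), np.isinf(exog.values).any())    # any infs?
print(endog.index.equals(exog.index))                               # do the monthly indexes line up?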
code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# statmodels
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_model import ARMA, ARIMA
# datetime
from datetime import datetime
ARIMA(endog=stationary_train.values.reshape(-1,1),
exog=test_exog[:-1],
order=exog_auto_model.order).fit()
error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-90-ae6d07b32cf7> in <module>
1 exog_predict= ARIMA(endog=stationary_train.values.reshape(-1,1),
2 exog=test_exog[:-1],
----> 3 order=exog_auto_model.order).fit()
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in fit(self, start_params, trend, method, transparams, solver, maxiter, full_output, disp, callback, start_ar_lags, **kwargs)
1028 maxiter=maxiter,
1029 full_output=full_output, disp=disp,
-> 1030 callback=callback, **kwargs)
1031 params = mlefit.params
1032
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/base/model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)
525 callback=callback,
526 retall=retall,
--> 527 full_output=full_output)
528
529 # NOTE: this is for fit_regularized and should be generalized
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit(self, objective, gradient, start_params, fargs, kwargs, hessian, method, maxiter, full_output, disp, callback, retall)
225 disp=disp, maxiter=maxiter, callback=callback,
226 retall=retall, full_output=full_output,
--> 227 hess=hessian)
228
229 optim_settings = {'optimizer': method, 'start_params': start_params,
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit_lbfgs(f, score, start_params, fargs, kwargs, disp, maxiter, callback, retall, full_output, hess)
630 callback=callback, args=fargs,
631 bounds=bounds, disp=disp,
--> 632 **extra_kwargs)
633
634 if full_output:
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in fmin_l_bfgs_b(func, x0, fprime, args, approx_grad, bounds, m, factr, pgtol, epsilon, iprint, maxfun, maxiter, disp, callback, maxls)
196
197 res = _minimize_lbfgsb(fun, x0, args=args, jac=jac, bounds=bounds,
--> 198 **opts)
199 d = {'grad': res['jac'],
200 'task': res['message'],
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, finite_diff_rel_step, **unknown_options)
306 sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
307 bounds=new_bounds,
--> 308 finite_diff_rel_step=finite_diff_rel_step)
309
310 func_and_grad = sf.fun_and_grad
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/optimize.py in _prepare_scalar_function(fun, x0, jac, args, bounds, epsilon, finite_diff_rel_step, hess)
260 # calculation reduces overall function evaluations.
261 sf = ScalarFunction(fun, x0, args, grad, hess,
--> 262 finite_diff_rel_step, bounds, epsilon=epsilon)
263
264 return sf
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in __init__(self, fun, x0, args, grad, hess, finite_diff_rel_step, finite_diff_bounds, epsilon)
74
75 self._update_fun_impl = update_fun
---> 76 self._update_fun()
77
78 # Gradient evaluation
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in _update_fun(self)
164 def _update_fun(self):
165 if not self.f_updated:
--> 166 self._update_fun_impl()
167 self.f_updated = True
168
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in update_fun()
71
72 def update_fun():
---> 73 self.f = fun_wrapped(self.x)
74
75 self._update_fun_impl = update_fun
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/scipy/optimize/_differentiable_functions.py in fun_wrapped(x)
68 def fun_wrapped(x):
69 self.nfev += 1
---> 70 return fun(x, *args)
71
72 def update_fun():
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/base/model.py in f(params, *args)
499
500 def f(params, *args):
--> 501 return -self.loglike(params, *args) / nobs
502
503 if method == 'newton':
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in loglike(self, params, set_sigma2)
839 method = self.method
840 if method in ['mle', 'css-mle']:
--> 841 return self.loglike_kalman(params, set_sigma2)
842 elif method == 'css':
843 return self.loglike_css(params, set_sigma2)
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/tsa/arima_model.py in loglike_kalman(self, params, set_sigma2)
849 Compute exact loglikelihood for ARMA(p,q) model by the Kalman Filter.
850 """
--> 851 return KalmanFilter.loglike(params, self, set_sigma2)
852
853 def loglike_css(self, params, set_sigma2=True):
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/statsmodels/tsa/kalmanf/kalmanfilter.py in loglike(cls, params, arma_model, set_sigma2)
218 loglike, sigma2 = kalman_loglike.kalman_loglike_double(
219 y, k, k_ar, k_ma, k_lags, int(nobs),
--> 220 Z_mat, R_mat, T_mat)
221 elif np.issubdtype(paramsdtype, np.complex128):
222 loglike, sigma2 = kalman_loglike.kalman_loglike_complex(
statsmodels/tsa/kalmanf/kalman_loglike.pyx in statsmodels.tsa.kalmanf.kalman_loglike.kalman_loglike_double()
statsmodels/tsa/kalmanf/kalman_loglike.pyx in statsmodels.tsa.kalmanf.kalman_loglike.kalman_filter_double()
<__array_function__ internals> in pinv(*args, **kwargs)
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/numpy/linalg/linalg.py in pinv(a, rcond, hermitian)
2001 return wrap(res)
2002 a = a.conjugate()
-> 2003 u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
2004
2005 # discard small singular values
<__array_function__ internals> in svd(*args, **kwargs)
~/anaconda3/envs/arima_forecast/lib/python3.6/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv, hermitian)
1659
1660 signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1661 u, s, vh = gufunc(a, signature=signature, extobj=extobj)
1662 u = u.astype(result_t, copy=False)
1663 s = s.astype(_realType(result_t), copy=False)
ValueError: On entry to DLASCL parameter number 4 had an illegal value

eli5 explain_weights_xgboost KeyError: 'bias'

I am new to xgboost. I trained a model that works pretty well. Now I am trying to use eli5 to see the weights, and I get: KeyError: 'bias'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
in
3 clf6 = model6.named_steps['clf']
4 vec6 = model6.named_steps['transformer']
----> 5 explain_weights_xgboost(clf6, vec=vec6)
~/dev/envs/env3.7/lib/python3.7/site-packages/eli5/xgboost.py in explain_weights_xgboost(xgb, vec, top, target_names, targets, feature_names, feature_re, feature_filter, importance_type)
80 description=DESCRIPTION_XGBOOST,
81 is_regression=is_regression,
---> 82 num_features=coef.shape[-1],
83 )
84
~/dev/envs/env3.7/lib/python3.7/site-packages/eli5/_feature_importances.py in get_feature_importance_explanation(estimator, vec, coef, feature_names, feature_filter, feature_re, top, description, is_regression, estimator_feature_names, num_features, coef_std)
35 feature_filter=feature_filter,
36 feature_re=feature_re,
---> 37 num_features=num_features,
38 )
39 feature_importances = get_feature_importances_filtered(
~/dev/envs/env3.7/lib/python3.7/site-packages/eli5/sklearn/utils.py in get_feature_names_filtered(clf, vec, bias_name, feature_names, num_features, feature_filter, feature_re, estimator_feature_names)
124 feature_names=feature_names,
125 num_features=num_features,
--> 126 estimator_feature_names=estimator_feature_names,
127 )
128 return feature_names.handle_filter(feature_filter, feature_re)
~/dev/envs/env3.7/lib/python3.7/site-packages/eli5/sklearn/utils.py in get_feature_names(clf, vec, bias_name, feature_names, num_features, estimator_feature_names)
77 features are named x0, x1, x2, etc.
78 """
---> 79 if not has_intercept(clf):
80 bias_name = None
81
~/dev/envs/env3.7/lib/python3.7/site-packages/eli5/sklearn/utils.py in has_intercept(estimator)
60 if hasattr(estimator, 'fit_intercept'):
61 return estimator.fit_intercept
---> 62 if hasattr(estimator, 'intercept_'):
63 if estimator.intercept_ is None:
64 return False
~/dev/envs/env3.7/lib/python3.7/site-packages/xgboost/sklearn.py in intercept_(self)
743 .format(self.booster))
744 b = self.get_booster()
--> 745 return np.array(json.loads(b.get_dump(dump_format='json')[0])['bias'])
746
747
KeyError: 'bias'
Thank you!
I had the same issue and fixed it by explicitly specifying the booster argument when creating the estimator:
clf = XGBClassifier(booster='gbtree')
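For context, a minimal end-to-end sketch of that fix (with a hypothetical pipeline and toy data; your transformer and training data will differ) looks roughly like this:
import eli5
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

# Toy stand-in data for illustration only.
texts = ["good product", "terrible service", "great value", "awful quality"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ('transformer', TfidfVectorizer()),
    ('clf', XGBClassifier(booster='gbtree')),   # explicit booster avoids the KeyError: 'bias'
])
model.fit(texts, labels)

clf = model.named_steps['clf']
vec = model.named_steps['transformer']
eli5.explain_weights(clf, vec=vec)   # the top-level eli5 API dispatches to the xgboost handler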

FeatureTools TypeError: unhashable type: 'set'

I'm trying this code for featuretools:
features, feature_names = ft.dfs(entityset = es, target_entity = 'demo',
agg_primitives = ['count', 'max', 'time_since_first', 'median', 'time_since_last', 'avg_time_between',
'sum', 'mean'],
trans_primitives = ['is_weekend', 'year', 'week', 'divide_by_feature', 'percentile'])
But I got this error:
TypeError Traceback (most recent call last)
<ipython-input-17-89e925ff895d> in <module>
3 agg_primitives = ['count', 'max', 'time_since_first', 'median', 'time_since_last', 'avg_time_between',
4 'sum', 'mean'],
----> 5 trans_primitives = ['is_weekend', 'year', 'week', 'divide_by_feature', 'percentile'])
~/.local/lib/python3.6/site-packages/featuretools/utils/entry_point.py in function_wrapper(*args, **kwargs)
44 ep.on_error(error=e,
45 runtime=runtime)
---> 46 raise e
47
48 # send return value
~/.local/lib/python3.6/site-packages/featuretools/utils/entry_point.py in function_wrapper(*args, **kwargs)
36 # call function
37 start = time.time()
---> 38 return_value = func(*args, **kwargs)
39 runtime = time.time() - start
40 except Exception as e:
~/.local/lib/python3.6/site-packages/featuretools/synthesis/dfs.py in dfs(entities, relationships, entityset, target_entity, cutoff_time, instance_ids, agg_primitives, trans_primitives, groupby_trans_primitives, allowed_paths, max_depth, ignore_entities, ignore_variables, seed_features, drop_contains, drop_exact, where_primitives, max_features, cutoff_time_in_index, save_progress, features_only, training_window, approximate, chunk_size, n_jobs, dask_kwargs, verbose, return_variable_types)
226 n_jobs=n_jobs,
227 dask_kwargs=dask_kwargs,
--> 228 verbose=verbose)
229 return feature_matrix, features
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/calculate_feature_matrix.py in calculate_feature_matrix(features, entityset, cutoff_time, instance_ids, entities, relationships, cutoff_time_in_index, training_window, approximate, save_progress, verbose, chunk_size, n_jobs, dask_kwargs)
265 cutoff_df_time_var=cutoff_df_time_var,
266 target_time=target_time,
--> 267 pass_columns=pass_columns)
268
269 feature_matrix = pd.concat(feature_matrix)
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/calculate_feature_matrix.py in linear_calculate_chunks(chunks, feature_set, approximate, training_window, verbose, save_progress, entityset, no_unapproximated_aggs, cutoff_df_time_var, target_time, pass_columns)
496 no_unapproximated_aggs,
497 cutoff_df_time_var,
--> 498 target_time, pass_columns)
499 feature_matrix.append(_feature_matrix)
500 # Do a manual garbage collection in case objects from calculate_chunk
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/calculate_feature_matrix.py in calculate_chunk(chunk, feature_set, entityset, approximate, training_window, verbose, save_progress, no_unapproximated_aggs, cutoff_df_time_var, target_time, pass_columns)
341 ids,
342 precalculated_features=precalculated_features_trie,
--> 343 training_window=window)
344
345 id_name = _feature_matrix.index.name
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/utils.py in wrapped(*args, **kwargs)
35 def wrapped(*args, **kwargs):
36 if save_progress is None:
---> 37 r = method(*args, **kwargs)
38 else:
39 time = args[0].to_pydatetime()
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/calculate_feature_matrix.py in calc_results(time_last, ids, precalculated_features, training_window)
316 ignored=all_approx_feature_set)
317
--> 318 matrix = calculator.run(ids)
319 return matrix
320
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/feature_set_calculator.py in run(self, instance_ids)
100 precalculated_trie=self.precalculated_features,
101 filter_variable=target_entity.index,
--> 102 filter_values=instance_ids)
103
104 # The dataframe for the target entity should be stored at the root of
~/.local/lib/python3.6/site-packages/featuretools/computational_backends/feature_set_calculator.py in _calculate_features_for_entity(self, entity_id, feature_trie, df_trie, full_entity_df_trie, precalculated_trie, filter_variable, filter_values, parent_data)
187 columns=columns,
188 time_last=self.time_last,
--> 189 training_window=self.training_window)
190
191 # Step 2: Add variables to the dataframe linking it to all ancestors.
~/.local/lib/python3.6/site-packages/featuretools/entityset/entity.py in query_by_values(self, instance_vals, variable_id, columns, time_last, training_window)
271
272 if columns is not None:
--> 273 df = df[columns]
274
275 return df
~/.local/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2686 return self._getitem_multilevel(key)
2687 else:
-> 2688 return self._getitem_column(key)
2689
2690 def _getitem_column(self, key):
~/.local/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2693 # get column
2694 if self.columns.is_unique:
-> 2695 return self._get_item_cache(key)
2696
2697 # duplicate columns & possible reduce dimensionality
~/.local/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
2485 """Return the cached item, item represents a label indexer."""
2486 cache = self._item_cache
-> 2487 res = cache.get(item)
2488 if res is None:
2489 values = self._data.get(item)
TypeError: unhashable type: 'set'
I also tried the simplest possible call to deep feature synthesis (dfs), as shown below, but I still get the same error:
features, feature_names = ft.dfs(entityset = es, target_entity = 'demo')
I'm not really sure why I'm encountering this error; any help or recommendations on how to proceed from here are deeply appreciated.
Thanks in advance for your help!
I found a solution: my version had a bug that has since been fixed by the Featuretools team. Just run pip install directly from master:
pip install --upgrade https://github.com/featuretools/featuretools/zipball/master
The fix has been released in Featuretools 0.9.1, so if you upgrade to the latest version of Featuretools the error will go away.
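If you're not sure which version you have installed, a quick check is:
import featuretools as ft
print(ft.__version__)   # anything >= 0.9.1 should include this fix
# To pull the released fix from PyPI instead of master:
#   pip install --upgrade featuretools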

package cvxpy new syntax error summing entries

I’m new to the cvxpy package. I’m trying to use it to work through an example from the following blog:
https://towardsdatascience.com/integer-programming-in-python-1cbdfa240df2
where the goal is to optimize the combination of marketing channels sent to each customer.
There have been some recent changes to the cvxpy package, and I’m getting the error below when I try to run the sum_entries step (which in the latest version has been renamed to cvxpy.sum).
I think the problem is coming from the dimensions of “selection” and “TRANSFORMER” being incompatible, but I’m not familiar enough with the cvxpy package to know. Any tips are greatly appreciated.
Code:
test_probs.shape
(200, 8)
Code:
# selection = cvxpy.Bool(*test_probs.shape) # syntax changed in latest version
selection = cvxpy.Variable(*test_probs.shape, boolean=True)
# constraints
# Constant matrix that counts how many of each
# material we sent to each customer
TRANSFORMER = np.array([[1,0,0],
[0,1,0],
[0,0,1],
[1,1,0],
[1,0,1],
[0,1,1],
[1,1,1],
[0,0,0]])
# can't send customer more promotion than there is supply
# note: sum_entries changed to sum in latest cvxpy version
supply_constraint = cvxpy.sum(selection * TRANSFORMER, axis=0) <= supply
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-f2ebf41a00af> in <module>()
18 # note: sum_entries changed to sum in latest cvxpy version
19
---> 20 supply_constraint = cvxpy.sum(selection * TRANSFORMER, axis=0) <= supply
21
22
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in cast_op(self, other)
47 """
48 other = self.cast_to_const(other)
---> 49 return binary_op(self, other)
50 return cast_op
51
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in __mul__(self, other)
385 return cvxtypes.multiply_expr()(self, other)
386 elif self.is_constant() or other.is_constant():
--> 387 return cvxtypes.mul_expr()(self, other)
388 else:
389 warnings.warn("Forming a nonconvex expression.")
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/affine/binary_operators.py in __init__(self, lh_exp, rh_exp)
41
42 def __init__(self, lh_exp, rh_exp):
---> 43 super(BinaryOperator, self).__init__(lh_exp, rh_exp)
44
45 def name(self):
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/atom.py in __init__(self, *args)
42 self.args = [Atom.cast_to_const(arg) for arg in args]
43 self.validate_arguments()
---> 44 self._shape = self.shape_from_args()
45 if len(self._shape) > 2:
46 raise ValueError("Atoms must be at most 2D.")
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/affine/binary_operators.py in shape_from_args(self)
107 """Returns the (row, col) shape of the expression.
108 """
--> 109 return u.shape.mul_shapes(self.args[0].shape, self.args[1].shape)
110
111 def is_atom_convex(self):
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/utilities/shape.py in mul_shapes(lh_shape, rh_shape)
140 lh_old = lh_shape
141 rh_old = rh_shape
--> 142 lh_shape, rh_shape, shape = mul_shapes_promote(lh_shape, rh_shape)
143 if lh_shape != lh_old:
144 shape = shape[1:]
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/utilities/shape.py in mul_shapes_promote(lh_shape, rh_shape)
107 if lh_mat_shape[1] != rh_mat_shape[0]:
108 raise ValueError("Incompatible dimensions %s %s" % (
--> 109 lh_shape, rh_shape))
110 if lh_shape[:-2] != rh_shape[:-2]:
111 raise ValueError("Incompatible dimensions %s %s" % (
ValueError: Incompatible dimensions (1, 200) (8, 3)
Update:
I tried changing the selection shape as suggested in the comment below.
code:
selection = cvxpy.Variable(test_probs.shape, boolean=True)
and now I get a new error when I run the supply_constraint part of the code below.
code:
# constraints
# Constant matrix that counts how many of each
# material we sent to each customer
TRANSFORMER = np.array([[1,0,0],
[0,1,0],
[0,0,1],
[1,1,0],
[1,0,1],
[0,1,1],
[1,1,1],
[0,0,0]])
# can't send customer more promotion than there is supply
# note: sum_entries changed to sum in latest cvxpy version
supply_constraint = cvxpy.sum(selection * TRANSFORMER, axis=0) <= supply
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-6eb7a55ea896> in <module>()
18 # note: sum_entries changed to sum in latest cvxpy version
19
---> 20 supply_constraint = cvxpy.sum(selection * TRANSFORMER, axis=0) <= supply
21
22
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in cast_op(self, other)
47 """
48 other = self.cast_to_const(other)
---> 49 return binary_op(self, other)
50 return cast_op
51
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in __le__(self, other)
482 """NonPos : Creates an inequality constraint.
483 """
--> 484 return NonPos(self - other)
485
486 def __lt__(self, other):
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in cast_op(self, other)
47 """
48 other = self.cast_to_const(other)
---> 49 return binary_op(self, other)
50 return cast_op
51
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in __sub__(self, other)
370 """Expression : The difference of two expressions.
371 """
--> 372 return self + -other
373
374 #_cast_other
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in cast_op(self, other)
47 """
48 other = self.cast_to_const(other)
---> 49 return binary_op(self, other)
50 return cast_op
51
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/expressions/expression.py in __add__(self, other)
358 """Expression : Sum two expressions.
359 """
--> 360 return cvxtypes.add_expr()([self, other])
361
362 #_cast_other
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/affine/add_expr.py in __init__(self, arg_groups)
34 # For efficiency group args as sums.
35 self._arg_groups = arg_groups
---> 36 super(AddExpression, self).__init__(*arg_groups)
37 self.args = []
38 for group in arg_groups:
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/atom.py in __init__(self, *args)
42 self.args = [Atom.cast_to_const(arg) for arg in args]
43 self.validate_arguments()
---> 44 self._shape = self.shape_from_args()
45 if len(self._shape) > 2:
46 raise ValueError("Atoms must be at most 2D.")
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/atoms/affine/add_expr.py in shape_from_args(self)
42 """Returns the (row, col) shape of the expression.
43 """
---> 44 return u.shape.sum_shapes([arg.shape for arg in self.args])
45
46 def expand_args(self, expr):
~/anaconda2/envs/py36/lib/python3.6/site-packages/cvxpy/utilities/shape.py in sum_shapes(shapes)
50 raise ValueError(
51 "Cannot broadcast dimensions " +
---> 52 len(shapes)*" %s" % tuple(shapes))
53
54 longer = shape if len(shape) >= len(t) else t
ValueError: Cannot broadcast dimensions (3,) (1, 3)
Your issue is happening when you create the selection variable. You are unpacking the shape tuple into multiple arguments. The first argument to Variable should be a shape. So the correct construction is:
selection = cvxpy.Variable(test_probs.shape, boolean=True)
You can verify this is correct by inspecting the shape attribute:
selection.shape
Which should now give:
(200, 8)
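Putting it together, a minimal sketch of the corrected construction (with a made-up test_probs array and an assumed flat, length-3 supply vector) would be:
import numpy as np
import cvxpy

test_probs = np.random.rand(200, 8)     # stand-in for the real model scores
supply = np.array([100, 150, 80])       # assumption: one supply figure per material

# Pass the shape tuple itself; don't unpack it into separate arguments.
selection = cvxpy.Variable(test_probs.shape, boolean=True)
print(selection.shape)                  # (200, 8)

TRANSFORMER = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [1, 1, 0],
                        [1, 0, 1],
                        [0, 1, 1],
                        [1, 1, 1],
                        [0, 0, 0]])

# (200, 8) @ (8, 3) -> (200, 3); summing over axis 0 leaves one total per material.
supply_constraint = cvxpy.sum(selection @ TRANSFORMER, axis=0) <= supply
The follow-up broadcast error in the update ((3,) versus (1, 3)) suggests that supply is a 1 x 3 array rather than a flat vector; if so, np.ravel(supply) would give it the shape assumed here.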

fit_transform error using CountVectorizer

So I have a dataframe X which looks something like this:
X.head()
0 My wife took me here on my birthday for breakf...
1 I have no idea why some people give bad review...
3 Rosie, Dakota, and I LOVE Chaparral Dog Park!!...
4 General Manager Scott Petello is a good egg!!!...
6 Drop what you're doing and drive here. After I...
Name: text, dtype: object
And then,
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X = cv.fit_transform(X)
But I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-61-8ff79b91e317> in <module>()
----> 1 X = cv.fit_transform(X)
~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in fit_transform(self, raw_documents, y)
867
868 vocabulary, X = self._count_vocab(raw_documents,
--> 869 self.fixed_vocabulary_)
870
871 if self.binary:
~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in _count_vocab(self, raw_documents, fixed_vocab)
790 for doc in raw_documents:
791 feature_counter = {}
--> 792 for feature in analyze(doc):
793 try:
794 feature_idx = vocabulary[feature]
~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(doc)
264
265 return lambda doc: self._word_ngrams(
--> 266 tokenize(preprocess(self.decode(doc))), stop_words)
267
268 else:
~/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py in <lambda>(x)
230
231 if self.lowercase:
--> 232 return lambda x: strip_accents(x.lower())
233 else:
234 return strip_accents
~/anaconda3/lib/python3.6/site-packages/scipy/sparse/base.py in __getattr__(self, attr)
574 return self.getnnz()
575 else:
--> 576 raise AttributeError(attr + " not found")
577
578 def transpose(self, axes=None, copy=False):
AttributeError: lower not found
No idea why.
You need to specify the column name of the text data, even if the dataframe has a single column:
X_countMatrix = cv.fit_transform(X['text'])
CountVectorizer expects an iterable of documents as input, and when you supply a DataFrame as the argument, the only thing that gets iterated is the column names. So even if you did not get an error, the result would be incorrect. It's lucky that you got an error and have a chance to correct it.
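As a concrete illustration, with toy reviews standing in for the real data, passing the column (a Series of strings) rather than the whole DataFrame works as expected:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

X = pd.DataFrame({'text': [
    "My wife took me here on my birthday for breakfast",
    "I have no idea why some people give bad reviews",
    "Drop what you're doing and drive here",
]})

cv = CountVectorizer()
X_countMatrix = cv.fit_transform(X['text'])   # iterate over documents, not column names
print(X_countMatrix.shape)                    # (3, vocabulary size)
Keeping the result under a new name such as X_countMatrix also avoids overwriting the original DataFrame, so the cell can be re-run safely.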
