I am doing document classification and obtained an accuracy of up to 76%. While predicting the document category, I did the following:
doc_clf.predict(tf_idf.transform((count_vect.transform([r'document']))))
and I get the following error:
File "/usr/local/lib/python3.5/dist- packages/sklearn/utils/metaestimators.py", line 115, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/sklearn/pipeline.py", line 306, in predict
Xt = transform.transform(Xt)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 923, in transform
_, X = self._count_vocab(raw_documents, fixed_vocab=True)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 792, in _count_vocab
for feature in analyze(doc):
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 266, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 232, in <lambda>
return lambda x: strip_accents(x.lower())
File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 647, in __getattr__
raise AttributeError(attr + " not found")
How do I correct this error? And is there any other way to improve the accuracy further?
I am sharing a link to the full code for review:
Full Code
In your code, doc_clf is a pipeline, so tf_idf.transform() and count_vect.transform() will be handled automatically by the pipeline.
You should only call
category = doc_clf.predict([r'document'])
As this document passes through the pipeline, it will be automatically transformed by the CountVectorizer and TfidfTransformer.
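For reference, a fitted text-classification pipeline of this kind typically looks like the sketch below (the step names, the MultinomialNB classifier, and the train_texts/train_labels variables are assumptions for illustration, not taken from the linked code):

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# The pipeline chains the vectorizer, the tf-idf transformer and the classifier,
# so predict() takes raw text and applies each step in order.
doc_clf = Pipeline([
    ('count_vect', CountVectorizer()),
    ('tf_idf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

doc_clf.fit(train_texts, train_labels)      # raw strings in; transforms happen inside
category = doc_clf.predict([r'document'])   # likewise: pass raw text, not a sparse matrix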
Related
I am training a PyTorch model. I am able to run the training script successfully on GPU instances (for example, on EC2 instances with the pytorch_p36 conda environment activated). Here is the script for reference:
https://github.com/zhanghang1989/PyTorch-Encoding/blob/master/experiments/segmentation/train.py
I adapted the script to run under SageMaker, but there I get this error, generated by something smdebug is doing:
File "ss_training_entrypoint.py", line 400, in <module>
trainer.training(epoch)
File "ss_training_entrypoint.py", line 315, in training
loss = self.criterion(outputs, target)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 543, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/encoding/parallel.py", line 132, in forward
outputs = _criterion_parallel_apply(replicas, inputs, targets, kwargs)
File "/opt/conda/lib/python3.6/site-packages/encoding/parallel.py", line 185, in _criterion_parallel_apply
raise output
File "/opt/conda/lib/python3.6/site-packages/encoding/parallel.py", line 160, in _worker
output = module(*(input + target), **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 545, in __call__
hook_result = hook(self, input, result)
File "/opt/conda/lib/python3.6/site-packages/smdebug/pytorch/hook.py", line 156, in forward_hook
module_name = module._module_name
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 587, in __getattr__
type(self).__name__, name))
AttributeError: 'SegmentationLosses' object has no attribute '_module_name'
Does anyone know why this is happening and how to fix it?
Alternatively, is it possible to disable the smdebug hooks without losing the SageMaker functionality (i.e. having the model trained and usable)?
Thanks very much!
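(On the second question: if the job is launched through the SageMaker Python SDK, one thing that may be worth trying is turning the debugger hook off on the estimator with debugger_hook_config=False. The snippet below is only a sketch with placeholder values; the role, versions and instance settings are not taken from the original setup.)

from sagemaker.pytorch import PyTorch

# Sketch: debugger_hook_config=False asks SageMaker not to attach the smdebug
# forward hooks to this training job; all other arguments are placeholders.
estimator = PyTorch(
    entry_point='ss_training_entrypoint.py',
    role=role,                     # assumed to be defined elsewhere
    framework_version='1.6.0',     # placeholder
    py_version='py3',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    debugger_hook_config=False,
)
estimator.fit(inputs)              # 'inputs' is a placeholder for the training channels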
I am training a BPR model following this repo: https://github.com/guoyang9/BPR-pytorch
I have two experiments, and they are very similar. I preprocessed them using the same method, but one of them hits the following error. I think it could be an item-size problem, but I still haven't been able to figure out the exact reason. Could someone guide me through this? Thank you.
Traceback (most recent call last):
File , line 215, in <module>
HR, NDCG = metrics(model, test_loader, top_k)
File "t.py", line 174, in metrics
prediction_i, prediction_j = model(user, item_i, item_j)
File "anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/t.py", line 137, in forward
item_i = self.embed_item(item_i)
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/anaconda3/envs/BPR/lib/python3.10/site-packages/torch/nn/functional.py", line 2183, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
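(One way to confirm the item-size suspicion, as a sketch: nn.Embedding only accepts indices in [0, num_embeddings - 1], so compare the largest item id served by the test loader with the size of the item embedding table. This assumes test_loader yields (user, item_i, item_j) batches, as the traceback above suggests.)

num_items = model.embed_item.num_embeddings   # size of the item embedding table
max_item = 0
for user, item_i, item_j in test_loader:
    max_item = max(max_item, int(item_i.max()), int(item_j.max()))

# Any id >= num_items triggers "IndexError: index out of range in self".
print(num_items, max_item)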
When I call model.predict_proba(X) on my StackingClassifier model, execution crashes because the library calls a method assert_all_finite() to check whether my dataframe contains missing values.
Since the estimators I stacked are able to handle missing values, I don't see why this should happen, and I didn't find anything in the documentation saying that StackingClassifier requires data without missing values.
It's a bit hard for me to come up with a short reproducible snippet of code, given that it comes from several layers of model abstraction, but I can print out the model call that is effectively raising the error.
p = model.predict_proba(X_loyal)
where model is:
StackingClassifier(estimators=[('ExtraTreesClassifier_117',
ExtraTreesClassifier(bootstrap=True,
class_weight={0: 1, 1: 5},
criterion='entropy',
max_depth=11,
max_features='log2',
max_samples=0.5946040593595099,
min_samples_leaf=2,
n_estimators=163,
random_state=117)),
('RandomForestClassifier_117',
RandomForestClassifier(class_weight={0: 1,
1: 5},
criterion='entropy',
max_depth=11,
max_features='log2',
max_samples=0.5946040593595099,
min_samples_leaf=2,
n_estimators=163,
random_state=117)),
('LGBMClassifier_117',
LGBMClassifier(class_weight={0: 1, 1: 1},
deterministic=True, max_depth=9,
n_estimators=183, num_leaves=3,
subsample=0.2986274713775564,
verbose=-1))])
Error
Traceback (most recent call last):
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-12-eafa75c49322>", line 1, in <module>
model.predict_proba(X_loyal)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 120, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 485, in predict_proba
return self.final_estimator_.predict_proba(self.transform(X))
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 522, in transform
return self._transform(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 215, in _transform
predictions = [
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 216, in <listcomp>
getattr(est, meth)(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 674, in predict_proba
X = self._validate_X_predict(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 422, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 407, in _validate_X_predict
X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/base.py", line 421, in _validate_data
X = check_array(X, **check_params)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 720, in check_array
_assert_all_finite(array,
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 103, in _assert_all_finite
raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Versions
sklearn.__version__
Out[6]: '0.24.2'
lightgbm.__version__
Out[8]: '3.2.1'
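(Not part of the original post, but a quick way to see which columns trip the finiteness check, assuming X_loyal is a pandas DataFrame: in scikit-learn 0.24 the forest-based base estimators reject NaN/inf at predict time, which is what _assert_all_finite enforces.)

import numpy as np

# Columns containing NaN or +/-inf are the ones _assert_all_finite complains about.
num = X_loyal.select_dtypes(include=np.number)
bad_cols = num.columns[~np.isfinite(num).all()]
print(bad_cols.tolist())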
I'm new to PyTorch and going through this tutorial on the transformer model. I'm using PyCharm on Win10.
For now, I've basically just copy-pasted the example code, but I'm getting the following error:
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUType instead (while checking arguments for embedding)
It seems to be coming from this line:
def encode(self, src, src_mask):
    return self.encoder(self.src_embed(src), src_mask)
Tbh, I'm not even sure what this means, let alone how I should go about fixing it.
What's a CPUType? When did I create a variable of that type? From looking at the code, I'm only using tensors (or numpy arrays).
Here's the full error message:
C:...\Python\Python37\lib\site-packages\torch\nn\_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
C:/.../PycharmProjects/Transformer/all_the_code.py:263: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
nn.init.xavier_uniform(p)
Traceback (most recent call last):
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 421, in
SimpleLossCompute(model.generator, criterion, model_opt))
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 297, in run_epoch
batch.src_mask, batch.trg_mask)
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 30, in forward
return self.decode(self.encode(src, src_mask), src_mask,
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 34, in encode
return self.encoder(self.src_embed(src), src_mask)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
input = module(input)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:/.../PycharmProjects/Transformer/all_the_code.py", line 218, in forward
return self.lut(x) * math.sqrt(self.d_model)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "C:...\Python\Python37\lib\site-packages\torch\nn\modules\sparse.py", line 117, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "C:...\Python\Python37\lib\site-packages\torch\nn\functional.py", line 1506, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
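(For context on the error itself: the traceback ends in the embedding lookup, and nn.Embedding requires its index tensor to have dtype torch.long. Below is a minimal sketch of the mismatch and the usual fix, with made-up tensor names rather than the tutorial's variables.)

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=11, embedding_dim=4)

src_float = torch.ones(2, 7)   # float32 token ids
# emb(src_float)               # -> "Expected tensor ... to have scalar type Long"

src_long = src_float.long()    # cast the indices before the embedding lookup
out = emb(src_long)            # works: shape (2, 7, 4)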
I'm new to Keras. I'm trying to create a new metric for Keras, but I'm having a problem when doing a loop over a symbolic Keras tensor. Can someone help me?
from keras import backend as K

def function(y_true, y_pred):
    l_y_true = K.argmax(y_true, axis=1)
    max_value = l_y_true.eval()
    for val in range(len(max_value)):
        pass
Error:
"/home/user/Documents/Playground/NeuralNetwork/Project/keras/GraphCNN.py", line 60, in my_metric label = l_y_true_max.eval()
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/gof/graph.py", line 520, in eval
self._fn_cache[inputs] = theano.function(inputs, self)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/compile/pfunc.py", line 479, in pfunc
output_keys=output_keys)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/compile/function_module.py", line 1776, in orig_function
output_keys=output_keys).create(
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/compile/function_module.py", line 1428, in __init__
accept_inplace)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/compile/function_module.py", line 177, in std_fgraph
update_mapping=update_mapping)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/gof/fg.py", line 171, in __init__
self.__import_r__(output, reason="init")
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/gof/fg.py", line 360, in __import_r__
self.__import__(variable.owner, reason=reason)
File "/home/user/Documents/keras-env/lib/python3.5/site-packages/theano/gof/fg.py", line 474, in __import__
r)
theano.gof.fg.MissingInputError: ("An input of the graph, used to compute MaxAndArgmax(output_y_target, TensorConstant{(1,) of 1}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", output_y_target)
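(For illustration, not from the original post: inside a metric, y_true is a symbolic placeholder with no data bound to it, which is why .eval() raises MissingInputError and why a Python for-loop over its values cannot work. A sketch of the same idea written purely with backend ops, assuming the goal is to compare the per-sample argmax of y_true and y_pred:)

from keras import backend as K

def my_metric(y_true, y_pred):
    # Stay symbolic: no .eval() and no Python loop over tensor values.
    true_labels = K.argmax(y_true, axis=1)
    pred_labels = K.argmax(y_pred, axis=1)
    # Example computation: fraction of samples where the argmaxes agree.
    matches = K.cast(K.equal(true_labels, pred_labels), K.floatx())
    return K.mean(matches)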