Using GridSearchCV for NLP Missing Positional Argument Self - python-3.x

I am working on an NLP problem. I've been testing various models and the process has been working fine.
from sklearn.linear_model import SGDClassifier
classifier = SGDClassifier().fit(X_train_tfidf, y_train)
y_predicted_tfidf = classifier.predict(X_test_tfidf)
from sklearn.metrics import precision_score
precision = precision_score(y_test, y_predicted_tfidf, pos_label=None,average='weighted')
print(precision)
>>> 0.79708294305
Now I am trying to use grid search to fine-tune the parameters, and I am running into an error.
from sklearn.model_selection import GridSearchCV
parameters = {'alpha': [0.00001, 0.0001, 0.001, 0.01] }
gs_classifier = GridSearchCV(SGDClassifier, parameters, n_jobs=-1)
gs_classifier = gs_classifier.fit(X_train_tfidf, y_train)
Which results in the following output:
TypeError Traceback (most recent call last)
<ipython-input-25-95b85f78662f> in <module>()
1 gs_classifier = GridSearchCV(SGDClassifier, parameters, n_jobs=-1)
----> 2 gs_classifier = gs_classifier.fit(X_train_tfidf, y_train)
anaconda/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
943 train/test set.
944 """
--> 945 return self._fit(X, y, groups,
...
/anaconda/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
65 % (repr(estimator), type(estimator)))
66 klass = estimator.__class__
---> 67 new_object_params = estimator.get_params(deep=False)
68 for name, param in six.iteritems(new_object_params):
69 new_object_params[name] = clone(param, safe=False)
TypeError: get_params() missing 1 required positional argument: 'self'
I've tried various combinations of parameters and all result in the same error. For this example I've kept it simple and am just using a range of alpha values.
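The traceback shows clone() calling estimator.get_params(deep=False) on whatever was passed to GridSearchCV. In the code above that is the SGDClassifier class itself rather than an instance, so get_params has no self to bind to. A minimal sketch of the likely fix, using synthetic data as a stand-in for X_train_tfidf and y_train (the data and random_state are assumptions for illustration only):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train_tfidf / y_train (assumption, not the real data).
rng = np.random.RandomState(0)
X_train = rng.rand(200, 10)
y_train = (X_train[:, 0] + X_train[:, 1] > 1.0).astype(int)

parameters = {"alpha": [0.00001, 0.0001, 0.001, 0.01]}

# Note the parentheses: SGDClassifier() is an instance, SGDClassifier is the class.
gs_classifier = GridSearchCV(SGDClassifier(random_state=0), parameters, cv=3)
gs_classifier.fit(X_train, y_train)

# Best alpha found by cross-validation, and the refitted model.
print(gs_classifier.best_params_)
best_model = gs_classifier.best_estimator_
```

The only change that matters relative to the question is SGDClassifier() in place of SGDClassifier; n_jobs=-1 can be added back unchanged.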

Related

How can I solve svm predict model problem

I'm having a problem with my SVM prediction model:
from sklearn.svm import SVC
svm_model = SVC(kernel='rbf', C=8, gamma=0.1)
svm_model.fit(X_train_std, y_train)
y_pred = svm_model.predict(X_test_std)
/usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:993: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-398f1caaa8e8> in <module>
3 svm_model = SVC(kernel='rbf', C=8, gamma=0.1)
4
----> 5 svm_model.fit(X_train_std, y_train)
6
7 y_pred = svm_model.predict(X_test_std)
2 frames
/usr/local/lib/python3.8/dist-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
195 "multilabel-sequences",
196 ]:
--> 197 raise ValueError("Unknown label type: %r" % y_type)
198
199
ValueError: Unknown label type: 'continuous'
I thought it was a problem with the type of y, so I tried:
train = pd.get_dummies(train, columns=['LSTAT'], drop_first=True)
I used that, but the problem did not go away.
Can somebody help me?
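The traceback reports that y is of type 'continuous', which a classifier such as SVC cannot fit. A minimal sketch on synthetic data of the two usual ways out, assuming the target really is a continuous quantity: either switch to a regressor such as SVR, or turn the target into discrete classes first. The arrays and the median threshold below are assumptions for illustration, not taken from the question; ravel() also addresses the DataConversionWarning about a column-vector y.

```python
import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data (assumption): continuous target as a column vector.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X[:, [0]] * 10.0 + rng.rand(100, 1)   # shape (100, 1), float values

y_flat = y.ravel()                        # shape (100,), avoids the warning
print(type_of_target(y_flat))             # what scikit-learn thinks y is

# Option A: treat the task as regression.
svr_model = SVR(kernel="rbf", C=8, gamma=0.1)
svr_model.fit(X, y_flat)
y_pred_reg = svr_model.predict(X)

# Option B: bin the target into classes (here: above/below the median),
# then a classifier can be used.
y_class = (y_flat > np.median(y_flat)).astype(int)
svm_model = SVC(kernel="rbf", C=8, gamma=0.1)
svm_model.fit(X, y_class)
y_pred_cls = svm_model.predict(X)
```

Which option fits depends on whether the quantity being predicted is a number or a category; one-hot encoding a feature column with get_dummies does not change the type of y.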

ValueError: continuous is not supported for RandomForestRegressor

After preprocessing the weld data with a Pipeline, I got clean data as output. Next, I need to pass the cleaned data to the model for training. Both the preprocessing and the model training steps can be encapsulated in a single Pipeline as follows:
from sklearn.ensemble import RandomForestRegressor
completed_pl = Pipeline(
steps=[
("preprocessor", preprocessor),
("classifier", RandomForestRegressor())
]
)
# training
completed_pl.fit(X_train, y_train)
# accuracy
y_train_pred = completed_pl.predict(X_train)
print(f"Accuracy on train: {accuracy_score(list(y_train), list(y_train_pred)):.2f}")
y_pred = completed_pl.predict(X_test)
print(f"Accuracy on test: {accuracy_score(list(y_test), list(y_pred)):.2f}")
I used the load_boston dataset from sklearn.
And the error:
ValueError Traceback (most recent call last)
<ipython-input-86-d0b1928cf1a7> in <module>
12 # accuracy
13 y_train_pred = completed_pl.predict(X_train)
---> 14 print(f"Accuracy on train: {accuracy_score(list(y_train), list(y_train_pred)):.2f}")
15
16 y_pred = completed_pl.predict(X_test)
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
102 # No metrics support "multiclass-multioutput" format
103 if y_type not in ["binary", "multiclass", "multilabel-indicator"]:
--> 104 raise ValueError("{0} is not supported".format(y_type))
105
106 if y_type in ["binary", "multiclass"]:
ValueError: continuous is not supported
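The error is raised by accuracy_score, not by the pipeline: RandomForestRegressor predicts continuous values, and accuracy_score only accepts class labels. A minimal sketch that evaluates the same kind of pipeline with regression metrics instead. It uses synthetic data from make_regression and a StandardScaler as a placeholder preprocessor, both of which are assumptions (load_boston was removed in scikit-learn 1.2, and the original preprocessor is not shown):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the real dataset (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

completed_pl = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),  # placeholder preprocessor
        ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]
)
completed_pl.fit(X_train, y_train)

y_train_pred = completed_pl.predict(X_train)
y_pred = completed_pl.predict(X_test)

# Regression metrics accept continuous targets.
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_pred)
mae_test = mean_absolute_error(y_test, y_pred)
print(f"R^2 on train: {r2_train:.2f}")
print(f"R^2 on test: {r2_test:.2f}")
print(f"MAE on test: {mae_test:.2f}")
```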

Classification Metrics for Sequential tagging in NLP

I am writing sequence tagging code for NLP in Python on Google Colab. Since my model uses a CRF layer, I used the sklearn_crfsuite metrics to get the results. I executed the following code:
pred_cat = model.predict(X_te)
pred = np.argmax(pred_cat, axis=-1)
y_te_true = np.argmax(y_te, axis=-1)
tags=['O', 'BOC', 'IOC','<pad>']
from sklearn_crfsuite import metrics as crf_metrics
print(crf_metrics.flat_classification_report(y_true=y_te_true,y_pred=pred,labels=tags))
I am getting the following error:
TypeError Traceback (most recent call last)
<ipython-input-21-510856efb26d> in <module>()
1 from sklearn_crfsuite import metrics as crf_metrics
----> 2 print(crf_metrics.flat_classification_report(y_true=y_te_true,y_pred=pred,labels=tags))
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn_crfsuite/metrics.py in flat_classification_report(y_true, y_pred, labels, **kwargs)
66 """
67 from sklearn import metrics
---> 68 return metrics.classification_report(y_true, y_pred, labels, **kwargs)
69
70
TypeError: classification_report() takes 2 positional arguments but 3 were given
Can anyone please shed some light on this?
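The traceback shows sklearn_crfsuite passing labels as a third positional argument, while the installed scikit-learn accepts only y_true and y_pred positionally. One possible workaround, sketched below under assumptions, is to flatten the arrays yourself and call scikit-learn's classification_report directly with keyword arguments. The small arrays are made up for illustration. Since y_te_true and pred come from argmax, they hold integer indices, so the sketch passes the indices as labels and the tag strings as target_names (this assumes index i corresponds to tags[i]).

```python
import numpy as np
from sklearn.metrics import classification_report

tags = ["O", "BOC", "IOC", "<pad>"]
label_ids = list(range(len(tags)))  # assumes index i maps to tags[i]

# Made-up stand-ins for np.argmax(y_te, axis=-1) and np.argmax(pred_cat, axis=-1),
# shape (n_sentences, sequence_length).
y_te_true = np.array([[0, 1, 2, 3], [0, 0, 1, 3]])
pred = np.array([[0, 1, 2, 3], [0, 1, 1, 3]])

# Flatten to 1-D and pass labels/target_names as keywords.
report = classification_report(
    y_te_true.ravel(),
    pred.ravel(),
    labels=label_ids,
    target_names=tags,
    zero_division=0,
)
print(report)
```

To leave padding out of the scores, drop its index from label_ids and its name from tags before calling the function.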

sklearn train_test_split - ValueError: Found input variables with inconsistent numbers of samples: [2552, 1] /Linear Regression

I need assistance reshaping my input to match my output.
I want to create a model that vectorizes the 'All information' column and classifies it so that the label 'fall' is predicted as 0 or 1.
However, I keep getting the error [ValueError: Found input variables with inconsistent numbers of samples: [2552, 1]].
The shape looks fine to me, but I don't know how to fix it.
## Linear Regression
import pandas as pd
import numpy as np
from tqdm import tqdm
#instance->fit->predict
from sklearn.linear_model import LinearRegression
model=LinearRegression(fit_intercept=True)
data=pd.read_csv("Fall_test_0826.csv", encoding='cp949', header=0)
data.head(2)
X=data.drop(["fall"], axis=1)
y= data.fall
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state = 0)
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vect=TfidfVectorizer()
tfidf_vect.fit(X_train)  # build the vocabulary
X_train_tfidf_vect = tfidf_vect.fit_transform(X_train['All information']).toarray()
X_test_tfidf_vect = tfidf_vect.transform(X_test)
lr_clf=LinearRegression()
lr_clf.fit(X_train_tfidf_vect, y_train)
pred = lr_clf.predict(X_test_tfidf_vect)
from sklearn.metrics import accuracy_score
print('Logisitic Regression _ {0:.3f}'.format(accuracy_score(y_test, pred)))
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-85-bec6ead862c8> in <module>
----> 1 print('{0:.3f}'.format(accuracy_score(y_test, pred)))
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
185
186 # Compute accuracy for each possible representation
--> 187 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
188 check_consistent_length(y_true, y_pred, sample_weight)
189 if y_type.startswith('multilabel'):
~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
79 y_pred : array or indicator matrix
80 """
---> 81 check_consistent_length(y_true, y_pred)
82 type_true = type_of_target(y_true)
83 type_pred = type_of_target(y_pred)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
254 uniques = np.unique(lengths)
255 if len(uniques) > 1:
--> 256 raise ValueError("Found input variables with inconsistent numbers of"
257 " samples: %r" % [int(l) for l in lengths])
258
ValueError: Found input variables with inconsistent numbers of samples: [2552, 1]
I think you have to change this line in your code from
X_test_tfidf_vect = tfidf_vect.transform(X_test)
to
X_test_tfidf_vect = tfidf_vect.transform(X_test['All information'])
However, the overall approach is wrong: you are fitting a LinearRegression but evaluating it with a classification metric (accuracy_score).
With the line above fixed, you should then get the error ValueError: Classification metrics can't handle a mix of binary and continuous targets
That is because your array pred will hold float values such as 0.5, whereas accuracy_score needs discrete class labels such as 0, 1, 2 or 3.
To evaluate a linear regression, use regression metrics instead.
The scikit-learn documentation lists the available regression metrics.
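A minimal sketch of why the sample count collapses to 1, assuming a DataFrame with a single text column: iterating over a DataFrame yields its column names, so the vectorizer sees one "document" per column instead of one per row. The example sentences and labels below are made up for illustration; only the column name 'All information' comes from the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Made-up data; only the column name matches the question.
X_train = pd.DataFrame({"All information": [
    "patient fell in the ward",
    "no incident reported",
    "slipped and fell at night",
    "routine check no fall",
]})
y_train = pd.Series([1, 0, 1, 0])
X_test = pd.DataFrame({"All information": ["fell near the bed", "no incident today"]})
y_test = pd.Series([1, 0])

tfidf_vect = TfidfVectorizer()
X_train_tfidf = tfidf_vect.fit_transform(X_train["All information"])

# Passing the whole DataFrame: one row per column name.
wrong = tfidf_vect.transform(X_test)
# Passing the text column: one row per document.
right = tfidf_vect.transform(X_test["All information"])
print(wrong.shape[0], right.shape[0])

# A regression model is scored with a regression metric.
lr = LinearRegression()
lr.fit(X_train_tfidf, y_train)
pred = lr.predict(right)
mse = mean_squared_error(y_test, pred)
print(f"MSE: {mse:.3f}")
```

If the goal is really to predict a 0/1 label, a classifier such as LogisticRegression with accuracy_score would be the matching pair.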

LabelEncoder instance is not fitted yet

I have code for predicting unseen data in a sentence classification task.
The code is:
from sklearn.preprocessing import LabelEncoder
maxlen = 1152
### PREDICT NEW UNSEEN DATA ###
tokenizer = Tokenizer()
label_enc = LabelEncoder()
X_test = ['this is boring', 'wow i like this you did a great job']
X_test = tokenizer.texts_to_sequences(X_test)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
a = (model.predict(X_test)>0.5).astype(int).ravel()
print(a)
reverse_pred = label_enc.inverse_transform(a.ravel())
print(reverse_pred)
But I am getting this error:
[1 1]
---------------------------------------------------------------------------
NotFittedError Traceback (most recent call last)
<ipython-input-33-7e12dbe8aec1> in <module>()
39 print(a)
40
---> 41 reverse_pred = label_enc.inverse_transform(a.ravel())
42 print(reverse_pred)
1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
965
966 if not attrs:
--> 967 raise NotFittedError(msg % {'name': type(estimator).__name__})
968
969
NotFittedError: This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I used a Sequential model, and in the training part the fit call is written as history=model.fit(). Why am I getting this error?
As the sklearn documentation and the error message say, you simply have to fit your encoder before calling inverse_transform:
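import numpy as np
from sklearn.preprocessing import LabelEncoder
# fit the encoder on the training labels so it learns the label-to-integer mapping
y = ['positive','negative','positive','negative','positive','negative']
label_enc = LabelEncoder()
label_enc.fit(y)
# dummy predictions standing in for model.predict(X_test)
model_predictions = np.random.uniform(0,1, 3)
model_predictions = (model_predictions>0.5).astype(int).ravel()
# map the integer predictions back to the original string labels
model_predictions = label_enc.inverse_transform(model_predictions)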
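If training and prediction run in separate scripts or sessions, the fitted encoder has to travel with the model. A freshly constructed LabelEncoder() knows nothing about the training labels. A minimal sketch using pickle, assuming the labels were the strings 'positive' and 'negative' (the label values and variable names are illustrative assumptions):

```python
import pickle

import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import LabelEncoder

# Training side: fit the encoder once on the training labels and serialize it.
y_train_labels = ["positive", "negative", "positive", "negative"]
label_enc = LabelEncoder().fit(y_train_labels)
blob = pickle.dumps(label_enc)  # could be written to a file instead

# Prediction side: restore the fitted encoder instead of creating a new one.
restored = pickle.loads(blob)
a = np.array([1, 1])  # stand-in for thresholded model.predict output
reverse_pred = restored.inverse_transform(a)
print(reverse_pred)

# A brand-new encoder reproduces the error from the question.
try:
    LabelEncoder().inverse_transform(a)
    fresh_failed = False
except NotFittedError:
    fresh_failed = True
print(fresh_failed)
```

The Tokenizer in the question is also created fresh at prediction time, so it presumably needs the same treatment: reuse the one fitted during training.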
