I am trying to use the feature_importances_ attribute of a random forest classification model.
I have imported the relevant libraries:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
Importances = rf.feature_importances_
But I get this error:
'Sequential' object has no attribute 'feature_importances_'
What could be the explanation for this? Thanks
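The error message names a 'Sequential' object, which suggests that the object being queried was a Keras Sequential model rather than the RandomForestClassifier shown (for example, if rf was reassigned elsewhere); that reading is an assumption, since the rest of the script is not shown. Separately, feature_importances_ only exists after the forest has been fitted. A minimal sketch, using the iris dataset as stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data; the original question does not show its dataset.
X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)

# Confirm which kind of object the variable actually holds.
print(type(rf).__name__)

# The attribute is only available once fit() has been called.
rf.fit(X, y)
importances = rf.feature_importances_
print(importances)
```

Printing type(rf) just before the failing line is a quick way to check whether the name still refers to the forest.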
It is a multi-class classification model with sklearn.
I am using a OneVsOneClassifier model to train on and predict 150 intents. It's a multi-class classification problem.
Data:
text intents
text1 int1
text2 int2
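I convert these intents into labels using:
le = LabelEncoder()
y_train = le.fit_transform(y_train)
# Reuse the mapping learned on the training labels; refitting on y_test could assign different integers.
y_test = le.transform(y_test)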
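A small sketch of why the encoder should normally be fitted on the training labels only: refitting on the test labels can map the same intent to a different integer. The label values below are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; the test set happens to lack 'int1'.
train_labels = ["int1", "int2", "int3"]
test_labels = ["int3", "int2"]

le = LabelEncoder()
le.fit(train_labels)

# Consistent with training: int2 -> 1, int3 -> 2.
consistent = le.transform(test_labels).tolist()

# Refitting on the test labels alone: int2 -> 0, int3 -> 1.
refit = LabelEncoder().fit_transform(test_labels).tolist()

print(consistent, refit)
```

Note that transform raises an error if the test set contains a label that was never seen during fitting.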
Expectation:
Without changing the training pipeline or its parameters, I want to reduce the inference time. It is currently slow, about 1 second per inference, so I want to convert the pipeline to ONNX format and use that for inference on a single example.
Code:
import pandas as pd
from sklearn import metrics
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    # Each pipeline uses the same column transformer.
    column_trans = ColumnTransformer(
        [('Text', TfidfVectorizer(), 'text')],
        remainder='drop')
    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])
    return pipeline

def fit_and_print(pipeline):
    # Fit on the training split, then report per-class metrics on the test split.
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))
clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)
# convert input to df
def create_test_data(x):
    d = {'text': x}
    df = pd.DataFrame(d, index=[0])
    return df

revs = []
for idx in [948, 5717, 458]:
    cur = test.loc[idx, 'text']
    revs.append(cur)
print(revs)
revs = sam['text'].values
%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])
ONNX conversion code
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)
Error
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
How can I resolve this? Also, how do I run predictions after converting to ONNX format?
I'm getting this failure:
File "C:\Users\ophirbh\AppData\Roaming\Python\Python38\site-packages\scipy\special\_logsumexp.py", line 112, in logsumexp
    tmp = np.exp(a - a_max)
MemoryError: Unable to allocate array with shape (950028, 45) and data type float64
with this code:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = datasets.load_svmlight_file('converted_features')
clf = LogisticRegression(random_state=0, solver='lbfgs',
                         multi_class='multinomial', max_iter=2).fit(X, y)
print(cross_val_score(clf, X, y, scoring='recall_macro', cv = 5))
Is there any way around it?
You seem to have run out of memory. For quick experiments you could use Google Colab, which may offer more memory than your local machine.
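For scale, the size of the array named in the error can be worked out from its shape and dtype. A short sketch (the helper name array_nbytes is made up for illustration):

```python
def array_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array; float64 uses 8 bytes per element."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize

# Shape taken from the MemoryError message above.
nbytes = array_nbytes((950028, 45))
mib = nbytes / 2**20
print(nbytes, "bytes, about", round(mib, 1), "MiB")
```

That is roughly 326 MiB for this one temporary array. Failing to allocate that much suggests the process was already close to its memory limit, possibly because of the data itself or other temporaries, though the exact cause cannot be determined from the traceback alone.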
I am running the code below and getting the error "AttributeError: module 'tensorflow_core.keras.layers' has no attribute 'Conv1d'". Any help would be appreciated.
import tensorflow as tf
print(tf.__version__)
(mnist_train, minst_train_label), (mnist_test, mnist_test_label) = tf.keras.datasets.mnist.load_data()
train_label_batch_int = tf.cast(minst_train_label, tf.int32)  # Important: tf.one_hot does not accept float indices
train_label_batch_onehot = tf.one_hot(train_label_batch_int, depth=10)
test_label_batch_int = tf.cast(mnist_test_label, tf.int32)  # Important: tf.one_hot does not accept float indices
test_label_batch_onehot = tf.one_hot(test_label_batch_int, depth=10)
# converting to ndarray
if type(train_label_batch_onehot).__name__ != 'ndarray':
    train_label_batch_onehot = train_label_batch_onehot.numpy()
    test_label_batch_onehot = test_label_batch_onehot.numpy()
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Conv1D
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1d(64, 3, activation=tf.nn.relu),
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1d(128, 3, activation=tf.nn.relu),
    tf.keras.layers.flatten(),
    tf.keras.layers.Dense(1024, activation=tf.nn.relu),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=None)])
It was a spelling mistake: the layer is Conv1D, not Conv1d. Thank you, it is resolved. The same applies to tf.keras.layers.flatten(), which should be tf.keras.layers.Flatten().
model = sklearn.model_selection.GridSearchCV(
    estimator=est,
    param_grid=param_grid,
    scoring='precision',
    verbose=1,
    n_jobs=1,
    iid=True,
    cv=3)
In sklearn.metrics.precision_score(y, y_pred,pos_label=[0]) I can specify the positive label. How can I specify it in GridSearchCV too?
If there is no way to specify it directly, how can I define a custom scorer that does?
I have tried this:
custom_score = make_scorer(precision_score(y, y_pred,pos_label=[0]),
greater_is_better=True)
but I got this error:
NameError: name 'y_pred' is not defined
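The NameError occurs because precision_score(...) is called immediately instead of being passed as a function. According to the docs, any extra keyword arguments you give to make_scorer are forwarded to the score_func callable: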
from sklearn.metrics import precision_score, make_scorer
custom_scorer = make_scorer(precision_score, greater_is_better=True, pos_label=0)
Then you pass this custom_scorer to GridSearchCV:
gs = GridSearchCV(est, ..., scoring=custom_scorer)
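A self-contained sketch of the same idea, end to end. The synthetic dataset, the LogisticRegression estimator, and the parameter grid are placeholders chosen for illustration; they are not from the original question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import GridSearchCV

# Placeholder binary classification data with labels 0 and 1.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# pos_label=0 is forwarded to precision_score on every scoring call.
precision_for_class_0 = make_scorer(precision_score, pos_label=0)

gs = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring=precision_for_class_0,
    cv=3,
)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)
```

Here best_score_ is the mean precision for class 0 across the three folds for the best value of C.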
I'm trying to recompute the grid.best_score_ I obtained on my own data, without success.
So I tried it on a conventional dataset, again without success. Here is the code:
from sklearn import datasets
from sklearn import linear_model
from sklearn.cross_validation import ShuffleSplit
from sklearn import grid_search
from sklearn.metrics import r2_score
import numpy as np
lr = linear_model.LinearRegression()
boston = datasets.load_boston()
target = boston.target
param_grid = {'fit_intercept':[False]}
cv = ShuffleSplit(target.size, n_iter=5, test_size=0.30, random_state=0)
grid = grid_search.GridSearchCV(lr, param_grid, cv=cv)
grid.fit(boston.data, target)
# CV score computed by GridSearchCV:
print grid.best_score_
0.677708680059
# now try a custom computation of the CV score
cv_scores = []
for (train, test) in cv:
    y_true = target[test]
    y_pred = grid.best_estimator_.predict(boston.data[test,:])
    cv_scores.append(r2_score(y_true, y_pred))
print np.mean(cv_scores)
0.703865991851
I can't see why it's different. GridSearchCV is supposed to use the scorer from LinearRegression, which is the r2 score. Maybe the way I compute the CV score is not the way best_score_ is computed. I'm asking here before going through the GridSearchCV code.
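Unless refit=False in the GridSearchCV constructor, the winning estimator is refit on the entire dataset at the end of fit. best_score_ is the estimator's average score over the cross-validation splits, while best_estimator_ is an estimator with the winning configuration fit on all the data. Your loop therefore scores best_estimator_ on test folds it has already seen during the refit, which is why the result is higher. To reproduce best_score_, fit a fresh estimator on each training split: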
lr2 = linear_model.LinearRegression(fit_intercept=False)
scores2 = [lr2.fit(boston.data[train,:], target[train]).score(boston.data[test,:], target[test])
           for train, test in cv]
print np.mean(scores2)
This prints 0.67770868005943297, which matches grid.best_score_.
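The snippets above use Python 2 and the old sklearn.cross_validation and sklearn.grid_search modules. A rough equivalent with the current sklearn.model_selection API might look like the sketch below. It uses synthetic data from make_regression as a stand-in, because load_boston has been removed from recent scikit-learn releases, so the numbers will not match those quoted above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# Stand-in regression data (not the Boston dataset from the question).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = ShuffleSplit(n_splits=5, test_size=0.30, random_state=0)
grid = GridSearchCV(LinearRegression(), {"fit_intercept": [False]}, cv=cv)
grid.fit(X, y)

manual_scores = []  # fresh model per split: mirrors how best_score_ is built
refit_scores = []   # refit model: has already seen every test fold
for train, test in cv.split(X):
    fold_model = clone(grid.best_estimator_).fit(X[train], y[train])
    manual_scores.append(fold_model.score(X[test], y[test]))
    refit_scores.append(grid.best_estimator_.score(X[test], y[test]))

print(grid.best_score_, np.mean(manual_scores), np.mean(refit_scores))
```

Because random_state is a fixed integer, cv.split(X) yields the same splits that GridSearchCV used, so the mean of manual_scores should agree with best_score_, while the mean of refit_scores is generally different.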