'Sequential' object has no attribute 'feature_importances_' - scikit-learn

I am trying to use the feature_importances_ attribute of a random forest classification model.
I have imported the relevant libraries:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
Importances = rf.feature_importances_
But I get this error:
'Sequential' object has no attribute 'feature_importances_'
What could be the explanation for this? Thanks
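The error names a Sequential object, which suggests feature_importances_ is being read from a Keras model rather than from the RandomForestClassifier, and the attribute only exists after the forest has been fitted. A minimal sketch of the working pattern (the iris data here is just a placeholder):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # placeholder data

rf = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
rf.fit(X, y)  # feature_importances_ only exists after fitting

importances = rf.feature_importances_  # read it from rf, not from a Keras Sequential model
print(importances)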

Related

How to convert sklearn model using pipeline to ONNX format for real time inferencing

It is a multi-class classification problem with sklearn: I am using a OneVsOneClassifier to train and predict 150 intents.
Data:
text intents
text1 int1
text2 int2
I convert these intents into labels using:
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.fit_transform(y_test)
Expectation:
Without changing the training pipeline or parameters, I want to reduce the inference time. It is currently slow, about 1 second per inference, so the plan is to convert the pipeline to ONNX format and use that for inference on single examples.
Code:
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    # Each pipeline uses the same column transformer.
    column_trans = ColumnTransformer(
        [('Text', TfidfVectorizer(), 'text')],
        remainder='drop')
    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])
    return pipeline

def fit_and_print(pipeline):
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)
import pandas as pd

# convert a single input text into a one-row DataFrame
def create_test_data(x):
    d = {'text': x}
    df = pd.DataFrame(d, index=[0])
    return df

revs = []
for idx in [948, 5717, 458]:
    cur = test.loc[idx, 'text']
    revs.append(cur)
print(revs)

revs = sam['text'].values

%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])
ONNX conversion code
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)
Error
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
How can I resolve this? And how do I run prediction after converting to ONNX format?
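The thread contains no accepted answer, but a common pattern (sketched here under the assumption that your sklearn-onnx version ships converters for the estimators involved) is to either swap OneVsOneClassifier for an estimator sklearn-onnx can convert, such as OneVsRestClassifier(LinearSVC(...)), or to register a custom converter with update_registered_converter. Once convert_sklearn succeeds, prediction is done with onnxruntime; a minimal sketch, assuming a single string text column:
import numpy as np
import onnxruntime as rt
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# One string input named to match the ColumnTransformer's 'text' column
# (an assumption; adjust to your setup). Shape [None, 1] means any number
# of rows, one text column.
initial_type = [('text', StringTensorType([None, 1]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)

sess = rt.InferenceSession(model_onnx.SerializeToString())
input_name = sess.get_inputs()[0].name

# onnxruntime expects a 2-D numpy array of strings for a StringTensorType input.
pred = sess.run(None, {input_name: np.array([['some user utterance']])})
print(pred[0])  # predicted label(s); later outputs typically hold scores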

Memory failure while trying to train a logistic regression model

I'm getting this failure:
File "C:\Users\ophirbh\AppData\Roaming\Python\Python38\site-packages\scipy\special_logsumexp.py", line 112, in logsumexp
tmp = np.exp(a - a_max)
MemoryError: Unable to allocate array with shape (950028, 45) and data type float64
for this code:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = datasets.load_svmlight_file('converted_features')
clf = LogisticRegression(random_state=0, solver='lbfgs',
                         multi_class='multinomial', max_iter=2).fit(X, y)
print(cross_val_score(clf, X, y, scoring='recall_macro', cv=5))
Is there any way around it?
In this case you seem to have insufficient memory. You can use Google Colab for quick experimentation without running into memory issues.
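If moving to a machine with more RAM is not an option, one workaround (a sketch added for illustration, not from the original answer) is to approximate the logistic regression with SGDClassifier trained in mini-batches via partial_fit, so the large (n_samples, n_classes) temporaries allocated by the lbfgs/multinomial path are never materialised at once:
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.linear_model import SGDClassifier

X, y = load_svmlight_file('converted_features')  # X stays a sparse matrix
classes = np.unique(y)  # partial_fit needs the full class list up front

clf = SGDClassifier(loss='log_loss', random_state=0)  # use loss='log' on older scikit-learn
batch = 100000
for start in range(0, X.shape[0], batch):
    stop = start + batch
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)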

AttributeError: module 'tensorflow_core.keras.layers' has no attribute 'Conv1d'

I am running the code below and getting the error "AttributeError: module 'tensorflow_core.keras.layers' has no attribute 'Conv1d'". Any help would be appreciated.
import tensorflow as tf
print(tf.__version__)

(mnist_train, minst_train_label), (mnist_test, mnist_test_label) = tf.keras.datasets.mnist.load_data()

train_label_batch_int = tf.cast(minst_train_label, tf.int32)  ## This is important because tf.one_hot does not accept float
train_label_batch_onehot = tf.one_hot(train_label_batch_int, depth=10)
test_label_batch_int = tf.cast(mnist_test_label, tf.int32)  ## This is important because tf.one_hot does not accept float
test_label_batch_onehot = tf.one_hot(test_label_batch_int, depth=10)

## converting to ndarray
if type(train_label_batch_onehot).__name__ != 'ndarray':
    train_label_batch_onehot = train_label_batch_onehot.numpy()
    test_label_batch_onehot = test_label_batch_onehot.numpy()

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Conv1D

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1d(64, 3, activation=tf.nn.relu),
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1d(128, 3, activation=tf.nn.relu),
    tf.keras.layers.flatten(),
    tf.keras.layers.Dense(1024, activation=tf.nn.relu),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=None)])
Spelling mistake, thank you, it's resolved. It should be Conv1D, not Conv1d.
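For completeness, a corrected version of the model definition (note that tf.keras.layers.flatten in the original snippet would fail for the same reason; the layer class is Flatten):
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.Conv1D(32, 5, activation=tf.nn.relu),
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1D(64, 3, activation=tf.nn.relu),   # Conv1D, capital D
    tf.keras.layers.MaxPooling1D(2, 2),
    tf.keras.layers.Conv1D(128, 3, activation=tf.nn.relu),  # Conv1D, capital D
    tf.keras.layers.Flatten(),                              # Flatten, capital F
    tf.keras.layers.Dense(1024, activation=tf.nn.relu),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=None)])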

How to specify positive label when use precision as scoring in GridSearchCV

model = sklearn.model_selection.GridSearchCV(
    estimator=est,
    param_grid=param_grid,
    scoring='precision',
    verbose=1,
    n_jobs=1,
    iid=True,
    cv=3)
In sklearn.metrics.precision_score(y, y_pred, pos_label=[0]) I can specify the positive label; how can I specify this in GridSearchCV too?
If there is no way to specify it directly, how can I define a custom scorer?
I have tried this:
custom_score = make_scorer(precision_score(y, y_pred, pos_label=[0]),
                           greater_is_better=True)
but I got error:
NameError: name 'y_pred' is not defined
Reading the docs, you can pass any kwargs into make_scorer and they will be automatically passed into the score_func callable.
from sklearn.metrics import precision_score, make_scorer
custom_scorer = make_scorer(precision_score, greater_is_better=True, pos_label=0)
Then you pass this custom_scorer to GridSearchCV:
gs = GridSearchCV(est, ..., scoring=custom_scorer)
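For context, a minimal end-to-end sketch (the SVC estimator, the parameter grid, and the breast-cancer data are placeholders, not from the original question):
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # placeholder data

# precision is now computed with class 0 treated as the positive label
custom_scorer = make_scorer(precision_score, greater_is_better=True, pos_label=0)

gs = GridSearchCV(estimator=SVC(), param_grid={'C': [0.1, 1, 10]},
                  scoring=custom_scorer, cv=3)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)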

trying a custom computation of grid.best_score_ (obtained with GridSearchCV)

I'm trying to recompute the grid.best_score_ I obtained on my own data, without success...
So I tried it with a standard dataset, but with no more success. Here is the code:
from sklearn import datasets
from sklearn import linear_model
from sklearn.cross_validation import ShuffleSplit
from sklearn import grid_search
from sklearn.metrics import r2_score
import numpy as np

lr = linear_model.LinearRegression()
boston = datasets.load_boston()
target = boston.target

param_grid = {'fit_intercept': [False]}
cv = ShuffleSplit(target.size, n_iter=5, test_size=0.30, random_state=0)
grid = grid_search.GridSearchCV(lr, param_grid, cv=cv)
grid.fit(boston.data, target)

# cv score computed by GridSearchCV:
print grid.best_score_
0.677708680059

# now try a custom computation of the cv score
cv_scores = []
for (train, test) in cv:
    y_true = target[test]
    y_pred = grid.best_estimator_.predict(boston.data[test,:])
    cv_scores.append(r2_score(y_true, y_pred))
print np.mean(cv_scores)
0.703865991851
I can't see why they differ. GridSearchCV is supposed to use LinearRegression's scorer, which is the R² score. Maybe the way I compute the CV score is not the way best_score_ is computed... I'm asking here before digging through the GridSearchCV code.
Unless refit=False is passed to the GridSearchCV constructor, the winning estimator is refit on the entire dataset at the end of fit. best_score_ is the average score of the winning configuration across the cross-validation splits, while best_estimator_ is that configuration refit on all of the data, so scoring best_estimator_ on the CV test folds gives an optimistic number.
lr2 = linear_model.LinearRegression(fit_intercept=False)
scores2 = [lr2.fit(boston.data[train,:], target[train]).score(boston.data[test,:], target[test])
for train, test in cv]
print np.mean(scores2)
Will print 0.67770868005943297.
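Equivalently (a sketch added for illustration, not part of the original answer), the same number can be reproduced with cross_val_score on the old API used above, since it refits a fresh estimator on each training split:
from sklearn.cross_validation import cross_val_score

scores3 = cross_val_score(linear_model.LinearRegression(fit_intercept=False),
                          boston.data, target, cv=cv)
print np.mean(scores3)
# ~0.6777, matching grid.best_score_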
