Python 3.x - AttributeError: 'function' object has no attribute 'KFold'

I'm following along an older tutorial with an SVM optimized with a genetic algorithm. I originally thought the issue was just with versions of python and/or scikit, but I'm now unsure of what the issue could be as it continues displaying the same error. I'm currently using python-scikit-learn 0.20.3-1 on Antergos and found a link here that unfortunately didn't seem to help.
So far I've found a few links and examples that have had me alter different aspects of the code, which overall just jumbled everything up. This GitHub page was useful in at least understanding the version difference, as was the first link. This blog post was also neat, but again didn't really help me narrow down the exact issue as to why it's reading out the error. I even tried looking at the sklearn documentation but I still couldn't get it.
These are what I've imported:
import numpy as np
import pandas as pd
import random as rd
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm
I had "kfold" defined earlier in the program as such:
kfold = 3
This is the exact line it seems to be having issues with:
kf = cross_validate.KFold(Cnt1,n_splits=kfold)
What it should be doing is simply applying cross validation. However, the error reads:
AttributeError: 'function' object has no attribute 'KFold'
I can't tell if the issue is that I'm not understanding what I should be altering via the links I've given, or if it's a different error born of ignorance. Is there something I'm missing in order to get this to work?

KFold is a class in the sklearn.model_selection module. It is not an attribute of sklearn.model_selection.cross_validate, which is a function; that is why the error mentions a 'function' object.
So you should import
from sklearn import model_selection
and then use it like
model_selection.KFold(...)
or you can import the function
from sklearn.model_selection import KFold
just like in the KFold Doc example.
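A minimal sketch of the corrected call, assuming a recent scikit-learn where KFold lives in sklearn.model_selection. Note that this KFold takes only n_splits in its constructor; the data (or anything with the right number of samples) is passed to split() instead, so a sample count like Cnt1 from the question is no longer a constructor argument. The array X below is placeholder data, not the asker's dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

kfold = 3                          # number of folds, as in the question
X = np.arange(12).reshape(6, 2)    # placeholder data (assumption): 6 samples, 2 features

# The constructor takes the number of splits; the data goes to split().
kf = KFold(n_splits=kfold)

folds = []
for train_idx, test_idx in kf.split(X):
    folds.append((train_idx, test_idx))
    print("train:", train_idx, "test:", test_idx)
```

Each iteration yields index arrays, so the training and test rows are selected with X[train_idx] and X[test_idx].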

Related

python 3.8 error with cross validation Implementation

When I run the following code from a tutorial, I keep getting the following error at the end in almost every video I attempt.
Source: https://pythonprogramming.net/k-means-from-scratch-2-machine-learning-tutorial/?completed=/k-means-from-scratch-machine-learning-tutorial/
I get the following error:
from sklearn import preprocessing, cross_validation
ImportError: cannot import name 'cross_validation' from 'sklearn
I did pip installs, changing the way cross_validation is stated based on other suggestions but I still can't solve it.
I could not find cross_validation as a library in sklearn.
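The sklearn.cross_validation module was deprecated and then removed in scikit-learn 0.20; its contents moved to sklearn.model_selection. Import from there instead, for example from sklearn.model_selection import cross_validate, as per the documentation. This has already been answered here.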
I also suggest going through the documentation of the functions you use to gain a better understanding of what you are doing.
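As a rough illustration of the replacement import, here is a sketch that assumes the tutorial code goes on to call cross_validation.train_test_split (an assumption; the question does not show that part). The toy arrays are made up for the example.

```python
import numpy as np
from sklearn import preprocessing, model_selection  # model_selection replaces cross_validation

# Toy data (assumption): 10 samples, 2 features, binary labels.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)

X_scaled = preprocessing.scale(X)

# Old style: cross_validation.train_test_split(...)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X_scaled, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```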

Alternative to partial dependence plot?

As far as I can see, sklearn has deprecated the partial dependence functionality. I tried to run a simple example:
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.inspection import plot_partial_dependence
X, y = make_friedman1()
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
plot_partial_dependence(clf, X, [0, (0, 1)])
But I am returned the following error message: ImportError: No module named 'sklearn.inspection'
To me, partial dependence (and marginal effects) plots are very important, in combination with relative importances, for better understanding machine learning results and predictions.
Is there an alternative available? Or, failing that, how can I plot the partial dependence?
I think there might be a confusion with versions of sklearn. Just as a suggestion -- I would check yours (e.g., import sklearn; sklearn.__version__). For example, if it is v.0.20.3, by chance -- aren't you looking for partial_dependence and plot_partial_dependence from sklearn.ensemble.partial_dependence instead of sklearn.inspection?
I had the same issue and resolved it by simply updating sklearn, which now contains sklearn.inspection. I'm using Anaconda; if you are too, type this in the Anaconda Prompt:
conda update --all
to update all packages. Restart your jupyter notebook and now it should work.
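For reference, a sketch of computing partial dependence values without plotting, assuming a scikit-learn recent enough (roughly 1.1 or later) that sklearn.inspection.partial_dependence returns a Bunch with an "average" entry. In newer releases plot_partial_dependence itself has been removed in favour of PartialDependenceDisplay.from_estimator, so the plotting call from the question may also need changing after an upgrade.

```python
import sklearn
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

print(sklearn.__version__)  # check which API generation is installed

X, y = make_friedman1(random_state=0)
clf = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)

# Average predicted response as feature 0 varies over a 20-point grid.
result = partial_dependence(
    clf, X, features=[0], kind="average", grid_resolution=20
)
averaged = result["average"]
print(averaged.shape)
```

The averaged array has one row per model output and one column per grid point, and can be plotted with any plotting library.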

Tensorflow causes errors in scikit-learn

When I import scikit-learn before importing tensorflow I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12.
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
However, if I import tensorflow before importing scikit-learn my code no longer functions. When I run this code-block
import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
I get an output of 130091393261440.25.
Why is that? My versions for the packages are:
numpy - 1.13.1
sklearn - 0.19.0
tensorflow - 1.3.0
Import order should not affect output, as python modules are self-contained, except in the case of dependencies.
I was unable to reproduce your error, and get an output of 1.7951539777252834e-12 for both code blocks.
This is an interesting problem and I am curious to see if others can provide a better response for why you are seeing this issue.
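Since a full PCA keeps all components, the reconstruction error should be near zero, so a value around 1e14 suggests a numerical routine returned garbage. One way to narrow this down (a diagnostic idea, not a confirmed explanation) is to repeat the reconstruction with NumPy's SVD only, in the affected environment and after importing tensorflow, and see whether the error is still tiny.

```python
import numpy as np

rng = np.random.RandomState(123)
X = rng.randn(100, 15)
X = X - X.mean(axis=0)

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)
w = U * s   # plays the role of the PCA scores
h = Vt      # plays the role of the PCA components

svd_error = np.sum(np.abs(X - np.dot(w, h)))
print(svd_error)
```

If this value is also huge, the problem likely sits in the underlying linear algebra libraries rather than in scikit-learn's PCA itself.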
Note: this answer addresses the title, for readers who want to use TensorFlow within Scikit-Learn; it does not deal with the import-order problem you've had.
You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
Problem: You can’t Parallelize nor Save Pipelines Using Steps that Can’t be Serialized “as-is” by Joblib (e.g.: a TensorFlow step)
Here, a step is a transformer or estimator in a scikit-learn Pipeline.
This problem will only surface past some point of using Scikit-Learn. This is the point of no-return: you’ve coded your entire production pipeline, but once you trained it and selected the best model, you realize that what you’ve just coded can’t be serialized.
This means that once trained, your pipeline can't be saved to disk because one of its steps imports things from a Python library coded in another language and/or uses GPU resources. Your code smells weird and you start panicking over what was a full year of research and development.
Solution with Code Examples:
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it was used with Scikit-Learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline
The trick is performed by using Neuraxle-TensorFlow.
This is to make use of Neuraxle's savers.
Read also: https://stackoverflow.com/a/60557192/2476920

name 'classification_model' is not defined

I'm trying to build a model in Python 3.5 and am following an example that can be found here.
I have imported all the required libraries from sklearn.
However I'm getting the following error.
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold #For K-fold cross validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, loan,predictor_var,outcome_var)
When I run the above code I get the following error:
NameError: name 'classification_model' is not defined
I'm not sure how to resolve this as I tried importing sklearn and all the sub libraries.
P.S. I'm new to Python, hence I'm trying to figure out basic steps
Depending on the exact details this may not be what you want but I have never had a problem with
import sklearn.linear_model as sk
logreg = sk.LogisticRegressionCV()
logreg.fit(loan[predictor_var], loan[outcome_var])
This means you have to explicitly separate your training and test set, but having fit to a training set (the process in the final line of my code), you can then use the methods detailed in the documentation [1].
For example, the .score method tells you what fraction of unseen samples the model classifies correctly.
[1] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
It appears this code came from this tutorial.
The issue is exactly as the error describes. classification_model is currently undefined. You need to create this function yourself before you can call it. Check out this part of that tutorial so you can see how it's defined. Good luck!
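To make the point concrete, here is a hypothetical sketch of what such a helper could look like. It is not the tutorial's actual code: it assumes the function fits the model, reports training accuracy, and runs k-fold cross-validation. The loan DataFrame below is made-up stand-in data, and the n_splits parameter is an illustrative addition.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def classification_model(model, data, predictors, outcome, n_splits=5):
    """Fit model; return (training accuracy, mean k-fold CV accuracy)."""
    model.fit(data[predictors], data[outcome])
    train_accuracy = metrics.accuracy_score(
        data[outcome], model.predict(data[predictors])
    )

    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(data):
        train, test = data.iloc[train_idx], data.iloc[test_idx]
        model.fit(train[predictors], train[outcome])
        scores.append(model.score(test[predictors], test[outcome]))

    # Refit on all rows so the model can be reused after the call.
    model.fit(data[predictors], data[outcome])
    return train_accuracy, float(np.mean(scores))


# Made-up stand-in for the tutorial's `loan` DataFrame (assumption).
loan = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0] * 3,
    "Loan_Status": ["Y", "Y", "N", "Y", "N", "Y", "Y", "N", "Y", "N"] * 3,
})

train_acc, cv_acc = classification_model(
    LogisticRegression(), loan, ["Credit_History"], "Loan_Status"
)
print(train_acc, cv_acc)
```

The key point is only that the function must be defined (or imported) in your own script before the call on the last line of the question's code.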
from sklearn.metrics import classification_report

Cannot resolve import "MultiLabelBinarizer"

I am new to the "scikit-Learn" API and wish to implement a multilabel classification problem. After importing the following packages:
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
I get an error which says 'Unresolved import: MultiLabelBinarizer'. But other related packages imported seem to work fine. I wonder why the 'MultiLabelBinarizer' cannot be imported, given the fact that the 'sklearn ' package was properly installed. Any help will be appreciated.
I found out the reason, in case someone comes across the same problem. The error was due to the fact that I was running the above code on 'sklearn' version 0.14 (which was installed by default on Ubuntu 14.04 LTS) instead of 0.16. I also think the MultiLabelBinarizer class is only available from 'sklearn' version 0.16 (I have not tried 0.15, in case there is one).
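A quick way to check for this situation is to print the installed version and try the import directly. The sketch below assumes a scikit-learn version that does ship MultiLabelBinarizer; the label tuples are toy data.

```python
import sklearn
from sklearn.preprocessing import MultiLabelBinarizer

print(sklearn.__version__)  # an old version here would explain the unresolved import

mlb = MultiLabelBinarizer()
# Each sample carries a set of labels; the output has one column per label.
Y = mlb.fit_transform([("a", "b"), ("c",)])
print(mlb.classes_)
print(Y)
```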
