name 'classification_model' is not defined - python-3.x

I'm trying to build a model in Python 3.5 and am following an example that can be found here.
I have imported all the required libraries from sklearn.
However, I'm getting the following error.
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold #For K-fold cross validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, loan,predictor_var,outcome_var)
When I run the above code I get the following error:
NameError: name 'classification_model' is not defined
I'm not sure how to resolve this, as I've tried importing sklearn and all of its submodules.
P.S. I'm new to Python, hence I'm trying to figure out the basic steps.

Depending on the exact details this may not be what you want, but I have never had a problem with:
import sklearn.linear_model as sk
logreg = sk.LogisticRegressionCV()
logreg.fit(predictor_var,outcome_var)
This means you have to explicitly separate your training and test sets, but having fit to a training set (the final line of my code), you can then use the methods detailed in the documentation [1].
For example, you can find out what score (how many predictions were correct) you get on unseen data with the .score method.
[1] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html

It appears this code came from this tutorial.
The issue is exactly what the error describes: classification_model is currently undefined. You need to define this function yourself before you can call it. Check out the part of that tutorial where it's defined so you can see how it works. Good luck!


Related

Python 3.8 error with cross validation implementation

When I run the following code from a tutorial, I keep getting the following error at the end, and the same happens in almost every video of the series I attempt.
Source: https://pythonprogramming.net/k-means-from-scratch-2-machine-learning-tutorial/?completed=/k-means-from-scratch-machine-learning-tutorial/
I get the following error:
from sklearn import preprocessing, cross_validation
ImportError: cannot import name 'cross_validation' from 'sklearn'
I did pip installs and changed the way cross_validation is stated based on other suggestions, but I still can't solve it.
I could not find cross_validation as a module in sklearn.
You need to use from sklearn.model_selection import cross_validate, as per the documentation; the old sklearn.cross_validation module was deprecated in 0.18 and removed in scikit-learn 0.20. This has already been answered here.
I also suggest going through the documentation of the functions you use, to gain a better understanding of what you are doing.

Python 3.x - AttributeError: 'function' object has no attribute 'Kfold'

I'm following along with an older tutorial on an SVM optimized with a genetic algorithm. I originally thought the issue was just with the versions of Python and/or scikit-learn, but I'm now unsure what the issue could be, as it keeps displaying the same error. I'm currently using python-scikit-learn 0.20.3-1 on Antergos and found a link here that unfortunately didn't seem to help.
So far I've found a few links and examples that had me alter different aspects of the code, which overall just jumbled everything up. This GitHub page was useful for at least understanding the version difference, as was the first link. This blog post was also neat, but again it didn't really help me narrow down exactly why the error occurs. I even tried looking at the sklearn documentation, but I still couldn't work it out.
These are what I've imported:
import numpy as np
import pandas as pd
import random as rd
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm
I had "kfold" defined earlier in the program as such:
kfold = 3
This is the exact line it seems to be having issues with:
kf = cross_validate.KFold(Cnt1,n_splits=kfold)
What it should be doing is simply applying cross validation. However, the error reads:
AttributeError: 'function' object has no attribute 'KFold'
I can't tell if the issue is that I'm not understanding what I should be altering via the links I've given, or if it's a different error born of ignorance. Is there something I'm missing in order to get this to work?
The KFold class lives in the sklearn.model_selection module, not in sklearn.model_selection.cross_validate; cross_validate is a function, which is why the error says 'function' object has no attribute 'KFold'.
So you should import
from sklearn import model_selection
and then use it like
model_selection.KFold(...)
or you can import the class directly
from sklearn.model_selection import KFold
just like in the KFold documentation example.

Tensorflow causes errors in scikit-learn

When I import scikit-learn before importing tensorflow I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12.
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
However, if I import tensorflow before importing scikit-learn, my code no longer works correctly. When I run this code block
import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))
I get an output of 130091393261440.25.
Why is that? My versions for the packages are:
numpy - 1.13.1
sklearn - 0.19.0
tensorflow - 1.3.0
Import order should not affect output, as Python modules are self-contained, except in the case of dependencies.
I was unable to reproduce your error, and got an output of 1.7951539777252834e-12 for both code blocks.
This is an interesting problem, and I am curious to see whether others can provide a better explanation of why you are seeing this issue.
Note: this answer addresses the question's title, for readers looking to use TensorFlow within scikit-learn; it does not address the import-order issue you encountered.
You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.
Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.
Problem: you can't parallelize or save pipelines using steps that can't be serialized as-is by joblib (e.g., a TensorFlow step).
Here, a step means a transformer or estimator in a scikit-learn Pipeline.
This problem only surfaces after you've been using scikit-learn for a while. It is the point of no return: you've coded your entire production pipeline, but once you've trained it and selected the best model, you realize that what you've just coded can't be serialized.
This means that once trained, your pipeline can't be saved to disk, because one of its steps imports things from a weird Python library coded in another language and/or uses GPU resources. Your code smells weird, and you start panicking over what was a full year of research and development.
Solution with Code Examples:
Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it were used with scikit-learn.
Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline.
The trick is performed by using Neuraxle-TensorFlow, which makes use of Neuraxle's savers.
Read also: https://stackoverflow.com/a/60557192/2476920

ValueError: could not convert string to float in pandas

My code is :
import pandas as pd
data = pd.read_table('train.tsv')
X=data.Phrase
Y=data.Sentiment
from sklearn import cross_validation
X_train,X_test,Y_train,Y_test=cross_validation.train_test_split(X,Y,test_size=0.2,random_state=0)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X,Y)
I get the following error: ValueError: could not convert string to float
What changes can I make so that my code works?
You can't pass text data directly into scikit-learn's MultinomialNB, as stated in its documentation.
None of the algorithms in scikit-learn work directly on raw text. You need to do some preprocessing to get the desired output: first extract numeric features from the text using techniques like bag-of-words or tokenization. Have a look at this link for a better understanding.
You also might want to look at using NLTK for use cases such as yours.
ValueError when using Multinomial Naive Bayes classifier
You probably should preprocess your data as shown in the answer above.

How can I find the confidence level of an SVM using the scikit-learn library? [duplicate]

I'm trying to use SVC from sklearn for a classification problem. Given a bunch of data, and information telling me whether some subject is in a certain class or not, I want to be able to give the probability that a new, unknown subject is in a class.
I only have 2 classes, so the problem is binary. Here is my code and the error it produces:
from sklearn.svm import SVC
clf=SVC()
clf=clf.fit(X,Y)
SVC(probability=True)
print(clf.predict_proba(W))  # Error is here
But it returns the following error:
NotImplementedError: probability estimates must be enabled to use this method
How can I fix this?
You have to construct the SVC object with probability=True
from sklearn.svm import SVC
clf=SVC(probability=True)
clf.fit(X,Y)
print(clf.predict_proba(W))  # No error
Your code creates an SVC with probability estimates enabled and then discards it (since you do not store it in any variable), and instead uses the previous SVC stored in clf, which was created without probability estimates.
Always set the parameters before calling fit:
from sklearn.svm import SVC
clf=SVC(probability=True)
clf=clf.fit(X,Y)
print(clf.predict_proba(W))
