Pipeline issues in sklearn [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I have a neural network model that I want to fit to my training data. When I run the following line of code:
history = pipeline.fit(inputs[train], targets[train], epochs=epochs, batchsize=batchsize)
I receive the following error message:
Pipeline.fit does not accept the epochs parameter. You can pass parameters to specific
steps of your pipeline using the stepname__parameter format, e.g.
`Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`
How can I resolve this issue?

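MLPClassifier does not define an epochs parameter; use max_iter instead (and batch_size rather than batchsize). Both are constructor hyperparameters, not arguments of fit, so they cannot be passed through Pipeline.fit at all.
To set hyperparameters of a step inside a Pipeline, use set_params with the stepname__parameter notation: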
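A minimal sketch showing that, for the stochastic solvers ('adam', the default, and 'sgd'), max_iter plays the role of Keras' epochs. The synthetic data from make_classification and the values max_iter=5 and batch_size=20 are arbitrary choices for illustration; n_iter_ and loss_curve_ are fitted attributes of MLPClassifier.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = make_classification(random_state=0)

# With the stochastic solvers, max_iter is the maximum number of epochs
# (full passes over the training data), not the number of gradient steps.
clf = MLPClassifier(max_iter=5, batch_size=20, random_state=0)
with warnings.catch_warnings():
    # Stopping after only 5 epochs triggers a ConvergenceWarning; silence it here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    clf.fit(X, y)

print(clf.n_iter_)           # number of epochs actually run
print(len(clf.loss_curve_))  # one training-loss value per epoch
```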
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.datasets import make_classification
X, y = make_classification()
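# make_pipeline names each step after its lowercased class name, e.g. "mlpclassifier"
model = make_pipeline(SimpleImputer(), StandardScaler(), MLPClassifier())
params = {
    'mlpclassifier__max_iter': 10,
    'mlpclassifier__batch_size': 20
}
# Route each hyperparameter to the step named before the double underscore
model.set_params(**params)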
model.fit(X, y)
I suggest this notation because the same parameter names can be reused directly in a GridSearchCV.
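A sketch of that reuse, assuming the same pipeline as above: the grid keys are the same stepname__parameter strings passed to set_params, but each now maps to a list of candidate values. The candidate values and cv=3 are arbitrary illustrative choices.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(random_state=0)
model = make_pipeline(SimpleImputer(), StandardScaler(), MLPClassifier(random_state=0))

# Same "stepname__parameter" keys as in set_params, with lists of candidates.
param_grid = {
    "mlpclassifier__max_iter": [10, 50],
    "mlpclassifier__batch_size": [20, 50],
}
search = GridSearchCV(model, param_grid, cv=3)
with warnings.catch_warnings():
    # Short training runs emit ConvergenceWarning; silence them for readability.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print(search.best_params_)
```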

Related

Invalid Syntax in keras model.add Convolutional layers

I am trying to build a VGG16 model but get an invalid syntax error when compiling it. The error points at the activation argument in this line:
model.add(Convolution2D((64,3,3,activation='relu')))
However, if I change the code as follows, it works fine:
model.add(Convolution2D((64,3,3)))
model.add(Activation('relu'))
I have seen many related questions, and the answers say the error is caused by missing parentheses in a line like the one above, but I checked the parentheses and they are balanced. Why does the code fail when I pass the activation parameter to the Convolution2D layer? According to the documentation, it accepts that parameter. What am I missing?
Here is the detailed code:
from keras.models import Sequential
from keras.layers.core import Flatten,Dense,Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation
from keras.optimizers import SGD
import cv2, numpy as np
def VGG16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(3,224,224)))
    model.add(Convolution2D(64,3,3,activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64,3,3,activation='relu'))
    model.add(MaxPooling2D((2,2),strides=(2,2)))
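One way to see what goes wrong in the failing line, without needing Keras: Python's built-in compile() only parses source code, so it can check the two forms side by side. The parentheses in the failing line are balanced, but the extra inner pair turns the arguments into a tuple literal, and a keyword argument such as activation='relu' is not allowed inside a tuple. The helper name parses is just for this illustration.

```python
def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<snippet>", "eval")  # parse only; nothing is executed
        return True
    except SyntaxError:
        return False

# Extra inner parentheses: (64, 3, 3, activation='relu') would be a tuple,
# and keyword arguments cannot appear inside a tuple literal.
bad = "Convolution2D((64, 3, 3, activation='relu'))"
# Without them, activation='relu' is an ordinary keyword argument of the call.
good = "Convolution2D(64, 3, 3, activation='relu')"

print(parses(bad))   # False
print(parses(good))  # True
```

This is also why Convolution2D((64,3,3)) parses: (64,3,3) is a valid tuple, passed as a single positional argument.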

What are common sources of randomness in Machine Learning projects with Keras?

Reproducibility is important, but in a closed-source machine learning project I'm currently working on, it is hard to achieve. Which parts should I look at?

Setting seeds
Computers have pseudo-random number generators which are initialized with a value called the seed. For machine learning, you might need to do the following:
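# TensorFlow 1.x / standalone Keras API. I've heard the order here is important:
# seed every generator before importing Keras and building models.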
import random
random.seed(0)
import numpy as np
np.random.seed(0)
import tensorflow as tf
tf.set_random_seed(0)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
from keras import backend as K
K.set_session(sess) # tell keras about the seeded session
# now import keras stuff
See also: Keras FAQ: How can I obtain reproducible results using Keras during development?
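The core idea of seeding can be checked without TensorFlow. A minimal sketch, where seed_everything is a hypothetical helper name that covers only Python's random module and NumPy's global generator: reseeding with the same value replays exactly the same sequence of draws.

```python
import random

import numpy as np

def seed_everything(seed):
    """Hypothetical helper: seed Python's and NumPy's global generators (not TensorFlow)."""
    random.seed(seed)
    np.random.seed(seed)

def draw():
    # A few draws from each global generator.
    return [random.random() for _ in range(3)], np.random.rand(3).tolist()

seed_everything(0)
first = draw()
seed_everything(0)
second = draw()
print(first == second)  # True: same seed, same sequence
```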
sklearn
sklearn.model_selection.train_test_split has a random_state parameter.
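A small sketch with toy arrays: passing the same integer random_state makes the shuffle, and therefore the split, identical across calls. The array contents, test_size=0.3 and random_state=42 are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same random_state -> identical shuffling -> identical splits.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=42)

print(np.array_equal(y_te1, y_te2))  # True
```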
What to check
Do you load the data in the same order every time?
Do you initialize the model the same way every time?
Do you use external data that might change?
Do you use external state that might change (e.g. datetime.now)?
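For the first check, one possible approach (a sketch, not the only way) is to log a fingerprint of the loaded data on every run: if the rows arrive in a different order, or the data itself changes, the hash changes. data_fingerprint is a hypothetical helper, and the arrays below are toy examples.

```python
import hashlib

import numpy as np

def data_fingerprint(array):
    """Hypothetical helper: SHA-256 of shape, dtype and raw bytes; any reordering changes it."""
    arr = np.ascontiguousarray(array)
    h = hashlib.sha256()
    h.update(str(arr.shape).encode())
    h.update(str(arr.dtype).encode())
    h.update(arr.tobytes())
    return h.hexdigest()

data = np.arange(12, dtype=np.float64).reshape(6, 2)
shuffled = data[[1, 0, 2, 3, 4, 5]]  # same rows, first two swapped

print(data_fingerprint(data) == data_fingerprint(data.copy()))  # True
print(data_fingerprint(data) == data_fingerprint(shuffled))     # False
```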

methods logLikelihood and logPerplexity not available for Spark LDA, how to measure them? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I'm trying to get the perplexity and log-likelihood of a Spark LDA model (Spark 2.1). The code below does not work (the methods logLikelihood and logPerplexity are not found), although I can save the model.
from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors
# construct corpus
# run LDA
ldaModel = LDA.train(corpus, k=10, maxIterations=10)
logll = ldaModel.logLikelihood(corpus)
perplexity = ldaModel.logPerplexity(corpus)
Note that these methods do not appear in dir(LDA).
What would be a working example?

I can call train but not fit: 'LDA' object has no attribute 'fit'.
That's because you are working with the old, RDD-based API (MLlib), i.e.
from pyspark.mllib.clustering import LDA # WRONG import
whose LDA class indeed does not include fit, logLikelihood, or logPerplexity methods.
In order to work with these methods, you should switch to the new, dataframe-based API (ML):
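from pyspark.ml.clustering import LDA # NOTE: different import
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Load data.
dataset = (spark.read.format("libsvm")
           .load("data/mllib/sample_lda_libsvm_data.txt"))
# Train an LDA model.
lda = LDA(k=10, maxIter=10)
model = lda.fit(dataset)
# Lower bound on the corpus log-likelihood; upper bound on log-perplexity (lower is better).
ll = model.logLikelihood(dataset)
lp = model.logPerplexity(dataset)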
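To relate the two numbers: assuming, as in Spark's LocalLDAModel source, that logPerplexity is the negated log-likelihood bound divided by the total number of tokens in the corpus, the arithmetic can be sketched in plain Python. The log-likelihood value and term counts below are hypothetical inputs, not output from Spark.

```python
import math

def log_perplexity(log_likelihood, token_counts):
    """Negated log-likelihood bound divided by the corpus token count (assumed Spark definition)."""
    total_tokens = sum(sum(doc) for doc in token_counts)
    return -log_likelihood / total_tokens

# Hypothetical inputs: a log-likelihood bound and per-document term counts.
ll = -120.0
counts = [[2, 0, 1], [0, 3, 4]]  # 10 tokens in total

lp = log_perplexity(ll, counts)
print(lp)            # 12.0
print(math.exp(lp))  # the (upper bound on) perplexity itself
```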

how can i find the confidence level of svm using scikit-learn library [duplicate]

I'm trying to use SVC from sklearn for a classification problem. Given a set of data, and information telling me whether each subject is in a certain class or not, I want to give the probability that a new, unknown subject is in that class.
I only have 2 classes, so the problem is binary. Here is my code and the error I get:
from sklearn.svm import SVC
clf=SVC()
clf=clf.fit(X,Y)
SVC(probability=True)
print(clf.predict_proba(W)) # Error is here
But it returns the following error:
NotImplementedError: probability estimates must be enabled to use this method
How can I fix this?

You have to construct the SVC object with probability=True:
from sklearn.svm import SVC
clf=SVC(probability=True)
clf.fit(X,Y)
print(clf.predict_proba(W)) # No error
Your code creates an SVC with probability estimates enabled and immediately discards it (it is not stored in any variable), then calls predict_proba on the earlier SVC stored in clf, which was fitted without probability estimates.
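If the estimator already exists, a sketch of the equivalent fix is to enable probability on the same object with set_params and then refit it. The synthetic data below stands in for the question's X, Y and W. Enabling probability=True makes fit run an extra internal cross-validation to calibrate the estimates, so fitting is slower.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary data standing in for X, Y and W from the question.
X, Y = make_classification(n_samples=100, n_classes=2, random_state=0)
W = X[:5]

clf = SVC()
clf.fit(X, Y)

# Enable probability estimates on the *same* object, then refit.
clf.set_params(probability=True)
clf.fit(X, Y)

proba = clf.predict_proba(W)
print(proba.shape)  # (5, 2): one column per class
```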

Always set the parameters before fit.
from sklearn.svm import SVC
clf=SVC(probability=True)
clf=clf.fit(X,Y)
print(clf.predict_proba(W))
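A quick sketch of why the order matters: changing probability after fit, without refitting, is not enough, because the probability model is only built during fit. The synthetic data is for illustration; the exception raised is scikit-learn's NotFittedError, which subclasses AttributeError.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, Y = make_classification(n_samples=100, random_state=0)

clf = SVC().fit(X, Y)             # fitted without probability estimates
clf.set_params(probability=True)  # changed *after* fit, with no refit

try:
    clf.predict_proba(X[:5])
    failed = False
except AttributeError as exc:  # NotFittedError subclasses AttributeError
    failed = True
    print(type(exc).__name__, exc)
```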

name 'classification_model' is not defined

I'm trying to build a model in Python 3.5, following an example that can be found here.
I have imported all the required libraries from sklearn.
However, I'm getting an error.
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold #For K-fold cross validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, loan,predictor_var,outcome_var)
When I run the above code I get the following error:
NameError: name 'classification_model' is not defined
I'm not sure how to resolve this; I have already imported sklearn and all the submodules.
P.S. I'm new to Python, so I'm still figuring out the basic steps.

Depending on the exact details, this may not be what you want, but I have never had a problem with:
import sklearn.linear_model as sk
logreg = sk.LogisticRegressionCV()
# predictor_var and outcome_var hold column names, so index the DataFrame with them
logreg.fit(loan[predictor_var], loan[outcome_var])
This means you have to explicitly separate your training and test sets, but once the model is fitted to a training set (the final line of my code), you can use the methods detailed in the documentation [1].
For example, the .score method tells you the accuracy (the fraction of samples classified correctly) on unseen data.
[1] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
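A minimal end-to-end sketch of that workflow, assuming synthetic data from make_classification in place of the loan DataFrame: hold out a test set with train_test_split, fit LogisticRegressionCV on the training part, then call .score on the held-out part.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the loan DataFrame's predictors and outcome.
X, y = make_classification(n_samples=200, random_state=0)

# Explicitly hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logreg = LogisticRegressionCV()
logreg.fit(X_train, y_train)

# .score returns mean accuracy: the fraction of test samples classified correctly.
acc = logreg.score(X_test, y_test)
print(acc)
```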

It appears this code came from this tutorial.
The issue is exactly as the error describes. classification_model is currently undefined. You need to create this function yourself before you can call it. Check out this part of that tutorial so you can see how it's defined. Good luck!
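For illustration only, here is a hypothetical helper with the same call signature as in the question; it is a sketch, not the tutorial's actual code. It reports training accuracy and mean K-fold accuracy, using sklearn.model_selection.KFold because the old sklearn.cross_validation module was removed in scikit-learn 0.20. The tiny DataFrame is made up and only reuses the question's column names.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def classification_model(model, data, predictors, outcome, n_splits=5):
    """Hypothetical helper: return (training accuracy, mean K-fold CV accuracy)."""
    X = data[predictors]
    y = data[outcome]

    # Fit on all rows and measure accuracy on those same rows.
    model.fit(X, y)
    train_acc = metrics.accuracy_score(y, model.predict(X))

    # K-fold cross-validation: refit on each training fold, score on the held-out fold.
    fold_scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        fold_scores.append(model.score(X.iloc[test_idx], y.iloc[test_idx]))

    # Refit on all the data so the model is usable afterwards.
    model.fit(X, y)
    return train_acc, float(np.mean(fold_scores))

# Made-up data with the column names from the question.
loan = pd.DataFrame({
    "Credit_History": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y", "N", "Y", "Y", "N"],
})
train_acc, cv_acc = classification_model(
    LogisticRegression(), loan, ["Credit_History"], "Loan_Status"
)
print(train_acc, cv_acc)
```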

from sklearn.metrics import classification_report
