GridSearchCV uses predict or predict_proba? - python-3.x

It works for both the huber and log losses, but only the log loss has predict_proba. How does this work? I used roc_auc_score.

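As described in the O'Reilly book, the grid search ranks candidates by their mean cross-validated score, and you can read every mean score from cv_results_ with this code:
cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)
This prints the mean score computed for each parameter combination, so you can compare them easily. Note that np.sqrt(-mean_score) only makes sense when scoring="neg_mean_squared_error", as in the book's example; with a metric such as ROC AUC, print mean_score directly.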

GridSearchCV exposes both predict and predict_proba, provided the underlying estimator implements them.
In a binary classification problem, predict returns the class label, 0 or 1, while predict_proba returns the probability of each class.
For a single sample, predict_proba returns a row such as [0.23 0.77].
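Which method the search calls depends on the scorer. As far as I know, scoring="roc_auc" asks the estimator for continuous scores (decision_function first, then predict_proba), whereas a scorer built with make_scorer(roc_auc_score) and default arguments calls predict, which would explain why it also runs with the huber loss. The sketch below is plain Python, not scikit-learn's implementation, and the labels and probabilities are made-up values; it only illustrates why ROC AUC computed from hard labels can differ from ROC AUC computed from probabilities.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(n^2), for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical data: the positives are ranked above the negatives,
# but every probability is below the 0.5 decision threshold.
y_true = [0, 0, 1, 1]
proba_of_class_1 = [0.1, 0.2, 0.3, 0.4]
hard_labels = [1 if p >= 0.5 else 0 for p in proba_of_class_1]

auc_from_scores = pairwise_auc(y_true, proba_of_class_1)
auc_from_labels = pairwise_auc(y_true, hard_labels)
print(auc_from_scores, auc_from_labels)
```

The probabilities rank every positive above every negative, while the hard labels are all 0 and carry no ranking information, so the two AUC values differ.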

Related

Keras XOR not training

We have been trying for a while to get this to work. This is probably the easiest example to create and so now we need help. We've been changing the number of epochs in the fit function and that's giving us different results, but never anything good, and when we increase them too much they will always converge on 0.5.
#%%
inputValues = numpy.array([[0,0],[0,1],[1,0],[1,1]])
inputResults = numpy.array([[0],[1],[1],[0]])
print(inputValues)
print(inputResults)
#%%
model = keras.Sequential([
keras.layers.Flatten(input_shape=(2,)),
keras.layers.Dense(units=2, activation=("relu")),
keras.layers.Dense(units=2, activation=("softmax"))
])
model.compile(loss = keras.losses.SparseCategoricalCrossentropy(), optimizer = tensorflow.optimizers.Adam(), metrics=['accuracy'])
model.fit(inputValues, inputResults, epochs=2500)
model.summary()
print(model.weights)
#%%
print(model.predict_proba(inputValues))
print("End of file.")
From my understanding of ANNs, we should have 2 inputs in the first layer, specifically for the XOR example, and two outputs in the output layer (one for 0 and one for 1). I assume that since we are not required to say which output is which, TensorFlow handles this automatically by comparing the results in the fit function? Lastly, we have tried both with a hidden layer (of 2 units) and without one, and still don't seem to get any better results.
Could someone let us know what we have done wrong?
Your problem is essentially a binary classification problem, because the output can be either 0 or 1. For this you don't need two output neurons; one will do, with a sigmoid activation that returns a value between 0 and 1, which you can read as the probability of class 1 (sigmoid generally works well for binary classification, because its characteristic S-shape pushes values toward either 0 or 1).
Another adjustment you need to make is to set the loss function to binary crossentropy, which matches a single sigmoid output; your choice, sparse categorical crossentropy, expects one output unit per class and is more commonly used for classification into more than 2 categories.
So the code that I tried is:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(units=4, activation="sigmoid"),
    keras.layers.Dense(units=1, activation="sigmoid")
])
model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=False), optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
model.fit(inputValues, inputResults, epochs=2500)
With these settings I got training accuracy to 1.0000. It took a while to get there, and I suppose that could be sped up by playing around with the learning rate, but it should be enough to get the job done.
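To make the sigmoid output and the binary crossentropy loss described above more concrete, here is a minimal plain-Python sketch of both formulas. It is not Keras code; the function names and the example values of z are only illustrative.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p):
    """Loss for one sample, where p is the predicted probability of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Hypothetical pre-activation values of the single output neuron.
for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)
    print(f"z={z:+.1f}  p={p:.3f}  loss if y=1: {binary_crossentropy(1, p):.3f}")
```

The loss shrinks as the predicted probability moves toward the true label, which is what training pushes the single output neuron to do.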

GridSearchCV gives different results than LassoCV for optimal alpha

I am aware of the standard process of finding the optimal value of alpha/lambda by cross-validation with the GridSearchCV class in sklearn.model_selection. Here's my code for that:
alphas=np.arange(0.0001,0.01,0.0005)
cv=RepeatedKFold(n_splits=10,n_repeats=3, random_state=100)
hyper_param = {'alpha':alphas}
model = Lasso()
model_cv = GridSearchCV(
    estimator=model,
    param_grid=hyper_param,
    scoring='r2',
    cv=cv,
    verbose=1,
    return_train_score=True
)
model_cv.fit(X_train, y_train)
# checking the best parameters
model_cv.best_params_
This gives me alpha=0.01
Now, looking at LassoCV: as I understand it, this class builds the model by selecting the best alpha from the list of alphas passed in. Note that I used the same cross-validation scheme for both. But here is what happens when I try sklearn.linear_model.LassoCV with the RepeatedKFold scheme:
alphas=np.arange(0.0001,0.01,0.0005)
cv=RepeatedKFold(n_splits=10,n_repeats=3,random_state=100)
ls_cv_m=LassoCV(alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
ls_cv_m.fit(X_train_reduced,y_train)
print('Alpha Value %d'%ls_cv_m.alpha_)
print('The coefficients are {}',ls_cv_m.coef_)
I get alpha=0 for the same data, and this value is not present in the list of decimal values passed in the alphas argument.
This has confused me about the actual implementation of LassoCV.
My questions are:
Why do I get an optimal alpha of 0 from LassoCV when the list passed to the argument does not contain zero?
What is the difference between LassoCV and Lasso then, if I have to find the most suitable alpha with GridSearchCV anyway?
First, you should pass your alphas as a keyword argument rather than a positional one, since the first positional parameter of LassoCV is eps.
ls_cv_m=LassoCV(alphas=alphas,cv=cv,n_jobs=1,verbose=True,random_state=100)
Then, the model does return one of the alphas you defined as the optimal parameter; you are simply printing it with %d, which truncates the float to an integer. Replace %d with %f to print it in float format:
print('Alpha Value %f'%ls_cv_m.alpha_)
See the Python documentation on printf-style string formatting for more details.
As for your second question, Lasso is the linear model itself, while LassoCV wraps it in a cross-validation loop that finds the optimal alpha for a Lasso model.
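The formatting issue can be reproduced without scikit-learn. In this small sketch the value of alpha is a made-up stand-in for ls_cv_m.alpha_:

```python
# Hypothetical stand-in for ls_cv_m.alpha_: one of the small alphas from the grid.
alpha = 0.0051

as_int = 'Alpha Value %d' % alpha      # %d truncates the float toward zero
as_float = 'Alpha Value %f' % alpha    # %f keeps six decimal places
as_general = 'Alpha Value %g' % alpha  # %g drops trailing zeros

print(as_int)
print(as_float)
print(as_general)
```

Any alpha below 1 prints as 0 with %d, which is why the result looked like a value that was never in the grid.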

How to use cross_val_predict to predict probabilities for a new dataset?

I am using sklearn's cross_val_predict for training like so:
myprobs_train = cross_val_predict(LogisticRegression(),X = x_old, y=y_old, method='predict_proba', cv=10)
I am happy with the returned probabilities, and would like now to score up a brand-new dataset. I tried:
myprobs_test = cross_val_predict(LogisticRegression(), X =x_new, y= None, method='predict_proba',cv=10)
but this did not work, it's complaining about y having zero shape. Does it mean there's no way to apply the trained and cross-validated model from cross_val_predict on new data? Or am I just using it wrong?
Thank you!
You are looking at the wrong method. Cross-validation functions do not return a trained model; they return scores or out-of-fold predictions that help you evaluate a model (logistic regression in your case). Your goal is to fit some data and then generate predictions for new data. The relevant methods are fit and predict (or predict_proba, for probabilities) of the LogisticRegression class. Here is the basic structure:
from sklearn import linear_model
logreg = linear_model.LogisticRegression()
logreg.fit(x_old, y_old)
predictions = logreg.predict(x_new)
probabilities = logreg.predict_proba(x_new)  # class probabilities, as in the question
I have the same concern as @user3490622. If cross_val_predict can only be used on data that has labels, why does y (the target) default to None on the sklearn documentation page?
To partially achieve the desired result of multiple predicted probabilities, one could repeat the fit-then-predict approach on each training fold to mimic cross-validation.
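A rough sketch of that repeated fit-then-predict idea, in plain Python. The fold splitter is hand-written and unshuffled, the labels are made up, and fit_base_rate is only a toy stand-in for LogisticRegression().fit (it ignores the features and learns just the share of positive labels), so treat this as the shape of the loop rather than a real model.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_base_rate(y_train):
    """Toy stand-in for LogisticRegression().fit: learns only P(y == 1)."""
    return sum(y_train) / len(y_train)


y_old = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical labels
n_new = 3                                # hypothetical number of new samples

# Fit one model per fold, each on that fold's training part only.
fold_models = []
for train_idx, _ in kfold_indices(len(y_old), k=5):
    fold_models.append(fit_base_rate([y_old[i] for i in train_idx]))

# Each fold model scores the new data; average the k predictions per sample.
avg_proba = sum(fold_models) / len(fold_models)
proba_new = [avg_proba] * n_new
print(fold_models, proba_new)
```

With a real estimator, each entry of fold_models would be a fitted model, and you would average the rows returned by its predict_proba on the new data.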

Default value in Svm prediction Scikitlearn

I am using scikit-learn for SVM classification.
I need a classifier that returns default value when a given test item doesn't match any of the training-set items, i.e. when the distance is very high. Is that possible?
For Example
Let's say my training-set is
X= [[0.5,0.5,2],[4, 4,16],[16, 16,64]]
and labels
y=[0,1,2]
then I run training
clf = svm.SVC()
clf.fit(X, y)
then I run prediction
clf.predict([[-100,-100,-200]])
Now, as we can see, the test item [-100,-100,-200] is too far away from any of the training items. In this case the prediction yields [2], the label of the item [16, 16,64]. Is there any way to make it return something else (not from the training set)?
I think you can create a label for those far-away values and add it to your training set:
X= [[0.5,0.5,2],[4, 4,16],[16, 16,64],[-100,-100,-200]]
y=[0,1,2,100]
and give it a try.
SVM is a supervised learning method, which means every possible output label has to be specified in the training data. If you are not certain about the outputs, do some unsupervised clustering (k-means, for example) to get a rough idea of how many distinct outputs to expect.
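A different approach, not part of SVC itself, is to check the distance to the training items before trusting the classifier. The sketch below assumes Euclidean distance; DEFAULT_LABEL, MAX_DISTANCE and predict_with_default are hypothetical names and values, and the nearest-neighbour function only stands in for clf.predict so the example runs without scikit-learn.

```python
import math

X_train = [[0.5, 0.5, 2], [4, 4, 16], [16, 16, 64]]
y_train = [0, 1, 2]

DEFAULT_LABEL = -1    # hypothetical "unknown" label
MAX_DISTANCE = 50.0   # hypothetical cutoff; tune it on your own data


def predict_with_default(x, predict_fn):
    """Return DEFAULT_LABEL when x is farther than MAX_DISTANCE from every training item."""
    nearest = min(math.dist(x, row) for row in X_train)
    if nearest > MAX_DISTANCE:
        return DEFAULT_LABEL
    return predict_fn(x)


def nearest_neighbour_label(x):
    """Stand-in for clf.predict([x])[0] so the sketch runs without scikit-learn."""
    distances = [math.dist(x, row) for row in X_train]
    return y_train[distances.index(min(distances))]


far_result = predict_with_default([-100, -100, -200], nearest_neighbour_label)
near_result = predict_with_default([5, 5, 15], nearest_neighbour_label)
print(far_result, near_result)
```

With a fitted classifier you would pass something like lambda x: clf.predict([x])[0] as predict_fn instead of the stand-in.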

How do I correctly manually recreate sklearn (python) logistic regression predict_proba outcome for multiple classification

If I run a basic logistic regression with 4 classes, I can get the predict_proba array.
How can I manually calculate the probabilities using the coefficients and intercepts? What are the exact steps to get the same answers that predict_proba generates?
There seem to be multiple questions about this online and several suggestions which are either incomplete or don't match up anyway.
For example, I can't replicate this process from my sklearn model so what is missing?
https://stats.idre.ucla.edu/stata/code/manually-generate-predicted-probabilities-from-a-multinomial-logistic-regression-in-stata/
Thanks,
Because I had the same question but could not find an answer that gave the same results I had a look at the sklearn GitHub repository to find the answer. Using the functions from their own package I was able to create the same results I got from predict_proba().
It appears that sklearn uses its own softmax() implementation, which differs from the textbook formula in how it handles large values.
Let's assume you build a model like this:
from sklearn.linear_model import LogisticRegression
X = ...
Y = ...
model = LogisticRegression(multi_class="multinomial", solver="saga")
model.fit(X, Y)
Then you can calculate the probabilities either with model.predict_proba(X) or manually, using the sklearn function mentioned above, like this:
from sklearn.utils.extmath import softmax
import numpy as np
scores = np.dot(X, model.coef_.T) + model.intercept_
softmax(scores)  # sklearn implementation
In the documentation for their own softmax() function, they note that
The softmax function is calculated by
np.exp(X) / np.sum(np.exp(X), axis=1)
This will cause overflow when large values are exponentiated. Hence
the largest value in each row is subtracted from each data point to
prevent this.
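The effect of subtracting the row maximum can be seen in a small plain-Python sketch. This is not sklearn's code; the function names and input values are illustrative, and it handles one row at a time.

```python
import math


def naive_softmax(row):
    """Textbook formula; math.exp overflows for large inputs."""
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(row):
    """Same result, but the row maximum is subtracted before exponentiating."""
    row_max = max(row)
    exps = [math.exp(v - row_max) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0]  # hypothetical large scores

print(naive_softmax(small))
print(stable_softmax(small))
print(stable_softmax(large))

try:
    naive_softmax(large)
except OverflowError as err:
    print("naive softmax failed:", err)
```

Subtracting a constant from every score in a row leaves the softmax unchanged, so both versions agree on small inputs and only the stable one survives large ones.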
To replicate the sklearn calculation (I saw this in a different post):
V = X_train.values.dot(model.coef_.transpose())
U = V + model.intercept_
A = np.exp(U)
P=A/(1+A)
P /= P.sum(axis=1).reshape((-1, 1))
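This is slightly different from the softmax calculation and from the UCLA Stata example, but it works. It applies a sigmoid to each class score and then normalizes each row, which corresponds to the one-vs-rest scheme rather than the multinomial (softmax) one, so it should match predict_proba for a one-vs-rest model.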
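The same sigmoid-then-normalize steps for a single row of scores, written in plain Python as a sketch. The scores are made-up values for 4 classes, and ovr_probabilities is a hypothetical helper name, not an sklearn function.

```python
import math


def ovr_probabilities(scores):
    """Sigmoid per class, then rescale so the row sums to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]


scores = [2.0, 0.0, -1.0, 0.5]  # hypothetical decision scores for 4 classes

# A / (1 + A) with A = exp(score) is the sigmoid written another way.
a_form = [math.exp(s) / (1.0 + math.exp(s)) for s in scores]
sigmoid_form = [1.0 / (1.0 + math.exp(-s)) for s in scores]

probs = ovr_probabilities(scores)
print(probs, sum(probs))
```

The class with the largest score still gets the largest probability, but the values generally differ from what a softmax over the same scores would give.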
