I am trying to reproduce the behavior of LassoCV outside the CV process and I am struggling to understand what happens. I fix the random seed in the cross-validation, so the behavior should be deterministic, and I fix the alpha values (I figured out that LassoCV re-orders them in descending order). But I must be missing something, because I only get the same result across runs if I use one alpha at a time, or for the largest alpha when it coincides between two runs. In code:
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

clf = LassoCV(alphas=np.logspace(-2, 2, 5), cv=KFold(n_splits=10, shuffle=True, random_state=20), max_iter=1000000, tol=0.005)
clf.fit(new_xb, ypb)
print('Alphas', clf.alphas_)
for i, alpha in enumerate(clf.alphas_):
    print('Score for alpha', alpha, np.mean(clf.mse_path_[i, :]))  # for each alpha (row), 10 CV estimates of MSE
returns
Alphas [1.e+02 1.e+01 1.e+00 1.e-01 1.e-02]
Score for alpha 100.0 158200.48456097275
Score for alpha 10.0 158216.20827618148
Score for alpha 1.0 158231.52763707296
Score for alpha 0.1 158194.40120074182
Score for alpha 0.01 157886.51644333656
but if I run the same code with a different range, i.e. I change it to
clf = LassoCV(alphas=np.logspace(-1, 1, 3), cv=KFold(n_splits=10, shuffle=True, random_state=20), max_iter=1000000, tol=0.005)
I get
Alphas [10. 1. 0.1]
Score for alpha 10.0 165760.88919712842
Score for alpha 1.0 165704.1358282215
Score for alpha 0.1 161309.30244060006
On the other hand, if the first (largest) alpha is the same for two runs, the result stays the same for that value; using
clf = LassoCV(alphas=np.logspace(-1, 1, 2), cv=KFold(n_splits=10, shuffle=True, random_state=20), max_iter=1000000, tol=0.005)
returns
Alphas [10.  0.1]
Score for alpha 10.0 165760.88919712842
Score for alpha 0.1 161330.76311479398
And if I run it with one alpha at a time, I get consistent results. So my guess is that the alphas are somehow changing inside the estimator, but I can't figure out why.
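For reference, this is roughly how I am reproducing a single alpha outside LassoCV (a sketch; it assumes new_xb and ypb are NumPy arrays, as above):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

alpha = 10.0
cv = KFold(n_splits=10, shuffle=True, random_state=20)
fold_mse = []
for train_idx, test_idx in cv.split(new_xb):
    # plain Lasso with the same solver settings as the LassoCV call
    lasso = Lasso(alpha=alpha, max_iter=1000000, tol=0.005)
    lasso.fit(new_xb[train_idx], ypb[train_idx])
    fold_mse.append(mean_squared_error(ypb[test_idx], lasso.predict(new_xb[test_idx])))
print('Manual score for alpha', alpha, np.mean(fold_mse))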
I am new to this and trying to understand GridSearchCV.
I found that the best parameter from GridSearchCV does not always give the highest R-squared value.
For example, with the GridSearchCV below, I got alpha = 1 as the best parameter.
Input
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

alpha = [0.1, 1, 10, 100]

# GridSearchCV
estimator = Ridge()
pipe = Pipeline([('scale', StandardScaler()), ('model', estimator)])
parameters = [{'model__alpha': alpha}]
grid = GridSearchCV(pipe, parameters, scoring='r2', cv=5)
grid.fit(x_train, y_train)
print(f'best alpha: {alpha[grid.best_index_]}')
print(f'score: {grid.best_score_}')
Out
best alpha: 1
score: 0.7928298359066712
When I use Ridge regression with the same set of alphas, R2_train is highest at alpha=0.1. Also, R2_test is highest at alpha=10.
Input
import pandas as pd
from sklearn.metrics import r2_score

# Ridge
results = pd.DataFrame(columns=['alpha', 'R2_train', 'R2_test'])
for n in alpha:
    estimator = Ridge(n)
    pipe = Pipeline([('scale', StandardScaler()), ('model', estimator)])
    pipe.fit(x_train, y_train)
    # train R2
    R2_train = pipe.score(x_train, y_train)
    # test R2
    yhat_test = pipe.predict(x_test)
    R2_test = r2_score(y_test, yhat_test)
    results.loc[len(results)] = [n, R2_train, R2_test]
results
Out
   alpha  R2_train   R2_test
0    0.1  0.803881  0.824372
1    1.0  0.803823  0.824664
2   10.0  0.800996  0.825727
3  100.0  0.774868  0.822778
So I am confused: which alpha should I choose, and why?
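For reference, the per-alpha number that GridSearchCV compares can be reproduced with cross_val_score (a sketch, assuming the same x_train and y_train as above):

from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

for a in [0.1, 1, 10, 100]:
    pipe = Pipeline([('scale', StandardScaler()), ('model', Ridge(alpha=a))])
    # mean 5-fold cross-validated R2 on the training data; this is the
    # quantity behind grid.best_score_, not R2_train or R2_test
    print(a, cross_val_score(pipe, x_train, y_train, scoring='r2', cv=5).mean())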
What I am trying to do is use the triplet loss as my loss function, but I don't know if I am getting the right values from the merged vector that is used.
So here is my loss function:
def triplet_loss(y_true, y_pred, alpha=0.2):
    """
    Implementation of the triplet loss function

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, not used in this function.
    y_pred -- python list containing three objects:
        anchor: the encodings for the anchor data
        positive: the encodings for the positive data (similar to anchor)
        negative: the encodings for the negative data (different from anchor)

    Returns:
    loss -- real number, value of the loss
    """
    print("Ypred")
    print(y_pred.shape)

    anchor = y_pred[:, 0:512]
    positive = y_pred[:, 512:1024]
    negative = y_pred[:, 1024:1536]
    print(anchor.shape)
    print(positive.shape)
    print(negative.shape)
    # anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]  # Don't think this is working

    # distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)))
    print("PosDist", pos_dist)

    # distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)))
    print("Neg Dist", neg_dist)

    # compute loss
    basic_loss = (pos_dist - neg_dist) + alpha
    loss = tf.maximum(basic_loss, 0.0)
    return loss
Now, this does work when I use this line in the code instead of the slicing one:
anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
But I don't think that this is correct, as the shape of the merged vector is (?, 3, 3, 1536), so I think it is grabbing the wrong information. I cannot seem to figure out how to slice this correctly, as the uncommented code gives me this error:
Dimensions must be equal, but are 3 and 0 for 'loss_9/concatenate_10_loss/Sub' (op: 'Sub') with input shapes: [?,3,3,1536], [?,0,3,1536].
My network setup is like this:
from keras.layers import Input, Concatenate
from keras.models import Model
from keras.optimizers import Adam

input_dim = (7, 7, 2048)
anchor_in = Input(shape=input_dim)
pos_in = Input(shape=input_dim)
neg_in = Input(shape=input_dim)

base_network = create_base_network()

# Run each input through the shared base network
anchor_out = base_network(anchor_in)
pos_out = base_network(pos_in)
neg_out = base_network(neg_in)
print(anchor_out.shape)

merged_vector = Concatenate(axis=-1)([anchor_out, pos_out, neg_out])
print("Merged Vector", merged_vector.shape)
print(merged_vector)

model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=merged_vector)
adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss=triplet_loss)
Update
Using this seems to be right; could anyone confirm it?
anchor = y_pred[:,:,:,0:512]
positive = y_pred[:,:,:,512:1024]
negative = y_pred[:,:,:,1024:1536]
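A quick NumPy check with dummy data (not the real encodings, just to confirm the slice shapes) suggests these slices come out as three (?, 3, 3, 512) chunks:

import numpy as np

y_pred = np.zeros((4, 3, 3, 1536))  # stand-in for the merged vector
anchor = y_pred[:, :, :, 0:512]
positive = y_pred[:, :, :, 512:1024]
negative = y_pred[:, :, :, 1024:1536]
print(anchor.shape, positive.shape, negative.shape)  # (4, 3, 3, 512) each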
You do not need to do the concatenation operation:
# change this line to this
model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out, pos_out, neg_out])
Complete code:
input_dim = (7,7,2048)
anchor_in = Input(shape=input_dim)
pos_in = Input(shape=input_dim)
neg_in = Input(shape=input_dim)
base_network = create_base_network()
# Run input through base network
anchor_out = base_network(anchor_in)
pos_out = base_network(pos_in)
neg_out = base_network(neg_in)
print(anchor_out.shape)
# code changed here
model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out, pos_out, neg_out])
adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss=triplet_loss)
Then you can use the following loss:
def triplet_loss(y_true, y_pred, alpha=0.3):
    '''
    Inputs:
    y_true: True values of classification. (y_train)
    y_pred: predicted values of classification.
    alpha: Distance between positive and negative sample, arbitrarily
           set to 0.3

    Returns:
    Computed loss

    Function:
    --Implements triplet loss using tensorflow commands
    --The following function follows an implementation of Triplet-Loss
      where the loss is applied to the network in the compile statement
      as usual.
    '''
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    positive_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), -1)
    negative_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), -1)
    loss_1 = tf.add(tf.subtract(positive_dist, negative_dist), alpha)
    loss = tf.reduce_sum(tf.maximum(loss_1, 0.0))
    return loss
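To sanity-check the arithmetic of this loss, here is a tiny standalone example on toy vectors (a sketch, assuming TensorFlow 2.x eager execution; not part of the model above):

import tensorflow as tf

anchor = tf.constant([[0.0, 0.0]])
positive = tf.constant([[0.1, 0.0]])  # close to the anchor
negative = tf.constant([[1.0, 1.0]])  # far from the anchor

pos_dist = tf.reduce_sum(tf.square(anchor - positive), -1)  # 0.01
neg_dist = tf.reduce_sum(tf.square(anchor - negative), -1)  # 2.0
loss = tf.reduce_sum(tf.maximum(pos_dist - neg_dist + 0.3, 0.0))
print(loss.numpy())  # 0.01 - 2.0 + 0.3 = -1.69, clipped to 0.0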
Surprised I can't quickly find this info online--
After training my CNN, I grabbed the predictions by running:
predictions = model.predict_generator(test_generator, steps=num_test)
Rather than using
predicted_classes = np.argmax(predictions, axis=1)
I'd like to set a threshold so that anything with probability greater than 0.3 is labeled as class 1, rather than the usual 0.5. Is there a quick and easy way to do this?
If it is a binary classification you could try:
i = 0
while i < len(predictions):
    if predictions[i] <= 0.3:
        predictions[i] = 0
    else:
        predictions[i] = 1
    i += 1
This should "round" to class 1 if the predicted value is bigger than 0.3
There is no place in Keras to set such a threshold, even though Keras itself uses 0.5 to compute the binary_accuracy metric. Your only option is to manually threshold the predictions:
predictions = model.predict_generator(test_generator, steps=num_test)
classes = predictions > 0.3
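One follow-up: the comparison yields a boolean array, so cast it if you need 0/1 integer labels (a sketch, assuming predictions holds one sigmoid probability per sample):

predicted_classes = (predictions > 0.3).astype(int).ravel()  # booleans -> 0/1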
I am evaluating different classifiers for my sentiment analysis model. I am looking at all available metrics, and whilst most achieve similar precision, recall, F1 and ROC-AUC scores, Linear SVM appears to get a perfect ROC-AUC score. Look at the chart below:
Abbreviations: MNB=Multinomial Naive Bayes, SGD=Stochastic Gradient Descent, LR=Logistic Regression, LSVC=Linear Support Vector Classification
Here are the rest of the performance metrics for LSVC, which are very similar to the rest of the classifiers:
             precision    recall  f1-score   support

        neg       0.83      0.90      0.87     24979
        pos       0.90      0.82      0.86     25021

avg / total       0.87      0.86      0.86     50000
As you can see the dataset is balanced for pos and neg comments.
Here is the relevant code:
def evaluate(classifier):
    predicted = classifier.predict(testing_text)

    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = np.max(probabilities, axis=1)

    pos_idx = np.where(predicted == 'pos')
    predicted_true_binary = np.zeros(predicted.shape)
    predicted_true_binary[pos_idx] = 1

    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)

    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)

    return fpr, tpr, auc, mean_acc, report, confusion_matrix
I am using predict_proba for all classifiers apart from LSVC, which uses decision_function instead (since it does not have a predict_proba method).
What's going on?
EDIT: changes according to @Vivek Kumar's comments:
def evaluate(classifier):
    predicted = classifier.predict(testing_text)

    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = probabilities[:, 1]  # NEW

    testing_category_array = np.array(testing_category)  # NEW
    pos_idx = np.where(testing_category_array == 'pos')
    predicted_true_binary = np.zeros(testing_category_array.shape)
    predicted_true_binary[pos_idx] = 1

    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)

    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)

    return fpr, tpr, auc, mean_acc, report, confusion_matrix
This now yields this graph:
I don't think it is valid to compare the methods predict_proba and decision_function like for like. The first sentence in the docs for LinearSVC's decision_function, "Predict confidence scores for samples.", must not be read as "predicting probabilities". The second sentence clarifies this: it is analogous to the decision function of the general SVC.
You can get predict_proba for a linear SVM in sklearn, but then you need to use the general SVC with kernel='linear' and probability=True. However, you are then changing the implementation under the hood (away from LIBLINEAR).
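A minimal sketch of that alternative (X_train, y_train and X_test are placeholders here; note that probability=True fits Platt scaling via internal cross-validation, so it is noticeably slower):

from sklearn.svm import SVC

svc = SVC(kernel='linear', probability=True)  # LIBSVM implementation, not LIBLINEAR
svc.fit(X_train, y_train)
proba = svc.predict_proba(X_test)  # shape (n_samples, 2) for binary labels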
I'm new to this so apologies if this is obvious.
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

lr = LogisticRegression(penalty='l1')
parameters = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
clf = GridSearchCV(lr, parameters, scoring='roc_auc', cv=5)
clf.fit(X, Y)
print(clf.score(X, Y))

tn, fp, fn, tp = metrics.confusion_matrix(Y, clf.predict(X)).ravel()
print(tn, fp, fn, tp)
I want to run a Logistic Regression, and I'm using an L1 penalty because I want to reduce the number of features I use. I'm using GridSearchCV to find the best C value for the Logistic Regression.
I run this and get C = 0.001, AUC = 0.59, Confusion matrix: 46, 0, 35, 0. Only 1 feature has a non-zero coefficient.
I go back to my code and remove the option of C = 0.001 from my parameter list and run it again.
Now I get C = 1, AUC = 0.95, Confusion matrix: 42, 4, 6, 29. Many, but not all, features have a non-zero coefficient.
I thought that since I set scoring as 'roc_auc', shouldn't the selected model be the one with the better AUC?
Thinking this might be down to my l1 penalty, I switched it to l2. But this gave C = 0.001, AUC = 0.80, CM = 42, 4, 16, 19, and again when I removed C = 0.001 as an option it gave C = 0.01, AUC = 0.88, CM = 41, 5, 13, 22.
There is less of an issue with the l2 penalty, but it seems to be a pretty big difference with l1. Is it a penalty thing?
From some of my reading I know ElasticNet is supposed to combine some l1 and l2; is that where I should be looking?
Also, not completely relevant, but while I'm posting: I haven't done any data normalization for this. Is that normal for Logistic Regression?
clf.score(X, Y) is the score on the training dataset (the grid search refits the model on the entire dataset after it has chosen the best parameters), so you don't want to use it to evaluate your model. It also isn't what the grid search uses internally for model selection; that score is computed on cross-validated folds and averaged. You can access the actual score used for model selection with clf.best_score_.
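A minimal sketch of the distinction (hypothetical held-out split of the same X and Y):

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
clf.fit(X_train, Y_train)

print(clf.best_score_)            # mean cross-validated roc_auc used to pick C
print(clf.score(X_test, Y_test))  # roc_auc on data the search never saw
                                  # (GridSearchCV.score uses the scoring= parameter)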