Micro F1 score in Scikit-Learn with Class imbalance - scikit-learn

I have some class imbalance and a simple baseline classifier that assigns the majority class to every sample:
from sklearn.metrics import precision_score, recall_score, confusion_matrix
y_true = [0,0,0,1]
y_pred = [0,0,0,0]
confusion_matrix(y_true, y_pred)
This yields
[[3, 0],
[1, 0]]
This means TP=3, FP=1, FN=0.
So far, so good. Now I want to calculate the micro average of precision and recall.
precision_score(y_true, y_pred, average='micro') # yields 0.75
recall_score(y_true, y_pred, average='micro') # yields 0.75
I am Ok with the precision, but why is recall not 1.0? How can they ever be the same in this example, given that FP > 0 and FN == 0? I know it must have to do with the micro averaging, but I can't wrap my head around this one.

Yes, its because of micro-averaging. See the documentation here to know how its calculated:
Note that if all labels are included, “micro”-averaging in a
multiclass setting will produce precision, recall and f-score that are all
identical to accuracy.
As you can see in the above linked page, both precision and recall are defined as:
where R(y, y-hat) is:
So in your case, Recall-micro will be calculated as
R = number of correct predictions / total predictions = 3/4 = 0.75

Related

F1 metric and LeaveOneOut validation strategy in scikit-learn

I want to use GridSearchCV to find the optimal n_neighbors parameter of KNeighborsClassifier
I want to use 'f1_score' metrics AND 'leave one out' strategy.
But this code
clf = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 2, 3]}, cv=LeaveOneOut(), scoring='f1')
clf.fit(x_train, y_train)
leads to an error
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no true nor predicted samples. Use `zero_division` parameter to control this behavior.
I want to compute f1 score not of each fold of cross validation (it is not possible to compute f1 score of the only one test example), but to compute f1 score based on the whole iteration set with n_neighbors = n.
Is it possible using GridSearchCV?
Not sure if this functionality is directly available in Scikit-Learn, but you can implement the following function to get the desired outcome.
In particular, we will make a dummy scorer which just returns the predicted class instead of computing any score using the ground-truth and the prediction. In this way we can access the predictions of each hyperparameters combination on the different examples in the LOO cv.
from sklearn.metrics import f1_score, make_scorer
def get_pred(y_true, y_predicted):
return y_predicted
get_pred_scorer = make_scorer(get_pred)
clf = GridSearchCV(
KNeighborsClassifier(),
{'n_neighbors': [1, 2, 3]},
cv=LeaveOneOut(),
refit=False,
scoring=get_pred_scorer
)
clf.fit(X_train, y_train)
The problem with this approach is that certain results available in the cv_results_ dictionary (and in certain attributes of GridSearchCV) won't have any meaning, but that probably is not a problem. We should just remember to put refit=False, since GridSearchCV doesn't have a way to determine the best model.
Now we can access the predictions through cv_results_ and just use f1_score to compute the metric for each hyperparams configuration.
def print_params_f1_scores(clf, y_true):
y_preds = [] # will contain the predictions of each params combination
results = clf.cv_results_
params = results["params"] # all params combinations
for j in range(len(params)): # for each combination
y_preds.append([])
for i in range(clf.n_splits_): # for each split (sample in loo)
prediction_of_j_on_i = results[f"split{i}_test_score"][j]
y_preds[j].append(prediction_of_j_on_i)
# show the f1-scores of each combination
for j in range(len(y_preds)):
score = f1_score(y_true, y_preds[j])
print(f"KNeighborsClassifier with {params[j]} obtained f1-score of {score}")
print_params_f1_scores(clf, y_train)
The function prints the following output:
KNeighborsClassifier with {'n_neighbors': 1} obtained f1-score of 0.94
KNeighborsClassifier with {'n_neighbors': 2} obtained f1-score of 0.94
KNeighborsClassifier with {'n_neighbors': 3} obtained f1-score of 0.92

torchmetric calculate accuracy with threshold

How does torchmetrics.Accuracy threshold keyword work? I have the following setup:
import torch, torchmetrics
preds = torch.tensor([[0.3600, 0.3200, 0.3200]])
target = torch.tensor([0])
torchmetrics.functional.accuracy(preds, target, threshold=0.5, num_classes=3)
output:
tensor(1.)
None of the values in preds have a probability higher than 0.5, why is it giving 100% accuracy?
threshold=0.5 sets each probability under 0.5 to 0. It is used only in case you are dealing with binary (which is not your case, since num_classes=3) or multilabel classification (which seems not the case because multiclass is not set). Therefore threshold is not actually involved.
In your case, preds represents a prediction related to one observation. Its largest probability (0.36) is set at position 0, thus argmax(preds) and target are equal to 0, therefore accuracy is set to 1, because the prediction of the (only) observation is correct.
You could use a multilabel target shape with subset_accuracy=True as explained here
import torch, torchmetrics
target = torch.tensor([0,1,2])
target = torch.nn.functional.one_hot(target)
preds = torch.tensor([[0.3600, 0.3200, 0.3200],
[0.300, 0.4000, 0.3000],
[0.3000, 0.2000, 0.5000]])
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.6)
Output:
tensor(0.)
Decreasing threshold:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.5)
Output:
tensor(0.333)
Decreasing threshold to 0.35:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.35)
Output:
tensor(1.)
unfortunately this does not work for if any other probability is higher then the threshold, because it will think you are predicting multiple labels. For instance threshold 0.0 will give:
torchmetrics.functional.accuracy(preds, target, subset_accuracy=True, threshold=0.0)
Output:
tensor(0.)

Set custom probability threshold for Keras CNN

Surprised I can't quickly find this info online--
After training my CNN I grabbed the predictions by running;
predictions = model.predict_generator(test_generator, steps=num_test)
Rather than use
predicted_classes = np.argmax(predictions, axis=1)
I'd like to set a threshold of anything greater than 0.3 probability being labeled as class 1, rather than 0.5. Is there a quick and easy way to do this?
If it is a binary classification you could try:
i=0
while i < len(predictions):
if(predictions[a]<=0.3):
predictions[a]=0
else:
predictions[a]=0
i+=1
This should "round" to class 1 if the predicted value is bigger than 0.3
There is no place in Keras to set such threshold, even as Keras uses 0.5 to compute the binary_accuracy metric. Your only option is to manually threshold the predictions:
predictions = model.predict_generator(test_generator, steps=num_test)
classes = predictions > 0.3

Getting perfect ROC-AUC score for Linear SVC

I am evaluating different classifiers for my sentiment analysis model. I am looking at all available metrics, and whilst most achieve a similar precision, recall, F1-scores and ROC-AUC scores, Linear SVM appears to get a perfect ROC-AUC score. Look at the chart below:
Abbreviations: MNB=Multinomial Naive Bayes, SGD=Stochastic Gradient Descent, LR=Logistic Regression, LSVC=Linear Support Vector Classification
Here are the rest of the performance metrics for LSVC, which are very similar to the rest of the classifiers:
precision recall f1-score support
neg 0.83 0.90 0.87 24979
pos 0.90 0.82 0.86 25021
avg / total 0.87 0.86 0.86 50000
As you can see the dataset is balanced for pos and neg comments.
Here is the relevant code:
def evaluate(classifier):
predicted = classifier.predict(testing_text)
if isinstance(classifier.steps[2][1], LinearSVC):
probabilities = np.array(classifier.decision_function(testing_text))
scores = probabilities
else:
probabilities = np.array(classifier.predict_proba(testing_text))
scores = np.max(probabilities, axis=1)
pos_idx = np.where(predicted == 'pos')
predicted_true_binary = np.zeros(predicted.shape)
predicted_true_binary[pos_idx] = 1
fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
auc = metrics.roc_auc_score(predicted_true_binary, scores)
mean_acc = np.mean(predicted == testing_category)
report = metrics.classification_report(testing_category, predicted)
confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
return fpr, tpr, auc, mean_acc, report, confusion_matrix
I am using predict_proba for all classifiers apart from LSVC which uses decision_function instead (since it does not have a predict_proba method`)
What's going on?
EDIT: changes according to #Vivek Kumar's comments:
def evaluate(classifier):
predicted = classifier.predict(testing_text)
if isinstance(classifier.steps[2][1], LinearSVC):
probabilities = np.array(classifier.decision_function(testing_text))
scores = probabilities
else:
probabilities = np.array(classifier.predict_proba(testing_text))
scores = probabilities[:, 1] # NEW
testing_category_array = np.array(testing_category) # NEW
pos_idx = np.where(testing_category_array == 'pos')
predicted_true_binary = np.zeros(testing_category_array.shape)
predicted_true_binary[pos_idx] = 1
fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
auc = metrics.roc_auc_score(predicted_true_binary, scores)
mean_acc = np.mean(predicted == testing_category)
report = metrics.classification_report(testing_category, predicted)
confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
return fpr, tpr, auc, mean_acc, report, confusion_matrix
This now yields this graph:
I don't think it is valid to compare the methods predict_proba and decision_function like for like. The first sentence in the docs for LSVC decision function "Predict confidence scores for samples." must not be read as "predicting probabilties". The second sentences clarifies it, it is similar to the decision function for the general SVC.
You can use predict_proba for a linear SVC with sklearn; then you need to specific under the general SVC the kernel as 'linear'. However, then you are changing the implementation under the hood (away from "LIBLINEAR").

Shouldn't a SVM binary classifier understand the threshold from the training set?

I'm very confused about SVM classifiers and I'm sorry if I'll sound stupid.
I'm using the Spark library for java http://spark.apache.org/docs/latest/mllib-linear-methods.html, the first example from the Linear Support Vector Machines paragraph. On this training set:
1 1:10
1 1:9
1 1:9
1 1:9
0 1:1
1 1:8
1 1:8
0 1:2
0 1:2
0 1:3
the prediction on values: 8, 2 and 1 are all positive (1). Given the training set, I would expect them to be positive, negative, negative. It gives negative only on 0 or negative values. I read that the standard threshold is "positive" if the prediction is a positive double, "negative" if it's negative, and I've seen that there is a method to manually set the threshold. But isn't this the exact reason I need a binary classifier for? I mean, if I know in advance what the threshold is I can distinguish between positive and negative values, so why bother training a classifier?
UPDATE:
Using this python code from a different library:
X = [[10], [9],[9],[9],[1],[8],[8],[2],[2],[3]]
y = [1,1,1,1,0,1,1,0,0,0]
​
from sklearn.svm import SVC
from sklearn.cross_validation import StratifiedKFold
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
import numpy as np
​
# we convert our list of lists in numpy arrays
X = np.array(X)
y = np.array(y)
# we compute the general accuracy of the system - we need more "false questions" to continue the study
accuracy = []
​
#we do 10 fold cross-validation - to be sure to test all possible combination of training and test
kf_total = StratifiedKFold(y, n_folds=5, shuffle=True)
for train, test in kf_total:
X_train, X_test = X[train], X[test]
y_train, y_test = y[train], y[test]
print X_train
clf = SVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print "the classifier says: ", y_pred
print "reality is: ", y_test
print accuracy_score(y_test, y_pred)
print ""
accuracy.append(accuracy_score(y_test, y_pred))
print sum(accuracy)/len(accuracy)
the results are correct:
######
1 [0]
######
2 [0]
######
8 [1]
So I think it's possible for a SVM classifier to understand the threshold by itself; how can I do the same with the spark library?
SOLVED: I solved the issue changing the example to this:
SVMWithSGD std = new SVMWithSGD();
std.setIntercept(true);
final SVMModel model = std.run(training.rdd());
From this:
final SVMModel model = SVMWithSGD.train(training.rdd(), numIterations);
The standard value for "intercept" is false, which is what I needed to be true.
If you search for probability calibration you will find some research on a related matter (recalibrating the outputs to return better scores).
If your problem is a binary classification problem, you can calculate the slope of the cost by assigning vales to true/false positive/negative options multiplied by the class ratio. You can then form a line with the given AUC curve that intersects at only one point to find a point that is in some sense optimal as a threshold for your problem.
Threshold is one value that will differentiate classes .

Resources