why have high AUC and low accuracy in a balanced dataset for SVM - svm

I used LIBSVM to classify 256 classes. My dataset has about 5000-10000 samples. For the SVM I used the one-against-one strategy to train my models. I now get low accuracy (15%-30%) but high AUC (>90%). I had assumed that one cannot obtain a high AUC (0.9 and higher) if the accuracy of the corresponding predictive model is low (13-30%)?
I used the open-source Python library scikit-learn to compute the AUC, following this example: (http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py)
This is the code I used to compute the AUC:
import numpy as np
from sklearn import metrics
from sklearn.preprocessing import label_binarize

# compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
# test_label_kernel: the true label of each instance
# LensOfLabel: the number of classes
y = label_binarize(test_label_kernel, classes=list(range(0, LensOfLabel, 1)))
# sort_pval: the prediction probabilities of the SVM
for i in range(LensOfLabel):
    fpr[i], tpr[i], _ = metrics.roc_curve(y[:, i], sort_pval[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

# First aggregate all false positive rates
n_classes = LensOfLabel
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = metrics.auc(fpr["macro"], tpr["macro"])
print("macroAUC: %.4f" % roc_auc["macro"])

# compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = metrics.roc_curve(y.ravel(), sort_pval.ravel())
roc_auc["micro"] = metrics.auc(fpr["micro"], tpr["micro"])
print("microAUC: %.4f" % roc_auc["micro"])
The ROC curves are:
https://i.stack.imgur.com/GEUqr.png
https://i.stack.imgur.com/ucbE6.png
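(Not part of the original post.) For intuition, here is a minimal synthetic sketch of why a 256-class problem can combine low top-1 accuracy with very high per-class AUC: AUC only measures how well each class's score separates that class's samples from all other samples, while accuracy requires the true class to win the argmax within each sample.
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
n_classes = 256
y_true = rng.permutation(np.repeat(np.arange(n_classes), 20))  # 20 samples per class
n_samples = y_true.size

# Base scores: small noise everywhere, a clear bump for the true class.
scores = rng.uniform(0.0, 0.5, size=(n_samples, n_classes))
scores[np.arange(n_samples), y_true] = 1.0

# In 75% of the samples one random *wrong* class gets an even larger score,
# so the argmax is wrong for that sample -- but for any fixed class this
# happens rarely, so its one-vs-rest ranking (and hence its AUC) barely moves.
confused = rng.rand(n_samples) < 0.75
wrong = (y_true + rng.randint(1, n_classes, size=n_samples)) % n_classes
scores[np.arange(n_samples)[confused], wrong[confused]] = 2.0

accuracy = (scores.argmax(axis=1) == y_true).mean()
y_bin = label_binarize(y_true, classes=np.arange(n_classes))
macro_auc = roc_auc_score(y_bin, scores, average="macro")
print("accuracy: %.3f   macro AUC: %.3f" % (accuracy, macro_auc))
# roughly: accuracy ~0.25, macro AUC ~0.99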

Related

calculate Entropy for each class of the test set to measure uncertainty on pytorch

I am trying to calculate the entropy of each class of the dataset for an image classification task in order to measure uncertainty in PyTorch, using the MC Dropout method and the solution proposed in this link:
Measuring uncertainty using MC Dropout on pytorch
First, I calculated the mean of each class per batch across the different forward passes (classes_mean_batch), then for the whole testloader (classes_mean), and then applied some transformations to get total_mean, which I use to calculate the entropy as shown in the code below.
import sys
import numpy as np
import torch
import torch.nn as nn

def mcdropout_test(batch_size, n_classes, model, T):
    # set non-dropout layers to eval mode
    model.eval()
    # set dropout layers to train mode
    enable_dropout(model)
    softmax = nn.Softmax(dim=1)
    classes_mean = []
    for images, labels in testloader:
        images = images.to(device)
        labels = labels.to(device)
        classes_mean_batch = []
        with torch.no_grad():
            output_list = []
            # getting outputs for T forward passes
            for i in range(T):
                output = model(images)
                output = softmax(output)
                output_list.append(torch.unsqueeze(output, 0))
            concat_output = torch.cat(output_list, 0)
            # getting mean of each class per batch across multiple MCD forward passes
            for i in range(n_classes):
                mean = torch.mean(concat_output[:, :, i])
                classes_mean_batch.append(mean)
            # getting mean of each class for the testloader
            classes_mean.append(torch.stack(classes_mean_batch))

    total_mean = []
    concat_classes_mean = torch.stack(classes_mean)
    for i in range(n_classes):
        concat_classes = concat_classes_mean[:, i]
        total_mean.append(concat_classes)
    total_mean = torch.stack(total_mean)
    total_mean = np.asarray(total_mean.cpu())

    epsilon = sys.float_info.min
    # Calculating entropy across multiple MCD forward passes
    entropy = (-np.sum(total_mean * np.log(total_mean + epsilon), axis=-1)).tolist()
    for i in range(n_classes):
        print(f'The uncertainty of class {i+1} is {entropy[i]:.4f}')
Can anyone please correct or confirm the implementation I have used to calculate the entropy of each class?
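(Not part of the original question.) For comparison, here is a hedged sketch of one common alternative formulation: average the softmax over the T stochastic passes per sample, compute the per-sample predictive entropy, and then average that entropy over the samples of each true class. Names such as enable_dropout, testloader and device are assumed to exist as in the question.
import numpy as np
import torch
import torch.nn as nn

def per_class_predictive_entropy(model, testloader, n_classes, T, device):
    """Average per-sample predictive entropy, grouped by true class (sketch)."""
    model.eval()
    enable_dropout(model)  # keep dropout active at test time, as in the question
    softmax = nn.Softmax(dim=1)
    entropy_sum = np.zeros(n_classes)
    counts = np.zeros(n_classes)
    with torch.no_grad():
        for images, labels in testloader:
            images = images.to(device)
            # mean softmax over T stochastic forward passes: shape (batch, n_classes)
            probs = torch.stack([softmax(model(images)) for _ in range(T)]).mean(dim=0)
            probs = probs.cpu().numpy()
            # predictive entropy per sample
            ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)
            labels_np = labels.cpu().numpy()
            for c in range(n_classes):
                mask = (labels_np == c)
                entropy_sum[c] += ent[mask].sum()
                counts[c] += mask.sum()
    return entropy_sum / np.maximum(counts, 1)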

Plotting AUC score for multiple model for multiclass classification in Python

I am working on a multiclass classification problem. There are 46 unique classes in my dataset. I have computed the AUC score for each class and plotted it, but I want to plot the AUC scores of several models in one graph, i.e. for LogisticRegression, XGBoost and two more models used to solve this multiclass problem. Here is my code so far:
import pandas as pd
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_curve, auc

n_classes = 46
best_C = 1000
best_gamma = 0.0001

svc_model_grid_param = SVC(C=best_C, kernel="rbf", gamma=best_gamma)
model_OVR_svc = OneVsRestClassifier(svc_model_grid_param)
y_score = model_OVR_svc.fit(X_train, y_train).decision_function(X_valid)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
# calculate dummies once
y_test_dummies = pd.get_dummies(y_valid, drop_first=False).values
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test_dummies[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
Plotting--
import matplotlib.pylab as plt
lists = sorted(roc_auc.items()) # sorted by key, return a list of tuples
x, y = zip(*lists) # unpack a list of pairs into two tuples
plt.xlabel('Class')
plt.ylabel('AUC Score')
plt.plot(x, y)
plt.show()
Graph-- (per-class AUC plot, image omitted)
What I want to do-- plot the per-class AUC scores of all the models in one figure.
Can anyone help me do this? Thanks in advance.
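(Not part of the original question.) A minimal sketch of one way to overlay the per-class AUC scores of several one-vs-rest models in a single figure; the model list here is only illustrative and reuses the question's X_train, y_train, X_valid, y_valid.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

models = {
    "SVC (rbf)": OneVsRestClassifier(SVC(C=1000, gamma=0.0001, kernel="rbf")),
    "LogisticRegression": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
}
y_test_dummies = pd.get_dummies(y_valid, drop_first=False).values
n_classes = y_test_dummies.shape[1]

for name, model in models.items():
    model.fit(X_train, y_train)
    y_score = model.decision_function(X_valid)
    aucs = []
    for i in range(n_classes):
        fpr, tpr, _ = roc_curve(y_test_dummies[:, i], y_score[:, i])
        aucs.append(auc(fpr, tpr))
    # one line of per-class AUC scores per model
    plt.plot(range(n_classes), aucs, label=name)

plt.xlabel("Class")
plt.ylabel("AUC score")
plt.legend()
plt.show()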

Getting perfect ROC-AUC score for Linear SVC

I am evaluating different classifiers for my sentiment analysis model. I am looking at all available metrics, and whilst most achieve a similar precision, recall, F1-scores and ROC-AUC scores, Linear SVM appears to get a perfect ROC-AUC score. Look at the chart below:
Abbreviations: MNB=Multinomial Naive Bayes, SGD=Stochastic Gradient Descent, LR=Logistic Regression, LSVC=Linear Support Vector Classification
Here are the rest of the performance metrics for LSVC, which are very similar to the rest of the classifiers:
             precision    recall  f1-score   support

        neg       0.83      0.90      0.87     24979
        pos       0.90      0.82      0.86     25021

avg / total       0.87      0.86      0.86     50000
As you can see the dataset is balanced for pos and neg comments.
Here is the relevant code:
import numpy as np
from sklearn import metrics
from sklearn.svm import LinearSVC

def evaluate(classifier):
    predicted = classifier.predict(testing_text)
    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = np.max(probabilities, axis=1)
    pos_idx = np.where(predicted == 'pos')
    predicted_true_binary = np.zeros(predicted.shape)
    predicted_true_binary[pos_idx] = 1
    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)
    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
    return fpr, tpr, auc, mean_acc, report, confusion_matrix
I am using predict_proba for all classifiers apart from LSVC, which uses decision_function instead (since it does not have a predict_proba method).
What's going on?
EDIT: changes according to #Vivek Kumar's comments:
def evaluate(classifier):
    predicted = classifier.predict(testing_text)
    if isinstance(classifier.steps[2][1], LinearSVC):
        probabilities = np.array(classifier.decision_function(testing_text))
        scores = probabilities
    else:
        probabilities = np.array(classifier.predict_proba(testing_text))
        scores = probabilities[:, 1]  # NEW
    testing_category_array = np.array(testing_category)  # NEW
    pos_idx = np.where(testing_category_array == 'pos')
    predicted_true_binary = np.zeros(testing_category_array.shape)
    predicted_true_binary[pos_idx] = 1
    fpr, tpr, thresholds = metrics.roc_curve(predicted_true_binary, scores)
    auc = metrics.roc_auc_score(predicted_true_binary, scores)
    mean_acc = np.mean(predicted == testing_category)
    report = metrics.classification_report(testing_category, predicted)
    confusion_matrix = metrics.confusion_matrix(testing_category, predicted)
    return fpr, tpr, auc, mean_acc, report, confusion_matrix
This now yields this graph:
I don't think it is valid to compare the methods predict_proba and decision_function like for like. The first sentence in the docs for the LinearSVC decision function, "Predict confidence scores for samples.", must not be read as "predicting probabilities". The second sentence clarifies it: it is similar to the decision function of the general SVC.
You can get predict_proba for a linear SVM in scikit-learn by using the general SVC with the kernel set to 'linear' (and probability=True). However, you are then changing the implementation under the hood (away from LIBLINEAR).
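For illustration (assuming placeholder X_train, y_train, X_test), a short sketch of the two usual ways to get probability-like scores for a linear SVM in scikit-learn:
from sklearn.svm import SVC, LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Option 1: general SVC with a linear kernel and Platt scaling
# (uses LIBSVM under the hood, not LIBLINEAR, so it can be much slower).
svc = SVC(kernel='linear', probability=True).fit(X_train, y_train)
proba_svc = svc.predict_proba(X_test)

# Option 2: keep LinearSVC (LIBLINEAR) and calibrate its decision_function.
lsvc = CalibratedClassifierCV(LinearSVC()).fit(X_train, y_train)
proba_lsvc = lsvc.predict_proba(X_test)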

How to binarize RandomForest to plot a ROC in python?

I have 21 classes. I am using RandomForest. I want to plot a ROC curve, so I checked the example in scikit ROC with SVM
The example uses SVM. SVM has parameters like probability and decision_function_shape, which RF does not.
So how can I binarize RandomForest and plot a ROC?
Thank you
EDIT
To create the fake data: there are 20 features and 21 classes (3 samples for each class).
df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label'] = label
df

# TO TRAIN THE MODEL: IT IS A STRATIFIED SHUFFLED SPLIT
clf = make_pipeline(RandomForestClassifier())
xSSSmean10 = []
for i in range(10):
    sss = StratifiedShuffleSplit(y, 10, test_size=0.1, random_state=i)
    scoresSSS = cross_validation.cross_val_score(clf, x, y, cv=sss)
    xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print("")
For a multilabel random forest, each of your 21 labels has a binary classification, and you can create a ROC curve for each of the 21 classes.
Your y_train should be a matrix of 0s and 1s for each label.
Assume you fit a multilabel random forest from sklearn, called it rf, and have an X_test and y_test after a train/test split. You can plot the ROC curve in Python for your first label using this:
from sklearn import metrics
probs = rf.predict_proba(X_test)
fpr, tpr, threshs = metrics.roc_curve(y_test['name_of_your_first_tag'],probs[0][:,1])
Hope this helps. If you provide your code and data I could write this more specifically.
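(Not from the original answer.) For the multiclass setup in the question (21 mutually exclusive classes rather than multilabel), a minimal sketch would binarize the true labels and draw one ROC curve per class from the columns of predict_proba; the fake data below follows the question's construction, and all variable names are illustrative.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

# 63 samples, 20 features, 21 classes with 3 samples each
X = pd.DataFrame(np.random.rand(63, 20))
y = np.arange(len(X)) // 3 + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.33, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
probs = rf.predict_proba(X_test)                       # shape (n_samples, 21)
y_test_bin = label_binarize(y_test, classes=rf.classes_)

for i, cls in enumerate(rf.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], probs[:, i])
    plt.plot(fpr, tpr, label=f"class {cls} (AUC = {auc(fpr, tpr):.2f})")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(fontsize=6)
plt.show()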

Low accuracy for TF-IDF with SVM using TfidfVectorizer and Scikit-learn

I'm trying to classify documents as deceptive or truthful using TF-IDF and SVM. I know that this has been done before but I'm not quite sure I'm implementing it right. I have a corpus of texts and am building the TF-IDF such as
vectorizer = TfidfVectorizer(min_df=1, binary=0, use_idf=1, smooth_idf=0, sublinear_tf=1)
tf_idf_model = vectorizer.fit_transform(corpus)
features = tf_idf_model.toarray()
And for the classification:
import random
import time
import numpy as np
from sklearn import svm

seed = random.random()
random.seed(seed)
random.shuffle(features)
random.seed(seed)
random.shuffle(labels)

features_folds = np.array_split(features, folds)
labels_folds = np.array_split(labels, folds)

for C_power in C_powers:
    scores = []
    start_time = time.time()
    svc = svm.SVC(C=2**C_power, kernel='linear')
    for k in range(folds):
        features_train = list(features_folds)
        features_test = features_train.pop(k)
        features_train = np.concatenate(features_train)
        labels_train = list(labels_folds)
        labels_test = labels_train.pop(k)
        labels_train = np.concatenate(labels_train)
        scores.append(svc.fit(features_train, labels_train).score(features_test, labels_test))
    print(scores)
But I'm receiving an accuracy of ~50%. My corpus is 1600 texts.
I think you may want to reduce the TF-IDF matrix before feeding it into the SVM, because SVM is not that good at handling a large sparse matrix. I would suggest using TruncatedSVD to reduce the dimensionality of the TF-IDF matrix.
vectorizer = TfidfVectorizer(min_df=1, binary=0, use_idf=1, smooth_idf=0, sublinear_tf=1)
svd = TruncatedSVD(n_components=20)
pipeline = Pipeline([
    ('tfidf', vectorizer),
    ('svd', svd)])
features = pipeline.fit_transform(corpus)
Of course you need to tune the n_components to find the optimal number of components to keep.
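For example, a hedged sketch (not from the original answer) that puts the vectorizer, the SVD step and the SVM into one pipeline and lets GridSearchCV choose n_components and C by cross-validation; corpus and labels are the question's data, and the grids shown are only illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ('tfidf', TfidfVectorizer(min_df=1, sublinear_tf=True)),
    ('svd', TruncatedSVD()),
    ('svm', SVC(kernel='linear')),
])
param_grid = {
    'svd__n_components': [20, 50, 100, 200],
    'svm__C': [2**p for p in range(-5, 6, 2)],
}
# 5-fold cross-validated accuracy over the grid
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(corpus, labels)
print(search.best_params_, search.best_score_)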
