calculating confidence while doing classification - statistics

I am using a Naive Bayes algorithm to predict movie ratings as positive or negative. I have been able to rate movies with 81% accuracy. I am, however, trying to assign a 'confidence level' for each of the ratings as well.
I am trying to identify how I can tell the user something like "we think that the review is positive with 80% confidence". Can someone help me understand how I can calculate a confidence level for our classification result?

You can report the probability P(y|x) that Naive Bayes calculates. (But note that Naive Bayes isn't a very good probability model, even if it's not too bad a classifier.)
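For instance, assuming a scikit-learn style setup (the data, vectorizer and classifier below are illustrative, not the asker's actual pipeline), the per-class posterior P(y|x) is available from predict_proba and can be reported directly as the confidence:

# Illustrative sketch: report the Naive Bayes posterior P(y|x) as a "confidence".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_reviews = ["great movie", "terrible plot", "loved it", "awful acting"]  # made-up data
train_labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
clf = MultinomialNB().fit(X_train, train_labels)

X_new = vectorizer.transform(["surprisingly great acting"])
probs = clf.predict_proba(X_new)[0]      # posterior probability for each class
label = clf.classes_[probs.argmax()]
print(f"we think the review is {label} with {probs.max():.0%} confidence")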

Related

F1 score in a multi-label classification where the number of labels in one image is sparse and the number of labels between classes is biased

I use scikit-learn to measure multi-label classification with f-score where labels are imbalanced per image and the number of labels is low per image.
What should I use and why? average = "micro" or "samples"?
Whether your data are biased or unbiased, average='micro' or average='macro' is generally considered the better choice, as it gives you more meaningful results. You can refer to this answer to see why macro is considered good.
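As a rough illustration (made-up label-indicator matrices, not the asker's data), scikit-learn's f1_score lets you compare the averaging modes directly:

# Illustrative multi-label example: rows = images, columns = labels.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

for avg in ["micro", "macro", "samples"]:
    print(avg, f1_score(y_true, y_pred, average=avg))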

Sensitivity vs Positive Predictive Value - which is best?

I am trying to build a model on a class-imbalanced dataset (binary: 25% 1's and 75% 0's). I have tried classification algorithms and ensemble techniques. I am a bit confused about the two concepts below, as I am more interested in predicting more 1's.
1. Should I give preference to Sensitivity or Positive Predictive Value? Some ensemble techniques give at most 45% sensitivity with a low Positive Predictive Value, while others give 62% Positive Predictive Value with low sensitivity.
2. My dataset has around 450K observations and 250 features. After a power test I took 10K observations by simple random sampling. When selecting variable importances with ensemble techniques, the features differ from the ones I get when I try with 150K observations. Based on my intuition and domain knowledge, I feel the features that came up as important in the 150K-observation sample are more relevant. What is the best practice?
3. Lastly, can I use the variable importances generated by RF in other ensemble techniques to predict the accuracy?
Can you please help me out, as I am a bit confused about which way to go?
The preference between Sensitivity and Positive Predictive Value depends on the ultimate goal of your analysis. The difference between these two values is nicely explained here: https://onlinecourses.science.psu.edu/stat507/node/71/
Altogether, these are two measures that look at the results from two different perspectives. Sensitivity gives you the probability that the test finds the "condition" among those who actually have it. Positive Predictive Value gives you the probability that someone who tests positive actually has the "condition", and it depends on the prevalence of the "condition" among those being tested.
Accuracy depends on the outcome of your classification: it is defined as (true positives + true negatives)/(total), not on the variable importances generated by RF.
Also, it is possible to compensate for the imbalance in the dataset, see https://stats.stackexchange.com/questions/264798/random-forest-unbalanced-dataset-for-training-test
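As a small illustration (made-up labels with roughly the 25%/75% split from the question), both quantities can be read off a confusion matrix:

# Sensitivity (recall) and Positive Predictive Value (precision) from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # made-up ground truth
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]   # made-up predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # of all true 1's, how many did we catch?
ppv = tp / (tp + fp)           # of all predicted 1's, how many really are 1?
print(f"sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}")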

Azure Machine Learning - Classification Clarification

Hi,
Can someone please help me understand the results from the model (image) below? I am new to ML and want to check whether my understanding is correct that the model is 66% accurate, not 83%, in terms of prediction.
The metrics have different meanings and both are correct, but if you are wondering which one is more useful for evaluation, you should understand the difference between overall accuracy and average accuracy.
Overall accuracy: the number of correctly predicted items / the total number of items to predict.
Average accuracy: the average of the per-class accuracies (the sum of the accuracy for each predicted class / the number of classes).
You could refer to the two articles, 1 and 2; they will be helpful.
Read my article, which explains the parameters of classification algorithms:
https://social.technet.microsoft.com/wiki/contents/articles/33879.classification-algorithms-parameters-in-azure-ml.aspx
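To make the distinction concrete, here is a small sketch with made-up three-class predictions (not the model in the image); "accuracy per class" is interpreted as one-vs-rest accuracy, which is one common reading rather than the exact Azure ML computation:

# Made-up 3-class example: overall accuracy vs. average per-class accuracy.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 0, 2, 0])

overall = accuracy_score(y_true, y_pred)   # correctly predicted / total

classes = np.unique(y_true)
per_class = [accuracy_score(y_true == c, y_pred == c) for c in classes]  # one-vs-rest accuracy per class
average = np.mean(per_class)               # average of the per-class accuracies

print(f"overall accuracy = {overall:.2f}, average accuracy = {average:.2f}")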

Interpreting results from lightFM

I built a recommendation model on a user-item transactional dataset where each transaction is represented by 1.
from lightfm import LightFM
model = LightFM(learning_rate=0.05, loss='warp')
Here are the results
Train precision at k=3: 0.115301
Test precision at k=3: 0.0209936
Train auc score: 0.978294
Test auc score : 0.810757
Train recall at k=3: 0.238312330233
Test recall at k=3: 0.0621618086561
Can anyone help me interpret these results? How is it that I am getting such a good AUC score and such poor precision/recall? The precision/recall gets even worse with 'bpr' (Bayesian personalized ranking).
Prediction task
import numpy as np

users = [0]
items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
model.predict(users, items)  # predict scores for user 0 over these items
Result
array([-1.45337546, -1.39952552, -1.44265926, -0.83335167, -0.52803332,
-1.06252205, -1.45194077, -0.68543684])
How do I interpret the prediction scores?
Thanks
When it comes to the difference between precision@K and AUC, you may want to have a look at my answer here: Evaluating the LightFM Recommendation Model.
The scores themselves do not have a defined scale and are not interpretable. They only make sense in the context of defining a ranking over items for a given user, with higher scores denoting a stronger predicted preference.
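Concretely, the scores only induce an ordering per user; using the items and scores from the question, a top-k recommendation could be produced like this:

# The raw scores have no absolute scale; sort them to get a ranking for this user.
import numpy as np

items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
scores = np.array([-1.45337546, -1.39952552, -1.44265926, -0.83335167,
                   -0.52803332, -1.06252205, -1.45194077, -0.68543684])

ranking = items[np.argsort(-scores)]   # highest score = strongest predicted preference
print(ranking[:3])                     # e.g. recommend the top 3 items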

information criteria for confusion matrices

One can measure the goodness of fit of a statistical model using the Akaike Information Criterion (AIC), which accounts both for goodness of fit and for the number of parameters used in building the model. AIC involves calculating the maximized value of the likelihood function for that model (L).
How can one compute L, given prediction results of a classification model, represented as a confusion matrix?
It is not possible to calculate the AIC from a confusion matrix since it doesn't contain any information about the likelihood. Depending on the model you are using it may be possible to calculate the likelihood or quasi-likelihood and hence the AIC or QIC.
What is the classification problem that you are working on, and what is your model?
In a classification context, other measures are often used for goodness-of-fit (GoF) testing. I'd recommend reading through The Elements of Statistical Learning by Hastie, Tibshirani and Friedman to get a good overview of this kind of methodology.
Hope this helps.
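If your classifier does output class probabilities, one possible sketch (not tied to any particular model) is to compute the log-likelihood of the observed labels from those predicted probabilities and plug it into AIC = 2k - 2 ln(L), where k is the number of fitted parameters:

# Sketch: AIC from a probabilistic binary classifier's predicted probabilities.
# This needs per-example probabilities, not just the confusion matrix.
import numpy as np

def aic_from_probs(y_true, p_pred, k):
    # Bernoulli log-likelihood of the observed labels under the predicted probabilities
    p = np.clip(p_pred, 1e-12, 1 - 1e-12)
    log_lik = np.sum(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return 2 * k - 2 * log_lik

y_true = np.array([1, 0, 1, 1, 0])              # made-up labels
p_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.1])    # made-up P(y=1|x)
print(aic_from_probs(y_true, p_pred, k=3))      # k=3 is purely illustrative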
Information-Based Evaluation Criterion for Classifier's Performance by Kononenko and Bratko is exactly what I was looking for:
Classification accuracy is usually used as a measure of classification performance. This measure is, however, known to have several defects. A fair evaluation criterion should exclude the influence of the class probabilities which may enable a completely uninformed classifier to trivially achieve high classification accuracy. In this paper a method for evaluating the information score of a classifier's answers is proposed. It excludes the influence of prior probabilities, deals with various types of imperfect or probabilistic answers and can be used also for comparing the performance in different domains.
