Basically I had to build an application for classifying documents based on the part-of-speech distribution of their vocabulary. The learning algorithm used for the classification problem was ready-made and handed over to me.
Based on the examples I was given, I need to interpret these results (precision, recall, accuracy). Can someone give an opinion on whether these results are good or not?
accuracy = 0.91 ((true positives + true negatives) / all)
f-measure = 0.34
precision = 0.45
recall = 0.33
negative rate = 0.92
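For illustration, here is a minimal Python sketch (the counts are invented, not taken from the figures above) of how a rare positive class lets accuracy stay high while precision, recall and the F-measure stay low:

tp, fp, fn, tn = 33, 40, 67, 860  # invented counts: 1000 documents, only 100 true positives

accuracy  = (tp + tn) / (tp + fp + fn + tn)                # ~0.89
precision = tp / (tp + fp)                                 # ~0.45
recall    = tp / (tp + fn)                                 # 0.33
f_measure = 2 * precision * recall / (precision + recall)  # ~0.38
true_negative_rate = tn / (tn + fp)                        # ~0.96
print(accuracy, precision, recall, f_measure, true_negative_rate)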
In risk management it is common to want to be more conservative in the face of high uncertainty. Is there a way of "adjusting" a linear regression prediction based on its uncertainty (i.e., the standard deviation of the prediction)? For example, if prediction 1 = prediction 2 = 100, but prediction 2 has higher uncertainty, then I'd like to adjust prediction 2 to be smaller than prediction 1, because I'm acknowledging risk and being more conservative.
I assume this is a common problem, but I haven't been able to find anything online for some reason.
Thanks!
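Not from this thread, but one simple illustrative option is to penalize each prediction by some multiple k of its standard deviation, where k is a risk-aversion knob you choose; the numbers below are invented purely to mirror the example in the question:

import numpy as np

predictions = np.array([100.0, 100.0])  # prediction 1 and prediction 2
pred_std    = np.array([5.0, 20.0])     # prediction 2 is more uncertain

k = 1.0  # larger k = more conservative
risk_adjusted = predictions - k * pred_std
print(risk_adjusted)  # [95. 80.] -> the more uncertain prediction is shrunk more

Fitting a quantile regression to a low quantile is another common way to get conservative predictions directly.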
I would like a recommendation.
I have two classes in my data. This is what the class distribution looks like:
0.0 169072
1.0 84944
In other words, I have a 2:1 class distribution.
I believe I have two choices: downsample class 0.0 or upsample class 1.0. If I go with option 1, I'm losing data. If I go with option 2, I'm using non-real data.
Is there a way I can train the model without upsampling or downsampling?
This is what my classification_report looks like:
              precision    recall  f1-score   support

         0.0       0.68      1.00      0.81     51683
         1.0       1.00      0.00      0.00     24522

    accuracy                           0.68     76205
   macro avg       0.84      0.50      0.40     76205
weighted avg       0.78      0.68      0.55     76205
Your data is slightly imbalanced, yes, but that does not mean you only have those two options (under- or over-sampling your data). You can leave the data as is and apply cost-sensitive training in your model. For example, since your classes are in a 2:1 ratio, you give a weight of 2 to your minority class. In an XGBoost classifier, this argument is called scale_pos_weight. See more in this excellent tutorial.
Regarding model evaluation, use a classification report to get a full picture of your model's true and false predictions (precision and recall are your two best friends in this process!).
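As a rough sketch (assuming xgboost and scikit-learn are installed; the synthetic data below just stands in for your 2:1 split), cost-sensitive training plus a classification report could look like this:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

# synthetic stand-in for a roughly 2:1 imbalanced binary problem
X, y = make_classification(n_samples=20000, weights=[0.67, 0.33], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# scale_pos_weight ~= (# negative samples) / (# positive samples), here roughly 2
spw = (y_train == 0).sum() / (y_train == 1).sum()
clf = XGBClassifier(scale_pos_weight=spw)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))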
I would not recommend either approach.
I'm thinking about models that detect fraud. By definition, fraud should be a small percentage of outcomes, on the order of 1-5%. Changing that percentage for training would be a gross distortion of the problem being solved.
Better to leave the proportions as they are.
Make sure that your train, validation, and test data sets all have ratios that reflect the real problem.
Adjust your target metric instead; don't go for accuracy. A naive model that always predicts the 0 outcome will be correct two-thirds of the time. You want your model to do better than that, or better than a weighted coin flip.
I'd recommend using recall as your criterion for success.
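As a minimal sketch of the baseline to beat (scikit-learn's DummyClassifier plays the naive model; the data here is synthetic with roughly your 2:1 ratio):

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

X, y = make_classification(n_samples=9000, weights=[0.67, 0.33], random_state=1)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

print("baseline accuracy:", accuracy_score(y, baseline.predict(X)))  # about 0.67
print("baseline recall  :", recall_score(y, baseline.predict(X)))    # 0.0 on the minority class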
I am training a deep neural net using Keras. One of the reported scores is called val_acc, and I get a val_acc of about 70%. How do I know if this is good or bad? The neural net is a binary classifier, so I am trying to predict a 1 or a 0. The data itself is about 65% 0s and 35% 1s. Is my 70% val_acc any good?
Accuracy is not always the right metric for evaluating a classifier. For example, it may be more important for you to classify the 1s correctly than the 0s (as in fraud detection), or the other way around, so you may be interested in a classifier with higher precision or higher recall (sensitivity). In other words, false positives may be more expensive for you than false negatives, or vice versa. If you have some idea of the costs of misclassification (e.g. for FPs and FNs), then you can compute the specific decision threshold that is optimal for 0-1 classification, instead of the default 0.5. You can also use ROC curves and AUC to assess the performance of your classifier (the higher the AUC, the better). Finally, you may want to consider the kappa statistic to gauge how useful/effective your classifier is.
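A minimal sketch of these ideas (the labels, scores and misclassification costs below are made up for illustration):

import numpy as np
from sklearn.metrics import roc_auc_score, cohen_kappa_score

y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.7, 0.45, 0.6, 0.8, 0.9])

print("AUC:", roc_auc_score(y_true, y_score))

# If false negatives cost 5x as much as false positives, pick the threshold
# minimizing expected cost instead of the default 0.5.
cost_fp, cost_fn = 1, 5
thresholds = np.linspace(0, 1, 101)
costs = [((y_score >= t) & (y_true == 0)).sum() * cost_fp +
         ((y_score < t) & (y_true == 1)).sum() * cost_fn for t in thresholds]
best_t = thresholds[int(np.argmin(costs))]

print("cost-optimal threshold:", best_t)
print("kappa at that threshold:", cohen_kappa_score(y_true, (y_score >= best_t).astype(int)))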
I'm designing a text classifier in Python using NLTK. One of the features considered for each sentence is its sentiment. I want to weight sentences with either positive or negative sentiment more than those without any sentiment (neutral sentences). Using the movie review corpus along with the Naive Bayes classifier results in only positive and negative labels. I tried using demo_liu_hu_lexicon in nltk.sentiment.utils, but the function does not return any values; instead it prints them to the output, and it is very slow. Does anyone know of a library which gives some sort of weight to sentences based on sentiment?
Thanks!
Try the textblob module:
from textblob import TextBlob

text = '''
These laptops are horrible but I've seen worse. How about lunch today? The food was okay.
'''

blob = TextBlob(text)
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# -0.7
# 0.0
# 0.5
TextBlob builds on the nltk and pattern libraries to determine the polarity, which is a float measure ranging from -1 to 1 for the sentiment. Neutral sentences have zero polarity. You should be able to get a similar measure directly from nltk (for example via its VADER sentiment analyzer).
VADER is a rule-based sentiment analysis tool that works well for social media texts as well as regular texts.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd

analyser = SentimentIntensityAnalyzer()

def print_sentiment_scores(tweets):
    vadersenti = analyser.polarity_scores(tweets)
    return pd.Series([vadersenti['pos'], vadersenti['neg'], vadersenti['neu'], vadersenti['compound']])

text = 'This goes beyond party lines. Separating families betrays our values as Texans, Americans and fellow human beings'
print_sentiment_scores(text)
The results are:
0    0.2470    (pos)
1    0.0000    (neg)
2    0.7530    (neu)
3    0.5067    (compound)
The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.
Although positive sentiment is conventionally assigned when the compound score is >= 0.05, you always have the option of deciding whether a sentence is positive, negative, or neutral by changing these thresholds.
I personally find that VADER picks up sentiment from emotion words, special characters, and emojis very well.
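A small sketch of turning the compound score into labels with adjustable cutoffs (the +/-0.05 defaults follow the conventional VADER thresholds mentioned above; the helper name is mine):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()

def label_sentence(sentence, pos_cut=0.05, neg_cut=-0.05):
    # compound is VADER's normalized score in [-1, 1]
    compound = analyser.polarity_scores(sentence)['compound']
    if compound >= pos_cut:
        return 'positive', compound
    if compound <= neg_cut:
        return 'negative', compound
    return 'neutral', compound

print(label_sentence("The food was okay."))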
I'm currently evaluating a bag-of-words texture classifier which outputs binary results:
true positives (TP)
true negatives (TN)
false positives (FP)
false negatives (FN)
I'm looking to calculate the accuracy, but am not sure I'm assigning true negatives correctly.
I'm currently working with 8 classes and assign 7 true negatives each time there is a true positive, and 6 true negatives and a false negative each time there is a false positive.
I wasn't sure if I should instead add one to the true negatives only when there is a true positive.
This still seems to give overly high results, for example for these counts:
TP: 20
FP: 10
TN: 20
FN: 10
Accuracy: 0.66
When assigning true negatives the way I originally did, it's even higher. Shouldn't accuracy be 50% when only half the results are correct, or is this normal?
Also, do you think this is the best metric to measure classifier performance, or is there something more advanced?
Thanks
From what I've read, the method I was using at first was correct, although standard accuracy (overall accuracy) is not necessarily the best way to evaluate a classifier.
Precision and recall are widely used, as they capture both type 1 and type 2 errors. When a single combined metric is needed, the F1 score is typically used; it is the harmonic mean of precision and recall and can be calculated with this formula:
F1 = 2 * (precision * recall) / (precision + recall)
Other options, such as ROC curves (generated from the true positive rate (TPR) and false positive rate (FPR)), are also used, although not necessarily for multi-class systems. To get a single metric from these, the area under the curve (AUC) is taken, which largely represents the classifier's predictive ability. Again, however, this is not widely used for multi-class systems.
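For what it's worth, a minimal sketch (labels invented) of letting scikit-learn do the one-vs-rest bookkeeping for an 8-class problem, so the per-class TP/FP/FN/TN counts and the derived precision/recall/F1 never have to be tallied by hand:

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix, classification_report

rng = np.random.default_rng(0)
y_true = rng.integers(0, 8, size=60)
# corrupt roughly 30% of the labels to simulate classifier mistakes
y_pred = np.where(rng.random(60) < 0.7, y_true, rng.integers(0, 8, size=60))

# one 2x2 matrix per class, laid out [[TN, FP], [FN, TP]]
for cls, m in enumerate(multilabel_confusion_matrix(y_true, y_pred, labels=range(8))):
    (tn, fp), (fn, tp) = m
    print(f"class {cls}: TP={tp} FP={fp} FN={fn} TN={tn}")

print(classification_report(y_true, y_pred))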