I am testing a sentiment analysis model using NLTK. I need to add a confusion matrix to the classifier results and, if possible, precision, recall and F-measure values as well. So far I only have accuracy. The movie_reviews data has pos and neg labels. However, to train the classifier I am using "featuresets", which has a different format from the usual (sentence, label) structure. I am not sure whether I can use confusion_matrix from sklearn after training the classifier on "featuresets".
```python
import nltk
import random
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
# In NLTK 3, FreqDist.keys() is not sorted by frequency; most_common() is
word_features = [w for w, _ in all_words.most_common(3000)]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]
training_set = featuresets[:1900]
testing_set = featuresets[1900:]

classifier = nltk.NaiveBayesClassifier.train(training_set)
print("Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, testing_set))*100)
```
First, classify every test item and store the predicted labels and the gold labels in two lists.
Then you can use nltk.ConfusionMatrix.
```python
test_result = []
gold_result = []

for features, label in testing_set:
    test_result.append(classifier.classify(features))
    gold_result.append(label)
```
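Because classify returns plain label strings, the feature dicts and the gold labels can also be separated with zip and passed to classify_many, which NLTK classifiers provide for classifying a list of featuresets in one call. A minimal sketch on made-up two-feature featuresets (the toy data and the `toy_` names are hypothetical, not from the question):

```python
import nltk

# Hypothetical toy featuresets in the same (feature_dict, label) shape as the question's
toy_train = [({"good": True, "bad": False}, "pos"),
             ({"good": True, "bad": False}, "pos"),
             ({"good": False, "bad": True}, "neg"),
             ({"good": False, "bad": True}, "neg")]
toy_test = [({"good": True, "bad": False}, "pos"),
            ({"good": False, "bad": True}, "neg"),
            ({"good": False, "bad": True}, "pos")]  # deliberately "wrong" gold label

toy_classifier = nltk.NaiveBayesClassifier.train(toy_train)

# Split the (features, label) pairs into two parallel sequences
toy_features, toy_gold = zip(*toy_test)
toy_predicted = toy_classifier.classify_many(toy_features)

# Pass plain lists to ConfusionMatrix (zip produced a tuple for the gold labels)
toy_cm = nltk.ConfusionMatrix(list(toy_gold), toy_predicted)
print(toy_cm)
```

The last toy item looks like a neg review but is labelled pos, so the matrix shows one pos item predicted as neg.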
Now you can calculate different metrics.
```python
from collections import Counter

CM = nltk.ConfusionMatrix(gold_result, test_result)
print(CM)
print("Naive Bayes Algo accuracy percent:" + str((nltk.classify.accuracy(classifier, testing_set))*100) + "\n")

labels = {'pos', 'neg'}

# CM[gold, predicted]: diagonal cells are true positives; an off-diagonal cell is a
# false negative for the gold label and a false positive for the predicted label
TP, FN, FP = Counter(), Counter(), Counter()
for i in labels:
    for j in labels:
        if i == j:
            TP[i] += int(CM[i, j])
        else:
            FN[i] += int(CM[i, j])
            FP[j] += int(CM[i, j])

print("label\tprecision\trecall\tf_measure")
for label in sorted(labels):
    precision, recall = 0, 0
    if TP[label] == 0:
        f_measure = 0
    else:
        precision = float(TP[label]) / (TP[label] + FP[label])
        recall = float(TP[label]) / (TP[label] + FN[label])
        f_measure = float(2) * (precision * recall) / (precision + recall)
    print(label + "\t" + str(precision) + "\t" + str(recall) + "\t" + str(f_measure))
```
The loop above follows the standard definitions of precision and recall.
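For reference, these are the per-label formulas the loop implements. For a given label, TP is its diagonal cell, FN counts items with that gold label that were predicted as something else, and FP counts items with other gold labels that were predicted as that label:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```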
You can also use sklearn.metrics for these calculations, passing it the gold_result and test_result lists:
```python
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

print('\nClassification report:\n', classification_report(gold_result, test_result))
print('\nConfusion matrix:\n', confusion_matrix(gold_result, test_result))
```
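If you need the numbers themselves rather than a printed report, precision_recall_fscore_support returns one array entry per label, and passing `labels=` fixes the row and column order of confusion_matrix. A small sketch with hypothetical toy lists standing in for gold_result and test_result:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

toy_gold = ['pos', 'neg', 'pos', 'neg']  # hypothetical gold labels
toy_pred = ['pos', 'neg', 'neg', 'neg']  # hypothetical predictions

label_order = ['neg', 'pos']  # explicit row/column order
toy_cm = confusion_matrix(toy_gold, toy_pred, labels=label_order)
# Rows are gold labels, columns are predicted labels, both in label_order

precision, recall, f1, support = precision_recall_fscore_support(
    toy_gold, toy_pred, labels=label_order)
for label, p, r, f in zip(label_order, precision, recall, f1):
    print(f"{label}\t{p:.3f}\t{r:.3f}\t{f:.3f}")
```

These values match what the hand-written TP/FP/FN loop produces for the same two lists.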
I have 50 different pairs of training and test sets. No matter which pair I choose, I always get 0.5 as the accuracy, so something is definitely off. I would really appreciate it if you could check that I am using everything correctly. The data consists of vectors that have to be classified as either 1 or 0.
Below is my code:
```python
import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score

train_set = pd.read_csv('C:/Users/.../train_set7.csv', sep=',')
test_set = pd.read_csv('C:/Users/.../test_set7.csv', sep=',')

train_set_values = train_set.iloc[:,0:51]
labels_train = train_set['50']
vects_train = [i for i in train_set_values.values]

test_set_values = test_set.iloc[:,0:51]
labels_test = test_set['50']
vects_test = [i for i in test_set_values.values]

clf = svm.SVC(kernel='linear', C=1000)
clf.fit(vects_train, labels_train)

pred = clf.predict(vects_test)
accuracy_score(labels_test, pred)
# 0.5
clf.score(vects_test, labels_test)
# 0.5
```
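An accuracy of exactly 0.5 on every split is, for example, what you would get if each test set is balanced and the model predicts the same class for every vector. A plain-Python sanity check for that possibility, sketched here (the helper name prediction_report is hypothetical), could be run on labels_test and pred:

```python
from collections import Counter

def prediction_report(y_true, y_pred):
    """Summarise class balance and whether predictions collapsed to one class."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    majority_share = max(true_counts.values()) / len(y_true)
    return {
        "true_counts": dict(true_counts),
        "pred_counts": dict(pred_counts),
        "single_class_predictions": len(pred_counts) == 1,
        # Accuracy of always predicting the most frequent true class
        "majority_baseline": majority_share,
    }

# Hypothetical example: a balanced test set and a model that always predicts 0
report = prediction_report([0, 1, 0, 1], [0, 0, 0, 0])
print(report)
```

With the question's variables this would be `prediction_report(list(labels_test), list(pred))`. If single_class_predictions is True and the accuracy equals majority_baseline, the classifier has not learned to separate the classes.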
```python
import numpy as np
from sklearn.metrics import f1_score

def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')

def accuracy_per_class(preds, labels):
    # label_dict (class name -> integer id) is assumed to be defined elsewhere
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy: {len(y_preds[y_preds==label])}/{len(y_true)}\n')
```
I need to calculate a classification report for a multi-class model, but this code only gives accuracy and F1 score.
I assume you are using PyTorch. Here is code that prints the precision, recall and F1 score for each class in the dataset. If you have a trained model, load it together with the dataset you want to test on.
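```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

val_dataset = LoadDataset('/content/val.csv')  # LoadDataset is your own Dataset class
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=51)  # Load the data

model.load_state_dict(torch.load('vit-base.bin', map_location=device))  # Load the trained model
model.to(device)  # Put the model on the GPU if one is available
model.eval()      # Inference mode for dropout / batch-norm layers

all_preds, all_targets = [], []
with torch.no_grad():
    # Loop over every batch, not just the first one
    for image, target in val_loader:
        image = image.to(device)
        target = target.flatten()
        prediction = model(image).argmax(dim=1).view(target.size())
        all_preds.extend(prediction.cpu().numpy())
        all_targets.extend(target.numpy())

print(classification_report(all_targets, all_preds, target_names=val_dataset.LE.classes_))  # LE is the label encoder
print(confusion_matrix(all_targets, all_preds))
```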
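The per-class "Accuracy" printed by the question's accuracy_per_class is the number of correctly predicted items of each true class divided by that class's size, which is the per-class recall column of classification_report. A minimal NumPy sketch of the same computation, returning a dict keyed by integer class id instead of printing through label_dict (a simplification; the logits below are made up):

```python
import numpy as np

def accuracy_per_class(preds, labels):
    """Fraction of each true class predicted correctly (= per-class recall)."""
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    result = {}
    for label in np.unique(labels_flat):
        mask = labels_flat == label  # items whose true class is `label`
        result[int(label)] = float(np.mean(preds_flat[mask] == label))
    return result

# Hypothetical logits for 4 items and 3 classes; argmax gives [0, 1, 2, 0]
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.2, 3.0],
                   [1.0, 0.5, 0.2]])
true_labels = np.array([0, 1, 2, 1])
per_class = accuracy_per_class(logits, true_labels)
print(per_class)
```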
I'm playing around with scikit-learn's LinearSVC (linear support vector classification), and I'm running into an error when I attempt to make predictions:
```python
import pickle
import string

from nltk import word_tokenize
from nltk.stem import PorterStemmer
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer

# Function to pass the list to the Tf-idf vectorizer
def returnPhrase(inputList):
    return inputList

# Pre-processing the sentence which we input to predict the emotion
def transformSentence(sentence):
    s = []
    sentence = sentence.replace('\n', '')
    sentTokenized = word_tokenize(sentence)
    s.append(sentTokenized)
    sWithoutPunct = []
    punctList = list(string.punctuation)
    curSentList = s[0]
    newSentList = []
    for word in curSentList:
        if word.lower() not in punctList:
            newSentList.append(word.lower())
    sWithoutPunct.append(newSentList)
    mystemmer = PorterStemmer()
    tokenziedStemmed = []
    for i in range(0, len(sWithoutPunct)):
        curList = sWithoutPunct[i]
        newList = []
        for word in curList:
            newList.append(mystemmer.stem(word))
        tokenziedStemmed.append(newList)
    return tokenziedStemmed

# Extracting the features for SVM
myVectorizer = TfidfVectorizer(analyzer='word', tokenizer=returnPhrase, preprocessor=returnPhrase,
                               token_pattern=None,
                               ngram_range=(1, 3))

# The SVM Model
curC = 2  # cost factor in SVM
SVMClassifier = svm.LinearSVC(C=curC)
filename = 'finalized_model.sav'

# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))

# Input sentence
with open('trial_truth_001.txt', 'r') as file:
    sent = file.read().replace('\n', '')
transformedTest = transformSentence(sent)
X_test = myVectorizer.transform(transformedTest).toarray()
Prediction = loaded_model.predict(X_test)

# Printing the predicted emotion
print(Prediction)
```
When I try to make a prediction, I get:
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided
What am I missing here? Presumably it is related to the way I fit and transform the data.
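I think you just have to change the line

```python
X_test = myVectorizer.transform(transformedTest).toarray()
```

to

```python
X_test = myVectorizer.fit_transform(transformedTest).toarray()
```

Note, however, that this only removes the NotFittedError. A vectorizer fitted on the test sentence alone learns a different vocabulary from the one used at training time, so its columns do not line up with what loaded_model expects: LinearSVC will usually reject the input because the number of features differs, and even if the counts happen to match, the predictions are meaningless. The reliable fix is to save the vectorizer that was fitted on the training data together with the model, load it here, and call transform on it.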
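One way to keep the fitted vocabulary with the model is to fit the vectorizer and classifier together in a scikit-learn Pipeline and pickle the whole pipeline. The sketch below assumes you can re-run training; the toy texts and labels are hypothetical, and it uses TfidfVectorizer's default tokenizer instead of the question's returnPhrase/transformSentence preprocessing (a custom tokenizer function also works, but it must be importable when the pickle is loaded):

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data standing in for the real emotion corpus
train_texts = ["i love this film", "what a joyful day", "i hate this film", "what an awful day"]
train_labels = ["joy", "joy", "anger", "anger"]

# Vectorizer and classifier are fitted together, so the vocabulary is stored with the model
pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC(C=2))
pipeline.fit(train_texts, train_labels)

filename = "finalized_model.sav"
with open(filename, "wb") as f:
    pickle.dump(pipeline, f)

# Later, in the prediction script: load the whole pipeline and predict on raw text
with open(filename, "rb") as f:
    loaded_model = pickle.load(f)
prediction = loaded_model.predict(["what a joyful film"])
print(prediction)
```

Because predict on the pipeline calls the stored vectorizer's transform, the prediction script never creates a fresh, unfitted TfidfVectorizer.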
I am using the following code to classify documents into three categories: Sports, Politics and Money. I can see that this code calculates precision, recall and F1, but I can't find a way to use it to test a custom document and predict its label.
```python
from nltk.corpus import stopwords, reuters
from nltk import word_tokenize
from nltk.stem.porter import PorterStemmer
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score

cachedStopWords = stopwords.words("english")

def tokenize(text):
    min_length = 3
    words = map(lambda word: word.lower(), word_tokenize(text))
    words = [word for word in words if word not in cachedStopWords]
    tokens = (list(map(lambda token: PorterStemmer().stem(token), words)))
    p = re.compile('[a-zA-Z]+')
    filtered_tokens = list(filter(lambda token: p.match(token) and len(token) >= min_length, tokens))
    return filtered_tokens

def represent(documents, representer):
    train_docs_id = list(filter(lambda doc: doc.startswith("train"), documents))
    test_docs_id = list(filter(lambda doc: doc.startswith("test"), documents))

    train_docs = [reuters.raw(doc_id) for doc_id in train_docs_id]
    test_docs = [reuters.raw(doc_id) for doc_id in test_docs_id]

    # Learn and transform train documents
    vectorised_train_documents = representer.fit_transform(train_docs)
    vectorised_test_documents = representer.transform(test_docs)

    # Transform multilabel labels
    mlb = MultiLabelBinarizer()
    train_labels = mlb.fit_transform([reuters.categories(doc_id) for doc_id in train_docs_id])
    test_labels = mlb.transform([reuters.categories(doc_id) for doc_id in test_docs_id])

    return (vectorised_train_documents, train_labels, vectorised_test_documents, test_labels)

def evaluate(test_labels, predictions):
    precision = precision_score(test_labels, predictions, average='micro')
    recall = recall_score(test_labels, predictions, average='micro')
    f1 = f1_score(test_labels, predictions, average='micro')
    print("Micro-average quality numbers")
    print("Precision: {:.4f}, Recall: {:.4f}, F1-measure: {:.4f}".format(precision, recall, f1))

    precision = precision_score(test_labels, predictions, average='macro')
    recall = recall_score(test_labels, predictions, average='macro')
    f1 = f1_score(test_labels, predictions, average='macro')
    print("Macro-average quality numbers")
    print("Precision: {:.4f}, Recall: {:.4f}, F1-measure: {:.4f}".format(precision, recall, f1))

documents = reuters.fileids()
candidate = {'representer': TfidfVectorizer(tokenizer=tokenize),
             'estimator': OneVsRestClassifier(LinearSVC(random_state=42))}

train_docs, train_labels, test_docs, test_labels = represent(documents, candidate['representer'])
candidate['estimator'].fit(train_docs, train_labels)
predictions = candidate['estimator'].predict(test_docs)
evaluate(test_labels, predictions)
```
Credits:
https://github.com/miguelmalvarez/reuters-tc/blob/master/notebook/Classification_Reuters.ipynb
You can store your custom documents as text files in a folder, let's say yourfoldername. After that, you can use the code below to train on the Reuters data and predict labels for your text documents. all_labels will contain the list of predicted labels (as tuples) for each document.
```python
import os

# Reuses tokenize() and the imports from the question above
classifier = OneVsRestClassifier(LinearSVC(random_state=42))
vectorizer = TfidfVectorizer(tokenizer=tokenize)

# LOAD AND TRANSFORM TRAINING DOCS
documents = reuters.fileids()
train_docs_id = list(filter(lambda doc: doc.startswith("train"), documents))
train_docs = [reuters.raw(doc_id) for doc_id in train_docs_id]
vectorised_train_documents = vectorizer.fit_transform(train_docs)
mlb = MultiLabelBinarizer()
train_labels = mlb.fit_transform([reuters.categories(doc_id) for doc_id in train_docs_id])

# LEARN CLASSIFICATION MODEL
classifier = classifier.fit(vectorised_train_documents, train_labels)

# LOAD AND TRANSFORM TEST DOCS
documents_yours = os.listdir('yourfoldername')
test_docs_yours = []
for doc_id in documents_yours:
    with open(os.path.join('yourfoldername', doc_id)) as f:
        test_docs_yours.append(f.read())
vectorised_test_documents_yours = vectorizer.transform(test_docs_yours)

# MAKE PREDICTIONS
predictions_yours = classifier.predict(vectorised_test_documents_yours)
all_labels = mlb.inverse_transform(predictions_yours)
print(all_labels)
```
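mlb.inverse_transform maps each row of the 0/1 prediction matrix back to a tuple of category names, using the sorted class order learned in fit_transform. A document for which no one-vs-rest classifier fires comes back as an empty tuple. A small illustration with made-up Reuters-style category names and a hand-written prediction matrix (both hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

toy_mlb = MultiLabelBinarizer()
toy_train_labels = toy_mlb.fit_transform([["earn"], ["acq", "earn"], ["grain"]])
# toy_mlb.classes_ is sorted: acq, earn, grain

# Pretend these rows came from classifier.predict(...)
toy_predictions = np.array([[0, 1, 0],
                            [1, 0, 1],
                            [0, 0, 0]])
toy_all_labels = toy_mlb.inverse_transform(toy_predictions)
print(toy_all_labels)  # one tuple per document; the last document gets no label
```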
This is my first attempt at a random forest; unfortunately, it performs worse than a single decision tree. I have been working on this for a while but have failed to figure out where the problem is. Below are the results of several runs. (Sorry for posting the complete code.)
Sklearn Decision Tree Classifier 0.714285714286
Sklearn Random Forest Classifier 0.714285714286
My home made Random Forest Classifier 0.628571428571
Sklearn Decision Tree Classifier 0.642857142857
Sklearn Random Forest Classifier 0.814285714286
My home made Random Forest Classifier 0.571428571429
Sklearn Decision Tree Classifier 0.757142857143
Sklearn Random Forest Classifier 0.771428571429
My home made Random Forest Classifier 0.585714285714
I use the Sonar, Mines vs. Rocks dataset from the UCI Machine Learning Repository because it has 60 features.
```python
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# section 1: read data, shuffle, change label from string to float
filename = "sonar_all_data.csv"
colnames = ['c'+str(i) for i in range(60)]
colnames.append('type')
df = pd.read_csv(filename, index_col=None, header=None, names=colnames)
df = df.sample(frac=1).reset_index(drop=True)
df['lbl'] = 1.0
df.loc[df['type']=='R', 'lbl'] = 0.0
df.drop('type', axis=1, inplace=True)
df = df.astype(np.float32)  # astype returns a new DataFrame; it does not modify df in place
feature_names = ['c' + str(i) for i in range(60)]
label_name = ['lbl']

# section 2: prep train and test data (get_values() was removed from pandas; use to_numpy())
test_x = df[:70][feature_names].to_numpy()
test_y = df[:70][label_name].to_numpy().ravel()
train_x = df[70:][feature_names].to_numpy()
train_y = df[70:][label_name].to_numpy().ravel()

# section 3: take a look at performance of sklearn decision tree and randomforest
clf = DecisionTreeClassifier()
clf.fit(train_x, train_y)
print("Sklearn Decision Tree Classifier", clf.score(test_x, test_y))
rfclf = RandomForestClassifier(n_jobs=2)
rfclf.fit(train_x, train_y)
print("Sklearn Random Forest Classifier", rfclf.score(test_x, test_y))

# section 4: my first practice of random forest
m = 10
votes = [1/m] * m
num_train = len(train_x)
num_feat = len(train_x[0])
n = int(num_train * 0.6)
k = int(np.sqrt(num_feat))
index_of_train_data = np.arange(num_train)
index_of_train_feat = np.arange(num_feat)
clfs = [DecisionTreeClassifier() for _ in range(m)]
feats = []
for i, xclf in enumerate(clfs):
    np.random.shuffle(index_of_train_data)
    np.random.shuffle(index_of_train_feat)
    row_idx = index_of_train_data[:n]
    # copy(): a slice is a view, and the next in-place shuffle would otherwise
    # overwrite the feature indices stored for this tree
    feat_idx = index_of_train_feat[:k].copy()
    sub_train_x = train_x[row_idx, :][:, feat_idx]
    sub_train_y = train_y[row_idx]
    xclf.fit(sub_train_x, sub_train_y)
    feats.append(feat_idx)

pred = np.zeros(test_y.shape)
for clf, feat, vote in zip(clfs, feats, votes):
    pred += clf.predict(test_x[:, feat]) * vote
pred[pred > 0.5] = 1.0
pred[pred <= 0.5] = 0.0
print("My home made Random Forest Classifier", sum(pred==test_y)/len(test_y))
```
As chrisckwong821 put it, you are overfitting: if the trees in your random forest are grown too deep, they fit the training data too closely and predict new (test) data badly.
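For comparison, scikit-learn's RandomForestClassifier builds each tree on a bootstrap sample (as many rows as the training set, drawn with replacement) and considers a random subset of about sqrt(n_features) features at every split. The homemade version instead fixes 7 of the 60 features once per tree, so each tree only ever sees a small part of the data. The sketch below mimics the sklearn scheme with DecisionTreeClassifier(max_features='sqrt') and exposes max_depth so the depth point above can be explored. It is an illustration under those assumptions, not sklearn's exact implementation, and uses synthetic make_classification data with the same shape as the Sonar set in place of the CSV:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(train_x, train_y, m=10, max_depth=None, seed=0):
    """Bagged trees with per-split feature sampling, similar in spirit to sklearn's forest."""
    rng = np.random.default_rng(seed)
    n = len(train_x)
    trees = []
    for _ in range(m):
        rows = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        tree = DecisionTreeClassifier(max_features='sqrt',  # new feature subset at each split
                                      max_depth=max_depth,
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(train_x[rows], train_y[rows])
        trees.append(tree)
    return trees

def predict_forest(trees, test_x):
    # Average the 0/1 votes; a 5-5 tie goes to class 0, as in the question's code
    votes = np.mean([t.predict(test_x) for t in trees], axis=0)
    return (votes > 0.5).astype(float)

# Synthetic stand-in for the Sonar data: 208 rows, 60 features, binary labels
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
y = y.astype(float)
trees = fit_forest(X[70:], y[70:])
pred = predict_forest(trees, X[:70])
acc = np.mean(pred == y[:70])
print("Sketch forest accuracy", acc)
```

Passing, for example, `max_depth=5` to fit_forest shows how shallower trees change the result on your own split.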