I'm doing a classification with cnn. I find confusion_matrix to return a single value. Isn't there a problem?
from sklearn.metrics import classification_report
y_pred=model.predict(X_test)
y_pred=np.argmax(y_pred, axis=1)
y_test=np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
#[[851]]
I would expect something like:
#[[2 0][0 0]]
Related
I have a classification problem for which I want to do classification for class A, B, and C. I try to use naive bayes classifier and the accuracy is 100%, which I really doubt is not true. I have small dataset around 350, among that class A is 140, class B is 140 and rest are class C. Here is the code I used. Can anyone please provide me some suggestions on this?
import sklearn
from sklearn.metrics import accuracy_score
X = feature_data_frame.values
y = label_data
import sklearn.preprocessing as preprocessing
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=.10)
gnb = GaussianNB()
y_pred = gnb.fit(x_train, y_train).predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
Thanks in advance.
I am following this images tutorial for Tensorflow, but I am having trouble setting up a confusion matrix because the tutorial does not follow the X_test, y_test format that traditional examples use:
Ex:
from sklearn.metrics import classification_report, confusion_matrix
y_proba = model.predict(X_test)
y_pred = np.argmax(y_proba,axis=1)
print('Confusion Matrix')
print(confusion_matrix(y_pred, y_test))
print('Classification Report')
print(classification_report(y_pred, y_test))
How can I set up a confusion matrix based on the tensorflow images tutorial?
#NewtoNN
I looked over the tutorial that you provided, but I cannot find the predicted values.
You could try to first predict and then compare the true values with the predicted ones.
Try the following or something similar:
from sklearn.metrics import confusion_matrix
y_proba = model.predict(val_ds)`
y_pred = np.argmax(y_proba,axis=1)`
print('Confusion Matrix')
print(confusion_matrix(y_pred, val_ds))
I am trying to plot a confusion matrix for my classification model given the iris dataset. However, I keep getting an error. I hope someone can guide.Thanks
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn import metrics
from sklearn.metrics import confusion_matrix
def train_and_predict(train_input_features, train_outputs, prediction_features):
classifier=tree.DecisionTreeClassifier()
classifier.fit(train_input_features,train_outputs)
predictions=classifier.predict(prediction_features)
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,test_size=0.3, random_state=0)
y_pred = train_and_predict(X_train, y_train, X_test)
print(confusion_matrix(y_test, predictions))
OUT: NameError: name 'predictions' is not defined
I found out that I needed to paste the code within the function,i.e.:
def train_and_predict(train_input_features, train_outputs, prediction_features):
classifier=tree.DecisionTreeClassifier()
classifier.fit(train_input_features,train_outputs)
predictions=classifier.predict(prediction_features)
print(predictions)
print('Confusion matrix\n',confusion_matrix(y_test,classifier.predict(X_test)))
I want to compare results of my regression analysis with encoded categorical variables with two baseline models where the baseline predictions are specified as the average or min values of the groups. I've chosen Rsquare and MAE for comparison. Below is a simplified example of my code for illustration. It works in that it gives me an output which I think achieves my goal. Is this the correct and/or best way to do this?
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
df = pd.DataFrame([['a1','c1',10],
['a1','c2',15],
['a1','c3',20],
['a1','c1',15],
['a2','c2',20],
['a2','c3',15],
['a2','c1',20],
['a2','c2',15],
['a3','c3',20],
['a3','c3',15],
['a3','c3',15],
['a3','c3',20]], columns=['aid','cid','T'])
df_dummies = pd.get_dummies(df, columns=['aid','cid'],prefix_sep='',prefix='')
df_dummies
X = df_dummies
y = df_dummies['T']
# train test split 80-20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
regr = LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print('R-squared:', metrics.r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
# Baseline model with group average as prediction
y_pred = df.groupby('aid').agg({'T': ['mean']})
print('R-squared:', metrics.r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
# Baseline model with group min as prediction
y_pred = df.groupby('aid').agg({'T': ['min']})
print('R-squared:', metrics.r2_score(y_test, y_pred))
print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
First of all, I would rename y_predall the time to not get confused.
In general:
y_pred = df.groupby('aid').agg({'T': ['mean']})
will give you the mean of the column 'aid'.
And y_pred = df.groupby('aid').agg({'T': ['min']}) will give you the minimum.
There is an interessting package for you: https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html
This is helpful for dummy regression and has also other methods inside.
In your case it should work like this:
df_dummies = pd.get_dummies(df, columns=['aid','cid'],prefix_sep='',prefix='')
X = df_dummies
y = df['T']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
dummy_min=DummyRegressor(strategy='constant',constant=min_value)
dummy_min.fit(X_train,y_train)
I have a labeled and clean dataset for sentiment analysis, and I used logistic regression for classification. Here is my code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
xl = pd.ExcelFile('d:/data.xlsx')
df3 = xl.parse("Sheet1")
cl_data, sent = df3['Clean-Reviews'].fillna(' '), df3['Sentiment']
sent_train, sent_test, y_train, y_test = train_test_split(cl_data, sent,
test_size=0.25, random_state=1000)
vectorizer = CountVectorizer()
vectorizer.fit(sent_train)
X_train = vectorizer.transform(sent_train)
X_test = vectorizer.transform(sent_test)
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
when I try to calculate precision, recall, and F-measure:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
print(f1_score(X_test, y_test, average="macro"))
print(precision_score(X_test, y_test, average="macro"))
print(recall_score(X_test, y_test, average="macro"))
I got an error:
TypeError: len() of unsized object
Can anyone tell what's the problem here? Thanks in Advance
accuracy is measured between predicted and true value, and in your code x_test is not a predicted value. it should be
y_pred = classifier.predict(x_test)
print(f1_score(y_test,y_pred, average="macro"))