Why the Accuracy Score is Zero in Sentiment Analysis - python-3.x

The training data contains around 20000 rows with headers: id, sentiment, text
I have mapped the sentiment as follows:
df.sentiment = df.sentiment.map({"Neutral": 1, "Negative": 0, "Positive": 2})
After cleaning and pre-processing the text, I used Logistic Regression as follows:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report

XTR, XTST, YTR, YTST = train_test_split(df.text, df.sentiment, test_size=.2, random_state=100)
lg = LogisticRegression(max_iter=20000)
pp = make_pipeline(TfidfVectorizer(), lg)
pg = {'logisticregression__C': [0.01, 0.1, 1, 10, 100]}
m = GridSearchCV(pp, pg, cv=5)
m.fit(XTR, YTR)
pr = m.predict(XTST)
print(f"Accuracy: {accuracy_score(YTST, pr):.2f}")
print(classification_report(YTST, pr))
The output is as follows:
Accuracy: 0.59
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       686
           1       0.59      1.00      0.74      2374
           2       0.00      0.00      0.00       940

    accuracy                           0.59      4000
   macro avg       0.20      0.33      0.25      4000
weighted avg       0.35      0.59      0.44      4000
Why do I get 0.00 for both Negative (0) and Positive (2)? Any help, please.

This is happening because the logistic regression model is predicting every row as Neutral.
Since all 4000 test samples are predicted as Neutral, the overall accuracy is simply the share of Neutral samples: 2374/4000 = 0.59.
The recall for Positive is 0/940 = 0 and the recall for Negative is 0/686 = 0, which is why their precision, recall and f1-score all show up as 0.00.
Moreover, make sure you take the predictions on the test set, not on the training data:
pr = m.predict(XTST)
print(f"Accuracy: {accuracy_score(YTST, pr):.2f}")
print(classification_report(YTST, pr))
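One quick way to confirm that diagnosis is to print a confusion matrix for the test split; a minimal sketch using the variable names from the question:

from sklearn.metrics import confusion_matrix

# rows are the true classes (0, 1, 2), columns the predicted ones; if the
# model really predicts only Neutral (1), every non-zero count lands in the
# middle column, e.g.
# [[   0  686    0]
#  [   0 2374    0]
#  [   0  940    0]]
print(confusion_matrix(YTST, pr))

If that is the case, a common next step (not part of the original code) is to address the class imbalance, for example by passing class_weight='balanced' to LogisticRegression.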

Related

MLP classifier results in different models using almost identical data

I am simulating soccer predictions using scikit-learn's MLPClassifier. Two model trainings with almost identical data (the second one contains 42 more rows out of 5466 total) and configuration (e.g. random_state) result in the statistics below:
2020-09-19 00:00:00
-------------------------------------------MLPClassifier--------------------------------------------
Fitting 3 folds for each of 27 candidates, totalling 81 fits
GridSearchCV:
Best score : 0.5179227897048015
Best params: {'classifier__alpha': 2.4,
              'classifier__hidden_layer_sizes': [3, 3],
              'preprocessor__num__scaling': StandardScaler(),
              'selector': SelectFromModel(estimator=RandomForestClassifier(n_estimators=10, random_state=42),
                                          threshold='2.1*median'),
              'selector__threshold': '2.1*median'}

              precision    recall  f1-score   support

           A       0.59      0.57      0.58      1550
           D       0.09      0.47      0.15       244
           H       0.82      0.57      0.67      3143

    accuracy                           0.57      4937
   macro avg       0.50      0.54      0.47      4937
weighted avg       0.71      0.57      0.62      4937
2020-09-26 00:00:00
-------------------------------------------MLPClassifier--------------------------------------------
Fitting 3 folds for each of 27 candidates, totalling 81 fits
GridSearchCV:
Best score : 0.5253689104507783
Best params: {'classifier__alpha': 2.4,
              'classifier__hidden_layer_sizes': [3, 3],
              'preprocessor__num__scaling': StandardScaler(),
              'selector': SelectFromModel(estimator=RandomForestClassifier(n_estimators=10, random_state=42),
                                          threshold='1.6*median'),
              'selector__threshold': '1.6*median'}

              precision    recall  f1-score   support

           A       0.62      0.57      0.59      1611
           D       0.00      0.00      0.00         0
           H       0.86      0.57      0.69      3336

    accuracy                           0.57      4947
   macro avg       0.49      0.38      0.43      4947
weighted avg       0.78      0.57      0.66      4947
How is it possible that one model never predicts D, while the other one does? I am trying to understand what's going on here. I am afraid posting the whole problem/code is not possible, so I am looking for a generic answer. I see this behaviour (D's <-> no D's) throughout 38 observations.

Why are non-appearing classes shown in the classification report?

I'm working on NER and using sklearn.metrics.classification_report to calculate the micro and macro F1 scores. It printed a table like:
              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           3     0.0000    0.0000    0.0000         0
           4     0.8788    0.9027    0.8906       257
           5     0.9748    0.9555    0.9650      1617
           6     0.9862    0.9888    0.9875      1156
           7     0.9339    0.9138    0.9237       835
           8     0.8542    0.7593    0.8039       216
           9     0.8945    0.8575    0.8756       702
          10     0.9428    0.9382    0.9405      1668
          11     0.9234    0.9139    0.9186      1661

    accuracy                         0.9285      8112
   macro avg     0.7388    0.7230    0.7305      8112
weighted avg     0.9419    0.9285    0.9350      8112
It's obvious that the predicted labels contain '0' and '3', but there is no '0' or '3' in the true labels. Why does the classification report show these two classes, which don't have any samples? And what can I do to prevent "0-support" classes from being shown? It seems that these two classes have a great impact on the macro F1 score.
You can use the following snippet to ensure that only the labels present in y_true appear in the classification report:
import numpy as np
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]
print(classification_report(y_true, y_pred, labels=np.unique(y_true)))
Which outputs:
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         4

   micro avg       0.60      0.50      0.55         6
   macro avg       0.50      0.50      0.44         6
weighted avg       0.75      0.50      0.56         6
As you can see, the label 42, which appears only in the predictions, is not shown because it has no support in y_true.
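Applied to the report in the question, the same idea could look like the sketch below; y_true and y_pred stand for the NER label arrays from the question, and digits=4 just reproduces the four-decimal formatting shown above:

import numpy as np
from sklearn.metrics import classification_report

# only classes that actually occur in the true labels are reported, so the
# zero-support rows for '0' and '3' disappear and no longer drag down the
# macro-averaged F1
print(classification_report(y_true, y_pred,
                            labels=np.unique(y_true),
                            digits=4))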

Hyperparameter optimization gives worse results

I trained my random forest classifier as follows:
rf = RandomForestClassifier(n_jobs=-1, max_depth=None, max_features="auto",
                            min_samples_leaf=1, min_samples_split=2,
                            n_estimators=1000, oob_score=True, class_weight="balanced",
                            random_state=0)

rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("Confusion matrix")
print(metrics.confusion_matrix(y_test, y_pred))
print("F1-score")
print(metrics.f1_score(y_test, y_pred, average="weighted"))
print("Accuracy")
print(metrics.accuracy_score(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
and got the following results:
Confusion matrix
[[558  42   2   0   1]
 [ 67 399  84   3   2]
 [ 30 135 325  48   7]
 [  5  69  81 361  54]
 [  8  17   7  48 457]]
F1-score
0.7459670332027826
Accuracy
0.7473309608540926
              precision    recall  f1-score   support

           1       0.84      0.93      0.88       603
           2       0.60      0.72      0.66       555
           3       0.65      0.60      0.62       545
           4       0.78      0.63      0.70       570
           5       0.88      0.85      0.86       537
Then I decided to perform hyperparameter optimization in order to improve this result.
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
param_grid = {
    'n_estimators': [1000, 2000],
    'max_features': [0.2, 0.5, 0.7, 'auto'],
    'max_depth': [None, 10],
    'min_samples_leaf': [1, 2, 3, 5],
    'min_samples_split': [0.1, 0.2]
}
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=clf,
                   param_grid=param_grid,
                   cv=k_fold,
                   scoring='accuracy',
                   verbose=True)
clf.fit(X_train, y_train)
But it gave me worse results if I do y_pred = clf.best_estimator_.predict(X_test):
Confusion matrix
[[533  68   0   0   2]
 [145 312  70   0  28]
 [ 58 129 284  35  39]
 [ 21  68  73 287 121]
 [ 32  12   3  36 454]]
F1-score
0.6574507466273805
Accuracy
0.6654804270462633
              precision    recall  f1-score   support

           1       0.68      0.88      0.77       603
           2       0.53      0.56      0.55       555
           3       0.66      0.52      0.58       545
           4       0.80      0.50      0.62       570
           5       0.70      0.85      0.77       537
I assume that it's happening because of scoring='accuracy'. Which score should I use to get the same or better result as my initial random forest?
Defining scoring='accuracy' in your grid search should not be responsible for this difference, because that would be the default scoring for a RandomForestClassifier anyway.
The reason you see an unexpected difference here is that you specified class_weight="balanced" in your first random forest rf, but not in the second classifier clf. As a result, the classes are weighted differently during training, which eventually leads to different performance metrics.
To correct this, just define clf through:
clf = RandomForestClassifier(random_state = 0, n_jobs=-1, class_weight="balanced")
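If you also want the search itself to decide whether class balancing helps, class_weight can be treated as just another hyperparameter. A minimal sketch of that variation, reusing X_train, y_train and the grid values from the question (the GridSearchCV object is renamed to search here to avoid overwriting clf):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

clf = RandomForestClassifier(random_state=0, n_jobs=-1)
param_grid = {
    'n_estimators': [1000, 2000],
    'class_weight': ['balanced', None],  # let the search compare both settings
    # ...plus the remaining parameters from the original grid
}
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(estimator=clf, param_grid=param_grid,
                      cv=k_fold, scoring='accuracy', verbose=True)
search.fit(X_train, y_train)
y_pred = search.best_estimator_.predict(X_test)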

Sample weighting didn't help in imbalanced data training

I am training a two-layer LSTM network with 16 to 32 cells in each layer on a fairly imbalanced dataset. Based on my seven class frequencies, the sample weights calculated through the simple formula total_samples/class_frequency are [3.7, 5.6, 26.4, 3.2, 191.6, 8.4, 13.2], and I append each sample's weight to the (data, label) tuple emitted by my dataset generator before running Keras model.fit(). The training code was:
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
mc = ModelCheckpoint(model_file, monitor='val_acc', mode='max', verbose=1, save_best_only=True)
es = EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=50)
history = model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps,
                    validation_data=val_data, validation_steps=val_steps,
                    verbose=verbose, callbacks=[es, mc])
Then I used the best saved model to evaluate it and calculate performance statistics with this code (my data is in TensorFlow datasets):
saved_model = load_model(model_file)
iterator = test_data.make_one_shot_iterator()
next_element = iterator.get_next()
y_test = y_pred = np.empty(0)
for i in range(test_steps):
    batch = sess.run(next_element)
    x_test_batch = batch[0]
    y_test_batch = batch[1]
    y_pred_batch = saved_model.predict_on_batch(x_test_batch)
    y_test = np.append(y_test, np.argmax(y_test_batch, axis=1))
    y_pred = np.append(y_pred, np.argmax(y_pred_batch, axis=1))
print('\nTest data classification report:\n{}\n'.format(classification_report(y_test, y_pred)))
But what I see in the output statistics is that the weighted stats are overall worse than the unweighted ones (setting all weights equal to 1), even for the rare classes (which have the highest weights). Here are the stats:
For weighted run:
class        prec.   recall   f1        support
0.0          1.00    0.97     0.98        79785
1.0          0.89    0.88     0.88        52614
2.0          0.61    0.76     0.68        11090
3.0          0.96    0.93     0.95        91160
4.0          0.59    0.92     0.72         1530
5.0          0.89    0.90     0.89        34746
6.0          0.81    0.87     0.84        22289
accuracy                      0.92       293214
macro avg    0.82    0.89     0.85       293214
For unweighted run:
class        prec.   recall   f1        support
0.0          0.99    0.98     0.99        79785
1.0          0.89    0.90     0.90        52614
2.0          0.79    0.66     0.72        11090
3.0          0.95    0.96     0.95        91160
4.0          0.85    0.82     0.83         1530
5.0          0.89    0.92     0.90        34746
6.0          0.88    0.86     0.87        22289
accuracy                      0.93       293214
macro avg    0.89    0.87     0.88       293214
What is wrong here?
You should use the class_weight argument of fit (or fit_generator) to apply weights to your classes.
First, create a dictionary in label: weight format:
class_weight = {0: 3.7,
                1: 5.6,
                2: 26.4,
                ...}
Then apply it to your fit function:
history = model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps,
                    validation_data=val_data, validation_steps=val_steps,
                    class_weight=class_weight, verbose=verbose, callbacks=[es, mc])
If you want to apply a weight per instance instead, create an array that contains the weight for each corresponding instance in the training data and pass it as the sample_weight argument of the fit function.
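A rough sketch of that per-instance route, assuming the training data is available as NumPy arrays x_train/y_train with integer labels (and x_val/y_val for validation) rather than the tf.data pipeline used in the question, and reusing the weights listed above:

import numpy as np

# one weight per class, in label order 0..6 (values taken from the question)
class_weights = np.array([3.7, 5.6, 26.4, 3.2, 191.6, 8.4, 13.2])

# look up the weight of each training sample from its integer label
sample_weight = class_weights[y_train]

history = model.fit(x_train, y_train, sample_weight=sample_weight,
                    epochs=epochs, validation_data=(x_val, y_val),
                    callbacks=[es, mc])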

Is it possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in `lifelines` or another package?

I am considering using the lifelines package to fit a Cox proportional hazards model. I read that lifelines uses a nonparametric approach to fit the baseline hazard, which results in different baseline hazards for some time points (see the code example below). For my application, I need an exponential distribution leading to a baseline hazard h0(t) = lambda that is constant across time.
So my question is: is it possible (by now) to run a Cox proportional hazards model with an exponential distribution for the baseline hazard in lifelines or another Python package?
Example code:
from lifelines import CoxPHFitter
import pandas as pd
df = pd.DataFrame({'duration': [4, 6, 5, 5, 4, 6],
                   'event': [0, 0, 0, 1, 1, 1],
                   'cat': [0, 1, 0, 1, 0, 1]})
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event', show_progress=True)
cph.baseline_hazard_
gives
     baseline hazard
T
4.0         0.160573
5.0         0.278119
6.0         0.658032
👋lifelines author here.
So, this model is not natively available in lifelines, but you can easily implement it yourself (and it's maybe something I'll add in a future release). The idea relies on the intersection of proportional hazards models and AFT (accelerated failure time) models. In the Cox PH model with an exponential hazard (i.e. a constant baseline hazard), the hazard looks like:
h(t|x) = lambda_0(t) * exp(beta * x) = lambda_0 * exp(beta * x)
In the AFT specification for an exponential distribution, the hazard looks like:
h(t|x) = exp(-beta * x - beta_0) = exp(-beta * x) * exp(-beta_0) = exp(-beta * x) * lambda_0
Note the negative sign difference!
So instead of doing a CoxPH fit, we can do an exponential AFT fit (and flip the signs if we want the same interpretation as the CoxPH). We can use the custom regression model syntax to do this:
from lifelines.fitters import ParametricRegressionFitter
from autograd import numpy as np

class ExponentialAFTFitter(ParametricRegressionFitter):

    # this is necessary, and should always be a non-empty list of strings.
    _fitted_parameter_names = ['lambda_']

    def _cumulative_hazard(self, params, T, Xs):
        # params is a dictionary that maps unknown parameters to a numpy vector.
        # Xs is a dictionary that maps unknown parameters to a numpy 2d array.
        lambda_ = np.exp(np.dot(Xs['lambda_'], params['lambda_']))
        return T / lambda_
Testing this,
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
rossi['intercept'] = 1
regressors = {'lambda_': rossi.columns}
eaf = ExponentialAFTFitter().fit(rossi, "week", "arrest", regressors=regressors)
eaf.print_summary()
"""
<lifelines.ExponentialAFTFitter: fitted with 432 observations, 318 censored>
event col = 'arrest'
number of subjects = 432
number of events = 114
log-likelihood = -686.37
time fit was run = 2019-06-27 15:13:18 UTC
---
                    coef  exp(coef)  se(coef)     z       p  -log2(p)  lower 0.95  upper 0.95
lambda_ fin         0.37       1.44      0.19  1.92    0.06      4.18       -0.01        0.74
        age         0.06       1.06      0.02  2.55    0.01      6.52        0.01        0.10
        race       -0.30       0.74      0.31 -0.99    0.32      1.63       -0.91        0.30
        wexp        0.15       1.16      0.21  0.69    0.49      1.03       -0.27        0.56
        mar         0.43       1.53      0.38  1.12    0.26      1.93       -0.32        1.17
        paro        0.08       1.09      0.20  0.42    0.67      0.57       -0.30        0.47
        prio       -0.09       0.92      0.03 -3.03  <0.005      8.65       -0.14       -0.03
        _intercept  4.05      57.44      0.59  6.91  <0.005     37.61        2.90        5.20
_fixed  _intercept  0.00       1.00      0.00   nan     nan       nan        0.00        0.00
---
"""
CoxPHFitter().fit(load_rossi(), 'week', 'arrest').print_summary()
"""
<lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
duration col = 'week'
event col = 'arrest'
number of subjects = 432
number of events = 114
partial log-likelihood = -658.75
time fit was run = 2019-06-27 15:17:41 UTC
---
          coef  exp(coef)  se(coef)     z       p  -log2(p)  lower 0.95  upper 0.95
fin      -0.38       0.68      0.19 -1.98    0.05      4.40       -0.75       -0.00
age      -0.06       0.94      0.02 -2.61    0.01      6.79       -0.10       -0.01
race      0.31       1.37      0.31  1.02    0.31      1.70       -0.29        0.92
wexp     -0.15       0.86      0.21 -0.71    0.48      1.06       -0.57        0.27
mar      -0.43       0.65      0.38 -1.14    0.26      1.97       -1.18        0.31
paro     -0.08       0.92      0.20 -0.43    0.66      0.59       -0.47        0.30
prio      0.09       1.10      0.03  3.19  <0.005      9.48        0.04        0.15
---
Concordance = 0.64
Log-likelihood ratio test = 33.27 on 7 df, -log2(p)=15.37
"""
Notice the sign change! So if you want the constant baseline hazard in the model, it's exp(-4.05).
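For reference, a quick numerical check of that constant baseline hazard (a tiny sketch; 4.05 is the fitted _intercept from the ExponentialAFTFitter summary above):

import numpy as np

# under the sign convention discussed above, the constant baseline hazard is exp(-_intercept)
baseline_hazard = np.exp(-4.05)
print(baseline_hazard)  # roughly 0.0174 per week, constant over time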

Resources