Why are non-appearing classes shown in the classification report? - scikit-learn

I'm working on NER and using sklearn.metrics.classification_report to calculate micro and macro F1 scores. It printed a table like:
              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           3     0.0000    0.0000    0.0000         0
           4     0.8788    0.9027    0.8906       257
           5     0.9748    0.9555    0.9650      1617
           6     0.9862    0.9888    0.9875      1156
           7     0.9339    0.9138    0.9237       835
           8     0.8542    0.7593    0.8039       216
           9     0.8945    0.8575    0.8756       702
          10     0.9428    0.9382    0.9405      1668
          11     0.9234    0.9139    0.9186      1661

    accuracy                         0.9285      8112
   macro avg     0.7388    0.7230    0.7305      8112
weighted avg     0.9419    0.9285    0.9350      8112
Obviously the predicted labels contain '0' and '3', but there is no '0' or '3' in the true labels. Why does the classification report show these two classes even though they have no samples, and how can I prevent zero-support classes from being shown? These two classes seem to have a large impact on the macro F1 score.

You can use the following snippet to ensure that the classification report only includes labels that are present in y_true:
import numpy as np
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]
print(classification_report(y_true, y_pred, labels=np.unique(y_true)))
which outputs:
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.50      0.67         4

   micro avg       0.60      0.50      0.55         6
   macro avg       0.50      0.50      0.44         6
weighted avg       0.75      0.50      0.56         6
As you can see, label 42, which appears only in the predictions, is not shown because it has no support in y_true.
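If the macro F1 score itself is what you need rather than the printed report, the same labels argument works for f1_score as well; a minimal sketch reusing the toy data above:
import numpy as np
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1, 42]

# Macro F1 over every label seen in y_true or y_pred (the spurious 42 drags it down)
print(f1_score(y_true, y_pred, average='macro'))

# Macro F1 restricted to the labels that actually occur in y_true
print(f1_score(y_true, y_pred, labels=np.unique(y_true), average='macro'))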

Related

Convert txt tables to a Python dictionary

I have multiple text files which contain some tables; most of the tables are of these two types. I want a way to convert these tables into Python dictionaries.
                         precision    recall  f1-score   support
BASE_RENT_ANNUAL              0.53      0.57      0.55      1408
BASE_RENT_MONTHLY             0.65      0.54      0.59      3904
BASE_RENT_PSF                 0.68      0.59      0.63      1248
RENT_INCREMENT_MONTHLY        0.63      0.44      0.52      7530
SECURITY_DEPOSIT_AMOUNT       0.88      0.89      0.88      3557
micro avg                     0.69      0.58      0.63     17647
macro avg                     0.67      0.61      0.63     17647
weighted avg                  0.68      0.58      0.62     17647
Hard Evaluation Metrics
--------------------------------------------------
Reading predictions from /mnt/c/Users/Aleksandra/mlbuddy/python/bilstm/training/test_predictions.txt...
Nb tokens in test set: 957800
Reading training data from /mnt/c/Users/Aleksandra/mlbuddy/python/bilstm/corpus/train.txt...
Nb tokens in training set: 211153
Strict mode: OFF
----------------------------------------------------------------------
Test tokens         Nb tokens   Nb words   Nb errors   Token error rate
----------------------------------------------------------------------
all                    957800       5408       39333             0.0411
----------------------------------------------------------------------
unseen-I                  704         19         704             1.0000
unseen-O                59870       1724       10208             0.1705
unseen-all              60574       1743       10912             0.1801
----------------------------------------------------------------------
diff-I                  13952         70       13952             1.0000
diff-O                   5285        121        4645             0.8789
diff-etype                  0          0           0             0.0000
diff-all                19237        191       18597             0.9667
----------------------------------------------------------------------
all-unseen+diff         79811       1934       29509             0.3697
----------------------------------------------------------------------
Avg TER on unseen and diff: 0.5734
I have tried this in my code to convert the second table to a dictionary, but it is not working as expected:
from itertools import dropwhile, takewhile

with open("idm.txt") as f:
    dp = dropwhile(lambda x: not x.startswith("-"), f)
    next(dp)                  # skip ----
    names = next(dp).split()  # get header names
    next(f)                   # skip -----
    out = []
    for line in takewhile(lambda x: not x.startswith("-"), f):
        a, b = line.rsplit(None, 1)
        out.append(dict(zip(names, a.split(None, 7) + [b])))
Expected output:
{'BASE_RENT_ANNUAL': {'precision': 0.53, 'recall': 0.57, 'f1-score': 0.55, 'support': 1408},
 'BASE_RENT_MONTHLY': {...},
 ...}
Not the same approach, but the following could be a starting point for your actual solution:
txt = '''                        precision    recall  f1-score   support
BASE_RENT_ANNUAL              0.53      0.57      0.55      1408
BASE_RENT_MONTHLY             0.65      0.54      0.59      3904
BASE_RENT_PSF                 0.68      0.59      0.63      1248
RENT_INCREMENT_MONTHLY        0.63      0.44      0.52      7530
SECURITY_DEPOSIT_AMOUNT       0.88      0.89      0.88      3557
micro avg                     0.69      0.58      0.63     17647
macro avg                     0.67      0.61      0.63     17647
weighted avg                  0.68      0.58      0.62     17647
Hard Evaluation Metrics
--------------------------------------------------
Reading predictions from /mnt/c/Users/Aleksandra/mlbuddy/python/bilstm/training/test_predictions.txt...
Nb tokens in test set: 957800
Reading training data from /mnt/c/Users/Aleksandra/mlbuddy/python/bilstm/corpus/train.txt...
Nb tokens in training set: 211153
Strict mode: OFF
----------------------------------------------------------------------
Test tokens         Nb tokens   Nb words   Nb errors   Token error rate
----------------------------------------------------------------------
all                    957800       5408       39333             0.0411
----------------------------------------------------------------------
unseen-I                  704         19         704             1.0000
unseen-O                59870       1724       10208             0.1705
unseen-all              60574       1743       10912             0.1801
----------------------------------------------------------------------
diff-I                  13952         70       13952             1.0000
diff-O                   5285        121        4645             0.8789
diff-etype                  0          0           0             0.0000
diff-all                19237        191       18597             0.9667
----------------------------------------------------------------------
all-unseen+diff         79811       1934       29509             0.3697
----------------------------------------------------------------------
Avg TER on unseen and diff: 0.5734'''
lst1 = [x.split() for x in txt.split('\n') if x]
lst2 = [(x[0],x[1:]) for x in lst1 if (not x[0].startswith('-') and x[0] == x[0].upper())]
dico = dict(lst2)
dico2 = {}
for k in dico:
    dico2[k] = {'precision': dico[k][0], 'recall': dico[k][1], 'f1-score': dico[k][2], 'support': dico[k][3]}
print(dico2)
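Note that the values in dico2 are still strings; if numeric values are needed, a small follow-up (assuming the dico2 built above) can cast them:
# Cast the string metrics to numeric types (floats for the scores, int for support).
dico3 = {label: {'precision': float(v['precision']),
                 'recall': float(v['recall']),
                 'f1-score': float(v['f1-score']),
                 'support': int(v['support'])}
         for label, v in dico2.items()}
print(dico3)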

MLP classifier results in different models using almost identical data

I am simulating soccer predictions using scikit-learn's MLPClassifier. Two model trainings using almost identical data (the second one contains 42 more rows out of 5466 total) and an identical configuration (e.g. random_state) result in the statistics below:
2020-09-19 00:00:00
-------------------------------------------MLPClassifier--------------------------------------------
Fitting 3 folds for each of 27 candidates, totalling 81 fits
GridSearchCV:
Best score : 0.5179227897048015
Best params: {'classifier__alpha': 2.4, 'classifier__hidden_layer_sizes': [3, 3],
              'preprocessor__num__scaling': StandardScaler(),
              'selector': SelectFromModel(estimator=RandomForestClassifier(n_estimators=10, random_state=42),
                                          threshold='2.1*median'),
              'selector__threshold': '2.1*median'}

              precision    recall  f1-score   support

           A       0.59      0.57      0.58      1550
           D       0.09      0.47      0.15       244
           H       0.82      0.57      0.67      3143

    accuracy                           0.57      4937
   macro avg       0.50      0.54      0.47      4937
weighted avg       0.71      0.57      0.62      4937
2020-09-26 00:00:00
-------------------------------------------MLPClassifier--------------------------------------------
Fitting 3 folds for each of 27 candidates, totalling 81 fits
GridSearchCV:
Best score : 0.5253689104507783
Best params: {'classifier__alpha': 2.4, 'classifier__hidden_layer_sizes': [3, 3],
              'preprocessor__num__scaling': StandardScaler(),
              'selector': SelectFromModel(estimator=RandomForestClassifier(n_estimators=10, random_state=42),
                                          threshold='1.6*median'),
              'selector__threshold': '1.6*median'}

              precision    recall  f1-score   support

           A       0.62      0.57      0.59      1611
           D       0.00      0.00      0.00         0
           H       0.86      0.57      0.69      3336

    accuracy                           0.57      4947
   macro avg       0.49      0.38      0.43      4947
weighted avg       0.78      0.57      0.66      4947
How is it possible that one model never predicts D while the other one does? I am trying to understand what's going on here. I'm afraid posting the whole problem/code is not possible, so I am looking for a generic answer. I see this behaviour (D's <-> no D's) throughout 38 observations.
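A quick way to confirm that a fitted model never predicts a given class is to count its predicted labels; a minimal sketch, assuming a fitted GridSearchCV object m and held-out features X_test (both names are illustrative):
import pandas as pd

# Distribution of predicted outcomes; a class the model never predicts
# (e.g. 'D' in the second run) simply does not appear in the counts.
pred = m.predict(X_test)
print(pd.Series(pred).value_counts())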

Why the Accuracy Score is Zero in Sentiment Analysis

The training data contains around 20000 rows with headers: id, sentiment, text
I have mapped the sentiment as follows:
df.sentiment= df.sentiment.map({"Neutral": 1, "Negative":0, "Positive":2 })
After cleaning and text pre-processing, I used logistic regression as follows:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report

XTR, XTST, YTR, YTST = train_test_split(df.text, df.sentiment, test_size=.2, random_state=100)
lg = LogisticRegression(max_iter=20000)
pipe = make_pipeline(TfidfVectorizer(), lg)
pg = {'logisticregression__C': [0.01, 0.1, 1, 10, 100]}
m = GridSearchCV(pipe, pg, cv=5)
m.fit(XTR, YTR)
pr = m.predict(XTST)
print(f"Accuracy: {accuracy_score(YTST, pr):.2f}")
print(classification_report(YTST, pr))
The output is as follows:
Accuracy 0.59
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       686
           1       0.59      1.00      0.74      2374
           2       0.00      0.00      0.00       940

    accuracy                           0.59      4000
   macro avg       0.20      0.33      0.25      4000
weighted avg       0.35      0.59      0.44      4000
Why do I get 0.00 for both Negative (0) and Positive (2)? Any help, please.
This is happening because the logistic regression model is predicting every row as Neutral.
So Accuracy of Neutral = 2374/4000 = 0.59
Accuracy of Positive = 0/4000 = 0
Accuracy of Negative = 0/4000 = 0
Moreover, make sure you take the predictions on the test set (X_test), not on the training set:
pr = m.predict(XTST)
print(f"Accuracy: {accuracy_score(YTST, pr):.2f}")
print(classification_report(YTST, pr))
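A quick way to confirm that every test row really is predicted as Neutral is to count the predicted labels; a minimal sketch, assuming pr holds the test-set predictions from above:
import numpy as np

# Count how often each class is predicted; if the model has collapsed to the
# majority class, only label 1 (Neutral) will appear here.
labels, counts = np.unique(pr, return_counts=True)
print(dict(zip(labels, counts)))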

Extract specific value in pandas dataframe based on column condition

I am faced with a small problem, the solution of which is certainly very simple, but I cannot find how to do it.
Let's say I have the following pandas dataframe df:
import pandas as pd
X = [0.78, 0.82, 1.03, 1.06, 1.21]
Y = [0.0, 0.2521, 0.4905, 0.5003, 1.0]
df = pd.DataFrame({'X':X, 'Y':Y})
df
      X       Y
0  0.78  0.0000
1  0.82  0.2521
2  1.03  0.4905
3  1.06  0.5003
4  1.21  1.0000
I want to recover the first value of X for which Y exceeds 0.5; in other words, I am looking for a piece of code that creates a new variable val such that:
print (val)
1.06
I can only come up with complicated things, in this style:
df['Z'] = df.apply(lambda row: 0 if row.Y <= 0.5 else 1, axis = 1)
df
      X       Y  Z
0  0.78  0.0000  0
1  0.82  0.2521  0
2  1.03  0.4905  0
3  1.06  0.5003  1
4  1.21  1.0000  1
But this only shows me where the X value I want is (the first appearance of 1 in Z); it doesn't extract that value.
How could I do that in a simple way?
We can check with idxmax; notice that it will need to have at least one value less than 0.5.
df.loc[df.Y.gt(0.5).idxmax(),'Z']=1
df.Z.fillna(0,inplace=True)
df
      X       Y    Z
0  0.78  0.0000  0.0
1  0.82  0.2521  0.0
2  1.03  0.4905  0.0
3  1.06  0.5003  1.0
4  1.21  1.0000  0.0
If you would like a separate dataframe:
df1=df.loc[df.Y.gt(0.5)]
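If all that's needed is the single value val rather than a flag column, the same idxmax idea can pull it out directly; a minimal sketch using the df defined above:
# Index of the first row where Y exceeds 0.5, then look up X at that row.
val = df.loc[df.Y.gt(0.5).idxmax(), 'X']
print(val)  # 1.06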

Is it possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in `lifelines` or another package?

I am considering using the lifelines package to fit a Cox proportional hazards model. I read that lifelines uses a nonparametric approach to fit the baseline hazard, which results in different baseline hazards for different time points (see the code example below). For my application I need an exponential distribution, i.e. a baseline hazard h0(t) = lambda that is constant across time.
So my question is: is it (in the meantime) possible to run a Cox proportional hazards model with an exponential distribution for the baseline hazard in lifelines or another Python package?
Example code:
from lifelines import CoxPHFitter
import pandas as pd

df = pd.DataFrame({'duration': [4, 6, 5, 5, 4, 6],
                   'event': [0, 0, 0, 1, 1, 1],
                   'cat': [0, 1, 0, 1, 0, 1]})
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event', show_progress=True)
cph.baseline_hazard_
gives
     baseline hazard
T
4.0         0.160573
5.0         0.278119
6.0         0.658032
👋lifelines author here.
So, this model is not natively available in lifelines, but you can easily implement it yourself (and it's maybe something I'll do for a future release). The idea relies on the intersection of proportional hazards models and AFT (accelerated failure time) models. In the Cox PH model with an exponential hazard (i.e. a constant baseline hazard), the hazard looks like:
h(t|x) = lambda_0(t) * exp(beta * x) = lambda_0 * exp(beta * x)
In the AFT specification for an exponential distribution, the hazard looks like:
h(t|x) = exp(-beta * x - beta_0) = exp(-beta * x) * exp(-beta_0) = exp(-beta * x) * lambda_0
Note the negative sign difference!
So instead of fitting a Cox PH model, we can do an exponential AFT fit (and flip the signs if we want the same interpretation as the Cox PH model). We can use the custom regression model syntax to do this:
from lifelines.fitters import ParametricRegressionFitter
from autograd import numpy as np


class ExponentialAFTFitter(ParametricRegressionFitter):

    # this is necessary, and should always be a non-empty list of strings.
    _fitted_parameter_names = ['lambda_']

    def _cumulative_hazard(self, params, T, Xs):
        # params is a dictionary that maps unknown parameters to a numpy vector.
        # Xs is a dictionary that maps unknown parameters to a numpy 2d array.
        lambda_ = np.exp(np.dot(Xs['lambda_'], params['lambda_']))
        return T / lambda_
Testing this,
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
rossi['intercept'] = 1
regressors = {'lambda_': rossi.columns}
eaf = ExponentialAFTFitter().fit(rossi, "week", "arrest", regressors=regressors)
eaf.print_summary()
"""
<lifelines.ExponentialAFTFitter: fitted with 432 observations, 318 censored>
event col = 'arrest'
number of subjects = 432
number of events = 114
log-likelihood = -686.37
time fit was run = 2019-06-27 15:13:18 UTC
---
                      coef  exp(coef)  se(coef)     z       p  -log2(p)  lower 0.95  upper 0.95
lambda_ fin           0.37       1.44      0.19  1.92    0.06      4.18       -0.01        0.74
        age           0.06       1.06      0.02  2.55    0.01      6.52        0.01        0.10
        race         -0.30       0.74      0.31 -0.99    0.32      1.63       -0.91        0.30
        wexp          0.15       1.16      0.21  0.69    0.49      1.03       -0.27        0.56
        mar           0.43       1.53      0.38  1.12    0.26      1.93       -0.32        1.17
        paro          0.08       1.09      0.20  0.42    0.67      0.57       -0.30        0.47
        prio         -0.09       0.92      0.03 -3.03  <0.005      8.65       -0.14       -0.03
        _intercept    4.05      57.44      0.59  6.91  <0.005     37.61        2.90        5.20
_fixed  _intercept    0.00       1.00      0.00   nan     nan       nan        0.00        0.00
---
"""
CoxPHFitter().fit(load_rossi(), 'week', 'arrest').print_summary()
"""
<lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
duration col = 'week'
event col = 'arrest'
number of subjects = 432
number of events = 114
partial log-likelihood = -658.75
time fit was run = 2019-06-27 15:17:41 UTC
---
      coef  exp(coef)  se(coef)     z       p  -log2(p)  lower 0.95  upper 0.95
fin  -0.38       0.68      0.19 -1.98    0.05      4.40       -0.75       -0.00
age  -0.06       0.94      0.02 -2.61    0.01      6.79       -0.10       -0.01
race  0.31       1.37      0.31  1.02    0.31      1.70       -0.29        0.92
wexp -0.15       0.86      0.21 -0.71    0.48      1.06       -0.57        0.27
mar  -0.43       0.65      0.38 -1.14    0.26      1.97       -1.18        0.31
paro -0.08       0.92      0.20 -0.43    0.66      0.59       -0.47        0.30
prio  0.09       1.10      0.03  3.19  <0.005      9.48        0.04        0.15
---
Concordance = 0.64
Log-likelihood ratio test = 33.27 on 7 df, -log2(p)=15.37
"""
Notice the sign change! So if you want the constant baseline hazard in the model, it's exp(-4.05).
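As a quick numeric check of that last remark, here is a small sketch of the constant baseline hazard implied by the fitted intercept of 4.05 from the summary above:
import numpy as np

# h(t|x) = exp(-beta * x) * exp(-_intercept), so the constant baseline hazard is exp(-_intercept).
lambda_0 = np.exp(-4.05)
print(lambda_0)  # roughly 0.017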
