scikit MLPClassifier couldn't fit xor problem - scikit-learn

My teacher asked us to use scikit-learn's MLPClassifier to solve the XOR problem, with 2 hidden layers (the first with 4 units and the second with 2 units), the identity activation, and lbfgs as the solver:
from sklearn.neural_network import MLPClassifier
x = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
clf = MLPClassifier(hidden_layer_sizes=(4, 2), activation='identity', solver='lbfgs')
clf.fit(x, y)
clf.predict(x)
# output: array([1, 0, 0, 0])
I don't know why it fails to predict correctly; I thought any combination of linear functions could solve a non-linear problem.
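A minimal NumPy sketch of why this happens (it does not use scikit-learn, and the weights are random or hand-picked purely for illustration): with activation='identity' every layer is an affine map, and a composition of affine maps is itself one affine map, so the (4, 2) network can only draw a single linear decision boundary, which cannot separate XOR. The second half hand-builds one ReLU hidden layer that does reproduce XOR; in MLPClassifier the corresponding change would be a non-linear activation such as 'relu' or 'tanh', although whether lbfgs actually finds such a solution may depend on random_state.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# 1) Stacked identity-activation layers collapse into one affine map.
#    Shapes mirror hidden_layer_sizes=(4, 2) plus one output unit;
#    the random weights are only for illustration.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W3, b3 = rng.normal(size=(2, 1)), rng.normal(size=1)

deep = ((X @ W1 + b1) @ W2 + b2) @ W3 + b3   # three linear layers
W = W1 @ W2 @ W3                             # equivalent single weight matrix
b = (b1 @ W2 + b2) @ W3 + b3                 # equivalent single bias
shallow = X @ W + b
print(np.allclose(deep, shallow))            # True: the extra depth adds nothing

# 2) One non-linear (ReLU) hidden layer is enough for XOR.
#    Hand-picked weights: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1),
#    output = h1 - 2 * h2.
def relu(z):
    return np.maximum(z, 0.0)

H = relu(X @ np.array([[1.0, 1.0], [1.0, 1.0]]) + np.array([0.0, -1.0]))
out = H @ np.array([1.0, -2.0])
print(out)                                   # [0. 1. 1. 0.]
```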

Related

How can I perform GridSearchCV but cross validate using multiple validation sets?

I have a training set, training_set, of m observations and n features, and three different validation sets, val_a, val_b, and val_c, which don't leak information to one another.
I would like to perform hyperparameter tuning via HalvingGridSearchCV, fitting models on training_set, validating on each of the three validation sets separately, and then taking the score to be the average of the three scores (or the lowest one).
The reason is that the three validation sets contain observations of the samples at three distinct time points (A, B, C), while the training set contains observations from time point A only. Thus, a model trained on training_set and evaluated on val_a would not necessarily be best for val_b and val_c.
Also, concatenating all of the sets via training_set = pd.concat([training_set, val_a, val_b, val_c]) and then using a variant of GroupShuffleSplit is not ideal, as this leaks information from different time points into the model.
Here's what I've tried so far:
import pandas as pd
from sklearn.model_selection import PredefinedSplit
# Assume each dataset has 4 observations.
tf = [-1] * len(training_set)
training_set = pd.concat([training_set, val_a, val_b, val_c])
tf += [0] * len(val_a) + [1] * len(val_b) + [2] * len(val_c)
print("Test fold:", tf)
pds = PredefinedSplit(test_fold = tf)
# gs = HalvingGridSearchCV(estimator = LGBMRegressor(), param_grid = param_grid, cv = pds, scoring = 'r2', refit = False, min_resources = 'exhaust')
for train_index, test_index in pds.split():
    print("TRAIN:", train_index, "TEST:", test_index)
Output:
Test fold: [-1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
TRAIN: [ 0 1 2 3 8 9 10 11 12 13 14 15] TEST: [4 5 6 7]
TRAIN: [ 0 1 2 3 4 5 6 7 12 13 14 15] TEST: [ 8 9 10 11]
TRAIN: [ 0 1 2 3 4 5 6 7 8 9 10 11] TEST: [12 13 14 15]
As you can see, this generates a 3-fold cross-validation in which each validation set is left out once and included in the training set every other time. I know that -1 keeps observations out of every test set, but there is no value that keeps observations out of every training set.
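One possible workaround, sketched under assumptions: scikit-learn's GridSearchCV accepts `cv` as an iterable of `(train_indices, test_indices)` pairs, so the training rows can be fixed in every split while each split tests on exactly one validation set. The helper `fixed_train_splits`, the toy data and the Ridge model are hypothetical stand-ins for training_set, val_a/b/c and LGBMRegressor. I believe HalvingGridSearchCV (which needs `from sklearn.experimental import enable_halving_search_cv`) accepts the same kind of iterable, but I have not checked how its sample-based resource subsampling interacts with fixed splits.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def fixed_train_splits(n_train, val_sizes):
    """Hypothetical helper: one (train_idx, test_idx) pair per validation set.

    Assumes rows are stacked as [training_set, val_a, val_b, ...], so the
    first n_train rows are always the training rows and each validation
    set occupies the next contiguous block.
    """
    train_idx = np.arange(n_train)
    splits = []
    start = n_train
    for size in val_sizes:
        splits.append((train_idx, np.arange(start, start + size)))
        start += size
    return splits

# Toy stand-ins for training_set, val_a, val_b, val_c (4 rows each).
rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.normal(size=(4, 3)), columns=["f1", "f2", "f3"]) for _ in range(4)]
targets = [pd.Series(rng.normal(size=4)) for _ in range(4)]
X = pd.concat(frames, ignore_index=True)
y = pd.concat(targets, ignore_index=True)

cv = fixed_train_splits(len(frames[0]), [len(f) for f in frames[1:]])
for train_idx, test_idx in cv:
    print("TRAIN:", train_idx, "TEST:", test_idx)

search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]},
                      cv=cv, scoring="r2", refit=False)
search.fit(X, y)

# One row per validation set, one column per candidate.
per_split = np.vstack([search.cv_results_[f"split{i}_test_score"] for i in range(len(cv))])
print("mean over val_a/b/c:", per_split.mean(axis=0))   # same as mean_test_score
print("worst of val_a/b/c:", per_split.min(axis=0))     # the "lowest score" option
```

The mean over the three splits is what GridSearchCV ranks by; the per-split minimum is one way to select on the worst validation set instead.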
Thank you!

Is this a bug in xgboost's XGBClassifier?

import numpy as np
from xgboost import XGBClassifier
model = XGBClassifier(
use_label_encoder=False,
label_lower_bound=0, label_upper_bound=1
# setting the bounds doesn't seem to help
)
x = np.array([ [1,2,3], [4,5,6] ], 'ushort' )
y = [ 1, 1 ]
try:
    model.fit(x, y)
    # this fails with ValueError:
    # "The label must consist of integer labels
    # of form 0, 1, 2, ..., [num_class - 1]."
except Exception as e:
    print(e)
y = [ 0, 0 ]
# this works
model.fit(x,y)
model = XGBClassifier()
y = [ 1, 1 ]
# this works, but with UserWarning:
# "The use of label encoder in XGBClassifier is deprecated, etc."
model.fit(x,y)
It seems to me that the label encoder is deprecated, yet we are FORCED to use it if our class labels don't happen to include a zero.
I had the same problem. I solved it by passing use_label_encoder=False as a parameter, and the warning message disappeared.
I think the problem in your case is that y contains only 1s, whereas XGBoost expects class labels starting from 0. If you change y = [ 1, 1 ] to y = [ 0, 0 ], the UserWarning should disappear.
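A common way to get labels of the form 0, 1, ..., num_class - 1 without relying on XGBoost's built-in encoder is to encode them yourself with scikit-learn's LabelEncoder before calling fit, and map predictions back afterwards. This is a sketch with made-up labels; the XGBoost call is left as a comment so the snippet runs without XGBoost installed, and `fake_predictions` is a hypothetical stand-in for `model.predict(...)`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class labels that do not start at 0.
y = np.array([1, 2, 2, 1, 3])

le = LabelEncoder()
y_encoded = le.fit_transform(y)   # sorted classes [1, 2, 3] -> [0, 1, 2]
print(y_encoded)                  # [0 1 1 0 2]

# model = XGBClassifier(use_label_encoder=False)
# model.fit(x, y_encoded)        # XGBoost now sees 0-based labels

# Predictions come back in encoded form and can be mapped back.
fake_predictions = np.array([0, 2, 1])
print(le.inverse_transform(fake_predictions))   # [1 3 2]
```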

Linear regression issue with categorical variables

I've built a linear regression model predicting recidivism among convicts based on the COMPAS dataset.
I have some issues with the categorical variables, specifically the gender variable.
It was transformed into dummy variables, dropping one of the two binary columns to prevent collinearity.
However, after training, the model seems to give females a higher recidivism score than males.
This does not look correct, since male offenders score higher than females on the independent variables.
Also, the target variable (recidivism score) is lower in the female category than in the male one.
I would expect females to have a lower predicted score.
I get the feeling that there's something wrong with the model.
Can someone please help me out?
See the dataset and code below.
Subset of the data after dummy transformation and data cleansing:
age;priors_count;juv_fel_count;sex_Male;race_Caucasian;race_Asian;race_Hispanic;race_NativeAmerican;race_Other;v_decile_score;event;is_recid;decile_score
69;0;0;1;0;0;0;0;1;1;0;0;1
69;0;0;1;0;0;0;0;1;1;0;0;1
34;0;0;1;0;0;0;0;0;1;1;1;3
24;4;0;1;0;0;0;0;0;3;0;1;4
24;4;0;1;0;0;0;0;0;3;0;1;4
24;4;0;1;0;0;0;0;0;3;0;1;4
24;4;0;1;0;0;0;0;0;3;0;1;4
24;4;0;1;0;0;0;0;0;3;0;1;4
41;14;0;1;1;0;0;0;0;2;0;1;6
41;14;0;1;1;0;0;0;0;2;0;1;6
43;3;0;1;0;0;0;0;1;3;0;0;4
43;3;0;1;0;0;0;0;1;3;0;0;4
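from sklearn import linear_model
import statsmodels.api as sm
# model (df is the cleaned, dummy-encoded DataFrame shown above)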
X = df[[
'age'
,'priors_count'
,'juv_fel_count'
,'sex_Male'
,'race_Caucasian'
,'race_Asian'
,'race_Hispanic'
,'race_Native American'
,'race_Other'
,'v_decile_score'
,'event'
,'is_recid'
]]
Y = df['decile_score']
# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# prediction with sklearn
New_Age = 18
New_Priors_Count = 0
New_Juvenile_Count = 0
New_Sex_Male = 0
# Race variables
# setting all of the race dummies below to 0 means African-American (the dropped baseline category)
New_Race_Caucasian = 0
New_Race_Asian = 0
New_Race_Hispanic = 0
New_Race_Native_American = 0
New_Race_Other = 0
#Violence & Events
New_Violent_Score = 0
New_Event_In_Custody = 0
New_Is_Recid = 0
print('Recidivism Score: \n',
regr.predict(
[[
New_Age
, New_Priors_Count
, New_Juvenile_Count
, New_Sex_Male
, New_Race_Caucasian
, New_Race_Asian
, New_Race_Hispanic
, New_Race_Native_American
, New_Race_Other
, New_Violent_Score
, New_Event_In_Custody
, New_Is_Recid
# , New_Days_In_Jail
]]
))
# with statsmodels
X = sm.add_constant(X) # adding a constant
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print_model = model.summary()
print(print_model)
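One thing worth checking, offered as a possible explanation rather than a diagnosis of this dataset: in a multiple regression, the sex_Male coefficient is a partial effect, i.e. the difference between men and women who have the same values in all the other columns (age, priors_count, v_decile_score, ...). It can therefore have the opposite sign to the raw difference in decile_score between the groups when men also score higher on those other variables. The toy data below is invented only to illustrate this with NumPy's least squares.

```python
import numpy as np

# Hypothetical toy data (no noise): males have more priors than females.
priors = np.array([4.0, 6.0, 0.0, 2.0])
sex_male = np.array([1.0, 1.0, 0.0, 0.0])
# Target constructed as: score = 2 + 1.0 * priors - 0.5 * sex_male
score = 2 + 1.0 * priors - 0.5 * sex_male

# Raw group means: males score higher overall.
print(score[sex_male == 1].mean(), score[sex_male == 0].mean())   # 6.5 3.0

# Multiple regression with an intercept, priors and the sex dummy.
A = np.column_stack([np.ones_like(priors), priors, sex_male])
coef, *_ = np.linalg.lstsq(A, score, rcond=None)
# coef is approximately [2, 1, -0.5]: holding priors fixed,
# the model predicts females 0.5 points higher than males.
print(coef)
```

Comparing the raw group means of decile_score with the fitted sex_Male coefficient (and looking at how the other features differ by sex) is a quick way to see whether this is what is happening in the real data.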

EDITED: Model is not learning the data correctly

I'm studying deep learning.
I'm making a shape classifier for five classes: circle, rectangle, triangle, pentagon and star. The labels are one-hot encoded from label2idx = dict(rectangle=0, circle=1, pentagon=2, star=3, triangle=4).
But the result is the same for every epoch, and the model does not learn anything about the images.
I built the network from Affine (fully connected) layers with ReLU as the activation function and Softmax as the last layer, and I use Adam as the optimizer.
I have 234 RGB training images in total, drawn with the Windows Paint 2D tool; each is 128 x 128, but the figures don't fill the whole canvas.
A sample image looks like this: [image not included]
The training results follow. The left [] is the prediction and the right [] is the true label (I picked random images to print the predicted values and true labels):
epoch: 0.49572649572649574
[ 0.3149641 -0.01454905 -0.23183 -0.2493432 0.11655246] [0 0 0 0 1]
epoch: 0.6837606837606838
[ 1.67341673 0.27887525 -1.09800398 -1.12649948 -0.39533065] [1 0 0 0 0]
epoch: 0.7094017094017094
[ 0.93106499 1.49599772 -0.98549052 -1.20471573 -0.24997779] [0 1 0 0 0]
epoch: 0.7905982905982906
[ 0.48447043 -0.05460748 -0.23526179 -0.22869489 0.05468969] [1 0 0 0 0]
...
epoch: 0.9230769230769231
[14.13835867 0.32432293 -5.01623202 -6.62469261 -3.21594355] [1 0 0 0 0]
epoch: 0.9529914529914529
[ 1.61248239 -0.47768294 -0.41580036 -0.71899219 -0.0901478 ] [1 0 0 0 0]
epoch: 0.9572649572649573
[ 5.93142154 -1.16719891 -1.3656573 -2.19785097 -1.31258801] [1 0 0 0 0]
epoch: 0.9700854700854701
[ 7.42198941 -0.85870225 -2.12027192 -2.81081263 -1.83810873] [1 0 0 0 0]
I think that the more it learns, the more the prediction should look like [ 0.00143 0.09357 0.352 0.3 0.253 ] [ 1 0 0 0 0 ], which means the answer index should be close to 0, but that is not what happens.
Even so, the training accuracy sometimes reaches 1.0 (100%).
I'm loading and normalizing the images with the code below.
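import numpy as np
from glob import glob
from PIL import Image

img_size = 128 * 128  # pixels per 128 x 128 image (assumed from the description above)
data_list = glob('dataset\\training\\*\\*.jpg')

def _load_img():
    data = [np.array(Image.open(v)) for v in data_list]
    a = np.array(data)
    a = a.reshape(-1, img_size * 3)  # flatten each RGB image into one row
    return a

dataset = {}
dataset['train_img'] = _load_img()
# normalize once (the original `for v in dataset:` loop repeated this for every key in dataset)
dataset['train_img'] = dataset['train_img'].astype(np.float32)
dataset['train_img'] /= dataset['train_img'].max()
dataset['train_img'] -= dataset['train_img'].mean(axis=1).reshape(len(dataset['train_img']), 1)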
EDIT
I converted the images to grayscale with Image.open(v).convert('LA')
and checked my predicted values; for example:
[-3.98576886e-04 3.41216374e-05] [1 0]
[ 0.00698861 -0.01111879] [1 0]
[-0.42003415 0.42222863] [0 1]
It is still not learning from the images. To test it, I removed three of the shapes, so now I only have rectangles and triangles, 252 images in total (I drew more images).
The two predicted values are usually close to opposites (e.g. 3.1323, -3.1323 or 3.1323, -3.1303), and I cannot figure out why.
The accuracy does not increase either: when I use SGD as the optimizer, it stays exactly the same.
[ 0.02090227 -0.02085848] [1 0]
epoch: 0.5873015873015873
[ 0.03058879 -0.03086193] [0 1]
epoch: 0.5873015873015873
[ 0.04006064 -0.04004988] [1 0]
[ 0.04545139 -0.04547538] [1 0]
epoch: 0.5873015873015873
[ 0.05605123 -0.05595288] [0 1]
epoch: 0.5873015873015873
[ 0.06495255 -0.06500597] [1 0]
epoch: 0.5873015873015873
Yes. Your model is performing pretty well. The problem is not related to normalization (which is not even a problem here). The printed values fall outside [0, 1] because they are raw scores rather than probabilities, and large values mean the model is really confident.
The model will not try to optimize its raw outputs towards [1,0,0,0], because when it calculates the loss, it first clips the values.
Hope this helps!
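To see how such raw outputs translate into class probabilities, here is a small NumPy sketch that applies a softmax to one line of the training log in the question. It assumes the printed values are the inputs to the final Softmax layer (the question only says Softmax is the last layer), and `softmax` here is a local helper, not the question's own implementation.

```python
import numpy as np

def softmax(z):
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Raw outputs copied from the log above, for an image whose label is [1 0 0 0 0].
logits = [14.13835867, 0.32432293, -5.01623202, -6.62469261, -3.21594355]
probs = softmax(logits)
print(probs.argmax())   # 0, matching the label
print(probs.round(4))   # almost all of the probability mass is on class 0
```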

calculate precision and recall in a confusion matrix

Suppose I have a confusion matrix like the one below. How can I calculate precision and recall?
First, your matrix is arranged upside down.
You want to arrange your labels so that the true positives lie on the diagonal [(0,0),(1,1),(2,2)]; this is the arrangement you will find in confusion matrices generated by sklearn and other packages.
Once we have things sorted in the right direction, we can take a page from this answer and say that:
True positives are on the diagonal.
False positives are the column-wise sums, excluding the diagonal.
False negatives are the row-wise sums, excluding the diagonal.
Then we take the formulas for precision and recall from the sklearn docs.
And put it all into code:
import numpy as np
cm = np.array([[2,1,0], [3,4,5], [6,7,8]])
true_pos = np.diag(cm)
false_pos = np.sum(cm, axis=0) - true_pos
false_neg = np.sum(cm, axis=1) - true_pos
precision = np.sum(true_pos / (true_pos + false_pos))
recall = np.sum(true_pos / (true_pos + false_neg))
Since we remove the true positives to define false_positives/negatives only to add them back... we can simplify further by skipping a couple of steps:
true_pos = np.diag(cm)
precision = np.sum(true_pos / np.sum(cm, axis=0))
recall = np.sum(true_pos / np.sum(cm, axis=1))
I don't think you need the summation at the end. Without it, your method is correct: it gives the precision and recall for each class.
If you want a single averaged precision and recall, you have two options: micro-averaging and macro-averaging.
Read more here http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
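The two averaging options can be sketched directly from the confusion matrix used earlier in this thread (rows = true class, columns = predicted class). Macro-averaging takes the unweighted mean of the per-class scores; micro-averaging pools TP, FP and FN over all classes first, which for single-label multiclass data makes both micro precision and micro recall equal to accuracy.

```python
import numpy as np

# Example matrix from the first answer: rows = true class, columns = predicted class.
cm = np.array([[2, 1, 0], [3, 4, 5], [6, 7, 8]])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp   # column sums without the diagonal
fn = cm.sum(axis=1) - tp   # row sums without the diagonal

# Macro-average: unweighted mean of the per-class scores.
macro_precision = (tp / (tp + fp)).mean()
macro_recall = (tp / (tp + fn)).mean()

# Micro-average: pool the counts over all classes, then divide once.
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())

print(macro_precision, macro_recall)
print(micro_precision, micro_recall)   # both 14/36, i.e. the accuracy
```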
For completeness and future reference: given a list of ground-truth labels (gt) and predictions (pd), the following snippet computes the confusion matrix and then calculates precision and recall.
from sklearn.metrics import confusion_matrix
gt = [1,1,2,2,1,0]
pd = [1,1,1,1,2,0]
cm = confusion_matrix(gt, pd)
#rows = gt, col = pred
#compute tp, tp_and_fn and tp_and_fp w.r.t all classes
tp_and_fn = cm.sum(1)
tp_and_fp = cm.sum(0)
tp = cm.diagonal()
precision = tp / tp_and_fp
recall = tp / tp_and_fn
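As a cross-check, scikit-learn's precision_score and recall_score with average=None return one score per class, which should match the manual computation above. This sketch reuses the same gt and predictions; the predictions variable is renamed to y_pred here (an arbitrary choice) so it does not shadow the usual pandas alias.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

gt = [1, 1, 2, 2, 1, 0]
y_pred = [1, 1, 1, 1, 2, 0]   # same predictions as `pd` above

cm = confusion_matrix(gt, y_pred)   # rows = gt, columns = predictions
tp = cm.diagonal()
manual_precision = tp / cm.sum(axis=0)
manual_recall = tp / cm.sum(axis=1)

# average=None asks scikit-learn for per-class scores instead of one average.
sk_precision = precision_score(gt, y_pred, average=None)
sk_recall = recall_score(gt, y_pred, average=None)

print(manual_precision, sk_precision)   # per-class precision: 1.0, 0.5, 0.0
print(manual_recall, sk_recall)         # per-class recall: 1.0, 2/3, 0.0
```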
Given:
a hypothetical confusion matrix (cm)
cm = [[ 970,    1,   2,   1,   1,   6,  10,   0,   5,   0],
      [   0, 1105,   7,   3,   1,   6,   0,   3,  16,   0],
      [   9,   14, 924,  19,  18,   3,  13,  12,  24,   4],
      [   3,   10,  35, 875,   2,  34,   2,  14,  19,  19],
      [   0,    3,   6,   0, 903,   0,   9,   5,   4,  32],
      [   9,    6,   4,  28,  10, 751,  17,   5,  24,   9],
      [   7,    2,   6,   0,   9,  13, 944,   1,   7,   0],
      [   3,   11,  17,   3,  16,   3,   0, 975,   2,  34],
      [   5,   38,  10,  16,   7,  28,   5,   4, 830,  20],
      [   5,    3,   5,  13,  39,  10,   2,  34,   5, 853]]
Goal:
precision and recall for each class using map() to calculate list division.
from operator import truediv
import numpy as np
tp = np.diag(cm)
prec = list(map(truediv, tp, np.sum(cm, axis=0)))
rec = list(map(truediv, tp, np.sum(cm, axis=1)))
print ('Precision: {}\nRecall: {}'.format(prec, rec))
Result:
Precision: [0.959, 0.926, 0.909, 0.913, 0.896, 0.880, 0.941, 0.925, 0.886, 0.877]
Recall: [0.972, 0.968, 0.888, 0.863, 0.937, 0.870, 0.954, 0.916, 0.861, 0.880]
Please note: 10 classes, so 10 precisions and 10 recalls.
Agreeing with gruangly and EuWern, I modified PabTorre's solution accordingly to generate precision and recall per class.
Also, given my use case (NER) where a model could:
Never predict a class that is present in the input text (i.e. a column of zeros, i.e. TP:0, FP:0, FN: all), causing a nan in the precision array, or
Predict a class that is completely absent in the input text (i.e. a row of zeros, i.e. TP:0, FN:0, FP: all), causing a nan in the recall array...
I wrap the array with numpy.nan_to_num() to convert any nan to zero. This is not a mathematical decision, but a per-use-case, functional decision about how to handle never-predicted or never-occurring classes.
import numpy
confusion_matrix = numpy.array([
[ 5, 0, 0, 0, 0, 3],
[ 0, 2, 0, 1, 0, 5],
[ 0, 0, 0, 3, 5, 7],
[ 0, 0, 0, 9, 0, 0],
[ 0, 0, 0, 9, 32, 3],
[ 0, 0, 0, 0, 0, 0]
])
true_positives = numpy.diag(confusion_matrix)
false_positives = numpy.sum(confusion_matrix, axis=0) - true_positives
false_negatives = numpy.sum(confusion_matrix, axis=1) - true_positives
precision = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_positives)))
recall = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_negatives)))
print(true_positives) # [ 5 2 0 9 32 0 ]
print(false_positives) # [ 0 0 0 13 5 18 ]
print(false_negatives) # [ 3 6 15 0 12 0 ]
print(precision) # [1. 1. 0. 0.40909091 0.86486486 0. ]
print(recall) # [0.625 0.25 0. 1. 0.72727273 0. ]
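A variant of the same idea (not part of the answer above) that avoids the RuntimeWarning numpy.divide emits for 0/0: pass out= and where= so that entries with a zero denominator are written as 0 directly instead of becoming nan first. The helper name safe_divide is hypothetical; the matrix is the one from the answer.

```python
import numpy as np

confusion_matrix = np.array([
    [5, 0, 0, 0, 0, 3],
    [0, 2, 0, 1, 0, 5],
    [0, 0, 0, 3, 5, 7],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 9, 32, 3],
    [0, 0, 0, 0, 0, 0],
])

def safe_divide(num, den):
    # Start from zeros and only divide where the denominator is non-zero,
    # so never-predicted / never-occurring classes get 0.0 without a warning.
    out = np.zeros(num.shape, dtype=float)
    return np.divide(num, den, out=out, where=den != 0)

tp = np.diag(confusion_matrix)
precision = safe_divide(tp, confusion_matrix.sum(axis=0))
recall = safe_divide(tp, confusion_matrix.sum(axis=1))
print(precision)
print(recall)
```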
import numpy as np
n_classes = 3
cm = np.array([[0, 1, 2],
               [5, 4, 3],
               [8, 7, 6]])
sp = []
f1 = []
gm = []
sens = []
acc = []
for c in range(n_classes):
    tp = cm[c, c]
    fp = sum(cm[:, c]) - cm[c, c]
    fn = sum(cm[c, :]) - cm[c, c]
    tn = sum(np.delete(sum(cm) - cm[c, :], c))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    specificity = tn / (tn + fp)
    f1_score = 2 * ((precision * recall) / (precision + recall))
    g_mean = np.sqrt(recall * specificity)
    sp.append(specificity)
    f1.append(f1_score)
    gm.append(g_mean)
    sens.append(recall)
    acc.append(tp)
    print("for class {}: recall {}, specificity {}, precision {}, f1 {}, gmean {}".format(
        c, round(recall, 4), round(specificity, 4), round(precision, 4), round(f1_score, 4), round(g_mean, 4)))
print("sp: ", np.average(sp))
print("f1: ", np.average(f1))
print("gm: ", np.average(gm))
print("sens: ", np.average(sens))
print("accuracy: ", np.sum(acc) / np.sum(cm))
