How to use ModelCheckpoint() in Keras with weighted validation loss - keras

I train a DNN in Keras which has high imbalanced classes. So I used class_weight in fit_generator to correct this. Now I want to save the model with the lowest weighted validation loss using the ModelCheckpoint() function. I am trying but I can't figure out the way to achieve this. Would any one have a simple example?

ModelCheckpoint("checkpoint.hdf5", monitor='val_loss', mode = 'min', verbose=1, save_best_only = True)
model.fit_genetor(....)
I think you are asking for this piece of code.

Related

Does it make sense to use scikit-learn cross_val_predict() to (i)make predictions with unseen data in k-fold cross-validation and (ii)compare models?

I'm training and evaluating a logistic regression and a XGBoost classifier.
With the XGBoost classifier, a training/validation/test split of the data and the subsequent training and validation shows the model is overfitting the training data. So, I'm working with k-fold cross-validation to reduce overfitting.
To work with k-fold cross-validation, I'm splitting my data into training and test sets and performing the k-fold cross-validation on the training set. The code looks something like the following:
model = XGBClassifier()
kfold = StratifiedKFold(n_splits = 10)
results = cross_val_score(model, x_train, y_train, cv = kfold)
The code works. Now, I've read several forums and blogs on how to make predictions after a k-fold cross-validation, but after these readings, I'm still not sure about the proper way of doing the predictions.
It would seem that using the cross_val_predict() method from sklearn.model_selection and using the test set is OK. The code would look something like the following:
y_pred = cross_val_predict(model, x_test, y_test, cv = kfold)
The code works, but the issue is whether this makes sense since I've seen more complicated ways of doing so and where it doesn't seem clear whether the training or the test set should be used for the predictions.
And if this makes sense, computing the accuracy score and the confusion matrix would be as simple as running something like the following:
accuracy = metrics.accuracy_score(y_test, y_pred)
cm = metrics.confusion_matrix(y_test, y_pred)
These two would help compare the logistic regression and the XGBoost classifier. Does this way of making predictions and evaluating models make sense?
Any help is appreciated! Thanks!
I want to answer this question I posted myself by summarizing things I have read and tried.
First, I want to clarify that the idea behind splitting my data into training/test sets and performing the k-fold cross-validation on the training set is to reserve the test set for providing a generalization error in much the same way we split data into training/validation/test sets and use the test set for providing a generalization error. For the sake of clarity, let me split the discussion into 2 sections.
Section 1
Now, reading more stuff, it's clearer to me cross_val_predict() returns the predictions that were obtained during the cross-validation when the elements were in a test set (see section 3.1.1.2 in this scikit-learn cross-validation doc). This test set refers to one of the test sets the cross-validation procedure internally creates (cross-validation creates a test set in each fold). Thus:
y_pred = cross_val_predict(model, x_train, y_train, cv = kfold)
returns the predictions from the cross-validation internal test sets. It then seems safe to obtain the accuracy and confusion matrix with:
accuracy = metrics.accuracy_score(y_train, y_pred)
cm = metrics.confusion_matrix(y_train, y_pred)
While cross_val_predict(model, x_test, y_test, cv = kfold) runs, it seems doing this doesn't make much sense.
Section 2
From some blogs that talk about creating a confusion matrix after a cross-validation procedure (see here and here), I borrowed code that, for each fold of the cross-validation, extracts the labels and predictions from the internal test set. These labels and predictions are later used to compute the confusion matrix. Assuming I store the labels and predictions in variables called actual_classes and predicted_classes, respectively, I then run:
accuracy = metrics.accuracy_score(actual_classes, predicted_classes)
cm = metrics.confusion_matrix(actual_classes, predicted_classes)
The results are exactly the same as the ones from Section 1's equivalent code. This reinforces that cross_val_predict(model, x_train, y_train, cv = kfold) works fine.
Thus:
Does it make sense to use scikit-learn cross_val_predict() to make
predictions with unseen data in k-fold cross-validation? I would say
No, it doesn't since cross_val_predict() makes predictions with
the internal test sets from the cross-validation procedure. It
seems that to make predictions with unseen data and compute a
generalization error we would need a way to extract one of the
models from the cross-validation procedure (e.g., see this
question)
Does it make sense to use scikit-learn cross_val_predict() to
compare models? I would say Yes, it does as long as the method is
executed as shown in Section 1. The accuracy and confusion matrix
could be used to make comparisons against other models.
Any comment is appreciated! Thanks!

Why accuracy train not constant in MLPClassifier

I am beginner in neural network. and I'm trying to do text classification using MLPClassifier. I'm confused by the results of the accuracy of the train which changes every time I re-run it. The code I use is as follows:
classifier2 = MLPClassifier(activation='logistic',
batch_size=32,hidden_layer_sizes=(200),learning_rate='constant',learning_rate_init=0.01,
max_iter=100,random_state=None, solver='adam', verbose=2, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
n_iter_no_change=10, early_stopping=True, warm_start=True)
classifier2 = classifier2.fit(Train_X1_Tfidf, Train_Y1)
classifier2.score(Train_X1_Tfidf, Train_Y1)
although the difference is not significant, as far as I have tried, the biggest difference in accuracy is only around 3%. is there any explanation about this? Thank you if someone wants to help explain.

LSTM Autoencoder for Anomaly detection in time series, correct way to fit model

I'm trying to find correct examples of using LSTM Autoencoder for defining anomalies in time series data in internet and see a lot of examples, where LSTM Autoencoder model are fitted with labels, which are future time steps for feature sequences (as for usual time series forecasting with LSTM), but I suppose, that this kind of model should be trained with labels which are the same sequence as sequence of features (previous time steps).
The first link in the google by this searching for example - https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9
1.This function defines the way to get labels (y feature)
def create_sequences(X, **y**, time_steps=TIME_STEPS):
Xs, ys = [], []
for i in range(len(X)-time_steps):
Xs.append(X.iloc[i:(i+time_steps)].values)
ys.append(y.iloc**[i+time_steps]**)
return np.array(Xs), np.array(ys)
X_train, **y_train** = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])
2.Model is fitted as follow
history = model.fit(X_train, **y_train**, epochs=100, batch_size=32, validation_split=0.1,
callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')], shuffle=False)
Could you kindly comment the way how Autoencoder is implemented in the link on towardsdatascience.com/?
Is it correct method or model should be fitted following way ?
model.fit(X_train,X_train)
Thanks in advance!
This is time series auto-encoder. If you want to predict for future, it goes this way. The auto-encoder / machine learning model fitting is different for different problems and their solutions. You cannot train and fit one model / workflow for all problems. Time-series / time lapse can be what we already collected data for time period and predict, it can be for data collected and future prediction. Both are differently constructed. Like time series data for sub surface earth is differently modeled, and for weather forecast is differently. One model cannot work for both.
By definition an autoencoder is any model attempting at reproducing it's input, independent of the type of architecture (LSTM, CNN,...).
Framed this way it is a unspervised task so the training would be : model.fit(X_train,X_train)
Now, what she does in the article you linked, is to use a common architecture for LSTM autoencoder but applied to timeseries forecasting:
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(X_train.shape[2])))
She's pre-processing the data in a way to get X_train = [x(t-seq)....x(t)] and y_train = x(t+1)
for i in range(len(X)-time_steps):
Xs.append(X.iloc[i:(i+time_steps)].values)
ys.append(y.iloc[i+time_steps])
So the model does not per-se reproduce the input it's fed, but it doesn't mean it's not a valid implementation since it produce valuable prediction.

How to implement weighted cross entropy loss in Keras?

I would like to know how to add in custom weights for the loss function in a binary or multiclass classifier in Keras. I am using binary_crossentropy or sparse_categorical_crossentropy as the baseline and I want to be able to choose what weight to give incorrect predictions for each class.
For multiple classes one should use not binary but categorical crossentropy.
Consider using custom loss function as described here: Custom loss function in Keras

Imbalanced dataset using MLP classifier in python

I am dealing with imbalanced dataset and I try to make a predictive model using MLP classifier. Unfortunately the algorithm classifies all the observations from test set to class "1" and hence the f1 score and recall values in classification report are 0. Does anyone know how to deal with it?
model= MLPClassifier(solver='lbfgs', activation='tanh')
model.fit(X_train, y_train)
score=accuracy_score(y_test, model.predict(X_test), )
fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:,1])
roc=roc_auc_score(y_test, model.predict_proba(X_test)[:,1])
cr=classification_report(y_test, model.predict(X_test))
There are few techniques to handle the imbalanced dataset. A fully dedicated python library "imbalanced-learn" is available here. But one should be cautious about which technique should be used in a specific case.
Few interesting examples are also available at https://svds.com/learning-imbalanced-classes/

Resources