Why is training accuracy not constant in MLPClassifier - scikit-learn

I am a beginner in neural networks, and I'm trying to do text classification using MLPClassifier. I'm confused because the training accuracy changes every time I re-run the code. The code I use is as follows:
from sklearn.neural_network import MLPClassifier

classifier2 = MLPClassifier(activation='logistic', batch_size=32,
                            hidden_layer_sizes=(200,), learning_rate='constant',
                            learning_rate_init=0.01, max_iter=100, random_state=None,
                            solver='adam', verbose=2, beta_1=0.9, beta_2=0.999,
                            epsilon=1e-8, n_iter_no_change=10, early_stopping=True,
                            warm_start=True)
classifier2 = classifier2.fit(Train_X1_Tfidf, Train_Y1)
classifier2.score(Train_X1_Tfidf, Train_Y1)
Although the difference is not significant (the biggest difference in accuracy I have seen so far is only around 3%), is there any explanation for this? Thank you if someone is willing to help explain.
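For reference, the snippet above leaves random_state=None, so weight initialization, mini-batch shuffling, and the internal early-stopping validation split differ on each run; a minimal sketch of a fixed-seed run, assuming the same Train_X1_Tfidf / Train_Y1 as above:

from sklearn.neural_network import MLPClassifier

# With a fixed random_state, repeated fits start from the same weights,
# shuffle batches the same way, and use the same early-stopping split,
# so the training score is identical on every re-run.
clf = MLPClassifier(activation='logistic', batch_size=32, hidden_layer_sizes=(200,),
                    learning_rate_init=0.01, max_iter=100, solver='adam',
                    early_stopping=True, random_state=42)
clf.fit(Train_X1_Tfidf, Train_Y1)
print(clf.score(Train_X1_Tfidf, Train_Y1))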

Related

What does this loss curve tell me about the data?

Currently I am working with LSTM and GRU models.
I applied them to a multivariate time series problem.
from keras.models import Sequential
from keras.layers import LSTM, Dense
from matplotlib import pyplot

reset_random_seeds()  # user-defined helper that fixes the random seeds

# design network (weekly data)
model = Sequential()
# input_shape is (time steps, features)
model.add(LSTM(64, activation="relu", dropout=0.2, input_shape=(1, 19)))
model.add(Dense(20))
model.add(Dense(1))
model.compile(loss=root_mean_squared_error, optimizer='adam')  # custom RMSE loss

# fit network
history = model.fit(train_X, train_y, epochs=300, batch_size=258,
                    validation_data=(test_X, test_y), verbose=2, shuffle=False)

# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
[plot: training vs. validation loss]
The validation loss looks like this and the results are pretty good too, but I wonder what this graph means.
Results:
[plots: prediction results]
First of all, the loss graph tells you how high the accumulated error over the examples in your training or validation dataset is. In other words, the loss value indicates how well or poorly your model is doing after each training iteration.
Your loss graph shows that you have a much higher training loss than validation loss. This is a good indicator that your model is at least not overfitting. On the other hand, it can also be an indication that your validation dataset is very small or not very representative; for example, it could be heavily skewed.
So we would have to look into your datasets a little more deeply to fully interpret your loss graphs. If you can provide them, I would be happy to have a look.
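For example, a quick sanity check along those lines could be to compare the target distributions of the training and validation splits; a minimal sketch, assuming train_y and test_y are the 1-D target arrays passed to model.fit above:

import numpy as np
from matplotlib import pyplot

# Compare summary statistics of the two splits; large gaps suggest the
# validation set is skewed relative to the training set.
for name, y in [("train", train_y), ("validation", test_y)]:
    print(f"{name}: n={len(y)}, mean={np.mean(y):.3f}, std={np.std(y):.3f}")

# Overlay normalized histograms of the targets for a visual check.
pyplot.hist(train_y, bins=50, alpha=0.5, density=True, label="train")
pyplot.hist(test_y, bins=50, alpha=0.5, density=True, label="validation")
pyplot.legend()
pyplot.show()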

After a few epochs, the difference between validation loss and training loss increases

I'm trying to train the model on the MagnaTagAtune dataset. Is the model being trained properly? Does anyone know what the problem is? Will waiting (training longer) solve the problem?
The results are shown in the image.
[plot: training and validation loss]
Thank you, pseudo_random_here, for your answer. Your tips were helpful, but the problem is still there.
Unfortunately, changing the learning rate did not work. Following your advice, I am now using the SGD optimizer with a learning rate of 0.1. I even tried another model built for this task, but the problem was not solved.
from keras.optimizers import SGD

opt = SGD(lr=0.1)  # in newer Keras versions the argument is learning_rate
model.compile(loss="categorical_crossentropy", optimizer=opt)
Short answer: I would say your val_loss is too high, and waiting is unlikely to solve your problem.
Explanation: I believe there are two possibilities here:
Your architecture is not suitable for the data
Your learning rate is too small
PS: It would help a lot if you provided info on what NN architecture you are using, what loss function we are looking at, and what exactly you are predicting.
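As an illustration of the learning-rate point, one common way to probe it is to let a callback shrink the rate when the validation loss stalls; a minimal sketch, assuming a compiled Keras model and X_train / y_train arrays (hypothetical names, since the actual data pipeline is not shown):

from keras.callbacks import ReduceLROnPlateau, EarlyStopping

# Halve the learning rate whenever val_loss stops improving, and stop
# training early if it keeps drifting away from the training loss.
callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, verbose=1),
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]

history = model.fit(X_train, y_train, validation_split=0.1,
                    epochs=100, batch_size=32, callbacks=callbacks)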

LSTM Autoencoder for Anomaly detection in time series, correct way to fit model

I'm trying to find correct examples of using an LSTM autoencoder for detecting anomalies in time series data. On the internet I see a lot of examples where the LSTM autoencoder model is fitted with labels that are the future time steps of the feature sequences (as in ordinary time series forecasting with an LSTM), but I suppose this kind of model should be trained with labels that are the same sequence as the feature sequence (the previous time steps).
The first Google result for this search, for example: https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9
1. This function defines how the labels (y) are built:
def create_sequences(X, y, time_steps=TIME_STEPS):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X.iloc[i:(i + time_steps)].values)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)

X_train, y_train = create_sequences(train[['Close']], train['Close'])
X_test, y_test = create_sequences(test[['Close']], test['Close'])
2. The model is fitted as follows:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1,
                    callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, mode='min')],
                    shuffle=False)
Could you kindly comment on the way the autoencoder is implemented in the linked towardsdatascience.com article?
Is that method correct, or should the model be fitted the following way?
model.fit(X_train,X_train)
Thanks in advance!
This is a time series autoencoder. If you want to predict the future, it goes that way. Autoencoder / machine learning model fitting differs from problem to problem and from solution to solution; you cannot train and fit one model / workflow for all problems. A time series / time-lapse task can mean predicting from data we have already collected over a time period, or collecting data and predicting future values, and the two are constructed differently. For example, time series data for the sub-surface of the earth is modelled differently from data for weather forecasting; one model cannot work for both.
By definition, an autoencoder is any model that attempts to reproduce its input, independent of the type of architecture (LSTM, CNN, ...).
Framed this way, it is an unsupervised task, so the training would be: model.fit(X_train, X_train)
Now, what she does in the article you linked is use a common LSTM autoencoder architecture but apply it to time series forecasting:
model.add(LSTM(128, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(RepeatVector(X_train.shape[1]))
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(X_train.shape[2])))
She's pre-processing the data so as to get X_train = [x(t-seq), ..., x(t)] and y_train = x(t+1):
for i in range(len(X) - time_steps):
    Xs.append(X.iloc[i:(i + time_steps)].values)
    ys.append(y.iloc[i + time_steps])
So the model does not, per se, reproduce the input it is fed, but that doesn't mean it's not a valid implementation, since it produces valuable predictions.
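For comparison, a minimal sketch of the reconstruction-style fit discussed above, where the same windowed sequences serve as both input and target; the layer sizes mirror the snippet above, but this is an illustration rather than the article's code:

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

timesteps, n_features = X_train.shape[1], X_train.shape[2]

# Encoder compresses each window into a single vector; the decoder then
# tries to reproduce the window step by step.
model = Sequential([
    LSTM(128, input_shape=(timesteps, n_features)),
    RepeatVector(timesteps),
    LSTM(128, return_sequences=True),
    TimeDistributed(Dense(n_features)),
])
model.compile(loss="mae", optimizer="adam")

# Reconstruction objective: the input is also the target.
model.fit(X_train, X_train, epochs=100, batch_size=32, validation_split=0.1)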

How to use ModelCheckpoint() in Keras with weighted validation loss

I am training a DNN in Keras on data with highly imbalanced classes, so I used class_weight in fit_generator to correct for this. Now I want to save the model with the lowest weighted validation loss using the ModelCheckpoint() callback. I have been trying, but I can't figure out how to achieve this. Would anyone have a simple example?
checkpoint = ModelCheckpoint("checkpoint.hdf5", monitor='val_loss', mode='min',
                             verbose=1, save_best_only=True)
model.fit_generator(...., callbacks=[checkpoint])
I think you are asking for this piece of code.
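A slightly fuller sketch of the same idea, wiring the callback into the generator-based training; train_gen, val_gen, the step counts, and class_weights are placeholder names for your own objects:

from keras.callbacks import ModelCheckpoint

# Save only the weights that achieve the lowest validation loss so far.
checkpoint = ModelCheckpoint("checkpoint.hdf5", monitor="val_loss", mode="min",
                             verbose=1, save_best_only=True)

model.fit_generator(train_gen,
                    steps_per_epoch=train_steps,
                    epochs=50,
                    validation_data=val_gen,
                    validation_steps=val_steps,
                    class_weight=class_weights,   # per-class weights for the imbalance
                    callbacks=[checkpoint])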

Overfitting after one epoch

I am training a model using Keras.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(units=300, input_shape=(timestep, 103), use_bias=True,
               dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

while True:
    history = model.fit_generator(
        generator=data_generator(x_[train_indices], y_[train_indices],
                                 batch=batch, timestep=timestep),
        steps_per_epoch=int(train_indices.shape[0] / batch),
        epochs=1,
        verbose=1,
        validation_steps=int(validation_indices.shape[0] / batch),
        validation_data=data_generator(x_[validation_indices], y_[validation_indices],
                                       batch=batch, timestep=timestep))
It is a multioutput classification task according to the scikit-learn.org definition:
"Multioutput regression assigns each sample a set of target values. This can be thought of as predicting several properties for each data-point, such as wind direction and magnitude at a certain location."
Since it is a recurrent neural network, I tried out different timestep sizes, but the result/problem is mostly the same.
After one epoch, my training loss is around 0.0X and my validation loss is around 0.6X, and these values stay stable for the next 10 epochs.
The dataset is around 680,000 rows; 9/10 is training data and 1/10 is validation data.
I am asking for intuition behind this:
Is my model already overfitted after just one epoch?
Is 0.6xx even a good value for a validation loss?
High-level question:
Since it is a multioutput classification task (not multi-class), I see sigmoid with binary_crossentropy as the only option. Do you suggest another approach?
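(For reference, a small sketch of the multi-hot target representation this setup implies; the label indices below are made up for illustration:)

import numpy as np

# Multi-label targets: each row is a multi-hot vector over 536 possible labels,
# so several outputs can be 1 at the same time. This is what a Dense(536)
# sigmoid output layer with binary_crossentropy expects.
y_example = np.zeros((4, 536), dtype="float32")
y_example[0, [3, 17, 250]] = 1.0   # sample 0 has three active labels
y_example[1, [42]] = 1.0           # sample 1 has one active label

print(y_example.sum(axis=1))  # number of active labels per sample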
I've experienced this issue and found that the learning rate and batch size have a huge impact on the learning process. In my case, I did two things (a short sketch follows the lists below):
Reduce the learning rate (try 0.00005)
Reduce the batch size (8, 16, 32)
Moreover, you can try the basic steps for preventing overfitting:
Reduce the complexity of your model
Increase the training data and balance the number of samples per class.
Add more regularization (Dropout, BatchNorm)
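As referenced above, a minimal sketch of what a lower learning rate plus extra regularization could look like on a model shaped like the one in the question; the exact values are starting points under the assumption of the same timestep/feature setup, not tuned settings:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, BatchNormalization, Activation
from keras.optimizers import Adam

model = Sequential()
model.add(LSTM(units=150, input_shape=(timestep, 103),   # smaller layer
               dropout=0.3, recurrent_dropout=0.3))      # more dropout
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))

# Lower learning rate; also try a smaller batch size (e.g. 8, 16, 32) in fit.
model.compile(loss="binary_crossentropy",
              optimizer=Adam(lr=0.00005),
              metrics=["accuracy"])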
