What's the effect of monitor in Keras?

In Keras, we use ModelCheckpoint to save our trained models. The Keras documentation explains it only as "monitor: quantity to monitor.", but I still don't understand it. What's the effect of monitor in the machine learning process?
keras.callbacks.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)

https://keras.io/callbacks/
From the Keras documentation, here is an explanation of the ModelCheckpoint parameters. The callback is used to save your best model while training: after training for a number of epochs the model may start to diverge, show poor performance, or overfit. More epochs do not always mean better performance, so it is better to keep saving the weights during training.
save_best_only: if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten. In other words, the decision to save the model is made based on the value of the quantity being monitored.
mode: one of {auto, min, max}. If save_best_only=True, the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For val_acc, this should be max, for val_loss this should be min, etc. In auto mode, the direction is automatically inferred from the name of the monitored quantity.
The mode is determined by your monitored metric: if it is a loss, the mode must be min; if it is something like accuracy or F1 score, the mode must be max. (You want to save the weights that give the lowest loss or the best accuracy so far.)
verbose: verbosity mode, 0 or 1. verbose determines how much information gets printed about the checkpointing (0 means nothing is printed, 1 means some information is printed).
Other parameters should be very easy to understand.
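To make the effect of monitor concrete, here is a minimal, self-contained sketch (toy data and model; the file name best_model.h5 is just a placeholder) where a checkpoint is written only when the monitored quantity, val_loss, improves:

import numpy as np
from tensorflow import keras

# Toy data and model just to make the example runnable.
x = np.random.rand(200, 8)
y = (x.sum(axis=1) > 4).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Save a checkpoint only when the monitored quantity (val_loss) improves.
# mode="min" because a lower validation loss is better; "auto" would infer
# the direction from the metric name.
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="best_model.h5",   # placeholder output path
    monitor="val_loss",
    mode="min",
    save_best_only=True,
    verbose=1,
)

model.fit(x, y, validation_split=0.2, epochs=10, callbacks=[checkpoint])

If you switched monitor to "val_accuracy", mode would need to be "max" (or left as "auto"), since a higher value is better for that metric.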

Related

What is the proper way to save a fitted CNN model for the MNIST dataset?

I developed a simple CNN model for the MNIST dataset and got 98% validation accuracy. But after saving the model through Keras as model.h5 and evaluating the saved model in another Jupyter session, its performance is poor and the predictions are random.
What needs to be done to get the same accuracy after saving and loading the model in a different Jupyter notebook session?
(Consider sharing your code/results so the community can help you better).
I'm assuming you're using TensorFlow/Keras, so model.save('my_model.h5') after your model.fit(...) should save the model, including the trained parameters (but not the internal optimizer data, i.e. gradients etc., which shouldn't affect the prediction capabilities of the model).
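As a rough sketch of that save/load round trip (model, x_test and y_test here stand in for the asker's own objects; the preprocessing line is only a reminder to repeat whatever transforms were used during training):

from tensorflow import keras

# Training session: save architecture + trained weights to one file.
model.save("my_model.h5")

# New Jupyter session: reload and evaluate on the same held-out data.
restored = keras.models.load_model("my_model.h5")
x_test_prepared = x_test.astype("float32") / 255.0   # apply the SAME preprocessing as in training
loss, acc = restored.evaluate(x_test_prepared, y_test, verbose=0)
print(acc)   # should match the accuracy observed before saving

If the preprocessing (e.g. the /255 normalization commonly used for MNIST) is applied in one session but not the other, predictions in the second session can look essentially random even though the weights loaded correctly.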
A number of things could cause a generalization gap like that, but...
Case 1: having a high training/validation accuracy and a low test (prediction) accuracy typically means your model overfit on the given training data.
I suggest adding some regularization to your training phase (dropout layers, cutout augmentation, L1/L2, etc.), training for fewer epochs or using early stopping, or using cross-validation/data reshuffling to rule out overfitting.
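For instance, a hedged sketch of what that might look like (this is not the asker's architecture, just a hypothetical small CNN for 28x28 grayscale inputs with dropout, L2 weight decay, and early stopping):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1),
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),        # randomly zero 25% of activations during training
    layers.Flatten(),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

# Stop training when val_loss stops improving and keep the best weights.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)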
Case 2: low intrinsic dataset variance, but unless you're using a subset of MNIST, this is unlikely. Make sure you are properly splitting your training/validation/test sets.
Again, it could be a number of issues, but these are the most common causes of poor generalization. Post your code (specifying the architecture, optimizer, hyperparameters, data preprocessing, and test data used) so the answers can be more relevant to your problem.

Can the increase in training loss lead to better accuracy?

I'm working on a competition on Kaggle. First, I trained a Longformer base with the competition dataset and achieved a quite good result on the leaderboard. Due to the CUDA memory limit and time limit, I could only train 2 epochs with a batch size of 1. The loss started at about 2.5 and gradually decreased to 0.6 at the end of my training.
I then continued training 2 more epochs using that saved weights. This time I used a little bit larger learning rate (the one on the Longformer paper) and added the validation data to the training data (meaning I no longer split the dataset 90/10). I did this to try to achieve a better result.
However, this time the loss started at about 0.4 and constantly increased to 1.6 at about half of the first epoch. I stopped because I didn't want to waste computational resources.
Should I have waited more? Could it eventually lead to a better test result? I think the model could have been slightly overfitting at first.
Your model got fitted to the original training data the first time you trained it. When you added the validation data to the training set the second time around, the distribution of your training data must have changed significantly. Thus, the loss increased in your second training session since your model was unfamiliar with this new distribution.
Should you have waited more? Yes, the loss would have eventually decreased (although not necessarily to a value lower than the original training loss)
Could it have led to a better test result? Probably. It depends on whether your validation data contains patterns that are:
Not present in your training data already
Similar to those that your model will encounter in deployment
In fact, it is possible for an increase in training loss to come with an increase in training accuracy, because accuracy is not perfectly (negatively) correlated with any loss function. The loss is a continuous function of the model outputs, whereas accuracy is a discrete function of them. For example, a model that makes low-confidence but always correct predictions is 100% accurate, whereas a model that makes high-confidence predictions and is occasionally wrong can have a lower loss value but less than 100% accuracy.
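A quick numeric illustration of that last point, with made-up probabilities and binary cross-entropy:

import numpy as np

def mean_cross_entropy(p_true_class):
    # Mean negative log-probability assigned to the true class.
    return float(np.mean(-np.log(p_true_class)))

# Model A: low confidence but always correct -> 100% accuracy, higher loss.
model_a = np.full(100, 0.55)
# Model B: very confident, wrong on 5 of 100 samples -> 95% accuracy, lower loss.
model_b = np.concatenate([np.full(95, 0.99), np.full(5, 0.10)])

print(mean_cross_entropy(model_a))  # ~0.60
print(mean_cross_entropy(model_b))  # ~0.12

Model B has the lower loss despite the lower accuracy, which is the sense in which the two metrics can move independently.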

Data augmentation affects convergence speed

Data augmentation is surely a great regularization method, and it improves my accuracy on the unseen test set. However, I do not understand why it reduces the convergence speed of the network? I know each epoch takes a longer time to train since image transformations are applied on the fly. But why does it affect the convergence? For my current setup, the network hits a 100% training accuracy after 5 epochs without data augmentation (and clearly overfits) - with data augmentation, it takes 23 epochs to hit 95% training accuracy and never seems to hit 100%.
Any links to research papers or comments on the reasonings behind this?
I guess you are evaluating accuracy on the training set, right? That is a mistake...
Without augmentation, your network simply overfits. You have a fixed number of images, for instance 1000, and during training the network can easily memorize the dataset labels. And you are evaluating the model on that fixed (not augmented) dataset.
When you train your network with data augmentation, you are essentially training on a dataset of infinite size. The augmentation is applied on the fly, which means the model "sees" new images every time and cannot memorize them perfectly to 100% accuracy. And you are evaluating the model on the augmented (effectively infinite) dataset.
When you train your model with and without augmentation, you evaluate it on different datasets, so it is not correct to compare their accuracies.
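As an illustration of "new images every time" (a sketch using the random preprocessing layers available in recent tf.keras versions; the transform parameters are arbitrary):

from tensorflow import keras
from tensorflow.keras import layers

# These layers re-sample a random transform on every training forward pass,
# so the network never sees exactly the same pixels twice. At inference time
# they act as a no-op.
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

model = keras.Sequential([
    augment,
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])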
Piece of advice:
Do not look at training set accuracy; it is simply misleading when you use augmentations. Instead, evaluate your model on the test set (or validation set), which is not augmented. By doing this you will see the real accuracy gain for your model.
P.S. If you want to find out more about image augmentations, I really recommend checking out this guide - https://notrocketscience.blog/complete-guide-to-data-augmentation-for-computer-vision/

Abrupt increase in RMSE in LSTM while working on Time Series Prediction

I have the following LSTM network (Fig 1) for predicting the Bitcoin price. The input is the hourly close price of Bitcoin. I am facing some issues and any advice is appreciated.
Earlier, on the same network, my RMSE was 6.71 on the testing set and 7.41 on the training set. I recompiled the whole code and there was an abrupt increase to 233.51 for the training set and 345.56 for the testing set. Can anyone help me find the reason behind this?
Also, how can I improve the accuracy of my network, as it is very low in every iteration?
How should I decide the parameters for my LSTM network (units, epochs, batch_size, time_steps to input)?
Thank you in advance for any help extended.
Your question requires a lot more information: for example, data size, lookback timesteps, data preprocessing procedure, etc. But I would recommend debugging the problem as follows. First, check whether your input/output data are processed properly (a small sketch follows below). Then try training a simpler model than an LSTM, since an LSTM can overfit. But sometimes, if the input signal is too random, it is normal for the model's results to fluctuate heavily, because there is no correlation in the data.
P.S. Never use a machine learning model to predict the stock price. It never works.
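As a hedged starting point for that first debugging step (the variable names and the synthetic series below are placeholders, not the asker's data), it is worth verifying that the sliding windows fed to the LSTM actually line up with their next-step targets:

import numpy as np

def make_windows(series, time_steps):
    # Turn a 1-D price series into (samples, time_steps, 1) inputs
    # and the corresponding next-step targets.
    X, y = [], []
    for i in range(len(series) - time_steps):
        X.append(series[i : i + time_steps])
        y.append(series[i + time_steps])
    return np.array(X)[..., np.newaxis], np.array(y)

prices = np.sin(np.linspace(0, 20, 500))   # stand-in for hourly close prices
X, y = make_windows(prices, time_steps=24)
print(X.shape, y.shape)                    # (476, 24, 1) (476,)

A shape or alignment bug here (off-by-one targets, unscaled inputs, shuffled time order) is a common reason for RMSE jumping by orders of magnitude between runs.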

Validation loss in keras while training LSTM and stability of LSTM

I am using Keras to train my LSTM model for a time series problem. My activation function is linear and the optimizer is RMSprop.
However, I observe that while the training loss decreases slowly over time and fluctuates around a small value, the validation loss jumps up and down with a large variance.
Therefore, I come up with two questions:
1. Does the validation loss affect the training process? Will the algorithm look at the validation loss and slow down the learning rate if it fluctuates a lot?
2. How can I make the model more stable so that it returns more stable values of validation loss?
Thanks
Does the validation loss affect the training process?
No. The validation set is just a small sample of data that is excluded from the training process. It is run through the network at the end of each epoch to test how well training is going, so that you can check whether the model is overfitting (i.e. the training loss is much lower than the validation loss).
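For clarity, a small runnable sketch (random toy data, arbitrary sizes): the held-out split only produces val_loss; it never contributes gradients.

import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 20)
y = np.random.rand(1000, 1)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="rmsprop", loss="mse")

# The last 10% of the data is held out; at the end of every epoch it is run
# through the network once (forward pass only) to produce val_loss.
history = model.fit(x, y, validation_split=0.1, epochs=5, verbose=0)
print(history.history["loss"])      # training loss per epoch (drives the weight updates)
print(history.history["val_loss"])  # validation loss per epoch (evaluation only)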
Fluctuation in validation loss
This is a bit tougher to answer without seeing the network or the data. It could just mean that your model isn't generalizing well to unseen data, i.e. it is not seeing enough similar trends between the training data and the validation data, and each time the weights are adjusted to better suit the training data, the model becomes less accurate on the validation set. You could possibly turn down the learning rate, but if your training loss is decreasing slowly, the learning rate is probably fine. I think in this situation you have to ask yourself a few questions. Do I have enough data? Does a true time series trend exist in my data? Have I normalized my data correctly? Is my network too large for the data I have?
I had this issue - while the training loss was decreasing, the validation loss was not. I checked my setup and found the following helped while I was using an LSTM:
I simplified the model - instead of 20 layers, I opted for 8 layers.
Instead of scaling within the range (-1, 1), I chose (0, 1); this alone reduced my validation loss by an order of magnitude (see the sketch after this list)
I reduced the batch size from 500 to 50 (just trial and error)
I added more features, which I thought intuitively would add some new, useful information to the X->y pairs
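A small sketch of the scaling change mentioned in the list above (using scikit-learn's MinMaxScaler; the series here is a random placeholder):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.random.rand(1000, 1) * 40000     # stand-in for raw price values

# Fit the scaler on the training portion only, to avoid leaking the test range.
train, test = series[:800], series[800:]
scaler = MinMaxScaler(feature_range=(0, 1))  # (0, 1) instead of (-1, 1)
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)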
Possible reasons:
Your validation set is very small compared to your training set, which is usually the case. A small change of weights makes the validation loss fluctuate much more than the training loss. This does not necessarily mean that your model is overfitting, as long as the overall trend of the validation loss keeps decreasing.
Maybe your training and validation data come from different sources and have different distributions. This can happen when your data is a time series and you split your train/validation data at a specific timestamp.
Does the validation loss affect the training process?
No, validation (a single forward pass) and training (forward and backward passes) are different processes. Hence a single forward pass does not change how you train next.
Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates alot?
No, but I guess you could implement your own method to do so. However, one thing should be noted: the model is trying to learn the best solution to your cost function, which is fed by the training data only, so changing the learning rate based on the validation loss doesn't make much sense.
How can i make the model more stable so that it will return a more stable values of validation loss?
The reasons are explained above. If it is the first case, enlarging the validation set will make your loss look more stable, but it does NOT mean the model fits better. My suggestion is: as long as you are sure your model does not overfit (the gap between training loss and validation loss is not too large), you can just save the model that gives the lowest validation loss.
If it is the second case, it can be complicated depending on your situation. You could try to exclude samples from the training set that are not "similar" to your validation set, or enlarge your model's capacity if you have enough data. Or perhaps add more metrics to monitor how well the training is going.
