Change learning rate in Keras once loss stops decreasing - python-3.x

I'm new to deep learning.
I have built a small architecture and am compiling it with the Adam optimizer as shown below:
model.compile(optimizer=Adam(learning_rate=0.0001), loss="mse")
#Train it by providing training images
model.fit(x, x, epochs=10, batch_size=16)
I'm aware of all the types of decay where I can change the learning rate at some epoch, but is there a way to change my learning rate automatically once my loss stops decreasing?
PS: It might be a silly question, but please don't be harsh on me as I'm new!

You can use the Callbacks API in Keras.
It provides the following classes in keras.callbacks for altering the learning rate on each epoch:
1. LearningRateScheduler
You can create your own learning rate schedule as a function of epoch.
Then pass the callback object to the callbacks argument to the fit method in a list.
For example, say your callback object is called lr_callback, then you would use:
model.fit(train_X, train_y, epochs=10, callbacks=[lr_callback])
Refer: keras.callbacks.LearningRateScheduler
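For illustration, a minimal sketch of how this could look with your model from the question (the decay rule here is a made-up example, not a recommendation):

from tensorflow.keras.callbacks import LearningRateScheduler

# Hypothetical schedule: keep the initial rate for the first 5 epochs,
# then shrink it by 10% every epoch.
def schedule(epoch, lr):
    if epoch < 5:
        return lr
    return lr * 0.9

lr_callback = LearningRateScheduler(schedule, verbose=1)
model.fit(x, x, epochs=10, batch_size=16, callbacks=[lr_callback])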
2. ReduceLROnPlateau
This reduces the learning rate once the monitored metric (e.g. your loss) stops improving by at least min_delta. You can also set the patience and other useful parameters.
You can pass the callback to the fit method in the same way as done above.
Refer: keras.callbacks.ReduceLROnPlateau
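This callback matches your "reduce the LR once the loss stops decreasing" requirement directly. A minimal sketch, where the factor, patience and thresholds are example values you would tune:

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if the training loss has not improved
# by at least min_delta for 3 consecutive epochs.
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.5, patience=3,
                              min_delta=1e-4, min_lr=1e-6, verbose=1)
model.fit(x, x, epochs=10, batch_size=16, callbacks=[reduce_lr])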
The usage for both the callbacks is detailed quite well in the docs, which I have linked above already.
Alternatively, you could define your own callback to schedule the learning rate, if the above do not satisfy your requirements.

Since you are using the Adam optimizer, the step size is already adapted automatically based on the gradients: as the gradients get smaller near a minimum, the updates shrink as well, so the optimizer does not "jump" over the minimum. I am not sure what you want to achieve, but the learning_rate you define in Adam() is just the initial (base) learning rate.
For more information on Adam and other optimizers, I recommend the book Hands-On Machine Learning by Aurélien Géron.

Related

Setting adaptable learning rate for Adam without using callbacks

I am exploring the translation model with attention from the TensorFlow docs - NMT with Attention. Here, in TF 2.0, the optimizer is defined as
optimizer = tf.keras.optimizers.Adam()
How do I set a learning rate in this case? Is it just initializing the argument like below? How do I set an adaptable learning rate?
tf.keras.optimizers.Adam(learning_rate=0.001)
In the NMT with Attention model they don't use Keras to define the model, so I could not use callbacks or model.fit like below:
model.fit(x_train, y_train, callbacks=[LearningRateReducerCb()], epochs=5)
I don't have much experience with NMT, but regarding the link you have provided, the best option would probably be to use a LearningRateSchedule that can be passed directly as the learning rate parameter of any optimizer (e.g. Adam in your example). The process would be the following:
1. Define your adaptive learning rate schedule (e.g. AdaptiveLRSchedule, which would inherit from LearningRateSchedule and implement whatever adaptive learning rate you prefer, similarly to this example).
2. Instantiate your object - learning_rate = AdaptiveLRSchedule(your_parameters)
3. Use it as the learning rate in the Adam optimizer - optimizer = tf.keras.optimizers.Adam(learning_rate)
4. Keep the rest as it is in the example (optimizer.apply_gradients(zip(gradients, variables)) will now correctly apply the gradients and use the adaptive learning rate according to your definition).
Note that instead of defining your own class in the first step, you can use one of the schedules that already exist in TF, like ExponentialDecay.
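For concreteness, a rough sketch of steps 1-3 above. The class name and the exponential-decay rule are placeholders; any schedule that implements __call__(step) would work:

import tensorflow as tf

class AdaptiveLRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, initial_lr, decay_rate, decay_steps):
        self.initial_lr = initial_lr
        self.decay_rate = decay_rate
        self.decay_steps = decay_steps

    def __call__(self, step):
        # `step` is the current optimizer iteration, passed in by the optimizer.
        return self.initial_lr * self.decay_rate ** (
            tf.cast(step, tf.float32) / self.decay_steps)

learning_rate = AdaptiveLRSchedule(0.001, 0.96, 1000)
optimizer = tf.keras.optimizers.Adam(learning_rate)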
The second option would be to set the lr manually in train_step (here in your example). After backpropagation (see the apply_gradients call in step 4 above, which comes from your example) you can set your lr directly. This can be achieved via optimizer.learning_rate.assign(new_lr), where new_lr is the new learning rate produced by an adaptive-lr function that you have to define (something like new_lr = adaptive_lr(optimizer.learning_rate), where adaptive_lr implements it).
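A sketch of that second option. Here train_step, EPOCHS and dataset refer to the linked NMT example (the real train_step there takes more arguments), and adaptive_lr is a placeholder you would define yourself:

def adaptive_lr(current_lr):
    # Placeholder rule: decay the learning rate by 5% per epoch.
    return current_lr * 0.95

for epoch in range(EPOCHS):
    for batch in dataset:
        train_step(batch)  # simplified; gradients are applied inside via optimizer.apply_gradients
    new_lr = adaptive_lr(optimizer.learning_rate.numpy())
    optimizer.learning_rate.assign(new_lr)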

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
The dataset has about 50 input variables and 65,000 entries.
I am currently running a RNN with a single hidden layer, with 35 nodes in the hidden layer. I am using PyTorch's NLLLoss as my loss function, and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to be working fairly well.
After running for 150 epochs, I notice that I start to converge around 0.80 accuracy on my test data. I would like this to be even higher, but it seems like the model is stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below.)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
I'm not sure why exactly you are using only one hidden layer or what the shape of your history data is, but here are some things you can try:
Try more than one hidden layer
Experiment with LSTM and GRU layers, and with combinations of these layers together with the RNN.
Reconsider the shape of your data, i.e. how much history you look at to predict the weather.
Make sure your features are scaled properly since you have about 50 input variables.
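On the scaling point, a small sketch assuming a scikit-learn preprocessing step before the data goes into the network; X_train / X_test are placeholders for your splits:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)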
Your question is a little ambiguous, as you mentioned an RNN with a single hidden layer. Also, without knowing the entire neural network architecture it is tough to say how you can bring in improvements. So I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try the SGD or Adam optimizers, which are very widely used?
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can compare whether 80% accuracy is good or not?
150 epochs just for a binary classification task looks like too much. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
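To make the optimizer and regularization points concrete, here is a hedged sketch; the architecture is made up, loosely matching the ~50 inputs and 35 hidden units from the question, with Adam standing in for the non-existent "Adaboost" optimizer:

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(50, 35),    # ~50 input features, 35 hidden units
    nn.ReLU(),
    nn.Dropout(p=0.2),    # dropout as a simple regularizer
    nn.Linear(35, 2),
    nn.LogSoftmax(dim=1), # NLLLoss expects log-probabilities
)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-5)  # weight decay = L2 regularization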

Adam's Optimizer and Gradient Descent

I am trying to understand the difference between the Adam optimizer and the gradient descent optimizer, and which one is best to use in which situation. I am looking at the TF website, but if you know of a place where these are explained in a better, easier-to-understand way, let me know.
AdamOptimizer uses the Adam algorithm, which adapts the effective step size for each weight during training, whereas gradient descent maintains a single learning rate for all weight updates and that learning rate never changes.
Adam has the advantage over GradientDescent of using running averages (momentum) of the gradients (the first moment) as well as running averages of the squared gradients (the second moment).
There is no general answer as to which one is better to use; it all depends on your problem, network and data. But Adam has proven itself and is one of the most commonly used optimizers in DL tasks, as it often achieves better results and accuracy metrics.
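As a minimal illustration (tf.keras API assumed here; `model` stands for whatever network you are compiling), switching between the two is a one-line change:

import tensorflow as tf

sgd_opt = tf.keras.optimizers.SGD(learning_rate=0.01)     # plain gradient descent: one fixed step size
adam_opt = tf.keras.optimizers.Adam(learning_rate=0.001)  # adaptive per-parameter step sizes

model.compile(optimizer=adam_opt, loss="mse")  # or optimizer=sgd_opt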

Pytorch: Intermediate testing during training

How can I test my pytorch model on validation data during training?
I know that there is the function myNet.eval(), which apparently switches off any dropout layers, but does it also prevent the gradients from being accumulated?
Also how would I undo the myNet.eval() command in order to continue with the training?
If anyone has some code snippet / toy example I would be grateful!
How can I test my pytorch model on validation data during training?
There are plenty of examples with train and test steps for every epoch during training. An easy one would be the official MNIST example. Since PyTorch does not offer any high-level training, validation or scoring framework, you have to write it yourself. Commonly this consists of:
a data loader (commonly based on torch.utils.data.DataLoader)
a main loop over the total number of epochs
a train() function that uses training data to optimize the model
a test() or valid() function to measure the effectiveness of the model given validation data and a metric
This is also what you will find in the linked example.
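A minimal sketch of that structure; model, train_loader, val_loader, criterion, optimizer and num_epochs are assumed to be defined elsewhere:

import torch

def train(model, loader, optimizer, criterion):
    model.train()                      # training mode: dropout etc. active
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

def validate(model, loader, criterion):
    model.eval()                       # evaluation mode: dropout etc. switched off
    total_loss = 0.0
    with torch.no_grad():              # no gradients needed for validation
        for inputs, targets in loader:
            total_loss += criterion(model(inputs), targets).item()
    return total_loss / len(loader)

for epoch in range(num_epochs):
    train(model, train_loader, optimizer, criterion)
    print(epoch, validate(model, val_loader, criterion))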
Alternatively you can use a framework that provides basic looping and validation facilities so you don't have to implement everything by yourself all the time.
tnt is torchnet for pytorch, supplying you with different metrics (such as accuracy) and abstraction of the train loop. See this MNIST example.
inferno and torchsample attempt to model things very similarly to Keras and provide some tools for validation
skorch is a scikit-learn wrapper for pytorch that lets you use all the tools and metrics from sklearn
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train() or, alternatively, supply a boolean to switch between eval and training: myNet.train(True) for train mode.
I know that there is the function myNet.eval(), which apparently switches off any dropout layers, but is it also preventing the gradients from being accumulated?
It doesn't prevent gradients from accumulating.
But I think during testing you do want to ignore gradients. In that case, you should mark the input variable to the network as volatile=True; it will save some time and memory in the forward computation.
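Note that volatile=True belongs to the old Variable API; in PyTorch 0.4 and later the equivalent is to run the forward pass under torch.no_grad(). A tiny sketch:

import torch

with torch.no_grad():          # replaces volatile=True in recent PyTorch versions
    outputs = myNet(inputs)    # no gradient bookkeeping during this forward pass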
Also how would I undo the myNet.eval() command in order to continue with the training?
myNet.train()

Adaptive learning rate Lasagne

I am using the Lasagne and Theano libraries to build my own deep learning model, following the MNIST example. Can anyone please tell me how to adaptively change the learning rate?
I recommend having a look at https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py.
If you are using sgd, then you can use a momentum term (e.g. https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py#L156) to adapt the update step. If you want to do anything non-standard, the momentum implementation gives you enough hints on how to create something similar on your own.
I think the best way of doing this is by creating a Theano shared variable for your learning rate, passing the shared variable to the updates function, and changing it through the set_value method, as follows:
lr_shared = theano.shared(np.array(0.1, dtype=theano.config.floatX))
updates = lasagne.updates.rmsprop(..., learning_rate=lr_shared)
...
for epoch in range(num_epochs):
    if epoch % 10 == 0:
        lr_shared.set_value(lr_shared.get_value() / 10)
Of course you can change the optimizer and the if condition; this is just an example.
