Adaptive learning rate Lasagne - theano

I am using Lasagne and Theano library to build my own deep learning model following the MNIST example. Can anyone please tell me how the adaptively change the learning rate?

I recommend having a look at https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py.
If you are using sgd, then you can use a momentum term (e.g. https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py#L156) to adaptively change the learning rate. If you want to make anything non-standard, the momentum implementation give you enough hints how to create something similar on your own.

I think the best way of doing this is by creating a theano shared variable for your learning rate, passing the shared variable to the updates function and changing through the set_value method, as follows:
lr_shared = theano.shared(np.array(0.1, dtype=theano.config.floatX))
updates = lasagne.updates.rmsprop(..., learning_rate=lr_shared)
...
for epoch in range(num_epochs):
if epoch % 10 == 0:
lr_shared.set_value(lr_shared.get_value() / 10)
Of course you can change the optimizer and the if codition, this is just an example.

Related

Change learning rate in Keras once loss stop decreasing

I'm new to deep learning.
I have build a small architecture and compiling it using Adam optimizer as shown below:
model.compile(optimizer=Adam(learning_rate=0.0001), loss="mse")
#Train it by providing training images
model.fit(x, x, epochs=10, batch_size=16)
Now i'm aware of all type of decay where I can change learning rate at some epoch, but is there a way where I can change my learning rate automatically once my loss stop decreasing.
PS: It might be silly ques but please don't be harsh on me as I'm new!
You can use the Callbacks API in Keras.
It provides the following classes in keras.callbacks to alter learning rate on each epoch:
1. LearningRateScheduler
You can create your own learning rate schedule as a function of epoch.
Then pass the callback object to the callbacks argument to the fit method in a list.
For example, say your callback object is called lr_callback, then you would use:
model.fit(train_X, train_y, epochs=10, callbacks=[lr_callback]
Refer: keras.callbacks.LearningRateScheduler
2. ReduceLROnPlateau
This reduces the learning rate once your learning rate stops decreasing by min_delta amount. You can also set the patience and other useful parameters.
You can pass the callback to the fit method in the same way as done above.
Refer: keras.callbacks.ReduceLROnPlateau
The usage for both the callbacks is detailed quite well in the docs, which I have linked above already.
Alternatively, you could define your own callback to schedule the learning rate, if the above do not satisfy your requirements.
As you use the Adam optimizer, the learning rate is automatically adjusted depending on the gradient. So the more the gradient reaches a global minimum, the smaller the learning rate gets, so it does not "jump" over the global minimum. I am not sure, what you want to achieve, but the learning_rate you define in Adam() is just a start learning rate.
For more information on Adam and optimizers, I recommend the book hands-on machine learning of Aurélien Géron

Setting adaptable learning rate for Adam without using callbacks

I am exploring the translation model with attention from Tensorflow docs - NMT with Attention . Here in TF 2.0, optimizer is defined as
optimizer = tf.keras.optimizers.Adam()
How do I set a learning rate in this case? Is it just initializing the argument like below? How do I set an adaptable learning rate?
tf.keras.optimizers.Adam(learning_rate=0.001)
In the NMT with Attention model, they dont use Keras to define the model and I could not use Callbacks or 'model.fit' like below:
model.fit(x_train, y_train, callbacks=[LearningRateReducerCb()], epochs=5)
I have not much experience with NMT, but regarding the link you have provided, the best would be probably to use LearningRateSchedule that can be directly used as learning rate parameter in any optimizer (e.g. in Adam in your example). The process would be following:
Define your adaptive learning rate schedule (e.g. AdaptiveLRSchedule that would inherit from LearningRateSchedule and would implement any adaptive learning rate you prefer, similarly like in this example).
Instantiate your object - learning_rate = AdaptiveLRSchedule(your_parameters)
Use it as a learning rate in Adam optimizer - optimizer = tf.keras.optimizers.Adam(learning_rate)
Keep the rest as is in the example (optimizer.apply_gradients(zip(gradients, variables) should now correctly apply gradients and use adaptive learning rate according to your definition).
Note, that instead of defining your own class in the first step you can use one of schedules that already exists in TF like ExponencialDecay.
The second option would be to manually set lr in train_step (here in your example). After backpropagation (see apply_gradients method in (4) above that comes from your example) you can directly set your lr. This can be achieved via optimizer.learning_rate.assign(new_lr) where new_lr would be new learning rate coming from your adaptive lr function that you have to define (something like new_lr = adaptive_lr(optimizer.learning_rate) where adaptive_lr would implement it).

sklearn SGDClassifier can't stop

I am using sklearn to train a model. The train dataset is about 3000k, so i use SGDClassifier. The feature is not very good, so i know it may not converge. But i want SGDClassifier to stop early according to my setting just like max_iter = 1000. As far as I am concerned, the function SGDClassifier has no parameter like max_iter. How can i do it?
This is the code.
This is the print information.
Any help will be appreciated...
This is weird, by default in scikit-learn 0.18.2, n_iter is set to 5 epochs. Can you please update your question with a script that makes it possible to reproduce the behavior using a toy dataset (for instance generated with numpy.random.randn or similar).
Note that in scikit-learn master and 0.19 once released, n_iter will be deprecated and replaced by max_iter and a tol (for instance set to 1e-3) to automatically stop when the objective function is no longer making progress.
The 20hours running could be not so strange since you have a dataset of 3000k and you use SGDClassifier that is slow. What processor do you have?
Try stopping it by using CTRL+C if you are in Windows. Then, use n_iter to control the number of iterations that you want. The default is 5 however.
Finally, if you want to save a model see here:
Save and Load Machine Learning Models in Python with scikit-learn

How to continue training svm and knn in scikit-learn?

after training since it cost a lot of time is there a way for me to continue my training and add samples using nusvc() and nearestneighbor() in scikitlearn?
For the SVM, you might be able to use the online learning abilities of the SGDClassifier class. To do so, you would need to use the partial_fit() function.

Custom operation implementation for RBM/DBN with tensorflow?

Since Google released out tensorflow, it becomes kind of trend in the current deep learning selections.
I'd like to do some experiments about RBM/DBN (Restricted Boltzmann Machine/Deep Belief Network), I've made some attempt by myself and kind of implement it well through the combination of available APIs from tensorflow. See code and previous answer.
So, if doesn't bother the code running performance, here's the gift for RBM/DBN implementation with tensorflow.
But, the running performance must be considered for the future. Because of the special progress of CD (Contrastive Divergence) algorithm, I think it just works against the framework (data flow graph) used by tensorflow. That's why my code seems weired.
So, the custom operation should be implemented for acceleration. I've followed the current documentation about adding custom ops.
REGISTER_OP("NaiveRbm")
.Input("visible: float32")
.Input("weights: float32")
.Input("h_bias: float32")
.Input("v_bias: float32")
.Output("hidden: float32")
.Doc(R"doc(
Naive Rbm for seperate training use. DO NOT mix up with other operations
)doc");
In my design, NaiveRbm should is an operation that takes visible,weights,h_bias,v_bias as input, but output by only first 3 Variables ( simply sigmoid(X*W+hb) ), its gradient should return at least gradients for last 3 Variables.
Imagine example psuedo code like this:
X = tf.placeholder()
W1, hb1, vb1 = tf.Variable()
W2, hb2, vb2 = tf.Variable()
rbm1 = NaiveRbm(X,W1,hb1,vb1)
train_op = tf.train.MomentumOptimizer(0.01, 0.5).minimize(rbm1)
rbm2 = NaiveRbm(tf.stop_gradient(rbm1), W2, hb2, vb2)
train_op2 = tf.train.MomentumOptimizer(0.01, 0.5).minimize(rbm2)
with tf.Session() as sess:
for batch in batches:
sess.run(train_op, feed_dict={X: batch})
for batch in batches:
sess.run(train_op2, feed_dict={X: batch})
But the tensorflow library is too complex for me. And after too much time seeking for how to implement these existing operations (sigmoid, matmul, ma_add, relu, random_uniform) in custom operation, no solution is found by myself.
So, I'd like to ask if someone could help me achieve the remain works.
PS: before getting some ideas, I'd like to dive into Theano since it implements RBM/DBN already. Just in my opinion, Caffe is kind of not suitable for RBM/DBN because of its framework.
Update: After scratch through the tutorials from Theano, I found the key reason for Theano implemented the RBM/DBN while the tensorflow haven't is the scan technology. So, there might wait tensorflow to implement scan technology to prepare for RBM/DBN implementation.

Resources