I am working in the StableBaselines3 package. I know that I can make learning rate schedule by inputting a function to the "learning_rate" argument. However, what I want to be able to do is adaptively decrease the learning rate. Instead of fine-tuning a new LR schedule for every situation, I want the learning rate to "automatically" decrease a certain amount when the performance hits a plateau/stops increasing.
This post here explores how to do this in PyTorch. I know SB3 is built on PyTorch, so is this same function usable somehow in SB3?
Is there any elegant way to do this in SB3? Or would I have to just stop the training periodically, run a check, implement new LR, run again etc.?
Thank you!
I haven't done it myself, but I'm deaing with SB3 callbacks these days and it should be possible to implement a custom callback that monitors the performance and then changes the model's learning rate based on that.
You can use callbacks to access internal state of the RL model during
training. It allows one to do monitoring, auto saving, model
manipulation, progress bars, …
See here https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html
Related
I am trying to use the TensorFlow object detection API to recognize a specific object (guitars) in pictures and videos.
As for the data, I downloaded the images from the OpenImage dataset, and derived the .tfrecord files. I am testing with different numbers, but for now let's say I have 200 images in the training set and 100 in the evaluation one.
I'm traininig the model using the "ssd_mobilenet_v1_coco" as a starting point, and the "model_main.py" script, so that I can have training and validation results.
When I visualize the training progress in TensorBoard, I get the following results for train:
and validation loss:
respectively.
I am generally new to computer vision and trying to learn, so I was trying to figure out the meaning of these plots.
The training loss goes as expected, decreasing over time.
In my (probably simplistic) view, I was expecting the validation loss to start at high values, decrease as training goes on, and then start increasing again if the training goes on for too long and the model starts overfitting.
But in my case, I don't see this behavior for the validation curve, which seems to be trending upwards basically all the time (excluding fluctuations).
Have I been training the model for too little time to see the behavior I'm expecting? Are my expectations wrong in the first place? Am I misinterpreting the curves?
Ok, I fixed it by decreasing the initial_learning_rate from 0.004 to 0.0001.
It was the obvious solution, considering the wild oscillations of the validation loss, but at first I thought it wouldn't work since there seems to be already a learning rate scheduler in the config file.
However, immediately below (in the config file) there's a num_steps option, and it's stated that
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
Honestly, I don't remember if I commented out the num_steps option...if I didn't, it seems my learning rate was kept to the initial value of 0.004, which turned out to be too high.
If I did comment it out (so that the learning scheduler was active), I guess that, instead of the decrease, it still started from too high of a value.
Anyway, it's working much better now, I hope this can be useful if anyone is experiencing the same problem.
I'm currently experimenting with the possibilities of Sparkling-Water. There are a few possible Use-Cases including Data-Munging in H2O/Spark, Model Building and Offline-Training and Online Stream Prediction. I was wondering whether it is also possible to use Sparkling-Water for Online-Training together with a Kafka Streaming Source?
The Deep Learning model in particular can continuously train forever if you keep presenting new data. So you could do online training with that.
Models like DRM and GBM can “add another tree” from new data using a checkpoint, although you really don’t want to end up with infinity trees.
You could keep around a window of data and periodically train a new complete model. (Swapping in a new model instance at runtime is pretty straighforward. So you could just keep training in the background and update the model that predicts on streaming data periodically — like every hour or every few minutes, or whatever).
Or do your own ensembling by averaging the prediction of many models — by periodically throwing away older models and adding newer ones in a conveyor-belt type of strategy. Similar to a moving average.
I want to create a web application which uses machine learning to predict the price of agriculture commodities before 2-3 months.
Is it really feasible or not?
If yes, then please provide some rough idea about which tools and technologies I can use to implement it.
First of all, study math, more precisely, statistics and differential algebra.
Then, use any open (or not) source neural networking libraries you could find. Even MATLAB would help, as it has a good set of examples (I think it has some of alike prediction models, at least I remember creating a model for predicting election results in Poland)
Decide on your training and input data. Research how news and global situation influences commodity prices. Research how existing bots predict prices for next 1-2 minutes. Also consider using history of predictions from certain individuals, I think Reuters has some API for this. Saying this I imply you'll have to integrate natural language processors, too.
Train your model, test it, improve it for quite a long time.
Finally, deploy a boring front-end and monetize it.
If you dont want to implement ML, you can also use kalman filters.
I have a CNN model. The requests of using this model, for example to classify a picture, come 1 time a second.
I would like to collect the requests as new unsuperised data, and keep training my model.
My question is: How can I handle the training task and classify task effictively?
I will explain why it becomes a problem:
Every training step takes a long time, at least severy seconds, using GPU and not interruptable. So, if my classify tasks use GPU too, I cannot response the requests in time. I would like to make classify tasks using CPU, but looks like theano not support two diffrent config.device in one process.
Multi-process is not acceptable, because my memory is limited and theano costs too much.
Any help or advice would be apreciated.
You could build two separate copies of the same CNN, one on the CPU and one on the GPU. I think this could be done under either the old GPU backend or the new one, but in different ways....some ideas:
Under the old backend:
Load Theano with device=cpu. Build your inference function and compile it. Then call theano.sandbox.cuda.use('gpu'), and build a new copy of your inference function and take gradients of that one to make any training functions. Now the inference function should execute on the CPU, and the training should happen on the GPU. (I've never done this on purpose but I had it happen to me on accident!)
Under the new backend:
As far as I know, you have to tell Theano about any GPUs right when importing, not later. In this case, you could use THEANO_FLAGS="contexts=dev0->cuda0", which doesn't force using one device over another. Then build the inference version of your function like normal, and for the training version, again put all the shared variables on the GPU, and the input variables to any of your training functions should also be GPU variables (e.g. input_var_1.transfer('dev0')). When all your functions are compiled, look at the programs using theano.printing.debugprint(function) to see what's on GPU vs CPU. (When compiling the CPU functions, it might give a warning that it cannot infer the context, and as far as I've seen, that lands it on the CPU...not sure if this behavior is safe to depend on.)
In either case, this will depend on your GPU-based functions do NOT RETURN ANYTHING TO THE CPU (make sure the output variables are GPU ones). This should allow the training function to run concurrently to your inference function, and later you grab what you need to the CPU. For example when you take a training step, just copy the new values over to your inference network parameters, of course.
Let us hear what you come up with!
I'm trying to find out if it is possible to have "incremental training" on data using MLlib in Apache Spark.
My platform is Prediction IO, and it's basically a wrapper for Spark (MLlib), HBase, ElasticSearch and some other Restful parts.
In my app data "events" are inserted in real-time, but to get updated prediction results I need to "pio train" and "pio deploy". This takes some time and the server goes offline during the redeploy.
I'm trying to figure out if I can do incremental training during the "predict" phase, but cannot find an answer.
I imagine you are using spark MLlib's ALS model which is performing matrix factorization. The result of the model are two matrices a user-features matrix and an item-features matrix.
Assuming we are going to receive a stream of data with ratings or transactions for the case of implicit, a real (100%) online update of this model will be to update both matrices for each new rating information coming by triggering a full retrain of the ALS model on the entire data again + the new rating. In this scenario one is limited by the fact that running the entire ALS model is computationally expensive and the incoming stream of data could be frequent, so it would trigger a full retrain too often.
So, knowing this we can look for alternatives, a single rating should not change the matrices much plus we have optimization approaches which are incremental, for example SGD. There is an interesting (still experimental) library written for the case of Explicit Ratings which does incremental updates for each batch of a DStream:
https://github.com/brkyvz/streaming-matrix-factorization
The idea of using an incremental approach such as SGD follows the idea of as far as one moves towards the gradient (minimization problem) one guarantees that is moving towards a minimum of the error function. So even if we do an update to the single new rating, only to the user feature matrix for this specific user, and only the item-feature matrix for this specific item rated, and the update is towards the gradient, we guarantee that we move towards the minimum, of course as an approximation, but still towards the minimum.
The other problem comes from spark itself, and the distributed system, ideally the updates should be done sequentially, for each new incoming rating, but spark treats the incoming stream as a batch, which is distributed as an RDD, so the operations done for updating would be done for the entire batch with no guarantee of sequentiality.
In more details if you are using Prediction.IO for example, you could do an off line training which uses the regular train and deploy functions built in, but if you want to have the online updates you will have to access both matrices for each batch of the stream, and run updates using SGD, then ask for the new model to be deployed, this functionality of course is not in Prediction.IO you would have to build it on your own.
Interesting notes for SGD updates:
http://stanford.edu/~rezab/classes/cme323/S15/notes/lec14.pdf
For updating Your model near-online (I write near, because face it, the true online update is impossible) by using fold-in technique, e.g.:
Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems.
Ou You can look at code of:
MyMediaLite
Oryx - framework build with Lambda Architecture paradigm. And it should have updates with fold-in of new users/items.
It's the part of my answer for similar question where both problems: near-online training and handling new users/items were mixed.