RLlib: Multiple training phases with different configurations

I want to do some complicated training using RLlib and I'm not sure how.
I have an environment for two agents, and I want to train the first agent while forcing the second agent's policy to be a hard-coded policy that I write. I want to run that training for 10 steps. Then I want to continue training both agents normally for 10 more steps. That means that in the second training phase, the first agent starts out with the policy trained in the first phase, while the second agent starts with a blank policy.
Is that possible with RLlib? How?
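For reference, here is a minimal sketch of one way this could be done, assuming a recent Ray/RLlib version with the PPOConfig / Policy API stack; the env name "my_two_agent_env", the policy ids, and the hard-coded action are all hypothetical placeholders. The idea: freeze agent_2 behind a custom Policy and list only agent_1 in policies_to_train for phase 1, then build a fresh algorithm for phase 2 and carry agent_1's weights over.
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import Policy, PolicySpec

class HardCodedPolicy(Policy):
    # Fixed, hand-written behavior for agent_2 during phase 1
    def compute_actions(self, obs_batch, *args, **kwargs):
        return [0 for _ in obs_batch], [], {}  # always action 0 (replace with your rule)
    def get_weights(self):
        return {}
    def set_weights(self, weights):
        pass

def mapping_fn(agent_id, *args, **kwargs):
    return agent_id  # agent ids double as policy ids

# Phase 1: only agent_1 is trained; agent_2 follows the hard-coded policy
phase1 = (
    PPOConfig()
    .environment("my_two_agent_env")  # hypothetical registered env
    .multi_agent(
        policies={"agent_1": PolicySpec(),
                  "agent_2": PolicySpec(policy_class=HardCodedPolicy)},
        policy_mapping_fn=mapping_fn,
        policies_to_train=["agent_1"],
    )
)
algo1 = phase1.build()
for _ in range(10):
    algo1.train()
weights = algo1.get_weights(["agent_1"])  # keep agent_1's trained weights

# Phase 2: fresh algorithm that trains both agents; agent_1 starts from the
# phase-1 weights, agent_2 starts from a blank (freshly initialized) policy
phase2 = (
    PPOConfig()
    .environment("my_two_agent_env")
    .multi_agent(
        policies={"agent_1": PolicySpec(), "agent_2": PolicySpec()},
        policy_mapping_fn=mapping_fn,
        policies_to_train=["agent_1", "agent_2"],
    )
)
algo2 = phase2.build()
algo2.set_weights(weights)  # carry over agent_1's policy
for _ in range(10):
    algo2.train()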

Related

How do I deploy a deep reinforcement learning neural network I coded in PyTorch to my website?

I have built and trained a neural network in PyTorch, and it is ready for production on a website, but how do I deploy it?
There are multiple ways to do it.
First, a note: I noticed that you were asking specifically about reinforcement learning after I had posted my answer. Even though I wrote this answer with a static neural network model in mind, I offer at the end of the post a way to apply its ideas to reinforcement learning.
The different options:
From what I know, PyTorch is not especially recommended for large-scale production.
It is more common to convert the PyTorch model to the ONNX format (a format that makes AI models interchangeable between frameworks). Here is a tutorial if you want to go this way: https://github.com/onnx/tutorials/blob/master/tutorials/PytorchOnnxExport.ipynb .
You can then run it using the ONNX Runtime, Caffe2 (by Facebook), or TensorFlow (by Google).
My answer is not going to explore those solutions (and I did not include tutorials for them), because I recently did the same thing you are trying to do (building a neural network architecture, deploying it, and also allowing users to train their own networks with that architecture), yet I did not convert my network, for the following reasons:
ONNX is evolving quickly, yet it does not currently support all the operations you can possibly use in a PyTorch model. So if you have a highly custom or specific neural network (as in my case), you might not be able to convert it to ONNX with ease; you might need to change your architecture, or even re-write a big part of it, so that it can be converted.
You will need to use one or two additional tools, and most tutorials do not go very deep or explain the logic behind what they are doing.
Note that converting your neural network is worth it if you call it billions or trillions of times a day; otherwise, I think you can stick with PyTorch without issues, even in production, and avoid the hassle of converting to ONNX.
First, let's see how we can save a trained neural network, load it back through the network's architecture, and re-run it.
Second, how we can deploy a network to a website and also allow users to train their own networks. It is likely not the best or most efficient way, yet it certainly works.
Saving the network:
First, you clearly need to have imported PyTorch with "import torch". Inside your neural network file, you should save the stateDict (basically a dictionary of the operations and weights of your network) of the network you want to re-use. You could, for example, only save the stateDict of the model with the smallest loss of your epoch.
# network is the variable containing your neural network object
network_stateDict = network.state_dict()
# Saving the network's stateDict to a variable
Then, when you want to save the stateDict to a file that you can re-use later, use:
torch.save(network_stateDict, "folderPath/myStateDict.pt")
# Saving the stateDict variable to a file
# The .pt extension is just a convention in the PyTorch community; .pth is also used a lot
Finally, when you want to re-use your trained network later on, you will need to:
network = myNetwork(1, 2, 3)
# Load the architecture of the network into a variable (use the same architecture
# and the same network parameters as the ones used to create the stateDict)
network.load_state_dict(torch.load("folderPath/myStateDict.pt"))
# Load the file containing the stateDict of the trained network into a format
# PyTorch can read with the torch.load function, then load that stateDict into the
# network architecture with the load_state_dict function, applied to your network
# object via network.load_state_dict .
network.eval()
# Switch the network to evaluation mode (this disables dropout and puts batch
# norm layers in inference mode); do this before feeding the network input data.
output = network(input_data)
# You should now be able to get output data from your
# trained network, by feeding it a single set of input data.
For more info on saving models and stateDicts: https://pytorch.org/tutorials/beginner/saving_loading_models.html
Deploying the network:
Now that we know how to save, restore, and feed input data to a network, all that is left to do is to deploy it so that this process is done through the website.
You first need to get (likely from your user) the inputs that your neural network will use. I am not going to include any links, since there are so many different web frameworks.
You would then need to use a framework (like Django) that allows you to run this kind of Python logic:
import torch
network = myNetwork(1, 2, 3)
network.load_state_dict(torch.load("folderPath/myStateDict.pt"))
network.eval()
input_data = data_fromMyUser  # the inputs collected from your user
output = network(input_data)
Then you would collect the output to display it, or do whatever you want with it.
If your framework does not give you the ability to use Python, I think it would be a good idea to have a tiny Python script to which you pass the input data and which returns the output.
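As one concrete illustration, here is a minimal sketch of such a setup using Flask (Flask, the /predict route, and the JSON input format are my assumptions, not something from the original answer; myNetwork is the same placeholder architecture class as above):
from flask import Flask, request, jsonify
import torch

app = Flask(__name__)
network = myNetwork(1, 2, 3)  # same placeholder architecture as above
network.load_state_dict(torch.load("folderPath/myStateDict.pt"))
network.eval()

@app.route("/predict", methods=["POST"])
def predict():
    input_data = torch.tensor(request.json["inputs"])  # inputs from the user
    with torch.no_grad():  # no gradients needed for inference
        output = network(input_data)
    return jsonify(output.tolist())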
If you would like to give users the possibility to train networks, you should just give them a way to start the training of one, and then use torch.save on a stateDict object to save the stateDict to a file.
You or they could later use the trained networks (you will also need to create a little function to make sure that you do not overwrite previous stateDict files), for example as sketched below.
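A minimal sketch of such a helper (the function name, timestamp scheme, and folder are hypothetical):
import time
import torch

def save_unique(state_dict, folder="folderPath"):
    # A timestamp in the filename ensures previous stateDict files are kept
    path = f"{folder}/stateDict_{int(time.time())}.pt"
    torch.save(state_dict, path)
    return path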
How to apply it to reinforcement learning:
I did not deploy a reinforcement learning model, yet I can offer you some ideas and leads to explore.
You could store the inputs that you get from your users in a file or a database, and write a little program that, say, every hour or every 24 hours, re-runs the training with the now bigger dataset.
You could then fully apply the suggestions in this answer: run the training, save the stateDict of the resulting model, and then swap in the stateDict that your network uses in production.
This is a bit hacky, yet it would allow you to save your trained networks in a "static" way while still having them evolve and change their stateDicts.
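A minimal sketch of that periodic retrain-and-swap loop (collect_new_data and train stand for your own data-gathering and training routines; the file path is the same placeholder as above):
import time
import torch

def retrain_loop(collect_new_data, train, interval_hours=24):
    while True:
        dataset = collect_new_data()  # all user inputs accumulated so far
        network = train(dataset)      # re-run training on the now bigger dataset
        # Overwrite the stateDict file the serving process loads its weights from
        torch.save(network.state_dict(), "folderPath/myStateDict.pt")
        time.sleep(interval_hours * 3600)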
Conclusion
This is clearly not the most scalable production approach you could employ, yet it is, in my opinion, the easiest to put in place.
You also know that the output you get will be the actual output of your neural network, without any distortions or errors in the values.
Have a great day!
Save the trained model however you want (HDF5 or with pickle); a code sketch follows this list.
Write the program that handles production requests by loading the trained model.
Deploy the program on a distributed system for real-time computation, such as Apache Storm, Flink, Alink, Apache Samoa, etc.
If you feel you need to retrain the model based on feedback, then retrain it on a different cluster or parallel environment and observe the model's accuracy; if it looks good, move the model to production (in the initial days you will need to retrain multiple times, and this will decrease as time goes on if your model is designed well).
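A minimal sketch of the save-and-reload step with pickle (file path and function names are hypothetical; for PyTorch models specifically, torch.save as shown earlier on this page is more idiomatic):
import pickle

def save_model(model, path="model.pkl"):
    with open(path, "wb") as f:
        pickle.dump(model, f)  # step 1: persist the trained model

def load_model(path="model.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)  # step 2: load it back in the serving program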

Need to know how to properly regression test a Dialogflow agent - multiple, conflicting options

I've been working with Dialogflow for several months now - really enjoy the tool. However, it's very important to me to know if my changes (in training) are making intent mapping accuracy better or worse over time. I've talked to numerous people about the best way to do this using Dialogflow and I've gotten at least 4 different answers. Can you help me out here? Here are the 4 answers I've received on how to properly train/test your agent. Please note that ALL of these answers involve generating a confusion matrix...
"Traditional" - randomly split all your input data into 80/20% - use the 80% for training and the 20% for testing. Every time you train your agent (because you've collected new input data), start the process all over again - meaning randomly split all your input data (old and new) into 80/20%. In this model, a piece of training data for one agent iteration might be used as a piece of test data on another iteration - and vice-versa. I've seen variations on this option (KFold and Monte Carlo).
"Golden Set" - similar to the above except that the initial 80/20% training and testing sets that you use to create your first agent continue to grow over time as you add more input data. In this model, once a piece of input data has been tagged as training data, it will NEVER be used as testing data - and once a piece of input data has been tagged as testing data, it will NEVER be used as training data. The 2 initial sets of training and testing data just continue to grow over time as new inputs are randomly split into the existing sets.
"All Data is Training Data - Except for Fake Testing Data" - In this model, we use all the input data as training data. We then copy a portion of the input data (20%) and "munge it" - meaning that we inject characters or words into the 20% portion - and use that as test data.
"All Data is Training Data - and Some of it is Testing Data Also" - In this model, we use all the input data as training data. We then copy a portion of the input data (20%) and use it as testing data. In other words, we are testing our agent using a portion of our (unmodified) training data. A variation on this option is to still use all your inputs as training data but sort your inputs by "popularity/usage" and take the top 20% for testing data.
If I were creating a bot from scratch, I'd simply go with option #1 above. However, I'm using an off-the-shelf product (Dialogflow) and it isn't clear to me that traditional testing is required. Golden Set seems like it will (mostly) get me to the same place as "traditional" so I don't have a big problem with it. Option #3 seems bad - creating fake testing data sounds problematic on many levels. And option #4 is using the same data to test as it uses to train - which scares me.
Anyway, would love to hear some thoughts on the above from the experts!
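For what it's worth, here is a minimal sketch of what option #1 above could look like in code, assuming Python with scikit-learn for the split and the matrix; detect_intent stands for a hypothetical wrapper around the Dialogflow detect-intent API:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def evaluate_agent(utterances, intents, detect_intent):
    # "Traditional" 80/20 split: 80% becomes training phrases, 20% is held out
    train_x, test_x, train_y, test_y = train_test_split(
        utterances, intents, test_size=0.2, random_state=42)
    # ... upload train_x/train_y as training phrases and retrain the agent ...
    predicted = [detect_intent(u) for u in test_x]  # query the retrained agent
    return confusion_matrix(test_y, predicted)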

Train multiple models with various measures and accumulate predictions

So I have been playing around with Azure ML lately, and I got one dataset where I have multiple values I want to predict. All of them use different algorithms, and when I try to train multiple models within one experiment, it says "train model can only predict one value", and there are not enough input ports on the train-model module to take in multiple values even if I were to use the same algorithm for each measure. I tried launching the column selector and making rules, but I get the same error as mentioned. How do I predict multiple values and later put the predicted columns together for the web service output so I don't have to have multiple APIs?
What you want to do is train each model and save them as already-trained models.
So create a new experiment, train your models, and save them by right-clicking on each model; they will then show up in the left nav bar in the Studio. You can now drag your models into the canvas and have them score predictions, eventually combining them into the same output through the "Add Columns" module, as I have done in my example. I made this example for Ronaldo (the Real Madrid CF player), predicting how he will perform in a match after a training day. You can see my demo at http://ronaldoinform.azurewebsites.net
For a more detailed explanation of how to save the models and train multiple values, check out Raymond Langaeian's (MSFT) answer in the comment section at this link:
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-convert-training-experiment-to-scoring-experiment/
You have to train a model for each variable that you are going to predict, then add all the predicted columns together and return them as a single output for the web service.
The algorithms available in Azure ML are only capable of predicting a single variable at a time based on the inputs they are given.

Google Prediction API for FAQ/Recommendation system

I want to build an automated FAQ system where a user can ask questions, and based on the questions and their answers from the training data, the application would suggest a set of answers.
Can this be achieved via Prediction API?
If yes, how should I create my training data?
I have tested the Prediction API for sentiment analysis, but I have doubts and confusion about using it as an FAQ/recommendation system.
My training data has the following structure:
"Question":"How to create email account?"
"Answer":"Step1: xxxxxxxx Step2: xxxxxxxxxxxxx Step3: xxxxx xxx xxxxx"
"Question":"Who can view my contact list?"
"Answer":"xxxxxx xxxx xxxxxxxxxxxx x xxxxx xxx"
Train your model with the question as the input and the answer as the output.
Then, when you send a question as input to Predict, it can return the corresponding answer as output.
For a simple FAQ, this will work well.
(If you have completed this in PHP, please help me out too.)
In order to use the Prediction API, you must first train it against a set of training data. At the end of the training process, the Prediction API creates a model for your data set. Each model is either categorical (if the answer column is string) or regression (if the answer column is numeric). The model remains until you explicitly delete it. The model learns only from the original training session and any Update calls; it does not continue to learn from the Predict queries that you send to it.
Training data can be submitted in one of the following ways:
A comma-separated value (CSV) file. Each row is an example consisting of a collection of data plus an answer (a category or a value) for that example, as you saw in the two data examples above. All answers in a training file must be either categorical or numeric; you cannot mix the two. After uploading the training file, you will tell the Prediction API to train against it. (See the example after this list.)
Training instances embedded directly into the request. The training instances can be embedded into the trainingInstances parameter. Note: due to limits on the size of an HTTP request, this would only work with small datasets (< 2 MB).
Via Update calls. First an empty model is trained by passing in empty storageDataLocation and trainingInstances parameters into an Insert call. Then, the training instances are passed in using the Update call to update the empty model. Note: since not all classifiers can be updated, this may result in lower model accuracy than batch training the model on the entire dataset.
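For the CSV option above, my understanding of the documented format is that the answer (the label) goes in the first column and the question text follows as the feature, so the FAQ data from the question would be laid out roughly like this (a sketch; real answers would need proper CSV escaping):
"Step1: xxxxxxxx Step2: xxxxxxxxxxxxx Step3: xxxxx xxx xxxxx","How to create email account?"
"xxxxxx xxxx xxxxxxxxxxxx x xxxxx xxx","Who can view my contact list?"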
You can find more information in this Help Center article.
NB: The Google Prediction API client library for PHP is still in beta.

Multi threaded AForge.NET training

I am using an AForge.NET ANN and training it on my training set. Because the training is single-threaded and the process can take ages, I wondered if it is possible to run multi-threaded training.
Because it is a problem to use threads while training a Resilient Backpropagation network, I thought about splitting my training set between different networks and, once every N epochs, combining the weights of all networks into one, then duplicating it to all threads (so the next epoch starts with the new weights).
I can't seem to find a method in AForge.NET that combines two (or more) networks. Looking for some help on how to get started with the implementation.
Combining the neural networks every N iterations won't work very well. It can be very tricky to just take the weights and combine them; in some ways this is how the crossover operation of a genetic algorithm works.
Really, the only way you are going to be able to do this is to modify AForge's training to support multiple threads. Basically, you need to map the gradient calculation across threads and then do a reduce-sum on the gradients, then use the reduced gradients to update the network.
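A minimal sketch of that map/reduce-sum idea, written in Python for illustration since the rest of this page's code is Python (AForge.NET itself is C#; parallel_step and grad_fn are hypothetical names):
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_step(weights, shards, grad_fn, lr=0.01):
    # "Map": each thread computes the gradient of the loss on its own shard
    with ThreadPoolExecutor() as pool:
        grads = list(pool.map(lambda shard: grad_fn(weights, shard), shards))
    # "Reduce-sum": sum the per-shard gradients into a single gradient
    total = np.sum(grads, axis=0)
    # Single update of the shared network with the reduced gradient
    return weights - lr * total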
I've implemented this exact thing in the Encog Framework; it supports multi-threaded RPROP and has a C# version: http://www.heatonresearch.com/encog
