Does AzureML RL support PyTorch?

Since RLlib itself supports PyTorch as a framework, I tried to run AzureML RL with PyTorch, but it failed.
I referred to this page to learn how to specify the framework.
I added "framework": "torch" to my AzureML RL experiment's config, but the run still failed.
Here's a snippet from the training script.
tune.run(
    run_or_experiment="PPO",
    config={
        "env": "CartPole-v0",
        "env_config": env_config,
        "num_gpus": 0,
        "num_workers": 1,
        "callbacks": callbacks,
        "framework": "torch",
    },
    stop=stop,
    checkpoint_freq=2,
    checkpoint_at_end=True,
    local_dir='./logs',
)

Ray's support for PyTorch exists, but it is not nearly as extensive as its support for TensorFlow.
Whether or not PyTorch will work for your problem depends on the version of Ray/RLlib you're using, the algorithm you're running, and sometimes even the nature of the environment (specifically the action and observation spaces).
I recommend starting by making sure you're using a recent version of Ray. You can select a version by specifying a Pip package in the configuration for your ReinforcementLearningEstimator (this will be in your notebook code, not in the training script). You can add code that looks something like this:
pip_packages=["ray[rllib]==0.8.7"]
Then in your ReinforcementLearningEstimator setup make sure you set pip_packages:
rl_estimator = ReinforcementLearningEstimator(
    ...
    # Pip packages
    pip_packages=pip_packages,
    ...
)

Related

How can I use a model I trained to make predictions in the future without retraining whenever I want to use it

I recently finished training a linear regression model, but I don't know how to save it so that I can use it later to make predictions without retraining it each time.
Do I save the .py file and call it whenever I need it or create a class or what?
I just want to know how I can save a model I trained so I can use it in the future.
Depending on how you make the linear regression, you should be able to obtain the equation of the regression, as well as the values of the coefficients, most likely by inspecting the workspace.
If you explain what module, function, or code you use to do the regression, it will be easier to give a specific solution.
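For instance, if the regression happened to be fitted with scikit-learn (an assumption on my part; the question doesn't say which library was used), the fitted coefficients live directly on the model object:
# Assuming `model` is a fitted sklearn.linear_model.LinearRegression
print(model.coef_)       # slope(s) of the fitted line
print(model.intercept_)  # intercept of the fitted line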
Furthermore, you can probably use the dill package:
https://pypi.org/project/dill/
I saw the solution here:
https://askdatascience.com/441/anyone-knows-workspace-jupyter-python-variables-functions
The steps proposed for using dill are:
Install dill. If you use conda, the command would be conda install -c anaconda dill
To save workspace using dill:
import dill
dill.dump_session('notebook_session.db')
To restore the session:
import dill
dill.load_session('notebook_session.db')
I saw the same package discussed here: How to save all the variables in the current python session?
and I tested it using a model created with the interpretML package, and it worked for me.
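If you only need the trained model itself rather than the whole workspace, a minimal sketch of saving and reloading just that object with dill (assuming, for illustration, a scikit-learn LinearRegression and placeholder data) could look like this:
import dill
from sklearn.linear_model import LinearRegression

# Fit the model once (placeholder data, for illustration only)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
model = LinearRegression().fit(X, y)

# Save only the fitted model to disk
with open("linear_model.pkl", "wb") as f:
    dill.dump(model, f)

# Later, in another session, load it and predict without retraining
with open("linear_model.pkl", "rb") as f:
    restored = dill.load(f)
print(restored.predict([[4.0]]))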

Has anyone done training on custom data using AllenNLP for coreference resolution?

I'm trying to train AllenNLP on custom data instead of using the pre-trained model for coreference resolution. The instructions are here, but they are very vague and I am not sure how to proceed; in particular, I don't know how to modify the JSONNET file to indicate the path to my train, test and dev CoNLL-2012 training files. Has anyone ever accomplished this before? Thank you very much.
You can specify the path to your data in these lines in the jsonnet config:
"train_data_path": std.extVar("COREF_TRAIN_DATA_PATH"),
"validation_data_path": std.extVar("COREF_DEV_DATA_PATH"),
"test_data_path": std.extVar("COREF_TEST_DATA_PATH"),
You can either update the config to use your paths explicitly (see the sketch below), or set these environment variables before running the config with the allennlp train command.
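As a small sketch of the first option, the std.extVar calls could be replaced with explicit paths (the paths below are hypothetical placeholders):
"train_data_path": "/path/to/my/train.english.v4_gold_conll",
"validation_data_path": "/path/to/my/dev.english.v4_gold_conll",
"test_data_path": "/path/to/my/test.english.v4_gold_conll",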

Porting pre-trained keras models and run them on IPU

I am trying to port two pre-trained Keras models to the IPU machine. I managed to load and run them using IPUStrategy.scope, but I don't know if I am doing it the right way. I have my pre-trained models in .h5 file format.
I load them this way:
def first_model():
    model = tf.keras.models.load_model("./model1.h5")
    return model
After searching your ipu.keras.models.py file I couldn't find any load method for pre-trained models, which is why I used tf.keras.models.load_model().
Then I use this code to run it:
cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = first_model()
    print('compile attempt\n')
    model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
    print('compilation completed\n')
    print('running attempt\n')
    res = model.predict(input_img)[0]
    print('run completed\n')
You can see the output here: link
So I have some difficulty understanding how, and whether, the system is working properly.
Basically, model.compile doesn't seem to compile my model, but when I call model.predict the system first compiles and then runs. Why is that happening? Is there another way to run pre-trained Keras models on an IPU chip?
Another question: is it possible to load a pre-trained Keras model inside an ipu.keras model, use model.fit/evaluate to further train and evaluate it, and then save it for future use?
One last question is about the compilation of the graph. Is there a way to avoid recompiling the graph every time I call model.predict() in a different strategy.scope()?
I am using the TensorFlow 2.1.2 wheel.
Thank you for your time.
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific optimised replacements for the TensorFlow Keras classes Model and Sequential, plus more high-performance, multi-IPU classes such as PipelineModel and PipelineSequential.
As for your specific issue, you are right that there is currently no IPU-specific way to load pre-trained Keras models. I would encourage you, as you appear to have access to IPUs, to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching to the recompilation question: using an executable cache prevents recompilation. You can set that up with the environment variable TF_POPLAR_FLAGS='--executable_cache_path=./cache' (a small sketch follows the resources below). I'd also recommend taking a look at the following resources:
This tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow 2 on the IPU.
The Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.
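As a minimal sketch of enabling the executable cache (assuming it is acceptable to set the flag from Python, before any IPU configuration or graph compilation happens; the cache directory name is arbitrary):
import os

# Must be set before the IPU system is configured / the first graph is compiled
os.environ["TF_POPLAR_FLAGS"] = "--executable_cache_path=./cache"

# ... then proceed with the usual IPU setup, e.g.:
# cfg = ipu.utils.create_ipu_config()
# cfg = ipu.utils.auto_select_ipus(cfg, 1)
# ipu.utils.configure_ipu_system(cfg)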

Using PyTorch on AWS Lambda

Has anyone had any luck using PyTorch on AWS Lambda for feature extraction from images, or just using the framework at all? I finally got PyTorch, numpy, and pillow zipped in a folder under the uncompressed size limit (which is actually around 262 MB), but I had to build PyTorch from source to do this.
The problem I am having now is that Lambda has a very old version of gcc running on it (4.8.3), which is very buggy and missing whole header files altogether. I believe the PyTorch docs state you should be using at least gcc 7 or later, but I'm hoping someone may have found a way around this? I built the source using gcc 7.5, but then when I tried to import torch, Lambda obviously used its installed version of 4.8.3, causing an error on import: Floating point exception (core dumped), which stems from the old version of gcc.
Is there a possible solution around this? I've been at this for a day and a half now, so any help would be great. I think the bottom line is that I am facing this similar issue. Better yet, does anyone have a PyTorch Lambda layer I could use?
I was able to use the layers below for running PyTorch on AWS Lambda:
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:1 PyTorch 1.0.1
arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:2 PyTorch 1.1.0
I found these on the fastai production deployment page, thanks to Matt McClean.
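As a sketch of how one of these layer ARNs might be attached to an existing function with boto3 (the function name is a placeholder, and AWS_REGION must be replaced with your actual region):
import boto3

lambda_client = boto3.client("lambda")

# Attach the PyTorch 1.1.0 layer to an existing function (names are placeholders)
lambda_client.update_function_configuration(
    FunctionName="my-pytorch-fn",
    Layers=["arn:aws:lambda:AWS_REGION:934676248949:layer:pytorchv1-py36:2"],
)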

SageMaker deploying to EIA from TF Script Mode Python3

I've fitted a Tensorflow Estimator in SageMaker using Script Mode with framework_version='1.12.0' and python_version='py3', using a GPU instance.
Calling deploy directly on this estimator works if I also select a GPU instance type for deployment. However, if I select a CPU instance type and/or try to add an accelerator, it fails with an error that Docker cannot find a corresponding image to pull.
Anybody know how to train a py3 model on a GPU with Script Mode and then deploy to a CPU+EIA instance?
I've found a partial workaround by taking the intermediate step of creating a TensorFlowModel from the estimator's training artifacts and then deploying from the model, but this does not seem to support python 3 (again, doesn't find a corresponding container). If I switch to python_version='py2', it will find the container, but fail to pass health checks because all my code is for python 3.
Unfortunately there are no TF + Python 3 + EI serving images at this time. If you would like to use TF + EI, you'll need to make sure your code is compatible with Python 2.
Edit: since I originally wrote this, support for TF + Python 3 + EI has been released. At the time of this writing, I believe TF 1.12.0, 1.13.1, and 1.14.0 all have Python 3 + EI support. For the full list, see https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators.
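For reference, once a serving image with EI support exists for your framework version, the deployment call with the SageMaker Python SDK could look roughly like this (a sketch; the instance and accelerator types below are just examples):
# Deploy the trained estimator to a CPU instance with an Elastic Inference accelerator
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    accelerator_type="ml.eia1.medium",
)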
