No Library cv2 on AzureML

I am trying to learn the AzureML SDK and train my model in the cloud.
I successfully trained the demo project located here.
Now that I want to train my own model, I get this error:
UserError","message":"No module named 'cv2'","target":null,"details":[],"innerErro...
This means that cv2 is not installed on AzureML, and I use it in my train script.
How do I pip install a library on AzureML, or how do I "copy" my virtual environment to my workspace?

The answer is to add opencv-python-headless as a pip package, like this:
from azureml.train.dnn import TensorFlow

estimator = TensorFlow(source_directory=script_folder,
                       script_params=script_params,
                       compute_target=compute_target,
                       entry_script=train_script_name,
                       pip_packages=['opencv-python-headless', 'scikit-image', 'mathematics', 'keras', 'scikit-learn'],
                       use_gpu=True)

I assume you mean that you are training on Azure ML managed compute? If so, you need to specify all your required packages in a Conda dependencies file. See here for guidance: https://learn.microsoft.com/sl-si/azure/machine-learning/service/how-to-set-up-training-targets#system-managed-environment
Use a system-managed environment when you want Conda to manage the Python environment and the script dependencies for you. A system-managed environment is assumed by default and is the most common choice. It is useful on remote compute targets, especially when you cannot configure that target.
All you need to do is specify each package dependency using the CondaDependencies class. Conda then creates a file named conda_dependencies.yml in the aml_config directory in your workspace with your list of package dependencies, and sets up your Python environment when you submit your training experiment.
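For example, a minimal sketch of that approach (assuming the azureml-sdk API shown in the linked docs; workspace and experiment setup are omitted):
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# System-managed environment: let Conda build the Python environment for the run.
run_config = RunConfiguration()
run_config.environment.python.user_managed_dependencies = False
# Declare the packages the training script needs; opencv-python-headless provides cv2.
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=['opencv-python-headless'])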
Alternatively, if you are using estimators and require only a few packages, you can also specify them directly:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train_iris.py',
                    pip_packages=['joblib'])
https://learn.microsoft.com/en-Us/azure/machine-learning/service/how-to-train-scikit-learn#create-a-scikit-learn-estimator

Related

How to add an extra library in MLflow which cannot be added through conda.yaml

I'm building a data science project using MLflow. I can install most libraries through conda.yaml, but there is one library that lives in an Azure Artifacts feed and cannot be packaged directly. Is there any way to specify extra libraries for MLflow to use while it runs, whether on Databricks or locally? It would be great if you could shed some light.
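For reference, this is the kind of conda.yaml MLflow reads; its pip section accepts index options, which is one place a private feed could be referenced (the Azure Artifacts URL and package name below are placeholders, not taken from the question):
# conda.yaml (sketch)
name: mlflow-env
channels:
  - defaults
dependencies:
  - python=3.7
  - pip
  - pip:
    - mlflow
    - --extra-index-url https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
    - my-private-package  # placeholder for the package hosted in Azure Artifacts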

Install a JAR package related to PySpark into Foundry

We would like to install spark-alchemy to use within PySpark inside Foundry (we want to use its HyperLogLog functions). While I know how to install a pip package, I am not sure what is needed to install this kind of package.
Any help or alternative solutions related to the use of HyperLogLog with PySpark will be appreciated, thanks!
PySpark Transform repositories in Foundry are connected to Conda. You can use the conda_recipe/meta.yml to pull packages into your transforms. If a package you want is not available in your channels, I would recommend you reach out to your administrators to ask whether it can be added. Adding a custom JAR that extends Spark is something that needs to be reviewed by your platform administrators, since it can represent a security risk.
I did a $ conda search spark-alchemy and couldn't find anything related, and reading through these instructions https://github.com/swoop-inc/spark-alchemy/wiki/Spark-HyperLogLog-Functions#python-interoperability makes me guess that there isn't a conda package available.
I can't comment on the use of this specific library, but in general Foundry supports Conda channels: if you have a Conda repo and configure Foundry to connect to that channel, you can add this library or others and reference them in your code.
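For reference, a minimal sketch of what the run requirements in conda_recipe/meta.yml look like (the package names are illustrative and still have to exist in a channel your administrators have enabled):
# conda_recipe/meta.yml (sketch)
requirements:
  run:
    - python
    - pyspark
    - spark-alchemy  # illustrative; only resolvable if an admin adds it to an available channel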

How to specify pytorch as a package requirement on windows?

I have a Python package which depends on PyTorch and which I’d like Windows users to be able to install via pip (the specific package is: https://github.com/mindsdb/lightwood, but I don’t think this is very relevant to my question).
What are the best practices for going about this?
Are there some projects I could use as examples?
It seems like the PyPI-hosted versions of torch & torchvision aren’t Windows-compatible, and the “getting started” section suggests installing from the custom PyTorch repository, but beyond that I’m not sure what the ideal solution would be to incorporate this as part of a setup script.
What are the best practices for going about this?
If your project depends on other projects that are not distributed through PyPI, then you have to inform the users of your project one way or another. I recommend the following combination:
clearly specify (in your project's documentation pages, in the project's long description, in the README, or anything like this) which dependencies are not available through PyPI (and possibly the reason why, with the appropriate links), as well as the possible locations to get them from;
to facilitate the user experience, publish alongside your project a pre-prepared requirements.txt file with the appropriate --find-links options.
The reason why (or the main reason, there are others) is that anyone using pip assumes that (by default) everything will be downloaded from PyPI and nowhere else. In other words, anyone using pip puts some trust in pypi.org as a source for Python project distributions. If pip were suddenly to download artifacts from other sources, it would breach this trust. It should be the user's decision to download from other sources.
So you could provide, in your project's documentation, an example requirements.txt file like the following:
# ...
torch===1.4.0 --find-links https://download.pytorch.org/whl/torch_stable.html
torchvision===0.5.0 --find-links https://download.pytorch.org/whl/torch_stable.html
# ...
Update
The best solution would be to help the maintainers of the projects in question to publish Windows wheels on PyPI directly:
https://github.com/pytorch/pytorch/issues/24310
https://github.com/pytorch/vision/issues/1774
https://pypi.org/help/#file-size-limit

SageMaker Script Mode Serving

I've trained a tensorflow.keras model using SageMaker Script Mode like this:
import os
import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point='train.py',
                       source_dir='src',
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       framework_version='1.12.0',
                       py_version='py3',
                       script_mode=True)
However, how do I specify the serving code when I call estimator.deploy()? And what is it by default? Also, is there any way to modify the nginx.conf using Script Mode?
The TensorFlow container is open source: https://github.com/aws/sagemaker-tensorflow-container You can view exactly how it works. Of course, you can tweak it, build it locally, push it to ECR and use it on SageMaker :)
Generally, you can deploy in two ways:
Python-based endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_python.rst
TensorFlow Serving endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst
I would also recommend looking at the TensorFlow examples here: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk
With script mode the default serving method is the TensorFlow Serving-based one:
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L393
A custom serving script is not allowed with the TFS-based container. You can use serving_input_receiver_fn to specify how the input data is processed, as described here: https://www.tensorflow.org/guide/saved_model
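A minimal sketch of such a serving_input_receiver_fn, assuming a TF 1.x tf.estimator-based train.py and a hypothetical 28x28 single-channel input named 'images':
import tensorflow as tf

def serving_input_receiver_fn():
    # Placeholders describing the tensors the endpoint receives at inference time.
    inputs = {'images': tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='images')}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

# In train.py, after training, export the model so TensorFlow Serving can load it:
# estimator.export_savedmodel('/opt/ml/model', serving_input_receiver_fn)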
As for modifying the nginx.conf, there are no supported ways of doing that. Depending on what you want to change in the config file, you can hack the sagemaker-python-sdk to pass in different values for these environment variables: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/3fd736aac4b0d97df5edaea48d37c49a1688ad6e/container/sagemaker/serve.py#L29
Here is where you can override the environment variables: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L130
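For example, a hedged sketch of overriding such variables without patching the SDK, by constructing the serving Model yourself and passing env (the variable name below is an assumption; check the linked serve.py for the names it actually reads):
from sagemaker.tensorflow.serving import Model

model = Model(model_data=estimator.model_data,
              role=sagemaker.get_execution_role(),
              framework_version='1.12.0',
              # Assumed variable name, used here only as an illustration.
              env={'SAGEMAKER_TFS_NGINX_LOGLEVEL': 'error'})
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')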

Recommended approach for project-specific keras config?

My goal is to maintain keras config on a per-project basis, e.g. one project prefers the theano backend, and another project prefers the tensorflow backend. As a bonus, I would like to share this config with other developers relatively seamlessly.
Here are a few ideas:
Can keras config be managed by/within a virtual environment?
Should I use something like dotenv or autoenv to manage some shared environment configuration (via the KERAS_BACKEND environment variable)?
Should keras be updated to look for a .keras/keras.json file in the working tree before using the version in $HOME?
Can keras config be managed by/within a virtual environment?
The basic config parameters (like the backend and the floating-point precision) are managed in the $KERAS_HOME/keras.json file. You could create a keras.json per Anaconda/virtual environment and set KERAS_HOME to point to the directory containing it when you activate that environment.
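For reference, such a per-environment keras.json might look like this (all values except the backend are the library defaults):
{
    "backend": "theano",
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_data_format": "channels_last"
}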
Alternatively, these variables can be set at runtime through the Keras backend, which overrides the value in the config file:
from keras import backend as K
K.set_floatx('float16')
Depending on the Keras backend, there are other parameters one can configure. With the TensorFlow backend, for instance, one might want to configure tf.ConfigProto. One practical way to do it is at runtime:
import os

if os.environ.get('KERAS_BACKEND') == 'tensorflow':
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95, allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options)
    set_session(tf.Session(config=config))
See config.proto for what can be configured.
Should I use something like dotenv or autoenv to manage some shared environment configuration (via the KERAS_BACKEND environment variable)?
It is definitely not a must; one could live with os.environ and the get/set methods available in the Keras backend to modify these variables.
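For example, a minimal sketch of that approach:
import os
os.environ['KERAS_BACKEND'] = 'theano'  # must be set before keras is first imported

from keras import backend as K
print(K.backend())  # -> 'theano'
print(K.floatx())   # -> 'float32'
K.set_floatx('float32')  # the corresponding setter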
Should keras be updated to look for a .keras/keras.json file in the working tree before using the version in $HOME?
It is possible to point to a custom location of the keras.json config file by changing the KERAS_HOME env variable or launching your application like:
env KERAS_HOME=<path to custom folder containing keras.json> python keras_app.py
