SageMaker Script Mode Serving

I've trained a tensorflow.keras model using SageMaker Script Mode like this:
import os
import sagemaker
from sagemaker.tensorflow import TensorFlow
estimator = TensorFlow(entry_point='train.py',
                       source_dir='src',
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       framework_version='1.12.0',
                       py_version='py3',
                       script_mode=True)
However, how do I specify the serving code when I call estimator.deploy()? And what is it by default? Also, is there any way to modify the nginx.conf when using Script Mode?

The TensorFlow container is open source: https://github.com/aws/sagemaker-tensorflow-container You can view exactly how it works. Of course, you can tweak it, build it locally, push it to ECR, and use it on SageMaker :)
Generally, you can deploy in two ways:
Python-based endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_python.rst
TensorFlow Serving endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst
I would also recommend looking at the TensorFlow examples here: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk
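For reference, here is a minimal sketch (not from either answer) of the deploy call itself; the instance type, count, and request payload are purely illustrative:
# Deploy the trained model; with script mode this stands up a TensorFlow Serving-based endpoint by default
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m5.xlarge')
# TFS-style JSON request; the input shape here is made up for illustration
result = predictor.predict({'instances': [[0.1, 0.2, 0.3]]})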

With script mode the default serving method is the TensorFlow Serving-based one:
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L393
A custom serving script is not allowed with the TFS-based container. You can use a serving_input_receiver_fn to specify how the input data is processed, as described here: https://www.tensorflow.org/guide/saved_model
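As a rough illustration (an assumption, not from the answer), a serving_input_receiver_fn is defined in the training script when the SavedModel is exported; the input name and shape below are made up:
import tensorflow as tf

def serving_input_receiver_fn():
    # Placeholders describing the raw tensors the endpoint will receive (illustrative shape)
    inputs = {'images': tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name='images')}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

# e.g. when exporting from a tf.estimator.Estimator wrapped around the Keras model:
# estimator.export_savedmodel(os.environ.get('SM_MODEL_DIR', '/opt/ml/model'), serving_input_receiver_fn)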
As for modifying the nginx.conf, there is no supported way of doing that. Depending on what you want to change in the config file, you can hack the sagemaker-python-sdk to pass in different values for these environment variables: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/3fd736aac4b0d97df5edaea48d37c49a1688ad6e/container/sagemaker/serve.py#L29
Here is where you can override the environment variables: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L130
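For example, here is a hedged sketch of overriding one of those environment variables when creating the serving model yourself; treat the exact variable name and value as assumptions drawn from the linked serve.py:
from sagemaker.tensorflow.serving import Model

model = Model(model_data=estimator.model_data,          # S3 artifact produced by the training job
              role=sagemaker.get_execution_role(),
              framework_version='1.12.0',
              env={'SAGEMAKER_TFS_NGINX_LOGLEVEL': 'info'})  # assumed variable name/value
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')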

Related

Structure and reuse python with Apache Airflow 2.3.4

I have to write some code for an Apache Airflow DAG and I have encountered something that I do not know. I want to reuse some existing Python 3.x code within the Apache Airflow environment.
What I would like to achieve with this question:
I have my dags folder in /home/'user'/airflow/dags
I have another repository with code stored in /home/sources. Here I have an __init__.py and a main function which can be called with parameters, and in this repository there are several functions that are called based on those parameters
How can I most efficiently access the main.py of the code in /home/sources from the DAG using the PythonOperator?
Thank you
If you are using a conda environment, you can simply add "/home/sources" to the PYTHONPATH.
Let's say your environment name is "airflow":
conda activate airflow
conda develop /home/sources
Now, along with airflow, your Python will check everything under the "/home/sources" folder. In your DAG files you can simply use:
from main import my_functions
And give them to Python Operators.
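For instance, here is a minimal sketch of such a DAG (task, function, and parameter names are illustrative), assuming /home/sources is on the PYTHONPATH as set up above:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from main import my_functions  # resolved from /home/sources/main.py

with DAG(
    dag_id="reuse_existing_code",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_existing_code = PythonOperator(
        task_id="run_existing_code",
        python_callable=my_functions,
        op_kwargs={"param": "value"},  # whatever parameters your main function expects
    )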
For more information, check here.

Test AWS lambdas and layer using Jest?

I have an AWS Lambda API that uses a Lambda layer with some helper functions.
Now, when deployed, AWS forces a path for the layer that's something like /opt/nodejs/lib/helpers/awsGatewayResponses. Locally, however, I have another folder structure which (on my local machine) makes the path layers/api-layer/nodejs/lib/helpers/awsGatewayResponses. (Because I don't want to have a folder setup of /opt/nodejs/lib/...)
However, I'm setting up some tests using Jest, and I've run into the issue that I have to change the imports of the format /opt/nodejs/lib/helpers/... to layers/api-layer/nodejs/lib/helpers/, otherwise I get import errors. I don't want to make this change, since it is not aligned with the actual deployed environment.
I'm looking for something that can rewrite my paths to layers/api-layer/nodejs/lib/helpers/ only when I'm running tests. Any ideas on how I can make some kind of dynamic import? I want to run some tests automatically on GitHub on commits.
Thanks in advance! Please let me know if I have to elaborate.

No Library cv2 on AzureML

I am trying to learn the AzureML SDK and train my model in the cloud.
I successfully trained the demo project located here.
Now that I want to train my own model, I get the error:
UserError","message":"No module named 'cv2'","target":null,"details":[],"innerErro...
This means that cv2 is not installed on AzureML, and I use it in my training script.
How do I pip install a library on AzureML, or how do I "copy" my virtual environment to my workspace?
The answer is to add opencv-python-headless as a pip package, like this:
estimator = TensorFlow(source_directory=script_folder,
                       script_params=script_params,
                       compute_target=compute_target,
                       entry_script=train_script_name,
                       pip_packages=['opencv-python-headless', 'scikit-image', 'mathematics', 'keras', 'scikit-learn'],
                       use_gpu=True)
I assume you mean that you are training on Azure ML managed compute? If so, you need to specify all your required packages in a Conda dependencies file. See here for guidance: https://learn.microsoft.com/sl-si/azure/machine-learning/service/how-to-set-up-training-targets#system-managed-environment
Use a system-managed environment when you want Conda to manage the Python environment and the script dependencies for you. A system-managed environment is assumed by default and the most common choice. It is useful on remote compute targets, especially when you cannot configure that target.
All you need to do is specify each package dependency using the CondaDependency class. Then Conda creates a file named conda_dependencies.yml in the aml_config directory in your workspace with your list of package dependencies and sets up your Python environment when you submit your training experiment.
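As a rough sketch (an assumption, not from the quoted docs) of wiring cv2 in through that class and attaching it to a run configuration:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration()
run_config.environment.python.user_managed_dependencies = False  # let the service build the environment

conda_deps = CondaDependencies()
conda_deps.add_pip_package('opencv-python-headless')  # provides the cv2 module
run_config.environment.python.conda_dependencies = conda_deps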
Alternatively, if you are using estimators and require only a few packages, you can also specify them directly:
estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train_iris.py',
                    pip_packages=['joblib'])
https://learn.microsoft.com/en-Us/azure/machine-learning/service/how-to-train-scikit-learn#create-a-scikit-learn-estimator

Recommended approach for project-specific keras config?

My goal is to maintain keras config on a per-project basis, e.g. one project prefers the theano backend, and another project prefers the tensorflow backend. As a bonus, I would like to share this config with other developers relatively seamlessly.
Here are a few ideas:
Can keras config be managed by/within a virtual environment?
Should I use something like dotenv or autoenv to manage some shared environment configuration (via the KERAS_BACKEND environment variable)?
Should keras be updated to look for a .keras/keras.json file in the working tree before using the version in $HOME?
Can keras config be managed by/within a virtual environment?
The basic config parameters (like backend and floating-point precision) are managed in the $KERAS_HOME/keras.json file. You could create a keras.json per Anaconda/virtual environment and set KERAS_HOME to point to the folder containing the one you want as you activate that environment.
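For example, a small sketch (the path and the backend choice are illustrative) that writes a per-project keras.json and points KERAS_HOME at its folder; the variable must be set before Keras is imported:
import json
import os

cfg_dir = os.path.expanduser('~/envs/myproject/.keras')  # hypothetical per-project location
os.makedirs(cfg_dir, exist_ok=True)
with open(os.path.join(cfg_dir, 'keras.json'), 'w') as f:
    json.dump({
        "backend": "theano",                 # per-project backend choice
        "floatx": "float32",
        "epsilon": 1e-07,
        "image_data_format": "channels_last",
    }, f, indent=4)

os.environ['KERAS_HOME'] = cfg_dir  # must happen before `import keras`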
Alternatively, these variables can be set during runtime through Keras backend, which would override the value in the config file:
from keras import backend as K
K.set_floatx('float16')
Depending on the Keras backend, there are other parameters one can configure. With the tensorflow backend, for instance, one might want to configure tf.ConfigProto. One practical way to do it is at runtime, for example:
import os

if os.environ.get('KERAS_BACKEND') == 'tensorflow':
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    # Limit GPU memory usage and allow it to grow instead of grabbing everything up front
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95, allow_growth=True)
    config = tf.ConfigProto(gpu_options=gpu_options)
    set_session(tf.Session(config=config))
See config.proto for what can be configured.
Should I use something like dotenv or autoenv to manage some shared environment configuration (via the KERAS_BACKEND environment variable)?
It is definitely not a must; one can get by with os.environ and the get/set methods available in Keras to modify these variables.
Should keras be updated to look for a .keras/keras.json file in the working tree before using the version in $HOME?
It is possible to point to a custom location of the keras.json config file by changing the KERAS_HOME env variable or launching your application like:
env KERAS_HOME=<path to custom folder containing keras.json> python keras_app.py

Configuring an aiohttp app hosted by gunicorn

I implemented my first aiohttp-based RESTlike service, which works quite well as a toy example. Now I want to run it using gunicorn. All the examples I found specify some prepared application in some module, which is then hosted by gunicorn. This requires me to set up the application at import time, which I don't like. I would like to specify some config file (development.ini, production.ini), as I'm used to from Pyramid, and set up the application based on that ini file.
This is common to more or less all Python web frameworks, but I don't get how to do it with aiohttp + gunicorn. What is the smartest way to switch between development and production settings using those tools?
At least for now, aiohttp is a library that does not itself read configuration from an .ini or .yaml file.
But you can easily write your own code to read a config file and set up the aiohttp server by hand.
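For example, here is a minimal sketch of that approach (module, file, and key names are illustrative): an async application factory reads an .ini file chosen by an environment variable, so development.ini and production.ini can be swapped without touching the code:
import configparser
import os

from aiohttp import web


async def handle_ping(request):
    # Echo back which config was loaded, just to show the wiring works
    return web.json_response({"env": request.app["config"].get("name", "unknown")})


async def create_app():
    config = configparser.ConfigParser()
    config.read(os.environ.get("APP_CONFIG", "development.ini"))

    app = web.Application()
    app["config"] = dict(config["app"]) if "app" in config else {}
    app.router.add_get("/ping", handle_ping)
    return app

# Run with, for example:
#   APP_CONFIG=production.ini gunicorn my_service:create_app --worker-class aiohttp.GunicornWebWorker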
