Customising a model in AWS SageMaker - python-3.x

I have a Python script that I wrote with TensorFlow (Python 3.6) in a Jupyter notebook on an AWS SageMaker instance. I need to use the SageMaker Debugger with my deep learning model. Many links suggest first dockerising the algorithm image and then using it on SageMaker. Can anyone suggest an alternative: is there an available TensorFlow 1 Docker image into which I can install some other packages via pip and then run my model on SageMaker? I am using Keras 2.3.0 with TensorFlow 1.15. Please guide me and share the necessary references.

You don't have to dockerise your code yourself: you can use an existing SageMaker TensorFlow image, and with the SageMaker Python SDK you can let SageMaker manage the Docker images for you - no Docker knowledge needed! This documentation explains how to launch your own TF code on SageMaker Training or SageMaker Hosting. You can add a requirements.txt file to bring in extra dependencies.
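As a minimal sketch of that approach, assuming a training script train.py in a src/ directory that also contains a requirements.txt (for example with keras==2.3.0); the instance type and S3 path are illustrative:

import sagemaker
from sagemaker.tensorflow import TensorFlow

# SageMaker pulls a prebuilt TF 1.15 image and pip-installs everything
# listed in src/requirements.txt before training starts.
estimator = TensorFlow(entry_point='train.py',
                       source_dir='src',
                       role=sagemaker.get_execution_role(),
                       framework_version='1.15.2',
                       py_version='py3',
                       train_instance_type='ml.p2.xlarge',
                       train_instance_count=1,
                       script_mode=True)

estimator.fit('s3://my-bucket/training-data')  # hypothetical S3 location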

Related

Is there a way to OCR images in PySpark?

I cannot find an open-source solution for OCRing images in PySpark. I know solutions like pytesseract exist, but I am not sure whether they will play nicely with PySpark, since tesseract-ocr needs to be installed on the Linux machines. Are there any open-source OCR solutions that would play nicely with PySpark?
I could not find a pure Python library. pytesseract calls a Linux library called tesseract-ocr, which I was able to install on a Spark cluster. You can install it on your Spark cluster fairly easily, and it works well.
Here's an answer on how to install it on Databricks. I used global init scripts to install it:
How to install Tesseract OCR on Databricks
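Once tesseract-ocr is installed on every worker, a sketch like the following shows how pytesseract can be applied from PySpark via a UDF; the image path and column names are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("ocr-demo").getOrCreate()

def ocr_image(path):
    # Imports live inside the UDF so they run in the worker processes,
    # which is where tesseract-ocr must be installed.
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))

ocr_udf = udf(ocr_image, StringType())

df = spark.createDataFrame([("/dbfs/images/page1.png",)], ["path"])
df.withColumn("text", ocr_udf("path")).show(truncate=False)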

When importing matplotlib, I get the error: No module named 'numpy.core._multiarray_umath'

I am using the matplotlib library in my Python project, which in turn uses numpy. I have deployed the libraries in AWS Lambda Layers and I import them in my AWS Lambda function. When I test the Lambda function, it throws the following error:
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed. We have compiled some common reasons and troubleshooting tips at: numpy.org/devdocs/user/troubleshooting-importerror.html Please note and check the following: * The Python version is: Python3.8 from "/var/lang/bin/python3.8" * The NumPy version is: "1.18.5" Original error was: No module named 'numpy.core._multiarray_umath'
Any idea what the reason could be and how to resolve it?
I am answering my own question so that if anyone faces this issue in the future, the solution below might work for them as well.
The problem was that I compiled the required packages in a Windows 10 environment and then deployed them as layers for the AWS Lambda function. Lambda functions and layers run on Linux behind the scenes, so the packages compiled in the Windows environment were not compatible with the Lambda function. When I compiled the required packages again in a Linux environment, deployed them as layers, and used them with the Lambda function, it worked like a charm!
This Medium article helped me solve my issue.
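If you cannot easily get a Linux machine, one hedged alternative is to ask pip for prebuilt Linux (manylinux) wheels from any OS instead of compiling locally; the flags below are standard pip options, while the package list and target path are illustrative:

import subprocess, sys

# Download Linux-compatible binary wheels into a Lambda-layer layout.
subprocess.run([
    sys.executable, "-m", "pip", "install",
    "numpy==1.18.5", "matplotlib",
    "--platform", "manylinux2014_x86_64",   # Lambda runs on Linux x86_64
    "--platform", "manylinux1_x86_64",      # older packages publish this tag
    "--python-version", "3.8",              # match the Lambda runtime
    "--only-binary=:all:",                  # never build from source
    "--target", "python/lib/python3.8/site-packages",
], check=True)
# Zip the resulting python/ directory and upload it as a Lambda layer.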

No Library cv2 on AzureML

I am trying to learn the AzureML SDK and train my model in the cloud.
I successfully trained the demo project located here.
Now that I want to train my own model, I get this error:
UserError","message":"No module named 'cv2'","target":null,"details":[],"innerErro...
This means that cv2 is not installed on AzureML, and I use it in my train script,...
How do I pip install a library on AzureML, or how do I "copy" a virtual environment to my workspace?
The answer is to add opencv-python-headless as a pip package, like this:

from azureml.train.dnn import TensorFlow

estimator = TensorFlow(source_directory=script_folder,
                       script_params=script_params,
                       compute_target=compute_target,
                       entry_script=train_script_name,
                       pip_packages=['opencv-python-headless', 'scikit-image', 'mathematics', 'keras', 'scikit-learn'],
                       use_gpu=True)
I assume you mean that you are training on Azure ML managed compute? If so, you need to specify all your required packages in a Conda dependencies file. See here for guidance: https://learn.microsoft.com/sl-si/azure/machine-learning/service/how-to-set-up-training-targets#system-managed-environment
Use a system-managed environment when you want Conda to manage the Python environment and the script dependencies for you. A system-managed environment is assumed by default and is the most common choice. It is useful on remote compute targets, especially when you cannot configure that target.
All you need to do is specify each package dependency using the CondaDependency class. Then Conda creates a file named conda_dependencies.yml in the aml_config directory in your workspace with your list of package dependencies and sets up your Python environment when you submit your training experiment.
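As a minimal sketch of that approach (assuming the v1 azureml-core SDK, where the class is exposed as CondaDependencies; the environment name and packages are illustrative):

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Declare the packages the training script needs.
conda_deps = CondaDependencies()
conda_deps.add_pip_package("opencv-python-headless")
conda_deps.add_pip_package("scikit-learn")

# Attach them to an environment used when submitting the experiment.
env = Environment(name="cv2-env")
env.python.conda_dependencies = conda_deps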
Alternatively, if you are using estimators and require only a few packages, you can also specify them directly:
from azureml.train.sklearn import SKLearn

estimator = SKLearn(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='train_iris.py',
                    pip_packages=['joblib'])
https://learn.microsoft.com/en-Us/azure/machine-learning/service/how-to-train-scikit-learn#create-a-scikit-learn-estimator

SageMaker Script Mode Serving

I've trained a tensorflow.keras model using SageMaker Script Mode like this:
import os
import sagemaker
from sagemaker.tensorflow import TensorFlow
estimator = TensorFlow(entry_point='train.py',
                       source_dir='src',
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       framework_version='1.12.0',
                       py_version='py3',
                       script_mode=True)
However, how do I specify the serving code when I call estimator.deploy()? And what is it by default? Also, is there any way to modify the nginx.conf using Script Mode?
The TensorFlow container is open source: https://github.com/aws/sagemaker-tensorflow-container You can see exactly how it works. Of course, you can tweak it, build it locally, push it to ECR, and use it on SageMaker :)
Generally, you can deploy in two ways:
Python-based endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_python.rst
TensorFlow Serving endpoints: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst
I would also recommend looking at the TensorFlow examples here: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk
With script mode the default serving method is the TensorFlow Serving-based one:
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L393
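For example, deploying the estimator above to the default TFS-based endpoint is a minimal sketch like this; the instance type and request payload are illustrative:

# Deploy the trained estimator behind a TensorFlow Serving endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')

# TFS endpoints accept requests in the TF Serving REST format.
result = predictor.predict({'instances': [[1.0, 2.0, 3.0]]})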
A custom inference script is not allowed with the TFS-based container. You can use a serving_input_receiver_fn to specify how the input data is processed, as described here: https://www.tensorflow.org/guide/saved_model
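A hedged sketch of such a function for a TF 1.x Estimator follows; the feature name and shape are illustrative:

import tensorflow as tf

def serving_input_receiver_fn():
    # Placeholder that TensorFlow Serving feeds at request time.
    inputs = tf.placeholder(tf.float32, shape=[None, 28, 28], name='inputs')
    return tf.estimator.export.ServingInputReceiver(
        features={'inputs': inputs},
        receiver_tensors={'inputs': inputs})

# Passed at export time, e.g.:
# estimator.export_saved_model('export/Servo', serving_input_receiver_fn)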
As for modifying the nginx.conf, there is no supported way of doing that. Depending on what you want to change in the config file, you can hack the sagemaker-python-sdk to pass in different values for these environment variables: https://github.com/aws/sagemaker-tensorflow-serving-container/blob/3fd736aac4b0d97df5edaea48d37c49a1688ad6e/container/sagemaker/serve.py#L29
Here is where you can override the environment variables: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L130
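As a rough sketch, those variables can be set through the env parameter of the serving Model; the variable name below is only an example, so check the linked serve.py for the ones the container actually reads:

import sagemaker
from sagemaker.tensorflow.serving import Model

# env entries are passed straight into the serving container.
model = Model(model_data='s3://my-bucket/model.tar.gz',  # hypothetical path
              role=sagemaker.get_execution_role(),
              framework_version='1.12.0',
              env={'SAGEMAKER_TFS_NGINX_LOGLEVEL': 'info'})  # example variable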

Node.js cloud functions using a TensorFlow graph

I trained a deep-learning model in Python using the TensorFlow library, and I saved it in a pickle file.
My question: is there a way to load this file with Firebase Cloud Functions in the Node.js runtime?
Thanks.
An official JavaScript version of TensorFlow was released a few weeks ago.
With tfjs-converter it is possible to convert pretrained models to JavaScript.
Check out https://github.com/tensorflow/tfjs-converter and https://js.tensorflow.org/
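As a hedged sketch, the converter ships with a Python package (pip install tensorflowjs); note that it expects a Keras or SavedModel artifact rather than a pickle, so the model needs to be re-saved in one of those formats first (the file paths here are illustrative):

import tensorflowjs as tfjs
from tensorflow import keras

# Load the model in Keras format, then convert it for TensorFlow.js.
model = keras.models.load_model('model.h5')
tfjs.converters.save_keras_model(model, 'tfjs_model/')
# From Node.js the result can then be loaded with
# tf.loadLayersModel('file://tfjs_model/model.json').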
