How to add a python executable path for the sagemaker SKLearn python SDK?

When I run train.py via the SKLearn estimator in the SageMaker Python SDK, it uses SageMaker's default Python executable, /miniconda3/bin/python.
I want it to run with a different Python executable, /miniconda3/envs/interplay-env/bin/python3.
How do I set this in the SageMaker SKLearn estimator?
Below is the current configuration of the SageMaker estimator:
sklearn_estimator = SKLearn(
    entry_point="train.py",
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    framework_version=FRAMEWORK_VERSION,
    base_job_name=training_job_name,
    hyperparameters={
        "parameters_dict": parameters_dict1,
        "features": features,
        "target": target,
        "project_path": project_path,
        "bucket_name": bucket_name,
    },
)
I tried setting the executable path in entry_point as below, but it doesn't work:
sklearn_estimator = SKLearn(
    entry_point=["/miniconda3/envs/interplay-env/bin/python3", train_file_name],
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    framework_version=FRAMEWORK_VERSION,
    base_job_name=training_job_name,
    hyperparameters={
        "parameters_dict": parameters_dict1,
        "features": features,
        "target": target,
        "project_path": project_path,
        "bucket_name": bucket_name,
    },
)

Not sure why you would want to do that. By default, SageMaker runs your code in a container and provides a separate conda environment for it. If you need additional libraries, you can list them in a requirements.txt. Then add another attribute to your SKLearn estimator, source_dir, pointing to a folder that contains both your entry point script and the requirements.txt.
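A minimal sketch, assuming a local folder named src/ that holds train.py and a requirements.txt (the folder name is illustrative):

# src/
#   train.py
#   requirements.txt   (one dependency per line, e.g. "joblib==1.1.0")
sklearn_estimator = SKLearn(
    entry_point="train.py",   # resolved relative to source_dir
    source_dir="src",         # SageMaker uploads this folder and pip-installs its requirements.txt
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    framework_version=FRAMEWORK_VERSION,
    base_job_name=training_job_name,
)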

Related

Finding the right version of python/sklearn to use a machine learning model in pyenv

I pickled a model on Kaggle and tried to download it to run locally. Using poetry and pyenv, I ran the following commands to create a project:
pyenv local 3.6.6
poetry new model_api
cd model_api
poetry env use python
poetry add "sklearn>=0.21.3"
but received the error below.
If I simply use sklearn and install it with poetry, I get this error when executing my code in VS Code:
/bin/python /home/gary/Documents/model_api/model_api/app.py
Traceback (most recent call last):
  File "/home/gary/Documents/model_api/model_api/app.py", line 5, in <module>
    model = pickle.load(f)
ModuleNotFoundError: No module named 'sklearn.ensemble.forest'
This is the code I am attempting to run:
import sklearn
import pickle
f = open('./model/ForestModel','rb')
model = pickle.load(f)
I'm trying to use Python 3.6.6 and sklearn 0.21.3, based on what I'm seeing on Kaggle.
If I try a more current version of Python, like 3.8.10, I get the same error. I think I'm missing something simple/obvious; any pointers or things I could check would be greatly appreciated.
There is no sklearn package with the version you'd like to install. I think you are looking for scikit-learn instead (Docs).
You can install the most up to date version that is supported by your other dependencies by running:
poetry add scikit-learn
Or if you need to install a specific version:
poetry add "scikit-learn==0.24.2"
For other options, check out the poetry docs here.
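Worth spelling out why the version matters here: sklearn.ensemble.forest was renamed to sklearn.ensemble._forest in scikit-learn 0.22 (and the old alias later removed), so a model pickled under 0.21.x can only be unpickled with a matching 0.21.x install. A minimal sketch, assuming the Kaggle model really was pickled with 0.21.3:

poetry add "scikit-learn==0.21.3"

# then, in the project code:
import pickle
import sklearn

print(sklearn.__version__)  # expect 0.21.3, matching the training environment
with open('./model/ForestModel', 'rb') as f:
    model = pickle.load(f)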
I would try using the Anaconda package manager instead of pyenv. You could create an environment with the following command:
conda create -n envName scikit-learn
Anaconda generally keeps its Python packages well coordinated, so you are less likely to hit these errors.

Unable to import numpy 1.19.1 in AWS Lambda: No module named 'numpy.core._multiarray_umath'

I am unable to import numpy 1.19.1 with python3.8 on AWS Lambda.
I am using the following dependencies:
pandas 1.1.0
pyarrow 1.0.0
numpy 1.19.1
psycopg2 2.8.5
Because I work in a Windows environment, I created an EC2 Linux instance, installed python3.8, and downloaded all the required libraries, then added them to the project. But the moment I try to import pandas, I get the following:
[ERROR] ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy c-extensions failed.
- Try uninstalling and reinstalling numpy.
- If you have already done that, then:
1. Check that you expected to use Python3.8 from "/var/lang/bin/python3.8",
and that you have no directories in your PATH or PYTHONPATH that can
interfere with the Python and numpy version "1.18.2" you're trying to use.
2. If (1) looks fine, you can open a new issue at
https://github.com/numpy/numpy/issues. Please include details on:
- how you installed Python
- how you installed numpy
- your operating system
- whether or not you have multiple versions of Python installed
- if you built from source, your compiler versions and ideally a build log
- If you're working with a numpy git repository, try `git clean -xdf`
(removes all files not under version control) and rebuild numpy.
Note: this error has many possible causes, so please don't comment on
an existing issue about this - open a new one instead.
Original error was: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "/var/task/src/py38-lib-test.py", line 28, in py38test
    import pandas
  File "/tmp/lib/pandas/__init__.py", line 16, in <module>
    raise ImportError(END RequestId: 07762380-1fc4)
Lastly, I noticed AWS Lambda provides a layer with numpy and scikit; I tried removing my numpy version but keeping the rest and adding the layer to the function, but the same error occurs.
Thanks in advance for your comments.
I used the layer provided by Klayers to solve the problem.
Supposing you're running python 3.8 in the us-east-1 region, then according to this Klayers document, you can use arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p38-numpy:9 as your layer, so that you can run import numpy in the lambda function.
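For instance, the layer can be attached from the AWS CLI (the function name below is a placeholder):

aws lambda update-function-configuration \
    --function-name my-numpy-function \
    --layers arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p38-numpy:9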
AWS Lambda functions don't work this way. If you open the pandas package you'll see it bundles its own numpy package, but those builds won't work on Lambda.
The easy solution is to first download the required packages separately, matching your python version and work environment, from this site, unzip them, and add them to your project directory. Then create a .zip of your project and deploy it to your AWS Lambda function. It'll work this way.
You can refer to this site to follow the complete procedure.
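As a sketch of that packaging step (the function name and paths are illustrative):

# from the project root, with the unzipped packages next to your handler
zip -r deployment.zip .
aws lambda update-function-code \
    --function-name my-numpy-function \
    --zip-file fileb://deployment.zip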
Is your EC2 instance an Amazon Linux 2 machine? You could also try building and running a docker image for Amazon Linux 2 and getting python libs compatible with the environment you need in your Lambda, by volume-mounting to your host.
Something similar to docker-lambda:
https://github.com/lambci/docker-lambda/tree/master/python3.8
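A sketch of that approach, assuming Docker is installed (the output folder name is illustrative):

# install Lambda-compatible Linux wheels into ./python using the build image
docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.8 \
    pip install numpy==1.19.1 pandas==1.1.0 pyarrow==1.0.0 -t python/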
I had the same issue: I tried packaging all the libs with my base code, and I tried a custom lambda layer separating the numpy and pandas libs. Nothing worked.
What worked was the default AWS layers. Among the default layers, AWS provides layers like AWSSDKPandas, CodeGuru, Lambda Insights, etc. The AWSSDKPandas layer is packaged with the pandas libs and other dependencies such as numpy.
So I removed the numpy dependency from my base package and added AWSSDKPandas as a Lambda layer. Worked well.

PyInstaller .exe file plugin could not be extracted (with Keras and ONNX converter libraries and modules)

I have a python script that takes an ONNX neural network and converts it to a keras (.h5) model, to be trained and exported back to ONNX as a newly trained model for later deployment. The problem is that I am required to create a python .exe file from the script, since the goal is to deploy the deep learning model in C++. Currently, the script does a great job of altering the onnx program for the C++ program to deploy the trained model, and it successfully creates a .exe file with PyInstaller using the following command:
pyinstaller --onefile Material_Classifier.py
However, when I click on the .exe file in File Explorer, the console stays empty for about half a minute and then stops working. When I run it from cmd, it gives the following error:
Failed to decode wchar_t from UTF-8
MultiByteToWideChar: The data area passed to a system call is too small.
share\jupyter\lab\staging\node_modules\.cache\terser-webpack-plugin\content-v2\sha512\2e\ba\cfce62ec1f408830c0335f2b46219d58ee5b068473e7328690e542d2f92f2058865c600d845a2e404e282645529eb0322aa4429a84e189eb6b58c1b97c1a could not be extracted!
fopen: No such file or directory
My first hunch was that my input ONNX folder was not in the same folder as the .exe file, since the script requires it as input, but fixing that did nothing. Now I'm leaning towards PyInstaller not handling some of the following libraries used in my script:
import os
import onnx
import keras
import keras2onnx
from onnx2keras import onnx_to_keras
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.python.keras.models import load_model
import cv2
from tqdm import tqdm
import pickle
My second hunch is that because my program reads an external file and outputs a new file, I may need to specify this somehow before creating the .exe with PyInstaller, but I currently have no way of knowing either way.
TL;DR: I'm wondering whether PyInstaller has trouble with deep learning libraries, whether there is something else I'm doing incorrectly, and whether there are any alternatives to creating .exe files from python scripts for deep learning. Thanks!
Answering my own question:
This had something to do with PyInstaller not working with the Anaconda environment when creating the .exe executable. After uninstalling Anaconda and reinstalling Python 3.7.x, I was able to do this successfully, with no external environment, by copying the tensorflow, onnx, and onnx.dist folders into the same directory as my script and running:
pyinstaller --onefile --add-data C:\path\to\onnx;onnx. --add-data C:\path\to\tensorflow;tensorflow. --add-data C:\path\to\onnx-dist;onnx-dist. script.py
It is not hard to guess that the error occurs because the cache file name is too long. In most practical cases, it is enough to go to the directory given in the error message and manually rename the file to any shorter name. As a rule, the share directory is located in the Anaconda3 directory. After that, rebuild the executable file and check that the error message is no longer displayed.

Error with scikit-learn module in AWS Lambda

I'm using AWS Lambda to host a sklearn model. I'm able to do this successfully with a model that was made in python 3.6, but I'm getting the following error with one using python 3.7:
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'sklearn.__check_build._check_build'
___________________________________________________________________________
Contents of /opt/python/lib/python3.7/site-packages/sklearn/__check_build:
__init__.py __pycache__ _check_build.cpython-37m-darwin.so
setup.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.
If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.
I created my sklearn layer by uploading a zipped file of the library. I built this zipped file in a virtual environment, installing the library through pip.
Does anyone know what I'm doing wrong? Has anyone been able to successfully install a sklearn layer in python 3.7 in AWS Lambda?
I uploaded my zip file here: https://github.com/aos226/Sklearn_AWSLambda
Thanks for your help!
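One detail stands out in the listing above: _check_build.cpython-37m-darwin.so is a macOS ("darwin") binary, while Lambda runs Linux, so a layer zipped from a Mac virtual environment ships the wrong native extensions. A sketch of building the layer from Linux wheels instead (using pip's --platform/--only-binary flags; the directory layout is the standard one for python 3.7 layers):

pip install scikit-learn \
    --platform manylinux1_x86_64 \
    --only-binary=:all: \
    --target python/lib/python3.7/site-packages
zip -r sklearn-layer.zip python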

keras -> mlmodel: coreml object has no attribute 'convert'

I am trying to convert my keras model into an mlmodel using coremltools. However, it says the coremltools module has no attribute 'convert':
AttributeError: 'module' object has no attribute 'convert'
My coremltools, keras, and tensorflow (tensorflow-gpu) modules are all up to date.
I am also using python 2.7.10.
I've tried this on both windows and mac; neither worked. However, caffe.convert does work using a caffe model.
Code:
coreml_model = coremltools.converters.keras.convert(MODEL_PATH)
As per the documentation, I expected the converters.keras.convert method to be available in coremltools.
Documentation: https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html
Please help, thanks in advance!
Edit:
import coremltools
# from keras.models import load_model
import keras
import sys
from keras.applications import MobileNet
from keras.utils.generic_utils import CustomObjectScope

with CustomObjectScope({'relu6': keras.applications.MobileNet.relu6,
                        'DepthwiseConv2D': keras.applications.mobilenet.DepthwiseConv2D}):
    model = load_model('weights.hdf5')

MODEL_PATH = "data/model_wide_cifar-10_fruits_model.h5"

def main():
    """Takes in a keras model and converts it to a .mlmodel"""
    print(sys.version)
    # Load in keras model.
    # model = load_model(MODEL_PATH)
    # Load labels.
    labels = []
    label_handler = open("fruit-labels.txt", 'r')
    for label in label_handler:
        labels.append(label.rstrip())
    label_handler.close()
    print("[INFO] Labels: {0}".format(labels))
    # Convert to .mlmodel
    coreml_model = coremltools.converters.keras.convert(
        model=MODEL_PATH,
        input_names="image",
        output_names="image",
        class_labels=labels)
    labels = 'fruit-labels.txt'
    # Save .mlmodel
    coreml_model.utils.save_spec('fruitclassifier.mlmodel')
The solution is to use virtualenv. Follow the instructions from the coremltools README:
Installation
We recommend using virtualenv to use, install, or build coremltools. Be sure to install virtualenv using your system pip.
pip install virtualenv
The method for installing coremltools follows the standard python package installation steps.
To create a Python virtual environment called coremltools, follow these steps:
# Create a folder for virtualenv
mkdir virtualenvs
cd virtualenvs
# Create a Python virtual environment for your Core ML project
virtualenv coremltools
To activate your new virtual environment and install coremltools in this environment, follow these steps:
# Activate your virtual environment
source coremltools/bin/activate
# Install coremltools in the new virtual environment
pip install --upgrade pip
pip install -U coremltools==3.0b5
Install keras and tensorflow:
pip install keras tensorflow
Now make sure it works. With the coremltools environment activated, run
>>> python
Python 3.7.4 (v3.7.4:e09359112e, Sep 5 2019, 14:54:52)
>>> import coremltools
>>> coremltools.converters.keras.convert()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: convert() missing 1 required positional argument: 'model'
coremltools documentation
Credit to this guy's issue: https://github.com/apple/coremltools/issues/440
Communist Hacker's answer does not work for my current setup:
tensorflow 2.4.1
coremltools 4.1
Python 3.8.7
However, after reviewing the coremltools documentation here, I was able to fix it by removing keras from the call, and it now works:
import coremltools

coreml_model = coremltools.converters.convert(model,
                                              input_names="inputname",
                                              output_names="outputname")
Running the above command now produces this in my Jupyter notebook:
Running TensorFlow Graph Passes: 100%|██████████| 5/5 [00:00<00:00, 37.53 passes/s]
Converting Frontend ==> MIL Ops: 100%|██████████| 6/6 [00:00<00:00, 5764.05 ops/s]
Running MIL optimization passes: 100%|██████████| 17/17 [00:00<00:00, 5633.05 passes/s]
Translating MIL ==> MLModel Ops: 100%|██████████| 3/3 [00:00<00:00, 6864.65 ops/s]
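For completeness, the returned MLModel can then be written out with its save method (the filename is illustrative):

coreml_model.save('classifier.mlmodel')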
Note - I am a complete noob with this, so I probably described things incorrectly.
