SageMaker deploying to EIA from TF Script Mode Python 3

I've fitted a TensorFlow Estimator in SageMaker using Script Mode with framework_version='1.12.0' and python_version='py3', on a GPU instance.
Calling deploy directly on this estimator works if I also choose a GPU deployment instance type. However, if I select a CPU instance type and/or try to add an accelerator, it fails with an error that Docker cannot find a corresponding image to pull.
Does anybody know how to train a py3 model on a GPU with Script Mode and then deploy it to a CPU+EIA instance?
I've found a partial workaround: as an intermediate step, I create a TensorFlowModel from the estimator's training artifacts and then deploy from the model, but this does not seem to support Python 3 either (again, it can't find a corresponding container). If I switch to python_version='py2', it finds the container but fails the health checks, because all my code is for Python 3.

Unfortunately there are no TF + Python 3 + EI serving images at this time. If you would like to use TF + EI, you'll need to make sure your code is compatible with Python 2.
Edit: since I originally wrote this, support for TF + Python 3 + EI has been released. At the time of this writing, I believe TF 1.12.0, 1.13.1, and 1.14.0 all have Python 3 + EI support. For the full list, see https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators.
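For reference, a minimal sketch of the two deployment paths discussed above, using the v1-era SageMaker Python SDK (the instance and accelerator types are just examples, not recommendations):
# Deploy the fitted estimator directly to a CPU instance with an EI accelerator
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',
    accelerator_type='ml.eia1.medium',
)

# Or take the workaround route: build a serving Model from the training artifacts first
from sagemaker.tensorflow.serving import Model

model = Model(
    model_data=estimator.model_data,
    role=role,                        # your SageMaker execution role
    framework_version='1.13.1',       # one of the TF versions with Python 3 + EI images
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',
    accelerator_type='ml.eia1.medium',
)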

Does AzureML RL support PyTorch?

Since RLlib itself supports PyTorch as a framework, I tried to run AzureML RL with PyTorch, but it failed.
I referred to this page to learn how to specify the framework.
I added "framework": "torch" to my AzureML RL experiment's config, but it failed.
Here's a snippet from the training script:
tune.run(
    run_or_experiment="PPO",
    config={
        "env": "CartPole-v0",
        "env_config": env_config,
        "num_gpus": 0,
        "num_workers": 1,
        "callbacks": callbacks,
        "framework": "torch",
    },
    stop=stop,
    checkpoint_freq=2,
    checkpoint_at_end=True,
    local_dir='./logs',
)
Ray's support for PyTorch exists, but is not nearly as extensive as its support for TensorFlow.
Whether or not PyTorch will work for your problem depends on the version of Ray/RLLib you're using, the algorithm you're running, and sometimes even the nature of the Environment (specifically the action and observation spaces).
I recommend starting by making sure you're using a recent version of Ray. You can select a version by specifying a Pip package in the configuration for your ReinforcementLearningEstimator (this will be in your notebook code, not in the training script). You can add code that looks something like this:
pip_packages=["ray[rllib]==0.8.7"]
Then in your ReinforcementLearningEstimator setup make sure you set pip_packages:
rl_estimator = ReinforcementLearningEstimator(
    ...
    # Pip packages
    pip_packages=pip_packages,
    ...
)
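For context, a rough sketch of the notebook-side setup, assuming the azureml-contrib-reinforcementlearning package; the parameter values (source directory, script name, compute target) are illustrative, so check your SDK version's documentation:
from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray

pip_packages = ["ray[rllib]==0.8.7"]  # pin a recent Ray so "framework": "torch" is honored

rl_estimator = ReinforcementLearningEstimator(
    source_directory='src',             # hypothetical folder containing the training script
    entry_script='train_cartpole.py',   # hypothetical script with the tune.run() call above
    compute_target=compute_target,      # an existing AzureML compute target
    rl_framework=Ray('0.8.7'),
    pip_packages=pip_packages,
)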

opencv doesn't use all GPU memory

I'm trying to use the cvlib package, which uses a yolov3 model, to recognize objects in images on Windows 10.
Let's take an easy example:
import time
import cv2
import cvlib as cv
from cvlib.object_detection import draw_bbox

img = cv2.imread('example.jpg')  # placeholder: load any test image here
inittimer = time.time()
bbox, label, conf = cv.detect_common_objects(img, confidence=0.5, model='yolov3-worker', enable_gpu=True)
print('The process took %.3f s' % (time.time() - inittimer))
output_image = draw_bbox(img, bbox, label, conf)
The detection takes ~60 ms.
cvlib uses OpenCV to compute the CNN part.
If I then check how much GPU memory TensorFlow is using (via subprocess), it takes only 824 MiB.
While the program runs, nvidia-smi shows that much more memory is still available.
My question is simple: why doesn't cvlib (and therefore TensorFlow) use all of it to speed up detection?
EDIT:
As far as I understand, cvlib uses TensorFlow, but it also uses the OpenCV detector. I installed OpenCV using CMake and CUDA 10.2.
I don't understand why, but nvidia-smi reports CUDA Version: 11.0, which is not what I installed. Maybe that's part of the problem?
You can verify whether OpenCV is using CUDA or not. This can be done using the following:
import cv2
print(cv2.cuda.getCudaEnabledDeviceCount())
This should get you the number of CUDA-enabled devices on your machine. You should also check the build information using the following:
import cv2
print(cv2.getBuildInformation())
The output of both of the above can indicate whether your OpenCV can access the GPU or not. If it can't access the GPU, you may consider reinstalling it.
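For example, a quick way to inspect just the CUDA-related portion of the build summary (a small convenience sketch, not from the original answer):
import cv2

info = cv2.getBuildInformation()
# Show only the CUDA-related lines of OpenCV's build summary
print('\n'.join(line for line in info.splitlines() if 'CUDA' in line))
print('CUDA-enabled devices visible to OpenCV:', cv2.cuda.getCudaEnabledDeviceCount())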
I got it! The problem came from the fact that I created a new Net object for each iteration.
Here is the related issue on GitHub where you can follow it: https://github.com/opencv/opencv/issues/16348
With a custom function, it now works at ~60 fps. Be aware that cvlib is perhaps not built for real-time computation.
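The workaround amounts to building the cv2.dnn network once and reusing it for every frame. A rough sketch of that idea (the config/weights paths are placeholders for the yolov3 files that cvlib downloads):
import cv2

# Build the network once, outside the per-frame loop
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

def detect(frame):
    # Reuse the same net for every frame instead of recreating it per call
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(net.getUnconnectedOutLayersNames())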
You can also rebuild OpenCV with CUDA enabled and check the CMake configuration output:
workon opencv_cuda
cd opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE
and share the result. The CMake summary should indicate whether CUDA support was found and enabled.

keras preprocessing logic

Background:
In a vision application on GCP, we are using TF Serving. The application that uses TF Serving is written in Go; it converts the image to a tensor and sends it to TF Serving over gRPC.
Problem:
The preprocessing logic in Go does not work as well as it does in Python with the Keras image library (the accuracy of inference suffers). Part of the reason could be that the Python libraries were used during training.
What we tried:
TensorFlow Serving provides a way to introduce a pre-processor that can run on the serving container, but it seems to have limited functionality (we can't package the Keras library with the model). We tried the following two options.
What works is Keras preprocessing (Python) on the client side, as follows:
img = tf.keras.preprocessing.image.load_img(file_name, target_size=(HEIGHT, WIDTH))
img_array = tf.keras.preprocessing.image.img_to_array(img)
… grpc call to TensorflowServing...
Our goal is to use “serving_input_receiver_fn” and preprocess image in the TFServing space as described in this blog post: https://medium.com/devseed/technical-walkthrough-packaging-ml-models-for-inference-with-tf-serving-2a50f73ce6f8
But the following code, which is executed as the “serving_input_receiver_fn”, does not yield correct inferences.
image = tf.image.decode_image(image_str_tensor, channels=CHANNELS, dtype=tf.uint8)
image = tf.reshape(image, [HEIGHT, WIDTH, CHANNELS])
Our goal is to run the following Keras code (in a similar way) inside the “serving_input_receiver_fn” (assuming that we can load the image from the gRPC stream).
img = tf.keras.preprocessing.image.load_img(file_name, target_size=(HEIGHT, WIDTH))
img_array = tf.keras.preprocessing.image.img_to_array(img)
Is it possible? This is a massive deployment (70 GPUs & 2300 CPUs), so every bit of performance counts. In our case, doing the image preprocessing on the TF Serving machines would be optimal.
I don't actually have an answer, but maybe I can point you to some resources that will help. Firstly, keras.preprocessing is known to be quite slow; check out https://www.tensorflow.org/tutorials/load_data/images, which recommends building the pre-processing as a tf.data.Dataset pipeline:
The above keras.preprocessing method is convenient, but has downsides:
It's slow (see the performance section of that tutorial).
It lacks fine-grained control.
It is not well integrated with the rest of TensorFlow.
The tutorial then shows how to load the files as a tf.data.Dataset instead.
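A minimal sketch of such a pipeline (not from the original answer; TF 2.x-style API names, and the file glob and target size are placeholders):
import tensorflow as tf

HEIGHT, WIDTH = 224, 224  # hypothetical target size

def load_and_preprocess(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return tf.image.resize(img, [HEIGHT, WIDTH])

# Build the preprocessing as a tf.data pipeline instead of keras.preprocessing
ds = tf.data.Dataset.list_files('images/*.jpg')
ds = ds.map(load_and_preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds = ds.batch(32).prefetch(tf.data.experimental.AUTOTUNE)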
Why not include the pre-processing layer as part of the model graph itself, so that it runs within TensorFlow Serving?
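Along the lines of the blog post linked in the question, a hedged sketch of a serving_input_receiver_fn that accepts encoded image bytes and does the decode/resize in-graph (TF 1.x Estimator API; the sizes, feature name, and JPEG assumption are placeholders to adapt):
import tensorflow as tf

HEIGHT, WIDTH, CHANNELS = 224, 224, 3  # hypothetical; use your model's input size

def serving_input_receiver_fn():
    # The Go client would send raw encoded image bytes instead of a pre-built tensor
    image_bytes = tf.placeholder(dtype=tf.string, shape=[None], name='image_bytes')

    def _preprocess(img_bytes):
        # decode_jpeg yields a rank-3 tensor, so the resize below has a known rank
        img = tf.image.decode_jpeg(img_bytes, channels=CHANNELS)
        # Closest in-graph equivalent of load_img(target_size=...); pick the
        # interpolation method that matches what was used during training
        img = tf.image.resize_images(img, [HEIGHT, WIDTH])
        return tf.cast(img, tf.float32)

    images = tf.map_fn(_preprocess, image_bytes, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        features={'image': images},                  # hypothetical feature/input name
        receiver_tensors={'image_bytes': image_bytes})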

ImageDataGenerator.flow_from_directory() segfaulting with no augmentation

I'm trying to construct an autoencoder for ultrasound images, and I am unable to use ImageDataGenerator.flow_from_directory() to provide the train/test datasets because of a segfault when the method is called. No augmentation is being used, which should only result in the original images being provided by the generator.
The source images are in TIFF format, so I first tried converting them to JPG and PNG, thinking that maybe PIL was faulting on the encoding; no difference. I have tried converting to different color modes (grayscale, RGB, RGBA) with no change in behavior. I have stripped the code down to the bare minimum, taking defaults for nearly all function params, and still get a segfault on the call in both debug and full runs.
# Directory below contains a single subdirectory "input" containing 5635 TIFF images
from keras.preprocessing.image import *
print('Create train_gen')
train_gen = ImageDataGenerator().flow_from_directory(
    directory=r'/data/ultrasound-nerve-segmentation/train/',
    class_mode='input'
)
print('Created train_gen')
print('Created train_gen')
Expected output is a report of 5635 images found in one class "input", both debug messages printed, and a usable generator for Model.fit_generator().
Actual output:
Using TensorFlow backend.
Create train_gen
Found 5635 images belonging to 1 classes.
Segmentation fault
Is there something I'm doing above that could be causing the problem? According to every scrap of sample code I can find, it looks like it should be working.
Environment is:
Ubuntu 16.04 LTS
CUDA 10.1
tensorflow-gpu 1.14
Keras 2.2.4
Python 3.7.2
Thanks for any help you can provide!
OK, so I haven't pinned down specifically why it is segfaulting, but it appears to be related to the virtualenv it runs under. I was using a JupyterHub environment, which seems to misbehave even when run from an SSH session (vs. from within JupyterHub consoles). Once I created a whole new standalone virtualenv with only the TF + Keras packages installed, it appears to run just fine.
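For reference, a quick sanity check (not from the original answer) to confirm the generator actually yields batches in the new environment; the path and class_mode come from the question above:
from keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator().flow_from_directory(
    directory=r'/data/ultrasound-nerve-segmentation/train/',
    class_mode='input',
)
x, y = next(train_gen)
# With class_mode='input', the target batch is the input batch itself
print(x.shape, y.shape)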

How to train YOLO-TensorFlow on your own dataset

I am trying to make an app that detects traffic signs in video frames. I am using YOLO on TensorFlow, following the steps from https://github.com/thtrieu/darkflow .
How can I train this model with my own dataset of traffic sign images?
If you're using Darkflow on Windows, you need to make some small adjustments to how you use it. If you cloned the code and are using it straight from the repository, you need to place python in front of the given commands, as flow is a Python file.
e.g. python flow --imgdir sample_img/ --model cfg/yolo-tiny.cfg --load bin/yolo-tiny.weights --json
If you installed it globally using pip (not a bad idea) and still want to use the flow utility from any directory, just make sure you take the flow file with you.
To train, use the commands listed on the github page here: https://github.com/thtrieu/darkflow
If training on your own data you will need to take some extra steps as outlined here: https://github.com/thtrieu/darkflow#training-on-your-own-dataset
Your annotations need to be in the popular PASCAL VOC format which are a set of xml files including file information and the bounding box data.
Point your flow command at your new dataset and annotations to train.
The best data for you to practice on is the PASCAL VOC dataset. You need to prepare two folders for training: one folder with images and one folder with XML files (the annotation folder). Each image needs one XML file (with the same name) containing the basic information (object name, object position, ...). After that, you only need to choose one predefined .cfg file in the cfg folder and run the following command:
flow --model cfg/yolo-new.cfg --train --dataset "path/to/images/folder" --annotation "path/to/annotation/folder"
Read up on the other options supported by darkflow to optimize the training process further.
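Darkflow also exposes a Python API (TFNet), so the same training options can be passed as a dict rather than CLI flags. A rough sketch, assuming the option keys mirror the flags shown above (paths and hyperparameters are placeholders):
from darkflow.net.build import TFNet

options = {
    'model': 'cfg/yolo-new.cfg',               # the predefined cfg you chose/adapted
    'train': True,
    'dataset': 'path/to/images/folder',
    'annotation': 'path/to/annotation/folder',
    'epoch': 100,
    'gpu': 0.8,                                # fraction of GPU memory to use
}

tfnet = TFNet(options)
tfnet.train()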
After spending a lot of time figuring out how to train a custom dataset for object detection, here is what worked for me.
Prerequisites:
1: training environment: a system with at least a 4 GB GPU, or an AWS/GCP pre-configured cloud machine with CUDA 9 installed
2: Ubuntu 16.04 OS
3: images of the object you want to detect; the images should not be too large, or they will cause out-of-memory issues during training
4: a labelling tool; many are available, such as LabelImg or BBox-Label-Tool; the one I used was also a good one
I also tried a Python dataset-generator project, but the labels it produced were not effective in real-time scenarios.
My suggestion for the training environment is to use an AWS machine rather than spending time on a local CUDA and cuDNN installation; even if you manage to install CUDA locally, training will often break with out-of-memory errors unless you have a GPU with at least 4 GB of memory.
Solutions for training the dataset:
1: train an ssd_mobilenet_v2 model using the TensorFlow Object Detection API; this training output can be used on both Android and iOS
2: use darknet to train the dataset, which requires the PASCAL VOC labelling format; LabelImg does that labelling job very well
3: retrain the weights that come out of darknet with darkflow
