I have a Tensorflow object detection model deployed on Google cloud platform's ML Engine. I have come across posts suggesting Tensorflow Serving + Docker for better performance. I am new to Tensorflow and want to know what is the best way to serve predictions. Currently, the ml engine online predictions have a latency of >50 seconds. My use case is a User uploading pictures using a mobile app and the getting a suitable response based on the prediction result. So, I am expecting th prediciton latency to come down to 2-3 seconds. What else can I do to make the predictions faster?
Google Cloud ML Engine has recently released GPUs support for Online Prediction (Alpha). I believe that our offering may provide the performance improvements you're looking for. Feel free to sign up here: https://docs.google.com/forms/d/e/1FAIpQLSexO16ULcQP7tiCM3Fqq9i6RRIOtDl1WUgM4O9tERs-QXu4RQ/viewform?usp=sf_link
Related
I'm trying to train a deep-learning model for a 512x512 model with TensorFlow. Normally, I would do it with Google Colab or another GPU in the cloud provider. However, due to security reasons, I am going to train the model in Azure which have instances with GPU restricted. My current options are the following:
-Request a Standard_NC4as_T4_v3 as a computing instance for Azure Machine Learning Studio and train everything in Azure Notebooks. I currently have the dataset there.
-Request an NC4as_T4_v3 for a VM and get the NVIDIA image to train the model in a VM. Getting the data from Azure Machine Learning Studio is not a problem.
Both options have the T4 GPU (16GB vRAM) because I did similar experiments in the past and it was good for the job. Before requesting access to an instance, I would like to know which option is better and more likely to be accepted.
I've tried to train a model in the currently available computing instances (Tesla K80 and M60), but they don't have enough power and are out of date with the latest libraries. Tried to work with the only GPU instance available at the moment (NV8as_v4) but it has an AMD GPU and is not intended for Deep Learning training.
VM or ML Studio will not give much difference but the feasibility with Azure ML studio in validation of the images and then we are using the deep learning models. Computational power can be scalable in the form of clusters and instances when we use the azure can be increased in the node count.
In ML Studio we need to use the attached computes to increase the capacity of the computation.
We are researching Stream.io and Stream Framework.
We want to build a high-volume feed with many producers( sources) that include highly personal messages (private messages?)
For building this feed and to make this relevant for all subcribers we will need to use our own ML model for the feed personalisation.
We found this as their solution for personalisation but this might scale badly to allow us to run and develop our own ML model
https://go.getstream.io/knowledge/volumes-and-pricing/can-i
Questions :
1. How do we integrate / add our own ML model for a Getstream-io feed ?
2. SHould we move more to the Stream Framework and how do we connect our own ML model to that feed solution ?
Thanks for pointing us in the right directions !
we have the ability to work with your team to incorporate ML models into Stream. The model has to be close to the data otherwise lag is an issue. If you use the Stream Framework, you're working with python and your own instance of cassandra, which we stopped using because of performance and scalability issues. If you'd like to discuss options, you can reach out via a form on our site.
I'm evaluating tools for production ML based applications and one of our options is Spark MLlib , but I have some questions about how to serve a model once its trained?
For example in Azure ML, once trained, the model is exposed as a web service which can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark ?
From one hand, a machine learning model built with spark can't be served the way you serve in Azure ML or Amazon ML in a traditional manner.
Databricks claims to be able to deploy models using it's notebook but I haven't actually tried that yet.
On other hand, you can use a model in three ways :
Training on the fly inside an application then applying prediction. This can be done in a spark application or a notebook.
Train a model and save it if it implements an MLWriter then load in an application or a notebook and run it against your data.
Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows for different statistical and data mining tools to speak the same language. In this way, a predictive solution can be easily moved among tools and applications without the need for custom coding. e.g from Spark ML to R.
Those are the three possible ways.
Of course, you can think of an architecture in which you have RESTful service behind which you can build using spark-jobserver per example to train and deploy but needs some development. It's not a out-of-the-box solution.
You might also use projects like Oryx 2 to create your full lambda architecture to train, deploy and serve a model.
Unfortunately, describing each of the mentioned above solution is quite broad and doesn't fit in the scope of SO.
One option is to use MLeap to serve a Spark PipelineModel online with no dependencies on Spark/SparkContext. Not having to use the SparkContext is important as it will drop scoring time for a single record from ~100ms to single-digit microseconds.
In order to use it, you have to:
Serialize your Spark Model with MLeap utilities
Load the model in MLeap (does not require a SparkContext or any Spark dependencies)
Create your input record in JSON (not a DataFrame)
Score your record with MLeap
MLeap is well integrated with all the Pipeline Stages available in Spark MLlib (with the exception of LDA at the time of this writing). However, things might get a bit more complicated if you are using custom Estimators/Transformers.
Take a look at the MLeap FAQ for more info about custom transformers/estimators, performances, and integration.
You are comparing two rather different things. Apache Spark is a computation engine, while mentioned by you Amazon and Microsoft solutions are offering services. These services might as well have Spark with MLlib behind the scene. They save you from the trouble building a web service yourself, but you pay extra.
Number of companies, like Domino Data Lab, Cloudera or IBM offer products that you can deploy on your own Spark cluster and easily build service around your models (with various degrees of flexibility).
Naturally you build a service yourself with various open source tools. Which specifically? It all depends on what you are after. How user should interact with the model? Should there be some sort of UI or jest a REST API? Do you need to change some parameters on the model or the model itself? Are the jobs more of a batch or real-time nature? You can naturally build all-in-one solution, but that's going to be a huge effort.
My personal recommendation would be to take advantage, if you can, of one of the available services from Amazon, Google, Microsoft or whatever. Need on-premises deployment? Check Domino Data Lab, their product is mature and allows easy working with models (from building till deployment). Cloudera is more focused on cluster computing (including Spark), but it will take a while before they have something mature.
[EDIT] I'd recommend to have a look at Apache PredictionIO, open source machine learning server - amazing project with lot's of potential.
I have been able to just get this to work. Caveats: Python 3.6 + using Spark ML API (not MLLIB, but sure it should work the same way)
Basically, follow this example provided on MSFT's AzureML github.
Word of warning: the code as-is will provision but there is an error in the example run() method at the end:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
Should be:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
#result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
output = dict()
output['predictions'] = preds
return json.dumps(output)
Also, completely agree with MLeap assessment answer, this can make the process run way faster but thought I would answer the question specifically
I'm evaluating tools for production ML based applications and one of our options is Spark MLlib , but I have some questions about how to serve a model once its trained?
For example in Azure ML, once trained, the model is exposed as a web service which can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark ?
From one hand, a machine learning model built with spark can't be served the way you serve in Azure ML or Amazon ML in a traditional manner.
Databricks claims to be able to deploy models using it's notebook but I haven't actually tried that yet.
On other hand, you can use a model in three ways :
Training on the fly inside an application then applying prediction. This can be done in a spark application or a notebook.
Train a model and save it if it implements an MLWriter then load in an application or a notebook and run it against your data.
Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows for different statistical and data mining tools to speak the same language. In this way, a predictive solution can be easily moved among tools and applications without the need for custom coding. e.g from Spark ML to R.
Those are the three possible ways.
Of course, you can think of an architecture in which you have RESTful service behind which you can build using spark-jobserver per example to train and deploy but needs some development. It's not a out-of-the-box solution.
You might also use projects like Oryx 2 to create your full lambda architecture to train, deploy and serve a model.
Unfortunately, describing each of the mentioned above solution is quite broad and doesn't fit in the scope of SO.
One option is to use MLeap to serve a Spark PipelineModel online with no dependencies on Spark/SparkContext. Not having to use the SparkContext is important as it will drop scoring time for a single record from ~100ms to single-digit microseconds.
In order to use it, you have to:
Serialize your Spark Model with MLeap utilities
Load the model in MLeap (does not require a SparkContext or any Spark dependencies)
Create your input record in JSON (not a DataFrame)
Score your record with MLeap
MLeap is well integrated with all the Pipeline Stages available in Spark MLlib (with the exception of LDA at the time of this writing). However, things might get a bit more complicated if you are using custom Estimators/Transformers.
Take a look at the MLeap FAQ for more info about custom transformers/estimators, performances, and integration.
You are comparing two rather different things. Apache Spark is a computation engine, while mentioned by you Amazon and Microsoft solutions are offering services. These services might as well have Spark with MLlib behind the scene. They save you from the trouble building a web service yourself, but you pay extra.
Number of companies, like Domino Data Lab, Cloudera or IBM offer products that you can deploy on your own Spark cluster and easily build service around your models (with various degrees of flexibility).
Naturally you build a service yourself with various open source tools. Which specifically? It all depends on what you are after. How user should interact with the model? Should there be some sort of UI or jest a REST API? Do you need to change some parameters on the model or the model itself? Are the jobs more of a batch or real-time nature? You can naturally build all-in-one solution, but that's going to be a huge effort.
My personal recommendation would be to take advantage, if you can, of one of the available services from Amazon, Google, Microsoft or whatever. Need on-premises deployment? Check Domino Data Lab, their product is mature and allows easy working with models (from building till deployment). Cloudera is more focused on cluster computing (including Spark), but it will take a while before they have something mature.
[EDIT] I'd recommend to have a look at Apache PredictionIO, open source machine learning server - amazing project with lot's of potential.
I have been able to just get this to work. Caveats: Python 3.6 + using Spark ML API (not MLLIB, but sure it should work the same way)
Basically, follow this example provided on MSFT's AzureML github.
Word of warning: the code as-is will provision but there is an error in the example run() method at the end:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
Should be:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
#result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
output = dict()
output['predictions'] = preds
return json.dumps(output)
Also, completely agree with MLeap assessment answer, this can make the process run way faster but thought I would answer the question specifically
We are trying to figure out how to host and run many of our existing scikit-learn and R models (as is) in GCP. It seems ML Engine is pretty specific to Tensorflow. How can I train a scikit-learn model on Google cloud platform and manage my model if the dataset is too large to pull into datalab? Can I still use ML Engine or is there a different approach most people take?
As an update I was able to get the python script that trains the scikit-learn model to run by submitting it as a training job to ML Engine but haven't found a way to host the pickled model or use it for prediction.
Cloud ML Engine only supports models written in TensorFlow.
If you're using scikit-learn you might want to look at some of the higher level TensorFlow libraries like TF Learn or Keras. They might help migrate your model to TensorFlow in which case you could then use Cloud ML Engine.
It's possible, Cloud ML has this feature from Dec 2017, As of today it is provided as an early access. Basically Cloud ML team is testing this feature but you can also be part of it. More on here.
Use the following command to deploy your scikit-learn models to cloud ml. Please note these parameters may change in future.
gcloud ml-engine versions create ${MODEL_VERSION} --model=${MODEL} --origin="gs://${MODEL_PATH_IN_BUCKET}" --runtime-version="1.2" --framework="SCIKIT_LEARN"
sklearn is now supported on ML Engine.
Here is a fully worked out example of using fully-managed scikit-learn training, online prediction and hyperparameter tuning:
https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/blogs/sklearn/babyweight_skl.ipynb