I played around a bit with Azure ML Studio. As I understand it, the process goes like this:
a) Create training experiment. Train it with data.
b) Create Scoring experiment. This will include the trained model from the training experiment. Expose this as a service to be consumed over REST.
Maybe a stupid question, but what is the recommended way to get the complete experience, like the one I get when I use an app such as https://datamarket.azure.com/dataset/amla/mba (Frequently Bought Together API built with Azure Machine Learning)?
I mean the following:
a) Expose 2 or more services - one to train the model and the other to consume (test) the trained model.
b) User periodically sends training data to train the model
c) The trained model(s) are saved and made available for consumption
d) User is now able to send a dataframe to get the predicted results.
Is there an additional wrapper that needs to be built?
If there is a link documenting this please point me to the same.
The Azure ML retraining API is designed to handle the workflow you describe:
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-retrain-models-programmatically/
Hope this helps,
Roope - Microsoft Azure ML Team
You need to take a look at Azure Data Factory.
I have written a Custom Activity to do the same, with the logic to retrain the model inside the custom activity.
I'm using Microsoft's Custom Vision service to classify images. Since the model will have to be retrained a few times a year, I would like to know whether I can save the current version of the Azure Custom Vision model so that I can retrain my new model on the same version. I assume Microsoft will try to improve the performance of its service over time, so the model used by this tool will probably change.
You can export the model after each run, but you cannot use an existing model as a starting point for another training run.
So yes, as it is a managed service, Microsoft might optimize or otherwise change the training algorithms in the background. It is up to you to decide whether that works for you. If not, a managed service like this is probably not something you should use; train your own models entirely instead.
We are researching Stream.io and Stream Framework.
We want to build a high-volume feed with many producers (sources) that includes highly personal messages (private messages?).
To build this feed and make it relevant for all subscribers, we will need to use our own ML model for feed personalisation.
We found this as their solution for personalisation, but it might scale badly and might not allow us to run and develop our own ML model:
https://go.getstream.io/knowledge/volumes-and-pricing/can-i
Questions :
1. How do we integrate / add our own ML model for a Getstream.io feed?
2. Should we move more towards the Stream Framework, and how do we connect our own ML model to that feed solution?
Thanks for pointing us in the right directions !
We have the ability to work with your team to incorporate ML models into Stream. The model has to be close to the data, otherwise lag is an issue. If you use the Stream Framework, you're working with Python and your own instance of Cassandra, which we stopped using because of performance and scalability issues. If you'd like to discuss options, you can reach out via a form on our site.
I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained.
For example in Azure ML, once trained, the model is exposed as a web service which can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark ?
On the one hand, a machine learning model built with Spark can't be served in the traditional manner you are used to from Azure ML or Amazon ML.
Databricks claims to be able to deploy models using its notebooks, but I haven't actually tried that yet.
On the other hand, you can use a model in three ways:
Training on the fly inside an application then applying prediction. This can be done in a spark application or a notebook.
Train a model and save it if it implements an MLWriter then load in an application or a notebook and run it against your data.
Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows for different statistical and data mining tools to speak the same language. In this way, a predictive solution can be easily moved among tools and applications without the need for custom coding. e.g from Spark ML to R.
Those are the three possible ways.
Of course, you can think of an architecture with a RESTful service in front, which you can build using spark-jobserver, for example, to train and deploy, but that needs some development. It's not an out-of-the-box solution.
You might also use projects like Oryx 2 to create your full lambda architecture to train, deploy and serve a model.
Unfortunately, describing each of the solutions mentioned above is quite broad and doesn't fit in the scope of SO.
One option is to use MLeap to serve a Spark PipelineModel online with no dependencies on Spark/SparkContext. Not having to use the SparkContext is important as it will drop scoring time for a single record from ~100ms to single-digit microseconds.
In order to use it, you have to:
Serialize your Spark Model with MLeap utilities
Load the model in MLeap (does not require a SparkContext or any Spark dependencies)
Create your input record in JSON (not a DataFrame)
Score your record with MLeap
MLeap is well integrated with all the Pipeline Stages available in Spark MLlib (with the exception of LDA at the time of this writing). However, things might get a bit more complicated if you are using custom Estimators/Transformers.
Take a look at the MLeap FAQ for more info about custom transformers/estimators, performances, and integration.
You are comparing two rather different things. Apache Spark is a computation engine, while the Amazon and Microsoft solutions you mention are services. These services might well have Spark with MLlib behind the scenes. They save you the trouble of building a web service yourself, but you pay extra.
A number of companies, like Domino Data Lab, Cloudera or IBM, offer products that you can deploy on your own Spark cluster to easily build a service around your models (with various degrees of flexibility).
Naturally, you can build a service yourself with various open source tools. Which specifically? It all depends on what you are after. How should the user interact with the model? Should there be some sort of UI or just a REST API? Do you need to change some parameters of the model or the model itself? Are the jobs more of a batch or real-time nature? You can of course build an all-in-one solution, but that's going to be a huge effort.
My personal recommendation would be to take advantage, if you can, of one of the available services from Amazon, Google, Microsoft or whatever. Need on-premises deployment? Check Domino Data Lab, their product is mature and allows easy working with models (from building till deployment). Cloudera is more focused on cluster computing (including Spark), but it will take a while before they have something mature.
[EDIT] I'd recommend having a look at Apache PredictionIO, an open source machine learning server. It is an amazing project with lots of potential.
I have just been able to get this to work. Caveats: Python 3.6+, using the Spark ML API (not MLlib, but I'm fairly sure it should work the same way).
Basically, follow this example provided on MSFT's AzureML github.
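Word of warning: the code as-is will provision, but there is an error at the end of the example run() method. The snippet calls tolist() on the result of ",".join(preds), which is a plain string and has no such method: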
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
Should be:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
#result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
output = dict()
output['predictions'] = preds
return json.dumps(output)
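To make the difference concrete, here is a minimal standalone sketch of the corrected return path. It assumes each prediction row behaves like a mapping with a 'prediction' key; the helper name format_predictions and the sample rows are hypothetical stand-ins and are not part of the AzureML example.

```python
import json


def format_predictions(predictions):
    """Turn row-like mappings with a 'prediction' key into a JSON string."""
    # Get each scored result as a string, as in the corrected snippet
    preds = [str(row["prediction"]) for row in predictions]
    # A dict of plain strings is JSON-serializable, so json.dumps succeeds
    return json.dumps({"predictions": preds})


# Hypothetical stand-in for the rows collected from a scored Spark DataFrame
sample_rows = [{"prediction": 1.0}, {"prediction": 0.0}]
payload = format_predictions(sample_rows)

# The original snippet joined the predictions into one string and then called
# .tolist() on it; a Python str has no such method, hence the error
joined = ",".join(["1.0", "0.0"])
has_tolist = hasattr(joined, "tolist")
```

A caller can decode the payload with json.loads and read the 'predictions' list.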
Also, I completely agree with the answer recommending MLeap; it can make the process run much faster, but I thought I would answer the question specifically.
I would like to know exactly how the following works in the backend.
Scenario :
-> We get data from EdgeX Foundry in UTC and store it in Azure Document DB in the CST/CDT timezone
-> We trained an ML model on the data (with dates in the CST/CDT timezone) and deployed a web service.
So I have a few basic questions:
When web job hits our predictive webservice , will the trained ML model be run again?
Do we need to convert the UTC timezone for new incoming test data (which we want to predict) into CST/CDT timezone, as TimeStamp does matter for our prediction?
What happens in backend when predictive webservice API is called?
This is only based on my experience with Azure ML, but I think I can help with your questions.
When web job hits our predictive webservice, will the trained ML model be run again?
Yes, in the sense that it will call the predict (or similar) method on the model on the new data. For instance, in scikit-learn you would train your model using the fit method. Once the model is in production, only the predict method would be called.
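As a rough illustration of that split, here is a toy sketch in plain Python. MeanRegressor is a hypothetical stand-in that only imitates the fit/predict naming convention; it is not scikit-learn and not what Azure ML runs internally.

```python
class MeanRegressor:
    """Toy model that always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None
        self.fit_calls = 0  # counts how often training actually ran

    def fit(self, X, y):
        # Training: learn the only parameter this toy model has
        self.mean_ = sum(y) / len(y)
        self.fit_calls += 1
        return self

    def predict(self, X):
        # Scoring: reuse the learned parameter, no training involved
        if self.mean_ is None:
            raise RuntimeError("model has not been fitted")
        return [self.mean_ for _ in X]


# Training happens once, before deployment
model = MeanRegressor().fit([[1], [2], [3]], [10.0, 20.0, 30.0])


def score(rows):
    """Stand-in for the web service handler: it only calls predict."""
    return model.predict(rows)


first = score([[4]])
second = score([[5], [6]])
```

However many times score is called, fit_calls stays at 1, which is the sense in which the model is "run" but not retrained.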
It will also run the whole workflow you have set up to be deployed as the web service. In a workflow I've played around with before, the whole thing ran each time the web service was called with new data. This is like creating a Pipeline in scikit-learn.
Do we need to convert the UTC timezone for new incoming test data( which we want to predict) into CST/CDT timezone, as TimeStamp does matter for our prediction?
I would say yes, you would need to convert to the timezone that was used when training the model. This can be done by adding a step to your workflow, so that when you call the web service it does the necessary conversion for you before making a prediction.
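If you would rather convert on the client side before calling the service, a minimal sketch with Python's standard library could look like this. It assumes Python 3.9+ with timezone data available, and it assumes that "CST/CDT" corresponds to the IANA zone America/Chicago; the function name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the training data used US Central time (CST/CDT)
TRAINING_TZ = ZoneInfo("America/Chicago")


def to_training_timezone(utc_timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to the training timezone."""
    parsed = datetime.fromisoformat(utc_timestamp)
    if parsed.tzinfo is None:
        # Naive timestamps are treated as UTC, as in the scenario above
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(TRAINING_TZ)


summer = to_training_timezone("2021-07-01T12:00:00")  # falls in CDT (UTC-5)
winter = to_training_timezone("2021-01-15T12:00:00")  # falls in CST (UTC-6)
```

Using a named zone rather than a fixed offset lets the daylight saving switch between CST and CDT be handled automatically.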
What happens in backend when predictive webservice API is called?
I'm not sure if anyone knows for sure other than the folks at Microsoft, but for sure it will run the workflow you have set up.
I know it's not much, but I hope this helps or at least gets you on the right track for what you need.
I’ve started experimenting with Azure ML Studio and playing with the templates; I can upload data into it and immediately start working with it.
The problem is, I can’t seem to figure out how to tie these algorithms to real-time data. Can I define a data source as input, or can I configure Azure ML Studio so that it runs on data that I’ve specified?
Azure ML studio is for experimenting to find a proper solution to the problem set you have. You can upload data to sample, split and train your algorithms to obtain “trained models”. Once you feel comfortable with the results, you can turn that “training experiment” to a “Predictive Experiment”. From there on, your experiment will not be training but be predicting results based on user input.
To do so, you can publish the experiment as a web service. Once you’ve published it, you can find your web service under the web services tab and run samples with it. There’s a manual input dialog (the entry boxes depend on the features you were using in your data samples), some documentation, and REST API info for single-query and batch processing with the web service. Under batch you can even find sample code to connect to the published web service.
From there, any platform that can talk to a REST API can call the published web service and get the results.
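As a rough sketch of such a call from Python, the snippet below only builds the HTTP request and does not send it. The URL, API key, column names and the "Inputs"/"input1"/"ColumnNames"/"Values" body layout are assumptions for illustration; copy the real values and the exact request format from the REST API help page of your own web service.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a published scoring service."""
    # Assumed body layout; check your service's API help page for the real one
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical endpoint, key and feature columns
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["age", "income"],
    [["42", "55000"]],
)
# Sending it would be: urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real endpoint.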
Find below the article about converting from training to predictive experiments
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-walkthrough-5-publish-web-service/
Hope this helps!