How to add our own ML model into Stream.io or Stream Framework

We are researching Stream.io and the Stream Framework.
We want to build a high-volume feed with many producers (sources) that includes highly personal messages (private messages?).
To build this feed and make it relevant for all subscribers, we will need to use our own ML model for feed personalisation.
We found this as their solution for personalisation, but it might scale badly if we want to run and develop our own ML model:
https://go.getstream.io/knowledge/volumes-and-pricing/can-i
Questions:
1. How do we integrate/add our own ML model into a Getstream-io feed?
2. Should we move to the Stream Framework instead, and how do we connect our own ML model to that feed solution?
Thanks for pointing us in the right direction!

We have the ability to work with your team to incorporate ML models into Stream. The model has to be close to the data, otherwise lag is an issue. If you use the Stream Framework, you're working with Python and your own instance of Cassandra, which we stopped using because of performance and scalability issues. If you'd like to discuss options, you can reach out via a form on our site.
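One pattern that keeps the model close to your own data is to fetch the feed from Stream and re-rank it in your own service before rendering. A minimal sketch, assuming the official stream-python client; the feed slug and my_model.score() are hypothetical stand-ins for your own setup:

# Sketch: client-side re-ranking of a Stream feed with your own model.
# Assumes the official stream-python client; my_model is hypothetical.
import stream

client = stream.connect("YOUR_API_KEY", "YOUR_API_SECRET")
feed = client.feed("timeline", "user-42")

# Pull a page of candidate activities from Stream.
activities = feed.get(limit=100)["results"]

# Re-rank them with your own model before rendering; score() is assumed
# to return a per-activity relevance score for this viewer.
ranked = sorted(
    activities,
    key=lambda activity: my_model.score(user_id="user-42", activity=activity),
    reverse=True,
)

This keeps the personalisation logic fully under your control, at the cost of an extra hop at read time.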

Related

How to export a non PMML Algorithm for predicting the values in another JVM Process [duplicate]

I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained.
For example, in Azure ML, once trained, the model is exposed as a web service which can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark?
On the one hand, a machine learning model built with Spark can't be served the way you serve models in Azure ML or Amazon ML in a traditional manner.
Databricks claims to be able to deploy models using its notebooks, but I haven't actually tried that yet.
On the other hand, you can use a model in three ways:
Training on the fly inside an application and then applying the prediction. This can be done in a Spark application or a notebook.
Train a model and save it if it implements MLWriter, then load it in an application or a notebook and run it against your data (see the sketch after this list).
Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows different statistical and data mining tools to speak the same language; this way, a predictive solution can easily be moved among tools and applications without the need for custom coding, e.g. from Spark ML to R.
Those are the three possible ways.
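For the second way, the save/load round trip with the standard Spark ML persistence API looks roughly like this; train_df and new_df stand in for your own DataFrames, and the path is a placeholder:

# Minimal sketch of Spark ML persistence; stages and paths are placeholders.
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, lr])

model = pipeline.fit(train_df)     # train_df: your training DataFrame
model.save("/models/my_pipeline")  # possible because PipelineModel is MLWritable

# Later, in another application or notebook:
loaded = PipelineModel.load("/models/my_pipeline")
predictions = loaded.transform(new_df)  # new_df: the data to score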
Of course, you can think of an architecture with a RESTful service in front, which you could build using spark-jobserver for example, to train and deploy, but that needs some development; it's not an out-of-the-box solution.
You might also use projects like Oryx 2 to create your full lambda architecture to train, deploy and serve a model.
Unfortunately, describing each of the solutions mentioned above is quite broad and doesn't fit in the scope of SO.
One option is to use MLeap to serve a Spark PipelineModel online with no dependencies on Spark/SparkContext. Not having to use the SparkContext is important as it will drop scoring time for a single record from ~100ms to single-digit microseconds.
In order to use it, you have to:
Serialize your Spark Model with MLeap utilities
Load the model in MLeap (does not require a SparkContext or any Spark dependencies)
Create your input record in JSON (not a DataFrame)
Score your record with MLeap
MLeap is well integrated with all the Pipeline Stages available in Spark MLlib (with the exception of LDA at the time of this writing). However, things might get a bit more complicated if you are using custom Estimators/Transformers.
Take a look at the MLeap FAQ for more info about custom transformers/estimators, performances, and integration.
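For reference, the export step on the PySpark side looks roughly like this, following the MLeap docs; the fitted pipeline, sample DataFrame, and bundle path are placeholders, and scoring the bundle is then done from the JVM mleap-runtime without a SparkContext:

# Sketch of an MLeap bundle export from PySpark; names are placeholders.
# Requires the mleap Python package plus matching mleap-spark jars.
import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer  # adds serializeToBundle

# fitted_pipeline: a trained PipelineModel; sample_df: data with the same schema
fitted_pipeline.serializeToBundle(
    "jar:file:/tmp/my_pipeline.zip",
    fitted_pipeline.transform(sample_df),
)
# The bundle is then loaded and scored by the JVM mleap-runtime, with no
# SparkContext involved, which is where the latency win comes from.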
You are comparing two rather different things. Apache Spark is a computation engine, while the Amazon and Microsoft solutions you mention are services. These services might well have Spark with MLlib behind the scenes; they save you the trouble of building a web service yourself, but you pay extra.
A number of companies, like Domino Data Lab, Cloudera or IBM, offer products that you can deploy on your own Spark cluster and easily build services around your models (with various degrees of flexibility).
Naturally, you can build a service yourself with various open source tools. Which specifically? It all depends on what you are after. How should the user interact with the model? Should there be some sort of UI or just a REST API? Do you need to change some parameters on the model, or the model itself? Are the jobs more of a batch or real-time nature? You can naturally build an all-in-one solution, but that's going to be a huge effort; a minimal REST sketch follows below.
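To make the do-it-yourself option concrete, here is a minimal sketch of a REST scoring endpoint using Flask and a previously saved Spark pipeline. The model path and input schema are placeholders, and a real service would add validation, batching and error handling:

# Minimal REST scoring sketch (Flask + a saved Spark pipeline); illustrative only.
from flask import Flask, jsonify, request
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession

app = Flask(__name__)
spark = SparkSession.builder.appName("scoring-service").getOrCreate()
model = PipelineModel.load("/models/my_pipeline")  # placeholder path

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON list of records, e.g. [{"f1": 1.0, "f2": 2.0}]
    records = request.get_json()
    df = spark.createDataFrame(records)
    preds = model.transform(df).select("prediction").collect()
    return jsonify([row["prediction"] for row in preds])

if __name__ == "__main__":
    app.run(port=8080)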
My personal recommendation would be to take advantage, if you can, of one of the available services from Amazon, Google, Microsoft or whoever. Need an on-premises deployment? Check Domino Data Lab; their product is mature and allows easy working with models (from building to deployment). Cloudera is more focused on cluster computing (including Spark), but it will take a while before they have something mature.
[EDIT] I'd recommend having a look at Apache PredictionIO, an open source machine learning server; an amazing project with lots of potential.
I have been able to get this to work. Caveats: Python 3.6, using the Spark ML API (not MLlib, but it should work the same way).
Basically, follow this example provided on MSFT's AzureML GitHub.
Word of warning: the code as-is will provision, but there is an error in the example run() method at the end (result is a plain string, which has no tolist() method):
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
Should be:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
#result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
# (note: this requires `import json` at the top of the scoring script)
output = dict()
output['predictions'] = preds
return json.dumps(output)
Also, I completely agree with the assessment in the MLeap answer; it can make the process run much faster. But I thought I would answer the question specifically.

How to implement a BOT engine like WIT.AI for an on-premise solution?

I want to build a chatbot for a customer service application. I tried SaaS services like Wit.Ai, Motion.Ai, Api.Ai, LUIS.ai, etc. These cognitive services find the "intent" and "entities" when trained with typical interaction examples.
I need to build the chatbot as an on-premise solution, without using any of these SaaS services.
e.g. a typical conversation would be as follows -
Can you book me a ticket?
Is my ticket booked?
What is the status of my booking BK02?
I want to cancel the booking BK02.
Book the tickets
The Stanford NLP toolkit looks promising, but there are licensing constraints, hence I started experimenting with OpenNLP. I assume there are two OpenNLP tasks involved -
Use the 'Document Categorizer' to find out the intent
Use 'Named Entity Recognition' to find out the entities
Once the context is identified, I will call my application APIs to build the response.
Is this the right approach?
How good is OpenNLP at parsing text?
Can I use Facebook's fastText library for intent identification?
Is there any other open source library which could be helpful in building the bot?
Will "SyntaxNet" be useful for my adventure?
I prefer to do this in Java, but I'm open to Node or Python solutions too.
PS - I am new to NLP.
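(Regarding the fastText question above: intent identification with fastText takes only a few lines. A minimal sketch, assuming the fasttext Python package and a hypothetical intents.txt training file with one __label__-prefixed example per line:)

# Sketch: intent classification with fastText; intents.txt is hypothetical.
# Each training line looks like: __label__book_ticket Can you book me a ticket?
import fasttext

model = fasttext.train_supervised(input="intents.txt")
labels, probs = model.predict("What is the status of my booking BK02?")
print(labels[0], probs[0])  # e.g. __label__booking_status 0.93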
Have a look at this. It describes itself as open-source language understanding for bots and a drop-in replacement for popular NLP tools like wit.ai, api.ai or LUIS:
https://rasa.ai/
Have a look at my other answer for a plan of attack when using Luis.ai:
Creating an API for LUIS.AI or using .JSON files in order to train the bot for non-technical users
In short, use Luis.ai and set up some intents; start with one or two and train it based on your domain. I am using ASP.NET to call the Cognitive Service API as outlined above, then customizing the response via some jQuery... You could search a list of your rules in a JavaScript array when each intent or action is raised by the response from Luis.
If your bot is English-based, then I would use OpenNLP's sentence parser to dump the customer input into a database (I do this today). I then use the OpenNLP tokenizer and push the keywords (less the stop words) and parts of speech into a database table for keyword analysis. I have a custom sentiment model built for OpenNLP that tags each sentence with a positive, negative, or neutral sentiment... You can then use this to identify negative customer service feedback. To build your own sentiment model, have a look at SentiWordNet and download their domain-agnostic data file to build and train an OpenNLP model, or have a look at this Node version...
https://www.npmjs.com/package/sentiword
Hope that helps.
I'd definitely recommend Rasa, it's great for your use case, working on-premise easily, handling intents and entities for you and on top of that it has a friendly community too.
Check out my repo for an example of how to build a chatbot with Rasa that interacts with a simple database: https://github.com/nmstoker/lockebot
I tried Rasa, but one glitch I found was its inability to answer unmatched/untrained user texts.
Now I'm using ChatterBot, and I'm totally in love with it.
Use ChatterBot, and host it locally using flask-chatterbot-master.
Links:
ChatterBot Installation: https://chatterbot.readthedocs.io/en/stable/setup.html
Host Locally using - flask-chatterbot-master: https://github.com/chamkank/flask-chatterbot
Cheers,
Ratnakar
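For reference, a first ChatterBot answers locally in just a few lines. A minimal sketch (the training phrases are placeholders, and the trainer API differs slightly across ChatterBot versions):

# Minimal ChatterBot sketch; training phrases are placeholders.
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

bot = ChatBot("SupportBot")
trainer = ListTrainer(bot)
trainer.train([
    "Can you book me a ticket?",
    "Sure, which date would you like to travel?",
    "What is the status of my booking BK02?",
    "Let me check booking BK02 for you.",
])

print(bot.get_response("Is my ticket booked?"))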
With the help of the Rasa and Botkit frameworks, we can build an on-premise chatbot and NLP engine for any channel. Please follow this link for end-to-end steps on building one; an awesome blog that helped me create one for my office:
https://creospiders.blogspot.com/2018/03/complete-on-premise-and-fully.html
First of all, any chatbot is a program that runs alongside an NLP engine; it's the NLP that brings the knowledge to the chatbot, and the NLP in turn rests on machine learning techniques.
There are a few reasons why on-premise chatbots are less common:
We need to build the infrastructure
We need to train the model often
But using a cloud-based NLP service may not provide data privacy and security, and the flexibility to include your own business logic is very limited.
Altogether, going on-premise or to the cloud depends on the needs of your use case and requirements.
However, please refer to these links for end-to-end knowledge on building a fully customisable chatbot on-premise, easily and in very few steps:
Complete On-Premise and Fully Customisable Chat Bot - Part 1 - Overview
Complete On-Premise and Fully Customisable Chat Bot - Part 2 - Agent Building Using Botkit
Complete On-Premise and Fully Customisable Chat Bot - Part 3 - Communicating to the Agent that has been built
Complete On-Premise and Fully Customisable Chat Bot - Part 4 - Integrating the Natural Language Processor NLP
Disclaimer: I am the author of this package.
Abodit NLP (https://nlp.abodit.com) can do what you want, but it's .NET-only at present.
In particular, you can easily connect it to databases and can provide custom tokens that are queries against a database. It's all strongly typed, and adding new rules is as easy as adding a method in C#.
It's also particularly adept at turning date-time expressions into queries. For example, "next month on a Thursday after 4pm" becomes ((((DatePart(year,[DATEFIELD])=2019) AND (DatePart(month,[DATEFIELD])=7)) AND (DatePart(dw,[DATEFIELD])=4)) AND DatePart(hour,[DATEFIELD])>=16)

Using real time data with Azure machine learning studio?

I've started experimenting with Azure ML Studio: playing with templates, uploading data into it, and immediately starting to work with it.
The problem is, I can't seem to figure out how to tie these algorithms to real-time data. Can I define a data source as input, or can I configure Azure ML Studio in a way that it runs on data that I've specified?
Azure ML Studio is for experimenting to find a proper solution to the problem set you have. You can upload data to sample, split, and train your algorithms to obtain "trained models". Once you feel comfortable with the results, you can turn that "training experiment" into a "predictive experiment". From there on, your experiment will not be training but predicting results based on user input.
To do so, you can publish the experiment as a web service. Once you've published the web service, you can find it under the web services tab and run samples with it. There's a manual input dialog (the entry boxes here depend on the features you were using in your data samples), some documentation, and REST API info for single-query and batch query processing with the web service. Under batch you can even find sample code to connect to the published web service.
From there on, any platform that can talk to a REST API can call the published web service and get the results.
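As an illustration, calling a published request-response endpoint from Python looks roughly like this; the URL, API key, and column names are placeholders for the values shown on your own web service page:

# Sketch: calling an Azure ML Studio request-response endpoint.
# URL, key, and column names are placeholders from your web service page.
import json
import urllib.request

url = "https://ussouthcentral.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0"
api_key = "<your-api-key>"
body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["feature1", "feature2"],
            "Values": [["1.0", "2.0"]],
        }
    },
    "GlobalParameters": {},
}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    },
)
print(urllib.request.urlopen(req).read().decode("utf-8"))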
Find below the article about converting from training to predictive experiments
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-walkthrough-5-publish-web-service/
Hope this helps!

Azure ML App - Complete Experince - Train automatically and Consume

I played a bit around with Azure ML studio. So as I understand the process goes like this:
a) Create training experiment. Train it with data.
b) Create Scoring experiment. This will include the trained model from the training experiment. Expose this as a service to be consumed over REST.
Maybe a stupid question, but what is the recommended way to get the complete experience like the one I get when I use an app like https://datamarket.azure.com/dataset/amla/mba (Frequently Bought Together API built with Azure Machine Learning)?
I mean the following:
a) Expose 2 or more services - one to train the model and the other to consume (test) the trained model.
b) User periodically sends training data to train the model
c) The trained model/models now get saved and become available for consumption
d) User is now able to send a dataframe to get the predicted results.
Is there an additional wrapper that needs to be built?
If there is a link documenting this please point me to the same.
The Azure ML retraining API is designed to handle the workflow you describe:
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-retrain-models-programmatically/
Hope this helps,
Roope - Microsoft Azure ML Team
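(For orientation, the retraining flow in that article boils down to running the retraining web service on new data and then PATCHing the scoring endpoint so it picks up the new .ilearner model. A rough sketch of the PATCH step, where every URL, name and token is a placeholder for the values from your own workspace:)

# Rough sketch of updating a scoring endpoint after retraining (classic Azure ML).
# All names, URLs and tokens below are placeholders.
import json
import urllib.request

def patch_endpoint(endpoint_url, api_key, base_location, relative_location, sas_token):
    # Point the endpoint's trained-model resource at the retrained .ilearner blob.
    body = {
        "Resources": [{
            "Name": "Trained Model",  # the model resource name from your experiment
            "Location": {
                "BaseLocation": base_location,
                "RelativeLocation": relative_location,
                "SasBlobToken": sas_token,
            },
        }]
    }
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="PATCH",
    )
    return urllib.request.urlopen(req).read()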
You need to take a look at Azure Data Factory. I have written a Custom Activity to do the same, and used the logic to retrain the model in the custom activity.
