We are trying to use the container preview of Form Recognizer, OCR, and the label tool, and have the following questions:
Is there any software that can help us classify similar kinds of documents? This would help us categorize documents and create a training dataset.
Is there any way to give a model a user-defined name? The following is output from the model query API; it is difficult to tie it back to the different kinds of models:
{
    "modelId": "f136f65b-bb94-493b-a798-a3e8023ea1b5",
    "status": "ready",
    "createdDateTime": "2020-05-06T21:35:58+00:00",
    "lastUpdatedDateTime": "2020-05-06T21:36:06+00:00"
}
I can see model files stored in \output\subscriptions\global\models, where /output is a directory shared with the container in the docker-compose file. Is it possible to import this model into new containers?
Models have a .json and a .gz file with the same name as the model ID.
I am also attaching the docker-compose file for your reference.
Is there a way to fine-tune or update the same custom model (same model ID) with more training data?
We were also trying the label tool, but it only takes Azure Blob storage as input. Is it possible to provide input the same way we do for training Form Recognizer?
We are struggling to get this set up, and if it is not resolved we might have to start looking at alternatives.
Following are answers to your questions:
To classify documents you can use Custom Vision to build a document classifier, or use OCR together with text classification. In addition, you can use Form Recognizer's train-without-labels mode: run it on the training data and use the cluster option within the model to group similar documents and pages in the training dataset.
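As a rough illustration, here is a minimal sketch of that flow, assuming the Form Recognizer v2.0 REST API and the Python requests library; the endpoint, key, and SAS URL are placeholders, and the exact response shape may differ between API versions:

import time
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
KEY = "<your-key>"                                                 # placeholder
SOURCE = "<SAS URL of your training container>"                    # placeholder
headers = {"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"}

# Start training without labels; the service clusters similar pages on its own.
resp = requests.post(
    f"{ENDPOINT}/formrecognizer/v2.0/custom/models",
    headers=headers,
    json={"source": SOURCE, "useLabelFile": False},
)
resp.raise_for_status()
model_url = resp.headers["Location"]  # URL of the newly created model

# Poll until training finishes, then look at the clusters of extracted keys.
while True:
    model = requests.get(f"{model_url}?includeKeys=true", headers=headers).json()
    if model["modelInfo"]["status"] != "creating":
        break
    time.sleep(5)
print(model.get("keys"))  # keys grouped per document cluster for unlabeled models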
A friendly model name is not yet available in Form Recognizer; it is a planned feature on our roadmap.
Models can't be copied between containers; you can use the same dataset to train a model in a different container. Models can be copied between subscriptions, resources, and regions when using the Form Recognizer cloud service.
Each training run creates a new model ID, so the previous model is not overwritten; you can't update existing models.
Form Recognizer v2.0 is not yet available in containers; only the Form Recognizer v1.0 release is currently available in containers. Form Recognizer v2.0 will also be available in containers shortly. With the container release, all data remains on premises, and the labeling tool, once available for the v2.0 container release, will also take a local or mounted disk as input rather than blob storage.
Thanks!
Neta - MSFT
Related
I'm using Azure ML through the web UI, running a time-series forecasting AutoML training job. In the Explanations tab for a model, how can I upload the actual data for the forecast period to compare against? See the red circled box in the image below.
We are currently developing test-set ingestion in the UI; however, there is currently no way to upload test data through the UI to populate these graphs. This experience can only be accessed by kicking off an explanation through the SDK with the test data. We refer to this as "Interpretability at inference time" and have some documentation on how to do this here: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-aml#interpretability-at-inference-time
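For reference, here is a minimal sketch of kicking off an explanation from the SDK against your own test data, assuming the interpret-community and azureml-interpret packages; model, x_train, x_test, and run are placeholders for your fitted AutoML model, training data, forecast-period data, and the parent run:

from interpret_community import TabularExplainer

explainer = TabularExplainer(model, x_train)       # model plus initialization data
explanation = explainer.explain_global(x_test)     # explain on the held-out test data
print(explanation.get_feature_importance_dict())   # aggregated feature importances

# Optionally upload the explanation so it shows up against the run in the studio
# (assuming the azureml-interpret package is installed):
# from azureml.interpret import ExplanationClient
# ExplanationClient.from_run(run).upload_model_explanation(explanation, comment="test-set explanation")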
Test-set ingestion is scoped to land in private preview before the end of June. Let's keep in touch to make sure you get early access.
Thanks,
Sabina
I'm using Custom Vision from Microsoft to classify images. Since the model will have to be retrained a few times a year, I would like to know if I can save the current version of the Azure Custom Vision model so that I can retrain my new model on the same version. I assume Microsoft will try to improve the performance of its service over time, so the model used by this tool will probably change.
You can export the model after each run, but you cannot use an existing exported model as a starting point for another training run.
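If it helps, here is a minimal sketch of exporting an iteration so you can archive each version yourself, assuming the azure-cognitiveservices-vision-customvision Python SDK; the endpoint, key, project ID, and iteration ID are placeholders:

import time
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from msrest.authentication import ApiKeyCredentials

credentials = ApiKeyCredentials(in_headers={"Training-key": "<training-key>"})
trainer = CustomVisionTrainingClient("<endpoint>", credentials)

# Request an export of a trained iteration (the project must use a compact domain).
trainer.export_iteration("<project-id>", "<iteration-id>", platform="ONNX")

# Poll until the export is ready, then download and version the file yourself.
while True:
    exports = trainer.get_exports("<project-id>", "<iteration-id>")
    if exports and exports[0].status == "Done":
        print("download:", exports[0].download_uri)
        break
    time.sleep(5)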
So yes, as it is a managed service, Microsoft might optimize or otherwise change the underlying training algorithms over time. It is up to you to decide whether that works for you. If not, a managed service like this is probably not something you should use; instead, train your own models entirely.
I am using Azure Cognitive Services, aka the CustomVision website, to create, train, and test models. I understand the main goal of this site is to create an API which can be called to run your model in production. I should mention I am using this for object detection.
There are times when you have to support running offline (meaning you don't have a connection to Azure, etc.). I believe Microsoft knows and understands this, because they have a feature which allows you to export your model in many different formats (such as TensorFlow, ONNX, etc.).
The issue I am having is specifically with the TensorFlow export, which is what I need: it only downloads the frozen model graph (model.pb). However, there are times when you need either the .pbtxt file that goes along with the model or the config file. I know you can generate a .pbtxt file, but for that you need the .config.
Also, there is little to no information about your model once you export it, such as what the input image size should be (for example, 300x300). I would like to see this better documented somewhere. Without the config or .pbtxt alongside the model, you have to figure this out by loading the model into TensorBoard or something similar to find the input information (size, name, etc.). Furthermore, we don't even know what the base of the model is: is it ResNet, SSD, etc.?
So, does anybody know how I can get these missing files when I export a model? Or does anybody know how to generate a .pbtxt when all you have is the frozen graph .pb file?
If not, I would recommend these as improvements to the Azure Cognitive Services team. With all of this missing information, it is really hard to consume the exported model.
Thanks!
Many model architectures allow you to change the network input size, such as YOLO, which is the architecture exported from Custom Vision. Including a fixed input size somewhere would not make much sense in this case.
Netron will be a good friend here and is pretty easy to use to figure out the details of the model.
Custom Vision Service only exports compact domains. For object detection exports there is code to load and run the object detection model in the downloaded zip file (model.pb, labels.txt). Along with the exported model you will find Python code to exercise it.
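If all you have is the frozen model.pb, a rough sketch like the one below (assuming TensorFlow 2.x with its v1 compatibility APIs) can dump a .pbtxt of the graph and recover the input node names and shapes, which answers the input-size question:

import tensorflow as tf

# Load the frozen graph definition from the exported model.pb.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# The graph definition is a protobuf, so it can be written out as text (.pbtxt) directly.
tf.io.write_graph(graph_def, ".", "model.pbtxt", as_text=True)

# Import the graph and look for placeholders; these are the model inputs,
# and their shapes tell you the expected input image size.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
for op in graph.get_operations():
    if op.type == "Placeholder":
        print("input:", op.name, op.outputs[0].shape)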
I'm evaluating tools for production ML-based applications, and one of our options is Spark MLlib, but I have some questions about how to serve a model once it's trained.
For example in Azure ML, once trained, the model is exposed as a web service which can be consumed from any application, and it's a similar case with Amazon ML.
How do you serve/deploy ML models in Apache Spark?
On the one hand, a machine learning model built with Spark can't be served in the traditional way you serve one in Azure ML or Amazon ML.
Databricks claims to be able to deploy models using its notebooks, but I haven't actually tried that yet.
On the other hand, you can use a model in three ways:
Training on the fly inside an application and then applying the prediction. This can be done in a Spark application or a notebook.
Train a model and save it if it implements an MLWriter, then load it in an application or a notebook and run it against your data (see the sketch after this list).
Train a model with Spark and export it to PMML format using jpmml-spark. PMML allows different statistical and data mining tools to speak the same language. This way, a predictive solution can easily be moved between tools and applications without custom coding, e.g. from Spark ML to R.
Those are the three possible ways.
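For the second option, here is a minimal PySpark sketch of saving a fitted PipelineModel and loading it back elsewhere; the data, columns, and path are just placeholders:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("train-and-save").getOrCreate()

train_df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 0.0), (4.0, 3.0, 1.0)],
    ["f1", "f2", "label"],
)

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(train_df)

# Persist the fitted pipeline; every stage that implements MLWritable is saved here.
model.write().overwrite().save("/tmp/my_pipeline_model")

# Later, in another application or notebook:
loaded = PipelineModel.load("/tmp/my_pipeline_model")
loaded.transform(train_df).select("prediction").show()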
Of course, you can also think of an architecture in which you have a RESTful service, which you could build using spark-jobserver, for example, to train and deploy, but that needs some development. It's not an out-of-the-box solution.
You might also use projects like Oryx 2 to create a full lambda architecture to train, deploy, and serve a model.
Unfortunately, describing each of the solutions mentioned above is quite broad and doesn't fit in the scope of SO.
One option is to use MLeap to serve a Spark PipelineModel online with no dependencies on Spark/SparkContext. Not having to use the SparkContext is important as it will drop scoring time for a single record from ~100ms to single-digit microseconds.
In order to use it, you have to:
Serialize your Spark Model with MLeap utilities
Load the model in MLeap (does not require a SparkContext or any Spark dependencies)
Create your input record in JSON (not a DataFrame)
Score your record with MLeap
MLeap is well integrated with all the Pipeline Stages available in Spark MLlib (with the exception of LDA at the time of this writing). However, things might get a bit more complicated if you are using custom Estimators/Transformers.
Take a look at the MLeap FAQ for more info about custom transformers/estimators, performances, and integration.
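As a rough sketch of the serialization step, assuming the MLeap PySpark bindings are installed; fitted_pipeline and df are placeholders for your own fitted PipelineModel and a sample DataFrame:

# Importing mleap.pyspark patches serializeToBundle onto Spark ML models.
import mleap.pyspark  # noqa: F401
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

# Serialize the fitted pipeline to an MLeap bundle; a transformed sample
# DataFrame is passed so MLeap can capture the schema.
fitted_pipeline.serializeToBundle(
    "jar:file:/tmp/pipeline-model.zip",
    fitted_pipeline.transform(df),
)
# The bundle can then be loaded and scored by the MLeap runtime (JVM or
# mleap-serving) without a SparkContext.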
You are comparing two rather different things. Apache Spark is a computation engine, while the Amazon and Microsoft solutions you mention are services. These services might well have Spark with MLlib behind the scenes. They save you the trouble of building a web service yourself, but you pay extra.
A number of companies, like Domino Data Lab, Cloudera, or IBM, offer products that you can deploy on your own Spark cluster and easily build a service around your models (with various degrees of flexibility).
Naturally, you can build a service yourself with various open source tools. Which ones specifically? It all depends on what you are after. How should users interact with the model? Should there be some sort of UI or just a REST API? Do you need to change some parameters of the model, or the model itself? Are the jobs more batch or real-time in nature? You can naturally build an all-in-one solution, but that is going to be a huge effort.
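To make the build-it-yourself option concrete, here is a minimal sketch of a tiny REST scoring service, assuming Flask and a PipelineModel saved as in the earlier sketch; the path and payload format are placeholders, and starting Spark inside the service is only reasonable for batch-ish workloads:

from flask import Flask, jsonify, request
from pyspark.ml import PipelineModel
from pyspark.sql import SparkSession

app = Flask(__name__)
spark = SparkSession.builder.appName("model-service").getOrCreate()
model = PipelineModel.load("/tmp/my_pipeline_model")  # placeholder path

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON list of records, e.g. [{"f1": 1.0, "f2": 2.0}, ...]
    records = request.get_json()
    df = spark.createDataFrame(records)
    preds = model.transform(df).select("prediction").collect()
    return jsonify([row["prediction"] for row in preds])

if __name__ == "__main__":
    app.run(port=8080)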
My personal recommendation would be to take advantage, if you can, of one of the available services from Amazon, Google, Microsoft, or whoever. Need an on-premises deployment? Check Domino Data Lab; their product is mature and allows easy working with models (from building to deployment). Cloudera is more focused on cluster computing (including Spark), but it will take a while before they have something mature.
[EDIT] I'd recommend having a look at Apache PredictionIO, an open source machine learning server. It's an amazing project with lots of potential.
I have been able to get this to work. Caveats: Python 3.6, using the Spark ML API (not MLlib, but I'm sure it works the same way).
Basically, follow this example provided in MSFT's AzureML GitHub.
Word of warning: the code as-is will provision, but there is an error in the example run() method at the end:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
Should be:
#Get each scored result
preds = [str(x['prediction']) for x in predictions]
#result = ",".join(preds)
# you can return any data type as long as it is JSON-serializable
output = dict()
output['predictions'] = preds
return json.dumps(output)  # requires `import json` at the top of the scoring script
Also, I completely agree with the MLeap assessment in the other answer; it can make the process run much faster, but I thought I would answer the question specifically.
I played around a bit with Azure ML Studio. As I understand it, the process goes like this:
a) Create a training experiment. Train it with data.
b) Create a scoring experiment. This will include the trained model from the training experiment. Expose this as a service to be consumed over REST.
Maybe a stupid question, but what is the recommended way to get the complete experience like the one I get when I use an app like https://datamarket.azure.com/dataset/amla/mba (Frequently Bought Together API built with Azure Machine Learning)?
I mean the following:
a) Expose two or more services: one to train the model and another to consume (test) the trained model.
b) The user periodically sends training data to train the model.
c) The trained model/models are then saved and available for consumption.
d) The user is now able to send a dataframe to get the predicted results.
Is there an additional wrapper that needs to be built?
If there is a link documenting this, please point me to it.
The Azure ML retraining API is designed to handle the workflow you describe:
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-retrain-models-programmatically/
Hope this helps,
Roope - Microsoft Azure ML Team
You need to take a look at Azure Data Factory.
I have written a Custom Activity to do the same, and used the logic to retrain the model inside the custom activity.