Train multiple models with various measures and accumulate predictions - Azure

So I have been playing around with Azure ML lately, and I have one dataset with multiple values I want to predict. Each of them uses a different algorithm, and when I try to train multiple models within one experiment it says that the Train Model module "can only predict one value", and there are not enough input ports on the Train Model module to take in multiple values even if I were to use the same algorithm for each measure. I tried launching the column selector and making rules, but I get the same error as mentioned. How do I predict multiple values and later put the predicted columns together for the web service output, so I don't have to have multiple APIs?

What you want to do is train each model separately and save them as trained models.
So create a new experiment, train your models, and save them by right-clicking on each model; they will then show up in the left nav bar in the Studio. Now you can drag your saved models onto the canvas, have them score predictions, and eventually merge the results into the same output, as I have done in my example with the "Add columns" module. I made this example for Ronaldo (Real Madrid CF player), predicting how he will perform in a match after a training day. You can see my demo at http://ronaldoinform.azurewebsites.net
For a more detailed explanation of how to save the models and train multiple values, check out Raymond Langaeian's (MSFT) answer in the comment section of this link:
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-convert-training-experiment-to-scoring-experiment/

You have to train a model for each variable that you are going to predict. Then join all of those predicted columns together to get a single output for the web service.
The algorithms available in Azure ML are only capable of predicting a single variable at a time, based on the inputs they are given.
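Outside of the Studio canvas, the same idea can be illustrated in plain Python: train one model per target value, score each, and join the predicted columns side by side, which is what the "Add columns" module does in Studio. This is only a conceptual sketch; the data and column names below are made up.

```python
# One model per target value, predictions joined side by side at the end
# (plain scikit-learn/pandas sketch, not Azure ML Studio).
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data with two values to predict.
df = pd.DataFrame({
    "minutes_trained": [60, 75, 90, 45],
    "sleep_hours":     [7.5, 8.0, 6.5, 7.0],
    "goals":           [1, 2, 0, 1],           # target 1
    "km_covered":      [9.8, 10.4, 9.1, 9.5],  # target 2
})
features = df[["minutes_trained", "sleep_hours"]]

# Train a separate model for each target column.
models = {target: LinearRegression().fit(features, df[target])
          for target in ["goals", "km_covered"]}

# Score each model and put the predicted columns together in one output,
# the equivalent of merging scored datasets with "Add columns".
predictions = pd.DataFrame({f"scored_{t}": m.predict(features)
                            for t, m in models.items()})
print(predictions)
```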

Related

Darknet Yolov3 - Custom training on pre-trained model

The Darknet YOLOv3 model ships with a coco.names file for its labels, which includes 80 classes.
Now I want to train a custom model with only two labels, where one label is already in coco.names and the other is not.
For example, I want to train a model to detect cell phones and DSLR cameras; the cell phone class already exists in coco.names, whereas DSLR camera is not in its labels file.
So can I train a custom model with the two classes cell phone and DSLR camera, provide training data only for the DSLR camera, and have it predict both? Or should I train with data for both cell phone and DSLR images, or is there another way out?
I am a bit new to ML, so any help would be great.
Thanks
So you want to fine-tune a pre-trained model.
You need to think of classes as just being a set of end nodes of a network; the labels (phone, camera) are just a naming convention for them and give us visual guidance.
These nodes are fully connected (with associated weights) to the previous layer of the network; the total number of these intermediate connections varies depending on the number of end nodes (classes) you have.
With the fully trained model, you can't just select the nodes you want, throw out the rest, and add a few more, because the previous layer (and the full network) was trained to give estimates/predictions assuming a certain number of final nodes.
So basically you need to fully reset the last layer (the head) and restart it with the desired number of classes. The idea here is that you take advantage of the previous training effort on a broader dataset and fine-tune it to your own data.
Short answer: you need data for both classes, and you need to change the model to accept only 2 classes.
To configure that specific model for the new number of classes and data, I believe you can find some guidance and instructions here
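For Darknet specifically, the usual recipe (a sketch assuming the standard yolov3.cfg layout; double-check against the repo you use) is to set the class count in every [yolo] section and adjust the filters of the [convolutional] layer immediately before each of them:

```
# Excerpt of yolov3.cfg edited for 2 classes.
# filters = (classes + 5) * 3 = 21 in the [convolutional] layer
# directly before each of the three [yolo] layers.
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
# repeat this edit in all three [yolo] sections
classes=2
```

You would also point your .data file at a names file listing the two labels, one per line (e.g. "cell phone" and "dslr camera"), and then fine-tune from the pre-trained convolutional weights.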

When to normalize data with z-score (before or after the split)

I was taking a Udemy course which made a strong case for normalizing only the training data (after the split from the test data), since the model will typically be used on fresh data with features on the scale of the original set, and if you scale the test data, you are not scoring the model properly.
On the other hand, what I found was that my two-class logistic regression model (created with Azure Machine Learning Studio) was getting terrible results after Z-Score scaling only the train data.
a. Is this a problem only with Azure's tools?
b. What is a good rule of thumb for when feature data needs to be scaled (one, two, or three orders of magnitude in difference)?
Not scoring the model properly because the test set was normalized doesn't seem to make sense:
you would presumably also normalize any data that you use for predictions in the future.
I found this similar question on the Data Science Stack Exchange, and the top answer suggests not only that the test data has to be normalized, but that you need to apply the exact same scaling as you applied to the training data, because the scale of your data is also taken into account by your model: differently scaled test/prediction data could lead to over- or under-exaggeration of a feature.
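In scikit-learn terms (a generic illustration, not the Azure ML Studio module), that means fitting the scaler on the training split only and then reusing the same parameters on the test split and on any future prediction data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # apply the exact same scaling to test data

model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
print(model.score(X_test_scaled, y_test))
```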

Form Recognizer Labeling - Training model

I am trying to use Azure Form Recognizer with Labeling tool to train and extract text out of images.
As per the documentation:
First, make sure all the training documents are of the same format. If you have forms in multiple formats, organize them into subfolders based on common format. When you train, you'll need to direct the API to a subfolder. (https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool#set-up-input-data)
In my case I have images in different formats. I can create different projects, label the images, train them, and get the expected output.
The challenge in my case is that if I follow this approach, I need to create different projects, train them separately, and maintain several model IDs.
So I just wanted to know: is there any way to train different formats together as a single model? Basically, can we use a single model ID to extract key-value pairs from differently formatted images?
This is a feature that has been requested by a few customers. We are working on a solution for this and expect it to arrive in a few months. For now, we suggest you train models separately and maintain multiple model IDs.
If there are only a few different types (e.g., 2-4) and they are easily distinguishable, you can also try training them all together. For that to work, though, you'll need to label more files, and the results are still likely not to be as good as separate models.
To try that, put approximately the same number of images of each type in the same folder and label them all together.
If there are many different types, this is not likely to work.
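Until a combined model is available, keeping one model per format can be scripted. A minimal sketch with the azure-ai-formrecognizer Python SDK (v3.x); the endpoint, key, and SAS URLs below are placeholders:

```python
# Train one labeled model per form format and keep a lookup of model IDs.
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential

client = FormTrainingClient("https://<resource>.cognitiveservices.azure.com/",
                            AzureKeyCredential("<key>"))

# Each SAS URL points to a subfolder containing one common format.
format_folders = {
    "format_a": "<sas-url-to-subfolder-a>",
    "format_b": "<sas-url-to-subfolder-b>",
}

model_ids = {}
for name, sas_url in format_folders.items():
    poller = client.begin_training(sas_url, use_training_labels=True)
    model = poller.result()
    model_ids[name] = model.model_id  # one model ID per format

print(model_ids)
```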

Design consideration for "recurrent" neural network with Keras and Tensorflow

I am currently working on my degree project. The main goal of this project is to build an AI-aided inventory system. My current approach is to use a neural network to "teach" the system how to behave according to its current state (the current inventory) and a pseudorandom demand.
The inputs of the net should be the current inventory, and the network should output how much of a given product the company should order to minimize both sales losses due to lacking inventory and the inventory level itself (the model should not output crazy amounts, because keeping inventory costs money). The outputs from one run should be used to compute the inputs of the next run, based on the current inventory, the current demand, and the orders (the outputs themselves).
The problem I am facing is that I only have an initial state (the inventory at time 0) to train the model, and I do not know any way to set up Keras like this. Maybe someone could point me to a useful resource for this problem. To me it looks like I need to define a custom loss function and maybe add a custom layer to the system (I know how to define the function and the inventory update rule, but I don't know how to tell Keras what to do with them).
I don't think you need to create custom layers or custom losses.
IMO the steps you should follow are:
Create a dataset in which the time correlation between the inventory at time t and the inventory at time t - 1 is present.
Declare a lookback hyperparameter (how many past inventories do you want to analyse in a single sequence?).
Then you can feed this data to an RNN to solve the regression problem (for example, a common loss for regression is MSE).
Check this explanatory notebook written by François Chollet on how to work with RNNs.
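A minimal sketch of those steps with tf.keras; the inventory history here is random placeholder data, and the lookback, layer size, and epochs are arbitrary choices:

```python
import numpy as np
import tensorflow as tf

# Hypothetical inventory history: one row per day, one column per product.
history = np.random.rand(365, 5).astype("float32")

LOOKBACK = 30  # how many past days each training sequence contains

# Build (samples, LOOKBACK, features) sequences and next-day targets.
X, y = [], []
for t in range(LOOKBACK, len(history)):
    X.append(history[t - LOOKBACK:t])
    y.append(history[t])
X, y = np.stack(X), np.stack(y)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(LOOKBACK, history.shape[1])),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(history.shape[1]),  # regression output per product
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```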

Google Prediction API for FAQ/Recommendation system

I want to build an automated FAQ system where a user can ask questions and, based on the questions and their answers in the training data, the application suggests a set of answers.
Can this be achieved via the Prediction API?
If yes, how should I create my training data?
I have tested the Prediction API for sentiment analysis, but I have doubts and confusion about using it as an FAQ/recommendation system.
My training data has the following structure:
"Question":"How to create email account?"
"Answer":"Step1: xxxxxxxx Step2: xxxxxxxxxxxxx Step3: xxxxx xxx xxxxx"
"Question":"Who can view my contact list?"
"Answer":"xxxxxx xxxx xxxxxxxxxxxx x xxxxx xxx"
Train your model so that the question is the input and the answer is the output.
Then, when you send a question as input to predict, it can return your answer as output.
For a simple FAQ this will work well.
But if you have completed this in PHP, please help me out too.
In order to use the Prediction API, you must first train it against a set of training data. At the end of the training process, the Prediction API creates a model for your data set. Each model is either categorical (if the answer column is string) or regression (if the answer column is numeric). The model remains until you explicitly delete it. The model learns only from the original training session and any Update calls; it does not continue to learn from the Predict queries that you send to it.
Training data can be submitted in one of the following ways:
A comma-separated value (CSV) file. Each row is an example consisting of a collection of data plus an answer (a category or a value) for that example, as you saw in the two data examples above (a small CSV sketch is shown after this answer). All answers in a training file must be either categorical or numeric; you cannot mix the two. After uploading the training file, you will tell the Prediction API to train against it.
Training instances embedded directly into the request. The training instances can be embedded into the trainingInstances parameter. Note: due to limits on the size of an HTTP request, this would only work with small datasets (< 2 MB).
Via Update calls. First an empty model is trained by passing in empty storageDataLocation and trainingInstances parameters into an Insert call. Then, the training instances are passed in using the Update call to update the empty model. Note: since not all classifiers can be updated, this may result in lower model accuracy than batch training the model on the entire dataset.
You can find more information in this Help Center article.
NB: Google Prediction API client library for PHP is still in Beta.
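For the FAQ use case, a minimal sketch of the CSV training file described in the first option above would put the answer (the category to predict) in the first column and the question text in the second, with no header row; this reuses the placeholder data from the question:

```
"Step1: xxxxxxxx Step2: xxxxxxxxxxxxx Step3: xxxxx xxx xxxxx","How to create email account?"
"xxxxxx xxxx xxxxxxxxxxxx x xxxxx xxx","Who can view my contact list?"
```

A Predict call would then send a new question as input and receive the best-matching answer category back.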
