I can see how to view training data - particularly the unanswered questions, which are what I need. However, I would like to export this data for someone else (without access to the system) to review. Is there a way to export the training data to any format?
I just finished training a custom Azure Translator model with a set of 10,000 sentences. I now have the option to review the results and test the data. While I already get a good result score, I would like to continue training the same model with additional data sets before publishing. I can't find any information regarding this in the documentation.
The only remotely close option I can see is to duplicate the first model and add the new data sets but this would create a new model and not advance the original one.
Once the project is created, you can train different models on different datasets. Once a dataset is uploaded and a model has been trained on it, you cannot modify the content of the dataset or update it.
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/custom-translator/quickstart-build-deploy-custom-model
The above document can help you.
I have created a few models in ML and saved them for future use in predicting outcomes. This time I've run into a scenario that is probably common but new to me.
I need to provide this model to someone else to test it out on their dataset.
I had removed a few redundant columns from my training data, trained a regression model on it, and saved the model after validating it. However, when I give this model to someone to use on their dataset, how do I tell them which columns to drop? I could hard-code the column list in the Python file that loads the saved model, but that does not sound too neat.
What is the best way to do this in general? Kindly share some inputs.
You can simply use the pickle library to save the column list and other things along with the model. In the new session, use pickle again to load them back in.
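A minimal sketch of that idea (the model object, file name, and column names here are placeholders; the point is that the model and its metadata travel together in one file):

```python
import pickle

# Stand-in for the fitted estimator; in practice this would be the
# trained scikit-learn (or similar) regression model.
model = {"coef": [0.4, 1.3], "intercept": 0.1}

# Bundle the model with the column metadata the recipient needs,
# so nothing has to be communicated out of band.
bundle = {
    "model": model,
    "feature_columns": ["age", "income"],          # columns to keep, in order
    "dropped_columns": ["customer_id", "notes"],   # columns to remove first
}

with open("model_bundle.pkl", "wb") as f:
    pickle.dump(bundle, f)

# In the recipient's session:
with open("model_bundle.pkl", "rb") as f:
    loaded = pickle.load(f)

keep = loaded["feature_columns"]
# new_data = new_data[keep]  # e.g. with pandas, select only the expected columns
```

The recipient then only needs the one `.pkl` file, and their loading code can apply the stored column list mechanically instead of relying on instructions.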
I've been working with Dialogflow for several months now - really enjoy the tool. However, it's very important to me to know if my changes (in training) are making intent mapping accuracy better or worse over time. I've talked to numerous people about the best way to do this using Dialogflow and I've gotten at least 4 different answers. Can you help me out here? Here are the 4 answers I've received on how to properly train/test your agent. Please note that ALL of these answers involve generating a confusion matrix...
"Traditional" - randomly split all your input data into 80/20% - use the 80% for training and the 20% for testing. Every time you train your agent (because you've collected new input data), start the process all over again - meaning randomly split all your input data (old and new) into 80/20%. In this model, a piece of training data for one agent iteration might be used as a piece of test data on another iteration - and vice-versa. I've seen variations on this option (KFold and Monte Carlo).
"Golden Set" - similar to the above except that the initial 80/20% training and testing sets that you use to create your first agent continue to grow over time as you add more input data. In this model, once a piece of input data has been tagged as training data, it will NEVER be used as testing data - and once a piece of input data has been tagged as testing data, it will NEVER be used as training data. The 2 initial sets of training and testing data just continue to grow over time as new inputs are randomly split into the existing sets.
"All Data is Training Data - Except for Fake Testing Data" - In this model, we use all the input data as training data. We then copy a portion of the input data (20%) and "munge it" - meaning that we inject characters or words into the 20% portion - and use that as test data.
"All Data is Training Data - and Some of it is Testing Data Also" - In this model, we use all the input data as training data. We then copy a portion of the input data (20%) and use it as testing data. In other words, we are testing our agent using a portion of our (unmodified) training data. A variation on this option is to still use all your inputs as training data but sort your inputs by "popularity/usage" and take the top 20% for testing data.
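For reference, option #1 ("traditional") can be sketched in a few lines of Python. The utterances, intent labels, and the predicted-intent placeholder are illustrative; in a real evaluation the prediction would come from the agent (e.g. Dialogflow's detectIntent response):

```python
import random
from collections import Counter

# Labelled utterances: (phrase, intent). Repeated to simulate a
# dataset of 100 examples.
utterances = [
    ("what's my balance", "check_balance"),
    ("show account balance", "check_balance"),
    ("send money to alice", "transfer"),
    ("transfer $50 to bob", "transfer"),
    ("i lost my card", "report_lost_card"),
] * 20

# Random 80/20 split, redone from scratch each time new data arrives.
random.seed(42)
random.shuffle(utterances)
cut = int(0.8 * len(utterances))
train, test = utterances[:cut], utterances[cut:]

# Train the agent on `train` (in Dialogflow: upload as training phrases),
# then send each test phrase through the agent and tally a confusion
# matrix of (actual intent, predicted intent) pairs.
confusion = Counter()
for phrase, actual in test:
    predicted = actual  # placeholder: replace with the agent's predicted intent
    confusion[(actual, predicted)] += 1
```

Comparing the confusion matrix across agent iterations is what tells you whether your training changes are helping or hurting over time.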
If I were creating a bot from scratch, I'd simply go with option #1 above. However, I'm using an off-the-shelf product (Dialogflow) and it isn't clear to me that traditional testing is required. Golden Set seems like it will (mostly) get me to the same place as "traditional" so I don't have a big problem with it. Option #3 seems bad - creating fake testing data sounds problematic on many levels. And option #4 is using the same data to test as it uses to train - which scares me.
Anyway, would love to hear some thoughts on the above from the experts!
Background information:
I did a data-mining experiment where I used historical data of customer purchases as the case table for my mining structure. A second data set (prospective buyers) is used for testing.
Now I want to implement the same scenario in Azure Machine Learning (Studio). However, I could not figure out how to use one data set for training and a different data set for testing.
Furthermore, I'd like to ask whether it is possible to train the model on a data set but, after deploying the model as a web service, limit the input fields to certain columns.
The historical data set contains 12 columns that I want to use for training the model. However, I want only 9 of those columns to be required as input when testing via the deployed model.
I hope I made myself clear and that everything is understandable. If not, please ask me anything you want.
Kind regards,
lja
However, I could not figure out how to use one data set for training and a different data set for testing.
You can do it like this: connect the historical data set to the Train Model module, then connect the trained model output and the second (prospective buyers) data set to a Score Model module; no Split Data module is needed.
Please note both data sets should have the same columns!
The historical data set contains 12 columns that I want to use for training the model. However, I want only 9 of those columns to be required as input when testing via the deployed model.
The model (and the generated web service) needs the columns you trained with in order to feed the model. If those other 3 columns are not required, just leave them empty.
If you have your own application consuming the web service, you can ask only for the input fields you want and send empty values for the rest behind the scenes.
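As a sketch of that last point, assuming the classic ML Studio request-response JSON layout (the column names and values below are placeholders; check the sample request on your service's API help page): the application collects 9 fields and silently appends empty values for the other 3 before calling the service.

```python
import json

# The 9 fields your application actually asks the user for.
user_columns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"]
user_values = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9"]

# The 3 columns the model was trained on but the user never sees.
hidden_columns = ["c10", "c11", "c12"]

# Build the request body with empty strings padded in for the hidden columns.
body = {
    "Inputs": {
        "input1": {
            "ColumnNames": user_columns + hidden_columns,
            "Values": [user_values + [""] * len(hidden_columns)],
        }
    },
    "GlobalParameters": {},
}

payload = json.dumps(body)
# `payload` is what you would POST to the web service endpoint,
# along with the Authorization header carrying your API key.
```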
I want to build an automated FAQ system where a user can ask questions and, based on the questions and answers in the training data, the application suggests a set of answers.
Can this be achieved via Prediction API?
If yes, how should I create my training data?
I have tested the Prediction API for sentiment analysis, but I have doubts and confusion about using it as an FAQ/recommendation system.
My training data has following structure:
"Question":"How to create email account?"
"Answer":"Step1: xxxxxxxx Step2: xxxxxxxxxxxxx Step3: xxxxx xxx xxxxx"
"Question":"Who can view my contact list?"
"Answer":"xxxxxx xxxx xxxxxxxxxxxx x xxxxx xxx"
Train your model with the question as the input and the answer as the output. Then, when you send a question as input to Predict, it will return the corresponding answer. For a simple FAQ this works well. (And if you get this working in PHP, please help me out too.)
In order to use the Prediction API, you must first train it against a set of training data. At the end of the training process, the Prediction API creates a model for your data set. Each model is either categorical (if the answer column is a string) or regression (if the answer column is numeric). The model remains until you explicitly delete it. The model learns only from the original training session and any Update calls; it does not continue to learn from the Predict queries that you send to it.
Training data can be submitted in one of the following ways:
A comma-separated value (CSV) file. Each row is an example consisting of a collection of data plus an answer (a category or a value) for that example, as you saw in the two data examples above. All answers in a training file must be either categorical or numeric; you cannot mix the two. After uploading the training file, you will tell the Prediction API to train against it.
Training instances embedded directly into the request. The training instances can be embedded into the trainingInstances parameter. Note: due to limits on the size of an HTTP request, this would only work with small datasets (< 2 MB).
Via Update calls. First an empty model is trained by passing in empty storageDataLocation and trainingInstances parameters into an Insert call. Then, the training instances are passed in using the Update call to update the empty model. Note: since not all classifiers can be updated, this may result in lower model accuracy than batch training the model on the entire dataset.
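The three submission formats above can be sketched as follows. The model id, labels, and file name are illustrative placeholders; the field names (`trainingInstances`, `csvInstance`, `output`) follow the Prediction API reference:

```python
import csv
import json

# 1) CSV file: answer/label in the first column, no header row,
#    one example per row.
with open("faq_training.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["answer_email_setup", "How to create email account?"])
    writer.writerow(["answer_contact_privacy", "Who can view my contact list?"])

# 2) Embedded instances: the body of an Insert call carrying
#    trainingInstances directly (only for small datasets, < 2 MB).
insert_body = {
    "id": "faq-model",
    "trainingInstances": [
        {
            "output": "answer_email_setup",
            "csvInstance": ["How to create email account?"],
        },
    ],
}
payload = json.dumps(insert_body)

# 3) Update calls: an Insert with no storageDataLocation and no
#    trainingInstances creates an empty model; each Update call then
#    adds one training instance like this.
update_body = {
    "output": "answer_contact_privacy",
    "csvInstance": ["Who can view my contact list?"],
}
```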
You can have more information in this Help Center article.
NB: Google Prediction API client library for PHP is still in Beta.