Google Prediction API - Training data syntax for multi-classification

I'm trying to use the Google Prediction API to classify my data. Each item in my DB can have multiple categories assigned to it.
For example: "My Nexus phone is rebooting constantly" could be assigned both the #Android and #troubleshooting tags.
I would like to upload my training data to Google, but I'm not sure how to apply both tags to the same content. The example I found shows the syntax for providing one category per piece of content, like so:
"Android" ,"My Nexus phone is rebooting constantly"
What is the right syntax for multi-classification training data?

Unless I'm misunderstanding your question, I think the answer is in the docs here.
Namely, the section about text strings explains that when you submit a text string, the system splits it into multiple strings, using whitespace as the delimiter. Their example is "Godzilla vs Mothra", which becomes "Godzilla", "vs", and "Mothra". So in your case, you could just use "Android troubleshooting". The system will split it into "Android" and "troubleshooting".

From the docs:
Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example:
"excited", "OMG! Just had a fabulous day!"
"annoying", "OMG! Just had a fabulous day!"
If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.
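Applied to the question's own example, the repeat-the-row approach from the docs could be generated with a small script. This is only a sketch: the helper names `to_training_rows` and `write_training_csv` are made up for illustration, and the label-first, fully quoted CSV layout is assumed from the samples quoted above rather than taken from any official tooling.

```python
import csv
import io


def to_training_rows(examples):
    """Expand (text, labels) pairs into one (label, text) row per label."""
    rows = []
    for text, labels in examples:
        for label in labels:
            rows.append((label, text))
    return rows


def write_training_csv(rows, stream):
    """Write rows as quoted CSV, label first, one label per line."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical input: each item carries every tag assigned to it.
examples = [
    ("My Nexus phone is rebooting constantly", ["Android", "troubleshooting"]),
]

rows = to_training_rows(examples)
buffer = io.StringIO()
write_training_csv(rows, buffer)
print(buffer.getvalue(), end="")
```

Each item ends up on as many lines as it has tags, which matches the "one label per line" constraint quoted from the docs.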

Related

Programmatically populate a multi-select field in SharePoint Online via the m365 spo tool

I am trying to create a list item with a multiselect field, according to the m365 spo documentation. It feels like I have tried all possible variations, but I cannot get it to work at all.
Is there any official guide as to the syntax for a multi-select value?
OK, for whoever bangs their head against this in the future, the format for the seeds of a multi-lookup field is the following:
Id;Value;#
Where ;# is the separator. You may use # as a wildcard for a value, ending up with the following example where I am adding the related entities with Id 3 and 5 to the seed:
3;#;#5;#
or one single entity:
1;#
or three entities:
2;#;#3;#;#5;#
There seems to be a tiny bit of tolerance on the trailing value, but I did not experiment much with this.
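If the seed strings need to be built from a list of Ids, the pattern in the examples above can be reproduced with a one-line helper. This is a sketch inferred only from those three examples (each entity written as `Id;#` with the value left empty, entities joined by `;#`); the function name `multi_lookup_seed` is hypothetical and nothing here comes from the tool's documentation.

```python
def multi_lookup_seed(ids):
    """Build a multi-lookup seed string such as '3;#;#5;#' from a list of Ids.

    Assumption: every entity is written as 'Id;#' (empty value) and
    entities are joined with the ';#' separator, as in the examples above.
    """
    return ";#".join(f"{item_id};#" for item_id in ids)


print(multi_lookup_seed([3, 5]))
print(multi_lookup_seed([1]))
print(multi_lookup_seed([2, 3, 5]))
```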
PS: it works, and I am very happy, but dear reader know this: if you feel the need for some eye-bleach after reading this, you are not alone!

Azure Form Recognizer does not identify any keys

I'm using the custom model API of Microsoft's Form Recognizer. I first tested it with the example from this link:
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
The problem I have now is that for any form other than the one in the example, the recognizer does not properly recognize any key-value pair.
E.G.
For the below form:
I get the response as:
None of the values is mapped to its key. E.g., for "Receiving Officer" the value should be "Ramon", but instead I get them as token_2 and token_5, which is information I cannot use.
It seems suspicious to me that this happens for every form I have tried apart from the example.
Can you please try training with labels by following this quickstart and see whether it extracts the values you need: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
Try-out site: https://fott.azurewebsites.net/
How did you get this response? It looks like the train-without-labels response, where text that is not associated with a key is output as tokens.

Approach for Text mining on a file and assigning category

I need help deciding on an algorithmic approach. The text is read line by line, and each line contains the description of an incident ticket. On reading each row, the program should assign a category to that incident using a set of keyword associations that has already been decided. For example, if the description contains words like "password(s)", it should be assigned the category "password issue".
Kindly help.
You can try bag-of-words or document vectors.
If there are spelling errors, you'll need fuzzy-matching techniques.
You'll want to remove stop words beforehand as well.
Good luck.
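A minimal sketch of how those suggestions might fit together, using only the Python standard library. The keyword map, the stop-word list, and the 0.8 similarity cutoff are placeholder assumptions to be replaced with the associations already decided; `difflib` stands in for whichever fuzzy-matching technique is actually chosen.

```python
import difflib
import re

# Hypothetical keyword-to-category associations; substitute the real ones.
CATEGORY_KEYWORDS = {
    "password issue": ["password", "passwords", "login"],
    "network issue": ["network", "vpn", "wifi"],
}

# Small illustrative stop-word list, not a complete one.
STOP_WORDS = {"the", "a", "an", "is", "my", "to", "i", "and", "of", "on"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def categorize(description, cutoff=0.8):
    """Return the first category with a keyword that fuzzily matches a token."""
    tokens = tokenize(description)
    for category, keywords in CATEGORY_KEYWORDS.items():
        for token in tokens:
            # get_close_matches tolerates small spelling errors in the token.
            if difflib.get_close_matches(token, keywords, n=1, cutoff=cutoff):
                return category
    return "uncategorized"


# Process the ticket descriptions one line at a time.
tickets = "I forgot my pasword again\nVPN keeps dropping\nPrinter out of paper"
for line in tickets.splitlines():
    print(f"{categorize(line)}: {line}")
```

Returning the first matching category keeps the sketch short; a ticket that should receive several categories would need the function to collect all matches instead.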

SentenceSplitter in GATE

I am trying to detect Sentences using GATE and more specifically using either ANNIE SentenceSplitter or RegexSentenceSplitter.
RegexSentenceSplitter seems to be working very well, however the only problem is that a new sentence annotation is being created at the beginning of each new page of the document. (The documents analysed are PDFs).
Is it possible to change this behavior of the RegexSentenceSplitter?
You could try using a conditional corpus pipeline. It lets you run a PR (here the RegexSentenceSplitter) or skip it, depending on the value of a feature on the document.
More details here: https://gate.ac.uk/sale/tao/splitch3.html#x6-480003.8.2

Can I use parameters with Twist concepts?

The Twist documentation for extracting concepts shows how multiple steps can be grouped into one step that contains those steps. For instance, the following eight fixtures
1. Start at the Maintain product catalog page.
2. The page title should be “Joe’s musical —Maintain Product Catalog.”
3. Click the Add New Instrument button.
4. The page title should be “Joe’s musical—Add New Musical Instrument.”
5. Enter text “Guitar” into the Instrument field.
6. Select “Slide” from the Type selection list.
7. Select “Dobro” from the Brand selection list.
8. Click the Save button.
can be condensed into one concept:
1. Add a New Musical Instrument “Guitar” of type “Slide” and brand “Dobro”
However, the tutorial doesn't say whether it's possible to use this concept with other parameters (perhaps with "Drum" instead of "Guitar"). It does clearly say that parameters in the concept name should be surrounded by quotes, but it also says they should match the parameter name, so it's not clear whether this is possible.
So can I use parameters with Twist concepts?
Yes! The documentation is really crummy about making this clear, but it is absolutely possible.
If you extracted a concept in the way that they described in the tutorial you referenced and others, then the fixture Add a New Musical Instrument “Guitar” of type “Slide” and brand “Dobro” actually contains three parameters named Guitar, Slide, and Dobro. What makes this so confusing is in each scenario you can change the value of each parameter to whatever you like (perhaps "Drum", "Snare", "Yamaha"), but under the hood, the variables are still called by their original names (thus Guitar=Drum, etc.) and these original names will appear as default values whenever you add the concept to a scenario.
To eliminate confusion, I recommend changing these default names. In this case, it might be Add a New Musical Instrument “Instrument” of type “Type” and brand “Brand”. Bizarrely, you can't rename the parameters via "Rephrase the Open Concept" because you run into a catch-22 situation. You can't change the name of the concept because it doesn't match the usage in the fixture. And you can't rename the fixtures because the parameters are bound to the concept name. So I recommend just opening it up in the text editor and making the change there.
So bottom line, the examples make it seem like you can't use parameters because the parameters wind up being named after whatever value you inputted. I recommend changing the default parameter names, but you have to do it in the text editor because Twist won't let you.
