How is #sys.ignore assigned to Dialogflow entity - dialogflow-es

In the exported JSON files of my Dialogflow agent I noticed that a training phrase is split into two JSON objects, and one of them has a #sys.ignore meta property. I know that an entity can be defined as a system entity starting with #sys., but I don't know what #sys.ignore is or how it can be assigned or unassigned.
Example of intent with training phrase split:
"data": [
{
"text": "cleaning of ",
"userDefined": false
},
{
"text": "machine part",
"meta": "#sys.ignore",
"userDefined": false
}
I also checked here
https://dialogflow.com/docs/reference/system-entities
and here
https://cloud.google.com/dialogflow-enterprise/docs/reference/system-entities
but no luck

I had the same query and followed up with the Dialogflow team about it; here is the response from them:
#sys.ignore is used to ignore matches from the ML with entities.
#sys.ignore may have been added while you were editing your training
phrases and removing a highlighted phrase or word.
So, #sys.ignore forces Dialogflow to prevent that text from getting matched to any of the entities.
In my experience, it is generally added when Dialogflow annotates some entity in the training phrases and I manually remove the annotation.
Hope it helps.
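If you want to audit which training phrase parts carry #sys.ignore in an exported agent, a short script can list them. This is a minimal sketch, assuming the export's intents folder and the usual IntentName_usersays_xx.json file naming; adjust the path and glob to your actual export layout.

import json
from pathlib import Path

def find_ignored_parts(obj, results):
    # Recursively collect training phrase parts whose meta is #sys.ignore.
    if isinstance(obj, dict):
        if obj.get("meta") == "#sys.ignore":
            results.append(obj.get("text", ""))
        for value in obj.values():
            find_ignored_parts(value, results)
    elif isinstance(obj, list):
        for item in obj:
            find_ignored_parts(item, results)

# "intents" and the file pattern below are assumptions about the export layout.
for path in Path("intents").glob("*_usersays_*.json"):
    parts = []
    find_ignored_parts(json.loads(path.read_text(encoding="utf-8")), parts)
    if parts:
        print(path.name, "->", parts)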

Related

Entity with excluded values in Dialogflow

I want to create an entity that matches any value except the values that are defined in another entity.
For example, I have an entity that contains all the possible product categories that I use in the bot, and if the user types a value that is not in that entity I want to react in some way.
It's like a fallback, but only triggered when that condition is met.
Any suggestion?
Entity extraction is based on some definite value that can be identified and separated. There should be some basic features defined for the agent to train on. Based on these trained features, the agent will look for an entity and extract it from the user's response.
If you have already defined an entity to look for, it will be extracted by Dialogflow based on the training data. If there is nothing defined, it will not be identified as an entity, as the agent will not be sure what to look for.
So, what you can do is:
1. Mark the (already defined) entity as not required: uncheck the "required" checkbox in Dialogflow.
2. Add "#sys.any" to the entity you defined and make it a composite entity that combines your entity and "#sys.any".
3. Train your agent to look for this new entity with both your basic entity data and "anything else" data.
4. Collect this in the webhook.
OR
When you want to collect anything else, you can take the user utterance from the agent object and parse it with a regex pattern of your choice.
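For that second option, here is a minimal webhook sketch. It assumes a Python/Flask fulfillment and the Dialogflow ES (v2) webhook request format, where the raw utterance arrives in queryResult.queryText; the route name and the regex are placeholders for whatever you actually need to capture.

import re
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder pattern: capture whatever follows "category" as the free-form value.
ANYTHING_ELSE = re.compile(r"category\s+(.+)", re.IGNORECASE)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(silent=True) or {}
    # In the Dialogflow ES (v2) webhook request, the raw utterance is queryResult.queryText.
    utterance = body.get("queryResult", {}).get("queryText", "")
    match = ANYTHING_ELSE.search(utterance)
    captured = match.group(1) if match else None
    reply = f"Got it, you said: {captured}" if captured else "Sorry, I did not catch that."
    return jsonify({"fulfillmentText": reply})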

Dialogflow ambiguity with same synonyms for different entity values

I have an issue developing an agent with Dialogflow (api.ai). I am using a lot of entity values which are all different from one another; however, there are similar synonyms for some entity values and the agent is returning only one value.
How can I get all the possible matches, or ask a question to resolve the ambiguity?
For example, I have an intent like: tell me the location of ABC express train
If my entity values are:
Value    Synonym
15127    ABC express
12345    ABC express
I want it to return both values or ask a question to resolve such ambiguity.
How can I work this out?
Thanks in advance.
If you enable fulfillment for this intent, you can take a look at the value the user said and ask a further question if you need to disambiguate between entities.
Let's imagine you are extracting an entity called "trains". The parameters table in your intent might look like this:
By default, if the user says ABC express, the fulfillment webhook will be called with the following parameter hash:
"parameters": {
"trains": "15127"
}
This isn't enough information to decide if the request was ambiguous, since train 15127 might also have non-ambiguous synonyms.
You can configure Dialogflow to send the original text of the entity, alongside the resolved value. This means you will receive the following information to your webhook:
"parameters": {
"trains": "15127",
"original": "ABC express"
}
You can then use some simple logic to ask a further question if the value of original appears in a list of known ambiguous synonyms.
To have Dialogflow send this data, modify your parameters table so it looks like the following:
This will cause the original synonym to be sent to Dialogflow alongside the resolved value.
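Putting that logic together, here is a minimal fulfillment sketch. The parameter names trains and original come from the answer above; the Flask route, the v2 webhook request shape (queryResult.parameters), and the ambiguous-synonym map are assumptions for illustration.

from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical map of synonyms that resolve to more than one train number.
AMBIGUOUS_SYNONYMS = {
    "abc express": ["15127", "12345"],
}

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(silent=True) or {}
    params = body.get("queryResult", {}).get("parameters", {})
    train = params.get("trains", "")
    original = params.get("original", "").lower()

    candidates = AMBIGUOUS_SYNONYMS.get(original, [])
    if len(candidates) > 1:
        # Ambiguous synonym: ask the user to pick a specific train number.
        options = " or ".join(candidates)
        return jsonify({"fulfillmentText": f"Did you mean train {options}?"})

    return jsonify({"fulfillmentText": f"Looking up the location of train {train}."})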

Discovery does not allow nullable date field to be indexed

I am trying to index JSON data in Discovery. The issue comes with date fields: it seems that Discovery is inferring data types, and in my case these date fields might be empty. Is there a way to override this data type detection in Discovery and have it treat the field simply as a String while indexing? Please clarify.
Soumitra
What you can do (well, assuming you have enough control over the JSON), is omit the date field for documents which have no date. For example these two documents will work together in a single Discovery collection.
{
"title": "Document With Date",
"text": "Discovery detects date types to support range queries, sorting and more.",
"updated": "2018-04-26T10:11:12Z"
}
{
"title": "Undated Document",
"text": "Discovery has no trouble with fields that appear in some documents and not others."
}
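If you preprocess the documents before uploading, dropping the empty date fields is a one-liner per document. A minimal sketch, assuming the date field is called updated as in the example above and that you control the JSON before it reaches Discovery.

import json

def strip_empty_dates(doc, date_fields=("updated",)):
    # Return a copy of the document without empty or null date fields.
    cleaned = dict(doc)
    for field in date_fields:
        if not cleaned.get(field):  # covers "", None, and a missing key
            cleaned.pop(field, None)
    return cleaned

docs = [
    {"title": "Document With Date", "text": "some text", "updated": "2018-04-26T10:11:12Z"},
    {"title": "Undated Document", "text": "some text", "updated": ""},
]

for doc in docs:
    print(json.dumps(strip_empty_dates(doc)))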

LUIS pattern feature does not detect my entity

I want to detect an entity from an intent, that might be written in several ways:
AB 123456
AB 123 456
AB123456
AB is an option from a closed list, and the rest should be detected as number(s). As long as there is a space between AB and the number, the list item and the number(s) are detected/resolved.
I discovered the pattern feature and hoped this could help out the parser.
Unfortunately, the entity can still not be recognized. I tried to write the pattern in different ways, with no success.
^([A-Za-z]{1,2})([0-9 ]+)$
([a-z]{2})([\d ]+)
[a-z]{2}[\d ]+
Any idea?
AB is an option from a closed list, and the rest should be detected as number(s). As long as there is a space between AB and the number, the list item and the number(s) are detected/resolved.
If "AB" is from a closed list entity, then you should create a simple entity for the numbers and a composite entity to hold the two together. The simple entity would need to be trained with a few utterances, e.g. "123456", "123 789", "456789", "201731" and then the model (with the help of a RegExp pattern for your pattern feature, [\d]{6}) should be able to handle the rest.
You would employ your list entity and the newly created simple entity as children of a composite entity, named something like "Product" (or whatever fits your domain better).
I think you might be assuming that a closed list entity is machine-learned, but it is not. It is matched directly against utterances, so if your list entity had the following:
canonicalForm: "ProductId"
synonyms: "AB", "BA", "AB 123456"
And the utterance "BA 123456" was processed by the model, the LUIS model would recognize "BA" as a "ProductId" and not recognize "123456" at all. By extension, the entire utterance would not be recognized as one "ProductId".
The problem only exists when there is no whitespace. If I train with "AB123456" I can't label just "AB". I'm only able to select the whole word.
LUIS only analyzes the text and helps you get the entities and context you need for your work. It will not perform a programming activity like reading a certain portion of the string and assigning it to an entity... at least not yet.
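Since the splitting has to happen in your own code, a small regex after the LUIS call can do it. A minimal sketch, assuming the prefix is always two letters from your closed list and the rest is digits with optional spaces; the function name is just an illustration.

import re

# Two-letter prefix, then digits that may contain spaces.
PATTERN = re.compile(r"^\s*([A-Za-z]{2})\s*([\d ]+)\s*$")

def split_product_id(utterance):
    match = PATTERN.match(utterance)
    if not match:
        return None
    prefix = match.group(1).upper()
    number = match.group(2).replace(" ", "")
    return prefix, number

for text in ["AB 123456", "AB 123 456", "AB123456"]:
    print(text, "->", split_product_id(text))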

Azure Search: Searching for singular version of a word, but still include plural version in results

I have a question about a peculiar behavior I noticed in my custom analyzer (as well as in the fr.microsoft analyzer). The below Analyze API tests are shown using the “fr.microsoft” analyzer, but I saw the same exact behavior when I use my “text_contains_search_custom_analyzer” custom analyzer (which makes sense as I base it off the fr.microsoft analyzer).
UAT reported that when they search for “femme” (singular) they expect documents with “femmes” (plural) to also be found. But when I tested with the Analyze API, it appears that the Azure Search service only tokenizes plural -> plural + singular, but when tokenizing singular, only singular tokens are used. See below for examples.
Is there a way I can allow a user to search for the singular version of a word, but still include the plural version of that word in the search results? Or will I need to use synonyms to overcome this issue?
Request with “femme”
{
"analyzer": "fr.microsoft",
"text": "femme"
}
Response from “femme”
{
"#odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
"tokens": [
{
"token": "femme",
"startOffset": 0,
"endOffset": 5,
"position": 0
}
]
}
Request with “femmes”
{
"analyzer": "fr.microsoft",
"text": "femmes"
}
Response from “femmes”
{
"#odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
"tokens": [
{
"token": "femme",
"startOffset": 0,
"endOffset": 6,
"position": 0
},
{
"token": "femmes",
"startOffset": 0,
"endOffset": 6,
"position": 0
}
]
}
You are using the Analyze API, which uses text analyzers; that is not the same as searching using the Search API.
Text analyzers are what support the search engine when building the indexes, which are really what sits at the bottom of a search engine. In order to structure a search index, the documents that go in there need to be analyzed; this is where the analyzers come in. They are the ones that can understand different languages and can parse a text and make sense of it, i.e. splitting up words, removing stop words, understanding sentences and so on. Or as they put it in the docs: https://learn.microsoft.com/en-us/rest/api/searchservice/language-support
Searchable fields undergo analysis that most frequently involves word-breaking, text normalization, and filtering out terms. By default, searchable fields in Azure Search are analyzed with the Apache Lucene Standard analyzer (standard lucene) which breaks text into elements following the "Unicode Text Segmentation" rules. Additionally, the standard analyzer converts all characters to their lower case form.
So what you are seeing is actually perfectly right: the French analyzer breaks down the word you send in and returns possible tokens from the text. For the first text it cannot find any other possible tokens than 'femme' (I guess there are no other words like 'fem' or 'femm' in French?), but for the second one it can find both 'femme' and 'femmes' in there.
So, what you are seeing is a natural function of a text analyzer.
Searching for the same text using the Search API, on the other hand, should return documents with both 'femme' and 'femmes' in them, if you have set the right analyzer (for instance fr.microsoft) for the searchable field. The default 'standard' analyzer does not handle plurals and other inflections of the same word.
Just to add to yoape's response, the fr.microsoft analyzer reduces inflected words to their base form. In your case, the word femmes is reduced to its singular form femme. All cases that you described will work:
Searching with the base form of a word if an inflected form was in the document. Let's say you're indexing a document containing the words Vive and Femmes. The search engine will index the following terms: vif, vivre, vive, femme, femmes. If you search with any of these terms, e.g. femme, the document will match.
Searching with an inflected form of a word if the base form was in the document.
Let's say you're indexing a document with the text Femme fatale. The search engine will index the following terms: femme, fatal, fatale. If you search with the term femmes, the analyzer will also produce its base form, so your query becomes femmes OR femme. Documents with any of these terms will match.
Searching with an inflected form if another inflected form of that word was in the document. If you have a document with allez, the terms allez and aller will be indexed. If you search for allé, the query becomes allé OR aller. Since both inflected forms are reduced to the same base form, the document will match.
The key learning here is that the analyzer processes not only the documents but also the query terms. Terms are normalized according to language-specific rules.
I hope that explains it.
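If you want to reproduce the token comparison above programmatically, you can call the Analyze API for both forms and compare the results. A minimal sketch, assuming a Python/requests client; the service name, index name, admin key, and api-version are placeholders to replace with your own values.

import requests

SERVICE = "EXAMPLESEARCHINSTANCE"   # placeholder service name
INDEX = "my-index"                  # placeholder index name
API_KEY = "<admin-api-key>"         # placeholder admin key
API_VERSION = "2016-09-01"

def analyze(text, analyzer="fr.microsoft"):
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
           f"?api-version={API_VERSION}")
    resp = requests.post(
        url,
        headers={"api-key": API_KEY, "Content-Type": "application/json"},
        json={"analyzer": analyzer, "text": text},
    )
    resp.raise_for_status()
    return [t["token"] for t in resp.json()["tokens"]]

print("femme  ->", analyze("femme"))   # expected: ['femme']
print("femmes ->", analyze("femmes"))  # expected: ['femme', 'femmes']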
