LUIS pattern feature does not detect my entity - bots

I want to detect an entity from an intent, that might be written in several ways:
AB 123456
AB 123 456
AB123456
AB is an option from a closed list, and the rest should be detected as number(s). As long as there is a space between AB and the number, the list item and the number(s) are detected/resolved.
I discovered the pattern feature and hoped this could help out the parser.
Unfortunately, the entity can still not be recognized. I tried to write the pattern in different ways, with no success.
^([A-Za-z]{1,2})([0-9 ]+)$
([a-z]{2})([\d ]+)
[a-z]{2}[\d ]+
Any idea?

AB is an option from a closed list, and the rest should be detected as number(s). As long as there is a space between AB and the number, the list item and the number(s) are detected/resolved.
If "AB" comes from a closed list entity, create a simple entity for the numbers and a composite entity to hold the two together. The simple entity needs to be trained with a few utterances, e.g. "123456", "123 789", "456789", "201731"; the model (with the help of a RegExp pattern for your pattern feature, [\d]{6}) should then be able to handle the rest.
Your list entity and the newly created simple entity then become children of the composite entity, which you might name something like "Product".
You may be assuming that a closed list is machine-learned, but it is not. It is matched directly against utterances, so if your list entity had the following:
canonicalForm: "ProductId"
synonyms: "AB", "BA", "AB 123456"
And the utterance "BA 123456" was processed by the model, the LUIS model would recognize "BA" as a "ProductId" and not recognize "123456" at all. By extension, the entire utterance would not be recognized as one "ProductId".

The problem only exists when there is no whitespace. If I train with "AB123456" I can't label just "AB". I'm only able to select the whole word.
LUIS only analyses the text and helps you get the intents and entities you need for your work. It will not do programming work such as reading a certain portion of a string and assigning it to an entity... at least not yet.
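Since the question notes that recognition works whenever a space separates "AB" from the number, one possible workaround is to insert that space in your own code before the utterance is sent to LUIS. The following is a minimal sketch, not a LUIS feature: the function name split_prefix and the assumption that the prefix is one or two letters (taken from the first pattern in the question) are illustrative.

```python
import re

# Assumption: the list item is a 1-2 letter prefix directly followed by digits.
# The lookahead (?=\d) only matches when a digit follows the prefix immediately,
# so utterances that already contain the space are left untouched.
PREFIX_THEN_DIGITS = re.compile(r"\b([A-Za-z]{1,2})(?=\d)")


def split_prefix(utterance: str) -> str:
    """Insert a space between a short letter prefix and the digits after it."""
    return PREFIX_THEN_DIGITS.sub(r"\1 ", utterance)


if __name__ == "__main__":
    for text in ("AB123456", "AB 123456", "AB 123 456", "status of AB123456"):
        print(repr(text), "->", repr(split_prefix(text)))
```

The normalized text is what you would then pass to the LUIS endpoint, so the list entity and the number can be picked up separately.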

Related

Action and parameters in Dialogflow

When I created a parameter and assigned it to the entity called #sys.unit-information-name,
I kept getting this warning: The annotated text 'diabetes' in training phrase 'show me results on diabetes' does not correspond to entity type '#sys.unit-information-name'. As a result, my chatbot does not give me the right result; it just keeps asking the question listed under Prompts in the Action and parameters section.
If I remove this parameter it works, but I want to use the parameter and the entity.
Any help?
Thanks
The problem you are facing arises because you are explicitly creating a parameter and assigning it the system entity #sys.unit-information-name. According to the system entities documentation, this entity refers to information about units.
Since your expression "show me results on diabetes" refers to a disease, this entity cannot be used to map "diabetes" or any other disease. There is no system entity for diseases, so it is better to create a custom entity that defines most of the disease names.
When a word in a training phrase matches one of the values defined in the entity, that word is mapped to the entity and the response is generated.

Dialogflow regex entity similar to #sys.any

I have many intents that extract a parameter that could be almost anything. An example would be a company name. Lots of variation there: "VWR", "1-800-Flowers", "#1 Mufflers". This list can include names in many languages.
I'm using the #sys.any entity now but it doesn't work well if the text includes numbers or punctuation. I get this for the parameter for example: "1 - 800 - Flowers". There are spaces around the numbers and punctuation.
I was expecting the Regex entity to solve my problem, but on save it throws an error saying it's too broad. \S+[\s\S]*\S+ will catch anything in any language. Here's the error: "com.google.apps.framework.request.BadRequestException: Validate entity with entityName 'RegexAny' and entityId '149486a3-7a49-4171-b23c-860f7d47b713' failed because of the following reasons: Regular expression match is too broad: \S+[\s\S]*\S+."
How can I get around this unhelpful restriction and capture the user's input just as they typed it?
I've had this problem happen to me as well. What I do is use the #sys.any parameter and do the regex check in my fulfillment code, where you can also remove any punctuation and spaces. If you decide to do it this way, I'd recommend removing any output contexts from the intent and setting them programmatically when the regex matches. If there's no match, I set the same contexts as the input contexts for that intent.
This works wonderfully.
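As a rough illustration of that approach, the sketch below cleans up the #sys.any value and picks which contexts to set. Everything here is an assumption made for the example: the function names, the context names, the rule that a valid name contains at least one letter or digit, and the choice to collapse spaces around hyphens (which would also alter a name that legitimately contains " - "). It is plain Python and does not call any Dialogflow API.

```python
import re

# Assumption: a usable company name contains at least one letter or digit.
HAS_WORD_CHAR = re.compile(r"\w")


def collapse_hyphen_spacing(value: str) -> str:
    """Undo the spacing added around hyphens, e.g. '1 - 800 - Flowers'."""
    return re.sub(r"\s*-\s*", "-", value.strip())


def choose_contexts(value, input_contexts, success_contexts):
    """Return (cleaned value, contexts to set) for a #sys.any parameter.

    On a match the success contexts are used; otherwise the input contexts
    are returned so that the same intent can be matched again.
    """
    cleaned = collapse_hyphen_spacing(value)
    if HAS_WORD_CHAR.search(cleaned):
        return cleaned, success_contexts
    return None, input_contexts


if __name__ == "__main__":
    # "ask-company" and "have-company" are hypothetical context names.
    print(choose_contexts("1 - 800 - Flowers", ["ask-company"], ["have-company"]))
    print(choose_contexts("   ", ["ask-company"], ["have-company"]))
```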

How do I create a Dialogflow custom entity that works like #sys.airport?

Since #sys.airport only exists for the default English locale, I want to create a custom entity that emulates it for other locales.
From what I've read here, you can put subentity types into the value fields, say, the system entity #sys.geo-city:city and a custom entity #usr.iata-code:iata, and it will match either one or the other.
But I don't understand how you would tell Dialogflow which city and which IATA code go together, so that Dialogflow (ES) would know to send the complete object {"city":"Amsterdam", "iata": "AMS"} to the webhook after matching either "Amsterdam" or "AMS", as it does happen with #sys.airport.
Thanks for any input!
It will be difficult to create a custom entity that works just like #sys.airport. The #sys entities are special and can do some things that custom entities can't, for instance, pairing values together.
As you pointed out, you can put multiple entities together in one single entity by using Composite Entities, but the only thing this does is allow you to recognize two values made up from other #sys or custom entities in a single entity. It doesn't give you the option to create pairs between the values of the entities.
If you want to create something like this, you need some code that does a lookup in a dictionary or list, so that when "AMS" is matched, the code fills in the missing property "Amsterdam", or vice versa.
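A minimal sketch of such a lookup, as it might run in webhook code, is shown below. The airport table, the BY_KEY index, and the function name complete_airport are illustrative; a real agent would load a full data set and would need a policy for cities that have more than one airport.

```python
# Illustrative data only; a real agent would load a complete airport table.
AIRPORTS = [
    {"city": "Amsterdam", "iata": "AMS"},
    {"city": "Lisbon", "iata": "LIS"},
]

# Index every record under both its city name and its IATA code, case-insensitively.
BY_KEY = {}
for airport in AIRPORTS:
    BY_KEY[airport["city"].casefold()] = airport
    BY_KEY[airport["iata"].casefold()] = airport


def complete_airport(matched: str):
    """Return the full {"city", "iata"} record for either matched value, or None."""
    return BY_KEY.get(matched.strip().casefold())


if __name__ == "__main__":
    print(complete_airport("AMS"))
    print(complete_airport("amsterdam"))
    print(complete_airport("Atlantis"))
```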

Dialogflow Agent regex entities definition

I have created an agent in Dialogflow and want to define an entity based on regex values. I know entities can be defined with regex, but I don't know how to use this feature or how to write the regex when defining the entity, and I could not find any examples or blog posts that cover it. I would like to see an example, or the syntax, of a regex entity definition so that I can replicate it for my case. Any help will be highly appreciated.
Try this. Go to the Entity page. Create a new entity and call it whatever you want. In the entity screen select regex and enter this value: [A-Za-z]{3}[0-9]{7,10}$. Save the entity. This regex matches any value that ends with three letters followed by 7 to 10 digits, for example PAP1234567 or DWL123456789.
Now go to an intent, or create one, and in the training phrases add one that says:
My number is PAP12345678. Select PAP12345678 so that it is highlighted and the entities menu will appear. Select the new regex entity and save.
Test the intent in Dialogflow. Hope this helps.
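To check what the pattern accepts before saving it, you can try it locally. The sketch below uses Python's re module, which is an assumption made for convenience: Dialogflow's own matcher may behave differently in edge cases, so treat this as a check of the pattern itself. The helper name find_id is illustrative.

```python
import re

# The pattern from the answer: three letters, then 7 to 10 digits, at the end of the text.
PATTERN = re.compile(r"[A-Za-z]{3}[0-9]{7,10}$")


def find_id(text: str):
    """Return the part of the text matched by the pattern, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in ("PAP1234567", "DWL123456789", "My number is PAP12345678", "PAP123"):
        print(repr(sample), "->", find_id(sample))
```

Because the pattern has no ^ anchor, it can match at the end of a longer phrase, which is what lets it pick the value out of "My number is PAP12345678".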

Implementing search : Identifying known keywords

I have implemented search functionality for my e-commerce website using Elasticsearch. The basic structure is that each product has a title, and whatever the user enters I search for as an exact string in Elasticsearch and return the result.
Now I notice that most of the search phrases (almost 90%) follow a similar pattern. It contains:
Brand name of the product (Apple, Nokia etc.)
Category of the product (phone, mobile phone, smartphone etc.)
Model name of the product (iPhone 6S, Lumia 950 etc.)
Now I think if I am able to identify the specific components, then I can return better results than just text match.
I have lists of brands, categories and models. If I am able to identify the terms present, then I can query Elasticsearch on that field specifically.
For example, from the search string "Apple iPhone 5S", I should be able to deduce that brand=Apple.
EDIT: More details as asked in comments
Structure of document:
I have a single index and each document ID is the SKU of the product and it contains the following fields
title (Apple iPhone 5S)
brand (Apple)
categ (Electronics)
sub_categ (Smartphones)
model (iPhone 5S)
attribs (dictionary of product attributes particular to each sub_categ like {"color": "gold", "memory": "32 GB", "battery": "1570 mAh"})
price
Use Case:
Now when the user searches for the phrase "iphone 5s battery", Elasticsearch returns results that include even the phone itself. (I agree the relevance score is higher for the battery.)
What I am trying to achieve is this: I have a master list of sub categories. If any word from the search phrase is present in the master list, I would query Elasticsearch with ["must": {"sub_categ": "battery"}], so that results from the "Smartphones" sub category are not fetched. I wish to replicate this across multiple fields like brand, category etc.
My question is: how do I quickly find out whether a brand, or any other word from the master lists, is present in the search phrase? The only option I could think of is looping through each master list and checking whether each entry is present in the search phrase. If it is, I keep note of it and do the same across all master list fields (brand, categ, sub_categ), then generate the query with must and run it. I wish to know if there is a better way of accomplishing this.
The person in the Lucene world who has spoken the most on this topic is Ted Sullivan. (He calls this "auto-filtering", and has a component which does this available for Solr)
I realize you're using Elasticsearch, but Ted's component works by introspecting FieldCache data (exposed by Lucene), so it should be possible to implement something very similar with Elasticsearch (look at the code).
There is also a discussion in this article about how to create a separate index for providing pre-query intelligence like you've described (e.g. your term "Apple" is most frequently found in the company field).
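Independently of those components, the lookup described in the question can be sketched on the application side. Instead of looping over every master-list entry, the sketch below keeps each master list as a set and tests the word n-grams of the search phrase against it, so the cost depends on the length of the phrase rather than the size of the lists. This is not Ted Sullivan's component; the master lists, the MAX_WORDS limit, and the function names are assumptions, and the term filters assume the fields are indexed so that lowercase values match exactly.

```python
# Hypothetical master lists, stored lowercase for case-insensitive lookup.
MASTER = {
    "brand": {"apple", "nokia"},
    "sub_categ": {"battery", "smartphones"},
    "model": {"iphone 5s", "lumia 950"},
}
MAX_WORDS = 3  # assumed length of the longest master-list entry, in words


def identify(phrase: str) -> dict:
    """Map field name -> matched value for each master-list term found in the phrase."""
    words = phrase.lower().split()
    found = {}
    # Try longer n-grams first so multi-word entries are seen before single words.
    for size in range(MAX_WORDS, 0, -1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            for field, values in MASTER.items():
                if field not in found and candidate in values:
                    found[field] = candidate
    return found


def build_query(phrase: str) -> dict:
    """Build a bool query: full-text match on title, exact filters on identified fields."""
    filters = [{"term": {field: value}} for field, value in identify(phrase).items()]
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": phrase}}],
                "filter": filters,
            }
        }
    }


if __name__ == "__main__":
    print(identify("Apple iPhone 5S"))
    print(build_query("iphone 5s battery"))
```

Which identified fields should become hard filters is a design choice; for the "iphone 5s battery" case in the question, you might filter only on sub_categ and leave the model as part of the text match.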
