Solr: Searching actual items vs related items - search

We use Solr for our product search, and we need a way to distinguish "is a" from "relates to". For example, if I search for "knife" right now, knife sharpeners rank much higher than I would like, given that we carry so many actual knives. Both products have "knife" in the name (compare '8" chef's knife' with 'Electric knife sharpener') and both sit in a knife category, so in our current setup it is very difficult to express that the chef's knife should score higher for this search term.
One approach we have considered is a label attached to a product when it is categorized and put on the site, indicating that its category relates to another category. For example, the cutting boards category relates to the knives category, but knives is the "master" category in that case. That would give us the differentiation we want, but it requires a lot of labor and management on the merchandisers' end.
I'm curious whether Solr has functionality I'm not aware of that would take care of this, or whether we simply need to change the way we store things.
Here's an example of an 8" chef's knife doc vs. an electric knife sharpener doc:
8 inch knife:
    "productId": 9071,
    "productName": "8\" Chinese Chef's Knife",
    "text": [
        "8\" Chinese Chef's Knife",
        "Update International",
        "KCC-8",
        "Chinese Chef's Knives"
    ],
    "productName_exact": "8\" Chinese Chef's Knife",
    "manuf": "Update International",
    "baseSku": "KCC-8",
    "sku": [
        "KCC-8"
    ],
    "modelTypeDesc": "Chinese Chef's Knives",
    "manufId": 74,
    "categories": [
        "Chef's Knives",
        "Chinese Chef's Knives",
        "Knife Sale"
    ],
    "type": "Product",
    "popularity": 4301,
    "displayName": "Update International (KCC-8) - 8\" Chinese Chef's Knife",
electric knife sharpener:
    "productId": 3267,
    "productName": "Edlund Electric Knife Sharpener",
    "text": [
        "Edlund Electric Knife Sharpener",
        "Edlund",
        "395",
        "Electric Knife Sharpeners"
    ],
    "productName_exact": "Edlund Electric Knife Sharpener",
    "manuf": "Edlund",
    "baseSku": "395",
    "sku": [
        "395"
    ],
    "modelTypeDesc": "Electric Knife Sharpeners",
    "manufId": 22,
    "categories": [
        "Electric Knife Sharpeners",
        "Knife Sharpeners"
    ],
    "type": "Product",
    "popularity": 53,
    "displayName": "Edlund (395) - Edlund Electric Knife Sharpener",
You'll see a "popularity" field in there, which I thought about using. The problem is that the field is based on how well something sells, so an accessory might sell better than the item it belongs to, yet the search term should still match the item itself first.
Thanks for the help.

So I figured out a good way of doing this, for those who want to know. I realized I can assume with pretty good certainty that if a person is searching for a knife, their search term will END with the word "knife". If they are searching for a knife sharpener, it will end with the word "sharpener".
Given this, I made a field that indexes only the last word of our product name. For the knife it is just "knife"; for the knife sharpener it is just "sharpener".
I then queried against that field with a much higher boost value than the standard product name field. The important thing here is that I broke up the user's search term and used only its last word. I used a keyword tokenizer with fairly strict analysis so that it only matches in almost-exact cases; the only filters on it are lowercase, hunspell, and synonym.
This achieved exactly what I was looking for. The only caveat is that if a product isn't named properly, it won't show up where you expect it. A good example is a "knife set" rather than a "knife": both are knives, but the set won't rank as high when searching for "knife". One could also argue that is working as intended.
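For anyone who wants to try the last-word idea, here is a minimal client-side sketch in Python. It is not the exact setup described above: the field name productName_last, the boost of 20, and the use of edismax with a bq boost query are all assumptions made for illustration, and stemming and synonyms are left to the Solr field analysis.

```python
import re


def last_word(text: str) -> str:
    """Return the final alphanumeric token of text, lowercased ('' if none)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return tokens[-1] if tokens else ""


def build_params(user_query: str, boost: int = 20) -> dict:
    """Build edismax request parameters that boost docs whose name ends like the query."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "productName text",  # fields taken from the example docs
    }
    tail = last_word(user_query)
    if tail:
        # productName_last is a hypothetical field holding only the last
        # word of productName, filled in at index time with last_word().
        params["bq"] = f"productName_last:{tail}^{boost}"
    return params


if __name__ == "__main__":
    print(last_word('8" Chinese Chef\'s Knife'))
    print(last_word("Edlund Electric Knife Sharpener"))
    print(build_params("knife"))
```

The same last_word helper can be used at index time to fill the extra field and at query time to pick the term to boost, which keeps both sides consistent.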

Related

Does IS LIST work with @sys.any in Dialogflow fulfillment?

Is it possible to use IS LIST with @sys.any?
I have tried IS LIST for both games and languages. It works well for languages, because that parameter takes @sys.language as input and gives me the separate languages the user entered, but the games parameter takes @sys.any as input.
Example:
Bot: which language do you know?
User: English, French and Chinese
Bot: Your favorite games?
User: Cricket, Football and Chess
Response in the case of languages:
"parameters": {
    "langName": [
        "English",
        "French",
        "Chinese"
    ],
Response in the case of games:
"parameters": {
    "games": "Cricket, Football and Chess",
How can I get separate values for games, the way I do for languages?
It isn't possible to use lists with @sys.any because, literally, anything and everything matches it, including the separators.
To get a list of games, you will need to create a Games entity type and include in it the games you accept.
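If defining an entity type is not practical, one possible fallback is to split the raw @sys.any string in your own fulfillment code. The sketch below is an assumption about how that could look, not a Dialogflow feature: split_games is a hypothetical helper, and it only handles commas and the word "and" as separators.

```python
import re
from typing import List


def split_games(raw: str) -> List[str]:
    """Split a free-text list such as 'Cricket, Football and Chess' into items."""
    # Separators: a comma, or the standalone word "and" (any letter case).
    parts = re.split(r"\s*(?:,|\band\b)\s*", raw, flags=re.IGNORECASE)
    # Drop empty pieces left by sequences like ", and".
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_games("Cricket, Football and Chess"))
```

This will still mis-split names that legitimately contain "and" as a separate word, which is one reason a proper entity type is the more robust route.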

Dialogflow matches irrelevant phrases to existing intents

I created a chatbot which tells the user the names of the members of my (extended) family and where they are living. I have a small MySQL database that stores this data, and I fetch it with a PHP script whenever appropriate, depending on the user's interaction with the chatbot.
For this, I have created two intents in addition to the Default Fallback Intent and the Default Welcome Intent:
Names
Location_context
The first intent ('Names') is trained with phrases such as 'What is the name of your uncle?' and has an output context. The second intent ('Location_context') is trained with phrases such as 'Where is he living?', 'Where is he based?', 'Where is he located?', 'Which city does he live in?' etc. and has an input context (from 'Names').
In general, this basic chatbot works well for what it is made for. However, my problem is that (after the 'Names' intent is triggered) if you ask something nonsensical such as 'Where is he snowing?', the chatbot triggers the 'Location_context' intent and responds (as defined) with 'Your uncle is living in New York'. With the chatbot structured as it is so far, these kinds of responses get a score higher than 0.75, which is pretty high.
How can I make my chatbot trigger the Default Fallback Intent for these nonsensical questions (or even for more reasonable questions such as 'Where is he eating?', which are still not really related to the 'Location_context' intent), instead of triggering intents such as 'Location_context' that merely share some keywords with them, such as the word 'Where'?
Try playing around with ML CLASSIFICATION THRESHOLD in your agent settings (Settings > ML Settings). By default it comes with a very low score (0.2), which is a little aggressive.
"Define the threshold value for the confidence score. If the returned value is less than the threshold value, then a fallback intent will be triggered or, if there is no fallback intents defined, no intent will be triggered."
You can see the score for your query in the JSON response:
{
    "source": "agent",
    "resolvedQuery": "Which city does he live at?",
    "metadata": {
        "intentId": "...",
        "intentName": "Location_context"
    },
    "fulfillment": {
        "speech": "Your uncle is living in New York",
        "messages": [{
            "type": 0,
            "speech": "Your uncle is living in New York"
        }]
    },
    "score": 0.9
}
Compare the scores of the right and wrong matches and you will get a good idea of which confidence score is the right threshold for your agent.
After changing this setting, let the agent train, try again, and adjust until it meets your needs.
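One rough way to do that comparison is sketched below in Python. It assumes you have already collected the scores by hand from the JSON responses; suggest_threshold is a hypothetical helper and the numbers in the usage line are made-up sample values, not measurements.

```python
from typing import List, Optional


def suggest_threshold(correct: List[float], wrong: List[float]) -> Optional[float]:
    """Suggest a threshold halfway between the worst right match and the best wrong one.

    Returns None when either list is empty or when the two groups overlap,
    because in that case no single threshold separates them.
    """
    if not correct or not wrong:
        return None
    lowest_correct = min(correct)
    highest_wrong = max(wrong)
    if highest_wrong >= lowest_correct:
        return None
    return round((lowest_correct + highest_wrong) / 2, 2)


if __name__ == "__main__":
    # Illustrative scores only.
    print(suggest_threshold(correct=[0.9, 0.85], wrong=[0.4, 0.55]))
```

A None result is a sign that the threshold alone will not be enough for those queries.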
Update
For queries that will still get a high score, like "Where is he cooking?", you could add another intent, a custom fallback, to handle those false positives, perhaps with a custom entity, NonLocationActions, and use template mode (@) in the user expressions.
where is he @NonLocationActions:NonLocationActions
which city does he @NonLocationActions:NonLocationActions
These queries will then score 1 in the new custom fallback instead of 0.7 in the location intent.
I am working on a chatbot using dialogflow and am getting similar problems.
Our test manager invented the 'Sausage Test' where she replaces certain words in the question with the word sausage and our bot fell apart! Even with a threshold of 0.8 we still regularly hit issues where intents fire for nonsensical sentences, and with an enterprise level chatbot that is giving out product installation advice we could not afford to get it this wrong.
We found that in some cases we were getting max confidence levels (1) for clearly dodgy 'sausaged' input.
The way we got around this issue is to back all the answers with an API and use the confidence score in conjunction with other tests. For example, we introduced regular expression tests to check for keywords in the question, together with parameter matching (making sure that key entity parameters were also being passed through in the data from Dialogflow).
More recently we have also started to put a low-confidence sentence at the start of the reply, e.g. 'I think you are asking about XYZ, but if not please rephrase your question. Here is your answer'. We do this when all our extra tests fail and the score is between 0.8 and 0.98.
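A minimal Python sketch of that kind of gate is below. The function name, the return labels, and the keyword pattern in the usage line are hypothetical; the 0.8 and 0.98 bounds come from the description above. What to do when the extra tests fail and the score is above 0.98 is not stated, so this sketch assumes a fallback in that case.

```python
import re
from typing import Iterable, Mapping


def decide_reply(score: float, query: str, params: Mapping[str, object],
                 keyword_pattern: str, required_params: Iterable[str],
                 low: float = 0.8, high: float = 0.98) -> str:
    """Combine the confidence score with keyword and parameter checks."""
    if score < low:
        return "fallback"
    # Extra test 1: the question must contain an expected keyword.
    keyword_ok = re.search(keyword_pattern, query, re.IGNORECASE) is not None
    # Extra test 2: every key entity parameter must be present and non-empty.
    params_ok = all(params.get(name) for name in required_params)
    if keyword_ok and params_ok:
        return "answer"
    if score <= high:
        # Extra tests failed inside the 0.8 to 0.98 band: answer, but
        # prefix the reply with the low-confidence sentence.
        return "hedged_answer"
    # Assumption: failed tests with a near-perfect score are treated as
    # suspicious ("sausaged" input) and sent to the fallback.
    return "fallback"


if __name__ == "__main__":
    pattern = r"\b(live|lives|living|located|based)\b"
    print(decide_reply(0.9, "Where is he sausage?", {"person": "uncle"}, pattern, ["person"]))
```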

LUIS does not recognize names with spaces

I have a bot built with the Microsoft Bot Framework that uses the LUIS API for text recognition. With this bot, I can ask for information about different devices in my backend. They have names like Desk, Desk 2 and Phone Booth 4. The first and second names work fine, but whenever I send a name that contains two or more spaces, LUIS fails to recognize it. I have added all the names to a feature list on LUIS, but it doesn't seem to do anything. When the bot code executes the method for that intent, the entity is just null whenever I send this kind of name. Any idea how I might solve this? As described, names with just one space, like Desk 2, work fine. Maybe there is a way to save multiple words as an entity inside LUIS?
In the image below, the top entry is "show me phone booth 4" and the bottom one "show me desk 2".
It'll take a little leg work, but have you tried updating your model programmatically?
According to the LUIS API reference, you can label utterances individually or in batches. The benefit of doing it this way is that you select what should be recognized as an entity by character index position.
Example:
{
    "text": "Book me a flight from Cairo to Redmond next Thursday",
    "intentName": "BookFlight",
    "entityLabels":
    [
        {
            "entityName": "Location::From",
            "startCharIndex": 22,
            "endCharIndex": 26
        },
        {
            "entityName": "Location::To",
            "startCharIndex": 31,
            "endCharIndex": 37
        }
    ]
}
I admit I haven't attempted to do this before, but I do not see how labeling/training this way would logically fail.
One thing I do note about your entities is that they're composed of an item and also a number. You could throw them into a composite entity; but in this case doing it the way I mentioned above is a good way to do what you're looking for.
That said, if you plan on using the office-furniture-pieces(?) as entities for a separate intent, say, 'PurchaseNewOfficePieces', it might pay to use a composite entity for 'Desk 2' and 'Phone Booth 4'.
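To avoid counting characters by hand, the label payload can be generated. The Python sketch below only builds a dictionary in the shape of the example above, with inclusive start and end indices; it does not call the LUIS API. The helper name label_utterance, the intent name ShowDevice, and the entity name Device are assumptions for illustration.

```python
import json
from typing import Dict, List, Tuple


def label_utterance(text: str, intent_name: str,
                    entities: List[Tuple[str, str]]) -> Dict[str, object]:
    """Build a labeled utterance with inclusive character indices per entity."""
    labels = []
    for entity_name, phrase in entities:
        # Case-insensitive search for the first occurrence of the phrase.
        start = text.lower().find(phrase.lower())
        if start == -1:
            raise ValueError(f"{phrase!r} not found in {text!r}")
        labels.append({
            "entityName": entity_name,
            "startCharIndex": start,
            # The end index is inclusive, matching the example (Cairo: 22 to 26).
            "endCharIndex": start + len(phrase) - 1,
        })
    return {"text": text, "intentName": intent_name, "entityLabels": labels}


if __name__ == "__main__":
    payload = label_utterance("show me phone booth 4", "ShowDevice",
                              [("Device", "phone booth 4")])
    print(json.dumps(payload, indent=4))
```

Labeling the whole span "phone booth 4" as one entity is the point here: the multi-word name is marked by position rather than left for the model to guess.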

How to specify tax in Gmail Schema.org markup?

According to this documentation, I can specify tax with priceSpecification. The syntax should be something like this (delivery charge example):
"priceSpecification": [
    {
        "@type": "DeliveryChargeSpecification",
        "price": "10.00",
        "priceCurrency": "USD"
    }
]
The only problem is that there is no documentation on which @type to specify for tax, the way DeliveryChargeSpecification is used for the delivery charge in this example.
I tried to find it there and on Schema.org with no success. Any suggestions on how to specify tax charge information with Gmail markup?
The Schema.org vocabulary provides three sub-types for the PriceSpecification type:
DeliveryChargeSpecification
PaymentChargeSpecification
UnitPriceSpecification
Unless one of these applies in your case, you should use the parent type, PriceSpecification.
(If it’s about a price that includes VAT, you can use the corresponding type and its valueAddedTaxIncluded property, which Google seems to support for their email markup.)
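As a rough illustration, the Python sketch below builds such an entry with the parent type. The helper name price_specification and the amounts are made up, and whether Gmail actually renders a plain PriceSpecification as a tax line is an assumption I have not verified.

```python
import json
from typing import Dict, Optional


def price_specification(price: str, currency: str,
                        vat_included: Optional[bool] = None) -> Dict[str, object]:
    """Build a generic Schema.org PriceSpecification entry as a JSON-LD dict."""
    spec: Dict[str, object] = {
        "@type": "PriceSpecification",
        "price": price,
        "priceCurrency": currency,
    }
    if vat_included is not None:
        # valueAddedTaxIncluded is a Schema.org property of PriceSpecification.
        spec["valueAddedTaxIncluded"] = vat_included
    return spec


if __name__ == "__main__":
    markup = {"priceSpecification": [price_specification("2.50", "USD")]}
    print(json.dumps(markup, indent=4))
```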

Solr spatial search on multivalued location field

Is there a way in Solr to find out which address within a document matched on a multivalued location field, and not just which document matched? For example, I might have "Store A" with locations "100 Main Street, NY, 00001" and "100 NotMainStreet, NY, 00010", which are 40 miles apart. When I search for "00002", assuming "100 Main Street, NY, 00001" matches, I want the result to somehow indicate that the location "100 Main Street, NY" of "Store A" is the one that matched.
I tried the SOLR-2155 patch for Solr 3.5, but I could not find anything in it that achieves this. I also checked the documentation of the new Solr 4 spatial search module, but I didn't find anything useful there either. Please help, thank you.
