Solr: Phrase search when indexed phrase is shorter than the query

Is it possible to find a document if the indexed field is a part of the queried phrase?
It is easy to find "Quick brown fox jumps over the lazy dog" when searching for "brown fox" or "lazy dog". But what if I need to do it the other way around?
Here's my situation: I have a short field in a document like "brown fox" or "lazy dog" and I want this document to be found by searching for longer phrases like "Quick brown fox" or "jump over lazy dog".
Note: it should be a phrase match, so making all the terms optional in the query wouldn't work. A query like "brown dog" SHOULD NOT match.
How would you do it in Solr? Is it possible to achieve this goal only by tweaking Solr, without having to parse and modify the requested phrase on the client side?

Look at ShingleFilterFactory, which emits runs of adjacent tokens (shingles) as single tokens. If you apply it on the query side only, you can achieve what you are looking for.
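The idea behind query-side shingling can be sketched outside Solr. The Python below is only an illustration, not Solr code: `shingles` and `matches` are hypothetical names, and it assumes the indexed field is kept as one lowercased token (for example via a keyword-style tokenizer), which the answer above does not state.

```python
def shingles(text, min_size=1, max_size=3):
    """Return every contiguous run of min_size..max_size words, lowercased."""
    words = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            out.append(" ".join(words[start:start + size]))
    return out


def matches(indexed_phrase, query, max_size=3):
    """True if the whole indexed phrase equals one of the query's shingles.

    Assumption: the indexed phrase is stored as a single token.
    """
    return indexed_phrase.lower() in shingles(query, max_size=max_size)


if __name__ == "__main__":
    print(matches("brown fox", "Quick brown fox"))     # indexed phrase inside query
    print(matches("lazy dog", "jump over lazy dog"))
    print(matches("brown fox", "brown dog"))           # not a contiguous phrase
```

In Solr itself, the shingle sizes would be controlled by the filter's minShingleSize and maxShingleSize attributes, so the largest shingle has to be at least as long as the longest indexed phrase.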

Related

Sentence jumble with all possible combinations

I was wondering whether it is possible to create a sentence jumble that outputs all possible variations using all the words from a given sentence, using Node.js.
For example, "The brown fox" would output:
"Fox the brown"
"Brown fox the"
"The fox brown"
"Brown the fox"
"Fox brown the"
Thank you in advance.
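The jumble described here is the set of permutations of the sentence's words. The question asks about Node.js; the sketch below uses Python instead, purely to show the logic, and `jumble` is a hypothetical name. It assumes that only the first word of each variation should be capitalized and that the original word order is left out, as in the list above.

```python
from itertools import permutations


def jumble(sentence):
    """Return every reordering of the words, excluding the original order."""
    words = sentence.lower().split()
    original = " ".join(words).capitalize()
    # dict.fromkeys drops duplicates (repeated words) while keeping order.
    variants = dict.fromkeys(" ".join(p).capitalize() for p in permutations(words))
    return [v for v in variants if v != original]


if __name__ == "__main__":
    for line in jumble("The brown fox"):
        print(line)
```

The number of variations grows factorially with the word count, so this is only practical for short sentences.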

Azure Cognitive Search Hit Highlighting full phrase

This is a question for the Azure Cognitive Search team. Currently, all of the terms in the phrase are highlighted when we search for an exact phrase or do a proximity search.
We received an email saying that from July 15 we will get an updated hit highlighting mechanism.
It will highlight only phrases that match the full phrase query.
https://learn.microsoft.com/ru-ru/azure/search/search-pagination-page-layout
So, will highlighting change only for exact phrase search? What about proximity search? Will only the phrase be highlighted? Or will it work as it does now, highlighting all terms from the phrase everywhere they occur?
I am an engineer on the Azure Search team. Users will see improved hit highlighting behavior, as notified. One of the improvements is that highlighting with phrase and proximity queries is more accurate, so proximity queries will also behave differently. The new highlights for the query "quick brown"~2 will look like this:
The <em>quick</em> <em>brown</em> fox jumped over the <em>quick</em> and <em>brown</em> dog but not the quick one that was not brown.

Using Lucene fuzzy search and synonyms with Azure Search

I want to be able to handle fuzzy search and synonyms at the same time.
I tried it in several ways, but I cannot get it working.
I have these values in my index:
white
black
light
dark
and these synonym rules:
white,light
black,dark
If I perform the query queryType=full&search=light or queryType=full&search=white, it always returns both values, light and white.
So synonyms are working.
If I perform the query queryType=full&search=light~1, then only light will be returned. But where is white?
Is the combination of fuzzy search and synonyms not possible yet, or am I doing something wrong?
Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regex terms aren't expanded.
If you need to do a single query that applies synonym expansion and wildcard, regex, or fuzzy searches, you can combine the queries using the OR syntax.
For example, to combine synonyms with fuzzy search you would need a query like this:
search=light~1|light
or
queryType=full&search=light OR light~1
if you're using Lucene query syntax.
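If the query is assembled on the client, the OR combination can be built with a small helper. This is a minimal sketch: `fuzzy_with_synonyms` and `query_string` are hypothetical names, and only the `queryType` and `search` parameters shown above are used.

```python
from urllib.parse import urlencode


def fuzzy_with_synonyms(term, distance=1):
    """Combine the plain term (synonym-expanded) with its fuzzy form."""
    return f"{term} OR {term}~{distance}"


def query_string(term, distance=1):
    """URL-encode the combined search for full Lucene query syntax."""
    return urlencode({
        "queryType": "full",
        "search": fuzzy_with_synonyms(term, distance),
    })


if __name__ == "__main__":
    print(fuzzy_with_synonyms("light"))
    print(query_string("light"))
```

The plain term is the part that gets synonym expansion, while the `~1` term contributes the fuzzy matches.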

Using double quotes for exact matching - how to handle edge cases? (Ex. Odd number of quotes.)

Summary
It's been requested that we add exact matching to the search form on our site.
Below are examples of how we expect the search to operate.
Example 1
No exact match has been indicated by the user
Input: quick brown fox lazy dog
Output: matches for quick, brown, fox, lazy, and dog
Example 2
"brown" is the exact match indicated by the user
Input: quick "brown" fox lazy dog
Output: matches for quick, brown, fox, lazy, and dog (Same as example 1.)
Example 3
"brown fox" is the exact match indicated by the user
Input: quick "brown fox" lazy dog
Output: matches for quick, brown fox, lazy and dog
Example 4
"quick brown" and "lazy dog" are the exact matches indicated by the user
Input: "quick brown" fox "lazy dog"
Output: matches for quick brown, fox, and lazy dog
Question
Now we're trying to determine how edge cases should be handled. Specifically, how should we handle an odd number of quotes (see example 5) and how should we handle double quotes that occur in the middle of words (see examples 6 and 7)?
Example 5
Odd number of double-quotes
Input: "quick brown" fox "lazy dog
Possible Output:
Matches for quick brown, fox, "lazy, and dog (Use spaces to delimit the search)
Matches for quick brown, fox, lazy, and dog (Ignore the odd double-quote)
Matches for quick brown, fox, and lazy dog (Add a closing double-quote after the last term)
Example 6
Even number of quotes, some of which interrupt a word
Input: qu"ick brown" fox lazy dog
Possible Output:
Matches for qu, ick brown, fox, lazy, and dog ?
(I couldn't think of another way to handle it, but I'm open to ideas.)
Example 7
Odd number of quotes, final quote interrupts a word
Input: "quick brown" fox la"zy dog
Possible Output:
Matches for quick brown, fox, la"zy, and dog (Use spaces to delimit the search)
Matches for quick brown, fox, lazy, and dog (Ignore the odd double-quote)
Matches for quick brown, fox, la, and zy dog (Add a closing double-quote after the last term)
Resources
I tried to search for resources that suggest best practices or common practices for these edge cases, but I wasn't able to find anything.
If you know of any resources on this, please let me know.
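One way to compare the candidate behaviours from Examples 5 to 7 is to prototype them. The sketch below is not a recommended practice, only an illustration: `parse_search` and its `unbalanced` parameter are hypothetical, and it implements two of the listed options, "ignore" (drop the odd double-quote) and "close" (add a closing double-quote after the last term). A quote in the middle of a word simply ends or starts a phrase, as in Example 6.

```python
def parse_search(text, unbalanced="ignore"):
    """Split a search string into single words and quoted phrases.

    unbalanced="ignore": remove the last quote when the count is odd.
    unbalanced="close":  append a closing quote when the count is odd.
    """
    if text.count('"') % 2 == 1:
        if unbalanced == "close":
            text += '"'
        elif unbalanced == "ignore":
            last = text.rfind('"')
            text = text[:last] + text[last + 1:]
        else:
            raise ValueError("unbalanced must be 'ignore' or 'close'")

    terms = []
    # After splitting on quotes, odd-numbered parts were inside quotes.
    for i, part in enumerate(text.split('"')):
        if i % 2 == 1:
            phrase = " ".join(part.split())
            if phrase:  # skip empty quotes such as ""
                terms.append(phrase)
        else:
            terms.extend(part.split())
    return terms


if __name__ == "__main__":
    print(parse_search('"quick brown" fox "lazy dog'))
    print(parse_search('"quick brown" fox "lazy dog', unbalanced="close"))
    print(parse_search('qu"ick brown" fox lazy dog'))
```

Keeping the policy as a parameter makes it easy to run real user queries through both options before committing to one.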

Solr edismax SearchHandler clarification

I am using the edismax SearchHandler in my search and I have some issues with the search results. As I understand it, if "defaultOperator" is set to OR, the search query will implicitly be passed as The OR quick OR brown OR fox. However, if I search for The quick brown fox, I get fewer results than when I explicitly add the ORs. Another issue is that when I search for The quick brown fox, other documents that contain the word fox are not in the search results.
Thanks.
Make sure mm (minimum should match) is 0%. Then the search should behave as OR.
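Why mm changes the result count can be seen with a simplified model. The Python below is not Solr's implementation: `required_matches` and `doc_matches` are hypothetical names, matching is plain word lookup with no analysis or stop words, and it assumes a percentage mm is rounded down with at least one term always required.

```python
def required_matches(num_terms, mm_percent):
    """Number of query terms a document must contain (simplified model)."""
    return max(1, num_terms * mm_percent // 100)


def doc_matches(doc_text, query, mm_percent):
    """True if the document contains enough of the query's terms."""
    doc_words = set(doc_text.lower().split())
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in doc_words)
    return hits >= required_matches(len(terms), mm_percent)


if __name__ == "__main__":
    query = "The quick brown fox"
    doc = "a red fox"
    print(doc_matches(doc, query, 0))    # behaves like OR
    print(doc_matches(doc, query, 100))  # behaves like AND
```

Under this model, a document that contains only fox is returned with mm at 0% but dropped as soon as mm requires more than one matching term, which is consistent with the missing results described in the question.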

Resources