I am learning Solr and I want to use n-grams. For example:
If a document contains new york car driver, that document should not be returned for the following queries:
/select?q=york
/select?q=new
/select?q=new car
but it should return for the following queries
/select?q=new york
/select?q=car
/select?q=driver
/select?q=car driver
(It should treat New York as a single word for better results. There are word sequences that need to be considered as single words, e.g. New York, Tom Cruise, etc. These words are predefined; all other words should be treated normally.)
How can I achieve this using Solr search?
The first thing to try is to put quotes around the term, like "new york", and query again.
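For example, in the same style as the queries above:
/select?q="new york"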
The second option is to change the tokenizer from StandardTokenizerFactory to KeywordTokenizerFactory.
After the change, reindex the data and query again.
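As a rough sketch of what that change could look like (the field type name text_keyword and the lowercase filter are illustrative, not taken from the question):
<fieldType name="text_keyword" class="solr.TextField">
  <analyzer>
    <!-- the whole field value is kept as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>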
The third option is to use the StrField type, which does not allow any tokenization or analysis and will only return results for exact matches.
The StrField type is not analyzed, but indexed/stored verbatim.
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
Related
I have a problem when searching with stop words (that, at, the, etc.).
I am using StandardAnalyzer to index text like "Surname At Birth".
When I search (also with StandardAnalyzer) using a PhraseQuery for the phrase "Surname at birth", I do not receive any results.
This is the code where I create PhraseQuery:
foreach (var word in search.Trim().Split(' '))
{
    phraseQuery.Add(new Term("content", word.ToLowerInvariant()));
}
I adjust the slop when there are no results.
For example, if I search for "Surname birth" I get results that contain "Surname at birth".
It is as if the stop word "at" prevents the results.
Stopwords must also be removed at query time. If you don't do that, any query that requires that word to be present will not match any documents. Stopword removal is done at analysis time, so analysis should also be performed on the query to get the terms that will actually be searched. (This is also needed for stemming and case-insensitivity.)
After analysis, a query like "Surname At Birth" (with quotes) will be converted to a PhraseQuery "Surname * Birth" (with a "hole" in the middle, recorded via the "position increment" attribute on the token that follows the gap).
I assume you are using Lucene.NET, so check the docs at https://lucenenet.apache.org/docs/4.8.0-beta00009/api/queryparser/overview.html on using a query parser for your query.
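As a minimal sketch of that approach (Lucene.NET 4.8 style; the field name "content" comes from the question, the analyzer and version constants are assumptions):
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

// Same analyzer at index time and query time, so the stopword "at"
// is removed in both places and positions stay consistent.
var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var parser = new QueryParser(LuceneVersion.LUCENE_48, "content", analyzer);

// The quotes make this a phrase query; "at" is dropped during analysis
// and the parser leaves a position gap instead of requiring the word.
Query query = parser.Parse("\"Surname At Birth\"");

// TopDocs hits = searcher.Search(query, 10);  // searcher built elsewhere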
I've been trying to create a filter matching the end of the whole field text.
For example, taking a text field with the text: the brown fox jumped over the lazy dog
I would like it to match a query that searches for fields whose values end with g. Something like:
{
  "search": "*",
  "queryType": "full",
  "searchMode": "any",
  ...
  "filter": "search.ismatchscoring('/g$/','MyField')"
}
The result is only records where MyField contains a word consisting of the single character g anywhere in the string.
Using the filter directly also produces no results:
{
  "search": "*",
  "queryType": "full",
  "searchMode": "any",
  ...
  "filter": "MyField eq '*g'"
}
As far as I can see, tokenization will always be the basis for both search and filter, which means that in the above query the $ is completely ignored and matches are per word, not per field.
I could probably use the keyword_v2 analyzer on this field, but then I would lose the tokenization that I rely on when searching normally.
One possible solution could be defining a second field in your index, with the same value as ‘MyField’, but with a different analyzer (e.g. keyword_v2). That way you may still search over the original field while filtering over the other.
Regardless, you might have simplified the filter for the sake of the example, but otherwise it seems redundant to use search.ismatchscoring() when not combining with another filter clause via ‘or’ – one can use the search parameter directly.
Moreover, the regex might not be working because the default queryType for search.ismatchscoring() is simple, not full - please see the docs here.
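Putting the two points together, a hedged sketch of the request (the field name MyFieldKeyword is hypothetical and assumed to be a copy of MyField analyzed with keyword_v2, so the whole value is indexed as one term; Lucene regexes match the entire term, so no $ anchor is needed):
{
  "search": "*",
  "queryType": "full",
  "searchMode": "any",
  "filter": "search.ismatchscoring('/.*g/', 'MyFieldKeyword', 'full', 'any')"
}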
I want the query "HollandPark" or "Holland Park, Notting Hill" or "holland park notting hill" to match the text "Holland Park, Notting Hill". How can I do this in MongoDB?
To make this most performant and to cover the most cases, you should really have a normalized field preprocessed with the location names. For instance, you can have a field holland_park_notting_hill and match it by tokenizing the input in the same way. But if you are doing this sort of work, you might as well index your data in Elasticsearch and get much more powerful matchers.
I have fields in an index with [Analyzer(<name>)] applied. The analyzer is of type CustomAnalyzer with tokenizer = Keyword. I assume it treats both the field value and the search text as one term each. E.g.
ClientName = My Test Client (in the index, broken into 1 term). Search term = My Test Client (broken into 1 term). Result = match.
But surprisingly that's not the case until I use a phrase search (enclose the term in double quotes). Does anyone know why, and how to solve it? I'd rather have the search term treated as a whole without having to enclose it in quotes.
Regards,
Sergei.
This is expected behavior. Query text is processed first by the query parser, and only individual query terms go through lexical analysis. When you issue a phrase query, the whole expression between quotes is treated as a phrase term and goes through lexical analysis as a single unit. You can find a complete explanation of this process here: How full text search works in Azure Search.
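For example, a sketch of a phrase request (ClientName is the field from the question; the rest of the request body is illustrative), where the quotes send the whole expression through analysis as one phrase term:
{
  "search": "\"My Test Client\"",
  "searchFields": "ClientName"
}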
I use Azure Search, which in turn uses Lucene. Is there any way to make the search less strict?
What I need is that searching for "term" should match documents with terms that contain "term".
Searching for term should match "PrefixTerm", "TermSuffix", "PrefixTermSuffix".
Searching for part2 should match "part1part2", "part2part3", "part1part2part3".
I need to run a search query that has several terms, like
"term part2"
To match documents like:
{ someField:"... PrefixTermSuffix ... part1part2part3 ..." }
{ someField:"... PrefixTerm ... part2part3 ..." }
etc
You can use regex expressions in the Lucene query syntax in Azure Search. In your example, you can construct a regex query like /.*term.*/ /.*part2.*/ to find the documents with terms that contain the two search terms as substrings.
https://[service name].search.windows.net/indexes/[search index]/docs?api-version=2016-09-01&queryType=full&search=/.*term.*/ /.*part2.*/
Azure Search supports two query syntaxes, simple and full. The latter enables the Lucene query syntax. Please see our documentation (https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search) to learn more about the capabilities.