Solr Query - Set a threshold match percentage for my search

I am using Solr - Lucene 4.0. I am trying to run a query to search a field called Names.
An example of a query would be:
Names:George
When I execute the search with rows set to 1000, it returns 1000 results. I expected far fewer than that, and the last results aren't similar at all. Is there a way to set a threshold so that only matches of a certain similarity are returned?

You can't really set a meaningful minimum matching score, because the score is relative and depends on many things (for example, the total number of documents and the number of matching terms found).
I don't know your exact case, but you could consider paging: fetch the results 20 documents at a time, check the score of the last document on each page, and stop once it falls below a threshold that you choose.
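The paging idea can be sketched as follows. This is a minimal illustration, not a Solr client: `fetch_page` is a hypothetical callable standing in for a real request (for example one that passes `start`, `rows` and `fl=*,score`), the fake index and its scores are made up, and the threshold of 2.0 is arbitrary, since scores are only comparable within a single query.

```python
from typing import Callable, Dict, Iterator, List

Doc = Dict[str, float]


def results_above_threshold(
    fetch_page: Callable[[int, int], List[Doc]],
    min_score: float,
    page_size: int = 20,
) -> Iterator[Doc]:
    """Yield documents page by page until the score drops below min_score.

    fetch_page(start, rows) is a hypothetical callable that returns one page
    of results sorted by descending score, each with a "score" key.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return  # no more results
        for doc in page:
            if doc["score"] < min_score:
                return  # results are sorted, so everything after is lower
            yield doc
        start += page_size


# Made-up stand-in for the index: 50 documents with scores 5.0, 4.9, ..., 0.1.
_FAKE_INDEX: List[Doc] = [{"id": i, "score": (50 - i) / 10} for i in range(50)]


def fake_fetch_page(start: int, rows: int) -> List[Doc]:
    return _FAKE_INDEX[start:start + rows]


kept = list(results_above_threshold(fake_fetch_page, min_score=2.0))
```

Only two pages are requested here, because the cutoff is reached on the second page.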

Related

Customize azure search scoring in a specific way

Consider a scenario where all documents have the following fields: email, first name, last name, and title.
The requirement is that for email the score should be either 100 (exact match) or 0.
For the remaining fields, it ranges from 0 to 100 based on edit distance.
Suppose in an index the records are like the following
1.abcd#gmail.com,Peterr,Parker,Developer
2.xyz#yahoo.com,Steve,Smith,Manager
The query is a fuzzy search across all the fields, with parameters like
abcd#gmail.com,Pet,Par,Devl
The search result should have a score for the first record like
score for email + score of first name + score of last name + score of title
= 100 + 50 (approx. edit distance of 'Peterr' and 'Pet') + 50 (approx. edit distance of 'Parker' and 'Par') + 44 (approx. edit distance of 'Developer' and 'Devl')
= 244
The other records should be scored in the same way.
I just checked: Azure Search scoring has weights, but I don't think those would be much help in a scenario like this. The main thing we are looking for is a way to make the search score that Azure Search returns for each record match the score described above.
To clarify, it seems what you need is the scoring formula to be a function of the edit distance between the query term and the indexed term - the shorter the distance, the higher the score. Unfortunately, this is not possible in Azure Search.
Azure Search engine executes the search query in two phases: retrieval and scoring.
During retrieval, the search query terms processed by the lexical analyzer are looked up in the inverted index, and documents that contain those terms are returned. When you use fuzzy search, we expand your search query by adding terms from the inverted index that are within the allowed edit distance of a given query term (fuzzy expansion). This way your query can match more documents.
During scoring we assign a relevance score to the retrieved documents using the Lucene scoring formula. This formula is based on TF/IDF. Practically, it means that documents that matched rare terms will be ranked higher in the results set.
It's important to know that the Lucene scoring formula only applies to documents that matched the original query terms and terms added through fuzzy expansion. Documents that matched terms added through prefix expansion or regex/wildcard expansion are given a constant score of 1. This way those documents will be in the results set but won't have an impact on ranking that's based on the frequency of terms.
Hope that helps
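For reference, the arithmetic in the question can be reproduced outside the search engine, for example to re-score a page of retrieved candidates on the client. The sketch below is not an Azure Search feature. It assumes that "score" means 100 × (1 − Levenshtein distance / length of the longer string), an assumption that happens to match the approximate figures quoted in the question (50, 50 and 44). The function names are made up for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(query: str, indexed: str) -> float:
    """Assumed scoring rule: 100 for identical strings, 0 for nothing in common."""
    longest = max(len(query), len(indexed))
    if longest == 0:
        return 100.0
    distance = levenshtein(query.lower(), indexed.lower())
    return 100.0 * (1 - distance / longest)


def record_score(query: Sequence[str], record: Sequence[str]) -> float:
    """Email is all-or-nothing; the remaining fields use edit-distance similarity."""
    email_score = 100.0 if query[0] == record[0] else 0.0
    return email_score + sum(similarity(q, r) for q, r in zip(query[1:], record[1:]))


query = ("abcd#gmail.com", "Pet", "Par", "Devl")
record_1 = ("abcd#gmail.com", "Peterr", "Parker", "Developer")
record_2 = ("xyz#yahoo.com", "Steve", "Smith", "Manager")

score_1 = record_score(query, record_1)  # 100 + 50 + 50 + 44.4...
score_2 = record_score(query, record_2)
```

Under this assumed rule the first record rounds to 244, the total given in the question.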

Add variation to the Search Results based on a solr field

I have a Solr field which has a set of values. Is it possible in Solr to return results that are varied based on that field?
E.g. my field contains "ValueA", "ValueB" and "ValueC". If rows is set to 3, then instead of returning all results from "ValueA" it should give me one result for each field value (assuming they have the same scores).
You might want to use Result Grouping / Field Collapsing or the CollapsingQParserPlugin.
The CollapsingQParserPlugin is newer (since Solr 4.6), faster and, I guess, more appropriate for your problem, as it does not affect the structure of the results.
Just add this to your solrconfig.xml:
<queryParser name="collapse" class="org.apache.solr.search.CollapsingQParserPlugin"/>
You can then collapse your result by adding the following parameter to your query:
fq={!collapse field=my_field}
or in Solrj:
solrQuery.addFilterQuery("{!collapse field=my_field}");
Collapsing means that, for each value in my_field, only the document with the highest score is retained in the result set.
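To make that definition concrete, here is a small in-memory illustration of what collapsing does to a result list. It only mimics the semantics described above and is not how Solr implements the plugin. The documents and scores are made up, and documents without a value in the field are not handled.

```python
from typing import Dict, Iterable, List


def collapse(docs: Iterable[dict], field: str) -> List[dict]:
    """Keep only the highest-scoring document for each distinct value of `field`."""
    best: Dict[object, dict] = {}
    for doc in docs:
        key = doc[field]
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    # Return the survivors in descending score order, like a normal result list.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


docs = [
    {"id": 1, "my_field": "ValueA", "score": 3.0},
    {"id": 2, "my_field": "ValueA", "score": 2.5},
    {"id": 3, "my_field": "ValueB", "score": 2.0},
    {"id": 4, "my_field": "ValueC", "score": 1.5},
    {"id": 5, "my_field": "ValueB", "score": 1.0},
]

collapsed = collapse(docs, "my_field")
collapsed_ids = [doc["id"] for doc in collapsed]
```

With rows set to 3, the uncollapsed list would start with two "ValueA" documents, while the collapsed list contains one document per value.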

How can I exclude scores that equal 0 in a Solr function query and maintain the actual score?

My goal is to round the score to group similar items and then sort by another field (let's use price as an example).
I'm able to accomplish this with the following query:
/select?defType=func&q=rint(product(query({!v=the search term}),100))&fl=score,price&sort=score%20desc,price
However, this query returns every document indexed in Solr.
How can I filter this query so that items with a score of 0 are excluded?
I've tried adding {!frange l=1} to the query which kind of worked... but it made all of the scores equal to 1. This obviously isn't good because I need to show the most relevant results first.
Thanks in advance for any help.
Alex
I spent hours trying to filter out documents with a relevance score of 0 and couldn't find any straightforward way to do this. I ended up accomplishing it with a workaround that assigns the query function to a local param. I then reference this local param in both the query ("q=") and the filter query ("fq=").
Example
Let's say you have a query like:
q={!func}sum(*your arguments*)
First, make the function component its own parameter:
q={!func}$localParam
&localParam={!func}sum(*your arguments*)
Now, to only return results with scores between 1 and 10, simply add a filter query on that localParam:
q={!func}$localParam
&localParam={!func}sum(*your arguments*)
&fq={!frange l=1 u=10 incl=true incu=true}$localParam
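Applied to the rounded-score query from the question, the parameters might be assembled like this. It is a sketch that only builds the query string and does not contact Solr. The `qq` parameter name, the explicit `price asc` direction, and the use of frange's `incl`/`incu` flags for inclusive bounds are choices made for this example, not part of the original posts.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(function: str, search_term: str, lower: int, upper: int) -> str:
    """Build a Solr query string that scores by `function` and filters on its value."""
    params = [
        ("q", "{!func}$localParam"),
        # The function is defined once and referenced from both q and fq.
        ("localParam", "{!func}" + function),
        # Hypothetical parameter holding the user's search term.
        ("qq", search_term),
        # Keep only documents whose function value lies in [lower, upper].
        ("fq", "{!frange l=%d u=%d incl=true incu=true}$localParam" % (lower, upper)),
        ("fl", "score,price"),
        ("sort", "score desc,price asc"),
    ]
    return urlencode(params)


query_string = build_query_string(
    function="rint(product(query($qq),100))",
    search_term="the search term",
    lower=1,
    upper=10,
)
decoded = parse_qs(query_string)
```

Because the filter runs as a separate fq, the scores produced by q are left untouched, which is the property the question was after.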

How does ElasticSearch rank filter queries (rather than text queries)?

I know that Elasticsearch uses relevance ranking algorithms such as Lucene's tf/idf, length normalization and a couple more to rank term queries applied to textual fields (e.g. searching for the words "medical" AND "journal" in the "title" and "body" fields).
My question is: how does Elasticsearch rank and retrieve the results of a filter or range query (e.g. age=25, or weight>60)?
I know these types of queries just filter documents based on the condition(s). But let's say I have 200 documents whose age field value is 25. Which of those documents will be retrieved as the top 10 results?
Does Elasticsearch retrieve them in the order it indexed them?
From the Elasticsearch documentation:
Filters: As a general rule, filters should be used instead of queries:
for binary yes/no searches
for queries on exact values
Queries: As a general rule, queries should be used instead of filters:
for full text search
where the result depends on a relevance score
So when running a search such as "age=25, or weight>60" you should be using a filter.
However, filters do not affect scoring: if you use only a filter, all of your search results have the same score, so the order in which they come back is not a relevance ranking. If the order matters, add an explicit sort.
There is also a range query, but as far as I know it does not rank matches by things like the document timestamp; every document inside the range gets the same constant score.
You'd need to explore the documentation further and dig into the Lucene documentation to understand exactly how and why a document got its score. As above, though, you may be better off using filters, which don't affect scoring.
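As an illustration, a filter-only search with an explicit sort might be expressed as the request body below. This is a sketch under assumptions: it uses the `bool`/`filter` form, which requires a version of Elasticsearch that supports it (older releases used a separate filtered query), and `created_at` is a hypothetical field that would have to exist in the mapping. The code only builds the body and does not talk to a cluster.

```python
import json


def filtered_search_body(age: int, size: int = 10, sort_field: str = "created_at") -> dict:
    """Build a search body that filters on an exact age and sorts explicitly."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: a yes/no match with no relevance scoring.
                "filter": [{"term": {"age": age}}]
            }
        },
        # Without a sort, equally scored hits have no meaningful order.
        "sort": [{sort_field: {"order": "desc"}}],
    }


body = filtered_search_body(age=25)
body_json = json.dumps(body)
```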

Influencing Solr search results with a field value

I've recently started experimenting with Solr. My data is indexed and searchable. My problem is in the sorting. I have three fields: Author, Title, Sales.
I would like to search against the author & title fields, but have the sales value influence the score so that matches with higher sales move toward the top, even if the initial match score is not the highest.
Simply sorting by sales does not produce valid results: a result with a near-zero score for the search term but a lot of sales could end up above a perfect match for the term that has never been sold.
At the moment I am seeing results that, while great term matches, are not necessarily the product I want at the top of the list.
If you're using the dismax handler, you can add a boost function (bf) with the field you want to boost on, e.g.
http://...?q=foo&bf=field(sales)^1.5
...to make the value of the sales figure give a bump. You can, of course, make the function more complex if you want to munge the sales data in some way.
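A sketch of how such a request might be assembled, with the sales figure dampened by a logarithm so that very large numbers do not swamp the text match. The field names (`author`, `title`, `sales`), the `log(sum(sales,1))` function and the 1.5 weight are assumptions for illustration. The code only builds the query string and does not contact Solr.

```python
from urllib.parse import parse_qs, urlencode


def dismax_query_string(user_query: str, boost_function: str, weight: float) -> str:
    """Build dismax parameters with an additive boost function (bf)."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": "author title",  # fields searched by the text query (assumed names)
        "bf": f"{boost_function}^{weight}",
        "fl": "*,score",  # return the score to see the boost's effect
    }
    return urlencode(params)


query_string = dismax_query_string("foo", "log(sum(sales,1))", 1.5)
decoded = parse_qs(query_string)
```

Adding 1 before taking the logarithm keeps the function defined for products with zero sales.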
More info is easily found.
You may also just want to do this at index time since the sales data isn't going to be changing on the fly.
You can also use Index-time boosting.
And here's detailed info on using function queries to influence scoring.
