Lucene is not finding results that are present in the index


I'm inspecting a Lucene index with Luke.
All documents have a field 'Title', and I would like to search with the expression Title:Power to find all documents whose title contains the word Power.
In Luke, I go to the "Search" tab and enter +Title:Power
When searching, there are no results. However, when I search by another field, I do find the document: +ContentType:MyContentType
In the Title column, I can clearly see that the document's value is "Power Quality Guide".
What could be the reasons I'm not finding this document when searching on Title?

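There can be a number of reasons. The most common ones are:
the Title field could be stored in the index but not indexed for search (Field.Store.YES, Field.Index.NO), unlike the field for which you do find results (ContentType);
the document(s) could have been indexed with one analyzer while the query is parsed with a different one (for example, an analyzer that lowercases at index time stores the term "power", so an unanalyzed query for "Power" matches nothing);
the document could have been indexed with the NOT_ANALYZED option, which indexes the whole field value as a single term ("Power Quality Guide"), so a query for the single word Power does not match it.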
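To see why the last two cases bite, here is a toy inverted index in plain Python. It is only an illustration under simplifying assumptions: `standard_like_analyzer` roughly mimics an analyzer that splits on word characters and lowercases, `keyword_analyzer` stands in for NOT_ANALYZED, and `term_query` looks terms up without analyzing them. None of these names are Lucene APIs.

```python
import re
from collections import defaultdict


def standard_like_analyzer(text):
    """Split on word characters and lowercase (a rough stand-in for an analyzing field)."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword_analyzer(text):
    """Index the whole value as one term, like Field.Index.NOT_ANALYZED."""
    return [text]


def build_index(docs, field_name, analyzer):
    """Map each indexed term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyzer(doc[field_name]):
            index[term].add(doc_id)
    return index


def term_query(index, term):
    """Exact term lookup; the query term is NOT analyzed here."""
    return sorted(index.get(term, set()))


docs = {1: {"Title": "Power Quality Guide"}}
analyzed = build_index(docs, "Title", standard_like_analyzer)
not_analyzed = build_index(docs, "Title", keyword_analyzer)

print(term_query(analyzed, "Power"))                    # [] (indexed term is "power")
print(term_query(analyzed, "power"))                    # [1]
print(term_query(not_analyzed, "Power"))                # [] (only "Power Quality Guide" exists)
print(term_query(not_analyzed, "Power Quality Guide"))  # [1]
```

Whichever case applies, the fix is to make the terms your query produces identical to the terms in the index: index Title as an analyzed field, or analyze the query with the same analyzer that was used at index time.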

Related

Solr default search field for multiple fields that have different analyzers

I have a document which has title, stockCode, category fields.
I have different field types (and analysis chains) for each. For instance, title uses EdgeNGram 2 to 20, category uses EdgeNGram 3 to 10, and stockCode only has a lowercase filter.
I don't want to search for the keyword "sample" by building a query like title:sample OR stockCode:sample OR category:sample.
I'd like to search with just "q=sample".
I copied my fields into the text field, but it does not work, because then all fields are analyzed the same way. I don't want stockCode indexed with EdgeNGram or any other filter. I'd like each field indexed as I configured it, and I'd like a single keyword search to run over all of them according to their own analysis.
I've been researching this for three days, and Solr's documentation is a little poor.

You can use the edismax query parser, which lets you give a list of fields to query and supply the query term by itself. You can also give each field a separate weight to score matches in it differently.
defType=edismax&q=sample&qf=title^10 stockCode category
This will search for sample in each of the three fields, analyzing the query with each field's own analysis chain, and boost hits in the title field by a factor of 10.
You can find the documentation about the edismax query parser under Searching in the reference guide.
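A minimal sketch of assembling those parameters programmatically, for instance when the request is built from client code. The host `localhost:8983` and the core name `mycore` are placeholders (assumptions), and `edismax_params` is a hypothetical helper; only `defType`, `q`, and `qf` come from the answer.

```python
from urllib.parse import urlencode


def edismax_params(query, field_boosts):
    """Build edismax request parameters.

    field_boosts maps a field name to its boost, or None for no explicit boost.
    """
    qf = " ".join(
        name if boost is None else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    )
    return {"defType": "edismax", "q": query, "qf": qf}


params = edismax_params("sample", {"title": 10, "stockCode": None, "category": None})
# Placeholder Solr host and core; adjust to your installation.
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)

print(params["qf"])  # title^10 stockCode category
print(url)
```

`urlencode` takes care of escaping `^` and the spaces in `qf`, which is easy to get wrong when pasting the query string by hand.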

Match only by values in MongoDB and return the id of the document

I have inserted some JSON data into MongoDB, and I want to perform a simple search that matches only the values, irrespective of the keys (since the keys differ between documents), and returns the id of each matching document. I don't know how to compare only by values in MongoDB.
For example, if I search for the word "Knowledge", it should return the ids of all documents that contain the word "Knowledge", regardless of the key it is stored under.

You need to use a wildcard text index:
db.collection.createIndex( { "$**": "text" } )

If there is a static superset of field names, you may find text indexes and the $text query operator useful for word-based searches.
Create the text index on every potential field, and those contained in each document will be included.
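Once the wildcard text index exists, the query side would look something like `db.collection.find({ $text: { $search: "Knowledge" } }, { _id: 1 })`, which returns only the `_id` of matching documents. To make "match by value regardless of key" concrete, here is a pure-Python, in-memory sketch. It is an illustration only, not how MongoDB implements text search: it does a case-insensitive whole-word match with no stemming, stop words, or language rules, and the sample documents are made up.

```python
import re


def iter_string_values(value):
    """Yield every string nested anywhere in a JSON-like value, ignoring keys."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for nested in value.values():
            yield from iter_string_values(nested)
    elif isinstance(value, list):
        for item in value:
            yield from iter_string_values(item)


def ids_containing_word(docs, word):
    """Return the _id of each document whose values (under any key) contain `word`.

    Case-insensitive whole-word match; the _id field itself is not searched.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for doc in docs:
        values = {k: v for k, v in doc.items() if k != "_id"}
        if any(pattern.search(s) for s in iter_string_values(values)):
            hits.append(doc["_id"])
    return hits


docs = [
    {"_id": 1, "title": "Knowledge base"},
    {"_id": 2, "meta": {"tags": ["science", "knowledge"]}},
    {"_id": 3, "summary": "Nothing relevant here"},
]
print(ids_containing_word(docs, "Knowledge"))  # [1, 2]
```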

Customize Azure Search scoring in a specific way

Consider a scenario where all documents have the following fields: email, first name, last name, and title.
The requirement is that the email score should be either 100 (on an exact match) or 0.
For the remaining fields, the score should range from 0 to 100 based on edit distance.
Suppose an index contains the following records:
1. abcd@gmail.com, Peterr, Parker, Developer
2. xyz@yahoo.com, Steve, Smith, Manager
The query is a fuzzy search over all the fields, with parameters like:
abcd@gmail.com, Pet, Par, Devl
The score for the first record should be:
score of email + score of first name + score of last name + score of title
= 100 + 50 (approx., from the edit distance between 'Peterr' and 'Pet') + 50 (approx., from the edit distance between 'Parker' and 'Par') + 44 (approx., from the edit distance between 'Developer' and 'Devl')
= 244
Every other search result should be scored the same way.
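The desired score can be pinned down precisely. The sketch below assumes the per-field score is 100 * (1 - distance / length of the longer string), using plain Levenshtein distance; that formula is an assumption, chosen because it reproduces the approximate numbers above (50, 50, 44). The field names `first_name`, `last_name`, and `title` are hypothetical. Since Azure Search can't compute this itself, a function like this would have to run outside the service, for example to re-rank the returned results.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity_score(query, value):
    """Assumed 0..100 score: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(query), len(value))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(query, value) / longest)


def record_score(query, record):
    """Email scores 100 on an exact match, else 0; other fields use similarity_score."""
    email_score = 100.0 if query["email"] == record["email"] else 0.0
    return email_score + sum(
        similarity_score(query[f], record[f])
        for f in ("first_name", "last_name", "title")
    )


record = {"email": "abcd@gmail.com", "first_name": "Peterr",
          "last_name": "Parker", "title": "Developer"}
query = {"email": "abcd@gmail.com", "first_name": "Pet",
         "last_name": "Par", "title": "Devl"}
print(round(record_score(query, record)))  # 244
```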
I checked, and Azure Search scoring supports weights, but I don't think they would help much in a scenario like this. The main thing we are looking for is a way to make the search score that Azure Search returns for each record match the scoring scheme described above.

To clarify, it seems you need the scoring formula to be a function of the edit distance between the query term and the indexed term: the shorter the distance, the higher the score. Unfortunately, this is not possible in Azure Search.
The Azure Search engine executes a search query in two phases: retrieval and scoring.
During retrieval, the search query terms, after processing by the lexical analyzer, are looked up in the inverted index, and documents that contain those terms are returned. When you use fuzzy search, we expand your query by adding terms from the inverted index that are within a given edit distance of a query term (fuzzy expansion). This way your query can match more documents.
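Fuzzy expansion can be sketched as follows. This is a simplified model, not Azure Search's implementation: it uses plain Levenshtein distance (Lucene's fuzzy query also counts a transposition of adjacent characters as a single edit by default), a hypothetical in-memory `inverted_index`, and a limit of 2 edits, the most a Lucene fuzzy query allows.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_expand(query_term, vocabulary, max_edits=2):
    """Return the indexed terms within max_edits of the query term."""
    return sorted(t for t in vocabulary if levenshtein(query_term, t) <= max_edits)


# Hypothetical inverted index: term -> ids of documents containing it.
inverted_index = {
    "peter": {1},
    "peterr": {2},
    "pete": {3},
    "parker": {1, 2},
}

expanded = fuzzy_expand("peter", inverted_index)
matched = set().union(*(inverted_index[t] for t in expanded))
print(expanded)         # ['pete', 'peter', 'peterr']
print(sorted(matched))  # [1, 2, 3]
```

The distance only decides which terms join the query. Once "pete" and "peterr" are added, documents 2 and 3 are scored like any other match, which is why the final score can't reflect edit distance.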
During scoring, we assign a relevance score to the retrieved documents using the Lucene scoring formula, which is based on TF/IDF. In practice, this means that documents that match rare terms are ranked higher in the result set.
It's important to know that the Lucene scoring formula only applies to documents that matched the original query terms or terms added through fuzzy expansion. Documents that matched terms added through prefix expansion or regex/wildcard expansion are given a constant score of 1. This way, those documents will be in the result set but won't affect the ranking based on term frequency.
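A rough sketch of the scoring phase, showing both points: rare terms weigh more, and prefix-expanded matches get a flat score of 1. It uses a simplified classic TF-IDF (sqrt(tf) * idf, with idf = 1 + ln(N / (df + 1))) over a hypothetical four-document collection; the real Lucene formula includes further factors such as length normalization and boosts.

```python
import math
from collections import Counter

# Hypothetical mini-collection; each document is a single text field.
docs = {
    1: "power quality guide",
    2: "power supply guide",
    3: "power cable",
    4: "harmonics guide",
}
tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
num_docs = len(tokenized)
# Document frequency: how many documents each term appears in.
doc_freq = Counter(term for terms in tokenized.values() for term in set(terms))


def idf(term):
    # Classic Lucene-style idf: 1 + ln(N / (df + 1)); rarer terms weigh more.
    return 1 + math.log(num_docs / (doc_freq[term] + 1))


def term_score(doc_id, term):
    # Simplified TF-IDF: sqrt(term frequency) * idf, with no norms or boosts.
    tf = tokenized[doc_id].count(term)
    return math.sqrt(tf) * idf(term)


def prefix_score(doc_id, prefix):
    # Prefix-expanded matches get a constant 1, however rare the matched term is.
    return 1.0 if any(t.startswith(prefix) for t in tokenized[doc_id]) else 0.0


query = ["power", "quality"]
scores = {d: sum(term_score(d, t) for t in query) for d in tokenized}
ranking = sorted(tokenized, key=lambda d: scores[d], reverse=True)
print(ranking)  # [1, 2, 3, 4]: doc 1 also matches the rare term "quality"
print({d: prefix_score(d, "qual") for d in tokenized})  # {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}
```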
Hope that helps

The implication of @search.score in Azure Search Service

I understand the reason for having scoring profiles and boosting results based on fields such as distance, rating, etc. To me, that mostly applies to structured documents like JSON files. The scenario I cannot make sense of is when an indexer indexes, say, an MS Word or PDF document from Azure Blob storage into the search service index. We then have only two fields, "id" and "content", and I don't know how the search score applies to them.
For example, there are two MS Word documents with different contents. I searched for a keyword that appears in both, and the two documents got different scores. Why should the scores differ when both documents contain the same keyword?

The score is determined by many factors, for example the number of terms in each document and the number of searchable fields in which query terms were found. In your example, the documents have different lengths, so naturally they get different scores: in TF/IDF scoring, a match in a shorter document generally counts for more than the same match in a longer one. HTH.
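To illustrate the length effect, here is a simplified classic TF-IDF score for a single field, using Lucene's classic length normalization 1 / sqrt(number of terms). The documents are made up, and idf is fixed at 1 to isolate the length factor; the service's actual formula has more inputs.

```python
import math


def classic_field_score(term, field_terms, idf=1.0):
    """Simplified classic TF-IDF for one field: sqrt(tf) * idf * lengthNorm,
    where lengthNorm = 1 / sqrt(number of terms in the field)."""
    tf = field_terms.count(term)
    if tf == 0:
        return 0.0
    return math.sqrt(tf) * idf * (1 / math.sqrt(len(field_terms)))


# Two made-up "content" fields that both contain the keyword once.
short_doc = "azure search pricing".split()
long_doc = ("azure search pricing depends on the tier and the number "
            "of replicas and partitions you provision").split()

print(classic_field_score("pricing", short_doc))  # about 0.577 (1 / sqrt(3))
print(classic_field_score("pricing", long_doc))   # 0.25 (1 / sqrt(16))
```

Same keyword, same term frequency, but the longer document's match is diluted by its length, so it scores lower.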

How to retrieve search results from two fields in a Lucene index, given one query?

Suppose I search for a query in field A and want to retrieve the corresponding fields B and C from my index. How should I go about it? I am using Lucene 3.6.0.

The results of your query will be returned as a set of documents, not fields. Once you've got a document, you can load whichever field contents you're interested in.
One thing worth watching out for is to make sure that fields B and C have been "stored" (Field.Store.YES); otherwise their values cannot be loaded from the returned documents.
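In Lucene 3.6 terms, that means running IndexSearcher.search(query, n), taking the doc ids from the returned TopDocs, loading each Document with IndexSearcher.doc(docId), and calling Document.get("B") and Document.get("C"). The toy Python model below (not Lucene's API; `ToyIndex` and the field values are hypothetical) shows the same flow and why storing matters: a field that is indexed but not stored can be searched, yet comes back as None.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of Lucene's indexed vs. stored fields (not Lucene's API)."""

    def __init__(self, indexed_fields, stored_fields):
        self.indexed_fields = set(indexed_fields)
        self.stored_fields = set(stored_fields)
        self.postings = defaultdict(set)  # (field, term) -> doc ids
        self.stored = []                  # doc id -> stored field values

    def add_document(self, doc):
        doc_id = len(self.stored)
        for name, value in doc.items():
            if name in self.indexed_fields:
                for term in value.lower().split():
                    self.postings[(name, term)].add(doc_id)
        # Only stored fields can be read back later.
        self.stored.append({k: v for k, v in doc.items() if k in self.stored_fields})
        return doc_id

    def search(self, field_name, term):
        return sorted(self.postings.get((field_name, term.lower()), set()))

    def doc(self, doc_id):
        return self.stored[doc_id]


# Field A is searchable but not stored; B and C are stored for retrieval.
index = ToyIndex(indexed_fields={"A"}, stored_fields={"B", "C"})
index.add_document({"A": "fast search engine", "B": "b-value-1", "C": "c-value-1"})
index.add_document({"A": "slow database", "B": "b-value-2", "C": "c-value-2"})

hits = index.search("A", "search")
for doc_id in hits:
    stored = index.doc(doc_id)
    print(stored.get("B"), stored.get("C"), stored.get("A"))  # b-value-1 c-value-1 None
```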
Good luck,
