Elastic Search Java API Multi match query prefix query on tokens

Elastic Search Java API Multi match query prefix query on tokens - search

I am looking for some way that I want to perform search on my index with NativeSearchQueryBuilder from Elastic java api but I want to add the following things while search.
Index details:
Filter type EdgeNgram
White space tokenizer
I am looking for autocomplete functionality so here i want to apply the search keyword on multiple fields but it should apply using prefix to improve the performance, also I want to the results needs to be returned if they reach my specified page limit instead of keep on searching the index even it found enough results.
Ex: "albert einstein" is there in my index, now if I search "alb" it should return the result or if I search "ein" it should return the result.
NativeSearchQueryBuilder sb = new NativeSearchQueryBuilder()
.withIndices(Constants.ES_INDEX_NAME)
//.withPageable(pageable)
.withSourceFilter(new FetchSourceFilterBuilder().withIncludes("id").build())
.withTypes(Constants.USERS_TYPE)
.withQuery(multiMatchQuery("alb", new String[]{"userFirstName","userLastName","userMobile", "userEmail"}))
.withFilter(boolQuery()
.must(termQuery("userCityName", "Chicago")));
Please someone help me on this, how to add prefix and limit to my Multimatch Query builder.

What you are looking for is match_phrase_prefix
int limit = 100; //Set your limit
NativeSearchQueryBuilder sb = new NativeSearchQueryBuilder()
.withIndices(Constants.ES_INDEX_NAME)
.withPageable(new PageRequest(0, limit))
.withSourceFilter(new FetchSourceFilterBuilder().withIncludes("id").build())
.withTypes(Constants.USERS_TYPE)
.withQuery(QueryBuilders.multiMatchQuery("alb", "userFirstName","userLastName","userMobile", "userEmail")
.type(MatchQueryBuilder.Type.PHRASE_PREFIX))
.withFilter(boolQuery()
.must(termQuery("userCityName", "Chicago")));

Related

How to power a windowed virtual list with cursor based pagination?

Take a windowed virtual list with the capability of loading an arbitrary range of rows at any point in the list, such as in this following example.
The virtual list provides a callback that is called anytime the user scrolls to some rows that have not been fetched from the backend yet, and provides the start and stop indexes, so that, in an offset based pagination endpoint, I can fetch the required items without fetching any unnecessary data.
const loadMoreItems = (startIndex, stopIndex) => {
fetch(`/items?offset=${startIndex}&limit=${stopIndex - startIndex}`);
}
I'd like to replace my offset based pagination with a cursor based one, but I can't figure out how to reproduce the above logic with it.
The main issue is that I feel like I will need to download all the items before startIndex in order to receive the cursor needed to fetch the items between startIndex and stopIndex.
What's the correct way to approach this?

After some investigation I found what seems to be the way MongoDB approaches the problem:
https://docs.mongodb.com/manual/reference/method/cursor.skip/#mongodb-method-cursor.skip
Obviously he same approach can be adopted by any other backend implementation.
They provide a skip method that allows to skip an arbitrary amount of items after the provided cursor.
This means my sample endpoint would look like the following:
/items?cursor=${cursor}&skip=${skip}&limit=${stopIndex - startIndex}
I then need to figure out the cursor and the skip values.
The following code could work to find the closest available cursor, given I store them together with the items:
// Limit our search only to items before startIndex
const fragment = items.slice(0, startIndex);
// Find the closest cursor index
const cursorIndex = fragment.length - 1 - fragment.reverse().findIndex(item => item.cursor != null);
// Get the cursor
const cursor = items[cursorIndex];
And of course, I also have a way to know the skip value:
const skip = items.length - 1 - cursorIndex;

Azure search - custom function for result boosting

I'm trying to move "complex" function to Azure Search. This function calculates score per each result element base on filter data (from search query) and data stored in result element. Score is use for reasult boosting. Base on my research Azure Search provides result boosting, but it's too simple for mine requirement.
Example function:
//filterElementsIds - ids taken from search query filter
public double Score(IEnumerable<string> filterElementsIds, ResultElement element)
{
double score = 0;
foreach(var elem in element.ScoreForFilters)
if (filterElementsIds.Any(x => x == elem.Key))
score += elem.Value * 1.5;
return score;
}
Currently, I'm iterating through each result returned by Azure Search - calculating score and sorting elements inside my application.
Is it possible to implement such function in Azure Search to improve process of boosting results?

I'm not sure I fully understand your question, but it appears like you are trying to boost the score of certain document if their key is equal to any of the IDs in your collection of "filterElements". If that's so, you could use the lucene query language to craft a query which does that:
https://learn.microsoft.com/en-us/azure/search/search-query-lucene-examples
You could do a search that looks like this
OriginalSearchTerm OR (OriginalSearchTerm AND key:("filterID1" OR "filterID2" OR "filterID3"))
That way, documents that match both the original search term as well as having one of the filter ID as part of the "key" field will match higher than documents that only match the original search term. You can also term boosting to give a specific boost to the key field in this case
If that's so, could you use "term boosting" to achieve this?
https://learn.microsoft.com/en-us/azure/search/search-query-lucene-examples#example-5-term-boosting
OriginalSearchTerm OR (OriginalSearchTerm AND key:("filterID1" OR "filterID2" OR "filterID3")^2)

Filtering Haystack (SOLR) results by django_id

With Django/Haystack/SOLR, I'd like to be able to restrict the result of a search to those records within a particular range of django_ids. Getting these IDs is not a problem, but trying to filter by them produces some unexpected effects. The code looks like this (extraneous code trimmed for clarity):
def view_results(request,arg):
# django_ids list is first calculated using arg...
sqs = SearchQuerySet().facet('example_facet') # STEP_1
sqs = sqs.filter(django_id__in=django_ids) # STEP_2
view = search_view_factory(
view_class=SearchView,
template='search/search-results.html',
searchqueryset=sqs,
form_class=FacetedSearchForm
)
return view(request)
At the point marked STEP_1 I get all the database records. At STEP_2 the records are successfully narrowed down to the number I'd expect for that list of django_ids. The problem comes when the search results are displayed in cases where the user has specified a search term in the form. Rather than returning all records from STEP_2 which match the term, I get all records from STEP_2 plus all from STEP_1 which match the term.
Presumably, therefore, I need to override one/some of the methods in for SearchView in haystack/views.py, but what? Can anyone suggest a means of achieving what is required here?

After a bit more thought, I found a way around this. In the code above, the problem was occurring in the view = search_view_factory... line, so I needed to create my own SearchView class and override the get_results(self) method in order to apply the filtering after the search has been run with the user's search terms. The result is code along these lines:
class MySearchView(SearchView):
def get_results(self):
search = self.form.search()
# The ID I need for the database search is at the end of the URL,
# but this may have some search parameters on and need cleaning up.
view_id = self.request.path.split("/")[-1]
view_query = MyView.objects.filter(id=view_id.split("&")[0])
# At this point the django_ids of the required objects can be found.
if len(view_query) > 0:
view_item = view_query.__getitem__(0)
django_ids = []
for thing in view_item.things.all():
django_ids.append(thing.id)
search = search.filter_and(django_id__in=django_ids)
return search
Using search.filter_and rather than search.filter at the end was another thing which turned out to be essential, but which didn't do what I needed when the filtering was being performed before getting to the SearchView.

Lucene wild card search

How can I perform a wildcard search in Lucene ?
I have the text: "1997_titanic"
If I search like "1997_titanic", it is returning a result, but I am not able to do below two searches:
1) If I search with only 1997 it is not returning any results.
2) Also if there is a space, such as in "spider man", that is not finding any results.
I retrieve all movie information from a DB and store it in Lucene Documents:
public Document createMovieDoc(Movie m){
document.add(new StoredField("moviename", m.getName()));
TextField field = new TextField("movienameSearch", m.getName().toLowerCase(), Store.NO);
field.setBoost(5.0f);
document.add(field);
}
And to search, I have this method:
public List searh(String txt){
PhraseQuery phQuery= new PhraseQuery();
Term term = new Term("movienameSearch", txt.toLowerCase());
BooleanQuery b = new BooleanQuery();
b.add(phQuery, Occur.SHOULD);
TopFieldDocs tp= searcher.search(b, 20, ..);
for(int i=0;i<tp.length;i++)
{
int mId = tp[i].doc;
Document d = searcher.doc(mId);
String moviename = d.get("moviename");
list.add(moviename);
}
return list;
}

I'm not sure what analyzer you are using to index. Sounds like maybe WhitespaceAnalyzer? It sounds like, when indexing "1997_titanic" remains a single token, while "spider man" is split into the token "spider" and "man".
Could also be SimpleAnalyzer which uses a LetterTokenizer. This would make it impossible to search for "1997", since that tokenizer will eliminate all numbers for the indexed representation of the text.
Your search method doesn't look right. You aren't adding any terms to your PhraseQuery, so I wouldn't expect it to find anything. You must add some terms in order for anything to be found. You create a Term in what you've provided, but nothing is ever done with that Term. Maybe this has something to do with how you've pick your excerpts, or something? Not sure, I'm a bit confused by that.
In order to manually construct a PhraseQuery you must add each term individually, so to search for "spider man", you would do something like:
PhraseQuery phQuery= new PhraseQuery();
phQuery.add(new Term("movienameSearch", "spider"));
phQuery.add(new Term("movienameSearch", "man"));
This requires you to know what the analyzer was doing at index time, and tokenize the input yourself to suit. The simpler solution is to just use the QueryParser:
//With whatever analyzer you like to use.
QueryParser parser = new QueryParser(Version.LUCENE_46, "defaultField", analyzer);
Query query = parser.parse("movienameSearch:\"" + txt.toLowerCase() + "\"");
TopFieldDocs tp= searcher.search(query, 20);
This allows you to rely on the same analyzer to index and query, so you don't have to know how to tokenize your phrases to suit.
As far as finding "1997" and "titanic" individually, I would recommend just using StandardAnalyzer. It will tokenize those into discrete tokens, allowing them to be searched very easily, with a simple query like: movienameSearch:1997.

Exact phrase search using Lucene.net

I am having trouble searching for an exact phrase using Lucene.NET 2.0.0.4
For example I am searching for "scope attribute sets the variable" (including quotes) but receive no matches, I have confirmed 100% that the phrase exists.
Can anyone suggest where I am going wrong? Is this even supported with Lucene.NET? As usual the API documentation is not too helpful and a few CodeProject articles I've read don't specifically touch on this.
Using the following code to create the index:
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", true);
Analyzer analyzer = new Lucene.Net.Analysis.SimpleAnalyzer();
IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer,true);
//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field(
"content", File.ReadAllText(#"Documents\100.txt"),
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.TOKENIZED);
doc.Add(fldContent);
//write the document to the index
indexWriter.AddDocument(doc);
I then search for a phrase using:
//state the file location of the index
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false);
//create an index searcher that will perform the search
IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);
QueryParser qp = new QueryParser("content", new SimpleAnalyzer());
// txtSearch.Text Contains a phrase such as "this is a phrase"
Query q=qp.Parse(txtSearch.Text);
//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(q);
The target document is about 7 MB plain text.
I have seen this previous question however I don't want a proximity search, just an exact phrase search.

Shashikant Kore is correct with his answer, you need to enable term positions...
However, I would recommend not storing the text of the document in the field unless you absolutely need it to return back to you in the search results... Setting the store to 'NO' might help reduce the size of your index a bit.
Lucene.Net.Documents.Field fldContent =
new Lucene.Net.Documents.Field("content",
File.ReadAllText(#"Documents\100.txt"),
Lucene.Net.Documents.Field.Store.NO,
Lucene.Net.Documents.Field.Index.TOKENIZED,
Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);

You have not enabled the term positions. Creating field as follows should solve your problem.
Lucene.Net.Documents.Field fldContent =
new Lucene.Net.Documents.Field("content",
File.ReadAllText(#"Documents\100.txt"),
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.TOKENIZED,
Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string