How to send boost values while querying in Elasticsearch?

I need to add a search relevancy score to a particular query in Elasticsearch.
How can I add a boost to a particular field while querying?

If you look at http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html,
it shows how to add boosting to a field at query time. For example:
{
  "query_string" : {
    "fields" : ["content", "name^5"],
    "query" : "this AND that OR thus",
    "use_dis_max" : true
  }
}
Note, however, that this has the potential to slow down your query considerably, so I would test it first or use boosting in the mapping instead.
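For reference, index-time boosting in the mapping looked roughly like this on the Elasticsearch versions this question targets (type and field names are placeholders; later releases deprecated index-time boosts in favour of query-time boosts):
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "type": "string", "boost": 5 },
        "content": { "type": "string" }
      }
    }
  }
}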

Related

How can I use Azure Search with wildcards

I want to search on a field that has the value "14009-00080300", and I want to get a hit when searching on only part of it, for example "14009-000803".
Using this code I don't get any hits:
{
  "search": "\"14009-000803\"*",
  "count": true,
  "top": 10
}
Is there a way to make Azure Search behave like SQL's wildcard search (select * from table where col like '%abc%')?
You can get your desired result by performing a full query with Lucene syntax (as noted by Sumanth BM). The trick is to do a regex search. Modify your query params like so:
{
  "queryType": "full",
  "search": "/.*searchterm.*/",
  "count": true,
  "top": 10
}
Replace 'searchterm' with what you are looking for, and Azure Search should return all matches from your index's searchable columns.
See Doc section: MS Docs on Lucene regular expression search
You can use the generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note that the Lucene query parser supports the use of these symbols with a single term, not a phrase.
For example, to find documents containing words with the prefix "note", such as "notebook" or "notepad", specify "note*".
Note
You cannot use a * or ? symbol as the first character of a search.
No text analysis is performed on wildcard search queries. At query time, wildcard query terms are compared against analyzed terms in the search index and expanded.
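As a sketch, the "note*" prefix search above could be sent as the following request body (values are illustrative):
{
  "queryType": "full",
  "search": "note*",
  "count": true,
  "top": 10
}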
SearchMode parameter considerations
The impact of searchMode on queries, as described in Simple query syntax in Azure Search, applies equally to the Lucene query syntax. Namely, searchMode in conjunction with NOT operators can result in query outcomes that might seem unusual if you aren't clear on the implications of how you set the parameter. If you retain the default, searchMode=any, and use a NOT operator, the operation is computed as an OR action, such that "New York" NOT "Seattle" returns all cities that are not Seattle.
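If you want NOT to act as an exclusion over the whole result set instead, set searchMode to all. A minimal request-body sketch (values are illustrative):
{
  "search": "\"New York\" NOT \"Seattle\"",
  "searchMode": "all",
  "count": true,
  "top": 10
}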
https://learn.microsoft.com/en-us/rest/api/searchservice/simple-query-syntax-in-azure-search
Reference: https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search#bkmk_wildcard

Mongoose speed up the search on multiple fields

I am working on a search feature over Mongoose documents where I have to search across 250,000 documents.
In this feature I have to add search indexes over multiple fields.
Some of the fields in the documents are strings,
some are multi-level objects.
I have indexed all the possible fields.
Locally I have 100,000 documents, and a search over them takes around 300-400 ms.
But when I search over them on the server, it takes around 10-15 seconds to respond.
The search query is built conditionally, but I am sharing a small code snippet.
{
  $and: [
    {
      $or: [
        { field1: { $regex: re } },
        { field2: { $regex: re } },
        { 'level1.level2.value': { $regex: re } }
      ]
    },
    {
      $and: [
        { lowAge: { $lte: parseInt(age) } },
        { highAge: { $gte: parseInt(age) } },
        {
          $or: [
            { gender: gender },
            { gender: 'N/A' }
          ]
        }
      ]
    }
  ]
}
Can someone advise me on how I can speed up the process on the server?
To speed this up further, you can use a text index.
But a text index comes with the following storage requirements and performance costs:
Text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
Text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
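As a rough sketch of the suggestion, assuming a Mongoose model named Person and a search string in searchTerm (both hypothetical, not taken from the question), a text index and a $text query could look like this:
// Only one text index is allowed per collection; list the string fields you search most often.
personSchema.index({ field1: 'text', field2: 'text', 'level1.level2.value': 'text' });

// Query with $text so MongoDB can use the text index instead of scanning with $regex.
const results = await Person.find({
  $text: { $search: searchTerm },
  lowAge: { $lte: parseInt(age) },
  highAge: { $gte: parseInt(age) },
  gender: { $in: [gender, 'N/A'] }
});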
Please see the below references
https://docs.mongodb.com/manual/core/index-text/
https://www.tutorialspoint.com/mongodb/mongodb_text_search.htm
Hope it helps!

Making ID as default search in Solr

I am a novice in Solr. I want to build documents just like the ones below.
{
  "id": "what",
  "count": "123"
},
{
  "id": "what is",
  "count": "134"
}
Here I used id as the term (string), which will have unique values. If I do indexing and searching on id, will it reduce the speed of searching in Solr, or is it not a good idea to make id the default search field? Any suggestions, please?
No, it will not have any real impact - as long as the field is actually defined as a StrField.
You should not have a unique id field that is defined as a TextField and where there are several filters and tokenizers applied.
Whether it's the default search field or not does not matter performance-wise; it only determines which field a query such as q=what is will be matched against.
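For reference, a minimal schema.xml sketch with id defined as a plain string field (this mirrors Solr's stock example schema; adjust names to your setup):
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="count" type="string" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>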

Retrieve analyzed tokens from ElasticSearch documents

Trying to access the analyzed/tokenized text in my ElasticSearch documents.
I know you can use the Analyze API to analyze arbitrary text according to your analysis modules. So I could copy and paste data from my documents into the Analyze API to see how it was tokenized.
This seems unnecessarily time consuming, though. Is there any way to instruct ElasticSearch to return the tokenized text in search results? I've looked through the docs and haven't found anything.
This question is a little old, but I think an additional answer is necessary.
With ElasticSearch 1.0.0 the Term Vector API was added, which gives you direct access to the tokens ElasticSearch stores under the hood on a per-document basis. The API docs are not very clear on this (it is only mentioned in the example), but in order to use the API you first have to indicate in your mapping definition that you want to store term vectors, via the term_vector property on each field.
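A rough sketch of what that could look like against an ElasticSearch 1.x cluster (index, type, and field names are placeholders):
curl -XPUT 'http://localhost:9200/my_index/my_type/_mapping' -d '{
  "my_type": {
    "properties": {
      "text": { "type": "string", "term_vector": "with_positions_offsets_payloads" }
    }
  }
}'

curl 'http://localhost:9200/my_index/my_type/1/_termvector?pretty=true'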
Have a look at this other answer: elasticsearch - Return the tokens of a field. Unfortunately, it requires re-analyzing the content of your field on the fly using the script provided.
It should be possible to write a plugin to expose this feature. The idea would be to add two endpoints:
one to read the Lucene TermsEnum, like the Solr TermsComponent does, which is useful for auto-suggestions too. Note that it wouldn't be per document, just every term in the index with its term frequency and document frequency (potentially expensive with a lot of unique terms);
one to read the term vectors if enabled, like the Solr TermVectorComponent does. This would be per document, but it requires storing the term vectors (you can configure that in your mapping) and also allows retrieving positions and offsets if enabled.
You may want to use scripting; however, scripting must be enabled on your server.
curl 'http://localhost:9200/your_index/your_type/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "script_fields": {
    "terms" : {
      "script": "doc[field].values",
      "params": {
        "field": "field_x.field_y"
      }
    }
  }
}'
The default setting for allowing scripts depends on the Elasticsearch version, so please check the official documentation.

Cross-field search in Lucene

Hi:
I have two documents:
title | body
Lucene In Action | A high-performance, full-featured text search engine library.
Lucene Practice | Use lucene in your application
Now, I search for "lucene performance" using:
private String[] f = { "title", "body"};
private Occur[] should = { Occur.SHOULD, Occur.SHOULD};
Query q = MultiFieldQueryParser.parse(Version.LUCENE_29, "lucene performance", f, should,new IKAnalyzer());
Then I get two hits:
"Lucene In Action" and "Lucene Practice".
However, I do not want "Lucene Practice" in the search result.
That is to say, I only want documents that contain all of my search terms to be returned; "Lucene Practice" does not contain the term "performance", so it should not be returned.
Any ideas?
Lucene cannot match across fields. That is to say, for the query "a b", it won't match "a" in title and "b" in body. For that you need to create another field, say, all_text, which has title and body both indexed.
Also, when you are searching for "lucene performance" I suppose you are looking for documents that have both terms, lucene as well as performance. By default, the boolean operator is OR. You need to specify the default operator as AND to match all the terms in the query. (Otherwise, in this case, the query "lucene performance" will start returning matches that merely talk about database performance.)
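A rough Java sketch of both suggestions, using the Lucene 2.9-era API from the question (the all_text field name is just a convention; imports omitted as in the original snippet):
// Index time: copy title and body into one catch-all field so a single query
// can require terms from either original field.
String title = "Lucene In Action";
String body = "A high-performance, full-featured text search engine library.";
Document doc = new Document();
doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("body", body, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("all_text", title + " " + body, Field.Store.NO, Field.Index.ANALYZED));

// Query time: parse against the catch-all field and require all terms (AND).
QueryParser parser = new QueryParser(Version.LUCENE_29, "all_text", new IKAnalyzer());
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query q = parser.parse("lucene performance");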
