Boosting a specific field value in Elasticsearch

Hi, I need to boost documents based on a particular value of a field. My documents contain a field called region, and I need to boost documents depending on the value in that field.
These are my documents:
{
"title":"INOX: Malleshwaram - Mantri Square",
"region":"Bangalore"
}
{
"title":"INOX: Bund Garden Road",
"region":"Pune"
}
{
"title":"INOX: Glomax Mall, Kharghar",
"region":"Mumbai"
}
I tried adding a rescore query to my search request, which looks like this:
"rescore" : {
"query" : {
"score_mode":"total",
"query_weight" : 2.5,
"rescore_query_weight" : 0.5,
"rescore_query" : {
"match" : {
"region" : {
"query" : "mumbai",
"slop" : 2
}
}
}
}
}
}
But it's not working as required. Is there any way to solve this?
Thanks in advance!

Why rescoring? All you need is boosting. Depending on the query type you are using, you can boost a field directly, for example in a query_string query:
"query_string": {
"fields":["region^56"],
"use_dis_max" : true,
"query": "mumbai"
}
where ^56 is the boost value.
You can also use the boosting query described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
If you are using a bool query, you can set boost like this to boost all of its clauses:
{
"bool" : {
"must" : {
"term" : { "region" : "mumbai" }
},
"boost" : 25.0
}
}
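Note that a term clause in must restricts the results to documents from that region. If the goal is to keep all matching documents and only rank one region higher, one option is to move the region clause into should. The following is a minimal sketch in Python that builds such a request body as a plain dict; the function name region_boost_query, the title search text, and the boost value of 5.0 are illustrative assumptions, not part of the original question.

```python
import json

def region_boost_query(text, region, boost=5.0):
    """Build a bool query: `must` decides which documents match,
    `should` only adds score for documents in the preferred region."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "should": [
                    {"match": {"region": {"query": region, "boost": boost}}}
                ],
            }
        }
    }

# Hypothetical usage: search titles for "INOX", prefer the Mumbai region.
body = region_boost_query("INOX", "mumbai")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request; the boost value usually needs tuning against real data.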

Related

What if some documents don't have a field that is part of an index?

A collection has an index involving field_A, but field_A is not required. What happens if some documents do not have this field? Will the index still work for documents that do have it?
Yes, it works. Here is a test:
db.collection.createIndex({ field_A: 1 });
for (let i = 0; i < 100; i++)
db.collection.insertOne({ field_B: i });
db.collection.stats(1024).indexSizes
{ "_id_" : 20, "field_A_1" : 20 }
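You can see that index field_A_1 has a size of 20 KiB even though no document contains field_A. This differs from most relational DBMSs, where such an index would have a size of zero.
The index is also used by your query if the query includes the field (the first query below does not, so it falls back to a collection scan):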
db.collection.find({ field_B: 1 }).explain().queryPlanner.winningPlan;
{
"stage" : "COLLSCAN",
"filter" : {
"field_B" : {
"$eq" : 1
}
}
}
db.collection.find({ field_A: null, field_B: 1 }).explain().queryPlanner.winningPlan;
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"field_A" : 1
},
"indexName" : "field_A_1",
"indexBounds" : {
"field_A" : [
"[undefined, undefined]",
"[null, null]"
]
}
}
}
Yes, the index will work for the documents that have the field, but you may want to look at sparse or partial indexes, which add some optimisation in certain cases.
P.S.
In a regular index, a document that is missing the indexed field is indexed as if the field were null, so if you search by field_A: null you will find both the documents missing the field and those where it is explicitly null.
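The difference between a regular and a sparse index described above can be illustrated with a toy model in Python. This is only a sketch of the behaviour, not how MongoDB implements indexes; the function index_entries and the sample documents are made up for illustration.

```python
def index_entries(docs, field, sparse=False):
    """Return the (key, position) pairs a single-field index would hold.
    A regular index stores a null key for documents missing the field;
    a sparse index skips those documents entirely."""
    entries = []
    for pos, doc in enumerate(docs):
        if field in doc:
            entries.append((doc[field], pos))
        elif not sparse:
            entries.append((None, pos))
    return entries

docs = [{"field_A": 7, "field_B": 0}, {"field_B": 1}, {"field_A": None}]
regular = index_entries(docs, "field_A")
sparse = index_entries(docs, "field_A", sparse=True)

# A null lookup in the regular index finds the missing field and the explicit null.
null_hits = [pos for key, pos in regular if key is None]
print(null_hits)    # [1, 2]
print(len(sparse))  # 2: the document without field_A has no entry
```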

MongoDB: How to find documents by value?

Two documents contain ObjectId("6148a371c13a6a0be492ebf4"):
Document 1
{
"_id" : ObjectId("6144f66fb9543917f96fc"),
"refId" : "ford",
"template" : "6144f61cb96d772317f96f9",
"fieldValues" : {
"PDV" : [
"6126938cd24a8aa3d37b4992",
ObjectId("6148a371c13a6a0be492ebf4")
]
},
"group" : ObjectId("6144f66fb96d7731917f96fd"),
"createdAt" : ISODate("2021-09-17T20:11:27.440Z"),
"updatedAt" : ISODate("2021-09-20T15:06:26.146Z"),
"__v" : 0
}
Document 2
{
"_id" : ObjectId("6144f66fb96d77rr3217f96fc"),
"refId" : "CCM",
"template" : "6144f613296d7731917f96f9",
"fieldValues" : {
"DDB" : [
"6126938cd2448aa3d37b4992",
"5443938cd2448aa3d37b4992",
ObjectId("6148a371c13a6a0be492ebf4"),
]
},
"group" : ObjectId("6144f66fb96de431917f96fd"),
"createdAt" : ISODate("2021-09-17T20:11:27.440Z"),
"updatedAt" : ISODate("2021-09-20T15:06:26.146Z"),
"__v" : 0
}
The ObjectId we are looking for is always inside fieldValues, but the key is not always PDV or DDB; it can have a different name in every document.
So we can't use this type of query:
db.getCollection('products').find({"fieldValues.PDV":ObjectId('6148a371c13a6a0be492ebf4')})
PS. The query has to run entirely in the database. We can't afford to fetch all products and filter them on the backend, because there may be millions of products.
You can use this one:
db.collection.aggregate([
{
$set: {
kv: { $first: { $objectToArray: "$fieldValues" } }
}
},
{ $match: { "kv.v": ObjectId("6148a371c13a6a0be492ebf4") } },
{ $unset: "kv" }
])
Mongo Playground
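The pipeline above takes only the first key of fieldValues ($first). If a document can have several keys under fieldValues, a variant that keeps every key/value pair may be closer to what is needed. Below is a minimal Python sketch that builds such a pipeline as plain dicts, in the form a driver like pymongo accepts. The function name is hypothetical, and a string stands in for the ObjectId to keep the sketch dependency-free; with a real driver you would pass an ObjectId instance instead.

```python
def find_by_nested_value_pipeline(value, parent="fieldValues"):
    """Build an aggregation pipeline matching documents where any array
    under `parent` contains `value`, whatever the key is called."""
    return [
        # Turn {"PDV": [...], ...} into [{"k": "PDV", "v": [...]}, ...]
        {"$set": {"kv": {"$objectToArray": f"${parent}"}}},
        # "kv.v" reaches into every k/v pair, so any key can match
        {"$match": {"kv.v": value}},
        # Drop the helper field from the output
        {"$unset": "kv"},
    ]

# Placeholder value: a string here, an ObjectId in real use.
pipeline = find_by_nested_value_pipeline("6148a371c13a6a0be492ebf4")
print(pipeline)
```

Because the match runs on a computed field, this still scans the whole collection; it only moves the work into the database.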
db.products.find({'_id': ObjectId("6148a371c13a6a0be492ebf4")})
The mistake in your code is that you used key instead of _id.
This way of writing it is much easier on the fingers though.
You'd think a solution like this would work but one reason why this may not is because you're trying to use === on an object. If you refer to this thread, it might help if you use .equals() instead of ===.

Elastic.co/Elastic search - Relevance feedback with multiple Boosting Queries

I'm trying to implement relevance feedback for Elastic Search (Elastic.co).
I'm aware of boosting queries, which allow for the specification of positive and negative terms, with the idea being to discount the negative terms, while not excluding them as would be the case in a boolean must_not.
However, I'm trying to achieve tiered boosting, of both positive and negative terms.
That is, I want to take a list of binned positive and negative terms and generate a query such that there are different positive and negative boost tiers, each containing their own query terms.
Something like this (pseudo-query):
query{
{
terms: [very relevant terms]
pos_boost: 3
}
{
terms: [relevant terms]
pos_boost: 2
}
{
terms: [irrelevant terms]
neg_boost: 0.6
}
{
terms: [very irrelevant terms]
neg_boost: 0.3
}
}
My question is whether or not this can be achieved with nested boosting queries, or if I'm better off with multiple should clauses.
My concern is that I'm not sure if a boost of 0.2 in the should clause of a bool query still gives the document a positive increase in the score or not, as I want to discount the document, rather than provide any increase in score.
With boosting queries, the concern is that I can't control the degree to which positive terms are weighted.
Any help, or suggestions for other implementations, would be greatly appreciated. (What I really wanted to do was create a language model for relevant documents and use that to rank, but I don't see how that can easily be achieved in elastic.)
It seems you can use a bool query and tweak the boost value of each should clause:
POST so/boost/ {"text": "apple computers"}
POST so/boost/ {"text": "apple pie recipe"}
POST so/boost/ {"text": "apple tree garden"}
POST so/boost/ {"text": "apple iphone"}
POST so/boost/ {"text": "apple company"}
GET so/boost/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"text": "apple"
}
}
],
"should": [
{
"match": {
"text": {
"query": "pie",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "tree",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "iphone",
"boost": -0.5
}
}
}
]
}
}
}
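Two caveats on the query above. A should clause with a boost between 0 and 1 still adds to the score, so it cannot discount a document. Recent Elasticsearch versions (7.0 and later, as far as I know) also reject negative boost values. One way to get tiers on both sides is to sum the positive tiers in a bool query and wrap the result in one boosting query per negative tier. The Python sketch below builds such a request body; the helper names and tier values are illustrative assumptions, not an established recipe.

```python
def match_clause(field, terms, boost=None):
    """A match query over `field` for the given terms, optionally boosted."""
    clause = {"query": " ".join(terms)}
    if boost is not None:
        clause["boost"] = boost
    return {"match": {field: clause}}

def tiered_query(base, field, positive_tiers, negative_tiers):
    """positive_tiers: list of (terms, boost), summed via bool/should.
    negative_tiers: list of (terms, negative_boost); each one wraps the
    query built so far in one more `boosting` layer."""
    query = {
        "bool": {
            "must": [base],
            "should": [match_clause(field, t, b) for t, b in positive_tiers],
        }
    }
    for terms, negative_boost in negative_tiers:
        query = {
            "boosting": {
                "positive": query,
                "negative": match_clause(field, terms),
                "negative_boost": negative_boost,
            }
        }
    return {"query": query}

body = tiered_query(
    base={"match": {"text": "apple"}},
    field="text",
    positive_tiers=[(["pie"], 3), (["tree"], 2)],
    negative_tiers=[(["iphone"], 0.6), (["company"], 0.3)],
)
```

With this nesting, a document that matches several negative tiers has its score multiplied by each matching negative_boost in turn (0.6 × 0.3 here), which may or may not be the discount you want.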
Alternately, if you want to encode your language model into your collection at index-time, you can try the approach described here: Elasticsearch: Influence scoring with custom score field in document
To boost Elasticsearch documents (priority-based search) using a custom, variable boost value at query time, i.e. conditional boosting:
Java code example:
// Tier with boost 2
customerKeySearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.type", "xxx"));
customerTypeSearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.keyValues.value", "xxxx"));
keyValueQuery = QueryBuilders.boolQuery().must(customerKeySearch).must(customerTypeSearch).boost(2f);
// Tier with boost 6
customerKeySearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.type", "xxx"));
customerTypeSearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.keyValues.value", "xxxx"));
keyValueQuery = QueryBuilders.boolQuery().must(customerKeySearch).must(customerTypeSearch).boost(6f);
Description and search query:
Elasticsearch has its own internal score calculation, so we need to switch part of it off by calling disableCoord(true) on the bool query in Java; otherwise the custom boost values do not take effect as intended.
The following bool query is the query actually run to boost documents in the index according to the boost value:
{
"bool" : {
"should" : [ {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "XXXX"
}
}
}
} ],
"boost" : 2.0
}
}, {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "500072388315"
}
}
}
} ],
"boost" : 6.0
}
}, {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "XXXXXX"
}
}
}
} ],
"boost" : 10.0
}
} ],
"disable_coord" : true
}
}

Elasticsearch two level sort in aggregation list

Currently I am sorting aggregation buckets by document score, so that the most relevant items come first in the aggregation list, as below:
{
'aggs' : {
'guilds' : {
'terms' : {
'field' : 'guilds.title.original',
'order' : [{'max_score' : 'desc'}],
'aggs' : {
'max_score' : {
'script' : 'doc.score'
}
}
}
}
}
}
I want to add a second sort criterion to the order array of the terms aggregation, but when I do it like this:
{
'order' : [{'max_score' : 'desc'}, {'_count' : 'desc'}]
}
the second sort does not work. For example, when all the scores are equal it should then sort by count, but it does not.
As a correction to Andrei's answer ... to order aggregations by multiple criteria, you MUST create an array as shown in Terms Aggregation: Order and you MUST be using ElasticSearch 1.5 or later.
So, for Andrei's answer, the correction is:
"order" : [ { "max_score": "desc" }, { "_count": "desc" } ]
As Andrei has it, ES will not complain but it will ONLY use the last item listed in the "order" element.
I don't know how your 'aggs' is even working because I tried it and I had parsing errors in three places: "order" is not allowed to have that array structure, your second "aggs" should be placed outside the first "terms" aggs and, finally, the "max_score" aggs should have had a "max" type of "aggs". In my case, to make it work (and it does actually order properly), it should look like this:
"aggs": {
"guilds": {
"terms": {
"field": "guilds.title.original",
"order": {
"max_score": "desc",
"_count": "desc"
}
},
"aggs": {
"max_score": {
"max": {
"script": "doc.score"
}
}
}
}
}
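The ordering the question is after (max_score descending, ties broken by _count descending) behaves like the plain Python sort below. This only illustrates the intended result, not how Elasticsearch sorts buckets internally; the bucket keys and numbers are made up.

```python
buckets = [
    {"key": "smiths", "doc_count": 4, "max_score": 1.5},
    {"key": "bakers", "doc_count": 9, "max_score": 1.5},
    {"key": "masons", "doc_count": 2, "max_score": 2.0},
]

def order_buckets(buckets):
    """Sort by max_score descending, then by doc_count descending."""
    return sorted(buckets, key=lambda b: (-b["max_score"], -b["doc_count"]))

ordered = [b["key"] for b in order_buckets(buckets)]
print(ordered)  # ['masons', 'bakers', 'smiths']
```

The two buckets with equal max_score are separated by doc_count, which is the job of the second entry in the order array.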

How to geo_distance filter against multiple location fields in Elasticsearch

I have an arbitrary # of location data points per document (anywhere up to 80). I want to perform a geo_distance filter against these locations. The elasticsearch docs claim that:
The geo_distance filter can work with multiple locations / points per document.
Once a single location / point matches the filter, the document will be included in the filter.
It's never made clear how to achieve this. I assume that you have to define the number of locations ahead of time, so that your indexed documents contain nested fields like these:
{
"pin" : {
"location" : {
"lat" : 40.12,
"lon" : -71.34
}
}
}
{
"alt_pin" : {
"location" : {
"lat" : 41.12,
"lon" : -72.34
}
}
}
I assume that you would then filter against pin.location and alt_pin.location somehow.
What if I had an arbitrary number of locations (pin1, pin2, pin3, ...)? Can I do something like this:
"pin" : {
"locations" : [{
"lat" : 41.12,
"lon" : -72.34
}, {
"lat" : 41.12,
"lon" : -72.34
}]
}
}
Would some variation on that work? Maybe using geo_hashes instead of lat/lng coordinates?
Multiple location values can be represented as an array of location fields. Try this:
{
"pin": [
{
"location" :{
"lat": 40.12,
"lon": -71.34
}
},
{
"location" :{
"lat": 41.12,
"lon": -72.34
}
}
]
}
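The "one matching point is enough" rule quoted from the docs can be sketched in plain Python. This is an illustration of the semantics under stated assumptions, not Elasticsearch's implementation: it uses the haversine formula with a mean Earth radius of 6371 km, and the function names and test coordinates are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def matches_geo_distance(doc, center, max_km):
    """True if at least one location in doc["pin"] lies within max_km."""
    return any(
        haversine_km(center["lat"], center["lon"],
                     pin["location"]["lat"], pin["location"]["lon"]) <= max_km
        for pin in doc["pin"]
    )

doc = {"pin": [{"location": {"lat": 40.12, "lon": -71.34}},
               {"location": {"lat": 41.12, "lon": -72.34}}]}
near_second = {"lat": 41.0, "lon": -72.3}  # roughly 14 km from the second pin
far_away = {"lat": 10.0, "lon": 10.0}

print(matches_geo_distance(doc, near_second, 50))  # True
print(matches_geo_distance(doc, far_away, 50))     # False
```

In Elasticsearch itself, the filter would presumably target the pin.location field, which needs to be mapped as geo_point for this to work.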

Resources