I want to implement a simple search query using Elasticsearch.
I have two fields, "title" and "description", that I would like to match the search term against. My current search body is shown below. How can I make the search prioritize title matches while still including description matches (with lower priority)? Thanks in advance.
body = {
size: 200,
from: 0,
query: {
prefix: {
title: searchTerm
}
}
}
You have to use a constant score query with a score of 0 for the "other" field. No other boost or function score setup will reliably rank one field above another, because the score also depends on other factors such as text length. A constant boost (unless it is very large) therefore cannot guarantee the behaviour you want.
By using a constant score for a field you can control its score manually, like so:
{
size: 200,
from: 0,
query: {
bool: {
should: [
{
prefix: {
title: searchTerm
}
},
{
constant_score: {
filter: {
prefix: {
description: searchTerm
}
},
boost: 0
}
},
]
}
}
}
If you set the description boost to more than 0, the score becomes the combined score of both fields. That lets you rank documents that have the prefix in both fields above ones that have it only in the title field.
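To make the effect of the description boost easier to experiment with, here is a minimal sketch that wraps the body above in a function. The helper name `buildPrefixSearch` and its `descriptionBoost` parameter are made up for illustration; the query structure is the one from the answer.

```javascript
// Hypothetical helper that builds the search body shown above.
// descriptionBoost = 0 keeps description-only matches in the results without
// letting them add to the score; a value above 0 adds a fixed bonus instead.
function buildPrefixSearch(searchTerm, descriptionBoost = 0) {
  return {
    size: 200,
    from: 0,
    query: {
      bool: {
        should: [
          // Scored normally, so title matches drive the ranking.
          { prefix: { title: searchTerm } },
          {
            constant_score: {
              filter: { prefix: { description: searchTerm } },
              boost: descriptionBoost
            }
          }
        ]
      }
    }
  };
}

// Example: give documents that also match in the description a small bonus.
const searchBody = buildPrefixSearch("elas", 0.5);
console.log(JSON.stringify(searchBody, null, 2));
```

Calling `buildPrefixSearch(term)` with no second argument reproduces the body from the answer.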
You can use a combination of a bool/should clause and the boost parameter:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"title": {
"value": "searchterm",
"boost": 4
}
}
},
{
"prefix": {
"description": {
"value": "searchterm"
}
}
}
]
}
}
}
I'm hooking into an API which I have no control over and would like to extract all recipe entries that match certain criteria. For the most part, this is a simple 'does value equal N' check; however, for one of these criteria I also have to check whether another value is greater than 0.
This code works absolutely fine:
should: [
{ match: { 'ItemResult.ItemAction.Type': 853 } },
{ match: { 'ItemResult.ItemAction.Type': 1013 } },
{ match: { 'ItemResult.ItemAction.Type': 1322 } },
{ match: { 'ItemResult.ItemAction.Type': 5845 } }
]
It gives me all recipe entries whose 'ItemResult.ItemAction.Type' is either 853, 1013, 1322 or 5845, as expected. The problem comes when I add this new, more complex condition to my should array:
range: {
'ItemResult.ItemAction.Type': { gte: 5100, lte: 5300 },
'ItemResult.ItemAction.Data0': { gt: 0 }
}, ...
Each individual range property works fine, but naturally I'm getting the following error when both are combined like they are above:
"reason":"[range] query doesn\'t support multiple fields
Is there a way I can have both ranges considered within the same query without impacting the other ItemResult.ItemAction.Type values?
Obviously I can hook into the API a second time to perform the more complex criterion search, but I'm wondering if I can do it all in the one call.
{
"query": {
"bool": {
"must": [
{
"range": {
"ItemResult.ItemAction.Type": {
"gte": 5100,
"lte": 5300
}
}
},
{
"range": {
"ItemResult.ItemAction.Data0": {
"gt": 0
}
}
}
]
}
}
}
A range query in Elasticsearch doesn't support multiple fields, but you can combine several single-field range queries in a bool query, as above, to apply multiple range conditions.
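To keep everything in one call, the two-range bool query can itself be one entry in the original should array. The sketch below assumes the field names and values from the question; `minimum_should_match` is added only to make the "at least one alternative" intent explicit.

```javascript
// Type values from the question that are matched by simple equality.
const exactTypes = [853, 1013, 1322, 5845];

const recipeQuery = {
  bool: {
    should: [
      // One match clause per exact type value.
      ...exactTypes.map((type) => ({
        match: { "ItemResult.ItemAction.Type": type }
      })),
      // The complex criterion: both ranges must hold for the same document.
      {
        bool: {
          must: [
            { range: { "ItemResult.ItemAction.Type": { gte: 5100, lte: 5300 } } },
            { range: { "ItemResult.ItemAction.Data0": { gt: 0 } } }
          ]
        }
      }
    ],
    // A document qualifies if at least one of the alternatives matches.
    minimum_should_match: 1
  }
};

console.log(JSON.stringify(recipeQuery, null, 2));
```

Because the two ranges sit inside their own bool/must, the Data0 condition applies only to the 5100 to 5300 types and leaves the four exact-match clauses untouched.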
I'm struggling with something that should be easy, but it's making no sense to me. I have these two documents in a database:
{ "name": "foo", "type": "typeA" },
{ "name": "bar", "type": "typeB" }
And I'm posting this to _find:
{
"selector": {
"type": "typeA"
},
"sort": ["name"]
}
Which works as expected, but I get a warning that there's no matching index. I've tried posting various combinations of the following to _index, which makes no difference:
{
"index": {
"fields": ["type"]
}
}
{
"index": {
"fields": ["name"]
}
}
{
"index": {
"fields": ["name", "type"]
}
}
If I remove the sort by name and only index the type, it works fine except that it's not sorted. Is this a limitation of CouchDB's Mango implementation, or am I missing something?
Using a view and map function works fine but I'm curious what mango is/isn't doing here.
With just the type index, I think it will normally be almost as efficient unless you have many documents of each type (as it has to do the sorting stage in memory.)
But since fields are ordered, it would be necessary to do:
{
"index": {
"fields": ["type", "name"]
}
}
to have a contiguous slice of this index for each type that is already ordered by name. But the query planner may not determine that this index applies.
As an example, the current pouchdb-find (which should be similar) needs the more complicated but equivalent query:
{
selector: {type: 'typeA', name: {$gte: null} },
sort: ['type','name']
}
to choose this index and build a plan that doesn't resort to building in memory for any step.
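The point about field order can be illustrated with a toy model of an index as a list of entries sorted by their key array. This is only a sketch: plain string comparison stands in for CouchDB's real collation, and the two extra documents are made up to make the interleaving visible.

```javascript
// Toy index: one entry per document, sorted by the key fields in order.
function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((a, b) => {
      for (let i = 0; i < a.key.length; i++) {
        if (a.key[i] < b.key[i]) return -1;
        if (a.key[i] > b.key[i]) return 1;
      }
      return 0;
    });
}

const docs = [
  { name: "foo", type: "typeA" },
  { name: "bar", type: "typeB" },
  { name: "abc", type: "typeA" }, // made-up extra document
  { name: "zed", type: "typeB" }  // made-up extra document
];

// ["type", "name"]: all typeA entries are adjacent and already ordered by name.
const byTypeName = buildIndex(docs, ["type", "name"]).map((e) => e.key.join("/"));
// ["name", "type"]: typeA entries are interleaved with typeB entries.
const byNameType = buildIndex(docs, ["name", "type"]).map((e) => e.key.join("/"));

console.log(byTypeName); // [ 'typeA/abc', 'typeA/foo', 'typeB/bar', 'typeB/zed' ]
console.log(byNameType); // [ 'abc/typeA', 'bar/typeB', 'foo/typeA', 'zed/typeB' ]
```

With the first ordering, a query for one type can read a single contiguous range and return it as is; with the second it would have to skip entries or sort afterwards.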
I'm trying to implement relevance feedback for Elastic Search (Elastic.co).
I'm aware of boosting queries, which allow for the specification of positive and negative terms, with the idea being to discount the negative terms without excluding them, as would be the case in a boolean must_not.
However, I'm trying to achieve tiered boosting, of both positive and negative terms.
That is, I want to take a list of binned positive and negative terms and generate a query such that there are different positive and negative boost tiers, each containing their own query terms.
Something like (pseudo query):
query{
{
terms: [very relevant terms]
pos_boost: 3
}
{
terms: [relevant terms]
pos_boost: 2
}
{
terms: [irrelevant terms]
neg_boost: 0.6
}
{
terms: [very irrelevant terms]
neg_boost: 0.3
}
}
My question is whether or not this can be achieved with nested boosting queries, or if I'm better off with multiple should clauses.
My concern is that I'm not sure if a boost of 0.2 in the should clause of a bool query still gives the document a positive increase in the score or not, as I want to discount the document, rather than provide any increase in score.
With boosting queries, the concern is that I can't control the degree to which positive terms are weighted.
Any help, or suggestions for other implementations, would be greatly appreciated. (What I really wanted to do was create a language model for relevant documents and use that to rank, but I don't see how that can easily be achieved in elastic.)
It seems you can do this with a bool query, tweaking the boost value of each should clause:
POST so/boost/ {"text": "apple computers"}
POST so/boost/ {"text": "apple pie recipe"}
POST so/boost/ {"text": "apple tree garden"}
POST so/boost/ {"text": "apple iphone"}
POST so/boost/ {"text": "apple company"}
GET so/boost/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"text": "apple"
}
}
],
"should": [
{
"match": {
"text": {
"query": "pie",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "tree",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "iphone",
"boost": -0.5
}
}
}
]
}
}
}
Alternately, if you want to encode your language model into your collection at index-time, you can try the approach described here: Elasticsearch: Influence scoring with custom score field in document
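On the tiered part of the question: a should clause with a boost of 0.2 still adds a (smaller) positive amount to the score, so it cannot discount a document, and to my knowledge recent Elasticsearch versions reject negative boost values. One option is to nest boosting queries, one per negative tier, around a bool query that holds the positive tiers. The sketch below is an untested assumption about how that could be assembled; the function name, the tier objects, and the example terms are all hypothetical, and a terms query matches exact indexed terms only.

```javascript
// Positive tiers become boosted should clauses on top of a base query.
// Each negative tier wraps the query built so far in a boosting query
// with its own negative_boost multiplier.
function tieredBoostQuery(baseQuery, field, positiveTiers, negativeTiers) {
  let query = {
    bool: {
      must: [baseQuery],
      should: positiveTiers.map((tier) => ({
        terms: { [field]: tier.terms, boost: tier.boost }
      }))
    }
  };
  for (const tier of negativeTiers) {
    query = {
      boosting: {
        positive: query,
        negative: { terms: { [field]: tier.terms } },
        negative_boost: tier.negativeBoost
      }
    };
  }
  return query;
}

const feedbackQuery = tieredBoostQuery(
  { match: { text: "apple" } },
  "text",
  [
    { terms: ["pie", "recipe"], boost: 3 }, // very relevant
    { terms: ["tree"], boost: 2 }           // relevant
  ],
  [
    { terms: ["company"], negativeBoost: 0.6 }, // irrelevant
    { terms: ["iphone"], negativeBoost: 0.3 }   // very irrelevant
  ]
);

console.log(JSON.stringify(feedbackQuery, null, 2));
```

A document that matches terms from both negative tiers has its score multiplied by both factors (0.6 and 0.3 here), which may or may not be what you want.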
To boost Elasticsearch documents (a priority-based search query) by a custom, variable boost value at query time, i.e. conditional boosting:
Java Coding example:
// First group: documents matching this key type and value get a boost of 2.
customerKeySearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.type", "xxx"));
customerTypeSearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.keyValues.value", "xxxx"));
keyValueQuery = QueryBuilders.boolQuery().must(customerKeySearch).must(customerTypeSearch).boost(2f);
// Second group: same structure, boosted by 6.
customerKeySearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.type", "xxx"));
customerTypeSearch = QueryBuilders.constantScoreQuery(QueryBuilders.termQuery("keys.keyValues.value", "xxxx"));
keyValueQuery = QueryBuilders.boolQuery().must(customerKeySearch).must(customerTypeSearch).boost(6f);
Description and search query:
Elasticsearch has its own internal score calculation, including a coordination factor for bool queries, so we need to disable that factor by calling disableCoord(true) on the bool query in Java for the custom boost to take effect.
The following bool query is the resulting query; it boosts documents in the Elasticsearch index according to the boost values.
{
"bool" : {
"should" : [ {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "XXXX"
}
}
}
} ],
"boost" : 2.0
}
}, {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "500072388315"
}
}
}
} ],
"boost" : 6.0
}
}, {
"bool" : {
"must" : [ {
"constant_score" : {
"query" : {
"term" : {
"keys.type" : "XXX"
}
}
}
}, {
"constant_score" : {
"query" : {
"term" : {
"keys.keyValues.value" : "XXXXXX"
}
}
}
} ],
"boost" : 10.0
}
} ],
"disable_coord" : true
}
}
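When the value/boost pairs are only known at query time, the same structure can be generated from a list. This is a sketch with a hypothetical helper name and input shape. Note two deliberate differences from the query above, both assumptions about your Elasticsearch version: newer versions, to my knowledge, expect `filter` rather than `query` inside `constant_score` and no longer accept `disable_coord`.

```javascript
// Builds one boosted bool/must group per { value, boost } pair.
function boostedKeyQuery(keyType, valueBoosts) {
  return {
    bool: {
      should: valueBoosts.map(({ value, boost }) => ({
        bool: {
          must: [
            { constant_score: { filter: { term: { "keys.type": keyType } } } },
            { constant_score: { filter: { term: { "keys.keyValues.value": value } } } }
          ],
          boost
        }
      }))
    }
  };
}

// Placeholder values, mirroring the query above.
const priorityQuery = boostedKeyQuery("XXX", [
  { value: "XXXX", boost: 2 },
  { value: "500072388315", boost: 6 },
  { value: "XXXXXX", boost: 10 }
]);

console.log(JSON.stringify(priorityQuery, null, 2));
```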
I'm building a leaderboard with Elasticsearch. I'd like to query all documents that have more points than a given amount, using the following query:
{
"constant_score" : {
"filter" : {
"range" : {
"totalPoints" : {
"gt": 242
}
}
}
}
}
This works perfectly -- elasticsearch appropriately returns all documents with points greater than 242. However, all I really need is the count of elements matching this query. Since I'm sending the result over the network, it would be helpful if the query simply returned the count, as opposed to all of the documents matching the filter.
How do I get elasticsearch to only report the count of documents matching the filter?
EDIT: I've learned that what I'm looking for is setting search_type to count. However, I'm not sure how to do this with elastic.js. Any noders willing to pitch in their advice?
You can use the query type count for exactly that purpose:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#count
This is an example that should help you:
GET /mymusic/itunes/_search?search_type=count
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"year": {
"gt": 2000
}
}
}
}
}
}
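As far as I know, the count search type was deprecated and later removed, so on newer versions the same number is usually obtained from the _count endpoint (or from a normal search with size set to 0). A minimal sketch, assuming the totalPoints field from the question; the helper names and the sample response are made up, and no particular client library is implied.

```javascript
// Body for POST /<index>/_count: just a query, no size or from.
function countBody(field, threshold) {
  return { query: { range: { [field]: { gt: threshold } } } };
}

// The _count response carries the number in its top-level "count" property.
function readCount(response) {
  return response.count;
}

const leaderboardCountBody = countBody("totalPoints", 242);
console.log(JSON.stringify(leaderboardCountBody));

// Made-up response, only to show which property to read.
const sampleCountResponse = { count: 17, _shards: { total: 5, successful: 5, failed: 0 } };
console.log(readCount(sampleCountResponse));
```

Only the count travels over the network, which is what the question asks for.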
How do I search for all unique values of a given field with Elasticsearch?
I have a query along the lines of select full_name from authors, so that I can display the list to users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
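In more recent Elasticsearch versions (5.x onward, as far as I know), the "index it two ways" idea is usually written as a text field with a keyword sub-field instead of not_analyzed. The fragment below is a sketch of that field-level mapping, which would sit under the mapping's properties, plus the aggregation that uses it. The aggregation name is arbitrary.

```javascript
// Field-level mapping fragment: analyzed for search, plus an untokenized
// "keyword" sub-field for aggregations and sorting.
const fullNameMapping = {
  full_name: {
    type: "text",
    fields: {
      keyword: { type: "keyword" }
    }
  }
};

// Aggregate on the sub-field so each bucket is a whole name, not a token.
const distinctNamesRequest = {
  size: 0,
  aggs: {
    full_names: { terms: { field: "full_name.keyword" } }
  }
};

console.log(JSON.stringify({ fullNameMapping, distinctNamesRequest }, null, 2));
```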
For Elasticsearch 1.0 and later, you can use a terms aggregation to do this.
Query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
},
]
}
}
}
see Elasticsearch terms aggregations.
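Turning that response into a plain list for a form is a one-liner. A small sketch, using the sample response shown above; the function name is made up.

```javascript
// Pulls the distinct values out of a terms aggregation response.
function uniqueValues(response, aggName) {
  return response.aggregations[aggName].buckets.map((bucket) => bucket.key);
}

// Same shape as the response above.
const sampleAggResponse = {
  aggregations: {
    full_name: {
      buckets: [
        { key: "Ken", doc_count: 10 },
        { key: "Jim Gray", doc_count: 10 }
      ]
    }
  }
};

console.log(uniqueValues(sampleAggResponse, "full_name")); // [ 'Ken', 'Jim Gray' ]
```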
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in elastic search :
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the full_name text field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
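For Solution 2, the request bodies can be sketched as follows. This only builds the bodies and reads the cursor; the helper names are hypothetical, and the field is assumed to be a keyword field such as full_name.keyword so that it can be sorted on.

```javascript
// Builds the body for one page, sorted by the field whose values we collect.
// afterValue is the sort value of the last hit from the previous page.
function pageRequest(field, pageSize, afterValue) {
  const body = { size: pageSize, sort: [{ [field]: "asc" }] };
  if (afterValue !== undefined) {
    body.search_after = [afterValue];
  }
  return body;
}

// Reads the cursor for the next page, or undefined when the page is empty.
function lastSortValue(response) {
  const hits = response.hits.hits;
  return hits.length > 0 ? hits[hits.length - 1].sort[0] : undefined;
}

const firstPage = pageRequest("full_name.keyword", 1000);
console.log(JSON.stringify(firstPage));

// Made-up response fragment, only to show where the cursor comes from.
const samplePage = {
  hits: { hits: [{ sort: ["Brian Kernighan"] }, { sort: ["Charles Dickens"] }] }
};
const nextPage = pageRequest("full_name.keyword", 1000, lastSortValue(samplePage));
console.log(JSON.stringify(nextPage));
```

The loop around these helpers would send each body to the search endpoint, add the returned values to a Set, and stop when a page comes back empty. Sorting on a single non-unique field normally calls for a tiebreaker; it is left out here on the assumption that skipping further documents with the same value is acceptable when only distinct values are wanted.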