ElasticSearch can't get multiple suggester values from the same document - node.js

Can you help me, please?
I have a problem with the Completion Suggester in Elasticsearch.
For example, I have this mapping:
PUT music
{
"mappings": {
"properties": {
"suggest": {
"type": "completion"
},
"title": {
"type": "keyword"
}
}
}
}
and index multiple suggestions for a document as follows:
PUT music/_doc/1?refresh
{
"suggest": [
{
"input": "Nirva test",
"weight": 10
},
{
"input": "Nirva hola",
"weight": 3
}
]
}
Querying: you can run this request in Kibana:
POST music/_search?pretty
{
"suggest": {
"song-suggest": {
"prefix": "nirv",
"completion": {
"field": "suggest"
}
}
}
}
In the result I retrieve only the first value, not both.
I did the test in the Kibana Dev Tools too, and this is the result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"song-suggest" : [
{
"text" : "nir",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}
}
Expected result:
"suggest" : {
"song-suggest" : [
{
"text" : "nirvana",
"offset" : 0,
"length" : 7,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
},
{
"text" : "nirvana b",
"offset" : 0,
"length" : 9,
"options" : [
{
"text" : "Nirvana best",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}

This is the default behavior of the current implementation. You can check #31738. Below is one of the comments explaining why only one document/suggestion is returned:
The completion suggester is document-based by design so we cannot
return one entry per matching suggestion. It is documented that it
returns documents not suggestions and a single input can be indexed in
multiple suggestions (if you have synonyms in your analyzer for
instance) so it is not trivial to differentiate a match from its
variations. Also the completion suggester does not visit all
suggestions to select the top N, it has a special structure (a
weighted FST) that can visit suggestions in the order of their scores
and early terminates the query once enough documents have been found.
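If you need each suggestion to come back as its own option, one common workaround (sketched here under the assumption that one document per suggestion is acceptable for your data model) is to index every suggestion as a separate document:
PUT music/_doc/1?refresh
{
"suggest": {
"input": "Nirvana test",
"weight": 10
}
}

PUT music/_doc/2?refresh
{
"suggest": {
"input": "Nirvana best",
"weight": 3
}
}
The same prefix query then returns one option per matching document, so both "Nirvana test" and "Nirvana best" show up under song-suggest.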

Related

Elasticsearch query to select the most relevant data for a given keyword

I have a search query to get data from an Elasticsearch DB. The query is below:
GET /v_entity_master/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "(*gupta*)",
"fields": [
"id","name","mobile"
]
}
},
{
"query_string": {
"query": "*7542*",
"fields": [
"id","name","mobile"
]
}
}
]
}
}
}
This query returns:
{
"id": 34501,
"name": "sree gupta",
"mobile": "98775421"
},
{
"id": 12302,
"name": "gupta",
"mobile": "98775422"
}
But what I require is that an exact match of the given search keyword should be the first result.
The expected output is:
{
"id": 12302,
"name": "gupta",
"mobile": "98775422"
},
{
"id": 34501,
"name": "sree gupta",
"mobile": "98775421"
}
Please share your suggestions and ideas to solve this issue. Thanks in advance.
So first of all, why would you search for "(gupta)" in the id and mobile (phone?) fields? Based on the two results you shared, those are numeric fields, so what is your intention with that?
Same issue with the second must-clause: I've never encountered a real name of a human being that includes numeric values...
I also don't get why you use the wildcards in the first must-clause. I assume you want to do a full-text search, so you can simply use the match query.
Now to your actual question:
I created an index in my test cluster and indexed the two responses you showed as documents. This is the response when I execute your query:
{
...
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.0,
"hits" : [
{
"_index" : "gupta",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"id" : 12302,
"name" : "gupta",
"mobile" : "98775422"
}
},
{
"_index" : "gupta",
"_type" : "_doc",
"_id" : "2",
"_score" : 2.0,
"_source" : {
"id" : 34501,
"name" : "sree gupta",
"mobile" : "98775421"
}
}
]
}
}
Notice that both documents have the same score. That's because you specified wildcards in your search query.
Now let's modify your query:
GET gupta/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "gupta"
}
},
{
"query_string": {
"query": "*7542*",
"fields": ["mobile"]
}
}
]
}
}
}
The main difference is that this query uses a match query to do the full-text search. You don't need to specify any wildcards since your text fields are analyzed.
This will return the following:
{
...
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.2111092,
"hits" : [
{
"_index" : "gupta",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.2111092,
"_source" : {
"id" : 12302,
"name" : "gupta",
"mobile" : "98775422"
}
},
{
"_index" : "gupta",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.160443,
"_source" : {
"id" : 34501,
"name" : "sree gupta",
"mobile" : "98775421"
}
}
]
}
}
Now the two documents have different scores due to field-length normalization. As stated in this article about Elasticsearch scoring, a term match found in a field with a low number of total terms is going to be more important than a match found in a field with a large number of terms.
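If you want to guarantee that the exact match always comes first regardless of the relevance calculation, one additional option (just a sketch; it assumes name was dynamically mapped as text with a name.keyword sub-field, which is the default for dynamically mapped strings, so adjust the field name if your mapping differs) is to add a should clause that boosts an exact match on the keyword sub-field:
GET gupta/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "gupta"
}
},
{
"query_string": {
"query": "*7542*",
"fields": ["mobile"]
}
}
],
"should": [
{
"term": {
"name.keyword": {
"value": "gupta",
"boost": 10
}
}
}
]
}
}
}
The document whose name is exactly "gupta" then receives the extra boost and sorts above partial matches such as "sree gupta".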
I hope I could help you.

How to use $filter (aggregation) to select some fields of an array only if a condition is true?

Here I'll show you exactly what I want. Suppose I have the two documents below for the XYZ model.
[
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"email" : "one#one.com",
"mobileNumber" : "910123456989",
"verificationStatus" : true,
"activities" : [
{
"name" : "a1",
"_id" : ObjectId("59ef8786e8c7d60552139bae"),
"type" : 0,
"level" : null,
"verificationStatus" : true
},
{
"name" : "a2",
"_id" : ObjectId("59ef8786e8c7d60552139bad"),
"type" : 0,
"level" : null,
"verificationStatus" : false
}
],
"address" : {
"line1" : "asd",
"line2" : "asd",
"city" : "sd",
"state" : "sd",
"country" : "asd",
"landmark" : "sdsa",
"pincode" : "560090"
},
"__v" : 0
},
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"email" : "one#one.com",
"mobileNumber" : "919876543210",
"verificationStatus" : true,
"activities" : [
{
"name" : "b1",
"_id" : ObjectId("59ef8786e8c7d60552139bae"),
"level" : null,
"type" : 0,
"verificationStatus" : true
},
{
"name" : "b2",
"_id" : ObjectId("59ef8786e8c7d60552139bad"),
"level" : null,
"type" : 0,
"verificationStatus" : false
}
],
"address" : {
"line1" : "asd",
"line2" : "asd",
"city" : "sd",
"state" : "sd",
"country" : "asd",
"landmark" : "sdsa",
"pincode" : "560090"
},
"__v" : 0
}
]
Now I want only the name, mobileNumber and activities.name from the documents where verificationStatus is true, and I don't want all the activities: I want activities.name only if activities.verificationStatus is true.
I can get the list of all documents where verificationStatus is true and activities.verificationStatus is true, but I'm not able to select only the required field (activities.name) from activities.
My current code is:
XYZ.aggregate(
[
{ $match: { verificationStatus: true } },
{
$project: {
name: 1,
coverImage: 1,
location: 1,
address: 1,
dist: 1,
activities: {
$filter: {
input: "$activities",
as: "activity",
cond: {
$eq: ["$$activity.verificationStatus", true]
}
}
}
}
}], function (err, list) {
if (err) {
reject(err);
}
else {
resolve(list);
}
});
You actually need $map to "alter" the array elements returned, as $filter only "selects" the array elements that "match":
XYZ.aggregate(
[
{ $match: { verificationStatus: true } },
{
$project: {
name: 1,
mobileNumber: 1,
activities: {
$map: {
input: {
$filter: {
input: "$activities",
as: "activity",
cond: "$$activity.verificationStatus"
}
},
"as": "a",
"in": "$$a.name"
}
}
}
}], function (err, list) {
...
Would return:
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "910123456989",
"activities" : ["a1"]
}
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "919876543210",
"activities" : ["b1"]
}
Note also that the "cond" in $filter can be shortened since it's already a boolean value.
If you wanted the "object" with the property of "name" only, then return just that assigned key:
XYZ.aggregate(
[
{ $match: { verificationStatus: true } },
{
$project: {
name: 1,
mobileNumber: 1,
activities: {
$map: {
input: {
$filter: {
input: "$activities",
as: "activity",
cond: "$$activity.verificationStatus"
}
},
"as": "a",
"in": {
"name": "$$a.name"
}
}
}
}
}], function (err, list) {
...
Returns as:
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "910123456989",
"activities" : [{ "name": "a1" }]
}
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "919876543210",
"activities" : [{ "name": "b1" }]
}
If you knew for certain that you were matching "one" element in the array, then $indexOfArray with $arrayElemAt could be used instead, provided you have MongoDB 3.4:
{ "$project": {
"name": 1,
"mobileNumber": 1,
"activities": {
"$arrayElemAt": [
"$activities.name",
{ "$indexOfArray": [ "$activities.verificationStatus", true ] }
]
}
}}
Which would come out a little differently since it's a singular value and not an array:
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "910123456989",
"activities" : "a1"
}
{
"_id" : ObjectId("59ef8786e8c7d60552139ba9"),
"name" : "s1",
"mobileNumber" : "919876543210",
"activities" : "b1"
}

ElasticSearch fuzzy percolator query does not match

Please explain the following problem to me:
A query with fuzziness 0 doesn't match. Why?
I have the mapping:
$ curl -XGET 'http://localhost:9200/words/_mapping?pretty'
{
"words" : {
"mappings" : {
".percolator" : {
"properties" : {
"category" : {
"type" : "string",
"index" : "not_analyzed"
},
"fuzziness" : {
"type" : "long"
},
"list" : {
"type" : "string",
"index" : "not_analyzed"
},
"query" : {
"type" : "object",
"enabled" : false
}
}
},
"query_doc" : {
"properties" : {
"category" : {
"type" : "string",
"index" : "not_analyzed"
},
"text" : {
"type" : "string"
}
}
}
}
}
}
I have the percolator queries:
$ curl 'http://localhost:9200/words/.percolator/_search?pretty=true&q=*:*'
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "words",
"_type" : ".percolator",
"_id" : "id4_0",
"_score" : 1.0,
"_source" : {
"query" : {
"fuzzy" : {
"text" : {
"fuzziness" : 0,
"value" : "Banes"
},
"category" : "cuba"
}
}
}
}, {
"_index" : "words",
"_type" : ".percolator",
"_id" : "id4_1",
"_score" : 1.0,
"_source" : {
"query" : {
"fuzzy" : {
"text" : {
"fuzziness" : 1,
"value" : "Banes"
},
"category" : "cuba"
}
}
}
} ]
}
}
When I run the percolate request, only the query with fuzziness 1 matches:
$ curl 'http://localhost:9200/words/query_doc/_percolate?pretty' -d '
{
"doc": {
"text": "Just Banes"
}
}'
{
"took" : 2,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"total" : 1,
"matches" : [ {
"_index" : "words",
"_id" : "id4_1"
} ]
}
What is wrong? Could someone explain this?
Thanks

AQL Query Really Slow (~20 seconds)

The following query is taking around 20 seconds to execute:
FOR p IN PATHS(locations, connections, "outbound", { maxLength: 1 }) FILTER p.source._key == "26094" RETURN p.vertices[*].name
I believe this is a simple query (and the database is not that big), and it should execute fairly quickly... I must be doing something wrong... Here is the query result:
==> [object ArangoQueryCursor - count: 286, hasMore: false]
The locations (vertices) collection has 23753 documents, and the connections (edges) collection has 123414 documents.
I tried to filter by _id as well, but the performance is about the same.
Is there anything I could do to get better performance?
Here is the query's .explain() report:
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "CalculationNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 2,
"estimatedNrItems" : 1,
"expression" : {
"type" : "function call",
"name" : "PATHS",
"subNodes" : [
{
"type" : "array",
"subNodes" : [
{
"type" : "collection",
"name" : "locations"
},
{
"type" : "collection",
"name" : "connections"
},
{
"type" : "value",
"value" : "outbound"
},
{
"type" : "object",
"subNodes" : [
{
"type" : "object element",
"name" : "maxLength",
"subNodes" : [
{
"type" : "value",
"value" : 1
}
]
}
]
}
]
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : true
},
{
"type" : "EnumerateListNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 102,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 2,
"name" : "2"
},
"outVariable" : {
"id" : 0,
"name" : "p"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 202,
"estimatedNrItems" : 100,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "_key",
"subNodes" : [
{
"type" : "attribute access",
"name" : "source",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "value",
"value" : "26094"
}
]
},
"outVariable" : {
"id" : 3,
"name" : "3"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 302,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 3,
"name" : "3"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 402,
"estimatedNrItems" : 100,
"expression" : {
"type" : "expand",
"subNodes" : [
{
"type" : "iterator",
"subNodes" : [
{
"type" : "variable",
"name" : "1_",
"id" : 1
},
{
"type" : "attribute access",
"name" : "vertices",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "attribute access",
"name" : "name",
"subNodes" : [
{
"type" : "reference",
"name" : "1_",
"id" : 1
}
]
}
]
},
"outVariable" : {
"id" : 4,
"name" : "4"
},
"canThrow" : false
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 502,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 4,
"name" : "4"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "connections",
"type" : "read"
},
{
"name" : "locations",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "p"
},
{
"id" : 1,
"name" : "1_"
},
{
"id" : 2,
"name" : "2"
},
{
"id" : 3,
"name" : "3"
},
{
"id" : 4,
"name" : "4"
}
],
"estimatedCost" : 502,
"estimatedNrItems" : 100
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 21,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
PATHS() will build all paths of the graph and then post-filter the results using the FILTER on the _key attribute. This may create a huge result set first (for all paths) before filtering out all non-matches.
If all that's required is to find connected vertices on depth 1, I think it will be more efficient to do something like this:
querying using TRAVERSAL:
This is more efficient because it will not build all paths in the graph, but only those starting at the specified start vertex:
FOR p IN TRAVERSAL(locations, connections, "locations/26094", "outbound", { minDepth: 1, maxDepth: 1, paths: true })
RETURN p.path.vertices[*].name
querying direct neighbors using NEIGHBORS:
This may be even slightly more efficient because it will construct a smaller intermediate result.
Additionally, it won't return the start vertex (26094) but all vertices directly connected to it:
FOR p IN NEIGHBORS(locations, connections, "26094", "outbound")
RETURN p.vertex.name
querying the edges directly (not using graph functions)
Finally you can query the edge collection directly.
Again, this won't return the start vertex (26094) but all vertices directly connected to it:
FOR edge IN connections
FILTER edge._from == "locations/26094"
FOR vertex IN locations
FILTER vertex._id == edge._to
RETURN vertex.name

Query Rule for non-indexed attribute FILTER

I observe an enormous runtime difference between these two AQL statements on a data set with about 20 million records:
FOR e IN EAll
FILTER e.lastname == "Kmp" // <-- skip-index
FILTER e.lastpaff != "" // <-- no index
RETURN e
// runs in less than a second
AND
FOR e IN EAll
FILTER e.lastpaff != "" // <-- no index
FILTER e.lastname == "Kmp" // <-- skip-index
RETURN e
// needs about a minute to execute.
In addition to being indexed (or not), the selectivity of the two conditions is very different: the indexed attribute is highly selective, whereas the non-indexed attribute only filters out 50%.
Is it possible that there is not yet an optimization rule for this? I am currently using ArangoDB 2.4.0.
DETAILS:
There is a skiplist index on the indexed attribute (which seems to be used in execution plan 1).
Here are the execution plans, in which only the order of the filters is changed:
FAST QUERY:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "IndexRangeNode",
"dependencies" : [
1
],
"id" : 8,
"estimatedCost" : 170463.32,
"estimatedNrItems" : 170462,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"ranges" : [
[
{
"variable" : "i",
"attr" : "lastname",
"lowConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"highConst" : {
"bound" : "Kmp",
"include" : true,
"isConstant" : true
},
"lows" : [ ],
"highs" : [ ],
"valid" : true,
"equality" : true
}
]
],
"index" : {
"type" : "skiplist",
"id" : "13295598550318",
"unique" : false,
"fields" : [
"lastname"
]
},
"reverse" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
8
],
"id" : 5,
"estimatedCost" : 340925.32,
"estimatedNrItems" : 170462,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 511387.32,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2",
"use-index-range",
"remove-filter-covered-by-index"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 681849.3200000001,
"estimatedNrItems" : 170462
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
SLOW QUERY:
arangosh [Uni]> stmt.explain()
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "EnumerateCollectionNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 17046233,
"estimatedNrItems" : 17046232,
"database" : "Uni",
"collection" : "EAll",
"outVariable" : {
"id" : 0,
"name" : "i"
},
"random" : false
},
{
"type" : "CalculationNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 34092465,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare !=",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastpaff",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : ""
}
]
},
"outVariable" : {
"id" : 1,
"name" : "1"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 51138697,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 1,
"name" : "1"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 68184929,
"estimatedNrItems" : 17046232,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "lastname",
"subNodes" : [
{
"type" : "reference",
"name" : "i",
"id" : 0
}
]
},
{
"type" : "value",
"value" : "Kmp"
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 85231161,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 2,
"name" : "2"
}
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232,
"inVariable" : {
"id" : 0,
"name" : "i"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "EAll",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "i"
},
{
"id" : 1,
"name" : "1"
},
{
"id" : 2,
"name" : "2"
}
],
"estimatedCost" : 102277393,
"estimatedNrItems" : 17046232
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 19,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
Indeed, conditions like the following disabled the usage of indexes even though an index could be used:
FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...
Interestingly an index is used when the two conditions are put into the same FILTER condition and combined with &&:
FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...
Though these two statements are equivalent, they trigger slightly different code paths. The former AND-combines two existing FILTER ranges, while the latter produces a range from a single FILTER. The AND-combination of FILTER ranges was overly defensive and rejected both sides even if only a single side (in this case the one with the non-equality operator) could not be used for an index scan.
This has been fixed in 2.4, and the fix will be contained in 2.4.2. A workaround for now is to combine the two FILTER statements into a single one.
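Applied to the slow query above, the workaround is simply (a sketch with the same semantics as the two separate FILTER statements):
FOR e IN EAll
FILTER e.lastpaff != "" && e.lastname == "Kmp" // single FILTER, conditions combined with &&
RETURN e
With both conditions in a single FILTER, the skiplist index on lastname should be usable again.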
