Workaround to group solr search result by collection field - search

I am beginner at Solr. There can be mistakes. Sorry.
I'm using version 7.7.1.
Let's say there are next documents:
{
"documents": [
{
"id": 1,
"category": [
"a",
"b"
],
"score":0.10 //lucene score
},
{
"id": 2,
"category": [
"b",
"c",
"d",
"e"
],
"score":0.20 //lucene score
},
{
"id": 3,
"category": [
"a",
"e"
],
"score":0.30 //lucene score
},
{
"id": 4,
"category": [
"d",
"e"
],
"score":0.40 //lucene score
},
{
"id": 5,
"category": [
"a",
"c"
],
"score":0.50 //lucene score
}
]
}
The main task is the next. I get 3 or more different categories and I need for only one document with the highest score for every category. In other words I need to group result by category field, every group has to be sorted by score desc and every group has to be limited with 1.
For example for got a,b,c categories result has to contain 3 documents
document with id == 5 for a category
document with id == 2 for b category
document with id == 5 for c category
Is it possible to create solr query to get such result with single request?
I tried the next approaches but they didn't help or work bad:
Grouping is not considered due to category field is collection.
Faceting returns only number of results. I need for a complete document.
There is a possibility to do the request for every category. But there can be 50 categories at one time and I guess it will be time consuming to make 50 requests in solr.
Thanks and Regards

json.facet can be helpful here. You can use below query and it will give you the response as per your needs. This query will first create buckets for category field sorted in ascending order and then nested bucket of ids sorted with lucene score of the query that passed via "qq" param.
I have used a function query which we are using to evaluate lucene score for the given query. For now, It is a simple query, but you can create a complex too and passed in qq parameter. Read here :- https://lucene.apache.org/solr/guide/6_6/function-queries.html ot get more information on function queries.
q=*&qq=category:a&json.facet={
categories:{
type:terms,
field:category,
sort:{index:asc},
facet:{
id:{
type:terms,
field:id,
sort:"query_score desc",
facet:{
query_score:"min(if(exists(query($qq)),query($qq),0))"
}
}
}
}
}
[Response of above query]
{
"responseHeader":{
"status":0,
"QTime":7,
"params":{
"qq":"category:a",
"q":"*",
"json.facet":"{ categories:{ type:terms, field:category, sort:{index:asc}, facet:{ id:{ type:terms, field:id, sort:\"query_score desc\", facet:{ query_score:\"min(if(exists(query($qq)),query($qq),0))\" } } } } }",
"indent":"on",
"fl":"*,query($qq,-1)",
"rows":"0",
"wt":"json"}},
"response":{"numFound":5,"start":0,"docs":[]
},
"facets":{
"count":5,
"categories":{
"buckets":[{
"val":"a",
"count":3,
"id":{
"buckets":[{
"val":"1",
"count":1,
"query_score":0.5389965176582336},
{
"val":"3",
"count":1,
"query_score":0.5389965176582336},
{
"val":"5",
"count":1,
"query_score":0.5389965176582336}]}},
{
"val":"b",
"count":2,
"id":{
"buckets":[{
"val":"1",
"count":1,
"query_score":0.5389965176582336},
{
"val":"2",
"count":1,
"query_score":0.0}]}},
{
"val":"c",
"count":2,
"id":{
"buckets":[{
"val":"5",
"count":1,
"query_score":0.5389965176582336},
{
"val":"2",
"count":1,
"query_score":0.0}]}},
{
"val":"d",
"count":2,
"id":{
"buckets":[{
"val":"2",
"count":1,
"query_score":0.0},
{
"val":"4",
"count":1,
"query_score":0.0}]}},
{
"val":"e",
"count":3,
"id":{
"buckets":[{
"val":"3",
"count":1,
"query_score":0.5389965176582336},
{
"val":"2",
"count":1,
"query_score":0.0},
{
"val":"4",
"count":1,
"query_score":0.0}]}}]}}}
For more information on json.facet :- https://lucene.apache.org/solr/guide/7_2/json-facet-api.html

Related

How to project specific fields from a queried document inside an array?

here is the document
formId: 123,
title:"XYZ"
eventDate:"2022-04-15T05:40:57.182Z"
responses:[
{
orderId:98422,
name:"XYZ1",
email:"a#gmal.com",
paymentStatus:"pending",
amount:250,
phone:123456789
},
{
orderId:98422,
name:"XYZ1",
email:"a#gmal.com",
paymentStatus:"success",
amount:250,
phone:123456791
}
]
I used $elemMatch to filter the array such that I get only the matched object.
const response = await Form.findOne({ formId:123 }, {
_id:0,
title: 1,
eventDate: 1,
responses: {
$elemMatch: { orderId: 98422 },
},
})
But this returns all the fields inside the object present in the array "responses".
title:"XYZ"
eventDate:"2022-04-15T05:40:57.182Z"
responses:[
{
orderId:98422,
name:"XYZ1",
email:"a#gmal.com",
paymentStatus:"pending",
amount:250,
phone:123456789
}
]
But I want only specific fields to be returned inside the object like this
title:"XYZ"
eventDate:"2022-04-15T05:40:57.182Z"
responses:[
{
name:"XYZ1",
email:"a#gmal.com",
paymentStatus:"pending",
}
]
How can i do that ?
Query
aggregation way to keep some members and edit them also
map on responses if orderId matches keep the fields you want, the others are replaced with null
filter to remove those nulls (members that didnt match)
here 2 matches if you want to keep only one member of the array you can use
[($first ($filter ...)]
*$elemMatch that you used can be combined with the $ project operator to avoid the aggregation, but with $ operator we get all the matching member (here you want only some fields so i think aggregation is the way)
Playmongo
aggregate(
[{"$match": {"formId": {"$eq": 123}}},
{"$project":
{"_id": 0,
"title": 1,
"eventDate": 1,
"responses":
{"$map":
{"input": "$responses",
"in":
{"$cond":
[{"$eq": ["$$this.orderId", 98422]},
{"name": "$$this.name",
"email": "$$this.email",
"paymentStatus": "$$this.paymentStatus"},
null]}}}}},
{"$set":
{"responses":
{"$filter":
{"input": "$responses", "cond": {"$ne": ["$$this", null]}}}}}])

how to query nested array of objects in mongodb?

i am trying to query nested array of objects in mongodb from node js, tried all the solutions but no luck. can anyone please help this on priority?
I have tried following :
{
"name": "Science",
"chapters": [
{
"name": "ScienceChap1",
"tests": [
{
"name": "ScienceChap1Test1",
"id": 1,
"marks": 10,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
},
{
"name": "ScienceChap1test2",
"id": 2,
"marks": 20,
"duration": 30,
"questions": [
{
"question": "What is the capital city of New Mexico?",
"type": "mcq",
"choice": [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer": [
"Santa Fe",
"Taos"
]
},
{
"question": "Who is the author of beowulf?",
"type": "notmcq",
"choice": [
"Mark Twain",
"Shakespeare",
"Abraham Lincoln",
"Newton"
],
"answer": [
"Shakespeare"
]
}
]
}
]
}
]
}
Here is what I've tried so far but still can't get it to work
db.quiz.find({name:"Science"},{"tests":0,chapters:{$elemMatch:{name:"ScienceCh‌​ap1"}}})
db.quiz.find({ chapters: { $elemMatch: {$elemMatch: { name:"ScienceChap1Test1" } } }})
db.quiz.find({name:"Science"},{chapters:{$elemMatch:{$elemMatch:{name:"Scienc‌​eChap1Test1"}}}}) ({ name:"Science"},{ chapters: { $elemMatch: {$elemMatch: { name:"ScienceChap1Test1" } } }})
Aggregation Framework
You can use the aggregation framework to transform and combine documents in a collection to display to the client. You build a pipeline that processes a stream of documents through several building blocks: filtering, projecting, grouping, sorting, etc.
If you want get the mcq type questions from the test named "ScienceChap1Test1", you would do the following:
db.quiz.aggregate(
//Match the documents by query. Search for science course
{"$match":{"name":"Science"}},
//De-normalize the nested array of chapters.
{"$unwind":"$chapters"},
{"$unwind":"$chapters.tests"},
//Match the document with test name Science Chapter
{"$match":{"chapters.tests.name":"ScienceChap1test2"}},
//Unwind nested questions array
{"$unwind":"$chapters.tests.questions"},
//Match questions of type mcq
{"$match":{"chapters.tests.questions.type":"mcq"}}
).pretty()
The result will be:
{
"_id" : ObjectId("5629eb252e95c020d4a0c5a5"),
"name" : "Science",
"chapters" : {
"name" : "ScienceChap1",
"tests" : {
"name" : "ScienceChap1test2",
"id" : 2,
"marks" : 20,
"duration" : 30,
"questions" : {
"question" : "What is the capital city of New Mexico?",
"type" : "mcq",
"choice" : [
"Guadalajara",
"Albuquerque",
"Santa Fe",
"Taos"
],
"answer" : [
"Santa Fe",
"Taos"
]
}
}
}
}
$elemMatch doesn't work for sub documents. You can use the aggregation framework for "array filtering" by using $unwind.
You can delete each line from the bottom of each command in the aggregation pipeline in the above code to observe the pipelines behavior.
You should try the following queries in the mongodb simple javascript shell.
There could be Two Scenarios.
Scenario One
If you simply want to return the documents that contain certain chapter names or test names for example just one argument in find will do.
For the find method the document you want to be returned is specified by the first argument. You could return documents with the name Science by doing this:
db.quiz.find({name:"Science"})
You could specify criteria to match a single embedded document in an array by using $elemMatch. To find a document that has a chapter with the name ScienceChap1. You could do this:
db.quiz.find({"chapters":{"$elemMatch":{"name":"ScienceChap1"}}})
If you wanted your criteria to be a test name then you could use the dot operator like this:
db.quiz.find({"chapters.tests":{"$elemMatch":{"name":"ScienceChap1Test1"}}})
Scenario Two - Specifying Which Keys to Return
If you want to specify which keys to Return you can pass a second argument to find (or findOne) specifying the keys you want. In your case you can search for the document name and then provide which keys to return like so.
db.quiz.find({name:"Science"},{"chapters":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
"name": "ScienceChap2",
"tests: [..all object content here..]
}
If you only want to return the marks from each test object you can use the dot operator to do so:
db.quiz.find({name:"Science"},{"chapters.tests.marks":1})
//Would return
{
"_id": ObjectId(...),
"chapters": [
"tests: [
{"marks":10},
{"marks":20}
]
}
If you only want to return the questions from each test:
db.quiz.find({name:"Science"},{"chapters.tests.questions":1})
Test these out. I hope these help.

Mongoose aggregation "$sum" of rows in sub document

I'm fairly good with sql queries, but I can't seem to get my head around grouping and getting sum of mongo db documents,
With this in mind, I have a job model with schema like below :
{
name: {
type: String,
required: true
},
info: String,
active: {
type: Boolean,
default: true
},
all_service: [
price: {
type: Number,
min: 0,
required: true
},
all_sub_item: [{
name: String,
price:{ // << -- this is the price I want to calculate
type: Number,
min: 0
},
owner: {
user_id: { // <<-- here is the filter I want to put
type: Schema.Types.ObjectId,
required: true
},
name: String,
...
}
}]
],
date_create: {
type: Date,
default : Date.now
},
date_update: {
type: Date,
default : Date.now
}
}
I would like to have a sum of price column, where owner is present, I tried below but no luck
Job.aggregate(
[
{
$group: {
_id: {}, // not sure what to put here
amount: { $sum: '$all_service.all_sub_item.price' }
},
$match: {'not sure how to limit the user': given_user_id}
}
],
//{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
function(err, summary) {
console.log(err);
console.log(summary);
}
);
Could someone guide me in the right direction. thank you in advance
Primer
As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | operator from Unix and other system shells. One "stage" feeds input to the "next" stage and so on.
The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful.
Your documents consist of an "all_service" array at the top level. Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". Then of course "all_sub_item" is an array in itself, also containg many items of it's own.
You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. That much you should already be familiar with.
However, when you want to "aggregate" accross documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". This is to "transform" the data into a de-normalized state that is suitable for aggregation.
So the same visualization applies. A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. In a "nutshell", this:
{
"a": 1,
"b": [
{
"c": 1,
"d": [
{ "e": 1 }, { "e": 2 }
]
},
{
"c": 2,
"d": [
{ "e": 1 }, { "e": 2 }
]
}
]
}
Becomes this:
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }
And the operation to do this is $unwind, and since there are multiple arrays then you need to $unwind both of them before continuing any processing:
db.collection.aggregate([
{ "$unwind": "$b" },
{ "$unwind": "$b.d" }
])
So there the "pipe" first array from "$b" like so:
{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
Which leaves a second array referenced by "$b.d" to further be de-normalized into the the final de-normalized result "without any arrays". This allows other operations to process.
Solving
With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. This is a good idea, as especially when doing operations such as $unwind, then you don't want to be doing that on documents that do not even match your target data.
So you need to match your "user_id" at the array depth. But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array.
Of course, the "whole" document is still returned, because this is what you really asked for. The data is already "joined" and we haven't asked to "un-join" it in any way.You look at this just as a "first" document selection does, but then when "de-normalized", every array element now actualy represents a "document" in itself.
So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match.
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_subitem" },
// Match again to filter the array elements
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
Alternately, modern MongoDB releases since 2.6 also support the $redact operator. This could be used in this case to "pre-filter" the array content before processing with $unwind:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$ifNull": [ "$owner", given_user_id ] },
given_user_id
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}},
// De-normalize arrays
{ "$unwind": "$all_service" },
{ "$unwind": "$all_service.all_subitem" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_service.all_sub_item.price" }
}}
],
function(err,results) {
}
)
That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind. This can speed things up a bit since items that do not match would not need to be "un-wound". However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". You can always $match again to be sure, but there is still a more efficient way to do this:
Job.aggregate(
[
// Match to filter possible "documents"
{ "$match": {
"all_service.all_sub_item.owner": given_user_id
}},
// Filter arrays for matches in document
{ "$project": {
"all_items": {
"$setDifference": [
{ "$map": {
"input": "$all_service",
"as": "A",
"in": {
"$setDifference": [
{ "$map": {
"input": "$$A.all_sub_item",
"as": "B",
"in": {
"$cond": {
"if": { "$eq": [ "$$B.owner", given_user_id ] },
"then": "$$B",
"else": false
}
}
}},
false
]
}
}},
[[]]
]
}
}},
// De-normalize the "two" level array. "Double" $unwind
{ "$unwind": "$all_items" },
{ "$unwind": "$all_items" },
// Group on the "_id" for the "key" you want, or "null" for all
{ "$group": {
"_id": null,
"total": { "$sum": "$all_items.price" }
}}
],
function(err,results) {
}
)
That process cuts down the size of the items in both arrays "drastically" compared to $redact. The $map operator processes each elment of an array to the given statement within "in". In this case, each "outer" array elment is sent to another $map to process the "inner" elements.
A logical test is performed here with $cond whereby if the "condiition" is met then the "inner" array elment is returned, otherwise the false value is returned.
The $setDifference is used to filter down any false values that are returned. Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. This leaves just the matching items, encased in a "double" array, e.g:
[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]
As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values.
Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation.
So those are the things you need to know. As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals.
It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL".
On that assumption, you'll want to do your $match first, to reduce the input set to your sum. So your first stage should be something like
$match: { all_service.all_sub_items.owner : { $exists: true } }
Think of this as then passing all matching documents to your second stage.
Now, because you are summing an array, you have to do another step. Aggregation operators work on documents - there isn't really a way to sum an array. So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. Think of this as a cross join. This will be $unwind.
$unwind: { "$all_service.all_sub_items" }
Now you've just made a much larger number of documents, but in a form where we can sum them. Now we can perform the $group. In your $group, you specify a transformation. The line:
_id: {}, // not sure what to put here
is creating a field in the output document, which is not the same documents as the input documents. So you can make the _id here anything you'd like, but think of this as the equivalent to your "GROUP BY" in sql. The $sum operator will essentially be creating a sum for each group of documents you create here that match that _id - so essentially we'll be "re-collapsing" what you just did with $unwind, by using the $group. But this will allow $sum to work.
I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct.
$group : { _id : $_id, totalAmount : { $sum : '$all_service.all_sub_item.price' } }
This will output documents with an _id field equivalent to your original document ID, and your sum.
I'll let you put it together, I'm not super familiar with node. You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. Good luck!

Elastic Search filtering in facets

I want to simulate a parent child relation in elastic search and perform some analytics work over it. My use case is something like this
I have a shop owner like this
"_source": {
"shopId": 5,
"distributorId": 4,
"stateId": 1,
"partnerId": 2,
}
and now have child records (for each day) like this:
"_source": {
"shopId": 5,
"date" : 2013-11-13,
"transactions": 150,
"amount": 1980,
}
The parent is a record per store, while the child is the transactions each store does for
day. Now I want to do some complex query like
Find out total transaction for each day for the last 30 days where distributor is 5
POST /newdb/shopsDaily/_search
{
"query": {
"match_all": {}
},
"filter": {
"has_parent": {
"type": "shop",
"query": {
"match": {
"distributorId": "5"
}
}
}
},
"facets": {
"date": {
"histogram": {
"key_field": "date",
"value_field": "transactions",
"interval": 100
}
}
}
}
But the result I get do not take the filtering into account which I applied.
So I changed the query to this:
POST /newdb/shopDaily/_search
{
"query": {"filtered": {
"query": {"match_all": {}},
"filter": { "has_parent": {
"type": "shop",
"query": {"match": {
"distributorId": "13"
}}
}}
}},
"facets": {
"date": {
"histogram": {
"key_field": "date",
"value_field": "transactions",
"interval": 100
}
}
}
}
And then the final histogram facet took filtering into count.
So, when I browsed though I found out this is due to using filtered(which can only be used inside query clause and not outside like filter) rather than filter,
but it also mentioned that to have fast search you should use filter. Will searching as I did in second step (when I used filtered instead of filter) effect the performance of elastic search? If so, how can I make my facets honor filters and not effect the performance?
Thanks for you time
filters in Filtered query (filters in query clause) are cached, hence faster. These type of filters affect both search result and facet counts.
Filters outside the query clause are not considered during facet calculations. They are considered only for search results. Facet is calculated only on the query clause. If you want filtered facets then you need to set filters to each of the facet clauses.

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I have such a kind of query like select full_name from authors, so I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
For Elasticsearch 1.0 and later, you can leverage terms aggregation to do this,
query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means not limit the number of terms(this requires es to be 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
},
]
}
}
}
see Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in elastic search :
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means by default you cannot search on the full_name field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.

Resources