Elasticsearch: access document nested value in Groovy script

I have a document stored in ElasticSearch as below.
_source:
{
"firstname": "John",
"lastname": "Smith",
"medals":[
{
"bucket": 100,
"count": 1
},
{
"bucket": 150,
"count": 2
}
]
}
I can access the string type value inside a document using doc.firstname for scripted metric aggregation http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html.
But I am not able to get the field value using doc.medals[0].bucket.
Can you please help me out and let me know how to access the values inside nested fields?

Use _source for nested properties.
doc only exposes field values that are loaded in memory. Values inside nested objects may not be loaded that way, so they should be read from _source instead.
For instance:
GET index/type/_search
{
  "aggs": {
    "NAME": {
      "scripted_metric": {
        "init_script": "_agg['collection'] = []",
        "map_script": "_agg['collection'].add(_source.property1.prop)",
        "combine_script": "return _agg",
        "reduce_script": "return _aggs"
      }
    }
  },
  "size": 0
}
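Applied to the document in the question, the same idea might look like the sketch below. It only builds the request body in JavaScript; the Groovy strings assume the script variables used in the answer above (_agg, _aggs, _source), and the helper name buildMedalsAgg and the aggregation name are made up for illustration.

```javascript
// Hypothetical helper: builds a scripted_metric request body that collects
// the "bucket" value of every entry in each document's "medals" array.
function buildMedalsAgg(aggName) {
  return {
    size: 0, // no hits needed, only the aggregation result
    aggs: {
      [aggName]: {
        scripted_metric: {
          // Groovy snippets, sent to Elasticsearch as plain strings.
          init_script: "_agg['buckets'] = []",
          map_script: "for (m in _source.medals) { _agg['buckets'].add(m.bucket) }",
          combine_script: "return _agg['buckets']",
          reduce_script: "return _aggs",
        },
      },
    },
  };
}

// Print the body so it can be pasted into a search request.
console.log(JSON.stringify(buildMedalsAgg("medal_buckets"), null, 2));
```

The body would be sent to the _search endpoint of the index; whether the scripts run as-is depends on the Elasticsearch version and its scripting settings.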

Related

How to match and join results between two resolvers in one graphql query?

I have two resolvers.
One is the Company resolver, which returns company details such as the id, name and a list of document ids, as in this example:
{
"data": {
"companyOne": {
"name": "twitter",
"documents": [
"5c6c0213f0fa854bd7d4a38c",
"5c6c02948e0001a16529a1a1",
"5c6c02ee7e76c12075850119",
"5c6ef2ddd16e19889ffaffd0",
"5c72fb723ebf7b2881679ced",
"5c753d1c2e080fa4a2f86c87",
...
]
}
}
}
The other resolver returns the details of all documents, as in this example:
{
  "data": {
    "documentsMany": [{
      "name": "doc1",
      "_id": "5c6c0213f0fa854bd7d4a38c"
    }, {
      "name": "doc2",
      "_id": "5c6c02948e0001a16529a1a1"
    },
    ...
    ]
  }
}
How can I match every data.companyOne.documents[id] to data.documentsMany[..]._id at the query level? Is this possible in GraphQL?
The expected result is that when I run the companyOne query (without changing the code, only the query), it returns documents as objects instead of an array of string ids.
Maybe something like this?
query {
companyOne {
name,
documents on documentsMany where _id is ___???
}
}

Filter couchdb document based on value from nested child document

I would like to create a map/reduce function that filters documents based on a nested value from a child document, but returns the parent document.
I have the following documents:
{
"_id": "1",
"_rev": "1-991baf1d86435a73a3460335cc19063c",
"configuration_id": "225f9d47-841c-43c2-90c2-e65bb49083d3",
"name": "test",
"image": "",
"type": "A",
"created": "",
"updated": 1,
"destroyed": ""
}
{
"_id": "225f9d47-841c-43c2-90c2-e65bb49083d3",
"_rev": "1-3e3a1c357c86cbd1cd42b5980b9655a4",
"configuration_packages_id": "cd19b0ba-157d-4dd4-adac-56fd470bfed4",
"configuration_distribution_id": "5b538411-ca99-46c7-ac3c-1f382e4577a9",
"type": "CONFIGURATION",
"configuration": {
"hostname": "example123",
"images": [
"image1",
"image2"
]
}
}
Now I would like to retrieve all documents of type A with the hostname example123.
At the moment I retrieve all documents of type A like this:
function (doc) {
if (doc.type === "A") {
emit([doc.updated], doc);
}
}
But now I would also like to filter on the hostname.
I'm not sure how to achieve this with CouchDB.
TL;DR
You cannot do this.
Details
Your "nested" document is only reachable through a join; the view cannot filter on its contents.
The correct way to do that kind of query natively would have been to have a real nested document inside the parent document. Separating those documents has a cost.
Join example
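function (doc) {
  if (doc.type === "A") {
    // Row for the parent document itself.
    emit([doc.updated, 0], null);
    // Row whose value points at the linked configuration document.
    emit([doc.updated, 1], {"_id": doc.configuration_id});
  }
}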
If you query the view with include_docs=true, you get the linked configuration document as well as the parent document itself. You can then query for the updated docs, merge each nested row (key ending in 1) with its parent row (key ending in 0) on the client, and filter the merged results by hostname.
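The client-side merge and filter could be sketched as follows. This assumes the usual view response shape with include_docs=true (each row has id, key, value and doc, and row.id is the id of the document that emitted the row); the function name parentsWithHostname and the sample rows are made up for illustration.

```javascript
// rows: the "rows" array of the view response, queried with include_docs=true.
// Keys ending in 0 carry the parent document, keys ending in 1 carry the
// linked configuration document.
function parentsWithHostname(rows, hostname) {
  const byParent = new Map();
  for (const row of rows) {
    // Both rows of a pair share the id of the emitting (parent) document.
    const entry = byParent.get(row.id) || {};
    if (row.key[1] === 0) {
      entry.parent = row.doc;
    } else {
      entry.configuration = row.doc;
    }
    byParent.set(row.id, entry);
  }
  return [...byParent.values()]
    .filter((e) =>
      e.parent &&
      e.configuration &&
      e.configuration.configuration &&
      e.configuration.configuration.hostname === hostname)
    .map((e) => e.parent);
}

// Made-up rows mirroring the documents from the question.
const sampleRows = [
  { id: "1", key: [1, 0], value: null,
    doc: { _id: "1", type: "A", configuration_id: "cfg-1" } },
  { id: "1", key: [1, 1], value: { _id: "cfg-1" },
    doc: { _id: "cfg-1", type: "CONFIGURATION", configuration: { hostname: "example123" } } },
  { id: "2", key: [2, 0], value: null,
    doc: { _id: "2", type: "A", configuration_id: "cfg-2" } },
  { id: "2", key: [2, 1], value: { _id: "cfg-2" },
    doc: { _id: "cfg-2", type: "CONFIGURATION", configuration: { hostname: "other" } } },
];

// Only the parent whose configuration has the requested hostname remains.
console.log(parentsWithHostname(sampleRows, "example123").map((d) => d._id));
```

Note that this filters after the fact: the view still returns every type A document, so it does not scale as well as storing the hostname on the parent.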

Mongoose: how to set a field of a model with the result from an aggregation

Here is my sample:
Two simple Mongoose models:
a Note model with, among other fields, an id field that is a ref to the Notebook model;
a Notebook model, with the id I mentioned above.
My goal is to output something like this:
[
{
"notes_count": 7,
"title": "first notebook",
"id": "5585a9ffc9506e64192858c1"
},
{
"notes_count": 3,
"title": "second notebook",
"id": "558ab637cab9a2b01dae9a97"
}
]
Using aggregation and population on the Note model like this:
Note.aggregate(
  [{
    "$group": {
      "_id": "$notebook",
      "notes_count": {
        "$sum": 1
      }
    }
  }, {
    "$project": {
      "notebook": "$_id",
      "notes_count": "$notes_count"
    }
  }]
)
gives me this kind of result:
{
"_id": "5585a9ffc9506e64192858c1",
"notes_count": 7,
"notebook": {
"_id": "5585a9ffc9506e64192858c1",
"title": "un carnet court",
"__v": 0
}
}
Forget about the __v and _id fields; they would be easy to handle with a modified toJSON function.
But in that function neither the doc nor the ret parameter gives me access to the computed notes_count value.
Obviously, I could manage this in the route handler (parse the result and rebuild the data to be returned), but is there a proper way to do this with Mongoose?
You can't use the aggregate method to update. As you have noted, you'll need to use output from the aggregate constructor to update the relevant documents.
As the Mongoose aggregate method will return a collection of plain objects, you can iterate through this and utilise the _id field (or similar) to update the documents.
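Since the aggregation hands back plain objects, one way to get the output shape from the question is to map over them in the route handler. A minimal sketch, assuming each result looks like the populated object shown in the question; toNotebookSummary is a made-up name and no Mongoose API is involved:

```javascript
// Reshape one aggregation result into the { notes_count, title, id } form.
function toNotebookSummary(row) {
  return {
    notes_count: row.notes_count,
    title: row.notebook.title,
    id: String(row.notebook._id), // also works if _id is an ObjectId
  };
}

// Sample result copied from the question.
const aggregated = [
  {
    _id: "5585a9ffc9506e64192858c1",
    notes_count: 7,
    notebook: { _id: "5585a9ffc9506e64192858c1", title: "un carnet court", __v: 0 },
  },
];

console.log(aggregated.map(toNotebookSummary));
```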

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to take the following steps to get what you want:
1. Create a special index described in the post.
2. Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4
3. Create a document with the original and flattened data and index it into Elasticsearch:
{
  "data": { ... },
  "flatData": [ ... ]
}
4. Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
5. Execute queries on the flatData object to find what you need.
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
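The real flattenData implementation lives in the gist linked above. The sketch below is not that code; it is a simplified stand-in, under my own assumptions, that reproduces the array shown above for flat objects with string and integer values. The name flattenDataSketch, the dotted keys for nested objects, and the handling of doubles and booleans are all assumptions.

```javascript
// Turn an object into a list of { key, type, key_type, value_<type> } entries.
function flattenDataSketch(data, prefix = "") {
  const result = [];
  for (const [name, value] of Object.entries(data)) {
    const key = prefix ? `${prefix}.${name}` : name;
    // Recurse into plain nested objects, joining keys with a dot.
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      result.push(...flattenDataSketch(value, key));
      continue;
    }
    let type;
    if (typeof value === "string") type = "string";
    else if (Number.isInteger(value)) type = "long";
    else if (typeof value === "number") type = "double";
    else if (typeof value === "boolean") type = "boolean";
    else continue; // null, arrays and other values are skipped in this sketch
    result.push({ key, type, key_type: `${key}.${type}`, [`value_${type}`]: value });
  }
  return result;
}

const bob = { eventId: "2T73ZT1R463DJNWE36IA8FEN", name: "Bob", age: 22, sex: 0 };
console.log(JSON.stringify(flattenDataSketch(bob), null, 2));
```

Storing each value under a type-specific field (value_string, value_long) is what lets two forms reuse the same key with different types without a mapping conflict.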
Then we will wrap this data in a document as shown above and index it.
Then the second event moderator creates another form that has a new field (city), a field with the same name and type as before (name), and a field with the same name but a different type (sex):
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
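Because every condition follows the same nested pattern, the query can be assembled from a small helper. A minimal sketch, assuming the flatData field names used above; the function names flatDataClause and buildMemberQuery are made up:

```javascript
// One nested clause: match a flattened entry by its key and by one value field.
function flatDataClause(key, valueField, value) {
  return {
    nested: {
      path: "flatData",
      query: {
        bool: {
          must: [
            { term: { "flatData.key": key } },
            { match: { [`flatData.${valueField}`]: value } },
          ],
        },
      },
    },
  };
}

// Combine clauses so that all of them must match the same document.
function buildMemberQuery(clauses) {
  return { query: { bool: { must: clauses } } };
}

const memberQuery = buildMemberQuery([
  flatDataClause("eventId", "value_string.keyword", "2T73ZT1R463DJNWE36IA8FEN"),
  flatDataClause("name", "value_string", "bob"),
]);
console.log(JSON.stringify(memberQuery, null, 2));
```

Each condition gets its own nested clause because a single nested query only matches when all of its conditions hold on the same flatData entry.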
Elasticsearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So yes, Elasticsearch suits these cases well.
However, you may want to fine-tune this behavior, or the default mapping applied by Elasticsearch may not correspond to what you need. In that case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution are covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I want the equivalent of a query like select full_name from authors, so I can display the list to users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
For Elasticsearch 1.0 and later, you can leverage terms aggregation to do this,
query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later; as noted further down, Elasticsearch 5.x rejects it).
Response:
{
  ...
  "aggregations" : {
    "full_name" : {
      "buckets" : [
        {
          "key" : "Ken",
          "doc_count" : 10
        },
        {
          "key" : "Jim Gray",
          "doc_count" : 10
        }
      ]
    }
  }
}
See Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET 'http://localhost:9200/articles/_search?pretty' -d '
{
  "aggs" : {
    "whatever" : {
      "terms" : { "field" : "yourfield", "size" : 10000 }
    }
  },
  "size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
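Once the aggregation response comes back, getting the distinct values is a matter of reading the bucket keys. A minimal sketch, assuming the response shape shown in the answers above; the function name uniqueValues is made up:

```javascript
// Return the distinct values reported by a terms aggregation.
function uniqueValues(response, aggName) {
  const agg = response.aggregations && response.aggregations[aggName];
  return agg && Array.isArray(agg.buckets) ? agg.buckets.map((b) => b.key) : [];
}

// Sample response trimmed to the part that matters here.
const sampleResponse = {
  aggregations: {
    full_name: {
      buckets: [
        { key: "Ken", doc_count: 10 },
        { key: "Jim Gray", doc_count: 10 },
      ],
    },
  },
};

console.log(uniqueValues(sampleResponse, "full_name"));
```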
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the analyzed full_name text field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
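With Search After, each request carries the sort values of the last hit from the previous page. The sketch below only builds the next request body and leaves the actual HTTP call out; the function name nextPageBody and the author_id tiebreaker field are assumptions, and the sort should end in some field that is unique per document.

```javascript
// Build the request body for the next page, or return null when the
// previous page was empty (nothing left to fetch).
function nextPageBody(baseBody, previousHits) {
  if (previousHits.length === 0) return null;
  const last = previousHits[previousHits.length - 1];
  // Copy the base body instead of mutating it.
  return { ...baseBody, search_after: last.sort };
}

const baseBody = {
  size: 2,
  query: { match_all: {} },
  // "author_id" is a hypothetical unique field used as a tiebreaker.
  sort: [{ "full_name.keyword": "asc" }, { author_id: "asc" }],
};

// Hits as they would appear in hits.hits of the previous response.
const firstPageHits = [
  { _source: { full_name: "Brian Kernighan" }, sort: ["Brian Kernighan", 17] },
  { _source: { full_name: "Charles Dickens" }, sort: ["Charles Dickens", 4] },
];

console.log(JSON.stringify(nextPageBody(baseBody, firstPageHits), null, 2));
```

The first page is fetched with baseBody as-is; after that, the loop repeats until nextPageBody returns null.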
