Finding a subkey with a view in CouchDB - couchdb

How do I go about creating a view that would allow me to search for a particular value based on a JSON object with the main document?
For example, below is a document from my database.
Phone number is kept in "phone" which is within the "patientinfo" key.
How would I search for a phone number with this layout?
{
"_id": "x2484",
"_rev": "2-44ac5e42ce95fcbcc8a743a256a3f6ce",
"patientinfo": {
"name": "Douglas",
"laststat": "Complete",
"timetocall": 1438034383,
"queue": 7,
"notes": "Spouse placed call",
"laststatcode": 501,
"timestamp": 1438023283,
"phone": 3141592653,
"attempts": 1
},
"call": {
"1": {
"timetocall": 1438034383,
"status": "Complete",
"laststat": "Complete",
"timestamp": 1438023283,
"lastintv": "moor",
"laststatcode": 501
}
}
}

So the answer turned out to be much simpler than I thought. All I needed to do was add the next level of the json object as part of my if and emit statements.
function(doc) {
if (doc.patientinfo.phone)
{
emit(doc.patientinfo.phone, doc);
}
}

Related

Ordering view by document type in couchDB

I have diferent kinds of documents in my couchDB, for example:
{
"_id": "c9f3ebc1-78f4-4dd1-8fc2-ab96f804287c",
"_rev": "7-1e8fcc048237366e24869dadc9ba54f1",
"to_customer": false,
"box_type": {
"id": 9,
"name": "ZF3330"
},
"erp_creation_date": "16/12/2017",
"type": "pallet",
"plantation": {
"id": 62,
"name": "FRF"
},
"pallet_type": {
"id": 2565,
"name": "ZF15324"
},
"creation_date": "16/12/2017",
"article_id": 3,
"updated": "2017/12/16 19:01",
"server_status": {
"in_server": true,
"errors": null,
"modified_in_server": false,
"dirty": false,
"delete_in_server": false
},
"pallet_article": {
"id": 11,
"name": "BLUE"
}
}
So , in all my documents, I have the field : type. In the other hand I have a view that get all the documents whose type is pallet || shipment
this is my view:
function(doc) {
if (doc.completed == true && (doc.type == "shipment" || doc.type == "pallet" )){
emit([doc.type, doc.device_num, doc.num], doc);
}
}
So in this view I get always a list with the view query result, the problem I have is that list is ordering by receiving date(I guess) and I need to order it by document type.
so my question is: How Can I order documents by document.type in a View?
View results are always sorted by key, so your view is sorted by doc.type: first you will get all pallets, then all the shipments. the pallets are sorted by device_num and then num. If you emit several rows with the same keys, the rows are then sorted by _id. You can find more detailed info in the CouchDB documentation.
So your view should actually work the way you want. ;-)

ElasticSearch: access document nested value in groovy script

I have a document stored in ElasticSearch as below.
_source:
{
"firstname": "John",
"lastname": "Smith",
"medals":[
{
"bucket": 100,
"count": 1
},
{
"bucket": 150,
"count": 2
}
]
}
I can access the string type value inside a document using doc.firstname for scripted metric aggregation http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html.
But I am not able to get the field value using doc.medals[0].bucket.
Can you please help me out and let me know how to access the values inside nested fields?
Use _source for nested properties.
Doc holds fields that are loaded in memory. Nested documents may not be loaded and should be accessed with _source.
For instance:
GET index/type
{
"aggs": {
"NAME": {
"scripted_metric": {
"init_script": "_agg['collection']=[]",
"map_script": "_agg['tr'].add(_source.propertry1.prop);",
"combine_script": "return _agg",
"reduce_script": "return _aggs"
}
}
},
"size": 0
}

How to search through data with arbitrary amount of fields?

I have the web-form builder for science events. The event moderator creates registration form with arbitrary amount of boolean, integer, enum and text fields.
Created form is used for:
register a new member to event;
search through registered members.
What is the best search tool for second task (to search memebers of event)? Is ElasticSearch well for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post.
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
"data": { ... },
"flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
Example
Basing on your original question, let's assume that the first event moderator created a form with following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:
{
"eventId": "2T73ZT1R463DJNWE36IA8FEN",
"name": "Bob",
"age": 22,
"sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "2T73ZT1R463DJNWE36IA8FEN"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Bob"
},
{
"key": "age",
"type": "long",
"key_type": "age.long",
"value_long": 22
},
{
"key": "sex",
"type": "long",
"key_type": "sex.long",
"value_long": 0
}
]
Then we will wrap this data in a document as I've showed before and index it.
Then, the second event moderator, creates another form having a new field, field with same name and type, and also a field with same name but with different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
"eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
"name": "Alice",
"city": "New York",
"sex": "female"
});
This will produce the following data:
[
{
"key": "eventId",
"type": "string",
"key_type": "eventId.string",
"value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
},
{
"key": "name",
"type": "string",
"key_type": "name.string",
"value_string": "Alice"
},
{
"key": "city",
"type": "string",
"key_type": "city.string",
"value_string": "New York"
},
{
"key": "sex",
"type": "string",
"key_type": "sex.string",
"value_string": "female"
}
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "eventId"}},
{"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
]
}
}
}
},
{
"nested": {
"path": "flatData",
"query": {
"bool": {
"must": [
{"term": {"flatData.key": "name"}},
{"match": {"flatData.value_string": "bob"}}
]
}
}
}
}
]
}
}
}
ElasticSearch automatically detects the field content in order to index it correctly, even if the mapping hasn't been defined previously. So, yes : ElasticSearch suits well these cases.
However, you may want to fine tune this behavior, or maybe the default mapping applied by ElasticSearch doesn't correspond to what you need : in this case, take a look at the default mapping or, for even further control, the dynamic templates feature.
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution is covered in this article on common problems with Elasticsearch.
Essentially, you want to have everything that can possibly be user-defined as a value. Using nested documents, you can have a key-field and differently mapped value fields to achieve pretty much the same.

Elasticsearch term filter on inner object field not matching

I have just organized my document structure to have a more OO design (e.g. moved top level properties like venueId and venueName into a venue object with id and name fields).
However I can now not get a simple term filter working for fields on the child venue inner object.
Here is my mapping:
{
"deal": {
"properties": {
"textId": {"type":"string","name":"textId","index":"no"},
"displayId": {"type":"string","name":"displayId","index":"no"},
"active": {"name":"active","type":"boolean","index":"not_analyzed"},
"venue": {
"type":"object",
"path":"full",
"properties": {
"textId": {"type":"string","name":"textId","index":"not_analyzed"},
"regionId": {"type":"string","name":"regionId","index":"not_analyzed"},
"displayId": {"type":"string","name":"displayId","index":"not_analyzed"},
"name": {"type":"string","name":"name"},
"address": {"type":"string","name":"address"},
"area": {
"type":"multi_field",
"fields": {
"area": {"type":"string","index":"not_analyzed"},
"area_search": {"type":"string","index":"analyzed"}}},
"location": {"type":"geo_point","lat_lon":true}}},
"tags": {
"type":"multi_field",
"fields": {
"tags":{"type":"string","index":"not_analyzed"},
"tags_search":{"type":"string","index":"analyzed"}}},
"days": {
"type":"multi_field",
"fields": {
"days":{"type":"string","index":"not_analyzed"},
"days_search":{"type":"string","index":"analyzed"}}},
"value": {"type":"string","name":"value"},
"title": {"type":"string","name":"title"},
"subtitle": {"type":"string","name":"subtitle"},
"description": {"type":"string","name":"description"},
"time": {"type":"string","name":"time"},
"link": {"type":"string","name":"link","index":"no"},
"previewImage": {"type":"string","name":"previewImage","index":"no"},
"detailImage": {"type":"string","name":"detailImage","index":"no"}}}
}
Here is an example document:
GET /production/deals/wa-au-some-venue-weekends-some-deal
{
"_index":"some-index-v1",
"_type":"deals",
"_id":"wa-au-some-venue-weekends-some-deal",
"_version":1,
"exists":true,
"_source" : {
"id":"921d5fe0-8867-4d5c-81b4-7c1caf11325f",
"textId":"wa-au-some-venue-weekends-some-deal",
"displayId":"some-venue-weekends-some-deal",
"active":true,
"venue":{
"id":"46a7cb64-395c-4bc4-814a-a7735591f9de",
"textId":"wa-au-some-venue",
"regionId":"wa-au",
"displayId":"some-venue",
"name":"Some Venue",
"address":"sdgfdg",
"area":"Swan Valley & Surrounds"},
"tags":["Lunch"],
"days":["Saturday","Sunday"],
"value":"$1",
"title":"Some Deal",
"subtitle":"",
"description":"",
"time":"5pm - Late"
}
}
And here is an 'explain' test on that same document:
POST /production/deals/wa-au-some-venue-weekends-some-deal/_explain
{
"query": {
"filtered": {
"filter": {
"term": {
"venue.regionId": "wa-au"
}
}
}
}
}
{
"ok":true,
"_index":"some-index-v1",
"_type":"deals",
"_id":"wa-au-some-venue-weekends-some-deal",
"matched":false,
"explanation":{
"value":0.0,
"description":"ConstantScore(cache(venue.regionId:wa-au)) doesn't match id 0"
}
}
Is there any way to get more useful debugging info?
Is there something wrong with the explain result description? Simply saying "doesn't match id 0" does not really make sense to me... the field is called 'regionId' (not 'id') and the value is definitely not 0...???
That happens because the type you submitted the mapping for is called deal, while the type you indexed the document in is called deals.
If you look at the mapping for your type deals, you'll see that was automatically generated and the field venue.regionId is analyzed, thus you most likely have two tokens in your index: wa and au. Only searching for those tokens on that type you would get back that document.
Anything else looks just great! Only a small character is wrong ;)

Query all unique values of a field with Elasticsearch

How do I search for all unique values of a given field with Elasticsearch?
I have such a kind of query like select full_name from authors, so I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly you need to make sure you're not tokenizing it while indexing, otherwise every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and you still want to tokenize it you can just index it in two different ways using multi field.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
For Elasticsearch 1.0 and later, you can leverage terms aggregation to do this,
query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of authors field.
size=0 means not limit the number of terms(this requires es to be 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
},
]
}
}
}
see Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
Select distinct full_name from authors;
is equivalent to
Select full_name from authors group by full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in elastic search :
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
Working for Elasticsearch 5.2.2
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means by default you cannot search on the full_name field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.

Resources