I have a Cloudant database containing a collection of measures coming in from multiple devices.
Each device sends many measures, but three are of interest: temperature, latitude, and longitude.
Due to the design of the system, each value is a separate measure (I cannot easily join temperature, latitude, and longitude at insertion time), so in the database I have a set of measures like so:
{
"device": "foo",
"name": "temperature",
"date": <timestamp>,
"value": <value>
},
{
"device": "foo",
"name": "latitude",
"date": <timestamp>,
"value": <value>
},
{
"device": "foo",
"name": "longitude",
"date": <timestamp>,
"value": <value>
},
So conceptually, I have 3 time series.
I would like to extract the latest measure of each of these time series, and ideally have them grouped by device.
Something like:
{
device1: {
"temperature": {
date: <date>,
value: <value>
},
"latitude": {
date: <date>,
value: <value>
},
"longitude": {
date: <date>,
value: <value>
}
},
"device2": {
...
}
}
I do not expect this exact syntax; that's just an idea of the data I'm expecting.
I could join the positional measures together, but the question would be the same: how do I get the latest time-series entries of each device grouped together?
First, I would use a data structure like this:
{
"type":"temperature",
"date":"1472528116698",
"value":"35",
"device":"device1"
}
The type property can be temperature, latitude, or longitude.
Then you need some views. Personally, I prefer to have one _design document per type; it also makes the queries easier.
For example, you would have a _design document like this for the temperature:
{
"_id": "_design/temperature",
"_rev": "8-91e594df623063ed3ad7111cde09eecb",
"language": "javascript",
"views": {
"byDevice": {
"map": "function(doc) {\n if ((doc.type + '').toLowerCase() === 'temperature' && doc.device)\n emit(doc.device);\n}\n"
},
"lastestByDevice": {
"map": "function(doc) {\n if ((doc.type + '').toLowerCase() === 'temperature' && doc.device && doc.value)\n emit(doc.device,doc.value);\n}\n",
"reduce": "function(keys, values, rereduce) {\n var max = Number.MIN_VALUE;\n for (var i = 0; i < values.length; i++) {\n var val = parseFloat(values[i]);\n if (val > max)\n max = val;\n }\n return max;\n}\n"
}
}
}
Request example:
http://localhost:5984/db/_design/temperature/_view/latestByDevice?group_level=1&reduce=true
If you query latestByDevice with the reduce function, it returns each device with its maximum value. With this example, you should be able to get a good start. I don't know how you receive and build your data, but if you prefer to group everything by device, that's also possible.
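If what you actually need is the latest measure by date rather than the maximum value, here is a sketch of an alternative (assuming the map emits emit(doc.device, {date: doc.date, value: doc.value}) and that date is a numeric timestamp): a reduce that keeps the most recent entry.
function (keys, values, rereduce) {
  // Keep the entry with the greatest date; this also works on rereduce,
  // because partial reduce results have the same {date, value} shape.
  var latest = values[0];
  for (var i = 1; i < values.length; i++) {
    if (values[i].date > latest.date)
      latest = values[i];
  }
  return latest;
}
Queried with group=true, this returns one {date, value} object per device.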
I'm not sure how to write queries for Cosmos DB, as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand very well how they work.
Given a structure such as the one below, how do I query for the city with the largest population among all states using the Data Explorer in Azure:
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states){
    // Order states by the population of their single remaining city, descending.
    function compareStates(stateA, stateB){
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    // Reduce each state to only its most populous city.
    var onlyWithOneCity = states.map(s => {
        var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxpop)
        };
    });
    // Return the state whose top city has the largest population.
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
I have different kinds of documents in my CouchDB, for example:
{
"_id": "c9f3ebc1-78f4-4dd1-8fc2-ab96f804287c",
"_rev": "7-1e8fcc048237366e24869dadc9ba54f1",
"to_customer": false,
"box_type": {
"id": 9,
"name": "ZF3330"
},
"erp_creation_date": "16/12/2017",
"type": "pallet",
"plantation": {
"id": 62,
"name": "FRF"
},
"pallet_type": {
"id": 2565,
"name": "ZF15324"
},
"creation_date": "16/12/2017",
"article_id": 3,
"updated": "2017/12/16 19:01",
"server_status": {
"in_server": true,
"errors": null,
"modified_in_server": false,
"dirty": false,
"delete_in_server": false
},
"pallet_article": {
"id": 11,
"name": "BLUE"
}
}
So, all my documents have the field type. On the other hand, I have a view that gets all the documents whose type is pallet or shipment.
This is my view:
function(doc) {
if (doc.completed == true && (doc.type == "shipment" || doc.type == "pallet" )){
emit([doc.type, doc.device_num, doc.num], doc);
}
}
With this view I always get a list with the query result; the problem is that the list seems to be ordered by receiving date (I guess), and I need to order it by document type.
So my question is: how can I order documents by doc.type in a view?
View results are always sorted by key, so your view is already sorted by doc.type: first you will get all the pallets, then all the shipments. Within each type, the rows are sorted by device_num and then by num. If you emit several rows with the same key, those rows are then sorted by _id. You can find more detailed info in the CouchDB documentation.
So your view should actually work the way you want. ;-)
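To illustrate the ordering, a response might look like this (the device_num and num values here are made up):
{
  "rows": [
    { "key": ["pallet", 1, 1], "value": { ... } },
    { "key": ["pallet", 1, 2], "value": { ... } },
    { "key": ["pallet", 2, 1], "value": { ... } },
    { "key": ["shipment", 1, 1], "value": { ... } }
  ]
}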
I want to simulate a parent-child relation in Elasticsearch and perform some analytics work over it. My use case is something like this:
I have a shop owner like this:
"_source": {
"shopId": 5,
"distributorId": 4,
"stateId": 1,
"partnerId": 2,
}
and now I have child records (one for each day) like this:
"_source": {
"shopId": 5,
"date" : 2013-11-13,
"transactions": 150,
"amount": 1980,
}
The parent is a record per store, while the child is the transactions each store does per day. Now I want to do some complex queries like:
Find the total transactions for each day over the last 30 days where the distributor is 5
POST /newdb/shopsDaily/_search
{
"query": {
"match_all": {}
},
"filter": {
"has_parent": {
"type": "shop",
"query": {
"match": {
"distributorId": "5"
}
}
}
},
"facets": {
"date": {
"histogram": {
"key_field": "date",
"value_field": "transactions",
"interval": 100
}
}
}
}
But the results I get do not take into account the filter I applied.
So I changed the query to this:
POST /newdb/shopDaily/_search
{
"query": {"filtered": {
"query": {"match_all": {}},
"filter": { "has_parent": {
"type": "shop",
"query": {"match": {
"distributorId": "13"
}}
}}
}},
"facets": {
"date": {
"histogram": {
"key_field": "date",
"value_field": "transactions",
"interval": 100
}
}
}
}
And then the final histogram facet took the filtering into account.
When I browsed around, I found that this is because I used filtered (which can only be used inside the query clause, not outside it like filter) rather than filter,
but it is also mentioned that for fast searches you should use filter. Will searching as I did in the second step (using filtered instead of filter) affect the performance of Elasticsearch? If so, how can I make my facets honor filters without hurting performance?
Thanks for your time.
Filters in a filtered query (filters inside the query clause) are cached, and hence faster. This type of filter affects both the search results and the facet counts.
Filters outside the query clause are not considered during facet calculation; they are considered only for the search results. Facets are calculated only on the query clause. If you want filtered facets, you need to add a filter to each of the facet clauses.
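For example, here is a sketch of a request that keeps a fast top-level filter for the hits and repeats it as a facet_filter so the facet honors it too (reusing the has_parent filter from your question; adapt it to your mapping):
POST /newdb/shopDaily/_search
{
  "query": { "match_all": {} },
  "filter": {
    "has_parent": {
      "type": "shop",
      "query": { "match": { "distributorId": "13" } }
    }
  },
  "facets": {
    "date": {
      "histogram": {
        "key_field": "date",
        "value_field": "transactions",
        "interval": 100
      },
      "facet_filter": {
        "has_parent": {
          "type": "shop",
          "query": { "match": { "distributorId": "13" } }
        }
      }
    }
  }
}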
How do I search for all unique values of a given field with Elasticsearch?
I have a query kind of like select full_name from authors, so that I can display the list to the users on a form.
You could make a terms facet on your 'full_name' field. But in order to do that properly, you need to make sure you're not tokenizing it while indexing; otherwise, every entry in the facet will be a different term that is part of the field content. You most likely need to configure it as 'not_analyzed' in your mapping. If you are also searching on it and still want to tokenize it, you can index it in two different ways using a multi-field, as sketched below.
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.
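As a sketch (the index and field names are assumptions, using the Elasticsearch 1.x multi-field syntax), a mapping like the following lets you keep searching on full_name while faceting on the unanalyzed full_name.raw:
{
  "mappings": {
    "authors": {
      "properties": {
        "full_name": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}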
For Elasticsearch 1.0 and later, you can leverage the terms aggregation to do this.
Query DSL:
{
"aggs": {
"NAME": {
"terms": {
"field": "",
"size": 10
}
}
}
}
A real example:
{
"aggs": {
"full_name": {
"terms": {
"field": "authors",
"size": 0
}
}
}
}
Then you can get all unique values of the authors field.
size=0 means no limit on the number of terms (this requires Elasticsearch 1.1.0 or later).
Response:
{
...
"aggregations" : {
"full_name" : {
"buckets" : [
{
"key" : "Ken",
"doc_count" : 10
},
{
"key" : "Jim Gray",
"doc_count" : 10
}
]
}
}
}
see Elasticsearch terms aggregations.
Intuition:
In SQL parlance:
SELECT DISTINCT full_name FROM authors;
is equivalent to
SELECT full_name FROM authors GROUP BY full_name;
So, we can use the grouping/aggregate syntax in ElasticSearch to find distinct entries.
Assume the following is the structure stored in Elasticsearch:
[{
"author": "Brian Kernighan"
},
{
"author": "Charles Dickens"
}]
What did not work: Plain aggregation
{
"aggs": {
"full_name": {
"terms": {
"field": "author"
}
}
}
}
I got the following error:
{
"error": {
"root_cause": [
{
"reason": "Fielddata is disabled on text fields by default...",
"type": "illegal_argument_exception"
}
]
}
}
What worked like a charm: Appending .keyword with the field
{
"aggs": {
"full_name": {
"terms": {
"field": "author.keyword"
}
}
}
}
And the sample output could be:
{
"aggregations": {
"full_name": {
"buckets": [
{
"doc_count": 372,
"key": "Charles Dickens"
},
{
"doc_count": 283,
"key": "Brian Kernighan"
}
],
"doc_count": 1000
}
}
}
Bonus tip:
Let us assume the field in question is nested as follows:
[{
"authors": [{
"details": [{
"name": "Brian Kernighan"
}]
}]
},
{
"authors": [{
"details": [{
"name": "Charles Dickens"
}]
}]
}
]
Now the correct query becomes:
{
"aggregations": {
"full_name": {
"aggregations": {
"author_details": {
"terms": {
"field": "authors.details.name"
}
}
},
"nested": {
"path": "authors.details"
}
}
},
"size": 0
}
This works for Elasticsearch 5.2.2:
curl -XGET http://localhost:9200/articles/_search?pretty -d '
{
"aggs" : {
"whatever" : {
"terms" : { "field" : "yourfield", "size":10000 }
}
},
"size" : 0
}'
The "size":10000 means get (at most) 10000 unique values. Without this, if you have more than 10 unique values, only 10 values are returned.
The "size":0 means that in result, "hits" will contain no documents. By default, 10 documents are returned, which we don't need.
Reference: bucket terms aggregation
Also note, according to this page, facets have been replaced by aggregations in Elasticsearch 1.0, which are a superset of facets.
The existing answers did not work for me in Elasticsearch 5.X, for the following reasons:
I needed to tokenize my input while indexing.
"size": 0 failed to parse because "[size] must be greater than 0."
"Fielddata is disabled on text fields by default." This means by default you cannot search on the full_name field. However, an unanalyzed keyword field can be used for aggregations.
Solution 1: use the Scroll API. It works by keeping a search context and making multiple requests, each time returning subsequent batches of results. If you are using Python, the elasticsearch module has the scan() helper function to handle scrolling for you and return all results.
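A sketch of the first scroll request (the index and field names are assumptions); each follow-up POST to _search/scroll with the returned _scroll_id fetches the next batch, and you deduplicate the full_name values client-side:
POST /authors/_search?scroll=1m
{
  "size": 1000,
  "_source": ["full_name"],
  "query": { "match_all": {} }
}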
Solution 2: use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context. Thus it is more efficient for real-time requests.
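A sketch of Search After (again with assumed names; the search_after value is the sort value of the last hit of the previous page, and in practice you would add a tiebreaker sort field):
POST /authors/_search
{
  "size": 1000,
  "_source": ["full_name"],
  "sort": [ { "full_name.keyword": "asc" } ],
  "search_after": ["Charles Dickens"]
}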
I have the following (simplified) JSON objects in a CouchDB store:
[{
"_id": "5ea7a53e670b432e0fe22a7bc10024db",
"_rev": "1-ae70c8906f7aa6d73539a89f7ad960ee",
"type": "job"
}, {
"_id": "5ea7a53e670b432e0fe22a7bc10041d9",
"_rev": "4-fa0ba68c35ca548b497a7309389f9087",
"type": "scan",
"job_id": "5ea7a53e670b432e0fe22a7bc10024db",
"number": 1
}, {
"_id": "5ea7a53e670b432e0fe22a7bc100520e",
"_rev": "4-3e6b1a028786c265ecb7362e245d049e",
"type": "scan",
"job_id": "5ea7a53e670b432e0fe22a7bc10024db",
"number": 2
}]
I want to make a POST request with the keys ["5ea7a53e670b432e0fe22a7bc10024db", 2] (the job id and a scan number). How can I write a map function for a view that returns the job with the given id and the scan that matches the job_id and the number?
Thanks,
Radu
What is your expected output for the request? If you just want to get the scan, emit the key you want to search for in the map function:
function (doc) {
  if (doc.type == "scan" && doc.number) {
    emit([doc.job_id, doc.number], doc);
  }
}
If you need both documents (the full job document, not just its id, and the scan), either emit both of them in a single emit and add the include_docs=true parameter to the request's URL:
function (doc) {
  if (doc.type == "scan" && doc.number) {
    emit([doc.job_id, doc.number], {scan: doc, _id: doc.job_id});
  }
}
or use two emits:
function (doc) {
  if (doc.type == "scan" && doc.number && doc.job_id) {
    emit([doc.job_id, doc.number, "job"], {_id: doc.job_id});
    emit([doc.job_id, doc.number, "scan"], {_id: doc._id});
  }
}
You will get both documents with startkey=["5ea7a53e670b432e0fe22a7bc10024db", 2]&endkey=["5ea7a53e670b432e0fe22a7bc10024db", 2, {}]&include_docs=true (or with the keys=[] parameter) in the URL.
Check the view API for the include_docs option.
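For example, here is a sketch of the POST form against the two-emit view (the design document and view names are assumptions):
POST /db/_design/scans/_view/by_job_and_number?include_docs=true
{
  "keys": [
    ["5ea7a53e670b432e0fe22a7bc10024db", 2, "job"],
    ["5ea7a53e670b432e0fe22a7bc10024db", 2, "scan"]
  ]
}
The doc field of the two returned rows will contain the job document and the scan document, respectively.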