CouchDB map-reduce and grouping

I am attempting to get a count of unique events for an object (let's say a video).
Here are my documents:
{
"type":"View",
"video_id": "12300",
"user_id": 3
}
{
"type":"View",
"video_id": "12300",
"user_id": 1
}
{
"type":"View",
"video_id": "45600",
"user_id": 3
}
I'm trying to get a unique (by user_id) count of views for each video.
I assume I want to map my data like so:
function(doc) {
if (doc.type === 'View') {
emit([doc.video_id, doc.user_id], 1);
}
}
But I don't understand how to reduce it down to unique users per video. Or am I going about this wrong?

You should look at the group_level view parameter. It will allow you to change what field(s) the grouping occurs on.
With group_level=1, the rows in this case are grouped by video_id only. With group_level=2, they are grouped on both video_id and user_id.
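The effect of group_level can be sketched locally, without a CouchDB server. The snippet below is only a simulation under stated assumptions: `groupAtLevel` is a hypothetical helper (not a CouchDB API) that truncates each emitted key to the given number of elements and sums the values, and a duplicate view by user 3 has been added to the sample rows to show the difference between total views and unique users. It does not reproduce CouchDB's key ordering.

```javascript
// Rows as the map function would emit them: key = [video_id, user_id], value = 1.
// The third row (a repeat view by user 3) is an assumption added for illustration.
const emitted = [
  { key: ["12300", 3], value: 1 },
  { key: ["12300", 1], value: 1 },
  { key: ["12300", 3], value: 1 },
  { key: ["45600", 3], value: 1 },
];

// Hypothetical helper: truncate each key to `level` elements, then sum per truncated key.
function groupAtLevel(rows, level) {
  const totals = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(row.key.slice(0, level));
    totals.set(groupKey, (totals.get(groupKey) || 0) + row.value);
  }
  return [...totals].map(([k, value]) => ({ key: JSON.parse(k), value }));
}

// group_level=1: total views per video (repeat views are counted).
const viewsPerVideo = groupAtLevel(emitted, 1);

// group_level=2: one row per (video, user) pair.
const viewsPerVideoAndUser = groupAtLevel(emitted, 2);

// Counting the level-2 rows per video gives the number of unique users.
const uniqueUsersPerVideo = groupAtLevel(
  viewsPerVideoAndUser.map((row) => ({ key: row.key, value: 1 })),
  1
);

console.log(viewsPerVideo);
console.log(uniqueUsersPerVideo);
```

In this sketch the unique count comes from counting the group_level=2 rows on the client side, since a plain sum at group_level=1 counts every view.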

Add ?group=true after the request URL. That groups identical keys together as input for the reduce function:
function(keys, values, rereduce){
return sum(values);
}
That should do it.
Note that keys and values are unzipped lists of keys and their values. With grouping on, the keys are all identical for each call of the reduce.

Related

CouchDB Count Reduce with timestamp filtering

Let's say I have documents like so:
{
_id: "a98798978s978dd98d",
type: "signature",
uid: "u12345",
category: "cat_1",
timestamp: UNIX_TIMESTAMP
}
My goal is to count all signatures created by a certain uid while being able to filter by timestamp.
Thanks to Alexis, I've gotten this far with a reduce _count function:
function (doc) {
if (doc.type === "signature") {
emit([doc.uid, doc.timestamp], 1);
}
}
With the following queries:
start_key=[null,lowerTimestamp]
end_key=[{},higherTimestamp]
reduce=true
group_level=1
Response:
{
"rows": [
{
"key": [ "u11111" ],
"value": 3
},
{
"key": [ "u12345" ],
"value": 26
}
]
}
It counts the uid correctly but the filter doesn't work properly. At first I thought it might be a CouchDB 2.2 bug, but I tried on Cloudant and I got the same response.
Does anyone have any ideas on how I could get this to work while being able to filter by timestamp?
When using compound keys in MapReduce (i.e. the key is an array of things), you cannot query a range of keys with a "leading" array element missing. That is, you can query a range of uids and get the results ordered by timestamp, but your use case is the other way round: you want to query uids by time.
I'd be tempted to put time first in the array, but unix timestamps are not so good for grouping ;). I don't know the ins and outs of your application, but you could index a date instead of a timestamp, like so:
function (doc) {
if (doc.type === "signature") {
// assumes doc.timestamp is in milliseconds; multiply by 1000 first if it is in seconds
var date = new Date(doc.timestamp);
var datestr = date.toISOString().split('T')[0];
emit([datestr, doc.uid], 1);
}
}
This would allow you to query a range of dates (to the resolution of a whole day):
?startkey=["2018-01-01"]&endkey=["2018-02-01"]&group_level=2
albeit with your uids grouped by day.
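To see why the original range did not filter by timestamp, it can help to compare compound keys by hand. The sketch below uses a simplified stand-in for CouchDB's view collation: `collate` and `inRange` are hypothetical helpers that only handle null, numbers, strings, arrays and objects, and strings are compared with plain JavaScript ordering rather than CouchDB's Unicode collation. The keys and timestamps are made up.

```javascript
// Simplified type order, an approximation of CouchDB collation for the types used here:
// null < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "number") return 1;
  if (typeof v === "string") return 2;
  if (Array.isArray(v)) return 3;
  return 4; // plain objects such as {}
}

// Compare two keys; arrays are compared element by element, shorter prefix first.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return a - b;
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // null vs null, or object vs object (not compared further in this sketch)
}

function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

// Keys shaped like [uid, timestamp]: the first element decides every comparison,
// so the timestamp bounds are never consulted and nothing is filtered out.
const uidFirst = [["u11111", 100], ["u11111", 900], ["u12345", 500]];
const matchedUidFirst = uidFirst.filter((k) => inRange(k, [null, 400], [{}, 600]));

// Keys shaped like [datestr, uid]: the date leads, so the range filters by day.
const dateFirst = [["2018-01-15", "u11111"], ["2018-03-01", "u11111"]];
const matchedDateFirst = dateFirst.filter((k) =>
  inRange(k, ["2018-01-01"], ["2018-02-01"])
);

console.log(matchedUidFirst.length, matchedDateFirst.length);
```

With the uid first, all three hypothetical rows fall inside the range regardless of their timestamps; with the date first, only the January row does.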

CouchDB View - Filter by List Field Attribute (doc.objects.[0].attribute)

I need to create a view that lists the values for an attribute of a doc field.
Sample Doc:
{
"_id": "003e5a9742e04ce7a6791aa845405c17",
"title": "testdoc",
"samples": [
{
"confidence": "high",
"handle": "joetest"
}
]
}
Example using that doc, I want a view that will return the values for "handle"
I found this example with the heading - Get contents of an object with specific attributes e.g. doc.objects.[0].attribute. But when I fill in the attribute name, e.g. "handle" and replace doc.objects with doc.samples, I get no results:
// map
function(doc) {
for (var idx in doc.objects) {
emit(doc.objects[idx], attribute)
}
}
The map function below creates key-value pairs where the key is always the value of handle. Replace null with any value you want, e.g. doc.title. If you want the doc attached to every row, add the query parameter ?include_docs=true when requesting the view.
// map
function (doc) {
// fall back to an empty list so documents without a samples field are skipped
var samples = doc.samples || [];
for(var i = 0, sample; sample = samples[i++];) {
emit(sample.handle, null)
}
}
Like this ->
function(doc) {
for (var i in doc.samples) {
emit(doc._id, doc.samples[i].handle)
}
}
It will produce a result with the doc._id field as the key. Or, if you want your key to be based on the .handle field, reverse the parameters in emit so you can search by startkey= and endkey=.
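A map function like the ones above can be checked outside CouchDB by stubbing `emit`. The sketch below is an illustration, not CouchDB's own test tooling: the `emit` stub, the `rows` array and the second document are assumptions. It uses modern JavaScript syntax for running under Node; inside an older CouchDB query server the same logic may need `var` and a plain `for` loop.

```javascript
// Local stand-in for CouchDB's emit(): collects rows instead of writing to an index.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function keyed by handle, with the document id as the value.
// Documents without a samples array are skipped instead of throwing.
function map(doc) {
  const samples = Array.isArray(doc.samples) ? doc.samples : [];
  for (const sample of samples) {
    emit(sample.handle, doc._id);
  }
}

const docs = [
  {
    _id: "003e5a9742e04ce7a6791aa845405c17",
    title: "testdoc",
    samples: [{ confidence: "high", handle: "joetest" }],
  },
  // Hypothetical document with no samples field, to exercise the guard.
  { _id: "hypothetical-doc-without-samples", title: "other" },
];

docs.forEach(map);
console.log(rows);
```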

CouchDB - Map Reduce similar to SQL Group by

Consider following sample documents stored in CouchDB
{
"_id":....,
"rev":....,
"type":"orders",
"Period":"2013-01",
"Region":"East",
"Category":"Stationary",
"Product":"Pen",
"Rate":1,
"Qty":10,
"Amount":10
}
{
"_id":....,
"rev":....,
"type":"orders",
"Period":"2013-02",
"Region":"South",
"Category":"Food",
"Product":"Biscuit",
"Rate":7,
"Qty":5,
"Amount":35
}
Consider following SQL query
SELECT Period, Region,Category, Product, Min(Rate),Max(Rate),Count(Rate), Sum(Qty),Sum(Amount)
FROM Sales
GROUP BY Period,Region,Category, Product;
Is it possible to create map/reduce views in couchdb equivalent to the above SQL query and to produce output like
[
{
"Period":"2013-01",
"Region":"East",
"Category":"Stationary",
"Product":"Pen",
"MinRate":1,
"MaxRate":2,
"OrdersCount":20,
"TotQty":1000,
"Amount":1750
},
{
...
}
]
Up front, I believe @benedolph's answer is best-practice and best-case-scenario. Each reduce should ideally return 1 scalar value to keep the code as simple as possible.
However, it is true you'd have to issue multiple queries to retrieve the full resultset described by your question. If you don't have the option to run queries in parallel, or it is really important to keep the number of queries down it is possible to do it all at once.
Your map function will remain pretty simple:
function (doc) {
emit([ doc.Period, doc.Region, doc.Category, doc.Product ], doc);
}
The reduce function is where it gets lengthy:
function (key, values, rereduce) {
// helper function to sum all the values of a specified field in an array of objects
function sumField(arr, field) {
return arr.reduce(function (prev, cur) {
return prev + cur[field];
}, 0);
}
// helper function to create an array of just a single property from an array of objects
// (this function came from underscore.js, at least its name and concept)
function pluck(arr, field) {
return arr.map(function (item) {
return item[field];
});
}
// rereduce made this more challenging, and I could not thoroughly test this right now
// see the CouchDB wiki for more information
if (rereduce) {
// a rereduce handles intermediate values
// (so the "values" below are the results of previous reduce functions, not the map function)
return {
OrdersCount: sumField(values, "OrdersCount"),
MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
TotQty: sumField(values, "TotQty"),
Amount: sumField(values, "Amount")
};
} else {
var rates = pluck(values, "Rate");
// This takes a group of documents and gives you the stats you were asking for
return {
OrdersCount: values.length,
MinRate: Math.min.apply(Math, rates),
MaxRate: Math.max.apply(Math, rates),
TotQty: sumField(values, "Qty"),
Amount: sumField(values, "Amount")
};
}
}
I was not able to test the "rereduce" branch of this code at all, so you'll have to do that on your end (but it should work). See the wiki for information about reduce vs rereduce.
The helper functions I added at the top actually made the code overall much shorter and easier to read; they're largely influenced by my experience with Underscore.js. However, you can't include CommonJS modules in reduce functions, so they have to be written manually.
Again, the best-case scenario is to have each aggregated field get its own map/reduce index, but if that isn't an option for you, the above code should get you what you've described in the question.
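One way to exercise the rereduce branch locally is to reduce the values in one pass, then reduce them in chunks and combine the partial results with a rereduce call, and compare the two. The sketch below is a minimal check under assumptions: the three orders are made-up data for a single group, the reduce function is a condensed copy of the one above, and `null` is passed for the keys argument because this function never reads it.

```javascript
// Condensed copy of the reduce function above, so this sketch is self-contained.
function reduce(keys, values, rereduce) {
  function sumField(arr, field) {
    return arr.reduce(function (prev, cur) { return prev + cur[field]; }, 0);
  }
  function pluck(arr, field) {
    return arr.map(function (item) { return item[field]; });
  }
  if (rereduce) {
    // values are results of earlier reduce calls
    return {
      OrdersCount: sumField(values, "OrdersCount"),
      MinRate: Math.min.apply(Math, pluck(values, "MinRate")),
      MaxRate: Math.max.apply(Math, pluck(values, "MaxRate")),
      TotQty: sumField(values, "TotQty"),
      Amount: sumField(values, "Amount")
    };
  }
  // values are the documents emitted by the map function
  var rates = pluck(values, "Rate");
  return {
    OrdersCount: values.length,
    MinRate: Math.min.apply(Math, rates),
    MaxRate: Math.max.apply(Math, rates),
    TotQty: sumField(values, "Qty"),
    Amount: sumField(values, "Amount")
  };
}

// Hypothetical orders belonging to one Period/Region/Category/Product group.
var orders = [
  { Rate: 1, Qty: 10, Amount: 10 },
  { Rate: 2, Qty: 5, Amount: 10 },
  { Rate: 1.5, Qty: 4, Amount: 6 }
];

// One pass over all values (no rereduce).
var direct = reduce(null, orders, false);

// Two partial reduces, combined by a rereduce call.
var partials = [
  reduce(null, orders.slice(0, 2), false),
  reduce(null, orders.slice(2), false)
];
var combined = reduce(null, partials, true);

console.log(direct, combined);
```

If both paths produce the same object, the rereduce branch is at least consistent with the plain reduce for this input. It does not prove correctness for every way CouchDB might split the values.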
I will propose a very simple solution that requires one view per variable you want to aggregate in your "select" clause. While it is certainly possible to aggregate all variables in a single view, the reduce function would be far more complex.
The design document looks like this:
{
"_id": "_design/ddoc",
"_rev": "...",
"language": "javascript",
"views": {
"rates": {
"map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Rate);\n}",
"reduce": "_stats"
},
"qty": {
"map": "function(doc) {\n emit([doc.Period, doc.Region, doc.Category, doc.Product], doc.Qty);\n}",
"reduce": "_stats"
}
}
}
Now, you can query <couchdb>/<database>/_design/ddoc/_view/rates?group_level=4 to get the statistics about the "Rate" variable. The result should look like this:
{"rows":[
{"key":["2013-01","East","Stationary","Pen"],"value":{"sum":4,"count":3,"min":1,"max":2,"sumsqr":6}},
{"key":["2013-01","North","Stationary","Pen"],"value":{"sum":1,"count":1,"min":1,"max":1,"sumsqr":1}},
{"key":["2013-01","South","Stationary","Pen"],"value":{"sum":0.5,"count":1,"min":0.5,"max":0.5,"sumsqr":0.25}},
{"key":["2013-02","South","Food","Biscuit"],"value":{"sum":7,"count":1,"min":7,"max":7,"sumsqr":49}}
]}
For the "Qty" variable, the query would be <couchdb>/<database>/_design/ddoc/_view/qty?group_level=4.
With the group_level property you can control over which levels the aggregation is to be performed. For example, querying with group_level=2 will aggregate up to "Period" and "Region".
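Since each view returns only one variable's statistics, the SQL-like rows have to be assembled on the client. The sketch below shows one possible way to merge the two responses by key. `mergeStats` is a hypothetical helper, and the `qtyRows` values are invented for illustration (only the first `rates` row comes from the result above). The Amount column would need a third view of the same shape, which is not shown.

```javascript
// Rows as returned by the "rates" view at group_level=4 (first row of the result above).
const ratesRows = [
  {
    key: ["2013-01", "East", "Stationary", "Pen"],
    value: { sum: 4, count: 3, min: 1, max: 2, sumsqr: 6 },
  },
];

// Hypothetical rows from the "qty" view for the same key.
const qtyRows = [
  {
    key: ["2013-01", "East", "Stationary", "Pen"],
    value: { sum: 25, count: 3, min: 5, max: 10, sumsqr: 225 },
  },
];

// Hypothetical helper: join the two result sets on their (JSON-encoded) key.
function mergeStats(rates, qty) {
  const qtyByKey = new Map(qty.map((row) => [JSON.stringify(row.key), row.value]));
  return rates.map((row) => {
    const [Period, Region, Category, Product] = row.key;
    const qtyStats = qtyByKey.get(JSON.stringify(row.key));
    return {
      Period,
      Region,
      Category,
      Product,
      MinRate: row.value.min,
      MaxRate: row.value.max,
      OrdersCount: row.value.count,
      TotQty: qtyStats ? qtyStats.sum : null, // null when the qty view has no such key
    };
  });
}

const merged = mergeStats(ratesRows, qtyRows);
console.log(merged);
```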

CouchDB: getting number of keys in given key range

In my CouchDB database, all keys have the form "A_xxxxxxxx" where xxxxxxxx is zero-padded decimal number (e.g. "A_00000001" or "A_12345678")
I want to get only the number of keys in a given key range.
For example, to get the keys from A_10000000 to A_30000000, I can query something like:
GET DATABASE/_all_docs?startkey="A_10000000"&endkey="A_30000000"&include_docs=false
But the result contains all keys, and I need to count the elements in the "rows" field of the output.
The number of keys in my query will be huge, and all I want to know is the number of keys, not the actual list of keys.
The start and end of the range vary; they are not fixed.
Is it possible to get only the number of keys in the given range, without retrieving the actual key list?
Thanks,
You cannot get the number of keys in a given key range using the built-in _all_docs view. But you can get the desired result using a custom map reduce view such as this one, described in the CouchDB Definitive Guide:
map.js
function(doc) {
emit(doc._id, 1);
}
reduce.js
function(keys, values, rereduce) {
return sum(values)
}
You can add these views to your CouchDB database using the Futon admin utility by creating a new document with these contents:
{
"_id": "_design/test",
"views": {
"count": {
"map": "function(doc) {\n emit(doc._id, 1);\n}",
"reduce": "function(keys, values, rereduce) {\n return sum(values)\n}"
}
}
}
The view can then be queried at _design/test/_view/count instead of _all_docs, and will return the number of documents between the start and end keys.
When I run this query against my database without a start and end key I get this result:
{
"rows":[
{
"key": null,
"value": 185
}
]
}
Running the query again with the start and end keys populated I get this result:
{
"rows":[
{
"key": null,
"value": 11
}
]
}

Map composite key sort

I'm trying to display an app's log entries from CouchDB. Each log entry contains a timestamp, a log tag and the client's remote IP. My design document is:
{
"_id": "_design/log",
"language": "javascript",
"views": {
"browse": {
"map": "function(doc){ if (doc.type=='log') {emit([doc.date,doc.tag,doc.ip], doc);}}"
}
}
}
Now how can I get log entries for a specified IP (and tag), sorted by date?
Already tried variants of : /_design/log/_view/browse?startkey=["info","8.8.8.8"] with no success.
Your start key needs 3 elements: date, tag, and ip.
Your unsuccessful query only has 2 elements in the start key.
There is some documentation out there for composite keys. In the example used, they have a different key for year, month, and day. You can find the example in this book: http://shop.oreilly.com/product/0636920018247.do
Map function:
function(doc) {
if (doc.type === 'log') {
emit([doc.tag, doc.ip, doc.date], 1);
}
}
Query parameters (properly url-encoded):
?startkey=["info","8.8.8.8"]&endkey=["info","8.8.8.8",{}]&include_docs=true
The results are sorted by date because tag and ip are fixed.
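The "properly url-encoded" part can be sketched as follows: each parameter value is serialized as JSON and then percent-encoded. `viewUrl` is a hypothetical helper, and the database name `logs` in the path is an assumption for illustration.

```javascript
// Hypothetical helper: JSON-encode each parameter value, then percent-encode it.
function viewUrl(base, params) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  return `${base}?${query}`;
}

// "/logs" is an assumed database name; the view path matches the design document above.
const url = viewUrl("/logs/_design/log/_view/browse", {
  startkey: ["info", "8.8.8.8"],
  endkey: ["info", "8.8.8.8", {}],
  include_docs: true,
});

console.log(url);
```

The `{}` in the end key sorts after any date string, so the range covers every entry for that tag and IP.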
