How to search and sort with CouchDB in one map function - couchdb

I'm stumbling a bit with my CouchDB knowledge.
I have a database of content that is tagged with an array of tags and has a created date.
I want to create a view that pulls a limited number of newest stories tagged with a specific tag.
For example, the newest 6 stories tagged "Business."
I ran across this question, which gets me almost to where I need to go, but I'm missing one key element: how to craft the query string to sort by one key while searching by the other.
Here's my map function.
function(doc) {
    if (doc.published == "yes" && doc.type == "news") {
        for (var i = 0; i < doc.tags.length; i++) {
            if (doc.tags[i]) {
                emit([doc.created, doc.tags[i]], doc);
            }
        }
    }
}
So how do I query that view for all documents tagged "Business", newest first based on created?
The created attribute is in a sortable date format.

First, I would switch the order of your emit:
emit([doc.tags[i], doc.created]);
(Leave out doc as well. You can just add include_docs=true to get the entire document, and your view won't take up so much disk space in the process.)
Now you can query for all the stories tagged as "Business" by using the following querystring:
startkey=["Business"]&endkey=["Business",{}]
You'll get all the documents with the tag "Business", and they'll be sorted by date.
This takes advantage of view collation, the set of rules governing how indexes are sorted and queried. For complex keys like this, the array elements are compared one at a time (i.e. rows are ordered by the first element, then by the second element among rows whose first elements are equal, and so on). This is why the order matters: you must always move from left to right when querying a view index.
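To make the left-to-right comparison concrete, here is a small simulation in plain JavaScript. It is only a simplified model: it handles strings and objects, and it compares strings by code unit rather than with the Unicode collation CouchDB actually uses. The names compareKeys, compareItems and typeRank are made up for this sketch.

```javascript
// Simplified model of array-key collation: strings sort before objects.
function typeRank(v) {
    return typeof v === "string" ? 0 : 1;
}

function compareItems(a, b) {
    var ra = typeRank(a), rb = typeRank(b);
    if (ra !== rb) return ra - rb;
    if (ra === 0) return a < b ? -1 : (a > b ? 1 : 0);
    return 0; // every object is treated as equal in this sketch
}

// Compare two array keys element by element, left to right.
function compareKeys(a, b) {
    var n = Math.min(a.length, b.length);
    for (var i = 0; i < n; i++) {
        var c = compareItems(a[i], b[i]);
        if (c !== 0) return c;
    }
    return a.length - b.length; // a shorter array sorts before its extensions
}

var rows = [
    ["Sports", "2011-03-01"],
    ["Business", "2011-02-10"],
    ["Business", "2011-01-05"],
    ["Arts", "2011-04-20"]
];
rows.sort(compareKeys);

// Keep the keys that fall between startkey=["Business"] and endkey=["Business",{}].
var business = rows.filter(function (key) {
    return compareKeys(key, ["Business"]) >= 0 &&
           compareKeys(key, ["Business", {}]) <= 0;
});
console.log(business);
```

Because {} sorts after any string in the second position, ["Business",{}] acts as an upper bound for every ["Business", date] key.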
If you want the 6 most recent, your querystring will need to change:
descending=true&limit=6&endkey=["Business"]&startkey=["Business",{}]
Note: you need to swap the startkey/endkey values because of how the descending parameter works. See the View reference page on the wiki for further explanation.
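In practice the JSON keys have to be URL-encoded before they go into the request. A minimal sketch of building the "newest N for a tag" query string, assuming the view emits [tag, created] keys as above (the function name newestByTagQuery is only for illustration):

```javascript
// Build the query string for the newest `limit` rows with the given tag.
function newestByTagQuery(tag, limit) {
    var params = {
        descending: true,
        limit: limit,
        // With descending=true the index is walked from high to low,
        // so startkey is the upper bound and endkey the lower bound.
        startkey: JSON.stringify([tag, {}]),
        endkey: JSON.stringify([tag])
    };
    return Object.keys(params).map(function (name) {
        return name + "=" + encodeURIComponent(params[name]);
    }).join("&");
}

console.log(newestByTagQuery("Business", 6));
```

The result is appended after the ? of the view URL.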

OK, I think I figured this out, but I'm not quite certain I fully understand it.
I found this story about complex keys and searching and sorting.
My map function looks like this:
function(doc) {
    if (doc.published == "yes" && doc.type == "news") {
        for (var i = 0; i < doc.tags.length; i++) {
            if (doc.tags[i]) {
                emit([doc.tags[i], doc.created], doc);
            }
        }
    }
}
And to query and sort using it, the query looks like this.
http://localhost:5984/database/_design/story/_view/tagged?limit=10&startkey=["Business"]&endkey=["Business",{}]&descending=false
I'm getting the results I want, but I'm not entirely certain I understand it all.

Related

Cloudant Custom Sort

I have my data as follows
{
    "key":"adasd",
    "col1":23,
    "col2":3
}
I want to see the results sorted in descending order of the ratio of col1/sum(col2)
where sum(col2) refers to the sum of all values of col2. I am a bit new to Cloudant so I don't know what the best way to approach this is. I can think of a few options.
Create a new column for sum(col2) and keep updating it with each new value of col2
For each record, also create a new column col1/sum(col2). Then I can sort on this column.
Use Views to calculate the ratio and sum on the fly. This way I don't have to store new columns plus I don't have to perform costly calculations on each update.
I tried to create a view and the map function is easy enough
function (doc) {
    emit(doc._id, {"col1_value":doc.col1,"col2_value":doc.col2});
}
but I am confused by the reduce template
function (keys, values, rereduce) {
    if (rereduce) {
        return sum(values);
    } else {
        return values.length;
    }
}
I have no idea on how to access the values of the two columns and then aggregate here. Is this even possible? Is there any other way to achieve the result I need?
Two comments:
Ordering by X/sum(Y) is the same as ordering by X (or by -X if sum(Y) is negative). So for ordering purposes, just order by X and save yourself a bunch of hassle.
Assuming you actually want to know the value of X/sum(Y), and not just order by it, there's no one-step way to accomplish this in CouchDB. The best I can think of is to create a map/reduce view that gives you the global sum(Y). Then you can fetch that sum with a simple query, and do the math in your application, when fetching your documents.
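A rough sketch of that two-step approach, using an in-memory array in place of the database. In a real setup the total would come from a view whose map emits doc.col2 and whose reduce is the built-in _sum. The sample documents and variable names here are invented.

```javascript
// Sample documents standing in for the database contents (hypothetical values).
var docs = [
    { key: "a", col1: 23, col2: 3 },
    { key: "b", col1: 10, col2: 5 },
    { key: "c", col1: 40, col2: 2 }
];

// Step 1: the global sum(col2). In CouchDB/Cloudant this would be the single
// row returned by a map/reduce view (map: emit(null, doc.col2), reduce: _sum).
var total = docs.reduce(function (acc, doc) {
    return acc + doc.col2;
}, 0);

// Step 2: compute the ratio in the application and sort by it, descending.
var ranked = docs.map(function (doc) {
    return { key: doc.key, ratio: doc.col1 / total };
}).sort(function (a, b) {
    return b.ratio - a.ratio;
});

console.log(total, ranked);
```

Since total is the same for every document, the resulting order matches a plain descending sort on col1 whenever total is positive.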

How to get from CouchDB only certain fields of certain documents for a single request?

For example, I have thousands of documents with the same structure:
{
"key_1":"value_1",
"key_2":"value_2",
"key_3":"value_3",
...
...
}
And I need to get, let's say, key_1, key_3 and key_23 from some set of documents with known IDs. For example, I need to process only 5 documents while my DB contains several thousand. Each time I have a different set of keys and document IDs. Is it possible to get that information in one request?
You can use a list function (see: this, this, and this).
Since you know the ids, you can then query _all_docs with the list function:
POST /{db}/_design/{ddoc}/_list/{func}/_all_docs?include_docs=true&columns=["key_1","key_2","key_3"]
Accept: application/json
Content-Length: {whatever}
{
"keys": [
"docid002",
"docid005"
]
}
The list function needs to look at documents, and send the appropriate JSON for each one. Not tested:
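(function (head, req) {
    send('{"total_rows":' + head.total_rows + ',"offset":' + head.offset + ',"rows":[');
    // columns arrives as a JSON array in the query string, e.g. ["key_1","key_3"]
    var columns = JSON.parse(req.query.columns);
    var delim = '';
    var row;
    while (row = getRow()) {
        var doc = {};
        // copy only the requested fields (iterate over the names, not the array indexes)
        for (var i = 0; i < columns.length; i++) {
            doc[columns[i]] = row.doc[columns[i]];
        }
        row.doc = doc;
        send(delim + toJSON(row));
        delim = ',';
    }
    send(']}');
})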
Whether this is a good idea, I'm not sure. If your documents are big and bandwidth savings are important, it might be.
Yes, that’s possible. Your question can be broken up into two distinct problems:
Getting only a part of the document (in your example: key_1, key_3 and key_23). This can be done using a view. A view is saved into a design document. See the wiki for more info on how to create views.
Retrieving only certain documents, which are defined by their ID. When querying views, you can specify not only a single ID (or rather key) but also an array of keys, which is what you would need here. Again, see the section on querying views in the wiki for explanations and examples.
Even though you only need a subset of values from a document, you may find that the system as a whole performs better if you just ask for the entire document then select the values you need from that result.
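If you go the "fetch whole documents, then select" route, the selection step is a few lines of client-side JavaScript. This sketch assumes rows shaped like an _all_docs?include_docs=true response; pickFields and pickFromRows are hypothetical helper names.

```javascript
// Copy only the named fields from a document.
function pickFields(doc, fields) {
    var out = {};
    fields.forEach(function (name) {
        if (Object.prototype.hasOwnProperty.call(doc, name)) {
            out[name] = doc[name];
        }
    });
    return out;
}

// Reduce each returned row to its id plus the requested fields.
// Rows without a doc (for example, unknown ids) are skipped.
function pickFromRows(rows, fields) {
    return rows.filter(function (row) {
        return row.doc;
    }).map(function (row) {
        var picked = pickFields(row.doc, fields);
        picked._id = row.id;
        return picked;
    });
}

var sampleRows = [
    { id: "docid002", doc: { _id: "docid002", key_1: "a1", key_2: "a2", key_3: "a3" } },
    { id: "docid005", doc: { _id: "docid005", key_1: "b1", key_3: "b3" } },
    { key: "docid999", error: "not_found" }
];
console.log(pickFromRows(sampleRows, ["key_1", "key_3"]));
```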
To only get the specific key value pairs you need to create a view that has view entries with a multipart key consisting of the doc id and doc item name, with value of the corresponding doc item.
So your map function would look something like:
function(doc){
    // assumes doc.keysInDoc holds the number of key_N fields in the document
    for(var i = 1; i <= doc.keysInDoc; i++){
        var k = "key_"+i;
        emit([doc._id, k], doc[k]);
    }
}
You can then use multi key lookup with each key being of the form ["docid12345", "key_1"], ["docid56789", "key_23"], etc.
So a query like:
http://host:5984/db/_design/design/_view/view?keys=[["docid002","key_8"],["docid005","key_7"]]
will return
{"total_rows":84,"offset":67,"rows":[
{"id":"docid002","key":["docid002","key_8"],"value":"value d2_k8"},
{"id":"docid005","key":["docid005","key_7"],"value":"value d5_k7"}
]}
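Since each row carries one field of one document, the client usually needs to fold the rows back together. A minimal sketch of that regrouping, assuming keys of the form [docId, fieldName] as in the map function above (rowsToPartialDocs is a made-up name and the sample rows are invented):

```javascript
// Fold view rows keyed by [docId, fieldName] into one partial document per id.
function rowsToPartialDocs(rows) {
    var docs = {};
    rows.forEach(function (row) {
        var docId = row.key[0];
        var field = row.key[1];
        if (!docs[docId]) {
            docs[docId] = {};
        }
        docs[docId][field] = row.value;
    });
    return docs;
}

var viewRows = [
    { id: "docid002", key: ["docid002", "key_1"], value: "value d2_k1" },
    { id: "docid002", key: ["docid002", "key_8"], value: "value d2_k8" },
    { id: "docid005", key: ["docid005", "key_7"], value: "value d5_k7" }
];
console.log(rowsToPartialDocs(viewRows));
```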

Couchdb - date range + multiple query parameters

I want to be able to query CouchDB between dates. I know that this can be done with startkey and endkey (it works fine), but is it possible to do a query like this, for example:
SELECT *
FROM TABLENAME
WHERE
DateTime >= '2011-04-12T00:00:00.000' AND
DateTime <= '2012-05-25T03:53:04.000'
AND
Status = 'Completed'
AND
Job_category = 'Installation'
Generally-speaking, establishing indexes on multiple fields grows in complexity as the number of fields increases.
My main question is: do Status and Job_category need to be queried dynamically too? If not, your view is simple:
function (doc) {
    if (doc.Status === 'Completed' && doc.Job_category === 'Installation') {
        emit(doc.DateTime); // this line may change depending on how you break up and emit the datetimes
    }
}
Views are fairly cheap (depending on the size of your database), so don't be afraid to establish several that cover different cases. I would expect something like Status to have a predefined list of available options, as opposed to Job_category, which seems like it could be more related to user input.
If you need those fields to be dynamic, you can just add them to the index as well:
function (doc) {
    emit([ doc.Status, doc.Job_category, doc.DateTime ]);
}
Then you can use an array as your start_key. For example:
start_key=["Completed", "Installation", ...]
tl;dr: Use "static" views where you have a predetermined list of values for a given field. While it is possible to query "dynamic" views with multiple fields, the complexity grows very quickly.
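For the "dynamic" view, one way to express the SQL query from the question is to fix the first two key elements and let only the date vary between the two bounds. This is a sketch under the assumption that the view emits [Status, Job_category, DateTime] and that DateTime is stored as a sortable string. The function name jobRangeQuery is hypothetical.

```javascript
// Build startkey/endkey for one status and category over a date range.
function jobRangeQuery(status, category, from, to) {
    var startkey = JSON.stringify([status, category, from]);
    var endkey = JSON.stringify([status, category, to]);
    return "startkey=" + encodeURIComponent(startkey) +
           "&endkey=" + encodeURIComponent(endkey);
}

var query = jobRangeQuery(
    "Completed",
    "Installation",
    "2011-04-12T00:00:00.000",
    "2012-05-25T03:53:04.000"
);
console.log(query);
```

This only works because the two equality fields come first in the key. A date range combined with a range on another field cannot be expressed as a single startkey/endkey pair.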

How to restrict rows in a list inside a document?

I have a document something like:
{"name":"Stock levels",
"content":[
{"sku":"328143",
"name":"Battery",
"stocklevel":"100",
"warehouse":"london"},
{"sku":"328143",
"name":"Battery",
"stocklevel":"20",
"warehouse":"manchester"},
{"sku":"328143",
"name":"Battery",
"stocklevel":"30",
"warehouse":"brighton"}]}
Where the list "content" could have quite a lot of rows.
What I want to do is return an internal row count and just one row from the list.
e.g.
{"name":"Stock levels",
"rows" : "2300",
"content":[
{"sku":"328143",
"name":"Battery",
"stocklevel":"100",
"warehouse":"london"}]}
How might I achieve this in CouchDB? My initial thought is to use a list function to effectively rebuild the document, inserting the extra rows field and restricting the number of rows returned internally, but I am not sure if this is the best approach.
Thanks
You can use a view.
The following example allows you to search based on document id
(which is emitted as the key):
function(doc)
{
    if (doc._id == "xxx")
    {
        emit(doc._id, {name:doc.name, rows:doc.content.length, content:doc.content[0]});
    }
}
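A variation on that idea, sketched so it can be run outside CouchDB: it emits a summary for every document that has a content array (so you can select one with ?key="docid" instead of hard-coding the id) and keeps content as a one-element list, as in the desired output. Inside CouchDB emit is a global; here it is passed in as a parameter only so the function can be exercised locally. The id "stock-1" is invented.

```javascript
// Build the trimmed-down value: name, total row count, first content row only.
function summarize(doc) {
    return {
        name: doc.name,
        rows: doc.content.length,
        content: doc.content.slice(0, 1)
    };
}

// Map function body; `emit` is injected here for local testing.
function map(doc, emit) {
    if (Array.isArray(doc.content)) {
        emit(doc._id, summarize(doc));
    }
}

var stockDoc = {
    _id: "stock-1",
    name: "Stock levels",
    content: [
        { sku: "328143", name: "Battery", stocklevel: "100", warehouse: "london" },
        { sku: "328143", name: "Battery", stocklevel: "20", warehouse: "manchester" },
        { sku: "328143", name: "Battery", stocklevel: "30", warehouse: "brighton" }
    ]
};

var emitted = [];
map(stockDoc, function (key, value) {
    emitted.push({ key: key, value: value });
});
console.log(JSON.stringify(emitted));
```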

Couchdb: filter and group in a single view

I have a Couchdb database with documents of the form: { Name, Timestamp, Value }
I have a view that shows a summary grouped by name with the sum of the values. This is a straightforward reduce function.
Now I want to filter the view to only take into account documents where the timestamp occurred in a given range.
AFAIK this means I have to include the timestamp in the emitted key of the map function, e.g. emit([doc.Timestamp, doc.Name], doc)
But as soon as I do that, the reduce function no longer sees the rows grouped together to calculate the sum. If I put the name first I can group at level 1 only, but how do I filter at level 2?
Is there a way to do this?
I don't think this is possible with only one HTTP fetch and/or without additional logic in your own code.
If you emit([time, name]) you would be able to query startkey=[timeA]&endkey=[timeB]&group_level=2 to get items between timeA and timeB grouped where their timestamp and name were identical. You could then post-process this to add up whenever the names matched, but the initial result set might be larger than you want to handle.
An alternative would be to emit([name,time]). Then you could first query with group_level=1 to get a list of names [if your application doesn't already know what they'll be]. Then for each one of those you would query startkey=[nameN,timeA]&endkey=[nameN,timeB]&group_level=1 to get the summary for that name within the time range.
(Note that in my query examples I've left the JSON start/end keys unencoded, so as to make them more human readable, but you'll need to apply your language's equivalent of JavaScript's encodeURIComponent on them in actual use.)
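The post-processing step for the first approach is small. A sketch, assuming the rows come from a group_level=2 query on a view that emits [time, name] keys with a numeric value and a summing reduce (sumByName and the sample rows are invented for illustration):

```javascript
// Add up the reduced values of all rows that share the same name.
function sumByName(rows) {
    var totals = {};
    rows.forEach(function (row) {
        var name = row.key[1]; // key is [time, name]
        totals[name] = (totals[name] || 0) + row.value;
    });
    return totals;
}

var groupedRows = [
    { key: ["2012-01-01T10:00:00", "alice"], value: 5 },
    { key: ["2012-01-01T11:00:00", "bob"], value: 2 },
    { key: ["2012-01-02T09:30:00", "alice"], value: 7 }
];
console.log(sumByName(groupedRows));
```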
You cannot build a view on top of another view. You need to write another map-reduce view that does the filtering in the map step and the grouping in the reduce step. Something like:
map:
function(doc) {
    // start and end are placeholders for the bounds of the time range
    if (doc.Timestamp > start && doc.Timestamp < end) {
        emit(doc.Name, doc.Value);
    }
}
reduce:
function(key, values, rereduce) {
    return sum(values);
}
I suppose you can not store this view, and have to put it as an ad-hoc query in your application.

Resources