Pagination in CouchDB using variable keys - couchdb

There's a bunch of questions on here related to pagination using CouchDB, but none that quite fit what I'm wondering about.
Basically, I have a result set ranked by number of votes, and I want to page through the set in descending order.
Here's the map for reference.
function(doc) {
  emit(doc.votes);
}
Now, the problem. I found out that startkey_docid doesn't work on its own. You have to use it in combination with startkey. The thing is, for the query, I don't use a startkey parameter (I'm not looking to restrict the results, just get the most->least). I was thinking I could just use startkey={{doc.votes}}&startkey_docid={{doc._id}} instead, but the number of votes for a document could have changed by the time someone clicks the "Next Page" link.
The way to solve this seemed obvious: just set startkey=99999999 so that it will return all documents in the database and I can just use startkey_docid to start at the one where we left off last time. Oddly, when I do that, the startkey_docid stopped working and just allowed all results to be returned again. Apparently startkey needs to exactly equal the key on the document whose _id is used in startkey_docid.
What I'm asking is whether anyone knows a workaround for using startkey_docid to page when the actual startkey could have changed by the time you want to use it. Should my application just look up the document by _id and immediately use the doc.votes value, hoping it hasn't changed in the few milliseconds between requests? Even that doesn't seem very reliable.
EDIT: Ended up switching to Mongo for the speed, so this question turned out to be kinda moot.

I have never done something like this, but I think I have some idea how to do it. What you can do is take a snapshot of the ratings and refer to it on every page. You probably don't want your view to consume too much space, so you should not map separate copies of documents whose votes have not changed since the snapshot was taken. So, you can do the following:
1. Add some history of ratings with timestamps to your document.
2. Map the ratings AND the history (see Ad.2).
3. In your app, get the current time: start_time = Date.now() and query all pages.
4. Clean up the history older than the oldest active session.
The problem is that if you emit [votes, date] and try to paginate, you will never know how many documents you have to fetch to get the desired number per page. There can always be some older version which you will have to skip, and then you will have to make another request to the DB. That's why you can consider emitting [date, votes], always reading the view twice (once for start_time and once for the current time), and merging and sorting the results (like in merge sort).
Ad.1:
{ ...,
votes: 12,
history: [
{date: 1357390271342, votes: 10},
{date: 1357390294682, votes: 11}
]
}
Ad.2:
function (doc) {
  emit([{}, doc.votes], null);
  doc.history && doc.history.forEach(function(h) {
    emit([h.date, h.votes], null);
  });
}
Ad.3:
?startkey=[start_time, votes]&limit=items_per_page_plus1
?startkey=[{}, votes]&limit=items_per_page_plus1
Merge the lists and sort by votes in your app (or in a list function).
If you have problems with using startkey_docid, then you can emit [date, votes, id] and query with the ID explicitly. Even when this particular doc changes its votes, it will still be available in the history.
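The "merge lists" step could look roughly like this. It is only a sketch under assumptions: each of the two view reads has already been reduced to an array of objects with a `votes` field, and each array is already sorted by votes in descending order. The function name `mergeByVotesDesc` is made up for illustration.

```javascript
// Merges two lists that are each sorted by votes (descending) into one list
// sorted the same way, like the merge step of merge sort.
function mergeByVotesDesc(a, b) {
  var out = [];
  var i = 0;
  var j = 0;
  while (i < a.length && j < b.length) {
    // Take whichever head element has more votes; ties go to the first list.
    if (a[i].votes >= b[j].votes) {
      out.push(a[i++]);
    } else {
      out.push(b[j++]);
    }
  }
  // One list is exhausted; append whatever remains of the other.
  while (i < a.length) out.push(a[i++]);
  while (j < b.length) out.push(b[j++]);
  return out;
}
```

If the two reads are not individually sorted by votes, a plain sort of the concatenated list would be needed instead.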
Ad.4:
If you emit [date, votes] then you can just get the outdated history with: ?startkey=[0]&endkey=[oldest_active_session_time]&inclusive_end=false and update the documents with an update handler:
function(doc, req) {
  if (!doc || !doc.history) return [null, 'Error'];
  var history = [];
  var oldest = +(req.query.date);
  // Keep only the history entries that an active session may still need.
  doc.history.forEach(function(h) {
    if (h.date >= oldest)
      history.push(h);
  });
  doc.history = history;
  return [doc, 'OK'];
}
Note: I have not tested it, so it is expected not to run without modifications :)
As far as I know, CouchDB uses b-tree shadowing to make updates, and in principle it should be possible to access older revisions of the view. I am not familiar with the CouchDB design, so this is just a guess, and there does not seem to be any (documented) API for this.

I can't think of any simple solution right now, but there are options:
Replicate your sorting list, not too often, to a small dedicated DB, so that it is much more stale than stale=ok.
Modify your schema so that you are able to sort by some more stable data. Look at the banking/ledger example in the CouchDB guide: http://guide.couchdb.org/draft/recipes.html#banking. Try to log every vote and reduce them hourly, for example. As a bonus you'll get a history/trends :)

I'm kind of surprised this question has been left unanswered because the functionality of CouchDB Futon basically does this when you are paginating through the results of a map function. I opened up firebug to see what was happening in the javascript console as I paginated and saw that for every set of paginated results it is passing the startkey along with startkey_docid. So although the question is how do I paginate without including startkey, CouchDB specifies that the startkey is required and demonstrates how it can work. The endkey is not specified, so if there is only one result for the specified startkey, the next set of paginated results will also contain the next key of the sorted results that do not match the startkey.
So to clarify a bit, the answer to this problem is that as you are paginating and keeping track of the startkey_docid, you also need to capture the startkey of the same document that will be the start of the next set of results. When you request the paginated results, use both the captured startkey and startkey_docid, as CouchDB requires. Leave endkey off so that the results continue on to the next key of the sorted results.
The use case for wanting to paginate without specifying a key is kind of odd. Let's say that the start docid of the next paginated result did change its key value drastically, from a 9 to a 3. And we are also assuming that there is only one instance of the docid in the map results, even though it could potentially appear multiple times (which I believe is why the startkey needs to be specified). As the user clicks the next button, the user's paginated results would move from looking at rank 9 to rank 3. But if you include the startkey in addition to the startkey_docid, the paginated results would just start over at the beginning of the rank 9 results, which is a more logical progression than potentially jumping over a large set of results.
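As a rough illustration of capturing both values, here is a sketch that builds the query string for the next page. It assumes the current page was fetched with limit = page size + 1, so the extra row is the first row of the next page; the function name `nextPageQuery` is made up, and only the view parameters startkey, startkey_docid, limit and descending are taken from CouchDB itself.

```javascript
// Builds the query string for the next page from the rows of the current
// request. Assumes the request used limit = pageSize + 1, so rows[pageSize]
// (if present) is the first row of the next page.
function nextPageQuery(rows, pageSize) {
  if (rows.length <= pageSize) {
    return null; // no extra row came back, so there is no next page
  }
  var first = rows[pageSize];
  return "descending=true" +
    "&limit=" + (pageSize + 1) +
    // View keys are JSON values, so the key is JSON-encoded before escaping.
    "&startkey=" + encodeURIComponent(JSON.stringify(first.key)) +
    // The doc id is passed as a plain string.
    "&startkey_docid=" + encodeURIComponent(first.id);
}
```

The endkey is deliberately left out, so the next request keeps running into lower keys once the rows for the captured key are used up.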

Related

CouchDB filter function and continuous feed

I have a filter function that filters on a document property, e.g. "version: A", and it works fine until a document is updated at some point so that the property "version: A" is removed (or changed to "version: B").
At this point I would like to be notified that the document has been updated, similar to when a document gets deleted, but I couldn't find an effective way (without listening to and processing all document changes).
Hopefully I'm just missing something and it's not a design limitation.
While my other answer is a valid approach, I had this same situation yesterday and decided to look at making this work using Mango selectors. I did the following:
1. Establish a changes feed filtered by the query selector (see the "_selector" filter for /db/_changes)
2. Perform the query (db/_find) and record the results
3. Establish a second changes feed that filters for just the documents returned in (2) (see the "_doc_ids" filter for /db/_changes)
The feed at (1) lets you know when new documents match your query along with edits to documents that matched your query both before and after the change.
The feed at (3) lets you know when a change is made to a document that previously matched your query, irrespective of whether it matches your query after the change has been made.
The combination of these feeds covers all cases, though with some false positives. On a change in either feed, tear down the changes feed at (3) and redo steps (2) and (3).
Now, some notes on this approach:
This is really only suitable in cases where the number of documents returned by the query is small, because of the filtering by _id in the second feed.
Care must be taken to ensure that the second feed is established correctly if there are lots of changes coming in from the first changes feed.
There are cases where a change will appear in both feeds. It would be good to avoid reacting twice.
If changes are expected to happen frequently, then employ debouncing or rate limiting if your client does not need to process each and every change notification.
This approach worked well for me and the cases I had to deal with.
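To avoid reacting twice when the same change shows up in both feeds, something along these lines might work. It is a sketch, not what I actually ran: it assumes each change object has the usual `id` and `changes: [{rev: ...}]` shape of a _changes row, and `createChangeDeduper` is a made-up helper name.

```javascript
// Returns a function that reports whether a change has been seen before,
// identified by document id plus the first listed revision.
function createChangeDeduper() {
  var seen = new Set();
  return function isNew(change) {
    var rev = change.changes && change.changes[0] ? change.changes[0].rev : "";
    var key = change.id + "@" + rev;
    if (seen.has(key)) {
      return false; // already delivered by the other feed
    }
    seen.add(key);
    return true;
  };
}
```

Call the returned function from the handlers of both feeds and only react when it returns true. The set grows without bound here, so a long-running client would want to prune it, for example whenever the second feed is torn down and rebuilt.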
References:
http://docs.couchdb.org/en/stable/api/database/find.html
http://docs.couchdb.org/en/stable/api/database/changes.html
The behaviour that you described is correct.
CouchDB will populate the changes feed with the docs that pass the filter function. If you remove or modify the information that the filter function uses, the filtered changes feed will ignore those updates.
The closest you will come to this is to use a view and filter the changes feed based on that view - see [1] for details.
You can create a simple view that includes the "version" as part of the key using a map function such as:
function (doc) {
  emit(doc.version, 1);
}
A changes feed filtered by this view will notify you of the insert or deletion of documents that have a "version" field as well as changes to the "version" field of existing documents. You can not, however, determine the previous value of the "version" field from the changes feed.
Depending on your requirements, you can make the view more targeted. For example, if you only cared about the transition from "A" to "B", then you could include only documents that have "A" or "B" as their "version":
function (doc) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}
But be aware that this will not trigger a change notification on a transition from, say, "A" to "C" (or any other value for "version", including when the document is deleted), because change notifications are only sent when the map function emit()s at least one value for a document. It does not notify you when the map function used to emit at least one value for a given document but no longer does!
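To make that caveat concrete, here is a small sketch that runs the targeted map function outside CouchDB and reports whether a given document revision emits anything. In a real design document emit is a global; passing it in as a parameter, and the helper name `wouldNotify`, are my own assumptions for the sake of a runnable example.

```javascript
// Same logic as the targeted map function, with emit passed in explicitly.
function mapVersion(doc, emit) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}

// True when the map function emits at least one row for this revision,
// which is the condition described above for a change to be reported.
function wouldNotify(doc) {
  var emitted = 0;
  mapVersion(doc, function () {
    emitted += 1;
  });
  return emitted > 0;
}
```

With this, a revision with version "A" or "B" returns true, while the same document after being changed to version "C" returns false, which is exactly the update that goes unnoticed.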
You can also filter the changes feed using Mango selectors, so if Mango queries work for you then perhaps this is simpler than using a view, but I'm not sure that you can be notified of deletions via Mango selectors...
EDIT:
My claim about the simple map function above is not quite right, as it will notify you of all document insertions and deletions, not just the ones with a "version" field. You can do this to avoid some of those false positive notifications:
function (doc) {
  if (doc.hasOwnProperty('version') || doc.hasOwnProperty('_deleted')) {
    emit(doc.version, 1);
  }
}
That will give notifications for new documents with a "version" field, or an update that adds a "version" field to an existing document, but it will still notify of all deletions.
[1] http://docs.couchdb.org/en/stable/api/database/changes.html#changes-filter-view

Alfresco webscript (js) and pagination

I have a question about the good way to use pagination with Alfresco.
I know the documentation (https://wiki.alfresco.com/wiki/4.0_JavaScript_API#Search_API)
and I use with success the query part.
I mean by that that I use the parameters maxItems and skipCount and they work the way I want.
This is an example of a query that I am doing :
var paging =
{
  maxItems: 100,
  skipCount: 0
};
var def =
{
  query: "cm:name:test*",
  page: paging
};
var results = search.query(def);
The problem is that, although I get the number of results I want (100 for example), I don't know how to get the maxResults of my query (by that I mean the total number of results that Alfresco can give me for this query).
And I need this to:
know if there are more results
know how many pages of results remain
I'm using a workaround for the first need: I query for (maxItems+1) and show only maxItems. If I get maxItems+1 results, I know that there are more. But this doesn't give me the total number of results.
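For reference, the workaround boils down to something like the following sketch. It assumes the query was run with maxItems + 1 and that the results behave like a normal JavaScript array; `splitPage` is just a name I made up.

```javascript
// Takes the results of a query that asked for maxItems + 1 entries and
// returns the entries to display plus a flag telling whether more exist.
function splitPage(results, maxItems) {
  return {
    items: results.slice(0, maxItems),
    hasMore: results.length > maxItems
  };
}
```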
Do you have any ideas?
With the JavaScript search object you can't know if there are more items. This JavaScript object is backed by the class org.alfresco.repo.jscript.Search.java. As you can see, the query method only returns the query results without any extra information. Compare it with org.alfresco.repo.links.LinkServiceImpl, which gives you results wrapped in PagingResults.
So, as the JavaScript search object doesn't provide hasMoreItems info, you need some workaround, for instance first querying without limits to know the total, and then applying pagination as desired.
You can find out how many objects have been found by your query simply by calling
results.length
bearing in mind that queries usually have a configured maximum result set of 1000 entries to save resources.
You can change this value by editing the <alfresco>/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/repository.properties file.
So, as an alternative to your solution, you can launch a query with no constraints and obtain either the real value or the configured maximum number of results.
Then you can use this value to work out how many pages are available, basing your calculation on the number of results per page.
Then dynamically pass the number of the current page to the builder of your query def, and the results variable will contain the corresponding chunk of data.
In this SO post you can find more information about pagination.
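The page arithmetic described above could be sketched like this. The `maxItems` and `skipCount` names come from the question's paging object; the function name `pagingFor` and the 1-based page number are my own assumptions.

```javascript
// Given the total number of results, the page size and a 1-based page
// number, returns the page count and the paging object for that page.
function pagingFor(total, pageSize, pageNumber) {
  return {
    pageCount: Math.ceil(total / pageSize),
    page: {
      maxItems: pageSize,
      skipCount: (pageNumber - 1) * pageSize
    }
  };
}
```

The returned `page` object is meant to be dropped into the query definition in place of the hard-coded paging values from the question.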

How to get Post with Comments Count in single query with CouchDB?

I can use map-reduce to build standalone view [{key: post_id, value: comments_count}] but then I had to hit DB twice - one query to get the post, another to get comments_count.
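For context, the standalone view I have in mind would be roughly the following sketch. The field names `type` and `post_id` are assumptions about my comment documents; in CouchDB the counting would be done by the built-in _count reduce queried with group=true, and the helper below only imitates that so the example runs outside the database.

```javascript
// Map function for the counting view. In a design document emit is a
// global; it is passed in here so the sketch runs on its own.
function mapComments(doc, emit) {
  if (doc.type === "comment") {
    emit(doc.post_id, null);
  }
}

// Stand-in for the built-in _count reduce queried with group=true:
// counts the emitted rows per key.
function countByKey(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    mapComments(doc, function (key) {
      counts[key] = (counts[key] || 0) + 1;
    });
  });
  return counts;
}
```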
There's also another way (Rails does this): count comments manually on the application server and save the result in a comment_count attribute of the post. But then we need to update the whole post document every time a comment is added or deleted.
It seems to me that CouchDB is not tuned for this approach: unlike an RDBMS, where we can update only the comment_count attribute, in CouchDB we are forced to update the whole post document.
Maybe there's another way to do it?
Thanks.
The view's returned JSON includes the document count as 'total_rows', so you don't need to compute anything yourself; just emit all the documents you want counted.
{"total_rows":3,"offset":0,"rows":[
  {"id":...,"key":...,"value":doc1},
  {"id":...,"key":...,"value":doc2},
  {"id":...,"key":...,"value":doc3}]
}

CouchDB views - Multiple join... Can it be done?

I have three document types MainCategory, Category, SubCategory... each have a parentid which relates to the id of their parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which QueryString parameters would I use to get say the rows which have a key that starts with ["22945".... When all I have (at query time) is the id "11810" (at query time I don't have knowledge of the id "22945").
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
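A rough sketch of what Array of Ancestors could look like for these categories follows. The `ancestors` and `type` fields are assumptions about how the documents would be restructured, the ids are borrowed from the question's output, and `queryView` only imitates a key lookup so the example runs outside CouchDB.

```javascript
// Each document stores the full path of ancestor ids (hypothetical schema).
var categories = [
  { _id: "11810", type: "MainCategory", ancestors: [] },
  { _id: "22945", type: "Category", ancestors: ["11810"] },
  { _id: "33123", type: "SubCategory", ancestors: ["11810", "22945"] },
  { _id: "33453", type: "SubCategory", ancestors: ["11810", "22945"] }
];

// Map function: one row per ancestor, keyed by [ancestorId, type].
// In a design document emit is a global; it is passed in here.
function mapByAncestor(doc, emit) {
  (doc.ancestors || []).forEach(function (ancestorId) {
    emit([ancestorId, doc.type], null);
  });
}

// Imitates querying the view with ?key=[...] and returns matching doc ids.
function queryView(docs, key) {
  var wanted = JSON.stringify(key);
  var ids = [];
  docs.forEach(function (doc) {
    mapByAncestor(doc, function (k) {
      if (JSON.stringify(k) === wanted) ids.push(doc._id);
    });
  });
  return ids;
}
```

With the view keyed this way, knowing only the MainCategory id is enough: a lookup on ["11810", "SubCategory"] returns the subcategories without needing the intermediate Category id.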

CouchDB pagination sorted by date, queried by id

I want to create pagination on application level using the CouchDB view API. The pagination uses cursors, so given a cursor, I will query the view for the n+1 documents starting with the given cursor as start key and output the n results as page and provide the n+1 result row as the cursor for the next page.
This works well as long as the view keys are also the keys for my view rows. Now this time all my docs have a date field and I emit them as map keys, because I want to sort via date. However, I can't use my cursors anymore like before.
I thought that was the reason the view API also provides startkey_docid for submitting such a cursor doc id, but this is obviously not the case. It seems that this value is only applied if there are several rows with equal keys.
So, in short: I want a date-ordered view, but cursors based on the document ids. How can I do this?
Thanks in advance
Simplified view
function map(doc)
{
  emit(doc.date, {_id: doc._id});
}
Simplified view result:
{
  "rows":[
    {"id":"123","key":"2010-06-26T01:28:13.555Z", "value":{...}},
    {"id":"234","key":"2010-06-22T12:21:23.123Z", "value":{...}},
    {"id":"987","key":"2010-06-16T13:48:43.321Z", "value":{...}},
    {"id":"103","key":"2010-05-01T17:38:31.123Z", "value":{...}},
    {"id":"645","key":"2009-07-21T21:21:13.345Z", "value":{...}}
  ]
}
Application-level query with cursor 234, page size 3 should return:
234, 987, 103
So how can I map this to a view?
Why do you want cursors based on docid?
Map Reduce creates single dimensional indexes, so any non-key traversal will be expensive. However, I think you can do what you want without requiring traversing 2 indexes at the same time.
See for instance here how I paginate through posts with a certain tag:
Sofa's CouchApp tag pagination
aka
http://jchris.couchone.com/sofa/_design/sofa/_list/index/tags?descending=true&reduce=false&limit=10&startkey=[%22life%22%2C{}]&endkey=[%22life%22]
The key in that view looks like ["tag","2008/10/25 04:49:10 +0000"] so you can paginate through by tag and, within tags, by time.
Edited
Ha! I just realized what you are trying to do. It is so very simple.
Forget all about docids, they should be random anyway and not related to anything so just forget docs even have ids for a second.
You say "Application-level query with cursor 234, page size 3 should return:
234, 987, 103"
Your cursor should not be 234. It should be the key "2010-06-22T12:21:23.123Z".
So in essence you use the key of the last row of results as the startkey for the next query. So e.g. startkey="2010-06-22T12:21:23.123Z"&limit=3, then for each page you render, link to a query where the new startkey is the last returned key.
Bonus: with what I've just described, you will have the bottom row of page 2 be the top row of page 3. To fix this, add skip=1 to your query.
Bonus bonus: OK, what about when I have more than 3 docs that emitted the same key in the view? Then the last key will always be the same as the first key, so you can't progress in pagination without expanding the limit parameter. Unless... you use startkey_docid (and set it to the id of the last row). That is the only time you should use startkey_docid.
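Here is a small in-memory sketch of that cursor logic, using the rows from the question. It does not talk to CouchDB: `readPage` is a made-up function that imitates a descending read with startkey, startkey_docid and limit = page size + 1, and it assumes the rows are already in view order (key descending, then id descending).

```javascript
// Rows from the question, in descending key order.
var exampleRows = [
  { id: "123", key: "2010-06-26T01:28:13.555Z" },
  { id: "234", key: "2010-06-22T12:21:23.123Z" },
  { id: "987", key: "2010-06-16T13:48:43.321Z" },
  { id: "103", key: "2010-05-01T17:38:31.123Z" },
  { id: "645", key: "2009-07-21T21:21:13.345Z" }
];

// cursor is { key, id } of the first row wanted, or null for the first page.
function readPage(rows, cursor, pageSize) {
  var start = 0;
  if (cursor) {
    // First row at or after the cursor in descending order.
    start = rows.findIndex(function (row) {
      if (row.key !== cursor.key) return row.key < cursor.key;
      return row.id <= cursor.id;
    });
    if (start === -1) start = rows.length;
  }
  // Fetch one extra row: it is not displayed, it becomes the next cursor.
  var fetched = rows.slice(start, start + pageSize + 1);
  var extra = fetched.length > pageSize ? fetched[pageSize] : null;
  return {
    page: fetched.slice(0, pageSize),
    next: extra ? { key: extra.key, id: extra.id } : null
  };
}
```

Because the cursor carries the key and the id together, the same code keeps working when several rows share one key, which is the case where startkey_docid matters.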

Resources