CouchDB filter function and continuous feed - couchdb

I have a filter function that filters on a document property, e.g. "version: A". It works fine until a document is updated at some point and the property "version: A" is removed (or changed to "version: B").
At that point I would like to be notified that the document has been updated, similar to the notification sent when a document is deleted, but I couldn't find an effective way to do this (without listening to and processing all document changes).
I hope I'm just missing something and that it's not a design limitation.

While my other answer is a valid approach, I had this same situation yesterday and decided to look at making this work using Mango selectors. I did the following:
1. Establish a changes feed filtered by the query selector (see the "_selector" filter for /db/_changes).
2. Perform the query (/db/_find) and record the results.
3. Establish a second changes feed that filters for just the documents returned in (2) (see the "_doc_ids" filter for /db/_changes).
The feed at (1) lets you know when new documents match your query, along with edits to documents that matched your query both before and after the change.
The feed at (3) lets you know when a change is made to a document that previously matched your query, irrespective of whether it still matches your query after the change has been made.
The combination of these feeds covers all cases, though with some false positives. On a change in either feed, tear down the changes feed at (3) and redo steps (2) and (3).
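As a rough sketch of the three requests involved, the helpers below only build request descriptors and leave sending them to whatever HTTP client you use. The function names, the dbUrl value and the "since" handling are my own assumptions for illustration; the "_selector" and "_doc_ids" filters and the /db/_find endpoint are the ones referenced above. Note that _find returns a limited number of rows by default, so a real implementation may need to set a limit or page through the results.

```javascript
// Hypothetical helpers: they build request descriptors, they do not send anything.
// dbUrl, selector and since are supplied by the caller.

// Feed 1: changes for documents that currently match the Mango selector.
function selectorFeedRequest(dbUrl, selector, since) {
  return {
    method: "POST",
    url: dbUrl + "/_changes?feed=continuous&filter=_selector&since=" + encodeURIComponent(since),
    body: JSON.stringify({ selector: selector })
  };
}

// The query itself, asking only for the _id of each matching document.
function findRequest(dbUrl, selector) {
  return {
    method: "POST",
    url: dbUrl + "/_find",
    body: JSON.stringify({ selector: selector, fields: ["_id"] })
  };
}

// Feed 2: changes for the specific documents returned by the query.
function docIdsFeedRequest(dbUrl, docIds, since) {
  return {
    method: "POST",
    url: dbUrl + "/_changes?feed=continuous&filter=_doc_ids&since=" + encodeURIComponent(since),
    body: JSON.stringify({ doc_ids: docIds })
  };
}
```

On a change from either feed you would rebuild the second request from a fresh query result, which is the "tear down and redo" step described above.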
Now, some notes on this approach:
This is really only suitable in cases where the number of documents returned by the query is small, because of the filtering by _id in the second feed.
Care must be taken to ensure that the second feed is established correctly if there are lots of changes coming in from the first changes feed.
There are cases where a change will appear in both feeds. It would be good to avoid reacting twice.
If changes are expected to happen frequently, then employ debouncing or rate limiting if your client does not need to process each and every change notification.
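To avoid reacting twice when the same change shows up in both feeds, one option is to remember the last revision seen per document. This is only a sketch: createChangeDeduper is a made-up name, and it assumes each change row has the usual { id, changes: [{ rev }] } shape of a _changes row.

```javascript
// Returns a function that reports whether a change row has not been seen yet.
// State is kept per document id, so memory grows with the number of distinct ids.
function createChangeDeduper() {
  const lastRevById = new Map();
  return function isNew(change) {
    // Assumed row shape: { id: "...", changes: [{ rev: "..." }] }
    const rev = change.changes && change.changes[0] ? change.changes[0].rev : undefined;
    if (rev !== undefined && lastRevById.get(change.id) === rev) {
      return false; // same document at the same revision: already handled
    }
    lastRevById.set(change.id, rev);
    return true;
  };
}
```

Both feeds would pass their rows through the same isNew function before triggering any work.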
This approach worked well for me and the cases I had to deal with.
References:
http://docs.couchdb.org/en/stable/api/database/find.html
http://docs.couchdb.org/en/stable/api/database/changes.html

The behaviour that you described is correct.
CouchDB will populate the changes feed with the docs that satisfy the filter function. If you remove or modify the information that the filter function relies on, the filtered changes feed will ignore those updates.

The closest you will come to this is to use a view and filter the changes feed based on that view - see [1] for details.
You can create a simple view that includes the "version" as part of the key using a map function such as:
function (doc) {
  emit(doc.version, 1);
}
A changes feed filtered by this view will notify you of the insert or deletion of documents that have a "version" field as well as changes to the "version" field of existing documents. You can not, however, determine the previous value of the "version" field from the changes feed.
Depending on your requirements, you can make the view more targeted. For example, if you only cared about the transition from "A" to "B", then you could include only documents that have "A" or "B" as their "version":
function (doc) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}
But be aware that this will not trigger a change notification on a transition from, say, "A" to "C" (or any other value for "version", including when the document is deleted), because change notifications are only sent when the map function emit()s at least one value for a document. It does not notify you when the map function used to emit at least one value for a given document but no longer does!
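To see which document states would and would not produce a notification, you can run the targeted map function locally against a stub emit. This is just an illustration outside CouchDB: inside a design document emit is a global, whereas here it is passed in as a parameter, and emitsAnything is a helper name I made up.

```javascript
// Same logic as the targeted map function, with emit passed in for local testing.
function targetedMap(doc, emit) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}

// True if the map function emits at least one row for this document,
// i.e. if a view-filtered changes feed would report this revision.
function emitsAnything(mapFn, doc) {
  let count = 0;
  mapFn(doc, function () {
    count += 1;
  });
  return count > 0;
}
```

With this, emitsAnything(targetedMap, { version: "B" }) is true, while a document updated to { version: "C" } or a deletion stub like { _id: "x", _deleted: true } gives false, which is exactly the silent case described above.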
You can also filter the changes feed using Mango selectors, so if Mango queries work for you then perhaps this is simpler than using a view, but I'm not sure that you can be notified of deletions via Mango selectors...
EDIT:
My claim about the simple map function above is not quite right, as it will notify you of all document insertions and deletions, not just ones with a "version" field. You can do this to avoid some of those false-positive notifications:
function (doc) {
  if (doc.hasOwnProperty('version') || doc.hasOwnProperty('_deleted')) {
    emit(doc.version, 1);
  }
}
That will give notifications for new documents with a "version" field, or an update that adds a "version" field to an existing document, but it will still notify of all deletions.
[1] http://docs.couchdb.org/en/stable/api/database/changes.html#changes-filter-view


CosmosDB SQL API: Delete element matching criteria via partial update

I'm trying to remove a particular element matching criteria, in a nested array, when I already know the document id. For example, imagine a document like this (taken from https://devblogs.microsoft.com/cosmosdb/understanding-how-to-query-arrays-in-azure-cosmos-db/)
{
  "id": "Tim",
  "city": "Seattle",
  "gifts": [
    {
      "recipient": "Andrew",
      "gift": "blanket"
    },
    {
      "recipient": "Deborah",
      "gift": "board game"
    },
    {
      "recipient": "Chris",
      "gift": "coffee maker"
    }
  ]
}
I want to remove any element from the "gifts" array that has a recipient of "Andrew". A naive approach is to pull the document, remove the array element, and write the new document back. But that concerns me because it exposes a race condition: between pulling and updating the document, a different mutation could be made to that array, and it would be lost.
I'd like to see if I can perform a partial document update to remove the array element. However, the partial document update documentation at https://learn.microsoft.com/en-us/azure/cosmos-db/partial-document-update states (for "remove"):
If the target path is an array index, it will be deleted and any elements above the specified index are shifted one position to the left.
So it seems like the only means of removing an array element via a partial document update is by its array index. But if I pull its array index and then perform the update, I have a similar race condition: what if the array is mutated between when I pull the document and when I perform the partial doc update?
I see that for partial document updates, you can do a "Conditional Update":
Conditional Update: For the aforementioned modes, it is also possible
to add a SQL-like filter predicate (for example, from c where
c.taskNum = 3) such that the operation fails if the pre-condition
specified in the predicate is not satisfied.
I'm wondering if this is something I could use to achieve my goal.
Essentially, I think I'm looking for something like https://www.mongodb.com/docs/manual/reference/operator/update/pull/ but within the SQL API.
"the array is mutated between when I pull the document and when I perform the partial doc update"
The only way to avoid concurrent updates to the same document is by using Optimistic Concurrency. You would need to first read the current document, obtain its ETag, and then use that on the update operation by passing it to the CosmosItemRequestOptions:
requestOptions.setIfMatchETag("etag from the read");
Full example: https://github.com/Azure-Samples/azure-cosmos-java-sql-api-samples/blob/0ead4ca33dac72c223285e1db866c9dc06f5fb47/src/main/java/com/azure/cosmos/examples/documentcrud/async/DocumentCRUDQuickstartAsync.java#L366-L418
In that case, if there was any concurrent update that happened, you'd get a failure with status code 412, which means a concurrent update happened and you'd need to retry the operation (read, get the new etag, find the new index to remove, execute the update).
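The read / check ETag / retry loop can be sketched independently of any SDK. In the sketch below, store.read and store.replaceIfMatch are hypothetical functions that you would implement with your Cosmos DB client (they are not SDK calls), and replaceIfMatch is assumed to reject with an error carrying statusCode 412 when the ETag no longer matches. Note that it rewrites the whole document rather than patching by array index, which is a simplification on my part.

```javascript
// Hypothetical store interface (NOT a Cosmos DB SDK API):
//   store.read(id)                      -> Promise<{ doc, etag }>
//   store.replaceIfMatch(id, doc, etag) -> Promise<void>, rejects with
//                                          { statusCode: 412 } on ETag mismatch
async function removeGiftsFor(store, id, recipient, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    // Always start from a fresh read so the filter sees the latest array.
    const { doc, etag } = await store.read(id);
    const updated = Object.assign({}, doc, {
      gifts: doc.gifts.filter(function (g) {
        return g.recipient !== recipient;
      })
    });
    try {
      await store.replaceIfMatch(id, updated, etag);
      return updated;
    } catch (err) {
      // Retry only when someone else changed the document in between.
      if (err.statusCode !== 412) throw err;
    }
  }
  throw new Error("gave up after " + maxAttempts + " attempts");
}
```

Because each attempt recomputes the filtered array from the freshly read document, a concurrent addition to "gifts" is preserved instead of being overwritten.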

Restrict User input for PXSelector and use it only as a lookup

I have a case in my customisation project where I have a PXSelector that I want to act solely as a lookup; I would not like users to input any data via the selector and create new records.
I could not find a way to limit this from the attribute itself, therefore I tried to limit it from the events that the control fires. The idea was that in the FieldUpdating Event I would verify whether the value inserted by the user can be found in the selector's key column, if not I would revert it back to the old value. The problem was that cancelling the event had no effect on the selector and since I did not know what the previous value was, I could not revert it back manually.
It sounds like you are trying to use a filter. You need a PXFilter view which then could be used to display data in a grid for example.
You can search the source for "PXFilter" to find good examples. One I found is APVendorBalanceEnq, which uses public PXFilter<APHistoryFilter> Filter.
PXFilter views are not committed to the database. Typically you would create a new DAC for the filter based on your needs but you can use existing DACs that are bound to tables without the fear of the data making it to the database. With the filter you simply use the field values rather than load records into the view.

How can I reduce a collection by keeping only change points?

I have a collection like the example below. This data is pulled from an endpoint every twenty minutes by a cron job.
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474116005,"_id":"kX0DpoZ5fkMr2ezg"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474684808,"_id":"ken1WRN47PTW159H"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474275606,"_id":"ko9r8u860es7E2hI"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
I want to discard any document (row) that doesn't show a change in the empty value (and consequently in ready). My goal is to find the most recent timestamp at which these values changed within this collection.
Better illustrated, I want to reduce it to where the values change as such:
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
Can I do this in a MongoDB query? Or am I better off with a JavaScript filter function?
MongoDB allows you to specify a unique constraint on an index. These constraints prevent applications from inserting documents that have duplicate values for the inserted fields.
Use the following code to make the index unique:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more details.
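For the JavaScript filter option mentioned in the question, a client-side pass could look like the sketch below. It assumes the documents have already been fetched into an array and that "change" means the tracked field differs from the previous document in time order; keepChangePoints is a name I made up.

```javascript
// Sort by timestamp, then keep the first document and every document whose
// tracked field differs from the one immediately before it in time.
function keepChangePoints(docs, field) {
  const sorted = docs.slice().sort(function (a, b) {
    return a.t - b.t;
  });
  return sorted.filter(function (doc, i) {
    return i === 0 || doc[field] !== sorted[i - 1][field];
  });
}
```

Calling keepChangePoints(docs, "empty") returns the change points in ascending time order, so the last element carries the most recent timestamp at which the value changed.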

Pagination in CouchDB using variable keys

There's a bunch of questions on here related to pagination using CouchDB, but none that quite fit what I'm wondering about.
Basically, I have a result set ranked by number of votes, and I want to page through the set in descending order.
Here's the map for reference.
function(doc) {
  emit(doc.votes);
}
Now, the problem. I found out that startkey_docid doesn't work on its own. You have to use it in combination with startkey. The thing is, for the query, I don't use a startkey parameter (I'm not looking to restrict the results, just get the most->least). I was thinking I could just use startkey={{doc.votes}}&startkey_docid={{doc._id}} instead, but the number of votes for a document could have changed by the time someone clicks the "Next Page" link.
The way to solve this seemed obvious: just set startkey=99999999 so that it will return all documents in the database and I can just use startkey_docid to start at the one where we left off last time. Oddly, when I do that, the startkey_docid stopped working and just allowed all results to be returned again. Apparently startkey needs to exactly equal the key on the document whose _id is used in startkey_docid.
What I'm asking is whether anyone knows a workaround for using startkey_docid to page when the actual startkey could have changed by the time you want to use it? Should my application just lookup the document by _id and immediately use the doc.votes value hoping it hasn't changed in the few milliseconds between requests? Even that doesn't seem very reliable.
EDIT: Ended up switching to Mongo for the speed, so this question turned out to be kinda moot.
I have never done something like this, but I think I have some idea how to do it. What you can do is take a snapshot of the ratings and refer to it on every page. You probably want your view not to consume too much space, so you should not map separate copies of documents whose votes have not changed since the snapshot was taken. So, you can do the following:
1. Add some history of ratings, with timestamps, to your document.
2. Map the ratings AND the history (see Ad.2).
3. In your app, get the current time: start_time = Date.now() and query all pages.
4. Clean up the history older than the oldest active session.
The problem is that if you emit [votes, date] and try to paginate, you will never know how many documents you have to fetch to get the desired number per page. There can always be some older version that you will have to skip, and then you will have to make another request to the DB. That's why you can consider emitting [date, votes], always reading the view twice (once for start_time and once for the current time), and merging and sorting the results (as in merge sort).
Ad.1:
{ ...,
  votes: 12,
  history: [
    {date: 1357390271342, votes: 10},
    {date: 1357390294682, votes: 11}
  ]
}
Ad.2:
function (doc) {
  emit([{}, doc.votes], null);
  doc.history && doc.history.forEach(function(h) {
    emit([h.date, h.votes], null);
  });
}
Ad.3:
?startkey=[start_time, votes]&limit=items_per_page_plus1
?startkey=[{}, votes]&limit=items_per_page_plus1
Merge lists, sort by votes in your app (or in a list function).
If you have problems using startkey_docid, then you can emit [date, votes, id] and query with the ID explicitly. Even when this particular doc changes its votes, it will still be available in the history.
Ad.4:
If you emit [date, votes] then you can just get the outdated history with: ?startkey=[0]&endkey=[oldest_active_session_time]&inclusive_end=false and update the documents with an update handler:
function(doc, req) {
  if (!doc || !doc.history) return [null, 'Error'];
  var history = new Array();
  var oldest = +(req.query.date);
  // Keep only history entries at or after the oldest active session time.
  doc.history.forEach(function(h) {
    if (h.date >= oldest)
      history.push(h);
  });
  doc.history = history;
  return [doc, 'OK'];
}
Note: I have not tested it, so it is expected not to run without modifications :)
As far as I know, CouchDB uses b-tree shadowing to make updates, and in principle it should be possible to access older revisions of the view. I am not familiar with the CouchDB design, so this is just a guess, and there does not seem to be any (documented) API for this.
I can't figure out any simple solution by now, but there are options:
Replicate not-so-often your sorting list to small dedicated db so it will be much more stale than stale=ok
Modify your schema in a way that you'll be able to sort by some more stable data. Look at the banking/ledger example in CouchDb guide: http://guide.couchdb.org/draft/recipes.html#banking. Try to log every vote and reduce them hourly for example. As a bonus you'll get a history/trends :)
I'm kind of surprised this question has been left unanswered because the functionality of CouchDB Futon basically does this when you are paginating through the results of a map function. I opened up firebug to see what was happening in the javascript console as I paginated and saw that for every set of paginated results it is passing the startkey along with startkey_docid. So although the question is how do I paginate without including startkey, CouchDB specifies that the startkey is required and demonstrates how it can work. The endkey is not specified, so if there is only one result for the specified startkey, the next set of paginated results will also contain the next key of the sorted results that do not match the startkey.
So to clarify a bit, the answer to this problem is that as you are paginating and keeping track of the startkey_docid, you also need to capture the startkey of the same document that will be the start of the next set of results. When you are calling the paginated results use both the captured startkey and startkey_docid as couchdb requires. Leave endkey off so that the results will continue on to the next key of the sorted results.
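A sketch of that bookkeeping: fetch one row more than the page size, and use the extra row's key and id as the startkey and startkey_docid of the next request. The function name nextPageQuery and the rows shape ({ id, key }, as in a view response) are assumptions for illustration; descending=true matches the most-to-least ordering in the question.

```javascript
// rows: the view rows from a request made with limit = pageSize + 1.
// Returns the query string for the next page, or null on the last page.
function nextPageQuery(rows, pageSize) {
  if (rows.length <= pageSize) {
    return null; // no extra row came back, so there is no next page
  }
  const first = rows[pageSize]; // the extra row becomes the first row of the next page
  return "descending=true"
    + "&limit=" + (pageSize + 1)
    + "&startkey=" + encodeURIComponent(JSON.stringify(first.key))
    + "&startkey_docid=" + encodeURIComponent(first.id);
}
```

Only the first pageSize rows of each response are shown to the user; the extra row exists purely to carry the key and id forward.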
The use case for wanting to paginate without specifying a key is kind of odd. So let's say that the start docid of the next paginated result did change its key value drastically, from a 9 to a 3. And we are also assuming that there is only one instance of the docid in the map results, even though it could potentially appear multiple times (which I believe is why the startkey needs to be specified). As the user clicks the next button, the user's paginated results will have moved from looking at rank 9 to rank 3. But if you include the startkey in addition to the startkey_docid, the paginated results would just start over at the beginning of the rank 9 results, which is a more logical progression than potentially jumping over a large set of results.

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of the points the user has earned in a given activity. Since a user can be part of several activities or just one, it makes sense to keep these activities in an array. Now, I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
  .desc("stats.points")
  .limit(10)
  .run (err, users) ->
(if you are wondering about the syntax, it's CoffeeScript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have only one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell mongoose to only return the embedded document where "activity" == "soccer" alongside the top-level document?
Btw, I realize I can do this another way, by having stats in its own collection and having a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
Thanks!
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array with the element, you'll get back all of it and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value, use it as the key.
{ _id: userId,
  email: email,
  stats: [
    {soccer: points},
    {rugby: points},
    {dance: points}
  ]
}
Now you will be able to query and sort like so:
users.find({"stats.soccer":{$gt:0}}).sort({"stats.soccer":-1})
Note that when you move to version 2.2 (currently only available as unstable development version 2.1) you would be able to use aggregation framework to get the exact results you want (only a particular subset of an array or subdocument that matches your query) without changing your schema.
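For reference, a pipeline along these lines might look like the sketch below, using the original schema (stats as an array of { activity, points }). I have not run it against that data; hallOfFamePipeline is a made-up helper name, and the collection you pass it to is up to you.

```javascript
// Builds an aggregation pipeline for the top n users of one activity.
function hallOfFamePipeline(activity, n) {
  return [
    // Narrow down to users that have the activity at all.
    { $match: { "stats.activity": activity } },
    // One output document per element of the stats array.
    { $unwind: "$stats" },
    // Keep only the element for the requested activity.
    { $match: { "stats.activity": activity } },
    // Sort by that activity's points only, highest first.
    { $sort: { "stats.points": -1 } },
    { $limit: n },
    { $project: { email: 1, stats: 1 } }
  ];
}
```

In the shell this would be used as something like db.users.aggregate(hallOfFamePipeline("soccer", 10)), where "users" is a placeholder collection name. After $unwind, "stats" in each result is a single embedded document rather than the whole array.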
