CouchDB - Get DB's update_seq based on document

I want to get all documents inside a CouchDB database and then listen to changes on that database. I could:
1. Get the docs using the _all_docs view: /db/_all_docs
2. Get the current db update_seq: /db
3. Listen to the changes in the database: /db/_changes?since=update_seq
But what if one or more documents are created right after I query the _all_docs view and before I get the update_seq? If that happens, those documents will never show up when I listen to the changes that occurred after the update_seq.
Is there a way to know what the DB's update_seq was when a given document had a given revision? With that, I could be 100% sure I'll never miss a document.

Add update_seq=true to your _all_docs request and you'll get the update_seq of the database at that point in time, which avoids the race condition you're worried about. For example:
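A minimal sketch of that flow in Python, assuming a local CouchDB at http://localhost:5984 and a database named db (both placeholders):

```python
import requests

COUCH = "http://localhost:5984"   # placeholder server; add credentials as needed
DB = "db"                         # placeholder database name

# Fetch all documents and the update_seq in one response, so nothing can
# slip in between reading the docs and reading the sequence.
snapshot = requests.get(
    f"{COUCH}/{DB}/_all_docs",
    params={"include_docs": "true", "update_seq": "true"},
).json()

docs = [row["doc"] for row in snapshot["rows"]]
since = snapshot["update_seq"]    # the sequence this snapshot corresponds to

# Follow changes from exactly that sequence onwards.
changes = requests.get(
    f"{COUCH}/{DB}/_changes",
    params={"since": since, "include_docs": "true"},
).json()
for change in changes["results"]:
    print(change["id"], change.get("doc"))
```

In practice you would add feed=continuous (or longpoll) to the _changes request to keep listening rather than polling once.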

Related

Logic App to push data from Cosmosdb into CRM and perform an update

I have created a Logic App with the goal of pulling data from a container within Cosmos DB (with a query), looping over the results and then pushing this data into CRM (or the Common Data Service). When the data is pushed to CRM, an ID will be generated. I wish to then update Cosmos DB with this new ID. Here is what I have so far:
The next step queries the data within our Cosmos DB database and selects all IDs whose length is greater than 15. (This tells us that the ID is not yet within the CRM database.)
Then we loop over the results and push this into CRM (Dynamics 365 or the Common Data Service).
Dilemma: The first part of this process appears to be correct; however, I want to make sure that I am on the right track with this. Furthermore, once the data is successfully pushed to CRM, CRM automatically generates an ID for each record. How would I then update Cosmos DB with the newly generated IDs?
Any suggestion is appreciated
Thanks
I see a red flag in your approach here: the query with length(c.id) > 15. This is not something I would do. I don't know how big your database is going to be, but it is generally not very performant to run high volumes of cross-partition queries, especially if the database is going to keep growing.
Cosmos DB already provides an excellent streaming capability, so rather than doing this as a batch query I would use the Change Feed and accomplish whatever you're doing in your Logic App there. This will likely give you better control of the process and allow you to get the ID back out of your CRM app and insert it back into Cosmos DB.
Because you will be writing back to Cosmos DB, you will need a flag so you can ignore the resulting update when it comes back through the Change Feed. A sketch of that pattern follows.
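A minimal sketch of the write-back-with-a-flag idea, assuming the Python azure-cosmos SDK; the account URL, database/container names and the crmId / crmSynced fields are all made up for illustration, and the part that actually delivers Change Feed items (Azure Functions trigger, change feed processor, etc.) is left out:

```python
from azure.cosmos import CosmosClient

# Placeholder connection details.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("items")

def push_to_crm(item):
    """Placeholder for the call that creates the record in CRM / Common Data
    Service and returns the ID that CRM generated for it."""
    raise NotImplementedError

def handle_change(item):
    # Ignore changes that are just our own write-back echoing through the feed.
    if item.get("crmSynced"):
        return

    crm_id = push_to_crm(item)

    # Write the CRM-generated ID back to Cosmos DB and flag the item so the
    # next Change Feed delivery of this document is skipped.
    item["crmId"] = crm_id
    item["crmSynced"] = True
    container.upsert_item(item)

# handle_change(item) would be called for every document the Change Feed delivers.
```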

Listen to changes of all databases in CouchDB

I have a scenario where there are multiple (~1000 - 5000) databases being created dynamically in CouchDB, similar to the "one database per user" strategy. Whenever a user creates a document in any DB, I need to hit an existing API and update that document. This need not be synchronous. A short delay is acceptable. I have thought of two ways to solve this:
Option 1:
1. Continuously listen to the changes feed of the _global_changes database.
2. Get the name of the db that was updated from the feed.
3. Call the /{db}/_changes API with the last processed seq (stored in Redis).
4. Fetch the changed document, call my external API and update the document (see the sketch after this list).
Option 2:
1. Continuously replicate all databases into a single database.
2. Listen to the /_changes feed of this combined database.
3. Fetch the changed document, call my external API and update the document in the original database (I can easily keep track of which document originally belongs to which database).
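A minimal sketch of option 1 in Python, using the documented /_db_updates endpoint (the HTTP interface over _global_changes) and Redis for per-database checkpoints; the server address, credentials and the processed flag are placeholders:

```python
import json
import requests
import redis

COUCH = "http://admin:password@localhost:5984"   # placeholder server/credentials
r = redis.Redis()                                # placeholder Redis connection

def call_external_api(doc):
    """Placeholder for the existing API that must be hit for every document."""
    return doc

def process_db(db_name):
    # Resume from the last sequence we finished for this database.
    since = (r.get(f"seq:{db_name}") or b"0").decode()
    resp = requests.get(f"{COUCH}/{db_name}/_changes",
                        params={"since": since, "include_docs": "true"}).json()
    for row in resp["results"]:
        doc = row.get("doc")
        if row.get("deleted") or not doc or doc.get("processed"):
            continue                             # skip deletions and our own echoed updates
        updated = call_external_api(doc)
        updated["processed"] = True              # flag so the write-back is ignored next time
        requests.put(f"{COUCH}/{db_name}/{updated['_id']}", json=updated)
    # Advance the checkpoint only after every change was handled, so a crash
    # means re-processing at worst, never silently skipping a document.
    r.set(f"seq:{db_name}", resp["last_seq"])

# Follow the continuous feed of database-level events.
with requests.get(f"{COUCH}/_db_updates",
                  params={"feed": "continuous"}, stream=True) as feed:
    for line in feed.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") in ("created", "updated"):
            process_db(event["db_name"])
```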
Questions:
Does any of the above make sense? Will it scale to 5000 databases?
How do I handle failures? It is critical that the API be hit for all documents.
Thanks!

CouchDB - Preferred structure for access/event logging

I'm just getting started with CouchDB and looking for some best practices. My current project is a CMS/Wiki-like tool that contains many pages of content. So far, this seems to fit well with CouchDB. The next thing I want to do is track every time a page on the site is accessed.
Each access log should contain the timestamp, the URI of the page that was accessed and the UUID of the user who accessed it. What is the best way to structure this access log information in CouchDB? It's likely that any given page will be accessed up to 100 times per day.
A couple thoughts I've had so far:
1 CouchDB document per page that contains ALL access logs.
1 CouchDB document per log.
If it's one document per log, should all the logs be in their own CouchDB database to keep the main DB cleaner?
Definitely not the first option. Because CouchDB uses append-only storage, each time you update a document a new document with the same ID but a different revision is created. If you have 100 hits for a page in a day, 100 new document revisions will be created, and as a result your database will quickly get huge. So it's better to use your second option.
As for a separate database for logs, it depends on your data and how you plan to use it. If you decide to keep everything in the same place, you can create a separate view just for your logs, along these lines:
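A minimal sketch of the one-document-per-log layout, plus a view that keeps the logs queryable alongside the rest of the data; the server URL, database name, field names and the logs design doc are all placeholders:

```python
import time
import uuid
import requests

COUCH = "http://localhost:5984"   # placeholder server
DB = "cms"                        # placeholder: logs kept in the main database

# One document per access, tagged with a type field so views can filter on it.
log_doc = {
    "type": "access_log",
    "timestamp": time.time(),
    "page_uri": "/wiki/some-page",          # placeholder values
    "user_uuid": "user-uuid-goes-here",
}
requests.put(f"{COUCH}/{DB}/{uuid.uuid4().hex}", json=log_doc)

# A view that only emits access_log documents, keyed by page and time so
# per-page or per-period counts stay cheap.
design = {
    "views": {
        "access_by_page": {
            "map": """function (doc) {
                if (doc.type === 'access_log') {
                    emit([doc.page_uri, doc.timestamp], 1);
                }
            }""",
            "reduce": "_count",
        }
    }
}
requests.put(f"{COUCH}/{DB}/_design/logs", json=design)

# Total hits for one page:
# GET /cms/_design/logs/_view/access_by_page?start_key=["/wiki/some-page"]&end_key=["/wiki/some-page",{}]
```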

How can I intercept a call from PouchDB to CouchDB, using .net

I am learning PouchDB with CouchDB and trying to wrap my head around intercepting documents sent to the CouchDB server and performing an action on them, whether that's creating other documents, updating the user table, etc.
On the server, the JSON document should pass through a business layer before it is submitted to the CouchDB server, preferably in .NET.
Is this possible? If not, is there another way to achieve the same result?
Thanks!
Thanks!
On the server side, you can listen to the _changes feed from CouchDB and react whenever a document is added, modified, or deleted. This could be useful for reporting/messaging/aggregation/etc.
Alternatively, if you want to do some schema validation on the documents before they are accepted, then you should look into adding a design doc with a validate_doc_update function.
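A minimal sketch of the validate_doc_update route, shown with Python/requests purely for brevity (the same HTTP call can be made from .NET); the database name and the fields being checked (type, customerId) are invented examples:

```python
import requests

COUCH = "http://admin:password@localhost:5984"   # placeholder server/credentials
DB = "mydb"                                      # placeholder database

# validate_doc_update runs inside CouchDB for every incoming write, so rejected
# documents never reach the database. The function itself is JavaScript stored
# in a design document; any HTTP client can install it.
design = {
    "validate_doc_update": """function (newDoc, oldDoc, userCtx, secObj) {
        if (newDoc.type === 'order' && !newDoc.customerId) {
            throw({forbidden: 'orders must have a customerId'});
        }
    }"""
}
requests.put(f"{COUCH}/{DB}/_design/validation", json=design)
```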

Issue with CouchDB

In the TAMA implementation, I came across an issue with CouchDB (version 1.2.0).
We are using named documents to maintain unique-constraint logic in the application. (Named documents: documents whose _id is user-defined rather than generated by CouchDB.)
We are using the REST API to add the documents to CouchDB, where we found strange behavior:
When we try to use HTTP PUT to recreate documents which have been deleted in the past (because of a bug in the code), the documents are not created the first time.
The first HTTP PUT returns HTTP 200, but the doc is not saved in CouchDB.
Trying the same request again,
the second HTTP PUT returns HTTP 200 and adds the doc to the database.
So the HTTP PUT request needs to be sent twice to create and save the doc.
I have checked that the above bug is reproducible for deleted docs, i.e. docs for which GET /{db}/{_id} returns {"error":"not_found","reason":"deleted"}.
This looks like a bug in CouchDB to me. Could you please let us know if you can think of any scenario where the above error might occur, and any possible workarounds/solutions?
CouchDB has a built-in mechanism to ensure that you do not overwrite the same document as someone else.
If you PUT any existing document, you have to accompany the request with the current doc._rev value, so that CouchDB can confirm the document you are updating is based on the most recent version in the database.
I've not come across this case with deletions, but it makes sense to me that CouchDB should not let you silently overwrite a deleted document, since the assumption has to be that you simply don't know about the deletion.
Have you tried checking whether you can access the revision of the deleted document, and if so, whether supplying it with the new document lets the PUT succeed on the first call? Something along these lines:
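One way to try that from the HTTP API, sketched in Python; whether this behaves the same on CouchDB 1.2.0 is something you would have to verify, and all names below are placeholders:

```python
import requests

COUCH = "http://localhost:5984"    # placeholder server
DB = "mydb"                        # placeholder database
DOC_ID = "my-named-document"       # placeholder user-defined _id

# open_revs=all returns every leaf revision, including the deletion tombstone.
leaves = requests.get(f"{COUCH}/{DB}/{DOC_ID}",
                      params={"open_revs": "all"},
                      headers={"Accept": "application/json"}).json()

deleted_rev = None
for leaf in leaves:
    doc = leaf.get("ok", {})
    if doc.get("_deleted"):
        deleted_rev = doc["_rev"]

# PUT the new document on top of the tombstone's revision; with the rev
# supplied, the write should be accepted on the first attempt.
new_doc = {"some_field": "some value"}   # placeholder body
params = {"rev": deleted_rev} if deleted_rev else {}
resp = requests.put(f"{COUCH}/{DB}/{DOC_ID}", params=params, json=new_doc)
print(resp.status_code, resp.json())
```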
