How to resolve conflicts with continuous replication

How to resolve conflicts with continuous replication - couchdb

I'm new to both CouchDB and PouchDB and am using it to create a contact management system that syncs across mobile and desktop devices, and can be used offline. I am seeing that using PouchDB is infinitely easier than having to write a PHP/MySQL backend.
I have been using it successfully, and when I make conflicting changes on offline devices, CouchDB uses an algorithm to arbitrarily pick a winner and then correctly pushes it to all the devices.
What I would like to do is implement a custom algorithm to merge conflicting records. Here is the algorithm I would like to use:
If a record is deleted on one client and merely updated on another,
the updated version wins, unless both clients agree on the delete.
The record with the most recent "modified" timestamp becomes the
master, and the older record becomes the secondary.
Any fields that exist only in the secondary (or are empty in the
master) are moved over to the master.
The master revision is saved and the secondary is deleted.
CouchDB's guide has a good explanation, but I don't have a clue how to implement it with the PouchDB API during a continuous replication. According to the PouchDB API, there is an "onChange" listener in the replicate options, but I don't understand how to use it to intercept conflicts.
If someone could write a brief tutorial including some sample code, myself and I'm sure many other PouchDB users would appreciate it!

Writing an article with examples of exactly how to manage conflict resolution is a really good idea, it can be confusing, but with the lack of one
The idea is the exact same as CouchDB, to resolve conflicts you delete the revisions that didnt win (and write a new winner if needed)
#1 is how CouchDB conflict resolution works anyway, so you dont need to worry about that, deleted leafs dont conflict
function onChange(doc) {
if (!doc._conflicts) return;
collectConflicts(doc._conflicts, function(docs) {
var master = docs.sort(byTime).unshift();
for (var doc in docs) {
for (var prop in doc) {
if (!(prop in master)) {
master[prop] = doc[prop];
}
}
}
db.put(master, function(err) {
if (!err) {
for (var doc in docs) {
db.remove(doc);
}
}
});
});
}
}
db.changes({conflicts: true, onChange: onChange});
This will need error handling etc and could be written much nicer, was just a quick napkin drawing of what the code could look like

Related

How do I know which fields are indexed in pouchdb if I use query() API?

I am new to pouchdb and I am reading below source code:
db.query('product_index', {
startkey: ["01234"],
endkey: ["01234", {}],
include_docs: false
});
this code executes for a long time. After read some pouchdb document it looks like it builds index on the database when it run the first time. But I don't understand which fields are indexed based on above code.
Below code I can see it builds index on field foo. But how can I understand query API for building index? What is the different between using query and createIndex from index perceptive?
db.createIndex({
index: {
fields: ['foo']
}
})

Have you seen the PouchDB Guide Bulk operations section Please use 'allDocs()'. Seriously.?
Far too many developers overlook this valuable API, because they
misunderstand it. When a developer says "my PouchDB app is slow!", it
is usually because they are using the slow query() API when they
should be using the fast allDocs() API.
When designing your data structures it's very important to bear that in mind. You should define your record id fields to optimize data accessibility through allDocs().

Preventing concurrent access to documents in Mongoose

My server application (using node.js, mongodb, mongoose) has a collection of documents for which it is important that two client applications cannot modify them at the same time without seeing each other's modification.
To prevent this I added a simple document versioning system: a pre-hook on the schema which checks if the version of the document is valid (i.e., not higher than the one the client last read). At first sight it works fine:
// Validate version number
UserSchema.pre("save", function(next) {
var user = this
user.constructor.findById(user._id, function(err, userCurrent) { // userCurrent is the user that is currently in the db
if (err) return next(err)
if (userCurrent == null) return next()
if(userCurrent.docVersion > user.docVersion) {
return next(new Error("document was modified by someone else"))
} else {
user.docVersion = user.docVersion + 1
return next()
}
})
})
The problem is the following:
When one User document is saved at the same time by two client applications, is it possible that these interleave between the pre-hook and the actual save operations? What I mean is the following, imagine time going from left to right and v being the version number (which is persisted by save):
App1: findById(pre)[v:1] save[v->2]
App2: findById(pre)[v:1] save[v->2]
Resulting in App1 saving something that has been modified meanwhile (by App2), and it has no way to notice that it was modified. App2's update is completely lost.
My question might boil down to: Do the Mongoose pre-hook and the save method happen in one atomic step?
If not, could you give me a suggestion on how to fix this problem so that no update ever gets lost?
Thank you!

MongoDB has findAndModify which, for a single matching document, is an atomic operation.
Mongoose has various methods that use this method, and I think that they will suit your use case:
Model.findOneAndUpdate()
Model.findByIdAndUpdate()
Model.findOneAndRemove()
Model.findByIdAndRemove()
Another solution (one that Mongoose itself uses as well for its own document versioning) is to use the Update Document if Current pattern.

Can you use CouchDB 'document update handlers' with replication?

I am replicating docs from DB A to DB B, every time a Doc from DB A arrives in DB B I want to run a 'stored procedure' to remove most of the fields from DB A (DB A is private, but has attachments that I want to be publicly available)
So far I've seen that this might be achieved using the _changes feed (continuous)and then running an 'update' handler on each document.
The document update handlers doc: https://wiki.apache.org/couchdb/Document_Update_Handlers
This seems like something that CouchDB would implement for me... (and I'm not really sure yet how to do the above).
Is there something like a 'hook' that can be run on every document that enters the database?
== EDIT ==
It seems that I would want to somehow include the update handler command in the replication trigger?

It sounds like with some changes to how your storing documents you may be able to benefit from CouchDB's filtered replication. You'd need to store the attachments in documents that could be equivalently copied (without modification) between the two databases.
If that's not an option, then you could potentially use transform-pouchdb plus PouchDB's .replicate.from() method to manage the replication.
Some quick pseudo-code for this idea looks a bit like this:
var PouchDB = require('pouchdb');
PouchDB.plugin(require('transform-pouch'));
var dbA = new PouchDB('a'); // "a" could be a URL to CouchDB or Cloudant
var dbB = new PouchDB('b');
dbB.transform({
incoming: function (doc) {
// do something to the document before storage
return doc;
}
});
dbB.replicate.from(dbA);
In theory, that (or something like it) should do what you're wanting...or at least giving you the framework in which to do what you're wanting. ^_^
Hope that helps!

CouchDB: Single document vs "joining" documents together

I'm tryting to decide the best approach for a CouchApp (no middleware). Since there are similarities to my idea, lets assume we have a stackoverflow page stored in a CouchDB. In essence it consists of the actual question on top, answers and commets. Those are basically three layers.
There are two ways of storing it. Either within a single document containing a suitable JSON representation of the data, or store each part of the entry within a separate document combining them later through a view (similar to this: http://www.cmlenz.net/archives/2007/10/couchdb-joins)
Now, both approaches may be fine, yet both have massive downsides from my current point of view. Storing a busy document (many changes through multiple users are expected) as a signle entity would cause conflicts to happen. If user A stores his/her changes to the document, user B would receive a conflict error once he/she is finished typing his/her update. I can imagine its possible to fix this without the users knowledge through re-downloading the document before retrying.
But what if the document is rather big? I'll except them to become rather blown up over time which would put quite some noticeable delay on a save process, especially if the retry process has to happen multiple times due to many users updating a document at the same time.
Another problem I'd see is editing. Every user should be allowed to edit his/her contributions. Now, if they're stored within one document it might be hard to write a solid auth handler.
Ok, now lets look at the multiple documents approach. Question, Answers and Comments would be stored within their own documents. Advantage: only the actual owner of the document can cause conflicts, something that won't happen too often. Being rather small elements of the whole, redownloading wouldn't take much time. Furthermore the auth routine should be quite easy to realize.
Now here's the downside. The single document is real easy to query and display. Having a lot of unsorted snippets laying around seems like a messy thing since I didn't really get the actual view to present me with a 100% ready to use JSON object containing the entire item in an ordered and structured format.
I hope I've been able to communicate the actual problem. I try to decide which solution would be more suitable for me, which problems easier to overcome. I imagine the first solution to be the prettier one in terms of storage and querying, yet the second one the more practical one solvable through better key management within the view (I'm not entirely into the principle of keys yet).
Thank you very much for your help in advance :)

Go with your second option. It's much easier than having to deal with the conflicts. Here are some example docs how I might structure the data:
{
_id: 12345,
type: 'question',
slug: 'couchdb-single-document-vs-joining-documents-together',
markdown: 'Im tryting to decide the best approach for a CouchApp (no middleware). Since there are similarities to...' ,
user: 'roman-geber',
date: 1322150148041,
'jquery.couch.attachPrevRev' : true
}
{
_id: 23456,
type: 'answer'
question: 12345,
markdown: 'Go with your second option...',
user : 'ryan-ramage',
votes: 100,
date: 1322151148041,
'jquery.couch.attachPrevRev' : true
}
{
_id: 45678,
type: 'comment'
question: 12345,
answer: 23456,
markdown : 'I really like what you have said, but...' ,
user: 'somedude',
date: 1322151158041,
'jquery.couch.attachPrevRev' : true
}
To store revisions of each one, I would store the old versions as attachments on the doc being edited. If you use the jquery client for couchdb, you get it for free by adding the jquery.couch.attachPrevRev = true. See Versioning docs in CouchDB by jchris
Create a view like this
fullQuestion : {
map : function(doc) {
if (doc.type == 'question') emit([doc._id, null, null], null);
if (doc.type == 'answer') emit([doc.question, doc._id, null], null);
if (doc.type == 'comment') emit([doc.question, doc.answer, doc._id], null) ;
}
}
And query the view like this
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{},{}]&include_docs=true
(Note: I have not url encoded this query, but it is more readable)
This will get you all of the related documents for the question that you will need to build the page. The only thing is that they will not be sorted by date. You can sort them on the client side (in javascript).
EDIT: Here is an alternative option for the view and query
Based on your domain, you know some facts. You know an answer cant exist before a question existed, and a comment on an answer cant exist before an answer existed. So lets make a view that might make it faster to create the display page, respecting the order of things:
fullQuestion : {
map : function(doc) {
if (doc.type == 'question') emit([doc._id, doc.date], null);
if (doc.type == 'answer') emit([doc.question, doc.date], null);
if (doc.type == 'comment') emit([doc.question, doc.date], null);
}
}
This will keep all the related docs together, and keep them ordered by date. Here is a sample query
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{}]&include_docs=true
This will get back all the docs you will need, ordered from oldest to newest. You can now zip through the results, knowing that the parent objects will be before the child ones, like this:
function addAnswer(doc) {
$('.answers').append(answerTemplate(doc));
}
function addCommentToAnswer(doc) {
$('#' + doc.answer).append(commentTemplate(doc));
}
$.each(results.rows, function(i, row) {
if (row.doc.type == 'question') displyQuestionInfo(row.doc);
if (row.doc.type == 'answer') addAnswer(row.doc);
if (row.doc.type == 'comment') addCommentToAnswer(row.doc)
})
So then you dont have to perform any client side sorting.
Hope this helps.

MVC 3 EF Code-first to webhost database trouble

Im fairly new to ASP.NET MVC 3, and to coding in general really.
I have a very very small application i want to upload to my webhosting domain.
I am using entity framework, and it works fine on my local machine.
I've entered a new connection string to use my remote database instead however it dosen't really work, first of all i have 1 single MSSQL database, which cannot be de dropped and recreated, so i cannot use that strategy in my initializer, i tried to supply null in the strategy, but to no avail, my tables simply does not get created in my database and thats the problem, i don't know how i am to do that with entity framework.
When i run the application, it tries to select the data from the database, that part works fine, i just dont know how to be able to create those tabes in my database through codefirst.
I could probaly get it to work through manually recreating the tables, but i want to know the solution through codefirst.
This is my initializer class
public class EntityInit : DropCreateDatabaseIfModelChanges<NewsContext>
{
private NewsContext _db = new NewsContext();
protected override void Seed(NewsContext context)
{
new List<News>
{
new News{ Author="Michael Brandt", Title="Test News 1 ", NewsBody="Bblablabalblaaaaa1" },
new News{ Author="Michael Brandt", Title="Test News 2 ", NewsBody="Bblablabalblaaaaa2" },
new News{ Author="Michael Brandt", Title="Test News 3 ", NewsBody="Bblablabalblaaaaa3" },
new News{ Author="Michael Brandt", Title="Test News 4 ", NewsBody="Bblablabalblaaaaa4" },
}.ForEach(a => context.News.Add(a));
base.Seed(context);
}
}
As i said, im really new to all this, so excuse me, if im lacking to provide the proper information you need to answer my question, just me know and i will answer it

Initialization strategies do not support upgrade strategies at the moment.
Initialization strategies should be used to initialise a new database. all subsequent changes should be done using scripts at the moment.
the best practice as we speak is to modify the database with a script, and then adjust by hand the code to reflect this change.
in future releases, upgrade / migration strategies will be available.
try to execute the scripts statement by statement from a custom IDatabaseInitializer
then from this you can read the database version in the db and apply the missing scripts to your database. simply store a db version in a table. then level up with change scripts.
public class Initializer : IDatabaseInitializer<MyContext>
{
public void InitializeDatabase(MyContext context)
{
if (!context.Database.Exists() || !context.Database.CompatibleWithModel(false))
{
context.Database.Delete();
context.Database.Create();
var jobInstanceStateList = EnumExtensions.ConvertEnumToDictionary<JobInstanceStateEnum>().ToList();
jobInstanceStateList.ForEach(kvp => context.JobInstanceStateLookup.Add(
new JobInstanceStateLookup()
{
JobInstanceStateLookupId = kvp.Value,
Value = kvp.Key
}));
context.SaveChanges();
}
}
}
Have you tried to use the CreateDatabaseOnlyIfNotExists
– Every time the context is initialized, database will be recreated if it does not exist.
The database initializer can be set using the SetInitializer method of the Database class.If nothing is specified it will use the CreateDatabaseOnlyIfNotExists class to initialize the database.
Database.SetInitializer(null);
-
Database.SetInitializer<NewsContext>(new CreateDatabaseOnlyIfNotExists<NewsContext>());
I'm not sure if this is the exact syntax as I have not written this in a while. But it should be very similar.

If you are using a very small application, you maybe could go for SQL CE 4.0.
The bin-deployment should allow you to run SQL CE 4.0 even if your provider doesn't have the binaries installed for it. You can read more here.
That we you can actually use whatever initializer you want, since you now don't have the problem of not being able to drop databases and delete tables.
could this be of any help?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string