I have a local PouchDB of tasks. It worked pretty well until I tried to set up sync with IBM Cloudant NoSQL. The major problem so far is with the remove() method, which I have written like this:
$(document).on("click", "#taskList li .delete", function () {
  db.remove(id, rev);
  refreshTasks();
});
The method works perfectly when sync is off, but as soon as I activate sync with the lines below, it no longer removes the task from either the local or the remote database. I'm running PouchDB sync with this code:
db.sync(remote_db, {
  live: true,
  retry: true
}).on('change', function (change) {
  // yo, something changed!
}).on('paused', function (info) {
  // replication was paused, usually because of a lost connection
}).on('active', function (info) {
  // replication was resumed
}).on('error', function (err) {
  // totally unhandled error (shouldn't happen)
});
My database is created with this:
var db = new PouchDB('tasks', {revs_limit: 1, auto_compaction: true});
Now when I create the database with a revs_limit of 0, it works again.
What could be going on?
When you remove a document in PouchDB, the underlying operation is similar to an update: the update writes a deletion "tombstone" revision that must be synchronised back to the server.
By setting revs_limit: 1, the local database only keeps track of the most recent revision of each document, including this deletion, so when the sync occurs it doesn't know which parent revision to attach the deletion to. This isn't an error condition in CouchDB/Cloudant - it would just create a conflicted document on the server.
I suggest creating the database without changing the default revs_limit. Really, the only scenario where lowering it makes sense is when your documents are immutable - i.e. they will never be updated or deleted.
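For illustration, a minimal sketch of the same delete flow with the default revs_limit left in place (assuming id and rev are looked up for the clicked task, which the question omits):

// Default revs_limit keeps enough revision history for the deletion
// "tombstone" to be attached to its parent revision during sync.
var db = new PouchDB('tasks', { auto_compaction: true });

$(document).on("click", "#taskList li .delete", function () {
  db.remove(id, rev).then(function () {
    // Refresh once the tombstone is written locally; the live sync
    // replicates the deletion to Cloudant in the background.
    refreshTasks();
  }).catch(function (err) {
    console.error(err);
  });
});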
Thanks for your help. I am new to Firebase, and I am designing an application with Node.js. What I want is that every time a change is detected in a document, a function is invoked that creates or updates the file system according to the new structure of the data in the Firebase document. Everything works fine, but the problem I have is that if the document is updated with 2 or more attributes, the makeBotFileSystem function is invoked that many times, which causes problems for me: it can lead to performance issues or file-overwriting problems, since what I do is generate or update multiple files.
I would like to listen for the change but wait until all the information in the document has finished updating, not attribute by attribute. Is there any way to do this? This is my code:
let botRef = firebasebotservice.db.collection('bot');
botRef.onSnapshot(querySnapshot => {
  querySnapshot.docChanges().forEach(change => {
    if (change.type === 'modified') {
      console.log('bot-changes ' + change.doc.id);
      const botData = change.doc.data();
      botData.botId = change.doc.id;
      // HERE I CREATE OR UPDATE THE FILESYSTEM STRUCTURE ACCORDING TO THE DATA CHANGES
      fsbotservice.makeBotFileSystem(botData);
    }
  });
});
The onSnapshot function will notify you any time a document changes. If property changes are committed one by one instead of updating the document all at once, you will receive multiple snapshots.
One way to partially solve the multiple-snapshot issue is to change the code that updates the document so that it commits all property changes in a single operation; that way you only receive one snapshot.
Nonetheless, you should design the function triggered by the snapshot so that it can handle multiple document changes without breaking. Document updates will happen whether they arrive as single or multiple property changes, so your code should be able to handle both. IMHO the problem is the filesystem update rather than how many snapshots are received.
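For example, a hedged sketch of committing all the attributes in one update() call on the side that writes the bot document (the document id and field names here are hypothetical, purely for illustration):

// One update() carrying every changed field produces a single 'modified'
// snapshot, instead of one snapshot per attribute written separately.
const botDoc = firebasebotservice.db.collection('bot').doc(botId);
botDoc.update({
  name: newName,          // hypothetical fields, for illustration only
  intents: newIntents,
  updatedAt: Date.now()
}).then(() => console.log('bot updated in a single write'));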
You should use the docChanges() method like this:
db.collection("cities").onSnapshot(querySnapshot => {
let changes = querySnapshot.docChanges();
for (let change of changes) {
var data = change.doc.data();
console.log(data);
}
});
Currently we are working with Node.js & LokiJS. As our application deals with real-time data, communicating with an external NoSQL/relational DB would cause latency problems.
So we decided to use an in-memory database, i.e. LokiJS.
LokiJS is good when we are working with a collection that has 500-1000 documents in it, but when it comes to updates and parallel reads it performs badly.
To explain: one of our vendors published a Kafka endpoint to consume the feed and serve it on to another external service. From the Kafka topic we are getting 100-200 events per second, and whenever we receive a Kafka event we update the existing collection. Because the delta updates are so frequent, the LokiJS collection updates do not complete properly and reads return inconsistent results.
Here is my collection creation snippet.
const loki = require('lokijs');

let db = new loki('c:\\products.json', {
  autoload: true,
  autosave: true,
  autosaveInterval: 4000
});

this.collection = db.addCollection('REALTIMEFEED', {
  indices: ["id", "type"],
  adaptiveBinaryIndices: true,
  autoupdate: true,
  clone: true
});
function update(db, collection, element, id) {
  try {
    var data = collection.findOne(id);
    data.snapshot = Date.now();
    data.delta = element;
    collection.update(data);
    db.saveDatabase();
  } catch (error) {
    console.error('dbOperations:update:failed,', error);
  }
}
Could you please suggest what I am missing here?
I think your problem lies in the fact that you are saving the database on every update. You already specified autosave and autosaveInterval, so LokiJS will periodically save your data anyway. If you also force a save on each update you clog the process: JavaScript is single-threaded, so it has to handle most of the save operation itself (it can only keep running once the save has been handed off to the OS for the actual file write).
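As a sketch, this is the same update function relying only on the autosave you already configured (assuming the lookup is by an id field, since findOne expects a Mongo-style query object):

function update(db, collection, element, id) {
  try {
    var data = collection.findOne({ id: id });
    data.snapshot = Date.now();
    data.delta = element;
    collection.update(data);
    // No explicit db.saveDatabase() here: the autosave/autosaveInterval
    // options already persist the database every 4 seconds.
  } catch (error) {
    console.error('dbOperations:update:failed,', error);
  }
}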
My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in, the change stream functionality runs as many times as there are processes.
How can I set things up so that the change stream handler only runs once per change?
Here's what I've got:
const options = { fullDocument: "updateLookup" };
const filter = [
  {
    $match: {
      $and: [
        { "updateDescription.updatedFields.sites": { $exists: true } },
        { operationType: "update" }
      ]
    }
  }
];
const sitesStream = Client.watch(filter, options);

// Start listening to site stream
sitesStream.on("change", async change => {
  console.log("in site change stream", change);
  console.log(
    "in site change stream, update desc",
    change.updateDescription
  );

  // Do work...

  console.log("site change stream done.");
  return;
});
It can easily be done with MongoDB query operators alone. You can add a modulo query on the ID field where the divisor is the number of your app instances (N); the remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1, you can write the filter like this:
const filter = [
  {
    "$match": {
      "$and": [
        // Other filters
        { "_id": { "$mod": [<number of instances>, <this instance's id>] } }
      ]
    }
  }
];
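For example, with 3 instances the instance numbered 0 would use { "_id": { "$mod": [3, 0] } }, instance 1 would use [3, 1], and instance 2 would use [3, 2]. Note that $mod matches on numeric values, so this assumes a numeric _id (or some other numeric field you shard on) rather than the default ObjectId.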
Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html
The examples are in Java but the important part is the algorithm.
It comes down to a few techniques:
Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.
More details in the blog post.
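As a rough illustration only (not the blog post's exact algorithm), the lock-plus-fencing-token idea could be sketched in MongoDB itself. The collection and field names are assumptions, the lock document is presumed to be created once up front, and the code assumes a Node driver version where findOneAndUpdate resolves to a { value } result:

// One-time setup, done once outside the app:
// db.collection("locks").insertOne({ _id: "sites-change-stream", holder: null, fencingToken: 0, expiresAt: new Date(0) })

async function acquireLock(db, instanceId, ttlMs) {
  const now = new Date();
  const res = await db.collection("locks").findOneAndUpdate(
    {
      _id: "sites-change-stream",
      $or: [
        { holder: instanceId },          // we already hold the lease
        { expiresAt: { $lt: now } }      // or the previous lease has expired
      ]
    },
    {
      $set: { holder: instanceId, expiresAt: new Date(now.getTime() + ttlMs) },
      $inc: { fencingToken: 1 }          // the token grows whenever the lease is (re)taken
    },
    { returnDocument: "after" }
  );
  // null if another instance holds an unexpired lease; otherwise the returned
  // fencingToken is attached to every downstream write so stale holders can be rejected.
  return res.value;
}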
It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.
Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.
The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have 1000s of partitions if you want), that is the power of the Kafka consumer group. Scaling down works in the reverse.
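As a hedged sketch of the consumer side (using the kafkajs client, which is not mentioned above, with made-up broker, topic, and group names): every instance joins the same consumer group, so each partition - and therefore each record key - is handled by exactly one instance at a time.

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "sites-app", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "sites-consumers" }); // shared by all instances

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topics: ["site-changes"] });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      // Only one instance in the group owns this partition at any moment.
      const change = JSON.parse(message.value.toString());
      // Do work...
    }
  });
}

run().catch(console.error);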
While the Kafka option sounded interesting, it was a lot of infrastructure work on a platform I'm not familiar with, so I decided to go with something a little closer to home for me: sending an MQTT message to a little standalone app, and letting the MQTT server monitor messages for uniqueness.
siteStream.on("change", async change => {
  console.log("in site change stream");
  const mqttClient = mqtt.connect("mqtt://localhost:1883");
  const id = JSON.stringify(change._id._data);
  // You'll want to push more than just the change stream id obviously...
  mqttClient.on("connect", function() {
    mqttClient.publish("myTopic", id);
    mqttClient.end();
  });
});
I'm still working out the final version of the MQTT server, but the method to evaluate uniqueness of messages will probably store an array of change stream IDs in application memory, as there is no need to persist them, and evaluate whether to proceed any further based on whether that change stream ID has been seen before.
var mqtt = require("mqtt");
var client = mqtt.connect("mqtt://localhost:1883");
var seen = [];

client.on("connect", function() {
  client.subscribe("myTopic");
});

client.on("message", function(topic, message) {
  var context = message.toString().replace(/"/g, "");
  if (seen.indexOf(context) < 0) {
    seen.push(context);
    // Do stuff
  }
});
This doesn't include security, etc., but you get the idea.
What about having a field in the DB called status, which is updated using findOneAndUpdate based on the event received from the change stream? Let's say you get 2 events at the same time from the change stream. The first event will update the status to "start", and the other will throw an error because the status is already "start". So the second event will not process any business logic.
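A hedged sketch of that idea (the db handle, collection, and field names are assumptions, and it uses updateOne with upsert rather than findOneAndUpdate, but the effect is the same - the duplicate-key error plays the role of the "throw an error if status is start" step):

sitesStream.on("change", async change => {
  try {
    // Atomically claim this change stream event by its id.
    await db.collection("processedChanges").updateOne(
      { _id: change._id._data, status: { $ne: "start" } },
      { $set: { status: "start", claimedAt: new Date() } },
      { upsert: true }
    );
  } catch (err) {
    if (err.code === 11000) return; // another instance already claimed this event
    throw err;
  }
  // This instance won the claim: run the business logic exactly once.
});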
I'm not claiming these are rock-solid, production-grade solutions, but I believe something like this could work.
Solution 1
Applying read-modify-write:
Add a version field to the document; all newly created docs have version=0
Receive the ChangeStream event
Read the document that needs to be updated
Perform the update on the model
Increment version
Update the document where both id and version match, otherwise discard the change
Yes, it creates 2 * n_application_replicas useless queries, so there is another option
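A minimal sketch of Solution 1 (the db handle, collection name, and business-logic helper are assumptions):

sitesStream.on("change", async change => {
  // Read the document that needs to be updated.
  const doc = await db.collection("sites").findOne({ _id: change.documentKey._id });

  // Perform the update on the in-memory model (hypothetical helper).
  const newFields = computeNewFields(doc);

  // Write back only where both id and version still match, bumping the version.
  const res = await db.collection("sites").updateOne(
    { _id: doc._id, version: doc.version },
    { $set: newFields, $inc: { version: 1 } }
  );
  if (res.modifiedCount === 0) {
    // Another replica already applied this change; discard ours.
  }
});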
Solution 2
Create a collection of ResumeTokens in Mongo which stores a collection -> token mapping
In the changeStream handler code, after a successful write, update the ResumeToken in that collection
Create a feature toggle that can disable reading the ChangeStream in your application
Configure only a single instance of your application to be the "reader"
In case of "reader" failure you might either enable reading on another node, or redeploy the "reader" node.
As a result: there can be any number of non-reader replicas and there won't be any useless queries
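A hedged sketch of the Solution 2 reader (the db handle, collection name, and environment-variable toggle are assumptions; filter and options are the ones from the question):

async function startReader() {
  if (process.env.CHANGE_STREAM_READER !== "true") return; // the feature toggle
  const saved = await db.collection("resumeTokens").findOne({ _id: "sites" });
  const sitesStream = Client.watch(filter, {
    ...options,
    ...(saved ? { resumeAfter: saved.token } : {})   // resume where the previous reader stopped
  });
  sitesStream.on("change", async change => {
    // Do work...
    // After a successful write, persist this event's resume token.
    await db.collection("resumeTokens").updateOne(
      { _id: "sites" },
      { $set: { token: change._id } },
      { upsert: true }
    );
  });
}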
I have a situation where live sync is refusing to fetch some documents on its own, so PouchDB.get returns saying the document is not found (despite it being present in the CouchDB it is replicating from).
Reading through the documentation, it suggests doing a manual replication first, then a sync. So I changed my code to replicate first:
docId = 'testdoc';
return new Promise(function (resolve, reject) {
  var toReplicate = [docId];
  console.log("replicate new ", toReplicate);
  var quickReplicate = self.db.replicate.from(self.url, {
    doc_ids: toReplicate,
    // timeout: 100000, // makes no difference
    checkpoint: false, // attempt to get around bad checkpoints, but I purged all checkpoints and still have the issue
    batch_size: 10, // attempt to get around huge memory usage
    batches_limit: 1
  }).on('denied', function (err) {
    // a document failed to replicate (e.g. due to permissions)
    console.log("replicate denied", err);
    reject(err);
  }).on('complete', function (info) {
    // handle complete
    console.log("replicate complete", info, toReplicate);
    resolve(info);
  }).on('error', function (err) {
    // handle error
    console.log("replicate error", err);
    reject(err);
  }).on('change', function (change) {
    console.log("replicate change", change);
  }).on('paused', function (err) {
    console.log("replicate paused", err);
  });
})
Then get the doc
return self.db.get(docId).catch(function (err) {
  console.error(err);
  throw err;
});
This function is called multiple times (about 8 times on average), each time requesting a single doc. They may all run at almost the exact same time.
To simplify this, I commented out nearly every single place this function was used, one at a time, until I found the exact document causing the problem. I reduced it down to a very simple command that directly replicates the problem document:
db.replicate.from("https://server/db", {
  doc_ids: ['profile.bf778cd1c7b4e5ea9a3eced7049725a1']
}).then(function(result) {
  console.log("Done", result);
});
This will never finish; the browser rapidly uses up memory and crashes.
It is probably related to the database rollback issues in this question: Is it possible to get the latest seq number of PouchDB?
When you attempt to replicate this document, no event is ever fired in the above code. Chrome/Firefox will just sit there, gradually using more RAM and maxing the CPU, until the browser crashes.
This started happening after we re-created our test system like this:
1: A live CouchDB is replicated to a test system.
2: The test CouchDB is modified and gets ahead of the live system, causing replication conflicts.
3: The test CouchDB is deleted, and the replication is rerun from scratch, creating a fresh test system.
Certain documents now have this problem, despite never having been in PouchDB before, and there should be no existing replication checkpoints for PouchDB since the database is a fresh replication of live. Even destroying the PouchDB doesn't work. Even removing the IndexedDB pouch doesn't solve it. I am not sure what else to try.
Edit: I've narrowed down the problem a little bit - the document has a ton of deleted revisions from conflicts. The replication seems to get stuck looping through them.
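For what it's worth, a hedged sketch of how the revision state of that document could be inspected (this doesn't fix the replication loop, it only helps confirm how many deleted/conflicting revisions the document is carrying):

var remoteDb = new PouchDB("https://server/db");
remoteDb.get("profile.bf778cd1c7b4e5ea9a3eced7049725a1", {
  conflicts: true,   // include unresolved conflicting leaf revisions as _conflicts
  revs_info: true    // include the known revision history as _revs_info
}).then(function (doc) {
  console.log("conflicts", doc._conflicts);
  console.log("revs_info", doc._revs_info);
});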
I am using nodejs to contact the google datastore. This is my code:
dataset.runInTransaction(function(transaction, done, err) {
  // From the `transaction` object, execute dataset methods as usual.
  // Call `done` when you're ready to commit all of the changes.
  transaction.save(entities, function(err) {
    if (err) {
      console.log("ERROR TRANSACTION");
      transaction.rollback(done);
      return;
    } else {
      console.log("TRANSACTION SUCCESS!");
      done();
    }
  });
});
If the save was not successful I would like the transaction to roll back, and if it was I want it to commit. The problem I am facing is that neither of them seems to run - there is just no output on the console. Nothing is being sent to my database, so I would assume at least the if (err) branch would run, but it doesn't. I am new to gcloud, so I am not sure if I am doing something wrong here. I am following the doc at https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.26.0/datastore/transaction?method=save.
In the context of a transaction, the save method doesn't actually need a callback. The things you wish to save are queued up until you call done(). Calling done will commit the transaction.
You can then handle errors from the commit operation in a second function passed to runInTransaction. See https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.26.0/datastore/dataset?method=runInTransaction for examples of that.
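For example, a sketch along the lines of the linked runInTransaction docs (written from memory against gcloud-node v0.26, so treat it as an approximation rather than verified code):

dataset.runInTransaction(function (transaction, done) {
  // save() only queues the mutations; nothing is sent until done() commits.
  transaction.save(entities);
  done(); // commit
}, function (err) {
  // This second callback receives any error from the commit.
  if (err) {
    console.log("TRANSACTION FAILED", err);
    return;
  }
  console.log("TRANSACTION SUCCESS!");
});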
--
Just mentioning this since it relates to the rollback part: https://github.com/GoogleCloudPlatform/gcloud-node/issues/633 -- we're waiting on an upgrade to Datastore's API before tackling that issue.