PouchDB replicate of a single document causes huge memory usage, then crash - couchdb

I have a situation where live sync refuses to fetch some documents on its own, so PouchDB.get returns saying the document is not found (despite it being present in the CouchDB it is replicating from).
The documentation suggests doing a manual replicate first, then a sync, so I changed my code to replicate first:
var docId = 'testdoc';
return new Promise(function (resolve, reject) {
    var toReplicate = [docId];
    console.log("replicate new ", toReplicate);
    var quickReplicate = self.db.replicate.from(self.url, {
        doc_ids: toReplicate,
        // timeout: 100000, //makes no difference
        checkpoint: false, //attempt to get around bad checkpoints, but I purged all checkpoints and still have the issue
        batch_size: 10, //attempt to get around huge memory usage
        batches_limit: 1
    }).on('denied', function (err) {
        // a document failed to replicate (e.g. due to permissions)
        console.log("replicate denied", err);
        reject(err);
    }).on('complete', function (info) {
        // handle complete
        console.log("replicate complete", info, toReplicate);
        resolve(info);
    }).on('error', function (err) {
        // handle error
        console.log("replicate error", err);
        reject(err);
    }).on('change', function (change) {
        console.log("replicate change", change);
    }).on('paused', function (err) {
        // note: PouchDB emits 'paused', not 'pause'
        console.log("replicate paused", err);
    });
})
Then get the doc
return self.db.get(docId).catch(function (err) {
    console.error(err);
    throw err;
});
This function is called multiple times (about 8 times on average), each time requesting a single doc, and the calls may all run at almost exactly the same time.
To simplify this, I commented out the places this function was used, one at a time, until I found the exact document causing the problem. I reduced it down to a very simple command that replicates the problem document directly:
db.replicate.from("https://server/db", {
    doc_ids: ['profile.bf778cd1c7b4e5ea9a3eced7049725a1']
}).then(function (result) {
    console.log("Done", result);
});
This never finishes; the browser rapidly uses up memory and crashes.
It is probably related to the database rollback issues described in this question: Is it possible to get the latest seq number of PouchDB?
When you attempt to replicate this document, none of the events in the above code ever fire. Chrome/Firefox just sits there, gradually using more RAM and maxing out the CPU, until the browser crashes (Chrome shows its crash message at that point).
This started happening after we re-created our test system like this:
1: A live CouchDB is replicated to a test system.
2: The test CouchDB is modified and gets ahead of the live system, causing replication conflicts.
3: The test CouchDB is deleted and the replication is rerun from the start, creating a fresh test system.
Certain documents now have this problem despite never having been in PouchDB before, and there should be no existing replication checkpoints for PouchDB since the database is a fresh replication of live. Even destroying the PouchDB doesn't work. Even removing the IndexedDB pouch doesn't solve it. I am not sure what else to try.
Edit: I've narrowed down the problem a little. The document has a ton of deleted revisions from conflicts, and replication seems to get stuck looping through them.
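For anyone debugging the same thing, a quick way to confirm how many conflict tombstones a document carries is to ask CouchDB for all of its leaf revisions. A minimal sketch (the server URL and document id are the placeholders used above; authentication is omitted):
// Ask CouchDB for every leaf revision of the document, including deleted
// conflict tombstones, and count them.
var docUrl = "https://server/db/profile.bf778cd1c7b4e5ea9a3eced7049725a1";
fetch(docUrl + "?open_revs=all", { headers: { Accept: "application/json" } })
    .then(function (res) { return res.json(); })
    .then(function (leaves) {
        var deleted = leaves.filter(function (leaf) {
            return leaf.ok && leaf.ok._deleted;
        });
        console.log("leaf revisions:", leaves.length, "deleted tombstones:", deleted.length);
    });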

Related

LokiJS inconsistent with updates & reads while working with 50k+ reads and writes

Currently we are working with Node.js and LokiJS. As our application deals with real-time data, communicating with an external NoSQL/relational DB would cause latency problems, so we decided to use an in-memory database, i.e. LokiJS.
LokiJS works fine when we are working with a collection that has 500-100 documents in it, but when it comes to updates and parallel reads it performs badly.
One of our vendors publishes a Kafka endpoint; we consume the feed from it and serve it on to an external service. From the Kafka topic we receive 100-200 events per second, and whenever we receive an event we update the existing collection. Because the delta updates are so frequent, LokiJS collection updates are not applied cleanly and reads return inconsistent results.
Here is my collection creation snippet.
const loki = require('lokijs');

let db = new loki('c:\\products.json', { // backslash escaped so the path isn't mangled
    autoload: true,
    autosave: true,
    autosaveInterval: 4000
});
this.collection = db.addCollection('REALTIMEFEED', {
    indices: ["id", "type"],
    adaptiveBinaryIndices: true,
    autoupdate: true,
    clone: true
});
function update(db, collection, element, id) {
    try {
        var data = collection.findOne(id);
        data.snapshot = Date.now();
        data.delta = element;
        collection.update(data);
        db.saveDatabase();
    } catch (error) {
        console.error('dbOperations:update:failed,', error);
    }
}
Could you please suggest whether I am missing anything here?
I think your problem lies in the fact that you are saving the database on every update. You already specified autosave and an autosaveInterval, so LokiJS is going to save your data periodically. If you also force a save on each update you are clogging the process, since JS is single-threaded and has to handle most of that work itself (it can only keep running while the save operation is handed off to the OS for the actual file write).
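In other words, drop the explicit save from the hot path and let the autosave timer persist the data. A minimal sketch based on the snippet above:
// Same update logic as above, but without db.saveDatabase(): the
// autosave/autosaveInterval configured on the database persists changes
// every 4 seconds, so the frequent delta updates never block on file writes.
function update(collection, element, id) {
    try {
        var data = collection.findOne(id);
        data.snapshot = Date.now();
        data.delta = element;
        collection.update(data);
    } catch (error) {
        console.error('dbOperations:update:failed,', error);
    }
}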

Mongoose Node - Request Multiple updates to multi documents with success all or cancel all?

Good question that I expect to be shot down quickly.
Sometimes we want to update many documents when an action is performed at the front end.
Example React Code
this.props.submitRecord(newRecord, (err, record) => {
    if (err) actions.showSnackBar(err);
    else {
        actions.showSnackBar("Record Submitted Successfully ...");
        this.props.validateClub(this.props.club._id, (err, message) => {
            if (err) actions.showSnackBar(err);
            else {
                obj.setState({
                    player: {},
                    open: false
                });
                actions.showSnackBar(message);
            }
        });
    }
});
As you can see, I submit the first request and, on success, submit the second. If the first fails, the second one won't happen. But if the first one passes and the second one fails for whatever reason, we have a data mismatch.
Ideally, we want to send them together so that either they all pass or none of them do. Is there a simple way of doing this with React, Node and Mongoose, or do I have to do it the hard way (which is also error-prone: store the old values until all requests are satisfied, or make some revert request on the Node server, lol)?
Thanks
Transactions are part of MongoDB 4.0; there was no transaction support in earlier versions. The other way would be to perform a rollback on failure in your own code, and there are some non-opinionated npm packages for this, such as mongoose-transaction.
https://www.mongodb.com/transactions
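For illustration, an all-or-nothing version of the two updates might look roughly like this with Mongoose sessions (assumes Mongoose 5.2+ against MongoDB 4.0+ running as a replica set; Record, Club and the validated field are hypothetical stand-ins for whatever submitRecord/validateClub touch on the server):
const mongoose = require('mongoose');
// Assumes Record and Club are existing Mongoose models.

async function submitAndValidate(newRecord, clubId) {
    const session = await mongoose.startSession();
    session.startTransaction();
    try {
        await Record.create([newRecord], { session });              // first write
        await Club.updateOne({ _id: clubId },
            { $set: { validated: true } }, { session });            // second write
        await session.commitTransaction();   // both writes become visible together
    } catch (err) {
        await session.abortTransaction();    // neither write is applied
        throw err;
    } finally {
        session.endSession();
    }
}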

PouchDB not syncing deletions when revs_limit is greater than 0

I have a local PouchDB of tasks. It worked pretty well until I tried to set up sync with IBM Cloudant NoSQL. The major problem so far is with the remove() method, which I have written like this:
$(document).on("click", "#taskList li .delete", function () {
    db.remove(id, rev);
    refreshTasks();
});
The method works perfectly when sync is off, but as soon as I activate sync with the lines below, the task no longer gets removed from both the local and the remote database. I'm running PouchDB sync with this code:
db.sync(remote_db, {
    live: true,
    retry: true
}).on('change', function (change) {
    // yo, something changed!
}).on('paused', function (info) {
    // replication was paused, usually because of a lost connection
}).on('active', function (info) {
    // replication was resumed
}).on('error', function (err) {
    // totally unhandled error (shouldn't happen)
});
My database is created with this:
var db = new PouchDB('tasks', {revs_limit: 1, auto_compaction: true});
Now when I create the database with a revs_limit of 0, it works again.
What could be going on?
When you remove a document from Pouch, the underlying operation is similar to an update - the update contains a deletion "tombstone" revision that must be synchronised back to the server.
By setting revs_limit:1, the local database will only keep track of the most recent revision for each document, including this deletion, so when the sync occurs it won't know which parent revision to attach the deletion to. This isn't an error condition in Couch/Cloudant - it would just create a conflicted document on the server.
I suggest creating the database without changing the default revs_limit. Really, the only scenario where that makes sense is when you have documents that are immutable - i.e. they will never be updated / deleted.
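A minimal sketch of the same setup with the default revs_limit left in place (the Cloudant URL is a placeholder):
// No revs_limit override, so the local rev tree is kept and deletion
// tombstones can be attached to the right parent revision during sync.
var db = new PouchDB('tasks');
var remote_db = new PouchDB('https://username.cloudant.com/tasks');

db.sync(remote_db, { live: true, retry: true });

// remove() writes a new revision with _deleted: true, which then replicates.
db.get(id).then(function (doc) {
    return db.remove(doc);
}).then(function () {
    refreshTasks();
});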

How to make sure Elastic Search is healthy before sending a command?

I'm indexing a bunch of documents using Elasticsearch's Bulk API from the JavaScript (Node.js) client. I'm sending a thousand docs with each call. The instance handles it until shortly before it reaches 100 calls (approx. 100K documents), then it goes down, returning a Service Unavailable (503) error.
Before making a new call, I wait for the previous one to complete, plus an extra second.
Searching on this matter, I found a post about a fix for Rails: https://medium.com/@thetron/dealing-with-503-errors-when-testing-elasticsearch-integration-in-rails-ec7a5f828274. The author uses the following code to make the errors go away:
before do
  repository.create_index!
  repository.client.cluster.health wait_for_status: 'yellow'
end
Based on that, I wrote the following:
const body = [
    // 1K actions/docs
];
elastic.cluster.health({
    waitForStatus: 'yellow',
    timeout: '60s', // I also tried using the default timeout
    requestTimeout: 60000
}, function (error, response) {
    if (!!error) {
        console.error(error);
        return;
    }
    elastic.bulk({
        body: body
    }, function (error, response) {
        if (!!error) {
            console.error(error);
            return;
        }
        console.log('Success!');
    });
});
Not sure if it makes any difference, but the instance is running on AWS. Given the large number of docs, maybe scaling up the instance is a solution, but I wanted to figure out how to handle this error before going that way, even if it makes my code somewhat slower.
Your best bet is to scale up your cluster as you've said, but your loading process should be able to handle failures as well.
That being said, the following are the Elasticsearch cluster statuses:
red - There's at least one unallocated primary shard
yellow - There's at least one unallocated replica shard
green - Everything is allocated and healthy
So in your example above, you don't want to wait for yellow, you want to wait for green.
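A rough sketch of that, combined with a simple retry, using the same elastic client and body as in the question (the retry count and delays are arbitrary, and the error shape assumes the legacy elasticsearch JS client, which exposes a numeric status on its errors):
// Wait for 'green' before each bulk call and back off exponentially on 503s.
function bulkWithRetry(body, attempt) {
    attempt = attempt || 0;
    return elastic.cluster.health({ waitForStatus: 'green', timeout: '60s', requestTimeout: 60000 })
        .then(function () {
            return elastic.bulk({ body: body });
        })
        .catch(function (error) {
            if (attempt < 5 && error.status === 503) {
                var delay = 1000 * Math.pow(2, attempt);   // 1s, 2s, 4s, ...
                return new Promise(function (resolve) { setTimeout(resolve, delay); })
                    .then(function () { return bulkWithRetry(body, attempt + 1); });
            }
            throw error;   // give up after 5 attempts, or on other errors
        });
}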

gcloud Datastore transaction issue using nodejs

I am using nodejs to contact the google datastore. This is my code:
dataset.runInTransaction(function (transaction, done, err) {
    // From the `transaction` object, execute dataset methods as usual.
    // Call `done` when you're ready to commit all of the changes.
    transaction.save(entities, function (err) {
        if (err) {
            console.log("ERROR TRANSACTION");
            transaction.rollback(done);
            return;
        } else {
            console.log("TRANSACTION SUCCESS!");
            done();
        }
    });
});
If the save is not successful, I would like the transaction to roll back, and if it is, I want it to commit. The problem I am facing is that neither of them seems to be running; there is just no output on the console. Nothing is being sent to my database, so I would assume at least the 'if (err)' condition would run, but it doesn't. I am new to gcloud, so I am not sure if I am doing something wrong here. I am following the doc at https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.26.0/datastore/transaction?method=save.
In the context of a transaction, the save method doesn't actually need a callback. The things you wish to save are queued up until you call done(). Calling done will commit the transaction.
You can then handle errors from the commit operation in a second function passed to runInTransaction. See https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.26.0/datastore/dataset?method=runInTransaction for examples of that.
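Put together, the pattern described above looks roughly like this (a sketch against the gcloud-node 0.26-era API linked from the question):
dataset.runInTransaction(function (transaction, done) {
    // save() just queues the entities inside the transaction; no callback needed.
    transaction.save(entities);
    // done() commits the transaction.
    done();
}, function (err) {
    // Errors from the commit operation surface here.
    if (err) {
        console.log("TRANSACTION FAILED", err);
    } else {
        console.log("TRANSACTION SUCCESS!");
    }
});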
--
Just mentioning this since it relates to the rollback part: https://github.com/GoogleCloudPlatform/gcloud-node/issues/633 -- we're waiting on an upgrade to Datastore's API before tackling that issue.
