CouchbaseLite wait for pull replication to finish - couchdb

I have a CouchDB that I am connecting via CouchBaseLite 1.4. I am having trouble waiting for all documents to be pulled before continuing on with the application.
Currently I am achieving this in a very hacky way and I would like to fix it to be better in line with proper coding standards.
Current:
pull.setContinuous(false);
pull.start();
//Waits for pull replication to start pulling in data
while(!pull.isRunning());
//Waits for pull replication to finish.
while(!pull.getStatus().equals(Replication.ReplicationStatus.REPLICATION_STOPPED));
//Set continuous to true
pull.setContinuous(true);
//Start it again.
pull.start();
The reason I am doing this is I potentially have 2 documents in the DB that I need to wait for, if they are not present the desktop app goes into setup mode.
Is there any way to wait for all documents to complete pulling
without the hacky double while?
Even better, lets assume I know the _id's of the docs. Is there a way to wait until BOTH are pulled before continuing?

Use change listeners. To monitor replications, you want something like
// Replication.ChangeListener
#Override
public void changed(Replication.ChangeEvent changeEvent) {
if (changeEvent.getError() != null) {
Throwable lastError = changeEvent.getError();
// React to the error
return;
}
if (changeEvent.getTransition() == null) return;
ReplicationState dest = changeEvent.getTransition().getDestination();
replicationActive = ((dest == ReplicationState.STOPPING || dest == ReplicationState.STOPPED) ? false : true);
// Do something here if true
}
You could do something similar with a change listener on the database object to catch when the two specific documents have been replicated.
Since it sounds like you're expecting these docs to be in the database after initial setup somewhere else, another approach would be to do a one-shot replication to get those first documents, then start a continuous replication after it has finished.

Related

Firebase doc changes

thanks for your help, I am new to firebase, I am designing an application with Node.js, what I want is that every time it detects changes in a document, a function is invoked that creates or updates the file system according to the new structure of data in the firebase document, everything works fine but the problem I have is that if the document is updated with 2 or more attributes the makeBotFileSystem function is invoked the same number of times which brings me problems since this can give me performance problems or file overwriting problems since what I do is generate or update multiple files.
I would like to see how the change can be expected but wait until all the information in the document is finished updating, not attribute by attribute, is there any way? this is my code:
let botRef = firebasebotservice.db.collection('bot');
botRef.onSnapshot(querySnapshot => {
querySnapshot.docChanges().forEach(change => {
if (change.type === 'modified') {
console.log('bot-changes ' + change.doc.id);
const botData = change.doc.data();
botData.botId = change.doc.id;
//HERE I CREATE OR UPDATE FILESYSTEM STRUCTURE, ACCORDING Data changes
fsbotservice.makeBotFileSystem(botData);
}
});
});
The onSnapshot function will notify you anytime a document changes. If property changes are commited one by one instead of updating the document all at once, then you will receive multiple snapshots.
One way to partially solve the multiple snapshot thing would be to change the code that updates the document to commit all property changes in a single operation so that you only receive one snapshot.
Nonetheless, you should design the function triggered by the snapshot so that it can handle multiple document changes without breaking. Given that document updates will happen no matter if by single/multiple property changes your code should be able to handle those. IMHO the problem is the filesystem update rather than how many snaphots are received
You should use docChanges() method like this:
db.collection("cities").onSnapshot(querySnapshot => {
let changes = querySnapshot.docChanges();
for (let change of changes) {
var data = change.doc.data();
console.log(data);
}
});

How to capture only the fields modified by user

I am trying to build a logging mechanism, to log changes done to a record. I am currently logging previous and new record. However, as the site is very busy, I expect the logfile to grow seriously huge. To avoid this, I plan to only capture the modified fields only.
Is there a way to capture only the modifications done to a record (in REACT), so my {request.body} will have fewer fields?
My Server-side is build with NODE.JS and the client-side is REACT.
One approach you might want to consider is to add an onChange(universal) or onTextChanged(native) listener to the text field and store the form update in a local state/variables.
Finally, when a user makes an action (submit, etc.) you can send the updated data to the logging module.
The best way I found and works for me is …
on the api server-side, where I handle the update request, before hitting the database, I do a difference between the previous record and {request.body} using lodash and use the result to send to my update database function
var _ = require('lodash');
const difference = (object, base) => {
function changes(object, base) {
return _.transform(object, function (result, value, key) {
if (!_.isEqual(value, base[key])) {
result[key] = (_.isObject(value) && _.isObject(base[key])) ? changes(value, base[key]) : value;
}
});
}
return changes(object, base);
}
module.exports = difference
I saved the above code in a file named diff.js and included it in my server-side file.
It worked good.
Thanks for giving the idea...

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in the change stream functionality runs as many times as there are processes.
How to set things up so that the change stream only runs once?
Here's what I've got:
const options = { fullDocument: "updateLookup" };
const filter = [
{
$match: {
$and: [
{ "updateDescription.updatedFields.sites": { $exists: true } },
{ operationType: "update" }
]
}
}
];
const sitesStream = Client.watch(sitesFilter, options);
// Start listening to site stream
sitesStream.on("change", async change => {
console.log("in site change stream", change);
console.log(
"in site change stream, update desc",
change.updateDescription
);
// Do work...
console.log("site change stream done.");
return;
});
It can easily be done with only Mongodb query operators. You can add a modulo query on the ID field where the divisor is the number of your app instances (N). The remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1 you can write the filter like this:
const filter = [
{
"$match": {
"$and": [
// Other filters
{ "_id": { "$mod": [<number of instances>, <this instance's id>]}}
]
}
}
];
Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html
The examples are in Java but the important part is the algorithm.
It comes down to a few techniques:
Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.
More details in the blog post.
It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.
Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.
The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have 1000s of partitions if you want), that is the power of the Kafka consumer group. Scaling down works in the reverse.
While the Kafka option sounded interesting, it was a lot of infrastructure work on a platform I'm not familiar with, so I decided to go with something a little closer to home for me, sending an MQTT message to a little stand alone app, and letting the MQTT server monitor messages for uniqueness.
siteStream.on("change", async change => {
console.log("in site change stream);
const mqttClient = mqtt.connect("mqtt://localhost:1883");
const id = JSON.stringify(change._id._data);
// You'll want to push more than just the change stream id obviously...
mqttClient.on("connect", function() {
mqttClient.publish("myTopic", id);
mqttClient.end();
});
});
I'm still working out the final version of the MQTT server, but the method to evaluate uniqueness of messages will probably store an array of change stream IDs in application memory, as there is no need to persist them, and evaluate whether to proceed any further based on whether that change stream ID has been seen before.
var mqtt = require("mqtt");
var client = mqtt.connect("mqtt://localhost:1883");
var seen = [];
client.on("connect", function() {
client.subscribe("myTopic");
});
client.on("message", function(topic, message) {
context = message.toString().replace(/"/g, "");
if (seen.indexOf(context) < 0) {
seen.push(context);
// Do stuff
}
});
This doesn't include security, etc., but you get the idea.
Will that having a field in DB called status which will be updated using findAnUpdate based on the event received from change stream. So lets say you get 2 events at the same time from change stream. First event will update the status to start and the other will throw error if status is start. So the second event will not process any business logic.
I'm not claiming those are rock-solid production grade solutions, but I believe something like this could work
Solution 1
applying Read-Modify-Write:
Add version field to the document, all the created docs have version=0
Receive ChangeStream event
Read the document that needs to be updated
Perform the update on the model
Increment version
Update the document where both id and version match, otherwise discard the change
Yes, it creates 2 * n_application_replicas useless queries, so there is another option
Solution 2
Create collection of ResumeTokens in mongo which would store collection -> token mapping
In the changeStream handler code, after successful write, update ResumeToken in the collection
Create a feature toggle that will disable reading ChangeStream in your application
Configure only a single instance of your application to be a "reader"
In case of "reader" failure you might either enable reading on another node, or redeploy the "reader" node.
As a result: there might be an infinite amount of non-reader replicas and there won't be any useless queries

Store cache hit and miss rate calculation in nodejs+mongoose

Let me get started with the cache code that i use
helper['listAll'] = ()=>{
return new Promise((fullfill,reject)=>{
if(cache.get("footer") == null){
footerModel
.find({})
.then((data)=>{
cache.put("footer",data,parseInt(process.env.CACHE_FOOTER_TIMEOUT));
fullfill(data);
})
.catch((ex)=>{
reject(ex);
});
}else{
fullfill(cache.get("footer"));
}
});
}
Now as you can see i need to check how effictive the cache system is on production environment and for that i need the math in hit/miss rate. The data should be accessible via a web console. The problem is that if i keep writing an insert query for each of the hit/miss, it would be more inefficient and there would be no point using cache at all. What is the best possible way that i can go about so that the calculations are stored on the database and also not overload the dbs system ?
The cache model that i am using is memory-cache and the last method i tried was set a global counter on hit/miss and a timeout function that pushed the values to redis server every 10 seconds. Is there a better approach that this ?

sqlite returns SQLITE_BUSY in WAL mode

I have a web application working with sqlite database.
My version of sqlite is the latest from official windows binary distribution - 3.7.13.
The problem is that under heavy load on database, sqlite API functions (such as sqlite3_step) are returning SQLITE_BUSY.
I pass the following pragmas when initializing a connection:
journal_mode = WAL
page_size = 4096
synchronous = FULL
foreign_keys = on
The databas is one-file database. And I'm using Mono 2.10.8 and Mono.Data.Sqlite assembly provided with it to access database.
I'm testing it with 50 parallel threads which are sending 50 subsequent http-requests each to my application. On every request some reading and writing are done to the database. Every set of IO operations is executed inside the transaction.
Everything goes well until near 400th - 700th request. In this (random) moment API functions are starting to return SQLITE_BUSY permanently (To be more exact - until the limit of retries is reached).
As far as i know WAL mode transparently supports parallel reads and writes. I've guessed that it could be because of attempt to read database while checkpoint operation is executed. But even after turning autocheckpoint off the situation remains the same.
What could be wrong in this situation?
How to serve large amount of parallel database IO correctly?
P.S.
Only one connection per request is supposed.
I use nhibernate configured with WebSessionContext.
I initialize my NHibernate session like this:
ISession session = null;
//factory variable is session factory
if (CurrentSessionContext.HasBind(factory))
{
session = factory.GetCurrentSession();
if (session == null)
CurrentSessionContext.Unbind(factory);
}
if (session == null)
{
session = factory.OpenSession();
CurrentSessionContext.Bind(session);
}
return session;
And on HttpApplication.EndRequest i release it like this:
//factory variable is session factory
if (CurrentSessionContext.HasBind(factory))
{
try
{
CurrentSessionContext.Unbind(factory)
.Dispose();
}
catch (Exception ee)
{
Logr.Error("Error uninitializing session", ee);
}
}
So as far as i know there should be only one connection per request life cycle. While proceessing the request, code is executed sequentially (ASP.NET MVC 3). So it doesn't look like any concurency is possible here. Can i conclude that no connections are shared in this case?
It's not clear to me if the request threads share the same connection or not. If they don't then you should not be having these issues.
Assuming that you are indeed sharing the connection object across multiple threads, you should use some locking mechanism as the the SqliteConnection isn't thread-safe (an old post, but the SQLite library maintained as part of Mono evolved from System.Data.SQLite found on http://sqlite.phxsoftware.com).
So assuming that you don't lock around using the SqliteConnection object, can you please try it? A simple way to accomplish this could look like this:
static readonly object _locker = new object();
public void ProcessRequest()
{
lock (_locker) {
using (IDbCommand dbcmd = conn.CreateCommand()) {
string sql = "INSERT INTO foo VALUES ('bar')";
dbcmd.CommandText = sql;
dbcmd.ExecuteNonQuery();
}
}
}
You may however choose to open a distinct connection with each thread to ensure you don't have any more threading issues with the SQLite library.
EDIT
Following-up on the code you posted, do you close the session after committing the transaction? If you don't use some ITransaction, do you flush and close the session? I'm asking since I don't see it in your code, and I see it mentioned in https://stackoverflow.com/a/43567/610650
I also see it mentioned on http://nhibernate.info/doc/nh/en/index.html#session-configuration:
Also note that you may call NHibernateHelper.GetCurrentSession(); as
many times as you like, you will always get the current ISession of
this HTTP request. You have to make sure the ISession is closed after
your unit-of-work completes, either in Application_EndRequest event
handler in your application class or in a HttpModule before the HTTP
response is sent.

Resources