MongoDB - how to make replica set step-down truly seamless - Node.js

The problem is this: when your replica set is forced to step down while your application is running, all mainstream Mongo clients will throw at least one exception per connection. This happens because their database connections are hardwired to the physical server which used to be the primary and no longer accepts queries. So, while MongoDB architects might think that the step-down process does not create any downtime, in reality, if you handle connections according to their documentation, each step down will cause a full-blown crash for at least one user, and might even create a data integrity issue. I hope this can be avoided with a simple wrapper that captures some specific Mongo exceptions and handles them by automatically re-connecting to the replica set and re-running the failed query. If you already have a solution for this, please share! I am particularly interested in a solution that works with any major Mongo driver for Node.js.

You are correct -- this is the exact behavior I experienced with both mainstream ODMs as well as the official native MongoDB driver for Node.js.
Replica set step-downs would cause my outstanding queries to fail with "Could not locate any valid servers in initial seed list", "sockets closed", and "ECONNRESET" errors, and additional queries would not get buffered even though bufferMaxEntries was correctly configured.
Therefore, I developed Monkster to provide seamless replica set step-down handling and overall high availability for MongoDB clusters for Node.js developers using the popular Monk ODM.
Monkster is a Node.js package that provides high availability for Monk, the wise MongoDB API. It implements smart error handling and retry logic to handle temporary network connectivity issues and replica set step-downs seamlessly.
https://www.npmjs.com/package/monkster
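For anyone who would rather roll their own than pull in a package, here is a minimal sketch of the kind of wrapper described above, using the official mongodb driver. The retryable error patterns, retry count, and backoff are illustrative assumptions, not Monkster's actual implementation:

    const { MongoClient } = require("mongodb");

    // Minimal retry-wrapper sketch: run a query and, if it fails with a
    // transient step-down/network-style error, back off and re-run it.
    // The error patterns and limits below are illustrative assumptions.
    async function withRetry(queryFn, maxRetries = 5, baseDelayMs = 500) {
      for (let attempt = 1; ; attempt++) {
        try {
          return await queryFn();
        } catch (err) {
          const transient = /ECONNRESET|socket|seed list|not (the )?(primary|master)/i
            .test(String(err && err.message));
          if (!transient || attempt >= maxRetries) throw err;
          // Linear backoff before re-running the failed query.
          await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
        }
      }
    }

    async function main() {
      // The driver re-discovers the new primary on its own; the wrapper
      // only re-runs the queries that failed mid-step-down.
      const client = await MongoClient.connect(
        "mongodb://host1,host2,host3/?replicaSet=rs0"
      );
      const users = client.db("app").collection("users");
      const doc = await withRetry(() => users.findOne({ email: "user@example.com" }));
      console.log(doc);
      await client.close();
    }

    main().catch(console.error);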

Related

Shopware 6 partitioning

Has anyone had any experience with database partitioning? We already have a lot of data, and queries on it are starting to slow down. Maybe someone has some examples? These are tables related to orders.
Shopware, since version 6.4.12.0, allows the use of database clusters; see the relevant documentation. You will have to set up a number of read-only nodes first. The read load will then be distributed among the read-only nodes, while write operations are restricted to the primary node.
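As a rough sketch, the setup boils down to pointing Shopware at one writable primary plus any number of readers via environment variables; the variable names below follow the cluster documentation as I recall it, and the hosts and credentials are placeholders, so double-check against the docs for your Shopware version:

    DATABASE_URL=mysql://writer:secret@db-primary:3306/shopware
    DATABASE_REPLICA_0_URL=mysql://reader:secret@db-replica-1:3306/shopware
    DATABASE_REPLICA_1_URL=mysql://reader:secret@db-replica-2:3306/shopware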
Note that in a cluster setup you should also use a lock storage that complements the setup.
Besides using a DB cluster, you can also try to reduce the load on the DB server.
The first thing you should do is enable the HTTP cache; better still, additionally set up a reverse proxy cache like Varnish. This will greatly decrease the number of requests that hit your web server and thus your DB server as well.
Apart from decreasing the load on the DB, all the measures explained here should also improve the overall performance of your shop.
Additionally you could use Elasticsearch, so that costly search requests won't hit the database; use a "real" message queue, so that the messages are not stored in the database; and use Redis instead of the database for the storage of performance-critical information, as documented in the articles in this category of the official docs.
The impact of all those measures probably depends on your concrete project setup, so maybe you'll see something in the DB locks that hints at one of the points I mentioned previously; that would be an indicator to start in that direction. E.g. if you see a lot of search-related queries, Elasticsearch would be a great start, but if you see a lot of DB load coming from writing/reading/deleting messages, then the message queue might be the better starting point.
All in all, when you use a DB cluster with a primary and multiple replicas and add the additional services I mentioned here, your shop should be able to scale quite well without the need for partitioning the actual DB.

Does MongoDB Atlas provide offline support?

I am creating an app in Expo, using Node.js/Express as the backend and MongoDB Atlas as the database.
I would like the actions a user performs while offline to automatically sync with the online data once they are back online.
Does MongoDB Atlas provide this feature? Or does any other MongoDB option provide it?
Check the paragraph on offline-first on the MongoDB website:
Realm Sync is built on the assumption that connectivity will drop. We call this mentality offline-first. After you make changes to the local realm on the client device, the Realm SDK automatically sends the changes to the server as soon as possible.
Also check the paragraph on conflict resolution:
MongoDB Realm's sync conflict resolution engine is deterministic. Changes received out-of-order eventually converge on the same state across the server and all clients. As such, Realm Sync is strongly eventually consistent.
In simple terms, Realm Sync's conflict resolution comes down to last write wins. Realm Sync also uses more sophisticated techniques like operational transform to handle, for example, insertions into lists.
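To make that concrete, here is a minimal sketch using the Realm JavaScript SDK with partition-based sync; the app ID, partition value, and schema are placeholders, so adapt them to your own Realm app:

    const Realm = require("realm");

    const TaskSchema = {
      name: "Task",
      primaryKey: "_id",
      properties: { _id: "objectId", text: "string" },
    };

    async function main() {
      const app = new Realm.App({ id: "<your-realm-app-id>" });
      const user = await app.logIn(Realm.Credentials.anonymous());
      const realm = await Realm.open({
        schema: [TaskSchema],
        sync: { user, partitionValue: "demo" },
      });
      // Writes always land in the local realm first, so this works offline;
      // the SDK pushes the change to Atlas once connectivity returns.
      realm.write(() => {
        realm.create("Task", { _id: new Realm.BSON.ObjectId(), text: "created offline" });
      });
    }

    main().catch(console.error);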

Architecture for Redis cache & Mongo for persistence

The Setup:
Imagine a 'twitter like' service where a user submits a post, which is then read by many (hundreds, thousands, or more) users.
My question is regarding the best way to architect the cache & database to optimize for quick access & many reads, but still keep the historical data so that users may (if they want) see older posts. The assumption here is that 90% of users would only be interested in the new stuff, and that the old stuff will get accessed occasionally. The other assumption is that we want to optimize for the 90%, and it's OK if the older 10% take a little longer to retrieve.
With this in mind, my research seems to point strongly toward using a cache for the 90%, and also storing the posts in another, longer-term persistent system. So my idea thus far is to use Redis for the cache. The advantages are that Redis is very fast and has built-in pub/sub, which would be perfect for publishing posts to many people. I was then considering using MongoDB as a more permanent data store for the same posts, which will be accessed as they expire off of Redis.
Questions:
1. Does this architecture hold water? Is there a better way to do this?
2. Regarding the mechanism for storing posts in both Redis & MongoDB, I was thinking about having the app do two writes: first, write to Redis, so the post is immediately available to subscribers; second, after successfully storing to Redis, write to MongoDB immediately. Is this the best way to do it? Should I instead have Redis push the expired posts to MongoDB itself? I thought about this, but I couldn't find much information on pushing to MongoDB from Redis directly.
It is actually sensible to associate Redis and MongoDB: they are good team players. You will find more information here:
MongoDB with redis
One critical point is the resiliency level you need. Both Redis and MongoDB can be configured to achieve an acceptable level of resiliency, and these considerations should be discussed at design time. It may also put constraints on the deployment options: if you want master/slave replication for both Redis and MongoDB, you need at least 4 boxes (Redis and MongoDB should not be deployed on the same machine).
Now, it may be a bit simpler to keep Redis for queuing, pub/sub, etc., and store the user data in MongoDB only. The rationale is that you do not have to design similar data access paths (the difficult part of this job) for two stores featuring different paradigms. Also, MongoDB has built-in horizontal scalability (replica sets, auto-sharding, etc.), while Redis offers only do-it-yourself scalability.
Regarding the second question, writing to both stores would be the easiest way to do it. There is no built-in feature to replicate Redis activity to MongoDB. Designing a daemon listening to a Redis queue (where activity would be posted) and writing to MongoDB is not that hard though.
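A sketch of that daemon approach in Node.js (queue name, database names, and connection strings are illustrative): the application RPUSHes each post onto a Redis list in addition to publishing it, and this separate process drains the list into MongoDB:

    const { createClient } = require("redis");
    const { MongoClient } = require("mongodb");

    async function main() {
      const redis = createClient();
      await redis.connect();
      const mongo = await MongoClient.connect("mongodb://localhost:27017");
      const posts = mongo.db("app").collection("posts");

      for (;;) {
        // Block until the app queues a post (timeout 0 = wait forever).
        const item = await redis.blPop("posts:queue", 0);
        await posts.insertOne(JSON.parse(item.element));
      }
    }

    main().catch(console.error);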

Rate limiting - using CouchDB with Redis or CouchDB on its own

I've written an application with a CouchDB backend. I have invested a lot of time into CouchDB and so I'm reluctant to move everything over to a different NoSQL database (like Redis).
The problem is that I now need to implement a rate limiting (based on IP address) feature.
There are plenty of examples of how good Redis is for this kind of task. However, because I don't want to drop CouchDB for other tasks, I would essentially be running (and supporting) two databases (one for most data, one for rate limiting), and so...
Is running CouchDB in tandem with Redis unheard of?
Is CouchDB suitable for handling rate limiting itself?
Is running CouchDB in tandem with Redis unheard of?
Redis is commonly used to complement other storage solutions (MySQL, PostgreSQL, MongoDB, CouchDB, etc.). Like many other NoSQL solutions, Redis is not adapted to all kinds of workloads or situations. The authors of Redis are pragmatic and open people, and they routinely suggest using other solutions rather than Redis when those are better adapted to the situation.
Redis is therefore a good team player, and it is generally easy to integrate in an existing infrastructure.
Here is an example of usage of Redis with CouchDB.
Is CouchDB suitable for handling rate limiting itself?
CouchDB has a number of useful features to implement the rate limiting strategy described in Chris O'Hara's article. For instance, it supports bulk operations on several documents (with optional atomicity). A "bucket span" can be stored in a single document. In-place incrementation of counters can be covered by using update handlers.
IMO, the main missing feature would be automatic item expiration (which CouchDB does not provide AFAIK). So you would have to design a clever mechanism to get rid of obsolete data on top of CouchDB.
The main problem is that CouchDB is not really designed for this kind of workload: it is a log-structured, document-oriented database. Each time a counter has to be incremented, it would involve JSON unpacking/packing operations, some JavaScript code execution, and writing a new revision of the whole document to append-only files. You can find a good article describing how CouchDB stores its data here.
I suspect a rate limiting strategy implemented on top of CouchDB would not scale very well (too many I/Os, too much CPU consumption, an inefficient network protocol). For instance, CouchDB is a RESTful server; I would not feel comfortable initiating client HTTP operations (REST queries to CouchDB) to rate limit each incoming HTTP query of my system.
Redis is much better adapted to this kind of workload (fast, in-memory, no I/O, efficient client protocol, no JSON parsing/formatting, increments are native atomic operations, etc.).
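For illustration, a fixed-window limiter along those lines takes only a few lines with a Node.js Redis client; the key naming, window size, and limit are arbitrary choices for the sketch:

    // Fixed-window rate limiter sketch: one counter per IP per time window.
    async function isAllowed(redis, ip, limit = 100, windowSecs = 60) {
      const windowId = Math.floor(Date.now() / 1000 / windowSecs);
      const key = `ratelimit:${ip}:${windowId}`;
      const count = await redis.incr(key);                   // native atomic increment
      if (count === 1) await redis.expire(key, windowSecs);  // Redis purges old windows
      return count <= limit;
    }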
You can do rate limiting with Memcached - it has a nice counter increment command, as mentioned above, plus obsolete data is automatically purged from the cache in due course, so it has all the benefits of Redis for this application without the annoying duplication of capability (and complexity) that running Redis alongside CouchDB would bring.
http://simonwillison.net/2009/jan/7/ratelimitcache/
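The same pattern, sketched with the memcached npm client (the add-then-incr dance mirrors the technique from the linked article; names and numbers are illustrative):

    const Memcached = require("memcached");
    const memcached = new Memcached("localhost:11211");

    // add() seeds the counter with a TTL only if the key does not exist yet;
    // incr() then bumps it atomically. Expired windows vanish on their own.
    function isAllowed(ip, limit, windowSecs, callback) {
      const windowId = Math.floor(Date.now() / 1000 / windowSecs);
      const key = `ratelimit:${ip}:${windowId}`;
      memcached.add(key, 0, windowSecs, () => {
        memcached.incr(key, 1, (err, count) => {
          callback(err, !err && count !== false && count <= limit);
        });
      });
    }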
You could add Memcached to your own setup easily enough, or you could investigate Couchbase, whose current server product integrates a CouchDB-derived database with Memcached compatibility baked in:
http://www.couchbase.com/memcached
Personally I dislike the way Couchbase forked from CouchDB, but for your application it might be a perfect fit.

C#, Linq2Sql, TransactionScope, "Transaction has aborted.", msdtc, sqlserver 2005

We're having issues with TransactionScope in a .NET 4 project.
We have segmented our DALs into domains; that is, we have different Linq2Sql DataContexts pointing to the same database.
The issue arises when, within the same TransactionScope, we insert/update on more than one DataContext: an MSDTC transaction instantly pops up, both locally and on the server, and then just hangs there for 1-2 minutes (presumably until it times out). The code then continues to run, until t.Complete() and the subsequent implied .Dispose() yield an exception: "Transaction has aborted."
We have configured MSDTC both locally and on the server to allow everything, with no authentication and full trace levels, but still no relevant information shows up in dtctrace.log.
I guess it is standard procedure for MSDTC to kick in when more than one database connection is initiated (even if it is against the same database), but why the timeout? The operations do not conflict; there is no possible way for a deadlock to occur in our domain.
I have googled and tested extensively; hoping for some seasoned experience here :)
With SQL 2005, any transaction spanning multiple connections will be escalated to DTC. With SQL 2008, several connections with the same connection string can participate in the same transaction without the need for DTC. With the architecture you've chosen, I'd strongly suggest upgrading to SQL 2008 if that is an option. DTC can be a pain to get working correctly.
