We have one primary and two secondary MongoDB nodes. During autoscaling the primary goes down and one of the healthy secondaries is elected as the new primary. During the election no write operations are allowed; MongoDB simply rejects write queries with the error below.
"errmsg": "not master",
"code": 10107,
"codeName": "NotWritablePrimary",
We also hit the error not master and slaveOk=false, but after setting readPreference=primaryPreferred in the connection string, reads are allowed while the primary is down.
The autoscaling is triggered by the load at peak hours, which is expected, so the cluster automatically scales up to a larger tier. After exploring, we found the following:
Thread 1
In the event of a failure for a primary node, a new primary needs to
be elected. During this period when the election is held, write
operations will fail as there is currently no primary to service them.
Thread 2
It is not possible to write to a secondary in the MongoDB replica set.
Question: Has anyone faced this before? If it's known behavior, how can we allow write operations during this time? It causes the application to return 500 errors. Any suggestions will be appreciated, thanks in advance!
Note: We are using MongoDB Atlas with a replica set. We used Atlas's Test Failover feature to simulate the autoscaling event and then ran a small load test performing reads and writes on the DB.
We are using the following
NodeJs: v12
NPM: v6
NestJS: v6
@nestjs/mongoose: v6.4
Connection string options are:
retryWrites=true&w=majority&readPreference=primaryPreferred
It is expected behaviour for the replica set to be read-only during the election (typically not exceeding 12 seconds). You may enable retryable writes to allow the driver to make one more attempt, with server selection waiting up to serverSelectionTimeoutMS milliseconds (default 30000 ms) for a new primary.
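For example, a minimal sketch of a Mongoose connection that keeps retryable writes enabled and bounds how long the driver waits for a new primary (the host name, database name, and timeout value below are placeholders, not taken from the question):

const mongoose = require('mongoose');

// retryWrites=true lets the driver retry a write once after a transient
// "not master" / NotWritablePrimary error; serverSelectionTimeoutMS bounds
// how long the driver waits for a new primary before failing the operation.
const uri = 'mongodb+srv://cluster0.example.mongodb.net/mydb'
  + '?retryWrites=true&w=majority&readPreference=primaryPreferred';

mongoose
  .connect(uri, {
    useNewUrlParser: true,
    useUnifiedTopology: true,
    serverSelectionTimeoutMS: 30000, // default; raise it if elections take longer
  })
  .catch((err) => console.error('Initial connection failed:', err));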
Related
If I try to register for the event 'reconnect' in a MongoDB replicaset:
db.on('reconnect', () => console.log('Reconnected'));
I receive a deprecation warning as:
DeprecationWarning: The `reconnect` event is no longer supported by the unified topology
How can I handle the case of a lost MongoDB connection (all servers in the replica set down) and still be notified of server availability (when at least one server becomes available again)?
Suppose this is handled in a Node app using the native MongoDB driver.
Thanks in advance.
If we take a look at the spec regarding the unified topology, we can find the following section:
The unified topology is the first step in a paradigm shift away from a
concept of “connecting” to a MongoDB deployment using a connect
method. Consider for a moment what it means to be connected to a
replica set: do we trigger this state when connected to a primary? A
primary and one secondary? When connected to all known nodes? It’s
unclear whether its possible to answer this without introducing
something like a ReadPreference parameter to the connect method. At
this point “connecting” is just one half of “operation execution” -
you pass a ReadPreference in, and await a selectable server for the
operation, now we’re connected!
There are, however, new events you can listen to that might be useful for your use case; see the driver's SDAM monitoring documentation for more information.
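As a hedged sketch of how those events can be used (the event names are the SDAM monitoring events exposed by the Node driver's unified topology; the hosts and replica set name are placeholders):

const { MongoClient } = require('mongodb');

const client = new MongoClient(
  'mongodb://host1:27017,host2:27017/?replicaSet=rs0',
  { useUnifiedTopology: true }
);

// Fired whenever the topology description changes, e.g. when every member
// becomes unreachable or a primary is discovered again.
client.on('topologyDescriptionChanged', (event) => {
  // e.g. 'ReplicaSetWithPrimary', 'ReplicaSetNoPrimary', 'Unknown'
  console.log('Topology is now:', event.newDescription.type);
});

// Per-server heartbeats let you notice when at least one member is
// reachable again after a full outage.
client.on('serverHeartbeatSucceeded', (event) => {
  console.log('Heartbeat OK from', event.connectionId);
});
client.on('serverHeartbeatFailed', (event) => {
  console.log('Heartbeat failed for', event.connectionId, event.failure);
});

client.connect().catch((err) => console.error('connect failed:', err));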
Tech stack used: Node.js, Mongoose, MongoDB
I'm working on a product that handles many DB requests. At the beginning of every month the DB requests are high due to heavy read/write traffic (bulk data processing). The collections targeted by these read/write requests hold a large number of records. Reads are high, but writes are not that high.
So the CPU utilization on the instance running MongoDB reaches the danger zone (above 90%) during these times. The only thing that gets me through them is HOPE (yes, hoping the instance will not crash).
Rather than scaling vertically, I'm looking for ways to scale horizontally (not a revolutionary thought). I looked at replica sets and sharding; this question is only about replica sets.
I went through the documentation and I feel that my understanding of replica sets may not match how they actually work.
I have configured my replica set with the configuration below. I simply want to add one more instance because, as per my current understanding, adding another instance would let my database handle more read requests by distributing the load, which could reduce CPU utilization on the primary node by at least 30%. Is this understanding correct or wrong? Please share your thoughts.
var configuration = {
  _id: "testReplicaDB",
  members: [
    { _id: 0, host: "localhost:12017" },
    { _id: 1, host: "localhost:12018", arbiterOnly: true, buildIndexes: false },
    { _id: 2, host: "localhost:12019" }
  ]
}
When I brought up the replica set with the above config and ran my Node.js/Mongoose code, I ran into this issue. The proposed resolution is to change the above config into:
var configuration = {
  _id: "testReplicaDB",
  members: [
    { _id: 0, host: "validdomain.com:12017" },
    { _id: 1, host: "validdomain.com:12018", arbiterOnly: true, buildIndexes: false },
    { _id: 2, host: "validdomain.com:12019" }
  ]
}
Question 1 (related to the code in the Node.js project, which uses the Mongoose library for DB handling and connects to the replica set)
const URI = `mongodb://167.99.21.9:12017,167.99.21.9:12019/${DB}`;
I have to specify the URIs of both of my MongoDB instances in the Mongoose connection string.
When I look at my Node.js/Mongoose code that connects to the replica set, I have many doubts about how it handles the multiple nodes.
How does Mongoose know which IP is the primary node?
Let's assume 167.99.21.9:12019 is the primary node and rs.slaveOk(false) is set on the secondary replica, so the secondary node cannot serve read requests.
In this situation, does Mongoose send the request to the first URI (167.99.21.9:12017) and that instance redirects it to the primary node, or does the request come back to Mongoose, which then sends another request to 167.99.21.9:12019?
Question 2
The linked doc mentions that data redundancy makes it possible to handle high read traffic. Let's assume reads are enabled on the secondary node, and
let's assume the case where Mongoose sends a request to the primary node while the primary is being bombarded with read/write requests but the secondary node is free (doing nothing). Will MongoDB automatically redirect the request to the secondary node, or will the request fail and come back to Mongoose, so that the burden is on Mongoose to send another request to the next available node?
Can Mongoose automatically know which node in the replica set is free?
Question 3
Assuming both the 167.99.21.9:12017 and 167.99.21.9:12019 instances are available for read requests with ReadPreference.SecondaryPreferred or ReadPreference.nearest, will the load get distributed when the secondary node is bombarded with read requests while the primary node is at around 20% utilization? Is this the case, or is my understanding wrong? Can the replica set act as a load balancer? If not, how can I make it balance the load?
Question 4
var configuration = {
  _id: "testReplicaDB",
  members: [
    { _id: 0, host: "validdomain.com:12017" },
    { _id: 1, host: "validdomain.com:12018", arbiterOnly: true, buildIndexes: false },
    { _id: 2, host: "validdomain.com:12019" }
  ]
}
You can see the DNS name in the configuration. Does this mean that when the primary node redirects a request to the secondary node, DNS resolution happens first, and the request is then redirected to the secondary using the IP it resolves to? Is my understanding correct or wrong? (If it is correct, that will fire up another set of questions.)
:|
I could've missed many details while reading the docs. This is my last hope of getting answers, so please share if you know the answers to any of these.
if this is the case, then how does mongoose know which ip is the primaryReplicaset?
There is no "primary replica set", there can be however a primary in a replica set.
Each MongoDB driver queries all of the hosts specified in the connection string to discover the members of the replica set (in case one or more of the hosts is unavailable for whatever reason). When any member of the replica set responds, it does so with the full list of current members of the replica set. The driver then knows what the replica set members are, and which of them is currently primary (if any).
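As a rough illustration (assuming a reasonably recent driver and server; on older servers the command is isMaster rather than hello, and the seed host below is the one from the question), you can see this discovery result yourself:

const { MongoClient } = require('mongodb');

async function showTopology() {
  // Only one seed host needs to be reachable; the driver discovers the rest.
  const client = new MongoClient('mongodb://167.99.21.9:12017/?replicaSet=testReplicaDB');
  await client.connect();

  // The hello (formerly isMaster) reply lists all members and the current primary.
  const reply = await client.db('admin').command({ hello: 1 });
  console.log('members:', reply.hosts);   // all data-bearing members
  console.log('primary:', reply.primary); // host:port of the current primary

  await client.close();
}

showTopology().catch(console.error);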
secondaryReplica cannot serve readRequests
This is not at all true. Any data-bearing node can fulfill read requests, IF the application provides a suitable read preference.
In this situation, does mongoose trigger to the first uri(167.99.21.9:12017) and this instance would redirect to the primaryReplicaset or will the request comeback to mongoose and then mongoose will trigger another request to the 167.99.21.9:12019 ?
mongoose does not directly talk to the database. It uses the driver (node driver for MongoDB) to do so. The driver has connections to all replica set members, and sends the requests to the appropriate node.
For example, if you specified a primary read preference, the driver would send that query to the primary if one exists. If you specified a secondary read preference, the driver would send that query to a secondary if one exists.
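A minimal sketch of setting such a read preference when connecting with Mongoose (the database name is a placeholder; the hosts are the ones from the question):

const mongoose = require('mongoose');

// secondaryPreferred routes reads to a secondary when one is available,
// falling back to the primary otherwise. Writes always go to the primary.
const uri = 'mongodb://167.99.21.9:12017,167.99.21.9:12019/mydb'
  + '?replicaSet=testReplicaDB&readPreference=secondaryPreferred';

mongoose.connect(uri, { useNewUrlParser: true, useUnifiedTopology: true });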
i'm assuming that when both 167.99.21.9:12017 & 167.99.21.9:12019 instances are available for read requests with ReadPreference.SecondaryPreferred or ReadPreference.nearest
Correct, any node can fulfill those.
the load could get distributed across
Yes and no. In general replicas may have stale data. If you require current data, you must read from the primary. If you do not require current data, you may read from secondaries.
how to make it balance the load?
You can make your application balance the load by using secondary or nearest reads, assuming it is OK for your application to receive stale data.
if mongoose triggers a request to primaryReplica and primaryReplica is bombarded with read/write requests and secondaryReplica is free(doing nothing) , then will mongodb automatically redirect the request to secondaryReplica?
No, a primary read will not be changed to a secondary read.
Especially in the scenario you are describing, the secondary is likely to be stale, thus a secondary read is likely to produce wrong results.
can mongoose automatically know which replica is free?
Mongoose does not track deployment state; the driver is responsible for this. There is limited support in drivers for choosing a "less loaded" node, although this is measured by network latency rather than CPU/memory/disk load, and it only applies to the nearest read preference.
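As a short, hedged illustration of that latency-based behaviour (localThresholdMS is shown at its documented default of 15 ms; the hosts are the ones from the question):

// With readPreference=nearest the driver picks among members whose measured
// round-trip time is within localThresholdMS of the fastest member, so the
// "balancing" is purely latency-based, not load-based.
const uri = 'mongodb://167.99.21.9:12017,167.99.21.9:12019/mydb'
  + '?replicaSet=testReplicaDB&readPreference=nearest&localThresholdMS=15';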
I have a serious problem in production causing the application to become unresponsive and output the following error:
Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
A running hypothesis is some operations are holding onto long-running Knex transactions. Enough of them to reach the pool size, basically.
Is there a way to query the KnexJS API for how many pool connections are in use at any one time? Unfortunately, since KnexJS holds connections up to the max pool setting from the config, it can be hard to know how many are actually in use. From the Postgres end, it looks like KnexJS keeps all of its connections idle when they are not in use.
Is there a good way to instrument Knex's transaction and transacting with some kind of middleware or hook? Another useful thing would be to log the call stack of any transaction (or any running longer than, say, 7 seconds). One challenge is that I have calls to transaction and transacting throughout my project. Maybe it's a long shot.
Any advice is greatly appreciated.
System Information
KnexJS version: 0.12.6 (we will update in the next month)
Database + version: Postgres 9.6
OS: Heroku Linux (Ubuntu?)
The easiest way to see what's happening at the connection pool level is to run Knex with the DEBUG=knex:* environment variable set, which prints quite a lot of debug info about what's happening inside Knex. Those logs show, for example, when connections are fetched from the pool and returned to it, and every query that is run.
There are a couple of global events that you can use to hook into every query, but there isn't one for hooking into transactions. Here is a related question where I have written some example code on how to measure transaction durations with query hooks: Tracking DB querying time - Bookshelf/knex. It probably leaks some memory, so it's not a production-ready solution, but it might be helpful for your debugging purposes.
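A rough sketch of that idea using Knex's global query events (query, query-response, query-error; available on reasonably recent Knex versions). The 7-second threshold is just the one mentioned in the question, and this is debugging code, not a production-ready solution:

// `knex` is your existing instance, e.g.
// const knex = require('knex')({ client: 'pg', connection: process.env.DATABASE_URL });

const pending = new Map();

// 'query' fires when a query is sent; 'query-response' / 'query-error'
// fire when it completes. __knexQueryUid identifies a single query.
knex.on('query', (query) => {
  pending.set(query.__knexQueryUid, { sql: query.sql, startedAt: Date.now() });
});

function logIfSlow(query) {
  const entry = pending.get(query.__knexQueryUid);
  if (!entry) return;
  pending.delete(query.__knexQueryUid);
  const ms = Date.now() - entry.startedAt;
  if (ms > 7000) {
    console.warn(`Slow query (${ms} ms): ${entry.sql}`);
  }
}

knex.on('query-response', (response, query) => logIfSlow(query));
knex.on('query-error', (error, query) => logIfSlow(query));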
I have a Couchbase cluster setup (Couchbase version 4.1) with N data nodes, 1 query node, and 1 index node. The data nodes hold roughly 1 million key-value pairs in a single bucket. The whole setup is hosted in Microsoft Azure within a virtual network, and I can assure you that each node has enough resources; RAM, CPU, and disk are not an issue.
I can GET/SET JSON documents on my Couchbase server without any issue. I am just testing, so ports are not a problem; I have opened all ports between the machines for now.
But when I try to run N1QL queries (from the Couchbase shell or using the Python SDK), they do not work. The query just hangs and I don't get any reply from the server. On the other hand, once in a while the query works without any issue and then, after a minute, it stops working again.
I have created a PRIMARY index on my bucket, plus any other Global Secondary Indexes that were needed.
I also installed the sample buckets provided by Couchbase. The same problems exist there.
Does anyone have a clue what the issue could be?
Your query probably hangs because you are straining the server too much. I don't know how many N1QL ops you push each second, but for that type of query you will benefit most from a couple of tweaks, which lower CPU usage and increase efficiency.
Create a specific covering index such as:
create index inx_id_email on clients(id,email) where transaction_successful=false
Use the EXPLAIN keyword to check whether your query is using the index:
(explain SELECT id, email FROM clients where transaction_successful = false LIMIT 100 OFFSET 200)
I believe your query/index nodes are over-utilized because you are effectively doing a primary index scan, the equivalent of a full table scan in a relational database.
I'm using Node.js+Express+Mongoose to connect to my MongoDB replica set (3x instances).
I was under the impression that when I used Mongoose's "connectSet" command, thereby connecting to the replica set, my queries would be load-balanced between my replica set.
However, using nodetime, I can see that all queries (including find() queries) are going to the PRIMARY instance in the replica set.
Am I misunderstanding something here? Is there some practice I am missing, or a setting in the replica set? I thought the purpose of a replica set was to balance read-only queries with the SECONDARY MongoDB servers in the set...
Thanks.
I was under the impression that when I used Mongoose's "connectSet" command, thereby connecting to the replica set, my queries would be load-balanced between my replica set.
This impression is incorrect.
By default, MongoDB reads & writes are sent to the Primary member of a Replica Set. The primary purpose of a Replica Set is to provide high availability (HA). When the primary node goes down, the driver will throw an exception on existing connections and then auto-reconnect to whatever node is elected the new primary.
The idea here being that the driver will find the new primary with no intervention and no configuration changes.
Is there some practice I am missing, or a setting in the replica set?
If you really want to send queries to a secondary, you can configure a flag on the query that states "this query can be sent to a secondary". The implementation will vary; here's a version for Mongoose.
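A minimal sketch of what that flag looks like in Mongoose (the model, field names, and schema are made up for illustration):

const mongoose = require('mongoose');

// Hypothetical model, for illustration only.
const User = mongoose.model('User', new mongoose.Schema({ name: String }));

// Per-query: allow this read to be served by a secondary if one is available.
User.find({ name: 'alice' })
  .read('secondaryPreferred')
  .exec()
  .then((docs) => console.log(docs));

// Per-schema: every query on this model defaults to secondary reads.
const logSchema = new mongoose.Schema(
  { message: String, createdAt: Date },
  { read: 'secondaryPreferred' }
);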
Please note that sending queries to secondary nodes is not the default behaviour, and there are many pitfalls here. Most MongoDB deployments are limited by the single write lock rather than by read throughput, so load-balancing the reads is usually not necessary. Spreading the reads is not guaranteed to increase performance and it can easily result in dirty reads.
Before undertaking such a load balancing, please be sure that you absolutely need it. Sharding may be a better option.