How to check actual connection pool in use in Mongoose? - node.js

I have NodeJS API using mongoose to talk to Mongo DB. Initially, I had the default connection pool size of 5 and later increased it to 20 during performance testing. However, it's bothering me if there is a way to log the actual number of DB instances being used. It could be 10 or 15. But there is no way to figure it out. Has anyone tried to solve this?
I also came across an article that said that NodeJS uses threads internally (max 4) that would restrict the actual DB threads. Will update this question with the link later.

Related

TypeORM limit number o created connections with createConnection(...)

I need help to limit the allowed number of connections typeORM can hold in his connectionManager.
Today I have many database, more than 12 thousands that are distributed on some servers, and each request in my application can connect to a different database because each database is related to the user, so for each user requesting something from my API my service runs the createConnection(userParams) but I don't know how to control this connection.
I tried limiting inside the userParams something like
createConnection(...userParams, {extra: connectionLimit: 5})
but it seems this only limit the inner Pool that is created each time. I need a way so I can limit the total number of connections the connectionManager can have.
Basically I want a global pool instead of one for each connection created. Can someone please give me any hints?
It looks like what I wanted to achieve was not possible before typeorm version 0.3.6. On current versions the connectionManager does not exists, thus I'm able to control connections by myself

how does mongodb replica set work with nodejs-mongoose?

Techstack used nodejs,mongoose,mongodb
i'm working on product that handles many DBrequests. During beginning of every month the db requests are high due to high read/write requests (bulk data processing). The number of records in each collection's targeted for serving these read/write requests are quite high. Read is high but write is not that high.
So the cpu utilization on the instance in which mongodb is running reaches the dangerzone(above 90%) during these times. The only thing that gets me through these times is HOPE (yes, hoping that instance will not crash).
Rather than scaling vertically, i'm looking for solutions to scale horizontally (not a revolutionary thought). i looked at replicaset and sharding. This question is only related to replicaSet.
i went through documents and i feel like the understanding i have on replicaset is not really the way it might work.
i have configured my replicaset with below configuration. i simply want to add one more instance because as per the understanding i have right now, if i add one more instance then my database can handle more read requests by distributing the load which could minimize the cpuUtilization by atleast 30% on primaryNode. is this understanding correct or wrong? Please share your thoughts
var configuration = {
_id : "testReplicaDB",
members:[
{_id:0,host:"localhost:12017"},
{_id:1,host:"localhost:12018",arbiterOnly:true,buildIndexes:false},
{_id:2,host:"localhost:12019"}
]
}
When i broughtup the replicaset with above config and ran my nodejs-mongoose code, i ran into this issue . Resolution they are proposing is to change the above config into
var configuration = {
_id : "testReplicaDB",
members:[
{_id:0,host:"validdomain.com:12017"},
{_id:1,host:"validdomain.com:12018",arbiterOnly:true,buildIndexes:false},
{_id:2,host:"validdomain.com:12019"}
]
}
Question 1 (related to the coding written in nodejsproject with mongoose library(for handling db) which connects to the replicaSet)
const URI = mongodb://167.99.21.9:12017,167.99.21.9:12019/${DB};
i have to specify both uri's of my mongodb instances in mongoose connection URI String.
When i look at my nodejs-mongoose code that will connect to the replicaSet, i have many doubts on how it might handle the multipleNode.
How does mongoose know which ip is the primaryNode?
Lets assume 167.99.21.9:12019 is primaryNode and rs.slaveOk(false) on secondaryReplica, so secondaryNode cannot serve readRequests.
In this situation, does mongoose trigger to the first uri(167.99.21.9:12017) and this instance would redirect to the primaryNode or will the request comeback to mongoose and then mongoose will trigger another request to the 167.99.21.9:12019 ?
Question 2
This docLink mention's that data redundancy enables to handle high read requests. Lets assume, read is enabled for secondaryNode, and
Lets assume the case when mongoose triggers a request to primaryNode and primaryNode was getting bombarded at that time with read/write requests but secondaryNode is free(doing nothing) , then will mongodb automatically redirect the request to secondaryNode or will this request fail and redirect back to mongoose, so that the burden will be on mongoose to trigger another request to the next available Node?
can mongoose automatically know which Node in the replicaSet is free?
Question 3
Assuming both 167.99.21.9:12017 & 167.99.21.9:12019 instances are available for read requests with ReadPreference.SecondaryPreferred or ReadPreference.nearest, will the load get distributed when secondaryNode gets bombarded with readRequests and primaryNode is like 20% utilization? is this the case? or is my understanding wrong? Can the replicaSet act as a loadbalancer? if not, how to make it balance the load?
Question 4
var configuration = {
_id : "testReplicaDB",
members:[
{_id:0,host:"validdomain.com:12017"},
{_id:1,host:"validdomain.com:12018",arbiterOnly:true,buildIndexes:false},
{_id:2,host:"validdomain.com:12019"}
]
}
You can see the DNS name in the configuration, does this mean that when primaryNode redirects a request to secondaryNode, DNS resolution will happen and then using that IP which corresponds to secondaryNode, the request will be redirected to secondaryNode? is my understanding correct or wrong? (if my understanding is correct, this is going to fireup another set of questions)
:|
i could've missed many details while reading the docs. This is my last hope of getting answers. So please share if you know the answers to any of these.
if this is the case, then how does mongoose know which ip is the primaryReplicaset?
There is no "primary replica set", there can be however a primary in a replica set.
Each MongoDB driver queries all of the hosts specified in the connection string to discover the members of the replica set (in case one or more of the hosts is unavailable for whatever reason). When any member of the replica set responds, it does so with the full list of current members of the replica set. The driver then knows what the replica set members are, and which of them is currently primary (if any).
secondaryReplica cannot serve readRequests
This is not at all true. Any data-bearing node can fulfill read requests, IF the application provided a suitable read preference.
In this situation, does mongoose trigger to the first uri(167.99.21.9:12017) and this instance would redirect to the primaryReplicaset or will the request comeback to mongoose and then mongoose will trigger another request to the 167.99.21.9:12019 ?
mongoose does not directly talk to the database. It uses the driver (node driver for MongoDB) to do so. The driver has connections to all replica set members, and sends the requests to the appropriate node.
For example, if you specified a primary read preference, the driver would send that query to the primary if one exists. If you specified a secondary read preference, the driver would send that query to a secondary if one exists.
i'm assuming that when both 167.99.21.9:12017 & 167.99.21.9:12019 instances are available for read requests with ReadPreference.SecondaryPreferred or ReadPreference.nearest
Correct, any node can fulfill those.
the load could get distributed across
Yes and no. In general replicas may have stale data. If you require current data, you must read from the primary. If you do not require current data, you may read from secondaries.
how to make it balance the load?
You can make your application balance the load by using secondary or nearest reads, assuming it is OK for your application to receive stale data.
if mongoose triggers a request to primaryReplica and primaryReplica is bombarded with read/write requests and secondaryReplica is free(doing nothing) , then will mongodb automatically redirect the request to secondaryReplica?
No, a primary read will not be changed to a secondary read.
Especially in the scenario you are describing, the secondary is likely to be stale, thus a secondary read is likely to produce wrong results.
can mongoose automatically know which replica is free?
mongoose does not track deployment state, the driver is responsible for this. There is limited support in drivers for choosing a "less loaded" node, although this is measured based on network latency and not CPU/memory/disk load and only applies to the nearest read preference.

How can I instrument and log my KnexJS transactions?

I have a serious problem in production causing the application to become unresponsive and output the following error:
Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
A running hypothesis is some operations are holding onto long-running Knex transactions. Enough of them to reach the pool size, basically.
Is there a way to query the KnexJS API for how many pool connections are in use at any one time? Unfortunately since KnexJS occupies the max pool settings from the config, it can be hard to know how many are actually in use. From the postgres end, it seems like KnexJS is idling on all of its connections when they are not in use.
Is there a good way to instrument Knex transaction and transacting with some kind of middleware or hook? Another useful thing is to log the callstack of any transaction (or any longer than, say, 7 seconds). One challenge is I have calls to Knex transaction and transacting throughout my project. Maybe it's a long shot.
Any advice is greatly appreciated.
System Information
KnexJS version: 0.12.6 (we will update in the next month)
Database + version: Postgres 9.6
OS: Heroku Linux (Ubuntu?)
Easiest was to see whats happening on connection pool level is to run knex with DEBUG=knex:* environment variable set, which will print quite a lot debug info whats happening inside knex. Those logs shows for example when connections are fetched from pool and returned to there and every ran query too.
There are couple of global events that you can use to hookup to every query, but there is not any for hooking to transactions. Here is related question where I have written some example code how to actually measure transaction durations with query hooks though: Tracking DB querying time - Bookshelf/knex It probably leaks some memory, so its not very production ready solution, but for your debugging purposes it might be helpful.

Connection pool using pg-promise

I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
I also read that "more than 100 clients at a time is a very bad thing" (node-postgres).
I'm using pg-promise and wanted to know:
what is the recommended poolSize for a very big load of data.
what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)?
Does Postgres handles the order and makes the 101 request wait until it can run it?
I'm the author of pg-promise.
I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
There are several levels of optimization for database communications. The most important of them is to minimize the number of queries per HTTP request, because IO is expensive, so is the connection pool.
If you have to execute more than one query per HTTP request, always use tasks, via method task.
If your task requires a transaction, execute it as a transaction, via method tx.
If you need to do multiple inserts or updates, always use multi-row operations. See Multi-row insert with pg-promise and PostgreSQL multi-row updates in Node.js.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
node-postgres started using pg-pool from version 6.x, while pg-promise remains on version 5.x which uses the internal connection pool implementation. Here's the reason why.
I also read that "more than 100 clients at a time is a very bad thing"
My long practice in this area suggests: If you cannot fit your service into a pool of 20 connections, you will not be saved by going for more connections, you will need to fix your implementation instead. Also, by going over 20 you start putting additional strain on the CPU, and that translates into further slow-down.
what is the recommended poolSize for a very big load of data.
The size of the data got nothing to do with the size of the pool. You typically use just one connection for a single download or upload, no matter how large. Unless your implementation is wrong and you end up using more than one connection, then you need to fix it, if you want your app to be scalable.
what happens if poolSize = 100 and the application gets 101 request simultaneously
It will wait for the next available connection.
See also:
Chaining Queries
Performance Boost
what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)? Does Postgres handles the order and makes the 101 request wait until it can run it?
Right, the request will be queued. But it's not handled by Postgres itself, but by your app (pg-pool). So whenever you run out of free connections, the app will wait for a connection to release, and then the next pending request will be performed. That's what pools are for.
what is the recommended poolSize for a very big load of data.
It really depends on many factors, and no one will really tell you the exact number. Why not test your app under huge load and see in practise how it performs, and find the bottlenecks.
Also I find the node-postgres documentation quite confusing and misleading on the matter:
Once you get >100 simultaneous requests your web server will attempt to open 100 connections to the PostgreSQL backend and 💥 you'll run out of memory on the PostgreSQL server, your database will become unresponsive, your app will seem to hang, and everything will break. Boooo!
https://github.com/brianc/node-postgres
It's not quite true. If you reach the connection limit at Postgres side, you simply won't be able to establish a new connection until any previous connection is closed. Nothing will break, if you handle this situation in your node app.

MongoJS increase pool size

I am building a simple application using Node.js and MongoDB through the MongoJS driver. I am aiming at some heavy load (round 10000 users in 10 mins with sessions of approx 30s)
I am using connection pooling, and by default mongojs creates 5 connections which are expected to be shared. I would like to enlarge this value to increase efficiency, otherwise a bunch of requests will be waiting for a connection which is highly undesirable.
I have found this which explains how to increase the pool size if using the native driver
http://technosophos.com/content/nodejs-connection-pools-and-mongodb
I know that mongojs is a wrapper for the native driver, but is there a way to set the connection pool size in mongojs? or is it something you have to do from the driver.. but if so how can you use the driver and mongojs at the same time? Can anyone point me in the right direction?
Thanks so much.
Pool size options can be specified as part of the connection string. The options are listed in the MongoDB documentation. Connection string parameters are mentioned in the MongoJS documentation but it's not particularly obvious that one would find the answer there.
var db = mongojs('mongodb://localhost/mydb?maxPoolSize=200');
A minPoolSize can also be specified, however this option is not supported by all drivers. I am pretty sure that the native js driver does not support this option. From memory the native js driver keeps a minimum pool of 5 connections.
The default maxPoolSize value is 100.

Resources