Configuring the ideal maxPoolSize on MongoDB Native driver

Configuring the ideal maxPoolSize on MongoDB Native driver - node.js

The default value for maxPoolSize on the MongoClient is 100.
How would one know if the default value, 100, is the optimal value? Is there a formula, a way of calculating to determine the ideal maxPoolSize?
Thanks a lot.

The pool size is solely dependent on the driver you are using, for Node.js minimum is 5 which is default and the maximum is 100(default).
Pool size helps to make the concurrent requests to the DB and it depends upon how much concurrent connection(queries you can run on) you need to create from the application. You can increase it based on your usage- that you have to need more than 100 connections at a time with DB or you need some connections for long-running tasks etc.
And these connections are of course based on the server that you are running the DB on. Mongo atlas does provide certain clusters that can have up to 1500 connections at a time.
You should check this post to see how does it impact application performance.

Related

How many session will create using single pool?

I am using Knex version 0.21.15 npm. my pooling parameter is pool {min: 3 , max:300}.
Oracle is my data base server.
pool Is this pool count or session count?
If it is pool, how many sessions can create using a single pool?
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
And when the created session will cleared from oracle session?
Is there any parameter available to remove the idle session from oracle.?
suggest me please if any.

WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool Is this pool count or session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is pool, how many sessions can create using a single pool?
Up to the pool.max value.
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
It depends on concurrency. If your tenth query before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when the created session will cleared from oracle session?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove the idle session from oracle.?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.

Node.js Oracle Database Connection Pool Size

I implemented a connection pool (poolMin=poolMax=10) with node-oracledb and i saw a difference of up to 100 times especially in case of few users like 10. Really impressive. I also increased UV_THREADPOOL_SIZE like 4 + poolMax. At this point I could not understand somethings.
process.env.UV_THREADPOOL_SIZE = 4 + config.pool.poolMax // Default + Max
NodeJs works as Single Thread (with additional 4 threads those none of them are used for network I/O). So when i use a pool with 10 connections, can Single Thread use all of these connections? Or isn't it Single Thread with these settings anymore? Because i added 10 more to UV_THREADPOOL_SIZE. I would be grateful to anyone who explained this matter.
Btw, I wonder if using a fixed number pool like 10 would cause a problem in case of too many users? For example, if the number of instant users is 500, we can reach 5000 instant users on certain days of the year. Do I need to make a special setting (e.g. pool size 100) for those days or will the default be enough?
Thanks in advace.

When you do something like connection.execute(), that work will handled by a Node.js worker thread until the call completes. And each underlaying Oracle connection can only ever do one 'thing' (like execute, or fetch LOB data) at a time - this is a fundamental (i.e. insurmountable) behavior of Oracle connections.
For node-oracledb you want the number of worker threads to be at least as big as the number of connections in the connection pool, plus some extra for non database work. This allows connections to do their thing without blocking any other connection.
Any use of Promise.all() (and similar constructs) using a single connection should be assessed and considered for rewriting as a simple loop. Prior to node-oracledb 5.2, each of the 'parallel' operations for Promise.all() on a single connection will use a thread but this will be blocked waiting for prior work on the connection to complete, so you might need even more threads available. From 5.2 onwards any 'parallel' operations on a single connection will be queued in the JavaScript layer of node-oracledb and will be executed sequentially, so you will only need a worker thread per connection at most. In either version, using Promise.all() where each unit of work has its own connection is different, and only subject to the one-connection per thread requirements.
Check the node-oracledb documentation Connections, Threads, and Parallelism and Connection Pool Sizing.
Separate to how connections are used, first you have to get a connection. Node-oracledb will queue connection pool requests (e.g. pool.getConnection()) if every connection in the pool is already in use. This provides some resiliency under connection spikes. There are some limits to help real storms: queueMax and queueTimeout. Yes, at peak periods you might need to increase the poolMax value. You can check the pool statistics to see pool behavior. You don't want to make the pool too big - see the doc.
Side note: process.env.UV_THREADPOOL_SIZE doesn't have an effect in Node.js on Windows; the UV_THREADPOOL_SIZE variable must be set before Node.js is started.

Connection pool using pg-promise

I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
I also read that "more than 100 clients at a time is a very bad thing" (node-postgres).
I'm using pg-promise and wanted to know:
what is the recommended poolSize for a very big load of data.
what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)?
Does Postgres handles the order and makes the 101 request wait until it can run it?

I'm the author of pg-promise.
I'm using Node js and Postgresql and trying to be most efficient in the connections implementation.
There are several levels of optimization for database communications. The most important of them is to minimize the number of queries per HTTP request, because IO is expensive, so is the connection pool.
If you have to execute more than one query per HTTP request, always use tasks, via method task.
If your task requires a transaction, execute it as a transaction, via method tx.
If you need to do multiple inserts or updates, always use multi-row operations. See Multi-row insert with pg-promise and PostgreSQL multi-row updates in Node.js.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
node-postgres started using pg-pool from version 6.x, while pg-promise remains on version 5.x which uses the internal connection pool implementation. Here's the reason why.
I also read that "more than 100 clients at a time is a very bad thing"
My long practice in this area suggests: If you cannot fit your service into a pool of 20 connections, you will not be saved by going for more connections, you will need to fix your implementation instead. Also, by going over 20 you start putting additional strain on the CPU, and that translates into further slow-down.
what is the recommended poolSize for a very big load of data.
The size of the data got nothing to do with the size of the pool. You typically use just one connection for a single download or upload, no matter how large. Unless your implementation is wrong and you end up using more than one connection, then you need to fix it, if you want your app to be scalable.
what happens if poolSize = 100 and the application gets 101 request simultaneously
It will wait for the next available connection.
See also:
Chaining Queries
Performance Boost

what happens if poolSize = 100 and the application gets 101 request simultaneously (or even more)? Does Postgres handles the order and makes the 101 request wait until it can run it?
Right, the request will be queued. But it's not handled by Postgres itself, but by your app (pg-pool). So whenever you run out of free connections, the app will wait for a connection to release, and then the next pending request will be performed. That's what pools are for.
what is the recommended poolSize for a very big load of data.
It really depends on many factors, and no one will really tell you the exact number. Why not test your app under huge load and see in practise how it performs, and find the bottlenecks.
Also I find the node-postgres documentation quite confusing and misleading on the matter:
Once you get >100 simultaneous requests your web server will attempt to open 100 connections to the PostgreSQL backend and 💥 you'll run out of memory on the PostgreSQL server, your database will become unresponsive, your app will seem to hang, and everything will break. Boooo!
https://github.com/brianc/node-postgres
It's not quite true. If you reach the connection limit at Postgres side, you simply won't be able to establish a new connection until any previous connection is closed. Nothing will break, if you handle this situation in your node app.

How to optimize Postgresql max_connections and node-postgres connection pool?

In brief, I am having trouble supporting more than 5000 read requests per minute from a data API leveraging Postgresql, Node.js, and node-postgres. The bottleneck appears to be in between the API and the DB. Here are the implmentation details.
I'm using an AWS Postgresql RDS database instance (m4.4xlarge - 64 GB mem, 16 vCPUs, 350 GB SSD, no provisioned IOPS) for a Node.js powered data API. By default the RDS's max_connections=5000. The node API is load-balanced across two clusters with 4 processes each (2 Ec2s with 4 vCPUs running the API with PM2 in cluster-mode). I use node-postgres to bind the API to the Postgresql RDS, and am attempting to use it's connection pooling feature. Below is a sample of my connection pool code:
var pool = new Pool({
user: settings.database.username,
password: settings.database.password,
host: settings.database.readServer,
database: settings.database.database,
max: 25,
idleTimeoutMillis: 1000
});
/* Example of pool usage */
pool.query('SELECT my_column FROM my_table', function(err, result){
/* Callback code here */
});
Using this implementation and testing with a load tester, I can support about 5000 requests over the course of one minute, with an average response time of about 190ms (which is what I expect). As soon as I fire off more than 5000 requests per minute, my response time increases to over 1200ms in the best of cases and in the worst of cases the API begins to frequently timeout. Monitoring indicates that for the EC2s running the Node.js API, CPU utilization remains below 10%. Thus my focus is on the DB and the API's binding to the DB.
I have attempted to increase (and decrease for that matter) the node-postgres "max" connections setting, but there was no change in the API response/timeout behavior. I've also tried provisioned IOPS on the RDS, but no improvement. Also, interestingly, I scaled the RDS up to m4.10xlarge (160 GB mem, 40 vCPUs), and while the RDS CPU utilization dropped greatly, the overall performance of the API worsed considerably (couldn't even support the 5000 requests per minute that I was able to with the smaller RDS).
I'm in unfamilar territory in many respects and am unsure of how to best determine which of these moving parts is bottlenecking API performance when over 5000 requests per minute. As noted I have attempted a variety of adjustments based on the review of Postgresql configuration documentation and node-postgres documentation, but to no avail.
If anyone has advice on how to diagnose or optimize I would greatly appreciate it.
UPDATE
After scaling up to m4.10xlarge, i performed a series of load-tests, varying the number of request/min and the max number of connections in each pool. Here are some screen captures of monitoring metrics:

In order to support more then 5k requests, while maintaining the same response rate, you'll need better hardware...
The simple math states that:
5000 requests*190ms avg = 950k ms divided into 16 cores ~ 60k ms per core
which basically means your system was highly loaded.
(I'm guessing you had some spare CPU as some time was lost on networking)
Now, the really interesting part in your question comes from the scale up attempt: m4.10xlarge (160 GB mem, 40 vCPUs).
The drop in CPU utilization indicates that the scale up freed DB time resources - So you need to push more requests!
2 suggestions:
Try increasing the connection pool to max: 70 and look at the network traffic (depending on the amount of data you might be hogging the network)
also, are your requests to the DB a-sync from the application side? make sure your app can actually push more requests.

The best way is to make use of a separate Pool for each API call, based on the call's priority:
const highPriority = new Pool({max: 20}); // for high-priority API calls
const lowPriority = new Pool({max: 5}); // for low-priority API calls
Then you just use the right pool for each of the API calls, for optimum service/connection availability.

Since you are interested in read performance can set up replication between two (or more) PostgreSQL instances, and then use pgpool II to load balance between the instances.
Scaling horizontally means you won't start hitting the max instance sizes at AWS if you decide next week you need to go to 10,000 concurrent reads.
You also start to get some HA in your architecture.
--
Many times people will use pgbouncer as a connection pooler even if they have one built into their application code already. pgbouncer works really well and is typically easier to configure and manage that pgpool, but it doesn't do load balancing. I'm not sure if it would help you very much in this scenario though.

MongoJS increase pool size

I am building a simple application using Node.js and MongoDB through the MongoJS driver. I am aiming at some heavy load (round 10000 users in 10 mins with sessions of approx 30s)
I am using connection pooling, and by default mongojs creates 5 connections which are expected to be shared. I would like to enlarge this value to increase efficiency, otherwise a bunch of requests will be waiting for a connection which is highly undesirable.
I have found this which explains how to increase the pool size if using the native driver
http://technosophos.com/content/nodejs-connection-pools-and-mongodb
I know that mongojs is a wrapper for the native driver, but is there a way to set the connection pool size in mongojs? or is it something you have to do from the driver.. but if so how can you use the driver and mongojs at the same time? Can anyone point me in the right direction?
Thanks so much.

Pool size options can be specified as part of the connection string. The options are listed in the MongoDB documentation. Connection string parameters are mentioned in the MongoJS documentation but it's not particularly obvious that one would find the answer there.
var db = mongojs('mongodb://localhost/mydb?maxPoolSize=200');
A minPoolSize can also be specified, however this option is not supported by all drivers. I am pretty sure that the native js driver does not support this option. From memory the native js driver keeps a minimum pool of 5 connections.
The default maxPoolSize value is 100.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string