node postgres connection performance improvement - node.js

I am writing my backend using Node.js (Express), and my database is PostgreSQL. I am using node-postgres (pg) to connect to the Postgres database.
Currently I am using pg.Pool so that I have clients connected and ready to serve requests. What I observe is that node-postgres takes more than 2 seconds to connect to the DB, and hence the response time is quite long.
The node-postgres documentation mentions that the initial connection takes only 20-30 milliseconds, but I see 2-3 seconds to establish a connection. I load-tested my app at around 1000 requests/sec, and the average response time is quite high because of the connection-establishment time. I run only a single SELECT query to build the response; the response processing time is negligible, and only connecting and fetching data from the DB takes significant time.
I have tried the various ways of releasing a client back to the pool after receiving the response. Now I'm using pool.query, which takes care of checking out a client and releasing it back to the pool once the task is done.
Is there any alternative to node-postgres that can provide better performance for DB operations?
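For reference, here is a minimal sketch of the pool.query pattern described above, with the pool created once at startup and reused for every request (the connection settings and query are placeholders, not taken from the original app):

const express = require('express')
const { Pool } = require('pg')

const app = express()

// Create the pool once, at startup; connections are opened lazily and reused.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // placeholder
  max: 10,                                    // small pool, clients are recycled
})

// pool.query checks out a client, runs the query and releases the client
// back to the pool automatically.
app.get('/users/:id', async (req, res) => {
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [req.params.id])
  res.json(rows[0])
})

app.listen(3000)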

Related

How does Postgres handle more requests than connections

While going through the Postgres architecture, one of the things mentioned was that Postgres has a connection limit of 500 (which can be modified). And to fetch any data from Postgres, we first need to make a connection to it. So what happens if 10k simultaneous requests hit the DB? How do the requests map to the connection limit, since we have a limit of 500? Do we need to increase the limit, create more instances of Postgres, or is concurrency in play?
If there are 10000 concurrent statements running on a single database, any hardware will be overloaded. You just cannot do that.
Even 500 concurrent requests is way too many, so that value is too high for max_connections (or, more precisely, for the number of concurrently active sessions).
The good thing is that you don't have to do that. You use a connection pool that acts as a proxy between the application and the database. If your database statements are sufficiently short, you can easily handle thousands of concurrent application users with a few dozen database connections. This protects the database from getting overloaded and avoids opening database connections frequently, which is expensive.
If you try to open more database connections than max_connections allows, you will get an error message. If more processes request a database connection from the pool than the limit allows, some sessions will hang and wait until a connection is available. Yet another point for using a connection pool!
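To illustrate the proxy idea with node-postgres (a sketch; the pool size is an arbitrary example, not a recommendation for your hardware):

const { Pool } = require('pg')

// A few dozen connections stays well below max_connections on the server.
const pool = new Pool({ max: 20 })

// Thousands of application-level requests do NOT open thousands of database
// connections: at most 20 queries run at once, the rest wait inside the pool.
async function handleRequest(userId) {
  const { rows } = await pool.query('SELECT name FROM users WHERE id = $1', [userId])
  return rows
}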

How many sessions will be created using a single pool?

I am using Knex version 0.21.15 (npm). My pooling parameter is pool: {min: 3, max: 300}.
Oracle is my database server.
Is pool the pool count or the session count?
If it is a pool, how many sessions can be created using a single pool?
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
And when will the created sessions be cleared on the Oracle side?
Is there any parameter available to remove idle sessions from Oracle?
Please suggest one if there is.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
Is pool the pool count or the session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is a pool, how many sessions can be created using a single pool?
Up to the pool.max value.
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
It depends on concurrency. If your tenth query starts before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when will the created sessions be cleared on the Oracle side?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove idle sessions from Oracle?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.
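A minimal sketch of that configuration with knex and the oracledb client (the connection details are placeholders):

const knex = require('knex')({
  client: 'oracledb',
  connection: {
    user: 'app_user',                 // placeholder credentials
    password: 'secret',
    connectString: 'dbhost/ORCLPDB1',
  },
  // The knex defaults: at most 10 Oracle sessions open at any one time.
  pool: { min: 2, max: 10 },
})

// Each query borrows a connection from the pool and returns it when finished.
knex('employees').select('id', 'name').limit(10)
  .then(rows => console.log(rows))
  .finally(() => knex.destroy())      // close all pooled sessions on shutdown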

Number of active connections on the server reached the max

I am working with mongodb and nodejs. I have mongodb hosted on Atlas.
My backend had been working perfectly, but now it sometimes gets stuck, and when I look at the analytics on MongoDB Atlas it shows the maximum number of active connections has reached 100.
Can someone please explain why this is happening? Can I reboot the connections and make it 0?
@Stennie I have used mongoose to connect to the database.
Here is my configuration file
// hapi plugin that opens a single shared mongoose connection at server start
const Mongoose = require('mongoose')
const Hoek = require('hoek')

let defaults = {} // expected to contain the MongoDB url; merged with plugin options below

const mongooseOptions = {
  useNewUrlParser: true,
  autoReconnect: true,
  poolSize: 25,            // up to 25 sockets per mongoose connection
  connectTimeoutMS: 30000,
  socketTimeoutMS: 30000
}

exports.register = (server, options, next) => {
  defaults = Hoek.applyToDefaults(defaults, options)

  // Reuse the existing connection if one is already open
  if (Mongoose.connection.readyState) {
    return next()
  }

  server.log(`${process.env.NODE_ENV} server connecting to ${defaults.url}`)
  return Mongoose.connect(defaults.url, mongooseOptions).then(() => {
    return next() // call the next item in the hapi bootstrap
  })
}
Assuming your backend is deployed on Lambda, since the question has the serverless tag:
Each invocation will leave a container idle to prevent cold starts, or use an existing one if available. You are leaving the connection open to reuse it between invocations, as advertised in the best practices.
With a poolSize of 25 (?) and 100 max connections, you should limit your function concurrency to 4.
Reserve concurrency to prevent your function from using all the available concurrency in the region, or from overloading downstream resources.
More reading: https://www.mongodb.com/blog/post/optimizing-aws-lambda-performance-with-mongodb-atlas-and-nodejs
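A sketch of that reuse pattern with mongoose on Lambda (the model, URI and pool size are illustrative, not from the question):

const mongoose = require('mongoose')

// Defined once per container; warm invocations reuse the same connection pool.
const Item = mongoose.model('Item', new mongoose.Schema({ name: String }))
let connected = false

exports.handler = async (event, context) => {
  // Don't keep the container alive waiting for the open MongoDB socket.
  context.callbackWaitsForEmptyEventLoop = false

  if (!connected) {
    await mongoose.connect(process.env.MONGODB_URI, { poolSize: 5 })
    connected = true
  }

  const doc = await Item.findOne()
  return { statusCode: 200, body: JSON.stringify(doc) }
}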
You could try a couple of things:
In a serverless environment, as already suggested by @Gabriel Bleu, why have such a high pool size? A serverless environment keeps spawning new containers and stopping them as requests come and go. If multiple instances spawn concurrently, they will exhaust the MongoDB server's connection limit very quickly.
The concept of a connection pool is that x connections are established from every node (instance). But that does not mean the connections are automatically released after querying. After completing ALL the DB operations, you should close the connection: mongoose.connection.close();
Note: closing the mongoose connection closes all the connections in the connection pool, so ideally this should be run just before returning the response.
Why are you explicitly setting autoReconnect to true? The MongoDB driver reconnects internally whenever the connection is lost, and this option is certainly not recommended for short-lived instances such as serverless containers.
If you are running in cluster mode, to optimize for performance, change the server URI to the replica-set URL format: MONGODB_URI=mongodb://<username>:<password>@<hostOne>,<hostTwo>,<hostThree>...&ssl=true&authSource=admin.
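A sketch of the close-before-returning pattern this answer describes (the handler and model are hypothetical):

const mongoose = require('mongoose')

exports.handler = async () => {
  await mongoose.connect(process.env.MONGODB_URI, { poolSize: 5 })
  try {
    const Item = mongoose.models.Item ||
      mongoose.model('Item', new mongoose.Schema({ name: String }))
    const doc = await Item.findOne()
    return { statusCode: 200, body: JSON.stringify(doc) }
  } finally {
    // Closes every socket in this connection's pool before the response is returned.
    await mongoose.connection.close()
  }
}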
There are many factors affecting the max connection limit. You have MongoDB hosted on Atlas, and as you mentioned the backend is Lambda, which means you have a serverless environment.
Serverless environments spawn a new container for a new connection and destroy the connection when it is no longer being used. The peak in connections shows that many new instances are being initialized, or that there are many concurrent requests from users. The best practice is to terminate a database connection once it is no longer needed. You can terminate the connection with
mongoose.connection.close(); since you are using mongoose. It releases the connections in the connection pool. Rather than exhausting the concurrent connection limit, you should release a connection once it is idle.
Your configuration forces the database driver to reconnect after the connection is dropped by the database. You are explicitly setting autoReconnect to true, so the driver will quickly issue a new connection request once a connection is dropped. That may affect the concurrent connection limit. You should avoid setting it explicitly.
Cluster mode can optimize the requests according to the load; you can change the server URI to point to a replica of the database. It may help distribute the load.
There is a small initial startup cost of approximately 5 to 10 seconds when the Lambda function is invoked for the first time and the MongoDB client in your AWS Lambda function connects to MongoDB. Connections to a mongos for a sharded cluster are faster than connecting to a replica set. Subsequent connections will be significantly faster for the duration of the lifecycle of the Lambda function, so each invocation leaves a container idle to prevent a cold start, or uses an existing one if available.
Atlas sets the limit for concurrent incoming connections to a cluster based on the cluster tier. If you try to connect when you are at this limit, MongoDB displays an error stating "connection refused because too many open connections". You can close any open connections to your cluster that are not currently in use, or scale up to a higher tier to support more concurrent connections. As mentioned in the best practices, you may restart the application. To prevent this issue in the future, consider using the maxPoolSize connection string option to limit the number of connections in the connection pool.
The final solution to this issue, if your user base is too large for your current cluster tier, is upgrading to a larger Atlas cluster tier, which allows a greater number of connections.
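For example, the pool can be capped directly in the connection string via maxPoolSize (a sketch; the host, credentials and database name are placeholders):

const { MongoClient } = require('mongodb')

// Cap this application instance at 10 pooled sockets toward Atlas.
const uri = 'mongodb+srv://user:pass@cluster0.example.mongodb.net/mydb?retryWrites=true&maxPoolSize=10'
const client = new MongoClient(uri)

async function run() {
  await client.connect()
  const doc = await client.db('mydb').collection('items').findOne()
  console.log(doc)
  await client.close()   // return all pooled sockets
}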

Connection pool using pg-promise

I'm using Node.js and PostgreSQL and trying to be as efficient as possible in the connections implementation.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
I also read that "more than 100 clients at a time is a very bad thing" (node-postgres).
I'm using pg-promise and wanted to know:
What is the recommended poolSize for a very big load of data?
What happens if poolSize = 100 and the application gets 101 requests simultaneously (or even more)?
Does Postgres handle the order and make the 101st request wait until it can run it?
I'm the author of pg-promise.
I'm using Node.js and PostgreSQL and trying to be as efficient as possible in the connections implementation.
There are several levels of optimization for database communications. The most important of them is to minimize the number of queries per HTTP request, because IO is expensive, so is the connection pool.
If you have to execute more than one query per HTTP request, always use tasks, via method task.
If your task requires a transaction, execute it as a transaction, via method tx.
If you need to do multiple inserts or updates, always use multi-row operations. See Multi-row insert with pg-promise and PostgreSQL multi-row updates in Node.js.
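A minimal sketch of the task and tx patterns (the tables and queries are made up for illustration):

const pgp = require('pg-promise')()
const db = pgp(process.env.DATABASE_URL) // placeholder connection string

// Several queries per HTTP request: one task = one connection for all of them.
function getOrderWithItems(orderId) {
  return db.task(async t => {
    const order = await t.one('SELECT * FROM orders WHERE id = $1', [orderId])
    const items = await t.any('SELECT * FROM order_items WHERE order_id = $1', [orderId])
    return { order, items }
  })
}

// Writes that must succeed or fail together go into a transaction.
function moveFunds(fromId, toId, amount) {
  return db.tx(async t => {
    await t.none('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId])
    await t.none('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId])
  })
}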
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
node-postgres started using pg-pool from version 6.x, while pg-promise remains on version 5.x which uses the internal connection pool implementation. Here's the reason why.
I also read that "more than 100 clients at a time is a very bad thing"
My long practice in this area suggests: If you cannot fit your service into a pool of 20 connections, you will not be saved by going for more connections, you will need to fix your implementation instead. Also, by going over 20 you start putting additional strain on the CPU, and that translates into further slow-down.
What is the recommended poolSize for a very big load of data?
The size of the data has nothing to do with the size of the pool. You typically use just one connection for a single download or upload, no matter how large. If your implementation is wrong and you end up using more than one connection, you need to fix it if you want your app to be scalable.
What happens if poolSize = 100 and the application gets 101 requests simultaneously?
It will wait for the next available connection.
See also:
Chaining Queries
Performance Boost
What happens if poolSize = 100 and the application gets 101 requests simultaneously (or even more)? Does Postgres handle the order and make the 101st request wait until it can run it?
Right, the request will be queued. But it's not handled by Postgres itself; it's handled by your app (pg-pool). Whenever you run out of free connections, the app waits for a connection to be released, and then the next pending request is performed. That's what pools are for.
What is the recommended poolSize for a very big load of data?
It really depends on many factors, and no one can really tell you the exact number. Why not test your app under heavy load, see in practice how it performs, and find the bottlenecks?
Also I find the node-postgres documentation quite confusing and misleading on the matter:
Once you get >100 simultaneous requests your web server will attempt to open 100 connections to the PostgreSQL backend and 💥 you'll run out of memory on the PostgreSQL server, your database will become unresponsive, your app will seem to hang, and everything will break. Boooo!
https://github.com/brianc/node-postgres
It's not quite true. If you reach the connection limit on the Postgres side, you simply won't be able to establish a new connection until a previous connection is closed. Nothing will break if you handle this situation in your node app.
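For instance, a route can catch the connection error and degrade gracefully instead of crashing (a sketch with pg-promise and Express; the query is illustrative):

const express = require('express')
const pgp = require('pg-promise')()

const app = express()
const db = pgp(process.env.DATABASE_URL) // placeholder connection string

app.get('/products', async (req, res) => {
  try {
    const rows = await db.any('SELECT id, name FROM products LIMIT 50')
    res.json(rows)
  } catch (err) {
    // Connection refused or pool exhausted: fail politely rather than crash.
    res.status(503).json({ error: 'Database temporarily unavailable' })
  }
})

app.listen(3000)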

Check queued reads/writes for MongoDB

I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using Redis, and Redis has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if Redis is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MongoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or is one connection opened for each query/write until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!
MongoDB connection pool workflow:
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance opens one additional socket per server in your MongoDB topology for monitoring the server's state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
Reference for connection pooling:
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/
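Since the question is about Node.js, the equivalent knobs in the Node MongoDB driver look roughly like this (a sketch; the URI and the 503 mapping are assumptions, not part of the original answer):

const { MongoClient } = require('mongodb')

// Cap the pool at 50 sockets and fail fast instead of queueing forever:
// an operation that cannot get a socket within 100 ms rejects with an error.
const client = new MongoClient(process.env.MONGODB_URI, {
  maxPoolSize: 50,
  waitQueueTimeoutMS: 100,
})

async function findUser(id) {
  try {
    return await client.db('app').collection('users').findOne({ _id: id })
  } catch (err) {
    // Likely pool exhaustion under heavy load; surface it as an HTTP 503 upstream.
    throw Object.assign(new Error('database busy'), { status: 503 })
  }
}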
