I am working with mongodb and nodejs. I have mongodb hosted on Atlas.
My backend had been working perfectly, but now it sometimes gets stuck, and the analytics on MongoDB Atlas show that the number of active connections has reached the maximum of 100.
Can someone please explain why this is happening? Can I reset the connections back to 0?
@Stennie I have used Mongoose to connect to the database.
Here is my configuration file:
const mongooseOptions = {
  useNewUrlParser: true,
  autoReconnect: true,
  poolSize: 25,
  connectTimeoutMS: 30000,
  socketTimeoutMS: 30000
}
exports.register = (server, options, next) => {
  defaults = Hoek.applyToDefaults(defaults, options)
  if (Mongoose.connection.readyState) {
    return next()
  }
  if (!Mongoose.connection.readyState) {
    server.log(`${process.env.NODE_ENV} server connecting to ${defaults.url}`)
    return Mongoose.connect(defaults.url, mongooseOptions).then(() => {
      return next() // call the next item in hapi bootstrap
    })
  }
}
Assuming your backend is deployed on AWS Lambda, since the question has the serverless tag.
Each invocation leaves a container idle to prevent cold starts, or reuses an existing one if available. You are leaving the connection open to reuse it between invocations, as advertised in the best practices.
With a poolSize of 25 (?) and 100 max connections, you should limit your function concurrency to 4.
Reserve concurrency to prevent your function from using all the available concurrency in the region, or from overloading downstream resources.
More reading: https://www.mongodb.com/blog/post/optimizing-aws-lambda-performance-with-mongodb-atlas-and-nodejs
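For reference, a minimal sketch of reusing one connection across warm invocations; the handler layout, the MONGODB_URI variable, and the poolSize value are assumptions for illustration:

// handler.js - hypothetical Lambda handler that reuses a cached Mongoose connection
const mongoose = require('mongoose')

let connectionPromise = null

const connectToDatabase = () => {
  // Reuse the existing (or in-flight) connection on warm invocations
  if (!connectionPromise) {
    connectionPromise = mongoose.connect(process.env.MONGODB_URI, {
      poolSize: 5, // keep the pool small; many warm containers share the 100-connection limit
      connectTimeoutMS: 30000,
      socketTimeoutMS: 30000
    })
  }
  return connectionPromise
}

exports.handler = async (event, context) => {
  // let Lambda freeze the container without waiting for open sockets to close
  context.callbackWaitsForEmptyEventLoop = false
  await connectToDatabase()
  // ... run queries here ...
  return { statusCode: 200 }
}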
You could try a couple of things:
In a serverless environment, as already suggested by @Gabriel Bleu, why have such a high connection pool size? A serverless environment keeps spawning new containers and stopping them as requests come and go. If multiple instances spawn concurrently, they will exhaust the MongoDB server's connection limit very quickly.
The concept of a connection pool is that x connections are established from every node (instance). But that does not mean all of those connections are automatically released after querying. After completing ALL the DB operations, you should release the connections yourself: mongoose.connection.close();
Note: Mongoose's connection.close() will close all the connections of the connection pool, so ideally this should be run just before returning the response.
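A rough sketch of that pattern in a Lambda-style handler, assuming the mongooseOptions from the question and a MONGODB_URI environment variable:

// Hypothetical handler that closes the whole pool once ALL DB work is done
const mongoose = require('mongoose')

exports.handler = async (event) => {
  await mongoose.connect(process.env.MONGODB_URI, mongooseOptions)
  try {
    // ... run every DB operation needed for this request here ...
  } finally {
    // closes every connection in the pool, so call it only after all queries finish
    await mongoose.connection.close()
  }
  return { statusCode: 200 }
}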
Why are you explicitly setting autoReconnect to true? The MongoDB driver reconnects internally whenever the connection is lost, and forcing this is certainly not recommended for short-lived instances such as serverless containers.
If you are running in cluster mode, to optimize for performance, change the server URI to the replica set URL format: MONGODB_URI=mongodb://<username>:<password>@<hostOne>,<hostTwo>,<hostThree>...&ssl=true&authSource=admin.
There are many factors affecting the max connection limit. You have MongoDB hosted on Atlas, and as you mentioned the backend is Lambda, which means you have a serverless environment.
Serverless environments spawn a new container for a new connection and destroy a container when it is no longer being used. The peak in connections shows that many new instances are being initialized, or that there are many concurrent requests from users. The best practice is to terminate a database connection once it is no longer needed. Since you are using Mongoose, you can terminate the connection with
mongoose.connection.close(); which will release the connections in the connection pool. Rather than exhausting the concurrent connection limit, you should release connections once they are idle.
Your configuration forces the database driver to reconnect after the connection is dropped by the database. Because you explicitly set autoReconnect to true, the driver will immediately open a new connection once a connection is dropped, which can eat into the concurrent connection limit. You should avoid setting it explicitly.
Cluster mode can distribute requests according to the load; you can change the server URI to your database's replica set format, which may help spread the load.
There is a small initial startup cost of approximately 5 to 10 seconds when the Lambda function is invoked for the first time and the MongoDB client in your AWS Lambda function connects to MongoDB. Connections to a mongos for a sharded cluster are faster than connecting to a replica set. Subsequent connections will be significantly faster for the duration of the lifecycle of the Lambda function. So each invocation will leave a container idle to prevent a cold start (cold boot), or use an existing one if available.
Atlas sets the limit for concurrent incoming connections to a cluster based on the cluster tier. If you try to connect when you are at this limit, MongoDB displays an error stating "connection refused because too many open connections". You can close any open connections to your cluster that are not currently in use, or scale up to a higher tier to support more concurrent connections. As mentioned in the best practices, you may restart the application. To prevent this issue in the future, consider using the maxPoolSize connection string option to limit the number of connections in the connection pool.
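As an illustration, maxPoolSize can be appended to the connection string; the host, credentials, and the value of 10 below are placeholders:

// Hypothetical Atlas URI capping each application instance at 10 pooled connections
const uri =
  'mongodb+srv://<username>:<password>@<clusterHost>/<dbName>?retryWrites=true&w=majority&maxPoolSize=10'

mongoose.connect(uri, { useNewUrlParser: true })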
The final solution to this issue, if your user base is too large for your current cluster tier, is upgrading to a larger Atlas cluster tier that allows a greater number of connections.
Related
While going through the Postgres architecture, one of the things mentioned was that the Postgres DB has a connection limit of 500 (which can be modified). And to fetch any data from the Postgres DB, we first need to make a connection to it. So in this case, what happens if there are 10k simultaneous requests coming to the DB? How do the requests map to the connection limit, since we have a limit of 500? Do we need to increase the limit, do we need to create more instances of Postgres, or is concurrency in play?
If there are 10000 concurrent statements running on a single database, any hardware will be overloaded. You just cannot do that.
Even 500 is way too many concurrent requests, so that value is too high for max_connections (or for the number of concurrent active sessions to be precise).
The good thing is that you don't have to do that. You use a connection pool that acts as a proxy between the application and the database. If your database statements are sufficiently short, you can easily handle thousands of concurrent application users with a few dozen database connections. This protects the database from getting overloaded and avoids opening database connections frequently, which is expensive.
If you try to open more database connections than max_connections allows, you will get an error message. If more processes request a database connection from the pool than the limit allows, some sessions will hang and wait until a connection is available. Yet another point for using a connection pool!
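To make that concrete, here is a minimal node-postgres sketch of an application-side pool; the pool size, timeouts, and the users query are illustrative assumptions:

const { Pool } = require('pg')

// A few dozen connections is usually plenty; requests beyond that wait in the pool's queue
const pool = new Pool({
  max: 20,                        // far below the server's max_connections
  idleTimeoutMillis: 30000,       // close clients idle for more than 30s
  connectionTimeoutMillis: 2000   // fail fast if no connection becomes free
})

// Each caller borrows a connection only for the duration of the query
async function getUser(id) {
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [id])
  return rows[0]
}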
I am using Knex version 0.21.15 from npm. My pooling parameter is pool: { min: 3, max: 300 }.
Oracle is my database server.
pool: Is this a pool count or a session count?
If it is a pool, how many sessions can be created using a single pool?
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
And when will the created sessions be cleared from Oracle?
Is there any parameter available to remove idle sessions from Oracle?
Please suggest one if there is.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool: Is this a pool count or a session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is a pool, how many sessions can be created using a single pool?
Up to the pool.max value.
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
It depends on concurrency. If your tenth query starts before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when will the created sessions be cleared from Oracle?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove idle sessions from Oracle?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.
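For reference, a hedged sketch of what that default configuration looks like in knex; the oracledb connection details are placeholders:

const knex = require('knex')({
  client: 'oracledb',
  connection: {
    user: 'app_user',                     // placeholder credentials
    password: 'secret',
    connectString: 'dbhost:1521/ORCLPDB1' // placeholder Oracle connect string
  },
  pool: {
    min: 2,   // knex defaults: at most 10 Oracle sessions from this process
    max: 10
  }
})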
I have a Sequelize instance and it is exported in a file to be accessed when doing DB operations.
const sequelize = new Sequelize('database', 'username', null, {
  dialect: 'mysql'
});

module.exports = sequelize;
So the instance is created when the Express.js server starts and is never destroyed. I wonder if this is the correct way to do it, or should I call new Sequelize every time I perform a DB operation?
I think it should be kept alive because that's how DB pooling could take effect. Right?
The bottom line is: yes, it should stay alive. There is no performance hit whatsoever if you keep the instance alive, because it is the Sequelize instance (and by extension the ORM) which will handle future connections. This also includes (as you noted) pooling.
The connections
When it comes to the pooling configuration itself, it gets a little tricky though. Depending on your configuration, the pool has some amount of "space" to work with: the limit on how many connections may be created, the idle duration after which connections are removed, etc. I can certainly imagine a situation where keeping a connection alive is simply not needed, for example an internal system for a company which is not used overnight.
The Sequelize ORM gives you a good set of options to choose from when configuring your connection pool. In general, you do want to reuse connections, as establishing new ones is quite expensive: not just because of the network (e.g. authorization, maybe a proxy, etc.) but also because of the memory allocation that happens when you create a database connection (which is why reconnecting on every request is not a good idea).
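For illustration, the pool options Sequelize exposes look roughly like this; the values are examples, not recommendations:

const sequelize = new Sequelize('database', 'username', null, {
  dialect: 'mysql',
  pool: {
    max: 10,        // most connections the pool will keep open
    min: 0,         // connections kept open even when idle
    acquire: 30000, // ms to wait for a free connection before throwing
    idle: 10000     // ms a connection may sit idle before being released
  }
});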
However, it all comes down to which database engine you use (and how busy your system is). MySQL, for example, can cache connections: when a connection is closed, it is returned to the thread cache rather than discarded (for some period of time). When a new connection opens, MySQL looks into the thread cache rather than trying to establish a new connection.
You might want to go through these:
https://stackoverflow.com/a/4041136/8775880 (Good explanation on how pooling works)
https://stackoverflow.com/a/11659275/8775880 (Good explanation on how expensive it is to keep connections open)
I am writing my backend using Node.js (Express), and my database is PostgreSQL. I am using node-postgres (pg) to connect to the Postgres database.
Currently I am using the pg.Pool approach, so that I have clients connected to serve requests. What I observe is that node-postgres takes more than 2 seconds to connect to the DB, and hence the response time is quite long.
In the node-postgres documentation, they mention that the initial connection takes only 20-30 milliseconds, but I see more than 2-3 seconds to establish the connection. I did a load test on my app hitting around 1000 requests/sec, and the average response time is quite high due to the initial connection establishment time. I have only a single SELECT query from which I build the response. The response processing time is very small; only establishing the connection and getting data from the DB take more time.
I tried all the ways of releasing a client to the pool after receiving the response, etc. Now I'm using pool.query, which takes care of connecting as well as releasing the client back to the pool after the task is done.
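For context, a small sketch of the two node-postgres patterns mentioned; the users query is a placeholder:

const { Pool } = require('pg')
const pool = new Pool() // reads connection settings from environment variables

// Pattern 1: pool.query checks out a client, runs the query, and releases it automatically
async function getUsersSimple() {
  const { rows } = await pool.query('SELECT * FROM users')
  return rows
}

// Pattern 2: manual checkout (needed for transactions); the client must always be released
async function getUsersManual() {
  const client = await pool.connect()
  try {
    const { rows } = await client.query('SELECT * FROM users')
    return rows
  } finally {
    client.release()
  }
}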
Is there any alternative to node-postgres which can provide better performance for DB operations?
I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using Redis, and Redis has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if Redis is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MongoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or is one connection opened for each query/write, until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!
MongoDB connection pool workflow:
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance opens one additional socket per server in your MongoDB topology for monitoring the server's state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
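The snippets above use the PyMongo API; since the question is about Node.js, the roughly equivalent options in the Node.js driver look like this (a sketch; exact option names depend on the driver version, and waitQueueMultiple has no Node.js counterpart):

const { MongoClient } = require('mongodb')

// maxPoolSize caps the pool per client; waitQueueTimeoutMS bounds how long an
// operation waits for a free connection before erroring
const client = new MongoClient('mongodb://<host>:<port>', {
  maxPoolSize: 50,
  waitQueueTimeoutMS: 100
})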
Reference for connection pooling:
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/