Should I keep Sequelize instance throughout server running time? - node.js

I have a Sequelize instance and it is exported in a file to be accessed when doing DB operations.
const sequelize = new Sequelize('database', 'username', null, {
dialect: 'mysql'
});
module.exports = sequelize;
So the instance is created when the expressjs server starts and never destroys. I wonder if this is the correct way to do, or should I call new Sequelize every time I use the DB operation?
I think it should be kept alive because that's how DB pooling could take effect. Right?

The bottom line is - yes, it should stay alive. There is no performance hit whatsoever if you keep the instance alive. Because it will be the Sequelize instance (and by extension the ORM) which will handle the future connections. This also includes (as you noted) pooling.
The connections
When it comes to the pooling configuration itself, it get's a little tricky though. Depending on your configuration, the pool has some amount of "space" to work with - the limit to create connections, the idle duration after which connections are removed etc. I can certainly imagine a situation when keeping a connection alive is simply not needed - for example an internal system for a company which is not used overnight.
The Sequelize ORM gives you a good set of options to choose from when configuring your connection pool. In general, you do want to reuse connections as establishing new ones is quite expensive - not just because of the network (e.g. authorization, maybe proxy etc.) but also because of memory allocation which happens when you create a database connection (which is why reconnecting on every request is not a good idea..).
However, it all comes down to which database engine you use (and how busy your system is); MySQL can for example cache connections. When a connection is closed, it is returned to the thread cache rather than discarded (for some period of time). When a new connection opens, MySQL will look into the thread cache rather than try and establish a new connection.
You might want to go through these:
https://stackoverflow.com/a/4041136/8775880 (Good explanation on how pooling works)
https://stackoverflow.com/a/11659275/8775880 (Good explanation on how expensive it is to keep connections open)

Related

How many session will create using single pool?

I am using Knex version 0.21.15 npm. my pooling parameter is pool {min: 3 , max:300}.
Oracle is my data base server.
pool Is this pool count or session count?
If it is pool, how many sessions can create using a single pool?
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
And when the created session will cleared from oracle session?
Is there any parameter available to remove the idle session from oracle.?
suggest me please if any.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool Is this pool count or session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is pool, how many sessions can create using a single pool?
Up to the pool.max value.
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
It depends on concurrency. If your tenth query before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when the created session will cleared from oracle session?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove the idle session from oracle.?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.

PostgreSQL: use same connection or get another from pool?

I have a Node.js script and a PostgreSQL database, and I'll be using a library that maintains a pool of connections to the database.
Say I have a script that queries the database multiple times (not a transaction) at different parts of the script, how do I tell if I should acquire a single connection/client and reuse it throughout*, or acquire a new client from the pool for each query? (Both works but which has better performance?)
*task in the pg-promise library, connect in the node-postgres library.
...
// Acquire connection from pool.
(Database query)
(Non-database-related code)
(Database query)
// Release connection to pool.
...
or
...
// Acquire connection from pool.
(Database query)
// Release connection to pool.
(Non-database-related code)
// Acquire connection from pool.
(Database query)
// Release connection to pool.
...
I am not sure, how the pool you are using works, but normally they should reuse the connections (don't disconnect after use), so you do not need to be concerned with caching connections.
You can use node-postgres module that will make you task easier.
And about your question when to use pool here is the brief answer.
PostgreSQL server can only handle 1 query at a time per connection.
That means if you have 1 global new pg.Client() connected to your
backend your entire app is bottleknecked based on how fast postgres
can respond to queries. It literally will line everything up, queuing
each query. Yeah, it's async and so that's alright...but wouldn't you
rather multiply your throughput by 10x? Use pg.connect set the
pg.defaults.poolSize to something sane (we do 25-100, not sure the
right number yet).
new pg.Client is for when you know what you're doing. When you need a
single long lived client for some reason or need to very carefully
control the life-cycle. A good example of this is when using
LISTEN/NOTIFY. The listening client needs to be around and connected
and not shared so it can properly handle NOTIFY messages. Other
example would be when opening up a 1-off client to kill some hung
stuff or in command line scripts.
here is the link of that module.
Hopefully this will help.
https://github.com/brianc/node-postgres
You can see the documentation over there and about the pooling. Thanks :)
And about closing the pool it provides the callback done which can be called when you want to close that pool.

Check queued reads/writes for MongoDB

I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using REDIS, and REDIS has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if REDIS is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MondoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or, is one connection opened for each query/write, until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!
MongoDB connection pool workflow-
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance, opens one additional socket per server in your MongoDB topology for monitoring the server’s state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
Reference for connection pooling-
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/

Mongoose Connection Pool

I noticed in the Mongoose docs that there is support for a connection pool.
http://mongoosejs.com/docs/connections.html
Considering that node is single threaded why is there a connection pool?
What's the lifecycle of connections in the pool?
Connection pools don't have anything to do with async vs sync -- it just works like so:
You can specify an amount of open connections to maintain to your database (let's say 10).
Each time your Node JS code makes a query, if possible, it'll use one of the already-open 10 connections to make this request -- this way you can avoid the overhead of opening a new database connection for each query.
Maintaining a connection pool is essentially maintaining an array of db connection objects, and picking unused ones for every query. It's not actually effecting threads or processes at all =)
apparently node is a single threaded but internally when node makes a call to IO operation under the hood it has some threading mechanism through which it performs IO. Main thread doesn't perform this IO operation itself, if it was performing IO, system would be dead already.
https://codeburst.io/how-node-js-single-thread-mechanism-work-understanding-event-loop-in-nodejs-230f7440b0ea

reuse mongodb connection and close it

I'm using the Node native client 1.4 in my application and I found something in the document a little bit confusing:
A Connection Pool is a cache of database connections maintained by the driver so that connections can be re-used when new connections to the database are required. To reduce the number of connection pools created by your application, we recommend calling MongoClient.connect once and reusing the database variable returned by the callback:
Several questions come in mind when reading this:
Does it mean the db object also maintains the fail over feature provided by replica set? Which I thought should be the work of MongoClient (not sure about this but the C# driver document does say MongoClient maintains replica set stuff)
If I'm reusing the db object, when should I invoke the db.close() function? I saw the db.close() in every example. But shouldn't we keep it open if we want to reuse it?
EDIT:
As it's a topic about reusing, I'd also want to know how we can share the db in different functions/objects?
As the project grows bigger, I don't want to nest all the functions/objects in one big closure, but I also don't want to pass it to all the functions/objects.
What's a more elegant way to share it among the application?
The concept of "connection pooling" for database connections has been around for some time. It really is a common sense approach as when you consider it, establishing a connection to a database every time you wish to issue a query is very costly and you don't want to be doing that with the additional overhead involved.
So the general principle is there that you have an object handle ( db reference in this case ) that essentially goes and checks for which "pooled" connection it can use, and possibly if the current "pool" is fully utilized then and create another ( or a few others ) connection up to the pool limit in order to service the request.
The MongoClient class itself is just a constructor or "factory" type class whose purpose is to establish the connections and indeed the connection pool and return a handle to the database for later usage. So it is actually the connections created here that are managed for things such as replica set fail-over or possibly choosing another router instance from the available instances and generally handling the connections.
As such, the general practice in "long lived" applications is that "handle" is either globally available or able to be retrieved from an instance manager to give access to the available connections. This avoids the need to "establish" a new connection elsewhere in your code, which has already been stated as a costly operation.
You mention the "example" code which is often present through many such driver implementation manuals often or always calling db.close. But these are just examples and not intended as long running applications, and as such those examples tend to be "cycle complete" in that they show all of the "initialization", the "usage" of various methods, and finally the "cleanup" as the application exits.
Good application or ODM type implementations will typically have a way to setup connections, share the pool and then gracefully cleanup when the application finally exits. You might write your code just like "manual page" examples for small scripts, but for a larger long running application you are probably going to implement code to "clean up" your connections as your actual application exits.

Resources