Use connection pool with MongoEngine - python-3.x

I have documents in different MongoDB databases referencing each other (mongoengine's LazyReferenceField), so each time I need to get the field's value, I need to connect and disconnect from the field's relevant database, which I find very inefficient.
I've read about connection pooling, but I can't find a solution on how to implement it using MongoEngine. How can I create a connection pool and reuse connections from it every time I need the value of a LazyReferenceField?

MongoEngine manages the connection globally (i.e. once connected, it auto-magically re-uses that connection). Usually you call connect just once, when the application/script starts, and then you are good to go and don't need to interfere with the connection.
LazyReferenceField is no different from any other field (ReferenceField, StringField, etc.) in that context. The only difference is that it doesn't do the de-referencing immediately, but only when you explicitly request it with the .fetch() method.

Related

N number of Sequelize connections to dynamically query via API calls

I'm looking to get some opinions on what the best approach is for the following scenario:
Our product requires connections to our users' Postgres databases via our Node Express server. They provide their credentials once, we store them in an encrypted way in our internal operations DB, and we can reference them when access is needed. A user can perform an action in our app UI like creating a table, deleting a table, etc., and view table sizes, min/max values of a column, and so on.
These actions come to our server as authenticated API calls, and we can query their databases via Sequelize as needed and return the results to the frontend.
My question is: when there are N users with N databases across different SQL instances, each of which needs to be connected when an API call comes in to query it, what is the best approach to maintaining those connections?
Should we create a new Sequelize connection instance each time an API is called, run the query, return the response, and close the connection? Or create a new Sequelize instance for a DB when an API is called, keep the instance for a certain amount of time, close the connection if it was inactive during that time, and recreate the instance next time?
If there are better and more efficient ways of doing this, I would love to hear about it. Thanks.
Currently, I've tried creating a new Sequelize instance at the beginning of each API request, running the query, and then closing the connection. It works OK, but that's just locally with 2 DBs, so I can't tell what production would be like.
Edit: Anatoly suggested a connection pool; in that case, what are the things that need to be considered for the config?
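For illustration, one possible shape for keeping a per-tenant pool alive might be the following sketch (the getTenantSequelize helper, the instance cache, and the pool numbers are placeholders, not a tested setup):
const { Sequelize } = require('sequelize');

// Cache one Sequelize instance (and therefore one connection pool) per tenant database.
const instances = new Map();

function getTenantSequelize(tenantId, creds) {
  if (!instances.has(tenantId)) {
    instances.set(tenantId, new Sequelize(creds.database, creds.username, creds.password, {
      host: creds.host,
      dialect: 'postgres',
      // Illustrative values: cap connections per tenant and let idle ones be released.
      pool: { max: 5, min: 0, idle: 30000 }
    }));
  }
  return instances.get(tenantId);
}

module.exports = { getTenantSequelize };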

Should I keep Sequelize instance throughout server running time?

I have a Sequelize instance and it is exported in a file to be accessed when doing DB operations.
const Sequelize = require('sequelize');

const sequelize = new Sequelize('database', 'username', null, {
  dialect: 'mysql'
});

module.exports = sequelize;
So the instance is created when the expressjs server starts and is never destroyed. I wonder if this is the correct way to do it, or should I call new Sequelize every time I perform a DB operation?
I think it should be kept alive because that's how DB pooling could take effect. Right?
The bottom line is - yes, it should stay alive. There is no performance hit whatsoever if you keep the instance alive, because it is the Sequelize instance (and by extension the ORM) that will handle the future connections. This also includes (as you noted) pooling.
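For illustration, reusing that exported instance from another module could look something like this (the file name and the User model are made up for the example):
// models/User.js - hypothetical module that reuses the single exported instance
const { DataTypes } = require('sequelize');
const sequelize = require('../sequelize'); // the instance created once at startup

const User = sequelize.define('User', {
  name: DataTypes.STRING
});

module.exports = User;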
The connections
When it comes to the pooling configuration itself, it gets a little tricky though. Depending on your configuration, the pool has some amount of "space" to work with - the limit on how many connections can be created, the idle duration after which connections are removed, etc. I can certainly imagine a situation where keeping a connection alive is simply not needed - for example, an internal system for a company which is not used overnight.
The Sequelize ORM gives you a good set of options to choose from when configuring your connection pool. In general, you do want to reuse connections, as establishing new ones is quite expensive - not just because of the network (e.g. authorization, maybe a proxy, etc.) but also because of the memory allocation that happens when you create a database connection (which is why reconnecting on every request is not a good idea).
However, it all comes down to which database engine you use (and how busy your system is); MySQL, for example, can cache connections. When a connection is closed, it is returned to the thread cache rather than discarded (for some period of time). When a new connection opens, MySQL will look into the thread cache rather than try to establish a new connection.
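To make that concrete, here is a sketch of what such a pool configuration might look like on the exported instance from the question (the numbers are illustrative and depend on your engine, workload, and limits):
const { Sequelize } = require('sequelize');

const sequelize = new Sequelize('database', 'username', null, {
  dialect: 'mysql',
  pool: {
    max: 10,       // upper bound on simultaneously open connections
    min: 0,        // allow the pool to drain completely when the system sits idle (e.g. overnight)
    idle: 10000,   // ms of inactivity before a connection is released from the pool
    acquire: 30000 // ms to wait for a free connection before throwing
  }
});

module.exports = sequelize;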
You might want to go through these:
https://stackoverflow.com/a/4041136/8775880 (Good explanation on how pooling works)
https://stackoverflow.com/a/11659275/8775880 (Good explanation on how expensive it is to keep connections open)

Programmatically create multiple connections for TcpNetClientConnectionFactory

Continuing the conversation from this question:
Two-part question here:
Can a TcpNetClientConnectionFactory have multiple connections to its upstream server, if the host and port are the same?
If so, how can I programmatically build a new connection for that connection factory? I see the buildNewConnection method, but it is protected.
The first connection is automatically built as soon as the first Message passes through the factory. What we need to do is notice when subsequent Messages have a different ip_connectionId, stand up a new connection, and route those Messages to that new connection. Obviously, Messages with the original ip_connectionId would still be routed to the original connection.
I'm not sure whether it would be better to create multiple connections off of one connection factory, or to create a whole new connection factory, sending message handler, and receiving channel adapter for each new connection.
If the inbound connection factory is a TcpNetServerConnectionFactory, you can simply use a ThreadAffinityClientConnectionFactory because each inbound connection gets its own thread.
You would call getConnection(). This will bind the connection to the thread (and you can obtain the connection id from it), but you don't really need to map the header in this direction because of the thread affinity; you would only have to map on the return path.
Bear in mind, though, that if the ThreadAffinityClientConnectionFactory detects that a connection has been closed, it will create a new one. So, you might want to call getConnection() in your mapper on each call. However, there would still be a race condition, so you might also need to listen for TcpConnectionCloseEvents and TcpConnectionOpenEvents.
If you use NIO on the inbound, or otherwise hand off the work to other threads via an executor, this won't work.
In that case you would need your own wrapping connection factory - you could use the ThreadAffinityClientConnectionFactory as a model, but instead of storing the connections in a ThreadLocal, you'd store them in a map. But you'd still need a ThreadLocal (set upstream on each call) to tell the factory which connection to hand out when the adapter asks for one.
There's a trick you need to be aware of, however.
There is a property singleUse on the connection factory. This serves two purposes:
first, it tells the factory to create a new connection each time getConnection() is called, instead of a single, shared connection
second, it tells the inbound adapter to close the connection after the reply is received
So the trick is you need singleUse=true on the real factory (so it gives you a new connection each time getConnection() is called), but singleUse=false on the wrapping factory so the adapters don't close the connection.
I suggest you look at the ThreadAffinityClientConnectionFactory and the CachingClientConnectionFactory to see how they work.
We should probably consider splitting this into two booleans; we could probably also make some improvements to avoid the need for a thread local by adding something like getConnection(String connectionId) to the client factory contract and have the factory look up the connection internally; but that will require work in the adapters.
I'll capture an issue for this and see if we can get something in 5.2.
Rather a long answer, but I hope it makes sense.

reuse mongodb connection and close it

I'm using the Node native client 1.4 in my application, and I found something in the documentation a little bit confusing:
A Connection Pool is a cache of database connections maintained by the driver so that connections can be re-used when new connections to the database are required. To reduce the number of connection pools created by your application, we recommend calling MongoClient.connect once and reusing the database variable returned by the callback:
Several questions come in mind when reading this:
Does it mean the db object also maintains the failover feature provided by a replica set? I thought that should be the work of MongoClient (I'm not sure about this, but the C# driver documentation does say MongoClient maintains the replica set state).
If I'm reusing the db object, when should I invoke the db.close() function? I see db.close() in every example, but shouldn't we keep it open if we want to reuse it?
EDIT:
As it's a topic about reusing, I'd also want to know how we can share the db in different functions/objects?
As the project grows bigger, I don't want to nest all the functions/objects in one big closure, but I also don't want to pass it to all the functions/objects.
What's a more elegant way to share it among the application?
The concept of "connection pooling" for database connections has been around for some time. It really is a common-sense approach: establishing a connection to a database every time you wish to issue a query is very costly, and you don't want the additional overhead that involves.
So the general principle is that you have an object handle (the db reference in this case) that essentially checks for which "pooled" connection it can use and, if the current "pool" is fully utilized, creates another connection (or a few more), up to the pool limit, in order to service the request.
The MongoClient class itself is just a constructor or "factory" type class whose purpose is to establish the connections, and indeed the connection pool, and return a handle to the database for later usage. So it is actually the connections created here that are managed for things such as replica set failover, or choosing another router instance from those available, and connection handling in general.
As such, the general practice in "long lived" applications is that the "handle" is either globally available or able to be retrieved from an instance manager to give access to the available connections. This avoids the need to "establish" a new connection elsewhere in your code, which has already been stated as a costly operation.
You mention the "example" code that is present in many such driver manuals, which often or always calls db.close(). But these are just examples and not intended as long-running applications; as such, those examples tend to be "cycle complete" in that they show all of the "initialization", the "usage" of various methods, and finally the "cleanup" as the application exits.
Good application or ODM type implementations will typically have a way to set up connections, share the pool, and then gracefully clean up when the application finally exits. You might write your code just like the "manual page" examples for small scripts, but for a larger, long-running application you are probably going to implement code to "clean up" your connections as your actual application exits.
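As a rough illustration of that pattern with the Node native driver (the db.js module and its connect/close helpers are made up; the callback-style API follows the 1.x/2.x driver from the question and details vary between driver versions):
// db.js - connect once, cache the handle, and share it across the application
var MongoClient = require('mongodb').MongoClient;

var db = null;

function connect(url, callback) {
  if (db) {
    // Reuse the already-established connection pool instead of reconnecting.
    return callback(null, db);
  }
  MongoClient.connect(url, function (err, database) {
    if (err) return callback(err);
    db = database;
    callback(null, db);
  });
}

function close(callback) {
  // Call this once, when the application is shutting down gracefully.
  if (db) {
    db.close(callback);
    db = null;
  } else if (callback) {
    callback(null);
  }
}

module.exports = { connect: connect, close: close };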

ReactiveMongo Connection, keep connection object alive in the Play context or re-establish for each call to the database? (Play, Scala, ReactiveMongo)

I am just starting to use ReactiveMongo with Play 2 (scala).
Should I store a singleton object with the connection details and the database it returns (connection.get.db("mydb")), or keep the connection alive indefinitely?
I am used to JDBC connection pools, so I am unsure what the performant way to use ReactiveMongo and Mongo is.
Sorry if this is not a very well-formed question, I am fumbling in the dark a bit.
Thanks
From this documentation
http://reactivemongo.org/releases/0.10/api/index.html#reactivemongo.api.MongoDriver
there is an optional parameter:
nbChannelsPerNode - Number of channels to open per node. Defaults to 10.
This suggests that the returned object (MongoConnection) is a connection pool itself, so you should use it as a singleton and not create a new instance for each request.
