MongoDB Performance when connecting to multiple databases via parent-child connections - node.js

When connecting to a Mongo server containing multiple dbs, what is the more performant approach using the node-mongodb-native driver?
Let's say I have 8 dbs (db1...db8) on the same Mongo server. My node app needs to connect to all 8 depending on the queries it receives. Which is the better option here for me:
1) Create 8 separate connections (1 with each db)
OR
2) Create one parent connection to the server on the test db and then call db.db 8 times to create 8 child connections under that parent. As I read in the docs (http://mongodb.github.io/node-mongodb-native/2.0/api/Db.html#db), all 8 child connections will run over the same socket.
Has anyone researched this, or does anyone have some background or thoughts that can help me determine the right course of action?

How granular is MongoDB concurrency?: this depends on the version. Since MongoDB 3, many operations lock at the document level. Earlier versions applied a lock on the entire collection. Some operations still lock the entire instance (aka server). This means that sometimes an operation (likely one involving multiple databases) can block an entire instance, affecting all databases within it. https://docs.mongodb.com/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
Threading model: node.js is asynchronous while MongoDB is not. MongoDB will use one thread per socket. If you perceive that operations are blocking each other, you should keep separate connection pools. http://mongodb.github.io/node-mongodb-native/2.2/reference/faq/
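For what it's worth, here is a minimal sketch of option 2 with the 2.x driver API the question links to; the connection URL and the db names (db1...db8) are placeholders:

```js
// Option 2: one parent connection, child Db handles sharing its socket pool.
const MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function (err, parentDb) {
  if (err) throw err;

  // db.db() returns a new Db instance that reuses the parent's underlying
  // socket/connection pool instead of opening new connections.
  const databases = {};
  ['db1', 'db2', 'db3', 'db4', 'db5', 'db6', 'db7', 'db8'].forEach(function (name) {
    databases[name] = parentDb.db(name);
  });

  // Queries against any of them are multiplexed over the same pool:
  databases.db1.collection('users').findOne({}, function (err, doc) {
    // ...
  });
});
```

Option 1 would simply be eight separate MongoClient.connect() calls, each with its own pool, which is what you would want if (per the answer above) the databases' workloads block each other.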

Related

How many sessions will be created using a single pool?

I am using Knex version 0.21.15 from npm. My pooling parameter is pool: {min: 3, max: 300}.
Oracle is my database server.
pool: Is this a pool count or a session count?
If it is a pool count, how many sessions can be created using a single pool?
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
And when will the created sessions be cleared from the Oracle session list?
Is there any parameter available to remove idle sessions from Oracle?
Please suggest if there is any.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool: Is this a pool count or a session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is a pool count, how many sessions can be created using a single pool?
Up to the pool.max value.
If I run one non-transactional query 10 times using a knex connection, how many sessions will be created?
It depends on concurrency. If your tenth query starts before the first one completes, you may use up to ten connections from the pool. But you will most likely use fewer than that.
And when will the created sessions be cleared from the Oracle session list?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove idle sessions from Oracle?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.
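For reference, a small sketch of that suggested configuration with knex and the oracledb client; the connection details and the table name are placeholders:

```js
// knex with the default pool limits rather than max: 300.
const knex = require('knex')({
  client: 'oracledb',
  connection: {
    user: 'app_user',
    password: 'secret',
    connectString: 'dbhost:1521/ORCLPDB1',
  },
  // The knex defaults: at most 10 sessions open at once.
  pool: { min: 2, max: 10 },
});

// Running the same query 10 times concurrently: each call acquires a
// connection from the pool and releases it when done, so at most 10
// sessions exist, and fewer if some queries finish before others start.
Promise.all(
  Array.from({ length: 10 }, () => knex('employees').select('*').limit(1))
).then((results) => {
  console.log(results.length); // 10
  return knex.destroy(); // close all pooled connections on shutdown
});
```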

Connection pool using pg-promise

I'm using Node.js and PostgreSQL and trying to be as efficient as possible in the connections implementation.
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
I also read that "more than 100 clients at a time is a very bad thing" (node-postgres).
I'm using pg-promise and wanted to know:
What is the recommended poolSize for a very big load of data?
What happens if poolSize = 100 and the application gets 101 requests simultaneously (or even more)?
Does Postgres handle the order and make the 101st request wait until it can run it?
I'm the author of pg-promise.
I'm using Node.js and PostgreSQL and trying to be as efficient as possible in the connections implementation.
There are several levels of optimization for database communications. The most important of them is to minimize the number of queries per HTTP request, because IO is expensive, so is the connection pool.
If you have to execute more than one query per HTTP request, always use tasks, via method task.
If your task requires a transaction, execute it as a transaction, via method tx.
If you need to do multiple inserts or updates, always use multi-row operations. See Multi-row insert with pg-promise and PostgreSQL multi-row updates in Node.js.
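To make those three tips concrete, here is a rough pg-promise sketch; the connection string, table and column names are invented for illustration:

```js
const pgp = require('pg-promise')();
const db = pgp('postgres://user:pass@localhost:5432/mydb');

// 1. Several queries per HTTP request -> one task, one connection:
function getUserWithOrders(userId) {
  return db.task(t => {
    return t.one('SELECT * FROM users WHERE id = $1', userId)
      .then(user => {
        return t.any('SELECT * FROM orders WHERE user_id = $1', userId)
          .then(orders => ({ user, orders }));
      });
  });
}

// 2. Dependent changes -> a transaction:
function moveFunds(fromId, toId, amount) {
  return db.tx(t => {
    return t.batch([
      t.none('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId]),
      t.none('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId]),
    ]);
  });
}

// 3. Multiple inserts -> one multi-row statement via the helpers namespace:
function insertUsers(users) {
  const cs = new pgp.helpers.ColumnSet(['name', 'email'], { table: 'users' });
  const query = pgp.helpers.insert(users, cs); // a single INSERT with many rows
  return db.none(query);
}
```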
I saw that pg-promise is built on top of node-postgres and node-postgres uses pg-pool to manage pooling.
node-postgres started using pg-pool from version 6.x, while pg-promise remains on version 5.x which uses the internal connection pool implementation. Here's the reason why.
I also read that "more than 100 clients at a time is a very bad thing"
My long practice in this area suggests: if you cannot fit your service into a pool of 20 connections, you will not be saved by going for more connections; you will need to fix your implementation instead. Also, by going over 20 you start putting additional strain on the CPU, and that translates into further slow-down.
What is the recommended poolSize for a very big load of data?
The size of the data has nothing to do with the size of the pool. You typically use just one connection for a single download or upload, no matter how large. If your implementation is wrong and you end up using more than one connection, you need to fix it if you want your app to be scalable.
What happens if poolSize = 100 and the application gets 101 requests simultaneously?
It will wait for the next available connection.
See also:
Chaining Queries
Performance Boost
What happens if poolSize = 100 and the application gets 101 requests simultaneously (or even more)? Does Postgres handle the order and make the 101st request wait until it can run it?
Right, the request will be queued. But it's not handled by Postgres itself, but by your app (pg-pool). So whenever you run out of free connections, the app will wait for a connection to release, and then the next pending request will be performed. That's what pools are for.
What is the recommended poolSize for a very big load of data?
It really depends on many factors, and no one will really tell you the exact number. Why not test your app under a huge load, see in practice how it performs, and find the bottlenecks?
Also I find the node-postgres documentation quite confusing and misleading on the matter:
Once you get >100 simultaneous requests your web server will attempt to open 100 connections to the PostgreSQL backend and 💥 you'll run out of memory on the PostgreSQL server, your database will become unresponsive, your app will seem to hang, and everything will break. Boooo!
https://github.com/brianc/node-postgres
It's not quite true. If you reach the connection limit on the Postgres side, you simply won't be able to establish a new connection until a previous connection is closed. Nothing will break if you handle this situation in your node app.
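As a small illustration of "handle this situation": a query that cannot obtain a connection simply rejects its promise, so an ordinary catch keeps the app running. The connection details are placeholders; note that in newer pg-promise versions (built on pg-pool) the pool size option is max, while the older 5.x line used poolSize:

```js
const pgp = require('pg-promise')();
const db = pgp({
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'app_user',
  password: 'secret',
  max: 20, // pool size; requests beyond this wait in the app-side queue
});

db.any('SELECT * FROM big_table')
  .then(rows => {
    console.log('got', rows.length, 'rows');
  })
  .catch(error => {
    // We end up here if Postgres refused the connection (e.g. its own
    // connection limit was hit) or the query failed; nothing "breaks",
    // the app keeps serving other requests.
    console.error('query failed:', error.message);
  });
```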

Mongoose Connection Pool

I noticed in the Mongoose docs that there is support for a connection pool.
http://mongoosejs.com/docs/connections.html
Considering that node is single threaded why is there a connection pool?
What's the lifecycle of connections in the pool?
Connection pools don't have anything to do with async vs sync -- it just works like so:
You can specify a number of open connections to maintain to your database (let's say 10).
Each time your Node JS code makes a query, if possible, it'll use one of the already-open 10 connections to make this request -- this way you can avoid the overhead of opening a new database connection for each query.
Maintaining a connection pool is essentially maintaining an array of db connection objects, and picking unused ones for every query. It's not actually affecting threads or processes at all =)
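A minimal sketch of what that looks like with mongoose; the URI, model and field names are placeholders, and depending on your mongoose/driver version the option is poolSize or maxPoolSize:

```js
const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/myapp', {
  poolSize: 10, // keep up to 10 open sockets to reuse across queries
});

// Every query picks a free socket from that pool of 10; nothing about this
// changes Node's single-threaded JavaScript execution.
const User = mongoose.model('User', new mongoose.Schema({ name: String }));
User.findOne({ name: 'alice' }).then(doc => console.log(doc));
```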
Node is apparently single-threaded, but internally, when Node makes an IO call, it has a threading mechanism under the hood through which it performs the IO. The main thread doesn't perform this IO operation itself; if it did, the system would grind to a halt.
https://codeburst.io/how-node-js-single-thread-mechanism-work-understanding-event-loop-in-nodejs-230f7440b0ea

Why are database connection pools better than a single connection?

I'm currently working on writing a multithreaded application that will need to access a database in order to serve requests. I see many people saying that using a pool of many persistent database connections is the way to go for this type of application, but I'm trying to wrap my head around why exactly this is the case.
Keep in mind that I'm designing this application in Erlang, so I'll be using threads/processes/workers a lot.
So let's compare two situations:
You have a single thread that owns a single database connection. All your client-handling-threads talk to this thread in order to make database queries.
You have a pool of threads, each with their own database connection. When a client-handling-thread wants to access the database, it gets one of these threads from the pool, and uses that to query the DB.
In the first case, I see many people saying that it is bad because having one thread handling all database-related queries will in turn cause a bottleneck. But my confusion is the following: wouldn't the bottleneck in that single thread actually be the database itself? If all that the thread is doing is querying the database through its connection handle, isn't waiting for the DB to respond to requests the main source of latency? How will throwing more connections and threads at this problem solve it?
The database probably has well-developed multithreading abilities. Using a connection pool allows you to:
Make use of the DB's multithreading / load-balancing ability
Avoid the overhead of setting up and tearing down connections over and over
When the database is serving multiple connections, it can make its own decisions on how to prioritize requests. Imagine this scenario:
User A requests a set of records from Table A with 100,000 rows
User B requests a set of records from Table B with 50 rows
User C updates Table A
If multiple connections are used, the DB can take advantage of the fact that (1) and (2) can occur concurrently, and User B gets his 50 records without having to wait for User A to get all 100,000 of his. Only User C has to wait for User A to finish.
Also, setting up and tearing down TCP connections is a relatively expensive task. Using a pool allows one user to release the resource without tearing down the TCP connection, so the next user doesn't have to wait for a new connection. Your single-threaded approach wouldn't benefit from this aspect of connection-pooling, though.
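This isn't the asker's Erlang setup, but a small node-postgres sketch shows both points: concurrent queries fan out across pooled connections, and those TCP connections are reused rather than re-established. The connection string and table names are placeholders:

```js
const { Pool } = require('pg');

const pool = new Pool({
  max: 10,
  connectionString: 'postgres://user:pass@localhost/mydb',
});

async function demo() {
  // Both queries run at the same time on two different pooled connections;
  // the 50-row query returns without waiting for the 100,000-row one.
  const [big, small] = await Promise.all([
    pool.query('SELECT * FROM table_a'), // ~100,000 rows
    pool.query('SELECT * FROM table_b'), // ~50 rows
  ]);
  console.log(big.rowCount, small.rowCount);

  // With a single shared connection, the second query could only be sent
  // after the first one finished.
  await pool.end(); // closes the pooled TCP connections on shutdown
}

demo();
```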

replicaset vs multi-mongos vs multiple connections

What is the difference, and why use each of these features of mongoose?
For now I just need a method to transfer a document from one database to another.
Replica-Set
A replica-set is two or more MongoDB servers that mirror the same data. Reads can be served by any member of the set, but writes can only be handled by a single server (the "Master" or "Primary").
An application can only connect to the replica-set members it knows, so you need to tell it the hostnames and ports of all of them. There are cases where you want to restrict an application to specific members. In that case you wouldn't tell them about the other servers.
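For example, with mongoose you list all the members you want the app to know about in the connection string; the hostnames and the set name rs0 here are placeholders:

```js
const mongoose = require('mongoose');

// The driver discovers which listed member is currently the Primary and
// routes writes there; reads also go to the Primary unless you set a
// readPreference.
mongoose.connect(
  'mongodb://mongo1:27017,mongo2:27017,mongo3:27017/mydb?replicaSet=rs0'
);
```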
Multiple mongos
Another feature to scale MongoDB across multiple servers is sharding. A sharded cluster consists of multiple replica-sets or stand-alone MongoDB servers, where each one holds only a part of the data. This improves both read and write performance but is technically more complex. When an application wants to connect to a cluster, it doesn't connect to the MongoDB processes directly. Each connection goes through a MongoDB router instead (mongos), which forwards each query to the mongods that are responsible for it. For increased performance and redundancy, a cluster can have multiple mongos servers. When this is the case, the clients should pick one at random for each connection.
Multiple connections
When your application opens multiple connections to the database, it can perform multiple requests in parallel. Usually the database driver should do this automatically, so you don't have to worry about this, unless you need to connect to multiple databases at the same time or you need connections with different connection settings for some reason.
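For the stated goal of transferring a document from one database to another, here is a sketch using two mongoose connections; the URIs, model and field names are placeholders:

```js
const mongoose = require('mongoose');

const sourceConn = mongoose.createConnection('mongodb://localhost:27017/source_db');
const targetConn = mongoose.createConnection('mongodb://localhost:27017/target_db');

// strict: false lets the copy carry over fields not listed in the schema.
const itemSchema = new mongoose.Schema({ name: String }, { strict: false });
const SourceItem = sourceConn.model('Item', itemSchema);
const TargetItem = targetConn.model('Item', itemSchema);

async function transfer(id) {
  const doc = await SourceItem.findById(id).lean(); // plain object, incl. _id
  if (!doc) return;
  await TargetItem.create(doc);             // write it into the other database
  await SourceItem.deleteOne({ _id: id });  // optional: remove the original
}
```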
