Mongoose Connection Pool - node.js

I noticed in the Mongoose docs that there is support for a connection pool.
http://mongoosejs.com/docs/connections.html
Considering that Node is single-threaded, why is there a connection pool?
What's the lifecycle of connections in the pool?

Connection pools don't have anything to do with async vs sync -- it just works like so:
You can specify a number of open connections to maintain to your database (let's say 10).
Each time your Node.js code makes a query, if possible, it'll use one of the already-open 10 connections to make the request -- this way you can avoid the overhead of opening a new database connection for each query.
Maintaining a connection pool is essentially maintaining an array of db connection objects and picking unused ones for every query. It's not actually affecting threads or processes at all =)
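For example, with Mongoose the pool size is set when you connect. This is a minimal sketch, not the only way to do it: the URI and model are made up, and the option is named maxPoolSize in recent Mongoose/driver versions (older releases called it poolSize).

const mongoose = require('mongoose');

// Hypothetical local database; keep up to 10 sockets open and reuse them.
mongoose.connect('mongodb://localhost:27017/mydb', { maxPoolSize: 10 });

// A throwaway model just to show queries sharing the pooled sockets.
const Cat = mongoose.model('Cat', new mongoose.Schema({ name: String }));

// Each query borrows an idle socket from the pool and returns it when done.
Cat.find({ name: 'Whiskers' }).then(cats => console.log(cats.length));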

Node is single-threaded at the JavaScript level, but under the hood, when Node makes an I/O call, it uses a threading mechanism to perform the I/O. The main thread doesn't perform the I/O itself; if it did, the system would grind to a halt.
https://codeburst.io/how-node-js-single-thread-mechanism-work-understanding-event-loop-in-nodejs-230f7440b0ea

Related

How many sessions will be created using a single pool?

I am using Knex version 0.21.15 from npm. My pooling parameter is pool: {min: 3, max: 300}.
Oracle is my database server.
pool: Is this the pool count or the session count?
If it is the pool count, how many sessions can be created using a single pool?
If I run one non-transactional query 10 times using a Knex connection, how many sessions will be created?
And when will the created sessions be cleared from the Oracle session list?
Is there any parameter available to remove idle sessions from Oracle?
Please advise.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool: Is this the pool count or the session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is the pool count, how many sessions can be created using a single pool?
Up to the pool.max value.
If I run one non-transactional query 10 times using a Knex connection, how many sessions will be created?
It depends on concurrency. If your tenth query starts before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when will the created sessions be cleared from the Oracle session list?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove idle sessions from Oracle?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.
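As a rough sketch of what that looks like in practice (the connection settings and table names here are made up), the defaults go straight into the knex configuration, and knex then handles the grab-and-release cycle described above for you:

const knex = require('knex')({
  client: 'oracledb',
  connection: { user: 'app', password: 'secret', connectString: 'dbhost/XEPDB1' },
  pool: { min: 2, max: 10 },   // the defaults; raise max only if measurements justify it
});

// An ordinary query borrows a pooled connection and releases it as soon as it settles.
knex('employees').select('id', 'name').then(rows => console.log(rows.length));

// A transaction holds one pooled connection until it commits or rolls back.
knex.transaction(async (trx) => {
  await trx('employees').insert({ id: 1, name: 'Alice' });
});

// On shutdown, close every connection in the pool:
// knex.destroy();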

Why do we need a connection pool in Node.js when Node is single-threaded?

Node.js is single-threaded. The JavaScript V8 engine and some of the internal libraries are multi-threaded. For I/O, Node delegates the work to the OS, which may be multi-threaded.
If my Node.js application is connecting to a Redis or MySQL/MariaDB server, I assume I should not need a connection pool for Redis or MySQL.
As a developer, I create one Redis or MySQL connection and reuse it to send and receive data. When data arrives, Node invokes the callback to process it.
I understand connection pooling with Java/.NET but they are multi-threaded and so connection pooling in Java/.NET has clear benefit.
My question is: why do we need a connection pool in Node.js when Node is single-threaded? Is there any benefit to it? Will Node not leverage the multi-threading features of the underlying OS and JavaScript engine without the developer having to do it?
Thanks
Node runs your code single-threaded. However, Node.js actually has a thread pool at its disposal that your code does not have access to. The threading mechanisms are implemented with libuv. Take a look at the libuv book; it's in-depth and explains the inner workings of libuv.
Basically, your code is run within the context of the Event Loop (single thread). Any asynchronous work is then offloaded to an available thread from the pool and the Event Loop will just poll until that asynchronous work is completed by one of the threads. Once done, the callback function registered by your async call is then invoked and will get worked on during the next I/O Callback Phase of the event loop. You can read more about the Event Loop and its phases in the Node.js docs.
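A tiny illustration of that hand-off (the file path is arbitrary): the read is performed off the main thread, so the event loop keeps running and the callback fires in a later iteration.

const fs = require('fs');

// The read is handed off to libuv; the event loop keeps running while it is in flight.
fs.readFile('/etc/hosts', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('read finished:', data.length, 'bytes'); // runs in a later loop iteration
});

console.log('this line prints before the file contents arrive');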
One of the benefits of building an application with this Event Loop style is the abstraction away of the critical-section coding (mutexes, semaphores, etc.) that is normally associated with multi-threaded applications.
You need connection pools because even a single thread can hold multiple "blocking" DB connections (assuming RDBMS here). Without a connection pool, your app will create each connection from scratch for every additional DB request, even in a async / non-blocking system like Node.
Example:
request 1 - insert user -- wait for response (assume it's 5 secs)
request 2 - insert invoice - wait for response (assume it's 3 secs)
request 3 - insert another invoice
Notice that request 3 is processed right away, without waiting for requests 1 and 2 to complete. Right here in this single thread, we've already used three resources to connect to the DB. Imagine having to create each one from scratch every time you need a DB operation. It's much faster to just grab one from a connection pool!
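As a sketch with the mysql module (the credentials and tables are hypothetical, just mirroring the three requests above), all three inserts can be in flight at the same time, each on its own pooled connection:

const mysql = require('mysql');

const pool = mysql.createPool({
  connectionLimit: 10,   // keep up to 10 connections open and reuse them
  host: 'localhost',
  user: 'app',
  password: 'secret',
  database: 'shop',
});

// Each call grabs a free pooled connection, runs, and releases it when finished.
pool.query('INSERT INTO users SET ?', { name: 'alice' }, (err) => { if (err) throw err; });   // request 1
pool.query('INSERT INTO invoices SET ?', { total: 10 }, (err) => { if (err) throw err; });    // request 2
pool.query('INSERT INTO invoices SET ?', { total: 20 }, (err) => { if (err) throw err; });    // request 3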

Check queued reads/writes for MongoDB

I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using Redis, and Redis has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if Redis is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MongoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or is one connection opened for each query/write, until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!
MongoDB connection pool workflow:
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance opens one additional socket per server in your MongoDB topology for monitoring the server's state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
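The snippets above are for PyMongo; since this question is about a Node.js service, here is a rough equivalent with the official Node driver. The URI is hypothetical and the option names follow recent driver versions (older releases exposed the pool size as poolSize instead of maxPoolSize). Setting waitQueueTimeoutMS lets an operation fail fast when the pool is exhausted, so the service can return a 503 promptly instead of piling up work.

const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017', {
  maxPoolSize: 50,           // cap on concurrently open sockets
  waitQueueTimeoutMS: 100,   // error out instead of queueing forever when the pool is exhausted
});

client.connect()
  .then(() => client.db('myapp').collection('items').findOne({}))
  .then((doc) => console.log(doc))
  .catch((err) => console.error('pool exhausted or server unreachable:', err.message));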
Reference for connection pooling:
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/

When should I use connection pooling with MySQL in NodeJS

Working with the Node-MySql module:
From my understanding, multithreaded programs benefit more from pooling connections than single-threaded ones. Is this true?
And if this logic proves true, what scenario is connection pooling beneficial in a Node.JS application?
Whether single or multithreaded, pooling can still be beneficial in allowing open connections to be reused rather than being closed only to open another immediately after:
When you are done with a connection, just call connection.release() and the connection will return to the pool, ready to be used again by someone else.
The added benefit with multithreading is that the pool can also manage multiple, concurrent connections:
Connections are lazily created by the pool. If you configure the pool to allow up to 100 connections, but only ever use 5 simultaneously, only 5 connections will be made.
Though, to be clear, Node is multithreaded. It just uses a different model from the typical one -- one "application" thread which executes JavaScript and multiple "worker" threads handling the brunt of the asynchronous I/O.
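For reference, a minimal node-mysql sketch of the acquire-and-release cycle quoted above (the credentials are made up):

const mysql = require('mysql');

// Connections are created lazily, up to connectionLimit.
const pool = mysql.createPool({
  connectionLimit: 100,
  host: 'localhost',
  user: 'app',
  password: 'secret',
  database: 'shop',
});

pool.getConnection((err, connection) => {
  if (err) throw err;
  connection.query('SELECT 1 + 1 AS two', (queryErr, rows) => {
    connection.release();        // hand the connection back to the pool for reuse
    if (queryErr) throw queryErr;
    console.log(rows[0].two);    // 2
  });
});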

JDBC: Can I share a connection in a multithreading app, and enjoy nice transactions?

It seems like the classical way to handle transactions with JDBC is to set auto-commit to false. This creates a new transaction, and each call to commit marks the beginning of the next transaction.
In a multithreaded app, I understand that it is common practice to open a new connection for each thread.
I am writing an RMI-based multi-client server application, so basically my server seamlessly spawns one thread for each new connection.
To handle transactions correctly, should I go and create a new connection for each of those threads?
Isn't the cost of such an architecture prohibitive?
Yes, in general you need to create a new connection for each thread. You don't have control over how the operating system timeslices execution of threads (notwithstanding defining your own critical sections), so you could inadvertently have multiple threads trying to send data down that one pipe.
Note the same applies to any network communications. If you had two threads trying to share one socket with an HTTP connection, for instance.
Thread 1 makes a request
Thread 2 makes a request
Thread 1 reads bytes from the socket, unwittingly reading the response from thread 2's request
If you wrapped all your transactions in critical sections, and therefore lock out any other threads for an entire begin/commit cycle, then you might be able to share a database connection between threads. But I wouldn't do that even then, unless you really have innate knowledge of the JDBC protocol.
If most of your threads have infrequent need for database connections (or no need at all), you might be able to designate one thread to do your database work, and have other threads queue their requests to that one thread. That would reduce the overhead of so many connections. But you'll have to figure out how to manage connections per thread in your environment (or ask another specific question about that on StackOverflow).
update: To answer your question in the comment, most database brands don't support multiple concurrent transactions on a single connection (InterBase/Firebird is the only exception I know of).
It'd be nice to have a separate transaction object, and to be able to start and commit multiple transactions per connection. But vendors simply don't support it.
Likewise, standard vendor-independent APIs like JDBC and ODBC make the same assumption, that transaction state is merely a property of the connection object.
It's uncommon practice to open a new connection for each thread.
Usually you use a connection pool, for example the c3p0 library.
If you are in an application server, or using Hibernate for example, look at the documentation and you will find how to configure the connection pool.
The same connection object can be used to create multiple statement objects, and these statement objects can then be used by different threads concurrently. Most modern DBs interfaced by JDBC can do that, so JDBC is able to make use of concurrent cursors. PostgreSQL is no exception here; see for example:
http://doc.postgresintl.com/jdbc/ch10.html
This allows connection pooling where the connection is only used for a short time, namely to create the statement object, after which it is returned to the pool. This short-time pooling is only recommended when the JDBC driver also parallelizes statement operations; otherwise normal connection pooling might show better results. Either way, the thread can continue working with the statement object and close it later, but not the connection.
1. Thread 1 opens a statement
2. Thread 2 opens a statement
3. Thread 1 does something; Thread 2 does something
4. ... ...
5. Thread 1 closes its statement ...
6. Thread 2 closes its statement
The above only works in auto-commit mode. If transactions are needed, there is still no need to tie the transaction to a thread. You can just partition the pooling along the transactions and use the same approach as above. This is needed not because of some socket connection limitation, but because JDBC then equates the session ID with the transaction ID.
If I remember well, there should be APIs and products around with a less simplistic design, where the session ID and the transaction ID are not equated. With these APIs you could write your server with one single database connection object, even when it does transactions. I will need to check and tell you later what these APIs and products are.
