Working with the Node-MySql module:
From my understanding, multithreaded programs benefit more from connection pooling than single-threaded ones. Is this true?
And if so, in what scenario is connection pooling beneficial in a Node.js application?
Whether single- or multithreaded, pooling can still be beneficial by allowing open connections to be reused rather than closed, only for another to be opened immediately after:
When you are done with a connection, just call connection.release() and the connection will return to the pool, ready to be used again by someone else.
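A minimal sketch of that flow with node-mysql (the host, credentials, and query are assumptions):

const mysql = require('mysql');

// Up to 10 connections, created lazily as they are needed.
const pool = mysql.createPool({
  connectionLimit: 10,
  host: 'localhost',     // assumed settings
  user: 'app',
  password: 'secret',
  database: 'shop'
});

pool.getConnection(function (err, connection) {
  if (err) throw err;
  connection.query('SELECT 1 + 1 AS two', function (error, results) {
    // Return the connection to the pool for reuse instead of closing it.
    connection.release();
    if (error) throw error;
    console.log(results[0].two);
  });
});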
The added benefit with multithreading is that the pool can also manage multiple, concurrent connections:
Connections are lazily created by the pool. If you configure the pool to allow up to 100 connections, but only ever use 5 simultaneously, only 5 connections will be made.
Though, to be clear, Node is multithreaded. It just uses a different model than is typical -- one "application" thread which executes JavaScript, and multiple "worker" threads handling the brunt of asynchronous I/O.
I implemented a connection pool (poolMin=poolMax=10) with node-oracledb and I saw a difference of up to 100 times, especially with few users, like 10. Really impressive. I also increased UV_THREADPOOL_SIZE to 4 + poolMax. At this point there are some things I could not understand.
process.env.UV_THREADPOOL_SIZE = 4 + config.pool.poolMax // Default + Max
Node.js runs JavaScript as a single thread (with an additional 4 worker threads, none of which are used for network I/O). So when I use a pool with 10 connections, can that single thread use all of these connections? Or isn't it single-threaded anymore with these settings, since I added 10 more to UV_THREADPOOL_SIZE? I would be grateful to anyone who can explain this.
By the way, I wonder if a fixed-size pool like 10 would cause a problem with too many users? For example, if the number of concurrent users is normally 500, we can reach 5000 concurrent users on certain days of the year. Do I need a special setting (e.g. pool size 100) for those days, or will the default be enough?
Thanks in advance.
When you do something like connection.execute(), that work will be handled by a Node.js worker thread until the call completes. And each underlying Oracle connection can only ever do one 'thing' (like execute, or fetch LOB data) at a time -- this is a fundamental (i.e. insurmountable) behavior of Oracle connections.
For node-oracledb you want the number of worker threads to be at least as big as the number of connections in the connection pool, plus some extra for non-database work. This allows connections to do their thing without blocking any other connection.
Any use of Promise.all() (and similar constructs) on a single connection should be assessed and considered for rewriting as a simple loop, as in the sketch below. Prior to node-oracledb 5.2, each of the 'parallel' operations of a Promise.all() on a single connection uses a thread, but each will be blocked waiting for prior work on the connection to complete, so you might need even more threads available. From 5.2 onwards, any 'parallel' operations on a single connection are queued in the JavaScript layer of node-oracledb and executed sequentially, so you will only need one worker thread per connection at most. In either version, using Promise.all() where each unit of work has its own connection is different, and is only subject to the one-thread-per-connection requirement.
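Here is a hedged sketch of that rewrite with node-oracledb (credentials, connect string, and query are assumptions):

const oracledb = require('oracledb');

async function run() {
  const pool = await oracledb.createPool({
    user: 'scott',                      // assumed credentials
    password: 'tiger',
    connectString: 'localhost/XEPDB1',
    poolMin: 10,
    poolMax: 10
  });

  const connection = await pool.getConnection();
  try {
    // A simple sequential loop instead of Promise.all() on one connection:
    // the connection does one thing at a time, tying up one worker thread.
    for (const id of [1, 2, 3]) {
      await connection.execute('SELECT name FROM employees WHERE id = :id', [id]);
    }
  } finally {
    await connection.close(); // return the connection to the pool
  }
  await pool.close();
}

run();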
Check the node-oracledb documentation Connections, Threads, and Parallelism and Connection Pool Sizing.
Separate from how connections are used, you first have to get a connection. node-oracledb will queue connection pool requests (e.g. pool.getConnection()) if every connection in the pool is already in use. This provides some resiliency under connection spikes. There are some limits to help with real storms: queueMax and queueTimeout. Yes, at peak periods you might need to increase the poolMax value. You can check the pool statistics to see how the pool behaves. You don't want to make the pool too big -- see the doc.
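For illustration, a sketch of those queue limits at pool creation (the values shown are assumptions, not recommendations):

const oracledb = require('oracledb');

async function createLimitedPool() {
  return oracledb.createPool({
    user: 'scott',             // assumed credentials
    password: 'tiger',
    connectString: 'localhost/XEPDB1',
    poolMax: 10,
    queueMax: 500,             // reject getConnection() calls once 500 are queued
    queueTimeout: 60000,       // a queued request fails after 60 seconds
    enableStatistics: true     // node-oracledb 5.2+: lets you inspect pool statistics
  });
}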
Side note: process.env.UV_THREADPOOL_SIZE doesn't have an effect in Node.js on Windows; the UV_THREADPOOL_SIZE variable must be set before Node.js is started.
Using worker_threads from Node 12, is it suitable to establish remote connections within the workers and keep those connections alive?
I don't mean sharing the socket between the master and the workers like we could do with node cluster and fork.
The idea would be to have pools of secure connections already established within the workers to use if needed.
Let's say I have a pool of 10 workers. When a worker is created, some pre-established TLS connections (streams) are created to servers X, Y and Z, and the worker is marked as "ready".
Each time I use a worker to process "heavy" tasks (mapReduce, etc.), if I need to post or fetch data to/from server X, Y or Z during processing,
I use the appropriate TLS connection already established in the pool.
Once the task is completed, the result is returned to the master and the worker just executes the next task.
1) Do you see any side effects or impact of doing so?
2) Would it be better to have the pool of TLS connections on the "main thread" (master)? If remote data is needed within the workers during a task, use the postMessage method to communicate with the master (and vice versa).
Thanks
Worker threads have no built-in support for remote connections. However, you can build your own system that works along those lines using TLS sockets. In such a system I would definitely recommend keeping these connections alive: there is significant latency in setting them up, and keeping them active in memory uses a minimal amount of resources.
Keep in mind that a system like this has some drawbacks:
You are working with different machines, and each of these machines can have its own set of failure conditions.
You are communicating over a network, connections with remote servers might suddenly drop, for any reason imaginable.
You are increasing the physical distance, which will add latency.
So keep this in the back of your mind.
Would I recommend building a system like this? It is really hard to say; it depends on your use case, time, and money. You mentioned the cluster nodes are processing 'heavy tasks', by which I reckon you mean CPU/GPU-intensive tasks. So a system like this might be a good solution; however, a simple REST API in front of your processing servers might be good enough. Or maybe even database-synchronized servers that just check the database for tasks to execute.
There are many solutions to the same problem; you just have to consider what works best for your project(s).
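A hedged sketch of the worker-side pattern discussed above, using worker_threads and tls (the host name and task payload are assumptions):

const { Worker, isMainThread, parentPort } = require('worker_threads');
const tls = require('tls');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', (msg) => {
    if (msg === 'ready') {
      worker.postMessage({ task: 'heavy-task' }); // hypothetical task payload
    }
  });
} else {
  // Pre-establish a TLS connection inside the worker and keep it alive.
  const socket = tls.connect({ host: 'server-x.example.com', port: 443 }, () => {
    parentPort.postMessage('ready'); // mark this worker as "ready"
  });
  socket.setKeepAlive(true);

  parentPort.on('message', ({ task }) => {
    // ... do the heavy work here, using `socket` for remote data as needed ...
  });
}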
Node.js runs your JavaScript single-threaded. The V8 engine and some of the internal libraries are multithreaded, and for I/O, Node delegates to the OS, which may be multithreaded.
If my Node.js application is connecting to a Redis or MySQL/MariaDB server, I assume I should not need a connection pool for Redis or MySQL.
As a developer, I create one Redis or MySQL connection and reuse it to send and receive data. When data arrives, Node will invoke the callback to process it.
I understand connection pooling with Java/.NET but they are multi-threaded and so connection pooling in Java/.NET has clear benefit.
My question is: why do we need a connection pool in Node.js when Node is single-threaded? Is there any benefit to it? Will Node not leverage the multithreading features of the underlying OS and JavaScript engine without the developer having to do it?
Thanks
Node runs your code single-threaded. However, Node.js actually has a thread pool at its disposal that your code does not have direct access to. The threading mechanisms are implemented with libuv. Take a look at the libuv book; it is in-depth and explains the inner workings of libuv.
Basically, your code runs within the context of the Event Loop (a single thread). Any asynchronous work is then offloaded to an available thread from the pool, and the Event Loop just polls until that asynchronous work is completed by one of the threads. Once done, the callback function registered by your async call is invoked and gets worked on during the next I/O callback phase of the event loop. You can read more about the Event Loop and its phases in the Node.js docs.
One of the benefits of building an application in this Event Loop style is the abstraction away of the critical-section coding (mutexes, semaphores, etc.) that is normally associated with multithreaded applications.
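A small sketch showing that thread pool at work: CPU-heavy pbkdf2 hashing is offloaded to libuv's worker threads (4 by default), so the event loop stays free while they compute. The parameters are arbitrary example values.

const crypto = require('crypto');

const start = Date.now();
// Four async pbkdf2 calls run in parallel on the default 4 libuv threads.
for (let i = 0; i < 4; i++) {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
    console.log(`hash ${i} done after ${Date.now() - start} ms`);
  });
}
console.log('the event loop is not blocked; this line prints first');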
You need connection pools because even a single thread can hold multiple "blocking" DB connections (assuming an RDBMS here). Without a connection pool, your app will create each connection from scratch for every additional DB request, even in an async/non-blocking system like Node.
Example:
request 1 - insert user -- wait for response (assume it's 5 secs)
request 2 - insert invoice - wait for response (assume it's 3 secs)
request 3 - insert another invoice
Notice that request 3 is processed right away, without waiting for requests 1 and 2 to complete. Right here in this single thread, we've already used three connections to the DB. Imagine having to create each one every time you need a DB operation. It's much faster to just grab one from a connection pool!
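A sketch of that example against a node-mysql pool (table and column names are assumptions); the three queries are issued back-to-back and the pool hands each one a free connection:

const mysql = require('mysql');

const pool = mysql.createPool({
  connectionLimit: 10,
  host: 'localhost',    // assumed settings
  user: 'app',
  password: 'secret',
  database: 'shop'
});

// None of these waits for the previous one; each gets its own pooled connection.
pool.query('INSERT INTO users SET ?', { name: 'alice' }, () => console.log('request 1 done'));
pool.query('INSERT INTO invoices SET ?', { total: 10 }, () => console.log('request 2 done'));
pool.query('INSERT INTO invoices SET ?', { total: 20 }, () => console.log('request 3 done'));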
We have a set of micro-services that I'd like to load test in a manner that is consistent with how they are accessed.
After settling on Locust as my tool of choice, I found out that the underlying TCP connection handling uses connection pooling, because I keep seeing messages like this:
WARNING/requests.packages.urllib3.connectionpool: Connection pool is full, discarding connection:
As I understand it, this message is telling me that it discards a connection from the pool that it manages. I assume that it then creates a new connection and adds it in place of the one that it discarded.
Is that what it does?
Does it do this without the connection failing?
I don't think that our micro-services keep any sessions open. Connections are made, from a far end, to our services, which return a result, and then the connection is closed. So the test is handling connections differently from the way the services are actually used. Is there a way to get the requests library to not use a pool, and to go through the work of setting up and tearing down every connection it makes?
Is there any reason why we wouldn't want to test this way?
If it is preferable to test with a connection pool, how should I account for the difference in load compared to production, where it's not done this way?
That's correct. Unless you set the urllib3 pool to blocking, it will create more connections than the pool is configured to hold, as needed, and then discard them once the request is done.
This often happens when you have more threads using a pool than the number of connections the pool is configured to store. urllib3 takes a maxsize parameter (defaults to 1) which you can set to the number of threads you're running. For requests, you'll need to make a custom adapter to do this. See:
https://stackoverflow.com/a/18845952/187878
https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/
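A hedged sketch of that custom adapter with requests (the pool sizes and URL are assumptions to be tuned to your thread count):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_maxsize should roughly match the number of threads using the session.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=100)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('http://example.com/api/health')  # hypothetical endpoint
print(response.status_code)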
That said, it's merely a warning, which some people ignore, so it's not a failure. But if this happens a lot in production, it probably means you should tweak your configuration, because creating and discarding new connections all the time is fairly costly.
In general, it's a good idea to re-use connections for this reason.
My suggestions would be in this order:
Re-use connections, or
Increase the number of connections that get pooled to match the number of threads, or
Disable the warning if you'd rather not deal with it.
I noticed in the Mongoose docs that there is support for a connection pool.
http://mongoosejs.com/docs/connections.html
Considering that Node is single-threaded, why is there a connection pool?
What's the lifecycle of connections in the pool?
Connection pools don't have anything to do with async vs. sync -- they just work like so:
You can specify an amount of open connections to maintain to your database (let's say 10).
Each time your Node JS code makes a query, if possible, it'll use one of the already-open 10 connections to make this request -- this way you can avoid the overhead of opening a new database connection for each query.
Maintaining a connection pool is essentially maintaining an array of DB connection objects and picking an unused one for every query. It's not actually affecting threads or processes at all =)
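For illustration, a minimal sketch of configuring that pool when connecting with Mongoose (the URI is an assumption; in Mongoose 5.x the option is poolSize, in 6.x+ it is maxPoolSize):

const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/test', {
  maxPoolSize: 10 // keep up to 10 sockets open to MongoDB and reuse them per query
});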
Apparently Node is single-threaded, but internally, when Node makes an I/O call, under the hood it has a threading mechanism through which it performs the I/O. The main thread doesn't perform the I/O operation itself; if it were doing the I/O, the system would be dead already.
https://codeburst.io/how-node-js-single-thread-mechanism-work-understanding-event-loop-in-nodejs-230f7440b0ea