Redis number of connections vs Single Thread - multithreading

I read that Redis is Single Thread.
Using jedis client (java) we can configure pool connections, like as:
spring.redis.jedis.pool.max-active=8 # Maximum number of connections that can be allocated by the pool at a given time. Use a negative value for no limit.
spring.redis.jedis.pool.max-idle=8 # Maximum number of "idle" connections in the pool. Use a negative value to indicate an unlimited number of idle connections.
spring.redis.jedis.pool.max-wait=-1ms # Maximum amount of time a connection allocation should block before throwing an exception when the pool is exhausted. Use a negative value to block indefinitely.
spring.redis.jedis.pool.min-idle=0 # Target for the minimum number of idle connections to maintain in the pool. This setting only has an effect if it is positive.
I know that pool connection is important to get a connect already opened, therefore not spending time to connect again.
Imagine that 8 "client request" uses all available pools, so 8 connections will be used, these clients do a "GET" command in Redis.
Redis will be process one thread per time? Each thread needs wait for other to finish since Redis is Single Thread? In this case when 1 "GET" it is processing other 7 are in Redis queue?
How max-active impacts Redis performance if Redis is Single Thread?

Only main loop of Redis is single thread. It uses client buffers in separate threads to speed up processing. Most of the time you spend getting data from Redis is networking time. They help with that. This means unless you are storing really big objects, you will get almost linear throughput increase.
Here I measured different setups for Lettuce (another client). I expect 'pooled' setup to be very similar to what you will get with Jedis: Redis is single thread. Then why should I use lettuce?

Related

Should I use BLPOP, or yield-based busy waiting on hundreds of redis keys?

The former sounds good, but I have the following concerns:
blocking hundreds of connections might be wasteful, e.g., io multiplexing strategy used, and
the redis server should deal with more concurrent connections since each is long-running.
The alternative approach might be the latter:
instead of busy waiting indefinitely, yield the thread at every N-th iteration.
Note that the number of connections would increase in proportion to the number of instances. Outside the two, a fixed pool of BLPOP executors can be introduced, but that could easily be the bottleneck if some of the redis list is idle.
I/O multiplexing is not busy waiting
The BLPOP command is an actual command sent to Redis server which involves I/O multiplexing and not busy waiting in the client code
I don't see any point in busy waiting with or without thread yield
For an alternative I suggest using PUB/SUB function supported by Redis, just in case you need to wait on some other events beside list being empty

How many session will create using single pool?

I am using Knex version 0.21.15 npm. my pooling parameter is pool {min: 3 , max:300}.
Oracle is my data base server.
pool Is this pool count or session count?
If it is pool, how many sessions can create using a single pool?
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
And when the created session will cleared from oracle session?
Is there any parameter available to remove the idle session from oracle.?
suggest me please if any.
WARNING: a pool.max value of 300 is far too large. You really don't want the database administrator running your Oracle server to distrust you: that can make your work life much more difficult. And such a large max pool size can bring the Oracle server to its knees.
It's a paradox: often you can get better throughput from a database application by reducing the pool size. That's because many concurrent queries can clog the database system.
The pool object here governs how many connections may be in the pool at once. Each connection is a so-called serially reusable resource. That is, when some part of your nodejs program needs to run a query or series of queries, it grabs a connection from the pool. If no connection is already available in the pool, the pooling stuff in knex opens a new one.
If the number of open connections is already at the pool.max value, the pooling stuff makes that part of your nodejs program wait until some other part of the program finishes using a connection in the pool.
When your part of the nodejs program finishes its queries, it releases the connection back to the pool to be reused when some other part of the program needs it.
This is almost absurdly complex. Why bother? Because it's expensive to open connections and much cheaper to re-use them.
Now to your questions:
pool Is this pool count or session count?
It is a pair of limits (min / max) on the count of connections (sessions) open within the pool at one time.
If it is pool, how many sessions can create using a single pool?
Up to the pool.max value.
If i run one non transaction query 10 time using knex connection ,how many sessions will create?
It depends on concurrency. If your tenth query before the first one completes, you may use ten connections from the pool. But you will most likely use fewer than that.
And when the created session will cleared from oracle session?
As mentioned, the pool keeps up to pool.max connections open. That's why 300 is too many.
Is there any parameter available to remove the idle session from oracle.?
This operation is called "evicting" connections from the pool. knex does not support this. Oracle itself may drop idle connections after a timeout. Ask your DBA about that.
In the meantime, use the knex defaults of pool: {min: 2, max: 10} unless and until you really understand pooling and the required concurrency of your application. max:300 would only be justified under very special circumstances.

Node.js Oracle Database Connection Pool Size

I implemented a connection pool (poolMin=poolMax=10) with node-oracledb and i saw a difference of up to 100 times especially in case of few users like 10. Really impressive. I also increased UV_THREADPOOL_SIZE like 4 + poolMax. At this point I could not understand somethings.
process.env.UV_THREADPOOL_SIZE = 4 + config.pool.poolMax // Default + Max
NodeJs works as Single Thread (with additional 4 threads those none of them are used for network I/O). So when i use a pool with 10 connections, can Single Thread use all of these connections? Or isn't it Single Thread with these settings anymore? Because i added 10 more to UV_THREADPOOL_SIZE. I would be grateful to anyone who explained this matter.
Btw, I wonder if using a fixed number pool like 10 would cause a problem in case of too many users? For example, if the number of instant users is 500, we can reach 5000 instant users on certain days of the year. Do I need to make a special setting (e.g. pool size 100) for those days or will the default be enough?
Thanks in advace.
When you do something like connection.execute(), that work will handled by a Node.js worker thread until the call completes. And each underlaying Oracle connection can only ever do one 'thing' (like execute, or fetch LOB data) at a time - this is a fundamental (i.e. insurmountable) behavior of Oracle connections.
For node-oracledb you want the number of worker threads to be at least as big as the number of connections in the connection pool, plus some extra for non database work. This allows connections to do their thing without blocking any other connection.
Any use of Promise.all() (and similar constructs) using a single connection should be assessed and considered for rewriting as a simple loop. Prior to node-oracledb 5.2, each of the 'parallel' operations for Promise.all() on a single connection will use a thread but this will be blocked waiting for prior work on the connection to complete, so you might need even more threads available. From 5.2 onwards any 'parallel' operations on a single connection will be queued in the JavaScript layer of node-oracledb and will be executed sequentially, so you will only need a worker thread per connection at most. In either version, using Promise.all() where each unit of work has its own connection is different, and only subject to the one-connection per thread requirements.
Check the node-oracledb documentation Connections, Threads, and Parallelism and Connection Pool Sizing.
Separate to how connections are used, first you have to get a connection. Node-oracledb will queue connection pool requests (e.g. pool.getConnection()) if every connection in the pool is already in use. This provides some resiliency under connection spikes. There are some limits to help real storms: queueMax and queueTimeout. Yes, at peak periods you might need to increase the poolMax value. You can check the pool statistics to see pool behavior. You don't want to make the pool too big - see the doc.
Side note: process.env.UV_THREADPOOL_SIZE doesn't have an effect in Node.js on Windows; the UV_THREADPOOL_SIZE variable must be set before Node.js is started.

nodejs | worker_thread | keep alive tcp connection within workers?

Using worker_threads from node 12, is it suitable to establish remote connection within the workers and keep those connection alive ?
I don't mean sharing the socket between the master and the workers like we could do with node cluster and fork.
The idea would be to have pools of secure connections already established within the workers to use if needed.
Let say I have a pool of 10 workers. When a worker is created, some pre-established "TLS" connection are created (streams) to server X,Y amd Z, and the worker is marked as "ready"
Each time that I use a worker to process "heavy" tasks (mapReduce, etc, ) and if I need to post data or get data to/from server X,Y or Z during the process,
I use the appropriate "TLS" connection already established from the pool.
Once the task completed, the result is return to the master and the worker just execute a new/next tasks.
1 ) Do you see any side effect / impact of doing so ?
2 ) would it be better to have the pool of "TLS" connection on the "main thread" (master) . If "remote" data are needed within the workers during the tasks, use the "postMessage" method to communicate with the "master" ( and vice/versa ).
Thanks
Worker Threads do not work for remote connections. However, you can build your own system that would work similar using TLS sockets. In a case of such a system I would definitely recommend keeping these types of connections alive. There is a significant latency in setting up these connections, and having these connections active in memory, will use a minimum amount of resources.
Keep in mind that a system like this has some drawbacks:
You are working with different machines, and each of these machines can have its own set of failure conditions.
You are communicating over a network, connections with remote servers might suddenly drop, for any reason imaginable.
You are increasing the physical distance, this will cause latency.
So keep this in the back of your mind.
Would I recommend building a system like this. It is really hard to determine and it relies on your use case, time and money. You mentioned the cluster nodes are processing 'heavy tasks', and with that I reckon CPU / GPU intensive tasks. So a system like this might be a good solution, however, a simple rest API in front of your processing servers might be good enough. Or maybe even database synchronized servers, that just check the database for tasks to execute.
There are many solutions for the same problem, just have to consider what works best for your project(s).

Check queued reads/writes for MongoDB

I feel like this question would have been asked before, but I can't find one. Pardon me if this is a repeat.
I'm building a service on Node.js hosted in Heroku and using MongoDB hosted by Compose. Under heavy load, the latency is most likely to come from the database, as there is nothing very CPU-heavy in the service layer. Thus, when MongoDB is overloaded, I want to return an HTTP 503 promptly instead of waiting for a timeout.
I'm also using REDIS, and REDIS has a feature where you can check the number of queued commands (redisClient.command_queue.length). With this feature, I can know right away if REDIS is backed up. Is there something similar for MongoDB?
The best option I have found so far is polling the server for status via this command, but (1) I'm hoping for something client side, as there could be spikes within the polling interval that cause problems, and (2) I'm not actually sure what part of the status response I want to act on. That second part brings me to a follow up question...
I don't fully understand how the MondoDB client works with the server. Is one connection shared per client instance (and in my case, per process)? Are queries and writes queued locally or on the server? Or, is one connection opened for each query/write, until the database's connection pool is exhausted? If the latter is the case, it seems like I might want to keep an eye on the open connections. Does the MongoDB server return such information at other times, besides when polled for status?
Thanks!
MongoDB connection pool workflow-
Every MongoClient instance has a built-in connection pool. The client opens sockets on demand to support the number of concurrent MongoDB operations your application requires. There is no thread-affinity for sockets.
The client instance, opens one additional socket per server in your MongoDB topology for monitoring the server’s state.
The size of each connection pool is capped at maxPoolSize, which defaults to 100.
When a thread in your application begins an operation on MongoDB, if all other sockets are in use and the pool has reached its maximum, the thread pauses, waiting for a socket to be returned to the pool by another thread.
You can increase maxPoolSize:
client = MongoClient(host, port, maxPoolSize=200)
By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:
client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)
Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
Reference for connection pooling-
http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/

Resources