pgBouncer - Aurora DB cluster not load balancing correctly - amazon-rds

I am using an Aurora DB cluster with 2 readers and pgBouncer to maintain a connection pool.
My application is very read-intensive and fires a lot of SELECT queries.
The problem I am facing is that my 2 read replicas are not being used in parallel.
I can see a trend where all connections move to one replica while the other replica serves 0 connections; after some time the situation flips and the second replica serves all connections while the first serves 0.
I investigated this and found that Aurora cluster load balancing is done by time slicing in 1-second intervals.
My guess is that when pgBouncer creates the connection pool, all connections are created within a 1-second window and so they all end up on one read replica.
Is there any way I can correct this?

The DB endpoint is a Route 53 DNS record, and load balancing is basically done via DNS round robin each time you resolve the name. When you use pgBouncer, is it resolving the DNS once and opening all connections to the resolved IP? If yes, then it is expected that all your connections end up on the same instance. You could fix this conceptually in multiple ways (I'm not too familiar with pgBouncer), but you basically need to either make the library re-resolve the DNS for each connection, or explicitly add all the instance endpoints to the configuration. The latter is not recommended if you plan on issuing writes through this connection pool: you don't have any control over which instance stays the writer, so you may inadvertently end up sending your writes to a replica.
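To make the two ideas above concrete, here is a minimal pgbouncer.ini sketch. Everything in it is an assumption for illustration: the host names and database names are hypothetical placeholders, and dns_max_ttl / server_lifetime are standard pgBouncer settings used here so that cached DNS lookups are refreshed and pooled server connections are recycled rather than sticking to one replica forever.

; Option 1: keep pointing at the Aurora reader endpoint (hypothetical host),
; and rely on DNS re-resolution plus periodic server connection recycling.
[databases]
mydb = host=mycluster.cluster-ro-xxxxxxxx.us-east-1.rds.amazonaws.com port=5432 dbname=mydb

; Option 2 (alternative): list each reader instance endpoint explicitly and let
; the application spread its connections across the two aliases.
; mydb_r1 = host=instance-1.xxxxxxxx.us-east-1.rds.amazonaws.com port=5432 dbname=mydb
; mydb_r2 = host=instance-2.xxxxxxxx.us-east-1.rds.amazonaws.com port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
; re-resolve cached DNS lookups frequently
dns_max_ttl = 15
; close and re-open server connections periodically
server_lifetime = 600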
Aurora cluster load balancing is done by time slicing in 1-second intervals
I'm not too sure where you read that. Could you share some references?

Related

Understand Cassandra pooling options (setCoreConnectionsPerHost and setMaxConnectionsPerHost)?

I recently started working with Cassandra and I was reading more about connection pooling here. I was confused about the pool size and couldn't understand what this means here:
poolingOptions
    .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
    .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
    .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
    .setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
    .setMaxRequestsPerConnection(2000);
Below is what I want to understand in detail:
I would like to know what setCoreConnectionsPerHost, setMaxConnectionsPerHost and setMaxRequestsPerConnection mean.
What do LOCAL and REMOTE mean here?
If someone can explain with an example, it will really help me understand better.
We have a 6-node cluster, all in one DC, with RF 3, and we read/write at LOCAL_QUORUM.
The Cassandra protocol allows multiple queries to be submitted for execution over the same network connection in parallel, without waiting for an answer. setMaxRequestsPerConnection sets how many in-flight queries a single connection may carry simultaneously. The maximum limit depends on the protocol version; since protocol v3 it's 32k, but in practice you need to keep it around 1000-2000 - if you need more, it's a sign that the server is not keeping up with your queries.
The driver opens connections to every node in the cluster, and these connections are marked either as LOCAL, if they go to nodes in the data center that is local to the application (either set explicitly in the load balancing policy, or inferred from the first contact point), or as REMOTE, if they go to nodes in other data centers.
The driver can also open several connections to each node, and there are 2 values that control their number: core, the minimal number of connections, and max, the upper limit. The driver will open new connections if you submit requests that don't fit into the existing connections.
So in your example:
poolingOptions
    .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
    .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
    .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
    .setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
    .setMaxRequestsPerConnection(2000);
for the local data center, it will open 4 connections per node initially, and this may grow up to 10 connections
for other data centers, it will open 2 connections per node, which could grow up to 4 connections
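For completeness, here is a minimal sketch of how such a PoolingOptions object is typically wired into a Cluster with the DataStax Java driver 3.x. The contact point and keyspace are hypothetical placeholders, and the per-distance overload of setMaxRequestsPerConnection is used here:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingExample {
    public static void main(String[] args) {
        PoolingOptions poolingOptions = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
                .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
                .setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2000);

        // The pooling options are applied when the Cluster is built; the Session
        // created from it then maintains the per-node pools described above.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")             // hypothetical contact point
                .withPoolingOptions(poolingOptions)
                .build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        session.close();
        cluster.close();
    }
}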

Cassandra connection spikes causing load issues

I am using Cassandra with the following setup:
21 nodes, AWS EC2 i3.2xlarge, version 3.11.4.
The application opens about 5000 connections per node (so about 100k connections per cluster) using the DataStax Java driver.
The application autoscales and frequently opens/closes connections.
The number of connections opened at once by the app servers can reach up to 500 per node (opened simultaneously on all nodes, so about 10k connections opening at the same time across the cluster).
This causes load spikes on Cassandra and increases read and write latency.
I have noticed that each time connections open/close there is a high number of reads from system_auth.roles and system_auth.role_permissions.
How can I prevent the load and resolve this issue?
You need to modify your application to work with as small a number of connections as possible. Keep the following in mind:
Create the Cluster/Session object once at startup and keep it. Session initialization is a very expensive operation; it adds load to Cassandra, and to your application as well.
You may increase the number of simultaneous requests per connection instead of opening new connections; the protocol allows up to 32k requests per connection. However, if you have too many requests in flight, it's a sign that your Cassandra cluster isn't keeping up with the workload and can't answer fast enough. See the documentation on connection pooling, and the sketch below.
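A minimal sketch of both points, assuming the DataStax Java driver 3.x: one Cluster/Session created at startup and reused for the application's lifetime, with a higher per-connection request limit instead of many connections. The contact point, class name, and the numbers chosen are illustrative assumptions, not recommendations for this specific cluster.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public final class CassandraSessionHolder {
    // Favor more in-flight requests per connection over more connections.
    private static final PoolingOptions POOLING = new PoolingOptions()
            .setCoreConnectionsPerHost(HostDistance.LOCAL, 1)
            .setMaxConnectionsPerHost(HostDistance.LOCAL, 2)
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 4096);

    // Built once at startup and kept for the whole application lifetime.
    private static final Cluster CLUSTER = Cluster.builder()
            .addContactPoint("10.0.0.1")      // hypothetical contact point
            .withPoolingOptions(POOLING)
            .build();
    private static final Session SESSION = CLUSTER.connect();

    public static Session session() {
        return SESSION;
    }

    private CassandraSessionHolder() {}
}

Application code (including autoscaled workers within one process) then calls CassandraSessionHolder.session() instead of opening its own connections.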

Cassandra Failed to create a selector. Multithreading multiple concurrent cassandra connections

I am running an ExecutorService with more than 50 threads concurrently. Each thread opens a connection to Cassandra and performs inserts using springframework.data.cassandra. The problem is that when I open more than 50 connections at a time, I get the following error:
Caused by: org.jboss.netty.channel.ChannelException: Failed to create a selector.
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:151)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:116)
at com.datastax.driver.core.Connection$Factory.<init>(Connection.java:532)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1201)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1144)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:121)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:108)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:177)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1109)
If I open exactly 50 threads (or fewer), it works fine. Is there a way to configure this so I can allow more? In my cassandra.yaml file, rpc_max_threads is, according to the comments, unlimited by default ("The default is unlimited").
My guess is you are overwhelming your OS by creating too many connections. You should only create 1 Cluster instance per Cassandra cluster. Clusters create Sessions, which manage their own connection pools. Both Cluster and Session are thread-safe, so you can share them between threads.
"Four simple rules for coding with the driver" distills these concepts well:
When writing code that uses the driver, there are four simple rules that you should follow that will also make your code efficient:
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
...
A Cluster instance allows you to configure many important aspects of the way connections and queries will be handled. At this level you can configure everything from contact points (the addresses of the nodes to be contacted initially, before the driver performs node discovery) to the request routing policy, retry and reconnection policies, and so forth. Generally such settings are set once at the application level.
While the Session instance is centered around query execution, it also manages the per-node connection pools. The Session is a long-lived object and should not be used in a request-response, short-lived fashion. Your code should share the same Cluster and Session instances across the application.
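As a hedged sketch of that pattern with the DataStax Java driver 3.x: a single Cluster/Session shared by all ExecutorService workers, rather than one connection per thread. The contact point, keyspace, and the INSERT statement are hypothetical placeholders.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedSessionInserts {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")            // hypothetical contact point
                .build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            // Session is thread-safe; every task reuses the same connection pools.
            pool.submit(() ->
                    session.execute("INSERT INTO users (id) VALUES (?)", id));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        session.close();
        cluster.close();
    }
}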

Spring TCP Client Server max connections

Trying to build a TCP server using Spring Integration that keeps connections open; these may run into the thousands at any point in time. The key concerns are:
The max number of concurrent client connections that can be managed, as sessions will be live for a long period of time.
What is advised in case connections exceed the limit specified in (1)?
Something along the lines of a cluster of servers would be helpful.
There's no mechanism to limit the number of connections allowed. You can, however, limit the workload by using fixed thread pools. You could also use an ApplicationListener to get TcpConnectionOpenEvents and immediately close the socket if your limit is exceeded (perhaps sending some error to the client first).
Of course you can have a cluster, together with some kind of load balancer.
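To illustrate the listener approach mentioned above, here is a minimal sketch using Spring Integration's TcpConnectionOpenEvent. The limit value and the simple counter are illustrative assumptions; a fuller implementation would also listen for TcpConnectionCloseEvent to keep the counter accurate.

import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.context.ApplicationListener;
import org.springframework.integration.ip.tcp.connection.TcpConnectionOpenEvent;
import org.springframework.integration.ip.tcp.connection.TcpConnectionSupport;
import org.springframework.stereotype.Component;

@Component
public class ConnectionLimiter implements ApplicationListener<TcpConnectionOpenEvent> {

    private static final int MAX_CONNECTIONS = 5000; // assumed limit
    private final AtomicInteger open = new AtomicInteger();

    @Override
    public void onApplicationEvent(TcpConnectionOpenEvent event) {
        if (open.incrementAndGet() > MAX_CONNECTIONS) {
            // The event source is the connection that was just opened;
            // close it immediately once the limit is exceeded (optionally
            // sending an error message to the client first).
            ((TcpConnectionSupport) event.getSource()).close();
            open.decrementAndGet();
        }
    }
}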

Azure Auto Scaling works but load doesn't get distributed

I have a problem with auto scaling in Azure. The scaling process works fine, but when a new instance is added it receives no traffic.
My scenario:
I have 2 running instances with a WCF web service on them. From 2 other servers (not Azure) I send data to the web service.
After a while auto scaling kicks in and a new instance is added. The 2 servers are still producing load on the first 2 Azure instances; however, the new one doesn't get any.
I thought Azure uses round robin for load balancing, or am I missing something else?
Thanks for any help.
The problem is caused by TCP connection keep-alive: when the clients first connect, the connection is established to the existing instances and then persists to those instances. So when the service scales out, the existing clients won't reconnect unless the connection is broken. New clients will connect to both the existing and new instances.
Here's another question for a very similar scenario. For testing purposes you can just disable keep-alive to ensure that load is indeed distributed between instances.
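The clients in this scenario are WCF, but the keep-alive idea is generic. As a rough illustration only, a plain Java HTTP client with keep-alive disabled (hypothetical endpoint) opens a fresh TCP connection per request, so the load balancer can route each request to a different instance:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class NoKeepAliveClient {
    public static void main(String[] args) throws Exception {
        // Disable HTTP connection reuse for this JVM, so every request opens a
        // new TCP connection and can land on a different instance.
        System.setProperty("http.keepAlive", "false");

        URL url = new URL("http://myservice.cloudapp.net/endpoint"); // hypothetical endpoint
        for (int i = 0; i < 10; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                System.out.println("Request " + i + " -> HTTP " + conn.getResponseCode());
            }
            conn.disconnect();
        }
    }
}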
