I am using cassandra according to the following struct:
21 nodes , AWS EC2 i3.2xlarge , version 3.11.4 .
The application is opening about 5000 connection per node (so its 100k connections per cluster) using the datastax java connection driver.
Application is using autoscale and frequently opens/close connections.
Number of connections to open at once by app servers can reach up to 500 per node (opens simultaneously on all nodes at once - so its 10k connections opens at the same time across the cluster)
This cause spikes of load on cassandra and cause reads and writes latency.
I have noticed each time connections opens/close there are high number of reads from system_auth.roles and system_auth.role_permissions.
How can I prevent the load and resolve this issue ?
You need to modify your application to work with as small number of connections as possible. You need to have following in mind:
Create Cluster/Session object, once at start and keep it. Initialization of session is very expensive operation, it adds a load to Cassandra, and to your application as well
you may increase the number of the simultaneous requests per connection, instead of opening new connections. Protocol allows to have up to 32k requests per connection. Although, if you have too many requests in-flight, then it's a sign that your Cassandra doesn't keep with workload and can't answer fast enough. See documentation on connection pooling
Related
I recently started working with Cassandra and I was reading more about connection pooling here. I was confuse about pool size and couldn't understand what does this mean here:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost( HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(2000);
Below is what I want to understand in detail:
I would like to know what does setCoreConnectionsPerHost, setMaxConnectionsPerHost and setMaxRequestsPerConnection means?
What is LOCAL and REMOTE means here?
If someone can explain with an example then it will really help me understand better.
We have a 6 nodes cluster all in one dc with RF as 3 and we read/write as local quorum.
Cassandra protocol allows to submit for execution multiple queries over the same network connection in parallel, without waiting for answer. The setMaxRequestsPerConnection sets how many in-flight queries could be in one connection simultaneously - maximal limit depends on protocol, and since protocol v3, it's 32k, but in reality you need to keep it around 1000-2000 - if you have more, then it's a sign that server is not keeping with your queries.
Drivers are opening connections to every node in the cluster, and these connections are marked either as LOCAL - if they are to the nodes in the data center that is local to the application (either set explicitly in load balancing policy, or inferred from first contacted point), or as REMOTE if they are to the nodes that in the other data centers.
Also, driver can open several connections to nodes. And there are 2 values that control their number: core - the minimal number of connections, and max - what is the upper limit. Driver will open new connections if you submit new requests that doesn't fit into the existing limit.
So in your example:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost( HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost( HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(2000);
for local data center, it will open 4 connections per node initially, and it may grow up to 10 connections
for other data centers it will open 2 connections, that could grow up to 4 connections
I as using AuroraDB cluster with 2 readers and pgBouncer to maintain a connection pool.
My application is very read intensive and fires a lot of select queries.
the problem I am facing is my 2 read replicas are not getting used completely in parallel.
I can see the trends where all connections get moved to 1 replica where other replica is serving 0 connections and after some time the situation shift when 2nd replica serves all connections and 1st serves 0.
I investigated this and found that auroraDB cluster load balancing is done on by time slicing 1-second intervals.
My guess is when pgBouncer creates connection pool all connection are created within 1 second window and all connections end up on 1 read replica.
is there any way I can correct this?
The DB Endpoint is a Route 53 DNS and load balancing is done basically via DNS round robin, each time you resolve the DNS. When you use pgBouncer, is it resolving the DNS once and trying to open connections to the resolved IP? If yes, then this is expected that all your connections are resolved to the same instance. You could fix this conceptually in multiple ways (I'm not too familiar with pgBouncer), but you basically need to somehow make the library resolve the DNS explicitly for each connection, or explicitly add all the instance endpoints to the configuration. The latter is not recommended if you plan on issuing writes using this Connection pool. You don't have any control over who stays as the writer, so you may inadvertently end up sending your writes to a replica.
AuroraDB cluster load balancing is done on by time slicing 1-second intervals
I'm not too sure where you read that. Could you share some references?
I'm trying to tune Cassandra because I keep getting this error:
com.datastax.driver.core.exceptions.OperationTimedOutException: Timed out waiting for server response
But I'm not sure whether I should increase the number of connections I have per host or increase the number of requests per connection in the datastax driver.
In general, what is the best way to determine if adding more connections is better or adding more requests per connection?
I am doing load testing on Cassandra By using JMeter.
After systematically increasing the load, I can see that more than 58000 Active connection has been established by the driver with different node of cassandra.
I have started with 500 and added up 500 more after 10 iteration. like this i reached upto 2500 request. where it is failing. And Throwing NoHOSTAvailableExecption. I thought that may be cassandra is down. But when i have tried to send request to cassnadra by using DataStax driver . Running in a different System it is working fine. So now My question is that
When I am increasing the load on DataStax java driver it is Opening
more connection instead of using existing connection. Why it is not
using the existing connection?
By default the driver should only have connections based on the number of nodes in the cluster (1 connection per node I believe). This makes me think that your issue is with your Jmeter code and not the Java driver.
In normal operation using the native protocol, the java driver sends multiple requests along each connection simultaneously so there is no need to open multiple connections to the same server. There is some work around upping the limit of simultaneous requests.
The connection was increasing Due to calling creating multiple session for each request. Now it is working Fine.
builder = new Cluster.Builder().
addContactPoints("192.168.114.42");
builder.withPoolingOptions(new PoolingOptions().setCoreConnectionsPerHost(
HostDistance.LOCAL, new PoolingOptions().getMaxConnectionsPerHost(HostDistance.LOCAL)));
cluster = builder
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
.withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
.build();
session = cluster.connect("demodb");
Now Driver is maintain 17-26 number of connection irrespective of number of transaction.
Trying to build a TCP server using Spring Integration in which keeps connections may run into thousands at any point in time. Key concerns are regarding
Max no. of concurrent client connections that can be managed as session would be live for a long period of time.
What is advise in case connections exceed limit specified in (1).
Something along the lines of a cluster of servers would be helpful.
There's no mechanism to limit the number of connections allowed. You can, however, limit the workload by using fixed thread pools. You could also use an ApplicationListener to get TcpConnectionOpenEvents and immediately close the socket if your limit is exceeded (perhaps sending some error to the client first).
Of course you can have a cluster, together with some kind of load balancer.