how to check how many pooled connection we have at a specific time in MemSQL - singlestore

How can i tell that i have reached the maximum number of queries that my cluster can handle at a time?
If the connection pool size is too small for the workload then we have to adjust the max_pooled_connections configuration variable, which controls the number of pooled connections between each pair of nodes.
However how can I tell how many pooled connections we have at a specific time ?
In memsql agregator status i can see following entries Aborted_connects is 11 - why do we abort those connection ? Also Max_used_connections is 41, while Connections is a number that increases constantly.

How can i tell that i have reached the maximum number of queries that
my cluster can handle at a time?
There isn't a hard limit on the number of queries you can send other than max_connections (100k), but at some point the cluster will not execute them all at once and will schedule/queue them up. Is your question about the former or the latter?
If the connection pool size is too small for the workload then we have
to adjust the max_pooled_connections configuration variable, which
controls the number of pooled connections between each pair of nodes.
However how can I tell how many pooled connections we have at a
specific time ?
show leaves will show how many connections are currently open from the current node to each leaf. So the current connection pool size is min(current open connections, max_pooled_connections). Note that this is per (node, node) pair.
In memsql agregator status i can see following entries
Aborted_connects is 11 - why do we abort those connection ? Also
Max_used_connections is 41, while Connections is a number that
increases constantly.
Aborted connects includes e.g. failed login authentications.
max_used_connections is max peak, connections is cumulative total.

Related

What is recommended for cassandra max connections per host? How is it calculated?

Is there a way to limit number of connections per host in cassandra cluster and based on what parameter this is calculated?
In some of cassandra node i can see established connections count for 9042 goes up to 1400+, is this something i need to worry about?
Thanks
Yes, you can limit the number of connections per host in the Cassandra cluster.
If you are using the C++ driver, check this out.
I visualize any query following the below path:
Client --> Session --> IO threads --> Connections --> Nodes
You can configure the number of IO threads(This is the number of threads that will handle query requests) associated with the session. In each IO thread, you can then configure the number of connections per host. If needed, the number of connections per host will increase based on certain parameters(The maximum count up to which this can be increased is also configurable).
So, at max, there can be x number of connections per host where,
x = number_of_sessions * number_of_IO_threads * max_number_of_connections_per_host
All 3 variables on RHS in the above equation are configurable.
Also check out:
https://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
https://stackoverflow.com/a/28219086/5701173

com.datastax.driver.core.exceptions.BusyPoolException

Whenever I insert data in table in Cassandra, more than 1000 and fetching the data by id, it throws the following exception:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.BusyPoolException: [localhost/127.0.0.1] Pool is busy (no available connection and the queue has reached its max size 256)))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:213)
at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:49)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:277)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.onFailure(RequestHandler.java:340)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1764)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
at com.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:153)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1776)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1713)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.query(RequestHandler.java:299)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:274)
at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:117)
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:97)
at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:132)
at com.outworkers.phantom.builder.query.CassandraOperations$class.scalaQueryStringToPromise(CassandraOperations.scala:67)
at com.outworkers.phantom.builder.query.InsertQuery.scalaQueryStringToPromise(InsertQuery.scala:31)
at com.outworkers.phantom.builder.query.CassandraOperations$class.scalaQueryStringExecuteToFuture(CassandraOperations.scala:31)
at com.outworkers.phantom.builder.query.InsertQuery.scalaQueryStringExecuteToFuture(InsertQuery.scala:31)
at com.outworkers.phantom.builder.query.ExecutableStatement$class.future(ExecutableQuery.scala:80)
at com.outworkers.phantom.builder.query.InsertQuery.future(InsertQuery.scala:31)
at nd.cluster.data.store.Points.upsert(Models.scala:114)
I have solved above issue using PoolingOptions.
val poolingOptions = new PoolingOptions()
.setConnectionsPerHost(HostDistance.LOCAL, 1, 200)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 256)
.setNewConnectionThreshold(HostDistance.LOCAL, 100).setCoreConnectionsPerHost(HostDistance.LOCAL, 200)
val builder1 = ContactPoint.local
.noHeartbeat()
.withClusterBuilder(_.withoutJMXReporting()
.withoutMetrics().withPoolingOptions(poolingOptions)).keySpace("nd")
Now it is working even with 1l data. But i am not sure about its efficiency.
Could anyone please help me ?
This means that you are submitting too many requests, and not waiting for the futures to complete before submitting more.
The default maximum number of requests per connection is 1024. If this number is exceeded for all connections, the connection pool will enqueue some requests, up to 256. If the queue gets full, a BusyPoolException is thrown. Of course you can increase the max number of requests per connection, and the number of max connections per host. But the real solution is of course to throttle your thread. You could e.g. submit your requests by batches of 1,000 and then wait on the futures to complete before submitting more, or use a semaphore to regulate the total number of pending requests and make sure they don't exceed a certain number (in theory, this number must stay below num_hosts * max_connections_per_host * max_requests_per_connection – in practice, I don't suggest going above 1,000 as it probably won't bring you more throughput).
You may find this links useful.
https://github.com/redisson/redisson/issues/438
https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/p3CwOL0kNrs http://docs.datastax.com/en/developer/java-driver/3.1/manual/pooling

Cassandra 3.10 debug.log contains frequent "FailureDetector.java:457 - Ignoring interval time of..."

The debug.log files for one of our Cassandra 3.10 clusters has frequent messages similar to “FailureDetector.java:457 - Ignoring interval time of…”
The messages appear even if the cluster is idle. I see the messages at a rate of about 1 per second on each node of this 6 node cluster (3 nodes each in two data centers).
Can someone tell me what causes the messages and if they are something to be concerned about?
We have a couple of other small clusters supporting the same application (different environments) and I see this message much less often (days apart).
The FailureDetector is responsible of deciding if a node is considered UP or DOWN.
The gossip process tracks state from other nodes both directly (nodes
gossiping directly to it) and indirectly (nodes communicated about
secondhand, third-hand, and so on). Rather than have a fixed threshold
for marking failing nodes, Cassandra uses an accrual detection
mechanism to calculate a per-node threshold that takes into account
network performance, workload, and historical conditions. During
gossip exchanges, every node maintains a sliding window of
inter-arrival times of gossip messages from other nodes in the
cluster.
Here you can find the source code, which gives you the log message. It is set to DEBUG level because they may be helpful in tracking down the actual issue causing the latency, but don't indicate a problem on their own.
In other words: your node measures the acknowledgement latency for each gossip message sent to the other nodes e.g: X nanosec for IP address1, Z nanosec for IP address2, etc. If eitherX or Y is above the expected 2 sec threshold as stated in MAX_INTERVAL_IN_NANO, it will get reported.
Problems, which can cause this log message:
Huge load on the node(s): e.g too many large partitions
High pressure: e.g. too many queries in sort period of time
Bad network connection
The extra FailureDetector logging was added with this:
Expose phi values from failure detector via JMX and tweak debug
and trace logging (CASSANDRA-9526)
and also I found this open issue, might be related to your problem:
The failure detector becomes more sensitive when the network is flakey(CASSANDRA-9536)
Also I find this article about Gossiping and Failure Detection very useful.

High Native-Transport-Requests All time Blocked

After running tpstats on all nodes. I see a lot of nodes having high number of ALL TIME BLOCKED NTR. We have a 4 node cluster and the values for NTR ALL TIME BLOCKED are :
NODE 1: 23953
NODE 2: 2935
NODE 3: 15229
NODE 4: 5951
I know ALL TIME BLOCKED is bad and hence worried as to what I am doing wrong.
This pool handles cql requests, so it is the number of active CQL requests allowed. Its limited to prevent too many active ones from OOMing your system (ie each returning large blobs). This effectively applies backpressure to your client application to slow down. Unfortunately if you have small requests this isnt ideal and hurts your throughput so in CASSANDRA-11363 they added a setting to make the space tradeoff for small bursty workloads.
If you upgrade to 2.2.8+ you can set the max queue size of that threadpool with -Dcassandra.max_queued_native_transport_requests=4096

Cassandra throwing NoHostAvailableException after 5 minutes of high IOPS run

I'm using datastax cassandra 2.1 driver and performing read/write operations at the rate of ~8000 IOPS. I've used pooling options to configure my session and am using separate session for read and write each of which connect to a different node in the cluster as contact point.
This works fine for say 5 mins but after that I get a lot of exceptions like :
Failed with: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.0.1.123:9042 (com.datastax.driver.core.TransportException: [/10.0.1.123:9042] Connection has been closed), /10.0.1.56:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
Can anyone help me out here on what could be the problem?
The exception asks me to increase number of connections per host but how high a value can I set for this parameter ?
Also I'm not able to set CoreConnectionsPerHost beyond 2 as it throws me exception saying 2 is the max.
This is how I'm creating each read / write session.
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);
cluster = Cluster
.builder()
.withPoolingOptions( poolingOpts )
.addContactPoint(ip)
.withRetryPolicy( DowngradingConsistencyRetryPolicy.INSTANCE )
.withReconnectionPolicy( new ConstantReconnectionPolicy( 100L ) ).build();
Session s = cluster.connect(keySpace);
Your problem might not actually be in your code or the way you are connecting. If you say the problem is happening after a few minutes then it could simply be that your cluster is becoming overloaded trying to process the ingestion of data and cannot keep up. The typical sign of this is when you start seeing JVM garbage collection "GC" messages in the cassandra system.log file, too many small ones batched together of large ones on their own can mean that incoming clients are not responded to causing this kind of scenario. Verify that you do not have too many of these event showing up in your logs first before you start to look at your code. Here's a good example of a large GC event:
INFO [ScheduledTasks:1] 2014-05-15 23:19:49,678 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2896 ms for 2 collections, 310563800 used; max is 8375238656
When connecting to a cluster there are some recommendations, one of which is only have one Cluster object per real cluster. As per the article I've linked below (apologies if you already studied this):
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
If you execute a statement more than once, consider using a prepared statement
You can reduce the number of network roundtrips and also have atomic operations by using batches
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/fourSimpleRules.html
As you are doing a high number of reads I'd most definitely recommend using setFetchSize also if its applicable to your code
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/queryBuilderOverview.html
For reference heres the connection options in case you find it useful
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/connectionsOptions_c.html
Hope this helps.

Resources