How to measure the effectiveness of a token-aware connection pool? - cassandra

My team is testing the token-aware connection pool of Astyanax. How can we measure the effectiveness of this connection pool type, i.e. how can we know how the tokens are distributed in a ring and how client connections are distributed across them?
Our initial tests, counting the number of open connections on the network cards, show that only 3 out of 4 or more Cassandra instances in a ring are used, and the other nodes participate in request processing only to a very limited extent.
What other information would help in making a valid judgment/verification? Is there a Cassandra/Astyanax API or a command-line tool to help us out?

Use OpsCenter. This will show you how balanced your cluster is, i.e. whether each node has the same amount of data, and it can graph the incoming read/write requests per node and for your entire cluster. It is free and works with open-source Cassandra as well as DSE. http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
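On the command line, nodetool status reports how much of the ring each node owns, which answers the token-distribution half of the question. On the client side, Astyanax tracks pool statistics through its ConnectionPoolMonitor; as an illustration of the same check with the DataStax Java driver (a sketch assuming driver 3.x, with an illustrative contact point), you can snapshot how connections and in-flight queries are spread across hosts:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Session;

public class PoolDistributionCheck {
    public static void main(String[] args) {
        // Contact point is illustrative; point this at one of your nodes.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Session.State is a point-in-time snapshot of the driver's pools.
        Session.State state = session.getState();
        for (Host host : state.getConnectedHosts()) {
            System.out.printf("%s: %d open connections, %d in-flight queries%n",
                    host.getAddress(),
                    state.getOpenConnections(host),
                    state.getInFlightQueries(host));
        }
        cluster.close();
    }
}

If one host consistently shows far fewer open connections or in-flight queries than the others, the token-aware routing is not spreading requests evenly across the ring.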

Related

Understand Cassandra pooling options (setCoreConnectionsPerHost and setMaxConnectionsPerHost)?

I recently started working with Cassandra and I was reading more about connection pooling here. I was confused about the pool size and couldn't understand what this means here:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 2000);
Below is what I want to understand in detail:
I would like to know what setCoreConnectionsPerHost, setMaxConnectionsPerHost and setMaxRequestsPerConnection mean.
What do LOCAL and REMOTE mean here?
If someone can explain with an example, it will really help me understand better.
We have a 6-node cluster, all in one DC, with an RF of 3, and we read/write at LOCAL_QUORUM.
The Cassandra native protocol allows multiple queries to be submitted for execution over the same network connection in parallel, without waiting for the answers. setMaxRequestsPerConnection sets how many in-flight queries a single connection may carry simultaneously. The maximum limit depends on the protocol version; since protocol v3 it is 32k, but in practice you should keep it around 1000-2000 - if you need more, it's a sign that the server is not keeping up with your queries.
Drivers open connections to every node in the cluster, and these connections are marked either as LOCAL - if they go to nodes in the data center that is local to the application (either set explicitly in the load balancing policy, or inferred from the first contact point) - or as REMOTE if they go to nodes in other data centers.
The driver can also open several connections to each node, and there are two values that control their number: core is the minimal number of connections, and max is the upper limit. The driver opens new connections when you submit requests that don't fit within the existing connections' limit.
So in your example:
poolingOptions
.setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
.setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
.setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
.setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 2000);
for the local data center, it will open 4 connections per node initially, and this may grow up to 10 connections per node
for other data centers, it will open 2 connections per node, which may grow up to 4 connections per node
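For completeness, here is a minimal, self-contained build of a Cluster with these options (a sketch assuming the DataStax Java driver 3.x; the contact point is illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingOptionsExample {
    public static void main(String[] args) {
        PoolingOptions poolingOptions = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
                .setCoreConnectionsPerHost(HostDistance.REMOTE, 2)
                .setMaxConnectionsPerHost(HostDistance.REMOTE, 4)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2000);

        // The pooling options are applied when the Cluster is built.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // illustrative contact point
                .withPoolingOptions(poolingOptions)
                .build();
        Session session = cluster.connect();
        // ... run queries ...
        cluster.close();
    }
}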

Cassandra connection spikes load issue

I am using Cassandra with the following setup:
21 nodes, AWS EC2 i3.2xlarge, version 3.11.4.
The application opens about 5000 connections per node (so 100k connections per cluster) using the DataStax Java driver.
The application uses autoscaling and frequently opens/closes connections.
The number of connections opened at once by the app servers can reach up to 500 per node (opened simultaneously on all nodes at once - so 10k connections open at the same time across the cluster).
This causes load spikes on Cassandra and read and write latency.
I have noticed that each time connections open/close there is a high number of reads from system_auth.roles and system_auth.role_permissions.
How can I prevent the load and resolve this issue?
You need to modify your application to work with as few connections as possible. Keep the following in mind:
Create the Cluster/Session object once at startup and keep it for the lifetime of the application. Initializing a session is a very expensive operation; it adds load to Cassandra as well as to your application. A sketch of this pattern is shown below.
You may increase the number of simultaneous requests per connection instead of opening new connections. The protocol allows up to 32k requests per connection, although if you have too many requests in flight, it's a sign that your Cassandra cluster can't keep up with the workload and can't answer fast enough. See the documentation on connection pooling.
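A minimal sketch of the singleton pattern from the first point, assuming the DataStax Java driver 3.x (the class name, contact point, and request limit are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public final class CassandraSessions {
    // One Cluster and one Session for the whole application lifetime.
    private static final Cluster CLUSTER = Cluster.builder()
            .addContactPoint("10.0.0.1") // illustrative contact point
            .withPoolingOptions(new PoolingOptions()
                    // Raise in-flight requests per connection instead of
                    // opening more connections.
                    .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048))
            .build();
    private static final Session SESSION = CLUSTER.connect();

    private CassandraSessions() {}

    public static Session session() {
        return SESSION;
    }
}

Because the Session is created once, the expensive authentication reads against system_auth happen once at startup rather than on every autoscaling event.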

Need more insight into Hazelcast Client and the ideal scenario to use it

There is already a question on the difference between Hazelcast Instance and Hazelcast client.
And it is mentioned that
HazelcastInstance = HazelcastClient + AnotherFeatures
So is it right to say that the client just reads from and writes to the formed cluster without being part of the cluster itself, i.e. that the client does not store the data?
This is important to know since we can configure the JVM memory according to the usage: the instances forming the cluster will be allocated more memory than the ones that only connect as clients.
It is a little bit more complicated than that. The Hazelcast Lite Member is a full-blown cluster member that just doesn't get any partitions assigned. That said, it doesn't store any data, but otherwise behaves like a normal member.
Clients, on the other hand, are simple proxies that have to forward everything to a cluster member to get any operation done. You can imagine a Hazelcast client to be something like a JDBC client that has just enough code to connect to the cluster and redirect requests / retrieve responses.
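For illustration, here is how the three flavors are started in code (a sketch assuming Hazelcast 3.x APIs and default configuration):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class HazelcastFlavors {
    public static void main(String[] args) {
        // Full member: joins the cluster and owns partitions (stores data).
        HazelcastInstance member = Hazelcast.newHazelcastInstance();

        // Lite member: joins the cluster but gets no partitions assigned.
        Config liteConfig = new Config().setLiteMember(true);
        HazelcastInstance liteMember = Hazelcast.newHazelcastInstance(liteConfig);

        // Client: not a cluster member; proxies all operations to members.
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        client.getMap("example").put("k", "v"); // executed on a member
    }
}

This is why the JVM sizing question matters only for full members: lite members and clients hold no partition data, so their heap needs are driven by application logic, not by the data set.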

Cassandra DB - Node is down and a request is made to fetch data in that Node

If we configured our replication factor in such a way that there are no replica nodes (data is stored in one place/node only), and the node containing the requested data is down, how will the request be handled by Cassandra?
Will it return no data, or will other nodes gossip and somehow pick up the data from the failed node's storage and send the required response? If data is picked up, will the data transfer between nodes happen as soon as the node is down (gossip protocol) or only after a request is made?
I have researched for a long time how gossip happens and how Cassandra achieves high availability, but I was wondering about the availability of data in the case of "no replicas", since I do not want to spend additional storage on occasional failures, and at the same time I need availability and no data loss (even if delayed).
I assume that when you say there are "no replica nodes" you mean that you have set the replication factor to 1 (RF=1). In this case, if the request is a read, it will fail; if the request is a write, it will be stored as a hint, up to the maximum hint window, and replayed when the node comes back. If the node is down for longer than the hint window, that write will be lost. See Hinted Handoff: repair during write path.
In general, having only a single replica of your data in a C* cluster goes against the basic design of how C* is meant to be used and is an anti-pattern. Data duplication is a normal and expected part of using C*, and is what enables its high-availability properties. RF=1 introduces a single point of failure into the system, as the server containing the data can go down for any of a variety of reasons (including things like maintenance), which will cause requests to fail.
If you are truly looking for a solution that provides high availability and no data loss, then you need to increase your replication factor (the standard I usually see is RF=3) and set up your cluster's hardware in such a manner as to reduce/remove potential single points of failure.
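Raising the replication factor is a single CQL statement followed by a repair; here is a sketch run through the Java driver (the keyspace and data-center names are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RaiseReplicationFactor {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Move the keyspace from RF=1 to RF=3 (illustrative names).
        session.execute("ALTER KEYSPACE my_keyspace WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");

        // Existing data is not copied automatically; run
        // `nodetool repair -full my_keyspace` on each node afterwards.
        cluster.close();
    }
}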

Cassandra Failed to create a selector. Multithreading multiple concurrent cassandra connections

I am running an ExecutorService with more than 50 threads concurrently. Each thread opens a connection to Cassandra and performs inserts using springframework.data.cassandra. The problem is that when I open more than 50 connections at a time, I get the following error:
Caused by: org.jboss.netty.channel.ChannelException: Failed to create a selector.
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:151)
at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:116)
at com.datastax.driver.core.Connection$Factory.<init>(Connection.java:532)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1201)
at com.datastax.driver.core.Cluster$Manager.<init>(Cluster.java:1144)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:121)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:108)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:177)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1109)
If I open exactly 50 threads (or fewer), it works fine. Is there a way to configure this so I can allow more? In my cassandra.yaml file, the comments for rpc_max_threads say "The default is unlimited".
My guess is that you are overwhelming your OS by creating too many connections. You should only create one Cluster instance per Cassandra cluster. Clusters create Sessions, which manage their own connection pools. Both Cluster and Session are thread-safe, so you can share them between threads.
Four simple rules for coding with the driver distills these concepts well:
When writing code that uses the driver, there are four simple rules that you should follow that will also make your code efficient:
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
...
A Cluster instance allows you to configure many important aspects of the way connections and queries are handled. At this level you can configure everything from the contact points (addresses of the nodes to be contacted initially, before the driver performs node discovery) to the request routing policy and the retry and reconnection policies. Generally, such settings are set once at the application level.
While the Session instance is centered around query execution, it also manages the per-node connection pools. The Session is a long-lived object and should not be used in a request/response, short-lived fashion. Your code should share the same Cluster and Session instances across the application.
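Applied to the question above, that means one shared Session for all 50+ workers rather than one connection per thread. A sketch using the plain DataStax driver 3.x (the keyspace, table, and thread counts are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SharedSessionInserts {
    public static void main(String[] args) throws InterruptedException {
        // One Cluster and one Session, shared by every worker thread.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        final Session session = cluster.connect("my_keyspace");

        ExecutorService workers = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            // Session is thread-safe; each insert borrows a pooled connection.
            workers.submit(() -> session.execute(
                    "INSERT INTO my_table (id, value) VALUES (?, ?)", id, "v" + id));
        }

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        cluster.close();
    }
}

With this layout the driver opens only one small pool of connections per node regardless of how many worker threads you run, which keeps you well clear of the OS selector/file-descriptor limits that the stack trace points to.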
