Cassandra native transport port 9042 slow on EC2 Machine - cassandra

I have a 5 node Cassandra cluster set up on EC2, all in the same region.
If I connect over cqlsh (9160), queries respond in under a second.
When I connect via Dev Center, or using the native Java Driver, both of which use port 9042, the queries take over 20 seconds to respond.
They consistently respond in the same 21 second region. Never fast and then slow.
I have set up a few Cassandra Clusters on EC2 and have seen this before but do not know how to fix the problem. The last time, I scrapped the cluster and built a new one and the response time on port 9042 was fine.
Any help in how to debug or fix this problem would be appreciated, thanks.

The current version of DevCenter was designed to support as main scenario running (longish) CQL scripts (vs an interactive console with queries executed one after another). DevCenter is using as an underlying connector the DataStax Java driver for Cassandra.
For the above mentioned scenario, in order to ensure there are no "conflicts", a new Session is created for each execution. When a Session is initialized, the driver performs an auto-node discovery, creates connection pools, etc. Basically it does a lot of preparation work. Depending on the latency from your client machine to the EC2 nodes, the size of the cluster and also the configuration of these nodes (see the connection requirements), this initialization phase can be quite expensive.
As you can imagine the time spent preparing wouldn't represent a large percentage of running a DDL script and a decent size of inserts/updates. But for an interactive scenario, it will result in a suboptimal behavior (the one you are describing)
The next version(s) of DevCenter will address the interactive scenario and optimize for it so the user experience would be what you'd expect. And supporting this scenario is pretty high on our list of priorities.

The underlying Java driver obtains the whole cluster topology when it initially connects. This enables it to automatically connect to any node in the cluster. On EC2 it only obtains the private addresses, tries each one, and then times out. It then sends the request over the initial connection

Related

How does peer to peer architecture work in Cassandra?

How the peer-to-peer Cassandra architecture really works ? I mean :
When the request hits the Cluster, it must hit some machine based on an IP, right ?
So which machine it will hit first ? : one of the nodes, or something in the Cluster who is responsible to balance and redirect the request to the right node ?
Could you describe what it is ? And how this differ from the Master/Folowers architecture ?
For the purposes of my answer, I will use the Java driver as an example since it is the most popular.
When you connect to a cluster using one of the driver, you need to configure it with details of your cluster including:
Contact points - the entry point to your cluster which is a comma-separated list of IPs/hostnames for some of the nodes in your cluster.
Login credentials - username and password if authentication is enabled on your cluster.
SSL/TLS certificate and credentials - if encryption is enabled on your cluster.
When your application starts, a control connection is established with the first available node in the list of contact points. The driver uses this control connection for admin tasks such as:
get topology information about the cluster including node IPs, rack placement, network/DC information, etc
get schema information such as keyspaces and tables
subscribe to metadata changes including topology and schema updates
When you configure the driver with a load-balancing policy (LBP), the policy will determine which node the driver will pick as the coordinator for each and every single query. By default, the Java driver uses a load balancing policy which picks nodes in the local datacenter. If you don't specify which DC is local to the app, the driver will set the local DC to the DC of the first contact point.
Each time a driver executes a query, it generates a query plan or a list of nodes to contact. This list of nodes has the following characteristics:
A query plan is different for each query to balance the load across nodes in the cluster.
A query plan only lists available nodes and does not include nodes which are down or temporarily unavailable.
Nodes in the local DC are listed first and if the load-balancing policy allows it, remote nodes are included last.
The driver tries to contact each node in the query plan in the order they are listed. If the first node is available then the driver uses it as the coordinator. If the first node does not respond (for whatever reason), the driver tries the next node in the query plan and so on.
Finally, all nodes are equal in Cassandra. There is no active-passive, no leader-follower, no primary-secondary and this makes Cassandra a truly high availability (HA) cluster with no single point-of-failure. Any node can do the work of any other node and the load is distributed equally to all nodes by design.
If you're new to Cassandra, I recommend having a look at datastax.com/dev which has lots of free hands-on interactive learning resources. In particular, the Cassandra Fundamentals learning series lets you learn the basic concepts quickly.
For what it's worth, you can also use the Stargate.io data platform. It allows you to connect to a Cassandra cluster using APIs you're already familiar with. It is fully open-source so it's free to use. Here are links to the Stargate tutorials on datastax.com/dev: REST API, Document API, GraphQL API, and more recently gRPC API. Cheers!
Working with Cassandra, we have to remember two very important things: data is partitioned (split into chunks) and data is replicated (each chunk is stored on a few different servers). Partitioning is needed for scalability purposes while Replication serves High Availability. Given that Cassandra is designed to handle petabytes of data under huge pressure (dozens of millions of queries per second), and there is no single server able to handle such the load, each cluster server is responsible only for a range of data, not for the whole dataset. A node storing data you need for a particular query is called a "replica node". Notice that the different queries there will have different replica nodes.
Together, it brings a few implications:
We have to reach multiple servers during a single query to assure the data is consistent (read) / write data to all responsible servers (write).
How do we know which node is right for that particular query? What happens if a query hits a "wrong" node? How do we configure the application so it sends queries to the replica nodes?
Funny enough, as a developer you have to do one and only one thing: understand partitions and partition keys, and then Cassandra will take care of all the potential issues. Simple as that. When you design a table, you have to declare partition keys and the data placement will be based on that - automagically. Next thing, you have to always specify partition keys while doing your queries. That's it, your job is done, get yourself some coffee!
Meanwhile, Cassandra starts her job. Cassandra nodes are smart, they know data placement, they know what servers are responsible for the data you are writing, and they know the partitions - in Cassandra language it's called token-aware. That does not matter which server will receive the query, as literally every server is able to answer it. Any node that got the request (it's called query coordinator because it coordinates the query operations) will find replica nodes based on the placement of the partitions. With that, the query coordinator will execute the query, making proper calls to the replicas - the coordinator knows which nodes to ask because you did your part of the job and specified partition key value in the query, which is used for the routing.
In short, you can ask any of your cluster nodes to write/read your data, Cassandra is decentralized and you'll get it done. But how do we make it better and get directly to the replica to avoid bothering nodes that don't store our data?
So which machine it will hit first ?
The travel of a request starts much earlier than we could think of - when your application starts, a Cassandra driver connects to a cluster and reads information about data placement: which partition is stored on which nodes, It means that driver knows which node has to be contacted for different queries. You got it right, a driver is token-aware too!
Token-aware drivers understand data placement and will route a query to a proper replica node. Answering the question: under normal circumstances, your query will first hit one of the replica nodes, this node will get answers or write data to the other replica nodes and that's it, we are good. In some rare situations, your query may hit a "wrong" non-replica server, but it doesn't really matter as it also will do the job, with just a minor delay - for example, if your Replication Factor = 3 (you have three replicas), and your query got to a "wrong" node, it will have to ask all three replicas while hitting the "right one" still require 2 network operations. It's not a big deal though as all the operations are done in parallel.
how this differ from the Master/Folowers architecture
With leader/follower architecture, you can read from any server but you can write only to a leader server, which gives two issues:
Your app needs to know who is the leader (or you need to have a special proxy)
Single Point of Failure (SPoF) - if the leader is down, you can't write to the DB at all
With Cassandra's peer-to-peer architecture you can write to any of the cluster nodes, even if there are thousands of them. Of course, there is no SPoF.
P.S. Cassandra is an extremely powerful technology, but great power comes with great responsibility, it's quite complex too. If you plan to work with it, you better invest some time into learning to use it properly. I do suggest taking a Developer Path on the academy.datastax.com (it's free!) or at least watch DataStax "Intro to Cassandra" workshop
It is based on the driver that you used to connect to the Cassandra™ cluster. Again, all nodes in the datacenter are one and same. It would connect to any of the nodes the localdatacenter that you have provided in driver configs based on the contact points configuration (i.e. datastax-java-driver.basic.contact-points in Java Driver).
For example, the Java driver (& most drivers logic will be the same) uses system.peers.rpc-address to connect to newly discovered nodes. For special network topologies, an address translation component can be plugged in.
advanced.address-translator in the configuration.
none by default. Also available: EC2-specific (for deployments that span multiple regions), or write your own.
Each node in the Cassandra cluster is uniquely identified by an IP address that the driver will use to establish connections.
for contact points, these are provided as part of configuring the CqlSession object;
for other nodes, addresses will be discovered dynamically, either by inspecting system.peers on already connected nodes, or via push notifications received on the control connection when new nodes are discovered by gossip.
More info can be found here.
It seems you are asking how specifically Cassandra selects which Node gets hit with data and which ones doesn't.
There are two sides to this: the client and the servers
On the client
When a CQL Connection is established the client (if implemented in the client library and configured) usually also retrieves the Topology from the Cluster. A topology is the information about the token ownership inside the ring as well as information about quorums etc..
So the client itself can already make a decision on the next request what Node to contact for a certain amount of information due to Consistent Hashing of the primary keys in Cassandra. The client is aware who would be the right choice of Node to contact.
But still the client can choose not to use this information and just send the information to any node of the ring - the nodes will then forward the requests to the appropriate token owners -> See the next section.
In the Cluster
The same applies to the nodes themselves. If a client sends a request to a node it will simply look up the owner nodes in it's topology table and forward the request to exactly the nodes that do own this token.
It will always forward it to all of them so the data is consistent across the cluster. Depending on the replication factor it will return a success response to the client if the required replication is acknowledged by the cluster (eg. LOCAL_QUORUM with RF=3 will return a success response when 2 nodes acknowledge the receipt while the 3rd node is still pending).
If a node is detected as down or can't be reached the Command that would have been sent to the node is saved in the local hints table - a buffer that keeps all the operations that haven't been successfully sent to other nodes.
You can read more on Hints in the Cassandra Docs
Compared to a Leader/Follower architecture the Cassandra model is actually simpler and depends mostly on all involved nodes seeing all the mutation commands happening to the data they "own" via the tokens.

Does "spring data cassandra" have client side loadbalancing?

I'm operating project using spring-boot, spring-data-cassandra.
When I setup that project, I set cassandra properties by ip and port.
(referred by https://www.baeldung.com/spring-data-cassandra-tutorial)
When set it up like this, If I had 3 cassandra nodes and 1 cassandra node died, I think project should fail to connect with cassandra at a 33% probability.
But my project was fine even though 1 cassandra node was dead. (just have some error on one's deathbed)
Do It happen to have A function in spring-data-cassandra like client-side-loadbalancing?
If they have that function, Where can I see that code??
I tried to find that code but failed.
Please give me a little clue.
Spring Data Cassandra relies on the functionality of the DataStax Java driver that is responsible for making everything works. This includes:
establishing the initial connection to the cluster. This is where the contact points play their role. After driver is connected to any of points, it reads information about the whole cluster and establishes connections to all nodes (by default)
establishing the control connection that is used to receive notifications about changes in the cluster - nodes going up & down, changes in schema, etc. If node goes down or up, this information is used to modify the list of the active nodes
providing the load balancing of requests based on the replication, and nodes availability - if the node is down, it's excluded from list of candidates, so we don't send queries to node that is known to be down

Preventing Cassandra Node from Being Overwhelemed

When in Java, I create a Cassandra cluster builder, I provide a list of multiple Cassandra nodes as shown below:
Cluster cluster = Cluster.builder().addContactPoint(host1, host2, host3, host4).build();
But from what I understand, the connector connects only to the first host in the list that is available, and that host becomes my connection point to the Cassandra cluster.
Now, my question is if my Java application reads/writes huge amount of data from/to Cassandra, then doesn't my Java application overwhelm the node that it is connected to?
Is there a way to configure my connection such that it uses multiple nodes of Cassandra for its reads/writes? What is the common practice?
It uses the contact point to find the rest of the nodes in the cluster, then creates a pool of connections to all the hosts and balances the requests among them. It doesn't only connect to the hosts you provide unless you use the whitelist load balancing policy or a custom one.
If your worried about overwhelming nodes use the RoundRobinLoadBalancingPolicy (DC aware if multiple DCs) and it will distribute it amongst all of them evenly. If you have hot spots of data and use the TokenAware policy you may have it uneven, but you shouldn't need to worry about it.

Cassandra improve failover time

We are using a 3-node cassandra cluster (each node on a different vm) and currently investigating failover times during write and read operations in case one of the nodes dies.
Failover times are pretty good when shutting down one node gracefully, however, when killing a node (by shutting down the VM) the latency during the tests is about 12 seconds. I guess this has something to do with the tcp timeout?
Is there any way to tweak this?
Edit:
At the moment we are using Cassandra Version 2.0.10.
We are using the java client driver, version 2.1.9.
To describe the situation in more detail:
The write/read operations are performed using the QUROUM Consistency Level with a replication factor of 3. The cluster consists of 3 nodes (c1,c2,c3), each on a different host (VM). The client driver is connected to c1. During the tests I shutdown the host for c2. From then on we observe that the client is blocking for > 12 seconds, until the other nodes realize that c2 is gone. So i think this is not a client issue, since the client is connected to node c1, which is still running in this scenario.
Edit: I don't believe that the fact that running cassandra inside a VM affects the network stack. In fact, killing the VM has the effect, that the TCP connections are not terminated. In this case, a remote host can notice this only through some time out mechanism (either a timeout on the application level protocol or a TCP timeout).
If the process is killed on OS level, the TCP stack of the OS will take care of terminating the TCP connection (IMHO with a TCP reset) enabling a remote host to immediately be notified about the failure. 
However, it might be important that even in situations, where a host crashes due to a hardware failure, or where a host is disconnected due to an unplugged network cable (in both cases the TCP connection will not be terminated immediately) the failover time is low.  I've tried to sigkill the cassandra process inside the VM. In this case the failover time is about 600ms, which is fine.
kind regards
Failover times are pretty good when shutting down one node gracefully, however, when killing a node (by shutting down the VM) the latency during the tests is about 12 seconds
12 secs is a pretty huge value. Some questions before investigating further
what is your Cassandra version ? Since version 2.0.2 there is a speculative retry mechanism that help reducing the latency for such failover scenario: http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
what is the client driver you're using (java ? c# ? version ?). Normally with a properly configured load balancing policy, when a node is down the client will retry automatically the query by re-routing to another replica. There is also speculative retry implemented at the driver-side : http://datastax.github.io/java-driver/manual/speculative_execution/
Edit: for a node to be marked down, the gossip protocol is using the phi accrual detector. Instead of having a binary state (UP/DOWN), the algorithm adjust the suspicion level and if the value is above a threshold, the node is considered down
This algorithm is necessary to avoid marking down a node because of a micro network issue.
Look in the cassandra.yaml file for this config:
# phi value that must be reached for a host to be marked down.
# most users should never need to adjust this.
# phi_convict_threshold: 8
Another question is: what load balancing strategy are you using from the driver ? And did you use the speculative retry policy ?

What is meant by a node in cassandra?

I am new to Cassandra and I want to install it. So far I've read a small article on it.
But there one thing that I do not understand and it is the meaning of 'node'.
Can anyone tell me what a 'node' is, what it is for, and how many nodes we can have in one cluster ?
A node is the storage layer within a server.
Newer versions of Cassandra use virtual nodes, or vnodes. There are 256 vnodes per server by default.
A vnode is essentially the storage layer.
machine: a physical server, EC2 instance, etc.
server: an installation of Cassandra. Each machine has one installation of Cassandra. The Cassandra server runs core processes such as the snitch, the partitioner, etc.
vnode: The storage layer in a Cassandra server. There are 256 vnodes per server by default.
Helpful tip:
Where you will get confused is that Cassandra terminology (in older blog posts, YouTube videos, and so on) had been used inconsistently. In older versions of Cassandra, each machine had one Cassandra server installed, and each server contained one node. Due to the 1-to-1-to-1 relationship between machine-server-node in old versions of Cassandra people previously used the terms machine, server and node interchangeably.
Cassandra is a distributed database management system designed to handle large amounts of data across many commodity servers. Like all other distributed database systems, it provides high availability with no single point of failure.
You may got some ideas from the description of above paragraph. Generally, when we talk Cassandra, we mean a Cassandra cluster, not a single PC. A node in a cluster is just a fully functional machine that is connected with other nodes in the cluster through high internal network. All nodes work together to make sure that even if one of them failed due to unexpected error, they as a whole cluster can provide service.
All nodes in a Cassandra cluster are same. There is no concept of Master node or slave nodes. There are multiple reason to design like this, and you can Google it for more details if you want.
Theoretically, you can have as many nodes as you want in a Cassandra cluster. For example, Apple used 75,000 nodes served Cassandra summit in 2014.
Of course you can try Cassandra with one machine. It still work while just one node in this cluster.
What is meant by a node in cassandra?
Cassandra Node is a place where data is stored.
Data center is a collection of related nodes.
A cluster is a component which contains one or more data centers.
In other words collection of multiple Cassandra nodes which communicates with each other to perform set of operation.
In Cassandra, each node is independent and at the same time interconnected to other nodes.
All the nodes in a cluster play the same role.
Every node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
In the case of failure of one node, Read/Write requests can be served from other nodes in the network.
If you're looking to understand Cassandra terminology, then the following post is a good reference:
http://exponential.io/blog/2015/01/08/cassandra-terminology/

Resources