We are using a 3-node Cassandra cluster (each node on a different VM) and are currently investigating failover times during write and read operations when one of the nodes dies.
Failover times are pretty good when shutting down one node gracefully; however, when killing a node (by shutting down the VM), the latency during the tests is about 12 seconds. I guess this has something to do with a TCP timeout?
Is there any way to tweak this?
Edit:
At the moment we are using Cassandra version 2.0.10.
We are using the Java client driver, version 2.1.9.
To describe the situation in more detail:
The write/read operations are performed using the QUORUM consistency level with a replication factor of 3. The cluster consists of 3 nodes (c1, c2, c3), each on a different host (VM). The client driver is connected to c1. During the tests I shut down the host for c2. From then on we observe that the client blocks for more than 12 seconds, until the other nodes realize that c2 is gone. So I think this is not a client issue, since the client is connected to node c1, which is still running in this scenario.
Edit: I don't believe that running Cassandra inside a VM affects the network stack. In fact, killing the VM has the effect that the TCP connections are not terminated. In this case, a remote host can notice the failure only through some timeout mechanism (either a timeout in the application-level protocol or a TCP timeout).
If the process is killed at the OS level, the TCP stack of the OS will take care of terminating the TCP connection (IMHO with a TCP reset), enabling a remote host to be notified about the failure immediately.
However, it might be important that the failover time stays low even in situations where a host crashes due to a hardware failure, or where a host is disconnected because of an unplugged network cable (in both cases the TCP connection will not be terminated immediately). I've tried to SIGKILL the Cassandra process inside the VM. In that case the failover time is about 600 ms, which is fine.
Failover times are pretty good when shutting down one node gracefully; however, when killing a node (by shutting down the VM) the latency during the tests is about 12 seconds.
12 seconds is a pretty huge value. Some questions before investigating further:
What is your Cassandra version? Since version 2.0.2 there is a speculative retry mechanism that helps reduce latency in such failover scenarios: http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
What client driver are you using (Java? C#? Which version?)? Normally, with a properly configured load balancing policy, when a node is down the client will automatically retry the query by re-routing it to another replica. There is also speculative retry implemented on the driver side: http://datastax.github.io/java-driver/manual/speculative_execution/
Edit: for a node to be marked down, the gossip protocol uses the phi accrual failure detector. Instead of having a binary state (UP/DOWN), the algorithm adjusts a suspicion level, and if the value rises above a threshold, the node is considered down.
This algorithm is necessary to avoid marking a node down because of a transient network issue.
Look in the cassandra.yaml file for this config:
# phi value that must be reached for a host to be marked down.
# most users should never need to adjust this.
# phi_convict_threshold: 8
Another question: what load balancing policy are you using in the driver? And did you configure the speculative execution policy?
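For reference, here is a minimal sketch of how both could be wired up with the 2.1.x Java driver you mentioned; the contact point, keyspace, data center choice and delay values are placeholders, not recommendations:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClusterFactory {

    public static Session connect() {
        Cluster cluster = Cluster.builder()
                .addContactPoint("c1.example.com")   // placeholder contact point
                // Prefer replicas of the queried partition, then other local-DC nodes.
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                // If a node has not answered after 500 ms, send the same query to the next
                // host, up to 2 extra attempts. Use only for idempotent queries.
                .withSpeculativeExecutionPolicy(new ConstantSpeculativeExecutionPolicy(500, 2))
                .build();
        return cluster.connect("my_keyspace");       // placeholder keyspace
    }
}

The delay and the number of extra attempts would of course need tuning against your actual latencies.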
The disk read/write rate and CPU usage of the Cassandra DB intermittently spike.
Cassandra was installed with Docker, and node exporter and process exporter were used for monitoring. Both exporters are also installed with Docker.
I checked the process exporter at the time of a spike. The process that consumed the most resources during that time has Java in its groupname, so I'm guessing there might be a problem with the Cassandra JVM.
No unusual traffic came in at the time of the spike.
It does not match the compaction cycle.
Clustering is not broken.
The Cassandra version is 4.0.3.
In Cassandra 4 you have the ability to access Swiss Java Knife (sjk) via nodetool, and one of the things you get access to is ttop.
If you run the following in your Cassandra environment while the CPU is spiking, you can see which threads are the top consumers, which then allows you to dial in on those threads specifically to see whether there is an actual problem.
nodetool sjk ttop >> $(hostname -i)_ttop.out
Allow that to run to completion (during a period of reported high CPU), or for at least 5-10 minutes if you decide to kill it early. It collects a new iteration every few seconds, so once it is complete, parse the results to see which threads are regularly the top consumers and what percentage of the CPU they are actually using; then you'll have a targeted approach for troubleshooting potential problems in the JVM.
If nothing useful turns up, go for a thread dump next for a more complete look; I recommend the following script:
https://github.com/brendancicchi/collect-thread-dumps
I'm operating a project using spring-boot and spring-data-cassandra.
When I set up the project, I set the Cassandra properties by IP and port
(following https://www.baeldung.com/spring-data-cassandra-tutorial).
With this setup, if I have 3 Cassandra nodes and 1 node dies, I would expect the project to fail to connect to Cassandra with a 33% probability.
But my project was fine even though 1 Cassandra node was dead (there were just some errors at the moment the node went down).
Does spring-data-cassandra happen to have a feature like client-side load balancing?
If it has that feature, where can I see that code?
I tried to find that code but failed.
Please give me a little clue.
Spring Data Cassandra relies on the DataStax Java driver, which is responsible for making everything work. This includes:
establishing the initial connection to the cluster. This is where the contact points play their role. After the driver has connected to any of the contact points, it reads information about the whole cluster and establishes connections to all nodes (by default)
establishing the control connection that is used to receive notifications about changes in the cluster: nodes going up and down, schema changes, etc. If a node goes down or comes up, this information is used to update the list of active nodes
providing load balancing of requests based on replication and node availability: if a node is down, it is excluded from the list of candidates, so queries are not sent to a node that is known to be down
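To see this behaviour directly you can inspect the driver's cluster metadata. This is a minimal sketch assuming the 4.x Java driver (the one recent Spring Data Cassandra versions build on); the contact point and data center name are placeholders:

import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

public class NodeDiscoveryDemo {
    public static void main(String[] args) {
        // One contact point is enough to bootstrap; the driver then discovers
        // every node in the cluster and keeps track of their up/down state.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("10.0.0.1", 9042)) // placeholder
                .withLocalDatacenter("datacenter1")                       // placeholder
                .build()) {
            session.getMetadata().getNodes().values().forEach(node ->
                    System.out.println(node.getEndPoint() + " -> " + node.getState()));
        }
    }
}

Because Spring Data Cassandra simply delegates to this driver, losing one of the three nodes does not break the application; the driver just stops routing requests to the dead node.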
We recently deployed microservices into production, and these microservices communicate with the Cassandra nodes for reads/writes.
After deployment, we started noticing a sudden drop in CPU to 0 on all Cassandra nodes in the primary DC. This happens at least once per day. Each time it happens, we see that 2 random nodes (in the SAME DC) are unable to reach each other ("nodetool describecluster"), and when we check "nodetool tpstats", these 2 nodes have a higher number of ACTIVE Native-Transport-Requests, between 100 and 200. These 2 nodes are also storing HINTS for each other, but when I run longer pings between them I don't see any packet loss. When we restart those 2 Cassandra nodes, the issue is fixed for the moment. This has been happening for 2 weeks.
We use Apache Cassandra 2.2.8.
Also, the microservice logs show read/write timeouts before the sudden drop in CPU on all Cassandra nodes.
You might be using the token-aware load balancing policy on the client and updating a single partition or token range heavily, in which case all the coordination load will be focused on a single replica set. You can change your application to use the RoundRobin (or DC-aware round robin) LoadBalancingPolicy and it will likely resolve the issue. If it does, you have a hotspot in your application and you might want to give some attention to your data model.
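As a rough illustration with the 3.x Java driver (the class names exist in the driver, but the builder code below is a hypothetical sketch with placeholder contact point and DC name, not taken from your application):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class LoadBalancingChoice {

    // DC-aware round robin: spreads coordination across all local-DC nodes instead of
    // pinning it to the replicas of a hot partition.
    static Cluster dcAwareRoundRobin() {
        return Cluster.builder()
                .addContactPoint("10.0.0.1")              // placeholder
                .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("DC1")               // placeholder
                        .build())
                .build();
    }

    // Token-aware (the suspected current setup): every request for a hot partition
    // is coordinated by the same small set of replicas.
    static Cluster tokenAware() {
        return Cluster.builder()
                .addContactPoint("10.0.0.1")              // placeholder
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
    }
}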
It does look like a data model problem (hot partitions causing issues on specific replicas).
But in any case you might want to add the following to your cassandra-env.sh to see if it helps:
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"
More information about this here: https://issues.apache.org/jira/browse/CASSANDRA-11363
I have a test cluster on 3 machines, 2 of which are seeds, all running CentOS 7 and Cassandra 3.4.
Yesterday everything was fine, the nodes were chatting, and I had the "brilliant" idea to power all those machines off to simulate a power failure.
Being the newbie that I am, I simply powered the machines back on, probably expecting some kind of supermagic, but here it is: my cluster is not up again and each individual node refuses to connect.
And yes, my firewalld is disabled.
My question: what damage was done and how can I recover the cluster to its previous running state?
Since you shut your cluster down abruptly, the nodes were not able to drain themselves.
Don't worry, it is unlikely that any data loss happened because of this, as Cassandra maintains commit logs and will read from them when it is restarted.
First, find your seed node IPs in cassandra.yaml.
Start a seed node first.
Check the startup logs in cassandra.log and system.log and wait for it to start up completely; it will take some time,
as it will read pending mutations from the commit log and replay them.
Once it finishes starting up, start the other nodes and tail their log files.
I have a 5 node Cassandra cluster set up on EC2, all in the same region.
If I connect over cqlsh (9160), queries respond in under a second.
When I connect via Dev Center, or using the native Java Driver, both of which use port 9042, the queries take over 20 seconds to respond.
They consistently respond in the same 21-second region, never fast and then slow.
I have set up a few Cassandra clusters on EC2 and have seen this before but do not know how to fix the problem. The last time, I scrapped the cluster and built a new one, and the response time on port 9042 was fine.
Any help in how to debug or fix this problem would be appreciated, thanks.
The current version of DevCenter was designed with running (longish) CQL scripts as its main scenario (vs. an interactive console with queries executed one after another). DevCenter uses the DataStax Java driver for Cassandra as its underlying connector.
For the above-mentioned scenario, in order to ensure there are no "conflicts", a new Session is created for each execution. When a Session is initialized, the driver performs auto node discovery, creates connection pools, etc. Basically, it does a lot of preparation work. Depending on the latency from your client machine to the EC2 nodes, the size of the cluster, and the configuration of those nodes (see the connection requirements), this initialization phase can be quite expensive.
As you can imagine, the time spent preparing wouldn't represent a large percentage of running a DDL script plus a decent number of inserts/updates. But for an interactive scenario, it results in suboptimal behavior (the one you are describing).
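For your own client applications built on the Java driver, this cost is exactly why the Cluster/Session objects are normally created once at startup and reused for every query. A minimal sketch (the contact point is a placeholder):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CassandraClient {

    // Cluster and Session are expensive to initialize (node discovery, connection
    // pools), so build them once and share them across the whole application.
    private static final Cluster CLUSTER = Cluster.builder()
            .addContactPoint("ec2-host.example.com")  // placeholder
            .build();
    private static final Session SESSION = CLUSTER.connect();

    public static Session session() {
        return SESSION;
    }
}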
The next version(s) of DevCenter will address the interactive scenario and optimize for it so that the user experience is what you'd expect. Supporting this scenario is pretty high on our list of priorities.
The underlying Java driver obtains the whole cluster topology when it initially connects. This enables it to automatically connect to any node in the cluster. On EC2 it only obtains the private addresses, tries each one, and then times out. It then sends the request over the initial connection.
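If you hit this private-address problem from your own Java driver client (rather than DevCenter), one thing worth looking at is an address translator. This is a hedged sketch assuming a 3.x driver, where EC2MultiRegionAddressTranslator maps the addresses advertised by the Cassandra nodes to ones reachable from the client:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.EC2MultiRegionAddressTranslator;

public class Ec2ClusterFactory {

    static Cluster build() {
        return Cluster.builder()
                .addContactPoint("ec2-host.example.com")  // placeholder
                // Resolves each node address via reverse DNS so clients get the private
                // IP inside the EC2 region and the public IP outside of it.
                .withAddressTranslator(new EC2MultiRegionAddressTranslator())
                .build();
    }
}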