What happens when an entire Cassandra cluster goes down - cassandra

I have a Cassandra cluster with 3 nodes and a replication factor of 2. What would happen if the entire cluster went down at the same time? How can reads and writes be managed in that situation, and what would be the best consistency level for keeping my Cassandra nodes highly available? At the moment I'm using QUORUM.

If all nodes of your cluster are down, the cluster is down.
If you need HA, consider deploying more than one datacenter, so availability can be maintained even when an entire datacenter or rack goes down.
If you can live with stale data, you could use CL.ONE instead: then only one replica needs to respond.
More replicas also increase availability for CL.QUORUM. A quorum needs floor(RF/2) + 1 of your replicas alive. With RF=2 that is 2/2 + 1 = 2, so all of your replicas need to be online. With RF=3 you still only need 2, since floor(3/2) + 1 = 2, so one replica can be down.
As for your writes: all acknowledged writes are written to the commitlog on disk (provided there is no caching issue on your disks) and are replayed when the node comes back online. Of course, there may be a race condition where a change is written to disk but never acked over the network.
Remember to set up NTP! Cassandra resolves conflicting writes by timestamp, so the node clocks need to agree.
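
To make the quorum arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration and are not part of any Cassandra API; the formula is the floor(RF/2) + 1 rule from the answer.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while QUORUM can still be reached."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"replicas that may be down={tolerated_failures(rf)}")
```

With RF=2 no replica may be down, which is why QUORUM on the 3-node, RF=2 cluster in the question fails for some keys as soon as a single node is lost.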

Related

Hazelcast High Availability in case of 3 nodes cluster

We are using Hazelcast IMDG as an in-memory data grid. Our cluster has three nodes, we have one sync backup, and the cluster is partition aware. In that case, I expect the distributed map to be spread across the 3 nodes more or less homogeneously. If a node breaks down, leadership should be transferred to a healthy node (the one holding the sync backup of the lost data). If a write request then reaches this newly assigned leader node, the same partition should be replicated synchronously to one of the alive nodes. Does this mean that after a node failure approximately one third of the distributed map has to be replicated, and that all reads are blocked during the replication? With one sync backup, is availability affected when one of the three nodes is down, until approximately one third of the distribution is restored?
If a node goes down, the cluster promotes the backup partitions to primary.
Migrations then start in order to create backups of these new primary partitions.
Please check the Data Partitioning section.
During migrations, read operations are not blocked.
Only write operations are blocked on the partition that is actively migrating.
Since the partitions are migrated one by one, the effect on availability is minimal.
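
The promotion and re-backup steps described above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not Hazelcast code, the member names are hypothetical, and the round-robin placement is invented for the example (Hazelcast's real assignment and rebalancing logic differ). The only Hazelcast fact used is the default partition count of 271.

```python
from collections import Counter

PARTITION_COUNT = 271                     # Hazelcast's default partition count
NODES = ["node-a", "node-b", "node-c"]    # hypothetical member names


def assign(nodes, partition_count):
    """Toy placement: primary chosen round-robin, backup on the next member."""
    table = {}
    for p in range(partition_count):
        primary = nodes[p % len(nodes)]
        backup = nodes[(p + 1) % len(nodes)]
        table[p] = (primary, backup)
    return table


def fail_node(table, dead, survivors):
    """Promote backups of lost primaries and list partitions needing a new backup."""
    new_table, needs_backup = {}, []
    for p, (primary, backup) in table.items():
        if primary == dead:
            primary, backup = backup, None   # the backup becomes the primary
        elif backup == dead:
            backup = None                    # primary survives, backup is lost
        if backup is None:
            backup = next(n for n in survivors if n != primary)
            needs_backup.append(p)           # this partition must be copied
        new_table[p] = (primary, backup)
    return new_table, needs_backup


table = assign(NODES, PARTITION_COUNT)
survivors = [n for n in NODES if n != "node-b"]
after, needs_backup = fail_node(table, "node-b", survivors)
print("partitions needing a new backup:", len(needs_backup), "of", PARTITION_COUNT)
print("primaries per survivor:", dict(Counter(primary for primary, _ in after.values())))
```

In this model every partition still has a live primary immediately after the failure, so it stays readable; what remains is copying backups, which per the answer happens one partition at a time and blocks writes only on the partition currently migrating.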

How does Cassandra improve performance by adding nodes?

I'm going to build an Apache Cassandra 3.11.x cluster with 44 nodes. Each application server will host one cluster node so that the application does its reads/writes locally.
I have a couple of questions on my mind; kindly answer if possible.
1. How many server IPs should I list in the seed node parameter?
2. How does HA work when all of the listed seed nodes go down?
3. What is the disadvantage of listing all the server IPs in the seed node parameter?
4. How does Cassandra scale with respect to data, other than through the primary key and tunable consistency? My assumption is that the replication factor can improve the chances of HA but not performance, so how does performance increase by adding more nodes?
5. Is there any sharding mechanism in Cassandra?
Answers, in order:
1. It's recommended to point to at least 2 nodes per DC.
2. A seed/contact node is used only for the initial bootstrap. Once your program reaches any of the listed nodes, it "learns" the topology of the whole cluster; the driver then listens for node status changes and adjusts its list of available hosts. So even if the seed node(s) go down after the connection is established, the driver will still be able to reach the other nodes.
3. It's usually harder to maintain: you need to keep the configuration parameters for your driver and the list of nodes in sync.
4. When you have RF > 1, Cassandra may read or write data from/to any replica. The consistency level regulates how many nodes must answer a read or write operation. When you add a new node, data is redistributed to it, and if you have chosen your partition key well, the new node starts to receive requests in parallel with the old nodes.
5. The partition key is responsible for selecting the replica(s) that will hold the data associated with it; you can think of it as a shard. But you need to be careful when choosing the partition key: it's easy to create partitions that are too big, or partitions that are "hot" (receiving most of the operations in the cluster, for example if you use the date as the partition key and are always writing and reading data for today).
P.S. I would recommend reading the DataStax Architecture guide; it contains a lot of information about Cassandra as well.
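
To illustrate answers 4 and 5, here is a small Python sketch of a token ring. It is a simplified model, not Cassandra's implementation: it hashes with MD5 as a stand-in (Cassandra's default partitioner uses Murmur3), the `Ring` class, node names and vnode count are hypothetical, and replica placement ignores racks and datacenters.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def token(key: str) -> int:
    """Stand-in hash; Cassandra's default partitioner uses Murmur3, not MD5."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node owns several points on the ring (virtual nodes).
        self.points = sorted((token(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.points]

    def replicas(self, partition_key: str, rf: int):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect_right(self.tokens, token(partition_key))
        found = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in found:
                found.append(node)
            if len(found) == rf:
                break
        return found


def primary_load(ring, keys):
    """Count how many keys each node owns as first replica."""
    return Counter(ring.replicas(k, 1)[0] for k in keys)


keys = [f"user-{i}" for i in range(10_000)]
small = Ring(["n1", "n2", "n3"])
large = Ring(["n1", "n2", "n3", "n4"])
print("3 nodes:", dict(primary_load(small, keys)))
print("4 nodes:", dict(primary_load(large, keys)))
```

With well-spread partition keys the fourth node takes over a share of the keys and the requests for them. If every request used the same partition key (for example today's date), all of them would land on the same `rf` nodes no matter how many nodes the ring has.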

Cassandra nodes are unresponsive and "Native-Transport-Requests" are high only on 2 nodes

We recently deployed microservices to production, and these microservices communicate with the Cassandra nodes for reads/writes.
After the deployment, we started noticing a sudden drop in CPU to 0 on all Cassandra nodes in the primary DC. This happens at least once per day. Each time it happens, we see that 2 random nodes (in the SAME DC) cannot reach each other ("nodetool describecluster"), and when we check "nodetool tpstats", these 2 nodes have a higher number of ACTIVE Native-Transport-Requests, between 100 and 200. These 2 nodes are also storing HINTS for each other, but when I run longer "pings" between them I don't see any packet loss. Restarting those 2 Cassandra nodes fixes the issue for the moment. This has been happening for 2 weeks.
We use Apache Cassandra 2.2.8.
The microservice logs also show read/write timeouts before the sudden drop in CPU on all Cassandra nodes.
You might be using a token-aware load balancing policy on the client while updating a single partition or range heavily, in which case all of the coordination load is focused on a single replica set. Changing your application to use the RoundRobin (or DC-aware round robin) LoadBalancingPolicy will likely resolve it. If it does, you have a hotspot in your application and you might want to revisit your data model.
It does look like a data model problem (hot partitions causing issues on specific replicas).
In any case, you might want to add the following to your cassandra-env.sh to see if it helps:
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"
More information about this here: https://issues.apache.org/jira/browse/CASSANDRA-11363
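
The hotspot effect described in the first answer can be sketched with a toy simulation. This does not use the Cassandra driver; the node names and the replica set of the hot partition are assumptions made up for the example, and the two functions only mimic how each policy would pick a coordinator.

```python
import itertools
from collections import Counter

NODES = ["n1", "n2", "n3", "n4", "n5", "n6"]   # hypothetical node names
REPLICAS_OF_HOT_KEY = ["n2", "n3"]             # assumed replica set of the hot partition


def token_aware(requests: int) -> Counter:
    """Coordinator is always one of the hot partition's own replicas."""
    picker = itertools.cycle(REPLICAS_OF_HOT_KEY)
    return Counter(next(picker) for _ in range(requests))


def round_robin(requests: int) -> Counter:
    """Coordinator rotates over every node in the cluster."""
    picker = itertools.cycle(NODES)
    return Counter(next(picker) for _ in range(requests))


print("token aware:", dict(token_aware(600)))
print("round robin:", dict(round_robin(600)))
```

Note that round robin only spreads the coordination work. The two replicas still have to serve every read and write for the hot partition, which is why the answers point at the data model as the real fix.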

Cassandra reads slow with multiple nodes

I have a three node Cassandra cluster with version 2.0.5.
RF=3 and all data is synced to all three nodes.
I read from cqlsh with Consistency=ONE.
When I bring down two of the nodes, my reads are twice as fast as when I have the entire cluster up.
Tracing from cqlsh shows that the slowdown on reads with the full cluster up occurs when a request is forwarded to other nodes.
All nodes are local to the same datacenter and there is no other activity on the system.
So, why are requests sometimes forwarded to other nodes?
Even for the exact same key if I repeat the same query multiple times I see that sometimes the query executes on the local node and sometimes it gets forwarded and then becomes very slow.
Assuming that the cluster isn't overloaded, Cassandra should always prefer to do local reads when possible. Can you create a bug report at https://issues.apache.org/jira/browse/CASSANDRA ?
This is due to read repair.
By default, read repair is applied to every read at consistency level QUORUM, and with a 10% chance to reads at lower consistency levels. That is why at consistency level ONE you sometimes see more activity and sometimes less.
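
As a rough illustration of the 10% figure in this answer, the following Python sketch models a CL.ONE read that occasionally involves all replicas. It is a toy probability model, not Cassandra's read path; the function name and the fixed seed are assumptions for the example.

```python
import random


def replicas_contacted(rf: int, read_repair_chance: float, rng: random.Random) -> int:
    """Replicas involved in one CL.ONE read in this toy model."""
    if rng.random() < read_repair_chance:
        return rf      # read repair triggered: all replicas are consulted
    return 1           # plain CL.ONE read: a single replica answers


rng = random.Random(42)    # fixed seed so runs are repeatable
reads = [replicas_contacted(3, 0.10, rng) for _ in range(10_000)]
forwarded = sum(1 for n in reads if n > 1)
print(f"{forwarded / len(reads):.1%} of reads involved other replicas")
```

Repeating the same query many times in cqlsh with tracing on should show a similar pattern: most executions stay on one replica, and roughly one in ten fans out.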

Best way to shrink a Cassandra cluster

There is a fair amount of documentation on how to scale up a Cassandra cluster, but is there a good resource on how to "unscale" Cassandra and remove nodes from the cluster? Is it as simple as turning off a node, letting the cluster sync up again, and repeating?
The use case is a site that expects high spikes of traffic, climbing from the daily few thousand hits to hundreds of thousands over a few days. The site will be "ramped up" beforehand by starting multiple instances of the web server, Cassandra, etc. After the torrent of requests subsides, the goal is to turn off the instances that are no longer used, rather than pay for servers that are just sitting around.
If you just shut the nodes down and rebalance the cluster, you risk losing data that exists only on the removed nodes and hasn't been replicated yet.
A safe cluster shrink can easily be done with nodetool. First, run:
nodetool drain
... on the node being removed, to stop accepting writes and flush the memtables, then:
nodetool decommission
... to move the node's data to other nodes. Then shut the node down and run, on some other node:
nodetool removetoken
... to remove the node from the cluster completely (newer Cassandra releases call this command removenode). The detailed documentation can be found here: http://wiki.apache.org/cassandra/NodeTool
In my experience, it is better to remove nodes one by one, not in batches. It takes more time, but it is much safer in case of network outages or hardware failures.
When you remove nodes you may have to rebalance the cluster, moving some nodes to a new token. In a planned downscale, you need to:
1 - minimize the number of moves.
2 - if you have to move a node, minimize the amount of transferred data.
There's an article about cluster balancing that may be helpful:
Balancing Your Cassandra Cluster
Also, the beginning of this video covers the add node and remove node operations and the best strategies to minimize the cluster impact of each.
Hopefully, these 2 references will give you enough information to plan your downscale.
First, on the node that will be removed, flush the in-memory data (memtables) to SSTables on disk:
nodetool flush
Second, run the command to leave the cluster:
nodetool decommission
This command assigns the ranges that the node was responsible for to other nodes and replicates the data accordingly.
To monitor the process you can use the command:
nodetool netstats
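
The flush and decommission steps above could be scripted roughly as follows. This is a hedged sketch: the helper names and the host address are hypothetical, it assumes `nodetool` is on the PATH and can reach the node with `-h`, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def decommission_plan(host: str):
    """Commands to run, in order, against the node that is leaving."""
    return [
        ["nodetool", "-h", host, "flush"],          # memtables -> SSTables on disk
        ["nodetool", "-h", host, "decommission"],   # stream ranges to other nodes
    ]


def monitor_command(host: str):
    """Command to watch streaming progress while decommission runs."""
    return ["nodetool", "-h", host, "netstats"]


def run_plan(plan, dry_run: bool = True):
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step fails
            subprocess.run(cmd, check=True)


# 10.0.0.12 is a placeholder address for the node being removed.
run_plan(decommission_plan("10.0.0.12"))
print("monitor with:", " ".join(monitor_command("10.0.0.12")))
```

Running one node at a time through such a plan, and waiting for it to finish before starting the next, matches the advice above to remove nodes one by one.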
I found an article on how to remove nodes from Cassandra. It helped me when scaling down Cassandra. All actions are described step by step there.
