How does the failure detection and recovery mechanism in Cassandra work?

To all Cassandra experts,
I am trying to understand Cassandra failure detection and recovery, and I am a little confused about how exactly this works.
From the DataStax docs:
Configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node failure. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failures.
From http://ljungblad.nu/post/44006928392/cassandra-and-its-accrual-failure-detector
Phi represents the likelihood that Node A is wrong about Node B's state. The higher the Phi, the bigger the confidence that Node B has failed.
Can someone explain in detail C*'s failure detection mechanism and how C* recovers from failures in different scenarios?
Thanks in advance
Chaity

I don't consider myself a Cassandra expert, but here is my take on Cassandra's node failure detection:
Once per second, each node initiates a gossip exchange with one to three other nodes, trading time-stamped state information about itself and the other nodes it knows about. These messages are part of the Gossip protocol, and the failure detector derives phi for each peer from the arrival times of its gossip heartbeats.
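To make phi concrete, here is a minimal sketch of the accrual calculation, assuming heartbeat inter-arrival times are modeled with an exponential distribution (Cassandra's real implementation keeps a sliding window of recent intervals per peer; phi_convict_threshold, default 8, is the cassandra.yaml knob the DataStax quote refers to):

    import math

    def phi(time_since_last_heartbeat, mean_interval):
        # P(peer alive but silent for t) ~ exp(-t / mean);
        # phi is -log10 of that probability, so it grows linearly with
        # silence, and faster when heartbeats are normally frequent.
        return time_since_last_heartbeat / (mean_interval * math.log(10))

    # With heartbeats roughly every 1 s, phi crosses the default
    # conviction threshold of 8 after about 8 * ln(10) ~= 18.4 s of silence.
    print(phi(18.4, 1.0))  # ~8.0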
The snitch informs the partitioner of each node's rack and data center topology. The dynamic snitch additionally monitors read performance and routes requests away from poorly performing nodes until they recover.
Hinted handoff is a recovery mechanism for writes that target offline replicas. The coordinator tracks which replicas acknowledged the write and stores a hint for each one that did not (in the system.hints table before 3.0; as flat files on disk from 3.0 on). The hinted write is replayed when the target node comes back online.
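Hinted handoff itself is configured in cassandra.yaml; these are the stock settings (key names as of 3.x):

    hinted_handoff_enabled: true
    max_hint_window_in_ms: 10800000   # stop collecting hints once a node
                                      # has been down for 3 hours
    # hints_directory: /var/lib/cassandra/hints   # where 3.0+ writes hint files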
All of these mechanisms work together when nodes go offline or perform poorly, and all of them are configurable. As far as I know, Cassandra will not bring nodes back to life after a failure; that requires human intervention to bring the node back online and a nodetool repair to fix the data on the failed node.
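For example, the manual recovery after an outage is usually just (a sketch; flags vary a little by version):

    nodetool status        # wait for the node to show UN (Up/Normal) again
    nodetool repair -pr    # then repair its primary token ranges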
Depending on your organization's failure tolerance for read and write operations, you can always tune the consistency level.
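Consistency is chosen per request by the client; a minimal sketch with the DataStax Python driver (contact point, keyspace, and table are made up for illustration):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo_ks')  # hypothetical keyspace

    # Require a majority of replicas to acknowledge this write.
    insert = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    session.execute(insert, (1, 'chaity'))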
Some resources for managing node failure:
(Check your C* version first) DataStax Failure detection and recovery
C* High Availability from Planet Cassandra
Configuring Consistency Level

Related

cassandra enable hints and repair

I am adding a new node to my Cassandra cluster, which currently has 5 nodes. The nodes have hints turned on, and I am also running repairs using Cassandra Reaper. When adding the new node, the addition takes forever and the other nodes become unresponsive. I am running Cassandra 3.11.13.
questions
As I understand it, hints are used to make sure writes are correctly propagated to all replicas:
Cassandra is designed to remain available if one of its nodes is down or unreachable. However, when a node is down or unreachable, it needs to eventually discover the writes it missed. Hints attempt to inform a node of missed writes, but they are a best effort and aren't guaranteed to inform a node of 100% of the writes it missed.
repairs do something similar
Repair synchronizes the data between nodes by comparing their respective datasets for their common token ranges, and streaming the differences for any out of sync sections between the nodes.
If I am running repairs with cassandra reaper, do I need to disable hints?
If hints are enabled and repairs are carried out, does that cause data to be written twice on the nodes?
Is it okay to run a repair while a node is joining?
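For what it's worth, while you experiment, hints can be toggled at runtime without a restart (these nodetool subcommands exist in 3.11):

    nodetool statushandoff     # is hinted handoff currently enabled?
    nodetool disablehandoff    # stop storing new hints on this node
    nodetool enablehandoff     # resume storing hints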

Cassandra availability penalty in strong consistency mode

As I understand it, Cassandra has an ALL consistency level, which provides "the highest consistency and the lowest availability". Does this level provide strong consistency?
What is the availability penalty for it? I don't see a case where data won't be available. Could anyone give an example of such a case?
If you use a consistency level of ALL then the coordinator must receive a response from all nodes. This means that:
After a successful write, nobody will read the previous state (high consistency).
If even a single node fails to respond, the whole read/write operation will fail (low availability).
For further reading, see the CAP theorem.
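Here is a sketch of what that low availability looks like from the Python driver when a replica is down (contact point and schema are made up):

    from cassandra import ConsistencyLevel, Unavailable
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect('demo_ks')
    query = SimpleStatement("SELECT * FROM users WHERE id = 1",
                            consistency_level=ConsistencyLevel.ALL)
    try:
        session.execute(query)
    except Unavailable as exc:
        # The coordinator refuses up front: not enough live replicas for ALL.
        print("need %d replicas, only %d alive"
              % (exc.required_replicas, exc.alive_replicas))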
Could anyone give an example of such a case?
A node is disconnected for maintenance.
A node crashes.
The power goes out in the server room / datacentre.
A node becomes unresponsive due to high load.
The network connection to a node goes down or becomes too slow.
Data has not yet propagated to all nodes.

Can a Cassandra cluster serve as a replacement for an in-memory Redis key-value store?

My application crawls users' mailboxes and saves them to an RDBMS. I started using Redis as a cache (a simple key-value store) for the RDBMS. But gradually I started storing crawler states and other data in Redis that need to be persistent. Losing this data means a few hours of downtime. I must ensure airtight consistency for this data; it should not be lost in node failures or split-brain scenarios. Strong consistency is a must. Sharding is done by my application. One Redis process runs on each of ten EC2 m4.large instances, and on each of these instances I am doing up to 20K IOPS to Redis. I am doing more writes than reads, though I have not determined the actual percentage of each. All my data is held entirely in memory, not backed by disk.
My only problem is that each of these instances is a single point of failure. I cannot use Redis Cluster, as it does not guarantee consistency. I have evaluated a few more tools, like Aerospike; none gives a "no data loss" guarantee.
Cassandra looks promising because I can tune the consistency level I want. I plan to use Cassandra with a replication factor of 2, where a write must reach both replicas before it is considered committed. This gives a "no data loss" guarantee.
By launching enough Cassandra nodes (SSD-backed), can I replace my Redis key-value store and still get similar read/write IOPS and latency? Will open-source Cassandra suffice for my use case? If not, will the DataStax Enterprise in-memory option solve it?
EDIT 1:
A bit of clarification:
I think I need to use Write consistency level 'ALL' and Read consistency level 'One'. I understand that with this consistency level my cluster will not tolerate any failure. That is OK for me. A few minutes of downtime occasionally is not a problem, as long as my data is consistent. In my present setup, one Redis instance failure causes a few hours of downtime.
I must ensure airtight consistency for this data.
Cassandra deals with failure better when there are more nodes. Assuming your case allows for having more nodes, this is my suggestion.
So, if you have 5 nodes and replicate your data to all of them, use a CL of QUORUM for both reads and writes. That means you always write to at least 3 nodes and read from 3 nodes (quorum = floor(replication_factor / 2) + 1, so for 5 replicas QUORUM is 3).
This ensures a very high level of consistency.
It also limits downtime: even with a node down, your reads and writes won't break, because the remaining replicas can still form a quorum.
If you use CL ALL, then even one node being down or overloaded forces a full outage.
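A sketch of that setup with the Python driver (names are illustrative; RF 5 means QUORUM = 3, so two replicas can be down without breaking anything):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    # Default every request to QUORUM instead of setting it per statement.
    profile = ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM)
    cluster = Cluster(['127.0.0.1'],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS kvstore
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5}
    """)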
I hope it helps!

When does a Cassandra node fail?

How does Cassandra guarantee that a node never fails at any given point in time? I know data is replicated, so losing data might not be an issue.
Cassandra nodes can fail for a lot of reasons: very heavy writes, out-of-memory errors, hardware failure, the 100K-tombstone scan limit, compaction failures, network errors, and so on.
Cassandra cannot guarantee that a node will never fail; like any other software, it is vulnerable to the components and hardware it depends on.
What it does guarantee is that you won't lose data, as long as the minimum required number of nodes is up and running, based on the replication factor.
Cassandra cannot prevent node failures any more than other systems can, but with a correctly set up cluster, with enough nodes and replicas configured, the cluster as a whole stays available and loses no data even when some nodes are down. This is transparent to clients; they won't even notice.

Configure cassandra to use different network interfaces for data streaming and client connection?

I have a Cassandra cluster of 3 nodes with a replication factor of 3. A lot of data is written to Cassandra daily (10-15 GB). I have provisioned these nodes on commodity hardware, as suggested by the big data community, and I expect nodes to go down frequently, which is handled by the redundancy Cassandra provides.
My problem is that I have observed Cassandra writes slowing down whenever a new node is provisioned and data is streamed to it during bootstrap. To overcome this hurdle, we have decided to use separate network interfaces for inter-node communication and for client applications writing data to Cassandra. My question is: how can this be configured, if it is possible at all?
Any help is appreciated.
I think you are chasing the wrong solution.
I am confused by the fact that you only have 3 nodes, yet your concern is around slow writes while bootstrapping. Why? Are you planning to grow your cluster regularly? What is your consistency level on write, as this has a big impact on performance? Obviously if you only have 2 or 3 nodes and you're trying to bootstrap, you will see a slowdown, because you're tying up a significant percentage of your cluster to do the streaming.
Note that "commodity hardware" doesn't mean cheap, low-performance hardware. It just means you don't need the super high-end database-class machines used for databases like Oracle. You should still use really good commodity hardware. You may also need more nodes, as setting RF equal to cluster size is not typically a great idea.
Having said that, you can set listen_address to the inter-node interface and rpc_address to the client-facing interface if you feel that will help.
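A sketch of the relevant cassandra.yaml lines (the addresses are made up; set each node's own interface IPs):

    # cassandra.yaml
    listen_address: 10.0.1.5      # inter-node gossip and streaming traffic
    rpc_address: 192.168.1.5      # client (CQL) connections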
