How does Cassandra handle file system partitions?

My situation:
I have a server with multiple hard disks.
If I install Cassandra (2.1.9) on the server and use all of the hard disks, what happens if one hard disk goes down?
Will it blacklist only that (hard disk) partition and move the partitions (Cassandra partitions) to other nodes, or to the remaining partitions on the same node?
Or will it treat the entire node as down?

The behavior is configured in cassandra.yaml using the disk_failure_policy setting (see the DataStax documentation):
disk_failure_policy: (Default: stop) Sets how Cassandra responds to disk failure.
Recommended settings are stop or best_effort.
die - Shut down gossip and Thrift and kill the JVM for any file system errors or single SSTable errors, so the node can be replaced.
stop_paranoid - Shut down gossip and Thrift even for single SSTable errors.
stop - Shut down gossip and Thrift, leaving the node effectively dead, but available for inspection using JMX.
best_effort - Stop using the failed disk and respond to requests based on the remaining available SSTables. This means you will see obsolete data at consistency level ONE.
ignore - Ignore fatal errors and let the requests fail; all file system errors are logged but otherwise ignored. Cassandra acts as in versions prior to 1.2.
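A minimal sketch of switching the policy, assuming a package install with the config at /etc/cassandra/cassandra.yaml (adjust the path for your layout); the setting is only read at startup:
# Check the current policy (path assumes a package install)
grep '^disk_failure_policy' /etc/cassandra/cassandra.yaml
# Switch to best_effort so the node keeps serving from its healthy disks
sudo sed -i 's/^disk_failure_policy:.*/disk_failure_policy: best_effort/' /etc/cassandra/cassandra.yaml
# Restart the node to pick up the change
sudo service cassandra restart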
See the DataStax documentation on how to recover from a disk failure. Cassandra will not automatically move data from a failed disk to the good disks; it requires manual intervention to correct the problem.
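As a rough sketch of that manual recovery (the exact steps depend on your version and layout; the service name and the unscoped repair here are assumptions):
sudo service cassandra stop     # stop the node
# physically replace the failed disk and recreate the data directory on it
sudo service cassandra start
nodetool repair                 # rebuild this node's missing replicas from its peers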

Related

The disk read/write rate and CPU usage of Cassandra DB intermittently bounce

The disk read/write rate and CPU usage of the Cassandra DB intermittently bounce.
Cassandra was installed with Docker, and node exporter and process exporter are used for monitoring; both exporters also run in Docker.
I checked the process exporter at the time of a bounce. The process that consumed the most resources during that window has Java in its group name, so I am guessing there might be a problem with the Cassandra JVM.
No unusual traffic came in at the time of the bounce.
It does not match the compaction cycle.
Clustering is not broken.
The Cassandra version is 4.0.3.
In Cassandra 4 you can access Swiss Java Knife (sjk) via nodetool, and one of the tools it exposes is ttop.
If you run the following in your Cassandra environment while the CPU is spiking, you can see which threads are the top consumers, which lets you zero in on those threads specifically to see if there is an actual problem.
nodetool sjk ttop >> $(hostname -i)_ttop.out
Allow that to run to completion during a period of reported high CPU, or for at least 5-10 minutes if you decide to kill it early. It collects a new iteration every few seconds, so once it completes, parse the results to see which threads are regularly top consumers and what percentage of the CPU they actually use; that gives you a targeted place to troubleshoot for potential problems in the JVM.
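As a rough sketch of that parsing step (the "user=" token and the thread name after " - " are assumptions about ttop's line format; check a few lines of the capture first and adjust):
# Count how often each thread name appears among lines reporting CPU usage
grep 'user=' "$(hostname -i)_ttop.out" \
  | awk -F' - ' '{print $NF}' \
  | sort | uniq -c | sort -rn | head -20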
If nothing useful turns up, take a thread dump next for a more complete picture; I recommend the following script:
https://github.com/brendancicchi/collect-thread-dumps

Cassandra: enable hints and repair

I am adding a new node to my Cassandra cluster, which currently has 5 nodes. The nodes have hints turned on, and I am also running repairs using Cassandra Reaper. The node addition is taking forever and the other nodes are becoming unresponsive. I am running Cassandra 3.11.13.
Questions:
As I understand it, hints are used to make sure writes are correctly propagated to all replicas:
Cassandra is designed to remain available if one of its nodes is down or unreachable. However, when a node is down or unreachable, it needs to eventually discover the writes it missed. Hints attempt to inform a node of missed writes, but they are a best effort and aren't guaranteed to inform a node of 100% of the writes it missed.
Repairs do something similar:
Repair synchronizes the data between nodes by comparing their respective datasets for their common token ranges, and streaming the differences for any out of sync sections between the nodes.
If I am running repairs with Cassandra Reaper, do I need to disable hints?
If hints are enabled and repairs are carried out, does that cause data to be written twice on the nodes?
Is it okay to run repair while a node is joining?
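For reference, hinted handoff can be inspected and toggled per node at runtime; a quick sketch using standard nodetool subcommands available in 3.11:
nodetool statushandoff     # is hinted handoff currently enabled on this node?
nodetool disablehandoff    # stop storing new hints on this node
nodetool enablehandoff     # turn it back on
nodetool truncatehints     # discard hints already stored on this node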

Apache Ignite 2.9.0: recover from lost partitions

We have set up an Apache Ignite 2.9.0 cluster with native persistence on Kubernetes in Azure, with 4 nodes. To update some cache configuration, we restarted all the Ignite nodes. After the restart, running any SQL query on one particular table causes 2 Ignite nodes to restart, and after that we see a lost-partitions exception.
If we restart all nodes to recover from the lost partitions, everything is fine until we run any SQL query on that table, after which 2 nodes restart and we get the lost-partitions exception again.
Is there any way we can recover from lost partitions and overcome this problem? We would also like to understand why it is occurring; we could not find any logs related to it.
When all owners of a partition have left the grid, the partition is considered lost; you might think of this as a special internal marker. Depending on the PartitionLossPolicy, Ignite might ignore this fact and allow cache operations, or disallow them to protect data consistency.
If you use native persistence, then most likely there was no physical data loss, and all you need is to tell Ignite that you are aware of the situation, all data are in place, and it is safe to remove the "lost" mark from the partitions.
I think the simplest way to handle this would be to use the control script from within a pod:
control.sh --cache reset_lost_partitions cacheName1,cacheName2,...
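From outside the pod that might look like the following sketch (the pod name is a placeholder for your deployment, and the control.sh path matches the official Ignite Docker image; verify yours):
kubectl exec ignite-0 -- \
  /opt/ignite/apache-ignite/bin/control.sh --cache reset_lost_partitions cacheName1,cacheName2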
More details:
https://ignite.apache.org/docs/latest/configuring-caches/partition-loss-policy#handling-partition-loss

How does the failure detection and recovery mechanism in Cassandra work?

To all Cassandra experts,
I am trying to understand Cassandra failure detection and recovery, and I am a little confused about how exactly this works.
From the DataStax docs:
Configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node failure. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failures.
From http://ljungblad.nu/post/44006928392/cassandra-and-its-accrual-failure-detector
Phi represents the likelihood that Node A is wrong about Node B's state. The higher the Phi, the bigger the confidence that Node B has failed.
Can someone explain in detail the C* failure detection mechanism and how C* recovers in different scenarios?
Thanks in advance
Chaity
I don't consider myself a Cassandra expert, but here is my take on Cassandra's node failure detection:
Once per second, each node contacts 1-3 other nodes, asking about their state and location. These time-stamped messages are part of the Gossip protocol.
The snitch informs the partitioner of a node's rack and data center topology. A dynamic snitch can detect nodes that are performing poorly on reads and writes and avoid routing operations to them until they are functioning properly again.
Hinted handoff is a recovery mechanism for writes targeting offline nodes. The coordinator tracks whether each node on the write path acknowledges the write operation, and stores a hint (in the system.hints table on older versions) for any node that does not. The write is re-attempted when the target node comes back online.
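On versions before 3.0, where hints live in the system.hints table as described above, you can get a rough sense of the backlog with a query like this sketch (newer versions store hints as flat files instead):
cqlsh -e "SELECT count(*) FROM system.hints;"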
All of these communication mechanisms work together when nodes go offline or perform poorly, and they can be configured. As far as I know, Cassandra will not bring nodes back to life after a failure; that requires human intervention to bring the node back online and a nodetool repair to fix the data on the failed node.
Depending on your organization's failure tolerance for read and write operations, you can always configure the consistency level.
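As an illustration of setting consistency per request (cqlsh syntax; the keyspace and table names are placeholders, and QUORUM tolerates one unreachable replica at replication factor 3):
cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM my_keyspace.my_table LIMIT 1;"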
Some resources for managing node failure:
(Check your C* version first) DataStax Failure detection and recovery
C* High Availability from Planet Cassandra
Configuring Consistency Level

Best way to shrink a Cassandra cluster

So there is a fair amount of documentation on how to scale up a Cassandra cluster, but is there a good resource on how to "unscale" one and remove nodes? Is it as simple as turning off a node, letting the cluster sync up again, and repeating?
The reason is a site that expects high spikes of traffic, climbing from the daily few thousand hits to hundreds of thousands over a few days. The site will be "ramped up" beforehand, starting multiple instances of the web server, Cassandra, etc. After the torrent of requests subsides, the goal is to turn off the instances that are no longer used, rather than pay for servers that are just sitting around.
If you just shut the nodes down and rebalance the cluster, you risk losing data that exists only on the removed nodes and hasn't been replicated yet.
A safe cluster shrink can easily be done with nodetool. First, run:
nodetool drain
... on the node being removed, to stop accepting writes and flush memtables. Then:
nodetool decommission
... to move the node's data to the other nodes. Then shut the node down, and run on some other node:
nodetool removetoken
... to remove the node from the cluster completely. Detailed documentation can be found here: http://wiki.apache.org/cassandra/NodeTool
From my experience, I'd recommend removing nodes one by one, not in batches. It takes more time, but it is much safer in case of network outages or hardware failures.
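Putting those steps together, a sketch for one node (run the first two commands on the node being removed; note that removetoken, renamed removenode in later versions, is only needed for a node that is already dead and could not decommission itself):
nodetool drain                 # stop accepting writes, flush memtables to disk
nodetool decommission          # stream this node's data to the remaining replicas
# then shut the node down; if a node died before decommissioning, run on a live node:
nodetool removetoken <token>   # <token> is a placeholder for the dead node's token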
When you remove nodes you may have to re-balance the cluster, moving some nodes to a new token. In a planned downscale, you need to:
1 - minimize the number of moves.
2 - if you have to move a node, minimize the amount of transferred data.
There's an article about cluster balancing that may be helpful:
Balancing Your Cassandra Cluster
Also, the beginning of this video covers add-node and remove-node operations and the best strategies to minimize the cluster impact of each operation.
Hopefully, these 2 references will give you enough information to plan your downscale.
First, on the node which will be removed, flush memory (memtables) to SSTables on disk:
nodetool flush
Second, run the command to leave the cluster:
nodetool decommission
This command assigns the ranges that the node was responsible for to other nodes and replicates the data appropriately.
To monitor the process you can use:
nodetool netstats
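For example, to poll it while the node is leaving (watch simply re-runs the command every 10 seconds):
watch -n 10 nodetool netstats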
I found an article on how to remove nodes from Cassandra; it was helpful for me when scaling down. All the actions are described there step by step.
