Cassandra: hinted handoff in case of multiple nodes down

Cassandra uses the concept of hinted handoff for consistency.
It means that if a node is down, the coordinator takes note of it and waits till it's up, and then resends the write request to it.
Does this mean that Cassandra sends a success response back to the client even while it's waiting for the unavailable node to come back up? If so, what if all of the target nodes were down? Wouldn't that mean a successful response to the client without a single write having happened?

Hints are not stored if the consistency level cannot be achieved.
For example, consider that you have 3 replicas and all of them are down. In this case, if the write consistency is QUORUM, no hints will be stored and the write will fail. Hints are stored only when one node is down and the coordinator received success responses from the other two nodes.
The only exception is write consistency ANY. In that case, even if all replicas are down, a hint will be stored and the write will be reported as successful.
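The rule above can be sketched as a tiny model of the coordinator's decision. This is a hedged illustration, not real Cassandra code; the function and its return convention are made up for the example, though the consistency names mirror Cassandra's levels:

```python
# Illustrative sketch of the coordinator's accept/hint decision (not
# Cassandra internals). Models RF=3 with a varying number of live replicas.

def coordinator_write(consistency, live_replicas, rf=3):
    """Return (write_accepted, hints_stored) for a simplified coordinator."""
    down = rf - live_replicas
    if consistency == "ANY":
        # ANY succeeds even with all replicas down; the hint alone counts.
        return True, down
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[consistency]
    if live_replicas >= required:
        # Enough acks: the write succeeds, hints cover the downed replicas.
        return True, down
    # Not enough live replicas: the write fails and no hints are stored.
    return False, 0

print(coordinator_write("QUORUM", live_replicas=2))  # (True, 1)
print(coordinator_write("QUORUM", live_replicas=0))  # (False, 0)
print(coordinator_write("ANY", live_replicas=0))     # (True, 3)
```

Note how ANY is the only level where the third call reports success with zero live replicas, matching the exception described above.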

Cassandra timeout during write query but entry present in Database

We are using Cassandra 3.0 on our system. For insertion in the db, we are using the Datastax C# driver.
We have a query regarding timeouts and retries during insertion. We faced an instance where a timeout was thrown during an insert, yet the entry is present in the database. All of our settings are the defaults, both in cassandra.yaml and in the driver.
How can we know the actual status of the insert when there is a timeout? If a timeout was thrown, how could the insert have gone through anyway? Whether the insert itself succeeded, or some default retry policy was applied, we currently have no tangible answer, and we need to know exactly what happened.
How do we make sure that the status of that insertion was actually successful/failed with or without the timeout?
A write timeout is not necessarily a failure to write; rather, it's a notification that not enough replicas acknowledged the write within the time period. The write will still eventually happen on all replicas.
If you do observe a write timeout, it indicates that not enough replicas responded for the configured consistency level within the configured write_request_timeout_in_ms value in cassandra.yaml, the default being 2 seconds. Keep in mind however that the write will still happen.
The coordinating Cassandra node responsible for that write sends write mutations to all replicas and responds to the client as soon as enough have replied or the timeout is reached. Because of this, if you get a WriteTimeoutException you should assume the write happened. If any of the replicas are down, the coordinator maintains a hint for that write, which will be delivered to the replica when it becomes available again.
Cassandra also employs read repair, and operators should run recurring repairs to help keep data consistent.
If your operations are idempotent, you can simply retry the write until it succeeds. Or you can attempt to read the data back to make sure the write was processed. However, depending on your application requirements, you may not need to employ these strategies and you can safely assume the write did or will happen.
Please note on the other hand that unavailable errors (i.e. Not enough replicas available at consistency level X) indicate that not enough replicas were available to perform a write and therefore the write is never attempted.
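The distinction between the two error types can be sketched as a small model. This is an illustrative sketch, not driver code; the exception classes and function names here are made up for the example (the real DataStax drivers expose their own exception types):

```python
# Toy model of the difference between a write timeout (write attempted,
# may still apply) and an unavailable error (write never attempted).

class WriteTimeout(Exception):
    """Enough replicas were alive, but too few acked in time."""

class Unavailable(Exception):
    """Too few replicas alive up front; the write is never attempted."""

def attempt_write(live_replicas, acked_in_time, required):
    if live_replicas < required:
        raise Unavailable("write never attempted")
    if acked_in_time < required:
        raise WriteTimeout("write attempted; may still land via hints/repair")
    return "acknowledged"

def safe_write(live_replicas, acked_in_time, required, retries=3):
    """Retry an idempotent write; treat a timeout as 'possibly applied'."""
    for _ in range(retries):
        try:
            return attempt_write(live_replicas, acked_in_time, required)
        except WriteTimeout:
            status = "possibly applied"   # safe to retry if idempotent
        except Unavailable:
            status = "not applied"        # nothing happened on any replica
            break
    return status
```

The retry loop reflects the advice above: only idempotent writes should be blindly retried after a timeout, since the original attempt may already have landed.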

Cassandra DB - Node is down and a request is made to fetch data in that Node

If we configured our replication factor so that there are no replica copies (data is stored in one node only), and the node containing the requested data is down, how will the request be handled by Cassandra?
Will it return no data, or will the other nodes somehow pick up the data from the failed node's storage via gossip and send the required response? If the data is picked up, does the transfer between nodes happen as soon as the node goes down (gossip protocol), or only after a request is made?
I have researched for a long time how gossip works and how Cassandra achieves high availability, but I am wondering about the availability of data in the case of "no replicas", since I do not want to spend additional storage on occasional failures, yet I still need availability and no data loss (even if delayed).
I assume that by "no replica nodes" you mean you have set the replication factor to 1 (RF=1). In this case, if the request is a read it will fail; if the request is a write it will be stored as a hint, up to the maximum hint window, and replayed when the node returns. If the node is down for longer than the hint window, that write will be lost. See: Hinted Handoff: repair during write path
In general, having only a single replica of the data in your C* cluster goes against some of the basic design of how C* is meant to be used, and is an anti-pattern. Data duplication is a normal and expected part of using C* and is what allows for its high-availability properties. RF=1 introduces a single point of failure into the system, since the server holding the data can go down for any of a variety of reasons (including routine maintenance), which will cause requests to fail.
If you are truly looking for a solution that provides high availability and no data loss then you need to increase your replication factor (the standard I usually see is RF=3) and setup your clusters hardware in such a manner as to reduce/remove potential single points of failure.
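As a quick back-of-the-envelope check (plain arithmetic, not an API): with QUORUM operations, quorum = floor(RF/2) + 1, so a cluster tolerates RF minus quorum replica failures, which is why RF=3 is the common recommendation:

```python
# How many downed replicas each RF tolerates under QUORUM reads/writes.

def quorum(rf):
    return rf // 2 + 1

for rf in (1, 2, 3, 5):
    tolerated = rf - quorum(rf)
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated} down replica(s)")
```

RF=1 and RF=2 both tolerate zero failures under QUORUM, while RF=3 tolerates one, which matches the answer's point that RF=1 is a single point of failure.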

How does Cassandra react when a write is being performed on a node and that node goes down

We have 2 node Cassandra cluster. Replication factor is 1 and consistency level is 1. We are not using replication as the data we are inserting is very huge for each record.
How does Cassandra react when a node is down while a write is being performed on that node? We are using the Hector API from a Java client.
My understanding is that Cassandra will perform the write to other node which is running.
No. With CL.ONE the write will not be performed if the inserted data belongs to the token range of the downed node. The consistency level defines how many replica nodes have to respond in order to accept the request.
If you want to be able to write even when the replica node is down, you need to use CL.ANY. ANY makes the coordinator store a hint for the request; hints are kept in the system.hints table. After the replica comes back up, all hints are replayed and sent to the recovered node.
Edit
You will receive the following error:
com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)

How data will be consistent on cassandra cluster

I have a doubt after reading the DataStax documentation about Cassandra write consistency. My question is how Cassandra maintains a consistent state in the following scenario:
Write consistency level = Quorum
replication factor = 3
As per the docs, when a write occurs the coordinator node sends the write request to all replicas in the cluster. If one replica succeeds and the others fail, the coordinator sends an error response back to the client, but node-1 has successfully written the data, and that write will not be rolled back.
In this case,
Will read repair (or hinted handoff, or nodetool repair) replicate the inconsistent data from node-1 to node-2 and node-3?
If not, how does Cassandra take care of not replicating inconsistent data to the other replicas?
Can you please clarify my question?
You are completely right: read repair or one of the other mechanisms will update node-2 and node-3.
This means even a failed write will eventually update the other nodes (as long as at least one replica succeeded). Cassandra doesn't have anything like the rollback that relational databases have.
There is nothing wrong here: the system does what you tell it. The request is reported to the client as a failure because the required consistency level wasn't met, but the copy that was written still carries the newest timestamp, so it wins and is propagated to the other replicas by read repair.
The coordinator node stores the failed replica's write as a hint in its local storage and replays it when the replica recovers (hints are kept up to the hint window, 3 hours by default); if the replica does not return within that window, the hint is discarded.
In the case of a read query, the coordinator sends requests to the replica nodes and compares their results. If one of the replicas is not returning the latest data, the coordinator issues a read repair to that node in order to bring the replicas back in sync.
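The reconciliation step described above can be modeled as last-write-wins on timestamps. This is a simplified sketch, not Cassandra internals; the node names and the `(value, timestamp)` representation are made up for the example:

```python
# Toy model of read repair: each replica holds (value, write_timestamp);
# the newest timestamp wins, and stale replicas are overwritten with it.

def read_with_repair(replicas):
    """replicas: dict of node -> (value, timestamp). Returns the winning
    value and mutates stale replicas in place, mimicking read repair."""
    winner = max(replicas.values(), key=lambda vt: vt[1])
    for node, vt in replicas.items():
        if vt[1] < winner[1]:
            replicas[node] = winner  # push the newest data to the stale node
    return winner[0]

replicas = {
    "node-1": ("new-value", 200),  # the replica where the write succeeded
    "node-2": ("old-value", 100),  # missed the write
    "node-3": ("old-value", 100),
}
print(read_with_repair(replicas))  # -> new-value; node-2/3 now repaired
```

This is why the "failed" quorum write from the question still wins: node-1's copy has the newest timestamp, so the repair pushes it to node-2 and node-3 rather than reverting node-1.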

Cassandra - reading with consistency level ONE

How is reading with CL ONE implemented by Cassandra?
Does coordinator query all replicas and waits for the first to answer?
According to the documentation, the coordinator should query the single closest replica. What happens if a timeout occurs during this query - does it try another replica, or does it return an error to the client?
Does coordinator query all replicas and waits for the first to answer?
As you mentioned, it queries the closest node, as determined by the snitch.
What happens if timeout occurs during this query
There is additional documentation on the Dynamic Snitch, which states that:
By default, all snitches also use a dynamic snitch layer that monitors
read latency and, when possible, routes requests away from
poorly-performing nodes.
By that definition, if the node chosen by the snitch should fail, the snitch should route the transaction to the [next] closest node.
Note that as of 2.0.2, Cassandra has a feature called Rapid Read Protection, which:
[A]llows Cassandra to tolerate node failure without dropping a single request
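The idea behind rapid read protection (speculative retry) can be sketched as follows. This is a toy model with made-up names and latencies, not the real implementation: if the chosen replica hasn't answered within a threshold, the coordinator also queries the next-closest replica and takes whichever answer arrives first:

```python
# Toy model of speculative retry for a CL.ONE read. replicas is a list of
# (name, latency_ms) ordered closest-first, as a snitch might rank them.

def read_one(replicas, threshold_ms):
    """Return (answering_replica, elapsed_ms) for a CL.ONE read."""
    name, latency = replicas[0]
    if latency <= threshold_ms:
        return name, latency  # primary answered quickly; done
    # Primary is slow or failed: speculatively ask the next replica after
    # the threshold elapses, and take whichever answer arrives first.
    backup, backup_latency = replicas[1]
    total = threshold_ms + backup_latency
    if total < latency:
        return backup, total
    return name, latency

print(read_one([("A", 5), ("B", 7)], threshold_ms=50))     # fast primary: A
print(read_one([("A", 5000), ("B", 7)], threshold_ms=50))  # slow primary: B
```

The second call shows the point of the feature: a request to a failing node is not simply dropped, because the backup replica answers long before the primary would have.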
