Cassandra - reading with consistency level ONE - cassandra

How is reading with CL ONE implemented by Cassandra?
Does coordinator query all replicas and waits for the first to answer?
According to documentation, coordinator should query one single closest replica. What happens if timeout occurs during this query - does it try another replica, or it returns error to client?

Does coordinator query all replicas and waits for the first to answer?
As you mentioned, it queries the closest node, as determined by the snitch.
What happens if timeout occurs during this query
There is additional documentation on the Dynamic Snitch, which states that:
By default, all snitches also use a dynamic snitch layer that monitors
read latency and, when possible, routes requests away from
poorly-performing nodes.
By that definition, if the node chosen by the snitch should fail, the snitch should route the transaction to the [next] closest node.
Note that as of 2.0.2, Cassandra has a feature called Rapid Read Protection, which:
[A]llows Cassandra to tolerate node failure without dropping a single request

Related

Cassandra: hinted handoff in case of multiple nodes down

Cassandra uses the concept of hinted handoff for consistency.
It means that if a node is down, the coordinator takes note of it and waits till it's up, and then resends the write request to it.
Does it mean that Cassandra sends success response back to the client even while it's waiting for the unavailable node to be up? If yes, then what if all of the target nodes were down? Won't it mean a successful response to the client even without a single write?
Hints are not stored if consistency cannot be acheived
For example consider you have 3 replicas and all nodes are down. In this case if write consistency is quorum then hints will not get stored and write will fail. Hints get stored only when one node is down and coordinator got success response from two nodes.
Only exception is write consistency ANY. In this case even if all replicas are down hint will get stored and write will be successful.

How is the coordinator node in cassandra determined by a client driver? [duplicate]

This question already has answers here:
how Cassandra chooses the coordinator node and the replication nodes?
(2 answers)
Closed 3 years ago.
I don't understand the load balancing algorithm in cassandra.
It seems that the TokenAwarePolicy can be used to route my request to the coordinator node holding the data. In particular, the documentation states (https://docs.datastax.com/en/developer/java-driver/3.6/manual/load_balancing/) that it works when the driver is able to automatically calculate a routing-key. If it can, I am routed to the coordinator node holding the data, if not, I am routed to another node. I can still specify the routing-key myself if I really want to reach the data without any extra hop.
What does not make sense to me:
If the driver cannot calculate the routing-key automatically, then why can the coordinator do this? Does it have more information than the client driver? Or does the coordinator node then ask every other node in the cluster on my behalf? This would then not scale, right?
I thought that the gossip protocol is used to share the topology of the ring among all nodes (AND the client driver). The client driver than has the complete 'ring' structure and should be equal to any 'hop' node.
Load balancing makes sense to me when the client driver determines the N replicas holding the data, and then prioritizes them (host-distance, etc), but it doesn't make sense to me when I reach a random node that is unlikey to have my data.
Token aware load balancing happens only for that statements that are able to hold routing information. For example, for prepared queries, driver receives information from cluster about fields in query, and has information about partition key(s), so it's able to calculate token for data, and select the node. You can also specify the routing key youself, and driver will send request to corresponding node.
It's all explained in the documentation:
For simple statements, routing information can never be computed automatically
For built statements, the keyspace is available if it was provided while building the query; the routing key is available only if the statement was built using the table metadata, and all components of the partition key appear in the query
For bound statements, the keyspace is always available; the routing key is only available if all components of the partition key are bound as variables
For batch statements, the routing information of each child statement is inspected; the first non-null keyspace is used as the keyspace of the batch, and the first non-null routing key as its routing key
When statement doesn't have routing information, the request is sent to node selected by nested load balancing policy, and the node-coordinator, performs parsing of the statement, extract necessary information and calculates the token, and forwards request to correct node.

Cassandra DB - Node is down and a request is made to fetch data in that Node

If we configured our replication factor in such a way that there are no replica nodes (Data is stored in one place/Node only) and if the Node contains requested data is down, How will the request be handled by Cassandra DB?
Will it return no data or Other nodes gossip and somehow pick up data from failed Node(Storage) and send the required response? If data is picked up, Will data transfer between nodes happen as soon as Node is down(GOSSIP PROTOCOL) or after a request is made?
Have researched for long time on how GOSSIP happens and high availability of Cassandra but was wondering availability of data in case of "No Replicas" since I do not want to waste additional Storage for occasional failures and at the same time, I need availability and No data loss(though delayed)
I assume when you say that there is "no replica nodes" you mean that you have set the Replication Factor=1. In this case if the request is a Read then it will fail, if the request is a write it will be stored as a hint, up to the maximum hint time, and will be replayed. If the node is down for longer than the hint time then that write will be lost. Hinted Handoff: repair during write path
In general only having a single replica of data in your C* cluster goes against some the basic design of how C* is to be used and is an anti-pattern. Data duplication is a normal and expected part of using C* and is what allows for it's high availability aspects. Having an RF=1 introduces a single point of failure into the system as the server containing that data can go out for any of a variety of reasons (including things like maintenance) which will cause requests to fail.
If you are truly looking for a solution that provides high availability and no data loss then you need to increase your replication factor (the standard I usually see is RF=3) and setup your clusters hardware in such a manner as to reduce/remove potential single points of failure.

how does cassandra reacts when a write is being performed in node and node went down

We have 2 node Cassandra cluster. Replication factor is 1 and consistency level is 1. We are not using replication as the data we are inserting is very huge for each record.
How does Cassandra reacts when a node is down when write is being performed in that node? We are using Hector API from Java client.
My understanding is that Cassandra will perform the write to other node which is running.
No, using CL.ONE the write would not be performed if the inserted data belongs to the tokenrange of the downed node. The consistency level defines how many replica nodes have to respond to accept the request.
If you want to be able to write, even if the replica node is down, you need to use CL.ANY. ANY will make sure that the coordinator stores a hint for the request. Hints are stored in System.Hints table. After the replica comes back up again, all hints will be processed and sent to the upcoming node.
Edit
You will receive the following error:
com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)

How data will be consistent on cassandra cluster

I have a doubt when i read datastax documentation about cassandra write consistency. I have a question on how cassandra will maintain consistent state on following scenario:
Write consistency level = Quorum
replication factor = 3
As per docs, When a write occurs coordinator node will send this write request to all replicas in a cluster. If one replica succeed and other fails then coordinator node will send error response back to the client but node-1 successfully written the data and that will not be rolled back.
In this case,
Will read-repair (or hinted-handoff or nodetool repair) replicate the inconsistent data from node-1 to node-2 and node-3?
If not how will cassandra takes care of not replicating inconsistent data to other replicas?
Can you please clarify my question
You are completely right, the read repair or other methods will update the node-2 and node-3.
This means even the failed write will eventually update other nodes (if at least one succeeded). Cassandra doesn't have anything like rollback that relational databases have.
I don't see there is anything wrong - the system does what you tell it, i.e., two override one, and since the error messages sent back to the client as "fail", then the ultimate status should be "fail" by read repair tool.
Cassandra Coordinator node maintains the failed replica data in its storage and it will retry periodically (3 times or so) then if it succeeds then it will send the latest data, otherwise it will truncate the data in its storage.
In case of any read query, Coordinator node sends requests to all the replica nodes, and it will compare the results from all the replica nodes. If one of the replica node is not sending the latest data, then it will send read repair command to that node in order to keep the nodes in sync.

Resources