Would a Cassandra write be successful in the following situation?

I have two datacentres with 3 machines each. The replication factor is DC1:3, DC2:3, and all inserts are made with write consistency = ALL.
So all data exists on all nodes (this is done to make reads as fast as possible).
But are there other problems with this setup that I might be missing (apart from writes being slow, which I'm fine with)?
For example, if a single node is down, would all my writes fail? (Or can Cassandra note down the writes for the failed node somewhere and bring it up to speed once it's up?)

If a single node were down, all your writes would fail. The consistency level specifies how many replicas must acknowledge the write for it to be successful. So if you say ALL, and every node is a replica, then all the nodes need to be up for it to succeed.
Usually you would write with a lower consistency level, such as ONE. Cassandra will still send the data to all the replicas that are up. If some of them are down, the data can still reach them once they are back up, via hinted handoff, read repair (read_repair_chance), and scheduled repairs.
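A toy model of the asker's setup makes the difference concrete. The sketch assumes all six nodes are replicas (RF DC1:3, DC2:3 on 3+3 nodes), uses made-up node names, and models only the live-replica check a coordinator does before accepting a write; it is not driver or server code.

```python
# Hypothetical node names; with DC1:3 and DC2:3 on 3+3 nodes, every node is a replica.
REPLICAS = {
    "DC1": ["dc1-n1", "dc1-n2", "dc1-n3"],
    "DC2": ["dc2-n1", "dc2-n2", "dc2-n3"],
}


def write_succeeds(cl: str, replicas: dict, down: set, local_dc: str = "DC1") -> bool:
    """Return True if enough live replicas exist to acknowledge a write at `cl`."""
    all_nodes = [n for nodes in replicas.values() for n in nodes]
    live = [n for n in all_nodes if n not in down]
    live_local = [n for n in replicas[local_dc] if n not in down]
    if cl == "ALL":
        return len(live) == len(all_nodes)
    if cl == "QUORUM":  # majority of all replicas across both DCs
        return len(live) >= len(all_nodes) // 2 + 1
    if cl == "LOCAL_QUORUM":  # majority of the replicas in the coordinator's DC
        return len(live_local) >= len(replicas[local_dc]) // 2 + 1
    if cl == "ONE":
        return len(live) >= 1
    raise ValueError(f"unsupported consistency level: {cl}")


def hinted_nodes(replicas: dict, down: set) -> list:
    """For an accepted write, the replicas that would receive it later via hinted handoff."""
    return sorted(n for nodes in replicas.values() for n in nodes if n in down)


down = {"dc2-n3"}
for cl in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(cl, write_succeeds(cl, REPLICAS, down))  # only ALL prints False
print("hints kept for:", hinted_nodes(REPLICAS, down))
```

With one node down, ALL is rejected while QUORUM, LOCAL_QUORUM and ONE are accepted, and the down node is caught up later from hints.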

Related

How can I determine whether a particular node in a Cassandra cluster is up to date?

Suppose I have a two-node Cassandra cluster whose nodes reside in physically different data centers. Suppose the database in that cluster has a replication factor of 2, which means every piece of data should be kept in sync between the two nodes. Suppose this is a massive database with millions of records in its tables. I'll call the nodes node1 and node2. Suppose node2 is not reliable: its server crashes, and it takes a few days to fix it and get it back up and running. According to my understanding, there will then be a gap between node1 and node2, and it may take significant time to sync node2 with node1. So I need a way to measure the gap between node2 and node1 while the sync is happening. And after some time, how can I be sure that node2 is equal to node1? Please correct me if I'm wrong about any of this with respect to the Cassandra architecture.
So let's start with your description. A 2-node cluster sounds fine, but 2 nodes in 2 different data centers (DCs) is a bad design, though doable. Each data center should have multiple nodes to ensure your data is highly available. Anyway, setting that aside, let's assume you have a 2-node cluster with 1 node in each DC. The replication factor (RF) is defined at the keyspace level, not at the cluster level: each DC has an RF setting for a particular keyspace (or 0 if none is specified for that DC). That being said, you can't have RF=2 for a keyspace in either of your DCs if you only have a single node in each one (RF, the number of copies of the data that exist, can't be more than the number of nodes in the DC). So let's put that aside for now as well.
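To make the per-keyspace, per-DC nature of RF concrete, here is a small check. It assumes the DC names DC1/DC2 and a keyspace named ks (both hypothetical), and reads "two copies of every row" as RF 1 in each of the two DCs.

```python
def over_replicated_dcs(rf_per_dc: dict, nodes_per_dc: dict) -> list:
    """Return the DCs where a keyspace's RF is larger than the DC's node count."""
    return sorted(dc for dc, rf in rf_per_dc.items() if rf > nodes_per_dc.get(dc, 0))


def total_copies(rf_per_dc: dict) -> int:
    """Each DC keeps its own RF copies, so the cluster-wide total is the sum."""
    return sum(rf_per_dc.values())


nodes = {"DC1": 1, "DC2": 1}  # the asker's layout: one node per DC

# RF=2 in each DC cannot be satisfied with one node per DC.
print(over_replicated_dcs({"DC1": 2, "DC2": 2}, nodes))
# One copy per DC, e.g. CREATE KEYSPACE ks WITH replication =
#   {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};
print(over_replicated_dcs({"DC1": 1, "DC2": 1}, nodes), total_copies({"DC1": 1, "DC2": 1}))
```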
You have the possibility for DCs to become out of sync as well as nodes within a DC to become out of sync. There are multiple protections against this problem.
Consistency Level (CL)
This is a lever that you (the client) have to help control how far out of sync things get. There's a trade-off between availability and consistency (with performance implications as well). The CL is configured at connection time and/or per statement. For writes, the CL determines how many nodes must IMMEDIATELY ACKNOWLEDGE the write before your application gets the "green light" to move on. Pick a number you're comfortable with, knowing that the more nodes you require immediately, the more consistent your nodes and/or DC(s) will be, but the longer the write takes and the less tolerance you have for unavailable nodes before the client sees failures. Specifying less than RF doesn't mean that RF won't be met; it just means that not every replica has to acknowledge the write before you move on. For reads, this setting determines how many nodes' data are compared before the result is returned (if Cassandra finds that a particular row doesn't match across the nodes it's comparing, it "fixes" them during the read before you get your results; this is called read repair). The client has a handful of CL options (e.g. ONE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, etc.). Again, the choice is a trade-off between availability and consistency.
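The read-repair idea can be sketched as "compare the replicas, keep the newest version, push it back to the stale ones". The sketch assumes last-write-wins on a per-key write timestamp, which is how Cassandra resolves conflicting versions; the names (Cell, read_with_repair) are hypothetical, and real read repair works per cell and compares digests first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp (Cassandra uses microseconds)


def read_with_repair(replicas: dict, key: str):
    """`replicas` maps node name -> that node's data and stands for the nodes
    consulted at the read CL. Returns the newest value and repairs stale nodes."""
    versions = [data[key] for data in replicas.values() if key in data]
    if not versions:
        return None
    newest = max(versions, key=lambda cell: cell.timestamp)
    for data in replicas.values():
        if data.get(key) != newest:
            data[key] = newest  # write the winning version back to the stale replica
    return newest.value


node_a = {"user:1": Cell("new@example.com", 200)}
node_b = {"user:1": Cell("old@example.com", 100)}
print(read_with_repair({"a": node_a, "b": node_b}, "user:1"))  # new@example.com
print(node_b["user:1"].value)  # new@example.com (repaired during the read)
```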
If you want to be sure your data is consistent when your queries run (when you read the data), ensure that the write CL + the read CL > RF. You can ensure this at a LOCAL level (the DC where the read/write occurs, e.g. with LOCAL_QUORUM) or globally (across all DCs, with QUORUM). By doing this, you'll be sure that while your cluster may be inconsistent, your results during reads will not be (i.e. the results will be consistent/accurate, which is all that anyone really cares about). This setting also allows some flexibility for unavailable nodes (e.g. in a 3-node DC with RF=3, a single node can be unavailable without client failure for either reads or writes).
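The arithmetic behind "write CL + read CL > RF" can be checked in a few lines. This sketch counts replicas rather than CL names, and borrows the DC1:3/DC2:3 layout from the first question purely as an example; LOCAL_QUORUM only gives the overlap guarantee when the read and the write go to the same DC.

```python
def quorum(replicas: int) -> int:
    """A quorum is a majority of the replicas involved: floor(n / 2) + 1."""
    return replicas // 2 + 1


def reads_see_latest_write(read_acks: int, write_acks: int, replicas: int) -> bool:
    """Every read set overlaps every write set in at least one replica iff R + W > RF."""
    return read_acks + write_acks > replicas


# LOCAL_QUORUM inside one DC with RF 3: 2 + 2 > 3, and 1 node per DC may be down.
local_rf = 3
print(quorum(local_rf), reads_see_latest_write(quorum(local_rf), quorum(local_rf), local_rf))

# QUORUM across DC1:3 + DC2:3 = 6 replicas: 4 + 4 > 6, and 2 nodes in total may be down.
global_rf = 6
print(quorum(global_rf), reads_see_latest_write(quorum(global_rf), quorum(global_rf), global_rf))

# ONE for both reads and writes with RF 3 gives no such guarantee.
print(reads_see_latest_write(1, 1, 3))
```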
If nodes do become out of sync, you have a few options at this point:
Repair
Repair (run with "nodetool repair") is a facility that you can schedule or run manually to reconcile your tables, keyspaces and/or the entire node with other nodes (either within the node's DC or across the entire cluster). This is a "node-level" command and must be run on each node to "fix" things. If you have DSE, OpsCenter can run repairs in the background, fixing "chunks" of data and cycling through the process repeatedly.
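Because repair is a node-level command, one simple way to cover a whole cluster is to run it node by node. This sketch assumes hypothetical host and keyspace names, and that nodetool on the machine running it can reach each node's JMX port (remote JMX is often disabled by default). It uses `repair -pr` so each node repairs only its primary token range, and it only prints the commands unless dry_run is False.

```python
import subprocess


def rolling_repair(hosts, keyspace, dry_run=True):
    """Run `nodetool -h <host> repair -pr <keyspace>` against each host, one at a time.

    -pr limits each run to that node's primary range, so running it on every
    node in the cluster covers each range once instead of RF times.
    """
    commands = [["nodetool", "-h", host, "repair", "-pr", keyspace] for host in hosts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed repair
    return commands


# Hypothetical hosts and keyspace; set dry_run=False to actually run the repairs.
rolling_repair(["node1", "node2"], "my_keyspace")
```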
NodeSync
Similar to repair, this is a DSE-specific tool that helps keep data in sync (a newer alternative to repair).
Unavailable nodes:
Hinted Handoff
Cassandra has the ability to "hold onto" changes if nodes become unavailable during writes. It hangs onto them for a configurable period of time. If the unavailable nodes become available before time runs out, the changes are sent over and applied. If time runs out, hint collection stops, and one of the other options above needs to be performed to catch things up.
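A toy model of the hint window follows. The class and method names are hypothetical, and the 3-hour value assumes the default max_hint_window_in_ms in Cassandra 3.x's cassandra.yaml. It only shows the time-window rule, not how hints are actually stored or replayed.

```python
from dataclasses import dataclass, field

# Cassandra 3.x default: max_hint_window_in_ms = 10800000 (3 hours).
MAX_HINT_WINDOW_S = 3 * 60 * 60


@dataclass
class HintStore:
    """Toy model of the hints a coordinator keeps for unavailable replicas."""
    down_since: dict = field(default_factory=dict)  # node -> time (s) it went down
    hints: dict = field(default_factory=dict)       # node -> list of missed writes

    def store_hint(self, node: str, mutation: str, now: float) -> bool:
        # Hints are only collected while the node has been down for less than the window.
        if now - self.down_since[node] < MAX_HINT_WINDOW_S:
            self.hints.setdefault(node, []).append(mutation)
            return True
        return False  # too late: only repair (or NodeSync) can catch this write up

    def node_recovered(self, node: str) -> list:
        """Return the hints to replay to the recovered node and forget them."""
        self.down_since.pop(node, None)
        return self.hints.pop(node, [])


store = HintStore(down_since={"node2": 0})
store.store_hint("node2", "INSERT #1", now=60)        # 1 minute in: hinted
store.store_hint("node2", "INSERT #2", now=4 * 3600)  # 4 hours in: dropped
print(store.node_recovered("node2"))  # ['INSERT #1']
```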
Finally, there is no way to know how inconsistent things are (e.g. 30% inconsistent). You simply use the tools mentioned above to control consistency without completely sacrificing availability.
Hopefully that makes sense and helps.
-Jim

Why do tables get out of sync over time when Write Consistency ALL is used?

I am running a Cassandra 3.11.4 cluster with 1 data center, 2 racks and 11 nodes. My keyspaces and tables are set to replication factor 2. I use the Prometheus/Grafana combo to monitor the cluster.
Observation: during (massive) inserts using write consistency level ALL (i.e. 2 nodes), the affected tables/nodes slowly get out of sync (worst case on one node: from 100% to 83% within 6 hours). My expectation was that this could only happen if I used ANY (or anything less than my replication factor).
I would really like to understand this behaviour.
What is also interesting: if I dare to use write consistency ANY, I get exactly that, and even though all nodes are online, Cassandra does not even seem to attempt to write to all nodes. In either case (ANY or ALL) I have to perform incremental repairs.
First of all, your expectation is correct: writes, regardless of the consistency level (ALL or ONE or ANY or whatever), make every attempt to write to all replicas. The different write consistency levels only differ in when "success" is reported to the client: ALL waits until all writes are done, while ONE waits for just one (and does the other ones in the background). So unless one of your nodes goes down or is severely overloaded, none of the writes should be missing on any of the nodes, and there should be zero inconsistencies. The "hinted handoff" feature makes inconsistencies even less likely (if one node is temporarily down, other nodes save the writes it missed and replay them later).
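The point that the CL only changes when the client is released, not which replicas receive the write, can be modelled with asyncio. This is a simulation with hypothetical names, not how the Cassandra coordinator is implemented; it also ignores hints, timeouts and failures.

```python
import asyncio
import random


async def replicate(replica: str, store: dict, key: str, value: str) -> str:
    """Simulate one replica applying the write after some network/disk latency."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    store.setdefault(replica, {})[key] = value
    return replica


async def coordinate_write(replicas, store, key, value, required_acks):
    """Send the write to every replica and return as soon as `required_acks`
    have answered; the remaining writes keep running in the background."""
    tasks = [asyncio.create_task(replicate(r, store, key, value)) for r in replicas]
    acked = []
    for finished in asyncio.as_completed(tasks):
        acked.append(await finished)
        if len(acked) >= required_acks:
            break
    return acked, tasks


async def main():
    store = {}
    # CL=ONE with RF=3: the client is released after the first acknowledgement...
    acked, tasks = await coordinate_write(["n1", "n2", "n3"], store, "k", "v", 1)
    print("client released after acks from:", acked)
    # ...but all three replicas still end up with the write.
    await asyncio.gather(*tasks)
    print("replicas holding the write:", sorted(store))
    return acked, store


if __name__ == "__main__":
    asyncio.run(main())
```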
I think your only problem is that you're misinterpreting what the "percentrepaired" statistic means. The "percentrepaired" metric is used by incremental repair. In incremental repair, data on disk is split between "repaired" data (data that has already gone through a repair process) and "unrepaired" data (new data that has not yet gone through repair). This does not mean that the new data is inconsistent or differs between nodes; it just means that nobody has checked it yet! To mark this new data "repaired" you'd need to run an (incremental) repair: it will find that the data does not differ between nodes and mark it as "repaired".
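A back-of-the-envelope model shows why the percentage falls during inserts even when every replica agrees. It assumes the metric is the repaired share of a table's on-disk data; the sizes are made up and are not the asker's figures.

```python
def percent_repaired(repaired_bytes: int, unrepaired_bytes: int) -> float:
    """Share of on-disk data already marked repaired by incremental repair."""
    total = repaired_bytes + unrepaired_bytes
    # Treating an empty table as fully repaired is an assumption of this sketch.
    return 100.0 if total == 0 else 100.0 * repaired_bytes / total


repaired, unrepaired = 100, 0  # hypothetical GB on one node right after a repair
print(percent_repaired(repaired, unrepaired))  # 100.0

unrepaired += 20  # new, perfectly consistent inserts
print(round(percent_repaired(repaired, unrepaired), 1))  # 83.3

repaired, unrepaired = repaired + unrepaired, 0  # incremental repair finds no differences
print(percent_repaired(repaired, unrepaired))  # 100.0
```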

Is my Cassandra configuration correct?

I have a Cassandra cluster with three nodes under normal circumstances. When I send a write request to the cluster from Node.js, I want all nodes to report back to me after writing, and when reading, I want to be able to read from whichever node I am connected to. I want this setup to keep working when one of the three nodes has died. I chose replication factor = 3 and consistency = 2.
How should I implement this configuration? Is my configuration correct?
With my respects...
So unfortunately I have no real clue about the numbers provided by the Node.js driver, but I know something about the consistency levels, which I suspect it uses in the background, assuming that you are using this driver: http://datastax.github.io/nodejs-driver/.
Just a basic thing: the nodes don't write back to you directly. Your query is sent to one node, the coordinator of that query, which then distributes it in your cluster according to your consistency level specification (at least if it's a simple query; more complex distribution logic applies to batch queries). The coordinator then reports back to you when the query has been executed.
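To see how a coordinator knows where a simple query should go, here is a sketch of SimpleStrategy-style replica placement on a token ring. The ring, node names and hash are hypothetical; real clusters typically use vnodes and the Murmur3Partitioner, and NetworkTopologyStrategy places replicas per DC and rack.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Stand-in hash for the partitioner (MD5, as the older RandomPartitioner used;
    the default Murmur3Partitioner uses a different hash and token range)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**64


def replicas_for(key: str, ring: dict, rf: int) -> list:
    """SimpleStrategy-style placement with one token per node: the first replica is
    the first node whose token is >= the key's token, then walk the ring clockwise."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token(key)) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(min(rf, len(tokens)))]


# Hypothetical 3-node ring: token -> node.
RING = {0: "A", 2**64 // 3: "B", 2 * (2**64 // 3): "C"}

coordinator = "B"  # whichever node the driver happened to send the query to
owners = replicas_for("user:42", RING, 2)
print(f"{coordinator} forwards the write to {owners}")
```

The coordinator does not need to be one of the replicas; it just computes the owners and forwards the request.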
Whether your requirements can be fulfilled at all depends on the replication factor you chose. The problem here is that Cassandra only knows so many consistency levels. The main options for writing are ALL (which at first looks like what you want), QUORUM (which is also an option), ONE and ANY (there are also TWO, THREE and the LOCAL_/EACH_ variants). So let's assume you choose ALL, because you want to write to all replicas. That's totally fine, but if one of the nodes goes down, writes will fail, because one of the replicas cannot be updated. If you are actually using replication factor 3, you can fall back to writing with QUORUM, which is 2 nodes in this case. What should happen if another node fails? I know, very unlikely, but I've seen it in production, so it happens from time to time. Should the single remaining node be updated in this case? Then you need to fall back to consistency level ONE. Everything fine.
But what if you choose a replication factor of 5? Well, there is no way of saying: I want 4 nodes. After the failure of one node you can only fall back to QUORUM, and that would be 3, not 4. Below that, the only fixed counts available are THREE, TWO and ONE.
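The gap described above (no level means "4 of 5") becomes visible if you list, for RF 5, the strongest fixed-count level still usable as replicas go down. This is a sketch with hypothetical helper names; it only counts acknowledgements and leaves out ANY and the LOCAL_/EACH_ levels.

```python
def levels_for(rf: int) -> dict:
    """Acks required by the fixed-count write levels (ANY and LOCAL_/EACH_ omitted)."""
    return {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}


def strongest_levels(rf: int, live: int):
    """Highest ack count still achievable with `live` replicas up, and the levels giving it."""
    usable = {name: acks for name, acks in levels_for(rf).items() if acks <= min(rf, live)}
    if not usable:
        return 0, []
    best = max(usable.values())
    return best, sorted(name for name, acks in usable.items() if acks == best)


for live in range(5, 0, -1):
    print(live, strongest_levels(5, live))
```

With 4 of 5 replicas alive, the best you can ask for is 3 acknowledgements (QUORUM or THREE).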
The final question is: if you lose one node and fall back on the writing side, what happens when the node comes back (assuming that some updates are lost because some of the hints have already been discarded)? The reading part of your application can always read stale data, because you always read from only a single node. It seems to me that you are trying to compensate for that on the write side. My personal suggestion would be to use QUORUM for both reads and writes; this way it's guaranteed that you read current data and that a single node can go down (with replication factor 5, it's even 2 nodes). Also keep in mind that when you write to a node, Cassandra will always attempt to write to all replicas in the background, so it tries to keep your data up to date. The risk of reading stale data even with a ONE/ONE consistency level pair can be acceptable if you really need the speed.

Cassandra difference between ANY and ONE consistency levels

Assumptions: RF = 3
In a video on the Internet about consistency levels, the speaker says that CL = ONE is better than CL = ANY, because with CL = ANY the coordinator will be happy to store only a hint (and the data) (we assume here that all the other nodes owning the corresponding partition key range are down), and we can potentially lose our data if the coordinator fails. But wait a minute... as I understand it, if we used CL = ONE and, for example, only one (of three) nodes were available for this partition key, we would have only one node with the inserted data. The risk of loss is the same.
But I think we should compare equal situations: all nodes for a particular token are gone. Then it's better to reject the write operation than to write with such a big risk of losing the coordinator.
CL=ANY should probably never be used on a production server. Data written with ANY cannot be read until the hint has been delivered to a node owning that partition, because you can't read data while it's in a hints log.
Using CL=ONE and RF=3 with two nodes down, you would have data stored in both a) the commit log and memtable on a node and b) the hints log. These are likely different nodes, but they could be the same 1/3 of the time. So, yes, with CL=ONE and CL=ANY you risk complete loss of data with a single node failure.
Instead of ANY or ONE, use CL=QUORUM or CL=LOCAL_QUORUM.
The thing is, hints are only stored for 3 hours by default; for longer outages you have to run repairs. You can only repair if at least one copy of the data exists on a node somewhere in the cluster (hints stored on the coordinator don't count).
Consistency ONE guarantees that at least one replica in the cluster has the write in its commit log, no matter what. With ANY, in the worst case the write is stored only in the coordinator's hints (which other nodes can't access), and by default hints are kept for a window of 3 hours. Once those 3 hours pass with the other two replicas still down, data written with ANY is lost.
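A sketch of the ANY-versus-ONE difference when every replica for a key is down. It assumes RF 3, hypothetical node names, and a coordinator that is not itself a replica; it models only where the data lands, not hint expiry.

```python
REPLICAS = ("r1", "r2", "r3")  # RF = 3; names are hypothetical
NEEDED = {"ONE": 1, "QUORUM": 2, "ALL": 3}


def write(cl: str, down: set, coordinator: str = "coord") -> dict:
    """Where a write ends up, assuming the coordinator is not itself a replica."""
    live = [r for r in REPLICAS if r not in down]
    # Every replica that misses the write gets a hint stored on the coordinator.
    hints = {r: coordinator for r in REPLICAS if r in down}
    if cl != "ANY" and len(live) < NEEDED[cl]:
        raise RuntimeError(f"Unavailable: CL={cl} needs {NEEDED[cl]}, {len(live)} alive")
    # With ANY and every replica down, the hint alone counts as success.
    return {"readable_on": live, "hints": hints}


everything_down = {"r1", "r2", "r3"}
print(write("ANY", everything_down))  # acknowledged, but readable nowhere
try:
    write("ONE", everything_down)
except RuntimeError as err:
    print(err)  # ONE refuses instead of relying on a lone hint
print(write("ONE", {"r2", "r3"}))  # one readable copy plus hints for r2 and r3
```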
If you are worried about the risk, use QUORUM, and 2 nodes will have to confirm that they saved the data. It's up to the application developer/designer to decide. QUORUM will usually have slightly higher write latency than ONE, but you can always add more nodes should the load increase dramatically.
Also have a look at this nice tool to see what impact various consistency levels and replication factors have on applications:
https://www.ecyrd.com/cassandracalculator/
With RF 3, 3 nodes in the cluster will actually get the write. Consistency is just about how long you want to wait for a response from them. If you use ONE, you wait until one node has the write in its commit log, but the coordinator still sends the write to all 3. If some of them don't respond, the coordinator saves those writes as hints.
Most of the time, using ANY in production is a bad idea.

Understand cassandra replication factor versus consistency level

I want to clarify the very basic concepts of replication factor and consistency level in Cassandra. I'd highly appreciate it if someone could answer the questions below.
RF- Replication Factor
RC- Read Consistency
WC- Write Consistency
2 Cassandra nodes (e.g. A, B), RF=1, RC=ONE, WC=ONE or ANY
Can I write data to node A and read from node B?
What will happen if A goes down?
3 Cassandra nodes (e.g. A, B, C), RF=2, RC=QUORUM, WC=QUORUM
Can I write data to node A and read from node C?
What will happen if node A goes down?
3 Cassandra nodes (e.g. A, B, C), RF=3, RC=QUORUM, WC=QUORUM
Can I write data to node A and read from node C?
What will happen if node A goes down?
Short summary: Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. Perhaps there's a better way to categorize these.
As an example, you can have a replication factor of 2. When you write, two copies will always be stored, assuming enough nodes are up. When a node is down, writes for that node are stashed away and written when it comes back up, unless it's down long enough that Cassandra decides it's gone for good.
Now say in that example you write with a consistency level of ONE. The client will receive a success acknowledgement after the write is done on one node, without waiting for the second write. If you wrote with a CL of ALL, the acknowledgement to the client would wait until both copies are written. There are many other consistency level options, too many to cover all the variants here. Read the DataStax documentation, though; it does a good job of explaining them.
In the same example, if you read with a consistency level of ONE, the response will be sent to the client after a single replica responds. Another replica may have newer data, in which case the response will not be up-to-date. In many contexts, that's quite sufficient. In others, the client will need the most up-to-date information, and you'll use a different consistency level on the read - perhaps a level ALL. In that way, the consistency of Cassandra and other post-relational databases is tunable in ways that relational databases typically are not.
Now getting back to your examples.
Example one: Yes, you can write to A and read from B, even if B doesn't have its own replica. B will ask A for it on your client's behalf. This is also true for your other cases where the nodes are all up. When they're all up, you can write to one and read from another.
For writes with WC=ONE while one node is down: if the single replica for your key lives on the node that is up, and that is the node you're connected to, the write will succeed. If the replica lives on the down node, the write will fail. If you use ANY, the write will succeed, assuming you're talking to the node that's up. I think you also need hinted handoff enabled for that. The down node will get the data later, and you won't be able to read it until that happens, not even from the node that's up.
In the other two examples, the replication factor affects how many copies are eventually written, but doesn't affect client behavior beyond what I've described above. QUORUM affects client behavior in that you need a sufficient number of replica nodes up and responding for writes and reads: if at least floor(RF/2) + 1 of the replicas for the key are up, writes and reads will succeed; if not enough replicas are up, reads and writes will fail. Overall, some QUORUM reads and writes can succeed while a node is down, as long as that node either doesn't store your replica or its outage still leaves enough replica nodes available.
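For examples two and three, a small model shows what share of the data stays reachable at QUORUM while node A is down. It assumes one token per node, equal-sized ranges and SimpleStrategy-style placement; the node names come from the question, the helpers are hypothetical.

```python
NODES = ["A", "B", "C"]  # ring order; equal-sized token ranges assumed


def replica_sets(rf: int) -> list:
    """SimpleStrategy-style: each node's range is stored on it and the next rf-1 nodes."""
    return [[NODES[(i + k) % len(NODES)] for k in range(rf)] for i in range(len(NODES))]


def quorum_available_share(rf: int, down: set) -> float:
    """Fraction of token ranges that can still reach QUORUM (floor(rf/2) + 1 live replicas)."""
    q = rf // 2 + 1
    sets = replica_sets(rf)
    ok = sum(1 for s in sets if sum(node not in down for node in s) >= q)
    return ok / len(sets)


print(quorum_available_share(2, {"A"}))  # RF=2: only the range stored on B and C still works
print(quorum_available_share(3, {"A"}))  # RF=3: every range still has 2 of its 3 replicas
```

With RF=2, QUORUM is 2, so any range with a replica on A becomes unavailable; with RF=3, QUORUM is still 2, so losing A costs nothing.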
Check out this simple calculator which allows you to simulate different scenarios:
http://www.ecyrd.com/cassandracalculator/
For example with 2 nodes, a replication factor of 1, read consistency = 1, and write consistency = 1:
Your reads are consistent
You can survive the loss of no nodes.
You are really reading from 1 node every time.
You are really writing to 1 node every time.
Each node holds 50% of your data.
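The five lines above follow from simple arithmetic. The formulas in this sketch are assumptions about what the calculator computes, not its actual code; they reproduce the 2-node, RF 1, ONE/ONE numbers quoted above.

```python
def calculator_summary(nodes: int, rf: int, read_cl: int, write_cl: int) -> dict:
    """Rough re-creation of the calculator's headline numbers (assumed formulas)."""
    return {
        "reads_consistent": read_cl + write_cl > rf,
        "survivable_node_losses": max(rf - max(read_cl, write_cl), 0),
        "reading_from": read_cl,
        "writing_to": write_cl,
        "percent_data_per_node": 100 * rf / nodes,
    }


print(calculator_summary(nodes=2, rf=1, read_cl=1, write_cl=1))
```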
