Dealing with eventual consistency in Cassandra

I have a 3-node Cassandra cluster with RF=2. The read consistency level, call it CL, is set to 1.
I understand that whenever CL=1, a read repair happens when a read is performed against Cassandra and it returns inconsistent data. I like the idea of having CL=1 instead of setting it to 2, because then my system keeps running fine even if a node goes down. In terms of the CAP theorem, I would like my system to be AP instead of CP.
The read requests are infrequent (more like 2-3 per second), but are very important to the business. They are performed against log-like data (which is immutable, and hence never updated). My temporary fix is to run the query more than once, say 3 times, instead of running it once. This way, I can be sure that even if I don't get my data in the first read request, the system will trigger read repairs, and I will eventually get my data during the 2nd or 3rd read request. Of course, these 3 queries happen one after the other, without any blocking.
Is there any way that I can direct Cassandra to perform read repairs in the background without having the need to actually perform a read request in order to trigger a repair?
Basically, I am looking for ways to tune my system so as to work around the 'eventual consistency' model, so that my reads have a high probability of succeeding.
Help would be greatly appreciated.

reads would have a high probability of succeeding
Look at DowngradingConsistencyRetryPolicy. This policy retries a query at a lower CL than the initial one. With this policy your queries will have strong consistency when all nodes are available, and you will not lose availability if a node fails.
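The idea of retrying at a weaker level can be sketched without any driver. The following is a minimal, hand-rolled illustration in Python, not the driver's policy itself: `Unavailable`, `read_with_downgrade` and `make_fake_cluster` are hypothetical names, and the fake cluster only simulates how many replicas are alive. With RF=2, a read at TWO touches both replicas, so it sees any write that was acknowledged by at least one of them.

```python
class Unavailable(Exception):
    """Stand-in for a driver error raised when too few replicas are alive."""


def read_with_downgrade(execute, levels=("TWO", "ONE")):
    """Try the read at each consistency level in order, strongest first.

    `execute` is a hypothetical callable that takes a consistency-level name
    and returns rows, or raises Unavailable if that level cannot be met.
    Returns a (level_used, rows) tuple.
    """
    if not levels:
        raise ValueError("at least one consistency level is required")
    last_error = None
    for level in levels:
        try:
            return level, execute(level)
        except Unavailable as exc:
            last_error = exc  # retry at the next, weaker level
    raise last_error


def make_fake_cluster(alive_replicas):
    """Simulate a replica set: a level succeeds only if enough replicas are alive."""
    required = {"ONE": 1, "TWO": 2}

    def execute(level):
        if alive_replicas < required[level]:
            raise Unavailable(f"{level} needs {required[level]} replicas")
        return ["log-row"]

    return execute
```

With both replicas alive the read runs at TWO; with one replica down it silently falls back to ONE, which keeps the system available at the cost of possibly missing recent data.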

How to run multiple queries in Scylla using "Non Atomic" Batch/Pipeline

I understand that Scylla allows batch statements like these.
BEGIN BATCH
<insert-stmt>/ <update-stmt>/ <delete-stmt>
APPLY BATCH
These statements have performance implications because they ensure atomicity. However, I simply have many insert statements which I want to send from my Node.js client in a single I/O. Atomicity among these inserts is not needed. Any idea how I can do that? I can't find anything.
Batching multiple inserts in the Cassandra world is usually an antipattern (except when they all go into one partition, see the docs). When you send inserts for multiple partitions in one batch, the coordinator node has to take the data from this batch and send it to the nodes that own it. This puts additional load on the coordinator, which first needs to back up the content of the batch so that it is not lost if the coordinator crashes in the middle of execution, then needs to execute all operations, and then has to wait for their results before responding to the caller (see this diagram to understand how a so-called logged batch works).
When you don't need atomicity, the best performance comes from sending multiple inserts in parallel and waiting for them to complete. It will be faster, it will put less load on the nodes, and the driver can use a token-aware load balancing policy, so requests are sent to the nodes that own the data (if you're using prepared statements). In Node.js you can achieve this with the Concurrent Execution API. There are several variants of its usage, so it's better to look into the documentation to select what is best for your use case.
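To make the "parallel inserts with a concurrency limit" pattern concrete, here is a small language-neutral sketch in Python. It is not the Node.js Concurrent Execution API: `run_concurrently` is a hypothetical helper, and `insert_one` stands in for whatever driver call executes one prepared insert.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(insert_one, rows, concurrency=32):
    """Run insert_one(row) for every row, at most `concurrency` at a time.

    Returns a list of (row, error) pairs for the rows that failed, so the
    caller can retry them individually; an empty list means all succeeded.
    """
    def attempt(row):
        try:
            insert_one(row)
            return None
        except Exception as exc:  # collect the failure instead of aborting the run
            return (row, exc)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, rows))
    return [r for r in results if r is not None]
```

Because each insert is independent, one failed row does not affect the others, which is exactly the non-atomic behaviour asked for in the question.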

Is my Cassandra configuration correct?

I have a Cassandra cluster with three nodes under normal circumstances. When I send a write request to the cluster from Node.js, I want all nodes to acknowledge the write back to me, while when reading, I want to read only from the node I am connected to. I want this setup to keep working when one of the three nodes has died. I chose replication factor = 3 and consistency = 2.
How should I implement this configuration? Is my configuration correct?
With my respects...
So I unfortunately have no real clue about the provided numbers from the Node.js driver, but I know something about the consistency levels, which I suspect you are using in the background, assuming that you are using this driver: http://datastax.github.io/nodejs-driver/.
Just a basic thing: the nodes don't write back to you directly. Your query is sent to one node, the coordinator of that query, which then distributes the query in your cluster according to your consistency level specifications (at least if it's a simple query; more complex distribution logic applies in the case of batch queries). The coordinator then reports back to you when the query is executed.
Whether your requirements can be fulfilled at all depends on the replication factor you chose. The problem here is that Cassandra only knows a limited set of consistency levels. The main options for writing are: ALL (which at first looks like what you want), QUORUM (which is also an option), ONE and ANY. (TWO and THREE also exist, but there is no level for four or more specific replicas.) So let's assume you choose ALL, because you want to write to all replicas. That's totally fine, but if one of the nodes goes down, writes will fail, because one of the replicas could not be updated. In case you are actually using replication factor 3, you can fall back to writing with QUORUM, which is 2 nodes in this case. What should happen if another node fails? I know, very unlikely, but I've seen it in production, so it happens from time to time. Should the single last node be updated in this case? Then you need to fall back to consistency level ONE. Everything fine.
But what if you choose the replication factor to be 5? Well, there is no way of saying: I want 4 nodes. With one node down you can only ask for a quorum, and that would be 3, not 4. And below the quorum, the only fallbacks are the fixed levels TWO and ONE.
The final question is: if you lose one node and you fall back in the writing part, what happens when your node comes back (assuming that there are lost updates because some of the hinted handoffs have already been discarded)? The reading part of your application can always read stale data, because you always read from a single node only. It seems to me like you are trying to compensate for that in the write part. My personal suggestion would be to use QUORUM for both reading and writing: this way it's guaranteed that you read current data, and a single node can go down (with replication factor 5 it's even 2 nodes). Also keep in mind that when you write to a node, Cassandra will always attempt to write to the other replicas in the background, so it tries to keep your data up to date. The risk of reading stale data even with a consistency level pair of ONE-ONE can be acceptable if you really need the speed.
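The numbers used above follow from simple arithmetic, sketched here in Python. The function names are made up for illustration. A quorum is a strict majority of the replication factor, and a read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, i.e. when R + W > RF.

```python
def quorum(rf):
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1


def tolerated_failures(rf):
    """Replica nodes that can be down while QUORUM is still reachable."""
    return rf - quorum(rf)


def reads_see_latest_write(rf, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf
```

For RF=3 this gives a quorum of 2 with one tolerated failure, and for RF=5 a quorum of 3 with two tolerated failures. QUORUM/QUORUM satisfies the overlap condition for RF=3 (2 + 2 > 3), while ONE/ONE does not (1 + 1 is not greater than 3), which is why stale reads are possible with that pair.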

If all replicas will sync up eventually, what's the point of read repairs?

If all replicas will sync up eventually, what's the point of read repairs?
Wouldn't there be cases where a write is being sent to all replicas and a read repair happens before that write arrives, so that you essentially perform the same write twice?
There are a few things here: blocking read repairs, async read repairs, and whether either is needed.
Blocking read repairs: QUORUM reads have been monotonically consistent for a while now. If you read a value twice you should get the same answer. People tend to use QUORUM reads when they want stronger consistency, so the blocking read repairs prevent reads from going back in time. If this behavior ended, it could cause surprises for existing applications. However, the latency impact of these repairs has been causing issues, and it may still be changed in the very near future. You cannot currently disable this behavior and it will always be on.
Async read repairs: repairs in the background can be disabled, or happen only a small percentage of the time, or (recommended) happen only within a DC. This reduces background impact, as there isn't as much cross-DC traffic. This is controlled by the DC-local and global read repair chance settings (dc_local_read_repair_chance and read_repair_chance). When you do a ONE or LOCAL_ONE etc. query, it will, depending on that chance, wait for the rest of the responses and compare results for a read repair.
Statistically you're far more likely to be doing unnecessary work with async read repairs, as on a normally functioning, perfect system they are not needed. Hinted handoff, however, is not perfect, and there are cases where hints are lost. In these situations consistency will not be reached until an anti-entropy repair is run (which should be weekly or even daily, depending on how repairs are run, incremental or full etc.).
So other than for the sake of monotonic consistency (blocking on QUORUM+ requests), read repairs are not really critical or needed. It's something people have used to statistically put the cluster into a more consistent state faster (maybe). Lots of people run without async read repairs (you cannot currently disable the blocking read repair mechanism, fwiw), and there's even serious talk of removing the options around it completely due to misunderstandings.
One scenario that makes sense to me is this:
You write the data to a node (or a subset of the cluster)
You read the data (with Quorum), and one of the nodes has the fresher data.
Because you specified QUORUM, several nodes are asked for the value before the response is sent to the client. But because the data is fresher on one of the nodes, a blocking read repair must first happen, to update all of them.
In this case, the read repair needs to happen because the "eventual update" has yet to happen.
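The scenario above can be sketched as a last-write-wins comparison over the replica responses. This is a simplified illustration in Python, not Cassandra's actual implementation: `reconcile` is a hypothetical name, each response is reduced to a (write_timestamp, value) pair, and timestamp ties are ignored.

```python
def reconcile(responses):
    """Pick the newest value and list the replicas that need repairing.

    `responses` maps a replica name to a (write_timestamp, value) pair, or to
    None if that replica has no data for the key.
    Returns (winning_value, replicas_to_repair).
    """
    present = [r for r in responses.values() if r is not None]
    if not present:
        return None, []
    newest = max(present, key=lambda r: r[0])  # highest write timestamp wins
    stale = sorted(name for name, r in responses.items() if r != newest)
    return newest[1], stale
```

If the replicas already agree, the list of replicas to repair is empty and no extra write is sent; a repair write only happens for replicas that are behind at the moment of the read.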
In highly dynamic applications with many nodes, there are times when an eventually consistent write doesn't make it to a node PRIOR to a read request for that piece of data on that node. This is common in environments with heavy load on an undersized cluster, unknown hardware issues, and for other reasons. It's also likely where write consistency is set to ONE while read consistency is set to LOCAL_QUORUM.
Example 1: random & sporadic network drops due to an unknown network switch failing that affects the write to the node but doesn't affect the read.
Example 2: the write occurs during a peak load time period, and as a result doesn't make it to the overloaded node, prior to the read request.

Using Cassandra as a Queue

Using Cassandra as Queue:
Is it really that bad?
Setup: 5 node cluster, all operations execute at quorum
Using DateTieredCompaction should significantly reduce the cost of tombstones, and allow entire SSTables to be dropped at once.
We add all messages to the queue with the same TTL
We partition messages based on time (say 1 minute intervals), and keep track of the read-position.
Messages consumed will be explicitly deleted. (only 1 thread extracts messages)
Some messages may be explicitly deleted prior to being read (i.e. we may have tombstones after the read-position, and the TTL initially used is an upper limit).
gc_grace would probably be set to 0, as quorum reads will do blocking repair (i.e. we can have repair turned off, as messages only reside in 1 cluster (DC), and all operations are at quorum).
Messages can be added/deleted only, no updates allowed.
In our use case, if a tombstone does not replicate it's not a big deal; it's OK for us to see the same message multiple times occasionally. (Also, we would likely not run repair on a regular basis, as all operations are executing at quorum.)
Thoughts?
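The time-based partitioning and read-position tracking described in the setup can be sketched as follows. This is a minimal Python illustration under assumed names (`bucket_for`, `buckets_to_scan`); the bucket value would serve as the partition key, and the 60-second width matches the 1-minute intervals mentioned above.

```python
from datetime import datetime, timezone


def bucket_for(ts, bucket_seconds=60):
    """Return the start of the time bucket containing `ts`, as epoch seconds."""
    epoch = int(ts.timestamp())
    return epoch - epoch % bucket_seconds


def buckets_to_scan(read_position, now, bucket_seconds=60):
    """List bucket starts from the saved read position up to the current bucket."""
    first = bucket_for(read_position, bucket_seconds)
    last = bucket_for(now, bucket_seconds)
    return list(range(first, last + 1, bucket_seconds))
```

The consumer only ever queries the partitions returned by `buckets_to_scan`, so tombstones in older buckets are never read again once the read position has moved past them.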
Generally, it is an anti-pattern. This link discusses the impact of tombstones at length: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
My opinion is: try to avoid it if possible, but if you really understand the performance impact and it is not an issue in your architecture, of course you could do that.
Another reason not to do it, if possible, is that the Cassandra data structure is not designed for queues. It will always look ugly, UGLY!
I strongly suggest considering Redis or RabbitMQ before making your final decision.

Would a Cassandra write be successful in the following situation?

I have two datacentres with 3 machines each. The replication factor is DC1:3, DC2:3 and all the inserts are made with write consistency = ALL.
So all data exists on all nodes (this is done to get reads to be the fastest).
But are there other problems with this setup that I might be missing? (Except for writes being slow, which I'm fine with.)
For example, if a single node is down would all my writes fail? (Or can cassandra note down the writes for the failed node somewhere and bring it up to speed once its up?)
If a single node were down, then all your writes would fail. The consistency level specifies how many replicas you require for the write to be successful. So if you say ALL, and every node is a replica, then all the nodes would need to be up for it to succeed.
Usually you would do your writes with a lower consistency, like ONE. Cassandra will still write the data to all the nodes if they are up. If some of them are down, then the data may still get written to them (once they are back up) via hinted handoffs, read repair, and scheduled repairs.
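For the DC1:3, DC2:3 setup from the question, the effect of a single node failure on different write consistency levels can be checked with a few lines of Python. This is a sketch with hypothetical helper names that covers only four levels and assumes, as in the question, that every node is a replica for every row.

```python
def required_acks(level, rf_by_dc, local_dc):
    """Replica acknowledgements a write needs at a given consistency level.

    `rf_by_dc` maps a datacenter name to its replication factor.
    """
    total = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return rf_by_dc[local_dc] // 2 + 1
    if level == "QUORUM":
        return total // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")


def write_succeeds(level, rf_by_dc, alive_by_dc, local_dc):
    """True if enough replicas are alive to satisfy the level."""
    if level == "LOCAL_QUORUM":
        available = alive_by_dc[local_dc]  # only local replicas count
    else:
        available = sum(alive_by_dc.values())
    return available >= required_acks(level, rf_by_dc, local_dc)
```

With one node down in DC1, ALL needs 6 acknowledgements but only 5 replicas are alive, so the write fails, while QUORUM (4 of 6), LOCAL_QUORUM (2 of 3) and ONE would all still succeed.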
