What ConsistencyLevel to use with Cassandra counter tables? - cassandra

I have a table counting around 1000 page views per second. What read and write ConsistencyLevel should I use with it? I am using the Cassandra Thrift client.

Carlo has more or less the right idea. But you have to balance it with your use case.
I work in the game industry and we use Cassandra for player data. That workload is heavily bound by the read-modify-write pattern, which is not Cassandra's strong suit. But we also have some functionality that is write heavy (thousands of writes for a few reads a day).
This is my opinion, based upon experience, of how you should use the consistency levels.
Write + Read at QUORUM means that both operations wait for a majority of the replicas of that data to confirm before returning. It is the solution I use when reads and writes happen at roughly the same frequency (player data blob).
Write ONE + Read ALL is useful for something very write heavy. We use this for high scores, for example (written often, read every 5 minutes to regenerate the high score table of the whole game).
You could use Write ANY if you do not care about the data that much (non-critical logs come to mind).
The only use case I could come up with for Write ALL + Read ONE would be messaging or feeds with periodic checks for updates. Chats and messaging seem a good fit for that since Cassandra does not have subscription/push functionality.
Write ALL + Read ALL is a bad choice. It is a waste of resources, as you get the same consistency as with any of the three setups I mentioned above.
A final note about Write ANY vs. Write ONE: ANY only confirms that something in the cluster has received the mutation, whereas ONE confirms that it has been applied by at least one replica. ANY is not safe, as it can return without error even if all the nodes responsible for that mutation are down, or under any other condition that makes the mutation fail after reception. It is also slightly quicker (I only use it as an async dump for logs that are not critical); that is its only advantage, so do not trust the response 100%.
A good reference to study this subject about cassandra is http://www.datastax.com/docs/1.2/dml/data_consistency

If you want every read to be consistent, the rule is
(write consistency level + read consistency level) > replication factor.
So you could use
Write All + Read All (worst solution)
Write One + Read All (second-worst solution)
Write All + Read One (probably faster solution)
Write Quorum + Read Quorum (imho, best solution)
Keep in mind that with CL ALL, if any one of the replicas is down during the read or write, the operation will fail, so I'd avoid CL ALL.
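As a rough illustration of the rule and of the availability remark above, here is a minimal Python sketch. The helper names (`required_replicas`, `describe`) are made up for this example and are not part of any Cassandra client; only ONE, QUORUM and ALL are modelled.

```python
def required_replicas(level: str, rf: int) -> int:
    """Number of replica acknowledgements a consistency level waits for."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def describe(write_cl: str, read_cl: str, rf: int) -> dict:
    """Check the W + R > RF rule and how many replicas may be down."""
    w = required_replicas(write_cl, rf)
    r = required_replicas(read_cl, rf)
    return {
        "consistent": w + r > rf,
        # replicas that may be down while the operation can still succeed
        "write_tolerates_down": rf - w,
        "read_tolerates_down": rf - r,
    }


for write_cl, read_cl in [("ALL", "ALL"), ("ONE", "ALL"),
                          ("ALL", "ONE"), ("QUORUM", "QUORUM")]:
    print(write_cl, read_cl, describe(write_cl, read_cl, rf=3))
```

With RF=3, only the QUORUM + QUORUM pair keeps the rule satisfied while tolerating one replica down on both the read and the write path.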
Regards, Carlo

Based on their documentation (https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_counters_c.html), consistency level ONE is recommended. I guess some sort of merging is used to resolve conflicts for counter columns, instead of the usual last-write-wins. That's likely why setting a value is not allowed.
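To make the difference between overwriting and merging concrete, here is a toy Python sketch. It is not how Cassandra implements counters internally; `lww_merge` and `counter_merge` are hypothetical helpers that only contrast the two conflict-resolution ideas on a list of `(timestamp, value)` pairs.

```python
def lww_merge(writes):
    """Last-write-wins register: keep the value with the highest timestamp.

    Ties on the timestamp are broken by the larger value in this sketch,
    so only one of the concurrent writes can survive.
    """
    return max(writes, key=lambda w: (w[0], w[1]))[1]


def counter_merge(increments):
    """Counter: every write is a delta, so all deltas are summed."""
    return sum(delta for _, delta in increments)


# Two clients write at the very same timestamp.
same_time_writes = [(1000, 5), (1000, 3)]

print(lww_merge(same_time_writes))      # 5: the other write is discarded
print(counter_merge(same_time_writes))  # 8: both increments are counted
```

Because increments are deltas rather than absolute values, they can be combined in any order, which fits the guess above about why a counter cannot simply be set to a value.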

Related

Cassandra, Counters, and Write Conflicts

We are exploring using Cassandra as a way to store time series type data, so this may be somewhat of a noob question. One of the use cases is to read data from a Kafka stream, look for matches, and increment a counter (e.g. 5 customers have clicked through link alpha on page beta, so increment (beta, alpha) by 5). However, we expect a very wide degree of parallelism to keep up with the load, so there may be more than one consumer reading from Kafka at the same time.
My question is: How would Cassandra resolve multiple simultaneous writes to a given counter from multiple sources?
It's my understanding that multiple writes to the counter with different timestamps will be added to the counter in the timestamp order received. However, if there were to be a simultaneous write with exact same timestamp, would the LWW model of Cassandra throw out one of those counter increments?
If we were to have a large cluster (100+ nodes), ALL or QUORUM writes may not be sufficiently performant to keep up with the message traffic. Writes with THREE would seem likely to result in a situation where process #1 writes to nodes A, B, and C, but process #2 might write to X, Y, and Z. Would LWT work here, or do they not play well with counter activity?
I would try out a proof of concept and benchmark it; it will most likely work just fine. Counters are not super performant in Cassandra though, especially if there will be a lot of contention.
Counters are not like normal writes with simple LWW; they use paxos with some pessimistic locking and specialized caches. The partition lock contention will slow it down some, and paxos is an expensive multiple network hop process with reads before writes.
Use QUORUM, and don't try to do something funky with CLs with counters, especially before benchmarking to know if you need it. A 100 node cluster should be able to handle a lot as long as you're not trying to update all the same partitions constantly.

If lower consistency level is good then why we need to have a higher consistency(QUORUM,ALL) level in Cassandra?

While going through the Datastax tutorial I learned that
1) A lower consistency level is quicker for reads and writes, whereas a higher consistency level takes much longer.
2) A lower consistency level also ensures high availability of data.
My question is:
if a lower CL is good, then we can always set CL to ONE,
so why do we need the QUORUM and ALL consistency levels?
It ultimately depends on the application using Cassandra. If the application is ok with serving up data that may be under-replicated or slightly stale, then LOCAL_ONE should be fine. If the application absolutely cannot provide a wrong answer, or if written rows are not being successfully read consistently, then perhaps LOCAL_QUORUM may be more applicable.
I tell my application teams the same thing. Start with LOCAL_ONE, and work with it through testing. If you don't have any problems, then continue using it. If you do experience stale data, and your application is more read-sensitive, then try writing at LOCAL_QUORUM and continue reading at LOCAL_ONE. And if that doesn't help, then perhaps the application may need both at QUORUM.
Again, that's why application teams need to do thorough testing.
And just to address it, ALL is a useful consistency level because it invokes a read repair. Essentially, if you have a table which you know is missing data, and you don't want to run a costly nodetool repair on it, you can set your consistency to ALL and read from it. I find this trick to be most-useful when addressing issues with multi-DC clusters having issues with system_auth.
But you probably wouldn't want to use ALL from within an application. Or if you did, it'd be for a very specific edge case.
The real meat of a database like Cassandra is "eventual consistency": it won't enforce strong consistency when you first write data to the database. Rather, it gives you the option to choose a weaker consistency level like ONE to get high write performance, and later, when you query data, as long as the rule "Read_Consistency_level + Write_consistency_level > RF (replication factor)" is satisfied, you won't get stale data.
It's risky if you can't fulfill the above rule, since you might get either stale or contradictory (sometimes new, sometimes old) data.

Dealing with eventual consistency in Cassandra

I have a 3 node cassandra cluster with RF=2. The read consistency level, call it CL, is set to 1.
I understand that whenever CL=1, a read repair would happen when a read is performed against Cassandra, if it returns inconsistent data. I like the idea of having CL=1 instead of setting it to 2, because then even if a node goes down, my system would run fine. Thinking in terms of the CAP theorem, I would like my system to be AP instead of CP.
The read requests are infrequent (about 2-3 per second) but very important to the business. They are performed against log-like data (which is immutable, and hence never updated). My temporary fix is to run the query more than once, say 3 times, instead of once. This way, I can be sure that even if I don't get my data in the first read request, the system will trigger read repairs, and I will eventually get my data on the 2nd or 3rd read request. Of course, these 3 queries happen one after the other, without any blocking.
Is there any way that I can direct Cassandra to perform read repairs in the background without having the need to actually perform a read request in order to trigger a repair?
Basically, I am looking for ways to tune my system in a way as to circumvent the 'eventual consistency' model, by which my reads would have a high probability of succeeding.
Help would be greatly appreciated.
reads would have a high probability of succeeding
Look at DowngradingConsistencyRetryPolicy. This policy retries queries at a lower CL than the initial one. With this policy, your queries will have strong consistency when all nodes are available, and you will not lose availability if some node fails.

maintaining dynamic consistency level in datastax

I have a 5 node cluster and a keyspace with a replication factor of 3. The nature of the operations is such that writes are much more important than reads, but reads are about 10 times more frequent than writes. To achieve consistency while improving overall performance, I chose to set the consistency level to ALL for writes and ONE for reads. But this causes operations to fail if even one node is down.
Is there a method by which I can simultaneously change the consistency level for (Write, Read) from (ALL, ONE) to (QUORUM, QUORUM) if one node is detected down, or if there is a query-execution-exception; and can this be done in a manner that no operation passes through a temporary phase where it sees a temporary (QUORUM, ONE) setting?
We also plan to expand to twice the capacity: 3 datacenters with 4 nodes each. Is it possible to define custom consistency levels, like (a level of ALL in any one datacenter and ONE in the others)? I'm thinking that a level of (EACH_ONE) for reads, coupled with the above level for writes, will ensure consistency but will allow the cluster to remain available even if a node goes down.
The flexibility is there since you can set your consistency level at a per request basis. Depending on the client you are using, there are some nice capabilities. For example, the java driver has something called a DowngradingConsistencyRetryPolicy such that if a request fails, it will be retried with the next lowest consistency level until the request succeeds. This pushes the complexity of retrying into the client so you don't have to write a bunch of code for it, it's really nice!
The java driver also allows you to configure consistency level per request with Statement#setConsistencyLevel()
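The downgrade-and-retry idea can be sketched in a few lines of plain Python. This is only an illustration of the behaviour described above, not the driver's actual implementation: `make_executor`, `execute_with_downgrade`, the `Unavailable` exception and the downgrade order are all assumptions made up for the example, and the "cluster" is just a count of live replicas.

```python
DOWNGRADE_ORDER = ["ALL", "QUORUM", "ONE"]  # strongest to weakest (assumed order)


class Unavailable(Exception):
    """Raised by the fake executor when too few replicas are alive."""


def required(level, rf):
    """Replica acknowledgements needed for a consistency level."""
    return {"ALL": rf, "QUORUM": rf // 2 + 1, "ONE": 1}[level]


def make_executor(rf, alive):
    """Build a fake per-request executor for a cluster with `alive` replicas up."""
    def execute(query, consistency_level):
        if alive < required(consistency_level, rf):
            raise Unavailable(consistency_level)
        return f"{query} OK at {consistency_level}"
    return execute


def execute_with_downgrade(execute, query, start_level):
    """Try start_level first, then each weaker level until one succeeds."""
    levels = DOWNGRADE_ORDER[DOWNGRADE_ORDER.index(start_level):]
    for level in levels:
        try:
            return execute(query, level), level
        except Unavailable:
            continue
    raise Unavailable("no consistency level could be satisfied")


# RF=3 with one replica down: ALL fails, the retry at QUORUM succeeds.
execute = make_executor(rf=3, alive=2)
print(execute_with_downgrade(execute, "INSERT ...", "ALL"))
```

Note that a request which silently succeeds at a weaker level no longer gives the guarantee the original level was chosen for, so the caller may want to inspect the level that was actually used.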
As far as custom consistency levels, this is not an option available to you (without changing the cassandra source code), however I think what is made available should be sufficient.
For reads, I don't find much value in ensuring consistency between datacenters. I think LOCAL_QUORUM is more than sufficient, but if you really care, you can use something like EACH_QUORUM to ensure all datacenters agree, though that will severely impact your response time and availability. For example, if one of your datacenters goes down completely, you won't be able to do reads at all (unless downgrading).
For writes, I'd strongly recommend not using ALL in a multi datacenter set up if you care about response time and availability. Depending on your requirements, LOCAL_QUORUM should likely be more than sufficient.
While one of the benefits of Cassandra is that consistency is tunable, you can have as much strong consistency as you like, but keep in mind that Cassandra is at its best as a Highly Available, Partition Tolerant system.
A really good presentation on consistency that I think really nails a lot of these points is Christos Kalantzis' talk 'Eventual Consistency != Hopeful Consistency', which suggests that a consistency level of ONE is sufficient for a lot of use cases.

Read/Write Strategy For Consistency Level

Based on the question "Read Operation in Cassandra at Consistency level of Quorum?",
there are 3 ways to read data consistently:
a. WRITE ALL + READ ONE
b. WRITE ONE + READ ALL
c. WRITE QUORUM + READ QUORUM
For a given piece of data, the write usually happens once, but reads happen often.
With read consistency in mind, is it possible to merge a and b?
That is, WRITE ONE -> READ ONE -> if not found -> READ ALL.
Would this approach mean that the costly operation usually happens only once?
The READ ALL would happen only the first time, on a node that does not yet have the data.
Is my understanding correct?
Wilian, thanks for the detailed explanation. I think I need to describe my use case, as below. I implemented a timeline that users can post to, and users can follow posts that interest them, so notifications are sent to the followers. To save bandwidth, users write and read posts at CL ONE. Eventually, thanks to read repair, users can always read the post after a while. Followers receive a notification for comments added to a post if they are listening to that post. Here is my question. I must make sure followers can read the comment once the notification has been delivered to them. So I intend to use CL ONE to check whether the comment has been synced to the queried node. If there is no result, I retry at CL ALL to sync the comment. Other followers who query that node then do not need to sync with other nodes, since the CL ALL read was already done, which saves bandwidth and lowers server overhead. As for your final scenario, I don't care whether the value is old or the latest, because the data is synced in response to notifications. I need to ensure users can get the comment if the notification was delivered to followers.
From the answer to the linked question, Carlo Bertuccini wrote:
What guarantees consistency is the following disequation
(WRITE CL + READ CL) > REPLICATION FACTOR
The cases A, B, and C in this question appear to be referring to the three minimum ways of satisfying that disequation, as given in the same answer.
Case A
WRITE ALL will send the data to all replicas. If your replication factor (RF) is three(3), then WRITE ALL writes three copies before reporting a successful write to the client. But you can't possibly see that the write occurred until the next read of the same data key. Minimally, READ ONE will read from a single one of the aforementioned replicas, and satisfies the necessary condition: WRITE(3) + READ(1) > RF(3)
Case B
WRITE ONE will send the data to only a single replica. In this case, the only way to get a consistent read is to read from all of them. The coordinator node will get all of the answers, figure out which one is the most recent and then send a "hint" to the out-of-date replicas, informing them that there's a newer value. The hint occurs asynchronously but only after the READ ALL occurs does it satisfy the necessary condition: WRITE(1) + READ(3) > RF(3)
Case C
QUORUM operations must involve FLOOR(RF / 2) + 1 replicas. In our RF=3 example, that is FLOOR(3 / 2) + 1 == 1 + 1 == 2. Again, consistency depends on both the reads and the writes. In the simplest case, the read operation talks to exactly the same replicas that the write operation used, but that's never guaranteed. In the general case, the coordinator node doing the read will talk to at least one of the replicas used by the write, so it will see the newer value. In that case, much like the READ ALL case, the coordinator node will get all of the answers, figure out which one is the most recent and then send a "hint" to the out-of-date replicas. Of course, this also satisfies the necessary condition: WRITE(2) + READ(2) > RF(3)
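The overlap argument behind cases A, B and C can be checked by brute force. The Python sketch below (helper names `quorum` and `always_overlap` are made up for this example) enumerates every possible set of replicas a write could have reached and every set a read could contact, and tests whether they always share at least one replica.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """FLOOR(RF / 2) + 1, as in Case C."""
    return rf // 2 + 1


def always_overlap(rf: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


rf = 3
print(always_overlap(rf, rf, 1))                  # Case A: WRITE ALL + READ ONE
print(always_overlap(rf, 1, rf))                  # Case B: WRITE ONE + READ ALL
print(always_overlap(rf, quorum(rf), quorum(rf))) # Case C: QUORUM + QUORUM
print(always_overlap(rf, 1, 1))                   # WRITE ONE + READ ONE
```

The first three checks succeed and the last one fails, which matches the condition WRITE + READ > RF: only then is a read guaranteed to touch at least one replica that saw the write.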
So to the OP's question...
Is it possible to "merge" cases A and B?
To ensure consistency it is only possible to "merge" if you mean WRITE ALL + READ ALL because you can always increase the number of readers or writers in the above cases.
However, WRITE ONE + READ ONE is not a good idea if you need to read consistent data, so my answer is: no. Again, using that disequation and our example RF=3: WRITE(1) + READ(1) > RF(3) does not hold. If you were to use this configuration, receiving an answer that there is no value cannot be trusted -- it simply means that the one replica contacted to do the read did not have a value. But values might exist on one or more of the other replicas.
So from that logic, it might seem that doing a READ ALL on receiving a no value answer would solve the problem. And it would for that use case, but there's another to consider: what if you get some value back from the READ ALL... how do you know that the value returned is "the latest" one? That's what's meant when we want consistency. If you care about reading the most recent write, then you need to satisfy the disequation.
Regarding the use case of "timeline" notifications in the edited question
If my understanding of your described scenario is correct, these are the main points to your use case:
Most (but not all?) timeline entries will be write-once (not modified later)
Any such entry can be followed (there is a list of followers)
Any such entry can be commented upon (there is a list of comments)
Any comment on a timeline entry should trigger a notification to the list of followers for that timeline entry
Trying to minimize cost (in this case, measured as bandwidth) for the "normal" case
Willing to rely on the anti-entropy features built into Cassandra (e.g. read repair)
I need to ensure users can get the comment if notification was delivered to followers.
Since most of your entries are write-once, and you care more about the existence of an entry and not necessarily the latest content for the entry, you might be able to get away with WRITE ONE + READ ONE with a fallback to READ ALL if you get no record for something that had some other indication it should exist (e.g. from a notification). For the timeline entry content, it does not sound like your case depends on consistency of the user content of the timeline entries.
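The "READ ONE, then fall back to READ ALL" idea can be simulated with in-memory dictionaries standing in for replicas. This is only a sketch under that assumption: `read` and `read_with_fallback` are hypothetical helpers, each replica is a dict mapping a key to a `(timestamp, value)` pair, and no real Cassandra client is involved.

```python
import random


def read(replicas, key, count):
    """Ask `count` randomly chosen replicas; return the newest (timestamp, value) found."""
    contacted = random.sample(replicas, count)
    found = [replica[key] for replica in contacted if key in replica]
    return max(found) if found else None


def read_with_fallback(replicas, key):
    """READ ONE first; if nothing is found, retry once against every replica."""
    result = read(replicas, key, 1)
    if result is None:
        result = read(replicas, key, len(replicas))
    return result


# RF = 3: the comment was written at CL ONE and has reached only one replica so far.
replicas = [{"comment:42": (1000, "nice post")}, {}, {}]
print(read_with_fallback(replicas, "comment:42"))  # (1000, 'nice post')
```

The fallback only covers the "not found" case for write-once data. If the single replica contacted returns an older version of an entry that was later modified, this sketch accepts it without checking the other replicas.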
If you don't care about consistency, then this discussion is moot; read/write with whatever Consistency Level and let Cassandra's asynchronous replication and anti-entropy features do their work. That said, though your goal is minimizing network traffic/cost, if your workload is mostly reads then the added cost of doing writes at CL QUORUM or ALL may not actually be that much.
You also said:
Followers will receive the notification of comments added to post if they listen the post.
This statement implies that you care not only about whether the set of followers exists but also about its contents (which users are following). You have not detailed how you are storing/tracking the followers, but unless you ensure the consistency of this data, it is possible that one or more followers are not notified of a new comment because you retrieved an out-of-date version of the follower list. Or, someone who "unfollowed" a post could still receive notifications for the same reason.
Cassandra is very flexible and allows each discrete read and write operation to use different consistency levels. Take advantage of this and ensure strong consistency where it is needed and relax it where you are sure that "reading the latest write" is not important to your application's logic and function.
