How can Cassandra be tuned to be CA in the CAP theorem?

I know the CAP theorem:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Cassandra is typically classified as an AP system. I have heard that it can be turned into CA, but I couldn't find the documentation.
How can Cassandra be used as CA?
Thanks.

Generally speaking, the 'P' in CAP is what NoSQL technologies were built to solve for. This is usually accomplished by spreading data horizontally across multiple instances.
Therefore, if you wanted Cassandra to run in a "CA" CAP configuration, running it as a single node cluster would be a good first step.
"I have heard that it can be turned into CA, but I couldn't find the documentation."
After re-reading this, it's possible that you may have confused "CA" with "CP."
It is possible to run Cassandra as a "CP" database, or at least tune it to behave more in that regard. The way to go about this is to have the application issue its queries at the higher consistency levels, like [LOCAL_]QUORUM, EACH_QUORUM, or even ALL. Consistency can be tuned even higher by increasing the replication factor (RF) in each keyspace definition. Setting RF equal to the number of nodes and querying at ALL consistency is about as consistent as it can be tuned to be.
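As a rough sketch of that tuning, assuming a three-node cluster, the DataStax Java driver 3.x, and made-up keyspace, table, and column names:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class CpStyleTuning {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build()) {
            Session session = cluster.connect();

            // RF equal to the node count (3 here), so every node holds every row.
            session.execute("ALTER KEYSPACE my_ks WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

            // Query at ALL: every replica must answer, maximizing consistency
            // at the cost of availability (one node down fails the query).
            SimpleStatement read = new SimpleStatement(
                    "SELECT * FROM my_ks.my_table WHERE id = 1");
            read.setConsistencyLevel(ConsistencyLevel.ALL);
            session.execute(read);
        }
    }
}
```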
However, I feel compelled to mention what a terrible, terrible idea this all is. Cassandra was engineered to be "AP." Fighting that intrinsic design is a fool's errand. I've always said, nobody wins when you try to out-Cassandra Cassandra.
If you're employing engineering time to make a datastore function in ways that are contrary to its design, then a different datastore (one you don't have to work against) might be the better choice.

Related

If a lower consistency level is good, then why do we need the higher consistency levels (QUORUM, ALL) in Cassandra?

While going through a DataStax tutorial, I learned that:
1) A lower consistency level is quicker for reads and writes, whereas a higher consistency level takes much longer.
2) A lower consistency level also ensures high availability of data.
My question is:
if a lower CL is good, then we could always set the CL to ONE.
Why do we need the QUORUM and ALL consistency levels?
It ultimately depends on the application using Cassandra. If the application is OK with serving up data that may be under-replicated or slightly stale, then LOCAL_ONE should be fine. If the application absolutely cannot serve a wrong answer, or if written rows are not being read back consistently, then LOCAL_QUORUM may be more applicable.
I tell my application teams the same thing. Start with LOCAL_ONE, and work with it through testing. If you don't have any problems, then continue using it. If you do experience stale data, and your application is more read-sensitive, then try writing at LOCAL_QUORUM and continuing to read at LOCAL_ONE. And if that doesn't help, then the application may need both at QUORUM.
Again, that's why application teams need to do thorough testing.
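As a minimal sketch of that middle ground with the DataStax Java driver (keyspace and table names are hypothetical): writes wait on a local-DC majority while reads stay cheap.

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class MixedConsistency {
    public static void writeThenRead(Session session) {
        // Writes wait on a majority of replicas in the local DC...
        SimpleStatement write = new SimpleStatement(
                "INSERT INTO my_ks.users (id, name) VALUES (1, 'alice')");
        write.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        session.execute(write);

        // ...while reads stay cheap, touching a single local replica.
        SimpleStatement read = new SimpleStatement(
                "SELECT name FROM my_ks.users WHERE id = 1");
        read.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
        session.execute(read);
    }
}
```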
And just to address it, ALL is a useful consistency level because it invokes a read repair. Essentially, if you have a table which you know is missing data, and you don't want to run a costly nodetool repair on it, you can set your consistency to ALL and read from it. I find this trick most useful when multi-DC clusters are having issues with system_auth.
But you probably wouldn't want to use ALL from within an application. Or if you did, it'd be for a very specific edge case.
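For example, a one-off read at ALL to nudge a read repair on system_auth might look like the following sketch (not something to leave in application code):

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ReadRepairTrick {
    // Reading at ALL forces every replica to answer; any mismatched
    // replicas are repaired in the foreground as part of the read.
    public static void repairAuthByReading(Session session) {
        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM system_auth.roles");
        stmt.setConsistencyLevel(ConsistencyLevel.ALL);
        session.execute(stmt);
    }
}
```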
The real strength of a database like Cassandra is "eventual consistency": it won't enforce strong consistency when you first write data to the database. Rather, it gives you the option to choose a weaker consistency level, like ONE, to reach high write performance. Later, when you query the data, as long as the rule Read_Consistency_Level + Write_Consistency_Level > Replication_Factor (RF) is satisfied, you won't get stale data. For example, with RF = 3, writing at QUORUM (2 replicas) and reading at QUORUM (2 replicas) gives 2 + 2 > 3, so every read overlaps with at least one replica holding the latest write.
It's risky if you can't fulfill the above rule, since you might get either stale or contradictory (sometimes new, sometimes old) data.

Pros and Cons of Cassandra User Defined Functions

I am using Apache Cassandra to store mostly time-series data, and I am grouping the data and aggregating/counting it based on some conditions. At the moment I am doing this in a Java 8 application, but with the release of Cassandra 3.0 and its user-defined functions (UDFs), I have been asking myself if extracting the grouping and aggregation/counting logic to Cassandra is a good idea. To my understanding, this functionality is something like stored procedures in SQL.
My concern is if this will impact the computation performance and the overall performance of the database. I am also not sure if there are other issues with it and if this new feature is something like the secondary indexes in Cassandra - you can do them, but it is not recommended at all.
Have you used user defined functions in Cassandra? Do you have any observations on the performance? What are the good and bad sides of this new functionality? Is it applicable in my use case?
You can compare it to using count() or avg() kinds of aggregations. They can save you a lot of network traffic and object creation/GC by having the coordinator send only the result, but it's easy to get carried away and make the coordinator do a lot of work. This extra work takes away from normal C* duties, and can just as likely increase GCs as reduce them.
If you're aggregating 100 rows in a partition, it's probably fine; if you're aggregating 10,000, it's probably not the end of the world if it's very rare. If you're calling it once a second, though, it's a problem. If you're aggregating over 1,000 rows, I would be very careful.
If you absolutely need to do it, and it's a lot of data often, you may want to create dedicated proxy coordinators (-Djoin_ring=false) to bear the brunt of the load without impacting normal C* reads/writes. At that point it's just as easy to create a dedicated workload DC for it (with RF=0 for your keyspace in that DC, and the application set to be part of that DC with DCAwareRoundRobinPolicy). This is also the point where using Spark is probably the right thing to do.
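A sketch of that client-side pinning with the DataStax Java driver (the contact point and the DC name "proxy_dc" are hypothetical):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class ProxyDcClient {
    public static Session connect() {
        // Route all requests to the coordinator-only DC, keeping the
        // aggregation load off the DCs that own the data.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.10")  // a node in the hypothetical proxy_dc
                .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("proxy_dc")
                        .build())
                .build();
        return cluster.connect();
    }
}
```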

Maintaining a dynamic consistency level in DataStax

I have a 5-node cluster and a keyspace with a replication factor of 3. The nature of the operations is such that writes are much more important than reads, but the frequency of read operations is about 10 times higher than writes. To achieve consistency while improving overall performance, I chose to set the consistency level for writes to ALL and for reads to ONE. But this causes operations to fail if even one node is down.
Is there a method by which I can simultaneously change the consistency level for (write, read) from (ALL, ONE) to (QUORUM, QUORUM) if one node is detected down, or if there is a query execution exception? And can this be done in a manner such that no operation passes through a temporary phase where it sees a (QUORUM, ONE) setting?
We also plan to expand to twice the capacity: 3 datacenters with 4 nodes each. Is it possible to define custom consistency levels, like a level of ALL in any one datacenter and ONE in the others? I'm thinking that a level of EACH_ONE for reads, coupled with the above level for writes, would ensure consistency while allowing the cluster to remain available even if a node goes down.
The flexibility is there, since you can set your consistency level on a per-request basis. Depending on the client you are using, there are some nice capabilities. For example, the Java driver has something called DowngradingConsistencyRetryPolicy, such that if a request fails, it will be retried with the next-lowest consistency level until the request succeeds. This pushes the complexity of retrying into the client so you don't have to write a bunch of code for it; it's really nice!
The Java driver also allows you to configure the consistency level per request with Statement#setConsistencyLevel().
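Putting both together, a sketch with the Java driver 3.x (contact point, keyspace, and query are placeholders): the retry policy downgrades transparently inside the driver, so requests never pass through a client-visible intermediate setting.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

public class DowngradingWrites {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // If ALL cannot be met (e.g. a node is down), the driver retries
                // the request at the next-lowest achievable consistency level.
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .build();
        Session session = cluster.connect();

        SimpleStatement write = new SimpleStatement(
                "INSERT INTO my_ks.events (id, payload) VALUES (1, 'x')");
        write.setConsistencyLevel(ConsistencyLevel.ALL);  // per-request CL
        session.execute(write);
        cluster.close();
    }
}
```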
As far as custom consistency levels go, this is not an option available to you (without changing the Cassandra source code); however, I think what is made available should be sufficient.
For reads, I don't find much value in ensuring consistency between datacenters on read. I think LOCAL_QUORUM is more than sufficient, but if you really care, you can use something like EACH_QUORUM to ensure all datacenters agree. That will severely impact your response time and availability, though. For example, if one of your datacenters goes down completely, you won't be able to do reads at all (unless downgrading).
For writes, I'd strongly recommend not using ALL in a multi datacenter set up if you care about response time and availability. Depending on your requirements, LOCAL_QUORUM should likely be more than sufficient.
One of the benefits of Cassandra is that consistency is tunable; you can have as much strong consistency as you like. But keep in mind that Cassandra is at its best as a highly available, partition-tolerant system.
A really good presentation on consistency that I think nails a lot of these points is Christos Kalantzis' talk 'Eventual Consistency != Hopeful Consistency', which suggests that a consistency level of ONE is sufficient for a lot of use cases.

Which part of the CAP theorem does Cassandra sacrifice and why?

There is a great talk here about simulating partition issues in Cassandra with Kyle Kingsbury's Jepsen library.
My question is: with Cassandra, are you mainly concerned with the partition-tolerance part of the CAP theorem, or is consistency a factor you need to manage as well?
Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real world systems rarely fall neatly into these categories, so it's more helpful to view CAP as a continuum. Most systems will make some effort to be consistent, available, and partition tolerant, and many (including Cassandra) can be tuned depending on what's most important. Turning knobs like replication factor and consistency level can have a dramatic impact on C, A, and P.
Even defining what the terms mean can be challenging, as various use cases have different requirements for each. So rather than classify a system as CP, AP, or whatever, it's more helpful to think in terms of the options it provides for tuning these properties as appropriate for the use case.
Here's an interesting discussion on how things have changed in the years since the CAP theorem was first introduced.
CAP stands for Consistency, Availability, and Partition tolerance.
In general, it's impossible for a distributed system to guarantee all three at a given point.
Apache Cassandra falls under the AP category, meaning Cassandra holds true for availability and partition tolerance but not for consistency; however, this can be further tuned via the replication factor (how many copies of the data exist) and the consistency level (for reads and writes).
For more info: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
Interestingly, it depends on your Cassandra configuration. Cassandra can at most be an AP system. But if you configure it to read or write based on QUORUM, then it does not remain CAP-available (available as per the definition of the CAP theorem) and is then only a P system.
Just to explain things in more detail, in CAP theorem terms:
C (linearizability, or strong consistency) roughly means: if operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never an older state).
A: "Every request received by a non-failing [database] node in the system must result in a [non-error] response." It's not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called "highly available" (i.e. low-downtime) systems actually do not meet this definition of availability.
P: Partition tolerance (terribly misnamed) basically means that you're communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don't really have any choice in this matter.
Source: Martin Kleppmann's excellent work on the subject.
The CAP theorem states that a database can't simultaneously guarantee consistency, availability, and partition tolerance.
Since network partitions are a fact of life, distributed databases tend to be either CP or AP.
Cassandra was meant to be AP, but you can fine-tune consistency at the cost of availability.
Availability: this is ensured with replicas. Cassandra typically writes multiple copies of the data to different cluster nodes (generally 3). If one node is unavailable, the data won't be lost.
Writing data to multiple nodes takes time because the nodes are scattered across different locations. At some point, the data will become eventually consistent.
So when high availability is preferred, consistency is compromised.
Tunable consistency:
For a read or write operation, you can specify a consistency level. The consistency level refers to the number of replicas that need to respond for a read or write operation to be considered complete.
For non-critical features, you can use a lower consistency level, say ONE.
If you think consistency is important, you can increase the level to TWO, THREE, or QUORUM (a majority of replicas).
Assume that you set the consistency level high (QUORUM) for your critical features, and a majority of the replicas are down. In this case, the write operation will fail.
Here Cassandra sacrifices availability for consistency.
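A sketch of what that failure looks like from the Java driver (keyspace, table, and values are placeholders): when fewer replicas are alive than QUORUM requires, the coordinator rejects the request up front.

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;

public class QuorumWriteExample {
    public static void criticalWrite(Session session) {
        SimpleStatement write = new SimpleStatement(
                "INSERT INTO my_ks.orders (id, total) VALUES (42, 99)");
        write.setConsistencyLevel(ConsistencyLevel.QUORUM);
        try {
            session.execute(write);
        } catch (UnavailableException e) {
            // Fewer live replicas than QUORUM requires: Cassandra fails the
            // write instead of accepting one that cannot meet the CL.
            System.err.printf("Needed %d replicas, only %d alive%n",
                    e.getRequiredReplicas(), e.getAliveReplicas());
        }
    }
}
```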
Have a look at this article for more details.

Read-your-own-writes consistency in Cassandra

Read-your-own-writes consistency is a great improvement over so-called eventual consistency: if I change my profile picture, I don't care if others see the change a minute later, but it looks weird if I still see the old one after a page reload.
Can this be achieved in Cassandra without having to do a full read-check on more than one node?
Using ConsistencyLevel.QUORUM is fine when reading unspecified data, where n > 1 nodes actually have to be read. However, when a client reads from the same node it writes to (actually using the same connection), it can be wasteful: some databases will in this case always ensure that the previously written (my) data is returned, and not something older. Using ConsistencyLevel.ONE does not ensure this, and assuming it does leads to race conditions. Some tests showed this: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/per-connection-quot-read-after-my-write-quot-consistency-td6018377.html
My hypothetical setup for this scenario is 2 nodes, replication factor 2, read level 1, write level 1. This leads to eventual consistency, but I want read-your-own-writes consistency on reads.
Using 3 nodes, RF=3, RL=QUORUM, and WL=QUORUM, in my opinion, leads to wasteful read requests if being consistent only on "my" data is enough.
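One way to get read-your-own-writes on the 2-node, RF=2 setup without quorum reads is to pay the cost at write time instead: writing at ALL (W=2) and reading at ONE (R=1) satisfies R + W > RF. A sketch with hypothetical keyspace and table names (the trade-off is write availability: a single node being down fails every write):

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ReadYourWrites {
    // RF=2: write to both replicas (W=2), so any single-replica read (R=1)
    // sees the latest value, because 2 + 1 > 2.
    public static void updatePicture(Session session, int userId, String url) {
        SimpleStatement write = new SimpleStatement(
                "UPDATE my_ks.profiles SET picture_url = ? WHERE user_id = ?",
                url, userId);
        write.setConsistencyLevel(ConsistencyLevel.ALL);
        session.execute(write);

        SimpleStatement read = new SimpleStatement(
                "SELECT picture_url FROM my_ks.profiles WHERE user_id = ?",
                userId);
        read.setConsistencyLevel(ConsistencyLevel.ONE);
        session.execute(read);
    }
}
```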
(Also known as: session consistency, read-after-my-write consistency.)
Good question.
We've had http://issues.apache.org/jira/browse/CASSANDRA-876 open for a while to add this, but nobody's bothered finishing it because:
1) CL.ONE is just fine for a LOT of workloads without any extra gymnastics.
2) Reads are so fast anyway that doing the extra one is not a big deal (and in fact read repair, which is on by default, means all the nodes get checked anyway, so the difference between CL.ONE and higher is really more about availability than performance).
That said, if you're motivated to help, ask on the ticket and I'll be happy to point you in the right direction.
I've been following Cassandra development for a little while and I haven't seen a feature like this mentioned.
That said, if you only have 2 nodes with a replication factor of 2, I would question whether Cassandra is the best solution. You are going to end up with the entire data set on each node, so a more traditional replicated SQL setup might be simpler and more widely tested. Cassandra is very promising but it is still only version 0.8.2 and problems are regularly reported on the mailing list.
The other way to solve the 'see my own updates' problem would be to cache the results somewhere closer to the client, whether in the web server, the application layer, or using something like memcached.
