What is the mechanism to achieve row-level locking in Cassandra? What I want to do is allow only one process to modify a given row at any given time.
Cassandra does not provide locking. It does provide lightweight transactions, which can replace locking in some cases. Also note that operations on a single row are atomic, so locking a row is not necessary just to ensure that a read or write of a row sees consistent field values for that row.
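For illustration, here is a minimal sketch of a lease-style "lock" built on a lightweight transaction, assuming the 3.x DataStax Java driver and a hypothetical locks table; only one client's conditional INSERT can succeed, and the TTL releases the lease if the owner dies:

```java
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class RowLease {
    // Hypothetical table: CREATE TABLE locks (resource text PRIMARY KEY, owner text);
    // Paxos guarantees that at most one concurrent INSERT ... IF NOT EXISTS
    // is applied; the TTL auto-expires the lease if the owner crashes.
    static boolean tryAcquire(Session session, String resource, String owner) {
        ResultSet rs = session.execute(
            "INSERT INTO locks (resource, owner) VALUES (?, ?) " +
            "IF NOT EXISTS USING TTL 30",
            resource, owner);
        return rs.one().getBool("[applied]");  // true only for the winner
    }

    static void release(Session session, String resource, String owner) {
        // Conditional delete: only the current owner can release the lease.
        session.execute(
            "DELETE FROM locks WHERE resource = ? IF owner = ?",
            resource, owner);
    }
}
```

Note this is a lease, not a true lock: a client that stalls past the TTL can still collide with the next owner, so the guarded operation should itself be idempotent or conditional.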
Cassandra does not provide locking because of a fundamental constraint on all kinds of distributed data stores: a distributed data store cannot ensure consistency while also providing high performance and availability; it can provide only two of those three properties. The design of Cassandra chooses not to guarantee consistency so that it can provide high performance and availability. Locking is all about consistency, so Cassandra could not provide locking without sacrificing performance and/or availability. As that would go against its design goals, it is a good bet that Cassandra will never have locking functionality.
Do you really need locking? If you come from an RDBMS background, you may be surprised how rarely it is actually necessary. If you do need it, you must choose a different data store, one designed to provide consistency at the cost of performance or availability (or both).
Related
I am reading about NoSQL DBs (specifically Cassandra), and it says that Cassandra is faster for writes and that queries are fast as well. Schema design is done based more on queries than on the data. For example, you have queries like in this example.
My question, then: suppose I design the RDBMS schema similarly to Cassandra's way and ensure that no joins are required for queries. Will I still get any significant performance gains by using Cassandra (NoSQL DBs)?
There cannot be an exact answer, but a few points:
JOIN is just one of many things: Cassandra stores data physically based on the partition key, making reads by partition as fast as possible.
On the performance side: it's not about the performance at the beginning, but about keeping performance consistent over a period of time. Say, for example, you have a time-series-like requirement where data is inserted every second. RDBMS performance will usually degrade as the data grows, and it is not easy to keep indexes and stats up to date, etc., while Cassandra fits a time-series pattern better, and as the data grows it is easy to scale by adding nodes.
On the write performance: Cassandra's write path itself is different and is designed to accept writes quickly (the complicated processes like merging SSTables, compaction, etc. happen in the background without affecting the actual write).
In short: you need to review the business case and make a decision.
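To make the partition-key point concrete, here is a minimal query-first sketch (the table and names are made up, and the 3.x DataStax Java driver is assumed): the table is designed around the one query it must serve, so each read hits a single partition and needs no join:

```java
import com.datastax.driver.core.LocalDate;
import com.datastax.driver.core.Session;

public class TimeSeriesModel {
    static void createAndQuery(Session session) {
        // Query-first design: "all readings for a sensor on a given day,
        // newest first". The partition key (sensor_id, day) stores exactly
        // those rows together; the clustering order pre-sorts them on disk.
        session.execute(
            "CREATE TABLE IF NOT EXISTS readings (" +
            "  sensor_id text, day date, ts timestamp, value double," +
            "  PRIMARY KEY ((sensor_id, day), ts)" +
            ") WITH CLUSTERING ORDER BY (ts DESC)");

        // The read touches a single partition: no join, no scatter-gather.
        session.execute(
            "SELECT ts, value FROM readings WHERE sensor_id = ? AND day = ?",
            "sensor-42", LocalDate.fromYearMonthDay(2018, 1, 1));
    }
}
```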
While going through a DataStax tutorial, I learned that:
1) A lower consistency level gives quicker reads and writes, whereas a higher consistency level takes much longer.
2) A lower consistency level also ensures higher availability of data.
My question is: if a lower CL is good, then we could always set the CL to ONE. Why do we need the QUORUM and ALL consistency levels?
It ultimately depends on the application using Cassandra. If the application is ok with serving up data that may be under-replicated or slightly stale, then LOCAL_ONE should be fine. If the application absolutely cannot serve a wrong answer, or if rows you have just written are not being read back consistently, then LOCAL_QUORUM may be more applicable.
I tell my application teams the same thing. Start with LOCAL_ONE, and work with it through testing. If you don't have any problems, then continue using it. If you do experience stale data and your application is more read-sensitive, then try writing at LOCAL_QUORUM and continue reading at LOCAL_ONE. And if that doesn't help, then perhaps the application may need both at QUORUM.
Again, that's why application teams need to do thorough testing.
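As a sketch of that progression, assuming the 3.x DataStax Java driver and a hypothetical items table, the write and read sides can use different levels on a per-statement basis:

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TunedReadWrite {
    // Write at LOCAL_QUORUM, read at LOCAL_ONE: reads stay fast, while each
    // write is acknowledged by a majority of the local replicas.
    static void writeThenRead(Session session, String id, String payload) {
        session.execute(new SimpleStatement(
            "INSERT INTO items (id, payload) VALUES (?, ?)", id, payload)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));

        session.execute(new SimpleStatement(
            "SELECT payload FROM items WHERE id = ?", id)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE));
    }
}
```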
And just to address it, ALL is a useful consistency level because it invokes a read repair. Essentially, if you have a table which you know is missing data, and you don't want to run a costly nodetool repair on it, you can set your consistency to ALL and read from it. I find this trick most useful when addressing problems with the system_auth keyspace in multi-DC clusters.
But you probably wouldn't want to use ALL from within an application. Or if you did, it'd be for a very specific edge case.
The real meat of a database like Cassandra is "eventual consistency": it won't enforce strong consistency when you first write data to the database. Rather, it gives you the option to choose a weaker consistency level like ONE to reach high write performance. Later, when you query the data, as long as the rule "read consistency level + write consistency level > RF (replication factor)" is satisfied, you won't get stale data.
It's risky if you can't fulfill the above rule, since you might get either stale or contradictory (sometimes new, sometimes old) data.
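A quick worked example: with RF = 3, writing at QUORUM (2 replicas) and reading at QUORUM (2 replicas) gives R + W = 4 > 3, so any read quorum must overlap any write quorum in at least one replica, and the read is guaranteed to see the latest write. With W = QUORUM (2) but R = ONE (1), R + W = 3 is not greater than RF, and the single replica consulted may be the one that missed the write.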
We want to generate an auto-incrementing integer key in Cassandra. This is a trivial task in traditional databases but a little complicated in Cassandra.
I have tried out the counter data type, which can be incremented using
value = value + 1
and I have tried LWT with
UPDATE myTable SET value = newValue IF value = oldValue
(where newValue = oldValue + 1, for auto-increment)
I have been strongly warned against counters. I am not sure why, though. Can you please help me understand the pros and cons of the above two approaches?
Starting with a disclaimer:
You most likely do not want an auto-incrementing integer key in C*. More likely a UUID or TimeUUID is what you want. But if you do happen to really need an auto-incrementing value, read on.
State and distributed systems do not like to blend. Generally, whenever you want to be really sure of the state of your distributed system, you have to check all replicas, and thus sacrifice availability/partition tolerance. LWTs use Paxos to allow check-and-set operations, but to do this they require a quorum of nodes and are significantly slower than normal Cassandra operations. LWTs should make up only a small percentage of your application's operations. As long as there is little contention for the counter value and you don't need it for every write, you should be ok.
Counters in C* are a very different implementation. In older versions of C* they were slightly infamous for their ability to lose track of values. Their implementation has been rewritten for vastly improved stability, but it would still require careful planning on the application side to guarantee unique values: you can imagine two clients both incrementing a counter simultaneously, each thinking it had received a unique value. Because of this, I think you should use LWT if you really need to guarantee uniqueness.
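A minimal sketch of the LWT approach, assuming the 3.x DataStax Java driver and a hypothetical counters table; the loop retries the compare-and-set until this client wins, so each returned value is handed out exactly once:

```java
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class LwtSequence {
    // Hypothetical table: CREATE TABLE counters (name text PRIMARY KEY, value bigint);
    // seeded with one row per sequence name.
    static long nextValue(Session session, String name) {
        while (true) {
            Row current = session.execute(
                "SELECT value FROM counters WHERE name = ?", name).one();
            long oldValue = current.getLong("value");
            long newValue = oldValue + 1;
            // Paxos-backed CAS: applied only if nobody incremented in between.
            Row result = session.execute(
                "UPDATE counters SET value = ? WHERE name = ? IF value = ?",
                newValue, name, oldValue).one();
            if (result.getBool("[applied]")) {
                return newValue;  // uniquely ours
            }
            // Lost the race; re-read and retry.
        }
    }
}
```

Under contention this loop can spin, which is exactly why LWTs should stay a small fraction of your workload.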
I have a 5-node cluster and a keyspace with a replication factor of 3. The nature of the operations is such that writes are much more important than reads, but the frequency of read operations is about 10 times higher than that of writes. To achieve consistency while improving overall performance, I chose to set the consistency level for writes to ALL and for reads to ONE. But this causes operations to fail if even one node is down.
Is there a method by which I can simultaneously change the consistency levels for (write, read) from (ALL, ONE) to (QUORUM, QUORUM) if one node is detected down, or if there is a query execution exception, and can this be done in such a manner that no operation passes through a temporary (QUORUM, ONE) phase?
We also plan to expand to twice the capacity: 3 datacenters with 4 nodes each. Is it possible to define custom consistency levels, like a level of ALL in any one datacenter and ONE in the others? I'm thinking that a level of (EACH_ONE) for reads, coupled with the above level for writes, will ensure consistency but allow the cluster to remain available even if a node goes down.
The flexibility is there, since you can set your consistency level on a per-request basis. Depending on the client you are using, there are some nice capabilities. For example, the Java driver has something called a DowngradingConsistencyRetryPolicy such that if a request fails, it will be retried at the next lowest consistency level until the request succeeds. This pushes the complexity of retrying into the client so you don't have to write a bunch of code for it; it's really nice!
The Java driver also allows you to configure the consistency level per request with Statement#setConsistencyLevel().
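A sketch of the retry policy, assuming the 3.x DataStax Java driver (where DowngradingConsistencyRetryPolicy exists; it was deprecated in later driver versions):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

public class ClusterSetup {
    // If a request cannot be satisfied at its requested consistency level,
    // the driver retries it at the next lower level that can be met.
    static Cluster build() {
        return Cluster.builder()
            .addContactPoint("127.0.0.1")  // hypothetical contact point
            .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
            .build();
    }
}
```

Be aware that a downgraded request silently weakens the guarantee for that particular request, so the application must be able to tolerate that.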
As far as custom consistency levels go, this is not an option available to you (without changing the Cassandra source code); however, I think what is made available should be sufficient.
For reads, I don't find much value in ensuring consistency between datacenters on read. I think LOCAL_QUORUM is more than sufficient, but if you really care, you can use something like EACH_QUORUM to ensure all datacenters agree. That will severely impact your response time and availability, though. For example, if one of your datacenters goes down completely, you won't be able to do reads at all (unless downgrading).
For writes, I'd strongly recommend not using ALL in a multi-datacenter setup if you care about response time and availability. Depending on your requirements, LOCAL_QUORUM will likely be more than sufficient.
One of the benefits of Cassandra is that consistency is tunable: you can have as much strong consistency as you like, but keep in mind that Cassandra is at its best as a highly available, partition-tolerant system.
A really good presentation on consistency that I think really nails a lot of these points is Christos Kalantzis' talk 'Eventual Consistency != Hopeful Consistency', which suggests that a consistency level of ONE is sufficient for a lot of use cases.
There is a great talk here about simulating partition issues in Cassandra with Kingsbury's Jepsen library.
My question is: with Cassandra, are you mainly concerned with the partition-tolerance part of the CAP theorem, or is consistency a factor you need to manage as well?
Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real world systems rarely fall neatly into these categories, so it's more helpful to view CAP as a continuum. Most systems will make some effort to be consistent, available, and partition tolerant, and many (including Cassandra) can be tuned depending on what's most important. Turning knobs like replication factor and consistency level can have a dramatic impact on C, A, and P.
Even defining what the terms mean can be challenging, as various use cases have different requirements for each. So rather than classify a system as CP, AP, or whatever, it's more helpful to think in terms of the options it provides for tuning these properties as appropriate for the use case.
Here's an interesting discussion on how things have changed in the years since the CAP theorem was first introduced.
CAP stands for Consistency, Availability and Partition Tolerance.
In general, it's impossible for a distributed system to guarantee all three of the above at any given point in time.
Apache Cassandra falls under the AP category, meaning Cassandra upholds Availability and Partition Tolerance but not Consistency; however, this can be further tuned via the replication factor (how many copies of the data) and the consistency level (for reads and writes).
For more info: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
Interestingly, it depends on your Cassandra configuration. Cassandra can at most be an AP system. But if you configure it to read or write at QUORUM, it no longer remains CAP-available (available as per the definition in the CAP theorem) and is only a P system.
Just to explain things in more detail, the CAP theorem means:
C (linearizability, or strong consistency) roughly means: if operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never an older state).
A (availability): “every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low-downtime) systems actually do not meet this definition of availability.
P (partition tolerance, terribly misnamed) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don’t really have any choice in this matter.
Source: Martin Kleppmann's excellent work
The CAP theorem states that a database can’t simultaneously guarantee consistency, availability, and partition tolerance.
Since network partitions are a fact of life, distributed databases tend to be either CP or AP.
Cassandra was designed as an AP system, but you can tune consistency at the cost of availability.
Availability: ensured with replicas. Cassandra typically writes multiple copies of the data to different cluster nodes (generally 3). If one node is unavailable, the data won't be lost.
Writing data to multiple nodes takes time because the nodes are scattered across different locations. At some point in time, the data becomes consistent on all of them, hence "eventually consistent".
So when high availability is preferred, consistency is compromised.
Tunable consistency:
For a read or write operation, you can specify a consistency level. The consistency level refers to the number of replicas that need to respond for a read or write operation to be considered complete.
For non-critical features, you can use a lower consistency level, say ONE.
If you think consistency is important, you can increase the level to TWO, THREE, or QUORUM (a majority of replicas).
Assume that you set a high consistency level (QUORUM) for your critical features and a majority of the nodes are down. In this case, the write operation will fail.
Here Cassandra sacrifices availability for consistency.
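To make that failure case concrete: QUORUM requires floor(RF/2) + 1 replicas, so with RF = 3 a quorum is 2. If two of a partition's three replicas are down, only one replica can respond, the quorum of 2 cannot be met, and the QUORUM write fails even though a single live node could still have accepted it.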
Have a look at this article for more details.