Avoid dirty reads and writes in Cassandra

I am using Cassandra 2.1.12 to store event data in a column family. Below is the C# code that creates the client for .NET and manages connections to Cassandra. The problem is that the rate of inserts/updates is very high. Say I increment a column value on each subsequent request: on the first request I write 1, and on the next request I read the value and update it to 2. But because the write rate is so high, the read may be served by another node where the value has not yet been set to 1, so the value gets stored as 1 again. To solve this I have also set the consistency level to QUORUM, but the problem persists. Can anyone tell me a possible solution for this?
// Lazily creates a single shared ISession for the whole application.
private static ISession _singleton;

public static ISession GetSingleton()
{
    if (_singleton == null)
    {
        // Contact points are read from app settings as a comma-separated list.
        Cluster cluster = Cluster.Builder()
            .AddContactPoints(ConfigurationManager.AppSettings["cassandraCluster"].ToString().Split(','))
            .Build();
        ISession session = cluster.Connect(ConfigurationManager.AppSettings["cassandraKeySpace"].ToString());
        _singleton = session;
    }
    return _singleton;
}
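For illustration, the read-then-update I describe looks roughly like this (a simplified sketch: the real table and column names differ, and it assumes a recent DataStax C# driver plus using Cassandra; and using System.Linq;):

var session = GetSingleton();
var eventId = 42;  // placeholder key

// Read the current value at QUORUM...
var select = new SimpleStatement("SELECT counter_val FROM events WHERE id = ?", eventId)
    .SetConsistencyLevel(ConsistencyLevel.Quorum);
var row = session.Execute(select).FirstOrDefault();
var current = row == null ? 0L : row.GetValue<long>("counter_val");

// ...then write the incremented value back at QUORUM.
// Another client can still interleave between the read and the write, which is the race described above.
var update = new SimpleStatement("UPDATE events SET counter_val = ? WHERE id = ?", current + 1, eventId)
    .SetConsistencyLevel(ConsistencyLevel.Quorum);
session.Execute(update);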

No, it is not possible to achieve this with a plain read-then-write in Cassandra. Like every distributed system, Cassandra is constrained by the CAP theorem, and it favours availability and partition tolerance over strong consistency: consistency is tunable and eventual, not guaranteed.
So in your scenario, where the same partition key is updated many times from a multi-threaded environment, there is no guarantee that every thread sees the latest data; a read immediately followed by a write can always interleave with another client's update. With a larger interval between updates you are more likely (but still not guaranteed) to read the latest value. If your requirement is only to increment/decrement an integer (for example, counting the number of times a page is viewed), you can use Cassandra counters. Counter updates are applied as increments rather than overwrites, so concurrent increments are not lost and you will eventually see the correct total. The limitation is that a counter update does not return the new value: you have to issue one request to increment the counter and a separate request to read it back; incrementing and getting the incremented value in a single request is not possible. Hope it helps.
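As a rough sketch of the counter approach (the table and column names here are only illustrative, and this assumes the DataStax C# driver with using System.Linq for FirstOrDefault):

// Assumed schema (run once in cqlsh):
//   CREATE TABLE event_counts (event_id int PRIMARY KEY, hits counter);

var session = GetSingleton();
var eventId = 42;  // placeholder key

// Increment: no read is required, so concurrent increments are not lost.
session.Execute(new SimpleStatement(
    "UPDATE event_counts SET hits = hits + 1 WHERE event_id = ?", eventId));

// Reading the current value is a separate request.
var row = session.Execute(new SimpleStatement(
    "SELECT hits FROM event_counts WHERE event_id = ?", eventId)).FirstOrDefault();
long hits = row == null ? 0 : row.GetValue<long>("hits");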

Related

Cassandra cluster: how does it resolve same-timestamp updates?

Say I have a two-node Cassandra cluster. Two clients send updates for the same record (with different values) at exactly the same time, and the updates go to two different nodes of the cluster. Since Cassandra is masterless and both nodes can accept the update request, my question is: how is this conflict resolved under eventual consistency, and which value will ultimately take precedence?
Here is the example scenario:
Initial data: KeyA: { colA: "val AA", colB: "val BB" }
Client 1 sends update: `update data set colA = "val C1_ColA" where colB = "val BB"` and the data at node_1 becomes:
KeyA: { colA: "val C1_ColA", colB: "val BB" }
Client 2 sends update: `update data set colA = "val C2_ColA" where colB = "val BB"` and the data at node_2 becomes:
KeyA: { colA: "val C2_ColA", colB: "val BB" }
Now, how will the value of colA eventually be resolved here?
Last write always wins, and I doubt that the timestamps will be the same: they have microsecond resolution, so the probability of two writes getting the same timestamp is very low.
If you want to prevent this situation, you can use lightweight transactions, which let you put a condition on inserts/updates/deletes, but keep in mind that they are very resource intensive and will add quite a big load to the cluster.
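A minimal sketch of such a conditional update, reusing the illustrative names from the question and assuming the DataStax C# driver (here the table is assumed to have a primary key column named key, and an ISession named session is assumed; System.Linq is used for First):

// The update is applied only if colA still holds the value read earlier ("val AA").
// The Paxos round behind IF makes concurrent conditional updates serialize instead of silently overwriting each other.
var stmt = new SimpleStatement(
    "UPDATE data SET colA = ? WHERE key = ? IF colA = ?",
    "val C1_ColA", "KeyA", "val AA");
var rs = session.Execute(stmt);

// The result set contains an "[applied]" column telling you whether the condition held.
bool applied = rs.First().GetValue<bool>("[applied]");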

CouchDB cluster writes behaviour

I have CouchDB in a cluster setup: 3 nodes, all shards on all nodes, and W = 2. We have code that creates a document in CouchDB and reads it back from a view. However, the view intermittently returns no corresponding data, even though the data is there when we check CouchDB directly. So my question is: why does the third node take so long to write the value, and how long should I expect the write latency to be?
Thanks in advance.
If you query a view without the stale parameter, views are supposed to always return fresh data: the view first updates itself against the database and then returns the results for your query.
A view can serve results from any node. If you query a view and don't get the expected fresh data, it means the updates are not yet available on the node that served the query.
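For illustration (a sketch only, with hypothetical database, design document, and view names), this is how a view can be queried with and without the stale parameter over CouchDB's HTTP API:

// Hypothetical names: database "events", design doc "stats", view "by_type".
// Assumes: using System.Net.Http; using System.Threading.Tasks;
static async Task<string> QueryViewAsync(bool allowStale)
{
    using (var http = new HttpClient())
    {
        var url = "http://localhost:5984/events/_design/stats/_view/by_type";
        if (allowStale)
            url += "?stale=ok";  // return whatever is in the index right now, without refreshing it first
        // Without stale=ok, CouchDB brings the view index up to date before answering,
        // so the data is fresh on the node that serves the request.
        return await http.GetStringAsync(url);
    }
}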
If you write a document with W = 2, then at least two nodes out of three should have successfully updated that document. If all nodes are up, internal synchronization between the nodes should bring the update to all of them within milliseconds to seconds, so the latency should be at most a few seconds.
How long was the latency that you experienced? Was your view finally able to produce the expected results after this latency?

Why does a Cassandra cluster need synchronized clocks between nodes?

In the DataStax introductory course on Cassandra they say that the clocks of all nodes in a Cassandra cluster have to be synchronized, in order to prevent READ queries from returning 'old' data.
If one or more nodes are down they cannot receive updates, but as soon as they come back up again they catch up, and there is no problem...
So why does a Cassandra cluster need synchronized clocks between nodes?
In general it is always a good idea to keep your server clocks in sync, but a primary reason why clock sync is needed between nodes is that Cassandra uses a concept called 'Last Write Wins' to resolve conflicts and determine which mutation represents the most correct, up-to-date state of the data. This is explained in 'Why Cassandra doesn't need vector clocks'.
Whenever you 'mutate' (write or delete) column(s) in cassandra a timestamp is assigned by the coordinator handling your request. That timestamp is written with the column value in a cell.
When a read request occurs, cassandra builds your results finding the mutations for your query criteria and when it sees multiple cells representing the same column it will pick the one with the most recent timestamp (The read path is more involved than this but that is all you need to know in this context).
Things start to become problematic when your nodes' clocks become out of sync. As I mentioned, the coordinator node handling your request assigns the timestamp. If you do multiple mutations to the same column and different coordinators are assigned, you can create some situations where writes that happened in the past are returned instead of the most recent one.
Here is a basic scenario that describes that:
Assume we have a 2-node cluster with nodes A and B. Let's assume an initial state where A's clock is at time t10 and B's clock is at time t5.
User executes DELETE C FROM tbl WHERE key=5. Node A coordinates the request and it is assigned timestamp t10.
A second passes and a User executes UPDATE tbl SET C='data' where key=5. Node B coordinates the request and it is assigned timestamp t6.
User executes the query SELECT C from tbl where key=5. Because the DELETE from Step 1 has a more recent timestamp (t10 > t6), no results are returned.
Note that newer versions of the DataStax drivers default to client timestamps, i.e. your client application generates and assigns the timestamps for requests instead of relying on the C* nodes to assign them. The DataStax java-driver defaults to client timestamps as of 3.0 (read more about this under 'Client-side generation'). This is very nice if all requests come from the same client; however, if you have multiple applications writing to Cassandra, you now have to worry about keeping your client clocks in sync.
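For example, with the DataStax C# driver a statement can carry an explicit client-side timestamp (a sketch only; the table and values mirror the scenario above, and an ISession named session plus using Cassandra; and using System; are assumed):

// The coordinator uses this timestamp instead of assigning one from its own clock
// (requires native protocol v3, i.e. Cassandra 2.1 or newer).
var stmt = new SimpleStatement("UPDATE tbl SET C = ? WHERE key = ?", "data", 5)
    .SetTimestamp(DateTimeOffset.UtcNow);
session.Execute(stmt);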

how to rapidly increment counters in Cassandra w/o staleness

I have a Cassandra question. Do you know how Cassandra does updates/increments of counters?
I want to use a Storm bolt (CassandraCounterBatchingBolt from the storm-contrib repo on GitHub) which writes into Cassandra. However, I'm not sure how some of the implementation of the incrementCounterColumn() method works... and there are also the limitations of Cassandra counters (from: http://wiki.apache.org/cassandra/Counters) which make them useless for my scenario IMHO:
If a write fails unexpectedly (timeout or loss of connection to the coordinator node) the client will not know if the operation has been performed. A retry can result in an over count CASSANDRA-2495.
Counter removal is intrinsically limited. For instance, if you issue very quickly the sequence "increment, remove, increment" it is possible for the removal to be lost
Anyway, here is my scenario:
I update the same counter faster than the updates propagate to other Cassandra nodes.
Example:
Say I have 3 cassandra nodes. The counters on each of these nodes are 0.
Node1:0, node2:0, node3:0
An increment comes: 5 -> Node1:0, node2:0, node3:0
Increment starts at node 2 – still needs to propagate to node1 and node3
Node1:0, node2:5, node3:0
In the meantime, another increment arrives before previous increment
is propagated: 3 -> Node1:0, node2:5, node3:0
Assuming 3 starts at a different node than where 5 started we have:
Node1:3, node2:5, node3:0
Now if 3 gets propagated to the other nodes AS AN INCREMENT and not as a new value
(and the same for 5) then eventually the nodes would all equal 8 and this is what I want.
If 3 overwrites 5 (because it has a later timestamp) this is problematic – not what I want.
Do you know how these updates/increments are handled by Cassandra?
Note that a read before a write is still susceptible to the same problem, depending on which replica node the read executes against (QUORUM can still fail if propagation is not far enough along).
I'm also thinking that maybe putting a cache between my Storm bolt and Cassandra might solve this issue, but that's a story for another time.
Counters in C* have a complex internal representation that avoids most (but not all) problems of counting things in a leaderless distributed system. I like to think of them as sharded counters. A counter consists of a number of sub-counters identified by host ID and a version number. The host that receives the counter operation increments only its own sub-counter, and also increments the version. It then replicates its whole counter state to the other replicas, which merge it with their states. When the counter is read the node handling the read operation determines the counter value by summing up the total of the counts from each host.
On each node a counter increment is just like everything else in Cassandra, just a write. The increment is written to the memtable, and the local value is determined at read time by merging all of the increments from the memtable and all SSTables.
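To make that concrete, here is a toy model of the sharded-counter idea in C# (an illustration only, not Cassandra's actual implementation; assumes using System.Collections.Generic; and using System.Linq;):

// Each replica keeps one sub-counter per host ID: (count, version).
// An increment touches only the local host's shard; replication merges whole shard maps
// by keeping, per host, the entry with the highest version. The visible value is the sum.
class ShardedCounter
{
    private readonly Dictionary<string, (long Count, long Version)> _shards =
        new Dictionary<string, (long Count, long Version)>();

    public void Increment(string hostId, long delta)
    {
        var s = _shards.TryGetValue(hostId, out var cur) ? cur : (Count: 0L, Version: 0L);
        _shards[hostId] = (s.Count + delta, s.Version + 1);
    }

    public void Merge(ShardedCounter other)
    {
        foreach (var kv in other._shards)
            if (!_shards.TryGetValue(kv.Key, out var cur) || kv.Value.Version > cur.Version)
                _shards[kv.Key] = kv.Value;
    }

    public long Value() => _shards.Values.Sum(s => s.Count);
}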
I hope that explanation helps you believe me when I say that you don't have to worry about incrementing counters faster than Cassandra can handle. Since each node keeps its own sub-counter and never replicates raw increment operations as overwrites, there is no possibility of counts getting lost to race conditions the way a read-modify-write scenario would introduce them. If Cassandra accepts the write, you're pretty much guaranteed that it will be counted.
What you're not guaranteed, though, is that the count will always appear correct. If an increment is written to one node but the counter value is read from another node just afterwards, there is no guarantee that the increment has been replicated yet, and you also have to consider what would happen during a network partition. This is more or less the same as for any write in Cassandra: it's part of its eventually consistent nature, and it depends on which consistency levels you use for the operations.
There is also the possibility of a lost acknowledgement. If you do an increment and lose the connection to Cassandra before you get the response back, you can't know whether or not your write got through. And when you get the connection back you can't tell either, since you don't know what the count was before you incremented. This is an inherent problem with systems that choose availability over consistency, and the price you pay for many of the other benefits.
Finally, the issue with rapid increment/remove/increment sequences is real and something you should avoid. The problem is that the increment operation essentially resurrects the column, and if these operations come close enough to each other they might get the same timestamp. Cassandra is strictly last-write-wins and determines "last" based on the timestamp of the operation. If two operations have the same timestamp, the "greater" one wins, meaning the one that sorts later in a strict byte order. It's real, but I wouldn't worry too much about it unless you're doing very rapid writes and deletes to the same value (which is probably a fault in your data model).
Here's a good guide to the internals of Cassandra's counters: http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf
The current version of counters is just not a good fit for a use case that requires guarantees of no over-counting and immediate consistency.
There are increment and decrement operations, and those will not collide with each other, and, barring any lost mutations or replayed mutations, will give you a correct result.
The rewrite of Cassandra counters (https://issues.apache.org/jira/browse/CASSANDRA-6504) might be interesting to you, and it should address all of the current concerns with getting a correct count.
In the meantime, if I had to implement this on top of a current version of Cassandra, and an accurate count was essential, I would probably store each increment or decrement as a column, and do read-time aggregation of the results, while writing back a checkpoint so you don't have to read back to the beginning of time to calculate subsequent results.
That adds a lot of burden to the read side, though it is extremely efficient on the write path, so it may or may not work for your use case.
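A rough sketch of what that could look like (illustrative table and column names, assuming the DataStax C# driver with using System.Linq; the checkpoint write-back is omitted):

// Assumed schema (illustrative):
//   CREATE TABLE counter_increments (name text, id timeuuid, delta bigint, PRIMARY KEY (name, id));
// Each increment/decrement is its own immutable row, so retries and concurrent writers never overwrite each other.
void RecordDelta(ISession session, string counterName, long delta)
{
    session.Execute(new SimpleStatement(
        "INSERT INTO counter_increments (name, id, delta) VALUES (?, now(), ?)",
        counterName, delta));
}

// Read-time aggregation: sum all deltas recorded for the counter.
// A real implementation would also write back a checkpoint so it doesn't re-read from the beginning every time.
long ReadValue(ISession session, string counterName)
{
    var rows = session.Execute(new SimpleStatement(
        "SELECT delta FROM counter_increments WHERE name = ?", counterName));
    return rows.Sum(r => r.GetValue<long>("delta"));
}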
To understand updates/increments, i.e. write operations, I suggest you read about Gossip, the protocol Cassandra uses for inter-node communication. In Gossip, every participant (node) maintains its state as a tuple σ(K) = (V, N), where σ(K) is the state of key K with value V and version number N.
To maintain a single version of truth for a piece of data, Gossip uses a reconciliation mechanism, either precise reconciliation or Scuttlebutt reconciliation (the current one). With Scuttlebutt reconciliation, before updating any tuple the nodes communicate with each other to check who holds the highest version (the newest value) of the key; whoever holds the highest version is responsible for the write operation.
For further information read this article.

astyanax mutationbatch failure handling

I need to understand if/how a call to MutationBatch.execute() is safe against the server running the code going down.
Have a look at the code below (copied from the Astyanax examples). I intend to use this code to modify 2 rows in 2 different column families, and I need to ensure (100%) that if the server executing this code crashes/fails at any point during the execution, either:
- nothing is changed in the Cassandra datastore, or
- ALL changes (2 rows) are applied to the Cassandra datastore.
I'm especially concerned about the line "OperationResult<Void> result = m.execute();". I would assume that this translates into something like: write all modifications to a commit log in Cassandra and then atomically trigger a change to be executed inside Cassandra (with Cassandra guaranteeing execution on some server).
Any help on this is very appreciated.
Thanks,
Sven.
CODE:
MutationBatch m = keyspace.prepareMutationBatch();
long rowKey = 1234;
long rowKey2 = 5678;   // note: rowKey2 was not declared in the original snippet

// Setting columns in a standard column family
m.withRow(CF_STANDARD1, rowKey)
    .putColumn("Column1", "X", null)
    .putColumn("Column2", "X", null);
m.withRow(CF_STANDARD1, rowKey2)
    .putColumn("Column1", "Y", null);

try {
    // Sends all queued mutations to Cassandra in a single call.
    OperationResult<Void> result = m.execute();
} catch (ConnectionException e) {
    LOG.error(e);
}
http://www.datastax.com/docs/0.8/dml/about_writes
In Cassandra, a write is atomic at the row-level, meaning inserting or updating columns for a given row key will be treated as one write operation. Cassandra does not support transactions in the sense of bundling multiple row updates into one all-or-nothing operation.
This means that there is no way to be 100% sure that the mutation will update both rows or neither. But since Cassandra 0.8 you do have such a guarantee within a single row: all columns modified within a single row will succeed, or none of them will.
You can think of mutations on different rows as separate transactions; the fact that they are sent within a single mutation call does not change anything. Internally, Cassandra groups all operations by row key and executes each row mutation as a separate atomic operation.
In your example, you can be sure that the write to rowKey (Column1, Column2) is applied atomically, and likewise the write to rowKey2 (Column1), but you can never be sure that both rows were persisted together.
You can enable hinted handoff writes, which increases the probability that a write will eventually propagate, but again, this is not an ACID database.
