I need to understand whether (and how) a call to MutationBatch.execute() is safe against the server running the code going down.
Have a look at the code below (copied from the Astyanax examples). I intend to use this code to modify 2 rows in 2 different column families. I need to ensure (100%) that if the server executing this code crashes/fails at any point during execution, either:
- nothing is changed in the Cassandra datastore
- ALL changes (2 rows) are applied to the Cassandra datastore
I'm especially concerned about the line "OperationResult&lt;Void&gt; result = m.execute();". I would assume that this translates into something like: write all modifications to a commit log in Cassandra, then atomically trigger the change to be executed inside Cassandra (with Cassandra guaranteeing execution on some server).
Any help on this is very appreciated.
Thanks,
Sven.
CODE:
MutationBatch m = keyspace.prepareMutationBatch();
long rowKey = 1234;
long rowKey2 = 5678; // key of the second row to modify

// Setting columns in a standard column family
m.withRow(CF_STANDARD1, rowKey)
    .putColumn("Column1", "X", null)
    .putColumn("Column2", "X", null);
m.withRow(CF_STANDARD1, rowKey2)
    .putColumn("Column1", "Y", null);

try {
    OperationResult<Void> result = m.execute();
} catch (ConnectionException e) {
    LOG.error(e);
}
http://www.datastax.com/docs/0.8/dml/about_writes
In Cassandra, a write is atomic at the row-level, meaning inserting or updating columns for a given row key will be treated as one write operation. Cassandra does not support transactions in the sense of bundling multiple row updates into one all-or-nothing operation.
This means that there is no way to be 100% sure that a mutation will update two different rows or none. But since Cassandra 0.8 you do have such a guarantee at least within a single row: all columns modified within a single row will succeed, or none will - that is all.
You can see mutations on different rows as separate transactions; the fact that they are sent within a single mutation call does not change anything. Internally, Cassandra groups all operations by row key and executes each row mutation as a separate atomic operation.
In your example, you can be sure that rowKey (Column1, Column2) and rowKey2 (Column1) are each persisted atomically, but not that both are persisted together.
You can enable hinted handoff writes; this increases the probability that a write will propagate over time, but again, this is not an ACID database.
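If per-row atomicity is enough for your use case, one practical consequence is to execute each row mutation as its own batch and retry it independently. A minimal sketch against the Astyanax API from the question (the helper name, retry count and column value are made up for illustration):

// Execute a single-row mutation as its own batch and retry it independently.
// Per-row writes are atomic; cross-row all-or-nothing is not available here.
void writeRowWithRetry(Keyspace keyspace, long key, String value) throws ConnectionException {
    ConnectionException last = null;
    for (int attempt = 0; attempt < 3; attempt++) {
        MutationBatch m = keyspace.prepareMutationBatch(); // rebuild on each attempt
        m.withRow(CF_STANDARD1, key)
            .putColumn("Column1", value, null);
        try {
            m.execute(); // atomic for this single row
            return;
        } catch (ConnectionException e) {
            last = e; // transient failure: retry
        }
    }
    throw last; // caller decides how to reconcile partially applied state
}

Calling this once per row means a crash between the two calls leaves exactly one row written; without a batch mechanism that spans rows, that is the strongest guarantee available here.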
I have a table in Cassandra which stores versions of csv-files. It uses a primary key with a unique id for the version (the partition key) and a row number (the clustering key). When I insert a new version I first execute a delete statement on the partition key I am about to insert, to clean up any incomplete data. Then the data is inserted.
Now here is the issue. Even though the delete and subsequent insert are executed synchronously, one after the other in the application, some level of concurrency still seems to exist in Cassandra, because when I read afterwards, rows from my insert will occasionally be missing - something like 1 in 3 times. Here are some facts:
Cassandra 3.0
Consistency ALL (R+W)
Delete using the Java Driver
Insert using the Spark-Cassandra connector
Number of nodes: 2
Replication factor: 2
The delete statement I execute looks like this:
"DELETE FROM myTable WHERE version = 'id'"
If I omit it, the problem goes away. If I insert a delay between the delete and the insert, the problem is reduced (fewer rows missing). Initially I used a less restrictive consistency level and was sure this was the issue, but it did not affect the problem. My hypothesis is that for some reason the delete statement is being sent to the replica asynchronously despite the consistency level of ALL, but I can't see why this would be the case or how to avoid it.
By default, all mutations get a write timestamp from the coordinator handling that write. From the docs:
TIMESTAMP: sets the timestamp for the operation. If not specified, the coordinator will use the current time (in microseconds) at the start of statement execution as the timestamp. This is usually a suitable default.
http://cassandra.apache.org/doc/cql3/CQL.html
Since different mutations can have different coordinators, clock skew between coordinators can cause the timestamp of one mutation to be skewed relative to another's.
Since write time controls C* history, this means you can have a driver which synchronously inserts and then deletes, but depending on the coordinators the delete can happen "before" the insert.
Example
Imagine two nodes A and B, B is operating with a 5 second clock skew behind A.
At time 0: You insert data to the cluster and A is chosen as the coordinator. The mutation arrives at A and A assigns a timestamp (0)
There is now a record in the cluster
INSERT VALUE AT TIME 0
Both nodes contain this message and the request returns confirming the write was successful.
At time 2: You issue a delete for the data previously inserted and B is chosen as the coordinator. B assigns a timestamp of (-3) because it is clock skewed 5 seconds behind the time in A. This means that we end up with a statement like
DELETE VALUE AT TIME -3
We acknowledge that all nodes have received this record.
Now the global consistent timeline is
DELETE VALUE AT TIME -3
INSERT VALUE AT TIME 0
Since the insertion occurs after the delete the value still exists.
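One way to take coordinator clocks out of the picture is to assign timestamps on the client, so the delete and the subsequent insert are ordered by a single clock. A minimal sketch with the Java driver using USING TIMESTAMP and the table from the question (the row_number and data columns are stand-ins for the question's schema, a connected session is assumed, and the question's actual insert goes through the Spark-Cassandra connector, which would need its own timestamp configuration):

// Order the delete and the insert on one client clock instead of two
// coordinator clocks. CQL timestamps are microseconds since the epoch.
long deleteTs = System.currentTimeMillis() * 1000L;
session.execute("DELETE FROM myTable USING TIMESTAMP " + deleteTs
    + " WHERE version = 'id'");

long insertTs = deleteTs + 1; // strictly after the delete
session.execute("INSERT INTO myTable (version, row_number, data)"
    + " VALUES ('id', 1, 'some,csv,row') USING TIMESTAMP " + insertTs);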
I ran into a similar problem and fixed it by enabling lightweight transactions (LWT) for both INSERT and DELETE requests (for all queries actually, including UPDATE). This ensures that all queries to the partition are serialized, in effect through one "thread", so the DELETE won't overwrite the INSERT. For example (assuming instance_id is the primary key):
INSERT INTO myTable (instance_id, instance_version, data) VALUES ('myinstance', 0, 'some-data') IF NOT EXISTS;
UPDATE myTable SET instance_version=1, data='some-updated-data' WHERE instance_id='myinstance' IF instance_version=0;
UPDATE myTable SET instance_version=2, data='again-some-updated-data' WHERE instance_id='myinstance' IF instance_version=1;
DELETE FROM myTable WHERE instance_id='myinstance' IF instance_version=2;
//or:
DELETE FROM myTable WHERE instance_id='myinstance' IF EXISTS;
IF clauses enable lightweight transactions for each row, so all of these statements are serialized. Warning: LWT is more expensive than normal calls, but sometimes it is needed, as in this concurrency problem.
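When issuing conditional statements from the Java driver, it is also worth checking on the client whether the condition actually applied. A hedged sketch reusing the statements above (assuming a connected session):

// A conditional statement returns a result row saying whether the condition
// held; wasApplied() is the shortcut for reading it.
ResultSet rs = session.execute(
    "UPDATE myTable SET instance_version=1, data='some-updated-data'"
    + " WHERE instance_id='myinstance' IF instance_version=0");
if (!rs.wasApplied()) {
    // Another writer won the race: the returned row carries the current
    // column values, so re-read them and retry or give up.
}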
I am using Cassandra 2.1.12 to store event data in a column family. Below is the C# code for creating the client for .NET, which manages connections to Cassandra. The problem is that the rate of inserts/updates is very high. Let's say I increment a column value in Cassandra on each subsequent request: the first time I write, the value of the column is 1; on the next request I read the value of this column and update it to 2. But if the value is fetched from another node where it has not yet been set to 1, the value is stored as 1 again. To solve this problem I have set the consistency level to QUORUM, but the problem still persists. Can anyone tell me a possible solution for this?
private static readonly object _lock = new object();
private static volatile ISession _singleton;

public static ISession GetSingleton()
{
    if (_singleton == null)
    {
        lock (_lock) // double-checked locking: this runs under heavy concurrency
        {
            if (_singleton == null)
            {
                Cluster cluster = Cluster.Builder()
                    .AddContactPoints(ConfigurationManager.AppSettings["cassandraCluster"].ToString().Split(','))
                    .Build();
                _singleton = cluster.Connect(ConfigurationManager.AppSettings["cassandraKeySpace"].ToString());
            }
        }
    }
    return _singleton;
}
No, it is not possible to achieve your goal in Cassandra this way. The reason is that every distributed application falls within the CAP theorem, and Cassandra chooses availability and partition tolerance over strong consistency.
So in your scenario, you are trying to update the same partition key many times in a multi-threaded environment, and it is not guaranteed that every thread sees the latest data. If you leave a small interval between requests you might see the latest data in all threads.
If your requirement is to increment/decrement integers, you can go with Cassandra counters. However, a Cassandra counter does not support retrieving the updated value within a single request: you can issue one request to increment the counter and a separate request to get the updated value, but it is not possible to increment and get the incremented value in a single request. If your requirement is only to increment a value (like counting the number of times a page is viewed), counters are a good fit: Cassandra counters will not miss any increments/decrements, and you can read the accurate total at the end. Hope it helps.
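A minimal counter sketch with the Java driver (the question uses the C# driver, but the CQL is identical; the page_views table is made up for illustration, and a connected session is assumed):

// Counters live in a dedicated table: every non-key column must be a counter.
session.execute("CREATE TABLE IF NOT EXISTS page_views ("
    + "page text PRIMARY KEY, views counter)");

// The increment happens server-side, with no read-modify-write on the
// client, so concurrent increments are never lost.
session.execute("UPDATE page_views SET views = views + 1 WHERE page = 'home'");

// Reading the total is necessarily a separate request.
Row row = session.execute("SELECT views FROM page_views WHERE page = 'home'").one();
long views = row.getLong("views");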
It seems to me that using IF would make the statement possibly fail if retried. Therefore, the statement is not idempotent. For instance, given the CQL below, if it fails because of a timeout or system problem and I retry it, then it may not work, because another person may have updated the version between retries.
UPDATE users
SET name = 'foo', version = 4
WHERE userid = 1
IF version = 3
Best practices for updates in Cassandra are to make updates idempotent, yet the IF operator is in direct opposition to this. Am I missing something?
If your application is idempotent, then generally you wouldn't need to use the expensive IF clause, since all your clients would be trying to set the same value.
For example, suppose your clients were aggregating some values and writing the result to a roll up table. Each client would calculate the same total and write the same value, so it wouldn't matter if multiple clients wrote to it, or what order they wrote to it, since it would be the same value.
If what you are actually looking for is mutual exclusion, such as keeping a bank balance, then the IF clause could be used. You might read a row to get the current balance, then subtract some money and update the balance only if the balance hadn't changed since you read it. If another client was trying to add a deposit at the same time, then it would fail and would have to try again.
But another way to do that without mutual exclusion is to write each withdrawal and deposit as a separate clustered transaction row, and then calculate the balance as an idempotent result of applying all the transaction rows.
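A hedged sketch of that ledger approach with the Java driver (table and column names are made up for illustration; a connected session is assumed):

// Each withdrawal/deposit is its own clustered row. Re-inserting the same
// tx_id on a retry overwrites an identical row, so the write is idempotent.
session.execute("CREATE TABLE IF NOT EXISTS account_ledger ("
    + "account_id text, tx_id timeuuid, amount decimal, "
    + "PRIMARY KEY (account_id, tx_id))");

UUID txId = UUIDs.timeBased(); // generate once per logical transaction, reuse on retry
session.execute(
    "INSERT INTO account_ledger (account_id, tx_id, amount) VALUES (?, ?, ?)",
    "acct-1", txId, new BigDecimal("-25.00")); // a withdrawal

// The balance is an idempotent fold over all transaction rows.
BigDecimal balance = BigDecimal.ZERO;
for (Row r : session.execute(
        "SELECT amount FROM account_ledger WHERE account_id = ?", "acct-1")) {
    balance = balance.add(r.getDecimal("amount"));
}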
You can use the IF clause for idempotent writes, but it seems pointless. The first client to do the write would succeed and Cassandra would return the value "applied=True". And the next client to try the same write would get back "applied=False, version=4", indicating that the row had already been updated to version 4 so nothing was changed.
This question is more about linearizability (ordering) than idempotency, I think. Such a query uses Paxos to determine the state of the system before applying a change. If the state of the system is identical, the query can be retried many times without a change in the results. This provides a weak form of ordering (and is expensive), unlike most Cassandra writes. Generally you should only use CAS operations if you are attempting to record the state of a system (rather than a history or log).
Do not use many of these queries if you can help it; the guidelines suggest having only a small percentage of your queries rely on this behavior.
I have 3 similar cassandra tables. Table A1, A2, A3.
All have same columns, but different partition keys.
Data is inserted into all three tables at the same time through sequential inserts using the Mapper library (cassandra-driver-mapping-2.1.8.jar).
However, there has been inconsistency in a few columns.
E.g. sometimes A1.colX and A2.colX are the same but A3.colX has an old value (not updated), while all the other columns in the three tables have exactly the same value.
Another time A1.colY and A3.colY may have the same value but A2.colY has an old value (not updated), and again all the other columns in the three tables have exactly the same value.
I am using MappingManager to save the input data in Cassandra.
Is this a known problem with MappingManager, or is something wrong in my approach?
Sample code:
public void insertInTables(String inputString) {
    .
    .
    // Deserialize the same JSON input once per target table's mapped class
    ClassNameA1 classObjectA1 = new Gson().fromJson(inputString, ClassNameA1.class);
    ClassNameA2 classObjectA2 = new Gson().fromJson(inputString, ClassNameA2.class);
    ClassNameA3 classObjectA3 = new Gson().fromJson(inputString, ClassNameA3.class);
    MappingManager manager = new MappingManager(session);
    Mapper<ClassNameA1> mapperA1 = manager.mapper(ClassNameA1.class);
    Mapper<ClassNameA2> mapperA2 = manager.mapper(ClassNameA2.class);
    Mapper<ClassNameA3> mapperA3 = manager.mapper(ClassNameA3.class);
    mapperA1.save(classObjectA1);
    mapperA2.save(classObjectA2);
    mapperA3.save(classObjectA3);
    .
    .
}
This can happen because Cassandra is an eventually consistent store, not a strongly consistent one. Typical reasons for similar behaviour I've witnessed are:
- issues with read/write consistency levels. If you have RF=3 but write data with CL=ONE, some nodes might fail to replicate your value on write for some reason (like a network/hardware glitch). Then if you read with CL=QUORUM (or ONE), the quorum may show you the old column value because the new one was not propagated to all nodes correctly. So make sure you're writing with CL=ALL/QUORUM and reading with CL=QUORUM (see the sketch below).
- issues with hinted handoff (which is meant to protect you from the previous issue). Once I observed strange behaviour where the first read of a column was stale/inconsistent (in 1% of all queries) but the second (or third) read showed the correct column value every time. So try re-reading your inconsistent column multiple times (and consider possible hardware/network failures).
- internal database errors due to hardware failure or Cassandra itself.
Most of the issues described above can be fixed with nodetool repair. You can run a full repair and see if this helps.
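A minimal sketch of pinning consistency levels per statement with the Java driver (the question's mapper-based saves would need the equivalent mapper option; table and column names are illustrative, and a connected session is assumed):

// Write and read at QUORUM so every read quorum overlaps every write quorum
// (W + R > RF); with RF=3, for example, QUORUM means 2 of 3 nodes for both.
Statement write = new SimpleStatement(
    "INSERT INTO tableA1 (id, colX) VALUES (?, ?)", "k1", "v2")
    .setConsistencyLevel(ConsistencyLevel.QUORUM);
session.execute(write);

Statement read = new SimpleStatement(
    "SELECT colX FROM tableA1 WHERE id = ?", "k1")
    .setConsistencyLevel(ConsistencyLevel.QUORUM);
Row row = session.execute(read).one();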
I know that in Cassandra there's no strong consistency unless you explicitly request it (and even then, there are no transactions).
However, I'm interested in the "order" of consistency. Take the following example:
In a cluster there are 3 nodes (A, B and C). Two insert queries are sent through the same CQL connection (or Thrift for that matter; I don't think that's relevant to this question anyway). Both operate on different tables (this might be relevant).
INSERT INTO table_a (id) VALUES (0)
INSERT INTO table_b (id) VALUES (1)
Directly after the queries have been successfully executed on the node they were sent to, that node goes down. The node may or may not have succeeded in propagating these two queries to B and C.
Now, I'd think that there is an order of consistency: either both queries are successfully propagated and executed on B and C, or only the first one is, or neither is. I'd think that under no circumstances is only the second query propagated and executed and not the first (because of the order of TCP packets, and the fact that, obviously, all nodes share the same consistency strategy).
Am I right?
You're right, at least on the node you connect to. What happens on the server is, for a consistency level ONE write:
1. Receive insert to table_a
2. Write into commitlog
3. Acknowledge write to client
4. Receive insert to table_b
5. Write into commitlog
6. Acknowledge write to client
The key is that there is a global commitlog. So you can't flush it for one table and not another. Also, because the writes are sequential, you know the write was made to the commitlog before returning.
The commitlog gets flushed to disk periodically (by default), so it could flush after step 2 but before step 5, in which case only the insert to table_a is kept in the event of a crash immediately after step 4 or 5.
On other nodes, the ordering isn't guaranteed, because the write is done asynchronously and writes are multithreaded. But it's not possible to totally lose the first write and not the second if the original node doesn't fail permanently.
If you want stronger guarantees, you can use Cassandra's batching.
Cassandra can guarantee that either both writes succeed or neither does if you write them as a batch. Even for old Cassandra versions, if updates within a batch have the same row key (partition key in CQL speak), they will get committed to the commitlog atomically, even if they are in different column families (tables).
New in 1.2 is a batchlog that spans multiple rows and offers the same guarantee: either the whole batch gets applied or none of it.
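A minimal sketch of a logged (atomic) batch with the Java driver, reusing the two inserts from the question above (assuming a connected session):

// A logged batch is written to the batchlog first, so either both inserts
// eventually apply or neither does (atomicity, though not isolation).
BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
batch.add(new SimpleStatement("INSERT INTO table_a (id) VALUES (0)"));
batch.add(new SimpleStatement("INSERT INTO table_b (id) VALUES (1)"));
session.execute(batch);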