maybe a user registered into my app.
Then I do this things:
const query1 = INSERT INTO users_by_email ... values (...)
const query2 = INSERT INTO users_by_id ... VALUES (...)
Maybe query2 failed. What do I am then ? Or It cant failed if all datas are correctly ?
Write can fail for multiple reasons in Cassandra. For example a node may go down or there may be network issues between nodes. So don't assume that writes will succeed if all the data is correct or your query is correct.
Coming back to your query, what should an application do. Simplest answer is to try the failed query again.
One more thing I would like to tell that if two tables are having same data which you want to sync, you should use batch
BEGIN BATCH
INSERT INTO users_by_email ... values (...);
INSERT INTO users_by_id ... VALUES (...);
APPLY BATCH ;
One more thing that batch doesn't guarantee that your changes will get rolled back. It will just inform the client that the set of query failed, so you as a client should retry your batch queries.
Related
How can I delete a row from Cassandra and get the value it had just before the deletion?
I could execute a SELECT and DELETE query in series, but how can I be sure that the data was not altered concurrently between the execution of those two queries?
I've tried to execute the SELECT and DELETE queries in a batch but that seems to be not allowed.
cqlsh:foo> BEGIN BATCH
... SELECT * FROM data_by_user WHERE user = 'foo';
... DELETE FROM data_by_user WHERE user = 'foo';
... APPLY BATCH;
SyntaxException: line 2:4 mismatched input 'SELECT' expecting K_APPLY (BEGIN BATCH [SELECT]...)
In my use case I have one main table that stores data for items. And I've build several tables that allow to lookup items based on those informations.
If I delete an item from the main table, I must also remove it from the other tables.
CREATE TABLE items (id text PRIMARY KEY, owner text, liking_users set<text>, ...);
CREATE TABLE owned_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
CREATE TABLE liked_items_by_user (user text, item_id tect, PRIMARY KEY ((user), item_id));
...
I'm afraid the tables might contain wrong data if I delete an item and at the same time someone e.g. hits the like button of that same item.
The deleteItem method execute a SELECT query to fetch the current row of the item from the main table
The likeItem method that gets executed at the same times runs an UPDATE query and inserts the item into the owned_items_by_user, liked_items_by_user, ... tables. This happens after the SELECT statement was executed and the UPDATE query is executed before the DELETE query.
The deleteItem method deletes the items from the owned_items_by_user, liked_items_by_user, ... tables based on the data just retrieved via the SELECT statement. This data does not yet contain the just added like. The item is therefore deleted, but the just added like remains in the liked_items_by_user table.
You can do a select beforehand, then do a lightweight transaction on the delete to ensure that the data still looks exactly like it did when you selected. If it does, you know the latest state before you deleted. If it does not, keep retrying the whole procedure until it sticks.
Unfortunately you cannot do a SELECT query inside a batch statement. If you read the docs here, only insert, update, and delete statements can be used.
What you're looking for is atomicity on the execution, but batch statements are not going to be the way forward. If the data has been altered, your worst case situation is zombies, or data that could reappear.
Cassandra uses a grade period mechanism to deal with this, you can find the details here. If for whatever reason, this is critical to your business logic, the "best" thing you can do in this situation is to increase the consistency level, or restructure the read pattern at application level to not rely on perfect atomicity, whichever the right trade off is for you. So either you give up some of the performance, or tune down the requirement.
In practice, QUORUM should be more than enough to satisfy most situations most of the time. Alternatively, you can do an ALL, and you pay the performance penalty, but that means all replicas for the given foo partition key will have to acknowledge the write both in the commitlog and the memtable. Note, this still means a flush from the commitlog will need to happen before the delete is complete, but you can tune the consistency to the level you require.
You don't have atomicity in the SQL sense, but depending on throughput it's unlikely that you will need it(touch wood).
TLDR:
USE CONSISTENCY ALL;
DELETE FROM data_by_user WHERE user = 'foo';
That should do the trick. The error you're seeing now is basically the ANTLR3 Grammar parser for CQL 3, which is not designed to accept to SELECT queries inside batches simply because they are not supported, you can see that here.
Cassandra doesn't imply particular order in which the statements are executed.
Executing statements like the code below doesn't execute in the order.
INSERT INTO channel
JSON ''{"cuid":"NQAA0WAL6drA"
,"owner":"123"
,"status":"open"
,"post_count":0
,"mem_count":1
,"link":"FWsA609l2Og1AADRYODkzNjE2MTIyOTE="
, "create_at":"1543328307953"}}'';
BEGIN BATCH
UPDATE channel
SET title = ? , description = ? WHERE cuid = ? ;
INSERT INTO channel_subscriber
JSON ''{"cuid":"NQAA0WAL6drA"
,"user_id":"123"
,"status":"subscribed"
,"priority":"owner"
,"mute":false
,"setting":{"create_at":"1543328307956"}}'';
APPLY BATCH ;
According to system_traces.sessions each of them are received by different nodes.
Sometimes the started_at time in both query are equal (in milliseconds), sometimes the started_at time of second query is less than the first one.
So, this ruins the order of statements and data.
We use erlang, marina driver, consistency_level is QUORUM and time of all cassandra nodes and application server are sync.
How can I force Cassandra to execute queries in order?
Because of the distributed nature, queries in Cassandra could be received by different nodes, and depending on the load on particular node, it could be that some queries that sent later, are executed earlier. In your case you can put first insert into batch itself. Or, as it's implemented in some drivers (for example, Java driver), use whitelist policy to send queries to only one node - but it will be bottleneck in this case. (and I really not sure that your driver has such functionality).
I have two methods in my server application:
boolean isMessageExist(messageId) which execute below query:
SELECT messageId from message Where messageId =1;
insertMessage(int messageId,String data) which execute below query:
INSERT INTO message (messageId,data) VALUES (1, xyz);
In my code I am doing below to meet the required that "only insert if message does not exist".
if(!isMessageExist(1)){
insertMessage(1,"xyz")
}
But above code is not working if request for same messageId comes almost simultaneously.
i.e at time T0 ... the Read1(1), Write1(1) and Read2(1), Write2(1) are happening at the same time since the two requests were sent from the client at the same time. Is there a way make those request in sequence at serverside. I mean Read2(1) should always get the result Write1(1) ?
I don't want to USE CAS operation if IF NOT EXISTS due to performance overhead.
Is there any other way to achieve my requirement? Please suggest.
Using Cassandra's light weight transactions (LWT) IF NOT EXISTS should be both less expensive then what you are currently doing and satisfy your requirement for uniqueness.
INSERT INTO message( messageId, data ) VALUES ( 1, xyz ) IF NOT EXISTS
You can test and verify the performance, but two round trips (read, write) is almost certainly more expensive than a single INSERT ... IF NOT EXISTS.
Alternatively, if you can redesign your application so it uses UPSERTS -- where new values simply overwrites old data, that would be even better and use a more native Cassandra style.
I was following this link to use a batch transaction without using BATCH keyword.
Cluster cluster = Cluster.builder()
.addContactPoint(“127.0.0.1")
.build();
Session session = cluster.newSession();
//Save off the prepared statement you're going to use
PreparedStatement statement = session.prepare(“INSERT INTO tester.users (userID, firstName, lastName) VALUES (?,?,?)”);
//
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
for (int i = 0; i < 1000; i++) {
//please bind with whatever actually useful data you're importing
BoundStatement bind = statement.bind(i, “John”, “Tester”);
ResultSetFuture resultSetFuture = session.executeAsync(bind);
futures.add(resultSetFuture);
}
//not returning anything useful but makes sure everything has completed before you exit the thread.
for(ResultSetFuture future: futures){
future.getUninterruptibly();
}
cluster.close();
My question is with the given approach is it possible to INSERT, UPDATE or DELETE data from different table and if any of those fail all should be failed by maintaining the same performance (as described in the link).
With this approach what i tried, i was trying to insert, delete data from different table and one query got failed so all previous query was executed and updated the db.
With BATCH I can see that if any statement get failed all statement will be failed. But using BATCH on different table is anti-pattern so what is the solution ?
With BATCH I can see that if any statement get failed all statement will be failed.
Wrong, the guarantee of LOGGED BATCH is: if some statements in the batch fail, they will be retried until the succeed.
But using BATCH on different table is anti-pattern so what is the solution ?
ACID transaction is not possible with Cassandra, it would require some sort of global lock or global coordination and be prohibitive performance-wise.
However, if you don't care about the performance cost, you can implement your self a global lock/lease system using Light Weight Transaction primitives as described here
But be ready to face poor performance
Is it possible to do sequential batch in cassandra.
eg:
insert into table1 and take uuid from this insert operation and pass this to table2 insert statement.
If table 2 insert fails, fail the entire operaion.
If not , whats my best option?
(Its kind of transactional)
You'r best shot is Cassandra Batch statement:
BATCH - Cassandra documentation
Combined with "IF EXISTS" constraints (like here: DELETE - Cassandra documentation) it may be what you need.
However, I don't believe there is a possibility to "insert into table1 and take uuid from this insert operation and pass this to table2 insert statement". You can think of batches in C* as transactions in SQL - it's fully executed or not.
Important things to note:
batches can span multiple tables in C*
although batches are atomic, they are not isolated. Some portion of a batch can be executed, in another query you can read those changes, but it may happen they will be revoked because the batch will fail.