Cassandra WriteTimeoutException during CAS write query

We have two CAS queries. They worked fine with 2 application containers per region. After we increased the containers from 2 to 3, we started seeing WriteTimeoutException. The traffic is the same as, or even lower than, during regular business hours. Cassandra runs in 3 regions, and each cluster has 3 hosts.
We are not sure what causes these errors; the only change was adding one application container. Any help debugging this further would be appreciated.
Both queries run with ConsistencyLevel.QUORUM:
UPDATE order_sequences USING TTL 10 SET instance_name = ? WHERE id_name = ? IF instance_name = null
UPDATE order_sequences SET next_id = ? WHERE id_name = ? IF next_id = ? AND instance_name = ?
Error stack:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (7 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:85)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:23)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
    at com.datastax.driver.core.ChainedResultSetFuture.getUninterruptibly(ChainedResultSetFuture.java:59)
    at com.datastax.driver.core.NewRelicChainedResultSetFuture.getUninterruptibly(NewRelicChainedResultSetFuture.java:11)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)

"CAS write" is a specialized metric that is recorded whenever a compare-and-set operation runs. A lightweight transaction (LWT) is also known as compare and set (CAS): replica data is compared, and any data found to be out of date is set to the most consistent value.
In Cassandra, the process combines the Paxos protocol with normal read and write operations to accomplish the compare and set operation.
The Paxos protocol is implemented as a series of phases:
• Prepare/Promise
• Read/Results
• Propose/Accept
• Commit/Acknowledge
These four phases require four round trips between the node proposing a lightweight transaction and the cluster replicas involved in the transaction, so a lightweight transaction is considerably slower than a normal write. Consequently, reserve lightweight transactions for situations where concurrency must be considered.
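The "7 replica were required" in the error message comes from quorum arithmetic: SERIAL needs a majority of the replicas in all data centers combined, while LOCAL_SERIAL needs only a majority of the local ones. A minimal sketch of that arithmetic follows; the per-region replication factors are hypothetical, since the question does not state the keyspace settings.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`."""
    return replicas // 2 + 1


def serial_quorum(rf_per_dc: dict) -> int:
    """SERIAL: majority of the replicas in every data center combined."""
    return quorum(sum(rf_per_dc.values()))


def local_serial_quorum(rf_per_dc: dict, local_dc: str) -> int:
    """LOCAL_SERIAL: majority of the replicas in the local data center only."""
    return quorum(rf_per_dc[local_dc])


# Hypothetical replication settings; the question does not state them.
rf = {"region_a": 3, "region_b": 3, "region_c": 3}
print(serial_quorum(rf))                    # 5
print(local_serial_quorum(rf, "region_a"))  # 2

# Which total replica counts would produce the "7 required" from the error?
print([n for n in range(1, 20) if quorum(n) == 7])  # [12, 13]
```

If this arithmetic applies, a required count of 7 points to a total replication factor of 12 or 13 across the regions, which is worth checking against the keyspace definition. It also shows why every Paxos phase at SERIAL has to wait on replicas in other regions.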
Lightweight transactions should also not be mixed with regular writes on the same rows. For example, the following series of operations can fail:
DELETE ...
INSERT .... IF NOT EXISTS
SELECT ....
The following series of operations will work:
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
SELECT .....
I would strongly recommend checking the "CAS write latency" statistics from the "nodetool proxyhistograms" command; it provides a histogram of network statistics at the time of the command.
Are you still facing this error?

Related

Proper Consistency Level to read 'everything'

I'm creating a sync program to periodically copy our Cassandra data into another database. The database I'm copying from only gets INSERTs - data is never UPDATEd or DELETEd. I would like to address Cassandra's eventual consistency model in two ways:
1 - Each sync scan overlaps the last by a certain time span. For example, if the scan happens every hour, then each scan looks an hour and a half backwards. The data contains a unique key, so reading the same record in more than one scan is not an issue.
2 - I use a Consistency level of ALL to ensure that I'm scanning all of the nodes on the cluster for the data.
Is ALL the best Consistency for this situation? I just need to see a record on any node, I don't care if it appears on any other nodes. But I don't want to miss any INSERTed records either. But I also don't want to experience timeouts or performance issues because Cassandra is waiting for multiple nodes to see that record.
To complicate this a bit more, this Cassandra network is made up of 6 clusters in different geographic locations. I am only querying one. My assumption is that the overlap mentioned in #1 will eventually catch up records that exist on other clusters.
The query I'm doing is like this:
SELECT ... FROM transactions WHERE userid=:userid AND transactiondate>:(lastscan-overlap)
Here userid is the partitioning key and transactiondate is a clustering column. The list of userids is sourced elsewhere.
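The overlap scheme from point 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the in-memory dict standing in for the target database, and the sample rows are all hypothetical.

```python
from datetime import datetime, timedelta


def scan_lower_bound(last_scan: datetime, overlap: timedelta) -> datetime:
    """Lower bound for transactiondate: the previous scan time minus the overlap."""
    return last_scan - overlap


def merge_scan(target: dict, rows) -> int:
    """Upsert rows by their unique key; return how many keys were new."""
    added = 0
    for key, value in rows:
        if key not in target:
            added += 1
        target[key] = value
    return added


# Hourly scans that each look an hour and a half back.
last_scan = datetime(2020, 1, 1, 12, 0)
print(scan_lower_bound(last_scan, timedelta(minutes=30)))  # 2020-01-01 11:30:00

target = {}
print(merge_scan(target, [("tx1", "a"), ("tx2", "b")]))  # 2
# The next scan re-reads tx2 inside the overlap and picks up a late-arriving tx3.
print(merge_scan(target, [("tx2", "b"), ("tx3", "c")]))  # 1
```

Because rows are keyed by the unique key, re-reading a record in the overlap window is harmless, and a record that became visible late is still picked up by the next scan.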

"I use a Consistency level of ALL to ensure that I'm scanning all of the nodes on the cluster for the data"
Consistency ALL has more to do with the number of data replicas read than with the number of nodes contacted. If you have a replication factor (RF) of 3 and query a single row at ALL, then Cassandra will hash your partition key to figure out the three nodes responsible for that row, contact all 3 nodes, and wait for all 3 to respond.
"I just need to see a record on one node"
I think you'd be fine with LOCAL_ONE in this regard.
The only possible advantage of using ALL is that it helps enforce data consistency by triggering a read repair 100% of the time. So if eventual consistency is a concern, that's a "plus." But *_ONE is definitely faster.
"The CL documentation talks a lot about 'stale data', but I am interested in 'new data'"
In your case, I don't see stale data as a possibility, so you should be OK there. The issue you would face instead is that, if one or more replicas failed during the write operation, a query at LOCAL_ONE may or may not reach the only replica that actually holds the row. So your data wouldn't be stale vs. new; it'd be exists vs. does not exist. One point I make in the linked answer is that writing at a higher consistency level and reading at LOCAL_ONE might work for your use case.
A few years ago, I wrote an answer about the different consistency levels, which you might find helpful in this case:
If lower consistency level is good then why we need to have a higher consistency(QUORUM,ALL) level in Cassandra?
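The suggestion to write at a higher consistency level and read at LOCAL_ONE rests on a simple overlap rule: a read is guaranteed to touch at least one replica that acknowledged the write when write acks plus read acks exceed the replication factor. A minimal sketch, assuming a single data center with replication factor `rf`; the function names are hypothetical.

```python
def required_acks(level: str, rf: int) -> int:
    """Replica responses needed at a consistency level (single data center assumed)."""
    levels = {"ONE": 1, "LOCAL_ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def read_sees_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must overlap every successful write set."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf


print(read_sees_write("ONE", "LOCAL_ONE", 3))   # False: the read may hit a replica that missed the write
print(read_sees_write("ALL", "LOCAL_ONE", 3))   # True
print(read_sees_write("QUORUM", "QUORUM", 3))   # True
print(read_sees_write("ONE", "ALL", 3))         # True
```

The guarantee only covers writes that succeeded at the stated level; a write at ALL fails whenever any replica is down, which is the trade-off against reading at ALL.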

Cassandra cluster : How it resolve single timestamp updates?

Say I have a Cassandra cluster with two nodes. Two clients send updates for the same record (with different values) at exactly the same time, and the updates go to two different nodes of the cluster. Cassandra works in masterless mode, so both nodes can accept the update request.
My question is: how will this conflict be resolved during eventual consistency, and which value will ultimately take precedence?
Here is the example scenario
Initial data: KeyA: { colA:"val AA", colB:"val BB"}
Client 1 sends update: `update data set colA:"val C1_ColA" where colB="val BB"` and the data becomes the following at node_1
KeyA: { colA:"val C1_ColA", colB:"val BB"}
Client 2 sends update: `update data set colA:"val C2_ColA" where colB="val BB"` and the data becomes the following at node_2
KeyA: { colA:"val C2_ColA", colB:"val BB"}
Now how the value of colA will eventually be resolved here ?

The last write always wins, and I doubt that the timestamps will be the same: they have microsecond resolution, so the probability of two writes carrying the same timestamp is very low.
If you want to prevent this situation, you can use lightweight transactions, which allow you to put a condition on inserts/updates/deletes. Keep in mind that they are very resource intensive and will add quite a big load to the cluster.
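The last-write-wins rule for the example in the question can be sketched like this. It is a simplified model, not Cassandra's code: `Cell` and `reconcile` are hypothetical names, and the tie-break shown (the greater value wins when timestamps are equal) reflects my understanding of Cassandra's behavior for live cells and should be treated as an assumption.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp_us: int  # write timestamp in microseconds


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick the surviving cell when two replicas disagree."""
    if a.timestamp_us != b.timestamp_us:
        return a if a.timestamp_us > b.timestamp_us else b
    # Assumed tie-break for equal timestamps: the greater value wins.
    return a if a.value >= b.value else b


node_1 = Cell("val C1_ColA", 1_000_001)
node_2 = Cell("val C2_ColA", 1_000_000)
print(reconcile(node_1, node_2).value)  # val C1_ColA (newer timestamp)

tie_1 = Cell("val C1_ColA", 1_000_000)
tie_2 = Cell("val C2_ColA", 1_000_000)
print(reconcile(tie_1, tie_2).value)    # val C2_ColA (tie broken by value)
```

Either way every replica converges on the same cell, because the rule depends only on the cells themselves and not on which node received which update.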

Select All Performance in Cassandra

I'm currently using DB2 and planning to move to Cassandra because, as far as I know, Cassandra has better read performance than an RDBMS.
This may be a stupid question, but I ran an experiment that compares read performance between DB2 and Cassandra.
I tested with 5 million records and the same table schema.
With the query SELECT * FROM customer, DB2 takes 25-30s and Cassandra takes 40-50s.
But with a WHERE condition, SELECT * FROM customer WHERE cusId IN (100,200,300,400,500), DB2 takes 2-3s and Cassandra takes 3-5ms.
Why is Cassandra faster than DB2 with a WHERE condition? And does this mean I can't judge which database is better using SELECT * FROM customer?
FYI:
Cassandra: RF=3 and CL=1 with 3 nodes, each node running on a separate computer (VM-Ubuntu)
DB2: runs on Windows
Table schema:
cusId int PRIMARY KEY, cusName varchar

If you look at the types of problems that Cassandra is good at solving, then the reasons why unbound ("Select All") queries suck become quite apparent.
Cassandra was designed to be a distributed database. In many Cassandra storage patterns, the number of nodes is greater than the replication factor (i.e., not all nodes contain all of the data). Therefore, limiting the number of network hops becomes essential to modeling high-performing queries. Cassandra performs very well with specific queries (which utilize the partition/clustering key structure), because it can quickly locate the node primarily responsible for the data.
Unbound queries (a.k.a. multi-key queries) incur extra network time because the coordinator has to gather data from many nodes: one node acts as the coordinator, queries all the other nodes, collates the data, and returns the result set. Specifying a WHERE clause (with at least a partition key) while using a "Token Aware" load balancing policy performs well for two reasons:
1) No separate coordinator hop is required, because the driver sends the request straight to a replica that owns the data.
2) The node primarily responsible for the range is queried, returning the result set in a single network hop.
tl;dr;
Querying Cassandra with an unbound query causes it to incur a lot of extra processing and network time that it wouldn't have to spend had the query been specified with a WHERE clause.
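The difference between a keyed query and an unbound one can be illustrated with a toy token ring. This is a simplified sketch, not Cassandra's implementation: it uses one token per node (no vnodes) and MD5 as a stand-in hash, whereas Cassandra's default partitioner is Murmur3. The `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Stand-in hash; Cassandra's default partitioner uses Murmur3, not MD5."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, rf):
        self.rf = rf
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Nodes holding the partition: the token owner plus the next rf - 1 nodes."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]

    def nodes_for_full_scan(self):
        """An unbound SELECT has to cover every token range, hence every node."""
        return [name for _, name in self.entries]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"], rf=3)
print(ring.replicas("cusId:100"))        # 3 of the 6 nodes; any one of them can answer at CL ONE
print(len(ring.nodes_for_full_scan()))   # 6
```

A token-aware driver performs the equivalent of `replicas()` on the client side, which is how it can send a keyed request directly to a node that holds the data.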

Even for a troublesome query like an unconditioned range query, 40-50s is pretty extreme for C*. Is the coordinator hitting GCs during the coordination? Can you include the code used for your test?
When you run a select * over millions of records, it won't fetch them all at once; it will grab fetchSize rows at a time. If you're just iterating through the result, the iterator will block even if you used executeAsync initially. This means that every 10k (default) records it issues a new query that you block on. The serialized nature of this takes time just from a network perspective. http://docs.datastax.com/en/developer/java-driver/3.1/manual/async/#async-paging explains how to do it in a non-blocking way. You can use this to kick off the next page fetch while processing the current one, which would help.
Decreasing the limit or fetch size could also help, since the coordinator may walk token ranges one at a time (parallelism is possible here, but its heuristic is not perfect) until it has read enough. If it has to walk too many nodes to respond, it will be slow. This is why a select * on an empty table can be very slow: it may serially walk every replica set. With 256 vnodes this can be very bad.
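The idea of kicking off the next page fetch while processing the current one can be sketched without the driver. This is a hedged illustration only: `fetch_page` is a hypothetical stand-in for one paged round trip, and none of this is the DataStax driver API (see the linked async-paging page for the real calls).

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(rows, offset, fetch_size):
    """Hypothetical stand-in for one paged round trip to the coordinator."""
    page = rows[offset:offset + fetch_size]
    has_more = offset + fetch_size < len(rows)
    return page, (offset + fetch_size if has_more else None)


def iterate_with_prefetch(rows, fetch_size, process):
    """Process each page while the following page is already being fetched."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_page, rows, 0, fetch_size)
        while future is not None:
            page, next_offset = future.result()
            # Start the next round trip before processing the current page.
            future = (pool.submit(fetch_page, rows, next_offset, fetch_size)
                      if next_offset is not None else None)
            for row in page:
                process(row)


seen = []
iterate_with_prefetch(list(range(25)), fetch_size=10, process=seen.append)
print(len(seen))  # 25
```

With a plain blocking iterator, the network wait and the processing time add up page after page; overlapping them hides most of the per-page latency when processing takes roughly as long as a fetch.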

Why Cassandra cluster need synchronized clocks between nodes?

The DataStax introductory course on Cassandra says that the clocks of all nodes in a Cassandra cluster have to be synchronized in order to prevent READ queries from returning 'old' data.
If one or more nodes are down, they cannot receive updates, but as soon as they are back up they catch up and there is no problem...
So why does a Cassandra cluster need synchronized clocks between nodes?

In general it is always a good idea to keep your server clocks in sync, but a primary reason why clock sync is needed between nodes is because Cassandra uses a concept called 'Last Write Wins' to resolve conflicts and determine which mutation represents the most correct up-to date state of data. This is explained in Why cassandra doesn't need vector clocks.
Whenever you 'mutate' (write or delete) column(s) in cassandra a timestamp is assigned by the coordinator handling your request. That timestamp is written with the column value in a cell.
When a read request occurs, cassandra builds your results finding the mutations for your query criteria and when it sees multiple cells representing the same column it will pick the one with the most recent timestamp (The read path is more involved than this but that is all you need to know in this context).
Things start to become problematic when your nodes' clocks become out of sync. As I mentioned, the coordinator node handling your request assigns the timestamp. If you do multiple mutations to the same column and different coordinators are assigned, you can create some situations where writes that happened in the past are returned instead of the most recent one.
Here is a basic scenario that describes that:
Assume we have a 2-node cluster with nodes A and B. Let's assume an initial state where A is at time t10 and B is at time t5.
1) User executes DELETE C FROM tbl WHERE key=5. Node A coordinates the request and it is assigned timestamp t10.
2) A second passes and a user executes UPDATE tbl SET C='data' WHERE key=5. Node B coordinates the request and it is assigned timestamp t6.
3) User executes the query SELECT C FROM tbl WHERE key=5. Because the DELETE from Step 1 has a more recent timestamp (t10 > t6), no results are returned.
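The scenario above can be replayed in a few lines. This is a toy model under stated assumptions: one cell, integer clocks in the same t-units as the example, and a hypothetical `run_scenario` helper; it is not how Cassandra stores data.

```python
def run_scenario(clock_a: int, clock_b: int):
    """Replay the three steps and return what the final SELECT sees."""
    cell = None  # (value, timestamp); a value of None marks a tombstone

    def write(value, timestamp):
        nonlocal cell
        # Last write wins: keep the mutation with the highest timestamp.
        if cell is None or timestamp > cell[1]:
            cell = (value, timestamp)

    write(None, clock_a)     # Step 1: DELETE, timestamp assigned by node A
    clock_a += 1             # a second passes on both nodes
    clock_b += 1
    write("data", clock_b)   # Step 2: UPDATE, timestamp assigned by node B
    return cell[0]           # Step 3: SELECT returns the surviving value


print(run_scenario(clock_a=10, clock_b=5))   # None: the older DELETE still wins
print(run_scenario(clock_a=10, clock_b=10))  # data: with synced clocks the UPDATE wins
```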
Note that newer versions of the DataStax drivers default to Client Timestamps, which have your client application generate and assign timestamps to requests instead of relying on the C* nodes to assign them. The DataStax java-driver defaults to client timestamps as of 3.0 (read more about that in 'Client-side generation'). This is very nice if all requests come from the same client; however, if you have multiple applications writing to Cassandra, you now have to worry about keeping your client clocks in sync.

Frequent rpc_timeouts of the query SELECT count(*) FROM Keyspace1.Standard1 limit 5; in cassandra

I have a 5-node Cassandra cluster with 3 nodes in a private DC and the other 2 on AWS.
Select * requests time out even when limited to 5. I would understand timeouts for high limits, but timing out for single digits looks strange.
Has anyone observed this before?
NOTE: Queries with a WHERE clause are normal.

There are two or three options:
1) Your servers are too busy / slow to reply to the query.
2) You're hitting a tombstone exception, which sometimes doesn't get reported properly. Check the log on the cassandra server for the word 'tombstone' to be sure.
3) You're asking for too much data at once - less likely if it happens when you LIMIT 5.
I'm guessing it's #2. Look for tombstone warnings in your cassandra server logs. If that's the problem, you likely have a data model problem.

Are the nodes on two different networks (you said private DC and AWS)? If so, check the communication between the nodes.
What consistency level are you using when querying? Try a consistency of ONE and see the response, then check the communication between the nodes (with a higher consistency level, the coordinator always checks the data with other nodes before responding with results).
Does your select have a WHERE clause, or is it a simple select *? If the latter, then retrieving data from different nodes over slow inter-node communication might again be the issue.
