Cassandra: Not enough replica available - Java driver behaviour different from CQL console

I have a very simple cluster with 2 nodes.
I have created a keyspace with SimpleStrategy replication and a replication factor of 2.
For reads and writes I always use the default data consistency level of ONE.
If I take down one of the two nodes, then using the DataStax Java driver I can still read data, but when I try to write I get "Not enough replica available for query at consistency ONE (1 required but only 0 alive)".
Strangely, if I execute the exact same insert statement from the CQL console it works without any problem, even though the CQL console was also using a consistency level of ONE.
Am I missing something?
TIA
Update
I have done some more tests and the problem appears only when I use a BatchStatement. If I execute the prepared statement directly it works. Any ideas?
Here the code
Cluster cluster = Cluster.builder()
        .addContactPoint("192.168.1.10")
        .addContactPoint("192.168.1.12")
        .build();
Session session = cluster.connect();
session.execute("use giotest");

BatchStatement batch = new BatchStatement();
PreparedStatement statement = session.prepare(
        "INSERT INTO hourly(series_id, timestamp, value) VALUES (?, ?, ?)");
for (int i = 0; i < 50; i++) {
    batch.add(statement.bind(new Long(i), new Date(), 2345.5));
}
session.execute(batch);
batch.clear();

session.close();
cluster.close();

Batches are atomic by default: if the coordinator fails mid-batch, Cassandra will make sure other nodes replay the remaining requests. It uses a distributed batch log for that (see this post for more details).
This batch log must be replicated to at least one replica other than the coordinator, otherwise that would defeat the above mechanism.
In your case, there is no other replica, only the coordinator. So Cassandra is telling you that it cannot provide the guarantees of an atomic batch. See also the discussion on CASSANDRA-7870.
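If atomicity isn't required, one possible workaround (my suggestion, not part of the answer above) is an unlogged batch, which skips the batch log entirely. A minimal sketch against the 3.x Java driver, reusing the prepared statement from the question:
// Assumption: atomicity is not needed. An UNLOGGED batch bypasses the
// distributed batch log, so it still works when only the coordinator
// replica is alive, but a mid-batch failure can leave partial writes.
BatchStatement unlogged = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (int i = 0; i < 50; i++) {
    unlogged.add(statement.bind(new Long(i), new Date(), 2345.5));
}
session.execute(unlogged);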

If you haven't already, make sure you have specified both hosts at the driver level.

Related

will cassandra fail on two parallel create keyspace commands executed simultaneously

We have experienced that if we roll out DDL CQL scripts that alter an existing table in parallel, there is a substantial chance of corrupting the keyspace to the point that we needed to recreate it.
We have now serialized this process, including the creation of the keyspace. Now there is a heated discussion about whether Cassandra explicitly supports the creation of different keyspaces in parallel.
I suppose that this is OK, but since the cluster is large we would like a second opinion, so I am asking here:
Can we safely assume that parallel creation of different keyspaces is safe in Cassandra?
In current versions of Cassandra this is not safe: you need to wait for schema agreement after each DDL statement, including the creation of other keyspaces. Drivers usually wait for some time (10 seconds by default) for confirmation that all nodes in the cluster have the same schema version. Depending on the driver, you can explicitly check for schema agreement, either in the result set returned after executing the statement or via the cluster metadata. For example, in Java it could look like this:
Metadata metadata = cluster.getMetadata();
for (int i = 0; i < commands.length; i++) {
    System.out.println("Executing '" + commands[i] + "'");
    ResultSet rs = session.execute(commands[i]);
    if (!rs.getExecutionInfo().isSchemaInAgreement()) {
        while (!metadata.checkSchemaAgreement()) {
            System.out.println("Schema isn't in agreement, sleep 1 second...");
            Thread.sleep(1000);
        }
    }
}
Newer versions of Cassandra have improvements in this area, for example via CASSANDRA-13426 (committed into 4.0) and CASSANDRA-10699 (not yet done).

Cassandra doesn't guarantee a particular order in which the statements are executed

Cassandra doesn't guarantee a particular order in which statements are executed.
Statements like the ones below do not execute in the order they are issued.
INSERT INTO channel
JSON '{"cuid":"NQAA0WAL6drA"
,"owner":"123"
,"status":"open"
,"post_count":0
,"mem_count":1
,"link":"FWsA609l2Og1AADRYODkzNjE2MTIyOTE="
,"create_at":"1543328307953"}';
BEGIN BATCH
UPDATE channel
SET title = ? , description = ? WHERE cuid = ? ;
INSERT INTO channel_subscriber
JSON '{"cuid":"NQAA0WAL6drA"
,"user_id":"123"
,"status":"subscribed"
,"priority":"owner"
,"mute":false
,"setting":{"create_at":"1543328307956"}}';
APPLY BATCH;
According to system_traces.sessions, each of them is received by a different node.
Sometimes the started_at times of the two queries are equal (in milliseconds), and sometimes the started_at time of the second query is less than that of the first.
So this ruins the order of the statements and the data.
We use Erlang with the marina driver, the consistency level is QUORUM, and the clocks of all Cassandra nodes and the application server are in sync.
How can I force Cassandra to execute queries in order?
Because of Cassandra's distributed nature, queries can be received by different nodes, and depending on the load on a particular node, a query sent later may be executed earlier. In your case, you can put the first insert into the batch itself, as sketched below. Or, as implemented in some drivers (for example, the Java driver), use a whitelist policy to send all queries to a single node, though that node will become a bottleneck (and I am really not sure that your driver has such functionality).
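For illustration only (sketched with the Java driver for concreteness, since I'm not sure what marina exposes; the JSON payloads are abbreviated from the question), putting the first insert into the same batch could look like this:
// Sketch: all three mutations go into one logged batch, so they are
// applied atomically with a single write timestamp and cannot be
// reordered against each other. JSON payloads abbreviated.
BatchStatement batch = new BatchStatement(); // LOGGED by default
batch.add(new SimpleStatement(
        "INSERT INTO channel JSON '{\"cuid\":\"NQAA0WAL6drA\",\"owner\":\"123\",\"status\":\"open\"}'"));
batch.add(new SimpleStatement(
        "UPDATE channel SET title = ?, description = ? WHERE cuid = ?",
        "my title", "my description", "NQAA0WAL6drA"));
batch.add(new SimpleStatement(
        "INSERT INTO channel_subscriber JSON '{\"cuid\":\"NQAA0WAL6drA\",\"user_id\":\"123\",\"status\":\"subscribed\"}'"));
session.execute(batch);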

Apache Cassandra Reading explanation

I currently manage a Percona XtraDB Cluster composed of 5 nodes that handles millions of inserts every day. Write performance is very good but reads are not so fast, especially when I request a big dataset.
The records inserted are sensor time series.
I would like to try Apache Cassandra to replace the Percona cluster, but I don't understand how reading data works. I am looking for something able to split a query across all the nodes and read in parallel from more than one node.
I know that Cassandra sharding can have shard replicas.
If I have 5 nodes and I set a replication factor of 5, will reads be 5x faster?
Cassandra read path
The read request initiated by a client is sent to a coordinator node, which asks the partitioner which replicas are responsible for the data and checks whether the consistency level can be met.
The coordinator then checks whether it is itself responsible for the data. If yes, it satisfies the request; if not, it sends the request to the fastest-answering replica (determined using the dynamic snitch). A digest request is also sent to the other replicas.
The coordinator compares the returned digests, and if they all match and the consistency level has been met, the data from the fastest-answering replica is returned. If the digests differ, the coordinator issues read repair operations.
On each node a few steps are performed: check the row cache, check the memtables, check the SSTables. More information: How is data read? and ReadPathForUsers.
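If you want to observe this path for a real query, the driver can expose the server-side trace. A small sketch (Java driver 3.x; the table name and key are placeholders):
// Enable tracing on one statement and print the server-side events
// (coordinator dispatch, memtable/SSTable reads, digest requests, ...).
Statement stmt = new SimpleStatement(
        "SELECT * FROM mykeyspace.mytable WHERE id = 1").enableTracing();
ResultSet rs = session.execute(stmt);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
for (QueryTrace.Event event : trace.getEvents()) {
    System.out.println(event.getDescription() + " on " + event.getSource());
}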
Load balancing queries
Since your replication factor equals the number of nodes, each node holds all of your data, so a coordinator node that receives a read query can satisfy it from itself (in particular, if you use the LOCAL_ONE consistency level, the request will be pretty fast).
The client drivers implement the load balancing policies, which means that on the client you can configure how queries are spread around the cluster. Some more reading: ClientRequestsRead.
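For example (a sketch for the 3.x Java driver; the contact point is a placeholder), a policy that routes each query directly to a replica owning the data could be configured like this:
// Token-aware routing wrapped around datacenter-aware round-robin:
// each query is sent straight to a replica that owns the partition.
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1") // placeholder contact point
        .withLoadBalancingPolicy(new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().build()))
        .build();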
If I have 5 nodes and I set a replication factor of 5, will reads be 5x faster?
No. It means you will have up to 5 copies of the data to ensure that your query can be satisfied when nodes are down. Cassandra does not divide up the work for a read; instead it pushes you to design your data in a way that makes reads efficient and fast.
The best way to read from Cassandra is to make sure that every query you generate hits a single Cassandra partition, which means providing the first column of a simple PRIMARY KEY (x, y, z), or the first bracket of a compound PRIMARY KEY ((x, y), z), as query parameters.
This goes back to the Cassandra table design principle of designing tables around your query needs; see the sketch below.
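As a sketch (hypothetical table and keyspace names for the sensor time series mentioned in the question):
// Hypothetical schema: the compound partition key (sensor_id, day)
// keeps partitions bounded and lets every read hit a single partition.
session.execute("CREATE TABLE IF NOT EXISTS sensors.readings ("
        + " sensor_id bigint, day date, ts timestamp, value double,"
        + " PRIMARY KEY ((sensor_id, day), ts))");
// Single-partition read: both partition key columns are provided.
ResultSet rs = session.execute(
        "SELECT ts, value FROM sensors.readings"
        + " WHERE sensor_id = 42 AND day = '2019-01-15'");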
Replication is about copies of data; partitioning is about distributing data.
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archPartitionerAbout.html
Some references on Cassandra data modelling:
https://www.datastax.com/dev/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key
https://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
It is recommended to keep partitions to roughly 100 MB, but this is not compulsory.
You can use the cassandra-stress utility to get a report of how your reads and writes perform.

What is the solution of multi table ACID transaction in cassandra

I was following this link to use batch transactions without using the BATCH keyword.
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .build();
Session session = cluster.newSession();
// Save off the prepared statement you're going to use
PreparedStatement statement = session.prepare(
        "INSERT INTO tester.users (userID, firstName, lastName) VALUES (?,?,?)");
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
for (int i = 0; i < 1000; i++) {
    // Please bind with whatever actually useful data you're importing
    BoundStatement bind = statement.bind(i, "John", "Tester");
    ResultSetFuture resultSetFuture = session.executeAsync(bind);
    futures.add(resultSetFuture);
}
// Not returning anything useful, but makes sure everything has completed
// before you exit the thread
for (ResultSetFuture future : futures) {
    future.getUninterruptibly();
}
cluster.close();
My question is: with the given approach, is it possible to INSERT, UPDATE, or DELETE data in different tables such that if any of those fails, all of them fail, while maintaining the same performance (as described in the link)?
What I tried with this approach: I was inserting into and deleting from different tables, one query failed, but all the previous queries had already executed and updated the database.
With BATCH I can see that if any statement fails, all statements fail. But using BATCH across different tables is an anti-pattern, so what is the solution?
With BATCH I can see that if any statement fails, all statements fail.
Wrong; the guarantee of a LOGGED BATCH is that if some statements in the batch fail, they will be retried until they succeed.
But using BATCH across different tables is an anti-pattern, so what is the solution?
ACID transactions are not possible with Cassandra; they would require some sort of global lock or global coordination and would be prohibitively expensive performance-wise.
However, if you don't care about the performance cost, you can implement a global lock/lease system yourself using lightweight transaction (LWT) primitives, as described here.
But be ready to face poor performance.
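A minimal sketch of such a lock with LWTs (the locks table, names, and release logic are my assumptions, not the design from the link):
// Assumed schema: CREATE TABLE locks (name text PRIMARY KEY, owner text);
// Acquire the lease: succeeds only if nobody holds it (a Paxos round).
ResultSet rs = session.execute(
        "INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS",
        "users-job", "worker-1");
if (rs.wasApplied()) {
    try {
        // ... perform the multi-table writes while holding the lease ...
    } finally {
        // Release: conditional delete so only the owner can release it.
        session.execute("DELETE FROM locks WHERE name = ? IF owner = ?",
                "users-job", "worker-1");
    }
}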

Is it possible to read data only from a single node in a Cassandra cluster with a replication factor of 3?

I know that Cassandra has different read consistency levels, but I haven't seen a consistency level which allows reading data by key from only one node. I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read. Even if we choose a consistency level of ONE, we still ask all nodes but wait for the first response from any node. That way a read loads not just one node but 3 (4 with the coordinator node). I think we can't really improve read performance even if we set a bigger replication factor.
Is it possible to read really only from a single node?
Are you using a Token-Aware Load Balancing Policy?
If you are, and you are querying with a consistency of LOCAL_ONE/ONE, a read query should only contact a single node.
Give the article Ideology and Testing of a Resilient Driver a read. In it, you'll notice that using the TokenAwarePolicy has this effect:
"For cases with a single datacenter, the TokenAwarePolicy chooses the primary replica to be the chosen coordinator in hopes of cutting down latency by avoiding the typical coordinator-replica hop."
So here's what happens. Let's say that I have a table for keeping track of Kerbalnauts, and I want to get all data for "Bill." I would use a query like this:
SELECT * FROM kerbalnauts WHERE name='Bill';
The driver hashes my partition key value (name) to the token of 4639906948852899531 (SELECT token(name) FROM kerbalnauts WHERE name='Bill'; returns that value). If I am working with a 6-node cluster, then my primary token ranges will look like this:
node    start range                 end range
1)       9223372036854775808        -9223372036854775808
2)      -9223372036854775807        -5534023222112865485
3)      -5534023222112865484        -1844674407370955162
4)      -1844674407370955161         1844674407370955161
5)       1844674407370955162         5534023222112865484
6)       5534023222112865485         9223372036854775807
As node 5 is responsible for the token range containing the partition key "Bill," my query will be sent to node 5. As I am reading at a consistency of LOCAL_ONE, there will be no need for another node to be contacted, and the result will be returned to the client...having only hit a single node.
Note: Token ranges computed with:
python -c'print [str(((2**64 /5) * i) - 2**63) for i in range(6)]'
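A small sketch of the client side (Java driver; assumes the Cluster was built with a TokenAwarePolicy, as the article describes):
// With token-aware routing and LOCAL_ONE, this read is sent straight to
// a replica owning the 'Bill' partition; no other node is contacted.
Statement read = new SimpleStatement(
        "SELECT * FROM kerbalnauts WHERE name='Bill'")
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
Row bill = session.execute(read).one();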
I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read
Wrong: with consistency level ONE, the coordinator picks the fastest node (the one with the lowest latency) and asks it for the data.
How does it know which replica is the fastest? By keeping internal latency stats for each node.
With consistency level >= QUORUM, the coordinator asks the fastest node for the data and also asks the other replicas for digests.
From the client side, if you choose an appropriate load balancing strategy (e.g. TokenAwarePolicy), the client will always contact the primary replica when using consistency level ONE.
