Cassandra - Select without replication

Let's say I've created a keyspace and table:
CREATE KEYSPACE IF NOT EXISTS keyspace_rep_0
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 0};
CREATE TABLE IF NOT EXISTS some_table (
some_key ascii,
some_data ascii,
PRIMARY KEY (some_key)
);
I don't want any replicas of this data. I can insert into this table with consistency level ANY, but I can't select any data from it.
I got the following errors when querying with consistency levels ANY and ONE, respectively:
message="ANY ConsistencyLevel is only supported for writes"
message="Cannot achieve consistency level ONE"
info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 1}
I've tried other read consistency levels but none of them worked for me.
This is very similar to choosing 'replication_factor': 1 and shutting down a node. Again I couldn't select any data. All read consistency levels require at least one replica to be up. Is this how Cassandra works? You cannot select data without replication? What am I missing?

Every copy of the data, including the original, is a replica. Replication factor is not a count of additional copies, it is the total number of copies. You need RF >= 1.
I'm rather surprised that it allows RF == 0. With no replicas available, there's nothing to read. However, a comment on CASSANDRA-4486 indicates that this is intentionally allowed, but for special purposes:
. . . the point is that it's legitimate to set up a zero-replication keyspace (this is common when adding a new datacenter) and change it later. In the meantime, it's correct to reject writes to it.
And the write probably does not result in an error because of hinted handoff, as mentioned in the consistency level descriptions, for ANY:
A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
So, if you want confirmation that your write was persisted to at least one node and not rely on the hinted handoff (which can expire), then write with consistency level ONE and not ANY.
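If the goal is simply one copy of the data with no extra redundancy, the fix is to raise the replication factor to at least 1 and then repair, as is standard after any RF change. A minimal sketch, reusing the keyspace name from the question:
ALTER KEYSPACE keyspace_rep_0
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
-- then run `nodetool repair keyspace_rep_0` on each node
-- reads at ONE will now succeed as long as the single replica for a partition is up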

Related

Replication without partitioning in Cassandra

In Mongo we can go for any of the below models:
1. Simple replication (no sharding; one node works as the master and the others as slaves), or
2. Sharding (data is distributed across shards based on a partition key), or
3. Both 1 and 2.
My question: can't we have Cassandra with just replication and no partitioning, like model 1 in Mongo?
From Cassandra vs MongoDB in respect of Secondary Index?
In case of Cassandra, the data is distributed into multiple nodes based on the partition key.
From the above it looks like it is mandatory to distribute the data based on some partition key when we have more than one node?
In Cassandra, the replication factor defines how many copies of the data you have. The partition key is responsible for distributing data between nodes, but this distribution also depends on the number of nodes you have. For example, if you have a 3-node cluster and a replication factor of 3, then every node will get all the data anyway...
Basically your intuition is right: the data is always distributed based on the partition key. The partition key is the first part of the primary key (historically also called the row key), so you have one anyway. The first case of your Mongo example is not doable in Cassandra, mainly because Cassandra does not have the concept of masters and slaves. If you have a 2-node cluster and a replication factor of 2, the data will be held on both nodes, as Alex Ott already pointed out. When you query (read or write), your client decides which node to connect to and performs the operation there. To my knowledge, the default is round-robin load balancing between the two nodes, so each of them receives roughly the same load. If you have 3 nodes and a replication factor of 2, it becomes a little more tricky. The nice part, though, is that the client can determine the set of nodes that hold your data, so you don't lose any performance by connecting to a "wrong" node.
One more thing about partitioning: you can configure some of this, but it would be per server, not per table. I've never used it, and personally I wouldn't recommend doing so; just stick to Cassandra's default mechanism.
And one word about the secondary index question: use materialized views instead.
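The closest Cassandra gets to "replication without partitioning" is setting the replication factor equal to the number of nodes, so every node ends up holding a full copy even though the data is still partitioned internally. A minimal sketch, with the keyspace and datacenter names as assumptions:
CREATE KEYSPACE full_copy_ks
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};
-- on a 2-node cluster in dc1, both nodes hold every partition
Reads and writes are still routed by partition key, but any single node can serve all of the data.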

Cassandra tunable consistency exercise

I need some help solving an exercise for school. It's about tunable consistency in Cassandra.
Given a cluster of 15 nodes, complete the following table. In case of multiple possibilities, give all of them. The CL values are: ANY, ONE, QUORUM, ALL.
Thank you very much for your help!
P.S. I'm sure we need the following rule to solve this: nodes read + nodes written > replication factor for consistency.
This document here should outline the consistency levels and how they function:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
I've copied some of the content here in case the link breaks in the future.
Write Consistency Levels
ALL
A write must be written to the commit log and memtable on all replica
nodes in the cluster for that partition.
EACH_QUORUM
Strong consistency. A write must be written to the commit log and
memtable on a quorum of replica nodes in each datacenter.
QUORUM
A write must be written to the commit log and memtable on a quorum of
replica nodes across all datacenters.
LOCAL_QUORUM
Strong consistency. A write must be written to the commit log and
memtable on a quorum of replica nodes in the same datacenter as the
coordinator. Avoids latency of inter-datacenter communication.
ONE
A write must be written to the commit log and memtable of at least one
replica node.
TWO
A write must be written to the commit log and memtable of at least two
replica nodes.
THREE
A write must be written to the commit log and memtable of at least
three replica nodes.
LOCAL_ONE
A write must be sent to, and successfully acknowledged by, at least
one replica node in the local datacenter.
ANY
A write must be written to at least one node. If all replica nodes for
the given partition key are down, the write can still succeed after a
hinted handoff has been written. If all replica nodes are down at
write time, an ANY write is not readable until the replica nodes for
that partition have recovered.
Read consistency levels
ALL
Returns the record after all replicas have responded. The read
operation will fail if a replica does not respond.
EACH_QUORUM
Not supported for reads.
QUORUM
Returns the record after a quorum of replicas from all datacenters has
responded.
LOCAL_QUORUM
Returns the record after a quorum of replicas in the same datacenter
as the coordinator has reported. Avoids latency of inter-datacenter
communication.
ONE
Returns a response from the closest replica, as determined by the
snitch. By default, a read repair runs in the background to make the
other replicas consistent.
TWO
Returns the most recent data from two of the closest replicas.
THREE
Returns the most recent data from three of the closest replicas.
LOCAL_ONE
Returns a response from the closest replica in the local datacenter.
SERIAL
Allows reading the current (and possibly uncommitted) state of data
without proposing a new addition or update. If a SERIAL read finds an
uncommitted transaction in progress, it will commit the transaction as
part of the read. Similar to QUORUM.
LOCAL_SERIAL
Same as SERIAL, but confined to the datacenter.
I think this should be the correct answer. Please correct me if I'm wrong. Ignore the Dutch sentences in the table; I don't think they will pose any problems for English readers.
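As a quick worked example of the rule from the question, assume a replication factor of 3. Writing at QUORUM (2 replicas) and reading at QUORUM (2 replicas) gives 2 + 2 = 4 > 3, so every read overlaps at least one replica that saw the latest write: strong consistency. Writing at ONE and reading at ONE gives 1 + 1 = 2, which is not greater than 3, so a read may hit a replica that has not yet received the write and return stale data. ANY counts only for writes (a hinted handoff is enough), so it never contributes to the read side of the inequality.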

Consistency and timeout issues with Cassandra 2.2

I'm using Cassandra 2.2 and I've an application that requires a high level of consistency.
I've configured one datacenter cluster with 3 nodes.
My keyspace is created with replication_factor of 2.
In each node's configuration (.yaml) file I've set 2 seeds under seed_provider (for example NODE_1 and NODE_3).
The important thing is that my app should be full-functional even if one node is down.
Currently I've some issues with the consistency and timeout when my app contacts the cluster.
I've read the whole Cassandra 2.2 documentation and I concluded that the best CONSISTENCY LEVEL for my write operations should be QUORUM and for my read operations ONE, but I still have some consistency issues.
First of all, is it the right choice to have a strong level of consistency?
Also, are UPDATE and DELETE operations considered write or read operations, since, for example, an update with a WHERE clause still has to 'read' data? I'm not sure, especially in the context of Cassandra's write workflow.
My second issue is timeouts during write operations. A simple and lightweight INSERT sometimes gets "Cassandra timeout during write query at consistency QUORUM (2 replicas were required but only 1 acknowledged the write)",
or sometimes even "... 0 acknowledged", even though all of my 3 nodes are UP.
Are there other parameters I should check, for example write_request_timeout_in_ms, whose default value of 2000 ms is already quite high?
You will have strong consistency with Replication Factor = 2 and Consistency Level = QUORUM for write operations and ONE for read operations. But write operations will fail if one node is down. Consistency Level = QUORUM is the same as ALL in case Replication Factor = 2.
You should use Replication Factor = 3 and Consistency Level = QUORUM for both write and read operations, to have strong consistency and full functional app even if one node is down.
DELETE and UPDATE operations are write operations.
For the timeout issue, please provide the table model and the queries that fail.
Updated
Consistency level applies to replicas, not nodes.
Replication factor = 2 means that 2 of the 3 nodes will contain a given piece of data. These nodes are its replicas.
QUORUM means that a write operation must be acknowledged by 2 replicas (when replication factor=2), not nodes.
Cassandra places data on nodes according to the partition key. Each node is responsible for a range of partition keys. Not every node can store a given piece of data, so you need alive replicas (not just alive nodes) to perform operations. Here is an article about data replication and distribution.
When you perform QUORUM write request to cluster with 2 of 3 alive nodes, there is a chance that the cluster has only 1 alive replica for the partition key, in this case the write request will fail.
In addition: here is a simple calculator for Cassandra parameters.
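A minimal sketch of the recommended setup, with my_ks and my_table as placeholder keyspace and table names:
ALTER KEYSPACE my_ks
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- run `nodetool repair my_ks` on each node so existing data reaches all replicas
-- then issue both reads and writes at QUORUM, e.g. in cqlsh:
CONSISTENCY QUORUM;
SELECT * FROM my_ks.my_table WHERE id = 'k';
With RF = 3, QUORUM needs 2 of 3 replicas, so reads and writes keep working with one node down, and 2 + 2 > 3 gives strong consistency.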

Not enough replica available for query at consistency ONE (1 required but only 0 alive)

I have a Cassandra cluster with three nodes, two of which are up. They are all in the same DC. When my Java application goes to write to the cluster, I get an error in my application that seems to be caused by some problem with Cassandra:
Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
at com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:79)
The part that doesn't make sense is that "1 required but only 0 alive" statement. There are two nodes up, which means that one should be "alive" for replication.
Or am I misunderstanding the error message?
Thanks.
You are likely getting this error because the keyspace that the table you are querying belongs to has a replication factor of one. Is that correct?
If the partition you are reading / updating does not have enough available replicas (nodes with that data) to meet the consistency level, you will get this error.
If you want to be able to handle more than one node being unavailable, you could look into altering your keyspace to set a higher replication factor, preferably three in this case, and then running nodetool repair on each node to get all of your data onto all nodes. With this change, you would be able to survive the loss of 2 nodes while reading at a consistency level of ONE.
This cassandra parameters calculator is a good reference for understanding the considerations of node count, replication factor, and consistency levels.
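To see exactly which nodes hold the replicas for the partition you are writing, nodetool can help; a sketch, with my_ks, my_table, and the key value as placeholders:
nodetool getendpoints my_ks my_table some_key_value
nodetool status my_ks
If getendpoints only lists nodes that are currently down, that explains "1 required but only 0 alive" even though other nodes in the cluster are up.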
I hit this today because the datacenter field is case sensitive. If your dc is 'somedc01' this isn't going to work:
replication =
{
'class': 'NetworkTopologyStrategy',
'SOMEDC01': '3' # <-- BOOM!
}
AND durable_writes = true;
Anyway, it's not that intuitive, hope this helps.
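For reference, the working form of that definition, with the DC name matching exactly (keyspace name is a placeholder):
CREATE KEYSPACE some_ks
WITH replication =
{
'class': 'NetworkTopologyStrategy',
'somedc01': '3'
}
AND durable_writes = true;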
In my case I got a message saying 0 were available, but Cassandra was up and cqlsh worked correctly; the problem was when accessing from Java. The query was for a complete table, and some records were not accessible (all nodes containing them were down). From cqlsh, select * from table works but only shows the accessible records. So the solution is to recover the down nodes and, if necessary, change the replication factor with:
ALTER KEYSPACE ....
nodetool repair -all
then nodetool status to see changes and cluster structure
For me it was that my endpoint_snitch was still set to SimpleSnitch instead of something like GossipingPropertyFileSnitch. This was preventing the multi-DC cluster from connecting properly and manifested as the error above.
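For reference, a sketch of that change on each node (requires a rolling restart; the dc/rack values are assumptions):
# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch
# cassandra-rackdc.properties
dc=somedc01
rack=rack1
The dc value must match the datacenter name used in the keyspace's NetworkTopologyStrategy options, including case, as the previous answer points out.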

How to set consistency level of column family using either cassandra-cli OR Hector

I cannot seem to find a way to do this. There is nothing in the documentation (except for CQL or Thrift examples). Has anyone done this before? I want to set the consistency for reads and writes. Or even if you could tell me how to pass the consistency level for each individual read/write, that would be great.
There is a way to set the consistency level when the keyspace is created. But what about the case where the keyspace already exists and you want to update the consistency level for it? If it can't be done, why is that, since it should normally be available?
Zanson is correct - Consistency Level is determined on a query by query basis. So each Read and Write query you execute will have its own consistency level specified. I believe by default this is set to QUORUM. QUORUM is calculated based on the Keyspace's Replication Factor.
-> Replication Factor is defined at the keyspace level.
-> Read / Write Consistency Level is defined at the point of querying.
This issue is described here: https://issues.apache.org/jira/browse/CASSANDRA-4734
They removed the ability to set the consistency level (CL) per request (i.e. in CQL) and decided to set it at the column family (CF) level instead.
Unfortunately :(
But if you are using the PHP YACassandraPDO driver, you can do it like this: https://github.com/Orange-OpenSource/YACassandraPDO/issues/13
You want to update the replication factor. Consistency level refers to a single query. To update the replication factor from the cassandra-cli you want to use the 'update keyspace' command and change the strategy options.
UPDATE KEYSPACE demo WITH strategy_options = {replication_factor:1}
You are mixing up consistency level (refers to a single query) and replication factor (set when the keyspace is created).
To change the replication factor for an existing keyspace, use the 'alter keyspace' command:
ALTER KEYSPACE demo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
Sources:
Updating the replication factor - CQL for Cassandra 1.2
Updating the replication factor - CQL for Cassandra 2.0 & 2.1
