Cassandra cluster with total replication on each node

Hi, I'm new to Cassandra. I have a 2-node Cassandra cluster. For reasons imposed by the front end, I need:
Total replication of all data on each of the two nodes.
Eventually consistent writes, so the node being written to responds to the front end with an acknowledgement straight away, without waiting for replication to finish.
Can anyone tell me whether this is possible? Is it done in the YAML file? I know there are properties there for consistency, but I don't see that any of the Partitioners suit my needs. Where can I set the replication factor?
Thanks

You set the replication factor when you create the keyspace. A replication factor equal to the number of nodes (2 in your case) puts a full copy of the data on every node. If you use (and plan to keep using) a single data center set-up, create the keyspace in cqlsh like so:
CREATE KEYSPACE "Excalibur"
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 2};
Check out the documentation for CREATE KEYSPACE. How replicas are placed internally depends on the cluster's snitch and on the replication strategy defined per keyspace. SimpleStrategy, used above, simply treats the cluster as a ring: the first replica goes to the node that owns the partition's token, and further replicas go to the next nodes clockwise around the ring (see this).
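To make the clockwise placement concrete, here is a rough Java sketch. It assumes one token per node and no virtual nodes, racks or data centers, and the names SimpleRingPlacement and replicasFor are hypothetical, not part of Cassandra or the driver. Each partition token goes to the first node whose token is equal or greater (wrapping around the end of the ring), and extra replicas go to the following nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of SimpleStrategy placement: one token per
// node, no vnodes, no racks or data centers. Not part of any Cassandra API.
final class SimpleRingPlacement {
    // The ring: node tokens sorted in ascending ("clockwise") order.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    // First replica: the first node whose token is >= the partition token
    // (wrapping to the lowest token). Further replicas: the next nodes clockwise.
    List<String> replicasFor(long partitionToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Long current = ring.ceilingKey(partitionToken);
        if (current == null) {
            current = ring.firstKey(); // wrap around the end of the ring
        }
        // A node holds at most one copy, so at most ring.size() replicas exist.
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(current));
            Long next = ring.higherKey(current);
            current = (next != null) ? next : ring.firstKey();
        }
        return replicas;
    }

    public static void main(String[] args) {
        SimpleRingPlacement cluster = new SimpleRingPlacement();
        cluster.addNode(0L, "node1");
        cluster.addNode(100L, "node2");
        // With 2 nodes and replication_factor 2, every token maps to both nodes.
        for (long token : new long[] {-10L, 50L, 150L}) {
            System.out.println("token " + token + " -> " + cluster.replicasFor(token, 2));
        }
    }
}
```

With the two nodes from the question and a replication factor of 2, every token maps to both node1 and node2, which is the "total replication" being asked for; in this model, asking for more replicas than there are nodes still yields only two copies.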
Regarding consistency, your client/driver can use a different consistency level for each read and write operation:
Cassandra extends the concept of eventual consistency by offering tunable consistency: for any given read or write operation, the client application decides how consistent the requested data should be.
Read the doc
If you use Java in your clients with the DataStax Java driver, you can set the default consistency level using
QueryOptions.setConsistencyLevel(ConsistencyLevel consistencyLevel)
ONE is the default setting.
Hope that helps

Related

Default consistency level and quorum setting in Cassandra, and best practices for tuning them

I just started learning Cassandra and am wondering whether there is a default consistency level and quorum setting. It seems there are quite a few tunable parameters (like the replication factor and the quorum size) for balancing consistency against performance. Is there a best practice for these settings? What are the defaults?
Thank you very much.
The default READ and WRITE consistency level in Cassandra is ONE.
The consistency level can be specified for each query. In cqlsh, the CONSISTENCY command shows the current consistency level or sets a new one.
The replication factor is the number of copies of the data that the cluster keeps.
Choosing a consistency level depends on factors such as whether the workload is write-heavy or read-heavy, and how many node failures must be tolerated at a time.
Using LOCAL_QUORUM for both reads and writes gives you strong consistency within a data center.
quorum = (sum_of_replication_factors / 2) + 1, where the division is rounded down
For example, with a replication factor of 3, a quorum is 2 nodes ((3 / 2) + 1 = 2, since 3 / 2 rounds down to 1). The cluster can tolerate one replica being down.
Similar to QUORUM, the LOCAL_QUORUM level is calculated from the replication factor of the same datacenter as the coordinator node. Even if the cluster has more than one datacenter, the quorum is calculated using only the local replica nodes.
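The arithmetic can be checked with a small Java sketch. The names QuorumMath, quorum, localQuorum and toleratedFailures are hypothetical, and dc1/dc2 are placeholder data center names; the code only reproduces the formula, using integer division the way the example does.

```java
import java.util.Map;

// Hypothetical helper reproducing the quorum arithmetic;
// not a Cassandra or driver API.
final class QuorumMath {
    // QUORUM: (sum of replication factors over all DCs / 2) + 1, integer division.
    static int quorum(Map<String, Integer> rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc.values()) {
            total += rf;
        }
        return total / 2 + 1;
    }

    // LOCAL_QUORUM: the same formula, using only the coordinator's own DC.
    static int localQuorum(Map<String, Integer> rfPerDc, String localDc) {
        Integer rf = rfPerDc.get(localDc);
        if (rf == null) {
            throw new IllegalArgumentException("keyspace has no replicas in " + localDc);
        }
        return rf / 2 + 1;
    }

    // How many of `replicas` may be down while `required` acknowledgements are still possible.
    static int toleratedFailures(int replicas, int required) {
        return replicas - required;
    }

    public static void main(String[] args) {
        Map<String, Integer> oneDc = Map.of("dc1", 3);
        int q = quorum(oneDc);
        System.out.println("RF 3: QUORUM = " + q + ", tolerates "
                + toleratedFailures(3, q) + " replica(s) down");

        Map<String, Integer> twoDcs = Map.of("dc1", 3, "dc2", 3);
        System.out.println("dc1:3, dc2:3: QUORUM = " + quorum(twoDcs)
                + ", LOCAL_QUORUM in dc1 = " + localQuorum(twoDcs, "dc1"));
    }
}
```

For RF 3 this gives a quorum of 2 with one replica allowed down; with two data centers of RF 3 each, QUORUM needs 4 replicas while LOCAL_QUORUM in dc1 needs only 2.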
Consistency in Cassandra
The following links are excellent and should help you understand consistency levels and their configuration in Cassandra. The second link contains many diagrams.
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlClientRequestsReadExp.html

Cassandra multi-DC: write on LOCAL and read from any DC

We use a multi-data center (DC) Cassandra cluster. When writing to the cluster, I want only the LOCAL DC to perform the writes on its nodes, because we already route each write request to the desired DC based on where the write originates. So I want only the LOCAL DC to process the write and no other DC to perform it on its nodes. Later, though, I want the written data to be replicated across DCs through Cassandra's normal replication. Is this cross-DC replication possible when I restrict the write to one DC in the first place? If I do not open connections to REMOTE hosts in other DCs during the write, is data replication among DCs still possible later on? I definitely need replicas of the data in all DCs because, when reading from the cluster, we want the data to be readable from whichever DC the read request lands on, not necessarily the LOCAL one.
Does anyone have a solution to this?
Use LOCAL_QUORUM consistency for writes if you want them to be acknowledged only by the local DC.
Check the definition of the keyspace this applies to. It should use the NetworkTopologyStrategy class and set an RF for both DCs, something like this:
ALTER KEYSPACE <Keyspace_name> WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
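With this setup, the coordinator still sends each write to the replicas in every DC listed in the keyspace; LOCAL_QUORUM only means it waits for acknowledgements from a quorum of local replicas before responding. The other DC receives the write in the background, so the data is replicated there without the client connecting to any remote host.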
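The point that a LOCAL_QUORUM write still reaches the remote DC can be illustrated with a toy Java simulation. It is not how Cassandra is implemented (there are no hints, retries or failures), and every name in it (LocalQuorumWriteSketch, Replica, write, drain) is hypothetical. The simulated coordinator submits the write to all five replicas of the dc1:3, dc2:2 example at once, but only blocks until 2 dc1 replicas have applied it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, heavily simplified model of a coordinator handling a
// LOCAL_QUORUM write. Not Cassandra code: no hints, retries or failures.
final class LocalQuorumWriteSketch {
    static final class Replica {
        final String dc;
        final long delayMillis;                            // simulated network + disk latency
        final AtomicInteger applied = new AtomicInteger(); // times the write was applied
        Replica(String dc, long delayMillis) {
            this.dc = dc;
            this.delayMillis = delayMillis;
        }
    }

    // Sends the write to EVERY replica (local and remote) at once, but only
    // waits until `requiredLocalAcks` replicas in `localDc` have applied it.
    static boolean write(List<Replica> replicas, String localDc, int requiredLocalAcks,
                         ExecutorService pool, long timeoutMillis) {
        CountDownLatch localAcks = new CountDownLatch(requiredLocalAcks);
        for (Replica r : replicas) {
            pool.submit(() -> {
                try {
                    Thread.sleep(r.delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                r.applied.incrementAndGet();
                if (r.dc.equals(localDc)) {
                    localAcks.countDown();
                }
            });
        }
        try {
            return localAcks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Lets the background (remote) writes finish; returns false on timeout.
    static boolean drain(ExecutorService pool, long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) replicas.add(new Replica("dc1", 10));  // local, fast
        for (int i = 0; i < 2; i++) replicas.add(new Replica("dc2", 300)); // remote, slow
        ExecutorService pool = Executors.newFixedThreadPool(replicas.size());

        boolean ok = write(replicas, "dc1", 2, pool, 1_000); // LOCAL_QUORUM for RF 3 is 2
        long remoteApplied = replicas.stream().filter(r -> r.dc.equals("dc2"))
                .filter(r -> r.applied.get() > 0).count();
        System.out.println("acknowledged: " + ok + ", dc2 replicas already written: " + remoteApplied);

        drain(pool, 5_000);
        System.out.println("after draining, every replica written: "
                + replicas.stream().allMatch(r -> r.applied.get() == 1));
    }
}
```

Typically the call returns before the slower dc2 replicas have applied the write, and after draining the executor every replica holds it, which mirrors how the write reaches the second DC without the client talking to it.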
Use QUORUM consistency for reads that are not restricted to one DC, but be aware that this may add some latency, because Cassandra has to read data from the other data center as well.
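Mixing LOCAL_QUORUM writes with QUORUM reads does not by itself guarantee that a read sees the latest write. With the dc1:3, dc2:2 keyspace there are 5 replicas; a LOCAL_QUORUM write in dc1 is acknowledged by 2 of them and a QUORUM read waits for 3, and because 2 + 3 is not greater than 5, the read can land entirely on replicas that have not yet received the write. The Java sketch below counts replicas in the worst case only (no timing, hints or read repair); CrossDcOverlap and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical worst-case check, counting replicas only (no timing, hints or
// read repair): must a read touch at least one replica that acknowledged the write?
final class CrossDcOverlap {
    static int replicasIn(Map<String, Integer> rfPerDc, Set<String> dcs) {
        int n = 0;
        for (String dc : dcs) {
            n += rfPerDc.getOrDefault(dc, 0);
        }
        return n;
    }

    // writeAcks replicas acknowledged the write, all from writeDcs;
    // the read waits for readCount replicas, all from readDcs.
    static boolean guaranteedOverlap(Map<String, Integer> rfPerDc,
                                     int writeAcks, Set<String> writeDcs,
                                     int readCount, Set<String> readDcs) {
        int readPool = replicasIn(rfPerDc, readDcs);
        // Replicas the write may have used but the read can never contact.
        int outsideReadPool = 0;
        for (String dc : writeDcs) {
            if (!readDcs.contains(dc)) {
                outsideReadPool += rfPerDc.getOrDefault(dc, 0);
            }
        }
        // Worst case: as many acknowledgements as possible fall outside the read's pool.
        int ackedInsidePool = Math.max(0, writeAcks - outsideReadPool);
        // Unless this holds, the read can avoid every acknowledged replica.
        return readCount + ackedInsidePool > readPool;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = Map.of("dc1", 3, "dc2", 2); // the ALTER KEYSPACE example
        Set<String> dc1 = Set.of("dc1");
        Set<String> dc2 = Set.of("dc2");
        Set<String> all = Set.of("dc1", "dc2");

        // LOCAL_QUORUM write in dc1 (3/2+1 = 2 acks), QUORUM read (5/2+1 = 3).
        System.out.println("LOCAL_QUORUM write, QUORUM read: " + guaranteedOverlap(rf, 2, dc1, 3, all));
        // QUORUM write (3 acks from any DC), QUORUM read.
        System.out.println("QUORUM write, QUORUM read: " + guaranteedOverlap(rf, 3, all, 3, all));
        // LOCAL_QUORUM write and LOCAL_QUORUM read, both in dc1.
        System.out.println("LOCAL_QUORUM in dc1 both ways: " + guaranteedOverlap(rf, 2, dc1, 2, dc1));
        // LOCAL_QUORUM write in dc1, LOCAL_QUORUM read in dc2 (2/2+1 = 2).
        System.out.println("write dc1, read dc2: " + guaranteedOverlap(rf, 2, dc1, 2, dc2));
    }
}
```

Under this model, QUORUM writes with QUORUM reads, and LOCAL_QUORUM writes with LOCAL_QUORUM reads in the same DC, guarantee overlap; the other two combinations do not.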

Can Cassandra support multi-DC cluster with different number of nodes?

I want to be able to get a backup/replica of our operational data onto a single node so we can run some ad hoc queries.
Having just one machine hold this replica will work for now.
Is this possible? If not, what are the arguments against it?
Yes, you can have a different number of nodes in each data center. Set the replication factor according to your requirements.
For example, if you have DC1 with 4 nodes and are going to add DC2 with 1 node, the replication factor for your keyspace should be DC1=x, DC2=1 (where x <= 4).
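A quick way to sanity-check such a layout is sketched below in Java. The DcReplicationCheck class and its methods are hypothetical, x = 3 is an assumed value for the example, and the per-node share assumes tokens are evenly balanced within each DC. It flags any DC whose replication factor exceeds its node count and estimates how much of the keyspace each node stores.

```java
import java.util.Map;

// Hypothetical sanity check for NetworkTopologyStrategy-style settings;
// not a Cassandra API. Assumes tokens are evenly balanced inside each DC.
final class DcReplicationCheck {
    // Each DC's replication factor must not exceed its node count (x <= 4 for DC1).
    static boolean isValid(Map<String, Integer> rfPerDc, Map<String, Integer> nodesPerDc) {
        for (Map.Entry<String, Integer> e : rfPerDc.entrySet()) {
            int rf = e.getValue();
            int nodes = nodesPerDc.getOrDefault(e.getKey(), 0);
            if (rf < 0 || rf > nodes) {
                return false;
            }
        }
        return true;
    }

    // Approximate share of the keyspace each node in a DC stores: RF / nodes.
    static double shareOfDataPerNode(int rf, int nodes) {
        return Math.min(1.0, (double) rf / nodes);
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = Map.of("DC1", 4, "DC2", 1);
        Map<String, Integer> rf = Map.of("DC1", 3, "DC2", 1); // assumed x = 3
        System.out.println("valid: " + isValid(rf, nodes));
        System.out.println("DC1 node share: " + shareOfDataPerNode(3, 4));
        System.out.println("DC2 node share: " + shareOfDataPerNode(1, 1)); // single node, full copy
    }
}
```

With DC2=1 on a single node, that node's share is 1.0, meaning it holds a full copy of the keyspace, which is what the ad hoc query node in the question needs.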
To add another data center, you need to check the topology, snitch, and seed configurations.
For example, if you are using SimpleSnitch, you can't have multiple data centers, so you need to change your snitch and topology. Check this link, which explains more about changing the snitch and topology.

Cassandra Read/Write CONSISTENCY Level in NetworkTopologyStrategy

I have set up Cassandra in 2 data centers with 4 nodes each and a replication factor of 2.
The consistency level is ONE (the default).
I was facing consistency issues when reading data at consistency level ONE.
As I read in the DataStax documentation, the read consistency level plus the write consistency level should be greater than the replication factor.
So I changed the write consistency level to TWO and kept the read consistency level at ONE, which resolves the inconsistency problem within a single data center.
But with multiple data centers, the problem would be resolved by using LOCAL_QUORUM.
How can I make writes both LOCAL_QUORUM and TWO, so that each write goes to the local data center and to 2 nodes?
Just write using LOCAL_QUORUM in the datacenter you want. With a replication factor of 2 in each datacenter, a local quorum is (2 / 2) + 1 = 2 nodes, so a LOCAL_QUORUM write is already acknowledged by 2 nodes in the local datacenter. The data you write in the "local" datacenter will eventually be replicated to the "other" datacenter (but there is no guarantee of when).
LOCAL_QUORUM means: "after the write operation returns, the data has been written to a quorum of nodes in the local datacenter."
TWO means: "after the write operation returns, the data has been written to at least 2 nodes, in any datacenter."
If you want to read the data you have just written with LOCAL_QUORUM from the same datacenter, use LOCAL_ONE consistency. If you read with ONE, there is a chance that the closest replica is in the "remote" datacenter, where the data may not have been replicated yet.
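The difference between the two levels, and why LOCAL_QUORUM already covers TWO when each DC has a replication factor of 2, can be modelled in a few lines of Java. This is a hypothetical sketch (AckLevels and its methods are illustrative names, not driver APIs) that only looks at acknowledgement counts per data center.

```java
import java.util.Map;

// Hypothetical model (not driver code) of which levels a write satisfies,
// given how many replicas in each DC have acknowledged it.
final class AckLevels {
    // TWO: at least two replicas, in any data center.
    static boolean two(Map<String, Integer> acksPerDc) {
        int total = 0;
        for (int n : acksPerDc.values()) {
            total += n;
        }
        return total >= 2;
    }

    // LOCAL_QUORUM: (local RF / 2) + 1 replicas in the coordinator's data center.
    static boolean localQuorum(Map<String, Integer> acksPerDc, String localDc, int localRf) {
        return acksPerDc.getOrDefault(localDc, 0) >= localRf / 2 + 1;
    }

    // Within one DC, a read is guaranteed to see the write when W + R > local RF.
    static boolean readSeesWriteLocally(int localWriteAcks, int localReadReplicas, int localRf) {
        return localWriteAcks + localReadReplicas > localRf;
    }

    public static void main(String[] args) {
        int rfPerDc = 2; // replication factor 2 in each data center, as in the question
        Map<String, Integer> spread = Map.of("dc1", 1, "dc2", 1); // one ack per DC
        Map<String, Integer> local = Map.of("dc1", 2, "dc2", 0);  // both acks in dc1

        System.out.println("spread: TWO=" + two(spread)
                + ", LOCAL_QUORUM=" + localQuorum(spread, "dc1", rfPerDc));
        System.out.println("local:  TWO=" + two(local)
                + ", LOCAL_QUORUM=" + localQuorum(local, "dc1", rfPerDc));
        // LOCAL_QUORUM write (2 acks) followed by a LOCAL_ONE read (1 replica) in dc1.
        System.out.println("LOCAL_ONE read sees it: " + readSeesWriteLocally(2, 1, rfPerDc));
    }
}
```

One acknowledgement in each DC satisfies TWO but not LOCAL_QUORUM, while two acknowledgements in dc1 satisfy both; a LOCAL_ONE read after a LOCAL_QUORUM write in the same DC then overlaps it, since 2 + 1 > 2.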
This also depends on the load balancing strategy configured at the driver level. You can read more about this here: https://datastax.github.io/java-driver/manual/load_balancing/

Cassandra consistency issue

Our Cassandra cluster runs on AWS EC2 with 4 nodes in the ring. We faced a data inconsistency issue. When we changed the consistency level to TWO in the cqlsh shell, the inconsistency was resolved.
But we don't know how to set the consistency level on the Cassandra cluster.
The consistency level can be set per session or per statement. You will need to check the consistency levels of both writes and reads: to get strong consistency, R + W (read consistency + write consistency) must be greater than your replication factor.
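The R + W > RF rule can be tabulated with a short Java sketch for a single data center. The StrongConsistencyCheck class and its Level enum are hypothetical stand-ins for the real consistency levels, QUORUM is computed as (RF / 2) + 1, and levels that need more replicas than the RF are rejected rather than modelled.

```java
// Hypothetical single-DC check of the R + W > RF rule; not a driver API.
final class StrongConsistencyCheck {
    enum Level {
        ONE, TWO, QUORUM, ALL;

        // Replicas this level waits for, given the replication factor.
        int replicas(int rf) {
            switch (this) {
                case ONE:    return 1;
                case TWO:    return 2;
                case QUORUM: return rf / 2 + 1;
                default:     return rf; // ALL
            }
        }
    }

    // The read and write replica sets must share at least one node.
    static boolean isStrong(Level read, Level write, int rf) {
        if (read.replicas(rf) > rf || write.replicas(rf) > rf) {
            throw new IllegalArgumentException("level needs more replicas than RF " + rf);
        }
        return read.replicas(rf) + write.replicas(rf) > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        for (Level write : Level.values()) {
            for (Level read : Level.values()) {
                if (isStrong(read, write, rf)) {
                    System.out.println("RF " + rf + ": write " + write + " + read " + read);
                }
            }
        }
    }
}
```

For RF 3 the printed pairs include QUORUM writes with QUORUM reads and ONE with ALL, but not ONE with ONE; for RF 2, TWO writes with ONE reads pass because 2 + 1 > 2.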
If you are using the Java driver, you can set a default consistency level at the cluster level using the Cluster.Builder.withQueryOptions() method.
http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Cluster.Builder.html#withQueryOptions-com.datastax.driver.core.QueryOptions-
