Cassandra DB: What ultimately does 'replication_factor' control?

I want to verify and test 'replication_factor' and the consistency level ONE in Cassandra.
I set up a cluster 'MyCluster01' with three nodes across two data centers: DC1 (node1, node2) in RAC1 and DC2 (node3) in RAC2.
The structure is shown below:
[root@localhost ~]# nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.62 409.11 KB 256 ? 59bf9a73-45cc-4f9b-a14a-a27de7b19246 RAC1
UN 10.0.0.61 408.93 KB 256 ? b0cdac31-ca73-452a-9cee-4ed9d9a20622 RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.63 336.34 KB 256 ? 70537e0a-edff-4f48-b5db-44f623ec6066 RAC2
Then I created a keyspace and a table as follows:
CREATE KEYSPACE my_check1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
create table replica_test(id uuid PRIMARY KEY);
Then I inserted one record into the table and queried it:
insert into replica_test(id) values (uuid());
select * from replica_test;
id
--------------------------------------
5e6050f1-8075-4bc9-a072-5ef24d5391e5
I got that record.
But when I stopped node1 and queried again from either node2 or node3, neither query succeeded.
select * from replica_test;
Traceback (most recent call last):
  File "/usr/bin/cqlsh", line 997, in perform_simple_statement
    rows = self.session.execute(statement, trace=self.tracing_enabled)
  File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.1.3.post.zip/cassandra-driver-2.1.3.post/cassandra/cluster.py", line 1337, in execute
    result = future.result(timeout)
  File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.1.3.post.zip/cassandra-driver-2.1.3.post/cassandra/cluster.py", line 2861, in result
    raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
While the 'nodetool status' command returned:
UN 10.0.0.62 409.11 KB 256 ? 59bf9a73-45cc-4f9b-a14a-a27de7b19246 RAC1
DN 10.0.0.61 408.93 KB 256 ? b0cdac31-ca73-452a-9cee-4ed9d9a20622 RAC1
UN 10.0.0.63 336.34 KB 256 ? 70537e0a-edff-4f48-b5db-44f623ec6066 RAC2
The same error also occurred when I stopped node2 (keeping node1 and node3 alive) or stopped node3 (keeping node1 and node2 alive).
So what's the problem, given that I think I've already satisfied the consistency level? And where exactly does this record exist?

What ultimately does 'replication_factor' control?
To directly answer the question: the replication factor (RF) controls the number of replicas of each data partition that exist in a cluster or data center (DC). In your case, you have 3 nodes and an RF of 1. That means that when a row is written to your cluster, it is stored on only 1 node. It also means that your cluster cannot withstand the failure of even a single node without losing access to some of its data.
In contrast, consider an RF of 3 on a 3-node cluster. Such a cluster could withstand the failure of 1 or 2 nodes and still be able to serve queries for all of its data.
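For illustration, this is how that RF would be declared (a minimal sketch reusing your keyspace name; SimpleStrategy is kept here only to show the syntax, and the multi-DC caveat below still applies):
CREATE KEYSPACE my_check1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};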
With all of your nodes up and running, try this command:
nodetool getendpoints my_check1 replica_test 5e6050f1-8075-4bc9-a072-5ef24d5391e5
That will tell you on which node the data for the key 5e6050f1-8075-4bc9-a072-5ef24d5391e5 resides. My first thought is that you dropped the only node which has this key and then tried to query it.
My second thought echoes what Carlo said in his answer. You are using 2 DCs, which is really not supported with SimpleStrategy; using SimpleStrategy with multiple DCs can produce unpredictable results. With multiple DCs you also need to be using NetworkTopologyStrategy and something other than the default SimpleSnitch. Otherwise Cassandra may fail to find the proper node to complete an operation.
First of all, re-create your keyspace and table with NetworkTopologyStrategy. Then change your snitch (in cassandra.yaml) to a network-aware snitch, restart your nodes, and try this exercise again.
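A sketch of that re-creation, assuming the DC names DC1 and DC2 from your nodetool output and an (assumed) choice of 2 replicas in DC1 and 1 in DC2:
DROP KEYSPACE my_check1;
CREATE KEYSPACE my_check1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 1};
CREATE TABLE my_check1.replica_test (id uuid PRIMARY KEY);
-- and in cassandra.yaml on each node, for example:
-- endpoint_snitch: GossipingPropertyFileSnitch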

NetworkTopologyStrategy should be used when replicating across multiple DCs.

Related

Elassandra replication information and rack configuration

I recently started working with an Elassandra cluster with two data centers which have been configured using NetworkTopologyStrategy.
Cluster details: Elassandra 6.2.3.15 = Elasticsearch 6.2.3 + Cassandra 3.11.4
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip1> 50 GiB 256 ? 6cab1f4c-8937-437d-b010-0a5677443dc3 rack1
UN <ip2> 48 GiB 256 ? 6c9e7ad5-a642-4c0d-8b77-e78d821d904b rack1
UN <ip3> 50 GiB 256 ? 7e493bc6-c8a5-471e-8eee-3f3fe985b90a rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip4> 47 GiB 256 ? c49c1203-cc38-41a2-b9c8-2b42bc907c17 rack1
UN <ip5> 67 GiB 256 ? 0d9f31bc-9690-49b6-9d88-4fb30c1b6c0d rack1
UN <ip6> 88 GiB 256 ? 80c4d60d-185f-457a-ae9b-2eb611735f07 rack1
Schema info:
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
DC2 is kind of a disaster-recovery site, and in an ideal world we should be able to use only that DC in case of a disaster.
With the very limited knowledge I have, I strongly suspect that we need to modify the rack configuration to have a 'proper' DR cluster (so that data in DC1 gets replicated to DC2), or am I getting this wrong? If so, is there a standard guideline to follow?
When there are multiple DCs, does Cassandra automatically replicate data regardless of the rack configuration? (Are racks a kind of additional failure protection?)
DC2 has more data than DC1. Is this purely related to the hash function?
Are there any other things that can be rectified in this cluster?
Many thanks!
These replication settings mean that the data for your keyspace is replicated in real time between the 2 DCs with each DC having 3 replicas (copies):
CREATE KEYSPACE my_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '3',
'DC2': '3'
}
Replication in Cassandra happens in real time: any write sent to one DC is sent to all other DCs at the same time. Unlike traditional RDBMS or primary/secondary and active/DR configurations, Cassandra replication is instantaneous and immediate.
The logical Cassandra racks are an additional redundancy mechanism. If you have C* nodes deployed in different (a) physical racks or (b) public cloud availability zones, Cassandra will distribute the replicas across the racks so that each rack has a full copy of the data. With a replication factor of 3 in the DC, if a rack goes down for whatever reason, there are still full copies of the data in the remaining 2 racks, and read/write requests with a consistency of LOCAL_QUORUM (or lower) will not be affected.
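For example, with GossipingPropertyFileSnitch (assuming that is the snitch in use; your cluster's actual snitch isn't shown here), each node declares its DC and rack in conf/cassandra-rackdc.properties on that node:
# conf/cassandra-rackdc.properties on a node in DC1
dc=DC1
rack=rack1
Changing these values on live nodes is not something to do casually, since replica placement depends on them.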
I've explained this in a bit more detail in this post -- https://community.datastax.com/questions/1128/.
If you're new to Cassandra, we recommend https://www.datastax.com/dev which has links to short hands-on tutorials where you can quickly learn the basics of Cassandra -- all free. This tutorial is a good place to start -- https://www.datastax.com/try-it-out. Cheers!

Cassandra: removing a node and using it separately

I have a few questions about Cassandra and am hoping for suggestions from experts. Thank you.
I'm using 3 nodes with replication factor 3, and all nodes own 100% of the data.
[root@datab ~]# nodetool status mydatabase
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.1 11.83 GiB 8 100.0% c0fb9cfc-b20a-4c65-93b6-8e14107cc411 r1
UN 192.168.1.2 20.13 GiB 8 100.0% dd940c60-9645-45ca-894d-f0af49a7bd83 r1
UN 192.168.1.3 17.27 GiB 8 100.0% 3031587e-5354-4342-9ddc-e5696985fc8c r1
I want to remove node 192.168.1.3 and use that server separately for testing purposes; that is, I want to keep 100% of the data right up until the node is removed.
I tried decommissioning it, but then I couldn't use it as a separate server.
For example, I have a table with 100 GB of data in it, and select-all queries return slowly. The 3 nodes run on separate hardware (servers). If I add 10 nodes per server with Docker, will queries run faster?
For 3 nodes, what is the difference between replication factor 2 and 3? Three nodes with replication factor 3 keep 100% of the data, but whenever I alter the replication factor to 2, the ownership percentages drop within a few seconds. With factor 2, if I lose one of the servers, do I lose any data?
What are the proper steps to remove one node from DC1?
Changing the factor to 2?
Running nodetool removenode <ID>?
Or first removing the node and then changing the factor?
Thank you!

Consistency level error while trying to insert data into a Cassandra cluster

I have the following cluster consisting of two nodes in two different datacenters:
Datacenter: 168
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN 10.168.92.13 47.44 KB 100.0% 70e6fb88-60d3-4f19-b4a7-4eacc6790042 -9223372036854775808 92
Datacenter: 186
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN 10.186.163.119 73.33 KB 100.0% 19714869-3d7a-434b-9c41-e7d90f14205c 0 163
I created a keyspace using NetworkTopologyStrategy to spread the data among the nodes, assuming this would put a replica of the data in every datacenter, like this:
Create KEYSPACE demo WITH replication = {'class':'NetworkTopologyStrategy','DC1':1,'DC2':1};
Then I made a simple table user (id, name, last_name), but when I try to import or insert data into it I get the following error:
Traceback (most recent call last):
File "./bin/cqlsh", line 1108, in perform_simple_statement
rows = self.session.execute(statement, trace=self.tracing_enabled)
File "/home/ubuntu/cassandra/bin/../lib/cassandra-driver-internal-only-2.7.2.zip/cassandra-driver-2.7.2/cassandra/cluster.py", line 1602, in execute
result = future.result()
File "/home/ubuntu/cassandra/bin/../lib/cassandra-driver-internal-only-2.7.2.zip/cassandra-driver-2.7.2/cassandra/cluster.py", line 3347, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}
I also set endpoint_snitch: RackInferringSnitch in the cassandra.yaml file.
Does anyone know what's going on?
The names of the datacenters that appear in nodetool status must match the names of the DCs used when defining the keyspace's replication strategy. For example:
Create KEYSPACE demo WITH replication = {'class':'NetworkTopologyStrategy','DC1':1,'DC2':1};
should be:
Create KEYSPACE demo WITH replication = {'class':'NetworkTopologyStrategy','168':1,'186':1};
Either that, or rename the datacenters to DC1 and DC2 and run nodetool repair.
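If the keyspace already contains data you need to keep, a sketch of the first option (reusing the keyspace name from your question):
ALTER KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', '168': 1, '186': 1};
followed by nodetool repair demo on each node, so existing data is streamed to its new replica placements.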

Now getting error "message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}"

Cassandra version: dsc-cassandra-2.1.9
Had 3 nodes, one of which was down for a long time. Brought it back up and decommissioned it. Then did a nodetool removenode.
When I try to make a cql query I get the above error.
Initially I thought this might be because the replication strategy was SimpleStrategy, so I ran:
ALTER KEYSPACE history WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 2};
changed endpoint_snitch to GossipingPropertyFileSnitch instead of SimpleSnitch,
and did a nodetool repair on both nodes and restarted the Cassandra services.
But the problem is still there. What do I do?
EDIT 1: Nodetool status of machine A
-- Address Load Tokens Owns Host ID Rack
UN 192.168.99.xxx 19.8 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxx4ea RAC1
UN 192.168.99.xxx 18.79 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxx15 RAC1
nodetool status output of machine B
-- Address Load Tokens Owns Host ID Rack
UN 192.168.99.xxx 19.8 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxxxx4ea RAC1
UN 192.168.99.xxx 18.79 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxxxxf15 RAC1
What is weird is that under the Owns column you have no %, only '?'. The same issue occurred to me in the past when I bootstrapped a new C* cluster using SimpleStrategy and SimpleSnitch. Like you, I did an ALTER KEYSPACE to switch to NetworkTopologyStrategy and GossipingPropertyFileSnitch, but it did not solve my issue, so I rebuilt the cluster from scratch (fortunately I had no data in it).
If you have a backup of your data somewhere, just try to rebuild the 2 nodes from scratch.
Otherwise, consider backing up the sstable files on one node, rebuilding the cluster, and putting the sstables back. Be careful, because some file or folder renaming may be necessary.
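For the backup step, nodetool snapshot takes a point-in-time copy by hard-linking the current sstables (a sketch, assuming the keyspace is named history as in your ALTER KEYSPACE statement):
nodetool snapshot -t pre_rebuild history
# snapshot files land under the data directory, e.g.
# <data_dir>/history/<table>/snapshots/pre_rebuild/
After rebuilding, sstableloader can stream those files back into the new cluster.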

Cassandra - Pillar applied migrations sync issue

I am experiencing sync issues between different nodes in the same datacenter in Cassandra. The keyspace is set to a replication factor of 3 with NetworkTopologyStrategy and has 3 nodes in the DC, effectively making sure each node has a copy of the data. When nodetool status is run, it shows all three nodes in the DC owning 100% of the data.
Yet the applied_migrations column family in that keyspace is not in sync. This is strange, because only a single column family within the keyspace is impacted; all the other column families are fully replicated among the three nodes. The test was done by doing a count of rows on each of the column families in the keyspace.
keyspace_name | durable_writes | strategy_class | strategy_options
--------------+----------------+------------------------------------------------------+----------------------------
core_service | True | org.apache.cassandra.locator.NetworkTopologyStrategy | {"DC_DATA_1":"3"}
keyspace: core_service
Datacenter: DC_DATA_1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN host_ip_address_1_DC_DATA_1 3.75 MB 256 100.0% 3851106b RAC1
UN host_ip_address_2_DC_DATA_1 3.72 MB 256 100.0% d1201142 RAC1
UN host_ip_address_3_DC_DATA_1 3.72 MB 256 100.0% 81625495 RAC1
Datacenter: DC_OPSCENTER_1
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN host_ip_address_4_DC_OPSCENTER_1 631.31 MB 256 0.0% 39e4f8af RAC1
Query: select count(*) from core_service.applied_migrations;
host_ip_address_1_DC_DATA_1 core_service applied_migrations
count
-------
1
(1 rows)
host_ip_address_2_DC_DATA_1 core_service applied_migrations
count
-------
2
(1 rows)
host_ip_address_3_DC_DATA_1 core_service applied_migrations
count
-------
2
(1 rows)
host_ip_address_4_DC_OPSCENTER_1 core_service applied_migrations
count
-------
2
(1 rows)
An error similar to the one described in the issue below is received. Because not all rows of data are available, the migration script fails, since it tries to create a table that already exists:
https://github.com/comeara/pillar/issues/25
I require strong consistency
If you want to ensure that your reads are consistent, you need to use the right consistency levels. For RF 3 the following are your options (see the cqlsh sketch after this list):
Write at CL ALL and read at CL ONE or greater.
Write at CL QUORUM and read at CL QUORUM. This is what's recommended by Magro, who opened the issue you linked to. It's also the most common choice, because you can lose one node and still read and write.
Write at CL ONE but read at CL ALL.
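In cqlsh you can set the consistency level per session to try these out (a minimal sketch using the table from your question):
CONSISTENCY QUORUM;
SELECT count(*) FROM core_service.applied_migrations;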
What does Cassandra do to improve consistency?
Cassandra's anti-entropy mechanisms are:
Repair will ensure that your nodes are consistent. It gives you a consistency baseline, and for this reason it should be run as part of your maintenance operations. Run repair at least more often than your gc_grace_seconds in order to prevent zombie tombstones from coming back. DataStax OpsCenter has a Repair Service that automates this task.
Manually you can run:
nodetool repair
in one node or
nodetool repair -pr
in each of your nodes. The -pr option will ensure you only repair a node's primary ranges.
Read repair happens probabilistically (configurable in the table definition; see the sketch after this list). When you read a row, C* will notice if some of the replicas don't have the latest data and fix it.
Hints are collected by other nodes when a node is unavailable to take a write.
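As a sketch of the read-repair knob mentioned above (the read_repair_chance table option exists in the C* versions discussed here; it was removed in 4.0):
ALTER TABLE core_service.applied_migrations WITH read_repair_chance = 0.1;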
Manipulating c* Schemas
I noticed that the whole point of Pillar is "to automatically manage Cassandra schema as code". This is a dangerous notion, especially if Pillar is a distributed application (I don't know if it is), because it may cause schema collisions that can leave a cluster in a wacky state.
Assuming that Pillar is not a distributed or multi-threaded system, you can ensure that you do not break the schema by calling checkSchemaAgreement() in the Java driver before and after schema changes.
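If you want to check agreement without the Java driver, a rough cqlsh equivalent is to compare the schema versions each node reports; they should all be identical:
SELECT schema_version FROM system.local;
SELECT peer, schema_version FROM system.peers;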
Long term
Cassandra schemas will become more robust and handle distributed updates. Watch (and vote for) CASSANDRA-9424.
