Experiencing sync issues between different nodes in the same datacenter in Cassandra. The keyspace is set to a replication factor of 3 with NetworkTopology and has 3 nodes in the DC. Effectively making sure each node has a copy of the data. When node tool status is run, it shows all three nodes in the DC own 100% of the data.
Yet the applied_migrations column family in that keyspace is not in sync. This is strange because only a single column family is impacted within the keyspace. All the other column families are replicated fully among the three nodes. The test was done by doing a count of rows on each of the column families in the keyspaces.
keyspace_name | durable_writes | strategy_class | strategy_options
--------------+----------------+------------------------------------------------------+----------------------------
core_service | True | org.apache.cassandra.locator.NetworkTopologyStrategy | {"DC_DATA_1":"3"}
keyspace: core_service
Datacenter: DC_DATA_1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN host_ip_address_1_DC_DATA_1 3.75 MB 256 100.0% 3851106b RAC1
UN host_ip_address_2_DC_DATA_1 3.72 MB 256 100.0% d1201142 RAC1
UN host_ip_address_3_DC_DATA_1 3.72 MB 256 100.0% 81625495 RAC1
Datacenter: DC_OPSCENTER_1
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN host_ip_address_4_DC_OPSCENTER_1 631.31 MB 256 0.0% 39e4f8af RAC1
Query: select count(*) from core_service.applied_migrations;
host_ip_address_1_DC_DATA_1 core_service applied_migrations
count
-------
1
(1 rows)
host_ip_address_2_DC_DATA_1 core_service applied_migrations
count
-------
2
(1 rows)
host_ip_address_3_DC_DATA_1 core_service applied_migrations
count
-------
2
(1 rows)
host_ip_address_4_DC_OPSCENTER_1 core_service applied_migrations
count
-------
2
(1 rows)
Similar error is received as described in the issue below. Because all the rows of data are not available, the migration script fails because it is trying to create an existing table:
https://github.com/comeara/pillar/issues/25
I require strong consistency
If you want to ensure that your reads are consistent you need to use the right consistency levels.
For RF3 the following are your options:
Write CL ALL and read with CL One or greater
Write CL Quorum and read CL Quorum. This is what's recommended by Magro who opened the issue you linked to. It's also the most common because you can loose one node and still read and write.
Write CL one but read CL ALL.
What does Cassandra do improve consistency
Cassandra's anti entropy mechanisms are:
Repair will ensure that your nodes are consistent. It gives you a consistency base line and for this reason it should be run as part of your maintenance operations. Run repair at least more often than your gc_grace_seconds in order to avoid zombie tombstones from coming back. DataStax OpsCenter has a Repair Service that automates this task.
Manually you can run:
nodetool repair
in one node or
nodetool repair -pr
in each of your nodes. The -pr option will ensure you only repair a node's primary ranges.
Read repair happens probabilistically (configurable at the table def). When you read a row, c* will notice if some of the replicas don't have the latest data and fix it.
Hints are collected by other nodes when a node is unavailable to take a write.
Manipulating c* Schemas
I noticed that the whole point of Pillar is "to automatically manage Cassandra schema as code". This is a dangerous notion--especially if Pillar is a distributed application (I don't know if it is). Because it may cause schema collisions that can leave a cluster in a wacky state.
Assuming that Pillar is not a distributed / multi-threaded system, you can ensure that you do not break schema by utilizing checkSchemaAgreement() before and after schema changes in the Java driver after schema modifications.
Long term
Cassandra schemas will be more robust and handle distributed updates. Watch (and vote for) CASSANDRA-9424
Related
I have a few questions about Cassandra, waiting for suggestions from experts. Thank you.
I'm using 3 nodes with replication factor 3. And all nodes own %100 of data.
[root#datab ~]# nodetool status mydatabase
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.1 11.83 GiB 8 100.0% c0fb9cfc-b20a-4c65-93b6-8e14107cc411 r1
UN 192.168.1.2 20.13 GiB 8 100.0% dd940c60-9645-45ca-894d-f0af49a7bd83 r1
UN 192.168.1.3 17.27 GiB 8 100.0% 3031587e-5354-4342-9ddc-e5696985fc8c r1
I want to remove node 192.168.1.3 and use this server as separated for testing purpose, I mean I want to keep %100 of data till removing the node.
I tried to decommission, but I couldn't use as separated.
For example, I have a table with 100gb data in it and select all queries returning slower. This 3 nodes running on separated hardware (servers). If I add 10 nodes for each server with docker, it makes queries run faster?
For 3 node, what is the difference between having replication factor 2 and 3? 3 node with replication factor keeps %100 data, but whenever I alter replication factor to 2, data percentages are getting down in few seconds, with factor 2 if I lose one of server, do I lost any data?
Whats the proper step to remove 1 node from the dc1 ?
changing factor to 2 ?
removenode ID ?
or first remove node than change factor ?
Thank you !!
I have three-node ring of Apache Cassandra 2.1.12. I inserted some data when it was 2-node ring and then added one more 172.16.5.54 node in the ring. I am using the vnode in my ring. The problem is data is not distributed evenly where as ownership seems distributed evenly. So, how to redistribute the data aross the ring. I have tried with nodetool repair and nodetool cleanup but still no luck.
Moreover, what does this load and own column signify in the nodetool status output.
Also, If out of these three-node if i import the data from one of the node from the file. So, CPU utilization goes upto 100% and finally data on the rest of the two nodes get distributed evenly but not on import running node. Why is it so?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.16.5.54 1.47 MB 256 67.4% 40d07f44-eef8-46bf-9813-4155ba753370 rack1
UN 172.16.4.196 165.65 MB 256 68.3% 6315bbad-e306-4332-803c-6f2d5b658586 rack1
UN 172.16.3.172 64.69 MB 256 64.4% 26e773ea-f478-49f6-92a5-1d07ae6c0f69 rack1
The columns in the output are explained for cassandra 2.1.x in this doc. The load is the amount of file system data in the cassandra data directories. It seems unbalanced across your 3 nodes, which might imply that your partition keys are clustering on a single node (172.16.4.196), sometimes called a hot spot.
The Owns column is "the percentage of the data owned by the node per datacenter times the replication factor." So I can deduce your RF=2 because each node Owns roughly 2/3 of the data.
You need to fix your partition keys of tables.
Cassandra distributes the data based on partition keys to nodes (using hash partitioning range).
So, for some reason you have alot of data for few partition key value, and almost non for rest partition key values.
I want to verify and test the 'replication_factor' and the consistency level ONE of Cassandra DB.
And I specified a Cluster: 'MyCluster01' with three nodes in two data center: DC1(node1, node2) in RAC1, DC2(node3) in RAC2.
Structure shown as below:
[root#localhost ~]# nodetool status
Datacenter: DC1
===============
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.62 409.11 KB 256 ? 59bf9a73-45cc-4f9b-a14a-a27de7b19246 RAC1
UN 10.0.0.61 408.93 KB 256 ? b0cdac31-ca73-452a-9cee-4ed9d9a20622 RAC1
Datacenter: DC2
===============
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.63 336.34 KB 256 ? 70537e0a-edff-4f48-b5db-44f623ec6066 RAC2
Then, I created a keyspace and table like following:
CREATE KEYSPACE my_check1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
create table replica_test(id uuid PRIMARY KEY);
After I inserted one record into that table:
insert into replica_test(id) values (uuid());
select * from replica_test;
id
--------------------------------------
5e6050f1-8075-4bc9-a072-5ef24d5391e5
I got that record.
But when I stopped node1 and queried again in either node 2 and node 3,
none of the query succeeded.
select * from replica_test;
Traceback (most recent call last): File "/usr/bin/cqlsh", line 997,
in perform_simple_statement
rows = self.session.execute(statement, trace=self.tracing_enabled) File
"/usr/share/cassandra/lib/cassandra-driver-internal-only-2.1.3.post.zip/cassandra-driver-2.1.3.post/cassandra/cluster.py",
line 1337, in execute
result = future.result(timeout) File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.1.3.post.zip/cassandra-driver-2.1.3.post/cassandra/cluster.py",
line 2861, in result
raise self._final_exception Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ONE"
info={'required_replicas': 1, 'alive_replicas': 0, 'consistency':
'ONE'}
While the 'nodetool status' command returned:
UN 10.0.0.62 409.11 KB 256 ? 59bf9a73-45cc-4f9b-a14a-a27de7b19246 RAC1
DN 10.0.0.61 408.93 KB 256 ? b0cdac31-ca73-452a-9cee-4ed9d9a20622 RAC1
UN 10.0.0.63 336.34 KB 256 ? 70537e0a-edff-4f48-b5db-44f623ec6066 RAC2
And when I tried stopping node 2, keeping node 1 and 3 alive; or stopping node 3, keeping node 1 and 2 alive; The error occurred as well.
Then what 's the problem, since I think I 've already satisfied the consistency level, and where exactly does this record exists?
What ultimately does 'replication_factor' controls?
To directly answer the question, replication factor (RF) controls the number of replicas of each data partition that exist in a cluster or data center (DC). In your case, you have 3 nodes and a RF of 1. That means that when a row is written to your cluster, that it is only stored on 1 node. This also means that your cluster cannot withstand the failure of a single node.
In contrast, consider a RF of 3 on a 3 node cluster. Such a cluster could withstand the failure of 1 or 2 nodes, and still be able to support queries for all of its data.
With all of your nodes up and running, try this command:
nodetool getendpoints my_check1 replica_test 5e6050f1-8075-4bc9-a072-5ef24d5391e5
That will tell you on which node the data for key 5e6050f1-8075-4bc9-a072-5ef24d5391e5 resides. My first thought, is that you are dropping the only node which has this key, and then trying to query it.
My second thought echoes what Carlo said in his answer. You are using 2 DCs, which is really not supported with the SimpleStrategy. Using SimpleStrategy with multiple DCs could produce unpredictable results. Also with multiple DCs, you need to be using the NetworkTopologyStrategy and something other than the default SimpleSnitch. Otherwise Cassandra may fail to find the proper node to complete an operation.
First of all, re-create your keyspace and table with the NetworkTopologyStrategy. Then change your snitch (in the cassandra.yaml) to a network-aware snitch, restart your nodes, and try this exercise again.
NetworkTopologyStrategy should be used when replicating accross multiple DC.
We are trying to provision a Cassandra cluster to use as a KV storage.
We are deploying a 3-node production cluster on our production DC, but we would also want to have a single node as a DR copy on our disaster recovery DC.
Using PropertyFileSnitch we have
10.1.1.1=DC1:R1
10.1.1.2=DC1:R1
10.1.1.3=DC1:R1
10.2.1.1=DC2:R1
We plan on using a keyspace with the following definition:
CREATE KEYSPACE "cassandraKV"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 1};
In order to achieve the following:
2 replicas distributed among 3 nodes in DC1 (66% of total data per node) while still allowing a single node to go down without any data loss.
1 replica in DC2 (100% of total data per node)
We see the ownership distributed at 25% per node, while we would expect 33% on each node in DC1 and 100% in DC2.
Is this above configuration correct?
Thanks
My guess is that you ran nodetool Status without specifying the keyspace. This will end up just showing you the general distribution of the tokens in your cluster which will not be representative of your "cassandraKV" keyspace.
On running nodetool status "cassandraKV" you should see
Datacenter: DC1
10.1.1.1: 66%
10.1.1.2: 66%
10.1.1.3: 66%
Datacenter: DC2
10.2.1.1: 100%
You should see 66% for the DC1 nodes because each node holds 1 copy of it's primary range (33%) and one copy of whatever it is replicating (33%)
While DC2 has 100% of all the data you are currently holding.
I have a cassandra 2 datacenter pair with single replication with each datacenter containing a single node and each datacenter located on separate physical servers on the network. If one datacenter crashes, the other one will continue to be available for reads and writes I started up my java application, on a 3rd server, and everything it running ok. It's reading and writing to cassandra.
Next I disconnected, pulled the network cable, the 2nd datacenter server from the network.
I expected the application to continue running with no exceptions against the 1st datacenter, but that was not the case.
The following exception started to occur in the application:
me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:60)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$9.execute(KeyspaceServiceImpl.java:354)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$9.execute(KeyspaceServiceImpl.java:343)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSuperColumn(KeyspaceServiceImpl.java:360)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery$1.doInKeyspace(ThriftSuperColumnQuery.java:51)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery$1.doInKeyspace(ThriftSuperColumnQuery.java:45)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at me.prettyprint.cassandra.model.thrift.ThriftSuperColumnQuery.execute(ThriftSuperColumnQuery.java:44)
Once I reconnected the network cable to the 2nd server, the error stopped.
Here's more details on cassandra 1.0.10
1) Here's the following describe from cassandra on both datacenters
Keyspace: AdvancedAds:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [DC2:1, DC1:1]
2) I ran a node tool ring against each instance
./nodetool -h 111.111.111.111 -p 11000 ring
Address DC Rack Status State Load Owns Token
1
111.111.111.111 DC1 RAC1 # <-- usUp Normal 1.07 GB 100.00% 0
111.111.111.222 DC2 RAC1 Up Normal 1.1 GB 0.00% 1
./nodetool -h 111.111.111.222 ring -port 11000
Address DC Rack Status State Load Owns Token
1
111.111.111.111 DC1 RAC1 Up Normal 1.07 GB 100.00% 0
111.111.111.222 DC2 RAC1 # <-- usUp Normal 1.1 GB 0.00% 1
3) I checked the cassandra.yaml
the seeds are 111.111.111.111, 111.111.111.222
4) I checked the cassandra-topology.properties
111.111.111.111
# Cassandra Node IP=Data Center:Rack
# datacenter 1
111.111.111.111=DC1:RAC1 # <-- us
# datacenter 2
111.111.111.222=DC2:RAC1
default=DC1:r1
111.111.111.222
# Cassandra Node IP=Data Center:Rack
# datacenter 1
111.111.111.111=DC1:RAC1
# datacenter 2
111.111.111.222=DC2:RAC1 # <-- us
default=DC1:r1
5) we set the consistencyLevel to LOCAL_QUORUM in our java application as follows:
public Keyspace getKeyspace(final String keyspaceName, final String serverAddresses)
{
Keyspace ks = null;
Cluster c = clusterMap.get(serverAddresses);
if (c != null)
{
ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
policy.setDefaultReadConsistencyLevel(consistencyLevel);
policy.setDefaultWriteConsistencyLevel(consistencyLevel);
// Create Keyspace
ks = HFactory.createKeyspace(keyspaceName, c, policy);
}
return ks;
}
I was told this configuration would work, but maybe I'm missing something.
Thanks for any insight
Hector is known to return spurious unavailable errors. The native protocol Java driver does not have this problem: https://github.com/datastax/java-driver
If you have only two nodes and your data would be placed on the node that is actually down, when the consistency is required, you may not be able to achieve full write availability. Cassandra would be achieving that with Hinted Handoff, but for the QUORUM consistency level the UnavailableException will be thrown anyway.
The same is true when requesting data belonging to the down node.
However it seems like your cluster is not well balanced. Your node 111.111.111.111 owns 100% and then 111.111.111.222 seem to own 0%, looking at your tokens they seem to be a reason for that.
Checkout how to set initial token here : http://www.datastax.com/docs/0.8/install/cluster_init#token-gen-cassandra
Additionally you may want to check Another Question, which contains answer with more reasons, when the situation like this may happen.
LOCAL_QUORUM won't work if you configure NetworkTopologyStrategy like this:
Options: [DC2:1, DC1:1] # this will make LOCAL_QUORUM and QUORUM fail always
LOCAL_QUORUM and (in my experience) QUORUM require data centers to have at least 2 replicas up. If you want a quorum spanning your data centers you have to set consistency level to data center agnostic TWO.
More examples:
Options: [DC2:3, DC1:1] # LOCAL_QUORUM for clients in DC2 works, QUORUM fails
Options: [DC2:2, DC1:1] # LOCAL_QUORUM in DC2 works, but down after 1 node failure
# QUORUM fails, TWO works.