I am struggling to understand why my Cassandra nodes have uneven data size.
I have a cluster of three nodes. According to nodetool ring, each node owns 33.33%. Still disk space usages are uneven.
Node1: 4.7 GB (DC: logg_2, RAC: RAC1)
Node2: 13.9 GB (DC: logg_2, RAC:RAC2)
Node3: 9.3 GB (DC: logg_2, RAC:RAC1)
There is only one keysapce.
keyspace_definition: |
CREATE KEYSPACE stresscql_cass_logg WITH replication = { 'class': 'NetworkTopologyStrategy', 'logg_2' : 2, 'logg_1' : 1};
And there is only one table named blogposts.
table_definition: |
CREATE TABLE blogposts (
domain text,
published_date timeuuid,
url text,
author text,
title text,
body text,
PRIMARY KEY(domain, published_date)
) WITH CLUSTERING ORDER BY (published_date DESC)
AND compaction = { 'class':'LeveledCompactionStrategy' }
AND comment='A table to hold blog posts'
Please help me to understand it why each node has uneven datasize.
Ownership is how much data is owned by the node.
The percentage of the data owned by the node per datacenter times the
replication factor. For example, a node can own 33% of the ring, but
show 100% if the replication factor is 3.
Attention: If your cluster uses keyspaces having different replication
strategies or replication factors, specify a keyspace when you run
nodetool status to get meaningful ownship information.
More information can be found here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsStatus.html#toolsStatus__description
NetworkTopologyStrategy places replicas in the same datacenter by walking the ring clockwise until reaching the first node in another rack.
NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
Because you only have two racks(RAC1 and RAC2), you are placing node 1 and node 3's replicas in node 2, which is why it is bigger.
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html
Related
For some test purposes I want to break a consistency of data in my test cassandra cluster, consisting of two datacenters.
I assumed that if I use a consistency level equal to LOCAL_QUORUM, or LOCAL_ONE I will achieve this. Let us say I have a cassandra node node11 belonging to DC1:
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (...) VALUES (...) ;
But in fact, data appears in all nodes. I can read it from the node22 belonging to the DC2 even with the consistency level LOCAL_*. I've double checked: the nodetool shows me the two datacenters and node11 certainly belongs to the DC1, while node22 belongs to the DC2.
My keyspace test is configured as follows:
CREATE KEYSPACE "test"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
and I have two nodes in each DC respectively.
My questions:
It seems to me that I wrongly understand the idea of these consistency level. In fact they do not prevent from writing data to the different DC's, but just ask for appearing of the data at least in the current datacenter. Is it correct understanding?
More essentially: is any way to perform such a trick and achieve such a "broken" consistency, when I have a different data stored in two datacenters within one cluster?
(At the moment I think that the only one way to achieve that - is to break the ring and do not allow nodes from one DC know anything about nodes from another DC, but I don't like this solution).
LOCAL_QUORUM, this consistency level requires a quorum of acknoledgement received from the local DC but all the data are sent to all the nodes defined in the keyspace.
Even at low consistency levels, the write is still sent to all
replicas for the written key, even replicas in other data centers. The
consistency level just determines how many replicas are required to
respond that they received the write.
https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
I don't think there is proper way to do that
This suggestion is to test scenario only to break data consistency between 2 DCs. (haven't tried but based on my understanding should work)
Write data in one DC (say DC1) with Local* consistency
Before write, keep nodes in DC2 down so DC1 will store hints as DC2 nodes are down.
Let max_hint_window_in_ms (3 hours by default - and you can reduce it) time pass so that DC1 coordinator will delete all the hints
Start DC2 nodes and query with LOCAL* query, the data from DC1 won't be present in DC2.
You can repeat these steps and insert data in DC2 with different values keeping DC1 down so same data will have different values in DC1 and DC2.
I am using Cassandra 2.2.4.In these i have a table with replication factor 3 but i have only 2 node .The used disk space of these 2 nodes is different(1st node has 10 GB size and 2nd node has 14 GB ) . What is the reason for these difference.
Can anyone please help me?
Even if you had replication factor 1, the disk space might have been different still. This is because some partitions are stored in one node, and others in the other.
If you have more data belonging to partition A, then the node that has partition A will have more data.
The partition is determined from the primary key. This is why it's so important to have a good primary key.
You can watch the tutorials on the datastax website for details on how to choose the best data model and primary key: https://academy.datastax.com/courses .
I have a 3 node cluster, with replication factor of 2 but data is getting replicated on all 3 nodes. This is how I create my keyspace:
CREATE KEYSPACE IF NOT EXISTS DEMO WITH replication = {'class':'SimpleStrategy', 'replication_factor':2};
What's missing here ?
Cassandra distributes data based on primary key of the row. Any table is generally distributed over all the machines and when you insert a row, it is inserted on "two machines" only (These two machines are not random and can be calculated with nodetool)
If you want to know more about how data is distributed by primary key, take a look at partitioners. Cassandra Partitioners
Data is being distributed over 3 nodes, and each node holds 2 pieces of data: its own piece of data pertaining to its assigned partitions, and data belonging to its neighbor node.
Try to execute getendpoints on any of the partition key in a table with in that keyspace. You will get the nodes list which holds that partition. In this case, you should get output as 2 nodes only.
$ nodetool getendpoints <keyspace> <table> key
I am new to Cassandra and at work I have a 4 node cluster.
nodetool gossipinfo tells me that there are one datacentre, 2 racks and 2 nodes in each rack. Replication factor is defined as 2. nodetool ring tell me that each node has 50% ownership. There are 2 seed nodes in our config. Each rack has 1 seed node.
Does this mean that for each rack, there is one seed node and its replicated node. If that is the case then why is datasize not the same for seed node and its replicated node.
what happens if one node goes down. Will it have any impact on the data availability of the cluster.
Seeds
Seeds nodes are only special in the way that new nodes that join the cluster contact the seed nodes to find out about other nodes and the topology of the ring. But in Cassandra, all nodes are the same, i.e. there are no master or slave, no primary or secondary node. Because of this, you can elect any (or all) node as the seed.
Since seeds only relate to gossip information, it does not have anything to do with replicated data.
Size
In relation to data size, each node will never be exactly the same since each partition/row size is never the same. If you look at the nodetool cfstats output, you will see that there is a big range between minimum and maximum sizes.
Availability
If the reads are done with a consistency level CL=ONE, then if a node is down the other replica will continue to serve requests. But if reads are done with a higher consistency, then reads will fail since it needs 2 nodes to be available, i.e. CL=LOCAL_QUORUM requires [ RF/2 + 1 ] nodes to respond.
EDIT: Response to:
Shouldn't each node own 25%?
Ownership
In Cassandra, data is not "distributed" across ALL nodes in ALL DCs. In fact, a DC is a copy of another DC depending on the replication factor.
To illustrate, consider the following keyspace definition:
CREATE KEYSPACE "myKS"
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 2,
'DC2' : 2};
Based on this definition, it means that the myKS keyspace has 2 replicas in DC1 and 2 replicas in DC2. Since each of your data centres only have 2 nodes, this effectively means that each DC is a copy of each other.
Following from that, since the tokens are split between 2 nodes, each node owns half of the data which is 50%. So in DC1, each node owns 50% and in DC2 (which is a copy of DC1) each node also owns 50%.
We are trying to provision a Cassandra cluster to use as a KV storage.
We are deploying a 3-node production cluster on our production DC, but we would also want to have a single node as a DR copy on our disaster recovery DC.
Using PropertyFileSnitch we have
10.1.1.1=DC1:R1
10.1.1.2=DC1:R1
10.1.1.3=DC1:R1
10.2.1.1=DC2:R1
We plan on using a keyspace with the following definition:
CREATE KEYSPACE "cassandraKV"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 1};
In order to achieve the following:
2 replicas distributed among 3 nodes in DC1 (66% of total data per node) while still allowing a single node to go down without any data loss.
1 replica in DC2 (100% of total data per node)
We see the ownership distributed at 25% per node, while we would expect 33% on each node in DC1 and 100% in DC2.
Is this above configuration correct?
Thanks
My guess is that you ran nodetool Status without specifying the keyspace. This will end up just showing you the general distribution of the tokens in your cluster which will not be representative of your "cassandraKV" keyspace.
On running nodetool status "cassandraKV" you should see
Datacenter: DC1
10.1.1.1: 66%
10.1.1.2: 66%
10.1.1.3: 66%
Datacenter: DC2
10.2.1.1: 100%
You should see 66% for the DC1 nodes because each node holds 1 copy of it's primary range (33%) and one copy of whatever it is replicating (33%)
While DC2 has 100% of all the data you are currently holding.