Data is replicated/copied to my 2nd node even with a replication factor of 1 for the keyspace - cassandra

I have a Cassandra cluster of 3 nodes, and I created a keyspace 'abcd' using SimpleStrategy and a replication factor of 1. Since I chose an RF of 1, I assume that any writes to my node-1 should not be replicated across the other 2 nodes.
But when I inserted a record into a table in this keyspace, I saw the new row turn up on all nodes in my cluster.
My question: since I chose an RF of 1 for this keyspace, I would have expected only one node (i.e. node-1) in this cluster to own this data, not the rest of the nodes.
Please correct me if my understanding is wrong.
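For reference, a keyspace like the one described would be created with something along these lines (the keyspace name 'abcd' is from the question; the exact statement is an assumption):

CREATE KEYSPACE abcd WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};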

Since your RF is 1, your data is written to only one node. But you can read that data by running your SELECT query against the other nodes as well, because any node in a Cassandra cluster can act as a coordinator for all the data in the cluster.
If the node you are running the query against does not have the data, it will fetch the data from the other nodes and return the result.
You can check exactly which node holds the data by running nodetool getendpoints.
You will need to supply your keyspace, table name and partition key.
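For example, assuming the keyspace 'abcd' from the question, a table called mytable and a partition key value of somekey (the table name and key are placeholders), the command would look like this:

$ nodetool getendpoints abcd mytable somekey

It prints the address(es) of the replica node(s) that own that partition; with RF 1 you should see a single address.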

Related

Cassandra UnavailableException with one node up and a replication factor of 1, ordered partitioner

Hello, I have a problem when reading some data from Cassandra.
My cluster had the following attributes:
1. Ordered partitioner. The cluster had 10 nodes and used ByteOrderedPartitioner.
2. Replication factor of 1. The cluster used SimpleStrategy with a replication factor of 1.
When I started up only one node and tried to get some data from it (I was sure the desired data was stored on that one running node), an UnavailableException was thrown.
Could anybody give some suggestions? Thanks a lot.

Cassandra clustering failover - High Availability

I have configured a Cassandra cluster with 3 nodes:
Node1 (192.168.0.2), Node2 (192.168.0.3), Node3 (192.168.0.4)
I created a keyspace 'test' with a replication factor of 2:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
When I stop either Node2 or Node3 (one at a time, and both at once), I am able to do CRUD operations on the keyspace's tables.
When I stop Node1 and try to update/create a row from Node4 or Node3, I get the following error although Node3 and Node4 are up and running:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.0.4:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
I am not sure how Cassandra elects a leader if a leader node dies.
You are using replication_factor 2, so only 2 nodes will have a replica of your keyspace (not all 3 nodes).
My first advice is to change the RF to 3.
You also have to pay attention to the consistency level you are using. If you have only 2 copies of your data (RF 2) and you are using consistency level QUORUM, it will try to write the data to half of the replicas + 1, in this case both nodes. So if 1 node is down, you will not be able to write/read that data.
To verify where the data is replicated you can look at the ring of your cluster. As you are using SimpleStrategy, replicas are placed on consecutive nodes clockwise around the ring, and in your case the data is copied to the nodes at 192.168.0.2 and 192.168.0.3.
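A minimal sketch of that RF change, assuming the keyspace is named 'test' as in the question:

ALTER KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

followed by nodetool repair test on each node so the existing rows are copied to the new replica.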
Take a look at the concepts of replication factor: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html
And Consistency Level: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Great answer about RF vs CL: https://stackoverflow.com/a/24590299/6826860
You can use this calculator to find out whether your setup has a decent consistency. In your case the result is: "You can survive the loss of no nodes without impacting the application".
I think I wasn't clear in my response. The replication factor is about how many copies of your data will exist. The consistency level is how many of those copies your client will wait for before getting a response from the server.
Ex: all your nodes are up. The client issues a CQL statement with CL QUORUM; the server writes the data to 2 nodes (3/2 + 1) and replies to the client, and in the background it copies the data to the third node as well.
In your example, if you shut down 2 nodes of a 3-node cluster you will never achieve QUORUM for requests (with CL QUORUM), so you have to use consistency level ONE; once the nodes are up again, Cassandra will copy the data to them. One thing that can happen is: before Cassandra copies the data to the other 2 nodes, the client makes a request to node1 or node2 and the data is not there yet.
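As an illustration of dropping the consistency level, in cqlsh the session-level setting can be changed like this (the table and key below are placeholders):

CONSISTENCY ONE;
SELECT * FROM test.mytable WHERE id = 1;

Most drivers expose an equivalent per-query or per-session consistency setting.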

Set replication factor to 2 in a 3-node Cassandra cluster, but data is still getting replicated to all 3 nodes on insertion

I have a 3-node cluster with a replication factor of 2, but data is getting replicated to all 3 nodes. This is how I create my keyspace:
CREATE KEYSPACE IF NOT EXISTS DEMO WITH replication = {'class':'SimpleStrategy', 'replication_factor':2};
What's missing here?
Cassandra distributes data based on the partition key of the row. Any table is generally distributed over all the machines, and when you insert a row it is written to "two machines" only (these two machines are not random and can be determined with nodetool).
If you want to know more about how data is distributed by partition key, take a look at partitioners. Cassandra Partitioners
Data is being distributed over the 3 nodes, and each node holds 2 pieces of data: its own data for the partitions it is assigned, and replica data belonging to its neighbor node.
Try executing getendpoints for any partition key of a table within that keyspace. You will get the list of nodes which hold that partition. In this case, you should see only 2 nodes in the output.
$ nodetool getendpoints <keyspace> <table> key
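For example, with the DEMO keyspace above, a hypothetical table users and a partition key value of 1001, the output lists only the replica nodes (the addresses here are made up):

$ nodetool getendpoints demo users 1001
10.0.0.2
10.0.0.3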

Cassandra nodes ownership is 0.00%

I have a Cassandra cluster with 2 nodes. I am using NetworkTopologyStrategy.
I was trying to increase the replication factor of a keyspace in Cassandra to 2. I did the following steps:
UPDATE KEYSPACE demo WITH strategy_options = {DC1:2,DC2:2}; on both nodes
Then I ran nodetool repair on both nodes.
Then I ran my Hector code to count the number of rows and columns in the database.
I got the following error: UnavailableException
Also, when I run the command
./nodetool -h ip_address ring
I found that both nodes' ownership is 0%. Please tell me how I should fix that.
You mention "both nodes", which implies that you have two total nodes rather than two data centers as would be suggested by your strategy options. Specifying {DC1:2,DC2:2} would require a minimum of four nodes (two in each DC to satisfy the replication factor), although this would not be advised since essentially all your nodes would be points of failure.
A minimal Cassandra cluster should have at least three nodes, in which case a RF of two would allow one node to go down without bringing down the system. It sounds like you have a single cluster (rather than two data centers), so what you really need is one more node (3 total), RF=2, using the SimpleStrategy instead of NetworkTopologyStrategy.
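A sketch of that suggested fix, assuming the keyspace is named 'demo' as above and a CQL version that supports ALTER KEYSPACE (the strategy_options syntax in the question is from the legacy CLI):

ALTER KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

followed by nodetool repair demo on each node once the third node has joined.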

How to migrate data from Cassandra cluster of size N to a different cluster of size N+/-M

I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster of a different ring size... say from a 5-node cluster to a 7-node cluster.
I started looking at sstable2json, since it creates a JSON file for the SSTables on that specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one file for the data of that column family that resides on each node.
Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of size, let's say, 7. I was hoping that Cassandra would then replicate/balance the data out evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just mentioned, I'd end up with a ring with all the data of my column family on one node.
So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?
Better: use bin/sstableloader on the sstables from the old ring to stream them to the new one.
Normally sstableloader is used in a sequence like this:
Create sstables locally using SSTableWriter
Use sstableloader to stream the data in the sstables to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which will be the case if you point it at an existing Cassandra data directory.
Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
More details on using sstableloader in this blog post.
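A rough sketch of that on one node of cluster A, assuming the keyspace is mykeyspace, the table directory lives under the default data directory, and 10.0.0.10 is a node in cluster B (all names and addresses here are placeholders):

$ bin/sstableloader -d 10.0.0.10 /var/lib/cassandra/data/mykeyspace/mytable

The -d option takes the address of one or more nodes in the target cluster; depending on the Cassandra version, the loader expects either a directory named after the keyspace or a keyspace/table directory.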
You don't need to use sstable2json. If you have the space you can:
get all the sstables from all of the nodes on the old ring
put them all together on each of the new servers (renaming any which have the same names)
run nodetool cleanup on each node in the new ring and they will throw away the data that doesn't belong to them.
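A rough sketch of those steps for one node of the new ring, assuming rsync is available, old-node1 is one of the old hosts and the default data directory layout is used (all names are placeholders):

$ rsync -av old-node1:/var/lib/cassandra/data/mykeyspace/ /var/lib/cassandra/data/mykeyspace/
# repeat for each old node, renaming any sstable files whose names clash
# restart the node (or use nodetool refresh) so it picks up the copied sstables, then:
$ nodetool cleanup mykeyspace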
You could do something like the following:
1. Join the 7 new nodes into the 5-node cluster, setting up each node with its own ring token. At this point you have a cluster with 12 nodes.
2. Remove the original 5 nodes from the cluster built in step 1.
3. Adjust the ring token for each remaining node after the 5 old nodes have been removed.
4. Repair the 7-node cluster.
I would venture to say that this isn't as big of a problem as it may seem.
Create your new ring and define the tokens for each node appropriately, as per http://wiki.apache.org/cassandra/Operations#Token_selection
Import the data into the new ring.
The ring will balance itself based on the tokens you have defined: http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
