Cassandra cluster old data is not replicated to new node

I installed Apache Cassandra on my local system for testing purposes. With one system (one node) I was able to read, write, and query the database. I then added another node and created a cluster. Now the data that I write on my system is replicated to the other node and vice versa, but the data that was present on my system before the new node was added is not replicated. The keyspaces and tables are present on the new node, but they are empty. Did I do something wrong while adding the new node to the cluster?

My best guess is that you have auto_bootstrap turned off (it is on by default). From the documentation:
auto_bootstrap
(Default: true) This setting has been removed from default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.
The easiest way to fix this is to run a repair on the new node, which will ensure that it gets any data it is missing.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
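For example, a minimal sketch of that repair on the newly added node (the keyspace name below is just a placeholder):
$CASSANDRA_HOME/bin/nodetool repair              # repair everything this node owns
$CASSANDRA_HOME/bin/nodetool repair my_keyspace  # or restrict the repair to one keyspace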

If the data on the new node is not up to date, simply run this on the new node:
Run $CASSANDRA_HOME/bin/nodetool rebuild
Then log in with cqlsh and verify that the new node has received the data.
Also see:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
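As a minimal sketch, if the new node sits in a new datacenter, rebuild can be pointed at an existing datacenter as the streaming source (the DC name here is a placeholder):
$CASSANDRA_HOME/bin/nodetool rebuild -- DC1   # stream existing data from datacenter DC1
cqlsh                                         # then log in and spot-check the tables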

If you have the correct seeds and replication settings but the data is still not being replicated to the new datacenter, run nodetool repair with the -full option; that will replicate the data to the new datacenter node.
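A minimal sketch of that repair (the keyspace name is a placeholder):
nodetool repair -full               # full (non-incremental) repair of everything on this node
nodetool repair -full my_keyspace   # or limit the full repair to one keyspace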

Related

Cannot change Cassandra datacenter

After installing Cassandra via sudo apt install -y cassandra, Cassandra is running, but I have not yet configured cassandra.yaml and cassandra-topology.properties. Once I configure them, Cassandra won't start because: Cannot start node if snitch's data center (0) differs from previous data center (datacenter1)
Assuming that you don't care about the data stored, you can fix that by deleting everything in your data_file_directories. If you're not setting that explicitly, you'll find it at $CASSANDRA_HOME/data/data.
Basically, the cluster metadata gets written to the system keyspace, which uses the local replication strategy (system is unique on each node). At start time, Cassandra verifies the stored metadata against the config properties being passed. When you change something about the node, Cassandra will throw an error on specific properties like cluster_name, dc, and rack (and possibly a few others) when they don't match what's on disk.
tl;dr:
You probably only need to delete the data for the system keyspace.
Another option would be to uncomment and set data_file_directories. Then the new node's system metadata would be written there, and the node would start.
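As a rough sketch of the delete-the-system-keyspace approach for a package install (paths and the service command are assumptions, and this throws away the stored system metadata):
sudo systemctl stop cassandra
sudo rm -rf /var/lib/cassandra/data/system/*   # or $CASSANDRA_HOME/data/data/system for a tarball install
sudo systemctl start cassandra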

Workaround for CASSANDRA-12813: NPE in auth for bootstrapping node

I have a Cassandra 3.9 production cluster and I am trying to add a node to it. However, I am running into this issue:
CASSANDRA-12813 NPE in auth for bootstrapping node
https://issues.apache.org/jira/browse/CASSANDRA-12813
Short of upgrading my production cluster to 3.11 (which I may not be able to do immediately), is there a known workaround for this issue?
An undocumented (but working) way around this is to copy the system_auth directory from another node into the new node's data directory. Start Cassandra only after this step. That way, setting up the auth tables during bootstrap is bypassed because the content already exists. The copied system_auth SSTables will not cause any harm; they are just a copy of the users/roles for the token ranges belonging to that other node. Once the data is copied, repair will take care of cleaning it up if the corresponding tokens don't belong on this node.
Once the node has successfully bootstrapped, run nodetool repair on the system_auth keyspace and it will take care of placing the exact replica copies.
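A rough sketch of that workaround (host names and paths are examples for a default package install):
# with Cassandra stopped on the new node, copy system_auth from an existing node
rsync -av user@existing-node:/var/lib/cassandra/data/system_auth/ /var/lib/cassandra/data/system_auth/
# start Cassandra, let the node bootstrap, then reconcile the copied replicas
nodetool repair system_auth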

How to reuse a cassandra node after decommission?

I had a cluster with 2 nodes (node 1 and node 2).
After decommissioning node 2 I wanted to use the server as a fresh Cassandra database for other purposes, but as soon as I restart this message appears:
org.apache.cassandra.exceptions.ConfigurationException: This node was
decommissioned and will not rejoin the ring unless
cassandra.override_decommission=true has been set, or all existing
data is removed and the node is bootstrapped again
So I removed all existing data.
But I don't want the node to be bootstrapped again (nor to rejoin the previous ring); I want it to be a fresh, clean Cassandra database to be used on its own.
The old node is not on the seed list.
Cassandra version: 3.9
EDIT: I think I was misunderstood, sorry for that. After the decommission I want to have:
Db1: node 1
Db2: node 2
Two different databases with no correlation, totally separated. That's because we want to reuse the machine where node 2 is hosted to deploy a Cassandra DB in another environment.
Don't use override_decommission. That flag is only meant for rejoining the same cluster.
You should remove all data files on the node (Cassandra will recreate the system tables on start). Most importantly, you need to change the seed in cassandra.yaml. I suspect it is still the IP of node 1, so you need to change it to node 2 (itself).
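A minimal sketch of that, assuming a default package install (paths and the service command may differ on your system):
# on node 2, with Cassandra stopped
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
# in cassandra.yaml, make node 2 its own seed, e.g.:
#   seed_provider:
#       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
#         parameters:
#             - seeds: "<node2_ip>"
sudo systemctl start cassandra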
Use the option cassandra.override_decommission=true (it is a startup flag, passed as -Dcassandra.override_decommission=true). Also, be aware of what cluster_name in cassandra.yaml means:
The name of the cluster. This setting prevents nodes in one logical
cluster from joining another. All nodes in a cluster must have the
same value.
So, to be sure, also use a different value for the cluster_name option in cassandra.yaml.
Try these steps (a shell sketch follows the list):
run in cqlsh: UPDATE system.local SET cluster_name = 'new_name' WHERE key = 'local';
run nodetool flush in order to persist the change
run nodetool decommission
stop the node
change cluster_name in cassandra.yaml
clean the node with sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/* (though I would just move those files somewhere else until you reach the state you want)
start the node
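The same steps as a shell sketch (the cluster name and paths are placeholders; adjust the service commands to your install):
cqlsh -e "UPDATE system.local SET cluster_name = 'new_name' WHERE key = 'local';"
nodetool flush system                                  # persist the change to disk
nodetool decommission
sudo systemctl stop cassandra
# edit cluster_name in cassandra.yaml to 'new_name', then set the old data aside
sudo mkdir -p /var/lib/cassandra.bak
sudo mv /var/lib/cassandra/* /var/lib/cassandra.bak/   # safer than rm -rf until you have the state you want
sudo systemctl start cassandra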

Cassandra new node bootstrapping while it should not

I'm adding a new datacenter to an existing cluster and I'm following this "procedure".
However the first node I start is apparently bootstrapping: the load information from nodetool status keeps growing...
I added
auto_bootstrap: false
in cassandra.yaml.
Am I missing something?
By adding auto_bootstrap: false, you tell the node not to bootstrap - that does NOT tell it not to take any writes. What are the replication settings for the new datacenter? Did you already enable it in the various keyspaces? If so, it will participate in writes.
When you say you see the load increasing, is it streaming? Do you see files being transferred in nodetool netstats? Is the node Up/Normal or Up/Joining?
Also check your cassandra-rackdc.properties: don't forget to change the DC name (dc=DC1) in that file if you are creating a new datacenter, otherwise the new node will be considered part of the same datacenter.
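A quick way to check both points (the DC and rack names below are just examples):
nodetool netstats    # files being streamed => the node is bootstrapping/rebuilding, not just taking writes
nodetool status      # UJ = Up/Joining (bootstrapping), UN = Up/Normal
# cassandra-rackdc.properties on the new node should name the NEW datacenter, e.g.:
#   dc=DC2
#   rack=RAC1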

Adding a new node to existing cluster

Is it possible to add a new node to an existing cluster in cassandra 1.2 without running nodetool cleanup on each individual node once data has been added?
It probably isn't, but I need to ask because I'm trying to create an application where each user's machine is a server, allowing for endless scaling.
Any advice would be appreciated.
Yes, it is possible. But you should be aware of the side effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys count against the allocated data for that node, which can cause the auto-bootstrap process for the next node to not properly balance the ring. So depending on how you are bringing new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on nodes that lost token ranges to the new node - i.e. adjacent nodes, not all nodes in the cluster.
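As a sketch, once the new node has finished joining (the keyspace name is a placeholder):
# run on each node that lost token ranges to the new node (its neighbours), one at a time
nodetool cleanup
nodetool cleanup my_keyspace   # or limit it to a single keyspace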
