Scale Cassandra by copying data manually

I created an AMI from my Cassandra machine and then launched a new instance. After making config changes (setting the seed node to the first one, and setting auto_bootstrap: false), when I start Cassandra and run nodetool status, it shows data on both nodes. I just want to know whether the cluster actually knows that both nodes have the data, and whether it can route a request to the second node as well.
Without manually copying the data, streaming never actually completes. It fails after a certain period of time, and then I have to run 'nodetool bootstrap resume' to restart the bootstrap process, which fails again.

I don't think this should work this way (all the manual copying).
Why can't you perform normal bootstrapping? What error messages appear in the logs when you try? What is the replication factor (RF) of your keyspace?
In addition to your data, Cassandra also stores information about the node itself on disk in the system tables, for example the node ID, so you can't just replicate the image. If you copied the Cassandra image and only changed the config, this won't work; you should delete all data before starting the node and joining it to the cluster.
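For example (a minimal check, assuming cqlsh is available on the node), you can see the identity a cloned image carries over by querying the system tables:

    cqlsh -e "SELECT host_id, tokens FROM system.local;"

Two nodes reporting the same host_id would confirm the image was cloned without wiping the system tables first.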
EDIT:
If you are going with auto_bootstrap: false:
1. Remove all the data from the new server (both the data and commit log directories).
2. Start the node, and after it joins, run nodetool rebuild.
3. Run repair after the process has finished.
If you are going with auto_bootstrap: true:
1. Remove all the data from the new server (both the data and commit log directories).
2. Start the node and monitor the bootstrapping.
Before trying either procedure, remove the node you couldn't add from the cluster (see the command sketch below).
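A rough command sketch of both procedures (assuming default package paths; the host ID comes from nodetool status):

    # remove the half-joined node first
    nodetool removenode <host-id>

    # on the new node: wipe data and commit log
    sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*

    # auto_bootstrap: false path
    sudo service cassandra start
    nodetool rebuild
    nodetool repair

    # auto_bootstrap: true path
    sudo service cassandra start
    nodetool netstats    # monitor streaming progress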

Related

Cassandra cluster - Migrating all hosts in cluster

I am using Cassandra (3.5) with 20 nodes: data center 1 with 10 nodes and data center 2 with 10 nodes, holding a huge amount of data. All of these hosts are legacy machines, and we now have newer-generation hosts, call them generation-2.
I have tried adding new nodes and decommissioning old nodes, but this is time consuming.
Q1: How can I migrate all hosts from the legacy hosts to the generation-2 hosts? What is the best approach for that?
Q2: What would the rollback strategy be?
Q3: Finally, how can I validate the data once I have migrated to the generation-2 hosts?
If you are just replacing the nodes with newer hardware, keeping the same number of nodes, then it's simple (these operations should be done on every node):
1. Prepare the new installation on every node, with configuration identical to the existing nodes but with different IP addresses; don't start the nodes yet.
2. (optional) Disable autocompaction with nodetool disableautocompaction - this can help step 5 execute faster.
3. Copy the data from the old node to the new node using rsync (this could take a long time).
4. Execute nodetool drain and stop the old node.
5. Use rsync to synchronize the changes that happened since the initial copy (this should be relatively fast).
6. Make sure that the old node won't start again (for example, remove the Cassandra package) - otherwise it could cause chaos.
7. Start the new node.
This works because a Cassandra node is identified by a UUID stored in the local system table, so changing the IP doesn't affect operations.
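A hedged sketch of steps 2-6 on one old node, assuming the default data path and a hypothetical new-host address of 10.0.0.2:

    nodetool disableautocompaction                   # step 2 (optional)
    rsync -avh /var/lib/cassandra/data/ 10.0.0.2:/var/lib/cassandra/data/          # step 3, node still live
    nodetool drain && sudo service cassandra stop    # step 4
    rsync -avh --delete /var/lib/cassandra/data/ 10.0.0.2:/var/lib/cassandra/data/ # step 5, catch-up pass
    sudo apt-get remove cassandra                    # step 6: make sure it can't restart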
P.S. In the future, if you need to replace a node that has completely died (rather than a planned migration like this), use the node replacement procedure; that way you won't stream the data twice, as happens when you decommission and then re-add a node.

Adding new node to Cassandra cluster

I have a 4-node cluster and will be adding an additional node in two days. We aren't using vnodes.
Just wondering about the best way to rebalance the cluster after I'm done. Do I just bring the new node up and then run nodetool move?
Or do I shut each node down, change the initial_token value for each one (using one of those generators to calculate the values for me), and then bring each node back up?
I just want to know the simplest way to do this from the command line. The new node already has Cassandra installed, as it was initially a non-production server. I will delete the data off the node and change the config files for the new cluster it will now be part of; I'm just unsure about the other steps.
From the page Adding or replacing single-token nodes, the simplest mechanism is to start the new node with its initial_token left empty in cassandra.yaml. This makes the cluster 'split the token range of the heaviest loaded node and position the new node there'. However, this won't give you a balanced cluster.
If you want a balanced cluster, then you have to go through the nodetool move, node restart, and nodetool cleanup procedure you mentioned.
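A sketch of that rebalancing pass, assuming the default Murmur3Partitioner (the token math differs for RandomPartitioner) and a resulting 5-node ring:

    # compute 5 evenly spaced Murmur3 tokens
    python -c "print('\n'.join(str(i * (2**64 // 5) - 2**63) for i in range(5)))"
    # then, one node at a time (note the '--' before a negative token):
    nodetool move -- <new_token>
    # once every node has its new token, run on each node:
    nodetool cleanup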

Cassandra cluster old data is not replicated in new node

I installed Apache Cassandra on my local system for testing purposes. With one system (one node) I was able to read, write, and query the database. I added another node and created a cluster. Now data that I write on my system is replicated to the other node and vice versa, but the data that was present on my system prior to the addition of the new node is not replicated. The keyspaces and tables are present on the new node, but they are empty. Did I do something wrong while adding the new node to the cluster?
My best guess is that you have auto_bootstrap turned off (it is ON by default). From the documentation:
auto_bootstrap
(Default: true) This setting has been removed from the default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.
The easiest way to fix this is to run a repair on the node which will ensure that it gets any data it's missing.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Simply run this on the new node if the data on it is not up to date:
Run $CASSANDRA_HOME/bin/nodetool rebuild
Log in with cqlsh and verify that the new node has received the data.
Also look at the documentation below:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
If you have the correct seeds and replication settings but the data is still not being replicated to the new data center, performing nodetool repair with the -full option will replicate the data to the new data center's nodes.
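A minimal sketch on the new node, with a hypothetical keyspace name and source data center:

    nodetool rebuild -- DC1             # stream pre-existing data from the original data center
    nodetool repair -full my_keyspace   # then make sure replicas are consistent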

Cassandra ring configuration

I am trying to connect Apache Cassandra nodes into a ring. They are not DataStax versions, but Cassandra 1.2.8 from the Apache website. When trying to add one as the seed of the other, I get the following exception:
Unable to find compaction strategy class 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
Before that, I changed the listen_address and rpc_address to the local IP address of each node. Next, I added one node's IP as a seed to the other node. The nodes start up and the exception is printed, but both nodes run fine until a restart. After restarting either node, the exception is printed and the nodes do not run.
This is very strange - I do not have any DSE components.
Did you previously use any DSE components? If you did and are using the same data directory on any of your nodes, Cassandra may be finding old column families that were created with this compaction strategy. If there is no data you want to keep in the data directories on your nodes, you should clear them: stop all nodes, delete the directories, then start the nodes again.
Alternatively, if you have any DSE nodes still up, they may be joining the new cluster and propagating their schema, thereby creating column families with this compaction strategy. You can find out by looking in the logs to see which nodes try to connect. If any aren't from your 1.2.8 ring, then this is probably the cause.
That error means you either had a DSE Analytics node in your ring at some point, or you restored your schema from somewhere that had an Analytics node.
I would check whether the folder /etc/dse/ exists on your VM; that would mean DSE was installed there.
To just wipe the node and start from scratch schema-wise, you can stop the node, remove the system/schema_* folders under the data directory, then start the node. When it starts it will have no schema. Re-create any keyspaces/column families you had before, and they will be read from disk.
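A sketch of that wipe, assuming the default packaged data directory:

    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/system/schema_*
    sudo service cassandra start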

Adding a new node to existing cluster

Is it possible to add a new node to an existing cluster in Cassandra 1.2 without running nodetool cleanup on each individual node once the data has been added?
It probably isn't, but I need to ask, because I'm trying to create an application where each user's machine is a server, allowing for endless scaling.
Any advice would be appreciated.
Yes, it is possible, but you should be aware of the side effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys count against the allocated data for that node, which can cause the auto-bootstrap process for the next node to not balance the ring properly. So depending on how you bring new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on nodes that lost token ranges to the new node - i.e., adjacent nodes, not every node in the cluster.
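For example (the keyspace name is hypothetical), after the new node has finished joining:

    nodetool cleanup               # run on each neighbour that lost ranges; covers all keyspaces
    nodetool cleanup my_keyspace   # or limit the pass to a single keyspace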
