Removing DC from multi-DC cluster in Cassandra

I have a two-datacenter site (dc1 and dc2). I am writing to dc1 with a replication factor of 3 in each DC (dc1:3, dc2:3). dc2 is a backup site taking no traffic. I upgraded all the nodes of dc2 to C* version 3.11.2; the nodes of dc1 are on C* version 2.1.16. Now, due to some issue, I have to roll back my upgrade. I have two options:
Restore the complete site (dc1 and dc2) from a data backup - this would cause a lot of data loss.
Remove dc2 from dc1 using the steps given here.
Is there any issue with removing a site (dc2) when the C* versions are mixed?

If it were me, I would:
Take DC2 out of replication (see the sketch below).
Shutdown nodes on DC2.
Remove the nodes/assassinate them.
Uninstall C* completely.
Wipe the nodes of all data/logs/configuration.
Install C* and reconfigure.
Add nodes to a new DC.
This way there's no data loss from having to restore from backups. Cheers!
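For that first step, a minimal sketch of taking DC2 out of replication, assuming a keyspace named my_ks (a placeholder) and the dc1/dc2 names from the question; repeat for every application keyspace that currently replicates to dc2:

ALTER KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

With dc2 dropped from the replication map, new writes no longer go to it and its nodes can be shut down and removed.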

Yes, the second option seems good, and you can recover your data safely. You should remove the DC2 datacenter from your existing cluster. Since there is no traffic on DC2, it should be easy to perform the removal and re-addition operations.
You need to follow the steps below.
Change the replication factor of the keyspaces to exclude DC2.
Stop the Cassandra services on DC2.
You can remove the nodes from the existing cluster via the nodetool removenode command; if that creates an issue you can use assassinate (see the command sketch after these steps).
Once the nodes have been removed from the cluster one by one, you need to uninstall Cassandra on them.
Completely remove the existing data on the removed nodes.
Then install fresh Cassandra there based on the previous configuration; you can refer to the config files from the existing cluster, or to the backup you took of your 2.1.16 config.
Now you need to add your datacenter to the cluster again.
In this way, you can easily get your datacenter and data back quickly.
You can refer to the documentation here if there is any confusion about the addition:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToClusterDesigDC.html
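As a rough sketch of the removal itself (the host ID and IP below are placeholders - take the real values from nodetool status, and run the commands from a live dc1 node after the dc2 nodes have been stopped):

nodetool status                        # note the Host IDs of the down (DN) dc2 nodes
nodetool removenode <dc2-node-host-id>
nodetool assassinate <dc2-node-ip>     # only if removenode hangs or fails

Repeat for each dc2 node until nodetool status shows only the dc1 nodes.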

Related

Cassandra Cluster Migration Issues

Now:
Single Node
cassandra 3.11.3 + kairosdb 1.2
Two data storage paths
/data/cassandra/data/kairosdb 4T Old data
/data1/cassandra/data/kaiosdb 1.1T Now writing data
Target:
Three Node
cassandra 3.11.3 + kairosdb 1.2
One data storage path
/data/cassandra/data/kairosdb
In this case, how do I migrate the data in the two data directories under a single node to a three-node cluster, where each node of the three-node cluster has only one data directory?
I understand how to do it (and have practiced it) when migrating a single node to a three-node cluster, but only when there is a single data directory. For migrating 2 data directories down to 1, I have searched the Internet for a long time, but there is no reference material.
Data directories are something that the individual Cassandra node cares about, but the cluster doesn't.
Usually you'd want all nodes to share the same configuration, but for replication it really doesn't matter where the SSTables are on disk on each node.
So migrating here would be the same as you've practiced.
That said, the process I'd choose would be to add the new nodes as a second DC with the right replication, run a repair to get all the data in sync, and then decommission the original node.
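A hedged sketch of that approach - the DC names (dc_old, dc_new) are placeholders, the keyspace name kairosdb is assumed from the data paths above, and the single data directory is the one from the target layout; nodetool rebuild is the usual way to stream data into a new DC, and the repair then verifies it, as the answer suggests:

# cassandra.yaml on each of the three new nodes (one data directory only)
data_file_directories:
    - /data/cassandra/data

# replicate the keyspace to the new DC
ALTER KEYSPACE kairosdb WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 1, 'dc_new': 3};

# on each new node: stream the data over, then verify with a repair
nodetool rebuild -- dc_old
nodetool repair -full

# finally, on the original node
nodetool decommission
# then drop dc_old from the replication map with another ALTER KEYSPACE

Each node reads whichever data directories its own cassandra.yaml lists, so the old node can keep its two directories right up until it is decommissioned.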

Add datacenter with one node to back up existing one

I already have a working datacenter with 3 nodes (replication factor 2). I want to add another datacenter with only one node to hold a backup of all the data from the existing datacenter. The final solution:
dc1: 3 nodes (RF 2)
dc2: 1 node (RF 1)
My application would then connect only to dc1 nodes and send data. If dc1 breaks down I can recover the data from dc2, which is on another physical machine in a different location. I could also use dc2 for AI queries or some other tasks. I'm a newbie when it comes to Cassandra configuration, so I want to know whether I'm making some kind of mistake in my thinking. I'm planning on using this documentation to add the new DC: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAddDCToCluster.html
Is there anything more I should keep in mind to get this to work, or some easier solution for having a data backup?
Update: It won't only be a backup; we want to use this second DC for connecting the application when dc1 is unavailable (e.g. a power outage).
Update: dc2 is running. I had some problems with copying data from one DC to the other, and nodetool status didn't show 2 DCs, but after fixing the firewall rules for port 7000 I managed to connect both DCs and share data between them.
With this approach, your single node will get 2 times more traffic than the other nodes. Also, it may add load to the nodes in dc1 because they will need to collect hints, etc., when the node in dc2 is not available. If you need just a backup, set up something like Medusa and store the data in a cheap environment such as S3 - but of course, it will take time to restore if you lose the whole DC.
But in reality, you need to think about your high-availability strategy - what will happen to your clients if you lose the primary DC? Is it acceptable to wait until recovery, or do you really require full fault tolerance? I recommend reading the Designing Fault-Tolerant Applications with DataStax and Apache Cassandra whitepaper from DataStax - it explains the details of designing really fault-tolerant applications.
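If you do go the two-DC route, a minimal sketch of the keyspace change, assuming a keyspace named my_ks (a placeholder) and the dc1/dc2 names from the question:

ALTER KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 1};

# then, on the new dc2 node, stream the existing data over from dc1
nodetool rebuild -- dc1

Keep the application on a DC-aware load-balancing policy with LOCAL_* consistency levels so that normal traffic stays on dc1.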

Cassandra repair after datacenter went down

I have a Cassandra DB (version 3.11.2) running in AWS, with 2 datacenters - each in a different AWS region - and 3 nodes in each one.
The replication factor on all keyspaces is 3, so there is full replication of the data on every node. The size of the data is about 10GB per node.
All of our writes use LOCAL_QUORUM against one DC (let's call it DC1). Basically the other DC is just a kind of backup for disaster recovery: in case the AWS region for DC1 becomes unavailable, we will redirect traffic to DC2.
My issue is that we had a network disconnection between the two DCs for several hours, and after several days we noticed that there is missing data in DC2. This all makes sense, since the time the DCs were apart was longer than the hinted handoff window (3 hours). So we need to run a repair to bring DC2 back in sync with DC1.
I went over the Cassandra docs and read countless SO answers, and for the life of me I couldn't understand what the right repair to run is...
Do I need to issue 'nodetool repair --full --sequential' from only one node? Do I need to run it on every node in the cluster? Or maybe it's better to run 'nodetool rebuild'?
Executing nodetool repair on the nodes in datacenter2 should be able to bring the data back in sync, but depending on the size of the data affected, this may be a task that takes time and resources. If datacenter2 is only a backup for disaster-recovery purposes, it may be easier and quicker to back up the current dc1 cluster and restore it in the second datacenter (more information is available here).
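Concretely, a sketch of what that could look like, run on each node in datacenter2 one node at a time (DC1 is the name used in the question; both commands exist in 3.11's nodetool):

nodetool repair -full          # full (non-incremental) repair; add -seq for sequential
# or, to re-stream the keyspace data from the healthy DC instead:
nodetool rebuild -- DC1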

Nodetool rebuild not working reliably on Cassandra 3.11.3

I presently have a Cassandra 3.11.3 cluster with a single DC. I recently added another DC to my cluster, and I followed the instructions at
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
As per the instructions I ran 'nodetool rebuild -ks -- dc1' on each of the nodes. However, this rebuild command did not actually work as intended. My data is partially missing on the new nodes. I know this because I sampled the data in the new DC through my app using consistency LOCAL_ONE. I don't see the data replenished through read repair either. Oh, and I should mention that there were no errors in the logs following the rebuild command, so everything appeared to have succeeded.
What am I missing here? Is there a known issue reported on this?
You should run nodetool rebuild -- <existing DC> on each node. This command will pull all keyspace data from the existing data center based on the allocated tokens and RF. To ensure consistency, please run a full repair on the nodes as well.
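Assuming the original datacenter is named dc1 (as in the question), the commands would look roughly like this on each node of the new DC:

nodetool rebuild -- dc1        # note the space: '--' ends the options, dc1 is the source DC
nodetool repair -full          # afterwards, to make sure every replica is consistent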

Way to determine healthy Cassandra cluster?

I've been tasked with re-writing some sub-par Ansible playbooks to stand up a Cassandra cluster on CentOS. Quite frankly, there doesn't seem to be much information on Cassandra out there.
I've managed to get the service running on all three nodes at the same time, using the following configuration file (info scrubbed).
HOSTIP=10.0.0.1
MSIP=10.10.10.10
ADMIN_EMAIL=my@email.com
LICENSE_FILE=/tmp/license.conf
USE_LDAP_REMOTE_HOST=n
ENABLE_AX=y
MP_POD=gateway
REGION=test-1
USE_ZK_CLUSTER=y
ZK_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
ZK_CLIENT_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
USE_CASS_CLUSTER=y
CASS_HOSTS="10.0.0.1:1,1 10.0.0.2:1,1 10.0.0.3:1,1"
CASS_USERNAME=test
CASS_PASSWORD=test
The HOSTIP changes depending on which node the configuration file is on.
The problem is, when I run nodetool ring, each node says there are only two nodes in the cluster: itself and one other, seemingly at random from the other two.
What are some basic sanity checks to determine a "healthy" Cassandra cluster? Why does nodetool say each node thinks a different node is missing from the cluster?
nodetool status - overview of the cluster (load, state, ownership)
nodetool info - more granular details at the node-level
As for the node mismatch I would check the following:
cassandra-topology.properties - identical across the cluster (all 3 IPs listed)
cassandra.yaml - I typically keep this file the same across all nodes. The parameters that MUST stay the same across the cluster are: cluster_name, seeds, partitioner, and snitch.
verify all nodes can reach each other (ping, telnet, etc)
DataStax (a Cassandra vendor) has some good documentation. Please note that some features are only available in DataStax Enterprise -
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
Also check out the Apache Cassandra site -
http://cassandra.apache.org/community/
As well as the user forums -
https://www.mail-archive.com/user@cassandra.apache.org/
Actually, the thing you really want to check is whether all the nodes AGREE on the schema_id. nodetool status shows whether nodes are up, down, or joining, yet that does not really mean the cluster is 'healthy' enough to make schema changes or other changes.
The simplest way is:
nodetool describecluster
Cluster Information:
Name: FooBarCluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2, 10.136.2.3]
If the schema IDs do not match, you need to wait for the schema to settle, or run repairs - for example if you see something like this:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2]
43fe9177-382c-327e-904a-c8353a9dxxxx: [10.136.2.3]
However, running nodetool is 'heavy' and hard to parse.
The information is inside the database; you can check it with:
'SELECT schema_version, release_version FROM system.local' and
'SELECT peer, schema_version, release_version FROM system.peers'
Then you compare schema_version across all nodes... if they match, the cluster is very likely healthy. You should ALWAYS check this before making any changes to the schema.
Now, during a rolling upgrade, when changing engine versions, the release_version is different, so to support automatic rolling upgrades you need to check that schema_id matches within each release_version separately.
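For example, a quick way to run those two queries from the shell with cqlsh (the node address is a placeholder - any live node will do, since system.peers reports the versions of the other nodes):

cqlsh <node-ip> -e "SELECT schema_version, release_version FROM system.local;"
cqlsh <node-ip> -e "SELECT peer, schema_version, release_version FROM system.peers;"

If the schema_version in system.local and in every row of system.peers is the same UUID, the nodes agree on the schema.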
I'm not sure of all the problems you might be having, but...
Check the cassandra.yaml file (a sketch of the relevant settings is at the end of this answer). You need a minimum of 3 things to be the same - the seeds: list (but do not list all nodes as seeds!), cluster_name, and the snitch. Make sure your listen_address is correct.
If you are using GossipingPropertyFileSnitch then check the cassandra-topology.properties and/or cassandra-rackdc.properties files for accuracy.
Don't start all the nodes at the same time. Start the seed nodes first - the other nodes will "gossip" with the seed nodes to learn the cluster topology. Shut down the seed nodes last.
Don't use shared storage. That defeats the purpose of distributed data and is considered a Cassandra anti-pattern.
If you're in AWS, don't use auto-scaling groups unless you know what you're doing.
Once you've done all that, use nodetool status | ring | info or JMX to see what the cluster is doing.
DataStax does have decent documentation for Cassandra.
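To make the cassandra.yaml point concrete, here is a hedged sketch of the settings that have to agree across the three nodes (the cluster name is a placeholder; the IPs are the scrubbed ones from the question; only listen_address differs per node):

cluster_name: 'MyCluster'                      # identical on every node
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"         # same seed list everywhere; not every node needs to be a seed
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
endpoint_snitch: GossipingPropertyFileSnitch   # same snitch on every node
listen_address: 10.0.0.1                       # per-node: this node's own IP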
