In the official manual, they don't recommend failback to remote DC if local DC is down. It might cause many unforeseen problems.
https://docs.datastax.com/en/developer/java-driver/4.9/manual/core/load_balancing/#local-only
Let's say I have 2 clusters: one is the main cluster and the other is the backup cluster, A and B respectively and they are in the same data center.
I want to implement a feature: if A cluster is down, switch to B cluster.
Is it OK to use DCAwareRoundRobinPolicy?
If I use DCAwareRoundRobinPolicy, does it cause any issue?
Related
As you know, with Cassandra, when nodes are overloaded, it may seriously hurt your production depending on required consistency, because nodes might become unresponsive, the entire daemon might also crash, hints might fill-up your data mount point, and so on.
So the keyword here is back-pressure.
To do appropriate back-pressure with Spark on Cassandra, there are especially the following properties :
--conf "spark.cassandra.output.throughputMBPerSec=2"
--total-executor-cores 24
(There are also similar back-pressure options with Datastax driver, or cqlsh. You basically limit the throughput per core, to apply some back-pressure)
Let say, I found my global write throuput on my Cassandra cluster, and I set appropriate settings for my application1 that works fine.
BUT still, the challenge, is that there are many developers on a Cassandra cluster. So at a given time, I may have Spark application1, application2, application3, ... that runs concurrently.
Question : What are my options to ensure that the write troughput (no matter how many applications runs concurrently) at a given time is globally NOT going to reach too much pressure for Cassandra, thus hurting my production workload ?
Thank you
What I recommend folks do to separate analytical workloads, is to spin-up another (logical) data center. Sure, it could be in the same physical data center. But what you want is separate compute and storage to keep the analytics load from interfering with the production traffic.
First, make sure that you're running with the GossipingPropertyFileSnitch (cassandra.yaml) and that your keyspaces are using the NetworkTopologyStrategy. Likewise, you'll want to make sure that your keyspace definition contains a named data center and that your production application/services are configured to use that data center (ex: dc1 as below) as their default DC:
ALTER KEYSPACE product_data WITH
REPLICATION={'class':'NetworkTopologyStrategy',
'dc1':'3'};
Once the new infra is up, install Cassandra and join the nodes to the cluster as a new DC by specifying the new name in the cassandra-rackdc.properties file. Something like:
dc=dc1_analytics
Next adjust your keyspace(s) to replicate data to the new DC.
ALTER KEYSPACE product_data WITH
REPLICATION={'class':'NetworkTopologyStrategy',
'dc1':'3','dc1_analyitcs':'3'};
Run a repair/rebuild on the new DC, and then configure the Spark jobs to only use dc1_analytics.
I already have a working datacenter with 3 nodes (replication factor 2). I want to add another datacenter with only one node to have all backup data from existing datacenter. The final solution:
dc1: 3 nodes (2 rf)
dc2: 1 node (1 rf)
My application would then connect only to dc1 nodes and send data. If dc1 breaks down I can recover data from dc2 which is on the other physical machine in different location. I could also use dc2 for AI queries or some other task. I'm a newbie in case of cassandra configuration so I want to know if I'm not making some kind of a mistake in my thinking. I'm planing on using this configuration docs to add new dc: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAddDCToCluster.html
Is there anything more I should keep in mind to get this to work or some easier solution to have data backup?
Update: It won't only be a backup, we wont to use this second DC for connecting application also when dc1 would be unavailable (ex. power outage).
Update: dc2 is running, I had some problems with coping data from one dc to other and nodetool status didn't show 2 dc's but after fixing firewall rules for port 7000 I managed to connect both dc's and share data between them.
with this approach, your single node will get 2 times more traffic than other nodes. Also, it may add a load to the nodes in dc1 because they will need to collect hints, etc. when node in dc2 is not available. If you need just backup, setup something like medusa, and store data in the cheap environment, like, S3 - but of course, it will require time to restore if you lose the whole DC.
But in reality, you need to think about your high-availability strategy - what will happen with your clients if you lose the primary DC? Is it critical to wait until recovery, or you're really requiring full fault tolerance? I recommend to read the Designing Fault-Tolerant Applications with DataStax and Apache Cassandra™ whitepaper from DataStax - it explains the details of designing really fault tolerant applications.
I am a newbie in Cassandra.
In our production environment three node Cassandra clusters are running and serving production traffic but I have below mentioned doubts:-
1) All nodes are configured in different racks i.e rack 1, rack 2 and rack 3 in the same dc. Is it fine or does this configuration have some drawbacks?
2) We are using rf2 and network topology for all the keyspaces except system tables and these system tables are configured with rf2 and simplestrategy ..is it fine or does this need to be changed? should we increase the replication factor of system_auth? ..please let me know..
3) Now I want to add another node in the same dc, what will be the best procedure to do the same without impacting the live traffic?
Cassandra version is Apache cassandra 3.11.
Thanks in advance..
Ans 1) It seems good to have Cassandra nodes in different racks for availability and fault tolerance .
Ans 2) You must increase RF on system_auth so that you can avoid cqlsh login issue from other nodes.
Ans 3) You can add new node without affecting the live traffic on existing cluster. please follow below procedure.
http://cassandra.apache.org/doc/latest/operating/topo_changes.html
Cassandra is designed as a distributed system. Cassandra’s distributed architecture is specifically tailored for multiple-data center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy for fail-over and disaster recovery.
Multiple data center deployments are excellent for global solutions where in some applications are operational in one region and other applications in another region and yet using a single cluster of Cassandra which is working in multiple data centers across regions.
For single region applications, still having multiple data-centers is preferred option because it provides disaster recovery even in case one region goes down.
Ans 1) For a single DC Cassandra cluster , recommendation is to have 4 nodes with RF3. Rack 1 with 2 nodes and Rack 2 with 2 nodes. Remember that nodes in the same rack have faster network than nodes in different racks. With two nodes on the same Rack, queries with LOCAL_QUORUM will be faster as compared to queries on a cluster with all nodes on different racks.
If you are not concerned with the query latency , all nodes in different racks (3 racks) will give better disaster recovery as compared with two RACK deployment. Having said that, it's always recommended to use multi DC deployments for production cluster.
Ans 2) It’s always recommended to increase the replication factor of System_auth keyspace and change the replication class to NetworkTopologyStrategy. Please follow this documentation for more details https://docs.datastax.com/en/security/6.0/security/secSystemKeyspace.html
Ans 3) Yes, You can add a new node to existing cluster with ease without impacting the traffic. Please follow this documentation for more details: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html
Currently, we are facing a problem, suppose that we have GCP and Azure as our cloud solution, we want to set up the cassandra cluster like this,
3 nodes on GCP as the cluster, data is stored by setting replicators
1 node on Azure(only hot copy) to keep all data for 3 nodes on GCP.
Is this doable?
Yes, it should be possible declaring them as two different datacenters. Cassandra allows you to have different replication factors depending on the DC used:
replication = {'class': 'NetworkTopologyStrategy', 'gcp1': 3, 'azure1': 1};
This is a use case similar to the one explained here
There are gotchas, first, you need to ensure that the communication between the datacenters can be established; as the traffic will go through the internet, it will be highly recommendable that SSL encryption is enabled for the communication between datacenters.
The gcp1 datacenter should be set to use the GoogleCloudSnitch snitch, the definition of the topology is done in cassandra-rackdc.properties.
For the azure1 datacenter in Azure can be tricky as it should run in virtual machines, you can find a way to set this up here, note that the snitch in this DC will be GossipingPropertyFileSnitch, and the topology is defined in cassandra-topology.properties.
The instructions of how to add the second datacenter can be found here.
Is there a possibility to write to a particular node using datastax driver?
For example, I have three nodes in datacenter 1 and three nodes in datacenter 2.
Existing
If i build up the cluster with any one of them as seed, all the nodes will get detected by the datastax java driver. So, in this case, if i insert a data using driver, it will automatically choose one of the nodes and proceed with it as the co-ordinator(preferably local data center)
Requirement
I want a way to contact any node in datacenter 2 and hand over the co-ordinator job to one of the nodes in datacenter 2.
Why i need this
I am trying to use the trigger functionality from datacenter 2 alone. Since triggers are taken care by co-ordinator , i want a co-ordinator to be selected from datacenter 2 so that data center 1 doesnt have to do this operation.
You may be able to use the DCAwareRoundRobinPolicy load balancing policy to achieve this by creating the policy such that DC2 is considered the "local" DC.
Cluster.Builder builder = Cluster.builder().withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("dc2"));
In the above example, remote (non-DC2) nodes will be ignored.
There is also a new WhiteListPolicy in driver version 2.0.2 that wraps another load balancing policy and restricts the nodes to a specific list you provide.
Cluster.Builder builder = Cluster.builder().withLoadBalancingPolicy(new WhiteListPolicy(new DCAwareRoundRobinPolicy("dc2"), whiteList));
For multi-DC scenarios Cassandra provides EACH and LOCAL consistency levels where EACH will acknowledge successful operation in each DC and LOCAL only in local one.
If I understood correctly, what you are trying to achieve is DC failover in your application. This is not a good practice. Let's assume your application is hosted in DC1 alongside with Cassandra. If DC1 goes down, your entire application is unavailable. If DC2 goes down, your application still can write with LOCAL CL and C* will replicate changes when DC2 is back.
If you want to achieve HA, you need to deploy application in each DC, use CL=LOCAL_X and finally do failover on DNS level (e.g. using AWS Route53).
See data consistency docs and this blog post for more info about consistency levels for multiple DCs.