How do you configure Apache Cassandra for disaster recovery, so that it can survive the failure of one of two data-centres?
The DataStax documentation talks about using a replication strategy that ensures at least one replica is written to each of your two data-centres. But I don't see how that helps once the disaster has actually happened. If you switch to the remaining data-centre, all your writes will fail, because they will not be able to replicate to the other data-centre.
I guess you would want your software to operate in two modes: normal mode, for which writes must replicate across both data-centres, and disaster mode, for which they need not. But changing replication strategy does not seem possible.
What I really want is two over-provisioned data-centres: during normal operation the application uses the resources of both, and when only one data-centre is functioning it runs (with reduced performance) on the resources of that one.
The trick is to vary the consistency level given through the API for writes, instead of varying the replication factor. During normal operation use EACH_QUORUM for writes to ensure both data-centres have a copy of the data; during a disaster, when only one data-centre is available, switch to LOCAL_QUORUM. Reads can use LOCAL_QUORUM all the time.
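A minimal sketch of this with the DataStax Java driver (the contact point, keyspace, table, and the disasterMode flag are illustrative placeholders; how you flip the flag is up to your own operational tooling):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class DrWriteExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.10").build();
            Session session = cluster.connect("my_keyspace");

            // Normal mode: a quorum in *each* data-centre must acknowledge the write.
            // Disaster mode: only a quorum in the local data-centre is required.
            boolean disasterMode = false;
            SimpleStatement write = new SimpleStatement(
                    "INSERT INTO events (id, payload) VALUES ('e1', 'hello')");
            write.setConsistencyLevel(disasterMode ? ConsistencyLevel.LOCAL_QUORUM
                                                   : ConsistencyLevel.EACH_QUORUM);
            session.execute(write);

            // Reads can stay at LOCAL_QUORUM in both modes.
            SimpleStatement read = new SimpleStatement("SELECT * FROM events LIMIT 10");
            read.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(read);

            cluster.close();
        }
    }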
Here is a summary of the DataStax documentation for multiple data centers and the older, but still conceptually relevant, disaster recovery documentation (0.7).
Make a recipe that suits your needs from the two consistency levels LOCAL_QUORUM and EACH_QUORUM.
Here, “local” means local to a single data center, while “each” means consistency is strictly maintained at the same level in each data center.
Suppose you have two data centers, one used strictly for disaster recovery. You could then set the replication factor to 3 for the primary write/read data center (DC1) and 2 for the failover data center (DC2).
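A minimal sketch of that keyspace definition (the keyspace and data center names are placeholders and must match what your snitch reports), run here through the DataStax Java driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateDrKeyspace {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.10").build();
            Session session = cluster.connect();

            // NetworkTopologyStrategy takes a replication factor per data center:
            // 3 replicas in the primary DC, 2 in the disaster recovery DC.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH replication = " +
                "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}");

            cluster.close();
        }
    }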
Now, depending on how critical it is that your data actually reaches the disaster-recovery nodes, you can use either EACH_QUORUM or LOCAL_QUORUM. Assuming you are using the replica placement strategy NetworkTopologyStrategy (NTS):
LOCAL_QUORUM on writes only makes the client wait for a quorum of replicas in DC1; the write to your recovery node(s) in DC2 happens asynchronously.
EACH_QUORUM will ensure that all data is replicated but will delay writes until both DCs confirm successful operations.
For reads it's likely best to just use LOCAL_QUORUM to avoid inter-data center latency.
There are catches to this approach! If you choose to use EACH_QUORUM on your writes you increase the potential failure points (DC2 is down, DC1-DC2 link is down, DC1 quorum can't be met).
The bonus is that once DC1 goes down, you have a valid DC2 to recover from. Also note that the second link talks about custom snitch settings for mapping your nodes' IPs to the right data centers.
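As a hedged illustration of such a snitch mapping (the IP addresses and the data center/rack names are placeholders), a PropertyFileSnitch reads a cassandra-topology.properties file like this on every node, and NetworkTopologyStrategy then refers to the data center names it defines:

    # cassandra-topology.properties: <node IP>=<data center>:<rack>
    10.0.0.10=DC1:RAC1
    10.0.0.11=DC1:RAC1
    10.1.0.10=DC2:RAC1
    10.1.0.11=DC2:RAC1
    # fallback for nodes not listed above
    default=DC1:RAC1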
Related
We are evaluating Hazelcast for one of our use cases and I have a doubt regarding replication in Hazelcast.
It is mentioned in http://docs.hazelcast.org/docs/latest-development/manual/html/Distributed_Data_Structures/Map/Backing_Up_Maps.html that "Backup operations are synchronous, so when a map.put(key, value) returns, it is guaranteed that the map entry is replicated to one other member".
But in another page http://docs.hazelcast.org/docs/latest-development/manual/html/Consistency_and_Replication_Model.html, it is mentioned "Two types of backup replication are available: sync and async. Despite what their names imply, both types are still implementations of the lazy (async) replication model".
Both these statements look a bit contradictory. Can someone please shed some light on this?
Is replication in Hazelcast truly synchronous? I need to have the values updated in both the owner and backup nodes together.
The explanation here is the more correct one. In the context of the CAP theorem, Hazelcast is an AP product. Thus, replication aims for best-effort consistency, and both sync and async backups are implementations of the lazy (async) replication model. As explained on that page, the difference between the two options is:
with sync backups, the caller blocks until the backup updates are applied by the backup replicas and acknowledgments are sent back to the caller;
async backups work as fire-and-forget.
Below is the relevant part from the Hazelcast Reference Manual:
Hazelcast's replication technique enables Hazelcast clusters to offer high throughput. However, due to temporary situations in the system, such as network interruption, backup replicas can miss some updates and diverge from the primary. Backup replicas can also hit long GC pauses or VM pauses, and fall behind the primary, which is a situation called as replication lag. If a Hazelcast partition primary replica member crashes while there is a replication lag between itself and the backups, strong consistency of the data can be lost.
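To answer the practical question: if you need the backup to be updated before map.put() returns, configure synchronous backups on the map. A minimal sketch (the map name and backup counts are illustrative); note that, per the passage above, this still does not give strong consistency under failures:

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class SyncBackupExample {
        public static void main(String[] args) {
            Config config = new Config();
            // One synchronous backup: put() does not return until the backup replica
            // has acknowledged the update. No asynchronous (fire-and-forget) backups.
            config.getMapConfig("myMap")
                  .setBackupCount(1)
                  .setAsyncBackupCount(0);

            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            IMap<String, String> map = hz.getMap("myMap");
            map.put("key-1", "value-1"); // returns after owner and one backup are updated

            hz.shutdown();
        }
    }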
Is it possible to replicate Cassandra data to another server instance in order to run read-only data operations on it? We have explored SAN, but it turns out to be too expensive in hardware.
Note: I am not allowed to copy the data into files; it should work like mirroring of the data.
You can use Cassandra's internal replication for this. Use NetworkTopologyStrategy and configure a second datacenter to which the data will be replicated (its replication factor and node count can be lower than your production datacenter's). Then use this datacenter for your read-only workload and the other one for production.
Your application needs to be reconfigured to use LOCAL_QUORUM or another LOCAL consistency level so the second datacenter isn't used for requests.
This technique is used, for example, to separate resource-demanding analytics workloads from the rest.
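A minimal sketch of the client-side part, assuming the DataStax Java driver (the contact point, data center name "DC1", keyspace, and table are placeholders; the read-only application would do the same with its own data center's name):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class LocalDcClient {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.10")
                    // Route queries only to coordinators in the local data center...
                    .withLoadBalancingPolicy(new TokenAwarePolicy(
                            DCAwareRoundRobinPolicy.builder().withLocalDc("DC1").build()))
                    // ...and wait only for replicas in that data center.
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                    .build();

            Session session = cluster.connect("my_keyspace");
            session.execute("SELECT * FROM my_table LIMIT 10");
            cluster.close();
        }
    }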
I have a Cassandra cluster with 3 nodes and a replication factor of 2. What would happen if the entire Cassandra cluster goes down at the same time? How can reads and writes be managed in this situation, and what would be the best consistency level so that I can manage my Cassandra nodes for high availability? As of now I'm using QUORUM.
If your cluster is down on all nodes - it is down.
When you need HA, think of deploying more than one datacenter, so availability can be maintained even when an entire datacenter/rack goes down.
If you can live with stale data, you could use CL.ONE instead - you need only one node to respond.
More replicas also increase availability for CL.QUORUM - you need floor(RF/2)+1 of your replicas alive. With RF=2 that is 2/2+1 = 2, so all of your replicas need to be online. With RF=3 you still only need 2, as floor(3/2)+1 = 2 - now you can have one node down.
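A tiny sketch of that arithmetic (just the formula, no driver calls):

    public class QuorumMath {
        public static void main(String[] args) {
            // QUORUM needs floor(RF / 2) + 1 replicas; the remaining ones may be down.
            for (int rf : new int[] {2, 3, 5}) {
                int quorum = rf / 2 + 1; // integer division == floor
                System.out.printf("RF=%d: QUORUM=%d, replicas that may be down=%d%n",
                        rf, quorum, rf - quorum);
            }
            // RF=2 tolerates 0 replicas down, RF=3 tolerates 1, RF=5 tolerates 2.
        }
    }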
As for your writes - all acknowledged writes will be written to disk in the commitlog (if there is no caching issue on your disks) and restored when the nodes come back online. Of course there may be a race condition where the changes are written to disk but not acked via network.
Keep in mind to set up NTP!
My application crawls users' mailboxes and saves them to an RDBMS database. I started using Redis as a cache (simple key-value store) for the RDBMS database. But gradually I started storing crawler states and other data in Redis that needs to be persistent. Losing this data means a few hours of downtime. I must ensure airtight consistency for this data. The data should not be lost in node failures or split-brain scenarios. Strong consistency is a must. Sharding is done by my application. One Redis process runs on each of ten EC2 m4.large instances, and on each of these instances I am doing up to 20K IOPS to Redis. I am doing more writes than reads, though I have not determined the actual percentage of either. All my data is completely in memory, not backed by disk.
My only problem is that each of these instances is a SPOF. I cannot use Redis Cluster as it does not guarantee consistency. I have evaluated a few more tools like Aerospike; none gives a 'no data loss' guarantee.
Cassandra looks promising as I can tune the consistency level I want. I plan to use Cassandra with a replication factor of 2, and a write must be written to both replicas before it is considered committed. This gives a 'no data loss' guarantee.
By launching enough Cassandra nodes (SSD-backed), can I replace my Redis key-value store and still get similar read/write IOPS and latency? Will open-source Cassandra suffice for my use case? If not, will the DataStax Enterprise in-memory option solve it?
EDIT 1:
A bit of clarification:
I think I need to use write consistency level ALL and read consistency level ONE. I understand that with this consistency level my cluster will not tolerate any failure. That is OK for me; a few minutes of downtime occasionally is not a problem, as long as my data is consistent. In my present setup, one Redis instance failure causes a few hours of downtime.
I must ensure airtight consistency for this data.
Cassandra deals with failure better when there are more nodes. Assuming your case allows for having more nodes, this is my suggestion.
So, if you have 5 nodes and use a replication factor of 5, use a CL of QUORUM for both reads and writes. What that means is that you always write to at least 3 replicas and read from 3 replicas (with RF=5, QUORUM is 3).
This ensures a very high level of consistency, because any read quorum of 3 replicas overlaps any write quorum of 3 (3 + 3 > 5).
It also ensures limited downtime: even if a node is down, your writes and reads won't break.
If you use CL ALL, then even a single node being down or overloaded means you have to take a full outage.
I hope it helps!
I have a three node Cassandra cluster with version 2.0.5.
RF=3 and all data is synced to all three nodes.
I read from cqlsh with Consistency=ONE.
When I bring down two of the nodes, my reads are twice as fast as when I have the entire cluster up.
Tracing from cqlsh shows that the slowdown on reads with the full cluster up occurs when a request is forwarded to other nodes.
All nodes are local to the same datacenter and there is no other activity on the system.
So, why are requests sometimes forwarded to other nodes?
Even for the exact same key, if I repeat the same query multiple times, I see that sometimes the query executes on the local node and sometimes it gets forwarded and then becomes very slow.
Assuming that the cluster isn't overloaded, Cassandra should always prefer to do local reads when possible. Can you create a bug report at https://issues.apache.org/jira/browse/CASSANDRA ?
This is due to read repair.
By default, read repair is applied to every read at consistency level QUORUM, and with a 10% chance for lower consistency levels; that is why at consistency level ONE you sometimes see more activity and sometimes less.
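If you want to confirm that explanation, the read repair probability is a per-table option that you can lower and then re-run the traced query; a hedged sketch (keyspace and table names are placeholders, shown here through the Java driver, though the same ALTER TABLE works from cqlsh):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ReadRepairTuning {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Reduce the chance that a CL=ONE read also queries the other replicas in
            // the background; with both options at 0.0 the forwarding should disappear
            // (at the cost of relying on other mechanisms, e.g. repair, to stay in sync).
            session.execute("ALTER TABLE my_keyspace.my_table " +
                    "WITH read_repair_chance = 0.0 " +
                    "AND dclocal_read_repair_chance = 0.0");

            cluster.close();
        }
    }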