Cassandra datacenters replication advanced usage

For a project, we use a Cassandra cluster in order to get fast reads/writes on a large volume of (column-oriented) generated data.
Until now, we have only had one datacenter, for prototyping.
We now plan to split our cluster into two datacenters to meet performance requirements (data transfer between the two datacenters is quite slow):
datacenter #1: located near our data producer services, which periodically write all data to Cassandra at a high rate (each write has a "run_id" column in its primary key)
datacenter #2: located near our data consumer services, which intensively read all data produced by datacenter #1 for a given "run_id".
However, we would like our consumer services to access data only in the datacenter near them (datacenter #2), and only once all data for a given "run_id" has been completely replicated from datacenter #1 (the data generated by the producer services).
My question is: how can we ensure that all data has been replicated to datacenter #2 before telling the consumer services (near datacenter #2) to start using it?
Our best solutions so far (but still not good enough :-P):
producer services (datacenter #1) write at consistency "all". But this leads to poor tolerance of partition failures AND really bad write performance.
producer services (datacenter #1) write at consistency "local_quorum", and a final "run finished" value could be written at consistency "all" (sketched below). But it seems Cassandra does not guarantee replication ordering.
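Roughly, the second approach would look like this with the DataStax Python driver (table, column and host names below are made up for illustration):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Connect through a node in datacenter #1 (placeholder address and keyspace).
cluster = Cluster(['dc1-node-1'])
session = cluster.connect('my_keyspace')

run_id, item_id, value = 'run-42', 1, 3.14  # dummy example data

# Bulk producer writes: only wait for a quorum of replicas in datacenter #1.
insert_row = SimpleStatement(
    "INSERT INTO results (run_id, item_id, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(insert_row, (run_id, item_id, value))

# Final "run finished" marker: wait for every replica in both datacenters.
mark_finished = SimpleStatement(
    "INSERT INTO runs (run_id, finished) VALUES (%s, true)",
    consistency_level=ConsistencyLevel.ALL)
session.execute(mark_finished, (run_id,))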
Do you have any suggestion ?
Thanks a lot,
Fabrice

It seems there is no silver bullet for this issue.
We ended up using a single datacenter for our applications. We will use a second one, but only as a backup and possibly in a degraded mode.

Related

Cassandra Database replication to another instance

Is it possible to replicate Cassandra data to another server instance so that read-only operations can be run against it? We have explored a SAN, but it turned out to be too expensive in hardware.
Note: I am not allowed to copy the data into files, so it should work like mirroring the data.
You can use Cassandra's internal replication for this. Use NetworkTopologyStrategy and configure a second datacenter that data will be replicated to (it can be smaller than your production datacenter). Then use this datacenter for your read-only workload and the other one for production.
Your application needs to be reconfigured to use LOCAL_QUORUM or another LOCAL consistency level so that production requests never involve the second datacenter.
This technique is used, for example, to separate resource-demanding analytics workloads from the rest.
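As a rough sketch with the DataStax Python driver (datacenter names, contact point and keyspace are placeholders, not fixed names):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Production app: route requests only to its own datacenter and use a LOCAL
# consistency level, so the read-only datacenter never serves this traffic.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc_prod')),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
cluster = Cluster(['prod-node-1'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Replicate the keyspace to both datacenters; the read-only one can be smaller
# and hold fewer replicas.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'dc_prod': '3', 'dc_readonly': '2'}
""")

The read-only application would connect the same way, but with local_dc pointing at the second datacenter.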

Can a Cassandra cluster serve as a replacement for an in-memory Redis key-value store?

My application crawls users' mailboxes and saves the results to an RDBMS database. I started using Redis as a cache (simple key-value store) for the RDBMS database. But gradually I started storing crawler states and other data in Redis that needs to be persistent. Losing this data means a few hours of downtime. I must ensure airtight consistency for this data. The data should not be lost in node failures or split-brain scenarios. Strong consistency is a must. Sharding is done by my application. One Redis process runs on each of ten EC2 m4.large instances, and on each of these instances I am doing up to 20K IOPS to Redis. I am doing more writes than reads, though I have not determined the actual percentage of each. All my data is completely in memory, not backed by disk.
My only problem is that each of these instances is a SPOF. I cannot use Redis Cluster as it does not guarantee consistency. I have evaluated a few more tools, like Aerospike; none gives a 'no data loss' guarantee.
Cassandra looks promising, as I can tune the consistency level I want. I plan to use Cassandra with a replication factor of 2, where a write must be written to both replicas before it is considered committed. This gives a 'no data loss' guarantee.
By launching enough Cassandra nodes (SSD backed), can I replace my Redis key-value store and still get similar read/write IOPS and latency? Will open-source Cassandra suffice for my use case? If not, will the DataStax Enterprise in-memory option solve it?
EDIT 1:
A bit of clarification:
I think I need to use Write consistency level 'ALL' and Read consistency level 'One'. I understand that with this consistency level my cluster will not tolerate any failure. That is OK for me. A few minutes of downtime occasionally is not a problem, as long as my data is consistent. In my present setup, one Redis instance failure causes a few hours of downtime.
I must ensure airtight consistency for this data.
Cassandra deals with failure better when there are more nodes. Assuming your case allows for having more nodes, this is my suggestion.
So, if you have 5 nodes, use a CL of QUORUM for both reads and writes. What it means is that you always write to at least 3 nodes and read from 3 nodes (with 5 replicas, QUORUM is 3).
This ensures a very high level of consistency.
It also limits downtime: even if a node is down, your writes and reads won't break.
If you use CL ALL, then even if one node is down or overloaded, you will have to take a full outage.
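A minimal sketch of the QUORUM setup with the DataStax Python driver, assuming a keyspace replicated to all five nodes (keyspace, table and host names are placeholders):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Default profile: QUORUM for every read and write.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM)
cluster = Cluster(['node1', 'node2'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# 5 replicas, so QUORUM = 3: one or two downed nodes do not block requests,
# and R + W (3 + 3) > RF (5) means reads always see the latest acknowledged write.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS kv
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 5}
""")
session.execute("CREATE TABLE IF NOT EXISTS kv.store (key text PRIMARY KEY, value text)")

session.execute("INSERT INTO kv.store (key, value) VALUES (%s, %s)", ('k1', 'v1'))
row = session.execute("SELECT value FROM kv.store WHERE key = %s", ('k1',)).one()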
I hope it helps!

What are the differences between a node, a cluster and a datacenter in a cassandra nosql database?

I am trying to duplicate data in a Cassandra NoSQL database for a school project using DataStax OpsCenter. From what I have read, there are three keywords: cluster, node, and datacenter, and from what I understand, the data in a node can be duplicated in another node that exists in another cluster. And all the nodes that contain the same (duplicated) data compose a datacenter. Is that right?
If it is not, what is the difference?
The hierarchy of elements in Cassandra is:
Cluster
Data center(s)
Rack(s)
Server(s)
Node (more accurately, a vnode)
A Cluster is a collection of Data Centers.
A Data Center is a collection of Racks.
A Rack is a collection of Servers.
A Server contains 256 virtual nodes (or vnodes) by default.
A vnode is the data storage layer within a server.
Note: A server is the Cassandra software. A server is installed on a machine, where a machine is either a physical server, an EC2 instance, or similar.
Now to specifically address your questions.
An individual unit of data is called a partition. And yes, partitions are replicated across multiple nodes. Each copy of the partition is called a replica.
In a multi-data center cluster, the replication is per data center. For example, if you have a data center in San Francisco named dc-sf and another in New York named dc-ny then you can control the number of replicas per data center.
As an example, you could set dc-sf to have 3 replicas and dc-ny to have 2 replicas.
Those numbers are called the replication factor. You would specifically say dc-sf has a replication factor of 3, and dc-ny has a replication factor of 2. In simple terms, dc-sf would have 3 copies of the data spread across three vnodes, while dc-ny would have 2 copies of the data spread across two vnodes.
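For example, a keyspace with those per-datacenter replication factors could be defined roughly like this (the keyspace name is arbitrary):
CREATE KEYSPACE my_keyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-sf': '3', 'dc-ny': '2'};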
While each server has 256 vnodes by default, Cassandra is smart enough to pick vnodes that exist on different physical servers.
To summarize:
Data is replicated across multiple virtual nodes (each server contains 256 vnodes by default)
Each copy of the data is called a replica
The unit of data is called a partition
Replication is controlled per data center
A node is a single machine that runs Cassandra. A collection of nodes holding similar data is grouped in what is known as a "ring" or cluster.
Sometimes if you have a lot of data, or if you are serving data in different geographical areas, it makes sense to group the nodes of your cluster into different data centers. A good use case for this is an e-commerce website which may have many frequent customers on the east coast and the west coast. That way your customers on the east coast connect to your east coast DC (for faster performance), but ultimately have access to the same dataset (both DCs are in the same cluster) as the west coast customers.
More information on this can be found here: About Apache Cassandra- How does Cassandra work?
And all the nodes that contain the same (duplicated) data compose a datacenter. Is that right?
Close, but not necessarily. The level of data duplication you have is determined by your replication factor, which is set on a per-keyspace basis. For instance, let's say that I have 3 nodes in my single DC, all storing 600GB of product data. My products keyspace definition might look like this:
CREATE KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '3'};
This will ensure that my product data is replicated equally to all 3 nodes. The size of my total dataset is 600GB, duplicated on all 3 nodes.
But let's say that we're rolling-out a new, fairly large product line, and I estimate that we're going to have another 300GB of data coming, which may start pushing the max capacity of our hard drives. If we can't afford to upgrade all of our hard drives right now, I can alter the replication factor like this:
ALTER KEYSPACE products
WITH replication = {'class': 'NetworkTopologyStrategy', 'MyDC': '2'};
This keeps 2 copies of all of our data, stored in our current cluster of 3 nodes. The size of our dataset is now 900GB, but since there are only two copies of it (each node is essentially responsible for 2/3 of the data) our size on disk is still 600GB. The drawback here is that (assuming I read and write at a consistency level of ONE) I can only afford to lose 1 node. Whereas with 3 nodes and an RF of 3 (again reading and writing at consistency ONE), I could lose 2 nodes and still serve requests.
Edit 20181128
When I make a network request am I making that against the server? or the node? Or I make a request against the server does it then route it and read from the node or something else?
So real quick explanation: server == node
As far as making a request against the nodes in your cluster, that behavior is actually dictated by the driver on the application side. In fact, the driver maintains a copy of the current network topology, as it reads the cluster gossip similarly to how the nodes do.
On the application side, you can set a load balancing policy. Specifically, a token-aware policy (for example, TokenAwarePolicy) will examine the partition key of each request, figure out which node(s) have the data, and send the request directly there.
For the other load balancing policies, or for queries where a single partition key cannot be determined, the request will be sent to a single node. This node will act as a "coordinator": it handles routing the request to the nodes responsible for the data, as well as compiling and returning any result sets.
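With the DataStax Python driver, for instance, that wiring looks roughly like this (datacenter, keyspace and table names are placeholders):

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Token-aware routing: the driver hashes each statement's partition key and
# sends the request straight to a replica that owns it, so that node is both
# coordinator and data owner.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1')))
cluster = Cluster(['10.0.0.10'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('my_keyspace')

# Prepared statements carry partition-key metadata, so the driver can compute
# the token and pick a replica; otherwise any contacted node acts as coordinator.
select_user = session.prepare("SELECT * FROM users WHERE user_id = ?")
row = session.execute(select_user, ('some-user',)).one()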
Node:
A machine which stores some portion of your entire database. This may include data replicated from another node as well as its own data. What data it is responsible for is determined by its token ranges and the replication strategy of the keyspace holding the data.
Datacenter:
A logical grouping of nodes which can be separated from other nodes. A common use case is AWS-EAST vs AWS-WEST. The NetworkTopologyStrategy replication strategy is used to specify how many replicas of the entire keyspace should exist in any given datacenter. This is how Cassandra users achieve cross-DC replication. In addition, there are consistency levels that require acknowledgement only within the coordinator's datacenter (LOCAL_*).
Cluster
The sum total of all the machines in your database including all datacenters. There is no cross-cluster replication.
As per the documentation below:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/architecture/archIntro.html
Node
Where you store your data. It is the basic infrastructure component of Cassandra.
Datacenter
A collection of related nodes. A datacenter can be a physical datacenter or virtual datacenter. Different workloads should use separate datacenters, either physical or virtual. Replication is set by datacenter. Using separate datacenters prevents Cassandra transactions from being impacted by other workloads and keeps requests close to each other for lower latency. Depending on the replication factor, data can be written to multiple datacenters. Datacenters must never span physical locations.
Cluster
A cluster contains one or more datacenters. It can span physical locations.

Configure cassandra to use different network interfaces for data streaming and client connection?

I have a Cassandra cluster deployed with 3 Cassandra nodes and a replication factor of 3. I have a lot of data being written to Cassandra on a daily basis (10-15 GB). I have provisioned these Cassandra nodes on commodity hardware, as suggested by the "big data community", and I expect the nodes to go down frequently, which is handled by the redundancy Cassandra provides.
My problem is that I have observed Cassandra writes slowing down when a new node is provisioned and data is being streamed to it while bootstrapping. So, to overcome this hurdle, we have decided to have one network interface for inter-node communication and a separate one for client applications writing data to Cassandra. My question is: how can this be configured, if it is possible at all?
Any help is appreciated.
I think you are chasing the wrong solution.
I am confused by the fact that you only have 3 nodes, yet your concern is around slow writes while bootstrapping. Why? Are you planning to grow your cluster regularly? What is your consistency level on write, as this has a big impact on performance? Obviously if you only have 2 or 3 nodes and you're trying to bootstrap, you will see a slowdown, because you're tying up a significant percentage of your cluster to do the streaming.
Note that "commodity hardware" doesn't mean cheap, low-performance hardware. It just means you don't need the super high-end database-class machines used for databases like Oracle. You should still use really good commodity hardware. You may also need more nodes, as setting RF equal to cluster size is not typically a great idea.
Having said that, you can set your listen_address to the inter-node interface and rpc_address to the client address if you feel that will help.
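For example, in each node's cassandra.yaml (the addresses below are placeholders for your two interfaces):

# cassandra.yaml excerpt -- example addresses only
listen_address: 10.0.1.5   # inter-node interface (gossip, streaming, bootstrap)
rpc_address: 10.0.2.5      # interface client applications connect to

Each node gets its own addresses, and the change requires a node restart.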

Configuring Apache Cassandra for Disaster Recovery

How do you configure Apache Cassandra to allow for disaster recovery, to allow for one of two data-centres to fail?
The DataStax documentation talks about using a replication strategy that ensures at least one replica is written to each of your two data-centres. But I don't see how that helps once the disaster has actually happened. If you switch to the remaining data-centre, all your writes will fail because they cannot be replicated to the other data-centre.
I guess you would want your software to operate in two modes: normal mode, for which writes must replicate across both data-centres, and disaster mode, for which they need not. But changing replication strategy does not seem possible.
What I really want is two data-centres that are over provisioned, and during normal operations use the resources of both data-centres, but use the resources of only the one remaining data-centre (with reduced performance) when only one data-centre is functioning.
The trick is to vary the consistency setting given through the API for writes, instead of varying the replication factor. Use the LOCAL_QUORUM setting for writes during a disaster, when only one data-centre is available. During normal operation use EACH_QUORUM to ensure both data-centres have a copy of the data. Reads can use LOCAL_QUORUM all the time.
Here is a summary of the DataStax documentation for multiple data centers and the older but still conceptually relevant disaster recovery documentation (0.7).
Make a recipe to suit your needs with the two consistency levels LOCAL_QUORUM and EACH_QUORUM.
Here, “local” means local to a single data center, while “each” means consistency is strictly maintained at the same level in each data center.
Suppose you have 2 datacenters, one used strictly for disaster recovery; then you could set the replication factor to...
3 for the primary write/read datacenter, and 2 for the failover datacenter.
Now, depending on how critical it is that your data is actually written to the disaster recovery nodes, you can use either EACH_QUORUM or LOCAL_QUORUM. Assuming you are using the NetworkTopologyStrategy (NTS) replica placement strategy:
LOCAL_QUORUM on writes only delays the client for the local write to DC1; the write to your recovery node(s) in DC2 happens asynchronously.
EACH_QUORUM ensures that all data is replicated, but delays writes until both DCs confirm successful operations.
For reads it's likely best to just use LOCAL_QUORUM to avoid inter-data center latency.
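A sketch of how an application could switch between the two modes with the DataStax Python driver (profile, datacenter, keyspace and table names are made up):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

lb = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='DC1'))
profiles = {
    # Reads: local quorum all the time, to avoid inter-DC latency.
    EXEC_PROFILE_DEFAULT: ExecutionProfile(
        load_balancing_policy=lb, consistency_level=ConsistencyLevel.LOCAL_QUORUM),
    # Normal mode writes: a quorum in every datacenter must acknowledge.
    'write_normal': ExecutionProfile(
        load_balancing_policy=lb, consistency_level=ConsistencyLevel.EACH_QUORUM),
    # Disaster mode writes: DC2 is gone, only the local quorum must acknowledge.
    'write_disaster': ExecutionProfile(
        load_balancing_policy=lb, consistency_level=ConsistencyLevel.LOCAL_QUORUM),
}
cluster = Cluster(['dc1-node-1'], execution_profiles=profiles)
session = cluster.connect('my_keyspace')

write_mode = 'write_normal'  # flip to 'write_disaster' when DC2 is unavailable
session.execute("INSERT INTO events (id, payload) VALUES (%s, %s)",
                ('evt-1', 'hello'), execution_profile=write_mode)

The replication factor never changes; only the consistency level the application asks for does, which is the trick described above.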
There are catches to this approach! If you choose to use EACH_QUORUM on your writes you increase the potential failure points (DC2 is down, DC1-DC2 link is down, DC1 quorum can't be met).
The bonus is that once DC1 goes down, you have a valid disaster-recovery copy in DC2. Also note that the second link talks about custom snitch settings for routing your IPs properly.
