Using read-replicas in YugabyteDB for DR

[Question posted by a user on YugabyteDB Community Slack]
I’m trying to understand the failure tolerance in YugabyteDB.
My scenario is as follows:
The universe is set up with a primary data cluster and 1 read replica cluster, max_stale_read_bound_time_ms = 60.
And the primary data cluster got wiped out (lost all data).
Questions:
Would we be able to rebuild the primary data cluster with the read replica cluster?
Can the read replica cluster become the primary data cluster?

The answer is no to both.
xCluster is what you want to use for DR.
The design point for read replicas in YugabyteDB is not DR, but bringing data closer to where it is being read. And read replicas have no yb-master of their own: without a yb-master, the read replica cluster cannot even be read from, let alone promoted to a primary cluster.
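To make that design point concrete, here is a minimal, hedged sketch in Python (psycopg2) of an application reading through a node of the read replica cluster using YSQL follower reads. The hostname and table are hypothetical, and the session settings (yb_read_from_followers, yb_follower_read_staleness_ms) should be verified against your release:

# Sketch only: read from a YugabyteDB read replica via YSQL follower reads.
import psycopg2

# Connect to a tserver that belongs to the read replica cluster (placeholder host).
conn = psycopg2.connect(host="replica-node.example.com", port=5433,
                        dbname="yugabyte", user="yugabyte")
conn.autocommit = True
cur = conn.cursor()

# Follower reads must be read-only and return slightly stale, timeline-consistent data.
cur.execute("SET yb_read_from_followers = true")
cur.execute("SET yb_follower_read_staleness_ms = 30000")  # assumed staleness bound
cur.execute("SET default_transaction_read_only = true")

cur.execute("SELECT count(*) FROM my_table")  # served close to the application
print(cur.fetchone())
conn.close()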

Different versions of read-replica and primary-cluster in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
Is it possible to have the primary data cluster and read replica be at different versions?
primary data cluster at 2.1 and read replica cluster at 2.8
Ideally, no. During software upgrades, different nodes can temporarily be on different releases, but in a steady state the read-replica nodes should be on the same version as the primary cluster.
With xCluster, where the two clusters are really independent clusters linked with an async replication channel, they can be on different releases for extended periods of time.
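As an illustration of that linkage, here is a hedged sketch (Python driving yb-admin) of setting up xCluster async replication from a source universe to a target universe. The addresses, universe UUID, and table IDs are placeholders, and the exact yb-admin syntax should be checked against your release:

# Sketch only: enable xCluster async replication from a source universe to a target one.
import subprocess

TARGET_MASTERS = "target-m1:7100,target-m2:7100,target-m3:7100"
SOURCE_MASTERS = "source-m1:7100,source-m2:7100,source-m3:7100"
SOURCE_UNIVERSE_UUID = "<source-universe-uuid>"   # from the source yb-master
TABLE_IDS = "<table-id-1>,<table-id-2>"           # tables to replicate

# Run on the target side: point it at the source universe and the tables to stream.
subprocess.run(
    ["yb-admin", "-master_addresses", TARGET_MASTERS,
     "setup_universe_replication", SOURCE_UNIVERSE_UUID,
     SOURCE_MASTERS, TABLE_IDS],
    check=True,
)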

Restoring data in a cluster with different number of nodes in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
If I need to restore snapshots from one cluster (4 nodes) to another cluster (3 nodes), how do we do that as per the documentation?
Data of the 1st node should be restored on the 1st node of the other cluster, and similarly for 2 more nodes.
What do we do with the data of the remaining node of the 1st cluster?
Data is snapshotted and restored per tablet. In your case the new cluster will distribute the tablets over fewer nodes, so each node will hold more tablets. You just have to go over all the nodes and restore each tablet according to how the tablets were distributed when you imported the snapshot file; the number of nodes doesn't matter.
More details at: https://docs.yugabyte.com/latest/manage/backup-restore/snapshot-ysql/
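For a hedged sketch of that flow on the 3-node target cluster (Python driving yb-admin, with command names taken from the linked docs page; addresses, paths, and IDs are placeholders):

# Sketch only: restore a snapshot exported from the 4-node cluster onto the 3-node one.
import subprocess

MASTERS = "target-m1:7100,target-m2:7100,target-m3:7100"

def yb_admin(*args):
    cmd = ["yb-admin", "-master_addresses", MASTERS, *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Import the snapshot metadata exported from the source cluster. The output
#    maps each old tablet ID to the tablet created on the 3 target nodes.
print(yb_admin("import_snapshot", "/backups/mydb.snapshot"))

# 2. Copy each tablet's snapshot files from the source nodes into the matching
#    tablet directories on the target nodes, per the mapping above. This is the
#    per-tablet step from the docs; do it for every tablet, regardless of which
#    of the 4 source nodes it lived on.

# 3. Restore once all tablet data is in place (snapshot ID from the import output).
print(yb_admin("restore_snapshot", "<new-snapshot-id>"))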

Set wal_level = logical instead of replica in YugabyteDB?

[Question posted by a user on YugabyteDB Community Slack]
Is there a way to set wal_level = logical instead of replica in YugabyteDB? I’m looking to implement CDC so I can synchronize other systems with the changes.
Note that YugabyteDB doesn’t reuse the PostgreSQL storage engine & replication mechanism, but has built its own. And it doesn’t support setting wal_level=logical.
Currently, CDC support is in the works; see https://github.com/yugabyte/yugabyte-db/issues/9019.

How does Cassandra improve performance by adding nodes?

I'm going to build an Apache Cassandra 3.11.x cluster with 44 nodes. Each application server will have one cluster node so that the application does reads/writes locally.
I have a couple of questions running in my mind; kindly answer if possible.
1. How many server IPs should be mentioned in the seed node parameter?
2. How does HA work when all the mentioned seed nodes go down?
3. What is the disadvantage of mentioning all the server IPs in the seed node parameter?
4. How does Cassandra scale with respect to data, other than the primary key and tunable consistency? As per my assumption, the replication factor can improve HA but not performance, so how does performance increase by adding more nodes?
5. Is there any sharding mechanism in Cassandra?
Answers are in order:
1. It's recommended to point to at least 2 nodes per DC.
2. The seed/contact nodes are used only for the initial bootstrap - when your program reaches any of the listed nodes, it "learns" the topology of the whole cluster, and the driver then listens for node status changes and adjusts its list of available hosts. So even if the seed node(s) go down after the connection is already established, the driver will still be able to reach the other nodes.
3. It's usually harder to maintain - you need to keep the configuration parameters for your driver and the list of nodes in sync.
4. When you have RF > 1, Cassandra may read or write data from/to any replica. The consistency level regulates how many nodes should return an answer for a read or write operation. When you add a new node, data is redistributed to it, and if you have correctly selected the partition key, the new node starts receiving requests in parallel with the old nodes.
5. The partition key is responsible for selecting the replica(s) that will hold the data associated with it - you can see it as a shard. But you need to be careful with the selection of the partition key - it's easy to create partitions that are too big, or partitions that are "hot" (receiving most of the operations in the cluster - for example, if you're using the date as the partition key and always writing/reading data for today).
P.S. I would recommend reading the DataStax Architecture guide - it contains a lot of information about Cassandra as well.
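A hedged sketch with the DataStax Python driver (hypothetical addresses, keyspace, and table) ties points 1-2 and 4-5 together: only a couple of contact points are passed, the driver discovers the rest of the ring and follows node status changes, and the partition key plus per-statement consistency level decide where each request goes.

from datetime import datetime

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Two contact points are enough; the driver learns the full 44-node topology
# from them and keeps tracking node up/down events afterwards.
cluster = Cluster(contact_points=["10.0.0.1", "10.0.0.2"])
session = cluster.connect("my_keyspace")

# Tunable consistency is set per statement; LOCAL_QUORUM here.
insert = SimpleStatement(
    "INSERT INTO readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)

# The partition key (sensor_id) picks the replicas that own the row, so load
# spreads over more nodes as the cluster grows - Cassandra's form of sharding.
session.execute(insert, ("sensor-42", datetime(2021, 1, 1), 3.14))
cluster.shutdown()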

Cassandra datacenters replication advanced usage

For a project, we use a Cassandra cluster in order to have fast reads/writes on a large amount of (column-oriented) generated data.
Until now, we only had 1 datacenter for prototyping.
We now plan to split our cluster into 2 datacenters to meet performance requirements (the data transfer between the two datacenters is quite slow):
datacenter #1: located near our data producer services; intensively writes all data into Cassandra periodically (each write has a “run_id” column in its primary key)
datacenter #2: located near our data consumer services; intensively reads all data produced by datacenter #1 for a given “run_id”
However, we would like our consumer services to access data only in the datacenter near them (datacenter #2), and only when all data for a given “run_id” have been completely replicated from datacenter #1 (data generated by the producer services).
My question is: how can we ensure that all data have been replicated to datacenter #2 before telling the consumer services (near datacenter #2) to start using them?
Our best solutions so far (but still not good enough :-P):
Producer services (datacenter #1) write with consistency “all”. But this leads to poor tolerance of network partitions AND really bad write performance.
Producer services (datacenter #1) write with consistency “local_quorum”, and a last “run finished” value could be written with consistency “all”. But it seems Cassandra does not guarantee replication ordering.
Do you have any suggestions?
Thanks a lot,
Fabrice
It seems there is no silver bullet to this issue.
We managed to use a single datacenter for our applications. We will use another one, but only as a backup and possibly in a degraded mode.
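For completeness, a hedged sketch (DataStax Python driver, with hypothetical hosts, keyspace, and tables) of the two options discussed above: DC-local reads for the consumers and a “run finished” marker written at ALL. The same caveat applies - this does not solve the replication-ordering problem.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy
from cassandra.query import SimpleStatement

def dc_session(contact_points, local_dc, default_cl):
    # Session whose queries stay in one datacenter at the given consistency level.
    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc),
        consistency_level=default_cl,
    )
    cluster = Cluster(contact_points,
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster.connect("runs_ks")

# Producer side (datacenter #1): bulk writes at LOCAL_QUORUM...
producer = dc_session(["dc1-node1"], "datacenter1", ConsistencyLevel.LOCAL_QUORUM)
producer.execute("INSERT INTO results (run_id, item, value) VALUES (%s, %s, %s)",
                 ("run-42", 1, 3.14))

# ...then a "run finished" marker at ALL (option 2 above). Caveat: replication
# ordering is not guaranteed, so seeing the marker in datacenter #2 does not
# strictly prove every earlier write has arrived there.
marker = SimpleStatement(
    "INSERT INTO run_status (run_id, state) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ALL)
producer.execute(marker, ("run-42", "finished"))

# Consumer side (datacenter #2): reads pinned to the local DC at LOCAL_QUORUM,
# so they never cross the slow inter-DC link.
consumer = dc_session(["dc2-node1"], "datacenter2", ConsistencyLevel.LOCAL_QUORUM)
rows = consumer.execute("SELECT * FROM results WHERE run_id = %s", ("run-42",))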
