Possibility to discard a region in a multi-region YugabyteDB cluster

[Question posted by a user on YugabyteDB Community Slack]
A question that came up in my recent large-scale tests: say I have a multi-region cluster and I lose a complete region, with volumes and everything; say the thing was obliterated by a meteorite.
Is there any way to tell the rest of the system: hey, this region is never coming back, please discard it and continue working?

This is not available as a single command. Say it's a 3-region cluster: you can't "normally" continue working, because you're down to 2 regions. The cluster will stay online, but since all the nodes in that region are lost, you'll have to add new nodes to the region to get back to 3 available regions.
This can be done by removing the nodes in the down region one by one and adding new nodes to the same region.
For a step-by-step guide, see this doc: https://docs.yugabyte.com/latest/manage/change-cluster-config/
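For a rough idea of the shape of those steps with yb-admin (a sketch only; the IPs and default ports here are placeholders, and the linked doc has the full procedure):
# Remove the dead master from the Raft group, then register a replacement master:
yb-admin -master_addresses 10.0.1.1:7100,10.0.2.1:7100 change_master_config REMOVE_SERVER 10.0.3.1 7100
yb-admin -master_addresses 10.0.1.1:7100,10.0.2.1:7100 change_master_config ADD_SERVER 10.0.3.9 7100
# Blacklist each lost tserver so its data is re-replicated onto the new nodes:
yb-admin -master_addresses 10.0.1.1:7100,10.0.2.1:7100 change_blacklist ADD 10.0.3.2:9100
# Track the re-replication; 100 means the data move is complete:
yb-admin -master_addresses 10.0.1.1:7100,10.0.2.1:7100 get_load_move_completion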

Related

Increasing disk size in YugabyteDB cluster

[Question posted by a user on YugabyteDB Community Slack]
We have created a universe with 5 data nodes (airgap installed) and a replication factor of 3. Now we are planning to increase the disk size on all 5 nodes, and grow the mount point that is in use as well, without any DB downtime. Can anyone please suggest the steps?
Ideally, you should add new nodes with bigger disks to the cluster rather than changing the disk size of existing nodes.
Alternatively, you could just stop the processes on a node, do the upgrade, bring the processes back, and wait for any data movement to finish.
To make that safer, blacklist the node (tserver and master) first to drain it of tablet leaders.
You can find the relevant commands here:
https://docs.yugabyte.com/preview/manage/change-cluster-config/
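A rough sketch of the drain step with yb-admin, assuming your version supports change_leader_blacklist and that the node being resized serves tablets on 10.0.0.4:9100 (all addresses are placeholders):
# Move tablet leaders off the node before stopping it
yb-admin -master_addresses 10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100 change_leader_blacklist ADD 10.0.0.4:9100
# ...stop the processes, resize the disk and mount point, restart...
# Allow the node to host leaders again once it is back
yb-admin -master_addresses 10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100 change_leader_blacklist REMOVE 10.0.0.4:9100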

How to distribute yb-masters in multi-region deployment in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
In terms of the number of yb-masters, there should be as many as the replication factor. My question is: is having masters and tservers running on all the nodes a bad policy?
And, if we have a multi-DC deployment, should we have at least 1 master in each DC?
I guess the best is to put the leader of the yb-masters in the DC that is going to carry the main workload (if there is any), right?
It's perfectly normal to colocate a yb-tserver and a yb-master on the same server. But in large deployments it's better for them to be on separate servers, to split the workloads (so heavy usage on the yb-tserver won't interfere with the yb-master).
And if you have a multi-DC deployment, then you should deploy 1 master in each region, so that you have region failover for the yb-masters too.
For YugabyteDB to remain usable you need 2 out of 3 masters, so with a 2-DC setup you cannot build a configuration that is always available, because you have to put 2 masters in one DC and 1 in the other. The only solution for high availability is 3 DCs.
Do 3 DCs with the same number of nodes in each DC, so you end up with a total of 3, 6, 9, etc. nodes. A master should be in each DC; if not, you will again lose resilience.
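As an illustration (not from the original thread), the DC each master belongs to is usually declared with the placement flags when the yb-master process is started; the cloud/region/zone values below are hypothetical and other required flags are omitted:
# One yb-master per DC, each tagged with its own region
yb-master --master_addresses dc1-m1:7100,dc2-m1:7100,dc3-m1:7100 --fs_data_dirs /mnt/disk0 --placement_cloud aws --placement_region us-east-1 --placement_zone us-east-1a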
I guess the best is to accommodate the leader of the yb-masters in the DC which is going to carry the main workload (if there is any), right?
In this case you can set 1 region/zone as the preferred one, and the database will automatically try to put tablet leaders there, using set-preferred-zones: https://docs.yugabyte.com/latest/admin/yb-admin/#set-preferred-zones
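For example (a sketch; the master addresses and the aws.us-east-1.us-east-1a placement are placeholders):
yb-admin -master_addresses dc1-m1:7100,dc2-m1:7100,dc3-m1:7100 set_preferred_zones aws.us-east-1.us-east-1a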

Records not showing until Azure Databricks cluster restarted

We have been using Azure Databricks / Delta Lake for the last couple of months and recently have started to spot some strange behaviours with loaded records, in particular the latest records not being returned unless the cluster is restarted or a specific version number is specified.
For example, this returns no records:
df_nw = spark.read.format('delta').load('/mnt/xxxx')
display(df_nw.filter("testcolumn = ???"))
But this does
%sql
SELECT * FROM delta.`/mnt/xxxx` VERSION AS OF 472 where testcolumn = ???
As mentioned above, this only seems to be affecting newly inserted records. Has anyone else come across this before?
Any help would be appreciated.
Thanks
Col
Check to see if you've set a staleness limit. If you have, this is expected; if not, please create a support ticket.
https://docs.databricks.com/delta/optimizations/file-mgmt.html#manage-data-recency
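If you want to check from the notebook, the relevant setting is spark.databricks.delta.stalenessLimit (a sketch; by default it is unset, meaning every query waits for the latest table state):
# Prints the configured limit, or the fallback string if no staleness limit is set
print(spark.conf.get("spark.databricks.delta.stalenessLimit", "not set"))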
Just in case anyone else is having a similar problem, I thought it would be worth sharing the solution I accidentally stumbled across.
Over the last week I was encountering issues with our Databricks cluster, whereby the Spark drivers kept crashing under resource-intensive workloads. After a lot of investigation, it turned out that our cluster was in Standard (Single User) mode, so I spun up a new High Concurrency cluster.
The issue still occasionally appeared on the High Concurrency cluster, so I decided to flip the notebook back to the old cluster, which was still active, and the newly loaded data was there to be queried. This led me to believe that Databricks / the Spark engine was not refreshing the underlying data set and was using a previously cached version of it, even though I hadn't explicitly cached it.
By running %sql CLEAR CACHE the data appeared as expected.
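The Python equivalent, if you'd rather not switch to a SQL cell (same hypothetical mount path as above):
# Drop all cached tables/plans for the current session, then re-read the Delta table
spark.catalog.clearCache()
df_nw = spark.read.format('delta').load('/mnt/xxxx')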

Cassandra 2.1 changing snitch from EC2Snitch to GossipingPropertyFileSnitch

Currently we use Ec2Snitch with two AZs in a single AWS region. The goal was to provide resiliency even when one AZ is unavailable. Most data is replicated with RF=2, so each AZ gets a copy based on Ec2Snitch.
Now we have come to the conclusion that we should move to GossipingPropertyFileSnitch. The primary reason is that we have realized that one AZ going down is a remote occurrence, and even if it happens there are other systems in our stack that don't support it, so the whole app goes down anyway.
The other reason is that with Ec2Snitch and two AZs we had to scale in multiples of 2 (one node in each AZ). With GossipingPropertyFileSnitch using just one rack, we can scale one node at a time.
When we change this snitch setting, will the topology change? I want to avoid having to run nodetool repair. We have always had failures running nodetool repair, and it runs forever.
Whether the topology changes depends on how you carry out the change. If you assign each node the same logical DC and rack it is currently configured with, you shouldn't get a topology change.
After switching to GossipingPropertyFileSnitch you have to match the rack to the AZ, and you need to do a rolling restart for the reconfiguration to take effect.
Example cassandra-rackdc.properties for 2 nodes in 1 DC across 2 AZs:
# node=10.0.0.1, dc=first, AZ=1
# with Ec2Snitch:
dc_suffix=first
# becomes, with GossipingPropertyFileSnitch:
dc=first
rack=1

# node=10.0.0.2, dc=first, AZ=2
# with Ec2Snitch:
dc_suffix=first
# becomes, with GossipingPropertyFileSnitch:
dc=first
rack=2
On a side note, you need to explore why your repairs are failing. Unfortunately, they are very important for cluster health.
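One common way to make repairs more manageable (a suggestion beyond the original answer; the keyspace name is hypothetical) is to repair only each node's primary ranges, one node at a time:
# Repairs only the token ranges this node is primary for, so less work per run
nodetool repair -pr my_keyspace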

Can we backup only one availability zone for AZ replicated Cassandra cluster

Since my Cassandra cluster is replicated across three availability zones, I would like to back up only one availability zone to lower the backup costs. I have also experimented with restoring nodes in a single availability zone and got back most of my data in a test environment. I would like to know if there are any drawbacks to this approach before deploying it in production. Is anyone following this approach in your production clusters?
Note: As I back up at regular intervals, I know that I may lose updates that had only reached the other two AZs' nodes at the time of the snapshot, but that's not a problem.
You can back up only a specific DC, or even specific nodes.
AFAIK, the only drawback is whether your data is consistent/up to date, and since you can afford to lose some data, it shouldn't be a problem. If you, for example, perform writes with ALL consistency level, the data should be up to date on all nodes.
BUT you must be sure that your data is indeed replicated across multiple AZs, by setting the rack/dc properties accordingly or by using the EC2 snitch that supports multiple AZs.
EDIT:
Global Snapshot
Running nodetool snapshot only runs on a single node at a time. This only creates a partial backup of your entire data. You will want to run nodetool snapshot on all of the nodes in your cluster, and it's best to run them at the exact same time, so that you don't have fragmented data from a time perspective. You can do this a couple of different ways. The first is to use a parallel ssh program to execute the nodetool snapshot command at the same time. The second is to create a cron job on each of the nodes to run at the same time; this assumes that your nodes have clocks that are in sync, which Cassandra relies on as well.
Link to the page:
http://datascale.io/backing-up-cassandra-data/
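For reference, a roughly simultaneous snapshot across nodes might look like this with a parallel-ssh style tool (the host file, snapshot tag, and keyspace name are placeholders):
# Takes the same named snapshot on every node listed in cassandra_hosts.txt at about the same time
pssh -h cassandra_hosts.txt -i "nodetool snapshot -t nightly_backup my_keyspace"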
