How to change GlusterFS replica 2 to replica 3 with arbiter 1? - glusterfs

GlusterFS 3.7 introduced the arbiter volume, which is a 3-way replicated volume where the third brick is an arbiter.
How does one change from 2-way replication to 3-way replication with arbiter?
I could not find any documentation on changing a running replica 2 volume to an arbiter volume.
Reference:
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

Just sent the patch http://review.gluster.org/#/c/14502/ to add this functionality. If everything goes well, it should make it to the 3.8 release.
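Once that change is in, converting the volume should boil down to a single add-brick call that also declares the arbiter. A rough sketch (the volume name testvol, the host, and the brick path below are placeholders, not taken from the patch):
# add a third brick as the arbiter while raising the replica count
gluster volume add-brick testvol replica 3 arbiter 1 node3:/bricks/testvol/arbiter
# the arbiter brick is then populated by self-heal; you can watch its progress with
gluster volume heal testvol info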

Can we add a new Cassandra 3.11 DC, upgrade it to 4.0, then eventually get rid of the old DC?

We want to upgrade from Cassandra 3.11.12 to Cassandra 4.0.2 using multi-DC replication.
We want to do that to allow an easy and quick rollback (avoiding a backup/snapshot and restore).
Steps we plan to do:
1. Lock the app to use (reads & writes) only the old/current DC (using driver options).
2. Create a new DC with the same version (3.11.12) - the new DC will be created with 16 num_tokens (today we are on the 3.11 default of 256 and want to move to 16).
3. Sync all data/keyspaces to the new DC - and keep the sync active between the 2 DCs.
4. Upgrade the new DC from 3.11.12 to 4.0.2.
5. After step 4 is done, move the app to use only the new DC (version 4 after the upgrade).
6. Wait a few days to see that all is going well.
7. Stop replication to the old DC.
8. Remove the old DC nodes and stay only with the new DC (Cassandra 4.0.2).
A. The main question: will this process work?
B. Is there a problem with moving to 16 num_tokens this way?
C. Is it OK to keep syncing the data/keyspaces between the 2 DCs for a few days while they are on different versions of Cassandra (dc1 on 3.11.12, dc2 on 4.0.2)?
Note that I have seen that it is not recommended to run a cluster with a mix of Cassandra versions, but this is only for an upgrade process with a quick and simple rollback. The old DC with the old Cassandra version will be removed after a few days, once everything seems to work properly with the new version.
It is a valid upgrade path, but bear in mind that there could be some disadvantages with your proposed approach. For example, if a node goes down (say, due to a hardware failure), then you won't be able to decommission it.
Any operation that requires streaming will not work in a mixed-version cluster. Those operations include bootstrap, decommission, and repairs.
To answer your questions directly:
A. Yes, it will work but with some gotchas.
B. No, adding a new DC is the only way you can change the number of tokens.
C. Yes, replication is designed to work across mixed versions.
To answer the question you didn't ask: rolling back an upgrade is actually quite rare in my experience. You would typically upgrade one node at a time. If you run into a problem on a node, you would fix that node then proceed with the rolling upgrade until all nodes in the cluster have been upgraded.
During the rolling upgrade, your application should continue to work and so there should be no reason to perform a rollback. Cheers!
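To make steps 2-4 a bit more concrete, here is a rough sketch of what adding and syncing the new DC usually looks like (the keyspace name my_ks and the replication factors are placeholders, and the DC names must match what your snitch reports):
# on the new dc2 nodes, before they bootstrap, set in cassandra.yaml:
#   num_tokens: 16
# extend the replication of every application keyspace to the new DC
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};"
# on every dc2 node, stream the existing data across from dc1
nodetool rebuild -- dc1
# after upgrading a dc2 node to 4.0.2, rewrite its SSTables to the new on-disk format
nodetool upgradesstables
Until the rebuilds finish, reads served by dc2 can be incomplete, which is another reason to keep the application pinned to dc1 (your step 1).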

Add datacenter with one node to backup existing one

I already have a working datacenter with 3 nodes (replication factor 2). I want to add another datacenter with only one node to hold a backup of all the data from the existing datacenter. The final setup:
dc1: 3 nodes (2 rf)
dc2: 1 node (1 rf)
My application would then connect only to dc1 nodes and send data. If dc1 breaks down, I can recover the data from dc2, which is on another physical machine in a different location. I could also use dc2 for AI queries or some other tasks. I'm a newbie when it comes to Cassandra configuration, so I want to know whether I'm making some kind of mistake in my thinking. I'm planning on using these configuration docs to add the new DC: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAddDCToCluster.html
Is there anything more I should keep in mind to get this to work, or is there an easier solution for having a data backup?
Update: It won't only be a backup; we also want to connect the application to this second DC when dc1 is unavailable (e.g. a power outage).
Update: dc2 is running. I had some problems with copying data from one DC to the other, and nodetool status didn't show 2 DCs, but after fixing the firewall rules for port 7000 I managed to connect both DCs and share data between them.
With this approach, your single node will get twice as much traffic as the other nodes. It may also add load to the nodes in dc1, because they will need to collect hints, etc. when the node in dc2 is not available. If you just need a backup, set up something like Medusa and store the data in a cheap environment, like S3 - but of course, it will take time to restore if you lose the whole DC.
But in reality, you need to think about your high-availability strategy - what will happen to your clients if you lose the primary DC? Is it acceptable to wait until recovery, or do you really require full fault tolerance? I recommend reading the Designing Fault-Tolerant Applications with DataStax and Apache Cassandra™ whitepaper from DataStax - it explains the details of designing really fault-tolerant applications.
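If you do go ahead with the one-node DC, the replication side of it boils down to something like this (a sketch only; my_ks is a placeholder keyspace name and the DC names must match what your snitch reports):
# keep 2 copies in dc1 and put a single copy of each keyspace in dc2
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2, 'dc2': 1};"
# on the dc2 node, stream the existing data across
nodetool rebuild -- dc1
# keep the application on dc1 by using LOCAL_ONE/LOCAL_QUORUM consistency
# and a DC-aware load balancing policy in the driver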

Cassandra 2.1 changing snitch from EC2Snitch to GossipingPropertyFileSnitch

We currently use Ec2Snitch with two AZs in a single AWS region. The goal was to provide resiliency even when one AZ is not available. Most data is replicated with RF=2, so each AZ gets a copy based on Ec2Snitch.
Now we have come to the conclusion that we should move to GossipingPropertyFileSnitch. The primary reason is that we have realized that one AZ going down is a remote occurrence, and even if it happens, there are other systems in our stack that don't support it, so the whole app eventually goes down anyway.
The other reason is that with Ec2Snitch and two AZs we had to scale in multiples of 2 (one node in each AZ). With GossipingPropertyFileSnitch using just one rack, we can scale in increments of 1.
When we change this snitch setting, will the topology change? I want to avoid having to run nodetool repair. We have always had failures when running nodetool repair, and it runs forever.
Whether the topology changes depends on how you carry out the change. If you assign the node the same logical DC and rack as it is currently configured with, you shouldn't get a topology change.
You have to match the rack to the AZ after switching to GossipingPropertyFileSnitch, and you need to do a rolling restart for the re-configuration to take effect (a per-node sketch follows the example below).
Example cassandra-rackdc.properties for 2 nodes in 1 dc across 2 AZs:
# node=10.0.0.1, dc=first, AZ=1
dc_suffix=first
# Becomes
dc=first
rack=1
# node=10.0.0.2, dc=first, AZ=2
dc_suffix=first
# Becomes
dc=first
rack=2
On a side note, you need to explore why the repairs are failing. Unfortunately, they are very important for cluster health.
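For reference, the per-node change during the rolling restart would go roughly like this (a sketch assuming a package install managed by systemd; adjust paths and the service name to your environment):
# 1. in cassandra.yaml: endpoint_snitch: GossipingPropertyFileSnitch
# 2. in cassandra-rackdc.properties: set dc= and rack= to match the node's current DC/rack
# 3. flush memtables and stop serving traffic cleanly, then restart
nodetool drain
sudo systemctl restart cassandra
# 4. confirm the node rejoined with the same DC/rack before moving to the next one
nodetool status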

cassandra nodes are unresponsive and "Native-Transport-Requests" are high only on 2 nodes

We recently deployed micro-services into our production environment, and these micro-services communicate with the Cassandra nodes for reads/writes.
After the deployment, we started noticing a sudden drop in CPU to 0 on all Cassandra nodes in the primary DC. This happens at least once per day. Each time it happens, we see that 2 random nodes (in the SAME DC) are not reachable from each other ("nodetool describecluster"), and when we check "nodetool tpstats", those 2 nodes have a higher number of ACTIVE Native-Transport-Requests, between 100 and 200. These 2 nodes are also storing HINTS for each other, but when I run longer "pings" between them I don't see any packet loss. When we restart those 2 Cassandra nodes, the issue is fixed for the moment. This has been happening for 2 weeks.
We use Apache Cassandra 2.2.8.
Also, the microservices' logs show read/write timeouts before the sudden drop in CPU on all Cassandra nodes.
You might be using a token-aware load balancing policy on the client and updating a single partition or range heavily, in which case all the coordination load will be focused on a single replica set. You can change your application to use a RoundRobin (or DC-aware round robin) LoadBalancingPolicy, and the issue will likely resolve. If it does, you have a hotspot in your application and you should give some attention to your data model.
It does look like a datamodel problem (hot partitions causing issues in specific replicas).
But in any case you might want to add the following to your cassandra-env.sh to see if it helps:
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"
More information about this here: https://issues.apache.org/jira/browse/CASSANDRA-11363
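If you want to confirm the hot-partition theory before changing the load balancing policy, you can sample the busiest partitions on one of the two affected nodes while the problem is occurring (a sketch; the keyspace and table names are placeholders):
# sample the hottest partitions of a suspect table for 10 seconds (available since 2.2)
nodetool toppartitions my_keyspace my_table 10000
# and watch the native transport pool at the same time
nodetool tpstats | grep Native-Transport-Requests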

Way to determine healthy Cassandra cluster?

I've been tasked with re-writing some sub-par Ansible playbooks to stand up a Cassandra cluster in CentOS. Quite frankly, there doesn't seem to be much information on Cassandra out there.
I've managed to get the service running on all three nodes at the same time, using the following configuration file (info scrubbed):
HOSTIP=10.0.0.1
MSIP=10.10.10.10
ADMIN_EMAIL=my#email.com
LICENSE_FILE=/tmp/license.conf
USE_LDAP_REMOTE_HOST=n
ENABLE_AX=y
MP_POD=gateway
REGION=test-1
USE_ZK_CLUSTER=y
ZK_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
ZK_CLIENT_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
USE_CASS_CLUSTER=y
CASS_HOSTS="10.0.0.1:1,1 10.0.0.2:1,1 10.0.0.3:1,1"
CASS_USERNAME=test
CASS_PASSWORD=test
The HOSTIP changes depending on which node the configuration file is on.
The problem is, when I run nodetool ring, each node says there are only two nodes in the cluster: itself and one other, seemingly at random from the other two.
What are some basic sanity checks to determine a "healthy" Cassandra cluster? Why is nodetool saying each one thinks there's a different node missing from the cluster?
nodetool status - overview of the cluster (load, state, ownership)
nodetool info - more granular details at the node-level
As for the node mismatch I would check the following:
cassandra-topology.properties - identical across the cluster (all 3 IPs listed)
cassandra.yaml - I typically keep this file the same across all nodes. The parameters that MUST stay the same across the cluster are: cluster_name, seeds, partitioner, and snitch.
verify all nodes can reach each other (ping, telnet, etc.)
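A quick way to run those checks from one host (a sketch, assuming SSH access and the default package location for cassandra.yaml):
# compare the settings that must match across the cluster
for h in 10.0.0.1 10.0.0.2 10.0.0.3; do
  ssh "$h" "grep -E '^(cluster_name|partitioner|endpoint_snitch)|seeds' /etc/cassandra/cassandra.yaml"
done
# basic reachability on the inter-node (gossip/storage) port
for h in 10.0.0.1 10.0.0.2 10.0.0.3; do nc -zv "$h" 7000; done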
DataStax (the Cassandra vendor) has some good documentation. Please note that some features are only available in DataStax Enterprise -
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
Also check out the Apache Cassandra site -
http://cassandra.apache.org/community/
As well as the user forums -
https://www.mail-archive.com/user@cassandra.apache.org/
Actually, the thing you really want to check is whether all the nodes AGREE on the schema_id. nodetool status shows whether nodes are up, down, or joining, yet that does not really mean the cluster is 'healthy' enough to make schema changes or other changes.
The simplest way is:
nodetool describecluster
Cluster Information:
Name: FooBarCluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2, 10.136.2.3]
If the schema IDs do not match, you need to wait for the schema to settle or run repairs. A mismatch looks, for example, like this:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2]
43fe9177-382c-327e-904a-c8353a9dxxxx: [10.136.2.3]
However, running nodetool is 'heavy' and its output is hard to parse.
The information is inside the database, you can check here:
'SELECT schema_version, release_version FROM system.local' and
'SELECT peer, schema_version, release_version FROM system.peers'
Then you compare schema_version across all nodes... if they match, the cluster is very likely healthy. You should ALWAYS check this before making any changes to schema.
Now, during a rolling upgrade, when changing engine versions, the release_version differs between nodes, so to support automatic rolling upgrades you need to check that the schema_ids match within each release_version separately.
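For scripting, the same check is one query per node, something like this (assuming cqlsh can reach each node without extra authentication flags):
# compare schema_version (and, during upgrades, release_version) across all nodes
cqlsh 10.136.2.1 -e "SELECT schema_version, release_version FROM system.local;"
cqlsh 10.136.2.1 -e "SELECT peer, schema_version, release_version FROM system.peers;"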
I'm not sure all of the problems you might be having, but...
Check the cassandra.yaml file. You need a minimum of 3 things to be the same - the seeds: list (but do not list all nodes as seeds!), cluster_name, and snitch. Make sure your listen_address is correct.
If you are using GossipingPropertyFileSnitch, then check the cassandra-topology.properties and/or cassandra-rackdc.properties files for accuracy.
Don't start all the nodes at the same time. Start the seed nodes first - the other nodes will "gossip" with the seed nodes to learn the cluster topology. Shut down the seed nodes last.
Don't use shared storage. That defeats the purpose of distributed data and is considered a Cassandra anti-pattern.
If you're in AWS, don't use auto-scaling groups unless you know what you're doing.
Once you've done all that, use nodetool status / ring / info or JMX to see what the cluster is doing.
DataStax does have decent documentation for Cassandra.
