Repair system_auth keyspace in Cassandra

According to the official documentation, the system keyspace uses a local replication strategy, so there is no need to repair it. My question is about the system_auth keyspace: should I manually run repair on it?
When I run a full repair without specifying any keyspace, I expect to see system_auth being repaired in the log file, but I can't find any indication that system_auth is getting repaired.

Only some system keyspaces use the local replication strategy. system_auth uses SimpleStrategy with a replication factor of 1 by default (see docs). If you have a cluster of several nodes, it's recommended to change the replication strategy to NetworkTopologyStrategy (even if you have only one DC, it will help in the future) and increase the replication factor to 3 in each DC. Then you need to run repairs on it to keep it in a consistent state.
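As a minimal sketch of that change, assuming a single datacenter named DC1 (adjust the DC name and RF to your topology):
ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
After the ALTER, run nodetool repair system_auth on each node, one node at a time.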
P.S. Also create a new superuser (see step 5 in the docs), because the default cassandra user reads its login data at QUORUM, and that could be a problem if you lose half of the machines.
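A sketch of that step (the role name and password are placeholders, not anything prescribed by the docs):
CREATE ROLE dba WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'pick-a-strong-password';
ALTER ROLE cassandra WITH SUPERUSER = false AND LOGIN = false;
The second statement disables the default account once you have verified the new superuser can log in.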

Related

cassandra 3.11.9 system_auth need to be SimpleStrategy or NetworkTopologyStrategy on production env?

What is the recommended configuration for Apache Cassandra 3.11.9 system_auth? Should it be SimpleStrategy or NetworkTopologyStrategy, and with what RF?
We have Cassandra with 1 DC (2-3 AWS racks with Ec2Snitch and dynamic_snitch disabled). Most queries run at consistency level LOCAL_ONE. Today our system_auth keyspace is configured with SimpleStrategy and RF 3. In a lot of queries, tracing shows we are wasting time on:
Executing single-partition query on roles [ReadStage-X]
As part of an attempt to solve our problems, we also increased the following parameters:
roles_validity_in_ms, permissions_validity_in_ms, credentials_validity_in_ms, permissions_cache_max_entries.
Can query latency problems be connected to the system_auth keyspace configuration?
I answered this question a while ago, which is similar:
Replication Factor to use for system_auth
Due to issues that can happen with larger clusters which fluctuate in size, we now treat system_auth like we do any other keyspace. That is, we set system_auth's RF to 3 in each DC.
tl;dr: if you're using NetworkTopologyStrategy on your non-system keyspaces, then you should also be using it for system_auth. Same with your RF; I'd always match the RF of system_auth to that of my "normal" keyspaces as well.
No, the replication strategy and RF used on system_auth do not typically cause query latency. That is, of course, unless any of the security cache settings have been altered. In 10 years of working with Cassandra, I've never had to change those: https://docs.datastax.com/en/security/5.1/security/secAuthCacheSettings.html
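For reference, a sketch of those cache settings in cassandra.yaml with their usual 3.x defaults (verify the exact defaults against your own version):
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
roles_cache_max_entries: 1000
permissions_cache_max_entries: 1000
credentials_cache_max_entries: 1000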
queries wasting time on (tracing): "Executing single-partition query on roles [ReadStage-X]"
This statement got me thinking: are you tracing queries in cqlsh while logged in as the default cassandra user? That user does trigger some cqlsh operations to execute at QUORUM. It could also be that the query consistency and connection consistency are set differently. Just a thought.

Removing DC from multi DC cluster in Cassandra

I have a two-datacenter site (dc1 and dc2). I am writing to dc1 with a replication of 3 in each DC (dc1:3, dc2:3). dc2 is a backup site taking no traffic. I upgraded all the nodes of dc2 to C* version 3.11.2. The nodes of dc1 are on C* version 2.1.16. Now, due to some issue, I have to roll back my upgrade. I have two options:
Restore the complete site (dc1 and dc2) from a data backup - it will cause a lot of data loss.
Remove dc2 from dc1 using the steps given here.
Is there any issue with removing a site (dc2) in the case of mixed C* versions?
If it were me, I would:
Take DC2 out of replication.
Shutdown nodes on DC2.
Remove the nodes/assassinate them.
Uninstall C* completely.
Wipe the nodes of all data/logs/configuration.
Install C* and reconfigure.
Add nodes to a new DC.
This way, there's no data loss from having to restore from backups. Cheers!
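The first step, taking DC2 out of replication, is just an ALTER on each keyspace that replicates there; a minimal sketch, assuming an application keyspace named my_keyspace (keyspace and DC names are illustrative):
ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
Repeat for every keyspace that currently lists dc2 in its replication settings, including system_auth and the other replicated system keyspaces.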
Yes, the second option seems good, and you can recover your data safely. You should remove the DC2 datacenter from your existing cluster. Since, as you say, there is no traffic on DC2, it should be easy to perform the removal and later re-addition.
You need to follow the steps below:
Change the replication factor of your keyspaces so they no longer replicate to DC2.
Stop the Cassandra services on DC2.
Remove the nodes from the existing cluster via the nodetool removenode command; if that causes an issue, you can use nodetool assassinate.
Once the nodes have been removed from the cluster one by one, uninstall Cassandra on them.
Completely remove the existing data on each removed node.
Then install a fresh Cassandra based on the previous configuration; you can refer to the config files from the existing cluster, or to the backup of your 2.1.16 config if you took one.
Now add your datacenter to the cluster again.
In this way, you can get your datacenter and its data back quickly.
You can refer to the documentation here if anything about adding the datacenter is unclear:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToClusterDesigDC.html
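A rough sketch of the removal commands, run from a live node in DC1 (the host ID and IP below are placeholders; take the real host IDs from nodetool status):
nodetool status                        # note the Host ID of each DC2 node
nodetool removenode <host-id-of-dc2-node>
nodetool removenode status             # check streaming progress
nodetool assassinate <ip-of-dc2-node>  # last resort, only if removenode gets stuck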

Cassandra repair after datacenter went down

I have a Cassandra DB (version 3.11.2) running in AWS, with 2 datacenters - each in a different AWS region and 3 nodes in each one.
The replication factor on all keyspaces is 3, so full replication of data on every node. The size of data is about 10GB per node.
All of our writes are at LOCAL_QUORUM against one DC (let's call it DC1). Basically the other DC is just a kind of backup for disaster recovery; in case the AWS region for DC1 becomes unavailable, we will redirect traffic to DC2.
My issue is that we had a network disconnection between the two DCs for several hours, and after several days we noticed that there is missing data in DC2. This all makes sense, since the time the DCs were apart is longer than the hinted handoff window (3 hours). So we need to run a repair to bring DC2 back in sync with DC1.
I went over the Cassandra docs and read countless SO answers, and for the life of me I couldn't understand what the right repair to do is...
Do I need to issue a 'nodetool repair --full --sequential' from only one node? Do I need to run it on every node in the cluster? Maybe it's better to run 'nodetool rebuild'?
Executing nodetool repair on the nodes in datacenter2 should bring the data back in sync, but depending on the data size affected, this may be a task that takes time and resources. If datacenter2 is only a backup for disaster recovery purposes, it may be easier and quicker to back up the current dc1 cluster and restore it in the second datacenter (more information is available here).
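A minimal sketch of the repair route, run on each node in DC2, one node at a time (the flags are the ones from the question; whether you also want -pr or a DC-limited repair depends on your goal):
nodetool repair --full --sequential
The nodetool rebuild option mentioned in the question instead streams the data over again from DC1 (nodetool rebuild -- DC1 on each DC2 node) and can make sense when DC2 is missing a lot of data, but repair is the more conservative choice.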

For system_auth default Class should be SimpleStrategy or NetworkTopologyStrategy

In our Prod cluster, I see some of the system* keyspaces with SimpleStrategy.
As we are adding new DCs to our cluster, we need to be on NetworkTopologyStrategy; otherwise, nodetool rebuild fails.
I altered all keyspaces from SimpleStrategy to NetworkTopologyStrategy and rebuild worked well.
During the ALTER it also warns that you are altering a system keyspace.
Another question: the OpsCenter keyspace (rollup* tables) holds a hell of a lot of data; why should I replicate that? Shouldn't it be enough to keep it in only 1 DC?
My question: what should be the ideal strategy for system* keyspaces?
My question: what should be the ideal strategy for system* keyspaces?
A little warning on this one: system and system_schema have a special replication strategy of LocalStrategy, and they should stay that way.
The other keyspaces, system_auth, system_distributed, and system_traces, are a different story. Setting those to use NetworkTopologyStrategy with an RF of 3 (assuming each DC has at least 3 nodes) for each DC is perfectly acceptable. Setting that for system_distributed and system_traces isn't as important, but it shouldn't hurt anything.
system_auth, on the other hand, should definitely be set to use NetworkTopologyStrategy with an RF specified for each DC. The reason is that SimpleStrategy could potentially put all of its replicas in a single DC, or even zero replicas in one DC. That could cause high latency or even timeouts for auth checks, as it would result in cross-DC network traffic.
Also, if your applications use DC-specific load balancing policies, you will need to specify an RF for each DC in system_auth. As mentioned above, using SimpleStrategy could result in a DC not getting any replicas for a user, and that would prevent DC-specific applications from connecting.
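A sketch of what that looks like for a two-DC cluster (DC names and RFs are illustrative; use your own, then run nodetool repair on each keyspace you altered):
ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};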

Cassandra and Spark

Hi, I have a high-level question regarding cluster topology and data replication with respect to Cassandra and Spark being used together in DataStax Enterprise.
It was my understanding that if there were 6 nodes in a cluster and heavy computing (e.g. analytics) is done, then you could have three Spark nodes and three Cassandra nodes if you want. Or you don't need three nodes for analytics, but your jobs would not run as fast. The reason you don't want the heavy analytics on the Cassandra nodes is that the local memory is already being used up to handle the heavy read/write load of Cassandra.
This much is clear, but here are my questions :
How does the replicated data work then?
Are all the cassandra only nodes in one rack, and all the spark nodes in another rack?
Does all the data get replicated to the spark nodes?
How does that work if it does?
What is the recommended configuration steps to make sure the data is replicated properly to the spark nodes?
How does the replicated data work then?
Regular Cassandra replication will operate between nodes and DCs. As far as replication goes, this is the same as having a C*-only cluster with two datacenters.
Are all the cassandra only nodes in one rack, and all the spark nodes in another rack?
With the default DSE Snitch, your C* nodes will be in one DC and the Spark nodes in another DC. They will all be in a default rack. If you want to use multiple racks you will have to configure that yourself by using an advanced snitch. GPFS or PFS are good choices depending on your orchestration mechanisms. Learn more in the DataStax Documentation
Does all the data get replicated to the spark nodes? How does that work if it does?
Replication is controlled at the keyspace level and depends on your replication strategy:
SimpleStrategy will simply ask you for the number of replicas you want in your cluster (it is not datacenter-aware, so don't use it if you have multiple DCs):
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
This assumes you only have one DC and that you'll have 3 copies of each bit of data.
NetworkTopologyStrategy lets you pick the number of replicas per DC:
CREATE KEYSPACE tst WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 3};
You can choose to have a different number of replicas per DC.
What is the recommended configuration steps to make sure the data is replicated properly to the spark nodes?
The procedure to update RF is in the datastax documentation. Here it is verbatim:
Updating the replication factor
Increasing the replication factor increases the total number of copies of keyspace data stored in a Cassandra cluster. If you are using security features, it is particularly important to increase the replication factor of the system_auth keyspace from the default (1) because you will not be able to log into the cluster if the node with the lone replica goes down. It is recommended to set the replication factor for the system_auth keyspace equal to the number of nodes in each data center.
Procedure
Update a keyspace in the cluster and change its replication strategy options:
ALTER KEYSPACE system_auth WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
Or if using SimpleStrategy:
ALTER KEYSPACE "Excalibur" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
On each affected node, run the nodetool repair command. Wait until repair completes on a node, then move to the next node.
Know that increasing the RF in your cluster will generate lots of IO and CPU utilization as well as network traffic, while your data gets pushed around your cluster.
If you have a live production workload, you can throttle the impact by using nodetool getstreamthroughput / nodetool setstreamthroughput.
You can also throttle the resulting compactions with nodetool getcompactionthroughput / nodetool setcompactionthroughput.
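For example, a rough sketch of throttling while the data moves (the numbers are illustrative; streaming throughput is in megabits per second, compaction throughput in MB per second, so check the current values first):
nodetool getstreamthroughput
nodetool setstreamthroughput 100
nodetool getcompactionthroughput
nodetool setcompactionthroughput 16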
How do Cassandra and Spark work together on the analytics nodes and not fight for resources? If you are not going to limit Cassandra at all in the whole cluster, then what is the point of limiting Spark? Just have all the nodes Spark-enabled.
The key point is that you won't be pointing your main transactional reads / writes at the Analytics DC (use something like consistency level LOCAL_ONE or LOCAL_QUORUM to point those requests to the C* DC). Don't worry, your data still arrives at the analytics DC by virtue of replication, but you won't wait for acks to come back from analytics nodes in order to respond to customer requests. The second DC is eventually consistent.
You are right in that cassandra and spark are still running on the same boxes in the analytics DC (this is critical for data locality) and have access to the same resources (and you can do things like control the max spark cores so that cassandra still has breathing room). But you achieve workload isolation by having two Data Centers.
DataStax drivers, by default, will consider the DC of the first contact point they connect with as the local DC, so just make sure that your contact point list only includes machines in the local (C*) DC.
You can also specify the local datacenter yourself depending on the driver. Here's an example for the ruby driver, check the driver documentation for other languages.
Use the :datacenter cluster method: the first datacenter found will be assumed current by default. Note that you can skip this option if you specify only hosts from the local datacenter in the :hosts option.
You are correct, you want to separate your cassandra and your analytics workload.
A typical setup could be:
3 Nodes in one datacenter (name: cassandra)
3 Nodes in second datacenter (name: analytics)
When creating your keyspaces you define them with a NetworkTopologyStrategy and a replication factor defined for each datacenter, like so:
CREATE KEYSPACE myKeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'cassandra': 2, 'analytics': 2};
With this setup, your data will be replicated twice in each datacenter. This is done automatically by Cassandra. So when you insert data in the cassandra DC, the inserted data will get replicated to the analytics DC automatically, and vice versa. Note: you can define what data is replicated by using separate keyspaces for the data you want to be analyzed and the data you don't.
In your cassandra.yaml you should use the GossipingPropertyFileSnitch. With this snitch you can define the DC and the rack of your node in the file cassandra-rackdc.properties. This information then gets propagated via the gossip protocol, so each node learns the topology of your cluster.
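A minimal sketch of that snitch configuration for the layout above (values are illustrative):
# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch
# cassandra-rackdc.properties on a node in the analytics datacenter
dc=analytics
rack=rack1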
