system_auth replicates without changing replication_factor, so why change it?

I have a PasswordAuthenticator login (user = cassandra) with system_auth RF = 1 and 2 non-seed nodes. When I changed the password, it propagated to the non-seed nodes even though RF = 1. So why change the RF? (The reason I ask: I noticed that if I change it to 3 before the other nodes are up, I can't log in and get a QUORUM error, so this is a bootstrapping question.)
cassandra@cqlsh:system_auth> SELECT * FROM system_schema.keyspaces;

 keyspace_name | durable_writes | replication
---------------+----------------+--------------------------------------------------------------------------------------
 system_auth   | True           | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}

Two problems can happen if you don't change it:
In a multi-node cluster, if the node responsible for your user/role data crashes, you won't be able to log in.
In a multi-DC cluster, if you attempt to connect with your application using a "local" data center, you will only be able to connect to nodes in the DC containing your user/role data. This is because SimpleStrategy is not DC-aware: it stores the single replica somewhere in the cluster, which means it lives in only one DC, and login attempts routed through the other DCs will fail.
In short, I recommend:
Never use SimpleStrategy. It's not useful for production MDHA (multi-datacenter high availability), so why build a lower environment with it?
Set the RF on system_auth to the number of nodes, but no more than 3.
Create an additional superuser, change the password of the cassandra/cassandra user, and never use it again. The cassandra/cassandra user triggers some special behavior inside cqlsh, operating at QUORUM by default. If nodes crash, you don't want the extra problems that opens the door for.
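As a sketch of those last two recommendations, assuming a single DC named 'datacenter1' and a placeholder superuser name of 'dba' (adjust both to your environment):

-- 'datacenter1' and 'dba' are placeholders; use your own DC name and role name
ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};

-- Create a replacement superuser...
CREATE ROLE dba WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'choose-a-strong-password';

-- ...and lock the default account down so it is never used again.
ALTER ROLE cassandra WITH PASSWORD = 'some-long-random-string' AND SUPERUSER = false;

After changing the RF, run a repair of system_auth on each node so the new replicas get populated.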
Here's a related answer, which should help: Replication Factor to use for system_auth

Related

Add nodes to existing Cassandra cluster

We currently have a 2 node Cassandra cluster. We want to add 4 more nodes to the cluster, using the rack feature. The future topology will be:
node-01 (Rack1)
node-02 (Rack1)
node-03 (Rack2)
node-04 (Rack2)
node-05 (Rack3)
node-06 (Rack3)
We want to use different racks, but the same DC.
But for now we use SimpleStrategy and replication factor is 1 for all keyspaces. My plan to switch from a 2 to a 6 node cluster is shown below:
Change endpoint_snitch to GossipingPropertyFileSnitch.
Alter each keyspace to NetworkTopologyStrategy with replication of {'datacenter1': '3'}.
According to the docs, when we add a new DC to an existing cluster, we must alter the system keyspaces, too. But in our case, we are changing only the snitch and keyspace strategy, not the datacenter. Or should I change the system keyspaces' strategy and replication factor too, in the case of adding more nodes and changing the snitch?
First, I would change the endpoint_snitch to GossipingPropertyFileSnitch on one node and restart it. You need to make sure that approach works first. Typically, you cannot (easily) change the logical datacenter or rack names on a running cluster. Technically you're not doing that here, but SimpleStrategy may do some things under the hood to abstract datacenter/rack awareness, so it's a good idea to test it.
If it works, make the change and restart the other node, as well. If it doesn't work, you may need to add 6 new nodes (instead of 4) and decommission the existing 2 nodes.
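For reference, a minimal sketch of that snitch change, assuming the default file locations and placeholder DC/rack names:

# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties (GossipingPropertyFileSnitch reads this; set it per node)
dc=datacenter1
rack=Rack1

Each node needs its own dc/rack values in cassandra-rackdc.properties before its restart.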
Or should I change the system keyspaces' strategy and replication factor too?
Yes, you should set the same keyspace replication definition on the following keyspaces: system_auth, system_traces, and system_distributed.
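As a hedged sketch of those ALTERs, assuming the DC name reported by nodetool status is 'datacenter1':

-- 'datacenter1' is a placeholder; use the exact DC name from nodetool status
ALTER KEYSPACE system_auth
    WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};
ALTER KEYSPACE system_traces
    WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};
ALTER KEYSPACE system_distributed
    WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};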
Consider this situation: if one of your 2 nodes crashes, you won't be able to log in as any user whose system_auth data is stored only on that node. So it is very important to ensure that system_auth is replicated appropriately.
I wrote a post on this some time ago (updated in 2018): Replication Factor to use for system_auth
Also, I recommend the same approach on system_traces and system_distributed, as future node adds/replacements/repairs may fail if valid token ranges for those keyspaces cannot be located. Basically, using the same approach on them prevents potential problems in the future.
Edit 20200527:
Do I need to launch nodetool cleanup on the old cluster nodes after the snitch and keyspace topology changes? According to the docs, "yes", but only on the old nodes?
You will need to run it on every node, except for the very last one added. The last node is the only node guaranteed to have only data matching its token range assignments.
"Why?" you may ask. Consider the total percentage ownership as the cluster incrementally grows from 2 nodes to 6. If you bump the RF from 1 to 2 (run a repair), and then from 2 to 3 and add the first node, you have a cluster with 3 nodes and 3 replicas. Each node then has 100% data ownership.
That ownership percentage gradually decreases as each node is added, down to 50% when the 6th and final node is added. But even though all nodes will have ownership of 50% of the token ranges:
The first 3 nodes will still actually have 100% of the data set, carrying an extra 50% beyond the 50% they should own.
The fourth node will still have an extra 25% (3/4 minus 1/2).
The fifth node will still have an extra 10% (3/5 minus 1/2).
Therefore, the sixth and final node is the only one which will not contain any more data than it is responsible for.
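So, as a sketch against this hypothetical 6-node build-out, that means running the following on node-01 through node-05, and skipping node-06 (the last one added):

nodetool cleanup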

Alter Keyspace on cassandra 3.11 production cluster to switch to NetworkTopologyStrategy

I have a Cassandra 3.11 production cluster with 15 nodes. Each node has ~500GB total, with replication factor 3. Unfortunately, the cluster is set up with the 'SimpleStrategy' replication class. I am switching it to 'NetworkTopologyStrategy'. I am looking to understand the caveats of doing so on a production cluster. What should I expect?
Switching from SimpleStrategy to NetworkTopologyStrategy in a single data center configuration is very simple. The only caveat I would warn about is to make sure you spell the data center name correctly. Failure to do so will cause operations to fail.
One way to ensure that you use the right data center, is to query it from system.local.
cassdba@cqlsh> SELECT data_center FROM system.local;

 data_center
-------------
     west_dc

(1 rows)
Then adjust your keyspace to replicate to that DC:
ALTER KEYSPACE stackoverflow WITH replication = {'class': 'NetworkTopologyStrategy',
'west_dc': '3'};
Now for multiple data centers, you'll want to make sure that you specify your new data center names correctly, AND that you run a repair (on all nodes) when you're done. This is because SimpleStrategy treats all nodes as a single data center, regardless of their actual DC definition. So you could have 2 replicas in one DC, and only 1 in another.
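As a sketch of the multi-DC case, assuming a second DC named 'east_dc' (a placeholder) alongside the 'west_dc' shown above:

ALTER KEYSPACE stackoverflow WITH replication = {'class': 'NetworkTopologyStrategy',
    'west_dc': '3', 'east_dc': '3'};

Then run nodetool repair stackoverflow on every node so each replica streams to its correct home.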
I have changed RFs for keyspaces on-the-fly several times. Usually, there are no issues. But it's a good idea to run nodetool describecluster when you're done, just to make sure all nodes have schema agreement.
Pro-tip: For future googlers, there is NO BENEFIT to creating keyspaces using SimpleStrategy. All it does is put you in a position where you have to fix it later. In fact, I would argue that SimpleStrategy should NEVER BE USED.
So when will the data movement commence? In my case, since I have specific rack IDs now, I expect my replicas to switch nodes upon this ALTER KEYSPACE action.
This alone will not cause any adjustments of token range responsibility. If you already have an RF of 3 and so does your new DC definition, you won't need to run a repair, so nothing will stream.
I have a 15-node cluster which is divided into 5 racks, so each rack has 3 nodes belonging to it. Since I previously had replication factor 3 and SimpleStrategy, more than 1 replica could have belonged to the same rack, whereas NetworkTopologyStrategy guarantees that no two replicas will belong to the same rack. So shouldn't this cause data to move?
In that case, if you run a repair, your secondary or tertiary replicas may find a new home. But your primaries will stay the same.
So are you saying that nothing changes until I run a repair?
Correct.

Cassandra Error: "Unable to complete request: one or more nodes were unavailable."

I am a complete newbie at Cassandra and am just setting it up and playing around with it and testing different scenarios using cqlsh.
I currently have 4 nodes in 2 datacenters, something like this (with proper IPs, of course):
a.b.c.d=DC1:RACK1
a.b.c.d=DC1:RACK1
a.b.c.d=DC2:RACK1
a.b.c.d=DC2:RACK1
default=DCX:RACKX
Everything seems to make sense so far, except that I brought down a node on purpose just to see the resulting behaviour, and I noticed that I can no longer query/insert data on the remaining nodes; it results in "Unable to complete request: one or more nodes were unavailable."
I get that a node is unavailable (I did that on purpose), but isn't one of the main points of a distributed DB to continue to support functionality even as some nodes go down? Why does bringing one node down put a complete stop to everything?
What am I missing?
Any help would be greatly appreciated!!
You're correct in assuming that one node down should still allow you to query the cluster, but there are a few things to consider.
I'm assuming that "nodetool status" returns the expected results for that DC (i.e. "UN" for the up node, "DN" for the downed node).
Check the following:
Connection's consistency level (default is ONE)
Keyspace replication strategy and factor (default is SimpleStrategy, which is rack/DC unaware)
In cqlsh, run "DESCRIBE KEYSPACE <keyspace_name>"
Note that if you've been playing around with the replication factor, you'll need to run a "nodetool repair" on the nodes.
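A quick way to check both from within cqlsh (the keyspace name 'mykeyspace' is a placeholder):

cqlsh> CONSISTENCY;
Current consistency level is ONE.
cqlsh> DESCRIBE KEYSPACE mykeyspace;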
Is it possible that you did not set the replication factor on your keyspace with a value greater than 1? For example:
CREATE KEYSPACE "Excalibur"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
This will configure your keyspace such that data is replicated to 2 nodes in each of the dc1 and dc2 datacenters.
If your replication factor is 1 and a node that owns the data you are querying goes down, you will not be able to retrieve the data, and C* will fail fast with an unavailable error. In general, if C* detects that the consistency level cannot be met to service your query, it will fail fast.

Not enough replica available for query at consistency ONE (1 required but only 0 alive)

I have a Cassandra cluster with three nodes, two of which are up. They are all in the same DC. When my Java application goes to write to the cluster, I get an error in my application that seems to be caused by some problem with Cassandra:
Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
at com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:79)
The part that doesn't make sense is that "1 required but only 0 alive" statement. There are two nodes up, which means that one should be "alive" for replication.
Or am I misunderstanding the error message?
Thanks.
You are likely getting this error because the keyspace containing the table you are querying has a replication factor of one, is that correct?
If the partition you are reading / updating does not have enough available replicas (nodes with that data) to meet the consistency level, you will get this error.
If you want to be able to handle more than 1 node being unavailable, what you could do is look into altering your keyspace to set a higher replication factor, preferably three in this case, and then running a nodetool repair on each node to get all of your data on all nodes. With this change, you would be able to survive the loss of 2 nodes to read at a consistency level of one.
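A sketch of that change, with placeholder keyspace ('mykeyspace') and DC ('datacenter1') names:

-- both names below are placeholders; use your own keyspace and DC names
ALTER KEYSPACE mykeyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};

Then run nodetool repair mykeyspace on each node to build out the new replicas.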
This cassandra parameters calculator is a good reference for understanding the considerations of node count, replication factor, and consistency levels.
I hit this today because the datacenter field is case sensitive. If your dc is 'somedc01' this isn't going to work:
replication =
{
    'class': 'NetworkTopologyStrategy',
    'SOMEDC01': '3'  # <-- BOOM! The DC is actually 'somedc01'
}
AND durable_writes = true;
Anyway, it's not that intuitive, hope this helps.
In my case, I got a message saying 0 replicas were available, but Cassandra was up and cqlsh worked correctly; the problem was accessing it from Java. The query was for a complete table, and some records were not accessible (all nodes containing them were down). From cqlsh, SELECT * FROM table works, but only shows the accessible records. So the solution is to recover the downed nodes, and maybe to change the replication factors with:
ALTER KEYSPACE ....
nodetool repair -all
Then run nodetool status to see the changes and the cluster structure.
For me, it was that my endpoint_snitch was still set to SimpleSnitch instead of something like GossipingPropertyFileSnitch. This was preventing the multi-DC cluster from connecting properly, and it manifested in the error above.

Getting "unable to complete request: one or more nodes were unavailable" when performing insert statement using cqlsh

I'm trying to perform an insert on a brand new install of Cassandra 2, and while I was able to set up a new keyspace and table just fine, I get the error mentioned above when attempting to perform an insert.
I don't have any fancy multi-server setup; it's just running on one computer with a test DB, hence my confusion with node configuration.
Commands used to create said items are:
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
USE demodb;
CREATE TABLE users (user_name varchar, state varchar, birth_year bigint, PRIMARY KEY (user_name));
INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
Is most likely your culprit. It says that data in the demodb keyspace belongs in DC1 and should be replicated 3 times. If your single test node is not specified as being a member of DC1, any request to insert into this keyspace will fail. In addition, if it is a member of DC1 and the consistency level is greater than 1, all requests will fail because it will be impossible for the write to get more than one acknowledgment.
Check what your data center is named (nodetool status) and adjust the keyspace replication details to match. That will most likely solve your problems.
---- Edited for more Details and Better Formatting ----
This is one of the most common errors new users have with Cassandra. Basically, in Cassandra there are logical units of hardware we call datacenters. A datacenter is supposed to represent a group of machines that are geographically (or in some other way) distinct. You can create many of these and prevent a failure in one geographic location from taking your application offline.
Keyspaces are a logical structure for organizing groups of information; a keyspace is analogous to a database in the relational world. Each keyspace gets to specify on which, and how many, machines it should replicate. If we use NetworkTopologyStrategy, the replication is specified on a per-datacenter basis. We specify these details at creation time (although they can be modified later) using "CREATE KEYSPACE .... WITH REPLICATION".
In your above statement you have specified that all information within the Keyspace demodb should be placed in the datacenter "DC1" and there should be 3 copies of the data in that datacenter. This basically means you have at least 3 Nodes in DC1 and you want a copy of the data on each of those nodes. This by itself will not cause an insert to fail unless the entire datacenter is unknown to the Cassandra cluster. This would be the case if you did no initial configuration of your C* cluster and are just running off the stock yaml.
Running nodetool status will show you what a current node believes about the state of the cluster. Here is the output from C* running off my local machine.
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Owns (effective)  Host ID                               Token                Rack
UN  127.0.0.1  93.37 KB  100.0%            50be3bec-7e30-4385-bd4a-918055a29292  4731866028208108826  rack1
This output shows that I have a single node operating within a datacenter named "Cassandra". This means any inserts to keyspaces which require replicas in other datacenters will fail, because the cluster doesn't know how to handle those requests. (If the nodes were simply down but we had seen them before, we could save hints; but if the other DC has never been seen, we reject the request because the cluster has most likely been misconfigured.)
To fix this situation, I would modify my keyspace using:
cqlsh:demodb> ALTER KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
Now demodb requires a copy of the data on 1 machine in the datacenter Cassandra. This is great because, as my nodetool output states, I have one node in a datacenter named Cassandra. If I try an insert now, it passes.
cqlsh:demodb> INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
cqlsh:demodb> select * from users where user_name = 'canadiancreed' ;
 user_name     | birth_year | state
---------------+------------+-------
 canadiancreed |       1976 |    PA
(1 rows)
and I would change my setup schema script to have the correct datacenter name as well
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
In case you end up here after a Google search: I found that you can get this error if you are using the consistency level ALL (it might also be the case for QUORUM with specific replication factor numbers) and the keyspace you are using is set up to replicate to a non-existent or dead datacenter.
Updating the keyspace replication to remove reference to the non-existent datacenter solves the issue.
(and the message is entirely logical in this case: you want results from nodes that don't exist anymore).
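As a sketch of that fix, with placeholder names, the dead datacenter is simply dropped from the replication map:

-- 'mykeyspace' and 'live_dc' are placeholders; the dead DC is just omitted
ALTER KEYSPACE mykeyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'live_dc': '3'};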
