Enable Cassandra PasswordAuthenticator at up time - cassandra

I have a Cassandra cluster (DataStax open source) that currently has no authentication configured (i.e., it is using AllowAllAuthenticator), and I want to switch to PasswordAuthenticator. The official documentation says that I should follow these steps:
1. enable PasswordAuthenticator in cassandra.yaml,
2. restart the Cassandra node, which will create the system_auth keyspace,
3. change the system_auth replication factor,
4. create a new user and password.
However, this is a big problem for me because the cluster is used in production, so we cannot have any downtime. Between steps 2 and 4 no user has been configured yet, so even if the client supplies a username and password, the request would still be rejected, which is not ideal.
I looked into the Datastax Enterprise doc, and it has a TransitionalAuthenticator class, which would create the system_auth keyspace but without rejecting requests. I wonder if this class can be ported to the open source version? Or if there are other ways around this problem? Thanks
Update
This is the Cassandra version I'm using:
cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 | Thrift protocol 19.39.0

You should be able to execute steps 2-4 with just one node and have zero downtime, assuming proper client configuration, replication, and cluster capacity. Then, it's just a rolling restart of the remaining nodes.
Clients should be set up with credentials ahead of time, and they will start using them as nodes with authenticators come online (this behavior could depend on the driver, so try it out first).
You might be able to manually generate the schema and data for steps 3-4 before enabling PasswordAuthenticator, but that shouldn't be necessary.
What are your concerns about downtime?
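As a rough illustration of steps 3 and 4, the sketch below only builds the CQL text that would be run on the first restarted node; it does not connect to anything. The function name, the choice of SimpleStrategy, and the example user are assumptions made for illustration, and the CREATE USER form is the Cassandra 2.0 syntax matching the version in the question.

```python
import re


def auth_bootstrap_cql(replication_factor, username, password, superuser=True):
    """Return the CQL statements for steps 3 and 4 as a list of strings.

    Hypothetical helper: it only formats text, it does not talk to Cassandra.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Restrict the user name to a plain identifier so it needs no quoting.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", username):
        raise ValueError("username must be a plain identifier")
    # Single quotes inside a CQL string literal are escaped by doubling them.
    escaped_password = password.replace("'", "''")
    role = "SUPERUSER" if superuser else "NOSUPERUSER"
    return [
        # Step 3: replicate system_auth (SimpleStrategy is an assumption;
        # a multi-datacenter cluster would use NetworkTopologyStrategy).
        "ALTER KEYSPACE system_auth WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': %d};" % replication_factor,
        # Step 4: create the account that clients were configured with.
        "CREATE USER %s WITH PASSWORD '%s' %s;" % (username, escaped_password, role),
    ]


if __name__ == "__main__":
    for statement in auth_bootstrap_cql(3, "app_user", "change_me"):
        print(statement)
```

After the replication factor changes, a repair of system_auth is generally needed so the credentials reach the new replicas before the remaining nodes are restarted.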

Related

Way to determine healthy Cassandra cluster?

I've been tasked with re-writing some sub-par Ansible playbooks to stand up a Cassandra cluster in CentOS. Quite frankly, there doesn't seem to be much information on Cassandra out there.
I've managed to get the service running on all three nodes at the same time, using the following configuration file, info scrubbed.
HOSTIP=10.0.0.1
MSIP=10.10.10.10
ADMIN_EMAIL=my@email.com
LICENSE_FILE=/tmp/license.conf
USE_LDAP_REMOTE_HOST=n
ENABLE_AX=y
MP_POD=gateway
REGION=test-1
USE_ZK_CLUSTER=y
ZK_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
ZK_CLIENT_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
USE_CASS_CLUSTER=y
CASS_HOSTS="10.0.0.1:1,1 10.0.0.2:1,1 10.0.0.3:1,1"
CASS_USERNAME=test
CASS_PASSWORD=test
The HOSTIP changes depending on which node the configuration file is on.
The problem is, when I run nodetool ring, each node says there's only two nodes in the cluster: itself and one other, seemingly random from the other two.
What are some basic sanity checks to determine a "healthy" Cassandra cluster? Why is nodetool saying each one thinks there's a different node missing from the cluster?
nodetool status - overview of the cluster (load, state, ownership)
nodetool info - more granular details at the node-level
As for the node mismatch I would check the following:
cassandra-topology.properties - identical across the cluster (all 3 IPs listed)
cassandra.yaml - I typically keep this file the same across all nodes. The parameters that MUST stay the same across the cluster are cluster_name, seeds, partitioner, and snitch.
verify all nodes can reach each other (ping, telnet, etc)
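The comparison of must-match settings can be automated once each node's values have been collected. The sketch below assumes the settings were already extracted into one dictionary per node (how they are read from cassandra.yaml is left out); the key names follow the list above and the function name is hypothetical. Note that in cassandra.yaml itself the snitch setting is called endpoint_snitch.

```python
# Settings that the answer says must be identical on every node.
MUST_MATCH = ("cluster_name", "seeds", "partitioner", "snitch")


def find_mismatches(settings_by_node, keys=MUST_MATCH):
    """Return {key: {node: value}} for every key whose value differs between nodes."""
    mismatches = {}
    for key in keys:
        values = {node: settings.get(key) for node, settings in settings_by_node.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


if __name__ == "__main__":
    # Illustrative values only; the third node has a different seed list.
    nodes = {
        "10.0.0.1": {"cluster_name": "Test", "seeds": "10.0.0.1", "snitch": "SimpleSnitch"},
        "10.0.0.2": {"cluster_name": "Test", "seeds": "10.0.0.1", "snitch": "SimpleSnitch"},
        "10.0.0.3": {"cluster_name": "Test", "seeds": "10.0.0.3", "snitch": "SimpleSnitch"},
    }
    for key, values in find_mismatches(nodes).items():
        print(key, values)
```

A differing seed list like the one in this example is one plausible way for nodes to end up with different views of the ring.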
DataStax (Cassandra Vendor) has some good documentation. Please note that some features are only available on DataStax Enterprise -
http://docs.datastax.com/en/landing_page/doc/landing_page/current.html
Also check out the Apache Cassandra site -
http://cassandra.apache.org/community/
As well as the user forums -
https://www.mail-archive.com/user@cassandra.apache.org/
Actually, the thing you really want to check is whether all the nodes "AGREE" on the schema version (schema ID). nodetool status shows whether nodes are up, down, or joining, but that alone does not mean the cluster is 'healthy' enough for schema changes or other changes.
The simplest way is:
nodetool describecluster
Cluster Information:
Name: FooBarCluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2, 10.136.2.3]
If the schema IDs do not match, you need to wait for the schema to settle or run repairs. A mismatch looks like this, for example:
43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2]
43fe9177-382c-327e-904a-c8353a9dxxxx: [10.136.2.3]
However, running nodetool is 'heavy' and its output is hard to parse.
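If nodetool output is all that is available, the schema-version lines can still be pulled out with a regular expression. This is a minimal sketch that assumes the line format shown in the example above ("id: [host, host]"); the function names are hypothetical and the output format may differ between versions.

```python
import re

# Matches lines such as "43fe9177-...: [10.136.2.1, 10.136.2.2]".
# Any other "key: [hosts]" line in the output is captured as well.
SCHEMA_LINE = re.compile(r"^\s*([\w-]+):\s*\[([^\]]*)\]\s*$")


def parse_schema_versions(output):
    """Map each schema version found in describecluster output to its host list."""
    versions = {}
    for line in output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_in_agreement(output):
    """True only when exactly one schema version is reported."""
    return len(parse_schema_versions(output)) == 1
```

The text passed in would be the captured standard output of nodetool describecluster; how it is captured is left to the caller.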
The same information is stored in the database, where you can query it directly:
SELECT schema_version, release_version FROM system.local;
SELECT peer, schema_version, release_version FROM system.peers;
Then you compare schema_version across all nodes... if they match, the cluster is very likely healthy. You should ALWAYS check this before making any changes to schema.
Now, during a rolling upgrade, when changing engine versions, the release_version differs between nodes, so to support automated rolling upgrades you need to check that schema_version matches within each release_version separately.
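The comparison described above, including the per-release grouping for rolling upgrades, can be sketched as follows. It assumes the rows from system.local and system.peers have already been fetched and combined into (host, schema_version, release_version) tuples; the function names are hypothetical and no driver call is shown.

```python
from collections import defaultdict


def schema_versions_by_release(rows):
    """Group hosts as {release_version: {schema_version: [hosts]}}.

    rows: iterable of (host, schema_version, release_version) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for host, schema_version, release_version in rows:
        grouped[release_version][schema_version].append(host)
    return {release: dict(schemas) for release, schemas in grouped.items()}


def schema_agrees(rows):
    """True when every release_version group reports exactly one schema_version."""
    grouped = schema_versions_by_release(rows)
    # An empty result means nothing was checked, so treat it as not agreed.
    return bool(grouped) and all(len(schemas) == 1 for schemas in grouped.values())
```

Outside of an upgrade every node reports the same release_version, so this reduces to the plain "all schema versions match" check.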
I'm not sure all of the problems you might be having, but...
Check the cassandra.yaml file. You need minimum 3 things to be the same - seeds: list (but do not list all nodes as seeds!), cluster_name, and snitch. Make sure your listen_address is correct.
If you are using GossipingPropertyFileSnitch, then check the cassandra-topology.properties and/or cassandra-rackdc.properties files for accuracy.
Don't start all the nodes at the same time. Start the seed nodes first; the other nodes will "gossip" with the seed nodes to learn the cluster topology. Shut down the seed nodes last.
Don't use shared storage. That defeats the purpose of distributed data and is considered a cassandra anti-pattern.
If you're in AWS, don't use auto-scaling groups unless you know what you're doing.
Once you've done all that, use nodetool status | ring | info or jmx to see what the cluster is doing.
Datastax does have decent documentation for cassandra.
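For a playbook, the seed-related advice above (seeds start first, stop last, and not every node should be a seed) could be encoded roughly as follows. This is only a sketch; the function names and the addresses in the example are illustrative assumptions.

```python
def start_order(nodes, seeds):
    """Seed nodes first, then the remaining nodes, keeping their given order."""
    seed_set = set(seeds)
    first = [n for n in nodes if n in seed_set]
    rest = [n for n in nodes if n not in seed_set]
    return first + rest


def shutdown_order(nodes, seeds):
    """Reverse of the start order, so seed nodes are shut down last."""
    return list(reversed(start_order(nodes, seeds)))


def seed_list_problems(nodes, seeds):
    """Return human-readable problems with the seed list (empty list if none)."""
    problems = []
    if not seeds:
        problems.append("no seed nodes configured")
    unknown = [s for s in seeds if s not in nodes]
    if unknown:
        problems.append("seeds not in the node list: " + ", ".join(unknown))
    if seeds and set(seeds) >= set(nodes):
        problems.append("every node is listed as a seed")
    return problems


if __name__ == "__main__":
    cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(start_order(cluster, ["10.0.0.2"]))
    print(shutdown_order(cluster, ["10.0.0.2"]))
```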

apache cassandra 3.9 - Enabling security

We are trying to add a node to an existing ring in which security is enabled and the default cassandra user has been made a non-superuser. We also altered the keyspace to NetworkTopologyStrategy with a replication factor equal to the number of nodes. The ring is currently on AWS.
Once the new node joins the cluster, the only user we see is the non-superuser cassandra user, and we are pretty much locked out of the cluster. However, once we remove the newly joined node, all the security settings that we had before come back.
Are there any best practices that we need to follow to enable security in 3.9?
Thanks in advance for helping me out with this!

How to update configuration of a Cassandra cluster

I have a 3 node Cassandra cluster and I want to make some adjustments to the cassandra.yaml
My question is, how should I perform this? One node at a time or is there a way to make it happen without shutting down nodes?
Btw, I am using Cassandra 2.2 and this is a production cluster.
There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart cassandra to re-read the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication-factors and consistency-levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing a setting via JMX WON'T change cassandra.yaml, though, so you'll need to update that file as well or your config will revert to what's in the file when the node restarts.
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer, I'm biased as I'm an LCM dev).
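To make "sane replication factors and consistency levels" concrete, the arithmetic below shows how many replicas of a partition can be unavailable while a request at a given consistency level still succeeds. It is a simplified sketch for a single datacenter that counts replicas rather than nodes, and the function names are hypothetical.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for the given consistency level."""
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def max_replicas_down(consistency_level, replication_factor):
    """How many replicas may be down while requests at this level still succeed."""
    required = replicas_required(consistency_level, replication_factor)
    return max(replication_factor - required, 0)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, max_replicas_down(level, 3))
```

With a replication factor of 3 and QUORUM, one replica can be down, which is why restarting one node at a time is usually safe; at ALL, any restart causes failed requests.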

Unable to start DSE using SPARK_ENABLED=1

We are running 6 node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now, we would like to add Spark to all of them. It seems like "adding" is not the right term, because otherwise this would not fail. Anyway, these are the steps we took:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have been already answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option - why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why can't it just use the existing one?
I know that according to datastax's doc one should partition the load using different DC for different workloads. In our case we just want to run SPARK jobs on the same nodes that Cassandra is running using the same DC.
Is that possible?
Thanks!
The other answers are correct. The error is warning you that this node was previously identified as being in another DC. This means that it probably doesn't have the right data for any keyspaces that use NetworkTopologyStrategy (NTS). For example, if you had an NTS keyspace with only one replica in "Cassandra" and changed the DC to "Analytics", you could inadvertently lose all of the data.
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution is to name your DCs explicitly using GossipingPropertyFileSnitch (GPFS) rather than relying on DseSimpleSnitch, which names DCs based on the DSE workload.
In this case, switch to GPFS and set the DC name to Cassandra.
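With GossipingPropertyFileSnitch, the datacenter and rack names come from cassandra-rackdc.properties on each node. The sketch below only renders that file's two lines; the function name is hypothetical, and the rack value in the example is an assumption that would need to match whatever rack the node previously reported.

```python
def render_rackdc(dc, rack):
    """Return the contents of a cassandra-rackdc.properties file as a string."""
    for label, value in (("dc", dc), ("rack", rack)):
        # Reject empty names and names with whitespace to avoid a malformed file.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError("%s must be a non-empty name without whitespace" % label)
    return "dc=%s\nrack=%s\n" % (dc, rack)


if __name__ == "__main__":
    # "Cassandra" is the previous DC named in the error message above;
    # "rack1" is an assumed rack name.
    print(render_rackdc("Cassandra", "rack1"), end="")
```

Because the DC name stays "Cassandra" regardless of SPARK_ENABLED, the startup check that compares the snitch's DC with the previous DC should no longer fail.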

Migrate Datastax Enterprise Cassandra to Apache Cassandra or Datastax Community?

I have a large but simple Cassandra database on a Datastax 4.6 cluster. The license renewal is prohibitive for this very simple use case, and I am trying to migrate to either a straight Apache or Datastax Community version. First, is it possible to do an inline upgrade?
I have altered all the keyspaces to remove the "EverywhereStrategy" replication strategy, but I still get an error saying that the DSC version of Cassandra I'm trying to join to the cluster doesn't support it. I'm using like Cassandra versions (2.0.16), and most other things seem to be close.
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
If it's not possible to do an inline upgrade, what would be the best strategy to migrate a decent-sized (30 node, 150 TB) cluster?
So to make this work, you have to remove any DSE features that you may be using on any of your tables.
This meant I had to change the replication strategy on the dse_system keyspace from EverywhereStrategy to SimpleStrategy with RF=3 (almost any setting works, since you can drop this keyspace after the conversion). The error message was:
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
I also had to drop the unused CFS keyspaces. We never used the Hadoop/CFS integration, so we had nothing in these keyspaces anyway. I didn't capture the error for this.
We did have a solr index on a table we were testing on this cluster about a year ago so I had to drop this columnfamily. The error message was:
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex
There may be other incompatibilities if you use other features of Datastax Enterprise that you would have to remove, but this was enough for me to get the migration working.
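Before attempting the switch, a schema dump can be scanned for the DSE-only class names that appear in the error messages above. This is a minimal sketch: the function name is hypothetical, the marker list contains only the two classes mentioned in this answer, and the schema text is assumed to come from something like a cqlsh schema export.

```python
# Class names taken from the error messages quoted in this answer.
DSE_ONLY_MARKERS = ("EverywhereStrategy", "Cql3SolrSecondaryIndex")


def find_dse_dependencies(schema_text, markers=DSE_ONLY_MARKERS):
    """Return (line_number, marker, line) for each schema line using a DSE-only class."""
    hits = []
    for number, line in enumerate(schema_text.splitlines(), start=1):
        for marker in markers:
            if marker in line:
                hits.append((number, marker, line.strip()))
    return hits


if __name__ == "__main__":
    # Illustrative schema text, not taken from a real cluster.
    sample = (
        "CREATE KEYSPACE dse_system WITH replication = {'class': 'EverywhereStrategy'};\n"
        "CREATE KEYSPACE app WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': '3'};\n"
    )
    for hit in find_dse_dependencies(sample):
        print(hit)
```

An empty result does not prove the schema is free of DSE features; it only covers the markers listed.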
dse-core.jar contains the EverywhereStrategy class.
We solved this problem by doing the following:
Replace everything except the above JAR so nodes can come up fine. Once all nodes are migrated to OSS, drop the dse_system keyspace (that uses this replication), delete the JAR and restart the nodes one by one.

Resources