I installed and started Cassandra on two Linux machines in Amazon EC2. I also set cassandra.yaml to use a property file snitch and configured the cassandra-topology.properties file as follows:
<external IP 1>=AWS1:R1
<external IP 2>=AWS2:R1
Then I created a keyspace as follows:
create keyspace myks with strategy_options = [{AWS1:1,AWS2:1}] and placement_strategy='NetworkTopologyStrategy';
Then I created a column family and tried inserting one row. However, I'm getting a null back from the CLI when I try to insert. Did I miss something in the configuration?
How can I find out what's going on?
Also, does Cassandra only read cassandra-topology.properties at startup?
Thanks
It looks like the keyspace was not created properly. As a rule of thumb, whenever you get an UnavailableException while populating data, assume there is an issue with how the keyspace was created. In your case, you haven't mentioned the full class path for the desired placement_strategy:
CREATE KEYSPACE myks WITH placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
AND strategy_options=[{AWS1:1,AWS2:1}];
Yes, Cassandra only reads the topology file at startup (i.e. when the keyspace is created).
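As a quick sanity check (a sketch, assuming you are still using the old cassandra-cli, as your original syntax suggests), you can confirm the keyspace definition was picked up before retrying the insert:
show keyspaces;
The output should list myks with the org.apache.cassandra.locator.NetworkTopologyStrategy placement strategy and the AWS1:1, AWS2:1 options you asked for.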
Normally, the janusgraph.properties file specifies the storage backend parameters that the JanusGraph instance points to:
# For a Cassandra backend
storage.backend=cql
storage.hostname=cassandraHost
storage.cql.keyspace=myKeyspace
# ... port, password, username, and so on
Now, once the JanusGraph instance is created, any Gremlin query sent to JanusGraph will create/read the graph data in that specified keyspace, "myKeyspace".
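For context, this is roughly how such a properties file gets bound to a traversal source (a sketch from the Gremlin console; the file path is just an example):
// open a JanusGraph instance backed by whatever keyspace the properties file names
graph = JanusGraphFactory.open('conf/janusgraph-cql.properties')
// every traversal spawned from this source reads/writes that keyspace
g = graph.traversal()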
Since I need to use a JanusGraph instance and a Cassandra instance that are already running (so I cannot change the keyspace), but I need the queries to return the graph contained in another keyspace called "secondKeySpace", my question is:
Is there a way to point the JanusGraph Gremlin queries at a different Cassandra keyspace from within the Gremlin query itself?
Instead of doing
g.V().has(label, 'service').has('serviceId','%s').out().has(label,'user')
Can I do something like the following?
g.keySpace('secondKeySpace').V().has(label, 'service').has('serviceId','%s').out().has(label,'user')
Thanks in advance for any help. I'm new to JanusGraph and I don't know if this is even possible.
I'm configuring a 3-node Cassandra cluster (multi-datacenter) and everything works well until I set up authentication, switching from AllowAllAuthenticator to PasswordAuthenticator, as described in Cassandra's documentation.
The problem is that once I make the change and restart the nodes, I can no longer access the database, in this case with the cassandra superuser; it displays this message:
Connection error: ('Unable to connect to any servers', {'10.0.0.10': AuthenticationFailed('Failed to authenticate to 10.0.0.10: Error from server: code=0100 [Bad credentials] message="Unable to perform authentication: Cannot achieve consistency level QUORUM"',)})
It's important to mention that before setting up the authenticator, I had already updated system_auth to NetworkTopologyStrategy, configuring each node.
Also, without authentication all replication works fine, which means the cluster itself is running fine.
Does anyone have any idea about this? It's really driving me crazy, since I couldn't find any reference to it.
All the best!
My guess is that you need to run repair on all of the nodes for "system_auth". If you're running DSE, also ensure any keyspace that starts with "dse" and uses SimpleStrategy is updated to NetworkTopologyStrategy with appropriate DC and RF settings, and run repair on each node for those as well.
That should solve your problem. My guess is that you created your users and then updated the keyspaces to use NetworkTopologyStrategy. Once done, any new records will be propagated correctly, but the existing records need repair to "fan them out", as it won't happen on its own.
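As a rough sketch of what that looks like (the DC names and replication factors below are placeholders for your own topology):
ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
and then, on each node:
nodetool repair system_auth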
-Jim
Can I prevent a keyspace from syncing over to another datacenter by NOT including the other datacenter in my keyspace replication definition?
Apparently, this is not the case.
In my own test, I set up two Kubernetes clusters in GCP, each serving as a Cassandra datacenter. Each k8s cluster has 3 nodes.
I set up datacenter DC-WEST first and created a keyspace demo using this:
CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-WEST': 3};
Then I set up datacenter DC-EAST, without adding any user keyspaces.
To join the two datacenters, I modified the CASSANDRA_SEEDS environment variable in the Cassandra StatefulSet YAML to include seed nodes from both datacenters (I use host networking).
But after that, I noticed the keyspace demo is synced over to DC-EAST, even though the keyspace only has DC-WEST in its replication settings.
cqlsh> select data_center from system.local
... ;
data_center
-------------
DC-EAST <-- Note: this is from the DC-EAST datacenter
(1 rows)
cqlsh> desc keyspace demo
CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'DC-WEST': '3'} AND durable_writes = true;
So in DC-EAST we see the demo keyspace, which should be replicated only on DC-WEST! What am I doing wrong?
Cassandra replication strategies control where data is placed, but the actual schema (the existence of the table/datacenters/etc) is global.
If you create a keyspace that only lives in one DC, all other DCs will still see the keyspace in their schema, and will even make the directory structure on disk, though no data will be replicated to those hosts.
You didn't specify how you deployed your Cassandra cluster in Kubernetes, but it looks like your nodes in DC-WEST may be configured to say that they are in DC-EAST.
I would check the ConfigMap for the stateful set in DC-WEST. Maybe it has the DC-EAST value for cassandra-rackdc.properties(?). More info on the cassandra-rackdc.properties file here.
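For reference, on the DC-WEST nodes that file would be expected to contain something like the following (the rack name is just an example), and nodetool status should then list those nodes under a "Datacenter: DC-WEST" heading:
dc=DC-WEST
rack=rack1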
As part of a POC, I have a 1-node Cassandra cluster with the system_auth keyspace at RF=1.
I added a second node to this cluster (with empty data/commitlog/saved_caches directories) and I noticed that user credentials are replicated to the new node. Since RF=1 on the existing node, I didn't expect them to replicate to the new node.
Any reason why?
Cassandra Version : 2.1.8
For most system_auth queries, Cassandra uses a consistency level of LOCAL_ONE, and it uses QUORUM for the default cassandra superuser. If both nodes are up, you will be able to see the data and log in without any problem. Also, you added the second node with empty commit log and saved caches, but if you copied the rest of the data from the original node, the data will be there, including system_auth.
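If you want to see which node actually owns a given auth row under RF=1, one quick check (a sketch; the table and row key below match the 2.1 system_auth schema) is:
nodetool getendpoints system_auth users cassandra
That prints the endpoint(s) responsible for the default superuser's row, which is independent of whether a login against either node happens to succeed.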
I am provisioning a new datacenter for an existing cluster. A rather shaky VPN connection prevents me from bootstrapping the new DC with nodetool rebuild. Interestingly, I have a full fresh database snapshot/backup at the same location as the new DC (transferred outside of the VPN). I am now considering the following approach:
1. Make sure my clients are using the old DC.
2. Provision the new nodes in the new DC.
3. ALTER the keyspace to enable replicas on the new DC. This will start replicating all writes from the old DC to the new DC.
4. Before gc_grace_seconds after step 3 above, use sstableloader to stream my backup to the new nodes.
5. As a safety precaution, do a full repair.
Would this work?
Our team also faced a similar situation. We run C* on Amazon EC2.
So first we prepared snapshots of the existing nodes and used them to create the nodes for the other datacenter (to avoid a huge data transfer).
Procedure we followed:
Change the replication strategy for all DC1 servers from SimpleStrategy to NetworkTopologyStrategy {DC1:x, DC2:y}
change cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch
add a DC2 node IP to seeds list
others no need to change
change cassandra-rackdc.properties
dc=DC1
rack=RAC1
restart nodes one at a time.
restart seed node first
Alter the keyspace.
ALTER KEYSPACE keyspace_name WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : x, 'DC2':y };
Do it for all keyspaces in DC1
no need to repair.
verify the system is stable by running queries
Add the DC2 servers as a new datacenter alongside the DC1 datacenter
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
in DC2 db, cassandra.yaml > auto_bootstrap: false
fix seeds, endpoint_snitch, cluster name
Node1 DC1 IP, Node2 DC2 IP as seeds.
recommended endpoint_snitch : GossipingPropertyFileSnitch
cluster name, same as DC1: test-cluster
fix gossiping-property-file-snitch: cassandra-rackdc.properties
dc=DC2
rack=RAC1
bring DC2 nodes up one at a time
seed node first
change the keyspace to NetworkTopologyStrategy {DC1:x, DC2:y}
since the DC2 db is copied from DC1, we should repair instead of rebuild
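The repair mentioned in that last step is just the regular one, run on each DC2 node after the keyspace has been altered, something like (exact flags depend on your Cassandra version):
nodetool repair keyspace_name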
Yes, the approach should work. I've verified it with two knowledgeable people within the Cassandra community. Two pieces that are important to note, however:
The snapshot must be taken after the mutations have started being written to the new datacenter.
The backup must be fully imported within gc_grace_seconds of when the backup was taken. Otherwise you risk zombie data popping up.
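For completeness, the sstableloader step from the plan would look roughly like this, run from a host holding the snapshot (the IP and path are placeholders; sstableloader expects the directory to end in keyspace/table):
sstableloader -d 10.0.1.10 /backups/myks/mytable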