How to connect to a specific Cassandra node only

I have two Docker Cassandra containers acting as node1 and node2 in the same data center.
My aim is to have my Java application always connect to node1, while my ad-hoc manual queries should be served by node2 only (there should not be any inter-node communication for data).
Normally I can execute read/write queries against container1 or container2 using cqlsh. If I fire some queries against container1 using cqlsh, will it always return the data from that same container (node1), or may it also route to the other node internally?
I also know that the coordinator node will talk to peer nodes for data requests. What happens in the case of RF=2 and a 2-node cluster? Will the coordinator node itself be able to serve the data?
Here, RF=2, nodes=2, consistency=ONE.

I have set up clusters before to separate OLTP from OLAP. The way to do it is to separate your nodes into different logical data centers.
So node1 should have its local data center set in cassandra-rackdc.properties to "dc1":
dc=dc1
rack=r1
Likewise, node2 should be put into its own data center, "dc2":
dc=dc2
rack=ra
Then your keyspace definition will look something like this:
CREATE KEYSPACE stackoverflow
WITH REPLICATION={'class':'NetworkTopologyStrategy','dc1':'1','dc2':'1'};
"My aim is to have my Java application always connect to node1"
In your Java code, you should specify "dc1" as your default data center, as I do in this example:
String dataCenter = "dc1";
Builder builder = Cluster.builder()
    .addContactPoints(nodes)  // "nodes" = contact point addresses, defined elsewhere
    .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .withLoadBalancingPolicy(new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy.Builder()
            .withLocalDc(dataCenter).build()))
    .withPoolingOptions(options);  // "options" = PoolingOptions, defined elsewhere
That will make your Java app "sticky" to all nodes in data center "dc1," or just node1 in this case. Likewise, when you cqlsh into node2, your ad-hoc queries should be "sticky" to all nodes in "dc2."
Note: In this configuration, you do not have high-availability. If node1 goes down, your app will not jump over to node2.
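If you want to double-check which nodes the driver actually places in "dc1", a minimal sketch using the driver 3.x metadata API could look like this (the contact point address is a placeholder, not a value from the question):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class ListDatacenters {
    public static void main(String[] args) {
        // Placeholder contact point; replace with node1's address.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build()) {
            // Print the data center and rack reported for every host in the cluster.
            for (Host host : cluster.getMetadata().getAllHosts()) {
                System.out.println(host.getAddress() + " -> dc=" + host.getDatacenter()
                        + ", rack=" + host.getRack());
            }
        }
    }
}

Hosts listed under dc=dc2 will only be used by the app if you change the load balancing policy; with the DC-aware setup above, the app will not route queries to them.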

Related

Cassandra: how do nodes behave when a new node joins the ring?

I'm a beginner with Cassandra. I want to understand how the two nodes (the streaming node and the joining node) behave when a new node joins an existing cluster. Can they still provide normal service to the outside?
Assuming service continues normally: say the joining node is nodeA and the node it fetches data from is nodeB, i.e. nodeA streams data from nodeB. Suppose the token range C is being transmitted from nodeB to nodeA, and at that moment new data falling into range C is inserted into the cluster. Is the new data written to nodeA or nodeB?
I'm using the DataStax Community Edition of Cassandra, version 3.11.3.
Thanks!
Sun, your question is a bit confusing, but what I make of it is that you want to understand the process of adding a new node to an existing cluster.
Adding a new node to an existing cluster requires setting cassandra.yaml properties for node identification and communication.
Set the following properties in cassandra.yaml and, depending on the snitch, in the cassandra-topology.properties or cassandra-rackdc.properties configuration files (an example cassandra.yaml fragment follows this list):
auto_bootstrap - This property is not listed in the default cassandra.yaml configuration file, but it might have been added and set to false by other operations. If it is not defined in cassandra.yaml, Cassandra uses true as the default value. For this operation, search for this property in the cassandra.yaml file; if it is present, set it to true or delete it.
cluster_name - The name of the cluster the new node is joining.
listen_address/broadcast_address - Can usually be left blank. Otherwise, use the IP address or host name that other Cassandra nodes use to connect to the new node.
endpoint_snitch - The snitch Cassandra uses for locating nodes and routing requests.
num_tokens - The number of vnodes to assign to the node. If the hardware capabilities vary among the nodes in your cluster, you can assign a proportional number of vnodes to the larger machines.
seeds - Determines which nodes the new node contacts to learn about the cluster and establish the gossip process. Make sure that the -seeds list includes the address of at least one node in the existing cluster.
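For illustration, the relevant part of the new node's cassandra.yaml might look like the fragment below. All values here (cluster name, addresses) are placeholders, not values taken from the question:

cluster_name: 'MyCluster'            # must match the existing cluster exactly
num_tokens: 256                      # vnode count; adjust for hardware capacity
endpoint_snitch: GossipingPropertyFileSnitch
listen_address: 10.0.0.5             # the new node's own address (or leave blank)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"   # at least one existing node in the cluster
# auto_bootstrap defaults to true when omitted; remove any "auto_bootstrap: false" line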
When a new node joins a cluster with this topology defined, the seed nodes start gossiping with it; during that time the new node does not communicate with clients directly. Once the gossip completes, the new node is ready to take the actual data load.
Hope this helps in understanding the process.

How to make sure your Cassandra client is only aware of the local data center nodes?

I have multiple data centers with many nodes, and I need the client to connect to the local (nearest) data center first; if the local data center is down, it should connect to a remote data center.
I have added two contact points from each data center.
How will the client recognize the nearest data center?
I'm using Java driver 3.0.0 on the client side.
From the DCAwareRoundRobinPolicy docs:
This policy queries nodes of the local data-center in a round-robin fashion; optionally, it can also try a configurable number of hosts in remote data centers if all local hosts failed.
Call withLocalDc to specify the name of your local datacenter. You can also leave it out, and the driver will use the datacenter of the first contact point that was reached at initialization. However, remember that the driver shuffles the initial list of contact points, so this assumes that all contact points are in the local datacenter. In general, providing the datacenter name explicitly is a safer option.
Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("myLocalDC")
            .withUsedHostsPerRemoteDc(2)
            .allowRemoteDCsForLocalConsistencyLevel()
            .build()
    ).build();
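Two hedged notes on how this behaves, based on the policy description above: the hosts allowed by withUsedHostsPerRemoteDc are only tried once all local hosts have failed, and allowRemoteDCsForLocalConsistencyLevel only matters if you also use a DC-local consistency level such as LOCAL_ONE or LOCAL_QUORUM. A minimal sketch combining the two (same driver imports as above, plus QueryOptions and ConsistencyLevel from com.datastax.driver.core; the DC name and address are placeholders):

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)) // DC-local consistency level
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("myLocalDC")                  // prefer this DC
            .withUsedHostsPerRemoteDc(2)               // fall back to 2 hosts per remote DC
            .allowRemoteDCsForLocalConsistencyLevel()  // let LOCAL_* levels use those remote hosts
            .build())
    .build();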

Cassandra 3-node configuration

I am new to Cassandra and use the latest Cassandra, 3.10. I have 3 nodes to join into one cluster. The cluster name, Test Cluster, is the same on all three nodes. All use the same data center dc1, rack rack1, and the GossipingPropertyFileSnitch. They are configured as follows:
Node A:
-seeds: "A,B,C addresses"
listen_address & rpc_address set to node A's IP address
Node B:
-seeds: "A,B,C addresses"
listen_address & rpc_address set to node B's IP address
Node C:
-seeds: "A,B,C addresses"
listen_address & rpc_address set to node C's IP address
What I want to achieve is listed here:
i) If node A fails, data should be retrieved from nodes B and C.
ii) If any one or two nodes fail, data should be retrieved from the remaining node(s). How do I configure the nodes for this?
I have used SimpleStrategy with a replication factor of 3.
When a node fails, will data be retrieved from another node, or is my seeds configuration mistaken? Please briefly explain what to do.
Answering your questions:
If node A goes down, you want to fetch data from nodes B and C.
If one or two nodes go down, you want to fetch data from the remaining node(s).
To achieve the above, the replication factor you have configured is enough to handle node failures. What is wrong in your configuration is having all of your nodes be seed nodes.
A seed node is used to bootstrap other nodes, so usually the first node started in a data center is a seed node. Suppose you have 2 data centers; then you should have 2 seed nodes, as mentioned in the DataStax docs below:
http://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html
As per your last comment, you mentioned "schema version mismatch detected", which means your nodes do not all agree on the same schema version. Check the schema using nodetool while all your nodes are running:
nodetool describecluster
This shows each node's schema version. All nodes should report the same schema version.
If any node does not have the same version, restart that node until the schema versions match.
Once you fix this schema error, you will be able to create the keyspace.
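Coming back to the failure-tolerance part of the question: with RF=3, the consistency level your client requests determines how many replicas may be down. A minimal sketch with the 3.x Java driver (the contact point, keyspace, and table names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class FailureToleranceDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // placeholder: any reachable node
                .build();
             Session session = cluster.connect()) {

            // RF=3, QUORUM needs 2 of 3 replicas -> tolerates 1 node down.
            session.execute(new SimpleStatement("SELECT * FROM my_ks.my_table WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM));

            // RF=3, ONE needs a single replica -> tolerates 2 nodes down.
            session.execute(new SimpleStatement("SELECT * FROM my_ks.my_table WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.ONE));
        }
    }
}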

Bootstrapping a datacenter using a snapshot

I am provisioning a new datacenter for an existing cluster. A rather shaky VPN connection is hindering me from making a nodetool rebuild bootstrap of the new DC. Interestingly, I have a full fresh database snapshot/backup at the same location as the new DC (transferred outside of the VPN). I am now considering the following approach:
1) Make sure my clients are using the old DC.
2) Provision the new nodes in the new DC.
3) ALTER the keyspace to enable replicas on the new DC. This will start replicating all writes from the old DC to the new DC.
4) Before gc_grace_seconds after operation 3) above, use sstableloader to stream my backup to the new nodes.
5) As a safety precaution, do a full repair.
Would this work?
Our team also faced a similar situation. We run C* on Amazon EC2.
So first we prepared snapshots of the existing nodes and used them to create the nodes for the other datacenter (to avoid a huge data transfer).
Procedure we followed:
Change the replication strategy for all DC1 keyspaces from SimpleStrategy to NetworkTopologyStrategy {DC1:x, DC2:y}
change cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch
add a DC2 node IP to seeds list
nothing else needs to change
change cassandra-rackdc.properties
dc=DC1
rack=RAC1
restart nodes one at a time.
restart seed node first
Alter the keyspace.
ALTER KEYSPACE keyspace_name WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : x, 'DC2':y };
Do it for all keyspace in DC1
no need to repair.
verify that the system is stable by running queries
Add the DC2 servers as a new data center to the cluster
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
in DC2 db, cassandra.yaml > auto_bootstrap: false
fix seeds, endpoint_snitch, cluster name
Node1 DC1 IP, Node2 DC2 IP as seeds.
recommended endpoint_snitch : GossipingPropertyFileSnitch
cluster name, same as DC1: test-cluster
fix the GossipingPropertyFileSnitch settings in cassandra-rackdc.properties
dc=DC2
rack=RAC1
bring DC2 nodes up one at a time
seed node first
change keyspace to networkTopologyStrategy {DC1:x, DC2:y}
since the DC2 db is copied from DC1, we should repair instead of rebuild
Yes, the approach should work. I've verified it with two knowledgeable people within the Cassandra community. Two pieces that are important to note, however:
The snapshot must be taken after the mutations have started being written to the new datacenter.
The backup must be fully imported within gc_grace_seconds of when the backup was taken; otherwise you risk zombie data popping up. For example, with the default gc_grace_seconds of 864000 seconds (10 days), the sstableloader import must finish within 10 days of the snapshot.

Cassandra seed nodes and clients connecting to nodes

I'm a little confused about Cassandra seed nodes and how clients are meant to connect to the cluster. I can't seem to find this bit of information in the documentation.
Do the clients only hold a list of the seed nodes, with each node delegating a new host for the client to connect to? Or are seed nodes really only for node-to-node discovery, rather than being special nodes for clients?
Should each client use a small sample of random nodes in the DC to connect to?
Or, should each client use all the nodes in the DC?
Answering my own question:
Seeds
From the FAQ:
Seeds are used during startup to discover the cluster.
Also from the DataStax documentation on "Gossip":
The seed node designation has no purpose other than bootstrapping the gossip process
for new nodes joining the cluster. Seed nodes are not a single
point of failure, nor do they have any other special purpose in
cluster operations beyond the bootstrapping of nodes.
From these details it seems that a seed is nothing special to clients.
Clients
From the DataStax documentation on client requests:
All nodes in Cassandra are peers. A client read or write request can
go to any node in the cluster. When a client connects to a node and
issues a read or write request, that node serves as the coordinator
for that particular client operation.
The job of the coordinator is to act as a proxy between the client
application and the nodes (or replicas) that own the data being
requested. The coordinator determines which nodes in the ring should
get the request based on the cluster configured partitioner and
replica placement strategy.
I gather that the pool of nodes that a client connects to can just be a handful of (random?) nodes in the DC to allow for potential failures.
Seed nodes serve two purposes.
1. They act as a place for new nodes to announce themselves to a cluster. So, without at least one live seed node, no new nodes can join the cluster, because they have no idea how to contact non-seed nodes to get the cluster status.
2. Seed nodes act as gossip hot spots. Since nodes gossip more often with seeds than with non-seeds, the seeds tend to have more current information, and therefore the whole cluster has more current information. This is the reason you should not make all nodes seeds. Similarly, this is also why all nodes in a given data center should have the same list of seed nodes in their cassandra.yaml file. Typically, 3 seed nodes per data center is ideal.
The Cassandra client contact points simply provide the cluster topology to the client, after which the client may connect to any node in the cluster. As such, they are similar to seed nodes, and it makes sense to use the same nodes for both seeds and client contacts. However, you can safely configure as many client contact points as you like. The only other consideration is that the first node a client contacts sets its data center affinity, so you may wish to order your contact points to prefer a given data center.
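As a hedged illustration of that last point with the 3.x Java driver (the addresses and data center name are placeholders): if you pin the local datacenter explicitly, the order of the contact points no longer determines the DC affinity.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ContactPointsExample {
    public static void main(String[] args) {
        // A handful of contact points from the local DC (placeholder addresses).
        Cluster cluster = Cluster.builder()
                .addContactPoints("10.0.1.1", "10.0.1.2", "10.0.1.3")
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("DC1")  // explicit local DC, independent of contact-point order
                                .build()))
                .build();
        System.out.println("Connected to cluster: " + cluster.getMetadata().getClusterName());
        cluster.close();
    }
}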
For more details about contact points, see this question: Cassandra Java driver: how many contact points is reasonable?
Your answer is right. The only thing I would add is that it's recommended to use the same seed list (i.e. in your cassandra.yaml) across the cluster, as a "best practices" sort of thing. It helps gossip traffic propagate at nice, regular rates, since seeds are treated (very minimally) differently by the gossip code (see https://cwiki.apache.org/confluence/display/CASSANDRA2/ArchitectureGossip).
