Joining a node to a cluster - cassandra

I have tried to do the necessary configuration to deploy multiple instances of Cassandra across 2 nodes of a multi-node cluster, but the nodes are having trouble seeing each other. Can someone give me advice on how to join a node to my cluster?

To join a node to a cluster, the following need to match up in the nodes' cassandra.yaml files:
cluster_name
endpoint_snitch
num_tokens
Get your first node running, and make sure the following ports are open on your firewall or internal network:
7000 (gossip)
7001 (if using node-to-node SSL)
7199 (JMX)
9042 (client connections)
On your second node, make sure it has the first node's IP address in its seed list. All of your nodes should share the same seed list as well. Depending on the size of your cluster, you should have two or three seed nodes per data center.
Example:
# seeds is actually a comma-delimited list of addresses.
- seeds: "192.168.0.100,192.168.0.101"
Once your seed nodes are set, fire up your second node and it should join. If it doesn't, check the system.log for errors.

Related

How do I start 2 nodes in an Elasticsearch cluster?

I can't find good information on getting 2 node servers running in an Elasticsearch cluster.
I just want a clear explanation of what commands I have to run or what I have to change in the elasticsearch.yml config file.
Assuming you're not using automatic discovery:
# identical for all nodes
cluster.name: My-Cluster-Name

# specific per node
node.name: [node name]
network.host: [hostname]
http.port: 92xx
transport.tcp.port: 93xx
# (If the nodes run on the same host they need different ports;
#  if they run on two different hosts, you can use the same port numbers.)

discovery.zen.ping.multicast.enabled: false
# List all the nodes in your cluster as host:port
discovery.zen.ping.unicast.hosts: ["server1:9300","server2:9300","server2:9301"]

# Must be set to (number of nodes / 2) + 1 to avoid "split brain",
# i.e. 2 for a two- or three-node cluster
discovery.zen.minimum_master_nodes: 2

# Not mandatory, but if the network has issues you don't want the cluster
# to start panicking too quickly
discovery.zen.fd.ping_timeout: 600s
The important point is that if you don't use multicast discovery, all the nodes need to know the host and port of every other node, and this has to be set in their elasticsearch.yml files. If you only run one node per host you can use the default 9200 and 9300 ports.
Once the nodes are set up, just start them all and check the output. They should log that they have found the other nodes, and you can view the active nodes using the _cat/nodes API:
http://server1:9200/_cat/nodes?v&h=id,ip,port,v,m,d,fdp,r,get.current,n,u
Each node should have its own copy of the Elasticsearch software and will store its own data in the /data folder.

Enabling the Cassandra client port

How do I enable the Cassandra port to connect with a BI application? My setup is a multi-node Cassandra cluster (192.xxx.xx.01, 192.xxx.xx.02, 192.xxx.xx.03). In this scenario, which node will act as the master/coordinator for my application?
I have already worked with listen_address, rpc_address, broadcast_rpc_address and seeds, and I opened both TCP ports 9042 and 9160.
Version: 3.10
Kindly point me in the right direction.
Cassandra uses a masterless architecture; all nodes are equal in Cassandra.
When you connect to a node, that node acts as the coordinator, and any node can be the coordinator.
The coordinator is selected by the driver based on the policy you have set. Common policies are DCAwareRoundRobinPolicy and TokenAwarePolicy.
For DCAwareRoundRobinPolicy, the driver selects the coordinator node based on its round robin policy. See more here: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.html
For TokenAwarePolicy, it selects a coordinator node that has the data being queried - to reduce "hops" and latency. More info: http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/TokenAwarePolicy.html
native_transport_port is 9042 by default, and clients use the native transport by default.
Hence your BI application should connect to the Cassandra hosts on port 9042.
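As a rough sketch using the 2.1-era DataStax Java driver API linked above (the contact-point addresses are placeholders, not the real IPs from the question), the client side would look something like this:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class BiClientSketch {
    public static void main(String[] args) {
        // Placeholder contact points; two or three cluster nodes are enough,
        // the driver discovers the rest and picks a coordinator per request.
        Cluster cluster = Cluster.builder()
                .addContactPoints("192.168.0.1", "192.168.0.2")
                .withPort(9042) // native transport port
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
        Session session = cluster.connect();
        System.out.println("Connected to: " + cluster.getClusterName());
        cluster.close();
    }
}

Any BI tool that uses the native protocol goes through the same port; 9160 is only needed for old Thrift-based clients.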

What happens if I don't put all the C* hosts in DevCenter?

Let's say I have 4 nodes: host1, host2, host3 and host4, but I only add host1 and host2 as contact hosts. What will happen if I perform any operation in DevCenter? Will the action propagate to host3 and host4? Will this cause data corruption?
Here's what will happen:
DevCenter will use a whitelist load balancing policy to connect to the provided nodes
DevCenter uses the DataStax Java driver as the underlying connector, but with the above-mentioned load balancing policy to reduce the time needed to obtain connections (instead of the driver's default load balancing policy, which requires discovering all the nodes in the cluster and initiating connection pools to all of them)
DevCenter will send the request to the nodes in the list you provided
If data is local to these nodes they will take care of the requests. If data is found on the other nodes in the cluster, the nodes used for the connection will act as coordinators (basically they'll relay the requests to the nodes having the data)
Bottom line: there's no risk of data corruption, and the results you get will be exactly the same as if you had connected to all the nodes.
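For illustration only, a whitelist setup equivalent to what is described above can be reproduced with the Java driver's WhiteListPolicy (host1/host2 and port 9042 are assumed to be the contact hosts and native port from the question):

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class WhiteListSketch {
    public static void main(String[] args) {
        // Only host1 and host2 will ever be contacted directly;
        // they act as coordinators for data living on host3/host4.
        List<InetSocketAddress> allowed = Arrays.asList(
                new InetSocketAddress("host1", 9042),
                new InetSocketAddress("host2", 9042));
        Cluster cluster = Cluster.builder()
                .addContactPoints("host1", "host2")
                .withLoadBalancingPolicy(
                        new WhiteListPolicy(new RoundRobinPolicy(), allowed))
                .build();
        System.out.println(cluster.connect()
                .execute("SELECT release_version FROM system.local").one());
        cluster.close();
    }
}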

Will the DataStax Cluster class ever refresh IP address from the hostname given to builder.addContactPoint() if DNS changes?

I have a problem: once the host name is set, the Cluster won't update its IP, even if DNS changes.
Alternatively, what is the recommended way of making the application resilient to the fact that more nodes can be added to the DNS round robin and old nodes decommissioned?
I had the same thing with the Astyanax driver. It looks like it works this way:
The DNS name is used only when the initial connection to the cluster is created. At that point the driver collects data about the cluster nodes. This information is kept as IP addresses, and DNS names are not used any more. Subsequent changes in the cluster topology are propagated to the client using IP addresses as well.
So, when you add more nodes to the cluster, you actually do not have to assign domain names to them. Just adding a node to the cluster propagates its IP address to the cluster topology table, and this info is distributed among all cluster members and smart clients like the Java driver (some third-party clients might not have this info and will only use seed nodes to pass queries to).
When you decommission a node it works the same way: all cluster nodes and smart clients receive the information that the node with a particular IP is no longer in the cluster. It can even be the initial seed node.
The domain name matters only for clients that haven't yet established a cluster connection.
If you really need to switch IPs, you have to:
Join node with new IP
Decommission node with old IP
Assign DNS name to new IP
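A minimal sketch of the behaviour described above, assuming a hypothetical round-robin DNS name cassandra.example.com:

import com.datastax.driver.core.Cluster;

public class DnsContactPointSketch {
    public static void main(String[] args) {
        // The DNS name is resolved once, when the client starts up;
        // after that the driver tracks every node (including the contact point) by IP,
        // so later DNS changes are not picked up by an already-connected Cluster.
        Cluster cluster = Cluster.builder()
                .addContactPoint("cassandra.example.com") // hypothetical DNS name
                .build();
        cluster.init(); // resolve the name and discover the rest of the cluster
        cluster.close();
    }
}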

Configuring the client connection for a big Cassandra cluster

I've been looking for how to configure a client to connect to a Cassandra cluster.
Independent of clients like Pelops, Hector, etc., what is the best way to connect to a multi-node Cassandra cluster?
Passing IP addresses as strings works fine, but what about a growing number of cluster nodes in the future? Does the client have to keep the full list of cluster node IPs in sync?
I don't know if this answers all your questions, but growing the cluster and the list of IPs the client knows about are not related.
I have a 5-node cluster but the client(s) only know 2 IP addresses: the seeds. Since each machine in the cluster knows about the seeds (each cassandra.yaml contains the seeds' IP addresses), if a new machine is added, information about it will come "for free" on the client side.
Imagine a 5 nodes cluster with following ips
192.168.1.1
192.168.1.2 (seed)
192.168.1.3
192.168.1.4 (seed)
192.168.1.5
E.g. when node .5 boots, it contacts the seeds (nodes .2 and .4) and receives information about the whole cluster. If you add a new node 192.168.1.6, it will behave exactly like .5 and ask the seeds for the cluster situation. On the client side you don't have to change anything: you will just see that you now have 6 endpoints instead of 5.
PS: you don't necessarily have to connect to the seeds; you can connect to any node, since after having contacted the seeds each node knows the whole cluster topology.
PPS: it's your choice how many nodes to put in your "client known hosts"; you can also put all 5, but this won't change the fact that if a node is added you don't need to do anything on the client side.
Regards,
Carlo
You will have an easier time letting the client track the state of each node. Smart clients track endpoint state via gossip information, which passes on new nodes as they appear in the cluster.
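As an illustrative sketch with the DataStax Java driver (the addresses are the two seed IPs from the example above), a client given only two contact points still learns about the whole cluster:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class ClusterDiscoverySketch {
    public static void main(String[] args) {
        // Only the two seeds are hard-coded on the client side;
        // the driver discovers the remaining nodes (and any added later) by itself.
        Cluster cluster = Cluster.builder()
                .addContactPoints("192.168.1.2", "192.168.1.4")
                .build();
        for (Host host : cluster.getMetadata().getAllHosts()) {
            System.out.println("Discovered node: " + host.getAddress());
        }
        cluster.close();
    }
}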
