Cassandra client connection to multiple addresses - cassandra

I have a question about Cassandra.
Is it possible to have Cassandra accept client connections on several IP addresses?
My server has two network cards, eth0 (10.197.11.21) and eth1 (192.168.0.45), plus the loopback interface lo (127.0.0.1).
I want my client to be able to connect to the Cassandra database on all three addresses:
localhost, 10.197.11.21 and 192.168.0.45.
At the moment I can only choose one IP. What do I need to change in cassandra.yaml?

You need to set rpc_address: 0.0.0.0 in cassandra.yaml
Note that when you set rpc_address to 0.0.0.0, you also must set broadcast_rpc_address to something other than 0.0.0.0 (e.g. 10.197.11.21).
rpc_address is the address that Cassandra listens on for connections from clients
listen_address is the address that Cassandra listens on for connections from other Cassandra nodes (not client connections)
broadcast_rpc_address is the address that Cassandra broadcasts to clients that are attempting to discover the other nodes in the cluster. When an application first connects to a Cassandra cluster, the cluster sends the application a list of all the nodes in the cluster and their IP addresses. The IP address sent to the application is the broadcast_rpc_address (side note: Cassandra actually sends all IP addresses; this is just the one that it tells the client to connect on). This allows the application to auto-discover all the nodes in the cluster, even if only one IP address was given to the application. This also allows applications to handle situations like a node going offline, or new nodes being added.
Even though your broadcast_rpc_address can only point to one of those two IP addresses, your application can still connect to either one. However, your application will also attempt to connect to other nodes via the broadcast_rpc_address values sent back by the cluster. You can get around this by providing a full list of the addresses of every node in the cluster to your application, but the best solution is to build a driver-side address translator.
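The address-translator idea can be sketched in plain Python as follows. This is only an illustration of the logic, not a driver API: the class name StaticAddressTranslator and the mapping are hypothetical. With the DataStax Python driver, the same logic would presumably live in a subclass of cassandra.policies.AddressTranslator handed to the Cluster object, but check the driver documentation for the exact interface.

```python
class StaticAddressTranslator:
    """Hypothetical helper: maps addresses advertised by the cluster
    to addresses that this particular client can actually reach."""

    def __init__(self, mapping):
        # mapping: advertised address -> address reachable from this client
        self._mapping = dict(mapping)

    def translate(self, addr):
        # Fall back to the advertised address when no mapping is configured.
        return self._mapping.get(addr, addr)


# Assumption for the example: the node advertises its eth0 address
# (10.197.11.21), but this client sits on the 192.168.0.x network
# and must use the node's eth1 address instead.
translator = StaticAddressTranslator({"10.197.11.21": "192.168.0.45"})

print(translator.translate("10.197.11.21"))  # rewritten to 192.168.0.45
print(translator.translate("10.197.11.22"))  # no mapping, returned unchanged
```

A static table like this only suits small, stable clusters; anything that changes often would need the mapping to be derived from a rule instead.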

Related

Apache Cassandra Server and Datastax Client - Changing IP Addresses

We are using the latest Apache Cassandra database server, and the Datastax Node.js client, running in the cloud.
When our Cassandra servers are rebuilt, they get new IP addresses. Running service clients then can't find the new servers; the client driver apparently caches the IP addresses instead of using DNS.
Is there some way around this problem, other than shutting down the client and creating a new one in our services when we encounter an error accessing the database?
If you only have 1 server, there is nothing you can do.
Otherwise, when a node is rebuilt (as a single node in a cluster of many), it will advertise its new IP to the cluster and the cluster topology is updated. The peers table will then be updated, and the driver can register for this event (AFAIK).
But why not use private static addresses for your Cassandra nodes?

What is the difference between broadcast_address and broadcast_rpc_address in cassandra.yaml?

GOAL: I am trying to understand the best way to configure my Cassandra cluster so that several different drivers across several different networking scenarios can communicate with it properly.
PROBLEM/QUESTION: It is not entirely clear to me, after reading the documentation, what the difference is between these two settings, broadcast_address and broadcast_rpc_address, as it pertains to the way a driver connects to and interacts with the cluster. Which one, or which combination of these settings, should I use with my node's accessible network endpoint (a DNS record resolvable by the clients/drivers)?
Here is the documentation for broadcast_address from datastax:
(Default: listen_address) The IP address a node tells other nodes in the cluster to contact it by. It allows public and private address to be different. For example, use the broadcast_address parameter in topologies where not all nodes have access to other nodes by their private IP addresses.
If your Cassandra cluster is deployed across multiple Amazon EC2 regions and you use the EC2MultiRegionSnitch, set the broadcast_address to public IP address of the node and the listen_address to the private IP.
Here is the documentation for broadcast_rpc_address from datastax:
(Default: unset) RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If blank, it is set to the value of the rpc_address or rpc_interface. If rpc_address or rpc_interface is set to 0.0.0.0, this property must be set.
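The rules in the quoted broadcast_rpc_address documentation can be restated as a small Python function. This is a model of the documented behaviour only, not Cassandra's own code; the function name is made up for the example and rpc_interface is left out for brevity.

```python
WILDCARD = "0.0.0.0"


def effective_broadcast_rpc_address(rpc_address, broadcast_rpc_address=None):
    """Return the address a node would advertise to drivers,
    following the rules quoted from the documentation above."""
    if broadcast_rpc_address == WILDCARD:
        # "This cannot be set to 0.0.0.0."
        raise ValueError("broadcast_rpc_address cannot be 0.0.0.0")
    if broadcast_rpc_address:
        return broadcast_rpc_address
    if rpc_address == WILDCARD:
        # "If rpc_address ... is set to 0.0.0.0, this property must be set."
        raise ValueError(
            "broadcast_rpc_address must be set when rpc_address is 0.0.0.0"
        )
    # "If blank, it is set to the value of the rpc_address ..."
    return rpc_address


print(effective_broadcast_rpc_address("10.197.11.21"))
print(effective_broadcast_rpc_address("0.0.0.0", "10.197.11.21"))
```

Both calls print 10.197.11.21: in the first the value falls back to rpc_address, in the second it comes from the explicit setting.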
EDIT: This question pertains to Cassandra version 2.1, and may not be relevant in the future.
One of the users of #cassandra on freenode was kind enough to provide an answer to this question:
The rpc family of settings pertains to drivers that use the Thrift protocol to communicate with Cassandra. For those drivers that use the native transport, the broadcast_address will be reported and used.
My test case confirms this.

Will the DataStax Cluster class ever refresh IP address from the hostname given to builder.addContactPoint() if DNS changes?

I have a problem: once the host name is set, the cluster won't update its IP, even when DNS changes.
Or what is the recommended way of making the application resilient to the fact that more nodes can be added to the DNS round robin and old nodes decommissioned?
I had the same thing with the Astyanax driver. To me it looks like it works this way:
The DNS name is used only when the initial connection to the cluster is created. At this point the driver collects data about the cluster nodes. This information is already kept in terms of IP addresses, and DNS names are not used any more. Subsequent changes in the cluster topology are also propagated to the client using IP addresses.
So, when you add more nodes to the cluster, you do not actually have to assign domain names to them. Just adding a node to the cluster propagates its IP address to the cluster topology table, and this info is distributed among all cluster members and smart clients like the Java Driver (some third-party clients might not have this info and will use only seed nodes to pass queries to).
When you decommission a node it works the same way. All cluster nodes and smart clients simply receive information that the node with a particular IP is no longer in the cluster. It can even be the initial seed node.
So a domain name matters only for clients that have not yet established a cluster connection.
In case you really need to switch IP you have to:
Join node with new IP
Decommission node with old IP
Assign DNS name to new IP
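The point that the DNS name only matters at connection time can be illustrated with a short sketch: a client that starts after the steps above resolves the name once and gets whatever addresses DNS holds at that moment. The function name, the host name and the stub resolver below are hypothetical; 9042 is the default native transport port. Real drivers do their own resolution internally, so treat this as an illustration rather than driver code.

```python
import socket


def resolve_contact_points(hostname, port=9042, resolver=socket.getaddrinfo):
    """Resolve a DNS name once and return the sorted, de-duplicated
    IP addresses behind it (what a freshly started client would see)."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def stub_resolver(hostname, port, proto=0):
    # Stand-in for real DNS so the example is deterministic:
    # pretend the round-robin record currently lists two nodes.
    return [
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("10.0.0.2", port)),
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("10.0.0.1", port)),
    ]


contact_points = resolve_contact_points("cassandra.example.internal",
                                        resolver=stub_resolver)
print(contact_points)
```

Dropping the resolver argument makes the function use the system resolver; an already connected client is unaffected either way, since it tracks nodes by IP.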

Connecting to BAM internal cassandra

I am now trying BAM 2.3.0 and I want to know how to connect to BAM's internal Cassandra from a different server. Is it possible, or is it tightly coupled?
No, it is not tightly coupled. As when setting up a standalone Cassandra cluster, you must do the configuration, but since you are not creating a cluster and only want access from an external server, there is no need to give seed addresses. Just configure the listen and rpc addresses. The cassandra.yaml file is located in BAM_HOME/repository/conf/etc.
In cassandra.yaml, change listen_address and rpc_address to your IP address. If you put 127.0.0.1, Cassandra will only listen for connections coming from localhost, so you cannot access it from outside.

Cassandra big cluster configure the client connection

I've been looking to find how to configure a client to connect to a Cassandra cluster.
Independent of clients like Pelops, Hector, etc, what is the best way to connect to a multi-node Cassandra cluster?
Passing the IP addresses as strings works fine, but what about a growing number of cluster nodes in the future? Does the client have to keep ALL cluster node IPs in sync?
I don't know if this answers all your questions, but the growth of the cluster and the client's knowledge of IPs are not related.
I have a 5-node cluster but the client(s) only know 2 IP addresses: the seeds. Since each machine in the cluster knows about the seeds (each cassandra.yaml contains the seeds' IP addresses), if a new machine is added, information about it will come "for free" on the client side.
Imagine a 5-node cluster with the following IPs:
192.168.1.1
192.168.1.2 (seed)
192.168.1.3
192.168.1.4 (seed)
192.168.1.5
E.g. when node .5 boots, it will contact the seeds (nodes .2 and .4) and receive back information about the whole cluster. If you add a new node 192.168.1.6, it will behave exactly like .5 and will contact the seeds to learn the cluster state. On the client side you don't have to change anything: you will simply know that you now have 6 endpoints instead of 5.
PS: you don't necessarily have to connect to the seeds; you can connect to any node, since after having contacted the seeds each node knows the whole cluster topology.
PPS: it's your choice how many nodes to put in your "client known hosts". You can also put all 5, but this won't change the fact that if a node is added you don't need to do anything on the client side.
Regards,
Carlo
You will have an easier time letting the client track the state of each node. Smart clients will track endpoint state via the gossipinfo, which passes on new nodes as they appear in the cluster.
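The discovery behaviour described in this thread can be mimicked with a toy simulation. Nothing here talks to a real cluster: discover_cluster and the topology dictionary are made up for the example, and each "node" is simply an entry listing the peers it knows about.

```python
def discover_cluster(contact_points, topology_by_node):
    """Ask the configured contact points in order and return the full
    node list learned from the first one that answers."""
    for node in contact_points:
        peers = topology_by_node.get(node)
        if peers is not None:  # this contact point is up and answered
            return sorted(peers | {node})
    raise ConnectionError("no contact point reachable")


# The 5-node cluster from the example above, plus the newly added .6.
cluster = {"192.168.1.%d" % i for i in range(1, 7)}
# Every live node knows all of its peers.
topology = {ip: cluster - {ip} for ip in cluster}

# The client configuration still lists only the two seeds.
seeds = ["192.168.1.2", "192.168.1.4"]
print(discover_cluster(seeds, topology))
```

The client configuration never mentions 192.168.1.6, yet the returned list contains all six nodes, and if the first seed is down the second one yields the same result.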
