Apache Cassandra Server and Datastax Client - Changing IP Addresses

Apache Cassandra Server and Datastax Client - Changing IP Addresses - node.js

We are using the latest Apache Cassandra database server, and the Datastax Node.js client, running in the cloud.
When our Cassandra servers are rebuilt, they get new IP addresses. Then any running service clients can't find the new servers, the client driver obviously must cache the IP addresses, instead of using DNS.
Is there some way around this problem, other than doing client shutdown and get a new client, in our services when we encounter an error accessing the database?

If you only have 1 server, there is nothing you can do.
Otherwise the node when it rebuilds (if it is a single node in the cluster of many) will advertise the new IP to the cluster and cluster topology is updated. So the peers table will be updated and the driver can register this event (AFAIK).
But why not use private static addresses for your cassandra nodes?

Related

Cassandra create pool with hosts on different ports

I have 10 Cassandra Nodes running on Kubernetes on my server and 1 contact point that expose the service on port 10023.
However, when the datastax driver tries to establish a connection with the other nodes of the cluster it uses the exposed port instead of the default one and i get the following error:
com.datastax.driver.core.ConnectionException: [/10.210.1.53:10023] Pool was closed during initialization
Is there a way to expose one single contact point and have it to communicate with the other nodes on the standard port (9042)?
i checked on the datastax documentation if there is anything related to it but i didn't find much.
this is how i connect to the cluster
Cluster.Builder builder = Cluster.builder();
builder.addContactPoints(address)
.withPort(Integer.valueOf(10023))
.withCredentials(user, password)
.withMaxSchemaAgreementWaitSeconds(600)
.withSocketOptions(
new SocketOptions()
.setConnectTimeoutMillis(Integer.valueOf(timeout))
.setReadTimeoutMillis(Integer.valueOf(timeout))
).build();
Cluster cluster = builder.withoutJMXReporting().build();
Session session = cluster.connect();

After driver contacts first node, it fetches information about cluster, and use this information, and this information includes on what ports Cassandra listens.
To implement what you want to do, you need that Cassandra listened on the corresponding port - this is configured via native_transport_port parameter of the cassandra.yaml.
Also, by default Cassandra driver will try to connect to all nodes in cluster because it uses DCAware/TokenAware load balancing policy. If you want to use only one node, then you need to use WhiteListPolicy instead of default policy. But is not optimal from the performance point of view.
I would suggest to re-think how you expose Cassandra to clients.

Cassandra client connection to multiple addresses

I have a question about Cassandra.
Is it possible to open Cassandra client connection on many IPs?
On my server I have 2 network cards (eth0 and eth1) with IP 10.197.11.21 (eth0) and 192.168.0.45 (eth1) and 127.0.0.1 (lo).
I want my client to connect to Cassandra database with this three IP
in localhost, 10.197.11.21 and 192.168.0.45
For the moment I can choose only 1 IP, what does it do to modify in the file cassandra.yaml ?

You need to set rpc_address: 0.0.0.0 in cassandra.yaml
Note that when you set rpc_address to 0.0.0.0, you also must set broadcast_rpc_address to something other than 0.0.0.0 (e.g. 10.197.11.21).
rpc_address is the address that Cassandra listens on for connections from clients
listen_address is the address that Cassandra listens on for connections from other Cassandra nodes (not client connections)
broadcast_rpc_address is the address that Cassandra broadcasts to clients that are attempting to discover the other nodes in the cluster. When an application first connects to a Cassandra cluster, the cluster sends the application a list of all the nodes in the cluster, and their IP addresses. The IP address sent to the application is the broadcast_ip_address (side-note: cassandra actually sends all IP addresses, this is just the one that it tells the client to connect on). This allows the application to auto-discover all the nodes in the cluster, even if only one IP address was given to the application. This also allows applications to handle situations like a node going offline, or new nodes being added.
Even though your broadcast_rpc_address can only point to one of those two IP addresses, you application can still connect to either one. However, your application will also attempt to connect to other nodes via the broadcast_rpc_addresses sent back by the cluster. You can get around this by providing a full list of the address of every node in the cluster to your application, but the best solution is to build a driver-side address translator.

Will the DataStax Cluster class ever refresh IP address from the hostname given to builder.addContactPoint() if DNS changes?

I've a problem, once set host name, cluster wouldn't update it's IP, even in DNS changes.
Or what is the recommended way of making the application resilient to the fact that more nodes can be added to DNS round robin and old nodes decomissioned ?

I had same thing with Astyanax driver. For me it looks like it works this way:
DNS name is used only when initial connection to cluster is created. At this point driver collects data about cluster nodes. This information is kept in terms of IP addresses already and DNS names are not used any more. Sub-sequential changes in the cluster topology are propagated into the client also using IP addresses.
So, when you add more nodes to the cluster, you actually do not have to assign domain names to them. Just adding a node to the cluster propagates its IP address to the cluster topology table and this info is distributed among all cluster members and smart clients like Java Driver (some third party clients might not have this info and will use only seed nodes to pass queries to).
When you decommission node it works same way. Just all cluster nodes and smart clients receive information that node with a particular IP is not in the cluster any more. It can be even initial seed node.
->Domain name makes sense only for clients which hadn't established cluster connection.
In case you really need to switch IP you have to:
Join node with new IP
Decommission node with old IP
Assign DNS name to new IP

How to check the client connected to which node in the cluster

We want to check the client connections going to which server in the cluster. Is there any way to track it?
I saw some audit logs which has host files and source files.
host:FQDN/5.6.7.8|source:/1.2.3.4
Is this means the client from 1.2.3.4 ipaddress is connected to server 5.6.7.8 to run the query?
5.6.7.8 is the coordinator node here for this session?

Yes,
indeed. Your assumption is totally correct. When you take a look at the "CQL Logging examples" part in the Datastax' Cassandra documentation:
http://www.datastax.com/docs/datastax_enterprise3.0/security/data_auditing
It explains that in your example 5.6.7.8 is the Cassandra node coordinating the requests from the client with IP 1.2.3.4.

Cassandra big cluster configure the client connection

I've been looking to find how to configure a client to connect to a Cassandra cluster.
Independent of clients like Pelops, Hector, etc, what is the best way to connect to a multi-node Cassandra cluster?
Sending the string IP values works fine, but what about growing number cluster nodes in the future? Is maintaining synchronically ALL IP cluster nodes on client part?

Don't know if this answer all your questions but the growing cluster and your knowledge of clients ip are not related.
I have a 5 node cluster but the client(s) only knows 2 ip addresses: the seeds. Since each machine of the cluster knows about the seeds (each cassandra.yaml contains the seeds ip address) if new machine will be added information about new one will come "for free" on the client side.
Imagine a 5 nodes cluster with following ips
192.168.1.1
192.168.1.2 (seed)
192.168.1.3
192.168.1.4 (seed)
192.168.1.5
eg: the node .5 boot -- it will contact the seeds (node 2 and 4) and receive back information about the whole cluster. If you add a new 192.168.1.6 will behave exactly like the .5 and will point to the seeds to know the cluster situation. On the client side you don't have to change anything: you will just know that now you have 6 endpoints instead of 5.
ps: you don't have necessarily to connect to the seeds you can just connect to any node of since after having contacted the seeds each node knows the whole cluster topology
pps: it's your choice how many nodes to put in you "client known hosts", you can also put all 5 but this won't change the fact that if one node will be added you don't need to do anything on the client side
Regards,
Carlo

You will have an easier time letting the client track the state of each node. Smart clients will track endpoint state via the gossipinfo, which passes on new nodes as they appear in the cluster.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string