cassandra: how to connect cassandra cluster using lb? - cassandra

I am presently trying to setup cassandra cluster (4 nodes with 2 seeds) in our production env.
When i connect with comma seperated host name and port, it is working fine.
cluster = HFactory.getOrCreateCluster("Test Cluster", "host1:9160,host2:9160,host3:9160,host4:9160");
But when I configured a cluster name at the lb connecting to the individaul nodes and configured the same in Hector thrift client. But i got the below eception,
cluster = HFactory.getOrCreateCluster("Test Cluster", "lbname");
SEVERE: me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:393)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.service.AbstractCluster.describeKeyspace(AbstractCluster.java:199)
at com.july.storage.cassandra.util.CassandraDBUtil.getDb(CassandraDBUtil.java:107)
at com.july.storage.cassandra.util.CassandraDBUtil.hasTable(CassandraDBUtil.java:91)
at com.july.storage.cassandra.action.CassandraHandler.getCall(CassandraHandler.java:65)
at com.july.storage.service.StorageService.GET(StorageService.java:58)
at com.july.storage.cassandra.action.CassandraHandler.main(CassandraHandler.java:571)

Don't use a load balancer in front of Cassandra. Let your client connect to all of the nodes. A load balancer will just be a single point of failure, and add unnecessary latency.

Related

How do I enable cross-datacenter failover programatically in the Cassandra Java driver v4?

In datastax driver 3.x,we've DCAwareRoundRobin policy,which tries to connect to remote nodes if nodes in local datacenter fails.In the datastax drvier 4.x,we donot have that policy and confines to local-only.But,in the datastax docs,it's mentioned as:
Cross-datacenter failover is enabled with the following configuration option:
datastax-java-driver.advanced.load-balancing-policy.dc-failover {
max-nodes-per-remote-dc = 2
}
The driver will then attempt to open connections to nodes in remote datacenter.But,in the driver,we specify only a single datacenter to connect to as below:
CqlSession session = CqlSession.builder()
.addContactPoint(new InetSocketAddress("1.2.3.4", 9042))
.addContactPoint(new InetSocketAddress("5.6.7.8", 9042))
.withLocalDatacenter("datacenter1")
.build();
How the connection to remote datacenter is handled?Please help..
The quick answer is you don't do it programatically -- the Java driver does it for you after you've enabled advanced.load-balancing-policy.dc-failover.
The contact points are just the initial hosts the driver "contacts" to discover the cluster topology. After it has collected metadata about the cluster, it knows about nodes in remote DCs.
Since you've configured max-nodes-per-remote-dc = 2, the driver will add 2 nodes from each remote DC to the end of the query plan. The nodes in the local DC will be listed first then followed by the remote nodes. If the driver can't contact the nodes in the local DC, then it will start contacting the remote nodes in the query plan, one node at a time until it runs out of nodes to contact.
But I have to reiterate what I said in your other question, we do not recommend enabling DC-failover. For anyone else coming across this answer, you've been warned. Cheers!

Cassandra create pool with hosts on different ports

I have 10 Cassandra Nodes running on Kubernetes on my server and 1 contact point that expose the service on port 10023.
However, when the datastax driver tries to establish a connection with the other nodes of the cluster it uses the exposed port instead of the default one and i get the following error:
com.datastax.driver.core.ConnectionException: [/10.210.1.53:10023] Pool was closed during initialization
Is there a way to expose one single contact point and have it to communicate with the other nodes on the standard port (9042)?
i checked on the datastax documentation if there is anything related to it but i didn't find much.
this is how i connect to the cluster
Cluster.Builder builder = Cluster.builder();
builder.addContactPoints(address)
.withPort(Integer.valueOf(10023))
.withCredentials(user, password)
.withMaxSchemaAgreementWaitSeconds(600)
.withSocketOptions(
new SocketOptions()
.setConnectTimeoutMillis(Integer.valueOf(timeout))
.setReadTimeoutMillis(Integer.valueOf(timeout))
).build();
Cluster cluster = builder.withoutJMXReporting().build();
Session session = cluster.connect();
After driver contacts first node, it fetches information about cluster, and use this information, and this information includes on what ports Cassandra listens.
To implement what you want to do, you need that Cassandra listened on the corresponding port - this is configured via native_transport_port parameter of the cassandra.yaml.
Also, by default Cassandra driver will try to connect to all nodes in cluster because it uses DCAware/TokenAware load balancing policy. If you want to use only one node, then you need to use WhiteListPolicy instead of default policy. But is not optimal from the performance point of view.
I would suggest to re-think how you expose Cassandra to clients.

Apache Cassandra Server and Datastax Client - Changing IP Addresses

We are using the latest Apache Cassandra database server, and the Datastax Node.js client, running in the cloud.
When our Cassandra servers are rebuilt, they get new IP addresses. Then any running service clients can't find the new servers, the client driver obviously must cache the IP addresses, instead of using DNS.
Is there some way around this problem, other than doing client shutdown and get a new client, in our services when we encounter an error accessing the database?
If you only have 1 server, there is nothing you can do.
Otherwise the node when it rebuilds (if it is a single node in the cluster of many) will advertise the new IP to the cluster and cluster topology is updated. So the peers table will be updated and the driver can register this event (AFAIK).
But why not use private static addresses for your cassandra nodes?

Apache Spark Error creating pool to EC2 Cassandra

My configuration are:
1 Spark machine on EC2: c3.2xlarge.
Communicating with 4 nodes of Cassandra on EC2.
I am getting the following error:
16/08/03 22:41:10 ERROR Session: Error creating pool to /XX.XX.XXX.XX:9042
com.datastax.driver.core.TransportException: [/XX.XX.XXX.XX:9042] Cannot connect
The XX are the public IP of the EC2 cassandra.
However inside my spark configuration: Im telling spark to use the seeder internal IP node, which then the spark-connector driver receive the information from Cassandra the public IP.
How my IT set the clusters up, I am assuming the following:
ERROR Session: Error creating pool to /127.0.0.1:9042
However I don't want to have my clusters connect via public IP and open up the firewall. I would like to have it stay to the internal IP of the cluster.
Is there a way to do this Spark code level wise or cassandra.yml configuration wise?

What happen if I didn't put all the C* hosts in DevCenter?

Let's say I have 4 nodes: host1, host2, host3 and host4. However I only add host1 and host2 as Contact hosts. What would happen if I perform any operation in DevCenter? Will the action propagate to host3 and host4? Will this cause data corruption?
Here's what will happen:
DevCenter will use the Whitelist load balancing policy 1 to connect to the provided nodes
While DevCenter uses the DataStax Java driver as the underlying connector, it does use the above mentioned load balancing policy to reduce the time needed to obtain connections (instead of the default driver's load balancing policy which requires discovering all the nodes in the cluster and initiating connection pools to all those)
DevCenter will send the request to the nodes in the list you provided
If data is local to these nodes they will take care of the requests. If data is found on the other nodes in the cluster, the nodes used for the connection will act as coordinators (basically they'll relay the requests to the nodes having the data)
Bottom line there's no risk of data corruption and the results you get will be exactly the same as for connecting to all the nodes.

Resources