Cassandra - Connection and Data Insertion

Below is my basic program:
public static void main(String[] args) {
    Cluster cluster;
    Session session;
    cluster = Cluster.builder().addContactPoint("192.168.20.131").withPort(9042).build();
    session = cluster.connect();
    System.out.println("Connection Established");
    session.close();
    cluster.close();
}
Now, I have a 7-node cluster with a Cassandra instance running on all 7 nodes. Assuming the IP address above is my entry point, how does this actually work? Suppose some other user runs a program against any other of the 7 Cassandra nodes; which IP should be entered as the contact point? Or do I have to add all 7 nodes' IP addresses, comma-separated, in my main() method?

As described here:
The driver discovers the nodes that constitute a cluster by querying
the contact points used in building the cluster object. After this it
is up to the cluster's load balancing policy to keep track of node
events (that is add, down, remove, or up) by its implementation of the
Host.StateListener interface.
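In other words, you do not need to list all 7 nodes; one reachable contact point is enough for discovery, but it is common practice to supply two or three so that startup still succeeds if the first one happens to be down. A minimal sketch (the second and third IPs are placeholders for other nodes in your cluster):
Cluster cluster = Cluster.builder()
        .addContactPoints("192.168.20.131", "192.168.20.132", "192.168.20.133")
        .withPort(9042)
        .build();
Session session = cluster.connect();
// After connecting, the driver discovers the remaining nodes from the
// cluster metadata, so queries can be routed to all 7 nodes.
session.close();
cluster.close();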

Related

Will using DNS failover work as a Multi-DC failover strategy?

If I have a multi-DC cluster, DC1 and DC2, where DC2 is used only for failover, and in the driver on the client side I define the contact points using domain names (foo1.net, foo2.net, and foo3.net). I have foo* pointing to DC1, and if I ever detect an error with DC1, I will change the DNS so that foo* points to DC2.
This approach seems to work on paper, but will it actually work? Are there any issues with this approach?
In the case of the DataStax Java Driver 3.x this will not work since DNS is only evaluated at the beginning of Cluster instantiation.
The contact points provided are resolved using DNS via InetAddress.getAllByName in Cluster.Builder.addContactPoint:
public Builder addContactPoint(String address) {
    // We explicitly check for nulls because InetAddress.getByName() will happily
    // accept it and use localhost (while a null here almost certainly means a user
    // error, not "connect to localhost")
    if (address == null)
        throw new NullPointerException();
    try {
        addContactPoints(InetAddress.getAllByName(address));
        return this;
    } catch (UnknownHostException e) {
        throw new IllegalArgumentException("Failed to add contact point: " + address, e);
    }
}
If DNS is changed during the lifecycle of the Cluster, the driver will not be aware of this unless you construct a new Cluster.Builder instance and create a new Cluster from it.
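If you do want the application itself to react to a DNS change, the only option with this driver version is to rebuild the Cluster. A minimal sketch, assuming cluster and session are your existing references and reusing the foo* hostnames from the question:
// Closing the old Cluster and building a new one forces the driver to
// call InetAddress.getAllByName on the contact points again, picking up
// the updated DNS records.
session.close();
cluster.close();
cluster = Cluster.builder()
        .addContactPoint("foo1.net")
        .addContactPoint("foo2.net")
        .addContactPoint("foo3.net")
        .build();
session = cluster.connect();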
I prefer a design that pushes Data Center failover outside the scope of your application and into a higher level of your architecture. Instead of making your client application responsible for failing over, you should run instances of your clients colocated in each C* data center. Your application load balancer/router/DNS could direct traffic to instances of your application in other data centers when data centers become unavailable.

Datastax Cassandra C/C++ driver cass_cluster_set_blacklist_filtering functionality

The DataStax C/C++ driver has blacklist filtering functionality as part of its load balancing controls.
https://docs.datastax.com/en/developer/cpp-driver/2.5/topics/configuration/
Correct me if I am missing something, but my understanding is that a CQL client can't connect to blacklisted hosts.
I'm using C/C++ driver v2.5 with the code block below, trying to connect to a multinode cluster:
CassCluster* cluster = cass_cluster_new();
CassSession* session = cass_session_new();
const char* hosts = "192.168.57.101";
cass_cluster_set_contact_points(cluster, hosts);
cass_cluster_set_blacklist_filtering(cluster, hosts);
CassFuture* connect_future = cass_session_connect(session, cluster);
In this code block, the host to which the CQL client is trying to connect is set as blacklisted. However, the CQL client seems to connect to this host and executes queries anyway. Is there something wrong with the code block above? If not, is this the expected behavior? Does it behave differently because it is a multinode cluster and establishes connections to the other peers?
Any help will be appreciated.
Thank you in advance.
Since you are supplying only one contact point, that IP address is used to establish the control connection into the cluster. Once that control connection is established and the peers table has been read to determine the other nodes available in the cluster, connections are made to those other nodes. From then on, all queries are routed to those other nodes and not to your initial, blacklisted contact point; however, the connection to the initial contact point remains, as it is the control connection into the cluster.
To get a better look at what is going on inside the driver, you can enable its logging. Here is an example that logs to the console:
void on_log(const CassLogMessage* message, void* data) {
    fprintf(stderr, "%u.%03u [%s] (%s:%d:%s): %s\n",
            (unsigned int) (message->time_ms / 1000),
            (unsigned int) (message->time_ms % 1000),
            cass_log_level_string(message->severity),
            message->file, message->line, message->function,
            message->message);
}

/* Log configuration *MUST* be done before any other driver call */
cass_log_set_level(CASS_LOG_TRACE);
cass_log_set_callback(on_log, NULL);
To avoid the extra connection to a node that will be blacklisted, you can supply a contact point that is not among the nodes you intend to blacklist.

How to make workers to query only local cassandra nodes?

Suppose I have several machines, each with a Spark worker and a Cassandra node installed. Is it possible to require each Spark worker to query only its local Cassandra node (on the same machine), so that no network operations are involved when I do joinWithCassandraTable after repartitionByCassandraReplica using the spark-cassandra-connector, and each Spark worker fetches data from its local storage?
Inside the Spark-Cassandra connector, the LocalNodeFirstLoadBalancingPolicy handles this work. It prefers local nodes first, then checks for nodes in the same DC. Specifically, local nodes are determined using java.net.NetworkInterface by looking for an address in the host list that matches one of the local addresses, as follows:
private val localAddresses =
  NetworkInterface.getNetworkInterfaces.flatMap(_.getInetAddresses).toSet

/** Returns true if given host is local host */
def isLocalHost(host: Host): Boolean = {
  val hostAddress = host.getAddress
  hostAddress.isLoopbackAddress || localAddresses.contains(hostAddress)
}
This logic is used in the creation of the query plan, which returns a list of candidate hosts for the query. Regardless of the plan type (token aware or unaware), the first host in the list is always the local host if it exists.
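To see which of your machine's addresses that check would match, you can reproduce the same logic in plain Java. This is a sketch of the idea, not the connector's actual code, and the IP is a placeholder:
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LocalHostCheck {
    public static void main(String[] args) throws Exception {
        // Collect every address bound to a local network interface,
        // mirroring what LocalNodeFirstLoadBalancingPolicy does.
        Set<InetAddress> localAddresses = new HashSet<InetAddress>();
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            localAddresses.addAll(Collections.list(nic.getInetAddresses()));
        }
        // A host counts as local if it is a loopback address or one of ours.
        InetAddress candidate = InetAddress.getByName("192.168.20.131");
        boolean local = candidate.isLoopbackAddress() || localAddresses.contains(candidate);
        System.out.println(candidate + " local? " + local);
    }
}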

Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?

I have a two-node C* cluster, and on these two nodes I want to run Spark jobs locally. Inside the Spark job I have to set the connection URL to localhost so that it inserts data into the local C* instance (I am using the Cassandra nodes as my Spark job's slaves for execution via Mesos).
The problem is that if I change rpc_address to localhost in cassandra.yaml, then I can connect locally using the Spark job (with localhost as the connection URL) or cqlsh localhost, but remote applications cannot connect to the node using its IP in the connection URL.
I am using apache-cassandra-2.2.0.
Is there any configuration so that a local program on a C* node can connect using localhost as the connection URL and remote programs using the IP/name?
One way I could think of to achieve this is by extending RoundRobinPolicy. (Note: this is for the Java driver; it should be similar for the Spark Cassandra connector.) Override two methods, distance and newQueryPlan:
static class NewRoundRobinPolicy extends RoundRobinPolicy {
    @Override
    public HostDistance distance(Host host) {
        return HostDistance.LOCAL;
    }

    @Override
    public Iterator<Host> newQueryPlan(String loggedKeyspace, Statement statement) {
        Iterator<Host> hosts = super.newQueryPlan(loggedKeyspace, statement);
        final List<Host> hostList = new ArrayList<Host>();
        while (hosts.hasNext()) {
            hostList.add(hosts.next());
        }
        // Guava's AbstractIterator; yield only the matching host, then stop.
        return new AbstractIterator<Host>() {
            private boolean done = false;

            @Override
            protected Host computeNext() {
                if (done) {
                    return endOfData();
                }
                done = true;
                for (Host host : hostList) {
                    if (host.getAddress().getHostAddress().equals("YOUR IP ADDRESS HERE")) {
                        return host;
                    }
                }
                return endOfData();
            }
        };
    }
}
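Assuming this compiles against the DataStax Java driver (AbstractIterator comes from Guava), you would plug the policy in when building the Cluster, for example:
Cluster cluster = Cluster.builder()
        .addContactPoint("localhost")
        .withLoadBalancingPolicy(new NewRoundRobinPolicy())
        .build();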

Set preferred listen address in weblogic 11g

I have a WebLogic 11g domain with 1 admin server and 4 managed servers running on 2 machines. Each machine has 3 IP addresses, but only one of those addresses is visible to the other machine. Each machine runs a node manager, and the node managers seem to communicate fine with each other and with the admin server. However, when a managed server starts on the second machine it can't communicate with the admin server because it uses the wrong IP address. It appears that when WebLogic starts it binds to all IP addresses but selects the wrong one as the first, i.e. default, address. That's why the managed servers receive wrong information from the node manager.
Is there a way to set a preferred listen address in WebLogic 11g while still allowing it to listen on all other addresses? How does WebLogic get the list of IP addresses? Is their order OS-dependent?
Does this answer the question? I believe if you play with the scripts in /etc/sysconfig, you'll affect the loading order and thence the enumeration order. I must admit, I don't have a RH box here to confirm that suspicion.
WebLogic uses the NetworkInterface.getNetworkInterfaces() method and its own logic to set the order of the listen addresses. This logic changed between 10.3.2 and 10.3.4.
The relevant code is in the getAllAddresses method of the class weblogic.server.channels.AddressUtils$AddressMaker in weblogic.jar.
You can check the order with a simple test:
import java.net.*;
import weblogic.server.channels.*;

public class TestIP_WLS {
    public static void main(String args[]) throws UnknownHostException {
        System.out.println("=== AddressUtils.getIPAny()");
        InetAddress addrs[] = AddressUtils.getIPAny();
        for (InetAddress addr : addrs) {
            System.out.println("*** " + addr);
        }
    }
}
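To compare against the raw order the JDK reports, before WebLogic applies its own sorting, here is a plain-Java sketch with no WebLogic dependency:
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class TestIP_JDK {
    public static void main(String[] args) throws Exception {
        System.out.println("=== NetworkInterface.getNetworkInterfaces()");
        // Print interfaces and their addresses in the order the OS enumerates them.
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                System.out.println("*** " + nic.getName() + " -> " + addr);
            }
        }
    }
}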
