Can we connect to Cassandra using cluster name and not IP address?

My Cassandra cluster name is: "Test Cluster"
It has three nodes with three separate IPs.
I am able to connect using the IP and password.
I need to know if there is a possible way to connect using Cluster Name and not IP addresses.

You can't use only the cluster name - the driver needs some way to find at least one node to connect to the cluster. You can use a hostname instead of an IP, though, and the driver will then discover the other nodes in the cluster after connecting to any one of them.
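As a minimal illustration of the hostname-based approach (the hostname below is hypothetical), you can check what a contact-point hostname resolves to before handing it to the driver:

```python
import socket

# Hypothetical node hostname; assumes it resolves via DNS or /etc/hosts.
CONTACT_HOST = "cassandra-node1.example.com"

def contact_ips(hostname, port=9042):
    """Resolve a contact-point hostname to the IPs the driver would dial.
    The driver only needs one reachable node; it discovers the rest itself."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(hostname, port)})
    except socket.gaierror:
        return []  # hostname not resolvable from this machine

# With the DataStax Python driver you would then connect by hostname directly:
# from cassandra.cluster import Cluster
# session = Cluster([CONTACT_HOST]).connect()
```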

You can connect to your node using its hostname or IP address from cqlsh. Generally, you can connect locally without an IP if your node resolves the hostname and you have configured /etc/hosts.
You cannot connect using the cluster name in Cassandra.

Related

Cassandra inter DC sync over VPN on GCP

I have a VPN between the company network 172.16.0.0/16 and GCP 10.164.0.0/24.
On GCP there is a Cassandra cluster running with 3 instances. These instances get dynamic local IP addresses - for example 10.4.7.4, 10.4.6.5, 10.4.3.4.
My issue: from the company network I cannot access the 10.4.x addresses, as the tunnel works only for 10.164.0.0/24.
I tried setting up an LB service on 10.164.0.100 with the Cassandra nodes behind it. This doesn't work: when I configure that IP address as the seed node on the local cluster, it gets a reply from one of the 10.4.x IP addresses, which it doesn't have in its seed list.
I need advice on how to set up inter-DC sync in this scenario.
The IP addresses which K8s assigns to Pods and Services are internal, cluster-only addresses that are not accessible from outside the cluster. Some CNIs make it possible to create a connection between in-cluster addresses and external networks, but I don't think that is a good idea in your case.
You need to expose your Cassandra using a Service of type NodePort or LoadBalancer. Here is another answer with the same solution from the Kubernetes GitHub.
If you add a Service of type NodePort, your Cassandra will be available on a selected port on all Kubernetes nodes.
If you choose LoadBalancer, Kubernetes will create a cloud load balancer for you, which will be the entry point for Cassandra. Because you have a VPN to your VPC, I think you will need an internal load balancer.
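As a sketch of the internal LoadBalancer approach (the Service name, pod label, and the GKE-specific annotation are assumptions, not taken from the question), the manifest could look like:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra-internal        # hypothetical name
  annotations:
    # GKE-specific annotation requesting an internal (VPC-only) load balancer
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: cassandra                # assumed pod label
  ports:
    - name: cql
      port: 9042
      targetPort: 9042
```

The internal load balancer then gets an address inside the VPC range, which the VPN tunnel can reach.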

How to find the IP addresses of different nodes on a HDInsight cluster

I have an HDInsight cluster running Storm. This cluster was set up (not by me) without DNS, so it can only be accessed by using the IP address in the URL. How do I find the IP address of the head node? I have accessed it before, but the IP address was given to me by someone. In the future we will create additional clusters like this, and I want to know the general method for finding the head node IP address, ideally using the CLI or the portal.
EDIT: The HDInsight cluster is already in a VNET.
Here is a more succinct version of my question: how do I find the IP address without using the cluster name?
SSH into your cluster's head node and run curl ifconfig.me to get the IP address of the head node.
You can get the fully qualified domain names (FQDNs) of all nodes in the cluster using the bash command below (replace $CLUSTERNAME with your cluster name):
curl -u admin -sS -G "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/hosts" \
| jq '.items[].Hosts.host_name'
You may need to install "jq" first. The command above will return the list of nodes as below:
"hn0-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"hn1-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"wn0-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"wn1-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"wn2-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"wn3-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"zk0-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"zk2-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
"zk3-mydemo.lle2qymtat0ehndwwaba2j1gih.dx.internal.cloudapp.net"
You can SSH into your worker nodes from within your head node SSH session using the command below:
ssh sshuser@wn0-mydemo
Once you SSH into a worker node, run curl ifconfig.me to get the IP address of the worker node.
Usually, both head nodes have the same IP address, and all four worker nodes have the same IP address.
The answer is that you can find the IP addresses if you navigate to the Overview of the Virtual Network resource.
An Azure HDInsight cluster uses Secure Shell (SSH) to securely connect to Hadoop on Azure HDInsight Storm.
For more details, refer to "Connect to HDInsight".
Note: an Azure HDInsight cluster will not allow any public IP to be assigned to it.
If you need an IP to be allocated to an HDInsight cluster, you have to create the HDInsight cluster in a VNET.
For more details, refer to "Extend Azure HDInsight using an Azure Virtual Network".

Accessing Spark in Azure HDInsights via JDBC

I'm able to connect to Hive externally using the following URL for an HDInsight cluster in Azure.
jdbc:hive2://<host>:443/default;transportMode=http;ssl=true;httpPath=/
However, I'm not able to find such a string for Spark. The documentation says the port is 10002, but it's not open externally. How do I connect to the cluster to run SparkSQL queries through JDBC?
There isn't one available, but you can vote for the feature at https://feedback.azure.com/forums/217335-hdinsight/suggestions/14794632-create-a-jdbc-driver-for-spark-on-hdinsight.
HDInsight is deployed with a gateway. This is the reason why HDInsight clusters out of the box enable only HTTPS (port 443) and SSH (ports 22, 23) communication to the cluster. If you don't deploy the cluster in a virtual network (VNet), there is no other way to communicate with HDInsight clusters. So instead of port 10002, port 443 is used if you want to access the Spark Thrift server. If you deploy the cluster in a VNet, you could also access the Thrift server via the IP address it is running on (one of the head nodes) and the standard port 10002. See also public and non-public ports in the documentation.
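To make the two access paths concrete, here is a small sketch that builds the connection string for each case, following the answer above (hostnames are hypothetical examples):

```python
def spark_thrift_jdbc_url(host, inside_vnet=False):
    """Build a JDBC URL per the answer above: outside a VNet only the
    HTTPS gateway (port 443) is reachable; inside a VNet the Thrift
    server's native port 10002 on a head node can be used directly."""
    if inside_vnet:
        return f"jdbc:hive2://{host}:10002/default"
    return (f"jdbc:hive2://{host}:443/default;"
            "transportMode=http;ssl=true;httpPath=/")

# e.g. spark_thrift_jdbc_url("mycluster.azurehdinsight.net")
# e.g. spark_thrift_jdbc_url("hn0-mycluster.internal.cloudapp.net", inside_vnet=True)
```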

Access Cassandra Node on the Azure Cloud from outside

I have created a Linux VM with a single-node Cassandra cluster installed.
cassandra.yaml has the following:
seeds:
listen_address:
rpc_address:
A netstat -an check shows all required ports are up and listening (i.e. 9160, 9042).
I am trying to connect my application, which is outside of the Azure cloud, to the Cassandra cluster in the cloud. It looks like the connection from the outside host to the Azure Cassandra node is being blocked.
I wonder if there is a real restriction on accessing an Azure VM from outside the network. Is there a way to access this Cassandra node from outside?
If someone could answer my question that would be very nice.
Thank you!
You need to go to the "Endpoints" of your virtual machine.
At the bottom, click on "Add" and add new endpoints for these ports.
Then you will need to manage the ACL for each endpoint, defining the IP ranges of allowed and blocked IP addresses.
Keep in mind that if the internal IP used by the virtual machine is different from the external (public) IP used by the client, then, depending on the driver, you may need to teach it how to do address translation. Otherwise, the cluster will report only internal IPs in response to the discovery request, which will obviously not be accessible from outside.
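A minimal sketch of such address translation (the IP mapping is hypothetical; with the DataStax Python driver, the real class would subclass cassandra.policies.AddressTranslator and be passed to the Cluster):

```python
# Hypothetical internal -> public mapping for the Azure VM; shown as a
# plain class here, but with the DataStax Python driver it would subclass
# cassandra.policies.AddressTranslator.
class StaticAddressTranslator:
    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def translate(self, addr):
        """Rewrite an internal node address reported during discovery to
        the public address the client can reach; pass others through."""
        return self._mapping.get(addr, addr)

translator = StaticAddressTranslator({"10.0.0.4": "40.76.0.10"})  # assumed IPs
```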
For this reason, and from a security perspective, I would recommend setting up the Cassandra cluster inside a virtual network and accessing it via VPN.
There is a comprehensive tutorial on how to do it here: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-nodejs-running-cassandra/

Cassandra multi datacenter setup configuration

I am looking for some help setting up a multi-datacenter Cassandra cluster. My current setup is MySQL master/slave, where MySQL replicates to a disaster-recovery center over the public IP of the remote datacenter and communicates with slaves in the main datacenter via the private IPs. While VPN'd into the network I can connect to any database via the public IP, which is nice for testing and troubleshooting, and app servers connect to the database via private IPs.
With Cassandra, the disaster-recovery center would become a second datacenter. I'm hoping to get advice on how each datacenter should be configured. I'm a bit confused about listen_address and broadcast_address. Currently, the seeds in the cassandra.yaml file are the public IPs and everything else is the private IP, and this works for the app servers. If the rpc_address is configured with the private IP or public IP, then connections are only accepted on that address. Connections are accepted on both if the rpc_address is 0.0.0.0, but, in my testing, that causes failover to fail.
I have a Java program running that inserts a row and reads it back in a loop. If the rpc_address is a specific address, I can stop a node and the program continues (replication factor of 2 for testing). If the rpc_address is 0.0.0.0, then the program bombs because it cannot connect to the cluster, even though both node addresses are specified in the Cluster() constructor.
So, with that long lead in, my questions are:
What is the recommended configuration for the nodes to communicate with one another over the private IP within a given datacenter but communicate with the other datacenter over the public IP? It seems the Ec2MultiRegionSnitch accomplishes this, but we're not on Amazon's cloud, so that code seems specific to Amazon.
For development and debugging, can developers VPN into our network and connect via the public IP while the app servers connect on the private IP? This works fine with an rpc_address of 0.0.0.0; however, failover doesn't work, which was not unexpected given the warning comment in the cassandra.yaml file.
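On the 0.0.0.0 failover issue specifically: when rpc_address is the wildcard address, Cassandra requires broadcast_rpc_address to be set, and that is the single address drivers are told to use during discovery. A cassandra.yaml sketch of the split-address layout described above (all IPs are hypothetical examples):

```yaml
# cassandra.yaml fragment (addresses are hypothetical examples)
listen_address: 10.0.1.5          # private IP, node-to-node within the DC
broadcast_address: 203.0.113.5    # public IP advertised to the remote DC
rpc_address: 0.0.0.0              # accept client connections on all interfaces
broadcast_rpc_address: 10.0.1.5   # address drivers are told to reconnect to
```

Note the trade-off: broadcast_rpc_address is a single value per node, so clients on the "other" network (e.g. VPN'd developers, if it is set to the private IP) need driver-side address translation to reach the nodes.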
I really appreciate the help.
Thanks,
Greg
