Cassandra multi-datacenter setup configuration

I am looking for some help setting up a multi-datacenter Cassandra cluster. My current setup is MySQL master/slave: MySQL replicates to a disaster-recovery center over the public IP of the remote datacenter and communicates with slaves in the main datacenter via private IPs. While VPN'd into the network I can connect to any database via its public IP, which is nice for testing and troubleshooting, and the app servers connect to the database via private IPs.
With Cassandra, the disaster-recovery center would become a second datacenter. I'm hoping to get advice on how each datacenter should be configured. I'm a bit confused about listen_address and broadcast_address. Currently, the seeds in the cassandra.yaml file are the public IPs and everything else is the private IP, and this works for the app servers. If rpc_address is set to the private IP or the public IP, connections are accepted only on that address. Connections are accepted on both if rpc_address is 0.0.0.0 but, in my testing, that breaks failover. I have a Java program that inserts a row and reads it back in a loop. If rpc_address is a specific address, I can stop a node and the program continues (replication factor of 2 for testing). If rpc_address is 0.0.0.0, the program fails because it cannot connect to the cluster, even though both node addresses are specified in the Cluster() constructor.
So, with that long lead in, my questions are:
What is the recommended configuration for nodes to communicate with one another over the private IP within a given datacenter, but with the other datacenter over the public IP? The EC2 multi-region snitch seems to accomplish this, but we're not on Amazon's cloud and the code appears to be specific to Amazon.
For development and debugging, can developers VPN into our network and connect via the public IP while the app servers connect on the private IP? This works fine with an rpc_address of 0.0.0.0; however, failover doesn't work, which was not unexpected given the warning comment in the cassandra.yaml file.
I really appreciate the help.
Thanks,
Greg

Related

Can we connect to Cassandra using cluster name and not IP address?

My Cassandra cluster name is: "Test Cluster"
It has three nodes with three separate IPs.
I am able to connect using the IP and password.
I need to know if there is a possible way to connect using Cluster Name and not IP addresses.
You can't use only the cluster name: the driver needs some way to find at least one node in order to connect to the cluster. You can, however, use a host name instead of an IP address; after connecting to any node, the driver will discover the other nodes in the cluster.
You can connect to your node from cqlsh using either its host name or its IP address. Connecting locally by host name generally works if the node can resolve the name, for example through an entry in /etc/hosts.
You cannot connect using the cluster name in Cassandra.
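As a rough illustration of the host-name approach described above, the helper below turns a list of host names into IPv4 contact points using only the Python standard library. The function name `resolve_contact_points` is hypothetical, and the default port 9042 is assumed to be the CQL native-transport port. It only shows the name-resolution step; a driver would then discover the remaining nodes from whichever contact point answers.

```python
import socket


def resolve_contact_points(hostnames, port=9042):
    """Resolve host names to unique IPv4 addresses, skipping names that fail."""
    addresses = []
    for name in hostnames:
        try:
            infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
        except socket.gaierror:
            # Unresolvable names are skipped; one reachable node is enough.
            continue
        for info in infos:
            ip = info[4][0]  # sockaddr is (address, port) for AF_INET
            if ip not in addresses:
                addresses.append(ip)
    if not addresses:
        raise ValueError("no contact point could be resolved")
    return addresses
```

Most drivers accept host names directly, so a helper like this is mainly useful for checking that the names resolve to the addresses you expect before handing them to the driver.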

Cassandra inter DC sync over VPN on GCP

I have a VPN between the company network (172.16.0.0/16) and GCP (10.164.0.0/24).
On GCP there is a Cassandra cluster running with 3 instances. These instances get dynamic local IP addresses, for example 10.4.7.4, 10.4.6.5 and 10.4.3.4.
My issue: from the company network I cannot access the 10.4.x addresses, as the tunnel works only for 10.164.0.0/24.
I tried setting up an LB service on 10.164.0.100 with the Cassandra nodes behind it. This doesn't work: when I configure that IP address as a seed node on the local cluster, it gets a reply from one of the 10.4.x IP addresses, which it doesn't have in its seed list.
I need advice on how to set up inter-DC sync in this scenario.
The IP addresses that K8s assigns to Pods and Services are internal, cluster-only addresses that are not accessible from outside the cluster. Some CNI plugins can connect in-cluster addresses to external networks, but I don't think that is a good idea in your case.
You need to expose Cassandra using a Service of type NodePort or LoadBalancer. An answer on the Kubernetes GitHub suggests the same solution.
If you add a Service of type NodePort, Cassandra will be available on the selected port on all Kubernetes nodes.
If you choose LoadBalancer, Kubernetes will create a cloud load balancer for you that acts as the entry point for Cassandra. Because you have a VPN to your VPC, I think you will need an internal load balancer.
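To make the two Service types concrete, here is a minimal sketch that builds such a manifest as a Python dict and prints it as JSON. The service name `cassandra-external`, the pod label `app: cassandra`, and the node port 30942 are assumptions for illustration, and only the CQL port 9042 is exposed. Making a LoadBalancer internal requires a provider-specific annotation that is not shown here.

```python
import json


def cassandra_service(service_type, name="cassandra-external", node_port=None):
    """Build a Service manifest exposing the CQL port of pods labelled app=cassandra."""
    if service_type not in ("NodePort", "LoadBalancer"):
        raise ValueError("service_type must be 'NodePort' or 'LoadBalancer'")
    port = {"name": "cql", "protocol": "TCP", "port": 9042, "targetPort": 9042}
    if node_port is not None:
        # Kubernetes allocates node ports from 30000-32767 by default.
        port["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": {"app": "cassandra"},  # assumed pod label
            "ports": [port],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cassandra_service("NodePort", node_port=30942), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.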

How do I expose Kubernetes service to the internet?

I am running a Kubernetes cluster with 1 master (also a node) and 2 nodes on Azure. I am using Ubuntu with the Flannel overlay network. So far everything is working well. The only problem I have is exposing the service to the internet.
I am running the cluster on an Azure subnet. The master has a NIC attached to it that has a public IP. This means that if I run a simple server that listens on port 80, I can reach my server using a domain name (Azure gives an option to have a domain name for a public IP).
I am also able to reach the Kubernetes guestbook frontend service with a hack. What I did was check all the listening ports on the master and try each port with the public IP. I was able to hit the Kubernetes service and get a response. Based on my understanding, this goes directly to the pod running on the master (which is also a node) rather than through the service IP (which would have load balanced across any of the pods).
My question is: how do I map the external IP to the service IP? I know Kubernetes has a setting that works only on GCE (which I can't use right now). But is there some neat way of telling etcd/flannel to do this?
If you use the kubectl expose command, there is an option for this:
--external-ip="": External IP address to set for the service. The service can be accessed by this IP in addition to its generated service IP.
Or, if you create the service from a JSON or YAML file, use the spec.externalIPs array.
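For the file-based route, a sketch of where the externalIPs array sits in the manifest, generated here from Python. The service name `frontend`, the selector `app: guestbook`, and the address 203.0.113.10 (a documentation-only range) are placeholders, not values from the question. The address you list has to be one that actually routes to one of the cluster nodes.

```python
import ipaddress
import json


def service_with_external_ips(name, selector, port, external_ips):
    """Return a Service manifest whose spec.externalIPs lists the given addresses."""
    # Validate each address up front; ip_address raises ValueError on bad input.
    ips = [str(ipaddress.ip_address(ip)) for ip in external_ips]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
            "externalIPs": ips,
        },
    }


# Placeholder values: replace the name, selector and IP with your own.
manifest = service_with_external_ips("frontend", {"app": "guestbook"}, 80, ["203.0.113.10"])
print(json.dumps(manifest, indent=2))
```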

Access Cassandra Node on the Azure Cloud from outside

I have created a Linux VM with a single node Cassandra cluster installed.
cassandra.yaml has the following:
seeds:
listen address:
rpc address:
A netstat -an check shows that all required ports (i.e. 9160 and 9042) are up and listening.
I am trying to connect my application, which runs outside the Azure cloud, to the Cassandra cluster in the cloud. It looks like the connection from the outside host to the Cassandra node in Azure is being blocked.
Is there a real restriction on accessing an Azure VM from outside its network? Is there a way to access this Cassandra node from outside?
It would be very nice if someone could answer my question.
Thank you!
You need to go to the "Endpoints" of your virtual machine:
At the bottom click on "Add", and add new endpoints for these ports.
Then you will need to manage the ACL for each endpoint, defining the allowed and blocked IP address ranges.
Keep in mind that if the internal IP used by the virtual machine differs from the external (public) IP used by the client, then, depending on the driver, you may need to teach it how to do address translation. Otherwise, the cluster will report only internal IPs in response to the discovery request, and those will obviously not be accessible from outside.
For this reason, and from a security perspective, I would recommend setting up the Cassandra cluster inside a virtual network and accessing it via VPN.
There is a comprehensive tutorial on how to do this here: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-nodejs-running-cassandra/
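The address translation mentioned above can be sketched as a simple lookup from the internal addresses the cluster reports to the public addresses a client can reach. The class name `StaticAddressTranslator` and all IP pairs below are hypothetical. The `translate(addr)` shape mirrors the address-translator hook in the DataStax Python driver (`cassandra.policies.AddressTranslator`), but check your own driver's documentation for the exact interface.

```python
class StaticAddressTranslator:
    """Map the internal addresses a cluster reports to addresses the client can reach."""

    def __init__(self, internal_to_public):
        self._mapping = dict(internal_to_public)

    def translate(self, addr):
        # Unknown addresses pass through unchanged, so clients inside the
        # virtual network (or on the VPN) keep using the internal IPs.
        return self._mapping.get(addr, addr)


translator = StaticAddressTranslator({
    "10.0.0.4": "203.0.113.4",  # hypothetical internal -> public pairs
    "10.0.0.5": "203.0.113.5",
})

# Addresses as a node might report them during discovery (made up here).
discovered = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
reachable = [translator.translate(ip) for ip in discovered]
```

A static table like this has to be kept in sync with the endpoints by hand, which is one more argument for the VPN setup recommended above.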

how to connect to cassandra on a cloud server

I have been trying to connect to a Cassandra database on a Rackspace cloud server with no success.
Can anyone shed any light on the last paragraph of this comment from the Cassandra wiki (http://wiki.apache.org/cassandra/StorageConfiguration)?
listen_address
Commenting out this property leaves it up to InetAddress.getLocalHost(). This will always do the Right Thing if the node is properly configured (hostname, name resolution, etc), and the Right Thing is to use the address associated with the hostname (it might not be: on cloud services you should ensure the private interface is used).
On Rackspace Cloud Servers you will most likely want to listen on eth1 (10.X.X.X), the ServiceNet IP. This is only accessible to other Cloud Servers inside the same datacenter. The max throughput of eth1 is twice that of eth0 and you are not charged for the bandwidth.
Please remember that the ServiceNet network is not private so you will still need to restrict access to the ports that you bind to via iptables.
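The iptables restriction mentioned above could look roughly like the rules generated below. This is a minimal sketch: the function name `iptables_rules` is made up, the port list assumes the default inter-node (7000), native CQL (9042) and Thrift (9160) ports, and the allowed source would be whatever ServiceNet addresses your own app servers and Cassandra nodes use. The script only builds the command strings; it does not run them.

```python
import ipaddress

CASSANDRA_PORTS = (7000, 9042, 9160)  # inter-node, native CQL, Thrift (assumed defaults)


def iptables_rules(allowed_cidrs, interface="eth1", ports=CASSANDRA_PORTS):
    """Return iptables commands that accept the given sources and drop the rest."""
    # ip_network raises ValueError for malformed CIDRs, so typos fail early.
    sources = [str(ipaddress.ip_network(cidr)) for cidr in allowed_cidrs]
    rules = []
    for port in ports:
        for source in sources:
            rules.append(
                f"iptables -A INPUT -i {interface} -p tcp -s {source} --dport {port} -j ACCEPT"
            )
        # Appended after the ACCEPT rules, so it only catches other sources.
        rules.append(f"iptables -A INPUT -i {interface} -p tcp --dport {port} -j DROP")
    return rules


if __name__ == "__main__":
    for rule in iptables_rules(["10.180.0.5/32"]):  # hypothetical app server address
        print(rule)
```

Because iptables evaluates rules in order, the ACCEPT lines have to come before the DROP line for the same port, which is why the rules are appended in that sequence.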
