Binding Multiple IP addresses to Master in Spark - apache-spark

I am trying to set up Apache Spark with the following systems:
1 Master Node (having a public IP and a local IP)
Slave Node-3 (having a public IP and a local IP)
Slave Node-2 (having a local IP only)
The configuration is such that the Master Node and Slave Node-3 communicate via their public IPs, whereas Slave Node-2 communicates with the other two nodes via local IPs.
The problem I am facing is that, since the Master Node binds to a public IP, Slave Node-2 is unable to connect to the Master via its local IP and gets a connection refused error, whereas Slave Node-3 is able to communicate with the Master Node without any difficulty.
Is there a way to allow communication between the Master Node and Slave Node-2, or to bind multiple addresses to the Master Node? For example, such a configuration is possible in Hadoop, where the namenode can be bound to multiple hosts.
Thank you

If you bind the Master to 0.0.0.0 (i.e. all local addresses), then the Master should be able to communicate with Slave Node-2 via the private network and with Slave Node-3 on the public network.
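A minimal sketch of one way to do this, assuming a Spark 2.x standalone deployment where the relevant variable is SPARK_MASTER_HOST (older releases used SPARK_MASTER_IP); set it in conf/spark-env.sh on the master and restart the master:

# conf/spark-env.sh on the Master Node
SPARK_MASTER_HOST=0.0.0.0    # bind the standalone master to all local interfaces

Equivalently, the master can be started with sbin/start-master.sh --host 0.0.0.0. The idea is that Slave Node-2 can then register against spark://<master local IP>:7077 while Slave Node-3 uses spark://<master public IP>:7077.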

Related

Cassandra client connection to multiple addresses

I have a question about Cassandra.
Is it possible to open Cassandra client connections on multiple IPs?
On my server I have two network cards (eth0 and eth1) with the IPs 10.197.11.21 (eth0), 192.168.0.45 (eth1), and 127.0.0.1 (lo).
I want my client to be able to connect to the Cassandra database on all three IPs:
localhost, 10.197.11.21, and 192.168.0.45.
At the moment I can choose only one IP. What do I need to modify in cassandra.yaml?
You need to set rpc_address: 0.0.0.0 in cassandra.yaml
Note that when you set rpc_address to 0.0.0.0, you also must set broadcast_rpc_address to something other than 0.0.0.0 (e.g. 10.197.11.21).
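Concretely, for the setup in the question (the addresses come from the question; everything else in the file stays unchanged), the relevant cassandra.yaml lines would look something like:

rpc_address: 0.0.0.0                    # listen for client connections on every interface
broadcast_rpc_address: 10.197.11.21     # the address advertised to clients; must not be 0.0.0.0

After restarting Cassandra, clients should be able to connect on 127.0.0.1, 10.197.11.21, and 192.168.0.45.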
rpc_address is the address that Cassandra listens on for connections from clients
listen_address is the address that Cassandra listens on for connections from other Cassandra nodes (not client connections)
broadcast_rpc_address is the address that Cassandra broadcasts to clients that are attempting to discover the other nodes in the cluster. When an application first connects to a Cassandra cluster, the cluster sends the application a list of all the nodes in the cluster and their IP addresses. The IP address sent to the application is the broadcast_rpc_address (side note: Cassandra actually sends all IP addresses, this is just the one that it tells the client to connect on). This allows the application to auto-discover all the nodes in the cluster, even if only one IP address was given to the application. It also allows applications to handle situations like a node going offline or new nodes being added.
Even though your broadcast_rpc_address can only point to one of those two IP addresses, your application can still connect to either one. However, your application will also attempt to connect to the other nodes via the broadcast_rpc_addresses sent back by the cluster. You can get around this by providing a full list of the addresses of every node in the cluster to your application, but the best solution is to build a driver-side address translator, as sketched below.
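A minimal sketch of such a translator with the DataStax Python driver (the same driver whose Cluster class appears later in this page); the private-to-public mapping here is a hypothetical example and would need to match your own network layout:

from cassandra.cluster import Cluster
from cassandra.policies import AddressTranslator

class LanToPublicTranslator(AddressTranslator):
    # hypothetical map from the addresses the cluster broadcasts to the
    # addresses this client can actually reach; unknown addresses pass through
    ADDRESS_MAP = {"192.168.0.45": "10.197.11.21"}

    def translate(self, addr):
        return self.ADDRESS_MAP.get(addr, addr)

cluster = Cluster(["10.197.11.21"], port=9042,
                  address_translator=LanToPublicTranslator())
session = cluster.connect()

The driver applies the translator to every node address it discovers, so the application only ever needs to know one reachable contact point.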

Cluster configuration problems in cassandra.yaml for multi-node cluster where only 1 public ip is known

I would like to know about the following configuration parameters of cassandra.yaml, namely:
listen_address
broadcast_address
rpc_address
broadcast_rpc_address
on individual nodes in a particular scenario.
Scenario: 6-node cluster with respective private IPs but only one node has a public IP.
Requirement: remote python application to access the cluster
What I have tried on each node:
listen_address: the node's respective private IP
broadcast_address: left blank
rpc_address: left blank, except 0.0.0.0 on the node with the public IP
broadcast_rpc_address: left blank, except on the node with the public IP, where it is set to that public IP
From my application I tried issuing Cluster(['public ip'], port=9042), but I received the following warning, which eventually led to my application shutting down:
WARNING:cassandra.cluster:Failed to create connection pool for new
host 192.xxx.xx.3:
I recommend adding two network interfaces to each machine: one for the listen_address and one for the rpc_address. With this approach you don't need broadcast_rpc_address. However, if you want to use a public IP, you have to give every node a public address; it does not work when only one of them has a public address.
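A rough sketch of what that two-interface layout might look like in cassandra.yaml, assuming (purely for illustration) that eth0 carries the private inter-node traffic and eth1 the client-facing traffic; the addresses are placeholders:

listen_address: 10.0.0.11      # this node's address on the inter-node interface (eth0)
rpc_address: 172.16.0.11       # this node's address on the client-facing interface (eth1)
# broadcast_rpc_address can stay unset, since rpc_address is already a real, routable address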

Running two single-node cassandra clusters using different ports

I want to run two instances of Cassandra on a single machine. It runs fine with two different loopback addresses as the listen_address and rpc_address, with the same native transport port (9042). But I will be using the two resulting single-node clusters from a different machine, so I need addresses that other machines on the network can reach (loopbacks and localhost are out of the question).
Is there a way to achieve this?
You first need to create two IP addresses for your machine. This can be done by:
Installing two NICs, or
Assigning multiple IPs to a single NIC.
In either case, assign static IPs (make sure you provide the proper gateway and subnet so the machine is reachable from other machines). The link below explains how to assign multiple IP addresses to a single NIC, along with the drawbacks of doing so:
http://www.tomshardware.com/faq/id-1925787/computer-address.html
After you have created the two IP addresses, start each Cassandra server bound to a different one of them.
Do a telnet test:
telnet <IP address> 9042
from any other machine to verify that each Cassandra server is listening on its assigned IP address.
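As a sketch of the second option (the interface name and address are placeholders), on Linux an extra address can be added to an existing NIC like this:

sudo ip addr add 192.168.0.46/24 dev eth0    # temporary; configure it persistently for real use

Then point one instance's cassandra.yaml at the first address and the other instance's at the second (listen_address and rpc_address), both keeping the default native transport port 9042.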

Issues in using spark cluster from outside LAN

I am trying to use a spark cluster from outside the cluster itself.
The problem is that Spark binds to my local machine's private IP. It is able to connect to the master, but the workers then fail to connect back to my machine (the driver) because of IP problems: they see my private IP, since Spark binds on my private IP.
I can see this in the workers' logs:
"--driver-url" "spark://CoarseGrainedScheduler#PRIVATE_IP_MY_LAPTOP:34355"
any help?
Try setting spark.driver.host (see the Spark configuration documentation for more info) to your public IP; the workers will then use that address instead of the (automatically resolved) private IP.
Try setting spark.driver.bindAddress to 0.0.0.0 so that the driver program listens on all interfaces.
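A minimal sketch of how the two settings might be passed together, assuming Spark 2.1 or later (where spark.driver.bindAddress is available); the addresses and application name are placeholders:

./bin/spark-submit \
  --master spark://MASTER_PUBLIC_IP:7077 \
  --conf spark.driver.host=DRIVER_PUBLIC_IP \
  --conf spark.driver.bindAddress=0.0.0.0 \
  my_app.py

spark.driver.bindAddress controls the local interface the driver listens on, while spark.driver.host is the address advertised to the master and executors.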

Connecting to BAM internal cassandra

I am now trying BAM 2.3.0 and I want to know how to connect to BAM's internal Cassandra from a different server. Is it possible, or is it tightly coupled?
No, it is not tightly coupled. Similar to setting up a standalone Cassandra cluster, you must do the configuration; but since you are not creating a cluster, only accessing it from an external server, there is no need to give seed addresses. Just configure the listen and RPC addresses. The cassandra.yaml is located in BAM_HOME/repository/conf/etc.
In cassandra.yaml, change listen_address and rpc_address to your server's IP address. If you put 127.0.0.1, Cassandra will only listen to connections coming from localhost, so you cannot access it from outside.
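As a minimal sketch (the address is a placeholder for the BAM server's own, externally reachable IP):

# BAM_HOME/repository/conf/etc/cassandra.yaml
listen_address: 10.0.0.25
rpc_address: 10.0.0.25

After changing these, restart BAM so that the embedded Cassandra picks up the new addresses.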
