spark worker not connecting to master - apache-spark

I want to create a Spark standalone cluster. I am able to run a master and a slave on the same node, but a slave on a different node neither shows the master URL nor connects to the master.
I am running command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI doesn't show any worker except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.

Please check the configuration file spark-env.sh on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, set it and restart the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
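A minimal sketch of that edit, assuming a standard install with $SPARK_HOME pointing at the Spark directory and the pre-3.0 start/stop-slave.sh script names:
# on the master node (192.168.0.1 is the example IP from above)
echo 'export SPARK_MASTER_HOST=192.168.0.1' >> $SPARK_HOME/conf/spark-env.sh
$SPARK_HOME/sbin/stop-master.sh && $SPARK_HOME/sbin/start-master.sh
# on each worker node, restart the worker against that address
$SPARK_HOME/sbin/stop-slave.sh
$SPARK_HOME/sbin/start-slave.sh spark://192.168.0.1:7077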

1) Make sure you have set up password-less SSH between the nodes.
Refer to the link below to set up password-less SSH between nodes:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slaves' IP addresses in the slaves file present in the $SPARK_HOME/conf directory
[this is the Spark folder containing the conf directory] on the master node.
3) Once you have specified the IP addresses in the slaves file, start the Spark cluster
[execute the start-all.sh script present in the $SPARK_HOME/sbin directory] on the master node, as in the sketch below.
Hope this helps.
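Putting steps 2) and 3) together, a minimal sketch (the worker hostnames are placeholders, not from the original post; $SPARK_HOME is assumed to point at the Spark install on the master):
# $SPARK_HOME/conf/slaves on the master node: one worker hostname or IP per line
worker-node-1
worker-node-2
# then, still on the master node, start the whole cluster
$SPARK_HOME/sbin/start-all.sh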

If you are able to ping the master node from the worker, the network connectivity is there. To add the new worker node to the Spark master you need to update a few things in spark-env.sh.
Please check the official documentation on launching a Spark standalone cluster and update the required fields.
Here is another blog post that can help you: Spark Cluster mode blog.
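As a sketch of what adding the new worker usually amounts to (the worker hostname and master host are placeholders, and $SPARK_HOME is assumed to point at the Spark install):
# on the master: list the new worker in conf/slaves so start-all.sh / start-slaves.sh picks it up
echo 'new-worker-hostname' >> $SPARK_HOME/conf/slaves
# or register it directly from the new worker machine itself
$SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077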

This solved my problem:
The idea is to use loopback address when both client and server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present; if not, make a copy of spark-env.sh.template named spark-env.sh, then add SPARK_MASTER_HOST=127.0.0.1 to it.
Then run the command to start the master from the Spark directory (not the conf folder):
./sbin/start-master.sh (this will start the master; view it at localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start the worker, and you can see it listed under the Workers tab in the same web UI, i.e., localhost:8080)
You can add multiple workers by repeating the above command.
This worked for me; hopefully it will work for you too.
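The same steps as one copy-pasteable sequence, run from the top of the Spark directory (a sketch of the steps above, assuming a default download layout):
cp conf/spark-env.sh.template conf/spark-env.sh
echo 'SPARK_MASTER_HOST=127.0.0.1' >> conf/spark-env.sh
./sbin/start-master.sh
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077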

Related

Why is spark cluster not setting up?

I am trying to set up a multinode Spark cluster from scratch with public cloud VMs. I have performed the following steps to set up this configuration:
Set up SSH so that the master node can SSH into the worker nodes without a password
Updated '/etc/hosts' on the master as well as the worker nodes with the respective IP addresses (master, slave01, slave02)
Installed Apache Spark and updated the bashrc file on each of the nodes
Updated the '/usr/local/spark/conf/spark-env.sh' file on the master node to include:
export SPARK_MASTER_HOST=''
export JAVA_HOME=<Path_of_JAVA_installation>
Updated the '/usr/local/spark/conf/worker' file on the master to include:
master
slave01
slave02
Finally, when I try to initiate the cluster using
./sbin/start-all.sh
I get the following output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.master.Master-1-spark-master.out
exouser@master's password: slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave1.out
slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave2.out
And that's it. The cluster does not start. I tried accessing the Spark UI at :8080 but nothing shows up. Could someone please guide me as to what I am doing wrong here?
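For reference, on a standard standalone setup those two conf files on the master usually end up looking like the sketch below (hostnames follow the '/etc/hosts' entries above; the SPARK_MASTER_HOST value and JAVA_HOME path are placeholders, not the asker's actual settings):
# /usr/local/spark/conf/spark-env.sh
export SPARK_MASTER_HOST=master
export JAVA_HOME=/usr/lib/jvm/default-java
# /usr/local/spark/conf/slaves (named 'workers' on Spark 3.x): one host per line
master
slave01
slave02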

Spark master-machine:7077 not reachable

I have a Spark cluster where the master node is also the worker node. I can't reach the master from the driver-code node, and I get the error:
14:07:10 WARN client.AppClient$ClientEndpoint: Failed to connect to master master-machine:7077
The SparkContext in driver-code node is configured as:
SparkConf conf = new SparkConf(true).setMaster(spark:master-machine//:7077);
I can successfully ping master-machine, but telnet master-machine 7077 does not connect, meaning the machine is reachable but the port is not.
What could be the issue? I have disabled Ubuntu's ufw firewall for both master node and node where driver code runs (client).
Your syntax is a bit off, you have:
setMaster(spark:master-machine//:7077)
You want:
setMaster(spark://master-machine:7077)
From the Spark docs:
Once started, the master will print out a spark://HOST:PORT URL for
itself, which you can use to connect workers to it, or pass as the
“master” argument to SparkContext. You can also find this URL on the
master’s web UI, which is http://localhost:8080 by default.
You can use an IP address in there too. I have run into issues with Debian-based installs where I always have to use the IP address, but that's a separate issue. An example:
spark.master spark://5.6.7.8:7077
From the configuration page in the Spark docs.
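Since telnet to port 7077 fails even before the URL syntax comes into play, it can also help to check what the master actually bound to; a quick sanity check on the master machine (assuming a Linux host with ss or netstat available):
# on master-machine: is anything listening on 7077, and on which interface?
ss -tlnp | grep 7077        # or: netstat -tlnp | grep 7077
# if it only shows 127.0.0.1:7077, the master is bound to loopback and is unreachable
# from other hosts; set SPARK_MASTER_IP / SPARK_MASTER_HOST (depending on Spark version)
# in conf/spark-env.sh to an externally reachable address and restart the master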

Why are Spark executors trying to connect to spark_master instead of SPARK_MASTER_IP?

Using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker) Spark executors suddenly started throwing errors that they couldn't connect to spark_master.
When I echo $SPARK_MASTER_IP in the same shell used to start the master, it correctly identifies the host as master.cluster. And when I open the GUI at port 8080 it also identifies the master as Spark Master at spark://master.cluster:7077.
I've also set SPARK_MASTER_IP in spark-env.sh. Why are my executors trying to connect to spark_master?
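A quick way to see where the name spark_master is still coming from is to compare what the workers resolve against what the master advertises; a minimal sketch, assuming $SPARK_HOME points at the Spark install (paths are an assumption, not from the original post):
# on each worker: does /etc/hosts still carry a stale spark_master entry?
grep -n 'spark_master\|master.cluster' /etc/hosts
# and which master address is this node actually configured with?
grep -n 'SPARK_MASTER' $SPARK_HOME/conf/spark-env.sh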

Not able to launch Spark cluster in Standalone mode with start-all.sh

I am new to Spark and I am trying to install Spark standalone on a 3-node cluster. I have set up password-less SSH from the master to the other nodes.
I have tried the following config changes:
Updated the hostnames for the 2 worker nodes in the conf/slaves.sh file. Created the spark-env.sh file and updated SPARK_MASTER_IP with the master URL. Also tried
updating the spark.master value in the spark-defaults.conf file.
Snapshot of conf/slaves.sh
# A Spark Worker will be started on each of the machines listed below.
Spark-WorkerNode1.hadoop.com
Spark-WorkerNode2.hadoop.com
Snapshot of spark-defaults.conf
# Example:
spark.master spark://Spark-Master.hadoop.com:7077
But when I try to start the cluster by running start-all.sh on the master, it does not recognize the worker nodes and starts the cluster as local.
It does not give any error; the log files show Successfully started service 'sparkMaster' and Successfully started service 'sparkWorker' on the master.
I have tried to run the start-master and start-slave scripts on the individual nodes and it seems to work fine. I can see 2 workers in the web UI. I am using Spark 1.6.0.
Can somebody please help me with what I am missing while trying to run start-all?
Snapshot of conf/slaves.sh
The file should be named slaves, without an extension.
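In other words, something like this on the master (a sketch, assuming $SPARK_HOME points at the Spark install and using the hostnames from the question):
# rename the file so the launch scripts pick it up
mv $SPARK_HOME/conf/slaves.sh $SPARK_HOME/conf/slaves
# $SPARK_HOME/conf/slaves should then contain just the worker hostnames:
# Spark-WorkerNode1.hadoop.com
# Spark-WorkerNode2.hadoop.com
$SPARK_HOME/sbin/start-all.sh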

workers can't connect to master in spark standalone cluster

I installed Apache Spark 1.6 on 2 Ubuntu machines: the master is 192.168.217.136 (hostname worker-virtual-machine) and the worker is 192.168.217.139 (hostname slave).
I configured SSH and set up the slaves and spark-env.sh files.
The configuration of slaves is
192.168.217.139
and spark-env.sh contains
export SPARK_MASTER_IP=192.168.217.136
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
export SPARK_WORKER_INSTANCES=2
When I start the cluster with ./sbin/start-all.sh and open the web GUI, I find just one worker connected, with the master's IP address, so the worker at 192.168.217.139 doesn't connect to the master.
I tried to start the master with ./start-master.sh and to start the worker from the worker machine with
./start-slaves.sh spark://192.168.217.136:7077
When I run the jps command it shows the Worker created in the terminal, but when I move to the master's terminal and run jps it shows just the Master, and the GUI doesn't show anything; when I return to the worker machine and run jps I don't find the Worker anymore.
Where is the problem?
When I start the worker manually from its machine, I get an error message in the logs.
Reading the docs, you can see that ./start-slave.sh takes the master URL as its first argument. This means you run ./start-slave.sh on the slave machine and point it at the master machine. It looks like you're doing the opposite: running it on the master and pointing it at the slave.
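Concretely, a minimal sketch of the intended direction, using the addresses from the question (start-slave.sh is the per-machine script; start-slaves.sh is the one meant to run on the master and read conf/slaves):
# run this on the worker machine (192.168.217.139), pointing at the master
./sbin/start-slave.sh spark://192.168.217.136:7077
# the master (192.168.217.136) only needs
./sbin/start-master.sh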
