I am trying to set up a multi-node Spark cluster from scratch on public cloud VMs. I have performed the following steps to set up this configuration:
Set up SSH so that the master node can ssh into the worker nodes without a password
Updated '/etc/hosts' on the master and worker nodes with the respective IP addresses (master, slave01, slave02)
Installed Apache Spark and updated the bashrc file on each of the nodes
Updated the '/usr/local/spark/conf/spark-env.sh' file on the master node to include:
export SPARK_MASTER_HOST=''
export JAVA_HOME=<Path_of_JAVA_installation>
Updated the '/usr/local/spark/conf/worker' file on the master node to include the hosts:
master
slave01
slave02
Finally, when I try to initiate the cluster using
./sbin/start-all.sh
I get the following output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.master.Master-1-spark-master.out
exouser@master's password: slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave1.out
slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave2.out
And that's it. The cluster does not start. I tried accessing the Spark UI at :8080 but nothing shows up. Could someone please tell me what I am doing wrong here?
I have installed Spark 3.3.1 on Ubuntu in Oracle VirtualBox. I am trying to run the following command:
$SPARK_HOME/sbin/start-all.sh
It seems to start the master node, but the worker is not starting. jps shows only Master and Jps.
Screenshot of terminal:
Webui:
Things tried:
I have tried reinstalling Spark and setting the IP address in SPARK_MASTER_HOST in spark-env.sh.
Nothing works: the master node starts, but the worker node does not.
Any help is appreciated.
I want to create a Spark standalone cluster. I am able to run the master and a slave on the same node, but a slave on a different node neither shows the master URL nor connects to the master.
I am running the command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI does not show any worker except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.
Please check the configuration file "spark-env.sh" on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, set it and restart the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
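As a rough way to run the check described above, the following sketch reads the text of a spark-env.sh file and reports whether SPARK_MASTER_HOST is missing or empty. The function names are hypothetical, and the parsing is deliberately simple: it assumes one plain assignment per line and does not handle trailing comments or shell expansions.

```python
import re

# Matches lines such as "export SPARK_MASTER_HOST=192.168.0.1" or "SPARK_MASTER_HOST='...'".
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?SPARK_MASTER_HOST=(.*)$")


def read_master_host(spark_env_text):
    """Return the value assigned to SPARK_MASTER_HOST, or None if it is never assigned."""
    value = None
    for line in spark_env_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out settings
        match = _ASSIGNMENT.match(line)
        if match:
            # A later assignment overrides an earlier one, as it would in the shell.
            value = match.group(1).strip().strip("'\"")
    return value


def master_host_problem(spark_env_text):
    """Return a short description of the problem, or None if a value is set."""
    host = read_master_host(spark_env_text)
    if host is None:
        return "SPARK_MASTER_HOST is not set"
    if host == "":
        return "SPARK_MASTER_HOST is set but empty"
    return None
```

On the master node this could be called as master_host_problem(open("/usr/local/spark/conf/spark-env.sh").read()), using whatever path your installation has.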
1) Make sure you have set up passwordless SSH between the nodes.
Please refer to the link below to set up passwordless SSH between nodes:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slaves' IP addresses in the slaves file in the $SPARK_HOME/conf directory [this is the Spark folder containing the conf directory] on the master node.
3) Once you have specified the IP addresses in the slaves file, start the Spark cluster [execute the start-all.sh script in the $SPARK_HOME/sbin directory] on the master node.
Hope this helps.
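Steps 1 and 2 can be checked before running start-all.sh. The sketch below is only an illustration with hypothetical function names: it lists the hosts in the slaves file (named workers in recent Spark releases) and builds an ssh command that fails instead of prompting when passwordless login is not set up. It assumes the OpenSSH client is installed on the master.

```python
import subprocess


def parse_worker_hosts(text):
    """Return the hosts listed in a slaves/workers file, in order, without duplicates."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line and line not in hosts:
            hosts.append(line)
    return hosts


def ssh_check_command(host, timeout_s=5):
    """Build an ssh command that runs `true` on the host without ever prompting."""
    # BatchMode=yes makes ssh fail instead of asking for a password.
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}", host, "true"]


def passwordless_ssh_works(host):
    """Return True if the command from ssh_check_command exits with status 0."""
    result = subprocess.run(ssh_check_command(host), capture_output=True)
    return result.returncode == 0
```

Running passwordless_ssh_works on every entry returned by parse_worker_hosts, including the master itself if it is listed, shows which hosts would still ask for a password when the cluster is started.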
If you are able to ping the master node from the worker, then it has network connectivity. To add the new worker node to the Spark master, you need to update a few settings in spark-env.sh.
Please check the official documentation, "Spark Cluster launch", and update the required fields.
Here is another blog post that may help: "Spark Cluster mode".
This solved my problem:
The idea is to use the loopback address when both client and server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present. If it is not, make a copy of spark-env.sh.template, name it spark-env.sh, and add SPARK_MASTER_HOST=127.0.0.1.
Then run the command to start the master from the Spark directory (not the conf folder):
./sbin/start-master.sh (this will start the master, view it in localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start the worker and you can see it listed under the worker tab in the same web UI i.e, localhost:8080)
You can add multiple workers with the above command.
This worked for me, hopefully, this will work for you too.
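The first step above (create spark-env.sh from the template if needed, then add the setting) could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the conf directory path is whatever your installation uses, and the line is appended exactly as written in the steps, without export.

```python
import shutil
from pathlib import Path


def ensure_master_host(conf_dir, host="127.0.0.1"):
    """Create spark-env.sh if needed and make sure it sets SPARK_MASTER_HOST to host."""
    conf = Path(conf_dir)
    env_file = conf / "spark-env.sh"
    template = conf / "spark-env.sh.template"

    if not env_file.exists():
        if template.exists():
            shutil.copy(template, env_file)  # same as copying the template by hand
        else:
            env_file.write_text("#!/usr/bin/env bash\n")

    text = env_file.read_text()
    line = f"SPARK_MASTER_HOST={host}"
    if line not in text.splitlines():
        if text and not text.endswith("\n"):
            text += "\n"
        env_file.write_text(text + line + "\n")  # append only if not already present
    return env_file
```

Calling it twice leaves a single SPARK_MASTER_HOST line, so it is safe to rerun. It does not remove an older, different SPARK_MASTER_HOST assignment; the appended line simply comes later in the file.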
I have a Spark cluster where the master node is also the worker node. I can't reach the master from the driver-code node, and I get the error:
14:07:10 WARN client.AppClient$ClientEndpoint: Failed to connect to master master-machine:7077
The SparkContext in driver-code node is configured as:
SparkConf conf = new SparkConf(true).setMaster(spark:master-machine//:7077);
I can successfully ping master-machine, but I cannot telnet to master-machine on port 7077, which means the machine is reachable but the port is not.
What could be the issue? I have disabled Ubuntu's ufw firewall on both the master node and the node where the driver code runs (the client).
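The telnet test described above can be reproduced with a few lines of Python, which is handy on machines where telnet is not installed. This is a minimal sketch with a hypothetical function name; it only tells you whether a TCP connection to the port can be opened, not why it fails.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_is_open("master-machine", 7077) run from the driver node corresponds to the failing telnet. Running the same check on the master itself against its own host name may help distinguish a port that is closed everywhere from one that is only unreachable from other machines.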
Your syntax is a bit off. You have:
setMaster(spark:master-machine//:7077)
You want:
setMaster("spark://master-machine:7077")
From the Spark docs:
Once started, the master will print out a spark://HOST:PORT URL for
itself, which you can use to connect workers to it, or pass as the
“master” argument to SparkContext. You can also find this URL on the
master’s web UI, which is http://localhost:8080 by default.
You can use an IP address there too. I have run into issues with Debian-based installs where I always have to use the IP address, but that's a separate issue. An example:
spark.master spark://5.6.7.8:7077
From a configuration page in Spark docs
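To catch this kind of typo before it reaches SparkConf, the master URL can be checked against the spark://HOST:PORT form quoted above. The helper below is a hypothetical sketch built on the standard library's URL parser, not part of Spark's API.

```python
from urllib.parse import urlsplit


def parse_master_url(url):
    """Return (host, port) for a spark://HOST:PORT URL, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "spark":
        raise ValueError(f"expected the spark:// scheme in {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in {url!r}; expected spark://HOST:PORT")
    port = parts.port  # raises ValueError itself if the port is not a valid number
    if port is None:
        raise ValueError(f"no port found in {url!r}; expected spark://HOST:PORT")
    # Note: urlsplit returns the host name in lower case.
    return parts.hostname, port
```

With this check, "spark://master-machine:7077" and "spark://5.6.7.8:7077" are accepted, while the original "spark:master-machine//:7077" is rejected because no host can be found where the parser expects one.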
Using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker) Spark executors suddenly started throwing errors that they couldn't connect to spark_master.
When I echo $SPARK_MASTER_IP on the same shell used to start the master, it correctly identifies the host as master.cluster. And when I open the GUI at port 8080 it also identifies the master as Spark Master at spark://master.cluster:7077.
I've also set SPARK_MASTER_IP in spark-env.sh. Why are my executors trying to connect to spark_master?
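Since /etc/hosts was edited on each worker, one thing that can be inspected is which names each file actually defines. The sketch below is a diagnostic aid with a hypothetical function name; it only shows whether master.cluster and spark_master appear in a given hosts file and where they point, and does not by itself explain the error.

```python
def parse_hosts(text):
    """Map every host name and alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line, comment-only line, or an IP without names
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, ip)  # keep the first IP seen for each name
    return mapping
```

On a worker, parse_hosts(open("/etc/hosts").read()).get("spark_master") returns the address that name is mapped to in the file, or None if the name is not defined there.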
I am new to Spark and I am trying to install Spark Standalone on a 3-node cluster. I have set up passwordless SSH from the master to the other nodes.
I have tried the following config changes:
Updated the hostnames for the 2 nodes in the conf/slaves.sh file. Created the spark-env.sh file and set SPARK_MASTER_IP to the master URL. Also tried updating the spark.master value in the spark-defaults.conf file.
Snapshot of conf/slaves.sh
# A Spark Worker will be started on each of the machines listed below.
Spark-WorkerNode1.hadoop.com
Spark-WorkerNode2.hadoop.com
Snapshot of spark-defaults.conf
# Example:
spark.master spark://Spark-Master.hadoop.com:7077
But when I try to start the cluster by running start-all.sh on the master, it does not recognize the worker nodes and starts the cluster as local.
It does not give any error; the log files show Successfully started service 'sparkMaster' and Successfully started service 'sparkWorker' on the master.
I have tried running the start-master and start-slave scripts on the individual nodes and that seems to work fine: I can see 2 workers in the web UI. I am using Spark 1.6.0.
Can somebody please help me with what I am missing while trying to run start-all?
Snapshot of conf/slaves.sh
The file should be named slaves, without an extension.
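A quick way to spot this mistake is to look in the conf directory for a host list that has picked up an extension. The sketch below uses a hypothetical function name and assumes the list is called slaves (or workers, the name used by recent Spark releases); the shipped .template files are ignored.

```python
from pathlib import Path

# Names the start scripts are assumed to look for, depending on the Spark release.
EXPECTED_NAMES = ("slaves", "workers")


def find_misnamed_host_files(conf_dir):
    """Return files such as slaves.sh whose extension would keep them from being read."""
    misnamed = []
    for path in Path(conf_dir).iterdir():
        if not path.is_file():
            continue
        has_extension = path.suffix != ""
        is_template = path.suffix == ".template"
        if path.stem in EXPECTED_NAMES and has_extension and not is_template:
            misnamed.append(path.name)
    return sorted(misnamed)
```

For the setup in the question, find_misnamed_host_files on the Spark conf directory would report slaves.sh; after renaming the file to slaves the list comes back empty.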