Worker node not visible in Spark web UI - apache-spark

I have installed Spark 3.3.1 on Ubuntu in Oracle VirtualBox. I am trying to run the following command:
$SPARK_HOME/sbin/start-all.sh
It seems to start the master node, but the worker is not starting; jps shows only Master and Jps.
[Screenshots of the terminal and the web UI omitted]
Things tried:
I have tried reinstalling Spark and adding the IP address to SPARK_MASTER_HOST in spark-env.sh. Nothing is working: the master node is created, but the worker node is not.
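For reference, a minimal troubleshooting sketch (paths assume the default $SPARK_HOME layout; <master-host> is a placeholder): read the worker log that start-all.sh writes, and try launching a worker by hand against the master URL shown at localhost:8080:
cat $SPARK_HOME/logs/*Worker*.out
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
If the worker process dies, the log usually names the cause (wrong master URL, blocked port, JAVA_HOME not set).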
Any help is appreciated.

Related

Why is the Spark cluster not starting?

I am trying to set up a multinode Spark cluster from scratch on public-cloud VMs. I have performed the following steps to set up this configuration:
Set up SSH so that the master node can SSH into the worker nodes without a password
Updated /etc/hosts on the master as well as the worker nodes with the respective IP addresses (master, slave01, slave02)
Installed Apache Spark and updated the .bashrc file on each of the nodes
Updated the /usr/local/spark/conf/spark-env.sh file on the master node to include:
export SPARK_MASTER_HOST=''
export JAVA_HOME=<Path_of_JAVA_installation>
Updated the /usr/local/spark/conf/worker file on the master node to include the IPs:
master
slave01
slave02
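For comparison, a minimal spark-env.sh sketch for a standalone master (the IP address and JAVA_HOME path below are assumptions, not values from the post); note that SPARK_MASTER_HOST should carry the master's own address rather than an empty string:
# /usr/local/spark/conf/spark-env.sh on the master (hypothetical values)
export SPARK_MASTER_HOST=10.0.0.10
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64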
Finally, when I try to initiate the cluster using
./sbin/start-all.sh
I get the following output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.master.Master-1-spark-master.out
exouser@master's password: slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave1.out
slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave2.out
And that's it; the cluster does not come up. I tried accessing the Spark UI at :8080 but nothing shows up. Could someone please guide me as to what I am doing wrong here?

Connecting Zeppelin installed on a local machine to a Docker Spark cluster

I am trying to configure the Spark interpreter in Zeppelin 0.10.0, installed on my local machine, so that I can run scripts on a Spark cluster that is also created locally, on Docker. I am using the docker-compose.yml from https://github.com/big-data-europe/docker-spark and Spark version 3.1.2. After docker-compose up, I can see spark-master in the browser at localhost:8080 and the History Server at localhost:18081. After reading the ID of the spark-master container, I can also run a shell and spark-shell on it (docker exec -it xxxxxxxxxxxx /bin/bash). The host OS is Ubuntu 20.04; spark.master in Zeppelin is currently set to spark://localhost:7077, and zeppelin.server.port in zeppelin-site.xml to 8070.
There is a lot of information about connecting to a container running Zeppelin, or about running both Spark and Zeppelin in the same container. Unfortunately, I also use this Zeppelin to connect to Hive via JDBC on a VirtualBox Hortonworks cluster, as in one of my previous posts, and I don't want to change that configuration now due to hardware resources. In one of the posts (Running zeppelin on spark cluster mode) I saw that such a connection is possible, but all my attempts end with the "Fail to open SparkInterpreter" message.
I would be grateful for any tips.
You need to change spark.master in Zeppelin to point to the Spark master inside the Docker container, not to the local machine, so spark://localhost:7077 won't work.
Port 7077 is fine, because that is the port specified in the docker-compose file you are using. To get the IP address of the Docker container you can follow this answer. Since I suppose your container is named spark-master, you can try the following:
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' spark-master
Then specify this as the spark.master in Zeppelin: spark://docker-ip:7077
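For example (the address below is hypothetical; substitute whatever docker inspect prints for your container):
$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' spark-master
172.18.0.2
Then set spark.master to spark://172.18.0.2:7077 in the Zeppelin interpreter settings.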

Spark worker not connecting to master

I want to create a Spark standalone cluster. I am able to run a master and a slave on the same node, but a slave on a different node neither shows the master URL nor connects to the master.
I am running the command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI does not show any worker except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.
Please check the configuration file spark-env.sh on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, set it and restart the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
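A minimal sketch of that edit and restart (assuming a standard $SPARK_HOME layout; the IP comes from the example above):
# on the master node
echo 'export SPARK_MASTER_HOST=192.168.0.1' >> $SPARK_HOME/conf/spark-env.sh
$SPARK_HOME/sbin/stop-all.sh     # stop the master and the workers listed in conf
$SPARK_HOME/sbin/start-all.sh    # start them again with the new setting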
1) Make sure you have set up passwordless SSH between the nodes (see the sketch after these steps).
Please refer to the link below to set up passwordless SSH between nodes:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slaves' IP addresses in the slaves file present in the $SPARK_HOME/conf directory [this is the Spark folder containing the conf directory] on the master node.
3) Once you have specified the IP addresses in the slaves file, start the Spark cluster [execute the start-all.sh script present in the $SPARK_HOME/sbin directory] on the master node.
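A minimal passwordless-SSH sketch for step 1 (the user name and hostnames are assumptions):
# on the master node, generate a key pair if you don't already have one
ssh-keygen -t rsa
# copy the public key to each worker
ssh-copy-id user@slave01
ssh-copy-id user@slave02
# verify: this should log in without prompting for a password
ssh user@slave01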
Hope this helps.
If you are able to ping the master node from the worker, the worker has network connectivity. To add the new worker node to the Spark master, you need to update a few things in spark-env.sh.
Please check the official documentation on launching a Spark cluster and update the required fields.
Here is another blog that can help you: Spark cluster mode blog.
This solved my problem:
The idea is to use the loopback address when both the client and the server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present; if not, make a copy of spark-env.sh.template, name it spark-env.sh, and add SPARK_MASTER_HOST=127.0.0.1 to it.
Then run the following commands from the Spark directory (not the conf folder):
./sbin/start-master.sh (this will start the master; view it at localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start a worker, and you can see it listed under the Workers tab in the same web UI, i.e., localhost:8080)
You can add multiple workers with the above command.
This worked for me; hopefully it will work for you too.

Spark master-machine:7077 not reachable

I have a Spark cluster where the master node is also a worker node. I can't reach the master from the driver-code node, and I get the error:
14:07:10 WARN client.AppClient$ClientEndpoint: Failed to connect to master master-machine:7077
The SparkContext on the driver-code node is configured as:
SparkConf conf = new SparkConf(true).setMaster("spark:master-machine//:7077");
I can successfully ping master-machine, but I can't telnet to master-machine 7077, meaning the machine is reachable but the port is not.
What could be the issue? I have disabled Ubuntu's ufw firewall on both the master node and the node where the driver code runs (the client).
Your syntax is a bit off; you have:
setMaster("spark:master-machine//:7077")
You want:
setMaster("spark://master-machine:7077")
From the Spark docs:
Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default.
You can use an IP address there too. I have run into issues with Debian-based installs where I always have to use the IP address, but that's a separate issue. An example:
spark.master spark://5.6.7.8:7077
From the configuration page in the Spark docs.

Not able to launch Spark cluster in Standalone mode with start-all.sh

I am new to Spark and I am trying to install Spark Standalone on a 3-node cluster. I have set up passwordless SSH from the master to the other nodes.
I have tried the following config changes:
Updated the hostnames of the 2 worker nodes in the conf/slaves.sh file. Created a spark-env.sh file and set SPARK_MASTER_IP to the master URL. Also tried updating the spark.master value in the spark-defaults.conf file.
Snapshot of conf/slaves.sh
# A Spark Worker will be started on each of the machines listed below.
Spark-WorkerNode1.hadoop.com
Spark-WorkerNode2.hadoop.com
Snapshot of spark-defaults.conf
# Example:
spark.master spark://Spark-Master.hadoop.com:7077
But when I try to start the cluster by running start-all.sh on the master, it does not recognize the worker nodes and starts the cluster as local.
It does not give any error; the log files show Successfully started service 'sparkMaster' and Successfully started service 'sparkWorker' on the master.
I have tried running the start-master and start-slave scripts on the individual nodes, and that seems to work fine; I can see the 2 workers in the web UI. I am using Spark 1.6.0.
Can somebody please help me with what I am missing when running start-all?
Snapshot of conf/slaves.sh
The file should be named slaves, without an extension.
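A minimal sketch of the fix (assuming a standard $SPARK_HOME layout and the hostnames from the question):
# rename the file so the start scripts can find it
mv $SPARK_HOME/conf/slaves.sh $SPARK_HOME/conf/slaves
# conf/slaves keeps the same contents: one worker hostname per line
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh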
