Spark master-machine:7077 not reachable - apache-spark

I have a Spark cluster where the master node is also a worker node. I can't reach the master from the driver-code node, and I get the error:
14:07:10 WARN client.AppClient$ClientEndpoint: Failed to connect to master master-machine:7077
The SparkContext in driver-code node is configured as:
SparkConf conf = new SparkConf(true).setMaster(spark:master-machine//:7077);
I can ping master-machine successfully, but telnet master-machine 7077 fails, meaning the machine is reachable but the port is not.
What could be the issue? I have disabled Ubuntu's ufw firewall on both the master node and the node where the driver code runs (the client).

Your syntax is a bit off, you have:
setMaster(spark:master-machine//:7077)
You want:
setMaster(spark://master-machine:7077)
From the Spark docs:
Once started, the master will print out a spark://HOST:PORT URL for
itself, which you can use to connect workers to it, or pass as the
“master” argument to SparkContext. You can also find this URL on the
master’s web UI, which is http://localhost:8080 by default.
You can use an IP address there too. I have run into issues with Debian-based installs where I always have to use the IP address, but that's a separate issue. An example:
spark.master spark://5.6.7.8:7077
From the configuration page in the Spark docs.
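As a quick sanity check, you can read the exact URL the master advertises straight from the master log (or from the banner on the web UI) and reuse that string verbatim in the driver. The log path below assumes a default standalone install; adjust it if yours differs:
# On the master node: the standalone master logs the URL it listens on
grep "Starting Spark master at" $SPARK_HOME/logs/spark-*Master*.out
# -> INFO Master: Starting Spark master at spark://master-machine:7077
# Use exactly that string on the driver side, e.g.
spark-shell --master spark://master-machine:7077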

Related

Why is spark cluster not setting up?

I am trying to set up a multinode Spark cluster from scratch on public cloud VMs. I have performed the following steps to set up this configuration:
Set up SSH so that the master node can SSH into the worker nodes without a password
Updated '/etc/hosts' on the master as well as the worker nodes with the respective IP addresses (master, slave01, slave02)
Installed Apache Spark and updated the .bashrc file on each of the nodes
Updated the '/usr/local/spark/conf/spark-env.sh' file on the master node to include:
export SPARK_MASTER_HOST=''
export JAVA_HOME=<Path_of_JAVA_installation>
Updated the '/usr/local/spark/conf/worker' file on the master to include the hosts:
master
slave01
slave02
Finally, when I try to initiate the cluster using
./sbin/start-all.sh
I get the following output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.master.Master-1-spark-master.out
exouser@master's password: slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave1.out
slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave2.out
And that's it. The cluster does not initiate. I tried accessing the Spark UI at :8080 but nothing shows up. Could someone please guide me as to what I am doing wrong here?
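For reference, the two files described in steps 4 and 5 would normally look roughly like the sketch below (hostnames are the ones from the question; later answers in this thread recommend pointing SPARK_MASTER_HOST at the master's actual address rather than leaving it empty, and the host-list file is named conf/slaves on Spark 1.x/2.x and conf/workers on Spark 3.x):
# /usr/local/spark/conf/spark-env.sh (on the master)
export SPARK_MASTER_HOST=master          # or the master's IP address
export JAVA_HOME=<Path_of_JAVA_installation>
# /usr/local/spark/conf/slaves (conf/workers on Spark 3.x), one host per line:
master
slave01
slave02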

spark worker not connecting to master

I want to create a Spark standalone cluster. I am able to run the master and a slave on the same node, but a slave on a different node neither shows the master URL nor connects to the master.
I am running the command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI isn't showing any worker except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.
Please check the configuration file "spark-env.sh" on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, try setting it and restart the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
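A minimal sketch of that change, reusing the 192.168.0.1 address from the example above:
# $SPARK_HOME/conf/spark-env.sh on the master node only
export SPARK_MASTER_HOST=192.168.0.1
# restart the master and the workers afterwards
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh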
1) Make sure you set up passwordless SSH between the nodes.
Please refer to the link below to set up passwordless SSH between nodes:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slaves' IP addresses in the slaves file in the $SPARK_HOME/conf directory
[this is the Spark folder containing the conf directory] on the master node.
3) Once you have specified the IP addresses in the slaves file, start the Spark cluster
[execute the start-all.sh script in the $SPARK_HOME/sbin directory] on the master node.
Hope this helps.
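A condensed sketch of those three steps, assuming two workers named slave01 and slave02 (placeholder hostnames) and the same user account on every node:
# 1) passwordless SSH from the master to each worker (run on the master)
ssh-keygen -t rsa          # accept the defaults, empty passphrase
ssh-copy-id slave01
ssh-copy-id slave02
# 2) list the worker addresses in $SPARK_HOME/conf/slaves, one per line:
#      slave01
#      slave02
# 3) start the cluster from the master
$SPARK_HOME/sbin/start-all.sh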
If you are able to ping the master node from the worker, then it has network connectivity. To register the new worker node with the Spark master, you need to update a few things in spark-env.sh.
Please check the official documentation on launching a Spark cluster and update the required fields.
Here is another blog on Spark cluster mode which can help you.
This solved my problem:
The idea is to use the loopback address when both the client and the server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present. If not, make a copy of spark-env.sh.template, name it spark-env.sh, and add SPARK_MASTER_HOST=127.0.0.1 to it.
Then run the commands to start the master and worker from the Spark directory (not the conf folder):
./sbin/start-master.sh (this will start the master; view it at localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start the worker, and you can see it listed under the Workers section in the same web UI, i.e. localhost:8080)
You can add multiple workers with the above command.
This worked for me; hopefully it will work for you too.
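Put together, the steps above amount to something like this (run from your Spark installation directory; the path is a placeholder):
cd /path/to/spark                                  # your spark-hadoop directory
cp conf/spark-env.sh.template conf/spark-env.sh    # only if spark-env.sh is missing
echo 'export SPARK_MASTER_HOST=127.0.0.1' >> conf/spark-env.sh
./sbin/start-master.sh                             # master web UI at localhost:8080
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077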

Why are Spark executors trying to connect to spark_master instead of SPARK_MASTER_IP?

Using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker), Spark executors suddenly started throwing errors that they couldn't connect to spark_master.
When I echo $SPARK_MASTER_IP on the same shell used to start the master, it correctly identifies the host as master.cluster. And when I open the GUI at port 8080 it also identifies the master as Spark Master at spark://master.cluster:7077.
I've also set SPARK_MASTER_IP in spark-env.sh. Why are my executors trying to connect to spark_master?

Not able to launch Spark cluster in Standalone mode with start-all.sh

I am new to Spark and I am trying to install Spark Standalone on a 3-node cluster. I have set up password-less SSH from the master to the other nodes.
I have tried the following config changes:
Updated the hostnames for the 2 worker nodes in the conf/slaves.sh file. Created a spark-env.sh file and updated SPARK_MASTER_IP with the master URL. Also tried updating the spark.master value in the spark-defaults.conf file.
Snapshot of conf/slaves.sh
# A Spark Worker will be started on each of the machines listed below.
Spark-WorkerNode1.hadoop.com
Spark-WorkerNode2.hadoop.com
Snapshot of spark-defaults.conf
# Example:
spark.master spark://Spark-Master.hadoop.com:7077
But when I try to start the cluster by running start-all.sh on the master, it does not recognize the worker nodes and starts the cluster as local.
It does not give any error; the log files show Successfully started service 'sparkMaster' and Successfully started service 'sparkWorker' on the master.
I have tried running the start-master and start-slave scripts on the individual nodes and that seems to work fine. I can see the 2 workers in the web UI. I am using Spark 1.6.0.
Can somebody please help me with what I am missing while trying to run start-all?
Snapshot of conf/slaves.sh
The file should be named slaves, without an extension.
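In other words, something along these lines (assuming the default $SPARK_HOME layout):
mv $SPARK_HOME/conf/slaves.sh $SPARK_HOME/conf/slaves   # start-all.sh reads conf/slaves
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh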

Spark UI on AWS EMR

I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is current and processing data, but I am trying to find which port has been assigned to the web UI. I've tried port forwarding both 4040 and 8080 with no connection. I'm forwarding like so:
ssh -i ~/KEY.pem -L 8080:localhost:8080 hadoop@EMR_DNS
1) How do I find out what the Spark WebUI's assigned port is?
2) How do I verify the Spark WebUI is running?
Spark on EMR is configured for YARN, thus the Spark UI is available via the application URL provided by the YARN Resource Manager (http://spark.apache.org/docs/latest/monitoring.html). So the easiest way to get to it is to set up your browser with SOCKS using a port opened by SSH, then from the EMR console open the Resource Manager and click the Application Master URL shown to the right of the running application. The Spark History Server is available at the default port 18080.
Example of SOCKS with EMR at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
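If you go the SOCKS route, the tunnel itself is just an SSH dynamic port forward; the local port (8157 here) and the master hostname are placeholders:
# dynamic SOCKS proxy through the EMR master node
ssh -i ~/KEY.pem -N -D 8157 hadoop@EMR_MASTER_PUBLIC_DNS
# point the browser's SOCKS proxy at localhost:8157, then open the Resource
# Manager / Application Master links from the EMR console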
Here is an alternative if you don't want to deal with the browser SOCKS setup suggested in the EMR docs.
Open an SSH tunnel to the master node with port forwarding to the machine running the Spark UI:
ssh -i path/to/aws.pem -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
MASTER_URL (EMR_DNS in the question) is the URL of the master node, which you can get from the EMR Management Console page for the cluster.
SPARK_UI_NODE_URL can be seen near the top of the stderr log. The log line will look something like:
16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
Point your browser to localhost:4040
Tried this on EMR 4.6 running Spark 2.6.1
Glad to announce that this feature is finally available on AWS. You won't need to run any special commands (or configure an SSH tunnel):
By clicking on the link to the Spark History Server UI, you'll be able to see the logs of old applications, or to access the running Spark job's UI:
For more details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html
I hope it helps!
Just run the following command:
ssh -i /your-path/aws.pem -N -L 20888:ip-172-31-42-70.your-region.compute.internal:20888 hadoop@ec2-xxx.compute.amazonaws.com.cn
There are 3 places you need to change:
your .pem file
your internal master node IP
your public DNS domain.
Finally, on the YARN UI you can click your Spark application's Tracking URL, then just replace the URL:
"http://your-internal-ip:20888/proxy/application_1558059200084_0002/"
->
"http://localhost:20888/proxy/application_1558059200084_0002/"
It worked for EMR 5.x
Simply use SSH tunnel
On your local machine do:
ssh -i /path/to/pem -L 3000:ec2-xxxxcompute-1.amazonaws.com:8088 hadoop@ec2-xxxxcompute-1.amazonaws.com
On your local machine browser hit:
localhost:3000
