I installed Apache Spark 1.6 on 2 Ubuntu machines: (IP 192.168.217.136, hostname worker-virtual-machine) is the master and (IP 192.168.217.139, hostname slave) is the worker.
I configured SSH and edited the slaves and spark-env.sh files.
The configuration of the slaves file is:
192.168.217.139
and the spark-env.sh file contains:
export SPARK_MASTER_IP=192.168.217.136
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
export SPARK_WORKER_INSTANCES=2
When I start the cluster with ./sbin/start-all.sh and open the web GUI, I see just 1 worker connected, with the IP address of the master, so the worker with IP address 192.168.217.139 doesn't connect to the master.
I tried to start the master with ./start-master.sh and to start the worker from the worker machine with
./start-slaves.sh spark://192.168.217.136:7077
When I enter the jps command it shows me the worker was created in the terminal, but when I move to the terminal of the master and run jps it shows me just the master, and the GUI doesn't show anything. When I return to the worker machine and run jps, I don't find the worker anymore.
Where is the problem?
When I start the worker manually from its machine, I get this error message in the logs.
Reading the docs, you can see that ./start-slave.sh takes the master URL (spark://<master-ip>:7077) as its first argument. This means you run ./start-slave.sh on the slave machine and point it at the master machine. It looks like you're doing the opposite: running it on the master and pointing it at the slave.
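A minimal sketch of what that looks like, assuming the same Spark install directory on both machines and the IPs from the question:
# Run this on the worker machine (192.168.217.139), from the Spark install directory.
# The only required argument is the master URL, built from the master's IP and port 7077.
./sbin/start-slave.sh spark://192.168.217.136:7077
# Then run jps on the same machine; a Worker process should stay up,
# and the master's web UI at http://192.168.217.136:8080 should list it as ALIVE.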
I am trying to set up a multi-node Spark cluster from scratch with public cloud VMs. I have performed the following steps to set up this configuration:
Set up SSH so that the master node can SSH into the worker nodes without a password
Updated '/etc/hosts' on the master as well as the worker nodes with the respective IP addresses (master, slave01, slave02)
Installed Apache Spark and updated the bashrc file on each of the nodes
Updated '/usr/local/spark/conf/spark-env.sh' file in Master node to include:
export SPARK_MASTER_HOST=''
export JAVA_HOME=<Path_of_JAVA_installation>
Updated the '/usr/local/spark/conf/worker' file on the master node to include the hostnames:
master
slave01
slave02
Finally, when I try to initiate the cluster using
./sbin/start-all.sh
I get the following output:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.master.Master-1-spark-master.out
exouser@master's password: slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave1.out
slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-exouser-org.apache.spark.deploy.worker.Worker-1-spark-slave2.out
And that's it. The cluster does not start. I tried accessing the Spark UI at :8080 but nothing shows up. Could someone please guide me as to what I am doing wrong here?
I want to create a Spark standalone cluster. I am able to run the master and a slave on the same node, but the slave on a different node neither shows the master URL nor connects to the master.
I am running the command:
start-slave.sh spark://spark-server:7077
where spark-server is the hostname of my master.
I am able to ping the master from the worker, but the master's web UI isn't showing any worker except the one running on the same machine. The client node is running a worker, but it is independent and not connected to the master.
Please check the configuration file "spark-env.sh" on your master node. Have you set the SPARK_MASTER_HOST variable to the IP address of the master node? If not, try setting it and restarting the master and worker nodes. For example, if your master node's IP is 192.168.0.1, you should have SPARK_MASTER_HOST=192.168.0.1 in there. Note that you don't need to set this variable on your worker nodes.
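A minimal sketch of that change, assuming Spark lives under /usr/local/spark (as in the earlier question) and the example IP above:
# On the master node: create spark-env.sh from the template if it doesn't exist yet
cd /usr/local/spark/conf
cp -n spark-env.sh.template spark-env.sh
echo 'export SPARK_MASTER_HOST=192.168.0.1' >> spark-env.sh
# Restart the cluster so the setting takes effect
/usr/local/spark/sbin/stop-all.sh
/usr/local/spark/sbin/start-all.sh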
1) Make sure you set up passwordless SSH between the nodes.
Please refer to the link below to set up passwordless SSH between the nodes:
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
2) Specify the slaves' IP addresses in the slaves file present in the $SPARK_HOME/conf directory
[this is the Spark folder containing the conf directory] on the master node
3) Once you have specified the IP addresses in the slaves file, start the Spark cluster, as shown in the sketch below
[execute the start-all.sh script present in the $SPARK_HOME/sbin directory] on the master node
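A condensed sketch of steps 2) and 3), using the worker IP from the question above (adjust to your own nodes):
# On the master node: list one worker IP (or hostname) per line in the slaves file
echo "192.168.217.139" >> $SPARK_HOME/conf/slaves
# Then launch the whole cluster from the master node
$SPARK_HOME/sbin/start-all.sh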
Hope this helps!
If you are able to ping the master node from the worker, that means there is network connectivity. To add the new worker node to the Spark master, you need to update a few things in spark-env.sh.
Please check the official documentation on launching a Spark cluster
and update the required fields.
Here is another blog post on Spark cluster mode that can help you.
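As a rough sketch, the spark-env.sh entries that usually matter when bringing a new worker into the cluster look like this (the IP and resource values below are placeholders, not taken from your setup):
# On the master: the address workers use to reach the master
export SPARK_MASTER_HOST=192.168.0.1
# On each worker: optionally cap the resources it offers to the cluster
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g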
This solved my problem:
The idea is to use the loopback address when both the client and the server are on the same machine.
Steps:
Go to the conf folder in your spark-hadoop directory and check whether spark-env.sh is present; if not, make a copy of spark-env.sh.template, name it spark-env.sh, and then add SPARK_MASTER_HOST=127.0.0.1 to it.
Then run the command to start the master from that directory (not the conf folder):
./sbin/start-master.sh (this will start the master; view it at localhost:8080)
bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077 (this will start the worker, and you can see it listed under the Workers tab in the same web UI, i.e. localhost:8080)
You can add multiple workers by repeating the above command.
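For instance, a second local worker could be started like this (the --webui-port value is just an illustrative choice to avoid clashing with the first worker's UI):
# Second worker on the same machine, pointing at the same master;
# a different web UI port avoids a clash with the first worker's UI on 8081
bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8082 spark://127.0.0.1:7077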
This worked for me; hopefully it will work for you too.
I have a Spark cluster where the master node is also the worker node. I can't reach the master from the driver-code node, and I get the error:
14:07:10 WARN client.AppClient$ClientEndpoint: Failed to connect to master master-machine:7077
The SparkContext in driver-code node is configured as:
SparkConf conf = new SparkConf(true).setMaster(spark:master-machine//:7077);
I can successfully ping master-machine, but I can't successfully run telnet master-machine 7077, meaning the machine is reachable but the port is not.
What could be the issue? I have disabled Ubuntu's ufw firewall on both the master node and the node where the driver code runs (the client).
Your syntax is a bit off. You have:
setMaster(spark:master-machine//:7077)
You want:
setMaster(spark://master-machine:7077)
From the Spark docs:
Once started, the master will print out a spark://HOST:PORT URL for
itself, which you can use to connect workers to it, or pass as the
“master” argument to SparkContext. You can also find this URL on the
master’s web UI, which is http://localhost:8080 by default.
You can use an IP address in there too. I have run into issues with Debian-based installs where I always have to use the IP address, but that's a separate issue. An example:
spark.master spark://5.6.7.8:7077
From a configuration page in Spark docs
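If the URL is correct but the connection still fails, a quick sanity check (assuming a standard Spark install and the example address above) is to pass the same URL straight to spark-shell:
# Quick connectivity check against the master URL from the example above
bin/spark-shell --master spark://5.6.7.8:7077
# If the shell starts and an application appears under "Running Applications"
# on http://5.6.7.8:8080, the URL and the port are fine.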
I am using a Spark 1.6.1 standalone cluster. After a system restart (and only minor config changes to /etc/hosts per worker), the Spark executors suddenly started throwing errors saying they couldn't connect to spark_master.
When I echo $SPARK_MASTER_IP in the same shell used to start the master, it correctly identifies the host as master.cluster. And when I open the GUI at port 8080, it also identifies the master as "Spark Master at spark://master.cluster:7077".
I've also set SPARK_MASTER_IP in spark-env.sh. Why are my executors trying to connect to spark_master?
When I submitted an application to the standalone cluster, I hit this exception.
What is weird is that it comes and goes several times. I have already set SPARK_LOCAL_IP to the right IP address.
But I don't understand why the worker always tries to access port 0.
The environment is:
vm1: 10.3.100.169, running master and slave
vm2: 10.3.101.119, running slave
Has anyone met this issue? Any ideas about how to solve it?
Here are the command line and spark-env.sh:
bin/spark-submit --master spark://10.3.100.169:7077 --deploy-mode cluster --class ${classname} --driver-java-options "-Danalytics.app.configuration.url=http://10.3.100.169:9090/application.conf -XX:+UseG1GC" --conf "spark.executor.extraJavaOptions=-Danalytics.app.configuration.url=http://10.3.100.169:9090/application.conf -XX:+UseG1GC" ${jar}
SPARK_LOCAL_IP=10.3.100.169
SPARK_MASTER_IP=10.3.100.169
SPARK_PUBLIC_DNS=10.3.100.169
SPARK_EXECUTOR_MEMORY=3g
SPARK_EXECUTOR_CORES=2
SPARK_WORKER_MEMORY=3g
SPARK_WORKER_CORES=2
Thanks
If we consider a fresh installation of Spark with its default configuration, the following steps should create a working Spark Standalone cluster.
1. Configure the /etc/hosts file on the master and the slaves
Your hosts file on both nodes should look like:
127.0.0.1 localhost
10.3.100.169 master.example.com master
10.3.101.119 slave.example.com slave
2. Set up passwordless SSH between the master and the workers
On the master, execute the following commands:
# change to the user you are going to use to run Spark eg. 'spark-user'
su - spark-user
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub spark-user@slave
ssh-copy-id -i ~/.ssh/id_rsa.pub spark-user@master # (since you want to start a worker on the master too)
Verify that you are able to SSH to the slave from the master without a password.
refer: setup passwordless ssh
3. Configure the conf/slaves file on all nodes
Your slaves file should look like:
master.example.com
slave.example.com
4. Start the cluster
sbin/start-all.sh
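As a quick sanity check after step 4 (hostnames and ports as in the example above):
# On each node, confirm the expected Spark daemon is running
jps   # Master (and one Worker) on master.example.com, a Worker on slave.example.com
# Then open the master web UI and check that both workers are listed as ALIVE:
# http://master.example.com:8080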
Hope this helps!