I am setting up standalone Spark cluster in Azure VM. I want to run Spark master with Azure VM's public IP not with VM's hostname, so that I can access from other VM.
Spark version: spark-2.2.0-bin-hadoop2.7
I have created new file "spark-env.sh" under conf folder and added export SPARK_MASTER_HOST=x.x.x.x
Start master
sbin>./start-master.sh
I am getting below mentioned error. Spark master is not started.
How to set public IP address to Spark Master ?
Error LOG
18/04/10 04:55:12 INFO SecurityManager: Changing view acls to: root
18/04/10 04:55:12 INFO SecurityManager: Changing modify acls to: root
18/04/10 04:55:12 INFO SecurityManager: Changing view acls groups to:
18/04/10 04:55:12 INFO SecurityManager: Changing modify acls groups to:
18/04/10 04:55:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
18/04/10 04:55:13 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries (starting from 7077)! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:496)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:481)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
For SPARK_MASTER_HOST, you should use VM's private IP or 0.0.0.0, you could not enter VM's public IP, if you do it, you will get the error log.
Now, you want to access Spark Master by public IP, you need open port 7077-7093(not sure, depends on your service) on Azure NSG and VM's firewall.
Related
I am trying to create a cluster of VM instances in Google cloud. There are 4 worker nodes and 1 master node.
Things that I have configured:
Created "sparkuser" and given sudo privileges
Installed same version of Java JDK and JRE in all machines and configured the path.
Installed same version of Scala and sparks.
Hosts file and host name added, able to ssh between each machines.
Configured the "spark-env.sh" and "slaves" file in spark on each machines
However, when I try to run this bash command "start-master.sh" it starts all the VM's spark in cluster. But with the jps command I cannot see any master and workers, on checking the file in: /spark/log
The log file contains the error and I tried to solve it with various ways found in the developers' community. Unfortunately, I am still not able to solve the issue:
I am adding the log file here:
sparkuser#master:~$ start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out
worker4: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker4.out
worker3: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker3.out
worker2: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker2.out
worker1: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.worker.Worker-1-worker1.out
sparkuser#master:~$ jps
3280 Jps
sparkuser#master:~$ cat /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out.6
cat: /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out.6: No such file or directory
sparkuser#master:~$ cat /opt/spark/logs/spark-sparkuser-org.apache.spark.deploy.master.Master-1-master.out.5
Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host 35.216.27.9 --port 7100 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/09/30 07:09:21 INFO Master: Started daemon with process name: 3913#master
22/09/30 07:09:21 INFO SignalUtils: Registering signal handler for TERM
22/09/30 07:09:21 INFO SignalUtils: Registering signal handler for HUP
22/09/30 07:09:21 INFO SignalUtils: Registering signal handler for INT
22/09/30 07:09:22 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.0.1; using 10.178.0.3 instead (on interface ens4)
22/09/30 07:09:22 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.2.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/09/30 07:09:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/09/30 07:09:22 INFO SecurityManager: Changing view acls to: sparkuser
22/09/30 07:09:22 INFO SecurityManager: Changing modify acls to: sparkuser
22/09/30 07:09:22 INFO SecurityManager: Changing view acls groups to:
22/09/30 07:09:22 INFO SecurityManager: Changing modify acls groups to:
22/09/30 07:09:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sparkuser); groups with view permissions: Set(); users with modify permissions: Set(sparkuser); groups with modify permissions: Set()
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7100. Attempting port 7101.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7101. Attempting port 7102.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7102. Attempting port 7103.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7103. Attempting port 7104.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7104. Attempting port 7105.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7105. Attempting port 7106.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7106. Attempting port 7107.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7107. Attempting port 7108.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7108. Attempting port 7109.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7109. Attempting port 7110.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7110. Attempting port 7111.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7111. Attempting port 7112.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7112. Attempting port 7113.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7113. Attempting port 7114.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7114. Attempting port 7115.
22/09/30 07:09:23 WARN Utils: Service 'sparkMaster' could not bind on port 7115. Attempting port 7116.
22/09/30 07:09:23 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries (starting from 7100)! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at java.base/sun.nio.ch.Net.bind0(Native Method)
at java.base/sun.nio.ch.Net.bind(Net.java:459)
at java.base/sun.nio.ch.Net.bind(Net.java:448)
at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:227)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:260)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
22/09/30 07:09:23 INFO ShutdownHookManager: Shutdown hook called
On spark/conf/spark-env.sh file add these following:
export SPARK_LOCAL_IP="127.0.0.1"
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WOKER_DIR=/opt/spark/conf/slaves "user-case based path"
export SPARK_LOG_DIR=/opt/spark/logs
Along with that please ensure that you are able to SSH between all machines.
If you run scp among the machines and it runs without any error then the cluster will start. If SSH is working, but SCP is not working then remove the pub_keys and start over the key exchange process.
I hope this works.
It worked for me.
After running Databricks-connect configure I am trying to run test but it always fails to start with:
* PySpark is installed at c:\users\username\.conda\envs\dbconnect\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
* Skipping scala command test on Windows
* Testing python command
19/07/25 15:11:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/07/25 15:11:17 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4048. Attempting port 4049.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4049. Attempting port 4050.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4050. Attempting port 4051.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4051. Attempting port 4052.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4052. Attempting port 4053.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4053. Attempting port 4054.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4054. Attempting port 4055.
19/07/25 15:11:17 WARN Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.
19/07/25 15:11:17 ERROR SparkUI: Failed to bind SparkUI
java.net.BindException: Address already in use: bind: Service 'SparkUI' failed after 16 retries (starting from 4040)! Consider explicitly setting the appropriate port for the service 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
This is the first error raised, but I think the ones that follow are related to this. Can anyone tell me how to set the SparkUI port through databricks-connect? Worth noting that netstat -an shows these ports as available. I get this same error if I run as admin or not.
So I have set up a Spark cluster. But I can't actually get it to work. When I submit the SparkPi example with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master spark://x.y.129.163:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 2 \
examples/jars/spark-examples_2.11-2.0.0.jar 1000
I get the following from the worker logs:
Spark Command: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java -cp /opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://mesos-master:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/18 09:20:56 INFO Worker: Started daemon with process name: 23949#mesos-slave-4.novalocal
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for TERM
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for HUP
16/09/18 09:20:56 INFO SignalUtils: Registered signal handler for INT
16/09/18 09:20:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/18 09:20:56 INFO SecurityManager: Changing view acls to: root
16/09/18 09:20:56 INFO SecurityManager: Changing modify acls to: root
16/09/18 09:20:56 INFO SecurityManager: Changing view acls groups to:
16/09/18 09:20:56 INFO SecurityManager: Changing modify acls groups to:
16/09/18 09:20:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/09/18 09:21:00 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
16/09/18 09:21:00 INFO Utils: Successfully started service 'sparkWorker' on port 55256.
16/09/18 09:21:00 INFO Worker: Starting Spark worker x.y.129.162:55256 with 4 cores, 6.6 GB RAM
16/09/18 09:21:00 INFO Worker: Running Spark version 2.0.0
16/09/18 09:21:00 INFO Worker: Spark home: /opt/spark/spark-2.0.0-bin-hadoop2.7
16/09/18 09:21:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/09/18 09:21:00 INFO WorkerWebUI: Bound WorkerWebUI to x.y.129.162, and started at http://x.y.129.162:8081
16/09/18 09:21:00 INFO Worker: Connecting to master mesos-master:7077...
16/09/18 09:21:00 INFO TransportClientFactory: Successfully created connection to mesos-master/x.y.129.163:7077 after 33 ms (0 ms spent in bootstraps)
16/09/18 09:21:00 INFO Worker: Successfully registered with master spark://x.y.129.163:7077
16/09/18 09:21:00 INFO Worker: Asked to launch driver driver-20160918090435-0001
16/09/18 09:21:01 INFO DriverRunner: Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#x.y.129.162:55256" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
16/09/18 09:21:06 INFO DriverRunner: Command exited with status 1, re-launching after 1 s.
16/09/18 09:21:07 INFO DriverRunner: Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#x.y.129.162:55256" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
16/09/18 09:21:12 INFO DriverRunner: Command exited with status 1, re-launching after 1 s.
i.e. the job/driver appears to be failing and then retrying indefinitely.
When I look at the starter from the driver on the worker node I see:
Launch Command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64/jre/bin/java" "-cp" "/opt/spark/spark-2.0.0-bin-hadoop2.7/conf/:/opt/spark/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.executor.memory=20G" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.cores.max=2" "-Dspark.rpc.askTimeout=10" "-Dspark.driver.supervise=true" "-Dspark.jars=file:/opt/spark/spark-2.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.0.jar" "-Dspark.master=spark://x.y.129.163:7077" "-XX:MaxPermSize=256m" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#x.y.129.162:33364" "/opt/spark/spark-2.0.0-bin-hadoop2.7/work/driver-20160918090435-0001/spark-examples_2.11-2.0.0.jar" "org.apache.spark.examples.SparkPi" "1000"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/09/18 09:13:18 INFO SecurityManager: Changing view acls to: root
16/09/18 09:13:18 INFO SecurityManager: Changing modify acls to: root
16/09/18 09:13:18 INFO SecurityManager: Changing view acls groups to:
16/09/18 09:13:18 INFO SecurityManager: Changing modify acls groups to:
16/09/18 09:13:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/09/18 09:13:21 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/09/18 09:13:22 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
(excuse the timestamps, the logs are the same later on)
On the master I have:
# /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 mesos-master
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
x.y.129.155 mesos-slave-1
x.y.129.161 mesos-slave-2
x.y.129.160 mesos-slave-3
x.y.129.162 mesos-slave-4
# conf/spark-env.sh
#!/usr/bin/env bash
SPARK_MASTER_HOST=x.y.129.163
SPARK_LOCAL_IP=x.y.129.163
And for the worker I have:
# /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 mesos-slave-4 mesos-slave-4.novalocal
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
x.y.129.163 mesos-master
# conf/spark-env.sh
#!/usr/bin/env bash
SPARK_MASTER_HOST=x.y.129.163
SPARK_LOCAL_IP=x.y.129.162
I also disabled all ipv6 from /etc/sysctl.conf.
All daemons are started with the sbin/start-master.sh and sbin/start-slave.sh spark://x.y.129.163:7077 commands.
Update: so I attempted the spark-submit again, but without the --deploy-mode cluster.... and it works! Any idea why it doesn't with cluster mode?
I'm trying to configure a Spark cluster using OpenStack. Currently I have two servers named
spark-master (IP: 192.x.x.1, floating IP: 87.x.x.1)
spark-slave-1 (IP: 192.x.x.2, floating IP: 87.x.x.2)
I am running into problems when trying to use these floating IPs vs the standard public IPs.
On the spark-master machine, the hostname is spark-master and /etc/hosts looks like
127.0.0.1 localhost
127.0.1.1 spark-master
The only change made to spark-env.sh is export SPARK_MASTER_IP='192.x.x.1'. If I run ./sbin/start-master.sh I can view the web UI.
The thing is I view the web UI using the floating IP 87.x.x.1, and there it lists the Master URL: spark://192.x.x.1:7077.
From the slave I can run ./sbin/start-slave.sh spark://192.x.x.1:7077 and it connects successfully.
If I try to use the floating IP by changing spark-env.sh on the master to export SPARK_MASTER_IP='87.x.x.1' then I get the following error log
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /usr/local/spark-1.6.1-bin-hadoop2.6/conf/:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 87.x.x.1 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 15:05:33 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/05/12 15:05:33 WARN Utils: Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/05/12 15:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 15:05:33 INFO SecurityManager: Changing view acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: Changing modify acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Obviously the takeaway here for me is the line
Your hostname, spark-master resolves to a loopback address: 127.0.1.1;
using 192.x.x.1 instead (on interface eth0) 16/05/12 15:05:33 WARN
Utils: Set SPARK_LOCAL_IP if you need to bind to another address
but no matter what approach I then try and take I just run into more errors.
If I set both export SPARK_MASTER_IP='87.x.x.1' and export SPARK_LOCAL_IP='87.x.x.1' and try ./sbin/start-master.sh I get the following error log
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!
This, despite the fact my security group seems correct
ALLOW IPv4 443/tcp from 0.0.0.0/0
ALLOW IPv4 80/tcp from 0.0.0.0/0
ALLOW IPv4 8081/tcp from 0.0.0.0/0
ALLOW IPv4 8080/tcp from 0.0.0.0/0
ALLOW IPv4 18080/tcp from 0.0.0.0/0
ALLOW IPv4 7077/tcp from 0.0.0.0/0
ALLOW IPv4 4040/tcp from 0.0.0.0/0
ALLOW IPv4 to 0.0.0.0/0
ALLOW IPv6 to ::/0
ALLOW IPv4 22/tcp from 0.0.0.0/0
I've set a spark cluster (standalone cluster) on Openstack myself and in my /etc/hosts file on the master, I have:
127.0.0.1 localhost
192.168.1.2 spark-master instead of 127.0.0.1
Now, since I have a virtual private network for my master and my slaves, I only work with the private IPs. The only time I use the floating IP is on my host computer when I launch spark-submit --master spark://spark-master (spark-master here resolves to the floating IP). I don't think you need to try to bind the floating IP. I hope that helps!
Bruno
As appears in logs,
Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
Spark automatically try to get the IP of the host, and it uses the other IP 192.x.x.1 rather than the floating IP 87.x.x.1
To resolve this problem you should set SPARK_LOCAL_IP=87.x.x.1 (prefereably in spark-env.sh) and start your master again
I have a Spark workers which can't connect to its master because of an IP issue.
On the start-all.sh on the master (which name is 'pl'), I get the following on the slave log :
16/02/12 21:28:35 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.38:8081
16/02/12 21:28:35 INFO Worker: Connecting to master pl:7077...
16/02/12 21:28:35 WARN Worker: Failed to connect to master pl:7077
java.io.IOException: Failed to connect to pl/192.168.0.39:7077
Here is my /etc/hosts file :
$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 wk
192.168.0.39 pl
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
It seems like spark worker is confused between the master names and IP address...How should I set up this ?
Another question is : looking at the master's logs, it seems that the master is listening on another port (7078) than the one the worker is trying to reach (7077) because of a failure to start on the 1st port tried.
romain#pl:~/spark-1.6.0-bin-hadoop2.6/logs$ cat spark-romain-org.apache.spark.deploy.master.Master-1-pl.out
Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -cp /home/romain/spark-1.6.0-bin-hadoop2.6/conf/:/home/romain/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/home/romain/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/romain/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/romain/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip pl --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/12 21:28:35 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/02/12 21:28:35 WARN Utils: Your hostname, pl resolves to a loopback address: 127.0.1.1; using 192.168.0.39 instead (on interface eth0)
16/02/12 21:28:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/12 21:28:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 21:28:35 INFO SecurityManager: Changing view acls to: romain
16/02/12 21:28:35 INFO SecurityManager: Changing modify acls to: romain
16/02/12 21:28:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(romain); users with modify permissions: Set(romain)
16/02/12 21:28:36 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/02/12 21:28:36 INFO Utils: Successfully started service 'sparkMaster' on port 7078.
16/02/12 21:28:36 INFO Master: Starting Spark master at spark://pl:7078
16/02/12 21:28:36 INFO Master: Running Spark version 1.6.0
16/02/12 21:28:36 WARN Utils: Service 'MasterUI' could not bind on port 8080. Attempting port 8081.
16/02/12 21:28:36 WARN Utils: Service 'MasterUI' could not bind on port 8081. Attempting port 8082.
16/02/12 21:28:36 INFO Utils: Successfully started service 'MasterUI' on port 8082.
16/02/12 21:28:36 INFO MasterWebUI: Started MasterWebUI at http://192.168.0.39:8082
16/02/12 21:28:36 WARN Utils: Service could not bind on port 6066. Attempting port 6067.
16/02/12 21:28:36 INFO Utils: Successfully started service on port 6067.
16/02/12 21:28:36 INFO StandaloneRestServer: Started REST server for submitting applications on port 6067
16/02/12 21:28:36 INFO Master: I have been elected leader! New state: ALIVE
But what is strange is that the local worker logs as if successusfully connected to the local master on port :
16/02/12 21:28:38 INFO Worker: Connecting to master pl:7077...
16/02/12 21:28:38 INFO Worker: Successfully registered with master spark://pl:7077
You can try running netstat -pna | grep 7077 (needs root privileges) on the master to see what process is blocking the port.
Maybe you have another driver instance running. If this is a Java process blocking the port you can use jps to find out more about it.