Spark cluster failed to connect to master (WARN Worker: Failed to connect to master) - apache-spark

I have a Spark cluster with 2 nodes: a master (172.17.0.229) and a slave (172.17.0.228). I have edited spark-env.sh, adding SPARK_MASTER_IP=127.17.0.229, and the slaves file, adding 172.17.0.228.
I am starting my master node with start-master.sh and the slave node with start-slaves.sh.
I can see the web UI with the master node but no workers, and the worker node's log reads:
Spark Command: /usr/lib/jvm/java-7-oracle/jre/bin/java -cp /usr/local/src/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/usr/local/src/spark-1.5.2-bin-hadoop$
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/18 14:17:25 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/12/18 14:17:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/18 14:17:26 INFO SecurityManager: Changing view acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: Changing modify acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ujjwal); users wit$
15/12/18 14:17:27 INFO Slf4jLogger: Slf4jLogger started
15/12/18 14:17:27 INFO Remoting: Starting remoting
15/12/18 14:17:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@172.17.0.228:47599]
15/12/18 14:17:27 INFO Utils: Successfully started service 'sparkWorker' on port 47599.
15/12/18 14:17:27 INFO Worker: Starting Spark worker 172.17.0.228:47599 with 2 cores, 2.7 GB RAM
15/12/18 14:17:27 INFO Worker: Running Spark version 1.5.2
15/12/18 14:17:27 INFO Worker: Spark home: /usr/local/src/spark-1.5.2-bin-hadoop2.6
15/12/18 14:17:27 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/12/18 14:17:27 INFO WorkerWebUI: Started WorkerWebUI at http://172.17.0.228:8081
15/12/18 14:17:27 INFO Worker: Connecting to master 127.17.0.229:7077...
15/12/18 14:17:27 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.17.0.229:7077] has failed, address is now$
15/12/18 14:17:27 WARN Worker: Failed to connect to master 127.17.0.229:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@127.17.0.229:7077/), Path(/user/Master)]
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
Thanks for your suggestion.

In general, compare the address your worker is trying to connect to with the spark://...:7077 URL reported on the master web UI (172.17.0.229, port 8080); that will tell you whether the address is correct.
In this particular case, it looks like you have a typo; change
SPARK_MASTER_IP=127.17.0.229
to read:
SPARK_MASTER_IP=172.17.0.229
(you seem to have 127 and 172 transposed).
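As a quick sanity check before restarting the worker (a sketch; the addresses are the ones from this question), you can confirm the corrected master address is reachable from the slave:
# Run on the worker node (172.17.0.228); 7077 is the default master RPC port
telnet 172.17.0.229 7077
# The authoritative spark://host:port URL is printed at the top of the master
# web UI (http://172.17.0.229:8080); the worker must use exactly that address.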

My issue was a version mismatch between the Spark Java library I was using (2.0.0) and the version of the Spark cluster (2.2.1).
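If you suspect the same kind of mismatch, the two versions are quick to compare (a sketch; where the client version lives depends on your own build file):
# Cluster side: prints the Spark version the cluster's binaries were built from
spark-submit --version
# Client side: check the Spark dependency version declared in your build file
# (pom.xml, build.sbt, ...) and align it with the output above.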

Related

jupyter notebook error when Starting Spark application using pyspark kernel

I've been trying to configure Jupyter Notebook with the PySpark kernel. I am new to this and to the Ubuntu OS. When I tried to run some code in the notebook using the PySpark kernel, I received the error log below.
Note that it used to work before, but without SQL magic; after I installed sparkmagic to use SQL magic, this happened.
I appreciate your help, thanks.
ID YARN Application ID Kind State Spark UI Driver log Current session?
1 None pyspark idle ✔
The code failed because of a fatal error:
Session 1 unexpectedly reached final status 'error'. See logs:
stdout:
stderr:
19/10/12 16:47:57 WARN Utils: Your hostname, majd-desktop resolves to a loopback address: 127.0.1.1; using 192.168.1.6 instead (on interface enp1s0)
19/10/12 16:47:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/10/12 16:47:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/12 16:48:00 INFO SparkContext: Running Spark version 2.4.4
19/10/12 16:48:00 INFO SparkContext: Submitted application: livy-session-1
19/10/12 16:48:00 INFO SecurityManager: Changing view acls to: majd
19/10/12 16:48:00 INFO SecurityManager: Changing modify acls to: majd
19/10/12 16:48:00 INFO SecurityManager: Changing view acls groups to:
19/10/12 16:48:00 INFO SecurityManager: Changing modify acls groups to:
19/10/12 16:48:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(majd); groups with view permissions: Set(); users with modify permissions: Set(majd); groups with modify permissions: Set()
19/10/12 16:48:00 INFO Utils: Successfully started service 'sparkDriver' on port 33779.
19/10/12 16:48:00 INFO SparkEnv: Registering MapOutputTracker
19/10/12 16:48:00 INFO SparkEnv: Registering BlockManagerMaster
19/10/12 16:48:00 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/10/12 16:48:00 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/10/12 16:48:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d9d22c37-be4c-4498-b115-2011ee176dbf
19/10/12 16:48:00 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
19/10/12 16:48:00 INFO SparkEnv: Registering OutputCommitCoordinator
19/10/12 16:48:00 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/10/12 16:48:00 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/10/12 16:48:00 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.6:4041
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/rsc/target/jars/livy-api-0.4.0-incubating-SNAPSHOT.jar at spark://192.168.1.6:33779/jars/livy-api-0.4.0-incubating-SNAPSHOT.jar with timestamp 1570888080918
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/rsc/target/jars/livy-rsc-0.4.0-incubating-SNAPSHOT.jar at spark://192.168.1.6:33779/jars/livy-rsc-0.4.0-incubating-SNAPSHOT.jar with timestamp 1570888080919
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/rsc/target/jars/netty-all-4.0.29.Final.jar at spark://192.168.1.6:33779/jars/netty-all-4.0.29.Final.jar with timestamp 1570888080919
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/repl/scala-2.11/target/jars/commons-codec-1.9.jar at spark://192.168.1.6:33779/jars/commons-codec-1.9.jar with timestamp 1570888080919
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/repl/scala-2.11/target/jars/livy-core_2.11-0.4.0-incubating-SNAPSHOT.jar at spark://192.168.1.6:33779/jars/livy-core_2.11-0.4.0-incubating-SNAPSHOT.jar with timestamp 1570888080920
19/10/12 16:48:00 INFO SparkContext: Added JAR file:///home/majd/anaconda3/share/apache-livy-0.4.0.60ee047/repl/scala-2.11/target/jars/livy-repl_2.11-0.4.0-incubating-SNAPSHOT.jar at spark://192.168.1.6:33779/jars/livy-repl_2.11-0.4.0-incubating-SNAPSHOT.jar with timestamp 1570888080920
19/10/12 16:48:00 INFO Executor: Starting executor ID driver on host localhost
19/10/12 16:48:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38259.
19/10/12 16:48:01 INFO NettyBlockTransferService: Server created on 192.168.1.6:38259
19/10/12 16:48:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/10/12 16:48:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.6, 38259, None)
19/10/12 16:48:01 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.6:38259 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.6, 38259, None)
19/10/12 16:48:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.6, 38259, None)
19/10/12 16:48:01 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.6, 38259, None).
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
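The WARN lines at the top of the log point at one likely culprit: the hostname resolves to the loopback address 127.0.1.1. As the log itself suggests, pinning SPARK_LOCAL_IP may be worth trying (a sketch; 192.168.1.6 is the address the log fell back to on interface enp1s0):
# Set before starting Livy / the kernel, e.g. in the environment or spark-env.sh
export SPARK_LOCAL_IP=192.168.1.6
# then restart the Livy server and the Jupyter kernel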

Spark SBT compilation issue

In my build, even though I am placing the Twitter jar files in the src/main/resources folder, SBT is not picking them up: it compiles and packages without errors, but at run time I get a "class not found: TwitterUtils" error.
My question is: why is SBT not including the jar files from the resources folder in the compilation?
People are telling me to do all these complex steps of installing the Git utility and then running sbt assembly, which I did, but since I am behind a proxy Git is not working even though all the http_proxy settings are in place.
I have also tried putting these Twitter jar files on the CLASSPATH, with no luck.
I am stuck on this issue, so any help is highly appreciated.
Please see the details below:
[root@hadoop1 TwitterPopularTags]# pwd
/root/TwitterPopularTags
[root@hadoop1 TwitterPopularTags]# sbt compile
[info] Set current project to TwitterPopularTags (in build file:/root/TwitterPopularTags/)
[info] Updating {file:/root/TwitterPopularTags/}twitterpopulartags...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 2 Scala sources to /root/TwitterPopularTags/target/scala-2.11/classes...
[success] Total time: 14 s, completed Sep 16, 2016 9:55:20 AM
[root@hadoop1 TwitterPopularTags]# sbt package
[info] Set current project to TwitterPopularTags (in build file:/root/TwitterPopularTags/)
[info] Packaging /root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed Sep 16, 2016 9:56:20 AM
[root@hadoop1 TwitterPopularTags]# spark-submit /root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar
16/09/16 09:57:06 INFO SparkContext: Running Spark version 1.6.2
16/09/16 09:57:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/16 09:57:06 INFO SecurityManager: Changing view acls to: root
16/09/16 09:57:06 INFO SecurityManager: Changing modify acls to: root
16/09/16 09:57:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/09/16 09:57:07 INFO Utils: Successfully started service 'sparkDriver' on port 53967.
16/09/16 09:57:07 INFO Slf4jLogger: Slf4jLogger started
16/09/16 09:57:07 INFO Remoting: Starting remoting
16/09/16 09:57:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.100.44.17:57877]
16/09/16 09:57:07 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57877.
16/09/16 09:57:07 INFO SparkEnv: Registering MapOutputTracker
16/09/16 09:57:07 INFO SparkEnv: Registering BlockManagerMaster
16/09/16 09:57:07 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-47a89077-0926-447c-ada7-fdb4a9aa1b83
16/09/16 09:57:07 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
16/09/16 09:57:07 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/16 09:57:08 INFO Server: jetty-8.y.z-SNAPSHOT
16/09/16 09:57:08 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/09/16 09:57:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/16 09:57:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.100.44.17:4040
16/09/16 09:57:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d56628b6-fdbf-4d89-bbd2-a96603000607/httpd-ee499eb3-00ae-4276-b163-423e3b81f0b4
16/09/16 09:57:08 INFO HttpServer: Starting HTTP Server
16/09/16 09:57:08 INFO Server: jetty-8.y.z-SNAPSHOT
16/09/16 09:57:08 INFO AbstractConnector: Started SocketConnector@0.0.0.0:56067
16/09/16 09:57:08 INFO Utils: Successfully started service 'HTTP file server' on port 56067.
16/09/16 09:57:08 INFO SparkContext: Added JAR file:/root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar at http://10.100.44.17:56067/jars/twitterpopulartags_2.11-1.0.jar with timestamp 1474034228091
16/09/16 09:57:08 INFO Executor: Starting executor ID driver on host localhost
16/09/16 09:57:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49715.
16/09/16 09:57:08 INFO NettyBlockTransferService: Server created on 49715
16/09/16 09:57:08 INFO BlockManagerMaster: Trying to register BlockManager
16/09/16 09:57:08 INFO BlockManagerMasterEndpoint: Registering block manager localhost:49715 with 511.5 MB RAM, BlockManagerId(driver, localhost, 49715)
16/09/16 09:57:08 INFO BlockManagerMaster: Registered BlockManager
16/09/16 09:57:08 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/09/16 09:57:08 INFO EventLoggingListener: Logging events to hdfs:///spark-history/local-1474034228122
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
at dot.state.fl.us.PrintTweets$.main(PrintTweets.scala:29)
at dot.state.fl.us.PrintTweets.main(PrintTweets.scala)
My question is: why is SBT not including the jar files from the resources folder in the compilation?
Because that's not what the resources folder is for. If you want to manage dependencies manually, put them into the lib folder instead. But in that case you also need to do the same for all dependencies of those dependencies, their dependencies, and so on. Using managed dependencies, as described in the linked documentation, is generally a much better idea.
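Both routes, sketched below: the lib/ route from this answer, plus spark-submit's --packages flag as a submit-time alternative to declaring the dependency in the build. The artifact coordinate assumes Spark 1.6.x on Scala 2.11, so verify it matches your cluster, and note that resolving it still needs working proxy access:
# Unmanaged: sbt picks up any jars placed in lib/ at the project root
mkdir -p lib
cp twitter4j-*.jar spark-streaming-twitter_2.11-*.jar lib/

# Managed alternative: let spark-submit resolve the dependency at launch time
spark-submit --packages org.apache.spark:spark-streaming-twitter_2.11:1.6.2 \
  /root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar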

Submitting a job to Apache Spark Error

I have the following settings for my Apache Spark instance that runs locally on my machine:
export SPARK_HOME=/Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_LOCAL_DIRS=$SPARK_HOME/work
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1G
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_DAEMON_MEMORY=384m
I have a spark streaming consumer that I would like to submit to Spark. This streaming consumer is just a jar file that I submit like this:
$SPARK_HOME/bin/spark-submit --class com.my.job.MetricsConsumer --master spark://127.0.0.1:7077 /Users/joe/Sandbox/jaguar/spark-kafka-consumer/target/scala-2.11/spark-kafka-consumer-0.1.0-SNAPAHOT.jar
I get the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/13 10:30:06 INFO SparkContext: Running Spark version 1.6.0
16/01/13 10:30:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/13 10:30:06 INFO SecurityManager: Changing view acls to: joe
16/01/13 10:30:06 INFO SecurityManager: Changing modify acls to: joe
16/01/13 10:30:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joe); users with modify permissions: Set(joe)
16/01/13 10:30:07 INFO Utils: Successfully started service 'sparkDriver' on port 65528.
16/01/13 10:30:07 INFO Slf4jLogger: Slf4jLogger started
16/01/13 10:30:08 INFO Remoting: Starting remoting
16/01/13 10:30:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.22.0.104:65529]
16/01/13 10:30:08 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 65529.
16/01/13 10:30:08 INFO SparkEnv: Registering MapOutputTracker
16/01/13 10:30:08 INFO SparkEnv: Registering BlockManagerMaster
16/01/13 10:30:08 INFO DiskBlockManager: Created local directory at /Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6/work/blockmgr-cee3388d-ecfc-42a7-a76c-8738401db0c9
16/01/13 10:30:08 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/13 10:30:08 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/13 10:30:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/13 10:30:08 INFO SparkUI: Started SparkUI at http://172.22.0.104:4040
16/01/13 10:30:08 INFO HttpFileServer: HTTP File server directory is /Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6/work/spark-10d7d880-7d1d-4234-88d4-d80558c8051a/httpd-40f80936-7508-4b6c-bb90-411aa37d7e93
16/01/13 10:30:08 INFO HttpServer: Starting HTTP Server
16/01/13 10:30:08 INFO Utils: Successfully started service 'HTTP file server' on port 65530.
16/01/13 10:30:09 INFO SparkContext: Added JAR file:/Users/joe/Sandbox/jaguar/spark-kafka-consumer/target/scala-2.11/spark-kafka-consumer-0.1.0-SNAPAHOT.jar at http://172.22.0.104:65530/jars/spark-kafka-consumer-0.1.0-SNAPAHOT.jar with timestamp 1452677409966
16/01/13 10:30:10 INFO AppClient$ClientEndpoint: Connecting to master spark://myhost:7077...
16/01/13 10:30:10 WARN AppClient$ClientEndpoint: Failed to connect to master myhost:7077
java.io.IOException: Failed to connect to myhost:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
I have checked my firewall settings and everything seems to be OK. Why would I get this error? Any ideas?
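Worth noting from the trace: the driver is connecting to myhost:7077 rather than the spark://127.0.0.1:7077 passed on the command line, and UnresolvedAddressException means that name does not resolve at all. Two hedged checks (a sketch; the path assumes the SPARK_HOME exported above):
# Does "myhost" resolve on this machine at all?
host myhost
# Look for a stale master setting overriding the command line:
grep -r myhost "$SPARK_HOME"/conf/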

Spark worker can not connect to Master

While starting the worker node I get the following error :
Spark Command: /usr/lib/jvm/default-java/bin/java -cp /home/ubuntu/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/home/ubuntu/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/home/ubuntu/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/ubuntu/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/ubuntu/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://ip-1-70-44-5:7077
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/10/16 19:19:10 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/10/16 19:19:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/16 19:19:11 INFO SecurityManager: Changing view acls to: ubuntu
15/10/16 19:19:11 INFO SecurityManager: Changing modify acls to: ubuntu
15/10/16 19:19:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/10/16 19:19:12 INFO Slf4jLogger: Slf4jLogger started
15/10/16 19:19:12 INFO Remoting: Starting remoting
15/10/16 19:19:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@1.70.44.4:55126]
15/10/16 19:19:12 INFO Utils: Successfully started service 'sparkWorker' on port 55126.
15/10/16 19:19:12 INFO Worker: Starting Spark worker 1.70.44.4:55126 with 2 cores, 2.9 GB RAM
15/10/16 19:19:12 INFO Worker: Running Spark version 1.5.1
15/10/16 19:19:12 INFO Worker: Spark home: /home/ubuntu/spark-1.5.1-bin-hadoop2.6
15/10/16 19:19:12 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/10/16 19:19:12 INFO WorkerWebUI: Started WorkerWebUI at http://1.70.44.4:8081
15/10/16 19:19:12 INFO Worker: Connecting to master ip-1-70-44-5:7077...
15/10/16 19:19:24 INFO Worker: Retrying connection to master (attempt # 1)
15/10/16 19:19:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-5,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1c5651e9 rejected from java.util.concurrent.ThreadPoolExecutor@671ba687[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:211)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:210)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters(Worker.scala:210)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:288)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/10/16 19:19:24 INFO ShutdownHookManager: Shutdown hook called
I have added the hostnames to the conf/slaves file. I don't know which environment variables to set in spark-env.sh, so right now it is not being used.
Any pointers to the solution?
Also, if I should use spark-env.sh, which environment variables should I set?
Setup details:
2 Ubuntu 14 machines with 2 cores each.
Please advise.
Thanks.
So, after some tinkering around, I found that the slave was not able to communicate with the master on the given port. I changed the security access rules and enabled all TCP traffic on all ports. This solved the problem.
To check if the port is open :
telnet master.ip master.port
The default port is 7077.
My spark-env.sh :
export SPARK_WORKER_INSTANCES=2
export SPARK_MASTER_IP=<ip address>
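Opening only the ports Spark needs is a narrower alternative to allowing all TCP traffic; note, though, that executors and drivers also use random ephemeral ports unless pinned, which is likely why opening everything worked here. A sketch assuming ufw on the Ubuntu 14 machines mentioned above; adapt for your firewall or cloud security group:
sudo ufw allow 7077/tcp   # master RPC port the workers connect to
sudo ufw allow 8080/tcp   # master web UI
sudo ufw allow 8081/tcp   # worker web UI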
I'm afraid your hostname may be invalid to Spark, and you have to change your spark-env.sh.
You can set the variable SPARK_MASTER_IP to the real IP of the master, instead of its hostname.
e.g.
export SPARK_MASTER_IP=1.70.44.5
INSTEAD OF
export SPARK_MASTER_IP=ip-1-70-44-5

SPARK + Standalone Cluster: Cannot start worker from another machine

I've been setting up a Spark standalone cluster following this link. I have 2 machines; the first one (ubuntu0) serves as both the master and a worker, and the second one (ubuntu1) is just a worker. Password-less SSH has already been properly configured for both machines and was tested by SSHing manually in both directions.
Now when I tried ./start-all.sh, both the master and the worker on the master machine (ubuntu0) started properly. This is signified by (1) the web UI being accessible (localhost:8081 in my case) and (2) the worker being registered/displayed on the web UI.
However, the worker on the second machine (ubuntu1) was not started. The error displayed was:
ubuntu1: ssh: connect to host ubuntu1 port 22: Connection timed out
Now this is quite weird, given that I've properly configured SSH to be password-less on both sides. Given this, I accessed the second machine and tried to start the worker manually with these commands:
./spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu0:7707
./spark-class org.apache.spark.deploy.worker.Worker spark://<ip>:7707
However, below is the result:
14/05/23 13:49:08 INFO Utils: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/05/23 13:49:08 WARN Utils: Your hostname, ubuntu1 resolves to a loopback address:
127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/05/23 13:49:08 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
14/05/23 13:49:09 INFO Slf4jLogger: Slf4jLogger started
14/05/23 13:49:09 INFO Remoting: Starting remoting
14/05/23 13:49:09 INFO Remoting: Remoting started; listening on addresses :
[akka.tcp://sparkWorker@ubuntu1.local:42739]
14/05/23 13:49:09 INFO Worker: Starting Spark worker ubuntu1.local:42739 with 8 cores,
4.8 GB RAM
14/05/23 13:49:09 INFO Worker: Spark home: /home/ubuntu1/jaysonp/spark/spark-0.9.1
14/05/23 13:49:09 INFO WorkerWebUI: Started Worker web UI at http://ubuntu1.local:8081
14/05/23 13:49:09 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:49:29 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:49:49 INFO Worker: Connecting to master spark://ubuntu0:7707...
14/05/23 13:50:09 ERROR Worker: All masters are unresponsive! Giving up.
Below are the contents of my master and slave/worker spark-env.sh:
SPARK_MASTER_IP=192.168.3.222
STANDALONE_SPARK_MASTER_HOST=`hostname -f`
How should I resolve this? Thanks in advance!
For those who are still encountering error(s) when it comes to starting workers on different machines, I just want to share that using IP addresses in conf/slaves worked for me.
Hope this helps!
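In other words, conf/slaves on the master ends up looking something like this (the addresses are placeholders; list one worker per line):
# conf/slaves
192.168.3.223
192.168.3.224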
I had similar issues today running Spark 1.5.1 on RHEL 6.7.
I have 2 machines; their hostnames are:
- master.domain.com
- slave.domain.com
I installed a standalone version of Spark (pre-built against Hadoop 2.6) and installed Oracle jdk-8u66.
Spark download:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
Java download
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.tar.gz"
After Spark and Java were unpacked in my home directory, I did the following:
on 'master.domain.com' I ran:
./sbin/start-master.sh
The web UI became available at http://master.domain.com:8080 (no slave running).
On 'slave.domain.com' I tried:
./sbin/start-slave.sh spark://master.domain.com:7077
This failed as follows:
Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master.domain.com:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/11/06 11:03:51 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/11/06 11:03:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/06 11:03:51 INFO SecurityManager: Changing view acls to: root
15/11/06 11:03:51 INFO SecurityManager: Changing modify acls to: root
15/11/06 11:03:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/11/06 11:03:52 INFO Slf4jLogger: Slf4jLogger started
15/11/06 11:03:52 INFO Remoting: Starting remoting
15/11/06 11:03:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:50573]
15/11/06 11:03:52 INFO Utils: Successfully started service 'sparkWorker' on port 50573.
15/11/06 11:03:52 INFO Worker: Starting Spark worker 10.80.70.38:50573 with 8 cores, 6.7 GB RAM
15/11/06 11:03:52 INFO Worker: Running Spark version 1.5.1
15/11/06 11:03:52 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6
15/11/06 11:03:53 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/11/06 11:03:53 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081
15/11/06 11:03:53 INFO Worker: Connecting to master master.domain.com:7077...
15/11/06 11:04:05 INFO Worker: Retrying connection to master (attempt # 1)
15/11/06 11:04:05 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-4,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@48711bf5 rejected from java.util.concurrent.ThreadPoolExecutor@14db705b[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:211)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1.apply(Worker.scala:210)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters(Worker.scala:210)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$reregisterWithMaster$1.apply$mcV$sp(Worker.scala:288)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$reregisterWithMaster(Worker.scala:234)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:521)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/11/06 11:04:05 INFO ShutdownHookManager: Shutdown hook called
start-slave spark://<master-IP>:7077 also failed as above.
start-slave spark://master:7077 worked, and the worker shows up in the master web UI:
Spark Command: /root/java/bin/java -cp /root/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/root/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/root/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/11/06 11:08:15 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/11/06 11:08:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/06 11:08:15 INFO SecurityManager: Changing view acls to: root
15/11/06 11:08:15 INFO SecurityManager: Changing modify acls to: root
15/11/06 11:08:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/11/06 11:08:16 INFO Slf4jLogger: Slf4jLogger started
15/11/06 11:08:16 INFO Remoting: Starting remoting
15/11/06 11:08:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@10.80.70.38:40780]
15/11/06 11:08:17 INFO Utils: Successfully started service 'sparkWorker' on port 40780.
15/11/06 11:08:17 INFO Worker: Starting Spark worker 10.80.70.38:40780 with 8 cores, 6.7 GB RAM
15/11/06 11:08:17 INFO Worker: Running Spark version 1.5.1
15/11/06 11:08:17 INFO Worker: Spark home: /root/spark-1.5.1-bin-hadoop2.6
15/11/06 11:08:17 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/11/06 11:08:17 INFO WorkerWebUI: Started WorkerWebUI at http://10.80.70.38:8081
15/11/06 11:08:17 INFO Worker: Connecting to master master:7077...
15/11/06 11:08:17 INFO Worker: Successfully registered with master spark://master:7077
Note: I haven't added any extra config in conf/spark-env.sh
Note 2: when looking at the master web UI, the Spark master URL at the top is actually the one that worked for me, so when in doubt just use that one.
I hope this helps ;)
Using hostnames in conf/slaves worked well for me.
Here are the steps I took:
- Checked the SSH public key
- scp /etc/spark/conf.dist/spark-env.sh to the workers
The relevant part of my spark-env.sh:
export STANDALONE_SPARK_MASTER_HOST=`hostname`
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
I guess you missed something in your configuration; that's what I gathered from your log.
Check your /etc/hosts: make sure ubuntu1 is in your master's host list and that its IP matches the slave's IP address.
Then add export SPARK_LOCAL_IP='ubuntu1' to the spark-env.sh file on your slave.
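For example, /etc/hosts on both machines might look like this (the ubuntu0 address comes from the spark-env.sh shown above; ubuntu1's is assumed, so substitute the slave's real address):
192.168.3.222   ubuntu0
192.168.3.223   ubuntu1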
