I have a four-node Hadoop/Spark cluster running in AWS. I can submit and run jobs perfectly in local mode:
spark-submit --master local[*] myscript.py
But when I attempt to run the script in cluster mode, it fails. I'm just trying the cluster equivalent of "hello world":
spark-submit spark-yarn.py
The script is the one that was recommended:
from pyspark import SparkConf
from pyspark import SparkContext
conf = SparkConf()
conf.setMaster('yarn')
conf.setAppName('spark-yarn')
sc = SparkContext(conf=conf)
def mod(x):
    import numpy as np
    return (x, np.mod(x, 2))
rdd = sc.parallelize(range(1000)).map(mod).take(10)
print(rdd)
I've spent days looking at every log I can find and reading everything I can online, but nothing has helped me get to the root of why it's not working. Before I tear down all the servers and start over, I'm hoping someone can point me in the right direction to get this working.
Here's the output in the terminal:
20/02/25 12:59:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/25 13:00:11 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
20/02/25 13:00:11 ERROR YarnClientSchedulerBackend: Diagnostics message: Application application_1582603840719_0002 failed 2 times due to AM Container for appattempt_1582603840719_0002_000002 exited with exitCode: -103
Failing this attempt.Diagnostics: [2020-02-25 13:00:11.601]Container [pid=3124,containerID=container_1582603840719_0002_02_000001] is running beyond virtual memory limits. Current usage: 328.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1582603840719_0002_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3128 3124 3124 3124 (java) 504 34 2359349248 83396 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg ip-172-31-7-96.ec2.internal:43275 --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/__spark_conf__/__spark_conf__.properties
|- 3124 3122 3124 3124 (bash) 0 0 13635584 760 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'ip-172-31-7-96.ec2.internal:43275' --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/__spark_conf__/__spark_conf__.properties 1> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001/stdout 2> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001/stderr
[2020-02-25 13:00:11.651]Container killed on request. Exit code is 143
[2020-02-25 13:00:11.658]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0002 Then click on links to logs of each attempt.
. Failing the application.
20/02/25 13:00:11 ERROR TransportClient: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:11 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1316)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:11 ERROR Utils: Uncaught exception in thread YARN application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:574)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:98)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:164)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:653)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2042)
at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
Caused by: java.io.IOException: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1316)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "/home/ubuntu/server/spark-yarn.py", line 7, in <module>
sc = SparkContext(conf=conf)
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
If I change 'yarn' to 'yarn-client' as the master, it gives a slightly different error:
20/02/25 13:07:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/25 13:07:46 ERROR TransportClient: Failed to send RPC RPC 5381013595535555066 to /172.31.5.228:39748: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:46 ERROR YarnScheduler: Lost executor 1 on ip-172-31-5-228.ec2.internal: Slave lost
20/02/25 13:07:51 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
20/02/25 13:07:51 ERROR YarnClientSchedulerBackend: Diagnostics message: Application application_1582603840719_0003 failed 2 times due to AM Container for appattempt_1582603840719_0003_000002 exited with exitCode: -103
Failing this attempt.Diagnostics: [2020-02-25 13:07:51.067]Container [pid=3223,containerID=container_1582603840719_0003_02_000001] is running beyond virtual memory limits. Current usage: 320.8 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1582603840719_0003_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3227 3223 3223 3223 (java) 489 32 2355855360 81352 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg ip-172-31-7-96.ec2.internal:40963 --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/__spark_conf__/__spark_conf__.properties
|- 3223 3221 3223 3223 (bash) 0 0 13635584 767 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'ip-172-31-7-96.ec2.internal:40963' --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/__spark_conf__/__spark_conf__.properties 1> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001/stdout 2> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001/stderr
[2020-02-25 13:07:51.089]Container killed on request. Exit code is 143
[2020-02-25 13:07:51.090]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0003 Then click on links to logs of each attempt.
. Failing the application.
20/02/25 13:07:51 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
20/02/25 13:07:52 ERROR TransportClient: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:52 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:52 ERROR Utils: Uncaught exception in thread YARN application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:574)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:98)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:164)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:653)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2042)
at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
Caused by: java.io.IOException: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
Traceback (most recent call last):
File "/home/ubuntu/server/spark-yarn.py", line 7, in <module>
sc = SparkContext(conf=conf)
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
It mentions checking the logs at:
http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0003
But clicking on any of the log links on that page gives an error:
Firefox can’t establish a connection to the server at ip-172-31-0-102.ec2.internal:8042.
(That's probably unrelated.)
Grepping for warnings, I see the following:
2020-02-25 13:07:38,904 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 3 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2020-02-25 13:07:51,241 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=ubuntu OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1582603840719_0003 failed 2 times due to AM Container for appattempt_1582603840719_0003_000002 exited with exitCode: -103
2020-02-25 13:07:40,367 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 12086
No errors were generated.
Querying YARN from the command line shows me the jobs:
ubuntu@ip-172-31-7-96:~/server$ yarn application -list -appStates ALL
20/02/25 13:31:44 INFO client.RMProxy: Connecting to ResourceManager at ec2-34-200-223-235.compute-1.amazonaws.com/172.31.7.96:8032
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):3
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1582603840719_0001 spark-yarn SPARK ubuntu default FINISHED UNDEFINED 100% N/A
But asking for the logs fails:
ubuntu@ip-172-31-7-96:~/server$ yarn logs -applicationId application_1582603840719_0001
20/02/25 13:32:48 INFO client.RMProxy: Connecting to ResourceManager at ec2-34-200-223-235.compute-1.amazonaws.com/172.31.7.96:8032
fs.AbstractFileSystem.ec2-34-200-223-235.compute-1.amazonaws.com.impl=null: No AbstractFileSystem configured for scheme: ec2-34-200-223-235.compute-1.amazonaws.com
Can not find any log file matching the pattern: [ALL] for the application: application_1582603840719_0001
Can not find the logs for the application: application_1582603840719_0001 with the appOwner: ubuntu
Again, if someone could direct me to next trouble-shooting steps, I'd really appreciate it. I've spent days on this and don't seem to be making progress.
Two changes ended up solving this issue.
First, I added the following properties to the yarn-site.xml file on every node:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Next, I changed my spark-submit command to include the following options, giving the driver and executors more memory:
spark-submit --master yarn \
--deploy-mode client \
--driver-memory 6g \
--executor-memory 6g \
--executor-cores 2 \
--num-executors 10 \
my_app.py
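For reference, the executor settings can also live in the script's SparkConf instead of on the command line, the same way the original spark-yarn.py sets the master. This is only a sketch using the same example values as above (size them to what your NodeManagers can actually allocate); note that spark.driver.memory generally has to stay on the spark-submit line in client mode, because the driver JVM is already running by the time the conf is read:
from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster('yarn')
conf.setAppName('spark-yarn')
# Mirrors of the --executor-* flags above; example values, not cluster-verified.
conf.set('spark.executor.memory', '6g')
conf.set('spark.executor.cores', '2')
conf.set('spark.executor.instances', '10')

sc = SparkContext(conf=conf)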
The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
19/05/17 10:11:06 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
19/05/17 10:11:06 WARN MetricsSystem: Stopping a MetricsSystem that is not running
19/05/17 10:11:06 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
19/05/17 10:11:06 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/05/17 10:11:10 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
19/05/17 10:11:10 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1558064260263_0002 failed 2 times due to AM Container for appattempt_1558064260263_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2019-05-17 10:11:09.626]File file:/home/hadoop/.sparkStaging/application_1558064260263_0002/pyspark.zip does not exist
Add these lines to your .bashrc:
function snotebook ()
{
    # Spark path (based on your computer)
    SPARK_PATH=$SPARK_HOME
    export PYSPARK_DRIVER_PYTHON="jupyter"
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    # For Python 3 users, you have to add the line below or you will get an error
    export PYSPARK_PYTHON=/home/anaconda3/bin/python
    $SPARK_PATH/bin/pyspark --master yarn
}
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
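If it's unclear whether those variables are actually visible to the shell that runs spark-submit, a quick sanity check (just a diagnostic sketch, not part of the fix itself) is:
import os

# The YARN client needs these to locate the ResourceManager and HDFS; if they
# are unset, staging files such as pyspark.zip can end up on the local
# filesystem, which matches the "does not exist" error above.
for var in ('HADOOP_CONF_DIR', 'YARN_CONF_DIR'):
    print(var, '=', os.environ.get(var, '<not set>'))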
When I run the following Python program
from pyspark.ml.classification import LinearSVC
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Sparkmodel").getOrCreate()
data = spark.read.format("libsvm").load("/usr/local/spark/data/mllib/sample_libsvm_data.txt")
model = LinearSVC().fit(data)
model.save("mymodel")
LinearSVC.load("mymodel")
the load fails with a "java.lang.NoSuchMethodException".
/anaconda3/envs/scratch/bin/python /Users/billmcn/src/toy/sparkmodel/sparkmodel/little.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/12 13:23:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/12 13:23:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/11/12 13:23:06 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
17/11/12 13:23:17 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/11/12 13:23:17 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Traceback (most recent call last):
File "/Users/billmcn/src/toy/sparkmodel/sparkmodel/little.py", line 9, in <module>
LinearSVC.load("mymodel")
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/ml/util.py", line 257, in load
return cls.read().load(path)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/ml/util.py", line 197, in load
java_obj = self._jread.load(path)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.classification.LinearSVCModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Process finished with exit code 1
The "mymodel" directory is created and its contents appear to be valid.
I am running Spark 2.2.0 and pyspark 2.2.0. I have the following mllib jars in my installation.
> ll /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib*
-rw-r--r--@ 1 billmcn admin 6501535 Jun 30 18:09 /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib_2.11-2.2.0.jar
-rw-r--r--@ 1 billmcn admin 182887 Jun 30 18:09 /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib-local_2.11-2.2.0.jar
And spark-mllib_2.11-2.2.0.jar contains the class I want:
jar tf /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib_2.11-2.2.0.jar | grep LinearSVCModel
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$Data.class
org/apache/spark/ml/classification/LinearSVCModel$.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCReader.class
org/apache/spark/ml/classification/LinearSVCModel$$anonfun$11.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$$typecreator1$1.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$Data$.class
org/apache/spark/ml/classification/LinearSVCModel.class
The same problem happens on two different machines.
What am I doing wrong?
I was using the wrong class to load the model. The following works:
from pyspark.ml.classification import LinearSVCModel

model = LinearSVCModel.load(model_path)
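For context, here is the question's example with that one change applied, as a minimal sketch (same paths and app name as the question, not re-verified on that exact setup):
from pyspark.ml.classification import LinearSVC, LinearSVCModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Sparkmodel").getOrCreate()
data = spark.read.format("libsvm").load("/usr/local/spark/data/mllib/sample_libsvm_data.txt")

# Fit with the estimator (LinearSVC), but reload with the fitted-model class.
model = LinearSVC().fit(data)
model.save("mymodel")
reloaded = LinearSVCModel.load("mymodel")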
This looks like a version mismatch. The most likely scenario is:
Your Python (PySpark) installation uses Spark 2.2,
while the JVM jars were compiled against an earlier Spark version that didn't include LinearSVCModel.
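One quick way to see which versions the two sides actually report (a small diagnostic sketch, not part of the original answer):
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()
print("PySpark (Python side):", pyspark.__version__)
print("Spark (JVM side):", spark.sparkContext.version)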
I am running a Kafka server. (When I use the command bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning, it gives me all my data in the topic.)
When I want to test the example JavaDirectKafkaWordCount in Spark in order to understand how it works, I get the following error:
$ ./run-example streaming.JavaDirectKafkaWordCount localhost:2181 test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/17 11:19:33 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
16/08/17 11:19:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/17 11:19:33 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.66.212.132 instead (on interface enp5s0)
16/08/17 11:19:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I would like to know what the error means and how I could possibly solve it.
Thank you very much for your attention and your help.
Joe
I have a Hortonworks Sandbox 2.4 with Spark 1.6 set up. Then I created an IntelliJ Spark development environment on Windows using the HDP Spark jar and Scala 2.10.5, so both the Spark and Scala versions match between my Windows and HDP environments, as indicated here. My IntelliJ dev environment works with local as the master.
Now I'm trying to connect to HDP from Windows using:
val sparkConf = new SparkConf()
.setAppName("spark-word-count")
.setMaster("spark://10.33.241.160:7077")
I get the error below and have no clue how to resolve it. Please help!
16/03/21 16:27:40 INFO SparkUI: Started SparkUI at http://10.33.240.126:4040
16/03/21 16:27:40 INFO AppClient$ClientEndpoint: Connecting to master spark://10.33.241.160:7077...
16/03/21 16:27:41 WARN AppClient$ClientEndpoint: Failed to connect to master 10.33.241.160:7077
java.io.IOException: Failed to connect to /10.33.241.160:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused: no further information: /10.33.241.160:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/03/21 16:28:40 ERROR MapOutputTrackerMaster: Error communicating with MapOutputTracker
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:110)
at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:120)
at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:462)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1756)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1755)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
It turns out I need to set up my Hortonworks Spark as the master server every time the server restarts, then use my IntelliJ dev environment to connect to the HDP master. Just run ./sbin/start-master.sh on HDP, as described in this link.