Spark Streaming: java.io.InvalidClassException: scala.concurrent.duration.Duration; local class incompatible: stream classdesc serialVersionUID - apache-spark

I am running a streaming job on a CDH cluster and getting an error. The cluster's Spark version is 1.2.0-cdh5.3.8, but I need Spark 2.1.0, so I downloaded Apache Spark and built it myself (Spark version: 2.1.0-cdh5.3.8, Hadoop version: 2.5.0-cdh5.3.8).
The error message is below:
17/04/14 18:12:34 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 4724089633860239943
java.io.InvalidClassException: scala.concurrent.duration.Duration; local class incompatible: stream classdesc serialVersionUID = -7521802526148376080, local class serialVersionUID = -2941674837829752814
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:582)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:567)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:159)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:652)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
17/04/14 18:12:35 INFO impl.AMRMClientImpl: Received new token for : ztdm006:8041
17/04/14 18:12:35 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
17/04/14 18:12:35 INFO yarn.YarnAllocator: Completed container container_1488960736410_229415_01_000004 on host: ztdm009 (state: COMPLETE, exit status: 1)
17/04/14 18:12:35 WARN yarn.YarnAllocator: Container marked as failed: container_1488960736410_229415_01_000004 on host: ztdm009. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488960736410_229415_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:707)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/04/14 18:12:35 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1488960736410_229415_01_000004 on host: ztdm009. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488960736410_229415_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:707)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/04/14 18:12:35 INFO yarn.YarnAllocator: Completed container container_1488960736410_229415_01_000005 on host: ztdm010 (state: COMPLETE, exit status: 1)
17/04/14 18:12:35 WARN yarn.YarnAllocator: Container marked as failed: container_1488960736410_229415_01_000005 on host: ztdm010. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488960736410_229415_01_000005
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:707)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/04/14 18:12:35 INFO storage.BlockManagerMaster: Removal of executor 3 requested
17/04/14 18:12:35 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1488960736410_229415_01_000005 on host: ztdm010. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488960736410_229415_01_000005
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:707)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/04/14 18:12:35 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
17/04/14 18:12:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 3
17/04/14 18:12:35 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster.
17/04/14 18:12:35 INFO storage.BlockManagerMaster: Removal of executor 4 requested
17/04/14 18:12:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 4
17/04/14 18:12:38 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11, (reason: Max number of executor failures (3) reached)
17/04/14 18:12:38 INFO storage.DiskBlockManager: Shutdown hook called
17/04/14 18:12:38 INFO util.ShutdownHookManager: Shutdown hook called
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-1f1b6198-961b-418d-9274-5f35f8e67829
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-fdce09f8-8677-45b2-9ce4-ac7134ab63b0
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-10c52d9a-a76b-465f-82d5-42eba9c89c86
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-4c492882-813a-4c2b-a041-ae69aba7ce00
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-1d9e6c60-fc33-45c3-8552-55cbe4266931
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-b6aa5cf9-042a-4804-9472-9bcddde2814e/userFiles-b259c206-d618-4d54-8630-824d955d0be4
17/04/14 18:12:38 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/efinance/appcache/application_1488960736410_229415/spark-b6aa5cf9-042a-4804-9472-9bcddde2814e

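The likely cause is that the Scala library bundled into my application jar conflicts with the Scala library on the cluster: the two serialVersionUID values in the error suggest that the sender and the receiver loaded scala.concurrent.duration.Duration from different Scala versions. When I deleted the scala directory from the jar produced by my Maven build, the issue was fixed.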
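To check whether an application jar carries its own copy of the Scala library before submitting it, something like the following can help. This is a minimal sketch: the function name `bundled_scala_entries` and the jar path in the usage note are made up for illustration, and it only looks for class files under the top-level `scala/` directory, which is where scala-library classes end up in a fat jar.

```python
import zipfile


def bundled_scala_entries(jar_path):
    """Return the .class entries under scala/ that were packaged into the jar.

    A non-empty result suggests scala-library was bundled into the
    application jar and may clash with the cluster's own Scala version.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            if name.startswith("scala/") and name.endswith(".class")
        ]
```

For example, `bundled_scala_entries("target/my-streaming-job.jar")` (a hypothetical path) returning entries such as `scala/concurrent/duration/Duration.class` would point to the conflict described above. On the Maven side, the usual way to keep these classes out of the jar is to give the Spark and scala-library dependencies `provided` scope instead of deleting the directory by hand.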

Related

spark-sql/spark-submit with delta lake is resulting null pointer exception (at org.apache.spark.storage.BlockManagerMasterEndpoint)

I'm trying to use Delta Lake with Spark (PySpark), and I start it with the command below:
spark-sql --packages io.delta:delta-core_2.12:0.8.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
System Specs:
Spark - 3.0.3
scala - 2.12.10
java - 1.8.0
hadoop - 2.7
I'm following these references: https://docs.delta.io/latest/quick-start.html
https://www.confessionsofadataguy.com/introduction-to-delta-lake-on-apache-spark-for-data-engineers/
When I run Spark without the Delta config, it works fine.
Error (stack trace truncated):
22/12/29 20:38:10 INFO Executor: Starting executor ID driver on host 192.168.0.100
22/12/29 20:38:10 INFO Executor: Fetching spark://192.168.0.100:50920/jars/org.antlr_antlr4-runtime-4.7.jar with timestamp 1672326488255
22/12/29 20:38:23 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
22/12/29 20:38:23 INFO Executor: Told to re-register on heartbeat
22/12/29 20:38:23 INFO BlockManager: BlockManager null re-registering with master
22/12/29 20:38:23 INFO BlockManagerMaster: Registering BlockManager null
22/12/29 20:38:23 ERROR Inbox: Ignoring error
java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
22/12/29 20:38:23 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66)
at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:567)
at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:934)
at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:200)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1934)
at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:404)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:97)
at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
22/12/29 20:38:31 ERROR Utils: Aborting task
java.io.IOException: Failed to connect to /192.168.0.100:50920
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:49)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:321)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /192.168.0.100:50920
Caused by: java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at java.lang.Thread.run(Thread.java:745)
22/12/29 20:38:31 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: Failed to connect to /192.168.0.100:50920
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:392)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:360)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
22/12/29 20:38:31 INFO SparkUI: Stopped Spark web UI at http://192.168.0.100:4041
22/12/29 20:38:31 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:168)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:734)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2171)
java.lang.NullPointerException
at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:304)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:221)
at org.apache.spark.executor.Executor.stop(Executor.scala:304)
at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:74)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
22/12/29 20:38:31 INFO ShutdownHookManager: Shutdown hook called
What am I missing?
Try it like this:
# Notebook cell syntax; in a shell, run: pip3 install findspark --user
!pip3 install findspark --user
import findspark
findspark.init()
import os
from pyspark.sql import SparkSession
# The package coordinate must not contain spaces.
# Note: delta-core 2.2.0 targets Spark 3.3.x; on Spark 3.0.x keep delta-core 0.8.0.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages io.delta:delta-core_2.12:2.2.0 --driver-memory 2g pyspark-shell'
spark = SparkSession.builder.appName("your application") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.2.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()
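The `--packages` value has to be a single Maven coordinate with no whitespace in it, because the submit-args string is split on whitespace and a stray space would turn one coordinate into two separate arguments. A small sketch of building that string programmatically is below; the helper names `delta_package` and `submit_args` are hypothetical, and the Scala binary version default of 2.12 is taken from the system specs in the question.

```python
def delta_package(delta_version, scala_binary="2.12"):
    """Build the Maven coordinate for delta-core, rejecting stray whitespace."""
    coordinate = "io.delta:delta-core_{}:{}".format(scala_binary, delta_version)
    if any(ch.isspace() for ch in coordinate):
        raise ValueError("whitespace in Maven coordinate: {!r}".format(coordinate))
    return coordinate


def submit_args(delta_version, driver_memory="2g"):
    """Assemble a PYSPARK_SUBMIT_ARGS value that pulls in delta-core."""
    return "--packages {} --driver-memory {} pyspark-shell".format(
        delta_package(delta_version), driver_memory
    )
```

With this, `os.environ['PYSPARK_SUBMIT_ARGS'] = submit_args("0.8.0")` would set the variable for the Delta release that matches Spark 3.0.x, and the same `delta_package(...)` result can be passed to `spark.jars.packages` so the two settings cannot drift apart.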

apache-spark: rdd.unpersist crashes for large files

My code runs on EMR with Spark 2.0.2.
It works fine for smaller files but frequently crashes for files larger than 15 GB.
The crash happens in the unpersist function, which incidentally is the last processing step.
Any ideas would be very helpful.
Thanks!
17/05/06 23:46:01 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /10.0.2.149:56200 is closed
17/05/06 23:46:01 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 7.
17/05/06 23:46:01 INFO DAGScheduler: Executor lost: 7 (epoch 5)
17/05/06 23:46:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 7 from BlockManagerMaster.
17/05/06 23:46:01 WARN BlockManagerMaster: Failed to remove RDD 43 - Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
17/05/06 23:46:01 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(7, ip-10-0-2-149.eu-west-1.compute.internal, 36043)
17/05/06 23:46:01 INFO BlockManagerMaster: Removed 7 successfully in removeExecutor
17/05/06 23:46:01 INFO YarnScheduler: Executor 7 on ip-10-0-2-149.eu-west-1.compute.internal killed by driver.
Traceback (most recent call last):
File "/mnt/update_hid.py", line 565, in <module>
process(current_date)
File "/mnt/update_hid.py", line 517, in process
get_missing_ip=get_missing_ip)
File "/mnt/update_hid.py", line 466, in add_migration_info
17/05/06 23:46:01 INFO ExecutorAllocationManager: Existing executor 7 has been removed (new total is 11)
Hid.unpersist()
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 251, in unpersist
File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/lib/spark/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o205.unpersist.
: org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:117)
at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1683)
at org.apache.spark.rdd.RDD.unpersist(RDD.scala:212)
at org.apache.spark.api.java.JavaRDD.unpersist(JavaRDD.scala:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
17/05/06 23:46:02 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 9
17/05/06 23:46:02 INFO ExecutorAllocationManager: Removing executor 9 because it has been idle for 60 seconds (new desired total will be 10)
17/05/06 23:46:02 INFO SparkContext: Invoking stop() from shutdown hook
17/05/06 23:46:02 INFO SparkUI: Stopped Spark web UI at http://10.0.2.182:4040
17/05/06 23:46:02 INFO YarnClientSchedulerBackend: Interrupting monitor thread
17/05/06 23:46:02 INFO YarnClientSchedulerBackend: Shutting down all executors
17/05/06 23:46:02 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/05/06 23:46:02 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/05/06 23:46:02 INFO YarnClientSchedulerBackend: Stopped
17/05/06 23:46:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/06 23:46:02 INFO MemoryStore: MemoryStore cleared
17/05/06 23:46:02 INFO BlockManager: BlockManager stopped
17/05/06 23:46:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/05/06 23:46:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/06 23:46:02 INFO SparkContext: Successfully stopped SparkContext
17/05/06 23:46:02 INFO ShutdownHookManager: Shutdown hook called
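In the traceback above, the exception comes out of the final `unpersist()` call after the connection to executor 7 was reset. One possible stopgap, assuming all results have already been written by that point, is to treat a failed unpersist as non-fatal. This is only a sketch: `unpersist_quietly` is a hypothetical helper, and it does not address why the executor connection was lost in the first place.

```python
import logging

logger = logging.getLogger("cleanup")


def unpersist_quietly(rdd):
    """Call rdd.unpersist() and log, rather than raise, if it fails.

    Returns True if the call succeeded and False otherwise.
    """
    try:
        rdd.unpersist()
        return True
    except Exception as exc:  # for PySpark this is typically py4j.protocol.Py4JJavaError
        logger.warning("unpersist failed, continuing: %s", exc)
        return False
```

In the script from the traceback, `Hid.unpersist()` would become `unpersist_quietly(Hid)`. Cached blocks are released anyway when the SparkContext stops, so skipping a failed unpersist at the very end of a job should not leak memory beyond the application's lifetime.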

Docker memory leak with sonarqube

I am trying to run a Docker container that contains SonarQube.
After building the container, I ran the command below to start it. For the first few moments it looks fine (I can see an Up status in docker ps -a), but then it exits on its own.
The command I typed was:
docker run -d --name sonarqube \
  -p 9000:9000 -p 9092:9092 \
  -e SONARQUBE_JDBC_USERNAME=sonar \
  -e SONARQUBE_JDBC_PASSWORD=sonar \
  -e SONARQUBE_JDBC_URL="jdbc:mysql://111.222.33.444:3306/sonar?characterEncoding=utf8&useUnicode=true&rewriteBatchedStatements=true" \
  sonarqube
And the following is the failure log:
2017.04.21 06:39:37 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Pip the Troll][[timer]]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
java.lang.Thread.sleep(Native Method)
org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:747)
2017.04.21 06:39:37 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Pip the Troll][scheduler][T#1]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
2017.04.21 06:39:37 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Pip the Troll][transport_client_worker][T#1]{New I/O worker #1}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
2017.04.21 06:39:37 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Pip the Troll][transport_client_worker][T#2]{New I/O worker #2}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
2017.04.21 06:39:37 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Pip the Troll][transport_client_worker][T#3]{New I/O worker #3}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
From this, it seems the main cause of the automatic shutdown is a memory leak. How can I fix this?
FYI, without the JDBC settings, it works fine.
================= EDIT ==================
Maybe I should provide more information to fix this.
When I run docker run and then immediately docker logs sonarqube, the log looks like this:
[root#DCSF-DEV08 ice]# docker logs sonarqube
01:00:46.930 [main] WARN org.sonar.application.JdbcSettings - JDBC URL is recommended to have the property 'useConfigs=maxPerformance'
2017.04.24 01:00:47 INFO app[][o.s.a.AppFileSystem] Cleaning or creating temp directory /opt/sonarqube/temp
2017.04.24 01:00:47 INFO app[][o.s.p.m.JavaProcessLauncher] Launch process[es]: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Djava.awt.headless=true -Xmx1G -Xms256m -Xss256k -Djna.nosys=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonarqube/temp -javaagent:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar -cp ./lib/common/*:./lib/search/* org.sonar.search.SearchServer /opt/sonarqube/temp/sq-process7897977644818879465properties
2017.04.24 01:00:47 INFO es[][o.s.p.ProcessEntryPoint] Starting es
2017.04.24 01:00:47 INFO es[][o.s.s.EsSettings] Elasticsearch listening on /127.0.0.1:9001
2017.04.24 01:00:47 INFO es[][o.elasticsearch.node] [sonarqube] version[2.4.4], pid[45], build[fcbb46d/2017-01-03T11:33:16Z]
2017.04.24 01:00:47 INFO es[][o.elasticsearch.node] [sonarqube] initializing ...
2017.04.24 01:00:47 INFO es[][o.e.plugins] [sonarqube] modules [], plugins [], sites []
2017.04.24 01:00:47 INFO es[][o.elasticsearch.env] [sonarqube] using [1] data paths, mounts [[/opt/sonarqube/data (/dev/mapper/centos-root)]], net usable_space [7gb], net total_space [49.9gb], spins? [possibly], types [xfs]
2017.04.24 01:00:47 INFO es[][o.elasticsearch.env] [sonarqube] heap size [989.8mb], compressed ordinary object pointers [true]
2017.04.24 01:00:49 INFO es[][o.elasticsearch.node] [sonarqube] initialized
2017.04.24 01:00:49 INFO es[][o.elasticsearch.node] [sonarqube] starting ...
2017.04.24 01:00:49 INFO es[][o.e.transport] [sonarqube] publish_address {127.0.0.1:9001}, bound_addresses {127.0.0.1:9001}
2017.04.24 01:00:49 INFO es[][o.e.discovery] [sonarqube] sonarqube/GPO7RRqHR8a8tfu1KfgVtw
But after a few seconds, an error occurs and the container exits. The first error seems to be related to Elasticsearch.
2017.04.24 01:00:52 INFO es[][o.elasticsearch.node] [sonarqube] started
2017.04.24 01:00:52 INFO es[][o.e.gateway] [sonarqube] recovered [0] indices into cluster_state
2017.04.24 01:00:52 INFO app[][o.s.p.m.Monitor] Process[es] is up
2017.04.24 01:00:52 INFO app[][o.s.p.m.JavaProcessLauncher] Launch process[web]: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xmx512m -Xms128m -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/./urandom -Djava.io.tmpdir=/opt/sonarqube/temp -javaagent:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar -cp ./lib/common/*:./lib/server/*:/opt/sonarqube/lib/jdbc/mysql/mysql-connector-java-5.1.39.jar org.sonar.server.app.WebServer /opt/sonarqube/temp/sq-process8670082336569494309properties
2017.04.24 01:00:53 INFO web[][o.s.p.ProcessEntryPoint] Starting web
2017.04.24 01:00:53 INFO web[][o.a.t.u.n.NioSelectorPool] Using a shared selector for servlet write/read
2017.04.24 01:00:54 INFO web[][o.e.plugins] [Immortus] modules [], plugins [], sites []
2017.04.24 01:00:54 INFO web[][o.s.s.e.EsClientProvider] Connected to local Elasticsearch: [127.0.0.1:9001]
2017.04.24 01:00:54 INFO web[][o.s.s.p.LogServerVersion] SonarQube Server / 6.3.0.19869 / 43ea4f4c43aa89d4c435017f86d0da254e115e6b
2017.04.24 01:00:54 INFO web[][o.sonar.db.Database] Create JDBC data source for jdbc:mysql://125.131.88.156:3306/sonar?characterEncoding=utf8&useUnicode=true&rewriteBatchedStatements=true
2017.04.24 01:00:55 ERROR web[][o.a.c.c.C.[.[.[/]] Exception sending context initialized event to listener instance of class org.sonar.server.platform.web.PlatformServletContextListener
org.sonar.api.utils.MessageException: Unsupported mysql version: 5.5. Minimal supported version is 5.6.
2017.04.24 01:00:55 ERROR web[][o.a.c.c.StandardContext] One or more listeners failed to start. Full details will be found in the appropriate container log file
2017.04.24 01:00:55 ERROR web[][o.a.c.c.StandardContext] Context [] startup failed due to previous errors
2017.04.24 01:00:55 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Immortus][[timer]]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
java.lang.Thread.sleep(Native Method)
org.elasticsearch.threadpool.ThreadPool$EstimatedTimeThread.run(ThreadPool.java:747)
2017.04.24 01:00:55 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Immortus][scheduler][T#1]] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
2017.04.24 01:00:55 WARN web[][o.a.c.l.WebappClassLoaderBase] The web application [ROOT] appears to have started a thread named [elasticsearch[Immortus][transport_client_worker][T#1]{New I/O worker #1}] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:68)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:434)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
I think the JDBC URL looks fine. I can access the database with sqllite.
Thanks for the answers.
I have changed my database system from MariaDB 10.1.22 to MySQL 5.7. There are some posts about this problem, but it does not seem to be solved yet. For now, SonarQube cannot be used with some versions of MariaDB.
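For what it's worth, the memory-leak warnings look like a side effect rather than the cause: the only ERROR lines before them in the second log start with the MessageException "Unsupported mysql version: 5.5. Minimal supported version is 5.6." MariaDB 10.x servers commonly report a version string prefixed with "5.5.5-", which would be consistent with SonarQube seeing 5.5 here, although the post does not show the actual string. Below is a minimal Python sketch of such a minimum-version check. The function names and the sample MariaDB string are made up for illustration; this is not SonarQube's actual code.

```python
import re

# Minimum version taken from the error message in the log above.
MIN_SUPPORTED = (5, 6)


def parse_major_minor(version_string):
    """Return (major, minor) from the start of a server version string."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def is_supported(version_string, minimum=MIN_SUPPORTED):
    """True if the reported major.minor is at least the minimum."""
    return parse_major_minor(version_string) >= minimum


if __name__ == "__main__":
    # Hypothetical version strings; the MariaDB one is an assumed example.
    for reported in ("5.7.18", "5.5.5-10.1.22-MariaDB"):
        print(reported, "supported" if is_supported(reported) else "unsupported")
```

A check like this only looks at the leading major.minor numbers, so a prefixed version string is rejected regardless of the real server version behind it.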
Possibly a database connectivity issue. Check your address for typos and make sure your credentials are valid.

How to understand the YARN appattempt log and diagnostics?

I am running a Spark application on a YARN cluster (on AWS EMR). The application seems to be killed and I want to find the cause. I am trying to understand the YARN information shown in the following screen.
The diagnostics line in the screen seems to show that YARN is killing the app because of the memory limit:
Diagnostics: Container [pid=1540,containerID=container_1488651686158_0012_02_000001] is running beyond physical memory limits. Current usage: 1.6 GB of 1.4 GB physical memory used; 3.6 GB of 6.9 GB virtual memory used. Killing container.
However, the appattempt log shows a completely different exception, something related to IO/network. My question is: should I trust the diagnostics in the screen or the appattempt log? Is the IO exception causing the kill, or is running out of memory causing the IO exception in the appattempt log? Is there another log/diagnostic I should look at? Thanks.
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "task-result-getter-0" java.lang.Error: java.lang.InterruptedException
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
... 2 more
17/03/04 21:59:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/04 21:59:02 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-31-9-207.ec2.internal/172.31.9.207:38437 is closed
17/03/04 21:59:02 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms
17/03/04 21:59:02 ERROR DiskBlockManager: Exception while deleting local spark dir: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
java.io.IOException: Failed to delete: /mnt/yarn/usercache/hadoop/appcache/application_1488651686158_0012/blockmgr-941a13d8-1b31-4347-bdec-180125b6f4ca
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:169)
at org.apache.spark.storage.DiskBlockManager$$anonfun$org$apache$spark$storage$DiskBlockManager$$doStop$1.apply(DiskBlockManager.scala:165)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:165)
at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:160)
at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1361)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:89)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 INFO MemoryStore: MemoryStore cleared
17/03/04 21:59:02 INFO BlockManager: BlockManager stopped
17/03/04 21:59:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/04 21:59:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/04 21:59:02 ERROR Utils: Uncaught exception in thread Thread-3
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
17/03/04 21:59:02 WARN ShutdownHookManager: ShutdownHook '$anon$2' failed, java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
java.lang.NoClassDefFoundError: Could not initialize class java.nio.file.FileSystems$DefaultFileSystemHolder
at java.nio.file.FileSystems.getDefault(FileSystems.java:176)
at java.nio.file.Paths.get(Paths.java:138)
at org.apache.spark.util.Utils$.isSymlink(Utils.scala:1021)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:991)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:102)
at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1842)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1841)
at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
The information in your screenshot is the most relevant. Your ApplicationMaster container ran out of memory. You need to increase yarn.app.mapreduce.am.resource.mb which is set in mapred-site.xml. I recommend a value of 2000 since that will usually accommodate running Spark and MapReduce applications at scale.
The container was killed (memory exceeds physical memory limits) so any attempt to reach this container fails.
YARN is fine for an overall view of the process, but you should prefer the Spark history server to analyse your job in more detail (check for unbalanced memory usage in the Spark history).
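To compare the two numbers in the Diagnostics line programmatically (for example when scanning many application reports), a regular expression is enough. This is a rough Python sketch: the function and field names are made up, and the pattern has only been matched against the wording of the message quoted in the question, so other Hadoop versions may format it differently.

```python
import re

# Diagnostics text copied from the question.
SAMPLE = (
    "Diagnostics: Container [pid=1540,containerID=container_1488651686158_0012_02_000001] "
    "is running beyond physical memory limits. Current usage: 1.6 GB of 1.4 GB physical "
    "memory used; 3.6 GB of 6.9 GB virtual memory used. Killing container."
)

UNIT_TO_MB = {"KB": 1.0 / 1024, "MB": 1.0, "GB": 1024.0, "TB": 1024.0 * 1024.0}

USAGE_RE = re.compile(
    r"containerID=(?P<cid>container_\w+)\].*?"
    r"Current usage: (?P<pu>[\d.]+) (?P<puu>[KMGT]B) of (?P<pl>[\d.]+) (?P<plu>[KMGT]B) "
    r"physical memory used; "
    r"(?P<vu>[\d.]+) (?P<vuu>[KMGT]B) of (?P<vl>[\d.]+) (?P<vlu>[KMGT]B) "
    r"virtual memory used"
)


def to_mb(value, unit):
    """Convert a '1.6', 'GB' pair into megabytes."""
    return float(value) * UNIT_TO_MB[unit]


def parse_memory_diagnostics(text):
    """Extract container id and memory usage/limits, or None if no match."""
    m = USAGE_RE.search(text)
    if m is None:
        return None
    return {
        "container_id": m.group("cid"),
        "physical_used_mb": to_mb(m.group("pu"), m.group("puu")),
        "physical_limit_mb": to_mb(m.group("pl"), m.group("plu")),
        "virtual_used_mb": to_mb(m.group("vu"), m.group("vuu")),
        "virtual_limit_mb": to_mb(m.group("vl"), m.group("vlu")),
    }


if __name__ == "__main__":
    info = parse_memory_diagnostics(SAMPLE)
    over = info["physical_used_mb"] - info["physical_limit_mb"]
    print(info["container_id"], "exceeded its physical limit by %.1f MB" % over)
```

In the quoted message only the physical limit is exceeded; the virtual memory usage is still well below its limit.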

Livy pyspark Python Session Error in Jupyter with Spark Magic - ERROR repl.PythonInterpreter: Process has died with 1

I'm running a Spark v2.0.0 YARN cluster. I have Livy running beside the Spark master.
I have set up a Jupyter Python 3 notebook, installed Spark Magic, and followed the necessary instructions to connect Spark Magic to Livy. However, when I create my session I get an error message from the notebook.
Added endpoint http://spark-master:8998
Starting Spark application
ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session?
0 | None | pyspark | idle | | | ✔
---------------------------------------------------------------------------
LivyUnexpectedStatusException Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/hdijupyterutils/ipywidgetfactory.py in submit_clicked(self, button)
63
64 def submit_clicked(self, button):
---> 65 self.parent_widget.run()
/opt/conda/lib/python3.5/site-packages/sparkmagic/controllerwidget/createsessionwidget.py in run(self)
56
57 try:
---> 58 self.spark_controller.add_session(alias, endpoint, skip, properties)
59 except ValueError as e:
60 self.ipython_display.send_error("""Could not add session with
/opt/conda/lib/python3.5/site-packages/sparkmagic/livyclientlib/sparkcontroller.py in add_session(self, name, endpoint, skip_if_exists, properties)
79 session = self._livy_session(http_client, properties, self.ipython_display)
80 self.session_manager.add_session(name, session)
---> 81 session.start()
82
83 def get_session_id_for_client(self, name):
/opt/conda/lib/python3.5/site-packages/sparkmagic/livyclientlib/livysession.py in start(self)
148 else:
149 command = Command("sqlContext")
--> 150 (success, out) = command.execute(self)
151 if success:
152 self.ipython_display.writeln(u"SparkContext available as 'sc'.")
/opt/conda/lib/python3.5/site-packages/sparkmagic/livyclientlib/command.py in execute(self, session)
29 statement_id = -1
30 try:
---> 31 session.wait_for_idle()
32 data = {u"code": self.code}
33 response = session.http_client.post_statement(session.id, data)
/opt/conda/lib/python3.5/site-packages/sparkmagic/livyclientlib/livysession.py in wait_for_idle(self, seconds_to_wait)
238 .format(self.id, self.status)
239 self.logger.error(error)
--> 240 raise LivyUnexpectedStatusException(u'{} See logs:\n{}'.format(error, self.get_logs()))
241
242 if seconds_to_wait <= 0.0:
LivyUnexpectedStatusException: Session 0 unexpectedly reached final status 'error'. See logs:
This is the error I get in the Livy logs when creating a new session in the Manage Spark section of Jupyter:
17/02/10 13:06:08 INFO StateStore$: Using BlackholeStateStore for recovery.
17/02/10 13:06:08 INFO BatchSessionManager: Recovered 0 batch sessions. Next session id: 0
17/02/10 13:06:08 INFO InteractiveSessionManager: Recovered 0 interactive sessions. Next session id: 0
17/02/10 13:06:08 INFO InteractiveSessionManager: Heartbeat watchdog thread started.
17/02/10 13:06:08 INFO WebServer: Starting server on http://spark-master:8998
17/02/10 13:06:34 INFO InteractiveSession$: Creating LivyClient for sessionId: 0
17/02/10 13:06:34 WARN RSCConf: Your hostname, spark-master, resolves to a loopback address, but we couldn't find any external IP address!
17/02/10 13:06:34 WARN RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
17/02/10 13:06:35 INFO InteractiveSessionManager: Registering new session 0
17/02/10 13:06:35 INFO ContextLauncher: 17/02/10 13:06:35 INFO driver.RSCDriver: Starting RPC server...
17/02/10 13:06:35 INFO ContextLauncher: 17/02/10 13:06:35 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
17/02/10 13:06:35 INFO ContextLauncher: 17/02/10 13:06:35 INFO driver.RSCDriver: Received job request 3ca8a52b-8dd5-41f0-8151-a8201d72d422
17/02/10 13:06:35 INFO ContextLauncher: 17/02/10 13:06:35 INFO driver.RSCDriver: SparkContext not yet up, queueing job request.
17/02/10 13:06:36 INFO ContextLauncher: Setting default log level to "WARN".
17/02/10 13:06:36 INFO ContextLauncher: To adjust logging level use sc.setLogLevel(newLevel).
17/02/10 13:06:36 INFO ContextLauncher: 17/02/10 13:06:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/10 13:06:37 INFO ContextLauncher: 17/02/10 13:06:37 ERROR repl.PythonInterpreter: Process has died with 1
17/02/10 13:06:37 INFO RSCClient: Received result for 3ca8a52b-8dd5-41f0-8151-a8201d72d422
and get this output in the livy logs
I'm unable to put my finger on what the exact issue or fix is. I can create a session successfully if I set it to use Scala instead of Python; I only get the error when the session language is Python. If someone knows how to get a livy-repl pyspark session working in Jupyter, please let me know!
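The Livy log above also contains the RSCConf warning that the hostname resolves to a loopback address. Whether that has anything to do with the Python interpreter dying is not clear from the logs, but it is cheap to check on each node. A small Python sketch follows; the helper names are made up, and the real hostname has to be passed in by hand.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if an IPv4/IPv6 address string is a loopback address."""
    # Drop an IPv6 zone id such as '%eth0' before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


def resolved_addresses(hostname):
    """All addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def resolves_only_to_loopback(hostname):
    """True if every resolved address is a loopback address."""
    addresses = resolved_addresses(hostname)
    return bool(addresses) and all(is_loopback(a) for a in addresses)


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, resolved_addresses(name), resolves_only_to_loopback(name))
```

If this reports only loopback addresses on the Livy host, the warning itself suggests setting livy.rsc.rpc.server.address.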
UPDATE
Livy still fails to create a PySpark session.
curl -v -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" example.com/sessions
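The same request can be sketched in Python with only the standard library, which also makes it easy to pull the session log straight after the failure. This assumes the Livy REST endpoints POST /sessions and GET /sessions/{id}/log; the function names and the base URL (taken from the "Added endpoint" line earlier in the post) are illustrative.

```python
import json
import urllib.request


def build_create_session_request(base_url, kind="pyspark"):
    """Build the POST /sessions request that the curl command sends."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout=10):
    """Send a request (or GET a URL) and decode the JSON response."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def session_log_url(base_url, session_id):
    """URL of the log endpoint for one session."""
    return "{}/sessions/{}/log".format(base_url.rstrip("/"), session_id)


if __name__ == "__main__":
    base = "http://spark-master:8998"  # assumed endpoint, adjust as needed
    session = fetch_json(build_create_session_request(base))
    print("session", session.get("id"), "state", session.get("state"))
    log = fetch_json(session_log_url(base, session["id"]))
    print("\n".join(log.get("log", [])))
```

The log may still be empty right after creation, so in practice the second call is worth repeating once the session state has changed.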
The session state goes straight from "starting" to "failed". The YARN logs on the Resource Manager show the following right before the Livy session fails.
To adjust logging level use sc.setLogLevel(newLevel).
17/02/26 05:02:25 WARN rsc.RSCConf: Your hostname, yarn-slave1, resolves to a loopback address, but we couldn't find any external IP address!
17/02/26 05:02:25 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
17/02/26 05:02:32 ERROR repl.PythonInterpreter: Process has died with 1
17/02/26 05:02:33 WARN yarn.YarnAllocator: Container marked as failed: container_1488085279373_0001_01_000002 on host: yarn-slave1. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488085279373_0001_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/02/26 05:02:33 WARN yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.IllegalStateException: RpcEnv already stopped.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:185)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:508)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:531)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:512)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:512)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:442)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.deploy.yarn.YarnAllocator.processCompletedContainers(YarnAllocator.scala:442)
at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:242)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:372)
17/02/26 05:02:40 WARN yarn.YarnAllocator: Container marked as failed: container_1488085279373_0001_01_000005 on host: yarn-slave1. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488085279373_0001_01_000005
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/02/26 05:02:40 WARN yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.IllegalStateException: RpcEnv already stopped.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:185)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:508)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:531)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:512)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:512)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:442)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.deploy.yarn.YarnAllocator.processCompletedContainers(YarnAllocator.scala:442)
at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:242)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:372)
17/02/26 05:02:47 WARN yarn.YarnAllocator: Container marked as failed: container_1488085279373_0001_01_000006 on host: yarn-slave1. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488085279373_0001_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/02/26 05:02:47 WARN yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.IllegalStateException: RpcEnv already stopped.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:185)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:508)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:531)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:512)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:512)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:442)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.deploy.yarn.YarnAllocator.processCompletedContainers(YarnAllocator.scala:442)
at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:242)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:372)
17/02/26 05:02:53 WARN yarn.YarnAllocator: Container marked as failed: container_1488085279373_0001_01_000007 on host: yarn-slave1. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1488085279373_0001_01_000007
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
17/02/26 05:02:53 WARN yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.IllegalStateException: RpcEnv already stopped.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:159)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:185)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:508)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:531)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1$$anonfun$apply$7.apply(YarnAllocator.scala:512)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:512)
at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$processCompletedContainers$1.apply(YarnAllocator.scala:442)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at org.apache.spark.deploy.yarn.YarnAllocator.processCompletedContainers(YarnAllocator.scala:442)
at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:242)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:372)
spark-defaults.conf
spark.yarn.appMasterEnv.PYSPARK_PYTHON python2
core-site.xml
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
</property>
livy.conf
livy.server.host = 0.0.0.0
livy.server.port = 8998
livy.spark.master = yarn
livy.spark.deployMode = cluster
I was able to reproduce this issue.
The problem seems to be that Spark 2.0.0 and Livy have incompatible pyspark versions. If you update Spark to the most recent version (currently 2.1.0), the pyspark versions can communicate and the Spark session is created without a hitch.
I faced a similar issue even with Spark 2.1.1 and Livy: the Livy session status went from "starting" to "error". It turned out that I was using Java 7, while Livy and Spark need Java 8. Switching to Java 8 solved my issue.
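A quick way to rule out the Java mismatch described above is to parse the output of `java -version` on each host involved. The following is only a minimal sketch: the function names `java_major_version` and `installed_java_major` are hypothetical, and it assumes the `java` found on the PATH is the same one Livy and the YARN containers actually use, which may not hold if JAVA_HOME is set differently for those services.

```python
import re
import subprocess


def java_major_version(version_output):
    """Parse the major Java version from `java -version` output, or return None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.7.0_80" means Java 7. Newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` (which prints to stderr) and return the major version."""
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Could not determine the Java version")
    elif major < 8:
        print("Java %d found; the answer above reports that Java 8 is needed" % major)
    else:
        print("Java %d found" % major)
```

The parsing is kept separate from the subprocess call so it can be checked against version strings copied from other machines.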
I was facing a similar issue. It turned out the culprit was the Livy version. When I replaced Cloudera Livy with the Apache livy-0.6.0-incubating release, the problem was solved, and I was able to create a session of kind pyspark on Livy.
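To see whether a pyspark session reaches a usable state after any of the changes suggested above, one can create it through Livy's REST API and poll its state. This is a sketch under stated assumptions rather than a definitive client: `LIVY_URL` assumes Livy is reachable on localhost at the port 8998 from the livy.conf in the question, the helper names are hypothetical, and it assumes the server accepts unauthenticated requests without extra headers.

```python
import json
import time
import urllib.request

# Assumption: host is hypothetical; port 8998 comes from livy.conf in the question.
LIVY_URL = "http://localhost:8998"

# States in which a Livy session will not become usable.
FAILED_STATES = {"error", "dead", "killed"}


def session_request(kind="pyspark"):
    """Build the JSON body for POST /sessions."""
    return json.dumps({"kind": kind}).encode("utf-8")


def classify_state(state):
    """Map a Livy session state to 'ready', 'failed', or 'pending'."""
    if state == "idle":
        return "ready"
    if state in FAILED_STATES:
        return "failed"
    return "pending"


def wait_for_session(base_url=LIVY_URL, kind="pyspark", poll_seconds=2.0, max_polls=60):
    """Create a session and poll until it is ready, has failed, or polling gives up."""
    request = urllib.request.Request(
        base_url + "/sessions",
        data=session_request(kind),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        session = json.load(response)
    for _ in range(max_polls):
        outcome = classify_state(session["state"])
        if outcome != "pending":
            return session["id"], outcome
        time.sleep(poll_seconds)
        url = "%s/sessions/%d" % (base_url, session["id"])
        with urllib.request.urlopen(url) as response:
            session = json.load(response)
    return session["id"], "pending"


if __name__ == "__main__":
    session_id, outcome = wait_for_session()
    print("session %d: %s" % (session_id, outcome))
```

A "failed" outcome here corresponds to the "starting" to "error" transition reported in the answers; the YARN container logs for the application are then the place to look for the cause.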
