Exception running /etc/hadoop/conf.cloudera.yarn/topology.py - apache-spark

Any time I try to run a Spark application on a Cloudera CDH 5.4.4 cluster, Yarn client mode, I get the following exception (repeated many times in the stack trace). The process continue anyway (it is a warning), but it is imposible to find something in the logs. How can I solve it?
15/09/01 08:53:58 WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.0.0.5
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "/home/azureuser/scripts/streaming"): error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:271)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:263)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:263)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:131)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 32 more

Faced the same problem while submitting a Spark job.
Most probably you don't have the "/etc/hadoop/conf.cloudera.yarn/topology.py" file on the server you ran the Spark application. Or "/etc/hadoop/conf.cloudera.yarn/topology.py" is not available on one or more servers.
You can also resolve this problem by running the application in yarn cluster mode.
--master yarn-cluster

chmod 755 /etc/hadoop/conf.cloudera.yarn/topology.py

Found a solution here: https://groups.google.com/a/cloudera.org/forum/#!searchin/cdh-user/Exception$20running$20$2Fetc$2Fhadoop$2Fconf.cloudera.yarn$2Ftopology.py/cdh-user/fte4IPjX8TU/0jUOGkXyCAAJ
Just add permissions to the user that launches the script on all the directories that form the path of the script (for reading and listing files).

I face the same issue. you can scp /etc/hadoop/conf.cloudera.yarn Username#Host:/etc/hadoop
in the others datanode in your clusters to your submit-shell computer and notices the permissions. Hoping it's fine for you.

Related

Cannot run livy in sparkR mode on zeppelin 0.8

We've install the zeppelin that is one of HDP 3.1.5 service and install the suitable R version (from RHEL 7 repos) that compatible with the zeppelin version (0.8).
At first, when we scripting and running the Spark code using spark interpreter especially using sparkR one, it runs well but when we change the interpreter into livy, we always got an error with message "Fail to start interpreter".
When we check into Yarn UI, the application is created and running but when it's checked into spark ui, we've got the stderror that said
22/07/28 10:21:17 WARN SparkRInterpreter$: Fail to init Spark RBackend, using different method signature
java.lang.ClassCastException: scala.Tuple2 cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
at org.apache.livy.repl.SparkRInterpreter$$anon$1.run(SparkRInterpreter.scala:88)
22/07/28 10:21:17 WARN Session: Fail to start interpreter sparkr
java.io.IOException: Cannot run program "R": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.livy.repl.SparkRInterpreter$.apply(SparkRInterpreter.scala:143)
at org.apache.livy.repl.Session.liftedTree1$1(Session.scala:107)
at org.apache.livy.repl.Session.interpreter(Session.scala:98)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply$mcV$sp(Session.scala:168)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at org.apache.livy.repl.Session$$anonfun$execute$1.apply(Session.scala:163)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 11 more
Is there any step/configuration that we skip?
-Thanks-

Spark job fails: storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file

I have a Spark (1.4.1) application, running on Yarn, that fails with the following executor log entry:
16/07/21 23:09:08 ERROR executor.CoarseGrainedExecutorBackend: Driver 9.4.136.20:55995 disassociated! Shutting down.
16/07/21 23:09:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed
java.io.FileNotFoundException: /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:189)
at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:328)
at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:257)
at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:95)
at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:83)
at org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:95)
at org.apache.spark.util.collection.ExternalSorter.maybeSpillCollection(ExternalSorter.scala:240)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:220)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any clues as to what might have gone wrong?
The reason caused by temp shuffle file is deleted. There are many reasons, for one which I met is because the other executor was killed by Yarn. After the executor killed, a SHUT_DOWN signal will be sent to other executors, then the ShutdownHookManager will delete all the temp files which have registered to ShutdownHookManager. That's why you see the error. So you maybe need to check whether there are any ShutdownHookManager called log.
You can try to improve spark.yarn.executor.memoryOverhead.

Spark YARN client on Windows 7 issue

I am trying to execute
spark-submit --master yarn-client
on windows 7 client machine for CDH 5.4.5. cluster.
I downloaded spark 1.5. assembly from spark.apache.org.
Then downloaded yarn-config from cloudera manager running on cluster, and wrote its path to env variable YARN_CONF on client.
Yarn application worked correctly, but client gets the exception
15/10/16 10:54:59 WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.20.52.104
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "C:\workspace\development\"): CreateProcess error=2, ═х єфрхЄё  эрщЄш єърчрээ√щ Їрщы
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:270)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:262)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:262)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:106)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:178)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaR
pcEnv.scala:127)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:198)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:126)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:93)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: CreateProcess error=2, ═х єфрхЄё  эрщЄш єърчрээ√щ Їрщы
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 38 more
Then I modified client configuration of yarn site.xml param "net.topology.script.file.name" to correct path and now client get the exception
15/10/16 10:48:57 WARN net.ScriptBasedMapping: Exception running C:\packages\hadoop-client\yarn-conf\topology.py 10.20.52.105
java.io.IOException: Cannot run program "C:\packages\hadoop-client\yarn-conf\topology.py" (in directory "C:\workspace\development\"): CreateProcess error=193, %1 эх  ты хЄё  яЁшыюцхэшхь Win32
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:213)
at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask$1.apply(TaskSetManager.scala:192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:192)
at org.apache.spark.scheduler.TaskSetManager$$anonfun$1.apply$mcVI$sp(TaskSetManager.scala:161)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
As I understood, spark can't invoke topology.py script correctly using python.exe on windows, but how to fix it?
Just a comment "net.topology.script.file.name" yarn param in site.xml.
I had the exact same problem as above on HortonWorks HDP2.4 when trying to access it with an iPython notebook with Spark. I solved it with the proposal from #mikhail-kramer above.
On the Windows client, I had to comment out the value of the net.topology.script.name variable in file core-site.xml that I had downloaded using Ambari. The commented out value now looks like this:
<property>
<name>net.topology.script.file.name</name>
<value><!--/etc/hadoop/conf/topology_script.py--></value>
</property>
I hope that this may help the next person who has the same problem in the future.

How to control the memory heap size of Spark History Server?

We have Spark (1.2) running on YARN with CDH 5.3.2, and Spark History Server.
For small jobs history server is able to works, but for few large jobs Spark History Server not able to retrieve logs/job history. and showing following error in
2015-04-09 09:50:48,061 WARN org.eclipse.jetty.servlet.ServletHandler: Error for /history/application_1428034115331_31584
org.spark-project.guava.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Java heap space
at org.spark-project.guava.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
at org.spark-project.guava.common.cache.LocalCache.get(LocalCache.java:4000)
at org.spark-project.guava.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at org.spark-project.guava.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:85)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.json4s.MonadicJValue$$anonfun$org$json4s$MonadicJValue$$findDirectByName$1.apply(MonadicJValue.scala:26)
at org.json4s.MonadicJValue$$anonfun$org$json4s$MonadicJValue$$findDirectByName$1.apply(MonadicJValue.scala:22)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.json4s.MonadicJValue.org$json4s$MonadicJValue$$findDirectByName(MonadicJValue.scala:22)
at org.json4s.MonadicJValue.$bslash(MonadicJValue.scala:16)
at org.apache.spark.util.JsonProtocol$.taskInfoFromJson(JsonProtocol.scala:560)
at org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:465)
at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:425)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2$$anonfun$apply$1.apply(ReplayListenerBus.scala:71)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2$$anonfun$apply$1.apply(ReplayListenerBus.scala:69)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2.apply(ReplayListenerBus.scala:69)
at org.apache.spark.scheduler.ReplayListenerBus$$anonfun$replay$2.apply(ReplayListenerBus.scala:55)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:55)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$getAppUI$1.apply(FsHistoryProvider.scala:128)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$getAppUI$1.apply(FsHistoryProvider.scala:117)
at scala.Option.map(Option.scala:145)
at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:117)
at org.apache.spark.deploy.history.HistoryServer$$anon$3.load(HistoryServer.scala:55)
at org.apache.spark.deploy.history.HistoryServer$$anon$3.load(HistoryServer.scala:53)
at org.spark-project.guava.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.spark-project.guava.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at org.spark-project.guava.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
I didn't find any way to set heap for spark history server ?
or is this related to YARN History server ?
Thanks
setting SPARK_DAEMON_MEMORY=2g in spark-env.sh helped me
I think one of the causes is that the log file is bigger than the daemon memory so that the history server cannot read the whole log into the heap.
You can check the size of the log files in /tmp/spark-events if you didn't specify the log directory. Otherwise, go to the specified folder of spark.eventLog.dir.
In my case, my log files are about 17g so that I need to set SPARK_DAEMON_MEMORY=20g in {SPARK_HOME}/conf/spark.env.sh.

Spark-Submit exception SparkException: Job aborted due to stage failure

Whenever I tried to run a spark-submit command like the one below I'm getting an exception. Please could someone suggest what's going wrong here.
My command:
spark-submit --class com.xyz.MyTestClass --master spark://<spark-master-IP>:7077 SparkTest.jar
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent failure: TID 7 on host <hostname> failed for unknown reason
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I am not sure if the parameter
--master spark://<spark-master-IP>:7077
is what you actually have written instead of the actual IP of the master node. If so, you should change it and type the IP or public DNS of the master, such as:
--master spark://ec2-XX-XX-XX-XX.eu-west-1.compute.amazonaws.com:7077
If that's not the case, I would appreciate if you could provide more information about the error of the application, just as pointed on the comments above. Also make sure that the --class parameter points to the actual main class of the application.

Resources