Spark job fails due to StackOverflowError - apache-spark

My Spark job uses MLlib to train a LogisticRegression model on some data, but it fails with a StackOverflowError. Here is the error message shown in the spark-shell:
java.lang.StackOverflowError
at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:48)
...
When I check the Spark UI, there is no failed stage or job! This is how I run my spark-shell:
spark-shell --num-executors 100 --driver-memory 20g --conf spark.driver.maxResultSize=5g --executor-memory 8g --executor-cores 3
I even tried to increase the stack size by adding the following option when running spark-shell, but it didn't help:
--conf "spark.driver.extraJavaOptions='-XX:ThreadStackSize=81920'"
What is the issue?

Related

Why does spark-shell/pyspark get fewer resources compared to spark-submit?

I am running PySpark code which runs on a single node due to code requirements.
But when I run the code using the pyspark shell
pyspark --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g
I get a segmentation error and the code fails. When I checked in YARN, I saw that my job was not getting resources even though they were available.
But when I use spark-submit
spark-submit --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g code.py
the code gets all the resources it needs and runs perfectly.
Am I missing some fundamental aspect of Spark, or why is this happening?
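One way to see what each launcher actually applied is to print the effective configuration and the registered executors from inside the session. A quick diagnostic sketch in the Scala spark-shell (in pyspark, sc.getConf().getAll() gives the same configuration dump):
// Dump every setting the SparkContext was started with
sc.getConf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
// Count registered executors (the returned map also contains an entry for the driver)
println(s"executors: ${sc.getExecutorMemoryStatus.size - 1}")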

java.lang.OutOfMemoryError: Java heap space in a Spark streaming job

I have a Spark streaming job that operates in batches of 10 minutes. The driver machine is an m4.4xlarge (64 GB) EC2 instance.
The job stalled after 18 hours and crashes with the following exception. From reading other posts it seems the driver may have run out of memory. How can I check this? My PySpark config is as follows.
Also, how do I check the memory in the Spark UI? I only see the 11 task nodes I have, not the driver.
export PYSPARK_SUBMIT_ARGS='--master yarn --deploy-mode client
--driver-memory 10g
--executor-memory 10g
--executor-cores 4
--conf spark.driver.cores=5
--packages "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2"
--conf spark.driver.maxResultSize=2g
--conf spark.shuffle.spill=true
--conf spark.yarn.driver.memoryOverhead=2048
--conf spark.yarn.executor.memoryOverhead=2048
--conf "spark.broadcast.blockSize=512M"
--conf "spark.memory.storageFraction=0.5"
--conf "spark.kryoserializer.buffer.max=1024"
--conf "spark.default.parallelism=600"
--conf "spark.sql.shuffle.partitions=600"
--driver-java-options -Dlog4j.configuration=file:///usr/lib/spark/conf/log4j.properties pyspark-shell'
[Stage 3507:> (0 + 0) / 600]Exception in thread "dag-scheduler-event-loop" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
at net.jpountz.lz4.LZ4BlockOutputStream.finish(LZ4BlockOutputStream.java:235)
at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:175)
at java.io.ObjectOutputStream$BlockDataOutputStream.close(ObjectOutputStream.java:1828)
at java.io.ObjectOutputStream.close(ObjectOutputStream.java:742)
at org.apache.spark.serializer.JavaSerializationStream.close(JavaSerializer.scala:57)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$1.apply$mcV$sp(TorrentBroadcast.scala:238)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1319)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:237)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:107)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:86)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1387)
at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1012)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:933)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:936)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:935)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:935)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:873)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1630)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
[Stage 3507:> (0 + 0) / 600]18/02/23 12:59:33 ERROR TransportRequestHandler: Error sending result RpcResponse{requestId=8388437576763608177, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=81 cap=156]}} to /172.23.56.231:58822; closing connection
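To the "how can I check this?" part: a minimal sketch (written in Scala for illustration, assuming client mode so it runs inside the driver JVM) is to log the driver's own heap numbers, e.g. once per batch, and watch whether the used figure keeps climbing; in Spark 2.x UIs the driver also usually shows up as its own row on the Executors tab.
// Print used vs. max heap of the JVM this code runs in (the driver, in client mode)
val rt = Runtime.getRuntime
val usedMb = (rt.totalMemory - rt.freeMemory) / (1024 * 1024)
val maxMb = rt.maxMemory / (1024 * 1024)
println(s"driver heap: $usedMb MB used of $maxMb MB max")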

java.lang.ClassNotFoundException: org.apache.spark.Logging

I just upgraded to Spark 2.1.0 and decided to test out my data with beeline, but for some reason it gives me:
Error:
org.spark_project.guava.util.concurrent.UncheckedExecutionException:
java.lang.ClassNotFoundException: org.apache.spark.Logging was removed
in Spark 2.0. Please check if your library is compatible with Spark
2.0 (state=,code=0)
I renamed the old directory so all files would be new. I'm not running my own code, but beeline that comes with Spark.
Here are steps that I followed:
cd /usr/local/spark
./sbin/start-thriftserver.sh --master spark://REMOVED:7077 --num-executors 2 --driver-memory 6G --executor-cores 6 --executor-memory 14G --hiveconf hive.server2.thrift.port=10015 --packages datastax:spark-cassandra-connector:1.6.4-s_2.11 --conf spark.cassandra.connection.host=REMOVED --conf spark.cassandra.auth.username=REMOVED --conf spark.cassandra.auth.password=REMOVED
./bin/beeline -u jdbc:hive2://REMOVED:10015
So I'm not sure what to do now; any suggestions, please?
You need to update datastax:spark-cassandra-connector as well. Please try:
--packages datastax:spark-cassandra-connector:2.0.0-M3-s_2.11
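That is, the same start-thriftserver.sh invocation as above with only the connector coordinate bumped (a sketch; every other flag is assumed unchanged):
./sbin/start-thriftserver.sh --master spark://REMOVED:7077 --num-executors 2 --driver-memory 6G --executor-cores 6 --executor-memory 14G --hiveconf hive.server2.thrift.port=10015 --packages datastax:spark-cassandra-connector:2.0.0-M3-s_2.11 --conf spark.cassandra.connection.host=REMOVED --conf spark.cassandra.auth.username=REMOVED --conf spark.cassandra.auth.password=REMOVED
./bin/beeline -u jdbc:hive2://REMOVED:10015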

spark-submit Diagnostics: java.io.FileNotFoundException

I'm using the following command:
bin/spark-submit --class com.my.application.XApp
--master yarn-cluster
--executor-memory 100m
--num-executors 50
/Users/nish1013/proj1/target/x-service-1.0.0-201512141101-assembly.jar
1000
and getting a java.io.FileNotFoundException, and I can see on my YARN cluster that the app status is FAILED.
The jar is available at that location. Is there any specific place I need to put this jar when using spark-submit in cluster mode?
Exception:
Diagnostics: java.io.FileNotFoundException: File file:/Users/nish1013/proj1/target/x-service-1.0.0-201512141101-assembly.jar does not exist
Failing this attempt. Failing the application.
You must pass the jar file to the execution nodes by adding it to the "--jars" argument of spark-submit. E.g.
bin/spark-submit --class com.my.application.XApp
--master yarn-cluster
--jars "/Users/nish1013/proj1/target/x-service-1.0.0-201512141101-assembly.jar"
--executor-memory 100m
--num-executors 50
/Users/nish1013/proj1/target/x-service-1.0.0-201512141101-assembly.jar 1000
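If a client-local path like this still cannot be resolved in cluster mode, another common approach (an assumption about this setup; the HDFS target directory below is hypothetical) is to copy the assembly to HDFS and point spark-submit at the hdfs:// URI:
hadoop fs -put /Users/nish1013/proj1/target/x-service-1.0.0-201512141101-assembly.jar /user/nish1013/
bin/spark-submit --class com.my.application.XApp --master yarn-cluster --executor-memory 100m --num-executors 50 hdfs:///user/nish1013/x-service-1.0.0-201512141101-assembly.jar 1000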

How to prevent Spark Executors from getting Lost when using YARN client mode?

I have one Spark job which runs fine locally with less data, but when I schedule it to execute on YARN I keep getting the following error; slowly all executors get removed from the UI and my job fails:
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on myhost1.com: remote Rpc client disassociated
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on myhost2.com: remote Rpc client disassociated
I use the following command to schedule the Spark job in yarn-client mode:
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar
What is the problem here? I am new to Spark.
I had a very similar problem: many executors were being lost no matter how much memory we allocated to them.
The solution, if you're using YARN, was to set --conf spark.yarn.executor.memoryOverhead=600; alternatively, if your cluster uses Mesos, you can try --conf spark.mesos.executor.memoryOverhead=600 instead.
In Spark 2.3.1+ the configuration option is named spark.executor.memoryOverhead, i.e. --conf spark.executor.memoryOverhead=600.
It seems like we were not leaving sufficient memory for YARN itself, and containers were being killed because of it. After setting that we have had different out-of-memory errors, but not the same lost-executor problem.
You can follow this AWS post to calculate the memory overhead (and other Spark configs to tune): best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr
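Applied to the spark-submit command from the question, that would look roughly like this (a sketch; 600 MB is just the value quoted above and should be sized to your containers):
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --conf spark.yarn.executor.memoryOverhead=600 --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar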
When I had the same issue, deleting logs and freeing up more HDFS space worked.
