Spark heapspace error while running Python program - apache-spark

When I running Python code in Spark using
spark-submit --master local --packages com.databricks:spark-xml_2.10:0.4.1 \
--driver-memory 8G --executor-memory 7G
I get this error
17/02/28 18:59:25 ERROR util.Utils: Uncaught exception in thread stdout writer for /usr/local/bin/python2.7 java.lang.OutOfMemoryError: Java heap space
I get the same error when using
spark.yarn.executor.memoryOverhead=1024M
I have 32 GB of RAM and Java options are 4 GB.
How can I fix this?

Related

hadoop/yarn/spark executor memory increase

when I execute a spark-submit command with --master yarn-cluster --num-executors 7 --driver-memory 10g --executor-memory 16g --executor-cores 5, I get the below error, I am not sure where to change the heap size, I suspect Yarn config files somewhere, Please advice
error
Invalid maximum heap size: -Xmx10g
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.**

Yarn Spark HBase - ExecutorLostFailure Container killed by YARN for exceeding memory limits

I am trying to read a big hbase table in spark (~100GB in size).
Spark Version : 1.6
Spark submit parameters:
spark-submit --master yarn-client --num-executors 10 --executor-memory 4G
--executor-cores 4
--conf spark.yarn.executor.memoryOverhead=2048
Error: ExecutorLostFailure Reason: Container killed by YARN for
exceeding limits.
4.5GB of 3GB physical memory used limits. Consider boosting spark.yarn.executor.memoryOverhead.
I have tried setting spark.yarn.executor.memoryOverhead to 100000. Still getting similar error.
I don't understand why spark doesn't spill to disk if the memory is insufficient OR is YARN causing the problem here.
Please share your code how you try to read in.
And also your cluster architecture
Container killed by YARN for exceeding limits. 4.5GB of 3GB physical memory used limits
Try
spark-submit
--master yarn-client
--num-executors 4
--executor-memory 100G
--executor-cores 4
--conf spark.yarn.executor.memoryOverhead=20480
If you have 128 gRam
The situation is clear, you run out of ram, try to rewrite your code in a disk friendly way.

Spark job fails due to stackoverflow error

My spark job is using Mllib to train a LogisticRegression on a data but it fails due to the Stackoverflow error, here is the error message shown in the spark-shell
java.lang.StackOverflowError
at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:48)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.generic.GenericCompanion.apply(GenericCompanion.scala:48)
...
when I check the Spark UI, there is no failed stage or job! This is how I run my spark-shell
spark-shell --num-executors 100 --driver-memory 20g --conf spark.driver.maxResultSize=5g --executor-memory 8g --executor-cores 3
I even tried to increase the size of the stack by adding the following line when running the spark-shell, but it didn't help
--conf "spark.driver.extraJavaOptions='-XX:ThreadStackSize=81920'"
What the issue is?

how to solve java.lang.OutOfMemoryError: Java heap space when train word2vec model in Spark?

Solu:I put the params driver-memory 40G in the spark-submit .
Ques:My Spark cluster is consist of 5 ubuntu server ,each with 80G memory and 24 cores.
word2vec is about 10G newsdata.
and I submit the job with standalone mode like this :
spark-submit --name trainNewsdata --class Word2Vec.trainNewsData --master spark://master:7077 --executor-memory 70G --total-executor-cores 96 sogou.jar hdfs://master:9000/user/bd/newsdata/* hdfs://master:9000/user/bd/word2vecModel_newsdata
When I train word2vec model in spark,I occure :
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space,
and I don't know how to solve it,please help me:)
I put the params driver-memory 40G in the spark-submit,and Then solve it.

Spark GraphX memory out of error SparkListenerBus (java.lang.OutOfMemoryError: Java heap space)

I have problem with out of memory on Apache Spark (Graphx). Application run, but after some time shutdown. I use Spark 1.2.0. Cluster has enough memory a number of cores. Other application where I am not using GraphX, run without problem. Application use Pregel.
I submit application in Hadoop YARN mode:
HADOOP_CONF_DIR=/etc/hadoop/conf spark-submit --class DPFile --deploy-mode cluster --master yarn --num-executors 4 --driver-memory 10g --executor-memory 6g --executor-cores 8 --files log4j.properties spark_routing_2.10-1.0.jar road_cr_big2 1000
Spark configuration:
val conf = new SparkConf(true)
.set("spark.eventLog.overwrite", "true")
.set("spark.driver.extraJavaOptions", "-Dlog4j.configuration=log4j.properties")
.set("spark.yarn.applicationMaster.waitTries", "60")
.set("yarn.log-aggregation-enable","true")
.set("spark.akka.frameSize", "500")
.set("spark.akka.askTimeout", "600")
.set("spark.core.connection.ack.wait.timeout", "600")
.set("spark.akka.timeout","1000")
.set("spark.akka.heartbeat.pauses","60000")
.set("spark.akka.failure-detector.threshold","3000.0")
.set("spark.akka.heartbeat.interval","10000")
.set("spark.ui.retainedStages","100")
.set("spark.ui.retainedJobs","100")
.set("spark.driver.maxResultSize","4G")
Thank you for answers.
Log:
ERROR Utils: Uncaught exception in thread SparkListenerBus
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197)
at org.apache.spark.util.FileLogger.logLine(FileLogger.scala:192)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:88)
at org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:113)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:197)
at org.apache.spark.util.FileLogger.logLine(FileLogger.scala:192)
at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:88)
at org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:113)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$postToAll$3.apply(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:83)
at org.apache.spark.scheduler.SparkListenerBus$$anonfun$foreachListener$1.apply(SparkListenerBus.scala:81)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.SparkListenerBus$class.foreachListener(SparkListenerBus.scala:81)
at org.apache.spark.scheduler.SparkListenerBus$class.postToAll(SparkListenerBus.scala:50)
at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:32)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:56)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:56)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
ERROR LiveListenerBus: SparkListenerBus thread is dead! This means SparkListenerEvents have notbeen (and will no longer be) propagated to listeners for some time.
ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM

Resources