Spark heap size error even though RAM is 32 GB and JAVA_OPTIONS=-Xmx8g - apache-spark

I have 32 GB of physical memory and my input file is about 30 MB in size. I try to submit my Spark job in YARN client mode using the command below:
spark-submit --master yarn --packages com.databricks:spark-xml_2.10:0.4.1 --driver-memory 8g ericsson_xml_parsing_version_6_stage1.py
My executor memory is 8g, but I get the error below. I read about the --driver-java-options command-line option, but I don't know how to set the Java heap space with it.
Can anyone please help me configure the Java heap memory?
java.lang.OutOfMemoryError: Java heap space

Did you try configuring the executor memory as well?
Like this: "--executor-memory 8g"
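A sketch of a submit command that covers both sides, using the script name from the question (the GC flag is just an illustrative example). In Spark, the driver and executor heaps are controlled by --driver-memory and --executor-memory; other JVM flags can go through spark.driver.extraJavaOptions, but Spark does not allow setting -Xmx there:

```shell
# Heap sizes come from --driver-memory / --executor-memory, not from -Xmx;
# extraJavaOptions is only for non-heap JVM flags (example: GC logging).
spark-submit --master yarn \
  --packages com.databricks:spark-xml_2.10:0.4.1 \
  --driver-memory 8g \
  --executor-memory 8g \
  --conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails" \
  ericsson_xml_parsing_version_6_stage1.py
```

Note also that an OOM on a 30 MB input usually means the data explodes during parsing (XML can expand heavily in memory), so raising executor memory is the first thing to try.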

Related

Spark: use of driver-memory parameter

When I submit this command, my job fails with the error "Container is running beyond physical memory limits":
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --conf spark.yarn.executor.memoryOverhead=1000
But when I add the parameter --driver-memory set to 5GB (or higher), the job ends without error:
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --driver-memory 5G --conf spark.yarn.executor.memoryOverhead=1000
Cluster info: 6 nodes with 120GB of Memory. YARN Container Memory Minimum: 1GB
The question is: what is the difference in using or not this parameter?
If increasing the driver memory helps you complete the job successfully, it means that the driver is receiving a lot of data from the executors. Typically, the driver program is responsible for collecting results back from each executor after the tasks are executed. So, in your case, it seems that increasing the driver memory allowed more results to fit in driver memory.
If you read up on executor memory, driver memory, and the way the driver interacts with executors, you will get better clarity on the situation you are in.
Hope it helps to some extent.
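A sketch of the working form of the submit, under the assumption that the application pulls results back to the driver (the sizes are the ones from the question; the application artifact name is hypothetical):

```shell
# If the job collects results to the driver (e.g. collect()), the driver heap
# must be able to hold them; --executor-memory alone does not help with that.
spark-submit --master yarn --deploy-mode cluster \
  --executor-memory 5G --total-executor-cores 30 --num-executors 15 \
  --driver-memory 5G \
  --conf spark.yarn.executor.memoryOverhead=1000 \
  your_app.jar   # hypothetical application artifact
```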

hadoop/yarn/spark executor memory increase

When I execute a spark-submit command with --master yarn-cluster --num-executors 7 --driver-memory 10g --executor-memory 16g --executor-cores 5, I get the error below. I am not sure where to change the heap size; I suspect it is in the YARN config files somewhere. Please advise.
error
Invalid maximum heap size: -Xmx10g
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
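One common cause of "The specified size exceeds the maximum representable size" is a 32-bit JVM, which cannot address a 10g heap at all. A small sketch (with a hypothetical helper name) of how to read the `java -version` banner, which contains "64-Bit" for a 64-bit JVM:

```shell
# check_bitness: given a `java -version` banner line, report whether the JVM
# is 64-bit (can take -Xmx10g) or 32-bit (cannot represent heaps that large).
check_bitness() {
  case "$1" in
    *64-Bit*) echo "64-bit" ;;
    *)        echo "32-bit" ;;
  esac
}

# On a live system (assumption: java is on PATH):
#   check_bitness "$(java -version 2>&1)"
```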

Yarn Spark HBase - ExecutorLostFailure Container killed by YARN for exceeding memory limits

I am trying to read a large HBase table (~100 GB in size) in Spark.
Spark Version : 1.6
Spark submit parameters:
spark-submit --master yarn-client --num-executors 10 --executor-memory 4G --executor-cores 4 --conf spark.yarn.executor.memoryOverhead=2048
Error: ExecutorLostFailure Reason: Container killed by YARN for exceeding memory limits. 4.5 GB of 3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
I have tried setting spark.yarn.executor.memoryOverhead to 100000, but I still get a similar error.
I don't understand why Spark doesn't spill to disk if the memory is insufficient, or whether YARN is causing the problem here.
Please share the code you use to read the table, and also describe your cluster architecture.
Container killed by YARN for exceeding memory limits. 4.5 GB of 3 GB physical memory used.
Try:
spark-submit --master yarn-client --num-executors 4 --executor-memory 100G --executor-cores 4 --conf spark.yarn.executor.memoryOverhead=20480
if you have 128 GB of RAM.
The situation is clear: you are running out of RAM. Try to rewrite your code in a disk-friendly way.
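The number YARN enforces is executor memory plus the overhead, so the two settings have to be read together. A sketch of that arithmetic (the helper name is hypothetical; the 10%/384 MB default is the documented one for this Spark line):

```shell
# container_mb: the per-executor memory YARN charges = --executor-memory (MB)
# + spark.yarn.executor.memoryOverhead (MB). If the overhead is not set,
# Spark defaults it to max(384, 10% of executor memory).
container_mb() {
  exec_mb=$1; overhead_mb=$2
  if [ -z "$overhead_mb" ]; then
    overhead_mb=$(( exec_mb / 10 ))
    [ "$overhead_mb" -lt 384 ] && overhead_mb=384
  fi
  echo $(( exec_mb + overhead_mb ))
}

container_mb 4096 2048   # prints 6144: what the question's submit requests
```

If YARN's container maximum is below this total, the container is killed no matter how high the overhead is set, which would explain why raising memoryOverhead alone did not help.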

How to solve java.lang.OutOfMemoryError: Java heap space when training a word2vec model in Spark?

Solution: I put the parameter --driver-memory 40G in the spark-submit command.
Question: My Spark cluster consists of 5 Ubuntu servers, each with 80 GB of memory and 24 cores.
The word2vec input is about 10 GB of news data.
I submit the job in standalone mode like this:
spark-submit --name trainNewsdata --class Word2Vec.trainNewsData --master spark://master:7077 --executor-memory 70G --total-executor-cores 96 sogou.jar hdfs://master:9000/user/bd/newsdata/* hdfs://master:9000/user/bd/word2vecModel_newsdata
When I train the word2vec model in Spark, I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
and I don't know how to solve it. Please help me :)
I put the parameter --driver-memory 40G in the spark-submit command, and that solved it.
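The working submit, per the answer, is the same command plus --driver-memory. This fits with MLlib's Word2Vec holding the vocabulary and model vectors on the driver, so the driver heap, not just executor memory, has to fit them:

```shell
# Same submit as the question, with the driver heap raised as the answer describes.
spark-submit --name trainNewsdata --class Word2Vec.trainNewsData \
  --master spark://master:7077 \
  --driver-memory 40G \
  --executor-memory 70G --total-executor-cores 96 \
  sogou.jar \
  hdfs://master:9000/user/bd/newsdata/* \
  hdfs://master:9000/user/bd/word2vecModel_newsdata
```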

Spark on Yarn: driver memory is checked on the client side?

I thought I understood the Spark-on-YARN architecture quite well, but now I wonder: when I launch
spark-submit --master yarn-cluster --class com.domain.xxx.ddpaction.DdpApp --num-executors 24 --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf "spark.yarn.jar=/spark/lib/spark-assembly-1.1.0-hadoop2.4.0.jar" ddpaction-3.1.0.jar yarn-cluster config.yml
it fails with a
# Native memory allocation (malloc) failed to allocate 2863333376 bytes for committing reserved memory
The server from which I launch spark-submit has less than 2 GB of free memory, and this causes the error; but the resource manager node, where the driver should execute, has far more than the 4 GB set as the driver-memory parameter.
Why is the driver-memory value, which in my understanding should only be checked and allocated on the YARN cluster by the resource manager, allocated on the server that launches spark-submit in yarn-cluster mode?
This is a bug that was fixed in Spark 1.4.0; see SPARK-3884.
It looks like there is a bad simplification in spark-submit script:
elif [ "$1" = "--driver-memory" ]; then
export SPARK_SUBMIT_DRIVER_MEMORY=$2
So the driver-memory parameter value is used by spark-submit to set its own allocated memory; this is only right and needed in yarn-client mode, not in yarn-cluster mode.
I solved my problem by replacing those lines with:
elif [ "$1" = "--spark-submit-memory" ]; then
export SPARK_SUBMIT_DRIVER_MEMORY=$2
so now I can set (if needed) the memory allocated to spark-submit to a value different from the driver's.
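The idea behind the eventual SPARK-3884 fix can be sketched as a deploy-mode guard (the function and variable names here are hypothetical, not the actual script's): the launcher heap should follow --driver-memory only when the driver runs inside spark-submit, i.e. client mode; in yarn-cluster mode the driver JVM is started by YARN on the cluster.

```shell
# Only export the launcher's heap from --driver-memory in client mode;
# in cluster mode spark-submit itself does not host the driver.
maybe_export_submit_memory() {
  deploy_mode=$1; driver_memory=$2
  if [ "$deploy_mode" != "cluster" ]; then
    export SPARK_SUBMIT_DRIVER_MEMORY="$driver_memory"
  fi
}
```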
