What should my spark-submit options be for better performance, and how do I fix this heap memory issue? - apache-spark

I have 1 driver and 6 core instances, each with 16GB RAM and 8 cores.
I am running spark-submit with the options below:
spark-submit --driver-memory 4g \
--executor-memory 6g \
--num-executors 12 \
--executor-cores 2 \
--conf spark.driver.maxResultSize=0 \
--conf spark.network.timeout=800 job.py
I am getting a Java heap memory error multiple times. I think there is something wrong with these options; can someone help me out with this?
Thanks
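One hedged starting point for six worker nodes with 16GB RAM and 8 cores each, assuming YARN (e.g. EMR) and leaving roughly 1 core and a few GB per node for the OS and Hadoop daemons; the exact values depend on your data and on how much memory YARN actually exposes per node (yarn.nodemanager.resource.memory-mb), so treat this as a sketch rather than a fix:
# Sketch only: ~2 executors per node, 3 cores each, with an explicit
# memory overhead (spark.executor.memoryOverhead in Spark 2.3+,
# spark.yarn.executor.memoryOverhead on older versions).
spark-submit \
--driver-memory 4g \
--executor-memory 5g \
--conf spark.executor.memoryOverhead=1g \
--num-executors 11 \
--executor-cores 3 \
--conf spark.network.timeout=800 \
job.py
Also note that spark.driver.maxResultSize=0 removes the cap (default 1g) on the total size of results collected back to the driver, so a large collect() can itself cause a driver-side heap error; it may be safer to keep a finite limit while debugging.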

Related

Spark fail if not all resources are allocated

Does Spark or YARN have any flag to fail the job fast if we can't allocate all resources?
For example, if I run
spark-submit --class org.apache.spark.examples.SparkPi
--master yarn-client
--num-executors 7
--driver-memory 512m
--executor-memory 4g
--executor-cores 1
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
Right now, if Spark can allocate only 5 executors it will just go ahead with 5. Can we make it run only with 7, or fail otherwise?
You can set the spark.dynamicAllocation.minExecutors config in your job. For that you also need spark.dynamicAllocation.enabled=true, as detailed in the Spark dynamic allocation documentation.
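A sketch of how that suggestion could be wired into the command above; the minimum of 7 is illustrative, the yarn-client master is written in its Spark 2 form, and on YARN dynamic allocation also needs the external shuffle service configured on the NodeManagers:
# Sketch: dynamic allocation with a floor of 7 executors.
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 512m \
--executor-memory 4g \
--executor-cores 1 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=7 \
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000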

spark scala memory management issues

I am trying to submit a spark scala job with below configuration:
spark-submit --class abcd --queue new --master yarn --executor-cores 1 --executor-memory 4g --driver-memory 2g --num-executors 1
The allocated capacity of the queue is 700GB, and the job ends up taking the entire 700GB while running.
Is there a way to restrict to 100GB only?
Thanks in advance.
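If dynamic allocation is enabled on the cluster (some distributions turn it on by default), --num-executors 1 is only the initial count and Spark keeps requesting executors up to the queue's capacity, which would explain a 1-executor submit growing to fill 700GB. A sketch with illustrative numbers that pins the footprint to roughly 100GB, assuming YARN and default memory overheads:
# Sketch: pin the executor count so total usage stays near
# 20 x (4g + ~0.4g default overhead) + (2g + ~0.4g) ≈ 90GB.
# Alternatively, cap growth with spark.dynamicAllocation.maxExecutors.
spark-submit --class abcd --queue new --master yarn \
--conf spark.dynamicAllocation.enabled=false \
--num-executors 20 \
--executor-cores 1 \
--executor-memory 4g \
--driver-memory 2g \
<your-application-jar>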

Spark Streaming - Diagnostics: Container is running beyond physical memory limits

My Spark Streaming job failed with the exception below:
Diagnostics: Container is running beyond physical memory limits.
Current usage: 1.5 GB of 1.5 GB physical memory used; 3.6 GB of 3.1 GB
virtual memory used. Killing container.
Here is my spark-submit command:
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--queue Queue name \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.locality.wait=10 \
--conf spark.task.maxFailures=8 \
--conf spark.ui.killEnabled=false \
--conf spark.logConf=true \
--conf spark.yarn.driver.memoryOverhead=512 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.yarn.max.executor.failures=40 \
jar path
I am not sure what's causing this issue. Am I missing something in the command above, or is it failing because I didn't set --driver-memory in my spark-submit command?
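One hedged observation: in cluster mode the default spark.driver.memory is 1g, and together with the spark.yarn.driver.memoryOverhead=512 set above that matches the 1.5 GB container limit in the diagnostics, so it is most likely the driver (ApplicationMaster) container being killed rather than an executor. A possible adjustment, with illustrative sizes and everything else in the command unchanged:
# Illustrative: raise the driver heap and its YARN overhead in the
# spark2-submit above; the remaining flags stay as they are.
--driver-memory 2g \
--conf spark.yarn.driver.memoryOverhead=1024 \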

Spark Job using more executors than allocated in jobs

I have following settings in my Spark job:
--num-executors 2 \
--executor-cores 1 \
--executor-memory 12G \
--driver-memory 16G \
--conf spark.streaming.dynamicAllocation.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.receiver.writeAheadLog.enable=false \
--conf spark.executor.memoryOverhead=8192 \
--conf spark.driver.memoryOverhead=8192
My understanding is that the job should run with 2 executors; however, it is running with 3. This is happening to several of my jobs. Could someone please explain the reason?
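One possible explanation (an assumption, since the question does not say where the count of 3 comes from): on YARN the driver runs in its own ApplicationMaster container, so a 2-executor job holds 3 containers in total, and resource views that count containers will report 3. A way to check what the application actually holds, assuming YARN and that the application and attempt IDs are known:
# List the attempt for the application, then the containers it holds; one
# of them should be the ApplicationMaster/driver rather than an executor.
yarn applicationattempt -list <application_id>
yarn container -list <application_attempt_id>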

Spark Standalone --total-executor-cores

I'm using a Spark 2.1.1 Standalone cluster.
Although I have 29 free cores in the cluster (Cores in use: 80 Total, 51 Used), when submitting a new Spark job with --total-executor-cores 16 this setting is not taking effect and the job is submitted with only 6 cores.
What am I missing?
(deleting checkpoints doesn't help)
Here is my spark-submit command:
PYSPARK_PYTHON="/usr/bin/python3.4" \
PYSPARK_DRIVER_PYTHON="/usr/bin/python3.4" \
/opt/spark/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--master spark://XXXX.XXXX:7077 \
--conf "spark.sql.shuffle.partitions=2001" \
--conf "spark.port.maxRetries=200" \
--conf "spark.executorEnv.PYTHONHASHSEED=0" \
--executor-memory 24G \
--total-executor-cores 16 \
--driver-memory 8G \
/home/XXXX/XXXX.py \
--spark_master "spark://XXXX.XXXX:7077" \
--topic "XXXX" \
--broker_list "XXXX" \
--hdfs_prefix "hdfs://XXXX"
My problem was the large amount of memory I requested from Spark (--executor-memory 24G): Spark tried to find worker nodes with 24G of free memory and found only 2 such nodes, each with 3 free cores (which is why I saw only 6 cores).
When I decreased the executor memory to 8G, Spark found the number of cores I specified.
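For completeness, a minimal sketch of the working submit per the note above; the only change is the lower --executor-memory, and the --conf flags and script arguments stay exactly as before:
# Same command as above with only the executor memory reduced, so that
# more workers have enough free memory to host an executor.
PYSPARK_PYTHON="/usr/bin/python3.4" \
PYSPARK_DRIVER_PYTHON="/usr/bin/python3.4" \
/opt/spark/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--master spark://XXXX.XXXX:7077 \
--executor-memory 8G \
--total-executor-cores 16 \
--driver-memory 8G \
/home/XXXX/XXXX.py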
