Spark Streaming - Diagnostics: Container is running beyond physical memory limits - apache-spark

My Spark Streaming job failed with the below exception
Diagnostics: Container is running beyond physical memory limits.
Current usage: 1.5 GB of 1.5 GB physical memory used; 3.6 GB of 3.1 GB
virtual memory used. Killing container.
Here is my spark submit command
spark2-submit \
--name App name \
--class Class name \
--master yarn \
--deploy-mode cluster \
--queue Queue name \
--num-executors 5 --executor-cores 3 --executor-memory 5G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.locality.wait=10 \
--conf spark.task.maxFailures=8 \
--conf spark.ui.killEnabled=false \
--conf spark.logConf=true \
--conf spark.yarn.driver.memoryOverhead=512 \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.yarn.max.executor.failures=40 \
jar path
I am not sure what's causing the above issue. Am I missing something in the above command or is it failing as I didn't set --driver-memory in my spark submit command?

Related

Can we have multiple executors in Spark master local[*] deployment code client

I have a 1 node Hadoop Cluster, I am submitting a spark job like this
spark-submit \
--class com.compq.scriptRunning \
--master local[*] \
--deploy-mode client \
--num-executors 3 \
--executor-cores 4 \
--executor-memory 21g \
--driver-cores 2 \
--driver-memory 5g \
--conf "spark.local.dir=/data/spark_tmp" \
--conf "spark.sql.shuffle.partitions=2000" \
--conf "spark.sql.inMemoryColumnarStorage.compressed=true" \
--conf "spark.sql.autoBroadcastJoinThreshold=200000" \
--conf "spark.speculation=false" \
--conf "spark.hadoop.mapreduce.map.speculative=false" \
--conf "spark.hadoop.mapreduce.reduce.speculative=false" \
--conf "spark.ui.port=8099" \
.....
Though I define 3 executors, I see only 1 executor in spark UI page running all the time. Can we have multiple executors running in parallel with
--master local[*] \
--deploy-mode client \
Its a on-prem, plain open source hadoop flavor installed in the cluster.
I tried changing master local to local[*] and playing around with deployment modes still, I could see only 1 executor running in spark UI

Spark-submit with properties file for python

I am trying to load spark configuration through the properties file using the following command in Spark 2.4.0.
spark-submit --properties-file props.conf sample.py
It gives the following error
org.apache.spark.SparkException: Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
The props.conf file has this
spark.master yarn
spark.submit.deployMode client
spark.authenticate true
spark.sql.crossJoin.enabled true
spark.dynamicAllocation.enabled true
spark.driver.memory 4g
spark.driver.memoryOverhead 2048
spark.executor.memory 2g
spark.executor.memoryOverhead 2048
Now, when I try to run the same by adding all arguments to the command itself, it works fine.
spark2-submit \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.authenticate=true \
--conf spark.sql.crossJoin.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.driver.memory=4g \
--conf spark.driver.memoryOverhead=2048 \
--conf spark.executor.memory=2g \
--conf spark.executor.memoryOverhead=2048 \
sample.py
This works as expected.
I don't think spark supports --properties-file, one workaround is making the change on $SPARK_HOME/conf/spark-defaults.conf, spark will auto load it.
You can refer https://spark.apache.org/docs/latest/submitting-applications.html#loading-configuration-from-a-file

Spark Thrift server queuing up queries

When Parallel queries are hitting Spark Thrift server, in Spark UI --> JDBC/ODBC Server , it shows up all queries as started but all of them gets executed in a sequential manner
Here's the Thrift Server startup script---
start_thriftserver (){
sudo /usr/lib/spark/sbin/start-thriftserver.sh \
--master yarn \
--deploy-mode client \
--executor-memory 3200m \
--executor-cores 2 \
--driver-memory 4g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
--conf spark.dynamicAllocation.minExecutors=50 \
--conf spark.executor.memoryOverhead=684
This is indeed a confusing topic.
spark.sql.hive.thriftServer.singleSession=false
Try this.
That said, I am a little sceptical on all this.

Spark Job using more executors than allocated in jobs

I have following settings in my Spark job:
--num-executors 2
--executor-cores 1
--executor-memory 12G
--driver memory 16G
--conf spark.streaming.dynamicAllocation.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.receiver.writeAheadLog.enable=false
--conf spark.executor.memoryOverhead=8192
--conf spark.driver.memoryOverhead=8192'
My understanding is job should run with 2 executors however it is running with 3. This is happening to multiple of my jobs. Could someone please explain the reason?

Spark Standalone --total-executor-cores

Im using Spark 2.1.1 Standalone cluster,
Although I have 29 free cores in my cluster (Cores in use: 80 Total, 51 Used), when submitting new spark job with --total-executor-cores 16 this config is not taking affect and the job submitted only with 6 cores..
What am I missing?
(deleting checkpoints doesn't help)
Here is my spark-submit command:
PYSPARK_PYTHON="/usr/bin/python3.4"
PYSPARK_DRIVER_PYTHON="/usr/bin/python3.4" \
/opt/spark/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--master spark://XXXX.XXXX:7077 \
--conf "spark.sql.shuffle.partitions=2001" \
--conf "spark.port.maxRetries=200" \
--conf "spark.executorEnv.PYTHONHASHSEED=0" \
--executor-memory 24G \
--total-executor-cores 16 \
--driver-memory 8G \
/home/XXXX/XXXX.py \
--spark_master "spark://XXXX.XXXX:7077" \
--topic "XXXX" \
--broker_list "XXXX" \
--hdfs_prefix "hdfs://XXXX"
My problem was the high number of memory I asked from spark (--executor-memory 24G) - spark tried to find worker nodes with 24G free memory and found only 2 nodes, each node had 3 free cores (that's why I saw only 6 cores).
When decreasing the number of memory to 8G, spark found the number of cores specified.

Resources