Difference between requested and allocated memory - Spark on Kubernetes - apache-spark

I am running a Spark Job in Kubernetes cluster using spark-submit command as below,
bin/spark-submit \
--master k8s://https://api-server-host:443 \
--deploy-mode cluster \
--name spark-job-name \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image=docker-repo/pyspark:55 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-submit \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.executor.memory=4G \
--files local:///mnt/conf.json \
local:///mnt/ingest.py
and when I check the request and limit for the executor pod, it shows below. There is almost 1700 MB extra got allocated for the pod.
Limits:
memory: 5734Mi
Requests:
cpu: 4
memory: 5734Mi
Why is that?

In addition to #CptDolphin 's answer, be aware that Spark always allocates spark.executor.memoryOverhead extra memory (max of 10% of spark.executor.memory or 384MB, unless explicitly configured), and may allocate additional spark.executor.pyspark.memory if you defined that in your configuration.

What you define the pod (as individual system) can use is one thing, what you define spark or java or any other app runing inside that system (pod) can use, is another thing; think of it as normal computer with limits, and then your application with its own limits.

Related

Java heap space OutOfMemoryError while running join query in Spark SQL shell

Here is my cluster configuration:
Master nodes: 1 (16 vCPU, 64 GB memory)
Worker nodes: 2 (total of 64 vCPU, 256 GB memory)
Here is the Hive query I'm trying to run on the Spark SQL shell:
select a.*,b.name as name from (
small_tbl b
join
(select *
from large_tbl where date = '2019-01-01') a
on a.id = b.id);
Here is the query execution plan as shown on the Spark UI:
The configuration properties set while launching the shell are as follows:
spark-sql --conf spark.driver.maxResultSize=30g \
--conf spark.broadcast.compress=true \
--conf spark.rdd.compress=true \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=304857600 \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.instances=12 \
--conf spark.executor.memory=16g
--conf spark.executor.cores=5 \
--conf spark.driver.memory=32g \
--conf spark.yarn.executor.memoryOverhead=512 \
--conf spark.executor.extrajavaoptions=-Xms20g \
--conf spark.executor.heartbeatInterval=30s \
--conf spark.shuffle.io.preferDirectBufs=true \
--conf spark.memory.fraction=0.5
I have tried most of the solutions suggested here and here which is evident in the properties set above. As far as I know it's not a good idea to increase the maxResultSize property on the driver side since datasets may grow beyond driver's memory size and driver shouldn't be used to store data in this scale.
I have executed the query on Tez engine successfully which took around 4 minutes, whereas Spark takes more than 15 mins to execute and terminates abruptly with the lack of heap space issue.
I strongly believe there must be a way to speed up the query execution on Spark. Please suggest me a solution that works for this kind of queries.

Spark: Entire dataset concentrated in one executor

I am running a spark job with 3 files each of 100MB size, for some reason my spark UI shows all dataset concentrated into 2 executors.This is making the job run for 19 hrs and still running.
Below is my spark configuration . spark 2.3 is the version used.
spark2-submit --class org.mySparkDriver \
--master yarn-cluster \
--deploy-mode cluster \
--driver-memory 8g \
--num-executors 100 \
--conf spark.default.parallelism=40 \
--conf spark.yarn.executor.memoryOverhead=6000mb \
--conf spark.dynamicAllocation.executorIdleTimeout=6000s \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=8G \
I tried repartitioning inside the code which works , as this makes the file go into 20 partitions (i used rdd.repartition(20)). But why should I repartition , i believe specifying spark.default.parallelism=40 in the script should let spark divide the input file to 40 executors and process the file in 40 executors.
Can anyone help.
Thanks,
Neethu
I am assuming you're running your jobs in YARN if yes, you can check following properties.
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-vcores
yarn.nodemanager.resource.cpu-vcores
In YARN these properties would affect number of containers that can be instantiated in a NodeManager based on spark.executor.cores, spark.executor.memory property values (along with executor memory overhead)
For example, if a cluster with 10 nodes (RAM : 16 GB, cores : 6) and set with following yarn properties
yarn.scheduler.maximum-allocation-mb=10GB
yarn.nodemanager.resource.memory-mb=10GB
yarn.scheduler.maximum-allocation-vcores=4
yarn.nodemanager.resource.cpu-vcores=4
Then with spark properties spark.executor.cores=2, spark.executor.memory=4GB you can expect 2 Executors/Node so total you'll get 19 executors + 1 container for Driver
If the spark properties are spark.executor.cores=3, spark.executor.memory=8GB then you will get 9 Executor (only 1 Executor/Node) + 1 container for Driver
you can refer to link for more details
Hope this helps

Spark executor pods quickly in error after their creation using kubernetes as master

When I launch the SparkPi example on a self-hosted kubernetes cluster, the executor pods are quickly created -> have an error status -> are deleted -> are replaced by new executors pods.
I tried the same command on a Google Kubernetes Engine with success. I check the RBAC rolebinding to make sure that the service account has right to create the pod.
Guessing when the next executor pod will be ready, I can see using kubectl describe pod <predicted_executor_pod_with_number> that the pod is actually created:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1s default-scheduler Successfully assigned default/examplepi-1563878435019-exec-145 to slave-node04
Normal Pulling 0s kubelet, slave-node04 Pulling image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Pulled 0s kubelet, slave-node04 Successfully pulled image "myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5"
Normal Created 0s kubelet, slave-node04 Created container executor
This is my spark-submit call:
/opt/spark/bin/spark-submit \
--master k8s://https://mycustomk8scluster:6443 \
--name examplepi \
--deploy-mode cluster \
--driver-memory 2G \
--executor-memory 2G \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/spark/work-dir/log4j.properties \
--conf spark.kubernetes.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.executor.container.image=myregistry:5000/imagery:c5b8e0e64cc98284fc4627e838950c34ccb22676.5 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.pod.name=pi-driver \
--conf spark.driver.allowMultipleContexts=true \
--conf spark.kubernetes.local.dirs.tmpfs=true \
--class com.olameter.sdi.imagery.IngestFromGrpc \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
I expect that the required executor (2) should be created. If the driver script cannot create it, I would at least expect some log to be able to diagnose the issue.
The issue was related to Hadoop + Spark integration. I was using Spark binary without Hadoop spark-2.4.3-bin-without-hadoop.tgz+ Hadoop 3.1.2. The configuration using environment variables seemed to be problematic for the Spark Executor.
I compiled Spark with Hadoop 3.1.2 to solve this issue. See: https://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn.

Dynamic Allocation with spark streaming on yarn not scaling down executors

I'm using spark-streaming (spark version 2.2) on yarn cluster and am trying to enable dynamic core allocation for my application.
The number of executors scale up as required but once executors are assigned, they are not being scaled down even when the traffic is reduced, i.e, and executor once allocated is not being released. I have also enabled external shuffle service on yarn, as mentioned here:
https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
The configs that I have set in spark-submit command are:
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.dynamicAllocation.enabled=true \
--conf spark.streaming.dynamicAllocation.scalingInterval=30 \
--conf spark.shuffle.service.enabled=true \
--conf spark.streaming.dynamicAllocation.initialExecutors=15 \
--conf spark.streaming.dynamicAllocation.minExecutors=10 \
--conf spark.streaming.dynamicAllocation.maxExecutors=30 \
--conf spark.streaming.dynamicAllocation.executorIdleTimeout=60s \
--conf spark.streaming.dynamicAllocation.cachedExecutorIdleTimeout=60s \
Can someone help if there is any particular config that I'm missing ?
Thanks
The document added as part of this JIRA helped me out: https://issues.apache.org/jira/browse/SPARK-12133.
The key point to note was that the number of executors are scaled down when the ratio (batch processing time/batch duration) is less than 0.5(default value) which essentially means that the executors are idle for half of the time. The config that can be used to alter this default value is "spark.streaming.dynamicAllocation.scalingDownRatio"

Spark client mode - YARN allocates a container for driver?

I am running Spark on YARN in client mode, so I expect that YARN will allocate containers only for the executors. Yet, from what I am seeing, it seems like a container is also allocated for the driver, and I don't get as many executors as I was expecting.
I am running spark submit on the master node. Parameters are as follows:
sudo spark-submit --class ... \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.yarn.am.cores=2 \
--conf spark.yarn.am.memory=8G \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=10G \
--conf spark.dynamicAllocation.enabled=false \
While running this application, Spark UI's Executors page shows 1 driver and 4 executors (5 entries in total). I would expect 5, not 4 executors.
At the same time, YARN UI's Nodes tab shows that on the node that isn't actually used (at least according to Spark UI's Executors page...) there's a container allocated, using 9GB of memory. The rest of the nodes have containers running on them, 11GB of memory each.
Because in my Spark Submit the driver has 2GB less memory than executors, I think that the 9GB container allocated by YARN is for the driver.
Why is this extra container allocated? How can i prevent this?
Spark UI:
YARN UI:
Update after answer by Igor Dvorzhak
I was falsely assuming that the AM will run on the master node, and that it will contain the driver app (so setting spark.yarn.am.* settings will relate to the driver process).
So I've made the following changes:
set the spark.yarn.am.* settings to defaults (512m of memory, 1 core)
set the driver memory through spark.driver.memory to 8g
did not try to set driver cores at all, since it is only valid for cluster mode
Because AM on default settings takes up 512m + 384m of overhead, its container fits into the spare 1GB of free memory on a worker node.
Spark gets the 5 executors it requested, and the driver memory is appropriate to the 8g setting. All works as expected now.
Spark UI:
YARN UI:
Extra container is allocated for YARN application master:
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Even though in client mode driver runs in the client process, YARN application master is still running on YARN and requires container allocation.
There are no way to prevent container allocation for YARN application master.
For reference, similar question asked time ago: Resource Allocation with Spark and Yarn.
You can specify the driver memory and number of executors in spark submit as below.
spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3
Hope it helps you.

Resources