Spark-submit executor memory issue - apache-spark

I have a 10-node cluster: 8 DataNodes (256 GB, 48 cores each) and 2 NameNodes. I am submitting a Spark SQL job to the YARN cluster with the following spark-submit parameters:
--num-executors 8 \
--executor-cores 50 \
--driver-memory 20G \
--executor-memory 60G \
As can be seen above, executor-memory is 60 GB, but when I check the Spark UI it shows 31 GB.
1) Can anyone explain why it shows 31 GB instead of 60 GB?
2) Can you also help me set optimal values for the parameters mentioned above?

I think the memory allocated gets divided into two parts:
1. Storage (caching dataframes/tables)
2. Processing (the one you can see)
31 GB is the memory available for processing.
Play around with the spark.memory.fraction property to increase or decrease the memory available for processing.
I would also suggest reducing the executor cores to about 8-10.
My configuration:
spark-shell --executor-memory 40g --executor-cores 8 --num-executors 100 --conf spark.memory.fraction=0.2
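For reference, here is a rough sketch of how the region controlled by spark.memory.fraction is sized under Spark's unified memory model (Spark 1.6 and later). As far as the Spark tuning documentation describes it, about 300 MB of the heap is reserved and the fraction (default 0.6 in Spark 2.x and later) applies to the rest, covering execution and storage together. The function name is made up for illustration, and the sketch assumes the JVM exposes the full 60 GB as usable heap. In practice the JVM reports somewhat less than the configured heap, so the value shown in the UI will be lower than this estimate.

```python
RESERVED_MB = 300  # heap that Spark sets aside before applying the fraction


def unified_memory_mb(executor_memory_mb, memory_fraction=0.6):
    """Estimate the execution + storage region for a given executor heap.

    Assumption: the JVM exposes the whole heap; real values are somewhat lower.
    """
    usable_mb = executor_memory_mb - RESERVED_MB
    return usable_mb * memory_fraction


if __name__ == "__main__":
    heap_mb = 60 * 1024  # --executor-memory 60G
    print(round(unified_memory_mb(heap_mb) / 1024, 1), "GiB with fraction 0.6")
    print(round(unified_memory_mb(heap_mb, 0.2) / 1024, 1), "GiB with fraction 0.2")
```

Lowering spark.memory.fraction, as in the spark-shell command above, shrinks this region and leaves more of the heap for user data structures.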

Related

spark number of executors when dynamic allocation is enabled

I have an r5.8xlarge AWS cluster with 12 nodes, so there are 6144 cores (12 nodes * 32 vCPU * 16 cores). I have set --executor-cores=5 and enabled dynamic allocation using the spark-submit command below. Even after setting --conf spark.dynamicAllocation.initialExecutors=150 --conf spark.dynamicAllocation.minExecutors=150, I'm only seeing 70 executors in the Spark UI. What am I doing wrong?
r5.8xlarge instances have 256 GB per node, so 3072 GB in total (256 GB * 12 nodes).
FYI: I'm not including the driver node in this calculation.
--driver-memory 200G --deploy-mode client --executor-memory 37G --executor-cores 7 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.driver.maxResultSize=0 --conf spark.sql.shuffle.partitions=2000 --conf spark.dynamicAllocation.initialExecutors=150 --conf spark.dynamicAllocation.minExecutors=150
You have 256 GB per node and 37G per executor. An executor lives on exactly one node (it cannot be split across nodes), so each node can hold at most 6 executors (256 / 37 = 6.9, rounded down). With 12 nodes, the maximum number of executors is 6 * 12 = 72, which explains why you see only 70 executors in the Spark UI (the difference of 2 executors is caused by the memory allocated to the driver, or maybe by a memory allocation problem on some nodes).
If you want more executors, you have to decrease the executor memory. To fully utilize your cluster, also make sure that the remainder of the node memory divided by the executor memory is as close to zero as possible, for example:
256 GB per node and 37G per executor: 256 / 37 = 6.9 => 6 executors per node (34G lost per node)
256 GB per node and 36G per executor: 256 / 36 = 7.1 => 7 executors per node (only 4G lost per node, so you put 30G of otherwise unused memory per node to work)
If you want at least 150 executors, each node has to hold 13 of them (150 / 12 = 12.5, rounded up), so the executor memory should be at most 19G (256 / 13 = 19.7).
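The arithmetic in this answer can be checked with a short script. This is only a sketch: the function names are invented for illustration, and like the answer it ignores per-executor memory overhead and any memory the node reserves for the OS or YARN, so real limits will be somewhat tighter.

```python
import math


def executors_per_node(node_mem_gb, executor_mem_gb):
    """Whole executors that fit on one node, plus the memory left over."""
    count = node_mem_gb // executor_mem_gb
    return count, node_mem_gb - count * executor_mem_gb


def max_executor_mem_gb(node_mem_gb, nodes, wanted_executors):
    """Largest whole-GB executor size that still yields wanted_executors."""
    per_node = math.ceil(wanted_executors / nodes)
    return node_mem_gb // per_node


if __name__ == "__main__":
    for mem in (37, 36):
        count, unused = executors_per_node(256, mem)
        print(f"{mem}G: {count} per node, {count * 12} total, {unused}G unused per node")
    print("max executor memory for 150 executors:",
          max_executor_mem_gb(256, 12, 150), "G")
```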

Spark Resource Allocation: Number of Cores

I need to understand how to configure cores for a Spark job.
My machine has a maximum of 11 cores and 28 GB of memory.
Below is how I'm allocating resources for my Spark job; its execution time is 4.9 mins:
--driver-memory 2g \
--executor-memory 24g \
--executor-cores 10 \
--num-executors 6
But I have read multiple articles saying the number of cores should be about 5. When I ran the job with that configuration, its execution time increased to 6.9 mins:
--driver-memory 2g \
--executor-memory 24g \
--executor-cores 5 \
--num-executors 6 \
Will there be any issue with keeping the number of cores close to the maximum value (10 in my case)?
Are there any benefits to keeping the number of cores at 5, as suggested in many articles?
So in general, what factors should be considered when determining the number of cores?
It all depends on the behaviour of the job; no single config is optimal for every workload.
--executor-cores is the number of cores each executor uses on its machine.
If that number is too big (>5), then the machine's disk and network (which are shared among all executor cores on that machine) become a bottleneck. If that number is too small (~1), then the job will not achieve good data parallelism and won't benefit from locality of data on the same machine.
TLDR: --executor-cores 5 is fine.
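To see what the ~5 cores guideline implies for the machine in the question, here is a minimal sizing sketch. It encodes one common rule of thumb, not an official formula. The reserved core and the reserved 1 GB for the OS are assumptions, the function name is made up, and the 10% overhead mirrors Spark's documented default for executor memory overhead (the 384 MB minimum is ignored here because it only matters for small heaps).

```python
def plan_node(node_cores, node_mem_gb, cores_per_executor=5,
              reserved_cores=1, reserved_mem_gb=1, overhead_fraction=0.10):
    """Return (executors_per_node, heap_gb_per_executor) for one machine."""
    executors = (node_cores - reserved_cores) // cores_per_executor
    if executors == 0:
        raise ValueError("not enough cores for a single executor")
    container_gb = (node_mem_gb - reserved_mem_gb) / executors
    # The container must hold the heap plus the off-heap overhead.
    heap_gb = int(container_gb / (1 + overhead_fraction))
    return executors, heap_gb


if __name__ == "__main__":
    print(plan_node(11, 28))                         # 5 cores per executor
    print(plan_node(11, 28, cores_per_executor=10))  # 10 cores per executor
```

Under these assumptions, fewer cores per executor means more, smaller executors on the same machine, so --executor-memory has to shrink along with --executor-cores for them to fit.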

how to decide no. of cores and executor in aws

I have 5 TB of data and plan to use r4.8xlarge EC2 machines, which have 244 GB of memory and 32 CPUs per machine.
How can I decide the number of cores and executors to use?
I tried a few combinations of cores and executors, but the Spark job fails with a heap space issue. I have listed one below, excluding other parameters:
--master yarn
--conf spark.yarn.executor.memoryOverhead=4000
--driver-memory 25G
--executor-memory 240G
--executor-cores 26
--num-executors 13

Spark runs only one executor for large jobs

I have a four-node Hadoop cluster (MapR) with 40 GB of memory per node. My Spark startup parameters are as follows:
MASTER="yarn-client" /opt/mapr/spark/spark-1.6.1/bin/pyspark --num-executors 8 --executor-memory 10g --executor-cores 5 --driver-memory 20g --driver-cores 10 --conf spark.driver.maxResultSize="0" --conf spark.default.parallelism="100"
When I run my Spark job with 100K records and call results.count() or result.saveTable(), it runs on all 8 executors. But if I run the job with 1M records, the job is split into 3 stages and the final stage runs on only ONE executor. Does it have something to do with partitioning?
I resolved this issue by converting my DataFrame into an RDD and repartitioning it into a large number of partitions (more than 500), instead of using df.withColumn().
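Pseudo code:
# df and transform() come from the surrounding job
df_rdd = df.rdd
# Spread the rows across 1000 partitions so the work reaches all executors
df_rdd_partitioned = df_rdd.repartition(1000)
# Cache and force evaluation so the repartitioned data is materialized
df_rdd_partitioned.cache().count()
result = df_rdd_partitioned.map(lambda r: (r, transform(r)), preservesPartitioning=True).toDF()
result.cache()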

spark executor memory cut to 1/2

I am doing a spark-submit like this:
spark-submit --class com.mine.myclass --master yarn-cluster --num-executors 3 --executor-memory 4G spark-examples_2.10-1.0.jar
In the web UI, I can see that there are indeed 3 executor nodes, but each has 2G of memory. When I set --executor-memory 2G, the UI shows 1G per node.
Why does it cut my setting in half?
The Executors page of the web UI shows the amount of storage memory, which by default is 54% of the Java heap (spark.storage.safetyFraction 0.9 * spark.storage.memoryFraction 0.6 = 0.54).
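As a quick check of that figure against the numbers in the question, here is a small sketch. The function name is invented, and it assumes the JVM exposes the whole configured heap, which it does not quite do, so the UI value ends up a little lower still.

```python
def legacy_storage_memory_gb(heap_gb, memory_fraction=0.6, safety_fraction=0.9):
    """Storage memory under the legacy settings named in the answer."""
    return heap_gb * memory_fraction * safety_fraction


if __name__ == "__main__":
    for heap in (4, 2):
        storage = legacy_storage_memory_gb(heap)
        print(f"--executor-memory {heap}G -> about {storage:.2f}G storage memory")
```

That gives roughly 2.2G for a 4G heap and 1.1G for a 2G heap, which is why the UI appears to halve the setting.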
