How to properly set Spark cluster properties in Databricks

I have a cluster in Databricks for my Spark workflow, and I wanted some help setting it up correctly for optimal use. Here are the details of my cluster.
RUNTIME: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
DRIVER TYPE: c5a.8xlarge (64GB Memory, 32 Cores)
WORKER TYPE: c5a.4xlarge (32GB Memory, 16 Cores)
(Min worker 1, Max Workers 5)
This is going to process a large amount of data (I'm not sure about the exact numbers). Here are the current properties that I am using; I think these are not optimal.
spark.driver.extraJavaOptions -Xss64M
spark.executor.cores 7
spark.executor.memory 20G
spark.driver.maxResultSize 20G
spark.sql.shuffle.partitions 35
spark.driver.memory 48G
spark.sql.execution.arrow.pyspark.enabled true
spark.sql.execution.arrow.pyspark.fallback.enabled true
spark.executor.memoryOverhead 1G
Is there a rule or a guide for how to set proper values to get the maximum performance?
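For reference, here is a rough back-of-the-envelope sizing sketch in Python for the c5a.4xlarge worker type above, applying the generic heuristic discussed in the related answers below (reserve a core and a little memory per node for the OS and daemons, keep executors at around 4-5 cores, leave ~10% for memory overhead). The reserved amounts and the 5-core choice are assumptions, not Databricks guarantees, and Databricks normally sizes executors for you, so treat this only as a rough reference:

# Hypothetical sizing sketch for one c5a.4xlarge worker (16 cores, 32 GB RAM).
node_cores = 16
node_mem_gb = 32

reserved_cores = 1        # assumed: leave one core for the OS / services
reserved_mem_gb = 2       # assumed: leave a couple of GB for the OS / daemons

usable_cores = node_cores - reserved_cores       # 15
usable_mem_gb = node_mem_gb - reserved_mem_gb    # 30

executor_cores = 5                                    # common 4-5 core rule of thumb
executors_per_node = usable_cores // executor_cores   # 3

# spark.executor.memoryOverhead defaults to max(10% of executor memory, 384 MB),
# so leave room for it when dividing the remaining memory.
executor_mem_gb = int(usable_mem_gb / executors_per_node / 1.1)   # ~9 GB

print(executors_per_node, executor_cores, executor_mem_gb)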

Related

YARN container allocation for Spark in AWS

I was able to create YARN containers for my Spark jobs.
I have come across various blogs and YouTube videos on how to use --executor-cores (values from 4-6 are recommended for efficient throughput) and --executor-memory efficiently after reserving 1 CPU core and 1 GB of RAM for the Hadoop daemons, and determined the right values for each executor.
I also came across articles like these.
I am checking how many containers are created by YARN from the Spark shell, and I am not able to understand how the containers are allocated.
For example, I have created an EMR cluster with 1 master node of type m5.xlarge (4 vCores, 16 GiB RAM) and 1 core node of type c5.2xlarge (8 vCores, 16 GiB RAM).
When I create the Spark shell with the following command:
spark-shell --num-executors=6 --executor-cores=5 --conf spark.executor.memoryOverhead=1G --executor-memory 1G --driver-memory 1G
I see that 6 executors, including a driver, are being created with 5 cores each, for a total of 25 cores.
However, the metrics from the Hadoop history server do not reflect the right calculations.
I am very confused how, in the Spark UI, more cores than available were allocated for each executor. The total vCores in the cluster is 8 (considering the core node), but a total of 25 cores are allocated to the executors.
Can someone please explain what I am missing?
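As a hedged illustration of the arithmetic the question is running into: with YARN's default DefaultResourceCalculator, containers are placed based on memory only, so the vCore numbers reported in the Spark UI are recorded but not enforced. A small Python sketch of the memory math for the c5.2xlarge core node above (the 12 GB figure for memory handed to YARN is an assumption, not an EMR guarantee):

# Each requested executor: 1 GB heap + 1 GB spark.executor.memoryOverhead.
executor_memory_gb = 1
memory_overhead_gb = 1
container_gb = executor_memory_gb + memory_overhead_gb   # 2 GB per container

yarn_node_memory_gb = 12   # assumed memory given to YARN on the 16 GiB node

# With the DefaultResourceCalculator, only memory limits how many containers fit;
# the 5 vCores requested per executor are not checked against the 8 physical vCores.
max_containers_by_memory = yarn_node_memory_gb // container_gb   # 6
print(max_containers_by_memory)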

How is YARN ResourceManager's Total Memory calculated?

I'm running a Spark cluster in a 1-master-node, 3-worker-node configuration on AWS EMR in YARN client mode, with the master node being the client machine. All 4 nodes have 8GB of memory and 4 cores each. Given that hardware setup, I set the following:
spark.executor.memory = 5G
spark.executor.cores = 3
spark.yarn.executor.memoryOverhead = 600
With that configuration, would the expected Total Memory recognized by YARN's ResourceManager be 15GB? It's displaying 18GB. I've only seen YARN use up to 15GB when running Spark applications. Is that 15GB from spark.executor.memory * 3 nodes?
I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead but I can't find that documented anywhere. What's the proper way to find the exact number?
And I should be able to increase the value of spark.executor.memory to 6G right? I've gotten errors in the past when it was set like that. Would there be other configurations I need to set?
Edit: So it looks like the worker nodes' value for yarn.scheduler.maximum-allocation-mb is 6114, or 6GB. This is the default that EMR sets for the instance type. And since 6GB * 3 = 18GB, that likely makes sense. I want to restart YARN and increase that value from 6GB to 7GB, but can't since this cluster is in use, so I guess my question still stands.
I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead but I can't find that documented anywhere. What's the proper way to find the exact number?
This is sort of correct, but stated backwards. YARN's total memory is independent of any configuration you set for Spark. yarn.scheduler.maximum-allocation-mb controls the largest container YARN will hand out, and can be found here. To use all available memory with Spark, you would set spark.executor.memory + spark.yarn.executor.memoryOverhead to equal yarn.scheduler.maximum-allocation-mb. See here for more info on tuning your Spark job and this spreadsheet for calculating configurations.
And I should be able to increase the value of spark.executor.memory to 6G right?
Based on the spreadsheet, the upper limit of spark.executor.memory is 5502M if yarn.scheduler.maximum-allocation-mb is 6114M. Calculated by hand, this is 0.9 * 6114, since spark.executor.memoryOverhead defaults to
executorMemory * 0.10, with minimum of 384 (source)
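A small Python sketch of that calculation, using the overhead rule quoted above (the larger of 10% of executor memory and 384 MB). The "divide by 1.1" line is an alternative way to solve the same inequality, which is why it lands slightly above the spreadsheet's 5502M:

yarn_max_allocation_mb = 6114

def container_size_mb(executor_memory_mb):
    # spark.executor.memoryOverhead default: max(0.10 * executorMemory, 384)
    overhead = max(0.10 * executor_memory_mb, 384)
    return executor_memory_mb + overhead

print(int(0.9 * yarn_max_allocation_mb))    # 5502, the spreadsheet-style estimate
print(int(yarn_max_allocation_mb / 1.1))    # 5558, from executorMemory * 1.1 <= 6114

# Either way, 6G (6144M) plus its overhead exceeds 6114M and would be rejected.
print(container_size_mb(6144) <= yarn_max_allocation_mb)   # False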

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors

I'm new to Spark on YARN and don't understand the relation between the YARN containers and the Spark executors. I tried out the following configuration, based on the results of the yarn-utils.py script, which can be used to find an optimal cluster configuration.
The Hadoop cluster (HDP 2.4) I'm working on:
1 Master Node:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
SSD: 2 x 512 GB
5 Slave Nodes:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
HDD: 4 x 3 TB = 12 TB
HBase is installed (this is one of the parameters for the script below)
So I ran python yarn-utils.py -c 12 -m 64 -d 4 -k True (c=cores, m=memory, d=hdds, k=hbase-installed) and got the following result:
Using cores=12 memory=64GB disks=4 hbase=True
Profile: cores=12 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4915m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4915m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4915m
mapreduce.task.io.sort.mb=2457
I made these settings via the Ambari interface and restarted the cluster. The values also roughly match what I calculated manually beforehand.
I now have problems:
finding the optimal settings for my spark-submit script parameters --num-executors, --executor-cores & --executor-memory;
understanding the relation between a YARN container and a Spark executor;
understanding the hardware information in my Spark History UI (less memory is shown than I set, when multiplied by the number of worker nodes to get the overall memory);
understanding the concept of vcores in YARN, for which I couldn't find any useful examples yet.
However, I found the post What is a container in YARN?, but it didn't really help, as it doesn't describe the relation to the executors.
Can someone help to solve one or more of the questions?
I will report my insights here step by step:
First important thing is this fact (Source: this Cloudera documentation):
When running Spark on YARN, each Spark executor runs as a YARN container. [...]
This means the number of containers will always be the same as the executors created by a Spark application e.g. via --num-executors parameter in spark-submit.
Every container always allocates at least the amount of memory set by yarn.scheduler.minimum-allocation-mb. This means that if the parameter --executor-memory is set to e.g. only 1g but yarn.scheduler.minimum-allocation-mb is e.g. 6g, the container is much bigger than needed by the Spark application.
The other way round, if the parameter --executor-memory is set to something higher than the yarn.scheduler.minimum-allocation-mb value, e.g. 12g, the container will allocate more memory dynamically, but only if the requested amount of memory is smaller than or equal to the yarn.scheduler.maximum-allocation-mb value.
The value of yarn.nodemanager.resource.memory-mb determines how much memory can be allocated in total by all containers on one host!
=> So setting yarn.scheduler.minimum-allocation-mb allows you to run smaller containers, e.g. for smaller executors (otherwise it would be a waste of memory).
=> Setting yarn.scheduler.maximum-allocation-mb to the maximum value (e.g. equal to yarn.nodemanager.resource.memory-mb) allows you to define bigger executors (more memory is allocated if needed, e.g. via the --executor-memory parameter).
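To make those relationships concrete, here is a small Python sketch of the allocation logic exactly as described above (a request below the minimum is rounded up, a request above the maximum cannot be satisfied, and the NodeManager memory caps the total per host). It reuses the numbers from the yarn-utils.py output and leaves memoryOverhead out for simplicity:

min_alloc_mb = 6144     # yarn.scheduler.minimum-allocation-mb
max_alloc_mb = 49152    # yarn.scheduler.maximum-allocation-mb
node_mem_mb = 49152     # yarn.nodemanager.resource.memory-mb

def container_memory_mb(requested_mb):
    # Requests above the maximum allocation cannot be granted at all.
    if requested_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    # Requests below the minimum allocation are rounded up to it.
    return max(requested_mb, min_alloc_mb)

print(container_memory_mb(1024))    # 6144: --executor-memory 1g still costs a 6 GB container
print(container_memory_mb(12288))   # 12288: --executor-memory 12g grows the container

# How many such containers fit on one host:
print(node_mem_mb // container_memory_mb(12288))   # 4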

Spark-SQL slow query performance

Below are configurations:
Hadoop-2x (1 master, 2 slaves)
yarn.nodemanager.resource.memory-mb = 7096
yarn.scheduler.maximum-allocation-mb = 2560
Spark - 1.5.1
spark/conf details in all three nodes :
spark.driver.memory 4g
spark.executor.memory 2g
spark.executor.instances 2
spark-sql>CREATE TABLE demo
USING org.apache.spark.sql.json
OPTIONS (path ...)
This path has 32 GB of compressed data. It is taking 25 minutes to create the table demo. Is there any way to optimize this and bring it down to a few minutes? Am I missing something here?
Usually each executor should represent one core of your CPU. Also note that the master is the most irrelevant of all your machines, because it only assigns tasks to the slaves, which do the actual data processing. Your setup is then correct if your slaves are single-core machines, but in most cases you would do something like:
spark.driver.memory // this may be the whole memory of your master
spark.executor.instances // sum of all CPU cores that your slaves have
spark.executor.memory // (sum of all slaves memory) / (executor.instances)
That's the easiest formula and will work for the vast majority of Spark jobs.
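Applied to the cluster in the question (2 slaves with roughly 7 GB per node handed to YARN), that formula gives something like the Python sketch below. The 4-cores-per-slave figure is an assumption, since the question does not state the core count:

slaves = 2
cores_per_slave = 4       # assumed; not given in the question
mem_per_slave_gb = 7      # roughly yarn.nodemanager.resource.memory-mb above

executor_instances = slaves * cores_per_slave                            # 8
executor_memory_gb = (slaves * mem_per_slave_gb) / executor_instances    # 1.75

print(executor_instances, executor_memory_gb)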

Using all resources in Apache Spark with Yarn

I am using Apache Spark with the YARN client.
I have 4 worker PCs with 8 vCPUs each and 30 GB of RAM in my Spark cluster.
I set my executor memory to 2G and the number of instances to 33.
My job is taking 10 hours to run and all machines are about 80% idle.
I don't understand the correlation between executor memory and executor instances. Should I have an instance per vCPU? Should I set the executor memory to be (memory of machine) / (number of executors per machine)?
I believe that you have to use the following command:
spark-submit --num-executors 4 --executor-memory 7G --driver-memory 2G --executor-cores 8 --class "YourClassName" --master yarn-client
The number of executors should be 4, since you have 4 workers. The executor memory should be close to the maximum memory that each YARN node has allocated, roughly ~5-6GB (I assume you have 30GB total RAM).
You should take a look at the spark-submit parameters and fully understand them.
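A small Python sketch of the arithmetic behind those figures, under this answer's assumption of 30GB of RAM for the whole cluster (the question actually states 30 GB per worker, in which case the numbers scale up accordingly; the 0.75 headroom factor is an assumption):

total_ram_gb = 30        # the answer's assumption: 30 GB across the cluster
workers = 4

ram_per_worker_gb = total_ram_gb / workers      # 7.5 GB
# Leave headroom for the OS, YARN daemons and executor memoryOverhead:
executor_memory_gb = ram_per_worker_gb * 0.75   # ~5.6 GB, i.e. the ~5-6GB above

print(ram_per_worker_gb, round(executor_memory_gb, 1))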
We were using Cassandra as our data source for Spark. The problem was that there were not enough partitions; we needed to split up the data more. Our mapping from the number of Cassandra partitions to Spark partitions was not small enough, and we would only generate 10 or 20 tasks instead of hundreds of tasks.
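A hedged sketch of the kind of fix that helps in that situation: explicitly repartitioning the DataFrame after it is read so that the job runs as hundreds of tasks. The keyspace, table name and target partition count are hypothetical, and reading from Cassandra like this assumes the spark-cassandra-connector package (and spark.cassandra.connection.host) are configured; the connector's split-size options can also be tuned, but the exact option name depends on the connector version:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-example").getOrCreate()

# Hypothetical Cassandra table; in our case the connector produced far too few
# input partitions for the amount of data.
df = (spark.read.format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_keyspace", table="my_table")
      .load())

# Force a larger number of partitions so the job runs as hundreds of tasks
# instead of 10 or 20.
df = df.repartition(400)

print(df.rdd.getNumPartitions())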
