I have a Spark UI running with 3 workers:
Worker Id Address State Cores Memory Resources
worker-20220824124633-192.168.191.141-34525 192.168.191.141:34525 DEAD 6 (0 Used) 6.7 GiB (0.0 B Used)
worker-20220824125307-192.168.191.141-44971 192.168.191.141:44971 DEAD 6 (2 Used) 6.7 GiB (1024.0 MiB Used)
worker-20220824130742-192.168.191.141-46371 192.168.191.141:46371 ALIVE 6 (2 Used) 6.7 GiB (1024.0 MiB Used)
This is my pyspark code. I am trying to read a parquet file of size 111 KB.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("spark://master:7077") \
    .config(conf=conf) \
    .getOrCreate()
df = spark.read.parquet("userdata1.parquet")
I am scheduling my job with the following configuration:
from pyspark import SparkConf

conf = SparkConf().setAppName("etcjob") \
    .set("spark.cores.max", "2")
Even though Spark clearly has 4 more cores free on worker worker-20220824130742-192.168.191.141-46371, I keep getting this message:
22/08/24 18:10:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
When I run the code, the worker screen displays this information:
Running Executors (1)
ExecutorID State Cores Memory Resources
13 RUNNING 2 1024.0 MiB
Job Details
ID: app-20220824131220-0013
Name: etcjob
User: myuser
It is clear that enough memory and resources are present. Still, Spark keeps claiming resources are not available. What am I doing wrong?
Related
I am running a Spark application on 350 GB of data and getting a SIGTERM error in the YARN logs. Here is some of the Spark configuration I have:
Executor memory : 50 GB
Driver Memory : 50 GB
Memory Overhead : 6 GB
Number of cores per Executor: 5
I am not able to find a root cause and solution. Please help
Based on my understanding, when YARN allocates containers for a Spark memory request, it automatically rounds the container size up to a multiple of 'yarn.scheduler.minimum-allocation-mb'.
For Example,
yarn.scheduler.minimum-allocation-mb: 2 GB
spark.executor.memory: 4 GB
spark.yarn.executor.memoryOverhead: 384 MB
the overall Spark executor memory ask to YARN is [4 GB + 384 MB] = 4.375 GB
Spark requests a 4.375 GB container from YARN; however, YARN allocates containers in multiples of 2 GB (yarn.scheduler.minimum-allocation-mb), so in this case it returns a container of size 6 GB (rounding 4.375 up to 6). Hence the Spark executor JVM is started with 6 GB inside the YARN container.
So,
the original spark-executor-memory ask was = 4.375 GB, however
the YARN-allocated memory is = 6 GB
Increment in executor size = 6 GB - 4.375 GB = 1.625 GB (~1.6 GB)
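As a quick sanity check on that arithmetic, here is the round-up rule as a tiny sketch (assuming, as described above, that YARN rounds the total ask up to a multiple of yarn.scheduler.minimum-allocation-mb):

import math

def yarn_container_mb(executor_memory_mb, overhead_mb, min_allocation_mb):
    # Total ask = executor heap + overhead, rounded up to a multiple
    # of yarn.scheduler.minimum-allocation-mb.
    ask_mb = executor_memory_mb + overhead_mb
    return math.ceil(ask_mb / min_allocation_mb) * min_allocation_mb

ask_mb = 4 * 1024 + 384                                    # 4480 MB = 4.375 GB
print(ask_mb, yarn_container_mb(4 * 1024, 384, 2 * 1024))  # 4480 6144 -> 6 GB container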
My question is: as I understand it, roughly 1.6 GB is added to each Spark executor's overall memory, which comprises executor memory and overhead memory. Which part of the overall executor memory is increased by that 1.6 GB: spark.yarn.executor.memoryOverhead or spark.executor.memory? And how does Spark use the extra memory received due to the YARN round-up?
Yarn Cluster Configuration:
8 Nodes
8 cores per Node
8 GB RAM per Node
1TB HardDisk per Node
Executor memory & No of Executors
Executor memory and the number of executors per node are interlinked, so you would first pick either the executor memory or the number of executors, and then, based on that choice, set the properties below to get the desired result.
In YARN these properties determine how many containers (i.e. executors in Spark) can be instantiated on a NodeManager, based on the spark.executor.cores and spark.executor.memory values (along with the executor memory overhead).
For example, take a cluster with 10 nodes (RAM: 16 GB, cores: 6) set with the following YARN properties:
yarn.scheduler.maximum-allocation-mb=10GB
yarn.nodemanager.resource.memory-mb=10GB
yarn.scheduler.maximum-allocation-vcores=4
yarn.nodemanager.resource.cpu-vcores=4
Then with the Spark properties spark.executor.cores=2 and spark.executor.memory=4GB you can expect 2 executors per node, so in total you'll get 19 executors + 1 container for the driver.
If the Spark properties are spark.executor.cores=3 and spark.executor.memory=8GB, then you will get 9 executors (only 1 executor per node) + 1 container for the driver.
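The per-node counts in both examples fall out of a simple minimum over the memory and vcore limits. A rough sketch of that calculation (the overhead is approximated here as max(384 MB, 10% of executor memory), the usual default rule; exact results depend on your scheduler's rounding):

def executors_per_node(node_mem_mb, node_vcores, exec_mem_mb, exec_cores):
    # Overhead defaults to max(384 MB, 10% of spark.executor.memory).
    overhead_mb = max(384, int(0.10 * exec_mem_mb))
    by_memory = node_mem_mb // (exec_mem_mb + overhead_mb)
    by_cores = node_vcores // exec_cores
    return min(by_memory, by_cores)

# 10 nodes, each with 10 GB / 4 vcores usable (the YARN settings above)
print(executors_per_node(10 * 1024, 4, 4 * 1024, 2))  # 2 per node -> 20 containers = 19 executors + driver
print(executors_per_node(10 * 1024, 4, 8 * 1024, 3))  # 1 per node -> 10 containers = 9 executors + driver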
Driver memory
spark.driver.memory — Maximum size of each Spark driver's Java heap memory
spark.yarn.driver.memoryOverhead — Amount of extra off-heap memory that can be requested from YARN, per driver. This, together with spark.driver.memory, is the total memory that YARN can use to create a JVM for a driver process.
Spark driver memory does not impact performance directly, but it ensures that the Spark jobs run without memory constraints at the driver. Adjust the total amount of memory allocated to a Spark driver by using the following formula, assuming the value of yarn.nodemanager.resource.memory-mb is X:
12 GB when X is greater than 50 GB
4 GB when X is between 12 GB and 50 GB
1 GB when X is between 1GB and 12 GB
256 MB when X is less than 1 GB
These numbers are for the sum of spark.driver.memory and spark.yarn.driver.memoryOverhead. Overhead should be 10-15% of the total.
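Expressed as a small helper, that rule of thumb might look like the sketch below (not an official formula; the 10% overhead split is just the low end of the 10-15% guideline):

def driver_memory_gb(nodemanager_memory_gb):
    # Pick the total of spark.driver.memory + spark.yarn.driver.memoryOverhead
    # from the table above (X = yarn.nodemanager.resource.memory-mb, in GB).
    if nodemanager_memory_gb > 50:
        total_gb = 12
    elif nodemanager_memory_gb >= 12:
        total_gb = 4
    elif nodemanager_memory_gb >= 1:
        total_gb = 1
    else:
        total_gb = 0.25
    overhead_gb = total_gb * 0.10
    return total_gb - overhead_gb, overhead_gb   # (heap, overhead)

print(driver_memory_gb(48))   # (3.6, 0.4) -> ~3.6 GB heap + ~0.4 GB overhead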
You can also follow this Cloudera link for tuning Spark jobs
Whether I use dynamic allocation or explicitly specify executors (16) and executor cores (8), I have been losing executors even though the number of outstanding tasks is well beyond the current number of executors.
For example, I have a job (Spark SQL) running with over 27,000 tasks and 14,000 of them were complete, but executors "decayed" from 128 down to as few as 16 with thousands of tasks still outstanding. The log doesn't note any errors/exceptions precipitating these lost executors.
It is a Cloudera CDH 5.10 cluster running on AWS EC2 instances with 136 CPU cores and Spark 2.1.0 (from Cloudera).
17/05/23 18:54:17 INFO yarn.YarnAllocator: Driver requested a total number of 91 executor(s).
17/05/23 18:54:17 INFO yarn.YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 91 executors.
It's a slow decay where every minute or so more executors are removed.
Some potentially relevant configuration options:
spark.dynamicAllocation.maxExecutors = 136
spark.dynamicAllocation.minExecutors = 1
spark.dynamicAllocation.initialExecutors = 1
yarn.nodemanager.resource.cpu-vcores = 8
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.increment-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 8
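For reference, the Spark-side options above expressed as a SparkConf (a sketch only; the yarn.* properties are cluster settings in yarn-site.xml and are not set through SparkConf):

from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.maxExecutors", "136")
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.initialExecutors", "1"))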
Why are the executors decaying away and how can I prevent it?
I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:
1 Masternode
64 GB RAM (48 GB usable)
12 cores (8 cores usable)
5 Slavenodes
64 GB RAM (48 GB usable) each
12 cores (8 cores usable) each
YARN settings
memory of all containers (of one host): 48 GB
minimum container size = maximum container size = 6 GB
vcores in cluster = 40 (5 x 8 cores of workers)
minimum #vcores/container = maximum #vcores/container = 1
When I run my Spark application with the command spark-submit --num-executors 10 --executor-cores 1 --executor-memory 5g ..., Spark should give each executor 5 GB of RAM, right? (I set the memory to only 5g because of the ~10% memory overhead.)
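As a quick check that 5g plus the default overhead still fits in the fixed 6 GB container (the "~10%" refers to the usual overhead rule of max(384 MB, 10% of executor memory)):

exec_mem_mb = 5 * 1024
overhead_mb = max(384, int(0.10 * exec_mem_mb))    # 512 MB
total_ask_mb = exec_mem_mb + overhead_mb           # 5632 MB
print(total_ask_mb, total_ask_mb <= 6 * 1024)      # 5632 True -> fits the 6 GB container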
But when I had a look at the Spark UI, I saw that each executor only has 3.4 GB of memory (see screenshot):
Can someone explain why so little memory is allocated?
The Storage Memory column in the UI shows the amount of memory available for execution and RDD storage. By default, this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the memory is used for internal metadata, user data structures, and other things.
You can control this amount by setting spark.memory.fraction (not recommended). See more in Spark's documentation.
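Plugging the 5g executor from the question into that formula gives a rough expectation (a back-of-the-envelope sketch; the JVM reports a usable heap somewhat below the -Xmx value, which is why the UI shows ~3.4 GB rather than the ~3.5 GB computed here):

heap_mb = 5 * 1024
reserved_mb = 300                     # fixed reserved memory
fraction = 0.75                       # the 75% figure from the answer above
storage_mb = (heap_mb - reserved_mb) * fraction
print(round(storage_mb / 1024, 2))    # ~3.53 (GB)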