I'm running 2 copies of my Spark Streaming application (Spark 2.2.1, EMR 5.11, Scala) on an AWS EMR cluster of 3 m4.4xlarge nodes (16 vCPU and 64 GB RAM per node).
In the built-in EMR cluster monitoring (Ganglia) I see that cluster CPU utilization is below 30%, no more than 32 GB of the ~200 GB of available memory is used, and the network is also far from saturated. Yet the applications can barely finish processing each batch within the batch interval.
Here are the parameters I use to submit each copy of the app from the master node in client mode:
--master yarn
--num-executors 2
--executor-cores 20
--executor-memory 20G
--conf spark.driver.memory=4G
--conf spark.driver.cores=3
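To see what these flags actually ask YARN for, here is a small Python sketch of the arithmetic. It assumes Spark 2.2's default executor overhead of max(384 MB, 10% of --executor-memory) and counts only executor containers: in client mode the driver runs outside YARN, and the small application-master container is ignored.

```python
# Rough arithmetic for the executor containers requested by the two app copies.
# Assumption: Spark 2.x default overhead = max(384 MB, 10% of executor memory).
OVERHEAD_MIN_MB = 384
OVERHEAD_FACTOR = 0.10


def executor_container_mb(executor_memory_mb: int) -> int:
    """Memory YARN is asked for per executor container: heap plus overhead."""
    overhead = max(OVERHEAD_MIN_MB, int(OVERHEAD_FACTOR * executor_memory_mb))
    return executor_memory_mb + overhead


def total_request(apps: int, executors: int, cores: int, memory_mb: int) -> dict:
    """Executor containers, vcores (= concurrent task slots) and memory for all apps."""
    containers = apps * executors
    return {
        "containers": containers,
        "vcores": containers * cores,
        "memory_mb": containers * executor_container_mb(memory_mb),
    }


# 2 copies, each submitted with --num-executors 2 --executor-cores 20 --executor-memory 20G
request = total_request(apps=2, executors=2, cores=20, memory_mb=20 * 1024)
cluster_vcpus = 3 * 16  # 3 x m4.4xlarge
print(request, "physical vCPUs in the cluster:", cluster_vcpus)
# {'containers': 4, 'vcores': 80, 'memory_mb': 90112} physical vCPUs in the cluster: 48
```

Each copy therefore gets 40 concurrent task slots; those slots can only be kept busy if every batch has at least that many partitions.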
How can I achieve better resource utilization (and app performance)?
Use maximizeResourceAllocation. All of this is discussed in detail in the AWS docs; read them completely.
You can configure your executors to utilize the maximum resources possible on each node in a cluster by using the spark configuration classification to set maximizeResourceAllocation option to true. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets the corresponding spark-defaults settings based on this information.
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
Further reading
Best practices for successfully managing memory for Apache Spark applications on Amazon EMR
EMR-spark-tuning-demystified
Do your Spark executors have multiple vcores?
If yes, then there is a configuration issue on AWS EMR with allocating the correct amount of CPU:
YARN is not honouring yarn.nodemanager.resource.cpu-vcores, because by default the scheduler accounts only for memory when placing containers.
See this answer: switching to the dominant resource calculator allowed more vcores to be used, and I saw CPU usage increase when monitoring.
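The "dominant" setting mentioned here is presumably the capacity scheduler's resource calculator. A minimal Python sketch, assuming the cluster uses EMR's default CapacityScheduler, that builds the EMR configuration classification switching it to DominantResourceCalculator, so vcores as well as memory are counted when containers are placed. The printed JSON is what you would supply at cluster creation, for example through the AWS CLI's `--configurations` option.

```python
import json

# Assumption: the cluster runs YARN's CapacityScheduler (the EMR default).
# DominantResourceCalculator makes the scheduler account for vcores as well as
# memory; the default calculator (DefaultResourceCalculator) looks at memory only.
configurations = [
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator":
                "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
        },
    }
]

print(json.dumps(configurations, indent=2))
```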
As for memory, how big is your dataset and how much memory do you have? Can you see any disk writes that would indicate data spilling from memory to disk?
There are parameters that decide the maximum, minimum and total memory and CPU that YARN can allocate via containers, for example:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.minimum-allocation-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-vcores
yarn.scheduler.minimum-allocation-vcores
There are also Spark-side parameters that seemingly control similar kinds of allocations:
spark.executor.instances
spark.executor.memory
spark.executor.cores
etc
What happens when the two sets of parameters are infeasible given the bounds set by the other? For example, what if yarn.scheduler.maximum-allocation-mb is set to 1G and spark.executor.memory is set to 2G? Similar conflicts and infeasibilities can be imagined for the other parameters as well.
What happens in such cases? And, what is the suggested way to set these parameters?
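When running Spark on YARN, each Spark executor runs in a YARN container.
So take spark.executor.memory as an example:
If spark.executor.memory is 2G and yarn.scheduler.maximum-allocation-mb is 1G, no container can be granted: Spark's YARN client compares executor memory plus overhead with the cluster's maximum and fails the submission with an error such as "Required executor memory (2048+384 MB) is above the max threshold (1024 MB) of this cluster!"
If spark.executor.memory is 2G and yarn.scheduler.minimum-allocation-mb is 4G, YARN rounds the request up, so your container is much bigger than the Spark application needs.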
Suggested settings depend on your hardware resources and the other services running on the machine. You can start with the default values and then adjust them by monitoring machine resources.
This excellent Cloudera community thread (https://community.cloudera.com/t5/Support-Questions/Yarn-container-size-flexible-to-satisfy-what-application-ask/m-p/115458) and the question "Difference between `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb`?" should give you the basics. Additionally, here is a good related SO answer: Spark on YARN resource manager: Relation between YARN Containers and Spark Executors.
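The two cases above can be sketched in a few lines of Python. This is a model under stated assumptions, not YARN's actual code: Spark 2.x's default overhead of max(384 MB, 10% of executor memory), and CapacityScheduler-style normalization that rounds a request up to a multiple of yarn.scheduler.minimum-allocation-mb and never above yarn.scheduler.maximum-allocation-mb.

```python
# Sketch of how an executor request meets YARN's minimum/maximum allocation.
# Assumptions: Spark 2.x overhead = max(384 MB, 10% of executor memory); the scheduler
# rounds requests up to a multiple of yarn.scheduler.minimum-allocation-mb and caps
# them at yarn.scheduler.maximum-allocation-mb (CapacityScheduler-style normalization).


def container_size_mb(executor_memory_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    overhead = max(384, int(0.10 * executor_memory_mb))
    requested = executor_memory_mb + overhead
    if requested > max_alloc_mb:
        # Spark's YARN client rejects this before any container is requested.
        raise ValueError(
            f"Required executor memory ({executor_memory_mb}+{overhead} MB) "
            f"is above the max threshold ({max_alloc_mb} MB) of this cluster!"
        )
    # Round up to the next multiple of the minimum allocation, never above the maximum.
    steps = -(-max(requested, min_alloc_mb) // min_alloc_mb)  # ceiling division
    return min(steps * min_alloc_mb, max_alloc_mb)


# 2G executor, 4G minimum allocation: the 2432 MB request becomes a 4096 MB container.
print(container_size_mb(2048, min_alloc_mb=4096, max_alloc_mb=8192))
# 2G executor, 1G maximum allocation: the request is rejected outright.
try:
    container_size_mb(2048, min_alloc_mb=512, max_alloc_mb=1024)
except ValueError as err:
    print(err)
```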
TL;DR
As you are not talking about Kubernetes, YARN as the resource/cluster manager allocates executors with the needed resources, based on Spark parameters/defaults, within the limits that the YARN parameters set for containers.
1 container = 1 executor. Some state, incorrectly, that 1 container holds N executors; not so.
There are minimum and maximum resource allocations, set by those YARN parameters. So YARN will give executors containers that may waste some resources (when requests are rounded up) or that are restricted in size.
With dynamic allocation, apps start with fewer resources and grow as needed; without it, there is a wait to get all requested resources, and those acquired are not available to others.
There is also a Fair Scheduler for smoother, more uniform throughput across many concurrent apps.
I run several streaming and batch Spark jobs in the same EMR cluster. Recently, one batch Spark job was written incorrectly and consumed a lot of memory. It caused the master node to stop responding and all other Spark jobs to get stuck, which means the whole EMR cluster was basically down.
Is there some way to restrict the maximum memory that a Spark job can consume? If a Spark job consumes too much memory, it is fine for it to fail, but we do not want the whole EMR cluster to go down.
The Spark jobs run in client mode with the spark-submit command below:
spark-submit --driver-memory 2G --num-executors 1 --executor-memory 2G --executor-cores 1 --class test.class s3://test-repo/mysparkjob.jar
{
    'Classification': 'yarn-site',
    'Properties': {
        'yarn.nodemanager.disk-health-checker.enable': 'true',
        'yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage': '95.0',
        'yarn.nodemanager.localizer.cache.cleanup.interval-ms': '100000',
        'yarn.nodemanager.localizer.cache.target-size-mb': '1024',
        'yarn.nodemanager.pmem-check-enabled': 'false',
        'yarn.nodemanager.vmem-check-enabled': 'false',
        'yarn.log-aggregation.retain-seconds': '12000',
        'yarn.log-aggregation-enable': 'true',
        'yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds': '3600',
        'yarn.resourcemanager.scheduler.class': 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler'
    }
}
Thanks!
You can use yarn.nodemanager.resource.memory-mb:
the total amount of memory that YARN can use on a given node.
Example: if your machine has 16 GB of RAM
and you set this property to 12 GB, YARN will allocate at most 12 GB of containers on that node, leaving 4 GB free for background processes. Counting only the 2 GB per executor/driver you request, that would be 6 containers; since Spark adds at least 384 MB of overhead to each container, 5 or fewer will actually launch.
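A quick Python check of that count, under two assumptions: Spark 2.x's default overhead of max(384 MB, 10% of executor memory), and container sizes rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. That minimum differs between Apache Hadoop's default and EMR's per-instance settings, so it is a parameter here rather than a known value.

```python
# Sketch: how many executor containers fit in yarn.nodemanager.resource.memory-mb.
# Assumptions: Spark 2.x overhead = max(384 MB, 10% of executor memory); container
# sizes rounded up to a multiple of yarn.scheduler.minimum-allocation-mb.


def containers_per_node(node_yarn_mb: int, executor_memory_mb: int, min_alloc_mb: int) -> int:
    overhead = max(384, int(0.10 * executor_memory_mb))
    requested = executor_memory_mb + overhead
    container_mb = -(-requested // min_alloc_mb) * min_alloc_mb  # round up
    return node_yarn_mb // container_mb


node_mb = 12 * 1024  # yarn.nodemanager.resource.memory-mb = 12 GB
print(containers_per_node(node_mb, 2048, min_alloc_mb=32))    # 5 (2432 MB each)
print(containers_per_node(node_mb, 2048, min_alloc_mb=1024))  # 4 (3072 MB each)
```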
Option 1:
You can run spark-submit in cluster mode instead of client mode. That way your master node is always free to do other work. You can choose a smaller master instance if you want to save cost.
Advantage: as the Spark driver will be created on a CORE node, you can add auto-scaling to it, and you will be able to use 100% of the cluster resources. Read more here: Spark yarn cluster vs client - how to choose which one to use?
Option 2:
You can create YARN queues and submit memory-heavy jobs to a separate queue.
So let's say you configure 2 queues, Q1 and Q2. You configure Q1 to take at most 80% of total resources and submit normal jobs to Q2, which has no maximum limit. Memory-heavy jobs go to Q1.
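A minimal Python sketch of that queue layout, assuming YARN's CapacityScheduler (EMR's default; the question's cluster switches to the FairScheduler, where queue limits are defined in the fair-scheduler allocation file instead). It builds a `capacity-scheduler` configuration classification. The 40/60 guaranteed split is a hypothetical choice, needed because the capacities of sibling queues must add up to 100%, while `maximum-capacity` expresses the 80% cap on Q1.

```python
import json


def queue_configuration(queues: dict) -> list:
    """Build an EMR 'capacity-scheduler' classification.

    `queues` maps queue name -> (capacity %, maximum-capacity %).
    """
    total = sum(cap for cap, _ in queues.values())
    if total != 100:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (cap, max_cap) in queues.items():
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(cap)
        props[f"{prefix}.maximum-capacity"] = str(max_cap)
    return [{"Classification": "capacity-scheduler", "Properties": props}]


# Q1: memory-heavy jobs, capped at 80% of the cluster; Q2: everything else, uncapped.
config = queue_configuration({"Q1": (40, 80), "Q2": (60, 100)})
print(json.dumps(config, indent=2))
```

Because this replaces the root queue list, jobs would then presumably have to name their queue explicitly, e.g. `spark-submit --queue Q1 ...`.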
Cloudera Blog
AWS Blog
Given your requirements, I think Option 1 suits you better. It is easy to implement, with no infrastructure change.
With Option 2, however, we faced many challenges configuring YARN queues when we did it on emr-5.26.0.
I was running an application on AWS EMR-Spark. Here is the spark-submit job:
Arguments : spark-submit --deploy-mode cluster --class com.amazon.JavaSparkPi s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar s3://spark-config-test/2017-08-08
AWS uses YARN for resource management. I had a couple of doubts about this while observing the CloudWatch metrics:
1) What does "container allocated" imply here? I am using 1 master and 3 slave/executor nodes (all 4 have 8-core CPUs).
2) I changed my command to:
spark-submit --deploy-mode cluster --executor-cores 4 --class com.amazon.JavaSparkPi s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar s3://spark-config-test/2017-08-08
Here the number of cores running is 3. Shouldn't it be 3 (number of executors) * 4 (number of cores) = 12?
1) "Container allocated" here basically represents the number of Spark executors. Spark executor cores are more like executor task slots, meaning that you could configure your app to run one executor per physical CPU and still ask it to have 3 executor cores per CPU (think hyper-threading).
By default on EMR, when you don't specify the number of Spark executors, dynamic allocation is assumed, and Spark asks YARN only for the resources it thinks it needs. I tried explicitly setting the number of executors to 10, and the containers allocated went up to 6 (the maximum number of data partitions). Also, under the "Application history" tab you can get a detailed view of YARN/Spark executors.
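With dynamic allocation, Spark 2.x sizes its executor target from the number of running and pending tasks divided by the task slots per executor. A rough Python model of that rule, assuming one CPU per task (spark.task.cpus = 1) and ignoring the timeouts that pace the ramp-up; the bounds stand in for spark.dynamicAllocation.minExecutors and maxExecutors. With one core per executor, 6 partitions give a target of 6 executors, which would be consistent with the observation above.

```python
import math
from typing import Optional


def executors_needed(running_or_pending_tasks: int, executor_cores: int,
                     task_cpus: int = 1, min_executors: int = 0,
                     max_executors: Optional[int] = None) -> int:
    """Executors needed to run every running/pending task at once (Spark 2.x rule)."""
    tasks_per_executor = executor_cores // task_cpus
    needed = math.ceil(running_or_pending_tasks / tasks_per_executor)
    if max_executors is not None:
        needed = min(needed, max_executors)
    return max(needed, min_executors)


print(executors_needed(6, executor_cores=1))  # 6: one task slot per executor
print(executors_needed(6, executor_cores=4))  # 2: four task slots per executor
```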
2) "Cores" here refers to EMR core nodes, not to Spark executor cores. The same goes for "task", which in the monitoring tab refers to EMR task nodes. That is consistent with my setup, as I have 3 EMR slave nodes.
I created a Spark cluster (for learning, so I did not create a high memory/CPU cluster) with 1 master node and 2 core nodes to run executors, using the config below:
Master: Running 1 m4.large (2 cores, 8 GB)
Core: Running 2 c4.large (2 cores, 3.5 GB)
Hive 2.1.1, Pig 0.16.0, Hue 3.11.0, Spark 2.1.0, Sqoop 1.4.6, HBase 1.3.0
When pyspark is run, I get the error below:
Required executor memory (1024+384 MB) is above the max threshold (896 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
Before trying to increase the yarn-site.xml config, I'm curious to understand why EMR is taking just 896 MB as the limit when the master has 8 GB and each worker node has 3.5 GB.
And the Resource Manager URL (for the master: http://master-public-dns-name:8088/) shows 1.75 GB, whereas the VM has 8 GB of memory. Is HBase or other software taking up too much memory?
If anyone has encountered a similar issue, please share your insight into why EMR sets such low defaults. Thanks!
Before trying to increase yarn-site.xml config, curious to understand why EMR is taking just 896MB as limit when master has 8GB and worker node has 3.5GB each.
If you run Spark jobs in YARN cluster mode (which you probably were using), the executors run on the core nodes and the master's memory is not used.
Now, although your CORE EC2 instance (c4.large) has 3.75 GB to use, EMR configures YARN not to use all of this memory for running YARN containers or Spark executors. This is because you need to leave enough memory for other permanent daemons (like HDFS's DataNode, YARN's NodeManager, EMR's own daemons, etc., depending on the apps you provision).
EMR does publish this default YARN configuration it sets for all instance types on this page : http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html
c4.large
Configuration option                    Default value
mapreduce.map.java.opts                 -Xmx717m
mapreduce.map.memory.mb                 896
yarn.scheduler.maximum-allocation-mb    1792
yarn.nodemanager.resource.memory-mb     1792
So yarn.nodemanager.resource.memory-mb = 1792, which means 1792 MB of physical memory is available to YARN containers on a core node that has 3.75 GB of actual memory. Also check spark-defaults.conf, where EMR sets some defaults for Spark executor memory. These are defaults, and of course you can change them before starting the cluster using EMR's configuration API. But keep in mind that if you over-provision memory for YARN containers, you might starve other processes.
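The 896 MB in the error is the maximum container size the cluster reported to Spark, and what gets compared with it is executor memory plus overhead. A small Python sketch, assuming Spark 2.x's default overhead of max(384 MB, 10% of executor memory), reproduces the failing check and finds the largest --executor-memory that would fit a given threshold, which is the alternative to raising the YARN limits.

```python
# Sketch: the check behind "Required executor memory (...) is above the max threshold".
# Assumption: Spark 2.x overhead = max(384 MB, 10% of executor memory).


def overhead_mb(executor_memory_mb: int) -> int:
    return max(384, int(0.10 * executor_memory_mb))


def max_executor_memory_mb(max_threshold_mb: int) -> int:
    """Largest executor memory (MB) whose heap + overhead fits under the threshold."""
    best = 0
    for mem in range(1, max_threshold_mb + 1):
        if mem + overhead_mb(mem) <= max_threshold_mb:
            best = mem
    return best


print(1024 + overhead_mb(1024))      # 1408 > 896, hence the error in the question
print(max_executor_memory_mb(896))   # 512, i.e. --executor-memory 512m
print(max_executor_memory_mb(1792))  # 1408
```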
For these reasons, it is important to understand YARN configs and how Spark interacts with YARN:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://spark.apache.org/docs/latest/running-on-yarn.html
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
It's not really a property of EMR but rather of YARN, which is the resource manager running on EMR.
My personal take is that YARN is really built for managing long-running clusters that continuously take in a variety of jobs that have to run simultaneously. In these cases it makes sense for YARN to assign only a small part of the available memory to each job.
Unfortunately, for special-purpose clusters ("I will just spin up a cluster, run my job and terminate the cluster again") these YARN defaults are simply annoying, and you have to configure a bunch of things to make YARN use your resources optimally. But running on EMR, that's what we are stuck with these days, so one has to live with it.
I am reading through the literature about Spark and resource management, i.e. YARN in my case.
I think I understand the basic concept and how YARN encapsulates the Spark master/workers in containers.
Is there any point in still providing resource parameters such as --driver-memory, --executor-memory or --num-executors? Shouldn't the YARN application master (Spark master) figure out the demand and request new resources accordingly?
Or is it wise to interfere in the resource negotiation process by providing these parameters?
Spark needs to negotiate the resources from YARN. Providing the resource-parameters tells Spark how many resources to request from YARN.
For executors on YARN:
Unless dynamic allocation is enabled (it is on by default on EMR), Spark applications use a fixed number of executors (default = 2).
The --num-executors flag for spark-submit, spark-shell, etc. sets the number of executors as expected.
For memory management on YARN:
Set the memory used by each executor using --executor-memory.
Setting --executor-cores tells Spark how many cores to claim from YARN for each executor.
Set the amount of memory for the driver process with --driver-memory.
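Each of these flags is shorthand for a Spark configuration property, which is how they relate to the spark.executor.* settings discussed earlier. A small Python sketch with a hypothetical helper `as_conf_args` that rewrites the flags into the equivalent `--conf key=value` form; the property names follow the Spark configuration documentation, while the helper itself is only illustrative.

```python
# Hypothetical helper: rewrite spark-submit resource flags as equivalent --conf pairs.
FLAG_TO_CONF = {
    "--num-executors": "spark.executor.instances",
    "--executor-memory": "spark.executor.memory",
    "--executor-cores": "spark.executor.cores",
    "--driver-memory": "spark.driver.memory",
    "--queue": "spark.yarn.queue",
}


def as_conf_args(flags: dict) -> list:
    """Turn {"--executor-memory": "4G", ...} into ["--conf", "spark.executor.memory=4G", ...]."""
    args = []
    for flag, value in flags.items():
        if flag not in FLAG_TO_CONF:
            raise KeyError(f"no known --conf equivalent for {flag}")
        args += ["--conf", f"{FLAG_TO_CONF[flag]}={value}"]
    return args


print(["spark-submit", "--master", "yarn"]
      + as_conf_args({"--num-executors": 2, "--executor-memory": "4G"}))
```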
Some general Spark-on-YARN notes:
Use the --queue option if your YARN cluster schedules application into queues.
Spark is optimized for in-memory computation, so ask YARN for a smaller number of memory-heavy executors (with multiple cores and more memory). Be careful if you have set memory caps within YARN.
The Spark on YARN Documentation has more details.