EMR Cluster utilization - apache-spark

I have a 20-node c4.4xlarge cluster to run a Spark job. Each node is a 16 vCore, 30 GiB memory, EBS-only storage (32 GiB EBS) machine.
Since each node has 16 vCores, I understand that the maximum number of executors is 16*20 = 320. The total memory available is 20 (nodes) * 30 GiB ≈ 600 GB. Reserving 1/3rd for system operations, that leaves about 400 GB of memory to process my data in-memory. Is this the right understanding?
Also, the Spark History UI shows a non-uniform distribution of input and shuffle. I believe the processing is not distributed evenly across executors. I pass these config parameters in my spark-submit:
> --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20
The executor summary from the Spark History UI also shows that the data distribution is completely skewed, and I am not using the cluster in the best way. How can I distribute my load in a better way?
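For reference, a minimal sketch of how those flags fit into the full spark-submit call; the class name, jar, and input path below are placeholders, not the actual job:

# class name, jar, and input path are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=20 \
  --class com.example.MyJob \
  my-job.jar s3://my-bucket/input/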

Related

Experiencing very Low CPU utilization of Spark Jobs on AWS EMR

We have a Spark job that reads a CSV file, applies a series of transformations, and writes the result to an ORC file.
The Spark job breaks into close to 20 stages and runs for around an hour.
input csv file size: 10 GB
spark-submit job resource configuration:
driver-memory= 5 GB
num-executors= 2
executor-core= 3
executor-memory= 20 GB
EC2 instance type: r5d.xlarge i.e. 32GB Memory and 4 vCPU with attached 128 GB EBS volume
The EMR cluster comprises 1 master node and 2 core nodes.
When we run the Spark job with the above cluster configuration, the CPU utilization is only close to 10-15%.
Our requirement is to maximize the CPU utilization of the EC2 instances for this Spark job.
Any suggestions are appreciated!
AFAIK, if you increase the parallelism, CPU usage will automatically increase. Try using these in your Spark job configuration:
num-executors= 4
executor-core= 5
executor-memory= 25 GB
Especially if you increase the CPU cores, parallelism will increase.
More than 5 cores per executor is not recommended. This is based on a study showing that any application with more than 5 concurrent threads per executor starts hampering performance.
spark.dynamicAllocation.enabled could be another option.
spark.default.parallelism = 2 * number of CPUs in total on worker nodes
Make sure that you always run in YARN mode.
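As a sketch, the suggested values could be passed on spark-submit like this; the class name and jar are placeholders, spark.default.parallelism follows the formula above (2 * 8 worker vCPUs = 16), and you should adjust the sizes to what YARN can actually grant on your instances:

# class name and jar are placeholders; tune sizes to what fits your nodes
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 5 \
  --executor-memory 25G \
  --conf spark.default.parallelism=16 \
  --class com.example.CsvToOrcJob \
  my-job.jar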
Follow Using maximizeResourceAllocation from the AWS docs; all of these things are discussed there in detail. Read it completely.
You can configure your executors to utilize the maximum resources possible on each node in a cluster by using the spark configuration classification to set maximizeResourceAllocation option to true. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets the corresponding spark-defaults settings based on this information.
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "true"
    }
  }
]
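For example, if that JSON is saved locally as spark-config.json (a hypothetical file name), it can be supplied when the cluster is created with the EMR CLI; the other flags below are only illustrative, and the same classification can also be entered in the EMR console:

# file name, release label, and instance settings are illustrative
aws emr create-cluster \
  --release-label emr-5.30.0 \
  --applications Name=Spark \
  --instance-type r5d.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://spark-config.json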
Further reading
Best practices for successfully managing memory for Apache Spark applications on Amazon EMR
EMR-spark-tuning-demystified

spark : HDFS blocks vs Cluster cores vs rdd Partitions

I have a doubt about Spark: HDFS blocks vs. cluster cores vs. RDD partitions.
Assume I am trying to process a file in HDFS (say the block size is 64 MB and the file is 6400 MB). So ideally it will have 100 splits.
My cluster has 200 cores in total, and I submitted the job with 25 executors of 4 cores each (meaning 100 parallel tasks can run).
In a nutshell, I have 100 partitions in the RDD by default and 100 tasks will run in parallel.
Is this a good approach, or should I repartition the data into 200 partitions and use all the cores in the cluster?
Since you have 200 cores in total, using all of them can improve the performance depending on what kind of workload you are running.
Configure your Spark application to use 50 executors (i.e. all 200 cores can be used by Spark). Also change your Spark split size from 64 MB to 32 MB. This will make sure that the 6400 MB file is divided into 200 RDD partitions, so your entire cluster can be used by it.
Don't use repartition - it will be slow, as it involves a shuffle.
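A minimal sketch in the same Java API used elsewhere in this thread (the path and names are placeholders): textFile takes a minPartitions hint, so the 6400 MB file can be split into 200 partitions at read time, which gives the 32 MB effective split size without a repartition shuffle.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setAppName("split-size-example");
JavaSparkContext jsc = new JavaSparkContext(conf);

// Ask for at least 200 input partitions (~32 MB each for a 6400 MB file),
// so all 200 cores can work on the file in parallel.
JavaRDD<String> lines = jsc.textFile("hdfs:///data/input.txt", 200);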

Spark Yarn Memory configuration

I have a Spark application that keeps failing with the error:
"Diagnostics: Container [pid=29328,containerID=container_e42_1512395822750_0026_02_000001] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 2.3 GB of 3.1 GB virtual memory used. Killing container."
I saw lots of different parameters that it was suggested to change to increase the physical memory. Can I please have some explanation of the following parameters?
mapreduce.map.memory.mb (currently set to 0, so it should take the default, which is 1 GB, so why do we see it as 1.5 GB? Changing it also didn't affect the number.)
mapreduce.reduce.memory.mb (currently set to 0, so it should take the default, which is 1 GB, so why do we see it as 1.5 GB? Changing it also didn't affect the number.)
mapreduce.map.java.opts/mapreduce.reduce.java.opts (set to 80% of the previous number)
yarn.scheduler.minimum-allocation-mb=1GB (when changing this I see the effect on the maximum physical memory, but for the value 1 GB it is still 1.5 GB)
yarn.app.mapreduce.am.resource.mb/spark.yarn.executor.memoryOverhead (I can't find these at all in the configuration)
We are running YARN (with yarn-cluster deploy mode) on Cloudera CDH 5.12.1.
spark.driver.memory
spark.executor.memory
These control the base amount of memory Spark will try to allocate for its driver and for each of the executors. These are probably the ones you want to increase if you are running out of memory.
// options before Spark 2.3.0
spark.yarn.driver.memoryOverhead
spark.yarn.executor.memoryOverhead
// options after Spark 2.3.0
spark.driver.memoryOverhead
spark.executor.memoryOverhead
This value is an additional amount of memory to request when you are running Spark on YARN. It is intended to account for the extra RAM needed by the YARN container that is hosting your Spark executors.
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
When Spark asks YARN to reserve a block of RAM for an executor, it asks for the base memory plus the overhead memory. However, YARN may not give back a container of exactly that size. These parameters control the smallest and largest container sizes that YARN will grant. If you are only using the cluster for one job, I find it easiest to set these to very small and very large values and then use the Spark memory settings mentioned above to set the true container size.
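As an illustrative sketch of that arithmetic (the numbers are made up, and the overhead keys are the pre-2.3.0 names that match a CDH 5.12 Spark-on-YARN setup): with a 4g executor, the default overhead is max(384 MB, 10% of 4g) ≈ 410 MB, so Spark asks YARN for about 4.4 GB per executor container, and YARN then rounds that up to its allocation increment (a multiple of yarn.scheduler.minimum-allocation-mb).

# values are illustrative; raise them if containers are killed for
# exceeding physical memory limits
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --conf spark.yarn.driver.memoryOverhead=512 \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --class com.example.MyApp \
  my-app.jar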
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts/mapreduce.reduce.java.opts
I don't think these have any bearing on your Spark/Yarn job.

Spark + Elastic search write performance issue

Seeing a low number of writes to Elasticsearch using Spark (Java).
Here are the configurations:
Using i3.xlarge machines for the ES cluster:
4 instances, each with 4 processors.
Set the refresh interval to -1, replicas to 0, and the other basic configurations required for better write performance.
Spark :
2 node EMR cluster with
2 Core instances
- 8 vCPU, 16 GiB memory, EBS-only storage
- EBS storage: 1000 GiB
1 Master node
- 1 vCPU, 3.8 GiB memory, 410 GB SSD storage
ES index has 16 shards defined in mapping.
I have the below config when running the job:
executor-memory - 8g
spark.executor.instances=2
spark.executor.cores=4
and using
es.batch.size.bytes - 6MB
es.batch.size.entries - 10000
es.batch.write.refresh - false
With this configuration, I try to load 1 million documents (each document has a size of 1300 bytes), and the load runs at about 500 records/docs per second per ES node.
In the Spark log I am seeing for each task:
-1116 bytes result sent to driver
Spark Code
JavaRDD<String> javaRDD = jsc.textFile("<S3 Path>");
JavaEsSpark.saveJsonToEs(javaRDD,"<Index name>");
Also, when I look at the network-in graph for the ES cluster, it is very low, and I see EMR is not sending much data over the network. Is there a way I can tell Spark to send the right amount of data to make the writes faster?
OR
Is there any other config that I am missing or should tweak?
Because 500 docs per second per ES instance seems low, can someone please guide me on what I am missing in these settings to improve my ES write performance?
Thanks in advance.
You may have an issue here.
spark.executor.instances=2
You are limited to two executors, where you could have 4 based on your cluster configuration. I would change this to 4 or greater. I might also try executor-memory = 1500M, cores = 1, instances = 16. I like to leave a little overhead in my memory, which is why I dropped from 2G to 1.5G (but you can't write 1.5G, so we have to use 1500M). If you are connecting from your executors, this will improve performance.
I would need some code to debug further. I wonder if you are connecting to Elasticsearch only in your driver and not in your worker nodes, meaning you are only getting one connection instead of one per executor.
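A sketch of what that could look like on spark-submit; the ES endpoint, class name, and jar are placeholders, the executor sizes follow the suggestion above, and the es.* options from the question are passed with a spark. prefix so that spark-submit accepts them (elasticsearch-hadoop recognizes the prefixed keys):

# endpoint, class, and jar are placeholders; sizes follow the suggestion above
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 16 \
  --executor-cores 1 \
  --executor-memory 1500M \
  --conf spark.es.nodes=<es-endpoint> \
  --conf spark.es.batch.size.entries=10000 \
  --conf spark.es.batch.size.bytes=6mb \
  --conf spark.es.batch.write.refresh=false \
  --class com.example.EsLoadJob \
  my-es-load-job.jar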

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors

I'm new to Spark on YARN and don't understand the relation between YARN containers and Spark executors. I tried out the following configuration, based on the results of the yarn-utils.py script, which can be used to find an optimal cluster configuration.
The Hadoop cluster (HDP 2.4) I'm working on:
1 Master Node:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
SSD: 2 x 512 GB
5 Slave Nodes:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
HDD: 4 x 3 TB = 12 TB
HBase is installed (this is one of the parameters for the script below)
So I ran python yarn-utils.py -c 12 -m 64 -d 4 -k True (c=cores, m=memory, d=hdds, k=hbase-installed) and got the following result:
Using cores=12 memory=64GB disks=4 hbase=True
Profile: cores=12 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4915m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4915m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4915m
mapreduce.task.io.sort.mb=2457
I made these settings via the Ambari interface and restarted the cluster. The values also roughly match what I calculated manually beforehand.
I now have problems
- finding the optimal settings for my spark-submit parameters --num-executors, --executor-cores & --executor-memory,
- understanding the relation between a YARN container and the Spark executors,
- understanding the hardware information in my Spark History UI (less memory is shown than I set, when scaled up to overall memory by multiplying by the number of worker nodes),
- understanding the concept of vcores in YARN (I couldn't find any useful examples for this yet).
However, I found the post What is a container in YARN?, but it didn't really help, as it doesn't describe the relation to the executors.
Can someone help to solve one or more of the questions?
I will report my insights here step by step:
First important thing is this fact (Source: this Cloudera documentation):
When running Spark on YARN, each Spark executor runs as a YARN container. [...]
This means the number of containers will always be the same as the number of executors created by a Spark application, e.g. via the --num-executors parameter of spark-submit.
Governed by yarn.scheduler.minimum-allocation-mb, every container always allocates at least this amount of memory. This means that if --executor-memory is set to, e.g., only 1g but yarn.scheduler.minimum-allocation-mb is, e.g., 6g, the container is much bigger than the Spark application needs.
The other way round, if --executor-memory is set to something higher than the yarn.scheduler.minimum-allocation-mb value, e.g. 12g, the container will allocate more memory, but only if the requested amount is smaller than or equal to the yarn.scheduler.maximum-allocation-mb value.
The value of yarn.nodemanager.resource.memory-mb determines how much memory can be allocated in total by all containers on one host!
=> So setting yarn.scheduler.minimum-allocation-mb to a small value allows you to run smaller containers, e.g. for smaller executors (otherwise it would be a waste of memory).
=> Setting yarn.scheduler.maximum-allocation-mb to the maximum value (e.g. equal to yarn.nodemanager.resource.memory-mb) allows you to define bigger executors (more memory is allocated if needed, e.g. via the --executor-memory parameter).
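Putting the numbers above together as a sketch (the class name and jar are placeholders): each slave offers yarn.nodemanager.resource.memory-mb = 49152 MB in 6144 MB containers, i.e. 8 containers per node and 40 across the 5 slaves. Requesting roughly 5g of executor memory leaves room for the memory overhead inside a 6144 MB container, and keeping one container free for the YARN ApplicationMaster gives about 39 executors as one possible starting point:

# class name and jar are placeholders; sizes derived from the 6144 MB
# containers and 49152 MB per-node limit computed by yarn-utils.py above
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 39 \
  --executor-cores 1 \
  --executor-memory 5g \
  --class com.example.MyApp \
  my-app.jar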
