My Spark job fails with the following error:
java.lang.IllegalArgumentException: Required executor memory (33792 MB), offHeap memory (0) MB, overhead (8192 MB), and PySpark memory (0 MB)
is above the max threshold (24576 MB) of this cluster!
Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
I have set executor memory to 33g and executor memory overhead to 8g. However, according to the error, the total must be less than or equal to 24g. Can someone help me understand what exactly the 24g refers to? Is it the RAM on the master node or something else? Why is it capped at 24g?
Once I figure it out, I can programmatically calculate my other values to not run into this issue again.
Setup: a make command that wraps multiple spark-submit commands, run from Jenkins, which launches them on an AWS EMR cluster running Spark 3.x.
This error happens because you're requesting more resources per container than are available on the cluster (see the org.apache.spark.deploy.yarn.Client source). For your case specifically (AWS EMR), I think you should check the value of yarn.nodemanager.resource.memory-mb as the message says (in yarn-site.xml or via the NodeManager Web UI), and not try to allocate more memory than this value per YARN container.
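As a rough sketch of the check that class performs (variable names below are illustrative, not Spark's actual code; the values are the ones from the error above):

    # Illustrative sketch of the validation done by org.apache.spark.deploy.yarn.Client.
    executor_memory_mb = 33792   # spark.executor.memory = 33g
    overhead_mb = 8192           # spark.executor.memoryOverhead = 8g
    max_allocation_mb = 24576    # yarn.scheduler.maximum-allocation-mb reported by YARN

    if executor_memory_mb + overhead_mb > max_allocation_mb:
        raise ValueError("Required executor memory is above the max threshold of this cluster")

So the 24g is neither the master node's RAM nor the cluster total; it is the largest single container YARN will hand out on one node, which EMR derives from the instance type.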
Related
We are running a Spark Streaming application on a Kubernetes cluster using Spark 2.4.5.
The application is receiving massive amounts of data through a Kafka topic (one message every 3 ms). 4 executors and 4 Kafka partitions are being used.
While running, the memory of the driver pod keeps increasing until it gets killed by K8s with an 'OOMKilled' status. The memory of the executors is not facing any issues.
When checking the driver pod resources using this command:
kubectl top pod podName
We can see that the memory increases until it reaches 1.4 GB, and then the pod gets killed.
However, when checking the storage memory of the driver in the Spark UI, we can see that the storage memory is not fully used (50.3 KB / 434 MB). Is there any difference between the storage memory of the driver and the memory of the pod containing the driver?
Has anyone had experience with a similar issue before?
Any help would be appreciated.
Here are a few more details about the app:
Kubernetes version: 1.18
Spark version: 2.4.5
Batch interval of the Spark Streaming context: 5 sec
Rate of input data: 1 Kafka message every 3 ms
Language: Scala
In brief, the Spark memory consists of three parts:
Reserved memory (300 MB)
User memory ((all - 300 MB) * 0.4), used for data processing logic.
Spark memory ((all - 300 MB) * 0.6, where 0.6 is spark.memory.fraction), used for caching and shuffle in Spark.
Besides this, there is also max(executor memory * 0.1, 384 MB) of extra memory for non-JVM usage in K8s (0.1 is spark.kubernetes.memoryOverheadFactor).
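As a rough sanity check against the numbers in the question (assuming the driver was left at a 1g heap, the default for spark.driver.memory):

    (1024 MB - 300 MB) * 0.6  ≈ 434 MB   -> matches the "434 MB" storage memory shown in the Spark UI
    max(1024 MB * 0.1, 384 MB) = 384 MB  -> non-JVM overhead added on top of the heap
    1024 MB + 384 MB ≈ 1.4 GB            -> matches the pod size at which the driver is OOMKilled

So yes, the storage memory in the Spark UI and the memory of the pod are different things: the pod limit covers the whole JVM heap plus overhead, and it can be exhausted even while the unified (storage) memory looks almost empty.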
Increasing the pod's memory limit by the memory overhead in K8s should fix the OOM.
You can also decrease spark.memory.fraction to allocate more RAM to user memory.
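For example, something along these lines in the Spark conf (values are illustrative, not a recommendation) raises the driver pod's limit to memory plus overhead, i.e. about 3 GB here:

    spark.driver.memory 2g
    spark.driver.memoryOverhead 1g
    spark.memory.fraction 0.4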
I'm working on Azure Databricks. My driver node and worker node specs are: 14.0 GB memory, 4 cores, 0.75 DBU (Standard_DS3_v2).
My pyspark notebook fails with a Java heap space error. I checked online and one suggestion was to increase the driver memory. I'm trying to use the following conf parameter in the notebook
spark.conf.get("spark.driver.memory")
to get the driver memory, but the notebook cell fails with this error:
java.util.NoSuchElementException: spark.driver.memory
Any idea how to check driver memory and change its value?
You can set the Spark config when you set up your cluster on Databricks. When you create a cluster and expand the "Advanced Options" menu, you will see a "Spark Config" section. In this field you can set the configurations you want.
For more information you can always check the documentation page of Azure Databricks.
This should increase the memory available to the cluster when you hit the limit.
It's the same answer as above, but as text instead of a screenshot:
spark.executor.memory 19g
spark.driver.memory 19g
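If you also want to read the value from inside a notebook: spark.driver.memory is a launch-time setting rather than a runtime SQL conf, which is why spark.conf.get throws NoSuchElementException. A small sketch of a workaround is to read it from the SparkContext's SparkConf, with a fallback in case it was never set explicitly:

    # Read the driver memory from the SparkContext's SparkConf instead of spark.conf.
    # The second argument is a default returned when the key has not been set.
    driver_mem = spark.sparkContext.getConf().get("spark.driver.memory", "not set")
    print(driver_mem)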
I have a cluster on EMR (emr-5.20.0) with an m5.2xlarge as the master node, two m4.large as core nodes and three m4.large as worker nodes. The total RAM of this cluster is 62 GB, but in the YARN UI the total memory displayed is 30 GB.
Can somebody help me understand how this value is calculated?
I have already checked the configuration in yarn-site.xml and spark-defaults.conf, and they are configured according to the AWS recommendation: https://docs.aws.amazon.com/pt_br/emr/latest/ReleaseGuide/emr-hadoop-task-config.html#emr-hadoop-task-config-m5
Any help is welcome.
The memory settings in YARN can be configured using the following cluster parameters:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.increment-allocation-mb
yarn.scheduler.maximum-allocation-mb
By tweaking these parameters you can increase or decrease the total memory allocated to the cluster.
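If you prefer to check the resulting totals programmatically rather than in the UI, the ResourceManager REST API exposes them; a small sketch (adjust the host to your master node):

    import requests

    # YARN ResourceManager cluster metrics (port 8088 by default on EMR).
    # 'totalMB' is the sum of yarn.nodemanager.resource.memory-mb over all NodeManagers.
    metrics = requests.get("http://master-public-dns-name:8088/ws/v1/cluster/metrics").json()
    print(metrics["clusterMetrics"]["totalMB"])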
YARN does not include the master node in its available memory/cores.
So you should get roughly 5 x 8 GB (m4.large). You will get less than that because some memory is set aside as overhead for the OS and services.
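Concretely, if I'm reading the AWS defaults page (the one linked in the question) correctly, an m4.large core/task node gets yarn.nodemanager.resource.memory-mb = 6144 MB, so:

    5 nodes x 6144 MB = 30720 MB ≈ 30 GB

which lines up with the 30 GB total shown in the YARN UI.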
I created a Spark cluster (for learning, so I did not create a high memory/CPU cluster) with 1 master node and 2 core nodes to run executors, using the config below:
Master: running 1 m4.large (2 cores, 8 GB)
Core: running 2 c4.large (2 cores, 3.5 GB)
Hive 2.1.1, Pig 0.16.0, Hue 3.11.0, Spark 2.1.0, Sqoop 1.4.6, HBase 1.3.0
When pyspark is run, I get the error below:
Required executor memory (1024+384 MB) is above the max threshold (896 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
Before trying to increase the yarn-site.xml config, I'm curious to understand why EMR is using just 896 MB as the limit when the master has 8 GB and each worker node has 3.5 GB.
Also, the ResourceManager UI (on the master: http://master-public-dns-name:8088/) is showing 1.75 GB, whereas the VM has 8 GB of memory. Are HBase or other services taking up too much memory?
If anyone has encountered a similar issue, please share your insight into why EMR sets such low defaults. Thanks!
Before trying to increase the yarn-site.xml config, I'm curious to understand
why EMR is using just 896 MB as the limit when the master has 8 GB and each worker
node has 3.5 GB.
If you run Spark jobs in YARN cluster mode (which you probably were using), the executors run on the core nodes and the master's memory is not used.
Now, although your core EC2 instance (c4.large) has 3.75 GB to use, EMR configures YARN not to use all of this memory for running YARN containers or Spark executors. This is because you have to leave enough memory for other permanent daemons (like HDFS's DataNode, YARN's NodeManager, EMR's own daemons, etc., depending on the applications you provision).
EMR publishes the default YARN configuration it sets for each instance type on this page: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html
c4.large
Configuration Option Default Value
mapreduce.map.java.opts -Xmx717m
mapreduce.map.memory.mb 896
yarn.scheduler.maximum-allocation-mb 1792
yarn.nodemanager.resource.memory-mb 1792
So, yarn.nodemanager.resource.memory-mb = 1792, which means 1792 MB is the physical memory that will be allocated to YARN containers on that core node, which has 3.75 GB of actual memory. Also check spark-defaults.conf, where EMR sets some defaults for Spark executor memory. These are defaults, and of course you can change them before starting the cluster using EMR's configurations API. But keep in mind that if you over-provision memory for YARN containers, you might starve some other processes.
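For example, to raise those limits you could pass a yarn-site override through the configurations API when creating the cluster, along these lines (values purely illustrative; leave room for the daemons mentioned above):

    yarn.nodemanager.resource.memory-mb 2560
    yarn.scheduler.maximum-allocation-mb 2560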
Given that, it is important to understand the YARN configs and how Spark interacts with YARN:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://spark.apache.org/docs/latest/running-on-yarn.html
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
It's not really a property of EMR but rather of YARN, which is the resource manager running on EMR.
My personal take on YARN is that it is really built for managing long-running clusters that continuously take in a variety of jobs and have to run them simultaneously. In those cases it makes sense for YARN to assign only a small part of the available memory to each job.
Unfortunately, when it comes to single-purpose clusters (as in: "I will just spin up a cluster, run my job and terminate the cluster again"), these YARN defaults are simply annoying, and you have to configure a bunch of stuff to make YARN utilise your resources optimally. But running on EMR, it's what we are stuck with these days, so one has to live with that...
I'm trying to benchmark a program on an Azure cluster using Spark. We previously ran this on EC2 and know that 150 GB of RAM is sufficient. I have tried multiple setups for the executors and given them 160-180 GB of RAM, but regardless of what I do, the program dies because the executors request more memory.
What can I do? Are there more launch options I should consider? I have tried every conceivable executor setup and nothing seems to work. I'm at a total loss.
In your command, you specified 7 executors, each with 40g of memory. That's 280 GB of memory in total, but you said your cluster has only 160-180 GB of memory? If only 150 GB of memory is needed, why is the spark-submit configured that way?
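If roughly 150 GB in total was enough on EC2, one illustrative way to stay inside a ~160 GB cluster (the numbers are just an example, not a recommendation) would be something like:

    spark-submit --num-executors 7 --executor-memory 19g --conf spark.executor.memoryOverhead=2g ...

which requests about 7 x 21 GB ≈ 147 GB including overhead.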
What is your HDI cluster node type and how many nodes did you create?
Were you using YARN on EC2 as well? If so, is the configuration the same?