Spark Job Fails on YARN with Memory Error - apache-spark

My Spark job fails with the following error:
Diagnostics: Container [pid=7277,containerID=container_1528934459854_1736_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.1 GB of 6.9 GB virtual memory used. Killing container.

Your containers are getting killed. This happens when the memory YARN gives a container is less than the task requires, so the possible solution is to increase the memory available to YARN.
You have two choices:
Either increase the memory size of your current NodeManager,
or assign a new NodeManager on one more DataNode.
Either way, this increases the memory available to YARN; make sure it is at least around 2 GB.
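The 1.4 GB in the diagnostic can be reproduced with a little arithmetic. A minimal sketch, assuming the killed container is the driver/ApplicationMaster (its ID ends in _000001, the first container of the attempt) running with the default 1g driver heap plus Spark's default overhead of max(10% of the heap, 384 MB); the helper name container_mb is hypothetical:

```python
from typing import Optional

MIN_OVERHEAD_MB = 384   # Spark's minimum memory overhead per container
OVERHEAD_FACTOR = 0.10  # default overhead as a fraction of the JVM heap


def container_mb(heap_mb: int, overhead_mb: Optional[int] = None) -> int:
    """Physical memory requested from YARN: JVM heap plus off-heap overhead."""
    if overhead_mb is None:
        overhead_mb = max(int(OVERHEAD_FACTOR * heap_mb), MIN_OVERHEAD_MB)
    return heap_mb + overhead_mb


default_driver = container_mb(1024)  # spark.driver.memory defaults to 1g
print(f"{default_driver} MB = {default_driver / 1024:.1f} GB")

# The virtual limit in the message (6.9 GB) is about 5x this physical limit,
# which suggests a yarn.nodemanager.vmem-pmem-ratio of roughly 5 on this cluster.
print(f"implied vmem ratio ~ {6.9 * 1024 / default_driver:.1f}")
```

Under these assumptions, raising the driver heap raises the container limit with it: container_mb(2048) is 2432 MB.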

Related

Spark Insufficient Memory

My Spark job fails with the following error:
java.lang.IllegalArgumentException: Required executor memory (33792 MB), offHeap memory (0) MB, overhead (8192 MB), and PySpark memory (0 MB)
is above the max threshold (24576 MB) of this cluster!
Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
I have defined executor memory to be 33g and executor memory overhead to be 8g. However, according to the error log, the total should be less than or equal to 24g. Can someone help me understand what exactly 24g refers to? Is it the RAM on the master node or something else? Why is it capped at 24g?
Once I figure it out, I can programmatically calculate my other values to not run into this issue again.
Setup: a make command containing multiple spark-submit commands runs on Jenkins and launches them on an AWS EMR cluster running Spark 3.x.
This error happens because you're requesting more resources than are available on the cluster (see the org.apache.spark.deploy.yarn.Client source). For your case specifically (AWS EMR), I think you should check the values of yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb, as the message says (in yarn-site.xml or via the NodeManager Web UI), and not request more memory than this per YARN container.
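The check behind this message adds up the four terms it prints and compares the sum with the largest container the cluster will grant (24576 MB here, i.e. 24g). Since you want to compute your values programmatically, here is a minimal sketch of that arithmetic. It assumes the default overhead rule of max(10% of the heap, 384 MB) whenever no explicit overhead is given; the function names are hypothetical, not part of Spark:

```python
from typing import Optional

MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def default_overhead_mb(heap_mb: int) -> int:
    # Default used when spark.executor.memoryOverhead is not set explicitly
    return max(int(OVERHEAD_FACTOR * heap_mb), MIN_OVERHEAD_MB)


def required_mb(executor_mb: int, overhead_mb: int,
                offheap_mb: int = 0, pyspark_mb: int = 0) -> int:
    # The same four terms the error message lists
    return executor_mb + offheap_mb + overhead_mb + pyspark_mb


def max_executor_mb(threshold_mb: int, overhead_mb: Optional[int] = None,
                    offheap_mb: int = 0, pyspark_mb: int = 0) -> int:
    """Largest executor heap whose total request still fits under threshold_mb."""
    budget = threshold_mb - offheap_mb - pyspark_mb
    if overhead_mb is not None:
        return budget - overhead_mb
    heap = budget
    # Step down until heap + default overhead fits (the sum only grows with heap)
    while heap > 0 and heap + default_overhead_mb(heap) > budget:
        heap -= 1
    return heap


threshold = 24576                          # "max threshold" from the error
asked = required_mb(33 * 1024, 8 * 1024)   # 33g executor + 8g overhead
print(asked, asked <= threshold)
print(max_executor_mb(threshold, overhead_mb=8 * 1024))
print(max_executor_mb(threshold))
```

With these numbers the request is 41984 MB; keeping the 8g overhead, the executor heap would have to drop to 16384 MB (16g), while with the default overhead it could go up to 22342 MB.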

Hadoop YARN Cluster / Spark and RAM Disks

Because my computational tasks require fast disk I/O, I am interested in mounting large RAM disks on each worker node in a YARN cluster that runs Spark, and am thus wondering how the YARN cluster manager handles the memory occupied by such a RAM disk.
If I were to allocate 32 GB to a RAM disk on each 128 GB machine, for example, would the YARN cluster manager know how to allocate RAM so as to avoid over-allocating memory when performing tasks (in this case, does YARN allocate all 128 GB of RAM to the requisitioned tasks, or at most only 96 GB)?
If so, is there any way to indicate to the YARN cluster manager that a RAM disk is present, so that a specific portion of the RAM is off limits to YARN? Will Spark be aware of these constraints as well?
In the Spark configuration you can set driver and executor settings such as cores and memory. Moreover, when you use YARN as the resource manager, it supports some extra settings that can help you manage cluster resources better, such as "spark.driver.memoryOverhead" or "spark.yarn.am.memoryOverhead": the amount of off-heap memory per container, whose default value is
AM memory * 0.10, with a minimum of 384 (MB).
For further information, here is the link.
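As far as I know, YARN does not look at how the machine's RAM is actually used: each NodeManager advertises whatever yarn.nodemanager.resource.memory-mb says, and containers (Spark executors and the driver/AM, each including its memoryOverhead) are packed against that number. So one way to keep a RAM disk off limits is to set that value below what the RAM disk leaves over. A sketch of the arithmetic for the 128 GB / 32 GB example; the 8 GB operating-system reserve and the 16g executor size are hypothetical figures, and the function names are illustrative only:

```python
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def nodemanager_memory_mb(total_gb: int, ramdisk_gb: int, os_reserve_gb: int) -> int:
    """Memory to advertise to YARN after carving out the RAM disk and an OS reserve."""
    return (total_gb - ramdisk_gb - os_reserve_gb) * 1024


def executors_per_node(nm_memory_mb: int, executor_heap_mb: int) -> int:
    # Each executor container needs its heap plus the default overhead
    overhead = max(int(OVERHEAD_FACTOR * executor_heap_mb), MIN_OVERHEAD_MB)
    return nm_memory_mb // (executor_heap_mb + overhead)


nm_mb = nodemanager_memory_mb(total_gb=128, ramdisk_gb=32, os_reserve_gb=8)
print("yarn.nodemanager.resource.memory-mb =", nm_mb)
print("16g executors per node:", executors_per_node(nm_mb, 16 * 1024))
```

Spark never sees the RAM disk directly in this setup; it only sees the container sizes YARN is willing to grant.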

How to insert configuration in yarn-site.xml in EMR cluster

I have a problem with:
running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container.
My cluster is 4x c3.4xlarge (DataNodes) and an m3.2xlarge (NameNode); with my configuration, I have only 1.4 GB available.
To resolve this, I read on this site https://www.knowru.com/blog/first-3-frustrations-you-will-encounter-when-migrating-spark-applications-aws-emr/ and other sites that the fix is to change yarn-site.xml and add the setting yarn.nodemanager.vmem-check-enabled.
But when I change this setting, save, and restart the ResourceManager in EMR, the setting is not applied on the configuration page (EMR namenode:8088/conf) and does not take effect, whereas the settings EMR creates by default do accept changes.
How can I change my configuration while my EMR cluster is running?
I've read that this setting can only be configured at cluster creation; is that really true?
How can I work around this?
I was getting this error (running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container.) because my Spark driver was starting with the default configuration. I added --driver-memory 5g to the spark-submit command for my jar, and that solved the problem.
That was all it took in my case.
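For the yarn-site.xml part of the question: on EMR, Hadoop and Spark settings are normally supplied as configuration classifications (a JSON list of Classification/Properties objects) rather than by editing the XML on the master node. A minimal sketch that builds such a list, assuming you want both the vmem-check change from the question and the driver-memory fix from this answer; the file name configurations.json used below is just an example:

```python
import json

configurations = [
    {
        # Properties in this classification end up in yarn-site.xml
        "Classification": "yarn-site",
        "Properties": {"yarn.nodemanager.vmem-check-enabled": "false"},
    },
    {
        # Same effect as passing --driver-memory 5g to spark-submit
        "Classification": "spark-defaults",
        "Properties": {"spark.driver.memory": "5g"},
    },
]

config_json = json.dumps(configurations, indent=2)
print(config_json)
```

Saved to a file, this list can be passed at cluster creation, for example with aws emr create-cluster ... --configurations file://configurations.json.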

Spark worker dies after running for some duration

I am running a Spark Streaming job.
My cluster config:
Spark version - 1.6.1
Spark node config:
cores - 4
memory - 6.8 G (out of 8G)
number of nodes - 3
For my job I am giving 6 GB of memory per node and 3 cores in total.
After the job has been running for an hour, I get the following error in the worker log:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f53b496a000, 262144, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 262144 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/spark/sbin/hs_err_pid1622.log
However, I don't see any errors in my work-dir/app-id/stderr.
What Xm* settings (-Xms/-Xmx) are usually recommended for running a Spark worker?
How can I debug this issue further?
PS: I started my worker and master with the default settings.
Update:
I see my executors are getting added and removed frequently because of the error "cannot allocate memory".
log:
16/06/24 12:53:47 INFO MemoryStore: Block broadcast_53 stored as values in memory (estimated size 14.3 KB, free 440.8 MB)
16/06/24 12:53:47 INFO BlockManager: Found block rdd_145_1 locally
16/06/24 12:53:47 INFO BlockManager: Found block rdd_145_0 locally
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f3440743000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
I ran into the same situation. I found the reason in the official documentation, which says:
In general, Spark can run well with anywhere from 8 GB to hundreds of gigabytes of memory per machine. In all cases, we recommend allocating only at most 75% of the memory for Spark; leave the rest for the operating system and buffer cache.
Your machine has 8 GB of memory and 6 GB of it is given to the worker node. So if the operating system uses more than 2 GB, not enough memory is left for the worker node, and the worker will be lost.
Just check how much memory the operating system will use, and allocate the rest to the worker node.
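Applied to the numbers in the question, the 75% guideline works out as follows. A small sketch; the helper name is hypothetical, and for a standalone worker the resulting figure is what you would put in SPARK_WORKER_MEMORY in conf/spark-env.sh:

```python
def spark_memory_budget_gb(machine_gb: float, spark_fraction: float = 0.75) -> float:
    """At most this much of a machine's RAM should go to Spark, per the 75% guideline."""
    return machine_gb * spark_fraction


machine_gb = 8.0     # RAM per node in the question
configured_gb = 6.8  # memory given to the Spark worker in the question
budget_gb = spark_memory_budget_gb(machine_gb)
print(f"budget {budget_gb:.1f} GB, configured {configured_gb} GB, "
      f"over by {configured_gb - budget_gb:.1f} GB")
print(f"SPARK_WORKER_MEMORY={int(budget_gb)}g")
```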

Spark streaming on yarn - Container running beyond physical memory limits

I'm running a Spark Streaming application on YARN. It worked well for several days, and then I encountered a problem; the error message from YARN is listed below:
Application application_1449727361299_0049 failed 2 times due to AM Container for appattempt_1449727361299_0049_000002 exited with exitCode: -104
For more detailed output, check application tracking page: https://sccsparkdev03:26001/cluster/app/application_1449727361299_0049 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=25317,containerID=container_1449727361299_0049_02_000001] is running beyond physical memory limits. Current usage: 3.5 GB of 3.5 GB physical memory used; 5.3 GB of 8.8 GB virtual memory used. Killing container.
And here is my memory configuration:
spark.driver.memory = 3g
spark.executor.memory = 3g
mapred.child.java.opts -Xms1024M -Xmx3584M
mapreduce.map.java.opts -Xmx2048M
mapreduce.map.memory.mb 4096
mapreduce.reduce.java.opts -Xmx3276M
mapreduce.reduce.memory.mb 4096
This OOM error is strange because I don't keep any data in memory, since it's a streaming program. Has anyone encountered the same problem? Or does anyone know what causes it?
Check the memory on the box/VM instance you're running it on. My guess is the host machine is redlining it.
...due to, it appears, over-allocating memory.
Where do you think the streaming gets executed? Regardless of whether you store anything there? Yup. memory. Not cats or dancing Viking either (add "e").
Guess what? You're allocating 7 GB of memory that is heavily weighted towards physical over virtual mem.
Check your logging too, as it would show a similar build-up over time.
What's your spark.yarn.am.memory value?
Get your VM and container memory allocation in balance :)
Another thought is to adjust memoryOverhead so that physical and virtual memory are more proportional.
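The same arithmetic can account for the 3.5 GB here. The container ID ends in _000001, so it is the ApplicationMaster; in cluster mode the AM hosts the driver and is sized by spark.driver.memory (spark.yarn.am.memory only applies in client mode). A sketch, assuming the default overhead of max(10% of heap, 384 MB) and, as a guess about this cluster, a scheduler that rounds requests up to multiples of 512 MB; the helper names are hypothetical:

```python
import math
from typing import Optional


def container_mb(heap_mb: int, overhead_mb: Optional[int] = None) -> int:
    # Heap plus off-heap overhead (default: max(10% of heap, 384 MB))
    if overhead_mb is None:
        overhead_mb = max(int(0.10 * heap_mb), 384)
    return heap_mb + overhead_mb


def round_up(mb: int, increment_mb: int) -> int:
    # ASSUMPTION: the scheduler normalises requests to multiples of increment_mb
    return math.ceil(mb / increment_mb) * increment_mb


requested = container_mb(3 * 1024)   # spark.driver.memory = 3g
granted = round_up(requested, 512)
print(requested, granted, granted / 1024)

# Setting spark.driver.memoryOverhead to 1g enlarges the container limit
print(round_up(container_mb(3 * 1024, overhead_mb=1024), 512))
```

So raising either spark.driver.memory or spark.driver.memoryOverhead raises the physical limit YARN enforces on this container.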
