I am having a problem with this error:
running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container.
My cluster is 4x c3.4xlarge (datanodes) and an m3.2xlarge (namenode), yet with my configuration I have only 1.4 GB available.
To resolve this I read this site https://www.knowru.com/blog/first-3-frustrations-you-will-encounter-when-migrating-spark-applications-aws-emr/ and other sites; the suggestion is to change yarn-site.xml and add the yarn.nodemanager.vmem-check-enabled setting.
But when I change this config, save it, and restart the ResourceManager on EMR, the change is not applied on the configuration page (EMR namenode:8088/conf) and does not work, although settings created by default by EMR do accept changes.
How can I change my configuration while my EMR cluster is running?
I've read that this setting can only be configured at cluster creation time. Is that really the case?
How can I work around this?
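For what it's worth, the most reliable path I know of is to pass the override at cluster creation through EMR's configurations JSON rather than editing yarn-site.xml on a running node. A minimal sketch, assuming you follow the linked post and disable the virtual-memory check; the file name is just an example:
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.vmem-check-enabled": "false"
    }
  }
]
Saved as, say, yarn-config.json, this can be passed with aws emr create-cluster --configurations file://yarn-config.json when launching the cluster.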
I was getting this error (running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container) because my Spark driver was starting with the default configuration. I added --driver-memory 5g to my spark-submit command and that solved my problem.
That was all it took in my case.
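For reference, a minimal sketch of the kind of submit command that fixed it; the class and jar names here are placeholders, not from the original job:
spark-submit --driver-memory 5g --class com.example.MyJob my-job.jar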
My Spark job fails with the following error:
Diagnostics: Container [pid=7277,containerID=container_1528934459854_1736_02_000001] is running beyond physical memory limits. Current usage: 1.4 GB of 1.4 GB physical memory used; 3.1 GB of 6.9 GB virtual memory used. Killing container.
Your containers are getting killed. This happens when the YARN memory available is less than what the task requires, so the possible solution is to increase the YARN memory.
You have two choices:
Either increase the memory available to your existing NodeManager (see the sketch after this list),
Or assign a new NodeManager on one more datanode.
Either way this increases the YARN memory; make sure it ends up at least around 2 GB.
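For the first option, a minimal yarn-site.xml sketch on the NodeManager host could look like this; 2048 simply reflects the "around 2 GB" suggestion above, so size it to your instance and restart the NodeManager afterwards:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>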
I created a Spark cluster (I'm learning, so I did not create a high memory/CPU cluster) with 1 master node and 2 core nodes to run executors, using the config below:
Master: 1x m4.large (2 cores, 8 GB)
Core: 2x c4.large (2 cores, 3.5 GB)
Hive 2.1.1, Pig 0.16.0, Hue 3.11.0, Spark 2.1.0, Sqoop 1.4.6, HBase 1.3.0
When pyspark is run, I get the error below:
Required executor memory (1024+384 MB) is above the max threshold (896 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
Before trying to increase the yarn-site.xml config, I'm curious to understand why EMR takes just 896 MB as the limit when the master has 8 GB and the worker nodes have 3.5 GB each.
Also, the ResourceManager UI (for the master, http://master-public-dns-name:8088/) shows 1.75 GB, whereas the VM has 8 GB of memory. Is HBase or other software taking up too much memory?
If anyone has encountered a similar issue, please share your insight into why EMR sets such low defaults. Thanks!
Before trying to increase yarn-site.xml config, curious to understand why EMR is taking just 896MB as limit when master has 8GB and worker node has 3.5GB each.
If you run Spark jobs in YARN cluster mode (which you probably were), the executors run on the core nodes and the master's memory is not used.
Now, although your core EC2 instance (c4.large) has 3.75 GB to use, EMR configures YARN not to use all of that memory for running YARN containers or Spark executors. This is because you have to leave enough memory for the other permanent daemons (HDFS's DataNode, YARN's NodeManager, EMR's own daemons, and so on, depending on the applications you provision).
EMR publishes the default YARN configuration it sets for every instance type on this page: http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html
c4.large
Configuration Option Default Value
mapreduce.map.java.opts -Xmx717m
mapreduce.map.memory.mb 896
yarn.scheduler.maximum-allocation-mb 1792
yarn.nodemanager.resource.memory-mb 1792
So yarn.nodemanager.resource.memory-mb = 1792, which means 1792 MB is the physical memory that will be allocated to YARN containers on that core node, even though it has 3.75 GB of actual memory. Also check spark-defaults.conf, where EMR sets some defaults for Spark executor memory. These are only defaults, and of course you can change them before starting the cluster using EMR's configurations API. But keep in mind that if you over-provision memory for YARN containers, you might starve some other processes.
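As a rough illustration of that configurations API path (the values below are assumptions for a c4.large, not EMR defaults), a JSON file like the following can be passed via aws emr create-cluster --configurations file://myConfig.json to raise the YARN limits and the executor memory together:
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.resource.memory-mb": "2760",
      "yarn.scheduler.maximum-allocation-mb": "2760"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "2g"
    }
  }
]
Just remember the caveat above: whatever you hand to YARN containers is taken away from the other daemons on the node.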
Given that, it is important to understand YARN configs and how Spark interacts with YARN:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://spark.apache.org/docs/latest/running-on-yarn.html
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
It's not really a property of EMR but rather of YARN, which is the resource manager running on EMR.
My personal take on YARN is that it is really built for managing long-running clusters that continuously take in a variety of jobs which they have to run simultaneously. In those cases it makes sense for YARN to assign only a small part of the available memory to each job.
Unfortunately, when it comes to single-purpose clusters (as in "I will just spin up a cluster, run my job, and terminate it again"), these YARN defaults are simply annoying, and you have to configure a bunch of things to make YARN use your resources optimally. But when running on EMR, that is what we are stuck with these days, so one has to live with it...
I'm running a Spark Streaming application on YARN. It works well for several days, and after that I encountered a problem; the error message from YARN is listed below:
Application application_1449727361299_0049 failed 2 times due to AM Container for appattempt_1449727361299_0049_000002 exited with exitCode: -104
For more detailed output, check the application tracking page: https://sccsparkdev03:26001/cluster/app/application_1449727361299_0049 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=25317,containerID=container_1449727361299_0049_02_000001] is running beyond physical memory limits. Current usage: 3.5 GB of 3.5 GB physical memory used; 5.3 GB of 8.8 GB virtual memory used. Killing container.
And here is my memory configuration:
spark.driver.memory = 3g
spark.executor.memory = 3g
mapred.child.java.opts -Xms1024M -Xmx3584M
mapreduce.map.java.opts -Xmx2048M
mapreduce.map.memory.mb 4096
mapreduce.reduce.java.opts -Xmx3276M
mapreduce.reduce.memory.mb 4096
This OOM error is strange, because I don't keep any data in memory since it's a streaming program. Has anyone encountered the same issue? Or does anyone know what causes it?
Check the memory on the box/VM instance you're running it on. My guess is that the host machine is redlining it...
...due to, it appears, over-allocating memory.
Where do you think the streaming gets executed, regardless of whether you store anything there? Yup: in memory. Not cats or dancing Vikings either.
Guess what? You're allocating 7 GB of memory that is heavily weighted towards physical over virtual memory.
Check your logging as well, as that would have a similar build-up over time.
What's your spark.yarn.am.memory value?
Get your VM and container memory allocation in balance :)
Another thought is to adjust memoryOverhead so that physical and virtual memory become more proportional.
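To make the memoryOverhead suggestion concrete, here is a sketch of what could go into spark-defaults.conf for a Spark 1.x job; the values are illustrative guesses, and spark.yarn.am.memory only matters in yarn-client mode (in yarn-cluster mode the driver is the AM):
spark.yarn.am.memory               1g
spark.yarn.driver.memoryOverhead   1024
spark.yarn.executor.memoryOverhead 1024
In later Spark releases these overhead settings were renamed to spark.driver.memoryOverhead and spark.executor.memoryOverhead.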
I've installed Apache Spark 1.5.2 (for Hadoop 2.6+). My cluster consists of the following hardware:
Master: 12 CPU Cores & 128 GB RAM
Slave1: 12 CPU Cores & 64 GB RAM
Slave2: 6 CPU Cores & 64 GB RAM
At the moment my slaves file has these two entries:
slave1_ip
slave2_ip
Because my master machine also has very "strong" hardware, it wouldn't be used to capacity by the master threads alone. So I wanted to ask whether it is possible to provide some of the CPU cores and RAM of the master machine to a third worker instance...? Thank you!
FIRST ATTEMPT TO SOLVE THE PROBLEM
Following Jacek Laskowski's answer, I applied the following settings:
spark-defaults.conf (only on Master machine):
spark.driver.cores=2
spark.driver.memory=4g
spark-env.sh (on Master):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=120g
spark-env.sh (on Slave1):
SPARK_WORKER_CORES=12
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave2):
SPARK_WORKER_CORES=6
SPARK_WORKER_MEMORY=60g
I also added the master's IP address to the slaves file.
The cluster now consists of 3 worker nodes (slaves + master), which is perfect.
BUT: the web UI shows only 1024 MB of RAM per node, see screenshot:
Can someone tell me how to fix this? Setting spark.executor.memory would assign the same amount of RAM to every machine, which wouldn't be optimal if I want to use as much RAM as possible...! What am I doing wrong? Thank you!
It's possible. Just limit the number of cores and memory used by the master and run one or more workers on the machine.
Use conf/spark-defaults.conf where you can set up spark.driver.memory and spark.driver.cores. Consult Spark Configuration.
You should however use conf/spark-env.sh to set up more than one instance per node using SPARK_WORKER_INSTANCES. Include the other settings as follows:
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
You may also want to set the amount of RAM for executors (per worker) using spark.executor.memory or SPARK_EXECUTOR_MEMORY (as depicted in the following screenshot).
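A minimal sketch of that last point, e.g. in conf/spark-defaults.conf (50g is just an illustration that fits under the 60g workers above):
spark.executor.memory   50g
Equivalently, you can pass --executor-memory 50g to spark-submit per application.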
With the Spark standalone cluster manager you should keep the conf files, such as spark-env.sh, identical on the master and on every worker; otherwise the configurations don't match and the worker falls back to its default memory of 1g.
spark-defaults.conf (only on Master machine):
spark.driver.cores=2
spark.driver.memory=4g
spark-env.sh (on Master)
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave1):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
spark-env.sh (on Slave2):
SPARK_WORKER_CORES=10
SPARK_WORKER_MEMORY=60g
and in the slaves file (conf/slaves) on each machine list the following:
masterip
slave1ip
slave2ip
After the above configuration you have 3 workers, one on the master machine and two on the other nodes, and your driver also runs on the master machine.
But be careful: you are assigning a lot of memory and cores in this configuration, and if your machines are small the resource manager won't be able to allocate those resources.
I know this is a very old post, but why wouldn't you set the property spark.executor.memory in spark-defaults.conf (or pass --executor-memory)?
Note that this value is 1024 MB by default, and that is what you seem to be encountering.
The thing is that spark.executor.memory is defined at the application level and not at the node level, so there doesn't seem to be a way to start the executors with different cores/memory on different nodes.
I have created a Spark cluster of 8 machines. Each machine has 104 GB of RAM and 16 virtual cores.
It seems that Spark only sees 42 GB of RAM per machine, which is not correct. Do you know why Spark does not see all of the machines' RAM?
PS : I am using Apache Spark 1.2
This seems like a common misconception. What is displayed is the spark.storage.memoryFraction:
https://stackoverflow.com/a/28363743/4278362
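In other words, under the Spark 1.x defaults (if I recall them correctly, spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9), the storage memory shown in the UI works out to roughly:
displayed storage memory ≈ spark.executor.memory × 0.6 × 0.9 ≈ 0.54 × spark.executor.memory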
Spark makes no attempt at guessing the available memory. Executors use as much memory as you specify with the spark.executor.memory setting. Looks like it's set to 42 GB.