A heap dump was generated for the process, but the process did not crash. What could be the reason for the heap dump generation, and how can we identify it?
Below are the JVM arguments.
-XX:CompressedClassSpaceSize=528482304 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=1977595776 -XX:MaxHeapSize=4294967296 -XX:MaxMetaspaceSize=536870912 -XX:MetaspaceSize=268435456 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:-UsePerfData -XX:+UseStringDeduplication
We need to identify why the heap dump was generated without an OOM.
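One possibility worth checking: -XX:+HeapDumpOnOutOfMemoryError writes the dump when the first OutOfMemoryError is thrown, and a thrown OutOfMemoryError does not necessarily terminate the JVM (the affected thread may die or the error may be caught while other threads keep running), so the process can survive the very event that produced the dump. Alternatively, an operator or a monitoring tool may have taken the dump on demand. A minimal investigation sketch, assuming standard JDK tools are on the box; file names and <pid> are placeholders:

# When was the dump written, and does anything in the application/GC logs line up with that timestamp?
ls -l --time-style=full-iso java_pid*.hprof
grep -i "OutOfMemoryError" application.log
# A heap dump can also be produced without any OOM at all, e.g. on demand via:
jcmd <pid> GC.heap_dump /tmp/manual.hprof
jmap -dump:live,format=b,file=/tmp/manual.hprof <pid>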
Related
I am using CentOS Linux release 7.9.2009 (Core), and this is the GC log file configuration:
JAVA_OPTS_GC="-XX:+UseG1GC -XX:+DisableExplicitGC -Xmx8192m -Xms8192m -Xss512k -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=5 -XX:+UnlockCommercialFeatures -verbose:gc -XX:+PrintGCCause -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=3 -XX:GCLogFileSize=50M -Xloggc:absolute_path"
I am passing the above to the java -jar command; the JDK version used is 8. Please share your thoughts on this.
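One quick sanity check before wiring this into the java -jar command is a dry run on the exact JDK 8 build that will be used; note that -XX:+UnlockCommercialFeatures is an Oracle JDK flag and may be rejected as an unrecognized option on OpenJDK builds. A minimal sketch (the absolute_path placeholder is left as-is):

# The JVM parses every flag and exits immediately, so unsupported options fail fast.
java $JAVA_OPTS_GC -version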
I am trying to use the G1GC garbage collector for a Spark job, but I get the following error:
Error: Invalid argument to --conf: -XX:+UseG1GC
I tried these options but haven't been able to get it working:
spark-submit --master spark://192.168.60.20:7077 --conf -XX:+UseG1GC /appdata/bblite-codebase/test.py
and
spark-submit --master spark://192.168.60.20:7077 -XX:+UseG1GC /appdata/bblite-codebase/test.py
What is the correct way to call a G1GC collector from spark?
JVM options should be passed via spark.executor.extraJavaOptions / spark.driver.extraJavaOptions, i.e.
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC"
This is how you can configure the garbage collection settings for both the driver and the executors:
spark-submit --master spark://192.168.60.20:7077 \
--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
/appdata/bblite-codebase/test.py
Starting with Spark 2.4.3, this will not work for the driver extraJavaOptions; it will produce the error
Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed
This is because the default spark-defaults.conf includes
spark.executor.defaultJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70
spark.driver.defaultJavaOptions -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled
which already includes a GC setting, and setting two GC options causes it to complain with this error. So you may need:
--conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC"
--conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC"
and also add any other defaults you'd like to propagate over.
Alternatively, you can edit the defaults in spark-defaults.conf to remove the GC defaults for driver/executor and force it to be specified in extraJavaOptions, depending on your use cases.
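For example, a sketch of trimmed spark-defaults.conf entries that keep the non-GC defaults quoted above but drop the collector flags, so the GC choice lives entirely in extraJavaOptions (adjust to whichever defaults you actually want to keep):

spark.executor.defaultJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.defaultJavaOptions    -XX:OnOutOfMemoryError='kill -9 %p'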
My Spark application reads 3 files of 7 MB, 40 MB, and 100 MB, performs many transformations, and stores the output in multiple directories.
Spark version: CDH 1.5
MASTER_URL=yarn-cluster
NUM_EXECUTORS=15
EXECUTOR_MEMORY=4G
EXECUTOR_CORES=6
DRIVER_MEMORY=3G
My Spark job runs for some time, then throws the below error message and restarts from the beginning.
18/03/27 18:59:44 INFO avro.AvroRelation: using snappy for Avro output
18/03/27 18:59:47 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
18/03/27 18:59:47 INFO CuratorFrameworkSingleton: Closing ZooKeeper client.
Once it restarted, it ran for some time and then failed with this error:
Application application_1521733534016_7233 failed 2 times due to AM Container for appattempt_1521733534016_7233_000002 exited with exitCode: -104
For more detailed output, check application tracking page:http://entline.com:8088/proxy/application_1521733534016_7233/Then, click on links to logs of each attempt.
Diagnostics: Container [pid=52716,containerID=container_e98_1521733534016_7233_02_000001] is running beyond physical memory limits. Current usage: 3.5 GB of 3.5 GB physical memory used; 4.3 GB of 7.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_e98_1521733534016_7233_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 52720 52716 52716 52716 (java) 89736 8182 4495249408 923677 /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx3072m -Djava.io.tmpdir=/apps/hadoop/data04/yarn/nm/usercache/bdbuild/appcache/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class com.sky.ids.dovetail.asrun.etl.DovetailAsRunETLMain --jar file:/apps/projects/dovetail_asrun_etl/jars/EntLine-1.0-SNAPSHOT-jar-with-dependencies.jar --arg --app.conf.path --arg application.conf --arg --run_type --arg AUTO --arg --bus_date --arg 2018-03-27 --arg --code_base_id --arg EntLine-1.0-SNAPSHOT --executor-memory 4096m --executor-cores 6 --properties-file /apps/hadoop/data04/yarn/nm/usercache/bdbuild/appcache/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/__spark_conf__/__spark_conf__.properties
|- 52716 52714 52716 52716 (bash) 2 0 108998656 389 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/../../../CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native: /usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx3072m -Djava.io.tmpdir=/apps/hadoop/data04/yarn/nm/usercache/bdbuild/appcache/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/tmp -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001 -XX:MaxPermSize=256m org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.sky.ids.dovetail.asrun.etl.DovetailAsRunETLMain' --jar file:/apps/projects/dovetail_asrun_etl/jars/EntLine-1.0-SNAPSHOT-jar-with-dependencies.jar --arg '--app.conf.path' --arg 'application.conf' --arg '--run_type' --arg 'AUTO' --arg '--bus_date' --arg '2018-03-27' --arg '--code_base_id' --arg 'EntLine-1.0-SNAPSHOT' --executor-memory 4096m --executor-cores 6 --properties-file /apps/hadoop/data04/yarn/nm/usercache/bdbuild/appcache/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/__spark_conf__/__spark_conf__.properties 1> /var/log/hadoop-yarn/container/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/stdout 2> /var/log/hadoop-yarn/container/application_1521733534016_7233/container_e98_1521733534016_7233_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
As per my CDH configuration:
Container Memory[Amount of physical memory, in MiB, that can be allocated for containers]
yarn.nodemanager.resource.memory-mb 50655 MiB
Please see the containers running on my driver node.
Why are there so many containers running on one node?
I know that container_e98_1521733534016_7880_02_000001 is for my Spark application. I don't know about the other containers; any idea about those?
Also, I see that the physical memory for container_e98_1521733534016_7880_02_000001 is 3584 MB, which is 3.5 GB.
What does this error mean? When does it usually occur?
What is 3.5 GB of 3.5 GB physical memory? Is it driver memory?
Could someone help me with this issue?
container_e98_1521733534016_7233_02_000001 is the first container started, and given MASTER_URL=yarn-cluster it is not only the ApplicationMaster but also the driver of the Spark application.
It appears that the memory setting for the driver, i.e. DRIVER_MEMORY=3G, is too low and you have to bump it up.
Spark on YARN runs two executors by default (see --num-executors) and so you'll end up with 3 YARN containers with 000001 for the ApplicationMaster (perhaps with the driver) and 000002 and 000003 for the two executors.
What is 3.5 GB of 3.5 GB physical memory? Is it driver memory?
Since you use yarn-cluster, the driver, the ApplicationMaster and container_e98_1521733534016_7233_02_000001 are all the same and live in the same JVM. That means the error is about how much memory you assigned to the driver.
My understanding is that you gave DRIVER_MEMORY=3G, which happened to be too little for your processing, and once YARN figured that out it killed the driver (and hence the entire Spark application, as it is not possible to have a Spark application up and running without the driver).
See the document Running Spark on YARN.
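In practice, bumping the driver memory for a yarn-cluster submit would look something like the sketch below; 5G is an illustrative value to try rather than a tuned recommendation, and the class and jar are taken from the container command line in the question:

spark-submit --master yarn-cluster \
  --driver-memory 5G \
  --num-executors 15 --executor-memory 4G --executor-cores 6 \
  --class com.sky.ids.dovetail.asrun.etl.DovetailAsRunETLMain \
  /apps/projects/dovetail_asrun_etl/jars/EntLine-1.0-SNAPSHOT-jar-with-dependencies.jar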
A small addition to what @Jacek already wrote to answer the question
why do you get 3.5 GB instead of 3 GB?
is that, apart from DRIVER_MEMORY=3G, you need to consider spark.driver.memoryOverhead, which defaults to MAX(DRIVER_MEMORY * 0.10, 384 MB) = 384 MB here; 3 GB + 384 MB ≈ 3.5 GB.
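On the Spark 1.x / CDH generation shown in the logs, the overhead properties are named spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead and take values in MB, so instead of (or in addition to) raising DRIVER_MEMORY you can set the overhead explicitly; a sketch with illustrative numbers:

--conf "spark.yarn.driver.memoryOverhead=512"
--conf "spark.yarn.executor.memoryOverhead=768"

Either way, YARN typically rounds the requested container size (heap + overhead) up to its allocation increment, which is how 3072 MB + 384 MB ends up as the 3584 MB / 3.5 GB container reported in the diagnostics.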
I have 32 GB of physical memory and my input file size is about 30 MB. I try to submit my Spark job in yarn client mode using the below command:
spark-submit --master yarn --packages com.databricks:spark-xml_2.10:0.4.1 --driver-memory 8g ericsson_xml_parsing_version_6_stage1.py
My executor space is 8g, but I get the below error. I read about the --driver-java-options command-line option, but I don't know how to set the Java heap space using it.
Can anyone please help me configure the Java heap memory?
java.lang.OutOfMemoryError: Java heap space
Did you try configuring the executor memory as well?
Like this: --executor-memory 8g
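Putting it together for yarn client mode, a minimal sketch: heap sizes are normally set with --driver-memory / --executor-memory rather than by passing -Xmx through --driver-java-options, since Spark generally rejects -Xmx inside the extra Java options and points you at the memory settings instead:

spark-submit --master yarn \
  --packages com.databricks:spark-xml_2.10:0.4.1 \
  --driver-memory 8g \
  --executor-memory 8g \
  ericsson_xml_parsing_version_6_stage1.py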
I am encountering a weird issue with JBoss AS 7 running on JVM 1.7.0_29. When I turn OFF the following in standalone.conf, JBoss takes 7x longer to deploy as well as 7x longer to serve requests.
JAVA_OPTS="$JAVA_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n"
A request that normally takes 2 sec now takes around 15-20 sec! If I turn the above option back on, both JBoss deployment and request serving become fast again. We usually have it turned on in Dev, so we never experienced this until we turned it off in staging.
Xms and Xmx settings have no effect on the above observations.
I did some profiling with VisualVM: only a single instance of JBoss and only one client thread making requests. Heap profiling indicates that when the debugger is turned on, the JVM preallocates a good chunk of the heap from the Xms setting at boot, i.e., if Xms is 1g then it shows "Heap Size" 1 G and "Used Heap" 750 MB. From then on JBoss starts quickly, and the heap fluctuates very little when requests are made sequentially.
When running with the debugger turned off, the initial "Heap Size" is still 1 G but "Used Heap" is pretty low; it then tries to allocate more memory midway through deployment and eventually completes the deployment after a while. Upon a new request it does the same thing again, i.e., it allocates and deallocates memory at runtime.
java -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -version
-XX:InitialHeapSize=268435456 -XX:MaxHeapSize=4294967296 -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+UseCompressedOops -XX:+UseParallelGC
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Heap
PSYoungGen total 76800K, used 2642K [0x00000007aaa80000, 0x00000007b0000000, 0x0000000800000000)
eden space 66048K, 4% used [0x00000007aaa80000,0x00000007aad14878,0x00000007aeb00000)
from space 10752K, 0% used [0x00000007af580000,0x00000007af580000,0x00000007b0000000)
to space 10752K, 0% used [0x00000007aeb00000,0x00000007aeb00000,0x00000007af580000)
ParOldGen total 174592K, used 0K [0x0000000700000000, 0x000000070aa80000, 0x00000007aaa80000)
object space 174592K, 0% used [0x0000000700000000,0x0000000700000000,0x000000070aa80000)
PSPermGen total 21504K, used 2119K [0x00000006fae00000, 0x00000006fc300000, 0x0000000700000000)
object space 21504K, 9% used [0x00000006fae00000,0x00000006fb011da0,0x00000006fc300000)
JBoss Process:
/usr/bin/java -D[Standalone] -server -XX:+UseCompressedOops -XX:+TieredCompilation -Dprogram.name=standalone.sh -Djava.net.preferIPv4Stack=true -Dorg.jboss.resolver.warning=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -Djboss.server.default.config=standalone.xml -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n -Dcom.sun.management.jmxremote.port=11090 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xms1024m -Xmx2048m -XX:MaxPermSize=512M -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
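If the difference really is about how eagerly the heap gets committed, which is what the VisualVM observation hints at, one hedged experiment is to force up-front commitment without the debugger: set -Xms equal to -Xmx and add -XX:+AlwaysPreTouch so the heap pages are actually touched at startup. The poster notes that Xms/Xmx alone made no difference, so AlwaysPreTouch is the new variable here. A sketch of the standalone.conf line, with sizes copied from the existing process:

JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:+AlwaysPreTouch -XX:MaxPermSize=512M"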