Spark Streaming Driver and App work files cleanup - apache-spark

I am running Spark 2.0.2 and have deployed a streaming job in cluster deploy mode on a Spark standalone cluster. The streaming job works fine, but there is an issue with the application's and driver's stderr files, which are created in the work directory under SPARK_HOME. Because the streaming job runs continuously, these files only grow in size, and I don't know how to control this.
I have tried the solutions from the following questions, even though they are not exactly related to my problem, and none of them worked:
Apache Spark does not delete temporary directories
How to log using log4j to local file system inside a Spark application that runs on YARN?
Can anyone help me limit the size of these files?
P.S.: I have also tried adding the line below to conf/spark-env.sh and restarting the cluster, but it didn't work for a running streaming application.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60"
EDIT:
@YuvalItzchakov I have tried your suggestion, but it didn't work. The driver's stderr log is as follows:
Launch Command: "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" "-cp" "/mnt/spark2.0.2/conf/:/mnt/spark2.0.2/jars/*" "-Xmx2048M" "-Dspark.eventLog.enabled=true" "-Dspark.eventLog.dir=/mnt/spark2.0.2/JobsLogs" "-Dspark.executor.memory=2g" "-Dspark.deploy.defaultCores=2" "-Dspark.io.compression.codec=snappy" "-Dspark.submit.deployMode=cluster" "-Dspark.shuffle.consolidateFiles=true" "-Dspark.shuffle.compress=true" "-Dspark.app.name=Streamingjob" "-Dspark.kryoserializer.buffer.max=128M" "-Dspark.master=spark://172.16.0.27:7077" "-Dspark.shuffle.spill.compress=true" "-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" "-Dspark.cassandra.input.fetch.size_in_rows=20000" "-Dspark.executor.extraJavaOptions=-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "-Dspark.jars=file:/mnt/spark2.0.2/sparkjars/StreamingJob-assembly-0.1.0.jar" "-Dspark.executor.instances=10" "-Dspark.driver.extraJavaOptions=-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "-Dspark.driver.memory=2g" "-Dspark.rpc.askTimeout=10" "-Dspark.eventLog.compress=true" "-Dspark.executor.cores=1" "-Dspark.driver.supervise=true" "-Dspark.history.fs.logDirectory=/mnt/spark2.0.2/JobsLogs" "-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#172.16.0.29:34475" "/mnt/spark2.0.2/work/driver-20170210124424-0001/StreamingJob-assembly-0.1.0.jar" "Streamingjob"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/10 12:44:26 INFO SecurityManager: Changing view acls to: cassuser
17/02/10 12:44:26 INFO SecurityManager: Changing modify acls to: cassuser
17/02/10 12:44:26 INFO SecurityManager: Changing view acls groups to:
17/02/10 12:44:26 INFO SecurityManager: Changing modify acls groups to:
And my log4j.xml file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
<appender name="stdout" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="TRACE"/>
<param name="File" value="stdout"/>
<param name="maxFileSize" value="1MB"/>
<param name="maxBackupIndex" value="10"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
<filter class="org.apache.log4j.varia.LevelRangeFilter">
<param name="levelMin" value="ALL" />
<param name="levelMax" value="OFF" />
</filter>
</appender>
<appender name="stderr" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="WARN"/>
<param name="File" value="stderr"/>
<param name="maxFileSize" value="1MB"/>
<param name="maxBackupIndex" value="10"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
</appender>
</log4j:configuration>
Note that I removed this root element from the XML in your answer because it causes an error:
<root>
<appender-ref ref="console"/>
</root>

You can use a custom log4j xml file for that.
First, declare your XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
<appender name="stdout" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="TRACE"/>
<param name="File" value="stdout"/>
<param name="maxFileSize" value="50MB"/>
<param name="maxBackupIndex" value="100"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
<filter class="org.apache.log4j.varia.LevelRangeFilter">
<param name="levelMin" value="ALL" />
<param name="levelMax" value="OFF" />
</filter>
</appender>
<appender name="stderr" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="WARN"/>
<param name="File" value="stderr"/>
<param name="maxFileSize" value="50MB"/>
<param name="maxBackupIndex" value="100"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
</appender>
<root>
<!-- each appender-ref must name an appender declared in this file -->
<appender-ref ref="stdout"/>
<appender-ref ref="stderr"/>
</root>
</log4j:configuration>
Then, when you run your streaming job, pass the log4j.xml file to the Spark driver and executors via extraJavaOptions:
spark-submit \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml" \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml"
Note that the path on the master and worker nodes may be different, depending on how you deploy your JAR and files to Spark. You said you're using cluster mode, so I assume you're manually dispatching the JAR and extra files, but anyone running this in client mode will also need to add the XML file via the --files flag.

Related

Log4j.xml smtpappender emailthrottle

I have a log4J.xml SMTPAppender configuration as follows:
<appender name="MAIL" class="org.apache.log4j.net.SMTPAppender">
<param name="Threshold" value="ERROR"/>
<param name="EvaluatorClass" value="fi.reaktor.log4j.emailthrottle.ErrorEmailThrottle"/>
<param name="BufferSize" value="512"/>
<param name="SMTPHost" value="xxxx"/>
<param name="SMTPPort" value="25"/>
<param name="From" value="xxxx"/>
<param name="To" value="xxx"/>
<param name="Subject" value="xxx"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{dd/MM/yyyy HH:mm:ss} [%-5p] [%c{1}: %M] %m%n"/>
</layout>
</appender>
I use an EvaluatorClass that was recommended to me at this link: https://github.com/reaktor/log4j-email-throttle
That page says you can change the default configuration in a log4j.properties file:
fi.reaktor.log4j.emailthrottle.throttleIfUnderSecs=60
fi.reaktor.log4j.emailthrottle.emailIntervalInSecs=900
fi.reaktor.log4j.emailthrottle.normalAfterSecs=3600
Unfortunately, I do not see how to apply it in my Log4j.xml file.
The page you refer to says:
You can change default values by setting these System properties
So you can't put those settings in the log4j config file.
You either need to set those properties on the command line that starts the JVM, using the "-D" flag:
java -Dfi.reaktor.log4j.emailthrottle.throttleIfUnderSecs=60 \
-Dfi.reaktor.log4j.emailthrottle.emailIntervalInSecs=900 \
-Dfi.reaktor.log4j.emailthrottle.normalAfterSecs=3600
or set them programmatically in your code:
System.setProperty("fi.reaktor.log4j.emailthrottle.throttleIfUnderSecs", "60");
System.setProperty("fi.reaktor.log4j.emailthrottle.emailIntervalInSecs", "900");
System.setProperty("fi.reaktor.log4j.emailthrottle.normalAfterSecs", "3600");
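As a rough illustration of the programmatic route, the sketch below sets the three properties in one place and reads one of them back. The class and method names (ThrottleProperties, configure) are made up for this example. It also assumes that the throttle library reads the properties when its evaluator is first created, which would mean the call has to happen before log4j is configured; that timing is an assumption, not something stated on the project page quoted above.

```java
public class ThrottleProperties {
    // Property prefix taken from the question; everything else here is illustrative.
    static final String PREFIX = "fi.reaktor.log4j.emailthrottle.";

    // Sets the three throttle settings as JVM system properties.
    static void configure(int throttleIfUnderSecs, int emailIntervalInSecs, int normalAfterSecs) {
        System.setProperty(PREFIX + "throttleIfUnderSecs", Integer.toString(throttleIfUnderSecs));
        System.setProperty(PREFIX + "emailIntervalInSecs", Integer.toString(emailIntervalInSecs));
        System.setProperty(PREFIX + "normalAfterSecs", Integer.toString(normalAfterSecs));
    }

    public static void main(String[] args) {
        // Assumption: call this before the first logger is created.
        configure(60, 900, 3600);
        // Integer.getInteger parses a system property as an int; -1 is the fallback if it is unset.
        System.out.println(Integer.getInteger(PREFIX + "emailIntervalInSecs", -1));
    }
}
```

Keeping the calls in one method makes it easier to guarantee they run at the very start of main, before any logging code is touched.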

Log4j RollingFileAppender overwrites the old files

My appender is defined as below:
<appender name="M_FILE" class="org.jboss.logging.appender.RollingFileAppender">
<errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
<param name="File" value="${jboss.server.home.dir}/log/m_ser.log"/>
<param name="Append" value="false" />
<param name="MaxFileSize" value="5MB"/>
<!--param name="MaxBackupIndex" value="25"/-->
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c] %m%n"/>
</layout>
</appender>
<root>
<appender-ref ref="M_FILE"/>
</root>
With this, m_ser.log grows up to 5MB, then m_ser.log.1 is created; later m_ser.log.2 is created and the m_ser.log.1 file is missing. After a while, m_ser.log.3 is created and m_ser.log.2 is missing.
It looks like log4j is overwriting the backup files or is unable to keep the old files.
This is log4j 1.2 on a Windows 7 system with JBoss 4.1.
The MaxBackupIndex parameter should define how many backup files (.1,.2,.3, etc) are kept before it starts overwriting them.
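To make the effect of MaxBackupIndex concrete, here is a small simulation of how the stock org.apache.log4j.RollingFileAppender in log4j 1.2 shuffles backup files on each rollover: the oldest backup is dropped, the remaining ones shift up by one, and the active file becomes .1. This is only a model operating on index numbers, not the library code, and the class name RolloverSketch is invented for the example. The JBoss appender used in the question may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

public class RolloverSketch {
    // Returns the backup suffixes (.1, .2, ...) that exist after the given number of rollovers.
    static List<Integer> backupsAfter(int maxBackupIndex, int rollovers) {
        List<Integer> backups = new ArrayList<>();
        for (int r = 0; r < rollovers; r++) {
            if (maxBackupIndex <= 0) {
                continue; // with 0 the active file is simply truncated, no backups are kept
            }
            // The oldest allowed backup is deleted to make room.
            backups.remove(Integer.valueOf(maxBackupIndex));
            // Every remaining backup .i is renamed to .(i+1).
            List<Integer> shifted = new ArrayList<>();
            for (int i : backups) {
                shifted.add(i + 1);
            }
            // The active log file becomes backup .1.
            shifted.add(0, 1);
            backups = shifted;
        }
        return backups;
    }

    public static void main(String[] args) {
        System.out.println(backupsAfter(3, 5)); // [1, 2, 3]
        System.out.println(backupsAfter(1, 4)); // [1]
    }
}
```

In the stock appender the default MaxBackupIndex is 1, so leaving the parameter commented out keeps only a single backup in this model.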

Configure common log4j xml for multiple sub projects

I am trying to use a common log4j XML file for subprojects in Tomcat. A parent project is already deployed, and three other projects are part of it. Two of these, A and B, already exist and their logging works fine. I am adding a new project C and have updated the log4j configuration as shown below. The ProjectC.log file is created when Tomcat starts up, but no Project C log statements appear in this file (or in any other file). This is my current log4j XML:
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/"
debug="true">
<appender name="rootAppender" class="org.apache.log4j.ConsoleAppender">
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %c - %m%n"/>
</layout>
</appender>
<appender name="ProjectAAppender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${catalina.base}/logs/projectA.log"/>
<param name="Append" value="true"/>
<param name="MaxFileSize" value="100000KB"/>
<param name="MaxBackupIndex" value="3"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %c %x - %m%n"/>
</layout>
</appender>
<appender name="ProjectBAppender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${catalina.base}/logs/ProjectB.csv"/>
<param name="Append" value="true"/>
<param name="MaxFileSize" value="10000KB"/>
<param name="MaxBackupIndex" value="3"/>
<layout class="org.apache.log4j.PatternLayout"/>
</appender>
<appender name="ProjectCAppender" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${catalina.base}/logs/ProjectC.log"/>
<param name="Append" value="true"/>
<param name="MaxFileSize" value="10000KB"/>
<param name="MaxBackupIndex" value="3"/>
<layout class="org.apache.log4j.PatternLayout"/>
</appender>
<logger name="projA" additivity="true">
<level value ="DEBUG" />
<appender-ref ref="ProjectAAppender"/>
</logger>
<logger name="projA.Performance" additivity="true">
<level value ="INFO" />
<appender-ref ref="ProjectBAppender"/>
</logger>
<logger name="projC" additivity="true">
<level value ="DEBUG" />
<appender-ref ref="ProjectCAppender"/>
</logger>
<root>
<priority value="INFO"/>
<appender-ref ref="rootAppender"/>
</root>
</log4j:configuration>
I obtain my logger through the slf4j LoggerFactory:
LoggerFactory.getLogger(clazz)
I have declared dependencies on the log4j (1.2.14) and slf4j-log4j12 (1.4.1) jar files in my pom.
This setup works fine when I execute Project C independently (when running JUnit test cases).
How can I make logging work for Project C? Are there any changes I should make to my log4j XML? Thank you.
It turns out I was using the wrong properties file. In the actual log4j.properties file, all I had to do was create a new appender for Project C and add this line instead of registering it with the rootCategory:
log4j.logger.com.projectC.related.package=DEBUG, ProjectCAppender

Log4J - Logger don't work

I use Log4J for a Java application.
Here is an extract of my log4j.xml file:
<appender name="CONSOLE" class="org.apache.log4j.ConsoleAppender">
<errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
<param name="Target" value="System.out"/>
<param name="Threshold" value="INFO"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c{1}] %m%n"/>
</layout>
</appender>
<logger name="my.package.name">
<priority value="debug" />
</logger>
I want to print every log at INFO level, plus DEBUG logs for my package my.package.name.
However, these debug logs don't appear... :(
Can someone help me?
Change
<param name="Threshold" value="INFO"/>
to
<param name="Threshold" value="debug"/>
Because you have set Threshold to INFO, only events at INFO and above are logged. DEBUG is below INFO, which is why the debug messages are not logged.
The log4j level hierarchy is TRACE < DEBUG < INFO < WARN < ERROR < FATAL.
Hope this helps.
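The interaction between the logger level and the appender Threshold can be modelled in a few lines. This is a simplified sketch, not log4j itself: the enum and the isWritten helper are invented for illustration, and the model only captures the rule that an event must pass both the logger's level and the appender's threshold.

```java
public class ThresholdSketch {
    // Declared in ascending severity, so compareTo follows the log4j ordering.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // An event reaches the output only if it passes the logger level AND the appender threshold.
    static boolean isWritten(Level event, Level loggerLevel, Level appenderThreshold) {
        return event.compareTo(loggerLevel) >= 0
            && event.compareTo(appenderThreshold) >= 0;
    }

    public static void main(String[] args) {
        // Logger my.package.name at DEBUG, CONSOLE appender Threshold at INFO: dropped.
        System.out.println(isWritten(Level.DEBUG, Level.DEBUG, Level.INFO));  // false
        // Same logger, Threshold lowered to DEBUG: written.
        System.out.println(isWritten(Level.DEBUG, Level.DEBUG, Level.DEBUG)); // true
    }
}
```

With the threshold lowered, other packages stay at INFO only if their logger (or the root logger) is set to INFO, since the appender no longer filters DEBUG for anyone.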

log4J: Failure in post-close rollover action using TimeBasedRollingPolicy

I have set up TimeBasedRollingPolicy to roll the file over every minute (for testing purposes). The problem is that I get a warning and no zip or gz file is created. The warning is:
log4j:WARN Failure in post-close rollover action
I attached the source to figure out the problem, but have had no success yet. Am I missing any configuration in my log4j.xml?
<appender name="errorAppender" class="org.apache.log4j.rolling.RollingFileAppender">
<param name="File" value="C:/error.log"/>
<param name="Append" value="true"/>
<param name="BufferedIO" value="true"/>
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="FileNamePattern" value="C:/error.%d{ddMMMyyyy HH:mm:ss}.log.gz" />
<param name="ActiveFileName" value="C:/error.log"/>
</rollingPolicy>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %C (line:%L) - %m%n"/>
</layout>
<filter class="org.apache.log4j.varia.LevelRangeFilter">
<param name="LevelMax" value="error"/>
<param name="LevelMin" value="error"/>
<param name="AcceptOnMatch" value="true"/>
</filter>
</appender>
I am using log4j-1.2.17 and apache-log4j-extras-1.1. Has anybody seen this problem or have any clue about it?
The cause of the "log4j:WARN Failure in post-close rollover action" message is that on Windows-based systems you cannot create a file name containing the ":" character, so the file name produced by the FileNamePattern must not contain any of these: \, /, :, *, ?, ", <, >, |
Here is a log4j.xml for my application that works fine with a rolling file appender. For testing purposes I configured it to roll over to a new file every second:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<appender name="consola" class="org.apache.log4j.ConsoleAppender">
<param name="target" value="System.out"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="conversionPattern" value="[%d{yyyyMMdd HH:mm:ss:mm,SSS}]%-5p [%t] [%c{1}-%M:%L] - %m%n"/>
</layout>
</appender>
<appender name="desarr" class="org.apache.log4j.rolling.RollingFileAppender">
<param name="Append" value="false"/>
<rollingPolicy name="desarr" class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="fileNamePattern" value="C:/workspace/Probador/log/backups/importacion222.log_%d{mmss_mm}"/>
<param name="activeFileName" value="C:/workspace/Probador/log/importacion222.log"/>
</rollingPolicy>
<layout class="org.apache.log4j.PatternLayout">
<param name="conversionPattern" value="[%d{yyyyMMdd HH:mm:ss:mm,SSS}]%-5p [%t] [%c{1}-%M] - %m%n"/>
</layout>
</appender>
<root>
<priority value ="debug" />
<appender-ref ref="consola" />
<appender-ref ref="desarr"/>
</root>
</log4j:configuration>
Pay special attention to:
<param name="fileNamePattern" value="C:/workspace/Probador/log/backups/importacion222.log_%d{mmss_mm}"/>
Try this before attempting to zip the file.
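The Windows file-name restriction described in this answer can be checked before deploying a pattern. The sketch below tests a bare file name (without its directory part) against the character list given above. The class name and the sample names are hypothetical; the first sample is what a pattern like the one in the question might render to.

```java
public class WindowsNameCheck {
    // Characters listed in the answer as not allowed in a Windows file name.
    private static final String ILLEGAL = "\\/:*?\"<>|";

    // True if the bare file name (no directory part) contains none of the listed characters.
    static boolean isLegalFileName(String fileName) {
        for (int i = 0; i < fileName.length(); i++) {
            if (ILLEGAL.indexOf(fileName.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical rendering of a date pattern with HH:mm:ss in it.
        System.out.println(isLegalFileName("error.10Feb2017 12:44:26.log.gz")); // false
        // Same timestamp with the colons replaced.
        System.out.println(isLegalFileName("error.10Feb2017_12-44-26.log.gz")); // true
    }
}
```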
I hit the same issue in log4j, with the warning "log4j:WARN Failure in post-close rollover action" and the log file not rolling over. The root cause was insufficient permissions on the directory the log file was being written to. In this case, Java's File.renameTo() method was failing silently (it just returns a boolean false). It took a lot of time to figure out the issue :(
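When diagnosing this kind of silent failure, it can help to repeat the rename by hand with java.nio.file.Files.move, which throws an exception naming the cause instead of returning false. The sketch below is a standalone diagnostic, not part of log4j; the class name, the helper and the file names are made up, and the missing source file is used only because it fails the same way on every machine.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RenameDiagnostics {
    // Attempts the move with java.nio so that a failure carries its reason.
    static String tryMove(Path source, Path target) {
        try {
            Files.move(source, target);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical paths; the source file deliberately does not exist.
        Path source = Paths.get(System.getProperty("java.io.tmpdir"), "no-such-rollover-demo.log");
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "no-such-rollover-demo.log.1");

        // File.renameTo only reports false, with no hint about the cause.
        System.out.println(new File(source.toString()).renameTo(target.toFile()));
        // Files.move reports the exception type and the offending path.
        System.out.println(tryMove(source, target));
    }
}
```

Pointing the same helper at the real log file and archive location, run as the user that owns the application process, should surface a permission problem as an AccessDeniedException.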
"I am using log4j-1.2.17 and apache-log4j-extras-1.1. Has anybody seen this problem and have any clue about it?"
I have also observed this problem using log4j-1.2.16 and apache-log4j-extras-1.1. The exact same message.
I have tried various tweaks to no avail. The only time rollingPolicy->FileNamePattern seems to be honored is when it is used without the appender->File parameter and the rollingPolicy->ActiveFileName parameter. Even then, I have not seen it roll over successfully, nor gzip or zip previous files.
I also get the same messages:
log4j: setFile called: somepath/somefile.log, true
log4j: setFile ended
log4j:WARN Failure in post-close rollover action
Very frustrating.
For me the solution was to manually create the directory for archived files.
I also had the same problem, but in my case it was because the folder in the 'fileNamePattern' path did not exist. Creating it fixed the problem, and the rollover files were created after that.
If you are using the org.apache.log4j.rolling.TimeBasedRollingPolicy rollingPolicy, then the directory must exist prior to log4j being able to rotate.
For example, the following rollover will only work if the /var/log/blah/archive/YYYY/MM directory exists; creating it in a nightly cron job should do the trick. And, as mentioned previously, this will also occur when there is not enough permission to create the log file.
<appender name="infoFile"
class="org.apache.log4j.rolling.RollingFileAppender">
<param name="threshold"
value="INFO"/>
<param name="append"
value="true"/>
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="ActiveFileName"
value="/var/log/blah/file.log"/>
<!-- IMPORTANT the archive folder must already exist, or log4j cannot
put the rotated log there, and will keep using the old one -->
<param name="FileNamePattern"
value="/var/log/blah/archive/%d{yyyy}/%d{MM}/file.log.%d{yyyy-MM-dd}.gz"/>
</rollingPolicy>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern"
value="%5p | %-40c{2} | %-4L | %d{yyyy-MM-dd}T%d{HH:mm:ss} | %m%n"/>
</layout>
</appender>
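As an alternative to a cron job, the archive directory for the yyyy/MM layout above could be created from a small Java program, for instance at application start. This is a minimal sketch under that assumption: the class name, the helper and the default base path "archive" are all hypothetical, and in practice the base would be the archive folder from the FileNamePattern.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ArchiveDirs {
    // Builds <base>/yyyy/MM for the given date, matching the %d{yyyy}/%d{MM} part of the pattern.
    static Path archiveDirFor(Path base, LocalDate date) {
        return base.resolve(date.format(DateTimeFormatter.ofPattern("yyyy")))
                   .resolve(date.format(DateTimeFormatter.ofPattern("MM")));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical base path; pass the real archive folder as the first argument.
        Path base = Paths.get(args.length > 0 ? args[0] : "archive");
        LocalDate today = LocalDate.now();
        // createDirectories creates missing parents and does nothing if the directory exists.
        Files.createDirectories(archiveDirFor(base, today));
        // Also prepare next month so a rollover across the month boundary finds its folder.
        Files.createDirectories(archiveDirFor(base, today.plusMonths(1)));
    }
}
```

The program still needs write permission on the base directory, which is the other failure cause mentioned in this thread.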

Resources