PySpark and Log4j configuration - apache-spark

I'm trying to set up some decent logging from Python using log4j.
I want all logs written to a DB, only errors written to an error.log file, and only info written to an info.log file. From PySpark I log through the JVM gateway:
# py4j handle to the JVM-side log4j package
logger = sc._jvm.org.apache.log4j
lg = logger.LogManager.getRootLogger()
lg.info('test')
lg.error('test')
lg.debug('test')
lg.fatal('test')
and my log4j.properties file is as follows:
# Set everything to be logged to the console
log4j.rootLogger=INFO, ria_info, ria_error, ria_mysql, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%p] %d %c %M - %m%n
# Set info logs to be written to info file
log4j.appender.ria_info=org.apache.log4j.RollingFileAppender
log4j.appender.ria_info.filter.RangeFilter=org.apache.log4j.varia.LevelMatchFilter
log4j.appender.ria_info.filter.RangeFilter.LevelToMatch=INFO
log4j.appender.ria_info.filter.RangeFilter.AcceptOnMatch=true
log4j.appender.ria_info.layout=org.apache.log4j.PatternLayout
log4j.appender.ria_info.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.ria_info.File=/home/data/logs/info.log
log4j.appender.ria_info.MaxFileSize=10MB
log4j.appender.ria_info.MaxBackupIndex=10
log4j.appender.ria_error=org.apache.log4j.RollingFileAppender
log4j.appender.ria_error.Append=false
log4j.appender.ria_error.filter.RangeFilter=org.apache.log4j.varia.LevelMatchFilter
log4j.appender.ria_error.filter.RangeFilter.LevelToMatch=ERROR
log4j.appender.ria_error.filter.RangeFilter.AcceptOnMatch=true
log4j.appender.ria_error.layout=org.apache.log4j.PatternLayout
log4j.appender.ria_error.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.ria_error.File=/home/data/logs/error.log
log4j.appender.ria_error.MaxFileSize=10MB
log4j.appender.ria_error.MaxBackupIndex=10
log4j.appender.ria_mysql=org.apache.log4j.jdbc.JDBCAppender
log4j.appender.ria_mysql.URL=jdbc:mysql://localhost/DB
log4j.appender.ria_mysql.driver=com.mysql.jdbc.Driver
log4j.appender.ria_mysql.user=xxxx
log4j.appender.ria_mysql.password=xxxxx
log4j.appender.ria_mysql.sql=INSERT INTO LOGS VALUES('%p','%d{yyyy-MM-dd HH:mm:ss}','%t','%x','%c','%m')
log4j.appender.ria_mysql.layout=org.apache.log4j.PatternLayout
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=INFO
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
Now I get the error and fatal 'test' messages in my DB and in error.log, but the info and debug messages get completely lost for some reason. Also, lg.isInfoEnabled() returns False.
I tried a lot of things around additivity, but nothing seemed to solve the problem.
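The effective level can be inspected (and raised at runtime) through the same py4j gateway; this is a sketch using standard log4j 1.2 calls, which confirms the symptom but not its cause:

log4j = sc._jvm.org.apache.log4j
root = log4j.LogManager.getRootLogger()
print(root.getEffectiveLevel().toString())  # e.g. WARN, which would explain the dropped info messages
root.setLevel(log4j.Level.INFO)             # raise verbosity for this session
root.info('test')                           # should now reach the INFO appenders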

Related

How to configure log4j on YARN?

Background: YARN was installed by Cloudera Manager, and I use YARN to run my xxx.jar.
As we know, the syslog file will contain the log4j-related logs, while the stdout file will contain output such as System.out.println(...). Below is some output from the stdout file:
===============================================================
LogType:stdout
LogLastModifiedTime:Sun Oct 11 21:19:27 +0800 2020
LogLength:109238
LogContents:
log4j: Trying to find [container-log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@5c647e05.
log4j: Using URL [jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-server-nodemanager-3.0.0-cdh6.3.2.jar!/container-log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-server-nodemanager-3.0.0-cdh6.3.2.jar!/container-log4j.properties
log4j: Hierarchy threshold set to [ALL].
log4j: Parsing for [root] with value=[INFO,CLA, EventCounter].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "CLA".
log4j: Parsing layout options for "CLA".
log4j: Setting property [conversionPattern] to [%d{ISO8601} %p [%t] %c: %m%n].
log4j: End of parsing for "CLA".
log4j: Setting property [containerLogFile] to [syslog].
log4j: Setting property [totalLogFileSize] to [1048576].
log4j: Setting property [containerLogDir] to [/yarn/container-logs/application_1602420941906_0002/container_1602420941906_0002_01_000001].
log4j: setFile called: /yarn/container-logs/application_1602420941906_0002/container_1602420941906_0002_01_000001/syslog, true
log4j: setFile ended
log4j: Parsed "CLA" options.
log4j: Parsing appender named "EventCounter".
log4j: Parsed "EventCounter" options.
log4j: Parsing for [org.apache.hadoop.mapreduce.task.reduce] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapreduce.task.reduce set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapreduce.task.reduce=[false]
log4j: Setting additivity for "org.apache.hadoop.mapreduce.task.reduce" to false
log4j: Parsing for [org.apache.hadoop.mapred.Merger] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapred.Merger set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapred.Merger=[false]
log4j: Setting additivity for "org.apache.hadoop.mapred.Merger" to false
log4j: Finished configuring.
Launcher AM configuration loaded
From the log, we can see that the log4j properties come from jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-server-nodemanager-3.0.0-cdh6.3.2.jar!/container-log4j.properties. I also downloaded hadoop-yarn-server-nodemanager-3.0.0-cdh6.3.2.jar; the content of container-log4j.properties is:
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# Define some default values that can be overridden by system properties
hadoop.root.logger=DEBUG,CLA
yarn.app.mapreduce.shuffle.logger=${hadoop.root.logger}
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
#
# ContainerLog Appender
#
#Default values
yarn.app.container.log.dir=null
yarn.app.container.log.filesize=100
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.containerLogFile=${hadoop.root.logfile}
log4j.appender.CLA.totalLogFileSize=${yarn.app.container.log.filesize}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
log4j.appender.CRLA=org.apache.hadoop.yarn.ContainerRollingLogAppender
log4j.appender.CRLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CRLA.containerLogFile=${hadoop.root.logfile}
log4j.appender.CRLA.maximumFileSize=${yarn.app.container.log.filesize}
log4j.appender.CRLA.maxBackupIndex=${yarn.app.container.log.backups}
log4j.appender.CRLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CRLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
log4j.appender.shuffleCLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.shuffleCLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.shuffleCLA.containerLogFile=${yarn.app.mapreduce.shuffle.logfile}
log4j.appender.shuffleCLA.totalLogFileSize=${yarn.app.mapreduce.shuffle.log.filesize}
log4j.appender.shuffleCLA.layout=org.apache.log4j.PatternLayout
log4j.appender.shuffleCLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
log4j.appender.shuffleCRLA=org.apache.hadoop.yarn.ContainerRollingLogAppender
log4j.appender.shuffleCRLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.shuffleCRLA.containerLogFile=${yarn.app.mapreduce.shuffle.logfile}
log4j.appender.shuffleCRLA.maximumFileSize=${yarn.app.mapreduce.shuffle.log.filesize}
log4j.appender.shuffleCRLA.maxBackupIndex=${yarn.app.mapreduce.shuffle.log.backups}
log4j.appender.shuffleCRLA.layout=org.apache.log4j.PatternLayout
log4j.appender.shuffleCRLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
################################################################################
# Shuffle Logger
#
log4j.logger.org.apache.hadoop.mapreduce.task.reduce=${yarn.app.mapreduce.shuffle.logger}
log4j.additivity.org.apache.hadoop.mapreduce.task.reduce=false
# Merger is used for both map-side and reduce-side spill merging. On the map
# side yarn.app.mapreduce.shuffle.logger == hadoop.root.logger
#
log4j.logger.org.apache.hadoop.mapred.Merger=${yarn.app.mapreduce.shuffle.logger}
log4j.additivity.org.apache.hadoop.mapred.Merger=false
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
So I think the log4j-related logs should be saved in the syslog file. But after running my xxx.jar on YARN, there is nothing in syslog. My xxx.jar also contains code like System.out.println("my demo");, and I can find the output "my demo" in stdout after the run.
So my question is: why isn't the log4j output printed? Is there any other configuration needed?
I got exactly the same issue (EMR 5.33, Oozie 5.2.0).
The logs are written to syslog if you set the log4j.appender.CLA.totalLogFileSize property to 0.
It gets its value from the yarn.app.container.log.filesize config:
log4j.appender.CLA.totalLogFileSize=${yarn.app.container.log.filesize}
There is an Oozie ticket for that: https://issues.apache.org/jira/browse/OOZIE-3600
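One way to force that value to 0 (an assumption about your setup; log4j 1.2's ${...} substitution falls back to JVM system properties, so this only helps if you can inject an option into the container JVM, and YARN may set its own value on the launch command):

-Dyarn.app.container.log.filesize=0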

Unable to locate PySpark stdout logs

I'm working on a PySpark application that I deploy in yarn-cluster mode. I set stdout as the logging StreamHandler target, and I can see the log in the YARN UI. However, I can't find stdout logs under /var/log/sparkapp/yarn; I see only stderr logs there. What could be the reason for this?
This is the logging part of the application:
import logging
import sys

# send application logs to stdout instead of the default stderr
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
lsh = logging.StreamHandler(sys.stdout)
lsh.setLevel(logging.INFO)
lformat = logging.Formatter(fmt='%(asctime)s.%(msecs)03d %(levelname)s :%(name)s - %(message)s', datefmt='%m/%d/%Y %I:%M:%S')
lsh.setFormatter(lformat)
logger.addHandler(lsh)
log4j.properties
log4jspark.root.logger=INFO,console
log4jspark.log.dir=.
log4jspark.log.file=spark.log
log4jspark.log.maxfilesize=1024MB
log4jspark.log.maxbackupindex=10
# Define the root logger to the system property "spark.root.logger".
log4j.rootLogger=${log4jspark.root.logger}, EventCounter
# Set everything to be logged to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.console.Threshold=INFO
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Try this instead to get the logger for your spark job:
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
You can modify log4j.properties to change the console appender's target:
log4j.appender.console.target=System.out
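For example (a sketch assuming an active SparkContext named sc, as in the question):

log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
logger.info("application started")  # handled by JVM-side log4j, so it follows the console appender's target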

log4j configuration level error

My log4j configuration is as follows
log4j.rootLogger=INFO, CA, FA, DA
#Console Appender
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
#File Appender
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.File=/home/admin/logs/sysout.log
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.FA.Threshold = WARN
#File Appender 2
log4j.appender.DA=org.apache.log4j.FileAppender
log4j.appender.DA.File=/home/admin/logs/debug.log
log4j.appender.DA.layout=org.apache.log4j.PatternLayout
log4j.appender.DA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.DA.Threshold = TRACE
My understanding is:
INFO will be logged to the console
WARN will be logged to sysout.log
TRACE will be logged to debug.log
But WARN is getting logged to both debug.log and sysout.log, and TRACE is not logged to either file.
The console shows both TRACE and WARN.
Can you please tell me what I am doing wrong?
You need to separate the logger and appender concepts in your mind.
For the three appenders, remember that the threshold is the lowest level of message that the appender will process. An appender will process messages at its threshold level or any higher level.
CA has no threshold set, so it will log all messages that are sent to it regardless of level. Similarly DA has a threshold of TRACE so it will also log everything that is sent to it (since TRACE is the lowest level). FA has a threshold of WARN so it will filter out any messages at levels below WARN - it will contain only WARN, ERROR and FATAL messages.
The important part of that previous paragraph is "all messages that are sent to it". Since you have configured your root logger with a level of INFO and have not configured any specific loggers to a lower level, only messages at INFO and above will be sent to the appenders - DEBUG and TRACE messages will be silently dropped. This is why you see no TRACE output in any of your loggers.
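So a minimal fix, keeping your appenders as they are (a sketch; only the root level and the console threshold change), is to lower the root level so TRACE events reach the appenders and cap the console at INFO:

log4j.rootLogger=TRACE, CA, FA, DA
# keep the console at INFO and above; FA stays at WARN and DA at TRACE as in your config
log4j.appender.CA.Threshold=INFO

Note that a threshold passes its level and everything above it, so sysout.log will still receive ERROR and FATAL along with WARN. If you want a file to hold exactly one level, use org.apache.log4j.varia.LevelMatchFilter followed by a DenyAllFilter instead of a threshold.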

Running Ex-Crawler

Hi, I am running the jar of the open-source Ex-Crawler,
but I always receive this error:
log4j:WARN No appenders could be found for logger (eu.medsea.mimeutil.TextMimeDetector).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info
The application you're running uses log4j to produce log files, and log4j needs a configuration file, usually named log4j.properties, to be available on the application's classpath in order to start properly.
This is a sample default configuration you might start with:
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.conversionPattern=%5p [%t] (%F:%L) - %m%n
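If editing the classpath is awkward, log4j 1.2 also accepts the configuration location as a system property (the jar name below is just a placeholder):

java -Dlog4j.configuration=file:/path/to/log4j.properties -jar ExCrawler.jar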

Possible to configure log4j with 2 ConsoleAppenders, each with a different layout?

I'd like to have 2 different log4j ConsoleAppenders defined with different layouts. I tried the following:
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m%n
log4j.appender.stdoutMDC=org.apache.log4j.ConsoleAppender
log4j.appender.stdoutMDC.Target=System.out
log4j.appender.stdoutMDC.layout=org.apache.log4j.PatternLayout
log4j.appender.stdoutMDC.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L (hibernateLoadPlanWalkPath->%X{hibernateLoadPlanWalkPath}) - %m%n
However, when I attempt to use these appenders I am running into problems. I have the first appender attached to root and then attempt to attach the second to certain ancestor loggers:
log4j.rootLogger=info, stdout
log4g.logger.org.hibernate.loader.plan=trace, stdoutMDC
log4g.additivity.org.hibernate.loader.plan=false
log4g.logger.org.hibernate.persister.walking=trace, stdoutMDC
log4g.additivity.org.hibernate.persister.walking=false
The trouble I'm having is that messages from both of those ancestor loggers end up going to the stdout appender rather than the stdoutMDC appender. I tried both with and without disabling additivity, but it made no difference.
Any ideas?
Please try this; it may help you out.
Note the %X{userName} - this is how you fetch data from the Mapped Diagnostic Context (MDC):
log4j.rootLogger=DEBUG, consoleAppender
log4j.appender.consoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.consoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%-4r [%t] %5p %c %x - %m - %X{userName}%n
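Since this page is mostly about PySpark, the MDC key can also be populated through the same py4j gateway used in the first question (a sketch; org.apache.log4j.MDC.put is standard log4j 1.2, and the key/value here are just examples):

log4j = sc._jvm.org.apache.log4j
log4j.MDC.put("userName", "alice")           # value rendered by %X{userName} in the pattern
logger = log4j.LogManager.getLogger("demo")
logger.info("user action")                   # logs: ... - user action - alice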
