Databricks Log4J custom appender not working as expected

I'm trying to figure out how a custom appender should be configured in a Databricks environment, but I cannot get it to work.
While the cluster is running, the time column in the driver logs shows 'unknown' for my custom log file; once the cluster is stopped, the custom log file does not appear in the log file list at all.
# Appender configuration
log4j.appender.bplm=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.bplm.layout=org.apache.log4j.PatternLayout
log4j.appender.bplm.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.bplm.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}-bplm.log.gz
log4j.appender.bplm.rollingPolicy.ActiveFileName=logs/log4j-bplm.log
log4j.logger.com.myPackage=INFO,bplm
The configuration above was added to the following files:
"/databricks/spark/dbconf/log4j/executor/log4j.properties"
"/databricks/spark/dbconf/log4j/driver/log4j.properties"
"/databricks/spark/dbconf/log4j/master-worker/log4j.properties"
After adding this configuration to the files listed above, there are two issues I cannot figure out.
1 - While the cluster is running, the driver log file list shows my custom log file, correctly populated, but its time column is displayed as 'unknown'.
2 - When the cluster is stopped, the file from my custom appender is not displayed in the driver log file list at all (stdout, stderr, and log4j-active are displayed).
I also tried different FileNamePattern values, but the issues above occur with every configuration I tried:
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.bplm.log.gz - appender1
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j.bplm-%d{yyyy-MM-dd-HH}.log.gz - appender2
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/bplm-log4j-%d{yyyy-MM-dd-HH}.log.gz - appender3
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/bplm.log4j-%d{yyyy-MM-dd-HH}.log.gz - appender4
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.bplm.gz - appender5
log4j.appender.bplm7.rollingPolicy.FileNamePattern=logs/log4j-bplm-%d{yyyy-MM-dd-HH}.log.gz - appender7
log4j.appender.bplm8.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}-bplm.log.gz - appender8
I also tried putting *-active in the ActiveFileName, but the result was the same:
log4j.appender.custom.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-bplm-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.custom.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-bplm-active.log
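One detail worth double-checking (an assumption on my part, not something the Databricks docs confirm for this setup): when a named logger is bound to its own appender, log4j's additivity flag controls whether the same events also flow up to the root logger's appenders. A minimal sketch:

log4j.logger.com.myPackage=INFO,bplm
# Hypothetical addition: stop com.myPackage events from also reaching the root appenders
log4j.additivity.com.myPackage=false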

Related

write spark application ID with spark logs

I have been researching this for a month and couldn't find a good solution. The default Spark logs don't contain the application ID.
The default logs contain: "time":,"date":,"level":,"thread":,"message"
I tried to customize the log4j properties but couldn't find a way. I am a newbie to the big data area.
My default log4j.properties file is:
log4j.rootLogger=${hadoop.root.logger}
hadoop.root.logger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
Does anyone know a solution? Any help, however small, is appreciated.
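For what it's worth, one possible direction (a sketch, not a verified recipe) is log4j's mapped diagnostic context (MDC): PatternLayout can print MDC values with %X{key}, so if the application puts the Spark application ID into the MDC at startup, the pattern can reference it. The key name appId below is hypothetical:

# In log4j.properties: print the MDC value stored under "appId"
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %X{appId} %c{2}: %m%n

The application would need a corresponding call such as org.apache.log4j.MDC.put("appId", sc.applicationId) after the SparkContext is created.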

Handling logs and writing to a file in Python?

I have a module named acms that contains a number of Python files; main.py calls the other files. I have added logging to those files, and the messages are displayed on the console, but I also want to write them to a file called all.log. I tried setting log levels and a logger in a file called log.py but didn't get the expected format. Since I'm new to Python, I'm having difficulty handling logs.
Use the logging module and use logger = logging.getLogger(__name__). Then it will use the correct logger with the options that you have set up.
See the thinkpad-scripts project for its logging. Also the logging cookbook has a section for logging to multiple locations.
We use the following to log to the console and the syslog:
import logging
import logging.handlers
import os

# syslog_format is not shown in the original snippet; this value is an assumption.
syslog_format = '%(name)s: %(levelname)s %(message)s'

kwargs = {}
dev_log = '/dev/log'
if os.path.exists(dev_log):
    kwargs['address'] = dev_log
syslog = logging.handlers.SysLogHandler(**kwargs)
syslog.setLevel(logging.DEBUG)
formatter = logging.Formatter(syslog_format)
syslog.setFormatter(formatter)
logging.getLogger('').addHandler(syslog)
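To also write everything to a file such as all.log, which is the question's stated goal, a minimal sketch is below; the file name and format string are assumptions:

import logging

# One-time setup, e.g. at the top of main.py.
root = logging.getLogger('')
root.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')

console = logging.StreamHandler()         # console output (stderr by default)
console.setFormatter(formatter)
root.addHandler(console)

to_file = logging.FileHandler('all.log')  # assumed file name from the question
to_file.setFormatter(formatter)
root.addHandler(to_file)

# In every module of the package:
logger = logging.getLogger(__name__)
logger.info('this line goes to the console and to all.log')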

CORB Batch process output Report Extraction issue

While running the CORB job, I am extracting 100,000 URIs and loading the data into one file on a Linux server. The expectation is that all output records are stored in a single file with a 100k count. However, the data was stored in multiple files with different counts. Can anyone help me find the root cause of why the CORB process creates multiple files in the output directory?
Here are the details of the CORB properties file configured in my local directory.
Properties file:
THREAD-COUNT=4
PROCESS-TASK=com.marklogic.developer.corb.extension.ResilientTransform
SSL-CONFIG-CLASS=com.marklogic.developer.corb.TwoWaySSLConfig
SSL-PROPERTIES-FILE=/eiestore/ssl-configs/common-corb-sslconfig.properties
DECRYPTER=com.marklogic.developer.corb.HostKeyDecrypter
MODULE-ROOT=/a/abcmodules/corb-process/
MODULES-DATABASE="abcmodules"
URIS-MODULE=corb-select-uris.xqy
XQUERY-MODULE=corb-get-process.xqy
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask
EXPORT-FILE-TOP-CONTENT=Id,value,type
EXPORT-FILE-DIR=/a/b/c/d/
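Two things in this file are worth checking (observations, not confirmed fixes): PROCESS-TASK appears twice, so only one of the two definitions can take effect, and no EXPORT-FILE-NAME is set. With ExportBatchToFileTask, the shared output file is normally named explicitly, along these lines:

# Hypothetical addition: one explicit file that all batches append to
EXPORT-FILE-DIR=/a/b/c/d/
EXPORT-FILE-NAME=corb-report.csv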

parsing log4j HTMLLayout logfile with filebeat

I am a newbie to the ELK stack, and it could be that my approach is wrong.
I'm trying to get my logs into Elasticsearch via Filebeat > Logstash.
The log files are saved as [Date]log.html and use log4j's HTMLLayout (see the HTMLLayout Log4j picture example copied from another Stack Overflow thread).
I'm able to read [Date]log.html and view it via Kibana, but the way it is displayed is awful: all the HTML tags are shown too.
I would like to parse it into the columns shown in the picture linked above:
- Time
- Thread
- Level
- Category
- Message
I'm trying to achieve this in filebeat.yml under filebeat.prospectors, without success. Thanks in advance for any help.
Edit: my current filebeat.yml (trial and error):
filebeat.prospectors:
- input_type: log
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - c:\MyDir\log.html
  multiline:
    pattern: '^[[:space:]]'
    negate: false
    match: after

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
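Since HTMLLayout renders each log event as an HTML table row, one thing to try (a sketch, not a tested configuration; the pattern is an assumption about the file's structure) is anchoring the multiline pattern on the row tag so every event is grouped into a single document:

multiline:
  pattern: '^<tr'   # assumption: each log event starts a new <tr> table row
  negate: true
  match: after

The HTML tags themselves would still need to be stripped further downstream, for example in a Logstash filter.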

is Rollover logic of TimeBasedRollingPolicy correct?

The documentation says that if the file name hasn't changed, no rollover occurs; the file name is derived from the FileNamePattern string.
I have two observations:
1) If the appender has no message to write today, it won't roll the file even though the triggering time has passed (i.e., we are left with a log file that was last modified yesterday).
2) If yesterday's log file is 0 KB (meaning nothing was logged to it yesterday) and the appender has messages to write today, it rolls the 0 KB file and writes the data into a newly created log file.
I want to discuss whether both cases are correctly implemented by the TimeBasedRollingPolicy class, or whether the implementation should be changed.
My strategy for the first scenario would be: if FileNamePattern is set to %d{dd-MM-yyyy}, then at midnight the file should be rolled regardless of whether the appender has data to write, provided yesterday's file is non-empty.
For the second scenario, if yesterday's file is 0 KB (no message was logged yesterday), the appender should write into the same file: the main purpose of rolling is to take a backup of the logs, and if the file is empty, is it worth rolling at all? A sketch of this proposed decision rule follows.
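To make the proposed behaviour concrete, here is a small sketch of the decision rule described above (illustrative logic in Python, not the actual TimeBasedRollingPolicy implementation):

import os
from datetime import date

def should_rollover(active_file: str, last_write: date, today: date) -> bool:
    # Still inside the current %d{dd-MM-yyyy} period: never roll.
    if last_write >= today:
        return False
    # Period boundary passed: roll only if the old file has content;
    # an empty (0 KB) file is reused instead of being rolled.
    return os.path.getsize(active_file) > 0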
Take the log4j.properties configuration below as a reference for the discussion of the two scenarios above.
Sample log4j.properties
####### Root Logger ########################################
log4j.rootLogger=ERROR,CA,FA
############################################################
################### APPENDERS ##############################
############################################################
# CA is set to be a ConsoleAppender
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%d %p %t %c: %m%n
# FA is set to be a FileAppender
log4j.appender.FA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.FA.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.FA.RollingPolicy.FileNamePattern=.\\logs\\application.log-%d{dd-MM-yyyy}
log4j.appender.FA.File=.\\logs\\application.log
log4j.appender.FA.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.FA.layout.ConversionPattern=%d %p %t %c: %m%n
log4j.appender.FA.Append=true
