parsing log4j HTMLLayout logfile with filebeat - log4j

I am a newbie with the ELK Stack, so it could be that my approach is wrong.
I'm trying to get my logs into Elasticsearch via Filebeat > Logstash.
Logfiles are saved as [Date]log.html.
They look like this (picture copied from a Stack Overflow thread): HTMLLayout Log4j picture example
I'm able to read the [Date]log.html and view it via Kibana.
But the way it is displayed is awful: all the HTML tags are shown too.
I would like to parse it into the columns shown in the picture linked above:
- Time
- Thread
- Level
- Category
- Message
I'm trying to do this in filebeat.yml under filebeat.prospectors, without success.
Thank you for your help in advance.
Best regards
Edit:
My filebeat.yml (trial and error):
filebeat.prospectors:
- input_type: log
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - c:\MyDir\log.html
  multiline:
    pattern: '^[[:space:]]'
    negate: false
    match: after

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
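Filebeat on its own only ships the raw lines; the field extraction would normally happen on the Logstash side. Below is a rough sketch of a filter block, assuming each HTMLLayout <tr>…</tr> event reaches Logstash as a single message and that each row contains five <td> cells in the order Time, Thread, Level, Category, Message. The field names are my own choice and the pattern should be checked against the actual markup in [Date]log.html:

filter {
  grok {
    # One joined <tr> event comes in; the five cells become separate fields.
    match => {
      "message" => "(?m)<tr>\s*<td>%{DATA:time}</td>\s*<td[^>]*>%{DATA:thread}</td>\s*<td[^>]*>%{DATA:level}</td>\s*<td[^>]*>%{DATA:category}</td>\s*<td[^>]*>%{GREEDYDATA:log_message}</td>"
    }
  }
  # Some cells (e.g. the level for WARN/ERROR) may still carry <font> tags; strip them.
  mutate {
    gsub => [ "level", "<[^>]+>", "", "log_message", "<[^>]+>", "" ]
  }
}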

Related

Databricks Log4J Custom Appender Not Working as expected

I'm trying to figure out how a custom appender should be configured in a Databricks environment, but I haven't been able to get it working.
While the cluster is running, the time column in the driver logs shows 'unknown' for my custom log file, and once the cluster is stopped, the custom log file does not appear in the log file list at all.
#appender configuration
log4j.appender.bplm=com.databricks.logging.RedactionRollingFileAppender
log4j.appender.bplm.layout=org.apache.log4j.PatternLayout
log4j.appender.bplm.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.bplm.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}-bplm.log.gz
log4j.appender.bplm.rollingPolicy.ActiveFileName=logs/log4j-bplm.log
log4j.logger.com.myPackage=INFO,bplm
The above configuration was added to the following files:
"/databricks/spark/dbconf/log4j/executor/log4j.properties"
"/databricks/spark/dbconf/log4j/driver/log4j.properties"
"/databricks/spark/dbconf/log4j/master-worker/log4j.properties"
After adding this configuration to the files above, there are two issues I cannot figure out:
1 - While the cluster is running, the driver logs list shows my custom log file, generated and correctly populated, but its time column is displayed as 'unknown'.
2 - Once the cluster is stopped, the files from my custom appender are no longer displayed in the driver logs list (stdout, stderr, and log4j-active are displayed).
I also tried different FileNamePatterns, but the issues mentioned above seem to happen with every configuration I tried:
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.bplm.log.gz - appender1
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j.bplm-%d{yyyy-MM-dd-HH}.log.gz - appender2
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/bplm-log4j-%d{yyyy-MM-dd-HH}.log.gz - appender3
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/bplm.log4j-%d{yyyy-MM-dd-HH}.log.gz - appender4
log4j.appender.bplm.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}.log.bplm.gz - appender5
log4j.appender.bplm7.rollingPolicy.FileNamePattern=logs/log4j-bplm-%d{yyyy-MM-dd-HH}.log.gz - appender7
log4j.appender.bplm8.rollingPolicy.FileNamePattern=logs/log4j-%d{yyyy-MM-dd-HH}-bplm.log.gz - appender8
I also tried putting *-active in the ActiveFileName, but the result was the same:
log4j.appender.custom.rollingPolicy.FileNamePattern=/tmp/custom/logs/log4j-bplm-%d{yyyy-MM-dd-HH}.log.gz
log4j.appender.custom.rollingPolicy.ActiveFileName=/tmp/custom/logs/log4j-bplm-active.log

write spark application ID with spark logs

I have been researching this for a month and couldn't find a good solution. The default Spark logs don't contain the application ID.
The default logs contain: "time", "date", "level", "thread", "message".
I tried to customize the log4j properties but couldn't find a way. I am a newbie to the big data area.
My default log4j.properties file is:
log4j.rootLogger=${hadoop.root.logger}
hadoop.root.logger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
Does anyone know a solution? Any help, however small, is appreciated.
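One technique that may help, though I have not verified it against this exact setup: log4j 1.x has a mapped diagnostic context (MDC), and a PatternLayout can print MDC values with %X{key}, so the driver can put the application ID there itself. A sketch in PySpark (the key name appId is my own choice, and sc._jvm is an internal py4j gateway):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mdc-demo").getOrCreate()
sc = spark.sparkContext

# Put the application ID into log4j's MDC on the driver JVM;
# "appId" is an arbitrary key name chosen for this sketch.
sc._jvm.org.apache.log4j.MDC.put("appId", sc.applicationId)

The ConversionPattern would then reference it, for example %d{yy/MM/dd HH:mm:ss} %X{appId} %p %c{2}: %m%n. Note that the MDC is tied to the thread(s) where the value was set, so this sketch only covers driver-side logging, not the executors.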

where are the logs written to in aws lambda with python logger utility?

I am working on Python Lambda code, see the sample below. When I use the logger utility rather than simply using a print statement in Python, where does it log the information?
import logging

logger = logging.getLogger("module1")
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    try:
        pass  # do something
    except Exception as error:
        logger.exception(error)
They are written to a CloudWatch log group. Go to the Monitoring tab of your function and there should be a link to view the logs in CloudWatch.
Read more here:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-monitoring.html
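If you would rather pull the same events programmatically instead of through the console, here is a minimal boto3 sketch; the function name is a placeholder, and the log group follows Lambda's /aws/lambda/<function name> naming:

import boto3

logs = boto3.client("logs")

# Lambda writes to a log group named /aws/lambda/<function name>.
response = logs.filter_log_events(
    logGroupName="/aws/lambda/my-function",  # placeholder name
    filterPattern="ERROR",
    limit=20,
)
for event in response["events"]:
    print(event["timestamp"], event["message"])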
Go to Resource Groups > CloudWatch > Logs > Log groups.
An even better way to search for a particular log is as follows:
CloudWatch > Logs > Insights.
Select the Lambda whose logs you want to search in the log groups dropdown above the query window.
Select the time range you want to search.
Sample query below:
fields #timestamp, #message, #logStream
| filter #message like /Error/
| sort #timestamp desc
| limit 20
In the results, the #logStream column gives you a direct link to the log stream containing the matched message.
Note: if you have selected multiple log groups in the dropdown, you will see the log stream name, but the anchor link to it will not be enabled - it only works when a single log group/Lambda is selected.
You can also apply a regex between the two forward slashes:
filter #message like /your regex goes here/
When you click the log stream link and land on the log stream page, you can search further there to reach the exact log location:
Click the gear icon in the top right corner.
Check "log stream name".
Then you can search in the "Filter events" box and go to the particular log location.

Handling logs and writing to a file in python?

I have a module named acms with a number of Python files inside it. The main.py calls the other Python files. I have added logging to those files, and the logs are displayed on the console, but I also want to write them to a file called all.log. I tried setting log levels and a logger in a file called log.py but didn't get the expected format. Since I'm new to Python, I'm having difficulty handling logs.
Use the logging module and create your loggers with logger = logging.getLogger(__name__). They will then pick up the handlers and options that you have set up.
See the thinkpad-scripts project for its logging. Also the logging cookbook has a section for logging to multiple locations.
We use the following to log to the console and the syslog:
import logging
import logging.handlers
import os

# syslog_format is defined elsewhere in the project; any format string works.
syslog_format = '%(name)s: %(levelname)s %(message)s'

kwargs = {}
dev_log = '/dev/log'
if os.path.exists(dev_log):
    kwargs['address'] = dev_log
syslog = logging.handlers.SysLogHandler(**kwargs)
syslog.setLevel(logging.DEBUG)
formatter = logging.Formatter(syslog_format)
syslog.setFormatter(formatter)
logging.getLogger('').addHandler(syslog)
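For the all.log part of the question, a minimal sketch along the same lines (the file name is taken from the question; the format string is an assumption):

import logging

# Configure the root logger once (e.g. in main.py); every other module
# just calls logging.getLogger(__name__) and inherits these handlers.
formatter = logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')

file_handler = logging.FileHandler('all.log')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(formatter)

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(formatter)

root = logging.getLogger('')
root.setLevel(logging.DEBUG)
root.addHandler(file_handler)
root.addHandler(console_handler)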

Why is Logstash reloading duplicate data from a file in Linux?

I am using Logstash, Elasticsearch, and Kibana.
My Logstash configuration file is as follows:
input {
  file {
    path => "/home/rocky/Logging/logFiles/test1.txt"
    start_position => "end"
    sincedb_path => "test.db"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
When I run Logstash in a Windows environment it works fine, but when I use the same configuration in my virtual Linux OS (Fedora) it creates a problem.
On Fedora, when I append anything to the end of the log file while Logstash is running, it sometimes sends all of the file's data from the beginning and sometimes half of it, but it should only load the data newly appended to that log file. The sincedb file is also storing its data correctly, yet Logstash still does not deliver the proper data on Fedora. Please help.
I had a similar problem on my Linux Mint machine using the official Logstash Docker image.
I was using a text editor (Geany) to add new lines to the file. After playing around a bit more, I figured out that the problem had to be related to what the editor was doing when saving the file after I added new lines.
When I added new lines using a simple echo command instead, things worked fine:
echo "some new line" >> my_file.log
I know this thread is old, but this was the only thing that came up at all when I googled for this, so hopefully this will help someone else...
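One way to confirm that suspicion (my own addition, not part of the original answer): the file input tracks files by inode in the sincedb, and many editors save by rewriting or replacing the file rather than appending, which makes Logstash re-read content it has already shipped. Comparing the inode before and after saving shows whether the file was replaced:

import os

# Run before and after saving in the editor; a changed inode means the
# editor replaced the file instead of appending to it.
print(os.stat('/home/rocky/Logging/logFiles/test1.txt').st_ino)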
