log4j aligned ConversionPattern - log4j

Is it possible to create a ConversionPattern in log4j as such
Some short Msg : [INFO] class date
Some very looooooooooooooong msg : [INFO] class date
What I basically would like to achieve is that message is aligned to left and the rest of information starts after n characters, thus the rest is nicely aligned instead of
Some short Msg : [INFO] class date
Some very looooooooooooooong msg : [INFO] class date

Just put a - in the pattern like:
%-50m | [%p] %c{1} %d %n"/

Related

regex to match paragraph in between 2 substrings

I have a string look like this:
string=""
( 2021-07-10 01:24:55 PM GMT )TEST
---
Badminton is a racquet sport played using racquets to hit a shuttlecock across
a net. Although it may be played with larger teams, the most common forms of
the game are "singles" (with one player per side) and "doubles" (with two
players per side).
( 2021-07-10 01:27:55 PM GMT )PATRICKWARR
---
Good morning, I am doing well. And you?
---
---
* * *""
I am trying to split the String up into parts as:
text=['Badminton is a racquet sport played using racquets to hit a
shuttlecock across a net. Although it may be played with larger teams,
the most common forms of the game are "singles" (with one player per
side) and "doubles" (with two players per side).','Good morning, I am
doing well. And you?']
What I have tried as:
text=re.findall(r'\( \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} PM GMT \)\w+ [\S\n]--- .*',string)
I'm not able get how to extract multiple lines.
You can use
(?m)^\(\s*\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s*[AP]M\s+GMT\s*\)\w+\s*\n---\s*\n(.*(?:\n(?!(?:\(\s*\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s*[AP]M\s+GMT\s*\)\w+\s*\n)?---).*)*)
See the regex demo. Details:
^ - start of line
{left_rx} - left boundary
--- - three hyphens
\s*\n - zero or more whitespaces and then an LF char
(.*(?:\n(?!(?:{left_rx})?---).*)*) - Group 1:
.* - zero or more chars other than line break chars as many as possible
(?:\n(?!(?:{left_rx})?---).*)* - zero or more (even empty, due to .*) lines that do not start with the (optional) left boundary pattern followed with ---
The boundary pattern defined in left_rx is \(\s*\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s*[AP]M\s+GMT\s*\)\w+\s*\n, it is basically the same as the original, I used \s* to match any zero or more whitespaces or \s+ to match one or more whitespaces between "words".
See the Python demo:
import re
text = '''string=""\n( 2021-07-10 01:24:55 PM GMT )TEST \n--- \nBadminton is a racquet sport played using racquets to hit a shuttlecock across\na net. Although it may be played with larger teams, the most common forms of\nthe game are "singles" (with one player per side) and "doubles" (with two\nplayers per side). \n \n \n\n \n\n( 2021-07-10 01:27:55 PM GMT )PATRICKWARR \n--- \nGood morning, I am doing well. And you? \n \n \n\n \n \n \n--- \n \n \n \n \n--- \n \n* * *""'''
left_rx = r"\(\s*\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s*[AP]M\s+GMT\s*\)\w+\s*\n"
rx = re.compile(fr"^{left_rx}---\s*\n(.*(?:\n(?!(?:{left_rx})?---).*)*)", re.M)
print ( [x.strip().replace('\n', ' ') for x in rx.findall(text)] )
Output:
['Badminton is a racquet sport played using racquets to hit a shuttlecock across a net. Although it may be played with larger teams, the most common forms of the game are "singles" (with one player per side) and "doubles" (with two players per side).', 'Good morning, I am doing well. And you?']
One of the approaches:
import re
# Replace all \n with ''
string = string.replace('\n', '')
# Replace the date string '( 2021-07-10 01:27:55 PM GMT )PATRICKWARR ' and string like '* * *' with ''
string = re.sub(r"\(\s*\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} [AP]M GMT\s*\)\w+|\*+", '', string)
data = string.split('---')
data = [item.strip() for item in data if item.strip()]
print (data)
Output:
['Badminton is a racquet sport played using racquets to hit a shuttlecock acrossa net. Although it may be played with larger teams, the most common forms ofthe game are "singles" (with one player per side) and "doubles" (with twoplayers per side).', 'Good morning, I am doing well. And you?']

A complicated logstash pattern in Grok

I have following 3 lines in a log that need to be grok'd for ElasticSearch through logstash.
2020-01-27 13:30:43,536 INFO com.test.bestmatch.streamer.function.BestMatchProcessor - Best match for ID: COi0620200110450BAD5CB723457A9B4747F1727 Total Batch Processing time: 3942
2020-01-27 13:30:43,581 INFO HTTPConnection - COi0620200110450BAD5CB723457A9B4747F1727 | People: 51 | Addresses: 5935 | HTTP Query Time: 24
2020-01-27 13:30:43,698 INFO bestRoute - COi0620200110450BAD5CB723457A9B4747F1727 | Touch Points: 117 | Best Match Time 3943
I tried various grok patterns but couldn't get to any concrete one.
Edited as per request
I need the following in ES in the context of the specific log entry
1st line
ID: COi0620200110450BAD5CB723457A9B4747F1727
Total Batch Processing time: 3942
2nd Line
ID: COi0620200110450BAD5CB723457A9B4747F1727
People: 51
Addresses: 5935
HTTP Query Time: 24
3rd Line
Touch Points 117
Best Match Time: 3943.
The output is from a Flink log. If there are flink patterns out there then please let me know.
1st line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.*ID: (?<ID>[\w\d]*).*time: (?<total_time>[\d]*)$
2nd line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.* - (?<ID>[\w]*).*People: (?<people>[\w]*).*Addresses: (?<addresses>[\d]*).*HTTP Query Time: (?<query_time>[\d]*)$
3rd line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.* - (?<ID>[\w]*).*Touch Points: (?<touch_points>[\d]*).*Best Match Time (?<best_match_time>[\d]*)$
There are many ways to parse this, this is only one approach. I would reccomend to adjust the field names I used to the new ECS. https://www.elastic.co/guide/en/ecs/current/index.html

Grok pattern file

I'm trying to get pattern grok on my log file data
this is the message log
116.50.181.5 - - [18/May/2019:09:05:32 +0000] "SHARP56" 50 245 "INFO: System componement ready for use" 23 "A4" "/user/admistrator/68768.pdf" "INFO: No ERROR TO SHOW"
I've tried this grok pattern but it didn't works
%{IP:client} %{HTTPDATE:timestamp}\] %{WORD:name} %{NUMBER:X1} %{NUMBER:x2} %{WORD:msg} %{NUMBER:X3} %{WORD:format} %{WORD:path} %{WORD:label}
the output file that I want should look like this
{
client = 116.50.181.5
timeStamp = 18/May/2019:09:05:32 +0000
name = SHARP56
x1 = 50
x2 = 245
msg =INFO
format = A4
type = pdf
label = INFO: No ERROR TO SHOW
}
any suggestion ?
you can use the following:
%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"%{DATA:name}\" %{NUMBER:X1} %{NUMBER:x2} \"%{GREEDYDATA:msg}\" %{NUMBER:X3} \"%{WORD:format}\" \"%{DATA:path}\" \"%{GREEDYDATA:label}\"

Parsing two formats of log messages in LogStash

In a single log file, there are two formats of log messages. First as so:
Apr 22, 2017 2:00:14 AM org.activebpel.rt.util.AeLoggerFactory info
INFO:
======================================================
ActiveVOS 9.* version Full license.
Licensed for All application server(s), for 8 cpus,
License expiration date: Never.
======================================================
and second:
Apr 22, 2017 2:00:14 AM org.activebpel.rt.AeException logWarning
WARNING: The product license does not include Socrates.
First line is same, but on the other lines, there can be (written in pseudo) :loglevel: <msg>, or loglevel:<newline><many of =><newline><multiple line msg><newline><many of =>
I have the following configuration:
Query:
%{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{GREEDYDATA:info}%{SPACE}%{LOGLEVEL:level}:(%{SPACE}%{GREEDYDATA:msg}|%{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+)
Grok patterns:
AMPM (am|AM|pm|PM|Am|Pm)
TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}
Multiline filter:
%{LOGLEVEL}|%{GREEDYDATA}|=+
The problem is that all messages are always identified with %{SPACE}%{GREEDYDATA:msg}, and so in second case return <many of => as msg, and never with %{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+, probably as first msg pattern contains the second.
How can I parse these two patterns of msg ?
I fixed it by following:
Query:
%{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{DATA:info}\s%{LOGLEVEL:level}:\s((=+\s%{GDS:msg}\s=+)|%{GDS:msg})
Patterns:
AMPM (am|AM|pm|PM|Am|Pm)
TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}
GDS (.|\s)*
Multiline pattern:
%{LOGLEVEL}|%{GREEDYDATA}
Logs are correctly parsed.

Splunk splitting xml log event

We have logs that log an event to a single file. Each log entry looks something like this:
<LogEntry>
<UserName>IIS APPPOOL\ASP.NET v4.0</UserName>
<TimeStamp>02/28/2014 13:54:17</TimeStamp>
<ThreadName>20</ThreadName>
<CorrelationId>7a0d464d-556c-4d47-820f-0cf01322e54c</CorrelationId>
<LoggerName>-Api-booking</LoggerName>
<Level>INFO</Level>
<Identity></Identity>
<Domain>API-1-130380690118132000</Domain>
<CreatedOn>02/28/2014 13:54:22</CreatedOn>
<ExceptionObject />
<RenderedMessage>"7a0d464d-556c-4d47-820f-0cf01322e54c" - "GET https://myapi.com/booking" - API-"Response":
"Unauthorized"</RenderedMessage>
</LogEntry>
When we import these logs into Splunk, the log entry is split up incorrectly into 3 parts e.g.
1-
<LogEntry>
<UserName>IIS APPPOOL\ASP.NET v4.0</UserName>
2-
<CreatedOn>02/28/2014 02:57:55</CreatedOn>
<ExceptionObject />
<RenderedMessage>"66d8cdda-ff62-480a-b7d2-ec175b151e5f" - "POST https://myapi.com/booking" - API-"Response":
"Bad Request"</RenderedMessage>
</LogEntry>
3-
<TimeStamp>02/28/2014 02:57:29</TimeStamp>
<ThreadName>21</ThreadName>
<CorrelationId>66d8cdda-ff62-480a-b7d2-ec175b151e5f</CorrelationId>
<LoggerName>-Api-booking</LoggerName>
<Level>INFO</Level>
<Identity></Identity>
<Domain>/LM/W3SVC/1/ROOT/Api-1-130380256918440000</Domain>
How can I configure Splunk to see these as a single log event?
props.conf (pay attention to LINE_BREAKER)
[your_xml_sourcetype]
TIME_PREFIX = <TimeStamp>
MAX_TIMESTAMP_LOOKAHEAD = 19
TZ = GMT
# A performance tweak is to disable SHOULD_LINEMERGE and then set the
# LINE_BREAKER to "line ending characters coming before a new time stamp"
# (note the direct link of the TIME_FORMAT to the regex of LINE_BREAKER).
TIME_FORMAT = %m/%d/%Y %T
LINE_BREAKER = ([\r\n]+)<LogEntry>
SHOULD_LINEMERGE = False
# 10000 is default, should be set on a case by case basis
TRUNCATE = 5000
# If the data does not have nice key=value pairs, (or some other readily
# machine parseable format, like JSON or XML), set KV_MODE = none so that
# Splunk doesn't spin its wheels on attempting to look for key = value
# pairs which don't exist.
KV_MODE = xml
# Leaving PUNCT enabled can impact indexing performance. Customers can
# comment this line if they need to use PUNCT
ANNOTATE_PUNCT = false
More information here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf

Resources