Grok regex with escaped “[“, “(“, and “)” chars problems

Elastic newbie here - working with a new 5.5 install. I have a log line that looks like so:
[2015/10/01#19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) prostrct create session begin for timk519 on CON:.
I have the following regex:
\[%{DATE:date}#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
When I try it in the Kibana Grok Debugger, it doesn't work and I get the following error:
GrokDebugger: [parse_exception] [pattern_definitions] property isn't a map, but of type [java.lang.String], with { header={ processor_type="grok" & property_name="pattern_definitions" } }
This appears to be due to the \[ at the start of the line. If I replace the leading \[ with a period ".", I get this:
.%{DATE:date}#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
Both the Grok Debugger and https://grokdebug.herokuapp.com/ accept this pattern.
When I put this regex into logstash, it fails to recognize the msgnum (451) part of the line because of the escaped parens \( and \) around the msgnum field, and as a result fails to recognize the line as a legal string.
Am I escaping something incorrectly? Is this a bug?
UPDATE 2017-07-21
I got around the issue with escaping ( and ) by putting them in [(] and [)]. I haven't figured out a way to solve matching the leading [ yet.
UPDATE 2017-07-24
The answer below was an epic catch and I've used that to create the following custom patterns:
DBTIME %{TIME}[-+]\d{4}
DBTIMESTAMP %{YEAR}/%{MONTHNUM}/%{MONTHDAY}#%{DBTIME}
which I've implemented in my grok statement like so:
\[%{DBTIMESTAMP:dbdatetime}\]\s*%{PROCESSID:processid}\s*%{DBTHREADID:threadid}\s*%{DBMSGTYPE:msgtype}\s*%{PROCESSTYPE:processtype}?\s*%{USERNUMBER:usernumber}?\s*:\s*[(]%{MSGNUMBER:msgnumber}[)].\s*%{GREEDYDATA:eventmessage}\s*\r
I then use the date filter to turn dbdatetime into the @timestamp setting, and now the regex matches the incoming log stream, which is what I want. Thx!

The devil is in the detail and the error is not apparent at first. The Grok Debugger fails because of your use of the DATE pattern. DATE expands to either DATE_US or DATE_EU, which are defined like this:
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
MONTHNUM and MONTHDAY each match at most two digits, so on your input DATE_EU ends up matching 15/10/01, with MONTHDAY taking the 15 from your year. That is why \[%{DATE} does not match: the [ is followed by 20, not by the start of the date. Why does .%{DATE} work, though? Because the dot is not matching the [; it is matching the 0 of the year.
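To see the mismatch concretely, here is a minimal Python sketch. The MONTHDAY, MONTHNUM and YEAR strings are simplified stand-ins for the grok patterns of the same name (an assumption, not the exact grok definitions), and only the DATE_EU branch is modelled:

```python
import re

# Simplified stand-ins for the grok patterns (assumptions, not grok's exact text).
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
YEAR = r"(?:\d\d){1,2}"
DATE_EU = rf"{MONTHDAY}[./-]{MONTHNUM}[./-]{YEAR}"

line = "[2015/10/01#19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) prostrct"

# A literal [ must be followed directly by the date, but "20" comes first.
bracket_match = re.search(r"\[" + DATE_EU, line)
# The dot is free to slide along the line and lands on the 0 of 2015.
dot_match = re.search(r"." + DATE_EU, line)

print(bracket_match)        # None
print(dot_match.group(0))   # 015/10/01
```

The second result shows that the "working" pattern silently treats 15 as the day and 01 as the year, which is not what you want.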
How to fix this? Use a custom pattern to match the date. Something like this works:
\[(?<date>%{YEAR}/%{MONTHNUM}/%{MONTHDAY})#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
This will return the following output:
{
"date": "2015/10/01",
"msgnum": "451",
"procid": "P-4780",
"processtype": "DBUTIL",
"message": "prostrct create session begin for timk519 on CON:.",
"threadid": "T-2208",
"usernumber": ":",
"gmtoffset": "0400",
"time": "19:48:22.785",
"msgtype": "I"
}
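If you want to sanity-check the corrected expression outside Logstash, a rough Python translation could look like the sketch below. Python writes named groups as (?P<name>...), and the \d-based pieces standing in for %{YEAR}, %{MONTHNUM}, %{MONTHDAY} and %{TIME} are simplifications assumed for this one sample line:

```python
import re

# Rough Python equivalent of the grok expression; the date and time parts
# are simplified stand-ins for the grok patterns (an assumption).
pattern = re.compile(
    r"\[(?P<date>\d{4}/\d{2}/\d{2})#(?P<time>\d{2}:\d{2}:\d{2}\.\d+)-(?P<gmtoffset>\d{4})\]\s*"
    r"(?P<procid>P-[0-9]+)\s*(?P<threadid>T-[0-9]+)\s*(?P<msgtype>[ifIF])\s*"
    r"(?P<processtype>[a-zA-Z]+)\s*(?P<usernumber>[0-9]+|:)\s*"
    r"\((?P<msgnum>[0-9]+|-+)\)\s*(?P<message>.*)"
)

line = ("[2015/10/01#19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) "
        "prostrct create session begin for timk519 on CON:.")

fields = pattern.match(line).groupdict()
for name, value in fields.items():
    print(f"{name}: {value}")
```

The escaped \[ and \( behave as literals here too, so the escaping itself was never the problem.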

Related

Replacing a certain part of string with a pre-specified Value

I am fairly new to Puppet and Ruby. Most likely this question has been asked before but I am not able to find any relevant information.
In my puppet code I will have a string variable retrieved from the fact hostname.
$n="$facts['hostname'].ex-ample.com"
I am expecting to get the values like these
DEV-123456-02B.ex-ample.com,
SCC-123456-02A.ex-ample.com,
DEV-123456-03B.ex-ample.com,
SCC-999999-04A.ex-ample.com
I want to perform the following action: change the string to lowercase and then replace the -02, -03 or -04 with -01.
So my output would be like
dev-123456-01b.ex-ample.com,
scc-123456-01a.ex-ample.com,
dev-123456-01b.ex-ample.com,
scc-999999-01a.ex-ample.com
I figured I would need to use .downcase on $n to make everything lowercase, but I am not sure how to replace the digits. I was thinking of .gsub or split but am not sure how to apply them. I would prefer to do this in a one-liner.

If you really want a one-liner, you could run this against each string:
str
  .downcase
  .split('-')
  .map
  .with_index { |substr, i| i == 2 ? substr.gsub(/0[0-9]/, '01') : substr }
  .join('-')
Without knowing what format your input list is taking, I'm not sure how to advise on how to iterate through it, but maybe you have that covered already. Hope it helps.

Note that Puppet and Ruby are entirely different languages and the other answers are for Ruby and won't work in Puppet.
What you need is:
$h = downcase(regsubst($facts['hostname'], '..(.)$', '01\1'))
$n = "${h}.ex-ample.com"
notice($n)
Note:
The downcase function comes from stdlib (it is built into Puppet itself from version 6 on); regsubst is a built-in Puppet function.
The regsubst call matches ..(.)$, that is, two characters followed by one captured character at the end of the string, and replaces the match with 01 plus the captured character.
All of that is then downcased.
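For readers who want to try the substitution outside Puppet, the same search-and-replace idea can be sketched in Python. The normalize helper is a hypothetical name used only for this illustration:

```python
import re

def normalize(hostname: str) -> str:
    """Swap the two characters before the final one for "01", then lowercase."""
    # ..(.)$ : any two characters, then one captured character at the end.
    replaced = re.sub(r"..(.)$", r"01\g<1>", hostname)
    return f"{replaced.lower()}.ex-ample.com"

for host in ["DEV-123456-02B", "SCC-999999-04A"]:
    print(normalize(host))
# dev-123456-01b.ex-ample.com
# scc-999999-01a.ex-ample.com
```

Like the Puppet version, this assumes the hostname always ends in two digits followed by a single letter.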

If the -02 to -04 part always sits at the same string index, you can use that index to replace the content.
original = 'DEV-123456-02B.ex-ample.com'
#                  11 -^
string = original.downcase # creates a new downcased string
string[11, 2] = '01' # replace from index 11, 2 characters
string #=> "dev-123456-01b.ex-ample.com"

How to search for a specific dynamic pattern of a field's in mongodb.?

I need to search a MongoDB collection for a field whose name matches a pattern. I tried using {$exists:true}, but that only returns results if I provide the exact field name; it does not work when I only have a pattern.
{
"field1":"value1",
"field2":"value2",
"field3":object
{/arjun1/pat1: 1,
/arjun2/pat2: 3,
/arjun3/pat3: 5
}
"field4":"value4",
}
From some field, I get the keys pat3 & field3. From this I would need to find out if the value /arjun3/pat3 exists in the document.
If I use {"field3./arjun3/pat3":{$exists:true}}, this would give me results. But the problem is I get only field3 and pat3 and I need to use some pattern matching like field3.*.pat3 and then use $expr or $exists; which I'm not exactly sure how to. Please help.

You could try something like this:
db.arjun.find(
{"field3" : {
"$elemMatch" : { $and: [
{"arjun3.pat3" : {$exists:true}},
{"arjun3.pat3" : 5}
]
}}}
);

You can use regex (the re module) for SQL-like pattern matching and compile your own custom wildcard. If you don't want that, you can simply use the fnmatch module, part of Python's standard library, which supports wildcard matching for multiple characters (via *) or a single character (via ?).
import fnmatch
a = "hello"
print(fnmatch.fnmatch(a, "h*"))
Output:
True
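Applied to the document from the question, the same wildcard idea could be used client-side on the keys of field3 once the document has been fetched. This is only a sketch: the dict below is a hand-written copy of the sample document, and the filtering happens in Python, not in the MongoDB query:

```python
import fnmatch

# Hand-written copy of the sample document (assumption: already fetched).
doc = {
    "field1": "value1",
    "field2": "value2",
    "field3": {"/arjun1/pat1": 1, "/arjun2/pat2": 3, "/arjun3/pat3": 5},
    "field4": "value4",
}

# fnmatchcase avoids the platform-dependent case folding of fnmatch.fnmatch.
matches = {
    key: value
    for key, value in doc["field3"].items()
    if fnmatch.fnmatchcase(key, "*/pat3")
}
print(matches)  # {'/arjun3/pat3': 5}
```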

How to move part of the string after exact word to another field in logstash?

Let's imagine I have log file like the following:
My custom exception ST: java.lang.RuntimeException: Text of this dummy err.
My final goal is to put everything after ST: into a new field called ST and remove the ST: prefix.
I'm trying to use the pattern, but it doesn't work.
filter {
  grok {
    match => { "message" => "(?<newField>(?<=ST)(?s)(.*$))" }
  }
}

Grok is based on the Oniguruma regex library. To make . match any character, including a newline, with an Oniguruma regex, you need to pass the (?m) inline modifier, not (?s) (as in PCRE and some other regex engines).
By placing the (?<=ST) positive lookbehind inside the named capturing group, you require ST to appear immediately before the current location, but in your input ST is followed by a colon and then a space. It makes sense to just move ST: out of the named group:
"(?m)ST: (?<newField>.*)"
^^^^^^^^
The ST: and the space will get matched and consumed, and the newField group will hold the rest of the string.
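The same restructuring can be tried in Python, with one caveat: Python's re module uses (?s) for "dot matches newline", which is the role (?m) plays in Oniguruma. The second line of the sample text is a made-up stack frame added to show the multi-line capture:

```python
import re

# The "\tat Foo.bar(...)" line is hypothetical, added only for illustration.
text = (
    "My custom exception ST: java.lang.RuntimeException: Text of this dummy err.\n"
    "\tat Foo.bar(Foo.java:1)"
)

# (?s) here corresponds to (?m) in Oniguruma: the dot also matches newlines.
match = re.search(r"(?s)ST: (?P<newField>.*)", text)
new_field = match.group("newField")
print(new_field)
```

Because ST: and the space sit outside the group, they are consumed by the match but never end up in newField.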

You can use a specific regex like this:
^My custom exception ST: %{GREEDYDATA:ST}
Or a more generic one:
%{GREEDYDATA} \bST\b: %{GREEDYDATA:ST}
Prefer the more specific regex whenever you can.

Grok Parsing Failure in logstash with Pattern That Includes Square Brackets

I have a log pattern where every log element is enclosed in square brackets. I can't control the original log. I just want the grok parsing to ignore the brackets and only interpret what's between them. Based on something close to the following line:
2019-04-04 13.23.57.057 [52] [77] [MEASURE] [XYZService]
I want the pattern to see the 52 as a threadId. I have the following code:
if [message] =~ "MEASURE" {
grok {
match => { " message" => "%{TIMESTAMP_ISO8601:logtime} [%{NUMBER:threadId}] %{GREEDYDATA:restofmessage}" }
}
}
else {
drop()
}
In this state, I get a grokparsefailure when Logstash attempts to interpret the line. I am certain it's only related to the bracketed portion, because when I remove that part of the pattern, everything works fine. I would be grateful for any ideas about what I am doing wrong. Thanks

Never mind. I got it to work by escaping the brackets like this: \[%{NUMBER:threadId}\]
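The reason the escaping matters is that an unescaped [ ... ] is a character class in regex syntax, not a pair of literal brackets. A small Python sketch shows the difference, with \d+ standing in for %{NUMBER} (an assumption made for this sample line):

```python
import re

line = "2019-04-04 13.23.57.057 [52] [77] [MEASURE] [XYZService]"

# Unescaped: [52] is a character class meaning "one character, either 5 or 2".
char_class = re.search(r"[52]", line)
print(char_class.group(0))  # 2  (the first character of the year)

# Escaped: \[ and \] are literal brackets around the captured number.
escaped = re.search(r"\[(?P<threadId>\d+)\] (?P<restofmessage>.*)", line)
print(escaped.group("threadId"))       # 52
print(escaped.group("restofmessage"))  # [77] [MEASURE] [XYZService]
```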

Parsing formatted strings in Go

The Problem
I have a slice of string values in which each value is formatted according to a template. In my particular case, I am trying to parse Markdown links as shown below:
- [What did I just commit?](#what-did-i-just-commit)
- [I wrote the wrong thing in a commit message](#i-wrote-the-wrong-thing-in-a-commit-message)
- [I committed with the wrong name and email configured](#i-committed-with-the-wrong-name-and-email-configured)
- [I want to remove a file from the previous commit](#i-want-to-remove-a-file-from-the-previous-commit)
- [I want to delete or remove my last commit](#i-want-to-delete-or-remove-my-last-commit)
- [Delete/remove arbitrary commit](#deleteremove-arbitrary-commit)
- [I tried to push my amended commit to a remote, but I got an error message](#i-tried-to-push-my-amended-commit-to-a-remote-but-i-got-an-error-message)
- [I accidentally did a hard reset, and I want my changes back](#i-accidentally-did-a-hard-reset-and-i-want-my-changes-back)
What I want to do?
I am looking for ways to parse this into a value of type:
type Entity struct {
Statement string
URL string
}
What have I tried?
As you can see, all the items follow the pattern: - [{{ .Statement }}]({{ .URL }}). I tried using the fmt.Sscanf function to scan each string as:
var statement, url string
fmt.Sscanf(s, "[%s](%s)", &statement, &url)
This results in:
statement = "I"
url = ""
The issue is with the scanner storing space-separated values only. I do not understand why the URL field is not getting populated based on this rule.
How can I get the Markdown values as mentioned above?
EDIT: As suggested by Marc, I will add a couple of clarification points:
This is a general-purpose question on parsing strings based on a format. In my particular case a Markdown parser might help me, but my intention is to learn how to handle such cases in general, where a library might not exist.
I have read the official documentation before posting here.

Note: The following solution only works for "simple", non-escaped input markdown links. If this suits your needs, go ahead and use it. For full markdown-compatibility you should use a proper markdown parser such as gopkg.in/russross/blackfriday.v2.
You could use regexp to get the link text and the URL out of a markdown link.
So the general input text is in the form of:
[some text](somelink)
A regular expression that models this:
\[([^\]]+)\]\(([^)]+)\)
Where:
\[ is the literal [
([^\]]+) is for the "some text"; it matches everything except a closing square bracket
\] is the literal ]
\( is the literal (
([^)]+) is for the "somelink"; it matches everything except a closing parenthesis
\) is the literal )
Example:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	// Group 1 captures the link text, group 2 captures the URL.
	r := regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
	inputs := []string{
		"[Some text](#some/link)",
		"[What did I just commit?](#what-did-i-just-commit)",
		"invalid",
	}
	for _, input := range inputs {
		fmt.Println("Parsing:", input)
		// Each submatch slice holds the full match followed by the groups.
		allSubmatches := r.FindAllStringSubmatch(input, -1)
		if len(allSubmatches) == 0 {
			fmt.Println(" No match!")
		} else {
			parts := allSubmatches[0]
			fmt.Println(" Text:", parts[1])
			fmt.Println(" URL: ", parts[2])
		}
	}
}
Output:
Parsing: [Some text](#some/link)
Text: Some text
URL: #some/link
Parsing: [What did I just commit?](#what-did-i-just-commit)
Text: What did I just commit?
URL: #what-did-i-just-commit
Parsing: invalid
No match!

You could create a simple lexer in pure Go code for this use case. There's a great talk by Rob Pike from years ago ("Lexical Scanning in Go") that goes into the design of the text/template lexer, which would be applicable here. The implementation chains together a series of state functions into an overall state machine and delivers the tokens through a channel (fed by a goroutine) for later processing.
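To give a feel for the state-function idea, here is a deliberately tiny sketch, written in Python rather than Go and collecting tokens in a list instead of sending them over a channel. The Lexer class and the lex_* names are hypothetical and not taken from text/template:

```python
class Lexer:
    """Runs state functions until one of them returns None."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tokens = []

    def run(self):
        state = lex_before
        while state is not None:
            state = state(self)  # each state returns the next state
        return self.tokens


def lex_before(lx):
    # Skip everything up to the opening [ of the next link.
    i = lx.text.find("[", lx.pos)
    if i == -1:
        return None
    lx.pos = i + 1
    return lex_text


def lex_text(lx):
    # The link text runs until the "](" separator.
    i = lx.text.find("](", lx.pos)
    if i == -1:
        return None
    lx.tokens.append(("TEXT", lx.text[lx.pos:i]))
    lx.pos = i + 2
    return lex_url


def lex_url(lx):
    # The URL runs until the closing parenthesis.
    i = lx.text.find(")", lx.pos)
    if i == -1:
        return None
    lx.tokens.append(("URL", lx.text[lx.pos:i]))
    lx.pos = i + 1
    return lex_before


tokens = Lexer("- [What did I just commit?](#what-did-i-just-commit)").run()
print(tokens)
```

Like the regex answer, this only handles simple, non-escaped links; the point is the shape of the state machine, where each function decides which state comes next.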
