Is there a difference between add_field in grok and mutate filters of logstash? - logstash-grok

I'm trying to add fields while parsing the logs in logstash for the below log record.
64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
I want to know if there is a difference when add_field is used under grok filter and when it is used under mutate filter plugins.

The function of add_field is the same in both filters.
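For illustration, here is a minimal sketch of the identical add_field syntax under both plugins (the field name source_type and its value are made up for this example; COMMONAPACHELOG should match the access-log line above):
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
    # Applied only if the grok match above succeeds.
    add_field => { "source_type" => "apache_access" }
  }
  mutate {
    # Applied to every event that reaches this mutate block.
    add_field => { "source_type" => "apache_access" }
  }
}
add_field is a common option shared by all filter plugins, so the syntax is the same everywhere; in grok it is only applied when the match succeeds.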

Related

Logstash unable to mutate text log

I am sending the below log file from Filebeat to Logstash. This is the output Filebeat sends to Logstash:
\u001b[m\u001b[32m[2019-12-02T17:30:09,995] INFO - [http-nio-8080-exec-9:40] {client_ip=13.232.113.45, request_id=8D9383C6E4FD40EC90324627F8EF839C} [filter.RequestAndResponseLoggingFilter.doFilterInternal:113] Response body: {\"status\":\"success\",\"message\":\"success\"}"
I want to remove \u001b[m\u001b[32m
I added the gsub setting below in Logstash, but the sequence still shows up in Kibana:
["message","^\\u001b\[m\\u001b\[32m"," "]
The sequence I want to remove is a color code that Tomcat uses to highlight INFO and ERROR entries in its logs.
I was unable to match the color code directly, but found a workaround:
mutate {
  gsub => ["message", ".+?(?=\[[0-9]{4}-[0-9]{2}-[0-9]{2})", ""]
}
The gsub above strips everything before the date. This resolved the issue.
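As an alternative sketch (assuming the goal is specifically to strip ANSI color sequences rather than everything before the date), a gsub targeting the escape sequences themselves might look like this; the exact escaping of the ESC character may need adjusting for your Logstash version:
mutate {
  # Remove ANSI color/reset sequences of the form ESC [ ... m anywhere in the message.
  gsub => ["message", "\e\[[0-9;]*m", ""]
}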

Syntax for Lookahead and Lookbehind in Grok Custom Pattern

I'm trying to use a lookbehind and a lookahead in a Grok custom pattern and getting pattern match errors in the Grok debugger that I cannot resolve.
This is for archiving system logs. I am currently trying to parse logs from the postgrey application.
Given data such as:
2019-04-09T11:41:31-05:00 67.157.192.7 postgrey: action=pass, reason=triplet found, delay=388, client_name=unknown, client_address=103.255.78.9, sender=members#domain.com, recipient=person#domain.com
I'm trying to use the following to pull the string between "action=" and the comma immediately following it as the field "postgrey_action":
%{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG} (?<postgrey_action>(?<=action=).+?(?=\,))
I expect to see the following output:
{
  "program": "dhcpd:",
  "logsource": "66.146.192.67",
  "timestamp": "2019-04-09T11:41:31-05:00",
  "postgrey_action": "pass"
}
Instead, from the debugger, I receive "Provided Grok patterns do not match data in the input".
How can I properly make this lookbehind/lookahead work?
Edit: I should note that without the postgrey_action match at the end of the Grok pattern, the Grok Debugger runs and works as expected (using linux-syslog and grok-patterns).
Logstash version 6.3.2
As a workaround, I have resorted to modifying my syntax, using a custom patterns file, and referencing it in each filter with the patterns_dir directive.
Ex.
My pattern:
POSTGREY %{TIMESTAMP_ISO8601:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG} (action=)%{WORD:postgrey_action}(,) (reason=)%{DATA:postgrey_reason}(,) (delay=)%{NUMBER:postgrey_delay}(,) (client_name=)%{IPORHOST}(,) (client_address=)%{IPORHOST:postgrey_clientaddr}(,) (sender=)%{EMAILADDRESS:postgrey_sender}(,)
My filter:
if "postgrey" in [program] {
grok {
match => { "message" => "%{POSTGREY}"}
patterns_dir => ["/etc/logstash/patterns"]
overwrite => [ "message" ]
}
}
However, this workaround still does not answer my original question: why did my initial approach not work?
Looking at the Oniguruma regex documentation and the grok filter documentation, it's not clear to me what is wrong with my original syntax or how a lookahead/lookbehind should be implemented within a grok named capture. If lookarounds are not supported, they should not be documented as if they were.

Grok pattern for log

I'm having trouble finding the right grok pattern to parse all of my logs through Logstash. Here is a sample log:
20180809 17:43:27,user.mystack.com,adam,172.16.1.1,36610,QUERY,test_db,select * from table,'SET autocommit=0',0
I want grok pattern which parse the log in the format:
Date- 09/08/2018 17:43:27
Domain- user.mystack.com
User- adam
ClientIP- 172.16.1.1
ID- 36610
Operation- Query
Db_name- test_db
Query- select * from table,'SET autocommit=0',0
This will be the grok pattern:
grok {
  match => ["message", '%{DATA:Date},%{DATA:Domain},%{DATA:User},%{DATA:ClientIP},%{DATA:ID},%{DATA:Operation},%{DATA:Db_name},%{GREEDYDATA:Query}']
}
DATA and GREEDYDATA are predefined grok patterns (regular expressions) that can be reused conveniently. More patterns are available here: https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
Also, use this app to test your grok patterns: https://grokdebug.herokuapp.com/
To convert the Date field, use the date filter if you're planning to do time-based plotting of your logs and requests. Date filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
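For example, a minimal date filter sketch for the sample above, assuming the first field always looks like "20180809 17:43:27":
date {
  # "Date" holds values such as "20180809 17:43:27" captured by the grok above.
  match => ["Date", "yyyyMMdd HH:mm:ss"]
  target => "@timestamp"
}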

logstash for custom logs

I'm trying to write a grok filter for the logs below but getting a grok failure. I'm new to Logstash; please help me.
Logs:
msg.repository.routed.ABC_MAS:101::20170526-05:03:08: got from remote host at t-rate <0.068> and f-size <68> into tmp dir
msg.repository.routed.ABC_MAS:101::20170526-05:03:07: got from remote host at t-rate <0.068> and f-size <68> into tmp dir
msg.repository.routed.BCD_MAS:101::20170526-00:04:34: sftp connected to 1.2.2.1(msg), timeOut:1000
msg.repository.routed.ABC_MAS:101::20170526-00:04:37: sftp connected to 1.2.2.1(msg), timeOut:1000
Grok pattern which I used:
filter {
  grok {
    match => { "message" => '(?: %{GREEDYDATA:pullpathinfo}):%{NUMBER:thread}::%{NUMBER:date}-%{NUMBER:HOUR}:%{NUMBER:MINUTE}:%{NUMBER:SECOND}: (?: sftp connected to %{IPORHOST:remoteip} %{GREEDYDATA:msg})' }
    match => { "message" => '(?: %{GREEDYDATA:pullpathinfo}):%{NUMBER:thread}::%{NUMBER:date}-%{NUMBER:HOUR}:%{NUMBER:MINUTE}:%{NUMBER:SECOND}: (?: got \<%{GREEDYDATA:filename}> %{GREEDYDATA:rate_size})' }
  }
}
To develop grok patterns, I suggest you use the Grok Debugger. It allows you to build up patterns incrementally.
For the following log (one of the log lines in your question):
msg.repository.routed.ABC_MAS:101::20170526-00:04:37: sftp connected to 1.2.2.1(msg), timeOut:1000
the following grok pattern will work:
%{USERNAME:pullpathinfo}:%{NUMBER:thread}::%{NUMBER:date}-%{TIME:time}: sftp connected to %{IPORHOST:remoteip}%{GREEDYDATA:msg}
The following changes are relevant:
The grok pattern has to be exact about every character, including every space between patterns (%{WORD} %{WORD} is not the same as %{WORD}%{WORD}). In your pattern there was an extra space between %{IPORHOST:remoteip} and %{GREEDYDATA:msg}.
%{USERNAME} instead of %{GREEDYDATA}: GREEDYDATA should only be used for the remaining part of a log line. Even though the name USERNAME does not fit, its underlying pattern [a-zA-Z0-9._-]+ is a good match for the prefix (it matches letters, digits, dots, underscores and hyphens, but not the colon).
%{TIME} instead of %{NUMBER:HOUR}:%{NUMBER:MINUTE}:%{NUMBER:SECOND}.
I hope this helps.
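For completeness, a sketch of a filter covering both line types; the field names t_rate and f_size are made up for this example, and the second pattern assumes the "got from remote host" lines always follow the format shown in the question:
filter {
  grok {
    match => { "message" => [
      "%{USERNAME:pullpathinfo}:%{NUMBER:thread}::%{NUMBER:date}-%{TIME:time}: sftp connected to %{IPORHOST:remoteip}%{GREEDYDATA:msg}",
      "%{USERNAME:pullpathinfo}:%{NUMBER:thread}::%{NUMBER:date}-%{TIME:time}: got from remote host at t-rate <%{NUMBER:t_rate}> and f-size <%{NUMBER:f_size}> into tmp dir"
    ] }
  }
}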

Is there any way to grok parse URIPATHPARAM when the URL contains invalid characters

Quick background: we are using access logging from HAProxy and parsing it with grok. HAProxy's %{+Q}r log variable prints "<http verb> <uri> <HTTP version>", which we are parsing using:
"%{WORD:method} %{URIPATHPARAM:url} HTTP/%{NUMBER:httpversion}"
This works fine for most requests, but when we are hit by scanners attempting injection attacks and the like by sending junk in the URL, grok fails to parse the URI. Here are some examples that break this grok filter:
"GET /index.html?14068'#22><bla> HTTP/1.1"
"GET /index.html?fName=\Windows\system.ini%00&lName=&guestEmail= HTTP/1.1"
Can anyone think of a solution that would preferably parse even invalid URIs or at least not crash, i.e. parse as much of the URL as possible and discard junk?
Yes, by using the multiple match ability of grok.
https://groups.google.com/forum/#!topic/logstash-users/H3_3gnWY2Go
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-match
When combined with break_on_match => true (the default), you can specify multiple patterns for grok to try and it will stop after it finds a matching pattern and applies it.
Here, if the first pattern doesn't match, grok tries the next one, which uses NOTSPACE to consume the bad characters and captures the field as bad_url instead of url:
filter {
  grok {
    match => {
      "message" => [
        "%{WORD:method} %{URIPATHPARAM:url} HTTP/%{NUMBER:httpversion}",
        "%{WORD:method} %{NOTSPACE:bad_url} HTTP/%{NUMBER:httpversion}"
      ]
    }
    break_on_match => true
  }
}
