Grep and replace a string in a particular pattern

I want to add/replace a string in a file within a particular pattern. Please refer to the example below:
"dont_search_this" => {
-tag => "qwerty",
-abc_asd => [ "q/rg/dfg.txt",],
-dependent_fcv => ["me_lib", "you_lib",],
-vlog_opts => (($ENV{ABC_PROJECT}) eq "xuv")
? [ "-error=AMR", "-error=GHJ", "-error=TYU", "-error=IJK", ]
: [] ,
},
"search_this" => {
-tag => "qwerty",
-abc_asd => [ "q/rg/dfg.txt",],
-dependent_fcv => ["me_lib", "you_lib",],
-vlog_opts => (($ENV{ABC_PROJECT}) eq "xuv")
? [ "-error=AMR", "-error=GHJ", "-error=TYU", "-error=IJK", ]
:[],
},
In the above data, I want to add the string "-error=all" to the -vlog_opts line in the search_this block only. The modified data should look as follows:
"dont_search_this" => {
-tag => "qwerty",
-abc_asd => [ "q/rg/dfg.txt",],
-dependent_fcv => ["me_lib", "you_lib",],
-vlog_opts => (($ENV{ABC_PROJECT}) eq "xuv")
? [ "-error=AMR", "-error=GHJ", "-error=TYU", "-error=IJK", ]
:[],
},
"search_this" => {
-tag => "qwerty",
-abc_asd => [ "q/rg/dfg.txt",],
-dependent_fcv => ["me_lib", "you_lib",],
-vlog_opts => (($ENV{ABC_PROJECT}) eq "xuv")
? [ "-error=AMR", "-error=GHJ", "-error=TYU", "-error=IJK", "-error=all" ]
:[],
},
Please help me with this.
Using Perl is also fine.
Thank you very much!

I can't help but think that there's got to be a better way than editing the source code...?
Read the whole script file into a string and then follow the trail to identify the place to change:
perl -0777 -wpe'
s/"search_this"\s+=>\s+\{.*?\-vlog_opts\s+=>\s+[^\]]+\K/ADD_THIS/s;
' file
(broken over lines for readability)
Notes
the -0777 switch unsets the input record separator, so the file is "slurped" whole as one "line"
the /s modifier makes it so that . matches the newline as well
the \K makes it so that everything matched up to that point is kept but excluded from the text being replaced, so it doesn't have to be (captured and) re-entered in the replacement part. So we literally add ADD_THIS at that position (see the concrete invocation after these notes)
Good information about \K is under "Lookaround Assertions" in Extended Patterns in perlre but keep in mind that it subtly differs from other lookarounds
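With the actual string from the question, the invocation would look something like this (a sketch; -i.bak edits the file in place and keeps a .bak backup, and the space after "-error=all" in the replacement reproduces the spacing of the desired output):
perl -0777 -i.bak -wpe'
    s/"search_this"\s+=>\s+\{.*?\-vlog_opts\s+=>\s+[^\]]+\K/"-error=all" /s;
' file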

That looks like a Perl data structure.
Any reason why you can't just push "-error=all" onto $hash{search_this}{-vlog_opts}->@* ?
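A minimal sketch of that approach (assuming the structure is available in the running code as a hash; %config below is a cut-down stand-in, and postfix dereference ->@* needs Perl 5.24+):
use strict;
use warnings;
use Data::Dumper;

# Cut-down stand-in for the structure in the question; in real code it
# would come from wherever this configuration is actually loaded.
my %config = (
    search_this => {
        -tag       => "qwerty",
        -vlog_opts => [ "-error=AMR", "-error=GHJ", "-error=TYU", "-error=IJK" ],
    },
);

# -vlog_opts holds an array reference, so push onto it directly.
# On older perls use: push @{ $config{search_this}{-vlog_opts} }, "-error=all";
push $config{search_this}{-vlog_opts}->@*, "-error=all";

print Dumper $config{search_this}{-vlog_opts};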

Related

get name of pattern that matched in grok in logstash

If I have a patterns file with a bunch of regex patterns such as the following
A .*foo.*
B .*bar.*
C .*baz.*
and my grok filter looks like the following:
grok {
patterns_dir => ["/location/of/patterns"]
match => { "request" => [ "%{A}", "%{B}", "%{C}",]
}
}
Is there any way to know which one matched, i.e. the name of the SYNTAX? I would like to annotate the document with the name of the pattern that matched.
What you would usually do is name the matched variables. The syntax for that (taking your example) would be:
grok {
  patterns_dir => ["/location/of/patterns"]
  match => {
    "request" => [ "%{A:A}", "%{B:NameOfB}", "%{C:SomeOtherName}" ]
  }
}
Accordingly, the matches of your grok would now be named:
A: A
B: NameOfB
C: SomeOtherName
So in your case you could just name them after the patterns. That should work just fine.
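For instance (a sketch based on the config above), naming each match after its own pattern would look like:
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => [ "%{A:A}", "%{B:B}", "%{C:C}" ] }
}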
Alternatively (I just tested this with the grok debugger), it appears that if you do not name your matched patterns, they will default to the name of the pattern (which I think is what you want). The downside of this is that if you reuse a pattern, the result will be an array of values.
This is the test I ran:
Input:
Caused by: com.my.application.IOException: null Caused by: com.my.application.IOException: null asd asd
grok:
(.*?)Caused by:%{GREEDYDATA}:%{GREEDYDATA}
Output:
{
"GREEDYDATA": [
[
" com.my.application.IOException: null Caused by: com.my.application.IOException",
" null asd asd"
]
]
}
Hope that solves your problems,
Artur
EDIT:
Based on the OP's other question, here is my approach to solving that issue dynamically.
You will still have to name the matches. Decide on a common prefix for naming your matches. I will base my example on 2 JSON strings to make this easier:
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
Note how there are 2 artificial matches, prefix_patterna and prefix_patternb. So, I decided on the prefix "prefix" and I use that to identify which event fields to inspect. (You can also configure grok to drop empty fields if that is something you want.)
Then in my filter, I use ruby to iterate through the event's fields to find the one that matched my pattern:
ruby {
  code => "
    toAdd = nil;
    event.to_hash.each { |k,v|
      if k.start_with?('prefix_') && v.to_s != ''
        toAdd = k
      end
    }
    if toAdd.to_s != ''
      event['test'] = toAdd
    end
  "
}
All this code does is check the event's keys for the prefix and see whether the value of that field is empty or nil. If it finds a field that has a value, it writes the field's name into a new event field called "test".
Here are my tests:
Settings: Default pipeline workers: 8
Pipeline main started
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
{
"message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"sd\", \"prefix_patternb\" : \"\"}",
"#version" => "1",
"#timestamp" => "2016-09-15T09:48:29.418Z",
"host" => "pandaadb",
"a" => "b",
"prefix_patterna" => "sd",
"prefix_patternb" => "",
"test" => "prefix_patterna"
}
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{
"message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"\", \"prefix_patternb\" : \"bla\"}",
"#version" => "1",
"#timestamp" => "2016-09-15T09:48:36.359Z",
"host" => "pandaadb",
"a" => "b",
"prefix_patterna" => "",
"prefix_patternb" => "bla",
"test" => "prefix_patternb"
}
Note how the first test writes "prefix_patterna" while the second test writes "prefix_patternb".
I hope this solves your issue,
Artur
You can tag the match (or add fields) by having multiple grok filters, as follows.
It doesn't feel elegant and is not very scalable, as it is prone to a lot of repetition (not DRY), but it seems to be the only way to "flag" matches of complex patterns, especially predefined library patterns.
Note that you have to add conditionals to the subsequent filters to avoid them also being run when a previous filter has already matched; otherwise you'll still get _grokparsefailure tags for the later filters (source).
You also need to suppress the failure tags on all but the final "else" filter; otherwise you will get spurious _grokparsefailures, e.g. from A when B or C matches (source).
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => "%{A}" }
  add_tag => [ "pattern_A" ]
  add_field => { "pattern" => "A" } # another option
  tag_on_failure => [ ] # prevent false failure tags
}
if "pattern_A" not in [tags] {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{B}" }
    add_tag => [ "pattern_B" ]
    tag_on_failure => [ ] # prevent false failure tags
  }
}
if "pattern_A" not in [tags] and "pattern_B" not in [tags] {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{C}" }
    add_tag => [ "pattern_C" ]
  }
}
There may be ways to simplify / tune this, but I'm not an expert (yet!).

Logstash grok filter doesn't work for the last field

With Logstash 2.3.3, the grok filter doesn't work for the last field.
To reproduce the problem, create test.conf as follows:
input {
  file {
    path => "/Users/izeye/Applications/logstash-2.3.3/test.log"
  }
}
filter {
  grok {
    match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{DATA:id5}" }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Run ./bin/logstash -f test.conf
and after it has started, in another terminal run echo "1,2,3,4,5" >> test.log.
I got the following output:
Johnnyui-MacBook-Pro:logstash-2.3.3 izeye$ ./bin/logstash -f test.conf
Settings: Default pipeline workers: 8
Pipeline main started
{
"message" => "1,2,3,4,5",
"#version" => "1",
"#timestamp" => "2016-07-07T07:57:42.830Z",
"path" => "/Users/izeye/Applications/logstash-2.3.3/test.log",
"host" => "Johnnyui-MacBook-Pro.local",
"id1" => "1",
"id2" => "2",
"id3" => "3",
"id4" => "4"
}
You can see the missing id5.
I'm not sure whether this is a bug or a misconfiguration.
Any hint will be appreciated.
I think it is because of how the DATA pattern is defined. Its regex is .*?, so it's a lazy match.
It's not a bug; it's how the regex works.
But you might want to ask a regex question in order to get an accurate answer.
As a solution, you can replace the last DATA with NUMBER (or something appropriate to your situation). GREEDYDATA would also work.
Though, for this kind of input, the csv or dissect filters might be a better fit, as they are easier to configure and more performant.
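For instance, a minimal sketch of both suggestions (field names assumed to match the original config):
filter {
  # Option 1: make the last capture greedy so it consumes the rest of the line.
  grok {
    match => { "message" => "%{DATA:id1},%{DATA:id2},%{DATA:id3},%{DATA:id4},%{GREEDYDATA:id5}" }
  }
  # Option 2: dissect splits on literal delimiters, no regex needed.
  # dissect {
  #   mapping => { "message" => "%{id1},%{id2},%{id3},%{id4},%{id5}" }
  # }
}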

configuring the logstash-output-csv

I am pretty new to Logstash and I have been trying to convert an existing log into CSV format using the logstash-output-csv plugin.
My input log string looks as follows; it is a custom log written by our application:
'128.111.111.11/cpu0/log:5988:W/"00601654e51a15472-76":687358:<9>2015/08/18 21:06:56.05: comp/45 55% of memory in use: 2787115008 bytes (change of 0)'
I wrote a quick regex and added it to the patterns_dir used by the grok plugin.
My patterns are as follows:
IP_ADDRESS [0-9,.]+
CPU [0-9]
NSFW \S+
NUMBER [0-9]
DATE [0-9,/]+\s+[0-9]+[:]+[0-9]+[:]+[0-9,.]+
TIME \S+
COMPONENT_ID \S+
LOG_MESSAGE .+
Without adding any csv filters, I was able to get this output:
{
"message" => "128.111.111.11/cpu0/log:5988:W/"00601654e51a15472-76":687358:<9>2015/08/18 21:06:56.05: comp/45 55% of memory in use: 2787115008 bytes (change of 0)",
"#version" => "1",
"#timestamp" => "2015-08-18T21:06:56.05Z",
"host" => "hostname",
"path" => "/usr/phd/raveesh/sample.log_20150819000609",
"tags" => [
[0] "_grokparsefailure"
]
}
This is my configuration for getting the CSV as output:
input {
  file {
    path => "/usr/phd/raveesh/temporary.log_20150819000609"
    start_position => beginning
  }
}
filter {
  grok {
    patterns_dir => "./patterns"
    match => [ "message", "%{IP_ADDRESS:ipaddress}/%{CPU:cpu}/%{NSFW:nsfw}<%{NUMBER:number}>%{DATE}:%{SPACE:space}%{COMPONENT_ID:componentId}%{SPACE:space}%{LOG_MESSAGE:logmessage}" ]
    break_on_match => false
  }
  csv {
    add_field => { "ipaddress" => "%{ipaddress}" }
  }
}
output {
  # Print each event to stdout.
  csv {
    fields => ["ipaddress"]
    path => "./logs/firmwareEvents.log"
  }
  stdout {
    # Enabling 'rubydebug' codec on the stdout output will make logstash
    # pretty-print the entire event as something similar to a JSON representation.
    codec => rubydebug
  }
}
The above configuration does not seem to give that output. I am only trying to print the ipaddress to a CSV file for now, but eventually I need to print all the captured patterns to a CSV file, so I need the output as follows:
128.111.111.111,cpu0,nsfw, ....
Could you please let me know the changes I need to make?
Thanks in advance.
EDIT:
I fixed the regex as suggested using the tool http://grokconstructor.appspot.com/do/match#result
Now my regex filter looks as follows :
%{IP:client}\/%{WORD:cpu}\/%{NOTSPACE:nsfw}<%{NUMBER:number}>%{YEAR:year}\/%{MONTHNUM:month}\/%{MONTHDAY:day}%{SPACE:space}%{TIME:time}:%{SPACE:space2}%{NOTSPACE:comp}%{SPACE:space3}%{GREEDYDATA:messagetext}
How do I capture the individual fields and save them as a CSV?
Thanks
EDIT:
I finally resolved this using the file output plugin:
output {
  file {
    path => "./logs/sample.log"
    message_pattern => "%{client},%{number}"
  }
}
The csv filter in the filter section is for parsing the input and exploding the message into key/value pairs.
In your case you are already parsing the input with grok, so I bet you don't need the csv filter.
But in the output we can see there is a grok failure:
{
"message" => "128.111.111.11/cpu0/log:5988:W/"00601654e51a15472-76":687358:<9>2015/08/18 21:06:56.05: comp/45 55% of memory in use: 2787115008 bytes (change of 0)",
"#version" => "1",
"#timestamp" => "2015-08-18T21:06:56.05Z",
"host" => "hostname",
"path" => "/usr/phd/raveesh/sample.log_20150819000609",
"tags" => [
[0] "****_grokparsefailure****"
]
}
That means your grok expression cannot parse the input.
You should fix the expression according to your input and then the csv will output properly.
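For example, once the grok match succeeds, a csv output along these lines (a sketch; the field names assume the ones used in the edited pattern from the question) would write all the captured fields to one CSV line per event:
output {
  csv {
    # Field names below assume the names from the edited grok pattern
    # (client, cpu, nsfw, number, comp, messagetext).
    fields => ["client", "cpu", "nsfw", "number", "comp", "messagetext"]
    path => "./logs/firmwareEvents.csv"
  }
}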
Check out http://grokconstructor.appspot.com/do/match for some help.
BTW, are you sure the patterns NSFW, CPU, COMPONENT_ID, ... are defined somewhere?
HIH

Logstash - grok multiline

I tried using multiline with grok filters but it's not working properly.
My logs are:
H3|15:55:04:760|exception|not working properly
message:space exception
at line number 25
My conf file is
input {
  file {
    path => "logs/test.log"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}
filter {
  multiline {
    pattern => "^(\s|[A-Z][a-z]).*"
    what => "previous"
  }
  if [message] =~ /H\d+/ {
    grok {
      match => ["message", "(?m)%{USERNAME:level}\|%{TIME:timestamp}\|%{WORD:method}\|%{GREEDYDATA:error_Message}" ]
    }
  }
  else {
    grok {
      match => ["message", "(?m)%{GREEDYDATA:error_Message}" ]
    }
  }
}
output {elasticsearch { host => "localhost" protocol => "http" port => "9200" }}
I am able to process the first line of the log file, but the second line of the log file, where I would like to use multiline, is not working.
The output I would like to have is:
{
  "@timestamp" => "2014-06-19 00:00:00,000",
  "path" => "logs/test.log",
  "level" => "H3",
  "timestamp" => "15:55:04:760",
  "method" => "exception",
  "error_message" => "not working properly"
},
{
  "@timestamp" => "2014-06-19 00:00:00,000",
  "path" => "logs/test.log",
  "error_message" => "space exception at line 25"
}
Kindly help me to get the required output.
Your multiline config says, "if I find this pattern, keep it with the previous line".
Your pattern "^(\s|[A-Z][a-z]).*" says "either a space, or a capital letter followed by a lowercase letter, then followed by other stuff".
So, " foo" or "California" would match, but "H3" wouldn't.
I would suggest using a pattern that matches the start of your multiline expression, together with the 'negate' feature, so that all lines that don't match that pattern are joined to the previous line:
filter {
  multiline {
    pattern => "^[A-Z][0-9]\|"
    negate => true
    what => "previous"
  }
}
This would take the "H3|" line as the beginning, and join all other lines to it. Depending on the range of values at the beginning of the line, you may need to edit the regexp.

Groking and then mutating?

I am running the following filter in a logstash config file:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
    }
  }
}
It kind of works:
it does identify and carve out the variables "timestamp", "severity", "instance", "mymessage", and "reason".
Really what I wanted was for the text which is now %{mymessage} to become the %{message} field, but when I add any sort of mutate command to this grok it stops working (BTW, should there be a log that tells me what is breaking? I didn't see one... ironic for a logging solution not to have verbose logging).
Here's what I tried:
filter {
  if [type] == "logstash" {
    grok {
      match => {
        "message" => [
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:mymessage}, reason:%{GREEDYDATA:reason}",
          "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:mymessage}"
        ]
      }
      mutate => {
        replace => [ "message", "%{mymessage}" ]
        remove => [ "mymessage" ]
      }
    }
  }
}
So in summary I'd like to understand:
Are there log files I can look at to see why/where a failure is happening?
Why would my mutate commands illustrated above not work?
I also thought that if I never used the mymessage variable, but instead just referred to message as the variable, it might automatically truncate message to just the matched pattern; instead it appeared to append the results... what is the correct behaviour?
Using the overwrite option is the best solution, but I thought I'd address a couple of your questions directly anyway.
It depends on how Logstash is started. Normally you'd run it via an init script that passes the -l or --log option. /var/log/logstash would be typical.
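For example (a sketch; the exact paths depend on your installation), starting Logstash by hand with an explicit log file would look like:
bin/logstash -f /etc/logstash/conf.d/pipeline.conf -l /var/log/logstash/logstash.log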
mutate is a filter of its own, not a part of grok. You could have done it like this (or used rename instead of replace + remove; see the rename sketch after this block):
grok {
  ...
}
mutate {
  replace => [ "message", "%{mymessage}" ]
  remove => [ "mymessage" ]
}
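A rename-based version of the same idea (a sketch of the alternative mentioned above) would be:
mutate {
  rename => { "mymessage" => "message" }
}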
I'd do it a different way. For what you're trying to do, the overwrite option might be more apt.
Something like this:
grok {
  overwrite => "message"
  match => {
    "message" => [
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{DATA:message}, reason:%{GREEDYDATA:reason}",
      "\[%{DATA:timestamp}\]\[%{DATA:severity}\]\[%{DATA:instance}\]%{GREEDYDATA:message}"
    ]
  }
}
This'll replace 'message' with the 'grokked' bit.
I know that doesn't directly answer your question - about all I can say is that when you start logstash, it writes to STDOUT (at least on the version I'm using), which I'm capturing and writing to a file. In there, it reports some of the errors.
There's a -l option to logstash that lets you specify a log file to use - this will usually show you what's going on in the parser, but bear in mind that if something doesn't match a rule, it won't necessarily tell you why it didn't.
