Negative regexp in logstash configuration - logstash

I cannot get negative regexp expressions working within Logstash (as described in the docs).
Consider the following positive regex, which works correctly to detect fields that have been assigned a value:
if [remote_ip] =~ /(.+)/ {
  mutate { add_tag => ["ip"] }
}
However, the negative expression seems to return false even when the field is blank:
if [remote_ip] !~ /(.+)/ {
  mutate { add_tag => ["no_ip"] }
}
Am I misunderstanding the usage?
Update - this was fuzzy thinking on my part. There were issues with my config file. If the rest of your config file is sane, the above should work.

This was fuzzy thinking on my part - there were issues with the rest of my config file.
Based on Ben Lim's example, I came up with an input that is easier to test:
input {
  stdin { }
}
filter {
  if [message] !~ /(.+)/ {
    mutate { add_tag => ["blank_message"] }
  }
  if [noexist] !~ /(.+)/ {
    mutate { add_tag => ["tag_does_not_exist"] }
  }
}
output {
  stdout { debug => true }
}
The output for a blank message is:
{
  "message" => "",
  "@version" => "1",
  "@timestamp" => "2014-02-27T01:33:19.285Z",
  "host" => "benchmark.example.com",
  "tags" => [
    [0] "blank_message",
    [1] "tag_does_not_exist"
  ]
}
The output for a message with the content "test message" is:
test message
{
  "message" => "test message",
  "@version" => "1",
  "@timestamp" => "2014-02-27T01:33:25.059Z",
  "host" => "benchmark.example.com",
  "tags" => [
    [0] "tag_does_not_exist"
  ]
}
Thus, the negative match !~ /(.+)/ returns true only when the field is empty or the field does not exist.
The negative match !~ /(.*)/ returns true only when the field does not exist; if the field exists (whether empty or with a value), it returns false.
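For reference, the same distinction can be made without a regexp at all. A minimal sketch (the tag names here are just illustrative):
if ![remote_ip] {
  mutate { add_tag => ["no_ip_field"] }      # field does not exist (or is false-y)
} else if [remote_ip] == "" {
  mutate { add_tag => ["empty_ip_field"] }   # field exists but is blank
} else {
  mutate { add_tag => ["ip"] }               # field exists and has a value
}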

Below is my configuration. The type field does not exist, so the negative expression returns true.
input {
  stdin {
  }
}
filter {
  if [type] !~ /(.+)/ {
    mutate { add_tag => ["aa"] }
  }
}
output {
  stdout { debug => true }
}
The regexp /(.+)/ matches any value that contains at least one character. So when the "type" field exists and has a value, it satisfies the regexp. Therefore, in your example, if the remote_ip field exists and is non-empty, your "negative expression" will always return false.

Related

Logstash - change value of field in cloned document (logstash-clone filter plugin)

Logstash 7.8.1
I'm trying to create two documents from one input with logstash - different templates, different output indexes. Everything worked fine until I tried to change a value only on the cloned doc.
I need to have one field in both documents with different values - is this possible with the clone filter plugin?
Doc A - [test][event]- trn
Doc B (cloned doc) - [test][event]- spn
I thought it would work if I used remove_field and then add_field in the clone plugin, but I'm afraid there was a problem with ordering - maybe the remove_field method is called after add_field (the field was only removed, not added back with the new value).
Next I tried to add the value to the cloned document first and then to the original, but that always produced an array with both values (original and cloned), and I need only one value in that field :/.
Can someone help me please?
Config:
input {
  file {
    path => "/opt/test.log"
    start_position => beginning
  }
}
filter {
  grok {
    match => { "message" => "... grok...." }
  }
  mutate {
    add_field => { "[test][event]" => "trn" }
  }
  clone {
    clones => ["cloned"]
    #remove_field => [ "[test][event]" ]         # remove the field completely
    add_field => { "[test][event]" => "spn" }    # not added
    add_tag => [ "spn" ]
  }
}
output {
  if "spn" in [tags] {
    elasticsearch {
      index => "spn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ1"
    }
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      index => "trn-%{+yyyy.MM}"
      hosts => ["localhost:9200"]
      template_name => "templ2"
    }
    stdout { codec => rubydebug }
  }
}
If you want to make the field that is added conditional on whether the event is the clone or the original, then check the [type] field.
clone { clones => ["cloned"] }
if [type] == "cloned" {
  mutate { add_field => { "foo" => "spn" } }
} else {
  mutate { add_field => { "foo" => "trn" } }
}
add_field is always done before remove_field.
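Applied to the configuration from the question, the filter section might look roughly like this (an untested sketch; the grok pattern is elided as in the question, and the spn tag is added in the same branch so the existing tag-based output routing keeps working):
filter {
  grok {
    match => { "message" => "... grok...." }
  }
  clone { clones => ["cloned"] }
  if [type] == "cloned" {
    mutate {
      add_field => { "[test][event]" => "spn" }   # value for the clone only
      add_tag   => [ "spn" ]                      # keeps the output conditional working
    }
  } else {
    mutate { add_field => { "[test][event]" => "trn" } }   # value for the original
  }
}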

get name of pattern that matched in grok in logstash

If I have a patterns file with a bunch of regex patterns such as the following
A .*foo.*
B .*bar.*
C .*baz.*
and my grok filter looks like the following:
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => [ "%{A}", "%{B}", "%{C}" ] }
}
Is there any way to know which one matched, i.e. the name of the SYNTAX? I would like to annotate the document with the name of the pattern that matched.
What you would usually do is name the matched variables. The syntax for that would be (taking your example):
grok {
  patterns_dir => ["/location/of/patterns"]
  match => {
    "request" => [ "%{A:A}", "%{B:NameOfB}", "%{C:SomeOtherName}" ]
  }
}
Accordingly, the matches of your grok would now be named:
A: A
B: NameOfB
C: SomeOtherName
So in your case you could just name them after the patterns. That should work just fine.
Alternatively (I just tested this with the grok debugger), it appears that if you do not name your matched patterns, they default to the name of the pattern (which I think is what you want). The downside of this is that if you reuse a pattern, the result will be an array of values.
This is the test I ran:
Input:
Caused by: com.my.application.IOException: null Caused by: com.my.application.IOException: null asd asd
grok:
(.*?)Caused by:%{GREEDYDATA}:%{GREEDYDATA}
Output:
{
  "GREEDYDATA": [
    [
      " com.my.application.IOException: null Caused by: com.my.application.IOException",
      " null asd asd"
    ]
  ]
}
Hope that solves your problems,
Artur
EDIT:
Based on the OP's other question, here is my approach to solving that issue dynamically.
You will still have to name the matches. Decide on a common prefix for the names. I will base my example on 2 json strings to make this easier:
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
Note how there are 2 artificial matches, prefix_patterna and prefix_patternb. So, I decided on the prefix "prefix" and I use that to identify which event fields to inspect. (You could also use grok to drop empty matches if that is something you want.)
Then in my filter, I use ruby to iterate through all of the event's fields to find the one that matched my pattern:
ruby {
  code => "
    toAdd = nil;
    # Look at every field on the event and remember the prefixed
    # field that has a non-empty value.
    event.to_hash.each { |k,v|
      if k.start_with?('prefix_') && v.to_s != ''
        toAdd = k
      end
    }
    # Record the name of that field in a new field called 'test'.
    if toAdd.to_s != ''
      event['test'] = toAdd
    end
  "
}
All this code does is check the event's keys for the prefix and see whether the value of that field is empty or nil. If it finds a prefixed field that has a value, it writes that field's name into a new event field called "test".
Here are my tests:
Settings: Default pipeline workers: 8
Pipeline main started
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
{
  "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"sd\", \"prefix_patternb\" : \"\"}",
  "@version" => "1",
  "@timestamp" => "2016-09-15T09:48:29.418Z",
  "host" => "pandaadb",
  "a" => "b",
  "prefix_patterna" => "sd",
  "prefix_patternb" => "",
  "test" => "prefix_patterna"
}
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{
  "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"\", \"prefix_patternb\" : \"bla\"}",
  "@version" => "1",
  "@timestamp" => "2016-09-15T09:48:36.359Z",
  "host" => "pandaadb",
  "a" => "b",
  "prefix_patterna" => "",
  "prefix_patternb" => "bla",
  "test" => "prefix_patternb"
}
Note how the first test writes "prefix_patterna" while the second test writes "prefix_patternb".
I hope this solves your issue,
Artur
You can tag the match (or add fields) by having multiple grok filters, as follows.
It doesn't feel elegant and is not very scalable, since it is prone to a lot of repetition (not DRY), but it seems to be the only way to "flag" matches of complex patterns - especially predefined library patterns.
Note that you have to add conditionals to the subsequent filters so they are not also run when a previous filter has already matched. Otherwise you'll still get _grokparsefailure tags from the later filters.
You also need to remove the failure tags from all but the final "else" filter. Otherwise you will get spurious _grokparsefailure tags, e.g. from A when B or C matches.
grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => "%{A}" }
  add_tag => [ "pattern_A" ]
  add_field => { "pattern" => "A" }   # another option
  tag_on_failure => [ ]               # prevent false failure tags
}
if ("pattern_A" not in [tags]) {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{B}" }
    add_tag => [ "pattern_B" ]
    tag_on_failure => [ ]             # prevent false failure tags
  }
}
if ("pattern_A" not in [tags] and "pattern_B" not in [tags]) {
  grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{C}" }
    add_tag => [ "pattern_C" ]
  }
}
There may be ways to simplify / tune this, but I'm not an expert (yet!).
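Downstream you can then branch on those tags (or on the "pattern" field), for example in the output section. A sketch with hypothetical index names:
output {
  if "pattern_A" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "requests-a-%{+YYYY.MM.dd}"      # hypothetical index for A matches
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "requests-other-%{+YYYY.MM.dd}"  # everything else
    }
  }
}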

Retrieving RESTful GET parameters in logstash

I am trying to get logstash to parse key-value pairs in an HTTP get request from my ELB log files.
The request field looks like:
http://aaa.bbb/get?a=1&b=2
I'd like there to be a field for a and b in the log line above, and I am having trouble figuring it out.
My logstash conf (formatted for clarity) is below; it does not load any additional key fields. I assume that I need to split off the address portion of the URI, but I have not figured that out.
input {
  file {
    path => "/home/ubuntu/logs/**/*.log"
    type => "elb"
    start_position => "beginning"
    sincedb_path => "log_sincedb"
  }
}
filter {
  if [type] == "elb" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:timestamp}
        %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int}
        %{IP:backend_ip}:%{NUMBER:backend_port:int}
        %{NUMBER:request_processing_time:float}
        %{NUMBER:backend_processing_time:float}
        %{NUMBER:response_processing_time:float}
        %{NUMBER:elb_status_code:int}
        %{NUMBER:backend_status_code:int}
        %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int}
        %{QS:request}" ]
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    kv {
      field_split => "&?"
      source => "request"
      exclude_keys => ["callback"]
    }
  }
}
output {
  elasticsearch { host => localhost }
}
kv will take a URL and split out the params. This config works:
input {
  stdin { }
}
filter {
  mutate {
    add_field => { "request" => "http://aaa.bbb/get?a=1&b=2" }
  }
  kv {
    field_split => "&?"
    source => "request"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
stdout shows:
{
  "request" => "http://aaa.bbb/get?a=1&b=2",
  "a" => "1",
  "b" => "2"
}
That said, I would encourage you to create your own versions of the default URI patterns so that they set fields. You can then pass the querystring field off to kv. It's cleaner that way.
UPDATE:
By "make your own patterns", I meant taking the existing ones and modifying them as needed. In logstash 1.4, installing them was as easy as putting them in a new file in the 'patterns' directory; I don't know about patterns for >1.4 yet.
MY_URIPATHPARAM %{URIPATH}(?:%{URIPARAM:myuriparams})?
MY_URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{MY_URIPATHPARAM})?
Then you could use MY_URI in your grok{} pattern and it would create a field called myuriparams that you could feed to kv{}.
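For illustration, a rough (untested) sketch of how those custom patterns might be wired up, assuming they have been dropped into a file in your patterns directory:
filter {
  grok {
    patterns_dir => ["/opt/logstash/patterns"]   # hypothetical location of the custom patterns file
    match => [ "request", "%{MY_URI:request_uri}" ]
  }
  kv {
    source => "myuriparams"    # e.g. "?a=1&b=2", captured by MY_URIPATHPARAM
    field_split => "&?"
  }
}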

logstash : how to extract data from log4j message?

I am trying to extract data from my log4j messages with logstash.
The message looks like this:
Method findAll - Start by : bokc
I would like to extract the method name "findAll" and the user "bokc".
How can I do this?
I use logstash 1.5.2 and my config is:
input {
  log4j {
    mode => "server"
    type => "log4j-artemis"
    port => 4560
  }
}
filter {
  multiline {
    type => "log4j-artemis"
    pattern => "^\\s"
    what => "previous"
  }
  mutate {
    add_field => [ "source_ip", "%{host}" ]
  }
}
Use a grok filter:
filter {
  grok {
    match => [
      "message",
      "^Method %{WORD:method} - Start by : %{USER:user}"
    ]
    tag_on_failure => []
  }
}
This extracts the two words into the fields "method" and "user". The setting of tag_on_failure makes sure that non-matching messages aren't tagged with _grokparsefailure. Since most messages aren't supposed to match the pattern it doesn't make sense to mark them as failures.
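For the sample line from the question, the event would simply gain two new fields, roughly:
"method" => "findAll",
"user" => "bokc"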

logstash output not showing the desired timestamp

I am trying to get the desired timestamp format in the logstash output. I can't get it if I use this format in syslog.
Please share your thoughts about converting to the other format that's in the _source field, like the Yyyy-mm-ddThh:mm:ss.sssZ format?
filter {
  grok {
    match => [ "logdate", "Yyyy-mm-ddThh:mm:ss.sssZ" ]
    overwrite => ["host", "message"]
  }
_source: {
message: "activity_log: {"created_at":1421114642210,"actor_ip":"192.168.1.1","note":"From system","user":"4561c9d7aaa9705a25f66d","user_id":null,"actor":"4561c9d7aaa9705a25f66d","actor_id":null,"org_id":null,"action":"user.failed_login","data":{"transaction_id":"d6768c473e366594","name":"user.failed_login","timing":{"start":1422127860691,"end":14288720480691,"duration":0.00257},"actor_locatio
I am using this code in the syslog file:
filter {
  if [message] =~ /^activity_log: / {
    grok {
      match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
    }
    json {
      source => "json_message"
      remove_field => "json_message"
    }
    date {
      match => ["created_at", "UNIX_MS"]
    }
    mutate {
      rename => ["[json][repo]", "repo"]
      remove_field => "json"
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
thanks
"message" => "<134>feb 1 20:06:12 {\"created_at\":1422765535789, pid=5450 tid=28643 version=b0b45ac proto=http ip=192.168.1.1 duration_ms=0.165809 fs_sent=0 fs_recv=0 client_recv=386 client_sent=0 log_level=INFO msg=\"http op done: (401)\" code=401" }
"#version" => "1",
"#timestamp" => "2015-02-01T20:06:12.726Z",
"type" => "activity_log",
"host" => "192.168.1.1"
The pattern in your grok filter doesn't make sense. You're using a Joda-Time pattern (normally used for the date filter) and not a grok pattern.
It seems your message field contains a JSON object. That's good, because it makes it easy to parse. Extract the part that comes after "activity_log: " to a temporary json_message field,
grok {
  match => ["message", "^activity_log: %{GREEDYDATA:json_message}"]
}
and parse that field as JSON with the json filter (removing the temporary field if the operation was successful):
json {
  source => "json_message"
  remove_field => ["json_message"]
}
Now you should have the fields from the original message field at the top level of your message, including the created_at field with the timestamp you want to extract. That number is the number of milliseconds since the epoch, so you can use the UNIX_MS pattern in a date filter to extract it into @timestamp:
date {
  match => ["created_at", "UNIX_MS"]
}
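For example, the created_at value 1421114642210 from the sample message is 1421114642.210 seconds after the epoch, so the date filter would set @timestamp to 2015-01-13T02:04:02.210Z.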
