I have file containing series of such messages:
component+branch.job 2014-09-04_21:24:46 2014-09-04_21:24:49
It is string, some white spaces, first date and time, some white spaces and second date and time. Currently I'm using such filter:
grok {
match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{DATA:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
}
mutate {
add_field => {"tmp_start_timestamp" => "20%{dateStart}_%{timeStart}"}
add_field => {"tmp_stop_timestamp" => "20%{dateStop}_%{timeStop}"}
}
date {
match => [ "tmp_start_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
add_tag => [ "jobStarted" ]
}
date {
match => [ "tmp_stop_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
target => "stop_timestamp"
remove_field => ["tmp_stop_timestamp", "tmp_start_timestamp", "dateStart", "timeStart", "dateStop", "timeStop"]
add_tag => [ "jobStopped" ]
}
elapsed {
start_tag => "jobStarted"
end_tag => "jobStopped"
unique_id_field => "message"
}
As result I receive "#timestamp" and "stop_timestamp" fields with date time data and two tags, without elapsed time calculation. What I'm missing?
UPDATE
I tried with splitting (as #Rumbles suggested) event on two separate events, but somehow logstash creates two the same events:
input {
stdin { type => "time" }
}
filter {
grok {
match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{DATA:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
}
mutate {
add_field => {"tmp_start_timestamp" => "20%{dateStart}_%{timeStart}"}
add_field => {"tmp_stop_timestamp" => "20%{dateStop}_%{timeStop}"}
update => [ "type", "start" ]
}
clone {
clones => ["stop"]
}
if [type] == "start" {
date {
match => [ "tmp_start_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
target => ["start_timestamp"]
add_tag => [ "jobStarted" ]
}
}
if [type] == "stop" {
date {
match => [ "tmp_stop_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
target => "stop_timestamp"
remove_field => ["tmp_stop_timestamp", "tmp_start_timestamp", "dateStart", "timeStart", "dateStop", "timeStop"]
add_tag => [ "jobStopped" ]
}
}
elapsed {
start_tag => "jobStarted"
end_tag => "jobStopped"
unique_id_field => "message"
timeout => 15
}
}
output {
stdout { codec => rubydebug }
}
I've never used this filter, however I have just had a quick read of the documentation, and I think I understand the issue you are having.
From your description I believe you are trying to run the elapsed filter on one event, from the documentation it would appear that the filter is expecting 2 events, one with the starting time the second with the ending time, with a common id helping the filter to identify when the 2 events match up:
The events managed by this filter must have some particular properties. The event describing the start of the task (the “start event”) must contain a tag equal to ‘start_tag’. On the other side, the event describing the end of the task (the “end event”) must contain a tag equal to ‘end_tag’. Both these two kinds of event need to own an ID field which identify uniquely that particular task. The name of this field is stored in ‘unique_id_field’.
Each message is considered an event, so you would need to split your messages in to two events and have each pair of events have a unique identifier to help the filter to link them back together. It's not exactly a tidy solution (split your event in to two events, and then reconnect them again later) there may be a better solution to this that I am not aware of.
Related
I inherited a logstash config as follows. I do not want to do major changes in this because I do not want to break anything that is working. The metrics are sent as logs with json in format - "metric": "metricname", "value": "int". This has been working great. However, there is a requirement to have a string in value for a new metric. It is not really a metric but to indicate the state of the processing in string. Based on the following filter, it converts everything to integer and any string in value will be converted to 0. The requirement is that if the value is a string, it shouldn't attempt convert. Thank you!
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:ts} - M_%{DATA:task}_%{NUMBER:thread} - INFO - %{GREEDYDATA:jmetric}"}
remove_field => [ "message", "ecs", "original", "agent", "log", "host", "path" ]
break_on_match => false
}
if "_grokparsefailure" in [tags] {
drop {}
}
date {
match => ["ts", "ISO8601"]
target => "#timestamp"
}
json {
source => "jmetric"
remove_field => "jmetric"
}
split {
field => "points"
add_field => {
"metric" => "%{[points][metric]}"
"value" => "%{[points][value]}"
}
remove_field => [ "points", "event", "tags", "ts", "stream", "input" ]
}
mutate {
convert => { "value" => "integer" }
convert => { "thread" => "integer" }
}
}
You should use index mappings for this mainly.
Even if you handle things in logstash, elasticsearch will - if configured with the defaults - do dynamic mapping, which may work against any configuration you do in logstash.
See Elasticsearch index templates
An index template is a way to tell Elasticsearch how to configure an index when it is created.
...
Index templates can contain a collection of component templates, as well as directly specify settings, mappings, and aliases.
Mappings are pr index! This means that when you apply new mapping, you will have to create a new index. You can "rollover" to a new index, or delete / import your data again. What you do depends on your data, how you receive it, etc. ymmv...
No matter what, if your index has the wrong mapping you will need to create a new index to get the new mapping.
PS! If you have a lot of legacy data take a look at the reindex API for elasticsearch.
Logstash 7.8.1
I'm trying to create two documents from one input with logstash. Different templates, different output indexes. Everything worked fine until I tried to change value only on the cloned doc.
I need to have one field in both documents with different values - is it possible with clone filter plugin?
Doc A - [test][event]- trn
Doc B (cloned doc) - [test][event]- spn
I thought that it will work if I use remove_field and next add_field in clone plugin, but I'm afraid that there was problem with sorting - maybe remove_field method is called after add_field (the field was only removed, but not added with new value).
Next I tried to add value to cloned document first and than to original, but it always made an array with both values (orig and cloned) and I need to have only one value in that field:/.
Can someone help me please?
Config:
input {
file {
path => "/opt/test.log"
start_position => beginning
}
}
filter {
grok {
match => {"message" => "... grok...."
}
}
mutate {
add_field => {"[test][event]" => "trn"}
}
clone {
clones => ["cloned"]
#remove_field => [ "[test][event]" ] #remove the field completely
add_field => {"[test][event]" => "spn"} #not added
add_tag => [ "spn" ]
}
}
output {
if "spn" in [tags] {
elasticsearch {
index => "spn-%{+yyyy.MM}"
hosts => ["localhost:9200"]
template_name => "templ1"
}
stdout { codec => rubydebug }
} else {
elasticsearch {
index => "trn-%{+yyyy.MM}"
hosts => ["localhost:9200"]
template_name => "templ2"
}
stdout { codec => rubydebug }
}
}
If you want to make the field that is added conditional on whether the event is the clone or the original then check the [type] field.
clone { clones => ["cloned"] }
if [type] == "cloned" {
mutate { add_field => { "foo" => "spn" } }
} else {
mutate { add_field => { "foo" => "trn" } }
}
add_field is always done before remove_field.
I try to extract data from my log4j message with logstash.
The message look like this :
Method findAll - Start by : bokc
I would like to extract the method name : "findAll" and the user "bokc".
How can I do this?
I use logstash 1.5.2 and my config is :
input {
log4j {
mode => "server"
type => "log4j-artemis"
port => 4560
}
}
filter {
multiline {
type => "log4j-artemis"
pattern => "^\\s"
what => "previous"
}
mutate {
add_field => [ "source_ip", "%{host}" ]
}
}
Use a grok filter:
filter {
grok {
match => [
"message",
"^Method %{WORD:method} - Start by : %{USER:user}"
]
tag_on_failure => []
}
}
This extracts the two words into the fields "method" and "user". The setting of tag_on_failure makes sure that non-matching messages aren't tagged with _grokparsefailure. Since most messages aren't supposed to match the pattern it doesn't make sense to mark them as failures.
I have file containing series of such messages:
component+branch.job 2014-09-04_21:24:46 2014-09-04_21:24:49
It is string, some white spaces, first date and time, some white spaces and second date and time. Currently I'm using such filter:
filter {
grok {
match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{WORD:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
}
}
I would like to convert dateStart and timeStart to #timestamp for that message.
I found that there is date filter but I don't know how to use it on two separate fields.
I have also tried something like this as filter:
date {
match => [ "message", "YYYY-MM-dd_HH:mm:ss" ]
}
but it didn't worked as expected.
Based on duplicate suggested by Magnus Bäck, I created solution for my problem. Solution was to mutate parsed data into one field:
mutate {
add_field => {"tmp_start_timestamp" => "20%{dateStart}_%{timeStart}"}
}
and then parse it as I suggested in my question.
So final solution looks like this:
filter {
grok {
match => [ "message", "%{WORD:componentName}\+%{WORD:branchName}\.%{DATA:jobType}\s+20%{DATE:dateStart}_%{TIME:timeStart}\s+20%{DATE:dateStop}_%{TIME:timeStop}" ]
}
mutate {
add_field => {"tmp_start_timestamp" => "20%{dateStart}_%{timeStart}"}
}
date {
match => [ "tmp_start_timestamp", "YYYY-MM-dd_HH:mm:ss" ]
}
}
I cannot get negative regexp expressions working within LogStash (as described in the docs)
Consider the following positive regex which works correctly to detect fields that have been assigned a value:
if [remote_ip] =~ /(.+)/ {
mutate { add_tag => ["ip"] }
}
However, the negative expression seems to return false even when the field is blank:
if [remote_ip] !~ /(.+)/ {
mutate { add_tag => ["no_ip"] }
}
Am I misunderstanding the usage?
Update - this was fuzzy thinking on my part. There were issues with my config file. If the rest of your config file is sane, the above should work.
This was fuzzy thinking on my part - there were issues with the rest of my config file.
Based on Ben Lim's example, I came up with an input that is easier to test:
input {
stdin { }
}
filter {
if [message] !~ /(.+)/ {
mutate { add_tag => ["blank_message"] }
}
if [noexist] !~ /(.+)/ {
mutate { add_tag => ["tag_does_not_exist"] }
}
}
output {
stdout {debug => true}
}
The output for a blank message is:
{
"message" => "",
"#version" => "1",
"#timestamp" => "2014-02-27T01:33:19.285Z",
"host" => "benchmark.example.com",
"tags" => [
[0] "blank_message",
[1] "tag_does_not_exist"
]
}
The output for a message with the content "test message" is:
test message
{
"message" => "test message",
"#version" => "1",
"#timestamp" => "2014-02-27T01:33:25.059Z",
"host" => "benchmark.example.com",
"tags" => [
[0] "tag_does_not_exist"
]
}
Thus, the "negative regex" /(.+)/ returns true only when the field is empty or the field does not exist.
The negative regex /(.*)/ will only return true when the field does not exist. If the field exists (whether empty or with values), the return value will be false.
Below is my configuration. The type field is not exist, therefore, the negative expression is return true.
input {
stdin {
}
}
filter {
if [type] !~ /(.+)/ {
mutate { add_tag => ["aa"] }
}
}
output {
stdout {debug => true}
}
The regexp /(.+)/ means it accepts everything, include blank. So, when the "type" field is exist, even the field value is blank, it also meet the regexp. Therefore, in your example, if the remote_ip field exist, your "negative expression" will always return false.