I'm fairly new to Logstash filtering. I have the JSON string below:
{
"changed": false,
"msg": "Foo Facts: oma_phase: prd, oma_app: fsd, oma_apptype: obe, oma_componenttype: oltp, oma_componentname: -, oma_peak: pk99, oma_phaselevel: prd"
}
I would like to extract the fields oma_phase, oma_app, oma_apptype, oma_componenttype, oma_componentname, oma_peak & oma_phaselevel.
I've tried the native json filter below:
filter {
  if [type] == "ansible" {
    json {
      source => "ansible_result"
    }
  }
}
Here ansible_result is the key holding the JSON value above. However, many events carry different payloads under the same ansible_result key, and parsing them all creates a lot of index keys, which I don't want.
I would like some sort of filter that matches the substring Foo Facts and then extracts the oma_* fields.
I somehow couldn't manage to get a grok filter to match the substring. It would be really great if you could help me with this.
Many thanks in advance.
Please try the following code:
filter {
  json {
    source => "message"
  }
}
The ansible_result JSON will be treated as the message.
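If you only want to parse the events that actually mention Foo Facts (to avoid indexing fields from all the other ansible_result payloads), one option is to wrap the filter in a conditional. This is just a sketch; adjust the source field name if the serialized JSON lives in ansible_result rather than message:
filter {
  # Only parse events whose raw message contains the "Foo Facts" marker
  if [message] =~ /Foo Facts/ {
    json {
      source => "message"
    }
  }
}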
It was a little difficult in the beginning, but I eventually managed to crack the grok pattern.
\"msg\": \"Foo Facts: oma_phase: %{DATA:oma_phase}, oma_app: %{DATA:oma_app}, oma_apptype: %{DATA:oma_apptype},( oma_componenttype: %{DATA:oma_componenttype},)? oma_componentname: %{DATA:oma_componentname}, oma_peak: %{DATA:oma_peak}, oma_phaselevel: %{DATA:oma_phaselevel}\"
From the logs, I noticed that oma_componenttype is missing for some entries, so I marked it as an optional field with ( )?.
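For completeness, here is a sketch of where that pattern sits in the pipeline, guarded by the substring check the question asked for. The ansible_result source field is taken from the question, and the long pattern above is abbreviated with an ellipsis:
filter {
  if [ansible_result] =~ /Foo Facts/ {
    grok {
      # Replace the ellipsis with the full pattern shown above
      match => { "ansible_result" => "\"msg\": \"Foo Facts: oma_phase: %{DATA:oma_phase}, ...\"" }
    }
  }
}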
It wouldn't have been possible without the help of the online parsers below.
grokdebug
Grok Constructor
Related
So I need to write a filter that changes all the periods in field names to underscores. I am using mutate, and I can do some things but not others. For reference, my current output in Kibana shows fields named things like "packet.event-id" and so forth, and those are the ones I need to rename. Here is the filter I wrote, and I do not know why it doesn't work:
filter {
  json {
    source => "message"
  }
  mutate {
    add_field => { "pooooo" => "AW CMON" }
    rename => { "offset" => "my_offset" }
    rename => { "packet.event-id" => "my_packet_event_id" }
  }
}
The problem is that I CAN add a field, and the renaming of "offset" WORKS. But when I try to do the packet one, nothing changes. I feel like this should be simple, and I am very confused as to why only the one with a period in it doesn't work.
I have refreshed the index in Kibana, and still nothing changes. Anyone have a solution?
When fields show up in dotted notation in Kibana, it's because the document you originally loaded in JSON format has nested structure.
To access the document structure using logstash, you need to use [packet][event-id] in your rename filter instead of packet.event-id.
For example:
filter {
  mutate {
    rename => {
      "[packet][event-id]" => "my_packet_event_id"
    }
  }
}
You can do the JSON parsing directly in Filebeat by adding a few lines of config to your filebeat.yml.
filebeat.prospectors:
- paths:
    - /var/log/snort/snort.alert
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
You shouldn't need to rename the fields. If you do need to access a field in Logstash you can reference the field as [packet][length] for example. See Logstash field references for documentation on the syntax.
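As a quick illustration of that reference syntax (purely a sketch; the copied field name is made up), the nested field can be used both in a conditional and in a sprintf reference:
filter {
  if [packet][length] {
    mutate {
      # Copies the nested value into a hypothetical flat field
      add_field => { "packet_length" => "%{[packet][length]}" }
    }
  }
}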
And by the way, there is a de_dot filter for replacing dots in field names, but it shouldn't be needed in this case.
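For reference only, if you ever do need to flatten dotted names elsewhere, a minimal de_dot sketch (field name illustrative) looks like this:
filter {
  de_dot {
    # Replaces the dot, e.g. "packet.event-id" becomes "packet_event-id"
    fields => ["packet.event-id"]
  }
}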
I have a drupal watchdog syslog file that I want to parse into essentially two nested fields, the syslog part and the message part so that I get this result
syslogpart: {
  timestamp: "",
  host: "",
  ...
},
messagepart: {
  parsedfield1: "",
  parsedfield2: "",
  ...
}
I tried making a custom pattern that looks like this:
DRUPALSYSLOG (%{SYSLOGTIMESTAMP:date} %{SYSLOGHOST:logsource} %{WORD:program}: %{URL:domain}\|%{EPOCH:epoch}\|%{WORD:instigator}\|%{IP:ip}\|%{URL:referrer}\|%{URL:request}\|(?<user_id>\d+)\|\|)
and then run match => ['message', '%{DRUPALSYSLOG:drupal}'],
but I don't get a nested response. Instead I get a text block drupal: "ALL THE MATCHING FIELDS IN ONE STRING", plus all the matches separately as well, not nested under drupal but on the same level.
Actually, you can do something like that in your pattern config
%{WORD:[drupal][program]}
It will create a JSON object like:
drupal: {
  program: "..."
}
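A sketch of how that could look applied to the question's custom pattern (the nested field names here are illustrative, not a drop-in replacement):
grok {
  match => ["message", "%{SYSLOGTIMESTAMP:[syslogpart][timestamp]} %{SYSLOGHOST:[syslogpart][host]} %{WORD:[syslogpart][program]}: %{GREEDYDATA:[messagepart][raw]}"]
}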
Yes, this is expected. I don't think there's a way to produce nested fields with grok. I suspect you'll have to use the mutate filter to move them into place.
mutate {
  rename => {
    "date" => "[drupal][date]"
    "instigator" => "[drupal][instigator]"
    ...
  }
}
If you have a lot of fields it might be more convenient to use a ruby filter. This is especially true if you prefix the Drupal fields with e.g. "drupal.": then you can write a single filter that moves every field with that prefix into a subfield of the same name.
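As a sketch of that ruby approach, assuming a Logstash version with the event.get/event.set Ruby API and fields prefixed with "drupal.":
ruby {
  code => "
    # Move every field whose name starts with 'drupal.' under a nested [drupal] object
    event.to_hash.each do |name, value|
      if name.start_with?('drupal.')
        event.set('[drupal][' + name.sub('drupal.', '') + ']', value)
        event.remove(name)
      end
    end
  "
}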
Is it possible to automatically map fields for events I would receive by syslog, if they follow a format field1=value1 field2=value2 ... ? An example would be
name=john age=15
age=29 name=jane
name=mark car=porshe
(note that the fields are different and not always there)
One of the solutions I am considering is to send the syslog "message" part as JSON, but I am not sure if it is possible to automatically parse it (when the rest of the log is in syslog format). My current approach fails with _jsonparsefailure, but I will keep trying:
input {
  tcp {
    port => 5514
    type => "syslogandjson"
    codec => json
  }
}

filter {
  json {
    source => "message"
  }
}

output ...
Fields with a key=value format can be parsed with the kv filter, but it doesn't support fields with double-quoted values, i.e.
key1=value1 key2="value2 with spaces" key3=value3
or (even worse)
key1=value1 key2=value2 with spaces key3=value3
won't turn out well.
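For simple, unquoted pairs like your name=john age=15 example, a kv sketch is enough (the option values shown are the filter's defaults, spelled out for clarity):
filter {
  kv {
    source => "message"
    field_split => " "
    value_split => "="
  }
}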
Sending the message as JSON is way better, but as you've discovered you can't use the json codec, since a codec applies to the whole message (timestamp and all) and not just the message part where your serialized JSON string can be found. You're on the right track with the json filter, though. Just make sure that filter comes after the grok filter that parses the raw syslog message to extract the timestamp, severity, and so on. You'll want something like this:
filter {
  grok {
    match => [...]
    # Allow replacement of the original message field
    overwrite => ["message"]
  }
  date {
    ...
  }
  json {
    source => "message"
  }
}
Since presumably not all messages you pick up are JSON messages, you might want a conditional around the json filter. Alternatively, attempt JSON parsing of all messages and remove the _jsonparsefailure tag that the filter adds for messages it couldn't parse.
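Both variants might look roughly like this sketch (the leading-brace check is just one simple way to guess that a message is JSON):
filter {
  # Option 1: only run the json filter on messages that look like JSON
  if [message] =~ /^\s*\{/ {
    json {
      source => "message"
    }
  }

  # Option 2: parse everything, then drop the failure tag on non-JSON messages
  # json { source => "message" }
  # if "_jsonparsefailure" in [tags] {
  #   mutate { remove_tag => ["_jsonparsefailure"] }
  # }
}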
So I have now written several patterns for my logs, and they work. The thing is that I have multiple kinds of logs, with multiple patterns, in one single file. How does Logstash know which pattern to use for which line of the log? (I am using grok for my filtering.) And if you guys would be super kind, could you give me a link to the docs? I wasn't able to find anything regarding this :/
You could use multiple patterns for your grok filter,
grok {
  match => ["fieldname", "pattern1", "pattern2", ..., "patternN"]
}
and they will be applied in order, but a) it's not the best option performance-wise, and b) you probably want to treat different types of logs differently anyway. So I suggest you use conditionals based on the type or tags of a message:
if [type] == "syslog" {
  grok {
    match => ["message", "your syslog pattern"]
  }
}
Set the type in the input plugin.
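For example, with a file input (the path here is made up), the type can be set like this:
input {
  file {
    path => "/var/log/syslog"
    type => "syslog"
  }
}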
The documentation for the currently released version of Logstash is at http://logstash.net/docs/1.4.2/. It probably doesn't address your question specifically, but the answer can be inferred from it.
Write the most specific grok first and use this syntax:
grok {
  match => {
    "message" => [
      # Most specific grok:
      "%{TIMESTAMP_ISO8601:temp_date}%{SPACE}%{LOGLEVEL:log_level}%{UUID:user_id}",
      # Less specific:
      "%{TIMESTAMP_ISO8601:temp_date}%{SPACE}%{GREEDYDATA:log_message}"
    ]
  }
}
Is there any way in logstash to use a conditional to check if a specific tag exists?
For example,
grok {
  match => [
    "message", "Some expression to match|%{GREEDYDATA:NOMATCHES}"
  ]
}
# if NOMATCHES exists, do something
How do I verify if NOMATCHES tag exists or not?
Thanks.
Just so we're clear: the config snippet you provided is setting a field, not a tag.
Logstash events can be thought of as a dictionary of fields. A field named tags is referenced by many plugins via add_tag and remove_tag operations.
You can check if a tag is set:
if "foo" in [tags] {
...
}
But you seem to want to check if a field contains anything:
if [NOMATCHES] =~ /.+/ {
...
}
The above will check that NOMATCHES exists and isn't empty.
Reference: configuration file overview.
The following test for existence also works [tested in Logstash 1.4.2], although it does not verify that the field is non-empty:
if [NOMATCHES] {
...
}