converting a nested field to a list json in logstash - logstash

I want to convert a field to a json list, sg like this:
"person": {
"name":"XX",
"adress":"124"
}
to
"person": [{"name":"XX",
"adress":"124"}]
Thank you for help.

A bit of ruby magic will do here:
input {
stdin{}
}
filter {
ruby {
code => "
require 'json'
event['res'] = [JSON.parse(event['message'])['person']]
"
}
}
output {
stdout { codec => rubydebug }
}
This will simply parse your message field containing your Json document, then extract the person object and add it to a field.
The test looks as such:
artur#pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf_json_list/
Settings: Default pipeline workers: 8
Pipeline main started
{ "person": { "name":"XX", "adress":"124" }}
{
"message" => "{ \"person\": { \"name\":\"XX\", \"adress\":\"124\" }}",
"#version" => "1",
"#timestamp" => "2017-03-15T11:34:37.424Z",
"host" => "pandaadb",
"res" => [
[0] {
"name" => "XX",
"adress" => "124"
}
]
}
As you can see, your hash now lives in a list on index 0.
Hope that helps,
Artur

Related

LogStash Conf | Drop Empty Lines

The contents of LogStash's conf file looks like this:
input {
beats {
port => 5044
}
file {
path => "/usr/share/logstash/iway_logs/*"
start_position => "beginning"
sincedb_path => "/dev/null"
#ignore_older => 0
codec => multiline {
pattern => "^\[%{NOTSPACE:timestamp}\]"
negate => true
what => "previous"
max_lines => 2500
}
}
}
filter {
grok {
match => { "message" =>
['(?m)\[%{NOTSPACE:timestamp}\]%{SPACE}%{WORD:level}%{SPACE}\(%{NOTSPACE:entity}\)%{SPACE}%{GREEDYDATA:rawlog}'
]
}
}
date {
match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSS"]
target => "#timestamp"
}
grok {
match => { "entity" => ['(?:W.%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}\.%{GREEDYDATA:workerid}|W.%{GREEDYDATA:channel}\.%{GREEDYDATA:workerid}|%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}\.%{GREEDYDATA:workerid}|%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}|%{GREEDYDATA:channel})']
}
}
dissect {
mapping => {
"[log][file][path]" => "/usr/share/logstash/iway_logs/%{serverName}#%{configName}#%{?ignore}.log"
}
}
}
output {
elasticsearch {
hosts => "${ELASTICSEARCH_HOST_PORT}"
index => "iway_"
user => "${ELASTIC_USERNAME}"
password => "${ELASTIC_PASSWORD}"
ssl => true
ssl_certificate_verification => false
cacert => "/certs/ca.crt"
}
}
As one can make out, the idea is to parse a custom log employing multiline extraction. The extraction does its job. The log occasionally contains an empty first line. So:
[2022-11-29T12:23:15.073] DEBUG (manager) Generic XPath iFL functions use full XPath 1.0 syntax
[2022-11-29T12:23:15.074] DEBUG (manager) XPath 1.0 iFL functions use iWay's full syntax implementation
which naturally is causing Kibana to report an empty line:
In an attempt to supress this line from being sent to ES, I added the following as a last filter item:
if ![message] {
drop { }
}
if [message] =~ /^\s*$/ {
drop { }
}
The resulting JSON payload to ES:
{
"#timestamp": [
"2022-12-09T14:09:35.616Z"
],
"#version": [
"1"
],
"#version.keyword": [
"1"
],
"event.original": [
"\r"
],
"event.original.keyword": [
"\r"
],
"host.name": [
"xxx"
],
"host.name.keyword": [
"xxx"
],
"log.file.path": [
"/usr/share/logstash/iway_logs/localhost#iCLP#iway_2022-11-29T12_23_33.log"
],
"log.file.path.keyword": [
"/usr/share/logstash/iway_logs/localhost#iCLP#iway_2022-11-29T12_23_33.log"
],
"message": [
"\r"
],
"message.keyword": [
"\r"
],
"tags": [
"_grokparsefailure"
],
"tags.keyword": [
"_grokparsefailure"
],
"_id": "oRc494QBirnaojU7W0Uf",
"_index": "iway_",
"_score": null
}
While this does drop the empty first line, it also unfortunately interferes with the multiline operation on other lines. In other words, the multiline operation does not work anymore. What am I doing incorrectly?
Use of the following variation resolved the issue:
if [message] =~ /\A\s*\Z/ {
drop { }
}
This solution is based on Badger's answer provided on the Logstash forums, where this question was raised as well.

Logstash Filter JSON syslog message field is not getting parsed

I cannot parse the incoming Syslog by JSON. The message field is not getting parsed. I tried JSON filter using addfield and also with mutate but no luck. I used GROK to parse specific fields but the message field has keys and values. How to parse the below message field into JSON
conf file
input {
file {
path => "/opt/log/sample/*.txt"
codec => "plain" # { format => "%{message}" }
}
}
filter {
# mutate { gsub => [ "message","(\")", "" ] }
mutate { gsub => [ "message","(\\")", "" ] }
json {
source => "message"
}
}
output {
file {
path => "/opt/log/out/out.txt"
codec => json_lines
}
stdout {}
}
GROK
%{TIME:timestamp} %{HOST:host} %{GREEDYDATA:message}
GROK output
"timestamp": [
[
"18:11:58"
]
],
"host": [
[
"myhost.aco.mydomain.net"
]
],
"message": [
[
"{destinationPort:90,exception:-,totalByteUsage:0,sourcePort:160,extension:.com\\\\/,contentTypeHeader:-,callout:0,scheme:http,reportingGroup:0,requestMethod:GET,privateIp:-,sAction:Allowed,sourceIpAddress:10.10.10.10,description:-,categoryName:News,sandBoxDecoded:-,urlLogId:0,responseCode:0,sandboxResult:-,computerName:-,totalByteCount:0,audit:0,host:www.local.com,action:Allowed,useTime:0,upstreamByteUsage:0,uriPath:\\\\/,computerMacAddress:00:00:00:00:00:00,direction:0,myboss:myhost,malware:0,ipAddress:10.10.10.10,userAgent:-,publicIp:-,url:http:\\\\/\\\\/www.local.com\\\\/,logTime:2022-07-12,referrerUrl:-,mde:-,sha256Sum:-,macAddress:00:00:00:00:00:00,filename:-,uriQuery:-,filteringGroupName:Default Catch All,downstreamByteUsage:0,cncFlag:0,location:-,time:18:11:57,username:*10.10.10.10}""
]
]
}

Logstash make a copy a nested field with mutate.add_field

I wanted to make a copy of a nested field in a Logstash filter but I can't figure out the correct syntax.
Here is what I try:
incorrect syntax:
mutate {
add_field => { "received_from" => %{beat.hostname} }
}
beat.hostname is not replaced
mutate {
add_field => { "received_from" => "%{beat.hostname}" }
}
beat.hostname is not replaced
mutate {
add_field => { "received_from" => "%{[beat][hostname]}" }
}
beat.hostname is not replaced
mutate {
add_field => { "received_from" => "%[beat][hostname]" }
}
No way. If I give a non nested field it works as expected.
The data structure received by logstash is the following:
{
"#timestamp" => "2016-08-24T13:01:28.369Z",
"beat" => {
"hostname" => "etg-dbs-master-tmp",
"name" => "etg-dbs-master-tmp"
},
"count" => 1,
"fs" => {
"device_name" => "/dev/vdb",
"total" => 5150212096,
"used" => 99287040,
"used_p" => 0.02,
"free" => 5050925056,
"avail" => 4765712384,
"files" => 327680,
"free_files" => 326476,
"mount_point" => "/opt/ws-etg/datas"
},
"type" => "filesystem",
"#version" => "1",
"tags" => [
[0] "topbeat"
],
"received_at" => "2016-08-24T13:01:28.369Z",
"received_from" => "%[beat][hostname]"
}
EDIT:
Since you didn't show your input message I worked off your output. In your output the field you are trying to copy into already exists, which is why you need to use replace. If it does not exist, you do in deed need to use add_field. I updated my answer for both cases.
EDIT 2: I realised that your problem might be to access the value that is nested, so I added that as well :)
you are using the mutate filter wrong/backwards.
First mistake:
You want to replace a field, not add one. In the docs, it gives you the "replace" option. See: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-replace
Second mistake, you are using the syntax in reverse. It appears that you believe this is true:
"text I want to write" => "Field I want to write it in"
While this is true:
"myDestinationFieldName" => "My Value to be in the field"
With this knowledge, we can now do this:
mutate {
replace => { "[test][a]" => "%{s}"}
}
or if you want to actually add a NEW NOT EXISTING FIELD:
mutate {
add_field => {"[test][myNewField]" => "%{s}"}
}
Or add a new existing field with the value of a nested field:
mutate {
add_field => {"some" => "%{[test][a]}"}
}
Or more details, in my example:
input {
stdin {
}
}
filter {
json {
source => "message"
}
mutate {
replace => { "[test][a]" => "%{s}"}
add_field => {"[test][myNewField]" => "%{s}"}
add_field => {"some" => "%{[test][a]}"}
}
}
output {
stdout { codec => rubydebug }
}
This example takes stdin and outputs to stdout. It uses a json filter to parse the message, and then the mutate filter to replace the nested field. I also add a completely new field in the nested test object.
And finally creates a new field "some" that has the value of test.a
So for this message:
{"test" : { "a": "hello"}, "s" : "to_Repalce"}
We want to replace test.a (value: "Hello") with s (Value: "to_Repalce"), and add a field test.myNewField with the value of s.
On my terminal:
artur#pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf2/
Settings: Default pipeline workers: 8
Pipeline main started
{"test" : { "a": "hello"}, "s" : "to_Repalce"}
{
"message" => "{\"test\" : { \"a\": \"hello\"}, \"s\" : \"to_Repalce\"}",
"#version" => "1",
"#timestamp" => "2016-08-24T14:39:52.002Z",
"host" => "pandaadb",
"test" => {
"a" => "to_Repalce",
"myNewField" => "to_Repalce"
},
"s" => "to_Repalce"
"some" => "to_Repalce"
}
The value has succesfully been replaced.
A field "some" with the replaces value has been added
A new field in the nested array has been added.
if you use add_field, it will convert a into an array and append your value there.
Hope this solves your issue,
Artur

Use Logstash with HTML log

I'm new to Logstash, trying to use it to parse a HTML log file.
I need to output only the log lines, i.e. ignore preceding JS, CSS and HTML that are also included in the file.
A log line in the file looks like this:
<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>
I have no problem getting all the lines, but I would like to have an output which contains only the relevant ones, without HTML tags, and looks something like that:
{
"db_timestamp": "2015-01-28 13:52:25.692",
"server_timestamp": "2015-01-28 13:52:25.950",
"node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
"thread": "REST",
"user": "sa",
"ip": "0.0.0.0",
"level": "ERR",
"method": "ProjectValidator.validate(36)",
"message": "Project does not exist"
}
My Logstash configuration is:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => [ WHAT SHOULD I PUT HERE??? ]
}
}
}
output {
stdout {}
if [type] == "request" {
http {
http_method => "post"
url => "http://<some url>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{#timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
http_method => "post"
url => "http://<some url>"
mapping => [ ALSO WHAT SHOULD I PUT HERE??? ]
}
}
}
Is there a way to do that? So far I haven't found any relevant documentation or samples.
Thanks!
Finally figured out the answer.
Not sure this is the best or most elegant solution, but it works.
I changed the http output format to "message", which enabled me to override and format the whole message as JSON, instead of using mapping. Also, found out how to name parameters in the grok filter and use them in the output.
This is the new Logstash configuration file:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
}
}
}
output { stdout { codec => rubydebug }
if [type] == "request" {
http {
http_method => "post"
url => "http://<some URL>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{#timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
format => "message"
content_type => "application/json"
http_method => "post"
url => "http://<some URL>"
message=> '{
"db_date":"%{db_date}",
"alm_date":"%{alm_date}",
"thread": "%{thread}",
"req_type": "%{req_type}",
"username": "%{username}",
"ip": "%{ip}",
"level": "%{level}",
"method": "%{method}",
"message": "%{err_message}"
}'
}
}
}
Note the single quote for the http message block and the double quotes for the parameters inside this block.
For anyone parsing HP ALM logs, the following Logstash filter will do the work:
grok {
break_on_match => true
match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
}
mutate {
add_field => ["db_date", "%{db_date_mon} %{db_date_day}"]
add_field => ["alm_date", "%{alm_date_mon} %{alm_date_day}"]
remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
gsub => [
"log_message", "<br>", "
"
]
gsub => [
"log_message", "<p>", " "
]
}
Tested and working fine with Logstash 2.4.0

how to assign particular value from logstash output

In logstash I am getting this output, however, I am trying to get repo: "username/logstashrepo" only from this output. Please share your thoughts how to grep only that value and assign to variable.
message: "github_audit: {"actor_ip":"192.168.1.1","from":"repositories#create","actor":"username","repo":"username/logstashrepo","user":"username","created_at":1416299104782,"action":"repo.create","user_id":1033,"repo_id":44744,"actor_id":1033,"data":{"actor_location":{"location":{"lat":null,"lon":null}}}}",
#version: "1",
#timestamp: "2014-11-18T08:25:05.427Z",
host: "15-274-145-63",
type: "syslog",
syslog5424_pri: "190",
timestamp: "Nov 18 00:25:05",
actor_ip: "192.168.1.1",
from: "repositories#create",
actor: "username",
repo: "username/logstashrepo",
user: "username",
created_at: 1416299104782,
action: "repo.create",
user_id: 1033,
repo_id: 44744,
actor_id: 1033,
I am using this in my config file:
input {
tcp {
port => 8088
type => syslog
}
udp {
port => 8088
type => syslog
}
}
filter {
grok {
match => [
"message",
"%{SYSLOG5424PRI}%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{GREEDYDATA:message}"
]
overwrite => ["host", "message"]
}
if [message] =~ /^github_audit: / {
grok {
match => ["message", "^github_audit: %{GREEDYDATA:json_payload}"]
}
json {
source => "json_payload"
remove_field => "json_payload"
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
I actually posted the question here, for some reason I can't edit and followup.
how to grep particulr field from logstash output
You can have the json filter store the expanded JSON object in a subfield. Use the mutate filter to move the "repo" field into the toplevel and delete the whole subfield. Partial example from the json filter and onwards:
json {
source => "json_payload"
target => "json"
remove_field => "json_payload"
}
mutate {
rename => ["[json][repo]", "repo"]
remove_field => "json"
}

Resources