grok pattern for log event - logstash

I'm a Logstash newbie and I've looked at numerous examples of grok patterns, but I'm still struggling to achieve my goal, which is to parse the following JSON-formatted log event.
{
  "@fields": {
    "level": "DEBUG",
    "mdc": {},
    "file": "SearchServiceImpl.java",
    "class": "com.blah.blah.service.impl.SearchServiceImpl",
    "line_number": "767",
    "method": "getUserSavedSearches"
  },
  "@timestamp": "2015-04-24T12:30:37.953+01:00",
  "@message": "username: admin sessionid: 56cR73aBpuIBzRgIElzLUtJJ method_name: getUserSavedSearches",
  "@source_host": "Kens-MacBook.local"
}
In particular, I'd like to extract the session id and username.
I'm also hoping I can be pointed to detailed documentation explaining how to use grok (I've read the available docs on Logstash etc.).
Any help will be appreciated.

First, your log is already in JSON format, so in your input you can use the json codec to read it. Then use a grok filter to parse out the username and session id.
input {
  stdin {
    codec => json
  }
}
filter {
  grok {
    match => [
      "@message", "username: %{USERNAME:username} sessionid: %{NOTSPACE:sessionId} method_name: %{WORD:method_name}"
    ]
  }
}
output {
  stdout { codec => rubydebug }
}
For more detail, you can use an online grok debugger to try out your patterns, and browse the list of built-in grok patterns to see what is available.
Here is the sample output:
{
  "@fields" => {
    "level" => "DEBUG",
    "mdc" => {},
    "file" => "SearchServiceImpl.java",
    "class" => "com.blah.blah.service.impl.SearchServiceImpl",
    "line_number" => "767",
    "method" => "getUserSavedSearches"
  },
  "@timestamp" => "2015-04-24T19:30:37.953+08:00",
  "@message" => "username: admin sessionid: 56cR73aBpuIBzRgIElzLUtJJ method_name: getUserSavedSearches",
  "@source_host" => "Kens-MacBook.local",
  "@version" => "1",
  "host" => "BEN_LIM",
  "username" => "admin",
  "sessionId" => "56cR73aBpuIBzRgIElzLUtJJ",
  "method_name" => "getUserSavedSearches"
}
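As an aside: for a message with this fixed layout, newer Logstash releases also ship a dissect filter that splits on literal delimiters instead of regular expressions. A minimal sketch, not from the original answer, using the same field names as above:
filter {
  dissect {
    # Split @message on the literal "username: ", " sessionid: " and
    # " method_name: " delimiters; no regular expressions involved.
    mapping => {
      "@message" => "username: %{username} sessionid: %{sessionId} method_name: %{method_name}"
    }
  }
}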

Related

Parsing a multiline bracket-wrapped JSON file in Logstash

I'm trying to parse JSON logs being generated from a system in the form of:
[{"name" : "test1", "state" : "success"},
{"name" : "test2", "state" : "fail"},
{"name" : "test3", "state" : "success"}]
This is a valid JSON file; however, I am not able to parse it using the json filter.
My current logstash.config file looks like this:
input {
  file {
    path => "C:/elastic_stack/json_stream/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  mutate {
    # One gsub array applies all three substitutions; repeating the gsub
    # setting inside a single mutate block may only keep the last one.
    gsub => [
      "message", "\]", "",
      "message", "\[", "",
      "message", "\},", "}"
    ]
  }
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstashindex"
  }
  stdout { codec => rubydebug }
}
I have tried adding a bunch of regex substitutions with gsub, but since I'm pretty new I'm not sure that's the cleanest and most efficient parsing solution. Any suggestions?
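One gsub-free alternative would be to read the whole bracket-wrapped array as a single event and let Logstash split it after parsing. A rough sketch, assuming the multiline codec and the split filter (the multiline settings and the `tests` target field name are illustrative, not from the original post):
input {
  file {
    path => "C:/elastic_stack/json_stream/*.json"
    start_position => "beginning"
    sincedb_path => "NUL"
    # Assumption: only the first line starts with '[', so every following
    # line is appended to the previous event, yielding one event per file.
    codec => multiline {
      pattern => "^\["
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
  json {
    source => "message"
    target => "tests"   # the parsed array lands here
  }
  split {
    field => "tests"    # emit one event per array element
  }
}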

converting a nested field to a JSON list in logstash

I want to convert a field to a JSON list, something like this:
"person": {
  "name": "XX",
  "adress": "124"
}
to
"person": [{
  "name": "XX",
  "adress": "124"
}]
Thank you for your help.
A bit of Ruby magic will do here:
input {
  stdin {}
}
filter {
  ruby {
    code => "
      require 'json'
      # Parse the raw message, pull out the 'person' object, and wrap it in an array.
      event['res'] = [JSON.parse(event['message'])['person']]
    "
  }
}
output {
  stdout { codec => rubydebug }
}
This simply parses your message field containing your JSON document, then extracts the person object and adds it, wrapped in an array, to a new field.
The test looks as such:
artur@pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf_json_list/
Settings: Default pipeline workers: 8
Pipeline main started
{ "person": { "name":"XX", "adress":"124" }}
{
  "message" => "{ \"person\": { \"name\":\"XX\", \"adress\":\"124\" }}",
  "@version" => "1",
  "@timestamp" => "2017-03-15T11:34:37.424Z",
  "host" => "pandaadb",
  "res" => [
    [0] {
      "name" => "XX",
      "adress" => "124"
    }
  ]
}
As you can see, your hash now lives in a list at index 0.
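One caveat: this uses the Logstash 2.x event syntax. From Logstash 5.x onwards, direct hash access to the event was removed, so the Ruby code needs the event API instead; the same logic would look like this:
ruby {
  code => "
    require 'json'
    # Logstash 5+ event API: event.get / event.set replace event['field']
    event.set('res', [JSON.parse(event.get('message'))['person']])
  "
}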
Hope that helps,
Artur

Use Logstash with HTML log

I'm new to Logstash and trying to use it to parse an HTML log file.
I need to output only the log lines, i.e. ignore the preceding JS, CSS and HTML that are also included in the file.
A log line in the file looks like this:
<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>
I have no problem getting all the lines, but I would like the output to contain only the relevant ones, without HTML tags, looking something like this:
{
  "db_timestamp": "2015-01-28 13:52:25.692",
  "server_timestamp": "2015-01-28 13:52:23.950",
  "node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
  "thread": "REST",
  "user": "sa",
  "ip": "0.0.0.0",
  "level": "ERR",
  "method": "ProjectValidator.validate(36)",
  "message": "Project does not exist"
}
My Logstash configuration is:
input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    grok {
      match => [ WHAT SHOULD I PUT HERE??? ]
    }
  }
}
output {
  stdout {}
  if [type] == "request" {
    http {
      http_method => "post"
      url => "http://<some url>"
      mapping => ["type", "request", "host", "%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
      http_method => "post"
      url => "http://<some url>"
      mapping => [ ALSO WHAT SHOULD I PUT HERE??? ]
    }
  }
}
Is there a way to do that? So far I haven't found any relevant documentation or samples.
Thanks!
Finally figured out the answer.
Not sure this is the best or most elegant solution, but it works.
I changed the http output format to "message", which let me override and format the whole message as JSON instead of using mapping. I also found out how to name parameters in the grok filter and use them in the output.
This is the new Logstash configuration file:
input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    grok {
      match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
    }
  }
}
output {
  stdout { codec => rubydebug }
  if [type] == "request" {
    http {
      http_method => "post"
      url => "http://<some URL>"
      mapping => ["type", "request", "host", "%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
      format => "message"
      content_type => "application/json"
      http_method => "post"
      url => "http://<some URL>"
      message => '{
        "db_date": "%{db_date}",
        "alm_date": "%{alm_date}",
        "thread": "%{thread}",
        "req_type": "%{req_type}",
        "username": "%{username}",
        "ip": "%{ip}",
        "level": "%{level}",
        "method": "%{method}",
        "message": "%{err_message}"
      }'
    }
  }
}
Note the single quotes around the http message block and the double quotes around the parameters inside it.
For anyone parsing HP ALM logs, the following Logstash filter will do the work:
grok {
  break_on_match => true
  match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
}
mutate {
  # Combine the month and day captures into single date fields, then drop the parts.
  add_field => [
    "db_date", "%{db_date_mon} %{db_date_day}",
    "alm_date", "%{alm_date_mon} %{alm_date_day}"
  ]
  remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
  # One gsub array handles both substitutions (repeating the gsub setting in a
  # single mutate block may only keep the last one): <br> becomes a literal
  # newline, <p> a space.
  gsub => [
    "log_message", "<br>", "
",
    "log_message", "<p>", " "
  ]
}
Tested and working fine with Logstash 2.4.0.
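For reference, feeding the sample row from the question through this filter should produce fields roughly like the following (a sketch, not actual captured output):
{
  "db_date" => "Jan 28",
  "db_date_time" => "13:52:25.692",
  "alm_date" => "Jan 28",
  "alm_date_time" => "13:52:23.950",
  "thread_col1" => "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
  "request_type" => "REST",
  "login" => "sa",
  "ip" => "0.0.0.0",
  "level" => "ERR",
  "method" => "ProjectValidator.validate(36)",
  "log_message" => "Project does not exist"
}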

how to assign a particular value from logstash output

In Logstash I am getting the output below; however, I only want to extract repo: "username/logstashrepo" from it. Please share your thoughts on how to grep only that value and assign it to a variable.
message: "github_audit: {"actor_ip":"192.168.1.1","from":"repositories#create","actor":"username","repo":"username/logstashrepo","user":"username","created_at":1416299104782,"action":"repo.create","user_id":1033,"repo_id":44744,"actor_id":1033,"data":{"actor_location":{"location":{"lat":null,"lon":null}}}}",
#version: "1",
#timestamp: "2014-11-18T08:25:05.427Z",
host: "15-274-145-63",
type: "syslog",
syslog5424_pri: "190",
timestamp: "Nov 18 00:25:05",
actor_ip: "192.168.1.1",
from: "repositories#create",
actor: "username",
repo: "username/logstashrepo",
user: "username",
created_at: 1416299104782,
action: "repo.create",
user_id: 1033,
repo_id: 44744,
actor_id: 1033,
I am using this in my config file:
input {
  tcp {
    port => 8088
    type => syslog
  }
  udp {
    port => 8088
    type => syslog
  }
}
filter {
  grok {
    match => [
      "message",
      "%{SYSLOG5424PRI}%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{GREEDYDATA:message}"
    ]
    overwrite => ["host", "message"]
  }
  if [message] =~ /^github_audit: / {
    grok {
      match => ["message", "^github_audit: %{GREEDYDATA:json_payload}"]
    }
    json {
      source => "json_payload"
      remove_field => "json_payload"
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
I actually posted the question here; for some reason I can't edit it and follow up:
how to grep particular field from logstash output
You can have the json filter store the expanded JSON object in a subfield, then use a mutate filter to move the "repo" field to the top level and delete the whole subfield. A partial example, from the json filter onwards:
json {
  source => "json_payload"
  target => "json"
  remove_field => "json_payload"
}
mutate {
  rename => ["[json][repo]", "repo"]
  remove_field => "json"
}
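With the sample event from the question, the relevant part of the resulting event would then look something like this (a sketch):
{
  ...
  "repo" => "username/logstashrepo"
}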

how to grep particular field from logstash output

I am trying to grep only a few fields from this Logstash output: 1. repositories#create 2. \"repo\":\"username/reponame\". Please share your ideas on how to grep particular info from this output and assign it to another variable.
"message" => "<190>Nov 01 20:35:15 10-254-128-66 github_audit: {\"actor_ip\":\"192.168.1.1\",\"from\":\"repositories#create\",\"actor\":\"myuserid\",\"repo\":\"username/reponame\",\"action\":\"staff.repo_route\",\"created_at\":1516286634991,\"repo_id\":44743,\"actor_id\":1033,\"data\":{\"actor_location\":{\"location\":{\"lat\":null,\"lon\":null}}}}",
I am using this syslog.conf file to get the output.
input {
  tcp {
    port => 8088
    type => syslog
  }
  udp {
    port => 8088
    type => syslog
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp}" }
    }
    grep {
      match => { "message" => "repositories#create" }
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
I am not able to add a comment to your reply; thank you so much for it.
Could you please share your ideas on how to get only username: and repo: from this output? I'm trying to assign the values from this particular output. Thanks again.
message: "github_audit: {"actor_ip":"192.168.1.1","from":"repositories#create","actor":"username","repo":"username/logstashrepo","user":"username","created_at":1416299104782,"action":"repo.create","user_id":1033,"repo_id":44744,"actor_id":1033,"data":{"actor_location":{"location":{"lat":null,"lon":null}}}}",
#version: "1",
#timestamp: "2014-11-18T08:25:05.427Z",
host: "15-274-145-63",
type: "syslog",
syslog5424_pri: "190",
timestamp: "Nov 18 00:25:05",
actor_ip: "10.239.37.185",
from: "repositories#create",
actor: "username",
repo: "username/logstashrepo",
user: "username",
created_at: 1416299104782,
action: "repo.create",
user_id: 1033,
repo_id: 44744,
actor_id: 1033,
Use a grok filter to extract the JSON payload into a separate field, then use a json filter to extract the fields from the JSON object. The example below works, but it only extracts the JSON payload from messages prefixed with "github_audit: ". I'm also guessing that the field after the timestamp is a hostname that should overwrite whatever might currently be in the "host" field. Don't forget to add a date filter to parse the string in the "timestamp" field into "@timestamp" (see the sketch after the configuration below).
filter {
  grok {
    match => [
      "message",
      "%{SYSLOG5424PRI}%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{GREEDYDATA:message}"
    ]
    overwrite => ["host", "message"]
  }
  if [message] =~ /^github_audit: / {
    grok {
      match => ["message", "^github_audit: %{GREEDYDATA:json_payload}"]
    }
    json {
      source => "json_payload"
      remove_field => "json_payload"
    }
  }
}
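For the date filter mentioned above, something along these lines would parse the syslog-style timestamp into @timestamp (a sketch; the second pattern covers single-digit days):
date {
  match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
  # On success the parsed time replaces @timestamp; the original string
  # remains in the "timestamp" field unless you remove it.
}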
