I need to parse a tomcat log file and output it into several output files.
Each file is the result of a certain filter that picks the entries in the Tomcat log that match a series of regexes or other transformation rules.
Currently I am doing this using a python script but it is not very flexible.
Is there a configurable tool for doing this?
I have looked into Filebeat and Logstash (neither of which I am very familiar with), but it is not clear whether they can be configured to map a single input file into multiple output files, each with a different filter/grok set of expressions.
Is it possible to achieve this with Filebeat/Logstash?
If all log files are on the same server, you don't need Filebeat; Logstash can do the work.
Here is an example of what your Logstash config can look like.
The input is your Tomcat log file, and there are multiple JSON outputs depending on the log level once the logs have been parsed.
The grok is also just an example; you must define your own grok pattern depending on your log format.
input {
  file {
    path => "/var/log/tomcat.log"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} - %{POSTFIX_SESSIONID:sessionId}: %{GREEDYDATA:messageText}" }
  }
}
output {
  if [loglevel] == "info" {
    file {
      codec => "json"
      path => "/var/log/tomcat_info_parsed.log"
    }
  }
  if [loglevel] == "warning" {
    file {
      codec => "json"
      path => "/var/log/tomcat_warning_parsed.log"
    }
  }
}
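To try it, save the config to a file and run Logstash against it; the file name here is just an example:

bin/logstash -f tomcat-split.conf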
I send JSON-formatted logs to my Logstash server. The log looks something like this (note: the whole message is really on ONE line, but I show it across multiple lines for readability):
2016-09-01T21:07:30.152Z 153.65.199.92
{
"type":"trm-system",
"host":"susralcent09",
"timestamp":"2016-09-01T17:17:35.018470-04:00",
"#version":"1",
"customer":"cf_cim",
"role":"app_server",
"sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log",
"message":"some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash field (without the timestamp) into ES, it should recognize it.
If you want to do it inside a Logstash pipeline, you would use the json filter and point source => at the second part of the line (possibly adding the timestamp prefix back in).
This results in all fields being added to the current message, and you can access them directly or all combined:
Config:
input { stdin { } }
filter {
  # split the line into Timestamp, IP, and Json
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  # parse the json part (called "js") and add its fields to the event
  json { source => "js" }
}
output {
  # stdout { codec => rubydebug }
  # you can access fields directly with %{fieldname}:
  stdout { codec => line { format => "sourcefile: %{sourcefile}" } }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and @timestamp removed):
{
       "message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
      "@version" => "1",
            "ts" => "2016-09-01T21:07:30.152Z",
            "ip" => "153.65.199.92",
            "js" => "{ \"sourcefile\":\"/usr\" }",
    "sourcefile" => "/usr"
}
As you can see, the sourcefile field is directly available, with its value, in the rubydebug output.
Depending on the source of your log records, you might need to use the multiline codec as well. You might also want to delete the js field, rename @timestamp to _parsedate, and parse ts into the record's timestamp (to keep Kibana happy). I would also remove message to save space. These steps are not shown in the sample run above, but a minimal sketch follows.
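Assuming ts is ISO8601 as in the sample, the extra steps might look like this (the _parsedate name is just the suggestion from above):

filter {
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  json { source => "js" }
  # keep Logstash's own ingest time under another name (optional)
  mutate { rename => { "@timestamp" => "_parsedate" } }
  # parse ts into @timestamp so Kibana sorts on the event time
  date { match => [ "ts", "ISO8601" ] }
  # drop the intermediate and raw fields to save space
  mutate { remove_field => [ "js", "message" ] }
}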
I am totally new to Logstash. Can anyone please tell me the filter to add to the configuration file, to separate the following log line using Logstash?
"2011/08/10 09:47:23.449598,0.001199,udp,203.136.22.37,15306, <->,147.32.84.229,13363,CON,0,0,2,317,64,flow=Background-UDP-Established",
I want the above line to return a JSON object like the following:
{
TimeStamp: 2011/08/10 09:47:23.449598
Value: 0.001199
protocol: udp
IP: 203.136.22.37
...
}
Copy the text below into your conf file and run Logstash.
It will take input from the console and output to the console in your desired format.
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", "%{DATESTAMP:timestamp},%{BASE16FLOAT:value},%{WORD:protocol},%{IP:ip},%{GREEDYDATA:remaining}" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
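If you also want named fields for the remaining columns, the grok can be extended along these lines (src_ip, src_port, direction, dst_ip, and dst_port are illustrative names, not required ones):

filter {
  grok {
    match => [ "message", "%{DATESTAMP:timestamp},%{BASE16FLOAT:value},%{WORD:protocol},%{IP:src_ip},%{INT:src_port},%{DATA:direction},%{IP:dst_ip},%{INT:dst_port},%{GREEDYDATA:remaining}" ]
  }
}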
I have the following JSON object logs in a log file:
{"con":"us","sl":[[1,2]],"respstats_1":"t:2,ts:140,m:192.168.7.5,p:|mobfox:1,P,E,0,0.4025:0.0:-:-,0-98;appnexus-marimedia:2,P,L,140,0.038:0.0:-:-,-;","rid":"AKRXRWLYCZIDFM","stats":"t:2,h:2,ts:140,mobfox:0,appnexus-marimedia:140,;m:192.168.7.5;p:","resp_count":0,"client_id":"15397682","err_stats":"mobfox:0-98,"}
{"con":"br","sl":[[1,2,3,4]],"respstats_1":"t:4,ts:285,m:192.168.7.5,p:|smaato:1,P,M,143,0.079:0.0:-:-,-;vserv-specialbuy:2,P,W,285,0.0028:0.0:-:-,-;mobfox:3,P,E,42,0.077:0.0:-:-,0-98;inmobi-pre7:4,P,H,100,0.0796:0.0:-:-,-;","rid":"AKRXRWLYCY4DOU","stats":"t:4,h:4,ts:285,smaato:143,vserv-specialbuy:285,mobfox:42,inmobi-pre7:100,;m:192.168.7.5;p:","resp_count":1,"client_id":"15397682","err_stats":"mobfox:0-98,","ads":[{"pricing":{"price":"0","type":"cpc"},"rank":2,"resp_json":{"img_url":"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74","cli_url":"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74","beacons":["http://img.vserv.mobi/b.gif"],"ad_type":"image"},"resp_code":200,"resp_html":"<a href=\"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74\"><img src=\"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74\" alt=\"\" /> <\/a><img src=\"http://img.vserv.mobi/b.gif\" alt=\"\" />","tid":"vserv-specialbuy","bid":"576111"}]}
However, I am not able to figure out whether they are multiline or single line, but I have used the following configuration:
input {
  file {
    codec => multiline {
      pattern => '^{'
      negate => true
      what => previous
    }
    path => ['/home/pp38/fetcher.log']
  }
}
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}
output { stdout { codec => rubydebug } }
I am not able to see any kind of output or error when it is started.
Edited: I have used the following config, which generated output:
input {
  file {
    codec => "json"
    type => "json"
    path => "/home/pp38/fetcher.log"
    sincedb_path => "/home/pp38/tmp/logstash/sincedb"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
But I am getting output where each field is indexed by Elasticsearch.
How can I append the entire JSON message to a new field, as message:jsonContent?
You can handle this with the plain multiline codec, but for your situation there is a better codec plugin called json_lines.
The json_lines codec will take a source with multiple JSON objects (one on each line) and handle each of them out of the box.
This codec will decode streamed JSON that is newline delimited. Encoding will emit a single JSON string ending in a \n. NOTE: Do not use this codec if your source input is line-oriented JSON, for example, redis or file inputs. Rather, use the json codec. More info: this codec is expecting to receive a stream (string) of newline terminated lines. The file input will produce a line string without a newline. Therefore this codec cannot work with line-oriented inputs.
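Given that note, a minimal sketch for reading line-oriented JSON from a file uses the json codec instead (the path here is hypothetical):

input {
  file {
    path => "/var/log/app/events.json"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
}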
I have a JSON log file, which I am taking as input with this config:
input {
  file {
    path => "filename"
    codec => json_lines
  }
}
Each line is a deeply nested JSON.
In the filters,
When I say
mutate { add_field => { "new_field_name" => "%{old_field_name}" } }
if the old_field is a nested hash/array, it is converted to a string and then added. Is there any way I can preserve the type instead of stringifying it?
You might consider using the ruby filter to duplicate the array:
filter {
  ruby { code => "event['newhash'] = event['myhash']" }
}
I don't think there is a cleaner solution.
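As a side note, on newer Logstash versions (5.x and later) the ruby filter must use the event get/set API, so the equivalent would be something like:

filter {
  ruby { code => "event.set('newhash', event.get('myhash'))" }
}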
Apart from that, do you really use the json_lines codec with the file input? From the Logstash doc:
Do not use this codec if your source input is line-oriented JSON, for
example, redis or file inputs. Rather, use the json codec.