Parse a log file of JSON using Logstash - logstash

I have the following JSON object logs in a log file:
{"con":"us","sl":[[1,2]],"respstats_1":"t:2,ts:140,m:192.168.7.5,p:|mobfox:1,P,E,0,0.4025:0.0:-:-,0-98;appnexus-marimedia:2,P,L,140,0.038:0.0:-:-,-;","rid":"AKRXRWLYCZIDFM","stats":"t:2,h:2,ts:140,mobfox:0,appnexus-marimedia:140,;m:192.168.7.5;p:","resp_count":0,"client_id":"15397682","err_stats":"mobfox:0-98,"}
{"con":"br","sl":[[1,2,3,4]],"respstats_1":"t:4,ts:285,m:192.168.7.5,p:|smaato:1,P,M,143,0.079:0.0:-:-,-;vserv-specialbuy:2,P,W,285,0.0028:0.0:-:-,-;mobfox:3,P,E,42,0.077:0.0:-:-,0-98;inmobi-pre7:4,P,H,100,0.0796:0.0:-:-,-;","rid":"AKRXRWLYCY4DOU","stats":"t:4,h:4,ts:285,smaato:143,vserv-specialbuy:285,mobfox:42,inmobi-pre7:100,;m:192.168.7.5;p:","resp_count":1,"client_id":"15397682","err_stats":"mobfox:0-98,","ads":[{"pricing":{"price":"0","type":"cpc"},"rank":2,"resp_json":{"img_url":"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74","cli_url":"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74","beacons":["http://img.vserv.mobi/b.gif"],"ad_type":"image"},"resp_code":200,"resp_html":"<a href=\"http://c.vserv.mobi/delivery/ck.php?p=2__b=283989__zoneid=108236__OXLCA=1__cb=c46e3f74__dc=1800__cd=usw3_uswest2a-1416567600__c=37742__rs=0a587520_15397682__mser=cdn__dat=3__dacp=12__zt=s__r=http%3A%2F%2Fyeahmobi.go2cloud.org%2Faff_c%3Foffer_id%3D28007%26aff_id%3D10070%26aff_sub%3D108236_283989_c46e3f74\"><img src=\"http://img.vserv.mobi/i/320x50_7/7bfffd967a91e0e38ee06ffcee1a75e5.jpg?108236_283989_c46e3f74\" alt=\"\" /> <\/a><img src=\"http://img.vserv.mobi/b.gif\" alt=\"\" />","tid":"vserv-specialbuy","bid":"576111"}]}
However, I am not able to figure out whether they are multiline or single line, but I have used the following configuration:
input {
  file {
    codec => multiline {
      pattern => '^{'
      negate => true
      what => previous
    }
    path => ['/home/pp38/fetcher.log']
  }
}
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}
output { stdout { codec => rubydebug } }
I am not able to see any output or errors when Logstash is started.
Edit:
I have used the following config, which generated output:
input {
  file {
    codec => "json"
    type => "json"
    path => "/home/pp38/fetcher.log"
    sincedb_path => "/home/pp38/tmp/logstash/sincedb"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
But I am getting output where each field is indexed by Elasticsearch separately.
How can I append the entire JSON message to a new field, as message:jsonContent?

You can handle this with the plain multiline codec, but for your situation there is a better codec plugin called json_lines.
The json_lines codec will read a source containing multiple JSON documents (one per line) and handle each JSON out of the box.
From the Logstash documentation: "This codec will decode streamed JSON that is newline delimited. Encoding will emit a single JSON string ending in a \n. NOTE: Do not use this codec if your source input is line-oriented JSON, for example, redis or file inputs. Rather, use the json codec." More info: this codec expects to receive a stream (string) of newline-terminated lines, but the file input produces a line string without a newline, so this codec cannot work with line-oriented inputs.
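Since the logs here come from a file input, the documentation quoted above points to the json codec rather than json_lines: the file input already splits the stream into lines, and each line is a complete JSON document. A minimal sketch reusing the path from the question (start_position is added here only so that existing lines are picked up on the first run):

```
input {
  file {
    path => "/home/pp38/fetcher.log"
    start_position => "beginning"
    codec => "json"   # each line is already a complete JSON document
  }
}
output {
  stdout { codec => rubydebug }
}
```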

Related

Logstash multiline input works on local but not on EC2 instance

I've tried to parse it using the json, json_lines, and even the multiline input plugins, but to no avail. The multiline works well on my local machine but doesn't seem to work on my S3 and EC2 instance.
How would I write the grok filter to parse this?
This is what my JSON file looks like
{
"sourceId":"94:54:93:3B:81:6F1",
"machineId":"c1VR21A0GoCBgU6EMJ78d3CL",
"columnsCSV":"timestamp,state,0001,0002,0003,0004",
"tenantId":"iugcp",
"valuesCSV":"1557920277890,1,98.66,0.07,0.1,0.17 ",
"timestamp":"2019-05-15T11:37:57.890Z"
}
This is my config -
input {
  file {
    codec => multiline {
      pattern => '^\{'
      negate => true
      what => previous
    }
    path => "/home/*myusername*/Desktop/data/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  mutate {
    replace => [ "message", "%{message}}" ]
    gsub => [ 'message', '\n', '' ]
  }
  if [message] =~ /^{.*}$/ {
    json { source => "message" }
  }
}
# The output section is correct; I haven't included it here.
The result I get is just the JSON content sitting in the "message" field.
What I want is a separate field in the document for every JSON key.
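One thing worth checking with this multiline setup (a guess, not verified against the S3/EC2 environment): with negate => true and what => previous, the codec only flushes an event when the next line starting with { arrives, so the last JSON object in each file stays buffered indefinitely. The multiline codec has an auto_flush_interval option for exactly this case. A sketch of the input with that option added:

```
input {
  file {
    path => "/home/*myusername*/Desktop/data/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => '^\{'
      negate => true
      what => previous
      auto_flush_interval => 1   # flush the buffered event after 1s of silence
    }
  }
}
```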

In Logstash, how do I extract fields from a log event using the json filter?

Logstash v2.4.1.
I'm sending JSON formatted logs to my Logstash server via UDP packet. The logs look something similar to this.
{
"key1":"value1",
"key2":"value2",
"msg":"2017-03-02 INFO [com.company.app] Hello world"
}
This is my output filter
output {
stdout {
codec => rubydebug
}
file {
path => "/var/log/trm/debug.log"
codec => line { format => "%{msg}" }
}
}
The rubydebug output codec shows the log like this
{
    "message" => "{\"key1\":\"value1\", \"key2\":\"value2\", \"msg\":\"2017-03-02 INFO [com.company.app] Hello world\"}"
}
and the file output filter also shows the JSON log correctly, like this
{"key1":"value1", "key2":"value2", "msg":"2017-03-02 INFO [com.company.app] Hello world"}
When I use the JSON code in the input filter, I get _jsonparsefailures from Logstash on "some" logs, even though different online JSON parsers parse the JSON correctly, meaning my logs are in a valid JSON format.
input {
  udp {
    port => 5555
    codec => json
  }
}
Therefore, I'm trying to use the json filter instead, like this
filter {
  json {
    source => "message"
  }
}
Using the json filter, how can I extract the "key1", "key2", and "msg" fields from the "message"?
I tried this to no avail, that is, I don't see the "key1" field in my rubydebug output.
filter {
  json {
    source => "message"
    add_field => {
      "key1" => "%{[message][key1]}"
    }
  }
}
I would suggest you start with one of the two configurations below (I use the multiline codec to concatenate the input into one JSON document, because otherwise Logstash reads line by line, and one line of a JSON document is not valid JSON by itself). Then either filter the JSON, or use the json codec, and output it wherever it is needed. You will still have some configuration to do, but I believe it should help you get started:
input {
  file {
    path => "/an/absolute/path/tt2.json" # It really has to be absolute!
    start_position => beginning
    sincedb_path => "/another/absolute/path" # Not mandatory, just for ease of testing
    codec => multiline {
      pattern => "\n"
      what => "next"
    }
  }
}
filter {
  json {
    source => "message" # the multiline codec stores the joined lines in the "message" field
  }
}
output {
file {
path => "data/log/trm/debug.log"
}
stdout{codec => json}
}
Second possibility:
input {
  file {
    path => "/an/absolute/path/tt2.json" # It really has to be absolute!
    start_position => beginning
    sincedb_path => "/another/absolute/path" # Not mandatory, just for ease of testing
    codec => json {} # an input takes a single codec, so the json codec is used on its own here
  }
}
output {
file {
path => "data/log/trm/debug.log"
}
stdout{codec => json}
}
Edit: with the udp input I guess it should be (not tested):
input {
  udp {
    port => 5555
    codec => multiline { # not tested this part
      pattern => "^}"
      what => "previous"
    }
  }
}
(an input accepts only one codec, so the joined JSON can then be parsed with a json filter)

How do I parse a json-formatted log message in Logstash to get a certain key/value pair?

I send JSON-formatted logs to my Logstash server. The log looks something like this (note: the whole message is really on ONE line, but I show it across multiple lines for readability):
2016-09-01T21:07:30.152Z 153.65.199.92
{
"type":"trm-system",
"host":"susralcent09",
"timestamp":"2016-09-01T17:17:35.018470-04:00",
"#version":"1",
"customer":"cf_cim",
"role":"app_server",
"sourcefile":"/usr/share/tomcat/dist/logs/trm-system.log",
"message":"some message"
}
What do I need to put in my Logstash configuration to get the "sourcefile" value, and ultimately get the filename, e.g., trm-system.log?
If you pump the hash field (w/o the timestamp) into ES it should recognize it.
If you want to do it inside a logstash pipeline you would use the json filter and point the source => to the second part of the line (possibly adding the timestamp prefix back in).
This results in all fields added to the current message, and you can access them directly or all combined:
Config:
input { stdin { } }
filter {
# split line in Timestamp and Json
grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
# parse json part (called "js") and add new field from above
json { source => "js" }
}
output {
# stdout { codec => rubydebug }
# you access fields directly with %{fieldname}:
stdout { codec => line { format => "sourcefile: %{sourcefile}"} }
}
Sample run
2016-09-01T21:07:30.152Z 153.65.199.92 { "sourcefile":"/usr" }
sourcefile: /usr
and with rubydebug (host and #timestamp removed):
{
"message" => "2016-09-01T21:07:30.152Z 153.65.199.92 { \"sourcefile\":\"/usr\" }",
"#version" => "1",
"ts" => "2016-09-01T21:07:30.152Z",
"ip" => "153.65.199.92",
"js" => "{ \"sourcefile\":\"/usr\" }",
"sourcefile" => "/usr"
}
As you can see, the field sourcefile is directly known with the value in the rubydebug output.
Depending on the source of your log records, you might need to use the multiline codec as well. You might also want to delete the js field, rename #timestamp to _parsedate, and parse ts into the record's timestamp (for Kibana to be happy). This is not shown in the sample. I would also remove message to save space.
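The cleanup steps mentioned above are not shown in the sample; a rough sketch of what they might look like, combined with the grok/json pipeline from the answer:

```
filter {
  # split line into timestamp, IP, and JSON part
  grok { match => [ "message", "%{NOTSPACE:ts} %{NOTSPACE:ip} %{GREEDYDATA:js}" ] }
  # parse the JSON part into top-level fields
  json { source => "js" }
  # parse ts into @timestamp so Kibana is happy
  date { match => [ "ts", "ISO8601" ] }
  # drop the intermediate fields to save space
  mutate { remove_field => [ "js", "message" ] }
}
```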

logstash mutate filter always stringifies hash and array

I have a JSON log file, which I am reading with this input config (the filename is a placeholder):
input {
  file {
    path => "filename"
    codec => json_lines
  }
}
Each line is a deeply nested JSON.
In the filters,
When I say
mutate { add_field => { "new_field_name" => "%{old_field_name}" } }
if the old_field is a nested hash/array it is converted to a string and then added. Is there anyway I can preserve the type instead of stringifying it ?
You might consider using the ruby filter to duplicate the array:
filter {
ruby { code => "event['newhash'] = event['myhash']" }
}
I don't think there is a cleaner solution.
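Note that event['myhash'] / event['newhash'] is the pre-5.0 event API. On Logstash 5.x and later, the same idea would be written with the get/set event API (a sketch):

```
filter {
  ruby { code => "event.set('newhash', event.get('myhash'))" }
}
```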
Apart from that, do you really use the json_lines codec with the file input? From logstash doc:
Do not use this codec if your source input is line-oriented JSON, for
example, redis or file inputs. Rather, use the json codec.

Logstash: Received an event that has a different character encoding

when using logstash I see an error like this one :
Received an event that has a different character encoding than you configured. {:text=>"2014-06-22T11:49:57.832631+02:00 10.17.22.37 date=2014-06-22 time=11:49:55 device_id=LM150D9L23000422 log_id=0312318759 type=statistics pri=information session_id=\\\"s617nnE2019973-s617nnE3019973\\\" client_name=\\\"[<IP address>]\\\" dst_ip=\\\"<ip address>\\\" from=\\\"machin#machin.fr\\\" to=\\\"truc#machin.fr\\\" polid=\\\"0:1:1\\\" domain=\\\"machin.fr\\\" subject=\\\"\\xF0\\xCC\\xC1\\xD4\\xC9 \\xD4\\xCF\\xCC\\xD8\\xCB\\xCF \\xDA\\xC1 \\xD0\\xD2\\xCF\\xC4\\xC1\\xD6\\xC9!\\\" mailer=\\\"mta\\\" resolved=\\\"OK\\\" direction=\\\"in\\\" virus=\\\"\\\" disposition=\\\"Quarantine\\\" classifier=\\\"FortiGuard AntiSpam\\\" message_length=\\\"1024\\\"", :expected_charset=>"UTF-8", :level=>:warn}
my logstash.conf is :
input {
  file {
    path => "/var/log/fortimail.log"
  }
}
filter {
  grok {
    # grok parsing for logs
  }
}
output {
  elasticsearch {
    host => "10.0.10.62"
    embedded => true
    cluster => "Mastertest"
    node_name => "MasterNode"
    protocol => "http"
  }
}
I do not know which codec should be used to get the correct format of events.
The problem is in the subject field.
This is because the default charset is UTF-8 and the incoming message contained a character that is not valid UTF-8.
To fix this, set the charset in the input section using a codec with the correct charset. For example:
file {
  path => "/var/log/http/access_log"
  type => apache_access_log
  codec => plain {
    charset => "ISO-8859-1"
  }
  stat_interval => 60
}
http://logstash.net/docs/1.3.3/codecs/plain
If you receive the logs from an external server, try:
input {
  udp {
    port => yourListenPort
    type => inputType
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}
I had the same error, used this, and it worked!
