logstash grok not working for some of the dataset - logstash

This follows up on my earlier post where I'm using the grok filter below to dissect data for visualization in Kibana. This is what I have in my Logstash conf file; it works for the original data as desired, but today I ran into a situation where it does not filter some of the data as expected.
A correctly parsed event looks like this in Kibana:
received_at:February 1st 2019, 21:00:04.105 float:0.5, 0.0 type:rmlog Hostname:dba- foxon93 Date:19/02/01 User_1:dv_vxehw @version:1 Hour_since:06 Command:rm -rf /data/rg/log
grok filter in logstash conf file:
match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:Command}" }
My logstash conf file:
input {
  file {
    path => [ "/data/mylogs/*.txt" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    type => "tac"
  }
}
filter {
  if [type] == "tac" {
    grok {
      match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} %{HOUR:hour2}:%{MINUTE:minute2} %{HOUR:hour3}:%{MINUTE:minute3} %{GREEDYDATA:Command}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "message", "_type", "_index", "_score" ]
    }
  }
}
output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      manage_template => false
      index => "tac-%{+YYYY.MM.dd}"
    }
  }
}
Below is the new data that is getting processed, but for it I'm not getting the Hostname, Command, etc. fields.
dbproj01,19/02/01,00:04,23-hrs,cvial,cvial 120804 0.0 0.0 106096 1200 pts/90 S Jan30 0:00 /bin/sh -c /bin/rm -f ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.cxt ../../../../../../
tools.lnx86/dfII/etc/context/64bit/hBrowser.toc ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.aux ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.ini ; (CUR_DIR=`pwd` ;
cd ../../../../obj/linux-x86-64/optimize/bin/virtuoso ; ${CUR_DIR}/../../../../../../tools.lnx86/dfII/bin/virtuoso -ilLoadIL hBrowserBuildContext.il -log hBrowserBuildContext.log -nograph && [ `/bi
n/grep -c Error hBrowserBuildContext.log` = 0 ]) || (echo '*** Error: Failed to build hBrowser context.' ; /bin/rm -f ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.cxt ../../../../..
/../tools.lnx86/dfII/etc/context/64bit/hBrowser.toc ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.aux ../../../../../../tools.lnx86/dfII/etc/context/64bit/hBrowser.ini ; exit 1),/pro
j/cvial/WS/BUNGEE/REBASE_190120-138_2/tools.lnx86/dfII/group/bin/src

I see your issue with the %{HOUR:hour2}:%{MINUTE:minute2} part: in the new data that position holds the date Jan30, not a time, and it also gets included in the %{DATA} section.
The pattern below, which makes those time fields optional, will handle it:
%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:Hour_since}:%{MINUTE:Mins_since},%{NUMBER}-%{WORD},%{USER:User_1},%{USER:User_2} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:hour2}:|)(?:%{MINUTE:minute2}|) (?:%{HOUR:hour3}:|)(?:%{MINUTE:minute3}|)%{GREEDYDATA:Command}
You can also use the Grok Debugger to test your patterns.
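For clarity, groups of the form (?:%{HOUR:hour2}:|) simply make each piece optional, so the line still matches when something like Jan30 appears where a time was expected. A minimal, hypothetical illustration of that idiom (the message layout and field names are invented for the example, not taken from the config above):
grok {
  # (?:PATTERN|) matches either the sub-pattern or nothing, making it optional
  match => { "message" => "%{USER:user} (?:%{HOUR:hh}:|)(?:%{MINUTE:mm}|) %{GREEDYDATA:rest}" }
}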

Related

How to convert my time with the date filter for kibana

I need to parse this log line:
30361 30485 494 8861012 42:42 /plw/modules/bin/Lx86_64/opx2-intranet.exe -I /plw/modules/bin/Lx86_64/opx2-intranet.dxl -H /plw/modules/bin/Lx86_64 -L /plw/PLW_PROD/modules/preload-intranet.ini -- plw-sysconsole -port 8400 -logdir /plw/PLW_PROD/httpdocs/admin/log/ -slaves 2
My goal is to recover the time it took the system to boot up, here 42:42. The problem is that the format can be mm:ss as here, or HH:mm:ss, for example 01:42:30. I'd like to know which grok pattern to use.
Here's my conf file:
input {
  file {
    path => [ "/home/mathis/Documents/*" ]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => ["%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{TIME:starttime}"] }
    remove_field => "message"
  }
  date {
    match => [ "starttime", "HH:mm:ss", "mm:ss" ]
  }
}
output {
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "logstash-local3-%{+YYYY.MM.dd}"
  }
}
Unfortunately, the built-in TIME pattern expects HH:mm:ss and does not cover mm:ss.
Solution:
%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}(?<starttime>((?!<[0-9])%{HOUR}:)?%{MINUTE}(?::%{SECOND})(?![0-9]))
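For context, a minimal filter sketch (reusing the grok match and date filter from the question, with the custom starttime capture substituted in) might look like this:
filter {
  grok {
    # the custom capture accepts both mm:ss and HH:mm:ss
    match => { "message" => ["%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}(?<starttime>((?!<[0-9])%{HOUR}:)?%{MINUTE}(?::%{SECOND})(?![0-9]))"] }
    remove_field => "message"
  }
  date {
    # try HH:mm:ss first, then fall back to mm:ss
    match => [ "starttime", "HH:mm:ss", "mm:ss" ]
  }
}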

Removing unwanted fields in Logstash configuration file

I'm building an ELK setup and it's working fine; however, I'm in a situation where I want to remove certain fields from my system-log data while processing it through Logstash, using remove_field & remove_tag, which I've defined in my Logstash configuration file, but that's not working.
Looking for any expert advice to correct the config and make it work; thanks very much in advance.
My logstash configuration file:
[root@sandbox-prd ~]# cat /etc/logstash/conf.d/syslog.conf
input {
  file {
    path => [ "/data/SYSTEMS/*/messages.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 64000
    type => "sj-syslog"
  }
}
filter {
  if [type] == "sj-syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => ["@version", "host", "_type", "_index", "_score", "path"]
      remove_tag => ["_grokparsefailure"]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  if [type] == "sj-syslog" {
    elasticsearch {
      hosts => "sandbox-prd02:9200"
      manage_template => false
      index => "sj-syslog-%{+YYYY.MM.dd}"
      document_type => "messages"
    }
  }
}
Data sample appearing on the Kibana Portal
syslog_pid:6662 type:sj-syslog syslog_message:(root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok) syslog_severity:notice syslog_hostname:dbaprod01 syslog_severity_code:5 syslog_timestamp:Feb 11 10:25:02 @timestamp:February 11th 2019, 23:55:02.000 message:Feb 11 10:25:02 dbaprod01 CROND[6662]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok) syslog_facility:user-level syslog_facility_code:1 syslog_program:CROND received_at:February 11th 2019, 10:25:03.353 _id:KpHo2mgBybCgY5IwmRPn _type:messages
_index:sj-syslog-2019.02.11 _score: -
My resource details:
OS version : Linux 7
Logstash Version: 6.5.4
You can't remove _type and _index; those are metadata fields that Elasticsearch needs in order to work, as they carry your index name and the mapping of your data. The _score field is also a metadata field, generated at search time, so it isn't on your document either.
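In practice that means keeping only real event fields in remove_field. A trimmed sketch of the grok block from the question, with the metadata fields dropped from the list, could look like this:
grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  add_field => [ "received_at", "%{@timestamp}" ]
  # only fields that exist on the event itself can be removed here;
  # _index, _type and _score are Elasticsearch metadata, not event fields
  remove_field => [ "@version", "host", "path" ]
  remove_tag => [ "_grokparsefailure" ]
}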

How to write a Multiple input and logstash filter for hostname based on message field

As a newbie to Logstash, I would like to understand this: I have two types of logs, Linux system logs and Cisco switch logs, and I now want to create different inputs and filters for both.
I have defined the type for Linux logs as syslog and for Cisco switches as APIC, and I want to define the filter section for each. A sample of my Cisco log pattern is below; the switch name is the 7th field in the message, so I wonder how to take that 7th field as the hostname for the switches.
Aug 23 16:36:58 Aug 23 11:06:58.830 mydc-leaf-3-5 %LOG_-1-SYSTEM_MSG [E4210472][transition][info][sys] sent user message to syslog group:Syslog_Elastic_Server:final
Below is my logstash-syslog.conf file, which is working for syslog but still needs work for the Cisco logs, i.e. type => APIC.
# cat logstash-syslog.conf
input {
  file {
    path => [ "/scratch/rsyslog/*/messages.log" ]
    type => "syslog"
  }
  file {
    path => [ "/scratch/rsyslog/Aug/messages.log" ]
    type => "APIC"
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
  if [type] == "APIC" {
    grok {
      match => { "message" => "%{CISCOTIMESTAMP:syslog_timestamp} %{CISCOTIMESTAMP} %{SYSLOGHOST:syslog_hostname} %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
  }
}
output {
  #if "automount" in [message] or "ldap" in [message] {
  elasticsearch {
    hosts => "noida-elk:9200"
    index => "syslog-%{+YYYY.MM.dd}"
    #index => "%{[type]}-%{+YYYY.MM.dd}"
    #index => "%{index}-%{+YYYY.MM.dd}"
    #type => "%{type}
    document_type => "messages"
  }
}
The filter works correctly for the message below and I get the syslog_hostname field as expected; in this case I get linuxdev.
Aug 24 10:34:02 linuxdev automount[1905]: key ".P4Config" not found in map source(s).
The filter does not work for the message below:
Aug 24 10:26:22 Aug 24 04:56:22.444 my-apic-1 %LOG_-3-SYSTEM_MSG [F1546][soaking_clearing][packets-dropped][minor][dbgs/ac/sdvpcpath-207-208-to-109-110/fault-F1546] 2% of packets were dropped during the last collection interval
After some grokking here is my pattern for Cisco APIC syslogs:
%{SYSLOG5424PRI:initial_code}%{CISCOTIMESTAMP:cisco_timestamp}%{SPACE}%{TZ}%{ISO8601_TIMEZONE}%{SPACE}%{URIHOST:uri_host}%{SPACE}%{SYSLOGPROG:syslog_prog}%{SPACE}%{SYSLOG5424SD:message_code}%{SYSLOG5424SD:message_type}%{SYSLOG5424SD:message_class}%{NOTSPACE:message_dn}%{SPACE}%{GREEDYDATA:message_content}
Feedback on how to improve it is welcome.
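As a rough sketch, that pattern would slot into the APIC branch of the filter like this, assuming the raw APIC lines carry the syslog PRI prefix the pattern expects (treat it as a starting point rather than a finished config):
filter {
  if [type] == "APIC" {
    grok {
      match => { "message" => "%{SYSLOG5424PRI:initial_code}%{CISCOTIMESTAMP:cisco_timestamp}%{SPACE}%{TZ}%{ISO8601_TIMEZONE}%{SPACE}%{URIHOST:uri_host}%{SPACE}%{SYSLOGPROG:syslog_prog}%{SPACE}%{SYSLOG5424SD:message_code}%{SYSLOG5424SD:message_type}%{SYSLOG5424SD:message_class}%{NOTSPACE:message_dn}%{SPACE}%{GREEDYDATA:message_content}" }
    }
  }
}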

logstash grok multiline - how to merge to previous line any line that doesn't start with timestamp

Sometimes I print indented, pretty-printed JSON to the log, which spans multiple lines, so I need to be able to tell Logstash to append those lines to the original line of the original event.
example:
xxx p:INFO d:2015-07-21 11:11:58,906 sourceThread:3iMind-Atlas-akka.actor.default-dispatcher-2 queryUserId: queryId: hrvJobId:6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4 etlStreamId:70 etlOmdId: etlDocId: logger:tim.atlas.module.etl.mq.MQConnectorEtl msg:(st:Consuming) received NotifyMQ. sending to [openmind_exchange/job_ack] message:
{
"JobId" : "6c1a4d60-e5e6-40d8-80aa-a4dc00e9f0c4",
"Time" : "2015-07-21T11:11:58.904Z",
"Errors" : [ ],
"FeedItemSchemaCounts" : {
"Document" : 1,
"DocumentMetadata" : 1
},
"OtherSchemaCounts" : { }
}
Since I've set up a special log4j appender to function solely as Logstash input, this task should be quite easy. I control the layout of the log, so I can add as many prefix/suffix indicators as I please.
Here's what my appender looks like:
log4j.appender.logstash-input.layout.ConversionPattern=xxx p:%p d:%d{yyyy-MM-dd HH:mm:ss,SSS}{UTC} sourceThread:%X{sourceThread} queryUserId:%X{userId} queryId:%X{queryId} hrvJobId:%X{hrvJobId} etlStreamId:%X{etlStreamId} etlOmdId:%X{etlOmdId} etlDocId:%X{etlDocId} logger:%c msg:%m%n
As you can see, I've prefixed every message with 'xxx' so I can tell Logstash to append any line which doesn't start with 'xxx' to the previous line.
Here's my Logstash configuration:
if [type] == "om-svc-atlas" {
  grok {
    match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
    add_tag => "om-svc-atlas"
  }
  date {
    match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
    timezone => "UTC"
  }
  multiline {
    pattern => "<please tell me what to put here to tell logstash to append any line which doesnt start with xxx to the previous line>"
    what => "previous"
  }
}
Yes, it was easy indeed:
if [type] == "om-svc-atlas" {
  grok {
    match => [ "message" , "(?m)p:%{LOGLEVEL:loglevel} d:%{TIMESTAMP_ISO8601:logdate} sourceThread:%{GREEDYDATA:sourceThread} queryUserId:%{GREEDYDATA:userId} queryId:%{GREEDYDATA:queryId} hrvJobId:%{GREEDYDATA:hrvJobId} etlStreamId:%{GREEDYDATA:etlStreamId} etlOmdId:%{GREEDYDATA:etlOmdId} etlDocId:%{GREEDYDATA:etlDocId} logger:%{GREEDYDATA:logger} msg:%{GREEDYDATA:msg}" ]
    add_tag => "om-svc-atlas"
  }
  date {
    match => [ "logdate" , "YYYY-MM-dd HH:mm:ss,SSS" ]
    timezone => "UTC"
  }
  multiline {
    pattern => "^(?!xxx).+"
    what => "previous"
  }
}
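Note that the standalone multiline filter was later deprecated and removed; on recent Logstash versions the same behavior is usually configured as a multiline codec on the input instead. A rough equivalent would be the following (the file path is only a placeholder):
input {
  file {
    path => "/var/log/myapp/om-svc-atlas.log"
    codec => multiline {
      # any line that does not start with 'xxx' belongs to the previous event
      pattern => "^xxx"
      negate => true
      what => "previous"
    }
  }
}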

Logstash 1.4.2 grok filter: _grokparsefailure

I am trying to parse this log line:
- 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Command-line options for this run:
Here's the Logstash config file I use:
input {
  stdin {}
}
filter {
  grok {
    match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} %{JAVACLASS:class} %{DATA:mydata} " ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
  stdout { codec => rubydebug }
}
Here's the output I get:
{
  "message" => " - 2014-04-29 13:04:23,733 [main] INFO (api.batch.ThreadPoolWorker) Commans run:",
  "@version" => "1",
  "@timestamp" => "2015-02-02T10:53:58.282Z",
  "host" => "NAME_001.corp.com",
  "tags" => [
    [0] "_grokparsefailure"
  ]
}
Could anyone please help me find where the problem is in the grok pattern?
I tried to parse that line in http://grokdebug.herokuapp.com/ but it parses only the timestamp, %{WORD} and %{LOGLEVEL}; the rest is ignored!
There are two errors in your config.
First
The grok error is in the JAVACLASS part: you have to include the parentheses in the pattern, for example \(%{JAVACLASS:class}\).
Second
The date filter's match takes two values: the first is the field you want to parse, so in your example it is time, not timestamp; the second is the date pattern. You can refer to the date filter documentation.
Here is the config:
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", " - %{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel} \(%{JAVACLASS:class}\) %{GREEDYDATA:mydata}" ]
  }
  date {
    match => [ "time" , "YYYY-MM-dd HH:mm:ss,SSS" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
FYI. Hope this can help you.
