GROK pattern not returning results - logstash

I am very new to ELK and I am stuck at extracting fields. Below is the sample data:
Dec 9 06:36:01 s-login-01 CRON[2436102]: pam_unix(cron:session): session closed for user mXXt
Dec 9 06:34:07 s-login-01 sshd[2424671]: Disconnected from user sw 10.xx.1x.xx port 4000
Dec 9 06:34:05 s-login-01 systemd-logind[2405]: Session 20923 logged out. Waiting for processes to exit.
For the above sample data I want to know how to write the .conf file. I tried the .conf below, but it did not extract the fields.
input {
  file {
    path => "/../syslog.log"
    type => "syslog"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp}%{SPACE}%{NOTSPACE:HOST}%{SPACE}%{NOTSPACE:PROCESS}\[%{NUMBER:PID}\]\:%{GREEDYDATA:activity}" }
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp}%{SPACE}%{NOTSPACE:HOST}%{SPACE}%{NOTSPACE:PROCESS}\[%{NUMBER:PID}\]\:%{GREEDYDATA:activity}.?*%{WORD:User}" }
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp}%{SPACE}%{NOTSPACE:HOST}%{SPACE}%{NOTSPACE:PROCESS}\[%{NUMBER:PID}\]\:%{GREEDYDATA:activity}.?*%{WORD:User}.?*%{IP:IP}.?*%{NUMBER:Port}" }
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp}%{SPACE}%{NOTSPACE:HOST}%{SPACE}%{NOTSPACE:PROCESS}\[%{NUMBER:PID}\]\:%{GREEDYDATA:activity}.?*%{NUMBER:Session_ID}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "sample_"
  }
}

Here is the single grok pattern that satisfies all the sample data you provided:
%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{GREEDYDATA:MSG}
I used the Grok Debugger to create the pattern.
Find below the grok patterns for the individual sample lines:
Dec 9 06:34:07 s-login-01 sshd[2424671]: Disconnected from user sw 10.xx.1x.xx port 4000
%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{DATA:ACTIVITY} %{DATA:msg} user %{DATA:USER} %{IP:IPADDRESS} %{DATA:msg} %{BASE10NUM:PORT}
Dec 9 06:36:01 s-login-01 CRON[2436102]: pam_unix(cron:session): session closed for user mXXt
%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{GREEDYDATA:ACTIVITY} user %{WORD:USER}
Dec 9 06:34:05 s-login-01 systemd-logind[2405]: Session 20923 logged out. Waiting for processes to exit.
%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: Session %{BASE10NUM:SESSION_ID} %{GREEDYDATA:ACTIVITY}
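Not part of the original answer, but as a sketch of how these patterns could be wired into a single grok filter: the match option accepts an array of patterns, and grok stops at the first one that matches, so the more specific patterns go first and the catch-all goes last (field names are the ones from the patterns above):
filter {
  grok {
    match => {
      "message" => [
        # most specific first: sshd disconnect lines with user, IP and port
        "%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{DATA:ACTIVITY} %{DATA:msg} user %{DATA:USER} %{IP:IPADDRESS} %{DATA:msg} %{BASE10NUM:PORT}",
        # systemd-logind session lines
        "%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: Session %{BASE10NUM:SESSION_ID} %{GREEDYDATA:ACTIVITY}",
        # pam_unix/CRON session lines ending in a user name
        "%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{GREEDYDATA:ACTIVITY} user %{WORD:USER}",
        # catch-all for anything else
        "%{SYSLOGTIMESTAMP:TIMESTAMP} %{DATA:HOST} %{DATA:PROCESS}\[%{BASE10NUM:PID}\]: %{GREEDYDATA:MSG}"
      ]
    }
  }
}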

Related

logstash mix json and plain content

I use Logstash as a syslog relay; it forwards the data to Graylog and writes the data to a file.
I use the dns filter to replace the IP with the FQDN, and after that I can't write the raw content to the file any more; the IP is "json-ed".
What I get:
2022-05-17T15:17:01.580175Z {ip=vm2345.lab.com} <86>1 2022-05-17T17:17:01.579496+02:00 vm2345 CRON 2057538 - - pam_unix(cron:session): session closed for user root
What I want to get:
2022-05-17T15:17:01.580175Z vm2345.lab.com <86>1 2022-05-17T17:17:01.579496+02:00 vm2345 CRON 2057538 - - pam_unix(cron:session): session closed for user root
My config:
input {
  syslog {
    port => 514
    type => "rsyslog"
  }
}
filter {
  if [type] == "rsyslog" {
    dns {
      reverse => [ "[host][ip]" ]
      action => "replace"
    }
  }
}
output {
  if [type] == "rsyslog" {
    gelf {
      host => "graylog.lab.com"
      port => 5516
    }
    file {
      path => "/data/%{+YYYY}/%{+MM}/%{+dd}/%{[host][ip]}/%{[host][ip]}_%{{yyyy_MM_dd}}.log"
      codec => "line"
    }
    stdout { }
  }
}
What's the best way to handle this?
When you use codec => line there is no default for the format option, so the codec calls .to_s on the event. The to_s method of an event concatenates the @timestamp, the [host] field, and the [message] field. You want the [host][ip] field, not the [host] field (which is an object), so tell the codec that:
codec => line { format => "%{@timestamp} %{[host][ip]} %{message}" }
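For concreteness, a sketch of just the file output from the config above with that codec applied (the gelf and stdout outputs are unchanged and omitted here):
file {
  path => "/data/%{+YYYY}/%{+MM}/%{+dd}/%{[host][ip]}/%{[host][ip]}_%{{yyyy_MM_dd}}.log"
  # print timestamp, resolved host IP/FQDN and the raw message, nothing else
  codec => line { format => "%{@timestamp} %{[host][ip]} %{message}" }
}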

Logstash aggregation return empty message

I have a testing environment to try out some Logstash plugins before moving to production.
For now I am using the Kiwi syslog generator to generate some syslog messages for testing.
The fields I have are as follows:
@timestamp
message
+ elastic metadata
Starting from these basic fields, I start filtering my data.
The first thing is to add a new field based on the timestamp and message, as follows:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp", "message", "newfield", "message_count"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
}
The prune filter is just there so I don't process unwanted data.
And this works just fine, as I am getting a new field with those two values.
The next step was to run some aggregation based on specific content of the message, such as whether the message contains logged in or logged out,
and to do this I used the aggregate filter:
grok {
  match => {
    "message" => [
      "(?<[@metadata][event_type]>logged out)",
      "(?<[@metadata][event_type]>logged in)",
      "(?<[@metadata][event_type]>workstation locked)"
    ]
  }
}
aggregate {
  task_id => "%{message}"
  code => "
    map['message_count'] ||= 0; map['message_count'] += 1;
  "
  push_map_as_event_on_timeout => true
  timeout_timestamp_field => "@timestamp"
  timeout => 60
  inactivity_timeout => 50
  timeout_tags => ['_aggregatetimeout']
}
This worked as expected, but I am having a problem here: when the aggregation times out, the only field populated for that aggregation is message_count.
In the resulting timeout event, both newfield and message are empty.
For demonstration and testing purposes that is absolutely fine, but it will become unmanageable if I get hundreds of syslog messages per second and don't know which message each message_count refers to.
Please, I am struggling here and don't know how to solve this issue. Can somebody help me understand how I can fill newfield with the content of the message it refers to?
This is my whole Logstash configuration, to make it easier:
input {
  syslog {
    port => 514
  }
}
filter {
  prune {
    whitelist_names => ["timestamp", "message", "newfield", "message_count"]
  }
  mutate {
    add_field => { "newfield" => "%{@timestamp}%{message}" }
  }
  grok {
    match => {
      "message" => [
        "(?<[@metadata][event_type]>logged out)",
        "(?<[@metadata][event_type]>logged in)",
        "(?<[@metadata][event_type]>workstation locked)"
      ]
    }
  }
  aggregate {
    task_id => "%{message}"
    code => "
      map['message_count'] ||= 0; map['message_count'] += 1;
    "
    push_map_as_event_on_timeout => true
    timeout_timestamp_field => "@timestamp"
    timeout => 60
    inactivity_timeout => 50
    timeout_tags => ['_aggregatetimeout']
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
  csv {
    path => "C:\Users\adminuser\Desktop\syslog\syslogs-%{+yyyy.MM.dd}.csv"
    fields => ["timestamp", "message", "message_count", "newfield"]
  }
}
push_map_as_event_on_timeout => true
When you use this and a timeout occurs, it creates a new event using the contents of the map. If you want fields from the original messages to be in the new event, then you have to add them to the map. For the task_id there is a shorthand notation to do this using the timeout_task_id_field option on the filter; otherwise you have to add them explicitly:
map['newfield'] ||= event.get('newfield');
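Putting that together, a sketch of the aggregate block from the question with both the explicit map entry and the timeout_task_id_field shorthand (which copies the task_id, i.e. the message, into the timeout event):
aggregate {
  task_id => "%{message}"
  code => "
    # copy fields from the original event into the map so they survive the timeout
    map['newfield'] ||= event.get('newfield');
    map['message_count'] ||= 0; map['message_count'] += 1;
  "
  push_map_as_event_on_timeout => true
  timeout_timestamp_field => "@timestamp"
  # shorthand: write the task_id back into the 'message' field of the timeout event
  timeout_task_id_field => "message"
  timeout => 60
  inactivity_timeout => 50
  timeout_tags => ['_aggregatetimeout']
}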

Logstash multiline codec ignore last event / line

Logstash's multiline codec ignores my last event (line) until the next batch of logs is sent.
My logstash.conf:
input {
  http {
    port => "5001"
    codec => multiline {
      pattern => "^\[%{TIMESTAMP_ISO8601}\]"
      negate => true
      what => previous
      auto_flush_interval => 15
    }
  }
}
filter {
  grok {
    match => { "message" => "(?m)\[%{TIMESTAMP_ISO8601:timestamp}\]\s\<%{LOGLEVEL:log-level}\>\s\[%{WORD:component}\]\s%{GREEDYDATA:log-message}" }
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "%{+YYYY-MM-dd}"
  }
}
Moreover, the solution with auto_flush_interval does not work.
For example:
input using Postman:
[2017-07-11 22:32:12.345] [KCU] Component initializing
Exception in thread "main" java.lang.NullPointerException
at com.example.myproject.Book.getTitle(Book.java:16)
[2017-07-11 22:32:16.345] [KCU] Return with status 1
output - only one event (should be two):
[2017-07-11 22:32:12.345] [KCU] Component initializing
Exception in thread "main" java.lang.NullPointerException
at com.example.myproject.Book.getTitle(Book.java:16)
I need this last line.
Question:
Am I doing something wrong, or is there a problem with the multiline codec? How can I fix this?
I'm afraid you're using the multiline codec wrong. Let's take a look at your configuration:
codec => multiline {
  pattern => "^\[%{TIMESTAMP_ISO8601}\]"
  negate => true
  what => previous
}
It says: if a log line does not (negate => true) start with an ISO timestamp (pattern), append it to the previous log line (what => previous).
But the log line you're missing starts with an ISO timestamp:
[2017-07-11 22:32:16.345] [KCU] Return with status 1
So it will not be appended to the previous log lines, but will instead create a new document in Elasticsearch.
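To illustrate against the sample input, the same codec with each line annotated (a sketch, no options changed):
codec => multiline {
  pattern => "^\[%{TIMESTAMP_ISO8601}\]"
  negate => true
  what => previous
  auto_flush_interval => 15
  # [2017-07-11 22:32:12.345] ... Component initializing -> matches, starts event 1
  # Exception in thread "main" ...                       -> no match, appended to event 1
  # at com.example.myproject.Book.getTitle(Book.java:16) -> no match, appended to event 1
  # [2017-07-11 22:32:16.345] ... Return with status 1   -> matches, flushes event 1 and
  #                                                         starts a new, separate event 2
}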

Logstash Grok Filter key/value pairs

Working on getting our ESET log files (JSON format) into Elasticsearch. I'm shipping logs to our syslog server (syslog-ng), then to Logstash, and then to Elasticsearch. Everything is going as it should. My problem is in trying to process the logs in Logstash: I cannot seem to separate the key/value pairs into separate fields.
Here's a sample log entry:
Jul 8 11:54:29 192.168.1.144 1 2016-07-08T15:55:09.629Z era.somecompany.local ERAServer 1755 Syslog {"event_type":"Threat_Event","ipv4":"192.168.1.118","source_uuid":"7ecab29a-7db3-4c79-96f5-3946de54cbbf","occured":"08-Jul-2016 15:54:54","severity":"Warning","threat_type":"trojan","threat_name":"HTML/Agent.V","scanner_id":"HTTP filter","scan_id":"virlog.dat","engine_version":"13773 (20160708)","object_type":"file","object_uri":"http://malware.wicar.org/data/java_jre17_exec.html","action_taken":"connection terminated","threat_handled":true,"need_restart":false,"username":"BATHSAVER\\sickes","processname":"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"}
Here is my logstash conf:
input {
  udp {
    type => "esetlog"
    port => 5515
  }
  tcp {
    type => "esetlog"
    port => 5515
  }
}
filter {
  if [type] == "esetlog" {
    grok {
      match => { "message" => "%{DATA:timestamp}\ %{IPV4:clientip}\ <%{POSINT:num1}>%{POSINT:num2}\ %{DATA:syslogtimestamp}\ %{HOSTNAME}\ %{IPORHOST}\ %{POSINT:syslog_pid\ %{DATA:type}\ %{GREEDYDATA:msg}" }
    }
    kv {
      source => "msg"
      value_split => ":"
      target => "kv"
    }
  }
}
output {
  elasticsearch {
    hosts => ['192.168.1.116:9200']
    index => "eset-%{+YYY.MM.dd}"
  }
}
When the data is displayed in Kibana, everything other than the date and time is lumped together in the "message" field only, with no separate key/value pairs.
I've been reading and searching for a week now. I've done similar things with other log files with no problems at all, so I'm not sure what I'm missing. Any help/suggestions are greatly appreciated.
Can you try the Logstash configuration below?
grok {
  match => {
    "message" => ["%{CISCOTIMESTAMP:timestamp} %{IPV4:clientip} %{POSINT:num1} %{TIMESTAMP_ISO8601:syslogtimestamp} %{USERNAME:hostname} %{USERNAME:iporhost} %{NUMBER:syslog_pid} Syslog %{GREEDYDATA:msg}"]
  }
}
json {
  source => "msg"
}
It's working and tested at http://grokconstructor.appspot.com/do/match#result
Regards.
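For reference, a sketch of how that could slot into the original filter section (grok pattern and json filter exactly as in the answer, inside the existing type conditional; the json filter expands the JSON captured in msg into top-level fields):
filter {
  if [type] == "esetlog" {
    grok {
      match => {
        "message" => ["%{CISCOTIMESTAMP:timestamp} %{IPV4:clientip} %{POSINT:num1} %{TIMESTAMP_ISO8601:syslogtimestamp} %{USERNAME:hostname} %{USERNAME:iporhost} %{NUMBER:syslog_pid} Syslog %{GREEDYDATA:msg}"]
      }
    }
    # parse the JSON payload captured in msg into separate fields
    json {
      source => "msg"
    }
  }
}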

Logstash http_poller only shows last logmessage in Kibana

I am using Logstash to fetch a log from a URL using http_poller. This works fine. The problem is that the received log does not get sent to Elasticsearch in the right way. I tried splitting the result into different events, but the only event that shows up in Kibana is the last event from the log. Since I am pulling the log every 2 minutes, a lot of log information gets lost this way.
The input is like this:
input {
  http_poller {
    urls => {
      logger1 => {
        method => get
        url => "http://servername/logdirectory/thislog.log"
      }
    }
    keepalive => true
    automatic_retries => 0
    # Check the site every 2 minutes
    interval => 120
    request_timeout => 110
    # Wait no longer than 110 seconds for the request to complete
    # Store metadata about the request in this field
    metadata_target => http_poller_metadata
    type => 'log4j'
    codec => "json"
    # important tag settings
    tags => stackoverflow
  }
}
I then use a filter to add some fields and to split the logs
filter {
  if "stackoverflow" in [tags] {
    split {
      terminator => "\n"
    }
    mutate {
      add_field => {
        "Application" => "app-stackoverflow"
        "Environment" => "Acceptation"
      }
    }
  }
}
The output then gets sent to the Kibana server using the following output config:
output {
  redis {
    host => "kibanaserver.internal.com"
    data_type => "list"
    key => "logstash N"
  }
}
Any suggestions as to why not all the events are stored in Kibana?
