I want to create a custom GROK for my log - logstash

log example:
30-10-22 20:35:36 [DEBUG] [Default] [Worker] Sleeping for 10 seconds...
I can get %{TIMESTAMP_ISO8601:timestamp} or %{LOGLEVEL:log-level},
but I need only the message "Sleeping for 10 seconds...".
%{GREEDYDATA:message} returns the whole line instead of just:
Sleeping for 10 seconds...
Example from the OpenSearch output:
@timestamp: Nov 2, 2022 @ 12:30:45.074
@version: 1
_id: Z_nkN4QBewLz0DWJsjKB
_index: logstash-logs-2022.11.02
_score: -
_type: _doc
host: ip-10-155-29-101
log-level_pvt: DEBUG
message: 30-10-22 19:58:28 [DEBUG] [Default] [Worker] Sleeping for 10 seconds...
message_pvt: 30-10-22 19:58:28 [DEBUG] [Default] [Worker] Sleeping for 10 seconds...
path: /home/ubuntu/logstashdir/WorkerLogs/EC2AMAZ-06N2AJA_2.log
timestamp_pvt: 30-10-22 19:58:28
In JSON format:
{
  "_index": "logstash-logs-2022.11.02",
  "_type": "_doc",
  "_id": "Z_nkN4QBewLz0DWJsjKB",
  "_version": 1,
  "_score": null,
  "_source": {
    "@version": "1",
    "timestamp_pvt": "30-10-22 19:58:28",
    "message_pvt": "30-10-22 19:58:28\t[DEBUG]\t[Default]\t[Worker]\t Sleeping for 10 seconds... is 113\r",
    "@timestamp": "2022-11-02T10:30:45.074Z",
    "host": "ip-10-155-29-101",
    "path": "/home/ubuntu/logstashdir/WorkerLogs/EC2AMAZ-06N2AJA_2.log",
    "log-level_pvt": "DEBUG",
    "message": "30-10-22 19:58:28\t[DEBUG]\t[Default]\t[Worker]\t Sleeping for 10 seconds... 113\r"
  },
  "fields": {
    "@timestamp": [
      "2022-11-02T10:30:45.074Z"
    ]
  },
  "sort": [
    1667385045074
  ]
}
Any advice?
filter {
  grok {
    match => { "message" => "%{LOGLEVEL:log-level_pvt}" }
  }
  grok {
    match => { "message" => "%{GREEDYDATA:message_pvt}" }
  }
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp_pvt}" }
  }
}
message is:
30-10-22 19:58:28 [DEBUG] [Default] [Worker] Sleeping for 10 seconds...
I want that message to be:
Sleeping for 10 seconds...

You can use the overwrite parameter to overwrite the original message field.
For performance reasons, combine the grok statements into a single filter and anchor the pattern with ^.
grok {
  overwrite => "message"
  match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp_pvt}\t\[%{LOGLEVEL:log-level_pvt}\]\t\[%{WORD}\]\t\[%{WORD}\]%{GREEDYDATA:message}$" }
}
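
If the captured message keeps a leading tab or space from the delimiter, a mutate strip can trim it (a minimal sketch, assuming the grok above has already overwritten message):

mutate {
  # Remove leading and trailing whitespace from the overwritten message field
  strip => [ "message" ]
}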

Related

All messages receive a "user level notice"

I'm trying to parse messages from my network devices, which send messages in a format similar to:
<30>Feb 14 11:33:59 wireless: ath0 Sending auth to xx:xx:xx:xx:xx:xx. Status: The request has been declined due to MAC ACL (52).\n
<190>Feb 14 11:01:29 CCR00 user admin logged out from xx.xx.xx.xx via winbox
<134>2023 Feb 14 11:00:33 ZTE command-log:An alarm 36609 level notification occurred at 11:00:33 02/14/2023 CET sent by MCP GponRm notify: <gpon-onu_1/1/1:1> SubType:1 Pos:1 ONU Uni lan los. restore\n on \n
using this logstash.conf file:
input {
  beats {
    port => 5044
  }
  tcp {
    port => 50000
  }
  udp {
    port => 50000
  }
}

## Add your filters / logstash plugins configuration here
filter {
  grok {
    match => {
      "message" => "^(?:<%{POSINT:syslog_pri}>)?%{GREEDYDATA:message_payload}"
    }
  }
  syslog_pri {
  }
  mutate {
    remove_field => [ "@version", "message" ]
  }
}

output {
  stdout {}
  elasticsearch {
    hosts => "elasticsearch:9200"
    user => "logstash_internal"
    password => "${LOGSTASH_INTERNAL_PASSWORD}"
  }
}
which results in this output
{
  "@timestamp": [
    "2023-02-14T10:38:59.228Z"
  ],
  "data_stream.dataset": [
    "generic"
  ],
  "data_stream.namespace": [
    "default"
  ],
  "data_stream.type": [
    "logs"
  ],
  "event.original": [
    "<14> Feb 14 11:38:59 UBNT BOXSERV[boxs Req]: boxs.c(691) 55381193 %% Error 17 occurred reading thermal sensor 2 data\n\u0000"
  ],
  "host.ip": [
    "10.125.132.10"
  ],
  "log.syslog.facility.code": [
    1
  ],
  "log.syslog.facility.name": [
    "user-level"
  ],
  "log.syslog.severity.code": [
    5
  ],
  "log.syslog.severity.name": [
    "notice"
  ],
  "message_payload": [
    " Feb 14 11:38:59 UBNT[boxs Req]: boxs.c(691) 55381193 %% Error 17 occurred reading thermal sensor 2 data\n\u0000"
  ],
  "syslog_pri": [
    "14"
  ],
  "_id": "UzmBT4YBAZPdbqc4m_IB",
  "_index": ".ds-logs-generic-default-2023.02.04-000001",
  "_score": null
}
which is mostly satisfactory, but I would expect the log.syslog.facility.name and log.syslog.severity.name fields to be processed by the syslog_pri filter, so that an input of <14> would result in secur/auth and Alert respectively. Instead I keep getting the default user-level notice for all my messages, no matter what the priority part of the syslog message contains.
Could anyone advise and maybe fix my .conf syntax, if it's wrong? Thank you very much!
I have Logstash configured properly to receive logs and send them to Elasticsearch, but the grok/syslog_pri doesn't yield the expected results.
The fact that the syslog_pri filter is setting [log][syslog][facility][code] shows that it has ECS compatibility enabled. As a result, if you do not set the syslog_pri_field_name option on the syslog_pri filter, it will try to parse [log][syslog][priority]. If that field does not exist then it will parse the default value of 13, which is user-level/notice.
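For example, keeping the field name that the grok pattern in the question captures (a minimal sketch; syslog_pri here is simply that captured field):

syslog_pri {
  # Parse the numeric priority captured by grok instead of [log][syslog][priority]
  syslog_pri_field_name => "syslog_pri"
}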
Thank you for the answer, I have adjusted the code per the given advice:
filter {
  grok {
    match => { "message" => "^(?:<%{POSINT:syslog_code}>)?%{GREEDYDATA:message_payload}" }
  }
  syslog_pri {
    syslog_pri_field_name => "syslog_code"
  }
  mutate {
    remove_field => [ "@version", "message" ]
  }
}
and now it behaves as intended
"event" => {
"original" => "<30>Feb 15 18:41:04 dnsmasq-dhcp[960]: DHCPACK(eth0) 10.0.0.165 xx:xx:xx:xx:xx CZ\n"
},
"#timestamp" => 2023-02-15T17:41:04.977038615Z,
"message_payload" => "Feb 15 18:41:04 dnsmasq-dhcp[960]: DHCPACK(eth0) 10.0.0.165 xx:xx:xx:xx:xx CZ\n",
"log" => {
"syslog" => {
"severity" => {
"code" => 6,
"name" => "informational"
},
"facility" => {
"code" => 3,
"name" => "daemon"
}
}
},
"syslog_code" => "30",
"host" => {
"ip" => "xx.xx.xx.xx"
} }
I will adjust the message a bit to fit my needs, but that is out of the scope of this question.
Thank you very much!

LogStash Conf | Drop Empty Lines

The contents of Logstash's conf file look like this:
input {
  beats {
    port => 5044
  }
  file {
    path => "/usr/share/logstash/iway_logs/*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    #ignore_older => 0
    codec => multiline {
      pattern => "^\[%{NOTSPACE:timestamp}\]"
      negate => true
      what => "previous"
      max_lines => 2500
    }
  }
}

filter {
  grok {
    match => { "message" =>
      ['(?m)\[%{NOTSPACE:timestamp}\]%{SPACE}%{WORD:level}%{SPACE}\(%{NOTSPACE:entity}\)%{SPACE}%{GREEDYDATA:rawlog}']
    }
  }
  date {
    match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSS" ]
    target => "@timestamp"
  }
  grok {
    match => { "entity" => ['(?:W.%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}\.%{GREEDYDATA:workerid}|W.%{GREEDYDATA:channel}\.%{GREEDYDATA:workerid}|%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}\.%{GREEDYDATA:workerid}|%{GREEDYDATA:channel}:%{GREEDYDATA:inlet}:%{GREEDYDATA:listener}|%{GREEDYDATA:channel})'] }
  }
  dissect {
    mapping => {
      "[log][file][path]" => "/usr/share/logstash/iway_logs/%{serverName}#%{configName}#%{?ignore}.log"
    }
  }
}

output {
  elasticsearch {
    hosts => "${ELASTICSEARCH_HOST_PORT}"
    index => "iway_"
    user => "${ELASTIC_USERNAME}"
    password => "${ELASTIC_PASSWORD}"
    ssl => true
    ssl_certificate_verification => false
    cacert => "/certs/ca.crt"
  }
}
As one can make out, the idea is to parse a custom log employing multiline extraction. The extraction does its job. The log occasionally contains an empty first line. So:
[2022-11-29T12:23:15.073] DEBUG (manager) Generic XPath iFL functions use full XPath 1.0 syntax
[2022-11-29T12:23:15.074] DEBUG (manager) XPath 1.0 iFL functions use iWay's full syntax implementation
which naturally causes Kibana to report an empty line.
In an attempt to suppress this line from being sent to ES, I added the following as the last filter item:
if ![message] {
  drop { }
}
if [message] =~ /^\s*$/ {
  drop { }
}
The resulting JSON payload to ES:
{
  "@timestamp": [
    "2022-12-09T14:09:35.616Z"
  ],
  "@version": [
    "1"
  ],
  "@version.keyword": [
    "1"
  ],
  "event.original": [
    "\r"
  ],
  "event.original.keyword": [
    "\r"
  ],
  "host.name": [
    "xxx"
  ],
  "host.name.keyword": [
    "xxx"
  ],
  "log.file.path": [
    "/usr/share/logstash/iway_logs/localhost#iCLP#iway_2022-11-29T12_23_33.log"
  ],
  "log.file.path.keyword": [
    "/usr/share/logstash/iway_logs/localhost#iCLP#iway_2022-11-29T12_23_33.log"
  ],
  "message": [
    "\r"
  ],
  "message.keyword": [
    "\r"
  ],
  "tags": [
    "_grokparsefailure"
  ],
  "tags.keyword": [
    "_grokparsefailure"
  ],
  "_id": "oRc494QBirnaojU7W0Uf",
  "_index": "iway_",
  "_score": null
}
While this does drop the empty first line, it also unfortunately interferes with the multiline operation on other lines. In other words, the multiline operation does not work anymore. What am I doing incorrectly?
Use of the following variation resolved the issue:
if [message] =~ /\A\s*\Z/ {
  drop { }
}
This works because \A and \Z anchor to the start and end of the entire string, whereas ^ and $ also match at embedded newlines, so /^\s*$/ could match a blank line inside an otherwise valid multiline event and drop the whole event. This solution is based on Badger's answer provided on the Logstash forums, where this question was raised as well.

How to detect logstash input connection error

How can I monitor and detect errors when connecting Kafka to Logstash?
Say, for example, my Kafka broker is down and no connection is established between Kafka and Logstash.
Is there a way to monitor the connection status between Logstash and Kafka?
I can query the Logstash logs (but I don't think that is the appropriate way), and I tried the Logstash monitoring API (for example localhost:9600/_node/stats/pipelines?pretty), but no API tells me that the connection is down.
Thank you in advance.
If you have an Elastic Agent or a Metricbeat agent installed on the Kafka node, you can configure the agent to monitor Kafka using its Kafka-specific module:
elastic-agent kafka module
metricbeat kafka module
For getting the connection status from Logstash as you mentioned, you can also configure your Logstash pipeline to grab the status from the log message.
Sample document in Elasticsearch:
{
  "_index": "topicname",
  "_type": "_doc",
  "_id": "ulF8uH0BK9MbBSR7DPEw",
  "_version": 1,
  "_score": null,
  "fields": {
    "@timestamp": [
      "2022-05-09T10:27:56.956Z"
    ],
    "@version": [
      "1"
    ],
    "@version.keyword": [
      "1"
    ],
    "message": [
      "{\"requestMethod\":\"GET\",\"headers\":{\"content-type\":\"application/json\",\"user-agent\":\"PostmanRuntime/7.XX.XX\",\"accept\":\"*/*\",\"postman-token\":\"11224442345223\",\"host\":\"localhost:2300\",\"accept-encoding\":\"gzip, deflate, br\",\"connection\":\"keep-alive\",\"content-length\":\"44\"},\"body\":{\"category\":\"CAT\",\"noise\":\"purr\"},\"query\":{},\"requestUrl\":\"http://localhost:2300/kafka\",\"protocol\":\"HTTP/1.1\",\"remoteIp\":\"1\",\"requestSize\":302,\"userAgent\":\"PostmanRuntime/7.XX.X\",\"statusCode\":200,\"response\":{\"success\":true,\"message\":\"Kafka Details are added\",\"data\":{\"kafkaData\":{\"_id\":\"12gvsddwqbwrfteacr313rcet5\",\"category\":\"DOG\",\"noise\":\"bark\",\"__v\":0},\"postData\":{\"category\":\"DOG\",\"noise\":\"bark\"}}},\"latency\":{\"seconds\":0,\"nanos\":61000000},\"responseSize\":193}"
    ]
  }
}
The configuration below can be added to fetch the status:
input {
  kafka {
    topics => ["topicname"]
    bootstrap_servers => "11.11.11.11:1111"
  }
}

filter {
  mutate { add_field => { "StatusCode" => "%{[message][0][status]}" } }
}

output {
  elasticsearch {
    hosts => ["11.11.11.12:9200"]
    index => "topic-name-index"
  }
}
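
If the message field holds the JSON payload shown in the sample document, one option (a sketch, not part of the original answer; the payload target field is an assumption, and the statusCode key is taken from the sample) is to parse it with the json filter before copying the status code:

filter {
  # Parse the JSON string in [message] into a nested [payload] object
  json {
    source => "message"
    target => "payload"
  }
  # Copy the HTTP status code out of the parsed payload
  mutate {
    add_field => { "StatusCode" => "%{[payload][statusCode]}" }
  }
}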

Grok pattern working fine in grok debugger but the same pattern is not working when running with logstash

Input file:
123.123.12.123 - - [09/Jan/2021:00:00:41 -0500] "GET /abcde/common/abcde.jsp HTTP/1.1" 401 1944 1
Output in Grok Debugger
{
  "Path": "GET /abcde/common/abcde.jsp HTTP/1.1",
  "ResponseCode": "401",
  "KnowCode": "1944",
  "ExitCode": "1",
  "UserInfo": "-",
  "HostName": "123.123.12.123",
  "Date": "09/Jan/2021:00:00:41 -0500"
}
GROK filter in logstash
grok {
match => {"message" => " %{IP:HostName}\s\-\s%{USERNAME:UserInfo}\s\[%{GREEDYDATA:Date}\]\s\"%{GREEDYDATA:Path}\"\s%{BASE10NUM:ResponseCode}\s%{BASE10NUM:KnowCode}\s%{BASE10NUM:ExitCode}"}
}
Whereas when I run the same grok pattern in the Logstash filter, the Kibana screen gives me a result like this:
"#version" => "1",
"#timestamp" => 2021-02-12T16:38:28.141Z,
"type" => "access_logs",
"path" => "C:/Temp/BOHLogs/CatalinaAccess/localhost_access_log.2021-01-09.txt",
"tags" => [
[0] "_grokparsefailure"
],
"host" => "AB-1SM433",
"message" => "123.123.12.123 - - [09/Jan/2021:00:00:41 -0500] \"GET /abcde/common/abcde.jsp HTTP/1.1\" 401 1944 1"
You have a space at the start of your pattern that needs to be removed.
grok {
match => {"message" => " %{IP:HostName}\s\-\s%{USERNAME:UserInfo}\s\[%{GREEDYDATA:Date}\]\s\"%{GREEDYDATA:Path}\"\s%{BASE10NUM:ResponseCode}\s%{BASE10NUM:KnowCode}\s%{BASE10NUM:ExitCode}"}
^ Delete this
}
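
For clarity, this is the same filter with only the leading space removed:

grok {
  match => { "message" => "%{IP:HostName}\s\-\s%{USERNAME:UserInfo}\s\[%{GREEDYDATA:Date}\]\s\"%{GREEDYDATA:Path}\"\s%{BASE10NUM:ResponseCode}\s%{BASE10NUM:KnowCode}\s%{BASE10NUM:ExitCode}" }
}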

elapsed + aggregate passing custom fields in Logstash

I am using the elapsed plugin to calculate time and then the aggregate plugin to display it.
I added custom fields to the elapsed filter.
You can see it below:
add_field => {
  "status" => "Status"
  "User" => "%{byUser}"
}
One is static; the other is dynamic, coming with the event.
In the Logstash output it displays only the static value, not the dynamic one:
it shows the literal %{byUser} for the dynamic field.
But the task id and status fields work just fine and I get the right values.
Any idea why?
A little bit more code:
elapsed {
  unique_id_field => "assetId"
  start_tag => "tag1:tag2"
  end_tag => "tag3:tag4"
  add_field => {
    "wasInStatus" => "tag3"
    "User" => "%{byUser}"
  }
  add_tag => ["CustomTag"]
}
grok input:
grok {
  match => [
    "message", "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:assetId} %{WORD:event}:%{WORD:event1} User:%{USERNAME:byUser}"
  ]
}
if "CustomTag" in [tags] and "elapsed" in [tags] {
  aggregate {
    task_id => "%{assetId}"
    code => "event.to_hash.merge!(map)"
    map_action => "create_or_update"
  }
}
The problem is connected with the elapsed filter option:
new_event_on_match => true/false
Changing new_event_on_match to false (it was true in my pipeline) fixed the issue, but I still wonder why.
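
For reference, this is the elapsed filter from above with that option set (a sketch based on the configuration in the question):

elapsed {
  unique_id_field => "assetId"
  start_tag => "tag1:tag2"
  end_tag => "tag3:tag4"
  # Keep the elapsed fields on the matching end event instead of emitting a separate event
  new_event_on_match => false
  add_field => {
    "wasInStatus" => "tag3"
    "User" => "%{byUser}"
  }
  add_tag => ["CustomTag"]
}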
I also faced a similar issue and found a fix for it. When new_event_on_match => true is used, the elapsed event is separated from the original log and a new elapsed event is written to Elasticsearch, as below:
{
  "_index": "elapsed_index_name",
  "_type": "doc",
  "_id": "DzO03mkBUePwPE-nv6I_",
  "_version": 1,
  "_score": null,
  "_source": {
    "execution_id": "dfiegfj3334fdsfsdweafe345435",
    "elapsed_timestamp_start": "2019-03-19T15:18:34.218Z",
    "tags": [
      "elapsed",
      "elapsed_match"
    ],
    "@timestamp": "2019-04-02T15:39:40.142Z",
    "host": "3f888b2ddeec",
    "cus_code": "Custom_name", [This is a custom field]
    "elapsed_time": 41.273,
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2019-04-02T15:39:40.142Z"
    ],
    "elapsed_timestamp_start": [
      "2019-03-19T15:18:34.218Z"
    ]
  },
  "sort": [
    1554219580142
  ]
}
For adding the "cus_code" to the elapsed event object from the original log (log from where the elapsed filter end tag is detected), I added an aggregate filter as below:
if "elapsed_end_tag" in [tags] {
aggregate {
task_id => "%{execution_id}"
code => "map['cus_code'] = event.get('custom_code_field_name')"
map_action => "create"
}
}
and added the end block of the aggregation, gated on the 'elapsed' tag:
if "elapsed" in [tags] {
aggregate {
task_id => "%{execution_id}"
code => "event.set('cus_code', map['cus_code'])"
map_action => "update"
end_of_task => true
timeout => 400
}
}
So, to add a custom field to the elapsed event, we need to combine the aggregate filter with the elapsed filter.
