Logstash Grok filter for uwsgi logs - logstash

I'm new to the ELK stack. I'm using uWSGI as my server, and I need to parse my uwsgi logs with grok and then analyze them.
Here is the format of my logs:
[pid: 7731|app: 0|req: 357299/357299] ClientIP () {26 vars in 511 bytes} [Sun Mar 1 07:47:32 2015] GET /?file_name=123&start=0&end=30&device_id=abcd&verif_id=xyzsghg => generated 28 bytes in 1 msecs (HTTP/1.0 200) 2 headers in 79 bytes (1 switches on core 0)
I used this link to generate my filter, but it didn't parse much of the information.
The filter generated by the above link is
%{SYSLOG5424SD} %{IP} () {26 vars in 511 bytes} %{SYSLOG5424SD} GET %{URIPATHPARAM} => generated 28 bytes in 1 msecs (HTTP%{URIPATHPARAM} 200) 2 headers in 79 bytes (1 switches on core 0)
Here is my logstash-conf file.
input { stdin { } }
filter {
grok {
match => { "message" => "%{SYSLOG5424SD} %{IP} () {26 vars in 511 bytes} %{SYSLOG5424SD} GET %{URIPATHPARAM} => generated 28 bytes in 1 msecs (HTTP%{URIPATHPARAM} 200) 2 headers in 79 bytes (1 switches on core 0)" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
stdout { codec => rubydebug }
}
Upon running Logstash with this conf file, I get the following output, tagged with a grok parse failure:
{
"message" => "[pid: 7731|app: 0|req: 357299/357299] ClientIP () {26 vars in 511 bytes} [Sun Mar 1 07:47:32 2015] GET /?file_name=123&start=0&end=30&device_id=abcd&verif_id=xyzsghg => generated 28 bytes in 1 msecs (HTTP/1.0 200) 2 headers in 79 bytes (1 switches on core 0)",
"@version" => "1",
"@timestamp" => "2015-03-01T07:57:02.291Z",
"host" => "cube26-Inspiron-3542",
"tags" => [
[0] "_grokparsefailure"
]
}
The date has been properly formatted. How do I extract other information from my logs, such as the query parameters (file_name, start, end, device_id, etc.), the client IP, and the response code?
Also, is there a built-in uwsgi log parser that can be used, like the ones built for Apache and syslog?
EDIT
I wrote this on my own, but it throws the same error:
%{SYSLOG5424SD} %{IP:client_ip} () {%{NUMBER:vars} vars in %{NUMBER:bytes} bytes} %{SYSLOGTIMESTAMP:date} %{WORD:method} %{URIPATHPARAM:request} => generated %{NUMBER:generated_bytes} bytes in {NUMBER:secs} msecs (HTTP/1.0 %{NUMBER:response_code}) %{NUMBER:headers} headers in %{NUMBER:header_bytes} (1 switches on core 0)
EDIT 2
I'm finally able to crack it myself. The GROK filter for the above log will be:
\[pid: %{NUMBER:pid}\|app: %{NUMBER:app}\|req: %{NUMBER:req_num1}/%{NUMBER:req_num2}\] %{IP:client_ip} \(\) \{%{NUMBER:vars} vars in %{NUMBER:bytes} bytes\} %{SYSLOG5424SD} %{WORD:method} /\?file_name\=%{NUMBER:file_name}\&start\=%{NUMBER:start}\&end\=%{NUMBER:end} \=\> generated %{NUMBER:generated_bytes} bytes in %{NUMBER:secs} msecs \(HTTP/1.0 %{NUMBER:response_code}\) %{NUMBER:headers} headers in %{NUMBER:header_bytes}
But my questions still remain:
Is there any default uwsgi log filter in grok?
I've been applying different matches for different query parameters. Is there anything in grok that fetches the different query parameters by itself?

I found the solution for extracting the query parameters:-
Here is my final configuration:-
For log line
[pid: 7731|app: 0|req: 426435/426435] clientIP () {28 vars in 594 bytes} [Mon Mar 2 06:43:08 2015] GET /?file_name=wqvqwv&start=0&end=30&device_id=asdvqw&verif_id=qwevqwr&lang=English&country=in => generated 11018 bytes in 25 msecs (HTTP/1.0 200) 2 headers in 82 bytes (1 switches on core 0)
the configuration is
input { stdin { } }
filter {
grok {
match => { "message" => "\[pid: %{NUMBER}\|app: %{NUMBER}\|req: %{NUMBER}/%{NUMBER}\] %{IP} \(\) \{%{NUMBER} vars in %{NUMBER} bytes\} %{SYSLOG5424SD:DATE} %{WORD} %{URIPATHPARAM} \=\> generated %{NUMBER} bytes in %{NUMBER} msecs \(HTTP/1.0 %{NUMBER}\) %{NUMBER} headers in %{NUMBER}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
kv {
field_split => "&? "
include_keys => [ "file_name", "device_id", "lang", "country"]
}
}
output {
stdout { codec => rubydebug }
elasticsearch { host => localhost }
}
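
The kv filter above scans the whole message, which is why field_split has to include a space. A minimal sketch of a narrower variant follows, assuming the query string is first captured into its own field. The field names uri_path and uri_query are illustrative and not part of the original config, and the grok pattern only covers the request part of the line.

```
filter {
  grok {
    # URIPARAM includes the leading "?", so uri_query starts with "?".
    match => { "message" => "%{WORD:method} %{URIPATH:uri_path}(?:%{URIPARAM:uri_query})? =>" }
  }
  kv {
    # Parse only the captured query string instead of the whole message.
    source       => "uri_query"
    # Split on either "&" or the leading "?".
    field_split  => "&?"
    include_keys => [ "file_name", "device_id", "lang", "country" ]
  }
}
```

Dropping include_keys would keep every query parameter, which may be closer to "fetching the parameters by itself".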

I found that your solution didn't support HTTP/1.1. I fixed that and also added field names.
Here's my grok config:
grok {
match => { "message" => "\[pid: %{NUMBER:pid}\|app: %{NUMBER:id}\|req: %{NUMBER:currentReq}/%{NUMBER:totalReq}\] %{IP:remoteAddr} \(%{WORD:remoteUser}?\) \{%{NUMBER:CGIVar} vars in %{NUMBER:CGISize} bytes\} %{SYSLOG5424SD:timestamp} %{WORD:method} %{URIPATHPARAM:uri} \=\> generated %{NUMBER:resSize} bytes in %{NUMBER:resTime} msecs \(HTTP/%{NUMBER:httpVer} %{NUMBER:status}\) %{NUMBER:headers} headers in %{NUMBER:headersSize} bytes %{GREEDYDATA:coreInfo}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
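
One thing to watch in the date filter: it expects an Apache-style timestamp (dd/MMM/yyyy:HH:mm:ss Z), while the uwsgi lines carry a ctime-style one such as [Sun Mar 1 07:47:32 2015], and SYSLOG5424SD also captures the square brackets. A sketch of one way to line the two up, assuming the timestamp is captured without brackets into a field named log_time (an illustrative name, not from the original):

```
filter {
  grok {
    # Capture only the text between the brackets.
    match => { "message" => "\[(?<log_time>%{DAY} %{MONTH}\s+%{MONTHDAY} %{TIME} %{YEAR})\]" }
  }
  date {
    # Several layouts are listed because single-digit days may be space-padded.
    match => [ "log_time", "EEE MMM d HH:mm:ss yyyy", "EEE MMM  d HH:mm:ss yyyy", "EEE MMM dd HH:mm:ss yyyy" ]
  }
}
```

The log line has no time zone, so the date filter would interpret it in the Logstash host's zone unless a timezone option is set.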


"_grokparsefailure" even though the grok pattern works

I am trying to parse different log lines from two different types of file: slave and master. I tested my patterns in the Grok Debugger and they work fine, but the tags field in Kibana shows _grokparsefailure.
Here is my config file
input {
file {
type => "slave"
path => "/home/mathis/Documents/**/intranet*.log"
exclude =>"*8402.log"
sincedb_path => '/dev/null'
start_position => beginning
}
file {
type => "master"
path => "/home/mathis/Documents/**/intranet*8402.log"
sincedb_path => '/dev/null'
}
}
filter {
if [type] == "slave" {
grok {
match => { "message" => ["\[%{DATESTAMP:eventtime}\] \- %{USERNAME:user} \- %{IPV4:clientip} \- %{NUMBER} \- %{WORD} %{NUMBER:exectime} %{WORD} %{NUMBER:time} %{GREEDYDATA:data} %{NUMBER:waittime}","\[%{DATESTAMP:eventtime}\] \- Process status database sync \- %{WORD}\.%{WORD}\.%{WORD}\:%{NUMBER:slavenumb}\(\#%{NUMBER}\) \(load %{NUMBER:nbutilisateur} grace period 5 minutes\) %{GREEDYDATA}"] }
remove_field => "message"
}
date {
match => [ "eventtime", "dd/MM/YYYY HH:mm:ss.SSS" ]
target => "@timestamp"
}
}
if [type] == "master" {
grok {
match => {"message" => ["%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}%{NUMBER}%{SPACE}(?<starttime>((?!<[0-9])%{HOUR}:)?%{MINUTE}(?::%{SECOND})(?![0-9]))"]}
remove_field => "message"
}
date {
match => [ "starttime", "HH:mm:ss","mm:ss" ]
}
}
}
output {
elasticsearch {
hosts => "127.0.0.1:9200"
index => "logstash-local3-%{+YYYY.MM.dd}"
}
}
Here are the 3 logs lines that I want to parse :
(they are in the order of groks in my conf file)
[24/06/2020 21:57:29.548] - Process status database sync - us1salx08167.corpnet2.com:8100(#53738) (load 0 grace period 5 minutes) : current date 2020/06/24 21:57:29 update date 2020/06/24 21:55:44 old state OK new state OK
[29/05/2020 07:41:51.354] - ih912865 - 10.104.149.128 - 93 - Transaction 7635 COMPLETED 318 ms wait time 3183 ms
31730 31626 464 10970020 52:25 /plw/modules/bin/Lx86_64/opx2-intranet.exe -I /plw/modules/bin/Lx86_64/opx2-intranet.dxl -H /plw/modules/bin/Lx86_64 -L /plw/PLW_PROD/modules/preload-intranet.ini -- plw-sysconsole -port 8400 -logdir /plw/PLW_PROD/httpdocs/admin/log/ -slaves 2
So, I don't know if you've already resolved this -- but below is something you could use.
N.B. I added a couple of extra fields, but you can easily remove those [https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-remove_field].
When trying the expressions you provided, one of them actually failed in the grok debugger, so I just took it upon myself to rewrite them all from scratch while still maintaining variable names.
I noticed there was a lot of data that you simply didn't glean. If you want more captured, let me know.
Line 1:
[24/06/2020 21:57:29.548] - Process status database sync - us1salx08167.corpnet2.com:8100(#53738) (load 0 grace period 5 minutes) : current date 2020/06/24 21:57:29 update date 2020/06/24 21:55:44 old state OK new state OK
Pattern 1:
\[(?<eventtime>%{DATESTAMP})\] - Process status database sync - (?<host>%{HOSTNAME}):(?<slavenumber>%{NUMBER})(?<zz>\(#[\d]+\)) \(load (?<nbutilisateur>%{NUMBER}) grace period 5 minutes\)%{GREEDYDATA}
Line 2:
[29/05/2020 07:41:51.354] - ih912865 - 10.104.149.128 - 93 - Transaction 7635 COMPLETED 318 ms wait time 3183 ms
Pattern 2:
\[(?<eventtime>%{DATESTAMP})\] - (?<user>%{USER}) - (?<clientip>%{IPV4}) - %{NUMBER} - %{WORD} (?<exectime>%{NUMBER}) %{WORD} (?<ctime>%{NUMBER}) (?<ctimeunits>%{WORD}) wait time (?<waittime>%{NUMBER}) (?<waittimeunits>%{WORD})
Line 3:
31730 31626 464 10970020 52:25 /plw/modules/bin/Lx86_64/opx2-intranet.exe -I /plw/modules/bin/Lx86_64/opx2-intranet.dxl -H /plw/modules/bin/Lx86_64 -L /plw/PLW_PROD/modules/preload-intranet.ini -- plw-sysconsole -port 8400 -logdir /plw/PLW_PROD/httpdocs/admin/log/ -slaves 2
Pattern 3:
%{GREEDYDATA}(?<starttime>(?<=[\s])([\d]+:[\d]+))%{GREEDYDATA}
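
To wire patterns 1 and 2 back into the slave branch of the original config, something along these lines might work. It is a sketch, assuming the slave files contain both line types; grok tries the patterns in order and stops at the first match. The capture named host in pattern 1 is renamed to sync_host here (my assumption, not the answer's choice) so it does not collide with the host field Logstash already sets.

```
filter {
  if [type] == "slave" {
    grok {
      match => {
        "message" => [
          "\[(?<eventtime>%{DATESTAMP})\] - Process status database sync - (?<sync_host>%{HOSTNAME}):(?<slavenumber>%{NUMBER})(?<zz>\(#[\d]+\)) \(load (?<nbutilisateur>%{NUMBER}) grace period 5 minutes\)%{GREEDYDATA}",
          "\[(?<eventtime>%{DATESTAMP})\] - (?<user>%{USER}) - (?<clientip>%{IPV4}) - %{NUMBER} - %{WORD} (?<exectime>%{NUMBER}) %{WORD} (?<ctime>%{NUMBER}) (?<ctimeunits>%{WORD}) wait time (?<waittime>%{NUMBER}) (?<waittimeunits>%{WORD})"
        ]
      }
    }
    date {
      # eventtime looks like 24/06/2020 21:57:29.548
      match  => [ "eventtime", "dd/MM/yyyy HH:mm:ss.SSS" ]
      target => "@timestamp"
    }
  }
}
```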

Logstash receives a strange "<133>" code at the start of TrendMicro logs

My Logstash server is CentOS Linux release 8.1.1911.
logstash.version"=>"7.7.0"
I have a capture of what I received on port UDP 5514 with :
nc -lvu 5514 -o log.txt
The content of log.txt
<133>Jun 05 09:23:35 TMCM:EVT_URL_CONTENT_FILTERING Security product="OfficeScan" Security product node="N/A" Security product IP="xx.xx.xx.xx;xxxx::xxxx:xxxx:xxxx:4490" Event time="4/25/2020 11:46:01 PM (UTC)" URL="http://xxxxxxx.xxxxxxx.intranet/SMS_MP/.sms_pol?DEP-Z0120115-ScopeId_B14503FF-F7AA-49EC-A38C-F50D813EEC6E/Application_57a673e1-3e65-4f1c-8ce2-0f4cc1b38acc.SHA256:5EF20484EEC38EA203D7A885EAA48BE2DFDC4F130ED8BF5BEA333378875B2516" Source IP="" Destination IP="yyy.yyy.yyy.yyy" Policy rule="" Blocking type="Web reputation" Domain="xxxx-xxxxx" Event time (local)="4/25/2020 7:46:01 PM" Client host name="N/A" Reputation Score="81"
myfilter.conf
input
{
udp
{
port => 5514
type => syslog
}
}
filter
{
grok
{
match =>
{ "message" => "(?<user_agent>[^>]*)(?<user_agent>[^:]*)%{POSINT}\s%{WORD:logfrom}\s%{WORD:logtag}\:\s%{NOTSPACE:eventname}\s([^=]*)\=%{QUOTEDSTRING:security_product} ([^=]*)\=%{QUOTEDSTRING:security_prod_node}\s([^=]*)\=\"%{IPV4:security_prod_ip}([^=]*)\=\"(?<agent_detected_time>%{MONTHNUM}\/%{MONTHDAY}\/%{YEAR} %{TIME}\s(?:AM|am|PM|pm)\s*\s\(%{TZ:tz}\)).*URL\=\"%{URI:url}\" ([^=]*)\=%{QUOTEDSTRING:src_ip}\s([^=]*)\=\"%{IPV4:dest_ipv4}\"\s([^=]*)\=%{QUOTEDSTRING:policy_rule} ([^=]*)\=%{QUOTEDSTRING:bloking_type} ([^=]*)\=%{QUOTEDSTRING:domain} ([^=]*)\=\"(?<server_alert_time>%{MONTHNUM}\/%{MONTHDAY}\/%{YEAR} %{TIME}\s(?:AM|am|PM|pm))\"\s([^=]*)\=%{QUOTEDSTRING:client_hostname} ([^=]*)\=\"%{BASE10NUM:reputation_score}/?"
}
}
}
output
{
stdout { codec => rubydebug }
}
The example of the output of logstash:
[2020-06-08T13:11:02,253][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
{
"type" => "syslog",
"@timestamp" => 2020-06-08T18:06:39.090Z,
"message" => "<133>Jun 08 14:06:38 TMCM:EVT_URL_CONTENT_FILTERING Security product=\"OfficeScan\" Security product node=\"N/A\" Security product IP=\"xx.xx.xx.xx;xxxx::xxxx:xxx:xxxx:4490\" Event time=\"4/26/2020 7:33:36 AM (UTC)\" URL=\"http://blabnlabla.bla-blabla.intranet/SMS_MP/.sms_pol?DEP-Z0120105-ScopeId_B14503FF-F7AA-49EC-A38C-F50D813EEC6E/Application_2be50193-9121-4239-a70f-ba06ad7bbfbd.SHA256:6FF12991BBA769F9C15F7E1FA3E3058E22B4D918F6C5659CF7B976059082510D\" Source IP=\"\" Destination IP=\"xxx.xx.xxx.xx\" Policy rule=\"\" Blocking type=\"Web reputation\" Domain=\"bla-blabla\" Event time (local)=\"4/26/2020 3:33:36 AM\" Client host name=\"N/A\" Reputation Score=\"81\"",
"@version" => "1",
"host" => "xx.xxx.xx.xx",
"tags" => [
[0] "_grokparsefailure"
]
}
I have also tried matching "\<133\>", but the parse failure still appears. I have no idea what this <133> is.
P.S. I've been teaching myself for the last two weeks.
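
For context, a leading <133> has the shape of a syslog priority (PRI) header, which encodes facility × 8 + severity. Read that way, 133 would be facility 16 (local0) and severity 5 (notice). The sketch below assumes that reading and only peels off the header; the field names syslog_pri, syslog_timestamp and syslog_message are illustrative, and the rest of the TrendMicro payload would still need its own pattern.

```
filter {
  grok {
    # Anchor at the start so the PRI value is taken from the header only.
    match => { "message" => "^<%{NONNEGINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{GREEDYDATA:syslog_message}" }
  }
  # Decode the numeric priority into facility and severity fields.
  syslog_pri {
    syslog_pri_field_name => "syslog_pri"
  }
}
```

Once the header is split off, the key="value" part in syslog_message might be a better fit for the kv filter than for one long grok expression.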

Two configs for logstash not working together

I have an ELK setup for processing HAProxy and nginx logs, with a separate Logstash config file for each. The main data I want from the logs are the content URL and the response time. In HAProxy the response time is in milliseconds (e.g. 1345), while in nginx it is in seconds (e.g. 1.23). To bring both into the same format, I convert the HAProxy response time to seconds using the ruby plugin in Logstash.
Run individually, each config gives the desired results. In Kibana I also changed the response time field's format to duration, with both input and output in seconds. But when I run both configs together, the response time for nginx logs comes back as 0.000 and I see a "_grokparsefailure" tag in the JSON response. When I run the nginx config on its own to debug it, everything works fine and the Kibana dashboard shows proper response time values.
Below is my nginx Logstash config:
input {
beats {
port => 5045
}
}
filter {
grok {
match => { "message" => "%{IPORHOST:clientip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:verb} %{URIPATHPARAM:content} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NUMBER:response_bytes:int} \"-\" \"%{GREEDYDATA:junk}\" %{NUMBER:response_time}"}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
}
}
Below is my HAProxy Logstash config:
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "message" => "%{MONTH:month} %{MONTHDAY:date} %{TIME:time} %{WORD:[source]} %{WORD:[app]}\[%{DATA:[class]}\]: %{IPORHOST:[UE_IP]}:%{NUMBER:[UE_Port]} %{IPORHOST:[NATTED_IP]}:%{NUMBER:[NATTED_Source_Port]} %{IPORHOST:[NATTED_IP]}:%{NUMBER:[NATTED_Destination_Port]} %{IPORHOST:[WAN_IP]}:%{NUMBER:[WAN_Port]} \[%{HAPROXYDATE:[timestamp]}\] %{NOTSPACE:[frontend_name]}~ %{NOTSPACE:[backend_name]} %{NOTSPACE:[ty_name]}/%{NUMBER:[response_time]} %{NUMBER:[http_status_code]} %{NUMBER:[response_bytes]:int} - - ---- %{NOTSPACE:[df]} %{NOTSPACE:[df]} %{DATA:[domain_name]} %{DATA:[cache_status]} %{DATA:[domain_name]} %{URIPATHPARAM:[content]} HTTP/%{NUMBER:[http_version]}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
ruby {
code => "event.set('response_time', event.get('response_time').to_f / 1000)"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout {
codec => rubydebug
}
}
I suspect the response_time pattern, i.e. %{NUMBER:[response_time]}, used in both the HAProxy and nginx configs is causing the problem. I don't know what else could cause this; I have tried everything I could think of.
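
One likely explanation, assuming both files are loaded into the same pipeline (for example from one config directory): Logstash concatenates them, so every event passes through both grok filters and the ruby filter. An nginx event would then fail the HAProxy grok (hence _grokparsefailure) and have its response time divided by 1000. A sketch of keeping the two flows apart with tags and conditionals; the tag names are my own, and the comments mark where the original filters would go:

```
input {
  beats { port => 5045 tags => ["nginx"] }
  beats { port => 5044 tags => ["haproxy"] }
}
filter {
  if "nginx" in [tags] {
    # The nginx grok and date filters from the first config would go here.
    # Assumption: store the captured value as a number rather than a string.
    mutate { convert => { "response_time" => "float" } }
  } else if "haproxy" in [tags] {
    # The HAProxy grok and date filters from the second config would go here.
    ruby {
      code => "event.set('response_time', event.get('response_time').to_f / 1000)"
    }
  }
}
```

Defining two separate pipelines in pipelines.yml would be another way to get the same isolation.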

Logstash - Data from Kafka to ES

Using Logstash 5.0.0, I'm taking Kafka as the input source and writing the data to Elasticsearch (version 5.0.0).
Logstash conf:
input{
kafka{
bootstrap_servers => "XXX.XXX.XX.XXX:9092","XXX.XXX.XX.XXX:9092","XXX.XXX.XX.XXX:9092"
topics => ["a-data","f-data","n-data"]
group_id => "sound"
auto_offset_reset => "earliest"
consumer_threads => 2
}
}
filter{
json{
source => "message"
}
}
output {
elasticsearch {
hosts => [ "XXX.XXX.XX.XXX:9200" ]
}
}
When I run the above configuration, I get the following error:
$ ./logstash -f sound.conf
Sending Logstash logs to /logstash-5.0.0/logs which is now configured via log4j2.properties.
[2017-01-17T10:53:29,273][ERROR][logstash.agent ] fetched an invalid config {:config=>"input{\nkafka{\nbootstrap_servers => \"XX.XXX.XXX.XX:9092\",\"XXX.XXX.XX.XXX:9092\",\"XXX.XXX.XX.XXX:9092\"\ntopics => [\"a-data\",\"f-data\",\"n-data\"]\ngroup_id => \"sound\"\nauto_offset_reset => \"earliest\"\nconsumer_threads => 2\n}\n}\nfilter{\njson{\nsource => \"message\"\n}\n}\noutput {\nelasticsearch {\nhosts => [ \"XX.XX.XXX.XX:9200\" ]\n}\n}\n\n", :reason=>"Expected one of #, {, } at line 3, column 40 (byte 54) after input{\nkafka{\nbootstrap_servers => \"XX.XX.XXX.XX:9092\""}
Can anyone help me with this configuration?
Shouldn't your topic setting be topics, which takes an array? It looks like you had inserted the values as a hash:
topics => ["a-data","f-data","n-data"] <-- try changing this line
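
The position reported in the error (line 3, column 40, right after the first quoted address) points at the bootstrap_servers line rather than at topics. The kafka input takes bootstrap_servers as a single string of comma-separated host:port pairs, so three separately quoted strings will not parse. A sketch of the input block under that reading, keeping the masked addresses from the question:

```
input {
  kafka {
    # One string; the brokers are separated by commas inside the quotes.
    bootstrap_servers => "XXX.XXX.XX.XXX:9092,XXX.XXX.XX.XXX:9092,XXX.XXX.XX.XXX:9092"
    topics            => ["a-data", "f-data", "n-data"]
    group_id          => "sound"
    auto_offset_reset => "earliest"
    consumer_threads  => 2
  }
}
```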

logstash mutate to replace field value in output

I'm trying to replace 10.100.251.98 with another IP, 10.100.240.199, in my Logstash config. I have tried using a filter with the mutate function, but I'm unable to get the syntax right.
Sep 25 15:50:57 10.100.251.98 mail_logs: Info: New SMTP DCID 13417989 interface 172.30.75.10 address 172.30.75.12 port 25
Sep 25 15:50:57 10.100.251.98 local_mail_logs: Info: New SMTP DCID 13417989 interface 172.30.75.10 address 172.30.75.12 port 25
Sep 25 15:51:04 10.100.251.98 cli_logs: Info: PID 35559: User smaduser login from 10.217.3.22 on 172.30.75.10
Sep 25 15:51:22 10.100.251.98 cli_logs: Info: PID 35596: User smaduser login from 10.217.3.22 on 172.30.75.10
Here is my code:
input { file { path => "/data/collected" } }
filter {
if [type] == "syslog" {
mutate {
replace => [ "@source_host", "10.100.251.99" ]
}
}
}
output {
syslog {
facility => "kernel"
host => "10.100.250.199"
port => 514
}
}
I'm noticing a few things about your config. First, you don't have any log parsing. You won't be able to replace a field if it doesn't yet exist. To do this, you can use a codec in your input block or a grok filter. I added a simple grok filter.
You also check if [type] == "syslog". You never set the type, so that check will always fail. If you want to set a type, you can do that in your input block: input { file { path => "/data/collected" type => "syslog" } }
Here is the sample config I used for testing the grok pattern and replacement of the IP.
input { tcp { port => 5544 } }
filter {
grok { match => { "message" => "%{CISCOTIMESTAMP:log_time} %{IP:@source_host} %{DATA:log_type}: %{DATA:log_level}: %{GREEDYDATA:log_message}" } }
mutate {
replace => [ "@source_host", "10.100.251.199" ]
}
}
output {
stdout { codec => rubydebug }
}
which outputs this:
{
"message" => "Sep 25 15:50:57 10.100.251.98 mail_logs: Info: New SMTP DCID 13417989 interface 172.30.75.10 address 172.30.75.12 port 25",
"@version" => "1",
"@timestamp" => "2016-09-25T14:03:20.332Z",
"host" => "0:0:0:0:0:0:0:1",
"port" => 52175,
"log_time" => "Sep 25 15:50:57",
"@source_host" => "10.100.251.199",
"log_type" => "mail_logs",
"log_level" => "Info",
"log_message" => "New SMTP DCID 13417989 interface 172.30.75.10 address 172.30.75.12 port 25"
}
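
Note that replace only changes the parsed field; the message text itself still contains 10.100.251.98, as the output shows. If the goal is to rewrite the address inside the line that gets forwarded, mutate's gsub option may be closer to what is needed. A minimal sketch, assuming the address should be swapped wherever it appears in message:

```
filter {
  mutate {
    # gsub takes triples of field, regex, replacement.
    # [.] matches a literal dot without relying on backslash escaping.
    gsub => [ "message", "10[.]100[.]251[.]98", "10.100.240.199" ]
  }
}
```

Because gsub works directly on message, this variant does not depend on a grok filter having parsed the line first.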
