Grok Filter String Pattern - logstash-grok

I am pretty new to Grok and I need to filter a line as the one below:
Dec 20 18:46:00 server-04 script_program.sh[14086]: 2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - 14086/0x00007f093b7fe700 - processname/SIMServer - 00000000173d9b6b - info - work: You have 2 connections running
So far I just managed to get the following:
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
So I get all the date/timestamp details + program + process which is ok.
But that leaves me with the following remaining string:
2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - 14086/0x00007f093b7fe700 - processname/SIMServer - 00000000173d9b6b - info - work: You have 2 connections running
And here I am struggling to break everything into chunks.
I have tried a lot of combinations to split it on the hyphen (-), but so far without success.
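As a sanity check before writing grok, the hyphen-delimited layout of that remaining string can be confirmed in plain Python (a quick sketch, not part of any grok config):

```python
# The string left over after SYSLOGBASE has consumed the syslog prefix.
remaining = ("2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - "
             "14086/0x00007f093b7fe700 - processname/SIMServer - "
             "00000000173d9b6b - info - work: You have 2 connections running")

# Splitting on the literal " - " separator yields seven fields; the message
# itself contains no " - ", so it survives intact.
parts = remaining.split(" - ")
```

Each element then corresponds to one field a grok pattern would need to capture.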
So far I have been pretty much using as a guideline the following:
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
Any help/suggestions/tips on this please?
I am using graylog2 and, as shown above, trying to use grok to filter my messages.
Many thanks

I managed to get my filter fully working. The solution is below:
SERVER_TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?[T ]%{INT}[T ]%{ISO8601_TIMEZONE}?
SERVER_HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
SERVER_Unknown %{SERVER_HOSTNAME}[/]%{SERVER_HOSTNAME}
SERVER_Loglevel ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
SYSLOGBASE_SERVER %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:[T ]%{SERVER_TIMESTAMP_ISO8601:timestamp_match}[T ]-[T ]%{SERVER_HOSTNAME:SERVER_host_node}[T ]-[T ]%{SERVER_Unknown:SERVER_Unknown}[T ]-[T ]%{SERVER_Unknown:service_component}[T ]-[T ]%{SERVER_HOSTNAME:process_code_id}[T ]-[T ]%{SERVER_Loglevel}[T ]-[T ]%{GREEDYDATA:syslog_message}
All the rest are standard regular expressions/patterns from grok.
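For readers who want to see what those custom patterns boil down to, here is a rough Python regex equivalent (an approximation of the grok expansions, not the exact ones; the group names mirror the fields above):

```python
import re

# Loose stand-ins for SERVER_TIMESTAMP_ISO8601 and SERVER_HOSTNAME.
SERVER_TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \d+ [+-]\d{4}"
HOSTLIKE = r"[0-9A-Za-z][0-9A-Za-z./_-]*"

line = ("2017-12-20 18:46:00 068611 +0100 - server-04.location-2 - "
        "14086/0x00007f093b7fe700 - processname/SIMServer - "
        "00000000173d9b6b - info - work: You have 2 connections running")

pattern = re.compile(
    rf"(?P<timestamp_match>{SERVER_TIMESTAMP}) - "
    rf"(?P<host_node>{HOSTLIKE}) - "
    rf"(?P<unknown>{HOSTLIKE}) - "
    rf"(?P<service_component>{HOSTLIKE}) - "
    rf"(?P<process_code_id>{HOSTLIKE}) - "
    r"(?P<loglevel>\w+) - "
    r"(?P<syslog_message>.*)"
)
m = pattern.match(line)
```

Because the host-like character class contains no space, the greedy groups cannot run past the " - " separators.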
Many thanks

Related

Correct REST API for autosuggest on Google?

I feel silly asking this, but it's doing my head in.
If I use 'https://maps.googleapis.com/maps/api/place/autocomplete/json' and set the input parameter to, say, 'Palazzo Cast', I will get about 5 suggestions - none of which will be the one I'm looking for. If I set input to 'Palazzo Castellania' I will get zero results - even though there is a place with that name (see below). I've set the region parameter to 'mt'...
If I use 'https://maps.googleapis.com/maps/api/place/findplacefromtext' and set the input parameter to 'Palazzo Castellania' - I will get 'the Ministry of Health' - which is correct - however, if I put a partial string in I'll get only a single candidate which will be something different - there doesn't seem to be a way to get multiple place candidates?
I'm guessing from an API side - I have to do a multi-step process - but it would be good to get some input.
My thoughts:
I start with 'https://maps.googleapis.com/maps/api/place/autocomplete/json' - if I get an empty result, I try 'https://maps.googleapis.com/maps/api/place/findplacefromtext'
If I get a single result from either, I can pass the placeID to the Places API to get more detailed data.
Make sense? It feels ugly.
Edit
So watching how https://www.google.com.mt/ does it... while typing it uses suggest (and never gives the right answer, just like the API), and when I hit enter it uses search and gives the correct answer... leading me to the conclusion that there are actually two databases involved!
Basically "it's by design". There is no fix as of Feb 2023. My plan is to cache results and do a first search against that; otherwise I'll probably use Bing or HERE.

How to write the grok expression for my log?

I am trying to write a grok pattern to analyze my logs.
I use Logstash 7 to collect logs, but after many attempts I have failed to write a working grok.
Log looks like this:
[2018-09-17 18:53:43] - biz_util.py [Line:55] - [ERROR]-[thread:14836]-[process:9504] - an integer is required
My grok (which does not work):
%{TIMESTAMP_ISO8601 :log_time} - %{USERNAME:module}[Line:%{NUMBER:line_no}] - [%{WORD:level}]-[thread:%{NUMBER:thread_no}]-[process:%{NUMBER:process_no}] - %{GREEDYDATA:log}
Only the timestamp part is OK. The others failed.
This will work:
\[%{TIMESTAMP_ISO8601:log_time}\] - %{DATA:module} \[Line:%{NUMBER:line_no}\] - \[%{WORD:level}\]-\[thread:%{NUMBER:thread_no}\]-\[process:%{NUMBER:process_no}\] - %{GREEDYDATA:log}
You need to escape the [ characters.
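The same escaping issue can be reproduced in plain Python, where literal brackets in a regex must also be written as \[ and \] (a sketch with hand-written regexes approximating the grok patterns):

```python
import re

line = ("[2018-09-17 18:53:43] - biz_util.py [Line:55] - "
        "[ERROR]-[thread:14836]-[process:9504] - an integer is required")

# Every literal [ and ] is escaped; unescaped they would start character classes.
pattern = re.compile(
    r"\[(?P<log_time>[\d: -]+)\] - (?P<module>\S+) \[Line:(?P<line_no>\d+)\] - "
    r"\[(?P<level>\w+)\]-\[thread:(?P<thread_no>\d+)\]-\[process:(?P<process_no>\d+)\] - "
    r"(?P<log>.*)"
)
m = pattern.match(line)
```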
This will also work:
[%{TIMESTAMP_ISO8601:log_time}] %{NOTSPACE} %{USERNAME:module} [Line:%{BASE10NUM:Line}] %{NOTSPACE} [%{LOGLEVEL}]%{NOTSPACE}[thread:%{BASE10NUM:thread}]%{NOTSPACE}[process:%{BASE10NUM:process}]

getting rid of colon in grok

Basically I was setting up an Elasticsearch-Logstash-Kibana (ELK) stack for monitoring syslogs. Now I have to write the grok pattern for Logstash.
Here's an example of my log:
May 8 15:14:50 tileserver systemd[25780]: Startup finished in 29ms.
And that's my pattern (yet):
%{SYSLOGTIMESTAMP:zeit} %{HOSTNAME:host} %{SYSLOGPROG:program}
Usually I also use %{DATA:text} for the message, but here it only works in the tester linked below.
I'm using the Test grok patterns site to test my patterns, and these three work fine, but the colon (after the PID) is left at the front of the message and I don't want it to be there.
How do I get rid of it?
try this:
%{SYSLOGTIMESTAMP:zeit} %{HOSTNAME:host} %{GREEDYDATA:syslog_process}(:) %{GREEDYDATA:message}
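The trick in that pattern is that the colon is matched outside any named capture, so it never lands in a field. A plain-Python equivalent (approximate regexes, not grok's exact expansions):

```python
import re

line = "May 8 15:14:50 tileserver systemd[25780]: Startup finished in 29ms."

# The ": " is consumed by the pattern but sits outside every named group,
# so it is dropped from the extracted fields.
pattern = re.compile(
    r"(?P<zeit>\w+ +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<syslog_process>.*?): (?P<message>.*)"
)
m = pattern.match(line)
```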

Logstash Grok Problems with 4 digit year

I spent the past hour trying to set up a grok filter for Logstash. Working with the Grok Debugger, everything's good until I get to the timestamp. Grok chokes on the four-digit year.
Here is a logfile entry as it's sent to Logstash:
Jul 8 11:54:29 192.168.1.144 1 2016-07-08T15:55:09.629Z era.somedomain.local ETAServer 1755 Syslog {"event_type":"Threat_Event","ipv4":"192.168.1.118","source_uuid":"7ecab29a-7db3-4c79-96f5-3946de54cbbf","occured":"08-Jul-2016 15:54:54","severity":"Warning","threat_type":"trojan","threat_name":"HTML/Agent.V","scanner_id":"HTTP filter","scan_id":"virlog.dat","engine_version":"13773 (20160708)","object_type":"file","object_uri":"http://malware.wicar.org/data/java_jre17_exec.html","action_taken":"connection terminated","threat_handled":true,"need_restart":false,"username":"DOMAIN\username","processname":"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"}
What I'm having trouble with is the first part before the JSON data. The first part of my grok statement:
%{MONTH}\ %{MONTHDAY}%{SPACE}%{TIME}%{SPACE}%{IPV4}%{SPACE}%{NUMBER}%{SPACE}
works fine, correctly identifying everything up to the number '1' just before the year in the timestamp. The problem is when I add the following:
%{MONTH}\ %{MONTHDAY}%{SPACE}%{TIME}%{SPACE}%{IPV4}%{SPACE}%{NUMBER}%{SPACE}%{TIMESTAMP_ISO8601}
then I get "No Matches" in the grok debugger. Messing around with it a bit more, it appears the problem is somewhere between the number '1' and the first two digits of the year in the timestamp, since %{TIMESTAMP_ISO8601} only uses a two-digit year.
Any suggestions or help would be greatly appreciated.
Digging a little deeper into regex and grok, it looks like I figured it out. I replaced %{TIMESTAMP_ISO8601} with:
([^\d\d]%{YEAR})[./-]%{MONTHNUM}[./-]%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}
and it worked perfectly. The key was the [^\d\d] in front of %{YEAR}.
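For what it's worth, the fix can be checked outside grok too: in plain Python a regex with an explicit four-digit year matches the same prefix (a sketch; the JSON payload is shortened here for brevity):

```python
import re

line = ('Jul 8 11:54:29 192.168.1.144 1 2016-07-08T15:55:09.629Z '
        'era.somedomain.local ETAServer 1755 Syslog {"event_type":"Threat_Event"}')

pattern = re.compile(
    r"\w+ +\d+ \d{2}:\d{2}:\d{2} +"                  # syslog date and time
    r"\d{1,3}(?:\.\d{1,3}){3} +\d+ +"                # IPv4 address and the lone '1'
    r"(?P<iso>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)"  # explicit 4-digit year
)
m = pattern.match(line)
```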

How can I avoid "write everything twice" in my hiera data?

Is there a better way to format my hiera data?
I want to avoid the "write everything twice" problem.
Here is what I have now:
[root@puppet-el7-001 ~]# cat example.yaml
---
controller_ips:
- 10.0.0.51
- 10.0.0.52
- 10.0.0.53
controller::horizon_cache_server_ip:
- 10.0.0.51:11211
- 10.0.0.52:11211
- 10.0.0.53:11211
I was wondering if there is functionality available in hiera that is like Perl's map function.
If so then I could do something like:
controller::horizon_cache_server_ip: "%{hiera_map( {"$_:11211"}, %{hiera('controller_ips')})}"
Thanks
It depends on which Puppet version you are using. In Puppet 3.x, you can do the following:
common::test::var1: a
common::test::var2: b
common::test::variable:
- "%{hiera('common::test::var1')}"
- "%{hiera('common::test::var2')}"
common::test::variable2:
- "%{hiera('common::test::var1')}:1"
- "%{hiera('common::test::var2')}:2"
In Puppet 4 you can try a combination of the zip and hash functions from stdlib with the built-in map function.
Something like:
$array3 = zip($array1, $array2)
$my_hash = hash($array3)
$my_hash.map |$key,$val|{ "${key}:${val}" }
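In Python terms, that zip/hash/map combination amounts to deriving the port-suffixed list from the plain IP list instead of writing it twice (a sketch of the idea, not Puppet code):

```python
# The source-of-truth list, written once.
controller_ips = ["10.0.0.51", "10.0.0.52", "10.0.0.53"]
port = 11211

# Map each IP to "ip:port" - the equivalent of the Puppet map above.
horizon_cache_server_ip = [f"{ip}:{port}" for ip in controller_ips]
```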
The mutation is the problem. With identical data it is simple, thanks to YAML's referencing capability:
controller_ips: &CONTROLLERS
- 10.0.0.51
- 10.0.0.52
- 10.0.0.53
controller::horizon_cache_server_ip: *CONTROLLERS
You will need more logic so that the port can be stored independently.
controller::horizon_cache_server_port: 11211
The manifest needs to be structured in a way that allows you to combine the IPs with the port.
