Grok pattern file - logstash-grok

I'm trying to write a grok pattern for my log file data.
This is the log message:
116.50.181.5 - - [18/May/2019:09:05:32 +0000] "SHARP56" 50 245 "INFO: System componement ready for use" 23 "A4" "/user/admistrator/68768.pdf" "INFO: No ERROR TO SHOW"
I've tried this grok pattern, but it doesn't work:
%{IP:client} %{HTTPDATE:timestamp}\] %{WORD:name} %{NUMBER:X1} %{NUMBER:x2} %{WORD:msg} %{NUMBER:X3} %{WORD:format} %{WORD:path} %{WORD:label}
The output file I want should look like this:
{
client = 116.50.181.5
timeStamp = 18/May/2019:09:05:32 +0000
name = SHARP56
x1 = 50
x2 = 245
msg = INFO
format = A4
type = pdf
label = INFO: No ERROR TO SHOW
}
Any suggestions?

You can use the following pattern:
%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"%{DATA:name}\" %{NUMBER:X1} %{NUMBER:x2} \"%{GREEDYDATA:msg}\" %{NUMBER:X3} \"%{WORD:format}\" \"%{DATA:path}\" \"%{GREEDYDATA:label}\"
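If it helps, here is how that pattern might sit in a Logstash pipeline; a minimal sketch, where the surrounding filter/grok block is the only addition and the pattern is the one above, unchanged:

filter {
  grok {
    match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"%{DATA:name}\" %{NUMBER:X1} %{NUMBER:x2} \"%{GREEDYDATA:msg}\" %{NUMBER:X3} \"%{WORD:format}\" \"%{DATA:path}\" \"%{GREEDYDATA:label}\"" }
  }
}

Note that the desired output also wants type = pdf pulled out of the path; grok captures the path as a whole, so that extra field would need an additional pattern or a mutate step (an observation, not shown in the thread).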

TELEGRAF TAIL: Creating parser: Invalid data format: Grok

I am using Telegraf to collect log information from specific logs containing hex data.
I am using the tail input, but I keep getting the same error: Invalid data format: Grok.
My log looks like this:
18/08/2022 21:04:23 01 41 7B 00 04 14 00 00 00 FD AB
Here is my tail configuration in Telegraf:
[[inputs.tail]]
  files = ["/mnt/cle/*a.*.log"]
  from_beginning = true
  max_undelivered_lines = 300
  character_encoding = "utf-8"
  data_format = "Grok"
  grok_patterns = ['%{DATE_EU:date} %{TIME:time} %{WORD:my1id} %{WORD:my2id} %{BASE16NUM:01hexa} %{BASE16NUM:02hexa} %{BASE16NUM:03hexa} %{BASE16NUM:04hexa} %{BASE16NUM:05hexa} %{BASE16NUM:06hexa} %{BASE16NUM:07hexa} %{BASE16NUM:08hexa} %{BASE16NUM:09hexa} %{BASE16NUM:10hexa} %{BASE16NUM:11hexa}']
I also tried this for grok_patterns:
grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006/01/02 15:04:05"} %{WORD:MRIid} %{WORD:OPUid} %{WORD:01hexa} %{WORD:02hexa} %{WORD:03hexa} %{WORD:04hexa} %{WORD:05hexa} %{WORD:06hexa} %{WORD:07hexa} %{WORD:08hexa} %{WORD:09hexa} %{WORD:10hexa} %{WORD:11hexa}']
Also, I would like to convert my hex data to decimal and apply a conversion formula.
And to complicate things, I would like to join two patterns before converting the data.
I used this link for the grok_patterns: Grok input data format.
I found a solution; below is my correction:
grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"} %{WORD:MATid} %{WORD:OPUid}']
grok_custom_patterns = '''
'''
With this log:
2022-08-19 17:21:18 MAT01 OPU30
2022-08-19 17:21:19 MAT01 OPU30
However, I'm still looking to convert my hex data to decimal and apply a conversion formula. And, to complicate things, I would like to join two patterns before converting the data.
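One way to handle the hex-to-decimal step (not covered in the thread) is a Starlark processor after the tail input. A minimal sketch under assumptions: the field names 01hexa/02hexa come from the grok pattern above, and the scaling formula is a placeholder:

[[processors.starlark]]
  source = '''
def apply(metric):
    # Hypothetical conversion: turn two grok-captured hex string fields into
    # integers, join them into one 16-bit value, then apply a placeholder formula.
    if "01hexa" in metric.fields and "02hexa" in metric.fields:
        hi = int(metric.fields["01hexa"], 16)
        lo = int(metric.fields["02hexa"], 16)
        metric.fields["value"] = (hi * 256 + lo) * 0.1  # placeholder formula
    return metric
'''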

How to extract the substring preceding a marker?

I have strings like these:
[3016] - Device is ready...
[10ice is loading..13] - v3[3016] - Device is ready...
[1r 0.[3016] - Device is ready.
Everything except '[3016] - Device is ready...' is noise.
The key phrase here is "Device is ready".
3016 is a timestamp in msec; I need to extract '3016' from the string for further operations.
I tried the following:
if "Device is ready" in reply:
# set a pattern for extracting time from the result
found = re.findall("\[.*\]", reply)
# Cut timestemp from reply
x = [tm[1:-1] for tm in found]
When the reply is 'clean' ([3016] - Device is ready...) it's OK, but if there is noise in the reply it doesn't work. Can someone point me in the right direction, or perhaps assist with the code? Thanks in advance.
If there is a single key, and it should precede the marker Device is ready, you can capture the digits first.
\[(\d+)].*\bDevice is ready\b
The pattern matches:
\[(\d+)] Capture 1+ digits between square brackets in group 1
.* Match 0+ times any char
\bDevice is ready\b and then Device is ready
import re

strings = [
    "[3016] - Device is ready...",
    "[10ice is loading..13] - v3[3017] - Device is ready...",
    "[1r 0.[3018] - Device is ready.",
    "[1r 0 - Device is ready. [3019]",
]
pattern = r"\[(\d+)].*\bDevice is ready\b"
for s in strings:
    match = re.search(pattern, s)
    if match:
        print(match.group(1))
Output
3016
3017
3018
You should use a regex group () to extract the number. found will be a list of all the numbers found inside []:
if "Device is ready" in reply:
# set a pattern for extracting time from the result
found = re.findall("\[(\d+)\]", reply)
print(found[0])
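This also copes with the noisy examples above: \[(\d+)\] only matches bracket pairs containing digits alone, so for '[10ice is loading..13] - v3[3016] - Device is ready...' found is ['3016']. Unlike the first answer, though, it does not require the number to precede the marker.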

Can't paste value of re.findall in cell using openpyxl

I'm unable to write the result of a re.findall into an Excel cell; the value is itself the result of another re.findall, captured from a text file with multi-line data.
Below is the section of the code where I am facing the problem.
import openpyxl, os, re  # -- IMPORTS

pyro_asr920_dir = '{}\\'.format(os.getcwd())  # -- MARK DIRECTORIES

input_asr920_robot_pid_wb = openpyxl.load_workbook('Data_Workbook.xlsm', keep_vba=True)  # -- OPEN EXCEL
input_asr920_robot_pid_tpd_sheet = input_asr920_robot_pid_wb['Topology-Ports_Details']

wduplinka = open(os.path.join(pyro_asr920_dir, 'DELSNACAG01C7606-logs.txt'), 'r')  # -- OPEN MULTILINE TEXT FILE
uplinkacontent = wduplinka.read()

# -- GET REQUIRED SUBSTRING WITH MATCH CRITERIA IN BETWEEN
PreBBa = re.findall(r'{}[\s\S]*?!\ni'.format('interface TenGigabitEthernet4/2'), uplinkacontent)
print(PreBBa)
output01 = '''
['interface TenGigabitEthernet4/2\n isis metric 10 level-2\n!\ni']'''

for line in PreBBa:  # - 01 > I CAN PRINT THIS ON EXCEL CELL
    input_asr920_robot_pid_tpd_sheet['H27'] = line[:-1]
    print(line[:-1])
    print('-----------')
output02 = '''
interface TenGigabitEthernet4/2
 isis metric 10 level-2
!'''

# ----------------------------------------------------------------- UNABLE TO GET VALUES IN CELL
for line in PreBBa:  # - 02 > I CAN'T PRINT THIS ON EXCEL CELL {THIS IS WHERE I AM STUCK}
    if 'ospf' in line:
        theOSPF = re.findall(r'{}[\s\S]*?{}'.format(' ip ospf', '\n c'), line)
        input_asr920_robot_pid_tpd_sheet['C47'] = 'Yes'
    else:
        input_asr920_robot_pid_tpd_sheet['C47'] = 'No'  # UNABLE TO GET RESULT IN EXCEL
    output03 = '''No'''
    # -----------------------------------------------------------------
    metric = re.findall(r'{}[\s\S]*?{}'.format('metric ', ' '), str(line))
    metric = re.findall(r'\d+', str(metric))
    input_asr920_robot_pid_tpd_sheet['C46'].value = metric[0]  # UNABLE TO GET RESULT IN EXCEL
    print(metric)
    output04 = '''10'''

# -----------------------------------------------------------------
input_asr920_robot_pid_wb.save('Data_Workbook.xlsm')
wduplinka.close()
input_asr920_robot_pid_wb.close()
print('TEST COMPLETED')
Some content of the text file is shown below:
DELSNACAG01C7606#sh running-config
Load for five secs: 18%/1%; one minute: 26%; five minutes: 26%
Time source is NTP, 13:43:23.718 IST Fri Aug 16 2019
Building configuration...
Current configuration : 228452 bytes
!
! Last configuration change at 21:15:56 IST Thu Aug 15 2019
! NVRAM config last updated at 06:42:52 IST Sat Aug 10 2019 by cor382499
!
interface TenGigabitEthernet4/2
isis metric 10 level-2
!
interface TenGigabitEthernet4/3
!
end
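For reference, the bare write-and-save mechanics the code above relies on can be reduced to this minimal sketch (reusing the question's workbook and sheet names; the cell value is arbitrary):

import openpyxl

wb = openpyxl.load_workbook('Data_Workbook.xlsm', keep_vba=True)
sheet = wb['Topology-Ports_Details']
sheet['C46'] = '10'  # assigning to the cell key writes the value
wb.save('Data_Workbook.xlsm')  # changes persist only after save()
wb.close()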

Splitting a file.txt into two files with a condition

How can I split the given file into two different files, result codes and warning codes? Below is a single text file; I want to split it into two files, and I have a lot more files in this format to split.
Result Codes:
0 - SYS_OK - "Ok"
1 - SYS_ERROR_E - "System Error"
1001 - MVE_SYS_E - "MTE System Error"
1002 - MVE_COMMAND_SYNTAX_ERROR_E - "Command Syntax is wrong"
Warning Codes:
0 - SYS_WARN_W - "System Warning"
100001 - MVE_SYS_W - "MVE System Warning"
200001 - SLEA_SYS_W - "SLEA System Warning"
200002 - SLEA_INCOMPLETE_SCRIPTED_OVERRIDE_COMMAND_W - "One or more of the entered scripted override commands has missing mandatory parameters"
300001 - L1_SYS_W - "L1 System Warning"
Well, at first glance, the distinction seems to be that "warnings" all contain the character sequence _W - and anything that doesn't is "results". Did you notice that?
awk '/_W -/{print >"warnings";next}{print >"results"}'
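Run it with the input file as an argument (codes.txt is a hypothetical name); it writes two files named warnings and results in the current directory:

awk '/_W -/{print >"warnings";next}{print >"results"}' codes.txt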
Here is a Python solution.
I am assuming you have the list of warning codes in a separate file:
import re

# Collect the known warning codes from a separate file.
warnings = open(r'warning-codes.txt')
warn_codes = []
for line in warnings:
    m = re.search(r'(\d+) .*', line)
    if m:
        warn_codes.append(m.group(1))

ow = open('output-warnings.txt', 'w')
ors = open('output-results.txt', 'w')

log_file = open(r'log.txt')
for line in log_file:
    m = re.search(r'(\d+) .*', line)
    if m and (m.group(1) in warn_codes):
        ow.write(line)  # line already ends with '\n'
    elif m:
        ors.write(line)
    else:
        print("none")

ow.close()
ors.close()

Splunk splitting XML log event

We have logs that log an event to a single file. Each log entry looks something like this:
<LogEntry>
<UserName>IIS APPPOOL\ASP.NET v4.0</UserName>
<TimeStamp>02/28/2014 13:54:17</TimeStamp>
<ThreadName>20</ThreadName>
<CorrelationId>7a0d464d-556c-4d47-820f-0cf01322e54c</CorrelationId>
<LoggerName>-Api-booking</LoggerName>
<Level>INFO</Level>
<Identity></Identity>
<Domain>API-1-130380690118132000</Domain>
<CreatedOn>02/28/2014 13:54:22</CreatedOn>
<ExceptionObject />
<RenderedMessage>"7a0d464d-556c-4d47-820f-0cf01322e54c" - "GET https://myapi.com/booking" - API-"Response":
"Unauthorized"</RenderedMessage>
</LogEntry>
When we import these logs into Splunk, the log entry is split up incorrectly into 3 parts, e.g.:
1-
<LogEntry>
<UserName>IIS APPPOOL\ASP.NET v4.0</UserName>
2-
<CreatedOn>02/28/2014 02:57:55</CreatedOn>
<ExceptionObject />
<RenderedMessage>"66d8cdda-ff62-480a-b7d2-ec175b151e5f" - "POST https://myapi.com/booking" - API-"Response":
"Bad Request"</RenderedMessage>
</LogEntry>
3-
<TimeStamp>02/28/2014 02:57:29</TimeStamp>
<ThreadName>21</ThreadName>
<CorrelationId>66d8cdda-ff62-480a-b7d2-ec175b151e5f</CorrelationId>
<LoggerName>-Api-booking</LoggerName>
<Level>INFO</Level>
<Identity></Identity>
<Domain>/LM/W3SVC/1/ROOT/Api-1-130380256918440000</Domain>
How can I configure Splunk to see these as a single log event?
props.conf (pay attention to LINE_BREAKER)
[your_xml_sourcetype]
TIME_PREFIX = <TimeStamp>
MAX_TIMESTAMP_LOOKAHEAD = 19
TZ = GMT
# A performance tweak is to disable SHOULD_LINEMERGE and then set the
# LINE_BREAKER to "line ending characters coming before a new time stamp"
# (note the direct link of the TIME_FORMAT to the regex of LINE_BREAKER).
TIME_FORMAT = %m/%d/%Y %T
LINE_BREAKER = ([\r\n]+)<LogEntry>
SHOULD_LINEMERGE = False
# 10000 is default, should be set on a case by case basis
TRUNCATE = 5000
# If the data does not have nice key=value pairs, (or some other readily
# machine parseable format, like JSON or XML), set KV_MODE = none so that
# Splunk doesn't spin its wheels on attempting to look for key = value
# pairs which don't exist.
KV_MODE = xml
# Leaving PUNCT enabled can impact indexing performance. Customers can
# comment this line if they need to use PUNCT
ANNOTATE_PUNCT = false
More information here: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf
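In short: SHOULD_LINEMERGE = False together with LINE_BREAKER = ([\r\n]+)<LogEntry> makes Splunk start a new event at each <LogEntry> opening tag instead of merging lines heuristically, and KV_MODE = xml extracts the XML elements as search-time fields.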
