My grok pattern is still slow, how can I optimise it further? - logstash

I’m curious what the most optimal solution for my pattern would be, because I suspect this one is still not the fastest.
These are my log lines:
2021-07-09T11:48:32.328+0700 7fed98b56700 1 beast: 0x7fedfbac36b0: 10.111.111.111 - - [2021-07-09T11:48:32.328210+0700] "GET /glab-reg/docker/registry/v2/blobs/sha256/3f/3fe01ae49e6c42751859c7a8f8a0b5ab4362b215d07d8a0beaa802113dd8d9b8/data HTTP/1.1" 206 4339 - "docker-distribution/v3.0.0-gitlab (go1.14.7) aws-sdk-go/1.27.0 (go1.14.7; linux; amd64)" bytes=0-
2021-07-09T12:11:45.252+0700 7f36b0dd8700 1 beast: 0x7f374adb36b0: 10.111.111.111 - - [2021-07-09T12:11:45.252941+0700] "GET /glab-reg?list-type=2&max-keys=1&prefix= HTTP/1.1" 200 723 - "docker-distribution/v3.0.0-gitlab (go1.14.7) aws-sdk-go/1.27.0 (go1.14.7; linux; amd64)" -
2021-07-09T12:11:45.431+0700 7f360fc96700 1 beast: 0x7f374ad326b0: 10.111.111.111 - - [2021-07-09T12:11:45.431942+0700] "GET /streag/?list-type=2&delimiter=%2F&max-keys=5000&prefix=logs%2F&fetch-owner=false HTTP/1.1" 200 497 - "Hadoop 3.2.2, aws-sdk-java/1.11.563 Linux/5.4.0-70-generic OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 scala/2.12.10 vendor/Oracle_Corporation" -
2021-07-09T12:12:00.738+0700 7fafc968d700 1 beast: 0x7fb0b5f0d6b0: 10.111.111.111 - - [2021-07-09T12:12:00.738060+0700] "GET /csder-prd-cae?list-type=2&max-keys=1000 HTTP/1.1" 200 279469 - "aws-sdk-java/2.16.50 Linux/3.10.0-1160.31.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.292-b10 Java/1.8.0_292 scala/2.11.10 vendor/Red_Hat__Inc. io/async http/NettyNio cfg/retry-mode/legacy" -
2021-07-09T12:55:43.573+0700 7fa5329e3700 1 beast: 0x7fa4499846b0: 10.111.111.111 - - [2021-07-09T12:55:43.573351+0700] "PUT /s..prr//WHITELABEL-1/PAGETPYE-7/DEVICE-1/LANGUAGE-18/SUBTYPE-0/10236929 HTTP/1.1" 200 34982 - "aws-sdk-dotnet-coreclr/3.5.10.1 aws-sdk-dotnet-core/3.5.3.8 .NET_Core/4.6.26328.01 OS/Microsoft_Windows_6.3.9600 ClientAsync" -
2021-07-09T12:55:43.587+0700 7fa4e9951700 1 beast: 0x7fa4490f36b0: 10.111.111.111 - - [2021-07-09T12:55:43.587351+0700] "GET /admin/log/?type=data&id=22&marker=1_1625810142.071426_1063846896.1&extra-info=true&rgwx-zonegroup=31a5ea05-c87a-436d-9ca0-ccfcbad481e3 HTTP/1.1" 200 44 - - -
This is my filter:
%{TIMESTAMP_ISO8601:LogTimestamp}\] \"%{WORD:request_method} (?<swift_v1>(/swift/v1){0,1})/(?<bucketname>(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.{1,})*([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9]))(\?|/\?|/)(?<list_type=2>(list-type=2){0,1})%{GREEDYDATA}%{SPACE}HTTP/1.1\" %{NUMBER:httprespcode:int}

I just read an exciting article here about improving grok.
Here are some of its points:
According to the source, the time spent checking that a line doesn't match can be up to 6 times slower than a regular (successful) match, especially when neither the start nor the end of the match is anchored. So you can improve grok patterns by attacking the cost of failed matches: use the anchors ^ and $ to help grok decide faster based on the beginning or the end of the line.
Sample:
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}
Then, with anchors:
^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}$
Result:
This made the initial match-failure detection around 10 times faster.
All credit belongs to the respective writer; it is an amazing article, and you should check the link out.
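Applied to the pattern in the question, a sketch might look like the following. This is an assumption-laden rewrite, not a drop-in replacement: the leading fields are matched with generic %{NOTSPACE}/%{INT} captures, the bucket-name alternation is simplified to a character class (looser than the original), and the list_type=2 field is renamed to list_type. Only the ^ anchor applies here, because the pattern stops at the response code rather than at the end of the line:
grok {
  match => { "message" => "^%{TIMESTAMP_ISO8601} %{NOTSPACE} %{INT} beast: %{NOTSPACE}: %{IP:client_ip} - - \[%{TIMESTAMP_ISO8601:LogTimestamp}\] \"%{WORD:request_method} (?<swift_v1>(/swift/v1)?)/(?<bucketname>[a-zA-Z0-9.-]+)(\?|/\?|/)(?<list_type>(list-type=2)?)%{GREEDYDATA} HTTP/1.1\" %{NUMBER:httprespcode:int}" }
}
Anchoring at ^ lets a non-matching line fail at the first character instead of the engine retrying the match at every position in the line.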

Related

How to compile several regex into one

Good morning, I need to compile several regular expressions into one pattern.
Regular expressions are like this:
reg_ip = r'(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
reg_meth = r'(?P<METHOD>GET|POST|PUT|DELETE|HEAD)'
reg_status = r'\s(?P<STATUS>20[0-9]|30[0-9]|40[0-9]|50[0-9])\s'
reg_400 = r'\s(?P<STATUS_400>40[0-9])\s'
reg_500 = r'\s(?P<STATUS_500>50[0-9])\s'
reg_url = r'"(?P<URL>htt[p|ps]:.*?)"'
reg_rt = r'\s(?P<REQ_TIME>\d{4})$'
Regular expressions are written for strings from apache access.log:
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" 4374
I tried to compile it with code like this:
some_pattern = re.compile(reg_ip.join(reg_meth).join(reg_status))
Obviously it doesn't work that way. How do I do it right?
You need some glue between regexes.
You have two options:
join the regexes via alternation: regex1|regex2|regex3|... and use a global search
add the missing glue between regexes: for example, between reg_status and reg_url you may need to add r'[^"]+' to skip the next number (the byte count)
The problem with alternation is that the sub-patterns could match anywhere, so you could find, for example, the word POST (or a number) inside a URL.
So for me, the second option is better.
This is the glue I would use:
import re
reg_ip = r'(?P<IP>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
reg_meth = r'(?P<METHOD>GET|POST|PUT|DELETE|HEAD)'
reg_status = r'\s(?P<STATUS>20[0-9]|30[0-9]|40[0-9]|50[0-9])\s'
#reg_400 = r'\s(?P<STATUS_400>40[0-9])\s'
#reg_500 = r'\s(?P<STATUS_500>50[0-9])\s'
reg_url = r'"(?P<URL>https?:[^"]+)"'
reg_rt = r'\s(?P<REQ_TIME>\d{4})$'
some_pattern = re.compile(reg_meth + r'\s+[^]]+\s*"' + reg_status + r'[^"]+' + reg_url + r'\s*"[^"]+"\s*' + reg_rt)
print(some_pattern)
line = '109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "http://almhuette-raith.at/administrator/" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" 4374'
print(some_pattern.search(line))
For the glue, these are the pieces I used:
\s* : match any whitespace character zero or more times
\s+ : match any whitespace character one or more times
[^X]+ : where X is some character; match any character other than X one or more times
By the way:
This htt[p|ps] is not correct. You can simply use https? instead. Or, if you want to do it with groups: htt(p|ps) or htt(?:p|ps) (the last one is a non-capturing group, which is preferred if you don't want to capture its content).
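Continuing the snippet above, a quick way to verify the combined pattern is to print the named groups it captures; with the sample line, something like this should come out (shown as a comment, assuming the match succeeds):
m = some_pattern.search(line)
if m:
    # e.g. {'METHOD': 'POST', 'STATUS': '200',
    #       'URL': 'http://almhuette-raith.at/administrator/', 'REQ_TIME': '4374'}
    print(m.groupdict())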

Adding/changing numbers in specific lines

I have a big file with 250,000 lines and I want to change the numbers every 36 lines.
Example of my file:
rankup_cost_increase_percentage: 0.0
removepermission:
- essentials.warps.B
- essentials.warps.C
- essentials.warps.D
- essentials.warps.E
- essentials.warps.F
- essentials.warps.G
- essentials.warps.H
- essentials.warps.I
- essentials.warps.J
- essentials.warps.K
- essentials.warps.L
- essentials.warps.M
- essentials.warps.N
- essentials.warps.O
- essentials.warps.P
- essentials.warps.Q
- essentials.warps.R
- essentials.warps.S
- essentials.warps.T
- essentials.warps.U
- essentials.warps.W
- essentials.warps.X
- essentials.warps.Y
- essentials.warps.Z
executecmds:
- "[CONSOLE] crate give to %player% Legendary 1"
- "[CONSOLE] crate give to %player% Multiplier 1"
- "[player] warp A"
P7437:
nextprestige: P7438
cost: 3.7185E13
display: '&8[&9P7437&8]'
rankup_cost_increase_percentage: 0.0
I want rankup_cost_increase_percentage: 0.0 to increase by 5.0 every time it occurs.
How would I be able to do that?
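A minimal Python sketch of one way to do it, assuming the first occurrence should stay at 0.0 and every later occurrence should be 5.0 higher than the previous one (the file names are placeholders):
# Rewrite the file, stepping each successive
# rankup_cost_increase_percentage value up by 5.0 (0.0, 5.0, 10.0, ...)
value = 0.0
with open('ranks.yml') as src, open('ranks_updated.yml', 'w') as dst:
    for line in src:
        if line.lstrip().startswith('rankup_cost_increase_percentage:'):
            # Preserve the line's original indentation
            indent = line[:len(line) - len(line.lstrip())]
            dst.write(f'{indent}rankup_cost_increase_percentage: {value}\n')
            value += 5.0
        else:
            dst.write(line)
Because the value lines are matched by name rather than by position, this keeps working even if the 36-line block length ever changes.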

Sending data from Hono to Ditto

Eclipse Hono and Eclipse Ditto have been connected successfully, and when I send data via Hono I get a 202 Accepted response, as shown below.
(base) vignesh@nb907:~$ curl -X POST -i -u sensor9200@tenantSensorAdaptersss:mylittle -H 'Content-Type: application/json' -d '{"temp": 23.07, "hum": 45.85122}' http://localhost:8080/telemetry
HTTP/1.1 202 Accepted
content-length: 0
But when I check the digital twin value using localhost:8080/api/2/testing.ditto:9200, it is not getting updated.
I came across this error while inspecting the logs:
connectivity_1 | 2019-10-14 15:18:26,273 INFO [ID:AMQP_NO_PREFIX:TelemetrySenderImpl-7] o.e.d.s.c.m.a.AmqpPublisherActor akka://ditto-cluster/system/sharding/connection/27/Amma123465/pa/$a/c1/amqpPublisherActor2 - Response dropped, missing replyTo address: UnmodifiableExternalMessage [headers={content-type=application/vnd.eclipse.ditto+json, orig_adapter=hono-http, orig_address=/telemetry, device_id=9200, correlation-id=ID:AMQP_NO_PREFIX:TelemetrySenderImpl-7}, response=true, error=true, authorizationContext=null, topicPath=ImmutableTopicPath [namespace=unknown, id=unknown, group=things, channel=twin, criterion=errors, action=null, subject=null, path=unknown/unknown/things/twin/errors], enforcement=null, headerMapping=null, sourceAddress=null, payloadType=TEXT, textPayload={"topic":"unknown/unknown/things/twin/errors","headers":{"content-type":"application/vnd.eclipse.ditto+json","orig_adapter":"hono-http","orig_address":"/telemetry","device_id":"9200","correlation-id":"ID:AMQP_NO_PREFIX:TelemetrySenderImpl-7"},"path":"/","value":{"status":400,"error":"json.field.missing","message":"JSON did not include required </path> field!","description":"Check if all required JSON fields were set."},"status":400}, bytePayload=null']
gateway_1 | 2019-10-14 15:19:47,927 WARN [b9774050-48ae-45c4-a937-68a70f8defe5] o.e.d.s.g.s.a.d.DummyAuthenticationProvider - Dummy authentication has been applied for the following subjects: nginx:ditto
gateway_1 | 2019-10-14 15:19:47,949 INFO [b9774050-48ae-45c4-a937-68a70f8defe5] o.e.d.s.m.c.a.ConciergeForwarderActor akka://ditto-cluster/user/gatewayRoot/conciergeForwarder - Sending signal with ID <testing.ditto:9200> and type <things.commands:retrieveThing> to concierge-shard-region
gateway_1 | 2019-10-14 15:19:48,044 INFO [b9774050-48ae-45c4-a937-68a70f8defe5] o.e.d.s.g.e.HttpRequestActor akka://ditto-cluster/user/$C - DittoRuntimeException <things:precondition.notmodified>: <The comparison of precondition header 'if-none-match' for the requested Thing resource evaluated to false. Expected: '"rev:1"' not to match actual: '"rev:1"'.>.
I have set all the JSON fields, but I am not sure what I am missing.
I also can see this in the log
nginx_1 | 172.18.0.1 - ditto [14/Oct/2019:13:19:48 +0000] "GET /api/2/things/testing.ditto:9200 HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
Please let me know if I am missing something.
Did you send the message in Ditto protocol, or did you apply a payload transformation?
This looks like a duplicate of Connecting Eclipse Hono to Ditto - "description":"Check if all required JSON fields were set."},"status":400}" Error, where you had the same problem and error before.
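For reference, a Ditto protocol message needs topic, path, and value fields; the json.field.missing error in the log above complains about exactly the missing /path field. A minimal sketch of a twin modify command for the thing testing.ditto:9200 (the temperature feature name is an assumption):
{
  "topic": "testing.ditto/9200/things/twin/commands/modify",
  "path": "/features/temperature/properties/value",
  "value": 23.07
}
If the device cannot send Ditto protocol itself, a payload mapping can be configured on the connection to produce a message of this shape.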

How to read all IP addresses from a xxx.log file and print their count?

I am a JasperReports developer, but my manager moved me to a Python 3 project: read the IP addresses from a 'fileName.log' file and print the count for each IP address that watched my video more than one time.
I am very new to Python 3. Please help me with this problem.
My file as below:
66.23.64.12 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
64.24.65.93 - - [06/Nov/2014:19:11:24 +0600] "GET /?q=%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0 HTTP/1.1" 200 4223 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
78.849.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
78.849.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
98.449.65.19 - - [06/Nov/2014:19:10:38 +0600] "GET /news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
54.49.65.03 - - [06/Nov/2014:19:11:24 +0600] "GET /?q=%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0 HTTP/1.1" 200 4223 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
54.49.65.03 - - [06/Nov/2014:19:11:24 +0600] "GET /?q=%E0%A6%AB%E0%A6%BE%E0%A7%9F%E0%A6%BE%E0%A6%B0 HTTP/1.1" 200 4223 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
45.79.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Output as below:
IP Count
98.449.65.19 2
54.49.65.03 4
Here is one method; it stores the IP addresses in a dictionary, which may be useful depending on what else you would like to do with the data.
# Read in the text file
with open('fileName.log', 'r') as f:
    lines = f.readlines()

data = {}
for line in lines:
    # Split the line each time a space appears, and take the first element (the IP address)
    ipAddr = line.split()[0]
    if ipAddr in data:
        data[ipAddr] += 1
    else:
        data[ipAddr] = 1

# Print counts of each IP address
print(' IP Count')
for key, val in data.items():
    print(key, val)
Output:
IP Count
66.23.64.12 1
64.24.65.93 1
78.849.65.62 2
98.449.65.19 1
54.49.65.03 2
45.79.65.62 1
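As a side note, collections.Counter from the standard library does the counting in one pass; here is a sketch that also filters to IP addresses seen more than once, which is what the expected output suggests:
from collections import Counter

with open('fileName.log') as f:
    # Count the first whitespace-separated token (the IP) of each non-empty line
    counts = Counter(line.split()[0] for line in f if line.strip())

print(' IP Count')
for ip, count in counts.items():
    if count > 1:  # only IPs that watched more than once
        print(ip, count)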

Grok filter isn't matching the Bro http log data

I am trying to use ELK to visualize Bro log data. I found multiple grok filters online, but they keep failing to match the data. One of the filters I tried is:
grok {
match => [ "message", "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<trans_depth>(.*?))\t(?<method>(.*?))\t(?<bro_host>(.*?))\t(?<uri>(.*?))\t(?<referrer>(.*?))\t(?<user_agent>(.*?))\t(?<request_body_len>(.*?))\t(?<response_body_len>(.*?))\t(?<status_code>(.*?))\t(?<status_msg>(.*?))\t(?<info_code>(.*?))\t(?<info_msg>(.*?))\t(?<filename>(.*?))\t(?<http_tags>(.*?))\t(?<username>(.*?))\t(?<password>(.*?))\t(?<proxied>(.*?))\t(?<orig_fuids>(.*?))\t(?<orig_mime_types>(.*?))\t(?<resp_fuids>(.*?))\t(?<resp_mime_types>(.*))" ]
}
The bro data I am trying to ingest is as follows:
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path http
#open 2018-11-27-18-31-02
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p trans_depth method host uri referrer version user_agent request_body_len response_body_len status_code status_msg info_code info_msg tags username password proxied orig_fuids orig_filenames orig_mime_types resp_fuids resp_filenames resp_mime_types
#types time string addr port addr port count string string string string string string count count count string count string set[enum] string string set[string] vector[string] vector[string] vector[string] vector[string] vector[string] vector[string]
1543343462.308603 CrJmZi31EU3tUXba3c 10.100.130.72 38396 216.58.217.110 80 1 - - - - 1.1 - 0 219 301 Moved Permanently - - (empty) - - - - - - FXhQ5K1ydVhFnz9Agi - text/html
1543344229.051726 CLj9eD4BcFR42BRHV1 10.100.130.72 37452 169.254.169.254 80 1 - - - - 1.0 - 0 13 200 OK - - (empty) - - - - - - FO0Zko4uvyxeC8LDx4 - text/plain
1543345395.827176 C6Kdv49oODjjkgeFk 10.100.130.72 37464 169.254.169.254 80 1 - - - - 1.0 - 0 345 404 Not Found - - (empty) - - - - - - FW4NGDCyMNR43J4Hf - text/html
1543345691.165771 CNaObqkLN9imdehl4 10.100.130.72 37466 169.254.169.254 80 1 - - - - 1.0 - 0 13 200 OK - - (empty) - - - - - - FmUSygO8ocHKTN8L3 - text/plain
1543347316.900516 Ck5CsV2hr56axo3rzl 10.100.130.72 37486 169.254.169.254 80 1 - - - - 1.0 - 0 13 200 OK - - (empty) - - - - - - FXKDmj3kllpKuJnSkg - text/plain
1543348718.870063 CFBClg1jRpmBp4ElYb 10.100.130.72 37506 169.254.169.254 80 1 - - - - 1.0 - 0 13 200 OK - - (empty) - - - - - - F02j4T12ssIF2tYFF5 - text/plain
1543348995.827387 CPMwHt2g13sPqdiXE1 10.100.130.72 37508 169.254.169.254 80 1 - - - - 1.0 - 0 345 404 Not Found - - (empty) - - - - - - FsbLPY8A3gpuBkM7l - text/html
1543350095.640070 CObHQk2ARejHIWBcgc 10.100.130.72 37518 169.254.169.254 80 1 - - - - 1.0 - 0 13 200 OK - - (empty) - - - - - - FxCY9C2fOP4dHO2Dkj - text/plain
Thanks
-JP
Ignore the lines starting with #.
I used the grok pattern given in the question and it worked. You can use the grok debugger to check the pattern:
https://grokdebug.herokuapp.com/
One possible issue is that the data source uses spaces rather than tabs; verify the data.
I used the hash form with => instead of the array form with a comma (,). Change the code to:
grok {
  match => { "message" => "(?<ts>(.*?))\t(?<uid>(.*?))\t(?<id.orig_h>(.*?))\t(?<id.orig_p>(.*?))\t(?<id.resp_h>(.*?))\t(?<id.resp_p>(.*?))\t(?<trans_depth>(.*?))\t(?<method>(.*?))\t(?<bro_host>(.*?))\t(?<uri>(.*?))\t(?<referrer>(.*?))\t(?<user_agent>(.*?))\t(?<request_body_len>(.*?))\t(?<response_body_len>(.*?))\t(?<status_code>(.*?))\t(?<status_msg>(.*?))\t(?<info_code>(.*?))\t(?<info_msg>(.*?))\t(?<filename>(.*?))\t(?<http_tags>(.*?))\t(?<username>(.*?))\t(?<password>(.*?))\t(?<proxied>(.*?))\t(?<orig_fuids>(.*?))\t(?<orig_mime_types>(.*?))\t(?<resp_fuids>(.*?))\t(?<resp_mime_types>(.*))" }
}
If the error still persists, you can also use the csv filter rather than grok, as sketched below.
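A sketch of that csv alternative, taking the column names from the #fields header line above; the separator is written as a literal tab character, and the comment lines are dropped before parsing:
filter {
  # Bro prefixes header lines with '#'; drop them before parsing
  if [message] =~ /^#/ {
    drop { }
  }
  csv {
    separator => "	"  # a literal tab character
    columns => ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
                "trans_depth", "method", "host", "uri", "referrer", "version",
                "user_agent", "request_body_len", "response_body_len",
                "status_code", "status_msg", "info_code", "info_msg", "tags",
                "username", "password", "proxied", "orig_fuids", "orig_filenames",
                "orig_mime_types", "resp_fuids", "resp_filenames", "resp_mime_types"]
  }
}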
