A complicated logstash pattern in Grok

A complicated logstash pattern in Grok - logstash

I have following 3 lines in a log that need to be grok'd for ElasticSearch through logstash.
2020-01-27 13:30:43,536 INFO com.test.bestmatch.streamer.function.BestMatchProcessor - Best match for ID: COi0620200110450BAD5CB723457A9B4747F1727 Total Batch Processing time: 3942
2020-01-27 13:30:43,581 INFO HTTPConnection - COi0620200110450BAD5CB723457A9B4747F1727 | People: 51 | Addresses: 5935 | HTTP Query Time: 24
2020-01-27 13:30:43,698 INFO bestRoute - COi0620200110450BAD5CB723457A9B4747F1727 | Touch Points: 117 | Best Match Time 3943
I tried various grok patterns but couldn't get to any concrete one.
Edited as per request
I need the following in ES in the context of the specific log entry
1st line
ID: COi0620200110450BAD5CB723457A9B4747F1727
Total Batch Processing time: 3942
2nd Line
ID: COi0620200110450BAD5CB723457A9B4747F1727
People: 51
Addresses: 5935
HTTP Query Time: 24
3rd Line
Touch Points 117
Best Match Time: 3943.
The output is from a Flink log. If there are flink patterns out there then please let me know.

1st line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.*ID: (?<ID>[\w\d]*).*time: (?<total_time>[\d]*)$
2nd line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.* - (?<ID>[\w]*).*People: (?<people>[\w]*).*Addresses: (?<addresses>[\d]*).*HTTP Query Time: (?<query_time>[\d]*)$
3rd line:
^%{TIMESTAMP_ISO8601:time}\s*%{LOGLEVEL:loglevel}.* - (?<ID>[\w]*).*Touch Points: (?<touch_points>[\d]*).*Best Match Time (?<best_match_time>[\d]*)$
There are many ways to parse this, this is only one approach. I would reccomend to adjust the field names I used to the new ECS. https://www.elastic.co/guide/en/ecs/current/index.html

Related

Fail2ban custom regular expresion

Trying to add custom rule on regular expression in order to block the below log.
Mar 17 18:46:52 s21409974 named[1577]: client #0x7g246c107030 1.1.1.1#8523 (.): query (cache) './ANY/IN' denied
I did tried with online tools like this one (https://www.regextester.com) but on the fail2ban-regex test command does display like it miss it.
Any suggestion about the rule or about how to better troubleshoot?
Thank in advance

Why do you try to write a custom regex? This message is pretty well matching with original fail2ban filter named-refused:
$ msg="Mar 17 18:46:52 s21409974 named[1577]: client #0x7g246c107030 1.1.1.1#8523 (.): query (cache) './ANY/IN' denied"
$ fail2ban-regex "$msg" named-refused
Running tests
=============
Use failregex filter file : named-refused
Use single line : Mar 17 18:46:52 s21409974 named[1577]: client #0x7...
Results
=======
Prefregex: 1 total
| ^(?:\s*\S+ (?:(?:\[\d+\])?:\s+\(?named(?:\(\S+\))?\)?:?|\(?named(?:\(\S+\))?\)?:?(?:\[\d+\])?:)\s+)?(?: error:)?\s*client(?: #\S*)? (?:\[?(?:(?:::f{4,6}:)?(?P<ip4>(?:\d{1,3}\.){3}\d{1,3})|(?P<ip6>(?:[0-9a-fA-F]{1,4}::?|::){1,7}(?:[0-9a-fA-F]{1,4}|(?<=:):)))\]?|(?P<dns>[\w\-.^_]*\w))#\S+(?: \([\S.]+\))?: (?P<content>.+)\s(?:denied|\(NOTAUTH\))\s*$
`-
Failregex: 1 total
|- #) [# of hits] regular expression
| 1) [1] ^(?:view (?:internal|external): )?query(?: \(cache\))?
`-
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [1] {^LN-BEG}(?:DAY )?MON Day %k:Minute:Second(?:\.Microseconds)?(?: ExYear)?
`-
Lines: 1 lines, 0 ignored, 1 matched, 0 missed
[processed in 0.01 sec]
But if you need it, here you go (regex interpolated from fail2ban's pref- & failregex):
^\s*\S+\s+named\[\d+\]: client(?: #\S*)? <ADDR>#\S+(?: \([\S.]+\))?: (?:view (?:internal|external): )?query(?: \(cache\))? '[^']+' denied
replace <ADDR> with <HOST> if your fail2ban version is smaller than 0.10.

Artillery.io: How to generate test report for each Scenario?

Artillery: How to run the scenarios sequentially and also display the results of each scenario in the same file?
I'm currently writing nodejs test with artillery.io to compare performance between two endpoints that I implemented. I defined two scenarios and I would like to get the result of each in a same report file.
The execution of the tests is not sequential, it means that at the end of the test I have a result already combined and impossible to know the performance of each one but for all.
config:
target: "http://localhost:8080/api/v1"
plugins:
expect: {}
metrics-by-endpoint: {}
phases:
- duration: 60
arrivalRate: 2
environments:
dev:
target: "https://backend.com/api/v1"
phases:
- duration: 60
arrivalRate: 2
scenarios:
- name: "Nashhorn"
flow:
- post:
url: "/casting/nashhorn"
auth:
user: user1
pass: user1
json:
body:
fromFile: "./casting-dataset-01-as-input.json"
options:
filename: "casting_dataset"
conentType: "application/json"
expect:
statusCode: 200
capture:
regexp: '[^]*'
as: 'result'
- log: 'result= {{result}}'
- name: "Nodejs"
flow:
- post:
url: "/casting/nodejs"
auth:
user: user1
pass: user1
json:
body:
fromFile: "./casting-dataset-01-as-input.json"
options:
filename: "casting_dataset"
conentType: "application/json"
expect:
statusCode: 200
capture:
regexp: '[^]*'
as: 'result'
- log: 'result= {{result}}'
How to run the scenarios sequentially and also display the results of each scenario in the same file?
Thank you in advance for your answers

I think you miss the param weight, this param defines de probability to execute the scenario. if in you first scenario put a weight of 1 and in the second put the same value, both will have the same probability to been execute (50%).
If you put in the first scenario a weight of 3 and in the second one a weight of 1, the second scenario will have a 25% probability of execution while the first one will have a 75% probability of being executed.
This combined with the arrivalRate parameter and setting the value of rampTo to 2, will cause 2 scenarios to be executed every second, in which if you set a weight of 1 to the two scenarios, they will be executed at the same time.
Look down for scenario weights in the documentation
scenarios:
- flow:
- log: Scenario for GET requests
- get:
url: /v1/url_test_1
name: Scenario for GET requests
weight: 1
- flow:
- log: Scenario for POST requets
- post:
json: {}
url: /v1/url_test_2
name: Scenario for POST
weight: 1
I hope this helps you.

To my knowledge, there isn't a good way to do this with the existing the artillery logic.
using this test script:
scenarios:
- name: "test 1"
flow:
- post:
url: "/postman-echo.com/get?test=123"
weight: 1
- name: "test 2"
flow:
- post:
url: "/postman-echo.com/get?test=123"
weight: 1
... etc...
Started phase 0 (equal weight), duration: 1s # 13:21:54(-0500) 2021-01-06
Report # 13:21:55(-0500) 2021-01-06
Elapsed time: 1 second
Scenarios launched: 20
Scenarios completed: 20
Requests completed: 20
Mean response/sec: 14.18
Response time (msec):
min: 117.2
max: 146.1
median: 128.6
p95: 144.5
p99: 146.1
Codes:
404: 20
All virtual users finished
Summary report # 13:21:55(-0500) 2021-01-06
Scenarios launched: 20
Scenarios completed: 20
Requests completed: 20
Mean response/sec: 14.18
Response time (msec):
min: 117.2
max: 146.1
median: 128.6
p95: 144.5
p99: 146.1
Scenario counts:
test 7: 4 (20%)
test 5: 2 (10%)
test 3: 1 (5%)
test 1: 4 (20%)
test 9: 2 (10%)
test 8: 3 (15%)
test 10: 2 (10%)
test 4: 1 (5%)
test 6: 1 (5%)
Codes:
404: 20
So basically you can see that they are weighted equally, but are not running equally. So I think there needs to be something added to the code itself for artillery. Happy to be wrong here.

You can use the per endpoint metrics plugin to give you the results per endpoint instead of aggregated.
https://artillery.io/docs/guides/plugins/plugin-metrics-by-endpoint.html
I see you already have this in your config, but it cannot be working if it is not giving you what you need. Did you install it as well as add to config?
npm install artillery-plugin-metrics-by-endpoint
In terms of running sequentially, I'm not sure why you would want to, but assuming you do, you just need to define each POST as part of the same Scenario instead of 2 different scenarios. That way the second step will only execute after the first step has responded. I believe the plugin is per endpoint, not per scenario so will still give you the report you want.

Parsing two formats of log messages in LogStash

In a single log file, there are two formats of log messages. First as so:
Apr 22, 2017 2:00:14 AM org.activebpel.rt.util.AeLoggerFactory info
INFO:
======================================================
ActiveVOS 9.* version Full license.
Licensed for All application server(s), for 8 cpus,
License expiration date: Never.
======================================================
and second:
Apr 22, 2017 2:00:14 AM org.activebpel.rt.AeException logWarning
WARNING: The product license does not include Socrates.
First line is same, but on the other lines, there can be (written in pseudo) :loglevel: <msg>, or loglevel:<newline><many of =><newline><multiple line msg><newline><many of =>
I have the following configuration:
Query:
%{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{GREEDYDATA:info}%{SPACE}%{LOGLEVEL:level}:(%{SPACE}%{GREEDYDATA:msg}|%{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+)
Grok patterns:
AMPM (am|AM|pm|PM|Am|Pm)
TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}
Multiline filter:
%{LOGLEVEL}|%{GREEDYDATA}|=+
The problem is that all messages are always identified with %{SPACE}%{GREEDYDATA:msg}, and so in second case return <many of => as msg, and never with %{SPACE}=+(%{GREEDYDATA:msg}%{SPACE})*=+, probably as first msg pattern contains the second.
How can I parse these two patterns of msg ?

I fixed it by following:
Query:
%{TIMESTAMP_MW_ERR:timestamp} %{DATA:logger} %{DATA:info}\s%{LOGLEVEL:level}:\s((=+\s%{GDS:msg}\s=+)|%{GDS:msg})
Patterns:
AMPM (am|AM|pm|PM|Am|Pm)
TIMESTAMP_MW_ERR %{MONTH} %{MONTHDAY}, %{YEAR} %{HOUR}:%{MINUTE}:%{SECOND} %{AMPM}
GDS (.|\s)*
Multiline pattern:
%{LOGLEVEL}|%{GREEDYDATA}
Logs are correctly parsed.

Grok, logs processing with different values

I have a logfile, I am parsing it with telegraf.logparser and then it sends it to influxdb. The problem is, my logfile has different fields in a complete string:
2016-12-06 11:13:34 job id: mHiMMDmCDFKDmGXNMhm, lrmsid: 13370
2016-12-06 11:14:34 job id: seeeeeewsda33rfddSD, lrmsid: 13371
2016-12-06 11:14:37 job id: dmABFKDmqKcNDmHBFKD, failure: "Timeout"
I can match single of that lines with
%{TIMESTAMP_ISO8601} job id: %{WORD:jobid}, lrmsid: {%WORD.lrmsid}
or
%{TIMESTAMP_ISO8601} job id: %{WORD:jobid}, failure: {%WORD.fail}
But how can I do it to get both .. so that if lrmsid is not set, it get lrmsid=null, and failure="Timeout".. and if lrmsid is set its lrmsid=12345 and failure=null

Please try this one:
(lrmsid: %{WORD:lrmsid})?(failure: "%{WORD:failure}")?
It should capture either lrmsid or failure if I have not missed anything

Search multiline error log for error code and then some of it's parameters on Linux

What command would give me the output I need for each instance of an error code in a very large log file? The file has records marked by a begin and end with number of characters. Such as:
SR 120
1414760452 0 1 Fri Oct 31 13:00:52 2014 2218714 4
GROVEMR2 scn
../SrxParamIF.m 284
New Exam Started
EN 120
The 5th field is the error code, 2218714 in previous example.
I thought of just grep'ing for the error code and outputting -A lines afterwards; then picking what I needed from that rather than parsing the entire file. That seems easy but my grep/awk/sed usage isn't to that level.
ONLY when error 2274021 is encountered as in the following example I'd like some output as shown.
Show me output such as: egrep ‘Coil:|Connector:|Channels faulted:| First channel:’ ERRORLOG|less
Part of input file of interest:
Mon Nov 24 13:43:37 2014 2274021 1
AWHMRGE3T NSP
SCP:RfHubCanHWO::RfBias 4101
^MException Class: Unknown Severity: Unknown
Function: RF: RF Bias
PSD: VIBRANT Coil: Breast SMI Scan: 1106/14
Coil Fault - Short Circuit
A multicoil bias fault was detected.
.
Connector: Port 1 (P1)
Channels faulted: 0x200
First channel: 10 of 32, counting from 1
Fault value: -2499 mV, Channel: 10->
Output:
Coil: Breast SMI
Connector: Port 1 (P1)
Channels faulted: 0x200
First channel: 10 of 32, counting from 1
Thanks in advance for any pointers!

Try the following (with the convenient adaptations)
#!/usr/bin/perl
use strict;
$/="\nEN "; # register separated by "\nEN "
my $error=2274021; # the error!
while(<>){ # for all registers
next unless /\b$error\b/; # ignore unless error
for my $line ( split(/\n/,$_)){
print "$line\n" if ($line =~ /Coil:|Connector:|Channels faulted:|First channel:/);
}
print "====\n"
}
Is this what you need?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

A complicated logstash pattern in Grok - logstash

Related

Fail2ban custom regular expresion

Artillery.io: How to generate test report for each Scenario?

Parsing two formats of log messages in LogStash

Grok, logs processing with different values

Search multiline error log for error code and then some of it's parameters on Linux

Categories

Resources