_grokparsefailure on varnish log - logstash

The message looks like:
1.2.3.4 "-" - - [19/Apr/2016:11:42:18 +0200] "GET http://monsite.vpù/api/opa/status HTTP/1.1" 200 92 "-" "curl - API-Player - PREPROD" hit OPA-PREPROD-API - 0.000144958
My grok pattern is
grok {
match => { "message" => "%{IP:clientip} \"%{DATA:x_forwarded_for}\" %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} (%{NOTSPACE:hitmiss}|-) (%{NOTSPACE:varnish_conf}|-) (%{NOTSPACE:varnish_backend}|-) %{NUMBER:time_firstbyte}"}
}
I get a _grokparsefailure tag even though all my fields are populated correctly, except for the last one: I get 0 instead of 0.000144958.
The full message in ES is
{
"_index": "logstash-2016.04.19",
"_type": "syslog",
"_id": "AVQt7WSCN-2LsQj9ZIIq",
"_score": null,
"_source": {
"message": "212.95.71.201 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsite.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - PREPROD\" hit OPA-PREPROD-API - 0.000132084",
"#version": "1",
"#timestamp": "2016-04-19T09:50:12.000Z",
"type": "syslog",
"host": "212.95.70.80",
"tags": [
"_grokparsefailure"
],
"application": "varnish-preprod",
"clientip": "1.2.3.4",
"x_forwarded_for": "-",
"ident": "-",
"auth": "-",
"timestamp": "19/Apr/2016:11:50:12 +0200",
"verb": "GET",
"request": "http://monsite.com/api/opa/status",
"httpversion": "1.1",
"response": "200",
"bytes": "92",
"referrer": "\"-\"",
"agent": "\"curl - API-Player - PREPROD\"",
"hitmiss": "hit",
"varnish_conf": "OPA-PREPROD-API",
"varnish_backend": "-",
"time_firstbyte": "0.000132084",
"geoip": {
"ip": "1.2.3.4",
"country_code2": "FR",
"country_code3": "FRA",
"country_name": "France",
"continent_code": "EU",
"region_name": "C1",
"city_name": "Strasbourg",
"latitude": 48.60040000000001,
"longitude": 7.787399999999991,
"timezone": "Europe/Paris",
"real_region_name": "Alsace",
"location": [
7.787399999999991,
48.60040000000001
]
},
"agentname": "Other",
"agentos": "Other",
"agentdevice": "Other"
},
"fields": {
"#timestamp": [
1461059412000
]
},
"highlight": {
"agent": [
"\"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\""
],
"varnish_conf": [
"OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API"
],
"application": [
"#kibana-highlighted-field#varnish#/kibana-highlighted-field#-#kibana-highlighted-field#preprod#/kibana-highlighted-field#"
],
"message": [
"1.2.3.4 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsote.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\" hit OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API - 0.000132084"
]
},
"sort": [
1461059412000
]
}
The answer is that Kibana does not display very small numbers.

You would only get a _grokparsefailure if the grok fails, so it's not this grok that's producing the tag. Use the tag_on_failure parameter in your groks to give each grok a unique tag.
As for your parsing problem, I'll bet that your grok is working just fine. Note that Elasticsearch creates fields dynamically and guesses the type of each field based on the first data seen. If your first value was "0", it would have made the field an integer and later entries would be cast to that type. You can pull the index mapping to see what happened.
You need to control the mapping that is created. You can cast the field to a float in the grok itself (%{NUMBER:myField:float}) or create your own index template.
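As a sketch, casting the duration at grok time and giving this grok its own failure tag could look like this (the pattern is otherwise identical to yours):
grok {
  match => { "message" => "%{IP:clientip} \"%{DATA:x_forwarded_for}\" %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} (%{NOTSPACE:hitmiss}|-) (%{NOTSPACE:varnish_conf}|-) (%{NOTSPACE:varnish_backend}|-) %{NUMBER:time_firstbyte:float}"}
  # tag name is arbitrary; it just identifies which grok failed
  tag_on_failure => ["_grokparsefailure_varnish"]
}
Note that the :float cast only affects the event Logstash emits; if the field is already mapped as an integer in an existing index, you still need a new index (or a template) for the mapping to change.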
Also notice that NOTSPACE matches "-", so your patterns for varnish_backend, etc, are not entirely correct.

The problem was coming from the syslog filter, which uses grok internally, as explained here: https://kartar.net/2014/09/when-logstash-and-syslog-go-wrong/.
The solution was to remove the tag in my own filter.
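For anyone hitting the same thing, removing the tag in your own filter amounts to something like this (a minimal sketch):
mutate {
  # drop the tag added by the syslog input/filter's internal grok
  remove_tag => ["_grokparsefailure"]
}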
The other problem is that Kibana does not display numbers like 0.0000xxx, but they are stored correctly, so I can still use them anyway.

Related

LogStash concat Filebeat input

I am trying to merge filebeat messages in LogStash.
I have the following log file:
----------- SCAN SUMMARY -----------
Known viruses: 8520944
Engine version: 0.102.4
Scanned directories: 408
Scanned files: 1688
Infected files: 0
Total errors: 50
Data scanned: 8.93 MB
Data read: 4.42 MB (ratio 2.02:1)
Time: 22.052 sec (0 m 22 s)
I read it with Filebeat and send it to Logstash.
The problem is that Logstash receives each line as a separate message. I want to merge them all into one event, and also add a new field, "received_at", taken from the Filebeat input.
I would like the following output from Logstash:
{
Known_viruses: 8520944
Engine_version: 0.102.4
Scanned_directories: 408
Scanned_files: 1688
Infected_files: 0
Total_errors: 50
Data_scanned: 8.93MB
Data_read: 4.42MB
Time: 22.052sec (0 m 22 s)
Received_at: <timestamp taken from the received Filebeat JSON message>
}
The input I receive in Logstash from Filebeat is the following, for each line:
{
"#timestamp": "2021-04-20T08:03:33.843Z",
"#version": "1",
"tags": ["beats_input_codec_plain_applied"],
"host": {
"name": "PRLN302",
"architecture": "x86_64",
"ip": ["10.126.40.18", "fe80::7dbf:4941:cd39:c0f9", "172.17.0.1", "fe80::42:82ff:fe2b:895", "fe80::24b2:8cff:feeb:20b4", "172.18.0.1", "fe80::42:53ff:fe31:a025", "fe80::b420:e1ff:fe97:c152", "fe80::9862:21ff:fe3a:c33e", "fe80::48a6:70ff:fec2:60d6", "192.168.126.117", "fe80::2831:c644:33d5:321"],
"id": "a74e193a551f4d379f9488b80a463581",
"os": {
"platform": "ubuntu",
"version": "20.04.2 LTS (Focal Fossa)",
"family": "debian",
"name": "Ubuntu",
"type": "linux",
"codename": "focal",
"kernel": "5.8.0-49-generic"
},
"mac": ["e8:6a:64:32:fe:4d", "dc:8b:28:4a:c8:88", "02:42:82:2b:08:95", "26:b2:8c:eb:20:b4", "02:42:53:31:a0:25", "b6:20:e1:97:c1:52", "9a:62:21:3a:c3:3e", "4a:a6:70:c2:60:d6", "00:50:b6:b9:19:d7"],
"containerized": false,
"hostname": "PRLN302"
},
"message": "----------- SCAN SUMMARY -----------",
"agent": {
"ephemeral_id": "ad402f64-ab73-480c-b6de-4af6184f012c",
"type": "filebeat",
"version": "7.12.0",
"id": "f681d775-d452-490a-9b8b-036466a87d35",
"name": "PRLN302",
"hostname": "PRLN302"
},
"input": {
"type": "log"
},
"ecs": {
"version": "1.8.0"
},
"log": {
"offset": 0,
"file": {
"path": "/var/log/clamav-test.log"
}
}
}
Is this possible?
It is possible; you'll need to configure multiline messages in the Filebeat input:
https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html
Something like the below would do it, I think:
multiline.type: pattern
multiline.pattern: '^----------- SCAN SUMMARY -----------'
multiline.negate: true
multiline.match: after
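For context, a fuller (untested) sketch of the Filebeat side, using the log path from your example event, would be:
filebeat.inputs:
  - type: log
    paths:
      - /var/log/clamav-test.log   # path taken from your sample event
    multiline.type: pattern
    multiline.pattern: '^----------- SCAN SUMMARY -----------'
    multiline.negate: true
    multiline.match: after
On the Logstash side you can then copy the event timestamp into the received_at field you asked for, for example:
filter {
  mutate {
    # keep the time the event was received as its own field
    add_field => { "received_at" => "%{@timestamp}" }
  }
}
Splitting the merged summary block into individual fields (Known_viruses, Engine_version, ...) would then be a job for a grok or kv filter run against the multiline message.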

Get a workable URL from Python Get request

I'm scraping a JS-loaded website using requests. To do so, I open the site in the browser's inspector, go to the Network console, and look for the XHR calls to find out where the website requests its data and how. The process is as follows:
Go to https://www.888sport.es/futbol/#/event/1006276426 in Chrome. Once it has loaded, you can click on many items, each with a unique ID. After doing so, a pop-up window with information appears. From the XHR call I mentioned above you get a direct link to that information, as follows:
import requests
url='https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278'
#ncid is the date in timestamp format, and id is the unique id of the node clicked
response=requests.get(url=url,headers=headers)
The problem is that this isn't user-friendly and requires Python. If I put this last URL in the Chrome driver, I get the information but as plain text, and I can't interact with it. Is there any way to get a workable link from the request, so that manually inserting it into a Chrome driver loads that pop-up window directly, like a regular website?
You have to call .json() on the response so you receive a JSON dict, which you can then access by keys.
import requests
import json
def main(url):
    r = requests.get(url).json()
    print(r.keys())
    hview = json.dumps(r, indent=4)
    print(hview)  # here to see it in a nice, indented view
main("https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278")
Output:
dict_keys(['betOffers', 'events', 'prePacks'])
{
"betOffers": [
{
"id": 2210856430,
"closed": "2020-04-17T14:30:00Z",
"criterion": {
"id": 1001159858,
"label": "Final del partido",
"englishLabel": "Full Time",
"order": [],
"occurrenceType": "GOALS",
"lifetime": "FULL_TIME"
},
"betOfferType": {
"id": 2,
"name": "Partido",
"englishName": "Match"
},
"eventId": 1006276426,
"outcomes": [
{
"id": 2740660278,
"label": "1",
"englishLabel": "1",
"odds": 1150,
"participant": "FC Lokomotiv Gomel",
"type": "OT_ONE",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1003789012,
"oddsFractional": "1/7",
"oddsAmerican": "-670",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660284,
"label": "X",
"englishLabel": "X",
"odds": 6750,
"type": "OT_CROSS",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"oddsFractional": "23/4",
"oddsAmerican": "575",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660286,
"label": "2",
"englishLabel": "2",
"odds": 11000,
"participant": "Khimik Svetlogorsk",
"type": "OT_TWO",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1001024009,
"oddsFractional": "10/1",
"oddsAmerican": "1000",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"cashOutStatus": "ENABLED"
}
],
"events": [
{
"id": 1006276426,
"name": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"nameDelimiter": "-",
"englishName": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"homeName": "FC Lokomotiv Gomel",
"awayName": "Khimik Svetlogorsk",
"start": "2020-04-17T14:30:00Z",
"group": "1\u00aa Divisi\u00f3n",
"groupId": 2000053499,
"path": [
{
"id": 1000093190,
"name": "F\u00fatbol",
"englishName": "Football",
"termKey": "football"
},
{
"id": 2000051379,
"name": "Bielorrusa",
"englishName": "Belarus",
"termKey": "belarus"
},
{
"id": 2000053499,
"name": "1\u00aa Divisi\u00f3n",
"englishName": "1st Division",
"termKey": "1st_division"
}
],
"nonLiveBoCount": 6,
"sport": "FOOTBALL",
"tags": [
"MATCH"
],
"state": "NOT_STARTED",
"groupSortOrder": 1999999000000000000
}
],
"prePacks": []
}
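Once you have the dict, you can drill into it with ordinary key access. For example (key names taken from the response shown above), pulling out each outcome's label and odds:
import requests

url = "https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278"
data = requests.get(url).json()

# betOffers -> outcomes is the structure visible in the output above
for outcome in data["betOffers"][0]["outcomes"]:
    print(outcome["label"], outcome["odds"])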

how to use kibana search field with special character "-"

My ES index documents are nginx log data, like:
{
"_index": "nginx-2017-04-30",
"_type": "access",
"_id": "AVu8nYNM_NKHiROBoHkE",
"_score": null,
"_source": {
"cookie_logintoken": "-",
"request_time": "0.000",
"request": "POST /login/getMobileLoginCode HTTP/1.1",
"http_protocol": "https",
"request_id": "a6fb53fcf28b7d6b400f0611ac697f0d",
"#timestamp": "2017-04-30T10:08:11+08:00",
"http_user_agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"http_x_forwarded_for": "-",
"request_uri": "/login/getMobileLoginCode",
"remote_addr": "xxxxxxx",
"http_ver": "-",
"status": "503",
"request_method": "POST",
"hostname": "master",
"request_body": "-",
"upstream_response_time": "-",
"http_vm": "-", # my custom http header
"remote_user": "-",
"http_referer": "-"
},
"fields": {
"#timestamp": [
1493518091000
]
},
"sort": [
1493518091000
]
}
I want to use Kibana to search all documents where http_vm (my custom HTTP header) has the value "-".
I tried the following search queries, but they fail; Kibana returns an empty result.
My search queries:
http_vm:"-"
http_vm:"\\-"
http_vm:\\-
http_vm:(\\-)
How can I search for the "-" value?
Thanks to @logan rakai, I found the way.
Which version of ES are you running? Most likely your http_vm field is being analyzed by the standard analyzer which removes punctuations. In ES 5 there is the keyword sub field which is not analyzed. In earlier versions you can change the index mapping to have the field not_analyzed. – logan rakai
SOLUTION:
this query worked:
http_vm.keyword:"-"
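For reference, on pre-5.x clusters the comment above amounts to mapping the field as not_analyzed when the index is created; a rough, untested sketch (index, type and field names taken from the example document):
PUT nginx-2017-04-30
{
  "mappings": {
    "access": {
      "properties": {
        "http_vm": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
On ES 5+ the dynamic mapping already adds the .keyword sub-field, which is why http_vm.keyword:"-" works without any mapping changes.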

logstash keep matched field in message

I have a log line like this:
2016-11-30 15:43:09.3060 DEBUG 20 Company.Product.LoggerDataFilter [UOW:583ee57782fe0140c6dfbfd8] [DP:0] Creating DeviceDataTransformationRequest for logger [D:4E3239200C5032593D004100].
and this grok pattern:
%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel}\s+ %{INT:threadId} %{DATA:loggerName} %{UOW} %{DATAPACKET} %{GREEDYDATA:message} %{DEVICEID}
The output of that is
{
"timestamp": [
"2016-11-30 15:43:09.3060"
],
"loglevel": [
"DEBUG"
],
"threadId": [
"20"
],
"loggerName": [
"Tts.IoT.DataLogger.Etl.Core.Filters.LoggerDataFilter"
],
"correlationId": [
"583ee57782fe0140c6dfbfd8"
],
"datapacket": [
"0"
],
"message": [
"Creating DeviceDataTransformationRequest for logger"
],
"deviceId": [
"4E3239200C5032593D004100"
]
}
Which is good, EXCEPT that the message field is now missing the DEVICEID value I extracted. I want it both ways: as a separate field and still kept in the message.
Can you do that?
(On a side note... how does structured logging like serilog help in this regard?)
How about changing
%{GREEDYDATA:message} %{DEVICEID}
to
%{GREEDYDATA:testmessage} %{DEVICEID}
then add a field
mutate {
  add_field => {
    "message" => "%{testmessage} %{DEVICEID}"
  }
  remove_field => ["testmessage"]
}
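One caveat: add_field interpolates event fields, not grok patterns, so %{DEVICEID} will only expand if a field literally named DEVICEID exists on the event. Since your output shows the value under deviceId, the mutate probably needs to reference that field instead (field names assumed from your output):
mutate {
  add_field => {
    # rebuild the message from the captured text plus the device id field
    "message" => "%{testmessage} %{deviceId}"
  }
  remove_field => ["testmessage"]
}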

Logstash Grok filter getting multiple values per match

I have a server that sends access logs over to Logstash in a custom log format, and I am using Logstash to filter these logs and send them to Elasticsearch.
A log line looks something like this:
0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n
And gets parsed using this grok filter:
grok {
match => [
"message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
]
add_field => {
"protocol" => "HTTP"
}
}
The final log gets parsed into this object (with real IPs stubbed out, and other fields taken out):
{
"_source": {
"message": " 0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n",
"tags": [
"bunyan"
],
"#version": "1",
"host": "0.0.0.0:0000",
"remote_host": [
"0.0.0.0",
"0.0.0.0"
],
"remote_user": [
"-",
"-"
],
"method": [
"GET",
"GET"
],
"requested_uri": [
"/",
"/"
],
"status_code": [
"200",
"200"
],
"content_length": [
"29771",
"29771"
],
"elapsed_time": [
"3",
3
],
"user_agent": [
"ELB-HealthChecker/1.0",
"ELB-HealthChecker/1.0"
],
"protocol": [
"HTTP",
"HTTP"
]
}
}
Any ideas why I am getting multiple matches per log? Shouldn't Grok be breaking on the first match that successfully parses?
Chances are you have multiple config files being loaded. If you look at the output, elapsed_time shows up as both an integer and a string; from the config you've provided, that's not possible, since you have :int on every pattern that captures elapsed_time.
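Logstash concatenates every file in its config directory (e.g. everything under /etc/logstash/conf.d) into one pipeline, so every filter runs against every event unless you guard it. A common way to keep a grok scoped to one input is a conditional, roughly like this (the "access" type value is only illustrative; it has to match whatever your input sets):
filter {
  if [type] == "access" {
    grok {
      match => [
        "message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
      ]
      add_field => { "protocol" => "HTTP" }
    }
  }
}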
