How to change “message” value in index - logstash

In the Logstash pipeline (or the index pattern), how can I change the following part of a CDN log in the "message" field, in order to separate or extract some data and then aggregate it?
<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {"method":"GET","scheme":"https","domain":"www.123.com","uri":"/product/10809350","ip":"66.249.65.174","ua":"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)","country":"US","asn":15169,"content_type":"text/html; charset=utf-8","status":200,"server_port":443,"bytes_sent":1892,"bytes_received":1371,"upstream_time":0.804,"cache":"MISS","request_id":"b017d78db4652036250148216b0a290c"}
expected change:
{"method":"GET","scheme":"https","domain":"www.123.com","uri":"/product/10809350","ip":"66.249.65.174","ua":"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)","country":"US","asn":15169,"content_type":"text/html; charset=utf-8","status":200,"server_port":443,"bytes_sent":1892,"bytes_received":1371,"upstream_time":0.804,"cache":"MISS","request_id":"b017d78db4652036250148216b0a290c"}
Because this part, "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]:", is not parsed as JSON, I can't create a visual dashboard based on fields such as country, asn, etc.
The original log, as indexed by Logstash, is:
{
"_index": "logstash-2022.01.17-000001",
"_type": "_doc",
"_id": "Qx8pZ34BhloLEkDviGxe",
"_version": 1,
"_score": 1,
"_source": {
"message": "<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {\"method\":\"GET\",\"scheme\":\"https\",\"domain\":\"www.123.com\",\"uri\":\"/product/10809350\",\"ip\":\"66.249.65.174\",\"ua\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\",\"country\":\"US\",\"asn\":15169,\"content_type\":\"text/html; charset=utf-8\",\"status\":200,\"server_port\":443,\"bytes_sent\":1892,\"bytes_received\":1371,\"upstream_time\":0.804,\"cache\":\"MISS\",\"request_id\":\"b017d78db4652036250148216b0a290c\"}",
"port": 39278,
"#timestamp": "2022-01-17T08:31:22.100Z",
"#version": "1",
"host": "93.115.150.121"
},
"fields": {
"#timestamp": [
"2022-01-17T08:31:22.100Z"
],
"port": [
39278
],
"#version": [
"1"
],
"host": [
"93.115.150.121"
],
"message": [
"<40> 2022-01-17T08:31:22Z logserver-5 testcdn[1]: {\"method\":\"GET\",\"scheme\":\"https\",\"domain\":\"www.123.com\",\"uri\":\"/product/10809350\",\"ip\":\"66.249.65.174\",\"ua\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\",\"country\":\"US\",\"asn\":15169,\"content_type\":\"text/html; charset=utf-8\",\"status\":200,\"server_port\":443,\"bytes_sent\":1892,\"bytes_received\":1371,\"upstream_time\":0.804,\"cache\":\"MISS\",\"request_id\":\"b017d78db4652036250148216b0a290c\"}"
],
"host.keyword": [
"93.115.150.121"
]
}
}
Thanks

Thank you, this is very useful. I got an idea from your suggestion for this specific scenario:
The following edited logstash.conf solves the problem:
input {
  tcp {
    port => 5000
    codec => json
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:Junk}: %{GREEDYDATA:request}" }
  }
  json { source => "request" }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    manage_template => false
    ecs_compatibility => disabled
    index => "logs-%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
But my main concern is about editing config files. I'd prefer to make any changes in the Kibana web UI rather than changing logstash.conf, because we use ELK for different scenarios in the organization, and changes like this tie an ELK server to one special purpose instead of keeping it multi-purpose.
How can I get this result without changing the Logstash config files?

Add these configurations to the filter section of your Logstash config:
# To parse the message field
grok {
  match => { "message" => "<%{NONNEGINT:syslog_pri}>\s+%{TIMESTAMP_ISO8601:syslog_timestamp}\s+%{DATA:sys_host}\s+%{NOTSPACE:sys_module}\s+%{GREEDYDATA:syslog_message}" }
}
# To replace the message field with syslog_message
mutate {
  replace => [ "message", "%{syslog_message}" ]
}
Once the message field is replaced by syslog_message, you can add the json filter below to parse the JSON into separate fields as well.
json {
  source => "syslog_message"
}
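Putting it together, a minimal sketch of the complete filter section assembled from the snippets above (the field names syslog_pri, syslog_timestamp, sys_host, sys_module and syslog_message are the ones introduced there):
filter {
  # Split the syslog-style prefix from the JSON payload
  grok {
    match => { "message" => "<%{NONNEGINT:syslog_pri}>\s+%{TIMESTAMP_ISO8601:syslog_timestamp}\s+%{DATA:sys_host}\s+%{NOTSPACE:sys_module}\s+%{GREEDYDATA:syslog_message}" }
  }
  # Replace the original message with just the JSON part
  mutate {
    replace => [ "message", "%{syslog_message}" ]
  }
  # Parse the JSON payload so country, asn, status, etc. become top-level fields
  json {
    source => "syslog_message"
  }
}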

Related

How to pass hardcoded data in the Logstash generator input section

How can I pass hardcoded data through the generator input? When I pass this input through the input section, Logstash executes successfully but doesn't produce any filtered output in the console. Is there any way to pass JSON in the generator input section?
Note: I have taken this input data from some log file...
input {
generator {
lines => [\"clientContextId\":\"INTERNAL-b8d19563-94a1-442d-9a09-dde36743fb7d\",\"description\":\"A N1QL EXPLAIN statement was executed\",\"id\":28673,\"isAdHoc\":true,\"metrics\":{\"elapsedTime\":\"11.921ms\",\"executionTime\":\"11.764ms\",\"resultCount\":1,\"resultSize\":649},\"name\":\"EXPLAIN statement\",\"node\":\"127.0.0.1:8091\",\"real_userid\":{\"domain\":\"builtin\",\"user\":\"Administrator\"},\"remote\":{\"ip\":\"127.0.0.1\",\"port\":44695},\"requestId\":\"958a7e12-d5a6-4d7b-bd40-ac9bb60cf4a3\",\"statement\":\"explain INSERT INTO `Guardium` (KEY, VALUE) \\nVALUES ( \\\"id::5554\\n\\\", { \\\"Emp Name\\\": \\\"Test4\\\", \\\"Emp Company\\\" : \\\"GS Lab\\\", \\\"Emp Country\\\" : \\\"India\\\"} )\\nRETURNING *;\",\"status\":\"errors\",\"timestamp\":\"2021-01-07T09:37:00.486Z\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 (Couchbase Query Workbench (6.6.1-9213-enterprise))\"]
}
}
filter{
# filter logic
}
output{
stdout { codec => rubydebug }
}
Yes, you can have JSON in a generator input. For example
input { generator { count => 1 lines => [ '{ "id": "43fb7d", "description": "..." }' ] } }
filter { json { source => "message" } }
will result in
"description" => "...",
"id" => "43fb7d",

Grok Filter not adding new fields | Logstash

We have the grok filter below configured for our journalbeat. The same filter deployed locally with filebeat was working fine, but it isn't adding the new fields with journalbeat.
filter {
grok {
patterns_dir => ["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns"]
match => { "message" => [
'%{IPV4:client_ip} - - \[%{HTTPDATE:date}\] "%{WORD:method} %{URIPATH:request} %{URIPROTO:protocol}\/[1-9].[0-9]" (%{NUMBER:status}|-) (%{NUMBER:bytes}|-) "(%{URI:url}|-)" %{QUOTEDSTRING:client}'
]
break_on_match => false
tag_on_failure => ["failed_match"]
}
}
}
We tried adding the mutate filter below to add the new fields, but it isn't fetching the values and instead prints the literal placeholders (example: %{client_ip}).
mutate {
add_field => {
"client_ip" => "%{client_ip}"
"date" => "%{date}"
"method" => "%{method}"
"status" => "%{status}"
"request" => "%{request}"
}
}
The log line we are trying to match is below.
::ffff:172.65.205.3 - - [09/Jul/2020:11:32:52 +0000] "POST /v1-get-profile HTTP/1.1" 404 71 "https://mycompany.com/customer/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
Could someone let me know what exactly we are doing wrong? Thanks in advance.

Azure Identity Protection - Risk Detection API - Filter by date

I am trying to filter the RiskDetection data retrieved from Azure Identity Protection by date, so far with no success.
For the sample data below, filtering by activityDateTime (or any of the date fields in the sample data) shows an internal error in the response:
https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge 2020-02-05
{'error': {'code': 'Internal Server Error',
 'message': 'There was an internal server error while processing the request. Error ID: 0c2de841-9d83-479a-b7f2-ed2c102908f6',
 'innerError': {'request-id': '0c2de841-9d83-479a-b7f2-ed2c102908f6',
  'date': '2020-02-07T01:28:17'}}}
From https://learn.microsoft.com/en-us/graph/query-parameters
Note: The following $filter operators are not supported for Azure AD
resources: ne, gt, ge, lt, le, and not. The contains string operator
is currently not supported on any Microsoft Graph resources.
Is there a way to filter RiskDetections by date? I would appreciate any help.
Below filter with riskType and riskLevel shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=riskType eq 'anonymizedIPAddress' or riskLevel eq 'medium'"
Below filter with userPrincipalName shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=userPrincipalName eq 'john.doe#example.com'"
Below filter with ipAddress shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=ipAddress eq '195.228.45.176'"
Sample data
{
"id": "8901d1fee9bqwqweqwe683a221af3d2ae691736f2e369e0dd530625398",
"requestId": "cc755f41-0313-4cb2-96ce-3a6283fef200",
"correlationId": "c422083d-0e32-4afb-af4e-6ca46e4235b4",
"riskType": "anonymizedIPAddress",
"riskState": "atRisk",
"riskLevel": "medium",
"riskDetail": "none",
"source": "IdentityProtection",
"detectionTimingType": "realtime",
"activity": "signin",
"tokenIssuerType": "AzureAD",
"ipAddress": "195.228.45.176",
"activityDateTime": "2019-12-26T17:40:02.1402381Z",
"detectedDateTime": "2019-12-26T17:40:02.1402381Z",
"lastUpdatedDateTime": "2019-12-26T17:43:21.8931807Z",
"userId": "e3835755-80b0-4b61-a1c0-5ea9ead75300",
"userDisplayName": "John Doe",
"userPrincipalName": "john.doe#example.com",
"additionalInfo": "[{\"Key\":\"userAgent\",\"Value\":\"Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0\"}]",
"location": {
"city": "Budapest",
"state": "Budapest",
"countryOrRegion": "HU",
"geoCoordinates": {
"latitude": 47.45996,
"longitude": 19.14968
}
}
}
Based on Properties, activityDateTime is of type datetimeoffset.
So you should use GET https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime gt 2019-12-25 rather than GET https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime gt '2019-12-25' (i.e. without quotes around the date value).
A similar API document: List directoryAudits.
But when I test it, it gives a 500 error:
{
"error": {
"code": "Internal Server Error",
"message": "There was an internal server error while processing the request. Error ID: d52436f6-073b-4fc8-b3bc-c6a6336d6886",
"innerError": {
"request-id": "d52436f6-073b-4fc8-b3bc-c6a6336d6886",
"date": "2020-02-05T04:10:45"
}
}
}
I believe the beta version of this API is still subject to change. You could contact Microsoft support with your request-id for further investigation.
You will need to provide the date in the UTC format.
Example:
https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge 2020-01-01T22:13:50.843847Z
In Python you would do something like the following to create the URL with the filter:
from datetime import datetime
date_filter = datetime.utcnow().isoformat()+"Z"
request_url = "https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge " + date_filter
The response is now filtered:
[
{
"id": "68f0402c7063a2fbbae5895f2c63598ca3c2b81c44be60145be1a9cd7e20af4b",
"requestId": "181d3817-b4fb-4d2b-a87c-065776f05800",
"correlationId": "6d02786c-0bc7-441f-b303-51430016f955",
"riskType": "unfamiliarFeatures",
"riskState": "atRisk",
"riskLevel": "low",
"riskDetail": "none",
"source": "IdentityProtection",
"detectionTimingType": "realtime",
"activity": "signin",
"tokenIssuerType": "AzureAD",
"ipAddress": "52.185.138.50",
"activityDateTime": "2020-02-07T05:48:07.6322964Z",
"detectedDateTime": "2020-02-07T05:48:07.6322964Z",
"lastUpdatedDateTime": "2020-02-07T05:49:33.3003616Z",
"userId": "e3835755-80b0-4b61-a1c0-5ea9ead75300",
"userDisplayName": "John Doe",
"userPrincipalName": "john.doe#example.com",
"additionalInfo": "[{\"Key\":\"userAgent\",\"Value\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36\"}]",
"location": {
"city": "tokyo",
"state": "tokyo",
"countryOrRegion": "jp",
"geoCoordinates": {
"latitude": 35.69628,
"longitude": 139.7386
}
}
}
]

How to use the Kibana search field with the special character "-"

My ES index documents are nginx log data like:
{
"_index": "nginx-2017-04-30",
"_type": "access",
"_id": "AVu8nYNM_NKHiROBoHkE",
"_score": null,
"_source": {
"cookie_logintoken": "-",
"request_time": "0.000",
"request": "POST /login/getMobileLoginCode HTTP/1.1",
"http_protocol": "https",
"request_id": "a6fb53fcf28b7d6b400f0611ac697f0d",
"#timestamp": "2017-04-30T10:08:11+08:00",
"http_user_agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"http_x_forwarded_for": "-",
"request_uri": "/login/getMobileLoginCode",
"remote_addr": "xxxxxxx",
"http_ver": "-",
"status": "503",
"request_method": "POST",
"hostname": "master",
"request_body": "-",
"upstream_response_time": "-",
"http_vm": "-", # my custom http header
"remote_user": "-",
"http_referer": "-"
},
"fields": {
"#timestamp": [
1493518091000
]
},
"sort": [
1493518091000
]
}
I want to use Kibana to search for all documents where http_vm (my custom HTTP header) has the value "-".
I tried the following search queries, but Kibana returns empty results:
http_vm:"-"
http_vm:"\\-"
http_vm:\\-
http_vm:(\\-)
How can I search for the "-" value?
Thanks to @logan rakai, I found the way.
Which version of ES are you running? Most likely your http_vm field is being analyzed by the standard analyzer which removes punctuations. In ES 5 there is the keyword sub field which is not analyzed. In earlier versions you can change the index mapping to have the field not_analyzed. – logan rakai
SOLUTION:
this query worked:
http_vm.keyword:"-"

Logstash parser error, timestamp is malformed

Can somebody tell me what I'm doing wrong, or why Logstash doesn't want to parse an ISO8601 timestamp?
The error message I get is
Failed action ... "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [timestamp]",
"caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"Invalid format: \"2017-03-24 12:14:50\" is malformed at \"17-03-24 12:14:50\""}}
Sample log file line (last byte in IP address replaced with 000 on purpose)
2017-03-24 12:14:50 87.123.123.000 12345678.domain.com GET /smil:stream_17.smil/chunk_ctvideo_ridp0va0r600115_cs211711500_mpd.m4s - HTTP/1.1 200 750584 0.714 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" https://referrer.domain.com/video/2107 https fra1 "HIT, MISS" 12345678.domain.com
GROK pattern (use http://grokconstructor.appspot.com/do/match to verify)
RAW %{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IPV4:clientip}%{SPACE}%{HOSTNAME:http_host}%{SPACE}%{WORD:verb}%{SPACE}\/(.*:)?%{WORD:stream}%{NOTSPACE}%{SPACE}%{NOTSPACE}%{SPACE}%{WORD:protocol}\/%{NUMBER:httpversion}%{SPACE}%{NUMBER:response}%{SPACE}%{NUMBER:bytes}%{SPACE}%{SECOND:request_time}%{SPACE}%{QUOTEDSTRING:agent}%{SPACE}%{URI:referrer}%{SPACE}%{WORD}%{SPACE}%{WORD:location}%{SPACE}%{QUOTEDSTRING:cache_status}%{SPACE}%{WORD:account}%{GREEDYDATA}
Logstash configuration (input side):
input {
  file {
    path => "/subfolder/logs/*"
    type => "access_logs"
    start_position => "beginning"
  }
}
filter {
  # skip first two lines in log file with comments
  if [message] =~ /^#/ {
    drop { }
  }
  grok {
    patterns_dir => ["/opt/logstash/patterns"]
    match => { "message" => "%{RAW}" }
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
    locale => "en"
  }
  # ... (rest of the config omitted for readability)
}
So I am pretty sure this is caused by the timestamp field being mapped to a type in Elasticsearch that this value doesn't parse to. If you post your index mapping, I'd be happy to look at it.
A note: you can quickly solve this by adding remove_field, because if the date filter is successful, its value is pulled into @timestamp. Right now you have the same value stored in two fields. With the field removed, you don't have to worry about its mapping at all. :)
date {
  match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
  locale => "en"
  remove_field => [ "timestamp" ]
}
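Alternatively, if you do want to keep the timestamp field, the underlying fix is on the mapping side: the field has to be mapped as a date whose format accepts this value. A hedged sketch of such a mapping fragment (to go into your index template; the field name is the one from this example):
{
  "properties": {
    "timestamp": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss"
    }
  }
}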
