How to use the Kibana search field with the special character "-"

My ES index documents are nginx log data, like:
{
"_index": "nginx-2017-04-30",
"_type": "access",
"_id": "AVu8nYNM_NKHiROBoHkE",
"_score": null,
"_source": {
"cookie_logintoken": "-",
"request_time": "0.000",
"request": "POST /login/getMobileLoginCode HTTP/1.1",
"http_protocol": "https",
"request_id": "a6fb53fcf28b7d6b400f0611ac697f0d",
"#timestamp": "2017-04-30T10:08:11+08:00",
"http_user_agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"http_x_forwarded_for": "-",
"request_uri": "/login/getMobileLoginCode",
"remote_addr": "xxxxxxx",
"http_ver": "-",
"status": "503",
"request_method": "POST",
"hostname": "master",
"request_body": "-",
"upstream_response_time": "-",
"http_vm": "-", # my custom http header
"remote_user": "-",
"http_referer": "-"
},
"fields": {
"#timestamp": [
1493518091000
]
},
"sort": [
1493518091000
]
}
I want to use Kibana to search for all documents where http_vm (my custom HTTP header) has the value "-".
I tried the following search queries, but Kibana returns an empty result:
http_vm:"-"
http_vm:"\\-"
http_vm:\\-
http_vm:(\\-)
How can I search for the "-" value?
Thanks to @logan rakai, I found the way.
Which version of ES are you running? Most likely your http_vm field is being analyzed by the standard analyzer, which removes punctuation. In ES 5 there is the keyword sub-field, which is not analyzed. In earlier versions you can change the index mapping to have the field not_analyzed. – logan rakai
SOLUTION:
this query worked:
http_vm.keyword:"-"

Related

Dropdown does not display values in filter field when passing parameters in embedded dash of Metabase

When building a dashboard with some drop-down filter fields (all duly configured on the admin page as category filters set to display the list of values), they work correctly when used without passing parameters in advance, as shown in the image below.
[Image: Dropdown filter working correctly]
However, when embedding the dashboard and passing parameters via Node, the fields are blank. The dashboard is built from 3 tables. One behavior I noticed is that when passing the value of a filter belonging to one of the tables, only the filters of that table display the drop-down menu; the others are blank. In the example below, the month parameter was passed when setting up the dashboard in Node, yet the month field, which belongs to the same table, does not display the drop-down menu. You can see in the call that an empty array is returned.
[Image: When passing a parameter via Node, the dropdown filter appears blank]
The values that were passed to the filters via Node:
[Image: Parameters informed in the call]
The settings made for the filter in the dashboard (month is used as an example, but this occurs with the other fields as well) are shown in the figure below.
[Image: Filter settings on the dashboard]
Settings made for the filters in the questions:
[Image: Filter settings in the question]
Metabase version used: 0.41.2.
Diagnostic information: { "browser-info": { "language": "pt-BR", "platform": "Win32", "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36", "vendor": "Google Inc." }, "system-info": { "file.encoding": "UTF-8", "java.runtime.name": "OpenJDK Runtime Environment", "java.runtime.version": "1.8.0_312-8u312-b07-0ubuntu1~20.04-b07", "java.vendor": "Private Build", "java.vendor.url": "http://java.oracle.com/", "java.version": "1.8.0_312", "java.vm.name": "OpenJDK 64-Bit Server VM", "java.vm.version": "25.312-b07", "os.name": "Linux", "os.version": "5.13.0-1025-gcp", "user.language": "en", "user.timezone": "America/Sao_Paulo" }, "metabase-info": { "databases": [ "postgres", "h2" ], "hosting-env": "unknown", "application-database": "postgres", "application-database-details": { "database": { "name": "PostgreSQL", "version": "12.12 (Ubuntu 12.12-0ubuntu0.20.04.1)" }, "jdbc-driver": { "name": "PostgreSQL JDBC Driver", "version": "42.2.23" } }, "run-mode": "prod", "version": { "tag": "v0.41.2", "date": "2021-11-09", "branch": "release-x.41.x", "hash": "ad599fd" }, "settings": { "report-timezone": null } } }
I tried changing the filter type and the search method, and reviewed the structure of the table hoping the error might be something along those lines, but nothing helped. The dropdown filter is blank when passing filters via Node.

Indexing e-mails in Azure Search

I'm trying to work out how best to index the contents of e-mail messages, subjects, and email addresses. E-mails can contain both text and HTML representations. They can be in any language, so unfortunately I can't use language-specific analyzers.
As I am new to this I have many questions:
First I used the standard Lucene analyzer, but after some testing and checking what each analyzer does, I switched to the "simple" analyzer. The standard one didn't allow me to search by the domain in user@domain.com (it sees user and domain.com as tokens). Is "simple" the best I can use in my case?
How can I handle the HTML contents of e-mails? I thought it should be possible to do this in Azure Search, but right now I think I would need to strip the HTML tags myself.
My users aren't tech savvy and I assumed the "simple" query type would be enough for them. I expect them to type a word or two and find messages containing that word, or containing words starting with that word. From my tests it looks like I need to append * to their queries to get "starting with" to work?
It would help if you included an example of your data and how you index and query. What happened, and what did you expect?
The standard Lucene analyzer will work with your user@domain.com example. It is correct that it produces the tokens user and domain.com. But the same happens when you query, and you will get records with the tokens user and domain.com.
CREATE INDEX
"fields": [
{"name": "Id", "type": "Edm.String", "searchable": false, "filterable": true, "retrievable": true, "sortable": true, "facetable": false, "key": true, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "synonymMaps": [] },
{"name": "Email", "type": "Edm.String", "filterable": true, "sortable": true, "facetable": false, "searchable": true, "analyzer": "standard"}
]
UPLOAD
{
"value": [
{
"#search.action": "mergeOrUpload",
"Id": "1",
"Email": "user#domain.com"
},
{
"#search.action": "mergeOrUpload",
"Id": "2",
"Email": "some.user#some-domain.com"
},
{
"#search.action": "mergeOrUpload",
"Id": "3",
"Email": "another#another.com"
}
]
}
QUERY
Query, using queryType=full and searchMode=all:
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/docs?search=user@domain.com&$count=true&$select=Id,Email&searchMode=all&queryType=full&api-version={{API-VERSION}}
This produces the expected results (all records containing the tokens user and domain.com):
{
"#odata.context": "https://<your-search-env>.search.windows.net/indexes('dg-test-65392234')/$metadata#docs(*)",
"#odata.count": 2,
"value": [
{
"#search.score": 0.51623213,
"Id": "1",
"Email": "user#domain.com"
},
{
"#search.score": 0.25316024,
"Id": "2",
"Email": "some.user#some-domain.com"
}
]
}
If your expected result is to get only the record above where the email matches completely, you could instead use a phrase search, i.e. replace the search parameter above with search="user@domain.com" and you would get:
{
"#search.score": 0.51623213,
"Id": "1",
"Email": "user#domain.com"
}
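For reference, the same phrase query could be issued from Python; the service URL, index name, API key, and API version below are placeholders (my own assumptions), not values from this answer:

import requests

# Quoting the search term turns it into a phrase query, so only
# documents whose tokens appear in this exact sequence will match.
url = ("https://<search-service>.search.windows.net"
       "/indexes/<index-name>/docs")
params = {
    "search": '"user@domain.com"',
    "searchMode": "all",
    "queryType": "full",
    "$select": "Id,Email",
    "api-version": "2020-06-30",
}
resp = requests.get(url, params=params, headers={"api-key": "<query-key>"})
print(resp.json()["value"])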
Alternatively, you could use the keyword analyzer.
ANALYZE
You can compare the different analyzers directly via REST. Using the keyword analyzer on the Email property will produce a single token.
{
"text": "some-user#some-domain.com",
"analyzer": "keyword"
}
Results in the following tokens:
"tokens": [
{
"token": "some-user#some-domain.com",
"startOffset": 0,
"endOffset": 25,
"position": 0
}
]
Compare that to the standard analyzer, which does a decent job for most types of unstructured content:
{
"text": "some-user#some-domain.com",
"analyzer": "standard"
}
This produces reasonable results for cases where the email address is part of some generic text:
"tokens": [
{
"token": "some",
"startOffset": 0,
"endOffset": 4,
"position": 0
},
{
"token": "user",
"startOffset": 5,
"endOffset": 9,
"position": 1
},
{
"token": "some",
"startOffset": 10,
"endOffset": 14,
"position": 2
},
{
"token": "domain.com",
"startOffset": 15,
"endOffset": 25,
"position": 3
}
]
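These analyze calls can be reproduced with a few lines of Python against the index's Analyze Text endpoint; the service name, index name, admin key, and API version below are placeholders (assumptions), not values from this answer:

import requests

# POST to the Analyze Text endpoint of the index to see how a given
# analyzer tokenizes a value. All identifiers below are placeholders.
url = ("https://<search-service>.search.windows.net"
       "/indexes/<index-name>/analyze?api-version=2020-06-30")
body = {"text": "some-user@some-domain.com", "analyzer": "keyword"}
resp = requests.post(url, json=body, headers={"api-key": "<admin-key>"})
print(resp.json()["tokens"])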
SUMMARY
This is a long answer already, so I won't cover your other two questions in detail. I would suggest splitting them into separate questions so they can benefit others.
HTML content: You can use a built-in HTML analyzer that strips HTML tags. Or you can strip the HTML yourself using custom code. I typically use Beautiful Soup for cases like these or simple regular expressions for simpler cases.
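As a small illustration of the custom-code route (my own sketch; the built-in HTML analyzer may be simpler for you), stripping tags with Beautiful Soup before uploading a document could look like this:

from bs4 import BeautifulSoup

def html_to_text(html_body: str) -> str:
    # Parse the HTML body of the e-mail and keep only the visible text,
    # so the index stores searchable words rather than markup.
    soup = BeautifulSoup(html_body, "html.parser")
    return soup.get_text(separator=" ", strip=True)

print(html_to_text("<p>Hello <b>world</b></p>"))  # Hello world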
Wildcard search: Usually, users don't expect automatic wildcards to be appended. The only application that does this is the Outlook client, and it destroys precision. When I search for "Jan" (a common name), I annoyingly get all emails sent in January(!). And when I search for Dan (again, a name), I also get all emails from Danmark (Denmark).
Everything in search is a trade-off between precision and recall. In your first example with the email address, your expectation was heavily geared toward precision. But, in your last wildcard question, you seem to prefer extreme recall with wildcards on everything. It all comes down to your expectations.
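If you do decide to offer prefix matching anyway, a minimal illustration (my own sketch, not part of the original answer) is to append a trailing * to each term before submitting a full Lucene query:

def to_prefix_query(user_input: str) -> str:
    # Append a wildcard to every term so "budget repo" becomes
    # "budget* repo*" when used with queryType=full (Lucene syntax).
    # This deliberately trades precision for recall, as discussed above.
    terms = user_input.split()
    return " ".join(term + "*" for term in terms)

print(to_prefix_query("budget repo"))  # budget* repo*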

Transforms to ArcGIS API geojson output and geopandas series/dataframe

The following GET yields a result you can view via Postman and/or geojson.io (NOTE: you need cookies, which are pasted at the bottom of this post):
url_geojson = "https://services6.arcgis.com/GklOjOaok2jR6aKf/ArcGIS/rest/services/NM_OG_ROWs_Linear_031417/FeatureServer/0/query?f=geojson&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=FID%20ASC&outSR=102100&resultOffset=0&resultRecordCount=4000&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D"
The following GET call to the same service yields a plain json response with additional info regarding the transformations that need to be applied to the geometry objects generated from the geojson call:
url_json = "https://services6.arcgis.com/GklOjOaok2jR6aKf/ArcGIS/rest/services/NM_OG_ROWs_Linear_031417/FeatureServer/0/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&orderByFields=FID%20ASC&outSR=102100&resultOffset=0&resultRecordCount=4000&cacheHint=true&quantizationParameters=%7B%22mode%22%3A%22edit%22%7D"
Note that the only difference is the f parameter (json vs. geojson).
The output of the json request includes the following section:
{
"objectIdFieldName": "FID",
"uniqueIdField": {
"name": "FID",
"isSystemMaintained": true
},
"globalIdFieldName": "",
"geometryProperties": {
"shapeLengthFieldName": "Shape__Length",
"units": "esriMeters"
},
"geometryType": "esriGeometryPolyline",
"spatialReference": {
"wkid": 102100,
"latestWkid": 3857
},
"transform": {
"originPosition": "upperLeft",
"scale": [
0.0001,
0.0001,
0,
0
],
"translate": [
-20037700,
-30241100,
0,
0
]
},...
I assume these are the parameters that I need to use to change the output coordinates of the geojson request, which look like this (a single example):
{
"type": "FeatureCollection",
"crs": {
"type": "name",
"properties": {
"name": "EPSG:3857"
}
},
"properties": {
"exceededTransferLimit": true
},
"features": [
{
"type": "Feature",
"id": 1,
"geometry": {
"type": "LineString",
"coordinates": [
[
-11533842.1198518,
3857288.84408179
],
[
-11534147.0371623,
3857067.64072161
]
]
},...
I have managed to load the geojson output into a GeoPandas GeoDataFrame and apply the scaling transform with the following command:
main_gdf_clean['geometry'] = GeoSeries.scale(main_gdf_clean['geometry'],
xfact=0.00001, yfact=0.00001,
origin=(0,0))
At this point, I'm lost on how to apply the translation parameters listed above in the json output. I've tried the following command, and it yields wildly incorrect results for the geometry objects:
GeoSeries.translate(main_gdf_clean['geometry'], xoff=-20037700.0, yoff=-30241100.0)
Based on what I've presented here, can someone suggest a way to apply the translation transform properly or go about this process in a different way?
header = {'Cookie': 'ASP.NET_SessionId=umzghudgzvz22wpo3a0bgeoq; OCDUserPreference=AAEAAAD/////AQAAAAAAAAAMAgAAAEtFTU5SRC5PQ0QuUGVybWl0dGluZywgVmVyc2lvbj0xLjAuMC4wLCBDdWx0dXJlPW5ldXRyYWwsIFB1YmxpY0tleVRva2VuPW51bGwFAQAAACVOTUVNTlJELk9DRC5QZXJtaXR0aW5nLlVzZXJQcmVmZXJlbmNlCQAAAAhQYWdlU2l6ZRJXZWxsU29ydFBhcmFtZXRlcnMWRmFjaWxpdHlTb3J0UGFyYW1ldGVycxxGZWVBcHBsaWNhdGlvblNvcnRQYXJhbWV0ZXJzFkluY2lkZW50U29ydFBhcmFtZXRlcnMRUGl0U29ydFBhcmFtZXRlcnMSVGFua1NvcnRQYXJhbWV0ZXJzGkdlbmVyYXRlZEFwaVNvcnRQYXJhbWV0ZXJzFk9wZXJhdG9yU29ydFBhcmFtZXRlcnMABAQEBAQEBAQIKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAKE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlckxpc3QCAAAAAgAAAGQAAAAJAwAAAAkEAAAACQUAAAAJBgAAAAkHAAAACQgAAAAJCQAAAAkKAAAABQMAAAAoTk1FTU5SRC5PQ0QuUGVybWl0dGluZy5Tb3J0UGFyYW1ldGVyTGlzdAMAAAANTGlzdGAxK19pdGVtcwxMaXN0YDErX3NpemUPTGlzdGAxK192ZXJzaW9uBAAAJk5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlcltdAgAAAAgIAgAAAAkLAAAAAgAAAAIAAAABBAAAAAMAAAAJDAAAAAIAAAACAAAAAQUAAAADAAAACQ0AAAABAAAAAQAAAAEGAAAAAwAAAAkOAAAAAgAAAAIAAAABBwAAAAMAAAAJDwAAAAIAAAACAAAAAQgAAAADAAAACRAAAAACAAAAAgAAAAEJAAAAAwAAAAkRAAAAAwAAAAMAAAABCgAAAAMAAAAJEgAAAAEAAAABAAAABwsAAAAAAQAAAAQAAAAEJE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlcgIAAAAJEwAAAAkUAAAADQIHDAAAAAABAAAABAAAAAQkTk1FTU5SRC5PQ0QuUGVybWl0dGluZy5Tb3J0UGFyYW1ldGVyAgAAAAkVAAAACRYAAAANAgcNAAAAAAEAAAAEAAAABCROTUVNTlJELk9DRC5QZXJtaXR0aW5nLlNvcnRQYXJhbWV0ZXICAAAACRcAAAANAwcOAAAAAAEAAAAEAAAABCROTUVNTlJELk9DRC5QZXJtaXR0aW5nLlNvcnRQYXJhbWV0ZXICAAAACRgAAAAJGQAAAA0CBw8AAAAAAQAAAAQAAAAEJE5NRU1OUkQuT0NELlBlcm1pdHRpbmcuU29ydFBhcmFtZXRlcgIAAAAJGgAAAAkbAAAADQIHEAAAAAABAAAABAAAAAQkTk1FTU5SRC5PQ0QuUGVybWl0dGluZy5Tb3J0UGFyYW1ldGVyAgAAAAkcAAAACR0AAAANAgcRAAAAAAEAAAAEAAAABCROTUVNTlJELk9DRC5QZXJtaXR0aW5nLlNvcnRQYXJhbWV0ZXICAAAACR4AAAAJHwAAAAkgAAAACgcSAAAAAAEAAAAEAAAABCROTUVNTlJELk9DRC5QZXJtaXR0aW5nLlNvcnRQYXJhbWV0ZXICAAAACSEAAAANAwwiAAAATVN5c3RlbS5XZWIsIFZlcnNpb249NC4wLjAuMCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj1iMDNmNWY3ZjExZDUwYTNhBRMAAAAkTk1FTU5SRC5PQ0QuUGVybWl0dGluZy5Tb3J0UGFyYW1ldGVyAwAAAAVfbmFtZQtfZXhwcmVzc2lvbgpfZGlyZWN0aW9uAQEEJ1N5c3RlbS5XZWIuVUkuV2ViQ29udHJvbHMuU29ydERpcmVjdGlvbiIAAAACAAAABiMAAAANV2VsbCBPcGVyYXRvcgYkAAAACm9ncmlkX25hbWUF2////ydTeXN0ZW0uV2ViLlVJLldlYkNvbnRyb2xzLlNvcnREaXJlY3Rpb24BAAAAB3ZhbHVlX18ACCIAAAAAAAAAARQAAAATAAAABiYAAAAJV2VsbCBOYW1lBicAAAAId2VsbG5hbWUB2P///9v///8AAAAAARUAAAATAAAABikAAAAFT2dyaWQGKgAAAAVvZ3JpZAHV////2////wAAAAABFgAAABMAAAAGLAAAAAtGYWNpbGl0eSBJZAYtAAAAAmlkAdL////b////AQAAAAEXAAAAEwAAAAYvAAAACkNyZWF0ZWQgT24GMAAAAAljcmVhdGVkT24Bz////9v///8AAAAAARgAAAATAAAACSkAAAAJKgAAAAHM////2////wAAAAABGQAAABMAAAAGNQAAAAtJbmNpZGVudCBJZAktAAAAAcn////b////AQAAAAEaAAAAEwAAAAkpAAAACSoAAAABxv///9v///8AAAAAARsAAAATAAAABjsAAAAGUGl0IElkBjwAAAAGcGl0X2lkAcP////b////AQAAAAEcAAAAEwAAAAkpAAAACSoAAAABwP///9v///8AAAAAAR0AAAATAAAABkEAAAAHVGFuayBJZAZCAAAAB3RhbmtfaWQBvf///9v///8BAAAAAR4AAAATAAAACSMAAAAJJAAAAAG6////2////wAAAAABHwAAABMAAAAJJgAAAAZIAAAADXByb3BlcnR5X25hbWUBt////9v///8AAAAAASAAAAATAAAABkoAAAALV2VsbCBOdW1iZXIGSwAAAAt3ZWxsX251bWJlcgG0////2////wAAAAABIQAAABMAAAAGTQAAAA1PcGVyYXRvciBOYW1lCSQAAAABsf///9v///8AAAAACw==',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
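No accepted answer is recorded here, but one possible direction for the translation question: GeoPandas can apply the scale and the offset in a single affine transform instead of chaining scale() and translate(). The sketch below is my own illustration and assumes a dequantization of the form real = raw * scale + translate; the exact sign convention for the upperLeft origin should be verified against the ArcGIS quantization documentation before relying on it.

import geopandas as gpd  # main_gdf_clean is the GeoDataFrame from the question

# Assumed mapping: x' = x*sx + tx, y' = y*sy + ty. Verify against the
# ArcGIS quantization spec; this is a sketch, not a confirmed formula.
sx, sy = 0.0001, 0.0001
tx, ty = -20037700, -30241100

# affine_transform takes [a, b, d, e, xoff, yoff] where
# x' = a*x + b*y + xoff and y' = d*x + e*y + yoff.
main_gdf_clean["geometry"] = main_gdf_clean["geometry"].affine_transform(
    [sx, 0, 0, sy, tx, ty]
)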

Azure Identity Protection - Risk Detection API - Filter by date

I am trying to filter the RiskDetection data retrieved from Azure Identity Protection by date, so far without success.
For the sample data below, filtering by activityDateTime (or any of the date fields in the sample data) shows an internal error in the response:
https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge 2020-02-05
{'error': {'code': 'Internal Server Error',
'message': 'There was an internal server error while processing the request. Error ID: 0c2de841-9d83-479a-b7f2-ed2c102908f6',
'innerError':
{'request-id': '0c2de841-9d83-479a-b7f2-ed2c102908f6',
'date': '2020-02-07T01:28:17'}}}
From https://learn.microsoft.com/en-us/graph/query-parameters
Note: The following $filter operators are not supported for Azure AD
resources: ne, gt, ge, lt, le, and not. The contains string operator
is currently not supported on any Microsoft Graph resources.
Is there a way to filter by date for RiskDetections? Will appreciate any help.
Below filter with riskType and riskLevel shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=riskType eq 'anonymizedIPAddress' or riskLevel eq 'medium'"
Below filter with userPrincipalName shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=userPrincipalName eq 'john.doe@example.com'"
Below filter with ipAddress shows data:
risk_detections_api_url = "https://graph.microsoft.com/beta/riskDetections?$filter=ipAddress eq '195.228.45.176'"
Sample data
{
"id": "8901d1fee9bqwqweqwe683a221af3d2ae691736f2e369e0dd530625398",
"requestId": "cc755f41-0313-4cb2-96ce-3a6283fef200",
"correlationId": "c422083d-0e32-4afb-af4e-6ca46e4235b4",
"riskType": "anonymizedIPAddress",
"riskState": "atRisk",
"riskLevel": "medium",
"riskDetail": "none",
"source": "IdentityProtection",
"detectionTimingType": "realtime",
"activity": "signin",
"tokenIssuerType": "AzureAD",
"ipAddress": "195.228.45.176",
"activityDateTime": "2019-12-26T17:40:02.1402381Z",
"detectedDateTime": "2019-12-26T17:40:02.1402381Z",
"lastUpdatedDateTime": "2019-12-26T17:43:21.8931807Z",
"userId": "e3835755-80b0-4b61-a1c0-5ea9ead75300",
"userDisplayName": "John Doe",
"userPrincipalName": "john.doe#example.com",
"additionalInfo": "[{\"Key\":\"userAgent\",\"Value\":\"Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0\"}]",
"location": {
"city": "Budapest",
"state": "Budapest",
"countryOrRegion": "HU",
"geoCoordinates": {
"latitude": 47.45996,
"longitude": 19.14968
}
}
}
Based on the Properties documentation, activityDateTime is of the datetimeoffset type.
So you should use GET https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime gt 2019-12-25 rather than GET https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime gt '2019-12-25'.
A similar API document here: List directoryAudits.
But when I test it, it gives a 500 error:
{
"error": {
"code": "Internal Server Error",
"message": "There was an internal server error while processing the request. Error ID: d52436f6-073b-4fc8-b3bc-c6a6336d6886",
"innerError": {
"request-id": "d52436f6-073b-4fc8-b3bc-c6a6336d6886",
"date": "2020-02-05T04:10:45"
}
}
}
I believe the beta version of this API is still subject to change. You could contact Microsoft support with your request-id for further investigation.
You will need to provide the date in the UTC format.
Example:
https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge 2020-01-01T22:13:50.843847Z
In Python you would do something like the following to create the URL with the filter:
from datetime import datetime
date_filter = datetime.utcnow().isoformat()+"Z"
request_url = "https://graph.microsoft.com/beta/riskDetections?$filter=activityDateTime ge " + date_filter
The response is now filtered:
[
{
"id": "68f0402c7063a2fbbae5895f2c63598ca3c2b81c44be60145be1a9cd7e20af4b",
"requestId": "181d3817-b4fb-4d2b-a87c-065776f05800",
"correlationId": "6d02786c-0bc7-441f-b303-51430016f955",
"riskType": "unfamiliarFeatures",
"riskState": "atRisk",
"riskLevel": "low",
"riskDetail": "none",
"source": "IdentityProtection",
"detectionTimingType": "realtime",
"activity": "signin",
"tokenIssuerType": "AzureAD",
"ipAddress": "52.185.138.50",
"activityDateTime": "2020-02-07T05:48:07.6322964Z",
"detectedDateTime": "2020-02-07T05:48:07.6322964Z",
"lastUpdatedDateTime": "2020-02-07T05:49:33.3003616Z",
"userId": "e3835755-80b0-4b61-a1c0-5ea9ead75300",
"userDisplayName": "John Doe",
"userPrincipalName": "john.doe#example.com",
"additionalInfo": "[{\"Key\":\"userAgent\",\"Value\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36\"}]",
"location": {
"city": "tokyo",
"state": "tokyo",
"countryOrRegion": "jp",
"geoCoordinates": {
"latitude": 35.69628,
"longitude": 139.7386
}
}
}
]

_grokparsefailure on varnish log

Message looks like
1.2.3.4 "-" - - [19/Apr/2016:11:42:18 +0200] "GET http://monsite.vpù/api/opa/status HTTP/1.1" 200 92 "-" "curl - API-Player - PREPROD" hit OPA-PREPROD-API - 0.000144958
My grok pattern is
grok {
match => { "message" => "%{IP:clientip} \"%{DATA:x_forwarded_for}\" %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} (%{NOTSPACE:hitmiss}|-) (%{NOTSPACE:varnish_conf}|-) (%{NOTSPACE:varnish_backend}|-) %{NUMBER:time_firstbyte}"}
}
I get a _grokparsefailure tag even though all my fields are filled in correctly, except the last one: I get 0 instead of 0.000144958.
The full message in ES is
{
"_index": "logstash-2016.04.19",
"_type": "syslog",
"_id": "AVQt7WSCN-2LsQj9ZIIq",
"_score": null,
"_source": {
"message": "212.95.71.201 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsite.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - PREPROD\" hit OPA-PREPROD-API - 0.000132084",
"#version": "1",
"#timestamp": "2016-04-19T09:50:12.000Z",
"type": "syslog",
"host": "212.95.70.80",
"tags": [
"_grokparsefailure"
],
"application": "varnish-preprod",
"clientip": "1.2.3.4",
"x_forwarded_for": "-",
"ident": "-",
"auth": "-",
"timestamp": "19/Apr/2016:11:50:12 +0200",
"verb": "GET",
"request": "http://monsite.com/api/opa/status",
"httpversion": "1.1",
"response": "200",
"bytes": "92",
"referrer": "\"-\"",
"agent": "\"curl - API-Player - PREPROD\"",
"hitmiss": "hit",
"varnish_conf": "OPA-PREPROD-API",
"varnish_backend": "-",
"time_firstbyte": "0.000132084",
"geoip": {
"ip": "1.2.3.4",
"country_code2": "FR",
"country_code3": "FRA",
"country_name": "France",
"continent_code": "EU",
"region_name": "C1",
"city_name": "Strasbourg",
"latitude": 48.60040000000001,
"longitude": 7.787399999999991,
"timezone": "Europe/Paris",
"real_region_name": "Alsace",
"location": [
7.787399999999991,
48.60040000000001
]
},
"agentname": "Other",
"agentos": "Other",
"agentdevice": "Other"
},
"fields": {
"#timestamp": [
1461059412000
]
},
"highlight": {
"agent": [
"\"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\""
],
"varnish_conf": [
"OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API"
],
"application": [
"#kibana-highlighted-field#varnish#/kibana-highlighted-field#-#kibana-highlighted-field#preprod#/kibana-highlighted-field#"
],
"message": [
"1.2.3.4 \"-\" - - [19/Apr/2016:11:50:12 +0200] \"GET http://monsote.com/api/opa/status HTTP/1.1\" 200 92 \"-\" \"curl - API-Player - #kibana-highlighted-field#PREPROD#/kibana-highlighted-field#\" hit OPA-#kibana-highlighted-field#PREPROD#/kibana-highlighted-field#-API - 0.000132084"
]
},
"sort": [
1461059412000
]
}
The answer is that Kibana does not display very small numbers.
You would only get a grokparsefailure if the grok, um, fails. So, it's not this grok that's producing the tag. Use the tag_on_failure parameter in your groks to provide a unique tag for each grok.
As for your parsing problem, I'll bet that your grok is working just fine. Note that elasticsearch can make fields dynamically and will guess as to the type of the field based on the first data seen. If your first data was "0", it would have made the field an integer and later entries would be cast to that type. You can pull the mapping to see what happened.
You need to control the mapping that is created. You can specify that the field is a float in the grok itself (%{NUMBER:myField:float}) or by creating your own index template.
Also notice that NOTSPACE matches "-", so your patterns for varnish_backend, etc, are not entirely correct.
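To illustrate the template option mentioned above, here is a minimal sketch (my own addition; the template name and the logstash-* pattern are assumptions, and the exact mapping syntax depends on your Elasticsearch version, this one targets the 2.x era of the question):

import requests

# Index template: every new logstash-* index will map time_firstbyte
# as a float instead of guessing the type from the first document seen.
# Existing indices keep their old mapping.
template = {
    "template": "logstash-*",
    "mappings": {
        "_default_": {
            "properties": {"time_firstbyte": {"type": "float"}}
        }
    },
}
resp = requests.put("http://localhost:9200/_template/varnish_floats",
                    json=template)
print(resp.json())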
The problem came from the syslog filter using grok internally, as explained here: https://kartar.net/2014/09/when-logstash-and-syslog-go-wrong/.
The solution was then to remove the tag in my own filter.
The other problem is that Kibana does not display numbers like 0.0000xxx, but they are stored correctly anyway, so I can still use them.
