GROK pattern to match URIPATH

Here is my sample URL
http://localhost:8080/abc2/query/errorLogs
I was trying to extract only query/errorLogs. To do this I tried the GROK pattern below:
(%{URIPROTO}://%{URIHOST}(?<path>/[^/]+/[^/]+/[^/]+))
This is the output I am getting:
{
  "URIPROTO": [["http"]],
  "URIHOST": [["localhost:8080"]],
  "IPORHOST": [["localhost"]],
  "HOSTNAME": [["localhost"]],
  "IP": [[null]],
  "IPV6": [[null]],
  "IPV4": [[null]],
  "port": [["8080"]],
  "path": [["/abc2/query/errorLogs"]]
}
but I was expecting path to be "/query/errorLogs".

Try this:
(%{URIPROTO}://%{URIHOST}(?<first_path>/[^/]+)%{GREEDYDATA:path})
Result:
port 8080
first_path /abc2
path /query/errorLogs
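
If you want to sanity-check the pattern outside Logstash, a rough Python equivalent looks like this (a sketch: the named groups mirror the grok fields, and the proto/host pieces are simplified stand-ins for URIPROTO and URIHOST):

import re

# Simplified stand-ins for URIPROTO and URIHOST; grok's definitions are broader.
pattern = re.compile(
    r"(?P<proto>[A-Za-z]+)://(?P<host>[^/]+)"
    r"(?P<first_path>/[^/]+)(?P<path>/.*)"
)

m = pattern.match("http://localhost:8080/abc2/query/errorLogs")
print(m.group("first_path"))  # /abc2
print(m.group("path"))        # /query/errorLogs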

Related

How to convert LabelMe output to COCO format?

I am new to this; can anyone please help me convert the LabelMe JSON output to COCO format?
Here is the LabelMe output file. Thanks in advance.
{
  "version": "5.1.1",
  "flags": {},
  "shapes": [
    {
      "label": "bean",
      "points": [
        [3183.5227272727275, 459.65909090909093],
        [3174.431818181818, 468.1818181818182],
        [3162.5, 476.1363636363636],
        [3151.1363636363635, 487.5],
        [3144.318181818182, 508.52272727272725],
        [3142.0454545454545, 543.75],
        [3144.8863636363635, 571.0227272727273],
        [3151.7045454545455, 589.7727272727273],
        [3159.090909090909, 602.8409090909091],
        [3168.181818181818, 614.2045454545455],
        [3178.9772727272725, 621.5909090909091],
        [3192.6136363636365, 628.4090909090909],
        [3208.5227272727275, 628.4090909090909],
        [3231.25, 619.8863636363636],
        [3256.25, 604.5454545454545],
        [3271.590909090909, 583.5227272727273],
        [3277.2727272727275, 553.9772727272727],
        [3277.2727272727275, 524.4318181818181],
        [3271.0227272727275, 496.59090909090907],
        [3257.3863636363635, 471.59090909090907],
        [3238.068181818182, 459.09090909090907],
        [3212.5, 455.6818181818182]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    },
    {
      "label": "bean",
      "points": [
        [2968.75, 320.45454545454544],
        [2952.840909090909, 327.27272727272725],
        [2935.7954545454545, 333.52272727272725],
        [2922.7272727272725, 341.47727272727275],
        [2915.340909090909, 358.52272727272725],
        [2911.931818181818, 371.59090909090907],
        [2913.068181818182, 388.6363636363636],
        [2919.318181818182, 405.6818181818182],
        [2928.409090909091, 423.8636363636364],
        [2947.7272727272725, 448.8636363636364],
        [2967.6136363636365, 460.22727272727275],
        [2986.3636363636365, 463.6363636363636],
        [3003.409090909091, 464.77272727272725],
        [3020.4545454545455, 463.6363636363636],
        [3036.931818181818, 454.54545454545456],
        [3049.431818181818, 446.02272727272725],
        [3056.818181818182, 431.8181818181818],
        [3063.068181818182, 411.9318181818182],
        [3065.340909090909, 390.34090909090907],
        [3056.818181818182, 372.15909090909093],
        [3044.318181818182, 353.40909090909093],
        [3026.7045454545455, 338.6363636363636],
        [3013.6363636363635, 328.40909090909093],
        [2992.6136363636365, 318.1818181818182],
        [2982.9545454545455, 318.75]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    },
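
One way to approach the conversion, assuming a single image and polygon shapes only (a minimal sketch; the file name, image size, and category numbering are hypothetical placeholders, and area is approximated by the bounding-box area):

import json

def labelme_to_coco(labelme_path, file_name="image.jpg", width=4000, height=3000):
    # width/height are placeholders; LabelMe files usually carry
    # imageWidth/imageHeight fields you can read instead.
    with open(labelme_path) as f:
        lm = json.load(f)

    categories = {}  # label -> category id
    annotations = []
    for ann_id, shape in enumerate(lm["shapes"], start=1):
        cat_id = categories.setdefault(shape["label"], len(categories) + 1)
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x, y = min(xs), min(ys)
        w, h = max(xs) - x, max(ys) - y
        annotations.append({
            "id": ann_id,
            "image_id": 1,
            "category_id": cat_id,
            # COCO stores a polygon as one flat [x1, y1, x2, y2, ...] list
            "segmentation": [[c for point in shape["points"] for c in point]],
            "bbox": [x, y, w, h],
            "area": w * h,  # rough stand-in for the true polygon area
            "iscrowd": 0,
        })

    return {
        "images": [{"id": 1, "file_name": file_name, "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()],
    }

print(json.dumps(labelme_to_coco("labelme_output.json"), indent=2))

The labelme2coco package on PyPI does this end to end if you would rather not hand-roll it.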

Need to exclude few words from logs using grok

Consider the below string
date 00:00 1.1.1.1 POST test.com hello-world
How could I print only the date, total time, and URL (test.com) using grok?
Given the sample above, the pattern
^%{DATA:date} %{DATA:time} %{IP:ip} %{DATA:method} %{DATA:url} %{GREEDYDATA:path}$
would generate:
{
  "date": [["date"]],
  "time": [["00:00"]],
  "ip": [["1.1.1.1"]],
  "method": [["POST"]],
  "url": [["test.com"]],
  "path": [["hello-world"]]
}
Afterwards you can mutate it into whichever form you want.
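
Outside Logstash you can reproduce the same split with plain Python and then keep only the fields you need, which is what a mutate with remove_field does in the pipeline (a sketch; the named groups mirror the grok fields above):

import re

line = "date 00:00 1.1.1.1 POST test.com hello-world"
pattern = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<ip>\S+) "
    r"(?P<method>\S+) (?P<url>\S+) (?P<path>.*)$"
)

fields = pattern.match(line).groupdict()
# Keep only the requested fields and drop the rest, the Python
# counterpart of mutate { remove_field => [...] } in Logstash.
wanted = {k: fields[k] for k in ("date", "time", "url")}
print(wanted)  # {'date': 'date', 'time': '00:00', 'url': 'test.com'}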

how to get all details of a particular userid in a json file in python

I want to access the details of a particular userID.
[
  {
    "userID": 998926445,
    "contentID": [
      ["5bbae768c1df412352000004"],
      ["5ba8d4fac1df413dae0002cf"],
      ["5ca61afced8f7d3a5f00102d"],
      ["5b9c9cacc1df41453400003f"],
      ["5c8a8044a58c4046b30030f2"],
      ["5ba9070bc1df413dae0003c3"],
      ["5bbb1087c1df4140a6000162"],
      ["5c95142bed8f7d5ede004ef4"],
      ["5ba905e5c1df413dae0003b9"],
      ["5bb89799c1df41262300062a"]
    ]
  },
  {
    "userID": 998926445,
    "contentID": [
      ["5baa8ef5c1df41479a0004b8"],
      ["5c8a8063a58c4046c8000e89"],
      ["5bbc7a16c1df412a82000008"],
      ["5bb8964ec1df41262300060c"],
      ["5bbc4f92c1df4140a6000abe"],
      ["5bbb0ecbc1df4140a60000fc"],
      ["5ba90aa2c1df413dae000429"],
      ["5bf2bb06c1df411238003054"],
      ["5cb0c006ed8f7d6a1d00146a"],
      ["5bbc9825c1df41384100024c"]
    ]
  },
  {
    "userID": 998926445,
    "contentID": [
      ["5bb8974cc1df412623000622"],
      ["5b9c9cadc1df414534000047"],
      ["5b8e5b32c1df412918000048"],
      ["5b9c9cacc1df41453400003f"],
      ["5bb8ac8ac1df4126230008a0"],
      ["5b9fad7bc1df4145340000a7"],
      ["5bbb1171c1df4140a600016c"],
      ["5c8a8071a58c4046c8000e8d"],
      ["5ba90dbac1df413dae00043d"],
      ["5ba8f905c1df413dae000397"]
    ]
  }
]
Try to do something like this. You will get a dict grouping every record's content IDs by userID; then you can work on it further. Let us assume json_list is your original JSON list.

from collections import defaultdict

# Group each record's content IDs under its userID.
dd = defaultdict(list)
for i in json_list:
    dd[i['userID']].append([j[0] for j in i['contentID']])
dd = dict(dd)
print(dd)
Your output will be something like this:
{998926445: [['5bbae768c1df412352000004', '5ba8d4fac1df413dae0002cf', '5ca61afced8f7d3a5f00102d', '5b9c9cacc1df41453400003f', '5c8a8044a58c4046b30030f2', '5ba9070bc1df413dae0003c3', '5bbb1087c1df4140a6000162', '5c95142bed8f7d5ede004ef4', '5ba905e5c1df413dae0003b9', '5bb89799c1df41262300062a'], ['5baa8ef5c1df41479a0004b8', '5c8a8063a58c4046c8000e89', '5bbc7a16c1df412a82000008', '5bb8964ec1df41262300060c', '5bbc4f92c1df4140a6000abe', '5bbb0ecbc1df4140a60000fc', '5ba90aa2c1df413dae000429', '5bf2bb06c1df411238003054', '5cb0c006ed8f7d6a1d00146a', '5bbc9825c1df41384100024c'], ['5bb8974cc1df412623000622', '5b9c9cadc1df414534000047', '5b8e5b32c1df412918000048', '5b9c9cacc1df41453400003f', '5bb8ac8ac1df4126230008a0', '5b9fad7bc1df4145340000a7', '5bbb1171c1df4140a600016c', '5c8a8071a58c4046c8000e8d', '5ba90dbac1df413dae00043d', '5ba8f905c1df413dae000397']]}
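
To then pull out a single user, index the grouped dict by userID; flattening the per-record batches into one list is a one-liner (998926445 is the userID from the sample above):

# Look up one user and flatten their batches into a single list of content IDs.
target = 998926445
flat = [cid for batch in dd[target] for cid in batch]
print(len(flat))  # 30
print(flat[0])    # '5bbae768c1df412352000004'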

Parse result of XML -> XML2JS ->JSON.stringify using JQ

I have a file created by doing an XHR fetch of XML and parsing it through the node module xml2js and then JSON.stringify. It has about 700 segments of two basic types. This is an edited version of the file with one segment of each type:
{
  "NewDataSet": {
    "Table": [
      {
        "SegmentID": ["2342"],
        "StationID": ["005es00045:_MN_Stn"],
        "SegmentName": ["I-5 NB MP0.45 # SR-14"],
        "SegmentType": ["2"],
        "SegmentLength": ["1135"],
        "MinimumLanesReporting": ["0.5"],
        "CalculationThreshold": ["30"],
        "CalculationPeriod": ["2"],
        "MinimumSamples": ["3"],
        "SegmentMaximumFilter": ["774"],
        "SegmentMinimumFilter": ["12"],
        "StandardDeviationSamples": ["15"],
        "StandardDeviationMultiplier": ["1.96"],
        "UseStandardDeviationFilter": ["false"],
        "IsActive": ["true"]
      },
      {
        "SegmentID": ["3051"],
        "BeginningDcuID": ["584"],
        "EndDcuID": ["589"],
        "SourceSystem": ["TravelTime"],
        "SegmentName": ["OR212 at SE 242nd Ave to OR212 at SE Foster Rd"],
        "SegmentType": ["1"],
        "SegmentLength": ["100"],
        "CalculationThreshold": ["60"],
        "CalculationPeriod": ["10"],
        "MinimumSamples": ["3"],
        "SegmentMaximumFilter": ["3600"],
        "SegmentMinimumFilter": ["50"],
        "StandardDeviationSamples": ["20"],
        "StandardDeviationMultiplier": ["1.96"],
        "UseStandardDeviationFilter": ["true"],
        "IsActive": ["true"]
      }
    ]
  }
}
I need to ignore the "SegmentType": ["2"] segments and extract SegmentID, SegmentName, BeginningDcuID, EndDcuID, and SegmentLength from the type 1 segments where IsActive is true.
I can list the file with jq "." but any attempt at other operations with jq fails, usually with the message:
jq: error: syntax error, unexpected '[' (Unix shell quoting issues?) at , line 1:
Any suggestions for jq syntax changes or xml2js parameter changes to make this work would be outstandingly helpful.
Never use double quotes to quote an argument if there is nothing in it that you want the shell to expand; use single quotes for the jq program.
$ jq '.NewDataSet.Table[]
| select(.SegmentType[0] != "2" and .IsActive[0] == "true")
| (.SegmentID, .SegmentName, .BeginningDcuID, .EndDcuID, .SegmentLength)[0]' file
"3051"
"OR212 at SE 242nd Ave to OR212 at SE Foster Rd"
"584"
"589"
"100"

grok help for logstash

My logs look like this:
00009139 2015-03-03 00:00:20.142 5254 11607 "HTTP First Line: GET /?main&legacy HTTP/1.1"
I tried using the grok debugger to get this information formatted, with no success. Is there any way to get this format using grok? The quoted string would be the message.
So far I have used the following pattern, put together simply from the grok patterns page:
%{NUMBER:Sequence} %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}? %{NUMBER:Process}%{NUMBER:Process2}%{WORD:Message}
This is the closest I could get with the current info.
%{INT}%{SPACE}%{TIMESTAMP_ISO8601}%{SPACE}%{INT:pid1}%{SPACE}%{INT:pid2}%{SPACE}%{GREEDYDATA:message}
With the above grok pattern, this is what the Grok Debugger catches:
{
  "INT": [["00009139"]],
  "SPACE": [[" ", " ", " ", " "]],
  "TIMESTAMP_ISO8601": [["2015-03-03 00:00:20.142"]],
  "YEAR": [["2015"]],
  "MONTHNUM": [["03"]],
  "MONTHDAY": [["03"]],
  "HOUR": [["00", null]],
  "MINUTE": [["00", null]],
  "SECOND": [["20.142"]],
  "ISO8601_TIMEZONE": [[null]],
  "pid1": [["5254"]],
  "pid2": [["11607"]],
  "message": [["\"HTTP First Line: GET /?main&legacy HTTP/1.1\""]]
}
Hope I was of some help.
Alternatively, try replacing %{WORD:Message} at the end of your grok with %{QS:message}.
Hope this helps :)
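
If you want to check the final pattern outside the debugger, here is a rough Python approximation (a sketch: \d+ stands in for INT, the timestamp regex is a cut-down TIMESTAMP_ISO8601, and capturing between the quotes mimics QS):

import re

line = ('00009139 2015-03-03 00:00:20.142 5254 11607 '
        '"HTTP First Line: GET /?main&legacy HTTP/1.1"')

pattern = re.compile(
    r'^(?P<seq>\d+)\s+'
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+'
    r'(?P<pid1>\d+)\s+(?P<pid2>\d+)\s+'
    r'"(?P<message>[^"]*)"$'
)

m = pattern.match(line)
print(m.group("timestamp"))  # 2015-03-03 00:00:20.142
print(m.group("message"))    # HTTP First Line: GET /?main&legacy HTTP/1.1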
