Regex working on test site but not in Logstash

This is the field utm_url:
https://www.google.com/search?q=suche+f%C3%BCr+logstash&client=firefox-b-d&ei=oZlXY5q5COyI9u8P7NiAqAo&ved=0ahUKEwjajczf9vr6AhVshP0HHWwsAKUQ4dUDCA4&uact=5&oq=suche+f%C3%BCr+logstash&gs_lcp=Cgdnd3Mtd2l6EAMyBQghEKABMgUIIRCgAToKCAAQRxDWBBCwAzoECAAQQzoFCAAQgAQ6CwguEIAEEMcBENEDOggILhCABBDUAjoHCC4Q1AIQQzoECC4QQzoFCC4QgAQ6DgguEIAEEMcBENEDENQCOgYIABAWEB46CQgAEIAEEA0QEzoICAAQFhAeEBM6CggAEBYQHhAPEBM6BwgAEIAEEBM6CAgAEBYQHhAPSgQITRgBSgQIQRgASgQIRhgAUMoKWLkrYM0saANwAXgBgAHTAYgBrxGSAQYyLjE1LjGYAQCgAQHIAQjAAQE&sclient=gws-
This is my regex:
.*google\.[a-z]*\/search.*q=[^$#&].*
And this is how it looks in the Logstash Pipeline:
if [utm_categoryname] == "Search Engines" or [utm_url] =~ "/.*google\.[a-z]*\/search.*q=[^$#&].*/" {
The regex works fine on https://extendsclass.com/regex-tester.html, https://regex101.com/ and https://grokconstructor.appspot.com/do/match#result.
My goal is to match only Google Search URLs that contain a search query.
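One possible explanation, offered as an assumption rather than something confirmed in this thread: Logstash conditionals take a regex literal written between slashes without surrounding quotes, so in "/.../" the two slashes may end up as literal characters inside the pattern. The Python sketch below only imitates that effect with the re module. URL is a shortened form of the utm_url value above, and is_google_search is a hypothetical helper name.

```python
import re

# Pattern from the question, unchanged.
PATTERN = r".*google\.[a-z]*\/search.*q=[^$#&].*"

# Same pattern with the surrounding slashes treated as literal characters,
# which is what a quoted "/.../" string would amount to (assumption).
PATTERN_WITH_SLASHES = "/" + PATTERN + "/"

# Shortened form of the utm_url value from the question (assumption: the
# dropped query parameters do not affect the result).
URL = "https://www.google.com/search?q=suche+f%C3%BCr+logstash&client=firefox-b-d"


def is_google_search(url: str, pattern: str = PATTERN) -> bool:
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    print(is_google_search(URL))                        # bare pattern
    print(is_google_search(URL, PATTERN_WITH_SLASHES))  # slashes kept as literals
    print(is_google_search("https://www.google.com/search?q="))  # empty query
```

If the assumption holds, writing the condition as [utm_url] =~ /.../ without the quotes would be the thing to try first.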

Related

How to match different instances of the same query in Elasticsearch?

Example 1:
My query term is "abc".
My query structure is like this:
{
  "query": {
    "query_string": {
      "query": "abc",
      "fields": ["field1", "field2", "field3"]
    }
  },
  "size": 50,
  "highlight": {
    "fields": {
      "field1": {},
      "field2": {},
      "field3": {}
    }
  }
}
It matches the following instances:
abc abcs abc_def_ghi
But it does not match def_abc or def_abc_ghi.
Basically instances where abc is in the middle of a string.
Example 2:
In the same example above, if my query is abc_def
It does not match abc_def_ghi, although abc_def is present.
I have tried phrase_prefix, and it solves example 2 but not the problems in example 1.
Any help would be appreciated.
For these cases you should use a wildcard or regular expression query.
If you are using a term query, you can use a wildcard query or a regexp query instead.
As the name suggests, phrase_prefix is like a poor man's autocomplete: it matches fields that start with the given phrase, in your case abc, abcs and abc_def_ghi. Because the field does not start with abc in the case of def_abc or def_abc_ghi, the phrase prefix does not match them.
Try using a character filter, specifically the Pattern Replace Character Filter, to replace _ with a space while the field is analyzed (see this answer). The tokens then become [def, abc, ghi] instead of the single token [def_abc_ghi]. You can then search the analyzed field with cross_fields, which should satisfy all of the cases you mentioned.
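To make the effect of the character filter concrete, here is a minimal Python sketch that only simulates it: underscores are replaced with spaces before the value is split into lowercase tokens. It does not talk to Elasticsearch, and analyze and matches are hypothetical helper names. Stemming is not modelled, so abc would not match abcs here.

```python
import re


def analyze(value: str) -> list[str]:
    """Rough stand-in for an analyzer: replace "_" with a space
    (the pattern-replace step), lowercase, then split on whitespace."""
    return re.sub(r"_", " ", value).lower().split()


def matches(query: str, value: str) -> bool:
    """True if every token of the query appears among the value's tokens."""
    value_tokens = set(analyze(value))
    return all(token in value_tokens for token in analyze(query))


if __name__ == "__main__":
    for value in ["abc_def_ghi", "def_abc", "def_abc_ghi"]:
        print(value, analyze(value), matches("abc", value))
    print(matches("abc_def", "abc_def_ghi"))
```

With the tokens split this way, abc is found wherever it sits in the original string, which is the case that the prefix-based query misses.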

Grok regex with escaped “[“, “(“, and “)” chars problems

Elastic newbie here - working with a new 5.5 install. I have a log line that looks like so:
[2015/10/01#19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) prostrct create session begin for timk519 on CON:.
I have the following regex:
\[%{DATE:date}#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
When I try it in the kibana grok debugger it doesn't work and I get the following error:
GrokDebugger: [parse_exception] [pattern_definitions] property isn't a
map, but of type [java.lang.String], with { header={
processor_type="grok" & property_name="pattern_definitions" } }
This appears to be due to the \[ at the start of the line. If I replace the leading \[ with a period ".", I get this:
.%{DATE:date}#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
The grok debugger and https://grokdebug.herokuapp.com/ are both fine with this pattern.
When I put this regex into Logstash, it fails to recognize the msgnum (451) part of the line because of the escaped parens \( and \) around the msgnum field, and as a result it fails to match the line.
Am I escaping something incorrectly? Is this a bug?
UPDATE 2017-07-21
I got around the issue with escaping ( and ) by putting them in [(] and [)]. I haven't figured out a way to solve matching the leading [ yet.
UPDATE 2017-07-24
The answer below was an epic catch and I've used that to create the following custom patterns:
DBTIME %{TIME}[-+]\d{4}
DBTIMESTAMP %{YEAR}/%{MONTHNUM}/%{MONTHDAY}#%{DBTIME}
which I've implemented in my grok statement like so:
\[%{DBTIMESTAMP:dbdatetime}\]\s*%{PROCESSID:processid}\s*%{DBTHREADID:threadid}\s*%{DBMSGTYPE:msgtype}\s*%{PROCESSTYPE:processtype}?\s*%{USERNUMBER:usernumber}?\s*:\s*[(]%{MSGNUMBER:msgnumber}[)].\s*%{GREEDYDATA:eventmessage}\s*\r
I then use the date filter to turn dbdatetime into the @timestamp field, and now the regex matches the incoming log stream, which is what I want. Thanks!
The devil is in the detail, and the error is not apparent at first. The Grok Debugger fails because of your use of the DATE pattern. This pattern resolves to either of the following:
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
MONTHNUM and MONTHDAY both match at most two digits, so they end up matching the 15 in your year. This is why the pattern does not work: \[%{DATE} cannot match because the 20 between the [ and the 15 is left unaccounted for. Why does the pattern .%{DATE} work, though? Because the dot is not matching the [; it is matching the 0 of the year.
How to fix this? Use a custom pattern to match the date. Something like this works:
\[(?<date>%{YEAR}/%{MONTHNUM}/%{MONTHDAY})#%{TIME:time}-(?<gmtoffset>\d{4})\]\s*(?<procid>P-[0-9]+)\s*(?<threadid>T-[0-9]+)\s*(?<msgtype>[ifIF])\s*(?<processtype>[a-zA-Z]+)\s*(?<usernumber>[0-9]+|[:])\s*\((?<msgnum>[0-9]+|[\-]+)\)\s*%{GREEDYDATA:message}
This will return the following output:
{
"date": "2015/10/01",
"msgnum": "451",
"procid": "P-4780",
"processtype": "DBUTIL",
"message": "prostrct create session begin for timk519 on CON:.",
"threadid": "T-2208",
"usernumber": ":",
"gmtoffset": "0400",
"time": "19:48:22.785",
"msgtype": "I"
}
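The behaviour described in this answer can be reproduced outside Logstash. The Python sketch below uses simplified approximations of the grok building blocks (an assumption: the real definitions differ in detail, for example in how YEAR is grouped) to show that DATE only fits the 15/10/01 part of the timestamp, while a year-first custom pattern fits the whole date.

```python
import re

# Simplified approximations of the grok patterns (assumption, see above).
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
YEAR = r"(?:\d\d){1,2}"
DATE_US = rf"{MONTHNUM}[/-]{MONTHDAY}[/-]{YEAR}"
DATE_EU = rf"{MONTHDAY}[./-]{MONTHNUM}[./-]{YEAR}"
DATE = rf"(?:{DATE_US}|{DATE_EU})"

# Year-first custom date, as in the fixed pattern from the answer.
CUSTOM = rf"\[(?P<date>{YEAR}/{MONTHNUM}/{MONTHDAY})"

LINE = ("[2015/10/01#19:48:22.785-0400] P-4780 T-2208 I DBUTIL : (451) "
        "prostrct create session begin for timk519 on CON:.")

bracket_then_date = re.search(r"\[" + DATE, LINE)  # "[" directly followed by DATE
dot_then_date = re.search(r"." + DATE, LINE)       # any character, then DATE
custom_date = re.search(CUSTOM, LINE)

if __name__ == "__main__":
    print(bracket_then_date)       # no match: "20" sits between "[" and "15"
    print(dot_then_date.group(0))  # the dot consumes the "0" of the year
    print(custom_date.group("date"))
```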

How to search full or partial match with first name and last name in mongodb

How to search full or partial match with first name and last name in mongodb?
I tried using this,
{"name":{ $regex: str, $options: 'i'}}
But it is only for full match of the string.
Can I use regex for partial match?
For this type of search it is better to create a text index. The mongo shell command to create a text index on the name field is:
db.collectionName.createIndex( { name: "text" } );
Then you can search using $text and $search:
var text = 'John Test';
db.collectionName.find({ $text: { $search: text } });
This query returns documents whose name contains john or test, and it is case-insensitive.
I had to do this for my project. How does this work for you? {"name": new RegExp('^' + name, 'i')} You may need to escape the string first: str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
Try this
{'name': {'$regex': '.*' + str + '.*', $options: 'i'}}
I have responded to a similar question here; combining a text index and a regex pattern makes it work nicely. Note that a text index searches by terms, so if you try to find padavan by supplying pad, you won't get what you expect with only a text index in place.
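As a rough illustration of the regex-based answers above, the following Python sketch builds a case-insensitive, unanchored $regex filter from user input and escapes regex metacharacters first. It does not connect to MongoDB: partial_name_filter and matches are hypothetical helpers, the matching is simulated locally with the re module, and it assumes that Python's re.escape output is also valid for the regex engine MongoDB uses.

```python
import re


def partial_name_filter(term: str) -> dict:
    """Build a filter document that matches the term anywhere in "name",
    ignoring case. Metacharacters in the term are escaped."""
    return {"name": {"$regex": re.escape(term), "$options": "i"}}


def matches(filter_doc: dict, doc: dict) -> bool:
    """Local simulation of how the filter would be applied to one document."""
    spec = filter_doc["name"]
    flags = re.IGNORECASE if "i" in spec.get("$options", "") else 0
    return re.search(spec["$regex"], doc["name"], flags) is not None


if __name__ == "__main__":
    person = {"name": "John Test"}
    print(matches(partial_name_filter("john"), person))  # full first name
    print(matches(partial_name_filter("tes"), person))   # partial last name
    print(matches(partial_name_filter("j.hn"), person))  # "." is taken literally
```

Leaving the pattern unanchored is what allows a partial match; adding '^' in front, as in the RegExp answer, restricts it to names that start with the term.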

Getting parsing error while writing custom grok filter

I am new to regex and to the Logstash grok filter. I have to write my own regex to parse this message:
NAID-iOS-3093448A-BC34-4FE1-A057-29AD2CEF8FD3-1471410782.565937 IP-202.174.93.103 2016-08-17 00:13:09,963
my grok filter is
grok{
match => {"message" => "%{DATA:deviceid}%{IP:Serverip}%{%{YEAR}[-]%{MONTHNUM}[-]%{MONTHDAY}:Date}%{GREEDYDATA:data}"}
}
The default DATE_EU entry in grok is %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR},
but I needed the reverse of it, so I wrote
%{%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}:Date} instead of %{DATE_EU:Date}, but I am getting a [0] "_grokparsefailure" tag.
I even followed https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#_custom_patterns and created a patterns directory inside the Logstash folder (I am using Windows). I added the entry DATE_LOG %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY} to a file named "extra", but again got a parsing error.
You're using bad syntax: a %{...} reference cannot contain other %{...} references. While parsing a log line, you name each part with your own field names, for example:
%{MONTHNUM:month}-%{MONTHDAY:day}
and then you assemble your own format this way:
replace => ['timestamp', '%{month}-%{day}']
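The idea of naming each part and then reassembling it can be sketched in Python with named groups. This is only an analogy to the grok approach, not Logstash configuration: LINE_RE and parse_line are hypothetical names, and the pattern is written for the single sample line from the question.

```python
import re

# One named group per part of the sample line; the date is captured as
# separate year, month and day parts and reassembled afterwards.
LINE_RE = re.compile(
    r"(?P<deviceid>\S+)\s+"
    r"IP-(?P<serverip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\s+"
    r"(?P<data>.*)"
)

SAMPLE = ("NAID-iOS-3093448A-BC34-4FE1-A057-29AD2CEF8FD3-1471410782.565937 "
          "IP-202.174.93.103 2016-08-17 00:13:09,963")


def parse_line(line: str) -> dict:
    """Split the line into named fields and add a combined "date" field."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("line does not match the expected layout")
    fields = match.groupdict()
    fields["date"] = f"{fields['year']}-{fields['month']}-{fields['day']}"
    return fields


if __name__ == "__main__":
    print(parse_line(SAMPLE))
```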

Using a regex with text search in MongoDB

I have created a text index on the num field in the collection. Now while passing the string to search from the text index, I need to use a regex which has to be passed as a string to the $search variable.
My current query works fine but it doesn't work when I add a regex to it.
Current Query:
db.collection.find({$text:{$search:"1234 6789"}},{'id':1})
I need to add a regex/like query to the $search to make it something like
db.collection.find({$text:{$search:"/1234/ /6789/"}},{'id':1})
where I get all the values from the database that contain a pattern like "1234" OR "6789".
I did try the query but it gives me a $search needs a String error:
db.collection.find({$text:{$search:/1234/}},{'id':1})
To achieve this you should use the $regex MongoDB operator:
// Regex literal with an inline flag
db.collection.find({num: /1234|5678/i});
// Separate options property
db.collection.find({num: {$regex: /1234|5678/, $options: 'i'}});
To add multiple terms in the regex, use the | operator (see the $regex docs and examples).
Edit:
For querying records using an array of values, the $in operator can be used (see the $in docs and examples).
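When the search terms arrive as a list, the alternation can be built programmatically. The Python sketch below joins escaped terms with | into a single $regex filter document. any_term_filter is a hypothetical helper, no database connection is made, and it assumes the num field holds strings.

```python
import re


def any_term_filter(field: str, terms: list[str]) -> dict:
    """Build a filter that matches documents whose field contains
    at least one of the terms (case-insensitive)."""
    pattern = "|".join(re.escape(term) for term in terms)
    return {field: {"$regex": pattern, "$options": "i"}}


if __name__ == "__main__":
    query = any_term_filter("num", ["1234", "6789"])
    print(query)
    # Local check of the generated pattern against sample values.
    for value in ["001234", "6789-A", "5555"]:
        print(value, re.search(query["num"]["$regex"], value) is not None)
```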

Resources