How exactly am I to format this a-zA-Z for GROK custom regular expressions? - logstash-grok

Please see image. How the heck do you get a simple [a-zA-Z] expression to work in the KIBANA X-Pack Grok debugger?
I've tried several flavors and have run the regex just fine in normal regex testing environments, where it finds everything I need, but this debugger wants something that I cannot figure out. Again, this is a CUSTOM regular expression, not one of the pre-built ones.
[a-z]
[A-Z]
[a-zA-Z]
([a-zA-Z]+)
and more

The first box is the data string, the second box is the pattern and the last box is where you define custom patterns. You have no pattern and the syntax for defining a custom pattern is wrong.
In the second box type
%{MY_REGEX:results}
In the third box type
MY_REGEX [a-z]
This creates a new pattern called MY_REGEX which can be used in the actual search pattern.
That matches the first character of the data, which is unlikely to be what was intended, but that should get you started.
See also https://www.elastic.co/guide/en/kibana/current/grokdebugger-getting-started.html#grokdebugger-custom-patterns
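To see why MY_REGEX [a-z] captures only a single character, here is a minimal sketch of the same character classes. It assumes Python's re module as a stand-in for the debugger's regex engine (the two should agree on classes this simple), and the sample string is made up for illustration rather than taken from the question.

```python
import re

# Hypothetical sample data; not the string from the original question.
data = "hello World 42"

# [a-z] without a quantifier consumes exactly one lowercase letter.
single = re.search(r"(?P<results>[a-z])", data)

# [a-zA-Z]+ consumes a whole run of letters, which is usually what is wanted.
run = re.search(r"(?P<results>[a-zA-Z]+)", data)

print(single.group("results"))  # h
print(run.group("results"))     # hello
```

In the debugger, the corresponding custom pattern line would be MY_REGEX [a-zA-Z]+ with %{MY_REGEX:results} left unchanged in the pattern box.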

Related

Logstash how to combine words separated by delimiter

I have a string like "John-Raj " and would like to capture the two words as a single field in Logstash using a grok pattern.
I want the output shown below, but I am not able to get it as a single field using %{WORD} and %{NOTSPACE}.
"John-Raj"
Any ideas how to write a grok pattern that produces this output?
%{WORD} is alphanumeric and underscore, so it won't match your hyphen.
%{NOTSPACE} matches in the debugger.
If you have quoted text, you may use the %{QS} pattern.
I was also looking for a way to combine several patterns into a single value.
Found here
Sometimes logstash doesn’t have a pattern you need. For this, you have
a few options.
First, you can use the Oniguruma syntax for named capture which will
let you match a piece of text and save it as a field:
(?<field_name>the pattern here)
So in your case the following will make value = "John-Raj" (tested in the debugger)
(?<value>%{WORD}%{NOTSPACE})
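A rough way to check this outside the debugger is to expand the two grok patterns by hand. The sketch below assumes the stock definitions WORD = \b\w+\b and NOTSPACE = \S+, and uses Python's (?P<name>...) spelling of the named capture instead of Oniguruma's (?<name>...).

```python
import re

# Assumed expansions of the stock grok patterns.
WORD = r"\b\w+\b"
NOTSPACE = r"\S+"

# Equivalent of (?<value>%{WORD}%{NOTSPACE}) in Python's named-group syntax.
pattern = re.compile(rf"(?P<value>{WORD}{NOTSPACE})")

match = pattern.search("John-Raj ")
print(repr(match.group("value")))  # 'John-Raj'

# WORD on its own cannot span the hyphen, as noted above.
print(re.fullmatch(WORD, "John-Raj"))  # None
```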

Sublime Text 3: I can't search a string with a dollar followed by underscore ($_GET, $_POST, etc.)

I can search the following without problems:
_GET
$variable
However, Sublime fails to find $_ (e.g. $_GET). I have tried to escape it somehow:
$\_GET
\$_GET
$__GET
I'm on Ubuntu 14.04 LTS.
Turn off the regular expressions search. It is the button on the far left of the search field (in this picture currently selected):
With regular expressions turned off:
Although I'm not sure if this would fit your exact problem since you tried escaping using \$_, this answer may still help for posterity.
Did you also make sure "whole word" search is turned off? That's the 3rd button from the left (next to the Aa)
With whole word turned on:
Failing with the attempted escaped \$_:
And it succeeding with _GET:
Note that whole word search of $_ would succeed if there was a whole $_ phrase, surrounded by whitespace. For example with whole word search on:
I am a sentence with the keyword $_ which will be matched.
would work, whereas:
I am a sentence with the keyword $_GET, which will never match. $_POST, $_REQUEST, and $_SERVER won't work either.
would break the whole word search.
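For the regular-expression side of the problem, the sketch below shows why an unescaped $ gets in the way. It assumes Python's re module rather than Sublime's own engine, and it only models regex mode, not the whole-word toggle; the sample text is made up.

```python
import re

# Hypothetical PHP-like line to search in.
text = "if (isset($_GET['id'])) { echo $_POST['name']; }"

# Unescaped, $ is an end-of-line anchor, so nothing can follow it on the line.
unescaped = re.search(r"$_GET", text)
print(unescaped)  # None

# Escaping the dollar sign makes it a literal character.
escaped = re.search(r"\$_GET", text)
print(escaped.group())  # $_GET

# re.escape builds a safe pattern from a literal search string.
auto = re.search(re.escape("$_POST"), text)
print(auto.group())  # $_POST
```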

Delete text with GREP in Textwrangler

I have the following source code from the Wikipedia page of a list of Games. I need to grab the name of the game from the source, which is located within the title attribute, as follows:
<td><i>007: Quantum of Solace</i><sup id="cite_ref-4" class="reference"><span>[</span>4<span>]</span></sup></td>
As you can see above, in the title attribute there's a string. I need to use GREP to search through every single line for when that occurs, and remove everything excluding:
title="Game name"
I have the following (in TextWrangler) which returns every single occurrence:
title="(.*)"
How can I now set it to remove everything surrounding that, but to ensure it keeps either the string alone, or title="string".
I use a multi-step method to process this kind of file.
First, you want only one HTML tag per line. GREP works line by line, so this minimises the need for complicated patterns. I usually replace every > with >\n
Then develop a pattern for the item you want, in this case title=".*?". Put it between parentheses (), then add some filler around it so that the pattern matches the whole line: .*?(title=".*?").*
Replace everything that matches .*?(title=".*?").* with \1
Finally, make smart use of the TextWrangler function "Process Lines Containing" to filter out any remaining rubbish.
Notes
\1 refers to the text matched by the first pair of parentheses (). You can also reorder things using several pairs of parentheses, for example search for (.*?), (.*) and replace with \2, \1 to swap two columns.
Learn how to write lazy regular expressions. The ? in these patterns is very powerful: placed after a quantifier such as *, it makes the match stop at the first place where the next part of the pattern can match, rather than the last.
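The replace step and the lazy-versus-greedy difference can be tried out with the sketch below. It assumes Python's re module in place of TextWrangler's GREP engine, and the input line, including its title value, is hypothetical.

```python
import re

# Hypothetical input line; the attribute values are made up for illustration.
line = '<td><i><a title="Example Game" href="/wiki/Example_Game">Example Game</a></i></td>'

# Lazy: keep only the title attribute, dropping everything around it.
lazy = re.sub(r'.*?(title=".*?").*', r'\1', line)
print(lazy)    # title="Example Game"

# Greedy: .* runs on to the LAST double quote on the line.
greedy = re.search(r'title=".*"', line).group()
print(greedy)  # title="Example Game" href="/wiki/Example_Game"

# Reordering with two groups, as in the \2, \1 note.
swapped = re.sub(r'(.*?), (.*)', r'\2, \1', "Solace, Quantum")
print(swapped)  # Quantum, Solace
```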
I've figured this problem out, it was quite simple. Instead of retrieving the content in the title attribute, I'd retrieve the page name.
To ensure I only struck the correct line where the content was, I'd use the following string for searching the code.
(.*)/wiki/(.*)"
Returning \2
After that, I simply remove any cases where there is HTML code:
<(.*)
Returning ''
Finally, I'll remove the remaining content after the page name:
"(.*)
Returning ''
A bit of cleaning up the spacing and I have a list for all game names.

Chrome extension (content scripts and match patterns) - only match digits in URL

I'm about to write my first Chrome Extension, but I'm just wondering if you can really only use "*" and "?" when declaring the "matches" pattern and the "include_globs" and "exclude_globs" patterns in manifest.json?
With regex, I'd declare this kind of pattern: "example.com/[0-9]+"
"example.com/*" however would match any kind of characters after "example.com/", but I want only digits to be matched.
Is that possible?
Based on the documentation, it seems that only * is a special character and that the patterns are not treated as regexes. (there are no mentions of regular expressions on that page actually)
I would suggest using example.com/* and wrapping any action you want to take in a check on the URL, where you'll be able to use any regex you want.

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some of the logic behind it? The regexes from theory aren't like the ones used in UNIX tools or vi (are those different from each other?), and I always struggle to figure them out.
This has to be done in the vi editor, as it is my main work tool.
Thanks!
To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.
For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(\w*\) means "match any number of word characters, and remember what you matched". You can then access it in the replacement with \1 (or \2 for the second, etc.)
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
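To compare the two substitutions side by side outside vim, here is a small sketch. It assumes Python's re module, where the whole match is written \g<0> instead of & and groups use plain ( ) instead of \( \); the sample code line is made up.

```python
import re

# Hypothetical source line containing calls and a function pointer.
code = "x = get_num(a) + len(get_str(b)); cb = &get_distance;"

# Vim: %s/get_\w\+/&_struct/g   (appends to every get_ identifier)
whole = re.sub(r"get_\w+", r"\g<0>_struct", code)
print(whole)
# x = get_num_struct(a) + len(get_str_struct(b)); cb = &get_distance_struct;

# Vim: %s/get_\(\w*\)(/get_\1_struct(/g   (only names followed by "(")
calls = re.sub(r"get_(\w*)\(", r"get_\1_struct(", code)
print(calls)
# x = get_num_struct(a) + len(get_str_struct(b)); cb = &get_distance;
```

The second form leaves &get_distance alone because it requires an opening parenthesis after the name, which is the trade-off the first answer points out.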
