Grok debugging issue with double space


I have the following log which was generated using log4net
2017-12-11 17:01:28,390 [6] INFO  DAL.DBManager "FunctionName":"Dispose"
The problem is the 2 spaces after INFO. If the word is debug it seems to only have 1 space, so it could be "tab".
I'm using http://grokdebug.herokuapp.com/ but my pattern, below, doesn't seem to work.
%{TIMESTAMP_ISO8601} \[%{NUMBER:thread}\] %{LOGLEVEL:log-level} %{DATA:CLASS} %{DATA:Function} %{DATA:FunctionName} %{GREEDYDATA:remainder}
I've tried adding %{SPACE} instead of the space but it doesn't generate anything.

If you want to match exactly two spaces, you have to put two spaces in your pattern as well. The following pattern matches the line you posted:
%{TIMESTAMP_ISO8601} \[%{NUMBER:thread}\] %{LOGLEVEL:log-level}  %{DATA:CLASS}\.%{DATA:Function} %{DATA:FunctionName}\:%{GREEDYDATA:remainder}
If you want to match one or two whitespaces you can use a whitespace and an optional whitespace ( )? like so:
%{TIMESTAMP_ISO8601} \[%{NUMBER:thread}\] %{LOGLEVEL:log-level} ( )?%{DATA:CLASS}\.%{DATA:Function} %{DATA:FunctionName}\:%{GREEDYDATA:remainder}
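
As a rough illustration of the same idea outside Logstash, here is a minimal Python sketch. The regular expression only approximates the grok patterns (DATA as a lazy .*?, GREEDYDATA as .*), the group names are adapted because Python group names cannot contain a hyphen, and the DEBUG line is a made-up sample, not taken from the question.

```python
import re

# Approximate stand-ins for the grok patterns; not the definitions
# that Logstash ships with.
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>\d+)\] "
    r"(?P<log_level>[A-Z]+) +"                  # one or more spaces after the level
    r"(?P<cls>.*?)\.(?P<function>.*?) "
    r"(?P<function_name>.*?):(?P<remainder>.*)"
)

info_line = '2017-12-11 17:01:28,390 [6] INFO  DAL.DBManager "FunctionName":"Dispose"'
# Hypothetical line with a single space after the level.
debug_line = '2017-12-11 17:01:28,391 [6] DEBUG DAL.DBManager "FunctionName":"Dispose"'

info = LINE_RE.match(info_line).groupdict()
debug = LINE_RE.match(debug_line).groupdict()

print(info["log_level"], info["cls"], info["function"], info["remainder"])
print(debug["log_level"], debug["cls"], debug["function"], debug["remainder"])
```

Using " +" rather than a fixed number of spaces lets one expression cover both lines.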

Related

vim Search Replace should use replaced text in following searches

I have a data file (comma separated) that has a lot of NAs (It was generated by R). I opened the file in vim and tried to replace all the NA values to empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.

, doesn't have a special meaning so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
With word boundaries in place, the commas you used to restrict the match to NA are no longer needed, and they are what causes the skipped match, so drop them:
:%s/\<NA\>//g
See :help :range and :help \<.

Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches, so in ,NA,NA,NA, it only finds the first and third ,NA,: the middle one has no surrounding commas of its own, because the first match has already consumed its leading comma. We can exclude parts of the pattern from the match with \zs (start of match) and \ze (end of match). The whole pattern must still be found in the text, but only the part between \zs and \ze counts as the match, so every NA in ,NA,NA,NA, is matched.
TL;DR: %s/,\zsNA\ze,//g
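
The same non-overlapping behaviour can be reproduced outside Vim. The following Python sketch is only an analogy: Python's re module has no \zs or \ze, so lookarounds are used instead, and \b stands in for Vim's \< and \>. The record is the sample from the question.

```python
import re

record = "1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1"

# Consuming the commas: adjacent ,NA, fields share a comma, so the
# middle one of three is skipped.
consuming = re.sub(r",NA,", ",,", record)

# Lookarounds check for the commas without making them part of the match.
lookaround = re.sub(r"(?<=,)NA(?=,)", "", record)

# Word boundaries: no commas needed at all.
boundaries = re.sub(r"\bNA\b", "", record)

print(consuming)   # one NA is left behind
print(lookaround)
print(boundaries)
```

Note that the comma-based variants do not touch an NA in the first or last column, since it has a comma on one side only; the word-boundary variant does.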

Regex Search/Replace is... inverted

Up to a few days ago my Sublime Text 3 was working just fine. I could search/replace regular strings and use regular expression patterns as well, and when a capture group got a match, all of them were highlighted perfectly.
However, since yesterday, everything I search is matching... reversely. Here:
image:\s*"?(.*?)"?
This should match a fixed string image, followed by a colon, any number of spaces, if any, and anything between optional quotes.
Not a big deal, right? However Sublime is capturing the string image instead of what I've defined to be captured. Even if there are no spaces or quotes, it should at least match what's after the colon, not before it.
I did a fresh install, reinstalling and reconfiguring the very few plugins I use, trying to, maybe, get rid of any sort of caching, without luck.
And this is a major setback for me 'cause I can't do batch replacements all over a project.
There are only two things I did differently than my regular development routine:
Installed String 2 Lower Hyphen Plugin to speed-up the creation of some dashed separated URI slugs BUT when fresh installing I didn't add it back and the problem persisted.
For the first time, I used the expression <open files> to do a batch replacement in a specific set of files I had manually opened since they're in different directories.
Nothing more than that.
I can work around the issue by changing the .*? to a .* but this is a palliative measure, since I always used the non-greedy version without problems.
Does anyone know what could be happening?

I'm not sure how your regex used to match any differently. Let's think about what the regex is saying:
image: - the literal image:
\s* - any amount of whitespace, including none
"? - an optional quote
(.*?) - lazily capture anything except a newline character into capture group 1
"? - an optional quote
So for your example text, it matches image: and the space after it. There is no quote, and the next instruction is lazy, so it captures nothing into capture group 1. Again there is no quote, so that is the full extent of the match.
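
The walkthrough above can be checked with a small Python sketch. Python's re module is not the engine Sublime Text uses, but lazy quantifiers behave the same way for this pattern. The sample text is an assumption, since the screenshot from the question is not available.

```python
import re

text = "image: foo.png"  # hypothetical sample line

lazy = re.search(r'image:\s*"?(.*?)"?', text)
greedy = re.search(r'image:\s*"?(.*)"?', text)
greedy_quoted = re.search(r'image:\s*"?(.*)"?', 'image: "foo.png"')

print(repr(lazy.group(0)), repr(lazy.group(1)))  # match stops after the space
print(repr(greedy.group(1)))                     # the value is captured
print(repr(greedy_quoted.group(1)))              # closing quote is swallowed
```

The last line shows why simply switching to the greedy .* is not a complete fix: with a quoted value, the closing quote ends up inside the capture group.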
If you always want to capture the value in capture group 1, regardless of whether it was a quoted or unquoted string, you could instead consider using an expression like:
\bimage:\s*"?((?(?<=")[^"]*|.*$))"?
\b word boundary, to ignore image: not being the start of a word
image: literal image:
\s* any amount of whitespace, including none (depending on your source document and requirements, it may be better/more defensive to specify a literal space so that newlines won't be matched here)
"? optional quote
( begin capture group 1
(?(?<=") conditional - lookbehind to see if a quote matched
[^"]* if a quote matched, then match all non-quote characters (of course, we could also check for escaped quotes if your file is YAML format or similar, but URLs shouldn't contain quotes, so we're leaving it out as per the original regex.)
| otherwise, if the conditional didn't match, i.e. there was no quote after image:
.*$ match everything until the newline - again, if this is YAML, you may want to consider excluding comments etc.
) end conditional
) end capture group
"? optional quote (this will never match at $ if the conditional fails)

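For comparison, here is a hedged Python sketch of the same conditional idea. Python's re module does not accept a lookbehind as the condition, so this swaps in a group-reference conditional: the optional opening quote gets its own group 1 and the value moves to group 2. The helper name image_value and the sample lines are made up for illustration.

```python
import re

# (")? records whether an opening quote was seen; (?(1)A|B) then picks
# [^"]* for quoted values and .*$ for unquoted ones.
IMAGE_RE = re.compile(r'\bimage:\s*(")?((?(1)[^"]*|.*$))"?')


def image_value(line):
    """Return the value after image:, without surrounding quotes, or None."""
    match = IMAGE_RE.search(line)
    return match.group(2) if match else None


print(image_value('image: "foo bar.png"'))
print(image_value("image: foo.png"))
print(image_value("title: something"))
```

This is a sketch for single lines; for multi-line input the $ anchor would additionally need re.MULTILINE.
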
Accommodate uncertain number of Spaces in a log file GROK pattern

This may be a simple question, but in my logs the number of spaces between fields is uncertain: in some logs I see two spaces and in others three between the same fields. How do we accommodate this in GROK?

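You can use %{SPACE}* in your grok pattern to match an uncertain number of spaces. It matches whether or not any spaces are present. (%{SPACE} on its own already matches zero or more whitespace characters, so the trailing * is optional.)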

Grok is at its heart an overlay on regular expressions, so you can use regex syntax directly in your grok pattern:
%{WORD} +%{WORD}
Here "space+" means one or more spaces, and "space*" means zero or more spaces.
Grok also has a pattern %{SPACE} that is equivalent to \s*, i.e. zero or more whitespace characters, tabs included.
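
To make the difference between the quantifiers concrete, here is a small Python sketch. The \w+ groups are only a stand-in for %{WORD}, and the sample strings are invented.

```python
import re

one_or_more = re.compile(r"(\w+) +(\w+)")    # like %{WORD} +%{WORD}
zero_or_more = re.compile(r"(\w+) *(\w+)")   # separator may be missing
any_whitespace = re.compile(r"(\w+)\s*(\w+)")  # like %{WORD}%{SPACE}%{WORD}

samples = ["alpha beta", "alpha  beta", "alpha   beta"]
spaced = [one_or_more.fullmatch(s).groups() for s in samples]

# With no separator, " +" refuses to match while " *" splits the word
# at an arbitrary point.
run_together_plus = one_or_more.fullmatch("alphabeta")
run_together_star = zero_or_more.fullmatch("alphabeta").groups()

# A tab is whitespace but not a space.
tab_plus = one_or_more.fullmatch("alpha\tbeta")
tab_ws = any_whitespace.fullmatch("alpha\tbeta").groups()

print(spaced, run_together_plus, run_together_star, tab_plus, tab_ws)
```

If the fields are always separated by at least one space, " +" is the stricter and usually safer choice.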

Logstash how to combine words separated by delimiter

I have a string like "John-Raj " and I would like to capture these two words as a single field in Logstash using a grok pattern.
I want the output shown below, but I am not able to get it as a single field using %{WORD} and %{NOTSPACE}.
"John-Raj"
Any ideas how to write a grok pattern that produces this output?

%{WORD} is alphanumeric and underscore, so it won't match your hyphen.
%{NOTSPACE} matches in the debugger.

If you have quoted text you may use the %{QS} pattern.

I was also looking for a way to combine several patterns into one value.
I found this in the Logstash grok filter documentation:
Sometimes logstash doesn’t have a pattern you need. For this, you have a few options.
First, you can use the Oniguruma syntax for named capture which will let you match a piece of text and save it as a field:
(?<field_name>the pattern here)
So in your case the following will make value = "John-Raj" (tested in the debugger)
(?<value>%{WORD}%{NOTSPACE})
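
A rough Python equivalent of that named capture is sketched below. It assumes WORD is roughly \b\w+\b and NOTSPACE is roughly \S+, and it uses Python's (?P<name>...) spelling for named groups.

```python
import re

sample = "John-Raj "

# WORD alone stops at the hyphen.
word_only = re.search(r"\b\w+\b", sample).group(0)

# WORD followed by NOTSPACE, wrapped in one named group.
combined = re.search(r"(?P<value>\b\w+\b\S+)", sample).group("value")

# NOTSPACE alone also captures the whole token.
notspace_only = re.search(r"\S+", sample).group(0)

print(repr(word_only), repr(combined), repr(notspace_only))
```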

How to replace in vim

I have a line in a source file: [12 13 15]. In vim, I type:
:%s/\([0-90-9]\) /\0, /g
wanting to add a comma after 12 and 13. It almost works, but it inserts an extra space: [12 , 13 , 15].
How can I achieve the desired effect?

Use \1 in the replacement expression, not \0.
\1 is the text captured by the first \(...\). If there were any more pairs of escaped parens in your pattern, \2 would match the text captured between the pair starting at the second \(, \3 at the third \(, and so on.
\0 is the entire text matched by the whole pattern, whether in parentheses or not. In your case this includes the space at the end of your pattern.
Also note that [0-90-9] is the same as [0-9]: each [...] collection matches just one character. It happens to work anyway, because in your data ‘a digit followed by a space’ matches in the same places as ‘2 digits followed by a space’. (If you actually needed to only insert commas after 2 digits, you could write [0-9][0-9].)
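
The difference between the whole match and the first group can be reproduced in Python as a quick sanity check. This is only an analogy to the Vim command: in Python's re.sub the whole match is written \g<0> rather than \0.

```python
import re

line = "[12 13 15]"

# Whole match (digit plus space) followed by ", ": the space is kept
# in front of the comma.
whole_match = re.sub(r"([0-9]) ", r"\g<0>, ", line)

# First group (digit only) followed by ", ".
first_group = re.sub(r"([0-9]) ", r"\1, ", line)

print(whole_match)
print(first_group)
```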

"I have a line in a source file:..."
Then you type :%s/..., which performs the substitution on every line where the pattern matches. Or is that the only line in your file?
If it is the only line, you don't need the group or [0-9]; just :%s/ \+/,/g will do the job.

The other answers already point to interesting solutions, but here's another one, making use of \zs, which marks the start of the match. In this pattern:
/[0-9]\zs /
The searched text is /[0-9] /, but only the space counts as the match. Note that you can use the class \d to simplify the digit character class, so the following command should work for your needs:
:s/\d\d\zs /, /g
It matches only the space and replaces it with a comma followed by a space.
You said you have multiple lines and these changes apply only to certain lines. You can either visually select the lines to be changed or use the :global command, which searches for lines matching a pattern and applies a command to them. For that you need an expression that matches the lines to be changed, kept as loose as you can get away with. If the lines that begin with optional spaces, a [ and two digits are the only lines to be changed, then this would work for you:
:g/^\s*\[\d\d/s/\d\d\zs /, /g
Check :help pattern.txt for \ze and similar items, and :help :global.
Homework: use the help to understand \zs and see how this works:
:s/\d\d\zs\ze /,/g
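
For readers more used to other regex flavours, \zs and \ze correspond roughly to a lookbehind and a lookahead. The Python sketch below is an analogy under that assumption, not a Vim command.

```python
import re

line = "[12 13 15]"

# Counterpart of \d\d\zs followed by a space: only the space is matched
# and replaced.
replace_space = re.sub(r"(?<=\d\d) ", ", ", line)

# Counterpart of \d\d\zs\ze followed by a space: the match is empty, so
# the comma is inserted in front of the space.
insert_comma = re.sub(r"(?<=\d\d)(?= )", ",", line)

print(replace_space)
print(insert_comma)
```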
