Why does this pattern not produce an error? - pcre2

Compiled PCRE2 10.39 from source on aarch64 (Apple M1). If I use the pattern Product\d{2,} it compiles and matches correctly, but if I instead use the pattern Product\d{2 it doesn't produce any compile error (pcre2_compile) but rather just doesn't match anything when calling pcre2_match. Is this by design? Can it be configured to produce an error instead?

In line with @Justinas' comments, I found the answer in the PCRE2 pattern documentation (https://www.pcre.org/current/doc/html/pcre2pattern.html#SEC17):
An opening curly bracket that appears in a position where a quantifier is not allowed, or one that does not match the syntax of a quantifier, is taken as a literal character. For example, {,6} is not a quantifier, but a literal string of four characters.
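The same convention can be seen outside PCRE2. As a rough illustration only, the sketch below uses Python's re module (an assumption made for demonstration; it is not PCRE2, and its quantifier rules differ in other details) to show an unterminated {2 compiling without error and then being matched as literal characters:

```python
import re

# An unterminated "{2" is not a valid quantifier, so the engine keeps it
# as the literal characters "{" and "2" instead of raising an error.
incomplete = re.compile(r"Product\d{2")
complete = re.compile(r"Product\d{2,}")

subject = "Product12"
complete_hit = complete.search(subject) is not None      # quantifier applies
incomplete_hit = incomplete.search(subject) is not None  # would need a literal "{2"
literal_hit = incomplete.search("Product1{2") is not None

print(complete_hit, incomplete_hit, literal_hit)
```

This is why the incomplete pattern "just doesn't match anything" on ordinary input: it only matches text that contains the brace itself.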

Postgresql invalid class for [:^punct]

Having read the answer in the Remove all punctuation except apostrophes in R post, I tried to use
'[[:space:]]|[^\/[:^punct:]]'
in REGEXP_REPLACE function, but it gives me
[2201B] ERROR: invalid regular expression: invalid character class
How can I make it work?
The question you link to is tagged r, where the stringr library uses the ICU regex flavor, which supports POSIX character classes in its own way that is not necessarily POSIX compatible. PostgreSQL's regex engine rejects the negated [:^punct:] form, hence the error.
To match any whitespace or any punctuation but / you may use
[^/[:alnum:]]
It matches any char that is not alphanumeric (and that means it is either a whitespace or punctuation) and not a / char.
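To see the logic of that negated bracket expression in isolation, here is a small sketch in Python. Python's re module has no POSIX bracket classes, so [:alnum:] is spelled out as A-Za-z0-9; that is an ASCII-only assumption, and the function name is made up for the example:

```python
import re

# ASCII-only stand-in for the POSIX class [:alnum:] (assumption: ASCII input).
NOT_SLASH_OR_ALNUM = re.compile(r"[^/A-Za-z0-9]")


def strip_punct_and_space(text: str) -> str:
    """Remove every character that is neither '/' nor an ASCII letter or digit."""
    return NOT_SLASH_OR_ALNUM.sub("", text)


print(strip_punct_and_space("path/to, some file!"))
```

Whitespace and punctuation are removed while the slash survives, which is the behaviour the question asked for.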

How to search and replace using regular expressions in Visual Studio

I need to replace all the URLs with an empty string:
"regular": "http://fonts.gstatic.com/s/abhayalibre/v3/zTLc5Jxv6yvb1nHyqBasVy3USBnSvpkopQaUR-2r7iU.ttf",
"500": "http://fonts.gstatic.com/s/abhayalibre/v3/wBjdF6T34NCo7wQYXgzrc5MQuUSAwdHsY8ov_6tk1oA.ttf",
"600": "http://fonts.gstatic.com/s/abhayalibre/v3/wBjdF6T34NCo7wQYXgzrc2v8CylhIUtwUiYO7Z2wXbE.ttf",
"700": "http://fonts.gstatic.com/s/abhayalibre/v3/wBjdF6T34NCo7wQYXgzrc0D2ttfZwueP-QU272T9-k4.ttf",
"800": "http://fonts.gstatic.com/s/abhayalibre/v3/wBjdF6T34NCo7wQYXgzrc_qsay_1ZmRGmC8pVRdIfAg.ttf"
I've tried using the Regular Expressions with:
"http://fonts(*).ttf"
but I can't see the replacement working.
Your mistake is (*), use instead:
http://fonts.+\.ttf
Regular Expression Search and Replace is actually quite well documented.
At the moment, unless Visual Studio fails to parse the expression because of the incorrect use of *, you're matching strings that look like this:
http://fonts).ttf
http://fonts().ttf
http://fonts(().ttf
http://fonts(((().ttf
http://fonts((((((((((((((((((((((((((((((().ttf
etc.
To match any character you could use .*, . being the universal match in Regex, but that will match beyond the closing quotes.
Instead, you can use [^"]+ to match one or more characters except ".
http://fonts\.[^"]+
Also, note the \. to make sure the regex actually matches the . character, the \ escapes it from being the universal match character.
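The difference between the greedy and the quote-bounded pattern is easy to check outside Visual Studio. The sketch below uses Python's re module purely as a stand-in (an assumption; Visual Studio uses .NET regular expressions, which behave the same way for these particular constructs), and the two-URL sample line is made up for the demonstration:

```python
import re

# One line from the question, plus a made-up line holding two URLs.
single = '"500": "http://fonts.gstatic.com/s/abhayalibre/v3/wBjdF6T34NCo7wQYXgzrc5MQuUSAwdHsY8ov_6tk1oA.ttf",'
double = '"http://fonts.example/a.ttf", "http://fonts.example/b.ttf"'

greedy = re.compile(r'http://fonts.+\.ttf')    # .+ runs to the LAST ".ttf" on the line
bounded = re.compile(r'http://fonts\.[^"]+')   # stops at the closing quote

single_cleaned = bounded.sub("", single)
double_greedy = greedy.sub("", double)
double_bounded = bounded.sub("", double)

print(single_cleaned)
print(double_greedy)
print(double_bounded)
```

With one URL per line both patterns give the same result; the bounded version only matters when a line contains more than one quoted URL.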

vim non-greedy unexpected behavior

I'm using vim (version 7.3).
On the following line
1xAxBx4
where A and B can be any alphanumerical character, I want to replace xBx4 with foo. I tried the following substitution command
:s/x.\{-}x4/foo/
and get 1foo instead of what I expected (1xAfoo). I can get 1xAfoo if I use this substitution command
:s/x[^A]x4/foo/
but this is too specific and won't be helpful if I want to replace on multiple lines, as "A" could be a different character on each line.
Why the unexpected behavior with .\{-}? Or is this exactly what one would expect, and I'm just misunderstanding the syntax?
Though you've correctly used the non-greedy \{-} quantifier, nothing is consumed before it, so the match still starts at the first x and then matches as few characters as possible. Because that succeeds, there is no backtracking.
Now, you need to add a greedy match before your expression, yet do not consume those characters. This can be achieved with \zs to let the match only start afterwards:
:s/.*\zsx.\{-}x4/foo/
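The same leftmost-match behaviour shows up in other regex engines. As a hedged illustration, the sketch below uses Python's re module, where .*? plays the role of Vim's .\{-}; Python has no \zs, so a capture group that is put back in the replacement stands in for it (an approximation, not Vim syntax):

```python
import re

line = "1xAxBx4"

# Non-greedy alone: the engine still starts at the FIRST x it can match from.
leftmost = re.sub(r"x.*?x4", "foo", line, count=1)

# A greedy prefix pushes the start of the interesting part as far right
# as possible; the captured prefix is restored in the replacement.
anchored = re.sub(r"(.*)x.*?x4", r"\1foo", line, count=1)

print(leftmost, anchored)
```

Non-greedy only limits how far a match extends to the right; it never moves the starting position.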
This is not the use case for "non-greedy".
x.\{-}x4 makes sense when, for example, you want to replace:
xAAAx4BBBx4CCCx4 -> ######BBBx4CCCx4
Without \{-} the result would be ######.
If it is known that there is only a single character between x and x4, just use x.x4 or, if you want to avoid matching whitespace, use x\Sx4.

Why do I have to escape the final ]

I have a file containing strings like this one:
print $hash_xml->{'div'}{'div'}{'div'}[1]...
I want to replace {'div'}{'div'}{'div'}[1] by something else.
So I tried
%s/{'div'}{'div'}{'div'}[1]/by something else/gc
The strings were not found. I thought I had to escape the {, }, [ and ].
Still, the string was not found.
So I tried to search a single { and it found them.
Then I tried to search {'div'}{'div'}{'div'} and it found it again.
Then {'div'}{'div'}{'div'}[1 was still found.
To find {'div'}{'div'}{'div'}[1]
I had to use %s/{'div'}{'div'}{'div'}[1\]
Why ?
vim 7.3 on Linux
The [] are used in regular expressions to wrap a range of acceptable characters.
When both are supplied unescaped, vim is treating the search string as a regex.
So when you leave it out, or escape the final character, vim cannot interpret a single bracket in a regex context, so does a literal search (basically the best it can do given the search string).
Personally, I would escape the opening and closing square brace to ensure that the meaning is clear.
That's because the [ and ] characters are used to build the search pattern.
See :h pattern and use the help file pattern.txt to try the following experiment:
Searching for the "[0-9]" pattern (without quotes) using /[0-9] will match every digit from 0 to 9 individually (see :h \[)
Now, if you try /\[0-9] or /[0-9\] you will match the whole pattern: a zero, a hyphen and a nine inside square brackets. That's because when you escape one of [ or ] the [] operator ceases to exist.
Using your search pattern, /{'div'}{'div'}{'div'}[1\] and /{'div'}{'div'}{'div'}\[1] should match the same pattern which is the one you want, while /{'div'}{'div'}{'div'}[1] matches the string {'div'}{'div'}{'div'}1.
In order to avoid being caught by these special characters in regular expressions, you can try using the very nomagic flag.
E.g.:
:%s/\V{'div'}[1]/replacement/
Notice the \V flag at the beginning of the pattern.
Because the square brackets mean that vim thinks you're looking for any of the characters inside. This is known as a 'character class'. Escaping either of the square brackets lets vim know that you're looking for the literal string ending with '[1]'.
Ideally you should write your expression as:
%s/{'div'}{'div'}{'div'}\[1\]/replacement string/
to ensure that the meaning is completely clear.
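The character-class effect is not specific to vim. The following sketch uses Python's re module as a stand-in (an assumption for illustration; vim's handling of a lone bracket differs in detail), with re.escape playing a role loosely comparable to vim's \V:

```python
import re

text = "print $hash_xml->{'div'}{'div'}{'div'}[1]..."
target = "{'div'}{'div'}{'div'}[1]"

# Unescaped [1] is a character class: it wants the single character "1"
# right after the closing brace, which the text does not contain.
as_class = re.search(r"\{'div'\}[1]", text)

# Escaping both brackets makes them literal characters.
as_literal = re.search(r"\{'div'\}\[1\]", text)

# re.escape() quotes every metacharacter in one go.
escaped_all = re.search(re.escape(target), text)

print(as_class, as_literal.group(), escaped_all.group())
```

Escaping both brackets, or escaping the whole string mechanically, avoids having to remember which half of the pair the engine tolerates unescaped.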

Why doesn't Vim's errorformat take regular expressions?

Vim's errorformat (for parsing compile/build errors) uses an arcane format from C for parsing errors.
Trying to set up an errorformat for nant seems almost impossible; I've tried for many hours and can't get it. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier, but it does try to make more complex patterns easy to match: multi-line error messages, file names, directory changes, and so on. One of the examples in the help for errorformat is useful (the leading line numbers are not part of the actual output):
1 Error 275
2 line 42
3 column 3
4 ' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: How would you map numbers to mean line numbers, or strings to mean files or error messages if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but then this syntax looks a lot like a short hand for that anyway. You can actually use regexp wildcards such as .* - in this language it is written %.%#.
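For comparison, here is roughly what the named-capture-group alternative could look like for the four-line error above. This is only a sketch in Python; the group names nr, lnum, col and text are chosen for the example and are not anything Vim defines:

```python
import re

# Each named group corresponds to one errorformat item:
#   nr ~ %n, lnum ~ %l, col ~ %c, text ~ %m
ERROR_BLOCK = re.compile(
    r"^Error (?P<nr>\d+)\n"
    r"line (?P<lnum>\d+)\n"
    r"column (?P<col>\d+)\n"
    r"(?P<text>.*)$",
    re.MULTILINE,
)

output = "Error 275\nline 42\ncolumn 3\n' ' expected after '--'\n"
match = ERROR_BLOCK.search(output)
fields = match.groupdict() if match else {}

print(fields)
```

Side by side, %n, %l, %c and %m do read like shorthand for these named groups, with %E, %C and %Z taking care of the line structure.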
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever, and outputs messages in a simple format for which an errorformat is easy to write. Why learn some new, baroque, single-purpose language unless you have to?
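A minimal sketch of such a filter, in Python, might look like the following. The input shape, an optional [task] prefix followed by file(line,col): message, is an assumption about what the build tool prints and would need adjusting to the real output; the function name is likewise invented:

```python
import re

# Assumed input shape (hypothetical): "[csc] src/Foo.cs(12,34): error CS1002: ; expected"
DIAGNOSTIC = re.compile(
    r"^\s*(?:\[\w+\]\s*)?"                    # optional "[task]" prefix
    r"(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\):\s*"
    r"(?P<msg>.*)$"
)


def normalize(lines):
    """Rewrite recognised diagnostics as 'file:line:col: message'; drop the rest."""
    result = []
    for raw in lines:
        m = DIAGNOSTIC.match(raw.rstrip("\n"))
        if m:
            result.append("{file}:{line}:{col}: {msg}".format(**m.groupdict()))
    return result


sample = [
    "Buildfile: default.build\n",
    "      [csc] src/Foo.cs(12,34): error CS1002: ; expected\n",
]
normalized = normalize(sample)
print("\n".join(normalized))
```

Output in this shape can be read with a short setting such as :set errorformat=%f:%l:%c:%m, so the awkward parsing lives in the filter rather than in errorformat.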
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
    %\    The single '\' character. Note that this has to be
          escaped ("%\\") in ":set errorformat=" definitions.
    %.    The single '.' character.
    %#    The single '*'(!) character.
    %^    The single '^' character. Note that this is not
          useful, the pattern already matches start of line.
    %$    The single '$' character. Note that this is not
          useful, the pattern already matches end of line.
    %[    The single '[' character for a [] character range.
    %~    The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.
