Is this a bug in Vim's string expression evaluation? - vim

I was debugging a problem with a plugin running in Vim73 on Arch Linux and it seems to stem from an error in string expression evaluation.
In this Vim installation the expression 'xxx' > '' evaluates to 0 (false) while in all other Vims I've seen the expression evaluates (as it should) to 1 (true).
Does anyone know the explanation for this? The Arch Linux Vim was not compiled with lots of features built in, but could there really be some feature that changes the evaluation of string expressions?
Is there some Vim setting (encoding?) that might have changed the result of this string comparison? The bad result came from a plain-jane install of Vim (nothing of note in vimrc), and I didn't see anywhere a setting could have been changed, even if some setting does affect this result.
Thanks for any info.
UPDATE:
It turns out this problem was caused by a bug in the string comparison function in recent versions of 64-bit Vim when the 'ignorecase' option is set. A non-empty string should be greater than an empty string regardless of whether case is ignored, but Vim was returning false. The bug report is here:
http://groups.google.com/group/vim_dev/browse_thread/thread/313bc7c46a19cd40
Workarounds would be: (1) use a comparison operator that forces a 'match case' comparison, e.g., mystring_var ># '', or (2) use !empty(mystring_var).
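A minimal sketch of the two workarounds in Vim script, assuming a variable named mystring_var as in the text above (the assigned value is only for illustration):

```vim
" Hypothetical value, for illustration only.
let mystring_var = 'xxx'

" Workaround 1: ># forces a case-sensitive comparison,
" regardless of the 'ignorecase' option.
if mystring_var ># ''
  echo 'non-empty (checked with >#)'
endif

" Workaround 2: empty() avoids string ordering altogether.
if !empty(mystring_var)
  echo 'non-empty (checked with empty())'
endif
```

The second form is probably the easier one to read when the only question is whether the string has any content.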

To know the answer for this question you have to take a look at the documentation. Here is a quote of the *41.4* Conditionals section:
The logic operators work both for numbers and strings. When comparing two
strings, the mathematical difference is used. This compares byte values,
which may not be right for some languages.
When comparing a string with a number, the string is first converted to a
number. This is a bit tricky, because when a string doesn't look like a
number, the number zero is used. Example:
:if 0 == "one"
: echo "yes"
:endif
This will echo "yes", because "one" doesn't look like a number, thus it is
converted to the number zero.
Apparently, Vim does not guarantee the result of the operation you are trying to perform, and you shouldn't rely on it. If you want to compare the lengths of the strings, take a look at *strlen()*.

Related

Regex for scientific number and hex and decimal numbers for vim produces error while works for Perl?

I was trying to learn how to write Vim plugins and needed to match the numbers that a language allows, highlighting them in different colors. I wrote the following regexes in very magic mode:
syntax match cNumberGroup "\v\d+"
syntax match cNumberGroup "\v0x\x+"
syntax match cNumberGroup "\v[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?"
The first one is for decimal numbers, the second for hex numbers, and the third for scientific notation.
The first two work fine, but the last one does not. I want to know why it is not working, and whether there is a better way to write the regexes so that all three number formats are covered efficiently.
Thanks.
Just slapping \v (very magic) in front of the regular expression doesn't make Vim's regular expression syntax Perl-compatible. As @Carpetsmoker has already commented, :help perl-patterns shows the differences.
For your example, each (?:...) has to be written as %(...) under \v:
syntax match cNumberGroup "\v[+\-]?%(0|[1-9]\d*)%(\.\d*)?%([eE][+\-]?\d+)?"
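One way to check the corrected pattern outside of a syntax file is to try it with the =~# operator. The sketch below assumes the pattern should match a whole string, so ^ and $ anchors are added; the variable name num_pat and the sample strings are made up for illustration.

```vim
" Anchored copy of the corrected pattern (the anchors are an addition
" for testing, they are not part of the syntax rule above).
let num_pat = '\v^[+\-]?%(0|[1-9]\d*)%(\.\d*)?%([eE][+\-]?\d+)?$'

" Report for each sample whether the whole string matches.
" '42' and '-1.5e-3' match; '0x1F' and 'abc' do not.
for sample in ['42', '-1.5e-3', '0x1F', 'abc']
  echo printf('%-8s %s', sample, sample =~# num_pat ? 'matches' : 'does not match')
endfor
```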

Character Encoding interferes with matching Scala strings?

Right now dealing with a weird problem when trying to match two Scala strings. When trying to determine if the following two strings are the same:
SM8lz5IEIWs7TUhR3ke27pnY3XsjojxqaMEg+ARCGs1nm3sVkwA+CM+XJfdsUxqzqH7LZdkflvny
z621tYkmXA== and SM8lz5IEIWs7TUhR3ke27pnY3XsjojxqaMEg+ARCGs1nm3sVkwA+CM+XJfdsUxqzqH7LZdkflvny
z621tYkmXA==
Scala returns false. So if I do the following if(hash1 == hash2) it returns false.
I suspect this is either a whitespace or character encoding issue, since hash matching only fails when trying to match a hash that was produced on a computer of a different operating system. I already tried stripping whitespace using regex, but it still failed.
What have I overlooked? And are there better ways to clean and match hashes in Scala?
Update
After comparing the two strings, Scala thinks hash2 is a single character longer than hash1. So I ran the following functions on both hashes: .trim.replaceAll("""(?m)\s+$""", ""). Still, it says they're not the same. What other characters could be interfering?
I have found the cause of this particular problem. Apparently when processing strings on Macintosh, \r is added in addition to any line breaks. Even though line break characters don't print out on a console, they're still inside the string.
The remedy was to do the following: .trim.replaceAll("\r", "")
And now both strings match.

Using Vim, how do you use a variable to store count of patterns found?

This question was helpful for getting a count of a certain pattern in Vim, but it would be useful to me to store the count and sum the results so I can echo a concise summary.
I'm teaching a class on basic HTML to some high schoolers, and I'm using this script to quickly check the number of required elements throughout all their pages without leaving Vim. It works fine, but when students have more than 10 .html files it gets cumbersome to add up the various sections by hand.
Something like:
img_sum = :bufdo %s/<img>//gen
would be nice. I think I'll write a Ruby script to check the pages more thoroughly and check for structure, but for now I'm curious about how to do this in Vim.
The problem can be solved by a counter separate from the one built into the
:substitute command: use a Vim script variable to hold the number of pattern
matches. A convenient way to register every match and modify a particular
variable accordingly, is to take advantage of the substitute with an
expression feature of the :substitute command (see :help sub-replace-\=).
The idea is to use a substitution that evaluates an expression increasing
a counter on every occurrence, and does not change the text it is operating
on.
The first part of the technique cannot be implemented straightforwardly
because it is forbidden to use Ex commands in expressions (including \=
substitute expressions), and therefore it is not possible to use the :let
command to modify a variable. Answering the question "gVim find/replace
with counter", I have proposed a simple trick to overcome that limitation,
which is based on using a single-item list (or dictionary containing a single
key-value pair). Since the map() function transforms a list or a dictionary
in place, that only item could be changed in a constrained expression context.
To do that, one should call the map() function passing an expression
evaluating to the new value along with the list containing the current value.
The second half of the technique is how to avoid changing text when using
a substitution command. In order to achieve that, one can make the pattern
have zero-width by prepending \ze or by appending \zs atoms to it (see
:help /\zs, :help /\ze). In such a way, the modified pattern captures
a string of zero width just before or after the occurrence of the initial
pattern. So, if the replacement text is also empty, substitution does not
cause any change in the contents of a buffer. To make the substitute
expression evaluate to an empty string, one can just extract an empty
substring or sublist from the resulting value of that expression.
The two ideas are put into action in the following command.
:let n=[0] | bufdo %s/pattern\zs/\=map(n,'v:val+1')[1:]/ge
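Put together as a small script, with the total echoed at the end. This is only a sketch: <img is a stand-in pattern chosen for this example, and the variable name n follows the command above. If Vim refuses to leave a buffer during :bufdo, setting 'hidden' beforehand is one possible way around it.

```vim
" n is a one-item list so that map() can update it from inside
" a \= substitute expression, where :let is not allowed.
let n = [0]

" \zs makes the match zero-width, and [1:] of the one-item list is empty,
" so the buffer text stays the same while the counter goes up.
bufdo %s/<img\zs/\=map(n, 'v:val+1')[1:]/ge

" The running total is the single item of the list.
echo 'total matches:' n[0]
```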
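I think the answer above is hard to understand. A simpler way is to call the external grep command, like this (grep -c counts matching lines, and shellescape() protects file names containing spaces or quotes):
:let found = 0
:bufdo let found += system('grep -c "<p>" ' . shellescape(expand('%:p')))
:echo found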

Replacing the equality operator (==) with the identity operator (===) in Vim

I want to change all occurrences of == to ===, but issuing the command :%s/==/===/g would convert existing identity operators from === to ====. I tried using the command :%s/\<==\>/===/g, but no pattern matches are found.
The informal definition of the pattern given in the question could be
read as “two equal signs in a row neither preceded with nor followed
by an equal sign”.
A natural way to transform this verbal description into a concise Vim
regular expression is to use the \@<! and \@! zero-width assertions
(see :help /multi for an overview). The former makes it possible to leave out
the occurrences preceded by a certain pattern (see :help /\@<!).
The latter makes it possible to ignore the occurrences followed by an
ineligible pattern (see :help /\@!).
:%s/=\@<!===\@!/&=/g
As far as I can guess from the history of your questions and answers
on StackOverflow, the substitution is probably to be performed in
JavaScript or PHP source code. Since both languages have the !==
inequality operator as well as the === equality one, == in the
former is also the subject of replacement as you describe it in the
question. If this behavior is undesirable, modify the substitution
command above, as follows:
:%s/[!=]\@<!===\@!/&=/g
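The pattern can be tried on a sample string with substitute() before running it on a whole buffer. A small sketch, where the sample text and the variable name pat are made up, and the lookaround atoms are written as \@<! and \@!:

```vim
" Two equal signs, not preceded by ! or =, and not followed by =.
let pat = '[!=]\@<!===\@!'

" Only the plain == should gain a third equal sign.
echo substitute('a == b; c === d; e !== f', pat, '&=', 'g')
" echoes: a === b; c === d; e !== f
```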
kind of ugly, but :%s/\([^=]\)==\([^=]\)/\1===\2/g
EDIT
Turned out that ib. has far better solution so look at his answer. That takes care of the cases of !== etc. So ignore this one and look at his.
Consider adding the c flag to the substitute command: :%s/==/===/gc
This will allow you to choose every time with y or n. Of course, if there are a lot of matches then this might not be the best plan.

Why doesn't Vim's errorformat take regular expressions?

Vim's errorformat (for parsing compile/build errors) uses an arcane format from C for parsing errors.
Trying to set up an errorformat for nant seems almost impossible; I've tried for many hours and can't get it. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier but it does try to match more complex patterns easily. Things like multiline error messages, file names, changing directory, etc, etc. One of the examples in the help for errorformat is useful:
Error 275
line 42
column 3
' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
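To experiment with such a format without running a compiler, the sample lines can be fed straight into the quickfix list. The following is a rough sketch, assuming the option is set with :let (so the spaces need no backslash) and using :cgetexpr with a list of strings; the variable names save_efm and sample_lines are made up for this example.

```vim
" Save the current value so the experiment can be undone afterwards.
let save_efm = &errorformat
let &errorformat = '%EError %n,%Cline %l,%Ccolumn %c,%Z%m'

" The four sample lines of the multi-line error message.
let sample_lines = ['Error 275', 'line 42', 'column 3', "' ' expected after '--'"]

" Parse the lines with 'errorformat' without jumping to the first error.
cgetexpr sample_lines

" Show line, column, error number and message text of every entry.
for item in getqflist()
  echo item.lnum item.col item.nr item.text
endfor

let &errorformat = save_efm
```

Going by the explanation above, the four lines should collapse into one entry carrying line 42, column 3, and error number 275.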
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: How would you map numbers to mean line numbers, or strings to mean files or error messages if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but then this syntax looks a lot like a short hand for that anyway. You can actually use regexp wildcards such as .* - in this language it is written %.%#.
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever, and outputs messages in a simple format that is easy to write an errorformat for. Why learn some new, baroque, single-purpose language unless you have to?
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
%\    The single '\' character. Note that this has to be
      escaped ("%\\") in ":set errorformat=" definitions.
%.    The single '.' character.
%#    The single '*'(!) character.
%^    The single '^' character. Note that this is not
      useful, the pattern already matches start of line.
%$    The single '$' character. Note that this is not
      useful, the pattern already matches end of line.
%[    The single '[' character for a [] character range.
%~    The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.
