vim non-greedy unexpected behavior


I'm using vim (version 7.3).
On the following line
1xAxBx4
where A and B can be any alphanumeric character, I want to replace xBx4 with foo. I tried the following substitution command
:s/x.\{-}x4/foo/
and get 1foo instead of what I expected (1xAfoo). I can get 1xAfoo if I use this substitution command
:s/x[^A]x4/foo/
but this is too specific and won't be helpful if I want to replace on multiple lines, as "A" could be a different character on each line.
Why the unexpected behavior with .\{-}? Or is this exactly what one would expect, and I'm just misunderstanding the syntax?

Although you've used the non-greedy \{-} quantifier correctly, the regex engine still looks for the leftmost match. Nothing in front of the first x consumes any characters, so matching starts at the first x, and \{-} then makes that match as short as possible. Because a match starting at the first x succeeds, there is no backtracking to a later starting point.
You therefore need to put a greedy match in front of your expression, but without making those characters part of the match. \zs does exactly that: it sets the start of the match at that point.
:s/.*\zsx.\{-}x4/foo/
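The same leftmost-match behavior can be reproduced with Python's re module, used here only as an illustration. Python spells Vim's .\{-} as .*? and has no \zs, so in this sketch the greedy prefix is captured in a group and written back with \1; count=1 mimics :s without the g flag.

```python
import re

line = "1xAxBx4"

# Leftmost match wins: the search starts at the first "x", and the lazy .*?
# only makes the match as short as possible *from that starting point*.
lazy_only = re.sub(r"x.*?x4", "foo", line, count=1)

# No \zs in Python: keep a greedy prefix in group 1 and put it back with \1.
# The greedy (.*) pushes the start of "x.*?x4" as far right as it can,
# which is what .*\zs achieves in Vim.
with_prefix = re.sub(r"(.*)x.*?x4", r"\1foo", line, count=1)

print(lazy_only)    # 1foo
print(with_prefix)  # 1xAfoo
```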

This is not the use case for non-greedy matching.
x.\{-}x4 makes sense when, for example, you want to do this replacement:
xAAAx4BBBx4CCCx4 -> ######BBBx4CCCx4
Without \{-} (that is, with x.*x4), the result would be ######.
If you know there is exactly one character between x and x4, just use x.x4, or x\Sx4 if you want to avoid matching a space.
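A short Python sketch (Python's re module, not Vim) contrasting the lazy and greedy forms on the string above, plus the single-character pattern on the original line. count=1 stands in for :s without the g flag, which replaces only the first match on the line.

```python
import re

s = "xAAAx4BBBx4CCCx4"

lazy = re.sub(r"x.*?x4", "######", s, count=1)   # stops at the first "x4"
greedy = re.sub(r"x.*x4", "######", s, count=1)  # runs to the last "x4"

# Exactly one non-whitespace character between "x" and "x4".
single = re.sub(r"x\Sx4", "foo", "1xAxBx4", count=1)

print(lazy)    # ######BBBx4CCCx4
print(greedy)  # ######
print(single)  # 1xAfoo
```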

Related

Find and replace only part of a single line in Vim

Most substitution commands in vim perform an action on a full line or a set of lines, but I would like to only do this on part of a line (either from the cursor to end of the line or between set marks).
Example:
this_is_a_sentence_that_has_underscores = this_is_a_sentence_that_should_not_have_underscores
into
this_is_a_sentence_that_has_underscores = this is a sentence that should not have underscores
This task is very easy to do for the whole line :s/_/ /g, but seems to be much more difficult to only perform the replacement for anything after the =.
Can :substitute act on only part of a line?

I can think of two solutions.
Option one, use the before/after column match atoms \%>123c and \%<456c.
In your example, the following command substitutes underscores only in the second word, between columns 42 and 94:
:s/\%>42c_\%<94c/ /g
Option two, use the Visual area match atom \%V.
In your example, Visual-select the second long word, leave Visual mode, then execute the following substitution:
:s/\%V_/ /g
These regular expression atoms are documented at :h /\%c and :h /\%V respectively.
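For comparison, here is a plain Python sketch of the column-range idea that works on string indices instead of Vim's atoms. It assumes Vim's 1-based byte columns map onto 0-based character indices, which holds here because the line is pure ASCII; the slice [42:93] covers exactly the second word.

```python
line = ("this_is_a_sentence_that_has_underscores = "
        "this_is_a_sentence_that_should_not_have_underscores")

# Columns 43..93 (1-based) are indices 42..92 (0-based): the second word.
start, end = 42, 93
result = line[:start] + line[start:end].replace("_", " ") + line[end:]

print(result)
# this_is_a_sentence_that_has_underscores = this is a sentence that should not have underscores
```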

Look-around
There is a big clue in your post already:
only perform the replacement for anything after the =.
This often means using a positive look-behind, \@<=.
:%s/\(=.*\)\@<=_/ /g
This matches every _ that is preceded by the pattern =.*. Since all look-arounds (look-aheads and look-behinds) are zero-width, they do not take up space in the match, and the replacement is simple.
Note: This is equivalent to (?<=...) in Perl. See :h perl-patterns.
What about \zs?
\zs sets the start of a match at a certain point. On the face of it, this sounds like exactly what is needed. However, \zs does not work correctly here: the part of the pattern before \zs must still be matched first, and then the part after it. With the g flag, each new search starts after the previous match, where the = can no longer be found, so there will only be one match. Look-behinds, on the other hand, match the part after \@<= first and then "look behind" to make sure the match is valid, which makes them great for scenarios with multiple replacements.
Note that when you can use \zs, it is not only easier to type but also more efficient.
Note: \zs is like \K in Perl.
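Python's re module makes the contrast between the two approaches easy to see, with one caveat: it only supports fixed-width look-behind, so the Vim pattern \(=.*\)\@<=_ cannot be written there directly. The sketch below uses a made-up sample line and shows the \zs-style single replacement, the rejected variable-width look-behind, and a partition-based workaround (a different technique from look-behind, named plainly as such).

```python
import re

line = "a_b = c_d_e"

# Analogue of =.*\zs_ : the prefix "=.*" is consumed by the match, so even
# though re.sub replaces all matches, only one underscore (the last) changes.
zs_like = re.sub(r"(=.*)_", r"\1 ", line)

# Python's re only allows fixed-width look-behind, so this is rejected.
try:
    re.compile(r"(?<==.*)_")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# Workaround (not a look-behind): replace only in the text after the first "=".
head, sep, tail = line.partition("=")
lookbehind_like = head + sep + tail.replace("_", " ")

print(zs_like)                 # a_b = c_d e
print(variable_lookbehind_ok)  # False
print(lookbehind_like)         # a_b = c d e
```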
More ways?!?
As @glts mentioned, you can use other zero-width atoms to basically "anchor" your pattern. A list of a few common ones:
\%>a - after the 'a mark
\%V - match inside the visual area
\%>42c - match after column 42
The possible downside of these methods is that they require you to set marks or count columns. There is nothing wrong with that, but it means the substitution may be affected by side effects, so repeating the substitution may not work correctly.
For more help see:
:h /\@<=
:h /zero-width
:h perl-patterns
:h /\zs

What do \@<= and \@= mean in a Vim command?

I can't understand \@<= and \@= in Benoit's answer to this post. Can anyone help explain them?

From the Vim documentation for patterns:
\@= Matches the preceding atom with zero width. {not in Vi}
Like "(?=pattern)" in Perl.
Example                 matches
foo\(bar\)\@=           "foo" in "foobar"
foo\(bar\)\@=foo        nothing
*/zero-width*
When using "\@=" (or "^", "$", "\<", "\>") no characters are included
in the match. These items are only used to check if a match can be
made. This can be tricky, because a match with following items will
be done in the same position. The last example above will not match
"foobarfoo", because it tries match "foo" in the same position where
"bar" matched.
Note that using "\&" works the same as using "\@=": "foo\&.." is the
same as "\(foo\)\@=..". But using "\&" is easier, you don't need the
braces.
\@<= Matches with zero width if the preceding atom matches just before what
follows. |/zero-width| {not in Vi}
Like "(?<=pattern)" in Perl, but Vim allows non-fixed-width patterns.
Example                 matches
\(an\_s\+\)\@<=file     "file" after "an" and white space or an end-of-line
For speed it's often much better to avoid this multi. Try using "\zs"
instead |/\zs|. To match the same as the above example:
an\_s\+\zsfile
"\@<=" and "\@<!" check for matches just before what follows.
Theoretically these matches could start anywhere before this position.
But to limit the time needed, only the line where what follows matches
is searched, and one line before that (if there is one). This should
be sufficient to match most things and not be too slow.
The part of the pattern after "\@<=" and "\@<!" are checked for a
match first, thus things like "\1" don't work to reference \(\) inside
the preceding atom. It does work the other way around:
Example                 matches
\1\@<=,\([a-z]\+\)      ",abc" in "abc,abc"
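The same zero-width semantics exist in Python's re module, which this sketch uses purely for illustration: (?=...) corresponds to \@=, (?<=...) to \@<=, and a leading (?=foo) plays the role of foo\&. Python requires fixed-width look-behind, so a single literal space stands in for Vim's \_s\+ here.

```python
import re

# \@= ~ (?=...): "bar" must follow, but it is not part of the match.
m1 = re.search(r"foo(?=bar)", "foobar")
# The check is zero-width, so the second "foo" would have to match where
# "bar" is: no match at all.
m2 = re.search(r"foo(?=bar)foo", "foobarfoo")
# foo\&.. ~ (?=foo)..: check for "foo", then match any two characters.
m3 = re.search(r"(?=foo)..", "foobar")
# \@<= ~ (?<=...): only fixed-width look-behind is allowed in Python.
m4 = re.search(r"(?<=an )file", "open an file")

print(m1.group())  # foo
print(m2)          # None
print(m3.group())  # fo
print(m4.group())  # file
```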

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some of the logic behind it? The theoretical regexes aren't like the ones used in UNIX (or vi, are they different?), and I'm always struggling to figure them out.
This has to be done in the vi editor as this is main work tool.
Thanks!

To transform get_num(...) into get_num_struct(...), you need to capture the right text from the input. You also can't put the parentheses in the regular expression, because you may need to match function pointers too, as in &get_distance, as well as uses in comments. This depends partly on the fact that you are using Vim and partly on how you need to keep the entire input together, but I have checked that this works:
:%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, digit, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that the meaning of & in the replacement depends on the 'magic' option, which is set by default in Vim; with 'nomagic', you would write \& instead.
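In Python's re module (used here only for comparison), the whole match in a replacement is written \g<0> rather than &. The sample line of code below is made up; note that &get_distance is converted too, because the pattern does not require a parenthesis.

```python
import re

code = "get_num(a); get_str(b); fp = &get_distance;"

# \g<0> is Python's spelling of Vim's & (the entire match).
converted = re.sub(r"get_\w+", r"\g<0>_struct", code)

print(converted)
# get_num_struct(a); get_str_struct(b); fp = &get_distance_struct;
```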

For an alternate way to do it:
:%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells Vim to remember whatever the enclosed pattern matches. So \(\w*\) means "match any number of word characters, and remember what you matched." You can then access it in the replacement with \1 (or \2 for the second group, etc.).
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)
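Here is the same capture-group approach in Python's re module, as an illustration on a made-up line. Unlike Vim's default magic mode, where a bare ( is literal and \( opens a group, Python needs \( for a literal parenthesis and ( for a group. Because this pattern requires the opening parenthesis, &get_distance is left alone.

```python
import re

code = "get_num(a); get_str(b); fp = &get_distance;"

# Group 1 captures the word characters between "get_" and "(".
converted = re.sub(r"get_(\w*)\(", r"get_\1_struct(", code)

print(converted)
# get_num_struct(a); get_str_struct(b); fp = &get_distance;
```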

How to repeat a substitution the number of times the search word occurs in a row in a substitution command in Vim?

I would like to use tabs in code that doesn't use them. What I have done so far to introduce tabs was pretty manual:
:%s/^  /\t/g
:%s/^\t  /\t\t/g
. . .
Question: Is there a way to replace two spaces (  ) with a tab (\t) as many times as the pair occurs at the beginning of a line?

There are (at least) three substitution techniques relevant to this case.
1. The first one takes advantage of the preceding-atom matching
syntax to naturally define a step of indentation. According to the
question statement, an indent step is a pair of adjacent space
characters preceded with nothing but spaces from the beginning
of line. Following this definition, one can construct the actual
substitution pattern, right to left:
:%s/\%(^ *\)\@<=  /\t/g
Indeed, the pattern designates an occurrence of two literal space
characters, but only when they are preceded by a zero-width match
of the atom just before \@<=, which is the pattern ^ * wrapped in
the grouping parentheses \%( and \). These non-capturing parentheses are
used instead of the usual capturing ones, \( and \), since there is no
need to refer to the matched string of leading spaces later. Due
to the g flag, the above :substitute command runs through the
leading spaces pair by pair and replaces each pair with a single tab
character.
2. The second technique takes a different approach. Instead of
matching separate indent levels, one can break each of the lines
starting with space characters down into two lines: one containing
the indenting spaces of the original line, and another holding the
rest of it. After that, it is straightforward to replace all of the pairs
of spaces on the first line, and concatenate the lines back together:
:g/^  /s/^ \+/&\r/|-s/  /\t/g|j!
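The split-and-rejoin idea translates directly into a plain Python function; spaces_to_tabs_split is a hypothetical helper name used only for this sketch. Like -s/  /\t/g on the indent line, it replaces non-overlapping pairs of spaces in the indent, so an odd trailing space survives.

```python
def spaces_to_tabs_split(text: str) -> str:
    """Replace pairs of leading spaces with tabs, line by line (hypothetical helper)."""
    out = []
    for line in text.split("\n"):
        rest = line.lstrip(" ")                  # text after the indent
        indent = line[: len(line) - len(rest)]   # the leading spaces only
        out.append(indent.replace("  ", "\t") + rest)
    return "\n".join(out)

print(repr(spaces_to_tabs_split("    a\n  b\nc\n     d")))
# '\t\ta\n\tb\nc\n\t\t d'
```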
3. The third idea is to process leading spaces by means of Vim
scripting language. A convenient way of doing that is to use the
substitute with an expression feature of the :substitute command
(see :help sub-replace-\=). When the substitute string starts with \=,
the command replaces each match of the pattern with the result of
evaluating the expression that follows \=:
:%s#^ \+#\=repeat("\t",len(submatch(0))/2)
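A Python counterpart of the \= expression approach passes a function as the replacement to re.sub; spaces_to_tabs_expr is a hypothetical name for this sketch. Integer division mirrors Vim's len(submatch(0))/2, so with this approach an odd leftover space is dropped rather than kept.

```python
import re

def spaces_to_tabs_expr(text: str) -> str:
    # The whole run of leading spaces on each line is replaced by len // 2 tabs.
    return re.sub(r"^ +", lambda m: "\t" * (len(m.group(0)) // 2), text,
                  flags=re.MULTILINE)

print(repr(spaces_to_tabs_expr("    a\n  b\nc\n     d")))
# '\t\ta\n\tb\nc\n\t\td'
```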

If you specifically want to convert spaces into tabs (or vice-versa) at the start of a line, there's the useful :retab command which takes care of that. For example:
:retab! 2 will convert spaces in groups of two to tabs
:set expandtab and then :retab! 2 will convert tabs (of width 2) back to spaces
See :h :retab (and :h 'ts') for the details.
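Python's str.expandtabs is a close analogue of the tabs-to-spaces direction. This one-liner only illustrates the idea of a tab width of 2; it is not how :retab is implemented.

```python
# Each tab advances to the next multiple of 2 columns; columns reset at "\n".
converted = "\t\ta\n\tb".expandtabs(2)

print(repr(converted))  # '    a\n  b'
```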
This is not a general solution for the original problem, but I think it covers the most common use case.

There is no general way of doing this with :s regexes. You can't make the /g modifier look backwards (otherwise it would be unusable), and you can't reliably check that you're at the beginning of the line without looking backwards.
The only way of doing it generally is to loop, like so:
:for i in range(100)
:  %s/^\t*\zs  /\t/e
:endfor
This is ugly, slow, and highly discouraged. Use :retab instead.
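For comparison, here is the same brute-force loop in Python (spaces_to_tabs_loop is a hypothetical name). Each pass converts one pair of spaces right after the leading tabs on every line, and the loop stops when a pass changes nothing instead of after a fixed 100 iterations.

```python
import re

def spaces_to_tabs_loop(text: str) -> str:
    # Like ^\t*\zs  : two spaces that follow only tabs at the start of a line.
    pattern = re.compile(r"^(\t*)  ", re.MULTILINE)
    while True:
        text, n = pattern.subn(r"\1\t", text)
        if n == 0:  # nothing left to convert on any line
            return text

print(repr(spaces_to_tabs_loop("    a\n  b\nc\n     d")))
# '\t\ta\n\tb\nc\n\t\t d'
```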

Find first non-matching line in VIM

Sometimes I have to look into various log and trace files on Windows, and I generally use Vim for that.
My problem, though, is that I still can't find any analog of grep -v inside Vim: find a line in the buffer that does not match a given regular expression. For example, a log file is filled with lines that contain the phrase all is ok somewhere in the middle, and I need to find the first line that doesn't contain all is ok.
I could write a custom function for that, but at the moment that seems like overkill and is likely to be slower than a native solution.
Is there any easy way to do this in Vim?

I believe if you simply want to have your cursor end up at the first non-matching line you can use visual as the command in your global command. So:
:v/pattern/visual
will leave your cursor at the first non-matching line. Or:
:g/pattern/visual
will leave your cursor at the first matching line.

You can use the negative look-behind operator \@<! (written @<! in very magic mode).
For example, to find all lines not containing "a", use /\v^.+(^.*a.*$)@<!$
(\v just means that operators such as ( and @<! don't need to be escaped with a backslash.)
The simpler method is to delete all lines matching or not matching the pattern (:g/PATTERN/d or :g!/PATTERN/d, respectively).

I'm often in the same situation, so to "clean" the log files I use:
:g/all is ok/d
Your grep -v can be achieved with
:v/error/d
which removes all lines that do not contain error.
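To make the :g/:v semantics concrete, here is a Python sketch over a made-up list of log lines. The list comprehensions mirror :g/pattern/d and :v/pattern/d, and the last expression finds the first non-matching line, numbered from 1 like Vim lines.

```python
import re

log = [
    "12:00 all is ok",
    "12:01 all is ok",
    "12:02 disk error",
    "12:03 all is ok",
    "12:04 warning: low memory",
]

# :g/all is ok/d  -> delete the matching lines, keep the rest (like grep -v)
after_g = [l for l in log if not re.search(r"all is ok", l)]
# :v/error/d      -> delete the lines that do NOT match, keep the matches
after_v = [l for l in log if re.search(r"error", l)]
# First line that does not contain "all is ok" (1-based), or None
first_nonmatch = next(
    (n for n, l in enumerate(log, start=1) if not re.search(r"all is ok", l)),
    None,
)

print(after_g)         # ['12:02 disk error', '12:04 warning: low memory']
print(after_v)         # ['12:02 disk error']
print(first_nonmatch)  # 3
```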

It's probably already too late, but I think that this should be said somewhere.
Vim (since about version 7.4) comes with a plugin called LogiPat, which makes searching for lines that don't contain some string really easy. Using this plugin, finding the lines not containing all is ok is done like this:
:LogiPat !"all is ok"
And then you can jump between the matching (or in this case not matching) lines with n and N.
You can also use logical operations like & and | to join different strings in one pattern:
:LP !("foo"|"bar")&"baz"
LP is shorthand for LogiPat, and this command searches for lines that contain baz and contain neither foo nor bar.
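The boolean expression !("foo"|"bar")&"baz" can be read as an ordinary per-line test. This Python sketch on made-up lines only shows that logic; it is not how LogiPat builds its Vim pattern.

```python
lines = ["foo baz", "bar", "baz only", "qux"]

# !("foo"|"bar") & "baz": contains "baz" and neither "foo" nor "bar".
hits = [l for l in lines if "baz" in l and not ("foo" in l or "bar" in l)]

print(hits)  # ['baz only']
```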

I just managed a somewhat klutzy procedure using the "g" command:
:%g!/search/p
This says to print out the non-matching lines... not sure if that worked, but it did end up with the cursor positioned on the first non-matching line.
(substitute some other string for "search", of course)

You can search for the following pattern and press n to jump to the first non-matching line:
^\(.*all is ok\)\@!.*$
Breakdown of operators:
^ -> start of the line
\( and \) -> group the enclosed pattern into one atom, so that the following \@! applies to all of it
\@! -> matches with zero width if the preceding atom does NOT match at the current position
.* -> matches any character, repeated 0 or more times
$ -> end of the line
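Python's re module has the same negative look-ahead, spelled (?!...) instead of \(...\)\@!. A small sketch on made-up lines that returns the first line without all is ok:

```python
import re

lines = ["all is ok", "all is ok here too", "something broke", "all is ok"]

# At the start of the line, assert that "all is ok" does not occur anywhere
# later on the line, then match the whole line.
pattern = re.compile(r"^(?!.*all is ok).*$")
first = next((l for l in lines if pattern.match(l)), None)

print(first)  # something broke
```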

You can iterate through the non-matches using g and a null substitution:
:g!/pattern/s/^//c
If you reply "n" each time, you won't even mark the file as changed.
You need Ctrl-C to break out of the loop (or keep going to the bottom of the file).
