Searching for an exact match with a singular digit

Searching for an exact match with a singular digit - vim

I'm trying to search for only a singular digit in vim by itself. For example, if there are two sets of digits 1 and 123 and I want to search for 1, I would only want the singular 1 digit to be found.
I have tried using regular expressions like \<1> and \%(a)#

You almost had the right solution. You want:
\<1\>
This is because each angled bracket needs to be escaped. Alternatively, you could use:
\v<1>
The \v flag tells vim to treat more characters as special without needing to be escaped (for example, (){}+<> all become special rather than literal text. Read :h /\v for more on this.
A great reference for learning regex in vim is vimregex.com. The \<\> characters are explained in 4.1 "Anchors".
If you want to match text like 1.23 this is possible too. Two different approaches:
Modify the iskeyword option so that it includes .. This will also affect how w moves
Use \v<1(\d|.)#!, which basically means "a 1 at the beginning of a word, that isn't followed by some other digit or a period."

Related

vim Search Replace should use replaced text in following searches

I have a data file (comma separated) that has a lot of NAs (It was generated by R). I opened the file in vim and tried to replace all the NA values to empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.

, doesn't have a special meaning so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
Which makes those commas, that you used to restrict the match to NA and that are causing the error, useless:
:%s/\<NA\>//g
See :help :range and :help \<.

Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches. so in ,NA,NA,NA, it only finds the first ,NA, and third ,NA, as the middle one doesn't have its own separate surrounding ,. We can modify the match to not include certain characters of our regex with \zs (start) and \ze (end). These modify our regex to find matches that are surrounded by other characters, but our matches don't actually include them, so we can match all the NA in ,NA,NA,NA,.
TL;DR: %s/,\zsNA\ze,//g

How to perform following search and replace in vim?

I have the following string in the code at multiple places,
m_cells->a[ Id ]
and I want to replace it with
c(Id)
where the string Id could be anything including numbers also.

A regular expression replace like below should do:
%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
If you wish to apply the replacement operation on a number of files you could use the :bufdo command.

Full explanation of #BasBossink's answer (as a separate answer because this won't fit in a comment), because regexes are awesome but non-trivial and definitely worth learning:
In Command mode (ie. type : from Normal mode), s/search_term/replacement/ will replace the first occurrence of 'search_term' with 'replacement' on the current line.
The % before the s tells vim to perform the operation on all lines in the document. Any range specification is valid here, eg. 5,10 for lines 5-10.
The g after the last / performs the operation "globally" - all occurrences of 'search_term' on the line or lines, not just the first occurrence.
The "m_cells->a" part of the search term is a literal match. Then it gets interesting.
Many characters have special meaning in a regex, and if you want to use the character literally, without the special meaning, then you have to "escape" it, by putting a \ in front.
Thus \[ and \] match the literal '[' and ']' characters.
Then we have the opposite case: literal characters that we want to treat as special regex entities.
\s matches white*s*pace (space, tab, etc.).
\w matches "*w*ord" characters (letters, digits, and underscore _).
(. matches any character (except a newline). \d matches digits. There are more...)
If a character is not followed by a quantifier, then exactly one such character matches. Thus, \s will match one space or tab, but not fewer or more.
\+ is a quantifier, and means "one or more". (\? matches 0 or 1; * (with no backslash) matches any number: zero or more. Warning: matching on zero occurrences takes a little getting used to; when you're first learning regexes, you don't always get the results you expected. It's also possible to match on an arbitrary exact number or range of occurrences, but I won't get into that here.)
\( and \) work together to form a "capturing group". This means that we don't just want to match on these characters, we also want to remember them specially so that we can do something with them later. You can have any number of capturing groups, and they can be nested too. You can refer to them later by number, starting at 1 (not 0). Just start counting (escaped) left-parantheses from the left to determine the number.
So here, we are matching a space followed by a group (which we will capture) of at least one "word" character followed by a space, within the square brackets.
Then section between the second and third / is the replacement text.
The "c" is literal.
\1 means the first captured group, which in this case will be the "Id".
In summary, we are finding text that matches the given description, capturing part of it, and replacing the entire match with the replacement text that we have constructed.
Perhaps a final suggestion: c after the final / (doesn't matter whether it comes before or after the 'g') enables *c*onfirmation: vim will highlight the characters to be replaced and will show the replacement text and ask whether you want to go ahead. Great for learning.
Yes, regexes are complicated, but super powerful and well worth learning. Once you have them internalized, they're actually fairly easy. I suggest that, as with learning vim itself, you start with the basics, get fluent in them, and then incrementally add new features to your repertoire.
Good luck and have fun.

Substitute `number` with `(number)` in multiple lines

I am a beginner at Vim and I've been reading about substitution but I haven't found an answer to this question.
Let's say I have some numbers in a file like so:
1
2
3
And I want to get:
(1)
(2)
(3)
I think the command should resemble something like :s:\d\+:........ Also, what's the difference between :s/foo/bar and :s:foo:bar ?
Thanks

Here is an alternative, slightly less verbose, solution:
:%s/^\d\+/(&)
Explanation:
^ anchors the pattern to the beginning of the line
\d is the atom that covers 0123456789
\+ matches one or more of the preceding item
& is a shorthand for \0, the whole match

Let me address those in reverse.
First: there's no difference between :s/foo/bar and :s:foo:bar; whatever delimiter you use after the s, vim will expect you to use from then on. This can be nice if you have a substitution involving lots of slashes, for instance.
For the first: to do this to the first number on the current line (assuming no commas, decimal places, etc), you could do
:s:\(\d\+\):(\1)
The \(...\) doesn't change what is matched - rather, it tells vim to remember whatever matched what is inside, and store it. The first \(...\) is stored in \1, the second in \2, etc. So, when you do the replacement, you can reference \1 to get the number back.
If you want to change ALL numbers on the current line, change it to
:s:\(\d\+\):(\1):g
If you want to change ALL numbers on ALL lines, change it to
:%s:\(\d\+\):(\1):g

You can do what you want with:
:%s/\([0-9]\)/(\1)/
%s means global search and replace, that is do the search/replace for every line in the file. the \( \) defines a group, which in turn is referenced by \1. So the above search and replace, finds all lines with a single digit ([0-9]), and replaces it with the matched digit surrounded by parentheses.

replacing part of regex matches

I have several functions that start with get_ in my code:
get_num(...) , get_str(...)
I want to change them to get_*_struct(...).
Can I somehow match the get_* regex and then replace according to the pattern so that:
get_num(...) becomes get_num_struct(...),
get_str(...) becomes get_str_struct(...)
Can you also explain some logic behind it, because the theoretical regex aren't like the ones used in UNIX (or vi, are they different?) and I'm always struggling to figure them out.
This has to be done in the vi editor as this is main work tool.
Thanks!

To transform get_num(...) to get_num_struct(...), you need to capture the correct text in the input. And, you can't put the parentheses in the regular expression because you may need to match pointers to functions too, as in &get_distance, and uses in comments. However, and this depends partially on the fact that you are using vim and partially on how you need to keep the entire input together, I have checked that this works:
%s/get_\w\+/&_struct/g
On every line, find every expression starting with get_ and continuing with at least one letter, number, or underscore, and replace it with the entire matched string followed by _struct.
Darn it; I shouldn't answer these things on spec. Note that other regex engines might use \& instead of &. This depends on having magic set, which is default in vim.

For an alternate way to do it:
%s/get_\(\w*\)(/get_\1_struct(/g
What this does:
\w matches to any "word character"; \w* matches 0 or more word characters.
\(...\) tells vim to remember whatever matches .... So, \(w*\) means "match any number of word characters, and remember what you matched. You can then access it in the replacement with \1 (or \2 for the second, etc.)
So, the overall pattern get_\(\w*\)( looks for get_, followed by any number of word chars, followed by (.
The replacement then just does exactly what you want.
(Sorry if that was too verbose - not sure how comfortable you are with vim regex.)

How to repeat a substitution the number of times the search word occurs in a row in a substitution command in Vim?

I would like to use tabs in a code that doesn’t use them. What I did until now to implement tabs was pretty handcrafty:
:%s/^ /\t/g
:%s/^\t /\t\t/g
. . .
Question: Is there a way to replace two spaces ( ) by tab (\t) the number of times it was found at the beginning of a line?

There are (at least) three substitution techniques relevant to this case.
1. The first one takes advantage of the preceding-atom matching
syntax to naturally define a step of indentation. According to the
question statement, an indent step is a pair of adjacent space
characters preceded with nothing but spaces from the beginning
of line. Following this definition, one can construct the actual
substitution pattern, right to left:
:%s/\%(^ *\)\#<= /\t/g
Indeed, the pattern designates an occurrence of two literal space
characters, but only when they are preceded by a zero-width match
of the atom just before \#<=, which is the pattern ^ * wrapped in
grouping parentheses \%(, \). These non-capturing parentheses are
used instead of the usual capturing ones, \(, \), since there is no
need in further referring to the matched string of leading spaces. Due
to the g flag, the above :substitute command runs through the
leading spaces pair by pair, and replaces each of them by single tab
character.
2. The second technique takes a different approach. Instead of
matching separate indent levels, one can break each of the lines
starting with space characters down into two lines: one containing
the indenting spaces of the original line, and another holding the
rest of it. After that, it is straightforward to replace all of the pairs
of spaces on the first line, and concatenate the lines back together:
:g/^ /s/^ \+/&\r/|-s/ /\t/g|j!
3. The third idea is to process leading spaces by means of Vim
scripting language. A convenient way of doing that is to use the
substitute with an expression feature of the :substitute command
(see :help sub-replace-\=). When started with \=, the substitute
string of the command enables to substitute the matches of a pattern
with results of evaluation of the expression specified after \=:
:%s#^ \+#\=repeat("\t",len(submatch(0))/2)

If you specifically want to convert spaces into tabs (or vice-versa) at the start of a line, there's the useful :retab command which takes care of that. For example:
:retab! 2 will convert spaces in groups of two to tabs
:set expandtab and then :retab! 2 will convert tabstops (of width 2) back to spaces
See :h :retab (and :h 'ts') for the details.
This is not a general solution for the original problem, but I think it covers the most common use case.

There is no general way of doing this using :s regex's. You can't make the /g modifier look backwards otherwise it'd be unusable, and you can't reliably check that you're at the beginning of the line without looking backwards.
The only way of doing it generally is to loop, like so:
:for i in range(100)
: %s/^\t*\zs /\t/e
:endfor
Which is ugly, slow and highly unrecommended. Use :retab

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string