How to search for multiple overlapping occurrences of same word on the same line? - vim

I want to search for all occurrences of a word within a given file, including multiple occurrences on the same line. For example:
ABCCG*CAT*AD*CAT*TT
DFGBBB*CAT*YYUAB
Manually searching for the word 'CAT' I found two when using /CAT, when in fact there are three occurrences of that word in the file.
What is the command to find all occurrences of a given word in a file irrespective of the fact that it may occur multiple times within a line?
Note: There are no * in the file. I have used it in the example above to denote the positions of the string CAT.
What if the multiple occurrences were to overlap on the same line? For example:
ABCCG*TNTNT*ADCATDD
DFGBBB*TNT*YYUAB
Searching for the word TNT using :%s/TNT//gn would still give me 2, when in fact there are three occurrences.
Is there a way to identify overlapping occurrences in the same line using Vim?

To count every match of an item, including "overlapping" cases, use the :%s command (long form: :%substitute) and tell it three things:
do not actually perform the substitution (n flag; a mnemonic for "no-op", I guess)
consider multiple matches on the same line to be separate matches (g flag, for "global")
do a "non-greedy" match (\{-}; somewhat arcane but worth reading up on; see below)
Putting all that together, here's what it looks like:
:%s/[T]\{-}NT//gn
So, given the following text from the question:
ABCCG*TNTNT*ADCATDD
DFGBBB*TNT*YYUAB
…vim will then report this:
3 matches on 2 lines
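If you want a count of only the matching lines, omit the g flag; Vim then reports just the number of lines that contain a match. If you don't want to count "overlapping" strings, omit the \{-} part. Note how the pattern arrives at 3: \{-} makes the leading T optional, so on the first line Vim matches TNT and then the leftover NT. In effect it counts every NT, which equals the number of overlapping TNT occurrences only when every NT is preceded by a T.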
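As a cross-check outside Vim, the overlapping count can be reproduced with a short Python sketch. It assumes Python's re module rather than Vim's regex engine, and the helper name count_overlapping is made up for this illustration. A zero-width lookahead lets a match begin at every position where the word starts, so overlapping occurrences are counted.

```python
import re

def count_overlapping(word, text):
    """Count occurrences of word in text, including overlapping ones."""
    # (?=...) is zero-width, so the scan advances one character at a time
    # and can find an occurrence that starts inside the previous one.
    return len(re.findall("(?=" + re.escape(word) + ")", text))

# The sample from the question, without the * markers.
sample = "ABCCGTNTNTADCATDD\nDFGBBBTNTYYUAB"

print(count_overlapping("TNT", sample))   # overlapping count
print(len(re.findall("TNT", sample)))     # plain, non-overlapping count
```

The second print uses an ordinary search, which skips the occurrence that shares a T with the previous match, just as /TNT and :%s/TNT//gn do.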
The vim docs actually have very good info about this stuff.
For more help on counting items in vim, see :help count-items:
Counting words, lines, etc. count-items
To count how often any pattern occurs in the current buffer use the substitute
command and add the 'n' flag to avoid the substitution. The reported number
of substitutions is the number of items. Examples:
:%s/./&/gn              characters
:%s/\i\+/&/gn           words
:%s/^//n                lines
:%s/the/&/gn            "the" anywhere
:%s/\<the\>/&/gn        "the" as a word
You might want to reset 'hlsearch' or do ":nohlsearch".
Add the 'e' flag if you don't want an error when there are no matches.
And for more help with doing "non-greedy" matching, see :help non-greedy:
non-greedy
If a "-" appears immediately after the "{", then a shortest match
first algorithm is used (see example below). In particular, "\{-}" is
the same as "*" but uses the shortest match first algorithm. BUT: A
match that starts earlier is preferred over a shorter match: "a\{-}b"
matches "aaab" in "xaaab".
Example                 matches
ab\{2,3}c               "abbc" or "abbbc"
a\{5}                   "aaaaa"
ab\{2,}c                "abbc", "abbbc", "abbbbc", etc.
ab\{,3}c                "ac", "abc", "abbc" or "abbbc"
a[bc]\{3}d              "abbbd", "abbcd", "acbcd", "acccd", etc.
a\(bc\)\{1,2}d          "abcd" or "abcbcd"
a[bc]\{-}[cd]           "abc" in "abcd"
a[bc]*[cd]              "abcd" in "abcd"
The } may optionally be preceded with a backslash: \{n,m\}.
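The difference between shortest-match and longest-match quantifiers can also be tried outside Vim. The sketch below assumes Python's re module, where *? is the lazy counterpart of * and plays roughly the role of Vim's \{-}; the variable names are only for illustration.

```python
import re

# Lazy quantifier: stop as soon as the rest of the pattern can match.
lazy = re.search(r"a[bc]*?[cd]", "abcd").group()

# Greedy quantifier: take as much as possible.
greedy = re.search(r"a[bc]*[cd]", "abcd").group()

# A match that starts earlier still wins over a shorter match.
earliest = re.search(r"a*?b", "xaaab").group()

print(lazy, greedy, earliest)
```

The last case mirrors the "a\{-}b" note in the help text: the lazy quantifier does not skip ahead to a later, shorter match.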

Related

vim Search Replace should use replaced text in following searches

I have a data file (comma separated) that has a lot of NAs (It was generated by R). I opened the file in vim and tried to replace all the NA values to empty strings.
Here is a sample slimmed down version of a record in the file:
1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1
Once I am done with the search-replace, the intended output should be:
1,1,,,,NATIONAL,,1,NANA,1,AMERICANA,1
In other words, all the NAs should be replaced except the words NATIONAL, NANA and AMERICANA.
I used the following command in vim to do this:
1, $ s/\,NA\,/\,\,/g
But, it doesn't seem to work. Here is the output that I get:
1,1,,NA,,NATIONAL,,1,NANA,1,AMERICANA,1
As you can see, there is one ,NA, that is left out of the replacement process.
Does anyone have a good way to fix it? Thanks.
A trivial solution is to run the same command again and it will take care of the remaining ,NA,. However, it is not a feasible solution because my actual data file has 100s of columns and 500K+ rows each with a variable number of NAs.
The comma has no special meaning, so you don't have to escape it:
:1,$s/,NA,/,,/g
Which doesn't solve your problem.
You can use % as a shorthand for 1,$:
:%s/,NA,/,,/g
Which doesn't solve your problem either.
The best way to match all those NA words to the exclusion of other words containing NA would be to use word boundaries:
:%s/,\<NA\>,/,,/g
Which still doesn't solve your problem.
With word boundaries in place, the commas you used to restrict the match to NA, and which are causing the problem, are no longer needed:
:%s/\<NA\>//g
See :help :range and :help \<.
Use % instead of 1,$ (% means "the buffer" aka the whole file).
You don't need \,. , works fine.
Vim finds discrete, non-overlapping matches, so in ,NA,NA,NA, it only finds the first and the third ,NA,: the middle one has no comma of its own left once its neighbours have been matched. With \zs (match start) and \ze (match end) we can require the surrounding commas without including them in the match, so every NA in ,NA,NA,NA, is found.
TL;DR: %s/,\zsNA\ze,//g
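The effect of consuming versus not consuming the commas can be reproduced with a small Python sketch. It assumes Python's re module, where a lookbehind and a lookahead stand in for Vim's \zs and \ze; the variable names are illustrative only.

```python
import re

line = "1,1,NA,NA,NA,NATIONAL,NA,1,NANA,1,AMERICANA,1"

# Consuming the commas: adjacent fields share a comma, so one NA is skipped.
consumed = re.sub(r",NA,", ",,", line)

# Lookbehind/lookahead check for the commas without consuming them,
# playing the role of \zs and \ze in the Vim pattern.
cleaned = re.sub(r"(?<=,)NA(?=,)", "", line)

print(consumed)
print(cleaned)
```

The first result reproduces the output from the question with one NA left behind; the second is the intended output.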

Searching for an exact match with a singular digit

I'm trying to search for only a singular digit in vim by itself. For example, if there are two sets of digits 1 and 123 and I want to search for 1, I would only want the singular 1 digit to be found.
I have tried using regular expressions like \<1> and \%(a)#
You almost had the right solution. You want:
\<1\>
This is because each angled bracket needs to be escaped. Alternatively, you could use:
\v<1>
The \v flag tells vim to treat more characters as special without needing to be escaped (for example, (){}+<> all become special rather than literal text). Read :h /\v for more on this.
A great reference for learning regex in vim is vimregex.com. The \<\> characters are explained in 4.1 "Anchors".
If you want to match text like 1.23 this is possible too. Two different approaches:
Modify the iskeyword option so that it includes the period (.). This will also affect how w moves.
Use \v<1(\d|\.)@!, which basically means "a 1 at the beginning of a word, that isn't followed by some other digit or a period."
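Both ideas can be tried outside Vim with a short Python sketch. It assumes Python's re module, where \b stands in for Vim's \< and \> and (?!...) for @!; the sample text and variable names are made up for illustration.

```python
import re

text = "1 and 123 and 1.23"

# \b on both sides: the 1 must be a whole word. The 1 in "1.23" still
# qualifies, because the period is not a word character.
whole_word = [m.start() for m in re.finditer(r"\b1\b", text)]

# Negative lookahead: a 1 at a word start that is not followed by
# another digit or a period.
strict = [m.start() for m in re.finditer(r"\b1(?![\d.])", text)]

print(whole_word, strict)
```

The lists hold the start offsets of each match, which makes it easy to see that only the stricter pattern skips the 1 in 1.23.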

How to perform following search and replace in vim?

I have the following string in the code at multiple places,
m_cells->a[ Id ]
and I want to replace it with
c(Id)
where the string Id could be anything including numbers also.
A regular expression replace like below should do:
%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
If you wish to apply the replacement operation on a number of files you could use the :bufdo command.
Full explanation of @BasBossink's answer (as a separate answer because this won't fit in a comment), because regexes are awesome but non-trivial and definitely worth learning:
In Command mode (ie. type : from Normal mode), s/search_term/replacement/ will replace the first occurrence of 'search_term' with 'replacement' on the current line.
The % before the s tells vim to perform the operation on all lines in the document. Any range specification is valid here, eg. 5,10 for lines 5-10.
The g after the last / performs the operation "globally" - all occurrences of 'search_term' on the line or lines, not just the first occurrence.
The "m_cells->a" part of the search term is a literal match. Then it gets interesting.
Many characters have special meaning in a regex, and if you want to use the character literally, without the special meaning, then you have to "escape" it, by putting a \ in front.
Thus \[ and \] match the literal '[' and ']' characters.
Then we have the opposite case: literal characters that we want to treat as special regex entities.
\s matches white*s*pace (space, tab, etc.).
\w matches "*w*ord" characters (letters, digits, and underscore _).
(. matches any character (except a newline). \d matches digits. There are more...)
If a character is not followed by a quantifier, then exactly one such character matches. Thus, \s will match one space or tab, but not fewer or more.
\+ is a quantifier, and means "one or more". (\? matches 0 or 1; * (with no backslash) matches any number: zero or more. Warning: matching on zero occurrences takes a little getting used to; when you're first learning regexes, you don't always get the results you expected. It's also possible to match on an arbitrary exact number or range of occurrences, but I won't get into that here.)
\( and \) work together to form a "capturing group". This means that we don't just want to match on these characters, we also want to remember them specially so that we can do something with them later. You can have any number of capturing groups, and they can be nested too. You can refer to them later by number, starting at 1 (not 0). Just start counting (escaped) left parentheses from the left to determine the number.
So here, we are matching a space followed by a group (which we will capture) of at least one "word" character followed by a space, within the square brackets.
The section between the second and third / is the replacement text.
The "c" is literal.
\1 means the first captured group, which in this case will be the "Id".
In summary, we are finding text that matches the given description, capturing part of it, and replacing the entire match with the replacement text that we have constructed.
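The same capture-and-replace step can be sketched in Python for experimenting outside the editor. This assumes Python's re module, where groups are written ( ) without backslashes; the sample line of code is invented for the illustration.

```python
import re

# Hypothetical line of source code containing the pattern twice.
code = "x = m_cells->a[ Id ] + m_cells->a[ 42 ];"

# \[ and \] are literal brackets, \s is one whitespace character,
# and (\w+) captures the identifier so \1 can reuse it.
converted = re.sub(r"m_cells->a\[\s(\w+)\s\]", r"c(\1)", code)

print(converted)
```

As in the Vim command, exactly one whitespace character is required on each side of the identifier; text with different spacing is left untouched.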
Perhaps a final suggestion: c after the final / (doesn't matter whether it comes before or after the 'g') enables *c*onfirmation: vim will highlight the characters to be replaced and will show the replacement text and ask whether you want to go ahead. Great for learning.
Yes, regexes are complicated, but super powerful and well worth learning. Once you have them internalized, they're actually fairly easy. I suggest that, as with learning vim itself, you start with the basics, get fluent in them, and then incrementally add new features to your repertoire.
Good luck and have fun.

Substitute `number` with `(number)` in multiple lines

I am a beginner at Vim and I've been reading about substitution but I haven't found an answer to this question.
Let's say I have some numbers in a file like so:
1
2
3
And I want to get:
(1)
(2)
(3)
I think the command should resemble something like :s:\d\+:........ Also, what's the difference between :s/foo/bar and :s:foo:bar ?
Thanks
Here is an alternative, slightly less verbose, solution:
:%s/^\d\+/(&)
Explanation:
^ anchors the pattern to the beginning of the line
\d is the atom that covers 0123456789
\+ matches one or more of the preceding item
& is a shorthand for \0, the whole match
Let me address those in reverse.
First, the second question: there's no difference between :s/foo/bar and :s:foo:bar; whatever delimiter you use after the s, vim will expect you to use from then on. This can be nice if you have a substitution involving lots of slashes, for instance.
As for the first question: to do this to the first number on the current line (assuming no commas, decimal places, etc), you could do
:s:\(\d\+\):(\1)
The \(...\) doesn't change what is matched - rather, it tells vim to remember whatever matched what is inside, and store it. The first \(...\) is stored in \1, the second in \2, etc. So, when you do the replacement, you can reference \1 to get the number back.
If you want to change ALL numbers on the current line, change it to
:s:\(\d\+\):(\1):g
If you want to change ALL numbers on ALL lines, change it to
:%s:\(\d\+\):(\1):g
You can do what you want with:
:%s/\([0-9]\)/(\1)/
%s runs the substitution on every line in the file. The \( \) defines a group, which in turn is referenced by \1. So the above command finds the first digit ([0-9]) on each line and replaces it with that digit surrounded by parentheses. Note that it only handles single digits; for longer numbers, use [0-9]\+ inside the group.
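The difference between matching a whole number and matching a single digit shows up quickly in a Python sketch. It assumes Python's re module, where \g<0> stands for the whole match, much like & in Vim; the sample text adds a two-digit number to the lines from the question.

```python
import re

text = "1\n2\n3\n12"

# \d+ takes the whole number; \g<0> is the entire match, like & in Vim.
wrapped = re.sub(r"\d+", r"(\g<0>)", text)

# A single-digit group with one replacement per line (no g flag)
# wraps only the first digit of each line.
first_digit_only = "\n".join(
    re.sub(r"([0-9])", r"(\1)", line, count=1) for line in text.split("\n")
)

print(wrapped)
print(first_digit_only)
```

On the last line the second variant wraps only the leading digit, which is why the \+ quantifier matters for multi-digit numbers.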

Search for string and get count in vi editor

I want to search for a string and find the number of occurrences in a file using the vi editor.
The way is:
:%s/pattern//gn
You need the n flag. To count words use:
:%s/\i\+/&/gn
and a particular word:
:%s/the/&/gn
See count-items documentation section.
If you simply type in:
%s/pattern/pattern/g
then the status line will give you the number of matches in vi as well.
:%s/string/string/g
will give the answer.
(Similar to what Gustavo said, with one addition.)
For any previous search, you can simply do:
:%s///gn
A pattern is not needed, because it is already in the search register (@/).
"%" - do s/ in the whole file
"g" - search global (with multiple hits in one line)
"n" - prevents any replacement of s/ -- nothing is deleted! nothing must be undone!
(see :help :s_flags for more information)
(This way, it works perfectly with "Search for visually selected text", as described in vim-wikia tip171)
:g/xxxx/d
This will delete all the lines with pattern, and report how many deleted. Undo to get them back after.
Short answer:
:%s/string-to-be-searched//gn
For learning:
Here is what each part of the command does:
: switches from command (normal) mode to command-line mode. Whatever you type after : is run as an ex command.
%s runs the substitute command on all lines: the range % means the entire file. The syntax for substituting all occurrences is :%s/old-text/new-text/g
g specifies all occurrences in the line. Without the g flag, only the first occurrence in each line is substituted.
n reports the number of occurrences instead of substituting.
// (nothing between the last two slashes) means the replacement text is omitted, because we just want to find.
Once you have the number of occurrences, you can press n to step through them one by one (N goes in the opposite direction).
For finding and counting in a particular range, here lines 1 to 10:
:1,10s/hello//gn
Please note that % (the whole file) is replaced by comma-separated line numbers.
For finding and replacing in the same range (drop the n flag, which would only count without replacing):
:1,10s/helo/hello/g
use
:%s/pattern/\0/g
when pattern string is too long and you don't like to type it all again.
I suggest doing:
Search either with * to do a "bounded search" for what's under the cursor, or do a standard /pattern search.
Use :%s///gn to get the number of occurrences. Or you can use :%s///n to get the number of lines with occurrences.
** I really wish I could find a plug-in that would give a message like "match N of N1 on N2 lines" with every search, but alas.
Note:
Don't be confused by the tricky wording of the output. The former command might give you something like 4 matches on 3 lines where the latter might give you 3 matches on 3 lines. While technically accurate, the latter is misleading and should say '3 lines match'. So, as you can see, there really is never any need to use the latter ('n' only) form. You get the same info, more clearly, and more by using the 'gn' form.
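The two numbers in that message can be reproduced with a small Python sketch, which may help when checking a count outside the editor. It assumes Python's re module rather than Vim's regex engine; the helper count_matches and the sample buffer text are invented for this illustration.

```python
import re

def count_matches(pattern, text):
    """Return (total matches, number of lines with at least one match)."""
    per_line = [len(re.findall(pattern, line)) for line in text.splitlines()]
    total = sum(per_line)
    lines_with_match = sum(1 for n in per_line if n > 0)
    return total, lines_with_match

# Hypothetical buffer contents.
buffer_text = "the cat and the dog\nno match here\nthe end\nbathe"

matches, lines = count_matches(r"the", buffer_text)
print(f"{matches} matches on {lines} lines")
```

The first number corresponds to what the gn form reports and the second to what the n-only form reports.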
