Vim - Substitute command and regexp - vim

I have a little JSON files with some entries, here is a section:
"i":{
"normale":"3c",
"bold":"4b",
"doppio":"6c"},
"is":{
"normale":"2c",
"bold":"33",
"doppio":"66"},
I realized I have to add "\u25" in front of all the values, so I tried this command:
:%s:\("\)\(\d\d"\)\|\("\)\(\d\w"\):"\\u25\2
The idea is to search for either "dd" or "dw", and substitute the first double quote with "\u25 while keeping the rest.This is the result:
"i":{
"normale":"\u25,
"bold":"\u25,
"doppio":"\u25},
"is":{
"normale":"\u25,
"bold":"\u2533",
"doppio":"\u2566"},
If the matching string has only the two digits, the command works fine: the first double quote (the first group) is substituted and the second group is left as it was.
However, if the matching string has a digit and a character, it seems to ignore the second group, substituting the whole string. The two patterns are identical, except for \w, so it should work exactly the same. What's happening?

Vim matches \d to digits; you'd need \x to match hex digits.
But it seems you want to replace all occurrences of :" with :"\u25.
Can you use:
:%s/:"/:"\\u25"/
Or, if you want to prepend \u25 to all occurrences of 2 hex digits,
:%s/\x\x/\\u25&/

Related

How to replace a range of numbers from a range of strings with sed

I'm trying to modify a given text file, wherein I want to change/alter the following strings, eg:
lcl|NC_018257.1_cds_XP_003862892.1_5067
lcl|NC_018241.1_cds_XP_003859498.1_1683
lcl|NC_018256.1_cds_XP_003862456.1_4633
lcl|NC_018237.1_cds_XP_003858978.1_1163
lcl|NC_018254.1_cds_XP_003861926.1_4104
so that it only contains the XP_n.1 part of the string.
I have successfully removed the lcl|NC\_*.1_cds\_ part out of the strings for which
I used the following sed command:
sed 's/lcl|NC\_.\*_cds_//g' cds.fa > cds4.fa
The resultant text file contains strings like XP_003862892.1_5067.
There are about 8014 strings like this ranging from XP_*.1_1 to XP_*.1_8014. I want to delete the _1 to _8014 part of the string and replace it with 1.
I tried using
sed 's/1\_./1/g'
and it seemed to have worked, however when I scrolled further down the list of strings, the double digit numbers didn't get replaced - only one of the digits was replaced, which immediately followed the '_', resulting in the first digit turning into 1 and the rest retaining their original identity. Same with triple and quadruple digit numbers.
eg:
XP_003857837.1_23 ---> XP_003857837.13
XP_003857942.1_228 ---> XP_003857942.128
I have absolutely no idea how to remove this, all my attempts have led to failure. Some people have asked me for what my desired output should look like, the ideal output would be: XP_003857837.1, each string should be followed by a .1 instead of .1_SomeNumberRangingFrom1to8014
You can do everything in one go with a slightly more complex regex.
sed 's/lcl|NC_.*_cds_\(XP_[0-9.]*\)_.*/\1/' cds.fa > cds4.fa
The backslashed parentheses create a capturing group, and \1 in the replacement recalls the first captured group (\2 for the second, etc, if you have more than one). The regex inside the group looks for XP_ followed by digits and dots, and the expression after matches the rest of the line from the next uderscore on.
In other words, this basically says "replace the whole line with just the part we care about".
By the by, there is no reason to backslash underscores anywhere, and the /g option to the s command only makes sense when you want to replace multiple occurrences on the same input line.
Using sed
$ sed 's/.*_\?\(XP_[^.]*\.\)[^_]*_[0-9]\(.*\)/\11\2/'
XP_003862892.1067
XP_003859498.1683
XP_003862456.1633
XP_003858978.1163
XP_003861926.1104
XP_003857837.13
XP_003857942.128

Dialogflow RE2 Regex

I am new here. I wanted to ask a question on using REGEX for an entity in DialogFlow
I wanted the entity to accept all text and spaces except for the symbol *
I have tried to use [A-Za-z0-9 ][^*], but it is not working. Any advice. thanks!
In your Regex expression, [^*] means "capture any character at the start of the line." To refer to a literal asterisk rather than matching any character, you need to use \*
If you want to match a line of letters or numbers as in the [A-Za-z0-9] example you give, but only if that string does not include an asterisk, then this expression should work for you:
^[a-zA-Z0-9]+$
This means "match a whole line of text if it only contains one or more of the characters a-z, A-Z, or 0-9".
If you want to match any character or group of characters in a line except for the asterisk, then you could use something like this:
(?!\*)([a-zA-Z0-9]+)(?<!\*)
The first part is called a "negative lookahead," and it looks forward to ensure we're not matching the asterisk. The last part is called a "negative lookbehind," and it looks backwards to make sure we're not matching the asterisk. The middle part is your "capture group," and confirms that you're matching any letters or numbers in a given string, but excluding the * character.
If this Regex gets input like *abc, it will capture abc. If it encounters abc*, it will still capture abc. If it encounters abc*def, it will capture abc and def separately in two capture groups, because it will break around the asterisk.
This link explains the concept of lookarounds in Regex. You can also use this Regex tester to get started practicing your Regular Expressions with explanations of what each block of characters does.
EDITED TO ADD If you're just interested in matching single characters rather than groups of characters, you can use [A-Za-z0-9] and match any upper or lowercase letter and any single digit. You don't need to exclude the * character, because the character group is already exclusive.
This is a slight duplicate of the question below, so responses here may also help you. Hope this helps!
How can I exclude asterisk in a regex expression
[A-Za-z0-9 ][^*]
What you regex will do is match 2 consecutive characters. First, it will look for anything A-Za-z0-9 . Then, it will look at the negated set that includes *, and will match ANY character except *.
You can type your regex into https://regexr.com/ to see a breakdown of how it matches and test some strings.
For example, your regex would match these:
Aa
AA
a&
A1
0_
But would not match these:
A*
a*
1*
And WOULD NOT match anything longer than 2 characters. If you really want to match any string with any characters except *, this should work:
[^\*]+
What that will do is match any number of consecutive characters that are not *. (The + means match 1 or more characters in the set). It is also a good idea to escape * because it is also a reserved character in regex. Even though most regex parsers are smart enough to know that inside a group you probably mean the literal char *, it is still a best practice to escape it. (And by that same token, you would want to use \s instead of the blank space in your original regex.)

How to perform following search and replace in vim?

I have the following string in the code at multiple places,
m_cells->a[ Id ]
and I want to replace it with
c(Id)
where the string Id could be anything including numbers also.
A regular expression replace like below should do:
%s/m_cells->a\[\s\(\w\+\)\s\]/c(\1)/g
If you wish to apply the replacement operation on a number of files you could use the :bufdo command.
Full explanation of #BasBossink's answer (as a separate answer because this won't fit in a comment), because regexes are awesome but non-trivial and definitely worth learning:
In Command mode (ie. type : from Normal mode), s/search_term/replacement/ will replace the first occurrence of 'search_term' with 'replacement' on the current line.
The % before the s tells vim to perform the operation on all lines in the document. Any range specification is valid here, eg. 5,10 for lines 5-10.
The g after the last / performs the operation "globally" - all occurrences of 'search_term' on the line or lines, not just the first occurrence.
The "m_cells->a" part of the search term is a literal match. Then it gets interesting.
Many characters have special meaning in a regex, and if you want to use the character literally, without the special meaning, then you have to "escape" it, by putting a \ in front.
Thus \[ and \] match the literal '[' and ']' characters.
Then we have the opposite case: literal characters that we want to treat as special regex entities.
\s matches white*s*pace (space, tab, etc.).
\w matches "*w*ord" characters (letters, digits, and underscore _).
(. matches any character (except a newline). \d matches digits. There are more...)
If a character is not followed by a quantifier, then exactly one such character matches. Thus, \s will match one space or tab, but not fewer or more.
\+ is a quantifier, and means "one or more". (\? matches 0 or 1; * (with no backslash) matches any number: zero or more. Warning: matching on zero occurrences takes a little getting used to; when you're first learning regexes, you don't always get the results you expected. It's also possible to match on an arbitrary exact number or range of occurrences, but I won't get into that here.)
\( and \) work together to form a "capturing group". This means that we don't just want to match on these characters, we also want to remember them specially so that we can do something with them later. You can have any number of capturing groups, and they can be nested too. You can refer to them later by number, starting at 1 (not 0). Just start counting (escaped) left-parantheses from the left to determine the number.
So here, we are matching a space followed by a group (which we will capture) of at least one "word" character followed by a space, within the square brackets.
Then section between the second and third / is the replacement text.
The "c" is literal.
\1 means the first captured group, which in this case will be the "Id".
In summary, we are finding text that matches the given description, capturing part of it, and replacing the entire match with the replacement text that we have constructed.
Perhaps a final suggestion: c after the final / (doesn't matter whether it comes before or after the 'g') enables *c*onfirmation: vim will highlight the characters to be replaced and will show the replacement text and ask whether you want to go ahead. Great for learning.
Yes, regexes are complicated, but super powerful and well worth learning. Once you have them internalized, they're actually fairly easy. I suggest that, as with learning vim itself, you start with the basics, get fluent in them, and then incrementally add new features to your repertoire.
Good luck and have fun.

Substitute `number` with `(number)` in multiple lines

I am a beginner at Vim and I've been reading about substitution but I haven't found an answer to this question.
Let's say I have some numbers in a file like so:
1
2
3
And I want to get:
(1)
(2)
(3)
I think the command should resemble something like :s:\d\+:........ Also, what's the difference between :s/foo/bar and :s:foo:bar ?
Thanks
Here is an alternative, slightly less verbose, solution:
:%s/^\d\+/(&)
Explanation:
^ anchors the pattern to the beginning of the line
\d is the atom that covers 0123456789
\+ matches one or more of the preceding item
& is a shorthand for \0, the whole match
Let me address those in reverse.
First: there's no difference between :s/foo/bar and :s:foo:bar; whatever delimiter you use after the s, vim will expect you to use from then on. This can be nice if you have a substitution involving lots of slashes, for instance.
For the first: to do this to the first number on the current line (assuming no commas, decimal places, etc), you could do
:s:\(\d\+\):(\1)
The \(...\) doesn't change what is matched - rather, it tells vim to remember whatever matched what is inside, and store it. The first \(...\) is stored in \1, the second in \2, etc. So, when you do the replacement, you can reference \1 to get the number back.
If you want to change ALL numbers on the current line, change it to
:s:\(\d\+\):(\1):g
If you want to change ALL numbers on ALL lines, change it to
:%s:\(\d\+\):(\1):g
You can do what you want with:
:%s/\([0-9]\)/(\1)/
%s means global search and replace, that is do the search/replace for every line in the file. the \( \) defines a group, which in turn is referenced by \1. So the above search and replace, finds all lines with a single digit ([0-9]), and replaces it with the matched digit surrounded by parentheses.

How to replace in vim

I have a line in a source file: [12 13 15]. In vim, I type:
:%s/\([0-90-9]\) /\0, /g
wanting to add a coma after 12 and 13. It works, but not quite, as it inserts an extraspace [12 , 13 , 15].
How can I achieve the desired effect?
Use \1 in the replacement expression, not \0.
\1 is the text captured by the first \(...\). If there were any more pairs of escaped parens in your pattern, \2 would match the text capture between the pair starting at the second \(, \3 at the third \(, and so on.
\0 is the entire text matched by the whole pattern, whether in parentheses or not. In your case this includes the space at the end of your pattern.
Also note that [0-90-9] is the same as [0-9]: each [...] collection matches just one character. It happens to work anyway, because in your data ‘a digit followed by a space’ matches in the same places as ‘2 digits followed by a space’. (If you actually needed to only insert commas after 2 digits, you could write [0-9][0-9].)
"I have a line in a source file:..."
then you type :%s/... this will do the substitution on all lines, if it matched. or that is the single line in your file?
If it is the single line, you don't have to group, or [0-9], just :%s/ \+/,/g will do the job.
The fine answers already point interesting solutions, but here's another one,
making use of the \zs, which marks the start of the match. In this pattern:
/[0-9]\zs /
The searched text is /[0-9] /, but only the space counts as a match. Note
that you can use the class \d to simplify the digit character class, so the
following command shall work for your needs:
:s/\d\d\zs /, /g ; matches only the space, replace by `, '
You said you have multiple lines and these changes are only to certain lines.
You can either visually select the lines to be changed or use the :global
command, which searches for lines matching a pattern and applies a command to
them. Now you'd need to build an expression to match the lines to be changed
in a less precise as possible way. If the lines that begins with optional
spaces, a [ and two digits are the only lines to be matched and no other
ones, then this would work for you:
:g/\s*[\d\d/s/\d\d\zs /, /g
Check the help for pattern.txt for \ze and similar and
:global.
Homework: use the help to understand \zs and see how this works:
:s/\d\d\zs\ze /,/g

Resources