Substitute with an expression and matched pattern - vim

In Vim we can substitute with a sub-replace-expression. When the substitute string starts with \=, the remainder is interpreted as an expression.
e.g. with text:
bar
bar
and substitute command:
:%s/.*/\='foo \0'/
gives unexpected results:
foo \0
foo \0
instead of:
foo bar
foo bar
The question is: how can I evaluate an expression that uses the matched pattern in a substitute?

When you use a sub-replace-expression, the normal special replacements like & and \1 don't work anymore; everything is interpreted as a Vimscript expression. Fortunately, you can access the captured submatches with submatch(), so it becomes:
:%s/.*/\='foo ' . submatch(0)/

You need :%s/.*/foo \0/
With :%s/.*/\='foo \0'/ you evaluate 'foo \0' but that's a string and it evaluates to itself.

You don't need to evaluate any expression for that; use a regex group and proper escapes:
:%s/\(.*\)/foo \1/

Related

Add blank spaces after substring while keeping the columns

I have a data file like this:
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
(...)
I want to delete the substring $DATAROOT from each path and add blank spaces after the path to keep the columns where randomthingsafter started. Notice that there could be 2 or more paths with the $DATAROOT substring in the same line. This way, my desired output would look like this:
randomthingsbefore /randompathwithoutanypattern          randomthingsafter
randomthingsbefore /randompathwithoutanypattern          randomthingsafter /randompathwithoutanypattern          randomthingsafter
randomthingsbefore /randompathwithoutanypattern          randomthingsafter
(...)
I've tried:
VAR1=*pathtofile*
VAR2=$(\grep -oP '\$DATAROOT\K[^ ]*' $VAR1)
arr=$(echo $VAR2 | tr " " "\n")
for x in $arr
do
y="${x} "
sed -i "s:$x:$y:" $VAR1
done
sed -i 's/$DATAROOT\///g' $VAR1
but it does not seem to work. Thank you for your help!
I believe the easiest way is to replace your script with a single sed command:
sed 's/$DATAROOT\([^[:blank:]]*\)/\1         /g' /path/to/file
Note that there are 9 spaces after \1, which is the length of the string $DATAROOT. Here we make use of what is known as a back-reference:
Editing Commands in sed
[2addr]s/BRE/replacement/flags:
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
The replacement string shall be scanned from beginning to end. An <ampersand> ( & ) appearing in the replacement shall be replaced by the string matching the BRE. The special meaning of & in this context can be suppressed by preceding it by a <backslash>. The characters \n, where n is a digit, shall be replaced by the text matched by the corresponding back-reference expression. If the corresponding back-reference expression does not match, then the characters \n shall be replaced by the empty string. The special meaning of \n where n is a digit in this context, can be suppressed by preceding it by a <backslash>. For each other <backslash> encountered, the following character shall lose its special meaning (if any).
source: POSIX SED
9.3.6 BREs Matching Multiple Characters
The back-reference expression \n shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between \( and \) preceding the \n. The character n shall be a digit from 1 through 9, specifying the nth subexpression (the one that begins with the nth \( from the beginning of the pattern and ends with the corresponding paired \) ). The expression is invalid if less than n subexpressions precede the \n. The string matched by a contained subexpression shall be within the string matched by the containing subexpression. If the containing subexpression does not match, or if there is no match for the contained subexpression within the string matched by the containing subexpression, then back-reference expressions corresponding to the contained subexpression shall not match. When a subexpression matches more than one string, a back-reference expression corresponding to the subexpression shall refer to the last matched string. For example, the expression ^\(.*\)\1$ matches strings consisting of two adjacent appearances of the same substring, and the expression \(a\)*\1 fails to match a, the expression \(a\(b\)*\)*\2 fails to match abab, and the expression ^\(ab*\)*\1$ matches ababbabb, but fails to match ababbab.
source: POSIX Basic Regular Expressions
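To make the padding less error-prone, the nine spaces can be generated instead of typed. The following is only a minimal sketch: the sample line and the variable names (line, prefix, pad, result) are made up for illustration, and the dollar sign is escaped as \$ so that it is literal wherever it appears in the pattern.

```shell
#!/bin/sh
# Hypothetical sample line; with a real file, pass the file name to sed instead.
line='before $DATAROOT/some/path after'

# Build the padding: one space per character of the removed prefix (9 here).
prefix='$DATAROOT'
pad=$(printf "%${#prefix}s" '')

# \1 is the path captured after the literal $DATAROOT; the padding follows it,
# so everything to the right of the path stays in its original column.
result=$(printf '%s\n' "$line" | sed 's/\$DATAROOT\([^[:blank:]]*\)/\1'"$pad"'/g')

printf '%s\n' "$line"
printf '%s\n' "$result"
```

Printing both lines one above the other makes it easy to check by eye that the text after the path did not move.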

Replace C statement with substitute in vim

I would like to use vim's substitute function (:%s) to search and replace a certain pattern of code. For example if I have code similar to the following:
if(!foo)
I would like to replace it with:
if(foo == NULL)
However, foo is just an example. The variable name can be anything.
This is what I came up with for my vim command:
:%s/if(!.*)/if(.* == NULL)/gc
It searches the statements correctly, but it tries to replace it with ".*" instead of the variable that's there (i.e "foo"). Is there a way to do what I am asking with vim?
If not, is there any other editor/tools I can use to help me with modifications like these?
Thanks in advance!
You need to use capture grouping and backreferencing in order to achieve that:
      Pattern    String sub.   flags
    |---------| |------------| |-|
:%s/if(!\(.*\))/if(\1 == NULL)/gc
        |---|     |--|
          |        ^
          |________|
The matched string in the pattern is repeated exactly in the substitution string.
:help /\(
\(\) A pattern enclosed by escaped parentheses. /\(/\(\) /\)
E.g., "\(^a\)" matches 'a' at the start of a line.
E51 E54 E55 E872 E873
\1 Matches the same string that was matched by /\1 E65
the first sub-expression in \( and \). {not in Vi}
Example: "\([a-z]\).\1" matches "ata", "ehe", "tot", etc.
\2 Like "\1", but uses second sub-expression, /\2
... /\3
\9 Like "\1", but uses ninth sub-expression. /\9
Note: The numbering of groups is done based on which "\(" comes first
in the pattern (going left to right), NOT based on what is matched
first.
You can use
:%s/if(!\(.*\))/if(\1 == NULL)/gc
By putting .* in \( \) you create a numbered capture group, which means that the regex will capture whatever .* matches.
In the replacement, \1 inserts the text of that captured group.
A macro is easy in this case, just do the following:
qa .............. starts recording macro 'a'
f! .............. jumps to the next '!'
x ............... erases it
e ............... jumps to the end of the word
a ............... starts append mode (insert)
 == NULL ........ literal " == NULL" (with a leading space)
<ESC> ........... stops insert mode
q ............... stops recording macro 'a'
:%norm @a ........ applies macro 'a' to every line in the file
:g/^if(!/ norm @a applies macro 'a' to the lines starting with if(!
Try the following:
%s/if(!\(.\{-}\))/if(\1 == NULL)/gc
The quantifier \{-} is the non-greedy counterpart of *: .\{-} matches any characters (possibly none), as few as possible, so the match stops at the first closing parenthesis instead of the last one on the line.
The parentheses \( and \) divide the searched expression into subexpressions, so that you can use those subgroups in the substitute string.
Finally, \1 lets you use the first matched subexpression, in our case whatever is caught inside the parentheses.
I hope this is more clear, more information can be found here. And thanks for the comment that suggests improving the answer.
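Since the question also asks about other tools: the same capture-group idea works in sed. This is only a sketch on made-up input (the variable names input and output are illustrative); it uses [^)]* instead of .* so that the capture stops at the first closing parenthesis.

```shell
# Hypothetical input; any variable name may appear between "if(!" and ")".
input='if(!foo)
if(!ptr->next)
if(bar)'

# In a basic regular expression a plain ( is literal, while \( ... \) captures.
# \1 in the replacement inserts whatever the group matched.
output=$(printf '%s\n' "$input" | sed 's/if(!\([^)]*\))/if(\1 == NULL)/g')

printf '%s\n' "$output"
```

Unlike the Vim command with the c flag, sed does not ask for confirmation, so it is worth reviewing the result before overwriting any source file.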

Powershell Search String in File Containing $

I want to search the following string in a text file: $$u$$
I tried select-string, get-content, .Contains. It seems to me it's not possible.
I used this for the search as a variable: $ToSearch = "'$'$u'$'$"
It always gives false result.
This is because most search filters rely on regex. The $ symbol is special in regex and needs to be escaped.
Without knowing more of what you're trying to accomplish I can't give much of an example, but here is one:
'this is a $$u$$ test' -replace "\$",""
The '\' is what is escaping the character - meaning to translate it literally.
Edit: Per comment
$Val = 'this is a $$u$$ test'
$Val | Select-String "\$+\w\$+" -quiet
The -Quiet switch returns a Boolean ($true or $false) rather than the match objects.
The $ is a reserved character in regex, which is complicating your search.
The default search pattern type for Select-String is regex, but you can also specify -SimpleMatch, which eliminates the need to escape the $: just use the string you want to match, '$$u$$', in single quotes so that PowerShell does not expand it. Select-String has no -Wildcard parameter; for wildcard matching use the -like operator instead, with the wildcard * at either end of the pattern: '*$$u$$*'.

Working with sed linux command

In my shell script I found a line that handles telephone numbers using the sed command.
sed "s~<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>~~g" input.xml > output.xml
I do not understand what the regular expression actually does.
<Telephone type[ ]*=[ ]*\"fax\"[ ]*><Number>none[ ]*</Number></Telephone>
I am reverse engineering this to get it working.
My XML structure looks like this:
<ContactMethod>
<InternetEmailAddress>donald.francis#lexisnexis.com</InternetEmailAddress>
<Telephone type = "work">
<Number>215-639-9000 x3281</Number>
</Telephone>
<Telephone type = "home">
<Number>484-231-1141</Number>
</Telephone>
<Telephone type = "fax">
<Number>N/A</Number>
</Telephone>
<Telephone type = "work">
<Number>215-639-9000 x3281</Number>
</Telephone>
<Telephone type = "home">
<Number>484-231-1141</Number>
</Telephone>
<Telephone type = "fax">
<Number>none</Number>
</Telephone>
<Telephone type1 = "fax12234">
<Number>484-231-1141sadsadasdasdaasd</Number>
</Telephone>
</ContactMethod>
That regex recognises <Telephone type = "fax"> entries where the number is given as none, and deletes them.
Breakdown:
s sed command for "substitution".
~ pattern separator. You can choose any character for this. sed recognizes it because it comes right after the s.
<Telephone type This matches the literal text "<Telephone type".
[ ]* matches zero or more spaces.
= matches a literal "="
[ ]* matches zero or more spaces.
\"fax\" matches the literal text "fax", including the quotes. The quotes are escaped because the whole pattern appears inside double quotes; the shell removes the escape characters (\) before sed sees them.
[ ]* matches zero or more spaces.
><Number>none matches literal text.
[ ]* matches zero or more spaces.
</Number></Telephone> matches the literal text.
~~ the pattern separators end the search pattern, and surround an empty replace pattern.
g is a flag that means the substitution is applied to every match on each line, not just the first.
The only thing that confuses me is that this pattern won't match anything that has line breaks in it, so I presume your input.xml isn't actually formatted like you have in your example data?
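The line-break point can be checked with a small experiment. This is a sketch with invented sample data (the "work" number is a placeholder, and the variable names are only for illustration): the same sed script is run on an element written on one line and on the same element spread over three lines.

```shell
# The substitution from the question, in single quotes so "fax" needs no escaping.
script='s~<Telephone type[ ]*=[ ]*"fax"[ ]*><Number>none[ ]*</Number></Telephone>~~g'

# Case 1: the whole fax element sits on one line, so the pattern can match.
one_line='<Telephone type = "fax"><Number>none</Number></Telephone><Telephone type = "work"><Number>555-0100</Number></Telephone>'
single=$(printf '%s\n' "$one_line" | sed "$script")

# Case 2: the same element spread over three lines; sed processes one line
# at a time, so the pattern never sees the complete element.
multi=$(printf '%s\n' '<Telephone type = "fax">' '<Number>none</Number>' '</Telephone>' | sed "$script")

printf '%s\n' "$single"
printf '%s\n' "$multi"
```

In the first case only the "work" element remains; in the second case the three lines come out unchanged.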

Multiple :g and :v commands in one statement

I have this file
foo
foo bar
foo bar baz
bar baz
foo baz
baz bar
bar
baz
foo 42
foo bar 42 baz
baz 42
I want to
Select lines which contain foo and do NOT contain bar
Delete lines which contain foo and do NOT contain bar
I read somewhere (can't find the link) that I have to use :exec with | for this.
I tried the following, but it doesn't work
:exec "g/foo" # works
:exec "g/foo" | exec "g/bar" -- first returns lines with foo, then with bar
:exec "g/foo" | :g/bar -- same as above
And of course if I cannot select a line, I cannot execute normal dd on it.
Any ideas?
Edit
Note for the bounty:
I'm looking for a solution that uses proper :g and :v commands, and does not use regex hacks, as the conditions may not be the same (I can have 2 includes, 3 excludes).
Also note that the last 2 examples of things that don't work, they do work for just deleting the lines, but they return incorrect information when I run them without deleting (ie, viewing the selected lines) and they behave as mentioned above.
I'm no vim wizard, but if all you want to do is "Delete lines which contain foo and do NOT contain bar" then this should do (I tried on your example file):
:v /bar/s/.*foo.*//
EDIT: actually this leaves empty lines behind. You probably want to add an optional newline to that second search pattern.
This might still be hackish to you, but you can write some vimscript to make a function and specialized command for this. For example:
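command! -nargs=* -range=% G <line1>,<line2>call MultiG(<f-args>)
fun! MultiG(...) range
  " Anchor at the start of the line so every look-ahead scans the whole line.
  let pattern = "^"
  let command = ""
  for i in a:000
    if i[0] == "-"
      " Exclude: the word must not appear anywhere on the line.
      let pattern .= "\\(.*\\<".strpart(i,1)."\\>\\)\\@!"
    elseif i[0] == "+"
      " Include: the word must appear somewhere on the line.
      let pattern .= "\\(.*\\<".strpart(i,1)."\\>\\)\\@="
    else
      " Anything else is the command to run on the matching lines.
      let command = i
    endif
  endfor
  exe a:firstline.",".a:lastline."g/".pattern."/".command
endfun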
This creates a command that allows you to automate the "regex hack". This way you could do
:G +foo -bar
to get all lines with foo and not bar. If an argument doesn't start with + or - then it is considered the command to add on to the end of the :g command. So you could also do
:G d +foo -bar
to delete the lines, or even
:G norm\ foXp +two\ foos -bar
if you escape your spaces. It also takes a range like :1,3G +etc, and you can use regex in the search terms but you must escape your spaces. Hope this helps.
This is where regular expressions get a bit cumbersome. You need to use the zero-width match \(search_string\)\@=. If you want to match a list of items in any order, the search_string should start with .* (so the match starts from the start of the line each time). To match a non-occurrence, use \@! instead.
I think these commands should do what you want (for clarity I am using # as the delimiter, rather than the usual /):
Select lines which contain foo and bar:
:g#\(.*foo\)\@=\(.*bar\)\@=
Select lines which contain foo, bar and baz
:g#\(.*foo\)\@=\(.*bar\)\@=\(.*baz\)\@=
Select lines which contain foo and do NOT contain bar
:g#\(.*foo\)\@=\(.*bar\)\@!
Delete lines which contain foo and bar
:g#\(.*foo\)\@=\(.*bar\)\@=#d
Delete lines which contain foo and do NOT contain bar
:g#\(.*foo\)\@=\(.*bar\)\@!#d
You won't achieve your requirements unless you're willing to use some regular expressions, since the expressions are what drive :global and its opposite :vglobal.
This is no hacking around but how the commands are supposed to work: they need an expression to work with. If you're not willing to use regular expressions, I'm afraid you won't be able to achieve it.
Answer terminates here if you're not willing to use any regular expressions.
Assuming that we are nice guys with an open mind, we need a regular expression that is true when a line contains foo and not bar.
Suggestion number 5 of Prince Goulash is quite there but doesn't work if foo occurs after bar.
This expression does the job (i.e. prints all the matching lines):
:g/^\(.*\<bar\>\)\@!\(.*\<foo\>\)\@=/
If you want to delete them, add the delete command:
:g/^\(.*\<bar\>\)\@!\(.*\<foo\>\)\@=/d
Description:
^ starting from the beginning of the line
\(.*\<bar\>\) the word bar
\@! must never appear
\(.*\<foo\>\)\@= but the word foo has to appear anywhere on the line
The two patterns could also be swapped:
:g/^\(.*\<foo\>\)\@=\(.*\<bar\>\)\@!/
yields the same results.
Tested with the following input:
01 foo
02 foo bar
03 foo bar baz
04 bar baz
05 foo baz
06 baz bar
07 bar
08 baz
09 foo 42
10 foo bar 42 baz
11 42 foo baz
12 42 foo bar
13 42 bar foo
14 baz 42
15 baz foo
16 bar foo
Regarding multiple includes/excludes:
Each exclude is made of the pattern
\(.*\<what_to_exclude\>\)\@!
Each include is made of the pattern
\(.*\<what_to_include\>\)\@=
To print all the lines that contain foo but not bar nor baz:
g/^\(.*\<bar\>\)\@!\(.*\<baz\>\)\@!\(.*\<foo\>\)\@=/
Print all lines that contain foo and 42 but neither bar nor baz:
g/^\(.*\<bar\>\)\@!\(.*\<baz\>\)\@!\(.*\<foo\>\)\@=\(.*\<42\>\)\@=/
The sequence of the includes and excludes is not important, you could even mix them:
g/^\(.*\<bar\>\)\@!\(.*\<42\>\)\@=\(.*\<baz\>\)\@!\(.*\<foo\>\)\@=/
One might think a combination like :g/foo/v/bar/d would work, but unfortunately this isn't possible, and you will have to resort to one of the proposed workarounds.
As described in the help, behind the scenes the :global command works in two stages,
first marking the lines on which to operate,
then performing the operation on them.
Out of interest, I had a look at the relevant parts in the Vim source: In ex_cmds.c, ex_global(), you will find that the global flag global_busy prevents repeated execution of the command while it is busy.
You want to employ a negative look ahead. This article gives more or less the specific example you are trying to achieve.
http://www.littletechtips.com/2009/09/vim-negative-match-using-negative-look.html
I changed it to
:g/foo\(.*bar\)\@!/d
Note that this only rejects a bar that comes after foo on the line.
Please let us know if you consider this a regex hack.
I will throw my hat in the ring. As Vim's documentation explicitly states that recursive global commands are invalid, and the regex solution gets pretty hairy quickly, I think this is a job for a custom function and command. I have created the :G command.
The usage is as :G followed by patterns surrounded by /. Any pattern that should not match is prefixed with a !.
:G /foo/ !/bar/ d
This will delete all lines that match /foo/ and does not match /bar/
:G /42 baz/ !/bar/ norm A$
This will append a $ to all lines matching /42 baz/ and that don't match /bar/
:G /foo/ !/bar/ !/baz/ d
This will delete all lines that match /foo/ and does not match /bar/ and does not match /baz/
The script for the :G command is below:
function! s:ManyGlobal(args) range
  let lnums = {}
  let patterns = []
  let cmd = ''
  let threshold = 0
  " One leading pattern: optional ! or v, then /.../ ending at an unescaped slash.
  let regex = '\m^\s*\(!\|v\)\=/.\{-}\%(\\\)\@<!/\s\+'
  let args = a:args
  while args =~ regex
    let pat = matchstr(args, regex)
    let pat = substitute(pat, '\m^\s*\ze/', '', '')
    call add(patterns, pat)
    let args = substitute(args, regex, '', '')
  endwhile
  " No command left after the patterns: default to printing with line numbers.
  if args =~ '^\s*$'
    let cmd = 'nu'
  else
    let cmd = args
  endif
  for p in patterns
    if p =~ '^\(!\|v\)'
      let op = '-'
    else
      let op = '+'
      let threshold += 1
    endif
    " Tally per line: +1 for each include that matches, -1 for each exclude.
    let marker = "let l:lnums[line('.')] = get(l:lnums, line('.'), 0)" . op . "1"
    exe a:firstline . "," . a:lastline . "g" . substitute(p, '^\(!\|v\)', '', '') . marker
  endfor
  let patterns = []
  for k in keys(lnums)
    if threshold == lnums[k]
      call add(patterns, '\%' . k . 'l')
    endif
  endfor
  " No line matched every include without hitting an exclude: leave the buffer alone.
  if empty(patterns)
    return
  endif
  exe a:firstline . "," . a:lastline . "g/\\m" . join(patterns, '\|') . "/ " . cmd
endfunction
command! -nargs=+ -range=% G <line1>,<line2>call <SID>ManyGlobal(<q-args>)
The function basically parses out the arguments then goes and marks all matching lines with each given pattern separately. Then executes the given command on each line that is marked the proper amount of times.
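The tallying idea is independent of Vim, so it can be illustrated outside the editor. The following is only a rough awk sketch of the same scoring scheme, not a port of the function above: the sample lines and the variable names (includes, excludes, marked) are assumptions made for the example. Each include that matches adds one to a line's score, each exclude that matches subtracts one, and a line is selected only when its score equals the number of includes.

```shell
# Hypothetical sample, loosely based on the question's file.
sample='foo
foo bar
foo 42
foo bar 42 baz
baz 42'

# includes/excludes are space-separated lists of patterns (names chosen for this sketch).
marked=$(printf '%s\n' "$sample" | awk -v includes='foo 42' -v excludes='bar' '
BEGIN {
    n_inc = split(includes, inc, " ")
    n_exc = split(excludes, exc, " ")
}
{
    score = 0
    for (i = 1; i <= n_inc; i++) if ($0 ~ inc[i]) score++
    for (i = 1; i <= n_exc; i++) if ($0 ~ exc[i]) score--
    # Selected only if every include matched and no exclude did.
    if (score == n_inc) print NR ": " $0
}')

printf '%s\n' "$marked"
```

With this sample, only the third line (foo 42) contains both includes and no exclude, so it is the only one printed.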
All right, here's one which actually simulates recursive use of global commands. It allows you to combine any number of :g commands, at least theoretically. But I warn you, it isn't pretty!
Solution to the original problem
I use the Unix program nl (bear with me!) to insert line numbers, but you can also use pure Vim for this.
:%!nl -b a
:exec 'norm! qaq'|exec '.,$g/foo/d A'|exec 'norm! G"apddqaq'|exec '.,$v/bar/d'|%sort|%s/\v^\s*\d+\s*
Done! Let's see the explanation and general solution.
General solution
This is the approach I have chosen:
Introduce explicit line numbering
Use the end of the file as a scratch space and operate on it repeatedly
Sort the file, remove the line numbering
Using the end of the file as a scratch space (:g/foo/m$ and similar) is a pretty well-known trick (you can find it mentioned in the famous answer number one). Also note that :g preserves relative ordering of the lines – this is crucial. Here we go:
Preparation: Number lines, clear "accumulator" register a.
:%!nl
qaq
The iterative bit:
:execute global command, collect matching lines by appending them into the accumulator register with :d A.
paste the collected lines at the end of the file
repeat for range .,$ (the scratch space, or in our case, the "match" space)
Here's an extended example: delete lines which do contain 'foo', do not contain 'bar', do contain '42' (just for the demonstration).
:exec '.,$g/foo/d A' | exec 'norm! G"apddqaq' | exec '.,$v/bar/d A' | exec 'norm! G"apddqaq' | exec '.,$g/42/d A' | exec 'norm! G"apddqaq'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(this is the repeating bit)
When the iterative bit ends, the lines .,$ contain the matches for your convenience. You can delete them (dVG) or whatever.
Cleanup: Sort, remove line numbers.
:%sort
:%s/\v^\s*\d+\s*
I'm sure other people can improve on the details of the solution, but if you absolutely need to combine multiple :gs and :vs into one, this seems to be the most promising solution.
The built-in solutions look very complex.
One easy way would be to use LogiPat plugin:
Doc: http://www.drchip.org/astronaut/vim/doc/LogiPat.txt.html
Plugin: http://www.drchip.org/astronaut/vim/index.html#LOGIPAT
With this, you can easily search for patterns.
For e.g, to search for lines containing foo, and not bar, use:
:LogiPat "foo"&!("bar")
This would highlight all the lines matching the logical pattern (if you have set hls).
That way you can cross-check whether you got the correct lines, and then traverse with 'n', and delete with 'dd', if you wish.
I realize you explicitly stated that you want solutions using :g and :v, but I firmly believe this is a perfect example of a case where you really should use an external tool.
:%!awk '\!/foo/ || /bar/'
There's no need to re-invent the wheel.
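To see what the awk condition does outside Vim, here is a small sketch on a few lines taken from the question's file (the variable names sample, selected and remaining are made up for the example). The first filter selects the lines with foo and without bar; the second is its negation, which is what the :%!awk command above uses to delete them. Inside Vim the ! has to be written as \! because the :! command line expands a bare !; in a plain shell script it does not.

```shell
# A few lines from the question's sample file.
sample='foo
foo bar
bar baz
foo baz
foo bar 42 baz
baz 42'

# Select: lines that contain foo and do NOT contain bar.
selected=$(printf '%s\n' "$sample" | awk '/foo/ && !/bar/')

# Delete: keep every other line (the negation of the condition above).
remaining=$(printf '%s\n' "$sample" | awk '!/foo/ || /bar/')

printf 'selected:\n%s\n' "$selected"
printf 'remaining:\n%s\n' "$remaining"
```

More includes or excludes can be added by chaining further /pattern/ and !/pattern/ terms with &&.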
Select lines which contain foo and do NOT contain bar
Delete lines which contain foo and do NOT contain bar
This can be done by combining global and substitute commands:
:v/bar/s/.*foo.*//g
