I have a file with occurrences of the form _[number].htm, for example _43672151820.htm
How can I remove all occurrences of strings matching this pattern?
Substitute substring
Use this regular expression with the substitution command %s
:%s/_\d\+\.htm//g
Explanation (adapted from regex101.com to Vim's regex syntax):
_ matches the character _ (ASCII 95 decimal, 5F hex, 137 octal) literally (case sensitive)
\d matches a digit (equivalent to [0-9])
\+ matches the preceding atom one or more times (in Vim's default regex mode \+ is the quantifier; an unescaped + would be a literal plus)
\. matches the character . (ASCII 46 decimal, 2E hex, 56 octal) literally (case sensitive)
htm matches the characters htm literally (case sensitive)
Global pattern flags
g flag: global. Replaces every match on each line (not only the first match)
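For readers who want to check the pattern outside Vim, here is a minimal sketch of the same substitution in Python. It is only an illustration, not part of the Vim answer: Python's re module writes the quantifier as a bare +, and the helper name strip_htm_tokens and the sample text are made up for the example.

```python
import re

# Python spelling of the Vim pattern _\d\+\.htm (the quantifier is a bare +).
PATTERN = re.compile(r"_\d+\.htm")

def strip_htm_tokens(text: str) -> str:
    """Remove every _<digits>.htm occurrence from text (hypothetical helper)."""
    return PATTERN.sub("", text)

sample = "see _43672151820.htm and page_12.htm here"
print(strip_htm_tokens(sample))
```

Like the g flag in Vim, sub() replaces every match, not only the first one.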
Substitute word
The above regular expression will also match inside a longer word, for instance _123.htm in ab_123.htm. If you want to match only a whole word, use Vim's word boundaries \< and \>:
:%s/\<_\d\+\.htm\>//g
(see In Vim, how do you search for a word boundary character, like the \b in regexp?)
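The effect of the boundaries can be sketched in Python, with the caveat that this is an approximation: Python's \b is defined in terms of \w, while Vim's \< and \> depend on the 'iskeyword' option. The variable names and the sample line are assumptions made for the illustration.

```python
import re

# Unanchored pattern versus a word-bounded one (\b stands in for \< and \>).
plain = re.compile(r"_\d+\.htm")
bounded = re.compile(r"\b_\d+\.htm\b")

line = "ab_123.htm _456.htm"

# The plain pattern also removes the part glued to "ab".
print(repr(plain.sub("", line)))
# The bounded pattern only removes the standalone token.
print(repr(bounded.sub("", line)))
```

Since _ counts as a word character, there is no boundary between b and _ in ab_123.htm, so the bounded pattern leaves that token alone.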
In Node.js, I have the following pattern
/^(>|<|>=|<=|!=|%)?[a-z0-9 ]+?(%)*$/i
to match only alphanumeric strings, with an optional prefix and suffix made of some special characters. And it's working just fine.
Now I want to match the last '%' only if the first character is alphanumeric (case insensitive) or also a %. In that case it is optionally allowed; otherwise it should not match.
Example:
Should match:
>test
!=test
<test
>=test
<=test
%test
test
%test%
test%
Example which should not match:
<test% <-- it's now matching, which is not correct
<test< <-- it's now **not** matching, which is correct
Any ideas?
You can add a negative lookahead after ^ like
/^(?![^a-z\d%].*%$)(?:[><]=?|!=|%)?[a-z\d ]+%*$/i
^^^^^^^^^^^^^^^^^
See the regex demo. Details:
^ - start of string
(?![^a-z\d%].*%$) - fail the match if there is a char other than alphanumeric or % at the start and % at the end
(?:[><]=?|!=|%)? - optionally match <, >, <=, >=, != or %
[a-z\d ]+ - one or more alphanumeric or space chars
%* - zero or more % chars
$ - end of string
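A quick way to check the lookahead version against the examples from the question is a small Python script. This is a sketch: the pattern is copied unchanged (this particular syntax is the same in JavaScript and Python), and the list names are made up for the test.

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the /i flag.
LOOKAHEAD = re.compile(
    r"^(?![^a-z\d%].*%$)(?:[><]=?|!=|%)?[a-z\d ]+%*$",
    re.IGNORECASE,
)

should_match = [">test", "!=test", "<test", ">=test", "<=test",
                "%test", "test", "%test%", "test%"]
should_fail = ["<test%", "<test<"]

for s in should_match + should_fail:
    print(f"{s!r:10} -> {bool(LOOKAHEAD.match(s))}")
```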
You might use an alternation | to match either one of the options.
^(?:[a-z0-9%](?:[a-z0-9 ]*%)?|(?:[<>]=?|!=|%)?[a-z0-9 ]+)$
^ Start of string
(?: Non capture group
[a-z0-9%] Match one of the listed in the character class
(?:[a-z0-9 ]*%)? Optionally match repeating 0+ times any of the character class followed by %
| Or
(?:[<>]=?|!=|%)? Optionally match one of the alternatives
[a-z0-9 ]+ Match 1+ times any of the character class
) Close non capture group
$ End of string
Regex demo
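The alternation version can be checked the same way. In this Python sketch the case-insensitive flag is an assumption carried over from the question (the pattern above is shown without flags), and the list names are invented for the test.

```python
import re

# Alternation pattern from above, compiled case-insensitively (assumed).
ALTERNATION = re.compile(
    r"^(?:[a-z0-9%](?:[a-z0-9 ]*%)?|(?:[<>]=?|!=|%)?[a-z0-9 ]+)$",
    re.IGNORECASE,
)

should_match = [">test", "!=test", "<test", ">=test", "<=test",
                "%test", "test", "%test%", "test%"]
should_fail = ["<test%", "<test<"]

for s in should_match + should_fail:
    print(f"{s!r:10} -> {bool(ALTERNATION.match(s))}")

# This pattern allows at most one trailing %, so a doubled % is rejected.
print(bool(ALTERNATION.match("test%%")))
```

One difference worth knowing: the original pattern used (%)* and so accepted several trailing % characters, while this alternation accepts only one.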
I have a data file like this:
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter $DATAROOT/randompathwithoutanypattern randomthingsafter
randomthingsbefore $DATAROOT/randompathwithoutanypattern randomthingsafter
(...)
I want to delete the substring $DATAROOT from each path and add blank spaces after the path to keep the columns where randomthingsafter started. Notice that there could be 2 or more paths with the $DATAROOT substring in the same line. This way, my desired output would look like this:
randomthingsbefore /randompathwithoutanypattern randomthingsafter
randomthingsbefore /randompathwithoutanypattern randomthingsafter /randompathwithoutanypattern randomthingsafter
randomthingsbefore /randompathwithoutanypattern randomthingsafter
(...)
I've tried:
VAR1=*pathtofile*
VAR2=$(\grep -oP '\$DATAROOT\K[^ ]*' $VAR1)
arr=$(echo $VAR2 | tr " " "\n")
for x in $arr
do
y="${x} "
sed -i "s:$x:$y:" $VAR1
done
sed -i 's/$DATAROOT\///g' $VAR1
but it does not seem to work. Thank you for your help!
I believe the easiest approach is to replace your script with a single sed command:
sed 's/$DATAROOT\([^[:blank:]]*\)/\1         /g' /path/to/file
Note that there are 9 spaces after \1, which is the length of the string $DATAROOT. Here we make use of what is known as a back-reference:
Editing Commands in sed
[2addr]s/BRE/replacement/flags:
Substitute the replacement string for instances of the BRE in the pattern space. Any character other than <backslash> or <newline> can be used instead of a <slash> to delimit the BRE and the replacement. Within the BRE and the replacement, the BRE delimiter itself can be used as a literal character if it is preceded by a <backslash>.
The replacement string shall be scanned from beginning to end. An <ampersand> ( & ) appearing in the replacement shall be replaced by the string matching the BRE. The special meaning of & in this context can be suppressed by preceding it by a <backslash>. The characters \n, where n is a digit, shall be replaced by the text matched by the corresponding back-reference expression. If the corresponding back-reference expression does not match, then the characters \n shall be replaced by the empty string. The special meaning of \n where n is a digit in this context, can be suppressed by preceding it by a <backslash>. For each other <backslash> encountered, the following character shall lose its special meaning (if any).
source: POSIX SED
9.3.6 BREs Matching Multiple Characters
The back-reference expression \n shall match the same (possibly empty) string of characters as was matched by a subexpression enclosed between \( and \) preceding the \n. The character n shall be a digit from 1 through 9, specifying the nth subexpression (the one that begins with the nth \( from the beginning of the pattern and ends with the corresponding paired \) ). The expression is invalid if less than n subexpressions precede the \n. The string matched by a contained subexpression shall be within the string matched by the containing subexpression. If the containing subexpression does not match, or if there is no match for the contained subexpression within the string matched by the containing subexpression, then back-reference expressions corresponding to the contained subexpression shall not match. When a subexpression matches more than one string, a back-reference expression corresponding to the subexpression shall refer to the last matched string. For example, the expression ^\(.*\)\1$ matches strings consisting of two adjacent appearances of the same substring, and the expression \(a\)*\1 fails to match a, the expression \(a\(b\)*\)*\2 fails to match abab, and the expression ^\(ab*\)*\1$ matches ababbabb, but fails to match ababbab.
source: POSIX Basic Regular Expressions
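To see why the padding keeps the columns aligned, here is a minimal Python sketch of the same idea: capture the path after $DATAROOT, then emit it followed by as many spaces as the removed token had characters. The function name strip_dataroot and the sample line are assumptions for the illustration; the sed command above remains the actual answer.

```python
import re

TOKEN = "$DATAROOT"
PAD = " " * len(TOKEN)  # 9 spaces, one per removed character

# Literal token followed by a captured run of non-blank characters,
# mirroring \([^[:blank:]]*\) in the sed expression.
PATTERN = re.compile(re.escape(TOKEN) + r"([^ \t]*)")

def strip_dataroot(line: str) -> str:
    """Drop the token and pad after the path so later columns stay put."""
    return PATTERN.sub(lambda m: m.group(1) + PAD, line)

line = "before $DATAROOT/some/path after $DATAROOT/other end"
out = strip_dataroot(line)
print(line)
print(out)
```

Because each removed token is replaced by the same number of spaces, the line length is unchanged and everything after each path stays in its original column.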
I would like to use vim's substitute function (:%s) to search and replace a certain pattern of code. For example if I have code similar to the following:
if(!foo)
I would like to replace it with:
if(foo == NULL)
However, foo is just an example. The variable name can be anything.
This is what I came up with for my vim command:
:%s/if(!.*)/if(.* == NULL)/gc
It searches the statements correctly, but it tries to replace it with ".*" instead of the variable that's there (i.e "foo"). Is there a way to do what I am asking with vim?
If not, is there any other editor/tools I can use to help me with modifications like these?
Thanks in advance!
You need to use capture grouping and backreferencing in order to achieve that:
      Pattern    String sub.  flags
    |---------| |------------| |-|
:%s/if(!\(.*\))/if(\1 == NULL)/gc
        |---|     |--|
          |        ^
          |________|
The matched string in pattern will be exactly repeated in string substitution
:help /\(
\(\) A pattern enclosed by escaped parentheses. /\(/\(\) /\)
E.g., "\(^a\)" matches 'a' at the start of a line.
E51 E54 E55 E872 E873
\1 Matches the same string that was matched by /\1 E65
the first sub-expression in \( and \). {not in Vi}
Example: "\([a-z]\).\1" matches "ata", "ehe", "tot", etc.
\2 Like "\1", but uses second sub-expression, /\2
... /\3
\9 Like "\1", but uses ninth sub-expression. /\9
Note: The numbering of groups is done based on which "\(" comes first
in the pattern (going left to right), NOT based on what is matched
first.
You can use
:%s/if(!\(.*\))/if(\1 == NULL)/gc
By putting .* inside \( \) you create a numbered capture group, which means that the regex remembers whatever .* matched.
In the replacement, \1 then inserts the text captured by that group.
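The same capture-and-reuse idea can be tried outside Vim. Below is a small Python sketch; note that the escaping is reversed compared with Vim's default mode: in Python a bare ( ) forms a group and \( \) are literal parentheses. The sample string is just the example from the question.

```python
import re

code = "if(!foo)"

# \( and \) are literal parentheses here; the bare (.*) is capture group 1.
result = re.sub(r"if\(!(.*)\)", r"if(\1 == NULL)", code)
print(result)
```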
A macro is easy in this case, just do the following:
qa .............. starts macro 'a'
f! .............. jumps to next '!'
x ............... erase that
e ............... jump to the end of word
a ............... starts append mode (insert)
== NULL ........ literal == NULL
<ESC> ........... stop insert mode
q ............... stops macro 'a'
:%norm @a ........ apply macro 'a' to every line in the file
:g/^if(!/ norm @a  apply macro 'a' to the lines starting with if(!
Try the following:
:%s/if(!\(.\{-}\))/if(\1 == NULL)/gc
The pattern .\{-} matches any sequence of characters, as few as possible (it is the non-greedy counterpart of .*), so the match stops at the first closing parenthesis.
The parentheses \( and \) are used to divide the searched expression into subexpressions, so that you can use those subgroups in the substitute string.
Finally, \1 allows the user to use the first matched subexpression, in our case whatever is caught inside the parentheses.
I hope this is more clear, more information can be found here. And thanks for the comment that suggests improving the answer.
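The difference between the greedy and the non-greedy form shows up as soon as a line contains a second closing parenthesis. Here is a sketch in Python, where .*? plays the role of Vim's .\{-}; the sample line is made up for the illustration.

```python
import re

line = "if(!foo) bar(x)"

# Greedy: .* runs to the LAST ")" on the line.
greedy = re.sub(r"if\(!(.*)\)", r"if(\1 == NULL)", line)
# Non-greedy: .*? stops at the FIRST ")".
lazy = re.sub(r"if\(!(.*?)\)", r"if(\1 == NULL)", line)

print(greedy)
print(lazy)
```

Only the non-greedy version rewrites the condition and leaves the rest of the line alone.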
How could I use string.gmatch(text, pattern) to do this:
text = "Hello.%23 Awesome7^.."
pattern = --what to put here?
for word in string.gmatch(text, pattern) do
print(word)
end
--Result
>test
Hello.%23
Awesome7^..
>
I have been using "%w+%p", but this results in:
>test
Hello.
%
23
Awesome7^
.
.
Which is not the desired result.
Note: I have not tested this exact string, it could vary... but still, does not create the desired result
From your example, no word contains spaces and the words are separated by spaces, so the simplest pattern is "%S+":
text = "Hello.%23 Awesome7^.."
pattern = "%S+"
for word in string.gmatch(text, pattern) do
print(word)
end
"%s" matches a whitespace character, "%S" matches a non-whitespace character.
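For comparison, the same "runs of non-whitespace" idea in Python looks like this. It is only a cross-language sketch: Python's \S corresponds roughly to Lua's %S, and the variable names are taken from the question.

```python
import re

text = "Hello.%23 Awesome7^.."

# \S+ grabs each maximal run of non-whitespace characters.
words = re.findall(r"\S+", text)
for word in words:
    print(word)
```

Splitting on whitespace avoids having to enumerate which punctuation characters may appear inside a word.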
Specifically, I'd like to detect lines that have a '+' character either in the first column, or in the second position right after a '*' character.
How about this pattern?
/^\*\=+
^ matches the start of line
\* matches the star character
\= makes the preceding atom optional (it matches 0 or 1 times)
+ matches the ... plus.
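The behaviour of the pattern can be checked with a short Python sketch. The syntax differs from Vim's: the optional marker is ? instead of \=, and the plus has to be escaped as \+ to be literal. The sample lines are invented for the test.

```python
import re

# Start of line, an optional literal *, then a literal +.
PLUS_LINE = re.compile(r"^\*?\+")

lines = ["+ first", "*+ second", "* third", "a+ fourth", " + indented"]
flagged = [l for l in lines if PLUS_LINE.match(l)]
print(flagged)
```

Only lines with + in the first column, or in the second column right after a *, are kept.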