Replace part of regular expression and keep prefix/postfix - linux

I was trying to replace some strings in a file using sed and came across an issue.
I have the following strings:
TEMPLATE_MODULE
TEMPLATE_SOME_ERR
TEMPLATE_MORE_ERR
I would like to replace TEMPLATE_MODULE with some string and all strings that start with TEMPLATE and end with ERR with a different string, as follows:
TEMPLATE_MODULE ---> NEW_MODULE_NAME
TEMPLATE_SOME_ERR ---> NEW_MODULE_NAME_SOME_ERR
TEMPLATE_MORE_ERR ---> NEW_MODULE_NAME_MORE_ERR
The replacement of TEMPLATE_MODULE is easy:
find . -type f -print -exec sed -i "s/TEMPLATE_MODULE/NEW_MODULE_NAME/g" {} +
Though I don't know how to handle the other part. If I look for strings starting with TEMPLATE_* , I would also catch TEMPLATE_MODULE.
I also want to keep the SOME_ERR or MORE_ERR postfix so this solution would not work:
find . -type f -print -exec sed -i "s/TEMPLATE_.*_ERR$/NEW_MODULE_NAME/g" {} +
Any ideas?
Thanks!

Consider this sample input
$ cat ip.txt
foo
TEMPLATE_MODULE
TEMPLATE_SOME_ERR
TEMPLATE_MORE_ERR
TEMPLATE_SOME_ERR xyz
Use multiple s commands and capture groups
$ sed -E 's/\bTEMPLATE_MODULE\b/NEW_MODULE_NAME/g; s/\bTEMPLATE\w*(_(SOME|MORE)_ERR)\b/NEW_MODULE_NAME\1/g' ip.txt
foo
NEW_MODULE_NAME
NEW_MODULE_NAME_SOME_ERR
NEW_MODULE_NAME_MORE_ERR
NEW_MODULE_NAME_SOME_ERR xyz
\b is for word boundaries. \bcat\b will match only cat and won't match scat or cater
s/\bTEMPLATE_MODULE\b/NEW_MODULE_NAME/g will replace TEMPLATE_MODULE with NEW_MODULE_NAME
s/\bTEMPLATE\w*(_(SOME|MORE)_ERR)\b/NEW_MODULE_NAME\1/g will replace TEMPLATE followed by zero or more word characters ending with _SOME_ERR or _MORE_ERR with NEW_MODULE_NAME and the captured string
Solution is for GNU sed, not sure about portability with other implementations
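If the _ERR names are not limited to SOME and MORE, the second substitution can be loosened. The following is only a sketch, assuming GNU sed and assuming every target has the form TEMPLATE_<word>_ERR; the function name rename_template is made up for illustration.

```shell
# Hypothetical helper: reads stdin, writes the renamed text to stdout.
# Assumes GNU sed (\b and \w are GNU extensions).
rename_template() {
  sed -E -e 's/\bTEMPLATE_MODULE\b/NEW_MODULE_NAME/g' \
         -e 's/\bTEMPLATE(_\w+_ERR)\b/NEW_MODULE_NAME\1/g'
}

# Dry run on sample lines; no file is modified.
printf '%s\n' foo TEMPLATE_MODULE TEMPLATE_OTHER_ERR 'TEMPLATE_SOME_ERR xyz' | rename_template
```

Once the output looks right, the same two expressions can be passed to sed -i inside the find command from the question.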

Assuming that "TEMPLATE_MODULE" and "TEMPLATE_" are literal, but "SOME_ERR" and "MORE_ERR" are placeholders, the following seems possible.
sed "s/TEMPLATE\(_MODULE\)*/NEW_MODULE_NAME/"
I recommend playing with this as a bare sed line first; then, once it works, integrate it into your command line.
I think this code does not make more assumptions on occurrences and spellings than your own code.
However, you probably want to use "anchoring" in order to not accidentally replace "SOMEOTHERTEMPLATE_" etc.
To start you off on that here is a modified version:
sed "s/\bTEMPLATE\(_MODULE\)*/NEW_MODULE_NAME/"
I recommend trying that with:
TEMPLATE_MODULE
TEMPLATE_SOME_ERR
TEMPLATE_MORE_ERR
OTHER_TEMPLATE_MODULE
OTHER_TEMPLATE_SOME_ERR
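A dry run over those five lines could look like the sketch below. It assumes GNU sed, since \b is a GNU extension, and the function name anchored_rename is only for illustration.

```shell
# Hypothetical helper wrapping the anchored substitution; works on stdin only.
anchored_rename() {
  sed 's/\bTEMPLATE\(_MODULE\)*/NEW_MODULE_NAME/'
}

printf '%s\n' \
  TEMPLATE_MODULE \
  TEMPLATE_SOME_ERR \
  TEMPLATE_MORE_ERR \
  OTHER_TEMPLATE_MODULE \
  OTHER_TEMPLATE_SOME_ERR |
  anchored_rename
```

The first three lines come out renamed, while the two OTHER_ lines stay untouched: _ counts as a word character, so there is no word boundary in front of TEMPLATE there.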

Related

grep: finding a string that starts and ends with a specific letter in directory

I am teaching myself commands and different ways to use grep. I know how to search for a string in a directory and its sub directories, but I am confused when it comes to searching for a split in the string.
For example, how could I search for all words (of varying length) that start with a and end with e, so that I could find ape or apple in text files?
EDIT UPDATE:
I am not sure which grep version I am using, but I tried:
grep -nr "a[A-Za-z]*e"
This finds the wanted words, such as ape and apple, but it also matches apes, which is NOT wanted.
Simply:
grep '\ba\w*e\b'
or
grep --color '\ba\w*e\b'
or
grep -rn '\ba\w*e\b'
Some explanations
As this question is tagged linux, this answer uses GNU grep: grep (GNU grep) 2.27.
The result of command man grep | grep -3 '\\b':
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the
beginning and end of a word. The symbol \b matches the empty string at
the edge of a word, and \B matches the empty string provided it's not
at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and
\W is a synonym for [^_[:alnum:]].
Let me show you:
\b means the edge of a word
\w means [_[:alnum:]]
a and e are letters
As you may already know, * means "The preceding item will be matched zero or more times." (elsewhere in the same man page: man grep | grep '^ *\*' ;)
... and finally... This could be written:
grep '\<a\w*e\>'
where
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
This has nearly the same effect, but its description corresponds strictly to the title of this question: grep: finding a string that starts and ends with a specific letter in directory
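To compare the two spellings, here is a small sketch on made-up sample words, one per line. It assumes GNU grep, since \b, \w, \< and \> are GNU extensions; sample_words is a hypothetical helper.

```shell
# Hypothetical sample data: some words that should match and some that should not.
sample_words() { printf '%s\n' ape apple apes grape 'an axe'; }

# Both commands print the lines that contain a whole word starting with a and ending with e.
sample_words | grep '\ba\w*e\b'
sample_words | grep '\<a\w*e\>'
```

Both commands print ape, apple and "an axe". apes fails because the word does not end in e, and grape fails because the word does not start with a.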
I suppose you could use:
find . -type f -name '*.txt' -exec cat {} \; | grep 'a[A-Za-z]\+e'
That should cat any .txt files under the current directory, recursively, and grep for "a... one or more letters... e".
The [A-Za-z] matches a letter of either case; the \+ says "one or more of them".
I think that's what you're after?
Edit:
Word boundaries:
find . -type f -name '*.txt' -exec cat {} \+ | grep '\ba[A-Za-z]\+e\b'
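The difference between * (used in the question) and \+ (used here) shows up on a word with nothing between the a and the e. A minimal sketch, assuming GNU grep, where \+ is accepted in basic regular expressions; -x makes the whole line have to match:

```shell
# "ae" has no letter between a and e; "ape" has one.
printf '%s\n' ae ape | grep -x 'a[A-Za-z]*e'     # * allows zero letters in between
printf '%s\n' ae ape | grep -x 'a[A-Za-z]\+e'    # \+ requires at least one
```

The first command prints both words, the second only ape.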
As alluded to in various comments, it is possible to do this using POSIX standard grep -E, but it is not all that notationally convenient.
I used a script file grep-ape.sh containing:
grep -E -e '(^|[^[:alpha:]])a[[:alpha:]]+e($|[^[:alpha:]])' "$@"
The -E enables extended regular expressions. The -e is optional, but allows me to add extra options as 'file names' after the regular expression. The regular expression looks for either 'start of line' or a non-alpha character, followed by a, one or more additional alpha characters, an e and either 'end of line' or a non-alpha character.
Given the data file (called, unimaginatively, data):
I want to tape the apes that ate the grapes.
ape at the start.
Ending with ape
Situating ape in the middle
And an apple too.
But not apples, no way.
The tape ran out.
The apes ran out.
The grapes ran out.
They ate them.
I could run grep-ape.sh -n data (demonstrating the usefulness of the -e option, though GNU systems will permute options so you don't necessarily spot the problem), and got:
1:I want to tape the apes that ate the grapes.
2:ape at the start.
3:Ending with ape
4:Situating ape in the middle
5:And an apple too.
10:They ate them.
Using a non-POSIX option -o (supported by GNU and BSD versions of grep) to print only what is matched, I can get the output:
$ grep-ape.sh -n -o data
1: ate
2:ape
3: ape
4: ape
5: apple
10: ate
$
This shows that the regular expression is picking up the acceptable words, even on lines where there are words that would not be acceptable when not in the company of words that are acceptable.
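If only the matching words are needed and -o is not available, one possible alternative is to split the text into one word per line first and then match whole lines. This is a sketch rather than part of the answer above; the function name words_a_to_e is made up, and line numbers are lost in the process.

```shell
# Hypothetical helper: print every word that starts with a and ends with e.
# tr turns each run of non-letters into a single newline (one word per line);
# grep -x then requires the whole line, i.e. the whole word, to match.
words_a_to_e() {
  tr -cs '[:alpha:]' '\n' | grep -E -x 'a[[:alpha:]]+e'
}

printf '%s\n' 'The apes ate an apple, not apples.' | words_a_to_e
```

For that sample sentence this prints ate and apple, each on its own line.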

Linux - How can I replace some string with the same string enclosed in braces?

I have some files in a dir; when I grep for some string, I get results like those below.
scripts/FileReplace> grep -r "case" *
dir1/file2:case 'a'
dir2/file3:case "ssss"
file1:case 1
After I run the replace command, I expect the strings in the files to be updated as below:
CASE ('a')
CASE ("ssss")
CASE (1)
i.e., "case" is replaced with "CASE" and the text after the space is enclosed in braces as above.
Any suggestions on how I can do this with a shell command?
You can use sed and its substitution:
find . -type f -exec sed -i 's/case \(.\+\)/CASE (\1)/' {} +
.\+ matches anything that has at least one character.
\(...\) creates a capture group, you can reference the first capture group as \1.
Running with -i~ instead of -i will create backups of the files; this is recommended, especially if you're just experimenting.
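Before touching any files, the substitution can be previewed on standard input. A minimal sketch: the function name wrap_case is made up, and ..* ("one character, then any number more") is used as a portable spelling of .\+, which is a GNU extension.

```shell
# Hypothetical helper: apply the substitution to stdin, leaving files alone.
wrap_case() {
  sed 's/case \(..*\)/CASE (\1)/'
}

printf '%s\n' "case 'a'" 'case "ssss"' 'case 1' | wrap_case
```

This prints CASE ('a'), CASE ("ssss") and CASE (1), matching the output the question asks for.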

How to replace string in files recursively via sed or awk?

I would like to know how to search from the command line for a string in various files of type .rb.
And replace:
.delay([ANY OPTIONAL TEXT FOR DELETION]).
with
.delay.
Besides sed and awk, are there any other command line tools included in the OS that are better for the task?
Status
So far I have the following regular expression:
.delay\(*.*\)\.
I would like to know how to match only the expression ending on the first closing parenthesis? And avoid replacing:
.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)
Thanks in advance!
If you need to find and replace text in files, sed seems to be the best command-line solution.
Search for a string in the text file and replace:
sed -i 's/PATTERN/REPLACEMENT/' file.name
Or, if you need to replace multiple occurrences of PATTERN on the same line, add the g flag:
sed -i 's/PATTERN/REPLACEMENT/g' file.name
To process multiple files, pipe the list of files to sed via xargs:
echo "${filesList}" | xargs sed -i ...
You can use find to generate your list of files, and xargs to run sed over the result:
find . -type f -print | xargs sed -i 's/\.delay.*/.delay./'
find will generate a list of files contained in your current directory (., although you can of course pass a different directory), xargs will read that list and then run sed with the list of files as an argument.
Instead of find, which here generates a list of all files, you could use something like grep to generate a list of files that contain a specific term. E.g.:
grep -rl '\.delay' | xargs sed -i ...
For the part of the question where you want to only match and replace until the first ) and not include a second pair of (), here is how to change your regex:
.delay\(*.*\)\.
->
\.delay\([^\)]*\)
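I.e. match "literal dot, delay, opening parenthesis, anything but a closing parenthesis, and a closing parenthesis".
E.g. using sed (the echo argument is quoted so that the shell does not interpret the parentheses):
$ echo '.delay([ANY OPTIONAL TEXT FOR DELETION]).sometext(param)' | sed -E "s/\.delay\([^\)]*\)/.delay/"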
.delay.sometext(param)
I recommend using grep to find the right files:
grep -rl --include "*.rb" '\.delay' .
Then feed the list into xargs, as recommended by other answers.
Credits to the other answers for providing a solution for feeding multiple files into sed.
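Put together, the pieces could look like the sketch below. It runs in a throwaway directory with made-up file names and contents, and it assumes the GNU variants of grep (-Z, --include), xargs (-0, -r) and sed (-i).

```shell
# Throwaway directory so the demo touches no real files.
demo_dir=$(mktemp -d)
printf '%s\n' 'job.delay(run_at: later).perform(x)' > "$demo_dir/job.rb"
printf '%s\n' 'job.delay(now).perform(x)' > "$demo_dir/notes.txt"

# List the .rb files containing ".delay(", NUL-separated so unusual file names
# survive, and edit them in place. -r skips sed entirely if nothing matched.
grep -rlZ --include='*.rb' '\.delay(' "$demo_dir" |
  xargs -0 -r sed -E -i 's/\.delay\([^)]*\)/.delay/g'

cat "$demo_dir/job.rb"      # the .rb file has been rewritten
cat "$demo_dir/notes.txt"   # the .txt file is left alone
```

Because [^)]* stops at the first closing parenthesis, the trailing .perform(x) call is preserved.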

How to specify an "or" in sed

I have a file having data in the following form
<A/Here> <A/There>
<B/SomeMoreDate> <C/SomeOtherDate>
Now I want to delete all the A,B,C from the file in an efficient way. I know I can use sed for one pattern
sed -i 's/A//g' /path/to/filename.
But how do I specify an "or" so that sed deletes all the patterns?
The expected output is:
<Here> <There>
<SomeMoreDate> <SomeOtherDate>
You can use sed -i 's/[ABC]//g' /path/to/filename. [ABC] will match either A or B or C. You may find this reference useful.
If you're using GNU sed, you can say:
sed -r 's#(A|B|C)/##g' filename
The following should work otherwise:
sed 's#A/##g;s#B/##g;s#C/##g' filename
Ivaylo Strandjev's answer is correct in that it solves the problem when wanting to match single characters. There is a way though to have or when matching longer strings.
s/\(\(stringA\)\|\(stringB\)\|\(stringC\)\)something/something else/
You can try something like:
echo stringBsomething | sed -e 's/\(stringA\|stringB\|stringC\)something/something else/'
It is sad that sed requires all these backslashes. Some of this is avoided if you use -r.
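For comparison, here is a sketch of the same alternation written with extended regular expressions. It uses -E, which GNU sed accepts as a synonym for -r and which BSD sed also supports; the second command applies the idea to the sample data from the question.

```shell
# With -E, ( | ) need no backslashes.
echo stringBsomething | sed -E 's/(stringA|stringB|stringC)something/something else/'

# Remove the A/, B/ or C/ that directly follows an opening angle bracket.
printf '%s\n' '<A/Here> <A/There>' '<B/SomeMoreDate> <C/SomeOtherDate>' |
  sed -E 's#<(A|B|C)/#<#g'
```

The first command prints "something else"; the second prints the two lines of expected output from the question.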
sed "s/<[ABC]\//</g" /path/to/filename
This works because it is a special case: the part of the pattern that varies is a single character. It is not a real OR.
You can use this workaround when limited to POSIX sed.
Sample for test purpose
echo "<Pat1/ is pattern 2> <pat2/ is pattern 2>
<pAt3/ is pattern 3>
<pat4/ is pattern 4> but not avalaible for Pat1/ nor <pat2
" | \
The sed part
sed 's/²/²o/g
t myor
:myor
s/<Pat1\//²p/g;t treat
s/<pat2\//²p/g;t treat
s/<pAt3\//²p/g;t treat
b continu
: treat
s/²p/</g
t myor
: continu
s/²o/²/g
'
This uses a temporary character "²" as a generic marker, and a series of s commands each followed by a t (test and branch) to provide the OR functionality.

How to rename a bunch of files with a specific pattern

I want to rename the files in a directory which are named with this pattern:
string1-number.html
for example:
English-5.html
what I want to do is to rename the files like this:
string2-number.string3
for example:
Dictionary-5.en
How can I do this?
I used this script, but nothing happened:
echo "English-5.html" | sed 's%\({English}\).\(\.*\)\(html\)%dictionary\2\en%'
I would suggest using the mmv tool: http://linux.dsplabs.com.au/mmv-copy-append-link-move-multiple-files-under-linux-shell-bash-by-wildcard-patterns-p5/
With that you can do:
mmv '*-*.html' 'Dictionary-#2.en'
echo "English-5.html" | sed 's%English\(-[0-9][0-9]*\.\)html%dictionary\1en%'
Explanation:
I'm looking for English
followed by a dash, one or more digits, and a literal dot -[0-9][0-9]*\. (I surround this part with escaped parentheses to make it a group (group 1)).
followed by html
In the replacement text, I use \1 to output the contents of group 1, as well as the changed text.
You have 2 errors: The {...} is not required, and you confused \. and .
\. matches a literal dot, while . matches a single character.
echo "English-5.html" |
sed 's%\(English\)\(.*\)\.\(html\)%dictionary\2.en%'
This answer shows some minor optimizations for sed commands already posted and shows how to actually rename the files (in the current folder):
for f in *; do mv "$f" "$(echo "$f" |\
sed 's/^English-\([0-9]\+\)\.html$/dictionary-\1\.en/')"; done
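As an alternative sketch that avoids sed altogether, plain POSIX parameter expansion can pull the number out of the name. The function name rename_english and the demo file names are made up for illustration, and the target name follows the Dictionary-5.en example from the question.

```shell
# Hypothetical helper: rename English-N.html to Dictionary-N.en in directory $1.
rename_english() {
  dir=$1
  for f in "$dir"/English-*.html; do
    [ -e "$f" ] || continue          # the glob matched nothing
    num=${f##*/English-}             # strip the directory and the "English-" prefix
    num=${num%.html}                 # strip the ".html" suffix
    mv -- "$f" "$dir/Dictionary-$num.en"
  done
}

# Try it in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/English-5.html" "$demo_dir/English-12.html" "$demo_dir/other.html"
rename_english "$demo_dir"
ls "$demo_dir"
```

Restricting the glob to English-*.html means files that do not fit the pattern, such as other.html here, are never passed to mv.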

Resources