shell scripting for token replacement in all files in a folder - linux

HI
I am not very good with Linux shell scripting. I am trying the following shell script to replace
the revision number token $rev -<rev number> in all HTML files under a specified directory:
cd /home/myapp/test
set repUpRev = "`svnversion`"
echo $repUpRev
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
This does not seem to work. What is wrong with the above code?

rev=$(svnversion)
sed -i.bak "s/$rev/some other string/g" *.html

What is $rev in the regexp string? Is it another variable, or are you looking for the literal string '$rev'? If the latter, I would suggest adding '\' before the $; otherwise it is treated as a special regexp character.

This is how you show the last line:
grep -lr -e '\$rev -'.$repUpRev.'\$' *.html | xargs sed -i 's/'\$rev -'.$repUpRev.'\$'/'\$rev -.*$'/g'
It would help if you showed some input data.
The -r option makes the grep recursive. That means it will operate on files in the directory and its subdirectories. Is that what you intend?
The dots in your grep and sed stand for any character. If you want literal dots, you'll need to escape them.
The final escaped dollar sign in the grep and sed commands will be seen as a literal dollar sign. If you want to anchor to the end of the line you should remove the escape.
On the right hand side of a sed s command, .* is just a literal string, not a pattern. If you want to include what was matched on the left side, you need to use capture groups. The g modifier on the s command is only needed if the pattern appears more than once in a line.
Using quote, unquote, quote, unquote is hard to read. Use double quotes to permit variable expansion.
Try your grep command by itself without the xargs and sed to see if it's producing a list of files.
This may be closer to what you want:
grep -lr -e "\$rev -.$repUpRev.$" *.html | xargs sed -i "s/\$rev -.$repUpRev.$/\$rev -REPLACEMENT_TEXT/g"
but you'll still need to determine if the g modifier, the dots, the final dollar signs, etc., are what you intend.
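As a minimal sketch of the whole task, assuming the token in the files looks like $rev -<number>$ with literal dollar signs: the temporary directory, the sample file and the hard-coded revision below are made up for illustration, and in a real script the value would come from svnversion.

```shell
#!/bin/sh
# Hypothetical demo data: a scratch directory with one HTML file.
workdir=$(mktemp -d)
printf '%s\n' '<p>Build $rev -100$</p>' > "$workdir/index.html"

new_rev=1234   # stands in for: new_rev=$(svnversion)

# \$ matches a literal dollar sign; [^$]* matches the old revision text.
# The sed script is single-quoted, and only the variable is spliced in
# between double quotes so the shell expands it.
sed -i.bak 's/\$rev -[^$]*\$/$rev -'"$new_rev"'$/' "$workdir"/*.html

result=$(cat "$workdir/index.html")
echo "$result"
```

This assumes the revision string contains no / or & characters, which would need escaping in the replacement.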

Related

A good way to use sed to find and replace characters with 2 delimiters

I am trying to find and replace items using bash. I was able to use sed to strip out some of the characters, but I think I might be using it in the wrong manner.
I am basically trying to remove the characters after ";" and before ",", and also remove the "," itself.
sed -e 's/\(;\).*\(,\)/\1\2/'
That is what I used to replace it with nothing. However, it ends up replacing everything in the middle so my output came out like this:
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;,reboot -f"
This is the original text of what I need to replace
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f"
Is there any way to make it look like this output?
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
If there is any way to make this happen other than bash, I am fine with any type of language.
Non-greedy search can (mostly) be simulated in programs that don't support it by replacing match-any (dot .) with a negated character class.
Your original command is
sed -e 's/\(;\).*\(,\)/\1\2/'
You want to match everything in between the semi-colon and the comma, but not another comma (non-greedy). Replace .* with [^,]*
sed -e 's/\(;\)[^,]*\(,\)/\1\2/'
You may also want to exclude semi-colons themselves, making the expression
sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'
Note this would treat a string like "asdf;zxcv;1234,qwer" differently, since one would match ;zxcv;1234, and the other would match only ;1234,
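A short sketch of the negated-class idea applied to the string from the question. Two things here go beyond the command above and are assumptions about what the asker wants: the g flag, so every label is removed, and a first expression that strips the leading label up to the first comma. The tag is dropped entirely rather than kept via capture groups.

```shell
#!/bin/sh
# The value of cmd2 from the question, without the surrounding quotes.
cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'

# 1st expression: drop the leading label up to the first comma.
# 2nd expression: for each ";label," keep only the ";".
stripped=$(printf '%s\n' "$cmd2" | sed -e 's/^[^,]*,//' -e 's/;[^,;]*,/;/g')
echo "$stripped"
```

Note that the DB,2 entry is reduced to 2 here rather than removed.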
In perl:
perl -pe 's/;.*?,/;/g;' -pe 's/^[^,]*,//' foo.txt
will output:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f
The .*? is non greedy matching before the comma. The second command is to remove from the beginning to the comma.
Something like:
echo $cmd2 | tr ';' '\n' | cut -d',' -f2- | tr '\n' ';' ; echo
result is:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f;
However, I think your requirements are a bit more complex, because 'DB,2' seems to be a special case. After the first "tr" command, insert a "grep" or "grep -v" to include or exclude such cases.
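The grep -v suggestion could look roughly like this. It assumes that entries labelled DB are the ones to drop, which is inferred only from the desired output in the question, and it uses paste instead of the second tr so that no trailing semicolon is produced.

```shell
#!/bin/sh
cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'

# Split on ";", drop the DB entries, strip each label, join with ";" again.
joined=$(printf '%s\n' "$cmd2" \
    | tr ';' '\n' \
    | grep -v '^DB,' \
    | cut -d',' -f2- \
    | paste -sd';' -)
echo "$joined"
```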

Inverted exclamation and question mark in ISO-8859

I need to replace inverted exclamation and inverted question marks in subtitle files so they display correctly on my TV. The files work correctly in ISO-8859, but I can't remove the marks.
The first solution was to use the command 'sed':
sed s/\¿|¡//g "$FILE"
This works for files in UTF-8, but what would be the right solution for files in ISO-8859?
sed 's/\xBF//g', for example, doesn't work.
In this command, your \ is removed by bash before the argument is passed to sed:
sed s/\¿//g "$FILE"
That doesn't matter, because ¿ is not a bash metacharacter and it does not require quoting. However, if you write this:
sed s/\xBF//g "$FILE"
it won't do what you expect; bash will replace \x with x leaving sed with the command s/xBF//g, which is probably not what you wanted to do.
You must either write:
sed 's/\xBF//g'
or
sed s/\\xBF//g
The command posted will not work, though:
sed s/\¿|¡//g "$FILE"
| is a bash metacharacter, and it must therefore be quoted or escaped. Also, sed uses Basic Regular Expressions (BREs) by default, which means that you must write \| to express alternation. That means that you would have to type:
sed 's/¿\|¡//g' "$FILE"
or
sed s/¿\\\|¡//g "$FILE"
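The quoting behaviour described above can be checked without running sed at all, by printing the argument the shell would hand over. This is only an illustrative sketch, and the variable names are made up.

```shell
#!/bin/sh
# printf '%s' prints its argument exactly as the shell passes it on.
unquoted=$(printf '%s' s/\xBF//g)      # backslash removed by the shell
quoted=$(printf '%s' 's/\xBF//g')      # single quotes keep the backslash
escaped=$(printf '%s' s/\\xBF//g)      # \\ becomes one literal backslash

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "escaped:  $escaped"
```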

Some help needed on grep

I am trying to find alphanumeric strings, which may also contain the two characters "/" and "+", that are at least 30 characters long.
I have written this code,
grep "[a-zA-Z0-9\/\+]{30,}" tmp.txt
cat tmp.txt
> array('rWmyiJgKT8sFXCmMr639U4nWxcSvVFEur9hNOOvQwF/tpYRqTk9yWV2xPFBAZwAPRVs/s
ddd73ZEjfy+airfy8DtqIqKI9+dd 6hdd7soJ9iG0sGs/ld5f2GHzockoYHfh
+pAzx/t17Crf0T/2+8+reo+MU39lqCr02sAkcC1k/LzyBvSDEtu9N/9NHicr jA3SvDqg5s44DFlaNZ/8BW37fGEf2rk13S/q68OVVyzac7IT7yE7PIL9XZ/6LsmrY
KEsAmN4i/+ym8be3wwn KWGYaIB908+7W98pI6qao3iaZB
3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l
QVrOM1pg+mFLEIC88B706UzL4Pss7ouEo+EsrES+/qJq9Y1e/UGvwefOWSL2TJdt
This does not work. Mainly, I want the string to have a minimum length of 30.
In the syntax of grep, the repetition braces need to be backslashed.
grep -o '[a-zA-Z0-9/+]\{30,\}' file
If you want to constrain the match to lines containing only matches to this pattern, add line-start and line-ending anchors:
grep '^[a-zA-Z0-9/+]\{30,\}$' file
The -o option in the first command line causes grep to only print the matching part, not the entire matching line.
Unescaped braces are not a repetition operator in Basic Regular Expression syntax. Use grep -E to enable Extended Regular Expression syntax, or backslash the braces.
You can use
grep -e "^[a-zA-Z0-9/+]\{30,\}" tmp.txt
+pAzx/t17Crf0T/2+8+reo+MU39lqCr02sAkcC1k/LzyBvSDEtu9N/9NHicr jA3SvDqg5s44DFlaNZ/8BW37fGEf2rk13S/q68OVVyzac7IT7yE7PIL9XZ/6LsmrY
3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l
QVrOM1pg+mFLEIC88B706UzL4Pss7ouEo+EsrES+/qJq9Y1e/UGvwefOWSL2TJdt
man grep
Read up on the difference between basic and extended patterns. You need the -E option.
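To see that the two spellings are equivalent, here is a small sketch that runs both against a scratch file. The file and its first line are made up, and the second line is taken from the sample data in the question.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'short+string/123' \
    '3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l' > "$sample"

# Basic syntax: the braces must be backslashed.
bre=$(grep -o '[a-zA-Z0-9/+]\{30,\}' "$sample")
# Extended syntax (-E): the braces are written plainly.
ere=$(grep -oE '[a-zA-Z0-9/+]{30,}' "$sample")

echo "$bre"
echo "$ere"
```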

How to use sed with a variable that needs to be escaped

I have a file, and I am trying to use bash to replace every occurrence of a substring with a path.
I can use the command:
sed -i s/{WORKSPACE}/$MYVARIABLE/g /var/lib/jenkins/jobs/MY-JOB/workspace/config/params.ini
My config/params.ini looks like:
[folders]
folder1 = {WORKSPACE}/subfolder1
folder2 = {WORKSPACE}/subfolder2
However, when $MYVARIABLE is a path (and so contains slashes), the sed command fails with:
sed: -e expression #1, char 16: unknown option to `s'
When I run through it manually, I see that $MYVARIABLE needs to have its path slashes escaped. How can I modify my sed command to incorporate an escaped version of $MYVARIABLE?
There's nothing saying you have to use / as your delimiter. sed will use (almost) anything you stick in there. I have a tendency to use |, since that never (rarely?) appears in a path.
sridhar#century:~> export boong=FLEAK
sridhar#century:~> echo $PATH | sed "s|/bin|/$boong|g"
~/FLEAK:/usr/local/FLEAK:/usr/local/sbin:/usr/local/games:/FLEAK:/sbin:/usr/FLEAK:/usr/sbin:/usr/games:/usr/lib/lightdm/lightdm:/home/oracle/app/oracle/product/12.1.0/server_1/FLEAK
sridhar#century:~>
Using double-quotes will allow the shell to do the variable-substitution.
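Just use a different delimiter, and double quotes so that the shell expands the variable (inside single quotes, or with the $ escaped, sed would insert the literal text $MYVARIABLE):
sed -i "s;{WORKSPACE};$MYVARIABLE;g" your_file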
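A self-contained sketch of the alternative-delimiter approach, run against a scratch copy of the params.ini from the question. The temporary file and the value of MYVARIABLE are assumptions for illustration.

```shell
#!/bin/sh
ini=$(mktemp)
printf '%s\n' '[folders]' \
    'folder1 = {WORKSPACE}/subfolder1' \
    'folder2 = {WORKSPACE}/subfolder2' > "$ini"

MYVARIABLE=/var/lib/jenkins/jobs/MY-JOB/workspace   # example value only

# "|" is the delimiter, so the slashes in the path need no escaping;
# double quotes let the shell expand $MYVARIABLE before sed sees it.
sed -i.bak "s|{WORKSPACE}|$MYVARIABLE|g" "$ini"

cat "$ini"
```

This still breaks if the value itself contains | or &, so pick a delimiter that cannot occur in the data.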

Bash script to remove 'x' amount of characters the end of multiple filenames in a directory?

I have a list of file names in a directory (/path/to/local). I would like to remove a certain number of characters from all of those filenames.
Example filenames:
iso1111_plane001_00321.moc1
iso1111_plane002_00321.moc1
iso2222_plane001_00123.moc1
In every filename I wish to remove the last 5 characters before the file extension.
For example:
iso1111_plane001_.moc1
iso1111_plane002_.moc1
iso2222_plane001_.moc1
I believe this can be done using sed, but I cannot determine the exact coding. Something like...
for filename in /path/to/local/*.moc1; do
mv $filname $(echo $filename | sed -e 's/.....^//');
done
...but that does not work. Sorry if I butchered the sed options, I do not have much experience with it.
mv "$filename" "$(echo "$filename" | sed -e 's/.....\.moc1$/.moc1/')";
or
echo ${filename%%?????.moc1}.moc1
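%% is a shell parameter-expansion operator that removes the longest suffix matching the pattern; the pattern here has a fixed length, so a single % gives the same result.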
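Put into a loop, the parameter-expansion variant might look like this. The scratch directory and the empty sample files are created only for the demonstration; in the real case the loop would run over /path/to/local/*.moc1.

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/iso1111_plane001_00321.moc1" "$dir/iso2222_plane001_00123.moc1"

for filename in "$dir"/*.moc1; do
    # Strip "five characters + .moc1" from the end, then re-append .moc1.
    newname="${filename%?????.moc1}.moc1"
    mv -- "$filename" "$newname"
done

ls "$dir"
```

No external command is started per file, and the quoting keeps names with spaces intact.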
This sed command will work for all the examples you gave.
sed -e 's/\(.*\)_.*\.moc1/\1_.moc1/'
However, if you just want to specifically "remove 5 characters before the last extension in a filename" this command is what you want:
sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/'
You can implement this in your script like so:
for filename in /path/to/local/*.moc1; do
mv "$filename" "$(echo "$filename" | sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/')";
done
First Command Explanation
The first sed command works by grabbing all characters up to the last underscore (.* is greedy): \(.*\)_
Then it discards all characters until it finds .moc1: .*\.moc1
Then it replaces the text that it found with everything it grabbed at first inside the parentheses: /\1
And finally adds the underscore and the .moc1 extension back on the end and ends the regex: _.moc1/
Second Command Explanation
The second sed command works by grabbing all characters at first: \(.*\)
And then it is forced to stop grabbing characters so it can discard five characters, or more specifically, five characters that lie in the ranges 0-9, a-z, and A-Z: [0-9a-zA-Z]\{5\}
Then comes the dot '.' character to mark the last extension : \.
And then it looks for all non-dot characters. This ensures that we are grabbing the last extension: \([^.]*\)
Finally, it replaces all that text with the first and second capture groups, separated by the . character, and ends the regex: /\1.\2/
This might work for you (GNU sed):
sed -r 's/(.*).{5}\./\1./' file
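Both sed variants can be tried on a single name before touching any files. This sketch only prints the transformed name. It uses -E for the extended-syntax form, which current GNU sed accepts alongside -r.

```shell
#!/bin/sh
name=iso1111_plane001_00321.moc1

# Basic syntax with two capture groups: stem and extension.
bre=$(printf '%s\n' "$name" | sed -e 's/\(.*\)[0-9a-zA-Z]\{5\}\.\([^.]*\)/\1.\2/')
# Extended syntax: any five characters before the last dot.
ere=$(printf '%s\n' "$name" | sed -E 's/(.*).{5}\./\1./')

echo "$bre"
echo "$ere"
```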

Resources