question of sed replace - linux

i have a config file xml
<tflow name="CENTRE"
inputDTD="/JOBS/cnav/etc/jobReporting/batch/dtd/dtd-ContactCentre.dtd"
inputFile="/JOBS/cnav/etc/jobReporting/import/2010.05.02.CONTACTCENTRE.xml"
logPath="/JOBS/cnav/etc/jobReporting/logs/"
rejectPath="/JOBS/cnav/etc/jobReporting/rejets/"/>
<tflow name="SKILL"
inputDTD="/JOBS/cnav/etc/jobReporting/batch/dtd/dtd-Skill.dtd"
inputFile="/JOBS/cnav/etc/jobReporting/import/2010.05.02.SKILLS.xml"
logPath="/JOBS/cnav/etc/jobReporting/logs/"
rejectPath="/JOBS/cnav/etc/jobReporting/rejets/"/>
my shell is aim to change, by example '2010.05.02.SKILLS.xml' with 'newdate.SKILLS.xml'
currently i think of SED, i wrote:
sed 's/(import\/)(\d{4}.\d{2}.\d{2})/$1$newdate/g' myfile.xml
it doesn't work,i test the pattern with RegExr(a site) which is fine.
is it a problem of synthesis of SED?
thanks.

Most sed implementations don't support extended regexps by default, so you need to use basic regexps. This means you need to put backslashes before your parentheses and braces, and you can't use the \d class. You also have the syntax of backreferences wrong -- it should be \1, not $1$. So, it should look like this:
sed 's/\(import\/\)\([0-9]\{4\}\.[0-9]\{2\}\.[0-9]\{2\}\)/\1newdate/' myfile.xml

So many regex dialects around... I think sed does not understand \d. Besides, use \1 for the bactracking, and use double quotes if $newdate is a shell variable. And try -r to use extended regex.
Try something like this
sed -r "s/(import\/)([0-9]{4}\.[0-9]{2}\.[0-9]{2})/\1$newdate/g" myfile.xml

Depending on your sed implementation, you may have to specify that you are using regex. What OS/version of sed are you using? You may, for example, have to use the -E argument.

Related

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:
path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'
...
I'd like to just extract the *.proto filenames from these lines, and I have tried:
[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'
I know that part of my problem is that .* is greedy and I'm going to get things like e.proto and o.proto and 2.proto, but I can't even get that far... it just outputs with the same lines as the input. Any help would be greatly appreciated.
I find it helpful to use extended regex for this purpose (-r) in which case you need not escape your brackets.
sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
The addition of [^a-zA-Z0-9_] forces the .* to not be greedy.
Since you tag your command with linux, I'll assume you have GNU grep. Pick one of
grep -oP '\w+\.proto' file
grep -o "[^']+\\.proto" file
one way to do it:
sed 's/^.*[^a-zA-Z0-9_]\([a-zA-Z0-9_]\+\.proto\).*$/\1/'
escaped the + char
put a negation before the alphanum+underscore to delimit the leading chars
another way: use single quote delimitation, after all it's here for that:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\)'.*\$/\1/"
Use this sed:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/"
+ - Extended-RegEx. So, you need to escape to get special meaning. The preceding item will be matched one or more times.
Another way:
sed "s/^.*'\([^']\+\.proto\)'.*$/\1/"
With GNU sed:
sed -E "s/.*'([^']+)'$/\1/"

Replace a string using sed in Linux

I want to replace following original string by replace string.
original_str="#22=SI_UNIT(*,*,#5,'','metre');"
replace_str="#22=SI_UNIT(*,*,#5,'','millimetre');"
sed -i "s/$original_str/$replace_str/" ./output/modified.txt
I have tried in different ways using 'sed'. However, it is not working. Does anyone have any idea?
The concept #22 is referenced to the other concept later in the same file. Is this the reason?
Please note that it is working fine for following string in the same bash script:
original_str="#103=CARTESIAN_POINT('P3',0.0,0.0,1.0,#72);"
replace_str="#103=CARTESIAN_POINT('P2',10.0,10.0,10.0,#72);"
sed -i "s/$original_str/$replace_str/" ./output/modified.txt
The concept #103 is not used in later concept in the same file.
Thank you.
You need to escape characters that have a special meaning in sed, which in this case is *:
original_str='#22=SI_UNIT(\*,\*,#5,'','metre');'
replace_str='#22=SI_UNIT(*,*,#5,'','millimetre');'
sed -i "s/$original_str/$replace_str/" ./output/modified.txt
This will work.
* is a regular expression metacharacter which does not match itself (or only does so coincidentally). You need to escape it in the original_str.
sed -i "s/${original_str//\*/\\*}/$replace_str/" ./output/modified.txt
The $(variable//substr/repl} syntax is Bash-specific. In the general case, you will need to escape any regex specials -- [, \, and . -- which is a bit harder to do in a general way in Bash.

Replace string within a file from a bash script

I need to replace within a little bash script a string inside a file but... I am getting weird results.
Let's say I want to replace:
<tag><![CDATA[text]]></tag>
With:
<tag><![CDATA[replaced_text]]></tag>
Should I use sed? I think due to / and [ ] I am getting weird results...
What would be the best way of approaching this?
Perl with -p option works almost as sed and it has \Q (quote) switch for its regexes:
perl -pe 's{\Q<tag><![CDATA[text]]></tag>}
{<tag><![CDATA[replaced_text]]></tag>}' YOUR_FILE
And in Perl you can use different punctuation to delimiter your expressions (s{...}{...} in my example).
Yes, you need to escape the brackets, and either escape slashes or use different delimiters.
sed 's,<tag><!\[CDATA\[text\]\]></tag>,<tag><!\[CDATA\[replaced)text\]\]></tag>,'
That said, SGML and XML are not actually any better than HTML when it comes to using regexes; don't expect this to generalize.
This should be enough:
$ echo '<tag><![CDATA[text]]></tag>' | sed 's/\[text\]/\[replaced_text\]/'
<tag><![CDATA[replaced_text]]></tag>
You can also change your / separator inside sed to a different character like ,, | or %.
Just use a delimiter other than /, here I use #:
sed -i 's#<tag><!\[CDATA\[text\]\]></tag>#<tag><![CDATA[replaced_text]]></tag>#g' filename
-i to have sed change the file instead of printing it out.
g is for matching more than once (global).
But do you know the exact string you want to match, both the tag and the text?
For instance, if you want to replace the text in all with your replaced_text:
perl -i -pe 's#(<tag><!\[CDATA\[)(.*?)(\]\]></tag>)#\1replaced_text\3#g' filename
Switched to perl because sed doesn't support non-greedy multipliers (the *?).

Specify multiple possible patterns for a single command

Basically there a few lines which contain a common format, but different wording at the end. The command will work for all of them, but I want to match all possible pattern, thereby needing only 1 line in the script. As an example, I know how to make the script work like so:
/pattern1/ s/asdf/ghjk/g
/pattern2/ s/asdf/ghjk/g
/pattern3/ s/asdf/ghjk/g
Any ideas?
If your patterns are really as similar as in your example, you can use
sed -e '/pattern[1-3]/ s/asdf/ghjk/g'
If the patterns aren't so similar and your sed command supports extended regular expressions, you can use
sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g'
# ^^ use extended regular expressions
# for GNU sed, use -r or escape (, |, and ) with \
If your sed command doesn't support extended regular expressions, you might have to turn to awk or perl:
perl -ple '/(pattern1|pattern2|pattern3)/ && s/asdf/ghjk/g'

Using ? with sed

I just want to get the number of a file that may or may not be gzip'd. However, it appears that a regular expression in sed does not support a ?. Here's what I tried:
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and nothing was returned. Then I added a ? to the string being analyzed:
echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and got:
1
So, it looks like the ? used in most regex's is not supported in sed, right? Well then, I would just like sed to give a 1 for file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?
The equivalent to x? is \(x\|\).
However, many versions of sed support an option to enable "extended regular expressions" which includes ?. In GNU sed the flag is -r. Note that this also changes unescaped parens to do grouping. eg:
echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'
Actually, there's another bug in your regex which is that the greedy .* in the parens is going to swallow up the ".gz" if there is one. sed doesn't have a non-greedy equivalent to * as far as I know, but you can use | to work around this. | in sed (and many other regex implementations) will use the leftmost match that works, so you can do something like this:
echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'
This tries to match with .gz, and only tries without it if that doesn't work. Only one of group 2 or 4 will actually exist (since they are on opposite sides of the same |) so we just concatenate them to get the value we want.
If you're looking for an answer to the specific example given in the question, or why it uses the ? incorrectly (regardless of syntax), see the answer by Laurence Gonsalves.
If you're looking instead for the answer to the general question of why ? doesn't exhibit its special meaning in sed as you might expect:
By default, sed uses the " POSIX basic regular expressions syntax", so the question mark must be escaped as \? to apply its special meaning, otherwise it matches a literal question mark. As an alternative, you can use the -r or --regexp-extended option to use the "extended regular expression syntax", which reverses the meaning of escaped and non-escaped special characters, including ?.
In the words of the GNU sed documentation (view by running 'info sed' on Linux):
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, and braces
('{}'). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
and the option is explained:
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that `egrep' accepts;
they can be clearer because they usually have less backslashes,
but are a GNU extension and hence scripts that use them are not
portable.
Update
Newer versions of GNU sed now say this:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems.
So, if you need to preserve compatibility with ancient GNU sed, stick with -r. But if you prefer better cross-platform portability on more modern systems (e.g. Linux+Mac support), go with -E (but note that there are still some quirks and differences between GNU sed and BSD sed, so you'll have to make sure your scripts are portable in any case).
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\?\(\.gz\)/\1/p'
Works. You have to put the return in the right spot, and you have to escape it.
A function that should return a number that follows the '_' in a filename, regardless of file extension:
realname () {
local n=${$1##*/}
local rn="${n%.*}"
sed 's/^.*\_//g' ${$rn:-$n}
}
You should use awk which is superior to sed when it comes to field grabbing/parsing:
$ awk -F'[._]' '{print $2}' <<<"file_1"
1
$ awk -F'[._]' '{print $2}' <<<"file_1.gz"
1
Alternatively you can just use Bash's parameter expansion like so:
var=file_1.gz;
temp=${var#*_};
file=${temp%.*}
echo $file
Note: works when var=file_1 as well
Part of the solution lies in escaping the question mark or using the -r option.
sed 's/.*_\([^.]*\)\(\.\?[^.]\+\)\?$/\1/'
or
sed -r 's/.*_([^.]*)(\.?[^.]+)?$/\1/'
will work for:
file_1.gz
file_12.txt
file_123
resulting in:
1
12
123
I just realized that could do something very easy:
echo 'file_1.gz'|sed -n 's/.*_\([0-9]*\).*/\1/p'
Notice the [0-9]* instead of a .*. #Laurence Gonsalves's answer made me realize the greediness of my previous post.

Resources