Substitute all characters between two strings by char 'X' using sed - string

In a Bash script, I am trying to replace, in place in a file, the characters between two given strings with 'X'. I have a set of string pairs, and the characters between each pair should be replaced with 'X'.
In the code below, the first string of each pair is declared in the cpi_list array. The second string of the pair is always %26, &, or the end of the line.
This is what I am doing:
# list of "first" or "start" string
declare -a cpi_list=('%26Name%3d' '%26Pwd%3d')
# This is the "end" string
myAnd=\%26
newfile="inputlog.txt"
for item in "${cpi_list[@]}";
do
sed -i -e :a -e "s/\($item[X]*\)[^X]\(.*"$myAnd"\)/\1X\2/;ta" $newfile;
done
The input
CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete
I want to make it
CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete
PS: The last item must change too: %26Name%3dCOTT should become %26Name%3dXXXX even though no %26 follows it, because the end point is either %26 or the end of the line.
But somehow it is not working.

This will work in any awk called from any shell in any UNIX installation:
$ cat tst.awk
BEGIN {
begs = "%26Name%3d|%26Pwd%3d"
ends = "%26|&"
}
{
head = ""
tail = $0
while( match(tail, begs) ) {
tgtStart = RSTART + RLENGTH
tgt = substr(tail,tgtStart)
if ( match(tgt, ends) ) {
tgt = substr(tgt,1,RSTART-1)
}
gsub(/./,"X",tgt)
head = head substr(tail,1,tgtStart-1) tgt
tail = substr(tail,tgtStart+length(tgt))
}
$0 = head tail
print
}
$ cat file
CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete
$ awk -f tst.awk file
CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete
Just like with a sed substitution, any regexp metacharacters in the beg and end strings would need to be escaped; otherwise we'd have to use a loop with index() instead of match(), so that we'd do string matching instead of regexp matching.
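The index()-based alternative mentioned above might look like the sketch below. It is only an illustration, not part of the original answer: the names mask_literal, first and hitLen are made up here, and the start and end strings are the ones from the question, now treated as literal text rather than regexps.

```shell
# Hypothetical literal-matching variant of tst.awk (all names are assumptions).
mask_literal() {
  awk '
  BEGIN {
      nb = split("%26Name%3d|%26Pwd%3d", begs, "|")   # literal start strings
      ne = split("%26|&", ends, "|")                   # literal end strings
  }
  # Position of the earliest occurrence of any arr[1..n] in str (0 if none).
  # Also stores the length of that occurrence in the global hitLen.
  function first(str, arr, n,    i, p, best) {
      best = 0
      for (i = 1; i <= n; i++) {
          p = index(str, arr[i])
          if (p && (!best || p < best)) { best = p; hitLen = length(arr[i]) }
      }
      return best
  }
  {
      head = ""; tail = $0
      while ((s = first(tail, begs, nb))) {
          tgtStart = s + hitLen                 # first char after the start string
          tgt = substr(tail, tgtStart)
          if ((e = first(tgt, ends, ne))) tgt = substr(tgt, 1, e - 1)
          gsub(/./, "X", tgt)                   # mask the target text
          head = head substr(tail, 1, tgtStart - 1) tgt
          tail = substr(tail, tgtStart + length(tgt))
      }
      print head tail
  }'
}

printf '%s\n' 'CPI.%26Name%3dVoorhees&machete' | mask_literal
```

Because index() compares plain strings, start or end strings containing characters such as . or + would need no escaping in this version.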

You can leave every %26 untouched by doing this:
a='CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT'
echo "$a" |sed -E ':a;s/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g;ta;'
Note that each encoded character %xx counts for one X.
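To see the "one X per %xx" behaviour in isolation, here is a small sketch. The wrapper name mask_values and the sample input are made up for illustration, and the script is split over several -e options rather than written with semicolons, on the assumption that this is friendlier to sed implementations other than GNU sed.

```shell
# Hypothetical wrapper around the sed command from this answer.
mask_values() {
  sed -E -e ':a' \
         -e 's/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g' \
         -e 'ta'
}

# "A%20B" is three characters once decoded (A, space, B), so it becomes XXX.
echo '%26Name%3dA%20B%26Pwd%3dC' | mask_values
```

With this input the command should print %26Name%3dXXX%26Pwd%3dX: the %20 is replaced by a single X and both %26 separators are left alone.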

It is not pretty but you can use perl:
$ s1="CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT"
$ echo "$s1" | perl -lne 'if (/(?:^.*%26Name%3d)(.*)(?:%26Pwd%3d)(?:.*%26Name%3d)(.*)((?:%26Pwd%3d)|(?:$))/) {
$i1=$-[1];
$l1=$+[1]-$-[1];
$i2=$-[2];
$l2=$+[2]-$-[2];
substr($_, $i1, $l1, "X"x$l1);
substr($_, $i2, $l2, "X"x$l2);
print;
}'
CPI.%26Name%3dXXXXX%26Pwd%3dBOTTLE%26Name%3dXXXX
That is for two pairs like the example. N pairs in a line will be a slight modification.

Related

Find and replace words using sed command not working

I have a tab-separated text file: the first column holds the word to be found and the second column holds its replacement. The file contains English and Arabic pairs. Once a word is found and replaced, it should not be changed again.
For example:
adam a +dam
a b
ال ال+
So for a given text file:
adam played with a ball ال
I expect:
a +dam played with b ball ال+
However, I get:
b +dbm plbyed with b bbll ال+
I am using the following sed command to find and replace:
sed -e 's/^/s%/' -e 's/\t/%/' -e 's/$/%g/' tab_sep_file.txt | sed -f - original_file.txt >replaced.txt
How can I fix this issue?
The basic problem with your approach is that you don't want a later substitution to replace text produced by an earlier one: you don't want to change the a's in a +dam to b's. This makes sed a pretty poor choice. You can fairly easily make a regular expression that matches all of the things you want to replace, but picking which replacement to use is an issue.
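The effect is easy to reproduce by hand. The sketch below applies the two rules that the generated script would contain for the ASCII pairs, in the same order; the function name naive_replace is made up for the demonstration.

```shell
# The second rule rewrites text that the first rule has just produced.
naive_replace() {
  sed -e 's%adam%a +dam%g' -e 's%a%b%g'
}

echo 'adam played with a ball' | naive_replace
```

This prints b +dbm plbyed with b bbll, the same mangled result as in the question (minus the Arabic word).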
A way using GNU awk:
gawk -F'\t' '
FNR == NR { subs[$1] = $2; next } # populate the array of substitutions
ENDFILE {
if (FILENAME == ARGV[1]) {
# Build a regular expression of things to substitute
subre = "\\<("
first=0
for (s in subs)
subre = sprintf("%s%s%s", subre, first++ ? "|" : "", s)
subre = sprintf("%s)\\>", subre)
}
}
{
# Do the substitution
nwords = patsplit($0, words, subre, between)
printf "%s", between[0]
for (n = 1; n <= nwords; n++)
printf "%s%s", subs[words[n]], between[n]
printf "\n"
}
' tab_sep_file.txt original_file.txt
which outputs
a +dam played with b ball
First it reads the TSV file and builds an array of words to be replaced and text to replace it with (subs). Then after reading that file, it builds a regular expression to match all possible words to be found - \<(a|adam)\> in this case. The \< and \> match only at the beginning and end, respectively, of words, so the a in ball won't match.
Then for the second file with the text you want to process, it uses patsplit() to split each line into an array of matched parts (words) and the bits between matches (between), and iterates over the length of the array, printing out the replacement text for each match. That way it avoids re-matching text that's already been replaced.
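If GNU awk is not available, the same "replace each word at most once" idea can be approximated in POSIX awk. The sketch below is a swapped-in technique, not the patsplit() method above: it looks up whole space-separated fields in the substitution table, so it assumes words are delimited by whitespace only (no attached punctuation) and it collapses runs of whitespace into single spaces. The function name replace_words and the temporary files are made up for illustration.

```shell
# Hypothetical helper: $1 = tab-separated map file, $2 = text file to process.
replace_words() {
  awk -F'\t' '
    NR == FNR { subs[$1] = $2; next }      # first file: load the substitutions
    {
      n = split($0, w, " ")                # split the text line on whitespace
      out = ""
      for (i = 1; i <= n; i++)             # each word is looked up exactly once
        out = out (i > 1 ? " " : "") ((w[i] in subs) ? subs[w[i]] : w[i])
      print out
    }' "$1" "$2"
}

map=$(mktemp); text=$(mktemp)
printf 'adam\ta +dam\na\tb\n' > "$map"
printf 'adam played with a ball\n' > "$text"
replace_words "$map" "$text"
rm -f "$map" "$text"
```

Since every word is replaced from the original text and the output is never rescanned, the a inside the replacement a +dam is not turned into b.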
And a perl version that uses a similar approach (taking advantage of perl's ability to evaluate the replacement text in a s/// substitution):
perl -e '
use strict;
use warnings;
# Set file/standard stream char encodings from locale
use open ":locale";
# Or for explicit UTF-8 text
# use open ":encoding(UTF-8)", ":std";
my %subs;
open my $words, "<", shift or die $!;
while (<$words>) {
chomp;
my ($word, $rep) = split "\t" ,$_, 2;
$subs{$word} = $rep;
}
my $subre = "\\b(?:" . join("|", map { quotemeta } keys %subs) . ")\\b";
while (<<>>) {
print s/$subre/$subs{$&}/egr;
}
' tab_sep_file.txt original_file.txt
(This one will escape regular expression metacharacters in the words to replace, making it more robust)

sed command replace string with string in file path

I want to replace a string in a file with the content of another file:
sed -i "s/##END_ALL_VHOST##/r $SERVERROOT/conf/templates/$DOMAIN.conf/g" $SERVERROOT/conf/httpd_config.conf
I need to replace the string ##END_ALL_VHOST## with the content in the file httpd_config.conf.
Please help :)
Here's a way of doing it. Fancify the cat command as needed.
(pi51 591) $ echo "bar" > /tmp/foo.txt
(pi51 592) $ echo "alpha beta gamma" | sed "s/beta/$(cat /tmp/foo.txt)/"
alpha bar gamma
sed cannot operate on literal strings so it's the wrong tool to use when you want to do just that, as in your case. awk can work with strings so just use that instead:
awk '
BEGIN { old="##END_ALL_VHOST##"; lgth=length(old) }
NR==FNR { new = (NR>1 ? new ORS : "") $0; next }
s = index($0,old) { $0 = substr($0,1,s-1) new substr($0,s+lgth) }
{ print }
' "$SERVERROOT/conf/templates/$DOMAIN.conf" "$SERVERROOT/conf/httpd_config.conf"
You may need to swap the order of your 2 input files, it wasn't clear from your question.
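Here is a self-contained way to try the literal-string approach. It is a sketch only: the function name splice_literal, the temporary files and their contents are made up for the demonstration, and the marker is passed with -v, which assumes it contains no backslashes (awk would interpret them as escape sequences).

```shell
# Hypothetical helper: $1 = literal marker, $2 = file to insert,
# $3 = file containing the marker. The result goes to standard output.
splice_literal() {
  awk -v old="$1" '
    BEGIN { lgth = length(old) }
    NR == FNR { new = (NR > 1 ? new ORS : "") $0; next }   # slurp the first file
    s = index($0, old) { $0 = substr($0, 1, s - 1) new substr($0, s + lgth) }
    { print }
  ' "$2" "$3"
}

tmpl=$(mktemp); conf=$(mktemp)
printf '%s\n' '<VirtualHost *:80>' '  DocumentRoot /var/www/a&b' '</VirtualHost>' > "$tmpl"
printf '%s\n' 'before' '##END_ALL_VHOST##' 'after' > "$conf"
splice_literal '##END_ALL_VHOST##' "$tmpl" "$conf"
rm -f "$tmpl" "$conf"
```

The inserted text here contains /, & and newlines, all of which would break or alter a sed "s/old/$(cat file)/" command; with index() and substr() they are copied through unchanged.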

Parse string using grep, sed or awk

I have a string that looks like this
807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
I need output like this:
S:S6S11,07001,23668732,1,1496851208,807262,7482
I need the string split into columns like this:
S:S6 + the next 3 characters.
In this case that is S:S6S11, and this works:
echo 807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482 |
grep -P -o 'S:S6.{1,3}'
Output:
S:S6S11
This gets me close, getting just the numbers
echo 807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482 |
grep -o '[0-9]\+' | tr '\n' ','
Output:
807001,6,11,23668732,1,1496851208,807262,7482,
How can I get S:S6S11 in the beginning of my output and avoid 6,11 after that?
If this can be done better with sed or awk I don't mind.
Edit - clarification of structure
The rest of the string is:
LETTERS NUMBERS
BB 23668732
CC 1
DD 1496851208.807262
EE 7482
I need just the numbers but they have to correspond to the letters.
awk to the rescue!
$ echo "807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482" |
awk '{pre=gensub(".*(S:S6...).*","\\1","g"); ## extract prefix
sub(/./,","); ## replace first char with comma
gsub(/[^0-9]+/,","); ## replace non-numeric values with comma
print pre $0}' ## print prefix and replaced line
S:S6S11,07001,6,11,23668732,1,1496851208,807262,7482
... or sed:
$ echo "807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482" | sed -re 's/^.([0-9]+)(S:S6...)ABB([0-9]+)CC([0-9]+)DD([0-9]+)\.([0-9]+)EE([0-9]*)$/\2,\1,\3,\4,\5,\6,\7/'
S:S6S11,07001,23668732,1,1496851208,807262,7482
That is, if your line format is fixed.
If you use GNU awk, you can simplify the task by defining RS as the desired pattern, e.g.:
parse.awk
BEGIN { RS = "S:S6...|\n" }
# Start of the string
RT != "\n" {
sub(".", ",") # Replace first char by a comma
pst = $0 # Remember the rest of the string
pre = RT # Remember the S:S6 pattern
}
# End of string
RT == "\n" {
gsub("[A-Z.]+", ",") # Replace letters and dots by commas
print pre pst $0 # Print the final result
}
Run it like this, e.g.:
s=807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
gawk -f parse.awk <<<$s
Output:
S:S6S11,07001,23668732,1,1496851208,807262,7482
Here is one way you could do it with sed:
parse.sed
h # Duplicate string to hold space
s/.*(S:S6...).*/\1/ # Extract the desired pattern
x # Swap hold and pattern space
s/S:S6...// # Remove pattern (still in hold space)
s/[A-Z.]+/,/g # Replace letters and dots with commas
s/./,/ # Replace first char with comma
G # Append hold space content
s/([^\n]+)\n(.*)/\2\1/ # Rearrange to match desired output
Run it like this:
s=807001S:S6S11ABB23668732CC1DD1496851208.807262EE7482
sed -Ef parse.sed <<<$s
Output:
S:S6S11,07001,23668732,1,1496851208,807262,7482
It sounds like this MAY be what you're really trying to do:
$ awk -F'[A-Z]{2,}|[.]' -v OFS=',' '{$1=substr($1,7) OFS substr($1,2,5)}1' file
S:S6S11,07001,23668732,1,1496851208,807262,7482
but your requirements for how and what to match where are very unclear and just one sample input line doesn't help much.

Count total number of pattern between two pattern (using sed if possible) in Linux

I have to count all '=' between two pattern i.e '{' and '}'
Sample:
{
100="1";
101="2";
102="3";
};
{
104="1,2,3";
};
{
105="1,2,3";
};
Expected Output:
3
1
1
A very cryptic perl answer:
perl -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ge'
The tr function returns the number of characters transliterated.
With the new requirements, we can make a couple of small changes:
perl -0777 -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ges'
-0777 reads the entire file/stream into a single string
the s flag to the s/// function allows . to handle newlines like a plain character.
Perl to the rescue:
perl -lne '$c = 0; $c += ("$1" =~ tr/=//) while /\{(.*?)\}/g; print $c' < input
-n reads the input line by line
-l adds a newline to each print
/\{(.*?)\}/g is a regular expression. The ? makes the asterisk frugal, i.e. matching the shortest possible string.
The (...) parentheses create a capture group, referred to as $1.
tr is normally used to transliterate (i.e. replace one character by another), but here it just counts the number of equal signs.
+= adds the number to $c.
Awk is here too
grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
example
echo '{100="1";101="2";102="3";};
{104="1,2,3";};
{105="1,2,3";};'|grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'
output
3
1
1
First some test input (a line with a = outside the curly brackets and inside the content, one without brackets and one with only 2 brackets)
echo '== {100="1";101="2";102="3=3=3=3";} =;
a=b
{c=d}
{}'
Handle line without brackets (put a dummy char so you will not end up with an empty string)
sed -e 's/^[^{]*$/x/'
Handle line without equal sign (put a dummy char so you will not end up with an empty string)
sed -e 's/{[^=]*}/x/'
Remove stuff outside the brackets
sed -e 's/.*{\(.*\)}/\1/'
Remove stuff inside the double quotes (do not count fields there)
sed -e 's/"[^"]*"//g'
Use @repzero's method to count equal signs
awk -F "=" '{print NF-1}'
Combine stuff
echo -e '{100="1";101="2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$/x/' -e 's/{[^=]*}/x/' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print NF-1}'
The ugly temp fields x and replacing {} can be solved inside awk:
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{if (NF>0) c=NF-1; else c=0; print c}'
or shorter
echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}/\1/' -e 's/"[^"]*"//g' |
awk -F "=" '{print (NF>0) ? NF-1 : 0; }'
No harder sed than done ... in.
Restricting this answer to the environment as tagged, namely:
linux shell unix sed wc
will actually not require the use of wc (or awk, perl, or any other app.).
Though echo is used, a file source can easily exclude its use.
As for bash, it is the shell.
The actual environment used is documented at the end.
NB. Exploitation of GNU specific extensions has been used for brevity
but appropriately annotated to make a more generic implementation.
Also brace bracketed { text } will not include braces in the text.
It is implicit that such braces should be present as {} pairs but
the text src. dangling brace does not directly violate this tenet.
This is a foray into the world of `sed`'ng to gain some fluency in its use for other purposes.
The ideas expounded upon here are used to cross pollinate another SO problem solution in order
to acquire more familiarity with vetting vagaries of vernacular version variances. Consequently
this pedantic exercise hopefully helps with the pedagogy of others beyond personal edification.
To test easily, at least in the environment noted below, judiciously highlight the appropriate
code section, carefully excluding a dangling pipe |, and then, to a CLI command line interface
drag & drop, copy & paste or use middle click to enter the code.
The other SO problem. linux - Is it possible to do simple arithmetic in sed addresses?
# _______________________________ always needed ________________________________
echo -e '\n
\n = = = {\n } = = = each = is outside the braces
\na\nb\n { } so therefore are not counted
\nc\n { = = = = = = = } while the ones here do count
{\n100="1";\n101="2";\n102="3";\n};
\n {\n104="1,2,3";\n};
a\nb\nc\n {\n105="1,2,3";\n};
{ dangling brace ignored junk = = = \n' |
# _____________ preparatory conditioning needed for final solutions _____________
sed ' s/{/\n{\n/g;
s/}/\n}\n/g; ' | # guarantee but one brace to a line
sed -n '/{/ h; # so sed addressing can "work" here
/{/,/}/ H; # use hHold buffer for only { ... }
/}/ { x; s/[^=]*//g; p } ' | # then make each {} set a line of =
# ____ stop code hi-lite selection in ^--^ here include quote not pipe ____
# ____ outputs the following exclusive of the shell " # " comment quotes _____
#
#
# =======
# ===
# =
# =
# _________________________________________________________________________
# ____________________________ "simple" GNU solution ____________________________
sed -e '/^$/ { s//0/;b }; # handle null data as 0 case: next!
s/=/\n/g; # to easily count an = make it a nl
s/\n$//g; # echo adds an extra nl - delete it
s/.*/echo "&" | sed -n $=/; # sed = command w/ $ counts last nl
e ' # who knew only GNU say you ah phoo
# 0
# 0
# 7
# 3
# 1
# 1
# _________________________________________________________________________
# ________________________ generic incomplete "solution" ________________________
sed -e '/^$/ { s//echo 0/;b }; # handle null data as 0 case: next!
s/=$//g; # echo adds an extra nl - delete it
s/=/\\\\n/g; # to easily count an = make it a nl
s/.*/echo -e & | sed -n $=/; '
# _______________________________________________________________________________
The paradigm used for the algorithm is instigated by the prolegomena study below.
The idea is to isolate groups of = signs between { } braces for counting.
These are found and each group is put on a separate line with ALL other adorning characters removed.
It is noted that sed can easily "count", actually enumerate, nl or \n line ends via =.
The first "solution" uses these sed commands:
print
branch w/o label starts a new cycle
h/Hold for filling this sed buffer
exchange to swap the hold and pattern buffers
= to enumerate the current sed input line
substitute s/.../.../; with global flag s/.../.../g;
and most particularly the GNU specific
evaluate (execute can not remember the actual mnemonic but irrelevantly synonymous)
The GNU specific execute command is avoided in the generic code. It does not print the answer but
instead produces code that will print the answer. Run it to observe. To fully automate this, many
mechanisms can be used not the least of which is the sed write command to put these lines in a
shell file to be executed or even embed the output in bash evaluation parentheses $( ) etc.
Note also that various sed example scripts can "count" and these too can be used efficaciously.
The interested reader can entertain these other pursuits.
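The counting trick itself, turning every = into a line end and letting sed's = command report the number of the last line, can be tried in isolation. This sketch is an illustration only: the function name count_eq is hypothetical, and tr does the character conversion here, which is not what the pipeline above does.

```shell
# Hypothetical helper: count the = characters in the string given as $1.
count_eq() {
  # Keep only the = characters, turn each one into a newline, then let
  # sed print the number of the last line (it prints nothing for empty input).
  n=$(printf '%s' "$1" | tr -cd '=' | tr '=' '\n' | sed -n '$=')
  echo "${n:-0}"
}

count_eq '100="1";101="2";102="3";'
count_eq 'no equals sign here'
```

The two calls print 3 and 0. The ${n:-0} default covers the case the answer handles with its "null data as 0" rule, since sed produces no output at all when there are no lines to count.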
prolegomena:
concept from counting # of lines between braces
sed -n '/{/=;/}/=;'
to
sed -n '/}/=;/{/=;' |
sed -n 'h;n;G;s/\n/ - /;
2s/^/ Between sets of {} \n the nl # count is\n /;
2!s/^/ /;
p'
testing "done in":
linuxuser@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
linuxuser@ubuntu:~$ sed --version -----> sed (GNU sed) 4.4
And for giggles an awk-only alternative:
echo '{
> 100="1";
> 101="2";
> 102="3";
> };
> {
> 104="1,2,3";
> };
> {
> 105="1,2,3";
> };' | awk 'BEGIN{RS="\n};";FS="\n"}{c=gsub(/=/,""); if(NF>2){print c}}'
3
1
1

make a change on the string based on mapping

I have the following string format
str="aaa.[any_1].bbb.[any_2].ccc"
I have the following mapping
map1:
any_1 ==> 1
cny_1 ==> 2
map2
any_2 ==> 1
bny_2 ==> 2
cny_2 ==> 3
What's the best command to run on str, taking the above mappings into account, in order to get
$ command $str
aaa.1.bbb.1.ccc
Turn your map files into sed scripts:
sed 's%^%s/%;s% ==> %/%;s%$%/g%' map?
Apply the resulting script to the input string. You can do it directly by process substitution:
sed 's%^%s/%;s% ==> %/%;s%$%/g%' map? | sed -f- <(echo "$str")
Output:
aaa.[1].bbb.[1].ccc
Update: I now think that I didn't understand the question correctly, and my solution therefore is wrong. I'm leaving it in here because I don't know if parts of this answer will be helpful to your question, but I encourage you to look at the other answers first.
Not sure what you mean. But here's something:
any_1="1"
any_2="2"
str="aaa.${any_1}.bbb.${any_2}.ccc"
echo $str
The curly brackets tell the interpreter where the variable name ends and the normal string resumes. Result:
aaa.1.bbb.2.ccc
You can loop this:
for any_1 in {1..2}; do
for any_2 in {1..3}; do
echo aaa.${any_1}.bbb.${any_2}.ccc
done
done
Here {1..3} represents the numbers 1, 2, and 3. Result
aaa.1.bbb.1.ccc
aaa.1.bbb.2.ccc
aaa.1.bbb.3.ccc
aaa.2.bbb.1.ccc
aaa.2.bbb.2.ccc
aaa.2.bbb.3.ccc
{
echo "${str}"
cat Map1
cat Map2
} | sed -n '1h;1!H;$!d
x
s/[[:space:]]*==>[[:space:]]*/ /g
:a
s/\[\([^]]*\)\]\(.*\)\n\1 \([^[:cntrl:]]*\)/\3\2/
ta
s/\n.*//p'
You can use any number of mapping files, not just 2 (you could even use find to cat every mapping file found).
This relies on the aliases and values containing no spaces (it can be adapted if they do).
I have upvoted @chw21's answer as it promotes the right tool for the problem scenario. However,
you can devise a Perl-based command based on the following.
#!/usr/bin/perl
use strict;
use warnings;
my $text = join '',<DATA>;
my %myMap = (
'any_1' => '1',
'any_2' => '2'
);
$text =~s/\[([^]]+)\]/replace($1)/ge;
print $text;
sub replace {
my ($needle) = @_;
return "\[$needle\]" if ! exists $myMap{ lc $needle};
return $myMap{lc $needle};
}
__DATA__
aaa.[any_1].bbb.[any_2].ccc
The only thing that may need a bit of explanation is the regex: it matches the text between square brackets and passes that text to the replace routine. In the replace routine, we look up the mapped value corresponding to its argument.
$ cat tst.awk
BEGIN {
FS=OFS="."
m["any_1"]=1; m["cny_1"]=2
m["any_2"]=1; m["bny_2"]=2; m["cny_2"]=3
for (i in m) map["["i"]"] = m[i]
}
{
for (i=1;i<=NF;i++) {
$i = ($i in map ? map[$i] : $i)
}
print
}
$ awk -f tst.awk <<<'aaa.[any_1].bbb.[any_2].ccc'
aaa.1.bbb.1.ccc
