How to use sed to replace multiple chars in a string? - linux

I want to replace some chars of a string with sed.
I tried the following two approaches, but I'd like to know if there is a more elegant way to get the same result, without using pipes or the -e option:
sed 's#a#A#g' test.txt | sed 's#l#23#g' > test2.txt
sed -e 's#a#A#g' -e 's#l#23#g' test.txt > test2.txt

Instead of multiple -e options, you can separate commands with ; in a single argument.
sed 's/a/A/g; s/l/23/g' test.txt > test2.txt
If you're looking for a way to do multiple substitutions with a single s command, I don't think there's a way. If they were all single-character replacements, you could use a command like y/abc/123/, which would replace a with 1, b with 2, and c with 3. But there's no multi-character version of this.
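For the single-character case, y maps each character in the first list to the character at the same position in the second list, so both lists must have the same length. A small illustration:

```shell
# y/abc/123/ transliterates a->1, b->2, c->3 on every line (lists must be equal length)
printf 'abcabc\n' | sed 'y/abc/123/'
# prints: 123123
```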

In addition to Barmar's answer, you might want to use regex character classes to replace any of several characters with one specific character.
Here's an example to clarify things; try running it with and without the sed part to see the effect:
echo -e 'abc\ndef\nghi\nklm' | sed 's/[adgk]/1/g; s/[behl]/2/g; s/[cfim]/3/g'
P.S. Never run example code from strangers outside of a safe sandbox.

When you have a lot of strings to replace, you can collect the commands in a variable (the += syntax is bash). Quote the variable so that replacements containing spaces are not split:
seds="s/a/A/;"
seds+="s/1/23/;"
echo "That was 1 big party" |
sed "$seds"
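Another option when the list of substitutions grows is to keep one s command per line in a script file and pass it to sed with -f. A minimal sketch; the file name subs.sed is only an example:

```shell
# Write the substitutions to a script file (the name subs.sed is arbitrary)
cat > subs.sed <<'EOF'
s/a/A/g
s/1/23/g
EOF

# sed applies every command in the file, in order, to each input line
echo "That was 1 big party" | sed -f subs.sed
# prints: ThAt wAs 23 big pArty
```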

Related

Linux sed expression to convert the camelCase keys to underscore strings

I could not get the regex to convert only the key of a key-value pair from camel case to an underscore string.
An expression like sed -E 's/\B[A-Z]/_\U&/g' converts the whole line, but I would like to limit the conversion to the key.
$ echo UserPoolId="eu-west-1_6K6Q2bT9c" | sed -E 's/\B[A-Z]/_\U&/g'
User_Pool_Id=eu-west-1_6_K6_Q2b_T9c
but I would like to get User_Pool_Id=eu-west-1_6K6Q2bT9c
With GNU awk for the 3rd arg to match() and gensub():
$ echo 'UserPoolId="eu-west-1_6K6Q2bT9c"' |
awk 'match($0,/([^=]+=)"(.*)"/,a) { $0=gensub(/([[:lower:]])([[:upper:]])/,"\\1_\\2","g",a[1]) a[2]} 1'
User_Pool_Id=eu-west-1_6K6Q2bT9c
I don't know if it's what you'd want for this case or not, but anyway:
$ echo 'UserPoolID="eu-west-1_6K6Q2bT9c"' |
awk 'match($0,/([^=]+=)"(.*)"/,a) { $0=gensub(/([[:lower:]])([[:upper:]])/,"\\1_\\2","g",a[1]) a[2]} 1'
User_Pool_ID=eu-west-1_6K6Q2bT9c
Note that ID remains as _ID and isn't converted to _I_D.
If you have only one = sign and you want to modify the camel case before the = sign, with GNU sed you can iterate until all substitutions are done:
echo UserPoolId="eu-west-1_6K6Q2bT9c" | sed -E ':a;s/([a-z])([A-Z].*=.*)/\1_\2/;ta'
User_Pool_Id=eu-west-1_6K6Q2bT9c
:a sets label a, and ta branches to label a if the previous s command substituted something. The s command in the loop inserts a _ between a lowercase letter and an uppercase letter that occur before the equals sign.
In your example this will first insert a _ between User and Pool, and then between Pool and Id.
Doing this in sed is somewhat challenging because you need a more complex regex and a more complex script. Perhaps a better solution would be to use the shell's substitution facilities to isolate the part you want to operate on.
string='UserPoolId="eu-west-1_6K6Q2bT9c"'
prefix=${string%%=*}
suffix=${string#"$prefix"}
sed -E -e 's/\B[A-Z]/_\U&/g' -e "s/\$/$suffix/" <<<"$prefix"
Bash also has built-in parameter expansion to convert the first character of a string to upper case, but perhaps this is sufficient to solve your immediate problem.
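One caveat with the script above: $suffix is pasted into the sed replacement, so a / or & in the value would break it. A variant, sketched under the same assumptions (GNU sed for \B and \U), sends only the key through sed and joins the two halves with printf; key is just an illustrative variable name:

```shell
string='UserPoolId="eu-west-1_6K6Q2bT9c"'
prefix=${string%%=*}            # everything before the first =
suffix=${string#"$prefix"}      # the = and everything after it
# Only the key goes through sed; the value never becomes part of the sed script
key=$(printf '%s\n' "$prefix" | sed -E 's/\B[A-Z]/_\U&/g')
printf '%s%s\n' "$key" "$suffix"
# prints: User_Pool_Id="eu-west-1_6K6Q2bT9c"
```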
This might work for you (GNU sed):
sed 's/=/&\n/;h;s/\B[[:upper:]]/_&/g;G;s/\n.*\n//' file
Introduce a newline after the = and copy the result to the hold space.
Insert underscores in the required places.
Append the copy to the current line and remove the middle, leaving the answer.

Find line starts with and replace in linux using sed [duplicate]

This question already has answers here:
Replace whole line when match found with sed
How do I find a line that starts with a given string and replace the complete line?
File contents:
xyz
abc
/dev/linux-test1/
Code:
output=/dev/sda/windows
sed 's/^/dev/linux*/$output/g' file.txt
I am getting the error below:
sed: -e expression #1, char 9: unknown option to `s'
Expected file contents after replacement:
xyz
abc
/dev/sda/windows
Let's take this in small steps.
First we try changing "dev" to "other":
sed 's/dev/other/' file.txt
/other/linux-test1/
(Omitting the other lines.) So far, so good. Now "/dev/" => "/other/":
sed 's//dev///other//' file.txt
sed: 1: "s//dev///other//": bad flag in substitute command: '/'
Ah, it's confused, we're using '/' as both a command delimiter and literal text. So we use a different delimiter, like '|':
sed 's|/dev/|/other/|' file.txt
/other/linux-test1/
Good. Now we try to replace the whole line:
sed 's|^/dev/linux*|/other/|' file.txt
/other/-test1/
It didn't replace the whole line... Ah, in sed, '*' means the previous character repeated zero or more times. So we precede it with '.', which means any character:
sed 's|^/dev/linux.*|/other/|' file.txt
/other/
Now to introduce the variable:
sed 's|^/dev/linux.*|$output|' file.txt
$output
The shell didn't expand the variable, because of the single quotes. We change to double quotes:
sed "s|^/dev/linux.*|$output|" file.txt
/dev/sda/windows
This might work for you (GNU sed):
output="/dev/sda/windows"; sed -i '\#/dev/linux.*/#c'"$output" file
Set the shell variable and change the line addressed by /dev/linux.*/ to it.
N.B. The shell variable must be set before sed's arguments are expanded, hence the ; (the variable could equally be set on a line of its own). Also, the delimiter for the sed address must be changed so that it does not clash with the slashes in the pattern, hence \#...#. Finally, the shell variable should be enclosed in double quotes so that it is interpolated in full.
I'd recommend not doing it this way. Here's why.
Sed is not a programming language. It's a stream editor with some constructs that look and behave like a language, but it offers very little in the way of arbitrary string manipulation, format control, etc.
Sed only takes data from a file or stdin (also a file). Embedding strings within your sed script is asking for errors: constructs like s/re/$output/ are destined to fail at some point, almost regardless of what workarounds you build into your sed script. The best way to make sed commands like this work is to do your input sanitization OUTSIDE of sed.
Which brings me to ... this may be the wrong tool for this job, or might be only one component of the toolset for the job.
The error you're getting is obviously because the sed command you're using is horribly busted. The substitute command is:
s/pattern/replacement/flags
but the command you're running is:
s/^/dev/linux*/$output/g
The pattern you're searching for is ^, the empty string at the beginning of the line. Your replacement is dev, and then you have a bunch of text that sed tries to interpret as flags. This plainly doesn't work when your search string contains the same character that you're using as the delimiter of the substitute command.
In regular expressions and in sed, you can escape things. While you might get some traction with s/^\/dev\/linux.*/$output/, you'd still run into difficulty if $output contained slashes. If you're feeding this script to sed from bash, you could use ${output//\//\\\/}, but you can't handle those escapes within sed itself. Sed has no variables.
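If you do stay with sed, a commonly used idiom is to escape the value with a separate sed call before splicing it into the script. A minimal sketch, assuming the value fits on one line and the s command uses / as its delimiter; safe is just an illustrative variable name:

```shell
output="/dev/sda/windows"
# Prefix every backslash, slash and ampersand with a backslash, since those are
# the characters with special meaning in the replacement part of s/.../.../
safe=$(printf '%s\n' "$output" | sed 's|[\\/&]|\\&|g')

printf '%s\n' xyz abc /dev/linux-test1/ | sed "s/^\/dev\/linux.*/$safe/"
# prints:
# xyz
# abc
# /dev/sda/windows
```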
In a proper programming language, you'd have better separation of variable content and the commands used for the substitution.
output="/dev/sda/windows"
awk -v output="$output" '$1~/\/dev\/linux/ { $0=output } 1' file.txt
Note that I've used $1 here because in your question, your input lines (and output) appear to have a space at the beginning of each line. Awk automatically trims leading and trailing space when assigning field (positional) variables.
Or you could even do this in pure bash, using no external tools:
output="/dev/sda/windows"
while IFS= read -r line; do
[[ "$line" =~ ^/dev/linux ]] && line="$output"
printf '%s\n' "$line"
done < file.txt
This one isn't resilient in the face of leading whitespace. Salt to taste.
So .. yes, you can do this with sed. But the way commands get put together in sed makes something like this risky, and despite the available workarounds like switching your substitution command delimiter to another character, you'd almost certainly be better off using other tools.

Select lines between two patterns using variables inside SED command

I'm new to shell scripting. My requirement is to retrieve the lines between two patterns. It works fine if I run it from the terminal without using variables inside the sed command, but the problem arises when I put the commands below in a file and try to execute it.
#!/bin/sh
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
fileC=`cat test.log`
output=`echo $fileC | sed -e "n/\$word/$upto/p"`
printf '%s\n' "$output"
If I run the command below in the terminal, it works fine:
sed -n '/ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34/,/2017-01-03 23:00/ p' test.log
Please suggest a workaround.
If we put aside for a moment the fact you shouldn't cat a file to a variable and then echo it for sed filtering, the reason why your command is not working is because you're not quoting the file content variable, fileC when echoing. This will munge together multiple whitespace characters and turn them into a single space. So, you're losing newlines from the file, as well as multiple spaces, tabs, etc.
To fix it, you can write:
fileC=$(cat test.log)
output=$(echo "$fileC" | sed -n "/$word/,/$upto/p")
Note the double-quotes around fileC (and a fixed sed expression, similar to your second example). Without the quotes (try echo $fileC), your fileC is expanded (with the default IFS) into a series of words, each being one argument to echo, and echo will just print those words separated with a single space. Additionally, if the file contains some of the globbing characters (like *), those patterns are also expanded. This is a common bash pitfall.
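A quick way to see the difference described above, using a throwaway two-line value:

```shell
fileC=$(printf 'one   two\nthree')
echo $fileC      # unquoted: split into words, rejoined with single spaces
echo "$fileC"    # quoted: the runs of spaces and the newline are preserved
```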
Much better would be to write it like this:
output=$(sed -n "/$word/,/$upto/p" test.log)
And if your patterns include some of the sed metacharacters, you should really escape them before using them with sed, like this:
escape() {
sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1";
}
output=$(sed -n "/$(escape "$word")/,/$(escape "$upto")/ p" test.log)
The correct approach will be something like:
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"
awk -v beg="$word" -v end="$upto" '$0==beg{f=1} f{print; if ($0==end) exit}' file
but until we see your sample input and output we can't know for sure what it is you need to match on (full lines, partial lines, all text on one line, etc.) or what you want to print (include delimiters, exclude one, exclude both, etc.).
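If it turns out the markers only need to appear somewhere within a line (a partial-line match), one option is awk's index(), which searches for a literal substring, so the dots and % in $word are not treated as regex metacharacters. A sketch under that assumption, with a made-up test.log; note that awk -v still processes backslash escapes in the values, which these two strings don't contain:

```shell
word="ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34"
upto="2017-01-03 23:00"

# Made-up sample log, for illustration only
printf '%s\n' 'noise' "start $word" 'middle line' "$upto end" 'after' > test.log

# index(s, t) returns the position of the literal string t in s, or 0 if absent
awk -v beg="$word" -v end="$upto" '
  !f && index($0, beg) { f = 1 }           # start printing at the begin marker
  f { print; if (index($0, end)) exit }    # stop after the line with the end marker
' test.log
# prints:
# start ajp-qdcls2228.us.qdx.com%2F156.30.35.204-8009-34
# middle line
# 2017-01-03 23:00 end
```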

How to specify an "or" in sed

I have a file containing data in the following form:
<A/Here> <A/There>
<B/SomeMoreDate> <C/SomeOtherDate>
Now I want to delete all the A/, B/ and C/ parts from the file in an efficient way. I know I can use sed for one pattern:
sed -i 's/A//g' /path/to/filename
But how do I express an "or" so that sed deletes all of these patterns?
The expected output is:
<Here> <There>
<SomeMoreDate> <SomeOtherDate>
You can use sed -i 's#[ABC]/##g' /path/to/filename. [ABC] will match either A or B or C, and the / after it is removed as well.
If you're using GNU sed, you can say:
sed -r 's#(A|B|C)/##g' filename
The following should work otherwise:
sed 's#A/##g;s#B/##g;s#C/##g' filename
Ivaylo Strandjev's answer is correct in that it solves the problem when you want to match single characters. There is a way, though, to express an "or" when matching longer strings:
s/\(\(stringA\)\|\(stringB\)\|\(stringC\)\)something/something else/
You can try it with something like:
echo stringBsomething | sed -e 's/\(stringA\|stringB\|stringC\)something/something else/'
It is sad that sed requires all these backslashes. Some of this is avoided if you use -r.
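With -E (GNU sed also accepts -r; BSD/macOS sed supports -E), the same alternation is written as an extended regex without the backslashes:

```shell
# In an ERE, ( ) groups and | separates alternatives; no escaping is needed
echo stringBsomething | sed -E 's/(stringA|stringB|stringC)something/something else/'
# prints: something else
```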
sed "s/<[ABC]\//</g" /path/to/filename
This works because it is a special case where only one character varies in the pattern; it is not a real OR.
If you are limited to POSIX sed, you can use this workaround.
Sample input for testing:
echo "<Pat1/ is pattern 2> <pat2/ is pattern 2>
<pAt3/ is pattern 3>
<pat4/ is pattern 4> but not avalaible for Pat1/ nor <pat2
" | \
The sed part
sed 's/²/²o/g
t myor
:myor
s/<Pat1\//²p/g;t treat
s/<pat2\//²p/g;t treat
s/<pAt3\//²p/g;t treat
b continu
: treat
s/²p/</g
t myor
: continu
s/²o/²/g
'
This uses a temporary character, "²", as a generic marker, and a series of s commands, each followed by a t (test) branch, to emulate an OR.

Replacing a line in a csv file?

I have a set of 10 CSV files, which normally have entries like this:
a,b,c,d
d,e,f,g
Now, due to some error, entries in these files have become like this:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
What command would you recommend to replace the erroneous lines in all the files?
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete malformed lines with any number of commas (including zero) and no other characters on the line, then:
grep -v '^,*$' ...
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by
mv new-file.csv old-file.csv
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
What about keeping only the lines that match the desired format, instead of handling one exception?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, please provide it; the regular expression should not be too hard to write.
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ d' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
grep -E -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
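Putting those replacements together, here is a sketch that should work with a POSIX sed across all the files, writing each result to a temporary file and moving it back over the original. The *.csv glob and the .tmp suffix are assumptions; adjust them to your file names:

```shell
for f in *.csv; do
  [ -e "$f" ] || continue   # skip the literal pattern when no file matches
  # \{1,\} is the POSIX spelling of GNU's \+ ; d deletes lines made only of commas
  sed '/^,\{1,\}$/d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```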
Most simply:
$ grep -v ',,,' oldfile > newfile
$ mv newfile oldfile
Yes, awk or grep are very good options if you are working on a Linux platform. On other platforms you can use Perl, with its regexes and its join and split functions.
