How to specify an "or" in sed - linux

I have a file having data in the following form
<A/Here> <A/There>
<B/SomeMoreDate> <C/SomeOtherDate>
Now I want to delete all the A,B,C from the file in an efficient way. I know I can use sed for one pattern
sed -i 's/A//g' /path/to/filename.
But how do I specify an "or" so that sed deletes all of the patterns?
The expected output is:
<Here> <There>
<SomeMoreDate> <SomeOtherDate>

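You can use sed -i 's/[ABC]//g' /path/to/filename. The bracket expression [ABC] matches any single character A, B or C. Note that this deletes only the letter and leaves the slash behind; to get the expected output, include the slash in the pattern: sed -i 's/[ABC]\///g' /path/to/filename. You may find this reference useful.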

If you're using GNU sed, you can say:
sed -r 's#(A|B|C)/##g' filename
The following should work otherwise:
sed 's#A/##g;s#B/##g;s#C/##g' filename
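As a quick check of the alternation approach, here is a minimal sketch that runs it on the sample data from the question. It assumes a sed that accepts -E (current GNU and BSD versions do; -r is the older GNU spelling). The variable names input and result are only for illustration.

```shell
# Sample input from the question, held in a variable instead of a file.
input='<A/Here> <A/There>
<B/SomeMoreDate> <C/SomeOtherDate>'

# -E enables extended regular expressions, so ( | ) need no backslashes.
# Using # as the delimiter avoids escaping the / in the pattern.
result=$(printf '%s\n' "$input" | sed -E 's#(A|B|C)/##g')

printf '%s\n' "$result"
```

This should print the two lines shown as the expected output in the question.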

Ivaylo Strandjev's answer is correct in that it solves the problem when matching single characters. There is also a way to express "or" when matching longer strings:
s/\(\(stringA\)\|\(stringB\)\|\(stringC\)\)something/something else/
You can try it with something like:
echo stringBsomething | sed -e 's/\(stringA\|stringB\|stringC\)something/something else/'
It is sad that sed requires all these backslashes. Some of this can be avoided if you use -r.
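To compare the two spellings side by side, here is a small sketch. It assumes GNU sed for the first command, because \| inside a basic regex is a GNU extension rather than POSIX; the second command assumes -E is available. The names bre and ere are illustrative only.

```shell
# Basic regex: grouping and alternation are written with backslashes.
# Note: \| in a basic regex is a GNU extension, not POSIX.
bre=$(printf '%s\n' 'stringBsomething' |
  sed 's/\(stringA\|stringB\|stringC\)something/something else/')

# Extended regex (-E, or -r on older GNU sed): no backslashes needed.
ere=$(printf '%s\n' 'stringBsomething' |
  sed -E 's/(stringA|stringB|stringC)something/something else/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should produce the same replacement text, "something else".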

sed "s/<[ABC]\//</g" /path/to/filename
This works because it is the special case where the alternatives differ by a single character, so a bracket expression is enough. It is not a real OR.
If you are limited to POSIX sed, you can use this workaround.
Sample for test purpose
echo "<Pat1/ is pattern 2> <pat2/ is pattern 2>
<pAt3/ is pattern 3>
<pat4/ is pattern 4> but not avalaible for Pat1/ nor <pat2
" | \
The sed part
sed 's/²/²o/g
t myor
:myor
s/<Pat1\//²p/g;t treat
s/<pat2\//²p/g;t treat
s/<pAt3\//²p/g;t treat
b continu
: treat
s/²p/</g
t myor
: continu
s/²o/²/g
'
This uses a temporary character "²" as a generic marker, and a series of s commands, each followed by a conditional branch (t), to provide the OR functionality.

Related

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:
path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'
...
I'd like to just extract the *.proto filenames from these lines, and I have tried:
[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'
I know that part of my problem is that .* is greedy and I'm going to get things like e.proto and o.proto and 2.proto, but I can't even get that far... it just outputs with the same lines as the input. Any help would be greatly appreciated.
I find it helpful to use extended regex for this purpose (-r), in which case you need not escape your parentheses.
sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
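The addition of [^a-zA-Z0-9_] stops the greedy .* from swallowing the start of the filename: the character just before the captured group must be one that cannot be part of the name.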
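The effect of that extra bracket expression can be seen in a small sketch, using one of the sample lines from the question. It assumes a sed that accepts -E; the variable names are illustrative.

```shell
line="path/to/file: example: 'extract_me.proto'"

# Greedy .* leaves only the last name character for the group.
greedy=$(printf '%s\n' "$line" |
  sed -E 's/^.*([a-zA-Z0-9_]+\.proto).*$/\1/')

# Requiring a non-name character before the group keeps the full name.
anchored=$(printf '%s\n' "$line" |
  sed -E 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/')

printf '%s\n%s\n' "$greedy" "$anchored"
```

The first result is the truncated e.proto that the question predicted; the second is extract_me.proto.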
Since you tagged your question with linux, I'll assume you have GNU grep. Pick one of:
grep -oP '\w+\.proto' file
grep -oE "[^']+\\.proto" file
one way to do it:
sed 's/^.*[^a-zA-Z0-9_]\([a-zA-Z0-9_]\+\.proto\).*$/\1/'
escaped the + char
put a negation before the alphanum+underscore to delimit the leading chars
another way: use single quote delimitation, after all it's here for that:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\)'.*\$/\1/"
Use this sed:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/"
+ is an extended-regex operator, so in a basic regex you need to escape it as \+ to give it its special meaning (a GNU extension): the preceding item is matched one or more times.
Another way:
sed "s/^.*'\([^']\+\.proto\)'.*$/\1/"
With GNU sed:
sed -E "s/.*'([^']+)'$/\1/"

A good way to use sed to find and replace characters with 2 delimiters

I am trying to find and replace items using bash. I was able to use sed to grab out some of the characters, but I think I might be using it in the wrong manner.
I am basically trying to remove the characters after ";" and before "," including removing ","
sed -e 's/\(;\).*\(,\)/\1\2/'
That is what I used to replace it with nothing. However, it ends up replacing everything in the middle so my output came out like this:
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;,reboot -f"
This is the original text of what I need to replace
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f"
Is there any way to make it look like this output?
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
If there is any way to make this happen other than bash, I am fine with any type of language.
Non-greedy search can (mostly) be simulated in programs that don't support it by replacing match-any (dot .) with a negated character class.
Your original command is
sed -e 's/\(;\).*\(,\)/\1\2/'
You want to match everything in between the semi-colon and the comma, but not another comma (non-greedy). Replace .* with [^,]*
sed -e 's/\(;\)[^,]*\(,\)/\1\2/'
You may also want to exclude semi-colons themselves, making the expression
sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'
Note this would treat a string like "asdf;zxcv;1234,qwer" differently, since one would match ;zxcv;1234, and the other would match only ;1234,
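The difference between the two negated classes can be checked with a short sketch on the string mentioned above. The variable names are illustrative; the sed commands are the ones from this answer.

```shell
s='asdf;zxcv;1234,qwer'

# Stop at the first comma: the match is ";zxcv;1234,".
no_comma=$(printf '%s\n' "$s" | sed 's/\(;\)[^,]*\(,\)/\1\2/')

# Also exclude semicolons: the match shrinks to ";1234,".
no_comma_semi=$(printf '%s\n' "$s" | sed 's/\(;\)[^,;]*\(,\)/\1\2/')

printf '%s\n%s\n' "$no_comma" "$no_comma_semi"
```

The first command yields asdf;,qwer and the second yields asdf;zxcv;,qwer, because the second pattern cannot cross the inner semicolon.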
In perl:
perl -pe 's/;.*?,/;/g;' -pe 's/^[^,]*,//' foo.txt
will output:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f
The .*? is non greedy matching before the comma. The second command is to remove from the beginning to the comma.
Something like:
echo $cmd2 | tr ';' '\n' | cut -d',' -f2- | tr '\n' ';' ; echo
result is:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f;
However, I think your requirements are a bit more complex, because 'DB,2' seems to be a special case. After the first "tr" command, insert a "grep" or "grep -v" to include or exclude such cases.

sed help: matching and replacing a literal "\n" (not the newline)

I have a file which contains several instances of \n.
I would like to replace them with actual newlines, but sed doesn't recognize the \n.
I tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
Is sed able to recognize a literal \n? If so, how?
Is there another program that can read the file, interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, the extended dialect lets you write grouping parentheses and repetition braces without backslashes, and adds operators such as +, ? and |.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
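For a sed-only version, the backslash-newline form of the replacement can be tried with a small sketch like this one. The sample text and the variable names input and result are made up for illustration; printf with %s is used so that the backslashes in the data are passed through untouched.

```shell
# Input containing the two literal characters backslash and n.
input='first\nsecond\nthird'

# \\n matches a literal backslash followed by n; the replacement is a
# backslash immediately followed by a real newline.
result=$(printf '%s\n' "$input" | sed 's/\\n/\
/g')

printf '%s\n' "$result"
```

The output should be the three words on three separate lines.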
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, and the trailing newline is stripped when a line is read into the buffer, so a single line never contains \n. You have to use N, or hold-space commands such as H, to put more than one line into the buffer; then \n appears inside it. Be careful: ^ and $ then no longer mark the start and end of a line but of the whole buffer, because of the \n inside.
In POSIX sed, \n is recognized in the search pattern but not in the replacement. Two ways to work with that (samples):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first reuses a \n that is already in the buffer as a back-reference (shorter code in the replacement);
the second uses a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
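To see \n actually match inside the buffer, here is a minimal sketch with two made-up input lines. The variable name joined is illustrative.

```shell
# N appends the next input line to the buffer, separated by a newline,
# so \n in the pattern now has something to match.
joined=$(printf 'foo\nbar\n' | sed 'N;s/\n/ + /')

printf '%s\n' "$joined"
```

Without the N command the substitution would never fire, because each line would be processed on its own with its newline already removed.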

Why can the s command of sed be followed by a comma?

I saw someone use an expression like: sed -e 's, *$,,'
does anybody know why we can use it like this, and what does it do?
I thought the s command should be sed -e 'addr,addrs/reg/sub/' ?
From Using different delimiters in sed:
sed takes whatever follows the "s" as the separator
It is a good way to avoid escaping too much. Code is more readable if you use a delimiter that is not present in the string you want to handle.
For example, let's say we want to replace lo/bye in a string. With / as the delimiter it would be a little messy:
$ echo "hello/bye" | sed 's/lo\/bye/aa/g'
helaa
So if we define another separator it is more clear:
$ echo "hello/bye" | sed 's|lo/bye|aa|g'
helaa
$ echo "hello/bye" | sed 's,lo/bye,aa,g'
helaa
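The same idea applied to a path, plus the command from the question, as a small sketch. The path and the variable names are made up for illustration.

```shell
path='/usr/local/bin/tool'

# With / as the delimiter every slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter the slashes can be written as they are.
with_pipe=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

# The command from the question: with , as the delimiter, the pattern
# " *$" (trailing spaces) is replaced by nothing.
trimmed=$(printf '%s\n' 'abc   ' | sed 's, *$,,')

printf '%s\n%s\n[%s]\n' "$with_slash" "$with_pipe" "$trimmed"
```

Both path substitutions give /opt/bin/tool, and the last command strips the trailing spaces from the line.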

Removing Parts of String With Sed

I have lines of data that looks like this:
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta
How can I use sed to delete everything after the 4th column (_ separated) on each line?
Finally yielding:
sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL
cut is a better fit.
cut -d_ -f 1-4 old_file
This simply means use _ as delimiter, and keep fields 1-4.
If you insist on sed:
sed 's/\(_[^_]*\)\{4\}$//'
This left hand side matches exactly four repetitions of a group, consisting of an underscore followed by 0 or more non-underscores. After that, we must be at the end of the line. This is all replaced by nothing.
sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile
Match "any number of not '_'", saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be ignored). Substitute with each of the matches separated by '_'.
Here's another possibility:
sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'
where -E, like -r in GNU sed, turns on extended regular expressions for readability.
Just because you can do it in sed, though, doesn't mean you should. I like cut much much better for this.
AWK likes to play in the fields:
awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile
or, more generally:
awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}' inputfile
sed -e 's/_[0-9][0-9]*_[+-]_contigs_full\.fasta$//g'
Still the cut answer is probably faster and just generally better.
Yes, cut is way better, and yes matching the back of each is easier.
I finally got a match using the beginning of each line:
sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile
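To compare the cut and sed approaches on the data from the question, here is a short sketch. It uses two of the sample lines, assumes a sed that accepts -E, and the variable names are illustrative.

```shell
input='sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta'

# cut: split on _ and keep fields 1 to 4.
by_cut=$(printf '%s\n' "$input" | cut -d_ -f1-4)

# sed: capture three "field_" groups plus a fourth field, drop the rest.
by_sed=$(printf '%s\n' "$input" | sed -E 's/^(([^_]*_){3}[^_]*).*/\1/')

printf '%s\n' "$by_cut"
```

Both variables should end up holding the same two shortened names.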

Resources