Fastest way to extract pattern - linux

What is the fastest way to extract a substring of interest from input such as the following?
MsgTrace(65/26)noop:user=xxx=INBOX:cmd=534
ImapFetchComplete(56/39)user=xxxxxxxxxx
Desired output (i.e., the :-terminated string following the string MsgTrace(65/26) in this example):
noop
I tried the following, but without success:
egrep -i "[a-zA-Z]+\(.*\)[a-z]+:"

grep by default returns the entire line when a match is found on a given input line.
While option -o restricts the output to only that part of the line that the regex matched, that is still not enough in this case, because you want a substring of that match.
However, since you're on Linux, you can use GNU grep's -P option (for support of PCREs, Perl-compatible regular expressions), which allows extracting a submatch by way of features such as \K (drop everything matched so far) and (?=...) (a look-ahead assertion that does not contribute to the match):
$ grep -Po "[a-zA-Z]\(.*\)\K[a-z]+(?=:)" <<'EOF'
MsgTrace(65/26)noop:user=xxx=INBOX:cmd=534
ImapFetchComplete(56/39)user=xxxxxxxxxx
EOF
noop # output
Optional background information:
Ed Morton points out (in a since-deleted comment) that GNU grep's man page still describes the -P option as "highly experimental" and says it may "warn of unimplemented features", but the option has been around for years, and in practice I have yet to see a warning or a performance problem - YMMV.
In the case at hand, the above command even outperforms sed and awk solutions - see NeronLeVelu's helpful performance comparison.
The interesting article Ed points to discusses a potential performance problem that can surface with regex engines such as the one used by grep -P (via the PCRE library), by Perl itself, and by many other widely used (and mature) engines, such as those in Python, Ruby, and PHP:
In short: the recursive backtracking algorithm employed by these engines can result in severe performance degradation with "pathological" regexes that string together long sequences of subexpressions with variable-length quantifiers, such as (a longer version of) a?a?a?a?aaaa to match aaaa.
The article argues that backtracking is only truly required when a regex contains backreferences, and that a different, much faster algorithm should be employed in their absence.
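As a rough, hedged illustration of the effect the article describes (this uses its a?a?...a?aa...a example; exact behavior depends on your grep/PCRE build, which may abort with a backtracking-limit error rather than merely slow down):
$ n=25
$ pat="$(printf 'a?%.0s' $(seq $n))$(printf 'a%.0s' $(seq $n))"   # a? repeated 25 times, then 25 literal a's
$ str="$(printf 'a%.0s' $(seq $n))"                               # a string of 25 a's
$ time printf '%s\n' "$str" | grep -P "$pat"   # backtracking PCRE engine: very slow, or a backtracking-limit error
$ time printf '%s\n' "$str" | grep -E "$pat"   # GNU grep's automaton-based matcher: effectively instant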

You could try this:
$ sed -n 's/[[:alpha:]]*([^)]*)\([[:lower:]]*\):.*/\1/p' file
noop
It's portable to all POSIX seds and doesn't employ PCREs, just BREs, so the regexp matching part at least should be fast.

A quick-and-dirty test on a 2,469,120-line file of such sample entries gives grep -Po as the winner:
time sed -n -e 's/^MsgTrace[^)]\{4,\})//;t M' -e 'b' -e ':M' -e 's/:.*//p' YourFile >/dev/null
real 0m7.61s
user 0m7.10s
sys 0m0.13s
time awk -F ':' '/^MsgTrace/{ sub( /.*)/, "", $1); print $1}' YourFile >/dev/null
real 0m17.43s
user 0m16.19s
sys 0m0.17s
time grep -Po "[a-zA-Z]\(.*\)\K[a-z]+(?=:)" YourFile >/dev/null
real 0m6.72s
user 0m6.23s
sys 0m0.11s
time sed -n 's/[[:alpha:]]*([^)]*)\([[:lower:]]*\):.*/\1/p' YourFile >/dev/null
real 0m17.43s
user 0m16.29s
sys 0m0.12s
time grep -Po '(?<=MsgTrace\(65/26\)).*?(?=:)' YourFile >/dev/null
real 0m16.38s
user 0m15.22s
sys 0m0.15s
In response to @EdMorton's question (I re-ran the same original sed so the values are comparable under the same machine load): matching the exact string is a lot faster. I imagine sed has to try many more combinations before settling on the longest match when the pattern is generic, since something like .* allows far more possibilities than a literal string.
time sed -n -e 's/^MsgTrace([^)]\{3,\})//;T' -e 's/:.*//p' YourFile >/dev/null
real 0m7.28s
user 0m6.60s
sys 0m0.13s
time sed -n -e 's/^[[:alpha:]]*([^)]\{3,\})//;T' -e 's/:.*//p' YourFile >/dev/null
real 0m10.44s
user 0m9.67s
sys 0m0.14s
time sed -n -e 's/^[[:alpha:]]*([^)]*)//;T' -e 's/:.*//p' YourFile >/dev/null
real 0m10.54s
user 0m9.75s
sys 0m0.11s


grep and cut a specific pattern [duplicate]

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like this:
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: pattern updated per Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
Cross-distribution-safe answer (including Windows MinGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using an older version of grep (such as 2.4.2) that does not include the -o option, then use the above. Otherwise, use the simpler-to-maintain version below.
Linux cross-distribution-safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs the regular-expression matches against the file content (and not the filename), just as you would expect a regular expression to work in vim/etc. What word or regular expression you search for is then up to you, as long as you stick with POSIX rather than Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
The usage of \w varies from platform to platform, as it's an extended "Perl" syntax. As such, grep installations that are limited to POSIX character classes use [[:alpha:]] rather than its Perl counterpart \w. See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above will be a lot more reliable for grep regardless of platform.
As for grep without the -o option: the first grep outputs the relevant lines, the tr splits on spaces into new lines, and the final grep keeps only the lines (now single words) that actually match.
(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)
Credit for the "-o" workaround goes to @AdamRosenfield's answer.
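A side note (mine, hedged): strictly speaking, Perl's \w corresponds to [[:alnum:]_] rather than [[:alpha:]]; the answer above uses [[:alpha:]] because only letters are wanted here. An exact POSIX-class drop-in for the \w version would look like:
grep -oh "[[:alnum:]_]*th[[:alnum:]_]*" *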
It's simpler than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where:
egrep: grep with extended regular expressions.
-w: matches only whole words instead of substrings.
-o: displays only the matched pattern instead of the whole line.
-i: ignores case.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk, no combination of tools needed.
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
A grep command using only-matching with a Perl regex:
grep -o -P 'th.*? ' filename
I was unsatisfied with awk's hard-to-remember syntax, but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
cat *-text-file | grep -Eio "th[a-z]+"
You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
I had a similar problem, looking for a grep pattern/regex with the matched pattern as the output.
In the end I used egrep with the -o option (the same regex with grep -e or -G didn't give me the same result as egrep).
So I think it could be something like this (I'm NOT a regex master):
egrep -o "the*|this{1}|thoroughly{1}" filename
To search for all the words that start with "icon-", the following command works perfectly. I am using ack here, which is similar to grep but with better options and nice formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
You could pipe your grep output into Perl like this:
grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'
grep --color -o -E "Begin.{0,}?End" file.txt
? - Match as few as possible until the End
Tested in the macOS Terminal.
$ grep -w
Excerpt from grep man page:
-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
ripgrep
Here is an example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It will match all words containing th.

BASH - How to use sed to pull out the URLs from a website

I have this
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5
I looked at a similar script:
curl http://cookpad.com 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2
but I need to use the sed command only.
The output I get is this:
sed: -e expression #1, char 2: extra characters after command
#!/bin/bash
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5 | sed -r -e 'href="([^"#]+)"'
That is what I currently have, and I guess what I'm trying to figure out is how to use sed to strip away all the extras and keep just the href parts.
My output should look something like this:
href="UnixFortune.apk"
href="UnixFortune-1.0.tgz"
href="BeagleCar.apk"
href="BeagleCar.zip"
sed is a scripting language. Your command looks like you are trying to use the h command (copy pattern to hold space) with options starting with ref=... but the h command doesn't take any options.
Anyway, the command you want is the s command, which performs substitutions. Namely, you want to substitute everything before and after the matching group with nothing (and thus print only the captured group).
sed -r -e 's/.*href="([^"#]+)".*/\1/'
However, this still doesn't do the right thing if there are multiple matches on a line (or if there are lines without a match, although that is easy to fix with sed -n 's/.../p'). You can certainly solve that in sed, but I would suggest you go with grep -o instead, unless you specifically want to learn, write, and maintain sed scripts. (Or, alternatively, rewrite into an Awk or Perl script. Perl in particular has a lot more leverage for tasks like this.)
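A hedged sketch of that sed -n 's/.../p' variant, plugged into the original fd-5 pipeline (GNU sed's -r for EREs, as above; it prints only the captured URL from lines that match):
cat <&5 | sed -nr 's/.*href="([^"#]+)".*/\1/p'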
And of course, for this particular task, the proper tool is an HTML parser. There is no way to properly pick apart HTML using just regular expressions. See e.g. How to extract links from a webpage using lxml, XPath and Python?

Is it possible to do simple arithmetic in sed addresses?

Is it possible to do simple arithmetic in sed addresses?
Judging by the "addresses" section of the manual, the answer seems to be no. But maybe there is a workaround?
For example, how can I print the second-last line of a file? It would be cool to do something like:
sed -n '$-1 p' file
But it obviously does not work... so I usually have to do multiple sed calls: first identify the line, then do the arithmetic using the shell's $((expr)), and then finally call sed again. Like this:
sed -n "$(($(sed -n '$ =' file)-1)) p" file
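For example, a quick check of that command on a numbered file:
$ seq 5 > file
$ sed -n "$(($(sed -n '$ =' file)-1)) p" file
4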
Is there a "better", more compact, more readable way of doing arithmetic with sed addresses?
In a serious moment of procrastination, I decided to write a small script that quickly changes the xterm colorscheme. The idea is that you have the .Xresources file with a start marker and an end marker:
...
START_MARKER
...
END_MARKER
...
and you want to delete everything that is between the markers, but not the markers themselves. Again, it would be great to do something like:
sed '/START_MARKER/+1,/END_MARKER/-1 d' file
...but you can't!
You're right, one can't directly do math in sed [1], not even in addresses. But you can use some trickery to do what you want:
Second-last row:
$ seq 5 | sed -n -e '${ # On the last line
> g # Replace the buffer with the hold space
> p # and print it
> }
> h' # All lines, store the current line in the hold space.
4
Between START and END:
$ cat test.in
1
START
2
3
END
4
$ cat test.in | sed '/^START$/,/^END$/{
> /^START$/d
> /^END$/d
> p
> }
> d'
2
3
$ cat test.in | sed -n -e '/^START$/,/^END$/!d' -e '/^START/d' -e '/^END$/d' -e p
2
3
I'm using a BSD (mac) sed; on GNU systems you can use ; between lines instead of a newline. Or stick it in a script.
1: Sed is Turing complete, so you can do math, but it's unwieldy at best: http://rosettacode.org/wiki/A%2BB#sed
Yes, I know, UUOC; it's for illustration only
Delete the second last line:
sed ':r;$!{N;br};s/\n[^\n]*\(\n[^\n]*\)$/\1/' file
Delete everything inside markers:
sed ':r;$!{N;br};s/START_MARKER.*END_MARKER/START_MARKER\nEND_MARKER/' file
Far from being elegant, but kinda works.
As was mentioned in the comments, sed operates on lines. However, you can read another line into the pattern space with the N command. The two lines will then both be in the pattern space, separated by a \n. sed also has means of execution flow control, namely labels and conditional/unconditional branches. Everything is documented in man sed; a full reference with examples is also available. In the code above, r is a label; $!{..} means "everywhere except the last line, do ..", and N;br reads another line and branches unconditionally back to r. So with :r;$!{N;br} you read all the input into the pattern space and then operate on it as a single "line" with \n separating the lines of the input.
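For instance, a quick check of the second-last-line deletion on a numbered sample (GNU sed assumed):
$ printf '%s\n' 1 2 3 4 5 | sed ':r;$!{N;br};s/\n[^\n]*\(\n[^\n]*\)$/\1/'
1
2
3
5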
This might work for you (GNU sed):
sed '$!N;$s/.*\n//;P;D' file
and this works and should be easy to understand:
sed '/start/,/end/!d;//d' file
These are solutions to your questions, but for arithmetic you are best off using awk or perl.
You have some good sed suggestions, here's one based on GNU awk:
awk -v RS='START_MARKER|END_MARKER' 'RT == "END_MARKER"' infile
RS='START_MARKER|END_MARKER' splits input with the markers as separators.
RT is set to the matched separator; when it matches "END_MARKER", the default block {print $0} is executed.
So, for example, if you wanted to print all but the last three lines of the block between the markers, set FS to \n and apply the appropriate loop:
awk -v RS='START_MARKER|END_MARKER' -v FS='\n' 'RT == "END_MARKER" { for(i=1; i<NF-3; i++) print $i }' infile
You can use a simple method to show the second-last line of the file:
TOTAL_LENGTH=$(cat file_name | wc -l)
SECOND_LAST_LINE=`expr $TOTAL_LENGTH - 1`
head -$SECOND_LAST_LINE file_name | tail -1
If you want to delete the second last line from the file:
sed -i "$SECOND_LAST_LINE"d file_name
A more comprehensive treatment of doing arithmetic in sed is given in solution #2, along with an introduction to using sed to `sed' its own script.
Since solution #2 was criticized in a comment as being too much "hand waving" and not enough code, here, in juxtaposition, is solution #3:
echo -e 'a\nb\nc\nd\ne' | sed -n '1!G;h;$p' | sed -n 3p
which still uses piping ("But maybe there is a workaround?"), and where the numeral 3 must be replaced "by hand" with the desired line number from the end of the file, à la $-3.
Suppose the sed script is '$-4 p; $-6p; $-8 p;'
echo -e 'a\nb\nc\nd\ne\nf\ng\nh\ni' |
sed -n '1!G;h;$p' |
sed -n '4 p; 6p; 8 p;' |
sed -n '1!G;h;$p'
does the job via
echo '$-4 p; $-6p; $-8 p;' | sed 's/\$-//g'
Caveats:
The sed commands must be as simple as print.
The "simple arithmetic" can only be of the form '$-n'.
The arithmetic is not calculated "normally".
A "single" sed command string (a "line", if the previous piping is considered as such) would embed and combine these two commands, as outlined in solution #2 below.
The coup de grâce.
Given the perfunctory dismissal of the first answer, here is #2:
As this is only the 2nd or 3rd time I have written a substantial sed script, some serious syntax-subtlety circumvention seemed sufficient, à la:
# file prep
echo -e ' a\n b\n c\n d\n e\n f' >test
The following struck-out approach is not incorrect, but after playing and "messing about" with sed on another SO problem, the sed execute (e) can be done more simply, without I/O redirection, by running the command from the pattern buffer to get the file's line count ($=) via:
sed -e '1{h; s/.*/sed -n "$=" test /e' -e 'p;x}; ${p;x;}' test
The $= enumeration is held in the hold buffer from the get go and printed again at the end.
# get "sed -n $= test" command output into sed script
sed -n '1esed -n "$=" test >sedr' test
# see where this is headed? so far "sed -n ... test" is irrelevant
# a pedantic "sed" only solution would keep it this way with
# all the required "sed"'ng as part of an 'e' command or '$e'
# where the 'sedr' file is itself "sed"'d ultimately to a final
# command 'sed -n /<the calculated line number>/p'
# one could quibble whether '>sedr' io redirection is "pure sed"
# modify 'sedr' with the sed RPN calculator (see the link below) to get <the calculated line number>
# with judicious use of "sed"'s 'r' command and buffering will
# realize the effective script to compute the desired result
# this is left as an exercise needing perverse persistence with
# a certain amount of masochistic agony
As a hint as to how to proceed: using the technique of solution #3, the sed script's $- addresses are now replaced by the $= value and -. So sed is again used to edit its own script.
Parsing the sed script must accurately modify just the $- in addresses only.
Also, to use the RPN calculator, the infix arithmetic must be converted to postfix operators. Converting Polish Notation or its Reverse to infix and vice versa is a conventional exercise in the theories of automata and formal languages.
Hopefully, this establishes the answer in the affirmative that it can be done (mais, pas par moi) and in the negative that it is not a trivial exercise (c'est par moi).
Excruciating rationale for an arbitrary solution is at the end.
The environment used for the empirical tests:
linuxuser@ubuntu:~$ sed --version
sed (GNU sed) 4.4
Copyright (C) 2017 Free Software Foundation, Inc.
linuxuser@ubuntu:~$ uname -a
Linux ubuntu 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:00 UTC 2019 i686 i686 i686 GNU/Linux
linuxuser@ubuntu:~$ lsbname -a
lsbname: command not found
linuxuser@ubuntu:~$ apropos lsb
lsb_release (1) - print distribution-specific information
lsblk (8) - list block devices
linuxuser@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
Solution #1
A technique for thinking inside, rather than outside, the box:
seq 60 | sed -n '$!p' | sed -n '$!p' | sed -n '$!p' | sed -n '$p'
which prints:
57
specifically, for the second last line:
sed -n '$!p' file | sed -n '$p'
More generally, a script can iterate over sed -n '$!p' to "count backwards" from the end of a file.
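A hedged sketch of such a loop (bash; the variable names n and buf are mine, purely illustrative):
# print the n-th line from the end of "file" by chopping one line off the end per pass
n=4
buf=$(cat file)
for ((i=1; i<n; i++)); do
    buf=$(printf '%s\n' "$buf" | sed -n '$!p')   # drop the current last line
done
printf '%s\n' "$buf" | sed -n '$p'               # what is now the last line is the n-th from the end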
Well, the answer to:
Is it possible to do simple arithmetic in sed addresses?
rhetorically is: it depends on one's abilities, wishes, and desires, as well as on a realistic assessment of practicality. The implication is also that a single sed invocation should be used for this task exclusively. But yes, it is possible.
A firm grounding in the studies of Automata, Formal Languages and Recursive Function Theory does not hurt.
As stated in previous answers: not only can sed do simple arithmetic, it can compute any computable function, which includes complex arithmetic. Doing so, however, requires implementing the primitive recursive functions (PRF) of recursive function theory (RFT), which of course sed does. The finite size of the machine does limit the computation, lacking the infinite tape resources of a Turing machine. In any case, not wishing to demonstrate all of this here, the precedent is to be found in the sed manual.
Specifically, for doing (finite) arithmetic, there is an RPN calculator:
https://www.gnu.org/software/sed/manual/html_node/Increment-a-number.html#FOOT9
Now then, such a tool can be used to create a sed script that precomputes the arithmetic, which is then embedded in a sed script that prints the desired output. A simple demonstration is given by the OP's own command, noting that the shell arithmetic computation could instead be done with the RPN sed script.
This reduces to a form such as (very crude)
sed '/$(sed RPN($= - 3*4) file)/;p;' file
but still requires feeding sed a sed'd script.
Also, one could quibble over the use of bash's $(), but it could be argued that bash is already being used to execute the first sed, so no harm, no foul.
Recognizing that sed implements the PRF or equivalently is Turing complete means that yes, a single invocation of sed is adequate.
The paradigm can therefore do this.
Some commands that could expedite this task are:
e (as an s/// flag), the e command, r, R, w, W
in addition to the usual hold and pattern buffer commands.
The r, R, w, W commands are particularly advantageous as temporary buffer space.
e [command] ("3.7 Commands Specific to GNU sed", GNU sed manual)
This command allows one to pipe input from a shell command into
pattern space. Without parameters, the e command executes the
command that is found in pattern space ...
More abstractly, it is completely possible, though highly impractical, to write a sed script to execute the sed paradigm itself that also includes arithmetic calculations even in addresses.
A sed peculiarity: the expression /\n/ will never match when used as an address, and it matches in the pattern space only if a sed command like 'N'ext or s/.*/\n/ introduces a newline.
Confirmed via:
echo -e '\n\n' | sed -n ' /\n/ {s//hello/;p}'
But
echo -e '\n\n' | sed -n '0,/\n\n\n/ {s//hello/;p}'
outputs 3 blank lines and
echo -e '\n\n' | sed -n '0,/\n/ {s/.*/hello/;p}'
echo -e '\n\n' | sed -n '0,/\n\n\n/ {s/.*/hello/;p}'
each output 3 hello's
hello
hello
hello
while this is well-behaved:
echo -e '\n\n' | sed -n '0,/^$/ {s//hello/;p}'

Sed:Replace a series of dots with one underscore

I want to do some simple string replacement in Bash with sed. I am on Ubuntu 10.10.
Just see the following code, it is self-explanatory:
name="A%20Google.."
echo $name|sed 's/\%20/_/'|sed 's/\.+/_/'
I want to get A_Google_ but I get A_Google..
The sed 's/\.+/_/' part is obviously wrong.
BTW, sed 's/\%20/_/' and sed 's/%20/_/' both work. Which is better?
sed speaks POSIX basic regular expressions, which don't include + as a metacharacter. Portably, rewrite to use *:
sed 's/\.\.*/_/'
or if all you will ever care about is Linux, you can use various GNU-isms:
sed -r 's/\.\.*/_/' # turn on POSIX EREs (use -E instead of -r on OS X)
sed 's/\.\+/_/' # GNU regexes invert behavior when backslash added/removed
That last example answers your other question: a character which is literal when used as is may take on a special meaning when backslashed, and even though at the moment % doesn't have a special meaning when backslashed, future-proofing means not assuming that \% is safe.
Additional note: you don't need two separate sed commands in the pipeline there.
echo $name | sed -e 's/\%20/_/' -e 's/\.\.*/_/'
(Also, do you only need to do that once per line, or for all occurrences? You may want the /g modifier.)
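For example, with /g on both substitutions (the longer input string here is mine, just to show the global effect):
$ echo "A%20Google...B..C" | sed -e 's/%20/_/g' -e 's/\.\.*/_/g'
A_Google_B_C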
The sed command doesn't understand + so you'll have to expand it by hand:
sed 's/\.\.*/_/'
Or tell sed that you want to use extended regexes:
sed -r 's/\.+/_/' # GNU
sed -E 's/\.+/_/' # OSX
Which switch to use, -r or -E, depends on your sed, and it might not support extended regexes at all, so the portable solution is to use \.\.* in place of \.+. But since you're on Linux, you should have GNU sed, so sed -r should do the trick.

How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?
I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.
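Applied to the original example, that looks like this (a sketch of the fix described above):
exec 3> >(grep --line-buffered "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3    # "I got: data2" now appears immediately, without closing fd 3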
You only need to tell grep and sed not to buffer lines:
grep --line-buffered
and
sed -u
An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation of EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifier, *?, +?, ??, {N,}?, and {N,M}?. These now allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes, as usual) before any sed command, which makes it apply only to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/), then it reuses the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style, it just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
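For instance, if part of the replacement comes from a shell variable (the variable name prefix is mine, purely illustrative), only that piece gets double quotes:
prefix="I got"
sed -n 's/^good:\(.*\)/'"$prefix"': \1/p' <<< "good:data2"
# prints: I got: data2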
On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.
Turning off buffering in the pipe seems to be the easiest and most generic answer. Using stdbuf (coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering has other dependencies too, for example whether mawk or gawk is reading this pipe:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would retain the input, gawk wouldn't.
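If mawk is the one holding data back, its -W interactive switch (where supported) is the usual way to make it behave; a hedged variant of the command above:
exec 3> >(stdbuf -oL grep "good:" | mawk -W interactive '{ sub(".*:", "I got: "); print }')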
See also How to fix stdio buffering
