Specify multiple possible patterns for a single command - linux

Basically there are a few lines which contain a common format, but different wording at the end. The command will work for all of them, but I want to match all possible patterns, thereby needing only 1 line in the script. As an example, I know how to make the script work like so:
/pattern1/ s/asdf/ghjk/g
/pattern2/ s/asdf/ghjk/g
/pattern3/ s/asdf/ghjk/g
Any ideas?

If your patterns are really as similar as in your example, you can use
sed -e '/pattern[1-3]/ s/asdf/ghjk/g'
If the patterns aren't so similar and your sed command supports extended regular expressions, you can use
sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g'
# ^^ use extended regular expressions
# for GNU sed, use -r or escape (, |, and ) with \
If your sed command doesn't support extended regular expressions, you might have to turn to awk or perl:
perl -ple '/(pattern1|pattern2|pattern3)/ && s/asdf/ghjk/g'
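The awk route mentioned above could look roughly like the sketch below. The function name replace_on_matching_lines is made up for illustration, and pattern1 to pattern3 are the placeholders from the question; awk regular expressions are extended, so | needs no escaping.

```shell
# Hypothetical helper: reads stdin and rewrites asdf -> ghjk only on
# lines that match any of the three placeholder patterns.
replace_on_matching_lines() {
  awk '/pattern1|pattern2|pattern3/ { gsub(/asdf/, "ghjk") } 1'
}

printf '%s\n' 'pattern1 asdf' 'other asdf' 'pattern3 asdf asdf' |
  replace_on_matching_lines
```

The trailing 1 is the usual awk idiom for "print every line", so lines that match none of the patterns pass through unchanged.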

Related

Bash to transform string `3.11.0.17.16` into `3.11.0-17-generic`

I'm trying to transform this 3.11.0.17.16 into 3.11.0-17-generic using only bash and unix tools. The 16 in the original string can be anything. I feel like sed is the answer, but I'm not comfortable with its flavor of regex. How would you do this?
Version using awk instead of sed:
echo "3.11.0.17.16" | awk -F. '{printf "%s.%s.%s-%s-generic\n",$1,$2,$3,$4}'
echo "3.11.0.17.16" | sed 's/\.\([0-9][0-9]*\)\.[0-9][0-9]*$/-\1-generic/'
3.11.0-17-generic
This only accepts digits in the final component. If you want to accept arbitrary characters other than . there (you can't allow . or the match will become ambiguous) then write instead
echo "3.11.0.17.gr#wl1x" | sed 's/\.\([0-9][0-9]*\)\.[^.][^.]*$/-\1-generic/'
In a portable sed invocation you are limited to POSIX basic regular expressions, which most importantly means you cannot use +, ?, or |, and ( ) { } are ordinary characters unless \-escaped. Many sed implementations now accept an -E option that brings their regex syntax in line with egrep, but that is not a feature even of the very latest revision of POSIX so you cannot rely on it.
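To make the difference concrete, here is a rough side-by-side sketch of the same substitution in both dialects. The function names are made up for illustration, and the second one assumes a sed that accepts -E (GNU and BSD sed do).

```shell
# Basic regular expressions only: no +, and groups are written \( \).
kernel_name_bre() {
  printf '%s\n' "$1" | sed 's/\.\([0-9][0-9]*\)\.[0-9][0-9]*$/-\1-generic/'
}

# Same substitution in extended syntax; assumes sed supports -E.
kernel_name_ere() {
  printf '%s\n' "$1" | sed -E 's/\.([0-9]+)\.[0-9]+$/-\1-generic/'
}

kernel_name_bre 3.11.0.17.16
kernel_name_ere 3.11.0.17.16
```

Both calls should print 3.11.0-17-generic; the only change is that [0-9][0-9]* becomes [0-9]+ and the group loses its backslashes.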
Substring removal using bash parameter expansion and extended globs
shopt -s extglob
version=3.11.0.17.16
version=${version%.+(!(.))}
printf "%s-%s-generic\n" ${version%.+(!(.))} ${version##*.}
3.11.0-17-generic
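If extglob is not available, plain POSIX parameter expansion may be enough, since % and ## already remove the shortest suffix and longest prefix. This is only a sketch; kernel_name and head are names chosen for illustration.

```shell
# Hypothetical helper using only POSIX parameter expansion.
kernel_name() {
  head=${1%.*}    # drop the last component: 3.11.0.17
  # ${head%.*} is 3.11.0 and ${head##*.} is 17
  printf '%s-%s-generic\n' "${head%.*}" "${head##*.}"
}

kernel_name 3.11.0.17.16
```

This prints 3.11.0-17-generic for the input from the question and does not care what the final component contains, as long as it has no dot.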
If you anchor the regex you are trying to match onto the last 3 sets of digits you would get
echo "3.11.0.17.16" | sed 's!\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)$!\1-\2-generic!'

sed help: matching and replacing a literal "\n" (not the newline)

I have a file which contains several instances of \n.
I would like to replace them with actual newlines, but sed doesn't recognize the \n.
I tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
Is sed able to recognize a literal \n? If so, how?
Is there another program that can read the file interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, the extended dialect adds +, ?, and |, and lets you write grouping parentheses and repetition braces without backslashes.)
On BSD platforms (including MacOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
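For the Awk alternative, something along these lines should behave the same across awk implementations. The function name unescape_newlines is made up for illustration.

```shell
# Hypothetical helper: turns each literal backslash-n pair on stdin
# into a real newline.
unescape_newlines() {
  # /\\n/ matches a backslash followed by n; "\n" is a newline character.
  awk '{ gsub(/\\n/, "\n") } 1'
}

printf '%s\n' 'first\nsecond' | unescape_newlines
```

Here printf '%s\n' passes the argument through untouched, so awk really receives a backslash and an n, and prints first and second on separate lines.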
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, so a buffer holding a single line never contains a newline (sed strips it when it reads the line into the buffer). Use N, or hold-space commands such as H and G, to fill the buffer with more than one line; then \n appears inside it. Be careful: ^ and $ then no longer mark the start and end of a line but the start and end of the whole buffer, because of the \n inside.
\n is recognized in the search pattern, not in the replace pattern. Two ways for using it (sample):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first reuses a \n already in the buffer as a back-reference (shorter code in the replacement);
the second uses a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
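As a small illustration of N putting a real newline into the buffer, the sketch below joins every pair of input lines. The function name join_pairs is made up, and the demo input has an even number of lines because sed implementations differ on what N does at the last line.

```shell
# Hypothetical helper: N appends the next input line to the buffer,
# separated by a newline, which the s command can then match as \n.
join_pairs() {
  sed -e 'N' -e 's/\n/ /'
}

printf '%s\n' a b c d | join_pairs
```

This should print "a b" and "c d", showing that \n in the search pattern only matches once a second line has been pulled into the buffer.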

My regular expression isn't working in grep

Here's the text of the file I'm working with:
(4 spaces)Hi, everyone
(1 tab)yes
When I run this command - grep '^[[:space:]]+' myfile - it doesn't print anything to stdout.
Why doesn't it match the whitespace in the file?
I'm using GNU grep version 2.9.
There are several different regular expression syntaxes. The default for grep is called basic syntax in the grep documentation.
From man grep(1):
In basic regular expressions the meta-characters
?, +, {, |, (, and ) lose their special meaning; instead
use the backslashed versions \?, \+, \{, \|, \(, and \).
Therefore instead of + you should have typed \+:
grep '^[[:space:]]\+' FILE
If you need more power from your regular expressions, I also encourage you to take a look at Perl regular expression syntax. They are generally considered the most expressive. There is a C library called PCRE which emulates them, and grep links to it. To use them (instead of basic syntax) you can use grep -P.
You could use -E:
grep -E '^[[:space:]]+' FILE
This enables extended regex. Without it you get BREs (basic regex) which have a more simplified syntax. Alternatively you could run egrep instead with the same result.
I found you need to escape the +:
grep '^[[:space:]]\+' FILE
Try grep -P '^\s+' instead, provided you’re using GNU grep. It’s a lot easier to type, and has better regexes.
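For comparison, here is a rough sketch of the same match written without any GNU extension. The function names are made up; the first uses the POSIX basic interval \{1,\}, the second uses -E.

```shell
# Basic syntax: \{1,\} means "one or more" and is part of POSIX BRE.
indented_bre() { grep '^[[:space:]]\{1,\}'; }

# Extended syntax: + needs no backslash.
indented_ere() { grep -E '^[[:space:]]+'; }

printf '    Hi, everyone\n\tyes\nplain\n' | indented_bre
printf '    Hi, everyone\n\tyes\nplain\n' | indented_ere
```

Both should print the two indented lines and skip "plain". For simply selecting lines, '^[[:space:]]' alone would also do, since one leading whitespace character is enough to match.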

Sed:Replace a series of dots with one underscore

I want to do some simple string replace in Bash with sed. I am on Ubuntu 10.10.
Just see the following code, it is self-explanatory:
name="A%20Google.."
echo $name|sed 's/\%20/_/'|sed 's/\.+/_/'
I want to get A_Google_ but I get A_Google..
The sed 's/\.+/_/' part is obviously wrong.
BTW, sed 's/\%20/_/' and sed 's/%20/_/' both work. Which is better?
sed speaks POSIX basic regular expressions, which don't include + as a metacharacter. Portably, rewrite to use *:
sed 's/\.\.*/_/'
or if all you will ever care about is Linux, you can use various GNU-isms:
sed -r 's/\.+/_/' # turn on POSIX EREs (use -E instead of -r on OS X)
sed 's/\.\+/_/' # GNU regexes invert behavior when backslash added/removed
That last example answers your other question: a character which is literal when used as is may take on a special meaning when backslashed, and even though at the moment % doesn't have a special meaning when backslashed, future-proofing means not assuming that \% is safe.
Additional note: you don't need two separate sed commands in the pipeline there.
echo "$name" | sed -e 's/%20/_/' -e 's/\.\.*/_/'
(Also, do you only need to do that once per line, or for all occurrences? You may want the /g modifier.)
The sed command doesn't understand + so you'll have to expand it by hand:
sed 's/\.\.*/_/'
Or tell sed that you want to use extended regexes:
sed -r 's/\.+/_/' # GNU
sed -E 's/\.+/_/' # OSX
Which switch, -r or -E, depends on your sed and it might not even support extended regexes so the portable solution is to use \.\.* in place of \.+. But, since you're on Linux, you should have GNU sed so sed -r should do the trick.
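Putting the pieces together, a portable version of the pipeline from the question might look like this sketch. The function name sanitize_name is made up, and the /g flags are an assumption that every occurrence should be replaced.

```shell
# Hypothetical helper: one sed call, basic regular expressions only.
sanitize_name() {
  printf '%s\n' "$1" | sed -e 's/%20/_/g' -e 's/\.\.*/_/g'
}

sanitize_name 'A%20Google..'
```

This should print A_Google_, with the run of dots collapsed into a single underscore by \.\.* ("a dot followed by zero or more dots").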

Using ? with sed

I just want to get the number of a file that may or may not be gzip'd. However, it appears that a regular expression in sed does not support a ?. Here's what I tried:
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and nothing was returned. Then I added a ? to the string being analyzed:
echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and got:
1
So, it looks like the ? used in most regex's is not supported in sed, right? Well then, I would just like sed to give a 1 for file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?
With GNU sed's basic syntax, the equivalent to x? is \(x\|\); the portable POSIX form is \(x\)\{0,1\}.
However, many versions of sed support an option to enable "extended regular expressions" which includes ?. In GNU sed the flag is -r. Note that this also changes unescaped parens to do grouping. eg:
echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'
Actually, there's another bug in your regex which is that the greedy .* in the parens is going to swallow up the ".gz" if there is one. sed doesn't have a non-greedy equivalent to * as far as I know, but you can use | to work around this. | in sed (and many other regex implementations) will use the leftmost match that works, so you can do something like this:
echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'
This tries to match with .gz, and only tries without it if that doesn't work. Only one of group 2 or 4 will actually exist (since they are on opposite sides of the same |) so we just concatenate them to get the value we want.
If you're looking for an answer to the specific example given in the question, or why it uses the ? incorrectly (regardless of syntax), see the answer by Laurence Gonsalves.
If you're looking instead for the answer to the general question of why ? doesn't exhibit its special meaning in sed as you might expect:
By default, sed uses the "POSIX basic regular expressions syntax", so the question mark must be escaped as \? to apply its special meaning, otherwise it matches a literal question mark. As an alternative, you can use the -r or --regexp-extended option to use the "extended regular expression syntax", which reverses the meaning of escaped and non-escaped special characters, including ?.
In the words of the GNU sed documentation (view by running 'info sed' on Linux):
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, and braces
('{}'). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
and the option is explained:
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that `egrep' accepts;
they can be clearer because they usually have less backslashes,
but are a GNU extension and hence scripts that use them are not
portable.
Update
Newer versions of GNU sed now say this:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems.
So, if you need to preserve compatibility with ancient GNU sed, stick with -r. But if you prefer better cross-platform portability on more modern systems (e.g. Linux+Mac support), go with -E (but note that there are still some quirks and differences between GNU sed and BSD sed, so you'll have to make sure your scripts are portable in any case).
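If neither -r nor -E can be relied on, an optional group can still be written in basic syntax with the interval \{0,1\}. The sketch below applies that to the file-number problem from the question; the function name file_number is made up, and it assumes the number is a run of digits at the end of the name, optionally followed by .gz.

```shell
# Hypothetical helper: prints the digits after the last underscore,
# whether or not the name ends in .gz.
file_number() {
  printf '%s\n' "$1" |
    sed -n 's/.*_\([0-9][0-9]*\)\(\.gz\)\{0,1\}$/\1/p'
}

file_number file_1.gz
file_number file_1
```

Both calls should print 1. Restricting the first group to digits also sidesteps the greediness problem, because [0-9][0-9]* cannot swallow the ".gz".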
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\?\(\.gz\)/\1/p'
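This works for file_1.gz. You have to put the ? in the right spot, and you have to escape it. Note that it prints nothing for plain file_1, because the \(\.gz\) group is still mandatory.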
A function that should return a number that follows the '_' in a filename, regardless of file extension:
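realname () {
local n=${1##*/}    # strip any leading directory
local rn=${n%.*}    # strip the last extension, if there is one
# feed the name itself to sed, not a file of that name
printf '%s\n' "${rn:-$n}" | sed 's/^.*_//'
}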
You should use awk which is superior to sed when it comes to field grabbing/parsing:
$ awk -F'[._]' '{print $2}' <<<"file_1"
1
$ awk -F'[._]' '{print $2}' <<<"file_1.gz"
1
Alternatively you can just use Bash's parameter expansion like so:
var=file_1.gz;
temp=${var#*_};
file=${temp%.*}
echo $file
Note: works when var=file_1 as well
Part of the solution lies in escaping the question mark or using the -r option.
sed 's/.*_\([^.]*\)\(\.\?[^.]\+\)\?$/\1/'
or
sed -r 's/.*_([^.]*)(\.?[^.]+)?$/\1/'
will work for:
file_1.gz
file_12.txt
file_123
resulting in:
1
12
123
I just realized that I could do something very easy:
echo 'file_1.gz'|sed -n 's/.*_\([0-9]*\).*/\1/p'
Notice the [0-9]* instead of a .*. @Laurence Gonsalves's answer made me realize the greediness of my previous post.
