Bash to transform string `3.11.0.17.16` into `3.11.0-17-generic`

I'm trying to transform this 3.11.0.17.16 into 3.11.0-17-generic using only bash and unix tools. The 16 in the original string can be anything. I feel like sed is the answer, but I'm not comfortable with its flavor of regex. How would you do this?

Version using awk instead of sed:
echo "3.11.0.17.16" | awk -F. '{printf "%s.%s.%s-%s-generic\n",$1,$2,$3,$4}'

echo "3.11.0.17.16" | sed 's/\.\([0-9][0-9]*\)\.[0-9][0-9]*$/-\1-generic/'
3.11.0-17-generic
This only accepts digits in the final component. If you want to accept arbitrary characters other than . there (you can't allow . or the match will become ambiguous) then write instead
echo "3.11.0.17.gr#wl1x" | sed 's/\.\([0-9][0-9]*\)\.[^.][^.]*$/-\1-generic/'
In a portable sed invocation you are limited to POSIX basic regular expressions, which most importantly means you cannot use +, ?, or |, and ( ) { } are ordinary characters unless \-escaped. Many sed implementations accept an -E option that brings their regex syntax in line with egrep, but it was standardized only in POSIX.1-2024, so you cannot rely on it on older systems.
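As a minimal sketch of the BRE restrictions described above: the interval \{1,\} is the basic-regex spelling of "one or more", so it can stand in for + without any extension. The helper name to_generic is hypothetical and only used for illustration.

```shell
# Hypothetical helper (not from the answer): portable BRE only.
# \{1,\} means "one or more", the BRE counterpart of ERE's +.
to_generic() {
    printf '%s\n' "$1" | sed 's/\.\([0-9]\{1,\}\)\.[^.]\{1,\}$/-\1-generic/'
}

to_generic '3.11.0.17.16'        # prints 3.11.0-17-generic
to_generic '3.11.0.17.gr#wl1x'   # prints 3.11.0-17-generic
```

This should behave the same as the [0-9][0-9]* form; which spelling to use is a matter of taste.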

Substring removal using bash parameter expansion and extended globs
shopt -s extglob
version=3.11.0.17.16
version=${version%.+(!(.))}
printf "%s-%s-generic\n" "${version%.+(!(.))}" "${version##*.}"
3.11.0-17-generic
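The same idea can probably be expressed without extglob, assuming the input always has at least three dot-separated components. This sketch uses only POSIX parameter expansion; the function and variable names (kernel_name, base, abi, release) are made up for the example.

```shell
# Hypothetical helper: plain POSIX parameter expansion, no extglob needed.
kernel_name() {
    base=${1%.*}         # drop the last component:   3.11.0.17
    abi=${base##*.}      # keep what follows the last dot: 17
    release=${base%.*}   # drop that component too:   3.11.0
    printf '%s-%s-generic\n' "$release" "$abi"
}

kernel_name 3.11.0.17.16   # prints 3.11.0-17-generic
```

The variables are deliberately not declared local, since local is not part of POSIX sh.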

If you anchor the regex you are trying to match onto the last 3 sets of digits you would get
echo "3.11.0.17.16" | sed 's!\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)$!\1-\2-generic!'

Related

Delete _ and - characters using sed

I am trying to convert 2015-06-03_18-05-30 to 20150603180530 using sed.
I have this:
$ var='2015-06-03_18-05-30'
$ echo $var | sed 's/\-\|\_//g'
$ echo $var | sed 's/-|_//g'
None of these are working. Why is the alternation not working?
As long as your script has a #!/bin/bash (or ksh, or zsh) shebang, don't use sed or tr: Your shell can do this built-in without the (comparatively large) overhead of launching any external tool:
var='2015-06-03_18-05-30'
echo "${var//[-_]/}"
That said, if you really want to use sed, the GNU extension -r enables ERE syntax:
$ sed -r -e 's/-|_//g' <<<'2015-06-03_18-05-30'
20150603180530
See http://www.regular-expressions.info/posix.html for a discussion of differences between BRE (default for sed) and ERE. That page notes, in discussing ERE extensions:
Alternation is supported through the usual vertical bar |.
If you want to work on POSIX platforms -- with /bin/sh rather than bash, and no GNU extensions -- then reformulate your regex to use a character class (and, to avoid platform-dependent compatibility issues with echo[1], use printf instead):
printf '%s\n' "$var" | sed 's/[-_]//g'
[1] - See the "APPLICATION USAGE" section of that link, in particular.
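A short sketch of why the unescaped alternation in the question fails, assuming a sed that follows POSIX BRE rules: there | is an ordinary character, so the pattern only matches the literal text "-|_". The helper name strip_sep is hypothetical.

```shell
# Hypothetical helper: the bracket expression works in any POSIX sed.
strip_sep() {
    printf '%s\n' "$1" | sed 's/[-_]//g'
}

strip_sep '2015-06-03_18-05-30'   # prints 20150603180530

# In BRE an unescaped | is literal, so only the three-character
# sequence -|_ is removed here.
printf '%s\n' 'a-|_b' | sed 's/-|_//g'   # prints ab
```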
Something like this ought to do.
sed 's/[-_]//g'
This reads as:
s: substitute
/[-_]/: any single character matching - or _
//: with nothing (an empty replacement)
g: and do that for every match in the line
Sed operates on every line by default, so this covers every instance in the file/string.
I know you asked for a solution using sed, but I offer an alternative in tr:
$ var='2015-06-03_18-05-30'
$ echo $var | tr -d '_-'
20150603180530
tr should be a little faster.
Explained:
tr stands for translate and it can be used to replace certain characters with other ones.
-d option stands for delete and it removes the specified characters instead of replacing them.
'_-' specifies the set of characters to be removed (can also be specified as '\-_' but you need to escape the - there because it's considered another option otherwise).
Easy:
sed 's/[-_]//g'
The character class [-_] matches any one of the characters from the set.
sed 's/[^[:digit:]]//g' YourFile
Could you tell me what failed with echo $var | sed 's/\-\|\_//g'? It works here, even though escaping - and _ is not needed. This assumes GNU sed, because \| only works in that enhanced version of sed.

Specify multiple possible patterns for a single command

Basically there are a few lines which contain a common format, but different wording at the end. The command will work for all of them, but I want to match all possible patterns, thereby needing only 1 line in the script. As an example, I know how to make the script work like so:
/pattern1/ s/asdf/ghjk/g
/pattern2/ s/asdf/ghjk/g
/pattern3/ s/asdf/ghjk/g
Any ideas?
If your patterns are really as similar as in your example, you can use
sed -e '/pattern[1-3]/ s/asdf/ghjk/g'
If the patterns aren't so similar and your sed command supports extended regular expressions, you can use
sed -E -e '/(pattern1|pattern2|pattern3)/ s/asdf/ghjk/g'
# ^^ use extended regular expressions
# for GNU sed, use -r or escape (, |, and ) with \
If your sed command doesn't support extended regular expressions, you might have to turn to awk or perl:
perl -ple '/(pattern1|pattern2|pattern3)/ && s/asdf/ghjk/g'
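For the awk route mentioned above, a minimal sketch might look like this. awk patterns are extended regular expressions, so | is available without any option. The function name fix_lines is hypothetical, and pattern1 to pattern3 are the placeholders from the question.

```shell
# Hypothetical helper: substitute only on lines matching any of the patterns.
fix_lines() {
    awk '/pattern1|pattern2|pattern3/ { gsub(/asdf/, "ghjk") } { print }'
}

printf '%s\n' 'pattern2 asdf asdf' 'other asdf' | fix_lines
# prints:
# pattern2 ghjk ghjk
# other asdf
```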

Sed: Replace a series of dots with one underscore

I want to do some simple string replace in Bash with sed. I am on Ubuntu 10.10.
Just see the following code, it is self-explanatory:
name="A%20Google.."
echo $name|sed 's/\%20/_/'|sed 's/\.+/_/'
I want to get A_Google_ but I get A_Google..
The sed 's/\.+/_/' part is obviously wrong.
BTW, sed 's/\%20/_/' and sed 's/%20/_/' both work. Which is better?
sed speaks POSIX basic regular expressions, which don't include + as a metacharacter. Portably, rewrite to use *:
sed 's/\.\.*/_/'
or if all you will ever care about is Linux, you can use various GNU-isms:
sed -r 's/\.+/_/' # turn on POSIX EREs (use -E instead of -r on OS X)
sed 's/\.\+/_/' # GNU regexes invert behavior when backslash added/removed
That last example answers your other question: a character which is literal when used as is may take on a special meaning when backslashed, and even though at the moment % doesn't have a special meaning when backslashed, future-proofing means not assuming that \% is safe.
Additional note: you don't need two separate sed commands in the pipeline there.
echo "$name" | sed -e 's/%20/_/' -e 's/\.\.*/_/'
(Also, do you only need to do that once per line, or for all occurrences? You may want the /g modifier.)
The sed command doesn't understand + so you'll have to expand it by hand:
sed 's/\.\.*/_/'
Or tell sed that you want to use extended regexes:
sed -r 's/\.+/_/' # GNU
sed -E 's/\.+/_/' # OSX
Which switch, -r or -E, depends on your sed and it might not even support extended regexes so the portable solution is to use \.\.* in place of \.+. But, since you're on Linux, you should have GNU sed so sed -r should do the trick.
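Putting the portable pieces together, a sketch could combine both substitutions in one sed call and add the g flag so every occurrence on the line is handled. The helper name sanitize is an assumption for illustration.

```shell
# Hypothetical helper: portable BRE, both substitutions in one sed invocation.
sanitize() {
    printf '%s\n' "$1" | sed -e 's/%20/_/g' -e 's/\.\.*/_/g'
}

sanitize 'A%20Google..'   # prints A_Google_
sanitize 'a..b...c'       # prints a_b_c
```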

how do you specify non-capturing groups in sed?

is it possible to specify non-capturing groups in sed?
if so, how?
Parentheses in sed have two functions, grouping, and capturing.
So I'm asking about using parentheses to do the grouping, but without capturing. One might say non-capturing grouping parentheses (parentheses that are non-capturing and aren't literal), which are called non-capturing groups. I've seen the syntax (?:regex) for non-capturing groups, but it doesn't work in sed.
Linguistic Note- in the UK, the term brackets is used generally, for "round brackets" or "square brackets". In the UK, brackets usually refers to "( )", since "( )" are so common. And in the UK the term parentheses is hardly used. In the USA the term brackets are specifically "[ ]". So to prevent confusion to anybody in the USA, i've not used the words brackets in the question.
Parentheses can be used for grouping alternatives. For example:
sed 's/a\(bc\|de\)f/X/'
says to replace "abcf" or "adef" with "X", but the parentheses also capture. There is not a facility in sed to do such grouping without also capturing. If you have a complex regex that does both alternative grouping and capturing, you will simply have to be careful in selecting the correct capture group in your replacement.
Perhaps you could say more about what it is you're trying to accomplish (what your need for non-capturing groups is) and why you want to avoid capture groups.
Edit:
There is a type of non-capturing brackets ((?:pattern)) that are part of Perl-Compatible Regular Expressions (PCRE). They are not supported in sed (but are when using grep -P).
The answer, is that as of writing, you can't - sed does not support it.
Non-capturing groups have the syntax of (?:a) and are a PCRE syntax.
Sed supports BRE (basic regular expressions, aka POSIX BRE), and GNU sed has the -r option that makes it support ERE (extended regular expressions, aka POSIX ERE), but still not PCRE.
Perl will work, for windows or linux
examples here
https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal
e.g. this from cygwin in windows
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\1/s'
a
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\2/s'
c
There is a program albeit for Windows, which can do search and replace on the command line, and does support PCRE. It's called rxrepl. It's not sed of course, but it does search and replace with PCRE support.
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\1"
a
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\3"
c
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r "\3"
Invalid match group requested.
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r "\2"
c
C:\blah\rxrepl>
The author (not me) mentioned his program in an answer over here https://superuser.com/questions/339118/regex-replace-from-command-line
It has a really good syntax.
The standard thing to use would be perl, or almost any other programming language that people use.
I'll assume you are speaking of the backreference syntax, which uses parentheses ( ), not brackets [ ].
By default, sed will interpret ( ) literally and not attempt to make a backreference from them. You will need to escape them to make them special, as in \( \). Only when you use the GNU sed -r option is the escaping reversed: with sed -r, unescaped ( ) produce backreferences and escaped \( \) are treated as literal. Examples to follow:
POSIX sed
$ echo "foo(###)bar" | sed 's/foo(.*)bar/####/'
####
$ echo "foo(###)bar" | sed 's/foo(.*)bar/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe
$ echo "foo(###)bar" | sed 's/foo\(.*\)bar/\1/'
(###)
GNU sed -r
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/####/'
####
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/\1/'
(###)
$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar/\1/'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe
Update
From the comments:
Group-only, non-capturing parentheses ( ), which would let you use something like intervals {n,m} without creating a backreference \1, don't exist. First, unescaped intervals {n,m} are not part of POSIX basic regular expressions (BRE spells them \{n,m\}); you must use the GNU -r extension to write them unescaped. As soon as you enable -r, any grouping parentheses will also be capturing for backreference use. Examples:
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###\1/'
###456.789
As said, it is not possible to have non-capturing groups in sed.
It may be obvious, but non-capturing groups are not a necessity (unless you run into the backreference limit, e.g. \9).
One can just use the desired capturing ones and ignore the non-desired ones as if they were non-capturing.
So e.g. of the two capturings here \1 and \2 you can ignore the \1 and just use the \2
$ echo blahblahblahc | sed -r "s/(blah){1,10}(.)/\2/"
c
For reference, nested capturing groups are numbered by the position-order of "(".
E.g.,
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\1x/g"
applex and bananasx and monkeys (note: "s" in bananas, first bigger group)
vs
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\2x/g"
applex and bananax and monkeys (note: no "s" in bananas, second smaller group)
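The "just ignore the group you don't need" advice can be sketched in portable BRE as well, assuming a sed that supports \{n\} intervals on groups as POSIX describes. Here group 1 exists only so the interval can repeat it, and the replacement uses group 2 alone. The helper name last_octet is hypothetical.

```shell
# Hypothetical helper: \1 is needed only for repetition and is simply not used.
last_octet() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9]\{1,3\}\.\)\{3\}\([0-9]\{1,3\}\)$/\2/p'
}

last_octet 192.168.10.42   # prints 42
```

With -n and the p flag, input that does not look like four dotted numbers prints nothing rather than passing through unchanged.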

Using ? with sed

I just want to get the number of a file that may or may not be gzip'd. However, it appears that a regular expression in sed does not support a ?. Here's what I tried:
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and nothing was returned. Then I added a ? to the string being analyzed:
echo 'file_1.gz?'|sed -n 's/.*_\(.*\)\(\.gz\)?/\1/p'
and got:
1
So, it looks like the ? used in most regex's is not supported in sed, right? Well then, I would just like sed to give a 1 for file_1 and file_1.gz. What's the best way to do that in a bash script if execution time is critical?
The equivalent to x? is \(x\|\) in GNU sed (\| is a GNU extension); in POSIX basic regular expressions it is x\{0,1\}.
However, many versions of sed support an option to enable "extended regular expressions" which includes ?. In GNU sed the flag is -r. Note that this also changes unescaped parens to do grouping. eg:
echo 'file_1.gz'|sed -n -r 's/.*_(.*)(\.gz)?/\1/p'
Actually, there's another bug in your regex which is that the greedy .* in the parens is going to swallow up the ".gz" if there is one. sed doesn't have a non-greedy equivalent to * as far as I know, but you can use | to work around this. | in sed (and many other regex implementations) will use the leftmost match that works, so you can do something like this:
echo 'file_1.gz'|sed -r 's/(.*_(.*)\.gz)|(.*_(.*))/\2\4/'
This tries to match with .gz, and only tries without it if that doesn't work. Only one of group 2 or 4 will actually exist (since they are on opposite sides of the same |) so we just concatenate them to get the value we want.
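A portable alternative to the optional group discussed above might use the BRE interval \{0,1\} in place of ?, and restrict the captured part to digits so greediness cannot swallow the extension. This is a sketch assuming the number is always the last thing before an optional .gz; the helper name file_number is made up.

```shell
# Hypothetical helper: \{0,1\} is the BRE way to say "optional".
file_number() {
    printf '%s\n' "$1" |
        sed -n 's/^.*_\([0-9][0-9]*\)\(\.gz\)\{0,1\}$/\1/p'
}

file_number file_1.gz    # prints 1
file_number file_1       # prints 1
file_number file_12.gz   # prints 12
```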
If you're looking for an answer to the specific example given in the question, or why it uses the ? incorrectly (regardless of syntax), see the answer by Laurence Gonsalves.
If you're looking instead for the answer to the general question of why ? doesn't exhibit its special meaning in sed as you might expect:
By default, sed uses the "POSIX basic regular expressions syntax", so in GNU sed the question mark must be escaped as \? to apply its special meaning; otherwise it matches a literal question mark. As an alternative, you can use the -r or --regexp-extended option to use the "extended regular expression syntax", which reverses the meaning of escaped and non-escaped special characters, including ?.
In the words of the GNU sed documentation (view by running 'info sed' on Linux):
The only difference between basic and extended regular expressions is in
the behavior of a few characters: '?', '+', parentheses, and braces
('{}'). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
and the option is explained:
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that `egrep' accepts;
they can be clearer because they usually have less backslashes,
but are a GNU extension and hence scripts that use them are not
portable.
Update
Newer versions of GNU sed now say this:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems.
So, if you need to preserve compatibility with ancient GNU sed, stick with -r. But if you prefer better cross-platform portability on more modern systems (e.g. Linux+Mac support), go with -E (but note that there are still some quirks and differences between GNU sed and BSD sed, so you'll have to make sure your scripts are portable in any case).
echo 'file_1.gz'|sed -n 's/.*_\(.*\)\?\(\.gz\)/\1/p'
Works. You have to put the return in the right spot, and you have to escape it.
A function that should return a number that follows the '_' in a filename, regardless of file extension:
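realname () {
# Strip any leading directory, then the last extension, then everything up to the last underscore.
local n=${1##*/}
local rn=${n%.*}
printf '%s\n' "${rn:-$n}" | sed 's/^.*_//'
}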
You should use awk which is superior to sed when it comes to field grabbing/parsing:
$ awk -F'[._]' '{print $2}' <<<"file_1"
1
$ awk -F'[._]' '{print $2}' <<<"file_1.gz"
1
Alternatively you can just use Bash's parameter expansion like so:
var=file_1.gz;
temp=${var#*_};
file=${temp%.*}
echo $file
Note: works when var=file_1 as well
Part of the solution lies in escaping the question mark or using the -r option.
sed 's/.*_\([^.]*\)\(\.\?[^.]\+\)\?$/\1/'
or
sed -r 's/.*_([^.]*)(\.?[^.]+)?$/\1/'
will work for:
file_1.gz
file_12.txt
file_123
resulting in:
1
12
123
I just realized that I could do something very easy:
echo 'file_1.gz'|sed -n 's/.*_\([0-9]*\).*/\1/p'
Notice the [0-9]* instead of a .*. @Laurence Gonsalves's answer made me realize the greediness of my previous post.
