I am reading a bash script that takes two arguments as input, but I can't figure out what
${2%.*}
does exactly. Can someone explain what the curly braces, 2, %, "." and * refer to?
Thanks
$2 is the second argument passed to the program. That is, if your script was run with
myscript foo.txt bar.jpg
$2 would have the value bar.jpg.
The % operator removes a suffix from the value that matches the following pattern. .* matches a period (.) followed by zero or more characters. Put together, your expression removes a single extension from the value. Using the above example,
$ echo ${2%.*}
bar
P.S. Perhaps worth noting that % will remove the shortest match for the following pattern: So if $2 was for example bar.jpg.xz, then ${2%.*} would be bar.jpg. (Conversely, the %% operator will remove the longest match for the pattern, so ${2%%.*} would be bar in both examples.)
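To try both operators side by side, here is a minimal sketch. The function name show_ext_stripping is made up for illustration; it simply receives the same two arguments that the script would.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gets the same positional arguments as "myscript".
show_ext_stripping() {
  echo "${2%.*}"    # shortest suffix matching ".*" removed
  echo "${2%%.*}"   # longest suffix matching ".*" removed
}

show_ext_stripping foo.txt bar.jpg      # prints "bar" twice
show_ext_stripping foo.txt bar.jpg.xz   # prints "bar.jpg", then "bar"
```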
Related
If you append ^ to a variable, Bash capitalises the first letter of its contents. (Similarly, , sends it to lowercase and doubling-up either of these applies the transformation to the whole string, rather than just the first letter.)
foo="hello world"
echo ${foo^} # Hello world
You can also do ${variable:position:length} to extract a substring:
echo ${foo:0:1} # h
So far, I haven't found a way to combine these without, obviously, creating a temporary variable. Is there a form where I can get just the capitalised first letter out of an arbitrary string?
It does not change the basic limitation you are seeing in terms of not being able to "chain" expansions, but you can assign the result of an expansion to the same variable and do away with the temporary variable.
For instance:
A=text
A="${A^}"
A="${A//x/s}"
echo "$A"
echoes "Test".
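Applied to the original question, the same two-step idea can be wrapped in a function so that the intermediate variable stays local. This is only a sketch: the name first_upper is invented here, and the ^ operator needs Bash 4 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helper; needs Bash 4 or later for the ^ operator.
first_upper() {
  local first=${1:0:1}        # step 1: take the first character
  printf '%s\n' "${first^}"   # step 2: capitalise it
}

first_upper "hello world"   # H
```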
No. Parameter expansion operators do not compose, so if you want more than one side effect, you need a temporary variable (which can include overwriting the original value as shown by @fred) or an external tool to process the result of the expansion (as shown by @anubhava).
Your other alternative is to use a different shell that does support more complicated operations, like zsh:
% foo="hello world"
% print ${(U)${foo:0:1}}
H
You can use tr with substring:
tr '[:lower:]' '[:upper:]' <<< "${foo:0:1}"
H
I've come across this curious bash expression:
somestring=4.5.6
echo ${somestring%rc*}
For all that I can tell it just prints 4.5.6. So why would anybody use it?
I found it in this script (look for pkgver), so I hope I didn't miss any context which is necessary for this to work.
Source:
${string%substring} deletes shortest match of $substring from back of
$string.
The intention is to echo the numerical version only, without the rc* suffix for strings like:
somestring=4.5.6rc1
somestring=4.5.6rc23_whatever
UPDATE:
The better choice is to echo ${somestring%%rc*}.
Otherwise, the following might happen:
somestring=4.5.6rc1_rc2
echo ${somestring%rc*}
4.5.6rc1_
whereas:
echo ${somestring%%rc*}
4.5.6
It removes rc and any following characters (the *) from somestring. See, e.g., this answer.
This is known as parameter expansion.
From Bash manual:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern just as in filename expansion. If the pattern matches a trailing portion of the expanded
value of parameter, then the result of the expansion is the value of
parameter with the shortest matching pattern (the ‘%’ case) or the
longest matching pattern (the ‘%%’ case) deleted. If parameter is ‘@’
or ‘*’, the pattern removal operation is applied to each positional
parameter in turn, and the expansion is the resultant list. If
parameter is an array variable subscripted with ‘@’ or ‘*’, the
pattern removal operation is applied to each member of the array in
turn, and the expansion is the resultant list.
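The last sentence of that excerpt is easy to overlook: with an array subscripted by @, the pattern removal is applied to every element. A small sketch, with file names invented for illustration:

```shell
#!/usr/bin/env bash
# Example data, invented for illustration.
files=(report.txt archive.tar.gz notes)

short=("${files[@]%.*}")    # shortest ".*" suffix removed from each element
long=("${files[@]%%.*}")    # longest ".*" suffix removed from each element

printf '%s\n' "${short[@]}"   # report, archive.tar, notes (one per line)
printf '%s\n' "${long[@]}"    # report, archive, notes (one per line)
```

An element with no dot, such as notes, is left unchanged because the pattern does not match it.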
Simply put, "${somestring%rc*}" is the string left after cutting rc* (where * means anything after rc) from the right-hand end of $somestring: the pattern rc* is matched at the end of the string and deleted, and the result is whatever remains.
I was reviewing some of my old code and came across this syntax:
extractDir="${downloadFileName%.*}-tmp"
The only information I found searching refers to a list of commands, but this is just one variable. What does this curly-brace syntax mean in bash?
In this context, it is a parameter substitution.
The ${variable%.*} notation means take the value of $variable, strip off the pattern .* from the tail of the value — mnemonic: percenT has a 't' at the Tail — and give the result. (By contrast, ${variable#xyz} means remove xyz from the head of the variable's value — mnemonic: a Hash has an 'h' at the Head.)
Given:
downloadFileName=abc.tar.gz
evaluating extractDir="${downloadFileName%.*}-tmp" yields the equivalent of:
extractDir="abc.tar-tmp"
The alternative notation with the double %:
extractDir="${downloadFileName%%.*}-tmp"
would yield the equivalent of:
extractDir="abc-tmp"
The %% means remove the longest possible tail; correspondingly, ## means remove the longest matching head.
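All four operators can be compared on one value. The path below is invented for illustration; the comments show what each expansion leaves behind.

```shell
#!/usr/bin/env bash
# Example path, invented for illustration.
path=/tmp/downloads/abc.tar.gz

echo "${path#*/}"    # tmp/downloads/abc.tar.gz  (shortest head matching "*/")
echo "${path##*/}"   # abc.tar.gz                (longest head matching "*/")
echo "${path%.*}"    # /tmp/downloads/abc.tar    (shortest tail matching ".*")
echo "${path%%.*}"   # /tmp/downloads/abc        (longest tail matching ".*")
```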
It indicates that parameter expansion will occur.
It is used when expanding an environment variable adjacent to some text that is not the variable, so the shell does not include all of it in the variable name.
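A short illustration of that point, using made-up variable names. Without braces the shell reads the longest valid name it can; with braces the name ends where you say it does.

```shell
#!/usr/bin/env bash
# Variable names and values invented for illustration.
file=report
file_old="a different variable"

echo "$file_old"     # a different variable  (the name is read as "file_old")
echo "${file}_old"   # report_old            (braces end the name at "file")
```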
I found out that with ${string:0:3} one can access the first 3 characters of a string. Is there an equally easy method to access the last three characters?
Last three characters of string:
${string: -3}
or
${string:(-3)}
(mind the space between : and -3 in the first form).
Please refer to the Shell Parameter Expansion in the reference manual:
${parameter:offset}
${parameter:offset:length}
Expands to up to length characters of parameter starting at the character
specified by offset. If length is omitted, expands to the substring of parameter
starting at the character specified by offset. length and offset are arithmetic
expressions (see Shell Arithmetic). This is referred to as Substring Expansion.
If offset evaluates to a number less than zero, the value is used as an offset
from the end of the value of parameter. If length evaluates to a number less than
zero, and parameter is not ‘@’ and not an indexed or associative array, it is
interpreted as an offset from the end of the value of parameter rather than a
number of characters, and the expansion is the characters between the two
offsets. If parameter is ‘@’, the result is length positional parameters
beginning at offset. If parameter is an indexed array name subscripted by ‘@’ or
‘*’, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to one greater than the
maximum index of the specified array. Substring expansion applied to an
associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one
space to avoid being confused with the ‘:-’ expansion. Substring indexing is
zero-based unless the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional parameters are used,
$# is prefixed to the list.
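The note about the space matters in practice, because leaving it out silently selects a different expansion. A quick sketch with an example value:

```shell
#!/usr/bin/env bash
string=abcdef

echo "${string: -3}"    # def     (substring: last three characters)
echo "${string:(-3)}"   # def     (same, using parentheses instead of a space)
echo "${string:-3}"     # abcdef  (":-" default-value form; string is set, so its value is used)

unset string
echo "${string:-3}"     # 3       (string is unset, so the default "3" is used)
```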
Since this answer gets a few regular views, let me add a possibility to address John Rix's comment; as he mentions, if your string has length less than 3, ${string: -3} expands to the empty string. If, in this case, you want the expansion of string, you may use:
${string:${#string}<3?0:-3}
This uses the ?: ternary if operator, that may be used in Shell Arithmetic; since as documented, the offset is an arithmetic expression, this is valid.
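If the ternary reads as too dense, the same guard can be spelled out with an explicit test. The function below is only a sketch with an invented name (last_n); it takes the count as a second argument rather than hardcoding 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last n characters of a string,
# or the whole string when it is shorter than n.
last_n() {
  local s=$1 n=$2
  if (( ${#s} < n )); then
    printf '%s\n' "$s"
  else
    # The offset is an arithmetic expression: length minus n.
    printf '%s\n' "${s:${#s}-n}"
  fi
}

last_n abcdef 3   # def
last_n ab 3       # ab
```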
Update for a POSIX-compliant solution
The previous part gives the best option when using Bash. If you want to target POSIX shells, here's an option (that doesn't use pipes or external tools like cut):
# New variable with 3 last characters removed
prefix=${string%???}
# The new string is obtained by removing the prefix from string
newstring=${string#"$prefix"}
One of the main things to observe here is the use of quoting for prefix inside the parameter expansion. This is mentioned in the POSIX ref (at the end of the section):
The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '@', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.
This is important if your string contains special characters. E.g. (in dash),
$ string="hello*ext"
$ prefix=${string%???}
$ # Without quotes (WRONG)
$ echo "${string#$prefix}"
*ext
$ # With quotes (CORRECT)
$ echo "${string#"$prefix"}"
ext
Of course, this is usable only when the number of characters is known in advance, as you have to hardcode the number of ? in the parameter expansion; but when that is the case, it's a good portable solution.
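If the count is only known at run time, one possible workaround is to build the run of ? characters in a loop. This is a sketch rather than a tested portable library function: the name last_chars and the lc_ variable prefix are invented here, and a string shorter than the requested count yields an empty line.

```shell
#!/bin/sh
# Hypothetical helper for POSIX sh: print the last n characters of a string.
# Plain sh has no "local", so the variable names are prefixed to limit clashes.
last_chars() {
  lc_str=$1
  lc_n=$2
  lc_pat=
  while [ "$lc_n" -gt 0 ]; do
    lc_pat="$lc_pat?"          # one "?" per character to keep
    lc_n=$((lc_n - 1))
  done
  lc_prefix=${lc_str%$lc_pat}  # unquoted: the "?"s must act as a pattern
  printf '%s\n' "${lc_str#"$lc_prefix"}"   # quoted: the prefix is literal text
}

last_chars 'hello*ext' 3   # ext
```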
You can use tail:
$ foo="1234567890"
$ echo -n $foo | tail -c 3
890
A somewhat roundabout way to get the last three characters would be to say:
echo $foo | rev | cut -c1-3 | rev
Another workaround is to use grep -o with a little regex magic to get three chars followed by the end of line:
$ foo=1234567890
$ echo $foo | grep -o '...$'
890
To make it optionally get the last 1 to 3 chars, in case of strings with fewer than 3 chars, you can use grep -E (egrep is a deprecated alias for it) with this regex:
$ echo a | grep -E -o '.{1,3}$'
a
$ echo ab | grep -E -o '.{1,3}$'
ab
$ echo abc | grep -E -o '.{1,3}$'
abc
$ echo abcd | grep -E -o '.{1,3}$'
bcd
bcd
You can also use different ranges, such as 5,10 to get the last five to ten chars.
1. Generalized Substring
To generalise the question and the answer of gniourf_gniourf (as this is what I was searching for), if you want to cut a range of characters starting at, say, the 7th from the end and stopping just before the 3rd from the end, you can use this syntax:
${string: -7:4}
Where 4 is the length, of course (7-3).
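With a concrete value this looks as follows. The second form uses a negative length, which Bash (4.2 or later, as far as I know) treats as an offset from the end instead of a character count, so the subtraction does not have to be done by hand.

```shell
#!/usr/bin/env bash
string=1234567890

echo "${string: -7:4}"    # 4567  (start 7 from the end, take 4 characters)
echo "${string: -7:-3}"   # 4567  (start 7 from the end, stop 3 before the end)
```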
2. Alternative using cut
In addition, while the solution of gniourf_gniourf is obviously the best and neatest, I just wanted to add an alternative solution using cut:
echo $string | cut -c $((${#string}-2))-
Here, ${#string} is the length of the string, and the trailing "-" means cut to the end.
3. Alternative using awk
This solution instead uses awk's substr function, which has the syntax substr(string, start, length) and runs to the end of the string if length is omitted. Starting at length($1)-2 thus picks up the last three characters.
echo $string | awk '{print substr($1,length($1)-2) }'
I have a string separated by dot in Linux Shell,
example=This.is.My.String
I want to
1.Add some string before the last dot, for example, I want to add "Good.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2.Get the part after the last dot, so I will get
String
3.Turn the dot into underscore except the last dot, so I will get
This_is_My.String
If you have time, please explain a little bit, I am still learning Regular Expression.
Thanks a lot!
I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators. The //, I also think is self-explanatory but I'd be happy to clarify if you have any questions.
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)
Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
removes everything after and including the newline from the copy that's now in pattern space
s/\n.*//
changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space
s/\./_/g;G
removes the newline that the append operation adds
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step  Pattern Space          Hold Space
1     This.is.My\n.String    This.is.My\n.String
2     This.is.My\n.String    .String
3     This.is.My             .String
4     This_is_My\n.String    .String
5     This_is_My.String      .String
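To check the end result against the last row, the whole script can be run on the example string. This sketch assumes GNU sed, since it relies on \n being understood both in the replacement text and inside the bracket expression.

```shell
#!/usr/bin/env bash
# Assumes GNU sed: "\n" in the replacement and inside [^\n] are GNU extensions.
example=This.is.My.String

result=$(printf '%s\n' "$example" |
  sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//')

echo "$result"   # This_is_My.String
```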
Solution
1. Two versions of this, too:
   Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
   Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
2. What do you want?
   Complex: sed 's/.*[.]\([^.]*\)$/\1/'
   Simpler: sed 's/.*\.//' - thanks, glenn jackman.
3. sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
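Rather than counting passes by hand, sed's t command can repeat a single (non-global) substitution until it stops matching. This is a sketch of that variation, not part of the solution above; the function name dots_to_underscores and the label again are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace one qualifying dot per iteration and loop
# (via "t") until the substitution no longer matches.
dots_to_underscores() {
  printf '%s\n' "$1" |
    sed -e ':again' \
        -e 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/' \
        -e 't again'
}

dots_to_underscores This.is.My.String   # This_is_My.String
```

The loop ends once only one dot is left, because the pattern needs two dots to match.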
Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($example) # * split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]" # * drop the last element (index = element count - 1)
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.
1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
before: a dot, then no dot until the end. after: obvious, & is what matched the first part
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed greedy matches, so it will extend the first closure (.*) as far as possible i.e. to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')
You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
OFS="." tells awk to use the dot as the separator between fields in the output,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separator to underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.