Accessing last x characters of a string in Bash

I found out that with ${string:0:3} one can access the first 3 characters of a string. Is there an equally easy method to access the last three characters?

Last three characters of string:
${string: -3}
or
${string:(-3)}
(mind the space between : and -3 in the first form).
Please refer to the Shell Parameter Expansion in the reference manual:
${parameter:offset}
${parameter:offset:length}
Expands to up to length characters of parameter starting at the character
specified by offset. If length is omitted, expands to the substring of parameter
starting at the character specified by offset. length and offset are arithmetic
expressions (see Shell Arithmetic). This is referred to as Substring Expansion.
If offset evaluates to a number less than zero, the value is used as an offset
from the end of the value of parameter. If length evaluates to a number less than
zero, and parameter is not ‘@’ and not an indexed or associative array, it is
interpreted as an offset from the end of the value of parameter rather than a
number of characters, and the expansion is the characters between the two
offsets. If parameter is ‘@’, the result is length positional parameters
beginning at offset. If parameter is an indexed array name subscripted by ‘@’ or
‘*’, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to one greater than the
maximum index of the specified array. Substring expansion applied to an
associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one
space to avoid being confused with the ‘:-’ expansion. Substring indexing is
zero-based unless the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional parameters are used,
$@ is prefixed to the list.
Since this answer gets a few regular views, let me add a possibility to address John Rix's comment; as he mentions, if your string has length less than 3, ${string: -3} expands to the empty string. If, in this case, you want the expansion of string, you may use:
${string:${#string}<3?0:-3}
This uses the ?: ternary if operator, that may be used in Shell Arithmetic; since as documented, the offset is an arithmetic expression, this is valid.
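As a rough sketch of the same idea with a variable count, the expression could be wrapped in a small function. The name last_chars and its parameter n are made up for illustration (they do not come from the manual), and the sketch assumes Bash and an n of at least 1:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last n characters of a string,
# or the whole string when it is shorter than n (assumes n >= 1).
last_chars() {
  local string=$1 n=$2
  # The offset is an arithmetic expression, so ?: and the bare name n work here.
  printf '%s\n' "${string:${#string}<n?0:-n}"
}

last_chars "1234567890" 3   # 890
last_chars "ab" 3           # ab
```

Because the offset is evaluated arithmetically, n can be written without a leading $ inside the expansion.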
Update for a POSIX-compliant solution
The previous part gives the best option when using Bash. If you want to target POSIX shells, here's an option (that doesn't use pipes or external tools like cut):
# New variable with 3 last characters removed
prefix=${string%???}
# The new string is obtained by removing the prefix from string
newstring=${string#"$prefix"}
One of the main things to observe here is the use of quoting for prefix inside the parameter expansion. This is mentioned in the POSIX ref (at the end of the section):
The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '@', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.
This is important if your string contains special characters. E.g. (in dash),
$ string="hello*ext"
$ prefix=${string%???}
$ # Without quotes (WRONG)
$ echo "${string#$prefix}"
*ext
$ # With quotes (CORRECT)
$ echo "${string#"$prefix"}"
ext
Of course, this is usable only when the number of characters is known in advance, as you have to hardcode the number of ? in the parameter expansion; but when that is the case, it's a good portable solution.
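If the count is only known at run time, one possible workaround is to build the run of ? characters in a loop first. This is only a sketch: the function name last_chars_posix and its variables are invented for illustration, and since POSIX sh has no local variables they are global here.

```shell
# Hypothetical POSIX sh helper: print the last n characters of a string.
# Prints an empty line when the string is shorter than n.
last_chars_posix() {
  string=$1
  n=$2
  pattern=
  # Build a glob of n question marks, e.g. ??? for n=3.
  while [ "$n" -gt 0 ]; do
    pattern="${pattern}?"
    n=$((n - 1))
  done
  # $pattern is deliberately unquoted so each ? matches one character;
  # "$prefix" is quoted so it is removed literally.
  prefix=${string%$pattern}
  printf '%s\n' "${string#"$prefix"}"
}

last_chars_posix 'hello*ext' 3   # ext
```

The two expansions are quoted differently on purpose: the pattern must stay a pattern, while the prefix must be matched character for character.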

You can use tail:
$ foo="1234567890"
$ echo -n $foo | tail -c 3
890
A somewhat roundabout way to get the last three characters would be to say:
echo $foo | rev | cut -c1-3 | rev

Another workaround is to use grep -o with a little regex magic to get three chars followed by the end of line:
$ foo=1234567890
$ echo $foo | grep -o ...$
890
To make it get the last 1 to 3 chars, in case of strings with fewer than 3 chars, you can use egrep (equivalent to grep -E) with this regex:
$ echo a | egrep -o '.{1,3}$'
a
$ echo ab | egrep -o '.{1,3}$'
ab
$ echo abc | egrep -o '.{1,3}$'
abc
$ echo abcd | egrep -o '.{1,3}$'
bcd
You can also use different ranges, such as 5,10 to get the last five to ten chars.

1. Generalized Substring
To generalise the question and the answer of gniourf_gniourf (as this is what I was searching for), if you want to cut a range of characters from, say, 7th from the end to 3rd from the end, you can use this syntax:
${string: -7:4}
Where 4 is the length of course (7-3).
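For a concrete check, here is a small sketch with a sample value (the variable names by_length and by_end_offset are just for illustration). The second form uses a negative length, which the manual text quoted earlier describes as an offset from the end; it assumes Bash 4.2 or newer.

```shell
#!/usr/bin/env bash
string=1234567890
# Offset -7 starts at the 7th character from the end; 4 is the length.
by_length=${string: -7:4}
# A negative length is an offset from the end (Bash 4.2 or newer).
by_end_offset=${string: -7:-3}
echo "$by_length $by_end_offset"   # 4567 4567
```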
2. Alternative using cut
In addition, while the solution of gniourf_gniourf is obviously the best and neatest, I just wanted to add an alternative solution using cut:
echo $string | cut -c $((${#string}-2))-
Here, ${#string} is the length of the string, and the trailing "-" means cut to the end.
3. Alternative using awk
This solution instead uses awk's substr function, which has the syntax substr(string, start, length) and goes to the end of the string if length is omitted. Starting at length($1)-2 thus picks up the last three characters.
echo $string | awk '{print substr($1,length($1)-2) }'

Related

Issue in sed command

I am trying to change nw_src in the following string:
cookie=0xb868a1f26498cddd, duration=5327.613s, table=0, n_packets=199, n_bytes=19502, priority=30,icmp,in_port="qvo2495b490-33",nw_src=10.0.0.133,nw_dst=8.8.8.0/24 actions=group:2
command used:
sed -i -e 's/nw_src=(^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$,)/nw_src=1.1.1.1/g' flows.txt
It does not seem to work; however, when I generate data from the regex, all the generated permutations look right.
I am just trying to replace nw_src=<any_ipv4_address> with nw_src=1.1.1.1.
Also, nw_src appears only once per line.
What am I missing?
Please help.

Let's start with the regular expression that verifies that a text fragment is a one, two or three digit substring, representing a numeric value between 0 and 255:
2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9]
For example, run the following command and inspect its output:
seq 0 999 | grep -E '^(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])$'
seq generates all integers from 0 to 999, one per line, and grep will return only the integers from 0 to 255. Note that I had to enclose the regexp in parentheses, so that the anchors ^ and $ apply to the whole thing. In general, you need to have the regexp preceded and followed by non-digits, or by some sort of zero-length assertions to make sure the string is not a substring of a longer sequence of digits.
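A quick way to see the effect of the anchors is to count the matching lines. This is a small sketch; the variable names octet, anchored and unanchored are just for illustration, and it assumes seq and grep -E are available:

```shell
octet='2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9]'

# Anchored and parenthesised: only 0..255 match.
anchored=$(seq 0 999 | grep -cE "^($octet)\$")
echo "$anchored"     # 256

# Unanchored: every line contains at least one digit, so everything matches.
unanchored=$(seq 0 999 | grep -cE "$octet")
echo "$unanchored"   # 1000
```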
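If you must replace the value of nw_src, but only when it is a valid IPv4 address (and leave it as is if it's not a valid IPv4 address), you could do this. The octet pattern is written out again for the last octet rather than referenced as \1, because a back-reference matches the same text, not the same pattern:
sed -E -i 's/\bnw_src=((2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])\.){3}(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])\b/nw_src=1.1.1.1/' flows.txt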
If it is guaranteed that the "old" value is always a valid IP4 address (so you don't need to verify that in this command), or if you need to replace the "old" value regardless of whether it is a valid address or not, the command can be simplified quite a bit; for example
... 's/\bnw_src=[^,]*/nw_src=1.1.1.1/'
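To try both approaches on a shortened copy of the sample line without touching any file, something like the following sketch could be used. The function names replace_strict and replace_loose and the variable octet are made up for illustration; the strict variant here requires a trailing comma instead of \b, and sed -E is assumed to be available.

```shell
line='priority=30,icmp,in_port="qvo2495b490-33",nw_src=10.0.0.133,nw_dst=8.8.8.0/24 actions=group:2'
octet='(2(5[0-5]|[0-4][0-9])|1[0-9]{2}|[1-9]?[0-9])'

# Strict: replace only when the value is four valid octets followed by a comma.
replace_strict() {
  printf '%s\n' "$1" | sed -E "s/nw_src=(${octet}\.){3}${octet},/nw_src=1.1.1.1,/"
}

# Loose: replace whatever follows nw_src= up to the next comma.
replace_loose() {
  printf '%s\n' "$1" | sed -E 's/nw_src=[^,]*/nw_src=1.1.1.1/'
}

replace_strict "$line"
replace_loose "$line"
replace_strict 'nw_src=999.1.1.1,x'   # not a valid address, left unchanged
```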

If all you are trying to do is replace nw_src=<any_ipv4_address> with nw_src=1.1.1.1, then the following will suffice:
sed 's/\(.*nw_src=\)\([^,]*\),\(.*\)/\11.1.1.1,\3/' network_file
Output
$ sed 's/\(.*nw_src=\)\([^,]*\),\(.*\)/\11.1.1.1,\3/' network_file
cookie=0xb868a1f26498cddd, duration=5327.613s, table=0, n_packets=199, n_bytes=19502, priority=30,icmp,in_port="qvo2495b490-33",nw_src=1.1.1.1,nw_dst=8.8.8.0/24 actions=group:2

sed is misbehaving when replacing certain regular expressions

I am trying to remove numbers - but only when they immediately follow periods. Similar replaces seem to work correctly, but not with periods.
I have tried the following which was given as a solution in another post:
echo "fr.r1.1.0" | sed s/\.[0-9][0-9]*/\./g
I get fr..... It seems that even though I escape the period it is matching arbitrary characters instead of only periods.
This expression seems to work for the previous example:
echo "fr.r1.1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
and gives me fr.r1.. but then for
echo "ge.s1_1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
I get ge.s1.. instead of ge.s1_1.

You will have to put the sed instructions between single quotes to avoid interpretation of some of the special characters by your shell:
echo "fr.r1.1.0" | sed 's/\.[0-9][0-9]*/\./g'
fr.r1..
Also, you do not need to escape the dot in the replacement part, and [0-9][0-9]* can be shortened to [0-9]\+ (a GNU sed extension), giving the simplified command:
echo "fr.r1.1.0" | sed 's/\.[0-9]\+/./g'
fr.r1..
Last but not least, as POSIX [:punct:] character class is defined as
punctuation (all graphic characters except letters and digits)
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
it will also include underscore (and a lot of other stuff), therefore, if you want to limit your matches to . followed by digits you will need to explicitly use dot (escaped or via its ascii value)
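To see what the shell does to the unquoted script, and that the quoted version handles both inputs from the question, here is a small sketch. The names unquoted_script and strip_after_dot are made up for illustration, and the first command assumes no file path happens to match the unquoted word as a glob.

```shell
# What sed receives when the script is unquoted: the shell removes the backslashes.
unquoted_script=$(printf '%s\n' s/\.[0-9][0-9]*/\./g)
echo "$unquoted_script"   # s/.[0-9][0-9]*/./g

# Quoted script: only a literal dot followed by digits is replaced.
strip_after_dot() {
  printf '%s\n' "$1" | sed 's/\.[0-9][0-9]*/./g'
}

strip_after_dot "fr.r1.1.0"   # fr.r1..
strip_after_dot "ge.s1_1.0"   # ge.s1_1.
```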

What does ${2%.*} mean?

I am reading a bash script that takes two arguments as input, but I can't figure out what
${2%.*}
does exactly. Can someone explain what the curly braces, 2, %, "." and * refer to?
Thanks

$2 is the second argument passed to the program. That is, if your script was run with
myscript foo.txt bar.jpg
$2 would have the value bar.jpg.
The % operator removes a suffix from the value that matches the following pattern. .* matches a period (.) followed by zero or more characters. Put together, your expression removes a single extension from the value. Using the above example,
$ echo ${2%.*}
bar
P.S. Perhaps worth noting that % will remove the shortest match for the following pattern: So if $2 was for example bar.jpg.xz, then ${2%.*} would be bar.jpg. (Conversely, the %% operator will remove the longest match for the pattern, so ${2%%.*} would be bar in both examples.)
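The difference between the two operators can be checked directly. In this sketch, set -- is used only to simulate the script's arguments, and the variable names shortest and longest are just for illustration:

```shell
set -- foo.txt bar.jpg.xz   # simulate: myscript foo.txt bar.jpg.xz
shortest=${2%.*}    # shortest suffix matching .* is removed (.xz)
longest=${2%%.*}    # longest suffix matching .* is removed (.jpg.xz)
echo "$shortest $longest"   # bar.jpg bar
```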

sed regex not being greedy?

In bash I have a string variable tempvar, which is created thus:
tempvar=`grep -n 'Mesh Tally' ${meshtalfile}`
meshtalfile is a (large) input file which contains some header lines and a number of blocks of data lines, each marked by a beginning line which is searched for in the grep above.
In the case at hand, the variable tempvar contains the following string:
5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34
I now wish to extract the line number relating to a particular mesh tally number - so I define a variable meshnum1 as equal to 24, and run the following sed command:
echo ${tempvar} | sed -r "s/^.*([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"
This is where things go wrong. I expect the output 1954467, but instead I get 7. Trying with number 34 instead returns 9 instead of 4354479. It seems that sed is returning only the last digit of the number - which surely violates the principle of greedy matching? And oddly, when I move the open parenthesis ( left a couple of characters to include .*, it returns the whole line up to and including the single character it was previously returning. Surely it cannot be greedy in one situation and antigreedy in another? Hopefully I have just done something stupid with the syntax...

The problem is that the .* is being greedy too, which means that it will get all numbers too. Since you force it to get at least one digit in the [0-9][0-9]* part, the .* before it will be greedy enough to leave only one digit for the expression after it.
A solution could be:
echo ${tempvar} | sed -r "s/^.*\s([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"
Where now the \s between the .* and the [0-9][0-9]* explicitly forces there to be a space before the digits you want to match.
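Both behaviours can be reproduced side by side with the string from the question. This is a sketch that assumes GNU sed (for \s); it uses -E, which GNU sed treats the same as -r, and the variable names greedy and anchored are just for illustration:

```shell
tempvar='5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34'
meshnum1=24

# Leading .* swallows all but the last digit of the line number.
greedy=$(echo "$tempvar" | sed -E "s/^.*([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/")

# The extra \s stops .* at the space before the line number.
anchored=$(echo "$tempvar" | sed -E "s/^.*\s([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/")

echo "$greedy $anchored"   # 7 1954467
```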
Hope this helps =)

Are the values in $tempvar supposed to be multiple or a single line? Because if it is a single line, ".*$" should match to the end of line, meaning all the other values too, right?

There's no need for sed, here's one way using GNU grep:
echo "$tempvar" | grep -oP "[0-9]+(?=:\sMesh\sTally\sNumber\s${meshnum1}\b)"

A Linux Shell Script Problem

I have a string separated by dot in Linux Shell,
example=This.is.My.String
I want to
1.Add some string before the last dot, for example, I want to add "Good.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2.Get the part after the last dot, so I will get
String
3.Turn the dot into underscore except the last dot, so I will get
This_is_My.String
If you have time, please explain a little bit, I am still learning Regular Expression.
Thanks a lot!

I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators. The //, I also think is self-explanatory but I'd be happy to clarify if you have any questions.
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)

Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
1. It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
2. It removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
3. It removes everything after and including the newline from the copy that's now in pattern space:
s/\n.*//
4. It changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space:
s/\./_/g;G
5. It removes the newline that the append operation adds:
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step  Pattern Space          Hold Space
1     This.is.My\n.String    This.is.My\n.String
2     This.is.My\n.String    .String
3     This.is.My             .String
4     This_is_My\n.String    .String
5     This_is_My.String      .String
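The one-liner can be tried on the question's string and on a string that already contains an underscore. This is a sketch assuming GNU sed (for \n in the replacement and in the bracket expression); the function name dots_except_last is made up for illustration:

```shell
# Hypothetical wrapper around the one-liner above.
dots_except_last() {
  printf '%s\n' "$1" |
    sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
}

dots_except_last 'This.is.My.String'       # This_is_My.String
dots_except_last 'This.is.my.string_foo'   # This_is_my.string_foo
```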

Solution
1. Two versions of this, too:
Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
2. What do you want?
Complex: sed 's/.*[.]\([^.]*\)$/\1/'
Simpler: sed 's/.*\.//' - thanks, glenn jackman.
3. sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
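Rather than counting passes by hand, the substitution for 3 can be repeated inside sed until nothing changes, using a label and the t command. This is a sketch, not part of the answer above; the function name dots_to_underscores and the label again are made up for illustration:

```shell
# Hypothetical wrapper: repeat the substitution until no more dots qualify.
dots_to_underscores() {
  printf '%s\n' "$1" |
    sed -e ':again' -e 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g' -e 't again'
}

dots_to_underscores 'This.is.My.String'   # This_is_My.String
dots_to_underscores 'a.b.c.d.e.f'         # a_b_c_d_e.f
```

The t command branches back to the label only if the previous substitution changed something, so the loop ends once only the last dot remains.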

Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($example) # * split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]" # * remove the last element
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.

1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
before: a dot, then no dot until the end. after: obvious, & is what matched the first part
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed greedy matches, so it will extend the first closure (.*) as far as possible i.e. to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')

You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
OFS="." tells awk to use the dot as the separator between fields on output,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separator to underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.
