Bash: ${string:$i:1} what does this mean?

This is the script. It reverses a string entered by the user:
#!/bin/bash
read -p "Enter string:" string
len=${#string}
for (( i=$len-1; i>=0; i-- ))
do
# "${string:$i:1}" extracts a single character from the string.
reverse="$reverse${string:$i:1}"
done
echo "$reverse"
I don't understand the following part of the script. What is this? Looks like some kind of extended variable interpolation.
${string:$i:1}

In bash, an expression like ${string:3:1} means: take the substring that starts at position 3 (0-based, so the 4th character) and has a length of 1 character.
For example, with
string=abc
${string:0:1} expands to a and ${string:2:1} expands to c.
The script uses the value of the variable $i as the offset, so it simply takes the character at position $i.
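The explanation above can be tried out with a minimal sketch like the following. The value abcdef and the variable names first, fourth and rest are made up for illustration; only string and i come from the question.

```bash
#!/bin/bash
# Illustrative values only; "string" and "i" mirror the names used in the question.
string=abcdef
i=3

first=${string:0:1}     # offset 0, length 1
fourth=${string:$i:1}   # offset 3, length 1 (the 4th character)
rest=${string:$i}       # length omitted: everything from offset 3 onward

printf '%s %s %s\n' "$first" "$fourth" "$rest"   # prints: a d def
```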

It's substring expansion.
from the man pages:
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the
substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below). If
offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a -
must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than
zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a
number of characters, and the expansion is the characters between the two offsets. If parameter is @, the result is length positional parameters beginning
at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}.
A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces
undefined results. Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion.
Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional
parameters are used, $0 is prefixed to the list.
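A few of the cases from the man page excerpt, as a small sketch. The value abcdefgh and all variable names are made up for illustration, and the negative length assumes a reasonably recent bash (4.2 or newer, as far as I know).

```bash
#!/bin/bash
# Illustrative value; negative lengths assume bash 4.2 or newer.
s=abcdefgh

last_two=${s: -2}        # "gh": note the space between the colon and the minus sign
also_last_two=${s:(-2)}  # "gh": parentheses avoid the ambiguity as well
middle=${s:2:-2}         # "cdef": from offset 2 up to 2 characters before the end
not_substring=${s:-2}    # "abcdefgh": this is Use Default Values, and s is set

# With the positional parameters, indexing starts at 1.
set -- one two three four
pair="${*:2:2}"          # "two three"

printf '%s\n' "$last_two" "$also_last_two" "$middle" "$not_substring" "$pair"
```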

Related

What's the function of `'` in `printf "%x" "'你"`?

I want to get the hexadecimal value of 你. Someone told me to use printf "%x" "'你", but I don't understand what the ' does in printf "%x" "'你". Why put ' before 你?
From the bash manual:
Arguments to non-string format specifiers are treated as C constants, except that a leading plus or minus sign is allowed, and if the leading character is a single or double quote, the value is the ASCII value of the following character.
%x is a numeric specifier, not a string one, so this section applies. The documentation is a bit wrong (or outdated) when it speaks about ASCII values, but it's correct in spirit: an argument of '你 evaluates to the numerical value of the Unicode code point of 你 (without the quote, it would be an error, since 你 isn't a number). That code point value is then formatted in hexadecimal by %x.
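A short sketch of the leading-quote behaviour. The ASCII letter A is used for the checked values because the result for a non-ASCII character depends on the locale; the variable names dec and hex are made up for illustration.

```bash
#!/bin/bash
# A leading quote makes printf use the numeric value of the next character.
printf -v dec '%d' "'A"   # 65
printf -v hex '%x' "'A"   # 41
echo "$dec $hex"

# For a non-ASCII character the result depends on the locale. In a UTF-8
# locale this should print the code point of the character (U+4F60 here).
printf '%x\n' "'你"
```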

Select sequences in a fasta file with more than 300 aa and "C" occurs at least 4 times

I have a fasta file which contains protein sequences. I'd like to select the sequences that have more than 300 amino acids and in which the amino acid cysteine (C) appears more than 4 times.
I've used this command to select sequences with more than 300 aa:
cat 72hDOWN-fasta.fasta | bioawk -c fastx 'length($seq) > 300{ print ">"$name; print $seq }'
Some sequence example:
>jgi|Triasp1|216614|CE216613_3477
MPSLYLTSALGLLSLLPAAQAGWNPNSKDNIVVYWGQDAGSIGQNRLSYYCENAPDVDVI
NISFLVGITDLNLNLANVGNNCTAFAQDPNLLDCPQVAADIVECQQTYGKTIMMSLFGST
YTESGFSSSSTAVSAAQEIWAMFGPVQSGNSTPRPFGNAVIDGFDFDLEDPIENNMEPFA
AELRSLTSAATSKKFYLSAAPQCVYPDASDESFLQGEVAFDWLNIQFYNNGCGTSYYPSG
YNYATWDNWAKTVSANPNTKLLVGTPASVHAVNFANYFPTNDQLAGAISSSKSYDSFAGV
MLWDMAQLFGNPGYLDLIVADLGGASTPPPPASTTLSTVTRSSTASTGPTSPPPSGGSVP
QWGQCGGQGYTGPTQCQSPYTCVVESQWWSSCQ*
I do not know bioawk, but I assume it is identical to awk with some initial parsing and constant definitions.
I would proceed as follows. Assuming you want to find the sequences that contain the letter C more than 4 times and are longer than 300 characters, you could do:
bioawk -c fastx '
(length($seq) > 300) && (gsub("C","C",$seq)>4) {
print ">"$name; print $seq
}' 72hDOWN-fasta.fasta
but this assumes that seq is the full character sequence.
The idea behind it is the following. The gsub function performs substitutions in a string and returns the number of substitutions it made. Hence, if we substitute every "C" with "C", the string is left unchanged, but we get back the number of "C"s it contains.
From the POSIX standard IEEE Std 1003.1-2017:
gsub(ere, repl[, in]): Behave like sub (see below), except that it shall replace all occurrences of the regular expression (like the ed utility global substitute) in $0 or in the in argument, when specified.
sub(ere, repl[, in ]): Substitute the string repl in place of the first instance of the extended regular expression ere in string in and return the number of substitutions. An <ampersand> ( & ) appearing in the string repl shall be replaced by the string from in that matches the ERE. An <ampersand> preceded with a <backslash> shall be interpreted as the literal <ampersand> character. An occurrence of two consecutive <backslash> characters shall be interpreted as just a single literal <backslash> character. Any other occurrence of a <backslash> (for example, preceding any other character) shall be treated as a literal <backslash> character. Note that if repl is a string literal (the lexical token STRING; see Grammar), the handling of the <ampersand> character occurs after any lexical processing, including any lexical <backslash>-escape sequence processing. If in is specified and it is not an lvalue (see Expressions in awk), the behavior is undefined. If in is omitted, awk shall use the current record ($0) in its place.
Note: BioAwk is based on Brian Kernighan's awk, which is documented in "The AWK Programming Language" by Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley, 1988, ISBN 0-201-07981-X). I'm not sure if this version is compatible with POSIX.
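The gsub counting trick from this answer can be checked with plain awk, which stands in for bioawk in this sketch. The short sequence and the variable names seq and count are made up for illustration.

```bash
#!/bin/bash
# Plain awk stands in for bioawk here; the sequence is invented for illustration.
seq="MCCAKCTCQ"

# gsub replaces every C with C (a no-op) and returns the number of replacements.
count=$(printf '%s\n' "$seq" | awk '{ print gsub(/C/, "C") }')
echo "$count"   # 4
```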

Bash: How to check if the last three string characters equals '***'

I know how to do it for one character:
[ "${filename: -1}" == "*" ]
Is it possible to do it for more?
Why did you stop at -1?
The manual pages of bash give the answer:
${parameter:offset}
${parameter:offset:length}
If offset evaluates to a number less than zero, the value is used as
an offset in characters from the end of the value of parameter. If
length evaluates to a number less than zero, it is interpreted as an
offset in characters from the end of the value of parameter rather
than a number of characters, and the expansion is the characters
between offset and that result. Note that a negative offset must be
separated from the colon by at least one space to avoid being confused
with the ‘:-’ expansion.
Therefore
[ "${filename: -3}" == "***" ]
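As a minimal sketch, the test can be wrapped in a small function. The name ends_with_three_stars is hypothetical; the check itself is the negative-offset expansion shown above.

```bash
#!/bin/bash
# Hypothetical helper name: succeeds if the argument ends in three asterisks.
ends_with_three_stars() {
    # ${1: -3} is the last three characters (empty if the value is shorter).
    [ "${1: -3}" = "***" ]
}

ends_with_three_stars 'report***' && echo "ends with ***"
ends_with_three_stars 'report**x' || echo "does not end with ***"
```

Quoting "***" on the right-hand side keeps the shell from treating the asterisks as a glob pattern.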

Bash script string processing

I wrote a script that reads a plain text and a key, then loops through each character of the plain text and shifts it by the value of the corresponding character in the key, with a=0, b=1, c=2, ..., z=25.
The code works fine, but with a string of 1K characters it takes almost 3 s to execute.
This is the code:
small="abcdefghijklmnopqrstuvwxyz" ## used to search and return the position of some small letter in a string
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ" ## used to search and return the position of some capital letter in a string
read Plain_text
read Key_text
## saving the length of each string
length_1=${#Plain_text}
length_2=${#Key_text}
printf " Your Plain text is: %s\n The Key is: %s\n The resulting Cipher text is: " "$Plain_text" "$Key_text"
for(( i=0,j=0;i<$length_1;++i,j=`expr $(($j + 1)) % $length_2` )) ## variable 'i' is the index for the first string, 'j' is the index of the second string
do
## return a substring starting from position 'i' and with length 1
c=${Plain_text:$i:1}
d=${Key_text:$j:1}
## function index takes two parameters, the string to search in and a substring,
## and returns the index of the first occurrence of the substring (1-based)
x=`expr index "$small" $c`
y=`expr index "$small" $d`
## shift the current letter to the right by the value of the corresponding letter in the key, mod 26
z=`expr $(($x + $y - 2)) % 26`
##print the resulting letter from capital letter string
printf "%s" "${capital:$z:1}"
done
echo ""
How can I improve the performance of this code?
Thank you.
You are creating 4 new processes in each iteration of your for loop by using command substitution (3 substitutions in the body, 1 in the head). You should use arithmetic expansion instead of calling expr (search for $(( in the bash(1) manpage). Note that you don't need the $ to substitute variables inside $(( and )).
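One possible way to apply this advice is sketched below. It is not the asker's script: the function names index_of and encrypt and the result variables idx and cipher are hypothetical, the input is assumed to be lowercase a-z with a non-empty key, and the expr index call is replaced by a parameter expansion so that no subprocess is started inside the loop.

```bash
#!/bin/bash
# Sketch only: assumes lowercase a-z input and a non-empty key.
small="abcdefghijklmnopqrstuvwxyz"
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# index_of CHAR: set idx to the 0-based position of CHAR in $small.
index_of() {
    local prefix=${small%%"$1"*}   # everything before the first occurrence
    idx=${#prefix}
}

# encrypt PLAIN KEY: set cipher to the shifted text, using only builtins.
encrypt() {
    local plain=$1 key=$2 i j x y z
    cipher=""
    for (( i = 0, j = 0; i < ${#plain}; i++, j = (j + 1) % ${#key} )); do
        index_of "${plain:i:1}"; x=$idx
        index_of "${key:j:1}";   y=$idx
        z=$(( (x + y) % 26 ))      # no $ needed on variables inside $(( ))
        cipher+=${capital:z:1}
    done
}

encrypt "attackatdawn" "lemon"
echo "$cipher"   # LXFOPVEFRNHR
```

Because the positions are 0-based here, the "- 2" correction from the original expr-based version is no longer needed.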
You can change a character like this:
a=( soheil )
echo ${a/${a:0:1}/${a:1:1}}
To change every character, use a loop such as for.
To convert the characters to upper case:
echo soheil | tr "[:lower:]" "[:upper:]"
I hope I understood your question.
Be at peace.
A 1K string will contain a lot of repeated characters.
Imagine the input were 1M.
You should compute all input/output pairs up front, so that your routine only has to look up the replacement.
I think a solution based on arrays is the best approach here.
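A rough sketch of that lookup idea, under stated assumptions: bash 4 or newer (for associative arrays), lowercase a-z input and a non-empty key. The names table, encrypt_lookup and cipher are hypothetical. Whether this actually beats computing each shift arithmetically would have to be measured.

```bash
#!/bin/bash
# Sketch only: assumes bash 4+ (associative arrays) and lowercase a-z input.
small="abcdefghijklmnopqrstuvwxyz"
capital="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Precompute the result for every (plain letter, key letter) pair once.
declare -A table
for (( x = 0; x < 26; x++ )); do
    for (( y = 0; y < 26; y++ )); do
        z=$(( (x + y) % 26 ))
        table[${small:x:1}${small:y:1}]=${capital:z:1}
    done
done

# encrypt_lookup PLAIN KEY: set cipher using table lookups only.
encrypt_lookup() {
    local plain=$1 key=$2 i j
    cipher=""
    for (( i = 0; i < ${#plain}; i++ )); do
        j=$(( i % ${#key} ))
        cipher+=${table[${plain:i:1}${key:j:1}]}
    done
}

encrypt_lookup "attackatdawn" "lemon"
echo "$cipher"   # LXFOPVEFRNHR
```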

changing position of character in string bash

I have this string:
E="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Any idea how to swap a letter with its neighbour if the user enters "no"?
It should keep swapping positions until the user is satisfied with the string or it has reached the end of the string.
is the position of 1st correct? Y/N
N
E=BACDEFGHIJKLMNOPQRSTUVWXYZ
*some of my code here*
are u satisfied? Y/N
N
is the position of 2nd correct? Y/N
N
E=BCADEFGHIJKLMNOPQRSTUVWXYZ
*some of my code here*
are u satisfied? Y/N
N
is the position of 3rd correct? Y/N
Y
E=BCADEFGHIJKLMNOPQRSTUVWXYZ
*some of my code here*
are u satisfied? Y/N
N
is the position of 4th correct? Y/N
Y
E=BCADEFGHIJKLMNOPQRSTUVWXYZ
*some of my code here*
are u satisfied? Y/N
Y
*exit prog*
Any help will be greatly appreciated. Thanks.
Edited:
I got this code from a forum and it works perfectly. But how do I swap the next character once the first swap is done? For example, after handling the first position, I want to run it for the second character.
dual=ETAOINSHRDLCUMWFGYPBVKJXQZ
phrase='E'
rotat=1
newphrase=$(echo $phrase | tr "${dual:0:26}" "${dual:${rotat}:26}")
echo ${newphrase}
You will have to use a loop.
#!/bin/bash
E="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
echo "$E"
for (( i = 1; i < ${#E}; i++ )); do
echo "Is position $i correct? Y/N"
read answer
if [ "$answer" == "N" -o "$answer" == "n" ]
then
E="${E:0:$i-1}${E:$i:1}${E:$i-1:1}${E:$i+1}"
fi
echo "$E"
echo "Are you satisfied? Y/N"
read answer
if [ "$answer" == "Y" -o "$answer" == "y" ]
then
break
fi
done
The loop iterates over every character of the string. The string altering happens in the first if clause. It's nothing more than basic substring operations. ${E:n} returns the substring of E starting at position n. ${E:n:m} returns the next m characters of E starting at position n. The remaining lines handle the case where the user is satisfied and wants to exit.
With bash, you can extract a substring easily:
${string:position:length}
This syntax allows you to use variable expansions, so it is quite straightforward to swap two consecutive characters in a string:
E="${dual:0:$rotat}${dual:$((rotat+1)):1}${dual:$rotat:1}${dual:$((rotat+2))}"
Arithmetic can be wrapped in $((...)), although offset and length are already evaluated as arithmetic expressions.
From bash man pages:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If
length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are
arithmetic expressions (see ARITHMETIC EVALUATION below). length must evaluate to a number greater than or equal to zero. If
offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. If parameter
is @, the result is length positional parameters beginning at offset. If parameter is an array name indexed by @ or *,
the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one
greater than the maximum index of the specified array. Note that a negative offset must be separated from the colon by at
least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters
are used, in which case the indexing starts at 1.
Examples:
pos=5
E="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
echo "${E:pos:1}" # print 6th character (F)
echo "${E:pos}" # print from 6th character (F) to the end
What do you mean when you say "its neighbour"? Except for the first and last characters, every character in the string has two neighbours.
To exchange the character at position POS (counting from 1) with the next one (POS+1):
E="${E:0:POS-1}${E:POS:1}${E:POS-1:1}${E:POS+1}"
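The one-liner above can be wrapped in a function so it can be applied to one position after another. This is only a sketch: the function name swap_with_next and the result variable swapped are hypothetical, and the short input string is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: swap the characters at 1-based positions POS and POS+1.
# The result is stored in the global variable "swapped".
swap_with_next() {
    local s=$1 pos=$2
    swapped="${s:0:pos-1}${s:pos:1}${s:pos-1:1}${s:pos+1}"
}

swap_with_next "ABCDEF" 1
echo "$swapped"   # BACDEF
swap_with_next "$swapped" 2
echo "$swapped"   # BCADEF
```

Feeding the result back in with the next position reproduces the B, C, A ordering from the transcript in the question.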
