Linux command or Bash syntax that calculates the next ASCII character

I have a Linux machine (Red Hat Linux 5.1), and I need to add the following task to my Bash script.
Which Linux command or Bash syntax will calculate the next ASCII character?
Remark: the command can also be AWK or Perl, but it must be usable from within my Bash script.
Example:
input results
a --> the next is b
c --> the next is d
A --> the next is B

Use translate (tr):
echo "aA mM yY" | tr "a-yA-Y" "b-zB-Z"
It prints:
bB nN zZ
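The ranges above leave z and Z untouched, and the question does not say what should follow them. A minimal sketch, assuming wrap-around is wanted (z becomes a, Z becomes A); the function name rotate1 is made up for illustration:

```bash
# rotate1: hypothetical helper; shifts every ASCII letter read from stdin
# forward by one position, wrapping z -> a and Z -> A.
rotate1() {
    tr 'a-zA-Z' 'b-zaB-ZA'
}

echo "aA mM zZ" | rotate1   # prints: bB nN aA
```

Characters outside a-z and A-Z (digits, spaces, punctuation) pass through unchanged.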

Perl's ++ operator also handles strings, to an extent:
perl -nle 'print ++$_'
The -l option with autochomp is necessary here, since a\n for example will otherwise return 1.

You could use chr() and ord() functions for Bash (see How do I convert an ASCII character to its decimal (or hexadecimal) value and back?):
# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value
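The function bodies these comments describe are not reproduced here. A minimal sketch of what such helpers could look like in Bash, assuming single-byte ASCII input; next_char is a hypothetical wrapper added for illustration:

```bash
# chr: print the character for a decimal code (ASCII range assumed).
# The inner printf builds an octal escape such as \142, the outer one expands it.
chr() {
    printf "\\$(printf '%03o' "$1")"
}

# ord: print the decimal code of the first character of $1.
# A leading single quote makes printf treat the argument as a character.
ord() {
    printf '%d' "'$1"
}

# next_char: hypothetical helper combining the two.
next_char() {
    chr "$(( $(ord "$1") + 1 ))"
}

next_char a; echo   # prints: b
next_char A; echo   # prints: B
```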

perl -le "print chr(ord(<>) + 1)"
Interactive:
breqwas@buttonbox:~$ perl -le "print chr(ord(<>) + 1)"
M
N
Non-interactive:
breqwas@buttonbox:~$ echo a | perl -le "print chr(ord(<>) + 1)"
b

The character value:
c="a"
To convert the character to its ASCII value:
v=$(printf %d "'$c")
The value you want to add to this ASCII value:
add=1
To change its ASCII value by adding $add to it:
((v+=add))
To convert the result to char:
perl -X -e "printf('The character is %c\n', $v);"
I used -X to disable all warnings.
You can combine all of these in one line and put the result in the variable $r:
c="a"; add=1; r=$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")
You can print the result:
echo "$r"
You can make a function to return the result:
achar ()
{
    c="$1"; add=$2
    # Print through '%s' so that a resulting "%" or "\" is not treated as a format
    printf '%s' "$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")"
}
You can use the function:
x=$(achar "a" 1) # x = the character that follows a by 1
Or you can make a loop:
array=( a k m o )
for l in "${array[@]}"
do
    echo "$l" is followed by $(achar "$l" 1)
done

Related

How can I truncate a line of text longer than a given length?

How would you go about removing everything after x number of characters? For example, cut everything after 15 characters and add ... to it.
This is an example sentence should turn into This is an exam...
GNU head can count bytes rather than lines:
head -c 15 <<<'This is an example sentence'
Note, however, that head -c only deals with bytes, so it can cut a multi-byte character in half, for example a UTF-8 encoded ü.
Bash built-in string indexing works:
str='This is an example sentence'
echo "${str:0:15}"
Output:
This is an exam
And finally something that works with ksh, dash, zsh…:
printf '%.15s\n' 'This is an example sentence'
Even programmatically:
n=15
printf '%.*s\n' $n 'This is an example sentence'
If you are using Bash, you can directly assign the output of printf to a variable and save a sub-shell call with:
trim_length=15
full_string='This is an example sentence'
printf -v trimmed_string '%.*s' $trim_length "$full_string"
Use sed:
echo 'some long string value' | sed 's/\(.\{15\}\).*/\1.../'
Output:
some long strin...
This solution has the advantage that short strings do not get the ... tail added:
echo 'short string' | sed 's/\(.\{15\}\).*/\1.../'
Output:
short string
So it's one solution for all sized outputs.
Use cut:
echo "This is an example sentence" | cut -c1-15
This is an exam
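This selects characters 1-15. The -c option is meant to count characters rather than bytes, although some implementations (GNU cut among them, according to its manual) treat -c the same as -b. Cf. cut(1):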
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters
Awk can also accomplish this:
$ echo 'some long string value' | awk '{print substr($0, 1, 15) "..."}'
some long strin...
In awk, $0 is the current line. substr($0, 1, 15) extracts characters 1 through 15 from $0. The trailing "..." appends three dots.
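As written, the awk command appends "..." even to lines that are already short enough. A minimal sketch that only adds the dots when something was actually cut off; the wrapper name truncate_line and its default of 15 are assumptions for illustration:

```bash
# truncate_line: hypothetical helper; reads lines from stdin and shortens
# those longer than $1 characters (default 15), appending "...".
truncate_line() {
    awk -v max="${1:-15}" '{
        if (length($0) > max + 0)
            print substr($0, 1, max) "..."
        else
            print
    }'
}

echo 'some long string value' | truncate_line 15   # prints: some long strin...
echo 'short string' | truncate_line 15             # prints: short string
```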
Todd actually has a good answer; however, I chose to change it up a little to make the function better and remove unnecessary parts :p
trim() {
    if (( "${#1}" > "$2" )); then
        echo "${1:0:$2}$3"
    else
        echo "$1"
    fi
}
In this version, the text appended to longer strings is given by the third argument, the maximum length by the second argument, and the text itself by the first argument.
No need for variables :)
Using Bash Shell Expansions (No External Commands)
If you don't care about shell portability, you can do this entirely within Bash using a number of different shell expansions in the printf builtin. This avoids shelling out to external commands. For example:
trim () {
    local str ellipsis_utf8
    local -i maxlen
    # use explaining variables; avoid magic numbers
    str="$*"
    maxlen="15"
    ellipsis_utf8=$'\u2026'
    # only truncate $str when longer than $maxlen
    if (( "${#str}" > "$maxlen" )); then
        printf "%s%s\n" "${str:0:$maxlen}" "${ellipsis_utf8}"
    else
        printf "%s\n" "$str"
    fi
}
trim "This is an example sentence." # This is an exam…
trim "Short sentence." # Short sentence.
trim "-n Flag-like strings." # -n Flag-like st…
trim "With interstitial -E flag." # With interstiti…
You can also loop through an entire file this way. Given a file containing the same sentences above (one per line), you can use the read builtin's default REPLY variable as follows:
while read -r; do
    trim "$REPLY"
done < example.txt
Whether or not this approach is faster or easier to read is debatable, but it's 100% Bash and executes without forks or subshells. The -r keeps read from interpreting backslashes in the input.

Convert carriage return (\r) to actual overwrite

Questions
Is there a way to convert the carriage returns to actual overwrite in a string so that 000000000000\r1010 is transformed to 101000000000?
Context
1. Initial objective:
Given a number x (between 0 and 255) in base 10, I want to convert it to base 2, add trailing zeros to get a 12-digit binary representation, generate 12 different numbers (each made of the first n digits in base 2, with n between 1 and 12), and print the base 10 representation of these 12 numbers.
2. Example:
With x = 10
Base 2 is 1010
With trailing zeros 101000000000
Extract the 12 "leading" numbers: 1, 10, 101, 1010, 10100, 101000, ...
Convert to base 10: 1, 2, 5, 10, 20, 40, ...
3. What I have done (it does not work):
x=10
x_base2="$(echo "obase=2;ibase=10;${x}" | bc)"
x_base2_padded="$(printf '%012d\r%s' 0 "${x_base2}")"
for i in {1..12}
do
t=$(echo ${x_base2_padded:0:${i}})
echo "obase=10;ibase=2;${t}" | bc
done
4. Why it does not work
Because the variable x_base2_padded contains the whole sequence 000000000000\r1010. This can be confirmed using hexdump for instance. In the for loop, when I extract the first 12 characters, I only get zeros.
5. Alternatives
I know I can find alternatives by literally adding zeros to the variable, as follows:
x_base2=1010
x_base2_padded="$(printf '%s%0.*d' "${x_base2}" $((12-${#x_base2})) 0)"
Or by padding with zeros using printf and rev
x_base2=1010
x_base2_padded="$(printf '%012s' "$(printf "${x_base2}" | rev)" | rev)"
Although these alternatives solve my problem for now and let me continue my work, they do not really answer my question.
Related issue
The same problem may be observed in different contexts. For instance if one tries to concatenate multiple strings containing carriage returns. The result may be hard to predict.
str=$'bar\rfoo'
echo "${str}"
echo "${str}${str}"
echo "${str}${str}${str}"
echo "${str}${str}${str}${str}"
echo "${str}${str}${str}${str}${str}"
The first echo will output foo. Although you might expect the other echo commands to output foofoofoo..., they all output foobar.
The following function overwrite transforms its argument such that after each carriage return \r the beginning of the string is actually overwritten:
overwrite() {
    local segment result=
    while IFS= read -rd $'\r' segment; do
        result="$segment${result:${#segment}}"
    done < <(printf '%s\r' "$@")
    printf %s "$result"
}
Example
$ overwrite $'abcdef\r0123\rxy'
xy23ef
Note that the printed string is actually xy23ef, unlike echo $'abcdef\r0123\rxy' which only seems to print the same string, but still prints \r which is then interpreted by your terminal such that the result looks the same. You can confirm this with hexdump:
$ echo $'abcdef\r0123\rxy' | hexdump -c
0000000 a b c d e f \r 0 1 2 3 \r x y \n
000000f
$ overwrite $'abcdef\r0123\rxy' | hexdump -c
0000000 x y 2 3 e f
0000006
The function overwrite also supports overwriting by arguments instead of \r-delimited segments:
$ overwrite abcdef 0123 xy
xy23ef
To convert a variable in place, use command substitution: myvar=$(overwrite "$myvar")
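The core step of overwrite is the expansion "$segment${result:${#segment}}": the newer segment followed by whatever part of the older text it does not cover. A minimal sketch of that single step, applied to the string from the question and assuming exactly one carriage return; the variable names are chosen for illustration:

```bash
s=$'000000000000\r1010'

old=${s%%$'\r'*}    # text before the carriage return: 000000000000
new=${s##*$'\r'}    # text after the carriage return:  1010

# The new text replaces the first ${#new} characters of the old text.
result="$new${old:${#new}}"

printf '%s\n' "$result"   # prints: 101000000000
```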
With awk, you'd set the field delimiter to \r and iterate through fields printing only the visible portions of them.
awk -F'\r' '{
    offset = 1
    for (i=NF; i>0; i--) {
        if (offset <= length($i)) {
            printf "%s", substr($i, offset)
            offset = length($i) + 1
        }
    }
    print ""
}'
This is admittedly too long to put inline in a command substitution, so it is better to wrap it in a function and pipe the lines to be resolved into it.
To answer the specific question, how to convert 000000000000\r1010 to 101000000000, refer to Socowi's answer.
However, I wouldn't introduce the carriage return in the first place and solve the problem like this:
#!/usr/bin/env bash
x=$1
# Start with 12 zeroes
var='000000000000'
# Convert input to binary
binary=$(bc <<< "obase = 2; $x")
# Rightpad with zeroes: ${#binary} is the number of characters in $binary,
# and ${var:x} removes the first x characters from $var
var=$binary${var:${#binary}}
# Print 12 substrings, convert to decimal: ${var:0:i} extracts the first
# i characters from $var, and $((x#$var)) interprets $var in base x
for ((i = 1; i <= ${#var}; ++i)); do
echo "$((2#${var:0:i}))"
done

How to "decdump" a string in bash?

I need to convert a string into a sequence of decimal ASCII codes using a Bash command.
example:
for the string 'abc' the desired output would be 979899 where a=97, b=98 and c=99 in ascii decimal code.
I was able to achieve this with ascii hex code using xxd.
printf '%s' 'abc' | xxd -p
which gives me the result: 616263
where a=61, b=62 and c=63 in ascii hexadecimal code.
Is there an equivalent to xxd that gives the result in ascii decimal code instead of ascii hex code?
If you don't mind the results being merged into a single line, try the following:
echo -n "abc" | xxd -p -c 1 |
while read -r line; do
echo -n "$(( 16#$line ))"
done
Result:
979899
str=abc
printf '%s' "$str" | od -An -tu1
The -An gets rid of the address line, which od normally outputs, and the -tu1 treats each input byte as unsigned integer. Note that it assumes that one character is one byte, so it won't work with Unicode, JIS or the like.
If you really don't want spaces in the result, pipe it further into tr -d ' '.
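If external tools are not wanted, the same digits can be produced with Bash's printf alone. A minimal sketch, assuming ASCII input; decdump is a hypothetical function name taken from the question's title:

```bash
# decdump: hypothetical helper; prints the decimal codes of the characters
# in $1, concatenated without a separator (ASCII input assumed).
decdump() {
    local s=$1 i code out=
    for (( i = 0; i < ${#s}; i++ )); do
        # A leading single quote makes printf output the character's code.
        printf -v code '%d' "'${s:i:1}"
        out+=$code
    done
    printf '%s\n' "$out"
}

decdump abc   # prints: 979899
```

Because the codes have different widths (for example 97 versus 100), the concatenated form cannot always be decoded unambiguously; adding a separator avoids that.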
Unicode Solution
What makes this problem annoying is that you have to pipeline characters when converting from hex to decimal. So you can't do a simple conversion from char to hex to dec as some characters hex representations are longer than others.
Both of these solutions are compatible with unicode and use a character's code point. In both solutions, a newline is chosen as separator for clarity; change this to '' for no separator.
Bash
sep='\n'
charAry=($(printf 'abc🎶' | grep -o .))
for i in "${charAry[@]}"; do
    printf "%d$sep" "'$i"
done && echo
97
98
99
127926
Python (in Bash)
Here, we use a list comprehension to convert every character to a decimal number (ord), join it as a string and print it. sys.stdin.read() allows us to use Python inline to get input from a pipe. If you replace input with your intended string, this solution is then cross-platform.
printf '%s' 'abc🎶' | python -c "
import sys
input = sys.stdin.read()
sep = '\n'
print(sep.join([str(ord(i)) for i in input]))"
97
98
99
127926
Edit: If all you care about is using hex regardless of encoding, use @user1934428's answer

Edit ASCII value of a character in bash

I am trying to update the ASCII value of each character of a string array in bash on which I want to add 2 to the existing character ASCII value.
Example:
declare -a x=("j" "a" "f" "a" "r")
I want to update the ASCII value incrementing the existing by 2 , such "j" will become "l"
I can't find anything dealing with the ASCII value beyond
printf '%d' "'$char"
Can anyone help me please?
And also when I try to copy an array into another it doesn't work
note that I am using
declare -a temp=("${x[@]}")
What is wrong with it?
You can turn an integer into a char by first using printf to turn it into an octal escape sequence (like \123) and then using that as a printf format string to produce the character:
#!/bin/bash
char="j"
printf -v num %d "'$char"
(( num += 2 ))
printf -v newchar \\$(printf '%03o' "$num")
echo "$newchar"
This only works for ASCII.
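The same two printf steps can be applied to every element of the array from the question. A minimal sketch, assuming single ASCII characters whose shifted codes stay in the printable range; shift_chars is a hypothetical helper name:

```bash
# shift_chars: hypothetical helper; adds $1 to the ASCII code of each
# remaining argument and prints the resulting characters space-separated.
shift_chars() {
    local offset=$1 ch num out=()
    shift
    for ch in "$@"; do
        printf -v num '%d' "'$ch"                               # char -> code
        printf -v ch "\\$(printf '%03o' "$(( num + offset ))")"  # code -> char
        out+=("$ch")
    done
    printf '%s\n' "${out[*]}"
}

x=("j" "a" "f" "a" "r")
temp=("${x[@]}")                      # copies the array element by element
y=($(shift_chars 2 "${x[@]}"))
echo "${y[@]}"                        # prints: l c h c t
```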
It seems tr can help you here:
y=($(echo ${x[@]} | tr a-z c-zab))
tr maps characters from one set to another. In this example, from the set of a b c ... z, it maps to c d e ... z a b. So, you're effectively "rotating" the characters. This principle is used by the ROT13 cipher.

How do I reverse escape backslash encodings like "\ " and "\303\266" in bash?

I have a script that records files with UTF8 encoded names. However, the script's encoding / environment wasn't set up right, and it just recorded the raw bytes. I now have lots of lines in the file like this:
.../My\ Folders/My\ r\303\266m/...
So there are spaces in the filenames escaped as "\ " and UTF8 encoded stuff like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash command line commands I can chain together to remove them?
I could get millions of sed commands but that'd take ages to list all the non-ASCII characters we have. Or start parsing it in python. But I'm hoping there's some trick I can do.
Here's a rough stab at the Unicode characters:
text="/My\ Folders/My\ r\303\266m/"
text="echo \$\'"$(echo "$text"|sed -e 's|\\|\\\\|g')"\'"
# the argument to the echo must not be quoted or escaped-quoted in the next step
text=$(eval "echo $(eval "$text")")
read text < <(echo "$text")
echo "$text"
This makes use of the $'string' quoting feature of Bash.
This outputs "/My Folders/My röm/".
As of Bash 4.4, it's as easy as:
text="/My Folders/My r\303\266m/"
echo "${text@E}"
This uses a new feature of Bash called parameter transformation. The E operator causes the parameter to be treated as if its contents were inside $'string' in which backslash escaped sequences, in this case octal values, are evaluated.
It is not clear exactly what kind of escaping is being used. The octal character codes are C, but C does not escape space. The space escape is used in the shell, but it does not use octal character escapes.
Something close to C-style escaping can be undone using the command printf %b $escaped. (The documentation says that octal escapes start with \0, but that does not seem to be required by GNU printf.) Another answer mentions read for unescaping shell escapes, although if space is the only one that is not handled by printf %b then handling that case with sed would probably be better.
In the end I used something like this:
cat file | sed 's/%/%%/g' | while read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'
Some of the files had % in them, which is a printf special character, so I had to double it up so that it would be escaped and passed straight through. The -r stops read from interpreting the backslashes; as a result, read doesn't turn "\ " into " ", so I needed the final sed.
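The same idea can be sketched without the pipeline, using Bash parameter expansion. unescape is a hypothetical helper name, and the sketch assumes the only escapes present are "\ " and octal sequences such as \303\266:

```bash
# unescape: hypothetical helper; undoes "\ " and octal escapes in one string.
unescape() {
    local line=$1
    line=${line//%/%%}     # double every % so printf prints it literally
    line=${line//\\ / }    # turn "\ " into a plain space
    printf "$line\n"       # printf expands \303\266 into the raw bytes
}

unescape '/My\ Folders/My\ r\303\266m/'   # prints: /My Folders/My röm/
```

Using the data as a printf format string is only reasonable here because the input is known; any other backslash sequence in the file would also be interpreted.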
Use printf to solve the issue with utf-8 text. Use read to take care of spaces (\ ).
Like this:
$ text='/My\ Folders/My\ r\303\266m/'
$ IFS='' read t < <(printf "$text")
$ echo "$t"
/My Folders/My röm/
The built-in 'read' function will handle part of the problem:
$ echo "with\ spaces" | while read r; do echo $r; done
with spaces
Pass the file (line by line) to the following perl script.
#!/usr/bin/perl
sub encode {
    $String = $_[0];
    $_ = $String;
    # Walk the line token by token: either an octal escape or a single character
    while(/(\\[0-9]+|.)/g) {
        $Match = $1;
        if ($Match =~ /\\([0-9]+)/) {
            $Code = oct(0 + $1);
            $Char = (($Code >= 32) && ($Code < 160))
                ? chr($Code)
                : sprintf("\\x{%X}", $Code);
            printf("%s", $Char);
        } else {
            print "$Match";
        }
    }
    print "\n";
}
while ($#ARGV >= 0) {
    $File = shift();
    open(my $F, "<", $File) or die "Cannot open $File: $!";
    while (my $Line = <$F>) {
        $Line =~ s/\\ / /g;
        &encode($Line);
    }
    close($F);
}
Like this:
$ ./PerlEncode.pl Test.txt
Where Test.txt contains:
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
The substitution s/\\ / /g replaces "\ " with " ", and sub encode converts the octal-escaped characters.
Hope this helps.

Resources