I need to convert a string into a sequence of decimal ASCII codes using a Bash command.
Example:
For the string 'abc', the desired output would be 979899, where a=97, b=98 and c=99 in decimal ASCII.
I was able to achieve this with hexadecimal ASCII codes using xxd:
printf '%s' 'abc' | xxd -p
which gives me the result: 616263
where a=61, b=62 and c=63 in ascii hexadecimal code.
Is there an equivalent to xxd that gives the result in ascii decimal code instead of ascii hex code?
If you don't mind the results being merged into a single line, try the following:
echo -n "abc" | xxd -p -c 1 |
while read -r line; do
echo -n "$(( 16#$line ))"
done
Result:
979899
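The hex-to-decimal step can also be sketched in Python, which another answer in this thread uses as well. This is only an illustration, assuming the input is a plain hex dump as produced by xxd -p; the function name hex_to_decimal is made up for this example.

```python
def hex_to_decimal(hex_dump):
    """Turn an xxd -p style hex dump into concatenated decimal byte values."""
    # bytes.fromhex parses two hex digits per byte
    raw = bytes.fromhex(hex_dump.strip())
    # Iterating over a bytes object yields integers in the range 0..255
    return "".join(str(byte) for byte in raw)


print(hex_to_decimal("616263"))  # 979899
```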
str=abc
printf '%s' "$str" | od -An -tu1
The -An gets rid of the address column, which od normally outputs, and the -tu1 treats each input byte as an unsigned integer. Note that it assumes that one character is one byte, so it won't work with Unicode, JIS or the like.
If you really don't want spaces in the result, pipe it further into tr -d ' '.
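To make the one-character-is-one-byte caveat concrete, here is a small Python sketch that contrasts byte values (what od -tu1 reports) with code points. The helper names byte_values and code_points are hypothetical, and UTF-8 is assumed as the encoding.

```python
def byte_values(text, encoding="utf-8"):
    """Decimal value of every encoded byte, similar to what od -An -tu1 prints."""
    return list(text.encode(encoding))


def code_points(text):
    """Decimal Unicode code point of every character."""
    return [ord(ch) for ch in text]


print(byte_values("abc"))  # [97, 98, 99]
print(byte_values("é"))    # [195, 169]  (two bytes for one character)
print(code_points("é"))    # [233]
```

For pure ASCII input the two results are identical; they diverge as soon as a character needs more than one byte.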
Unicode Solution
What makes this problem annoying is that you have to pipeline characters when converting from hex to decimal. You can't do a simple conversion from char to hex to dec, because some characters' hex representations are longer than others.
Both of these solutions are compatible with unicode and use a character's code point. In both solutions, a newline is chosen as separator for clarity; change this to '' for no separator.
Bash
sep='\n'
charAry=($(printf 'abc🎶' | grep -o .))
for i in "${charAry[@]}"; do
printf "%d$sep" "'$i"
done && echo
97
98
99
127926
Python (in Bash)
Here, we use a list comprehension to convert every character to a decimal number (ord), join it as a string and print it. sys.stdin.read() allows us to use Python inline to get input from a pipe. If you replace input with your intended string, this solution is then cross-platform.
printf '%s' 'abc🎶' | python3 -c "
import sys
input = sys.stdin.read()
sep = '\n'
print(sep.join([str(ord(i)) for i in input]))"
97
98
99
127926
Edit: If all you care about is using hex regardless of encoding, use @user1934428's answer
Related
I'm trying to generate a password in Bash that matches macOS password requirements, one of which is that it can't have repeated characters (aa, bb, 44, 00, etc.).
I know I can use openssl rand -base64 or /dev/urandom and use tr -d to manipulate the output string. I use grep -E '(.)\1{1,}' to search for repeated characters, but if I use this regex to delete them (tr -d '(.)\1{1,}'), it deletes the entire string. I even tried tr -s '(.)\1{1,}' to squeeze the characters to just one occurrence, but it keeps generating repeated characters in some attempts. Is it possible to achieve what I'm trying to do?
P.S.: This is a situation where I can't download any "password generator tool" like pwgen. It must be "native".
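Sorry, I have no Bash at hand to test this, but I'll try to help.
What about iteratively collecting only the characters that have not been seen yet, e.g.:
# openssl rand needs a byte count; 32 is an arbitrary choice
chars=$(openssl rand -base64 32)
pwd=
for (( i=0; i<${#chars}; i++ )); do
  # append the character only if it is not already part of the password
  if [[ "$pwd" != *"${chars:$i:1}"* ]]; then
    pwd=$pwd${chars:$i:1}
  fi
done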
The issue might be that you have non-printable characters, so it's not actually repeated. If you first get the first e.g. 30 characters, then delete any non-alphanumeric, non punctuation characters, then squeeze any of those characters, then from whatever is left get the first 20 characters, it seems to work:
cat /dev/urandom | tr -dc '[:alnum:][:punct:]' | fold -w ${1:-30} | head -n 1 | tr -s '[:alnum:][:punct:]' | cut -c-20
Output e.g.:
]'Zc,fs6m;wUo%wLIG%K
2O3Ff4dzi30~L.RH8jR0
sU?,WkK]&I;z'|eTSLjY
5gK]\H51i#Rtux.{bdC=
:g"\?5JsjBd1r])2^WR+
;{cR:jY\rIc&Q(2yo:|-
fFykmxvZ|ATX_l6L(8h:
^Sd*,V%9}bWnTYNv"w?'
6foMgbU6:n<*cWj2W=3&
*v39FWmB#LwE5O`a3C36
Is there a specific size requirement? Other required characters?
How about -
openssl rand -base64 20 | sed -E 's/(.)\1+/\1/g'
You're getting close. tr doesn't use regex (but does use POSIX character classes).
Either of these will squeeze repeats:
tr -cs '\0'
tr -s '[:graph:][:space:]'
They differ only in how we refer to "all characters". First is "complement of null" second is all printable and all white space characters. There may be a neater way to specify "all characters".
Or using sed:
sed -E 's/(.)\1+/\1/g'
This both squeezes printable characters, and removes white space:
tr -ds '[:space:]' '[:graph:]'
Example for 32 non whitespace characters, with no repeats:
tr -ds '[:space:]' '[:graph:]' < /dev/urandom |
dd bs=32 count=1
Also, this example specifies a list of allowed characters (letters, digits, and _.), then squeezes any repeats:
tr -dc '[:alnum:]_.' < /dev/urandom |
tr -sc '\0' |
dd bs=32 count=1
Example output:
9mCEqrhHPwmq7.1qEky6qn4jqzDpRK7b
Putting dd at the end means we get 32 characters after removing repeats. You may also want to add status=none to hide dd logging on stderr.
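For comparison, the squeezing that tr -s and the sed expression perform can be sketched in Python with itertools.groupby. This is only an illustration of the technique, not a replacement for the shell pipeline; the function name squeeze is made up for the example.

```python
from itertools import groupby


def squeeze(text):
    """Collapse every run of identical consecutive characters into one."""
    # groupby yields one (key, run) pair per run of equal neighbours
    return "".join(key for key, _run in groupby(text))


print(squeeze("aabbbc44x"))  # abc4x
```

As with the shell versions, the result can be shorter than the input, so any length cut has to happen after squeezing.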
It's not clear whether you want to avoid consecutive repeated characters or repeated characters altogether (in either case I don't think it's a good idea, as it makes your passwords weaker and easier to guess). Having said that:
#! /bin/bash
awk -vN=20 '
{
n=split($0,ch,"");
for (i=1; i<n; i++) {
a[ch[i]]++
}
n=0;
for (c in a) {
if (++n > N) {
break;
}
printf("%c",c)
}
printf("\n")
}
' < <(openssl rand -base64 32)
This generates passwords of length N without repeated characters from 32 random bytes (so N should be much smaller than 32).
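If Python happens to be available on the machine, the "no repeated characters at all" variant can be sketched with the standard library alone. Note that this swaps in a different technique (sampling without replacement) from the awk script above. The function name, the default alphabet and the default length are assumptions made for the example, and the caveat about weaker passwords still applies.

```python
import secrets
import string


def unique_password(length=20, alphabet=string.ascii_letters + string.digits):
    """Pick `length` distinct characters from `alphabet` using the OS CSPRNG."""
    if length > len(alphabet):
        raise ValueError("length cannot exceed the number of distinct characters")
    # sample() draws without replacement, so no character can occur twice
    return "".join(secrets.SystemRandom().sample(alphabet, length))


print(unique_password())
```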
I have a file with one Chinese word per line, like this:
王大明
新型传染病
電子雷射
I want to append the number of Chinese characters to the end of each line:
王大明 3
新型传染病 5
電子雷射 4
How can I do this?
I know the commands sed and wc, but I could not get this to work. I tried many things, but clearly I need help here.
sed -i s/$/{length $0}/ myfile
sed -i s/$/{wc -m}/ myfile
awk '{$2=system(awk 'length') OFS $2} 1' myfile
What exactly will work will depend entirely on what exactly your input looks like. If you are dealing with Unicode glyphs, use a Unicode-aware tool such as e.g. Python.
bash$ cat uniline
#!/usr/bin/env python3
import sys
for line in sys.stdin:
line = line.rstrip('\n')
print(line, len(line))
bash$ chmod +x uniline
bash$ uniline <<\:
> 王大明
> 新型传染病
> 電子雷射
> :
王大明 3
新型传染病 5
電子雷射 4
(I had to trim some whitespace from the ends of the lines in the example you posted.)
For the record, my system encoding is UTF-8, meaning the first line's representation as bytes is
bash$ echo '王大明' | xxd
00000000: e78e 8be5 a4a7 e698 8e0a ..........
Perhaps see also Problematic questions about decoding errors for some relevant background.
If you are lucky, even Awk and wc might be locale-aware on your platform. Your sed attempts really have no chance of working (though if you have GNU sed you could try with the /e option; but really, probably don't). If you have GNU Awk and the en_US.UTF-8 locale defined, this works, too:
bash$ echo $'\xe7\x8e\x8b\xe5\xa4\xa7\xe6\x98\x8e' |
> LC_ALL=en_US.UTF-8 awk '{ print $0, length }'
王大明 3
If you're VERY certain the only multi-byte characters in there are Chinese, then do
gawk/mawk/mawk2 '{ print $0, \
\
gsub(/\342|\343|\344|\345|\346|\347|\350|\351|\357|\360/, "&") }'
This list of leading bytes should correctly account for the 3- and 4-byte code points of Chinese characters, both simplified and traditional, plus all special compatibility variants.
Run that in either byte mode or Unicode mode and it will give you the same result. Your locale settings do NOT matter here (as long as your input is already UTF-8-compliant text).
If you're definitely in byte-mode or LC_ALL=C, then
awk '{ print $0, gsub(/[\342-\351\357\360]/,"&") }'
One of the less-mentioned but excellent use cases for gsub() is counting occurrences without having to do split() or substr().
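The lead-byte counting idea can be illustrated in Python as well. The sketch below uses the same byte list as the awk command above (octal 342-351, 357 and 360, i.e. 0xE2-0xE9, 0xEF and 0xF0) and assumes the input is UTF-8; the names LEAD_BYTES and count_lead_bytes are made up for the example.

```python
# Same leading bytes as in the awk character class, written in hex
LEAD_BYTES = set(range(0xE2, 0xEA)) | {0xEF, 0xF0}


def count_lead_bytes(line):
    """Count the UTF-8 bytes in `line` that start one of the listed sequences."""
    # Continuation bytes are always 0x80..0xBF, so they can never be counted
    return sum(1 for byte in line.encode("utf-8") if byte in LEAD_BYTES)


for word in ["王大明", "新型传染病", "電子雷射"]:
    print(word, count_lead_bytes(word))
```

Like the awk version, this counts any character whose encoding starts with one of these bytes, so it is only exact when the multi-byte characters in the input are all Chinese.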
If you're REALLY pedantic about exactness, the hideous regex I use myself is
function isChinese(str6) { return (str6 ~
/\344|\345|\346|\347|\350|\351|
(\343|\360|\357)(\244|\245|\246|\247|
\250|\251|\252|\253)|(\357\271|
\343(\204|\207))(\200|\201|\202|\203|\204|
\205|\206|\207|\210|
\211|\212|\213|\214|\215|\216|\217)|(\343\206|
\357\270)(\260|\261|\262|\263|\264|\265|\266|\267|
\270|\271|\272|\273|\274|\275|\276|\277)|
(\343|\360)(\240|\241|\242|\243|\254|\255|\256|\257|\260|
\261)|\342(\272|\273|\274|\275|\276|\277(\200|
\210|\211|\212|\213|\214|\215|\216|\217))|
(\342\277|\343(\204|\206|\207))(\220|\221|\222|
\223|\224|\225|\226|\227|\230|\231|\232|\233|
\234|\235|\236|\237)|\343(\200|\210|\211|\212|
\213|\214|\215|\216|\217|\220|\221|\222|\223|
\224|\225|\226|\227|\230|\231|\232|\233|\234|
\235|\236|\237|\262|\263|\264|\265|\266|(\204|
\206|\207)(\240|\241|\242|\243|\244|
\245|\246|\247|\250|\251|\252|\253|\254|\255|\256|\257))/) };
How would you go about removing everything after x number of characters? For example, cut everything after 15 characters and add ... to it.
This is an example sentence should turn into This is an exam...
GNU head can take a byte count instead of a line count:
head -c 15 <<<'This is an example sentence'
Note, however, that head -c only deals with bytes, so it is incompatible with multi-byte characters such as a UTF-8-encoded ü.
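The byte-versus-character problem is easy to reproduce. The following Python sketch, assuming UTF-8 input, cuts a string after a number of bytes (as head -c does) and after a number of characters; both helper names are hypothetical.

```python
def truncate_bytes(text, n):
    """Cut after n bytes, the way head -c n does, then decode what is left."""
    cut = text.encode("utf-8")[:n]
    # An incomplete multi-byte sequence at the end becomes U+FFFD
    return cut.decode("utf-8", errors="replace")


def truncate_chars(text, n):
    """Cut after n characters."""
    return text[:n]


print(truncate_bytes("Grüße", 3))  # 'Gr' followed by a replacement character
print(truncate_chars("Grüße", 3))  # Grü
```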
Bash built-in string indexing works:
str='This is an example sentence'
echo "${str:0:15}"
Output:
This is an exam
And finally something that works with ksh, dash, zsh…:
printf '%.15s\n' 'This is an example sentence'
Even programmatically:
n=15
printf '%.*s\n' $n 'This is an example sentence'
If you are using Bash, you can directly assign the output of printf to a variable and save a sub-shell call with:
trim_length=15
full_string='This is an example sentence'
printf -v trimmed_string '%.*s' $trim_length "$full_string"
Use sed:
echo 'some long string value' | sed 's/\(.\{15\}\).*/\1.../'
Output:
some long strin...
This solution has the advantage that short strings do not get the ... tail added:
echo 'short string' | sed 's/\(.\{15\}\).*/\1.../'
Output:
short string
So it's one solution for all sized outputs.
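The same truncate-only-when-too-long logic, sketched in Python for comparison. The function name and defaults are made up for the example. One deliberate difference: the sed expression also appends the tail to a string of exactly 15 characters, whereas this sketch leaves such a string unchanged.

```python
def truncate(text, max_len=15, tail="..."):
    """Shorten text to max_len characters and append tail only if it was cut."""
    if len(text) <= max_len:
        return text
    return text[:max_len] + tail


print(truncate("This is an example sentence"))  # This is an exam...
print(truncate("short string"))                 # short string
```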
Use cut:
echo "This is an example sentence" | cut -c1-15
This is an exam
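This selects characters 1-15 (-c is meant to handle multi-byte characters, unlike -b); cf. cut(1). Note that upstream GNU cut has historically treated -c the same as -b, so test with multi-byte input before relying on it: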
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters
Awk can also accomplish this:
$ echo 'some long string value' | awk '{print substr($0, 1, 15) "..."}'
some long strin...
In awk, $0 is the current line. substr($0, 1, 15) extracts characters 1 through 15 from $0. The trailing "..." appends three dots.
Todd actually has a good answer; however, I chose to change it up a little to make the function better and remove unnecessary parts :p
trim() {
if (( "${#1}" > "$2" )); then
echo "${1:0:$2}$3"
else
echo "$1"
fi
}
In this version, the first argument is the text itself, the second is the maximum length, and the third is the text appended to longer strings.
No need for variables :)
Using Bash Shell Expansions (No External Commands)
If you don't care about shell portability, you can do this entirely within Bash using a number of different shell expansions in the printf builtin. This avoids shelling out to external commands. For example:
trim () {
local str ellipsis_utf8
local -i maxlen
# use explaining variables; avoid magic numbers
str="$*"
maxlen="15"
ellipsis_utf8=$'\u2026'
# only truncate $str when longer than $maxlen
if (( "${#str}" > "$maxlen" )); then
printf "%s%s\n" "${str:0:$maxlen}" "${ellipsis_utf8}"
else
printf "%s\n" "$str"
fi
}
trim "This is an example sentence." # This is an exam…
trim "Short sentence." # Short sentence.
trim "-n Flag-like strings." # -n Flag-like st…
trim "With interstitial -E flag." # With interstiti…
You can also loop through an entire file this way. Given a file containing the same sentences above (one per line), you can use the read builtin's default REPLY variable as follows:
while IFS= read -r; do
trim "$REPLY"
done < example.txt
Whether or not this approach is faster or easier to read is debatable, but it's 100% Bash and executes without forks or subshells.
I have a 2GB file in raw format. I want to search for all occurrences of a specific hex value, "355A3C2F74696D653E", and collect the following 28 characters.
Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135
In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15
I have tried with
xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt
and
xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt
But I can't get it working.
Can anybody give me a hint?
As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:
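#!/usr/bin/env python3
import mmap

# Byte pattern to search for: the ASCII text "5Z</time>"
needle = b"\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"

with open("file_to_search", "rb") as fd:
    # Map the file read-only so the 2GB input is not read into memory at once
    haystack = mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ)
    i = haystack.find(needle)
    while i >= 0:
        i += len(needle)
        # Print the 28 bytes that follow the match, decoded as text
        print(haystack[i : i + 28].decode("ascii", errors="replace"))
        i = haystack.find(needle, i)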
If your grep supports -P parameter then you could simply use the below command.
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A
For 56 chars,
$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135
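The asker would prefer readable text over hex. As a sketch of how the captured hex digits could be turned back into characters, here is a Python version of the same match-then-capture idea. It assumes the hex dump is one uppercase string without line breaks; the names HEX_MARKER and following_text are made up for the example.

```python
import re

HEX_MARKER = "355A3C2F74696D653E"  # hex for the ASCII text 5Z</time>


def following_text(hex_text, n_bytes=28):
    """Return the n_bytes bytes after every marker, decoded as ASCII."""
    # Each byte is two hex digits, so capture n_bytes pairs after the marker
    pattern = re.compile(HEX_MARKER + "((?:[0-9A-F]{2}){%d})" % n_bytes)
    return [bytes.fromhex(m).decode("ascii") for m in pattern.findall(hex_text)]


sample = ("355A3C2F74696D653E"
          "323031312D30342D32365431343A34373A30322D31343A34373A3135")
print(following_text(sample))  # ['2011-04-26T14:47:02-14:47:15']
```

Like the grep command, this searches the hex text itself, so it does not check that a match starts on a byte boundary.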
Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.
Adapted from this post: Grep characters before and after match?
I added some blank lines for readability.
VirtualBox:~$ cat data.dat
Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata
VirtualBox:~$ cat test.sh
awk '/5Z<\/time>/ {
# Only "/" needs escaping here; in gawk, \< and \> are word-boundary operators
match($0, /5Z<\/time>/); print substr($0, RSTART + RLENGTH, 28);
}' data.dat
VirtualBox:~$ ./test.sh
2011-04-26T14:47:02-14:47:15
VirtualBox:~$
EDIT: I just realized something. The regular expression may need to be tweaked (to be non-greedy, etc.), and beyond that the awk script needs to be adjusted to handle multiple occurrences if you need them. Perhaps some of the folks more up on awk can chime in with improvements, as I am really rusty. An approach to consider anyway.
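One way to handle multiple occurrences without relying on line structure is to search the raw bytes with a regular expression. The Python sketch below does this under the assumption that the data fits in memory (for a 2GB file a memory-mapped approach may be preferable); the function name and parameters are hypothetical.

```python
import re


def text_after_marker(data, marker=b"5Z</time>", length=28):
    """Return the `length` bytes that follow every occurrence of `marker`."""
    # re.DOTALL lets "." match newline bytes too, since the input is binary
    pattern = re.compile(re.escape(marker) + b"(.{%d})" % length, re.DOTALL)
    return [m.decode("ascii", errors="replace") for m in pattern.findall(data)]


data = (b"junk5Z</time>2011-04-26T14:47:02-14:47:15more"
        b"junk5Z</time>2012-05-27T15:48:03-15:48:16tail")
for hit in text_after_marker(data):
    print(hit)
```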
I have a Linux machine (Red Hat Linux 5.1), and I need to add the following task to my Bash script.
Which Linux command or Bash syntax will calculate the next ASCII character?
Remark – the command syntax can be also AWK/Perl, but this syntax must be in my Bash script.
Example:
input results
a --> the next is b
c --> the next is d
A --> the next is B
Use translate (tr):
echo "aA mM yY" | tr "a-yA-Y" "b-zB-Z"
It prints:
bB nN zZ
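The tr mapping above corresponds to a translation table. A rough Python equivalent, given only as an illustration, uses str.maketrans; the function name shift_letters is made up for the example. As in the tr command, z and Z are not mapped and stay unchanged.

```python
import string


def shift_letters(text):
    """Map a-y to b-z and A-Y to B-Z, leaving everything else untouched."""
    source = string.ascii_lowercase[:-1] + string.ascii_uppercase[:-1]
    target = string.ascii_lowercase[1:] + string.ascii_uppercase[1:]
    return text.translate(str.maketrans(source, target))


print(shift_letters("aA mM yY"))  # bB nN zZ
```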
Perl's ++ operator also handles strings, to an extent:
perl -nle 'print ++$_'
The -l option with autochomp is necessary here, since a\n for example will otherwise return 1.
You could use chr() and ord() functions for Bash (see How do I convert an ASCII character to its decimal (or hexadecimal) value and back?):
# POSIX
# chr() - converts decimal value to its ASCII character representation
# ord() - converts ASCII character to its decimal value
perl -le "print chr(ord(<>) + 1)"
Interactive:
breqwas@buttonbox:~$ perl -le "print chr(ord(<>) + 1)"
M
N
Non-interactive:
breqwas@buttonbox:~$ echo a | perl -le "print chr(ord(<>) + 1)"
b
The character value:
c="a"
To convert the character to its ASCII value:
v=$(printf %d "'$c")
The value you want to add to this ASCII value:
add=1
To change its ASCII value by adding $add to it:
((v+=add))
To convert the result to char:
perl -X -e 'printf("The character is %c\n", $ARGV[0]);' "$v"
I used -X to disable all warnings
You can combine all of these in one line and put the result in the variable $r:
c="a"; add=1; r=$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")
you can print the result:
echo "$r"
You can make a function to return the result:
achar ()
{
c="$1"; add=$2
printf '%s' "$(perl -X -e "printf('%c', $(($add+$(printf %d "'$c"))));")"
}
you can use the function:
x=$(achar "a" 1) # x = the character that follows a by 1
or you can make a loop:
array=( a k m o )
for l in "${array[@]}"
do
echo "$l" is followed by $(achar "$l" 1)
done
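The ord/chr round trip used in the steps above (character to code, add an offset, code back to character) can be written compactly in Python. This is only a sketch for comparison with the shell and Perl versions; the function name next_char is made up for the example.

```python
def next_char(char, add=1):
    """Return the character whose code is `add` above that of `char`."""
    return chr(ord(char) + add)


for letter in ["a", "c", "A"]:
    print(letter, "--> the next is", next_char(letter))
```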