Convert binary data to hexadecimal in a shell script

I want to convert binary data to hexadecimal, just that, no fancy formatting and all. hexdump seems too clever, and it "overformats" for me. I want to take x bytes from the /dev/random and pass them on as hexadecimal.
Preferably I'd like to use only standard Linux tools, so that I don't need to install it on every machine (there are many).

Perhaps use xxd:
% xxd -l 16 -p /dev/random
193f6c54814f0576bc27d51ab39081dc

Watch out!
hexdump and xxd give the results in a different endianness!
$ echo -n $'\x12\x34' | xxd -p
1234
$ echo -n $'\x12\x34' | hexdump -e '"%x"'
3412
Simply explained. Big-endian vs. little-endian :D
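The same effect can be reproduced with od alone, which may help when neither xxd nor hexdump is installed. This is only a sketch: the first command reads byte by byte and is host-independent, while the second groups the input into 16-bit units, so its output depends on the byte order of the machine it runs on.

```shell
# Two bytes 0x12 0x34, written as octal escapes so any POSIX printf accepts them.
printf '\022\064' | od -An -tx1 | tr -d ' \n'; echo   # byte by byte: 1234

# 16-bit units: od assembles each pair in host byte order,
# so a little-endian machine shows the two bytes swapped.
printf '\022\064' | od -An -tx2 | tr -d ' \n'; echo
```

If the goal is a plain hex stream, the one-byte format is the safer choice.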

With od (GNU systems):
$ echo abc | od -A n -v -t x1 | tr -d ' \n'
6162630a
With hexdump (BSD systems):
$ echo abc | hexdump -ve '/1 "%02x"'
6162630a
From Hex dump, od and hexdump:
"Depending on your system type, either or both of these two utilities will be available--BSD systems deprecate od for hexdump, GNU systems the reverse."

Perhaps you could write your own small tool in C, and compile it on-the-fly:
#include <stdio.h>
#include <unistd.h>
int main (void) {
    unsigned char data[1024];
    ssize_t numread, i;
    /* read() returns -1 on error, so the count must be a signed type */
    while ((numread = read(0, data, sizeof data)) > 0) {
        for (i = 0; i < numread; i++) {
            printf("%02x ", data[i]);
        }
    }
    return 0;
}
And then feed it from the standard input:
cat /bin/ls | ./a.out
You can even embed this small C program in a shell script using the heredoc syntax.

All the solutions seem to be hard to remember or too complex. I find using printf the shortest one:
$ printf '%x\n' 256
100
But as noted in the comments, this is not what the author wants, so to be fair, here is the full answer.
To use the above on an actual binary data stream:
printf '%x\n' $(cat /dev/urandom | head -c 5 | od -An -vtu1)
What it does:
printf '%x\n' .... - prints a sequence of integers in hex, e.g. printf '%x,' 1 2 3 will print 1,2,3,
$(...) - this is a way to get output of some shell command and process it
cat /dev/urandom - it outputs random binary data
head -c 5 - limits binary data to 5 bytes
od -An -vtu1 - octal dump command, converts binary to decimal
As a testcase ('a' is 61 hex, 'p' is 70 hex, ...):
$ printf '%x\n' $(echo "apple" | head -c 5 | od -An -vtu1)
61
70
70
6c
65
Or, to test an individual binary byte, take decimal 61 (the '=' character) as input and turn it into binary data (the '\\x%x' format does that). The command then correctly outputs 3d (decimal 61):
$ printf '%x\n' $(echo -ne "$(printf '\\x%x' 61)" | head -c 5 | od -An -vtu1)
3d
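One thing to watch with '%x' is that bytes below 0x10 come out as a single digit, so the result cannot be split back into bytes. A minimal sketch of a fixed-width variant follows; the function name bytes_to_hex is made up for this example.

```shell
# bytes_to_hex: read stdin, print two lowercase hex digits per byte.
bytes_to_hex() {
    # od prints each input byte as an unsigned decimal; the unquoted
    # substitution splits them into separate printf arguments.
    printf '%02x' $(od -An -v -tu1)
    echo
}

printf '\001\012\377' | bytes_to_hex   # prints 010aff
```

Note that with empty input printf still runs its format once, so this sketch prints 00 rather than nothing.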

If you need a large stream (no newlines) you can use tr and xxd (part of Vim) for byte-by-byte conversion.
head -c1024 /dev/urandom | xxd -p | tr -d $'\n'
Or you can use hexdump (widely available, though not specified by POSIX) for word-by-word conversion.
head -c1024 /dev/urandom | hexdump '-e"%x"'
Note that the difference is endianness.

dd + hexdump will also work:
dd bs=1 count=1 if=/dev/urandom 2>/dev/null | hexdump -e '"%x"'

Sometimes perl5 works better for portability if you target more than one platform. It comes with every Linux distribution and Unix OS. You can often find it in container images where other tools like xxd or hexdump are not available. Here's how to do the same thing in Perl:
$ head -c8 /dev/urandom | perl -0777 -ne 'print unpack "H*"'
5c9ed169dabf33ab
$ echo -n $'\x01\x23\xff' | perl -0777 -ne 'print unpack "H*"'
0123ff
$ echo abc | perl -0777 -ne 'print unpack "H*"'
6162630a
Note that this uses slurp mode, which causes Perl to read the entire input into memory, which may be suboptimal when the input is large.

These three commands will print the same (0102030405060708090a0b0c):
n=12
echo "$a" | xxd -l "$n" -p
echo "$a" | od -N "$n" -An -tx1 | tr -d " \n" ; echo
echo "$a" | hexdump -n "$n" -e '/1 "%02x"'; echo
Given that n=12 and $a is the byte values from 1 to 26:
a="$(printf '%b' "$(printf '\\0%o' {1..26})")"
That could be used to get $n random byte values in each program:
xxd -l "$n" -p /dev/urandom
od -vN "$n" -An -tx1 /dev/urandom | tr -d " \n" ; echo
hexdump -vn "$n" -e '/1 "%02x"' /dev/urandom ; echo
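Since od is the one of the three that POSIX specifies, the od variant can be wrapped in a small helper when the script has to run on machines without xxd or hexdump. This is a sketch; the name rand_hex is invented here, and it assumes /dev/urandom exists.

```shell
# rand_hex N: print N random bytes as 2*N lowercase hex digits.
rand_hex() {
    # -N limits how many bytes od reads, -v disables the "*" line folding.
    od -An -v -N "$1" -tx1 /dev/urandom | tr -d ' \n'
    echo
}

rand_hex 12
```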

Related

why does echo -n "100" | wc -c output 3?

I just happened to be playing around with a few Linux commands and I found that echo -n "100" | wc -c outputs 3. I knew that 100 could be stored in a single byte as 1100100, so I could not understand why this happened. I guess that it is because of some terminal encoding, is it? I also found out that if I touch test.txt, run echo -n "100" > test.txt and then execute wc ./test.txt -c, I get the same output. Here also my guess is to blame file encoding, am I right?

100 is three characters long, hence wc giving you 3. If you left out the -n to echo it'd show 4, because echo would be printing out a newline too in that case.
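A quick way to see the difference between the text "100" and the single byte with value 100 is to count both. This is a small sketch; count_bytes is a helper name invented here, and the tr call only strips the padding that some wc implementations add.

```shell
count_bytes() { wc -c | tr -d ' '; }

printf '100'   | count_bytes   # three ASCII digits: 3
printf '100\n' | count_bytes   # the digits plus a newline: 4
printf '\144'  | count_bytes   # octal 144 = decimal 100 = the letter d: 1
```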

When you echo -n 100, you are showing a string with 3 characters.
When you want to show a character with ascii value 100, use
echo -n "d"
# Check
echo -n "d" | xxd -b
I found value "d" with man ascii. When you don't want to use the man page, use
printf "\\$(printf "%o" 100)"
# Check
printf "\\$(printf "%o" 100)" | xxd -b
# wc returns 1 here
printf "\\$(printf "%o" 100)" | wc -c

That's the expected behaviour:
$ wc --help
...
-c, --bytes print the byte counts
-m, --chars print the character counts
...
$ man echo
...
-n do not output the trailing newline
...
$ echo -n 'abc' | wc -c
3
$ echo -n 'абс' | wc -c # russian symbols
6

Using "$RANDOM" to generate a random string in Bash

I am trying to use the Bash variable $RANDOM to create a random string that consists of 8 characters from a variable that contains integer and alphanumeric digits, e.g., var="abcd1234ABCD".
How can I do that?

Use parameter expansion. ${#chars} is the number of possible characters, % is the modulo operator. ${chars:offset:length} selects the character(s) at position offset, i.e. 0 - length($chars) in our case.
chars=abcd1234ABCD
for i in {1..8} ; do
echo -n "${chars:RANDOM%${#chars}:1}"
done
echo
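The same loop can be packaged as a function that builds the string in a variable instead of echoing character by character. This is a sketch for bash only (it relies on $RANDOM and substring expansion); the name rand_from_set is made up here. $RANDOM ranges over 0..32767, so when the set size does not divide 32768 evenly, as with these 12 characters, some characters are very slightly more likely than others.

```shell
# rand_from_set CHARS LENGTH: print LENGTH characters picked from CHARS.
rand_from_set() {
    local chars=$1 len=$2 out='' i
    for ((i = 0; i < len; i++)); do
        # RANDOM % ${#chars} is an index into chars; take one character there.
        out+=${chars:RANDOM % ${#chars}:1}
    done
    printf '%s\n' "$out"
}

rand_from_set abcd1234ABCD 8
```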

For those looking for a random alpha-numeric string in bash:
LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c 64
The same as a well-documented function:
function rand-str {
# Return random alpha-numeric string of given LENGTH
#
# Usage: VALUE=$(rand-str $LENGTH)
# or: VALUE=$(rand-str)
local DEFAULT_LENGTH=64
local LENGTH=${1:-$DEFAULT_LENGTH}
LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c $LENGTH
# LC_ALL=C: required for Mac OS X - https://unix.stackexchange.com/a/363194/403075
# -dc: delete complementary set == delete all except given set
}

Another way to generate a 32 bytes (for example) hexadecimal string:
xxd -l 32 -c 32 -p < /dev/random
add -u if you want uppercase characters instead.

OPTION 1 - No specific length, no openssl needed, only letters and numbers, slower than option 2
sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!@#$%*()-+' | fold -w 32 | head -n 1)
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(cat /dev/urandom | tr -dc 'a-zA-Z0-9!@#$%*()-+' | fold -w 32 | head -n 1); x=$(($x-1)); done
Examples:
j0PYAlRI1r8zIoOSyBhh9MTtrhcI6d
nrCaiO35BWWQvHE66PjMLGVJPkZ6GBK
0WUHqiXgxLq0V0mBw2d7uafhZt2s
c1KyNeznHltcRrudYpLtDZIc1
edIUBRfttFHVM6Ru7h73StzDnG
OPTION 2 - No specific length, openssl needed, only letters and numbers, faster than option 1
openssl rand -base64 12 # only returns
rand=$(openssl rand -base64 12) # only saves to var
sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17) # leave only letters and numbers
# The last command can go to a var too.
DEMO: x=100; while [ $x -gt 0 ]; do sed "s/[^a-zA-Z0-9]//g" <<< $(openssl rand -base64 17); x=$(($x-1)); done
Examples:
9FbVwZZRQeZSARCH
9f8869EVaUS2jA7Y
V5TJ541atfSQQwNI
V7tgXaVzmBhciXxS
Other options, not necessarily related:
uuidgen or cat /proc/sys/kernel/random/uuid
After generating 1 billion UUIDs every second for the next 100 years,
the probability of creating just one duplicate would be about 50%. The
probability of one duplicate would be about 50% if every person on
earth owns 600 million UUIDs 😇 source

Not using $RANDOM, but worth mentioning.
Using shuf as the source of entropy (a.k.a. randomness), which in turn may use /dev/random as its source of entropy (as in shuf -i1-10 --random-source=/dev/urandom), seems like a solution that uses fewer resources:
$ shuf -er -n8 {A..Z} {a..z} {0..9} | paste -sd ""
tf8ZDZ4U

head -1 <(fold -w 20 <(tr -dc 'a-zA-Z0-9' < /dev/urandom))
This is safe to use in a bash script even if you have the safety options turned on:
set -eou pipefail
It works around bash exit status 141 (SIGPIPE), which you get when you use pipes instead:
tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 20 | head -1

A slightly obscure solution, but short to write:
RANDSTR=$(mktemp XXXXX) && rm "$RANDSTR"
This assumes you have write access to the current directory ;-)
mktemp is part of coreutils
UPDATE:
As Bazi pointed out in the comment, mktemp can be used without creating the file ;-) so the command can be even shorter.
RANDSTR=$(mktemp --dry-run XXXXX)

Using sparse array to shuffle characters.
#!/bin/bash
array=()
for i in {a..z} {A..Z} {0..9}; do
array[$RANDOM]=$i
done
printf %s ${array[@]::8} $'\n'
(Or a lot of random strings)
#!/bin/bash
b=()
while ((${#b[@]} <= 32768)); do
a=(); for i in {a..z} {A..Z} {0..9}; do a[$RANDOM]=$i; done; b+=(${a[@]})
done
tr -d ' ' <<< ${b[@]} | fold -w 8 | head -n 4096

An abbreviated safe pipe workaround based on Radu Gabriel's answer and tested with GNU bash version 4.4.20 and set -euxo pipefail:
head -c 20 <(tr -dc [:alnum:] < /dev/urandom)

How to dump part of binary file

I have a binary file and want to extract part of it, starting from a known byte string (i.e. FF D8 FF D0) and ending with a known byte string (AF FF D9).
In the past I've used dd to cut part of binary file from beginning/ending but this command doesn't seem to support what I ask.
What tool on terminal can do this?

Locate the start/end position, then extract the range.
$ xxd -g0 input.bin | grep -im1 FFD8FFD0 | awk -F: '{print $1}'
0000cb0
$ ^FFD8FFD0^AFFFD9^
0009590
$ dd ibs=1 count=$((0x9590-0xcb0+1)) skip=$((0xcb0)) if=input.bin of=output.bin
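The offsets that xxd prints are those of the 16-byte lines, not of the patterns themselves, so the range above is only approximate. A rough sketch of finding exact byte offsets with od and awk follows. The function names are made up for this example, patterns are given as lowercase hex without spaces, and it assumes the whole hex dump fits in one awk record and that the first end marker comes after the first start marker.

```shell
# find_hex_offset FILE PATTERN
# Print the offset of the first byte-aligned match of PATTERN;
# return non-zero if there is none.
find_hex_offset() {
    od -An -v -tx1 "$1" | tr -d ' \n' |
    awk -v pat="$2" '
        {
            s = $0; pos = 0
            while ((i = index(s, pat)) > 0) {
                pos += i                 # 1-based position in the full hex string
                if (pos % 2 == 1) {      # odd position: starts on a byte boundary
                    print int((pos - 1) / 2); found = 1; exit
                }
                s = substr(s, i + 1)     # nibble-aligned hit: keep searching
            }
        }
        END { exit !found }'
}

# extract_between IN OUT STARTPAT ENDPAT
# Copy the bytes from the first STARTPAT through the first ENDPAT, inclusive.
extract_between() {
    start=$(find_hex_offset "$1" "$3") || return 1
    end=$(find_hex_offset "$1" "$4") || return 1
    count=$(( end + ${#4} / 2 - start ))   # ${#4} / 2 = length of ENDPAT in bytes
    [ "$count" -gt 0 ] || return 1
    dd if="$1" of="$2" bs=1 skip="$start" count="$count" 2>/dev/null
}
```

For the question's markers this would be called as extract_between input.bin output.bin ffd8ffd0 afffd9.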

In a single pipe:
xxd -c1 -p file |
awk -v b="ffd8ffd0" -v e="aaffd9" '
found == 1 {
print $0
str = str $0
if (str == e) {found = 0; exit}
if (length(str) == length(e)) str = substr(str, 3)}
found == 0 {
str = str $0
if (str == b) {found = 1; print str; str = ""}
if (length(str) == length(b)) str = substr(str, 3)}
END{ exit found }' |
xxd -r -p > new_file
test ${PIPESTATUS[1]} -eq 0 || rm new_file
The idea is to use awk between two xxd calls to select the part of the file that is needed. Once the 1st pattern is found, awk prints the bytes until the 2nd pattern is found, and then exits.
The case where the 1st pattern is found but the 2nd is not must be taken into account. That is done in the END part of the awk script, which returns a non-zero exit status. This is caught through bash's ${PIPESTATUS[1]}, at which point I decided to delete the new file.
Note that an empty file also means that nothing has been found.

This should work with standard tools (xxd, tr, grep, awk, dd). It correctly handles the "pattern split across lines" issue, and it also looks for the pattern only at byte-aligned offsets (not nibble offsets).
file=<yourfile>
outfile=<youroutputfile>
startpattern="ff d8 ff d0"
endpattern="af ff d9"
xxd -g0 -c1 -ps ${file} | tr '\n' ' ' > ${file}.hex
start=$((($(grep -bo "${startpattern}" ${file}.hex\
| head -1 | awk -F: '{print $1}')-1)/3))
len=$((($(grep -bo "${endpattern}" ${file}.hex\
| head -1 | awk -F: '{print $1}')-1)/3-${start}))
dd ibs=1 count=${len} skip=${start} if=${file} of=${outfile}
Note: The script above uses a temporary file to avoid doing the binary-to-hex conversion twice. A space/time trade-off is to pipe the result of xxd directly into the two grep commands. A one-liner is also possible, at the expense of clarity.
One could also use tee and named pipe to prevent having to store a temporary file and converting output twice, but I'm not sure it would be faster (xxd is fast) and is certainly more complex to write.

See this link for a way to do binary grep. Once you have the start and end offset, you should be able with dd to get what you need.

A variation on the awk solution that assumes that your binary file, once converted in hex with spaces, fits in memory:
xxd -c1 -p file |
tr "\n" " " |
sed -n -e 's/.*\(ff d8 ff d0.*aa ff d9\).*/\1/p' |
xxd -r -p > new_file

Another solution in sed, but using less memory:
xxd -c1 -p file |
sed -n -e '1{N;N;N}' -e '/ff\nd8\nff\nd0/{:begin;p;s/.*//;n;bbegin}' -e 'N;D' |
sed -n -e '1{N;N}' -e '/aa\nff\nd9/{p;Q1}' -e 'P;N;D' |
xxd -r -p > new_file
test ${PIPESTATUS[2]} -eq 1 || rm new_file
The 1st sed prints from ff d8 ff d0 to the end of the file. Note that you need as many N commands in -e '1{N;N;N}' as there are bytes in your 1st pattern, less one.
The 2nd sed prints from the beginning of its input up to aa ff d9. Note again that you need as many N commands in -e '1{N;N}' as there are bytes in your 2nd pattern, less one.
Again, a test is needed to check whether the 2nd pattern was found, and to delete the file if it was not.
Note that the Q command is a GNU extension to sed. If you do not have it, you need to discard the rest of the file once the pattern is found (in a loop like the one in the 1st sed, but without printing), and check after the hex-to-binary conversion that new_file ends with the right pattern.

How to convert hex to ASCII characters in the Linux shell?

Let's say that I have a string 5a.
This is the hex representation of the ASCII letter Z.
I need to find a Linux shell command which will take a hex string and output the ASCII characters that the hex string represents.
So if I do:
echo 5a | command_im_looking_for
I will see a solitary letter Z:
Z

I used to do this with xxd:
echo -n 5a | xxd -r -p
But then I realised that in Debian/Ubuntu, xxd is part of vim-common and hence might not be present in a minimal system. To also avoid Perl (IMHO also not part of a minimal system), I ended up using sed, xargs, and printf like this:
echo -n 5a | sed 's/\([0-9A-F]\{2\}\)/\\\\\\x\1/gI' | xargs printf
Mostly, I only want to convert a few bytes and it's okay for such tasks. The advantage of this solution over the one from ghostdog74 is that it can convert hex strings of arbitrary length automatically. xargs is used because printf doesn't read from standard input.
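On a really minimal system it may also matter that the \x escape in printf is a common extension rather than something POSIX requires, whereas octal escapes are specified. Here is a sketch that converts each hex pair to an octal escape first. The function name hex2bin is invented for this example, and the loop is meant for short strings, not large files.

```shell
# hex2bin: read hex digits on stdin (whitespace ignored), write raw bytes.
# The body runs in a subshell so its variables do not leak into the caller.
hex2bin() (
    hex=$(tr -d ' \t\r\n')
    # Refuse odd-length input instead of guessing at a missing digit.
    [ $(( ${#hex} % 2 )) -eq 0 ] || exit 1
    while [ -n "$hex" ]; do
        rest=${hex#??}
        byte=${hex%"$rest"}      # the first two characters
        hex=$rest
        # hex pair -> three-digit octal escape -> raw byte
        printf "\\$(printf '%03o' "0x$byte")"
    done
)

echo 5a | hex2bin; echo          # prints Z
```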

echo -n 5a | perl -pe 's/([0-9a-f]{2})/chr hex $1/gie'
Note that this won't skip non-hex characters. If you want just the hex (no whitespace from the original string etc):
echo 5a | perl -ne 's/([0-9a-f]{2})/print chr hex $1/gie'
Also, zsh and bash support this natively in echo:
echo -e '\x5a'

You can do this with echo only, without the other stuff. Don't forget to add "-n" or you will get a linebreak automatically:
echo -n -e "\x5a"

Bash one-liner
echo -n "5a" | while read -N2 code; do printf "\x$code"; done

Some Python 3 one-liners that work with any number of bytes.
Decoding hex
Using strip, so that it's ok to have a newline on stdin.
$ echo 666f6f0a | python3 -c "import sys, binascii; sys.stdout.buffer.write(binascii.unhexlify(input().strip()))"
foo
Encoding hex
$ echo foo | python3 -c "import sys, binascii; print(binascii.hexlify(sys.stdin.buffer.read()).decode())"
666f6f0a

Depending on where you got that "5a", you can just prepend "\x" to it and pass that to printf:
$ a=5a
$ a="\x${a}"
$ printf "$a"
Z

echo 5a | python -c "import sys; print(chr(int(sys.stdin.read(), base=16)))"

Here is a pure bash script (as printf is a bash builtin) :
#warning : spaces do matter
die(){ echo "$@" >&2;exit 1;}
p=48656c6c6f0a
test $((${#p} & 1)) == 0 || die "length is odd"
p2=''; for ((i=0; i<${#p}; i+=2));do p2=$p2\\x${p:$i:2};done
printf "$p2"
If bash is already running, this should be faster than any other solution which is launching a new process.

dc can convert between numeric bases:
$ echo 5a | (echo 16i; tr 'a-z' 'A-Z'; echo P) | dc
Z$

There is a simple shell command ascii.
If you use Ubuntu, install it with:
sudo apt install ascii
Then
ascii 0x5a
will output:
ASCII 5/10 is decimal 090, hex 5a, octal 132, bits 01011010: prints as `Z'
Official name: Majuscule Z
Other names: Capital Z, Uppercase Z

As per @Randal's comment, you can use perl, e.g.
$ printf 5a5a5a5a | perl -lne 'print pack "H*", $_'
ZZZZ
and other way round:
$ printf ZZZZ | perl -lne 'print unpack "H*", $_'
5a5a5a5a
Another example with file:
$ printf 5a5a5a5a | perl -lne 'print pack "H*", $_' > file.bin
$ perl -lne 'print unpack "H*", $_' < file.bin
5a5a5a5a

You can use this command (python script) for larger inputs:
echo 58595a | python -c "import sys; import binascii; print(binascii.unhexlify(sys.stdin.read().strip()).decode())"
The result will be:
XYZ
And for more simplicity, define an alias:
alias hexdecoder='python -c "import sys; import binascii; print(binascii.unhexlify(sys.stdin.read().strip()).decode())"'
echo 58595a | hexdecoder

GNU awk 4.1
awk -niord '$0=chr("0x"RT)' RS=.. ORS=
Note that if you echo to this it will produce an extra null byte
$ echo 595a | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1c
0000000 59 5a 00
Y Z \0
Instead use printf
$ printf 595a | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1c
0000000 59 5a
Y Z
Also note that GNU awk produces UTF-8 by default
$ printf a1 | awk -niord '$0=chr("0x"RT)' RS=.. ORS= | od -tx1
0000000 c2 a1
If you are dealing with characters outside of ASCII, and you are going to be
Base64 encoding the resultant string, you can disable UTF-8 with -b
echo 5a | sha256sum | awk -bniord 'RT~/\w/,$0=chr("0x"RT)' RS=.. ORS=

Similar to my answer here: Linux shell scripting: hex number to binary string
You can do it with the same tool like this (using printable ASCII characters instead of 5a):
echo -n 616263 | cryptocli dd -decoders hex
Will produce the following result:
abc

What's the opposite of od(1)?

Say I have 8b1f 0008 0231 49f6 0300 f1f3 75f4 0c72 f775 0850 7676 720c 560d 75f0 02e5 ce00 0861 1302 0000 0000, how can I easily get a binary file from that without copying+pasting into a hex editor?

Use:
% xxd -r -p in.txt out.bin

See xxd.

This version will work with binary format too:
cat /bin/sh \
| od -A n -v -t x1 \
| tr -d '\r' \
| xxd -r -g 1 -p1 \
| md5sum && md5sum /bin/sh
The extra '\r' is just if you're dealing with DOS text files...
And process byte by byte to prevent endianness difference if running parts of a pipe on different systems.

All the present answers refer to the convenient xxd -r approach, but for situations where xxd is not available or convenient here is a more portable (and more flexible, but more verbose and less efficient) solution, using only POSIX shell syntax (it also compensates for odd-number of digits in input):
un_od() {
printf -- "$(
tr -d '\t\r\n ' | sed -e 's/^\(.\(.\{2\}\)*\)$/0\1/' -e 's/\(.\{2\}\)/\\x\1/g'
)"
}
By the way: you don't specify whether your input is big-endian or little-endian, or whether you want big/little-endian output. Usually input such as in your question would be big-endian/network-order (e.g., as created by od -t x1 -An -v), and would be expected to transform to big-endian output. I presume xxd just assumes that default if not told otherwise, and this solution does that too. If byte-swapping is needed, how you do the byte-swapping also depends on the word-size of the system (e.g., 32 bit, 64 bit) and very rarely the byte-size (you can almost always assume 8-bit bytes - octets - though).
The below functions use a more complex version of the binary -> od -> binary trick to portably byteswap binary data, conditional on system endianness, and accounting for system word-size. The algorithm works for anything up to 72-bit word size (because seq -s '' 10 -> 12345678910 doesn't work):
if { sed --version 2>/dev/null || :; } | head -n 1 | grep -q 'GNU sed'; then
_sed() { sed -r "${@}"; }
else
_sed() { sed -E "${@}"; }
fi
sys_bigendian() {
return $(
printf 'I' | od -t o2 | head -n 1 | \
_sed -e 's/^[^ \t]+[ \t]+([^ \t]+)[ \t]*$/\1/' | cut -c 6
)
}
sys_word_size() { expr $(getconf LONG_BIT) / 8; }
byte_swap() {
_wordsize=$1
od -An -v -t o1 | _sed -e 's/^[ \t]+//' | tr -s ' ' '\n' | \
paste -d '\\' $(for _cnt in $(seq $_wordsize); do printf -- '- '; done) | \
_sed -e 's/^/\\/' -e '$ s/\\+$//' | \
while read -r _word; do
_thissize=$(expr $(printf '%s' "$_word" | wc -c) / 4)
printf '%s' "$(seq -s '' $_thissize)" | tr -d '\n' | \
tr "$(seq -s '' $_thissize -1 1)" "$_word"
done
unset _wordsize _prefix _word _thissize
}
You can use the above to output file contents in big-endian format regardless of system endianness:
if sys_bigendian; then
cat /bin/sh
else
cat /bin/sh | byte_swap $(sys_word_size)
fi
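For the simpler case where the text consists of 16-bit hex words, as od -x prints them on a little-endian machine (an assumption about where the question's dump came from), swapping the two bytes of every four-digit group before decoding may be enough. This is a sketch, and swap16 is a name made up here; its output is a plain hex stream that can be fed to xxd -r -p.

```shell
# swap16: read whitespace-separated 4-digit hex words, swap the two bytes
# of each word, and print the result as one continuous hex string.
swap16() {
    tr -s ' \t\r' '\n\n\n' |            # one word per line
        sed 's/^\(..\)\(..\)$/\2\1/' |  # 8b1f -> 1f8b
        tr -d '\n'
    echo
}

printf '8b1f 0008 0231\n' | swap16      # prints 1f8b08003102
```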

Here is the way to reverse "od" output:
echo "test" | od -A x -t x1 | sed -e 's|^[0-f]* ?||g' | xxd -r
test
