I have two files and I want to see if the first 40 bytes are similar. How can I do this using hex dump?
If you are using the BSD hexdump utility (which will also be installed as hd, with a different default output format) then you can supply the -n40 command line parameter to limit the dump to the first 40 bytes:
hexdump -n40 filename
If you are using the POSIX standard od, you need a capital N. You might find the following invocation useful (note that -w is a GNU extension):
od -N40 -w40 -tx1 -Ax filename
(You can do that with hexdump, too, but the format string is more work to figure out :) ).
Try this:
head -c 40 myfile | hexdump
Not sure why you need hexdump here,
diff <(dd bs=1 count=40 if=file1) <(dd bs=1 count=40 if=file2)
with hexdump:
diff <(dd bs=1 count=40 if=file1|hexdump) <(dd bs=1 count=40 if=file2|hexdump)
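If you only want a yes/no answer rather than a visual diff, the two dumps can be compared as strings. This is a minimal sketch using only POSIX od; same_prefix is a hypothetical helper name, not a standard tool.

```shell
# same_prefix N FILE1 FILE2
# Hypothetical helper: succeeds (exit status 0) if the first N bytes match.
same_prefix() {
    # od -An drops the offset column, -v disables "*" folding,
    # -tx1 prints one byte per hex pair, -N limits the dump to N bytes.
    [ "$(od -An -v -tx1 -N "$1" "$2")" = "$(od -An -v -tx1 -N "$1" "$3")" ]
}

# Example: same_prefix 40 file1 file2 && echo same || echo different
```

If one file is shorter than N bytes, its dump is shorter too, so the comparison reports a difference.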
The hexdump command converts any file to hex values.
But what if I have hex values and I want to reverse the process, is this possible?
There is a similar tool called xxd. If you run xxd with just a file name it dumps the data in a fairly standard hex dump format:
# xxd bdata
0000000: 0001 0203 0405
......
Now if you pipe the output back to xxd with the -r option and redirect that to a new file, you can convert the hex dump back to binary:
# xxd bdata | xxd -r >bdata2
# cmp bdata bdata2
# xxd bdata2
0000000: 0001 0203 0405
I've written a short AWK script which reverses hexdump -C output back to the
original data. Use like this:
reverse-hexdump.sh hex.txt > data
Handles '*' repeat markers and generating original data even if binary.
hexdump -C and reverse-hexdump.sh make a data round-trip pair. It is
available here:
GitHub reverse-hexdump repo
Direct to reverse-hexdump.sh
Restore file, given only the output of hexdump file
If you only have the output of hexdump file and want to restore the original file, first note that hexdump's default output depends on the endianness of the system you ran hexdump on!
If you have access to the system that created the dump, you can determine its endianness using the command below:
[[ "$(printf '\01\03' | hexdump)" == *0103* ]] && echo big || echo little
Reversing little-endian hexdump
This is the most common case. All x86/x64 systems are little-endian. If you don't know the endianness of the system that ran hexdump file, try this.
sed 's/ \(..\)\(..\)/ \2\1/g;$d' dump | xxd -r
The sed part converts hexdump's format into xxd's format, at least far enough that xxd -r works.
Reversing big-endian hexdump
sed '$d' dump | xxd -r
Known Bugs (see comment section)
A trailing null byte is added if the original file was of odd length (e.g. 1, 3, 5, 7, ... bytes long).
Repeating sections of the original file are not restored correctly if they were hexdumped using a *.
You can check your dump for the problematic cases above by running the command below:
grep -qE '^\*|^[0-9a-f]*[13579bdf] *$' dump && echo bug || echo ok
Better alternative to create hexdumps in the first place
Besides the non-POSIX (and therefore not so portable) xxd, there is od (octal dump), which should be available on all Unix-like systems as it is specified by POSIX:
od -tx1 -An -v
This prints a hexadecimal dump, grouping digits as single bytes (-tx1), with no address prefixes (-An, similar to xxd -p) and without abbreviating repeated sections as * (-v). You can reverse such a dump using xxd -r -p.
As someone who sucks at bash, I could not understand the examples already posted.
Here is what would have helped me when I was originally searching:
Take your text file "AYE.TXT" and convert it into a hex dump called "BEE.TXT"
xxd -p "AYE.TXT" > "BEE.TXT"
Take your hex dump file ("BEE.TXT") and convert it back to ASCII file "CEE.TXT"
xxd -r -p "BEE.TXT" > "CEE.TXT"
Now that you have some simple working code, feel free to check out
"xxd -help" on the command line for an explanation of what all those flags do.
(That part is the easy part, the hard part is the specifics of the bash syntax)
There is a tonne of more elegant ways to get this done, but I've quickly hacked something together that Works for Me (tm) when regenerating a binary file from a hex dump generated by hexdump -C some_file.bin:
sed 's/\(.\{8\}\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\)/\1: \2\3 \4\5 \6\7 \8\9/g' some_file.hexdump | sed 's/\(.*\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) |/\1 \2\3 \4\5 \6\7 \8\9 /g' | sed 's/.$//g' | xxd -r > some_file.restored
Basically, it uses 2 sed processes, each handling its part of each line. Ugly, but someone might find it useful.
If you don't have xxd, use hexdump, od, perl or python:
The following all produce the same hex digits (only the line breaks differ: xxd -p wraps its output, which xxd -r -p ignores):
# If you only have hexdump
hexdump -ve '1/1 "%.2x"' mybinaryfile > mydump
# This gives the same hex digits as:
xxd -p mybinaryfile > mydump
# Or, much slower:
od -v -t x1 -An < mybinaryfile | tr -d "\n " > mydump
# Or, the fastest:
perl -pe 'BEGIN{$/=\1e6} $_=unpack "H*"' < mybinaryfile > mydump
# Or, if you somehow have Python, and not Perl:
python -c "print(open('mybinaryfile','rb').read().hex())" > mydump
Then you can copy and paste, or pipe the output, and convert back with:
xxd -r -p mydump mybinaryfileagain
# Or
xxd -r -p < mydump > mybinaryfileagain
The hexdump command is available almost everywhere, and is usually part of the default busybox - if it's not linked, you can try running busybox hexdump or busybox xxd.
If xxd is not available to reverse the data, then you can try awk
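Another way to do the reverse step without xxd is plain shell plus printf. This is a minimal sketch and not the awk approach mentioned above; hex2bin is a hypothetical helper name, and it will be slow on large dumps because it calls printf once per byte.

```shell
# Hypothetical helper: read hex digits on stdin, write raw bytes on stdout.
hex2bin() {
    # Strip whitespace, then put each pair of hex digits on its own line.
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    tr -d ' \t\n' | fold -w 2 | while read -r byte || [ -n "$byte" ]; do
        # printf only guarantees octal escapes, so convert hex to octal first.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# Example: hex2bin < mydump > mybinaryfileagain
```

It accepts both the continuous output of xxd -p and the space-separated output of od -An -v -tx1, since all whitespace is dropped before decoding.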
The old days: Zmodem
In the old days we used to use X/Y/Zmodem which is available in the package lrzsz which can tolerate lossy comms - but it's a bidirectional protocol so the binaries need to be running at the same time and there needs to be bidirectional comms:
# Demo on local machine, using FIFOs
mkfifo /tmp/fifo-in
mkfifo /tmp/fifo-out
sz -b mybinaryfile > /tmp/fifo-out < /tmp/fifo-in
mkdir out; cd out
rz -b < /tmp/fifo-out > /tmp/fifo-in
Luckily, screen supports receiving Zmodem, so if you're in a screen session:
screen
telnet somehost
Then type Ctrl+A and : and then zmodem catch and Enter. Then inside the screen on the remote host, issue:
# sz -b mybinaryfile
Press Enter when you see the string starting with "!!!".
When you see "Transfer Complete", you may want to run reset if you want to continue the terminal session normally.
This program reverses hexdump -C output back to the original data.
Usage:
make
make test
./unhexdump -i inputfile -o outputfile
See https://github.com/zhouzq-thu/unhexdump
I found a simpler solution:
bin2hex
echo -n "abc" | hexdump -ve '1/1 "%02x"'
hex2bin
echo -n "616263" | grep -Eo ".{2}" | sed 's/\(.*\)/\\x\1/' | tr -d '\n' | xargs -0 echo -ne
I'm using the ruby binding, ruby-xz.
random_string = SecureRandom.random_bytes(100)
compressed_string = XZ.compress(random_string, compression_level = 9, check = :none, extreme = true)
compressed_string.size # => always 148
I've tested it tens of thousands of times, on strings of varying length.
I know that at least half of the strings are 1-incompressible (cannot be compressed by more than 1 bit), 3/4 of the strings are 2-incompressible, etc. (This follows from a counting argument.) This, obviously, says nothing about the lower bound of the number of compressible strings, but there are bound to be a few, aren't there?
Explanation
There are a few reasons:
1. liblzma, when not in raw mode, adds a header describing the dictionary size and a few other settings. That is one of the reasons the output grows in size.
2. LZMA, like a lot of other compressors, uses a range encoder to encode the output of the dictionary compression (in essence a badass version of LZ77) in the fewest bits needed. At the end of the bit stream, the last bits are padded to make a full byte.
3. You are compressing random noise, which, as you note, is hard to compress. The range encoder tries to find the fewest bits to encode the symbols output by the dictionary compression round, and in this case there will be a lot of symbols. If LZMA did find one or two recurring patterns, it might save only a bit or two in the end, which, as explained in point 2, you cannot observe at the byte level.
Experiment
Some small experiments for observing the overhead.
empty file with lzma in raw mode:
$ dd if=/dev/urandom bs=1k count=0 2>/dev/null | xz -9 -e --format=raw -c 2>/dev/null | wc -c
1
it needed at least one or two bits to say it reached the end of the stream, and this was padded to one byte
1k file filled with zeroes
$ dd if=/dev/zero bs=1k count=1 2>/dev/null | xz -9 -e --format=raw -c 2>/dev/null | wc -c
19
quite nice, but complexity theory wise, still perhaps a few bytes too many (1024x'\0' would have been the optimal encoding)
1k file with all bits at 1
$ dd if=/dev/zero bs=1k count=1 2>/dev/null | sed 's/\x00/\xFF/g'| xz -9 -e --format=raw -c 2>/dev/null | wc -c
21
interestingly, xz compresses this a little worse than all zeroes. most likely related to the fact that LZMA dictionary works on a bit level (which was one of the novel ideas of LZMA).
1k random file:
$ dd if=/dev/urandom bs=1k count=1 2>/dev/null | xz -9 -e --format=raw -c 2>/dev/null | wc -c
1028
so 4 bytes more than the input, still not bad.
1000 runs of 1k random files:
$ for i in {1..1000}; do dd if=/dev/urandom bs=1k count=1 2>/dev/null | xz -9 -e --format=raw -c 2>/dev/null | wc -c; done | sort | uniq -c
1000 1028
so every time, 1028 bytes needed.
I need to count the occurrences of the hex string 0xFF 0x84 0x03 0x07 in a binary file, without too much hassle... is there a quick way of grepping for this data from the linux command line or should I write dedicated code to do it?
Patterns without linebreaks
If your version of grep takes the -P parameter, then you can use grep -a -P, to search for an arbitrary binary string (with no linebreaks) inside a binary file. This is close to what you want:
grep -a -c -P '\xFF\x84\x03\x07' myfile.bin
-a ensures that binary files will not be skipped
-c outputs the count
-P specifies that your pattern is a Perl-compatible regular expression (PCRE), which allows strings to contain hex characters in the above \xNN format.
Unfortunately, grep -c will only count the number of "lines" the pattern appears on - not actual occurrences.
To get the exact number of occurrences with grep, it seems you need to do:
grep -a -o -P '\xFF\x84\x03\x07' myfile.bin | wc -l
grep -o separates out each match onto its own line, and wc -l counts the lines.
Patterns containing linebreaks
If you do need to grep for linebreaks, one workaround I can think of is to use tr to swap the character for another one that's not in your search term.
# set up test file (0a is newline)
xxd -r <<< '0:08 09 0a 0b 0c 0a 0b 0c' > test.bin
# grep for '\xa\xb\xc' doesn't work
grep -a -o -P '\xa\xb\xc' test.bin | wc -l
# swap newline with oct 42 and grep for that
tr '\n\042' '\042\n' < test.bin | grep -a -o -P '\042\xb\xc' | wc -l
(Note that 042 octal is the double quote " sign in ASCII.)
Another way, if your string doesn't contain Nulls (0x0), would be to use the -z flag, and swap Nulls for linebreaks before passing to wc.
grep -a -o -P -z '\xa\xb\xc' test.bin | tr '\0\n' '\n\0' | wc -l
(Note that -z and -P may be experimental in conjunction with each other. But with simple expressions and no Nulls, I would guess it's fine.)
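A different way to sidestep the newline problem entirely is to search a hex dump instead of the raw bytes. The following is a rough sketch using od, tr and grep -o; count_bytes is a hypothetical helper name, and the whole dump ends up on one line, so it is only reasonable for files of moderate size.

```shell
# count_bytes FILE 'xx xx ...'
# Hypothetical helper: count non-overlapping occurrences of a byte sequence,
# given as space-separated lowercase hex bytes (e.g. 'ff 84 03 07').
count_bytes() {
    # od prints one " xx" token per byte; joining the lines and squeezing
    # spaces yields a single line of byte tokens separated by one space each.
    od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' ' | grep -o "$2" | wc -l
}

# Example: count_bytes myfile.bin 'ff 84 03 07'
```

Because every byte is exactly two hex digits followed by a space, a match can only start on a byte boundary, and a 0a byte is just another token.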
Use hexdump like this:
hexdump -v -e '"0x" 1/1 "%02X" " "' filename | grep -oh "0xFF 0x84 0x03 0x07" | wc -l
hexdump will output the binary file in the given format, one 0xNN token per byte.
grep -o will print every occurrence of the string on its own line, even when several occur on the same input line.
wc -l will give you the final count (wc -w would count four words per match).
Did you try grep -a?
from grep man page:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
How about:
$ hexdump a.out | grep -Ec 'ff ?84 ?03 ?07'
This doesn't quite answer your question, but does solve the problem when the search string is ASCII but the file is binary:
cat binaryfile | sed 's/SearchString/SearchString\n/g' | grep -c SearchString
Basically, 'grep' was almost there except it only counted one occurrence if there was no newline byte in between, so I added the newline bytes.
Following up on "Convert decimal to hexadecimal in UNIX shell script":
I am trying to print only the hex values from hexdump, i.e. without the line numbers and the ASCII table.
But the following command line doesn't print anything:
hexdump -n 50 -Cs 10 file.bin | awk '{for(i=NF-17; i>2; --i) print $i}'
Using xxd is better for this job:
xxd -p -l 50 -seek 10 file.bin
From man xxd:
xxd - make a hexdump or do the reverse.
-p | -ps | -postscript | -plain
output in postscript continuous hexdump style. Also known as plain hexdump style.
-l len | -len len
stop after writing <len> octets.
-seek offset
When used after -r: revert with <offset> added to file positions found in hexdump.
You can specify the exact format that you want hexdump to use for output, but it's a bit tricky. Here's the default output, minus the file offsets:
hexdump -e '16/1 "%02x " "\n"' file.bin
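(It looks as if this would produce an extra trailing space at the end of each line, but it doesn't: according to the hexdump man page, when an iteration count is greater than one, no trailing whitespace is output during the last iteration.)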
As an alternative, consider using xxd -p file.bin.
First of all, remove -C which is emitting the ascii information.
Then you could drop the offset with
hexdump -n 50 -s 10 file.bin | cut -c 9-
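If neither xxd nor a particular hexdump flavour is at hand, POSIX od can produce the same kind of offset-free output. This is a small sketch; hex_slice is a hypothetical helper name.

```shell
# hex_slice OFFSET COUNT FILE
# Hypothetical helper: print COUNT bytes starting at OFFSET as plain hex.
hex_slice() {
    # -j skips OFFSET bytes, -N dumps COUNT bytes, -An drops the offsets,
    # -tx1 prints single bytes; tr then removes spaces and newlines.
    od -An -v -tx1 -j "$1" -N "$2" "$3" | tr -d ' \n'
    echo
}

# Example: hex_slice 10 50 file.bin
```

Dropping the tr stage keeps od's space-separated layout, which is closer to what hexdump -C shows in its middle column.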
I am trying to read from /dev/random and /dev/urandom and would like to know the best way to read from them, and from block/character special devices in general, using bash shell scripting.
Use dd to get blocks of data from the device. E.g. to get 8 bytes from /dev/urandom:
dd if=/dev/urandom count=1 bs=8 | ...
Then you can use od to convert the bytes to a human-readable form:
$ dd if=/dev/urandom count=1 bs=8 2>/dev/null | od -t x1 -A n
b4 bc 2f 59 dd 55 1b 4a
By the way, if you only need random numbers in bash, $RANDOM is probably more useful:
$ echo $RANDOM $RANDOM $RANDOM $RANDOM
3466 6521 4426 9349
My hint:
dd if=/dev/urandom count=4 | ...
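The tail of the pipeline depends heavily on what you want to do with the data. Note that without bs=, dd uses 512-byte blocks, so count=4 reads 2048 bytes.
For example, to format 4 bytes as a long integer number: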
dd if=/dev/urandom bs=1 count=4|od -l
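od can also read from the device directly, which avoids dd and its status messages. A minimal sketch, assuming /dev/urandom exists; rand_u32 is a hypothetical helper name.

```shell
# Hypothetical helper: print one random unsigned 32-bit integer.
rand_u32() {
    # -N4 dumps exactly four bytes, -tu4 prints them as a single unsigned
    # 4-byte decimal number, -An drops the offset column.
    od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
    echo
}

# Example: n=$(rand_u32); echo "$n"
```

Spelling out the size with -tu4 keeps the result at 32 bits regardless of how wide a long is on the platform.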