How to check if a file contains only zeros in a Linux shell?

How to check if a file contains only zeros in a Linux shell? - linux

How to check if a large file contains only zero bytes ('\0') in Linux using a shell command? I can write a small program for this but this seems to be an overkill.

If you're using bash, you can use read -n 1 to exit early if a non-NUL character has been found:
<your_file tr -d '\0' | read -n 1 || echo "All zeroes."
where you substitute the actual filename for your_file.

The "file" /dev/zero returns a sequence of zero bytes on read, so a cmp file /dev/zero should give essentially what you want (reporting the first different byte just beyond the length of file).

If you have Bash,
cmp file <(tr -dc '\000' <file)
If you don't have Bash, the following should be POSIX (but I guess there may be legacy versions of cmp which are not comfortable with reading standard input):
tr -dc '\000' <file | cmp - file
Perhaps more economically, assuming your grep can read arbitrary binary data,
tr -d '\000' <file | grep -q -m 1 ^ || echo All zeros
I suppose you could tweak the last example even further with a dd pipe to truncate any output from tr after one block of data (in case there are very long sequences without newlines), or even down to one byte. Or maybe just force there to be newlines.
tr -d '\000' <file | tr -c '\000' '\n' | grep -q -m 1 ^ || echo All zeros

It won't win a prize for elegance, but:
xxd -p file | grep -qEv '^(00)*$'
xxd -p prints a file in the following way:
23696e636c756465203c6572726e6f2e683e0a23696e636c756465203c73
7464696f2e683e0a23696e636c756465203c7374646c69622e683e0a2369
6e636c756465203c737472696e672e683e0a0a766f696420757361676528
63686172202a70726f676e616d65290a7b0a09667072696e746628737464
6572722c202255736167653a202573203c
So we grep to see if there is a line that is not made completely out of 0's, which means there is a char different to '\0' in the file. If not, the file is made completely out of zero-chars.
(The return code signals which one happened, I assumed you wanted it for a script. If not, tell me and I'll write something else)
EDIT: added -E for grouping and -q to discard output.

Straightforward:
if [ -n $(tr -d '\0000' < file | head -c 1) ]; then
echo a nonzero byte
fi
The tr -d removes all null bytes. If there are any left, the if [ -n sees a nonempty string.

Completely changed my answer based on the reply here
Try
perl -0777ne'print /^\x00+$/ ? "yes" : "no"' file

Related

Hexdump reverse command

The hexdump command converts any file to hex values.
But what if I have hex values and I want to reverse the process, is this possible?

There is a similar tool called xxd. If you run xxd with just a file name it dumps the data in a fairly standard hex dump format:
# xxd bdata
0000000: 0001 0203 0405
......
Now if you pipe the output back to xxd with the -r option and redirect that to a new file, you can convert the hex dump back to binary:
# xxd bdata | xxd -r >bdata2
# cmp bdata bdata2
# xxd bdata2
0000000: 0001 0203 0405

I've written a short AWK script which reverses hexdump -C output back to the
original data. Use like this:
reverse-hexdump.sh hex.txt > data
Handles '*' repeat markers and generating original data even if binary.
hexdump -C and reverse-hexdump.sh make a data round-trip pair. It is
available here:
GitHub reverse-hexdump repo
Direct to reverse-hexdump.sh

Restore file, given only the output of hexdump file
If you only have the output of hexdump file and want to restore the original file, first note that hexdump's default output depends on the endianness of the system you ran hexdump on!
If you have access to the system that created the dump, you can determinate its endianness using below command:
[[ "$(printf '\01\03' | hexdump)" == *0103* ]] && echo big || echo little
Reversing little-endian hexdump
This is the most common case. All x86/x64 systems are little-endian. If you don't know the endianness of the system that ran hexdump file, try this.
sed 's/ \(..\)\(..\)/ \2\1/g;$d' dump | xxd -r
The sed part converts hexdump's format into xxd's format, at least so far that xxd -r works.
Reversing big-endian hexdump
sed '$d' dump | xxd -r
Known Bugs (see comment section)
A trailing null byte is added if the original file was of odd length (e.g. 1, 3, 5, 7, ..., byte long).
Repeating sections of the original file are not restored correctly if they were hexdumped using a *.
You can check your dump for above problematic cases by running below command:
grep -qE '^\*|^[0-9a-f]*[13579bdf] *$' dump && echo bug || echo ok
Better alternative to create hexdumps in the first place
Besides the non-posix (and therefore not so portable) xxd there is od (octal dump) which should be available on all unix-like systems as it is specified by posix:
od -tx1 -An -v
Will print a hexadecimal dump, grouping digits as single bytes (-tx1), with no Address prefixes (-An, similar to xxd -p) and without abbreviating repeated sections as * (-v). You can reverse such a dump using xxd -r -p.

As someone who sucks at bash, I could not understand the examples already posted.
Here is what would have helped me when I was originally searching:
Take your text file "AYE.TXT" and convert it into a hex dump called "BEE.TXT"
xxd -p "AYE.TXT" > "BEE.TXT"
Take your hex dump file ("BEE.TXT") and covert it back to ascii file "CEE.TXT"
xxd -r -p "BEE.TXT" > "CEE.TXT"
Now that you have some simple working code, feel free to check out
"xxd -help" on the command line for an explanation of what all those flags do.
(That part is the easy part, the hard part is the specifics of the bash syntax)

There is a tonne of more elegant ways to get this done, but I've quickly hacked something together that Works for Me (tm) when regenerating a binary file from a hex dump generated by hexdump -C some_file.bin:
sed 's/\(.\{8\}\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\)/\1: \2\3 \4\5 \6\7 \8\9/g' some_file.hexdump | sed 's/\(.*\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) \(..\) |/\1 \2\3 \4\5 \6\7 \8\9 /g' | sed 's/.$//g' | xxd -r > some_file.restored
Basically, uses 2 sed processeses, each handling it's part of each line. Ugly, but someone might find it useful.

If you don't have xxd, use hexdump, od, perl or python:
The following all give the same output:
# If you only have hexdump
hexdump -ve '1/1 "%.2x"' mybinaryfile > mydump
# This gives exactly the same output as:
xxd -p mybinaryfile > mydump
# Or, much slower:
od -v -t x1 -An < mybinaryfile | tr -d "\n " > mydump
# Or, the fastest:
perl -pe 'BEGIN{$/=\1e6} $_=unpack "H*"' < mybinaryfile > mydump
# Or, if you somehow have Python, and not Perl:
python -c "print(open('mybinaryfile','rb').read().hex())" > mydump
Then you can copy and paste, or pipe the output, and convert back with:
xxd -r -p mydump mybinaryfileagain
# Or
xxd -r -p < mydump > mybinaryfileagain
The hexdump command is available almost everywhere, and is usually part of the default busybox - if it's not linked, you can try running busybox hexdump or busybox xxd.
If xxd is not available to reverse the data, then you can try awk
The old days: Zmodem
In the old days we used to use X/Y/Zmodem which is available in the package lrzsz which can tolerate lossy comms - but it's a bidirectional protocol so the binaries need to be running at the same time and there needs to be bidirectional comms:
# Demo on local machine, using FIFOs
mkfifo /tmp/fifo-in
mkfifo /tmp/fifo-out
sz -b mybinaryfile > /tmp/fifo-out < /tmp/fifo-in
mkdir out; cd out
rz -b < /tmp/fifo-out > /tmp/fifo-in
Luckily, screen supports receiving Zmodem, so if you're in a screen session:
screen
telnet somehost
Then type Ctrl+A and : and then zmodem catch and Enter. Then inside the screen on the remote host, issue:
# sz -b mybinaryfile
Press Enter when you see the string starting with "!!!".
When you see "Transfer Complete", you may want to run reset if you want to continue the terminal session normally.

This program reverses hexdump -C output back to the original data.
Usage:
make
make test
./unhexdump -i inputfile -o outputfile
see https://github.com/zhouzq-thu/unhexdump!

i found more simple solution:
bin2hex
echo -n "abc" | hexdump -ve '1/1 "%02x"'
hex2bin
echo -n "616263" | grep -Eo ".{2}" | sed 's/\(.*\)/\\x\1/' | tr -d '\n' | xargs -0 echo -ne

Linux bash command incorrectly handles zeros in output.

I'm using Debian 6-64.
When i'm running a command
echo -n `cat /proc/$(ps -o pid --no-header -C x-session-manager | tr -d ' ')/environ 2>/dev/null | tr '\0000' '\n'|grep XA|cut -d '=' -f 2`
to acquire XAUTHORITY for the current user logged in, I expect it to return at the moment my actual xauthority path, which is:
/var/run/gdm3/auth-for-alex-g5t0xM
but what it actually returns is
/var/run/gdm3/auth-for-alex-g5t
the part 0xM is missing.
Apparently it somehow takes 0 as '\0', truncating the output.
What can I do to receive the correct output?

tr manual:
\NNN character with octal value NNN (1 to 3 octal digits)
You've given four digits. Fourth one handled as separate symbol ('0') and gets replaced too.

This should work as well and is a little more concise:
tr '\000' '\n' </proc/$(pgrep x-session-manager)/environ | awk -F= '/XAUTHORITY/{print $2}'

Count the number of occurences of binary data

I need to count the occurrences of the hex string 0xFF 0x84 0x03 0x07 in a binary file, without too much hassle... is there a quick way of grepping for this data from the linux command line or should I write dedicated code to do it?

Patterns without linebreaks
If your version of grep takes the -P parameter, then you can use grep -a -P, to search for an arbitrary binary string (with no linebreaks) inside a binary file. This is close to what you want:
grep -a -c -P '\xFF\x84\x03\x07' myfile.bin
-a ensures that binary files will not be skipped
-c outputs the count
-P specifies that your pattern is a Perl-compatible regular expression (PCRE), which allows strings to contain hex characters in the above \xNN format.
Unfortunately, grep -c will only count the number of "lines" the pattern appears on - not actual occurrences.
To get the exact number of occurrences with grep, it seems you need to do:
grep -a -o -P '\xFF\x84\x03\x07' myfile.bin | wc -l
grep -o separates out each match onto its own line, and wc -l counts the lines.
Patterns containing linebreaks
If you do need to grep for linebreaks, one workaround I can think of is to use tr to swap the character for another one that's not in your search term.
# set up test file (0a is newline)
xxd -r <<< '0:08 09 0a 0b 0c 0a 0b 0c' > test.bin
# grep for '\xa\xb\xc' doesn't work
grep -a -o -P '\xa\xb\xc' test.bin | wc -l
# swap newline with oct 42 and grep for that
tr '\n\042' '\042\n' < test.bin | grep -a -o -P '\042\xb\xc' | wc -l
(Note that 042 octal is the double quote " sign in ASCII.)
Another way, if your string doesn't contain Nulls (0x0), would be to use the -z flag, and swap Nulls for linebreaks before passing to wc.
grep -a -o -P -z '\xa\xb\xc' test.bin | tr '\0\n' '\n\0' | wc -l
(Note that -z and -P may be experimental in conjunction with each other. But with simple expressions and no Nulls, I would guess it's fine.)

use hexdump like
hexdump -v -e '"0x" 1/1 "%02X" " "' <filename> | grep -oh "0xFF 0x84 0x03 0x07" |wc -w
hexdump will output binary file in the given format like 0xNN
grep will find all the occurrences of the string without considering the same ones repeated on a line
wc will give you final count

did you try grep -a?
from grep man page:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

How about:
$ hexdump a.out | grep -Ec 'ff ?84 ?03 ?07'

This doesn't quite answer your question, but does solve the problem when the search string is ASCII but the file is binary:
cat binaryfile | sed 's/SearchString/SearchString\n/g' | grep -c SearchString
Basically, 'grep' was almost there except it only counted one occurrence if there was no newline byte in between, so I added the newline bytes.

Using sed to get an env var from /proc/*/environ weirdness with \x00

I'm trying to grovel through some other processes environment to get a specific env var.
So I've been trying a sed command like:
sed -n "s/\x00ENV_VAR_NAME=\([^\x00]*\)\x00/\1/p" /proc/pid/environ
But I'm getting as output the full environ file. If I replace the \1 with just a static string, I get that string plus the entire environ file:
sed -n "s/\x00ENV_VAR_NAME=\([^\x00]*\)\x00/BLAHBLAH/p" /proc/pid/environ
I should just be getting "BLAHBLAH" in the last example. This doesn't happen if I get rid of the null chars and use some other test data set.
This lead me to try transforming the \x00 to \x01's, which does seem to work:
cat /proc/pid/environ | tr '\000' '\001' | sed -n "s/\x01ENV_VAR_NAME=\([^\x01]*\)\x01/\1/p"
Am I missing something simple about sed here? Or should I just stick to this workaround?

A lot of programs written in C tend to fail with strings with embedded NULs as a NUL terminates a C-style string. Unless specially written to handle it.
I process /proc/*/environ on the command line with xargs:
xargs -n 1 -0 < /proc/pid/environ
This gives you one env var per line. Without a command, xargs just echos the argument. You can then easily use grep, sed, awk, etc on that by piping to it.
xargs -n 1 -0 < /proc/pid/environ | sed -n 's/^ENV_VAR_NAME=\(.*\)/\1/p'
I use this often enough that I have a shell function for it:
pidenv()
{
xargs -n 1 -0 < /proc/${1:-self}/environ
}
This gives you the environment of a specific pid, or self if no argument is supplied.

You could process the list with gawk, setting the record separator to \0 and the field separator to =:
gawk -v 'RS=\0' -F= '$1=="ENV_VAR_NAME" {print $2}' /proc/pid/environ
Or you could use read in a loop to read each NUL-delimited line. For instance:
while read -d $'\0' ENV; do declare "$ENV"; done < /proc/pid/environ
echo $ENV_VAR_NAME
(Do this in a sub-shell to avoid clobbering your own environment.)

cat /proc/PID/environ | tr '\0' '\n' | sed 's/^/export /' ;
then copy and paste as needed.

In spite of really old and answered question, I am adding one very simple oneliner, probably simpler for getting the text output and further processing:
strings /proc/$PID/environ

For some reason sed does not match \0 with .
% echo -n "\00" | xxd
0000000: 00 .
% echo -n "\00" | sed 's/./a/g' | xxd
0000000: 00 .
% echo -n "\01" | xxd
0000000: 01 .
% echo -n "\01" | sed 's/./a/g' | xxd
0000000: 61 a
Solution: do not use sed or use your workaround.

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.

grep -l pattern *.txt |
while read line; do
echo -n "$line: ";
head -c $n "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.

You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed often provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient that others propsoed, which invoke multiple processes.

There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$2"
shift
grep -l "$pattern" "$#" |
while read file
do
echo "$file:" $(dd if="$file" count=${n}c)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that #litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd pretty archaic). I note that the POSIX version of head only supports -n and the number of lines; the -c option is probably a GNU extension.

Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head contains a --verbose [-v] switch. It also offers --null, to accomodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to check if a file contains only zeros in a Linux shell? - linux

How to check if a large file contains only zero bytes ('\0') in Linux using a shell command? I can write a small program for this but this seems to be an overkill.

If you're using bash, you can use read -n 1 to exit early if a non-NUL character has been found: <your_file tr -d '\0' | read -n 1 || echo "All zeroes." where you substitute the actual filename for your_file.

The "file" /dev/zero returns a sequence of zero bytes on read, so a cmp file /dev/zero should give essentially what you want (reporting the first different byte just beyond the length of file).

Straightforward: if [ -n $(tr -d '\0000' < file | head -c 1) ]; then echo a nonzero byte fi The tr -d removes all null bytes. If there are any left, the if [ -n sees a nonempty string.

Completely changed my answer based on the reply here Try perl -0777ne'print /^\x00+$/ ? "yes" : "no"' file

Related

Hexdump reverse command

Linux bash command incorrectly handles zeros in output.

Count the number of occurences of binary data

Using sed to get an env var from /proc/*/environ weirdness with \x00

Linux using grep to print the file name and first n characters

Categories

Resources