NULL (\0) added at the end of file - linux

I'm trying to clean a binary file to delete all the NULs in it. The task is quite simple, but I found that a lot of files have a NUL at the end of the file and I don't know why. I'm dumping the hexadecimal value of each byte and I don't see the NUL anywhere, but if I do a hexdump of the file, I see a value 00 at the end and I don't know why... It could be an EOF marker, but it's weird because it doesn't appear in all files. This is the script I have, quite a simple one: it generates 100 random binary files, and then reads them file by file, char by char. Following the premise that bash won't store NULs in variables, rewriting each char after storing it in a variable should avoid the NULs, but no...
#!/bin/bash
for i in $(seq 0 100)
do
echo "$i %"
time dd if=/dev/urandom of=$i bs=1 count=1000
while read -r -n 1 c;
do
echo -n "$c" >> temp
done < $i
mv temp $i
done
I also tried with:
tr -d '\000' <inFile >outfile
But same result.
This is what the hexdump of one of the files with this problem looks like:
00003c0 0b12 a42b cb50 2a90 1fd6 a4f9 89b4 ddb6
00003d0 3fa3 eb7e 00c4
c4 should be the last byte, but as you can see, there's a 00 there...
Any clue?
EDIT:
Forgot to mention that the machine where I'm running this is something similar to a Raspberry Pi, and the tools provided with it are quite limited.

Try these other commands:
od -tx1 inFile
xxd inFile
hexdump prints a trailing 00 when the file size is an odd number of bytes: its default output groups bytes into 16-bit words, so the final odd byte is displayed as a word padded with 00. There is no NUL byte in the file itself.
It seems hexdump without options is like -x; hexdump -h gives the list of options, and hexdump -C (canonical one-byte-at-a-time output) may also help.
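A quick way to convince yourself that the trailing 00 is a display artifact rather than a byte in the file (odd.bin is just a throwaway name for this sketch):

```shell
# A 3-byte file with no NUL byte anywhere in it.
printf 'abc' > odd.bin
wc -c < odd.bin               # 3
od -An -tx1 odd.bin           # 61 62 63 -- no 00
# hexdump's default 16-bit-word view pads the odd last byte with 00
# in the display only; the file itself is unchanged.
command -v hexdump >/dev/null && hexdump odd.bin
```

Byte-oriented formats like od -tx1 or hexdump -C never show the phantom 00.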

How to Write Block of Random Data to File using Bash

I need to write a 5MB block of data obtained from /dev/urandom to a partition, at a specific location in the partition. I then need to check that the write executed correctly. I was previously successfully doing this in C++, but now I would like to implement it in a bash script.
My C++ code consisted of:
create 10MB array of data populated from /dev/urandom (RANDOM_ARRAY)
open partition with open()
use lseek() to navigate to desired position in partition
use write() to write the array into the partition
close and reopen partition, use lseek to navigate back to desired position
use read() to read 5MB at this position and populate another array with this data (WRITTEN_ARRAY)
compare each element in (RANDOM_ARRAY) with (WRITTEN_ARRAY)
I'm not experienced with writing bash scripts, but this is what I've got so far, although it doesn't seem to work:
random_data="$(cat /dev/urandom | head -c<5MB>)"
printf $random_data | dd of=<partition_path> bs=1 seek=<position_in_partition> count=<5MB>
file_data=$(dd if=<partition_path> bs=1 skip=<position_in_partition> count=<5MB>)
if [ "$random_data" == "$file_data" ]
then
echo "data write successful"
fi
Thanks to the helpful commenters my script now looks something like this:
# get 10MB random data
head -c<10MB> /dev/urandom > random.bin
# write 10MB random data to partition
dd if=random.bin of=<partition_location>
# copy the written data
dd if=<partition_location> count=<10MB/512 bytes> of=newdata.bin
# compare
cmp random.bin newdata.bin
At this point cmp reports that the first char is different. Looking at the verbose output of cmp, it turns out all values in newdata.bin are 0.
Here's a simpler approach which just saves the data in a temporary file.
#!/bin/sh
set -e
random_data=$(mktemp -t ideone.XXXXXXXX) || exit
trap 'rm -rf "$random_data"' EXIT
dd if=/dev/urandom bs=10240 count=512 of="$random_data"
dd if="$random_data" of=<partition_path> bs=1 seek=<position_in_partition> count=<5MB>
if dd if=<partition_path> bs=1 skip=<position_in_partition> count=<5MB> |
cmp "$random_data"
then
echo "data write successful"
fi
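The same idea can be tried end-to-end against an ordinary file standing in for the partition; disk.img, the 4 KiB offset, and the 1 MiB size below are made up for the sketch:

```shell
# Write 1 MiB of random data at an offset, read it back, compare byte-for-byte.
dd if=/dev/urandom of=random.bin bs=1024 count=1024 2>/dev/null
dd if=random.bin of=disk.img bs=1024 seek=4 conv=notrunc 2>/dev/null
dd if=disk.img of=newdata.bin bs=1024 skip=4 count=1024 2>/dev/null
cmp random.bin newdata.bin && echo "data write successful"
```

Because the data stays in files the whole time, NUL bytes are never lost, and cmp compares the full binary contents.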
Bash strings cannot hold arbitrary binary data because the ASCII NUL character is used as a string terminator.
One way to do what you want to do is to put the data in files instead of variables and use cmp to compare the files.
Another option is to store cryptographic hashes of the data in Bash variables. This Shellcheck-clean code demonstrates the idea:
#! /bin/bash -p
partition_path=testpart
position_in_partition=10
sha256_1=$( exec 3>&1
head -c 5MB /dev/urandom \
| tee >(sha256sum >&3) \
| dd of="$partition_path" bs=1 \
seek="$position_in_partition" count=5MB)
sha256_2=$(dd if="$partition_path" bs=1 \
skip="$position_in_partition" count=5MB \
| sha256sum)
[[ $sha256_1 == "$sha256_2" ]] && echo 'data write successful'
You'll need to set the partition_path and position_in_partition variables to values that are appropriate for you.
exec 3>&1 connects file descriptor 3 to the stream that is used to read the value of sha256_1.
tee >(sha256sum >&3) uses the standard tee utility and Bash process substitution to copy the pipeline data as input to a sha256sum process whose output is redirected to file descriptor 3. The effect of this is that the sha256sum output (with the trailing newline removed) becomes the value of the sha256_1 variable.
You can use a stronger cryptographic hash function by replacing sha256sum with (for example) sha512sum.

Compute checksum on file from command line

Looking for a command or set of commands that are readily available on Linux distributions that would allow me to create a script to generate a checksum for a file.
This checksum is generated by a build system I have no control over by summing every single byte in the file and then truncating that number to 4 bytes.
I know how to do this using tools like node.js, perl, python, C/C++, etc., but I need to be able to do this on a bare-bones Linux distribution running remotely that I can't modify (it's on a PLC).
Any ideas? I've been searching for a while and haven't found anything that looks straightforward yet.
Here is a solution for byte-by-byte summation, truncated to 4 bytes, using primitive shell commands:
#! /usr/bin/env bash
declare -a bytes
bytes=( $(xxd -p -c 1 INPUT_FILE) )
total=0
for (( i=0; i<${#bytes[@]}; i++ ))
do
total=$(( total + 0x${bytes[i]} ))
if [ "$total" -gt 4294967295 ]; then
total=$(( total & 4294967295 ))
fi
done
echo "Checksum: $total"
If you just want to do byte by byte summation and truncating that number to 4 bytes then the following command can be used.
xxd -p -c 1 <Input file> | awk '{s+=$1; if(s > 4294967295) s = and(4294967295, s) } END {print s}'
The xxd command is used to extract a hexdump of the input file, and each byte is added to compute the sum. If the sum exceeds 2^32-1 = 4294967295, a bitwise AND is performed to truncate it to 32 bits. (Note that and() is a GNU awk extension, so this one-liner needs gawk rather than a minimal awk.)
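On a system without xxd, od (present in coreutils and busybox) can supply the byte stream instead. A sketch with a tiny made-up file so the arithmetic is easy to check by hand:

```shell
# Three bytes: 0xff + 0x01 + 0x02 = 258, well under the 32-bit limit.
printf '\377\001\002' > demo.bin
sum=0
for byte in $(od -An -v -tx1 demo.bin); do
  sum=$(( (sum + 0x$byte) & 4294967295 ))   # add, then truncate to 4 bytes
done
echo "Checksum: $sum"
```

Masking on every iteration is equivalent to the if-test in the script above, since the AND is a no-op while the sum fits in 32 bits.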
Have you tried cksum? I use it inside a few scripts. It's very simple to use.
http://linux.die.net/man/1/cksum

Remove a specific line from a file WITHOUT using sed or awk

I need to remove a specific line number from a file using a bash script.
I get the line number from the grep command with the -n option.
I cannot use sed for a variety of reasons, least of which is that it is not installed on all the systems this script needs to run on and installing it is not an option.
awk is out of the question because in testing, on different machines with different UNIX/Linux OS's (RHEL, SunOS, Solaris, Ubuntu, etc.), it gives (sometimes wildly) different results on each. So, no awk.
The file in question is just a flat text file, with one record per line, so nothing fancy needs to be done, except for remove the line by number.
If at all possible, I need to avoid doing something like extracting the contents of the file, not including the line I want gone, and then overwriting the original file.
Since you have grep, the obvious thing to do is:
$ grep -v "line to remove" file.txt > /tmp/tmp
$ mv /tmp/tmp file.txt
$
But it sounds like you don't want to use any temporary files - I assume the input file is large and this is an embedded system where memory and storage are in short supply. I think you ideally need a solution that edits the file in place. I think this might be possible with dd but haven't figured it out yet :(
Update - I figured out how to edit the file in place with dd. Also grep, head and cut are needed. If these are not available then they can probably be worked around for the most part:
#!/bin/bash
# get the line number to remove
rline=$(grep -n "$1" "$2" | head -n1 | cut -d: -f1)
# number of bytes before the line to be removed
hbytes=$(head -n$((rline-1)) "$2" | wc -c)
# number of bytes in the (first matching) line to be removed
rbytes=$(grep "$1" "$2" | head -n1 | wc -c)
# original file size
fsize=$(wc -c < "$2")
# dd will start reading the file after the line to be removed
ddskip=$((hbytes + rbytes))
# dd will start writing at the beginning of the line to be removed
ddseek=$hbytes
# dd will move this many bytes
ddcount=$((fsize - hbytes - rbytes))
# the expected new file size
newsize=$((fsize - rbytes))
# move the bytes with dd. strace confirms the file is edited in place
dd bs=1 if="$2" skip=$ddskip seek=$ddseek conv=notrunc count=$ddcount of="$2"
# truncate the remainder bytes of the end of the file
dd bs=1 if="$2" skip=$newsize seek=$newsize count=0 of="$2"
Run it thusly:
$ cat > file.txt
line 1
line two
line 3
$ ./grepremove "tw" file.txt
7+0 records in
7+0 records out
0+0 records in
0+0 records out
$ cat file.txt
line 1
line 3
$
Suffice to say that dd is a very dangerous tool. You can easily unintentionally overwrite files or entire disks. Be very careful!
Try ed. The here-document-based example below deletes line 2 from test.txt
ed -s test.txt <<!
2d
w
!
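Equivalently, the ed commands can be piped in with printf, which is handy when the line number comes from a variable (test.txt and n=2 are made up for the demo):

```shell
printf 'line 1\nline 2\nline 3\n' > test.txt
n=2
printf '%dd\nw\nq\n' "$n" | ed -s test.txt   # delete line n, write, quit
cat test.txt
```

Unlike the dd approach, ed rewrites the file, but it needs no temporary file of your own and is specified by POSIX.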
You can do it without grep using posix shell builtins which should be on any *nix.
while read LINE || [ "$LINE" ];do
case "$LINE" in
*thing_you_are_grepping_for*)continue;;
*)echo "$LINE";;
esac
done <infile >outfile
If n is the line you want to omit:
{
head -n $(( n-1 )) file
tail +$(( n+1 )) file
} > newfile
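For example, removing line n=2 from a three-line file (using the more portable `tail -n +K` spelling; the file names are made up):

```shell
printf 'line 1\nline 2\nline 3\n' > file
n=2
{
  head -n $(( n - 1 )) file    # everything before line n
  tail -n +$(( n + 1 )) file   # everything from line n+1 onward
} > newfile
cat newfile
```

The brace group lets both commands share one output redirection, so newfile is written in a single pass.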
Given dd is deemed too dangerous for this in-place line removal, we need some other method where we have fairly fine-grained control over the file system calls. My initial urge is to write something in C, but while possible, I think that is a bit of overkill. Instead it is worth looking to common scripting (not shell-scripting) languages, as these typically have fairly low-level file APIs which map to the file syscalls in a fairly straightforward manner. I'm guessing this can be done using Python, Perl, Tcl or one of many other scripting languages that might be available. I'm most familiar with Tcl, so here we go:
#!/bin/sh
# \
exec tclsh "$0" "$@"
package require Tclx
set removeline [lindex $argv 0]
set filename [lindex $argv 1]
set infile [open $filename RDONLY]
for {set lineNumber 1} {$lineNumber < $removeline} {incr lineNumber} {
if {[eof $infile]} {
close $infile
puts "EOF at line $lineNumber"
exit
}
gets $infile line
}
set bytecount [tell $infile]
gets $infile rmline
set outfile [open $filename RDWR]
seek $outfile $bytecount start
while {[gets $infile line] >= 0} {
puts $outfile $line
}
ftruncate -fileid $outfile [tell $outfile]
close $infile
close $outfile
Note on my particular box I have Tcl 8.4, so I had to load the Tclx package in order to use the ftruncate command. In Tcl 8.5, there is chan truncate which could be used instead.
You can pass the line number you want to remove and the filename to this script.
In short, the script does this:
open the file for reading
read the first n-1 lines
get the offset of the start of the next line (line n)
read line n
open the file with a new FD for writing
move the file location of the write FD to the offset of the start of line n
continue reading the remaining lines from the read FD and write them to the write FD until the whole read FD is read
truncate the write FD
The file is edited exactly in place. No temporary files are used.
I'm pretty sure this can be re-written in python or perl or ... if necessary.
Update
Ok, so in-place line removal can be done in almost-pure bash, using similar techniques to the Tcl script above. But the big caveat is that you need to have truncate command available. I do have it on my Ubuntu 12.04 VM, but not on my older Redhat-based box. Here is the script:
#!/bin/bash
n=$1
filename=$2
exec 3<> "$filename"
exec 4<> "$filename"
linecount=1
bytecount=0
while IFS="" read -r line <&3 ; do
if [[ $linecount == $n ]]; then
echo "omitting line $linecount: $line"
else
echo "$line" >&4
((bytecount += ${#line} + 1))
fi
((linecount++))
done
exec 3>&-
exec 4>&-
truncate -s "$bytecount" "$filename"
#### or if you can tolerate dd, just to do the truncate:
# dd of="$filename" bs=1 seek=$bytecount count=0
#### or if you have python
# python -c "open(\"$filename\", \"ab\").truncate($bytecount)"
I would love to hear of a more generic (bash-only?) way to do the partial truncate at the end and complete this answer. Of course the truncate can be done with dd as well, but I think that was already ruled out for my earlier answer.
And for the record this site lists how to do an in-place file truncation in many different languages - in case any of these could be used in your environment.
If you can indicate under which circumstances on which platform(s) the most obvious Awk script is failing for you, perhaps we can devise a workaround.
awk "NR!=$N" infile >outfile
Of course, obtaining $N with grep just to feed it to Awk is pretty bass-ackwards. This will delete the line containing the first occurrence of foo:
awk '/foo/ { if (!p++) next } 1' infile >outfile
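A quick check of that one-liner on made-up input: only the first line matching foo is dropped, later matches survive:

```shell
printf 'a\nfoo 1\nb\nfoo 2\n' > infile
awk '/foo/ { if (!p++) next } 1' infile
```

The p counter ensures next fires only once; the trailing 1 is the usual awk idiom for "print the line".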
Based on Digital Trauma's answer, I found an improvement that needs just grep and echo, but no temp file:
echo $(grep -v PATTERN file.txt) > file.txt
Depending on the kind of lines your file contains and whether your pattern requires a more complex syntax or not, you can embrace the grep command with double quotes:
echo "$(grep -v PATTERN file.txt)" > file.txt
(useful when deleting from your crontab)

Bash script that prints out contents of a binary file, one word at a time, without xxd

I'd like to create a BASH script that reads a binary file, word (32-bits) by word and pass that word to an application called devmem.
Right now, I have:
...
for (( i=0; i<${num_words}; i++ ))
do
val=$(dd if=${file_name} skip=${i} count=1 bs=4 2>/dev/null)
echo -e "${val}" # Weird output...
devmem ${some_address} 32 ${val}
done
...
${val} contains some weird characters that are rendered as a diamond with a question mark (the terminal's replacement character for bytes it can't display).
If I replace the "val=" line with:
val=$(dd ... | xxd -r -p)
I get my desired output.
What is the easiest way of replicating the functionality of xxd using BASH?
Note: I'm working on an embedded Linux project where the requirements don't allow me to install xxd.
This script is performance-driven; please correct me if I'm wrong in my approach, but for this reason I chose (dd -> binary word -> devmem) instead of (hexdump -> file, parse file -> devmem). Regardless of the optimal route for my end goal, this exercise has been giving me some trouble and I'd very much appreciate someone helping me figure out how to do this.
Thanks!
You shouldn't use echo or command substitution here, because bash variables and $(...) output can't hold arbitrary binary data (NUL bytes are dropped). Redirecting straight to a file with '>' could work for you instead.
I think devmem is a bash function or alias, so I would try something like this:
for (( i=0; i<${num_words}; i++ ))
do
dd if=${file_name} skip=${i} count=1 bs=4 2>/dev/null 1> binary_file
# echo -e "${val}" # Weird output...
devmem ${some_address} 32 $(cat binary_file)
done
"As cat simply catenates streams of bytes, it can be also used to concatenate binary files, where it will just concatenate sequence of bytes." wiki
Or you can alter devmem to accept file as input...I hope this will help!
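If the goal is to hand devmem a numeric word rather than raw bytes, od can stand in for xxd for the conversion. A sketch, where input.bin is made-up demo data, the devmem call is left commented out, and the printed value is in host byte order:

```shell
file_name=input.bin
printf '\001\002\003\004\252\273\314\335' > "$file_name"   # two demo words
num_words=2
for (( i=0; i<num_words; i++ )); do
  # od -An -tx4 prints one 32-bit word in host byte order (e.g. 04030201
  # on a little-endian machine), which devmem can accept as a hex value.
  val=$(dd if="$file_name" skip=$i count=1 bs=4 2>/dev/null | od -An -tx4 | tr -d ' \n')
  echo "0x$val"
  # devmem ${some_address} 32 "0x$val"   # hypothetical target command
done
```

Because the binary bytes are converted to hex text inside the pipeline, nothing binary ever has to survive a bash variable.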

Strip bash script from beginning of gzip file

I have a series of files which are comprised of a bash script, at the end of which a gzip file has been concatenated.
I would like a method of stripping off the leading bash, to leave a pure gzip file.
The method I have come up with is to:
Do a hex dump on the file;
Use sed to remove everything before the gzip magic number 1f 8b;
Convert the remaining hex dump back to binary.
i.e.
xxd -c1 -p input | tr "\n" " " | sed 's/^.*1f 8b/1f 8b/' | xxd -r -p > output
This appears to work okay on first glance. However, it would fall apart if the gzip portion of the file happens to contain the byte sequence 1f 8b apart from in the initial header. In these cases it deletes everything before the last occurrence.
Is my initial attempt on the right track, and what can I do to fix it? Or is there a much better way to do this that I have missed?
I would use the sed line range functionality to accomplish this. -n suppresses normal printing, and the range /\x1f\x8b/,$ will match every line after and including the first one with \x1f\x8b in it and print them out.
sed -n '/\x1f\x8b/,$ p'
Alternatively, depending on your tastes, you can add a text marker "### BEGIN GZIP DATA ###" and delete everything before and including it:
sed '1,/### BEGIN GZIP DATA ###/ d'
Perl solution. It sets the record separator to the magic sequence and prints all the records except the first one. The magic sequence must be prepended at the beginning, otherwise, it would be lost together with the bash script, which is the first record.
perl -ne 'BEGIN { $/ = "\x1f\x8b"; print $/; } print if $. != 1' input > output.gz
