How to Write Block of Random Data to File using Bash - linux

I need to write a 5MB block of data obtained from /dev/urandom to a partition, at a specific location in the partition. I then need to check that the write executed correctly. I was previously successfully doing this in C++, but now I would like to implement it in a bash script.
My C++ code consisted of:
create 10MB array of data populated from /dev/urandom (RANDOM_ARRAY)
open partition with open()
use lseek() to navigate to desired position in partition
use write() to write the array into the partition
close and reopen partition, use lseek to navigate back to desired position
use read() to read 5MB at this position and populate another array with this data (WRITTEN_ARRAY)
compare each element in (RANDOM_ARRAY) with (WRITTEN_ARRAY)
I'm not experienced with writing bash scripts, but this is what I've got so far, although it doesn't seem to work:
random_data="$(cat /dev/urandom | head -c<5MB>)"
printf $random_data | dd of=<partition_path> bs=1 seek=<position_in_partition> count=<5MB>
file_data=$(dd if=<partition_path> bs=1 skip=<position_in_partition> count=<5MB>)
if [ "$random_data" == "$file_data" ]
then
echo "data write successful"
fi
Thanks to the helpful commenters my script now looks something like this:
# get 10MB random data
head -c<10MB> /dev/urandom > random.bin
# write 10MB random data to partition
dd if=random.bin of=<partition_location>
# copy the written data
dd if=<partition_location> count=<10MB/512 bytes> of=newdata.bin
# compare
cmp random.bin newdata.bin
At this point cmp reports that the first char is different. Looking at the verbose output of cmp, it turns out all values in newdata.bin are 0.

Here's a simpler approach which just saves the data in a temporary file.
#!/bin/sh
set -e
random_data=$(mktemp -t ideone.XXXXXXXX) || exit
trap 'rm -rf "$random_data"' EXIT
dd if=/dev/urandom bs=10240 count=512 of="$random_data"
dd if="$random_data" of=<partition_path> bs=1 seek=<position_in_partition> count=<5MB>
if dd if=<partition_path> bs=1 skip=<position_in_partition> count=<5MB> |
cmp "$random_data"
then
echo "data write successful"
fi
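Note that bs=1 issues one read/write syscall per byte, which is slow for 5MB of data. If GNU dd is available, a larger block size can be used while still addressing byte offsets; a hedged variation (the *_bytes flags are GNU coreutils extensions, not portable dd):
# assumes GNU coreutils dd (>= 8.16) for skip_bytes/seek_bytes/count_bytes
dd if="$random_data" of=<partition_path> bs=64K seek=<position_in_partition> oflag=seek_bytes conv=notrunc
dd if=<partition_path> bs=64K skip=<position_in_partition> count=<5MB> iflag=skip_bytes,count_bytes | cmp "$random_data" -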

Bash strings cannot hold arbitrary binary data because the ASCII NUL character is used as a string terminator.
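You can see the NUL bytes being dropped with a quick test (assuming od is available for the dump):
printf 'a\0b' | od -c                               # a \0 b -- the pipe carries the NUL just fine
var=$(printf 'a\0b'); printf '%s' "$var" | od -c    # a  b   -- the NUL is dropped (recent bash also warns about the ignored null byte)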
One way to do what you want to do is to put the data in files instead of variables and use cmp to compare the files.
Another option is to store cryptographic hashes of the data in Bash variables. This Shellcheck-clean code demonstrates the idea:
#! /bin/bash -p
partition_path=testpart
position_in_partition=10
sha256_1=$( exec 3>&1
            head -c 5MB /dev/urandom \
                | tee >(sha256sum >&3) \
                | dd of="$partition_path" bs=1 \
                     seek="$position_in_partition" count=5MB)
sha256_2=$(dd if="$partition_path" bs=1 \
              skip="$position_in_partition" count=5MB \
               | sha256sum)
[[ $sha256_1 == "$sha256_2" ]] && echo 'data write successful'
You'll need to set the partition_path and position_in_partition variables to values that are appropriate for you.
exec 3>&1 connects file descriptor 3 to the stream that is used to read the value of sha256_1.
tee >(sha256sum >&3) uses the standard tee utility and Bash process substitution to copy the pipeline data as input to a sha256sum process whose output is redirected to file descriptor 3. The effect of this is that the sha256sum output (with the trailing newline removed) becomes the value of the sha256_1 variable.
You can use a stronger cryptographic hash function by replacing sha256sum with (for example) sha512sum.

Related

NULL (\0) added at the end of file

I'm trying to clean a binary file to delete all the NULLs in it. The task is quite simple, but I found out a lot of files have a NULL at the end of the file and I don't know why. I'm dumping the hexadecimal value of each byte and I don't see the NULL anywhere, but if I do a hexdump of the file, I see a 00 value at the end and I don't know why.... Could be that it's an EOF marker, but it's weird because it doesn't appear in all files. This is the script I have, quite a simple one: it generates 100 random binary files and then reads them file by file, char by char. Following the premise that bash won't store NULLs in variables, rewriting each char after storing it in a variable should avoid the NULLs, but no....
#!/bin/bash
for i in $(seq 0 100)
do
echo "$i %"
time dd if=/dev/urandom of=$i bs=1 count=1000
while read -r -n 1 c;
do
echo -n "$c" >> temp
done < $i
mv temp $i
done
I also tried with:
tr -d '\000' <inFile > outfile
But same result.
This is what the hexdump of one of the files with this problem looks like:
00003c0 0b12 a42b cb50 2a90 1fd6 a4f9 89b4 ddb6
00003d0 3fa3 eb7e 00c4
c4 should be the last byte, but as you can see, there's a 00 there....
Any clue?
EDIT:
Forgot to mention that the machine where I'm running this is something similar to a Raspberry Pi, and the tools provided with it are quite limited.
Try these other commands:
od -tx1 inFile
xxd inFile
hexdump's default output groups the bytes into 16-bit words, so when the file size is an odd number of bytes it pads the displayed final word with 00; that extra 00 is only in the display, not in the file.
It seems hexdump without options is like -x, hexdump -h gives the list of options; hexdump -C may also help.
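A quick way to see the difference (the file name is just an example):
printf 'abc' > odd.bin     # 3 bytes: 61 62 63 (odd length)
hexdump odd.bin            # 0000000 6261 0063 -- the 00 is display padding for the final 16-bit word
od -tx1 odd.bin            # 0000000 61 62 63  -- one byte per column, no phantom 00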

Compute checksum on file from command line

Looking for a command or set of commands that are readily available on Linux distributions that would allow me to create a script to generate a checksum for a file.
This checksum is generated by a build system I have no control over by summing every single byte in the file and then truncating that number to 4 bytes.
I know how to do this using tools like node.js, perl, python, C/C++, etc, but I need to be able to do this on a bare bones Linux distribution running remotely that I can't modify (it's on a PLC).
Any ideas? I've been searching for awhile and haven't found anything that looks straightforward yet.
Here is a solution for byte-by-byte summation, truncating the result to 4 bytes, using only basic shell commands.
#! /usr/bin/env bash
declare -a bytes
bytes=(`xxd -p -c 1 INPUT_FILE | tr '\n' ' '`)
total=0;
for (( i=0; i<${#bytes[@]}; i++ ));
do
total=$(($total + 0x${bytes[i]}))
if [ "$total" -gt 4294967295 ]; then
total=$(($total & 4294967295))
fi
done
echo "Checksum: " $total
If you just want to do byte-by-byte summation and truncate the result to 4 bytes, the following command can be used (note that strtonum and and() are GNU awk extensions):
xxd -p -c 1 <Input file> | awk '{s += strtonum("0x" $1); if (s > 4294967295) s = and(s, 4294967295)} END {print s}'
The xxd command extracts a hex dump of the input file, one byte per line, and each byte is added to compute the sum. If the sum exceeds 2^32-1 = 4294967295, a bitwise AND is performed to truncate it back to 32 bits.
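As a quick sanity check (the file name is illustrative), the three bytes of "abc" are 0x61 + 0x62 + 0x63 = 294, which the gawk one-liner should report:
printf 'abc' > sum-test.bin
xxd -p -c 1 sum-test.bin | awk '{s += strtonum("0x" $1); if (s > 4294967295) s = and(s, 4294967295)} END {print s}'
# prints 294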
Have you tried cksum? I use it inside a few scripts. It's very simple to use.
http://linux.die.net/man/1/cksum
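Keep in mind that cksum computes a CRC rather than a plain byte sum, so its output will not match the build system's checksum described in the question; it only helps if both sides agree on the algorithm. Usage is simply:
cksum INPUT_FILE    # prints CRC, byte count, and file name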

Remove a specific line from a file WITHOUT using sed or awk

I need to remove a specific line number from a file using a bash script.
I get the line number from the grep command with the -n option.
I cannot use sed for a variety of reasons, least of which is that it is not installed on all the systems this script needs to run on and installing it is not an option.
awk is out of the question because in testing, on different machines with different UNIX/Linux OS's (RHEL, SunOS, Solaris, Ubuntu, etc.), it gives (sometimes wildly) different results on each. So, no awk.
The file in question is just a flat text file, with one record per line, so nothing fancy needs to be done, except for remove the line by number.
If at all possible, I need to avoid doing something like extracting the contents of the file, not including the line I want gone, and then overwriting the original file.
Since you have grep, the obvious thing to do is:
$ grep -v "line to remove" file.txt > /tmp/tmp
$ mv /tmp/tmp file.txt
$
But it sounds like you don't want to use any temporary files - I assume the input file is large and this is an embedded system where memory and storage are in short supply. I think you ideally need a solution that edits the file in place. I think this might be possible with dd but haven't figured it out yet :(
Update - I figured out how to edit the file in place with dd. Also grep, head and cut are needed. If these are not available then they can probably be worked around for the most part:
#!/bin/bash
# get the line number to remove
rline=$(grep -n "$1" "$2" | head -n1 | cut -d: -f1)
# number of bytes before the line to be removed
hbytes=$(head -n$((rline-1)) "$2" | wc -c)
# number of bytes to remove
rbytes=$(grep "$1" "$2" | wc -c)
# original file size
fsize=$(cat "$2" | wc -c)
# dd will start reading the file after the line to be removed
ddskip=$((hbytes + rbytes))
# dd will start writing at the beginning of the line to be removed
ddseek=$hbytes
# dd will move this many bytes
ddcount=$((fsize - hbytes - rbytes))
# the expected new file size
newsize=$((fsize - rbytes))
# move the bytes with dd. strace confirms the file is edited in place
dd bs=1 if="$2" skip=$ddskip seek=$ddseek conv=notrunc count=$ddcount of="$2"
# truncate the remainder bytes of the end of the file
dd bs=1 if="$2" skip=$newsize seek=$newsize count=0 of="$2"
Run it thusly:
$ cat > file.txt
line 1
line two
line 3
$ ./grepremove "tw" file.txt
7+0 records in
7+0 records out
0+0 records in
0+0 records out
$ cat file.txt
line 1
line 3
$
Suffice to say that dd is a very dangerous tool. You can easily unintentionally overwrite files or entire disks. Be very careful!
Try ed. The here-document-based example below deletes line 2 from test.txt
ed -s test.txt <<!
2d
w
!
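The same approach works with the line number in a shell variable (here assuming $n holds the number obtained from grep -n):
printf '%s\n' "${n}d" w q | ed -s test.txt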
You can do it without grep using posix shell builtins which should be on any *nix.
while read LINE || [ "$LINE" ];do
case "$LINE" in
*thing_you_are_grepping_for*)continue;;
*)echo "$LINE";;
esac
done <infile >outfile
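Since the question deletes by line number rather than by pattern, here is a variant of the same builtin-only loop (assuming $n holds the line number from grep -n):
i=0
while IFS= read -r LINE || [ "$LINE" ]; do
    i=$((i+1))
    [ "$i" -eq "$n" ] && continue    # skip the unwanted line
    printf '%s\n' "$LINE"
done <infile >outfile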
If n is the line you want to omit:
{
head -n $(( n-1 )) file
tail -n +$(( n+1 )) file
} > newfile
Given dd is deemed too dangerous for this in-place line removal, we need some other method where we have fairly fine-grained control over the file system calls. My initial urge is to write something in C, but while possible, I think that is a bit of overkill. Instead it is worth looking to common scripting (not shell-scripting) languages, as these typically have fairly low-level file APIs which map to the file syscalls in a fairly straightforward manner. I'm guessing this can be done using Python, Perl, Tcl or one of many other scripting languages that might be available. I'm most familiar with Tcl, so here we go:
#!/bin/sh
# \
exec tclsh "$0" "$@"
package require Tclx
set removeline [lindex $argv 0]
set filename [lindex $argv 1]
set infile [open $filename RDONLY]
for {set lineNumber 1} {$lineNumber < $removeline} {incr lineNumber} {
if {[eof $infile]} {
close $infile
puts "EOF at line $lineNumber"
exit
}
gets $infile line
}
set bytecount [tell $infile]
gets $infile rmline
set outfile [open $filename RDWR]
seek $outfile $bytecount start
while {[gets $infile line] >= 0} {
puts $outfile $line
}
ftruncate -fileid $outfile [tell $outfile]
close $infile
close $outfile
Note on my particular box I have Tcl 8.4, so I had to load the Tclx package in order to use the ftruncate command. In Tcl 8.5, there is chan truncate which could be used instead.
You can pass the line number you want to remove and the filename to this script.
In short, the script does this:
open the file for reading
read the first n-1 lines
get the offset of the start of the next line (line n)
read line n
open the file with a new FD for writing
move the file location of the write FD to the offset of the start of line n
continue reading the remaining lines from the read FD and write them to the write FD until the whole read FD is read
truncate the write FD
The file is edited exactly in place. No temporary files are used.
I'm pretty sure this can be re-written in python or perl or ... if necessary.
Update
Ok, so in-place line removal can be done in almost-pure bash, using similar techniques to the Tcl script above. But the big caveat is that you need to have truncate command available. I do have it on my Ubuntu 12.04 VM, but not on my older Redhat-based box. Here is the script:
#!/bin/bash
n=$1
filename=$2
exec 3<> $filename
exec 4<> $filename
linecount=1
bytecount=0
while IFS="" read -r line <&3 ; do
if [[ $linecount == $n ]]; then
echo "omitting line $linecount: $line"
else
echo "$line" >&4
((bytecount += ${#line} + 1))
fi
((linecount++))
done
exec 3>&-
exec 4>&-
truncate -s $bytecount $filename
#### or if you can tolerate dd, just to do the truncate:
# dd of="$filename" bs=1 seek=$bytecount count=0
#### or if you have python
# python -c "open(\"$filename\", \"ab\").truncate($bytecount)"
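Run it like this to drop line 2 (assuming the script is saved as removeline.sh; the name is illustrative):
$ ./removeline.sh 2 file.txt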
I would love to hear of a more generic (bash-only?) way to do the partial truncate at the end and complete this answer. Of course the truncate can be done with dd as well, but I think that was already ruled out for my earlier answer.
And for the record this site lists how to do an in-place file truncation in many different languages - in case any of these could be used in your environment.
If you can indicate under which circumstances on which platform(s) the most obvious Awk script is failing for you, perhaps we can devise a workaround.
awk "NR!=$N" infile >outfile
Of course, obtaining $N with grep just to feed it to Awk is pretty bass-ackwards. This will delete the line containing the first occurrence of foo:
awk '/foo/ { if (!p++) next } 1' infile >outfile
Based on Digital Trauma's answer, I found an improvement that just needs grep and echo, but no temp file:
echo $(grep -v PATTERN file.txt) > file.txt
Depending on the kind of lines your file contains and whether your pattern requires a more complex syntax or not, you can wrap the command substitution in double quotes:
echo "$(grep -v PATTERN file.txt)" > file.txt
(useful when deleting from your crontab)

Bash while read loop extremely slow compared to cat, why?

A simple test script here:
while read LINE; do
LINECOUNT=$(($LINECOUNT+1))
if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done
When I do cat my450klinefile.txt | myscript the CPU locks up at 100% and it can process about 1000 lines a second. About 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.
Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But the speed of even this example is impossibly slow.
Every 1GB of input lines I need to do a few more complex scripting actions (close and open some pipes that the data is being fed to).
The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:
while read size; do test "$size" -gt 0 || break; dd bs="$size" count=1 of=file$(( i++ )); done
in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
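To see this for yourself (assuming strace is available), trace the read() calls while the loop consumes a pipe; on a pipe the shell cannot seek backwards, so it really does read one byte per syscall:
cat /etc/passwd | strace -f -e trace=read bash -c 'while read line; do :; done' 2>&1 | head
# the output is full of lines like: read(0, "r", 1) = 1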
It's because the bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools such as:
awk 'NR%1000==0{print}' inputFile
which matches your "print every 1000 lines" sample.
If you wanted to (for each line) output the line count in characters followed by the line itself, and pipe it through another process, you could also do that:
awk '{print length($0)" "$0}' inputFile | someOtherProcess
Tools like awk, sed, grep, cut and the more powerful perl are far more suited to these tasks than an interpreted shell script.
A Perl solution for counting the bytes of each line:
perl -p -e '
use Encode;
print length(Encode::encode_utf8($_))."\n";$_=""'
for example:
dd if=/dev/urandom bs=1M count=100 |
perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' |
tail
runs for me at 7.7MB/s.
To compare how much the script costs, plain dd with no script:
dd if=/dev/urandom bs=1M count=100 >/dev/null
runs at 9.1MB/s,
so the script doesn't seem that slow :)
Not really sure what your script is supposed to do. So this might not be an answer to your question but more of a generic tip.
Don't cat your file and pipe it to your script; instead, when reading from a file in a bash script, do it like this:
while IFS= read -r line
do
echo "$line"
done < file.txt

Pull fields/attributes from lsof (Linux command line)

With the recent move to Flash 10 (or maybe it was a distro choice), I and many others are no longer able to copy Flash videos from /tmp. I have, however, found a workaround in the following:
First, execute:
lsof | grep Flash
which should return output like this:
plugin-co 8935 richard 16w REG 8,1 4139180 8220 /tmp/FlashXXq4KyOZ (deleted)
Note: You can see the problem here.... the /tmp file has already been deleted (unlinked), so there is no path left to copy from, even though the process still holds it open.
You are, however, able to grab the file by using the cp command thusly:
cp /proc/#/fd/# video.flv
where the 1st # is the process ID (8935) and the second is the file descriptor number (16, from 16w).
Currently, this works, but it requires a few manual steps. To automate this, I figure I could pull the PID and the fd number and insert them dynamically into the cp command.
My question is how do I pull the appropriate fields into variables? I know you can use $1, etc. for grabbing input arguments, but how do you retrieve outputs?
Note: I could use pidof plugin-container to find the PID, but I still need the other number (since it tells which specific flash video to save).
The following command will return PIDs and FDs for all the files in /tmp that have filenames that begin with "Flash"
lsof -F pfn /tmp/Flash*
and the output will look something like this:
p16471
f16
n/tmp/FlashXXq4KyOZ
f17
n/tmp/FlashXXq4KyOZ
p26588
f16
n/tmp/FlashYYh3JwIW
f17
Where the field identifiers are p: PID, f: FD, n: NAME. The -F option is designed to make the output of lsof easy to parse.
Iterating over these and removing the field identifiers is trivial.
#!/bin/bash
c=-1
while read -r line
do
case $line in
f*)
fds[pids[c]]+=${line:1}" "
;;
n*)
names[pids[c]]+=${line:1}" "
;;
p*)
pids[++c]=${line:1}
;;
esac
done < <(lsof -F pfn -- /tmp/Flash*)
for ((i=0; i<=c; i++))
do
for name in ${names[pids[i]]}
do
for fd in ${fds[pids[i]]}
do
echo "File: $name, Process ID: ${pids[i]}, File Descriptor: $fd"
done
done
done
Lines like this:
fds[pids[c]]+=${line:1}" "
accumulate file descriptors in a string stored in an array indexed by the PID. Doing this for file names will fail for filenames which contain spaces. That could be worked around if necessary.
The line is stripped of the leading field identifier character by using a substring operator: ${line:1} starts at position one and includes the rest of the string, so it drops character zero.
The second loop is just a demo to show iterating over the arrays.
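To turn that demo into the copy from the original question, you could replace the echo line in the loop with something like this (the output file name is just illustrative):
cp "/proc/${pids[i]}/fd/$fd" "./flash-${pids[i]}-${fd}.flv"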
var=$(lsof | awk '/Flash/{gsub(/[^0-9]/,"",$4); print $2 FS $4; exit}')
set -- $var
pid=$1
number=$2
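With those two variables populated, the copy from the question becomes (the output name is illustrative):
cp "/proc/$pid/fd/$number" video.flv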
Completed Script:
#!/bin/sh
if [ -n "$1" ]; then
#lsof | grep Flash | awk '{print $2}' also works for PID
pid=$(pidof plugin-container)
file_num=$(lsof -p "$pid" | grep /tmp/Flash | awk '{gsub(/[^0-9]/,"",$4); print $4}')
cp "/proc/$pid/fd/$file_num" ~/Downloads/"$1".flv
else
echo "Please enter video name as argument."
fi
Avoid using lsof because it takes too long (>30 seconds) to return the path. The below .bashrc line will work with vlc, mplayer, or whatever you put in and return the path to the deleted temp file in milliseconds.
flashplay () {
vlc $(stat -c %N /proc/*/fd/* 2>&1|awk -F[\`\'] '/lash/{print$2}')
}
