understanding and decoding the file mode value from stat function output - linux

I have been trying to understand what exactly is happening in the below mentioned code. But i am not able to understand it.
$mode = (stat($filename))[2];
printf "Permissions are %04o\n", $mode & 07777;
Lets say my $mode value is 33188
$mode & 07777 yields a value = 420
is the $mode value a decimal number ?
why we are choosing 07777 and why we are doing a bitwise and operation. I am not able to underand the logic in here.

The mode from your question corresponds to a regular file with 644 permissions (read-write for the owner and read-only for everyone else), but don’t take my word for it.
$ touch foo
$ chmod 644 foo
$ perl -le 'print +(stat "foo")[2]'
33188
The value of $mode can be viewed as a decimal integer, but doing so is not particularly enlightening. Seeing the octal representation gives something a bit more familiar.
$ perl -e 'printf "%o\n", (stat "foo")[2]'
100644
Bitwise AND with 07777 gives the last twelve bits of a number’s binary representation. With a Unix mode, this operation gives the permission or mode bits and discards any type information.
$ perl -e 'printf "%d\n", (stat "foo")[2] & 07777' # decimal, not useful
420
$ perl -e 'printf "%o\n", (stat "foo")[2] & 07777' # octal, eureka!
644
A nicer way to do this is below. Read on for all the details.
Mode Bits
The third element returned from stat (which corresponds to st_mode in struct stat) is a bit field where the different bit positions are binary flags.
For example, one bit in st_mode POSIX names S_IWUSR. A file or directory whose mode has this bit set is writable by its owner. A related bit is S_IROTH that when set means other users (i.e., neither the owner nor in the group) may read that particular file or directory.
The perlfunc documentation for stat gives the names of commonly available mode bits. We can examine their values.
#! /usr/bin/env perl
use strict;
use warnings;
use Fcntl ':mode';
my $perldoc_f_stat = q(
# Permissions: read, write, execute, for user, group, others.
S_IRWXU S_IRUSR S_IWUSR S_IXUSR
S_IRWXG S_IRGRP S_IWGRP S_IXGRP
S_IRWXO S_IROTH S_IWOTH S_IXOTH
# Setuid/Setgid/Stickiness/SaveText.
# Note that the exact meaning of these is system dependent.
S_ISUID S_ISGID S_ISVTX S_ISTXT
# File types. Not necessarily all are available on your system.
S_IFREG S_IFDIR S_IFLNK S_IFBLK S_IFCHR S_IFIFO S_IFSOCK S_IFWHT S_ENFMT
);
my %mask;
foreach my $sym ($perldoc_f_stat =~ /\b(S_I\w+)\b/g) {
my $val = eval { no strict 'refs'; &$sym() };
if (defined $val) {
$mask{$sym} = $val;
}
else {
printf "%-10s - undefined\n", $sym;
}
}
my #descending = sort { $mask{$b} <=> $mask{$a} } keys %mask;
printf "%-10s - %9o\n", $_, $mask{$_} for #descending;
On Red Hat Enterprise Linux and other operating systems in the System V family, the output of the above program will be
S_ISTXT - undefined
S_IFWHT - undefined
S_IFSOCK - 140000
S_IFLNK - 120000
S_IFREG - 100000
S_IFBLK - 60000
S_IFDIR - 40000
S_IFCHR - 20000
S_IFIFO - 10000
S_ISUID - 4000
S_ISGID - 2000
S_ISVTX - 1000
S_IRWXU - 700
S_IRUSR - 400
S_IWUSR - 200
S_IXUSR - 100
S_IRWXG - 70
S_IRGRP - 40
S_IWGRP - 20
S_IXGRP - 10
S_IRWXO - 7
S_IROTH - 4
S_IWOTH - 2
S_IXOTH - 1
Bit twiddling
The numbers above are octal (base 8), so any given digit must be 0-7 and has place value 8n, where n is the zero-based number of places to the left of the radix point. To see how they map to bits, octal has the convenient property that each digit corresponds to three bits. Four, two, and 1 are all exact powers of two, so in binary, they are 100, 10, and 1 respectively. Seven (= 4 + 2 + 1) in binary is 111, so then 708 is 1110002. The latter example shows how converting back and forth is straightforward.
With a bit field, you don’t care exactly what the value of a bit in that position is but whether it is zero or non-zero, so
if ($mode & $mask) {
tests whether any bit in $mode corresponding to $mask is set. For a simple example, given the 4-bit integer 1011 and a mask 0100, their bitwise AND is
1011
& 0100
------
0000
So the bit in that position is clear—as opposed to a mask of, say, 0010 or 1100.
Clearing the most significant bit of 1011 looks like
1011 1011
& ~(1000) = & 0111
------
0011
Recall that ~ in Perl is bitwise complement.
For completeness, set a bit with bitwise OR as in
$bits |= $mask;
Octal and file permissions
An octal digit’s direct mapping to three bits is convenient for Unix permissions because they come in groups of three. For example, the permissions for the program that produced the output above are
-rwxr-xr-x 1 gbacon users 1096 Feb 24 20:34 modebits
That is, the owner may read, write, and execute; but everyone else may read and execute. In octal, this is 755—a compact shorthand. In terms of the table above, the set bits in the mode are
S_IRUSR
S_IWUSR
S_IXUSR
S_IRGRP
S_IXGRP
S_IROTH
S_IXOTH
We can decompose the mode from your question by adding a few lines to the program above.
my $mode = 33188;
print "\nBits set in mode $mode:\n";
foreach my $sym (#descending) {
if (($mode & $mask{$sym}) == $mask{$sym}) {
print " - $sym\n";
$mode &= ~$mask{$sym};
}
}
printf "extra bits: %o\n", $mode if $mode;
The mode test has to be more careful because some of the masks are shorthand for multiple bits. Testing that we get the exact mask back avoids false positives when some of the bits are set but not all.
The loop also clears the bits from all detected hits so at the end we can check that we have accounted for each bit. The output is
Bits set in mode 33188:
- S_IFREG
- S_IRUSR
- S_IWUSR
- S_IRGRP
- S_IROTH
No extra warning, so we got everything.
That magic 07777
Converting 77778 to binary gives 0b111_111_111_111. Recall that 78 is 1112, and four 7s correspond to 4×3 ones. This mask is useful for selecting the set bits in the last twelve. Looking back at the bit masks we generated earlier
S_ISUID - 4000
S_ISGID - 2000
S_ISVTX - 1000
S_IRWXU - 700
S_IRWXG - 70
S_IRWXO - 7
we see that the last 9 bits are the permissions for user, group, and other. The three bits preceding those are the setuid, setgroupid, and what is sometimes called the sticky bit. For example, the full mode of sendmail on my system is -rwxr-sr-x or 3428510. The bitwise AND works out to be
(dec) (oct) (bin)
34285 102755 1000010111101101
& 4095 = & 7777 = & 111111111111
------- -------- ------------------
1517 = 2755 = 10111101101
The high bit in the mode that gets discarded is S_IFREG, the indicator that it is a regular file. Notice how much clearer the mode expressed in octal is when compared with the same information in decimal or binary.
The stat documentation mentions a helpful function.
… and the S_IF* functions are
S_IMODE($mode)
the part of $mode containing the permission bits and the setuid/setgid/sticky bits
In ext/Fcntl/Fcntl.xs, we find its implementation and a familiar constant on the last line.
void
S_IMODE(...)
PREINIT:
dXSTARG;
SV *mode;
PPCODE:
if (items > 0)
mode = ST(0);
else {
mode = &PL_sv_undef;
EXTEND(SP, 1);
}
PUSHu(SvUV(mode) & 07777);
To avoid the bad practice of magic numbers in source code, write
my $permissions = S_IMODE $mode;
Using S_IMODE and other functions available from the Fcntl module also hides the low-level bit twiddling and focuses on the domain-level information the program wants. The documentation continues
S_IFMT($mode)
the part of $mode containing the file type which can be bit-anded with (for example) S_IFREG or with the following functions
# The operators -f, -d, -l, -b, -c, -p, and -S.
S_ISREG($mode) S_ISDIR($mode) S_ISLNK($mode)
S_ISBLK($mode) S_ISCHR($mode) S_ISFIFO($mode) S_ISSOCK($mode)
# No direct -X operator counterpart, but for the first one
# the -g operator is often equivalent. The ENFMT stands for
# record flocking enforcement, a platform-dependent feature.
S_ISENFMT($mode) S_ISWHT($mode)
Using these constants and functions will make your programs clearer by more directly expressing your intent.

It is explained in perldoc -f stat, which is where I assume you found this example:
Because the mode contains both the file type and its
permissions, you should mask off the file type portion and
(s)printf using a "%o" if you want to see the real permissions.
The output of printf "%04o", 420 is 0644 which is the permissions on your file. 420 is just the decimal representation of the octal number 0644.
If you try and print the numbers in binary form, it is easier to see:
perl -lwe 'printf "%016b\n", 33188'
1000000110100100
perl -lwe 'printf "%016b\n", 33188 & 07777'
0000000110100100
As you'll notice, bitwise and removes the leftmost bit in the number above, which presumably represents file type, leaving you with only the file permissions. This number 07777 is the binary number:
perl -lwe 'printf "%016b\n", 07777'
0000111111111111
Which acts as a "mask" in the bitwise and. Since 1 & 1 = 1, and 0 & 1 = 0, it means that any bit that is not matched by a 1 in 07777 is set to 0.

Related

Setting value of command prompt ( PS1) based on present directory string length

I know I can do this to reflect just last 2 directories in the PS1 value.
PS1=${PWD#"${PWD%/*/*}/"}#
but lets say we have a directory name that's really messy and will reduce my working space , like
T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027#
OR
2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
those are the last 2 directories before the prompt
I want to set a condition if directory length of either of the last 2 directories is more than a threshold e.g. 10 chars , shorten the name with 1st 3 and last 3 chars of the directory (s) whose length exceeds 10
e.g.
2021-07-23--07-48-49_xperia-build-20191119010027 &
2021-07-23--07-48-49_nokia-build-20191119010027
both gt 10 will be shortened to 202*027 & PS1 will be respectively
T-Mob/202*027/# for T-Mob/2021-07-23--07-48-49_xperia-build-20191119010027# and
202*027/T-Mob# for 2021-07-23--07-48-49_nokia-build-20191119010027/T-Mob#
A quick 1 Liner to get this done ?
I cant post this in comments so Updating here. Ref to Joaquins Answer ( thx J)
PS1=''`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length() <=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`''
see below o/p's
/root/my-applications/bin # it shortened as expected
my-*ons/bin/#cd - # going back to prev.
/root
my-*ons/bin/# #value of prompt is the same but I am in /root
A one-liner is basically always the wrong choice. Write code to be robust, readable and maintainable (and, for something that's called frequently or in a tight loop, to be efficient) -- not to be terse.
Assuming availability of bash 4.3 or newer:
# Given a string, a separator, and a max length, shorten any segments that are
# longer than the max length.
shortenLongSegments() {
local -n destVar=$1; shift # arg1: where do we write our result?
local maxLength=$1; shift # arg2: what's the maximum length?
local IFS=$1; shift # arg3: what character do we split into segments on?
read -r -a allSegments <<<"$1"; shift # arg4: break into an array
for segmentIdx in "${!allSegments[#]}"; do # iterate over array indices
segment=${allSegments[$segmentIdx]} # look up value for index
if (( ${#segment} > maxLength )); then # value over maxLength chars?
segment="${segment:0:3}*${segment:${#segment}-3:3}" # build a short version
allSegments[$segmentIdx]=$segment # store shortened version in array
fi
done
printf -v destVar '%s\n' "${allSegments[*]}" # build result string from array
}
# function to call from PROMPT_COMMAND to actually build a new PS1
buildNewPs1() {
# declare our locals to avoid polluting global namespace
local shorterPath
# but to cache where we last ran, we need a global; be explicit.
declare -g buildNewPs1_lastDir
# do nothing if the directory hasn't changed
[[ $PWD = "$buildNewPs1_lastDir" ]] && return 0
shortenLongSegments shorterPath 10 / "$PWD"
PS1="${shorterPath}\$"
# update the global tracking where we last ran this code
buildNewPs1_lastDir=$PWD
}
PROMPT_COMMAND=buildNewPs1 # call buildNewPs1 before rendering the prompt
Note that printf -v destVar %s "valueToStore" is used to write to variables in-place, to avoid the performance overhead of var=$(someFunction). Similarly, we're using the bash 4.3 feature namevars -- accessible with local -n or declare -n -- to allow destination variable names to be parameterized without the security risk of eval.
If you really want to make this logic only apply to the last two directory names (though I don't see why that would be better than applying it to all of them), you can do that easily enough:
buildNewPs1() {
local pathPrefix pathFinalSegments
pathPrefix=${PWD%/*/*} # everything but the last 2 segments
pathSuffix=${PWD#"$pathPrefix"} # only the last 2 segments
# shorten the last 2 segments, store in a separate variable
shortenLongSegments pathSuffixShortened 10 / "$pathSuffix"
# combine the unshortened prefix with the shortened suffix
PS1="${pathPrefix}${pathSuffixShortened}\$"
}
...adding the performance optimization that only rebuilds PS1 when the directory changed to this version is left as an exercise to the reader.
Probably not the best solution, but a quick solution using awk:
PS1=`echo ${PWD#"${PWD%/*/*}/"} | awk -v RS='/' 'length()<=10{printf $0"/"}; length()>10{printf "%s*%s/", substr($0,1,3), substr($0,length()-2,3)};'| tr -d "\n"; echo "#"`
I got this results with your examples:
T-Mob/202*027/#
202*027/T-Mob/#

Why does glob lstat matching entries?

Looking into behavior in this question, I was surprised to see that perl lstat()s every path matching a glob pattern:
$ mkdir dir
$ touch dir/{foo,bar,baz}.txt
$ strace -e trace=lstat perl -E 'say $^V; <dir/b*>'
v5.10.1
lstat("dir/baz.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lstat("dir/bar.txt", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
I see the same behavior on my Linux system with glob(pattern) and <pattern>, and with later versions of perl.
My expectation was that the globbing would simply opendir/readdir under the hood, and that it would not need to inspect the actual pathnames it was searching.
What is the purpose of this lstat? Does it affect the glob()s return?
This strange behavior has been noticed before on PerlMonks. It turns out that glob calls lstat to support its GLOB_MARK flag, which has the effect that:
Each pathname that is a directory that matches the pattern has a slash appended.
To find out whether a directory entry refers to a subdir, you need to stat it. This is apparently done even when the flag is not given.
I was wondering the same thing - "What is the purpose of this lstat? Does it affect the glob()s return?"
Within bsd_glob.c glob2() I noticed a g_stat call within an if branch that required the GLOB_MARK flag to be set, I also noticed a call to g_lstat just before that was not guarded by a flag check. Both are within an if branch for when the end of pattern is reached.
If I remove these 2 lines in the glob2 function in perl-5.12.4/ext/File-Glob/bsd_glob.c
- if (g_lstat(pathbuf, &sb, pglob))
- return(0);
the only perl test (make test) that fails is test 5 in ext/File-Glob/t/basic.t with:
not ok 5
# Failed test at ../ext/File-Glob/t/basic.t line 92.
# Structures begin differing at:
# $got->[0] = 'asdfasdf'
# $expected->[0] = Does not exist
Test 5 in t/basic.t is
# check nonexistent checks
# should return an empty list
# XXX since errfunc is NULL on win32, this test is not valid there
#a = bsd_glob("asdfasdf", 0);
SKIP: {
skip $^O, 1 if $^O eq 'MSWin32' || $^O eq 'NetWare';
is_deeply(\#a, []);
}
If I replace the 2 lines removed with:
+ if (!((pglob->gl_flags & GLOB_NOCHECK) ||
+ ((pglob->gl_flags & GLOB_NOMAGIC) &&
+ !(pglob->gl_flags & GLOB_MAGCHAR)))){
+ if (g_lstat(pathbuf, &sb, pglob))
+ return(0);
+ }
I don't see any failures from "make test" for perl-5.12.4 on linux x86_64 (RHEL6.3 2.6.32-358.11.1.el6.x86_64) and when using:
strace -fe trace=lstat perl -e 'use File::Glob q{:glob};
print scalar bsd_glob(q{/var/log/*},GLOB_NOCHECK)'
I no longer see the lstat calls for each file in the dir.
I don't mean to suggest that the perl tests for glob (File-Glob) are comprehensive (they are not), or that a change such as this will not break existing
behaviour (this seems likely). As far as I can tell the code with this (g_l)stat call existed in original-bsd/lib/libc/gen/glob.c 24 years ago in 1990.
Also see:
Chapter 6. Benchmarking Perl of "Mastering Perl" By brian d foy, Randal L. Schwartz
contains a section on comparing code where code using glob() and opendir() is compared.
"future globs (was "UNIX mindset...")" in comp.unix.wizards from Dick Dunn in 1991.
Usenet newsgroup mod.sources "'Globbing' library routine (glob)" from Guido van Rossum in July 1986 - I don't see a reference to "stat" in this code.

How to generate random numbers in the BusyBox shell

How can I generate random numbers using AShell (restricted bash)? I am using a BusyBox binary on the device which does not have od or $RANDOM. My device has /dev/urandom and /dev/random.
$RANDOM and od are optional features in BusyBox, I assume given your question that they aren't included in your binary. You mention in a comment that /dev/urandom is present, that's good, it means what you need to do is retrieve bytes from it in a usable form, and not the much more difficult problem of implementing a random number generator. Note that you should use /dev/urandom and not /dev/random, see Is a rand from /dev/urandom secure for a login key?.
If you have tr or sed, you can read bytes from /dev/urandom and discard any byte that isn't a desirable character. You'll also need a way to extract a fixed number of bytes from a stream: either head -c (requiring FEATURE_FANCY_HEAD to be enabled) or dd (requiring dd to be compiled in). The more bytes you discard, the slower this method will be. Still, generating random bytes is usually rather fast in comparison with forking and executing external binaries, so discarding a lot of them isn't going to hurt much. For example, the following snippet will produce a random number between 0 and 65535:
n=65536
while [ $n -ge 65536 ]; do
n=1$(</dev/urandom tr -dc 0-9 | dd bs=5 count=1 2>/dev/null)
n=$((n-100000))
done
Note that due to buffering, tr is going to process quite a few more bytes than what dd will end up keeping. BusyBox's tr reads a bufferful (at least 512 bytes) at a time, and flushes its output buffer whenever the input buffer is fully processed, so the command above will always read at least 512 bytes from /dev/urandom (and very rarely more since the expected take from 512 input bytes is 20 decimal digits).
If you need a unique printable string, just discard non-ASCII characters, and perhaps some annoying punctuation characters:
nonce=$(</dev/urandom tr -dc A-Za-z0-9-_ | head -c 22)
In this situation, I would seriously consider writing a small, dedicated C program. Here's one that reads four bytes and outputs the corresponding decimal number. It doesn't rely on any libc function other than the wrappers for the system calls read and write, so you can get a very small binary. Supporting a variable cap passed as a decimal integer on the command line is left as an exercise; it'll cost you hundreds of bytes of code (not something you need to worry about if your target is big enough to run Linux).
#include <stddef.h>
#include <unistd.h>
int main () {
int n;
unsigned long x = 0;
unsigned char buf[4];
char dec[11]; /* Must fit 256^sizeof(buf) in decimal plus one byte */
char *start = dec + sizeof(dec) - 1;
n = read(0, buf, sizeof(buf));
if (n < (int)sizeof(buf)) return 1;
for (n = 0; n < (int)sizeof(buf); n++) x = (x << 8 | buf[n]);
*start = '\n';
if (x == 0) *--start = '0';
else while (x != 0) {
--start;
*start = '0' + (x % 10);
x = x / 10;
}
while (n = write(1, start, dec + sizeof(dec) - start),
n > 0 && n < dec + sizeof(dec) - start) {
start += n;
}
return n < 0;
}
</dev/urandom sed 's/[^[:digit:]]\+//g' | head -c10
/dev/random or /dev/urandom are likely to be present.
Another option is to write a small C program that calls srand(), then rand().
I Tried Gilles' first snippet with BusyBox 1.22.1 and I have some patches, which didn't fit into a comment:
while [ $n -gt 65535 ]; do
n=$(</dev/urandom tr -dc 0-9 | dd bs=5 count=1 2>/dev/null | sed -e 's/^0\+//' )
done
The loop condition should check for greater than the maximum value, otherwise there will be 0 executions.
I silenced dd's stderr
Leading zeros removed, which could lead to surprises in contexts where interpreted as octal (e.g. $(( )))
Hexdump and dc are both available with busybox. Use /dev/urandom for mostly random or /dev/random for better random. Either of these options are better than $RANDOM and are both faster than looping looking for printable characters.
32-bit decimal random number:
CNT=4
RND=$(dc 10 o 0x$(hexdump -e '"%02x" '$CNT' ""' -n $CNT /dev/random) p)
24-bit hex random number:
CNT=3
RND=0x$(hexdump -e '"%02x" '$CNT' ""' -n $CNT /dev/random)
To get smaller numbers, change the format of the hexdump format string and the count of bytes that hexdump reads.
Trying escitalopram's solution didn't work on busybox v1.29.0 but inspired me doing a function.
sI did actually come up with a portable random number generation function that asks for the number of digits and should work fairly well (tested on Linux, WinNT10 bash, Busybox and msys2 so far).
# Get a random number on Windows BusyBox alike, also works on most Unixes
function PoorMansRandomGenerator {
local digits="${1}" # The number of digits of the number to generate
local minimum=1
local maximum
local n=0
if [ "$digits" == "" ]; then
digits=5
fi
# Minimum already has a digit
for n in $(seq 1 $((digits-1))); do
minimum=$minimum"0"
maximum=$maximum"9"
done
maximum=$maximum"9"
#n=0; while [ $n -lt $minimum ]; do n=$n$(dd if=/dev/urandom bs=100 count=1 2>/dev/null | tr -cd '0-9'); done; n=$(echo $n | sed -e 's/^0//')
# bs=19 since if real random strikes, having a 19 digits number is not supported
while [ $n -lt $minimum ] || [ $n -gt $maximum ]; do
if [ $n -lt $minimum ]; then
# Add numbers
n=$n$(dd if=/dev/urandom bs=19 count=1 2>/dev/null | tr -cd '0-9')
n=$(echo $n | sed -e 's/^0//')
if [ "$n" == "" ]; then
n=0
fi
elif [ $n -gt $maximum ]; then
n=$(echo $n | sed 's/.$//')
fi
done
echo $n
}
The following gives a number between 1000 and 9999
echo $(PoorMansRandomGenerator 4)
Improved the above reply to a more simpler version,that also runs really faster, still compatible with Busybox, Linux, msys and WinNT10 bash.
function PoorMansRandomGenerator {
local digits="${1}" # The number of digits to generate
local number
# Some read bytes can't be used, se we read twice the number of required bytes
dd if=/dev/urandom bs=$digits count=2 2> /dev/null | while read -r -n1 char; do
number=$number$(printf "%d" "'$char")
if [ ${#number} -ge $digits ]; then
echo ${number:0:$digits}
break;
fi
done
}
Use with
echo $(PoorMansRandomGenerator 5)
​​​​​​​​​​​​​

Is there a command to write random garbage bytes into a file?

I am now doing some tests of my application again corrupted files. But I found it is hard to find test files.
So I'm wondering whether there are some existing tools, which can write random/garbage bytes into a file of some format.
Basically, I need this tool to:
It writes random garbage bytes into the file.
It does not need to know the format of the file, just writing random bytes are OK for me.
It is best to write at random positions of the target file.
Batch processing is also a bonus.
Thanks.
The /dev/urandom pseudo-device, along with dd, can do this for you:
dd if=/dev/urandom of=newfile bs=1M count=10
This will create a file newfile of size 10M.
The /dev/random device will often block if there is not sufficient randomness built up, urandom will not block. If you're using the randomness for crypto-grade stuff, you can steer clear of urandom. For anything else, it should be sufficient and most likely faster.
If you want to corrupt just bits of your file (not the whole file), you can simply use the C-style random functions. Just use rnd() to figure out an offset and length n, then use it n times to grab random bytes to overwrite your file with.
The following Perl script shows how this can be done (without having to worry about compiling C code):
use strict;
use warnings;
sub corrupt ($$$$) {
# Get parameters, names should be self-explanatory.
my $filespec = shift;
my $mincount = shift;
my $maxcount = shift;
my $charset = shift;
# Work out position and size of corruption.
my #fstat = stat ($filespec);
my $size = $fstat[7];
my $count = $mincount + int (rand ($maxcount + 1 - $mincount));
my $pos = 0;
if ($count >= $size) {
$count = $size;
} else {
$pos = int (rand ($size - $count));
}
# Output for debugging purposes.
my $last = $pos + $count - 1;
print "'$filespec', $size bytes, corrupting $pos through $last\n";
# Open file, seek to position, corrupt and close.
open (my $fh, "+<$filespec") || die "Can't open $filespec: $!";
seek ($fh, $pos, 0);
while ($count-- > 0) {
my $newval = substr ($charset, int (rand (length ($charset) + 1)), 1);
print $fh $newval;
}
close ($fh);
}
# Test harness.
system ("echo =========="); #DEBUG
system ("cp base-testfile testfile"); #DEBUG
system ("cat testfile"); #DEBUG
system ("echo =========="); #DEBUG
corrupt ("testfile", 8, 16, "ABCDEFGHIJKLMNOPQRSTUVWXYZ ");
system ("echo =========="); #DEBUG
system ("cat testfile"); #DEBUG
system ("echo =========="); #DEBUG
It consists of the corrupt function that you call with a file name, minimum and maximum corruption size and a character set to draw the corruption from. The bit at the bottom is just unit testing code. Below is some sample output where you can see that a section of the file has been corrupted:
==========
this is a file with nothing in it except for lowercase
letters (and spaces and punctuation and newlines).
that will make it easy to detect corruptions from the
test program since the character range there is from
uppercase a through z.
i have to make it big enough so that the random stuff
will work nicely, which is why i am waffling on a bit.
==========
'testfile', 344 bytes, corrupting 122 through 135
==========
this is a file with nothing in it except for lowercase
letters (and spaces and punctuation and newlines).
that will make iFHCGZF VJ GZDYct corruptions from the
test program since the character range there is from
uppercase a through z.
i have to make it big enough so that the random stuff
will work nicely, which is why i am waffling on a bit.
==========
It's tested at a basic level but you may find there are edge error cases which need to be taken care of. Do with it what you will.
Just for completeness, here's another way to do it:
shred -s 10 - > my-file
Writes 10 random bytes to stdout and redirects it to a file. shred is usually used for destroying (safely overwriting) data, but it can be used to create new random files too.
So if you have already have a file that you want to fill with random data, use this:
shred my-existing-file
You could read from /dev/random:
# generate a 50MB file named `random.stuff` filled with random stuff ...
dd if=/dev/random of=random.stuff bs=1000000 count=50
You can specify the size also in a human readable way:
# generate just 2MB ...
dd if=/dev/random of=random.stuff bs=1M count=2
You can also use cat and head. Both are usually installed.
# write 1024 random bytes to my-file-to-override
cat /dev/urandom | head -c 1024 > my-file-to-override

What does an ampersand in arithmetic evaluation and numbers with x's mean in Bash?

I'm curious about what exactly the following comparison does, in as much detail as possible, especially relating to the 0x2 and the & characters and what exactly they do,
if [ $((${nValid} & 0x1)) -eq 1 ]; then
#...snip...
fi
if [ $((${nValid} & 0x2)) -eq 2 ]; then
#...snip...
fi
& is the bitwise AND operator. So you are asking to do a bitwise and between 0x1 and the value that ${nVAlid} is returning.
For more information on bitwise operations look here.
A shell script interprets a number as decimal (base 10), unless that number has a special prefix or notation. A number preceded by a 0 is octal (base 8). A number preceded by 0x is hexadecimal (base 16). A number with an embedded # evaluates as BASE#NUMBER (with range and notational restrictions).
So, in [ $((${nValid} & 0x1)) -eq 1 ], $nValid is anded with 0x1 and compared with decimal 1. Similarly the second comparison too.
Read this and this for detailed info.
It's testing nValid on a per-bit basis.
The bitwise AND operator (&) means that bit-by-bit, the operator will do an AND comparison. So, if nValid is a byte (8 bit) value, then look at the operation in binary:
nValue & 0b00000001
If nValue is 42, then the operation would look like this
(nValue = 0b00101010) & 0b00000001 => 0b00000000 // (does not have the last bit set)
(nValue & 0b00000001) == 0b00000001 // false
and for the 2nd (nValid & 0x2)
(nValue = 0b00101010) & 0b00000010 => 0b00000010 // (has the 2nd-to-last bit set)
(nValue & 0b00000010) == 0b00000010 // true
This is useful for testing flags within variables; usually you use the AND to check for flags by isolating bits and the OR to combine flags.
0b00001000 | 0b00000010 | 0b00000001 => 0b00001011
0x1 and 0x2 are the hexadecimal notations for 1 and 2. The & is the bitwise AND operator. What these lines do is test the value in nValid whether the least significant bit (0x1) and second least significant bit (0x2) are set.
The scheme goes like this (C notation):
if (val & (1 << bitNumber) == (1 << bitNumber)) {
// The bit at position bitNumber (from least to most significant digit) is set
}
The << is the left bitshift operator. 1 << 0 == 1, 1 << 1 == 2, 1 << 2 == 4, ...
So for better readability the lines should be more like:
if [ $((${nValid} & X)) -eq X ]; then
where X is a power of 2 (instead of mixing hexadecimal and decimal notation).
That could be rewritten as:
if (( nValid & 2#00000001 )); then
#...snip...
fi
if (( nValid & 2#00000010 )); then
#...snip...
fi
with the number of binary digits chosen to be most appropriate for the context. It's not necessary to test for equality if you're only checking one bit*. You could still use the hex representation if it makes more sense. The braces and dollar sign aren't necessary in this context.
You might want to use constants with meaningful names instead of hard-coded values:
declare -r FOO=$((2#00000001))
declare -r BAR=$((2#00000010))
if (( nValid & FOO )); then
#...snip...
fi
if (( nValid & BAR )); then
#...snip...
fi
* You will need to test for equality if you're testing multiple bits at the same time:
if (( (nValid & (FOO | BAR)) == (FOO | BAR) )); then
#...snip...
fi
You will need the extra parentheses since == has a higher precedence than the bitwise operators.
Clearing and setting bits in Bash:
(( var |= FOO )) # set the bits in FOO into var
(( var &= ~BAR )) # clear the bits in BAR from var

Resources