I need to write a script that calculates the total size of all files whose size is an odd number of bytes; could you help me please?
#!/bin/bash
echo "Directory <$1> contains the following filenames of odd size:"
ls -l $1 |
while read file_parm
do
    size=`echo $file_parm | cut -f 5 -d " "`
    name=`echo $file_parm | cut -f 9 -d " "`
    let "div=size%2"
    if [ ! -d $name ]
    then
        if [ $div -ne 0 ]
        then
            # this is listing odd numbers from this
            # directory; I just need to add them together
            # and print result
            echo "[$name : $size]"
        fi
    fi
done
I virtually copied the code from my comment and ran it, and it worked -- I just had to ensure I had $1 set to somewhere sane, rather than empty.
$ set -- "."; totsize=0; for file in "$1"/*; do if [ -f "$file" ]; then size=$(stat -c '%s' "$file"); if ((size % 2 == 1)); then echo "[$file : $size]"; ((totsize += $size)); fi; fi; done; echo "Total size of odd-sized files = $totsize"
[./bash-assoc-arrays.sh : 417]
[./makefile : 1125]
[./xx.pl : 117]
Total size of odd-sized files = 1659
$
Or, formatted for readability:
set -- "."
totsize=0
for file in "$1"/*
do
    if [ -f "$file" ]
    then
        size=$(stat -c '%s' "$file")
        if ((size % 2 == 1))
        then
            echo "[$file : $size]"
            ((totsize += $size))
        fi
    fi
done
echo "Total size of odd-sized files = $totsize"
The repeated invocation of stat is a bit expensive. If you don't have files with newlines in their names (most people don't), you can speed it up with a single invocation of stat and some care:
stat -c '%s %A %n' "$1"/* |
{
    totsize=0
    while read -r size perms name
    do
        if [ "X${perms:0:1}" = "X-" ] && ((size % 2 == 1))
        then
            ((totsize += size))
            echo "[$name : $size]"
        fi
    done
    echo "Total size of odd-sized files = $totsize"
}
You could use (...) in place of {...} at a marginal (unmeasurable) cost in efficiency. The grouping matters either way: the loop runs on the right-hand side of the pipe, so the final echo has to be in the same group to see the totsize the loop accumulated.
Answers to other questions explain the X prefix used in comparisons such as [ "X${perms:0:1}" = "X-" ].
Related
Is it possible to write a script that reads a file containing numbers (one per line) and writes their maximum, minimum and sum? If the file is empty, it should print an appropriate message. The name of the file is to be given as a parameter of the script. I managed to create the script below, but there are 2 errors:
./4.3: line 20: syntax error near unexpected token `done'
./4.3: line 20: `done echo "Max: $max" '
Is it possible to add multiple files as parameters?
lines=`cat "$1" | wc -l`
if [ $lines -eq 0 ];
then echo "File $1 is empty!"
exit fi min=`cat "$1" | head -n 1`
max=$min sum=0
while [ $lines -gt 0 ];
do num=`cat "$1" |
tail -n $lines`
if [ $num -gt $max ];
then max=$num
elif [ $num -lt $min ];
then min=$num fiS
sum=$[ $sum + $num] lines=$[ $lines - 1 ]
done echo "Max: $max"
echo "Min: number $min"
echo "Sum: $sum"
Pretty compelling use of GNU datamash here:
read sum min max < <( datamash sum 1 min 1 max 1 < "$1" )
[[ -z $sum ]] && echo "file is empty"
echo "sum=$sum; min=$min; max=$max"
Or, sort and awk:
sort -n "$1" | awk '
NR == 1 { min = $1 }
{ sum += $1 }
END {
if (NR == 0) {
print "file is empty"
} else {
print "min=" min
print "max=" $1
print "sum=" sum
}
}
'
Here's how I'd fix your original attempt, preserving as much of the intent as possible:
#!/usr/bin/env bash
lines=$(wc -l < "$1")
if [ "$lines" -eq 0 ]; then
    echo "File $1 is empty!"
    exit
fi
min=$(head -n 1 "$1")
max=$min
sum=0
while [ "$lines" -gt 0 ]; do
    num=$(tail -n "$lines" "$1" | head -n 1)
    if [ "$num" -gt "$max" ]; then
        max=$num
    elif [ "$num" -lt "$min" ]; then
        min=$num
    fi
    sum=$(( sum + num ))
    lines=$(( lines - 1 ))
done
echo "Max: $max"
echo "Min: number $min"
echo "Sum: $sum"
The dealbreakers were the missing line breaks (you can't write exit fi on a single line without a ;). Other changes are good practice (quoting expansions, dropping the useless use of cat) but wouldn't have prevented your script from working, and the rest are cosmetic (indentation, $( ) instead of backticks).
The overall approach is a massive antipattern, though: you read the whole file for each line being processed.
Here's how I would do it instead:
#!/usr/bin/env bash
for fname in "$@"; do
    [[ -s $fname ]] || { echo "file $fname is empty" >&2; continue; }
    IFS= read -r min < "$fname"
    max=$min
    sum=0
    while IFS= read -r num; do
        (( sum += num ))
        (( max = num > max ? num : max ))
        (( min = num < min ? num : min ))
    done < "$fname"
    printf '%s\n' "$fname:" " min: $min" " max: $max" " sum: $sum"
done
This uses the proper way to loop over an input file and utilizes the ternary operator in the arithmetic context.
The outermost for loop loops over all arguments.
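A hypothetical run (with the script saved as, say, minmaxsum.sh) would look like this:
$ printf '%s\n' 3 1 7 > a.txt
$ ./minmaxsum.sh a.txt
a.txt:
 min: 1
 max: 7
 sum: 11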
You can do the whole thing in one while loop inside a shell script. Here's the bash version:
s=0
while read x; do
    if [ ! $mi ]; then
        mi=$x
    elif [ $mi -gt $x ]; then
        mi=$x
    fi
    if [ ! $ma ]; then
        ma=$x
    elif [ $ma -lt $x ]; then
        ma=$x
    fi
    s=$((s+x))
done
if [ ! $ma ]; then
    echo "File is empty."
else
    echo "s=$s, mi=$mi, ma=$ma"
fi
Save that script into a file, and then you can use pipes to send as many input files into it as you wish, like so (assuming the script is called "mysum"):
cat file1 file2 file3 | mysum
or for a single file
mysum < file1
(Make sure the script is executable and on your $PATH; otherwise use "./mysum" for a script in the current directory, or "bash mysum" if it isn't executable.)
The script assumes that the numbers are one per line and that there's nothing else on the line. It gives a message if the input is empty.
How does it work? The "read x" will take input from stdin line-by-line. If the file is empty, the while loop will never be run, and thus variables mi and ma won't be set. So we use this at the end to trigger the appropriate message. Otherwise the loop checks first if the mi and ma variables exist. If they don't, they are initialised with the first x. Otherwise it is checked if the next x requires updating the mi and ma found thus far.
Note that this trick ensures that you can feed in any sequence of numbers. Otherwise you would have to initialise mi with something that's definitely too large and ma with something that's definitely too small, which works until you encounter a strange number list.
Note further that this works for integers only. If you need to work with floats, you need some tool other than the shell, e.g. awk.
Just for fun, here's the awk version: a one-liner, usable as-is or in a script, and it works with floats, too:
cat file1 file2 file3 | awk 'BEGIN{s=0}; {s+=$1; if(length(mi)==0)mi=$1; if(length(ma)==0)ma=$1; if(mi>$1)mi=$1; if(ma<$1)ma=$1} END{print s, mi, ma}'
or for one file:
awk 'BEGIN{s=0}; {s+=$1; if(length(mi)==0)mi=$1; if(length(ma)==0)ma=$1; if(mi>$1)mi=$1; if(ma<$1)ma=$1} END{print s, mi, ma}' < file1
Downside: it doesn't give a decent error message for an empty file.
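If you want a message for that case, one option (a sketch along the same lines, checking NR in the END block) is:
awk 'NR == 1 { mi = $1; ma = $1 }
     { s += $1; if ($1 < mi) mi = $1; if ($1 > ma) ma = $1 }
     END { if (NR == 0) print "File is empty."; else print s, mi, ma }' < file1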
a script that reads the file containing numbers (one per line) and writes their maximum, minimum and sum
Bash solution using sort:
<file sort -n | {
    read -r sum
    echo "Min is $sum"
    while read -r num; do
        sum=$((sum+num))
    done
    echo "Max is $num"
    echo "Sum is $sum"
}
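For a file containing 4, 9 and 2 (one number per line), this prints:
Min is 2
Max is 9
Sum is 15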
Let's speed things up with some smart plumbing using tee and tr, doing the arithmetic with bc, if we don't mind the Min/Max lines going to stderr (a small fifo could be used to synchronize the tee outputs with the Sum line). The tr '\n' '+' turns the sorted numbers into an expression such as 2+4+9+ and the echo 0 completes it to 2+4+9+0 for bc to evaluate. Anyway:
{
<file sort -n |
tee >(echo "Min is $(head -n1)" >&2) >(echo "Max is $(tail -n1)" >&2) |
tr '\n' '+';
echo 0;
} | bc | sed 's/^/Sum is /'
And there is always datamash. The following will output 3 numbers: the sum, min and max:
<file datamash sum 1 min 1 max 1
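For example, with input 4, 9 and 2 (one per line), the output is the three numbers, tab-separated:
$ printf '%s\n' 4 9 2 | datamash sum 1 min 1 max 1
15    2    9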
You can try a shell loop and dc:
while [ $# -gt 0 ] ; do
dc -f - -e '
['"$1"' is empty]sa
[la p q ]sZ
z 0 =Z
# if file is empty
dd sb sc
# populate max and min with the first value
[d sb]sY
[d lb <Y ]sM
# if max keep it
[d sc]sX
[d lc >X ]sN
# if min keep it
[lM x lN x ld + sd z 0 <B]sB
lB x
# on each line look for max, min and keep the sum
[max for '"$1"' = ] n lb p
[min for '"$1"' = ] n lc p
[sum for '"$1"' = ] n ld p
# print summary at end of each file
' <"$1"
shift
done
I'm currently trying to combine the contents of all .rule files in the rules directory.
For example:
./rules
    ./numbersFirst.rule
    ./numbersLast.rule
    ./lettersFirst.rule
    etc.
Each of these files has about 1,000 rules. I need to write a bash script that can output all permutations of each of these rules.
For all the singles, it would just be:
cat rules/*.rule >> ruleSet
Is there any way to do this programmatically and cleverly? For example:
for rule1 in rules/*.rule
do
    for rule2 in rules/*.rule
    do
        if [ $rule1 != $rule2 ]
        then
            #read both files and output "$line_rule1 $line_rule2"
            #Magic here?
        fi
    done
done
What about permutations of 3, 4, ..., n files, each with 1,000 lines? The ideal is to do this programmatically with n files so that I can simply add to the directory and rebuild from this script. Obviously it will be a LOT of combinations!
You can compute the cartesian product with GNU parallel, if it is available:
#!/bin/bash
YOUR_DIR="./rules"
ARGS="::: "
NUM=0
for file in $YOUR_DIR/*.rule; do
    ARGS="$ARGS $(cat $file | tr "\n" " ") ::: "
    NUM=$((NUM+1))
    INDEX="$INDEX {$NUM}"
done
if [ ! -z "$ARGS" ]; then
    parallel --no-notice -P1 echo $INDEX $ARGS
fi
Or, using recursion and a bash array only:
#!/bin/bash
dim=()
YOUR_DIR="./rules"
NUM=0
for file in $YOUR_DIR/*.rule; do
    ARGS="$(cat $file | tr "\n" " ")"
    dim[$NUM]="$ARGS"
    NUM=$((NUM+1))
done
for i in "${!dim[@]}"
do
    echo "key : $i"
    echo "value: ${dim[$i]}"
done
function iterate {
    local index="$2"
    if [ "${index}" == "${#dim[@]}" ]; then
        for (( i=0; i<${index}; i++ ))
        do
            echo -n "${items[$i]} "
        done
        echo ""
    else
        for element in ${dim[${index}]}; do
            items["${index}"]="${element}"
            local it=$((index+1))
            iterate items[@] "$it"
        done
    fi
}
declare -a items=("")
iterate "" 0
You can find a generalization here
I have a huge file on a Linux machine. The file is ~20GB and the space on my box is ~25GB. I want to split the file into ~100MB parts. I know there's a 'split' command, but that keeps the original file, and I don't have enough space to keep the original. Any ideas on how this can be accomplished? I'll even work with any node modules if they make the task easier than bash.
My attempt:
#! /bin/bash
if [ $# -gt 2 -o $# -lt 1 -o ! -f "$1" ]; then
    echo "Usage: ${0##*/} <filename> [<split size in M>]" >&2
    exit 1
fi
bsize=${2:-100}
bucket=$( echo $bsize '* 1024 * 1024' | bc )
size=$( stat -c '%s' "$1" )
chunks=$( echo $size / $bucket | bc )
rest=$( echo $size % $bucket | bc )
[ $rest -ne 0 ] && let chunks++
while [ $chunks -gt 0 ]; do
    let chunks--
    fn=$( printf '%s_%03d.%s' "${1%.*}" $chunks "${1##*.}" )
    skip=$(( bsize * chunks ))
    dd if="$1" of="$fn" bs=1M skip=${skip} || exit 1
    truncate -c -s ${skip}M "$1" || exit 1
done
The above assumes bash(1), and Linux implementations of stat(1), dd(1), and truncate(1). It should be pretty much as fast as it gets, since it uses dd(1) to copy chunks of the initial file. It also uses bc(1) to make sure arithmetic operations in the 20GB range don't overflow anything. However, the script was only tested on smaller files, so double check it before running it against your data.
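For example, to split a hypothetical bigfile.dat into 100 MB pieces in place (with the script saved as, say, splitinplace.sh):
./splitinplace.sh bigfile.dat 100
This leaves pieces named bigfile_000.dat, bigfile_001.dat, and so on, with the original file truncated to zero bytes.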
You can use tail and truncate in a shell script to split a file in place while destroying the original. We split the file backwards, from the end, so that truncate can shrink the original after each piece is written. Here is a sample Bash script:
#!/bin/bash
if [ -z "$2" ]; then
echo "Usage: insplit.sh <splitsize> <filename>"
exit 1
fi
FILE="$2"
SPLITSIZE="$1"
FILESIZE=`stat -c '%s' $FILE`
BLOCKCOUNT=$(( (FILESIZE+SPLITSIZE-1)/SPLITSIZE ))
echo "Split count: $BLOCKCOUNT"
BLOCKCOUNT=$(($BLOCKCOUNT-1))
while [ $BLOCKCOUNT -ge 0 ]; do
FNAME="$FILE.$BLOCKCOUNT"
echo "writing $FNAME"
OFFSET=$((BLOCKCOUNT * SPLITSIZE))
BLOCKSIZE=$(( $FILESIZE - $OFFSET))
tail -c "$BLOCKSIZE" $FILE > $FNAME
truncate -s $OFFSET $FILE
FILESIZE=$((FILESIZE-BLOCKSIZE))
BLOCKCOUNT=$(( $BLOCKCOUNT-1 ))
done
I confirmed the results with a random file:
$ dd if=/dev/urandom of=largefile bs=512 count=1000
$ md5sum largefile
7ff913b62ef572265661a85f06417746 largefile
$ ./insplit.sh 200000 largefile
Split count: 3
writing largefile.2
writing largefile.1
writing largefile.0
$ cat largefile.0 largefile.1 largefile.2 | md5sum
7ff913b62ef572265661a85f06417746 -
I am having trouble with my newbie Linux script, which needs to count brackets and tell whether they are matched.
#!/bin/bash
file="$1"
x="()(((a)(()))"
left=$(grep -o "(" <<<"$x" | wc -l)
rght=$(grep -o ")" <<<"$x" | wc -l)
echo "left = $left right = $rght"
if [ $left -gt $rght ]
then echo "not enough brackets"
elif [ $left -eq $rght ]
then echo "all brackets are fine"
else echo "too many"
fi
The problem here is that I can't pass an argument through the command line so that grep would count the brackets from the file. In place of $x I tried writing $file, but it does not work.
I am executing the script by writing ./script.h test1.txt; the file test1.txt is in the same folder as script.h.
Any help explaining how the parameter passing works would be great. Or maybe there is another way to do this script?
The construct <<< (a here-string) transmits "the contents of a variable"; it is not applicable to the "contents of a file" whose name is in the variable. If you execute this snippet, you can see what I mean:
#!/bin/bash
file="()(((a)((a simple test)))"
echo "$( cat <<<"$file" )"
which is also equivalent to just echo "$file". That is, what is being sent to the console are the contents of the variable "file".
To get the "contents of a file" whose name is inside a var called "file", do:
#!/bin/bash
file="test1.txt"
echo "$( cat <"$file" )"
which is essentially equivalent to echo "$( <"$file" )", cat <"$file", or even <"$file" cat
You can use: grep -o "(" <"$file" or <"$file" grep -o "("
But grep could accept a file as a parameter, so this: grep -o "(" "$file" also works.
However, I believe that tr would be a better command here, as in: <"$file" tr -cd "(".
It reduces the whole file to a sequence of "(" characters only, so far less data needs to be passed to the wc command. Your script would then become:
#!/bin/bash -
file="$1"
[[ $file ]] || exit 1 # make sure the var "file" is not empty.
[[ -r $file ]] || exit 2 # test if the file "file" exists.
left=$(<"$file" tr -cd "("| wc -c)
rght=$(<"$file" tr -cd ")"| wc -c)
echo "left = $left right = $rght"
# as $left and $rght are strictly numeric values, these integer tests work:
(( $left > $rght )) && echo "not enough right brackets"
(( $left == $rght )) && echo "all brackets are fine"
(( $left < $rght )) && echo "too many right brackets"
# added as per an additional request of the OP.
if [[ $(<"$file" tr -cd "()"|head -c1) = ")" ]]; then
echo "the first character is (incorrectly) a right bracket"
fi
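A hypothetical run, with test1.txt holding the sample string from the question and the script saved as, say, brackets.sh:
$ printf '%s\n' '()(((a)(()))' > test1.txt
$ ./brackets.sh test1.txt
left = 6 right = 5
not enough right brackets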
I was trying to see how shell scripts work and how to run them, so I took some sample code from a book I picked up from the library called "Wicked Cool Shell Scripts".
I rewrote the code verbatim, but I'm getting an error when I run it on Linux:
d.sh: line 3: syntax error near unexpected token `{'
d.sh: line 3: `gmk() {'
Before this I had the curly bracket on a new line, but I was still getting:
d.sh: line 3: syntax error near unexpected token
d.sh: line 3: `gmk()'
#!/bin/sh
#format directory- outputs a formatted directory listing
gmk()
{
#Give input in Kb, output converted to Kb, Mb, or Gb for best output format
if [$1 -ge 1000000]; then
echo "$(scriptbc -p 2 $1/1000000)Gb"
elif [$1 - ge 1000]; then
echo "$$(scriptbc -p 2 $1/1000)Mb"
else
echo "${1}Kb"
fi
}
if [$# -gt 1] ; then
echo "Usage: $0 [dirname]" >&2; exit 1
elif [$# -eq 1] ; then
cd "$#"
fi
for file in *
do
if [-d "$file"] ; then
size = $(ls "$file"|wc -l|sed 's/[^[:digit:]]//g')
elif [$size -eq 1] ; then
echo "$file ($size entry)|"
else
echo "$file ($size entries)|"
fi
else
size ="$(ls -sk "$file" | awk '{print $1}')"
echo "$file ($(gmk $size))|"
fi
done | \
sed 's/ /^^^/g' |\
xargs -n 2 |\
sed 's/\^\^\^/ /g' | \
awk -F\| '{ printf "%39s %-39s\n", $1, $2}'
exit 0
if [$#-gt 1]; then
echo "Usage :$0 [dirname]" >&2; exit 1
elif [$# -eq 1]; then
cd "$#"
fi
for file in *
do
if [ -d "$file" ] ; then
size =$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
if [ $size -eq 1 ] ; then
echo "$file ($size entry)|"
else
echo "$file ($size entries)|"
fi
else
size ="$(ls -sk "$file" | awk '{print $1}')"
echo "$file ($(convert $size))|"
fi
done | \
sed 's/ /^^^/g' | \
xargs -n 2 | \
sed 's/\^\^\^/ /g' | \
awk -F\| '{ printf "%-39s %-39s\n", $1, $2 }'
exit 0
sh is very sensitive to spaces. In particular, assignments must have no spaces around the =, and tests must have spaces inside the [ ].
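For instance, both rules show up in lines like these (a fragment, assuming $file is already set):
size=$(ls "$file" | wc -l)    # assignment: no spaces around =
if [ "$size" -eq 1 ]; then    # test: spaces inside the [ ]
    echo "one entry"
fi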
The corrected version below runs, although it fails on my machine due to the lack of scriptbc.
You put an elif in a spot where it was supposed to be if.
Be careful to keep the starts and ends (if/fi, do/done) aligned. If you mismatch them, it will easily lead you astray when thinking about how the script works.
Also, adding a set -x near the top of a script is a very good way of debugging what it is doing - it will cause the interpreter to output each line it is about to run before it does.
#!/bin/sh
#format directory- outputs a formatted directory listing
gmk()
{
    #Give input in Kb, output converted to Kb, Mb, or Gb for best output format
    if [ $1 -ge 1000000 ]; then
        echo "$(scriptbc -p 2 $1/1000000)Gb"
    elif [ $1 -ge 1000 ]; then
        echo "$(scriptbc -p 2 $1/1000)Mb"
    else
        echo "${1}Kb"
    fi
}
if [ $# -gt 1 ] ; then
    echo "Usage: $0 [dirname]" >&2; exit 1
elif [ $# -eq 1 ] ; then
    cd "$@"
fi
for file in *
do
    if [ -d "$file" ] ; then
        size=$(ls "$file"|wc -l|sed 's/[^[:digit:]]//g')
        if [ $size -eq 1 ] ; then
            echo "$file ($size entry)|"
        else
            echo "$file ($size entries)|"
        fi
    else
        size="$(ls -sk "$file" | awk '{print $1}')"
        echo "$file ($(gmk $size))|"
    fi
done | \
sed 's/ /^^^/g' |\
xargs -n 2 |\
sed 's/\^\^\^/ /g' | \
awk -F\| '{ printf "%39s %-39s\n", $1, $2}'
exit 0
By the way, with respect to the book telling you to modify your PATH variable, that's really a bad idea, depending on what exactly it advised you to do. Just to be clear, never add your current directory to the PATH variable unless you intend on making that directory a permanent location for all of your scripts etc. If you are making this a permanent location for your scripts, make sure you add the location to the END of your PATH variable, not the beginning, otherwise you are creating a major security problem.
Linux and Unix do not add your current location, commonly called your PWD, or present working directory, to the path because someone could create a script called 'ls', for example, which could run something malicious instead of the actual 'ls' command. The proper way to execute something in your PWD, is to prepend it with './' (e.g. ./my_new_script.sh). This basically indicates that you really do want to run something from your PWD. Think of it as telling the shell "right here". The '.' actually represents your current directory, in other words "here".
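For example, to make a permanent scripts directory available (here $HOME/scripts, just an example location), append it, rather than prepend it, in your shell startup file:
export PATH="$PATH:$HOME/scripts"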