multiple variable definition and assignments - linux

I have a piece of script that basically calculates the amount of space the directories in the current directory use but I want help understanding some of the syntax and language etiquette.
Here is the script:
#!/bin/bash
# This script prints a little histogram of how much space
# the directories in the current working directory use
error () {
echo "Error: $1"
exit $2
} >&2
# Create a tempfile (in a BSD- and Linux-friendly way)
my_mktemp () {
mktemp || mktemp -t hist
} 2> /dev/null
# check we are using bash 4
(( BASH_VERSINFO[0] < 4 )) && error "This script can only be run by bash 4 or higher" 1
# An array to keep all the file sizes
declare -A file_sizes
declare -r tempfile=$(my_mktemp) || error "Cannot create tempfile" 2
# How wide is the terminal?
declare -ir term_cols=$(tput cols)
# Longest file name, Largest file, total file size
declare -i max_file_len=0 max_size=0 total_size=0
# A function to draw a line
drawline () {
declare line=""
declare char="-"
for (( i=0; i<$1; ++i )); do
line="${line}${char}"
done
printf "%s" "$line"
}
# This reads the output from du into an array
# And calculates total size and maximum size, max filename length
read_filesizes () {
while read -r size name; do
file_sizes["$name"]="$size"
(( total_size += size ))
(( max_size < size )) && (( max_size=size ))
(( max_file_len < ${#name} )) && (( max_file_len=${#name} ))
done
}
# run du to get filesizes
# Using a temporary file for output from du
{ du -d 0 */ || du --max-depth 0 *; } 2>/dev/null > "$tempfile"
read_filesizes < "$tempfile"
# The length for each line and percentage for each file
declare -i length percentage
# How many columns may the lines take up?
declare -i cols="term_cols - max_file_len - 10"
for k in "${!file_sizes[@]}"; do
(( length=cols * file_sizes[$k] / max_size ))
(( percentage=100 * file_sizes[$k] / total_size ))
printf "%-${max_file_len}s | %3d%% | %s\n" "$k" "$percentage" $(drawline $length)
done
printf "%d Directories\n" "${#file_sizes[@]}"
printf "Total size: %d blocks\n" "$total_size"
# clean up
rm "$tempfile"
exit 0
In the first and second line of the read_filesizes() function that I highlighted in bold, why are two variables (size name) being created if the name is being assigned to size in the array?
In the same function, (( max_size < size )) && (( max_size=size )) this line seems odd to me because how can the two expressions both be true?
Then in the first line of the for loop, (( length=cols * file_sizes[$k] / max_size )), I don't understand why the variable length is assigned to cols. Why were they defined separately to begin with?

While I'm not 100% sure of the syntax, it seems clear enough to answer your questions:
First Question
why are two variables (size name) being created if the name is being assigned to size in the array?
It looks like name holds the file name and size holds the file size. Then the assignment file_sizes["$name"]="$size" stores the file sizes indexed by the file names.
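A minimal sketch of that step, assuming bash 4 or later for associative arrays; the sizes and directory names are made up for illustration. read -r size name puts the first whitespace-separated field into size and the rest of the line into name, and the name is then used as the key:

```bash
#!/bin/bash
# Hypothetical sample of du output: "<size><whitespace><name>" per line.
sample=$'120\tdocs/\n4096\tmy photos/\n8\tsrc/'

declare -A file_sizes

# read assigns the first field to size and the remainder of the line to name.
while read -r size name; do
    file_sizes["$name"]=$size
done <<< "$sample"

# The directory name is the key; the size is the value stored under it.
for name in "${!file_sizes[@]}"; do
    printf '%s -> %s\n' "$name" "${file_sizes[$name]}"
done
```

Because name receives everything after the first field, a directory name containing spaces still ends up in a single key.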
Second Question
(( max_size < size )) && (( max_size=size ))
I believe this line assigns size to max_size if the previous value of max_size is smaller than size. The goal is that at the end max_size would hold the size of the largest file.
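The two expressions do not have to be true at the same moment: && runs the second command only if the first one succeeded. A small sketch with made-up sizes:

```bash
#!/bin/bash
declare -i max_size=0

for size in 12 70 35; do
    # (( max_size < size )) succeeds only when size is a new maximum;
    # && then runs the assignment, otherwise the assignment is skipped.
    (( max_size < size )) && (( max_size = size ))
done

echo "$max_size"
```

After the loop max_size holds 70, the largest of the three values.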
Third Question
(( length=cols * file_sizes[$k] / max_size ))
This calculates the length of the line that will be displayed for each file, which is proportional to the file's size relative to the largest file. length is not simply assigned cols: the whole expression cols * file_sizes[$k] / max_size is evaluated first, and the result is stored in length. cols is the length of the line that would be displayed for the largest file (the one whose size is max_size): cols = the width of the terminal - the length of the longest file name - 10.
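To make the scaling concrete, here is a sketch with assumed values (an 80-column terminal, a longest name of 10 characters, and a largest size of 4096 blocks); none of these numbers come from the original script:

```bash
#!/bin/bash
# Assumed values for illustration only.
declare -i term_cols=80 max_file_len=10 max_size=4096
declare -i cols="term_cols - max_file_len - 10"   # 60 columns left for the bar

declare -i length
for size in 4096 2048 100; do
    # Integer arithmetic: the largest entry gets the full 60 columns,
    # the others get a proportional share (rounded down).
    (( length = cols * size / max_size ))
    printf '%5d -> %2d columns\n' "$size" "$length"
done
```

The three sizes map to 60, 30 and 1 columns respectively.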

Related

Time difference in shell (hour)

I'm trying to calculate time difference stored inside of two variables inside of a shell script, I'm observing the following pattern:
hhmm -> 0950
so:
time1=1333
time2=0950
Now I need to calculate the difference in time between time1 and time2, as for now I have tried:
deltaTime=$(($time1-$time2))
but I'm facing the following error message
1333-0950: value too great for base (error token is "0950")
I'm expecting as a result: $deltaTime=0343
Unfortunately, I am strictly bound to use this time pattern. I have already researched for a solution online, some of them propose to use date -d... but I couldn't get it to work :(
Your approach has two issues.
First issue: bash treats numbers with leading zeroes as octal. You can force base 10 by adding a 10# prefix.
Second issue: it is incorrect to treat strings in hhmm format as numbers and subtract them. E.g. 1333-950=383, but the difference between 09:50 and 13:33 is 3 hours and 43 minutes. You should convert the string values to a common unit, e.g. minutes, subtract them, and convert the result back to hhmm format.
time1=1333
time2=0950
str2min()
{
printf "%u" $((10#${1%??} * 60 + 10#${1#??}))
}
min2str()
{
printf "%02u%02u" $(($1 / 60)) $(($1 % 60))
}
time1m=$(str2min $time1)
time2m=$(str2min $time2)
timediff=$(($time1m - $time2m))
deltaTime=$(min2str $timediff)
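As a quick check of the approach above, here is the same logic with the intermediate values spelled out. The helpers are redefined (in compact form) only so the sketch runs on its own:

```bash
#!/bin/bash
# Same helper names as above, redefined so this sketch is self-contained.
str2min() { printf '%u' $(( 10#${1%??} * 60 + 10#${1#??} )); }
min2str() { printf '%02u%02u' $(( $1 / 60 )) $(( $1 % 60 )); }

time1=1333
time2=0950

# 10# forces base 10, so 09 and 50 are not parsed as (invalid) octal.
time1m=$(str2min "$time1")    # 13 * 60 + 33 = 813
time2m=$(str2min "$time2")    #  9 * 60 + 50 = 590
deltaTime=$(min2str $(( time1m - time2m )))   # 223 minutes

echo "$deltaTime"
```

This prints 0343, the result the question asks for.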
You could use this implementation maybe?
#!/usr/bin/env bash
diff_hhmm() {
local -r from=$1
local -i from_hh=10#${from:0:2} # skip 0 chars, read 2 chars (`${from:0:2}`) using base 10 (`10#`)
local -ri from_mm=10#${from:2:2} # skip 2 chars, read 2 chars (`${from:2:2}`) using base 10 (`10#`)
local -r upto=$2
local -ri upto_hh=10#${upto:0:2}
local -ri upto_mm=10#${upto:2:2}
local -i diff_hh
local -i diff_mm
# Compute difference in minutes
(( diff_mm = from_mm - upto_mm ))
# If it's negative, we've "breached" into the previous hour, so adjust
# the `diff_mm` value to be modulo 60 and compensate the `from_hh` var
# to reflect that we've already subtracted some of the minutes there.
if (( diff_mm < 0 )); then
(( diff_mm += 60 ))
(( from_hh -= 1 ))
fi
# Compute difference in hours
(( diff_hh = from_hh - upto_hh ))
# Ensure the result is modulo 24, the number of hours in a day.
if (( diff_hh < 0 )); then
(( diff_hh += 24 ))
fi
# Print the values with 0-padding if necessary.
printf '%02d%02d\n' "$diff_hh" "$diff_mm"
}
$ diff_hhmm 1333 0950
0343
$ diff_hhmm 0733 0950
2143
$ diff_hhmm 0733 0930
2203
Or an even shorter implementation using one big arithmetic compound command, (( ... )), and inlining some variables:
diff_hhmm_terse() {
local -i diff_hh diff_mm
((
diff_mm = 10#${1:2:2} - 10#${2:2:2},
diff_hh = 10#${1:0:2} - 10#${2:0:2},
diff_hh -= diff_mm < 0 ? 1 : 0,
diff_mm += diff_mm < 0 ? 60 : 0,
diff_hh += diff_hh < 0 ? 24 : 0
))
printf '%02d%02d\n' "$diff_hh" "$diff_mm"
}
Do you have the possibility to drop the leading zero?
As you can see from my prompt:
Prompt> echo $((1333-0950))
-bash: 1333-0950: value too great for base (error token is "0950")
Prompt> echo $((1333-950))
383
Other proposal:
date '+%s'
Let me give you some examples:
date '+%s'
1662357975
... (after some time)
date '+%s'
1662458180
=>
echo $((1662458180-1662357975))
100205 (amount of seconds)
=>
echo $(((1662458180-1662357975)/3600))
27 (amount of hours)
This bash one-liner may be used if time difference is not negative (that is, time1 >= time2):
printf '%04d\n' $(( 10#$time1 - 10#$time2 - (10#${time1: -2} < 10#${time2: -2} ? 40 : 0) ))
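The reason for the 40: subtracting hhmm values as plain decimal numbers treats an hour as 100 units, so whenever the minutes have to borrow from the hours, the result is 40 too large. A step-by-step sketch of the same computation (the variable names raw and borrow are only for illustration):

```bash
#!/bin/bash
time1=1333
time2=0950

# Plain decimal subtraction: 1333 - 950 = 383.
raw=$(( 10#$time1 - 10#$time2 ))

# The minutes of time1 (33) are smaller than those of time2 (50), so the
# subtraction borrowed 100 from the hours column instead of 60: remove the extra 40.
borrow=$(( 10#${time1: -2} < 10#${time2: -2} ? 40 : 0 ))

printf '%04d\n' $(( raw - borrow ))
```

This prints 0343 for the values in the question.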

What to do in order to create a continuous .txt files without replacing the already existing .txt files using bash

I am trying to write a bash script to create multiple .txt files.
With the below code I created the files, but when I run the script again I get the same output instead of having more files with increasing number.
#! /bin/bash
for z in $(seq -w 1 10);
do
[[ ! -f "${z}_name.txt" ]] && { touch "${z}_name.txt"; }
done
Based in part on work by Raman Sailopal in a now-deleted answer (and on comments I made about that answer, as well as comments I made about the question), you could use:
shopt -s nullglob
touch $(seq -f '%.0f_name.txt' \
$(printf '%s\n' [0-9]*_name.txt |
awk 'BEGIN { max = 0 }
{ val = $0 + 0; if (val > max) max = val; }
END { print max + 1, max + 10 }'
)
)
The shopt -s nullglob command means that if there are no names that match the glob expression [0-9]*_name.txt, nothing will be generated in the arguments to the printf command.
The touch command is given a list of file names. The seq command formats a range of numbers using zero decimal places (so it formats them as integers) plus the rest of the name (_name.txt). The range is given by the output of printf … | awk …. The printf command lists file names that start with a digit and end with _name.txt, one per line. The awk command keeps track of the current maximum number; it coerces the name into a number (awk ignores the material after the last digit) and checks whether the number is larger than before. At the end, it prints two values, the largest value plus 1 and the largest value plus 10 (defaulting to 1 and 10 if there were no files). Adding the -w option to seq is irrelevant when you specify -f and a format; the file names won't be generated with leading zeros. There are ways to deal with this if leading zeros are crucial; probably the simplest is to drop the -f option to seq, add the -w option, and pipe the output through sed 's/$/_name.txt/'.
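A sketch of that zero-padded alternative, shown for a fixed range of 1 to 10 rather than the computed range (the variable name names is only for illustration):

```bash
#!/bin/bash
# seq -w pads every number to the width of the largest one (01 .. 10);
# sed then appends the fixed suffix to each line.
names=$(seq -w 1 10 | sed 's/$/_name.txt/')
printf '%s\n' "$names"
```

The list runs from 01_name.txt to 10_name.txt and could be passed to touch in the same way as before.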
You can squish the awk script onto a single line; you can squish the whole command onto a single line. However, it is arguably easier to see the organization of the command when they are spread over multiple lines.
Note that (apart from a possible TOCTOU — Time of Check, Time of Use — issue), there is no need to check whether the files exist. They don't; they'd have been listed by the glob [0-9]*_name.txt if they did, and the number would have been accounted for. If you want to ensure no damage to existing files, you'd need to use set -C or set -o noclobber and then create the files one by one using shell I/O redirection.
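A minimal sketch of the noclobber approach, run in a scratch directory; the file names and the pre-existing file are made up for the demonstration, and mktemp -d is assumed to be available:

```bash
#!/bin/bash
dir=$(mktemp -d) || exit 1            # scratch directory for the demo
echo "precious data" > "$dir/3_name.txt"

set -o noclobber                      # equivalent to set -C
for i in 1 2 3; do
    # With noclobber on, '>' fails instead of truncating an existing file.
    if { true > "$dir/${i}_name.txt"; } 2>/dev/null; then
        echo "created ${i}_name.txt"
    else
        echo "kept existing ${i}_name.txt"
    fi
done
set +o noclobber

kept=$(cat "$dir/3_name.txt")         # the existing content is untouched
rm -r "$dir"
```

The check and the creation happen in a single redirection, which is what closes the time-of-check/time-of-use gap.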
[…time passes…]
Actually, you can have awk do the file name generation instead of using seq at all:
touch $(printf '%s\n' [0-9]*_name.txt |
awk 'BEGIN { max = 0 }
{ val = $0 + 0; if (val > max) max = val; }
END { for (i = max + 1; i <= max + 10; i++)
printf "%d_name.txt\n", i
}'
)
And, if you try a bit harder, you can get rid of the printf command too:
touch $(awk 'BEGIN { max = 0
for (i = 1; i < ARGC; i++)
{
val = ARGV[i] + 0;
if (val > max)
max = val
}
for (i = max + 1; i <= max + 10; i++)
printf "%d_name.txt\n", i
}' [0-9]*_name.txt
)
Don't forget the shopt -s nullglob — that's still needed for maximum resiliency.
You might even choose to get rid of the separate touch command by having awk write to the files:
awk 'BEGIN { max = 0
for (i = 0; i < ARGC; i++)
{
val = ARGV[i] + 0;
if (val > max)
max = val
}
for (i = max + 1; i <= max + 10; i++)
{
name = sprintf("%d_name.txt", i)
printf "" > name
}
exit
}' [0-9]*_name.txt
Note the use of exit. Note that the POSIX specification for awk says that ARGC is the number of arguments in ARGV and that the elements in ARGV are indexed from 0 to ARGC - 1 — as in C programs.
There are few shell scripts that cannot be improved. The first version shown runs 4 commands; the last runs just one. That difference could be quite significant if there were many files to be processed.
Beware: eventually, the argument list generated by the glob will get too big; then you have to do more work. You might be obliged to filter the output from ls (with its attendant risks and dangers) and feed the output (the list of file names) into the awk script and process the lines of input once more. While your lists remain a few thousand files long, it probably won't be a problem.

How to print something to the right-most of the console in Linux shell script

Say I want to search for "ERROR" within a bunch of log files.
I want to print one line for every file that contains "ERROR".
In each line, I want to print the log file path on the left-most edge while the number of "ERROR" on the right-most edge.
I tried using:
printf "%-50s %d" $filePath $errorNumber
...but it's not perfect, since the console width can vary greatly and the file path can sometimes be quite long.
This is purely cosmetic, but I simply can't get it to work.
Can anyone help me to solve this problem?
Using bash and printf:
printf "%-$(( COLUMNS - ${#errorNumber} ))s%s" \
"$filePath" "$errorNumber"
How it works:
$COLUMNS is the shell's terminal width.
printf does left alignment by putting a - after the %. So printf "%-25s%s\n" foo bar prints "foo", then 22 spaces, then "bar".
bash uses the # as a parameter length variable prefix, so if x=foo, then ${#x} is 3.
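The pieces above can be combined into a small helper. This is a sketch: print_lr is a hypothetical name, and because COLUMNS may be unset in non-interactive scripts, it falls back to tput cols and then to 80:

```bash
#!/bin/bash
# COLUMNS may be unset in a script; fall back to tput, then to 80.
: "${COLUMNS:=$(tput cols 2>/dev/null || echo 80)}"

print_lr() {   # hypothetical helper name
    local left=$1 right=$2
    # Pad the left text so that left + right fill exactly COLUMNS characters.
    printf "%-$(( COLUMNS - ${#right} ))s%s\n" "$left" "$right"
}

print_lr /var/log/syslog 17
print_lr /var/log/auth.log 3
```

Each line is exactly COLUMNS characters wide, with the count flush against the right edge.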
Fancy version: if the two variables are too long to fit on one line, print them on as many lines as are needed:
printf "%-$(( COLUMNS * ( 1 + ( ${#filePath} + ${#errorNumber} ) / COLUMNS ) \
- ${#errorNumber} ))s%s" "$filePath" "$errorNumber"
Generalized to a function. Syntax is printfLR foo bar, or printfLR < file:
printfLR() { if [ "$1" ] ; then echo "$@" ; else cat ; fi |
while read l r ; do
printf "%-$(( ( 1 + ( ${#l} + ${#r} ) / COLUMNS ) \
* COLUMNS - ${#r} ))s%s" "$l" "$r"
done ; }
Test with:
# command line args
printfLR foo bar
# stdin
fortune | tr -s ' \t' '\n\n' | paste - - | printfLR

Generating multiple files with the same structure

I want to generate a series of files in which the file name of each file shall be increased by 1 (File1.txt, File2.txt, File3.txt, ... FileN.txt) where N = 250
Each file has 2 lines.
AAAXXX (where XXX = 001 to 250 - automatic increased for each file)
BBBYYY (where YYY = 3 digit random number )
Example:
File1.txt:
AAA001
BBB175
File5.txt:
AAA005
BBB067
File102.txt:
AAA102
BBB765
I'm a newbie using Ubuntu Linux 12.04 - but I'm hoping someone can assist.
You can do it as follows:
#!/bin/bash
for i in {1..250}
do
printf "AAA%03d\nBBB%03d" ${i} $(($RANDOM % 1000)) > File${i}.txt
done
Explanation:
for i in {1..250} - bash way of specifying iteration from 1 to 250, increment size of 1.
printf - shell printf command - used to print formatted string
AAA - string literal (means "exactly as written")
%03d - format specifier; this prints a decimal number zero-padded to a width of 3.
\n - newline
BBB - another string literal
%03d - same as before
${i} - this is the value used in the first formatted string (%03d)
$(($RANDOM % 1000)) - $RANDOM is a shell variable that provides a random number each time you access it. The % 1000 takes the modulo, so you get a number in the range 0-999. This is used in the 2nd format specifier (%03d).
> File${i}.txt - output redirection; creates the file and writes to it (overwriting the file if it already exists).
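A short sketch of the two pieces that usually raise questions, the zero padding and the range of the random number (the variable n is only for illustration):

```bash
#!/bin/bash
# %03d pads with zeros to a minimum width of 3; wider numbers are not truncated.
printf 'AAA%03d\n' 5      # AAA005
printf 'AAA%03d\n' 102    # AAA102

# RANDOM yields 0..32767, so RANDOM % 1000 always lands in 0..999.
n=$(( RANDOM % 1000 ))
printf 'BBB%03d\n' "$n"
```

The last line differs from run to run, but always has exactly three digits after BBB.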
Here's a quick one-liner that might start you off:
for i in {1..250}; do printf "AAA%03d\nBBB%03d" $i $(($RANDOM % 1000)) > "File${i}.txt"; done
Using bash:
for i in {1..250}; do printf "AAA%03d\nBBB%03d\n" "$i" "$((RANDOM%1000))" > "File$i.txt"; done
You can write a bash script for this
#!/bin/bash
for (( i=1; i<=250; i++ ))
do
NUMBER=$(( RANDOM % 1000 )) # random number from 0 to 999
printf 'AAA%03d\nBBB%03d\n' "$i" "$NUMBER" > "File$i.txt"
done

Is there a bash command that can tell the size of a shell variable

Is there a way to find the size (memory used) of a shell variable from the command line, without using C?
This tells you how many characters are in the value of a scalar variable named "var":
echo ${#var}
This tells you the number of elements in an array named "array":
echo ${#array[@]}
This tells you the number of characters in an element of an array:
echo ${#array[3]}
If you try to get the size of an array and you leave out the [@] index, you get the length of element 0:
$ array=(1 22 333 4444)
$ echo ${#array}
1
$ echo ${#array[@]}
4
$ echo ${#array[2]}
3
If you want the total length of all elements of an array, you could iterate over the array and add them up, you could use IFS and some steps similar to those below, or you could:
$ tmp="${array[*]}"
$ echo $(( ${#tmp} - ${#array[@]} + 1 ))
10
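The first option mentioned above, iterating over the array and adding up the lengths, could look like this sketch (total and element are illustrative names):

```bash
#!/bin/bash
array=(1 22 333 4444)

total=0
for element in "${array[@]}"; do
    # ${#element} is the length of the current element in characters.
    (( total += ${#element} ))
done

echo "$total"
```

This prints 10, matching the result of the one-step calculation.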
Beware of using the number of elements in an array as the index of the last element since Bash supports sparse arrays:
$ array=(1 22 333 4444 55555)
$ echo ${#array[@]}
5
$ array[9]=999999999
$ echo ${#array[@]}
6
$ echo ${array[${#array[@]} - 1]} # same as echo ${array[6 - 1]}
$ # only a newline is echoed since element 5 is empty (only if "nounset" option* is not set (default in most cases))
$ # when "nounset" option is set (possibly using command "set -u") then bash will print such error:
$ # bash: array[${#array[@]} - 1]: unbound variable
$ unset "array[1]" # always quote array elements when you unset them
$ echo ${#array[@]}
5
$ echo ${array[${#array[@]} - 1]} # same as echo ${array[5 - 1]}
55555
That was obviously not the last element. To get the last element:
$ echo ${array[@]: -1} # note the space before the minus sign
999999999
Note that in Bash 4.2 and later, you can do echo ${array[-1]} to get the last element. In versions prior to 4.2, you get a bad subscript error for negative subscripts.
To get the index of the last element:
$ idx=(${!array[@]})
$ echo ${idx[@]: -1}
9
Then you can do:
$ last=${idx[@]: -1}
$ echo ${array[last]}
999999999
To iterate over a sparse array:
for idx in ${!array[@]}
do
something_with ${array[idx]}
done
* I recommend avoiding nounset
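A concrete version of the sparse-array loop above, with made-up contents, showing that only the indices that actually exist are visited:

```bash
#!/bin/bash
array=(a b c)
array[9]=z          # indices are now 0 1 2 9
unset 'array[1]'    # indices are now 0 2 9

# ${!array[@]} expands to the existing indices, not to 0..N-1.
for idx in "${!array[@]}"; do
    printf '%s=%s\n' "$idx" "${array[idx]}"
done
```

The loop prints 0=a, 2=c and 9=z, skipping the removed and never-assigned positions.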
wc can tell you how many characters and bytes are in a variable, and bash itself can tell you how many elements are in an array. If what you're looking for is how large bash's internal structures are for holding a specific variable then I don't believe that's available anywhere.
$ foo=42
$ bar=(1 2 3 4)
$ echo -n "$foo" | wc -c -m
2 2
$ echo "${#bar[@]}"
4
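Characters and bytes only differ for multibyte text. The following sketch uses a UTF-8 encoded string written as explicit bytes; byte_len is a hypothetical helper that switches to the C locale, where one character is one byte:

```bash
#!/bin/bash
s=$'caf\xc3\xa9'    # "café" encoded as UTF-8: 4 characters, 5 bytes

# wc -c always counts bytes.
bytes=$(printf '%s' "$s" | wc -c)

# ${#1} counts characters in the current locale; in the C locale each
# byte is one character, so the result equals the byte count.
byte_len() { local LC_ALL=C; echo "${#1}"; }   # hypothetical helper

echo "$(( bytes ))"      # arithmetic expansion strips any padding wc adds
byte_len "$s"
```

Both commands report 5. In a UTF-8 locale, a plain ${#s} would typically report 4 instead.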
For a scalar variable, ${#VAR} gives you the length in characters. In a unibyte locale, this is the length in bytes. The size in bytes is the length of the name in bytes, plus the length of the value in bytes, plus a constant overhead.
LC_ALL=C
name=VAR
size=$((${#name} + ${#VAR})) # plus a small overhead
If the variable is exported, the size is roughly double.
LC_ALL=C
name=VAR
size=$(((${#name} + ${#VAR}) * 2)) # plus a small overhead
For an array variable, you need to sum up the lengths (again, in bytes) of the elements, and add a constant overhead per element plus a constant overhead for the array.
LC_ALL=C
name=VAR
size=$((${#name})) # plus a small overhead
for key in "${!VAR[@]}"; do
size=$((size + ${#key} + ${#VAR[$key]})) # plus a small overhead
done
Here's a minimally tested function that computes the approximate size occupied by a variable. Arrays and exports are taken into account, but not special read-only variables such as $RANDOM. The sizes have been observed on bash 4.2, different versions may have different overheads. You may need to adjust the constants depending on your system types and malloc implementation.
_sizeof_pointer=4
_sizeof_int=4
_malloc_granularity=16
_malloc_overhead=16
## Usage: compute_size VAR
## Print the amount of memory (in bytes) used by VAR.
## The extra bytes used by the memory allocator are not taken into account.
add_size () {
local IFS="+" this extra
set $(($1 + _malloc_overhead))
_size=$((_size + $1))
set $(($1 % _malloc_granularity))
[[ $1 -eq 0 ]] || _size=$((_size + _malloc_granularity - $1))
}
compute_size () {
local LC_ALL=C _size=0 _key
if eval "[ -z \${$1+1} ]"; then echo 0; return; fi
add_size $((_sizeof_pointer*5 + _sizeof_int*2)) # constant overhead
add_size ${#1} # the name
case $(declare -p $1) in
declare\ -x*)
eval "add_size \${#$1}" # the value
eval "add_size \$((\${#1} + \${#$1} + 2))" # the export string
;;
declare\ -a*)
eval 'for _key in "${!'$1'[@]}"; do
add_size $_key
add_size ${#'$1'[$_key]}
add_size $((_sizeof_pointer*4))
done'
add_size $((_sizeof_pointer*2 + _sizeof_int*2))
add_size $((_sizeof_pointer*4))
;;
*)
eval "add_size \${#$1}" # the value
;;
esac
echo $_size
}
${#VAR}
tells you the length of the string VAR

Resources