How can I generate a random number between 0 and 60 in sh (/bin/sh, not bash)? This is a satellite box: there is no $RANDOM variable, and none of the other usual goodies are available [cksum, od (od -vAn -N4 -tu4 < /dev/urandom)].
I want to randomize a crontab job's time.
If you have tr, head and /dev/urandom, you can write this:
tr -cd 0-9 </dev/urandom | head -c 3
Then use the remainder operator to bring the result into the 0-60 range.
How about using the nanoseconds of system time?
date +%N
It isn't like you need cryptographically useful numbers here.
Depending on which version of /bin/sh it is, you may be able to do:
echo $(( $(date +%N) % 60 ))
If it doesn't support the $(()) syntax, but you have dc, you could try:
dc -e `date +%N`' 60 % p'
Without knowing which operating system, version of /bin/sh or what
tools are available it is hard to come up with a solution guaranteed to work.
Do you have awk? You can call awk's rand() function, seeding it first with srand() so that it does not return the same value on every run. For instance:
awk 'BEGIN { srand(); printf("%d\n", rand()*60) }' < /dev/null
I know this post is old, but the suggested answers are not generating uniform unbiased random numbers. The accepted answer is essentially this:
% echo $(( $(tr -cd 0-9 </dev/urandom | head -c 3) % 60))
The problem with this suggestion is that by choosing a 3-digit number from /dev/urandom, the range is 0-999, a total of 1,000 numbers. However, 1,000 is not a multiple of 60 (1,000 = 16 × 60 + 40). The values 960-999 wrap around to the remainders 0-39, so you'll generate 0-39 slightly more often (17 chances in 1,000 each) than 40-59 (16 chances in 1,000 each).
The second answer, while creative in using nanoseconds from your clock, suffers from the same biased approach:
% echo $(( $(date +%N) % 60 ))
The range for nanoseconds is 0-999,999,999, which is 1 billion numbers. So, if you reduce that result modulo 60, you'll again be biased: 1,000,000,000 leaves a remainder of 40 when divided by 60, so the values 999,999,960-999,999,999 wrap around and make 0-39 slightly more likely than 40-59.
All the rest of the answers are the same: biased, non-uniform generation.
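As a quick sanity check of the bias argument, the following sketch tallies how often a few remainders occur when every 3-digit value 0-999 is reduced modulo 60. It assumes awk is available; the function name bias_counts is made up for illustration and is not part of any answer above.

```shell
# bias_counts: count how many of the values 0..999 land on selected
# remainders modulo 60 (hypothetical helper, for illustration only).
bias_counts() {
    awk 'BEGIN {
        for (n = 0; n < 1000; n++) count[n % 60]++
        printf "remainder 0: %d, remainder 39: %d, remainder 40: %d, remainder 59: %d\n",
               count[0], count[39], count[40], count[59]
    }'
}
bias_counts
```

The remainders 0 and 39 are each hit 17 times, while 40 and 59 are each hit 16 times, which is the small bias described above.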
To generate unbiased uniform random numbers in the range 0-59 (which is what I assume the asker means rather than 0-60, since he's attempting to randomize a crontab(1) entry), we need the number of possible raw values to be a multiple of 60.
First, we'll generate a random 32-bit number between 0 and 4294967295:
% RNUM=$(od -An -N4 -tu4 /dev/urandom | awk '{print $1}')
Now we'll restrict the range to [$MIN, 4294967295], chosen so that the number of values in it is a multiple of 60:
% MIN=$((2**32 % 60)) # 16
This means:
4294967296 - 16 = 4294967280
4294967280 / 60 = 71582788.0
In other words, my range of [16, 4294967295] contains exactly 4294967280 values, a multiple of 60. So every number I generate in that range, then reduce modulo 60, is as likely as any other. Thus, I have an unbiased generator of numbers 0-59 (or 1-60 if you add 1).
The only thing left to do is make sure that my number is between 16 and 4294967295. If my number is less than 16, then I'll need to generate a new number:
% while [ "$RNUM" -lt "$MIN" ]; do RNUM=$(od -An -N4 -tu4 /dev/urandom | awk '{print $1}'); done
% MINUTE=$(($RNUM % 60))
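Everything put together for copy/paste goodness:
#!/bin/bash
# Draw a 32-bit unsigned integer from /dev/urandom.
RNUM=$(od -An -N4 -tu4 /dev/urandom | awk '{print $1}')
# Reject values below 2^32 % 60 (= 16) so the accepted range is a multiple of 60.
MIN=$((2**32 % 60))
while [ "$RNUM" -lt "$MIN" ]; do RNUM=$(od -An -N4 -tu4 /dev/urandom | awk '{print $1}'); done
MINUTE=$((RNUM % 60))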
value=`od -An -N2 -tu2 /dev/urandom`
minutes=`expr $value % 60`
The value will be between 0 and 65535. Since 65536 is not a multiple of 60, minutes 0-15 have a slightly greater chance of being chosen, but the discrepancy is probably not important.
If you want to achieve perfection, use "od -An -N1 -tu1" and loop until value is less than 240.
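A minimal sketch of that loop in plain POSIX sh, assuming od and /dev/urandom are available (as on the busybox system mentioned below). The function name rand_minute is hypothetical.

```shell
# rand_minute: print a uniformly distributed integer in 0..59
# (hypothetical helper name; rejection sampling on a single random byte).
rand_minute() {
    while :; do
        # One unsigned byte, 0..255; the arithmetic expansion strips od's padding.
        value=$(( $(od -An -N1 -tu1 /dev/urandom) ))
        # Accept only 0..239 (240 = 4 * 60) and reject 240..255.
        if [ "$value" -lt 240 ]; then
            echo $((value % 60))
            return 0
        fi
    done
}
```

Calling minutes=$(rand_minute) then yields a value that could be used in the crontab entry. On average only 16 out of 256 bytes are rejected, so the loop rarely runs more than once.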
Tested with busybox od.
Beware of errors when the generated number starts with 0 and contains digits greater than 7, because it is then interpreted as octal. I would propose stripping the leading zeros:
tr -cd 0-9 </dev/urandom | head -c 4 | sed -e 's/^00*//'
especially if you want to process it any further, for example to establish a range:
RANDOM=`tr -cd 0-9 </dev/urandom | head -c 4 | sed -e 's/^00*//'`
RND50=$(((${RANDOM:-0}%50)+1)) # random number between 1 and 50; :-0 covers the all-zeros case
I have a file with an unknown number of lines (but always an even number). I want to print them side by side based on the total number of lines in that file. For example, I have a file with 16 lines like below:
asdljsdbfajhsdbflakjsdff235
asjhbasdjbfajskdfasdbajsdx3
asjhbasdjbfajs23kdfb235ajds
asjhbasdjbfajskdfbaj456fd3v
asjhbasdjb6589fajskdfbaj235
asjhbasdjbfajs54kdfbaj2f879
asjhbasdjbfajskdfbajxdfgsdh
asjhbasdf3709ddjbfajskdfbaj
100
100
150
125
trh77rnv9vnd9dfnmdcnksosdmn
220
225
sdkjNSDfasd89asdg12asdf6asdf
So now I want to print them side by side. As there are 16 lines in total, I am trying to get the result 8:8, like below:
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
The paste command did not work for me exactly (paste - - - - - - - - < file1), nor did the awk command that I used: awk '{printf "%s" (NR%2==0?RS:FS),$1}'
Note: The number of lines in the file is dynamic. The only thing known in my scenario is that it is always an even number.
If you have the memory to hash the whole file ("max" below):
$ awk '{
    a[NR]=$0                     # hash all the records
}
END {                            # after hashing
    mid=int(NR/2)                # compute the midpoint, int in case NR is uneven
    for(i=1;i<=mid;i++)          # iterate from start to midpoint
        print a[i],a[mid+i]      # output
}' file
If you have the memory to hash half of the file ("mid"):
$ awk '
NR==FNR {                            # on 1st pass hash second half of records
    if(FNR>1) {                      # we dont need the 1st record ever
        a[FNR]=$0                    # hash record
        if(FNR%2)                    # if odd record
            delete a[int(FNR/2)+1]   # remove one from the past
    }
    next
}
FNR==1 {                             # on the start of 2nd pass
    if(NR%2==0)                      # if record count is uneven
        exit                         # exit as there is always even count of them
    offset=int((NR-1)/2)             # compute offset to the beginning of hash
}
FNR<=offset {                        # only process the 1st half of records
    print $0,a[offset+FNR]           # output one from file, one from hash
    next
}
{                                    # once 1st half of 2nd pass is finished
    exit                             # just exit
}' file file                         # notice filename twice
And finally, if you have awk compiled into a worm's brain (i.e. not much memory, "min"):
$ awk '
NR==FNR {                                          # just get the NR of 1st pass
    next
}
FNR==1 {
    mid=(NR-1)/2                                   # get the midpoint
    file=FILENAME                                  # filename for getline
    while(++i<=mid && (getline line < file)>0);    # jump getline to mid
}
{
    if((getline line < file)>0)                    # getline read from mid+FNR
        print $0,line                              # output
}' file file                                       # notice filename twice
Standard disclaimer on getline and no real error control implemented.
Performance:
I ran seq 1 100000000 > file and tested how the above solutions performed. Output went to /dev/null; writing it to a file took around 2 s longer. The performance of max is so-so, as its memory footprint was 88 % of my 16 GB, so it might have swapped. Well, I killed all the browsers and shaved 7 seconds off the real time of max.
+-------+----------+-----------+-----------+
| which | min      | mid       | max       |
+-------+----------+-----------+-----------+
| real  | 1m7.027s | 1m30.146s | 0m48.405s |
| user  | 1m6.387s | 1m27.314s | 0m43.801s |
| sys   | 0m0.641s | 0m2.820s  | 0m4.505s  |
+-------+----------+-----------+-----------+
| mem   | 3 MB     | 6.8 GB    | 13.5 GB   |
+-------+----------+-----------+-----------+
Update:
I tested @DavidC.Rankin's and @EdMorton's solutions and they ran, respectively:
real 0m41.455s
user 0m39.086s
sys 0m2.369s
and
real 0m39.577s
user 0m37.037s
sys 0m2.541s
The memory footprint was about the same as for my mid. It pays to use wc, it seems.
$ pr -2t file
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
If you want just one space between columns, change it to
$ pr -2ts' ' file
You can also do it with awk simply by storing the first-half of the lines in an array and then concatenating the second half to the end, e.g.
awk -v nlines=$(wc -l < file) -v j=0 'FNR<=nlines/2{a[++i]=$0; next} j<i{print a[++j],$1}' file
Example Use/Output
With your data in file, then
$ awk -v nlines=$(wc -l < file) -v j=0 'FNR<=nlines/2{a[++i]=$0; next} j<i{print a[++j],$1}' file
asdljsdbfajhsdbflakjsdff235 100
asjhbasdjbfajskdfasdbajsdx3 100
asjhbasdjbfajs23kdfb235ajds 150
asjhbasdjbfajskdfbaj456fd3v 125
asjhbasdjb6589fajskdfbaj235 trh77rnv9vnd9dfnmdcnksosdmn
asjhbasdjbfajs54kdfbaj2f879 220
asjhbasdjbfajskdfbajxdfgsdh 225
asjhbasdf3709ddjbfajskdfbaj sdkjNSDfasd89asdg12asdf6asdf
Extract the first half of the file and the last half of the file and merge the lines:
paste <(head -n $(($(wc -l <file.txt)/2)) file.txt) <(tail -n $(($(wc -l <file.txt)/2)) file.txt)
You can use columns utility from autogen:
columns -c2 --by-columns file.txt
You can use column, but it derives the number of output columns in a strange way from the width of your terminal. So, assuming your lines have 28 characters, you can also use:
column -c $((28*2+8)) file.txt
I do not want to solve this, but if I were you:
wc -l file.txt
gives number of lines
echo $(($(wc -l < file.txt)/2))
gives a half
head -n $(($(wc -l < file.txt)/2)) file.txt > first.txt
tail -n $(($(wc -l < file.txt)/2)) file.txt > last.txt
This creates files with the first half and the last half of the original file. Now you can merge those files together side by side, as described here.
Here is my take on it, using the bash shell, wc(1) and ed(1):
#!/usr/bin/env bash
array=()
file=$1
total=$(wc -l < "$file")
half=$(( total / 2 ))
plus1=$(( half + 1 ))
for ((m=1;m<=half;m++)); do
array+=("${plus1}m$m" "${m}"'s/$/ /' "${m}"',+1j')
done
After all of that, if you just want to print the output to stdout, add the line below to the script.
printf '%s\n' "${array[@]}" ,p Q | ed -s "$file"
If you want to write the changes directly to the file itself, add this line to the script instead.
printf '%s\n' "${array[@]}" w | ed -s "$file"
Here is an example.
printf '%s\n' {1..10} > file.txt
Now running the script against that file.
./myscript file.txt
Output
1 6
2 7
3 8
4 9
5 10
Or using bash4+ feature mapfile aka readarray
Save the file in an array named array.
mapfile -t array < file.txt
Split the array into two halves.
left=("${array[@]::((${#array[@]} / 2))}") right=("${array[@]:((${#array[@]} / 2 ))}")
Loop and print side by side.
for i in "${!left[@]}"; do
printf '%s %s\n' "${left[i]}" "${right[i]}"
done
Since you said the only known thing in your scenario is that the number of lines is always even, this solution should work.
(Similar to How to interleave lines from two text files but for a single input. Also similar to Sort lines by group and column but interleaving or randomizing versus sorting.)
I have a set of systems and tasks in two columns, SYSTEM,TASK:
alpha,90198500
alpha,93082105
alpha,30184438
beta,21700055
beta,33452909
beta,40850198
beta,82645731
gamma,64910850
I want to distribute the tasks to each system in a balanced way. The ideal case where each system has the same number of tasks would be round-robin, one alpha then one beta then one gamma and repeat until finished.
I get the whole list of tasks + systems at once, so I don't need to keep any state
The list of systems is not static, on the order of N=100
The total number of tasks is variable, on the order of N=500
The number of tasks for each system is not guaranteed to be equal
Hard / absolute interleaving isn't required, as long as the same system doesn't appear twice in a row
The same task may show up more than once, but not for the same system
Input format / delimiter can be changed
I can solve this well enough with some fancy scripting to split the data into multiple files (grep ^alpha, input > alpha.txt etc) and then recombine them with paste or similar, but I'd like to use a single command or set of pipes to run it without intermediate files or a proper scripting language. Just using sort -R gets me 95% of the way there, but I end up with 2 tasks for the same system in a row almost every time, and sometimes 3 or more depending on the initial distribution.
edit:
To clarify, any output should not have the same system on two lines in a row. All system,task pairs must be preserved, you can't move a task from one system to another - that'd make this really easy!
One of several possible sample outputs:
beta,40850198
alpha,90198500
beta,82645731
alpha,93082105
gamma,64910850
beta,21700055
alpha,30184438
beta,33452909
We start by answering the underlying theoretical problem. The problem is not as simple as it seems. Feel free to implement a script based on this answer.
The blocks formatted as quotes are not quotes. I just wanted to highlight them to improve navigation in this rather long answer.
Theoretical Problem
Given a finite set of letters L with frequencies f : L→ℕ0, find a sequence of letters such that every letter ℓ appears exactly f(ℓ) times and adjacent elements of the sequence are always different.
Example
L = {a,b,c} with f(a)=4, f(b)=2, f(c)=1
ababaca, acababa, and abacaba are all valid solutions.
aaaabbc is invalid – Some adjacent elements are equal, for instance aa or bb.
ababac is invalid – The letter a appears 3 times, but its frequency is f(a)=4
cababac is invalid – The letter c appears 2 times, but its frequency is f(c)=1
Solution
The following approach produces a valid sequence if and only if there exists a solution.
Sort the letters by their frequencies.
For ease of notation we assume, without loss of generality, that f(a) ≥ f(b) ≥ f(c) ≥ ... ≥ 0.
Note: There exists a solution if and only if f(a) ≤ 1 + ∑ℓ≠a f(ℓ).
Write down a sequence s of f(a) many a.
Add the remaining letters into a FIFO working list, that is:
(Don't add any a)
First add f(b) many b
Then f(c) many c
and so on
Iterate from left to right over the sequence s and insert after each element a letter from the working list. Repeat this step until the working list is empty.
Example
L = {a,b,c,d} with f(a)=5, f(b)=5, f(c)=4, f(d)=2
The letters are already sorted by their frequencies.
s = aaaaa
workinglist = bbbbbccccdd. The leftmost entry is the first one.
We iterate from left to right. The places where we insert letters from the working list are marked with an _ underscore.
s = a_a_a_a_a_ workinglist = bbbbbccccdd
s = aba_a_a_a_ workinglist = bbbbccccdd
s = ababa_a_a_ workinglist = bbbccccdd
...
s = ababababab workinglist = ccccdd
⚠️ We reached the end of sequence s. We repeat step 4.
s = a_b_a_b_a_b_a_b_a_b_ workinglist = ccccdd
s = acb_a_b_a_b_a_b_a_b_ workinglist = cccdd
...
s = acbcacb_a_b_a_b_a_b_ workinglist = cdd
s = acbcacbca_b_a_b_a_b_ workinglist = dd
s = acbcacbcadb_a_b_a_b_ workinglist = d
s = acbcacbcadbda_b_a_b_ workinglist =
⚠️ The working list is empty. We stop.
The final sequence is acbcacbcadbdabab.
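The steps above can be sketched with sort and awk. This is only an illustration of the working-list procedure, not the implementation given below: the function name arrange and the "letter count" input format are assumptions, and the sketch does not check whether a solution exists.

```shell
# arrange: read "letter count" pairs on stdin and print one sequence built
# with the working-list procedure (hypothetical helper, no validity check).
arrange() {
    # Step 1: sort by frequency, highest first; ties are broken by name.
    sort -k2,2nr -k1,1 | awk '
        # Step 2: the most frequent letter forms the initial sequence s.
        NR == 1 { for (i = 1; i <= $2; i++) s[++n] = $1; next }
        # Step 3: all other letters go into the FIFO working list w.
        { for (i = 1; i <= $2; i++) w[++m] = $1 }
        END {
            head = 1
            # Step 4: insert one working-list letter after each element of s;
            # repeat the pass until the working list is empty.
            while (head <= m) {
                k = 0
                for (i = 1; i <= n; i++) {
                    t[++k] = s[i]
                    if (head <= m) t[++k] = w[head++]
                }
                for (i = 1; i <= k; i++) s[i] = t[i]
                n = k
            }
            for (i = 1; i <= n; i++) printf "%s", s[i]
            print ""
        }'
}
printf '%s\n' 'a 5' 'b 5' 'c 4' 'd 2' | arrange
```

For the frequencies of the example this prints acbcacbcadbdabab, the same sequence as derived by hand.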
Implementation In Bash
Here is a bash implementation of the proposed approach that works with your input format. Instead of using a working list each line is labeled with a binary floating point number specifying the position of that line in the final sequence. Then the lines are sorted by their labels. That way we don't have to use explicit loops. Intermediate results are stored in variables. No files are created.
#! /bin/bash
inputFile="$1" # replace $1 by your input file or call "./thisScript yourFile"
inputBySys="$(sort "$inputFile")"
sysFreqBySys="$(cut -d, -f1 <<< "$inputBySys" | uniq -c | sed 's/^ *//;s/ /,/')"
inputBySysFreq="$(join -t, -1 2 -2 1 <(echo "$sysFreqBySys") <(echo "$inputBySys") | sort -t, -k2,2nr -k1,1)"
maxFreq="$(head -n1 <<< "$inputBySysFreq" | cut -d, -f2)"
lineCount="$(wc -l <<< "$inputBySysFreq")"
increment="$(awk '{l=log($1/$2)/log(2); l=int(l)-(int(l)>l); print 2^l}' <<< "$maxFreq $lineCount")"
seq="$({ echo obase=2; seq 0 "$increment" "$maxFreq" | head -n-1; } | bc |
awk -F. '{sub(/0*$/,"",$2); print 0+$1 "," $2 "," length($2)}' |
sort -snt, -k3,3 -k2,2 | head -n "$lineCount")"
paste -d, <(echo "$seq") <(echo "$inputBySysFreq") | sort -nt, -k1,1 -k2,2 | cut -d, -f4,6
This solution could fail for very long input files due to the limited precision of floating point numbers in seq and awk.
Well, this is what I've come up with:
args=()
while IFS=' ' read -r _ name; do
# add a file redirection with grepped certain SYSTEM only for later eval
args+=("<(grep '^$name,' file)")
done < <(
# extract SYSTEM only
<file cut -d, -f1 |
#sort with the count
sort | uniq -c | sort -nr
)
# this is actually safe, because we control all arguments
eval paste -d "'\\n'" "${args[@]}" |
# paste will insert empty lines when the list ended - remove them
sed '/^$/d'
First, I extract the SYSTEM names and sort them so that the one that occurs most often comes first. So for the input example we get:
4 beta
3 alpha
1 gamma
Then for each such name I add the proper string <(grep '...' file) to the argument list, which will be evaluated later.
Then I evaluate the call to paste <(grep ...) <(grep ...) <(grep ...) ... with a newline as paste's delimiter. I remove empty lines with a simple sed call.
The output for the input provided:
beta,21700055
alpha,90198500
gamma,64910850
beta,33452909
alpha,93082105
beta,40850198
alpha,30184438
beta,82645731
Converted to a fancy one-liner, replacing the while read loop with command substitution and sed. It is made safe with respect to the input file name by using printf "%q" "$inputfile" and double quoting inside the sed regex.
inputfile="file"
fieldsep=","
eval paste -d '"\\n"' "$(
cut -d "$fieldsep" -f1 "$inputfile" |
sort | uniq -c | sort -nr |
sed 's/^[[:space:]]*[0-9]\+[[:space:]]*\(.*\)$/<(grep '\''^\1'"$fieldsep"\'' "'"$(printf "%q" "$inputfile")"'")/' |
tr '\n' ' '
)" |
sed '/^$/d'
inputfile="inputfile"
fieldsep=","
# remember SYSTEMS with their occurrence counts
counts=$(cut -d "$fieldsep" -f1 "$inputfile" | sort | uniq -c)
# remember last outputted system name
lastsys=''
# loop while there are any systems with counts left
while ((${#counts})); do
# get the most frequent system with its count from counts
IFS=' ' read -r cnt sys < <(
# if lastsys is empty, don't do anything, if not, filter it out
if [ -n "$lastsys" ]; then
grep -v " $lastsys$";
else
cat;
# ha, surprise - counts is fed in here!
# probably would be way more readable with just `printf "%s" "$counts" |`
fi <<<"$counts" |
# with the highest occurrence count
sort -n | tail -n1
)
if [ -z "$cnt" ]; then
echo "ERROR: constructing output is not possible! There have to be duplicate system lines!" >&2
exit 1
fi
# update counts - decrement the count of this system, or remove it if count is 1
counts=$(
# remove current system from counts
<<<"$counts" grep -v " $sys$"
# if the count of the system is 1, don't add it back - its count is now 0
if ((cnt > 1)); then
# decrement count and add the line with system to counts
printf "%s" "$((cnt - 1)) $sys"
fi
)
# finally print output
printf "%s\n" "$sys"
# and remember last system
lastsys="$sys"
done |
{
# get system names only in `systems` - using cached counts variable
# for each system name open a grep for that name from the input file
# with an assigned file descriptor
# The file descriptor list is saved in an array `fds`
fds=()
systems=""
while IFS=' ' read -r _ sys; do
exec {fd}< <(grep "^$sys," "$inputfile")
fds+=("$fd")
systems+="$sys"$'\n'
done <<<"$counts"
# for each system name coming from the loop above
while IFS='' read -r sys; do
# get the position inside systems list of that system decremented by 1
# this will be the index of the file descriptor that filters that system out of input
# (-x -F: match the whole name literally, so "alpha" does not also match "alpha2")
fds_idx=$(<<<"$systems" grep -nxF -- "$sys" | cut -d: -f1)
fds_idx=$((fds_idx - 1))
# read one line from that file descriptor
# I wonder if `sed 1p` would be faster
IFS='' read -r -u "${fds[$fds_idx]}" line
# output that line
printf "%s\n" "$line"
done
}
To accommodate strange input values, this script implements a somewhat simple but hardy state machine in bash.
The variable counts stores the SYSTEM names with their occurrence counts. So for the example input it will be
3 alpha
4 beta
1 gamma
Now we output the SYSTEM name with the biggest occurrence count that is also different from the last outputted SYSTEM name. We decrement its occurrence count. If the count reaches zero, the name is removed from the list. We remember the last outputted SYSTEM name. We repeat this process until all occurrence counts reach zero, so the list is empty. For the example input this will output:
beta
alpha
beta
alpha
beta
alpha
beta
gamma
Now we need to join that list with the job names. We can't use join, as the input is not sorted and we don't want to change the ordering. So I take only the SYSTEM names into systems. Then for each system I open a separate file descriptor that reads the input file filtered down to that SYSTEM name. All the file descriptors are stored in an array. Then for each SYSTEM name in the generated order, I find the matching file descriptor and read exactly one line from it. This works like an array of file positions, each one associated with a specific SYSTEM name.
beta,21700055
alpha,90198500
beta,33452909
alpha,93082105
beta,40850198
alpha,30184438
beta,82645731
gamma,64910850
The script was written so that for input in the form of:
alpha,90198500
alpha,93082105
alpha,30184438
beta,21700055
gamma,64910850
the script outputs correctly:
alpha,90198500
gamma,64910850
alpha,93082105
beta,21700055
alpha,30184438
I think this algorithm will almost always print correct output, but the ordering is such that the least common SYSTEMs are output last, which may not be optimal.
Tested manually with some custom tests and checker on paiza.io.
inputfile="inputfile"
in=( 1 2 1 5 )
cat <<EOF > "$inputfile"
$(seq ${in[0]} | sed 's/^/A,/' )
$(seq ${in[1]} | sed 's/^/B,/' )
$(seq ${in[2]} | sed 's/^/C,/' )
$(seq ${in[3]} | sed 's/^/D,/' )
EOF
sed -i -e '/^$/d' "$inputfile"
inputfile="inputfile"
fieldsep=","
# remember SYSTEMS with their occurrence counts
counts=$(cut -d "$fieldsep" -f1 "$inputfile" | sort | uniq -c)
# I think this holds true
# The SYSTEM with the most count should be lower than the sum of all others
# remember last outputted system name
lastsys=''
# loop while there are any systems with counts left
while ((${#counts})); do
# get the most frequent system with its count from counts
IFS=' ' read -r cnt sys < <(
# if lastsys is empty, don't do anything, if not, filter it out
if [ -n "$lastsys" ]; then
grep -v " $lastsys$";
else
cat;
# ha, surprise - counts is fed in here!
# probably would be way more readable with just `printf "%s" "$counts" |`
fi <<<"$counts" |
# with the highest occurrence count
sort -n | tail -n1
)
if [ -z "$cnt" ]; then
echo "ERROR: constructing output is not possible! There have to be duplicate system lines!" >&2
exit 1
fi
# update counts - decrement the count of this system, or remove it if count is 1
counts=$(
# remove current system from counts
<<<"$counts" grep -v " $sys$"
# if the count of the system is 1, don't add it back - its count is now 0
if ((cnt > 1)); then
# decrement count and add the line with system to counts
printf "%s" "$((cnt - 1)) $sys"
fi
)
# finally print output
printf "%s\n" "$sys"
# and remember last system
lastsys="$sys"
done |
{
# get system names only in `systems` - using cached counts variable
# for each system name open a grep for that name from the input file
# with an assigned file descriptor
# The file descriptor list is saved in an array `fds`
fds=()
systems=""
while IFS=' ' read -r _ sys; do
exec {fd}< <(grep "^$sys," "$inputfile")
fds+=("$fd")
systems+="$sys"$'\n'
done <<<"$counts"
# for each system name coming from the loop above
while IFS='' read -r sys; do
# get the position inside systems list of that system decremented by 1
# this will be the index of the file descriptor that filters that system out of input
# (-x -F: match the whole name literally, so "alpha" does not also match "alpha2")
fds_idx=$(<<<"$systems" grep -nxF -- "$sys" | cut -d: -f1)
fds_idx=$((fds_idx - 1))
# read one line from that file descriptor
# I wonder if `sed 1p` would be faster
IFS='' read -r -u "${fds[$fds_idx]}" line
# output that line
printf "%s\n" "$line"
done
} |
{
# check if the output is correct
output=$(cat)
# output should have same lines as inputfile
if ! cmp <(sort "$inputfile") <(<<<"$output" sort); then
echo "Output does not match input!" >&2
exit 1
fi
# two consecutive lines can't have the same system
lastsys=""
<<<"$output" cut -d, -f1 |
while IFS= read -r sys; do
if [ -n "$lastsys" -a "$lastsys" = "$sys" ]; then
echo "Same systems found on two consecutive lines!" >&2
exit 1
fi
lastsys="$sys"
done || exit 1
# all ok
echo "all ok!"
echo -------------
printf "%s\n" "$output"
}
exit
How can I get the value of up from the output of the command below on Linux?
# w
01:16:08 up 20:29, 1 user, load average: 0.50, 0.34, 0.30
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 00:57 0.00s 0.11s 0.02s w
# w | grep up
01:16:17 up 20:29, 1 user, load average: 0.42, 0.33, 0.29
On Linux, the easiest way to get the uptime in (fractional) seconds is via the 1st field of /proc/uptime (see man proc):
$ cut -d ' ' -f1 /proc/uptime
350735.47
To format that number the same way that w and uptime do, using awk:
$ awk '{s=int($1);d=int(s/86400);h=int(s % 86400/3600);m=int(s % 3600 / 60);
printf "%d days, %02d:%02d\n", d, h, m}' /proc/uptime
4 days, 01:25 # 4 days, 1 hour, and 25 minutes
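The same arithmetic can be wrapped in a small function that takes the seconds value as an argument, which makes it easy to check against the sample value above. This is a sketch only; the name format_uptime is hypothetical.

```shell
# format_uptime SECONDS: print "D days, HH:MM" for a (possibly fractional)
# number of seconds (hypothetical helper name).
format_uptime() {
    awk -v secs="$1" 'BEGIN {
        s = int(secs)                    # drop the fractional part
        d = int(s / 86400)               # whole days
        h = int((s % 86400) / 3600)      # hours within the current day
        m = int((s % 3600) / 60)         # minutes within the current hour
        printf "%d days, %02d:%02d\n", d, h, m
    }'
}
format_uptime 350735.47
```

With the sample value 350735.47 this prints 4 days, 01:25. On Linux it could be fed from the first field of /proc/uptime, for example format_uptime "$(cut -d ' ' -f1 /proc/uptime)".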
To answer the question as asked, here is how to parse the output of w (or uptime, whose output is the same as the first line of w's output and contains all the information of interest). This also works on macOS/BSD, with a granularity of whole minutes:
A perl solution:
<(uptime) is a Bash process substitution that provides uptime's output as input to the perl command - see bottom.
$ perl -nle 'print for / up +((?:\d+ days?, +)?[^,]+)/' <(uptime)
4 days, 01:25
This assumes that days is the largest unit ever displayed.
perl -nle tells Perl to process the input line by line, without printing any output by default (-n), automatically stripping the trailing newline from each input line on input, and automatically appending one on output (-l); -e tells Perl to treat the next argument as the script (expression) to process.
print for /.../ tells Perl to output what each capture group (...) inside regex /.../ captures.
up + matches literal up, preceded by (at least) one space and followed by 1 or more spaces (+)
(?:\d+ days?, +)? is a non-capturing subexpression - due to ?: - that matches:
1 or more digits (\d+)
followed by a single space
followed by literal day, optionally followed by a literal s (s?)
the trailing ? makes the entire subexpression optional, given that a number-of-days part may or may not be present.
[^,]+ matches 1 or more (+) subsequent characters up to, but not including a literal , ([^,]) - this is the hh:mm part.
The overall capture group, the outer (...), therefore captures the entire up-time expression, whether composed of hh:mm only or preceded by <n> day(s), and prints that.
<(uptime) is a Bash process substitution (<(...))
that, loosely speaking, presents uptime's output as a (temporary, self-deleting) file that perl can read as a file argument.
Something like this with gnu sed:
$ w |head -n1
02:06:19 up 3:42, 1 user, load average: 0.01, 0.05, 0.13
$ w |sed -r '1 s/.*up *(.*),.*user.*/\1/g;q'
3:42
$ echo "18:35:23 up 18 days, 9:08, 6 users, load average: 0.09, 0.31, 0.41" \
|sed -r '1 s/.*up *(.*),.*user.*/\1/g;q'
18 days, 9:08
Given that the format of the uptime depends on whether it is less or more than 24 hours, the best I could come up with is a double awk:
$ w
18:35:23 up 18 days, 9:08, 6 users,...
$ w | awk -F 'user|up ' 'NF > 1 {print $2}' \
| awk -F ',' '{for(i = 1; i < NF; i++) {printf("%s ",$i)}} END{print ""}'
18 days 9:08
I know this example of sar, sar -u 1 3, which gives statistics for the next 3 seconds at a 1-second interval.
However, sar also keeps collecting information in the background (my cron is set to collect stats every minute). Is there any way I can simply query sar for the statistics of the last 5 minutes and their average?
Right now I am using the following command
interval=5; sar -f /var/log/sysstat/sa22 | tail -n $interval | head -n -1 | awk '{print $4+$6}'| awk -v n="$interval" '{s+=$1} END {print s/n}'
to check the overall CPU usage in the last 5 minutes.
Is there a better way?
Unfortunately, when using the -f option in sar together with interval and count, it doesn't return the average value for the given interval (as you would expect). Instead it always returns the first recorded value in the sar file.
The only way to work around that is to use the -s option which allows you to specify a time at which to start your sampling period. I've provided a perl script below that finishes with a call to sar that is constructed in a way that will return what you're looking for.
Hope this helps.
Peter Rhodes.
#!/usr/bin/perl
$interval = 300; # seconds.
$epoch = `date +%s`;
$epoch -= $interval;
$time = `date -d \@$epoch +%H:%M:00`;
$dom = `date +%d`;
chomp($time,$dom);
system("sar -f /var/log/sysstat/sa$dom -B -s $time 300 1");
I am trying to write a shell program to determine the average word length in a file. I'm assuming I need to use wc and expr somehow. Guidance in the right direction would be great!
Assuming your file is ASCII and wc can indeed read it...
chars=$(cat inputfile | wc -c)
words=$(cat inputfile | wc -w)
Then a simple
avg_word_size=$(( ${chars} / ${words} ))
will calculate a (rounded) integer. But it will be "more wrong" than the rounding error alone: you'll have included all whitespace characters in your average word size as well. And I assume you want to be more precise...
The following will give you some increased precision by calculating the rounded integer from a number that is multiplied by 100:
_100x_avg_word_size=$(( $((${chars} * 100)) / ${words} ))
Now we can use that for telling the world:
echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"
To further refine, we could assume that only 1 whitespace character is separating words:
chars=$(cat inputfile | wc -c)
words=$(cat inputfile | wc -w)
avg_word_size=$(( $(( ${chars} - $(( ${words} - 1 )) )) / ${words} ))
_100x_avg_word_size=$(( $(( $(( ${chars} - $(( ${words} - 1 )) )) * 100 )) / ${words} ))
echo "Average word size is: ${avg_word_size}.${_100x_avg_word_size: -2:2}"
Now it's your job to try and include the concept of 'lines' into your computations... :-)
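One way to sidestep the whitespace bookkeeping entirely is to strip all whitespace (including newlines) before counting bytes and to let awk do the division in floating point. The following is a sketch under the assumption that a "word" is whatever wc -w counts; the function name avg_word_length is hypothetical.

```shell
# avg_word_length FILE: mean number of non-whitespace bytes per word,
# printed with two decimals (hypothetical helper name).
avg_word_length() {
    # Bytes that are not whitespace (spaces, tabs, newlines are removed first).
    chars=$(tr -d '[:space:]' < "$1" | wc -c)
    # Words as counted by wc, i.e. whitespace-separated tokens.
    words=$(wc -w < "$1")
    awk -v c="$chars" -v w="$words" 'BEGIN {
        if (w > 0) printf "%.2f\n", c / w
        else print "0.00"               # avoid dividing by zero on empty input
    }'
}
```

For a file containing the two lines "one three" and "five", this counts 12 non-whitespace bytes and 3 words and prints 4.00.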
Update: to show clearly (hopefully) the difference between wc and this method; also fixed a "too-many-newlines" bug and added finer control of apostrophes in word endings.
If you want to consider a word as being a bash word, then using wc alone is fine.
However, if you want to consider a word as a word in a spoken/written language, then you can't use wc for the word parsing.
E.g. wc considers the following to contain 1 word (of size average = 112.00),
whereas the script below shows it to contain 19 words (of size average = 4.58)
"/home/axiom/zap_notes/apps/eng-hin-devnag-itrans/Platt's_Urdu_and_classical_Hindi_to_English_-_preface5.doc't"
Using Kurt's script, the following line is shown to contain 7 words (of size average = 8.14),
whereas the script presented below shows it to contain 7 words (of size average = 4.43) ...बे = 2 chars
"बे = {Platts} ... —be-ḵẖẉabī, s.f. Sleeplessness:"
So, if wc is your flavour, good, and if not, something like this may suit:
# Cater for special situation words: eg 's and 't
# Convert each group of anything which isn't a "character" (including '_') into a newline.
# Then, convert each CHARACTER which isn't a newline into a BYTE (not character!).
# This leaves one 'word' per line, each 'word' being made up of the same BYTE ('x').
#
# Without any options, wc prints newline, word, and byte counts (in that order),
# so we can capture all 3 values in a bash array
#
# Use `awk` as a floating point calculator (bash can only do integer arithmetic)
count=($(sed "s/\>'s\([[:punct:]]\|$\)/\1/g # ignore apostrophe-s ('s) word endings
s/'t\>/xt/g # consider words ending in apostrophe-t ('t) as base word + 2 characters
s/[_[:digit:][:blank:][:punct:][:cntrl:]]\+/\n/g
s/^\n*//; s/\n*$//; s/[^\n]/x/g" "$file" | wc))
echo "chars / word average:" \
$(awk -vnl=${count[0]} -vch=${count[2]} 'BEGIN{ printf( "%.2f\n", (ch-nl)/nl ) }')