I'm running a shell script in Linux Mint that calls some processes, each taking a few minutes.
For each process I want to echo a message like this:
echo "Cleaning temporary files... X seconds."
myprocess
where X is the elapsed time so far, and I would like it to update every second without printing a new line.
Is there a good way to do that? I have only found ways to print the total time at the end, not the elapsed time while the process is running.
Use this at the beginning of your script: it creates a subprocess which runs in the background and keeps updating the status.
file=$(mktemp)
progress() {
    pc=0
    while [ -e "$file" ]
    do
        echo -ne "$pc sec\033[0K\r"
        sleep 1
        ((pc++))
    done
}
progress &
# Do all the necessary stuff
# Now, when everything is done
rm -f "$file"
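For the exact message in the question, here is a minimal variant of the same idea that passes the label as an argument (myprocess stands for your long-running command):
file=$(mktemp)
progress() {
    local msg=$1 pc=0
    while [ -e "$file" ]; do
        echo -ne "$msg $pc seconds.\033[0K\r"
        sleep 1
        ((pc++))
    done
}
progress "Cleaning temporary files..." &
myprocess          # your long-running job
rm -f "$file"      # removing the flag file stops the progress loop
echo               # move past the status line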
You'll have to run the process in the background with &, otherwise the rest of the script will wait until it finishes. Use backspaces to overwrite the current line, and make sure you don't print any newlines.
So, to do what you want:
myproc &
myPid=$!   # save process id
sec=0
tmp=""
while true; do
    if kill -0 "$myPid" 2>/dev/null; then   # if the process accepts a signal, keep waiting
        for ((i = 0; i < ${#tmp}; i++)); do
            printf '\b'                     # print backspaces until we have cleared the previous line
        done
        tmp=$(printf 'Cleaning temp files... %d seconds.' "$sec")
        printf '%s' "$tmp"
        ((sec++))
    else
        break                               # drop out of the while loop
    fi
    sleep 1
done
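A shorter sketch of the same loop using bash's builtin SECONDS counter (which bash increments automatically every second) instead of a manual counter:
myproc &
myPid=$!
SECONDS=0   # bash's builtin counter; assigning resets it, then it ticks up each second
while kill -0 "$myPid" 2>/dev/null; do
    printf '\rCleaning temp files... %d seconds.' "$SECONDS"
    sleep 1
done
printf '\n'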
You can run each command with time:
time <command>
and then use sed/awk to extract the elapsed time.
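For example, a sketch of the extraction (this assumes bash's builtin time, whose output format differs between shells, so adjust the pattern accordingly):
# `time` writes to stderr, so capture stderr of the grouped command
elapsed=$( { time sleep 2; } 2>&1 | awk '/^real/ { print $2 }' )
echo "elapsed: $elapsed"   # e.g. 0m2.003s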
Here is a way to have awk print on STDERR every second.
You just need to add two things:
when myprocess is over, create a file /tmp/SOMETHING
have awk include a test: it exits when /tmp/SOMETHING appears
The loop part (without the termination test, so an "infinite loop" until CTRL-C) is:
ping 127.0.0.1 | awk '
  BEGIN { cmd = "date +%s"; cmd | getline startup; close(cmd) }
  /bytes from/ { cmd | getline D; close(cmd)
                 print D - startup | "cat >&2" }'
Now you just need to use printf and an ANSI escape sequence to print without a newline, have the escape sequence move the cursor back to the beginning of the number, and flush the output (all descriptors) by invoking system():
ping 127.0.0.1 | awk -v getback4char="$(printf '\033[4D')" '
BEGIN{cmd="date +%s"; cmd|getline startup ; close (cmd) ; printf "Elapsed time: ";}
/bytes from/ { cmd | getline D ; close (cmd) ;
printf "%4d%s" ,(D-startup) , getback4char | "cat >&2"
system("") }'
Note: this is compatible with all versions of awk I know of, even ancient ones (i.e., not just gawk/nawk, but also the venerable original awk).
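Putting the two additions together, a sketch of the complete pattern (using the /tmp/SOMETHING flag file and myprocess names from above):
rm -f /tmp/SOMETHING
( myprocess; touch /tmp/SOMETHING ) &   # flag the end of the job
ping 127.0.0.1 | awk '
  BEGIN { cmd = "date +%s"; cmd | getline startup; close(cmd) }
  /bytes from/ {
      if (system("test -e /tmp/SOMETHING") == 0) exit   # job is done
      cmd | getline D; close(cmd)
      print D - startup | "cat >&2"
  }'
rm -f /tmp/SOMETHING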
Related
I am running a number of commands from a script and measuring the execution time of only some of them. I know how to do this with time. But I also want to output all the times only after the whole script is finished (either in the shell or in a file). How do I do that?
EDIT:
I am sorry, I should have specified that I am using the Fish shell. (Nevertheless, I will add bash to the tags so that other people can use the answers.)
#!/bin/bash
#
declare -a toutput
declare -a commands
#
stime()
{
    start=$(date +%s)
    # run command
    $1
    end=$(date +%s)
    toutput+=("$1 : $((end-start)) ,")
}
# set array of commands
commands+=("'ls -1 /var/log'")
commands+=("'sleep 3'")
commands+=("'sleep 5'")
echo "==================="
echo "${commands[@]}"
echo "==================="
# execute commands and log times to toutput
#
for cc in "${commands[@]}"
do
    stime "$(echo ${cc} | tr -d \')"
done
echo "times = (" "${toutput[@]}" ")"
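Since the question also asks for writing the times to a file, the same summary line can simply be redirected (times.log is an arbitrary name):
echo "times = (" "${toutput[@]}" ")" > times.log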
Bash 4.2 and up have an obscure printf format for saving the Unix time to a variable.
#!/bin/bash
# start time
printf -v s_time '%(%s)T' -1
# do stuff
sleep 1
sleep 2
sleep 3
# end time
printf -v e_time '%(%s)T' -1
# do more stuff
sleep 4
# print result
echo It took $(( e_time - s_time )) seconds
This shows the combined run time of the multiple commands in the "do stuff" section:
It took 6 seconds
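A small convenience wrapper around the same feature (mark is a name I'm introducing; assumes bash >= 4.2):
mark() { printf -v "$1" '%(%s)T' -1; }   # store current epoch seconds in the named variable

mark t0
sleep 2
mark t1
echo "It took $(( t1 - t0 )) seconds"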
Option 1:
Just run your script this way:
time ./your_script.sh
https://www.cyberciti.biz/faq/unix-linux-time-command-examples-usage-syntax/
Option 2:
npm install -g gnomon
./your_script.sh | gnomon
https://github.com/paypal/gnomon
Using shell scripting, I am dividing one long data file into 8 files and running them in parallel as 8 instances.
function_child()
{
    while read -r record
    do
        ### process to get the data by arsdoc get ##
        exit 12   ## if get fails ##
        ### afp2pdf ###
        exit 12   ## if afp2pdf fails ##
        ### logic ###
        exit 12   ## if logic fails ##
    done < "$1"
}
## main ##
for file in /$MY_WORK/CCN_split_files/*; do
    function_child "$file" &
    PID="$!"
    echo "$PID:$file" | tee -a "$tmp_file"
    PID_LIST+="$PID "
done
How can I monitor the exit codes and PIDs of the child processes when there is a failure?
I tried the below. Once all the processes are sent to the background, I use the 'wait' builtin to wait for each PID from PID_LIST to exit, then capture and print the respective exit status.
for process in "${PID_LIST[@]}"; do
    wait "$process"
    exit_status=$?
    file_name=$(egrep "$process" "$tmp_file" | awk -F ":" '{print $2}' | rev | awk -F "/" '{print $2}' | rev)
    echo "$file_name exit status: $exit_status"
done
But it gives an error:
line 49: wait: `23043 23049 ': not a pid or valid job spec
grep: 23049: No such file or directory
Could someone help me with this? Thank you.
See: help jobs and help wait
Collect return status at end of your code
for pid in $(jobs -rp); do
    printf "Job %d handling file %q is still running\n" "$pid" "${pids[pid]}"
done
for pid in $(jobs -sp); do
    wait "$pid"
    printf "Job %s handling file %q has returned with status %d\n" "$pid" "${pids[pid]}" "$?"
done
The double quotes around the argument to wait create a single string. Remove the quotes to have the shell break up the string into individual PIDs.
Use wait on proper pid numbers.
function_child() {
while read -r record; do
# let's return a random number!
exit ${RANDOM}
done <<<'a'
}
mkdir -p my-home/dir
touch my-home/dir/{1..9}
for file in my-home/dir/*; do
function_child "$file" &
pid=$!
echo "Backgrounded: $file (pid=$pid)"
pids[$pid]=$file
done
for i in "${!pids[@]}"; do
wait "$i"
ret=$?
echo ${pids[$i]} returned with $ret
done
outputs on repl:
Backgrounded: my-home/dir/1 (pid=84)
Backgrounded: my-home/dir/2 (pid=85)
Backgrounded: my-home/dir/3 (pid=86)
Backgrounded: my-home/dir/4 (pid=87)
Backgrounded: my-home/dir/5 (pid=88)
Backgrounded: my-home/dir/6 (pid=89)
Backgrounded: my-home/dir/7 (pid=90)
Backgrounded: my-home/dir/8 (pid=91)
Backgrounded: my-home/dir/9 (pid=92)
my-home/dir/1 returned with 241
my-home/dir/2 returned with 59
my-home/dir/3 returned with 235
my-home/dir/4 returned with 11
my-home/dir/5 returned with 6
my-home/dir/6 returned with 222
my-home/dir/7 returned with 230
my-home/dir/8 returned with 189
my-home/dir/9 returned with 195
But I think just use xargs or other tool designed to run such jobs in parallel.
printf "%s\n" my-home/dir/* | xargs -d $'\n' -n 1 -P 8 sh -c 'echo "$1"; ### process to get the data by arsdoc get' --
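If you also need per-file exit codes with the xargs approach, one hedged sketch is to have each worker append its status to a log (process_one and status.log are hypothetical names):
printf '%s\n' my-home/dir/* |
    xargs -d $'\n' -n 1 -P 8 sh -c 'process_one "$1"; echo "$? $1" >> status.log' --
awk '$1 != 0 { print "failed:", $2 }' status.log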
@KamilCuk, apologies, updated the code.
The PID_LIST+="$PID " creates one long variable with spaces inside. The "${PID_LIST[@]}" is an expansion used for arrays. It works in such a way that ${PID_LIST[@]} just expands to the value of the variable PID_LIST, as if "$PID_LIST", so it expands to "23043 23049 ". Because it is in quotes, it iterates over one element; hence it runs wait "23043 23049 ", hence the error message you see.
Not recommended: You could depend on shell space splitting
for process in $PID_LIST; do
wait "$process"
But just use an array:
PID_LIST+=("$PID")
done
for process in "${PID_LIST[@]}"; do
wait "$process"
If you feel not safe with your pids[$pid]=$file associative array, use two arrays instead:
onlypids+=("$pid")
files+=("$file")
done
for i in "${!onlypids[@]}"; do
pid="${onlypids[$i]}"
file="${files[$i]}"
wait "$pid"
Note that by convention, upper case variable names are for exported variables.
You mention in the comments that you do not want to use GNU Parallel, so this answer is for people who do not have that restriction.
doit() {
record="$1"
###process to get the data by arsdoc get##
exit 12 ## if get fails##
### fp2pdf ###
exit 12 ## if afp2pdf fails ###
### logic ###
exit 12 ## if logic fails####
}
export -f doit
cat /$MY_WORK/CCN_split_files/* |
parallel --joblog my.log doit
# Field 7 of my.log is the exit value
# If you have an unsplit version of the input you can have GNU Parallel process it:
# cat /$MY_WORK/CNN_big_file |
# parallel --joblog my.log doit
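The joblog is tab-separated with Exitval in field 7, so failed chunks can be listed afterwards, for example:
# print joblog lines (skipping the header) whose exit value is non-zero
awk -F '\t' 'NR > 1 && $7 != 0' my.log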
Consider the following command:
while true; do echo 'Hit CTRL+C';sleep 1;done >> `date +"%H%M.txt"`
When I execute this command, it redirects output to a single file whose name is the start time of the command. How can I change this so that it saves to a different file every minute, with the file name
date +"%H%M.txt"
at that given minute?
EDIT:
while true; do echo 'Hit CTRL+C';sleep 1;done
is just a substitute for a program that runs for a long time and outputs some data every second. I want to save the data output during each minute into a separate file, without having to start my program over again.
You need to re-evaluate the date on each iteration of the loop, and append with >> so that every line written within the same minute is kept:
while true; do echo 'Hit CTRL+C' >> "$(date +"%H%M").txt"; sleep 1; done
#!/bin/bash
while true ; do
echo 'Hit CTRL+C' # just for the screen
stock=$( date +"%H%M" )
while [[ $( date +"%H%M" ) == $stock ]] ; do
echo 'Hit CTRL+C' $stock
sleep 1
done > "$stock.txt" # no need '>>'
done
This also has the advantage of opening the output file only about once per minute, reducing disk access.
The output of your long-running command should be redirected to an intermediate place (a FIFO); then use cat to pull the log from this FIFO into a different file every minute.
Here is a simple wrapper script that uses SIGUSR1 to update the log file name. Replace long-running-cmd with your command.
# in seconds
logger_update_duration=60
background_pid=
main_cmd=
function update_file_name() {
trap "update_file_name" USR1 # re-establish
file_name=$(date +"%Y-%m-%d-%H-%M-%S.log")
echo "update file name $file_name"
cat < ./logger > $file_name &
# after establish the new, check should kill the old
if [ x$background_pid != x ];then
echo "will kill the old $background_pid current is $$"
kill -TERM $background_pid
fi
background_pid=$!
}
trap "update_file_name" USR1
function clean() {
echo "perform clean..."
kill -TERM $!
kill -TERM $main_cmd
rm logger
}
trap "clean" EXIT
# create logger fifo
[ ! -p logger ] && mkfifo logger
# replace with your long time running operation
./long-running-cmd > ./logger &
main_cmd=$!
while true
do
kill -USR1 $$
sleep ${logger_update_duration}
done
The output of long-running-cmd should split into different log files in a period of one minute.
I've implemented a way to have concurrent jobs in bash, as seen here.
I'm looping through a file with around 13000 lines. I'm just testing and printing each line, as such:
#!/bin/bash
max_bg_procs(){
if [[ $# -eq 0 ]] ; then
echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
return
fi
local max_number=$((0 + ${1:-0}))
while true; do
local current_number=$(jobs -pr | wc -l)
if [[ $current_number -lt $max_number ]]; then
echo "success in if"
break
fi
echo "has to wait"
sleep 4
done
}
download_data(){
echo "link #" $2 "["$1"]"
}
mapfile -t myArray < $1
i=1
for url in "${myArray[@]}"
do
max_bg_procs 6
download_data $url $i &
((i++))
done
echo "finito!"
I've also tried other solutions such as this and this, but my issue is persistent:
At a "random" given step, usually between the 2000th and the 5000th iteration, it simply gets stuck. I've put various echo statements in the middle of the code to see where it gets stuck, but the last thing it prints is the $url $i.
I've done a simple test removing any parallelism, just looping over the file contents: it all went fine and looped to the end.
So I think I'm hitting some limitation of the parallelism, and I wonder if anyone could help me figure it out.
Many thanks!
Here, we have up to 6 parallel bash processes calling download_data, each of which is passed up to 16 URLs per invocation. Adjust per your own tuning.
Note that this expects both bash (for exported function support) and GNU xargs.
#!/usr/bin/env bash
# ^^^^- not /bin/sh
download_data() {
echo "link #$2 [$1]" # TODO: replace this with a job that actually takes some time
}
export -f download_data
<input.txt xargs -d $'\n' -P 6 -n 16 -- bash -c 'for arg; do download_data "$arg"; done' _
Using GNU Parallel it looks like this
cat input.txt | parallel echo link '\#{#} [{}]'
{#} = the job number
{} = the argument
It will spawn one process per CPU. If you instead want 6 in parallel use -j:
cat input.txt | parallel -j6 echo link '\#{#} [{}]'
If you prefer running a function:
download_data(){
echo "link #" $2 "["$1"]"
}
export -f download_data
cat input.txt | parallel -j6 download_data {} {#}
I have a bash script which takes nearly 5 seconds to run. I'd like to debug it, and determine which commands are taking the longest. What is the best way of doing this? Is there a flag I can set? Setting #!/bin/bash -vx does not really help. What I want is basically execution time by line number.
This is as close as you can get with the built-in bash debug facility, since it gives overall timing info counted from the script's execution start time.
At the top of the script add this for a second count:
export PS4='+[${SECONDS}s][${BASH_SOURCE}:${LINENO}]: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'; set -x;
Same but with milliseconds instead:
N=`date +%s%N`; export PS4='+[$(((`date +%s%N`-$N)/1000000))ms][${BASH_SOURCE}:${LINENO}]: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'; set -x;
The last example can go to microsecond precision; just keep in mind you are using bash :).
Example script:
#!/bin/bash
N=`date +%s%N`
export PS4='+[$(((`date +%s%N`-$N)/1000000))ms][${BASH_SOURCE}:${LINENO}]: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'; set -x;
sleep 1
exit
Example debug output:
+[3ms][/root/db_test.sh:5]: sleep 1
+[1012ms][/usr/local/bin/graphite_as_rand_stat.sh:6]: exit
Keep in mind that you can selectively debug a specific portion of the script by enclosing it in 'set -x' at the debug start and 'set +x' at the debug end. The timing data will still show correctly, counted from execution start.
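For example, to time only one section (critical_command is a placeholder for the part you care about):
# ... untimed setup ...
set -x             # start tracing (and timing, via the PS4 above) here
critical_command   # only this region produces timed trace lines
set +x             # stop tracing here
# ... untimed remainder ...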
Addendum
For sake of completeness, if you do need the differential timing data you can redirect the debug info to a file and process it afterwards.
Given this example script:
#!/bin/bash
N=`date +%s%N`
export PS4='+[$(((`date +%s%N`-$N)/1000000))ms][${BASH_SOURCE}:${LINENO}]: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'; set -x;
sleep 1
for ((i=0;i<2;i++)); do
o=$(($RANDOM*$RANDOM/$RANDOM))
echo $o
sleep 0.$o
done
exit
Run it while redirecting debug to a file:
./example.sh 2>example.dbg
And output the differential debug timing with this (covers multi-line):
p=0; cat example.dbg | while read l; do [[ ! ${l%%[*} =~ ^\+ ]] && echo $l && continue; i=`echo $l | sed 's#[^0-9]*\([0-9]\+\).*#\1#'`; echo $l | sed "s#${i}ms#${i}ms+$(($i-$p))ms#"; p=$i; done
The output:
+[2ms+2ms][./example.sh:5]: sleep 1
+[1006ms+1004ms][./example.sh:6]: (( i=0 ))
+[1009ms+3ms][./example.sh:6]: (( i<2 ))
+[1011ms+2ms][./example.sh:7]: o=19258
+[1014ms+3ms][./example.sh:8]: echo 19258
+[1016ms+2ms][./example.sh:9]: sleep 0.19258
+[1213ms+197ms][./example.sh:6]: (( i++ ))
+[1217ms+4ms][./example.sh:6]: (( i<2 ))
+[1220ms+3ms][./example.sh:7]: o=176
+[1226ms+6ms][./example.sh:8]: echo 176
+[1229ms+3ms][./example.sh:9]: sleep 0.176
+[1442ms+213ms][./example.sh:6]: (( i++ ))
+[1460ms+18ms][./example.sh:6]: (( i<2 ))
+[1502ms+42ms][./example.sh:11]: exit
You can use the time utility to measure the run time of your individual commands/functions.
For example:
[ben@imac ~]$ cat times.sh
#!/bin/bash
test_func ()
{
sleep 1
echo "test"
}
echo "Running test_func()"
time test_func
echo "Running a 5 second external command"
time sleep 5
Running that script results in something like the following:
[ben@imac ~]$ ./times.sh
Running test_func()
test
real 0m1.003s
user 0m0.001s
sys 0m0.001s
Running a 5 second external command
real 0m5.002s
user 0m0.001s
sys 0m0.001s
You can use set -x to have the script print each command before it's executed. I don't know of a way to get command timings added automatically. You can sprinkle date commands throughout the script to mark the time.
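A hedged sketch of the date-marker approach (assumes GNU date for %N nanoseconds; some_slow_command is a placeholder):
set -x              # print each command before it runs
date +%T.%N         # timestamp marker before the slow section
some_slow_command   # the command being measured
date +%T.%N         # timestamp marker after it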
Try this:
sed 's/^\([^#]\)/time \1/' script.sh > tmp.sh && bash tmp.sh
It prepends a time command to every non-comment line. Note that this is a blunt instrument: it will break multi-line compound commands such as loops.