Scheduling Jobs one after other in Linux

Scheduling Jobs one after other in Linux - linux

I have written an script After.sh to postpone one job to be started after another running job is finished:
echo "Waiting job $1 to be finished..."
while ps -p $1 >/dev/null; do sleep 1; done ;
echo "Job $1 finished."
echo "Now running job ${*:2}..."
${*:2}
echo "Job ${*:2} finished."
I run it like After.sh 5327 shutdown 0 to shutdown the PC after the job with PID=5327 is finished. Even I can run it like After.sh "5327 5778 5935" shutdown 0, this way it waits until all jobs 5327 5778 and 5935 are finished first.
My problem is when I want to feed a more complex job as argument to the script, for instance:
After.sh 5327 for f in *; do echo $f; done
Now it runs with an error: for: command not found. I tried to replace the command ${*:2} in the script to sh ${*:2} or eval ${*:2}, but they failed (sh fails again to find for command, eval keeps the first value of $f for the whole loop, so prints the name of the first file every time.)
Do you have any idea how to fix it?

the problem is, that you are for f in *; do echo $f; done is not a command, but rather it is bash-code, which needs a bash-interpreter to be executed.
you can start an interpreter by using eval:
#!/bin/sh
PID="$1"
shift
JOB="$#"
echo "Waiting job ${PID} to be finished..."
while ps -p ${PID} >/dev/null; do sleep 1; done ;
echo "Job ${PID} finished."
echo "Now running job '${JOB}'"
eval ${JOB}
echo "Job '${JOB}' finished."
then use single quotes for commands including variables that should be evaluated by After:
./After.sh 123 'for f in *; do echo $f; done'

Related

can shell script make itself run in background after running some steps?

I have BBB based custom Embedded Linux based board with busybox shell(ash)
I have a situation where my script must run in background with following condition
There must only one instance of the script.
wrapper script need to know if script started successfully in background or not.
There is another wrapper script which starts and stops my script, wrapper script is as mentioned below.
#!/bin/sh
export PATH=/bin:/sbin:/usr/bin:/usr/sbin
readonly TEST_SCRIPT_PATH="/home/testscript.sh"
readonly TEST_SCRIPT_LOCK_PATH="/var/run/${TEST_SCRIPT_PATH##*/}.lock"
start_test_script()
{
local pid_of_testscript=0
local status=0
#Run test script in background
"${TEST_SCRIPT_PATH}" &
#---------Now When this point is hit, lock file must be created.-----
if [ -f "${TEST_SCRIPT_LOCK_PATH}" ];then
pid_of_testscript=$(head -n1 ${TEST_SCRIPT_LOCK_PATH})
if [ -n "${pid_of_testscript}" ];then
kill -0 ${pid_of_testscript} &> /dev/null || status="${?}"
if [ ${status} -ne 0 ];then
echo "Error starting testscript"
else
echo "testscript start successfully"
fi
else
echo "Error starting testscript.sh"
fi
fi
}
stop_test_script()
{
local pid_of_testscript=0
local status=0
if [ -f "${TEST_SCRIPT_LOCK_PATH}" ];then
pid_of_testscript=$(head -n1 ${TEST_SCRIPT_LOCK_PATH})
if [ -n "${pid_of_testscript}" ];then
kill -0 ${pid_of_testscript} &> /dev/null || status="${?}"
if [ ${status} -ne 0 ];then
echo "testscript not running"
rm "${TEST_SCRIPT_LOCK_PATH}"
else
#send SIGTERM signal
kill -SIGTERM "${pid_of_testscript}"
fi
fi
fi
}
#Script starts from here.
case ${1} in
'start')
start_test_script
;;
'stop')
stop_test_script
;;
*)
echo "Usage: ${0} [start|stop]"
exit 1
;;
esac
Now actual script "testscript.sh" looks something like this,
#!/bin/sh
#Filename : testscript.sh
export PATH=/bin:/sbin:/usr/bin:/usr/sbin
set -eu
LOCK_FILE="/var/run/${0##*/}.lock"
FLOCK_CMD="/bin/flock"
FLOCK_ID=200
eval "exec ${FLOCK_ID}>>${LOCK_FILE}"
"${FLOCK_CMD}" -n "${FLOCK_ID}" || exit 0
echo "${$}" > "${LOCK_FILE}"
# >>>>>>>>>>-----Now run the code in background---<<<<<<
handle_sigterm()
{
# cleanup
"${FLOCK_CMD}" -u "${FLOCK_ID}"
if [ -f "${LOCK_FILE}" ];then
rm "${LOCK_FILE}"
fi
}
trap handle_sigterm SIGTERM
while true
do
echo "do something"
sleep 10
done
Now in above script you can see "---Now run the code in background--" at that point I am sure that either lock file is successfully created or instance of this script is already running. So Then I can safely run other code in background and wrapper script can check for lockfile and find out if the process mentioned in the lock file is running or not.
can shellscript itself make it to run in background ?
if not is there a better way to meet all the conditions ?

I think you can look into job control built-in, specifically bg.
Job Control Commands

When processes say they background themselves, what they actually do is fork and exit the parent. You can do the same by running whichever commands, functions or statements you want with & and then exiting.
#!/bin/sh
echo "This runs in the foreground"
sleep 3
while true
do
sleep 10
echo "doing background things"
done &

Bash script to re-launch program in case of failure error

In linux (I use a Ubuntu), I run a (ruby) program that continually runs all day long. My job is to monitor to see if the program fails and if so, re-launch the program. This consists up simply hitting 'Up' for last command and 'Enter'. Simple enough.
There has to be a way to write a bash script to monitor my program if its stops working and to re-launch it automatically.
How would I go about doing this?
A bonus is to be able to save the output of the program when it errors.

What you could do:
#!/bin/bash
LOGFILE="some_file.log"
LAUNCH="your_program"
while :
do
echo "New launch at `date`" >> "${LOGFILE}"
${LAUNCH} >> "${LOGFILE}" 2>&1 &
wait
done
Another way is to periodicaly check the PID:
#!/bin/bash
LOGFILE="some_file.log"
LAUNCH="your_program"
PID=""
CHECK=""
while :
do
if [ -n "${PID}" ]; then
CHECK=`ps -o pid:1= -p "${PID}"`
fi
# If PID does not exist anymore, launch again
if [ -z "${CHECK}" ]; then
echo "New launch at `date`" >> "${LOGFILE}"
# Launch command and keep track of the PID
${LAUNCH} >> "${LOGFILE}" 2>&1 &
PID=$!
fi
sleep 2
done

Infinite loop:
while true; do
your_program >> /path/to/error.log 2>&1
done

Can I detect early exit from a long-running, backgrounded process?

I'm trying to improve the startup scripts for several servers running in a cluster environment. The server processes should run indefinitely but occasionally fails on startup issuing e.g., Address already in use exceptions.
I'd like the exit code for the startup script to reflect these early terminations by, say, waiting for 1 second and telling me if the server seems to have started okay. I also need the server PID echoed.
Here's my best shot so far:
$ cat startup.sh
# start the server in the bg but if it fails in the first second,
# then kill startup.sh.
CMD="start_server -option1 foo -option2 bar"
eval "($CMD >> cc.log 2>&1 || kill -9 $$ &)"
SERVER_PID=$!
# the `kill` above only has 1 second to kill me-- otherwise my exit code is 0
sleep 1
echo $SERVER_PID
The exit code works fine but two problems remain:
If the server is long-running but eventually encounters an error, the parent startup.sh will have exited already and the $$ PID may have been reused by an unrelated process which this script will then kill off.
The SERVER_PID isn't correct since it's the PID of the subshell rather than the start_server command (which in this case is a grandchild of the startup.sh script.
Is there a simpler way to background the start_server process, get its PID, and use a timeout'ed check for error codes? I looked into bash builtins wait and timeout but they don't seem to work for processes that shouldn't exit in the end.
I can't change the server code and the startup script should not run indefinitely.

You can also use coproc (and look, I'm putting the command in an array, and also with proper quoting!):
#!/bin/bash
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[#]}" >> cc.log 2>&1 ; }
server_pid=$!
sleep 1
if [[ -z "${mycoprocfd[#]}" ]]; then
echo >&2 "Failure detected when starting server! Server died before 1 second."
exit 1
else
echo $server_pid
fi
The trick is that coproc puts the file descriptors of the redirections of stdin and stdout in a prescribed array (here mycoprocfd) and empties the array when the process exits. So you don't need to do clumsy stuff with the PID itself.
You can hence check for the server to never exit as so:
#!/bin/bash
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[#]}" >> cc.log 2>&1 ; }
server_pid=$!
read -u "${mycoprocfd[0]}"
echo >&2 "Oh dear, the server with PID $server_pid died after $SECONDS seconds."
exit 1
That's because read will read on the file descriptor given by coproc (but nothing is ever read here, since the stdout of your command has been redirected to a file!), and read exits when the file descriptor is closed, i.e., when the command launched by coproc exits.
I'd say this is a really elegant solution!
Now, this script will live as long as the coproc lives. I understood that's not what you want. In this case, you can timeout the read with its -t option, and then you'll use the fact that return's exit status is greater than 128 if it timed out. E.g., for a 4.5 seconds timeout
#!/bin/bash
timeout=4.5
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[#]}" >> cc.log 2>&1 ; }
server_pid=$!
read -t $timeout -u "${mycoprocfd[0]}"
if (($?>128)); then
echo "$server_pid <-- all is good, it's still alive after $timeout seconds."
else
echo >&2 "Oh dear, the server with PID $server_pid died after $timeout seconds."
exit 1
fi
exit 0 # Yay
This is also very elegant :).
Use, extend, and adapt to your needs! (but with good practices!)
Hope this helps!
Remarks.
coproc is a bash-builtin that appeared in bash 4.0. The solutions shown here are 100% pure bash (except the first one, with sleep, which is not the best one at all!).
The use of coproc in scripts is almost always superior to putting jobs in background with & and doing clumsy and awkward stuff with sleep and checking $!.
If you want coproc to keep quiet, whatever happens (e.g., if there's an error launching the command, which is fine here since you're handling everything yourself), do:
coproc mycoprocfd { "${cmd[#]}" >> cc.log 2>&1 ; } > /dev/null 2>&1

20 minutes of more googling revealed https://stackoverflow.com/a/6756971/494983 and kill -0 $PID from https://stackoverflow.com/a/14296353/494983.
So it seems I can use:
$ cat startup.sh
CMD="start_server -option1 foo -option2 bar"
eval "$CMD >> cc.log 2>&1 &"
SERVER_PID=$!
sleep 1
kill -0 $SERVER_PID
if [ $? != 0 ]; then
echo "Failure detected when starting server! PID $SERVER_PID doesn't exist!" 1>&2
exit 1
else
echo $SERVER_PID
fi
This wouldn't work for processes that I can't send signals to but works well enough in my case (where startup.sh starts the server itself).

Linux Single Instance Kill if running too long

I am using the following to keep a single instance of a script running on my server. I have a cronjob to run this every minute.
How do I daemonize an arbitrary script in unix?
#!/bin/bash
if [[ $# < 1 ]]; then
echo "Name of pid file not given."
exit
fi
# Get the pid file's name.
PIDFILE=$1
shift
if [[ $# < 1 ]]; then
echo "No command given."
exit
fi
echo "Checking pid in file $PIDFILE."
#Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
ps -p $PID >/dev/null 2>&1
if [[ $? = 0 ]]; then
echo "Command $1 already running."
exit
fi
fi
# Write our pid to file.
echo $$ >$PIDFILE
# Get command.
COMMAND=$1
shift
# Run command
$COMMAND "$*"
Now I found out that my script had hung for some reason and therefore it was stuck. I'd like a way to check if the $PIDFILE is "old" and if so, kill the process. I know that's possible (check the timestamp on the file) but I don't know the syntax or if this is even a good idea. Also, when this script is running, the CPU should be pretty heavily used. If it hangs (rare but it happened at least once so far), the CPU usage drops to 0%. It would be nice if I could check that the process is really hung/not active, but I don't know if there's an easy way to do that (and I don't want to have many false positives where it gets killed but it's running fine).

To answer the question in your title, which seems quite different from your problem, use timeout.
Now, for your problem, I don't see where it could hang, unless you gave it a fifo queue for the pid file. Now, to run and respawn, you can just run this script once, on startup:
#!/bin/bash
while /bin/true; do
"$#"
wait
done
Which brings up another bug in the code you got from the other question: "$*" will pass all the arguments to the script as a single argument; without the quotes it'll split arguments with white space. "$#" will pass them individually and handling white space properly.
Call with /path/to/script command [argument]....

Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?

1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status

This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done

The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Use wait command you can wait for all child processes finish, meanwhile you can get exit status of each child processes via $? and store status into STATUS[]. Then you can do something depending by status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done

As I see almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more unixesh solution, catching the SIGCHLD signal. In the signal handler it has to be checked which child process was stopped. It can be done by kill -0 <PID> built-in (universal) or checking the existence of /proc/<PID> directory (Linux specific) or using the jobs built-in (bash specific. jobs -l also reports the pid. In this case the 3rd field of the output can be Stopped|Running|Done|Exit . ).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x is exits with exit code 1. For a number it waits num*5 seconds. In every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[#]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[#]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[#]} -gt 0 ]; do echo "WAITING FOR: ${pids[#]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed

#/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?

I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that coninually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi

Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}

A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5

Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
doSomethingElse
....
else # when directory is removed from /proc, process has ended
wait $pid
local exit_status=$?
done
....
Now you can use the $exit_status variable however you like.

With this method, your script doesnt have to wait for the background process, you will only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
now, your script can do anything else while you just have to keep monitoring the contents of retFile (it can also contain any other information you want like the exit time).
PS.: btw, I coded thinking in bash

My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done

how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[#]}"
do
( wait "$pid"; echo "process $pid complemted with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[#]}"
do
wait "$pid" && ((SUCCESS++)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID} jobs"
exit $(( ${#PID} - SUCCESS ))

This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string