Bash run a group of two children in the background and kill them later - linux

Let's group two commands (cd and bash ..) together like this:
#!/bin/bash
C="directory"
SH="bash process.sh"
(cd ${C}; ${SH})&
PID=$!
sleep 1
KILL=`kill ${PID}`
process.sh prints out the date (each second and five times):
C=0
while true
do
date
sleep 1
if [ ${C} -eq 4 ]; then
break
fi
C=$((C+1))
done
Now I actually would expect the background subprocess to be killed right after 1 second, but it just continues like nothing happens. INB4: "Why don't you just bash directory/process.sh" No, this cd is just an example.
What am I doing wrong?

Use exec when you want a process to replace itself in-place, rather than creating a new subprocess with its own PID.
That is to say, this code can create two subprocesses, storing the PID of the first one in $! but then using the second one to execute process.sh:
# store the subshell that runs cd in $!; not necessarily the shell that runs process.sh
# ...as the shell that runs cd is allowed to fork off a child and run process.sh there.
(cd "$dir" && bash process.sh) & pid=$!
...whereas this code creates only one subprocess, because it uses exec to make the first process replace itself with the second:
# explicitly replace the shell that runs cd with the one that runs process.sh
# so $! is guaranteed to have the right thing
(cd "$dir" && exec bash process.sh) &
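A minimal sketch of the effect, assuming Linux (it reads /proc) and using "sleep 30" as a stand-in for "bash process.sh"; the dir value is a placeholder. With exec, the PID stored in $! belongs to the command itself, so killing it stops the work:

```shell
#!/usr/bin/env bash
# Sketch only: "sleep 30" stands in for "bash process.sh"; dir is a placeholder.
dir=/tmp

(cd "$dir" && exec sleep 30) &
pid=$!

sleep 0.2                         # give the subshell time to cd and exec
name=$(cat "/proc/$pid/comm")     # Linux-specific: name of the program running as $pid
echo "PID $pid is running: $name"

kill "$pid"
status=0
wait "$pid" 2>/dev/null || status=$?   # reap it; a signalled job reports 128 + signal number
echo "exit status: $status"
```

Here the name read from /proc is that of the exec'ed command rather than of an intermediate shell, which is the guarantee the answer relies on.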

You can list all child processes of the current shell with "ps --ppid $$", so you can kill them like this:
#!/bin/bash
C="directory"
SH="bash process.sh"
(cd ${C}; ${SH})&
PID=$!
sleep 1
ps -o pid= --ppid $$|xargs kill
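As an alternative to enumerating children with ps, bash job control can put the whole background group into its own process group, which can then be signalled at once. This is a different technique from the one above, shown only as a sketch: it assumes bash, and the two "sleep 30" commands stand in for the real work.

```shell
#!/usr/bin/env bash
# Sketch only: the two "sleep 30" commands stand in for real work.
set -m                      # job control: each background job gets its own process group

( sleep 30 & sleep 30 ) &   # the subshell and everything it starts share one group
pgid=$!                     # with set -m, the job leader's PID is also the group ID

sleep 0.2
kill -- -"$pgid"            # negative ID: signal every process in that group

status=0
wait "$pgid" 2>/dev/null || status=$?
echo "group leader exit status: $status"
```

Because the group ID is negative on the kill command line, the "--" is needed so that it is not parsed as a signal name.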

Related

finding the process group id created through setsid

In a shell script, I see that setsid can be used to create a new process group, but I am not able to find a reliable way to get the group ID after the creation. My requirement is simple: launch a process and, after it is done, clean up any descendants it left behind. I don't want to kill the main process, so I have to wait for it to end first. After that, I could kill the leftover child processes with kill -- -pgid, if only I had the group ID. The missing piece is: how do I get the group ID?
This script is what I came up with finally. Hope this helps someone.
$! gives the PID, and ps has to be used to find its process group ID (called gid below).
ps prints an extra space in front of the value; the variable expansion on the next line removes that leading space.
Finally, after waiting for the main process, the script kills the group.
#!/bin/sh -x
setsid "$@" &
pid=$!
gidspace=$(ps -o pgid= $pid)
gid="${gidspace## }"
echo "gid $gid"
echo "waiting"
wait $pid
ps -s $gid -o pid,ppid,pgid,command
kill -- -$gid
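The "${gidspace## }" expansion removes a single leading space. A minimal sketch of a lookup that strips any amount of padding, assuming a procps-style ps and using "sleep 30" as a stand-in for the real command:

```shell
#!/usr/bin/env bash
# Sketch only: "sleep 30" stands in for the real command.
sleep 30 &
pid=$!

# ps right-aligns the column, so strip all whitespace rather than one space.
pgid=$(ps -o pgid= -p "$pid" | tr -d '[:space:]')
echo "pid $pid belongs to process group $pgid"

kill "$pid"
wait "$pid" 2>/dev/null || true
```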
I managed to do it with a coproc, and a sleep to ensure we have enough time to read back the pid. This is bash-specific of course, and the only way to avoid using a hackish sleep inside a coproc is to write to a temp file and wait for the command to terminate (no need for coproc then).
Using a coproc
Note that I open filehandle 3 to write the pgid back to the parent shell and close it before executing the command.
#!/bin/bash -x
coproc setsid bash -c 'ps -o pgid= $BASHPID >&3; exec 3>&-; exec "$@" & sleep 1' -- "$@" 3>&1
read -u ${COPROC[0]} gid
echo "gid $gid"
ps -s $gid -o pid,ppid,pgid,command
kill -- -$gid
Using a temp file
To avoid having to pass the temp file name to the subshell (and the risk that the parent dies and removes it before the child writes to it), I again open file descriptor 3 so the child can write its pgid to it.
#!/bin/bash -x
t=$(mktemp)
trap 'rm -f "$t"' EXIT
exec {fh}>"$t"
setsid bash -c 'ps -o pgid= $BASHPID >&3; exec 3>&-; exec "$@" &' -- "$@" 3>&${fh}
read gid <$t
echo "gid $gid"
ps -s $gid -o pid,ppid,pgid,command
kill -- -$gid

Launch two processes simultaneously and collect results from the process finished earlier

Suppose I want to run two commands c1 and c2, which essentially process (but not modify) the same piece of data on Linux.
Right now I would like to launch them simultaneously and see which one finishes first. Once one process has finished, I will collect its output (which could be dumped into a file with c1 >> log1.txt) and terminate the other process.
Note that the processing times of the two processes could differ widely and hence be observable: say, one takes ten seconds while the other takes 60 seconds.
=======================update
I tried the following script set but it causes infinite loop on my computer:
import os
os.system("./launch.sh")
launch.sh
#!/usr/bin/env bash
rm /tmp/smack-checker2
mkfifo /tmp/smack-checker2
setsid bash -c "./sleep60.sh ; echo 1 > /tmp/run-checker2" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 2 > /tmp/run-checker2" &
pid1=$!
read line </tmp/smack-checker2
printf "Process %d finished earlier\n" "$line"
rm /tmp/smack-checker2
eval kill -- -\$"pid$((line ^ 1))"
sleep60.sh
#!/usr/bin/env bash
sleep 60
sleep10.sh
#!/usr/bin/env bash
sleep 10
Use wait -n to wait for either process to exit. Ignoring race conditions and pid number wrapping,
c1 & P1=$!
c2 & P2=$!
wait -n # wait for either one to exit
if ! kill $P1; then
# failure to kill $P1 indicates c1 finished first
kill $P2
# collect c1 results...
else
# kill succeeded, so c1 was still running: c2 finished first
wait $P1 # c1 has already been signalled above; just reap it
# collect c2 results...
fi
See help wait or man bash for documentation.
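A variation that does not rely on a failed kill as the test: after wait -n returns, kill -0 only probes whether a PID still exists. This is a minimal sketch, assuming bash 4.3 or later (for wait -n), with two sleep commands standing in for c1 and c2; the same caveats about races and PID reuse apply.

```shell
#!/usr/bin/env bash
# Sketch only: the sleep commands stand in for c1 and c2.
sleep 0.2 & p1=$!             # stand-in for c1 (fast)
sleep 30  & p2=$!             # stand-in for c2 (slow)

wait -n || true               # returns as soon as either job exits

if kill -0 "$p1" 2>/dev/null; then
    winner=c2; loser=$p1      # c1 is still alive, so c2 finished first
else
    winner=c1; loser=$p2      # c1 is gone, so it finished first
fi

kill "$loser" 2>/dev/null
wait "$loser" 2>/dev/null || true
echo "$winner finished first"
```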
I would run 2 processes and make them write to the shared named pipe
after they finish. Reading from a named pipe is a blocking operation
so you don't need funny sleep instructions inside a loop. It would
be:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
(./sleep60.sh ; echo 0 > /tmp/run-checker) &
(./sleep10.sh ; echo 1 > /tmp/run-checker) &
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
kill -- -$$
sleep60.sh:
#!/usr/bin/env bash
sleep 60
sleep10.sh:
#!/usr/bin/env bash
sleep 10
EDIT:
If you're going to call the script from a Python script like this:
#!/usr/bin/env python3
import os
os.system("./parallel.sh")
print("Done")
you'll get:
Process 1 finished earlier
./parallel.sh: line 11: kill: (-13807) - No such process
Done
This is because kill -- -$$ tries to send TERM signal to the process
group as specified in man 1 kill:
-n
where n is larger than 1. All processes in process group n are
signaled. When an argument of the form '-n' is given, and it
is meant to denote a process group, either a signal must be
specified first, or the argument must be preceded by a '--'
option, otherwise it will be taken as the signal to send.
It works when you run parallel.sh from the terminal because $$ is then the
PID of the shell and also the ID of its process group. I used it because
it's very convenient to kill parallel.sh, process0 or process1 and all
their children in one shot. However, when parallel.sh is called from a
Python script, $$ no longer denotes a process group and kill -- -$$
fails.
You could modify parallel.sh like that:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
setsid bash -c "./sleep60.sh ; echo 0 > /tmp/run-checker" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 1 > /tmp/run-checker" &
pid1=$!
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
eval kill -- -\$"pid$((line ^ 1))"
It will now work also when called from Python script. The last line
eval kill -- -\$"pid$((line ^ 1))"
kills the process group of pid0 if process 1 finished earlier, or that of pid1 if process 0 finished earlier,
using the ^ (bitwise XOR) operator to convert 0 to 1 and vice versa. If you
don't like it, you can use a slightly more verbose form:
if [ "$line" -eq 0 ] # $line holds 0 or 1 (which process finished), not a PID
then
echo kill "$pid1"
kill -- -"$pid1"
else
echo kill "$pid0"
kill -- -"$pid0"
fi
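The eval can also be avoided with a bash array indexed by the flipped bit. A minimal sketch with made-up PIDs (111 and 222 are placeholders for pid0 and pid1, and nothing is actually killed here):

```shell
#!/usr/bin/env bash
# Sketch only: 111 and 222 are placeholder PIDs, not real processes.
pids=(111 222)          # would be filled from $! after each "setsid ... &"
line=1                  # value read from the FIFO: process 1 finished first

loser=${pids[line^1]}   # array subscripts are arithmetic, so XOR flips 0 and 1
echo "would run: kill -- -$loser"
```

In the real script the last line would be the kill itself, with "$loser" taking the place of the eval expression.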
Can this snippet give you some idea?
#!/bin/sh
runproc1() {
sleep 5
touch proc1 # file created when terminated
exit
}
runproc2() {
sleep 10
touch proc2 # file created when terminated
exit
}
# remove flags
rm proc1
rm proc2
# run processes concurrently
runproc1 &
runproc2 &
# wait until one of them is finished
while [ ! -f proc1 -a ! -f proc2 ]; do
sleep 1
echo -n "."
done
The idea is to enclose two processes into two functions which, at the end, touch a file to signal that computing is terminated. The functions are executed in background, after having removed the files used as flags. The last step is to watch for either file to show up. At that point, anything can be done: continue to wait for the other process, or kill it.
Launching this exact script, it takes about 5 seconds, then terminates. I see that the file "proc1" is created, with no proc2. After a few more seconds (5, to be precise), "proc2" is created as well. This means that even after the script has terminated, any unfinished job keeps running.

Multiple instances of bash script upon logout and login

I made a simple script which runs in an infinite loop. It looks like this:
while :
do
#operations
sleep 5
done
and I added it to autorun programs like this.
Everything works fine, but after logging out and back in I have 2 instances of this script process (3 after the next logout, and so on). Only one of them shows notifications, but they all run their own sleep processes.
What can I do to solve this problem?
Logging out doesn't kill all processes. You need to kill that process yourself. One way is to add a conditional kill inside your script.
Example:
#!/bin/bash
for proc in $(pgrep $(basename "$0"));do
[[ $proc -ne $$ ]] && kill $proc
done
while :
do
#operations
sleep 5
done
If you run this script twice, the second one will kill the previous one/s and make sure only one instance of this script is running at a time.
If more than one user runs that script, you might want the check to be user-specific. For that, change the line:
[[ $proc -ne $$ ]] && kill $proc
to:
[[ $(echo $(pgrep -u $USER) | grep -o $proc) -ne $$ ]] && kill $proc
Note: A process that ignores or blocks SIGTERM will not respond to a plain kill; use kill -9 in those cases. A process that is already defunct (a zombie) cannot be killed at all and only disappears once its parent reaps it.

Shell scripts and how to avoid running the same script at the same time on a Linux machine

I have a centralized Linux server (Linux 5.X).
In some cases the get_hosts.ksh script on my Linux server could be run from several other hosts.
For example, get_hosts.ksh could run on my Linux machine three or more times at the same time.
My question:
How to avoid running multiple instances of process/script?
A common solution for your problem on *nix systems is to check for the existence of a lock file.
Usually the lock file contains the PID of the current process.
This is an example ksh script:
#!/bin/ksh
pid="/var/run/get_hosts.pid"
trap "rm -f $pid" SIGSEGV
trap "rm -f $pid" SIGINT
if [ -e $pid ]; then
exit # pid file exists, another instance is running, so now we politely exit
else
echo $$ > $pid # pid file doesn't exist, create one and go on
fi
# your normal workflow here...
rm -f $pid # remove pid file just before exiting
exit
UPDATE: In response to the OP's comment, I added handling of program interruptions and segfaults with the trap command.
The normal way of doing this is to write the process id into a file. The first thing the script does is check for the existence of the file, read the pid, check if a process with that pid exists, and for extra paranoia points, if that process actually runs the script. If yes, the script exits.
Here's a simple example. The process in question is a binary, and this script makes sure the binary runs only once. This is not exactly what you need, but you should be able to adapt this:
RUNNING=0
PIDFILE=$PATH_TO/var/run/example.pid
if [ -f $PIDFILE ]
then
PID=`cat $PIDFILE`
ps -p $PID >/dev/null 2>&1 # succeeds only if a process with exactly this PID exists
if [ $? -eq 0 ]
then
RUNNING=1
fi
fi
if [ $RUNNING -ne 1 ]
then
run_binary & # start in the background so that $! holds its PID
PID=$!
echo $PID > $PIDFILE
fi
This is not very elaborate but should get you on the right track.
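One gap in check-then-create schemes is that two instances can both pass the check before either writes the file. A minimal sketch of one way to narrow it, using bash's noclobber option so that the check and the creation are a single step. The pid file path is a placeholder, and kill -0 is assumed to be an adequate liveness test (it also fails for processes owned by other users):

```shell
#!/usr/bin/env bash
# Sketch only: the pid file path is a placeholder.
pidfile=/tmp/example-noclobber.pid

acquire_lock() {
    # With noclobber set, ">" refuses to overwrite an existing file,
    # so the existence check and the creation happen in one step.
    if ( set -o noclobber; echo "$$" > "$pidfile" ) 2>/dev/null; then
        return 0
    fi
    # The file already exists: see whether its owner is still alive.
    local owner
    owner=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$owner" ] && kill -0 "$owner" 2>/dev/null; then
        return 1                  # a live process holds the lock
    fi
    echo "$$" > "$pidfile"        # stale file: take it over (a small race remains here)
}

if acquire_lock; then
    trap 'rm -f "$pidfile"' EXIT  # remove the pid file when this script exits
    echo "lock acquired by $$"
else
    echo "another instance is running" >&2
fi
```

A real script would exit in the else branch instead of continuing.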
You can use a pid file to keep track of when the process is running. At the top of the script, check for the existence of the pid file and if it doesn't exist, create it and run the script, otherwise return.
Some sample code can be seen in this answer to a similar question.
You might consider using the (optional) lockfile(1) command (provided by procmail package on Debian).
I have a lot of scripts, and I use the code below to prevent multiple/simultaneous runs:
PID="/var/scripts/PID.txt" # Temp file
if [ ! -f "$PID" ]; then
echo $$ > "$PID" # Print actual PID into a file
else
ps -p $(cat "$PID") > /dev/null && exit || echo $$ > "$PID"
fi
Building on wallenborn's answer I also added a "staleness" check just in case the PID lock file is beyond a certain expected age in seconds.
# prevent simultaneous executions within an hourish
pid_file="$HOME/.harness.pid"
max_stale_seconds=3600
if [ -f $pid_file ]; then
pid="$(cat "$pid_file")"
let age_in_seconds="$(date +%s) - $(date -r "$pid_file" +%s)"
if ps $pid >/dev/null && [ $age_in_seconds -lt $max_stale_seconds ]; then
exit 1
fi
fi
echo $$>"$pid_file"
trap "rm -f \"$pid_file\"" SIGSEGV
trap "rm -f \"$pid_file\"" SIGINT
This could be made "smarter" to kill off the other executions should the PID be valid but this would be dangerous. Consider a sudden power failure and reset situation where the PID file contains a number that may now reference a completely different process.

wait child process but get error: 'pid is not a child of this shell'

I wrote a script to get data from HDFS in parallel, then I wait for these child processes in a for loop, but sometimes it returns "pid is not a child of this shell". Sometimes it works well. It's so puzzling. I use "jobs -l" to show all the jobs running in the background. I am sure these PIDs are child processes of the shell process, and I use "ps aux" to make sure these PIDs are not assigned to other processes. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
echo $index_tar | grep index > /dev/null 2>&1
if [[ $? -ne 0 ]]
then
continue
fi
suffix=`printf '%03d' $serial`
mkdir input/output_$suffix
$HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
&& mv input/output_$suffix/index_* input/output_$suffix/index &
PID[$serial]=$!
FILE[$serial]=$index_tar
let serial++
done < file.list
for((i=0;i<$serial;i++))
do
wait ${PID[$i]}
if [[ $? -ne 0 ]]
then
LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
exit -1
else
LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
fi
done
Just find the process id of the process you want to wait for and replace that with 12345 in below script. Further changes can be made as per your requirement.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or the for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.
Edit: this might happen if the while loop or for loop is actually
(a) in a (...) subshell block
(b) participating in a pipe (e.g. for....done|somepipe)
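A minimal sketch of the non-subshell form, assuming bash: the loop is fed by a redirection rather than a pipe, so it runs in the current shell and every PID it records is still a child when wait is called. The subshell with sleep is a stand-in for the real download job, and the item names are made up.

```shell
#!/usr/bin/env bash
# Sketch only: the subshell with sleep stands in for the real download job.
pids=()
while read -r item; do
    ( sleep 0.1; [ "$item" != bad ] ) &   # this job fails only for the item "bad"
    pids+=("$!")
done <<'EOF'
good
bad
EOF

failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))  # wait works: each PID is our own child
done
echo "jobs: ${#pids[@]}, failed: $failed"
```

Had the loop been written as "printf ... | while read ...", the pids array would have been filled in a subshell and would be empty by the time the for loop ran.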
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized environment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").
