I'm working with parallel processing, and rather than dealing with condition variables and locks I've found it's much easier to run a few commands in a shell script in sequence, to avoid race conditions in one place. The new problem is that one of these commands calls another program, which the OS has decided to put into a new process. I need to kill this process from the parent program, but the parent program only knows the pid of the shell script (its direct child), so this process keeps executing on its own.
Is there a way in bash to set a subprocess to die when the parent dies? I've tried to figure out how to execute it as a daemon because I read daemons exit when the parent dies, but it's tricky and I can't quite get it right. Thanks!
Found the problem, and this fixed it (except for some pesky messages that somehow cannot be redirected to /dev/null).
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
I am working on a project in which I need to loosely recreate supervisord (a job-control system) in D. I am using spawnShell() as opposed to spawnProcess() for ease of configuring arguments etc. This has the effect of running sh -c "command". However, it returns the PID of sh, NOT of the child process (for obvious reasons). This becomes a problem because my program needs to be able to send a SIGKILL to the process if it doesn't respond to a SIGTERM after a certain period of time. I am able to send a SIGTERM no problem (presumably because sh catches the SIGTERM and passes it to its child process/processes before exiting). However, for again obvious reasons, SIGKILL stops sh before it gets a chance to send a signal to the child process, and the child is left orphaned. Which brings me to my questions:
A: Can I safely assume that the PID of the spawned process will always be one higher than the PID of sh? It has behaved as such in all my testing so far.
B: If not, then is there a more elegant way (system call or such) to get a child process's PID knowing only the parent process's PID than having my program just execute pgrep -P <sh PID>?
You just need:
sh -c 'exec command'
the shell replaces itself with your command and gets out of the way, so there is no intermediate process.
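A quick way to see the difference (the sleep is a stand-in for the real command):

# Without exec, sh can linger as an intermediate parent (the trailing ':'
# stops the shell from optimizing the single command into an exec itself):
sh -c 'sleep 100; :' &
# With exec, the shell replaces itself, so $! is the command's own pid:
sh -c 'exec sleep 100' &
ps -o pid,ppid,comm --ppid $$   # inspect what is now running under this shell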
No, you cannot assume pids will differ by one.
Can I safely assume that the PID of the spawned process will always be one higher than the PID of sh? It has behaved as such in all my testing so far.
No. Linux is a multitasking OS. While rare, other processes could start in between. Don't rely on a race condition.
If not, then is there a more elegant way (system call or such) to get a child process's PID knowing only the parent process's PID than having my program just execute pgrep -P <sh PID>?
Not really. Trying to navigate the process tree is a sign that your approach is wrong.
You're solving the wrong problem. Get rid of the shell middle man.
I am reading about daemonizing a process at https://en.wikipedia.org/wiki/Daemon_%28computing%29#Creation
In a strictly technical sense, a Unix-like system process is a daemon
when its parent process terminates and the daemon is assigned the init
process (process number 1) as its parent process and has no
controlling terminal. However, more commonly, a daemon may be any
background process, whether a child of the init process or not.
On a Unix-like system, the common method for a process to become a
daemon, when the process is started from the command line or from a
startup script such as an init script or a SystemStarter script,
involves:
1. Dissociating from the controlling tty
2. Becoming a session leader
3. Becoming a process group leader
4. Executing as a background task by forking and exiting (once or twice). This is required sometimes for the process to become a session leader. It also allows the parent process to continue its normal execution.
5. Setting the root directory (/) as the current working directory so that the process does not keep any directory in use that may be on a mounted file system (allowing it to be unmounted).
6. Changing the umask to 0 to allow open(), creat(), and other operating system calls to provide their own permission masks and not to depend on the umask of the caller.
7. Closing all inherited files at the time of execution that are left open by the parent process, including file descriptors 0, 1 and 2 for the standard streams (stdin, stdout and stderr). Required files will be opened later.
8. Using a logfile, the console, or /dev/null as stdin, stdout, and stderr.
If the process is started by a super-server daemon, such as inetd,
launchd, or systemd, the super-server daemon will perform those
functions for the process[5][6][7] (except for old-style daemons not
converted to run under systemd and specified as Type=forking[7] and
"multi-threaded" datagram servers under inetd[5]).
Is there a step there that changes the parent process of the process to be daemonized? It seems to me that none of the steps does that.
Is changing the parent process necessary when daemonizing a process?
After changing the parent process of a process (a process not necessarily to be daemonized), can the process be associated with the controlling tty of the new parent process? (The purpose of the question is to see whether "keeping a process disassociated from the controlling tty of the new parent process" is a necessary condition of "changing the parent process of the process".)
See my related question https://unix.stackexchange.com/questions/266565/daemonize-a-process-in-shell
Thanks.
The parent of a Unix process can't be changed by the process itself. The typical method of creating a daemon involves a fork call (which creates the process that will become the daemon). The initial process then exits, and the newly-orphaned child process is inherited by the init process, which becomes its new parent. That's handled in step 4. The only thing init will do is wait for all its children to exit. init doesn't have a controlling TTY, so once inherited by init the daemon can't become associated with a controlling TTY anymore. The main reason to become disassociated is to prevent signals generated from the TTY (hangups, control-C's, etc.) from reaching the daemon.
There are two ways daemons are usually run:
From a shell script. The script runs the daemon's executable with the & operator at the end of the command to put the daemon into the background, possibly with I/O redirection to set the daemon's stdin, stdout and/or stderr, and then exits, leaving the daemon without a parent. Running an executable from the shell involves the shell doing a fork, followed by an exec of the executable in the child process. A sketch of this pattern follows the list.
The daemon program has an option to daemonize itself. When run with that option, it does a fork followed in the child process by an exec of itself with an appropriate set of arguments. The parent will normally exit after the fork, since the work it was asked to do is done. If it doesn't, the child process needs an extra fork to give it a parent that can exit. NB: this is why so many programs that normally run as daemons can also be run directly without becoming a daemon: the "become a daemon" option causes the child process to close stdin/stdout/stderr and then just exec its own executable without the "become a daemon" option.
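For the first pattern, a minimal launch-from-a-script sketch (paths and names are illustrative):

#!/bin/sh
# setsid detaches the daemon from our session and controlling TTY; the
# redirections replace stdin/stdout/stderr; '&' puts it in the background.
setsid /usr/local/bin/mydaemon </dev/null >>/var/log/mydaemon.log 2>&1 &
exit 0   # the script exits, so init (pid 1) inherits the orphaned daemon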
I would suggest using daemon(3). See also credentials(7).
Your list does not explicitly mention setsid(2).
MUSL libc has a legacy/daemon.c which forks twice and does setsid.
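The shell analogue of that double fork plus setsid (the command name is illustrative):

# The subshell is fork number one; '&' is fork number two. The subshell
# exits immediately, so mydaemon is orphaned to pid 1, and setsid has
# already detached it from our session and controlling TTY.
( setsid mydaemon </dev/null >/dev/null 2>&1 & )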
I'm developing code for Linux, and cannot seem to kill processes when running in a Jenkins environment.
I have a test script that spawns processes and cleans them up as it goes through the tests. One of the processes also spawns and cleans up one of its own subprocesses. All of the "cleanup" is done by sending a SIGINT, followed by a wait. Everything works fine when run from a terminal, except when running through Jenkins.
When the same exact thing is run in Jenkins, processes killed with SIGINT do not die, and the call to wait blocks forever. This wreaks havoc on my test. I could update the logic to not do a blocking wait, but I don't feel I should have to change my production code to accommodate Jenkins.
Any ideas?
Process tree killer may be your answer - https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
In testing, this would usually work when I ran the tests from the command line, but would almost always fail when that unit test script was called from another script. Frankly, it was bizarre....
Then I realized that when I had stray processes, they would indeed go away when I killed them with SIGTERM. But WHY?????
I didn't find a 100%-definitive answer. But thinking about it logically: if the process is not attached to a terminal, then maybe the "terminal interrupt" signal (SIGINT) wouldn't work?
In doing some reading, what I learned is that, basically, when it's a shell that executes a process, the SIGINT action may be set to 'ignore'. That makes sense (to me, anyway), because you wouldn't want Ctrl-C at the command line to kill all of your background processes:
When the shell executes a process “in the background” (or when another background process executes another process), the newly executed process should ignore the interrupt and quit characters. Thus, before a shell executes a background process, it should set SIGINT and SIGQUIT to SIG_IGN.
Our production code isn't a shell, but it is started from a shell, and Jenkins uses /bin/sh to run stuff. So, this would add up.
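You can reproduce the effect outside Jenkins with a plain non-interactive sh; a small sketch (the pid file path is illustrative):

# A background child of a non-interactive sh starts with SIGINT set to
# SIG_IGN, so INT is silently dropped but TERM still gets through.
sh -c 'sleep 300 & echo $!' > /tmp/bg.pid
kill -INT  "$(cat /tmp/bg.pid)"   # typically ignored
sleep 1
kill -TERM "$(cat /tmp/bg.pid)"   # this one terminates the sleep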
So, since there is an implied association between SIGINT and the existence of a TTY, SIGTERM is a better option for killing your own background processes:
It should be noted that SIGINT is nearly identical to SIGTERM. (1)
I've changed the code that kills the proxyserver processes, and the Python unit test code, to use the SIGTERM signal. Now everything runs at the terminal and in Jenkins.
I have a scenario in which, after the fork, the child uses execle() to run a Linux system command, which executes a small shell script. The parent only does a wait() after that. So my question is: does the parent's wait() still work when the child process has done an execle()?
Thanks
Smita
I'm not too sure what you're asking, but if the parent is in a wait() system call, it will wait there until any child exits. There are other things, like signals, that can take it out of the wait too. (And yes, an execle() in the child doesn't change this: the child keeps its pid, so the parent's wait() still applies to it.)
You do have to be careful in the child process that you don't accidentally fall through into the parent's code if the exec fails.
This (a child process doing some execve after its parent fork-ed, and the parent wait-ing or waitpid-ing on it) is a very common scenario; most shells work this way. You could e.g. strace -f an interactive bash shell to learn more, or study the source code of a simple shell like sash.
Notice that after a fork(2) syscall, the parent and the child processes may run simultaneously (i.e. at the same time, especially on multi-core machines).
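For example, a one-liner to watch the fork/exec/wait sequence (the traced command is arbitrary):

# -f follows children; trace=process limits output to process-lifecycle
# syscalls. Expect a clone()/fork() from sh, an execve() of /bin/ls in
# the child, and a wait4() in the parent until ls exits.
strace -f -e trace=process sh -c 'ls'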
Someone told me that when you kill a parent process in Linux, the child would die too.
But I doubted it, so I wrote two bash scripts, where father.sh would invoke child.sh.
Here is my script:
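(The scripts themselves were posted as screenshots; judging from the answers below they were presumably along these lines, with the sleep standing in for real work.)

# father.sh
#!/bin/bash
./child.sh

# child.sh
#!/bin/bash
sleep 1000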
Now I run bash father.sh; you can check it with ps -alf.
Then I killed father.sh with kill -9 24588, and I guessed the child process should be terminated too, but unfortunately I was wrong.
Could anyone explain why?
thx
No, when you kill a process alone, it will not kill the children.
You have to send the signal to the process group if you want all processes in a given group to receive the signal.
For example, if your parent process ID is 1234, you address the whole group by prefixing the id with a minus sign:
kill -9 -1234
Otherwise, orphans will be linked to init, as shown by your third screenshot (PPID of the child has become 1).
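From an interactive shell (where job control gives each background job its own process group), the whole round trip looks like this:

bash father.sh &                        # start the parent as a background job
pgid=$(ps -o pgid= -p $! | tr -d ' ')   # its pgid covers child.sh as well
kill -9 -"$pgid"                        # minus sign: signal the whole group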
-bash: kill: (-123) - No such process
In an interactive Terminal.app session the foreground process group id number and background process group id number are different by design when job control/monitor mode is enabled. In other words, if you background a command in a job-control enabled Terminal.app session, the $! pid of the backgrounded process is in fact a new process group id number (pgid).
In a script having no job control enabled, however, this may not be the case! The pid of the backgrounded process may not be a new pgid but a normal pid! And this is what causes the error message -bash: kill: (-123) - No such process: trying to kill a process group but specifying only a normal pid (instead of a pgid) to the kill command.
# the following code works in Terminal.app because $! == $pgid
{
sleep 100 &
IFS=" " read -r pgid <<EOF
$(ps -p $! -o pgid=)
EOF
echo $$ $! $pgid
sleep 10
kill -HUP -- -$!
#kill -HUP -- -${pgid} # use in script
}
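In a script, an alternative sketch is to turn job control on yourself, so that $! is once again a pgid:

set -m            # monitor mode: each background job gets its own process group
sleep 100 &
kill -HUP -- -$!  # -$! now addresses the job's whole process group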
pkill -TERM -P <ProcessID>
This sends SIGTERM to every process whose parent is <ProcessID> (the children); -P does not match the parent itself, so signal the parent separately if you want both gone.
Killing the parent does not automatically kill the child; what usually creates that impression is the terminal, which delivers Ctrl-C and hangups to the whole foreground process group at once.
The reason that you are seeing the child still alive after killing the father is that kill -9 24588 delivers SIGKILL to that one pid only; the child is never sent any signal at all, so it keeps running its sleep undisturbed.
Why is PPID #1? The parent has died and is no longer in the process table, so the child has been re-parented to init. That doesn't mean init now controls the child, or that killing a parent makes a grandparent the owner of a child; neither is true. All init does for inherited children is reap them when they exit, so they don't linger in the process table labeled <defunct>.
Killing via the process group is different, and is used to signal the siblings and the parent by the process group number. It's probably also important to note that "killing a process" is not "killing" per se, in the human way, where you expect the process to be destroyed and all memory returned as though it never was. It just sends a particular signal, among many, to the process for it to handle; if the process does not handle it, the kernel applies the signal's default action.
It (the killing) doesn't necessarily happen the instant you ask, because the child (or even the parent) could be waiting for disk I/O to complete, or be in some other critical task that must finish to preserve system stability or file integrity.