Why do processes run from a shell script have parent PID 1? - linux

Running the following command spawns a python process whose parent PID is 1:
echo "python3 -m http.server 2>&1 &" > a && chmod 777 a && ./a && ps -ef | grep "python3 -m http.server"
Result:
501 4622 1 0 4:45PM ttys000 0:00.00 python3 -m http.server
while running the following
python3 -m http.server 2>&1 &
ps -ef | grep "python3 -m http.server"
gives a different result:
501 4646 665 0 4:51PM ttys000 0:00.07 python3 -m http.server
Can anyone explain?

All processes must have a parent. If a parent spawns a child and then exits, the child must still have a parent. When this happens, the child is reparented to the init process, which has a PID of 1.
As for why the above two cases differ: in the second case, the parent is your interactive shell, which has not yet exited. In the first case, you spawn the server from the script's shell, the server is put into the background by the '&', and that shell then exits, so the first paragraph applies.
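The reparenting described in the first paragraph can be sketched locally (the file name /tmp/orphan.txt is just illustrative). A short-lived shell backgrounds a sleep and exits; the orphan is then reparented. On a classic setup the new parent is PID 1, though on a desktop with a subreaper (e.g. a systemd user instance) it may be another reaper process; either way it is no longer the dead parent:

```shell
# A short-lived parent shell backgrounds a sleep, records both PIDs, and exits.
sh -c 'sleep 5 & echo "$$ $!"' > /tmp/orphan.txt
read PARENT CHILD < /tmp/orphan.txt
sleep 1   # give the kernel a moment to reparent the orphan
NEWPPID=$(ps -o ppid= -p "$CHILD" | tr -d ' ')
echo "old parent: $PARENT, new parent: $NEWPPID"
```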

Related

Why does SIGHUP not work on busybox sh in an Alpine Docker container?

Sending SIGHUP with
kill -HUP <pid>
to a busybox sh process on my native system works as expected and the shell hangs up. However, if I use docker kill to send the signal to a container with
docker kill -s HUP <container>
it doesn't do anything. The Alpine container is still running:
$ CONTAINER=$(docker run -dt alpine:latest)
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 1 second
$ docker kill -s HUP $CONTAINER
4fea4f2dabe0f8a717b0e1272528af1a97050bcec51babbe0ed801e75fb15f1b
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 7 seconds
By the way, with a Debian container (which runs bash) it does work as expected:
$ CONTAINER=$(docker run -dt debian:latest)
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 1 second
$ docker kill -s HUP $CONTAINER
9a4aff456716397527cd87492066230e5088fbbb2a1bb6fc80f04f01b3368986
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Exited (129) 1 second ago
Sending SIGKILL does work, but I'd rather find out why SIGHUP does not.
Update: I'll add another example. Here you can see that busybox sh generally does hang up on SIGHUP successfully:
$ busybox sh -c 'while true; do sleep 10; done' &
[1] 28276
$ PID=$!
$ ps -e | grep busybox
28276 pts/5 00:00:00 busybox
$ kill -HUP $PID
$
[1]+ Hangup busybox sh -c 'while true; do sleep 10; done'
$ ps -e | grep busybox
$
However, the same infinite sleep loop inside the Docker container doesn't quit. As you can see, the container is still running after SIGHUP and only exits after SIGKILL:
$ CONTAINER=$(docker run -dt alpine:latest busybox sh -c 'while true; do sleep 10; done')
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 14 seconds
$ docker kill -s HUP $CONTAINER
31574ba7c0eb0505b776c459b55ffc8137042e1ce0562a3cf9aac80bfe8f65a0
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 28 seconds
$ docker kill -s KILL $CONTAINER
31574ba7c0eb0505b776c459b55ffc8137042e1ce0562a3cf9aac80bfe8f65a0
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Exited (137) 2 seconds ago
$
(I don't have a Docker environment at hand to try this; just guessing.)
For your case, docker run must be running busybox/sh or bash as PID 1.
According to Docker doc:
Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
As for the difference between busybox/sh and bash regarding SIGHUP:
On my system (Debian 9.6, x86_64), the signal masks for busybox/sh and bash are as follows:
busybox/sh:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 82817 0.0 0.0 6952 1904 pts/2 S+ 10:23 0:00 busybox sh
PENDING (0000000000000000):
BLOCKED (0000000000000000):
IGNORED (0000000000284004):
3 QUIT
15 TERM
20 TSTP
22 TTOU
CAUGHT (0000000008000002):
2 INT
28 WINCH
bash:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4871 0.0 0.1 21752 6176 pts/16 Ss 2019 0:00 /usr/local/bin/bash
PENDING (0000000000000000):
BLOCKED (0000000000000000):
IGNORED (0000000000380004):
3 QUIT
20 TSTP
21 TTIN
22 TTOU
CAUGHT (000000004b817efb):
1 HUP
2 INT
4 ILL
5 TRAP
6 ABRT
7 BUS
8 FPE
10 USR1
11 SEGV
12 USR2
13 PIPE
14 ALRM
15 TERM
17 CHLD
24 XCPU
25 XFSZ
26 VTALRM
28 WINCH
31 SYS
As we can see, busybox/sh does not catch SIGHUP, so as PID 1 the signal is ignored. Bash catches SIGHUP, so docker kill can deliver the signal to Bash, and Bash then terminates because, according to its manual, "the shell exits by default upon receipt of a SIGHUP".
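The masks shown above can be read directly from the kernel on Linux: /proc/&lt;pid&gt;/status reports them as hex bitmasks (SigIgn for ignored signals, SigCgt for caught ones), where bit N-1 being set means signal N is in that set. For example, for the current shell:

```shell
# Print the ignored and caught signal masks of the current process.
# Bit N-1 in the hex value corresponds to signal N (bit 0 = SIGHUP).
grep -E '^Sig(Ign|Cgt):' /proc/self/status
```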
UPDATE 2020-03-07 #1:
Did a quick test and my previous analysis is basically correct. You can verify like this:
[STEP 104] # docker run -dt debian busybox sh -c \
'trap exit HUP; while true; do sleep 1; done'
331380090c59018dae4dbc17dd5af9d355260057fdbd2f2ce9fc6548a39df1db
[STEP 105] # docker ps
CONTAINER ID IMAGE COMMAND CREATED
331380090c59 debian "busybox sh -c 'trap…" 11 seconds ago
[STEP 106] # docker kill -s HUP 331380090c59
331380090c59
[STEP 107] # docker ps
CONTAINER ID IMAGE COMMAND CREATED
[STEP 108] #
As I showed earlier, by default busybox/sh does not catch SIGHUP, so the signal is ignored. But once busybox/sh explicitly traps SIGHUP, the signal is delivered to it.
I also tried SIGKILL, and yes, it always terminates the running container. This is reasonable, since SIGKILL cannot be caught by any process, so the signal is always delivered to the container and kills it.
UPDATE 2020-03-07 #2:
You can also verify it this way (much simpler):
[STEP 110] # docker run -ti alpine
/ # ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 ps
/ # kill -HUP 1 <-- this does not kill it because linux ignored the signal
/ #
/ # trap 'echo received SIGHUP' HUP
/ # kill -HUP 1
received SIGHUP <-- this indicates it can receive SIGHUP now
/ #
/ # trap exit HUP
/ # kill -HUP 1 <-- this terminates it because the action changed to `exit`
[STEP 111] #
Like the other answer already points out, the docs for docker run contain the following note:
Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
This is the reason why SIGHUP doesn't work on busybox sh inside the container. However, if I run busybox sh on my native system, it won't have PID 1 and therefore SIGHUP works.
There are various solutions:
Use --init to specify an init process which should be used as PID 1.
You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.
The default init process used is the first docker-init executable found in the system path of the Docker daemon process. This docker-init binary, included in the default installation, is backed by tini.
Trap SIGHUP and call exit yourself.
docker run -dt alpine busybox sh -c 'trap exit HUP ; while true ; do sleep 60 & wait $! ; done'
Use another shell, like bash, which exits on SIGHUP by default, regardless of whether it runs as PID 1.
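The second solution's pattern can be sketched locally, without Docker, assuming a POSIX sh: the trap makes SIGHUP terminate the shell, and backgrounding the sleep plus wait lets the trap fire immediately instead of only after the current sleep finishes:

```shell
# Loop shell: trap HUP to clean up the current sleep and exit 0.
# '\$!' keeps the expansion inside the inner shell.
sh -c "trap 'kill \$!; exit 0' HUP; while true; do sleep 60 & wait \$!; done" &
LOOP_PID=$!
sleep 1                     # let the loop start its first sleep
kill -HUP "$LOOP_PID"       # HUP interrupts 'wait', the trap runs at once
wait "$LOOP_PID"
STATUS=$?
echo "loop shell exited with status $STATUS"
```

Without the trap the shell would die from the default SIGHUP action (status 129); with it, the exit is a clean status 0.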

Why does executing a simple command in a grouping command not fork a subshell process, while a compound command does?

I know that a grouping command (command-list) creates a subshell environment, and each listed command is executed in that subshell. But if I execute a simple command in the grouping command (using ps to list the processes), no subshell process appears in the output. If I instead execute a list of commands (a compound command) in the grouping command, a subshell process does appear. Why does it produce such a result?
A test of executing a simple command (only a ps command) in a grouping command:
[root@localhost ~]# (ps -f)
with the following output:
UID PID PPID C STIME TTY TIME CMD
root 1625 1623 0 13:49 pts/0 00:00:00 -bash
root 1670 1625 0 15:05 pts/0 00:00:00 ps -f
Another test of executing a compound command(a list of commands) in a grouping command:
[root@localhost ~]# (ps -f;cd)
with the following output:
UID PID PPID C STIME TTY TIME CMD
root 1625 1623 0 13:49 pts/0 00:00:00 -bash
root 1671 1625 0 15:05 pts/0 00:00:00 -bash
root 1672 1671 0 15:05 pts/0 00:00:00 ps -f
I tested a lot of other commands (compound and simple), and the results are the same. I would guess that even for a simple command in a grouping command, bash should fork a subshell process, since otherwise it couldn't execute the command. So why can't I see it?
Bash optimizes the execution. It detects that only one command is inside the ( ) group and calls fork + exec instead of fork + fork + exec. That's why you see one bash process fewer in the list of processes. It is easier to observe with commands that take more time (sleep 5), which eliminates timing issues. Also, you may want to read this thread on unix.stackexchange.
I think the optimization is done somewhere inside execute_cmd.c in execute_in_subshell() function (arrows > added by me):
/* If this is a simple command, tell execute_disk_command that it
might be able to get away without forking and simply exec.
>>>> This means things like ( sleep 10 ) will only cause one fork
If we're timing the command or inverting its return value, however,
we cannot do this optimization. */
and in execute_disk_command() function we can also read:
/* If we can get away without forking and there are no pipes to deal with,
don't bother to fork, just directly exec the command. */
It looks like an optimization, and dash appears to be doing it too. Running
bash -c '( sleep 3)' & sleep 0.2 && ps # or with dash
or, more robustly,
strace -f -e trace=clone dash -c '(/bin/sleep)' 2>&1 | grep clone # 1 clone
shows that the subshell is skipped, but if there's post work to be done in the subshell after the child, the subshell is created:
strace -f -e trace=clone dash -c '(/bin/sleep; echo done)' 2>&1 | grep clone # 2 clones
Zsh and ksh take it even one step further: when they see it's the last command in the script,
strace -f -e trace=clone ksh -c '(/bin/sleep; echo done)' 2>&1 | grep clone # 0 clones
they don't fork (= clone) at all, exec'ing directly in the shell process.
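The optimization can also be observed without strace, by comparing the parent PID a command sees inside ( ). For a lone simple command the group is fork + exec, so the command's parent is the original shell; with a command list, a real subshell sits in between. A sketch, assuming the bash behavior described above:

```shell
# Each bash -c prints its own PID, then a command inside ( ) prints its PPID.
# Single simple command in ( ): sh is exec'd in the subshell, PPID = outer bash.
ONE=$(bash -c 'echo $$; ( sh -c "echo \$PPID" )')
# Command list in ( ): a real subshell forks sh, so PPID != outer bash.
TWO=$(bash -c 'echo $$; ( sh -c "echo \$PPID"; : )')
echo "single command: $ONE"
echo "command list:   $TWO"
```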

Can't detach child process when main process is started from systemd

I want to spawn long-running child processes that survive when the main process restarts/dies. This works fine when running from the terminal:
$ cat exectest.go
package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
    "time"
)

func main() {
    if len(os.Args) == 2 && os.Args[1] == "child" {
        for {
            time.Sleep(time.Second)
        }
    } else {
        cmd := exec.Command(os.Args[0], "child")
        cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
        log.Printf("child exited: %v", cmd.Run())
    }
}
$ go build
$ ./exectest
^Z
[1]+ Stopped ./exectest
$ bg
[1]+ ./exectest &
$ ps -ef | grep exectest | grep -v grep | grep -v vim
snowm 7914 5650 0 23:44 pts/7 00:00:00 ./exectest
snowm 7916 7914 0 23:44 ? 00:00:00 ./exectest child
$ kill -INT 7914 # kill parent process
[1]+ Exit 2 ./exectest
$ ps -ef | grep exectest | grep -v grep | grep -v vim
snowm 7916 1 0 23:44 ? 00:00:00 ./exectest child
Note that the child process is still alive after parent process was killed. However, if I start the main process from systemd like this...
[snowm@localhost exectest]$ cat /etc/systemd/system/exectest.service
[Unit]
Description=ExecTest
[Service]
Type=simple
ExecStart=/home/snowm/src/exectest/exectest
User=snowm
[Install]
WantedBy=multi-user.target
$ sudo systemctl enable exectest
ln -s '/etc/systemd/system/exectest.service' '/etc/systemd/system/multi-user.target.wants/exectest.service'
$ sudo systemctl start exectest
... then the child also dies when I kill the main process:
$ ps -ef | grep exectest | grep -v grep | grep -v vim
snowm 8132 1 0 23:55 ? 00:00:00 /home/snowm/src/exectest/exectest
snowm 8134 8132 0 23:55 ? 00:00:00 /home/snowm/src/exectest/exectest child
$ kill -INT 8132
$ ps -ef | grep exectest | grep -v grep | grep -v vim
$
How can I make the child survive?
Running go version go1.4.2 linux/amd64 under CentOS Linux release 7.1.1503 (Core).
The solution is to add
KillMode=process
to the [Service] section. The default value is control-group, which means systemd kills any child processes when the unit stops.
From man systemd.kill
KillMode= Specifies how processes of this unit shall be killed. One of
control-group, process, mixed, none.
If set to control-group, all remaining processes in the control group
of this unit will be killed on unit stop (for services: after the stop
command is executed, as configured with ExecStop=). If set to process,
only the main process itself is killed. If set to mixed, the SIGTERM
signal (see below) is sent to the main process while the subsequent
SIGKILL signal (see below) is sent to all remaining processes of the
unit's control group. If set to none, no process is killed. In this
case, only the stop command will be executed on unit stop, but no
process is killed otherwise. Processes remaining alive after stop are
left in their control group and the control group continues to exist
after stop unless it is empty.
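With KillMode=process applied, the unit file from the question would look like this (same paths as the original example):

```ini
[Unit]
Description=ExecTest

[Service]
Type=simple
ExecStart=/home/snowm/src/exectest/exectest
User=snowm
KillMode=process

[Install]
WantedBy=multi-user.target
```

After editing the unit, run sudo systemctl daemon-reload and restart the service for the change to take effect.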
The only way I know to solve this is to launch the child process in its own scope, using the --scope argument of systemd-run:
systemd-run --user --scope firefox
KillMode has been mentioned here also, but changing the KillMode also means that if your main process crashes, systemd won't restart it if any child process is still running.
If you cannot (like me) change the KillMode of the service for some reason, you could try the at command (see man at).
You can schedule your command to run 1 minute ahead. See an example:
# this removes all .tmp files from /path/ one minute from now (the task runs once)
echo rm /path/*.tmp | at now + 1 minute

How to find/kill a specific python program

There are two different python programs running in this VM
one is a background job who monitors a folder and then 'does stuff' (with several workers)
10835 ? Sl 0:03 python main.py
10844 ? Sl 34:02 python main.py
10845 ? S 33:43 python main.py
the second one is started via script
20056 pts/1 S+ 0:00 /bin/bash ./exp.sh
20069 pts/1 S+ 0:00 /bin/bash ./exp.sh
20087 pts/1 S+ 0:10 python /home/path/second.py
I have tried numerous things to find a way to kill only the main program (I want to build a cron watchdog), but none succeeded.
First, I want to find only the hanging 'python main.py' process (accompanied by [defunct]), but I can't even find just this process alone.
The listings above are from ps -ax (so both programs are currently running).
pgrep 'python' returns all PIDs, including those from second.py, which I don't want (so it's not useful, and therefore neither is pkill):
pgrep 'python'
10835
10844
10845
20087
pgrep 'python main.py' always returns empty, as does pgrep 'main.py'.
The only thing that works is
ps ax | grep 'python main.py'
but this also returns its own PID, and grepping ps output isn't a preferred solution, as far as I recall. When main.py hangs, it shows "python main.py [defunct]". A
ps ax | grep 'python main.py [defunct]'
is useless as a test, as it always returns true. pgrep for anything more specific than 'python' also always returns false. I am a bit clueless.
This works for me. Found it on the pgrep bro pages.
Find the PIDs of processes with 'test.py' as an argument, like 'python test.py':
pgrep -f test.py
And I use it to check if a python process is running:
searcher="backend/webapi.py"
if pgrep -f "$searcher" > /dev/null
then
    echo "$(date +%y-%m-%d-%H-%M-%S) $searcher is alive. Doing nothing."
else
    echo "No $searcher. Kickstarting..."
    pushd "$HOME/there/"
    ./run_backend
    popd
    echo "Pgrepping $searcher:"
    pgrep -f "$searcher"  # out to loggers
fi
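A small demonstration of why pgrep 'python main.py' comes back empty: pgrep matches only the process name (e.g. "python") unless -f makes it match the full command line. A sketch using sleep as a stand-in:

```shell
# Start a process whose name is "sleep" but whose command line is "sleep 300".
sleep 300 &
SLEEP_PID=$!
BY_NAME=$(pgrep -x sleep)            # matches the process name only
BY_CMDLINE=$(pgrep -f "sleep 300")   # matches against the full command line
echo "by name:    $BY_NAME"
echo "by cmdline: $BY_CMDLINE"
kill "$SLEEP_PID"
```

Both queries find the process, but only the -f form could distinguish "sleep 300" from some other sleep, just as -f distinguishes "python main.py" from "python second.py".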
In your daemon python script you should create a PID file:
import os

def writePidFile():
    pid = str(os.getpid())
    with open('/tmp/my_pid', 'w') as f:
        f.write(pid)
Now killing this process is simple:
kill `cat /tmp/my_pid`
Or you can just use grep and filter out grep's own process:
ps ax | grep 'python main.py [defunct]' | grep -v grep
I think I've found a pretty safe way to find the proper python process PID for a specific file only. The idea is to find all the python processes, then all "main.py" processes, and intersect the results to find the correct PID.
I use the subprocess module in a python script (you can interpolate for .sh script):
import subprocess as subp

# Get a list of all pids of processes with a certain name
def get_pid(name):
    return list(map(int, subp.check_output(["pidof", "-c", name]).split()))

# Get a list of all pids for python3 processes
python_pids = get_pid('python3')

# Get a list of all pids for processes with "main.py" argument
main_py_pids = list(map(int, subp.check_output(["pgrep", "-f", "main.py"]).split()))

python_main_py_pid = set(python_pids).intersection(main_py_pids)

print(python_pids)
print(main_py_pids)
print(python_main_py_pid)
Result with 3 running python processes (including the one from this script) and one "sudo nano main.py" process:
Python pids: [3032, 2329, 1698]
main.py pids: [1698, 3031]
Intersection: {1698}

Sudo - Waiting for child, and getting child PID

It seems that somewhere between sudo 1.7.2p2 and 1.7.4p5 the behaviour for waiting for executing processes has changed. It looks like in the older versions sudo would start the new process, and then quit. In the newer versions it starts the new process, and then waits for it. There is a bit of a discussion about it here: http://www.sudo.ws/pipermail/sudo-users/2010-August/004461.html which mentions that it is to stop it from breaking PAM session support.
This change is breaking one of my scripts which uses sudo to execute commands in the background, as with the older version sudo the command I want to execute would be backgrounded, and with the new version it is sudo itself that is backgrounded.
For example, the process returned by $! in this case is for sleep
user@localhost$ sudo -V
Sudo version 1.7.2p2
user@localhost$ sudo -u poweruser sleep 60 &
[1] 17491
user@localhost$ ps -fp $!
UID PID PPID C STIME TTY TIME CMD
poweruser 17491 17392 0 16:43 pts/0 00:00:00 sleep 60
Whereas in this case it is for sudo
user@localhost$ sudo -V
Sudo version 1.7.4p5
user@localhost$ sudo -u poweruser sleep 60 &
[1] 792
user@localhost$ ps -fp $!
UID PID PPID C STIME TTY TIME CMD
root 792 29257 0 16:42 pts/3 00:00:00 sudo -u poweruser sleep 60
Is it possible to get the process ID of a child process executed by sudo version 1.7.4p5? The $! variable returns the PID of sudo, and running sudo with the -b option doesn't seem to make the child PID available. Is it possible (without recompiling sudo) to revert the behaviour of sudo so that it stops waiting for child processes?
Thanks
This is certainly a hack, and it doesn't set $!, but you can echo the pid of the command:
$ sudo sh -c 'echo $$; exec sleep 60'
I'm guessing that your explanation of the old behavior is not quite right and that sudo simply exec'd the command rather than forking and exiting. Echoing the pid and then exec'ing the desired command might work for you, but you may need creative redirections. For example:
#!/bin/sh
exec 3>&1
pid=$( sudo sh -c 'echo $$; exec sh -c "{ sleep 1; echo my pid is $$; }" >&3 &' )
echo Child pid is $pid
In the above, you lose the pid of sudo...but it wouldn't be too hard to find it.
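The echo-then-exec idea can be checked without sudo (the file /tmp/cmd_pid is just an illustrative scratch file): because the wrapper shell execs the command, the PID it wrote out is the command's PID, not an intermediate shell's:

```shell
# Wrapper shell records its own PID, then replaces itself with the command.
sh -c 'echo $$ > /tmp/cmd_pid; exec sleep 300' &
sleep 1
CMD_PID=$(cat /tmp/cmd_pid)
CMD_NAME=$(ps -o comm= -p "$CMD_PID" | tr -d ' ')
echo "PID $CMD_PID is running: $CMD_NAME"
kill "$CMD_PID"
```

Since the shell exec'd rather than forked, looking up the recorded PID shows the real command (sleep), which is exactly why the sudo one-liner above returns a usable PID.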
