Why does Monit create multiple processes if the script is already running? - linux

myscript.sh is already running when I start Monit with this config:
set daemon 20 with start delay 5
check program myscript with path "/home/myscript.sh"
if status != 0 then exec "/home/myscript.sh"
or
set daemon 20 with start delay 5
check program myscript with path "/bin/bash /home/myscript.sh"
if status != 0 then exec "/home/myscript.sh"
Is my config wrong? Why does Monit create a new process?
# ps -ef | grep myscript.sh
root 1580 1571 0 13:29 ? /bin/bash /home/myscript.sh < created by monit
root 32675 15735 0 13:23 pts/2 /bin/bash /home/myscript.sh
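A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: as far as I know, `check program` runs the program at the given path on every cycle and then tests its exit status, so pointing it at myscript.sh itself would make Monit launch another copy. One way around this might be to point the check at a small helper that only reports whether the script is running. The helper name `is_running` and the use of `pgrep -f` (from procps) are assumptions of mine, not part of the original config.

```shell
#!/bin/bash
# Hypothetical helper for a Monit "check program" test (name is an assumption).
# Returns 0 if some process has a command line matching the pattern,
# non-zero otherwise. Assumes pgrep from procps is installed.
is_running() {
    pgrep -f -- "$1" > /dev/null
}

# Example: report on the script from the question.
if is_running '/home/myscript.sh'; then
    echo "myscript.sh is running"
else
    echo "myscript.sh is not running"
fi
```

With something like this saved as its own script that exits with the helper's status, the `check program` path would name the helper, and /home/myscript.sh would appear only in the `exec` action.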

Related

Cgroup unexpectedly propagates SIGSTOP to the parent

I have a small script to run a command inside a cgroup that limits CPU time:
$ cat cgrun.sh
#!/bin/bash
if [[ $# -lt 1 ]]; then
echo "Usage: $0 <bin>"
exit 1
fi
sudo cgcreate -g cpu:/cpulimit
sudo cgset -r cpu.cfs_period_us=1000000 cpulimit
sudo cgset -r cpu.cfs_quota_us=100000 cpulimit
sudo cgexec -g cpu:cpulimit sudo -u $USER "$@"
sudo cgdelete cpu:/cpulimit
I let the command run: ./cgrun.sh /bin/sleep 10
Then I send SIGSTOP to the sleep command from another terminal. Somehow, at this moment the parent commands, sudo and cgexec, receive this signal as well. Then I send SIGCONT to the sleep command, which allows sleep to continue.
But at this point sudo and cgexec are still stopped and never reap the zombie of the sleep process. I don't understand how this can happen, and how can I prevent it? Moreover, I cannot send SIGCONT to sudo and cgexec, because I send the signals as a regular user while these commands run as root.
Here is how it looks in htop (some columns omitted):
PID USER S CPU% MEM% TIME+ Command
1222869 user S 0.0 0.0 0:00.00 │ │ └─ /bin/bash ./cgrun.sh /bin/sleep 10
1222882 root T 0.0 0.0 0:00.00 │ │ └─ sudo cgexec -g cpu:cpulimit sudo -u user /bin/sleep 10
1222884 root T 0.0 0.0 0:00.00 │ │ └─ sudo -u desertfox /bin/sleep 10
1222887 user Z 0.0 0.0 0:00.00 │ │ └─ /bin/sleep 10
How can I create a cgroup in a way that SIGSTOP is not propagated to the parent processes?
UPD
If I start the process using systemd-run, I do not observe the same behavior:
sudo systemd-run --uid=$USER -t -p CPUQuota=10% sleep 10
Instead of using the cgroup tools (cgcreate, cgset, cgexec), I would do it the "hard way" with plain shell commands: create the cpulimit cgroup (a mkdir), set the CFS parameters (echo into the corresponding cpu.cfs_* files), start a sub-shell with the (...) notation, move it into the cgroup (echo its pid into the cgroup's tasks file), and execute the requested command in that sub-shell.
Hence, cgrun.sh would look like this:
#!/bin/bash
if [[ $# -lt 1 ]]; then
echo "Usage: $0 <bin>" >&2
exit 1
fi
CGTREE=/sys/fs/cgroup/cpu
sudo -s <<EOF
[ ! -d ${CGTREE}/cpulimit ] && mkdir ${CGTREE}/cpulimit
echo 1000000 > ${CGTREE}/cpulimit/cpu.cfs_period_us
echo 100000 > ${CGTREE}/cpulimit/cpu.cfs_quota_us
EOF
# Sub-shell in background
(
# Pid of the current sub-shell
# ($$ would return the pid of the parent shell)
MY_PID=$BASHPID
# Move current process into the cgroup
sudo sh -c "echo ${MY_PID} > ${CGTREE}/cpulimit/tasks"
# Run the command with calling user id (it inherits the cgroup)
exec "$@"
) &
# Wait for the sub-shell
wait $!
# Exit code of the sub-shell
rc=$?
# Delete the cgroup
sudo rmdir ${CGTREE}/cpulimit
# Exit with the return code of the sub-shell
exit $rc
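The comment about `$$` versus `$BASHPID` in the script above can be checked in isolation. This is a minimal sketch, assuming bash; the variable names `sub_info`, `dollar_pid` and `bash_pid` are only for illustration, and it uses a command substitution as the subshell instead of the `( ... ) &` form.

```shell
#!/bin/bash
# Command substitution runs its body in a subshell, like ( ... ) does.
# $$ keeps the pid of the original shell; $BASHPID is the subshell's own pid.
sub_info=$(echo "$$ $BASHPID")
dollar_pid=${sub_info% *}    # first field: what $$ expanded to in the subshell
bash_pid=${sub_info#* }      # second field: what $BASHPID expanded to

echo "parent shell pid:       $$"
echo "\$\$ inside subshell:     $dollar_pid"
echo "\$BASHPID in subshell:   $bash_pid"
```

The first two values should be equal and the third should differ, which is why the script writes `$BASHPID` into the tasks file instead of `$$`.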
Run it (first note the pid of the current shell, so that the process hierarchy can be displayed from another terminal):
$ echo $$
112588
$ ./cgrun.sh /bin/sleep 50
This creates the following process hierarchy:
$ pstree -p 112588
bash(112588)-+-cgrun.sh(113079)---sleep(113086)
Stop the sleep process:
$ kill -STOP 113086
Look at the cgroup to verify that the sleep command is running in it (its pid is in the tasks file) and that the CFS parameters are correctly set:
$ ls -l /sys/fs/cgroup/cpu/cpulimit/
total 0
-rw-r--r-- 1 root root 0 nov. 5 22:38 cgroup.clone_children
-rw-r--r-- 1 root root 0 nov. 5 22:38 cgroup.procs
-rw-r--r-- 1 root root 0 nov. 5 22:36 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 nov. 5 22:36 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.shares
-r--r--r-- 1 root root 0 nov. 5 22:38 cpu.stat
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.uclamp.max
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpu.uclamp.min
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.stat
-rw-r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_all
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_sys
-r--r--r-- 1 root root 0 nov. 5 22:38 cpuacct.usage_user
-rw-r--r-- 1 root root 0 nov. 5 22:38 notify_on_release
-rw-r--r-- 1 root root 0 nov. 5 22:36 tasks
$ cat /sys/fs/cgroup/cpu/cpulimit/tasks
113086 # This is the pid of sleep
$ cat /sys/fs/cgroup/cpu/cpulimit/cpu.cfs_*
1000000
100000
Send SIGCONT signal to the sleep process:
$ kill -CONT 113086
The process finishes and the cgroup is destroyed:
$ ls -l /sys/fs/cgroup/cpu/cpulimit
ls: cannot access '/sys/fs/cgroup/cpu/cpulimit': No such file or directory
Get the exit code of the script once it is finished (it is the exit code of the launched command):
$ echo $?
0

Killing subprocess from inside a Docker container kills the entire container

On my Windows machine, I started a Docker container with docker compose. My entrypoint is a Go file watcher that runs a task of a task manager on every file change. The executed task builds and runs the Go program.
Before I can build and run the program again after a file change, I have to kill the previously running version. But every time I kill the app process, the container is gone as well.
The goal is to kill only the svc1 process, which has PID 74 in this example. I tried pkill -9 svc1 and kill $(pgrep svc1), but every time the parent processes are killed too.
The commandline output from inside the container:
root@bf073c39e6a2:/app/cmd/svc1# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 2.5 0.0 104812 2940 ? Ssl 13:38 0:00 /go/bin/watcher
root 13 0.0 0.0 294316 7576 ? Sl 13:38 0:00 /go/bin/task de
root 74 0.0 0.0 219284 4908 ? Sl 13:38 0:00 /svc1
root 82 0.2 0.0 18184 3160 pts/0 Ss 13:38 0:00 /bin/bash
root 87 0.0 0.0 36632 2824 pts/0 R+ 13:38 0:00 ps -aux
root@bf073c39e6a2:/app/cmd/svc1# ps -afx
PID TTY STAT TIME COMMAND
82 pts/0 Ss 0:00 /bin/bash
88 pts/0 R+ 0:00 \_ ps -afx
1 ? Ssl 0:01 /go/bin/watcher -cmd /go/bin/task dev -startcmd
13 ? Sl 0:00 /go/bin/task dev
74 ? Sl 0:00 \_ /svc1
root@bf073c39e6a2:/app/cmd/svc1# pkill -9 svc1
root@bf073c39e6a2:/app/cmd/svc1#
Switching to the container log:
task: Failed to run task "dev": exit status 255
2019/08/16 14:20:21 exit status 1
"dev" is the name of the task in the task manager.
The Dockerfile:
FROM golang:stretch
RUN go get -u -v github.com/radovskyb/watcher/... \
&& go get -u -v github.com/go-task/task/cmd/task
WORKDIR /app
COPY ./Taskfile.yml ./Taskfile.yml
ENTRYPOINT ["/go/bin/watcher", "-cmd", "/go/bin/task dev", "-startcmd"]
I expect only the process with the target PID to be killed, not the parent process that spawned it.
You can use a process manager like "supervisord" and configure it to re-execute your script or command whenever its process is killed, which keeps your container up and running.

Kill all process which uses specified port on Ubuntu 14.04

What's the reason the process is still alive?
List of node process running and what I tried:
root@111:/home/ubuntu# ps -e -o pid,ppid,stat,cmd | grep node
3150 1 Ss sudo /usr/bin/node /home/ubuntu/chatapp/bin/www
3152 3150 Sl /usr/bin/node /home/ubuntu/chatapp/bin/www
4407 1558 S+ grep --color=auto node
root@111:/home/ubuntu# kill -9 3150
root@111:/home/ubuntu# kill -9 3152
root@111:/home/ubuntu# ps -e -o pid,ppid,stat,cmd | grep node
4665 1 Ss sudo /usr/bin/node /home/ubuntu/chatapp/bin/www
4667 4665 Sl /usr/bin/node /home/ubuntu/chatapp/bin/www
4680 1558 S+ grep --color=auto node
Try with:
$ sudo kill -9 18200
Note the added flag '-9', which forces the murder...
From the Linux signal(7) man page:
...
SIGKILL 9 Term Kill signal
...
You killed process id 18200
You state node is still running, but that's process id 31261, not the one you killed...
Is the remaining process (parent pid = 1) a child process that was orphaned by killing 18200?

Getting the (parent) process executing the command in Linux shell

Please advise: how can I find out which program started a process?
For example, the following command (ps -ef) shows the sendmail process if it is running:
ps -ef | grep sendmail
root 9558 9544 0 19:05 ? 00:00:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
What I want to find is the script that executes the binary /usr/sbin/sendmail.
So my question is: which flags do I need to add to "ps -ef" in order to get the full details, including which program started the process?
Is it possible?
Example and remark
If /etc/rc3.d/sendmail runs the binary /usr/sbin/sendmail, then I expect to see the /etc/rc3.d/sendmail path in the output of ps -ef.
What you need is a tree view of the processes, so that you can see each process's parent.
Example pstree -a:
[~]# pstree -a
init
├─atd
├─atop -a -w /var/log/atop.log 600
├─cron
├─dbus-daemon --system --fork --activation=upstart
├─getty -8 38400 tty4
│ ├─sshd
│ └─sshd
│ └─zsh
│ └─pstree -a
├─udevd --daemon
│ ├─udevd --daemon
│ └─udevd --daemon
├─upstart-socket- --daemon
├─upstart-udev-br --daemon
Here you can see a zsh process (my shell) that is running the pstree command. The zsh itself was started by the sshd process.
Here is the same output for ps -AF:
root 10006 649 0 22329 3944 0 12:48 ? 00:00:00 sshd: root@pts/2
root 10041 10006 0 10355 5276 0 12:48 pts/2 00:00:00 -zsh
root 16465 10041 0 4538 1220 0 12:52 pts/2 00:00:00 ps -AF
The second column is process id and the third column is parent process id. You see that the parent of ps -AF is the shell process 10041. You can always trace back processes to the init (process id 1) by walking them parent by parent.
In your case, if you want to find /etc/rc3.d/sendmail, you probably need to walk up the process chain from /usr/sbin/sendmail until you reach a process whose full path is under /etc/rc3.d.
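Walking the chain parent by parent can be scripted. The following is a minimal sketch, assuming a procps-style ps that accepts `-o args=`, `-o ppid=` and `-p`; the function name `ancestors` is made up for illustration.

```shell
#!/bin/bash
# Print the chain of ancestors of a pid, one "pid command" line per process,
# starting with the process itself and stopping at pid 1 (or when ps no
# longer finds the parent, e.g. at the top of a pid namespace).
ancestors() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Full command line of the current process; stop if it has vanished.
        args=$(ps -o args= -p "$pid") || break
        printf '%s %s\n' "$pid" "$args"
        [ "$pid" -eq 1 ] && break
        # Move one step up: the parent pid, with padding spaces removed.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Example: show the ancestors of the current shell.
ancestors "$$"
```

For the sendmail case, the pid 9558 from the ps -ef output could be passed instead of `$$`, and the output scanned for a command under /etc/rc3.d. If the starting script has already exited, it will not appear in the chain.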

ksh child process not ignoring SIGTERM

My ksh version is ksh93-
=>rpm -qa | grep ksh
ksh-20100621-3.fc13.i686
I have a simple script which is as below - #cat test_sigterm.sh -
#!/bin/ksh
trap 'echo "removing"' QUIT
while read line
do
sleep 20
done
I am Executing the script From Terminal 1 -
1. The ksh is started from /bin/ksh as below :
# exec /bin/ksh
2. The script is executed from this ksh-
# ./test_sigterm.sh&
[1] 12136
and Sending a "SIGTERM" From Terminal 2 -
# ps -elf | grep ksh
4 S root 12136 30437 0 84 4 - 1345 poll_s 13:09 pts/0 00:00:00 /bin/ksh ./test_sigterm.sh
0 S root 18952 18643 0 80 0 - 1076 pipe_w 13:12 pts/5 00:00:00 grep ksh
4 S root 30437 30329 0 80 0 - 1368 poll_s 10:04 pts/0 00:00:00 /bin/ksh
# kill -15 12136
I can see that my test_sigterm.sh is getting killed on receiving the "SIGTERM" in either case, when run in background (&) and foreground.
But the ksh man pages say -
Signals.
The INT and QUIT signals for an invoked command are ignored if the command is followed by & and the monitor option is not active.
Otherwise, signals have the values inherited by the shell from its parent (but see also the trap built-in command below).
Is it a known or default behaviour of ksh to NOT IGNORE SIGTERM? Or is it an issue with SIGTERM handling in the ksh child?
I believe that this is normal behaviour.
While the man page says that signals are normally inherited by background processes, the action of the TERM signal is determined by whether the shell is interactive or not. (See the '-i' option in the ksh man page under Invocation.)
If you need the script to ignore SIGTERM, then you can add this line to it:
trap '' TERM
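The effect of that trap line can be tried out with a short experiment. This is a sketch in bash rather than ksh (the `trap '' TERM` syntax is the same in both); the variable names and the use of `sleep` as a stand-in for the script are assumptions for illustration.

```shell
#!/bin/bash
# Start a background subshell that ignores SIGTERM, then becomes sleep.
# An ignored signal stays ignored across exec, so sleep ignores it too.
(
    trap '' TERM
    exec sleep 30
) &
pid=$!

sleep 1                 # give the subshell time to install the trap
kill -TERM "$pid"       # same as kill -15
sleep 1

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "survived SIGTERM: $survived"

# Clean up: SIGKILL cannot be trapped or ignored.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
```

Without the trap line, the same `kill -TERM` would end the background process, which matches what the question observed.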
