Why is a defunct (zombie) process generated when calling exec in a shell script?
Because some extra configuration and shared libraries have to be set up and preloaded before starting snmpd, I use a shell script like the one below. The problem is that a zombie process is generated every time the script starts.
As far as I know, exec replaces the original shell process (26452), so why is a child process (26453) created, and why does it become a zombie?
# ps -ef | grep snmpd
root 26452 12652 0 10:24 pts/4 00:00:00 snmpd udp:161,udp6:161 -f -Ln -I -system_mib ifTable -c /opt/snmp/config/snmpd.conf
root 26453 26452 0 10:24 pts/4 00:00:00 [snmpd_wapper.sh] <defunct>
How can I avoid the zombie process? Please help!
cat /home/xpeng/snmpd_wapper.sh
#!/bin/bash
( sleep 2;/opt/snmp/bin/snmpusm -v 3 -u myuser -l authNoPriv -a MD5 -A xpeng localhost create top myuser >/dev/null 2>&1; \
/opt/snmp/bin/snmpvacm -v 3 -u myuser -l authNoPriv -a MD5 -A xpeng localhost createSec2Group 3 top RWGroup >/dev/null 2>&1; \
/opt/snmp/bin/snmpvacm -v 3 -u myuser -l authNoPriv -a MD5 -A xpeng localhost createView all .1 80 >/dev/null 2>&1; \
/opt/snmp/bin/snmpvacm -v 3 -u myuser -l authNoPriv -a MD5 -A xpeng localhost createAccess RWGroup 3 1 1 all all none >/dev/null 2>&1 ) &
LIBRT=/usr/lib64
if [ "$(. /etc/os-release; echo $NAME)" = "Ubuntu" ]; then
LIBRT=/usr/lib/x86_64-linux-gnu
fi
echo $$ > /tmp/snmpd.pid
export LD_PRELOAD=$LD_PRELOAD:$LIBRT/librt.so:/opt/xpeng/lib/libxpengsnmp.so
exec -a "snmpd" /opt/snmp/sbin/snmpd udp:161,udp6:161 -f -Ln -I -system_mib,ifTable -c /opt/snmp/config/snmpd.conf
It's a parent process's responsibility to wait for any child processes; a child remains a zombie from the time it dies until the parent waits for it.
You started a child process, but then you used exec to replace the parent process. The new program doesn't know that it has children, so it never waits for them. The child therefore remains a zombie until the parent process dies, at which point init adopts and reaps it.
Here's an MCVE:
#!/bin/sh
sleep 1 & # This process will become a zombie
exec sleep 30 # Because this executable won't `wait`
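To see the zombie for yourself, you can run the MCVE in the background and list its children (a sketch assuming GNU ps on Linux, with the MCVE saved as mcve.sh):
./mcve.sh & sleep 2                   # give the 1-second sleep time to die
ps -o pid,ppid,stat,comm --ppid "$!"  # the dead child shows STAT "Z" and <defunct>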
You can instead do a double fork:
#!/bin/sh
( # Start a child shell
sleep 1 & # Start a grandchild process
) # Child shell dies, grandchild is given to `init`
exec sleep 30 # This process now has no direct children
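Applied to the snmpd wrapper from the question, the fix is to launch the backgrounded setup commands from a nested subshell, so that the process that execs snmpd has no direct children. A sketch, with the snmpusm/snmpvacm invocations and the LIBRT/LD_PRELOAD setup abbreviated as in the original script:
#!/bin/bash
# Double fork: the inner subshell is backgrounded from an outer subshell
# that exits immediately, so init adopts the worker and later reaps it.
(
  ( sleep 2
    /opt/snmp/bin/snmpusm ...   # snmpusm/snmpvacm commands as in the original
  ) &
)
# LIBRT detection, pid file, and LD_PRELOAD export as in the original ...
exec -a "snmpd" /opt/snmp/sbin/snmpd udp:161,udp6:161 -f -Ln -I -system_mib,ifTable -c /opt/snmp/config/snmpd.conf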
I want to send server logs to a Telegram bot. Here's my supervisor config:
[program:telegram-log-nginx]
process_name=%(program_name)s_%(process_num)02d
command=bash -c 'tail -f /var/log/nginx/error.log | /usr/share/telegram_log.sh nginx'
autostart=true
autorestart=true
numprocs=1
When I stop supervisor
supervisorctl stop telegram-log-nginx:*
the process is still running:
ps aux | grep telegram
www-data 32151 0.0 0.0 21608 3804 ? S 20:53 0:00 /bin/bash /usr/share/telegram_log.sh nginx
Is there a proper way to stop all processes?
telegram_log.sh
#!/bin/bash
CHATID="chat"
KEY="key"
SERVICE=$1
TIME="10"
URL="https://api.telegram.org/bot$KEY/sendMessage"
while IFS= read -r line; do
read -r -d '' TEXT <<- EOM
Service: $SERVICE
$line
EOM
curl -s --max-time $TIME -d "chat_id=$CHATID&disable_web_page_preview=1&text=$TEXT" $URL >/dev/null
done
├─supervisord,1101 /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
│ ├─php,643187 /var/www/web/artisan queue:work
│ ├─php,643188 /var/www/web/artisan queue:work
│ ├─php,643189 /var/www/web/artisan queue:work
├─systemd,640839 --user
│ └─(sd-pam),640841
├─systemd-journal,406
├─systemd-logind,1102
├─systemd-resolve,807
├─systemd-timesyn,684
│ └─{systemd-timesyn},689
├─systemd-udevd,440
├─tail,643203 -f /var/log/nginx/error.log
├─telegram_log.sh,643204 /usr/share/telegram_log.sh nginx
Assuming that you have a new enough version of bash that process substitutions update $!, you can have your parent script store the PIDs of both its direct children and signal them explicitly during shutdown:
#!/usr/bin/env bash
# make our stdin come directly from tail -f; record its PID
exec < <(exec tail -f /var/log/nginx/error.log); tail_pid=$!
# start telegram_log.sh in the background inheriting our stdin; record its PID
/usr/share/telegram_log.sh nginx & telegram_script_pid=$!
# close our stdin to ensure that we don't keep the tail alive -- only
# telegram_log.sh should have a handle on it
exec </dev/null
# define a cleanup function that shuts down both subprocesses
cleanup() { kill "$tail_pid" "$telegram_script_pid"; }
# tell the shell to call the cleanup function when receiving a SIGTERM, or exiting
trap cleanup TERM EXIT
# wait until telegram_log.sh exits and exit with the same status
wait "$telegram_script_pid"
This means your config file might become something more like:
command=bash -c 'exec < <(exec tail -f /var/log/nginx/error.log); tail_pid=$!; /usr/share/telegram_log.sh nginx & telegram_script_pid=$!; exec </dev/null; cleanup() { kill "$tail_pid" "$telegram_script_pid"; }; trap cleanup TERM EXIT; wait "$telegram_script_pid"'
@CharlesDuffy has provided the answer:
bash -c 'tail -f /var/log/nginx/error.log | /usr/share/telegram_log.sh nginx'
should be
bash -c 'exec < <(exec tail -f /var/log/nginx/error.log); exec /usr/share/telegram_log.sh nginx'
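Applied to the supervisor config from the question, that becomes:
[program:telegram-log-nginx]
process_name=%(program_name)s_%(process_num)02d
command=bash -c 'exec < <(exec tail -f /var/log/nginx/error.log); exec /usr/share/telegram_log.sh nginx'
autostart=true
autorestart=true
numprocs=1
With the second exec, bash replaces itself with telegram_log.sh, so supervisord signals the script directly; once the script exits, nothing holds the pipe open and the tail dies on its next write.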
I have many processes run by one program (in this case, Node.js processes). Sometimes I need to run several of them (for example, 10 Node.js processes), which I start with a Makefile. I want some bash command in my Makefile that shuts those 10 processes down when needed, without touching the other running Node.js processes. pkill node would kill every Node.js process, so how can I give these 10 processes some name or variable so that I can kill only them with kill -9 or pkill?
You can store the PIDs of your child processes in a file and use it to kill them later. Example with sleep child processes:
$ cat Makefile
all: start-1 start-2 start-3

start-%:
	sleep 100 & echo "$$!" >> pids.txt

kill:
	kill -9 $$( cat pids.txt ); rm -f pids.txt
$ make
sleep 100 & echo "$!" >> pids.txt
sleep 100 & echo "$!" >> pids.txt
sleep 100 & echo "$!" >> pids.txt
$ ps
PID TTY TIME CMD
30331 ttys000 0:00.49 -bash
49812 ttys000 0:00.00 sleep 100
49814 ttys000 0:00.00 sleep 100
49816 ttys000 0:00.00 sleep 100
$ make kill
kill -9 $( cat pids.txt ); rm -f pids.txt
$ ps
PID TTY TIME CMD
30331 ttys000 0:00.50 -bash
Note: if you use parallel make, you should pay attention to race conditions on pids.txt accesses.
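If you do run make in parallel, one way to serialize the appends is to take a lock around each write. A sketch, assuming flock(1) from util-linux is available:
all: start-1 start-2 start-3

start-%:
	sleep 100 & flock pids.txt.lock -c "echo $$! >> pids.txt"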
You could try killing the processes by their PID (process ID).
For example:
# ps -ax | grep nginx
22546 ? Ss 0:00 nginx: master process /usr/sbin/nginx
22953 pts/2 S+ 0:00 grep nginx
29419 ? Ss 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
29420 ? S 1:59 nginx: worker process
29421 ? S 1:54 nginx: worker process
29422 ? S 1:56 nginx: worker process
29423 ? S 1:49 nginx: worker process
29425 ? S 0:09 nginx: cache manager process
30796 ? S 1:49 nginx: worker process
and then you can kill the processes with:
kill 22546; kill 22953; kill ...
You can also capture just the PIDs with:
# ps -ax | grep nginx | cut -d' ' -f1
22546
24582
29419
29420
29421
29422
29423
29425
30796
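As an aside, cut -d' ' -f1 is fragile because ps right-aligns PIDs with leading spaces; where available, pgrep produces the PIDs directly:
# one PID per line for every process whose name matches nginx
pgrep nginx > PIDs.txt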
Update:
You can write the PIDs to a file and pull them back in make like this. Note that recipe lines must be tab-indented, a shell $ must be doubled as $$ inside a Makefile, and awk and sed stand in for the original cut and array indexing, which were not valid shell; sed -n '5p' picks the fifth PID, as the original ${lines[4]} intended:
pids:
	ps -ax | grep nginx | awk '{ print $$1 }' > PIDs.txt
	PID=$$(sed -n '5p' PIDs.txt); kill $$PID
I've written a small bash script to start a program every 3 seconds. This script is executed on startup and it saves its PID into a pidfile:
#!/bin/bash
echo $$ > /var/run/start_gps-read.pid
while [ true ] ; do
if [ "$1" == "stop" ] ;
then
echo "Stopping GPS read script ..."
sudo pkill -F /var/run/start_gps-read.pid
exit
fi
sudo /home/dh/gps_read.exe /dev/ttyACM0 /home/dh/gps_files/gpsMaus_1.xml
sleep 3
done
The problem is that I can't terminate the shell script by calling start_gps-read.sh stop, which should read the pidfile and stop the initial process (the one from startup).
But when I call stop, the script keeps running:
dh@Raspi_DataHarvest:~$ sudo /etc/init.d/start_gps-read.sh stop
Stopping GPS read script ...
dh@Raspi_DataHarvest:~$ ps aux | grep start
root 488 0.0 0.3 5080 2892 ? Ss 13:30 0:00 /bin/bash /etc/init.d/start_gps-read.sh start
dh 1125 0.0 0.2 4296 2016 pts/0 S+ 13:34 0:00 grep start
Note: The script is always executed as sudo.
Does anyone know how to stop my shell script?
The "stop" check needs to come before you overwrite the pid file, and certainly doesn't need to be inside the loop.
if [ "$1" = stop ]; then
echo "Stopping ..."
sudo pkill -F /var/run/start_gps-read.pid
exit
fi
echo "$$" > /var/run/start_gps-read.pid
while true; do
sudo /home/dh/gps_read.exe ...
sleep 3
done
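Usage would then be, for example:
sudo /etc/init.d/start_gps-read.sh start &   # starts the polling loop and records its PID
sudo /etc/init.d/start_gps-read.sh stop      # signals the recorded PID via pkill -F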
I've written this bash daemon that keeps an eye on a named pipe, logs everything it sees to a file named $LOG_FILE_BASENAME.$DATE, and also creates a filtered version of it in $ACTIONABLE_LOG_FILE:
while true
do
DATE=`date +%Y%m%d`
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
done
pkill -P $$ # This is where it should kill its children
exit 0
When the daemon is running, this is how the process table looks:
/bin/sh the_daemon.sh
\_ cat the_fifo_queue
\_ tee -a log_file.20150807
\_ grep -P -v "regexp" > filtered_log_file
The problem is that when I kill the daemon (SIGTERM), the cat, tee, and grep processes it spawned are not collected by the parent. Instead, they become orphans and keep waiting for input on the named pipe.
Once the FIFO receives some input, they process it as instructed and die.
How can I make the daemon kill its children before dying? Why aren't they dying with pkill -P $$?
You want to set up a signal handler for your script that kills all members of its process group (its children) in case the script itself gets signalled:
#!/bin/bash
function handle_sigterm()
{
pkill -P $$
exit 0
}
trap handle_sigterm SIGTERM
while true
do
DATE=`date +%Y%m%d`
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
done
handle_sigterm
exit 0
Update:
As per pilcrow's comment, replace
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
by
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
wait $!
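The reason for the change: bash runs a trap handler only after the current foreground command has finished, so with the plain pipeline the SIGTERM would sit pending until cat returned on its own. A backgrounded pipeline plus wait is interruptible, so the handler fires immediately. Putting it together, the loop becomes:
while true
do
    DATE=`date +%Y%m%d`
    cat "$NAMED_PIPE" | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
    wait $!
done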
My cron job looks like this:
$ crontab -l
0,15,30,45 * * * * /vas/app/check_cron/cronjob.sh 2>&1 > /vas/app/check_cron/cronjob.log; echo "Exit code: $?" >> /vas/app/check_cron/cronjob.log
$ more /vas/app/check_cron/cronjob.sh
#!/bin/sh
echo "starting script";
/usr/local/bin/rsync -r /vas/app/check_cron/cron1/ /vas/app/check_cron/cron2/
echo "completed running the script";
$ ls -l /usr/local/bin/rsync
-rwxr-xr-x 1 bin bin 411494 Oct 5 2011 /usr/local/bin/rsync
$ ls -l /vas/app/check_cron/cronjob.sh
-rwxr-xr-x 1 vas vas 153 May 14 12:28 /vas/app/check_cron/cronjob.sh
If I run it manually, the script runs fine:
$ /vas/app/check_cron/cronjob.sh 2>&1 > /vas/app/check_cron/cronjob.log; echo "Exit code: $?" >> /vas/app/check_cron/cronjob.log
If run by crontab, cron generates duplicate processes, more than 30 in 24 hours, until I kill them manually:
$ ps -ef | grep cron | grep -v root | grep -v grep
vas 24157 24149 0 14:30:00 ? 0:00 /bin/sh /vas/app/check_cron/cronjob.sh
vas 24149 8579 0 14:30:00 ? 0:00 sh -c /vas/app/check_cron/cronjob.sh 2>&1 > /vas/app/check_cron/cronjob.log; ec
vas 24178 24166 0 14:30:00 ? 0:00 /usr/local/bin/rsync -r /vas/app/check_cron/cron1/ /vas/app/check_cron/cron2/
vas 24166 24157 0 14:30:00 ? 0:01 /usr/local/bin/rsync -r /vas/app/check_cron/cron1/ /vas/app/check_cron/cron2/
Please advise how to make this run properly, so that no stray processes remain on the system and every process stops cleanly.
BR,
Noel
The output you provide seems normal: the first two processes are just /bin/sh running your cron script, and the latter two are the rsync processes.
It might be a permission issue if the crontab user is not the same user you tested with, causing the script to take longer when run from cron. You can add -v, -vv, or even -vvv to the rsync command for more verbose output and then check the cron email after each run.
One method to prevent multiple running instances of scripts is to use lock files of some sort, I find it easy to use mkdir for this purpose.
#!/bin/sh
LOCK="/tmp/$(basename "$0").lock"
# If mkdir fails then the lock already exists
mkdir "$LOCK" > /dev/null 2>&1
[ $? -ne 0 ] && exit 0
# Clean up the lock when the script exits for any reason
trap '{ rmdir "$LOCK" ; exit 0 ; }' EXIT
echo "starting script";
/usr/local/bin/rsync -r /vas/app/check_cron/cron1/ /vas/app/check_cron/cron2/
echo "completed running the script";
Just make sure you have some kind of cleanup when the OS starts, in case it doesn't clean /tmp by itself. The lock might be left behind if the script crashes, is killed, or is still running when the OS reboots.
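An alternative sketch, assuming flock(1) from util-linux is available, avoids the stale-lock problem entirely: the kernel releases the lock when the process exits, crash included, so no boot-time cleanup is needed:
#!/bin/sh
# Re-exec under an exclusive, non-blocking lock; a second instance
# fails to acquire the lock and exits immediately.  The kernel drops
# the lock on any exit, so a crash cannot leave it behind.
if [ "${FLOCKER:-}" != "$0" ]; then
    exec env FLOCKER="$0" flock -en "/tmp/$(basename "$0").lock" "$0" "$@"
fi

echo "starting script"
/usr/local/bin/rsync -r /vas/app/check_cron/cron1/ /vas/app/check_cron/cron2/
echo "completed running the script"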
Why do you worry? Is something not working? From the parent process IDs I can deduce that the shell (PID 24157) forks an rsync (24166), and that rsync forks another rsync (24178). Looks like that's just how rsync operates...
It's certainly not cron starting two rsync processes.
Instead of cron, you might want to have a look at the Fat Controller.
It works similarly to CRON but has various built-in strategies for managing cases where instances of the script you want to run would overlap.
For example, you could specify that the currently running instance is killed and a new one started, or you could allow a grace period in which the currently running instance may finish before it is terminated and a new one started. Alternatively, you can specify to wait indefinitely.
There are more examples and full documentation on the website:
http://fat-controller.sourceforge.net/