How to monitor the gearmand daemon with Monit?

My Monit configuration file for monitoring the gearman server is:
set logfile /var/log/monit.log
check process gearmand with pidfile /var/run/gearmand.pid
start program = "sudo gearmand --pid-file=/var/run/gearmand.pid"
stop program = "sudo kill all gearmand"
if failed port 4730 protocol http then restart
From monit.log:
[EST Nov 26 19:42:39] info : 'gearmand' start: sudo
[EST Nov 26 19:42:39] error : Error: Could not execute sudo
[EST Nov 26 19:43:09] error : 'gearmand' failed to start
But Monit says that the process failed to start. Does anyone know how to make this work? Thanks in advance.

Monit usually runs as root and executes start/stop programs with a minimal environment, so drop the sudo and use absolute paths. The following configuration works:
check process gearman_daemon with pidfile /var/run/gearmand/gearmand.pid
start program = "/bin/bash -c '/usr/sbin/gearmand -d --job-retries 3 --log-file /var/log/gearmand/gearmand.log --pid-file /var/run/gearmand/gearmand.pid --queue-type libsqlite3 --libsqlite3-db /var/tmp/gearman-queue.sqlite3'"
stop program = "/bin/bash -c '/bin/killall gearmand'"

Related

How to detect and restart a Node.js script running via SSH?

I'm running a .js script with Node via SSH on my web host (Bluehost). I have shared hosting, so I just downloaded/unzipped Node, and I run the script in an SSH terminal like so:
> ./node/bin/node ./script.js
The script continuously prints output in an endless loop, but after some time (about an hour) it gets killed by the server.
How do I detect it and restart the script?
I tried to create a cron job that runs restart.sh every minute, hoping to restart the script if the process is not detected:
#!/bin/bash
if pgrep node >/dev/null
then
echo "Process is running." > /home2/xxxx/txt.txt
else
ps aux > /home2/xxxx/txt.txt
fi
but I don't see any Node processes in txt.txt:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
xxx 2 0.0 0.0 113292 2696 ? SN 05:27 0:00 /bin/bash /home2/xxx/restart.sh
xxx 4 0.0 0.0 155460 3988 ? RN 05:27 0:00 ps aux
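As an aside, the else branch in restart.sh above only dumps ps output and never restarts anything; a minimal sketch of the intended restart logic (paths taken from the question; the log file name is hypothetical) would be:
#!/bin/bash
# restart script.js if no matching node process is found
if pgrep -f "node .*script.js" >/dev/null
then
    echo "Process is running." > /home2/xxxx/txt.txt
else
    # nohup + & detach the process so it survives the cron shell exiting
    nohup "$HOME/node/bin/node" "$HOME/script.js" >> "$HOME/script.log" 2>&1 &
fi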
You can use pm2 to monitor your node process.
npm install -g pm2
Start a node process:
pm2 start myscript.js --name "My process"
PM2 site : https://pm2.keymetrics.io/
PM2 will restart your script if needed.
To see the node processes:
pm2 list
But if your script ends after 1 hour, perhaps there is a problem with your code.
You can find output and error logs in ~/.pm2/logs.
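If the host reboots, pm2 itself has to be brought back; pm2 startup does this via an init system but needs root, so on shared hosting a cron-based sketch is one option (the pm2 binary path is an assumption for a global install under the unzipped node):
pm2 save    # snapshot the currently running process list
# then in crontab -e:
@reboot $HOME/node/bin/node $HOME/node/lib/node_modules/pm2/bin/pm2 resurrect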

Why does SIGHUP not work on busybox sh in an Alpine Docker container?

Sending SIGHUP with
kill -HUP <pid>
to a busybox sh process on my native system works as expected and the shell hangs up. However, if I use docker kill to send the signal to a container with
docker kill -s HUP <container>
it doesn't do anything. The Alpine container is still running:
$ CONTAINER=$(docker run -dt alpine:latest)
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 1 second
$ docker kill -s HUP $CONTAINER
4fea4f2dabe0f8a717b0e1272528af1a97050bcec51babbe0ed801e75fb15f1b
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 7 seconds
By the way, with a Debian container (which runs bash) it does work as expected:
$ CONTAINER=$(docker run -dt debian:latest)
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 1 second
$ docker kill -s HUP $CONTAINER
9a4aff456716397527cd87492066230e5088fbbb2a1bb6fc80f04f01b3368986
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Exited (129) 1 second ago
Sending SIGKILL does work, but I'd rather find out why SIGHUP does not.
Update: I'll add another example. Here you can see that busybox sh generally does hang up on SIGHUP successfully:
$ busybox sh -c 'while true; do sleep 10; done' &
[1] 28276
$ PID=$!
$ ps -e | grep busybox
28276 pts/5 00:00:00 busybox
$ kill -HUP $PID
$
[1]+ Hangup busybox sh -c 'while true; do sleep 10; done'
$ ps -e | grep busybox
$
However, the same infinite sleep loop running inside the Docker container doesn't quit. As you can see, the container is still running after SIGHUP and only exits after SIGKILL:
$ CONTAINER=$(docker run -dt alpine:latest busybox sh -c 'while true; do sleep 10; done')
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 14 seconds
$ docker kill -s HUP $CONTAINER
31574ba7c0eb0505b776c459b55ffc8137042e1ce0562a3cf9aac80bfe8f65a0
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Up 28 seconds
$ docker kill -s KILL $CONTAINER
31574ba7c0eb0505b776c459b55ffc8137042e1ce0562a3cf9aac80bfe8f65a0
$ docker ps -a --filter "id=$CONTAINER" --format "{{.Status}}"
Exited (137) 2 seconds ago
$
(I don't have a Docker environment at hand to try this. Just guessing.)
In your case, docker run must be running busybox sh or bash as PID 1.
According to the Docker docs:
Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
As for the difference between busybox sh and bash regarding SIGHUP:
On my system (Debian 9.6, x86_64), the signal masks for busybox/sh and bash are as follows:
busybox/sh:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 82817 0.0 0.0 6952 1904 pts/2 S+ 10:23 0:00 busybox sh
PENDING (0000000000000000):
BLOCKED (0000000000000000):
IGNORED (0000000000284004):
3 QUIT
15 TERM
20 TSTP
22 TTOU
CAUGHT (0000000008000002):
2 INT
28 WINCH
bash:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4871 0.0 0.1 21752 6176 pts/16 Ss 2019 0:00 /usr/local/bin/bash
PENDING (0000000000000000):
BLOCKED (0000000000000000):
IGNORED (0000000000380004):
3 QUIT
20 TSTP
21 TTIN
22 TTOU
CAUGHT (000000004b817efb):
1 HUP
2 INT
4 ILL
5 TRAP
6 ABRT
7 BUS
8 FPE
10 USR1
11 SEGV
12 USR2
13 PIPE
14 ALRM
15 TERM
17 CHLD
24 XCPU
25 XFSZ
26 VTALRM
28 WINCH
31 SYS
As we can see, busybox sh does not catch SIGHUP, so for PID 1 the signal is ignored. Bash does catch SIGHUP, so docker kill can deliver the signal to it, and bash then terminates because, according to its manual, "the shell exits by default upon receipt of a SIGHUP".
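You can inspect these dispositions yourself by reading the signal bitmasks straight from procfs, for example for PID 1 inside the running Alpine container:
docker exec <container> grep -E 'Sig(Pnd|Blk|Ign|Cgt)' /proc/1/status
# SigIgn and SigCgt are hex bitmasks; bit 1 (0x1) is SIGHUP,
# and it is absent from SigCgt for busybox sh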
UPDATE 2020-03-07 #1:
I did a quick test and my previous analysis is basically correct. You can verify it like this:
[STEP 104] # docker run -dt debian busybox sh -c \
'trap exit HUP; while true; do sleep 1; done'
331380090c59018dae4dbc17dd5af9d355260057fdbd2f2ce9fc6548a39df1db
[STEP 105] # docker ps
CONTAINER ID IMAGE COMMAND CREATED
331380090c59 debian "busybox sh -c 'trap…" 11 seconds ago
[STEP 106] # docker kill -s HUP 331380090c59
331380090c59
[STEP 107] # docker ps
CONTAINER ID IMAGE COMMAND CREATED
[STEP 108] #
As I showed earlier, by default busybox sh does not catch SIGHUP, so the signal is ignored. But once busybox sh explicitly traps SIGHUP, the signal is delivered to it.
I also tried SIGKILL, and yes, it always terminates the running container. This is reasonable, since SIGKILL cannot be caught by any process, so the signal is always delivered to the container and kills it.
UPDATE 2020-03-07 #2:
You can also verify it this way (much simpler):
[STEP 110] # docker run -ti alpine
/ # ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 ps
/ # kill -HUP 1 <-- this does not kill it because Linux ignores the signal
/ #
/ # trap 'echo received SIGHUP' HUP
/ # kill -HUP 1
received SIGHUP <-- this indicates it can receive SIGHUP now
/ #
/ # trap exit HUP
/ # kill -HUP 1 <-- this terminates it because the action changed to `exit`
[STEP 111] #
Like the other answer already points out, the docs for docker run contain the following note:
Note: A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. So, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.
This is the reason why SIGHUP doesn't work on busybox sh inside the container. However, if I run busybox sh on my native system, it doesn't run as PID 1, and therefore SIGHUP works.
There are various solutions:
Use --init to specify an init process which should be used as PID 1 (a sketch follows after this list).
You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.
The default init process used is the first docker-init executable found in the system path of the Docker daemon process. This docker-init binary, included in the default installation, is backed by tini.
Trap SIGHUP and call exit yourself.
docker run -dt alpine busybox sh -c 'trap exit HUP ; while true ; do sleep 60 & wait $! ; done'
Use another shell, like bash, which exits on SIGHUP by default, no matter whether it runs as PID 1 or not.
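For the first option, a minimal sketch: with --init, tini runs as PID 1 and busybox sh becomes an ordinary child with the default SIGHUP disposition (terminate), so the signal works again:
docker run -dt --init alpine busybox sh -c 'while true; do sleep 10; done'
docker kill -s HUP <container>   # now stops the container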

Monit script not working to restart service

This is my first post so please be patient with me!
I have tried to create a script that checks whether a service is unreachable (by HTTP status code); Monit should then restart the program (Preview Service). Monit runs as the user "spark".
This is the phantomjs-check.sh code:
#!/bin/bash
# source: /opt/monit/bin/phantomjs-check.sh
url="localhost:9001/preview/phantomjs"
# curl prints the response body followed by the HTTP status code;
# grep keeps the line containing 200
response=$(curl -sL -w "%{http_code}\\n" $url | grep 200)
if [ "$response" = "}200" ]
then
    echo "-= Toimii!!!! =-"    # Finnish: "it works"
    exit 1
else
    echo "-= RiKKi!!!! =-"     # Finnish: "broken"
    exit 0
fi
If I manually kill previewservice and run that script, I get an exit code of 0, which is how it should work.
In Monit I have following conf:
check program phantomjs with path "/opt/monit/bin/phantomjs-check.sh"
if status = 0 then exec "/opt/monit/bin/testi.sh"
I added some logging to it; this is the testi.sh code:
#!/bin/sh
# source: /opt/monit/bin/testi.sh
############# Added this for loggin purposes ############
#########################################################
dt=$(date '+%d/%m/%Y %H:%M:%S');
echo Testi.sh run at $dt >> /tmp/testi.txt
# Original part of the script
sudo bash /opt/previewservice/preview-service.sh start
In the /etc/sudoers file I have this line:
spark ALL=(ALL) NOPASSWD: /opt/previewservice/preview-service.sh
This command works from the CLI and starts/restarts previewservice. I can run the testi.sh script manually as spark ([spark#s-preview-1 bin]$ ./testi.sh) and it works as intended, but even though Monit gets the info that the service is down, it doesn't start it.
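(One quick check, not from the original post: run as root, sudo -l can confirm that the sudoers entry applies to the Monit user.)
sudo -l -U spark
# should list: (ALL) NOPASSWD: /opt/previewservice/preview-service.sh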
$ cat /tmp/testi.txt
Testi.sh run at 05/01/2018 10:30:04
Testi.sh run at 05/01/2018 10:31:04
Testi.sh run at 05/01/2018 10:31:26
$ cat /tmp/previews.txt (this file is created by the preview-service.sh start script, so the script has been run):
File created 05/01/2018 09:26:44
********************************
Preview-service.sh run at 05/01/2018 10:31:26
tail -f -n 1000 /opt/monit/logfile shows the following:
[EET Jan 5 10:29:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:29:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:30:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:30:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:31:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:31:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:32:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:32:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:33:04] info : 'phantomjs' status succeeded
The last "status succeeded" appears when I run the testi.sh script as spark without sudo.
Any tips on what I should try next? I appreciate all the help I can get!
Monit shows the Previewservice status as "Failed", yet run manually as spark it works.
Monit usually runs as the root user. Is that your case? If yes, you probably don't need the sudo part.
As for your script working outside of Monit but not from Monit: Monit has its own PATH environment variable, which is very small. It is recommended to write the full path to your scripts/binaries, as in:
/usr/bin/sudo /bin/bash /opt/previewservice/preview-service.sh start
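Alternatively, set an explicit PATH at the top of the script itself so it no longer depends on Monit's minimal environment (a sketch of testi.sh with that change):
#!/bin/sh
# source: /opt/monit/bin/testi.sh
# explicit PATH, since Monit spawns programs with a minimal environment
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH
/usr/bin/sudo /bin/bash /opt/previewservice/preview-service.sh start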

Two instances of node started on linux

I have a node.js server app which is being started twice for some reason. I have a cron job that runs every minute, checking for a node main.js process and, if none is found, starting it. The cron looks like this:
* * * * * ~/startmain.sh >> startmain.log 2>&1
And the startmain.sh file looks like this:
if ps -ef | grep -v grep | grep "node main.js" > /dev/null
then
echo "`date` Server is running."
else
echo "`date` Server is not running! Starting..."
sudo node main.js > main.log
fi
The log file storing the output of startmain.sh shows this:
Fri Aug 8 19:22:00 UTC 2014 Server is running.
Fri Aug 8 19:23:00 UTC 2014 Server is running.
Fri Aug 8 19:24:00 UTC 2014 Server is not running! Starting...
Fri Aug 8 19:25:00 UTC 2014 Server is running.
Fri Aug 8 19:26:00 UTC 2014 Server is running.
Fri Aug 8 19:27:00 UTC 2014 Server is running.
That is what I expect, but when I look at the processes, it seems that two are running: one under sudo and one without. Check out the top two processes:
$ ps -ef | grep node
root 99240 99232 0 19:24:01 ? 0:01 node main.js
root 99232 5664 0 19:24:01 ? 0:00 sudo node main.js
admin 2777 87580 0 19:37:41 pts/1 0:00 grep node
Indeed, when I look at the application logs, I see startup entries happening in duplicate. To kill these processes, I have to use sudo, even for the process that does not start with sudo. When I kill one of these, the other one dies too.
Any idea why I am kicking off two processes?
First, you are starting your node main.js application with sudo in the script startmain.sh. According to the sudo man page:
When sudo runs a command, it calls fork(2), sets up the execution environment as described above, and calls the execve system call in the child process. The main sudo process waits until the command has completed, then passes the command's exit status to the security policy's close method and exits.
So, in your case the process named sudo node main.js is the sudo command itself, and the process node main.js is the node.js app. You can easily verify this: run ps auxfw and you will see that the sudo node main.js process is the parent of node main.js.
Another way to verify this is to run lsof -p [process id] and see that the txt part for the process sudo node main.js shows /usr/bin/sudo, while the txt part for the process node main.js shows the path to your node binary.
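Using the PIDs from the listing above, the parent/child relationship is easy to see (output abridged):
ps -o pid,ppid,user,args -p 99232,99240
#   PID  PPID USER COMMAND
# 99232  5664 root sudo node main.js
# 99240 99232 root node main.js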
The bottom line is that your node.js app is not actually running twice, so there is nothing to worry about.

Upstart respawning healthy process

I'm having an issue where upstart is respawning a Node.js (v0.8.8) process that is completely healthy. I'm on Ubuntu 11.10. When I run the program from the command line it is completely stable and does not crash. But when I run it with upstart, it gets respawned pretty consistently every few seconds. I'm not sure what is going on, and none of the logs seem to help. In fact, there are no error messages in any of the upstart logs for the job. Below is my upstart script:
#!upstart
description "server.js"
start on (local-filesystems and net-device-up IFACE=eth0)
stop on shutdown
# Automatically respawn
respawn # restart when job dies
respawn limit 99 5 # give up restart after 99 respawns in 5 seconds
script
export HOME="/home/www-data"
exec sudo -u www-data NODE_ENV="production" /usr/local/bin/node /var/www/server/current/server.js >> /var/log/node.log 2>> /var/log/node.error.log
end script
post-start script
echo "server-2 has started!"
end script
The strange thing is that server-1 works perfectly fine and is set up the same way.
syslog messages look like this:
Sep 24 15:40:28 domU-xx-xx-xx-xx-xx-xx kernel: [5272182.027977] init: server-2 main process (3638) terminated with status 1
Sep 24 15:40:35 domU-xx-xx-xx-xx-xx-xx kernel: [5272189.039308] init: server-2 main process (3647) terminated with status 1
Sep 24 15:40:42 domU-xx-xx-xx-xx-xx-xx kernel: [5272196.050805] init: server-2 main process (3656) terminated with status 1
Sep 24 15:40:49 domU-xx-xx-xx-xx-xx-xx kernel: [5272203.064022] init: server-2 main process (3665) terminated with status 1
Any help would be appreciated. Thanks.
OK, it seems that it was actually Monit that was restarting it. The problem has been solved. Thanks.
