Monit script not working to restart service - linux

This is my first post so please be patient with me!
I am trying to create a script that checks whether a service is unreachable (HTTP error code); if it is, Monit should restart the program (Preview Service). Monit runs as user "spark".
This is the phantomjs-check.sh code:
#!/bin/bash
# source: /opt/monit/bin/phantomjs-check.sh
url="localhost:9001/preview/phantomjs"
response=$(curl -sL -w "%{http_code}\\n" $url | grep 200)
if [ "$response" = "}200" ]
then
echo "-= Toimii!!!! =-"
exit 1
else
echo "-= RiKKi!!!! =-"
exit 0
fi
[root@s-preview-1 bin]#
If I manually kill previewservice and run that script, I get an exit code of 0, which is how it should work.
In Monit I have the following configuration:
check program phantomjs with path "/opt/monit/bin/phantomjs-check.sh"
if status = 0 then exec "/opt/monit/bin/testi.sh"
I have added some logging to it; this is the testi.sh code:
#!/bin/sh
# source: /opt/monit/bin/testi.sh
############# Added this for logging purposes ############
#########################################################
dt=$(date '+%d/%m/%Y %H:%M:%S');
echo Testi.sh run at $dt >> /tmp/testi.txt
# Original part of the script
sudo bash /opt/previewservice/preview-service.sh start
In the /etc/sudoers file I have this line:
spark ALL=(ALL) NOPASSWD: /opt/previewservice/preview-service.sh
This command works from the CLI and starts/restarts previewservice. I can run the "testi.sh" script manually as spark ([spark@s-preview-1 bin]$ ./testi.sh) and it works as intended, but even though Monit detects that the service is down, the service doesn't start.
$ cat /tmp/testi.txt
Testi.sh run at 05/01/2018 10:30:04
Testi.sh run at 05/01/2018 10:31:04
Testi.sh run at 05/01/2018 10:31:26
$ cat /tmp/previews.txt (The last line was created by the preview-service.sh start script, so it has been run.)
File created 05/01/2018 09:26:44
********************************
Preview-service.sh run at 05/01/2018 10:31:26
tail -f -n 1000 /opt/monit/logfile shows the following:
[EET Jan 5 10:29:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:29:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:30:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:30:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:31:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:31:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:32:04] error : 'phantomjs' '/opt/monit/bin/phantomjs-check.sh' failed with exit status (0) -- -= RiKKi!!!! =-
[EET Jan 5 10:32:04] info : 'phantomjs' exec: /opt/monit/bin/testi.sh
[EET Jan 5 10:33:04] info : 'phantomjs' status succeeded
The final "status succeeded" entry appears when I run testi.sh manually as spark, without sudo.
Any tips on what I should try next? I appreciate all the help I can get!
Monit Previewservice status - Failed
Manually as spark it works.

Monit usually runs as the root user. Is that the case here? If so, you probably don't need the sudo part.
As for your script working outside of Monit but not from Monit: Monit runs programs with its own, very small PATH environment variable. It is recommended to write the full path to your scripts/binaries, for example:
/usr/bin/sudo /bin/bash /opt/previewservice/preview-service.sh start
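One way to reproduce this kind of failure outside Monit is to run the script with a stripped-down environment. The sketch below is only an approximation: run_like_monit is a hypothetical helper name, and the PATH value is an assumption based on the short default PATH that Monit is documented to use, so check it against your own Monit version.

```shell
#!/bin/sh
# run_like_monit CMD [ARGS...]
# Hypothetical helper: runs CMD with an emptied environment and a short PATH,
# roughly imitating how a monitoring daemon launches programs.
# The PATH value is an assumption; adjust it to match your Monit setup.
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}

# Example (script path taken from the question):
#   run_like_monit /bin/sh /opt/monit/bin/testi.sh
```

If the script works in a normal login shell but fails when started this way, a missing PATH entry or environment variable is a likely cause.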

Related

Trouble running a custom monit check, stderr maybe?

Trying to run this check from monit, but it doesn't work. The gravity program sends its output to stderr. Could it be that monit doesn't handle this properly because of the way it exec's the check?
contents of system.monitrc:
check program gravityStatus with path /usr/local/bin/check.sh
with timeout 10 seconds
if status !=0 then alert
check.sh:
root@tiki:~# cat /usr/local/bin/check.sh
#!/bin/bash
#This will return zero if all good
/usr/bin/gravity status |& /usr/bin/jq .SyncInfo.catching_up | grep -q 'false'
Output:
Program 'gravityStatus'
status Status failed
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 1
last output parse error: Invalid numeric literal at line 1, column 6
data collected Tue, 01 Feb 2022 19:52:37
If I execute the contents of check.sh on the command line, the script works:
root@tiki:~# /usr/bin/gravity status |& /usr/bin/jq .SyncInfo.catching_up | grep -q 'false'
root@tiki:~# echo $?
0
I figured it out. I want to thank @boppy for his comment; it was very helpful. Here's what I did:
I changed check.sh to just run 'gravity status' and then looked at the Monit status. It says this:
Program 'gravityStatus'
status Status failed
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 2
last output panic: $HOME is not defined
goroutine 1 [running]:
github.com/cosmos/cosmos-sdk/simapp.init.0()
/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.44.5/simapp/app.go:182 +0x189
The problem was that gravity status was panicking before it could produce its JSON output, so jq received the panic message instead. gravity is a command that has to look at $HOME/.gravity, where a bunch of configs are located. So the solution was to set $HOME to /root, which is where all the gravity stuff is set up.
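A minimal sketch of how setting $HOME could look inside a check script. The run_with_home helper is a hypothetical name introduced here for illustration; the gravity and jq paths in the usage comment are the ones from the question.

```shell
#!/bin/sh
# run_with_home DIR CMD [ARGS...]
# Hypothetical helper: runs CMD with HOME set to DIR, inside a subshell so
# the caller's own HOME is left untouched.
run_with_home() {
    (
        HOME=$1
        export HOME
        shift
        exec "$@"
    )
}

# In check.sh this could be used as (paths taken from the question):
#   run_with_home /root /usr/bin/gravity status 2>&1 \
#       | /usr/bin/jq .SyncInfo.catching_up | grep -q 'false'
```

Simply putting export HOME=/root at the top of check.sh should have the same effect; the helper only keeps the change local to one command.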

[: : integer expression expected , by systemctl status of my service that is calling a bash script in ubuntu 18.04

I do not have much experience with bash scripts, but I got the idea from the internet.
My bash script uses xprintidle to shut down the computer after it has been idle for some time.
I can run the script in a terminal without any problem.
But when /etc/systemd/system/poweroff.service calls the script, systemctl status shows this error:
Jul 30 16:43:40 godo systemd[1]: Started autopoweroff.
Jul 30 16:43:42 godo bash[3107]: couldn't open display
Jul 30 16:43:42 godo bash[3107]: /usr/local/bin/poweroff.sh: line 5: [: : integer expression expected
Jul 30 16:43:42 godo bash[3107]: end
Here is the script:
#!/bin/bash
sleep 2
myidle=$(xprintidle)
myidletime=$((10000))
while [ "$myidle" -le "$myidletime" ]; do
echo $myidle
sleep 1
myidle=$(xprintidle)
done
#sudo shutdown -P now
#shutdown -P 5
echo "end"
And here is the service:
[Unit]
Description=autopoweroff
[Service]
ExecStart=/bin/bash /usr/local/bin/poweroff.sh
[Install]
WantedBy=multi-user.target
I hope you can help me and I do not waste your time with these beginner questions.
Thanks
When xprintidle does not have a display, it prints "couldn't open display" to stderr and nothing to stdout, so myidle ends up empty; you are then trying to compare this invalid value as an integer using "-le".
Since xprintidle returns exit code 1 when it does not have a display, you can use
set -e
at the start of your script to exit when an error occurs.
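As an alternative to exiting on any error, the script can check that the value really is an integer before comparing it. This is a minimal sketch; is_uint is a hypothetical helper name, not something provided by xprintidle or bash.

```shell
#!/bin/sh
# is_uint VALUE: succeeds only if VALUE is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use in the loop from the question:
#   myidle=$(xprintidle 2>/dev/null)
#   is_uint "$myidle" || { echo "no idle time available" >&2; exit 1; }
```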
xprintidle - utility printing user's idle time in X
When your script runs in the systemd context, it has no X server, so xprintidle fails and outputs "couldn't open display" to stderr.
Your statement myidle=$(xprintidle) therefore leaves myidle empty.
At this point you have to decide what you want to do when the X environment is not available.
One possibility is to give myidle a default value of 0:
typeset -i myidle # Tells Bash it is an int and default to 0 if not assigned a value
myidle=$(xprintidle 2>/dev/null) || true # no error state generated
I think you need another way to get the idle value of the currently running X session.
Here it is:
#!/bin/dash
sleep 2
# get the X DISPLAY of the first logged-in user with a X session
DISPLAY="$(
w --short --no-header \
| awk '{ if( match($3, ":") ) { print $3; exit; } }'
)"
export DISPLAY
myidletime=$((10000))
while myidle=$(xprintidle 2>/dev/null) && [ "$myidle" -le $myidletime ]; do
echo "$myidle"
sleep 1
done
#sudo shutdown -P now
#shutdown -P 5
echo "end"
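The awk filter in that script can be tried on its own with made-up input. In this sketch, first_x_display is a hypothetical wrapper name and the two sample lines are invented for illustration; real w output may look different on your system, and any FROM value containing a colon (for example an IPv6 address) would also match.

```shell
#!/bin/sh
# first_x_display: reads `w --short --no-header` style lines on stdin and
# prints the third field (FROM) of the first line where it contains a colon.
first_x_display() {
    awk '{ if (match($3, ":")) { print $3; exit } }'
}

# Sample input is made up for illustration.
printf '%s\n' \
    'alice tty1 - 3:12 -bash' \
    'bob tty7 :0 1:05 xfce4-session' \
    | first_x_display
# prints :0
```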

Script doesn't start manually (or on boot) (init.d)

I'm running TinkerOS, which is a Debian-based distribution. But for some reason the cwhservice that works on Raspbian (also Debian-based) doesn't run on TinkerOS.
The script is placed in /etc/init.d/ and is called cwhservice; systemctl daemon-reload has been run, and the code is as follows:
#!/bin/sh
### BEGIN INIT INFO
# Provides: CWH
# Required-Start: $all
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Starts the CWH
# Description: Starts the CWH
### END INIT INFO
case "$1" in
start)
/opt/cwh/start.sh > /opt/cwh/log.scrout 2> /opt/cwh/log.screrr
;;
stop)
/opt/cwh/stop.sh
;;
restart)
/opt/cwh/stop.sh
/opt/cwh/start.sh
;;
*)
echo "Usage: $0 {start|stop|restart}"
esac
exit 0
When I run sudo service cwhservice start, I get the following error:
Job for cwhservice.service failed because the control process exited with error code.
See "systemctl status cwhservice.service" and "journalctl -xe" for details.
systemctl status cwhservice.service gives:
● cwhservice.service - LSB: Starts the CWH
Loaded: loaded (/etc/init.d/cwhservice; generated; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2017-08-24 13:36:22 UTC; 1min 21s ago
Docs: man:systemd-sysv-generator(8)
Process: 15431 ExecStart=/etc/init.d/cwhservice start (code=exited, status=203/EXEC)
Aug 24 13:36:22 linaro-alip systemd[1]: Failed to start LSB: Starts the CWH.
Aug 24 13:36:22 linaro-alip systemd[1]: cwhservice.service: Failed with result 'exit-code'.
After fiddling with all the code and values I still couldn't get it to work, so I tried to adapt the reboot script, which currently looks like this:
#! /bin/sh
### BEGIN INIT INFO
# Provides: kaas2
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop: 6
# Short-Description: Execute the reboot command.
# Description:
### END INIT INFO
case "$1" in
start)
# No-op
/opt/cwh/start.sh
echo "foo" >&2
;;
restart|reload|force-reload)
echo "Error: argument '$1' not supported" >&2
exit 3
;;
stop)
;;
status)
exit 0
;;
*)
echo "Usage: $0 start|stop" >&2
exit 3
;;
esac
sudo service cwhservice start doesn't return an error but just does nothing. But for some strange reason sudo service cwhservice restart actually starts the start.sh script but doesn't print the echo... So I'm totally lost at this point and have wasted 2 days...
Any ideas on how to create a daemon that starts on boot and runs the start.sh script on Debian?

Unable to start Node on system reboot Ubuntu Crontab

I tried adding the forever start command to /etc/rc.local, but it didn't work.
When I use the @reboot keyword in /etc/rc.local, it says @reboot cannot be found.
So I went back to using crontab. Here is my crontab script. All other crontab entries work except the reboot one. In syslog, it says
Jun 4 09:51:12 ip-172-31-28-35 /usr/sbin/irqbalance: Balancing is ineffective on systems with a single cache domain. Shutting down
Jun 4 09:51:12 ip-172-31-28-35 cron[959]: (CRON) STARTUP (fork ok)
Jun 4 09:51:12 ip-172-31-28-35 cron[959]: (CRON) INFO (Running @reboot jobs)
Jun 4 09:51:12 ip-172-31-28-35 CRON[1005]: (ubuntu) CMD (/usr/bin/sudo -u ubuntu /usr/local/bin/forever start home/ubuntu/chat2/index.js)
This shows that the @reboot command in my crontab is being run, but for some reason forever is still not starting node. After reboot, I run forever list and it says No forever processes running
I am assuming the problem is somehow with the node and forever paths. I am new to this and don't know which exact path to use for this statement in crontab.
I have also tried the following:
@reboot /usr/local/bin/forever start -c /usr/local/bin/node /home/ubuntu/chat2/index.js
and
@reboot /usr/local/bin/forever start /home/ubuntu/chat2/index.js
None of these are working.
If I run which forever it says
/usr/local/bin/forever
If I run which node it says
/usr/local/bin/node
If I get the full path of my index.js app file, by doing readlink -f index.js in my chat2 directory it says
/home/ubuntu/chat2/index.js
I just want to run this command every time my system reboots, so that my node app starts. The following line works perfectly when I cd to the chat2 directory manually. I want this to work on reboot itself.
forever -m5000 -w start index.js
You can create a service for your code instead of using cron. I prefer that because you can stop or start it whenever you want, and it can also run when the system boots or reboots.
So:
1- Create a service in /etc/init.d/name_of_file
#!/bin/bash
#/etc/init.d/name_of_file
### BEGIN INIT INFO
# Provides: name
# Required-Start: $syslog
# Required-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: add service
# Description:
#
### END INIT INFO
# Some things that run always
case "$1" in
start)
echo "Starting app_name "
touch /var/lock/app_name
cd /where/is/your/file
node index.js &
;;
stop)
echo " Stopping "
rm /var/lock/app_name
sudo pkill -f node
;;
status)
if [ -e /var/lock/app_name ]
then
echo "app_name is running"
else
echo "app_name is not running"
fi
;;
*)
echo "Usage: service app_name {start|stop|status}"
exit 1
;;
esac
exit 0
With that, you have created a service for running your Node.js application.
You have to make the script executable:
chmod +x /etc/init.d/app_name
Now the only thing you have to do is configure this to run on boot.
Run:
sudo update-rc.d app_name defaults
After that, the service will start by itself every time you reboot your computer.
I suggest redirecting stdout/stderr to a file to debug why your command does not work from crontab:
/usr/local/bin/forever start -c /usr/local/bin/node /home/ubuntu/chat2/index.js >/tmp/forever.log 2>&1 &
Check the log file for details after reboot.
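The same idea can be wrapped in a small function so that every run leaves a timestamp and its output in the log. This is a sketch only; log_run is a hypothetical helper name, and the log path in the usage comment is the one suggested above.

```shell
#!/bin/sh
# log_run LOGFILE CMD [ARGS...]
# Hypothetical helper: appends a timestamp line to LOGFILE, then runs CMD
# with stdout and stderr appended to the same file. Returns CMD's status.
log_run() {
    log_file=$1
    shift
    echo "run at $(date '+%Y-%m-%d %H:%M:%S'): $*" >> "$log_file"
    "$@" >> "$log_file" 2>&1
}

# Possible use in a script started from crontab (paths from the question):
#   log_run /tmp/forever.log /usr/local/bin/forever start \
#       -c /usr/local/bin/node /home/ubuntu/chat2/index.js
```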
You can also try pm2, which is like forever but has built-in support for generating a system startup script, so it will launch your apps after reboot.

how to monitor gearmand daemon by Monit?

So the configuration file for monitoring gearman server is:
set logfile /var/log/monit.log
check process gearmand with pidfile /var/run/gearmand.pid
start program = "sudo gearmand --pid-file=/var/run/gearmand.pid"
stop program = "sudo kill all gearmand"
if failed port 4730 protocol http then restart
from monit.log
[EST Nov 26 19:42:39] info : 'gearmand' start: sudo
[EST Nov 26 19:42:39] error : Error: Could not execute sudo
[EST Nov 26 19:43:09] error : 'gearmand' failed to start
Monit says that the process failed to start. Does anyone know how to make it work? Thanks in advance.
check process gearman_daemon with pidfile /var/run/gearmand/gearmand.pid
start program = "/bin/bash -c '/usr/sbin/gearmand -d --job-retries 3 --log-file /var/log/gearmand/gearmand.log --pid-file /var/run/gearmand/gearmand.pid --queue-type libsqlite3 --libsqlite3-db /var/tmp/gearman-queue.sqlite3'"
stop program = "/bin/bash -c '/bin/killall gearmand'"
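The "Could not execute sudo" line in the log is consistent with Monit being given a bare command name rather than a full path, which is why the configuration above spells out every binary. A small sketch for looking up the absolute path to put in the start and stop lines; abs_path is a hypothetical helper name, and the result depends on the PATH of the shell where you run it.

```shell
#!/bin/sh
# abs_path NAME: prints the absolute path of an executable found in PATH,
# or fails if NAME does not resolve to a file path.
abs_path() {
    resolved=$(command -v "$1") || return 1
    case $resolved in
        /*) printf '%s\n' "$resolved" ;;
        *)  return 1 ;;   # builtin, alias or function: no file path
    esac
}

# Example: abs_path gearmand
# (prints the full path of gearmand if it is installed and in PATH)
```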
