Systemd "OnFailure=" not starting when binary or bash exits with an error code

I have a systemd unit that needs to be monitored and restarted if it crashes, and I also need a handler to run whenever the unit fails. I'm working on an embedded system, so this needs to be robust.
In my case we have a systemd service:
[Unit]
Description=Demo unit
Wants=multi-user.target
OnFailure=FailHandler@%N.service
[Service]
ExecStart=/bin/bash /home/root/demo.sh
Restart=on-failure
RestartSec=1
Type=simple
The bash script it starts:
echo "Started demo.sh"
current_date=`date`
sleep 10s
echo "${current_date} Demo was here" >> /home/root/demo.txt
exit 1
So far so good. The script always exits with status 1 after 10 seconds and logs the time. The problem is that FailHandler is never called in that case. This is just a demo; all of the real applications are in C++, but the behavior is the same. If I deliberately set a wrong path to the bash file, the unit fails and the "OnFailure=" part does start. Here's the syslog output with the correct path:
2021-09-03T13:06:31.575094+00:00 hostname bash[1125]: Started demo.sh
2021-09-03T13:06:41.629450+00:00 hostname systemd[1]: demo.service: Main process exited, code=exited, status=1/FAILURE
2021-09-03T13:06:41.644681+00:00 hostname systemd[1]: demo.service: Failed with result 'exit-code'.
2021-09-03T13:06:41.818089+00:00 hostname systemd[1]: demo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:06:41.824005+00:00 hostname systemd[1]: demo.service: Scheduled restart job, restart counter is at 1.
2021-09-03T13:06:41.850933+00:00 hostname bash[1179]: Started demo.sh
2021-09-03T13:06:51.870376+00:00 hostname systemd[1]: demo.service: Main process exited, code=exited, status=1/FAILURE
2021-09-03T13:06:51.872611+00:00 hostname systemd[1]: demo.service: Failed with result 'exit-code'.
2021-09-03T13:06:52.117479+00:00 hostname systemd[1]: demo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:06:52.136102+00:00 hostname systemd[1]: demo.service: Scheduled restart job, restart counter is at 2.
2021-09-03T13:06:52.163865+00:00 hostname bash[1221]: Started demo.sh
Here's the output when the path is incorrect:
2021-09-03T13:07:46.582269+00:00 hostnaem bash[1446]: /bin/bash: /ahome/root/daemo.sh: No such file or directory
2021-09-03T13:07:46.588715+00:00 hostnaem systemd[1]: daemo.service: Main process exited, code=exited, status=127/n/a
2021-09-03T13:07:46.590356+00:00 hostnaem systemd[1]: daemo.service: Failed with result 'exit-code'.
2021-09-03T13:07:46.694616+00:00 hostnaem systemd[1]: daemo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:07:46.701519+00:00 hostnaem systemd[1]: daemo.service: Scheduled restart job, restart counter is at 1.
2021-09-03T13:07:46.720879+00:00 hostnaem systemd[1]: daemo.service: Start request repeated too quickly.
2021-09-03T13:07:46.721405+00:00 hostnaem systemd[1]: daemo.service: Failed with result 'exit-code'.
2021-09-03T13:07:46.722723+00:00 hostnaem systemd[1]: daemo.service: Triggering OnFailure= dependencies.
2021-09-03T13:07:46.804815+00:00 hostnaem FailHandler.sh[1457]: Failed application: daemo
2021-09-03T13:07:46.822342+00:00 hostnaem bash[1457]: error: cannot stat /etc/logrotate.d/daemo: No such file or directory
2021-09-03T13:07:46.841577+00:00 hostnaem FailHandler.sh[1457]: ERROR: Failed logrotate for daemo crash
2021-09-03T13:07:46.977003+00:00 hostnaem systemd[1]: FailHandler@daemo.service: Succeeded.
I understand from the syslog that FailHandler starts once the number of restarts reaches StartLimitBurst=1 within 100ms, but is there a way to start it every time the application exits with an error code?

Thank you man. I took one look at the link you sent and it landed. The solution in my case was:
ExecStopPost=/bin/bash -c 'if [ "$$EXIT_STATUS" != 0 ]; then systemctl start FailHandler@%N.service; fi'
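For anything longer than a one-liner, the same check could live in a small script called from ExecStopPost=. The following is a minimal sketch, not the poster's actual handler: the function name should_run_fail_handler is hypothetical, and it assumes a systemd version that exports SERVICE_RESULT, EXIT_CODE and EXIT_STATUS to ExecStopPost= commands (see systemd.exec(5)).

```shell
#!/bin/bash
# Sketch of an ExecStopPost= helper; the function name is hypothetical.
# systemd exports SERVICE_RESULT, EXIT_CODE and EXIT_STATUS to ExecStopPost=.

# Succeeds (returns 0) when the service ended abnormally.
should_run_fail_handler() {
    # Prefer systemd's own verdict when it is available.
    if [ -n "${SERVICE_RESULT:-}" ]; then
        [ "$SERVICE_RESULT" != "success" ]
        return $?
    fi
    # Otherwise fall back to the raw status, as in the one-liner above.
    [ -n "${EXIT_STATUS:-}" ] && [ "$EXIT_STATUS" != "0" ]
}

# Possible wiring; the handler unit name comes from the question:
# should_run_fail_handler && systemctl start "FailHandler@$1.service"
```

Checking SERVICE_RESULT first is a design choice: after a plain systemctl stop, EXIT_STATUS usually holds a signal name such as TERM, so a bare comparison against 0 would also fire the handler on a deliberate stop, whereas SERVICE_RESULT is normally "success" in that case.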

Error installing docker on Arch Linux: error initializing graphdriver: loopback attach failed

So I stupidly got a laptop with the latest and greatest hardware, so I had to install Arch Linux (kernel 5.19.13-arch1-1) instead of Debian. I'm only vaguely familiar with Linux; my fiancé has been helping, but this has him stumped too, so here we are.
I followed the wiki instructions for installing docker, but consistently get the following error when attempting to start the service:
sudo journalctl --since "5 minutes ago" -xeu docker.service
Oct 26 14:37:34 werk systemd[1]: docker.service: Start request repeated too quickly.
Oct 26 14:37:34 werk systemd[1]: docker.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit docker.service has entered the 'failed' state with result 'exit-code'.
Oct 26 14:37:34 werk systemd[1]: Failed to start Docker Application Container Engine.
░░ Subject: A start job for unit docker.service has failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit docker.service has finished with a failure.
░░
░░ The job identifier is 10988 and the job result is failed.
lines 334-369/369 (END)
sudo journalctl --since "5 minutes ago" -ru docker.service
Oct 26 14:37:34 werk systemd[1]: Failed to start Docker Application Container Engine.
Oct 26 14:37:34 werk systemd[1]: docker.service: Failed with result 'exit-code'.
Oct 26 14:37:34 werk systemd[1]: docker.service: Start request repeated too quickly.
Oct 26 14:37:34 werk systemd[1]: Stopped Docker Application Container Engine.
Oct 26 14:37:34 werk systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Oct 26 14:37:34 werk systemd[1]: Failed to start Docker Application Container Engine.
Oct 26 14:37:34 werk systemd[1]: docker.service: Failed with result 'exit-code'.
Oct 26 14:37:34 werk systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Oct 26 14:37:34 werk dockerd[71427]: failed to start daemon: error initializing graphdriver: loopback attach failed
I don't really understand loopback devices, but the result of losetup -f seemed sus:
losetup -f
losetup: cannot find an unused loop device: Permission denied
[me@werk ~]$ sudo losetup -f
losetup: cannot find an unused loop device: No such device
We also theorized that the docker user didn't have sufficient permissions to do what it needed to do, but could not find any user specified in the docker.service file (located at /etc/systemd/system/multi-user.target.wants/docker.service). We tried to run the ExecStart command line specified in the docker.service file as root, but got the following error:
[me@werk ~]$ sudo /usr/bin/dockerd -H fd://
[sudo] password for me:
INFO[2022-10-26T14:56:00.483059782-06:00] Starting up
failed to load listeners: no sockets found via socket activation: make sure the service was started by systemd
So that was a bit of a bust.
To be extra clear, installation was as follows:
pacman -Syu docker
systemctl enable docker.service
systemctl start docker.service
And in case it matters, I'm on a Framework laptop with 12th gen Intel core processors.
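One thing worth ruling out, given that the install used pacman -Syu: a full upgrade can replace the kernel package, and until the next reboot the running kernel may no longer find its modules on disk, in which case the loop driver cannot be loaded and losetup reports "No such device". This is only a guess at the cause, not a confirmed diagnosis. The sketch below checks for that situation; the function name kernel_modules_present is made up for illustration, and the default path assumes Arch's /usr/lib/modules layout.

```shell
# Succeeds when a module directory exists for the given kernel release.
# Both arguments are optional: the release defaults to the running kernel,
# the modules root defaults to /usr/lib/modules (an assumption for Arch).
kernel_modules_present() {
    local release="${1:-$(uname -r)}"
    local modules_root="${2:-/usr/lib/modules}"
    [ -d "$modules_root/$release" ]
}

# Example:
# kernel_modules_present || echo "no modules for $(uname -r); a reboot may be needed"
```

If the check fails, rebooting into the newly installed kernel and then retrying sudo losetup -f would be the next thing to try.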

Systemd restarts my process which is not dead

I have the following systemd drop-in at /etc/systemd/system/getty@tty1.service.d/override.conf:
[Service]
ExecStart=
ExecStart=-/home/auto/script.sh
Type=simple
StandardInput=tty
StandardOutput=tty
The point is that the user turns on the computer and can manage a few things on it without needing to log in.
Systemd starts the script and it works fine. But after a few minutes systemd restarts "script.sh" for no apparent reason. I think the problem is that "script.sh" starts some child processes and systemd does not like that.
After a restart I can find these lines in syslog:
Sep 25 12:33:32 hostname systemd[1]: getty@tty1.service: Service has no hold-off time, scheduling restart.
Sep 25 12:33:32 hostname systemd[1]: getty@tty1.service: Scheduled restart job, restart counter is at 1.
Sep 25 12:33:32 hostname systemd[1]: Stopped Getty on tty1.
Sep 25 12:33:32 hostname systemd[1]: getty@tty1.service: Found left-over process 1711 (docker) in control group while starting unit. Ignoring.
Sep 25 12:33:32 hostname systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
I tried a lot of things, like Type=forking or RestartSec=86400s, but systemd still restarts script.sh.
Any idea?
Best regards,

NODE APP: Systemd startup script not working?

I'm trying to create a startup script for my Node.js app, which runs on port 3000.
Issue: the node-app.service unit is not working, and I think it's because the ExecStart path is wrong. When I go to the server IP in Chrome, nothing shows.
The node app was created with the npm generator, and I normally use npm start to start the app. I've added the path to bin/www here:
[Unit]
Description=tweetMonster twtiter server - making your environment variables rad
Documentation=https://example.com
After=network.target
[Service]
Environment=NODE_PORT=3000
Type=simple
User=ubuntu
ExecStart=/home/ubuntu/twitter-server/bin/www.js
Restart=on-failure
[Install]
WantedBy=multi-user.target
My app is running on Ubuntu 18 at /home/ubuntu/twitter-server. If I run ls there:
/twitter-server$ ls
app.js node_modules package.json routes
bin package-lock.json public views
Please help!
Error output from the terminal:
Nov 01 05:40:25 ip-172-31-22-207 systemd[1]: node-app.service: Main process exited, code=exited, status=203/EXEC
Nov 01 05:40:25 ip-172-31-22-207 systemd[1]: node-app.service: Failed with result 'exit-code'.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: node-app.service: Service hold-off time over, scheduling restart.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: node-app.service: Scheduled restart job, restart counter is at 5.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: Stopped hello_env.js - making your environment variables rad.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: node-app.service: Start request repeated too quickly.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: node-app.service: Failed with result 'exit-code'.
Nov 01 05:40:26 ip-172-31-22-207 systemd[1]: Failed to start hello_env.js - making your environment variables rad.
Nov 01 06:20:11 ip-172-31-22-207 systemd[1]: /etc/systemd/system/node-app.service:10: Executable path is not absolute: "node /home/ubuntu/twitter-server/bin/www.js"
Nov 01 06:24:35 ip-172-31-22-207 systemd[1]: /etc/systemd/system/node-app.service:10: Executable path is not absolute: "node /home/ubuntu/twitter-server/bin/www.js"
root@ip-172-31-22-207:/etc/systemd/system#
There are two issues with your service config.
First, wrap the value of Environment in double quotes (strictly, quoting is only required when the value contains spaces, but it is harmless here):
Environment="NODE_PORT=3000"
Second, you need to use node to run the ExecStart script; /home/ubuntu/twitter-server/bin/www.js is not a command by itself. Use:
ExecStart=/bin/bash -c '$$(which node) /home/ubuntu/twitter-server/bin/www.js'
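Status 203/EXEC in the log means systemd could not execute the ExecStart= target at all. For a script, the usual suspects are a wrong path, a missing execute bit, or a missing interpreter line. The sketch below checks those three by hand; it is illustrative only, and the function name check_script_target is invented for this example.

```shell
# check_script_target PATH: report why a script might fail with 203/EXEC.
# Intended for scripts only; compiled binaries have no '#!' line.
check_script_target() {
    local target="$1"
    if [ ! -e "$target" ]; then
        echo "missing: $target"
        return 1
    fi
    if [ ! -x "$target" ]; then
        echo "not executable: $target"
        return 1
    fi
    # A directly executed script needs an interpreter line; read two bytes.
    if [ "$(head -c 2 "$target")" != "#!" ]; then
        echo "no shebang: $target"
        return 1
    fi
    echo "ok: $target"
}

# Example, using the path from the question:
# check_script_target /home/ubuntu/twitter-server/bin/www.js
```

Any result other than "ok" means the file cannot be started directly and needs either fixing or an explicit interpreter in ExecStart=, as in the answer above.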
I recommend the service-systemd package for a simple program.
Sometimes you just want an "old-style" daemon for simple services.
Sometimes you have to deploy on small devices (like a Raspberry Pi) where you can't use Docker and everything that comes with it.

Changing systemd.service TimeoutSec value to “infinity” has no effect

The [Service] section of my app.service file is the following:
[Service]
Type=forking
Restart=no
IgnoreSIGPIPE=no
GuessMainPID=no
ExecStart=/opt/app/appl_init.d start
ExecStop=/opt/app/appl_init.d stop
TimeoutSec=infinity
After that I installed the app, and the file was correctly copied to /usr/lib/systemd/system/app.service.
I have run systemctl daemon-reload, but it seems to have no effect on the start-up timeout! It fails as soon as I run systemctl start app or systemctl reload app.service, with the following error:
Job for app.service failed because a fatal signal was delivered to the control process. See "systemctl status app.service" and "journalctl -xe" for details
Output of systemctl status app is:-
● app.service - ApplicationTest
Loaded: loaded (/opt/app/appl_init.d; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Tue 2017-03-21 01:55:22 EDT; 1min 4s ago
Docs: man:app(8)
Process: 4126 ExecStart=/opt/app/appl_init.d start (code=killed, signal=KILL)
Mar 21 01:55:22 centosvm systemd[1]: Starting ApplicationTest...
Mar 21 01:55:22 centosvm systemd[1]: app.service start operation timed out. Terminating.
Mar 21 01:55:22 centosvm systemd[1]: app.service stop-final-sigterm timed out. Killing.
Mar 21 01:55:22 centosvm systemd[1]: app.service: control process exited, code=killed status=9
Mar 21 01:55:22 centosvm systemd[1]: Failed to start ApplicationTest.
Mar 21 01:55:22 centosvm systemd[1]: Unit app.service entered failed state.
Mar 21 01:55:22 centosvm systemd[1]: app.service failed.
Another odd thing I noticed: when I run systemctl show app.service -p TimeoutSec, I don't get any result; the output is blank.
I have tried doing a systemctl reboot, but still no dice.
Of course, when I change the value to anything else, like TimeoutSec=5min, it works perfectly fine. But I really need this application to be allowed to take an unlimited amount of time.
Where am I going wrong?
TimeoutSec=0 fixed the problem.
Apparently, if you are using a version of systemd older than 229, you will need to use 0 instead of infinity to disable the timeout.
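If the same unit file has to be generated for machines with different systemd versions, the choice could be scripted. This is a rough sketch based only on the version cutoff stated above; the function names timeout_disable_value and systemd_version are hypothetical.

```shell
# timeout_disable_value VERSION: print the TimeoutSec= value that disables
# the timeout, using the 229 cutoff mentioned in the answer.
timeout_disable_value() {
    if [ "$1" -ge 229 ]; then
        echo "infinity"
    else
        echo "0"
    fi
}

# systemd_version: the first line of `systemctl --version` starts with
# "systemd <number>"; print just the number.
systemd_version() {
    systemctl --version | awk 'NR==1 {print $2}'
}

# Example:
# echo "TimeoutSec=$(timeout_disable_value "$(systemd_version)")"
```

As for the blank output of systemctl show app.service -p TimeoutSec: TimeoutSec= is a shorthand that sets both the start and the stop timeout, and systemctl show exposes those as the TimeoutStartUSec and TimeoutStopUSec properties, so querying those two names should show the effective values.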

Setting up systemctl for uwsgi

I'm trying to set up a uWSGI service as /etc/systemd/system/emperor.uwsgi.service:
[Unit]
Description=uWSGI Emperor
After=syslog.target
[Service]
ExecStart=/root/uwsgi/uwsgi --ini /etc/uwsgi/emperor.ini
# Requires systemd version 211 or newer
RuntimeDirectory=uwsgi
Restart=always
KillSignal=SIGQUIT
Type=notify
StandardError=syslog
NotifyAccess=all
[Install]
WantedBy=multi-user.target
When trying to start it, I get the following error:
ubuntu@ip-172-31-16-133:~$ sudo systemctl start emperor.uwsgi.service
Job for emperor.uwsgi.service failed because the control process exited with error code. See "systemctl status emperor.uwsgi.service" and "journalctl -xe" for details.
This is the output for when I checked the status:
ubuntu@ip-172-31-16-133:~$ sudo systemctl status emperor.uwsgi.service
● emperor.uwsgi.service - uWSGI Emperor
Loaded: loaded (/etc/systemd/system/emperor.uwsgi.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: Stopped uWSGI Emperor.
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: Starting uWSGI Emperor...
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: emperor.uwsgi.service: Main process exited, code=exited
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: Failed to start uWSGI Emperor.
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: emperor.uwsgi.service: Unit entered failed state.
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: emperor.uwsgi.service: Failed with result 'exit-code'.
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: emperor.uwsgi.service: Service hold-off time over, sche
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: Stopped uWSGI Emperor.
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: emperor.uwsgi.service: Start request repeated too quick
Jan 30 11:16:05 ip-172-31-16-133 systemd[1]: Failed to start uWSGI Emperor.
I've had similar issues. It seems systemd swallows some output when it fails to start a (uWSGI) service. Here are a couple of things to check to figure out what's causing the issue:
Check the systemd journal: journalctl -b -u $service
Try to run the service manually: simply run the command line specified after ExecStart= in the systemd service file; in your example: /root/uwsgi/uwsgi --ini /etc/uwsgi/emperor.ini
Either of these should shed some light on why the service fails to start.
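To avoid copying the command by hand, it can be pulled out of the unit file. The helper below is a small sketch with a made-up name, execstart_lines; it assumes a single-line ExecStart= with no backslash continuations and does not look at drop-in files.

```shell
# execstart_lines UNIT_FILE: print each command configured via ExecStart=,
# with the key and any leading prefix characters (-, @, +, !, :) removed.
# Empty "ExecStart=" reset lines are skipped.
execstart_lines() {
    sed -n 's/^ExecStart=[-@+!:]*//p' "$1" | sed '/^$/d'
}

# Example, using the unit from the question:
# execstart_lines /etc/systemd/system/emperor.uwsgi.service
```

Running the printed command in a terminal as the same user the service runs as should surface any error message that never reached the journal.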
