Systemd's StartLimitIntervalSec and StartLimitBurst never work - linux

I am trying to limit the number of restarts of a service (running in a container). The OS version is centos-release-7-5, and the service file is pretty much as below (some parameters removed for readability). It should be pretty straightforward, as some other posts point out (Post of Server Fault restart limit 1, Post of Stack Overflow restart limit 2), yet StartLimitBurst and StartLimitIntervalSec never work for me.
I tested it in several ways: (1) I checked the service PID and killed the service with "kill -9 ****" several times; the service always gets restarted after 20s! (2) I also tried to mess up the service file so that the container can never run. Still, it doesn't work; the service just keeps restarting.
Any idea?
[Unit]
Description=Hello Fluentd
After=docker.service
Requires=docker.service
StartLimitBurst=2
StartLimitIntervalSec=150s
[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker stop "fluentd"
ExecStartPre=-/usr/bin/docker rm -f "fluentd"
ExecStart=/usr/bin/docker run fluentd
ExecStop=/usr/bin/docker stop "fluentd"
Restart=always
RestartSec=20s
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target

I posted the problem on the Unix & Linux Stack Exchange as well, but in case somebody searches for it here, I am also posting my answer here, since I found the issue. All the documentation online suggests these parameters go in the [Unit] section of the unit file, but on my system (CentOS 7.5) they belong in the [Service] section. Besides that, the name is "StartLimitInterval", not "StartLimitIntervalSec".
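For reference, CentOS 7.5 ships systemd 219, where the rate-limit settings are documented in systemd.service(5) and therefore go into [Service]; the StartLimitIntervalSec= spelling and the [Unit] placement only apply to newer systemd releases. A sketch of the relevant part of the unit:
[Service]
Restart=always
RestartSec=20s
# systemd 219 spelling: no "Sec" suffix, and the settings live in [Service]
StartLimitInterval=150s
StartLimitBurst=2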

Related

systemd After=nginx.service is not working

I am trying to set up a custom systemd service on my Linux system and was experimenting with it.
The following is my custom service, which triggers a bash script:
[Unit]
Description=Example systemd service.
After=nginx.service
[Service]
Type=simple
ExecStart=/bin/bash /usr/bin/test_service.sh
[Install]
WantedBy=multi-user.target
Since I have specified After=nginx.service, I was expecting the nginx service to start automatically.
But after starting the above service, I checked the status of nginx, and it had not started.
However, if I replace After with Wants, it works.
Can someone explain the difference between After and Wants, and when to use which?
Specifying After=foo tells systemd how to order the units if they are both started at the same time. It will not cause the foo unit to autostart.
Use After=foo in combination with Wants=foo or Requires=foo to start foo if it's not already started and also to keep desired order of the units.
So your [Unit] should include:
[Unit]
Description=Example systemd service.
After=nginx.service
Wants=nginx.service
Difference between Wants and Requires:
Wants= : This directive is similar to Requires= , but less strict. Systemd will attempt to start any units listed here when this unit is activated. If these units are not found or fail to start, the current unit will continue to function. This is the recommended way to configure most dependency relationships.
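To check how systemd actually resolved the ordering and the dependencies for a unit, you can ask it directly (the unit name here is just an example):
systemctl list-dependencies example.service
systemctl show -p Wants -p Requires -p After example.service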

bottle error "critical error while processing request:" when launched from systemd

I have a server built on Bottle that works great when launched from userland. Started from systemd, the server appears on port 8088 and appears to be communicating with the outside world, but when I contact the app all I get is the very informative "Critical error while processing request:schema", where "schema" is the URL of the app.
My systemd file is below:
[Unit]
Description=Survey Service
After=multi-user.target
Conflicts=getty@tty1.service
[Service]
User=ubuntu
Type=simple
Working-directory=/home/ubuntu/survey
ExecStart=/usr/bin/python3 /home/ubuntu/survey/server.py
[Install]
WantedBy=multi-user.target
I've found several articles related to the informative error message, but none related to systemd. As I said, the app runs perfectly when launched as user ubuntu in the project directory with the very simple command "python3 server.py", but seems to be missing... something when systemd tries to launch it.
Systemd reports the process is running and, as I said, I'm able to connect to the app... it just fails in an orderly fashion with this message, and I'm lost as to why. I suspect a permissions problem, but don't "User" and "Working-directory" take care of that? All files used by the app are in that directory or directories below it.
Apparently doing it the old-fashioned way works: have systemd run a bash script such as this:
cat /home/ubuntu/survey/server.sh
#!/bin/bash
cd /home/ubuntu/survey/
python3 server.py
Works just great. So my question now becomes one about systemd: what is the point of "Working-directory" if it does not actually set the working directory?
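One likely explanation: the directive systemd actually understands is WorkingDirectory= (CamelCase, no hyphen), so a Working-directory= line is flagged as an unknown lvalue in the journal and ignored, and the service starts with / as its working directory, where it cannot find the app's files. A sketch of the [Service] section with the recognised spelling:
[Service]
User=ubuntu
Type=simple
WorkingDirectory=/home/ubuntu/survey
ExecStart=/usr/bin/python3 /home/ubuntu/survey/server.py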

ExpressJS Server Goes Offline Every Night - 502 Bad Gateway

I have a website with Nginx installed as a reverse proxy for an ExpressJS server (proxying to port 3001). It uses Node and ReactJS for my frontend application.
This is simply a testing website currently, and isn't known or used by any users. I have this installed on a Digital Ocean Droplet with Ubuntu.
Every morning when I wake up, I load my website and see 502 Bad Gateway. The problem is, I don't know how to find out how this happened. I have PM2 installed, which should automatically restart my ExpressJS server, but it hasn't done so, and when I run pm2 list my application is still showing as online.
When I run pm2 logs, I get the error shown in the screenshot (I am running this as an Administrator).
So I'll run pm2 restart all to restart the app, but then I don't see any crash information. On this occasion, when taking the screenshot, there were a couple of unusual requests (/robots.txt, /sitemap.xml and /.well-known/security.txt), but nothing indicating a crash.
When I look at my Nginx error.log file, all I can see is the error shown in the screenshot.
There is, however, something obscure within my access.log ([09/Oct/2018:06:33:19 +0000]), but I have no idea what it means.
If I run curl localhost:3001 whilst the server is offline, I receive a connection error message. This works fine after I run pm2 restart all.
I'm completely stuck with this and even the smallest bit of help would be appreciated greatly, even if it's just to tell me I'm barking up the wrong tree completely and need to look elsewhere - thank you.
I think you should check this GitHub thread; it seems like it could help you.
Basically, after a few hours, the Node.js server stops functioning, and poor nginx cannot forward its requests, as the service listening on the proxied port is dead, so it triggers a 502 error.
In that case it was all due to a memory leak, which led to massive garbage collection and then to the server crashing. Check your memory consumption; you could be in for some surprises. And try to debug your app code, one piece (dependency) at a time.
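A couple of quick ways to keep an eye on memory on the droplet (pm2 ships its own monitor):
pm2 monit    # live CPU and memory per pm2-managed process
free -m      # overall memory use on the droplet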
Updated answer:
So, I will add another branch to my answer, as it seems it has not helped you so far.
You could try to get rid of pm2, and use systemd to manage your app life cycle.
Create a service file
sudo vim /lib/systemd/system/appname.service
This is a simple file I used myself for a random ExpressJS app:
[Unit]
Description=YourApp Site Server
[Service]
ExecStart=/home/user/appname/index.js
Restart=always
Environment=PATH=/usr/bin:/usr/local/bin
Environment=NODE_ENV=production
WorkingDirectory=/home/user/appname
[Install]
WantedBy=multi-user.target
Note that, because of Restart=always, it will try to restart if it fails for any reason.
Manage it with systemd
Register the new service with:
sudo systemctl daemon-reload
Now start your app from systemd with:
sudo systemctl start appname
From now on you should be able to manage your app's life cycle with the usual systemd commands.
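For example:
sudo systemctl enable appname    # start the service automatically at boot
sudo systemctl status appname    # current state plus the most recent log lines
sudo systemctl restart appname   # restart, e.g. after deploying new code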
You could send stdout and stderr to syslog to understand what your app is doing:
StandardOutput=syslog
StandardError=syslog
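With those two lines in place, the output can be read from the journal or from syslog, e.g.:
sudo journalctl -u appname -f    # follow the service output live
grep appname /var/log/syslog     # or read it via syslog on Ubuntu/Debian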
Hope it helps more
You cannot say exactly when Node.js will crash, do a big GC, or stall for some other reason.
The easiest way to cover such issues is to do a health check and restart the app. This should not be an issue when working with a cluster.
Have a look at a health-check module implementation; you may try to use one, or write a simple shell script to do the check.
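A minimal sketch of such a script, assuming the app answers on port 3001 and the systemd unit is called appname (both are assumptions here); run it every minute or so from cron or a systemd timer:
#!/bin/bash
# Restart the app if it stops answering on its port.
if ! curl -fsS --max-time 5 http://localhost:3001/ > /dev/null; then
    systemctl restart appname
fi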

systemd does not reload updated Application code

Recently I faced this weird problem with systemd. I have a Node.js app that I run using systemd; everything works until I make changes to my application code and restart my systemd service, but my newly made changes are not reflected in execution (unless I restart my machine).
The other thing I observed is that with a very small test application it works as intended, so my assumption is that my application's code size might be causing this behavior.
Thanks in advance.
[Unit]
Description=sandbox
Documentation=https://example.com
PartOf=appbase.target
[Service]
ExecStart=/home/user/.nvm/versions/node/vx.x.x/bin/node /home/user/repo/Server/SandboxServer.js
Restart=always
Slice=limits.slice
# Restart service after 10 seconds if the node service crashes
RestartSec=10
# Output to syslog
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=sandbox
[Install]
WantedBy=multi-user.target
The problem is elsewhere. When a restart is issued, systemd will send a TERM signal to the app, then a KILL signal if the TERM signal doesn't work; finally, the unit will start back up.
Have you confirmed the app is actually stopping and starting?
You could add a signal handler for SIGTERM to log when the process is asked to stop (SIGKILL cannot be caught), or simply check whether the process is actually replaced, as shown below.
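One way to confirm the restart from the outside, without touching the app code (assuming the unit file is called sandbox.service):
systemctl show -p MainPID sandbox.service    # note the PID
sudo systemctl restart sandbox.service
systemctl show -p MainPID sandbox.service    # the PID should have changed
journalctl -u sandbox.service -n 50          # stop/start messages around the restart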

Systemd service failing on startup

I'm trying to get a nodejs server to run on startup, so I created the following systemd unit file:
[Unit]
Description=TI SensorTag Communicator
After=network.target
[Service]
ExecStart=/usr/bin/node /home/pi/sensortag-comm/sensortag.js
User=root
[Install]
WantedBy=multi-user.target
I'm not sure what I'm doing wrong here. It seems to fail before the nodejs script even starts, as no logging occurs. My script is dependent on mysql 5.5 (I think this is where I'm running into an issue). Any insight, or even a different solution would be appreciated.
Also, it runs fine once I'm logged into the system.
Update
The service is enabled, and is logging through journalctl. I'll update with the results on 7/11/16.
Not sure why it didn't work the first time, but upon checking journalctl the issue was 100% that MySQL hadn't started. I once again changed it to After=MySQL.service and it worked perfectly!
If there is no mention of the service at all in the output of journalctl that could indicate that the service was not enabled to start at boot.
Make sure you run systemctl enable my-unit-name before your next boot test.
Also, since you depend on MySQL being up and running, you should declare that with something like: After=mysql.service. The exact service name may depend on your Linux distribution, which you didn't state.
Adding User=root adds nothing, as system units would be run by root by default anyway.
When you said "it fails", you didn't specify whether it was failing at boot time, or with a test run by systemctl start my-unit-name.
After attempting to start a service, there should be logging if you run journalctl -u my-unit.name.service.
You might also consider adding StandardOutput=journal to your unit file to make sure you capture output from the service you are running as well.
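Putting those suggestions together, the unit might look something like this (the exact MySQL unit name depends on your distribution):
[Unit]
Description=TI SensorTag Communicator
After=network.target mysql.service
Wants=mysql.service
[Service]
ExecStart=/usr/bin/node /home/pi/sensortag-comm/sensortag.js
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target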
