Rails: 6.0.3
Sidekiq: 6.1.2
Ruby: 2.7.2
Running on AWS Amazon Linux 2
I'm running a fairly simple Sidekiq configuration in production, using the boilerplate systemd/sidekiq.service file from the examples directory in the Sidekiq repo.
I noticed that my workers cannot run long jobs because they are killed roughly every minute. I was able to track down what's happening: systemd is restarting Sidekiq even though it started successfully. It appears systemd never receives the message that the service started, so it kills the process.
Here are the logs:
sidekiq: 2021-06-01T23:30:56.510Z pid=24939 tid=gir INFO: Shutting down
sidekiq: 2021-06-01T23:30:56.511Z pid=24939 tid=4jxb INFO: Scheduler exiting...
systemd: Failed to start sidekiq.
systemd: Unit sidekiq.service entered failed state.
systemd: sidekiq.service failed.
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=gir INFO: Terminating quiet workers
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=4jvn INFO: Scheduler exiting...
sidekiq: 2021-06-01T23:30:57.015Z pid=24939 tid=gir INFO: Pausing to allow workers to finish...
sidekiq: 2021-06-01T23:30:57.516Z pid=24939 tid=gir INFO: Bye!
systemd: sidekiq.service holdoff time over, scheduling restart.
systemd: Starting sidekiq...
sidekiq: 2021-06-01T23:30:58.991Z pid=32046 tid=fs6 INFO: Enabling systemd notification integration
sidekiq: 2021-06-01T23:31:04.475Z pid=32046 tid=fs6 INFO: Booting Sidekiq 6.1.2 with redis options {:url=>"redis://******"}
sidekiq: 2021-06-01T23:31:08.869Z pid=32046 tid=fs6 INFO: Running in ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
sidekiq: 2021-06-01T23:31:08.870Z pid=32046 tid=fs6 INFO: See LICENSE and the LGPL-3.0 for licensing details.
systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Following these messages, the Sidekiq worker successfully performs jobs from the queue for about a minute before it's restarted again. This cycle continues forever.
I've tried modifying the sidekiq.service file a number of different ways, but nothing seems to do the trick. In particular, this log line seems to indicate that the startup notification is being sent from the wrong process ID, so systemd never learns that Sidekiq started correctly: systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Any ideas on how I can ensure that systemd accurately knows when the service succeeds or fails to start?
Here is my current systemd/sidekiq.service file:
#
# This file tells systemd how to run Sidekiq as a 24/7 long-running daemon.
#
# Customize this file based on your bundler location, app directory, etc.
# Customize and copy this into /usr/lib/systemd/system (CentOS) or /lib/systemd/system (Ubuntu).
# Then run:
# - systemctl enable sidekiq
# - systemctl {start,stop,restart} sidekiq
#
# This file corresponds to a single Sidekiq process. Add multiple copies
# to run multiple processes (sidekiq-1, sidekiq-2, etc).
#
# Use `journalctl -u sidekiq -rn 100` to view the last 100 lines of log output.
#
[Unit]
Description=sidekiq
# start us only once the network and logging subsystems are available,
# consider adding redis-server.service if Redis is local and systemd-managed.
After=syslog.target network.target
# See these pages for lots of options:
#
# https://www.freedesktop.org/software/systemd/man/systemd.service.html
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#
# THOSE PAGES ARE CRITICAL FOR ANY LINUX DEVOPS WORK; read them multiple
# times! systemd is a critical tool for all developers to know and understand.
#
[Service]
#
# !!!! !!!! !!!!
#
# As of v6.0.6, Sidekiq automatically supports systemd's `Type=notify` and watchdog service
# monitoring. If you are using an earlier version of Sidekiq, change this to `Type=simple`
# and remove the `WatchdogSec` line.
#
# !!!! !!!! !!!!
#
Type=simple
# If your Sidekiq process locks up, systemd's watchdog will restart it within seconds.
#WatchdogSec=10
EnvironmentFile=/opt/elasticbeanstalk/deployment/custom_env_var
WorkingDirectory=/var/app/current
# If you use rbenv:
# ExecStart=/bin/bash -lc 'exec /home/deploy/.rbenv/shims/bundle exec sidekiq -e production'
# If you use the system's ruby:
# ExecStart=/usr/local/bin/bundle exec sidekiq -e production
# If you use rvm in production without gemset and your ruby version is 2.6.5
# ExecStart=/home/deploy/.rvm/gems/ruby-2.6.5/wrappers/bundle exec sidekiq -e production
# If you use rvm in production with a gemset and your ruby version is 2.6.5
ExecStart=/bin/bash -lc 'cd /var/app/current; bundle exec sidekiq -e production -r /var/app/current -C /var/app/current/config/sidekiq.yml'
# Use `systemctl kill -s TSTP sidekiq` to quiet the Sidekiq process
# !!! Change this to your deploy user account !!!
User=root
Group=root
UMask=0002
# Greatly reduce Ruby memory fragmentation and heap usage
# https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
Environment=MALLOC_ARENA_MAX=2
# if we crash, restart
RestartSec=1
Restart=on-failure
# output goes to /var/log/syslog (Ubuntu) or /var/log/messages (CentOS)
StandardOutput=syslog
StandardError=syslog
# This will default to "bundler" if we don't specify it
SyslogIdentifier=sidekiq
[Install]
WantedBy=multi-user.target
Change ExecStart to:
ExecStart=/direct/path/to/bundle exec sidekiq -e production
Everything else in that line appears superfluous.
Maybe this will also work in your case:
Type=notify
NotifyAccess=all # or "exec"
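Putting both suggestions together: by default systemd only accepts the readiness notification from the main PID (NotifyAccess defaults to main for Type=notify), and the bash -lc wrapper with a compound command leaves bash as the main process while Sidekiq runs as its child. A minimal sketch of the relevant [Service] lines, assuming bundle lives at /usr/local/bin/bundle (substitute the output of `which bundle` on your host):

[Service]
Type=notify
# Optional: Sidekiq >= 6.0.6 pets the systemd watchdog; restart if it locks up.
WatchdogSec=10
WorkingDirectory=/var/app/current
# Start sidekiq directly so it becomes the main PID that systemd listens to.
ExecStart=/usr/local/bin/bundle exec sidekiq -e production -C /var/app/current/config/sidekiq.yml

If you need a login shell for rvm/rbenv, prefixing the command with exec achieves the same thing, because exec replaces the shell with Sidekiq under the same PID:

ExecStart=/bin/bash -lc 'exec bundle exec sidekiq -e production -C /var/app/current/config/sidekiq.yml'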
Related
I have a service that should run a set of applications in the background on my Yocto embedded Linux system. I don't like the idea of creating a systemd startup script for each app, so I just run them from a bash script as follows:
The service:
startup.service
[Unit]
Description=applications startup script
After=network.target
[Service]
Type=simple
ExecStart=/opt/somedir/startup.sh
[Install]
WantedBy=multi-user.target
and the script
startup.sh
#!/bin/bash
echo "application startup script"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/somedir
/opt/somedir/app1 &
/opt/somedir/app2 &
/opt/somedir/app3 &
But no applications start. Checking the service status gives me:
systemctl status startup
● startup.service - applications startup script
Loaded: loaded (/lib/systemd/system/startup.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2021-03-25 10:33:16 UTC; 18min ago
Process: 428 ExecStart=/opt/somedir/startup.sh (code=exited, status=0/SUCCESS)
Main PID: 428 (code=exited, status=0/SUCCESS)
Mar 25 10:33:16 systemd[1]: Started application startup script.
Mar 25 10:33:16 startup.sh[428]: application startup script
Mar 25 10:33:16 systemd[1]: startup.service: Succeeded.
So the service is executed at system startup and runs the script. If I execute the script from the command line, it starts the applications as expected. So what's the reason no applications run?
Systemd will need to know how to run the script. Therefore either add:
#!/bin/bash
to the top line of the startup.sh script or change the ExecStart line in the systemd service file to:
ExecStart=/bin/bash -c /opt/somedir/startup.sh
Also, to ensure that the spawned processes remain persistent after the script exits, change the service type to:
Type=forking
systemd runs the script startup.sh, and after that process ends it assumes all work is done, so it kills off any remaining processes and the unit ends. The simplest solution is to add a wait at the end of startup.sh so that the script only returns once the backgrounded processes have all ended, as in the sketch below.
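A minimal version of that approach, keeping the Type=simple unit unchanged and reusing exactly the paths from the question:

#!/bin/bash
echo "application startup script"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/somedir
/opt/somedir/app1 &
/opt/somedir/app2 &
/opt/somedir/app3 &
# Block here until every backgrounded app exits, so the unit
# stays active instead of being considered finished.
wait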
Jetty on our Linux server is not installed as a service, as we have multiple Jetty servers on different ports. We use ./jetty.sh stop and ./jetty.sh start to stop and start Jetty.
However, when I add sudo to the command, the server never stops or starts successfully. When I run sudo ./jetty.sh stop, it shows
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
and the server was not stopped.
When I run sudo ./jetty.sh start, it shows
Starting Jetty: FAILED Tue Apr 23 23:07:15 CST 2019
How could this happen? From my understanding, using sudo gives you more power and privileges to run commands. If you can successfully execute a command without sudo, it should never fail with sudo, since sudo only grants additional privileges.
When run as a regular user, jetty.sh keeps its state (such as the PID file) under $HOME.
When run as root, it uses system paths such as /var/run.
The error you got ..
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
... means that there was a stale PID file sitting around for a process that no longer exists.
Short answer: the behavior differs depending on whether you run it as root (a service) or as a user (just an application).
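A quick way to confirm the stale PID file theory is to inspect and clear it by hand (the path is the one from the error message; adjust it if your installation differs):

cat /var/run/jetty.pid                # PID recorded at the last start
ps -p "$(cat /var/run/jetty.pid)"     # check whether that process still exists
sudo rm /var/run/jetty.pid            # remove the stale file, then start again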
I have a systemd service that performs a Vertica backup to S3, and I wanted to add a timer that runs every day at 3 AM. I tried to create an override file with the timer section, but when I do a daemon-reload I get the error `Unknown section 'Timer'`, and I am unable to find the issue.
/etc/systemd/system/vertica-backup.service.d/Override.conf
[Timer]
OnCalendar=*-*-* 03:00:00
Unit=vertica-backup.service
/etc/systemd/system/vertica-backup.service:
[Unit]
Description = Vertica Backup Service
After = network.target
[Service]
User= dbadmin
ExecStart= /usr/local/bin/vertica-backup.sh
Error
May 15 15:19:47 ip-10-150-4-42.ec2.internal systemd[1]: [/etc/systemd/system/vertica-backup.service.d/override.conf:1] Unknown section 'Timer'. Ignoring.
May 15 15:19:50 ip-10-150-4-42.ec2.internal systemd[1]: [/etc/systemd/system/vertica-backup.service.d/override.conf:1] Unknown section 'Timer'. Ignoring.
[Timer] sections don't go in service files, they go in their own .timer files. Create /etc/systemd/system/vertica-backup.timer and put the [Timer] section in there.
See man systemd.timer for reference.
Create the timer file /etc/systemd/system/vertica-backup.timer
[Timer]
OnCalendar=*-*-* 03:00:00
Unit=vertica-backup.service
verify it
sudo systemd-analyze verify /etc/systemd/system/vertica-backup.timer
start the timer
sudo systemctl start vertica-backup.timer
# check it
systemctl list-timers --all
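One detail worth adding (an assumption about the intended setup, since the question only mentions starting the timer): to have the timer come back after a reboot, give the .timer file [Unit] and [Install] sections and enable it rather than merely starting it:

[Unit]
Description=Daily Vertica backup at 03:00

[Timer]
OnCalendar=*-*-* 03:00:00
Unit=vertica-backup.service

[Install]
WantedBy=timers.target

Then:

sudo systemctl enable --now vertica-backup.timer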
I get an error when installing Firebird 3.0, and again afterwards when trying to start the service.
Job for firebird3.0.service failed because a configured resource limit was exceeded. See "systemctl status firebird3.0.service" and "journalctl -xe" for details.
invoke-rc.d: initscript firebird3.0, action "start" failed.
dpkg: error processing package firebird3.0-server (--configure):
subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Processing triggers for systemd (229-4ubuntu7) ...
Processing triggers for ureadahead (0.100.0-19) ...
Errors were encountered while processing:
firebird3.0-server
E: Sub-process /usr/bin/dpkg returned an error code (1)
See return from "service firebird3.0 start":
Job for firebird3.0.service failed because a configured resource limit was exceeded. See "systemctl status firebird3.0.service" and "journalctl -xe" for details
See return from "journalctl -xe":
-- Unit firebird3.0.service has begun starting up.
Ago 26 15:41:22 server14 systemd[1]: firebird3.0.service: PID file /var/run/firebird/3.0default.pid not readable (yet?) after start: No such file or directory
Ago 26 15:41:22 server14 firebird[3509]: Security database error
Ago 26 15:41:22 server14 systemd[1]: firebird3.0.service: Daemon never wrote its PID file. Failing.
Ago 26 15:41:22 server14 systemd[1]: Failed to start Firebird Database Server ( SuperServer ).
-- Subject: Unit firebird3.0.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit firebird3.0.service has failed.
--
-- The result is failed.
Ago 26 15:41:22 server14 systemd[1]: firebird3.0.service: Unit entered failed state.
Ago 26 15:41:22 server14 systemd[1]: firebird3.0.service: Failed with result 'resources'.
I've tried many things to solve this, but at the moment the only way that works is starting it manually:
start-stop-daemon --quiet --start --exec /usr/sbin/fbguard --pidfile /var/run/firebird/3.0/firebird.pid -b -m -- -daemon -forever -pidfile /var/run/firebird/3.0/firebird.pid
And manual stop:
start-stop-daemon --stop --signal KILL --exec /usr/sbin/fbguard
start-stop-daemon --stop --signal KILL --exec /usr/sbin/firebird
Any ideas?
The directory /run/firebird/3.0 is not created during installation on Debian-based systems, so the systemd unit does not work.
Workaround:
As the root user:
create the directory:
mkdir -p /run/firebird/3.0
chown it to the firebird user:
chown -R firebird:firebird /run/firebird
After doing this, Firebird 3.0 should run as expected.
As /run is normally a temporary directory on Debian, you can change the systemd unit to always create the directory before the service starts:
/lib/systemd/system/firebird3.0.service should then look like this:
[Unit]
Description=Firebird Database Server ( SuperServer )
After=network.target
Conflicts=firebird3.0-classic.socket
[Service]
User=firebird
Group=firebird
Type=forking
# Run ExecStartPre with root-permissions
PermissionsStartOnly=true
ExecStartPre=-/bin/mkdir -p /run/firebird/3.0
ExecStartPre=/bin/chown -R firebird:firebird /run/firebird
PIDFile=/run/firebird/3.0/default.pid
ExecStart=/usr/sbin/fbguard -pidfile /run/firebird/3.0/default.pid -daemon -forever
RuntimeDirectory=firebird/3.0
StandardError=syslog
[Install]
WantedBy=multi-user.target
PermissionsStartOnly=true is necessary so that all statements except the service itself (ExecStart) run as root. This is important for creating the subdirectories in /run.
By the way: the - (minus) at the start of the first ExecStartPre line lets the unit continue even if directory creation returns an error, which helps if the directory already exists, for example after a service restart.
Don't forget to reload systemd:
systemctl --system daemon-reload
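One more note: the unit above already contains RuntimeDirectory=firebird/3.0. On reasonably recent systemd versions, that directive alone makes systemd create /run/firebird/3.0 (owned by the configured User= and Group=) before the service starts, so a trimmed sketch without the ExecStartPre workaround may be enough; whether it is depends on your systemd version, so treat this as an assumption to verify:

[Service]
User=firebird
Group=firebird
Type=forking
# systemd creates /run/firebird/3.0 owned by firebird:firebird at start
RuntimeDirectory=firebird/3.0
PIDFile=/run/firebird/3.0/default.pid
ExecStart=/usr/sbin/fbguard -pidfile /run/firebird/3.0/default.pid -daemon -forever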
I have two problems running FreeSWITCH from systemd:
EDIT 2 - I have moved the slow start up question to here (Freeswitch pauses on check_ip at boot on centos 7.1) as although they may be related it's probably good as a standalone.
EDIT - I have noticed something else. Look at these lines, captured from the terminal output when running it from there. The gap is 4 minutes, but it has been around 10 minutes before. I noticed it because I was trying to find out why port 8021 was taking several minutes to accept the fs_cli connection. Why does this happen? It has never happened to me before, and I've installed loads of FS boxes. It does the same thing on both 1.7 and today's 1.6.
2015-10-23 12:57:35.280984 [DEBUG] switch_scheduler.c:249 Added task 1 heartbeat (core) to run at 1445601455
2015-10-23 12:57:35.281046 [DEBUG] switch_scheduler.c:249 Added task 2 check_ip (core) to run at 1445601455
2015-10-23 13:01:31.100892 [NOTICE] switch_core.c:1386 Created ip list rfc6598.auto default (deny)
1. I sometimes get double processes started. Here is my status line after such an occurrence:
# systemctl status freeswitch -l
freeswitch.service - freeswitch
Loaded: loaded (/etc/systemd/system/multi-user.target.wants/freeswitch.service)
Active: activating (start) since Fri 2015-10-23 01:31:53 BST; 18s ago
Main PID: 2571 (code=exited, status=0/SUCCESS); : 2742 (freeswitch)
CGroup: /system.slice/freeswitch.service
├─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
└─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
Oct 23 01:31:53 fswitch-1 systemd[1]: Starting freeswitch...
Oct 23 01:31:53 fswitch-1 freeswitch[2742]: 2743 Backgrounding.
and there are two processes running.
2. The PID file is sometimes not written fast enough for systemd to pick it up, yet no matter how fast I run the command after seeing this message, the file is always there by the time I check:
Oct 23 02:00:26 arribacom-sbc-1 systemd[1]: PID file
/usr/local/freeswitch/run/freeswitch.pid not readable (yet?) after
start.
Now, in (2) everything seems to work ok, and I can shut down the freeswitch process using
systemctl stop freeswitch
without any issues, but in (1) it just doesn't seem to do anything.
I'm wondering if the two are related, and whether FreeSWITCH is reporting back to systemd that the program is running before it actually is. Then systemd either starts up another process or (sometimes) doesn't.
Can anyone offer any pointers? I have tried to mail the freeswitch users list but despite being registered I simply cannot get any emails to appear on the list (but that's another problem).
* Update *
If I remove the -ncwait, the double process starting seems to improve, but I still get the can't-read-PID warning, so I'm still sure there's an issue present, possibly around timing(?).
I'm on Centos 7.1, & my freeswitch version is
FreeSWITCH Version 1.7.0+git~20151021T165609Z~9fee9bc613~64bit (git
9fee9bc 2015-10-21 16:56:09Z 64bit)
and here's my freeswitch.service file (some things have been commented out until I understand what they do and what side effects they may have):
[Unit]
Description=freeswitch
After=syslog.target network.target
#
[Service]
Type=forking
PIDFile=/usr/local/freeswitch/run/freeswitch.pid
PermissionsStartOnly=true
ExecStart=/usr/bin/freeswitch -nc -core -db /dev/shm -log /usr/local/freeswitch/log -conf /u
ExecReload=/usr/bin/kill -HUP $MAINPID
#ExecStop=/usr/bin/freeswitch -stop
TimeoutSec=120s
#
WorkingDirectory=/usr/bin
User=freeswitch
Group=freeswitch
LimitCORE=infinity
LimitNOFILE=999999
LimitNPROC=60000
LimitSTACK=245760
LimitRTPRIO=infinity
LimitRTTIME=7000000
#IOSchedulingClass=realtime
#IOSchedulingPriority=2
#CPUSchedulingPolicy=rr
#CPUSchedulingPriority=89
#UMask=0007
#
[Install]
WantedBy=multi-user.target
In the current master branch, take the two files from the debian/ directory:
freeswitch-systemd.freeswitch.service -- should go as /lib/systemd/system/freeswitch.service
freeswitch-systemd.freeswitch.tmpfile -- should go as /usr/lib/tmpfiles.d/freeswitch.conf
You probably need to adapt the paths, or build FreeSWITCH to use standard Debian paths.
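For reference, the tmpfile is what creates FreeSWITCH's runtime directory at boot. A tmpfiles.d entry of that kind is typically a single line like the following (the path, mode, and user here are assumptions; check the actual freeswitch-systemd.freeswitch.tmpfile in the repo):

# /usr/lib/tmpfiles.d/freeswitch.conf
# type  path             mode  user        group       age
d       /run/freeswitch  0750  freeswitch  freeswitch  -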