Inconsistent systemd startup of freeswitch - linux

I have two problems running freeswitch from systemd :
EDIT 2 - I have moved the slow start up question to here (Freeswitch pauses on check_ip at boot on centos 7.1) as although they may be related it's probably good as a standalone.
EDIT - I have noticed something else. Look at the following lines captured from the terminal output when running it from there. The gap here is 4 minutes, but it has been around 10 minutes before. I noticed it because I was trying to find out why port 8021 was taking several minutes to accept the fs_cli connection. Why does this happen? It has never happened to me before and I've installed loads of FS boxes. It does the same thing on both 1.7 and today's 1.6.
2015-10-23 12:57:35.280984 [DEBUG] switch_scheduler.c:249 Added task 1 heartbeat (core) to run at 1445601455
2015-10-23 12:57:35.281046 [DEBUG] switch_scheduler.c:249 Added task 2 check_ip (core) to run at 1445601455
2015-10-23 13:01:31.100892 [NOTICE] switch_core.c:1386 Created ip list rfc6598.auto default (deny)
(1) I sometimes get double processes started. Here is my status line after such an occurrence :
# systemctl status freeswitch -l
freeswitch.service - freeswitch
Loaded: loaded (/etc/systemd/system/multi-user.target.wants/freeswitch.service)
Active: activating (start) since Fri 2015-10-23 01:31:53 BST; 18s ago
Main PID: 2571 (code=exited, status=0/SUCCESS); : 2742 (freeswitch)
CGroup: /system.slice/freeswitch.service
├─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
└─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
Oct 23 01:31:53 fswitch-1 systemd[1]: Starting freeswitch...
Oct 23 01:31:53 fswitch-1 freeswitch[2742]: 2743 Backgrounding.
and there are two processes running.
(2) The PID file is sometimes not written fast enough for systemd to pick it up, but however quickly I check after seeing this message, the file is always there :
Oct 23 02:00:26 arribacom-sbc-1 systemd[1]: PID file /usr/local/freeswitch/run/freeswitch.pid not readable (yet?) after start.
Now, in (2) everything seems to work ok, and I can shut down the freeswitch process using
systemctl stop freeswitch
without any issues, but in (1) it just doesn't seem to do anything.
I'm wondering if the two are related, and that freeswitch is reporting back to systemd that the program is running before it actually is. Then systemd is either starting up another process or (sometimes) not.
Can anyone offer any pointers? I have tried to mail the freeswitch users list but despite being registered I simply cannot get any emails to appear on the list (but that's another problem).
* Update *
If I remove the -ncwait it seems to improve the double process starting, but I still get the "PID file not readable" warning, so I'm still sure there's an issue present, possibly around timing(?).
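One way to test the timing theory is to measure how long the PID file actually takes to appear after a start. The following is only a minimal sketch: wait_for_pidfile is a hypothetical helper name, not part of FreeSWITCH or systemd, and the one-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# wait_for_pidfile PATH TIMEOUT_SECONDS   (hypothetical helper)
# Polls once per second until PATH exists and is non-empty, or the timeout expires.
# Prints the number of seconds waited; returns 0 on success, 1 on timeout.
wait_for_pidfile() {
    pidfile=$1
    timeout=$2
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            echo "$waited"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
    return 1
}
```

Running something like wait_for_pidfile /usr/local/freeswitch/run/freeswitch.pid 120 straight after systemctl start freeswitch would show roughly how far behind the PID file is, which could then be compared with the TimeoutSec value in the unit.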
I'm on CentOS 7.1, and my freeswitch version is
FreeSWITCH Version 1.7.0+git~20151021T165609Z~9fee9bc613~64bit (git 9fee9bc 2015-10-21 16:56:09Z 64bit)
and here's my freeswitch.service file (some things have been commented out until I understand what they are doing and any side effects they may have) :
[Unit]
Description=freeswitch
After=syslog.target network.target
#
[Service]
Type=forking
PIDFile=/usr/local/freeswitch/run/freeswitch.pid
PermissionsStartOnly=true
ExecStart=/usr/bin/freeswitch -nc -core -db /dev/shm -log /usr/local/freeswitch/log -conf /u
ExecReload=/usr/bin/kill -HUP $MAINPID
#ExecStop=/usr/bin/freeswitch -stop
TimeoutSec=120s
#
WorkingDirectory=/usr/bin
User=freeswitch
Group=freeswitch
LimitCORE=infinity
LimitNOFILE=999999
LimitNPROC=60000
LimitSTACK=245760
LimitRTPRIO=infinity
LimitRTTIME=7000000
#IOSchedulingClass=realtime
#IOSchedulingPriority=2
#CPUSchedulingPolicy=rr
#CPUSchedulingPriority=89
#UMask=0007
#
[Install]
WantedBy=multi-user.target

In the current master branch, take these two files from the debian/ directory:
freeswitch-systemd.freeswitch.service -- should be installed as /lib/systemd/system/freeswitch.service
freeswitch-systemd.freeswitch.tmpfile -- should be installed as /usr/lib/tmpfiles.d/freeswitch.conf
You will probably need to adapt the paths, or build FreeSWITCH to use the standard Debian paths.

Related

ROS RViz issues when starting using robot_startup

Background
I have an application that requires that I start several RViz windows in a headless ROS environment. The system is required to send image files to some locally networked dumb terminals which can barely but adequately show image files (.jpg). Therefore, I simply take screen snapshots of the RViz displays and send those. This works well, however, I need to run the RViz windows on startup.
Implementation
The ROS Noetic system is running on Ubuntu 20.04. I used robot_upstart to give me a working skeleton for a systemd service and then modified the core service file to allow display_manager access.
This is my working systemd service file, called 'test.service':
[Unit]
Description="bringup test"
After=network.target
After=display_manager.service
Wants=display_manager.service
[Service]
Type=simple
Environment="XAUTHORITY=/run/user/1000/gdm/Xauthority"
Environment="DISPLAY=:0"
Environment="XDG_RUNTIME_DIR=/home/<my_username>/catkin_ws/tmp"
# Setting HOME fixed the issue
Environment="HOME=/home/<my_username>"
ExecStart=/usr/sbin/test-start
[Install]
WantedBy=multi-user.target
This almost works. journalctl -f -u test.service lists an error:
Jun 06 21:10:22 aoede test-start[10209]: /opt/ros/noetic/lib/rviz/rviz: line 1: 10220 Aborted (core dumped) $0 $#
Jun 06 21:10:25 aoede dbus-daemon[10259]: [session uid=1000 pid=10257] AppArmor D-Bus mediation is enabled
Jun 06 21:10:28 aoede test-start[10237]: terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
Jun 06 21:10:28 aoede test-start[10237]: what(): boost::filesystem::create_directory: Permission denied: "/.rviz"
Jun 06 21:10:28 aoede test-start[10218]: Aborted (core dumped)
It is trying to write to a directory /.rviz . When I create this directory myself with relaxed permissions it then works correctly and the RViz windows all start. This directory seems to be filled with persistence files for the RViz instances.
I have tried setting XDG_RUNTIME_DIR as above but it had no effect. What environment variable should I set, or other way, so that RViz is looking in a more rational place? Also, would appreciate any recommendations on better practices than above.
The required environment variable is $HOME.
It was only being set after the service had started, so it was not available to RViz.
Environment="HOME=/home/<my_username>"
Adding this line to the [Service] section fixed the issue.
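To see why an empty $HOME ends up as /.rviz, here is a small sketch. It assumes RViz builds its config directory as "$HOME/.rviz", which is inferred from the error message rather than checked against the RViz source; rviz_config_dir and /home/someuser are made-up names for illustration.

```shell
#!/bin/sh
# rviz_config_dir HOME_VALUE   (hypothetical helper)
# Builds the path the way a "$HOME/.rviz" lookup would (assumed behaviour).
rviz_config_dir() {
    home_value=$1
    printf '%s/.rviz\n' "$home_value"
}

# With an empty HOME the path collapses to the filesystem root.
unset_case=$(rviz_config_dir "")
# With HOME set it lands inside the user's home directory.
set_case=$(rviz_config_dir "/home/someuser")

echo "HOME empty: $unset_case"
echo "HOME set:   $set_case"
```

To check which variables systemd actually hands to the service, systemctl show -p Environment test.service lists the Environment= values configured for the unit.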

Sidekiq starting successfully, but systemd restarts every ~1 minute anyway

Rails: 6.0.3
Sidekiq: 6.1.2
Ruby 2.7.2
Running on AWS Amazon Linux 2
I'm running a fairly simple Sidekiq configuration in production, using the boilerplate systemd/sidekiq.service file from the examples directory in the sidekiq repo.
I noticed that my workers can not run long jobs because they are killed every 1 minute or so. I was able to track down what's happening, and it appears that systemd is restarting sidekiq, even though it is successfully started. It appears that it never receives the message that the service started successfully, so systemd is killing the process.
Here are the logs:
sidekiq: 2021-06-01T23:30:56.510Z pid=24939 tid=gir INFO: Shutting down
sidekiq: 2021-06-01T23:30:56.511Z pid=24939 tid=4jxb INFO: Scheduler exiting...
systemd: Failed to start sidekiq.
systemd: Unit sidekiq.service entered failed state.
systemd: sidekiq.service failed.
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=gir INFO: Terminating quiet workers
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=4jvn INFO: Scheduler exiting...
sidekiq: 2021-06-01T23:30:57.015Z pid=24939 tid=gir INFO: Pausing to allow workers to finish...
sidekiq: 2021-06-01T23:30:57.516Z pid=24939 tid=gir INFO: Bye!
systemd: sidekiq.service holdoff time over, scheduling restart.
systemd: Starting sidekiq...
sidekiq: 2021-06-01T23:30:58.991Z pid=32046 tid=fs6 INFO: Enabling systemd notification integration
sidekiq: 2021-06-01T23:31:04.475Z pid=32046 tid=fs6 INFO: Booting Sidekiq 6.1.2 with redis options {:url=>"redis://******"}
sidekiq: 2021-06-01T23:31:08.869Z pid=32046 tid=fs6 INFO: Running in ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
sidekiq: 2021-06-01T23:31:08.870Z pid=32046 tid=fs6 INFO: See LICENSE and the LGPL-3.0 for licensing details.
systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Following these messages, the sidekiq worker will successfully perform the jobs from the queue for about 1 minute before it's restarted again. This cycle continues forever.
I've tried modifying the sidekiq.service file a number of different ways, but nothing seems to do the trick. In particular, this line from the logs seems to indicate there's an issue sending the signal to the right process ID, that sidekiq correctly started up: systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Any ideas on how I can ensure that systemd accurately knows when a job succeeds/fails to start?
Here is my current systemd/sidekiq.service file:
#
# This file tells systemd how to run Sidekiq as a 24/7 long-running daemon.
#
# Customize this file based on your bundler location, app directory, etc.
# Customize and copy this into /usr/lib/systemd/system (CentOS) or /lib/systemd/system (Ubuntu).
# Then run:
# - systemctl enable sidekiq
# - systemctl {start,stop,restart} sidekiq
#
# This file corresponds to a single Sidekiq process. Add multiple copies
# to run multiple processes (sidekiq-1, sidekiq-2, etc).
#
# Use `journalctl -u sidekiq -rn 100` to view the last 100 lines of log output.
#
[Unit]
Description=sidekiq
# start us only once the network and logging subsystems are available,
# consider adding redis-server.service if Redis is local and systemd-managed.
After=syslog.target network.target
# See these pages for lots of options:
#
# https://www.freedesktop.org/software/systemd/man/systemd.service.html
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#
# THOSE PAGES ARE CRITICAL FOR ANY LINUX DEVOPS WORK; read them multiple
# times! systemd is a critical tool for all developers to know and understand.
#
[Service]
#
# !!!! !!!! !!!!
#
# As of v6.0.6, Sidekiq automatically supports systemd's `Type=notify` and watchdog service
# monitoring. If you are using an earlier version of Sidekiq, change this to `Type=simple`
# and remove the `WatchdogSec` line.
#
# !!!! !!!! !!!!
#
Type=simple
# If your Sidekiq process locks up, systemd's watchdog will restart it within seconds.
#WatchdogSec=10
EnvironmentFile=/opt/elasticbeanstalk/deployment/custom_env_var
WorkingDirectory=/var/app/current
# If you use rbenv:
# ExecStart=/bin/bash -lc 'exec /home/deploy/.rbenv/shims/bundle exec sidekiq -e production'
# If you use the system's ruby:
# ExecStart=/usr/local/bin/bundle exec sidekiq -e production
# If you use rvm in production without gemset and your ruby version is 2.6.5
# ExecStart=/home/deploy/.rvm/gems/ruby-2.6.5/wrappers/bundle exec sidekiq -e production
# If you use rvm in production with gemset and your ruby version is 2.6.5
ExecStart=/bin/bash -lc 'cd /var/app/current; bundle exec sidekiq -e production -r /var/app/current -C /var/app/current/config/sidekiq.yml'
# Use `systemctl kill -s TSTP sidekiq` to quiet the Sidekiq process
# !!! Change this to your deploy user account !!!
User=root
Group=root
UMask=0002
# Greatly reduce Ruby memory fragmentation and heap usage
# https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
Environment=MALLOC_ARENA_MAX=2
# if we crash, restart
RestartSec=1
Restart=on-failure
# output goes to /var/log/syslog (Ubuntu) or /var/log/messages (CentOS)
StandardOutput=syslog
StandardError=syslog
# This will default to "bundler" if we don't specify it
SyslogIdentifier=sidekiq
[Install]
WantedBy=multi-user.target
Change ExecStart to:
ExecStart=/direct/path/to/bundle exec sidekiq -e production
Everything else in that line appears superfluous.
Maybe this will work in your case:
Type=notify
# or "exec"
NotifyAccess=all
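The "reception only permitted for main PID" message suggests that the process systemd started (the bash wrapper) is not the process that sends the notification (sidekiq). The sketch below, using plain sh with no systemd involved, shows the general effect: a wrapper that runs a command normally leaves two PIDs, while a wrapper that uses exec keeps one. count_pids is a hypothetical helper name.

```shell
#!/bin/sh
# count_pids SNIPPET   (hypothetical helper)
# Runs SNIPPET in a wrapper shell and counts the distinct PIDs it prints.
count_pids() {
    sh -c "$1" | sort -u | wc -l | tr -d ' '
}

# The wrapper prints its PID, then starts the child as a separate process.
without_exec=$(count_pids 'echo $$; sh -c "echo \$\$"; :')

# The wrapper prints its PID, then replaces itself with the child via exec.
with_exec=$(count_pids 'echo $$; exec sh -c "echo \$\$"')

echo "without exec: $without_exec distinct PID(s)"
echo "with exec:    $with_exec distinct PID(s)"
```

This is why calling bundle directly in ExecStart, or putting exec in front of it inside the bash -lc string, may keep the main PID and the notifying PID the same. Whether bundle itself starts further processes is not covered by this sketch.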

sudo ./jetty Stop or Start Failure

Jetty on our Linux server is not installed as a service, as we have multiple Jetty servers on different ports. We use the commands ./jetty.sh stop and ./jetty.sh start to stop and start Jetty.
However, when I add sudo to the command, the server never stops or starts successfully. When I run sudo ./jetty.sh stop, it shows
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
and the server was not stopped.
When I run sudo ./jetty.sh start, it shows
Starting Jetty: FAILED Tue Apr 23 23:07:15 CST 2019
How could this happen? From my understanding, using sudo gives you more power and privilege to run commands. If a command executes successfully without sudo, it should never fail with sudo, since sudo only grants superuser privileges.
As a user it uses $HOME.
As root it uses system paths.
The error you got ..
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
... means that there was a bad pid file sitting around for a process that no longer exists.
Short answer: the processing is different if you are root (a service) vs a user (just an application).
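A quick way to tell whether a pid file is stale is to test the PID it contains. This is a minimal sketch: pidfile_status is a hypothetical helper, not something jetty.sh provides. Note that kill -0 also fails when the process belongs to another user and you are not root, so an unprivileged run can report "stale" for a process that is really alive.

```shell
#!/bin/sh
# pidfile_status PATH   (hypothetical helper)
# Prints "missing", "stale" or "running" for the PID recorded in PATH.
pidfile_status() {
    pidfile=$1
    if [ ! -s "$pidfile" ]; then
        echo "missing"
        return
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

For example, sudo sh -c with this function and /var/run/jetty.pid would show whether the root-side pid file still points at a live process before you try a stop or start.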

Postgresql 9.3 on Centos 7 with custom PGDATA

I am trying to set up a PostgreSQL 9.3 server on CentOS 7 (installed via yum) inside a custom directory, which in my case is an encrypted partition (/custom_container/database) that is mounted on startup. For some reason PostgreSQL does not behave as the manual describes and fails with an error on service startup.
Note: It does not want to accept the PGDATA environment variable which I set, and when running
su - postgres -c '/usr/pgsql-9.3/bin/initdb'
(given that the PGDATA directory is owned by postgres:postgres) the cluster gets initialized inside the default directory /var/lib/pgsql/9.3/data/
The only way to change that is using
su - postgres -c '/usr/pgsql-9.3/bin/initdb --pgdata=$PGDATA'
This initializes the directory inside the custom container I am using. I could not figure this out, as the docs say that the PGDATA variable is used by default.
Problem: When running
service postgresql-9.3 start
I get an error with the log
postgresql-9.3.service - PostgreSQL 9.3 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-9.3.service; disabled)
Active: failed (Result: exit-code) since Mon 2014-11-10 15:24:15 CET; 1s ago
Process: 2785 ExecStartPre=/usr/pgsql-9.3/bin/postgresql93-check-db-dir ${PGDATA} (code=exited, status=1/FAILURE)
Nov 10 15:24:15 CentOS-70-64-minimal systemd[1]: Starting PostgreSQL 9.3 database server...
Nov 10 15:24:15 CentOS-70-64-minimal postgresql93-check-db-dir[2785]: "/var/lib/pgsql/9.3/data/" is missing or empty.
Nov 10 15:24:15 CentOS-70-64-minimal postgresql93-check-db-dir[2785]: Use "/usr/pgsql-9.3/bin/postgresql93-setup initdb" to initialize t...ster.
Nov 10 15:24:15 CentOS-70-64-minimal postgresql93-check-db-dir[2785]: See %{_pkgdocdir}/README.rpm-dist for more information.
Nov 10 15:24:15 CentOS-70-64-minimal systemd[1]: postgresql-9.3.service: control process exited, code=exited status=1
Nov 10 15:24:15 CentOS-70-64-minimal systemd[1]: Failed to start PostgreSQL 9.3 database server.
Nov 10 15:24:15 CentOS-70-64-minimal systemd[1]: Unit postgresql-9.3.service entered failed state.
Which means that Postgresql, even though the cluster is initialized in the new $PGDATA directory (/custom_container/database) still looks for the cluster in /var/lib/pgsql/9.3/data/
Did anyone experience this Postgresql behavior before? Could it be that I forgot certain configuration options or that the problem comes from Postgresql installation?
Thank you in advance!
It appears the real problem was setting the environment variables, which I got working in the following thread:
Centos 7 environment variables for Postgres service
The fix is to set the PGDATA variable inside a custom /etc/systemd/system/postgresql-9.3.service, created from the contents of /usr/lib/systemd/system/postgresql-9.3.service, which uses the default PGDATA value.
You need to create a custom postgresql.service file in /etc/systemd/system/ which overrides the default PGDATA environment variable. Your custom service file can .include the default postgresql service file, so you only need to add what you want to change. That way, upgrades can still modify or improve the default service file while your change is preserved.
This is how I just did it in Centos 7:
cat <<'END' >/etc/systemd/system/postgresql.service
.include /lib/systemd/system/postgresql.service
[Service]
# Set this to your desired PGDATA directory
Environment=PGDATA=/mnt/postgres/data
END
systemctl daemon-reload
systemctl restart postgresql.service
Verify :
ps -ax | grep [p]ostgres
Update:
Rather than manually creating the file and adding the .include line, you can also use the systemd built-in way:
systemctl edit postgresql.service
This will open your default editor and save your changes to /etc/systemd/system/postgresql.service.d/override.conf
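If you want to script this instead of using the interactive editor, a drop-in file can be written directly. The following is only a sketch: write_pgdata_override is a hypothetical helper, and the optional third argument exists purely so it can be tried against a scratch directory instead of /etc/systemd/system.

```shell
#!/bin/sh
# write_pgdata_override UNIT PGDATA [UNIT_DIR]   (hypothetical helper)
# Writes UNIT_DIR/UNIT.d/override.conf setting PGDATA for the service.
# UNIT_DIR defaults to /etc/systemd/system; pass another directory to try it safely.
write_pgdata_override() {
    unit=$1
    pgdata=$2
    unit_dir=${3:-/etc/systemd/system}
    dropin_dir="$unit_dir/$unit.d"
    mkdir -p "$dropin_dir"
    printf '[Service]\nEnvironment=PGDATA=%s\n' "$pgdata" > "$dropin_dir/override.conf"
    # Print the path of the file that was written.
    echo "$dropin_dir/override.conf"
}
```

After writing the real file as root, run systemctl daemon-reload and restart the service so the override is picked up.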
try this:
## Login with postgres user
su - postgres
export PGDATA=/your_path/data
pg_ctl -D $PGDATA start &
I think the most "CentOS 7 way" to do it is to copy the service file:
sudo cp /usr/lib/systemd/system/postgresql-9.6.service /etc/systemd/system/postgresql-9.6.service
Then edit the file /etc/systemd/system/postgresql-9.6.service:
# Location of database directory
Environment=PGDATA=/mnt/volume/var/lib/pgsql/9.6/data/
Then run sudo systemctl daemon-reload so systemd picks up the new file, start the service with sudo systemctl start postgresql-9.6, and verify:
# sudo ps -ax | grep postmaster
32100 ? Ss 0:00 /usr/pgsql-9.6/bin/postmaster -D /mnt/volume/var/lib/pgsql/9.6/data/
Try to edit file /etc/init.d/postgresql-9.3:
PGDATA=/your/custom/path

Cassandra won't start in linux as a service

I have a Debian Linux image running on Google Compute. I can successfully get Cassandra working with "sudo cassandra" or "sudo cassandra -f", but as soon as I log off it stops working. When I try to run it as a service, it simply doesn't say anything and doesn't start either! I installed it using the apt-get package v2.1.
I've tried sudo service cassandra start. It looks like it's doing something and then quits without any logs.
Please help me run this as a service. I can't even locate where the logs are stored when I run it as a service.
I ran into this issue recently, and as BrianC indicated it can be an out of memory condition. In my case I could successfully start cassandra with sudo cassandra -f but not with /etc/init.d/cassandra start.
For me, the last log entry in /var/log/cassandra/system.log when starting as a service was:
INFO [main] 2015-04-30 10:58:16,234 CassandraDaemon.java (line 248) Classpath: /etc/cassandra:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.0.14.jar:/usr/share/cassandra/apache-cassandra-thrift-2.0.14.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar::/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
And nothing afterwards. If it is a memory problem, you should be able to verify this in your syslog. If it contains something like:
Apr 30 10:53:39 dev kernel: [1173246.957818] Out of memory: Kill process 8229 (java) score 132 or sacrifice child
Apr 30 10:53:39 dev kernel: [1173246.957831] Killed process 8229 (java) total-vm:634084kB, anon-rss:286772kB, file-rss:12676kB
then increase your RAM. In my case I increased it to 2GB and it started fine.
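To look for those kernel messages without scrolling through the whole log, something like the following sketch could be used. find_oom_kills is a hypothetical helper, and the log location is an assumption: on Debian the kernel lines usually end up in /var/log/syslog, but that depends on the logging setup.

```shell
#!/bin/sh
# find_oom_kills LOGFILE   (hypothetical helper)
# Prints the lines that look like OOM-killer activity, matching the
# message wording shown in the log excerpt above.
find_oom_kills() {
    grep -E 'Out of memory: Kill process|Killed process [0-9]+' "$1"
}
```

For example, find_oom_kills /var/log/syslog prints any matching lines and returns a non-zero status if there are none.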
