Why is my Postgres database working for a while and then not able to "start server" once restarted?

Recently, I've started playing around with an old Raspberry Pi 3 B+, and I thought it would be good practice to host a Postgres database on my local network and use it for whatever I want to work through. I understand that running Postgres on a Raspberry Pi with 1GB of memory is not ideal and can take a toll on the SD card, but I've updated the postgresql.conf file so that the data directory lives on a 1TB SSD. Additionally, I've installed zram and log2ram to try to curb some of the wear on the SD card.
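For reference, the only configuration change I mean here is the data_directory line in postgresql.conf; the mount point below is just a placeholder for wherever the SSD is mounted:

# /etc/postgresql/12/main/postgresql.conf
data_directory = '/mnt/ssd/postgresql/12/main'    # placeholder SSD path; change requires restart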
Overview of tech I'm working with:
Raspberry Pi 3 B+
Postgres 12
Ubuntu server 20.04 (no gui, only working from terminal)
1TB SSD
Yesterday, I was writing to the Postgres db from a Python notebook without any issue, but once I restarted the Raspberry Pi I was unable to reach the db from DataGrip, and I received the following error from my terminal in Ubuntu:
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I checked the status of the Postgres server and that seemed to be alright:
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Thu 2021-01-28 13:34:41 UTC; 20min ago
Process: 1895 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 1895 (code=exited, status=0/SUCCESS)
Jan 28 13:34:41 ubuntu systemd[1]: Starting PostgreSQL RDBMS...
Jan 28 13:34:41 ubuntu systemd[1]: Finished PostgreSQL RDBMS.
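In hindsight, postgresql.service is only a wrapper unit whose ExecStart is /bin/true, so "active (exited)" doesn't actually prove the cluster came up. The per-cluster unit and pg_lsclusters give a truer picture:

sudo systemctl status postgresql@12-main
pg_lsclusters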
This is what is provided in the postgresql-12-main.log:
2021-01-28 13:17:23.344 UTC [1889] LOG: starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-01-28 13:17:23.362 UTC [1889] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-01-28 13:17:23.362 UTC [1889] LOG: listening on IPv6 address "::", port 5432
2021-01-28 13:17:23.365 UTC [1889] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-01-28 13:17:23.664 UTC [1899] LOG: database system was shut down at 2021-01-28 01:43:38 UTC
2021-01-28 13:17:24.619 UTC [1899] LOG: could not link file "pg_wal/xlogtemp.1899" to "pg_wal/000000010000000000000002": Operation not permitted
2021-01-28 13:17:24.670 UTC [1899] FATAL: could not open file "pg_wal/000000010000000000000002": No such file or directory
2021-01-28 13:17:24.685 UTC [1889] LOG: startup process (PID 1899) exited with exit code 1
2021-01-28 13:17:24.686 UTC [1889] LOG: aborting startup due to startup process failure
2021-01-28 13:17:24.708 UTC [1889] LOG: database system is shut down
pg_ctl: could not start server
Examine the log output.
Please let me know if you have any questions or if you would like me to include any additional information. I appreciate ahead of time any pointers you may have.

This is what the /etc/init.d/postgresql file looks like:
#!/bin/sh
set -e
### BEGIN INIT INFO
# Provides: postgresql
# Required-Start: $local_fs $remote_fs $network $time
# Required-Stop: $local_fs $remote_fs $network $time
# Should-Start: $syslog
# Should-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: PostgreSQL RDBMS server
### END INIT INFO
# Setting environment variables for the postmaster here does not work; please
# set them in /etc/postgresql/<version>/<cluster>/environment instead.
[ -r /usr/share/postgresql-common/init.d-functions ] || exit 0
. /usr/share/postgresql-common/init.d-functions
# versions can be specified explicitly
if [ -n "$2" ]; then
    versions="$2 $3 $4 $5 $6 $7 $8 $9"
else
    get_versions
fi
case "$1" in
    start|stop|restart|reload)
        if [ "$1" = "start" ]; then
            create_socket_directory
        fi
        if [ -z "`pg_lsclusters -h`" ]; then
            log_warning_msg 'No PostgreSQL clusters exist; see "man pg_createcluster"'
            exit 0
        fi
        for v in $versions; do
            $1 $v || EXIT=$?
        done
        exit ${EXIT:-0}
        ;;
    status)
        LS=`pg_lsclusters -h`
        # no clusters -> unknown status
        [ -n "$LS" ] || exit 4
        echo "$LS" | awk 'BEGIN {rc=0} {if (match($4, "down")) rc=3; printf ("%s/%s (port %s): %s\n", $1, $2, $3, $4)}; END {exit rc}'
        ;;
    force-reload)
        for v in $versions; do
            reload $v
        done
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|reload|force-reload|status} [version ..]"
        exit 1
        ;;
esac
exit 0

config file (partly):
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.
#data_directory = 'ConfigDir'           # use data in another directory
                                        # (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
                                        # (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
                                        # (change requires restart)
# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = ''                 # write an extra PID file
                                        # (change requires restart)
/etc/init.d/postgresql (partly):
NOTE: this is from a non-standard installation. YMMV
# Data directory
#PGDATA="/data/db/postgres"
#PGDATA="/data/db/postgres/pgdata"
#PGDATA="/data/db/postgres-12/pgdata"
PGDATA="/data/db/postgres-11/pgdata"
(when upgrading, I tend to keep the commented-out older setting for reference)
Note: the config file is not edited; every path refers to ConfigDir (the default).
Additionally, for Postgres on a Pi, I set:
random_page_cost = 1.1
shared_buffers = 128MB
#work_mem = 4MB # keep the low default
effective_cache_size = 3GB # This is for a RaspberryPi-4
# for a Pi-3, I'd use ~700M
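These settings only take effect after the cluster is restarted (shared_buffers in particular requires a restart); on a stock Debian/Ubuntu install that would be something like:

sudo systemctl restart postgresql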

Okay, I think I've figured it out. Might be overkill but it works:
First thing I did was format and mount my 1TB SSD. Here is a good video walkthrough for formatting to ext4 and mounting. The only difference from the video is that I updated the fstab file so the SSD is checked during bootup, i.e. "0 2" at the end of the SSD mount options instead of "0 0".
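For illustration, the resulting fstab entry looked roughly like this (the UUID and mount point are placeholders for your drive):

# /etc/fstab
UUID=xxxx-xxxx  /mnt/ssd  ext4  defaults  0  2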
Secondly, I installed Postgres. Here is a good walkthrough for that. The directions provided in that blog were more than I needed, but a good walkthrough nonetheless. I simply installed Postgres with:
sudo apt install postgresql postgresql-contrib
Third, I followed this walkthrough through the end of step two, but before beginning step two I added a symbolic link from /var/lib/postgresql/12/main to /YOUR/MOUNT/POSITION/postgresql/12/main by executing:
ln -s /YOUR/MOUNT/POSITION/postgresql/12/main /var/lib/postgresql/12/main
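Note that ln -s takes the target first and the link name second, so the existing /var/lib path ends up as the symlink. As a rough sketch of the whole move-and-link step (assuming a fresh cluster and an SSD mounted at /YOUR/MOUNT/POSITION; adapt paths and double-check ownership on your setup):

# stop the cluster before touching the data directory
sudo systemctl stop postgresql

# copy the existing data directory onto the SSD, preserving permissions
sudo rsync -a /var/lib/postgresql/12/main/ /YOUR/MOUNT/POSITION/postgresql/12/main/

# move the original aside and point the default location at the SSD copy
sudo mv /var/lib/postgresql/12/main /var/lib/postgresql/12/main.bak
sudo ln -s /YOUR/MOUNT/POSITION/postgresql/12/main /var/lib/postgresql/12/main

# the data directory must remain owned by the postgres user
sudo chown -R postgres:postgres /YOUR/MOUNT/POSITION/postgresql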
Lastly, before restarting the postgres server, I used this website to help me better configure my server. Enter your specs and it should give you some useful configuration settings.
If I remember anything I've left out I'll try and come back and edit this post. Otherwise comment if anything doesn't make sense or is unclear.

Related

Issue with watchdog ping (watchdog: error opening socket (Operation not permitted))

I have an issue with my Pi 4 where the networking seems to crash at some point. The Pi still runs, but the network is unreachable. I tried setting the ping command in my watchdog.conf, but I get the error watchdog: error opening socket (Operation not permitted).
Hardware: Pi 4 8GB
OS: Raspberry Pi OS Lite (64-bit)
Watchdog version: 5.16
My watchdog.conf:
# ====================================================================
# Configuration for the watchdog daemon. For more information on the
# parameters in this file use the command 'man watchdog.conf'
# ====================================================================
# =================== The hardware timer settings ====================
#
# For this daemon to be effective it really needs some hardware timer
# to back up any reboot actions. If you have a server then see if it
# has IPMI support. Otherwise for Intel-based machines try the iTCO_wdt
# module, otherwise (or if that fails) then see if any of the following
# module load and work:
#
# it87_wdt it8712f_wdt w83627hf_wdt w83877f_wdt w83977f_wdt
#
# If all else fails then 'softdog' is better than no timer at all!
# Or work your way through the modules listed under:
#
# /lib/modules/`uname -r`/kernel/drivers/watchdog/
#
# To see if they load, present /dev/watchdog, and are capable of
# resetting the system on time-out.
# Uncomment this to use the watchdog device driver access "file".
#verbose=yes
watchdog-device = /dev/watchdog
# Uncomment and edit this line for hardware timeout values that differ
# from the default of one minute.
watchdog-timeout = 15
# If your watchdog trips by itself when the first timeout interval
# elapses then try uncommenting the line below and changing the
# value to 'yes'.
#watchdog-refresh-use-settimeout = auto
# If you have a buggy watchdog device (e.g. some IPMI implementations)
# try uncommenting this line and setting it to 'yes'.
#watchdog-refresh-ignore-errors = no
# ====================== Other system settings ========================
#
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.
#interval = 1
# The number of intervals skipped before a log message is written (i.e.
# a multiplier for 'interval' in terms of syslog messages)
#logtick = 1
# Directory for log files (probably best not to change this)
log-dir = /var/log/watchdog
# Email address for sending the reboot reason. This needs sendmail to
# be installed and properly configured. Maybe you should just enable
# syslog forwarding instead?
#admin = root
# Lock the daemon in to memory as a real-time process. This greatly
# decreases the chance that watchdog won't be scheduled before your
# machine is really loaded.
realtime = yes
priority = 1
# ====================== How to handle errors =======================
#
# If you have a custom binary/script to handle errors then uncomment
# this line and provide the path. For 'v1' test binary files they also
# handle error cases.
#repair-binary = /usr/sbin/repair
#repair-timeout = 60
# The retry-timeout and repair limit are used to handle errors in a
# more robust manner. Errors must persist for longer than this to
# action a repair or reboot, and if repair-maximum attempts are
# made without the test passing a reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
# Configure the delay on reboot from sending SIGTERM to all processes
# and to following up with SIGKILL for any that are ignoring the polite
# request to stop.
#sigterm-delay = 5
# ====================== User-specified tests ========================
#
# Specify the directory for auto-added 'v1' test programs (any executable
# found in the 'test-directory should be listed).
#test-directory = /etc/watchdog.d
# Specify any v0 custom tests here. Multiple lines are permitted, but
# having any 'v1' programs/scripts discovered in the 'test-directory' is
# the better way.
#test-binary =
# Specify the time-out value for a test error to be reported.
#test-timeout = 60
# ====================== Typical tests ===============================
#
# Specify any IPv4 numeric addresses to be probed.
# NOTE: You should check you have permission to ping any machine before
# using it as a test. Also remember if the target goes down then this
# machine will reboot as a result!
#ping = 192.168.1.1
# Set the number of ping attempts in each 'interval' of time. Default
# is 3 and it completes on the first successful ping.
# NOTE: Round-trip delay has to be less than 'interval' / 'ping-count'
# for test success, but this is unlikely to be exceeded except possibly
# on satellite links (very unlikely case!).
# Specify any network interface to be checked for activity.
interface = eth0
# Specify any files to be checked for presence, and if desired, checked
# that they have been updated more recently than 'change' seconds.
#file = /var/log/syslog
#change = 1407
# Uncomment to enable load average tests for 1, 5 and 15 minute
# averages. Setting one of these values to '0' disables it. These
# values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher
# than 25 in most cases).
max-load-1 = 24
#max-load-5 = 18
#max-load-15 = 12
# Check available memory on the machine.
#
# The min-memory check is a passive test from reading the file
# /proc/meminfo and computed from MemFree + Buffers + Cached
# If this is below a few tens of MB you are likely to have problems.
#
# The allocatable-memory is an active test checking it can be paged
# in to use.
#
# Maximum swap should be based on normal use, probably a large part of
# available swap but paging 1GB of swap can take tens of seconds.
#
# NOTE: This is the number of pages, to get the real size, check how
# large the pagesize is on your machine (typically 4kB for x86 hardware).
#min-memory = 1
#allocatable-memory = 1
#max-swap = 0
# Check for over-temperature. Typically the temperature-sensor is a
# 'virtual file' under /sys and it contains the temperature in
# milli-Celsius. Usually these are generated by the 'sensors' package,
# but take care as device enumeration may not be fixed.
#temperature-sensor =
#max-temperature = 90
# Check for a running process/daemon by its PID file. For example,
# check if rsyslogd is still running by enabling the following line:
#pidfile = /var/run/rsyslogd.pid
This runs fine checking the status of the service:
pi@raspberrypi:~ $ sudo service watchdog status
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-04-12 08:01:53 BST; 2min 17s ago
Process: 2120 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
Process: 2121 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
Main PID: 2126 (watchdog)
Tasks: 1 (limit: 8986)
CPU: 59ms
CGroup: /system.slice/watchdog.service
└─2126 /usr/sbin/watchdog
Apr 12 08:01:53 raspberrypi watchdog[2126]: interface: eth0
Apr 12 08:01:53 raspberrypi watchdog[2126]: temperature: no sensors to check
Apr 12 08:01:53 raspberrypi watchdog[2126]: no test binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: no repair binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: error retry time-out = 60 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: repair attempts = 1
Apr 12 08:01:53 raspberrypi watchdog[2126]: alive=/dev/watchdog heartbeat=[none] to=root no_act=no force=no
Apr 12 08:01:53 raspberrypi watchdog[2126]: watchdog now set to 15 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
Apr 12 08:01:53 raspberrypi systemd[1]: Started watchdog daemon.
However, when I try to enable the ping command in the conf file (uncommenting ping = 192.168.1.1), I get the following error when running watchdog -v:
watchdog -v
watchdog: String 'watchdog-device' found as '/dev/watchdog'
watchdog: Integer 'watchdog-timeout' found = 15
watchdog: String 'log-dir' found as '/var/log/watchdog'
watchdog: Variable 'realtime' found as 'yes' = 1
watchdog: Integer 'priority' found = 1
watchdog: List 'ping' added as '192.168.1.1'
watchdog: List 'interface' added as 'eth0'
watchdog: Integer 'max-load-1' found = 24
watchdog: error opening socket (Operation not permitted)
This seems to indicate that it's not permitted to do the ping test.
I googled the issue and found nothing quite like this anywhere, but I did try the solutions in these articles, none of which worked:
https://discuss.linuxcontainers.org/t/even-with-root-user-im-receiving-operation-not-permitted-when-try-creating-gluster-volume-between-ubuntu-14-04-lxc-containers/2699
https://superuser.com/questions/288521/problem-with-ping-open-socket-operation-not-permitted
Anyone have any ideas?
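One thing worth ruling out, purely as an assumption on my part: the ping test opens a raw ICMP socket, which on Linux requires root or the CAP_NET_RAW capability, so it may matter whether watchdog -v was run as root.

# re-run the verbose test with root privileges
sudo /usr/sbin/watchdog -v

# check whether the binary carries the raw-socket capability (no output means it does not)
getcap /usr/sbin/watchdog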

Dnsmasq fails to start every time I reboot my VPS

Dnsmasq fails to start every time I reboot my VPS. Below is the log about it.
Aug 22 18:14:51 debian dnsmasq[776]: dnsmasq: syntax check OK.
Aug 22 18:14:51 debian dnsmasq[798]: chown: invalid user: ‘dnsmasq:nogroup’
Aug 22 18:14:51 debian systemd[1]: dnsmasq.service: Control process exited, code=exited status=2
Aug 22 18:14:51 debian systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.
Aug 22 18:14:51 debian systemd[1]: dnsmasq.service: Unit entered failed state.
Aug 22 18:14:51 debian systemd[1]: dnsmasq.service: Failed with result 'exit-code'.
I'd set the user and group name correctly in the /etc/dnsmasq.conf file. I tried reinstalling dnsmasq and then it worked, but after I rebooted the VPS it failed to start again, so I had to reinstall again...
So if I want to use dnsmasq, I have to avoid rebooting my VPS. But I want to know what causes this and how to fix it once and for all. My VPS runs Debian 9 with kernel 4.9.0-7-amd64.
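For anyone hitting the same message, a quick way to check whether that account actually exists on the system is:

getent passwd dnsmasq
getent group nogroup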
Finally, I've found the trouble and fixed it. As the system reported, chown: invalid user: ‘dnsmasq:nogroup’ is the offending line. I believe dnsmasq failed to automatically add the user or group, and that is what caused the error.
One solution might be manually adding a user named "dnsmasq". I didn't try it, but I think it would work (see the sketch below). The other approach is to look up the user "dnsmasq" in the relevant files and replace it with an existing user like "nobody".
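A minimal sketch of that first (untried) option, assuming the standard Debian adduser tool:

# create a system account named "dnsmasq" (system accounts default to group nogroup and no login shell)
sudo adduser --system --no-create-home dnsmasq
sudo systemctl restart dnsmasq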
There are three places in the /etc/init.d/dnsmasq file that need replacing.
if [ ! "$DNSMASQ_USER" ]; then
    DNSMASQ_USER="*dnsmasq*"
fi
# /run may be volatile, so we need to ensure that
# /run/dnsmasq exists here as well as in postinst
if [ ! -d /run/dnsmasq ]; then
    mkdir /run/dnsmasq || return 2
    chown *dnsmasq*:nogroup /run/dnsmasq || return 2
fi
# /run may be volatile, so we need to ensure that
# /run/dnsmasq exists here as well as in postinst
if [ ! -d /run/dnsmasq ]; then
    mkdir /run/dnsmasq || return 2
    chown *dnsmasq*:nogroup /run/dnsmasq || return 2
fi
Replace the three emphasized (asterisked) places above.
Although I'm still not sure why dnsmasq failed to access the account files in the first place.

Debian init.d script fails to sleep

This problem occurs on a Pogoplug E02 running Debian jessie.
At startup the network interface takes several seconds to come online. A short delay is required after the "networking" script completes to ensure that ensuing network operations occur properly.
I wrote the following script and inserted it using update-rc.d (as sketched below). The script inserted correctly and executes at boot time in the proper sequence: after networking and before the network-dependent scripts, which were modified to depend on netdelay.
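For reference, the insertion step would have been something along these lines (the exact arguments aren't shown above):

sudo chmod +x /etc/init.d/netdelay
sudo update-rc.d netdelay defaults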
cat /etc/init.d/netdelay
#! /bin/sh
### BEGIN INIT INFO
# Provides: netdelay
# Required-Start: networking
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Delay 5s after eth0 up for Pogoplug
# Description:
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
. /lib/init/vars.sh
. /lib/lsb/init-functions
log_action_msg "Pausing for eth0 to come online"
/bin/sleep 5
log_action_msg "Continuing"
exit 0
When the script executes at startup there is no delay. I've used both sleep and /bin/sleep in the script, but neither effects the desired delay. The boot log showing this is attached below.
Thu Jan 1 00:00:25 1970: Configuring network interfaces...done.
Thu Jan 1 00:00:25 1970: INIT: Entering runlevel: 2
Thu Jan 1 00:00:25 1970: Using makefile-style concurrent boot in runlevel 2.
Thu Jan 1 00:00:26 1970: Starting SASL Authentication Daemon: saslauthd.
Thu Jan 1 00:00:29 1970: Pausing for eth0 to come online.
Thu Jan 1 00:00:30 1970: Continuing.
Thu Jan 1 00:00:33 1970: ntpdate updating system time.
Wed Feb 1 05:33:40 2017: Starting enhanced syslogd: rsyslogd.
(The Pogoplug has no hardware clock and has no idea what time it is until ntpdate has run.)
Can someone see where the problem might be?

How to use systemctl to restart a service

I created a system service which can be started/stopped/restarted with the service command. But when I try to do it with systemctl I always get the error below. I don't know what's wrong with my script file. Why can it be managed by service but not by systemctl?
# systemctl restart cooltoo_storage
Job for cooltoo_storage.service failed because the control process exited with error code. See "systemctl status cooltoo_storage.service" and "journalctl -xe" for details.
Below is the output of "systemctl status cooltoo_storage.service"
# systemctl status cooltoo_storage.service
● cooltoo_storage.service - LSB: cooltoo storage provider
Loaded: loaded (/etc/rc.d/init.d/cooltoo_storage)
Active: failed (Result: exit-code) since Tue 2016-05-03 17:17:12 CST; 1min 29s ago
Docs: man:systemd-sysv-generator(8)
Process: 32268 ExecStart=/etc/rc.d/init.d/cooltoo_storage start (code=exited, status=203/EXEC)
May 03 17:17:12 Cool-Too systemd[1]: Starting LSB: cooltoo storage provider...
May 03 17:17:12 Cool-Too systemd[1]: cooltoo_storage.service: control process exited, code=exited status=203
May 03 17:17:12 Cool-Too systemd[1]: Failed to start LSB: cooltoo storage provider.
May 03 17:17:12 Cool-Too systemd[1]: Unit cooltoo_storage.service entered failed state.
May 03 17:17:12 Cool-Too systemd[1]: cooltoo_storage.service failed.
Below is my script file /etc/init.d/cooltoo_storage:
### BEGIN INIT INFO
# Provides: cooltoo storage provider
# Required-Start: $local_fs $remote_fs $network $time $named
# Should-Start: $time sendmail
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: cooltoo storage provider
# Description: cooltoo storage provider
# chkconfig: 2345 90 90
### END INIT INFO
start() {
    nginx -c /data/nginx/nginx.config
}
stop() {
    nginx -c /data/nginx/nginx.config -s stop
}
restart() {
    nginx -c /data/nginx/nginx.config -s reload
}
status() {
    echo ''
}
action="$#"
if [[ "$MODE" == "auto" && -n "$init_script" ]] || [[ "$MODE" == "service" ]]; then
    action="$1"
    shift
fi
case "$action" in
    start)
        start ; exit $?;;
    stop)
        stop ; exit $?;;
    restart)
        restart ; exit $?;;
    status)
        status ; exit $?;;
    *)
        echo "Usage: $0 {start|stop|restart|force-reload|status|run}"; exit 1;
esac

Can't find documents with MediaWiki 1.21 with the Lucene-search extension

We're running MediaWiki 1.21 on Ubuntu 12.04.3 with the Lucene-search extension 2.1.3 (from its build.properties file).
I followed the instructions for a Single Host Setup (using ant to build the jar), and Setting Up Suggestions for the Search Box. Things seemed to be working just fine. However, new documents aren't being matched by the type-ahead search feature. Looking at the filesystem, I see that there are various items in the application's indexes directory:
$ cd /usr/local/search/lucene-search-2/indexes
$ ls -l
total 24
drwxr-xr-x 10 root root 4096 Aug 20 2013 import
drwxr-xr-x 7 root root 4096 Apr 14 06:42 index
drwxr-xr-x 2 root root 4096 Apr 14 06:41 search
drwxr-xr-x 9 root root 4096 Aug 20 2013 snapshot
drwxr-xr-x 2 root root 4096 Aug 20 2013 status
drwxr-xr-x 8 root root 4096 Aug 20 2013 update
We have a daily cron job that runs the Lucene-search build command, which dumps the wiki database as xml, and then modifies files in the import and snapshot folders. I noticed that the job reads from the search folder, which contains symbolic links to the update folder:
$ ls -l search/
total 24
lrwxrwxrwx 1 root root 70 Feb 12 21:39 wikidb -> /usr/local/search/lucene-search-2/indexes/update/wikidb/20140212064727
lrwxrwxrwx 1 root root 73 Feb 12 21:39 wikidb.hl -> /usr/local/search/lucene-search-2/indexes/update/wikidb.hl/20140212064727
lrwxrwxrwx 1 root root 76 Apr 14 06:41 wikidb.links -> /usr/local/search/lucene-search-2/indexes/update/wikidb.links/20140414064150
lrwxrwxrwx 1 root root 77 Feb 12 21:39 wikidb.prefix -> /usr/local/search/lucene-search-2/indexes/update/wikidb.prefix/20140212064728
lrwxrwxrwx 1 root root 78 Feb 12 21:39 wikidb.related -> /usr/local/search/lucene-search-2/indexes/update/wikidb.related/20140212064713
lrwxrwxrwx 1 root root 76 Feb 12 21:39 wikidb.spell -> /usr/local/search/lucene-search-2/indexes/update/wikidb.spell/20140212064740
Only the wikidb.links entry is current. The others are a couple of months old, which makes me think I missed something in how our daily cron task is set up. Here's the job:
#!/bin/sh
log=/var/log/lucene-search-2-cron.log
(
echo "Building wiki lucene-search indexes ..."
cd /usr/local/search/lucene-search-2
./build
echo "Stopping the lsearchd service..."
service lsearchd stop
# ok, so stopping the service apparently doesn't mean that the processes are gone, whack them manually
# See tip on using the "[x]yz" character class option so you don't need the additional "grep -v xyz":
# http://stackoverflow.com/questions/3510673/find-and-kill-a-process-in-one-line-using-bash-and-regex
echo "Killing any lucene-search processes that didn't terminate..."
kill -9 $(ps -ef | grep '[l]search' | awk '{print $2}')
echo "Starting the lsearchd service..."
service lsearchd start
) > $log 2>&1
And here's the service script /etc/init.d/lsearchd:
#!/bin/sh -e
### BEGIN INIT INFO
# Provides: lsearchd
# Required-Start: $syslog
# Required-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 1
# Short-Description: Start the Lucene Search daemon
# Description: Provide a Lucene Search backend for MediaWiki. Copied by John Ericson from: http://ubuntuforums.org/showthread.php?t=1476445
### END INIT INFO
# Set to install directory of lucene-search. For example: /usr/local/lucene-search-2.1.3
LUCENE_SEARCH_DIR="/usr/local/search/lucene-search-2"
# Set username for daemon to run as. Can also use syntax "username:groupname" to also specify group for daemon to run as. For example: me:me
RUN_AS_USER="lsearchd"
OPTIONS="-configfile $LUCENE_SEARCH_DIR/lsearch.conf"
test -x $LUCENE_SEARCH_DIR/lsearchd || exit 0
test -n "$RUN_AS_USER" && CHUID_ARG="--chuid $RUN_AS_USER" || CHUID_ARG=""
if [ -f "/etc/default/lsearchd" ] ; then
. /etc/default/lsearchd
fi
. /lib/lsb/init-functions
case "$1" in
    start)
        cd $LUCENE_SEARCH_DIR
        log_begin_msg "Starting Lucene Search Daemon..."
        start-stop-daemon --start --quiet --oknodo --chdir $LUCENE_SEARCH_DIR --background $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd -- $OPTIONS
        log_end_msg $?
        ;;
    stop)
        log_begin_msg "Stopping Lucene Search Daemon..."
        start-stop-daemon --stop --quiet --oknodo --retry 2 --chdir $LUCENE_SEARCH_DIR $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd
        log_end_msg $?
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    reload|force-reload)
        log_begin_msg "Reloading Lucene Search Daemon..."
        start-stop-daemon --stop --signal 1 --chdir $LUCENE_SEARCH_DIR $CHUID_ARG --exec $LUCENE_SEARCH_DIR/lsearchd
        log_end_msg $?
        ;;
    status)
        status_of_proc $LUCENE_SEARCH_DIR/lsearchd lsearchd && exit 0 || exit $?
        ;;
    *)
        log_success_msg "Usage: /etc/init.d/lsearchd {start|stop|restart|reload|force-reload|status}"
        exit 1
esac
exit 0
Update #1:
I deleted the update directory and ran the build command manually from the console as root. As expected, it only generated the update/wikidb.links entry; none of the other folders exist. I reviewed my earlier setup notes and don't see anything different, so how did those folders get created, and how do they get maintained?
Update #2:
I retraced my steps from the initial install and couldn't see anything I'd missed. So, on a chance, I stopped the service and ran lsearchd from the console, and it created the missing directories! I then terminated the process and tried things again: I deleted the indexes folder and ran the cron script from the console as root. I confirmed that, when run this way, lsearchd DID NOT create the missing directories. And of course, now I remember that I had run lsearchd from the console when initially setting things up, to verify that it was receiving client queries from the wiki's Search input field. Those are the indexes it had been using for lookups ever since, which explains why new documents are not included.
Here is what the command looks like when run as a service:
$ ps -ef | grep [l]search
lsearchd 10192 1 0 14:02 ? 00:00:00 /bin/bash /usr/local/search/lucene-search-2/lsearchd -configfile /usr/local/search/lucene-search-2/lsearch.conf
lsearchd 10198 10192 0 14:02 ? 00:00:01 java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2/LuceneSearch.jar -Djava.rmi.server.hostname=AMWikiBugz -jar /usr/local/search/lucene-search-2/LuceneSearch.jar -configfile /usr/local/search/lucene-search-2/lsearch.conf
So the remaining question is:
Why does lsearchd NOT create the directories when run as a service?
This was a permissions issue. d'oh!
The cron job and service init scripts all execute as root; however, the service process runs as the lsearchd user. Once I changed ownership of /usr/local/search/lucene-search-2/indexes/ and all subdirectories to lsearchd:lsearchd, the lsearchd process was able to create the missing directories when run via the service under cron.
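For reference, the ownership change boils down to:

sudo chown -R lsearchd:lsearchd /usr/local/search/lucene-search-2/indexes
sudo service lsearchd restart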
It would have helped if something along the way had logged an error message to syslog indicating that it couldn't write to the target folder.
