Issue with watchdog ping (watchdog: error opening socket (Operation not permitted)) - 64-bit

I have an issue with my Pi 4 where networking appears to crash at some point: the Pi keeps running, but the network is unreachable. I tried enabling the ping test in my watchdog.conf, but I get the error "watchdog: error opening socket (Operation not permitted)".
Hardware: Pi 4 8GB
OS: Raspberry Pi OS Lite (64-bit)
Watchdog version: 5.16
My watchdog.conf:
# ====================================================================
# Configuration for the watchdog daemon. For more information on the
# parameters in this file use the command 'man watchdog.conf'
# ====================================================================
# =================== The hardware timer settings ====================
#
# For this daemon to be effective it really needs some hardware timer
# to back up any reboot actions. If you have a server then see if it
# has IPMI support. Otherwise for Intel-based machines try the iTCO_wdt
# module, otherwise (or if that fails) then see if any of the following
# module load and work:
#
# it87_wdt it8712f_wdt w83627hf_wdt w83877f_wdt w83977f_wdt
#
# If all else fails then 'softdog' is better than no timer at all!
# Or work your way through the modules listed under:
#
# /lib/modules/`uname -r`/kernel/drivers/watchdog/
#
# To see if they load, present /dev/watchdog, and are capable of
# resetting the system on time-out.
# Uncomment this to use the watchdog device driver access "file".
#verbose=yes
watchdog-device = /dev/watchdog
# Uncomment and edit this line for hardware timeout values that differ
# from the default of one minute.
watchdog-timeout = 15
# If your watchdog trips by itself when the first timeout interval
# elapses then try uncommenting the line below and changing the
# value to 'yes'.
#watchdog-refresh-use-settimeout = auto
# If you have a buggy watchdog device (e.g. some IPMI implementations)
# try uncommenting this line and setting it to 'yes'.
#watchdog-refresh-ignore-errors = no
# ====================== Other system settings ========================
#
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.
#interval = 1
# The number of intervals skipped before a log message is written (i.e.
# a multiplier for 'interval' in terms of syslog messages)
#logtick = 1
# Directory for log files (probably best not to change this)
log-dir = /var/log/watchdog
# Email address for sending the reboot reason. This needs sendmail to
# be installed and properly configured. Maybe you should just enable
# syslog forwarding instead?
#admin = root
# Lock the daemon in to memory as a real-time process. This greatly
# decreases the chance that watchdog won't be scheduled before your
# machine is really loaded.
realtime = yes
priority = 1
# ====================== How to handle errors =======================
#
# If you have a custom binary/script to handle errors then uncomment
# this line and provide the path. For 'v1' test binary files they also
# handle error cases.
#repair-binary = /usr/sbin/repair
#repair-timeout = 60
# The retry-timeout and repair limit are used to handle errors in a
# more robust manner. Errors must persist for longer than this to
# action a repair or reboot, and if repair-maximum attempts are
# made without the test passing a reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
# Configure the delay on reboot from sending SIGTERM to all processes
# and to following up with SIGKILL for any that are ignoring the polite
# request to stop.
#sigterm-delay = 5
# ====================== User-specified tests ========================
#
# Specify the directory for auto-added 'v1' test programs (any executable
# found in the 'test-directory should be listed).
#test-directory = /etc/watchdog.d
# Specify any v0 custom tests here. Multiple lines are permitted, but
# having any 'v1' programs/scripts discovered in the 'test-directory' is
# the better way.
#test-binary =
# Specify the time-out value for a test error to be reported.
#test-timeout = 60
# ====================== Typical tests ===============================
#
# Specify any IPv4 numeric addresses to be probed.
# NOTE: You should check you have permission to ping any machine before
# using it as a test. Also remember if the target goes down then this
# machine will reboot as a result!
#ping = 192.168.1.1
# Set the number of ping attempts in each 'interval' of time. Default
# is 3 and it completes on the first successful ping.
# NOTE: Round-trip delay has to be less than 'interval' / 'ping-count'
# for test success, but this is unlikely to be exceeded except possibly
# on satellite links (very unlikely case!).
# Specify any network interface to be checked for activity.
interface = eth0
# Specify any files to be checked for presence, and if desired, checked
# that they have been updated more recently than 'change' seconds.
#file = /var/log/syslog
#change = 1407
# Uncomment to enable load average tests for 1, 5 and 15 minute
# averages. Setting one of these values to '0' disables it. These
# values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher
# than 25 in most cases).
max-load-1 = 24
#max-load-5 = 18
#max-load-15 = 12
# Check available memory on the machine.
#
# The min-memory check is a passive test from reading the file
# /proc/meminfo and computed from MemFree + Buffers + Cached
# If this is below a few tens of MB you are likely to have problems.
#
# The allocatable-memory is an active test checking it can be paged
# in to use.
#
# Maximum swap should be based on normal use, probably a large part of
# available swap but paging 1GB of swap can take tens of seconds.
#
# NOTE: This is the number of pages, to get the real size, check how
# large the pagesize is on your machine (typically 4kB for x86 hardware).
#min-memory = 1
#allocatable-memory = 1
#max-swap = 0
# Check for over-temperature. Typically the temperature-sensor is a
# 'virtual file' under /sys and it contains the temperature in
# milli-Celsius. Usually these are generated by the 'sensors' package,
# but take care as device enumeration may not be fixed.
#temperature-sensor =
#max-temperature = 90
# Check for a running process/daemon by its PID file. For example,
# check if rsyslogd is still running by enabling the following line:
#pidfile = /var/run/rsyslogd.pid
This runs fine checking the status of the service:
pi@raspberrypi:~ $ sudo service watchdog status
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-04-12 08:01:53 BST; 2min 17s ago
Process: 2120 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
Process: 2121 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
Main PID: 2126 (watchdog)
Tasks: 1 (limit: 8986)
CPU: 59ms
CGroup: /system.slice/watchdog.service
└─2126 /usr/sbin/watchdog
Apr 12 08:01:53 raspberrypi watchdog[2126]: interface: eth0
Apr 12 08:01:53 raspberrypi watchdog[2126]: temperature: no sensors to check
Apr 12 08:01:53 raspberrypi watchdog[2126]: no test binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: no repair binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: error retry time-out = 60 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: repair attempts = 1
Apr 12 08:01:53 raspberrypi watchdog[2126]: alive=/dev/watchdog heartbeat=[none] to=root no_act=no force=no
Apr 12 08:01:53 raspberrypi watchdog[2126]: watchdog now set to 15 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
Apr 12 08:01:53 raspberrypi systemd[1]: Started watchdog daemon.
However, when I enable the ping test in the conf file (uncommenting #ping = 192.168.1.1), I get the following error when running watchdog -v:
watchdog -v
watchdog: String 'watchdog-device' found as '/dev/watchdog'
watchdog: Integer 'watchdog-timeout' found = 15
watchdog: String 'log-dir' found as '/var/log/watchdog'
watchdog: Variable 'realtime' found as 'yes' = 1
watchdog: Integer 'priority' found = 1
watchdog: List 'ping' added as '192.168.1.1'
watchdog: List 'interface' added as 'eth0'
watchdog: Integer 'max-load-1' found = 24
watchdog: error opening socket (Operation not permitted)
This seems to indicate that it's not permitted to do the ping test.
I searched for the issue and found nothing quite like it, but I did try the solutions in these articles; none of them worked:
https://discuss.linuxcontainers.org/t/even-with-root-user-im-receiving-operation-not-permitted-when-try-creating-gluster-volume-between-ubuntu-14-04-lxc-containers/2699
https://superuser.com/questions/288521/problem-with-ping-open-socket-operation-not-permitted
Anyone have any ideas?
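One possible reading, not confirmed anywhere in the post: the ping test needs a raw ICMP socket, which Linux normally grants only to root (or to a process with CAP_NET_RAW), so running watchdog -v from an unprivileged shell could produce this error. A minimal sketch of a pre-flight check follows; raw_socket_hint is a hypothetical helper name, not part of the watchdog package.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the given UID would normally be
# allowed to open a raw socket. Assumption: no extra capabilities are set.
raw_socket_hint() {
    uid="$1"
    if [ "$uid" -eq 0 ]; then
        echo "running as root: raw sockets should be permitted"
    else
        echo "running as uid $uid: retry with sudo"
    fi
}

# Check the current shell before launching the daemon by hand.
raw_socket_hint "$(id -u)"
```

If the hint points at privileges, running sudo watchdog -v would be the next thing to try.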

Related

Why is my Postgres database working for a while and then not able to "start server" once restarted?

Recently, I've started playing around with an old Raspberry Pi 3 b+, and I thought it would be good practice to host a Postgres database on my local network and use it for whatever I want to work through. I understand that running Postgres on a Raspberry Pi with 1GB of memory is not ideal and can take a toll on the SDcard, but I've updated the postgresql.conf file and specified that the data directory path is to utilize a 1TB SSD. Additionally, I've installed zram and log2ram to try and curb some of the overhead on SDcard.
Overview of tech I'm working with:
Raspberry Pi 3 B+
Postgres 12
Ubuntu server 20.04 (no gui, only working from terminal)
1TB SSD
Yesterday, I was writing to the Postgres db from a python notebook without any issue, but once I restarted the Raspberry Pi, I was unable to reach the db from DataGrip and would receive the following error from my terminal in Ubuntu:
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I checked the status of the postgres server and that seemed to be alright...:
● postgresql.service - PostgreSQL RDBMS
Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (exited) since Thu 2021-01-28 13:34:41 UTC; 20min ago
Process: 1895 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 1895 (code=exited, status=0/SUCCESS)
Jan 28 13:34:41 ubuntu systemd[1]: Starting PostgreSQL RDBMS...
Jan 28 13:34:41 ubuntu systemd[1]: Finished PostgreSQL RDBMS.
This is what is provided in the postgresql-12-main.log:
2021-01-28 13:17:23.344 UTC [1889] LOG: starting PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-01-28 13:17:23.362 UTC [1889] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-01-28 13:17:23.362 UTC [1889] LOG: listening on IPv6 address "::", port 5432
2021-01-28 13:17:23.365 UTC [1889] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-01-28 13:17:23.664 UTC [1899] LOG: database system was shut down at 2021-01-28 01:43:38 UTC
2021-01-28 13:17:24.619 UTC [1899] LOG: could not link file "pg_wal/xlogtemp.1899" to "pg_wal/000000010000000000000002": Operation not permitted
2021-01-28 13:17:24.670 UTC [1899] FATAL: could not open file "pg_wal/000000010000000000000002": No such file or directory
2021-01-28 13:17:24.685 UTC [1889] LOG: startup process (PID 1899) exited with exit code 1
2021-01-28 13:17:24.686 UTC [1889] LOG: aborting startup due to startup process failure
2021-01-28 13:17:24.708 UTC [1889] LOG: database system is shut down
pg_ctl: could not start server
Examine the log output.
Please let me know if you have any questions or would like me to include any additional information. Thanks in advance for any pointers.
This is what the /etc/init.d/postgres file looks like:
#!/bin/sh
set -e
### BEGIN INIT INFO
# Provides: postgresql
# Required-Start: $local_fs $remote_fs $network $time
# Required-Stop: $local_fs $remote_fs $network $time
# Should-Start: $syslog
# Should-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: PostgreSQL RDBMS server
### END INIT INFO
# Setting environment variables for the postmaster here does not work; please
# set them in /etc/postgresql/<version>/<cluster>/environment instead.
[ -r /usr/share/postgresql-common/init.d-functions ] || exit 0
. /usr/share/postgresql-common/init.d-functions
# versions can be specified explicitly
if [ -n "$2" ]; then
versions="$2 $3 $4 $5 $6 $7 $8 $9"
else
get_versions
fi
case "$1" in
start|stop|restart|reload)
if [ "$1" = "start" ]; then
create_socket_directory
fi
if [ -z "`pg_lsclusters -h`" ]; then
log_warning_msg 'No PostgreSQL clusters exist; see "man pg_createcluster"'
exit 0
fi
for v in $versions; do
$1 $v || EXIT=$?
done
exit ${EXIT:-0}
;;
status)
LS=`pg_lsclusters -h`
# no clusters -> unknown status
[ -n "$LS" ] || exit 4
echo "$LS" | awk 'BEGIN {rc=0} {if (match($4, "down")) rc=3; printf ("%s/%s (port %s): %s\n", $1, $2, $3, $4)}; END {exit rc}'
;;
force-reload)
for v in $versions; do
reload $v
done
;;
*)
echo "Usage: $0 {start|stop|restart|reload|force-reload|status} [version ..]"
exit 1
;;
esac
exit 0
config file (partly):
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.
#data_directory = 'ConfigDir' # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)
# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '' # write an extra PID file
# (change requires restart)
/etc/init.d/postgresql (partly):
NOTE: this is from a non-standard installation. YMMV
# Data directory
#PGDATA="/data/db/postgres"
#PGDATA="/data/db/postgres/pgdata"
#PGDATA="/data/db/postgres-12/pgdata"
PGDATA="/data/db/postgres-11/pgdata"
(when upgrading, I tend to keep the commented-out older setting for reference)
Note: the config-file is not edited, every path refers to the ConfigDir (by default)
Additionally, for Postgres on a Pi, I set:
random_page_cost = 1.1
shared_buffers = 128MB
#work_mem = 4MB # keep the low default
effective_cache_size = 3GB # This is for a RaspberryPi-4
# for a Pi-3, I'd use ~700M
Okay, I think I've figured it out. Might be overkill but it works:
First, I formatted my 1TB SSD to ext4 and mounted it. Here is a good video walkthrough for formatting to ext4 and mounting. The one difference from the video is that I ended the SSD's fstab entry with "0 2" instead of "0 0", so that the SSD is checked at boot.
Secondly, I installed Postgres. Here is a good walkthrough for that. The directions provided in that blog were more than I needed, but a good walkthrough nonetheless. I simply installed Postgres with:
sudo apt install postgresql postgresql-contrib
Third, I followed this walkthrough until the end of step two, but before beginning step 2, I added a symbolic link from /var/lib/postgresql/12/main to /YOUR/MOUNT/POSITION/postgresql/12/main by executing:
ln -s /var/lib/postgresql/12/main /YOUR/MOUNT/POSITION/postgresql/12/main
Lastly, before restarting the postgres server, I used this website to help me better configure my server. Enter your specs and it should give you some useful configuration settings.
If I remember anything I've left out I'll try and come back and edit this post. Otherwise comment if anything doesn't make sense or is unclear.
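The "could not link file ... Operation not permitted" line in the log from the question is the kind of message PostgreSQL reports when the filesystem under the data directory refuses hard links, which would fit with reformatting the SSD to ext4 resolving the problem. That is an assumption about the original disk format, which the post does not state. A small sketch for probing a mount point before pointing Postgres at it (supports_hard_links is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if hard links can be created inside DIR,
# 1 if linking is refused, 2 if DIR is not writable at all.
supports_hard_links() {
    dir="$1"
    src="$dir/.hardlink_probe.$$"
    dst="$src.link"
    touch "$src" 2>/dev/null || return 2
    if ln "$src" "$dst" 2>/dev/null; then
        rm -f "$src" "$dst"
        return 0
    fi
    rm -f "$src"
    return 1
}

# Example (the path is a placeholder, as in the post):
# supports_hard_links /YOUR/MOUNT/POSITION && echo "hard links OK"
```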

RetroPie and Bluetooth configuration

I have a Raspberry Pi 3B+ running RetroPie and am trying to set up my PS3 controller to always run in 'slave' mode to reduce latency issues. I got the idea from this post. I tested after running the command and it does indeed make a difference.
Problem is, it goes back to 'defaults' after the controller disconnects or rebooting the device. So I'm trying to make it always apply.
I have this shell script (yes, I know it's badly written - I plan to clean it up once I have it all working):
#!/bin/sh
concount=0
while [ ! "$(hcitool con | grep -o "[[:xdigit:]:]\{11,17\}")" ]; do
    sleep 0.1
    if [ $concount -eq 30 ]; then
        break
    else
        echo "Try $concount - Device not found, retrying."
    fi
    concount=$(($concount + 1))
done
if [ $concount -eq 30 ]; then
    echo "Device not found after $concount checks."
else
    sudo hcitool sr $(hcitool con | grep -o "[[:xdigit:]:]\{11,17\}") slave
    echo "Device found and set to slave"
fi
The delay part is because it would fire off before the controller was finished connecting and fail. This essentially gives it 3 seconds to work.
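The polling loop above can be factored into a small reusable helper. This is only a sketch: retry and has_connection are hypothetical function names, and the 30 attempts at 0.1 s in the usage comment mirror the script's 3-second window.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=0
    while [ "$n" -lt "$attempts" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Succeeds as soon as hcitool lists a connected address.
has_connection() {
    hcitool con | grep -q "[[:xdigit:]:]\{11,17\}"
}

# Usage mirroring the script above:
# retry 30 0.1 has_connection && echo "Device found"
```

Keeping the retry logic separate from the check makes it easier to reuse the same wait for other conditions.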
In any case, it works fine when I run it manually via command line and the latency is greatly reduced.
Connections:
> ACL 00:26:43:CC:B0:FB handle 11 state 1 lm SLAVE
Ping: 00:26:43:CC:B0:FB from B8:27:EB:68:0E:9E (data size 44) ...
4 bytes from 00:26:43:CC:B0:FB id 0 time 14.97ms
4 bytes from 00:26:43:CC:B0:FB id 1 time 16.09ms
4 bytes from 00:26:43:CC:B0:FB id 2 time 9.78ms
4 bytes from 00:26:43:CC:B0:FB id 3 time 11.11ms
4 bytes from 00:26:43:CC:B0:FB id 4 time 13.58ms
I made a file /etc/udev/rules.d/50-btslave.rules containing:
ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="054C", ATTR{idProduct}=="0268", RUN+="/home/pi/btslave.sh"
(idvendor/idproduct came from reading the syslog)
and added this to 99-sixaxis.rules.
ACTION=="add", SUBSYSTEMS=="input", ATTRS{name}=="*PLAYSTATION(R)3 Controller", TAG+="systemd", RUN+="/home/pi/btslave.sh"
then did udevadm control --reload to apply the rules.
When I try to reconnect, the syslog doesn't seem to show anything different. It goes back to 'master' and I get the higher latency as well as spikes.
Feb 12 20:31:14 retropie kernel: [ 2121.796762] sony 0005:054C:0268.0005: unknown main item tag 0x0
Feb 12 20:31:15 retropie kernel: [ 2122.292506] input: Sony PLAYSTATION(R)3 Controller Motion Sensors as /devices/platform/soc/3f201000.serial/tty/ttyAMA0/hci0/hci0:11/0005:054C:0268.0005/input/input7
Feb 12 20:31:15 retropie kernel: [ 2122.293773] input: Sony PLAYSTATION(R)3 Controller as /devices/platform/soc/3f201000.serial/tty/ttyAMA0/hci0/hci0:11/0005:054C:0268.0005/input/input6
Feb 12 20:31:15 retropie kernel: [ 2122.294645] sony 0005:054C:0268.0005: input,hidraw2: BLUETOOTH HID v80.00 Joystick [Sony PLAYSTATION(R)3 Controller] on b8:27:eb:68:0e:9e
Feb 12 20:31:15 retropie systemd[1]: Started sixaxis helper (sys/devices/platform/soc/3f201000.serial/tty/ttyAMA0/hci0/hci0:11/0005:054C:0268.0005/input/input6).
Feb 12 20:31:15 retropie systemd[1]: Started sixaxis helper (/dev/input/event3).
Feb 12 20:31:15 retropie sixaxis-helper.sh[1403]: Calibrating: Sony PLAYSTATION(R)3 Controller (00:26:43:CC:B0:FB)
Feb 12 20:31:16 retropie sixaxis-helper.sh[1403]: Setting 600 second timeout on: Sony PLAYSTATION(R)3 Controller (00:26:43:CC:B0:FB)
Connections:
> ACL 00:26:43:CC:B0:FB handle 11 state 1 lm MASTER
4 bytes from 00:26:43:CC:B0:FB id 0 time 36.31ms
4 bytes from 00:26:43:CC:B0:FB id 1 time 35.97ms
4 bytes from 00:26:43:CC:B0:FB id 2 time 36.07ms
4 bytes from 00:26:43:CC:B0:FB id 3 time 36.10ms
4 bytes from 00:26:43:CC:B0:FB id 4 time 34.82ms
When I move the script elsewhere and test again, it shows an error in syslog that the script couldn't be found:
Feb 12 20:40:36 retropie systemd-udevd[1579]: failed to execute '/home/pi/btslave.sh' '/home/pi/btslave.sh': No such file or directory
Feb 12 20:40:36 retropie systemd-udevd[1577]: Process '/home/pi/btslave.sh' failed with exit code 2.
Feb 12 20:40:36 retropie systemd-udevd[1591]: failed to execute '/home/pi/btslave.sh' '/home/pi/btslave.sh': No such file or directory
Feb 12 20:40:36 retropie systemd-udevd[1574]: Process '/home/pi/btslave.sh' failed with exit code 2.
Feb 12 20:40:36 retropie systemd-udevd[1592]: failed to execute '/home/pi/btslave.sh' '/home/pi/btslave.sh': No such file or directory
Feb 12 20:40:36 retropie systemd-udevd[1577]: Process '/home/pi/btslave.sh' failed with exit code 2.
So that tells me that the event does seem to fire. It's just... not actually doing it. I scoured Google for a while and went nowhere. Am I missing something here or is there an easier way to go about this?
Thank you.
So, a few hours later and I managed to figure it out after a whole lot of debugging (and some ideas from a reddit post). Apparently udev is fired way too early in the sequence and no amount of 'waiting' would actually be able to retrieve the controller's address. It needed to fire much later.
So, I removed the udev rule and opted to just edit /usr/bin/sixaxis-helper.sh:
Added
hcitool sr "$SIXAXIS_MAC" slave
after
sixaxis_calibrate
(way at the bottom) It's ugly but it works. :) Controller immediately pairs to 'slave' mode and latency is all sexy-like.
Leaving it here for those who wish to implement the same solution. For now, I'm happy and can move on. Maybe I'll revisit later on and try to improve upon it so it doesn't require me to edit an existing script (or someone else takes up the challenge).
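To confirm the change took effect after a reconnect, the link mode can be read back from hcitool con output in the format shown in the question. A minimal sketch, where link_mode is a hypothetical helper and the MAC address is the one from the post:

```shell
#!/bin/sh
# Hypothetical helper: reads `hcitool con`-style text on stdin and prints
# the link mode (the last field) for the given MAC address.
link_mode() {
    awk -v mac="$1" '$3 == mac { print $NF }'
}

# Sample lines copied from the output in the question.
printf '%s\n' 'Connections:' \
    '> ACL 00:26:43:CC:B0:FB handle 11 state 1 lm SLAVE' |
    link_mode 00:26:43:CC:B0:FB

# Live use would be: hcitool con | link_mode "$SIXAXIS_MAC"
```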

Debian init.d script fails to sleep

This problem occurs on a Pogoplug E02 running Debian jessie.
At startup the network interface takes several seconds to come online. A short delay is required after the "networking" script completes to ensure that ensuing network operations occur properly.
I wrote the following script and inserted it using update-rc.d. The script inserted correctly and executes at boot time in the proper sequence, after networking and before the network-dependent scripts, which were modified to depend on netdelay:
cat /etc/init.d/netdelay
#! /bin/sh
### BEGIN INIT INFO
# Provides: netdelay
# Required-Start: networking
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Delay 5s after eth0 up for Pogoplug
# Description:
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
. /lib/init/vars.sh
. /lib/lsb/init-functions
log_action_msg "Pausing for eth0 to come online"
/bin/sleep 5
log_action_msg "Continuing"
exit 0
When the script executes at startup there is no delay. I've used both sleep and /bin/sleep in the script, but neither produces the desired delay. The boot log showing this is below.
Thu Jan 1 00:00:25 1970: Configuring network interfaces...done.
Thu Jan 1 00:00:25 1970: INIT: Entering runlevel: 2
Thu Jan 1 00:00:25 1970: Using makefile-style concurrent boot in runlevel 2.
Thu Jan 1 00:00:26 1970: Starting SASL Authentication Daemon: saslauthd.
Thu Jan 1 00:00:29 1970: Pausing for eth0 to come online.
Thu Jan 1 00:00:30 1970: Continuing.
Thu Jan 1 00:00:33 1970: ntpdate updating system time.
Wed Feb 1 05:33:40 2017: Starting enhanced syslogd: rsyslogd.
(The Pogoplug has no hardware clock and has no idea what time it is until ntpdate has run.)
Can someone see where the problem might be?
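Separately from the question of why the sleep appears to be cut short, a fixed delay can be replaced by polling the interface state. A minimal sketch, assuming the kernel on this board exposes /sys/class/net/eth0/operstate; wait_for_link is a hypothetical helper, not something from the original script:

```shell
#!/bin/sh
# wait_for_link STATE_FILE TRIES
# Polls STATE_FILE once per second until it contains "up" or TRIES run out.
wait_for_link() {
    state_file="$1"
    tries="$2"
    while [ "$tries" -gt 0 ]; do
        if [ "$(cat "$state_file" 2>/dev/null)" = "up" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# In the init script this could stand in for the fixed sleep:
# wait_for_link /sys/class/net/eth0/operstate 10 || log_action_msg "eth0 still down"
```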

What are the criteria for rsyslogd to create a symbolic link to my own file, similar to /var/log/messages?

When I check the file descriptors opened by rsyslogd, I see that the process has created symbolic links to /var/log/messages and similar files:
root@blr09> ll /proc/16635/fd
total 0
lr-x------. 1 root root 64 Jan 4 08:29 0 -> /dev/null
l-wx------. 1 root root 64 Jan 4 08:29 1 -> /dev/null
l-wx------. 1 root root 64 Jan 4 08:29 10 -> **/var/log/authlog**
lr-x------. 1 root root 64 Jan 4 08:29 11 -> /run/log/journal/3da3ce2773004947b9a8d40578a1fb8b/system.journal
l-wx------. 1 root root 64 Jan 4 08:29 2 -> /dev/null
lrwx------. 1 root root 64 Jan 4 08:29 3 -> socket:[4422054]
l-wx------. 1 root root 64 Jan 4 08:29 4 -> **/var/log/messages**
lr-x------. 1 root root 64 Jan 4 08:29 5 -> /run/log/journal/3da3ce2773004947b9a8d40578a1fb8b/system@3a558a8cce7b45a6bf810fe33c7a89d6-0000000000011e3c-0005453c395dc7aa.journal
lr-x------. 1 root root 64 Jan 4 08:29 6 -> /run/log/journal/3da3ce2773004947b9a8d40578a1fb8b/system@3a558a8cce7b45a6bf810fe33c7a89d6-0000000000000001-00054520bb84848d.journal
lr-x------. 1 root root 64 Jan 4 08:29 7 -> anon_inode:inotify
l-wx------. 1 root root 64 Jan 4 08:29 8 -> **/var/log/secure**
l-wx------. 1 root root 64 Jan 4 08:29 9 -> **/var/log/cron**
root@blr09>
Below is rsyslog.conf
# rsyslog configuration file
# For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
# If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
#### MODULES ####
# The imjournal module bellow is now used as a message source instead of imuxsock.
$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
$ModLoad imjournal # provides access to the systemd journal
#$ModLoad imklog # reads kernel messages (the same are read from journald)
#$ModLoad immark # provides --MARK-- message capability
# Provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514
# Provides TCP syslog reception
#$ModLoad imtcp
#$InputTCPServerRun 514
#### GLOBAL DIRECTIVES ####
# Where to place auxiliary files
$WorkDirectory /var/lib/rsyslog
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually not required,
# not useful and an extreme performance hit
#$ActionFileEnableSync on
# Include all config files in /etc/rsyslog.d/
$IncludeConfig /etc/rsyslog.d/*.conf
# Turn off message reception via local log socket;
# local messages are retrieved through imjournal now.
$OmitLocalLogging on
# File to store the position in the journal
$IMJournalStateFile imjournal.state
#### RULES ####
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none /var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure
# Log all the mail messages in one place.
mail.* -/var/log/maillog
# Log cron stuff
cron.* /var/log/cron
# Everybody gets emergency messages
*.emerg :omusrmsg:*
# Save news errors of level crit and higher in a special file.
uucp,news.crit /var/log/spooler
# Save boot messages also to boot.log
local7.* /var/log/boot.log
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
#$ActionQueueFileName fwdRule1 # unique name prefix for spool files
#$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible)
#$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
#$ActionQueueType LinkedList # run asynchronously
#$ActionResumeRetryCount -1 # infinite retries if host is down
# remote host is: name/ip:port, e.g. 192.168.0.1:514, port optional
#*.* @@remote-host:514
# ### end of the forwarding rule ###
auth.info /var/log/authlog
*.* @127.0.0.1:10514
My requirement is to create a file similar to /var/log/messages for some application messages, so that rsyslog sends all of those messages to port 10514 as configured in rsyslog.conf.
Could you please let me know how this can be achieved?
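One way to approach this, sketched under assumptions: the application logs to a syslog facility of its own (local0 here, chosen arbitrarily), and a drop-in file under /etc/rsyslog.d/, which the $IncludeConfig line in the configuration already pulls in, routes that facility to a dedicated file. The entries under /proc/<pid>/fd are the kernel's view of files the daemon holds open, so an entry for the new file should appear once rsyslogd has opened it. The helper name, file names and facility below are all hypothetical.

```shell
#!/bin/sh
# write_rule DIR FACILITY LOGFILE
# Writes a one-line legacy-syntax rsyslog rule into DIR/30-myapp.conf.
write_rule() {
    printf '%s.*\t%s\n' "$2" "$3" > "$1/30-myapp.conf"
}

# Typical use (needs root, followed by a restart of rsyslogd):
# write_rule /etc/rsyslog.d local0 /var/log/myapp.log
# logger -p local0.info -t myapp "hello from myapp"
```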

Who is refreshing hardware watchdog in Linux?

I have a processor AT91SAM9G20 running a 2.6 kernel. Watchdog is enabled at bootstrap level and configured for 16 seconds. Watchdog mode register can be configured only once.
When the code hangs in the bootstrap, bootloader or kernel, the board reboots. But once the kernel is up, even though none of the applications refresh the watchdog, the board is reset after 15 minutes rather than 16 seconds.
Who is refreshing the watchdog?
In our case, the watchdog should be influenced by applications, so that the board can reset if our application hangs.
These are the running processes:
1 root init
2 root [kthreadd]
3 root [ksoftirqd/0]
4 root [watchdog/0]
5 root [events/0]
6 root [khelper]
63 root [kblockd/0]
72 root [ksuspend_usbd]
78 root [khubd]
85 root [kmmcd]
107 root [pdflush]
108 root [pdflush]
109 root [kswapd0]
110 root [aio/0]
740 root [mtdblockd]
828 root [rpciod/0]
982 root [jffs2_gcd_mtd10]
1003 root /sbin/udevd -d
1145 daemon portmap
1158 dbus dbus-daemon --system
1178 root /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default avahi-daemon: running [SP14.local]
1226 root /usr/sbin/dropbear
1246 root /root/bin/host_app
1254 root /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root -sh
1257 root /sbin/syslogd -n -m 0
1258 root /sbin/klogd -n
1259 root /usr/bin/tail -f /var/log/messages
1265 root ps -e
We are using the watchdog for soft lockups available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c
If you enabled the watchdog driver in your kernel, the watchdog driver sets up a kernel timer, in charge of resetting the watchdog. The corresponding code is linux/drivers/watchdog/at91sam9_wdt.c. So it works like this:
If no application opens the /dev/watchdog file, then the kernel takes care of resetting the watchdog. Since it is a timer, it won't appear as a dedicated kernel thread; it is handled by the soft IRQ thread. Now, if an application opens this file, it becomes responsible for the watchdog, and can reset it by writing to the file, as described in the documentation linked in Richard's post.
Is the watchdog driver configured in your kernel?
If not, you should configure it, and see if the reset still happens. If it still happens, it is likely that your reset comes from somewhere else.
If your kernel is too old to have a proper watchdog driver (not present in 2.6.25) you should backport it from 2.6.28. Or you can try to disable the watchdog in your bootloader and see if the reset still occurs.
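The hand-over described above, where a process that opens /dev/watchdog becomes responsible for refreshing it, can be sketched in a few lines of shell. This is an illustration rather than a production daemon: feed_watchdog is a hypothetical helper, and the interval must be shorter than the hardware timeout (16 seconds in the question).

```shell
#!/bin/sh
# feed_watchdog DEVICE COUNT INTERVAL
# Opens DEVICE once, then writes one byte every INTERVAL seconds, COUNT times.
# A watchdog driver treats each write as a keepalive.
feed_watchdog() {
    dev="$1"
    count="$2"
    interval="$3"
    {
        while [ "$count" -gt 0 ]; do
            printf '.' >&3
            count=$((count - 1))
            sleep "$interval"
        done
    } 3> "$dev"
    # The descriptor is closed here; whether the timer then stops or keeps
    # running depends on the driver and the kernel configuration.
}

# On real hardware (as root): feed_watchdog /dev/watchdog 100 5
```

If the application hangs and stops writing, the hardware timer is no longer refreshed, which is the behavior the question asks for.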
In July 2016 commit 3fbfe92647 (watchdog: change watchdog_need_worker logic) in the 4.7 kernel to watchdog_dev.c enabled the same behavior as shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code.
/*
* A worker to generate heartbeat requests is needed if all of the
* following conditions are true.
* - Userspace activated the watchdog.
* - The driver provided a value for the maximum hardware timeout, and
* thus is aware that the framework supports generating heartbeat
* requests.
* - Userspace requests a longer timeout than the hardware can handle.
*
* Alternatively, if userspace has not opened the watchdog
* device, we take care of feeding the watchdog if it is
* running.
*/
return (hm && watchdog_active(wdd) && t > hm) ||
(t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt
It makes perfect sense to have a user space daemon handling the watchdog. It probably defaults to a 15 minute timeout.
We had a similar problem with the WDT on an AT91SAM9263. The problem was bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it should be 0 for our application's needs.
Bit explanation from datasheet documentation:
• WDIDLEHLT: Watchdog Idle Halt
0: The Watchdog runs when the system is in idle mode.
1: The Watchdog stops when the system is in idle state.
This means that the WDT counter does not increment while the kernel is idle, hence the delay of 15 minutes or more before the reset happens.
You can try "dd if=/dev/zero of=/dev/null" which will prevent kernel from entering idle state and you should get a reset in 16 seconds (or whatever period you have set in WDT_MR register).
So, the solution is to update u-boot code or other piece of code that sets WDT_MR register. Remember this register is write once...
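Checking the bit by hand is easy to get wrong, so here is a small sketch that extracts bit 29 from a WDT_MR value. How the register value is obtained is outside this sketch, the sample values are made up for illustration, and wdidlehlt is a hypothetical helper name.

```shell
#!/bin/sh
# Prints 1 if bit 29 (WDIDLEHLT) is set in the given register value, else 0.
wdidlehlt() {
    echo $(( ($1 >> 29) & 1 ))
}

wdidlehlt 0x20000000   # bit 29 set
wdidlehlt 0x00000FFF   # bit 29 clear
```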
Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.

Resources