I have an issue with my Pi 4 where the networking seems to crash at some point. The Pi keeps running, but the network is unreachable. I tried enabling the ping test in my watchdog.conf, but I get this error: watchdog: error opening socket (Operation not permitted)
Hardware: Pi 4 8GB
OS: Raspberry Pi OS Lite (64-bit)
Watchdog version: 5.16
My watchdog.conf:
# ====================================================================
# Configuration for the watchdog daemon. For more information on the
# parameters in this file use the command 'man watchdog.conf'
# ====================================================================
# =================== The hardware timer settings ====================
#
# For this daemon to be effective it really needs some hardware timer
# to back up any reboot actions. If you have a server then see if it
# has IPMI support. Otherwise for Intel-based machines try the iTCO_wdt
# module, otherwise (or if that fails) then see if any of the following
# module load and work:
#
# it87_wdt it8712f_wdt w83627hf_wdt w83877f_wdt w83977f_wdt
#
# If all else fails then 'softdog' is better than no timer at all!
# Or work your way through the modules listed under:
#
# /lib/modules/`uname -r`/kernel/drivers/watchdog/
#
# To see if they load, present /dev/watchdog, and are capable of
# resetting the system on time-out.
# Uncomment this to use the watchdog device driver access "file".
#verbose=yes
watchdog-device = /dev/watchdog
# Uncomment and edit this line for hardware timeout values that differ
# from the default of one minute.
watchdog-timeout = 15
# If your watchdog trips by itself when the first timeout interval
# elapses then try uncommenting the line below and changing the
# value to 'yes'.
#watchdog-refresh-use-settimeout = auto
# If you have a buggy watchdog device (e.g. some IPMI implementations)
# try uncommenting this line and setting it to 'yes'.
#watchdog-refresh-ignore-errors = no
# ====================== Other system settings ========================
#
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.
#interval = 1
# The number of intervals skipped before a log message is written (i.e.
# a multiplier for 'interval' in terms of syslog messages)
#logtick = 1
# Directory for log files (probably best not to change this)
log-dir = /var/log/watchdog
# Email address for sending the reboot reason. This needs sendmail to
# be installed and properly configured. Maybe you should just enable
# syslog forwarding instead?
#admin = root
# Lock the daemon in to memory as a real-time process. This greatly
# decreases the chance that watchdog won't be scheduled before your
# machine is really loaded.
realtime = yes
priority = 1
# ====================== How to handle errors =======================
#
# If you have a custom binary/script to handle errors then uncomment
# this line and provide the path. For 'v1' test binary files they also
# handle error cases.
#repair-binary = /usr/sbin/repair
#repair-timeout = 60
# The retry-timeout and repair limit are used to handle errors in a
# more robust manner. Errors must persist for longer than this to
# action a repair or reboot, and if repair-maximum attempts are
# made without the test passing a reboot is initiated anyway.
#retry-timeout = 60
#repair-maximum = 1
# Configure the delay on reboot from sending SIGTERM to all processes
# and to following up with SIGKILL for any that are ignoring the polite
# request to stop.
#sigterm-delay = 5
# ====================== User-specified tests ========================
#
# Specify the directory for auto-added 'v1' test programs (any executable
# found in the 'test-directory should be listed).
#test-directory = /etc/watchdog.d
# Specify any v0 custom tests here. Multiple lines are permitted, but
# having any 'v1' programs/scripts discovered in the 'test-directory' is
# the better way.
#test-binary =
# Specify the time-out value for a test error to be reported.
#test-timeout = 60
# ====================== Typical tests ===============================
#
# Specify any IPv4 numeric addresses to be probed.
# NOTE: You should check you have permission to ping any machine before
# using it as a test. Also remember if the target goes down then this
# machine will reboot as a result!
#ping = 192.168.1.1
# Set the number of ping attempts in each 'interval' of time. Default
# is 3 and it completes on the first successful ping.
# NOTE: Round-trip delay has to be less than 'interval' / 'ping-count'
# for test success, but this is unlikely to be exceeded except possibly
# on satellite links (very unlikely case!).
# Specify any network interface to be checked for activity.
interface = eth0
# Specify any files to be checked for presence, and if desired, checked
# that they have been updated more recently than 'change' seconds.
#file = /var/log/syslog
#change = 1407
# Uncomment to enable load average tests for 1, 5 and 15 minute
# averages. Setting one of these values to '0' disables it. These
# values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher
# than 25 in most cases).
max-load-1 = 24
#max-load-5 = 18
#max-load-15 = 12
# Check available memory on the machine.
#
# The min-memory check is a passive test from reading the file
# /proc/meminfo and computed from MemFree + Buffers + Cached
# If this is below a few tens of MB you are likely to have problems.
#
# The allocatable-memory is an active test checking it can be paged
# in to use.
#
# Maximum swap should be based on normal use, probably a large part of
# available swap but paging 1GB of swap can take tens of seconds.
#
# NOTE: This is the number of pages, to get the real size, check how
# large the pagesize is on your machine (typically 4kB for x86 hardware).
#min-memory = 1
#allocatable-memory = 1
#max-swap = 0
# Check for over-temperature. Typically the temperature-sensor is a
# 'virtual file' under /sys and it contains the temperature in
# milli-Celsius. Usually these are generated by the 'sensors' package,
# but take care as device enumeration may not be fixed.
#temperature-sensor =
#max-temperature = 90
# Check for a running process/daemon by its PID file. For example,
# check if rsyslogd is still running by enabling the following line:
#pidfile = /var/run/rsyslogd.pid
The daemon runs fine with this configuration, as the service status shows:
pi@raspberrypi:~ $ sudo service watchdog status
● watchdog.service - watchdog daemon
Loaded: loaded (/lib/systemd/system/watchdog.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-04-12 08:01:53 BST; 2min 17s ago
Process: 2120 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watchdog_module}" = "none" ] || /sbin/modprobe $watchdog_module (code=exited, status=0/SUCCESS)
Process: 2121 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
Main PID: 2126 (watchdog)
Tasks: 1 (limit: 8986)
CPU: 59ms
CGroup: /system.slice/watchdog.service
└─2126 /usr/sbin/watchdog
Apr 12 08:01:53 raspberrypi watchdog[2126]: interface: eth0
Apr 12 08:01:53 raspberrypi watchdog[2126]: temperature: no sensors to check
Apr 12 08:01:53 raspberrypi watchdog[2126]: no test binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: no repair binary files
Apr 12 08:01:53 raspberrypi watchdog[2126]: error retry time-out = 60 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: repair attempts = 1
Apr 12 08:01:53 raspberrypi watchdog[2126]: alive=/dev/watchdog heartbeat=[none] to=root no_act=no force=no
Apr 12 08:01:53 raspberrypi watchdog[2126]: watchdog now set to 15 seconds
Apr 12 08:01:53 raspberrypi watchdog[2126]: hardware watchdog identity: Broadcom BCM2835 Watchdog timer
Apr 12 08:01:53 raspberrypi systemd[1]: Started watchdog daemon.
However, when I enable the ping test in the conf file (by uncommenting #ping = 192.168.1.1), I get the following error when running watchdog -v:
watchdog -v
watchdog: String 'watchdog-device' found as '/dev/watchdog'
watchdog: Integer 'watchdog-timeout' found = 15
watchdog: String 'log-dir' found as '/var/log/watchdog'
watchdog: Variable 'realtime' found as 'yes' = 1
watchdog: Integer 'priority' found = 1
watchdog: List 'ping' added as '192.168.1.1'
watchdog: List 'interface' added as 'eth0'
watchdog: Integer 'max-load-1' found = 24
watchdog: error opening socket (Operation not permitted)
This seems to indicate that it's not permitted to do the ping test.
I searched for the issue and found nothing quite like it. I did try the solutions in these articles, but none of them worked:
https://discuss.linuxcontainers.org/t/even-with-root-user-im-receiving-operation-not-permitted-when-try-creating-gluster-volume-between-ubuntu-14-04-lxc-containers/2699
https://superuser.com/questions/288521/problem-with-ping-open-socket-operation-not-permitted
Anyone have any ideas?
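One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: the ping test appears to open a raw ICMP socket, which on Linux normally requires root (or the CAP_NET_RAW capability), and the watchdog -v run above is shown without sudo. The sketch below uses a hypothetical helper, raw_socket_hint, that only inspects a user ID and prints what to try next. It does not touch the watchdog daemon.

```shell
# Hypothetical helper (not part of the watchdog package).
# Usage: raw_socket_hint [UID]   (UID defaults to the current user's ID)
raw_socket_hint() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        # root can normally create raw sockets
        echo "uid 0: raw sockets should be permitted"
    else
        # unprivileged users usually get "Operation not permitted"
        echo "uid $uid: retry as root, e.g. sudo watchdog -v"
    fi
}
```

Calling raw_socket_hint with no argument checks the current user. If the same error still appears when the daemon runs as root, this assumption does not explain it.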
I have stopped ttyS0 (initctl stop serial DEV=ttyS0). The ttyS0 process stops for the session but reappears after a reboot. I want to disable ttyS0 at boot because it throws errors like:
Feb 19 20:19:42 sdm2 init: serial (ttyS0) main process (608881) terminated with status 1
Feb 19 20:19:42 sdm2 init: serial (ttyS0) main process ended, respawning
Feb 19 20:19:42 sdm2 init: initLogger main process (608986) terminated with status 1
I couldn't find any /etc/init/ttyS0.conf, but serial.conf exists.
I searched for 'respawn' in an attempt to turn it off, and found it in serial.conf:
instance $DEV
respawn
pre-start exec /sbin/securetty $DEV
exec /sbin/agetty /dev/$DEV $SPEED vt100-nav
Although /etc/ttyS0.conf does not exist, I used 'echo manual | sudo tee /etc/init/ttyS0.override' to stop ttyS0 at boot time.
- I also removed ttyS0 from securetty.
- There is no mention of ttyS0 in the inittab file.
- In grub.conf I have two console entries: tty0 and console=ttyS0,115200.
- /dev/ttyS0 exists, but /etc/init/ttyS0.conf does not.
Could anyone help me stop ttyS0 from coming back after a reboot?
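A possible lead, stated as an unverified assumption: on some upstart-based systems the serial job is started for whichever serial device is registered as a kernel console, in which case the console=ttyS0,115200 entry in grub.conf could be what brings the getty back at boot. The sketch below uses a hypothetical helper, serial_consoles, to list the serial consoles named on a kernel command line, so you can see what the running kernel was actually given.

```shell
# Hypothetical helper: list serial consoles named on a kernel command line.
# Usage: serial_consoles ["kernel command line"]   (defaults to /proc/cmdline)
serial_consoles() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    for arg in $cmdline; do
        case "$arg" in
            console=ttyS*)
                dev="${arg#console=}"   # strip the "console=" prefix
                echo "${dev%%,*}"       # strip ",115200" and similar options
                ;;
        esac
    done
}
```

If serial_consoles prints ttyS0 on the affected machine, removing that console entry from grub.conf and rebooting would be one way to test the assumption.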
This problem occurs on a Pogoplug E02 running Debian jessie.
At startup the network interface takes several seconds to come online. A short delay is required after the "networking" script completes to ensure that ensuing network operations occur properly.
I wrote the following script and inserted it using update-rc.d. The script was inserted correctly and executes at boot time in the proper sequence: after networking and before the network-dependent scripts, which were modified to depend on netdelay.
cat /etc/init.d/netdelay
#! /bin/sh
### BEGIN INIT INFO
# Provides: netdelay
# Required-Start: networking
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Delay 5s after eth0 up for Pogoplug
# Description:
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
. /lib/init/vars.sh
. /lib/lsb/init-functions
log_action_msg "Pausing for eth0 to come online"
/bin/sleep 5
log_action_msg "Continuing"
exit 0
When the script executes at startup there is no delay. I've used both sleep and /bin/sleep in the script, but neither produces the desired delay. The boot log showing this is below.
Thu Jan 1 00:00:25 1970: Configuring network interfaces...done.
Thu Jan 1 00:00:25 1970: INIT: Entering runlevel: 2
Thu Jan 1 00:00:25 1970: Using makefile-style concurrent boot in runlevel 2.
Thu Jan 1 00:00:26 1970: Starting SASL Authentication Daemon: saslauthd.
Thu Jan 1 00:00:29 1970: Pausing for eth0 to come online.
Thu Jan 1 00:00:30 1970: Continuing.
Thu Jan 1 00:00:33 1970: ntpdate updating system time.
Wed Feb 1 05:33:40 2017: Starting enhanced syslogd: rsyslogd.
(The Pogoplug has no hardware clock and has no idea what time it is until ntpdate has run.)
Can someone see where the problem might be?
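As an alternative to a fixed sleep, the delay could poll the link state and continue as soon as the interface reports that it is up. This is only a sketch under assumptions: wait_for_link is a hypothetical helper, and the 10-second timeout in the usage note is an arbitrary example value. /sys/class/net/eth0/operstate is the standard sysfs file that reports the interface state.

```shell
# Hypothetical helper: wait until a link-state file reads "up".
# Usage: wait_for_link STATE_FILE TIMEOUT_SECONDS
# Returns 0 once the file contains "up", or 1 if the timeout expires first.
wait_for_link() {
    state_file="$1"
    timeout="$2"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ "$(cat "$state_file" 2>/dev/null)" = "up" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

In the init script this would replace the /bin/sleep 5 line with something like: wait_for_link /sys/class/net/eth0/operstate 10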
My ksh version is ksh93:
=>rpm -qa | grep ksh
ksh-20100621-3.fc13.i686
I have a simple script, shown below:
# cat test_sigterm.sh
#!/bin/ksh
trap 'echo "removing"' QUIT
while read line
do
sleep 20
done
I am executing the script from Terminal 1:
1. The ksh is started from /bin/ksh as below:
# exec /bin/ksh
2. The script is executed from this ksh:
# ./test_sigterm.sh&
[1] 12136
and sending a SIGTERM from Terminal 2:
# ps -elf | grep ksh
4 S root 12136 30437 0 84 4 - 1345 poll_s 13:09 pts/0 00:00:00 /bin/ksh ./test_sigterm.sh
0 S root 18952 18643 0 80 0 - 1076 pipe_w 13:12 pts/5 00:00:00 grep ksh
4 S root 30437 30329 0 80 0 - 1368 poll_s 10:04 pts/0 00:00:00 /bin/ksh
# kill -15 12136
I can see that test_sigterm.sh is killed on receiving SIGTERM in either case, whether it runs in the background (&) or in the foreground.
But the ksh man page says:
Signals.
The INT and QUIT signals for an invoked command are ignored if the command is followed by & and the monitor option is not active.
Otherwise, signals have the values inherited by the shell from its parent (but see also the trap built-in command below).
Is it known or default behaviour for ksh NOT to ignore SIGTERM, or is this an issue with SIGTERM handling in the ksh child?
I believe that this is normal behaviour.
While it says that signals are normally inherited by background processes, the action of the TERM signal is determined by whether the shell is interactive or not. (See the '-i' option in the ksh man page under Invocation.)
If you need the script to ignore SIGTERM, then you can add this line to it:
trap '' TERM
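A small way to see the effect of that line, as a sketch: the hypothetical function below starts a child shell that ignores TERM, sends SIGTERM to itself, and then prints a message to show it survived. It uses sh so that it runs anywhere. The assumption is that ksh behaves the same way for an empty trap.

```shell
# Hypothetical demo: a child shell that ignores SIGTERM and keeps running.
ignore_term_demo() {
    # $$ inside the single quotes is expanded by the child sh, so the
    # child sends SIGTERM to itself after installing the empty trap.
    sh -c 'trap "" TERM; kill -TERM $$; echo "still running after SIGTERM"'
}
```

Without the trap line, the same child shell is terminated by the signal before it reaches the echo.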
I have an AT91SAM9G20 processor running a 2.6 kernel. The watchdog is enabled at the bootstrap level and configured for 16 seconds. The watchdog mode register can be configured only once.
When code hangs in the bootstrap, the bootloader, or the kernel, the board reboots. But once the kernel is up, even though none of the applications refresh the watchdog, the board is reset not after 16 seconds but after 15 minutes.
Who is refreshing the watchdog?
In our case, the watchdog should be influenced by applications, so that the board can reset if our application hangs.
These are the running processes:
1 root init
2 root [kthreadd]
3 root [ksoftirqd/0]
4 root [watchdog/0]
5 root [events/0]
6 root [khelper]
63 root [kblockd/0]
72 root [ksuspend_usbd]
78 root [khubd]
85 root [kmmcd]
107 root [pdflush]
108 root [pdflush]
109 root [kswapd0]
110 root [aio/0]
740 root [mtdblockd]
828 root [rpciod/0]
982 root [jffs2_gcd_mtd10]
1003 root /sbin/udevd -d
1145 daemon portmap
1158 dbus dbus-daemon --system
1178 root /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default avahi-daemon: running [SP14.local]
1226 root /usr/sbin/dropbear
1246 root /root/bin/host_app
1254 root /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root -sh
1257 root /sbin/syslogd -n -m 0
1258 root /sbin/klogd -n
1259 root /usr/bin/tail -f /var/log/messages
1265 root ps -e
We are using the watchdog for soft lockups available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c
If you enabled the watchdog driver in your kernel, the driver sets up a kernel timer that is in charge of resetting the watchdog. The corresponding code is in linux/drivers/watchdog/at91sam9_wdt.c. It works like this:
If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog. Since it is a timer, it does not appear as a dedicated kernel thread but is handled by the soft IRQ thread. If an application opens this file, it becomes responsible for the watchdog and can reset it by writing to the file, as described in the documentation linked in Richard's post.
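The userspace side of this can be sketched in shell. The function below is a hypothetical helper, not code from the driver or from any watchdog package: it opens the device once, writes to it at a fixed interval, and finishes with the 'V' magic-close character that some drivers honour. The device path, interval, and count are parameters so the loop can be tried against an ordinary file first.

```shell
# Hypothetical helper: feed a watchdog device a fixed number of times.
# Usage: feed_watchdog DEVICE INTERVAL_SECONDS COUNT
feed_watchdog() {
    dev="$1"; interval="$2"; count="$3"
    [ -w "$dev" ] || return 1
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf 'k' >&3        # any write counts as a keepalive
            i=$((i + 1))
            if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
        done
        printf 'V' >&3            # magic close, only honoured by some drivers
    } 3>"$dev"                    # the device stays open for the whole loop
}
```

For example, feed_watchdog /dev/watchdog 5 12 would feed the timer for about a minute. An application that wants a hang to reset the board would feed from its own main loop and leave out the final 'V' write.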
Is the watchdog driver configured in your kernel?
If not, you should configure it, and see if the reset still happens. If it still happens, it is likely that your reset comes from somewhere else.
If your kernel is too old to have a proper watchdog driver (it is not present in 2.6.25), you should backport it from 2.6.28. Alternatively, you can try disabling the watchdog in your bootloader and see if the reset still occurs.
In July 2016, commit 3fbfe92647 (watchdog: change watchdog_need_worker logic) to watchdog_dev.c in the 4.7 kernel enabled the behavior described in shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code:
/*
* A worker to generate heartbeat requests is needed if all of the
* following conditions are true.
* - Userspace activated the watchdog.
* - The driver provided a value for the maximum hardware timeout, and
* thus is aware that the framework supports generating heartbeat
* requests.
* - Userspace requests a longer timeout than the hardware can handle.
*
* Alternatively, if userspace has not opened the watchdog
* device, we take care of feeding the watchdog if it is
* running.
*/
return (hm && watchdog_active(wdd) && t > hm) ||
(t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt
It makes perfect sense to have a user space daemon handling the watchdog. It probably defaults to a 15 minute timeout.
We had a similar problem with the WDT on the AT91SAM9263. The problem was bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it should be 0 for our application's needs.
Bit explanation from the datasheet:
• WDIDLEHLT: Watchdog Idle Halt
0: The Watchdog runs when the system is in idle mode.
1: The Watchdog stops when the system is in idle state.
This means that the WDT counter does not increment while the kernel is in the idle state, hence the delay of 15 minutes or more until the reset happens.
You can try "dd if=/dev/zero of=/dev/null", which prevents the kernel from entering the idle state; you should then get a reset in 16 seconds (or whatever period you have set in the WDT_MR register).
So the solution is to update the u-boot code, or whichever other piece of code sets the WDT_MR register. Remember that this register is write-once.
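To check a WDT_MR value for this bit, a shift and mask is enough. The sketch below uses a hypothetical helper, wdidlehlt_bit, and the register values in the usage note are made-up examples rather than readings from a real board. How you obtain the actual value (for example from the bootloader source) is left open.

```shell
# Hypothetical helper: print the WDIDLEHLT bit (bit 29) of a WDT_MR value.
# Usage: wdidlehlt_bit 0x3FFF2FFF
wdidlehlt_bit() {
    echo $(( ($1 >> 29) & 1 ))
}
```

With the example value 0x3FFF2FFF the function prints 1 (the watchdog stops in idle), and with 0x1FFF2FFF it prints 0 (the watchdog keeps running in idle).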
Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.