Who is refreshing hardware watchdog in Linux? - linux

I have an AT91SAM9G20 processor running a 2.6 kernel. The watchdog is enabled at the bootstrap level and configured for 16 seconds. The watchdog mode register can be configured only once.
When code hangs in the bootstrap, bootloader, or kernel, the board reboots. But once the kernel is up, even though none of our applications refresh the watchdog, the board resets after about 15 minutes rather than after 16 seconds.
Who is refreshing the watchdog?
In our case the watchdog should be controlled by our applications, so that the board resets if our application hangs.
These are the running processes:
1 root init
2 root [kthreadd]
3 root [ksoftirqd/0]
4 root [watchdog/0]
5 root [events/0]
6 root [khelper]
63 root [kblockd/0]
72 root [ksuspend_usbd]
78 root [khubd]
85 root [kmmcd]
107 root [pdflush]
108 root [pdflush]
109 root [kswapd0]
110 root [aio/0]
740 root [mtdblockd]
828 root [rpciod/0]
982 root [jffs2_gcd_mtd10]
1003 root /sbin/udevd -d
1145 daemon portmap
1158 dbus dbus-daemon --system
1178 root /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default avahi-daemon: running [SP14.local]
1226 root /usr/sbin/dropbear
1246 root /root/bin/host_app
1254 root /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root -sh
1257 root /sbin/syslogd -n -m 0
1258 root /sbin/klogd -n
1259 root /usr/bin/tail -f /var/log/messages
1265 root ps -e
We are using the watchdog for soft lockups available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c

If you enabled the watchdog driver in your kernel, the watchdog driver sets up a kernel timer, in charge of resetting the watchdog. The corresponding code is linux/drivers/watchdog/at91sam9_wdt.c. So it works like this:
If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog. Since this is done from a timer, it does not appear as a dedicated kernel thread; it is handled by the soft IRQ thread. If an application opens this file, it becomes responsible for the watchdog and can reset it by writing to the file, as described in the documentation linked in Richard's post.
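The hand-over described above can be sketched from the application side. This is a minimal sketch, not the driver's own code: the names feed_watchdog, is_healthy and main are hypothetical, and it assumes the standard watchdog character-device behaviour where any write to /dev/watchdog counts as a keepalive.

```python
import time


def feed_watchdog(dev, is_healthy, interval_s=5.0, max_feeds=None, sleep=time.sleep):
    """Write a keepalive to `dev` while is_healthy() returns True.

    `dev` is any writable binary file object (normally /dev/watchdog).
    Returns the number of keepalives written.
    """
    feeds = 0
    while is_healthy() and (max_feeds is None or feeds < max_feeds):
        dev.write(b"\0")  # assumption: any write pets the watchdog
        dev.flush()
        feeds += 1
        sleep(interval_s)
    return feeds


def main():
    # Opening the device makes this process responsible for the watchdog.
    # The health check here is a placeholder; a real one would test the
    # application's own liveness. The interval must stay well below the
    # hardware timeout (16 s in the question).
    with open("/dev/watchdog", "wb", buffering=0) as wdt:
        feed_watchdog(wdt, is_healthy=lambda: True, interval_s=5.0)
```

When is_healthy() turns false the loop stops feeding, and the hardware timeout is then expected to reset the board, which is the behaviour the question asks for.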
Is the watchdog driver configured in your kernel?
If not, you should configure it, and see if the reset still happens. If it still happens, it is likely that your reset comes from somewhere else.
If your kernel is too old to have a proper watchdog driver (it is not present in 2.6.25), you should backport it from 2.6.28. Alternatively, you can try disabling the watchdog in your bootloader to see if the reset still occurs.

In July 2016, commit 3fbfe92647 (watchdog: change watchdog_need_worker logic), which changed watchdog_dev.c in the 4.7 kernel, enabled the same behavior as described in shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code.
/*
 * A worker to generate heartbeat requests is needed if all of the
 * following conditions are true.
 * - Userspace activated the watchdog.
 * - The driver provided a value for the maximum hardware timeout, and
 *   thus is aware that the framework supports generating heartbeat
 *   requests.
 * - Userspace requests a longer timeout than the hardware can handle.
 *
 * Alternatively, if userspace has not opened the watchdog
 * device, we take care of feeding the watchdog if it is
 * running.
 */
return (hm && watchdog_active(wdd) && t > hm) ||
       (t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
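To make the two branches of that condition easier to follow, here is a small Python model of it. It is only an illustration of the quoted expression, not kernel code; the function name need_worker and the parameter names are assumptions (hm is taken to be the maximum hardware timeout and t the requested timeout, as the comment suggests).

```python
def need_worker(hm, t, active, hw_running):
    """Model of the quoted return expression.

    hm         -- maximum hardware timeout reported by the driver (0 if none)
    t          -- timeout currently requested (0 if none)
    active     -- True if userspace has activated the watchdog
    hw_running -- True if the hardware watchdog is already running
    """
    # Branch 1: userspace asked for more than the hardware can do.
    userspace_needs_help = bool(hm) and active and t > hm
    # Branch 2: nobody opened the device, but the hardware is ticking.
    kernel_must_feed = bool(t) and not active and hw_running
    return userspace_needs_help or kernel_must_feed
```

The second branch corresponds to the situation in the question: the bootstrap started the watchdog, no application opened the device, so the kernel keeps feeding it.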

This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt
It makes perfect sense to have a user space daemon handling the watchdog. It probably defaults to a 15 minute timeout.

We had a similar problem with the WDT on an AT91SAM9263. The problem was bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it should be 0 for our application's needs.
Bit explanation from datasheet documentation:
• WDIDLEHLT: Watchdog Idle Halt
0: The Watchdog runs when the system is in idle mode.
1: The Watchdog stops when the system is in idle state.
This means that the WDT counter does not increment while the kernel is idle, hence the delay of 15 minutes or more before the reset happens.
You can try "dd if=/dev/zero of=/dev/null", which prevents the kernel from entering the idle state; you should then get a reset in 16 seconds (or whatever period you have set in the WDT_MR register).
So the solution is to update the U-Boot code, or whatever other code sets the WDT_MR register. Remember that this register is write-once.
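As a quick way to check a register value read back from the board, the bit test can be sketched as follows. Only the bit position (29) comes from the datasheet excerpt above; the helper names and the example value are made up for illustration.

```python
WDIDLEHLT = 1 << 29  # bit 29 of WDT_MR, per the datasheet excerpt


def stops_when_idle(wdt_mr):
    """True if the watchdog counter halts while the system is idle."""
    return bool(wdt_mr & WDIDLEHLT)


def clear_idle_halt(wdt_mr):
    """Return the 32-bit register value with WDIDLEHLT cleared."""
    return wdt_mr & ~WDIDLEHLT & 0xFFFFFFFF


# Hypothetical register value, used only to exercise the helpers.
example = 0x3FFF2FFF
fixed = clear_idle_halt(example)
```

Because the register can be written only once, the corrected value has to be used in that single write by the bootstrap or bootloader.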

Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.

Related

Why my MTD driver becomes a normal file?

I am using phram and ramoops to store the latest system log in a reserved memory, so that once my machine crashed I could dump the panic log after reboot. MTD driver phram and module ramoops are used to automatically record the system log to memory:
/# insmod /lib/modules/phram.ko phram=phram-oops,<addr>,<len>
/# ls -l /dev/mtdchar/phram-oops
crw-r--r-- 1 root root 90, 24 Jul 20 16:34 phram-oops
It worked well until recently, when I reused this driver to also back up the boot loader log. During boot, phram-oops backs up the u-boot log to one reserved memory area. After the Linux shell is up, I dump the u-boot log, clear phram-oops with dd if=/dev/zero bs=65536 count=1 of=/dev/mtdchar/phram-oops, rmmod phram, and insmod phram with a new memory area for the panic log. Then I dump the system logs of the last boot. Up to this step, /dev/mtdchar/phram-oops still works fine:
/# ls -l /dev/mtdchar/phram-oops
crw-r--r-- 1 root root 90, 24 Jul 20 16:34 /dev/mtdchar/phram-oops
However, after running dd if=/dev/zero bs=65536 count=1 of=/dev/mtdchar/phram-oops again to clear the memory, the driver node /dev/mtdchar/phram-oops becomes a regular file!
/# ls -l /dev/mtdchar/phram-oops
-rw-r--r-- 1 root root 65536 Jul 20 16:34 /dev/mtdchar/phram-oops
As a result, the previous logs remain in memory and cannot be cleared. Any idea how a driver node turns into a file, and how to fix it?
It seems this problem was caused by hotplug: some delay is required after rmmod phram and before insmod phram with a new address. Otherwise, the device node is very likely not set up correctly yet, and the dd command then creates a regular file at that path.
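One way to guard against this race is to wait until the path really is a character device before writing to it. The sketch below assumes that polling the path is acceptable; the helper names are hypothetical, and the path in the usage note is the one from the question.

```python
import os
import stat
import time


def is_char_device(path):
    """True only if `path` exists and is a character device node."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


def wait_for_char_device(path, timeout_s=5.0, poll_s=0.1):
    """Poll until `path` is a character device or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_char_device(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A script would call wait_for_char_device("/dev/mtdchar/phram-oops") after insmod and run dd only if it returns True, so a regular file is never created at that path by accident.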

How to monitor number of syscalls executed by kernel?

I need to monitor the number of system calls executed by Linux.
I'm aware that vmstat can show this on BSD and AIX systems, but on Linux it can't (according to the man page).
Is there a counter in /proc? Or is there any other way to monitor it?
I wrote a simple SystemTap script (based on syscalls_by_pid.stp).
It produces output like this:
ProcessName #SysCalls
munin-graph 38609
munin-cron 8160
fping 4502
check_http_demo 2584
check_nrpe 2045
sh 1836
nagios 886
sendmail 747
smokeping 649
check_http 571
check_nt 376
pcscd 216
ping 108
check_ping 100
crond 87
stapio 69
init 56
syslog-ng 27
sshd 17
ntpd 9
hp-asrd 8
hald-addon-stor 7
automount 6
httpd 4
stap 3
flow-capture 2
gam_server 2
Total 61686
The script itself:
#! /usr/bin/env stap
#
# Print the system call count by process name in descending order.
#
global syscalls
probe begin {
  print ("Collecting data... Type Ctrl-C to exit and display results\n")
}
probe syscall.* {
  syscalls[execname()]++
}
probe end {
  printf ("%-20s %-s\n\n", "ProcessName", "#SysCalls")
  summary = 0
  foreach (procname in syscalls-) {
    printf("%-20s %-10d\n", procname, syscalls[procname])
    summary = summary + syscalls[procname]
  }
  printf ("\n%-20s %-d\n", "Total", summary)
}
You can use ptrace, as Jeff Foster said, to trace system calls.
You can also use strace and ltrace:
strace - trace system calls and signals
ltrace - A library call tracer
You can use ptrace to monitor all syscalls (see here)
I believe OProfile can do this.
I am not aware of a centralized way to monitor syscalls throughout the entire OS. Maybe do a ptrace on the init process and follow all children? But I don't know if that will work.
Your best bet is to write a patch to the kernel itself to do this. The closest thing to this that I've seen is a cgroup implementation for enforcing permissions on what syscalls can be executed at runtime. You can find the patch here:
https://github.com/luksow/syscalls-cgroup
It shouldn't be too much more work to throw a counter in there, from a kernel programming perspective.

running slattach on pseudo tty

I am trying to open a network connection through a pair of pseudo-ttys on Linux.
# slattach -v /dev/ptmx
cslip started on /dev/ptmx interface sl0
OK, this was the "creating side" for the pseudo tty.
I can look in /dev/pts and find the new pty there.
If I now try to use slattach also on this side I got:
slattach -v /dev/pts/3
slattach: tty_open(/dev/pts/3, RW): Input/output error
I traced with strace:
5505 write(1, "slattach: tty_open: trying to op"..., 46) = 46
5505 open("/dev/pts/3", O_RDWR|O_NONBLOCK) = -1 EIO (Input/output error)
5505 write(2, "slattach: tty_open(/dev/pts/3, R"..., 55) = 55
5505 exit_group(3)
This happens on different Ubuntu releases; I tested 10.04 and 11.04, and both fail.
What am I doing wrong?
You may want to take a look at the man page pty(7).
Basically, /dev/ptmx uses the Unix 98 pseudo-terminal interface and requires that your program uses grantpt(3) and unlockpt(3). Here, slattach (the one that opens /dev/ptmx, not the other one) doesn't do so, and any program that tries to open the slave pseudo-terminal associated to the master will fail, as you experienced.
You can force slattach to do grantpt() and unlockpt() by overloading the open() call with an external routine; see this example.
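To see the master/slave relationship outside slattach, the following sketch opens a pseudo-terminal pair from Python. It relies on os.openpty(), which returns a slave descriptor that is already usable, so the grant and unlock steps mentioned above do not have to be done by hand. The function name open_pty_pair is hypothetical, and this is not a replacement for slattach.

```python
import os
import tty


def open_pty_pair():
    """Open a pseudo-terminal pair and return (master_fd, slave_fd, slave_name)."""
    master_fd, slave_fd = os.openpty()
    # Raw mode: no echo and no line buffering, so bytes pass through unchanged.
    tty.setraw(slave_fd)
    return master_fd, slave_fd, os.ttyname(slave_fd)


def demo():
    master_fd, slave_fd, slave_name = open_pty_pair()
    try:
        os.write(master_fd, b"ping")    # data written to the master ...
        data = os.read(slave_fd, 16)    # ... shows up on the slave side
        return slave_name, data
    finally:
        os.close(slave_fd)
        os.close(master_fd)
```

The returned slave name is the path (for example under /dev/pts) that a second program could open while the first keeps the master descriptor open.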

Getting CPU utilization information

How could I get the CPU utilization with time info of a process in linux? Basically I want to let my application run overnight. At the same time, I would like to monitor the CPU utilization during the period the application is run.
I tried top | grep appName >& log, but nothing ends up in the log. Could someone help me with this?
Thanks.
vmstat and iostat can both give you periodic information of this nature; I would suggest either setting up the number of times manually, or putting a single poll into a cron job, and then redirecting the output to a file:
vmstat 20 4320 >> cpu_log_file
This would give you a snapshot of usage every 20 seconds for 24 hours.
Install the sysstat package and run sar:
nohup sar -o output.file 12 8 >/dev/null 2>&1 &
Use the top or watch command:
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS USER
10764 top 8.4 00:01.04 1/1 0 24 33 2000K 244K 2576K 17M 2378M 10764 10719 running 0 9908+ 54 564790+ 282365+ 3381+ 283412+ 838+ 27 root
10763 taskgated 0.0 00:00.00 2 0 25 27 432K 244K 1004K 27M 2387M 10763 1 sleeping 0 376 60 140 60 160 109 11 0 root
Write a program that invokes your process and then calls getrusage(2) and reports statistics for its children.
You can monitor the time used by your program with top while it is running.
Alternatively, you can launch your application with the time command, which prints the total amount of CPU time used by your program when it exits. Just type time ./my_app instead of ./my_app.
For more information, see man 1 time.
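If per-process samples with timestamps are wanted, the numbers top shows can also be read directly from /proc/PID/stat. The sketch below is one possible approach on Linux, not what the tools above do internally; the function names and the log format are assumptions. In that file, utime and stime are fields 14 and 15, counted in clock ticks.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/PID/stat line."""
    # The command name is in parentheses and may contain spaces or ')',
    # so split after the last ')'. The remainder starts at field 3 (state).
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])  # fields 14 (utime) and 15 (stime)


def cpu_seconds(pid):
    """Total CPU time consumed so far by `pid`, in seconds."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")


def log_cpu(pid, logfile, interval_s=20.0, samples=3):
    """Append 'timestamp percent' lines to `logfile`, one per interval."""
    prev = cpu_seconds(pid)
    with open(logfile, "a") as log:
        for _ in range(samples):
            time.sleep(interval_s)
            cur = cpu_seconds(pid)
            percent = 100.0 * (cur - prev) / interval_s
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {percent:.1f}\n")
            log.flush()
            prev = cur
```

For an overnight run, log_cpu would be called with the application's PID and a sample count large enough to cover the night.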

core dump files on Linux: how to get info on opened files?

I have a core dump file from a process that has probably a file descriptor leak (it opens files and sockets but apparently sometimes forgets to close some of them). Is there a way to find out which files and sockets the process had opened before crashing? I can't easily reproduce the crash, so analyzing the core file seems to be the only way to get a hint on the bug.
If you have a core file and you have compiled the program with debugging options (-g), you can see where the core was dumped:
$ gcc -g -o something something.c
$ ./something
Segmentation fault (core dumped)
$ gdb something core
You can use this to do some post-mortem debugging. A few gdb commands: bt prints the stack, and fr jumps to a given stack frame (see the output of bt).
Now, if you want to see which files are open at the time of a segmentation fault, handle the SIGSEGV signal and, in the handler, dump the contents of the /proc/PID/fd directory (e.g. with system("ls -l /proc/PID/fd") or execv).
With this information at hand, you can more easily find what caused the crash, which files were open, and whether the crash and the file descriptor leak are connected.
Your best bet is to install a signal handler for whatever signal is crashing your program (SIGSEGV, etc.).
Then, in the signal handler, inspect /proc/self/fd, and save the contents to a file. Here is a sample of what you might see:
Anderson cxc # ls -l /proc/8247/fd
total 0
lrwx------ 1 root root 64 Sep 12 06:05 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Sep 12 06:05 11 -> socket:[124061]
lrwx------ 1 root root 64 Sep 12 06:05 12 -> socket:[124063]
lrwx------ 1 root root 64 Sep 12 06:05 13 -> socket:[124064]
lrwx------ 1 root root 64 Sep 12 06:05 14 -> /dev/driver0
lr-x------ 1 root root 64 Sep 12 06:05 16 -> /temp/app/whatever.tar.gz
lr-x------ 1 root root 64 Sep 12 06:05 17 -> /dev/urandom
Then you can return from your signal handler, and you should get a core dump as usual.
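The inspection step itself can be sketched as below. This is a Python illustration of reading /proc/self/fd only; it does not show the signal-handler wiring for a C program, and the function names are hypothetical. It assumes a Linux /proc filesystem.

```python
import os


def snapshot_open_fds(pid="self"):
    """Return {fd_number: link_target} for the open descriptors of `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink(),
            # e.g. the one used internally to list the directory.
            continue
    return result


def save_fd_snapshot(path, pid="self"):
    """Write one 'fd -> target' line per open descriptor to `path`."""
    snapshot = snapshot_open_fds(pid)
    with open(path, "w") as out:
        for fd in sorted(snapshot):
            out.write(f"{fd} -> {snapshot[fd]}\n")
    return len(snapshot)
```

The saved file has the same kind of information as the ls -l listing above and can be kept next to the core dump.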
One of the ways I jump to this information is just running strings on the core file. For instance, when I was running file on a core recently, due to the length of the folders I would get a truncated arguments list. I knew my run would have opened files from my home directory, so I just ran:
strings core.14930|grep jodie
But this is a case where I had a needle and a haystack.
If the program forgot to close those resources it might be because something like the following happened:
fd = open("/tmp/foo", O_CREAT | O_WRONLY, 0644);
// do stuff
fd = open("/tmp/bar", O_CREAT | O_WRONLY, 0644); // Oops, forgot to close(fd) first
Now the file descriptor for foo is no longer stored anywhere in memory.
If this didn't happen, you might be able to find the file descriptor number, but that is not very useful, because descriptors are continuously reused. By the time you get to debug, you won't know which file a number referred to at the time.
I really think you should debug this live, with strace, lsof and friends.
If there is a way to do it from the core dump, I'm eager to know it too :-)
You can try using strace to see the open, socket and close calls the program makes.
Edit: I don't think you can get the information from the core; at most it will have the file descriptors somewhere, but this still doesn't give you the actual file/socket. (Assuming you can distinguish open from closed file descriptors, which I also doubt.)
Recently, while troubleshooting an error, a customer gave me a core dump that had been generated on his filesystem and then became unavailable. To scan the file and read its contents quickly, I used the command
strings core.67545 > coredump.txt
and was later able to open the resulting file in a text editor.
A core dump is a copy of the memory the process had access to when it crashed. Depending on how the leak occurs, the process might have lost its references to the handles, so the dump may prove to be useless.
lsof lists all currently open files in the system; you could check its output to find leaked sockets or files. Yes, you'd need to have the process running. You could run it under a specific username to easily tell which open files belong to the process you are debugging.
I hope somebody else has better information :-)
Another way to find out what files a process has opened - again, only during runtime - is looking into /proc/PID/fd/ , which contains symlinks to open files.

Resources