Where could I find the oops info from kernel logs - linux

I am a newer of driver development. I have configured my linux kernel according to the Linux Device Driver chaper 4, enabled a lot of debug configuration. When I try to test a driver written by me, the kernel issues an oops. This oops, however, immediately flushed by chunks of other debug information. So, where could I find out the oops info which occurred in a flash.
By the way, can anyone explain the meaning of debug information below?
[ 1698.129712] evbug: Event. Dev: input0, Type: 0, Code: 0, Value: 0
This type of message flushed by screen and I even can't stop them.

To avoid a lot of useless information (in your case) you have to enable only and only what you really need to debug your module. I highly recommend to disable everything you enabled back. Then case-by-case you may enable debug features.
Next, there is a nice framework in kernel called Dynamic Debug. It allows at runtime enable or disable certain debug messages (be sure you have CONFIG_DYNAMIC_DEBUG=y in the Linux kernel configuration). More detailed description is available in Documentation/dynamic-debug-howto.txt.
evbug is a module to monitor input events in the kernel. There is one of the message it can issue. It's very simple one you may check at drivers/input/evbug.c. Unfortunately, it uses printk() calls directly and you can't manipulate its output through dynamic debug.
At the end the answer to your topic question is check output of dmesg command. But be aware that the kernel buffer for output is small enough and if you have a lot of logs you may miss some of them.

Related

ncurses disable kernel messages on console screen?

Im looking for a way how to get rid of (kernel?) messages that appear in my ncurses app. I wrote the app myself, so i would prefer a API that redirects these messages to /dev/null. I mean messages like, a USB stick that is inserted.
I tried to add this, but unfortunately it doesn't work
freopen("/dev/null", "w", stderr);
I'm not running X, just ncurses direct from the console.
I mean messages like, a USB stick that is inserted.
Thanks!
UPDATE 1:
Someone votes to close this question because it would not be related to programming. But it is, i wrote the ncurses app myself, i'm looking for a way how to disable the kernel message. I updated the question.
UPDATE 2:
Let me explain what i'm doing, and whats the problem in more detail:
I'm using Tiny Core linux, thats after boots starts (self written) ncurses program. Now when you for example connect a USB drive, a message (i suspect kernel) is shown over my program. I guess the message is written straight into the framebuffer. Im using TC 5.x since i need 32 bit, im running as root and have full access to the os.
You should be able to use openvt to have your program run on a new Virtual Terminal.
I'll also note that it should be possible to embed control for the VTs yourself if you prefer to break the external dependency, but note that structures used may not be stable between kernel versions, and may require recompilation.
See the KBD project's sources, specifically openvt.c to see how it works.
Try configuring the kernel through boot parameters with the option:
loglevel=3 (or a lower value)
0 (KERN_EMERG) system is unusable
1 (KERN_ALERT) action must be taken immediately
2 (KERN_CRIT) critical conditions
3 (KERN_ERR) error conditions
4 (KERN_WARNING) warning conditions
5 (KERN_NOTICE) normal but significant condition
6 (KERN_INFO) informational
7 (KERN_DEBUG) debug-level messages
source: https://www.kernel.org/doc/Documentation/kernel-parameters.txt
See also: Change default console loglevel during boot up
It might be impossible to block some other process with sufficient access from writing to /dev/console but you may be able to redefine console as some other device, at boot time by setting console=ttyS0 (first serial port), see:
https://unix.stackexchange.com/questions/60641/linux-difference-between-dev-console-dev-tty-and-dev-tty0
Also if we know exactly which software is sending the message it may be possible to reconfigure it (possibly dynamically) but it would help to know the version and edition of Tiny Core Linux you are using?
E.g. this website has a "Core", "TinyCore" and "CorePlus" versions 1.x up to 7
http://tinycorelinux.net/downloads.html
This would help reproducing the exact same behavior and testing potential solutions.

Core files generated by linux kernel modules

I am trying to load a kernel module (out-of-tree) and dmesg shows a panic. The kernel is still up though. I guess the module panic'd.
Where to find the core file? I want to use gdb and see whats the problem.
Where to find the core file?
Core files are strictly a user-space concept.
I want to use gdb and see whats the problem.
You may be looking for KGDB and/or Kdump/Kexec.
Normally, whenever the coredump was generated, it will state "core dumped". This could be one high level easy way to confirm whether coredump got generated however, this statement alone cannot guarantee on coredump file availability. The location where coredump is generated is specified through core_pattern to kernel via sysctl. You need to check the information present in core_pattern of your system. Also, note that in case of Ubuntu, it appears that the coredump file size is kept as zero by default which will avoid generation of coredump. So, you might need to check the corefile size ulimit and change it to 'ulimit -c unlimited', if it is zero. The manpage http://man7.org/linux/man-pages/man5/core.5.html explains about various reasons due to which coredump shall not get generated.
However, from your explanation, it appears that you are facing 'kernel oops' as the kernel is still up(unstable state) even though a particular module got panic'd/killed. In such cases, kernel shall print an oops message. Refer to link https://www.kernel.org/doc/Documentation/oops-tracing.txt that has information regarding the kernel oops messages.
Abstract from the link: Normally the Oops text is read from the kernel buffers by klogd and
handed to syslogd which writes it to a syslog file, typically
/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd
dies, in which case you can run dmesg > file to read the data from the
kernel buffers and save it. Or you can cat /proc/kmsg > file, however
you have to break in to stop the transfer, kmsg is a "never ending
file".
printk is used for generating the oops messages. printk does tagging of severity by means of different loglevels /priorities and allows the classification of messages according to their severity. (Different priorities are defined in file linux/kernel.h or linux/kern_levels.h, in form of macros like KERN_EMERG, KERN_ALERT, KERN_CRIT etc..)So, you may need to check the default logging levels in system by using cat /proc/sys/kernel/printk and change it as per your requirement. Also, check whether the logging daemons are up and incase you want to debug kernel, ensure that the kernel is compiled with CONFIG_DEBUG_INFO.
The method to use GDB to find the location where the kernel panicked or oopsed in ubuntu is in the link https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks which can be one of the method that can be used by you for debugging kernel oops.
there won't be a core file.
You should follow the stack trace in kernel messages. type dmesg to see it.

Limitations on printing to kernel logs

I'm working on a linux device driver (kernel version 2.6.32-37). I debug my code mostly by printing to the kernel logs (using printk). everything goes well, until my computer suddenly stop responding. I've checked it over and over again and my code seems to be correct.
my question is:
is it possible that too many printings to the kernel log might cause the computer to stop responding?
Thanks a lot!
Omer
I doubt the problem is caused by printk, certainly using printk in itself slows down the whole code but it won't crash your system.
Here's a quote from Ubuntu Kernel Debugging Tricks:
The internal kernel console message buffer can sometimes be too small to capture all of the printk messages, especially when debug code generates a lot of printk messages. If the buffer fills up, it wraps around and one can lose valueable debug messages.
As you can read, when printing too much data, you'll simply start writing over some old data that you wanted to see in the log-file; that's a problem because some debugging messages will disappear but not troubling enough to crash the whole thing.
I suggest you double check your code again, try to trace when/where it's crashing and if you can't fix the problem post a question here or in some kernel hacking mailing list.
P.S Also gotta mention, you need to be careful where you're putting your printk statements as some code areas might not tolerate the delay caused by it and THIS might cause further problems leading to a freeze/crash.

How to record the system information before system hang? [duplicate]

I have an embedded board with a kernel module of thousands of lines which freeze on random and complexe use case with random time. What are the solution for me to try to debug it ?
I have already try magic System Request but it does not work. I guess that the explanation is that I am in a loop or a deadlock in a code where hardware interrupt is disable ?
Thanks,
Eva.
Typically, embedded boards have a watch dog. You should enable this timer and use the watchdog user process to kick the watch dog hard ware. Use nice on the watchdog process so that higher priority tasks must relinquish the CPU. This gives clues as to the issue. If the device does not reset with a watch dog active, then it maybe that only the network or serial port has stopped communicating. Ie, the kernel has not locked up. The issue is that there is no user visible activity. The watch dog is also useful if/when this type of issue occurs in the field.
For a kernel lockup case, the lockup watchdogs kernel features maybe useful. This will work if you have an infinite loop/deadlock as speculated. However, if this is custom hardware, it is also possible that SDRAM or a peripheral device latches up and causes abnormal bus activity. This will stop the CPU from fetching proper code; obviously, it is tough for Linux to recover from this.
You can combine the watchdog with some fallow memory that is used as a trace buffer. memmap= and mem= can limit the memory used by the kernel. A driver/device using this memory can be written that saves trace points that survive a reboot. The fallow memory's ring buffer is dumped when a watchdog reset is detected on kernel boot.
It is also useful to register thread notifiers that can do a printk on context switches, if the issue is repeatable or to discover how to make the event repeatable. Once you determine a sequence of events that leads to the lockup, you can use the scope or logic analyzer to do some final diagnosis. Or, it maybe evident which peripheral is the issue at this point.
You may also set panic=-1 and reboot=... on the kernel command line. The kdump facilities are useful, if you only have a code problem.
Related: kernel trap (at web archive). This link may no longer be available, but aren't important to this answer.

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files, for multiple reasons: I need to collect them over the network, for numerous (thousands) of clients; furthermore, these are embedded devices with little SD cards, and GPRS modems for the network connection. So anything above ~200k is out of question.
EDIT: I am working on an embedded device which runs linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in breakpad at the moment, so google breakpad is not an option
I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).
I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use add22line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
Alternatively, I found I could use the /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stacktrace etc. are saved in the core. See more in this answer
I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with there 2 solutions (it is better than nothing..) My next moves will be:
see how difficult would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).

Resources