How to debug ARM Linux kernel (msleep()) lock up? - linux

I am first of all looking for debugging tips. If someone can point out the one line of code to change or the one peripheral config bit to set to fix the problem, that would be terrific. But that's not what I'm hoping for; I'm looking more for how to go about debugging it.
Googling "msleep hang linux kernel site:stackoverflow.com" yields 13 answers and none of them is on point, so I think I'm safe to ask.
I rebuilt an ARM Linux kernel for an embedded TI AM1808 ARM processor (Sitara/DaVinci?). I see all of the boot log up to the login: prompt coming out of the serial port, but trying to log in gets no response; it doesn't even echo what I typed.
After lots of debugging I arrived at the kernel and added debugging code between lines 828 and 830 (yes, the kernel version is 2.6.37). This is the point in kernel mode right before '/sbin/init' is called:
http://lxr.linux.no/linux+v2.6.37/init/main.c#L815
Right before line 830 I added an infinite loop with a printk and I see the results. I have let it run for a couple of hours and it counts to about 2 million. Sample line:
dbg:init/main.c:1202: 2088430
So it has spit out 60 million bytes without problem.
However, if I add msleep(1000) in the loop, it prints only once, i.e. msleep() does not return.
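Roughly, the test loop looks like this (a sketch of what I added, not the exact code; the counter name is made up, and msleep() needs linux/delay.h if it isn't already included):

{
	unsigned long n = 0;

	while (1) {
		printk("dbg:init/main.c:%d: %lu\n", __LINE__, n++);
		/* msleep(1000); */	/* with this line enabled it prints only once */
	}
}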
Details:
Adding a conditional printk at line 4073 in the scheduler, conditioned on a flag that gets set at the start of the forever test loop described above, shows that schedule() is no longer called once it hangs:
http://lxr.linux.no/linux+v2.6.37/kernel/sched.c#L4064
The only selections under .config/'Device Drivers' are:
Block devices
I2C support
SPI support
The kernel and its ramdisk are loaded using uboot/TFTP.
I don't believe it tries to use the Ethernet.
Since all of this happens before '/sbin/init', very little should be happening.
More details:
I have a very similar board with the same CPU. I can run the same uImage and the same ramdisk and it works fine there. I can login and do the usual things.
I have run a memory test (64 MB total; limit the kernel to 32 MB and test the other 32 MB; it's a single-chip DDR2) and found no problem.
One board uses UART0 and the other UART2, but the boot log comes out of both, so that should not be the problem.
Any debugging tips are greatly appreciated.
I don't have an appropriate JTAG so I can't use that.

If msleep() doesn't return, or never makes it to schedule(), then to debug this we can follow the call chain.
msleep() calls schedule_timeout_uninterruptible(timeout), which calls schedule_timeout(timeout), which in the default case exits without calling schedule() if the timeout in jiffies passed to it is negative, so that is one thing to check.
If the timeout is positive, then setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); is called, followed by __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);, before schedule() is called.
If we aren't getting to schedule(), then something must be going wrong in either setup_timer_on_stack() or __mod_timer().
The call chain for setup_timer_on_stack() is: setup_timer_on_stack() calls setup_timer_on_stack_key(), which calls init_timer_on_stack_key(), which is either external (if CONFIG_DEBUG_OBJECTS_TIMERS is enabled) or calls init_timer_key(timer, name, key), which calls debug_init() followed by __init_timer(timer, name, key).
__mod_timer() first calls timer_stats_timer_set_start_info(timer); and then makes a whole lot of other function calls.
I would advise starting by putting a printk or two in schedule_timeout(), probably on either side of the setup_timer_on_stack() call or on either side of the __mod_timer() call.
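A minimal sketch of that instrumentation, assuming 2.6.37's schedule_timeout() in kernel/timer.c (placement is approximate and the message text is made up; KERN_ERR is used so the messages reach the console regardless of loglevel):

	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
	printk(KERN_ERR "dbg: schedule_timeout: timer set up, expire=%lu\n", expire);
	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
	printk(KERN_ERR "dbg: schedule_timeout: __mod_timer done, calling schedule()\n");
	schedule();
	printk(KERN_ERR "dbg: schedule_timeout: schedule() returned\n");

Whichever message is the last one to appear tells you which call never comes back.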

This problem has been solved.
With liberal use of printk it was determined that schedule() does indeed switch to another task, the idle task. In this instance, this being embedded Linux, the original code base I copied from installed an idle task. That idle task is apparently not appropriate for my board and locks up the CPU, thus causing the hang. Commenting out the call to the idle task at
http://lxr.linux.no/linux+v2.6.37/arch/arm/mach-davinci/cpuidle.c#L93
works around the problem.

Related

what dequeues requests queued by blk_execute_rq_nowait

I'm working on increasing a timeout in the SCSI mid-layer driver in Linux. At least, that's the quest. I'm familiarizing myself with the driver. This is turning out to be a formidable task. The Linux Documentation Project seems to be woefully out of date (the tour of the kernel is based on v 1.0.9 ... really?). I also found this from kernel.org. I'm not sure how up-to-date that is either.
A description of the problem is that we send SCSI commands through sg. Any timeout specified in sg_io_hdr_t seems to be ignored if it's longer than 30 seconds. I haven't seen anything in the sg driver code which would override the requested timeout with 30 seconds if the timeout requested is larger. Normally, we submit commands using the write/poll/read method through sg. I've traced through the sg code and I believe calling write(2) takes the following path:
sg_write()
sg_common_write()
blk_execute_rq_nowait()
By no means am I 100% positive of this, but it does seem plausible. My question to the kernel developers here is: what call should I grep for that would dequeue this request? I haven't found anything in the references I do have which states this.
Ultimately, I'm looking for where, in the mid-layer, requests like this are dequeued for transmission to the lower layer. My premise is that, if I know what call dequeues requests from the queue used in blk_execute_rq_nowait(), then I can grep through the appropriate source files looking for that and move on from there. (If someone would be kind enough to tell me whether the files listed in the first link are the correct list of files for the SCSI mid-layer in Linux, I thank you in advance. My kernel version: 2.6.32.)
Do I have things incorrect? Are requests like this just taken by the lower layer? I assume "no" because this seems like what the mid-layer is supposed to do: route these things to the proper place.
blk_execute_rq() - this call inserts a request at the back of the I/O scheduler queue, so you should be looking into the I/O scheduler code that dequeues the requests. You may want to start off by looking at which I/O scheduler your system is running:
cat /sys/block/sda/queue/scheduler
(the output should be something like noop [deadline] cfq), check the settings under
ls /sys/block/sda/queue/scheduler
and thereafter look into that scheduler's code.
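If it helps orient the grep: in 2.6.32 a queued request is ultimately pulled off the queue by the queue's request_fn, which for SCSI devices is scsi_request_fn() in drivers/scsi/scsi_lib.c, and (from memory, so verify against your tree) the dequeue happens via blk_peek_request()/blk_start_request(). A rough, simplified sketch of the shape of that loop, not the actual kernel code:

static void scsi_request_fn(struct request_queue *q)
{
	struct request *req;

	for (;;) {
		req = blk_peek_request(q);	/* look at the head of the queue */
		if (!req)
			break;
		/* ... device/host readiness checks ... */
		blk_start_request(req);		/* this is what actually dequeues it */
		/* ... build the SCSI command and hand it to the low-level driver ... */
	}
}

So blk_peek_request and blk_start_request are reasonable names to grep for.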

Ways to get strace-like output for Heisenbug

I'm chasing a Heisenbug in a Linux x64 process. (Attaching to the process with a debugger or strace makes the problem never occur.) I've been able to put in an infinite loop when the code detects the fault and attach with gdb that way, but it just shows me that a file descriptor (fd) that should be working is no longer valid. I really want to get a history of the fd, hence trying strace, but of course that won't let the problem reproduce.
Other factors indicate that the problem with gdb/strace is timing. I've tried running strace with -etrace=desc or even -eraw=open and outputting to a ramdisk to see if that would reduce the strace overhead in the right way to trigger the problem, but no success. I tried running strace+, but it is a good order of magnitude slower than strace.
The process I'm attaching to is partly a commercial binary that I don't have source access to, and partly code I preload into the process space, so printf-everywhere isn't 100% possible.
Do you have suggestions for how to trace the fd history?
Update: added note about strace+
I solved the tracing problem by:
Preloading wrapper stub functions around the relevant system calls, open(), close() and poll()
Logging the relevant information to a file created on a ramdisk.
(The actual issue was a race, with the kernel's poll() trying to access pollfd memory and returning EFAULT.)
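For anyone doing something similar, here is a minimal sketch of such a preload wrapper (only open() and close() shown; poll() follows the same pattern, and the log path /dev/shm/fdtrace.log is just an illustrative ramdisk location):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

static FILE *logf;

static void log_line(const char *fmt, ...)
{
	va_list ap;

	if (!logf)
		logf = fopen("/dev/shm/fdtrace.log", "a");	/* ramdisk path: illustrative */
	if (!logf)
		return;
	va_start(ap, fmt);
	vfprintf(logf, fmt, ap);
	va_end(ap);
	fflush(logf);
}

int open(const char *path, int flags, ...)
{
	static int (*real_open)(const char *, int, ...);
	mode_t mode = 0;
	int fd;

	if (!real_open)
		real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
	if (flags & O_CREAT) {
		va_list ap;
		va_start(ap, flags);
		mode = va_arg(ap, mode_t);
		va_end(ap);
	}
	fd = real_open(path, flags, mode);
	log_line("open(\"%s\") = %d\n", path, fd);
	return fd;
}

int close(int fd)
{
	static int (*real_close)(int);

	if (!real_close)
		real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
	log_line("close(%d)\n", fd);
	return real_close(fd);
}

Build it with something like gcc -shared -fPIC -o fdtrace.so fdtrace.c -ldl and start the target with LD_PRELOAD=./fdtrace.so.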

request_irq succeeds but interrupt is never detected

I am running embedded Linux 3.2.6 on an ARM processor. I am using a modified version of Atmel's serial driver to control the 4 USART ports on my device. When I use the driver compiled into the kernel, all works fine. But I want to run the driver as a kernel module instead. I made all of the necessary changes and disabled the internal driver, and everything seems fine. The 4 tty devices are registered successfully and I can see that all of my probe and initialization functions work correctly.
So here's the problem:
When I try to write to any of the devices, my "start transmit" function gets called but then waits for an interrupt from the USART, which never occurs. So the write just hangs, and using a logic analyzer I can see that RTS gets asserted but no bytes show up on the tx line. I know that my call to request_irq succeeds, and yet I never see any of the IRQ entries in /proc/interrupts. In the driver, I have also tried using request_irq to register a separate interrupt handler for a GPIO line, and this works fine.
I know that this is a problem that is probably hard to diagnose, but I am looking for ANY possible suggestions that could lead me in the right direction to finding a solution. Let me know if you need any clarifications. Thank you
The symptoms read like a peripheral clock that has not been enabled (or has been turned off): the device can be initialized without errors and an I/O operation can be set up, but the device doesn't do anything; it plays dead. Since no I/O ever starts, you're never going to get an interrupt indicating completion!
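As a hedged sketch (the clock name and the error message are illustrative, not taken from the OP's driver), checking and enabling the peripheral clock from the module's probe with the clk API available in 3.2 looks something like this:

	struct clk *clk;

	clk = clk_get(&pdev->dev, "usart");	/* connection id is SoC/board specific */
	if (IS_ERR(clk)) {
		dev_err(&pdev->dev, "could not get USART clock\n");
		return PTR_ERR(clk);
	}
	clk_enable(clk);	/* with the clock gated off, the USART plays dead: no bits, no interrupts */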
The other thing to check is the conditional compilation directives for HW configuration structures in your arch/arm/mach-xxx/zzz_devices.c file.
Make sure that the serial port structures have something like:
#if defined(CONFIG_SERIAL_ATMEL) || defined(CONFIG_SERIAL_ATMEL_MODULE)
and not just
#if defined(CONFIG_SERIAL_ATMEL)
Addendum
I could be wrong but the clock shouldn't have any effect on the CTS pin causing an interrupt, right?
Not right.
These digital circuits are synchronous state machines: without a clock, a change-of-state by an input cannot be processed.
Also, SoCs and modern uControllers use the peripheral clocks as on/off switches for those integrated peripherals. There is often way more functionality, i.e. peripherals, on the silicon chip than can actually be used, mostly due to insufficient quantity of pins to the board. So disabling the clocks to unused devices is employed to reduce power consumption.
You are far too focused on interrupts.
You do not have a solvable interrupt problem; those are secondary failures.
The lack of output when attempting to transmit is far more significant and revealing.
The root cause is probably a flawed configuration of the USART devices, since transmitting bits is an automatic operation for a configured & operational USART.
If the difference between not-working versus working is loadable module versus static linking, then the root cause is going to be something fundamental (and trivial) like my two suggestions.
Also your lack of acknowledgement regarding the #if defined(), e.g. you didn't respond with "Oh yeah, we already knew that", raises a gigantic red flag that says "Fix me first!"
Addendum 2
I'm tempted to delete this answer after discovering that the Atmel serial driver cannot be configured/built as a loadable module using make menuconfig (which is the premise for half of the answer). (Of course the Kconfig file could be hacked to make the config variable tristate instead of boolean to overcome the module restriction.) I've left a comment for the OP. But I also wanted to preserve the comment to Mr. Stratton pointing out how symbols in the .config file are (not) used.
So I did finally fix my problem. Thank you for the responses; none of them directly solved my problem, but they did prompt further examination of my code. After some trial and error I finally got it working. I had originally moved the platform_device structures for each USART from /mach-at91/xxx_devices.c to my loadable module. For some reason the structures weren't getting the correct data to map to the hardware, I suppose because the symbols from the kernel weren't being linked correctly (I never got an error message, though), and so some of the registration functions weren't even getting called. I ended up moving the structures and the platform_device_register calls back into the devices file. I also decided to keep the driver for the console built in, using the original atmel_serial.c driver. I had to change the platform_device name for the console in both the devices file and the built-in atmel_serial.c file so that it would not conflict with my USART ports driver. I found that changing the platform_device and platform_driver name for the USARTs to anything but "atmel_usart" resulted in USART transmission failing. I really don't understand why, but I'm just leaving it as atmel_usart so it works.
Thanks again to everybody who responded to my problem.

kernel hangs up at boot indefinitely

I have configured the kernel with the Linux SLOB allocator to implement a best-fit algorithm. I built and installed the kernel image so that I can boot from it next time. Now when I try to boot this kernel it hangs indefinitely; the cursor does not even blink. The following messages are printed before it hangs:
[0.000325] pid_max: default: 32768 minimum: 301
[0.001461] Security Framework initialized
[0.002108] AppArmor: AppArmor initialized
After this message the cursor hangs indefinitely. I would like to know some kernel debugging tricks that would help me navigate the problem, or a good read on the topic.
I have also configured kdb but do not know how to use it in such a condition. Any help is appreciated!
Additional details:
I have modified the slob_page_alloc() function, which is in turn called by slob_alloc(), to implement the best-fit algorithm. I am using v3.6.2.
Basically, you will need to stub out (or mock up) the external routines called by the best-fit algorithm code so that the best-fit code can be dropped into a test program. Then use some kind of C unit-test suite and a C coverage tool to help ensure that you have carefully tested all branches and all states of the code. (Unfortunately, I have no suggestions for such tools at this time.)
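As an illustration of that approach (the structure and the function below are stand-ins, not the real slob_page_alloc() internals), the best-fit selection logic can be lifted into a tiny user-space harness like this:

#include <assert.h>
#include <stddef.h>

struct free_block {
	size_t units;
	struct free_block *next;
};

/* toy best fit: return the smallest free block that still satisfies 'need' */
static struct free_block *best_fit_block(struct free_block *head, size_t need)
{
	struct free_block *best = NULL;

	for (; head; head = head->next)
		if (head->units >= need && (!best || head->units < best->units))
			best = head;
	return best;
}

int main(void)
{
	struct free_block c = { 8, NULL };
	struct free_block b = { 4, &c };
	struct free_block a = { 16, &b };

	assert(best_fit_block(&a, 3) == &b);	/* smallest block that fits wins */
	assert(best_fit_block(&a, 9) == &a);
	assert(best_fit_block(&a, 32) == NULL);	/* nothing large enough */
	return 0;
}

Once the selection logic passes tests like these, it is much easier to trust when dropped back into mm/slob.c.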

select() inside infinite loop uses significantly more CPU on RHEL 4.8 virtual machine than on a Solaris 10 machine

I have a daemon app written in C that is currently running with no known issues on a Solaris 10 machine. I am in the process of porting it over to Linux. I have had to make minimal changes. During testing it passes all test cases. There are no issues with its functionality. However, when I view its CPU usage when 'idle', on my Solaris machine it is using around .03% CPU. On the virtual machine running Red Hat Enterprise Linux 4.8, that same process uses all available CPU (usually somewhere in the 90%+ range).
My first thought was that something must be wrong with the event loop. The event loop is an infinite loop (while(1)) with a call to select(). The timeval is set up so that timeval.tv_sec = 0 and timeval.tv_usec = 1000. This seems reasonable enough for what the process is doing. As a test I bumped timeval.tv_sec to 1. Even after doing that I saw the same issue.
Is there something I am missing about how select works on Linux vs. Unix? Or does it work differently with an OS running on a virtual machine? Or maybe there is something else I am missing entirely?
One more thing: I am not sure which version of VMware Server is being used. It was just updated about a month ago, though.
I believe that Linux returns the remaining time by writing it into the time parameter of the select() call and Solaris does not. That means that a programmer who isn't aware of the POSIX spec might not reset the time parameter between calls to select.
This would result in the first call having a 1000 usec timeout and all subsequent calls using a 0 usec timeout.
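A minimal sketch of the fix, assuming a plain select() loop like the one described (watching stdin here is just a stand-in for the real descriptors); the key point is that the timeval is re-initialized on every iteration because Linux overwrites it with the time remaining:

#include <stdio.h>
#include <sys/select.h>

int main(void)
{
	for (;;) {
		fd_set rfds;
		struct timeval tv;

		FD_ZERO(&rfds);
		FD_SET(0, &rfds);	/* stdin, as a placeholder for the daemon's fds */

		tv.tv_sec = 0;		/* reset the timeout EVERY iteration */
		tv.tv_usec = 1000;

		if (select(1, &rfds, NULL, NULL, &tv) < 0) {
			perror("select");
			break;
		}
		/* ... service whatever descriptors are ready ... */
	}
	return 0;
}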
As Zan Lynx said, the timeval is modified by select on Linux, so you should reassign the correct value before each select call. Also, I suggest checking whether some of the file descriptors are in a particular state (e.g. end of file, peer connection closed, ...). Maybe the port is exposing a latent bug in the analysis of the returned values (FD_ISSET and so on). It happened to me some years ago in a port of a select-driven loop: I was using the returned value in the wrong way, and a closed fd was added to the rd_set, causing select to fail. On the old platform the wrong fd happened to have a value higher than maxfd, so it was ignored. Because of the same bug, the program didn't recognize the select failure (select() == -1) and looped forever.
Bye!
