Kernel Oops 17 for ARM - linux

I have been investigating this problem for a few days, but I have not been able to find the cause.
linux version : 2.6.39
board : at91sam9x25
Processor : ARM926EJ-S rev 5
After the process has run successfully for more than a day, the kernel suddenly prints an oops.
Look at r0: it holds the struct kmem_cache pointer.
The kmem_cache pointer is initialized when Linux boots,
so r0 should never change.
But in the oops message r0 is 0, which should be impossible.
I don't know why r0 was changed to 0.
How can I debug this, or what is the cause?
Please help me.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = cfb1c000
[00000000] *pgd=2fbfa831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
last sysfs file:
Modules linked in:
CPU: 0 Not tainted (2.6.39 #216)
PC is at kmem_cache_alloc+0x24/0x98
LR is at getname_flags+0x20/0xe8
pc : [] lr : [] psr: 40000093
sp : cfbb5f40 ip : 00000000 fp : 00000000
r10: 00000000 r9 : cfbb4000 r8 : ffffff9c
r7 : 00000000 r6 : 00000000 r5 : 40000013 r4 : 40202e7e
r3 : 40000093 r2 : 00000100 r1 : 000000d0 r0 : 00000000
Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: 2fb1c000 DAC: 00000015
Process MEG (pid: 589, stack limit = 0xcfbb4270)
Stack: (0xcfbb5f40 to 0xcfbb6000)
5f40: 00000000 40202e7e 00000000 00000000 00000000 c00a1e9c 00014220 00000001
5f60: 00000000 00000000 00000005 c0094efc 00000000 00000011 00000000 00000000
5f80: 00000024 00000100 00000000 00000000 4020e040 4020dee0 00000005 c0030b28
5fa0: 00000000 c0030980 00000000 4020e040 40202e7e 00000000 00000000 00000050
5fc0: 00000000 4020e040 4020dee0 00000005 402123dc 00000000 be3ffea0 00000000
5fe0: 40202e87 be3ffd58 401d949c 401c3010 60000010 40202e7e 00000000 00000000
[] (kmem_cache_alloc+0x24/0x98) from [] (getname_flags+0x20/0xe8)
[] (getname_flags+0x20/0xe8) from [] (do_sys_open+0xa8/0x1ac)
[] (do_sys_open+0xa8/0x1ac) from [] (ret_fast_syscall+0x0/0x2c)
Code: e0016003 e10f5000 e3853080 e121f003 (e5901000)
---[ end trace d33e1f5c547d52cb ]---
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 17 [#2]
last sysfs file:
Modules linked in:
CPU: 0 Tainted: G D (2.6.39 #216)
PC is at kmem_cache_free+0x18/0xcc
LR is at rcu_process_callbacks+0x6c/0x84
pc : [] lr : [] psr: 20000093
sp : cf833f68 ip : cf833f30 fp : 00000000
r10: 00000000 r9 : c0058dc4 r8 : cfbff540
r7 : 20000013 r6 : c042df1c r5 : cfbff540 r4 : cfbbf760
r3 : 20000093 r2 : 00000000 r1 : cfbff540 r0 : 00000000
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 0005317f Table: 2fb1c000 DAC: 00000017
Process rcu_kthread (pid: 6, stack limit = 0xcf832270)
Stack: (0xcf833f68 to 0xcf834000)
3f60: cfbbf760 cfbff540 c042df1c cf833f94 cf833fa0 c00705d0
3f80: cf815040 cf815040 cf832000 c00706a4 c031057c 00000000 cf815040 c0058dc4
3fa0: cf833fa0 cf833fa0 cf833fd4 cf81bf74 00000000 c00705e8 00000000 00000000
3fc0: 00000000 c0058a04 c003185c 00000000 00000000 00000000 cf833fd8 cf833fd8
3fe0: 00000000 cf81bf74 c0058984 c003185c 00000013 c003185c 00020018 18000004
[] (kmem_cache_free+0x18/0xcc) from [] (rcu_process_callbacks+0x6c/0x84)
[] (rcu_process_callbacks+0x6c/0x84) from [] (rcu_kthread+0xbc/0xe4)
[] (rcu_kthread+0xbc/0xe4) from [] (kthread+0x80/0x88)
[] (kthread+0x80/0x88) from [] (kernel_thread_exit+0x0/0x8)
Code: e1a08001 e10f7000 e3873080 e121f003 (e5904000)
---[ end trace d33e1f5c547d52cc ]---
Kernel panic - not syncing: Fatal exception in interrupt
[] (unwind_backtrace+0x0/0xec) from [] (panic+0x4c/0x180)
[] (panic+0x4c/0x180) from [] (die+0x180/0x1c4)
[] (die+0x180/0x1c4) from [] (__do_kernel_fault+0x64/0x84)
[] (__do_kernel_fault+0x64/0x84) from [] (do_page_fault+0x1b8/0x1d0)
[] (do_page_fault+0x1b8/0x1d0) from [] (do_DataAbort+0x34/0x94)
[] (do_DataAbort+0x34/0x94) from [] (__dabt_svc+0x4c/0x60)
Exception stack(0xcf833f20 to 0xcf833f68)
3f20: 00000000 cfbff540 00000000 20000093 cfbbf760 cfbff540 c042df1c 20000013
3f40: cfbff540 c0058dc4 00000000 00000000 cf833f30 cf833f68 c00705d0 c0093080
3f60: 20000093 ffffffff
[] (__dabt_svc+0x4c/0x60) from [] (kmem_cache_free+0x18/0xcc)
[] (kmem_cache_free+0x18/0xcc) from [] (rcu_process_callbacks+0x6c/0x84)
[] (rcu_process_callbacks+0x6c/0x84) from [] (rcu_kthread+0xbc/0xe4)
[] (rcu_kthread+0xbc/0xe4) from [] (kthread+0x80/0x88)
[] (kthread+0x80/0x88) from [] (kernel_thread_exit+0x0/0x8)
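One place to start with an oops like this is the Code: line: the word in parentheses is the instruction at the faulting PC. The following is a minimal sketch, not something from the original post, that decodes the fields of an ARM LDR/STR instruction with an immediate offset. The struct and function names are hypothetical, and a disassembler such as objdump, or the kernel tree's scripts/decodecode, would normally do this job.

```c
#include <stdint.h>

/* Fields of an ARM (A32) LDR/STR instruction with an immediate offset. */
struct arm_ldst {
    int is_load;       /* L bit (20): 1 = LDR, 0 = STR */
    int is_byte;       /* B bit (22): 1 = byte access, 0 = word access */
    int pre_indexed;   /* P bit (24): 1 = [Rn, #imm], 0 = [Rn], #imm */
    int add_offset;    /* U bit (23): 1 = add offset, 0 = subtract it */
    unsigned rn;       /* base register, bits 19..16 */
    unsigned rd;       /* data register, bits 15..12 */
    unsigned imm12;    /* offset, bits 11..0 */
};

/*
 * Hypothetical helper: returns 0 and fills *out if insn is an
 * immediate-offset LDR/STR, or -1 for any other encoding.
 */
int decode_arm_ldst_imm(uint32_t insn, struct arm_ldst *out)
{
    if ((insn >> 28) == 0xf)            /* unconditional space (e.g. PLD) */
        return -1;
    if (((insn >> 25) & 0x7) != 0x2)    /* bits 27..25 must be 010 */
        return -1;

    out->pre_indexed = (insn >> 24) & 1;
    out->add_offset  = (insn >> 23) & 1;
    out->is_byte     = (insn >> 22) & 1;
    out->is_load     = (insn >> 20) & 1;
    out->rn          = (insn >> 16) & 0xf;
    out->rd          = (insn >> 12) & 0xf;
    out->imm12       = insn & 0xfff;
    return 0;
}
```

For the first oops, e5901000 decodes as a word load into r1 from the address in r0 with offset 0 (ldr r1, [r0]). With r0 = 00000000 that load touches virtual address 0, which matches the "NULL pointer dereference at virtual address 00000000" message. The second oops, e5904000, is the same load into r4.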

Related

Executing boot ROM functions in Linux drivers

I'm trying to execute some functions from the boot ROM on an NXP IMX6UL in a Linux device driver. I figured a device driver is the only place where I can manage this.
Currently, I map the boot ROM using devm_ioremap_resource(). I can read the ROM table in the device fine and it shows the expected values. The problem comes when I try to execute a function from there: I get a paging request error and a crash.
I get the following crash message:
Unable to handle kernel paging request at virtual address bf968f88
pgd = 8e5fa23c
[bf968f88] *pgd=b839e811, *pte=00008653, *ppte=00008453
Internal error: Oops: 8000000f [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 299 Comm: sh Not tainted 4.19.35-00007-ga99feb79b139-dirty #639
Hardware name: Freescale i.MX6 UltraLite (Device Tree)
PC is at 0xbf968f88
LR is at hab_rvt_entry+0x98/0xb4
pc : [<bf968f88>] lr : [<804ee430>] psr: 600f0033
sp : b9f85ea8 ip : 00000000 fp : 00000000
r10: b9e55e90 r9 : b9f85f78 r8 : b9a96800
r7 : 00000002 r6 : bf960000 r5 : 00008f89 r4 : bf968f89
r3 : fde952f0 r2 : fde952f0 r1 : 00000001 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 10c53c7d Table: b9ea8059 DAC: 00000051
Process sh (pid: 299, stack limit = 0x00d86b0c)
Stack: (0xb9f85ea8 to 0xb9f86000)
5ea0: 00000002 b9e55e80 00000000 00000000 b9a96800 8027b3ec
5ec0: 00000000 00000000 81004048 8027b304 002478d0 b9f85f78 00000000 002478d0
5ee0: 00000002 802032c0 00002ee7 00000000 81004048 fde952f0 81004048 7e87c490
5f00: 00235a30 80208600 000007ff 00008180 00000001 00001000 00000000 00000000
5f20: 00000000 00000000 00002ee7 00000000 00000000 fde952f0 b98f1164 00000002
5f40: b9cbb840 002478d0 b9f85f78 00000000 002478d0 8020357c 5dca454a 00000000
5f60: 81004048 b9cbb840 00000000 00000000 b9cbb840 80203794 00000000 00000000
5f80: 00000000 fde952f0 00000002 002478d0 76ec0d98 00000004 80101204 b9f84000
5fa0: 00000004 80101000 00000002 002478d0 00000001 002478d0 00000002 00000000
5fc0: 00000002 002478d0 76ec0d98 00000004 002478d0 00000002 00000000 00000000
5fe0: 00000064 7e87c9d0 76de9ce0 76e42a74 600e0010 00000001 00000000 00000000
[<804ee430>] (hab_rvt_entry) from [<8027b3ec>] (kernfs_fop_write+0xe8/0x1c8)
[<8027b3ec>] (kernfs_fop_write) from [<802032c0>] (__vfs_write+0x2c/0x160)
[<802032c0>] (__vfs_write) from [<8020357c>] (vfs_write+0xa4/0x17c)
[<8020357c>] (vfs_write) from [<80203794>] (ksys_write+0x4c/0xac)
[<80203794>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xb9f85fa8 to 0xb9f85ff0)
5fa0: 00000002 002478d0 00000001 002478d0 00000002 00000000
5fc0: 00000002 002478d0 76ec0d98 00000004 002478d0 00000002 00000000 00000000
5fe0: 00000064 7e87c9d0 76de9ce0 76e42a74
Code: ffc4 f7fd f833 e7fe (b5f0) b087
For reference, and to make sense of these error messages a bit: BF960000 is where the base of my boot ROM is mapped, and the function I'm trying to execute is physically at 8F89, virtually at BF968F89.
Is there any way to execute commands like this that exist in the boot ROM?
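The number after "Oops:" in the log is the fault status value as Linux reports it, and decoding it narrows down what kind of fault occurred. Below is a minimal sketch, assuming the classic short-descriptor (non-LPAE) ARM format and Linux's convention of setting bit 31 for prefetch (instruction fetch) aborts. The struct and function names are hypothetical, and only the translation, domain and permission codes are named.

```c
#include <stdint.h>

struct fsr_info {
    int prefetch;       /* bit 31: set by Linux for instruction fetch aborts */
    unsigned status;    /* fault status: FS[3:0] plus FS[4] taken from bit 10 */
    unsigned domain;    /* bits 7..4, only meaningful for data aborts */
    const char *name;   /* short description of the status code */
};

/* Names for the MMU fault codes of the short-descriptor format. */
static const char *fsr_status_name(unsigned status)
{
    switch (status) {
    case 0x05: return "section translation fault";
    case 0x07: return "page translation fault";
    case 0x09: return "section domain fault";
    case 0x0b: return "page domain fault";
    case 0x0d: return "section permission fault";
    case 0x0f: return "page permission fault";
    default:   return "other";
    }
}

/* Hypothetical helper that splits the value printed after "Oops:". */
struct fsr_info decode_fsr(uint32_t fsr)
{
    struct fsr_info info;

    info.prefetch = (fsr >> 31) & 1;
    info.status   = (fsr & 0xf) | ((fsr >> 6) & 0x10);
    info.domain   = (fsr >> 4) & 0xf;
    info.name     = fsr_status_name(info.status);
    return info;
}
```

With this decoding, 8000000f is a prefetch abort with a page permission fault: the page at BF968F88 is mapped, but the instruction fetch from it was not permitted. That would fit a mapping created for device access rather than for executable code, but this is an inference from the log, not something the post confirms.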

How to handle a page domain fault in a self written character device kernel module?

Hi
I am using Yocto and meta-atmel to build my own embedded Linux for the SAMA5D3x platform from Atmel. This includes a self-written kernel module. It's a quite simple character device (chrdev), which toggles pins to switch LEDs on and off.
When I built it into kernel 4.1 it worked fine, but after migrating to kernel 4.4 it crashes with a "page domain fault" in the write function.
The code up to the point where it crashes is shown below:
//! reads the commands from the i/o
static ssize_t dev_write(struct file *filp, const char *buff, size_t len, loff_t *off)
{
    char * szDevice;
    int deviceLen;
    char * szPara;
    int paraLen;
    char * szValue;
    int valueLen;
    size_t remBytes;
    char * szErrorStr;
    int devIndex, paraIndx;
    TBoardLed_State state;
    char tb[len+1];
    memcpy(tb, buff, len);
    tb[len] = 0;
    printk(KERN_INFO "%s: dev_write: %s (%i)\n", dSEK4Dev_indi, tb, (int) len);
The error print is:
[ 107.140000] Unhandled fault: page domain fault (0x01b) at 0x00101090
[ 107.140000] pgd = d41a4000
[ 107.140000] [00101090] *pgd=346e1831, *pte=3f5ba34f, *ppte=3f5ba83f
[ 107.140000] Internal error: : 1b [#1] ARM
[ 107.140000] Modules linked in: sek4matrixled(O) sek4comconfig(O) sek4boardled(O)
[ 107.140000] CPU: 0 PID: 428 Comm: sh Tainted: G O 4.4.19-linux4sam_5.4 #1
[ 107.140000] Hardware name: Atmel SAMA5
[ 107.140000] task: d45a0040 ti: d45b4000 task.ti: d45b4000
[ 107.140000] PC is at memcpy+0x7c/0x330
[ 107.140000] LR is at dev_write+0x2c/0x25c [sek4boardled]
[ 107.140000] pc : [<c020effc>] lr : [<bf0002f8>] psr: 00020013
sp : d45b5e74 ip : 0000000c fp : d45b5efc
[ 107.140000] r10: 00000000 r9 : d45b4000 r8 : c000f564
[ 107.140000] r7 : d45b5f88 r6 : 00101090 r5 : 00000015 r4 : d45b5ea8
[ 107.140000] r3 : 00000018 r2 : fffffff5 r1 : 00101090 r0 : d45b5ea8
[ 107.140000] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 107.140000] Control: 10c53c7d Table: 341a4059 DAC: 00000051
[ 107.140000] Process sh (pid: 428, stack limit = 0xd45b4208)
[ 107.140000] Stack: (0xd45b5e74 to 0xd45b6000)
[ 107.140000] 5e60: 00000015 00101090 d45b5f88
[ 107.140000] 5e80: c000f564 d45b5ea8 d45b5ea8 bf0002f8 00000000 00000000 d45b4000 00000068
[ 107.140000] 5ea0: d45b5ed8 befff3f0 befff3f0 c0219c8c d46e07fc d45b5fb0 d45a0040 d4650540
[ 107.140000] 5ec0: 00000817 0010209c d4650574 00000055 00000800 c001674c 00000006 d457e1c0
[ 107.140000] 5ee0: bf0002cc 00101090 d45b5f88 c000f564 d45b4000 00000000 00000000 c00a2ae8
[ 107.140000] 5f00: b6f627cc 00006950 00007958 c000928c 00001000 00000000 00000000 00000000
[ 107.140000] 5f20: 57dabaed 258d097f 57dabaed 258d097f 57dabaed 258d097f 000005e5 00000000
[ 107.140000] 5f40: befff3f0 b6f62d58 b6f62d58 d457e1c0 00000015 00101090 d45b5f88 c000f564
[ 107.140000] 5f60: d45b4000 c00a32b0 00000000 0fa00000 d457e1c0 d457e1c0 00101090 00000015
[ 107.140000] 5f80: c000f564 c00a3ac8 00000000 00000000 b6fd16d0 00000015 00101090 b6f62d58
[ 107.140000] 5fa0: 00000004 c000f3a0 00000015 00101090 00000001 00101090 00000015 00000000
[ 107.140000] 5fc0: 00000015 00101090 b6f62d58 00000004 00000015 000ed124 00000001 00000000
[ 107.140000] 5fe0: 00000000 befff954 b6e8fe6c b6ee8f80 60020010 00000001 00000000 00000000
[ 107.140000] [<c020effc>] (memcpy) from [<bf0002f8>] (dev_write+0x2c/0x25c [sek4boardled])
[ 107.140000] [<bf0002f8>] (dev_write [sek4boardled]) from [<c00a2ae8>] (__vfs_write+0x1c/0xd8)
[ 107.140000] [<c00a2ae8>] (__vfs_write) from [<c00a32b0>] (vfs_write+0x90/0x16c)
[ 107.140000] [<c00a32b0>] (vfs_write) from [<c00a3ac8>] (SyS_write+0x44/0x9c)
[ 107.140000] [<c00a3ac8>] (SyS_write) from [<c000f3a0>] (ret_fast_syscall+0x0/0x3c)
[ 107.140000] Code: ea000011 e320f000 e4913004 e4914004 (e4915004)
[ 107.140000] ---[ end trace 2c62698a45a8d21d ]---
It looks to me as if my module is not allowed to read data from user space, but I have no idea how to fix this.
Any ideas?
As Tsyvarev mentioned, the input buffer needs to be copied from user space to kernel space via copy_from_user. After memcpy is replaced by copy_from_user the module works fine.
The page domain fault occurs when CONFIG_CPU_SW_DOMAIN_PAN is enabled. With this option the kernel is not allowed to access user-space memory directly; it has to go through accessors such as copy_from_user and copy_to_user.
Solution: either remove this driver or modify its code to use these accessors.
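As a rough sketch of that change (not the asker's final code), the copy at the top of dev_write could be moved into a small helper like the one below. The helper name fetch_command and the CMD_MAX limit are assumptions for illustration. The userspace branch only exists so that the sketch can be compiled as an ordinary C file; in a module, copy_from_user comes from <linux/uaccess.h>.

```c
#ifdef __KERNEL__
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#else
/* Userspace stand-ins (assumptions for illustration only) so that the
 * sketch also builds as an ordinary C file. */
#include <errno.h>
#include <stddef.h>
#include <string.h>
#define __user
static unsigned long copy_from_user(void *to, const void __user *from,
                                    unsigned long n)
{
    memcpy(to, from, n);
    return 0; /* like the kernel helper: number of bytes NOT copied */
}
#endif

#define CMD_MAX 64 /* hypothetical limit on the length of one command */

/*
 * Copy len bytes from the user-space buffer ubuf into dst, which must
 * hold at least CMD_MAX + 1 bytes, and NUL-terminate the result.
 * Returns the number of bytes copied or a negative errno value.
 */
long fetch_command(char *dst, const char __user *ubuf, size_t len)
{
    if (len > CMD_MAX)
        return -EINVAL;   /* refuse commands that do not fit in dst */
    if (copy_from_user(dst, ubuf, len))
        return -EFAULT;   /* part of the user buffer was not readable */
    dst[len] = '\0';
    return (long)len;
}
```

In dev_write this would replace the memcpy: declare char tb[CMD_MAX + 1], call fetch_command(tb, buff, len), and return the error if the result is negative. Bounding the length also avoids sizing a stack array from a user-controlled len, as the original char tb[len+1] does.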

Unable to handle kernel paging request at virtual address - Kernel OOPS

I had a kernel oops the other day while running speaker-test on my Freescale i.MX233. It presumably happened after an attempted SIGTERM on speaker-test (though it could have happened at any other time). After the oops, all otherwise unused CPU time showed up as waiting for I/O. The process that invoked speaker-test couldn't be terminated either, even with SIGKILL. "ps ax" also hung when executed.
Luckily I've managed to extract the OOPS from the messages. I've searched all over the internet but couldn't really explain everything that I'm seeing in this OOPS.
What I really can't figure out is what can actually cause this and how can I backtrace it to a specific driver. The mxs audio drivers are built-in in the kernel, so it won't be visible in the drivers list. The driver itself has been heavily modified, on request I can share parts of it.
So the kernel addresses are starting at 0xc0000000, but why is the process stack part of the kernel memory address region? Isn't that supposed to be starting downwards from kernel addresses?
Speaker-test in use is 1.0.11rc2, but I presume even if the program would end abruptly the sound architecture would close everything properly. This version of speaker-test doesn't handle signals and is not attempting to close gracefully, just gives up.
What region would 0xe1a0a024 be in? Is that perhaps an ARM instruction? Would that mean there is a stack overflow somewhere? I know the memory-mapped registers reside at 0x80000000. What region is "pgd = c39dc000" in? Is that the kernel stack?
Is it possible to get a longer stack dump the next time an oops occurs, so that I can possibly dig further? I can change the kernel if that's necessary (I guess I should just go to the oops printing code to get more), but is there a configuration option for this?
Any ideas? Any help is greatly appreciated; I've been looking at this for 2 days now.
OOPS:
<1>[268811.560000] Unable to handle kernel paging request at virtual address e1a0a024
<1>[268811.560000] pgd = c39dc000
<1>[268811.560000] [e1a0a024] *pgd=00000000
<4>[268811.560000] Internal error: Oops: 5 [#1] PREEMPT
<4>[268811.560000] Modules linked in:
<4>[268811.560000] CPU: 0 Tainted: P (2.6.31-private #153)
<4>[268811.560000] PC is at vma_prio_tree_next+0x3c/0x6c
<4>[268811.560000] LR is at update_mmu_cache+0x120/0x1c4
<4>[268811.560000] pc : [<c00b98d8>] lr : [<c00611d4>] psr: a0000093
<4>[268811.560000] sp : c39c5de0 ip : c5cfa8f8 fp : c7ce7d80
<4>[268811.560000] r10: c7ce7d80 r9 : 401fb000 r8 : 401fb000
<4>[268811.560000] r7 : 00000021 r6 : c5c9b478 r5 : 00000000 r4 : 401fad94
<4>[268811.560000] r3 : e08f7007 r2 : ea00014d r1 : c39c5dec r0 : e1a0a000
<4>[268811.560000] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
<4>[268811.560000] Control: 0005317f Table: 439dc000 DAC: 00000015
<4>[268811.560000] Process speaker-test (pid: 1823, stack limit = 0xc39c4270)
<4>[268811.560000] Stack: (0xc39c5de0 to 0xc39c6000)
<4>[268811.560000] 5de0: 401fad94 c00611d4 c5f351c0 c7d34c84 00000080 00000000 00000000 c747d3c0
<4>[268811.560000] 5e00: 00000021 00000021 00000000 00000000 4507630f c04a4ec0 00000000 c5c9b478
<4>[268811.560000] 5e20: 00000000 c00bbc68 c7802060 00000000 00000200 c3a22fec 00000000 00000021
<4>[268811.560000] 5e40: 401fb000 c04a4ec0 c5c9b108 c3a22800 c39dd000 c5c9b478 c5c9b478 401fb000
<4>[268811.560000] 5e60: 00000000 00000000 c7ce7d80 c00bc69c 00000021 00000000 00000000 00000000
<4>[268811.560000] 5e80: 000001fb c39dc000 00000200 000007ec c3a22fec c5c0612c 00000010 00000000
<4>[268811.560000] 5ea0: 00000000 c749b0b0 0000000a c03b030c c5de0c00 c5c9b478 c7ce7db4 401fb290
<4>[268811.560000] 5ec0: c39c5fb0 c7ce7d80 00000017 c0060a30 c7d34cb8 00000000 00000200 00000000
<4>[268811.560000] 5ee0: 00000000 c03b030c 00000006 c03b037c 00000017 c39c5fb0 0000000b 401fb290
<4>[268811.560000] 5f00: be93295c c005a228 00000000 00000000 c7ce7d80 c00bc69c 0000000a 00000000
<4>[268811.560000] 5f20: 00000000 00000000 000001b0 c39dc000 00000200 000006c0 c3a22ec0 401c3000
<4>[268811.560000] 5f40: 00000001 00000000 40025050 c00858f8 00000021 ffffffff c5de0c00 c5c9b948
<4>[268811.560000] 5f60: 00000000 c01697e8 00000200 c5de0c00 c5c9b948 c0060ac4 c005af84 be932c38
<4>[268811.560000] 5f80: 00000008 00000000 c39c4000 ffffffff 00000006 ffffffff 00000006 be9329e8
<4>[268811.560000] 5fa0: be9329e8 0000000c 403004d0 c005ad9c 00000000 00000000 0000000c 00000000
<4>[268811.560000] 5fc0: 0000000c 00000006 be9329e8 be9329e8 0000000c 0000000b 403004d0 be93295c
<4>[268811.560000] 5fe0: 0000c718 be9328f8 401fa7bc 401fadb8 20000010 ffffffff 00000000 00000000
<4>[268811.560000] [<c00b98d8>] (vma_prio_tree_next+0x3c/0x6c) from [<c00611d4>] (update_mmu_cache+0x120/0x1c4)
<4>[268811.560000] [<c00611d4>] (update_mmu_cache+0x120/0x1c4) from [<c00bbc68>] (__do_fault+0x308/0x3ec)
<4>[268811.560000] [<c00bbc68>] (__do_fault+0x308/0x3ec) from [<c00bc69c>] (handle_mm_fault+0x298/0xc14)
<4>[268811.560000] [<c00bc69c>] (handle_mm_fault+0x298/0xc14) from [<c0060a30>] (do_page_fault+0xec/0x234)
<4>[268811.560000] [<c0060a30>] (do_page_fault+0xec/0x234) from [<c005a228>] (do_DataAbort+0x30/0x90)
<4>[268811.560000] [<c005a228>] (do_DataAbort+0x30/0x90) from [<c005ad9c>] (ret_from_exception+0x0/0x10)
<4>[268811.560000] Exception stack(0xc39c5fb0 to 0xc39c5ff8)
<4>[268811.560000] 5fa0: 00000000 00000000 0000000c 00000000
<4>[268811.560000] 5fc0: 0000000c 00000006 be9329e8 be9329e8 0000000c 0000000b 403004d0 be93295c
<4>[268811.560000] 5fe0: 0000c718 be9328f8 401fa7bc 401fadb8 20000010 ffffffff
<4>[268811.560000] Code: e2430024 e5903030 e3530000 1a000001 (e5903024)
<4>[268811.560000] ---[ end trace c70c22c7b9cf390d ]---
<6>[268811.560000] note: speaker-test[1823] exited with preempt_count 2
After some torture we realized that this was most likely a hardware issue with the memory controller. The reason it is so hard to understand what is there and what is happening is that it is random memory corruption.

Kernel debugging from /dev/kmsg

I am having a problem with a (customized) driver (smsc95xx) which runs on my embedded systems, and I need to understand where exactly the issue comes from.
For example, this is a kernel error message from /dev/kmsg reporting the issue:
1,737,1433656890,-;Unable to handle kernel NULL pointer dereference at virtual address 000001a0
1,738,1433665618,-;pgd = daafc000
1,739,1433668609,-;[000001a0] *pgd=9d5dd831, *pte=00000000, *ppte=00000000
0,740,1433675720,-;Internal error: Oops: 17 [#2] SMP ARM
4,741,1433680664,-;Modules linked in: ctr ccm ecb hci_uart rfcomm bnep bluetooth arc4 usb_trimble(O) wl18xx wlcore mac80211 cfg80211 rfkill wlcore_sdio twl4030_madc industrialio ftdi_sio smsc95xx(O) usbserial(O) ipv6
4,742,1433700378,-;CPU: 0 PID: 17418 Comm: sh Tainted: G D O 3.18.18-custom #20
4,743,1433708343,-;task: de30cd40 ti: da9b8000 task.ti: da9b8000
4,744,1433714050,-;PC is at __pm_runtime_resume+0x1c/0x64
4,745,1433719085,-;LR is at usb_autopm_get_interface+0x18/0x5c
4,746,1433724578,-;pc : [<c03cb590>] lr : [<c04677d4>] psr: 20000013\x0asp : da9b9ea8 ip : da9b9f14 fp : 00000000
4,747,1433736633,-;r10: daa22a4c r9 : 00000024 r8 : 00000004
4,748,1433742126,-;r7 : 000000a0 r6 : 00000004 r5 : 00000000 r4 : 00000020
4,749,1433748992,-;r3 : 000001a0 r2 : 00000040 r1 : 00000004 r0 : 00000020
4,750,1433755859,-;Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
4,751,1433763366,-;Control: 10c5387d Table: 9aafc019 DAC: 00000015
0,752,1433769378,-;Process sh (pid: 17418, stack limit = 0xda9b8240)
0,753,1433775421,-;Stack: (0xda9b9ea8 to 0xda9ba000)
0,754,1433779998,-;9ea0: 00000000 00000000 00000000 00000020 000000a0 c04677d4
0,755,1433788604,-;9ec0: dd31f680 00000000 00000040 c04574c8 c01ae218 c0085f58 00000001 00000000
0,756,1433797210,-;9ee0: 00000000 00000024 c04574a0 dd31f680 c0457510 de687a00 da9b9f88 bf0d44e4
0,757,1433805816,-;9f00: 00000024 da9b9f14 00000004 de687a00 da9b9f88 01110000 00000000 bf0d7990
0,758,1433814422,-;9f20: bf0d7cbc 00000000 00000000 bf0d4554 00000002 00000002 daa22a40 c01ae24c
0,759,1433823028,-;9f40: 00000000 00000000 dd3721c0 00000002 000eb408 da9b9f88 c000e824 da9b8000
0,760,1433831634,-;9f60: 00000000 c0145fd8 de30cd40 c08f20d4 dd3721c0 dd3721c0 00000002 000eb408
0,761,1433840240,-;9f80: c000e824 c01464e0 00000000 00000000 00000000 00000002 000eb408 b6ee1d60
0,762,1433848815,-;9fa0: 00000004 c000e660 00000002 000eb408 00000001 000eb408 00000002 00000000
0,763,1433857421,-;9fc0: 00000002 000eb408 b6ee1d60 00000004 00000000 000e515c 00000001 00000000
0,764,1433865997,-;9fe0: 00000000 beaef904 b6e1946c b6e7139c 60000010 00000001 00000000 00000000
4,765,1433874603,-;[<c03cb590>] (__pm_runtime_resume) from [<c04677d4>] (usb_autopm_get_interface+0x18/0x5c)
4,766,1433884307,-;[<c04677d4>] (usb_autopm_get_interface) from [<c04574c8>] (usbnet_write_cmd+0x28/0x70)
4,767,1433893737,-;[<c04574c8>] (usbnet_write_cmd) from [<bf0d44e4>] (__smsc95xx_write_reg+0x50/0x8c [smsc95xx])
4,768,1433903839,-;[<bf0d44e4>] (__smsc95xx_write_reg [smsc95xx]) from [<bf0d4554>] (smsc95xx_store+0x34/0x218 [smsc95xx])
4,769,1433914794,-;[<bf0d4554>] (smsc95xx_store [smsc95xx]) from [<c01ae24c>] (kernfs_fop_write+0xc0/0x184)
4,770,1433924438,-;[<c01ae24c>] (kernfs_fop_write) from [<c0145fd8>] (vfs_write+0xa0/0x1ac)
4,771,1433932586,-;[<c0145fd8>] (vfs_write) from [<c01464e0>] (SyS_write+0x44/0x9c)
4,772,1433940002,-;[<c01464e0>] (SyS_write) from [<c000e660>] (ret_fast_syscall+0x0/0x50)
0,773,1433947967,-;Code: e1a04000 0a000006 e2803d06 f5d3f000 (e1932f9f)
4,774,1433954650,-;---[ end trace bdd277dec40e1d5c ]---
I suppose the most important part is the last few lines:
4,765,1433874603,-;[<c03cb590>] (__pm_runtime_resume) from [<c04677d4>] (usb_autopm_get_interface+0x18/0x5c)
4,766,1433884307,-;[<c04677d4>] (usb_autopm_get_interface) from [<c04574c8>] (usbnet_write_cmd+0x28/0x70)
4,767,1433893737,-;[<c04574c8>] (usbnet_write_cmd) from [<bf0d44e4>] (__smsc95xx_write_reg+0x50/0x8c [smsc95xx])
4,768,1433903839,-;[<bf0d44e4>] (__smsc95xx_write_reg [smsc95xx]) from [<bf0d4554>] (smsc95xx_store+0x34/0x218 [smsc95xx])
4,769,1433914794,-;[<bf0d4554>] (smsc95xx_store [smsc95xx]) from [<c01ae24c>] (kernfs_fop_write+0xc0/0x184)
4,770,1433924438,-;[<c01ae24c>] (kernfs_fop_write) from [<c0145fd8>] (vfs_write+0xa0/0x1ac)
4,771,1433932586,-;[<c0145fd8>] (vfs_write) from [<c01464e0>] (SyS_write+0x44/0x9c)
4,772,1433940002,-;[<c01464e0>] (SyS_write) from [<c000e660>] (ret_fast_syscall+0x0/0x50)
but maybe there is a better way than checking /dev/kmsg to understand this output?
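Part of what makes the raw /dev/kmsg output hard to read is the prefix on every record: a syslog priority number, a sequence number, a timestamp in microseconds and a flags field, separated by commas and ended by a semicolon. Here is a minimal sketch of splitting that prefix off; the struct and function names are hypothetical and any extra fields after the flags are simply skipped.

```c
#include <stdio.h>
#include <string.h>

struct kmsg_record {
    unsigned level;            /* syslog level: 0 (emerg) .. 7 (debug) */
    unsigned facility;         /* syslog facility: 0 = kernel */
    unsigned long long seq;    /* sequence number of the record */
    unsigned long long usec;   /* timestamp in microseconds */
    const char *text;          /* message text, points into the input line */
};

/* Hypothetical helper: returns 0 on success, -1 if the line is malformed. */
int parse_kmsg(const char *line, struct kmsg_record *rec)
{
    unsigned prio;
    const char *semi;

    if (sscanf(line, "%u,%llu,%llu,", &prio, &rec->seq, &rec->usec) != 3)
        return -1;

    semi = strchr(line, ';');  /* the message starts after the first ';' */
    if (semi == NULL)
        return -1;

    rec->level    = prio & 7;  /* low three bits are the log level */
    rec->facility = prio >> 3; /* the remaining bits are the facility */
    rec->text     = semi + 1;
    return 0;
}
```

For the record "0,740,1433675720,-;Internal error: Oops: 17 [#2] SMP ARM" this gives level 0, sequence number 740 and a timestamp of 1433675720 microseconds, with the text starting at "Internal error". The call trace itself then reads the same way as in any other ARM oops.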
Problem solved.
The driver was modified to create the files in the
/sys/class/dirname/files
directory (where dirname and files are names defined in the driver's code).
The problem was that the driver did not delete the directory it had previously created, so unplugging and replugging the device and then writing to the files caused the kernel error shown above, because it is like writing to a memory area that is no longer referenced.
The solution is to delete the
/sys/class/dirname
directory every time the device is unplugged and to recreate it when the device is plugged in again.

How to understand the ARM registers dumped by kernel panic?

After a Linux kernel oops on an ARM platform, the registers are dumped to the console. But I am confused about how to analyze these registers.
For example,
Unable to handle kernel paging request at virtual address 0b56e8b8
pgd = c0004000
[0b56e8b8] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
......
pc : [<bf65e7c0>] lr : [<bf65ec14>] psr: 20000113
sp : c07059f0 ip : 00008d4c fp : c0705a3c
r10: 00000003 r9 : e8bcd800 r8 : e88b006c
r7 : 0000e203 r6 : c0705a44 r5 : e88b0000 r4 : 0b56e8b8
r3 : 00000000 r2 : 00000b56 r1 : e4592e10 r0 : e889570c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5787d Table: 69fec06a DAC: 00000015
SP: 0xc0705970:
5970 e8e70000 e45de100 00000181 00000180 c070599c bf65e7c0 20000113 ffffffff
5990 c07059dc e88b006c c0705a3c c07059a8 c000e318 c0008360 e889570c e4592e10
59b0 00000b56 00000000 0b56e8b8 e88b0000 c0705a44 0000e203 e88b006c e8bcd800
59d0 00000003 c0705a3c 00008d4c c07059f0 bf65ec14 bf65e7c0 20000113 ffffffff
59f0 e8b80000 e2030b56 00000000 e889570c 00000003 e88b006c c007eccc c007ebb4
5a10 00000000 eacc0480 e88b0000 00002098 e9c80480 e8c08000 00000000 e8bcdc80
5a30 c0705a5c c0705a40 bf65ec14 bf65e6c0 bf5e51c4 00000000 e88b0000 00000000
5a50 c0705a74 c0705a60 bf65ecfc bf65ebe4 e4554500 e4554500 c0705a84 c0705a78
R5: 0xe88aff80:
ff80 bf10f0b0 e8aca4c0 e88aff8c e88b1680 00000000 bf05b70c e87c3580 00000000
ffa0 bf095024 e87c3580 00000000 bf095024 e87c3580 00000000 bf095024 00000001
ffc0 00000004 ebd83000 00000793 e8cc2500 00000002 00000004 00000043 ffffffff
ffe0 40320354 be9ee8d8 00030444 40320380 20000010 00000000 70cfe821 70cfec21
0000 bf81e1f8 e88b0018 e88b000c e88e9a00 00000000 bf095024 00000000 fffffffe
0020 00000000 00000000 fffffffe 00000000 00000000 fffffffe 00000000 00000000
0040 00000001 e91dd000 00001073 0010051b 00080000 f1e4d900 00000001 00000002
0060 000000c8 6df9eca0 00008044 e8895700 00000040 00000026 00000003 0b56e8b8
R8: 0xe88affec:
ffec 40320380 20000010 00000000 70cfe821 70cfec21 bf81e1f8 e88b0018 e88b000c
000c e88e9a00 00000000 bf095024 00000000 fffffffe 00000000 00000000 fffffffe
002c 00000000 00000000 fffffffe 00000000 00000000 00000001 e91dd000 00001073
004c 0010051b 00060000 f1e4d900 00000001 00000002 000000c8 6df9eca0 00008044
006c e8895700 00000040 00000026 00000003 0b56e8b8 e4604000 0000026c 000000da
008c 00000000 21d7ff6e 000078a9 bf05add4 e88b0000 e88b0000 ebd02600 f1015a05
00ac 00000001 000000a6 000000c4 00000000 e88b0000 1e1e1e1e 1e1e1e1e 1e1e1e1e
00cc 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e
Questions:
What does the 0xc0705970 in SP: 0xc0705970: stand for? Is it a code address or a data address? Where can I find it?
Why is sp : c07059f0 not at the beginning or end of the SP block? How is the stack laid out in this block?
What does the first column of each register dump mean? If the values are relative addresses, why are they not continuous?
Is 0b56e8b8 a pointer to a page? How is it accessed through R5 and R8?
How the registers are used in an OS is up to the ABI (Application Binary Interface).
However, we can give a quick, informal and simplified explanation of the dump.
I'm not an expert on Linux on ARM, but some names seem quite intuitive:
sp is the Stack Pointer, a pointer to a useful memory area called the stack.
fp is the Frame Pointer, a pointer used by a routine to access its local variables.
lr is the Link Register, a register containing the return address of a call.
nzCv are the flags. If a flag is in uppercase it is set, otherwise it is clear.
n = Last result was Negative
z = Last result was Zero
C = Last result needed/produced a Carry bit
v = Last result Overflowed
IRQs on means that hardware interrupts are enabled.
FIQs on means that fast interrupts (FIQs), a second, higher-priority class of hardware interrupt, are enabled as well.
Mode is the CPU mode; SVC_32 indicates that the code was privileged.
The following lines (Control, Table, DAC) are control structures for the CPU set by the kernel.
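The Flags line is just a decoded form of the psr value printed next to pc and lr. As a minimal sketch of that decoding (the struct and function names are hypothetical), the bits can be picked apart like this:

```c
#include <stdint.h>

struct psr_info {
    int n, z, c, v;   /* condition flags, bits 31..28 */
    int irqs_on;      /* 1 if the I bit (7) is clear, i.e. IRQs not masked */
    int fiqs_on;      /* 1 if the F bit (6) is clear, i.e. FIQs not masked */
    int thumb;        /* T bit (5): 1 = Thumb, 0 = ARM instruction set */
    unsigned mode;    /* bits 4..0, e.g. 0x10 = user, 0x13 = supervisor */
};

/* Hypothetical helper that splits a 32-bit ARM program status register. */
struct psr_info decode_psr(uint32_t psr)
{
    struct psr_info p;

    p.n = (psr >> 31) & 1;
    p.z = (psr >> 30) & 1;
    p.c = (psr >> 29) & 1;
    p.v = (psr >> 28) & 1;
    p.irqs_on = !((psr >> 7) & 1);
    p.fiqs_on = !((psr >> 6) & 1);
    p.thumb   = (psr >> 5) & 1;
    p.mode    = psr & 0x1f;
    return p;
}
```

For psr: 20000113 from the dump above, only the carry flag is set, both interrupt mask bits are clear, the T bit is clear and the mode field is 0x13, which corresponds to "nzCv IRQs on FIQs on Mode SVC_32 ISA ARM".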
The dump does you a favor by treating the sp, r5 and r8 register values as pointers and showing the memory around those addresses.
The block below SP: 0xc0705970:, for example, is a dump of the memory starting at 0xc0705970, which is 0x80 bytes below the sp value 0xc07059f0. Each row is formatted as follows:
The first column is the address of the row. Only the last four digits are shown, as it is obvious what the full address is (i.e. there is no ambiguity, the addresses start from 0xc0705970).
The following eight columns are 32-bit values dumped from memory. Each row shows 32 bytes of memory.
For example by looking at
R5: 0xe88aff80:
ff80 bf10f0b0 e8aca4c0 e88aff8c e88b1680 00000000 bf05b70c e87c3580 00000000
ffa0 bf095024 e87c3580 00000000 bf095024 e87c3580 00000000 bf095024 00000001
ffc0 00000004 ebd83000 00000793 e8cc2500 00000002 00000004 00000043 ffffffff
ffe0 40320354 be9ee8d8 00030444 40320380 20000010 00000000 70cfe821 70cfec21
0000 bf81e1f8 e88b0018 e88b000c e88e9a00 00000000 bf095024 00000000 fffffffe
0020 00000000 00000000 fffffffe 00000000 00000000 fffffffe 00000000 00000000
0040 00000001 e91dd000 00001073 0010051b 00080000 f1e4d900 00000001 00000002
0060 000000c8 6df9eca0 00008044 e8895700 00000040 00000026 00000003 0b56e8b8
You can tell that the 32-bit value at 0xe88aff80, where the dump starts (0x80 bytes below r5), was 0xbf10f0b0, that the 32-bit value r5 itself was pointing to, at 0xe88b0000, was 0xbf81e1f8, and that the 32-bit value at 0xe88b0028 was 0xfffffffe.
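The row labels wrap around from ffe0 to 0000 because only the low four hex digits are printed, so the full address has to be rebuilt from the start address in the block header. A minimal sketch of that arithmetic follows, assuming the dump covers less than 64 KiB; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the start address from a block header
 * (e.g. 0xe88aff80 in "R5: 0xe88aff80:"), the four-digit row label and
 * the column index (0..7) of a word in that row, return the full
 * address of the word.
 */
uint32_t dump_word_addr(uint32_t dump_start, uint32_t row_label, unsigned col)
{
    /* Take the upper 16 bits from the start address ... */
    uint32_t addr = (dump_start & 0xffff0000u) | (row_label & 0xffffu);

    /* ... and step to the next 64 KiB block if the label wrapped around. */
    if (addr < dump_start)
        addr += 0x10000u;

    return addr + 4u * col;    /* each column is one 32-bit word */
}
```

For the R5 block, the row labelled 0000 comes out as 0xe88b0000, which is exactly the value of r5, and the third word of the row labelled 0020 is at 0xe88b0028. In the same way, the row labelled 59f0 in the SP block is the one sp itself points to.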
All this information is useful for the developer of the code that panicked.

Resources