I am trying to troubleshoot a problem in my kernel:
A khungtaskd (hung task) panic was triggered by a stuck process. The process that triggered the panic has four threads: two were stuck trying to acquire the mmap_sem semaphore, and the other two were waiting on a user-space mutex.
Here is the stack trace for the first one:
crash> set 14789
PID: 14789
COMMAND: "dumper:du"
TASK: ffff8801a434c140 [THREAD_INFO: ffff8801ecd7c000]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 14789 TASK: ffff8801a434c140 CPU: 0 COMMAND: "dumper:du"
#0 [ffff8801ecd7dda8] __schedule at ffffffff8143fa09
#1 [ffff8801ecd7de50] schedule at ffffffff8143fb95
#2 [ffff8801ecd7de60] rwsem_down_failed_common at ffffffff81440fdf
#3 [ffff8801ecd7dec0] rwsem_down_write_failed at ffffffff81441024
#4 [ffff8801ecd7ded0] call_rwsem_down_write_failed at ffffffff812167c3
#5 [ffff8801ecd7df20] sys_mprotect at ffffffff810f968e
#6 [ffff8801ecd7df80] system_call_fastpath at ffffffff81447f42
RIP: 00000031dfce5377 RSP: 00007fff3c2b6408 RFLAGS: 00013202
RAX: 000000000000000a RBX: ffffffff81447f42 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007f95310a1000
RBP: 00000031e061c360 R8: 0000000000000019 R9: 0000000000000008
R10: 00000031dfa20000 R11: 0000000000003246 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f95318a19c0 R15: 0000000000800000
ORIG_RAX: 000000000000000a CS: 0033 SS: 002b
And here is the stack trace from the second one:
crash> set 14791
PID: 14791
COMMAND: "dumper:du"
TASK: ffff8801eda161c0 [THREAD_INFO: ffff8801f0fa2000]
CPU: 1
STATE: TASK_RUNNING
crash> bt
PID: 14791 TASK: ffff8801eda161c0 CPU: 1 COMMAND: "dumper:du"
#0 [ffff8801f0fa3d88] __schedule at ffffffff8143fa09
#1 [ffff8801f0fa3e30] schedule at ffffffff8143fb95
#2 [ffff8801f0fa3e40] rwsem_down_failed_common at ffffffff81440fdf
#3 [ffff8801f0fa3ea0] rwsem_down_write_failed at ffffffff81441024
#4 [ffff8801f0fa3eb0] call_rwsem_down_write_failed at ffffffff812167c3
#5 [ffff8801f0fa3f00] sys_mmap_pgoff at ffffffff810f8de4
#6 [ffff8801f0fa3f70] sys_mmap at ffffffff8101280e
#7 [ffff8801f0fa3f80] system_call_fastpath at ffffffff81447f42
RIP: 00000031dfce531a RSP: 00007f95330a30c8 RFLAGS: 00013246
RAX: 0000000000000009 RBX: ffffffff81447f42 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
RBP: 0000000000001000 R8: 00000000ffffffff R9: 0000000000000000
R10: 0000000000000022 R11: 0000000000003246 R12: ffffffff8101280e
R13: ffff8801f0fa3f78 R14: 0000000000000003 R15: 0000000000040000
ORIG_RAX: 0000000000000009 CS: 0033 SS: 002b
I see that the first one was in TASK_UNINTERRUPTIBLE and the second one was in TASK_RUNNING. Why would that be?
Obviously, the first one, which was in TASK_UNINTERRUPTIBLE, is the one that triggered the panic. But why would both fail to acquire the semaphore?
I do not understand this.
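One way to cross-check the two backtraces is to decode the saved user-mode registers. On x86-64 Linux the syscall number is in ORIG_RAX and the first six arguments are passed in RDI, RSI, RDX, R10, R8 and R9. The sketch below is only illustrative: the small syscall table and the helper name decode_syscall are my own, and the register values are copied from the two dumps above.

```python
# Minimal sketch: decode the syscall number and arguments saved in the
# user-mode register dump that crash prints below each backtrace.
# Assumption: x86-64 syscall ABI (number in ORIG_RAX, arguments in
# RDI, RSI, RDX, R10, R8, R9). Only the syscalls seen here are listed.

SYSCALLS = {9: "mmap", 10: "mprotect"}
ARG_REGS = ("RDI", "RSI", "RDX", "R10", "R8", "R9")


def decode_syscall(regs, nargs):
    """Return (name, [argument values]) for a dict of register -> int."""
    name = SYSCALLS.get(regs["ORIG_RAX"], "unknown")
    return name, [regs[r] for r in ARG_REGS[:nargs]]


# PID 14789: registers copied from the first dump.
first = {"ORIG_RAX": 0x0A, "RDI": 0x00007F95310A1000, "RSI": 0x1000,
         "RDX": 0x0, "R10": 0x00000031DFA20000, "R8": 0x19, "R9": 0x8}
# PID 14791: registers copied from the second dump.
second = {"ORIG_RAX": 0x09, "RDI": 0x0, "RSI": 0x1000, "RDX": 0x3,
          "R10": 0x22, "R8": 0xFFFFFFFF, "R9": 0x0}

print(decode_syscall(first, 3))   # mprotect(addr, len, prot)
print(decode_syscall(second, 6))  # mmap(addr, len, prot, flags, fd, offset)
```

Read this way, the first thread was calling mprotect on a 4096-byte range with prot 0 (PROT_NONE), and the second was calling mmap for 4096 bytes with prot 3 (PROT_READ|PROT_WRITE) and flags 0x22 (MAP_PRIVATE|MAP_ANONYMOUS). Both calls take mmap_sem for writing, which matches the rwsem_down_write_failed frames in both traces.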
I am using this tutorial to analyze the dump file generated on a kernel crash.
The dump file is generated successfully and I am able to open it with the crash utility.
/* Code for the kernel module */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>

static s32 __init testmoduleinit(void)
{
    s8 *ptr = NULL;

    pr_info("%s:module loaded.\n", __func__);
    *ptr = 100; // generate oops (NULL pointer write)
    return 0;
}

static void __exit testmoduledeinit(void)
{
    pr_info("%s:module un-loaded.\n", __func__);
}

module_init(testmoduleinit);
module_exit(testmoduledeinit);
MODULE_LICENSE("GPL");
The crash logs (backtrace output) are as follows:
crash> bt
PID: 3401 TASK: ffff9d6928b3af00 CPU: 2 COMMAND: "insmod"
#0 [ffffb2bd846478c8] machine_kexec at ffffffff9246fe83
#1 [ffffb2bd84647928] __crash_kexec at ffffffff9255a152
#2 [ffffb2bd846479f8] crash_kexec at ffffffff9255aff1
#3 [ffffb2bd84647a18] oops_end at ffffffff9243633d
#4 [ffffb2bd84647a40] no_context at ffffffff924803c9
#5 [ffffb2bd84647ab0] __bad_area_nosemaphore at ffffffff924807c0
#6 [ffffb2bd84647af8] bad_area_nosemaphore at ffffffff92480976
#7 [ffffb2bd84647b08] __do_page_fault at ffffffff9248133d
#8 [ffffb2bd84647b70] do_page_fault at ffffffff9248162c
#9 [ffffb2bd84647ba0] page_fault at ffffffff93001284
[exception RIP: _MODULE_INIT_START_gencrash+44]
RIP: ffffffffc062b02c RSP: ffffb2bd84647c58 RFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9d6935c9c8c8 RDI: ffff9d6935c9c8c8
RBP: ffffb2bd84647c60 R8: 0000000000000722 R9: 0000000000000004
R10: ffff9d693219f730 R11: 0000000000000001 R12: ffffffffc062b000
R13: ffff9d693219f730 R14: ffffb2bd84647e68 R15: ffffffffc0628000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffb2bd84647c68] do_one_initcall at ffffffff9240389a
#11 [ffffb2bd84647ce0] do_init_module at ffffffff92edd493
#12 [ffffb2bd84647d08] load_module at ffffffff92556e1b
#13 [ffffb2bd84647e48] __do_sys_finit_module at ffffffff9255773c
#14 [ffffb2bd84647f20] __x64_sys_finit_module at ffffffff9255777a
#15 [ffffb2bd84647f30] do_syscall_64 at ffffffff92405207
#16 [ffffb2bd84647f50] entry_SYSCALL_64_after_hwframe at ffffffff9300008c
RIP: 00007f7f2613c539 RSP: 00007fff5ba4f6a8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 000056442b32d7c0 RCX: 00007f7f2613c539
RDX: 0000000000000000 RSI: 0000564429214d2e RDI: 0000000000000003
RBP: 0000564429214d2e R8: 0000000000000000 R9: 00007f7f2640f000
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
R13: 000056442b32d760 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000139 CS: 0033 SS: 002b
The "bt" output shows that the page fault occurred at address ffffffffc062b02c.
But when I run
crash> mod -s test_module ./test_module.o
crash> sym ffffffffc062b02c
I don't see the source line number that caused the crash.
Is there any way to get the line number in the kernel module that caused the oops?
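The offset needed for a source-line lookup is already in the backtrace: the line "[exception RIP: _MODULE_INIT_START_gencrash+44]" gives a symbol plus a decimal offset. A minimal sketch of extracting it and turning it into a gdb "list" expression follows. It assumes the module was built with debug info and that the init function is named testmoduleinit as in the source above; the helper names are hypothetical, and whether gdb can resolve the line depends on how the module was compiled.

```python
import re

# Matches the "[exception RIP: symbol+offset]" line printed by crash's bt.
EXC_RIP = re.compile(r"\[exception RIP: (?P<sym>\w+)\+(?P<off>\d+)\]")


def parse_exception_rip(line):
    """Return (symbol, offset) from a crash 'exception RIP' line."""
    m = EXC_RIP.search(line)
    if m is None:
        raise ValueError("no exception RIP found in line")
    return m.group("sym"), int(m.group("off"))


def gdb_list_expr(function, offset):
    """Build a gdb expression such as 'list *(func+0x2c)'."""
    return "list *({}+{:#x})".format(function, offset)


line = "[exception RIP: _MODULE_INIT_START_gencrash+44]"
sym, off = parse_exception_rip(line)

# Sanity check against the RIP value from the same dump: subtracting the
# offset should land on the start of the init function.
rip = 0xFFFFFFFFC062B02C
base = rip - off

print(sym, hex(off), hex(base))
# Assumption: the function name in the module's debug info is testmoduleinit.
print(gdb_list_expr("testmoduleinit", off))
```

The computed base equals the value in R12 of the register dump (ffffffffc062b000), which is a useful consistency check. The printed expression can then be tried in gdb opened on the module object that carries the debug info, for example the test_module.o passed to mod -s.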
taneesha@TANEESHA:~$ sudo docker start 3e0033eb5b5c
[sudo] password for taneesha:
fatal error: slice bounds out of range
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0x7ff4b65a73c0
stack: frame={sp:0xc000009a68, fp:0xc000009ac0} stack=[0xc000002000,0xc00000a000)
000000c000009968: 0000000000000000 0000000000000000
000000c000009978: 0000000000000101 0000000000000000
000000c000009988: 0000000000000000 0000000000000000
000000c000009998: 0000000000000000 0000000000000000
000000c0000099a8: 000000c0000099f0 0000000000000000
000000c0000099b8: 000000c000000000 0000000000000000
000000c0000099c8: 0000000000000000 000000c000009bf0
000000c0000099d8: 000000c000009ac0 000000c000009a58
000000c0000099e8: 000055aff7014ae5 <runtime.sigtrampgo+421> 000000c000000017
000000c0000099f8: 000000c000009bf0 000000c000009ac0
000000c000009a08: 000000c000000180 0000000000000000
000000c000009a18: 0000000000000000 0000000000000000
000000c000009a28: 0000000000000000 0000000000000000
000000c000009a38: 0000000000000000 000000c000000180
000000c000009a48: 000000c000009bf0 000000c000009ac0
000000c000009a58: 000000c000009ab0 000055aff7037663 <runtime.sigtramp+67>
000000c000009a68: <0000000000000017 000000c000009bf0
000000c000009a78: 000000c000009ac0 0000000000000009
000000c000009a88: 0000000000000010 0000000000000003
000000c000009a98: 000000000000000d 000000c000009ab0
000000c000009aa8: 0000000000000000 000000c00059f108
000000c000009ab8: !00007ff4b65a73c0 >0000000000000007
000000c000009ac8: 0000000000000000 000000c000002000
000000c000009ad8: 0000000000000000 0000000000008000
000000c000009ae8: 0000000000000000 000055aff8864ae0
000000c000009af8: 000000000000352d 000000000000352c
000000c000009b08: 000000000000000d 0000000000000003
000000c000009b18: 0000000000000010 0000000000000009
000000c000009b28: 000055aff99da678 0000000000000010
000000c000009b38: 000000c00059f108 0000000000000000
000000c000009b48: 000055aff99da678 000055aff7b70320 fatal error: index out of range
panic during panic
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0x7ff4b65a73c0
stack: frame={sp:0xc000009a68, fp:0xc000009ac0} stack=[0xc000002000,0xc00000a000)
000000c000009968: 0000000000000000 0000000000000000
000000c000009978: 0000000000000101 0000000000000000
000000c000009988: 0000000000000000 0000000000000000
000000c000009998: 0000000000000000 0000000000000000
000000c0000099a8: 000000c0000099f0 0000000000000000
000000c0000099b8: 000000c000000000 0000000000000000
000000c0000099c8: 0000000000000000 000000c000009bf0
000000c0000099d8: 000000c000009ac0 000000c000009a58
000000c0000099e8: 000055aff7014ae5 <runtime.sigtrampgo+421> 000000c000000017
000000c0000099f8: 000000c000009bf0 000000c000009ac0
000000c000009a08: 000000c000000180 0000000000000000
000000c000009a18: 0000000000000000 0000000000000000
000000c000009a28: 0000000000000000 0000000000000000
000000c000009a38: 0000000000000000 000000c000000180
000000c000009a48: 000000c000009bf0 000000c000009ac0
000000c000009a58: 000000c000009ab0 000055aff7037663 <runtime.sigtramp+67>
000000c000009a68: <0000000000000017 000000c000009bf0
000000c000009a78: 000000c000009ac0 0000000000000009
000000c000009a88: 0000000000000010 0000000000000003
000000c000009a98: 000000000000000d 000000c000009ab0
000000c000009aa8: 0000000000000000 000000c00059f108
000000c000009ab8: !00007ff4b65a73c0 >0000000000000007
000000c000009ac8: 0000000000000000 000000c000002000
000000c000009ad8: 0000000000000000 0000000000008000
000000c000009ae8: 0000000000000000 000055aff8864ae0
000000c000009af8: 000000000000352d 000000000000352c
000000c000009b08: 000000000000000d 0000000000000003
000000c000009b18: 0000000000000010 0000000000000009
000000c000009b28: 000055aff99da678 0000000000000010
000000c000009b38: 000000c00059f108 0000000000000000
000000c000009b48: 000055aff99da678 000055aff7b70320 fatal error: index out of range
stack trace unavailable
I'm working on continuous profiling of a running process, so I set up a crontab entry on the server. It periodically runs a Python script that spawns a perf subprocess to collect perf data from a daemon process started by supervise.
The perf command I use looks like this:
perf record -p {target process} -e cycles:u -a -q -g -- sleep {some time}
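For reference, a minimal sketch of how such a cron-driven script might build and launch this command. The function names, the explicit -o output file, and the timeout margin are my own assumptions and not part of the original setup; the remaining flags are copied from the command above.

```python
import subprocess


def build_perf_cmd(pid, seconds, output="perf.data"):
    """Build the argv list for the perf record invocation shown above."""
    return [
        "perf", "record",
        "-p", str(pid),      # attach to the target process
        "-e", "cycles:u",    # user-space cycles only
        "-a", "-q", "-g",    # flags copied from the original command
        "-o", output,        # assumption: write to an explicit output file
        "--", "sleep", str(seconds),
    ]


def run_perf(pid, seconds, output="perf.data"):
    """Run perf and return its exit status; raises if perf is not installed."""
    # A timeout slightly longer than the sleep keeps a stuck perf from
    # piling up across cron runs (the 30 s margin is arbitrary).
    cmd = build_perf_cmd(pid, seconds, output)
    return subprocess.run(cmd, timeout=seconds + 30).returncode
```

Passing the arguments as a list avoids shell quoting problems, and the timeout makes subprocess.run kill perf and raise TimeoutExpired if it outlives the sampling window.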
Everything works well except when the target process terminates. We sometimes need to update the target process's executable and restart the process with svc -t. This operation may lead to a kernel panic, after which we have to reboot the machine.
My server's distribution is CentOS release 6.5 (Final) and the kernel release is 2.6.32-431.23.3.el6.x86_64
The core dump log and backtrace are shown below:
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 1
Modules linked in: AliSecGuard(U) AliSecProcFilter64(U) tcp_diag inet_diag joydev microcode virtio_net virtio_balloon shpchp i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_console virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 22748, comm: server Not tainted 2.6.32-573.22.1.el6.x86_64 #1 Alibaba Cloud Alibaba Cloud ECS
RIP: 0010:[<ffffffff8111db57>] [<ffffffff8111db57>] ring_buffer_put+0x77/0xf0
RSP: 0018:ffff8801afadbda8 EFLAGS: 00010006
RAX: ffff880416d81e60 RBX: ffff8803d335f000 RCX: 63496d6165727473
RDX: 676e697274736f5f RSI: 0000000000000003 RDI: ffff880416d81c00
RBP: ffff8801afadbdd8 R08: 0000000000000001 R09: 00000000ffffffff
R10: 00000000ffffffff R11: dead000000200200 R12: ffff8803d335f058
R13: 676e697274736cff R14: ffff8803d335f060 R15: 0000000000000202
FS: 0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000004264d70 CR3: 0000000001a8d000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process gameserver (pid: 22748, threadinfo ffff8801afad8000, task ffff8803a19cd520)
Stack:
ffff8801afadbdf8 ffff8803a1b03800 ffff8804138bf78c ffff8804138bf790
<d> ffff88001a9fb800 ffff8804182a1c80 ffff8801afadbdf8 ffffffff8111e377
<d> ffff8804138bf790 ffff8803a1b03800 ffff8801afadbe28 ffffffff8111fe72
Call Trace:
[<ffffffff8111e377>] free_event+0x37/0x170
[<ffffffff8111fe72>] perf_event_release_kernel+0x72/0xb0
[<ffffffff8111ff49>] put_event+0x99/0xd0
[<ffffffff81123a65>] __perf_event_exit_task+0xf5/0x150
[<ffffffff81123c91>] perf_event_exit_task+0x1d1/0x210
[<ffffffff8107ca24>] do_exit+0x1e4/0x870
[<ffffffff8107d1b7>] sys_exit+0x17/0x20
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Code: ff ff 4c 39 f0 48 8b 97 60 02 00 00 74 5f 4c 8d aa a0 fd ff ff eb 08 0f 1f 44 00 00 49 89 cd 48 8b 8f 68 02 00 00 be 03 00 00 00 <48> 89 4a 08 48 89 11 31 c9 48 89 87 60 02 00 00 48 89 87 68 02
RIP [<ffffffff8111db57>] ring_buffer_put+0x77/0xf0
RSP <ffff8801afadbda8>
PID: 22748 TASK: ffff8803a19cd520 CPU: 1 COMMAND: "server"
#0 [ffff8801afadbb30] machine_kexec at ffffffff8103d1fb
#1 [ffff8801afadbb90] crash_kexec at ffffffff810cc882
#2 [ffff8801afadbc60] oops_end at ffffffff8153da20
#3 [ffff8801afadbc90] die at ffffffff81010fab
#4 [ffff8801afadbcc0] do_general_protection at ffffffff8153d512
#5 [ffff8801afadbcf0] general_protection at ffffffff8153cce5
[exception RIP: ring_buffer_put+119]
RIP: ffffffff8111db57 RSP: ffff8801afadbda8 RFLAGS: 00010006
RAX: ffff880416d81e60 RBX: ffff8803d335f000 RCX: 63496d6165727473
RDX: 676e697274736f5f RSI: 0000000000000003 RDI: ffff880416d81c00
RBP: ffff8801afadbdd8 R8: 0000000000000001 R9: 00000000ffffffff
R10: 00000000ffffffff R11: dead000000200200 R12: ffff8803d335f058
R13: 676e697274736cff R14: ffff8803d335f060 R15: 0000000000000202
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff8801afadbda0] ring_buffer_put at ffffffff8111db20
#7 [ffff8801afadbde0] free_event at ffffffff8111e377
#8 [ffff8801afadbe00] perf_event_release_kernel at ffffffff8111fe72
#9 [ffff8801afadbe30] put_event at ffffffff8111ff49
#10 [ffff8801afadbe60] __perf_event_exit_task at ffffffff81123a65
#11 [ffff8801afadbe90] perf_event_exit_task at ffffffff81123c91
#12 [ffff8801afadbef0] do_exit at ffffffff8107ca24
#13 [ffff8801afadbf70] sys_exit at ffffffff8107d1b7
#14 [ffff8801afadbf80] system_call_fastpath at ffffffff8100b0d2
RIP: 0000003026207c41 RSP: 00007fe4d1e56e50 RFLAGS: 00000246
RAX: 000000000000003c RBX: ffffffff8100b0d2 RCX: 0000000000000001
RDX: 0000000000000004 RSI: 00000000009fb000 RDI: 0000000000000000
RBP: 0000000000000000 R8: 000000000598f280 R9: 00000000000058dc
R10: 00007fe4d259f3ac R11: 0000000000000246 R12: ffffffff8107d1b7
R13: ffff8801afadbf78 R14: 0000000000000003 R15: 0000000000000000
ORIG_RAX: 000000000000003c CS: 0033 SS: 002b
The kernel panic is triggered when a thread of the attached process exits, and it cannot be reproduced every time, so I suspect this may be some kind of race condition bug in the kernel?
BTW, the perf process doesn't exit after the attached process terminates on my server (because of the old version, I guess), so perf keeps running until I interrupt it. I'm not sure whether this can affect the target process's exit.
Sounds like a bug in that old kernel version; user-space perf shouldn't be able to panic the kernel. Either Linux 2.6.32 was buggy, or maybe CentOS's patch backporting (or the patches themselves) introduced a bug that only happens when perf is active.
I don't think it's plausible to fix this on your own, unless you want to really dig into kernel debugging, so your options are:
Update your kernel (from CentOS if available; otherwise you'll have to find another source for kernels, e.g. mainline).
Update your distro to one that uses newer kernels (and newer everything else).
Stop using perf.
Maybe not possible: find a different set of perf options that doesn't trigger this crash. (Be careful testing, whatever bug is causing this could possibly manifest in other ways like memory corruption, possibly leading to corrupting file data before it's written out to disk if you get unlucky with writing through a wild pointer. The bug might not have that possible failure mode, but you can't be certain until you've found it.)
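The concern about memory corruption can be checked against the register dump in the question. The faulting bytes "<48> 89 4a 08" decode to a store of RCX to the address RDX+8, and both registers hold values that do not look like kernel pointers. A small sketch, using only values copied from the dump (the helper names are my own), shows that they read as ASCII text and that RDX+8 is not a canonical x86-64 address, which is the condition that raises a general protection fault instead of a page fault:

```python
def as_ascii(value):
    """Interpret a 64-bit register value as 8 little-endian ASCII bytes."""
    return value.to_bytes(8, "little").decode("ascii")


def is_canonical(addr):
    """True if bits 63..47 of addr are all equal (48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


RDX = 0x676E697274736F5F  # from the oops register dump
RCX = 0x63496D6165727473  # from the oops register dump
RBX = 0xFFFF8803D335F000  # a normal-looking kernel pointer, for comparison

print(as_ascii(RDX) + as_ascii(RCX))
print(is_canonical(RDX + 8), is_canonical(RBX))
```

The two values concatenate to "_ostringstreamIc", which looks like a fragment of a mangled C++ symbol name. That suggests the list pointers being unlinked in ring_buffer_put were overwritten with string data, although it does not by itself show where the corruption came from.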
I am trying to boot Arch Linux in qemu, adding console=ttyS0 to the kernel boot args. I downloaded the .iso, unpacked it, and ran the following command:
qemu-system-x86_64 -accel hvf -cpu host -m 2048 -nographic -append "console=ttyS0" -kernel arch/boot/x86_64/vmlinuz-linux -initrd arch/boot/x86_64/initramfs-linux.img
As a result I get the following output:
SeaBIOS (version rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org)
iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+7FF8F130+7FEEF130 CA00
Booting from ROM...
Probing EDD (edd=off to disable)... o
[ 0.233432] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 0.233903] CPU: 0 PID: 13 Comm: migration/0 Not tainted 5.8.12-arch1-1 #1
[ 0.234504] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
[ 0.235521] RIP: 0010:read_tsc+0x0/0x10
[ 0.235870] Code: cc cc cc cc cc cc cc cc cc cc 8b 05 b6 23 93 01 c3 66 0f 1f 84 00 00 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 41
[ 0.236718] RSP: 0000:ffffaf1480073e28 EFLAGS: 00010002
[ 0.236718] RAX: ffffffff9da35aa0 RBX: ffffffff9f227520 RCX: 0000003b9aca0000
[ 0.236718] RDX: 0000003b9aca0000 RSI: 0000003b9aca0000 RDI: ffffffff9f227520
[ 0.236718] RBP: ffffffff9f25d1a0 R08: 0000000000000000 R09: 0000000000000004
[ 0.236718] R10: 0000000000000204 R11: 0000000000000000 R12: 0000000000000002
[ 0.236718] R13: ffffffff9f369520 R14: 0000000000000000 R15: 0000000000000003
[ 0.236718] FS: 0000000000000000(0000) GS:ffff9d957b000000(0000) knlGS:0000000000000000
[ 0.236718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.236718] CR2: 0000000000000000 CR3: 000000004200a001 CR4: 00000000003606f0
[ 0.236718] Call Trace:
[ 0.236718] tk_setup_internals.constprop.0+0x39/0x150
[ 0.236718] change_clocksource+0x5b/0xc0
[ 0.236718] multi_cpu_stop+0x6b/0x110
[ 0.236718] ? stop_machine_yield+0x10/0x10
[ 0.236718] cpu_stopper_thread+0x72/0x100
[ 0.236718] ? smpboot_register_percpu_thread+0xe0/0xe0
[ 0.236718] smpboot_thread_fn+0x19a/0x230
[ 0.236718] kthread+0x142/0x160
[ 0.236718] ? __kthread_bind_mask+0x60/0x60
[ 0.236718] ret_from_fork+0x1f/0x30
[ 0.236718] Modules linked in:
[ 0.236718] ---[ end trace 18ea92f06c5f9ac2 ]---
[ 0.236718] RIP: 0010:read_tsc+0x0/0x10
[ 0.236718] Code: cc cc cc cc cc cc cc cc cc cc 8b 05 b6 23 93 01 c3 66 0f 1f 84 00 00 00 00 00 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 41
[ 0.236718] RSP: 0000:ffffaf1480073e28 EFLAGS: 00010002
[ 0.236718] RAX: ffffffff9da35aa0 RBX: ffffffff9f227520 RCX: 0000003b9aca0000
[ 0.236718] RDX: 0000003b9aca0000 RSI: 0000003b9aca0000 RDI: ffffffff9f227520
[ 0.236718] RBP: ffffffff9f25d1a0 R08: 0000000000000000 R09: 0000000000000004
[ 0.236718] R10: 0000000000000204 R11: 0000000000000000 R12: 0000000000000002
[ 0.236718] R13: ffffffff9f369520 R14: 0000000000000000 R15: 0000000000000003
[ 0.236718] FS: 0000000000000000(0000) GS:ffff9d957b000000(0000) knlGS:0000000000000000
[ 0.236718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.236718] CR2: 0000000000000000 CR3: 000000004200a001 CR4: 00000000003606f0
[ 0.236718] note: migration/0[13] exited with preempt_count 2
I've tried to disable PTI by adding pti=off spectre_v2=off to the boot args but the result was exactly the same.
The key would seem to be in this line:
[ 0.236718] RIP: 0010:read_tsc+0x0/0x10
Try changing your -cpu option to -cpu host,-rdtscp, which removes the RDTSCP instruction from the CPU flags advertised to the Linux guest. You may also need to add clocksource=hpet to your kernel arguments to give the guest a stable time source afterwards.
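Applied to the command line from the question, the suggestion could be sketched as below. The helper name build_qemu_cmd is hypothetical, and the sketch only assembles the argument list; it has not been verified to fix the boot failure.

```python
import shlex


def build_qemu_cmd(kernel, initrd, cpu="host,-rdtscp",
                   append="console=ttyS0 clocksource=hpet"):
    """Return the qemu-system-x86_64 argv with the suggested changes."""
    return [
        "qemu-system-x86_64",
        "-accel", "hvf",
        "-cpu", cpu,          # host CPU model minus the rdtscp flag
        "-m", "2048",
        "-nographic",
        "-append", append,    # original console= plus clocksource=hpet
        "-kernel", kernel,
        "-initrd", initrd,
    ]


cmd = build_qemu_cmd("arch/boot/x86_64/vmlinuz-linux",
                     "arch/boot/x86_64/initramfs-linux.img")
# shlex.join quotes the -append value so the line can be pasted into a shell.
print(shlex.join(cmd))
```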
I hit a bug in the ixgbevf kernel module on CentOS kernel 3.10.0-229.20.1.el7, and I think it is fixed in 3.10.0-514.10.2.el7.
Could someone tell me which patch fixes this bug, or how to find that patch?
BUG:
[308026.586026] ixgbevf 0000:01:10.0: NIC Link is Down
[308026.586037] ixgbevf 0000:01:10.1: NIC Link is Down
[308026.683724] bonding: bond1: link status definitely down for interface enp1s16, disabling it
[308026.683728] bonding: bond1: now running without any active interface !
[308026.683729] bonding: bond1: link status definitely down for interface enp1s16f1, disabling it
[308028.266060] bonding: bond1: Removing slave enp1s16.
[308028.266135] bonding: bond1: Warning: the permanent HWaddr of enp1s16 - 4e:cd:a6:59:26:2c - is still in use by bond1. Set the HWaddr of enp1s16 to a different address to avoid conflicts.
[308028.266139] bonding: bond1: releasing active interface enp1s16
[308028.359872] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[308028.361319] IP: [<ffffffffa0494970>] ixgbevf_alloc_rx_buffers+0x60/0x160 [ixgbevf]
[308028.362049] PGD 0
[308028.362777] Oops: 0000 [#1] SMP
[308028.363481] Modules linked in: ixgbevf(OF) igb_uio(OF) iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter nbd(OF) vhost_net macvtap macvlan udp_diag unix_diag af_packet_diag netlink_diag tun tcp_diag inet_diag uio bonding ext4 mbcache jbd2 intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mgag200 aesni_intel iTCO_wdt lrw dcdbas gf128mul syscopyarea sysfillrect iTCO_vendor_support glue_helper sysimgblt ablk_helper ttm cryptd ipmi_devintf igb ixgbe drm_kms_helper drm i2c_algo_bit ptp i2c_core ipmi_si pps_core sg mdio ipmi_msghandler dca sb_edac mei_me mei shpchp lpc_ich pcspkr mfd_core edac_core wmi acpi_power_meter acpi_pad ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_common ahci libahci
[308028.368487] libata megaraid_sas [last unloaded: ixgbevf]
[308028.369345] CPU: 0 PID: 21971 Comm: kworker/0:1 Tainted: GF W O-------------- 3.10.0-229.el7.x86_64 #1
[308028.370226] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.5.2 01/28/2015
[308028.371132] Workqueue: events ixgbevf_service_task [ixgbevf]
[308028.372038] task: ffff88022b0dad80 ti: ffff88010905c000 task.ti: ffff88010905c000
[308028.372965] RIP: 0010:[<ffffffffa0494970>] [<ffffffffa0494970>] ixgbevf_alloc_rx_buffers+0x60/0x160 [ixgbevf]
[308028.373949] RSP: 0018:ffff88010905fd10 EFLAGS: 00010287
[308028.374900] RAX: 0000000000000200 RBX: 0000000000000000 RCX: 0000000000000000
[308028.375895] RDX: 0000000000000000 RSI: 00000000000001ff RDI: ffff8800b82061c0
[308028.376841] RBP: ffff88010905fd48 R08: 0000000000000282 R09: 0000000000000001
[308028.377780] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000000
[308028.378702] R13: 00000000fffffe00 R14: 00000000000001ff R15: ffff8800b82061c0
[308028.379628] FS: 0000000000000000(0000) GS:ffff882f7fa00000(0000) knlGS:0000000000000000
[308028.380540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308028.381471] CR2: 0000000000000008 CR3: 000000000190a000 CR4: 00000000001427f0
[308028.382376] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[308028.383291] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[308028.384180] Stack:
[308028.385051] ffff8832d1b58bc0 ffff88010905fd28 ffff8832d1b588c0 0000000000000009
[308028.385933] ffff8832d1b58bc0 ffff8800b82061c0 0000000000001028 ffff88010905fdb8
[308028.386804] ffffffffa0496ba3 ffff8832d1b58e58 000000022b1e2000 00000000819e2108
[308028.387693] Call Trace:
[308028.388520] [<ffffffffa0496ba3>] ixgbevf_configure+0x5d3/0x7d0 [ixgbevf]
[308028.389363] [<ffffffffa0498135>] ixgbevf_reinit_locked+0x65/0x90 [ixgbevf]
[308028.390213] [<ffffffffa049a3e4>] ixgbevf_service_task+0x324/0x420 [ixgbevf]
[308028.391043] [<ffffffff8108f1db>] process_one_work+0x17b/0x470
[308028.391888] [<ffffffff8108ffbb>] worker_thread+0x11b/0x400
[308028.392728] [<ffffffff8108fea0>] ? rescuer_thread+0x400/0x400
[308028.393576] [<ffffffff8109739f>] kthread+0xcf/0xe0
[308028.394434] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[308028.395339] [<ffffffff8161497c>] ret_from_fork+0x7c/0xb0
[308028.396205] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[308028.397068] Code: c5 41 89 f6 49 89 c4 48 8d 14 40 48 8b 47 28 49 c1 e4 04 4c 03 67 20 48 8d 1c d0 0f b7 47 4c 41 29 c5 66 0f 1f 84 00 00 00 00 00 <48> 83 7b 08 00 74 73 8b 53 10 48 8b 03 48 01 d0 49 83 c4 10 48
[308028.398959] RIP [<ffffffffa0494970>] ixgbevf_alloc_rx_buffers+0x60/0x160 [ixgbevf]
[308028.399910] RSP <ffff88010905fd10>
[308028.400846] CR2: 0000000000000008
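On the question of how to locate the fix: one approach that may help is to search the changelog of the newer kernel package for ixgbevf entries, since RPM changelogs can be printed with rpm -q --changelog. A minimal sketch, assuming the newer kernel package is installed and that its changelog mentions the driver by name (the function names are hypothetical, and nothing here identifies the actual patch):

```python
import subprocess


def changelog_matches(changelog, keyword):
    """Return (header, entry) pairs for changelog entries mentioning keyword.

    RPM changelog sections start with a line beginning with '* ' and are
    followed by '- ' entry lines.
    """
    matches = []
    header = None
    for line in changelog.splitlines():
        if line.startswith("* "):
            header = line
        elif keyword.lower() in line.lower():
            matches.append((header, line.strip()))
    return matches


def kernel_changelog(package="kernel"):
    """Fetch the changelog text of an installed package via rpm."""
    result = subprocess.run(["rpm", "-q", "--changelog", package],
                            check=True, capture_output=True, text=True)
    return result.stdout

# Example (requires an RPM-based system with the package installed):
#   for header, entry in changelog_matches(kernel_changelog(), "ixgbevf"):
#       print(header, entry, sep="\n  ")
```

Entries that mention ixgbevf_alloc_rx_buffers, ixgbevf_configure or the reset path seen in the call trace would be the first candidates to read.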