linux softlockup in memory - linux

My system is a embedded linux system(running kernel version 2.6.18). A client process send data to mysql server. The data will be stored in mysql database at a RAID5 assembled by four disks. The IO pressure(wa%) is always above 20% , mysql CPU utilization is very high.
After running 5 or 6 hours, the system run into softlock up stat.
The stack information is about releasing the physical memory, writing cache data to the hard disk.
Any suggestions in this circumstance?**
BUG: soft lockup detected on CPU#0!
[<c043dc1c>] softlockup_tick+0x8f/0xb1
[<c0428cb5>] update_process_times+0x26/0x5c
[<c0411256>] smp_apic_timer_interrupt+0x5d/0x67
[<c04044e7>] apic_timer_interrupt+0x1f/0x24
[<c06fe0b9>] _spin_lock+0x5/0xf
[<c047db2a>] __mark_inode_dirty+0x50/0x176
[<c0424eef>] current_fs_time+0x4d/0x5e
[<c0475ccd>] touch_atime+0x51/0x94
[<c0440926>] do_generic_mapping_read+0x425/0x563
[<c044134b>] __generic_file_aio_read+0xf3/0x267
[<c043fcd0>] file_read_actor+0x0/0xd4
[<c04414fb>] generic_file_aio_read+0x3c/0x4d
[<c045d72d>] do_sync_read+0xc1/0xfd
[<c0431656>] autoremove_wake_function+0x0/0x37
[<c045e0e8>] vfs_read+0xa4/0x167
[<c045d66c>] do_sync_read+0x0/0xfd
[<c045e688>] sys_pread64+0x5e/0x62
[<c0403a27>] syscall_call+0x7/0xb
=======================
BUG: soft lockup detected on CPU#2!
[<c043dc1c>] softlockup_tick+0x8f/0xb1
[<c0428cb5>] update_process_times+0x26/0x5c
[<c0411256>] smp_apic_timer_interrupt+0x5d/0x67
[<c04044e7>] apic_timer_interrupt+0x1f/0x24
[<c06fe0bb>] _spin_lock+0x7/0xf
[<c04aaf17>] journal_try_to_free_buffers+0xf4/0x17b
[<c0442c52>] find_get_pages+0x28/0x5d
[<c049c4b1>] ext3_releasepage+0x0/0x7d
[<c045f0bf>] try_to_release_page+0x2c/0x46
[<c0447894>] invalidate_mapping_pages+0xc9/0x167
[<c04813b0>] drop_pagecache+0x86/0xd2
[<c048144e>] drop_caches_sysctl_handler+0x52/0x64
[<c04813fc>] drop_caches_sysctl_handler+0x0/0x64
[<c042623d>] do_rw_proc+0xe8/0xf4
[<c0426268>] proc_writesys+0x1f/0x24
[<c045df81>] vfs_write+0xa6/0x169
[<c0426249>] proc_writesys+0x0/0x24
[<c045e601>] sys_write+0x41/0x6a
[<c0403a27>] syscall_call+0x7/0xb
=======================
BUG: soft lockup detected on CPU#1!
[<c043dc1c>] softlockup_tick+0x8f/0xb1
[<c0428cb5>] update_process_times+0x26/0x5c
[<c0411256>] smp_apic_timer_interrupt+0x5d/0x67
[<c04044e7>] apic_timer_interrupt+0x1f/0x24
[<c06f007b>] inet_diag_dump+0x804/0x821
[<c06fe0bb>] _spin_lock+0x7/0xf
[<c047db2a>] __mark_inode_dirty+0x50/0x176
[<c043168d>] wake_bit_function+0x0/0x3c
[<c04ae0f6>] __journal_remove_journal_head+0xee/0x1a5
[<c0445ae8>] __set_page_dirty_nobuffers+0x87/0xc6
[<c04a908e>] __journal_unfile_buffer+0x8/0x11
[<c04ab94d>] journal_commit_transaction+0x8e0/0x1103
[<c0431656>] autoremove_wake_function+0x0/0x37
[<c04af690>] kjournald+0xa9/0x1e5
[<c0431656>] autoremove_wake_function+0x0/0x37
[<c04af5e7>] kjournald+0x0/0x1e5
[<c04314da>] kthread+0xde/0xe2
[<c04313fc>] kthread+0x0/0xe2
[<c0404763>] kernel_thread_helper+0x7/0x14
=======================
BUG: soft lockup detected on CPU#3!
[<c043dc1c>] softlockup_tick+0x8f/0xb1
[<c0428cb5>] update_process_times+0x26/0x5c
[<c0411256>] smp_apic_timer_interrupt+0x5d/0x67
[<c04044e7>] apic_timer_interrupt+0x1f/0x24
[<c06fe0bb>] _spin_lock+0x7/0xf
[<c047db2a>] __mark_inode_dirty+0x50/0x176
[<c0424eef>] current_fs_time+0x4d/0x5e
[<c0475ccd>] touch_atime+0x51/0x94
[<c0440926>] do_generic_mapping_read+0x425/0x563
[<c044134b>] __generic_file_aio_read+0xf3/0x267
[<c043fcd0>] file_read_actor+0x0/0xd4
[<c04414fb>] generic_file_aio_read+0x3c/0x4d
[<c045d72d>] do_sync_read+0xc1/0xfd
[<c0431656>] autoremove_wake_function+0x0/0x37
[<c045e0e8>] vfs_read+0xa4/0x167
[<c045d66c>] do_sync_read+0x0/0xfd
[<c045e688>] sys_pread64+0x5e/0x62
[<c0403a27>] syscall_call+0x7/0xb
=======================

Seriously, try something newer. 2.6.18 is >7 years old.

Looks like CPU#1 and CPU#3 are spinning in a spinlock on a inode structure.

Related

RegexpParser crashes Python 3.8.6 kernel in JupyterLab

I am parsing chunks of pos-tagged text in JupyterLabs.
NLTK can run on Python<3.5 and >3.8 according to its faq.
I can return pos-tagged text just fine. But when I want to return parsed chunks, it crashes python.
I am running MacOS 11.1, Python 3.8.6, Jupyterlab 3.0.0, and nltk 3.5
from nltk import *
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
sentence_re = r'''(?x)
([A-Z])(\.[A-Z])+\.?
| \w+(-\w+)*
| \$?\d+(\.\d+)?%?
| \.\.\.
| [][.,;"'?():-_`]
'''
grammar = r"""
NBAR:
{<NN.*|JJ>*<NN.*>} # Nouns and Adjectives, terminated with Nouns
NP:
{<NBAR>}
{<NBAR><IN><NBAR>} # Above, connected with in/of/etc...
"""
chunker = RegexpParser(grammar)
toks = word_tokenize(text)
postoks = pos_tag(toks)
All is fine until I want to parse the chunks with RegexpParser. At which point, it crashes the kernel.
chunker.parse(postoks)
beginning of the crash report:
Process: Python [1549]
Path: /usr/local/Cellar/python#3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 3.8.6 (3.8.6)
Code Type: X86-64 (Native)
Parent Process: Python [1522]
Responsible: Terminal [416]
User ID: 501
Date/Time: 2020-12-29 15:34:13.546 -0500
OS Version: macOS 11.1 (20C69)
Report Version: 12
Anonymous UUID: 8EEE2257-0986-3569-AA83-52641AF02282
Time Awake Since Boot: 1200 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
end of the crash report:
External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 4
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 4785
thread_create: 0
thread_set_state: 0
VM Region Summary:
ReadOnly portion of Libraries: Total=835.0M resident=0K(0%) swapped_out_or_unallocated=835.0M(100%)
Writable regions: Total=1.5G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.5G(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Activity Tracing 256K 1
Dispatch continuations 64.0M 1
Kernel Alloc Once 8K 1
MALLOC 150.1M 37
MALLOC guard page 24K 5
MALLOC_MEDIUM (reserved) 960.0M 8 reserved VM address space (unallocated)
STACK GUARD 72K 18
Stack 86.6M 18
VM_ALLOCATE 182.5M 359
VM_ALLOCATE (reserved) 128.0M 2 reserved VM address space (unallocated)
__DATA 16.1M 460
__DATA_CONST 11.8M 200
__DATA_DIRTY 509K 87
__FONT_DATA 4K 1
__LINKEDIT 506.1M 241
__OBJC_RO 60.5M 1
__OBJC_RW 2452K 2
__TEXT 329.5M 432
__UNICODE 588K 1
mapped file 51.5M 9
shared memory 40K 4
=========== ======= =======
TOTAL 2.5G 1888
TOTAL, minus reserved VM space 1.4G 1888
Model: iMac18,2, BootROM 429.60.3.0.0, 4 processors, Quad-Core Intel Core i7, 3.6 GHz, 32 GB, SMC 2.40f1
Graphics: kHW_AMDRadeonPro560Item, Radeon Pro 560, spdisplays_pcie_device, 4 GB
Memory Module: BANK 0/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
Memory Module: BANK 1/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x16E), Broadcom BCM43xx 1.0 (7.77.111.1 AirPortDriverBrcmNIC-1675.1)
Bluetooth: Version 8.0.2f9, 3 services, 27 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en1
USB Device: USB 3.0 Bus
USB Device: AS2105
USB Device: USB 2.0 BILLBOARD
USB Device: Bluetooth USB Host Controller
USB Device: FaceTime HD Camera (Built-in)
USB Device: Scarlett 2i4 USB
USB Device: My Passport 0827
Thunderbolt Bus: iMac, Apple Inc., 41.4

system_call value is different each time when I use rdmsrl(MSR_LSTAR, system_call)

I'm writing a lkm to get sys_call_table address and I'm trying to get it by IDT (I have tested other methods and they work). The problem is that when I use rdmsrl to get register MSR_LSTAR, it's different each time.
I have tried function rdmsrl (MSR_LSTAR) and asm sentences in Ubuntu 18.04.1 with kernel 4.15.0-51.
asm("rdmsr" : "=a" (low), "=d" (high) : "c" (IA32_LSTAR));
system_call = (void*)(((long)high<<32) | low);
printk(KERN_INFO "system_call: 0x%llx", system_call);
rdmsrl(MSR_LSTAR, sct_off);
printk("sct_off: %016llx\n", sct_off);
The result is as follows:
system_call: 0xfffffe0000006000
system_call: 0xfffffe000008a000
system_call: 0xfffffe0000032000
Do you have CONFIG_RETPOLINE=y enabled? (check via cat /usr/src/`uname -r`/.config | grep RETPOLINE). If so, for CPUs where Kernel Page Table Isolation is enabled MSR_LSTAR holds the trampoline per-cpu entry SYSCALL64_entry_trampoline instead of the standard entry_SYSCALL_64 for your kernel version.

Jvm is using more memory than Native Memory Tracking says, how do I locate where the extra meeory goes? [duplicate]

This question already has answers here:
Java process memory usage (jcmd vs pmap)
(3 answers)
Where do these java native memory allocated from?
(1 answer)
Closed 5 years ago.
I'm running jetty on my web server. My current jvm setting: -Xmx4g -Xms2g, however jetty uses a lot more memory and I don't know where these extra memory goes.
Jetty uses 4.547g memory in total:
heap usage shows heap memory usage at 2.5g:
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 483196928 (460.8125MB)
used = 277626712 (264.76546478271484MB)
free = 205570216 (196.04703521728516MB)
57.45622455612963% used
Eden Space:
capacity = 429522944 (409.625MB)
used = 251267840 (239.627685546875MB)
free = 178255104 (169.997314453125MB)
58.4992824038755% used
From Space:
capacity = 53673984 (51.1875MB)
used = 26358872 (25.137779235839844MB)
free = 27315112 (26.049720764160156MB)
49.109214624351345% used
To Space:
capacity = 53673984 (51.1875MB)
used = 0 (0.0MB)
free = 53673984 (51.1875MB)
0.0% used
concurrent mark-sweep generation:
capacity = 2166849536 (2066.46875MB)
used = 1317710872 (1256.6670150756836MB)
free = 849138664 (809.8017349243164MB)
60.81229222922842% used
Still 2g missing, then I use Native Memory Tracking, it shows:
Total: reserved=5986478KB, committed=3259678KB
- Java Heap (reserved=4194304KB, committed=2640352KB)
(mmap: reserved=4194304KB, committed=2640352KB)
- Class (reserved=1159154KB, committed=122778KB)
(classes #18260)
(malloc=4082KB #62204)
(mmap: reserved=1155072KB, committed=118696KB)
- Thread (reserved=145568KB, committed=145568KB)
(thread #141)
(stack: reserved=143920KB, committed=143920KB)
(malloc=461KB #707)
(arena=1187KB #280)
- Code (reserved=275048KB, committed=143620KB)
(malloc=25448KB #30875)
(mmap: reserved=249600KB, committed=118172KB)
- GC (reserved=25836KB, committed=20792KB)
(malloc=11492KB #1615)
(mmap: reserved=14344KB, committed=9300KB)
- Compiler (reserved=583KB, committed=583KB)
(malloc=453KB #769)
(arena=131KB #3)
- Internal (reserved=76399KB, committed=76399KB)
(malloc=76367KB #25878)
(mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=21603KB, committed=21603KB)
(malloc=17791KB #201952)
(arena=3812KB #1)
- Native Memory Tracking (reserved=5096KB, committed=5096KB)
(malloc=22KB #261)
(tracking overhead=5074KB)
- Arena Chunk (reserved=190KB, committed=190KB)
(malloc=190KB)
- Unknown (reserved=82696KB, committed=82696KB)
(mmap: reserved=82696KB, committed=82696KB)
Still does't explain where memory goes, can someone shed light on how to locate the missing memory?

CPU stalling in SMP

We are facing an issue on which we need some help.
Brief write-up :
We have enabled SMP in Linux 2.6.39.4 kernel and cross compiled it for PPC-476. After booting, kernel is able to map both the processors (2 cores at h/w). The problem we are facing is, while running modprobe command repeatedly, one of the cpu goes into stall state. We have tried to dump stack of all active cpus (using sysrq) while one of the cpu is in stall state. The stack dump showed both the processors were executing same process (with same PID) i.e. modprobe.
Question :
1. is it possible for both the processors to be executing same process at a movement with same PID.
2. Is the execution of both process simultaneously give rise to some race condition which is causing either of CPU to go in stall state.
LOGS===============================================
SysRq : Show backtrace of all active CPUs
CPU0:
NIP: 701786c4 LR: 701752f0 CTR: 00000004
REGS: 9fb4fdc0 TRAP: 0501 Not tainted (2.6.39.4)
MSR: 00029000 <EE,ME,CE> CR: 44002048 XER: 00000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
GPR00: 08101820 9fb4fe70 8f868ae0 22222222 0002374d 00000002 00849ffc 00000000
GPR08: a2bb3a8c 00000810 a2c2aac4 00000148 44002088
NIP [701786c4] __sw_hweight32+0x50/0x58
LR [701752f0] __bitmap_weight+0x54/0xc0
Call Trace:
[9fb4fe70] [20000000] 0x20000000 (unreliable)
[9fb4fe90] [7006b1cc] sys_init_module+0x11a8/0x1ca4
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x10050e38
LR = 0x100a6708
Instruction dump:
7c634838 7c004838 7c001a14 5409e13e 7c090214 3d200f0f 61290f0f 7c004838
5409c23e 7c090214 5409843e 7c090214 <5403063e> 4e800020 5460f87e 70005555
CPU1:
NIP: 700a9678 LR: 700a964c CTR: 7012b0f8
REGS: 9fb4fe30 TRAP: 0501 Not tainted (2.6.39.4)
MSR: 00029000 <EE,ME,CE> CR: 80008022 XER: 20000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
GPR00: 00000000 9fb4fee0 8f868ae0 9efb9f20 9efb9f20 9fb4fee8 1004d6d0 0002d000
GPR08: 9efb9d68 9f8eee00 9efb9f20 00000000 20002022
NIP [700a9678] do_munmap+0x114/0x314
LR [700a964c] do_munmap+0xe8/0x314
Call Trace:
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
LR = 0x10022088
Instruction dump:
4bffe461 7c641b79 41820010 80040004 7f9d0040 419d01d4 83210008 2e190000
41920200 83d9000c 801f0074 2f800000 <419e00f8> 2f9e0000 419e00f0 801e0004
Call Trace:
[9ffaff00] [70008654] show_stack+0x6c/0x1a4 (unreliable)
[9ffaff40] [70192d30] showacpu+0x84/0xcc
[9ffaff60] [700679bc] generic_smp_call_function_single_interrupt+0x100/0x18c
[9ffaff90] [7000ff6c] call_function_single_action+0x10/0x24
[9ffaffa0] [7006f7f4] handle_irq_event_percpu+0xa0/0x21c
[9ffaffe0] [700728d8] handle_percpu_irq+0x88/0xb8
[9ffafff0] [7000e038] call_handle_irq+0x18/0x28
[9fb4fdf0] [700044b0] do_IRQ+0xe8/0x1a0
[9fb4fe20] [7000f81c] ret_from_except+0x0/0x18
--- Exception: 501 at do_munmap+0x114/0x314
LR = do_munmap+0xe8/0x314
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
=============================================================
Both questions of yours are False. There is no possibility that two cores run the same task at the same moment.
From the OOPS/panic trace, we can know that kernel panic on CPU 0 first, then it triggered the SysRq to "Show backtrace of all active CPUs". That is to say, CPU 1 works good at the time CPU 0 panic, but CPU 0 exception triggered to dump the backtrace of CPU 1.
Now let us analysis why CPU 0 panic: from the backtrace, it happened during "modprobe", so please analysis the kernel modules on your system to locate the one that trigger the panic.

Memory leak at Microsoft DTV-DVD Video decoder?

During rendering h.264 at AVI container video file memory consumption of my application rising with big speed, aroud 150 Mb/min.
This is the link for image of my graph: http://picturepush.com/public/8926555
If using LAV video decoder insted - no memory leak.
First I suggest, that leak happened at my code, but than i just switch off (set "return S_OK" at the start of callback") both my sample grabbers filter - the leak continue.
Also i tried to relese every filter after stop graph like this, but this no delete leak:
if(m_pMediaControl)
{
HRESULT hr = m_pMediaControl->Stop();
LONG lCount;
IUnknown* pUnk;
IAMCollection* p_Collection;
hr = m_pMediaControl->get_FilterCollection(reinterpret_cast<IDispatch**>(&p_Collection));
hr = p_Collection->get_Count(&lCount);
for (int i=0; i<lCount; i++)
{
hr = p_Collection->Item(i, &pUnk);
pUnk->Release();
}
p_Collection->Release();
}
m_pMediaControl.Release();
Will be happy for any suugestions, how to eliminate memory leak?
I create different graphs at graphedit and observed for repetead playback short (6 sec) h.264 video file:
picturepush.com/public/8931745 - Full graph - +6 Mb grow up Private bytes every time after playback
picturepush.com/public/8931760 - With DMO convertor, Without samplegrabber - no memory leak
picturepush.com/public/8931766 - With DMO convertor, Without samplegrabber, but with video renderer - +7 Mb grow up Private bytes every time after playback
picturepush.com/public/8931770 - Only decoder - no memory leak

Resources