SMP threads not showing in GDB

SMP threads not showing in GDB - linux

Trying to debug multi CPU SoC (Amlogic A113X) and faced a problem.
So I have this debug configuration: A113X(JTAG) -> Segger J-Link V11 -> OpenOCD -> gdb-multiarch
Everything is connected and seems okay, but GDB shows just 1 thread (should be 4 - one for each CPU):
(gdb) info threads
Id Target Id Frame
* 1 Remote target 0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89
Meanwhile there are 4 debug targets according to telenet 'targets' command:
> targets
TargetName Type Endian TapName State
-- ------------------ ---------- ------ ------------------ ------------
0* A113X.a53.0 aarch64 little A113X.cpu halted
1 A113X.a53.1 aarch64 little A113X.cpu halted
2 A113X.a53.2 aarch64 little A113X.cpu unknown
3 A113X.a53.3 aarch64 little A113X.cpu halted
Core 2 is shut down in this particular case.
When I halt CPUs then get this output in GDB:
(gdb) continue
Continuing.
^CA113X.a53.1 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800001c5 pc: 0xffffff8009853364
MMU: enabled, D-Cache: enabled, I-Cache: enabled
A113X.a53.3 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800000c5 pc: 0xffffff80098532b4
MMU: enabled, D-Cache: enabled, I-Cache: enabled
Program received signal SIGINT, Interrupt.
0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89
89 asm volatile(
(gdb) where
#0 0xffffff8009853364 in arch_spin_lock (lock=<optimized out>) at ./arch/arm64/include/asm/spinlock.h:89
#1 do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
#2 __raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock_api_smp.h:145
#3 _raw_spin_lock (lock=0xffffffc01fb86a00) at kernel/locking/spinlock.c:151
#4 0xffffff80090c2114 in try_to_wake_up (p=0xffffffc01963a880, state=<optimized out>, wake_flags=0) at kernel/sched/core.c:2110
#5 0xffffff80090c239c in wake_up_process (p=<optimized out>) at kernel/sched/core.c:2203
#6 0xffffff80090b1b7c in wake_up_worker (pool=<optimized out>) at kernel/workqueue.c:837
#7 insert_work (pwq=<optimized out>, work=<optimized out>, head=<optimized out>, extra_flags=<optimized out>) at kernel/workqueue.c:1310
#8 0xffffff80090b1d10 in __queue_work (cpu=0, wq=0xdf2, work=0x8df2) at kernel/workqueue.c:1460
#9 0xffffff80090b1fc8 in queue_work_on (cpu=8, wq=0xffffffc01dbb5c00, work=0xffffffc01ccb82a0) at kernel/workqueue.c:1485
#10 0xffffff800191d068 in ?? ()
#11 0xffffffc0138664e8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Is something wrong with my OpenOCD configuration? SMP config looks fine because it halts all 4 cores. What can be wrong here? Thanks beforehand.
Here is openocd config:
telnet_port 4444
gdb_port 3333
source [find interface/jlink.cfg]
transport select jtag
adapter speed 1000
scan_chain
set _CHIPNAME A113X
set _DAPNAME $_CHIPNAME.dap
jtag newtap $_CHIPNAME cpu -irlen 4 -expected-id 0x5ba00477
dap create $_DAPNAME -chain-position $_CHIPNAME.cpu
echo "$_CHIPNAME.cpu"
set CA53_DBGBASE {0x80410000 0x80510000 0x80610000 0x80710000}
set CA53_CTIBASE {0x80420000 0x80520000 0x80620000 0x80720000}
set _num_ca53 4
set _ap_num 0
set smp_targets ""
proc setup_a5x {core_name dbgbase ctibase num boot} {
for { set _core 0 } { $_core < $num } { incr _core } {
set _TARGETNAME $::_CHIPNAME.$core_name.$_core
set _CTINAME $_TARGETNAME.cti
cti create $_CTINAME -dap $::_DAPNAME -ap-num $::_ap_num \
-baseaddr [lindex $ctibase $_core]
target create $_TARGETNAME aarch64 -dap $::_DAPNAME -cti $_CTINAME -coreid $_core
set ::smp_targets "$::smp_targets $_TARGETNAME"
}
}
setup_a5x a53 $CA53_DBGBASE $CA53_CTIBASE $_num_ca53 1
echo "SMP targets:$smp_targets"
eval "target smp $smp_targets"
targets $_CHIPNAME.a53.0
And output of OpenOCD:
Open On-Chip Debugger 0.11.0+dev-00640-ge83eeb44a (2022-04-21-10:10)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
A113X.cpu
SMP targets: A113X.a53.0 A113X.a53.1 A113X.a53.2 A113X.a53.3
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : J-Link V11 compiled Mar 3 2022 10:16:14
Info : Hardware version: 11.00
Info : VTarget = 3.309 V
Info : clock speed 1000 kHz
Info : JTAG tap: A113X.cpu tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : A113X.a53.0: hardware has 6 breakpoints, 4 watchpoints
Info : A113X.a53.1: hardware has 6 breakpoints, 4 watchpoints
Error: JTAG-DP STICKY ERROR
Warn : target A113X.a53.2 examination failed
Info : A113X.a53.3: hardware has 6 breakpoints, 4 watchpoints
Info : A113X.a53.0 cluster 0 core 0 multi core
Info : A113X.a53.1 cluster 0 core 1 multi core
Info : A113X.a53.3 cluster 0 core 3 multi core
Info : starting gdb server for A113X.a53.0 on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : New GDB Connection: 1, Target A113X.a53.0, state: halted
Warn : Prefer GDB command "target extended-remote :3333" instead of "target remote :3333"
A113X.a53.1 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800001c5 pc: 0xffffff8009853364
MMU: enabled, D-Cache: enabled, I-Cache: enabled
A113X.a53.3 halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x800000c5 pc: 0xffffff80098532b4
MMU: enabled, D-Cache: enabled, I-Cache: enabled

I tried to combine hwthread and targete smp to achieve multi-cores.
Add "-rtos hwthread" for each core, like
target create ...... -rtos hwthread
In my opinion, a core is regarded as a hwthread by OpenOCD, so set hwthread for each core. Following is the response of info threads.
(gdb) i threads
Id Target Id Frame
* 1 Thread 1 (Name: riscv.cpu0.0, state: debug-request) 0x8000000a in _mstart ()
2 Thread 2 (Name: riscv.cpu0.1, state: debug-request) 0x82000000 in ?? ()
3 Thread 3 (Name: riscv.cpu0.2, state: debug-request) 0x82000000 in ?? ()
4 Thread 4 (Name: riscv.cpu0.3, state: debug-request) 0x82000000 in ?? ()

Related

DPDK with AF_XDP: Failed to create xsk socket

I am trying to run AF_XDP poll mode driver. Kernel Version is 5.4(CONFIG_XDP_SOCKETS=y).
When i run the samples provided by DPDK website, an error was happened.
root#n211-203-164:~# dpdk-testpmd --vdev=net_af_xdp0,iface=eth3 -- -i --total-num-mbufs=10240
EAL: Detected 96 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:5e:00.0 (socket 0)
mlx5_net: Default miss action is not supported.
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:5e:00.1 (socket 0)
mlx5_net: Default miss action is not supported.
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:86:00.0 (socket 1)
mlx5_net: Default miss action is not supported.
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:86:00.1 (socket 1)
mlx5_net: Default miss action is not supported.
Interactive-mode selected
testpmd: create a new mbuf pool <mb_pool_0>: n=10240, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=10240, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: B8:CE:F6:35:DB:3A
Configuring Port 1 (socket 0)
Port 1: B8:CE:F6:35:DB:3B
Configuring Port 2 (socket 1)
Port 2: B8:CE:F6:3B:31:4A
Configuring Port 3 (socket 1)
Port 3: B8:CE:F6:3B:31:4B
Configuring Port 4 (socket 0)
xsk_configure(): Failed to create xsk socket.
eth_rx_queue_setup(): Failed to configure xdp socket
Fail to configure port 4 rx queues
EAL: Error - exiting with code: 1
Cause: Start ports failed
it seems that XSK socket call bind() failed and eth_rx_queue_setup() failed as a result.
How to correctly configure EAL parameters to run dpdk-testpmd with vdev=af_net_xdp0? Any help is greatly appreciated.

It is an issue about rlimit. I wrote a simple test programer.
#include <bpf.h>
#include <xsk.h>
#include <stdio.h>
//#include <sys/resource.h>
//static struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
int main() {
//setrlimit(RLIMIT_MEMLOCK, & r);
int fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map", sizeof(int), sizeof(int), 1, 0);
printf("fd:%d\n", fd);
return 0;
}
Execute cmd strace -e bpf ./test-prog and the result could be like this:
root#n211-203-164:~# strace -e bpf ./test
bpf(BPF_MAP_CREATE, {map_type=0x11 /* BPF_MAP_TYPE_??? */, key_size=4, value_size=4, max_entries=1}, 112) = -1 EPERM (Operation not permitted)
fd:-1 1
+++ exited with 0 +++
This means max locked memory caused the insufficient resource allocation.
Execute cmd ulimit -a could find max locked memory is 64 kbytes.
There are 2 solutions, one is to enlarge max locked memory globally by executing cmd ulimit -S -l <a big value>, and the other is to call setrlimit() into libbpf.so when creating a BPF map.

RegexpParser crashes Python 3.8.6 kernel in JupyterLab

I am parsing chunks of pos-tagged text in JupyterLabs.
NLTK can run on Python<3.5 and >3.8 according to its faq.
I can return pos-tagged text just fine. But when I want to return parsed chunks, it crashes python.
I am running MacOS 11.1, Python 3.8.6, Jupyterlab 3.0.0, and nltk 3.5
from nltk import *
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
sentence_re = r'''(?x)
([A-Z])(\.[A-Z])+\.?
| \w+(-\w+)*
| \$?\d+(\.\d+)?%?
| \.\.\.
| [][.,;"'?():-_`]
'''
grammar = r"""
NBAR:
{<NN.*|JJ>*<NN.*>} # Nouns and Adjectives, terminated with Nouns
NP:
{<NBAR>}
{<NBAR><IN><NBAR>} # Above, connected with in/of/etc...
"""
chunker = RegexpParser(grammar)
toks = word_tokenize(text)
postoks = pos_tag(toks)
All is fine until I want to parse the chunks with RegexpParser. At which point, it crashes the kernel.
chunker.parse(postoks)
beginning of the crash report:
Process: Python [1549]
Path: /usr/local/Cellar/python#3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 3.8.6 (3.8.6)
Code Type: X86-64 (Native)
Parent Process: Python [1522]
Responsible: Terminal [416]
User ID: 501
Date/Time: 2020-12-29 15:34:13.546 -0500
OS Version: macOS 11.1 (20C69)
Report Version: 12
Anonymous UUID: 8EEE2257-0986-3569-AA83-52641AF02282
Time Awake Since Boot: 1200 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
end of the crash report:
External Modification Summary:
Calls made by other processes targeting this process:
task_for_pid: 4
thread_create: 0
thread_set_state: 0
Calls made by this process:
task_for_pid: 0
thread_create: 0
thread_set_state: 0
Calls made by all processes on this machine:
task_for_pid: 4785
thread_create: 0
thread_set_state: 0
VM Region Summary:
ReadOnly portion of Libraries: Total=835.0M resident=0K(0%) swapped_out_or_unallocated=835.0M(100%)
Writable regions: Total=1.5G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.5G(100%)
VIRTUAL REGION
REGION TYPE SIZE COUNT (non-coalesced)
=========== ======= =======
Activity Tracing 256K 1
Dispatch continuations 64.0M 1
Kernel Alloc Once 8K 1
MALLOC 150.1M 37
MALLOC guard page 24K 5
MALLOC_MEDIUM (reserved) 960.0M 8 reserved VM address space (unallocated)
STACK GUARD 72K 18
Stack 86.6M 18
VM_ALLOCATE 182.5M 359
VM_ALLOCATE (reserved) 128.0M 2 reserved VM address space (unallocated)
__DATA 16.1M 460
__DATA_CONST 11.8M 200
__DATA_DIRTY 509K 87
__FONT_DATA 4K 1
__LINKEDIT 506.1M 241
__OBJC_RO 60.5M 1
__OBJC_RW 2452K 2
__TEXT 329.5M 432
__UNICODE 588K 1
mapped file 51.5M 9
shared memory 40K 4
=========== ======= =======
TOTAL 2.5G 1888
TOTAL, minus reserved VM space 1.4G 1888
Model: iMac18,2, BootROM 429.60.3.0.0, 4 processors, Quad-Core Intel Core i7, 3.6 GHz, 32 GB, SMC 2.40f1
Graphics: kHW_AMDRadeonPro560Item, Radeon Pro 560, spdisplays_pcie_device, 4 GB
Memory Module: BANK 0/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
Memory Module: BANK 1/DIMM0, 16 GB, DDR4 SO-DIMM, 2400 MHz, 0x802C, 0x313641544632473634485A2D3247334232202020
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x16E), Broadcom BCM43xx 1.0 (7.77.111.1 AirPortDriverBrcmNIC-1675.1)
Bluetooth: Version 8.0.2f9, 3 services, 27 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en1
USB Device: USB 3.0 Bus
USB Device: AS2105
USB Device: USB 2.0 BILLBOARD
USB Device: Bluetooth USB Host Controller
USB Device: FaceTime HD Camera (Built-in)
USB Device: Scarlett 2i4 USB
USB Device: My Passport 0827
Thunderbolt Bus: iMac, Apple Inc., 41.4

How does Mesa recycle graphic resources?

I have a system running on an Intel debug board with DRM and Mesa.
This graphic system use Wayland/Weston and Mesa.
And applications are developed with OpenGL ES 2.0.
Now, I find, sometimes, if the application crashed, the Weston will crashed too.
By checking the coredump of Weston, I can find some invalid memory address was used.
But when I run Weston with Valgrind, there is not any report for invalid memory access.
So, I am thinking about if there were some shared-memory leak by mesa when the client crash.
Means, for example, an application draw a buffer, and commit it to Weston, after that, the application crashed, and mesa recycled all the buffers alloced by this application. But, Weston do not know this, it used the committed buffer and crashed.
Will those things happen? And what could I do to survive from this?
Core was generated by `weston --config=/usr/lib/weston/weston.ini --backend=/usr/lib/weston/ias-backen'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x00007fec68911b09 in raise (sig=5) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
36 ../sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
(gdb) bt
#0 0x00007fec68911b09 in raise (sig=5) at ../sysdeps/unix/sysv/linux/pt-raise.c:36
#1 <signal handler called>
#2 0x00007fec685904b8 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#3 0x00007fec6859358a in __GI_abort () at abort.c:89
#4 0x00007fec685ca90b in __libc_message (do_abort=do_abort#entry=2, fmt=fmt#entry=0x7fec686c68a0 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#5 0x00007fec685d4896 in malloc_printerr (action=3, str=0x7fec686c2d31 "free(): invalid pointer", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5000
#6 0x00007fec685d507e in _int_free (av=0x7fec688fbb40 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3861
#7 0x00007fec655d320d in intel_miptree_release (mt=mt#entry=0x18884a8)
at src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1036
#8 0x00007fec655d32a7 in intel_miptree_reference (dst=0x18884a8, src=0x1870170)
at src/mesa/drivers/dri/i965/intel_mipmap_tree.c:989
#9 0x00007fec655dc640 in intel_set_texture_image_mt (brw=brw#entry=0x7fec69a44040, image=image#entry=0x185c050, internal_format=<optimized out>, mt=<optimized out>)
at src/mesa/drivers/dri/i965/intel_tex_image.c:180
#10 0x00007fec655dc952 in intel_image_target_texture_2d (ctx=<optimized out>, target=3553, texObj=0x1888080, texImage=0x185c050, image_handle=<optimized out>)
at src/mesa/drivers/dri/i965/intel_tex_image.c:426
#11 0x00007fec653544ff in _mesa_EGLImageTargetTexture2DOES (target=3553, image=0x185fb60)
at src/mesa/main/teximage.c:3194
#12 0x00007fec660a2d27 in gl_renderer_attach_egl (format=<optimized out>, buffer=0x1889130, es=<optimized out>) at ../src/gl-renderer.c:1450
#13 gl_renderer_attach (es=<optimized out>, buffer=0x1889130) at ../src/gl-renderer.c:1919
#14 0x000000000040fdb5 in weston_surface_attach (buffer=0x1889130, surface=0x13ad650) at ../src/compositor.c:2266
#15 weston_surface_commit_state (surface=surface#entry=0x13ad650, state=state#entry=0x13ad778) at ../src/compositor.c:3190
#16 0x000000000041036f in weston_surface_commit (surface=surface#entry=0x13ad650) at ../src/compositor.c:3262
#17 0x00000000004104c7 in surface_commit (client=<optimized out>, resource=<optimized out>) at ../src/compositor.c:3289
#18 0x00007fec6835ad04 in ffi_call_unix64 () from /usr/lib/libffi.so.6
#19 0x00007fec6835a7fa in ffi_call () from /usr/lib/libffi.so.6
#20 0x00007fec696ff7ba in wl_closure_invoke (closure=<optimized out>, flags=<optimized out>, target=0x13ad940, opcode=6, data=0x13acd70) at ../src/connection.c:935
#21 0x00007fec696fc517 in wl_client_connection_data (fd=<optimized out>, mask=<optimized out>, data=0x13acd70) at ../src/wayland-server.c:407
#22 0x00007fec696fdd32 in wl_event_loop_dispatch (loop=0x11a7460, timeout=timeout#entry=-1) at ../src/event-loop.c:423
#23 0x00007fec696fc6b5 in wl_display_run (display=display#entry=0x11a7380) at ../src/wayland-server.c:1281
#24 0x00000000004092d1 in main (argc=1, argv=<optimized out>) at ../src/main.c:1049
(gdb)

nodejs + aerospike crashes

I am getting the following backtrace from an error that only happens after some subsequent requests to the server:
node: ../deps/uv/src/unix/core.c:171: uv__finish_close: Assertion `handle->flags & UV_CLOSING' failed.
Program received signal SIGABRT, Aborted.
0x00000030eee32925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) backtrace
#0 0x00000030eee32925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00000030eee34105 in abort () at abort.c:92
#2 0x00000030eee2ba4e in __assert_fail_base (fmt=<value optimized out>, assertion=0xb37538 "handle->flags & UV_CLOSING", file=0xb374a8 "../deps/uv/src/unix/core.c", line=<value optimized out>, function=<value optimized out>)
at assert.c:96
#3 0x00000030eee2bb10 in __assert_fail (assertion=0xb37538 "handle->flags & UV_CLOSING", file=0xb374a8 "../deps/uv/src/unix/core.c", line=171, function=0xb37690 "uv__finish_close") at assert.c:105
#4 0x0000000000994bb4 in uv__finish_close (loop=0xe6d840, mode=<value optimized out>) at ../deps/uv/src/unix/core.c:171
#5 uv__run_closing_handles (loop=0xe6d840, mode=<value optimized out>) at ../deps/uv/src/unix/core.c:221
#6 uv_run (loop=0xe6d840, mode=<value optimized out>) at ../deps/uv/src/unix/core.c:319
#7 0x0000000000942132 in node::Start(int, char**) ()
#8 0x00000030eee1ed1d in __libc_start_main (main=0x599710 <main>, argc=2, ubp_av=0x7fffffffdec8, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffffffdeb8)
at libc-start.c:226
#9 0x00000000005999f1 in _start ()
Any idea why is this happening? I am using aerospike but I am not sure if it is related to the issue.
To reproduce it:
gdb --args node /bin/www
> run // until error occurs
> backtrace full

This question was asked on the Aerospike Community Edition user forum here.
Aerospike made an official release to NPM (Node.js 1.0.38) in April 2015; it fixes this UV assertion segfault when running query in the node.js API.
The user #Daniel reported that his problem is now fixed.

CPU stalling in SMP

We are facing an issue on which we need some help.
Brief write-up :
We have enabled SMP in Linux 2.6.39.4 kernel and cross compiled it for PPC-476. After booting, kernel is able to map both the processors (2 cores at h/w). The problem we are facing is, while running modprobe command repeatedly, one of the cpu goes into stall state. We have tried to dump stack of all active cpus (using sysrq) while one of the cpu is in stall state. The stack dump showed both the processors were executing same process (with same PID) i.e. modprobe.
Question :
1. is it possible for both the processors to be executing same process at a movement with same PID.
2. Is the execution of both process simultaneously give rise to some race condition which is causing either of CPU to go in stall state.
LOGS===============================================
SysRq : Show backtrace of all active CPUs
CPU0:
NIP: 701786c4 LR: 701752f0 CTR: 00000004
REGS: 9fb4fdc0 TRAP: 0501 Not tainted (2.6.39.4)
MSR: 00029000 <EE,ME,CE> CR: 44002048 XER: 00000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
GPR00: 08101820 9fb4fe70 8f868ae0 22222222 0002374d 00000002 00849ffc 00000000
GPR08: a2bb3a8c 00000810 a2c2aac4 00000148 44002088
NIP [701786c4] __sw_hweight32+0x50/0x58
LR [701752f0] __bitmap_weight+0x54/0xc0
Call Trace:
[9fb4fe70] [20000000] 0x20000000 (unreliable)
[9fb4fe90] [7006b1cc] sys_init_module+0x11a8/0x1ca4
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x10050e38
LR = 0x100a6708
Instruction dump:
7c634838 7c004838 7c001a14 5409e13e 7c090214 3d200f0f 61290f0f 7c004838
5409c23e 7c090214 5409843e 7c090214 <5403063e> 4e800020 5460f87e 70005555
CPU1:
NIP: 700a9678 LR: 700a964c CTR: 7012b0f8
REGS: 9fb4fe30 TRAP: 0501 Not tainted (2.6.39.4)
MSR: 00029000 <EE,ME,CE> CR: 80008022 XER: 20000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
GPR00: 00000000 9fb4fee0 8f868ae0 9efb9f20 9efb9f20 9fb4fee8 1004d6d0 0002d000
GPR08: 9efb9d68 9f8eee00 9efb9f20 00000000 20002022
NIP [700a9678] do_munmap+0x114/0x314
LR [700a964c] do_munmap+0xe8/0x314
Call Trace:
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
LR = 0x10022088
Instruction dump:
4bffe461 7c641b79 41820010 80040004 7f9d0040 419d01d4 83210008 2e190000
41920200 83d9000c 801f0074 2f800000 <419e00f8> 2f9e0000 419e00f0 801e0004
Call Trace:
[9ffaff00] [70008654] show_stack+0x6c/0x1a4 (unreliable)
[9ffaff40] [70192d30] showacpu+0x84/0xcc
[9ffaff60] [700679bc] generic_smp_call_function_single_interrupt+0x100/0x18c
[9ffaff90] [7000ff6c] call_function_single_action+0x10/0x24
[9ffaffa0] [7006f7f4] handle_irq_event_percpu+0xa0/0x21c
[9ffaffe0] [700728d8] handle_percpu_irq+0x88/0xb8
[9ffafff0] [7000e038] call_handle_irq+0x18/0x28
[9fb4fdf0] [700044b0] do_IRQ+0xe8/0x1a0
[9fb4fe20] [7000f81c] ret_from_except+0x0/0x18
--- Exception: 501 at do_munmap+0x114/0x314
LR = do_munmap+0xe8/0x314
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
=============================================================

Both questions of yours are False. There is no possibility that two cores run the same task at the same moment.
From the OOPS/panic trace, we can know that kernel panic on CPU 0 first, then it triggered the SysRq to "Show backtrace of all active CPUs". That is to say, CPU 1 works good at the time CPU 0 panic, but CPU 0 exception triggered to dump the backtrace of CPU 1.
Now let us analysis why CPU 0 panic: from the backtrace, it happened during "modprobe", so please analysis the kernel modules on your system to locate the one that trigger the panic.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

SMP threads not showing in GDB - linux

Related

DPDK with AF_XDP: Failed to create xsk socket

RegexpParser crashes Python 3.8.6 kernel in JupyterLab

How does Mesa recycle graphic resources?

nodejs + aerospike crashes

CPU stalling in SMP

Categories

Resources