I'm using Valgrind to track down a segmentation fault in my code, but when execution reaches the point where the segfault happens, my process is killed.
Searching the /var/log/syslog file, I can see that the process memcheck-amd64- (Valgrind?) has been killed:
Sep 7 12:48:34 fabiano-HP kernel: [10154.654505] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-c2.scope,task=memcheck-amd64-,pid=3688,uid=1000
Sep 7 12:48:34 fabiano-HP kernel: [10154.654560] Out of memory: Killed process 3688 (memcheck-amd64-) total-vm:11539708kB, anon-rss:6503332kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:12952kB oom_score_adj:0
Sep 7 12:46:26 fabiano-HP org.freedesktop.thumbnails.Cache1[3661]: message repeated 3 times: [ libpng error: Read Error]
Sep 7 12:48:34 fabiano-HP systemd[1]: session-c2.scope: A process of this unit has been killed by the OOM killer.
The problem is that Valgrind doesn't write its output file, so I can't tell what's going on. What is happening, and how can I avoid it?
EDIT:
I'm running Valgrind with this command:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --log-file=valgrind-out.txt ./mainCPU -ellpack
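The "Killed process" line in the syslog excerpt carries the numbers that matter: the PID, the process name and its memory footprint. A minimal C sketch for pulling those fields out of such a line, assuming the "Killed process PID (name) total-vm:NkB, anon-rss:NkB" layout shown above; struct oom_kill and parse_oom_kill are illustrative names, not part of any existing tool:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "Out of memory: Killed process ..." line. */
struct oom_kill {
    int pid;
    char comm[32];             /* process name as printed in parentheses */
    unsigned long total_vm_kb; /* virtual size in kB */
    unsigned long anon_rss_kb; /* resident anonymous memory in kB */
};

/* Returns 0 on success, -1 if the line does not contain an OOM-kill record. */
int parse_oom_kill(const char *line, struct oom_kill *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "Killed process ");
    if (p == NULL)
        return -1;
    /* %31[^)] reads the name up to the closing parenthesis without overflowing comm */
    if (sscanf(p, "Killed process %d (%31[^)]) total-vm:%lukB, anon-rss:%lukB",
               &out->pid, out->comm, &out->total_vm_kb, &out->anon_rss_kb) != 4)
        return -1;
    return 0;
}
```

Applied to the line above, it yields PID 3688, name memcheck-amd64-, total-vm 11539708 kB (about 11 GiB) and anon-rss 6503332 kB (about 6.2 GiB). The name is cut off because the kernel truncates task names to 15 characters; the full binary is Valgrind's memcheck-amd64-linux, which is the process that hosts your program while it runs under Valgrind.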
My runtime environment is CentOS 7.9 (kernel 5.16.11) in a VirtualBox virtual machine with 1 GB of memory and 8 CPU cores.
[root@dev236 ~]# uname -a
Linux dev236 5.16.11-1.el7.elrepo.x86_64 #1 SMP PREEMPT Tue Feb 22 10:22:37 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
I ran a computation-intensive program that uses 8 threads to keep the CPUs continuously busy.
After some time, the kernel reports a bug like this:
[root@dev236 src]# ./server --p:command-threads-count=8
[31274.179023] rcu: INFO: rcu_preempt self-detected stall on CPU
[31274.179039] watchdog: BUG: soft lockup - CPU#3 stuck for 210S! [server:1356]
[31274.179042] watchdog: BUG: soft lockup - CPU#1 stuck for 210S! [server:1350]
[31274.179070] watchdog: BUG: soft lockup - CPU#7 stuck for 210S! [server:1355]
[31274.179214] rcu: 0-...!: (1 GPs behind) idle=52f/1/0x4000000000000000 softirq=10073864/10073865
fqs=0
Message from syslogd@dev236 at Jan 25 18:59:49 ...
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 210S! [server:1356]
Message from syslogd@dev236 at Jan 25 18:59:49 ...
kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 210S! [server:1350]
Message from syslogd@dev236 at Jan 25 18:59:49 ...
kernel:watchdog: BUG: soft lockup - CPU#7 stuck for 210S! [server:1355]
^C
[root@dev236 src]#
I then looked at the program's log: the file was still being appended to, which indicates that my test program was still running.
Can I safely ignore this warning?
Or do I have to do something about it?
For example:
Reduce the computational intensity of the program?
Give the CPU a break once in a while?
Reduce the number of threads started in the program?
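If you want to experiment with the "give the CPU a break" and "fewer threads" options, here is a minimal pthreads sketch. The loop body is a placeholder for the real computation, and YIELD_EVERY, worker and run_workers are hypothetical names; nothing in the logs above shows that yielding will make the watchdog messages go away. Note that sched_yield() only lets other runnable tasks on the same CPU run; it does not idle the CPU. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical tuning knob: yield once every YIELD_EVERY loop iterations. */
#define YIELD_EVERY 1024UL
#define MAX_WORKERS 64

struct worker_arg {
    unsigned long iterations; /* how many chunks of work this thread runs */
    unsigned long result;     /* stand-in for the real computation's output */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    unsigned long acc = 0;
    for (unsigned long i = 0; i < a->iterations; i++) {
        acc += i & 1UL;                      /* placeholder for one chunk of real work */
        if (i % YIELD_EVERY == YIELD_EVERY - 1)
            sched_yield();                   /* let other runnable tasks on this CPU run */
    }
    a->result = acc;
    return NULL;
}

/* Starts nthreads workers (a smaller nthreads is the "fewer threads" option),
   waits for all of them, and returns the sum of their results. */
unsigned long run_workers(int nthreads, unsigned long iterations)
{
    pthread_t tid[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    int created = 0;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    for (; created < nthreads; created++) {
        args[created].iterations = iterations;
        args[created].result = 0;
        if (pthread_create(&tid[created], NULL, worker, &args[created]) != 0)
            break;                           /* continue with the threads that did start */
    }

    unsigned long total = 0;
    for (int t = 0; t < created; t++) {
        pthread_join(tid[t], NULL);
        total += args[t].result;
    }
    return total;
}
```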
Thank you all
Linux:
[root@localhost bin]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@localhost bin]# cat /proc/version
Linux version 4.14.0-115.5.1.el7a.06.aarch64 (mockbuild@arm-buildhost1) (gcc version 4.8.5 20150623 (NeoKylin 4.8.5-36) (GCC)) #1 SMP Tue Jun 18 10:34:55 CST 2019
[root@localhost bin]# file /bin/bash
/bin/bash: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 3.7.0, BuildID[sha1]=8a346ec01d611062313a5a4ed2b0201ecc9d9fa1, stripped
JxBrowser 7.7:
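I used this demo; line 55 of HelloWorld.java is: Browser browser = engine.newBrowser();
import static com.teamdev.jxbrowser.engine.RenderingMode.OFF_SCREEN;
import com.teamdev.jxbrowser.browser.Browser;
import com.teamdev.jxbrowser.engine.Engine;
import com.teamdev.jxbrowser.engine.EngineOptions;

public static void main(String[] args) {
    Engine engine = Engine.newInstance(
            EngineOptions.newBuilder(OFF_SCREEN).build());
    // HelloWorld.java:55 in the stack trace below; times out after 45 seconds
    Browser browser = engine.newBrowser();
}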
[root@localhost bin]# java -jar test.jar
Exception in thread "main" com.teamdev.jxbrowser.navigation.TimeoutException: Failed to execute task withing 45 seconds.
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadAndWait(NavigationImpl.java:248)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:105)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:82)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:74)
at com.teamdev.jxbrowser.engine.internal.EngineImpl.newBrowser(EngineImpl.java:458)
at com.pinnet.HelloWorld.main(HelloWorld.java:55)
Linux logs from /var/log/messages:
May 22 09:48:53 localhost dbus[8661]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service'
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90562 (chromium) of user 0 killed by SIGABRT - dumping core
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90566 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90561 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90593 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:55 localhost abrt-hook-ccpp: Process 90624 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:55 localhost abrt-hook-ccpp: Process 90623 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:56 localhost abrt-server: Duplicate: core backtrace
May 22 09:48:56 localhost abrt-server: DUP_OF_DIR: /var/spool/abrt/ccpp-2020-05-21-16:55:06-33694
May 22 09:48:56 localhost abrt-server: Deleting problem directory ccpp-2020-05-22-09:48:54-90562 (dup of ccpp-2020-05-21-16:55:06-33694)
May 22 09:48:56 localhost abrt-server: /bin/sh: reporter-mailx: command not found
May 22 09:49:18 localhost dbus[8661]: [system] Failed to activate service 'org.bluez': timed out
I have two identical Raspberry Pi 3 B+ devices, one running raspberrypi-kernel_1.20180313-1 and the other running raspberrypi-kernel_1.20180417-1. I'm watching Bluetooth events with "hcidump -R" on both devices. One device shows Bluetooth events; the other does not. I've swapped the SD cards between the devices to rule out hardware: regardless of which device the SD card is in, the one running 20180313 shows the Bluetooth events and the one running 20180417 does not.
To debug this, I've been adding printk statements at various points in the Raspbian kernel source pulled from git:
https://github.com/raspberrypi/linux
The most relevant place to start seemed to be the Bluetooth RX code, e.g. printing something for each Bluetooth message sent to the line discipline. Specifically, in drivers/bluetooth/hci_ldisc.c, I added two lines to the beginning of the hci_uart_tty_receive function:
printk(KERN_ERR "AARON: In hci_uart_tty_receive with tty %p\n", tty);
dump_stack();
After rebuilding the kernels and booting the devices, I saw the log messages and stack traces on the Pi running 20180313 and nothing on the other Pi, indicating that the Bluetooth RX code isn't being reached there. To debug further, I looked at the stack trace, which was:
Jul 9 21:03:18 tiltpi kernel: [ 9.391137] Workqueue: events_unbound flush_to_ldisc
Jul 9 21:03:18 tiltpi kernel: [ 9.391166] [<8010f664>] (unwind_backtrace) from [<8010bd1c>] (show_stack+0x20/0x24)
Jul 9 21:03:18 tiltpi kernel: [ 9.391183] [<8010bd1c>] (show_stack) from [<80449c20>] (dump_stack+0xc8/0x114)
Jul 9 21:03:18 tiltpi kernel: [ 9.391221] [<80449c20>] (dump_stack) from [<7f4400cc>] (hci_uart_tty_receive+0x5c/0xac [hci_uart])
Jul 9 21:03:18 tiltpi kernel: [ 9.391254] [<7f4400cc>] (hci_uart_tty_receive [hci_uart]) from [<804b9bdc>] (tty_ldisc_receive_buf+0x64/0x6c)
Jul 9 21:03:18 tiltpi kernel: [ 9.391273] [<804b9bdc>] (tty_ldisc_receive_buf) from [<804ba15c>] (flush_to_ldisc+0xcc/0xe4)
Jul 9 21:03:18 tiltpi kernel: [ 9.391293] [<804ba15c>] (flush_to_ldisc) from [<80135934>] (process_one_work+0x144/0x438)
Jul 9 21:03:18 tiltpi kernel: [ 9.391311] [<80135934>] (process_one_work) from [<80135c68>] (worker_thread+0x40/0x574)
Jul 9 21:03:18 tiltpi kernel: [ 9.391328] [<80135c68>] (worker_thread) from [<8013b930>] (kthread+0x108/0x124)
Jul 9 21:03:18 tiltpi kernel: [ 9.391352] [<8013b930>] (kthread) from [<80107ed4>] (ret_from_fork+0x14/0x20)
I then added printk statements to flush_to_ldisc and tty_ldisc_receive_buf, recompiled, and retested. However, while I continued to see the message I added in hci_uart_tty_receive, I did not see the messages I added to flush_to_ldisc or tty_ldisc_receive_buf.
On further inspection of the kernel source, I found that the stack trace didn't even make sense, because the functions listed do not call each other directly. In tty_buffer.c, flush_to_ldisc (toward the bottom of the stack) calls receive_buf, which calls tty_ldisc_receive_buf, which in turn calls hci_uart_tty_receive in hci_ldisc.c. The stack trace has no entry for receive_buf and shows flush_to_ldisc calling tty_ldisc_receive_buf directly.
So I'm quite confused. I've searched the kernel source and found no other declarations of flush_to_ldisc or tty_ldisc_receive_buf.
Why and how can dump_stack() be missing a stack entry? And why don't the printk statements I placed in the functions toward the bottom of that stack show up, while the one toward the top does?
EDIT:
A bit more searching shows that the Linux kernel relies on gcc optimizations, including automatic inlining of some functions, and that is likely what is happening to my stack trace. That would explain why those functions aren't listed explicitly in the stack, but it doesn't explain why their printk output doesn't show up. Does anyone have thoughts on why printk statements would show up from functions at the top of the stack but not at the bottom? rsyslog.conf is set up with:
*.err -/var/log/errors.log
All the printk statements I added have the form printk(KERN_ERR "string\n");
EDIT2: Updated question title to reflect that it is not just about absent printk output.
EDIT3: I deleted my local copy of the kernel source, pulled it again, re-added my printk statements, and recompiled from scratch, and now the printk statements show up. It seems the code I added wasn't being recompiled or linked into the kernel build. I ran "make clean" before building the kernel, but something still wasn't compiled or linked properly; starting from a clean tree resolved the problem.
Summary: the Linux kernel relies on gcc optimizations that can compile functions inline even when the source does not mark them inline. And when you're "sure" you've recompiled the kernel with your changes, start over with a clean source/build directory and try a second time before taking your issue to Stack Overflow.
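The inlining effect can be reproduced in user space. A minimal sketch, assuming GCC and glibc: hypothetical demo_* stand-ins mirror the flush_to_ldisc, receive_buf, tty_ldisc_receive_buf, hci_uart_tty_receive chain, __attribute__((noinline)) keeps each function out of its caller, and a write to a volatile after each call stops GCC from turning it into a tail call (which would also drop a frame). glibc's backtrace() and backtrace_symbols_fd() play the role of dump_stack(). Inside the kernel the corresponding annotation is the noinline macro.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical user-space stand-ins for the tty -> hci_uart call chain. */
static volatile int hops; /* written after each call so no call is in tail position */

__attribute__((noinline)) int demo_hci_uart_tty_receive(void)
{
    void *frames[32];
    int n = backtrace(frames, 32);                  /* capture the current call stack */
    backtrace_symbols_fd(frames, n, STDERR_FILENO); /* print it, roughly like dump_stack() */
    return n;
}

__attribute__((noinline)) int demo_tty_ldisc_receive_buf(void)
{
    int n = demo_hci_uart_tty_receive();
    hops++;
    return n;
}

__attribute__((noinline)) int demo_receive_buf(void)
{
    int n = demo_tty_ldisc_receive_buf();
    hops++;
    return n;
}

__attribute__((noinline)) int demo_flush_to_ldisc(void)
{
    int n = demo_receive_buf();
    hops++;
    return n;
}
```

Building with gcc -O2 -rdynamic prints one named frame per demo function. After removing the noinline attributes, GCC will typically fold the small callees into their callers and the printed trace gets shorter, much like the missing receive_buf entry.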
I'm using Docker, with more than ten containers running on my machine. I found some OOM entries in /var/log/messages, but I can't figure out which container the killed processes belong to.
The /var/log/messages entry looks like this:
kernel: Out of memory: Kill process 165480 (java) score 987 or sacrifice child
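On newer kernels the OOM report also includes an oom-kill: line with a task_memcg= field (the Valgrind log in the first question has one), and for a process inside a container that cgroup path normally contains the container ID. The entry quoted here is the older format without that field, so this only helps if your kernel logs the newer line. A minimal C sketch, assuming Docker's usual cgroup layouts /docker/<id> (cgroupfs driver) or docker-<id>.scope (systemd driver); docker_id_from_oom_line is a hypothetical helper:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DOCKER_ID_LEN 64

/* Copies the 64-hex-digit container ID found in the task_memcg= field of an
   oom-kill line into out (DOCKER_ID_LEN + 1 bytes). Returns 0 on success, -1 otherwise. */
int docker_id_from_oom_line(const char *line, char out[DOCKER_ID_LEN + 1])
{
    assert(line != NULL && out != NULL);
    const char *memcg = strstr(line, "task_memcg=");
    if (memcg == NULL)
        return -1;
    memcg += strlen("task_memcg=");
    size_t field_len = strcspn(memcg, ","); /* the field ends at the next comma */

    /* cgroupfs driver: /docker/<id>; systemd driver: /system.slice/docker-<id>.scope */
    const char *keys[] = { "docker/", "docker-" };
    for (size_t k = 0; k < sizeof keys / sizeof keys[0]; k++) {
        const char *hit = strstr(memcg, keys[k]);
        if (hit == NULL || (size_t)(hit - memcg) >= field_len)
            continue;                       /* absent, or found outside task_memcg= */
        const char *id = hit + strlen(keys[k]);
        size_t i = 0;
        while (i < DOCKER_ID_LEN && isxdigit((unsigned char)id[i]))
            i++;
        if (i == DOCKER_ID_LEN) {
            memcpy(out, id, DOCKER_ID_LEN);
            out[DOCKER_ID_LEN] = '\0';
            return 0;
        }
    }
    return -1;
}
```

The extracted ID can then be matched against the output of docker ps --no-trunc.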
I just updated my Arch Linux system to GNOME 3.22, and it broke many things.
The most annoying is that gnome-terminal won't start.
I checked journalctl and found this:
oct. 18 15:11:05 jarvis dbus-daemon[727]: Successfully activated service 'org.gnome.Terminal'
oct. 18 15:11:05 jarvis systemd[711]: Started GNOME Terminal Server.
oct. 18 15:11:06 jarvis org.gnome.Shell.desktop[745]: Window manager warning: Could not import pending buffer, ignoring commit: Failed to create texture 2d due to size/format constraints
oct. 18 15:11:06 jarvis gnome-terminal-[1569]: Error flushing display: Broken pipe
oct. 18 15:11:06 jarvis kernel: traps: gnome-terminal-[1569] trap int3 ip:7f21f3389ff1 sp:7ffd9c3e2bb0 error:0
oct. 18 15:11:06 jarvis systemd[1]: Started Process Core Dump (PID 1592/UID 0).
oct. 18 15:11:06 jarvis systemd[711]: gnome-terminal-server.service: Main process exited, code=dumped, status=5/TRAP
oct. 18 15:11:06 jarvis systemd[711]: gnome-terminal-server.service: Unit entered failed state.
oct. 18 15:11:06 jarvis systemd[711]: gnome-terminal-server.service: Failed with result 'core-dump'.
oct. 18 15:11:06 jarvis systemd-coredump[1593]: Process 1569 (gnome-terminal-) of user 1000 dumped core.
Stack trace of thread 1569:
#0 0x00007f21f3389ff1 n/a (libglib-2.0.so.0)
#1 0x00007f21f338c731 g_log_writer_default (libglib-2.0.so.0)
#2 0x00007f21f338ab8c g_log_structured_array (libglib-2.0.so.0)
...
The trace goes on and on through different libc.so.6 calls.
I can't figure out how to fix this.
Does anyone have an idea of what happened and what I should do?
Try switching the login screen to Xorg:
edit /etc/gdm/custom.conf and uncomment WaylandEnable=false:
[daemon]
# Uncomment the line below to force the login screen to use Xorg
# WaylandEnable=false