Nodejs crash: GC segfault - node.js

Node version: 4.8.0
Platform: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux
Node crashed during the Garbace collection but without any other high level pattern (maybe related to https://github.com/nodejs/node/issues/3715).
Unfurtunately I don't have any code to reproduce as I was not able to isolate the problem.
This is the crash stack trace captured with segfault-handler module:
PID 24495 received SIGSEGV for address: 0x3809f3d021f8
<path_node_modules>/segfault-handler/build/Release/segfault-handler.node(+0x1a5b)[0x7f7dd565ca5b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f7dd9c20890]
/usr/bin/nodejs(_ZN2v88internal20MarkCompactCollector22ProcessWeakCollectionsEv+0xfd)[0xaec4dd]
/usr/bin/nodejs(_ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv+0x214)[0xaf3a14]
/usr/bin/nodejs(_ZN2v88internal20MarkCompactCollector14CollectGarbageEv+0x11)[0xaf47e1]
/usr/bin/nodejs(_ZN2v88internal4Heap11MarkCompactEv+0x60)[0xaaafe0]
/usr/bin/nodejs(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x4c0)[0xac2be0]
/usr/bin/nodejs(_ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorEPKcS4_NS_15GCCallbackFlagsE+0x238)[0xac30f8]
/usr/bin/nodejs(_ZN2v88internal4Heap15HandleGCRequestEv+0x8f)[0xac3aef]
/usr/bin/nodejs(_ZN2v88internal10StackGuard16HandleInterruptsEv+0x31c)[0xa6041c]
/usr/bin/nodejs(_ZN2v88internal18Runtime_StackGuardEiPPNS0_6ObjectEPNS0_7IsolateE+0x2b)[0xca51ab]
[0x2f2137d0963b]
And also this other stack sometimes:
PID 7545 received SIGSEGV for address: 0x68233500009
/home/documentapp/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x1a5b)[0x7f89249bfa5b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f8928f83890]
/usr/bin/nodejs(_ZN2v88internal32IncrementalMarkingMarkingVisitor26VisitFixedArrayIncrementalEPNS0_3MapEPNS0_10HeapObjectE+0x3fe)[0xad51ee]
/usr/bin/nodejs(_ZN2v88internal18IncrementalMarking4StepElNS1_16CompletionActionENS1_18ForceMarkingActionENS1_21ForceCompletionActionE+0x30c)[0xad2a7c]
/usr/bin/nodejs(_ZN2v88internal8NewSpace15SlowAllocateRawEiNS0_19AllocationAlignmentE+0x78)[0xb00f18]
/usr/bin/nodejs(_ZN2v88internal4Heap11AllocateRawEiNS0_15AllocationSpaceES2_NS0_19AllocationAlignmentE+0x109)[0xa64719]
/usr/bin/nodejs(_ZN2v88internal4Heap20AllocateFillerObjectEibNS0_15AllocationSpaceE+0x19)[0xaabd19]
/usr/bin/nodejs(_ZN2v88internal7Factory15NewFillerObjectEibNS0_15AllocationSpaceE+0x2d)[0xa64c5d]
/usr/bin/nodejs(_ZN2v88internal29Runtime_AllocateInTargetSpaceEiPPNS0_6ObjectEPNS0_7IsolateE+0x5e)[0xca52ee]
[0x1e31ede06355]
Can someone give me some hint on how I can find the prblem? Thanks
If you prefer you can also answer on the node issues that I have created:
https://github.com/nodejs/node/issues/11606
Additional information:
Node framework: express, Sails.js
My native modules founded with find node_modules -name '*.node' are:
node_modules/bcrypt/build/Release/bcrypt_lib.node
node_modules/bcrypt/build/Release/obj.target/bcrypt_lib.node
node_modules/segfault-handler/build/Release/segfault-handler.node
node_modules/segfault-handler/build/Release/obj.target/segfault-handler.node

The problems seems to be cause by mongodb logs that fill up the disk space a some point. Was actually hard to see because we clean this periodically so was not critical at the moment I've checked.

Related

Tensorflow error when used as Docker baseimage

Hi i am using below as my Docker image for fastapi application
FROM tensorflow/tensorflow:latest
when i run docker its running but i am getting this error
2021-06-23 23:31:50.516749: F tensorflow/core/lib/monitoring/sampler.cc:42] Check failed: bucket_limits_[i] > bucket_limits_[i - 1] (0 vs. 10)
qemu: uncaught target signal 6 (Aborted) - core dumped
[2021-06-23 23:31:50 +0530] [1] [WARNING] Worker with pid 2697 was terminated due to signal 6
and when i call api, i am not getting response, does it take time for api call or can you please tell me where it is wrong
I am guessing you are using a Mac with a M1 chip as This is a qemu bug, which is the upstream component we use for running Intel containers on M1 chips, this issue hasn't been solved yet. I suggest you can try and build TensorFlow for aarch64 Linux from source.

"Microsoft Windows 10 Enterprise 2016 LTSB 10.0.14393 Version 1607" fails on boot in qemu/kvm (proxmox)

I have played with different versions of windows 10 inside qemu/kvm (proxmox) and all of them works fine except: "Microsoft Windows 10 Enterprise 2016 LTSB 10.0.14393 Version 1607".
I don't think that the problem is connected with proxmox itself. As I know proxmox is stable and reliable system that use qemu/kvm under the hood. So lets think more about qemu/kvm. However my proxmox versions below.
root#home:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
pve-kernel-4.15: 5.3-1
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-44
libpve-guest-common-perl: 2.0-19
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-36
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-33
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-45
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
In any case I have not found any similar thread on proxmox forum. That's why I post my problem here.
This is clean original msdn iso from microsoft with confirmed hash sums(installed more then 10 times).
Steps to reproduce:
Create VM with next configuration
root#home:~# cat /etc/pve/qemu-server/102.conf
bios: ovmf
boot: dcn
bootdisk: scsi0
cores: 8
cpu: host
efidisk0: local-lvm:vm-102-disk-0,size=4M
ide2: iso-backs:iso/MS DaRT 10 Eng x86 x64.iso,media=cdrom,size=600320K
machine: q35
memory: 8192
name: win10-test
net0: virtio=C2:25:D9:DD:F2:4F,bridge=vmbr0
numa: 1
ostype: win10
scsi0: local-lvm:vm-102-disk-1,size=100G
scsi1: external:vm-102-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=9d455cbf-1fa2-495f-928d-3935ec39c245
sockets: 1
usb0: host=1c4f:0002
usb1: host=09da:9090
vmgenid: 40cd47b6-35c4-47ab-8f9e-ed2acb618fcc
Install latest virtio drivers (scsi, netkvm, baloon, qemu-fwcfg)
Accept disk auto-partitioning (4 partitions will be created for this iso)
Wait installations end and reboot the system
Boot will stuck at proxmox logo
However, I can always boot from live cd (MS DaRT), to do that I need to manually choose harddisk from "Use a device" menu.
Once it load successfully there is a chance it will boot again an indefinite number of times. I can't figure out the reason of such behavior.
I have tried to avoid this problem by installing grub. But nothing has changed - I am still able to load system through live cd and always have a random chance to stuck at default load process.
Event Viewer errors(repeatable):
Distributed COM Event_id: 10016
Eventlog Event_id: 1101
Kernel-Power Event_id: 41
Eventlog Event_id: 6008
Kernel-Power Event_id: 13

crashkernel not starting after crash

I am trying to start a crash kernel using Linux's Kernel Crash Dump. Both of the host and crash kernel are compiled linux-4.13.16 kernel. Unfortunately, the crash kernel fails to start after the crash occurs.
iomem reports reserved space for crash kernel and kdump reports its ready for kdump:
28000000-37ffffff : Crash kernel
$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x28000000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.13.16ksa
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.13.16ksa
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.13.16ksa root=UUID=3254c608-d885-4dfc-b20b-fa4e69564dca ro quiet splash vt.handoff=7 irqpoll noirqdistrib nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.13.16ksa root=UUID=3254c608-d885-4dfc-b20b-fa4e69564dca ro quiet splash crashkernel=384M-2G:128M,2G-:256M vt.handoff=7
After triggering crash using sysrq-trigger, it doesn't load crash kernel.
I have tested with a generic kernel, linux-4.8.0-36-generic, which successfully works.
Syslog file are here.
Both linux-4.8.0-36-generic and linux-4.13.16ksa has same .config file. Only difference I can see is that during boot, for linux-4.13.0-38, it loads the efi.signed vmlinuz(vmlinuz-4.13.0-38-generic.efi.signed) where as the compiled linux-4.13.16ksa is not efi signed.
Can it be an issue? How can I resolve this?

How can I debug this User Space application crash?

I'm running a Qt5.4.0 application on my embedded Linux system (TI AM335x based) and it's stopping to run and I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview) but other QtWebKit examples are preforming the same for me so it's something WebKit based on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next though was to debug this with GDB, however GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a break point at main first, just as soon as it starts running it will get stuck and run out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in here:
#include <QGuiApplication>
#include <QQuickView>
int main(int argc, char* argv[])
{
QGuiApplication app(argc,argv);
QQuickView view;
view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
view.setResizeMode(QQuickView::SizeRootObjectToView);
view.show();
return app.exec();
return 0;
}
Running gdb on the device, especially with a huge library such as WebKit, is bound to get you out of memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode, if you are going to debug into WebKit you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!

Not running RabbitMQ on Linux, can not find the file xmerl.app

I use OpenSUSE 12.3
I installed erlang and erlang-otp (R14B04)
I started RabbitMQ ./rabbitmq-server
is an error:
$:/opt/rabbitmq/rabbitmq_server-3.1.3/sbin # ./rabbitmq-server
BOOT FAILED
===========
Error description:
{error,{"no such file or directory","xmerl.app"}}
Log files (may contain more information):
./../var/log/rabbitmq/rabbit#testTFOMS.log
./../var/log/rabbitmq/rabbit#testTFOMS-sasl.log
Stack trace:
[{app_utils,load_applications,2},
{app_utils,load_applications,1},
{rabbit,'-boot/0-fun-1-',0},
{rabbit,start_it,1},
{init,start_it,1},
{init,start_em,1}]
{"init terminating in do_boot",{rabbit,failure_during_boot,{error,{"no such file or directory","xmerl.app"}}}}
Crash dump was written to: erl_crash.dump
init terminating in do_boot ()
I can find the file
$:find / -name 'xmerl.app'
/usr/lib/erlang/lib/xmerl-1.2.10/ebin/xmerl.app
where you need to specify it to start the program?
Can you start xmerl without rabbitmq? Just:
application:start(xmerl).
Try:
code:add_path("/usr/lib/erlang/lib/xmerl-1.2.10/ebin/").
And than:
./rabbitmq-server
I downloaded the rpm package for CentOS and set it using zypper, then start up rabbitMQ

Resources