Why process terminates abnormally without coredump? - linux

I have stuck in a such problem that my c++ server program can't coredump when terminate abnormally. The program running in daemon mode with chdir to '/'.
I had done the following things:
ulimit -c unlimited, so coredump enabled.
echo "/tmp/coredump/core.%e.%p.%t" > /proc/sys/kernel/core_pattern, and chmod a+w coredump, so it has the permission to write coredump file.
and I had try such things:
send a SIGABRT via kill -6, it can coredump.
in dmesg, I can't find any info about the abnormally terminates process.
running the program not in daemon mode.
My OS version: CentOS release 6.4 (Final), x86_64
ps. the server program installed a signal handler (sigaction() with flag SA_RESETHAND) to catch such signals {SIGHUP, SIGINT, SIGQUIT, SIGTERM} for normally terminates(free resources). so it can exclude the signal shielding.

Related

mpirun launching a runtime script for thread binding

Running a program in a MPI process, which exec file is myapp_mpi. The command line below launches the app correctly
mpirun --bind-to none -np 128 myapp_mpi apprun -resethway -noconfout -nsteps 8000 -s benchData.tpr -cpo state.cpt -e ener.edr -dlb no -pin off -v
I now wish to constrain the thread binding with a script pin.sh. The command line below would then produce an error.
mpirun --bind-to none -np 128 pin.sh myapp_mpi apprun -resethway -noconfout -nsteps 8000 -s benchData.tpr -cpo state.cpt -e ener.edr -dlb no -pin off -v
I get the following error
--------------------------------------------------------------------------
Open MPI tried to fork a new process via the "execve" system call but
failed. Open MPI checks many things before attempting to launch a
child process, but nothing is perfect. This error may be indicative
of another problem on the target host, or even something as silly as
having specified a directory for your application. Your job will now
abort.
Local host: machine001
Working dir: /home/user/myapp/bin
Application name: /home/user/myapp/bin/pin.sh
Error: Exec format error
--------------------------------------------------------------------------
mpirun: Forwarding signal 18 to job
mpirun: Forwarding signal 18 to job
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an
error:
Error code: 1
Error name: (null)
Node: machine001
when attempting to start process rank 0.
--------------------------------------------------------------------------
125 total processes failed to start
File locations are good, a priori. Any clue ?

Software watchdog causing system reboot if started at bootup

In my device I enabled software watchdog to monitor a file which is updated every 5 second by a application. I have configured software watchdog as below
file = /data/file_name_to_watch
change = 10
Watchdog is getting started at bootup using below command during bootup:
/usr/sbin/watchdog.sh -f -v -c watchdog.conf
Application which is responsible to update the file(file_name_to_watch) is started after watchdog deamon during bootup. File being monitored by watchdog is updated every 5 seconds by the application.
Problem is that watchdog is rebooting the system if it is started at bootup and this same problem doesn't exist when watchdog is not started at bootup but started manually after application is launched.
dmesg shows "Watchdog did not stop"
Also, changing the watchdog configuration file to below didn't help.
file = /data/file_name_to_watch
change = 20
I have checked that the file is getting updated before 10 seconds elapsed after watchdog is launched during bootup.
Any pointers to debug this problem will be appreciated.
Code which I am using for watchdog: https://layers.openembedded.org/layerindex/recipe/122/
Debugged and found the problem to be time(NULL) returning a huge number in src/file_stat.c
This was happening due to date being not set very early during bootup.

What causes BitBake worker processes to exit unexpectedly?

I have a BitBake build process that runs on a Docker container (CentOS 7). The BitBake fails during recipe gcc-cross-i586-5.2.0-r0: task do_compile on each run that I try it in.
An example of bitbake's output:
NOTE: recipe gcc-cross-i586-5.2.0-r0: task do_compile: Started
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
ERROR: Worker process (367) exited unexpectedly (-9), shutting down...
NOTE: Tasks Summary: Attempted 1538 tasks of which 17 didn't need to be rerun and all succeeded.
Is this a problem with recipe gcc-cross-i586-5.2.0-r0: task do_compile? Perhaps an out-of-memory error? I don't know what the -9 refers to or how to find out more information about it.
Try:
$ bitbake -c cleansstate gcc-cross ; bitbake -k gcc-cross
How much you have memory of ram?
Report log error here.
This worked for me,
Edit conf/local.conf and decrease the number of working threads by adding the following to you conf/local.conf file (under the build directory):
BB_NUMBER_THREADS = "6"
Just a long shot, -9 in kernel land means EBADF (bad file number.) Is it possible you have done some operations as root and some files are not accessible during the build? Is the issue reproducible? ie. can you rm -rf tmp and does it happen again? Make sure you don't have any permissions issues in your project directory and associated file system(s).

PHP exec(myexe) fails in PHP App, but not CLI. Fails Running Under User "apache"

I have a custom program (e.g. myexe) being executed by a web app using PHP's exec() function. It does not fail when run using the PHP CLI nor does myexe fail when run from the command line with me as a user. I have built myexe so that there are no memory issues when profiled using valgrind. myexe is about 26MB in size.
To simplify the situation, I have run myexe on the command line under the user 'apache' and reproduced the failure.
su -s /bin/sh apache -c "/usr/local/bin/myexe parm1 parm2..."
==> Segmentation fault (core dumped)
BUT when I change the user to myself and run the same command above, it works.
su -s /bin/sh mike -c "/usr/local/bin/myexe parm1 parm2..."
==> WORKS
Here's the error from the system log file:
Jul 9 18:26:15 DEVSTN-1 kernel: myexe[27352]: segfault at 7fffa2bf9ff8 ip 0000000000410324 sp 00007fffa2bfa000 error 6 in myexe[400000+5ae000]
Jul 9 18:26:16 DEVSTN-1 abrt[27353]: Saved core dump of pid 27352 (/usr/local/bin/myexe) to /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352 (13631488 bytes)
Jul 9 18:26:16 DEVSTN-1 abrtd: Directory 'ccpp-2015-07-09-18:26:15-27352' creation detected
Jul 9 18:26:17 DEVSTN-1 abrtd: Executable '/usr/local/bin/myexe' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 9 18:26:17 DEVSTN-1 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-07-09-18:26:15-27352' exited with 1
Jul 9 18:26:17 DEVSTN-1 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-07-09-18:26:15-27352'
My configuration:
CentOS6 2.6.32-504.23.4.el6.x86_64
Apache/2.2.15 (CentOS)
PHP Version 5.3.3
Am I correct with assuming that PHP has nothing to do with the error?
What should I do next?
Correct; PHP has nothing to do with the error. This is a segmentation fault caused by invalid memory access (either overflowing a buffer, or accessing already-freed memory) in myexe. It seems to have saved a core dump to /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352, so, try debugging with GDB:
gdb /usr/local/bin/myexe -c /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352
(gdb) bt
And try to see where the executable is failing. To get useful output, it will need to be compiled with debugging symbols. If it doesn't fail running as root or a different user, or running in an interactive terminal, I'd look for bugs that could be triggered by being unable to open a file, unable to read an expected environment variable, etc. to help isolate your problem.
Running the executable under strace might help figure out what's going on as well.
Found the problem by entering a bash shell user user apache and running the program using gdb.
Turns out myexe was trying to create a directory under the user's home dir (/home/apache) which doesn't exist.
What helped me was knowing how to start a shell under a different user and using gdb.
Here's the command to start a shell under another user (apache):
su -s /bin/bash apache

How can I debug this User Space application crash?

I'm running a Qt5.4.0 application on my embedded Linux system (TI AM335x based) and it's stopping to run and I'm having a hard time debugging this. This is a QtWebKit QML example (youtubeview) but other QtWebKit examples are preforming the same for me so it's something WebKit based on my system.
When I run the application, it runs for a second or so, then ends with no messages. There is nothing reported to the syslog or dmesg either. When I kick it off with strace I can see this futex message:
futex(0x2d990, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2d9ac, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
+++ exited with 1 +++
Then it stops. Not very helpful... My next though was to debug this with GDB, however GDB crashes when I try to run this:
-sh-4.2# gdb youtubeview
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
...
(gdb) run
Starting program: /usr/share/qt5/examples/webkitqml/youtubeview/youtubeview
/home/mike/ulf_qt_450/ulf/build-ulf/out/work/armv7ahf-vfp-neon-linux-gnueabihf/gdb/7.5-r0.0/gdb-7.5/gdb/utils.c:1081: internal-error: virtual memory exhausted: can't allocate 64652911 bytes.
A problem internal to GDB has been detected,
This issue occurs even if I set a break point at main first, just as soon as it starts running it will get stuck and run out of memory.
Are there other tools or techniques that can be used here to help isolate the issue?
Perhaps arguments to GDB to limit memory use or give some more information about why this Qt program made it crash?
Perhaps some FDs or system variables I could use to figure out why the FUTEX is being held and failing?
I'm not sure where to take this problem right now.
The Qt code itself is pretty simple, and I don't anticipate any issues in here:
#include <QGuiApplication>
#include <QQuickView>
int main(int argc, char* argv[])
{
QGuiApplication app(argc,argv);
QQuickView view;
view.setSource(QUrl("qrc:///" QWEBKIT_EXAMPLE_NAME ".qml"));
view.setResizeMode(QQuickView::SizeRootObjectToView);
view.show();
return app.exec();
return 0;
}
Running gdb on the device, especially with a huge library such as WebKit, is bound to get you out of memory errors.
Instead, run gdbserver on the device, and connect to it via gdb running on the host machine, using the toolchain's cross-gdb for that. In that scenario, the gdb on the host loads all the debug information, while the gdbserver on the device needs almost no memory.
It is even possible to have the separate debug information available on the host and stripped libraries on the device.
Please note that parts of WebKit are always built in release mode, even if the rest of Qt was built in debug mode, if you are going to debug into WebKit you might want to change that in the build system.
Here is a minimal example:
Device:
# gdbserver 192.168.1.2:12345 myapp
Process myapp created; pid = 989
Listening on port 12345
Host:
# arm-none-linux-gnueabi-gdb myapp
GNU gdb (Sourcery G++ Lite 2009q1-203) 6.8.50.20081022-cvs
(gdb) set solib-absolute-prefix /opt/targetroot
(gdb) target remote 192.168.1.42:12345
Remote debugging using 192.168.1.42:12345
(gdb) start
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) break main
Breakpoint 1 at 0x1ab9c: file myapp/main.cpp, line 12.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=1, argv=0xbecfedb4) at myapp/main.cpp:12
12 QApplication app(argc, argv, QApplication::GuiServer);
And you are right that it looks like a problem in QtWebKit itself, not in your application. Good luck!

Resources