TLDR: Can't find the core dump even after setting ulimit and looking into apport. Sick of working so hard to get a single backtrace. Questions at the bottom.
I'm having a little nightmare here. I'm currently doing some C coding, which in my case always means a metric ton of segfaults. Most of the time I'm able to reproduce the bug with little to no problem, but today I hit a wall.
My code produces segfaults inconsistently, and I need the core dump that the "Segmentation fault (core dumped)" message keeps talking about.
So I'm going on a hunt for a core dump for my precious little a.out. And that is when I start to pull my hair out.
My intuition told me that core dump files should be stored somewhere in the working directory - which obviously isn't the case. After reading this, I happily typed:
ulimit -c 750000
And... nothing. The output of my program told me that it dumped core - but I can't find the file in the cwd. So after reading this I learnt that I should do things to apport and core_pattern.
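For reference, just reading core_pattern is harmless and shows where the kernel is actually sending the dumps; the apport line in the comments below is only an example of what a typical Ubuntu install is said to show, yours may differ:
# read-only check: where does the kernel send core dumps?
cat /proc/sys/kernel/core_pattern
# a stock Ubuntu install typically prints something like:
#   |/usr/share/apport/apport %p %s %c %d %P
# a leading '|' means the dump is piped to that program
# instead of being written as a file in the cwd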
Changing core_pattern seems a bit too much for getting one core dump. I really don't want to mess with it, because I know I will forget about it later, and I tend to mess these things up really badly.
Apport has this magical property of choosing which core dumps are valuable and which are not. Its logs told me...
ERROR: apport (pid 7306) Sun Jan 3 14:42:12 2016: executable does not belong to a package, ignoring
...that my program isn't good enough for it.
Where is this core dump file?
Is there a way to get a core dump a single time, manually, without having to set everything up? I rarely need them as files per se; GDB alone is enough most of the time. Something like let_me_look_at_the_core_dump <program name> would be great.
I'm already balding a little, so any help would be appreciated.
So, today I learnt:
ulimit resets after reopening the shell - it only applies to the current shell session and the processes started from it.
I made a big mistake in my .zshrc - zsh nested and reopened itself after typing some commands.
After fiddling with this for a bit, I also found a solution to the second problem: a small shell script:
ulimit -c 750000   # allow core files up to 750000 blocks, for this shell only
./a.out            # run the crashing program; it dumps core on the segfault
gdb ./a.out ./core # open the executable together with the fresh core file
ulimit -c 0        # turn core dumps back off
echo "profit"
I installed a package on my Linux machine. When I run the installed binary, it hangs:
$ installedBinary --help
is supposed to return a list of command line options. Instead, the program hangs and doesn't respond; it only exits when I press Ctrl+C.
How can I investigate this problem?
Start with strace -ffo traces ./installedBinary --help, and then inspect the traces.* log files - in particular their last lines, which may show what the program is blocked on. See strace(1).
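For example (the traces.* names simply follow from the -o traces prefix above; how many files you get depends on how many processes and threads the program spawns):
strace -ffo traces ./installedBinary --help   # one traces.PID file per process/thread
ls traces.*                                   # see which PIDs got traced
tail -n 20 traces.*                           # the last syscalls usually show what it is blocked on
grep -n 'read(0' traces.*                     # a pending read on fd 0 would hint it is waiting on stdin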
You can also do that from htop. Locate the blocked thread and press s for strace and l for lsof.
Maxim Egorushkin's answer is a good one. But on Linux, most programs have some documentation (often, at least a man page, see man(1) & man(7)), and most programs are free software. And the documentation should tell a lot more than --help does (the output of --help is a short summary of the documentation; for example sed(1) explains a lot more than sed --help). Maybe the behavior of your program is explained in the documentation (e.g. depends upon some environment variable).
So you should also read the documentation of your installedBinary, and you probably could get its source code, study it, and recompile it. If you have the source code and have built it yourself, you can usually compile it with DWARF debug information (e.g. adding -g to some CFLAGS in a Makefile...) and run it under gdb.
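A rough sketch of that workflow, assuming a conventional Makefile-based build that honors CFLAGS (the exact targets and flags will of course depend on the package):
make clean
make CFLAGS='-g -O0'        # rebuild with DWARF debug info and without optimization
gdb ./installedBinary
(gdb) run --help            # reproduce the hang under the debugger
# press Ctrl+C when it hangs, then:
(gdb) bt                    # see where it is stuck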
Notice that even on Linux you might have malware (e.g. for Debian or Ubuntu you might have found a .deb source which is used to publish malware; this is unlikely, but not impossible). Trusting a binary package provider is a social issue, not a technical one. Your installedBinary might (in principle) be bad enough to put you in trouble, but it is probably just an ordinary executable.
Perhaps your installedBinary is always waiting for some input from its stdin (such behavior might be unusual, but it is not forbidden) or from some other source. Then you might try installedBinary < /dev/null and even installedBinary --help < /dev/null
I recently did this to my system
ulimit -c unlimited
and, as designed, it created a core file for the program I use. Ever since, I have had random crashes of my program, but I haven't had a chance to check the core dump to see what errors it gave, since the program is restarted daily; I assume the previous core dumps are gone by then (if they are not, please tell me so I can look them up).
But my question is: could this new ulimit command I used possibly be the cause of the server crashes? For years I've run the same program with no crashes, and since this command I have had random crashes from time to time; it feels as if the program loops for around 5 minutes and then restarts.
Any help is appreciated, as I cannot reproduce the issue.
I'm trying to get a core dump of a proprietary application running on an embedded Linux system, for which I wrote some plugins.
What I did was:
ulimit -c unlimited
echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
kill -3 <PID>
However, no core dump is created. '/tmp/cores' exists and is writable for everyone, and the disk has enough space available. When I try the same thing with sleep 100 & as an example process and then kill it, the core dump is created.
I tried the example for the pipe syntax from the core manpage, which writes some parameters and the size of the core dump into a file called core.info. This file IS created, and the size is greater than 0. So if the core dump is created, why isn't it written to /tmp/cores? To be sure, I also searched for core* on the file system - it's not there. dmesg doesn't show any errors (but it does if I pipe the core dump to an invalid program).
Some more info: the system is probably based on Debian, but I'm not quite sure. GDB is not available, and neither are many other tools - there is only busybox for basic stuff.
The process I'm trying to debug is automatically restarted soon after being killed.
So, I guess one solution would be to modify the example program in order to write the dump to a file instead of just counting bytes. But why doesn't it work just normally if there obviously is some data?
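Since there is only busybox on the device, I guess a tiny sh handler could do the job instead of modifying the C example from the manpage. Something like the following - the handler name, path and arguments are my own invention, and I haven't verified it on this particular system:
cat > /tmp/core_handler.sh <<'EOF'
#!/bin/sh
# the kernel pipes the core image into this script's stdin
exec cat > "/tmp/cores/core.$1.$2"    # $1 = %e (executable name), $2 = %p (pid)
EOF
chmod +x /tmp/core_handler.sh
echo "|/tmp/core_handler.sh %e %p" > /proc/sys/kernel/core_pattern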
If your proprietary application calls setrlimit(2) with RLIMIT_CORE set to 0, or if it is setuid, no core dump happens. See core(5). Perhaps use strace(1) to find out. And you could install gdb (perhaps by [cross-] compiling it). See also gcore(1).
Also, check (and maybe set) the limit in the invoking shell. With bash(1) use ulimit builtin. Otherwise, cat /proc/self/limits should display the limits. If you don't have bash you could code a small wrapper in C calling setrlimit then execve ...
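A rough sketch of such a wrapper in plain sh (busybox ash also has the ulimit builtin; the wrapper path and name are only illustrative, and whatever init script or watchdog starts the application would have to be changed to launch it through the wrapper):
cat > /usr/local/bin/with-core <<'EOF'
#!/bin/sh
# raise the core limit, then replace ourselves with the real program
# so that it inherits the new limit
ulimit -c unlimited
exec "$@"
EOF
chmod +x /usr/local/bin/with-core
/usr/local/bin/with-core /path/to/proprietary-app    # hypothetical invocation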
I've been tasked with writing a script to clean up old core files on production Linux servers. While the script is not difficult to write, I'd like to save a basic stack backtrace to a log file before removing the core files.
Since these servers are production machines, and we do not have GDB or any development tools installed, I'm looking for some quick and dirty program that will give the analog of a gdb backtrace command for a multithreaded application.
Does anyone know of such a tool?
Thanks in advance.
There are a few things like this. Mostly they are incomplete relative to gdb -- for example it is uncommon for backtracers to print information about function arguments or locals, but gdb can do this. Also gdb can often unwind in cases where other unwinders choke.
Anyway, one I know of is elfutils: https://fedorahosted.org/elfutils/. It has an unwinder in development (I'm not sure whether it has landed in a release yet; check git).
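If the elfutils you end up with is new enough to ship the eu-stack tool, the usage is roughly like this (whether eu-stack is present depends on the elfutils version, so treat this as a sketch with placeholder paths):
eu-stack --core=core.1234 -e /path/to/crashing/binary   # backtrace of every thread in the core, no gdb needed
eu-stack -p 1234                                        # or unwind a still-running process by pid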
There's also libbacktrace. It is part of gcc and is designed for in-process unwinding. However, it could perhaps be adapted to core files.
There's also libunwind. I hear it is kind of terrible but YMMV.
One thing to note is that many of these require debuginfo to be available.
One last thought -- there has been a lot of work in the "catch a trace" area from the ABRT folks. ABRT uses a kernel hook to catch a core dump while it is being made. Then it does analysis by uploading the core to a server, files bugs, etc. You could maybe reuse a lot of their work. There's some other work in this space as well.
Kind of a brain dump, I hope it helps.
On a particular Debian server, iostat (and similar) report an unexpectedly high volume (in bytes) of disk writes going on. I am having trouble working out which process is doing these writes.
Two interesting points:
I tried turning off system services one at a time, to no avail; disk activity remains fairly constant and unexpectedly high.
Despite the writing, I do not seem to be consuming more overall space on the disk.
Both of those make me think that the writing may be something that the kernel is doing, but I'm not swapping, so it's not clear to me what Linux might try to write.
I could try out atop:
http://www.atcomputing.nl/Tools/atop/
but I would like to avoid patching my kernel.
Any ideas on how to track this down?
iotop is good (great, actually).
If you have a kernel from before 2.6.20, you can't use most of these tools.
Instead, you can try the following (which should work for almost any 2.6 kernel IIRC):
sudo -s                                   # become root
dmesg -c                                  # clear the kernel ring buffer
/etc/init.d/klogd stop                    # stop the kernel syslog daemon
echo 1 > /proc/sys/vm/block_dump          # make the kernel log every block I/O
rm /tmp/disklog                           # start with a clean log file
watch "dmesg -c >> /tmp/disklog"          # keep flushing the small ring buffer to a file
# press Ctrl+C when you're done collecting data
echo 0 > /proc/sys/vm/block_dump          # turn block I/O logging back off
/etc/init.d/klogd start
exit                                      # quit the root shell
cat /tmp/disklog | awk -F"[() \t]" '/(READ|WRITE|dirtied)/ {activity[$1]++} END {for (x in activity) print x, activity[x]}' | sort -nr -k2
The dmesg -c lines clear your kernel log. The kernel logger is then shut off, and the ring buffer is repeatedly (using watch) dumped to a file on disk (the in-memory buffer is small, which is why we need to do this). Let it run for about five minutes or so, and then Ctrl+C the watch process. After turning off the block_dump logging and restarting klogd, analyze the results using the little bit of awk at the end.
If you are using a kernel newer than 2.6.20, this is very easy, as that is the first version of the Linux kernel that includes I/O accounting. If you are compiling your own kernel, be sure to include:
CONFIG_TASKSTATS=y
CONFIG_TASK_IO_ACCOUNTING=y
Kernels from Debian packages already include these flags, so there is no need to recompile your kernel. The standard utility for accessing I/O accounting data in real time is iotop(1). It gives you a complete list of processes managed by the I/O scheduler, and displays per-process statistics for read, write, and total I/O bandwidth used.
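For example (flag names as in the usual iotop(1) options; check iotop --help on your Debian version):
sudo iotop -o            # interactive view, only processes that are actually doing I/O right now
sudo iotop -boa -n 30    # batch mode, accumulated totals, 30 iterations - handy for logging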
You may want to investigate iotop for Linux. There are some Solaris versions floating around, but for Linux there is a Debian package, for example.
You can use the UNIX command lsof (list open files). It prints out the process, process ID, and user for any open file.
You could also use htop, enabling the IO_RATE column. Htop is an excellent top replacement.
Brendan Gregg's iosnoop script can (heuristically) tell you which processes are currently using the disk on recent kernels (example iosnoop output).
You could try SystemTap; it has a lot of examples, and if I'm not mistaken, it shows how to do this sort of thing.
I've recently heard about Mortadelo, a Filemon clone, but have not checked it out myself yet:
http://gitorious.org/mortadelo