CPU profiler in Google Performance Tools (gperftools): no output from a process that uses a shared library

I have a process running in the background on a Linux server. The process uses a shared library. I am using the CPU profiler from gperftools to examine its functions. My steps are as follows:
1. In my app:
int main()
{
    ProfilerStart("dump.txt");
    ...code...
    ProfilerFlush();
    ProfilerStop();
    return 0;
}
2. CPUPROFILE_FREQUENCY=1000000 LD_LIBRARY_PATH=/usr/local/lib/libprofiler.so CPUPROFILE=dump.txt ./a.out
3. pprof --text a.out dump.txt
I verified the same steps with another process (one that does not use a shared library), and they work fine.
Problem: dump.txt stays at the same size (8 or 9 KB) and never shows any profile output, even after the app has been running for 2 or 3 hours (the app receives messages from clients). I suspect it is because my app uses a shared library and something goes wrong there, but I am not at all clear about this.
Can you please explain what is happening? Is there any solution?
Thanks a lot,

The LD_LIBRARY_PATH=/usr/local/lib/libprofiler.so part of your run command is incorrect.
According to the documentation at http://goog-perftools.sourceforge.net/doc/cpu_profiler.html:
To install the CPU profiler into your executable, add -lprofiler to the link-time step for your executable. (It's also probably possible to add in the profiler at run-time using LD_PRELOAD, but this isn't necessarily recommended.)
you can either add libprofiler to the link step of your application with -lprofiler, like this:
gcc -c myapp.c -o myapp.o
gcc myapp.o mystaticlib.a -Lmypath -lmydynamiclib -lprofiler -o myapp
or load it at run time with the environment variable LD_PRELOAD (not LD_LIBRARY_PATH, as you did):
LD_PRELOAD=/usr/lib/libprofiler.so ./myapp
When the gperftools CPU profiler is used correctly, it prints the event count and the output file size when the application terminates.
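Applied to the command from the question, the corrected invocation would look like this (assuming libprofiler.so really is installed at /usr/local/lib/libprofiler.so):
LD_PRELOAD=/usr/local/lib/libprofiler.so CPUPROFILE=dump.txt CPUPROFILE_FREQUENCY=1000000 ./a.out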

Related

How to retrieve memory content from a process core file?

I want to analyse the content of each memory block produced by a particular process. What I did was use "gcore pid" to get a core dump of the process, but I do not know how to retrieve the content from it. Can anyone help?
In general, the right tool to analyze a core dump is the gdb debugger.
So you should compile all your code with the -g flag passed to gcc or g++ or clang (to have DWARF debug information inside your ELF executable).
Then, you can analyze the (post-mortem or not) core dump of your program myprog with the command gdb myprog core. Learn how to use gdb. Notice that gdb is scriptable and extensible (in Python and Guile).
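For instance, a minimal session for inspecting memory from a core dump could look like this (the address is a made-up placeholder; info proc mappings works on core files in recent gdb versions):
gdb myprog core
(gdb) bt                      # backtrace at the point of the crash
(gdb) info proc mappings      # list the memory regions present in the dump
(gdb) x/32xb 0x601040         # examine 32 bytes at a (hypothetical) address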
You could (but probably should not) analyze the core file otherwise (without gdb). Then you need to understand its detailed format (and that could require months of work). See elf(5) and core(5).
BTW, valgrind could also be useful.
You could even use gdb to analyze a core dump from a program compiled without -g but that is much less useful.

Core dump is created, but not written to a file?

I'm trying to get a core dump of a proprietary application running on an embedded linux system, for which I wrote some plugins.
What I did was:
ulimit -c unlimited
echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
kill -3 <PID>
However, no core dump is created. '/tmp/cores' exists and is writable for everyone, and the disk has enough space available. When I try the same thing with sleep 100 & as an example process and then kill it, the core dump is created.
I tried the example for the pipe syntax from the core manpage, which writes some parameters and the size of the core dump into a file called core.info. This file IS created, and the size is greater than 0. So if the core dump is created, why isn't it written to /tmp/cores? To be sure, I also searched for core* on the file system - it's not there. dmesg doesn't show any errors (but it does if I pipe the core dump to an invalid program).
Some more info: the system is probably based on Debian, but I'm not quite sure. GDB is not available, nor are many other tools; there is only busybox for basic stuff.
The process I'm trying to debug is automatically restarted soon after being killed.
So, I guess one solution would be to modify the example program in order to write the dump to a file instead of just counting bytes. But why doesn't it work just normally if there obviously is some data?
If your proprietary application calls setrlimit(2) with RLIMIT_CORE set to 0, or if it is setuid, no core dump happens. See core(5). Perhaps use strace(1) to find out. And you could install gdb (perhaps by [cross-]compiling it). See also gcore(1).
Also, check (and maybe set) the limit in the invoking shell. With bash(1), use the ulimit builtin. Otherwise, cat /proc/self/limits should display the limits. If you don't have bash, you could code a small wrapper in C calling setrlimit then execve, along the lines of the sketch below.
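A minimal sketch of such a wrapper (the program path is a placeholder):
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Raise RLIMIT_CORE as far as the hard limit allows, then exec the
   real program. The path below is a placeholder. */
int main(int argc, char **argv)
{
    struct rlimit rl;
    (void)argc;
    getrlimit(RLIMIT_CORE, &rl);
    rl.rlim_cur = rl.rlim_max;       /* like "ulimit -Sc <hard limit>" */
    setrlimit(RLIMIT_CORE, &rl);
    execv("/path/to/app", argv);     /* placeholder path */
    perror("execv");                 /* reached only if execv fails */
    return 1;
}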

Profiling a preloaded shared library with LD_PROFILE

I'm currently trying to profile a preloaded shared library by using the LD_PROFILE environment variable.
I compile the library with "-g" flag and export LD_PROFILE_OUTPUT as well as LD_PROFILE before running an application (ncat in my case) with the preloaded library. So, more precisely what I do is the following:
Compile shared library libexample.so with "-g" flag.
export LD_PROFILE_OUTPUT=`pwd`
export LD_PROFILE=libexample.so
run LD_PRELOAD=`pwd`/libexample.so ncat ...
The preloading itself does work and my library is used, but no file libexample.so.profile gets created. If I use export LD_PROFILE=libc.so.6 instead, there is a file libc.so.6.profile as expected.
Is this a problem of combining LD_PRELOAD and LD_PROFILE or is there anything I might have done wrong?
I'm using glibc v2.12 on CentOS 6.4 if that is of any relevance.
Thanks a lot!
Sorry, I don't know why LD_PROFILE does not work with LD_PRELOAD.
However, for profiling binaries compiled with -g I really like the tool valgrind together with the graphical tool kcachegrind.
valgrind --tool=callgrind /path/to/some/binary with options
will create a file called something like callgrind.out.1234 where 1234 was the pid of the program when run. That file can be analyzed with:
kcachegrind callgrind.out.1234
In kcachegrind you will easily see in which functions most CPU time is spent; the callee map also shows this in a nice graphical way. The call graph can help you understand how the program works. You can even look at the source code to see how much CPU time is spent on each line.
I hope you will find valgrind useful even though this was not the answer to your LD_PROFILE question. The drawback of valgrind is that it slows things down, whether it is used for profiling or for memory checking.
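If no graphical environment is available, the callgrind_annotate script that ships with valgrind prints a plain-text summary of the same data:
callgrind_annotate callgrind.out.1234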

How to set core dump naming scheme without su/sudo?

I am developing a MPI program on a Linux machine where I do not have sudo/su access. As my program currently segfaults, I would like to examine the core dumps via gdb. Unfortunately, as the program is multi-threaded, all the threads write to one core dump. So I would like to be able to append the PID to each separate core dump for every process.
I know there is a way to do it via /proc/sys/kernel/core_pattern, however I do not have access to write to this.
Thanks for any help.
It can be a pain to debug MPI apps on systems that are configured this way when you do not have root access. One option for working around this is to use Valgrind to get stack traces for your segfault(s). This will only be useful provided that your application will fail in a reasonable period of time when slowed down via Valgrind, and that it still segfaults at all in this case.
I usually run MPI apps under Valgrind like this:
% mpiexec -n 5 valgrind -q /path/to/my_app
That will send all of the Valgrind output to standard error. But if you want the output separated into different files, you can get a bit fancier:
% mpiexec -n 5 valgrind -q --log-file='vg_out.%q{PMI_RANK}' /path/to/my_app
That's the setup for MPICH2. I think that for Open MPI you'll need to replace PMI_RANK with OMPI_MCA_ns_nds_vpid, but if that doesn't work for you then you'll need to check with the Open MPI developers on their discussion list. In either case, this will yield N files, where N is the size of MPI_COMM_WORLD, each named vg_out.0, vg_out.1, ..., to vg_out.$(($N-1)), each corresponding to a rank in MPI_COMM_WORLD.

What is a good way to dump a Linux core file from inside a process?

We have a server (written in C and C++) that currently catches a SEGV and dumps some internal info to a file. I would like to generate a core file and write it to disk at the time we catch the SEGV, so our support reps and customers don't have to fuss with ulimit and then wait for the crash to happen again in order to get a core file. We have used the abort function in the past, but it is subject to the ulimit rules and doesn't help.
We have some legacy code that reads /proc/pid/maps and manually generates a core file, but it is out of date and doesn't seem very portable (for example, I'm guessing it would not work in our 64-bit builds). What is the best way to generate and dump a core file in a Linux process?
Google has a library for generating coredumps from inside a running process called google-coredumper. This should ignore ulimit and other mechanisms.
The documentation for the call that generates the core file is here. According to the documentation, it seems that it is feasible to generate a core file in a signal handler, though it is not guaranteed to always work.
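A minimal sketch of that approach, assuming the WriteCoreDump() entry point from the linked documentation (the output file name is a placeholder):
#include <signal.h>
#include <google/coredumper.h>  /* google-coredumper; link with -lcoredumper */

static void segv_handler(int sig)
{
    /* Per the documentation, calling this from a signal handler is
       feasible but not guaranteed to always work. */
    WriteCoreDump("core.myserver");  /* placeholder file name */
    signal(sig, SIG_DFL);            /* restore the default action... */
    raise(sig);                      /* ...and terminate */
}

int main(void)
{
    signal(SIGSEGV, segv_handler);
    /* ... server code ... */
    return 0;
}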
I saw pmbrett's post and thought "hey, that's cool", but I couldn't find that utility anywhere on my system (Gentoo).
So I did a bit of prodding, and discovered GDB has this option in it.
gdb --pid=4049 --batch -ex gcore
Seemed to work fine for me.
It's not very useful by itself, however, because it only traps the lowest-level function that was in use at the time, but it still does a good job apart from that (with no memory limitations, I dumped a 350 MB snapshot of a Firefox process with it).
Try using the Linux command gcore
usage: gcore [-o filename] pid
You'll need to use system (or exec) and getpid() to build up the right command line to call it from within your process, as sketched below.
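A hedged sketch of that idea (the output prefix is a placeholder; gcore -o writes <prefix>.<pid>):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Shell out to gcore(1) to dump a core image of the current process.
   Assumes gcore is installed and ptracing this process is permitted. */
static void dump_core_via_gcore(void)
{
    char cmd[64];
    snprintf(cmd, sizeof cmd, "gcore -o /tmp/myapp %d", (int)getpid());
    system(cmd);  /* writes /tmp/myapp.<pid> */
}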
Some possible solutions^W ways of dealing with this situation:
Fix the ulimit!!!
Accept that you don't get a core file and run inside gdb, scripted to do a "thread apply all bt" on SIGSEGV
Accept that you don't get a core file and acquire a stack trace from within the application. The article Stack Backtracing Inside Your Program is pretty old, but the approach should still be possible these days.
You can also change the limit from within your program with setrlimit(2). Like the ulimit shell command, this can lower limits, or raise them as far as the hard limit allows. Call setrlimit() at startup to allow core dumping, and you're fine.
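A minimal startup snippet along those lines:
#include <sys/resource.h>

/* Call once at startup: raise the core size limit to the hard limit. */
static void enable_core_dumps(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_CORE, &rl);
    rl.rlim_cur = rl.rlim_max;  /* like "ulimit -Sc <hard limit>" */
    setrlimit(RLIMIT_CORE, &rl);
}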
I assume you have a signal handler that catches SEGV, for example, and does something like print a message and call _exit(). (Otherwise, you'd have a core file in the first place!) You could do something like the following:
void my_handler(int sig)
{
    ...
    if (wantCore_ && !fork()) {
        struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };
        setrlimit(RLIMIT_CORE, &rl);  // ulimit -Sc unlimited
        sigset(sig, SIG_DFL);         // reset default handler
        raise(sig);                   // doesn't return, generates a core file
    }
    _exit(1);
}
Use the backtrace and backtrace_symbols glibc calls to get the trace. Just keep in mind that backtrace_symbols uses malloc internally, so it can fail if the heap is corrupted.
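A minimal sketch; the backtrace_symbols_fd variant writes directly to a file descriptor and avoids malloc, which sidesteps that heap-corruption caveat:
#include <execinfo.h>
#include <unistd.h>

/* Print the current stack trace to stderr without allocating memory. */
static void print_trace(void)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}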
system ("kill -6 ")
I'd give it a try if you are still looking for something
