perf can find symbols in the kernel, but cannot find symbols in my program. How do I fix it? - linux

you may have read this question:
how can i get perf to find symbols in my program
1) My question is: when I use perf report, it gives a result like this:
# Overhead Command Shared Object Symbol
# ........ ....... ................. ......................
#
99.59% test test [.] 0x000003d4
0.21% test [kernel.kallsyms] [k] __do_fault
0.10% test [kernel.kallsyms] [k] run_timer_softirq
0.10% test [kernel.kallsyms] [k] __update_cpu_load
0.01% test [kernel.kallsyms] [k] set_task_comm
0.00% test [kernel.kallsyms] [k] intel_pmu_enable_all
That is: perf can find symbols in the kernel but cannot find symbols in my program.
My program is here:
void longa()
{
    int i, j;
    for(i = 0; i < 1000000; i++)
        j = i; // am I silly or crazy? I feel bored and desperate.
}
void foo2()
{
    int i;
    for(i = 0; i < 10; i++)
        longa();
}
void foo1()
{
    int i;
    for(i = 0; i < 100; i++)
        longa();
}
int main(void)
{
    foo1();
    foo2();
}
2) I compiled the program like this:
gcc test.c -g -o test
My env: os:ubuntu kernel:3.10.9

Today, when I ran perf test, I got a message saying "vmlinux symtab matches kallsyms: Failed".
While looking into the reason, I found that it is because the value of /proc/sys/kernel/kptr_restrict is 1. When we set it to 0, we get the symbols in our program.
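For reference, here is one way to check and relax that setting (needs root; sysctl -w is an equivalent alternative to writing the file directly):
$ cat /proc/sys/kernel/kptr_restrict
1
# echo 0 > /proc/sys/kernel/kptr_restrict
# sysctl -w kernel.kptr_restrict=0
Note that the setting resets on reboot unless made permanent in /etc/sysctl.conf.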

There are two possible sources of the problem:
Your perf tool was compiled without elfutils support.
Your perf tool cannot find the libelf.so library on your target.

I ran into the same problem, and found out that the reason was that the DWARF feature of my perf was not turned on.
A simple solution is to recompile perf:
% sudo apt-get install libdw-dev
% cd /path/to/perf/source/
% sudo make
% sudo make install
This enables perf to find all symbols!
If it still does not work for you, please refer to this link:
how to compile a Linux perf tool with all features.

Huh, I just tried this and for me it works as it should, AFAIK. The environment is Ubuntu 13.04 (with gcc 4.7.3).
If it still does not work for you, you might want to check that the debug symbols are otherwise fine, with, say, gdb.
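For instance, a quick way to sanity-check that the symbol table is present (using the longa function from the question's test.c):
% nm test | grep longa
% gdb -batch -ex 'info functions longa' ./test
If nm shows nothing, the binary was stripped; if gdb reports no matching functions or no debug info, recompile with -g.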
% gcc test.c -g -o test
XXX@YYY
% perf record ./test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.060 MB perf.data (~2620 samples) ]
XXX@YYY
% perf report --stdio
# ========
# captured on: Wed Oct 16 11:58:40 2013
# hostname : sundberg-office-antec
# os release : 3.8.0-31-generic
# perf version : 3.8.13.8
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : AMD Phenom(tm) II X2 555 Processor
# cpuid : AuthenticAMD,16,4,3
# total memory : 16434276 kB
# cmdline : /usr/bin/perf_3.8.0-31 record ./test
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, id = { 671, 672 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, ibs_fetch = 6, ibs_op = 7, breakpoint = 5
# ========
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 1071717616
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .........................
#
99.85% test test [.] longa
0.08% test [kernel.kallsyms] [k] call_timer_fn
0.08% test [kernel.kallsyms] [k] task_work_run
0.00% test [kernel.kallsyms] [k] clear_page_c
0.00% test [kernel.kallsyms] [k] native_write_msr_safe
#
# (For a higher level overview, try: perf report --sort comm,dso)
#

Related

Record instruction addresses of page faults with perf

I would like to get the addresses of the instructions which lead to major page faults, using perf.
I have a simple program:
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
int main(int argc, char* argv[]) {
    int fd = open("path to large file several Gb", O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    void* ptr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    const uint8_t* data = (const uint8_t*) ptr;
    srand(time(NULL));
    size_t i1 = ((double) rand() / RAND_MAX) * st.st_size;
    size_t i2 = ((double) rand() / RAND_MAX) * st.st_size;
    size_t i3 = ((double) rand() / RAND_MAX) * st.st_size;
    printf("%x[%lu], %x[%lu], %x[%lu]\n", data[i1], i1, data[i2], i2, data[i3], i3);
    munmap(ptr, st.st_size);
    close(fd);
    return 0;
}
I compile it using gcc -g -O0 main.c
and run perf record -e major-faults -g -d ./a.out
Next I open the resulting report using perf report -g
The report says that there are 3 major page faults (which is correct),
but I can't work out the addresses of the instructions which lead to the page faults.
The report is below:
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 3 of event 'major-faults'
# Event count (approx.): 3
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................ ......................
#
100.00% 0.00% a.out libc-2.23.so [.] __libc_start_main
|
---__libc_start_main
main
100.00% 100.00% a.out a.out [.] main
|
---0x33e258d4c544155
__libc_start_main
main
100.00% 0.00% a.out [unknown] [.] 0x033e258d4c544155
|
---0x33e258d4c544155
__libc_start_main
main
a.out doesn't contain an address 0x33e258d4c544155, or anything that ends with 155.
The question is: how do I get the instruction addresses which lead to the page faults?
For some reason I cannot reproduce your example, i.e. I'm not getting any samples with the major-faults event. But I can explain with a different example.
The perf report output is misleading: it doesn't show the three events, it shows the three stack levels. It's much easier to understand by using perf script, which shows the actual events (including their stacks). The entries look like this (repeated for each sample):
a.out 22107 14721.378764: 10000000 cycles:u:
5653c1afb134 main+0x1b (/tmp/a.out)
7f58bb1eeee3 __libc_start_main+0xf3 (/usr/lib/libc-2.29.so)
49564100002cdb3d [unknown] ([unknown])
Now you see the function stack with the virtual instruction address, nearest symbol and offset from the symbol. If you want to fiddle with the addresses yourself, you can run perf script --show-mmap-events, which tells you:
a.out 22107 14721.372233: PERF_RECORD_MMAP2 22107/22107: [0x5653c1afb000(0x1000) @ 0x1000 00:2b 463469 624179165]: r-xp /tmp/a.out
^ Base ^ size ^ offset ^ file
Then you can do the math for 0x5653c1afb134 by subtracting the base 0x5653c1afb000 and adding the offset 0x1000 - you get the address of the instruction or return address within the file.
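For example, with the numbers above (a shell arithmetic sketch; 0x1134 is then the file offset corresponding to the sampled main+0x1b address):
$ printf '0x%x\n' $((0x5653c1afb134 - 0x5653c1afb000 + 0x1000))
0x1134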
You also see that 0x49564100002cdb3d is not mapped and could not be resolved - it's just garbage from the frame-pointer-based stack unwinding. You can ignore it. You can also use --call-graph dwarf or --call-graph lbr, which seem to show more sensible stack origins.

Computing offset of a function in memory

I am reading the documentation for a uprobe tracer, and there is an instruction on how to compute the offset of a function in memory. I am quoting it here.
Following example shows how to dump the instruction pointer and %ax
register at the probed text address. Probe zfree function in /bin/zsh:
# cd /sys/kernel/debug/tracing/
# cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
# objdump -T /bin/zsh | grep -w zfree
0000000000446420 g DF .text 0000000000000012 Base zfree
0x46420 is the offset of zfree in object /bin/zsh that is loaded at
0x00400000.
I do not know why, but they took the output 0x446420 and subtracted 0x400000 to get 0x46420. That seemed like an error to me. Why 0x400000?
I have tried to do the same on my Fedora 23 with 4.5.6-200 kernel.
First I turned off memory address randomization
echo 0 > /proc/sys/kernel/randomize_va_space
Then I figured out where the binary is in memory:
$ cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
555555554000-55555560f000 r-xp 00000000 fd:00 2387155 /usr/bin/zsh
Then I took the offset:
marko@fedora:~ $ objdump -T /bin/zsh | grep -w zfree
000000000005dc90 g DF .text 0000000000000012 Base zfree
And figured out where zfree is via gdb
$ gdb -p 21067 --batch -ex 'p zfree'
$1 = {<text variable, no debug info>} 0x5555555b1c90 <zfree>
marko@fedora:~ $ python
Python 2.7.11 (default, Mar 31 2016, 20:46:51)
[GCC 5.3.1 20151207 (Red Hat 5.3.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> hex(0x5555555b1c90-0x555555554000)
'0x5dc90'
You see, I got the same result as with objdump, without subtracting anything.
But then I tried the same on another machine with SLES, and there it is the same as in the uprobe documentation.
Why is there such a difference? How do I compute the correct offset then?
As far as I can see, the difference can only be caused by the way the examined binary was built - more precisely, whether or not the ELF has a fixed load address. Let's do a simple experiment. We have some simple test code:
int main(void) { return 0; }
Then, build it in two ways:
$ gcc -o t1 t.c # create image with fixed load address
$ gcc -o t2 t.c -pie # create load-base independent image
Now, let's check the load base addresses for these two images:
$ readelf -l --wide t1 | grep LOAD
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x00067c 0x00067c R E 0x200000
LOAD 0x000680 0x0000000000600680 0x0000000000600680 0x000228 0x000230 RW 0x200000
$ readelf -l --wide t2 | grep LOAD
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0008cc 0x0008cc R E 0x200000
LOAD 0x0008d0 0x00000000002008d0 0x00000000002008d0 0x000250 0x000258 RW 0x2000
Here you can see that the first image requires a fixed load address - 0x400000 - while the second one has no address requirements at all.
And now we can compare the addresses that objdump reports for main:
$ objdump -t t1 | grep ' main'
00000000004004b6 g F .text 000000000000000b main
$ objdump -t t2 | grep ' main'
0000000000000710 g F .text 000000000000000b main
As we can see, the address is the complete virtual address that the first byte of main will occupy if the image is loaded at the address stored in the program header. And of course the second image will never actually be loaded at 0x0; it will be loaded at some other, randomly chosen location, which shifts the real function position.
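So in both cases the rule is the same: the uprobe offset is the symbol value minus the vaddr of the first LOAD segment. A quick check with the numbers from above (shell arithmetic; the second base is 0 because the Fedora zsh is a PIE):
$ printf '0x%x\n' $((0x446420 - 0x400000))
0x46420
$ printf '0x%x\n' $((0x5dc90 - 0x0))
0x5dc90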

perf : How to check processess running on particular cpu

Is there any option in perf to look at the processes running on a particular cpu/core, and how much of that core is taken by each process?
Reference links would be helpful.
perf is intended for profiling, which is not a good fit for your case. You may try sampling /proc/sched_debug (if it is compiled into your kernel). For example, you may check which process is currently running on each CPU:
egrep '^R|cpu#' /proc/sched_debug
cpu#0, 917.276 MHz
R egrep 2614 37730.177313 ...
cpu#1, 917.276 MHz
R bash 2023 218715.010833 ...
Using its PID as a key, you may check how much CPU time (in milliseconds) it has consumed:
grep se.sum_exec_runtime /proc/2023/sched
se.sum_exec_runtime : 279346.058986
However, as @BrenoLeitão mentioned, SystemTap is quite useful for your case. Here is a script for your task:
global cputimes;
global cmdline;
global oncpu;
global NS_PER_SEC = 1000000000;

probe scheduler.cpu_on {
    oncpu[pid()] = local_clock_ns();
}
probe scheduler.cpu_off {
    if(oncpu[pid()] == 0)
        next;
    cmdline[pid()] = cmdline_str();
    cputimes[pid(), cpu()] <<< local_clock_ns() - oncpu[pid()];
    delete oncpu[pid()];
}
probe timer.s(1) {
    printf("%6s %3s %6s %s\n", "PID", "CPU", "PCT", "CMDLINE");
    foreach([pid+, cpu] in cputimes) {
        cpupct = @sum(cputimes[pid, cpu]) * 10000 / NS_PER_SEC;
        printf("%6d %3d %3d.%02d %s\n", pid, cpu,
               cpupct / 100, cpupct % 100, cmdline[pid]);
    }
    delete cputimes;
}
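Assuming the script is saved as percpu.stp (a file name chosen here just for illustration), run it as root with:
# stap percpu.stp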
It traces the moments when a process starts running on a CPU and when it stops running there (due to migration or sleeping), by attaching to the scheduler.cpu_on and scheduler.cpu_off probes. The second probe calculates the time difference between these events and saves it to the cputimes aggregation, along with the process command line.
timer.s(1) fires once per second - it walks over the aggregation and calculates the percentages. Here is sample output for CentOS 7 with bash running an infinite loop:
0 0 100.16
30 1 0.00
51 0 0.00
380 0 0.02 /usr/bin/python -Es /usr/sbin/tuned -l -P
2016 0 0.08 sshd: root@pts/0 "" "" "" ""
2023 1 100.11 -bash
2630 0 0.04 /usr/libexec/systemtap/stapio -R stap_3020c9e7ba76838179be68cd2390a10c_2630 -F3
I understand that perf is not the proper way to do it, although you can limit perf to particular CPUs, using perf record -C <cpulist> or even perf stat -C <cpulist>.
The closest you are going to get is the context-switch event, but this is not going to give you the application names at all.
I think you are going to need something more powerful, such as SystemTap.

The address where filename has been loaded is missing [GDB]

I have the following sample code:
#include <stdio.h>
int main()
{
    int num1, num2;
    printf("Enter two numbers\n");
    scanf("%d", &num1);
    scanf("%d", &num2);
    int i;
    for(i = 0; i < num2; i++)
        num1 = num1 + num1;
    printf("Result is %d \n", num1);
    return 0;
}
I compiled this code with the -g option to gcc:
gcc -g file.c
Generate a separate symbol file:
objcopy --only-keep-debug a.out a.out.sym
Strip the symbols from a.out
strip -s a.out
Load this a.out in gdb
gdb a.out
gdb says "no debug information found" - fine.
Then I use the add-symbol-file command in gdb:
(gdb) add-symbol-file a.out.sym [Enter]
The address where a.out.sym has been loaded is missing
I want to know how to find this address.
Is there any command or trick to find it?
This address is representing WHAT?
I know gdb has another command, symbol-file, but it overwrites the previously loaded symbols,
so I have to use this command to add many symbol files in gdb.
My system is 64-bit, running Ubuntu 12.04 LTS.
gdb version is 7.4-2012.04
gcc version is 4.6.3
objcopy --only-keep-debug a.out a.out.sym
If you want GDB to load a.out.sym automatically, follow the steps outlined here (note in particular that you need to do the "add .gnu_debuglink" step).
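In outline, the workflow then looks like this (a sketch using the file names from the question; the third command records the link inside a.out so GDB can find the symbol file on its own):
objcopy --only-keep-debug a.out a.out.sym
strip -s a.out
objcopy --add-gnu-debuglink=a.out.sym a.out
gdb a.out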
This address is representing WHAT?
The address GDB wants is the location of the .text section of the binary. To find it, use readelf -WS a.out. E.g.:
$ readelf -WS /bin/date
There are 28 section headers, starting at offset 0xe350:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400238 000238 00001c 00 A 0 0 1
...
[13] .text PROGBITS 0000000000401900 001900 0077f8 00 AX 0 0 16
Here, you want to give GDB 0x401900 as the load address.
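With the question's files, the GDB invocation then looks like this (substitute the .text address that readelf reports for your own a.out; 0x401900 is just the /bin/date value from above):
(gdb) add-symbol-file a.out.sym 0x401900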

Tool to trace local function calls in Linux

I am looking for a tool like ltrace or strace that can trace locally defined functions in an executable. ltrace only traces dynamic library calls, and strace only traces system calls. For example, given the following C program:
#include <stdio.h>
int triple ( int x )
{
    return 3 * x;
}
int main (void)
{
    printf("%d\n", triple(10));
    return 0;
}
Running the program with ltrace will show the call to printf, since that is a standard library function (a dynamic library on my system), and strace will show all the system calls from the startup code, the system calls used to implement printf, and the shutdown code. But I want something that will show me that the function triple was called. Assuming that the local functions have not been inlined by an optimizing compiler and that the binary has not been stripped (symbols removed), is there a tool that can do this?
Edit
A couple of clarifications:
It is okay if the tool also provides trace information for non-local functions.
I don't want to have to recompile the program(s) with support for specific tools; the symbol information in the executable should be enough.
It would be really nice if I could use the tool to attach to existing processes, as I can with ltrace/strace.
Assuming you only want to be notified for specific functions, you can do it like this:
Compile with debug information (since you already have symbol information, you probably have enough debug info in there as well).
Given:
#include <iostream>
int fac(int n) {
    if(n == 0)
        return 1;
    return n * fac(n-1);
}
int main()
{
    for(int i = 0; i < 4; i++)
        std::cout << fac(i) << std::endl;
}
Use gdb to trace:
[js@HOST2 cpp]$ g++ -g3 test.cpp
[js@HOST2 cpp]$ gdb ./a.out
(gdb) b fac
Breakpoint 1 at 0x804866a: file test.cpp, line 4.
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>silent
>bt 1
>c
>end
(gdb) run
Starting program: /home/js/cpp/a.out
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
2
#0 fac (n=3) at test.cpp:4
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
6
Program exited normally.
(gdb)
Here is what I do to collect all of the functions' addresses:
tmp=$(mktemp)
readelf -s ./a.out | gawk '
{
    if($4 == "FUNC" && $2 != 0) {
        print "# code for " $NF;
        print "b *0x" $2;
        print "commands";
        print "silent";
        print "bt 1";
        print "c";
        print "end";
        print "";
    }
}' > $tmp;
gdb --command=$tmp ./a.out;
rm -f $tmp
Note that instead of just printing the current frame (bt 1), you can do anything you like: print the value of some global, execute some shell command, or mail something if it hits the fatal_bomb_exploded function :) Sadly, gdb outputs some "Current Language changed" messages in between, but they are easily grepped out. No big deal.
SystemTap can be used on a modern Linux box (Fedora 10, RHEL 5, etc.).
First, download the para-callgraph.stp script.
Then run:
$ sudo stap para-callgraph.stp 'process("/bin/ls").function("*")' -c /bin/ls
0 ls(12631):->main argc=0x1 argv=0x7fff1ec3b038
276 ls(12631): ->human_options spec=0x0 opts=0x61a28c block_size=0x61a290
365 ls(12631): <-human_options return=0x0
496 ls(12631): ->clone_quoting_options o=0x0
657 ls(12631): ->xmemdup p=0x61a600 s=0x28
815 ls(12631): ->xmalloc n=0x28
908 ls(12631): <-xmalloc return=0x1efe540
950 ls(12631): <-xmemdup return=0x1efe540
990 ls(12631): <-clone_quoting_options return=0x1efe540
1030 ls(12631): ->get_quoting_style o=0x1efe540
See also: Observe, systemtap and oprofile updates
Using Uprobes (since Linux 3.5):
Assuming you want to trace all functions in ~/Desktop/datalog-2.2/datalog when calling it with the parameters -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl:
cd /usr/src/linux-`uname -r`/tools/perf
for i in `./perf probe -F -x ~/Desktop/datalog-2.2/datalog`; do sudo ./perf probe -x ~/Desktop/datalog-2.2/datalog $i; done
sudo ./perf record -agR $(for j in $(sudo ./perf probe -l | cut -d' ' -f3); do echo "-e $j"; done) ~/Desktop/datalog-2.2/datalog -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl
sudo ./perf report -G
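When you are done, the probes can be listed and removed again with the same tool:
sudo ./perf probe -l
sudo ./perf probe -d '*'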
Assuming you can re-compile the code you want to trace (no source change required) with the gcc option -finstrument-functions, you can use etrace to get the function call graph; a sketch of the underlying mechanism follows the example output.
Here is what the output looks like:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
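For a sense of how this mechanism works: -finstrument-functions makes gcc emit a call to a hook on every function entry and exit, and the tracer supplies those hooks. A minimal sketch of such hooks (hooks.c is a hypothetical file name, not part of etrace):
/* hooks.c - link into a program compiled with -finstrument-functions */
#include <stdio.h>

/* The attribute keeps gcc from instrumenting the hooks themselves,
   which would recurse forever. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *call_site)
{
    fprintf(stderr, "enter %p (called from %p)\n", fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *call_site)
{
    fprintf(stderr, "exit  %p\n", fn);
}
Build with, e.g., gcc -g -finstrument-functions test.c hooks.c; the printed addresses can be mapped back to names with addr2line.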
On Solaris, truss (the strace equivalent) has the ability to filter the library to be traced. I was surprised when I discovered strace doesn't have such a capability.
KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }

int main(int argc, char **argv) {
    int (*f)(int);
    f0(1);
    f1(1);
    f = pointed;
    if (argc == 1)
        f(1);
    if (argc == 2)
        not_called(1);
    return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
f0, f1 and f2 are called as expected from one another
pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine".
Tested on Ubuntu 18.04.
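If you only want a textual summary instead of the GUI, the same data file can be printed on the terminal with callgrind_annotate, which ships with valgrind:
callgrind_annotate callgrind.out.1234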
$ sudo yum install frysk
$ ftrace -sym:'*' -- ./a.out
More: ftrace.1
If you externalize that function into an external library, you should also be able to see it getting called (with ltrace).
The reason this works is that ltrace puts itself between your app and the library; when all the code is internal to a single file, it can't intercept the call.
e.g.: ltrace xterm
spews stuff from the X libraries, and X is hardly "system".
Outside of this, the only real way to do it is compile-time interception via profiling flags or debug symbols.
I just ran across this app, which looks interesting:
http://www.gnu.org/software/cflow/
But I don't think that's what you want.
If the functions aren't inlined, you might even have luck using objdump -d <program>.
For an example, let's take a look at the beginning of GCC 4.3.2's main routine:
$ objdump `which gcc` -d | grep '\(call\|main\)'
08053270 <main>:
8053270: 8d 4c 24 04 lea 0x4(%esp),%ecx
--
8053299: 89 1c 24 mov %ebx,(%esp)
805329c: e8 8f 60 ff ff call 8049330 <strlen@plt>
80532a1: 8d 04 03 lea (%ebx,%eax,1),%eax
--
80532cf: 89 04 24 mov %eax,(%esp)
80532d2: e8 b9 c9 00 00 call 805fc90 <xmalloc_set_program_name>
80532d7: 8b 5d 9c mov 0xffffff9c(%ebp),%ebx
--
80532e4: 89 04 24 mov %eax,(%esp)
80532e7: e8 b4 a7 00 00 call 805daa0 <expandargv>
80532ec: 8b 55 9c mov 0xffffff9c(%ebp),%edx
--
8053302: 89 0c 24 mov %ecx,(%esp)
8053305: e8 d6 2a 00 00 call 8055de0 <prune_options>
805330a: e8 71 ac 00 00 call 805df80 <unlock_std_streams>
805330f: e8 4c 2f 00 00 call 8056260 <gcc_init_libintl>
8053314: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
--
805331c: c7 04 24 02 00 00 00 movl $0x2,(%esp)
8053323: e8 78 5e ff ff call 80491a0 <signal@plt>
8053328: 83 e8 01 sub $0x1,%eax
It takes a bit of effort to wade through all of the assembler, but you can see all possible calls from a given function. It's not as easy to use as gprof or some of the other utilities mentioned, but it has several distinct advantages:
You generally don't need to recompile an application to use it.
It shows all possible function calls, whereas something like gprof will only show the function calls that were actually executed.
There is a shell script for automating the tracing of function calls with gdb, but it can't attach to a running process.
blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the page - http://web.archive.org/web/20090317091725/http://blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the tool - callgraph.tar.gz
http://web.archive.org/web/20090317091725/http://superadditive.com/software/callgraph.tar.gz
It dumps all the functions from the program and generates a gdb command file with a breakpoint on each function. At each breakpoint, "backtrace 2" and "continue" are executed.
This script is rather slow on a big project (~ thousands of functions), so I added a filter on the function list (via egrep). It was very easy, and I use this script almost every day.
Gprof might be what you want
See traces, a tracing framework for Linux C/C++ applications:
https://github.com/baruch/traces#readme
It requires recompiling your code with its instrumentor, but it will provide a listing of all functions, their parameters, and their return values. There is an interactive mode to allow easy navigation of large data samples.
Hopefully the callgrind or cachegrind tools for Valgrind will give you the information you seek.
NOTE: This is not the Linux-kernel-based ftrace, but rather a tool I recently designed to accomplish local function tracing and control-flow analysis. Linux ELF x86_64/x86_32 binaries are supported publicly.
https://github.com/leviathansecurity/ftrace
