96 bit long double and too short stX registers

96 bit long double and too short stX registers - nasm

Let's consider:
#include <iostream>
int main(){
long double a=1.20, b=12.0;
std::cout << sizeof(long double);
add(a,b);
return 0;
}
g++ -m32 -o main main.cpp
After executing it turned out that size of long double is 12 bytes.
So, now, I would like to implement add function in 32-bit nasm.
How can I deal with addition floating points number? It is not a problem, just use FPU and fadd you can say. But, I have 12 byte number ( 96 bit). Our FPU registers are 80 bit. So it is a problem.
And another issue, I looked at code generated by gcc:
objdump -d main > main.s -M intel
( I know about -S flag, but I got used to objdump) and for addition long doubles the generated code is following:
80486e1: db 6d c8 fld TBYTE PTR [ebp-0x38]
80486e4: db 6d d8 fld TBYTE PTR [ebp-0x28]
80486e7: de c1 faddp st(1),st
How is it possible? After all, we have 80 bit registers and compiler try to load 96 bit numbers to them.
Please make clear my mind. :)

Related

Disassembling Code inside of a C++ program

objdump -D file.o will output something like
2b: 47 rex.RXB
2c: 43 rex.XB
2d: 43 3a 20 rex.XB cmp (%r8),%spl
In my program I have a pointer to the instruction and need to disassemble the first instruction found (not the whole file). What would be the easiest way to do that? For example
uint8_t * inst_ptr = memory_location
std::string human_readable = get_disassembly(instr_ptr)
human_readable = "43 3a 20 rex.XB cmp (%r8),%spl"
Is there linux headers/includes that do this already?
Ive been googling but havent found a good, straight forward solution.

Understanding how $ works in assembly [duplicate]

len: equ 2
len: db 2
Are they the same, producing a label that can be used instead of 2? If not, then what is the advantage or disadvantage of each declaration form? Can they be used interchangeably?

The first is equate, similar to C's:
#define len 2
in that it doesn't actually allocate any space in the final code, it simply sets the len symbol to be equal to 2. Then, when you use len later on in your source code, it's the same as if you're using the constant 2.
The second is define byte, similar to C's:
int len = 2;
It does actually allocate space, one byte in memory, stores a 2 there, and sets len to be the address of that byte.
Here's some pseudo-assembler code that shows the distinction:
line addr code label instruction
---- ---- -------- ----- -----------
1 0000 org 1234h
2 1234 elen equ 2
3 1234 02 dlen db 2
4 1235 44 02 00 mov ax, elen
5 1238 44 34 12 mov ax, dlen
Line 1 simply sets the assembly address to be 1234h, to make it easier to explain what's happening.
In line 2, no code is generated, the assembler simply loads elen into the symbol table with the value 2. Since no code has been generated, the address does not change.
Then, when you use it on line 4, it loads that value into the register.
Line 3 shows that db is different, it actually allocates some space (one byte) and stores the value in that space. It then loads dlen into the symbol table but gives it the value of that address 1234h rather than the constant value 2.
When you later use dlen on line 5, you get the address, which you would have to dereference to get the actual value 2.

Summary
NASM 2.10.09 ELF output:
db does not have any magic effects: it simply outputs bytes directly to the output object file.
If those bytes happen to be in front of a symbol, the symbol will point to that value when the program starts.
If you are on the text section, your bytes will get executed.
Weather you use db or dw, etc. that does not specify the size of the symbol: the st_size field of the symbol table entry is not affected.
equ makes the symbol in the current line have st_shndx == SHN_ABS magic value in its symbol table entry.
Instead of outputting a byte to the current object file location, it outputs it to the st_value field of the symbol table entry.
All else follows from this.
To understand what that really means, you should first understand the basics of the ELF standard and relocation.
SHN_ABS theory
SHN_ABS tells the linker that:
relocation is not to be done on this symbol
the st_value field of the symbol entry is to be used as a value directly
Contrast this with "regular" symbols, in which the value of the symbol is a memory address instead, and must therefore go through relocation.
Since it does not point to memory, SHN_ABS symbols can be effectively removed from the executable by the linker by inlining them.
But they are still regular symbols on object files and do take up memory there, and could be shared amongst multiple files if global.
Sample usage
section .data
x: equ 1
y: db 2
section .text
global _start
_start:
mov al, x
; al == 1
mov al, [y]
; al == 2
Note that since the symbol x contains a literal value, no dereference [] must be done to it like for y.
If we wanted to use x from a C program, we'd need something like:
extern char x;
printf("%d", &x);
and set on the asm:
global x
Empirical observation of generated output
We can observe what we've said before with:
nasm -felf32 -o equ.o equ.asm
ld -melf_i386 -o equ equ.o
Now:
readelf -s equ.o
contains:
Num: Value Size Type Bind Vis Ndx Name
4: 00000001 0 NOTYPE LOCAL DEFAULT ABS x
5: 00000000 0 NOTYPE LOCAL DEFAULT 1 y
Ndx is st_shndx, so we see that x is SHN_ABS while y is not.
Also see that Size is 0 for y: db in no way told y that it was a single byte wide. We could simply add two db directives to allocate 2 bytes there.
And then:
objdump -dr equ
gives:
08048080 <_start>:
8048080: b0 01 mov $0x1,%al
8048082: a0 88 90 04 08 mov 0x8049088,%al
So we see that 0x1 was inlined into instruction, while y got the value of a relocation address 0x8049088.
Tested on Ubuntu 14.04 AMD64.
Docs
http://www.nasm.us/doc/nasmdoc3.html#section-3.2.4:
EQU defines a symbol to a given constant value: when EQU is used, the source line must contain a label. The action of EQU is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,
message db 'hello, world'
msglen equ $-message
defines msglen to be the constant 12. msglen may not then be redefined later. This is not a preprocessor definition either: the value of msglen is evaluated once, using the value of $ (see section 3.5 for an explanation of $) at the point of definition, rather than being evaluated wherever it is referenced and using the value of $ at the point of reference.
See also
Analogous question for GAS: Difference between .equ and .word in ARM Assembly? .equiv seems to be the closes GAS equivalent.

equ: preprocessor time. analogous to #define but most assemblers are lacking an #undef, and can't have anything but an atomic constant of fixed number of bytes on the right hand side, so floats, doubles, lists are not supported with most assemblers' equ directive.
db: compile time. the value stored in db is stored in the binary output by the assembler at a specific offset. equ allows you define constants that normally would need to be either hardcoded, or require a mov operation to get. db allows you to have data available in memory before the program even starts.
Here's a nasm demonstrating db:
; I am a 16 byte object at offset 0.
db '----------------'
; I am a 14 byte object at offset 16
; the label foo makes the assembler remember the current 'tell' of the
; binary being written.
foo:
db 'Hello, World!', 0
; I am a 2 byte filler at offset 30 to help readability in hex editor.
db ' .'
; I am a 4 byte object at offset 16 that the offset of foo, which is 16(0x10).
dd foo
An equ can only define a constant up to the largest the assembler supports
example of equ, along with a few common limitations of it.
; OK
ZERO equ 0
; OK(some assemblers won't recognize \r and will need to look up the ascii table to get the value of it).
CR equ 0xD
; OK(some assemblers won't recognize \n and will need to look up the ascii table to get the value of it).
LF equ 0xA
; error: bar.asm:2: warning: numeric constant 102919291299129192919293122 -
; does not fit in 64 bits
; LARGE_INTEGER equ 102919291299129192919293122
; bar.asm:5: error: expression syntax error
; assemblers often don't support float constants, despite fitting in
; reasonable number of bytes. This is one of the many things
; we take for granted in C, ability to precompile floats at compile time
; without the need to create your own assembly preprocessor/assembler.
; PI equ 3.1415926
; bar.asm:14: error: bad syntax for EQU
; assemblers often don't support list constants, this is something C
; does support using define, allowing you to define a macro that
; can be passed as a single argument to a function that takes multiple.
; eg
; #define RED 0xff, 0x00, 0x00, 0x00
; glVertex4f(RED);
; #undef RED
;RED equ 0xff, 0x00, 0x00, 0x00
the resulting binary has no bytes at all because equ does not pollute the image; all references to an equ get replaced by the right hand side of that equ.

How to compile STM32f103 program on ubuntu?

I've some experience with programming stm32 arm cortex m3 micro controllers on Windows using Keil. I now want to move to linux environment and use open source tools to program STM32 cortex m3 devices.
I've researched a bit and found that I can use OpenOCD or Texane's ST Link to flash the chip. I also found out that I'll need a cross compiler to compile the code viz. gcc-arm-none-eabi toolchain.
I want to know what basic source and header files are needed? Which are the core and systems file required to make a simple blink program.
I'm not intending to use HAL libraries as of now. I'm using stm32f103zet6 mcu (a very generic board). I went to http://regalis.com.pl/en/arm-cortex-stm32-gnulinux/ , but couldn't exactly pinpoint the files.
If there is any tutorial to start stm32 programming on linux environment, please let me know.
Any help is appreciated. Thanks!

Here is a very simple example that is fairly portable across the stm32 family. Doesnt do anything useful you have to fill in the blanks to blink an led or something (read the schematic, the manuals, enable the clocks to the gpio, follow the instructions to make it a push/pull output and so on, the set the bit or clear the bit, etc).
I have my reasons for how I do it others have theirs, and we all have various numbers of years or decades of experience behind those opinions. But at the end of they day they are opinions and many different solutions will work.
On the last so many releases of ubuntu you can simply do this to get a toolchain:
apt-get install gcc-arm-linux-gnueabi binutils-arm-linux-gnueabi
Or you can go here and get a pre-built for your operating system
https://launchpad.net/gcc-arm-embedded
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.align
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
flash.ld
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
sram.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
ldr r0,stacktop
mov sp,r0
bl notmain
b hang
.thumb_func
hang: b .
.align
stacktop: .word 0x20001000
.thumb_func
.globl PUT16
PUT16:
strh r1,[r0]
bx lr
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.thumb_func
.globl dummy
dummy:
bx lr
.end
sram.ld
MEMORY
{
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.data : { *(.data*) } > ram
.bss : { *(.bss*) } > ram
}
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void dummy ( unsigned int );
#define STK_CSR 0xE000E010
#define STK_RVR 0xE000E014
#define STK_CVR 0xE000E018
#define STK_MASK 0x00FFFFFF
int delay ( unsigned int n )
{
unsigned int ra;
while(n--)
{
while(1)
{
ra=GET32(STK_CSR);
if(ra&(1<<16)) break;
}
}
return(0);
}
int notmain ( void )
{
unsigned int rx;
PUT32(STK_CSR,4);
PUT32(STK_RVR,1000000-1);
PUT32(STK_CVR,0x00000000);
PUT32(STK_CSR,5);
for(rx=0;;rx++)
{
dummy(rx);
delay(50);
dummy(rx);
delay(50);
}
return(0);
}
Makefile
#ARMGNU ?= arm-none-eabi
ARMGNU ?= arm-linux-gnueabi
AOPS = --warn --fatal-warnings -mcpu=cortex-m0
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0
all : notmain.gcc.thumb.flash.bin notmain.gcc.thumb.sram.bin
clean:
rm -f *.bin
rm -f *.o
rm -f *.elf
rm -f *.list
rm -f *.bc
rm -f *.opt.s
rm -f *.norm.s
rm -f *.hex
#---------------------------------
flash.o : flash.s
$(ARMGNU)-as $(AOPS) flash.s -o flash.o
sram.o : sram.s
$(ARMGNU)-as $(AOPS) sram.s -o sram.o
notmain.gcc.thumb.o : notmain.c
$(ARMGNU)-gcc $(COPS) -mthumb -c notmain.c -o notmain.gcc.thumb.o
notmain.gcc.thumb.flash.bin : flash.ld flash.o notmain.gcc.thumb.o
$(ARMGNU)-ld -o notmain.gcc.thumb.flash.elf -T flash.ld flash.o notmain.gcc.thumb.o
$(ARMGNU)-objdump -D notmain.gcc.thumb.flash.elf > notmain.gcc.thumb.flash.list
$(ARMGNU)-objcopy notmain.gcc.thumb.flash.elf notmain.gcc.thumb.flash.bin -O binary
notmain.gcc.thumb.sram.bin : sram.ld sram.o notmain.gcc.thumb.o
$(ARMGNU)-ld -o notmain.gcc.thumb.sram.elf -T sram.ld sram.o notmain.gcc.thumb.o
$(ARMGNU)-objdump -D notmain.gcc.thumb.sram.elf > notmain.gcc.thumb.sram.list
$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.hex -O ihex
$(ARMGNU)-objcopy notmain.gcc.thumb.sram.elf notmain.gcc.thumb.sram.bin -O binary
You can also try/use this approach if you prefer. I have my reasons not to, TL;DW.
void dummy ( unsigned int );
#define STK_MASK 0x00FFFFFF
#define STK_CSR (*((volatile unsigned int *)0xE000E010))
#define STK_RVR (*((volatile unsigned int *)0xE000E014))
#define STK_CVR (*((volatile unsigned int *)0xE000E018))
int delay ( unsigned int n )
{
unsigned int ra;
while(n--)
{
while(1)
{
ra=STK_CSR;
if(ra&(1<<16)) break;
}
}
return(0);
}
int notmain ( void )
{
unsigned int rx;
STK_CSR=4;
STK_RVR=1000000-1;
STK_CVR=0x00000000;
STK_CSR=5;
for(rx=0;;rx++)
{
dummy(rx);
delay(50);
dummy(rx);
delay(50);
}
return(0);
}
Between the arm docs which to some extent ST publishes a derivative for you (not everyone does that you should still go to arm). Plus the st docs.
There is uart based bootloader built in (might be usb, etc), that is pretty easy to interface, lets see...my host code to download programs is in the hundreds of lines of code, probably took an evening or an afternoont to write. YMMV. You can get if you dont already have, one of the discovery or nucleo boards, I recommend those anyway, you can use the debug end of it to program other stm32 or even other non st arm chips (not all, depends on what openocd supports, etc, but some) can get those for 30% cheaper than the dedicated stlink usb dongles and you dont need an extension usb cable, etc, etc. YMMV. Can certainly use an stlink with openocd or texane stlink as you have already mentioned.
Due to the way the cortex-m boots I have provided two examples, one for burning to flash the other for downloading via openocd to ram and running that way, could arguably use the flash one too but you have to tweak the start address when you run. I prefer this method. YMMV.
This approach you are portable and completely unencumbered by HAL limitations or requirements, build environments, etc. But I recommend you try the various methods. Bare metal like this the HAL types of bare metal with one or more st solutions and the cmsis approach. Every year or so try again, see if the one you picked is still the one you like.
This example demonstrates though it does not take a whole lot. I picked the cortex-m0 simply to avoid the armv7m thumb2 extensions. thumb without those extensions is the most portable arm instruction set. so again the code does mostly nothing, but does nothing on any stm32 cortex-m with a systick timer.
EDIT
This along with whatever you need to feed the linker would be the minimal non-C code.
.global _start
_start:
.word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
And this is abbreviated depending on the chip vendor and core there can be dozens to hundreds of vectors for every little interrupt of every little thing. The labels reset and hang in this case would be the names of C functions to handle those vectors (the documentation for the chip and core determine what vector handles what). The first vector is always the initalization value of the stack pointer. The second is always reset, the next few are common, after that they are generic interrupt pins on the core that the chip vendor wires up so you have to look at the chip vendor documentation.
The core design is such that registers are preserved for you so you dont need a little bit of assembly. Going without any bootstrap then you assume to not have .bss zeroed nor .data initialized, and you cant return from the reset function, which in a real implementation you wouldnt but for demonstration tests, you might (blink an led 10 times then program is finished).
Your toolchain may have some other way to do this. Since all toolchains should have an assembler and assemblers can generate tables of words, there is always that option, doesnt really make sense to create yet another tool and language for this but some folks feel the need. Your toolchain may not require the entry point named _start and/or it may have a different entry point name requirement.
Even if you use Kiel, you should also try the gnu tools, easy(easier) to get, significantly more support and experience in the world than for Kiel. May not produce as "good" of code as Kiel, performance wise or other, but should always have that in your back pocket as you will always be able to find help with gnu tools.

http://gnuarmeclipse.github.io/
There you'll find everything, including an IDE (Eclipse), toolchain, debugger, headers.

Look at this package. This is IDE + toolchain + debugger and it available for linux platforms. You can research it and get any ideas to do what you want. I hope most of linux programs have commnad line interface.
In addition I can suggest to you: try to use LL api if it already available for your mcu.

Return To Libc with Null byte in the addr

I am trying to perform a return to libc format string attack, but the address I want to write to ( 0x0804a000) has a null byte in it!! I have to read in my format string to snprintf so the null byte causes it to malfunction and Segfaults randomly.
buf[70];
snprintf(buf, 80, argv[1]);
printf(buf);
Here is the GDB dump for printf#plt:
(gdb) disassem 0x080483c0
Dump of assembler code for function printf#plt:
0x080483c0 <+0>: jmp *0x804a000
0x080483c6 <+6>: push $0x0
0x080483cb <+11>: jmp 0x80483b0
End of assembler dump.
Does anyone have any ideas?
My current method is running it like this
./program `perl -e 'print "sh;#\x00\xa0\x04\x08%12345x%10$hn"'`
but there is a null byte. I have also tried
./program `perl -e 'print "sh;#\xff\x9f\x04\x08\x00\xa0\x04\x08%12345x%10$hn%12345x%11$hn"'`
but the address before 0x0804a000 has the global offset table, and therefore snprintf Segfaults before even returning the to function that calls it.

A common way out of this is to make some "memory construction" using the stack.
At the end of the construction you need to have somewhere on the stack (let's call that location n and let's say it correspond to the fifth parameter) :
00 a0 04 08
Given that you can start by writing 0x01 (Or whatever you prefer, we are only interested in the null byte) to n-1. The memory at location n will looks like :
00 ?? ?? ??
Then write 0x04a0 into n+1 thus the memory at location n looks like :
00 a0 04 ??
The final step would be to write 0xff08 into n+3.
Once that's done you can use direct parameter accessing to get your address and write a value at the pointed location.
%12345x%5$n
All you have to do is play with $hn and $n and find a way to overlap the data that suit you. I hope you get the idea.

Tool to trace local function calls in Linux

I am looking for a tool like ltrace or strace that can trace locally defined functions in an executable. ltrace only traces dynamic library calls and strace only traces system calls. For example, given the following C program:
#include <stdio.h>
int triple ( int x )
{
return 3 * x;
}
int main (void)
{
printf("%d\n", triple(10));
return 0;
}
Running the program with ltrace will show the call to printf since that is a standard library function (which is a dynamic library on my system) and strace will show all the system calls from the startup code, the system calls used to implement printf, and the shutdown code, but I want something that will show me that the function triple was called. Assuming that the local functions have not been inlined by an optimizing compiler and that the binary has not been stripped (symbols removed), is there a tool that can do this?
Edit
A couple of clarifications:
It is okay if the tool also provides trace information for non-local functions.
I don't want to have to recompile the program(s) with support for specific tools, the symbol information in the executable should be enough.
I would be really nice if I could use the tool to attach to existing processes like I can with ltrace/strace.

Assuming you only want to be notified for specific functions, you can do it like this:
compile with debug informations (as you already have symbol informations, you probably also have enough debugs in)
given
#include <iostream>
int fac(int n) {
if(n == 0)
return 1;
return n * fac(n-1);
}
int main()
{
for(int i=0;i<4;i++)
std::cout << fac(i) << std::endl;
}
Use gdb to trace:
[js#HOST2 cpp]$ g++ -g3 test.cpp
[js#HOST2 cpp]$ gdb ./a.out
(gdb) b fac
Breakpoint 1 at 0x804866a: file test.cpp, line 4.
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>silent
>bt 1
>c
>end
(gdb) run
Starting program: /home/js/cpp/a.out
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
1
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
2
#0 fac (n=3) at test.cpp:4
#0 fac (n=2) at test.cpp:4
#0 fac (n=1) at test.cpp:4
#0 fac (n=0) at test.cpp:4
6
Program exited normally.
(gdb)
Here is what i do to collect all function's addresses:
tmp=$(mktemp)
readelf -s ./a.out | gawk '
{
if($4 == "FUNC" && $2 != 0) {
print "# code for " $NF;
print "b *0x" $2;
print "commands";
print "silent";
print "bt 1";
print "c";
print "end";
print "";
}
}' > $tmp;
gdb --command=$tmp ./a.out;
rm -f $tmp
Note that instead of just printing the current frame(bt 1), you can do anything you like, printing the value of some global, executing some shell command or mailing something if it hits the fatal_bomb_exploded function :) Sadly, gcc outputs some "Current Language changed" messages in between. But that's easily grepped out. No big deal.

System Tap can be used on a modern Linux box (Fedora 10, RHEL 5, etc.).
First download the para-callgraph.stp script.
Then run:
$ sudo stap para-callgraph.stp 'process("/bin/ls").function("*")' -c /bin/ls
0 ls(12631):->main argc=0x1 argv=0x7fff1ec3b038
276 ls(12631): ->human_options spec=0x0 opts=0x61a28c block_size=0x61a290
365 ls(12631): <-human_options return=0x0
496 ls(12631): ->clone_quoting_options o=0x0
657 ls(12631): ->xmemdup p=0x61a600 s=0x28
815 ls(12631): ->xmalloc n=0x28
908 ls(12631): <-xmalloc return=0x1efe540
950 ls(12631): <-xmemdup return=0x1efe540
990 ls(12631): <-clone_quoting_options return=0x1efe540
1030 ls(12631): ->get_quoting_style o=0x1efe540
See also: Observe, systemtap and oprofile updates

Using Uprobes (since Linux 3.5)
Assuming you wanted to trace all functions in ~/Desktop/datalog-2.2/datalog when calling it with the parameters -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl
cd /usr/src/linux-`uname -r`/tools/perf
for i in `./perf probe -F -x ~/Desktop/datalog-2.2/datalog`; do sudo ./perf probe -x ~/Desktop/datalog-2.2/datalog $i; done
sudo ./perf record -agR $(for j in $(sudo ./perf probe -l | cut -d' ' -f3); do echo "-e $j"; done) ~/Desktop/datalog-2.2/datalog -l ~/Desktop/datalog-2.2/add.lua ~/Desktop/datalog-2.2/test.dl
sudo ./perf report -G

Assuming you can re-compile (no source change required) the code you want to trace with the gcc option -finstrument-functions, you can use etrace to get the function call graph.
Here is what the output looks like:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
On Solaris, truss (strace equivalent) has the ability to filter the library to be traced. I'm was surprised when I discovered strace doesn't have such a capability.

KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }
int main(int argc, char **argv) {
int (*f)(int);
f0(1);
f1(1);
f = pointed;
if (argc == 1)
f(1);
if (argc == 2)
not_called(1);
return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
f0, f1 and f2 are called as expected from one another
pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine".
Tested on Ubuntu 18.04.

$ sudo yum install frysk
$ ftrace -sym:'*' -- ./a.out
More: ftrace.1

If you externalize that function into an external library, you should also be able to see it getting called, ( with ltrace ).
The reason this works is because ltrace puts itself between your app and the library, and when all the code is internalized with the one file it can't intercept the call.
ie: ltrace xterm
spews stuff from X libraries, and X is hardly system.
Outside this, the only real way to do it is compile-time intercept via prof flags or debug symbols.
I just ran over this app, which looks interesting:
http://www.gnu.org/software/cflow/
But I dont think thats what you want.

If the functions aren't inlined, you might even have luck using objdump -d <program>.
For an example, let's take a loot at the beginning of GCC 4.3.2's main routine:
$ objdump `which gcc` -d | grep '\(call\|main\)'
08053270 <main>:
8053270: 8d 4c 24 04 lea 0x4(%esp),%ecx
--
8053299: 89 1c 24 mov %ebx,(%esp)
805329c: e8 8f 60 ff ff call 8049330 <strlen#plt>
80532a1: 8d 04 03 lea (%ebx,%eax,1),%eax
--
80532cf: 89 04 24 mov %eax,(%esp)
80532d2: e8 b9 c9 00 00 call 805fc90 <xmalloc_set_program_name>
80532d7: 8b 5d 9c mov 0xffffff9c(%ebp),%ebx
--
80532e4: 89 04 24 mov %eax,(%esp)
80532e7: e8 b4 a7 00 00 call 805daa0 <expandargv>
80532ec: 8b 55 9c mov 0xffffff9c(%ebp),%edx
--
8053302: 89 0c 24 mov %ecx,(%esp)
8053305: e8 d6 2a 00 00 call 8055de0 <prune_options>
805330a: e8 71 ac 00 00 call 805df80 <unlock_std_streams>
805330f: e8 4c 2f 00 00 call 8056260 <gcc_init_libintl>
8053314: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
--
805331c: c7 04 24 02 00 00 00 movl $0x2,(%esp)
8053323: e8 78 5e ff ff call 80491a0 <signal#plt>
8053328: 83 e8 01 sub $0x1,%eax
It takes a bit of effort to wade through all of the assembler, but you can see all possible calls from a given function. It's not as easy to use as gprof or some of the other utilities mentioned, but it has several distinct advantages:
You generally don't need to recompile an application to use it
It shows all possible function calls, whereas something like gprof will only show the executed function calls.

There is a shell script for automatizating tracing function calls with gdb. But it can't attach to running process.
blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the page - http://web.archive.org/web/20090317091725/http://blog.superadditive.com/2007/12/01/call-graphs-using-the-gnu-project-debugger/
Copy of the tool - callgraph.tar.gz
http://web.archive.org/web/20090317091725/http://superadditive.com/software/callgraph.tar.gz
It dumps all functions from program and generate a gdb command file with breakpoints on each function. At each breakpoint, "backtrace 2" and "continue" are executed.
This script is rather slow on big porject (~ thousands of functions), so i add a filter on function list (via egrep). It was very easy, and I use this script almost evry day.

Gprof might be what you want

See traces, a tracing framework for Linux C/C++ applications:
https://github.com/baruch/traces#readme
It requires recompiling your code with its instrumentor, but will provide a listing of all functions, their parameters and return values. There's an interactive to allow easy navigation of large data samples.

Hopefully the callgrind or cachegrind tools for Valgrind will give you the information you seek.

NOTE: This is not the linux kernel based ftrace, but rather a tool I recently designed to accomplish local function tracing and control flow. Linux ELF x86_64/x86_32 are supported publicly.
https://github.com/leviathansecurity/ftrace

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string