GDB does not load debugging symbols although they are present - linux

I have a binary which was compiled with gcc and debugging symbols enabled:
# file binary
binary: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f5902695bdca690e84987fb377b69d16e3b47829, not stripped
Using objdump -syms binary I can also see the debugging symbols. However, GDB does not load them when I run gdb ./binary:
Reading symbols from ./binary...(no debugging symbols found)...done.
(gdb) list
No symbol table is loaded. Use the "file" command.
Why is this happening and how can I load the debugging symbols?

Why is this happening and how can I load the debugging symbols?
This is most likely happening because in fact the library does not have debugging symbols.
file binary
... not stripped
Above output does not indicate that the binary has debugging symbols, only that it has a symbol table. So does this: objdump -syms.
To really see debugging symbols, do this: readelf -wi binary (I predict you wouldn't see any).
If debug symbols are in fact present, you should see something like this:
$ readelf -wi ./a.out
Contents of the .debug_info section:
Compilation Unit # offset 0x0:
Length: 0x4e (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0x5): GNU C11 7.3.0 -mtune=generic -march=x86-64 -g
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : t.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x0): /tmp
<19> DW_AT_low_pc : 0x5fa
<21> DW_AT_high_pc : 0xb
<29> DW_AT_stmt_list : 0x0
<1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
<2e> DW_AT_external : 1
<2e> DW_AT_name : (indirect string, offset: 0x33): main
<32> DW_AT_decl_file : 1
<33> DW_AT_decl_line : 1
<34> DW_AT_type : <0x4a>
<38> DW_AT_low_pc : 0x5fa
<40> DW_AT_high_pc : 0xb
<48> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<4a> DW_AT_GNU_all_call_sites: 1
<1><4a>: Abbrev Number: 3 (DW_TAG_base_type)
<4b> DW_AT_byte_size : 4
<4c> DW_AT_encoding : 5 (signed)
<4d> DW_AT_name : int
<1><51>: Abbrev Number: 0

Related

How can it be done that functions in .so file are automatically exported?

In Windows, to call a function in a DLL, the function must have an explicit export declaration. For example, __declspec(dllexport) or .def file.
Other than Windows, we can call a function in a .so(shared object file) even if the function has no export declaration. It is much easier for me to make .so than .dll in terms of this.
Meanwhile, I am curious about how non-Windows enables functions defined in .so be called by other programs without having explicit export declaration. I roughly guess that all of the functions in .so file are automatically exported, but I am not sure of it.
An .so file is conventionally a DSO (Dynamic Shared Object, a.k.a shared library) in unix-like OSes. You want to
know how symbols defined in such a file are made visible to the runtime loader
for dynamic linkage of the DSO into the process of some program when
it's executed. That's what you mean by "exported". "Exported" is a somewhat
Windows/DLL-ish term, and is also apt to be confused with "external" or "global",
so we'll say dynamically visible instead.
I'll explain how dynamic visibility of symbols can be controlled in the context of
DSOs built with the GNU toolchain - i.e. compiled with a GCC compiler (gcc,
g++,gfortran, etc.) and linked with the binutils linker ld (or compatible
alternative compiler and linker). I'll illustrate with C code. The mechanics are
the same for other languages.
The symbols defined in an object file are the file-scope variables in the C source code. i.e. variables
that are not defined within any block. Block-scope variables:
{ int i; ... }
are defined only when the enclosing block is being executed and have no permanent
place in an object file.
The symbols defined in an object file generated by GCC are either local or global.
A local symbol can be referenced within the object file where it's defined but
the object file does not reveal it for linkage at all. Not for static linkage.
Not for dynamic linkage. In C, a file-scope variable definition is global
by default and local if it is qualified with the static storage class. So
in this source file:
foobar.c (1)
static int foo(void)
{
return 42;
}
int bar(void)
{
return foo();
}
foo is a local symbol and bar is a global one. If we compile this file
with -save-temps:
$ gcc -save-temps -c -fPIC foobar.c
then GCC will save the assembly listing in foobar.s, and there we can
see how the generated assembly code registers the fact that bar is global and foo is not:
foobar.s (1)
.file "foobar.c"
.text
.type foo, #function
foo:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $42, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.globl bar
.type bar, #function
bar:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
call foo
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size bar, .-bar
.ident "GCC: (Ubuntu 8.2.0-7ubuntu1) 8.2.0"
.section .note.GNU-stack,"",#progbits
The assembler directive .globl bar means that bar is a global symbol.
There is no .globl foo; so foo is local.
And if we inspect the symbols in the object file itself, with
$ readelf -s foobar.o
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 11 FUNC LOCAL DEFAULT 1 foo
6: 0000000000000000 0 SECTION LOCAL DEFAULT 5
7: 0000000000000000 0 SECTION LOCAL DEFAULT 6
8: 0000000000000000 0 SECTION LOCAL DEFAULT 4
9: 000000000000000b 11 FUNC GLOBAL DEFAULT 1 bar
the message is the same:
5: 0000000000000000 11 FUNC LOCAL DEFAULT 1 foo
...
9: 000000000000000b 11 FUNC GLOBAL DEFAULT 1 bar
The global symbols defined in the object file, and only the global symbols,
are available to the static linker for resolving references in other object files. Indeed
the local symbols only appear in the symbol table of the file at all for possible
use by a debugger or some other object-file probing tool. If we redo the compilation
with even minimal optimisation:
$ gcc -save-temps -O1 -c -fPIC foobar.c
$ readelf -s foobar.o
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 2
4: 0000000000000000 0 SECTION LOCAL DEFAULT 3
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 4
8: 0000000000000000 6 FUNC GLOBAL DEFAULT 1 bar
then foo disappears from the symbol table.
Since global symbols are available to the static linker, we can link a program
with foobar.o that calls bar from another object file:
main.c
#include <stdio.h>
extern int foo(void);
int main(void)
{
printf("%d\n",bar());
return 0;
}
Like so:
$ gcc -c main.c
$ gcc -o prog main.o foobar.o
$ ./prog
42
But as you've noticed, we do not need to change foobar.o in any way to make
bar dynamically visible to the loader. We can just link it as it is into
a shared library:
$ gcc -shared -o libbar.so foobar.o
then dynamically link the same program with that shared library:
$ gcc -o prog main.o libbar.so
and it's fine:
$ ./prog
./prog: error while loading shared libraries: libbar.so: cannot open shared object file: No such file or directory
...Oops. It's fine as long as we let the loader know where libbar.so is, since my
working directory here isn't one of the search directories that it caches by default:
$ export LD_LIBRARY_PATH=.
$ ./prog
42
The object file foobar.o has a table of symbols as we've seen,
in the .symtab section, including (at least) the global symbols that are available to the static linker.
The DSO libbar.so has a symbol table in its .symtab section too. But it also has a dynamic symbol table,
in it's .dynsym section:
$ readelf -s libbar.so
Symbol table '.dynsym' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __cxa_finalize
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 00000000000010f5 6 FUNC GLOBAL DEFAULT 9 bar
Symbol table '.symtab' contains 45 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
...
...
21: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
22: 0000000000001040 0 FUNC LOCAL DEFAULT 9 deregister_tm_clones
23: 0000000000001070 0 FUNC LOCAL DEFAULT 9 register_tm_clones
24: 00000000000010b0 0 FUNC LOCAL DEFAULT 9 __do_global_dtors_aux
25: 0000000000004020 1 OBJECT LOCAL DEFAULT 19 completed.7930
26: 0000000000003e88 0 OBJECT LOCAL DEFAULT 14 __do_global_dtors_aux_fin
27: 00000000000010f0 0 FUNC LOCAL DEFAULT 9 frame_dummy
28: 0000000000003e80 0 OBJECT LOCAL DEFAULT 13 __frame_dummy_init_array_
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
30: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
31: 0000000000002094 0 OBJECT LOCAL DEFAULT 12 __FRAME_END__
32: 0000000000000000 0 FILE LOCAL DEFAULT ABS
33: 0000000000003e90 0 OBJECT LOCAL DEFAULT 15 _DYNAMIC
34: 0000000000004020 0 OBJECT LOCAL DEFAULT 18 __TMC_END__
35: 0000000000004018 0 OBJECT LOCAL DEFAULT 18 __dso_handle
36: 0000000000001000 0 FUNC LOCAL DEFAULT 6 _init
37: 0000000000002000 0 NOTYPE LOCAL DEFAULT 11 __GNU_EH_FRAME_HDR
38: 00000000000010fc 0 FUNC LOCAL DEFAULT 10 _fini
39: 0000000000004000 0 OBJECT LOCAL DEFAULT 17 _GLOBAL_OFFSET_TABLE_
40: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __cxa_finalize
41: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
42: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
43: 00000000000010f5 6 FUNC GLOBAL DEFAULT 9 bar
The symbols in the dynamic symbol table are the ones that are dynamically visible -
available to the runtime loader. You
can see that bar appears both in the .symtab and in the .dynsym of libbar.so.
In both cases, the symbol has GLOBAL in the bind ( = binding)
column and DEFAULT in the vis ( = visibility) column.
If you want readelf to show you just the dynamic symbol table, then:
readelf --dyn-syms libbar.so
will do it, but not for foobar.o, because an object file has no dynamic symbol table:
$ readelf --dyn-syms foobar.o; echo Done
Done
So the linkage:
$ gcc -shared -o libbar.so foobar.o
creates the dynamic symbol table of libbar.so, and populates it with symbols
the from global symbol table of foobar.o (and various GCC boilerplate
files that GCC adds to the linkage by defauilt).
This makes it look like your guess:
I roughly guess that all of the functions in .so file are automatically exported
is right. In fact it's close, but not correct.
See what happens if I recompile foobar.c
like this:
$ gcc -save-temps -fvisibility=hidden -c -fPIC foobar.c
Let's take another look at the assembly listing:
foobar.s (2)
...
...
.globl bar
.hidden bar
.type bar, #function
bar:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
call foo
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
...
...
Notice the assembler directive:
.hidden bar
that wasn't there before. .globl bar is still there; bar is still a global
symbol. I can still statically link foobar.o in this program:
$ gcc -o prog main.o foobar.o
$ ./prog
42
And I can still link this shared library:
$ gcc -shared -o libbar.so foobar.o
But I can no longer dynamically link this program:
$ gcc -o prog main.o libbar.so
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0x5): undefined reference to `bar'
collect2: error: ld returned 1 exit status
In foobar.o, bar is still in the symbol table:
$ readelf -s foobar.o | grep bar
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
9: 000000000000000b 11 FUNC GLOBAL HIDDEN 1 bar
but it is now marked HIDDEN in the vis ( = visibility) column of the output.
And bar is still in the symbol table of libbar.so:
$ readelf -s libbar.so | grep bar
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
41: 0000000000001100 11 FUNC LOCAL DEFAULT 9 bar
But this time, it is a LOCAL symbol. It will not be available to the static
linker from libbar.so - as we saw just now when our linkage failed. And it is no longer in the
dynamic symbol table at all:
$ readelf --dyn-syms libbar.so | grep bar; echo done
done
So the effect of -fvisibility=hidden, when compiling foobar.c, is to make
the compiler annotate .globl symbols as .hidden in foobar.o. Then, when
foobar.o is linked into libbar.so, the linker converts every global hidden
symbol to a local symbol in libbar.so, so that it cannot be used to resolve references
whenever libbar.so is linked with something else. And it does not add the hidden
symbols to the dynamic symbol table of libbar.so, so the runtime loader cannot
see them to resolve references dynamically.
The story so far: When the linker creates a shared library, it adds to the dynamic
symbol table all of the global symbols that are defined in the input object files and are not marked hidden
by the compiler. These become the dynamically visible symbols of the shared library. Global symbols are not
hidden by default, but we can hide them with the compiler option -fvisibility=hidden. The visibility
that this option refers to is dynamic visibility.
Now the ability to remove global symbols from dynamic visibility with -fvisibility=hidden
doesn't look very useful yet, because it seems that any object file we compile with
that option can contribute no dynamically visible symbols to a shared library.
But actually, we can control individually which global symbols defined in an object file
will be dynamically visible and which will not. Let's change foobar.c as follows:
foobar.c (2)
static int foo(void)
{
return 42;
}
int __attribute__((visibility("default"))) bar(void)
{
return foo();
}
The __attribute__ syntax you see here is a GCC language extension
that is used to specify properties of symbols that are not expressible in the standard language - such as dynamic visibility. Microsoft's
declspec(dllexport) is an Microsoft language extension with the same effect as GCC's __attribute__((visibility("default"))),
But for GCC, global symbols defined in an object file will possess __attribute__((visibility("default"))) by default, and you
have to compile with -fvisibility=hidden to override that.
Recompile like last time:
$ gcc -fvisibility=hidden -c -fPIC foobar.c
And now the symbol table of foobar.o:
$ readelf -s foobar.o | grep bar
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
9: 000000000000000b 11 FUNC GLOBAL DEFAULT 1 bar
shows bar with DEFAULT visibility once again, despite -fvisibility=hidden. And if we relink libbar.so:
$ gcc -shared -o libbar.so foobar.o
we see that bar is back in the dynamic symbol table:
$ readelf --dyn-syms libbar.so | grep bar
5: 0000000000001100 11 FUNC GLOBAL DEFAULT 9 bar
So, -fvisibility=hidden tells the compiler to mark a global symbol as hidden
unless, in the source code, we explicitly specify a countervailing dynamic visibility
for that symbol.
That's one way to select precisely the symbols from an object file that we wish
to make dynamically visible: pass -fvisibility=hidden to the compiler, and
individually specify __attribute__((visibility("default"))), in the source code, for just
the symbols we want to be dynamically visible.
Another way is not to pass -fvisibility=hidden to the compiler, and indvidually
specify __attribute__((visibility("hidden"))), in the source code, for just the
symbols that we don't want to be dynamically visible. So if we change foobar.c again
like so:
foobar.c (3)
static int foo(void)
{
return 42;
}
int __attribute__((visibility("hidden"))) bar(void)
{
return foo();
}
then recompile with default visibility:
$ gcc -c -fPIC foobar.c
bar reverts to hidden in the object file:
$ readelf -s foobar.o | grep bar
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foobar.c
9: 000000000000000b 11 FUNC GLOBAL HIDDEN 1 bar
And after relinking libbar.so, bar is again absent from its dynamic symbol
table:
$ gcc -shared -o libbar.so foobar.o
$ readelf --dyn-syms libbar.so | grep bar; echo Done
Done
The professional approach is to minimize the dynamic API of
a DSO to exactly what is specified. With the apparatus we've discussed,
that means compiling with -fvisibility=hidden and using __attribute__((visibility("default"))) to
expose the specified API. A dynamic API can also be controlled - and versioned - with the GNU linker
using a type of linker script called a version-script: that is a
yet more professional approach.
Further reading:
GCC Wiki: Visibility
GCC Manual: Common Function Attributes -> visibility ("visibility_type")

How are function sizes calculated by readelf

I am trying to understand how readelf utility calculates function size. I wrote a simple program
#include <stdio.h>
int main() {
printf("Test!\n");
}
Now to check function size I used this (is this OK ? ):
readelf -sw a.out|sort -n -k 3,3|grep FUNC
which yielded:
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts#GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main#GLIBC_2.2.5 (2)
29: 0000000000400470 0 FUNC LOCAL DEFAULT 13 deregister_tm_clones
30: 00000000004004a0 0 FUNC LOCAL DEFAULT 13 register_tm_clones
31: 00000000004004e0 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux
34: 0000000000400500 0 FUNC LOCAL DEFAULT 13 frame_dummy
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts##GLIBC_2.2.5
50: 00000000004005b4 0 FUNC GLOBAL DEFAULT 14 _fini
51: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main##GLIBC_
58: 0000000000400440 0 FUNC GLOBAL DEFAULT 13 _start
64: 00000000004003e0 0 FUNC GLOBAL DEFAULT 11 _init
45: 00000000004005b0 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
60: 000000000040052d 16 FUNC GLOBAL DEFAULT 13 main
56: 0000000000400540 101 FUNC GLOBAL DEFAULT 13 __libc_csu_init
Now if I check the main function's size, it shows 16. How did it arrive at that? Is that the stack size ?
Compiler used gcc version 4.8.5 (Ubuntu 4.8.5-2ubuntu1~14.04.1)
GNU readelf (GNU Binutils for Ubuntu) 2.24
ELF symbols have an attribute st_size which specifies their size (see <elf.h>):
typedef struct
{
...
Elf32_Word st_size; /* Symbol size */
...
} Elf32_Sym;
This attribute is generated by the toolchain generating the binary; e.g. when looking at the assembly code generated by the C compiler:
gcc -c -S test.c
cat test.s
you will see something like
.globl main
.type main, #function
main:
...
.LFE0:
.size main, .-main
where .size is a special as pseudo op.
update:
.size is the size of the code.
Here, .size gets assigned the result of . - main, where "." is the actual address and main the address where main() starts.
In the 'cat test.s' output provided by ensc, the "secret sauce" is the "..." between 'main:' and '.LFE0:'; those are the assembly instructions generated by the compiler that implement the call to printf(). The corresponding machine code for each assembler instruction occupies some number of bytes; "." is incremented by the number of bytes used by each instruction, so at the end of main, ". - main" is the total number of bytes occupied by the machine instructions for main().
It's the compiler that determines which sequences of assembly instructions are needed to execute the printf() call, and that varies from target to target, and from optimization level to optimization level. In your case, your compiler generated machine code occupying 16 bytes. The '.size main, . - main' caused the assembler to create the ELF symbol for main() with its st_size field set to 16.
readelf read the ELF symbol for main, saw that its st_size field was 16, and dutifully reported 16 as the size for main(). readelf doesn't 'calculate' a function's size -- it just reports the st_size field for the function's ELF symbol. The calculation is done by the assembler, when it interprets the '.size main, . - main' directive.

The address where filename has been loaded is missing [GDB]

I have following sample code
#include<stdio.h>
int main()
{
int num1, num2;
printf("Enter two numbers\n");
scanf("%d",&num1);
scanf("%d",&num2);
int i;
for(i = 0; i < num2; i++)
num1 = num1 + num1;
printf("Result is %d \n",num1);
return 0;
}
I compiled this code with -g option to gcc.
gcc -g file.c
Generate separate symbol file
objcopy --only-keep-debug a.out a.out.sym
Strip the symbols from a.out
strip -s a.out
Load this a.out in gdb
gdb a.out
gdb says "no debug information found" fine.
Then I use add-symbol-file command in gdb
(gdb) add-symbol-file a.out.debug [Enter]
The address where a.out.debug has been loaded is missing
I want to know how to find this address?
Is there any command or trick to find it?
This address is representing WHAT?
I know gdb has an other command symbol-file but it overwrites the previous loaded symbols.
So I have to use this command to add many symbol files in gdb.
my system is 64bit running ubuntu LTS 12.04
gdb version is 7.4-2012.04
gcc version is 4.6.3
objcopy --only-keep-debug a.out a.out.sym
If you want GDB to load the a.out.sym automatically, follow the steps outlined here (note in particular that you need to do the "add .gnu_debuglink" step).
This address is representing WHAT
The address GDB wants is the location of .text section of the binary. To find it, use readelf -WS a.out. E.g.
$ readelf -WS /bin/date
There are 28 section headers, starting at offset 0xe350:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400238 000238 00001c 00 A 0 0 1
...
[13] .text PROGBITS 0000000000401900 001900 0077f8 00 AX 0 0 16
Here, you want to give GDB 0x401900 as the load address.

Base Address of ELF

I am trying to find the base address of ELF files. I know that you can use readelf to find the Program Entry Point and different section details (base address, size, flags and so on).
For example, programs for x86 architecture are based at 0x8048000 by linker. using readelf I can see the program entry point but no specific field in the output tells the base address.
$ readelf -e test
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048390
Start of program headers: 52 (bytes into file)
Start of section headers: 4436 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 30
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048154 000154 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048168 000168 000020 00 A 0 0 4
[ 3] .note.gnu.build-i NOTE 08048188 000188 000024 00 A 0 0 4
[ 4] .gnu.hash GNU_HASH 080481ac 0001ac 000024 04 A 5 0 4
[ 5] .dynsym DYNSYM 080481d0 0001d0 000070 10 A 6 1 4
In the section details, I can see that the Offset is calculated with respect to the base address of the ELF.
So, .dynsym section starts at address, 0x080481d0 and offset is 0x1d0. This would mean the base address is, 0x08048000. Is this correct?
similarly, for programs compiled on different architectures like PPC, ARM, MIPS, I cannot see their base address but only the OEP, Section Headers.
You need to check the segment table aka program headers (readelf -l).
Elf file type is EXEC (Executable file)
Entry point 0x804a7a0
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x08048154 0x08048154 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x10fc8 0x10fc8 R E 0x1000
LOAD 0x011000 0x08059000 0x08059000 0x0038c 0x01700 RW 0x1000
DYNAMIC 0x01102c 0x0805902c 0x0805902c 0x000f8 0x000f8 RW 0x4
NOTE 0x000168 0x08048168 0x08048168 0x00020 0x00020 R 0x4
TLS 0x011000 0x08059000 0x08059000 0x00000 0x0005c R 0x4
GNU_EH_FRAME 0x00d3c0 0x080553c0 0x080553c0 0x00c5c 0x00c5c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
The first (lowest) LOAD segment's virtual address is the default load base of the file. You can see it's 0x08048000 for this file.
The ELF mapping base Address of the .text section is defined by the ld(1) loader script in the binutils project under script template elf.sc on Linux.
The script define the following variables used by the loader ld(1):
# TEXT_START_ADDR - the first byte of the text segment, after any
# headers.
# TEXT_BASE_ADDRESS - the first byte of the text segment.
# TEXT_START_SYMBOLS - symbols that appear at the start of the
# .text section.
You can inspect the current values using the command:
~$ ld --verbose |grep SEGMENT_START
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
. = SEGMENT_START("ldata-segment", .);
The text-segment mapping values are:
0x08048000 on 32 Bits
0x400000 on 64 Bits
Also the interpreter base address of an ELF program is defined in the Auxiliary vector array at the index AT_BASE. The Auxiliary vector array is an array of the Elf_auxv_t structure and located after the envp in the process stack. It's configured while loading the ELF binary in the function create_elf_tables() of Linux kernel fs/binfmt_elf.c. The following code snippet show how to read the value:
$ cat at_base.c
#include <stdio.h>
#include <elf.h>
int
main(int argc, char* argv[], char* envp[])
{
Elf64_auxv_t *auxp;
while(*envp++ != NULL);
for (auxp = (Elf64_auxv_t *)envp; auxp->a_type != 0; auxp++) {
if (auxp->a_type == 7) {
printf("AT_BASE: 0x%lx\n", auxp->a_un.a_val);
}
}
}
$ clang -o at_base at_base.c
$ ./at_base
AT_BASE: 0x7fcfd4025000
Linux Auxiliary Vector definition
Auxiliary Vector Reference
It used to be a fixed address on x86 32 bits architecture, but with ASLR now, it's randomized. You can use setarch i386 -R to disable randomization if you want.
It's defined in the linker script. You can dump the default linker script with ld --verbose. Example output:
GNU ld (GNU Binutils) 2.23.1
Supported emulations:
elf_x86_64
elf32_x86_64
elf_i386
i386linux
elf_l1om
elf_k1om
using internal linker script:
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
"elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/x86_64-unknown-linux-gnu/lib64"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/lib64"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/x86_64-unknown-linux-gnu/lib"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/lib");
SECTIONS
{
/* Read-only sections, merged into text segment: */
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
.interp : { *(.interp) }
.note.gnu.build-id : { *(.note.gnu.build-id) }
.hash : { *(.hash) }
.gnu.hash : { *(.gnu.hash) }
.dynsym : { *(.dynsym) }
.dynstr : { *(.dynstr) }
.gnu.version : { *(.gnu.version) }
.gnu.version_d : { *(.gnu.version_d) }
.gnu.version_r : { *(.gnu.version_r) }
(snip)
In case you missed it: __executable_start = SEGMENT_START("text-segment", 0x400000)).
And for me, sure enough, when I link a simple .o file into a binary, the entry point address is very close to 0x400000.
The entry point address in the ELF metadata is this value, plus the offset from the beginning of the .text section to the _start symbol. Note also that the _start symbol can be configured. Again from my default linker script example: ENTRY(_start).

checking if a binary compiled with "-static"

I have a binary file in linux. How can I check whether it has been compiled with "-static" or not?
ldd /path/to/binary should not list any shared libraries if the binary is statically compiled.
You can also use the file command (and objdump could also be useful).
Check if it has a program header of type INTERP
At the lower level, an executable is static if it does not have a program header with type:
Elf32_Phd.p_type == PT_INTERP
This is mentioned in the System V ABI spec.
Remember that program headers determine the ELF segments, including those of type PT_LOAD that will get loaded in to memory and be run.
If that header is present, its contents are exactly the path of the dynamic loader.
readelf check
We can observe this with readelf. First compile a C hello world dynamically:
gcc -o main.out main.c
and then:
readelf --program-headers --wide main.out
outputs:
Elf file type is DYN (Shared object file)
Entry point 0x1050
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000560 0x000560 R 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x0001bd 0x0001bd R E 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000150 0x000150 R 0x1000
LOAD 0x002db8 0x0000000000003db8 0x0000000000003db8 0x000258 0x000260 RW 0x1000
DYNAMIC 0x002dc8 0x0000000000003dc8 0x0000000000003dc8 0x0001f0 0x0001f0 RW 0x8
NOTE 0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x00200c 0x000000000000200c 0x000000000000200c 0x00003c 0x00003c R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002db8 0x0000000000003db8 0x0000000000003db8 0x000248 0x000248 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .data .bss
06 .dynamic
07 .note.ABI-tag .note.gnu.build-id
08 .eh_frame_hdr
09
10 .init_array .fini_array .dynamic .got
so note the INTERP header is there, and it is so important that readelf even gave a quick preview of its short 28 (0x1c) byte contents: /lib64/ld-linux-x86-64.so.2, which is the path to the dynamic loader (27 bytes long + 1 for \0).
Note how this resides side by side with the other segments, including e.g. those that actually get loaded into memory such as: .text.
We can then more directly extract those bytes without the preview with:
readelf -x .interp main.out
which gives:
Hex dump of section '.interp':
0x000002a8 2f6c6962 36342f6c 642d6c69 6e75782d /lib64/ld-linux-
0x000002b8 7838362d 36342e73 6f2e3200 x86-64.so.2.
as explained at: How can I examine contents of a data section of an ELF file on Linux?
file source code
file 5.36 source code comments at src/readelf.c claim that it also checks for PT_INTERP:
/*
* Look through the program headers of an executable image, searching
* for a PT_INTERP section; if one is found, it's dynamically linked,
* otherwise it's statically linked.
*/
private int
dophn_exec(struct magic_set *ms, int clazz, int swap, int fd, off_t off,
int num, size_t size, off_t fsize, int sh_num, int *flags,
uint16_t *notecount)
{
Elf32_Phdr ph32;
Elf64_Phdr ph64;
const char *linking_style = "statically";
found with git grep statically from the message main.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped.
However, this comment seems to be outdated compared to the code, which instead checks for PT_DYNAMIC:
case PT_DYNAMIC:
linking_style = "dynamically";
doread = 1;
break;
I'm not sure why this is done, and I'm lazy to dig into git log now. In particular, this confused me a bit when I tried to make a statically linked PIE executable with --no-dynamic-linker as shown at: How to create a statically linked position independent executable ELF in Linux? which does not have PT_INTERP but does have PT_DYNAMIC, and which I do not expect to use the dynamic loader.
I ended up doing a deeper source analysis for -fPIE at: Why does GCC create a shared object instead of an executable binary according to file? the answer is likely there as well.
Linux kernel source code
The Linux kernel 5.0 reads the ELF file during the exec system call at fs/binfmt_elf.c as explained at: How does kernel get an executable binary file running under linux?
The kernel loops over the program headers at load_elf_binary
for (i = 0; i < loc->elf_ex.e_phnum; i++) {
if (elf_ppnt->p_type == PT_INTERP) {
/* This is the program interpreter used for
* shared libraries - for now assume that this
* is an a.out format binary
*/
I haven't read the code fully, but I would expect then that it only uses the dynamic loader if INTERP is found, otherwise which path should it use?
PT_DYNAMIC is not used in that file.
Bonus: check if -pie was used
I've explained that in detail at: Why does GCC create a shared object instead of an executable binary according to file?

Resources