Cygwin install does not have shared libraries, or how should I activate the shared libraries?

I'm new to Cygwin, so hopefully someone can point me in the right direction. I would like to be able to choose to use the shared libraries when compiling my code. However, so far it seems that it always uses the static library, and I don't know what exactly I did wrong.
I installed Cygwin on my Windows 10 computer and created a file, test.c, which contains:
#include <stdio.h>

const char msg[] = "Hello, world.";

int main(void) {
    puts(msg);
    return 0;
}
I then compiled it with:
$ gcc -Wall -c test.c -o test.o
Then I checked the symbols using:
$ nm test.o
It gives me what I expected:
U __main
0000000000000000 T main
0000000000000000 R msg
U puts
where none of the symbols have been assigned addresses yet. This is all good.
Then, I linked it using the following:
$ gcc -Wall test.o -o test
Then I checked the symbols again:
$ nm test
I got the following:
0000000100401080 T main
0000000100401000 T mainCRTStartup
0000000100401640 T malloc
0000000100403000 R msg
0000000100401650 T posix_memalign
00000001004010d0 T puts
while I was expecting the puts symbol to be something like:
U puts@@GLIBC_x.x.x
It seems like I don't have the shared libraries, or I'm not following the process correctly. What is wrong then? Thanks.

Using objdump:
$ objdump -x test.exe
DLL Name: cygwin1.dll
vma: Hint/Ord Member-Name Bound-To
813c 15 __cxa_atexit
814c 46 __main
8158 108 _dll_crt0
8164 115 _impure_ptr
8174 257 calloc
8180 373 cygwin_detach_dll
8194 375 cygwin_internal
81a8 403 dll_dllcrt0
81b8 579 free
81c0 909 malloc
81cc 1015 posix_memalign
81e0 1170 puts
81e8 1196 realloc
So puts is an external symbol resolved at run time from the cygwin1.dll shared library; the link is dynamic after all. The U puts@@GLIBC_x.x.x form you expected is glibc symbol versioning on ELF, whereas Cygwin produces PE/COFF executables, so imports show up in the import table above rather than as undefined versioned symbols in nm.
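As an additional cross-check, cygcheck (shipped with Cygwin) lists the DLLs an executable loads; a sketch, assuming the linked program is test.exe:
$ cygcheck ./test.exe    # cygwin1.dll should appear among the listed dependencies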

Related

"Whirlwind Tutorial on Teensy ELF Executables" -- why is the output of ld 10X bigger, 20 years later? [duplicate]

Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?

For a university course, I like to compare the code sizes of functionally similar programs written and compiled using gcc/clang versus assembly. In the process of re-evaluating how to further shrink the size of some executables, I couldn't trust my eyes when the very same assembly code I assembled/linked two years ago has now grown more than 10x in size after building it again (which is true for multiple programs, not only helloworld):
$ make
as -32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o
$ ls -l
-rwxr-xr-x 1 xxx users 708 Jul 18 2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users 604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users 498 Nov 25 14:44 helloworld-asm-2020.s
The assembly code is:
.code32
.section .data
msg: .ascii "Hello, world!\n"
len = . - msg
.section .text
.globl _start
_start:
movl $len, %edx # EDX = message length
movl $msg, %ecx # ECX = address of message
movl $1, %ebx # EBX = file descriptor (1 = stdout)
movl $4, %eax # EAX = syscall number (4 = write)
int $0x80 # call kernel by interrupt
# and exit
movl $0, %ebx # return code is zero
movl $1, %eax # exit syscall number (1 = exit)
int $0x80 # call kernel again
The same hello world program, compiled using GNU as and GNU ld (always using 32-bit assembly), was 708 bytes then and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it is still almost 4.2K. Stripping/sstripping doesn't pay off either.
readelf tells me that the start of the section headers is much later in the file (byte 468 vs 8464), but I have no idea why. It's running on the same arch system as in 2018, the Makefile is the same, and I'm not linking against any libraries (especially not libc). I guess something about ld has changed, given that the object file is still quite small, but what and why?
Disclaimer: I'm building 32-bit executables on an x86-64 machine.
Edit: I'm using GNU binutils (as & ld) version 2.35.1. Here is a base64-encoded archive which includes the source and both executables (the small old one and the large new one):
cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF
Update:
When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked by default), the executable size becomes as small as expected:
$ cat Makefile
TARGET=helloworld
all:
as -32 -o ${TARGET}-asm.o ${TARGET}-asm.s
ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
rm ${TARGET}-asm.o
$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso 200 Dec 1 13:57 Makefile
-rwxrwxr-x 1 eso eso 8700 Dec 1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso 732 Dec 1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso 498 Dec 1 13:44 helloworld-asm.s
Maybe I just used gold previously without being aware.
It's not 10x in general; it's page alignment of a couple of sections, as Jester says, due to changes in ld's default linker script made for security reasons:
First change: Making sure data from .data isn't present in any of the mapping of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program-headers mapped the same disk-block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)
More recent change: Separate .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[]= {...} could be cast to a function pointer and called, without needing mprotect or gcc -z execstack or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack only apply to the actual stack, not READ_IMPLIES_EXEC.)
See Why an ELF executable could have 4 LOAD segments? for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.
That extra space is just 00 padding and will compress well in a .tar.gz or whatever.
So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.
gcc -Wl,--nmagic will turn off page alignment of sections if you want that for some reason (see the ld(1) man page). I don't know why that doesn't pack everything down to the old size; perhaps checking the default linker script would shed some light, but it's pretty long. Run ld --verbose to see it.
stripping won't help for padding that's part of a section; I think it can only remove whole sections.
ld -z noseparate-code uses the old layout, only 2 total segments to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)
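A sketch of how to compare the layouts directly, assuming the same helloworld-asm.o object file as in the Makefile above (output names are illustrative):
$ ld -melf_i386 -o helloworld-default helloworld-asm.o
$ ld -melf_i386 -z noseparate-code -o helloworld-old-layout helloworld-asm.o
$ ld -melf_i386 --nmagic -o helloworld-nmagic helloworld-asm.o
$ ls -l helloworld-default helloworld-old-layout helloworld-nmagic
$ readelf -lW helloworld-old-layout    # fewer LOAD segments than the default layout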
Related:
Linking with gcc instead of ld
This question is about ld, but note that if you're using gcc -nostdlib, that also used to default to making a static executable. But modern Linux distros configure GCC with -pie as the default, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked, unlike in -no-pie mode, where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)
So the equivalent of ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.
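As a sketch, assuming the same 32-bit helloworld-asm.s source as above (output names are illustrative; -m32 requires 32-bit multilib support on an x86-64 machine):
$ gcc -m32 -nostdlib -static -o helloworld-gcc-static helloworld-asm.s
$ gcc -m32 -nostdlib -no-pie -Wl,--nmagic -Wl,-z -Wl,noseparate-code -o helloworld-gcc-packed helloworld-asm.s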
Also:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux - eventually making a 45 byte executable, with the machine code for an _exit syscall stuffed into the ELF program header itself.
FASM can make quite small executables, using its mode where it outputs a static executable (not object file) directly with no ELF section metadata, just program headers. (It's a pain to debug with GDB or disassemble with objdump; most tools assume there will be section headers, even though they're not needed to run static executables.)
What is a reasonable minimum number of assembly instructions for a small C program including setup?
What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd? (static vs. static-pie vs. (dynamic) PIE that happens to have no shared libraries.)

how fio loads various io engines when it starts?

fio supports a whole bunch of io engines - all supported engines are present here : https://github.com/axboe/fio/tree/master/engines
I have been trying to understand the internals of how fio works and got stuck on how fio loads all the io engines.
For example, I see every engine has a method to register and unregister itself; sync.c, for instance, registers and unregisters using the following methods:
fio_syncio_register : https://github.com/axboe/fio/blob/master/engines/sync.c#L448
and fio_syncio_unregister :
https://github.com/axboe/fio/blob/master/engines/sync.c#L461
My question is: who calls these methods?
To find the answer I tried running fio under gdb - I placed a breakpoint in fio_syncio_register and in the main function; fio_syncio_register gets called even before main, which tells me it has something to do with __libc_csu_init,
and the backtrace confirmed that:
(gdb) bt
#0 fio_syncio_register () at engines/sync.c:450
#1 0x000000000047fb9d in __libc_csu_init ()
#2 0x00007ffff6ee27bf in __libc_start_main (main=0x40cd90 <main>, argc=2, argv=0x7fffffffe608, init=0x47fb50 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5f8)
at ../csu/libc-start.c:247
#3 0x000000000040ce79 in _start ()
I spent some time reading about __libc_csu_init and __libc_csu_fini, and every single description talks about methods decorated with __attribute__((constructor)) being called before main, but in the case of fio's sync.c I don't see fio_syncio_register decorated with __attribute__.
Can someone please help me understand how this flow works? Are there other materials I should be reading to understand this?
Thanks
Interesting question. I couldn't figure out the answer looking at the source, so here are the steps I took:
$ make
$ find . -name 'sync.o'
./engines/sync.o
$ readelf -WS engines/sync.o | grep '\.init'
[12] .init_array INIT_ARRAY 0000000000000000 0021f0 000008 00 WA 0 0 8
[13] .rela.init_array RELA 0000000000000000 0132a0 000018 18 36 12 8
This tells us that global initializers are present in this object. These are called at program startup. What are they?
$ objdump -Dr engines/sync.o | grep -A4 '\.init'
Disassembly of section .init_array:
0000000000000000 <.init_array>:
...
0: R_X86_64_64 .text.startup
Interesting. There is apparently a special .text.startup section. What's in it?
$ objdump -dr engines/sync.o | less
...
Disassembly of section .text.startup:
0000000000000000 <fio_syncio_register>:
0: 48 83 ec 08 sub $0x8,%rsp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .data+0x380
9: e8 00 00 00 00 callq e <fio_syncio_register+0xe>
a: R_X86_64_PC32 register_ioengine-0x4
...
Why, it's exactly the function we are looking for. But how did it end up in this special section? To answer that, we can look at preprocessed source (in retrospect, I should have started with that).
How could we get it? The command line used to compile sync.o is hidden. Looking in the Makefile, we can unhide it with QUIET_CC=''.
$ rm engines/sync.o && make QUIET_CC=''
gcc -o engines/sync.o -std=gnu99 -Wwrite-strings -Wall -Wdeclaration-after-statement -g -ffast-math -D_GNU_SOURCE -include config-host.h -I. -I. -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DBITS_PER_LONG=64 -DFIO_VERSION='"fio-2.16-5-g915ca"' -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL -DFIO_INC_DEBUG -c engines/sync.c
LINK fio
Now we know the command line, and can produce preprocessed file:
$ gcc -E -dD -std=gnu99 -ffast-math -D_GNU_SOURCE -include config-host.h -I. -I. -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DBITS_PER_LONG=64 -DFIO_VERSION='"fio-2.16-5-g915ca"' -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DFIO_INTERNAL -DFIO_INC_DEBUG engines/sync.c -o /tmp/sync.i
Looking in /tmp/sync.i, we see:
static void __attribute__((constructor)) fio_syncio_register(void)
{
register_ioengine(&ioengine_rw);
register_ioengine(&ioengine_prw);
...
Hmm, it is __attribute__((constructor)) after all. But how did it get there? Aha! I missed the fio_init on this line:
static void fio_init fio_syncio_register(void)
What does fio_init stand for? Again in /tmp/sync.i:
#define fio_init __attribute__((constructor))
So that is how it works.
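A minimal standalone sketch of the same pattern outside fio (register_engine and my_engine_register are illustrative names, not fio's actual API):

#include <stdio.h>

/* stand-in for fio's register_ioengine() */
static void register_engine(const char *name)
{
    printf("registered engine: %s\n", name);
}

/* fio's fio_init macro expands to this attribute; the runtime calls the
   function from .init_array before main() is entered */
static void __attribute__((constructor)) my_engine_register(void)
{
    register_engine("sync");
}

int main(void)
{
    puts("main() runs after all constructors");
    return 0;
}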

Is changing default virtual address in elf header to 0 possible?

Can I change the default virtual address (p_vaddr) in the ELF to 0x0? Will this allow access to a null pointer, or does the kernel not allow loading at address 0?
I just want to know: if I change the p_vaddr of some section, say .text, to 0x0, does Linux allow this? Is there some constraint that the virtual address can only start above some value? Whenever I tried to set the .text vaddr using ld --section-start to anywhere between 0 and 9999, the program got killed. What is going on?
Can I change the default virtual address (p_vaddr) in the ELF to 0x0?
Yes, that is in fact how PIE (position independent) executables are usually linked.
echo "int main() { return 0; }" | gcc -xc - -fPIE -pie -o a.out
readelf -l a.out | grep LOAD | head -1
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
Note: above makes an executable that is of type ET_DYN.
Will this allow access to a null pointer?
No. When the kernel discovers that e_type == ET_DYN for the executable, it will relocate all of its segments elsewhere.
You can also make an executable of type ET_EXEC with .p_vaddr == 0, like so:
echo "int main() { return 0; }" | gcc -xc - -o a.out -Wl,-Ttext=0
readelf -l a.out | grep LOAD | head -1
LOAD 0x0000000000200000 0x0000000000000000 0x0000000000000000
The kernel will refuse to run it:
./a.out
Killed
You could mmap(2) with MAP_FIXED a segment starting at (void*)0 but I don't think you should.
I have no idea if changing the virtual address in elf(5) would do the equivalent. Are you speaking of p_vaddr for some segment?
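A sketch of that mmap(2) experiment, for illustration only (on a default Linux configuration it fails with EPERM because vm.mmap_min_addr is non-zero):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* try to map one page at virtual address 0 */
    void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");   /* typically EPERM unless vm.mmap_min_addr is 0 */
        return 1;
    }
    printf("page zero mapped at %p\n", p);
    return 0;
}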
Actually, you should really not use the NULL address in application code on Linux, especially if some of that code is written in C, because the NULL pointer has a very special meaning, including to the compiler. In particular, some optimizations are done based on the fact that NULL is not dereferenceable.
It is well known that GCC does optimize, for instance,
x = *p;
if (!p) goto wasnull;
into just x = *p;, because if p has been dereferenced it cannot be NULL. And GCC is right to do that optimization for application code (not for freestanding code).
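A minimal sketch of that optimization, assuming GCC with -O2 (compare the assembly from gcc -O2 -S against -O0):

int read_checked(int *p)
{
    int x = *p;       /* dereference first */
    if (!p)           /* GCC may delete this test: p was already dereferenced,
                         so it is assumed to be non-NULL */
        return -1;
    return x;
}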
Also, the kernel usually does Address Space Layout Randomization (ASLR).

How to make profilers (valgrind, perf, pprof) pick up / use local version of library with debugging symbols when using mpirun?

Edit: added the important note that this is about debugging an MPI application.
System installed shared library doesn't have debugging symbols:
$ readelf -S /usr/lib64/libfftw3.so | grep debug
$
I have therefore compiled and installed my own version in my home directory, with debugging enabled (--with-debug CFLAGS=-g):
$ readelf -S ~/lib64/libfftw3.so | grep debug
[26] .debug_aranges PROGBITS 0000000000000000 001d3902
[27] .debug_pubnames PROGBITS 0000000000000000 001d8552
[28] .debug_info PROGBITS 0000000000000000 001ddebd
[29] .debug_abbrev PROGBITS 0000000000000000 003e221c
[30] .debug_line PROGBITS 0000000000000000 00414306
[31] .debug_str PROGBITS 0000000000000000 0044aa23
[32] .debug_loc PROGBITS 0000000000000000 004514de
[33] .debug_ranges PROGBITS 0000000000000000 0046bc82
I have set both LD_LIBRARY_PATH and LD_RUN_PATH to include ~/lib64 first, and ldd program confirms that local version of library should be used:
$ ldd a.out | grep fftw
libfftw3.so.3 => /home/narebski/lib64/libfftw3.so.3 (0x00007f2ed9a98000)
The program in question is a parallel numerical application using MPI (Message Passing Interface). Therefore, to run this application one must use the mpirun wrapper (e.g. mpirun -np 1 valgrind --tool=callgrind ./a.out). I use the OpenMPI implementation.
Nevertheless, various profilers (the callgrind tool in Valgrind, the CPU profiler from google-perftools, and perf) don't find those debugging symbols, resulting in more or less useless output:
callgrind:
$ callgrind_annotate --include=~/prog/src --inclusive=no --tree=none
[...]
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
32,765,904,336 ???:0x000000000014e500 [/usr/lib64/libfftw3.so.3.2.4]
31,342,886,912 /home/narebski/prog/src/nonlinearity.F90:__nonlinearity_MOD_calc_nonlinearity_kxky [/home/narebski/prog/bin/a.out]
30,288,261,120 /home/narebski/gene11/src/axpy.F90:__axpy_MOD_axpy_ij [/home/narebski/prog/bin/a.out]
23,429,390,736 ???:0x00000000000fc5e0 [/usr/lib64/libfftw3.so.3.2.4]
17,851,018,186 ???:0x00000000000fdb80 [/usr/lib64/libmpi.so.1.0.1]
google-perftools:
$ pprof --text a.out prog.prof
Total: 8401 samples
842 10.0% 10.0% 842 10.0% 00007f200522d5f0
619 7.4% 17.4% 5025 59.8% calc_nonlinearity_kxky
517 6.2% 23.5% 517 6.2% axpy_ij
427 5.1% 28.6% 3156 37.6% nl_to_direct_xy
307 3.7% 32.3% 1234 14.7% nl_to_fourier_xy_1d
perf events:
$ perf report --sort comm,dso,symbol
# Events: 80K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... .................... ............................................
#
32.42% a.out libfftw3.so.3.2.4 [.] fdc4c
16.25% a.out 7fddcd97bb22 [.] 7fddcd97bb22
7.51% a.out libatlas.so.0.0.0 [.] ATL_dcopy_xp1yp1aXbX
6.98% a.out a.out [.] __nonlinearity_MOD_calc_nonlinearity_kxky
5.82% a.out a.out [.] __axpy_MOD_axpy_ij
Edit Added 11-07-2011:
I don't know if it is important, but:
$ file /usr/lib64/libfftw3.so.3.2.4
/usr/lib64/libfftw3.so.3.2.4: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
and
$ file ~/lib64/libfftw3.so.3.2.4
/home/narebski/lib64/libfftw3.so.3.2.4: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, not stripped
If /usr/lib64/libfftw3.so.3.2.4 is listed in callgrind output, then your LD_LIBRARY_PATH=~/lib64 had no effect.
Try again with export LD_LIBRARY_PATH=$HOME/lib64. Also watch out for any shell scripts you invoke, which might reset your environment.
You and Employed Russian are almost certainly right; the mpirun script is messing things up here. Two options:
Most x86 MPI implementations, as a practical matter, treat just running the executable ./a.out the same as mpirun -np 1 ./a.out. They don't have to do this, but OpenMPI certainly does, as do MPICH2 and IntelMPI. So if you can do the debugging serially, you should just be able to run valgrind --tool=callgrind ./a.out.
However, if you do want to run under mpirun, the issue is probably that your ~/.bashrc (or whatever) is being sourced, undoing your changes to LD_LIBRARY_PATH etc. The easiest fix is to temporarily put your changed environment variables in your ~/.bashrc for the duration of the run.
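With OpenMPI specifically, another sketch is to forward the variable explicitly with mpirun's -x option, which exports an environment variable to the launched processes (path taken from the question):
$ export LD_LIBRARY_PATH=$HOME/lib64
$ mpirun -np 1 -x LD_LIBRARY_PATH valgrind --tool=callgrind ./a.out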
The way recent profiling tools typically handle this situation is to consult an external, matching non-stripped version of the library.
On Debian-based Linux distros this is typically done by installing the -dbg-suffixed version of a package; on Red Hat-based distros they are named -debuginfo.
In the case of the tools you mentioned above, they will typically Just Work (tm) and find the debug symbols for a library if the debug-info package has been installed in the standard location.
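For example (the package names below are assumptions and vary by distro and release):
$ sudo apt-get install libfftw3-dbg      # Debian/Ubuntu-style -dbg package
$ sudo debuginfo-install fftw            # Red Hat-style; pulls in fftw-debuginfo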
