Is changing default virtual address in elf header to 0 possible? - linux

Can I change the default virtual address(ph_vaddr) in the elf to 0x0. will this allow access to null pointer?? or the kernel does not allow to load at address 0?
I just want to know that if I change the p_vaddr of some section say .text to 0x0, does linux allow this? Is there some constraint that virtual address can start only after some value? Whenever I was trying to set .text vaddr using ld --section-start anywhere between 0 to 9999 it was getting killed. I want to know what is going on??

Can I change the default virtual address(ph_vaddr) in the elf to 0x0.
Yes, that is in fact how PIE (position independent) executables are usually linked.
echo "int main() { return 0; }" | gcc -xc - -fPIE -pie -o a.out
readelf -l a.out | grep LOAD | head -1
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
Note: above makes an executable that is of type ET_DYN.
will this allow access to null pointer?
No. When the kernel discovers that the .e_type == ET_DYN for the executable, it will relocate all of its segments elsewhere.
You can also make an executable of type ET_EXEC with .p_vaddr == 0, like so:
echo "int main() { return 0; }" | gcc -xc - -o a.out -Wl,-Ttext=0
readelf -l a.out | grep LOAD | head -1
LOAD 0x0000000000200000 0x0000000000000000 0x0000000000000000
The kernel will refuse to run it:
./a.out
Killed

You could mmap(2) with MAP_FIXED a segment starting at (void*)0 but I don't think you should.
I have no idea if changing the virtual address in elf(5) would do the equivalent. Are you speaking of p_vaddr for some segment?
Actually, you should really not use the NULL address in application code on Linux, especially if some of that code is coded in C, because the NULL pointer has a very special meaning, including to the compiler. In particular, some optimizations are done based on the fact that NULL is not dereferencable.
It is well known that GCC does optimize, for instance,
x = *p;
if (!p) goto wasnull;
into just x= *p; because if phas been dereferenced it cannot be NULL; And GCC is right in doing that optimization for application code (not for free-standing one).
Also the kernel is usually doing Address Space Layout Randomization.

Related

Where does QEMU load the DTB?

I am writing my own bootloader for aarch64 that must boot linux, and in order to execute it properly I need to follow the linux boot protocol.
Here are some memory mappings: located in my linker file
FLASH_START = 0x000100000;
RAM_START = 0x40000000;
TEXT_START = 0x40080000;
Here is the command I am using to lauch my virt, giving 4 cores and 2GB of RAM
qemu-system-aarch64 -nographic -machine virt -cpu cortex-a72 -kernel pflash.bin -initrd initramfs.cpio.gz -serial mon:stdio -m 2G -smp 4
The pflash.bin has the following layout:
dd if=/dev/zero of=pflash.bin bs=1M count=512
dd if=my_bootloader.img of=pflash.bin conv=notrunc bs=1M count=20
dd if=Kernel of=pflash.bin conv=notrunc bs=1M seek=50
Where: Kernel is the linux kernel image file,
and my_bootloader.img is simply the objcopy of the elf file:
aarch64-linux-gnu-objcopy -O binary my_bootloader.elf my_bootloader.img
And the elf file is created in the following manner:
aarch64-linux-gnu-ld -nostdlib -T link.ld my_bootloader.o -o my_bootloader.elf
Here is my_bootloader.S
.section ".text.startup"
.global _start
_start:
ldr x30, =STACK_TOP
mov sp, x30
ldr x0, =RAM_START
ldr x1, =0x43280000
br x1
ret
As you can see All I have done so far is:
Set up the stack
Presumably I have loaded the DTB to the x0 (just like the linux boot protocol demands)
Branch to the location of the linux kernel
So I have not yet loaded the initramfs.cpio.gz which contains the file system but I should already normally get at least some output from the kernal since the DTB was loaded.
My question is, have I loaded it correctly? And I guess that the simple answer is no. But basically I have no clue where qemu puts the dtb in my RAM, and after looking everywhere on the documentation I cannot seem to find this information.
I would much appreciate if someone could tell me where QEMU loads the dtb so I can put it into x0 and the kernel could gladly read it!
Thanks in advance!
Where the dtb (if any) is is board specific. The QEMU documentation for the Arm 'virt' board tells you where the DTB is:
https://www.qemu.org/docs/master/system/arm/virt.html#hardware-configuration-information-for-bare-metal-programming
However, your command line is incorrect. "-kernel pflash.bin" says "this file is a Linux kernel, boot it in the way that the Linux kernel should be booted". What you want is "load this file into the flash, and start up in the way that the CPU would start out of reset on real hardware". For that you want one of the other ways of loading a guest binary (-bios is probably simplest). And you probably don't want to pass QEMU a -initrd option, either, since that is intended for either (a) QEMU's builtin bootloader or (b) QEMU-aware bootloaders that know how to extract a kernel and initrd from the fw-cfg device.
PS: If you tell QEMU to provide more than one guest CPU then your bootloader will need to deal with the secondary CPUs. That either means using PSCI to start them up, or else handling the fact that all the CPUs start executing the same code out of reset (which one depends on how you choose to start QEMU). You're better off sticking to '-smp 1' to start off with, and come back and deal with SMP later.

"Whirlwind Tutorial on Teensy ELF Executables" -- why is the output of ld 10X bigger, 20 years later? [duplicate]

For a university course, I like to compare code-sizes of functionally similar programs if written and compiled using gcc/clang versus assembly. In the process of re-evaluating how to further shrink the size of some executables, I couldn't trust my eyes when the very same assembly code I assembled/linked 2 years ago now has grown >10x in size after building it again (which true for multiple programs, not only helloworld):
$ make
as -32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o
$ ls -l
-rwxr-xr-x 1 xxx users 708 Jul 18 2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users 604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users 498 Nov 25 14:44 helloworld-asm-2020.s
The assembly code is:
.code32
.section .data
msg: .ascii "Hello, world!\n"
len = . - msg
.section .text
.globl _start
_start:
movl $len, %edx # EDX = message length
movl $msg, %ecx # ECX = address of message
movl $1, %ebx # EBX = file descriptor (1 = stdout)
movl $4, %eax # EAX = syscall number (4 = write)
int $0x80 # call kernel by interrupt
# and exit
movl $0, %ebx # return code is zero
movl $1, %eax # exit syscall number (1 = exit)
int $0x80 # call kernel again
The same hello world program, compiled using GNU as and GNU ld (always using 32-bit assembly) was 708 bytes then, and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it still has almost 4.2K. stripping/sstripping doesn't pay off either.
readelf tells me that the start of section headers is much later in the code (byte 468 vs 8464), but I have no idea why. It's running on the same arch system as in 2018, the Makefile is the same and I'm not linking against any libraries (especially not libc). I guess something regarding ld has changed due to the fact that the object file is still quite small, but what and why?
Disclaimer: I'm building 32-bit executables on an x86-64 machine.
Edit: I'm using GNU binutils (as & ld) version 2.35.1 Here is a base64-encoded archive which includes the source and both executables (small old one, large new one) :
cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF
Update:
When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked to by default), the executable size becomes as small as expected:
$ cat Makefile
TARGET=helloworld
all:
as -32 -o ${TARGET}-asm.o ${TARGET}-asm.s
ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
rm ${TARGET}-asm.o
$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso 200 Dec 1 13:57 Makefile
-rwxrwxr-x 1 eso eso 8700 Dec 1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso 732 Dec 1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso 498 Dec 1 13:44 helloworld-asm.s
Maybe I just used gold previously without being aware.
It's not 10x in general, it's page-alignment of a couple sections as Jester says, per changes to ld's default linker script for security reasons:
First change: Making sure data from .data isn't present in any of the mapping of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program-headers mapped the same disk-block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)
More recent change: Separate .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[]= {...} could be cast to a function pointer and called, without needing mprotect or gcc -z execstack or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack only apply to the actual stack, not READ_IMPLIES_EXEC.)
See Why an ELF executable could have 4 LOAD segments? for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.
That extra space is just 00 padding and will compress well in a .tar.gz or whatever.
So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.
gcc -Wl,--nmagic will turn off page-alignment of sections if you want that for some reason. (see the ld(1) man page) I don't know why that doesn't pack everything down to the old size. Perhaps checking the default linker script would shed some light, but it's pretty long. Run ld --verbose to see it.
stripping won't help for padding that's part of a section; I think it can only remove whole sections.
ld -z noseparate-code uses the old layout, only 2 total segments to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)
Related:
Linking with gcc instead of ld
This question is about ld, but note that if you're using gcc -nostdlib, that used to also default to making a static executable. But modern Linux distros config GCC with -pie as the default, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked. Unlike with -no-pie mode where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)
So the equivalent of ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.
Also:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux - eventually making a 45 byte executable, with the machine code for an _exit syscall stuffed into the ELF program header itself.
FASM can make quite small executables, using its mode where it outputs a static executable (not object file) directly with no ELF section metadata, just program headers. (It's a pain to debug with GDB or disassemble with objdump; most tools assume there will be section headers, even though they're not needed to run static executables.)
What is a reasonable minimum number of assembly instructions for a small C program including setup?
What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd? (static vs. static-pie vs. (dynamic) PIE that happens to have no shared libraries.)

Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?

For a university course, I like to compare code-sizes of functionally similar programs if written and compiled using gcc/clang versus assembly. In the process of re-evaluating how to further shrink the size of some executables, I couldn't trust my eyes when the very same assembly code I assembled/linked 2 years ago now has grown >10x in size after building it again (which true for multiple programs, not only helloworld):
$ make
as -32 -o helloworld-asm-2020.o helloworld-asm-2020.s
ld -melf_i386 -o helloworld-asm-2020 helloworld-asm-2020.o
$ ls -l
-rwxr-xr-x 1 xxx users 708 Jul 18 2018 helloworld-asm-2018*
-rwxr-xr-x 1 xxx users 8704 Nov 25 15:00 helloworld-asm-2020*
-rwxr-xr-x 1 xxx users 4724 Nov 25 15:00 helloworld-asm-2020-n*
-rwxr-xr-x 1 xxx users 4228 Nov 25 15:00 helloworld-asm-2020-n-sstripped*
-rwxr-xr-x 1 xxx users 604 Nov 25 15:00 helloworld-asm-2020.o*
-rw-r--r-- 1 xxx users 498 Nov 25 14:44 helloworld-asm-2020.s
The assembly code is:
.code32
.section .data
msg: .ascii "Hello, world!\n"
len = . - msg
.section .text
.globl _start
_start:
movl $len, %edx # EDX = message length
movl $msg, %ecx # ECX = address of message
movl $1, %ebx # EBX = file descriptor (1 = stdout)
movl $4, %eax # EAX = syscall number (4 = write)
int $0x80 # call kernel by interrupt
# and exit
movl $0, %ebx # return code is zero
movl $1, %eax # exit syscall number (1 = exit)
int $0x80 # call kernel again
The same hello world program, compiled using GNU as and GNU ld (always using 32-bit assembly) was 708 bytes then, and has grown to 8.5K now. Even when telling the linker to turn off page alignment (ld -n), it still has almost 4.2K. stripping/sstripping doesn't pay off either.
readelf tells me that the start of section headers is much later in the code (byte 468 vs 8464), but I have no idea why. It's running on the same arch system as in 2018, the Makefile is the same and I'm not linking against any libraries (especially not libc). I guess something regarding ld has changed due to the fact that the object file is still quite small, but what and why?
Disclaimer: I'm building 32-bit executables on an x86-64 machine.
Edit: I'm using GNU binutils (as & ld) version 2.35.1 Here is a base64-encoded archive which includes the source and both executables (small old one, large new one) :
cat << EOF | base64 -d | tar xj
QlpoOTFBWSZTWVaGrEQABBp////xebj/7//Xf+a8RP/v3/rAAEVARARAeEADBAAAoCAI0AQ+NAam
ytMpCGmpDVPU0aNpGmh6Rpo9QAAeoBoADQaNAADQ09IAACSSGUwaJpTNQGE9QZGhoADQPUAA0AAA
AA0aA4AAAABoAAAAA0GgAAAAZAGgAHAAAAANAAAAAGg0AAAADIA0AASJCBIyE8hHpqPVPUPU/VAa
fqn6o0ep6BB6TQaNGj0j1ABobU00yeU9JYiuVVZKYE+dKNa3wls6x81yBpGAN71NoylDUvNryWiW
E4ER8XkfpaJcPb6ND12ULEqkQX3eaBHP70Apa5uFhWNDy+U3Ekj+OLx5MtDHxQHQLfMcgCHrGayE
Dc76F4ZC4rcRkvTW4S2EbJAsbBGbQxSbx5o48zkyk5iPBBhJowtCSwDBsQBc0koYRSO6SgJNL0Bg
EmCoxCDAs5QkEmTGmQUgqZNIoxsmwDmDQe0NIDI0KjQ64leOr1fVk6AaVhjOAJjLrEYkYy4cDbyS
iXSuILWohNh+PA9Izk0YUM4TQQGEYNgn4oEjGmAByO+kzmDIxEC3Txni6E1WdswBJLKYiANdiQ2K
00jU/zpMzuIhjTbgiBqE24dZWBcNBBAAioiEhCQEIfAR8Vir4zNQZFgvKZa67Jckh6EHZWAWuf6Q
kGy1lOtA2h9fsyD/uPPI2kjvoYL+w54IUKBEEYFBIWRNCNpuyY86v3pNiHEB7XyCX5wDjZUSF2tO
w0PVlY2FQNcLQcbZjmMhZdlCGkVHojuICHMMMB5kQQSZRwNJkYTKz6stT/MTWmozDCcj+UjtB9Cf
CUqAqqRlgJdREtMtSO4S4GpJE2I/P8vuO9ckqCM2+iSJCLRWx2Gi8VSR8BIkVX6stqIDmtG8xSVU
kk7BnC5caZXTIynyI0doXiFY1+/Csw2RUQJroC0lCNiIqVVUkTqTRMYqKNVGtCJ5yfo7e3ZpgECk
PYUEihPU0QVgfQ76JA8Eb16KCbSzP3WYiVApqmfDhUk0aVc+jyBJH13uKztUuva8F4YdbpmzomjG
kSJmP+vCFdKkHU384LdRoO0LdN7VJlywJ2xJdM+TMQ0KhMaicvRqfC5pHSu+gVDVjfiss+S00ikI
DeMgatVKKtcjsVDX09XU3SzowLWXXunnFZp/fP3eN9Rj1ubiLc0utMl3CUUkcYsmwbKKrWhaZiLO
u67kMSsW20jVBcZ5tZUKgdRtu0UleWOs1HK2QdMpyKMxTRHWhhHwMnVEsWIUEjIfFEbWhRTRMJXn
oIBSEa2Q0llTBfJV0LEYEQTBTFsDKIxhgqNwZB2dovl/kiW4TLp6aGXxmoIpVeWTEXqg1PnyKwux
caORGyBhTEPV2G7/O3y+KeAL9mUM4Zjl1DsDKyTZy8vgn31EDY08rY+64Z/LO5tcRJHttMYsz0Fh
CRN8LTYJL/I/4u5IpwoSCtDViIA=
EOF
Update:
When using ld.gold instead of ld.bfd (to which /usr/bin/ld is symlinked to by default), the executable size becomes as small as expected:
$ cat Makefile
TARGET=helloworld
all:
as -32 -o ${TARGET}-asm.o ${TARGET}-asm.s
ld.bfd -melf_i386 -o ${TARGET}-asm-bfd ${TARGET}-asm.o
ld.gold -melf_i386 -o ${TARGET}-asm-gold ${TARGET}-asm.o
rm ${TARGET}-asm.o
$ make -q
$ ls -l
total 68
-rw-r--r-- 1 eso eso 200 Dec 1 13:57 Makefile
-rwxrwxr-x 1 eso eso 8700 Dec 1 13:57 helloworld-asm-bfd
-rwxrwxr-x 1 eso eso 732 Dec 1 13:57 helloworld-asm-gold
-rw-r--r-- 1 eso eso 498 Dec 1 13:44 helloworld-asm.s
Maybe I just used gold previously without being aware.
It's not 10x in general, it's page-alignment of a couple sections as Jester says, per changes to ld's default linker script for security reasons:
First change: Making sure data from .data isn't present in any of the mapping of .text, so none of that static data is available for ROP / Spectre gadgets in an executable page. (In older ld, that meant the program-headers mapped the same disk-block twice, also into a RW-without-exec segment for the actual .data section. The executable mapping was still read-only.)
More recent change: Separate .rodata from .text into separate segments, again so static data isn't mapped into an executable page. Previously, const char code[]= {...} could be cast to a function pointer and called, without needing mprotect or gcc -z execstack or other tricks, if you wanted to test shellcode that way. (A separate Linux kernel change made -z execstack only apply to the actual stack, not READ_IMPLIES_EXEC.)
See Why an ELF executable could have 4 LOAD segments? for this history, including the strange fact that .rodata is in a separate segment from the read-only mapping for access to the ELF metadata.
That extra space is just 00 padding and will compress well in a .tar.gz or whatever.
So it has a worst-case upper bound of about 2x 4k extra pages of padding, and tiny executables are close to that worst case.
gcc -Wl,--nmagic will turn off page-alignment of sections if you want that for some reason. (see the ld(1) man page) I don't know why that doesn't pack everything down to the old size. Perhaps checking the default linker script would shed some light, but it's pretty long. Run ld --verbose to see it.
stripping won't help for padding that's part of a section; I think it can only remove whole sections.
ld -z noseparate-code uses the old layout, only 2 total segments to cover the .text and .rodata sections, and the .data and .bss sections. (And the ELF metadata that dynamic linking wants access to.)
Related:
Linking with gcc instead of ld
This question is about ld, but note that if you're using gcc -nostdlib, that used to also default to making a static executable. But modern Linux distros config GCC with -pie as the default, and GCC won't make a static-pie by default even if there aren't any shared libraries being linked. Unlike with -no-pie mode where it will simply make a static executable in that case. (A static-pie still needs startup code to apply relocations for any absolute addresses.)
So the equivalent of ld directly is gcc -nostdlib -static (which implies -no-pie). Or gcc -nostdlib -no-pie should let it default to -static when there are no shared libs being linked. You can combine this with -Wl,--nmagic and/or -Wl,-z -Wl,noseparate-code.
Also:
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux - eventually making a 45 byte executable, with the machine code for an _exit syscall stuffed into the ELF program header itself.
FASM can make quite small executables, using its mode where it outputs a static executable (not object file) directly with no ELF section metadata, just program headers. (It's a pain to debug with GDB or disassemble with objdump; most tools assume there will be section headers, even though they're not needed to run static executables.)
What is a reasonable minimum number of assembly instructions for a small C program including setup?
What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd? (static vs. static-pie vs. (dynamic) PIE that happens to have no shared libraries.)

What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?

Consider this AMD64 assembly program:
.globl _start
_start:
xorl %edi, %edi
movl $60, %eax
syscall
If I compile that with gcc -nostdlib and run ldd a.out, I get this:
statically linked
If I instead compile that with gcc -static -nostdlib and run ldd a.out, I get this:
not a dynamic executable
What's the difference between statically linked and not a dynamic executable? And if my binary was already statically linked, why does adding -static affect anything?
There are two separate things here:
Requesting an ELF interpreter (ld.so) or not.
Like #!/bin/sh but for binaries, runs before your _start.
This is the difference between a static vs. dynamic executable.
The list of dynamically linked libraries for ld.so to load happens to be empty.
This is apparently what ldd calls "statically linked", i.e. that any libraries you might have linked at build time were static libraries.
Other tools like file and readelf give more information and use terminology that matches what you'd expect.
Your GCC is configured so -pie is the default, and gcc doesn't make a static-pie for the special case of no dynamic libraries.
gcc -nostdlib just makes a PIE that happens not to link to any libraries but is otherwise identical to a normal PIE, specifying an ELF interpreter.
ldd confusingly calls this "statically linked".
file : ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2 ...
gcc -nostdlib -static overrides the -pie default and makes a true static executable.
file : ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked ...
gcc -nostdlib -no-pie also chooses to make a static executable as an optimization for the case where there are no dynamic libraries at all. Since a non-PIE executable couldn't have been ASLRed anyway, this makes sense. Byte-for-byte identical to the -static case.
gcc -nostdlib -static-pie makes an ASLRable executable that doesn't need an ELF interpreter. GCC doesn't do this by default for gcc -pie -nostdlib, unlike the no-pie case where it chooses to sidestep ld.so when no dynamically-linked libraries are involved.
file : ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), statically linked ...
-static-pie is obscure, rarely used, and older file doesn't identify it as statically linked.
-nostdlib doesn't imply -no-pie or -static, and -static-pie has to be explicitly specified to get that.
gcc -static-pie invokes ld -static -pie, so ld has to know what that means. Unlike with the non-PIE case where you don't have to ask for a dynamic executable explicitly, you just get one if you pass ld any .so libraries. I think that's why you happen to get a static executable from gcc -nostdlib -no-pie - GCC doesn't have to do anything special, it's just ld doing that optimization.
But ld doesn't enable -static implicitly when -pie is specified, even when there are no shared libraries to link.
Details
Examples generated with gcc --version gcc (Arch Linux 9.3.0-1) 9.3.0
ld --version GNU ld (GNU Binutils) 2.34 (also readelf is binutils)
ldd --version ldd (GNU libc) 2.31
file --version file-5.38 - note that static-pie detection has changed in recent patches, with Ubuntu cherry-picking an unreleased patch. (Thanks #Joseph for the detective work) - this in 2019 detected dynamic = having a PT_INTERP to handle static-pie, but it was reverted to detect based on PT_DYNAMIC so shared libraries count as dynamic. debian bug #948269. static-pie is an obscure rarely-used feature.
GCC ends up running ld -pie exit.o with a dynamic linker path specified, and no libraries. (And a boatload of other options to support possible LTO link-time optimization, but the keys here are -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie. collect2 is just a wrapper around ld.)
$ gcc -nostdlib exit.s -v # output manually line wrapped with \ for readability
...
COLLECT_GCC_OPTIONS='-nostdlib' '-v' '-mtune=generic' '-march=x86-64'
/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/collect2 \
-plugin /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/liblto_plugin.so \
-plugin-opt=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper \
-plugin-opt=-fresolution=/tmp/ccoNx1IR.res \
--build-id --eh-frame-hdr --hash-style=gnu \
-m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie \
-L/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0 \
-L/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/../../../../lib -L/lib/../lib \
-L/usr/lib/../lib \
-L/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/../../.. \
/tmp/cctm2fSS.o
You get a dynamic PIE with no dependencies on other libraries. Running it still invokes the "ELF interpreter" /lib64/ld-linux-x86-64.so.2 on it which runs before jumping to your _start. (Although the kernel has already mapped the executable's ELF segments to ASLRed virtual addresses, along with ld.so's text / data / bss).
file and readelf are more descriptive.
PIE non-static executable from gcc -nostdlib
$ gcc -nostdlib exit.s -o exit-default
$ ls -l exit-default
-rwxr-xr-x 1 peter peter 13536 May 2 02:15 exit-default
$ ldd exit-default
statically linked
$ file exit-default
exit-default: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=05a4d1bdbc94d6f91cca1c9c26314e1aa227a3a5, not stripped
$ readelf -a exit-default
...
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1000
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000002b1 0x00000000000002b1 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000009 0x0000000000000009 R E 0x1000
... (the Read+Exec segment to be mapped at virt addr 0x1000 is where your text section was linked.)
If you strace it you can also see the differences:
$ gcc -nostdlib exit.s -o exit-default
$ strace ./exit-default
execve("./exit-default", ["./exit-default"], 0x7ffe1f526040 /* 51 vars */) = 0
brk(NULL) = 0x5617eb1e4000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcea703380) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ff5b3e000
arch_prctl(ARCH_SET_FS, 0x7f9ff5b3ea80) = 0
mprotect(0x5617eabac000, 4096, PROT_READ) = 0
exit(0) = ?
+++ exited with 0 +++
vs. -static and -static-pie the first instruction executed in user-space is your _start (which you can also check with GDB using starti).
$ strace ./exit-static-pie
execve("./exit-static-pie", ["./exit-static-pie"], 0x7ffcdac96dd0 /* 51 vars */) = 0
exit(0) = ?
+++ exited with 0 +++
gcc -nostdlib -static-pie
$ gcc -nostdlib -static-pie exit.s -o exit-static-pie
$ ls -l exit-static-pie
-rwxr-xr-x 1 peter peter 13440 May 2 02:18 exit-static-pie
peter#volta:/tmp$ ldd exit-static-pie
statically linked
peter#volta:/tmp$ file exit-static-pie
exit-static-pie: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=daeb4a8f11bec1bb1aaa13cd48d24b5795af638e, not stripped
$ readelf -a exit-static-pie
...
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1000
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000229 0x0000000000000229 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000009 0x0000000000000009 R E 0x1000
... (no Interp header, but still a read+exec text segment)
Notice that the addresses are still relative to the image base, leaving ASLR up to the kernel.
Surprisingly, ldd doesn't say that it's not a dynamic executable. That might be a bug, or a side effect of some implementation detail.
gcc -nostdlib -static traditional non-PIE old-school static executable
$ gcc -nostdlib -static exit.s -o exit-static
$ ls -l exit-static
-rwxr-xr-x 1 peter peter 4744 May 2 02:26 exit-static
peter#volta:/tmp$ ldd exit-static
not a dynamic executable
peter#volta:/tmp$ file exit-static
exit-static: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=1b03e3d05709b7288fe3006b4696fd0c11fb1cb2, not stripped
peter#volta:/tmp$ readelf -a exit-static
ELF Header:
...
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
... (Note the absolute entry-point address nailed down at link time)
(And that the ELF type is EXEC, not DYN)
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000010c 0x000000000000010c R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x0000000000000009 0x0000000000000009 R E 0x1000
NOTE 0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
0x0000000000000024 0x0000000000000024 R 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id
01 .text
02 .note.gnu.build-id
...
Those are all the program headers; unlike pie / static-pie I'm not leaving any out, just other whole parts of the readelf -a output.
Also note the absolute virtual addresses in the program headers that don't give the kernel a choice where in virtual address space to map the file. This is the difference between EXEC and DYN types of ELF objects. PIE executables are shared objects with an entry point, allowing us to get ASLR for the main executable. Actual EXEC executables have a link-time-chosen memory layout.
ldd apparently only reports "not a dynamic executable" when both:
no ELF interpreter (dynamic linker) path
ELF type = EXEC

why does a stack program segment have executable attribute

Here is a dump from a.out
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
filesz 0x00000000 memsz 0x00000000 flags rwx
Why does a stack segment have executable attribute?
Why isn't there a heap segment with rw- attribute?
//On ubuntu 32bit machine. Program is a simple hello world.
Command:
ld test.o startup.s; objdump -dhSxt -M intel-pneumonic a.out
//startup.s has a small assembly code with _start symbol which calls main and exits after main returns.
Command: gcc test.c
Try gcc test.c -Wl,-z,noexecstack.
That should be the default on any reasonably modern distribution.

Resources