How does the Linux kernel determine ld.so's load address? - linux

I know that the dynamic linker uses mmap() to load libraries. I guess it is the kernel who loads both the executable and its .interpreter into the same address space, but how does it determine where? I noticed that ld.so's load address with ASLR disabled is 0x555555554000 (on x86_64) — where does this address come from? I tried following do_execve()'s code path, but it is too ramified for me not to be confused as hell.

Read more about ELF, in particular elf(5), and about the execve(2) syscall.
An ELF file may contain an interpreter. elf(5) mentions:
PT_INTERP The array element specifies the location and
size of a null-terminated pathname to invoke
as an interpreter. This segment type is
meaningful only for executable files (though
it may occur for shared objects). However it
may not occur more than once in a file. If
it is present, it must precede any loadable
segment entry.
That interpreter is practically almost always ld-linux(8) (e.g. with GNU glibc), more precisely (on my Debian/Sid) /lib64/ld-linux-x86-64.so.2. If you compile musl-libc then build some software with it you'll get a different interpreter, /lib/ld-musl-x86_64.so.1. That ELF interpreter is the dynamic linker.
The execve(2) syscall is using that interpreter:
If the executable is a dynamically linked ELF executable, the
interpreter named in the PT_INTERP segment is used to load the needed
shared libraries. This interpreter is typically /lib/ld-linux.so.2
for binaries linked with glibc.
See also Levine's book on Linkers and loaders, and Drepper's paper: How To Write Shared Libraries
Notice that execve is also handling the shebang (i.e. first line starting with #!); see the Interpreter scripts section of execve(2). BTW, for ELF binaries, execve is doing the equivalent of mmap(2) on some segments.
Read also about vdso(7), proc(5) & ASLR. Type cat /proc/self/maps in your shell.
(I guess, but I am not sure, that the 0x555555554000 address is in the ELF program header of your executable, or perhaps of ld-linux.so; it might also come from the kernel, since 0x55555555 seems to appear in the kernel source code)

Related

What is the role of program interpreters in executable files?

I was going through disassembly of elf executables and understanding the elf format. In there, I saw lib64/ld-linux-x86-64.so.2 used as program interpreter in the generated executable.
My guess is: I had used printf in the source code, which had to be dynamically linked. When I checked through dynamic section, I was able to find a reference to libc.so.6 shared library (tag:DT_NEEDED). In my system, I found multiple files with that name in different directories:
sourav#ubuntu-VirtualBox:/$ sudo find / -name libc.so.6
/usr/lib/x86_64-linux-gnu/libc.so.6
find: ‘/run/user/1000/doc’: Permission denied
find: ‘/run/user/1000/gvfs’: Permission denied
/snap/snapd/13170/lib/x86_64-linux-gnu/libc.so.6
/snap/snapd/11107/lib/x86_64-linux-gnu/libc.so.6
/snap/core18/1988/lib/i386-linux-gnu/libc.so.6
/snap/core18/1988/lib/x86_64-linux-gnu/libc.so.6
/snap/core18/2128/lib/i386-linux-gnu/libc.so.6
/snap/core18/2128/lib/x86_64-linux-gnu/libc.so.6
So, I guess purpose of program interpreter is to resolve these names to the proper libraries and load them during execution. Is this correct?
It seems, we can also have executables with no program interpreter (which is the case for program interpreter itself). In that case, does system/os itself loads the shared library? If so, how does it resolves the path of library?
Is it possible to generate executable with no program interpreter using gcc? My gcc version is 'gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)'.
So, I guess purpose of program interpreter is to resolve these names to the proper libraries and load them during execution. Is this correct?
Yes, but that that's a bit minimalistic. Loading dynamic libraries involves locating them, loading or mapping them into memory if necessary, and resolving dynamic symbols within, possibly lazily, for multiple kinds of relocations. It involves recursively loading the libraries' own needed libraries. Also, in a dynamically linked executable, the program interpreter provides the program entry point (from the kernel's perspective), so it is also responsible for setting up and entering the program-specific entry point (for example, main() in a C or C++ program).
It seems, we can also have executables with no program interpreter (which is the case for program interpreter itself). In that case, does system/os itself loads the shared library? If so, how does it resolves the path of library?
You can have ELF executables without a program interpreter, but they are not dynamically linked, at least not in the ELF sense. There are no shared libraries to load, and certainly the system does not load any.
Is it possible to generate executable with no program interpreter using gcc? My gcc version is 'gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)'.
If you have static versions of all needed libraries available then you should be able to achieve that by including the -static option on the command line when you link the program. It is entirely possible, however, that you do not have the needed static libraries, even if libc is the only library you need.

Why does the .bss segment have no executable attribute?

I have an ELF 32-bit executable file named orw from the pwnable.tw: https://pwnable.tw/challenge/. In my Ubuntu18.04, the .bss segment can be executed:
But in my Ubuntu20 and IDA Pro, the .bss segment have no executable attributes, why?
Why does the .bss segment have no executable attribute?
In a normal executable .bss should not have execute permissions, so it's the Ubuntu 18.04 result that is strange, not the other way around.
The following are all relevant here:
output from readelf -Wl orw
kernel versions
output from cat /proc/cpuinfo
emulator details (if you are using some kind of emulator).
I suspect that you are using an emulator, and it's set up to emulate pre-NX-bit processor (where the W bit implied X bit as well).
Alternatively, the executable lacks PT_GNU_STACK segment, in which case this answer is likely the correct one -- kernel defaults have changed for such binaries.
.bss is a segment for uninitialized global variables, so It's not normally executable (it doesn't need to). If you want it executable (because you are compiling machine code that you want to be able to test) you will probably need to select a special segment or to create two segments (one executable and other read/write) overlapping to allow to write the code while you can also execute it. This can be already specified in the standard script you use to link executables (with a different name, sure) or if it has not been done for you, you can specify a linker script that allows for those to be created. Read the linker documentation (in full, sorry) to know how the linker deals with this (and other) idiosynchracies of your processor architecture.
I don't know what architecture you are using, but for example, intel processors have an execution bit permissions in the segments, as they have read and write, which means that the memory access for an executable segment must be an opcode fetch access and not a data read access to load a data register. If you want to access the text segment for data reading, then you need to add also read access to the text segment to be able to see the code you are executing.

What is the significance of ".note.ABI-tag" section in ELF?

I see a .note.ABI-tag section when I objdump -h <binary> on a ELF file.
As per the ELF man page:
.note.ABI-tag
This section is used to declare the expected run-time ABI
of the ELF image. It may include the operating system name
and its run-time versions. This section is of type
SHT_NOTE. The only attribute used is SHF_ALLOC.
Is this section necessary?
What could be the side effects removing this section?
How to remove this section (a gcc flag) from ELF?
Could break the executable on some systems. It is supposed to give information on which kernel it's compatible with so if the binary's ABI are not compatible with the current kernel's ABI. See more information here https://refspecs.linuxfoundation.org/LSB_1.2.0/gLSB/noteabitag.html
However if your binary is not compiled for a specific kernel(not necessarily Linux as a lot of different targets use the ELF output) it doesn't matter and could just be cut if your goal is to reduce the executable's size. You should however be aware that it's already ignored if you are doing a objcopyfrom ELF to BIN.

How to build the elf interpreter (ld-linux.so.2/ld-2.17.so) as static library?

I apologize if my question is not precise because I don't have a lot
of Linux related experience. I'm currently building a Linux from
scratch (mostly following the guide at linuxfromscratch.org version
7.3). I ran into the following problem: when I build an executable it
gets a hardcoded path to something called ELF interpreter.
readelf -l program
shows something like
[Requesting program interpreter: /lib/ld-linux.so.2]
I traced this library ld-linux-so.2 to be part of glibc. I am not very
happy with this behaviour because it makes the binary very unportable
- if I change the location of /lib/ld-linux.so.2 the executable no
longer works and the only "fix" I found is to use the patchelf utility
from NixOS to change the hardcoded path to another hardcoded path. For
this reason I would like to link against a static version of the ld
library but such is not produced. And so this is my question, could
you please explain how could I build glibc so that it will produce a
static version of ld-linux.so.2 which I could later link to my
executables. I don't fully understand what this ld library does, but I
assume this is the part that loads other dynamic libraries (or at
least glibc.so). I would like to link my executables dynamically, but
I would like the dynamic linker itself to be statically built into
them, so they would not depend on hardcoded paths. Or alternatively I
would like to be able to set the path to the interpreter with
environment variable similar to LD_LIBRARY_PATH, maybe
LD_INTERPRETER_PATH. The goal is to be able to produce portable
binaries, that would run on any platform with the same ABI no matter
what the directory structure is.
Some background that may be relevant: I'm using Slackware 14 x86 to
build i686 compiler toolchain, so overall it is all x86 host and
target. I am using glibc 2.17 and gcc 4.7.x.
I would like to be able to set the path to the interpreter with environment variable similar to LD_LIBRARY_PATH, maybe LD_INTERPRETER_PATH.
This is simply not possible. Read carefully (and several times) the execve(2), elf(5) & ld.so(8) man pages and the Linux ABI & ELF specifications. And also the kernel code doing execve.
The ELF interpreter is responsible for dynamic linking. It has to be a file (technically a statically linked ELF shared library) at some fixed location in the file hierarchy (often /lib/ld.so.2 or /lib/ld-linux.so.2 or /lib64/ld-linux-x86-64.so.2)
The old a.out format from the 1990s had a builtin dynamic linker, partly implemented in old Linux 1.x kernel. It was much less flexible, and much less powerful.
The kernel enables, by such (in principle) arbitrary dynamic linker path, to have various dynamic linkers. But most systems have only one. This is a good way to parameterize the dynamic linker. If you want to try another one, install it in the file system and generate ELF executables mentioning that path.
With great pain and effort, you might make your own ld.so-like dynamic linker implementing your LD_INTERPRETER_PATH wish, but that linker still has to be an ELF shared library sitting at some fixed location in the file tree.
If you want a system not needing any files (at some predefined, and wired locations, like /lib/ld.so, /dev/null, /sbin/init ...), you'll need to build all its executable binaries statically. You may want (but current Linux distributions usually don't do that) to have a few statically linked executables (like /sbin/init, /bin/sash...) that will enable you to repair a system broken to the point of not having any dynamic linker.
BTW, the /sbin/init -or /bin/sh - path is wired inside the kernel itself. You may pass some argument to the kernel at boot load time -e.g. with GRUB- to overwrite the default. So even the kernel wants some files to be here!
As I commented, you might look into MUSL-Libc for an alternative Libc implementation (providing its own dynamic linker). Read also about VDSO and ASLR and initrd.
In practice, accept the fact that modern Linuxes and Unixes are expecting some non-empty file system ... Notice that dynamic linking and shared libraries are a huge progress (it was much more painful in the 1990s Linux kernels and distributions).
Alternatively, define your own binary format, then make a kernel module or a binfmt_misc entry to handle it.
BTW, most (or all) of Linux is free software, so you can improve it (but this will take months -or many years- of work to you). Please share your improvements by publishing them.
Read also Drepper's Hwo to Write Shared Libraries paper; and this question.
I ran into the same issue. In my case I want to bundle my application with a different GLIBC than comes system installed. Since ld-linux.so must match the GLIBC version I can't simply deploy my application with the according GLIBC. The problem is that I can't run my application on older installations that don't have the required GLIBC version.
The path to the loader interpreter can be modified with --dynamic-linker=/path/to/interp. However, this needs to be set at compile time and therefore would require my application to be installed in that location (or at least I would need to deploy the ld-linux.so that goes with my GLIBC in that location which goes against a simple xcopy deployment.
So what's needed is an $ORIGIN option equivalent to what the -rpath option can handle. That would allow for a fully dynamic deployment.
Given the lack of a dynamic interpreter path (at runtime) leaves two options:
a) Use patchelf to modify the path before the executable gets launched.
b) Invoke the ld-linux.so directly with the executable as an argument.
Both options are not as 'integrated' as a compiled $ORIGIN path in the executable itself.

Loading ELF shared library and custom binfmt executable into same Linux address space

I am working on a project to load and run a custom binary format executable (PE, in my case) on a Linux platform. I've done this pretty successfully so far by first loading the executable and then loading a small ELF shared library that calls the start address of the executable and then exits safely.
I would really like not doing the ELF loading myself for a few reasons, though. First, the shared library I use is written in assembly (I can't use anything else because I'm not linking to libc, etc.), which will be very platform-specific, and I'd like to move away from that and use C so I can compile for any platform. Also, it will be easier and safer to use Linux's native ELF loader instead of my own simplified version.
I'm wondering if there is a way to use my binfmt handler, an installed kernel module, to load my executable and then ask Linux to load my shared library (and its dependencies) into the same address space without overwriting my executable code. I first thought that the uselib syscall might be useful, but the description on the man page is unclear about whether or not this will serve my purposes:
From libc 4.4.4 on only the library "/lib/ld.so" is loaded, so that
this dynamic library can load the remaining libraries needed (again
using this call). This is also the state of affairs in libc5.
glibc2 does not use this call.
I've also never seen an example of its use, and I'm always wary of using syscalls that I don't understand.
Is there a good way to achieve what I've described? Can I use Linux's existing capabilities to load a shared library (written in C) into an address space already containing executable code, and, if so, how can I use that library without knowing where it has been loaded?
there is already a project like this called binfmt_pe (by me!) which is a kernel module and will have it's own linker (similar to /lib/ld). check it out here.
As for your question about making modules and the loader/linker, there are links below. I also included links with info about ELF and PE executables.
I hope this helps. :)
Useful information for making a Linux Kernel Module
The Linux Kernel Module Programming Guide
Writing Your Own Loadable Kernel Module
Linux Data Structures
The Linux kernel: The kernel source
Kernel Support for miscellaneous Binary Formats
binfmt_elf.c
binfmt_misc.c
Information About Dynamnic Loading/Linking
Understanding ld-linux.so.2
ld.so : Dynamic-Link Library support
How To Write Shared Libraries - PDF
ld-linux(8) - Linux man page
rpath
Listing Shared Library Dependencies
Information About ELF and PE Formats
OSRC: Executable File Formats
EXE Format
How Windows NT Recognizes MS-DOS - Based Applications
Common Object File Format (COFF)
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
Executable and Linkable Format
An In-Depth Look into the Win32 Portable Executable File Format
An In-Depth Look into the Win32 Portable Executable File Format, Part 2
Injective Code Inside an Import Table
The PE Format | Hackers Library
IMAGE_NT_HEADERS structure (Windows)
x86 Disassembly/Windows Executable Files
the Portable Executable Format on Windows

Resources