Linux userland exec - linux

I need a C library which lets me exec() a statically linked binary, without invoking the execve() system call. The reason why the system call wouldn't work is that the binary file is not executable, and it's not possible to make it executable on that system. For dynamically linked binaries, running /lib/ld-linux.so.2 progname does the trick, but that segfaults on my statically linked binary.
I've found ul_exec 1.1 on http://archive.cert.uni-stuttgart.de/bugtraq/2004/01/msg00002.html , but that seems to segfault for its own Hello, World binary on my system.
One option would be to make a copy of the binary, make the copy executable, and call execve(). I'm looking for a solution which doesn't need such a copy (because of performance reasons).

I've updated The Grugq's userland exec to work with modern x86 Linuxes. I wrote an x86_64 userland exec from scratch.

then how about a usermode filesystem (using python-fuse for example) that maps the execute bit to any file specified? would that be too much of a performance hit?

There is a good short wiki article with some not-completely-production-ready implementations: http://plash.beasts.org/wiki/UserModeExec

Related

How can we tell an instruction is from application code or library code on Linux x86_64

I wanted to know whether an instruction is from the application itself or from the library code.
I observed some application code/data are located at about 0x000055xxxx while libraries and mmaped regions are by default located at 0x00007fcxxxx. Can I use for example, 0x00007f00...00 as a boundary to tell instruction is from the application itself or from the library?
How can I configure this boundary in Linux kernel?
Updated.
Can I prevent (or detect) a syscall instruction being issued from application code (only allow it to go through libc). Maybe we can do a binary scan, but due to the variable length of instruction, it's hard to prevent unintended syscall instruction.
Do it the other way. You need to learn a lot.
First, read a lot more about operating systems. So read the Operating Systems: Three Easy Pieces textbook.
Then, learn more about ASLR.
Read also Drepper's How to write shared libraries and Levine's Linkers and loaders book.
You want to use pmap(1) and proc(5).
You probably want to parse the /proc/self/maps pseudo-file from inside your program. Or use dladdr(3).
To get some insight, run cat /proc/$$/maps and cat /proc/self/maps in a Linux terminal
I wanted to know whether an instruction is from userspace or from library code.
You are confused: both library code and main executable code are userspace.
On Linux x86_64, you can distinguish kernel addresses from userpsace addresses, because the kernel addresses are in the FFFF8000'00000000 through FFFFFFFF'FFFFFFFF range on current (48-bit) implementations. See canonical form address description here.
I observed some application code/data are located at about 0x000055xxxx while libraries and mmaped regions are by default located at 0x00007fcxxxx. Can I use for example, 0x00007f00...00 as a boundary to tell instruction is from the application itself or from the library?
No, in general you can't. An application can be linked to load anywhere within canonical address space (though most applications aren't).
As Basile Starynkevitch already answered, you'll need to parse /proc/$pid/maps, or know what address the executable is linked to load at (for non-PIE binary).

How to invoke newly added system call by the function id without using syscall(__NR_mysyscall)

I am working with Linux-3.9.3 kernel in Ubuntu 10.04. I have added a basic system call in the kernel directory of the linux-3.9.3 source tree. I am able to use it with syscall() by passing my newly system call number in it as an argument. But I want to invoke it directly by using its method name as in the case of getpid() or open() system calls. Can any one help me to add it in GNU C library. I went through few documents but did not get any clear idea of how to accomplish it.
Thanks!!!
Assuming you are on a 64 bits Linux x86-64, the relevant ABI is the x86-64 ABI. Read also the x86 calling conventions wikipage and the linux assembly howto and syscalls(2)
So syscalls are using a different convention than ordinary function calls (e.g. all arguments are passed by registers, error condition could use the carry bit). Hence, you need a C wrapper to make your syscall available to C applications.
You could look into the source code of existing C libraries, like GNU libc or musl libc (so you'll need to make your own library for that syscall).
The MUSL libc source code is very readable, see e.g. its src/unistd/fsync.c as an example.
I would suggest wrapping your new syscall in your own library without patching libc. Notice that some uncommon syscalls are sitting in a different library, e.g. request_key(2) has its C wrapper in libkeyutils

How to build the elf interpreter (ld-linux.so.2/ld-2.17.so) as static library?

I apologize if my question is not precise because I don't have a lot
of Linux related experience. I'm currently building a Linux from
scratch (mostly following the guide at linuxfromscratch.org version
7.3). I ran into the following problem: when I build an executable it
gets a hardcoded path to something called ELF interpreter.
readelf -l program
shows something like
[Requesting program interpreter: /lib/ld-linux.so.2]
I traced this library ld-linux-so.2 to be part of glibc. I am not very
happy with this behaviour because it makes the binary very unportable
- if I change the location of /lib/ld-linux.so.2 the executable no
longer works and the only "fix" I found is to use the patchelf utility
from NixOS to change the hardcoded path to another hardcoded path. For
this reason I would like to link against a static version of the ld
library but such is not produced. And so this is my question, could
you please explain how could I build glibc so that it will produce a
static version of ld-linux.so.2 which I could later link to my
executables. I don't fully understand what this ld library does, but I
assume this is the part that loads other dynamic libraries (or at
least glibc.so). I would like to link my executables dynamically, but
I would like the dynamic linker itself to be statically built into
them, so they would not depend on hardcoded paths. Or alternatively I
would like to be able to set the path to the interpreter with
environment variable similar to LD_LIBRARY_PATH, maybe
LD_INTERPRETER_PATH. The goal is to be able to produce portable
binaries, that would run on any platform with the same ABI no matter
what the directory structure is.
Some background that may be relevant: I'm using Slackware 14 x86 to
build i686 compiler toolchain, so overall it is all x86 host and
target. I am using glibc 2.17 and gcc 4.7.x.
I would like to be able to set the path to the interpreter with environment variable similar to LD_LIBRARY_PATH, maybe LD_INTERPRETER_PATH.
This is simply not possible. Read carefully (and several times) the execve(2), elf(5) & ld.so(8) man pages and the Linux ABI & ELF specifications. And also the kernel code doing execve.
The ELF interpreter is responsible for dynamic linking. It has to be a file (technically a statically linked ELF shared library) at some fixed location in the file hierarchy (often /lib/ld.so.2 or /lib/ld-linux.so.2 or /lib64/ld-linux-x86-64.so.2)
The old a.out format from the 1990s had a builtin dynamic linker, partly implemented in old Linux 1.x kernel. It was much less flexible, and much less powerful.
The kernel enables, by such (in principle) arbitrary dynamic linker path, to have various dynamic linkers. But most systems have only one. This is a good way to parameterize the dynamic linker. If you want to try another one, install it in the file system and generate ELF executables mentioning that path.
With great pain and effort, you might make your own ld.so-like dynamic linker implementing your LD_INTERPRETER_PATH wish, but that linker still has to be an ELF shared library sitting at some fixed location in the file tree.
If you want a system not needing any files (at some predefined, and wired locations, like /lib/ld.so, /dev/null, /sbin/init ...), you'll need to build all its executable binaries statically. You may want (but current Linux distributions usually don't do that) to have a few statically linked executables (like /sbin/init, /bin/sash...) that will enable you to repair a system broken to the point of not having any dynamic linker.
BTW, the /sbin/init -or /bin/sh - path is wired inside the kernel itself. You may pass some argument to the kernel at boot load time -e.g. with GRUB- to overwrite the default. So even the kernel wants some files to be here!
As I commented, you might look into MUSL-Libc for an alternative Libc implementation (providing its own dynamic linker). Read also about VDSO and ASLR and initrd.
In practice, accept the fact that modern Linuxes and Unixes are expecting some non-empty file system ... Notice that dynamic linking and shared libraries are a huge progress (it was much more painful in the 1990s Linux kernels and distributions).
Alternatively, define your own binary format, then make a kernel module or a binfmt_misc entry to handle it.
BTW, most (or all) of Linux is free software, so you can improve it (but this will take months -or many years- of work to you). Please share your improvements by publishing them.
Read also Drepper's Hwo to Write Shared Libraries paper; and this question.
I ran into the same issue. In my case I want to bundle my application with a different GLIBC than comes system installed. Since ld-linux.so must match the GLIBC version I can't simply deploy my application with the according GLIBC. The problem is that I can't run my application on older installations that don't have the required GLIBC version.
The path to the loader interpreter can be modified with --dynamic-linker=/path/to/interp. However, this needs to be set at compile time and therefore would require my application to be installed in that location (or at least I would need to deploy the ld-linux.so that goes with my GLIBC in that location which goes against a simple xcopy deployment.
So what's needed is an $ORIGIN option equivalent to what the -rpath option can handle. That would allow for a fully dynamic deployment.
Given the lack of a dynamic interpreter path (at runtime) leaves two options:
a) Use patchelf to modify the path before the executable gets launched.
b) Invoke the ld-linux.so directly with the executable as an argument.
Both options are not as 'integrated' as a compiled $ORIGIN path in the executable itself.

What are the differences between LD_PRELOAD and strace?

Both methods are used to gather system calls also parameters and return values of them. When we prefer LD_PRELOAD and why? Maybe we can say that we can only gather syscalls via strace but we can gather library calls with LD_PRELOAD trick. However, there is another tracer for libraries whose name is ltrace.
strace is using the ptrace(2) syscall (with PTRACE_SYSCALL probably), so will catch every system call (thru kernel hooks installed by ptrace). It will work on any executable, even on statically linked ones, or those using something else than your distribution's GNU Glibc (like e.g. musl-libc, or some assembly written utility like old versions of busybox).
LD_PRELOAD tricks use the dynamic loader e.g. /lib64/ld-linux-x86-64.so.2 or /lib/ld.so (see ld.so(8) man page) etc... so won't work with statically linked executables (or those using something else than your dynamic loader and your GNU libc).
ltrace is probably also ptrace based.
And all these are being free software, you could study their source code (and improve it).

Is there any difference between executable binary files between distributions?

As all Linux distributions use the same kernel, is there any difference between their executable binary files?
If yes, what are the main differences? Or does that mean we can build a universal linux executable file?
All Linux distributions use the same binary format ELF, but there is still some differences:
different cpu arch use different instruction set.
the same cpu arch may use different ABI, ABI defines how to use the register file, how to call/return a routine. Different ABI can not work together.
Even on same arch, same ABI, this still does not mean we can copy one binary file in a distribution to another. Since most binary files are not statically linked, so they depends on the libraries under the distribution, which means different distribution may use different versions or different compilation configuration of libraries.
So if you want your program to run on all distribution, you may have to statically link a version that depends on the kernel's syscall only, even this you can only run a specified arch.
If you really want to run a program on any arch, then you have to compile binaries for all arches, and use a shell script to start up the right one.
All Linux ports (that is, the Linux kernel on different processors) use ELF as the file format for executables and libraries. A specific ELF binary is labeled with a single architecture/OS on which it can run (although some OSes have compatibility to run ELF binaries from other OSes).
Most ports have support for the older a.out format. (Some processors are new enough that there have never existed any a.out executables for them.)
Some ports support other executable file formats as well; for example, the PA-RISC port has support for HP-UX's old SOM executables, and the μcLinux (nonmmu) ports support their own FLAT format.
Linux also has binfmt_misc, which allows userspace to register handlers for arbitrary binary formats. Some distributions take advantage of this to be able to execute Windows, .NET, or Java applications -- it's really still launching an interpreter, but it's completely transparent to the user.
Linux on Alpha has support for loading Intel binaries, which are run via the em86 emulator.
It's possible to register binfmt_misc for executables of other architectures, to be run with qemu-user.
In theory, one could create a new format -- perhaps register a new "architecture" in ELF -- for fat binaries. Then the kernel binfmt loader would have to be taught about this new format, and you wouldn't want to miss the ld-linux.so dynamic linker and the whole build toolchain. There's been little interest in such a feature, and as far as I know, nobody is working on anything like it.
Almost all linux program files use the ELF standard.
Old Unixes also used COFF format. You may still find executables from times of yore in this format. Linux still has support for it (I don't know if it's compiled in current distros, though).
If you want to create a program that runs an all Linux distributions, you can consider using scripting languages (like Python and Perl) or a platform independent programming language like Java.
Programs written in scripting languages are complied at execution time, which means they are always compiled to match the platform they are executed on, and, hence, should always work (given that the libraries are set up properly).
Programs written in Java, on the other hand, are compiled before distributing them, but can be executed on any Linux distribution as long as it has a Java VM installed.
Furthermore, programs written in Java can be run on other operating systems like MS Windows and Mac OS.
The same is true for many programs written in Python and Perl; however, whether a Python or Perl program will work on another operating system depends on what libraries are used by that program and whether these libraries are available on the other operating systems.

Resources