Linking symbols to fixed addresses on Linux

How would one go about linking (some) symbols to specific fixed addresses using GNU ld so that the binary could still be executed as normal in Linux (x86)? There will not be any accesses to those symbols, but their addresses are important.
For example, I'd have the following structure:
struct FooBar {
Register32 field_1;
Register32 field_2;
//...
};
struct FooBar foobar;
I'd like to link foobar to address 0x76543210, but link the standard libraries and the rest of the application normally. The application will then make use of the address of foobar, but will not reference the (possibly non-existent) memory behind it.
The rationale for this request is that the same source can be used on two platforms. On the native platform, Register32 can simply be a volatile uint32_t. On Linux, Register32 is a C++ object of the same size as a uint32_t that defines e.g. operator=, which uses the address of the object to send a request, carrying that address and the data, to a communication framework that performs the actual access on remote hardware. The linker would thus ensure that the Register32 fields of the struct refer to the correct "addresses".
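To make the intent concrete, here is a minimal sketch in C of the forwarding idea described above. C has no operator overloading, so an explicit function stands in for operator=. The names RemoteRequest, last_request and remote_write are hypothetical and not part of the original code; a real build would pass the request to the communication framework instead of storing it.

```c
#include <stdint.h>

/* Hypothetical record of one forwarded access. */
struct RemoteRequest {
    uintptr_t address; /* the register's "address", taken from the pointer value */
    uint32_t value;    /* data to be written on the remote hardware */
};

/* Stand-in for the communication framework: remembers the last request. */
static struct RemoteRequest last_request;

/* Forward a register write. The pointer is only converted to an integer;
   the memory behind it is never dereferenced. */
static void remote_write(const volatile void *reg, uint32_t value)
{
    last_request.address = (uintptr_t)reg;
    last_request.value = value;
}
```

If foobar were linked at 0x76543210 and its fields were 4 bytes each, remote_write(&foobar.field_2, v) would report the address 0x76543214.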

The suggestion by litb to use --defsym symbol=address does work, but is a bit cumbersome when you have a few dozen such instances to map. However, --just-symbols=symbolfile does just the trick. It took me a while to find out the syntax of the symbolfile, which is
symbolname1 = address;
symbolname2 = address;
...
The spaces seem to be required, as otherwise ld reports file format not recognized; treating as linker script.
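With a few dozen symbols, the symbol file is easier to generate than to write by hand. The following is a small sketch of a helper that formats one line in the syntax shown above, including the spaces around the equals sign. The function name format_symbol_line is an assumption made for illustration.

```c
#include <stdio.h>

/* Write one "name = 0xADDRESS;" line into buf, keeping the spaces around '='.
   Returns 0 on success, -1 if the line does not fit into buf. */
static int format_symbol_line(char *buf, size_t size,
                              const char *name, unsigned long address)
{
    int n = snprintf(buf, size, "%s = 0x%lx;\n", name, address);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Writing each formatted line to a file yields something that can be passed to --just-symbols=symbolfile.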

Try it with
--defsym symbol=expression
As with this:
gcc -Wl,--defsym,foobar=0x76543210 file.c
And make foobar in your code an extern declaration:
extern struct FooBar foobar;
This looks promising. However, doing such a thing is a bad idea unless you really know what you are doing. Why do you need it?

Here's a tip: GNU ld can do this (assuming the system libraries don't need the address you want). You just need to write your own linker script instead of using the linker's built-in default one. Read the man page for ld. Be aware that writing a linker script for a complex piece of software is no easy task, especially when glibc is involved.

Related

How to determine the effective user id of a process in Rust?

On Linux and other POSIX systems, a program can be executed under the identity of another user (i.e. with a different euid). Normally, you'd call geteuid and friends to reliably determine the current identities of the process. However, I couldn't figure out a reliable way to determine these identities using only Rust's standard library.
The only thing I found that was close is std::os::unix::MetadataExt.
Is it currently possible to determine the euid (and other ids) of a process using Rust's standard library? Is there a function or trait I'm missing?
This is going to require an OS-specific dependency, as the concept does not exist (or does not do what you think it will!) on most of the targets you can build Rust code for. In particular, you will find this in the libc crate, which is, as the name suggests, a very thin wrapper over libc.
The std::os namespace is typically limited to the bare minimum needed to get process and filesystem functionality going for the std::process, std::thread and std::fs modules. As such, it would not have been in there. MetadataExt is, for a similar reason, targeted at filesystem usage.
As you could have expected, the call itself is, unimaginatively, geteuid.
It is an unsafe extern import, so you'll have to wrap it in an unsafe block.
It appears that Rust 1.46.0 doesn't expose this functionality in the standard library. If you're using a POSIX system and don't want to rely on an extra dependency, you have four options:
1. You can use libc directly:
#[link(name = "c")]
extern "C" {
    fn geteuid() -> u32;
    fn getegid() -> u32;
}
If you're using GNU/Linux in particular, you won't need the link attribute at all, since Rust's standard library already links against libc there. In other words, you can use a plain extern block without the link attribute.
2. Read /proc/self/status (potentially Linux only?). This file contains a line that starts with Uid:. This line lists the real user id, effective user id, and other information that you may also find relevant. Refer to man proc for more information.
3. If you're using a normal GNU/Linux system, you can access the metadata of the /proc/self directory itself. As pointed out in this question, the owner of this directory should match the effective user id of the process. You can get the euid as follows:
use std::os::unix::fs::MetadataExt;
println!("metadata for {:?}", std::fs::metadata("/proc/self").map(|m| m.uid()));
A benefit of this approach is that it is relatively cheap compared to option #2, since it needs only a single stat syscall (as opposed to opening a file and reading and parsing its contents).
4. If you're not using a normal GNU/Linux system, you might find success in creating a new dummy file and obtaining the owner id normally via Metadata.
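As an illustration of option #2, here is a sketch of the parsing step. It is written in C to match the other examples on this page; the same logic carries over to Rust using only std::fs and string splitting. The function names parse_uid_line and read_effective_uid are hypothetical. The sketch relies on the Uid: line listing the real, effective, saved-set and filesystem user ids in that order.

```c
#include <stdio.h>
#include <string.h>

/* Parse a "Uid:" line from /proc/self/status. The first two fields are the
   real and the effective user id. Returns 0 on success, -1 otherwise. */
static int parse_uid_line(const char *line, unsigned *real_uid,
                          unsigned *effective_uid)
{
    if (strncmp(line, "Uid:", 4) != 0)
        return -1;
    return sscanf(line + 4, "%u %u", real_uid, effective_uid) == 2 ? 0 : -1;
}

/* Scan /proc/self/status for the Uid: line. Returns 0 on success, -1 if the
   file cannot be opened (e.g. /proc is not mounted) or the line is missing. */
static int read_effective_uid(unsigned *effective_uid)
{
    char line[256];
    unsigned real_uid;
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_uid_line(line, &real_uid, effective_uid) == 0) {
            result = 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```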

Does the loader modify relocation information on program startup?

I have always believed that resolving absolute addresses is completely the linker's job. That is, after the linker combines all object files into one executable, it modifies all absolute addresses to reflect their new location inside the executable. But after reading here that the loader does not have to place the program's text at the address specified by the linker, I got really confused.
Take this code for example
Main.c
void printMe(void);
int main(void) {
    printMe();
    return 0;
}
Foo.c
#include <stdio.h>
/* Lots of other functions */
void printMe(void) {
    printf("Hello");
}
Say that after linking, the code for main is placed at address 0x00000010 and the code for printMe at address 0x00000020. When the program is launched, the loader would then load main and printMe at the virtual addresses specified by the linker. But if the loader does not load the program at those addresses, won't that break all absolute address references?
A program is normally composed of several modules created by a linker. There is the executable and usually a number of shared libraries. On some systems one executable can load another executable and call its starting routine as a function.
If all these compiled modules had fixed addresses, there would likely be conflicts upon loading. If two linked modules used the same address, the application could not load.
For decades, relocatable code has been the solution to that problem. A module can be loaded anywhere. Some systems take this a step further and place modules at random locations in memory for security.
There are some situations where code cannot be purely relocatable.
If you have something like this:
static int b, *a = &b ;
the initialization depends upon where the module is placed in memory (and where "b" is located). Linkers usually generate information for such constructs so that the loader can fix them up.
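The construct above can be observed directly. In the sketch below, the value stored in a cannot be known at link time if the module is position-independent, so the linker typically records a relocation for that word and the loader patches it before the program runs. The function name initializer_was_fixed_up is hypothetical, and the volatile qualifier is an addition whose only purpose is to force a real load of the stored pointer.

```c
static int b;

/* The initializer needs the final address of b. volatile forces the
   comparison below to read the stored pointer instead of folding it away. */
static int *volatile a = &b;

/* Returns 1 if the pointer stored in a matches the run-time address of b,
   i.e. the loader (or the linker, for a fixed-address build) set it up. */
static int initializer_was_fixed_up(void)
{
    return a == &b;
}
```

On an ELF system, running readelf -r on the resulting binary should list the relocation entries the loader processes, assuming the binary was built as position-independent code.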
Thus, this is not correct:
I have always believed that resolving absolute addresses is completely the linker's job.
As far as I know, that is not the case here.
If it is linked statically, the address of the function is computed by the linker. Because the relative address is known, a relative function call is emitted, and everything works.
If it is linked dynamically, ld.so comes in and loads the library. The symbol is resolved either by load-time relocation of shared libraries
or by position-independent code (PIC) in shared libraries
(the two articles on these topics weren't written by me).
Simply put:
Load-time relocation rewrites the code to give it the correct addresses, which requires disabling write protection and prevents the code from being shared among different processes.
PIC adds two sections called the GOT and the PLT, both at locations known at link time. A call to a function in a dynamic library first calls a ...#plt function (e.g. printf#plt), which then executes jump *GOT[offset]. On the first call, that entry holds the address of the next instruction, which calls the dynamic loader to resolve the function. On later calls, it holds the address of the function itself. As you can see, this costs additional memory and time compared to normal code.
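The lazy binding described above can be modelled in plain C with a function pointer that patches itself on first use. This is only a toy model under simplifying assumptions, not how ld.so is implemented; got_slot, lazy_stub, call_via_plt and real_function are hypothetical names.

```c
typedef int (*func_ptr)(int);

static int resolver_runs; /* counts how often the slow path was taken */

/* The function that the "dynamic library" actually provides. */
static int real_function(int x)
{
    return x * 2;
}

static int lazy_stub(int x);

/* One "GOT slot": initially points at the stub that triggers resolution. */
static func_ptr got_slot = lazy_stub;

/* Stand-in for the dynamic loader's resolver: patch the slot so that later
   calls go straight to the target, then complete the current call. */
static int lazy_stub(int x)
{
    resolver_runs++;
    got_slot = real_function;
    return real_function(x);
}

/* Stand-in for a PLT entry: always jumps through the slot. */
static int call_via_plt(int x)
{
    return got_slot(x);
}
```

The first call through call_via_plt pays for the resolution; every later call costs one extra indirect jump compared to a direct call.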

How can a shared library know where it resides?

I'm developing a shared library for Linux machines, which is dynamically loaded relative to the main executable with rpath.
Now, the library itself tries to load other libraries dynamically relative to its own location, but without rpath (I use scandir to search for shared libraries in a certain folder, as I don't know their names yet).
This works only if the working directory is set to the shared library's location; otherwise I look in a different directory than intended.
Is there any practical, reliable way for the shared library to determine where it resides?
I know I could use /proc/self/maps or something like that to get the loaded files, but this works only as long as the library knows its own name.
Another idea is to use dlinfo(), but to use it, the shared library needs to know its own handle.
Is there any practical, reliable way for the shared library to determine where it resides?
I'd use dlinfo and /proc/self/maps (proc may not always be mounted, especially in containers).
I know I could use /proc/self/maps or something like that to get the loaded files, but this works only as long as the library knows its own name.
Not really: you can take a pointer to some code inside the library (preferably to some internal label, to avoid messing with the PLT/GOT) and compare it against the memory ranges obtained from /proc/self/maps (or dlinfo).
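The comparison against /proc/self/maps can be sketched as follows. The helper checks a single line of that file; a caller would read the file line by line and pass the address of some function inside the library. The name maps_line_contains is hypothetical, and the sketch assumes path names without spaces.

```c
#include <stdio.h>
#include <stdint.h>

/* Check one line of /proc/self/maps, which has the form
     start-end perms offset dev inode pathname
   Returns 1 if addr lies in [start, end) and copies the pathname (empty for
   anonymous mappings) into path; returns 0 otherwise. */
static int maps_line_contains(const char *line, uintptr_t addr, char path[256])
{
    unsigned long start, end;

    path[0] = '\0';
    /* The pathname is optional, so two converted fields are enough. */
    if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s", &start, &end, path) < 2)
        return 0;
    return addr >= start && addr < end;
}
```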
No need to muck around with the page mappings or dlinfo. You can use dladdr on any symbol defined in the shared library:
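#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
/* Returns the resolved path of this shared library, or NULL on failure.
   The caller must free the result. */
static char *lib_path(void) {
    Dl_info info;
    /* Look up the object that contains lib_path itself. */
    if (!dladdr((void *) lib_path, &info)) return NULL;
    return realpath(info.dli_fname, NULL);
}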
Technically this isn’t portable; in practice, it works even on non-Linux systems such as macOS. You may want to allocate the storage for realpath yourself to avoid non-standard behaviour there (on Linux and macOS, realpath itself mallocs the storage when passed NULL, and the caller must free it).
This returns the path to the shared library itself. If you want the directory, you could use something like dirname (careful: it may modify its argument and is not guaranteed to be reentrant) or modify the string yourself.
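For the "modify the string yourself" route, here is a minimal sketch that extracts the directory part without calling dirname. The function name directory_of is hypothetical. It returns a freshly allocated string, so the input is left untouched and no static storage is involved.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the directory part of path, or NULL if the
   allocation fails. A path without a slash yields ".", and a file directly
   under the root yields "/". The caller must free the result. */
static char *directory_of(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *src = path;
    size_t len;
    char *dir;

    if (!slash) {               /* no directory component at all */
        src = ".";
        len = 1;
    } else if (slash == path) { /* file directly under the root */
        len = 1;
    } else {
        len = (size_t)(slash - path);
    }

    dir = malloc(len + 1);
    if (!dir)
        return NULL;
    memcpy(dir, src, len);
    dir[len] = '\0';
    return dir;
}
```

Passing the result of lib_path() to this helper gives the folder to scan with scandir, independent of the current working directory.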

How to link kernel functions to user-space program?

I have a user-space program (Capstone). I would like to use it in the FreeBSD kernel. I believe most of the functions it needs have the same name, semantics and arguments there (in FreeBSD, the kernel printf is also named printf). First I built it as the libcapstone.a library and linked my program with it. Since the include files differ between Linux user space and the FreeBSD kernel, the library cannot find symbols like sprintf, memset and vsnprintf. How could I make these symbols (from the FreeBSD kernel) visible to libcapstone.a?
If I directly include header files like <sys/systm.h> in the Linux user-space source code, I get errors like undefined type u_int or u_char, even if I add -D_BSD_SOURCE to CFLAGS.
Also, is there a better way to do this?
You also need ; take a look at the kernel man pages, e.g. "man 9 printf". They list the required includes at the top.
Note, however, that you're trying to do something really hard. Some basic functions (e.g. printf) might be there; others differ completely (e.g. malloc(9)), and most POSIX APIs are simply not there. You won't be able to use open(2), socket(2), or fork(2).

Putting code and data into the same section in a Linux kernel module

I'm writing a Linux kernel module in which I would like to have some code and associated data in the same section. I declare the data and the functions with the attribute tags, like:
void * foo __attribute__ ((section ("SEC_A"))) = NULL;
void bar(void) __attribute__ ((section("SEC_A")));
However when I do this, gcc complains with:
error: foo causes a section type conflict
If I do not declare the function with the specific section name, gcc is fine with it. But I want both the function and the variable to be in the same section.
Is there any way to do that with gcc? My gcc version is gcc (Ubuntu 4.3.2-1ubuntu12) 4.3.2
From the GCC manual:
Some file formats do not support arbitrary sections so the section attribute is not available on all platforms. If you need to map the entire contents of a module to a particular section, consider using the facilities of the linker instead.
IIRC, Linux uses a flat memory model, so you don't gain anything by "forcing" things into a single section anyway, do you?
Hmmm. I suppose you could write an asm function to reserve the space and then use a pointer foo to get its address. Might want to wrap the ugliness in a macro...
Another thought would be to split the problem in half: write a small example case of the closest thing you can still compile, get the asm output, and tinker with it to see what you can get past the downstream stages. If nothing else, you could write something to munge the asm code for that module, bury it in your makefile, and call it good.
Yet another thought: try putting the variable definitions in a small asm module (e.g. as db's or whatever, with the right section declarations) and let the linker handle it.
I think you cannot put text (function) and data (BSS) objects into the same section because (some) OSes assume immutability of .text section types for process re-use.
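One way to sidestep the conflict, in the spirit of the "use the facilities of the linker" advice, is to give the data and the function two separate named sections and leave their relative placement to a linker script. The sketch below only shows the compiler side, assuming GCC (or a compatible compiler) on an ELF target. The section names SEC_A_data and SEC_A_text are hypothetical, and the body of bar is a placeholder.

```c
/* The variable goes into one custom section (writable data)... */
void *foo __attribute__((section("SEC_A_data"))) = 0;

/* ...and the function into a second one (executable code), so the two
   section types no longer clash. */
void bar(void) __attribute__((section("SEC_A_text")));

/* Placeholder body: store the address of foo in foo itself. */
void bar(void)
{
    foo = (void *)&foo;
}
```

Whether the two sections end up adjacent in memory depends on the linker script used for the final link, which this sketch does not cover.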
