How can a shared library know where it resides?

I'm developing a shared library for Linux machines, which is loaded dynamically relative to the main executable via rpath.
The library itself then tries to load other libraries dynamically, relative to its own location but without rpath (I use scandir to search for shared libraries in a certain folder, since I don't know their names in advance).
This only works if the working directory is set to the shared library's location; otherwise I search a different directory than intended.
Is there any practical, reliable way for the shared library to determine where it resides?
I know I could use /proc/self/maps or something similar to list the loaded files, but this only works as long as the library knows its own name.
Another idea is to use dlinfo(), but to use it the shared library needs to know its own handle.

Is there any practical, reliable way for the shared library
to determine where it resides?
I'd use dlinfo and /proc/self/maps (proc may not always be mounted, especially in containers).
I know, I could use /proc/self/maps or something like that,
to get the loaded files, but this works only, as long the library
knows its own name.
Not really: you can take a pointer to some code inside the library (preferably to an internal label, to avoid going through the PLT/GOT) and compare it against the memory ranges obtained from /proc/self/maps (or dlinfo).
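The address-range comparison described above could be sketched as follows. This is a minimal sketch, assuming Linux with /proc mounted; path_of_address and own_path are hypothetical helper names, and paths containing spaces (or a " (deleted)" suffix) are not handled.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies into `out` the path of the file whose mapping
 * in /proc/self/maps contains `addr`. Returns 0 on success, -1 otherwise. */
static int path_of_address(const void *addr, char *out, size_t out_size) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps) return -1;                 /* /proc may not be mounted */

    char line[4352];
    uintptr_t target = (uintptr_t) addr;
    int result = -1;

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char path[4096];
        /* Line layout: start-end perms offset dev inode pathname */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %4095s",
                   &start, &end, path) != 3)
            continue;                     /* anonymous mapping, no pathname */
        if (target >= start && target < end) {
            if (strlen(path) < out_size) {
                strcpy(out, path);
                result = 0;
            }
            break;
        }
    }
    fclose(maps);
    return result;
}

/* Any function defined in the library can serve as the probe address. */
static int own_path(char *out, size_t out_size) {
    return path_of_address((const void *) (uintptr_t) own_path, out, out_size);
}
```

Called from code inside the shared library, own_path would be expected to yield that library's file name without the library knowing its name beforehand.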

No need to muck around with the page mappings or dlinfo. You can use dladdr on any symbol defined in the shared library:
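#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>

/* Returns a malloc'ed absolute path to this shared library, or NULL on failure. */
static char *lib_path(void) {
    Dl_info info;
    /* Ask the dynamic linker which loaded object contains lib_path itself. */
    if (!dladdr((void *) lib_path, &info)) return NULL;
    /* Canonicalize the path; the caller must free() the result. */
    return realpath(info.dli_fname, NULL);
}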
Technically this isn't portable; in practice, it works even on non-Linux systems such as macOS. You may want to allocate the storage for realpath yourself to avoid non-standard behaviour there (on Linux and macOS, realpath called with a NULL buffer mallocs the storage, which the caller must free).
This returns the path to the shared library itself. If you want the directory, you could use something like dirname (careful: it may modify its argument and is not guaranteed to be reentrant) or modify the string yourself.
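As an illustration of the "modify the string yourself" route, here is a minimal sketch that truncates the resolved path at its last slash. The name lib_dir is hypothetical, and the sketch assumes the same dladdr and realpath behaviour as the snippet above (glibc or macOS).

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed absolute path of the directory
 * containing the object that defines lib_dir, or NULL on failure. */
static char *lib_dir(void) {
    Dl_info info;
    if (!dladdr((void *) lib_dir, &info) || !info.dli_fname) return NULL;

    char *path = realpath(info.dli_fname, NULL);   /* malloc'ed by realpath */
    if (!path) return NULL;

    /* Cut at the last '/' ourselves instead of calling dirname. */
    char *slash = strrchr(path, '/');
    if (slash == path) slash[1] = '\0';            /* object lives directly in "/" */
    else if (slash) *slash = '\0';
    return path;                                   /* caller must free() */
}
```

The result can then be passed to scandir in place of a path relative to the working directory. On older glibc versions, linking with -ldl may be required.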

Related

How to link kernel functions to user-space program?

I have a user-space program (Capstone) that I would like to use in the FreeBSD kernel. I believe most of the functions it needs have the same name, semantics and arguments there (in FreeBSD, the kernel printf is also named printf). First I built it as a static library, libcapstone.a, and linked my program against it. Since the include files differ between Linux user space and the FreeBSD kernel, the library cannot find symbols like sprintf, memset and vsnprintf. How can I make these symbols (from the FreeBSD kernel) visible to libcapstone.a?
If I directly include header files like <sys/systm.h> in the Linux user-space source code, I get errors such as undefined type u_int or u_char, even if I add -D_BSD_SOURCE to CFLAGS.
Additionally, are there any better ways to do this?

You also need ; take a look at kernel man pages, eg "man 9 printf". They list required includes at the top.
Note, however, that you're trying to do something really hard. Some basic functions (e.g. printf) might be there; others differ completely (e.g. malloc(9)), and most POSIX APIs are simply not there. You won't be able to use open(2), socket(2), or fork(2).

Call dlopen with file descriptor?

I want to open a shared object as a data file and perform a verification check on it. The verification is a signature check, and I sign the shared object. If the verification is successful, I would like to load the currently opened shared object as a proper shared object.
First question: is it possible to call dlopen and load the shared object as a data file during the signature check so that code is not executed? According to the man pages, I don't believe so since I don't see a flag similar to RTLD_DATA.
Since I have the shared object open as a data file, I have the descriptor available. Upon successful verification, I would like to pass the descriptor to dlopen so the dynamic loader loads the shared object properly. I don't want to close the file then re-open it via dlopen because it could introduce a race condition (where the file verified is not the same file opened and executed).
Second question: how does one pass an open file to dlopen using a file descriptor so that dlopen performs customary initialization of the shared object?

On Linux, you probably could dlopen some /proc/self/fd/15 file (for file descriptor 15).
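A minimal sketch of that idea, assuming Linux with /proc mounted. The helper names fd_proc_path and dlopen_fd are hypothetical, and the signature check itself is left to the caller.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: formats the /proc path that refers to an open
 * descriptor. Returns 0 on success, -1 if fd is negative or the buffer
 * is too small. */
static int fd_proc_path(int fd, char *out, size_t out_size) {
    if (fd < 0) return -1;
    int n = snprintf(out, out_size, "/proc/self/fd/%d", fd);
    return (n > 0 && (size_t) n < out_size) ? 0 : -1;
}

/* Hypothetical helper: asks the dynamic loader to load the already-open
 * shared object behind fd. Returns the dlopen handle, or NULL on failure. */
static void *dlopen_fd(int fd, int flags) {
    char path[64];
    if (fd_proc_path(fd, path, sizeof path) != 0) return NULL;
    return dlopen(path, flags);
}
```

A caller would open the file, run its verification on the descriptor, and only then call dlopen_fd(fd, RTLD_NOW), checking dlerror() if the result is NULL. Constructors in the loaded object still run at that point.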
RTLD_DATA does not seem to exist. So if you want it, you would have to patch your own dynamic loader. Doing that within musl libc could perhaps be less hard. I still don't understand why you need it.
You have to trust the dlopen-ed plugin somehow (and it will run its constructor functions at dlopen time).
You could analyze the shared object plugin before dlopen-ing it by using some ELF parsing library, perhaps libelf or libbfd (from binutils); but I still don't understand what kind of analysis you want to make (and you really should explain that; in particular, what happens if the plugin is indirectly linked to some badly behaving software). In other words, you should explain more about your verification step. Notice that a shared object could overwrite itself....
Alternatively, don't use dlopen and just mmap your file (you'll need to parse some ELF and process relocations; see elf(5) and Levine's Linkers and Loaders for details, and look into the source code of your ld.so, e.g. in GNU glibc).
Perhaps using some JIT generation techniques might be useful (you would JIT generate code from some validated data), e.g. with GCCJIT, LLVM, or libjit or asmjit (or even LuaJit or SBCL) etc...
And if you have two file descriptors to the same shared object you probably won't have any race conditions.
An option is to build your ad-hoc static C or C++ source code analyzer (perhaps using some GCC plugin provided by you). That plugin might (with months, or perhaps years, of development efforts) check some property of the user C++ code. Beware of Rice's theorem (limiting the properties of every static source code analyzer). Then your program might (like my manydl.c does, or like RefPerSys will soon do, in mid 2020, or like the obsolete GCC MELT did a few years ago) take as input the user C++ code, run some static analysis on that C++ code (e.g. using your GCC plugin), compile that C++ code into a temporary shared object, and dlopen that shared object. So read Drepper's paper How to Write Shared Libraries.

How tcmalloc gets linked to the main program

I want to know how malloc gets linked to the main program. Basically, I have a program which uses several static and dynamic libraries. I include all of them in my makefile using the options "-llibName1 -llibName2".
The TCMalloc documentation says that we can override malloc simply by setting "LD_PRELOAD=/usr/lib64/libtcmalloc.so". I am not able to understand how tcmalloc gets called by all these static and dynamic libraries. Also, how does tcmalloc get linked to the STL and to the new/delete operations of C++?
Can anyone please give any insights on this?

"LD_PRELOAD=/usr/lib64/libtcmalloc.so" directs the loader to use libtcmalloc.so before any other shared library when resolving symbols external to your program, and because libtcmalloc.so defines a symbol named "malloc", that is the version your program will use.
If you omit the LD_PRELOAD line, glibc.so (or whatever C library you have on your system) will be the first shared library to define a symbol named "malloc".
Note also that if you link against a static library which defines a symbol named "malloc" (and uses proper arguments, etc), or another shared library is loaded that defines a symbol named "malloc", your program will attempt to use that version of malloc.
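One way to observe this lookup order is to ask the dynamic linker which loaded object currently supplies a symbol. This is a minimal sketch assuming glibc (RTLD_DEFAULT and dladdr are GNU extensions); symbol_provider is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the file name of the loaded object that
 * supplies `symbol` under the default lookup order, or NULL if unknown. */
static const char *symbol_provider(const char *symbol) {
    /* RTLD_DEFAULT searches in the same order the loader uses. */
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    if (!addr) return NULL;

    Dl_info info;
    if (!dladdr(addr, &info)) return NULL;
    return info.dli_fname;
}
```

Printing symbol_provider("malloc") would be expected to name the C library in a normal run and libtcmalloc.so when the program is started with the LD_PRELOAD setting above.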
That's the general idea anyway; the actual goings-on are quite interesting, and I will have to direct you to http://en.wikipedia.org/wiki/Dynamic_linker as a starting point for more information.

How/where is the working directory of a program stored?

When a program accesses files, uses system(), etc., how and where is the current working directory of that program physically known/stored? Since logically the working directory of a program is similar to a global variable, it should ideally be thread-local, especially in languages like D where "global" variables are thread-local by default. Would it be possible to make the current working directory of a program thread-local?
Note: If you are not familiar with D specifically, even a language-agnostic answer would be useful.

On Linux, each process is represented by a process descriptor - a task_struct. This structure is defined in include/linux/sched.h in the kernel source.
One of the fields of task_struct is a pointer to an fs_struct, which stores filesystem-related information. fs_struct is defined in include/linux/fs_struct.h.
fs_struct has a field called pwd, which stores information about the current working directory (the filesystem it is on, and the details of the directory itself).
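From user space, the kernel's record of the working directory is visible through the /proc/self/cwd symlink. A minimal sketch, assuming Linux with /proc mounted; cwd_from_proc is a hypothetical helper name.

```c
#include <unistd.h>

/* Hypothetical helper: asks the kernel for this process's working directory
 * by reading the /proc/self/cwd symlink. Returns 0 on success, -1 on failure.
 * A path longer than the buffer is silently truncated. */
static int cwd_from_proc(char *out, size_t out_size) {
    if (out_size == 0) return -1;
    ssize_t n = readlink("/proc/self/cwd", out, out_size - 1);
    if (n < 0) return -1;
    out[n] = '\0';                  /* readlink does not NUL-terminate */
    return 0;
}
```

After a chdir call, this helper and getcwd should report the same directory, since both read state that the kernel keeps for the process rather than a variable inside the program.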

Current directory is maintained by the OS, not by language or framework. See description of GetCurrentDirectory WinAPI function for details.
From description:
Multithreaded applications and shared library code should not use the GetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.

Linking symbols to fixed addresses on Linux

How would one go about linking (some) symbols to specific fixed addresses using GNU ld so that the binary could still be executed as normal in Linux (x86)? There will not be any accesses to those symbols, but their addresses are important.
For example, I'd have the following structure:
struct FooBar {
Register32 field_1;
Register32 field_2;
//...
};
struct FooBar foobar;
I'd like to link foobar to address 0x76543210, but link the standard libraries and the rest of the application normally. The application will then make use of the address of foobar, but will not reference the (possibly non-existent) memory behind it.
The rationale for this request is that this same source can be used on two platforms: On the native platform, Register32 can simply be a volatile uint32_t, but on Linux Register32 is a C++ object with the same size as a uint32_t that defines e.g. operator=, which will then use the address of the object and sends a request to a communication framework with that address (and the data) to perform the actual access on remote hardware. The linker would thus ensure the Register32 fields of the struct refer to the correct "addresses".
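To make the goal concrete, the addresses the fields should end up with can be computed by hand. This minimal sketch is only an illustration of the arithmetic, not the linker-based solution: Register32 is replaced by a plain uint32_t stand-in, and FOOBAR_BASE and FOOBAR_FIELD_ADDR are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Register32;    /* stand-in with the same size as the real type */

struct FooBar {
    Register32 field_1;
    Register32 field_2;
};

/* Address from the question at which foobar should appear to live. */
#define FOOBAR_BASE ((uintptr_t) 0x76543210u)

/* Address a field would have if foobar were linked at FOOBAR_BASE.
 * Nothing is dereferenced, so the memory need not exist. */
#define FOOBAR_FIELD_ADDR(field) \
    (FOOBAR_BASE + offsetof(struct FooBar, field))
```

With foobar placed at that base by the linker, taking the address of foobar.field_2 inside operator= would be expected to give the same value as FOOBAR_FIELD_ADDR(field_2).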

The suggestion by litb to use --defsym symbol=address does work, but is a bit cumbersome when you have a few dozen such instances to map. However, --just-symbols=symbolfile does just the trick. It took me a while to find out the syntax of the symbolfile, which is
symbolname1 = address;
symbolname2 = address;
...
The spaces seem to be required, as otherwise ld reports file format not recognized; treating as linker script.

Try it with
--defsym symbol=expression
As with this:
gcc -Wl,--defsym,foobar=0x76543210 file.c
And make foobar in your code an extern declaration:
extern struct FooBar foobar;
This looks promising. However, it's a bad idea to do such a thing (unless you really know what you are doing). Why do you need it?

I'll give you the hot tip... GNU LD can do this (assuming the system libs don't need the address you want). You just need to build your own linker script instead of using the compiler's autogenerated one. Read the man page for ld. Also, building a linker script for a complex piece of software is no easy task when you involve the GLIBC too.
