I've read this article about Position Independent Code, and this is what I got (focusing on function calls): When a shared library is built and linked, it is unknown to what memory address it will be loaded, and that's why we use Position Independent Code (PIC).
One of the mechanisms PIC uses is the PLT (Procedure Linkage Table), which lets us call functions while keeping the code position independent. It works basically like this: for each function func our library calls, there is a small stub func@plt, which is what actually gets called. The stub jumps to an address stored in a corresponding entry in the GOT. Initially, this address points back into the PLT stub, which then invokes the dynamic linker's resolver to look up the function's real address. The dynamic linker then overwrites the GOT entry to point to the actual function, so subsequent calls go there directly.
Now, all this is possible because when the shared library is linked, the distance between the instruction that calls func and func@plt is known, as is the distance to the entry in the GOT. All relative and great!
My question is: I can understand why we want this mechanism for function calls in the file linking against the shared library: it lets us resolve function addresses only when they are requested. But I don't understand why this is needed within the shared library itself!
Imagine we have a function func in the shared library, calling func2. The distance between func and func2 is known at static link time, so we can compute the position of func2 relative to the instruction pointer, just as we do to find func2@plt.
The PLT is indeed much less efficient than a simple PIC call.
In a shared library, the PLT is only needed for external (imported) functions or for internal functions that have default visibility. Such functions can be interposed at runtime by functions with the same names from different shlibs, and interposition is handled through the PLT/GOT.
Unfortunately, by default on Linux systems all functions have default visibility, but users can change this with the -fvisibility=hidden compiler flag or linker version scripts (see e.g. this question for details).
Does anyone know the general rule for exactly which LLVM IR code will be executed before main?
When using Clang++ 3.6, it seems that global class variables have their constructors called via a function in the ".text.startup" section of the object file. For example:
define internal void @__cxx_global_var_init() section ".text.startup" {
  call void @_ZN7MyClassC2Ev(%class.MyClass* @M)
  ret void
}
From this example, I'd guess that I should be looking for exactly those IR function definitions that specify section ".text.startup".
I have two reasons to suspect my theory is correct:
I don't see anything else in my LLVM IR file (.ll) suggesting that the global object constructors should be run first, if we assume that LLVM isn't sniffing for C++-specific function names like "__cxx_global_var_init". So section ".text.startup" is the only obvious means of saying that code should run before main(). But even if that's correct, we've identified a sufficient condition for causing a function to run before main(), but haven't shown that it's the only way in LLVM IR to cause a function to run before main().
The Gnu linker, in some cases, will use the first instruction in the .text section to be the program entry point. This article on Raspberry Pi programming describes causing the .text.startup content to be the first body of code appearing in the program's .text section, as a means of causing the .text.startup code to run first.
Unfortunately I'm not finding much else to support my theory:
When I grep the LLVM 3.6 source code for the string ".startup", I only find it in the Clang-specific parts of the LLVM code. For my theory to be correct, I would expect to have found that string in other parts of the LLVM code as well; in particular, parts outside of the C++ front-end.
This article on data initialization in C++ seems to hint at ".text.startup" having a special role, but it doesn't come right out and say that the Linux program loader actually looks for a section of that name. Even if it did, I'd be surprised to find a potentially Linux-specific section name carrying special meaning in platform-neutral LLVM IR.
The Linux 3.13.0 source code doesn't seem to contain the string ".startup", suggesting to me that the program loader isn't sniffing for a section with the name ".text.startup".
The answer is pretty easy - LLVM is not executing anything behind the scenes. It's the job of the C runtime (CRT) to perform all necessary preparations before running main(). This includes (but is not limited to) running static ctors and similar things. The runtime is usually informed about these objects via the addresses of the constructors being emitted into special sections (e.g. .init_array or .ctors). See e.g. http://wiki.osdev.org/Calling_Global_Constructors for more information.
I'd like to understand the dynamic-linker/loader behaviour on Linux box in the problematic case I work upon.
Our code that crashes is loaded as a plugin (dlopen(libwrapper.so, RTLD_GLOBAL)). libwrapper.so is just a thin layer that loads other plugins that do the real job. These plugins can be named P1 and P2; each of them depends on a common library called F (all together very much simplified).
The wrapper (libwrapper.so) is introduced to allow loading the Pn without RTLD_GLOBAL, since that flag leads to obvious linkage problems when loading the Pns (they have the same API). RTLD_DEEPBIND is not an option since the target platform is too old - it does not support it.
To our surprise, the problem manifests in the F library at load time of P2 (when P1 is already loaded (and initialized), along with F as its implicit dependency). At the time P2 is explicitly loaded (dlopen(libP2.so, RTLD_LOCAL | RTLD_NOW)), the dynamic linker reports no problems, but calling code within F to instantiate some type instances defined in F (again) leads to segmentation faults in various places (if one is skipped / commented out, it crashes somewhere else - therefore I didn't spend time investigating the code pattern that might be troublesome, since a more general problem / misunderstanding is suspected). There are no inlined functions used, the code is linked with -Wl,-E, visibility is default, GCC is 3.4.4. The F code is very stable and has been used within standalone apps or as part of plugins in the past.
I thought of linking F as a static library to work around any problem there might be with the dynamic linker, but the result is the same.
My view on the topic:
linking F as a dynamic library leads the dynamic linker to "know" that F is referenced a second time when loading P2, so it just increments the reference counter and does not call static initializers (which is ok), but it does perform relocations (again, and this seems to be problematic).
linking F as a static library leads the dynamic linker to load F's code as a statically linked part of P2 (P2F) and perform relocations within P2F. However, "somehow" common symbols from F get mixed up with the P1F code instance.
My assumption about a workaround to make the code at least work:
link P1 ... Pn into a single shared library (a single plugin); whether F is shared / static doesn't matter. This way any relocation is done only once.
I'd appreciate any feedback: is my view on the topic wrong / too simplified / missing an important part? Is this some known GCC / binutils bug from the past?
My view on the topic:
Your view on the topic is wrong; but there is no way to prove that to you.
Write a minimal test case that simulates what your system does, and still crashes in a similar way. Update your question with actual broken code; then we can tell you exactly what the problem is.
There is also a very good chance that in reducing the problem to the minimal example, you'll discover what the problem is yourself.
Either way you'll understand the problem, and will learn something new.
Mac OS X provides a useful library for dynamic loading, called dyld. Among its many interesting functions are two that let one install callbacks which dyld will call whenever an image is loaded or unloaded by dlopen and dlclose, respectively. Those functions are void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)) and void _dyld_register_func_for_remove_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)).
I know it's not possible to have an exact port for Linux, because the dyld functions deal with Mach-O files and Linux uses ELF files.
So, is there an equivalent of the dyld library for Linux? Or, at least, is there an equivalent of those two functions, _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image, in any Linux library? Or will I have to implement my own versions of these two myself? That would not be so hard, but I would have to find a way to make dlopen and dlclose call callback functions whenever they get called.
EDIT
To make things clearer: I need to make a library that has a callback function that must be called whenever an external library is dynamically loaded by dlopen. My callback function must perform some operations on any dynamically loaded library.
Yes: it is dlopen(3), from the standard library linked with -ldl.
More precisely:
compile your plugin's source code with the -fPIC flag to get position-independent code object files *.pic.o
make a shared library plugin by linking your *.pic.o files with gcc -shared (you can also link in other shared libraries).
use GCC function attributes, notably the constructor and destructor function attributes (or static C++ data with explicit constructors & destructors, hence the names). Functions with __attribute__((constructor)) are called at dlopen time of your plugin; those with __attribute__((destructor)) in your plugin are called at dlclose time.
linking the main program with the -rdynamic flag is useful & needed as soon as the plugin calls some functions in the main program.
don't forget to declare extern "C" your C++ plugin functions (needed for the program)
use dlsym inside your main program to fetch function or data addresses inside your plugin.
There are indeed no hooks for dlopen like _dyld_register_func_for_add_image provides. You may want to use constructor functions and/or dl_iterate_phdr(3) to mimic that.
If you can change the plugin (the shared object which you dlopen) you could play constructor tricks inside it to mimic such hooks. Otherwise, use some convention of your own (e.g. a plugin having a module_start function gets that module_start function called just after dlopen, etc.).
Some libraries are wrapping dlopen into something of higher level. For example Qt has QPluginLoader & QLibrary etc...
There is also the LD_PRELOAD trick (perhaps you might redefine your own dlopen & dlclose through such a trick, and have your modified functions perform the hooks). The ifunc function attribute might also be relevant.
And since GNU libc is free software providing dlopen - as is musl libc - you could patch it to suit your needs. dladdr(3) could be useful too!
addenda
If you are making your own runtime for some Objective-C, you should know well the conventions of the Objective-C compiler using that runtime, and you probably could have your own module loader, instead of overloading dlopen...
With respect to the following link:
http://www.archlinux.org/news/libpnglibtiff-rebuilds-move-from-testing/
Could someone explain to me why a program should be rebuilt after one of its libraries has been updated?
How does that make any sense since the "main" file is not changed at all?
If the signatures of the functions involved haven't changed, then "rebuilding" the program means that the object files must be linked again. You shouldn't need to compile them again.
An API is a contract that describes the interface to the public functions in a library. When the compiler generates code, it needs to know what type of variables to pass to each function, and in what order. It also needs to know the return type, so it knows the size and format of the data that will be returned from the function. When your code is compiled, the address of a library function may be represented as "start of the library, plus 140 bytes." The compiler doesn't know the absolute address, so it simply specifies an offset from the beginning of the library.
But within the library, the contents (that is, the implementations) of the functions may change. When that happens, the length of the code may change, so the addresses of the functions may shift. It's the job of the linker to understand where the entry points of each function reside, and to fill those addresses into the object code to create the executable.
On the other hand, if the data structures in the library have changed and the library requires the callers to manage memory (a bad practice, but unfortunately common), then you will need to recompile the code so it can account for the changes. For example, if your code uses malloc(sizeof(dataStructure)) to allocate memory for a library data structure that's doubled in size, you need to recompile your code because sizeof(dataStructure) will have a larger value.
There are two kinds of compatibility: API and ABI.
API compatibility is about functions and data structures which other programs may rely on. For instance if version 0.1 of libfoo defines an API function called "hello_world()", and version 0.2 removes it, any programs relying on "hello_world()" need updating to work with the new version of libfoo.
ABI compatibility is about the assumptions of how functions and, in particular, data structures are represented in the binaries. If for example libfoo 0.1 also defined a data structure recipe with two fields: "instructions" and "ingredients" and libfoo 0.2 introduces "measurements" before the "ingredients" field then programs based on libfoo 0.1 recipes must be recompiled because the "instructions" and "ingredients" fields will likely be at different positions in the 0.2 version of the libfoo.so binary.
What is a "library"?
If a "library" is only a binary (e.g. a dynamically linked library aka ".dll", ".dylib" or ".so"; or a statically linked library aka ".lib" or ".a") then there is no need to recompile; re-linking should be enough (and even that can be avoided in some special cases).
On the other hand, libraries often consist of more than just the binary object - e.g. the header-files might include some in-line (or macro) logic.
if so, re-linking is not enough, and you might have to re-compile in order to make use of the newest version of the lib.
What is the difference between those four terms? Can you please give examples?
Static and dynamic are jargon words that refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place at the time a program is constructed. Dynamic indicates that resolution takes place at the time a program is run.
Static and Dynamic Typing
Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)
Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.
Note that few real systems are either purely one or the other, it is just a question of which is the preferred strategy.
Static and Dynamic Binding
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Truly static binding is almost extinct. Earlier assemblers and FORTRAN, for example, would completely precompute the exact memory location of all variables and subroutine locations. This situation did not last long, with the introduction of stack and heap allocation for variables and dynamically-loaded libraries for subroutines.
So one must take some liberty with the definitions. It is the spirit of the concept that counts here: statically bound programs precompute as much as possible about storage layout as is practical in a modern virtual memory, garbage collected, separately compiled application. Dynamically bound programs wait as late as possible.
An example might help. If I attempt to invoke a method MyClass.foo(), a static-binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic-binding system will wait until run-time to see whether either exists.
Contrasts
The main strength of static strategies is that the program translator is much more aware of the programmer's intent. This makes it easier to:
catch many common errors early, during the build phase
build refactoring tools
incur a significant amount of the computational cost required to determine the executable form of the program only once, at build time
The main strength of dynamic strategies is that they are much easier to implement, meaning that:
a working dynamic environment can be created at a fraction of the cost of a static one
it is easier to add language features that might be very challenging to check statically
it is easier to handle situations that require self-modifying code
Typing - refers to variable types and whether variables are allowed to change type during program execution
http://en.wikipedia.org/wiki/Type_system#Type_checking
Binding - this, as you can read below, can refer to variable binding or library binding
http://en.wikipedia.org/wiki/Binding_%28computer_science%29#Language_or_Name_binding