I'd like to understand the dynamic linker/loader behaviour on a Linux box in the problematic case I'm working on.
Our code that crashes is loaded as a plugin (dlopen(libwrapper.so, RTLD_GLOBAL)). libwrapper.so is just a thin layer that loads other plugins that do the real job. Call these plugins P1 and P2; each of them depends on a common library called F (all of this is very much simplified).
The wrapper (libwrapper.so) was introduced to allow loading the Pn without RTLD_GLOBAL, since that flag leads to obvious linkage problems when loading the Pn (they expose the same API). RTLD_DEEPBIND is not an option since the target platform is too old and does not support it.
To our surprise, the problem manifests in the F library at load time of P2, when P1 (and F as its implicit dependency) is already loaded and initialized. When P2 is explicitly loaded (dlopen(libP2.so, RTLD_LOCAL | RTLD_NOW)), the dynamic linker reports no problems, but running code within F that instantiates some types defined in F (again) leads to segmentation faults in various places (if one crash site is skipped / commented out, it crashes somewhere else, so I haven't spent time investigating the exact code pattern; a more general problem or misunderstanding is suspected). No inlined functions are used, the code is linked with -Wl,-E, visibility is default, GCC is 3.4.4. The F code is very stable and has been used in standalone apps and as part of plugins in the past.
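The loading scheme, very roughly (a simplified sketch of the host side; the real wrapper drives the plugin loads through its own API, and error handling is reduced to a message):

#include <dlfcn.h>
#include <stdio.h>

static void *load_plugin(const char *path)
{
    /* Pn are loaded without RTLD_GLOBAL so their identical APIs
       do not clash in the global symbol scope */
    void *h = dlopen(path, RTLD_LOCAL | RTLD_NOW);
    if (!h)
        fprintf(stderr, "dlopen %s failed: %s\n", path, dlerror());
    return h;
}

int main(void)
{
    /* the host loads only the thin wrapper with RTLD_GLOBAL */
    void *wrapper = dlopen("libwrapper.so", RTLD_GLOBAL | RTLD_NOW);
    if (!wrapper)
        return 1;

    /* both plugins pull in libF.so as an implicit dependency */
    void *p1 = load_plugin("libP1.so");
    void *p2 = load_plugin("libP2.so");   /* the crash follows shortly after this */
    (void)p1; (void)p2;
    return 0;
}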
I tried linking F as a static library to work around any problem there might be with the dynamic linker, but the result is the same.
My view on the topic:
linking F as a dynamic library leads the dynamic linker to "know" that F is referenced a second time when loading P2, so it just increments the reference counter and does not call the static initializers again (which is OK), but it does perform relocations again, and this seems to be problematic.
linking F as a static library leads the dynamic linker to load the F code as a statically linked part of P2 (P2F) and to perform relocations within P2F. However, "somehow" common symbols from F get mixed up with the P1F code instance.
My assumption about a workaround to at least make the code work:
link P1 ... Pn into a single shared library (a single plugin); whether F is shared or static doesn't matter. This way any relocation is done only once.
I'd appreciate any feedback: is my view on the topic wrong, too simplified, or missing an important part? Is this some known GCC / binutils bug from the past?
My view on the topic:
Your view on the topic is wrong; but there is no way to prove that to you.
Write a minimal test case that simulates what your system does, and still crashes in a similar way. Update your question with actual broken code; then we can tell you exactly what the problem is.
There is also a very good chance that in reducing the problem to the minimal example, you'll discover what the problem is yourself.
Either way you'll understand the problem, and will learn something new.
Related
I'm working on a program that needs to manipulate git repositories. I've decided to use libgit2. Unfortunately, the Haskell bindings for it are several years out of date and lack several functions that I require. Because of this I've decided to write the portions that use libgit2 in C and call them through the FFI. For demonstration purposes, one of them is called git_update_repo.
git_update_repo works perfectly when used in a pure C program; however, when it's called from Haskell, an assertion fails, indicating that the libgit2 global init function, git_libgit2_init, hasn't been called. But git_libgit2_init is called by git_update_repo. And if I use gdb, I can see that git_libgit2_init is indeed called and reports that the initialization has been successful.
I've used nm to examine the executables and found something interesting. In a pure C executable, all the libgit2 functions are dynamically linked (as expected). However, in my Haskell executable, git_libgit2_init is dynamically linked, while the rest of the libgit2 functions are statically linked. I'm certain that this mismatch is the cause of my issue.
So why do certain functions get linked dynamically and others statically? How can I change this?
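To make this concrete, a stripped-down sketch of the C side (the real git_update_repo does more; the body below is only illustrative, but the libgit2 calls in it are real API):

#include <git2.h>

/* illustrative sketch of the C wrapper; the actual git_update_repo
   does more work than shown here */
int git_update_repo(const char *path)
{
    git_repository *repo = NULL;
    int err;

    git_libgit2_init();                  /* the init the failing assertion is about */
    err = git_repository_open(&repo, path);
    if (err == 0) {
        /* ... the actual update work is elided ... */
        git_repository_free(repo);
    }
    git_libgit2_shutdown();
    return err;
}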
The relevant settings in my .cabal file are
cc-options: -g
c-sources:
  src/git-bindings.c
extra-libraries:
  git2
Does anyone know the general rule for exactly which LLVM IR code will be executed before main?
When using Clang++ 3.6, it seems that global class variables have their constructors called via a function in the ".text.startup" section of the object file. For example:
define internal void @__cxx_global_var_init() section ".text.startup" {
  call void @_ZN7MyClassC2Ev(%class.MyClass* @M)
  ret void
}
From this example, I'd guess that I should be looking for exactly those IR function definitions that specify section ".text.startup".
I have two reasons to suspect my theory is correct:
I don't see anything else in my LLVM IR file (.ll) suggesting that the global object constructors should be run first, if we assume that LLVM isn't sniffing for C++-specific function names like "__cxx_global_var_init". So section ".text.startup" is the only obvious means of saying that code should run before main(). But even if that's correct, we've only identified a sufficient condition for causing a function to run before main(); we haven't shown that it's the only way in LLVM IR to cause a function to run before main().
The GNU linker, in some cases, will use the first instruction in the .text section as the program entry point. This article on Raspberry Pi programming describes making the .text.startup content the first body of code in the program's .text section, as a means of causing the .text.startup code to run first.
Unfortunately I'm not finding much else to support my theory:
When I grep the LLVM 3.6 source code for the string ".startup", I only find it in the Clang-specific parts of the LLVM code. For my theory to be correct, I would expect to find that string in other parts of the LLVM code as well; in particular, in parts outside of the C++ front-end.
This article on data initialization in C++ seems to hint at ".text.startup" having a special role, but it doesn't come right out and say that the Linux program loader actually looks for a section of that name. Even if it did, I'd be surprised to find a potentially Linux-specific section name carrying special meaning in platform-neutral LLVM IR.
The Linux 3.13.0 source code doesn't seem to contain the string ".startup", suggesting to me that the program loader isn't sniffing for a section with the name ".text.startup".
The answer is pretty easy - LLVM is not executing anything behind the scenes. It's the job of the C runtime (CRT) to perform all the necessary preparations before running main(). This includes (but is not limited to) running static ctors and similar things. The runtime is usually informed about these objects via the addresses of the constructors being emitted into special sections (e.g. .init_array or .ctors). See e.g. http://wiki.osdev.org/Calling_Global_Constructors for more information.
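For illustration, a small C sketch of the same mechanism (the constructor attribute is a GCC/Clang extension): the compiler emits the function's address into .init_array (or .ctors on older toolchains), and the C runtime walks that array before calling main():

#include <stdio.h>

/* registered in .init_array and run by the C runtime before main() */
__attribute__((constructor))
static void before_main(void)
{
    puts("runs before main");
}

int main(void)
{
    puts("main");
    return 0;
}

/* inspect the registration with: objdump -s -j .init_array ./a.out */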
I noticed some time ago that the "Watch" window in VS2012 for Web doesn't work for built-in functions in F#. For example, cos someValue doesn't work, and neither does the workaround where let _cos = cos or let _cos x = cos x is inserted at the beginning of the function and _cos(someValue) is used. The error is something like "cos doesn't exist in the current context" or "_cos isn't valid in the current scope", among others.
Should I change some settings, or is this an unexpected bug? Of course I can declare all the results I need to watch, but that's a bit of overhead and quite impractical. What can I do to fix this?
As mentioned in the referenced answer, the Watch and Immediate windows only support C#, so they are not able to evaluate F# expressions and they are not aware of the F# context (such as opened namespaces).
In summary, storing the result in a local variable (which is compiled to an ordinary local variable) is the best way to see the result.
More details:
In some cases, you can write C# code that corresponds to what you want to do in F#. This is probably only worth doing in simple situations, when the corresponding C# is not too hard to write, but it can often be done.
For example, to call cos 3.14, you need to write something like:
Microsoft.FSharp.Core.Operators.Cos(3.14)
If you find the cos function in the F# source code (it is right here, in prim-types.fsi), you can see that it comes with a CompiledName attribute that tells the compiler to compile it as a method named Cos (to follow .NET naming guidelines). It is defined in a module named Operators (see it here), which is annotated with AutoOpen, so you do not need to explicitly write open in your F# code; but Operators is actually the name of the class that the F# compiler generates when compiling that module.
With respect to the following link:
http://www.archlinux.org/news/libpnglibtiff-rebuilds-move-from-testing/
Could someone explain to me why a program should be rebuilt after one of its libraries has been updated?
How does that make any sense since the "main" file is not changed at all?
If the signatures of the functions involved haven't changed, then "rebuilding" the program means that the object files must be linked again. You shouldn't need to compile them again.
An API is a contract that describes the interface to the public functions in a library. When the compiler generates code, it needs to know what type of variables to pass to each function, and in what order. It also needs to know the return type, so it knows the size and format of the data that will be returned from the function. When your code is compiled, the address of a library function may be represented as "start of the library, plus 140 bytes." The compiler doesn't know the absolute address, so it simply specifies an offset from the beginning of the library.
But within the library, the contents (that is, the implementations) of the functions may change. When that happens, the length of the code may change, so the addresses of the functions may shift. It's the job of the linker to understand where the entry points of each function reside, and to fill those addresses into the object code to create the executable.
On the other hand, if the data structures in the library have changed and the library requires the callers to manage memory (a bad practice, but unfortunately common), then you will need to recompile the code so it can account for the changes. For example, if your code uses malloc(sizeof(dataStructure)) to allocate memory for a library data structure that's doubled in size, you need to recompile your code because sizeof(dataStructure) will have a larger value.
There are two kinds of compatibility: API and ABI.
API compatibility is about functions and data structures which other programs may rely on. For instance if version 0.1 of libfoo defines an API function called "hello_world()", and version 0.2 removes it, any programs relying on "hello_world()" need updating to work with the new version of libfoo.
ABI compatibility is about the assumptions of how functions and, in particular, data structures are represented in the binaries. If, for example, libfoo 0.1 also defined a data structure called recipe with two fields, "instructions" and "ingredients", and libfoo 0.2 introduces a "measurements" field before the "ingredients" field, then programs built against libfoo 0.1's recipe must be recompiled, because the "ingredients" field will be at a different offset (and the structure will have a different size) in the 0.2 version of the libfoo.so binary.
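To make the layout shift concrete, here is a contrived C sketch (the v1/v2 struct names are made up for the example; "recipe", "instructions", "ingredients" and "measurements" are the names used above):

#include <stdio.h>
#include <stddef.h>

/* layout of recipe as defined by libfoo 0.1 (what the program was built against) */
struct recipe_v1 {
    char *instructions;
    char *ingredients;
};

/* layout as defined by libfoo 0.2: "measurements" is inserted before
   "ingredients", so the offset of "ingredients" and the struct size change */
struct recipe_v2 {
    char *instructions;
    char *measurements;
    char *ingredients;
};

int main(void)
{
    printf("0.1: size=%zu, ingredients at offset %zu\n",
           sizeof(struct recipe_v1), offsetof(struct recipe_v1, ingredients));
    printf("0.2: size=%zu, ingredients at offset %zu\n",
           sizeof(struct recipe_v2), offsetof(struct recipe_v2, ingredients));
    /* a program compiled against 0.1 keeps using the 0.1 size and offsets,
       which no longer match what the 0.2 libfoo.so expects, hence the rebuild */
    return 0;
}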
What is a "library"?
If a "library" is only a binary (e.g. a dynamically linked library aka ".dll", ".dylib" or ".so"; or a statically linked library aka ".lib" or ".a") then there is no need to recompile, re-linking should be enough (and even that can be avoided in some special cases)
On the other hand, libraries often consist of more than just the binary object - e.g. the header-files might include some in-line (or macro) logic.
if so, re-linking is not enough, and you might to have to re-compile in order to make use of the newest version of the lib.
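A tiny, hypothetical C illustration of that last point: the macro value and the inline body live in the header and are compiled into the calling program, so replacing the library binary alone does not update them:

#include <stdio.h>

/* pretend the next few lines come from libfoo's header, foo.h (hypothetical);
   both the macro value and the inline body get compiled into the caller */
#define FOO_DEFAULT_TIMEOUT 30

static inline int foo_timeout_ms(void)
{
    return FOO_DEFAULT_TIMEOUT * 1000;
}

int main(void)
{
    /* resolved at compile time from the header contents, not from the
       libfoo binary, so installing a new libfoo.so alone changes nothing
       here; the program has to be recompiled against the new header */
    printf("timeout: %d ms\n", foo_timeout_ms());
    return 0;
}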
What is the difference between these 4 terms? Can you please give examples?
Static and dynamic are jargon words that refer to the point in time at which some programming element is resolved. Static indicates that resolution takes place at the time a program is constructed. Dynamic indicates that resolution takes place at the time a program is run.
Static and Dynamic Typing
Typing refers to changes in program structure that are due to the differences between data values: integers, characters, floating point numbers, strings, objects and so on. These differences can have many effects, for example:
memory layout (e.g. 4 bytes for an int, 8 bytes for a double, more for an object)
instructions executed (e.g. primitive operations to add small integers, library calls to add large ones)
program flow (simple subroutine calling conventions versus hash-dispatch for multi-methods)
Static typing means that the executable form of a program generated at build time will vary depending upon the types of data values found in the program. Dynamic typing means that the generated code will always be the same, irrespective of type -- any differences in execution will be determined at run-time.
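As a toy illustration in a statically typed language (here C): the same-looking + is translated into different generated code and memory layout depending on the declared types of its operands, and that choice is fixed at build time:

#include <stdio.h>

int main(void)
{
    int    a = 2,   b = 3;
    double x = 2.0, y = 3.0;

    int    s = a + b;   /* compiled to an integer add on 4-byte values */
    double t = x + y;   /* compiled to a floating-point add on 8-byte values */

    printf("%d %f\n", s, t);
    return 0;
}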
Note that few real systems are either purely one or the other; it is just a question of which is the preferred strategy.
Static and Dynamic Binding
Binding refers to the association of names in program text to the storage locations to which they refer. In static binding, this association is predetermined at build time. With dynamic binding, this association is not determined until run-time.
Truly static binding is almost extinct. Earlier assemblers and FORTRAN, for example, would completely precompute the exact memory location of all variables and subroutine locations. This situation did not last long, with the introduction of stack and heap allocation for variables and dynamically-loaded libraries for subroutines.
So one must take some liberty with the definitions. It is the spirit of the concept that counts here: statically bound programs precompute as much about storage layout as is practical in a modern virtual-memory, garbage-collected, separately compiled application. Dynamically bound programs wait as late as possible.
An example might help. If I attempt to invoke a method MyClass.foo(), a static-binding system will verify at build time that there is a class called MyClass and that class has a method called foo. A dynamic-binding system will wait until run-time to see whether either exists.
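A rough C analogue of that contrast ("foo" here is just a placeholder symbol): the direct call is resolved when the program is built, while the dlsym lookup defers the check to run time:

#include <dlfcn.h>
#include <stdio.h>

void foo(void) { puts("hello from foo"); }

int main(void)
{
    /* statically bound: the existence of foo() is checked when the
       program is built, and the call site is fixed before run time */
    foo();

    /* dynamically bound: only at run time do we learn whether a symbol
       named "foo" exists (link with -rdynamic so the main program's
       symbols are visible to dlsym; add -ldl on older glibc) */
    void *self = dlopen(NULL, RTLD_NOW);
    void (*f)(void) = (void (*)(void))dlsym(self, "foo");
    if (f)
        f();
    else
        puts("no symbol named \"foo\" found at run time");
    return 0;
}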
Contrasts
The main strength of static strategies is that the program translator is much more aware of the programmer's intent. This makes it easier to:
catch many common errors early, during the build phase
build refactoring tools
incur a significant amount of the computational cost required to determine the executable form of the program only once, at build time
The main strength of dynamic strategies is that they are much easier to implement, meaning that:
a working dynamic environment can be created at a fraction of the cost of a static one
it is easier to add language features that might be very challenging to check statically
it is easier to handle situations that require self-modifying code
Typing - refers to variable types and whether variables are allowed to change type during program execution
http://en.wikipedia.org/wiki/Type_system#Type_checking
Binding - this, as you can read below, can refer to variable binding or library binding
http://en.wikipedia.org/wiki/Binding_%28computer_science%29#Language_or_Name_binding