Linking to a custom .a from multiple objects - linux

In our build system, we generate multiple .so files (foo.so, bar.so, ...) that are loaded during runtime by the main executable (biz). So the .so files are linked separately.
We also have our own util.a static library, that has some utility functions and global data.
The problem comes when some of the .so want to use util.a data/function, but we can't link each .so to util.a. It's because of the data section: global data must be unique in the program address space. If more than one .so is linked to util.a and has a copy of the data, the program behavior will be very surprising but hard to debug.
We can't link executable (biz) to util.a either. The linker will not put everything to the target, since biz doesn't reference the functions on behalf of .so.
Of course, unless linking util.a with -Wl,-whole-archive. But is there a better way to do this?

Solution 1: consider making util.a a dynamic library util.so.
Solution 2: don't let the linker export any symbols exported by util.a. When using gcc you can achieve this for example by using __attribute__((visibility("hidden"))):
int __attribute__((visibility("hidden"))) helperfunc(void *p);
You can use objdump to check which symbols are exported.

To answer myself's question, the eventual solution was like:
http://lists.gnu.org/archive/html/qemu-devel/2014-09/msg00099.html
TL;DR: Search for all the interesting symbols (that you want to pull from archives) inside the .so objects with nm (1), and inject into the compiling command line with -Wl,-u,$SYMBOL. Note that the -Wl,-u,$SYMBOL arguments need to come before archive names in the command line, so the linker knows that it needs to link them.

Related

When a shared library is loaded, is it possible that it references something in the current binary?

Say I have a binary server, and when it's compiled, it's linked from server.c, static_lib.a, and dynamically with dynamic_lib.so.
When server is executed and it loads dynamic_lib.so dynamically, but on the code path, dynamic_lib.so actually expects some symbols from static_lib.a. What I'm seeing is that, dynamic_lib.so pulls in static_lib.so so essentially I have two static_lib in memory.
Let's assume there's no way we can change dynamic_lib.so, because it's a 3rd-party library.
My question is, is it possible to make dynamic_lib.so or ld itself search the current binary first, or even not search for it in ld's path, just use the binary's symbol, or abort.
I tried to find some related docs about it, but it's not easy for noobs about linkers like me :-)
You can not change library to not load static_lib.so but you can trick it to use static_lib.a instead.
By default ld does not export any symbols from executables but you can change this via -rdynamic. This option is quite crude as it exports all static symbols so for finer-grained control you can use -Wl,--dynamic-list (see example use in Clang sources).

How to keep a specific symbol in binary file?

I have a static lib (my_static_lib) which I link to an executable binary file. Some of the symbols, but not all, are used in my binary.
A second library, dynamically loaded(my_shared_lib), is expecting to receive some symbols from my_static_lib through symbol injection from the binary. But those symbols are not used by my_binary, so they are stripped off the final bin file.
So, at runtime, my_shared_lib complains that it cannot find __my_stripped_symbols__ and crashes.
Is there a way to force the linker to keep __my_stripped_symbols__? I would prefer something that can be cleanly written in a Makefile.am (autotools)
(-binary file makefile)
-L$(top_builddir)/static_lib -lmy_static_lib --magic-flag-to-keep-stripped-symbol
I do not want to link my_static_lib with my_shared_lib because it will generate strange conflicts in other parts of a rather complex group of executables/shared libraries.
When you link my_static_lib to your application, you want to use the --whole-archive option. It's documented in the ld options docs.
If you're linking with gcc, it looks something like this:
-L$(top_builddir)/static_lib -Wl,-whole-archive -lmy_static_lib -Wl,-no-whole-archive
That will make sure the entire library is kept, and not just the specific functions that your executable uses.
You also need to make sure that the symbols get exported. If the symbols from your static library aren't being exported already, you make need a combination of -fvisibility=hidden and use __attribute__ ((visibility("default"))) to mark up the ones you want exported. You can read a little more about it in the gcc docs

dlopen() .so fails to find symbols in a stripped executable

I have an executable in linux - exe
This executable has some functions in it, that are used throughout the code:
sendMsg
debugPrint
I then want to dynamically load a .so that provides extra functionality to my executable.
In this shared library I include the headers for sendMsg and debugPrint.
I load this shared library with dlopen() and create an API with dlsym().
However, at dlopen() I use RTLD_NOW to resolve all symbols at load time.
It fails stating that it cannot find sendMsg symbol.
This symbol must be in the executable as the sendMsg.c is compiled in there.
However, my executable is stripped by the make process. As such, it would make sense that dlopen cannot find the symbol.
How can i solve this situation?
I could build the shared functions into a static library and link that static library into both exe and the .so. This would increase code size :(
I could remove the stripping of the exe so the symbols can be found
Do some compile time linking magic that I don't know about so the .so knows where the symbols are in exe
man ld:
-E
--export-dynamic
--no-export-dynamic
When creating a dynamically linked executable, using the -E option or the --export-dynamic option causes the linker to add all symbols to the dynamic symbol table. The
dynamic symbol table is the set of symbols which are visible from dynamic objects at run time.
If you do not use either of these options (or use the --no-export-dynamic option to restore the default behavior), the dynamic symbol table will normally contain only those
symbols which are referenced by some dynamic object mentioned in the link.
If you use "dlopen" to load a dynamic object which needs to refer back to the symbols defined by the program, rather than some other dynamic object, then you will probably
need to use this option when linking the program itself.
You can also use the dynamic list to control what symbols should be added to the dynamic symbol table if the output format supports it. See the description of
--dynamic-list.
Note that this option is specific to ELF targeted ports. PE targets support a similar function to export all symbols from a DLL or EXE; see the description of
--export-all-symbols below.
You can also pass the -rdynamic option to gcc/g++ (as noted int the comment). Depending on how you setup your make script, this will be convenient

Restricting symbols to local scope for linux executable

Can anyone please suggest some way we can restrict exporting of our symbols to global symbol table?
Thanks in advance
Hi,
Thanks for replying...
Actually I have an executable which is statically linked to a third party library say "ver1.a" and also uses a third party ".so" file which is again linked with same library but different version say "ver2.a". Problem is implementation of both these versions is different. At the beginning, when executable is loaded, symbols from "ver1.a" will get exported to global symbol table. Now whenever ".so" is loaded it will try to refer to symbols from ver2.a, it will end up referring to symbols from "ver1.a" which were previously loaded.Thus crashing our binary.
we thought of a solution that we wont be exporting the symbols for executable to Global symbol table, thus when ".so" gets loaded and will try to use symbols from ver2.a it wont find it in global symbol table and it will use its own symbols i.e symbols from ver2.a
I cant find any way by which i can restrict exporting of symbols to global symbol table. I tried with --version-script and retain-symbol-file, but it didn't work. For -fvisibility=hidden option, its giving an error that " -f option may only be used with -shared". So I guess, this too like "--version-script" works only for shared libraries not for executable binaries.
code is in c++, OS-Linux, gcc version-3.2. It may not be possible to recompile any of the third party libraries and ".so"s. So option of recompiling "so' file with bsymbolic flag is ruled out.
Any help would be appreciated.
Pull in the 3rd party library with dlopen.
You might be able to avoid that by creating your own shared lib that hides all the third party symbols and only exposes your own API to them, but if all else fails dlopen gives you complete control.
I had, what sounds like, a similar issue/question: Segfault on C++ Plugin Library with Duplicate Symbols
If you can rebuild the 3rd party library, you could try adding the linker flag -Bsymbolic (the flag to gcc/g++ would be -Wl,-Bsymbolic). That might solve your issue. It all depends on the organization of your code and stuff, as there are caveats to using it:
http://www.technovelty.org/code/c/bsymbolic.html
http://software.intel.com/en-us/articles/performance-tools-for-software-developers-bsymbolic-can-cause-dangerous-side-effects/
If you can't rebuild it, according to the first caveat link:
In fact, the only thing the -Bsymbolic
flag does when building a shared
library is add a flag in the dynamic
section of the binary called
DT_SYMBOLIC.
So maybe there's a way to add the DT_SYMBOLIC flag to the dynamic section post-linking?
The simplest solution is to rename the symbols (by changing source code) in your executable so they don't conflict with the shared library in the first place.
The next simplest thing is to localize the "problem" symbols with 'objcopy -L problem_symbol'.
Finally, if you don't link directly with the third party library (but dlopen it instead, as bmargulies suggests), and none of your other shared libraries use of define the "problem" symbol, and you don't link with -rdynamic or one of its equivalents, then the symbol should not be exported to the dynamic symbol table of the executable, and thus you shouldn't have a conflict.
Note: 'nm a.out' will still, show the symbol as globally defined, but that doesn't matter for dynamic linking. You want to look at the dynamic symbol table of a.out with 'nm -D a.out'.

The compilation process

Can anyone explain how compilation works?
I can't seem to figure out how compilation works..
To be more specific, here's an example.. I'm trying to write some code in MSVC++ 6 to load a Lua state..
I've already:
set the additional directories for the library and include files to the right directories
used extern "C" (because Lua is C only or so I hear)
include'd the right header files
But i'm still getting some errors in MSVC++6 about unresolved external symbols (for the Lua functions that I used).
As much as I'd like to know how to solve this problem and move on, I think it would be much better for me if I came to understand the underlying processes involved, so could anyone perhaps write a nice explanation for this? What I'm looking to know is the process.. It could look like this:
Step 1:
Input: Source code(s)
Process: Parsing (perhaps add more detail here)
Output: whatever is output here..
Step 2:
Input: Whatever was output from step 1, plus maybe whatever else is needed (libraries? DLLs? .so? .lib? )
Process: whatever is done with the input
Output: whatever is output
and so on..
Thanks..
Maybe this will explain what symbols are, what exactly "linking" is, what "object" code or whatever is..
Thanks.. Sorry for being such a noob..
P.S. This doesn't have to be language specific.. But feel free to express it in the language you're most comfortable in.. :)
EDIT: So anyway, I was able to get the errors resolved, it turns out that I have to manually add the .lib file to the project; simply specifying the library directory (where the .lib resides) in the IDE settings or project settings does not work..
However, the answers below have somewhat helped me understand the process better. Many thanks!.. If anyone still wants to write up a thorough guide, please do.. :)
EDIT: Just for additional reference, I found two articles by one author (Mike Diehl) to explain this quite well.. :)
Examining the Compilation Process: Part 1
Examining the Compilation Process: Part 2
From source to executable is generally a two stage process for C and associated languages, although the IDE probably presents this as a single process.
1/ You code up your source and run it through the compiler. The compiler at this stage needs your source and the header files of the other stuff that you're going to link with (see below).
Compilation consists of turning your source files into object files. Object files have your compiled code and enough information to know what other stuff they need, but not where to find that other stuff (e.g., the LUA libraries).
2/ Linking, the next stage, is combining all your object files with libraries to create an executable. I won't cover dynamic linking here since that will complicate the explanation with little benefit.
Not only do you need to specify the directories where the linker can find the other code, you need to specify the actual library containing that code. The fact that you're getting unresolved externals indicates that you haven't done this.
As an example, consider the following simplified C code (xx.c) and command.
#include <bob.h>
int x = bob_fn(7);
cc -c -o xx.obj xx.c
This compiles the xx.c file to xx.obj. The bob.h contains the prototype for bob_fn() so that compilation will succeed. The -c instructs the compiler to generate an object file rather than an executable and the -o xx.obj sets the output file name.
But the actual code for bob_fn() is not in the header file but in /bob/libs/libbob.so, so to link, you need something like:
cc -o xx.exe xx.obj -L/bob/libs;/usr/lib -lbob
This creates xx.exe from xx.obj, using libraries (searched for in the given paths) of the form libbob.so (the lib and .so are added by the linker usually). In this example, -L sets the search path for libraries. The -l specifies a library to find for inclusion in the executable if necessary. The linker usually takes the "bob" and finds the first relevant library file in the search path specified by -L.
A library file is really a collection of object files (sort of how a zip file contains multiple other files, but not necessarily compressed) - when the first relevant occurrence of an undefined external is found, the object file is copied from the library and added to the executable just like your xx.obj file. This generally continues until there are no more unresolved externals. The 'relevant' library is a modification of the "bob" text, it may look for libbob.a, libbob.dll, libbob.so, bob.a, bob.dll, bob.so and so on. The relevance is decided by the linker itself and should be documented.
How it works depends on the linker but this is basically it.
1/ All of your object files contain a list of unresolved externals that they need to have resolved. The linker puts together all these objects and fixes up the links between them (resolves as many externals as possible).
2/ Then, for every external still unresolved, the linker combs the library files looking for an object file that can satisfy the link. If it finds it, it pulls it in - this may result in further unresolved externals as the object pulled in may have its own list of externals that need to be satisfied.
3/ Repeat step 2 until there are no more unresolved externals or no possibility of resolving them from the library list (this is where your development was at, since you hadn't included the LUA library file).
The complication I mentioned earlier is dynamic linking. That's where you link with a stub of a routine (sort of a marker) rather than the actual routine, which is later resolved at load time (when you run the executable). Things such as the Windows common controls are in these DLLs so that they can change without having to relink the objects into a new executable.
Step 1 - Compiler:
Input: Source code file[s]
Process: Parsing source code and translating into machine code
Output: Object file[s], which consist[s] of:
The names of symbols which are defined in this object, and which this object file "exports"
The machine code associated with each symbol that's defined in this object file
The names of symbols which are not defined in this object file, but on which the software in this object file depends and to which it must subsequently be linked, i.e. names which this object file "imports"
Step 2 - Linking:
Input:
Object file[s] from step 1
Libraries of other objects (e.g. from the O/S and other software)
Process:
For each object that you want to link
Get the list of symbols which this object imports
Find these symbols in other libraries
Link the corresponding libraries to your object files
Output: a single, executable file, which includes the machine code from all all your objects, plus the objects from libraries which were imported (linked) to your objects.
The two main steps are compilation and linking.
Compilation takes single compilation units (those are simply source files, with all the headers they include), and create object files. Now, in those object files, there are a lot of functions (and other stuff, like static data) defined at specific locations (addresses). In the next step, linking, a bit of extra information about these functions is also needed: their names. So these are also stored. A single object file can reference functions (because it wants to call them when to code is run) that are actually in other object files, but since we are dealing with a single object file here, only symbolic references (their 'names') to those other functions are stored in the object file.
Next comes linking (let's restrict ourselves to static linking here). Linking is where the object files that were created in the first step (either directly, or after they have been thrown together into a .lib file) are taken together and an executable is created.
In the linking step, all those symbolic references from one object file or lib to another are resolved (if they can be), by looking up the names in the correct object, finding the address of the function, and putting the addresses in the right place.
Now, to explain something about the 'extern "C"' thing you need:
C does not have function overloading. A function is always recognizable by its name. Therefore, when you compile code as C code, only the real name of the function is stored in the object file.
C++, however, has something called 'function / method overloading'. This means that the name of a function is no longer enough to identify it. C++ compilers therefore create 'names' for functions that include the prototypes of the function (since the name plus the prototype will uniquely identify a function). This is known as 'name mangling'.
The 'extern "C"' specification is needed when you want to use a library that has been compiled as 'C' code (for example, the pre-compiled Lua binaries) from a C++ project.
For your exact problem: if it still does not work, these hints might help:
* have the Lua binaries been compiled with the same version of VC++?
* can you simply compile Lua yourself, either within your VC solution, or as a separate project as C++ code?
* are you sure you have all the 'extern "C"' things correct?
You have to go into project setting and add a directory where you have that LUA library *.lib files somewhere on the "linker" tab. Setting called "including libraries" or something, sorry I can't look it up.
The reason you get "unresolved external symbols" is because compilation in C++ works in two stages. First, the code gets compiled, each .cpp file in it's own .obj file, then "linker" starts and join all that .obj files into .exe file. .lib file is just a bunch of .obj files merged together to make distribution of libraries just a little bit simplier.
So by adding all the "#include" and extern declaration you told the compiler that somewhere it would be possible to find code with those signatures but linker can't find that code because it doesn't know where those .lib files with actual code is placed.
Make sure you have read REDME of the library, usually they have rather detailed explanation of what you had to do to include it in your code.
You might also want to check this out: COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY.

Resources