Extraneous Library Linkage


I have a question which may be somewhat silly, because I'm pretty sure I already know the answer.
Suppose you have static library A, dynamic shared object library B, and your program C under Linux. Suppose that library A calls functions from library B and your program calls functions from library A. Now suppose that none of the functions that C calls in A make use of functions in B.
To compile C, will it be enough to link just A and omit B? And furthermore, can your program C be run on a system without library B installed?

If your program calls only functions in A that don't reference B, then B is not required at either link or load time, assuming that the functions in A are in separate compilation units, which is usually the case for a library.
The linker will pull the functions from the library that C uses and since none of them call functions in B, B will not be needed.

Holy placeholder name overload, Batman. Let's first replace A, B, and C with libstatic, libshared, and myapp to make things a little more legible:
Suppose you have static library libstatic, and dynamic shared object library libshared and your program myapp under linux. Suppose that library libstatic calls functions from library libshared and your program (myapp) calls functions from library libstatic. Now suppose that all functions that myapp calls in libstatic make no use of functions in libshared.
To compile myapp will it be enough to link just libstatic and omit libshared and furthermore can your program myapp be run on a system without library libshared installed?
So, the way I understand your question, there is a library libstatic, some of whose functions make use of libshared. You want to know: if I don't use any of the libstatic functions that depend on libshared, will myapp link and run without libshared?
The answer is yes, so long as two things are true:
The calls you make into libstatic do not depend on libshared, directly or indirectly. That is, if myapp calls a function in libstatic which calls another function in libstatic which calls a function in libshared, then myapp is now dependent on libshared.
The calls you make into libstatic do not depend on any function in libstatic whose implementation appears in the same compilation unit (object file) as a call to libshared. The linker brings in code from the static library at the level of object files, not at the level of individual functions. And remember, this dependency is similarly chained: if you call a function in foo.o, and something else in foo.o calls a function in bar.o, and something in bar.o depends on libshared, you're toast.
When you link a static library into an application, only the object files that contain the symbols used (directly or indirectly) are linked. So if it turns out that none of the object files that myapp ends up needing from libstatic depend on libshared, then myapp doesn't depend on libshared.

Related

Why are some foreign functions statically linked while others are dynamically linked?

I'm working on a program that needs to manipulate git repositories. I've decided to use libgit2. Unfortunately, the Haskell bindings for it are several years out of date and lack several functions that I require. Because of this, I've decided to write the portions that use libgit2 in C and call them through the FFI. For demonstration purposes, one of them is called git_update_repo.
git_update_repo works perfectly when used in a pure C program; however, when it's called from Haskell, an assertion fails indicating that the libgit2 global init function, git_libgit2_init, hasn't been called. But git_libgit2_init is called by git_update_repo. And if I use gdb, I can see that git_libgit2_init is indeed called and reports that the initialization has been successful.
I've used nm to examine the executables and found something interesting. In a pure C executable, all the libgit2 functions are dynamically linked (as expected). However, in my Haskell executable, git_libgit2_init is dynamically linked, while the rest of the libgit2 functions are statically linked. I'm certain that this mismatch is the cause of my issue.
So why do certain functions get linked dynamically and others statically? How can I change this?
The relevant settings in my .cabal file are:
    cc-options: -g
    c-sources:
        src/git-bindings.c
    extra-libraries:
        git2

Compile part of all dependencies as shared libraries

Say I got (regular source) libraries A and B and executable E which depends on both.
Now, I want E to include the object files of A directly, whereas B should be added as a shared library (concrete use: B contains shared types of a plugin architecture). How would I do that with existing tools, preferably stack?
Is that possible or is it rather an all-or-nothing choice (use only shared libraries or link everything into the same binary)?
Optimally, I'd like to specify for each dependency if it should be linked statically or dynamically. Also, that should probably go into the .cabal file, but we have to work with what we got...
(Well, technically that's both statically linked, but in the second case the object code is split up in different files, you get the idea).

Is there an equivalent of dyld for Linux?

Mac OS X provides a useful library for dynamic loading, called dyld. Among the many interesting functions for dynamic loading handling are functions to allow one to install callbacks which will be called by dyld whenever an image is loaded or unloaded, by dlopen and dlclose, respectively. Those functions are void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)) and void _dyld_register_func_for_remove_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)), respectively.
I know it's not possible to have an exact port for Linux, because dyld functions deal with mach-o files and Linux uses ELF files.
So, is there an equivalent of the dyld library for Linux? Or, at least, is there an equivalent of those two functions, _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image, in any Linux library? Or will I have to implement my own versions of these two myself? That is not so hard, but I would have to find a way to make dlopen and dlclose call callback functions whenever they get called.
EDIT
To make things clearer: I need to make a library that has a callback function that must be called whenever an external library is dynamically loaded by dlopen. My callback function must perform some operations on every dynamically loaded library.

Yes, it is called dlopen(3), from the standard -ldl library.
More precisely:
compile your plugin's source code with the -fPIC flag to get position-independent object files *.pic.o
make a shared-library plugin by linking your *.pic.o files with gcc -shared (you can also link other shared libraries into it).
use GCC function attributes, notably constructor and destructor functions (or static C++ data with explicit constructors & destructors, hence the name). Functions in your plugin marked __attribute__((constructor)) are called at dlopen time, and those marked __attribute__((destructor)) are called at dlclose time.
link the main program with the -rdynamic flag; this is needed as soon as the plugin calls functions defined in the main program.
don't forget to declare your C++ plugin functions extern "C" (so the main program can look them up by their unmangled names).
use dlsym inside your main program to fetch function or data addresses inside your plugin.
There are indeed no hooks for dlopen like _dyld_register_func_for_add_image. You may want to use constructor functions and/or dl_iterate_phdr(3) to mimic that.
If you can change the plugin (the shared object which you dlopen), you could play constructor tricks inside it to mimic such hooks. Otherwise, use your own convention (e.g. a plugin that has a module_start function gets that function called just after dlopen, etc.).
Some libraries wrap dlopen into something higher-level. For example, Qt has QPluginLoader, QLibrary, etc.
There is also the LD_PRELOAD trick (perhaps you could redefine your own dlopen & dlclose through such a trick, and have your modified functions run the hooks). The ifunc function attribute might also be relevant.
And since GNU libc, which provides dlopen, is free software (as is musl libc), you could patch it to suit your needs. dladdr(3) could be useful too!
addenda
If you are making your own runtime for some Objective-C, you should know well the conventions of the Objective-C compiler using that runtime, and you probably could have your own module loader, instead of overloading dlopen...

Is it possible to embed Haskell in a C library opaquely?

i.e. is it possible to embed Haskell code in a C library so that the user of the library doesn't have to know Haskell is being used? In particular, so that the user could use multiple libraries that embed Haskell, without any conflicts?
As far as I understand things, you embed between calls to hs_init and hs_exit, but these involve global state shenanigans and would conflict with other calls, no?

Yes, it's possible to call Haskell code from C (and vice versa) through the FFI, the Foreign Function Interface. Unfortunately, as the haskell.org docs say, you can't avoid the calls to initialize and finalize the Haskell environment:
The call to hs_init() initializes GHC's runtime system. Do NOT try to invoke any Haskell functions before calling hs_init(): bad things will undoubtedly happen.
But this is also interesting:
There can be multiple calls to hs_init(), but each one should be matched by one (and only one) call to hs_exit()
And furthermore:
The FFI spec requires the implementation to support re-initialising itself after being shut down with hs_exit(), but GHC does not currently support that.
Basically, my idea is that you may exploit these specifications to write yourself a C++ wrapper class that manages the calls to hs_init and hs_exit for you, for example by using template methods surrounded by hs_init and hs_exit that you can override with any Haskell call you want.
However, beware of interactions with other libraries calling Haskell code: nested layers of calls to hs_init and hs_exit should be OK (so it's safe to use libraries which call them in between your wrappers), but the total number of calls should always match, meaning that if those libraries only initialize the environment without trying to close it, then it's up to you to finish the job.
Another (probably better) idea, without exploiting inheritance and overriding, may be to have a simple class HaskellEnv that calls hs_init in the constructor and hs_exit in the destructor. If you declare them as automatic variables, the calls to hs_init and hs_exit will always be matched, and the last call to hs_exit will be made as soon as the last HaskellEnv object is destroyed when you leave its scope.
Have a look at this question in order to prevent the creation of objects on the heap (they may be dangerous in this case).

Tools that list the prototypes in .so library

Is there a tool (like a command) in Linux that lists the prototypes in a .so library?
I found nm close to my need, but what I got are just symbols.

If the library is a C library, it does not by itself contain the signatures of its functions. These are in the header files (which the library should provide), unless the .so library has been compiled with debugging information enabled by -g (which is not usual for production libraries).
Even in C++, the .so library (without -g) doesn't contain the declarations of the classes involved. The mangled names only refer to class or type names...
In short, you need the header files of the library. Most Linux distributions package them separately from the library itself. For instance, on Debian you have both the libjansson4 package (containing the .so shared library, needed to run applications linked with the Jansson library) and the libjansson-dev package (containing the header files and other files useful to build an application calling functions in the Jansson library). Debian also provides libjansson-dbg (for the debugging information or debug variant of the library) and libjansson-doc (for the documentation) packages.

Simple answer: no you cannot do that (for C).
Longer answer:
You can get "prototypes", as you call them, ONLY for C++, because C++ function names are mangled. Mangling means encoding the whole function signature (or prototype, if you like) into one string of characters without spaces, e.g.:
CCertificate::GetInfo(Utils::TCertInfo&) const
which is in mangled form:
_ZNK12CCertificate7GetInfoERN5Utils9TCertInfoE
Mangling was introduced because of function overloading in C++ (functions with the same name but taking different numbers and/or types of parameters). In C you do not have overloading, so functions are identified (in shared libraries) by their name, which is NOT mangled.
To summarise: all functions in shared libraries are identified by name, but for C++ these names are mangled names, for C they are not mangled.
Mangling has the additional "side effect" that you can recover the function signature (e.g. by invoking nm -C).
Hope that helps.
