GNU linker doesn't include unreferenced modules in a shared library - Linux

I have a shared library that consists of quite a few .c modules, some of which are themselves linked into the shared library from other static .a libraries. Most of these are referenced internally within the library, but some are not. I'm finding that the linker does not include those modules in the shared library unless there is at least one call to a function in the module from within the shared library. I've been working around this problem by adding calls in a dummy ForceLinkages() function in a module that I know will be included.
That's okay, but it's surprising, since I'm using a .version file to define a public API to this library. I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
This library was originally developed on AIX, which uses a .exp file to define the public API. There, I never had the issue of unreferenced modules not getting included; referencing the modules in the .exp file was enough to get the linker to pull them in. Is there a way to get the Linux linker to work like that? If not, I can continue to use my silly ForceLinkages() function to get the job done...

That's okay, but it's surprising, since I'm using a .version file to define a public API to this library.
The .version file (assuming it's a linker version script) does not define the library API. It only determines which functions are exported from the library (and with which version label) and which are hidden.
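For illustration, a minimal version script of the kind being discussed might look like this (the symbol and version names are hypothetical):

LIBEXAMPLE_1.0 {
  global:
    example_open;
    example_close;
  local:
    *;
};

Everything under global: is exported with that version label; symbols matched by local: are hidden. None of this influences which object files the linker pulls into the library in the first place.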
I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
The version script is applied after the linker has decided which objects are going to be part of the library, and which are going to be discarded, and has no effect on the decisions taken earlier.
This is all working as designed.
You need to either bracket the static archives with --whole-archive ... --no-whole-archive (this risks linking in code you don't need and bloating your binaries; see the sketch below), or keep adding references as you've done before.
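A minimal sketch of the first option, assuming a hypothetical archive libextra.a holding the unreferenced modules (objects assumed compiled with -fPIC):

gcc -shared -o libmylib.so a.o b.o \
    -Wl,--whole-archive libextra.a -Wl,--no-whole-archive

Every object between the two flags is pulled in wholesale, referenced or not. Be sure to close with --no-whole-archive so that any archives appearing later on the command line are treated normally again.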

Related

How do I statically compile a C library into a Haskell module that I can later load with the GHC API?

Here is my desired use case:
I have a package with a single module that reads HDF5 files and writes some of their data to Haskell records. To do the work, the library uses the bindings-hdf5 package. Here is my cabal file's build-depends. reader-types is a package I wrote that defines the types of the Haskell records that contain the read-in data.
build-depends: base >=4.7 && <4.8
, text
, vector
, containers
, bindings-hdf5
, reader-types
Note that my cabal file does not currently use extra-libraries or ghc-options. I can load my module, src/Mabel.hs, in ghci as long as I specify the required hdf5_hl library:
ghci src/Mabel.hs -lhdf5_hl -L/long/nixos/path/lib
and within ghci, I can run my function perfectly fine.
Now, what I want to do is compile this library/module into a single, compiled file that I can later load with the GHC API in a different Haskell program. By single file, I mean that it needs to run even if the hdf5_hl library does not exist on the system. Preferably, it would also run even if text, vector, and/or containers are missing, but this is not essential because reader-types requires those types anyway. When loading the module with the GHC API, I want it to load in already compiled form, and not run interpreted.
My purpose for doing this is that I want the self-contained file to act as a single, pre-compiled plugin file that is later loaded and executed by a different Haskell executable. Other plugins might not use hdf5 at all, and the only package they are guaranteed to use is reader-types, which essentially defines the plugin interface types.
The hdf5 library on my system contains the following files: libhdf5_hl.la, libhdf5_hl.so, libhdf5.la, libhdf5.so, and similar files that have the version number in the file name.
I have done a lot of googling, but am getting confused by all the edge cases I am finding. Here are some examples that I'm either sure don't fit my case, or I can't tell.
I do not want to compile a Haskell library to use from C or Python, only a Haskell program using GHC API.
I do not want to compile C wrappers for a C++ library into a Haskell module because the bindings already exist and the library is already a C library.
I do not want to compile a library that is entirely self-contained because, since I am loading it with the GHC API, I don't need the GHC runtime included in the library. (My understanding is that the plugins must be compiled with the same ghc version they will be loaded with in the GHC API.)
I do not want to compile C bindings and the C library at the same time because the C library is already compiled and the bindings are specified in separate package (bindings-hdf5).
The closest resource for what I want to do is this exchange on the mailing list from 2009. However, I added extra-libraries: hdf5_hl or extra-libraries: hdf5 to my cabal file, and in both cases the resulting .a, .so, .dyn_hi, .dyn_o, .hi, and .o files in dist/build are all the exact same size as without using extra-libraries, so I'm confident it is not working correctly.
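For reference, the attempted stanza would look roughly like this (the extra-lib-dirs path reuses the placeholder from the ghci invocation above):

library
  build-depends:   base >=4.7 && <4.8
                 , text
                 , vector
                 , containers
                 , bindings-hdf5
                 , reader-types
  extra-libraries: hdf5_hl
  extra-lib-dirs:  /long/nixos/path/lib

Note that extra-libraries only records a link-time flag (effectively -lhdf5_hl) for consumers of the package; it does not copy the C library's code into the Haskell build products, which would explain the unchanged file sizes.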
What changes to my cabal file do I need to make to create a self-contained, standalone file that I can later load with the GHC API? If this is not possible, what are the alternatives?
Instead of using the GHC API, I am also open to using the plugins library to load the plugin, but the self-contained requirements are still the same.
EDIT: I do not care what form the compiled "plugin" must take (I assume an object file is the right way), but I want to load it dynamically from a separate executable at run time and execute functions it defines with known names and known types. The reason I want a single file is that there will eventually be other, different plugins, and I want them all to behave the same way without having to worry about lib paths and dependencies for each one. A compiled, single file is a simpler interface for doing this than zipping/unzipping archives that include Haskell object code and their dependencies.

Remove references in a shared object to another shared object

I want to remove references in a shared object to symbols of another shared object because the referenced object is GPL'd and I don't want to release my code under that license. (I don't think that the symbols provided by the referenced object are used by my code.) What are the steps to take to do this on Linux? I'm not steeped in this technology and it would help if you provided commands. Would another approach be better? Would it be better to somehow create a stub object replacing the referenced object?
Edit 1: I am using PyInstaller to build a self-contained archive containing the binaries for the code I've written along with all libraries that code requires and all libraries those libraries require. These libraries are shared objects that already exist on the build system. It would be too much work to forego PyInstaller and re-compile everything so that the GPL'd libraries are not linked in.
On Linux, the only libraries directly referenced by an executable or library are libc.so, ld-linux.so, linux-gate.so, plus anything you explicitly request on the compiler command line. As such, you can remove these references simply by removing them from the compiler command line.
Note that many times, pkg-config scripts will return all indirect dependencies as well as direct dependencies when queried for linker flags. You can either remove the unnecessary dependencies manually, or pass the -Wl,--as-needed flag to the linker to instruct it to remove unnecessary direct references to shared libraries automatically.
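A sketch of the latter approach (the package name bar is hypothetical):

gcc -Wl,--as-needed -o myprog main.o $(pkg-config --libs bar)
readelf -d myprog | grep NEEDED    # list the shared objects actually recorded

With --as-needed in effect, a NEEDED entry is recorded only for libraries that resolve at least one symbol, so unused indirect dependencies returned by pkg-config drop out automatically.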
As for the pyinstaller bundling, keep in mind that indirectly linking GPL'd libraries via an intermediate library is already kind of a grey area; if you in addition merge them into a single file, this may not count as 'mere aggregation' and may not avoid the GPL restrictions. Also note that the GPL never mentions 'linking'; it's all about derivative works. I'm no lawyer and this is not legal advice, but the mere addition or removal of a NEEDED entry for a GPL'd library when no symbols are used seems unlikely to affect whether you are in violation of the GPL, when the GPL itself never mentions such a thing.

Linking against shared objects at compile time

In Windows, many .dlls come with a static .lib counterpart. My understanding is that the .lib counterpart basically contains the LoadLibrary/GetProcAddress boilerplate so that the programmer doesn't have to write it him/herself. Essentially, a time saver. When I switched to Linux, I was assuming the situation was the same, replacing .dll with .so and .lib with .a, but I have come across a situation that shows me this is wrong, and I can't figure out what is going on:
I am using a library that comes as a .a/.so pair. I was linking against the .a, but when I executed ldd on the binary that was produced, it contained no reference to the corresponding .so file. So then, I tried linking against the .so file and to my surprise, this worked. In addition, the .so file showed up when I executed ldd against the resulting binary.
So, I am really confused as to what is going on. In Windows, I would never think to link against a .dll file. Also, in Windows, if a .dll file was accompanied by a .lib and I linked against the .lib at compile time, then I would expect to have a dependency on the corresponding .dll at runtime. Neither of those things is true in this case.
Yes, I have read the basic tutorials about shared objects in Linux, but everything I read seems to indicate that my initial assumption was correct. By the way, I should mention that I am using Code::Blocks as an IDE, which I know complicates things, but I am 99% sure that when I tell it to link against the .so file, it is not simply swapping out the .a file because the resulting binary is smaller. (Plus the whole business about ldd...)
Anyway, thanks in advance.
I was linking against the .a, but when I executed ldd on the binary that was produced, it contained no reference to the corresponding .so file.
This is expected. When you link statically, the static library's code is integrated into the resulting binary. There are no more references to or dependencies on the static library.
So then, I tried linking against the .so file and to my surprise, this worked.
What do you mean, that the static linking did not work? There's no reason that it shouldn't...
.lib files are used in Windows to link dynamically. You don't have them on Linux; you link against the .so directly.
The .a file is the statically built library; you use it to link statically.
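A quick way to see the difference, assuming a hypothetical libfoo that ships in both forms:

# static: libfoo.a's object code is copied into the binary
gcc -o app_static main.o /path/to/libfoo.a
ldd app_static     # no libfoo entry appears

# dynamic: only a NEEDED entry for libfoo.so is recorded
gcc -o app_dynamic main.o -L/path/to -lfoo
ldd app_dynamic    # shows libfoo.so

(When both libfoo.a and libfoo.so are visible to -lfoo, the linker prefers the shared one unless you pass -static or name the .a file explicitly.)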
To add to the already correct answer by tharibo: in some situations (e.g. delayed shared library load) it might be desirable to do it the Windows way, i.e. by linking against a static stub instead of the .so. Such stubs can be written by hand, generated by project-specific scripts, or generated by the generic Implib.so tool.

Mixing code compiled with /MT and /MD

I have a large body of code, compiled with /MT (i.e. expecting to statically link against the CRT). I need to combine this with a static third-party library, which has been built with /MD (i.e. expecting to link the CRT dynamically).
Is it theoretically possible to link the two into one executable without recompiling either?
If I link with /nodefaultlib:msvcrt, I end up with a small number of undefined references to things like __imp__wgetenv. I'm tempted to try implementing those functions in my own code, forwarding to wgetenv, etc. Is that worth trying, or will I run straight into the next problem?
Unfortunately I'm forbidden from taking the easy option of packing the third-party code into a separate DLL :-/
No. /MT and /MD are mutually exclusive.
All modules passed to a given invocation of the linker must have been compiled with the same run-time library compiler option (/MD, /MT, /LD).
Source
I found such a solution in the OpenSSL sources: all of the library's object files are compiled with the combination /MT /Zl. As the author described, this combination allows building a static library that can be linked into applications using either the dynamic CRT (/MD) or the static CRT (/MT).
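A sketch of that recipe with MSVC's command-line tools (file names hypothetical):

cl /c /MT /Zl foo.c         # /Zl omits the default CRT library name from foo.obj
lib /OUT:foo.lib foo.obj    # the static lib no longer forces a CRT choice

Because /Zl strips the embedded defaultlib directive, the final executable's own /MT or /MD setting decides which CRT actually gets linked.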
I faced a similar situation, in which I had two libraries: one built with /MT and the other with /MD. I had to build an executable that uses functionality from both. The library built with /MD was third-party, so I couldn't rebuild it, and the library built with /MT had many dependencies, so rebuilding all of them as /MD would have been a big pain. I was also getting an error from the third-party config header file that made it mandatory to build the executable as /MD. I was looking for the easy way of packaging the third-party code as a separate dll, as mentioned in the question, but I couldn't find enough explanation online of this easy way. Hence my two cents below.
The following is the way I circumvented it:
I built another .dll which acted as an interface. This interface basically wrapped all the API calls made to the third-party dll. The header file for this interface did not include any headers from the third-party dll; rather, all those headers were included only in the interface.cpp file. The interface, as you would expect, was built as /MD.
Now, in my main.cpp file, I included this interface header file so that all calls to the third-party dll go through the interface.
Extra care has to be taken in passing arguments to the interface. Basic types like int and bool can be passed by value. However, any class or structure needs to be passed by const reference to avoid heap corruption; this applies even to std::string.
Happy to share more details if it is not clear!
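A minimal sketch of that interface layer; all names here, including thirdparty.h and ThirdPartyCompute, are hypothetical:

// interface.h -- deliberately includes no third-party headers
#pragma once
#include <string>

#ifdef INTERFACE_EXPORTS
#define IFACE_API __declspec(dllexport)
#else
#define IFACE_API __declspec(dllimport)
#endif

// classes are passed by const reference, never by value, so no
// object is copied (and later freed) across the CRT boundary
IFACE_API int Compute(const std::string& input);

// interface.cpp -- the only file that sees the third-party headers;
// this dll is built with /MD to match the third-party library
#include "interface.h"
#include "thirdparty.h"

IFACE_API int Compute(const std::string& input)
{
    return ThirdPartyCompute(input.c_str());
}

main.cpp then includes only interface.h and links against the interface dll, never against the third-party library directly.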

Loading multiple shared libraries with different versions

I have an executable on Linux that loads libfoo.so.1 (that's a SONAME) as one of its dependencies (via another shared library). It also links to another system library, which, in turn, links to a system version, libfoo.so.2. As a result, both libfoo.so.1 and libfoo.so.2 are loaded during execution, and code that was supposed to call functions from library with version 1 ends up calling (binary-incompatible) functions from a newer system library with version 2, because some symbols stay the same. The result is usually stack smashing and a subsequent segfault.
Now, the library which links against the older version is a closed-source third-party library, and I can't control what version of libfoo it compiles against. Assuming that, the only other option left is rebuilding a bunch of system libraries that currently link with libfoo.so.2 to link with libfoo.so.1.
Is there any way to avoid replacing system libraries with local copies that link to the older libfoo? Can I load both libraries and have the code call the correct versions of the symbols? Do I need some special symbol-level versioning?
You may be able to do some version script tricks:
http://sunsite.ualberta.ca/Documentation/Gnu/binutils-2.9.1/html_node/ld_26.html
This may require that you write a wrapper around your lib that pulls in libfoo.so.1, exporting some symbols explicitly and masking all others as local. For example:
MYSYMS {
  global:
    foo1;
    foo2;
  local:
    *;
};
and use it when you link the wrapper, like:
gcc -shared -Wl,--version-script,mysyms.map -o mylib.so wrapper.o -L/dir/containing/libfoo.so.1 -lfoo
This should make libfoo.so.1's symbols local to the wrapper and not available to the main exe.
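You can verify the result with nm, using the hypothetical names from the script above:

nm -D mylib.so | grep ' T '    # only foo1 and foo2 should appear exported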
I can only come up with a workaround, which would be to statically link the version of the "system library" that you are using. For your static build, you could make it link against the same old version as the third-party library. Given that it does not rely on the newer version...
Perhaps it is also possible to avoid these problems by not linking to the third-party library the ordinary way. Instead, your program could load it at execution time, perhaps keeping its symbols isolated from the rest, as sketched below. But I don't know much about that.
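A sketch of that idea using dlopen, where libthirdparty.so and third_party_init are hypothetical names:

#include <dlfcn.h>
#include <stdio.h>

int main()
{
    // RTLD_LOCAL keeps the loaded library's symbols out of the global
    // lookup scope, so they cannot shadow symbols in other libraries
    void* h = dlopen("libthirdparty.so", RTLD_NOW | RTLD_LOCAL);
    if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    typedef int (*init_fn)(void);
    init_fn third_party_init = (init_fn)dlsym(h, "third_party_init");
    if (third_party_init) third_party_init();

    dlclose(h);
    return 0;
}

Build with -ldl. Note that RTLD_LOCAL only hides the loaded library's symbols from others; it does not change how that library resolves its own references, so it addresses at most half of the versioning problem described here.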
