How do I statically compile a C library into a Haskell module that I can later load with the GHC API? - haskell

Here is my desired use case:
I have a package with a single module that reads HDF5 files and writes some of their data into Haskell records. To do the work, the library uses the bindings-hdf5 package. Here is the build-depends section of my cabal file; reader-types is a package I wrote that defines the Haskell record types that hold the read-in data.
build-depends: base >=4.7 && <4.8
, text
, vector
, containers
, bindings-hdf5
, reader-types
Note that my cabal file does not currently use extra-libraries or ghc-options. I can load my module, src/Mabel.hs, in ghci as long as I specify the required hdf5_hl library:
ghci src/Mabel.hs -lhdf5_hl -L/long/nixos/path/lib
and within ghci, I can run my function perfectly fine.
Now, what I want to do is compile this library/module into a single, compiled file that I can later load with the GHC API in a different Haskell program. By single file, I mean that it needs to run even if the hdf5_hl library does not exist on the system. Preferably, it would also run even if text, vector, and/or containers are missing, but this is not essential because reader-types requires those types anyway. When loading the module with the GHC API, I want it to load in already compiled form, and not run interpreted.
My purpose for doing this is that I want the self-contained file to act as a single, pre-compiled plugin file that is later loaded and executed by a different Haskell executable. Other plugins might not use hdf5 at all, and the only package they are guaranteed to use is reader-types, which essentially defines the plugin interface types.
The hdf5 library on my system contains the following files: libhdf5_la.la, libhdf5_hl.so, libhdf5.la, libhdf5.so, and similar files that have the version number in the file name.
I have done a lot of googling, but am getting confused by all the edge cases I am finding. Here are some examples that I'm either sure don't fit my case, or I can't tell.
I do not want to compile a Haskell library to use from C or Python, only a Haskell program using GHC API.
I do not want to compile C wrappers for a C++ library into a Haskell module because the bindings already exist and the library is already a C library.
I do not want to compile a library that is entirely self-contained, because, since I am loading it with the GHC API, I don't need the GHC runtime included in the library. (My understanding is that plugins must be compiled with the same GHC version they will later be loaded with through the GHC API.)
I do not want to compile C bindings and the C library at the same time, because the C library is already compiled and the bindings are specified in a separate package (bindings-hdf5).
The closest resource for what I want to do is this exchange on the mailing list from 2009. However, I added extra-libraries: hdf5_hl or extra-libraries: hdf5 to my cabal file, and in both cases the resulting .a, .so, .dyn_hi, .dyn_o, .hi, and .o files in dist/build are all the exact same size as without using extra-libraries, so I'm confident it is not working correctly.
What changes to my cabal file do I need to make to create a self-contained, standalone file that I can later load with the GHC API? If this is not possible, what are the alternatives?
Instead of using the GHC API, I am also open to using the plugins library to load the plugin, but the self-contained requirements are still the same.
EDIT: I do not care what form the compiled "plugin" takes (I assume an object file is the right way), but I want to load it dynamically from a separate executable at run time and execute functions it defines with known names and known types. The reason I want a single file is that there will eventually be other plugins, and I want them all to behave the same way without having to worry about lib paths and dependencies for each one. A compiled, single file is a simpler interface for doing this than zipping/unzipping archives that contain Haskell object code and their dependencies.
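For concreteness, the loading side I have in mind looks roughly like the sketch below. The module name Mabel, the function name readRecords, and its type are placeholders; the ghc and ghc-paths packages are assumed, and the exact API details vary between GHC versions (this is written against roughly GHC 7.8):

-- Loader sketch: load a plugin module with the GHC API and look up a
-- function from it by name.  If up-to-date .hi/.o files exist next to the
-- source, GHC links the already-compiled object code instead of interpreting.
import GHC
import GHC.Paths (libdir)                          -- from the ghc-paths package
import DynFlags (defaultFatalMessager, defaultFlushOut)
import Unsafe.Coerce (unsafeCoerce)

loadPlugin :: FilePath -> String -> String -> IO a
loadPlugin path modName funName =
  defaultErrorHandler defaultFatalMessager defaultFlushOut $
    runGhc (Just libdir) $ do
      dflags <- getSessionDynFlags
      -- LinkInMemory: link the plugin's code into this process rather than
      -- producing an executable.
      _ <- setSessionDynFlags dflags { ghcLink = LinkInMemory }
      target <- guessTarget path Nothing
      setTargets [target]
      _ <- load LoadAllTargets          -- a real loader would check the SuccessFlag
      setContext [IIDecl (simpleImportDecl (mkModuleName modName))]
      hv <- compileExpr (modName ++ "." ++ funName)
      return (unsafeCoerce hv)

The caller would then pin the type down at the use site, e.g. readF <- loadPlugin "src/Mabel.hs" "Mabel" "readRecords" :: IO (FilePath -> IO Records), where Records stands in for the real reader-types type. This sketch does not address the actual problem here, namely making the hdf5_hl dependency available at load time.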

Related

What exactly is a "library" in a crate?

I'm a little confused about the concept of a "library" in Rust, which comes up in the statement "A crate is a binary or library".
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Are they some sort of object files with symbols like .a or .so, which will be linked to my program (like C/C++)
Or they are pure source codes which will be compiled together with my program?
As described by Masklinn, yes, Rust does have prebuilt library formats. However, these are mostly used internally, are finicky across compiler versions, and cargo still lacks support for them. In fact, crates.io requires libraries to be "open-source" (as in, you provide the source code; that source could still load some closed-source dependency), and it distributes the source code to whoever downloads the crate. The source code is then effectively compiled with your program (this is where rlibs come into play, but cargo doesn't expose this to the user). This is also why you're able to inspect the source code for pretty much every crate.
If I'm right, a binary means an executable program (which can be run from shell, for example), but what is a library?
Yes. Specifically, per the Linkage documentation
A runnable executable will be produced. This requires that there is a main function in the crate which will be run when the program begins executing. This will link in all Rust and native dependencies, producing a single distributable binary. This is the default crate type.
Are they some sort of object files with symbols like .a or .so, which will be linked to my program (like C/C++)
Or they are pure source codes which will be compiled together with my program?
Never strictly the latter, but the exact artefact depends, as per the linkage documentation:
A Rust library will be produced. This is an ambiguous concept as to what exactly is produced because a library can manifest itself in several forms. The purpose of this generic lib option is to generate the "compiler recommended" style of library. The output library will always be usable by rustc, but the actual type of library may change from time-to-time.
The documentation then lists the various types of libraries:
rlib, a static library with rust-specific metadata (an augmented .a)
dylib, a dynamic library with rust-specific metadata (an augmented .so)
staticlib, a system static library (an actual .a)
cdylib, a system dynamic library (an actual .so)
I would think "lib" aliases to "rlib" but frankly I have no idea, and as the quote notes that's neither fixed nor documented by design.

gnu linker doesn't include unreferenced modules in a shared library

I have a shared library that consists of quite a few .c modules, some of which are themselves linked into the shared library from other static .a libraries. Most of these are referenced internally within the library, but some are not. I'm finding that the linker does not include those modules in the shared library unless there is at least one call to a function in the module from within the shared library. I've been working around this problem by adding calls in a dummy ForceLinkages() function in a module that I know will be included.
That's okay, but it's surprising, since I'm using a .version file to define a public API to this library. I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
This library was originally developed on AIX, which uses a .exp file to define the public API. And there, I've never had the issue of unreferenced modules not getting included; i.e., referencing the modules in the .exp file was enough to get the linker to pull them in. Is there a way to get the Linux linker to work like that? If not, I can continue to use my silly ForceLinkages() function to get the job done...
That's okay, but it's surprising, since I'm using a .version file to define a public API to this library.
The .version file (assuming it's a linker version script) does not define the library API. It only determines which functions are exported from the library (and with which version label) and which are hidden.
I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
The version script is applied after the linker has decided which objects are going to be part of the library, and which are going to be discarded, and has no effect on the decisions taken earlier.
This is all working as designed.
You need to either wrap the relevant archives in --whole-archive ... --no-whole-archive (this has a danger of linking in code you don't need and bloating your binaries), or keep adding references as you've done before.
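For example, a link line along these lines (file and library names are made up) pulls every member of the archive into the shared library while still applying the version script:

gcc -shared -o libmylib.so \
    -Wl,--version-script=mylib.version \
    referenced1.o referenced2.o \
    -Wl,--whole-archive libunreferenced.a -Wl,--no-whole-archive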

How to generate library with a specific name via cabal

I am trying to build a shared Haskell library that will afterwards be used by a C project. I am on a Linux platform, so my question is asked in that context.
Suppose I have a Haskell package foo with a library named foo, say at version 0.1, which exports some functions via the FFI.
I can easily generate a shared library (.so) that I can then link with, but my issue is that the generated library is named libHSfoo-0.1-$COMPONENT_ID.so, which makes it quite cumbersome to link with since $COMPONENT_ID is unpredictable as far as I can tell.
The $COMPONENT_ID comes, to the best of my knowledge, from the following Cabal structure, and it looks like I could write Cabal hooks to at least copy the generated shared library, or create a symbolic link to it, from a fixed location.
I am wondering whether there is a better way to specify the component-id to get an easily predictable name of the shared library without post-processing?
It seems like I can achieve this if in the configure hook I set the configArgs to just the library component, and the configCID to my desired name of the library, but that seems like a fragile solution and I am thinking there is a better way for this.
The name of the library also affects linking when there are other Haskell packages dependent on this one, which would make it even more convenient to specify/override the name.
I am using stack to drive cabal, if that is relevant.
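The hook-based workaround would look roughly like the Setup.hs sketch below (it assumes build-type: Custom; the package name foo, the file-name pattern, and the assumption that the .so lands directly in buildDir are placeholders, and the hook API differs between Cabal versions):

import Data.List (isPrefixOf, isSuffixOf)
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo (buildDir)
import System.Directory (copyFile, getDirectoryContents)
import System.FilePath ((</>))

main :: IO ()
main = defaultMainWithHooks simpleUserHooks
  { postBuild = \_args _flags _pkg lbi -> do
      let dir = buildDir lbi
      files <- getDirectoryContents dir
      -- pick out the versioned shared library, whatever its component id is,
      -- and copy it to a predictable name
      case [ f | f <- files
               , "libHSfoo-0.1" `isPrefixOf` f
               , ".so" `isSuffixOf` f ] of
        (f:_) -> copyFile (dir </> f) (dir </> "libfoo.so")
        []    -> putStrLn "postBuild: no shared library found"
  }

I would still prefer a way to just tell cabal the desired name up front rather than post-processing like this.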

within a project can I compile a module and interactively load the compiled module within ghci?

Typically in a Haskell project, I either work interactively with ghci or compile the entire project with cabal build.
However, in some use cases I may have a computationally intensive routine alongside some higher-level scripting functionality, say for picking inputs to an analysis algorithm.
Is it possible to use GHCi + GHC such that I compile the computationally intensive module and then load that compiled code into GHCi, so that I can re-run it with different inputs?
Yes, you can load compiled modules in ghci; if there are appropriately named .hi and .o files, ghci will use those instead of interpreting the code in the corresponding .hs file. You will then only have access to the operations that are exported from that module.
If you find yourself with the compiled module loaded when you wanted the interpreted one, you can :load *foo.hs to instruct ghci to ignore the compiled version and interpret foo.hs instead.
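A typical session looks something like this (module names are made up):

$ ghc -c Heavy.hs       # produces Heavy.hi and Heavy.o
$ ghci Driver.hs        # Driver imports Heavy; ghci picks up the compiled Heavy
ghci> :load *Heavy.hs   # switch back to interpreting Heavy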

Is it possible to compile "only a file" in a cabal project?

In JVM-based projects, you can compile a single file to a .class file and run the program again without necessarily recompiling all the other files.
Is it possible to do this in Haskell? Is it imperative to compile and link all the files in the project? If yes, why?
What if there is no binary, you are only installing a library?
For GHC, you can change and recompile a single module without having to recompile the modules that depend on it, provided its exposed interface doesn't change. GHC's --make mode (the default as of ghc-7.*) checks whether recompilation is necessary and recompiles only those modules for which it can't determine that recompilation is unnecessary.
If you have a cabal package and you run cabal build after changing one module, you can see from the compiler output that in general it doesn't recompile all modules in the package, only the changed module and possibly the ones depending on it.
If you build an executable, that of course has to be relinked, but many of the old object files can be reused.
If you build a library, the library archive of course has to be rebuilt, but many of the old object files can be reused.
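For example (module names are made up):

$ ghc --make Main.hs    # first build: compiles A.hs, B.hs and Main.hs
  (edit B.hs without changing its exported interface)
$ ghc --make Main.hs    # recompiles only B.hs, then relinks Main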

Resources