Does GHC optimize away unused code and packages?

Let's say a big package is included in a project and only one function from the package is used. Is the rest of the code optimized away when compiling the final binary?
And if a package is included but never actually used (for example, it is only needed to import a type for another library that itself ends up unused), is the whole package stripped?

Related

Include a closed source Haskell package in an open source Haskell package without leaking its code

I'm working on an open source Haskell package and I want to use a proprietary package of mine as a dependency without leaking its source code.
One way would be to compile it to a binary and call it via System.Process.callCommand, but this would be unnecessarily inefficient.
Is there another way to distribute the package in a binary format or at least strongly obfuscated?

Test for GHC compile time errors

I'm working on proto-lens#400, tweaking a Haskell code generator. In one of the tests I'd like to verify that a certain API has not been built. Specifically, I want to ensure that a certain type of program will not type check successfully. I'd also have a similar program with one identifier changed which should compile, to guard against a typo breaking the test. After reading Extending and using GHC as a Library, I have managed to have my test write a small file and compile it using GHC as a library.
But I need the code emitted by the test to load some other modules. Specifically the output of the code generator of that project and its runtime environment with transitive dependencies. I have at best a very rough understanding of stack and hpack, which is providing the build time system. I know I can add dependencies to some package.yaml file to make them available to individual tests, but I have no clue how to access such dependencies from the GHC session set up as part of running the test. I imagine I might find some usable data in some environment variables, but I also believe such an approach might be undocumented and prone to break without warning.
How can I have a test case use GHC as a library and have it access dependencies expressed in package.yaml? Or alternatively, can I use some construct other than a regular test case to express a file with dependencies but check that the file won't compile?

I don't know if this applies to you because there are too many details going way over my head, but one way to test for type errors is to build your test suite with -fdefer-type-errors and to catch the exception at run-time (of type TypeError).
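A minimal sketch of that approach, assuming GHC 8.0 or later (where Control.Exception exports TypeError). The names shouldNotTypeCheck, shouldTypeCheck and typeChecks are hypothetical stand-ins, not part of proto-lens; in a real test the two values would be expressions that use the generated API.

```haskell
{-# OPTIONS_GHC -fdefer-type-errors -Wno-deferred-type-errors #-}
module Main (main) where

import Control.Exception (TypeError, evaluate, try)

-- Hypothetical stand-in for the program that must NOT type check.
-- With -fdefer-type-errors the mismatch is only a compile-time warning;
-- a TypeError is thrown when the value is forced at run time.
shouldNotTypeCheck :: Int
shouldNotTypeCheck = "not an Int"

-- Control case: the same shape of program, but well typed. It guards
-- against a typo making the negative test pass for the wrong reason.
shouldTypeCheck :: Int
shouldTypeCheck = 42

-- True if forcing the value to weak head normal form succeeds,
-- False if doing so raises a deferred type error.
typeChecks :: a -> IO Bool
typeChecks x = do
  result <- try (evaluate x)
  pure $ case result of
    Left e  -> const False (e :: TypeError)
    Right _ -> True

main :: IO ()
main = do
  bad  <- typeChecks shouldNotTypeCheck
  good <- typeChecks shouldTypeCheck
  putStrLn ("ill-typed program type checks:  " ++ show bad)
  putStrLn ("well-typed program type checks: " ++ show good)
```

Because the test module is an ordinary part of the test suite, it sees the dependencies listed in package.yaml without any extra GHC session setup. Note that evaluate only forces the outermost constructor, so a type error buried inside a lazy structure may need deeper forcing before it surfaces.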

RcppEigen and package size

I am maintaining a package that uses RcppEigen. The package itself has a modest amount of code (roughly 1000 lines at the moment).
What I don't understand is why my library is so large: 14MB for my <packagename>.so and 11MB for <packagename>.o.
I would have expected the package to link dynamically to the RcppEigen libraries (thus keeping the binaries of my package relatively small). My guess is that it instead links the libraries statically into my .o and .so files.
Am I correct that this is what happens?
Can I/should I avoid this?
If so, how?
I see here (RcppEigen.package.skeleton documentation) that NAMESPACE should include "a useDynLib directive"; it is already present in my NAMESPACE file.
(On a side note, when I submit to CRAN the large package size is NOTEd, but has not been cause for rejection.)

This is expected behavior. I have not checked, but I expect that the majority of packages using RcppEigen (or RcppArmadillo) get this NOTE. That's because Eigen (and Armadillo) is a header-only library, i.e. it is not dynamically linked. Instead the respective function is compiled into each *.o file. This is potentially even worse than static linking: If a function is used in multiple compilation units, it will end up in multiple *.o files, leading to multiple versions of the same function in the *.so. That is the price we all have to pay for the convenience of header-only libraries. Getting dynamic (or static) linking correct can be really difficult, in particular on Windows.
Concerning the useDynLib: If you look into the NAMESPACE file in your package, you should see a line like useDynLib(<packagename> [...]). That tells R to load the dynamic library associated with your package and is required for any R package using compiled code.

How do I statically compile a C library into a Haskell module that I can later load with the GHC API?

Here is my desired use case:
I have a package with a single module that reads HDF5 files and writes some of their data to Haskell records. To do the work, the library uses the bindings-hdf5 package. Here is my cabal's build-depends. reader-types is a module I wrote that defines the types of the Haskell records that contain the read-in data.
build-depends: base >=4.7 && <4.8
             , text
             , vector
             , containers
             , bindings-hdf5
             , reader-types
Note that my cabal file does not currently use extra-libraries or ghc-options. I can load my module, src/Mabel.hs in ghci as long as I specify the required hdf5_hl library:
ghci src/Mabel.hs -lhdf5_hl -L/long/nixos/path/lib
and within ghci, I can run my function perfectly fine.
Now, what I want to do is compile this library/module into a single, compiled file that I can later load with the GHC API in a different Haskell program. By single file, I mean that it needs to run even if the hdf5_hl library does not exist on the system. Preferably, it would also run even if text, vector, and/or containers are missing, but this is not essential because reader-types requires those types anyway. When loading the module with the GHC API, I want it to load in already compiled form, and not run interpreted.
My purpose for doing this is that I want the self-contained file to act as a single, pre-compiled plugin file that is later loaded and executed by a different Haskell executable. Other plugins might not use hdf5 at all, and the only package they are guaranteed to use is reader-types, which essentially defines the plugin interface types.
The hdf5 library on my system contains the following files: libhdf5_la.la, libhdf5_hl.so, libhdf5.la, libhdf5.so, and similar files that have the version number in the file name.
I have done a lot of googling, but am getting confused by all the edge cases I am finding. Here are some examples that I'm either sure don't fit my case, or I can't tell.
I do not want to compile a Haskell library to use from C or Python, only a Haskell program using GHC API.
I do not want to compile C wrappers for a C++ library into a Haskell module because the bindings already exist and the library is already a C library.
I do not want to compile a library that is entirely self-contained because, since I am loading it with the GHC API, I don't need the GHC runtime included in the library. (My understanding is that the plugins must be compiled with the same ghc version they will be loaded with in the GHC API).
I do not want to compile C bindings and the C library at the same time because the C library is already compiled and the bindings are specified in separate package (bindings-hdf5).
The closest resource for what I want to do is this exchange on the mailing list from 2009. However, I added extra-libraries: hdf5_hl or extra-libraries: hdf5 to my cabal file, and in both cases the resulting .a, .so, .dyn_hi, .dyn_o, .hi, and .o files in dist/build are all the exact same size as without using extra-libraries, so I'm confident it is not working correctly.
What changes to my cabal file do I need to make to create a self-contained, standalone file that I can later load with the GHC API? If this is not possible, what are the alternatives?
Instead of using the GHC API, I am also open to using the plugins library to load the plugin, but the self-contained requirements are still the same.
EDIT: I do not care what form the compiled "plugin" must take (I assume an object file is the right way), but I want to load it dynamically from a separate executable at run time and execute functions it defines with known names and known types. The reason I want a single file is that there will eventually be other plugins, and I want them all to behave the same way without having to worry about lib paths and dependencies for each one. A compiled, single file is a simpler interface for doing this than zipping/unzipping archives that include Haskell object code and their dependencies.

Haskell: unnecessary binary growth with module imports

When I import a (big) module into a Main module in one of the following ways:
import Mymodule
import qualified Mymodule as M
import Mymodule (MyDatatype)
the compiled binary grows by the same huge amount compared to when I don't import that module. This happens regardless of whether I use anything from that module in the Main module. Shouldn't the compiler (I am using GHC on Debian Testing) only add to the binary what is needed to run it?
In my specific case I have a huge Map in Mymodule which I don't use in the Main module. Selectively importing only what I really need did not change the growth of the compiled binary.

As far as GHC is concerned, import lists are only there for readability and avoiding name clashes; they don't affect what's linked in at all.
Also, even if you did only import a few functions from a library, they might still depend on the bulk of the library internally, so you shouldn't necessarily expect to see a size decrease from only using some of an available interface in general.
By default, GHC links in entire libraries, rather than only the pieces you use; you could avoid this by building libraries with the -split-objs option to GHC (or put split-objs: True in your cabal-install configuration file (~/.cabal/config on Unix)), but it slows down compilation, and is seemingly not recommended by the GHC developers:
-split-objs
Tell the linker to split the single object file that would normally be generated into multiple object files, one per top-level Haskell function or type in the module. This only makes sense for libraries, where it means that executables linked against the library are smaller as they only link against the object files that they need. However, assembling all the sections separately is expensive, so this is slower than compiling normally. Additionally, the size of the library itself (the .a file) can be a factor of 2 to 2.5 larger. We use this feature for building GHC’s libraries.
— The GHC manual
This will omit unused parts of libraries you use, regardless of what you import.
You might also be interested in using shared Haskell libraries.
