As far as I understand, code generated from STG uses a very custom ABI, which even has a custom call stack.
How does the linker work, then? Does GHC use a custom linker?
If you run ghc with the -v flag to generate verbose logging output, you'll see that it invokes gcc to perform the final link. Of course, gcc uses the standard system ld linker, but with some additional setup that GHC finds useful.
As mentioned by @arrowd in the comments, the linker doesn't really care much about calling conventions. For the most part, it just fixes up undefined references to symbols in other object files, whether those symbols reference code or data.
When a Haskell module is compiled to native code, it mostly takes the form of a bunch of data structures ("closures") and code snippets prefixed with metadata ("info blocks"). That data and code is labelled with global symbols, which can be referenced by other modules and resolved by the linker, the same way references to data structures or C functions might be resolved when linking a C program.
As a simple example, the rather silly module:
module Foo where
foo :: (a -> b -> c) -> d -> a -> b -> c
foo f = const f
when compiled with ghc -c -O0 -fforce-recomp -ddump-asm Foo.hs, generates basically the following info block and closure for foo (simplified by removing some extra assembly pragmas):
;; info block, preceded by a few quad words of metadata
.section .text
.quad 4294967301
.quad 0
.long 14
.long GHC.Base.const_closure-(Foo.foo_info)+0
Foo.foo_info:
movl $GHC.Base.const_closure,%ebx
jmp stg_ap_p_fast
.size Foo.foo_info, .-Foo.foo_info
;; closure
.section .data
Foo.foo_closure:
.quad Foo.foo_info
.quad 0
The symbols Foo.foo_info and Foo.foo_closure are exported by the module, and the symbol GHC.Base.const_closure is referenced by code and data in the module, becoming an undefined symbol that must be resolved.
Another calling module using this foo function will typically do so via a reference to the symbol Foo.foo_closure. The linker will resolve the reference from the calling module to the Foo.foo_closure label in the Foo module, and also resolve the reference from the Foo module to the label GHC.Base.const_closure in the GHC base package.
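To make the C analogy concrete, here is a minimal sketch (the file names and symbols are invented for illustration): one object file leaves a symbol undefined, another defines it, and the linker wires them together exactly as it wires references to Foo.foo_closure above, with no knowledge of calling conventions.
$ cat use.c
extern int shared_data;                 /* undefined here, like a closure symbol */
int get(void) { return shared_data; }
$ cat def.c
int shared_data = 42;                   /* the definition the linker resolves to */
$ gcc -c use.c def.c
$ nm use.o
0000000000000000 T get
                 U shared_data
The U entry is the undefined reference; linking use.o and def.o together resolves it against the T (defined, global) symbol in def.o.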
I have downloaded the libgcrypt library source code, and I want to customize this standard shared library by adding my own function inside one particular source file. Although the compilation/build process of the customized shared library is successful, I get an error at link time.
Here is what I have done. Inside the /src/visibility.c file, I have added my custom function:
void MyFunction(void)
{
printf("This is added just for testing purpose");
}
I have also included the function prototype in /src/gcrypt.h:
void MyFunction(void);
#build process
./configure --prefix=/usr
sudo make install
The nm command finds this custom function:
nm /usr/lib/libgcrypt.so | grep MyFunction
000000000000dd70 t MyFunction
Here is my sample code that calls my custom function:
//aes_gcrypt_example.c
#include <stdio.h>
#include <gcrypt.h>
#include <assert.h>
int main()
{
MyFunction();
return 0;
}
gcc aes_gcrypt_example.c -o aes -lgcrypt
/tmp/ccA0qgAB.o: In function `main':
aes_gcrypt_example.c:(.text+0x3a2): undefined reference to `MyFunction'
collect2: error: ld returned 1 exit status
I also tried declaring MyFunction as extern inside gcrypt.h, but I get the same error.
Why is this happening?
Is customization of a standard library not allowed?
If it is not allowed, is there a flag I can set to allow it?
If it is allowed, what mistake am I making?
It would be a great help if someone could provide a useful link/solution for the above problem. I am using Ubuntu 16.04, gcc 4.9.
Lower-case t for the symbol type?
nm /usr/lib/libgcrypt.so | grep MyFunction
000000000000dd70 t MyFunction
Are you sure that's a visible function? On my Ubuntu 16.04 VM, the linkable functions defined in an object file have T (not t) as the symbol type. Is there a stray static kicking around and causing confusion? Check a couple of other functions defined in libgcrypt.so (and documented in gcrypt.h) and see whether they have a t or a T. They will have a T, not a t. You'll need to work out why your function gets a t; it is not clear from the code you show.
The (Ubuntu) man page for nm includes:
The symbol type. At least the following types are used; others
are, as well, depending on the object file format. If lowercase,
the symbol is usually local; if uppercase, the symbol is global
(external).
The line you show says that MyFunction is not visible outside its source file, and the linker agrees: it cannot find it.
Your problem now is to check that the object file containing MyFunction has a symbol type T; if it doesn't, the problem is in the source code.
Assuming that the object file shows symbol type T but the shared object shows symbol type t, you have to find what happens during the shared object creation phase to make the symbol invisible outside the shared object. This is probably because of a 'linker script' that controls which symbols are visible outside the library (or maybe just compilation options). You can search on Google with 'linker script' and various extra words ('tutorial', 'provide', 'example', etc) and come up with links to the relevant documentation.
You may need to research the documentation for Libtool, or for the binutils linker. Libtool provides ways of manipulating shared libraries. In a compilation command line that you show in a comment, there is the option -fvisibility=hidden. I found (mostly by serendipitous accident) a GCC Wiki page on visibility. See also the visibility attribute and the code generation options.
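For reference, the usual mechanism for making a shared library's symbols local is a version script passed to the linker at link time. Here is a minimal, hypothetical sketch (all names invented; check libgcrypt's own build files for the script it actually uses):
$ cat example.map
LIBEXAMPLE_1.0 {
  global:
    exported_func;   /* only this symbol stays visible */
  local:
    *;               /* everything else becomes local (lowercase t in nm) */
};
$ gcc -shared -fPIC example.c -o libexample.so -Wl,--version-script=example.map
Any function not listed under global: shows up with a lowercase t in nm, exactly like MyFunction above, so adding your function to the library's export list (version script) is the usual fix.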
Shared objects (*.so) in Unix-like systems are inefficient because of symbol interposition: every access to a global variable inside the .so needs a GOT lookup, and every call from one function to another inside the .so needs a PLT lookup. I was therefore happy to see that gcc 5.1 added the option -fno-semantic-interposition. However, when I try to make a .so where one function calls another without using a PLT, I get the error message:
relocation R_X86_64_PC32 against symbol `functionname' can not be used when making a shared object; recompile with -fPIC
I expected the option -fno-semantic-interposition to eliminate this error message, but it doesn't. -mcmodel=large doesn't help either.
The reference to the function is indeed position-independent, which the error message actually confirms (R_X86_64_PC32 means a PC-relative 32-bit relocation in 64-bit mode). -fPIC does not really mean position-independent, as the name would imply; it actually means use the GOT and PLT.
I cannot use __attribute__((visibility("hidden"))) because the called function and the caller are compiled in separate files (the caller is in C++, the called function is in assembly).
I tried to make an assembly listing to see what the option -fno-semantic-interposition does. I found out that it makes a reference to a local alias when one function calls another in the same file, but it still uses a PLT when calling a function in another file.
(g++ version is 5.2.1 Ubuntu, 64-bit mode).
Is there a way to make the linker accept a cross-reference inside a .so without the GOT/PLT lookup?
Is there a way to make the linker accept a cross-reference inside a .so without the GOT/PLT lookup?
Yes: __attribute__((visibility("hidden"))) is exactly the way to do it.
I cannot use __attribute__((visibility("hidden"))) because the called function and the caller are compiled in separate files
You are confused: visibility("hidden") means that the symbol will not be exported from the shared library when it is finally linked. But the symbol is global and visible across multiple translation units before that final link.
Proof:
$ cat t1.c
extern int foo() __attribute__((visibility("hidden")));
int main() { return foo(); }
$ cat t2.c
int foo() __attribute__((visibility("hidden")));
int foo() { return 42; }
$ gcc -c -fPIC t1.c t2.c
$ gcc -shared t1.o t2.o -o t.so
$ nm -D t.so | grep foo
$
I tried to make an assembly listing to see what the option -fno-semantic-interposition does. I found out that it makes a reference to a local alias when one function calls another in the same file, but it still uses a PLT when calling a function in another file.
If you read the discussion on gcc-patches, you'll see that -fno-semantic-interposition is about allowing inlining of possibly interposable functions, not about the way they are actually called when not inlined.
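A quick way to see the effect, reusing the t1.c/t2.c proof above: disassemble the shared object and inspect the call site in main. With the hidden attribute the call to foo is a direct PC-relative call; remove the attribute, rebuild, and the same call goes through foo@plt instead.
$ objdump -d t.so | grep -A4 '<main>:'
(Exact addresses will differ; what matters is whether the call target is <foo> or <foo@plt>.) For the assembly side of your setup, the equivalent of the attribute in GNU as is the .hidden directive placed next to the .globl declaration of the function.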
I have a program A that is statically linked against one version of mpich. Now comes library B, which is used by A via dlopen(). B depends on mpich as well, but is linked dynamically against it.
The problem is that now, in order for B to take advantage of MPI distribution, it needs to access the communicator currently handled by A. This communicator has been created by A's static version of mpich. When B invokes MPI routines, it will use a dynamic version of MPI which is not guaranteed to be compatible with the static version attached to A.
This is the overall picture. I think that the only solution is to have mpich dynamically linked for both A and B. What I am not fully understanding, however, is the following:
How does the linker handle shared object dependencies when dlopening? Will I have two instances of mpich in memory even with dynamic linking, or is the linker smart enough to realize that the symbols required by the dlopened B are already in the address space and resolve against those?
Is it possible to tell the linker: when you dlopen this library, don't go and fetch the dynamic dependency, but resolve it with the static symbols that are already provided by A?
In short: it depends on dlopen options. By default, if a symbol needed by the requested library already exists in the global scope, it will be reused (this is what you want). But you can bypass this behavior with RTLD_DEEPBIND: with this flag, the dependencies won't be reused from the global scope and will be loaded a second time.
Here is some code to reproduce your situation and demo the effect of this flag.
Let's make a common library that will be used by both lib A and program B. This library will exist in two versions.
$ cat libcommon_v1.c
int common_func(int a)
{
return a+1;
}
$ cat libcommon_v2.c
int common_func(int a)
{
return a+2;
}
Now let's write lib A that uses libcommon_v2:
$ cat liba.c
int common_func(int a);
int a_func(int a)
{
return common_func(a)+1;
}
And finally program B that dynamically links to libcommon_v1 and dlopens lib A:
$ cat progb.c
#include <stdio.h>
#include <dlfcn.h>
int common_func(int a);
int a_func(int a);
int main(int argc, char *argv[])
{
void *dl_handle;
int (*a_ptr)(int);
char c;
/* just make sure common_func is registered in our global scope */
common_func(42);
printf("press 1 for global scope lookup, 2 for deep bind\n");
c = getchar();
if(c == '1')
{
dl_handle = dlopen("./liba.so", RTLD_NOW);
}
else if(c == '2')
{
dl_handle = dlopen("./liba.so", RTLD_NOW | RTLD_DEEPBIND);
}
else
{
printf("wrong choice\n");
return 1;
}
if( ! dl_handle)
{
printf("dlopen failed: %s\n", dlerror());
return 2;
}
a_ptr = dlsym(dl_handle, "a_func");
if( ! a_ptr)
{
printf("dlsym failed: %s\n", dlerror());
return 3;
}
printf("calling a_func(42): %d\n", (*a_ptr)(42));
return 0;
}
Let's build and run all the things:
$ export LD_LIBRARY_PATH=.
$ gcc -o libcommon_v1.so -fPIC -shared libcommon_v1.c
$ gcc -o libcommon_v2.so -fPIC -shared libcommon_v2.c
$ gcc -Wall -g -o progb progb.c -L. -lcommon_v1 -ldl
$ gcc -o liba.so -fPIC -shared liba.c -L. -lcommon_v2
$ ./progb
press 1 for global scope lookup, 2 for deep bind
1
calling a_func(42): 44
$ ./progb
press 1 for global scope lookup, 2 for deep bind
2
calling a_func(42): 45
We can clearly see that with default options, dlopen reuses the symbol common_func that was present in program B and that with RTLD_DEEPBIND, libcommon was loaded again and library A got its own version of common_func.
You did not say which toolchain (GCC, LLVM, MSVC, etc.) you are using; the most helpful answer will depend on this information.
May I suggest you look at "GCC Exception Frames": http://www.airs.com/blog/archives/166
If that is helpful, then the gold linker, which is available for GCC and LLVM, supports link-time optimization and can be run from make with dlltool: http://sourceware.org/binutils/docs/binutils/dlltool.html
It is indeed possible to have static and dynamic code call each other; the computer does not care and will 'run' anything it is fed. Whether that ends up working exactly the way you want, or halts and catches fire, depends on correct code and correct linker commands.
Using a debugger will not be fun. It would be best to mangle the names prior to linking, so that when debugging you can see which module the code came from (one way to do this is sketched below). Once it is up and running, you can undo the mangling and let the same-named functions link (to ensure it still functions).
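One concrete way to do that renaming is objcopy's --redefine-sym option; a hypothetical example (object and symbol names invented):
$ objcopy --redefine-sym do_work=static_do_work mylib_static.o
Renamed this way, the static and dynamic copies of a function appear under different names in the debugger, and you can drop the renaming step once everything works.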
Compiler and linker bugs will not be your friend.
This sort of scenario (static and dynamic linking) occurs more often with MinGW and Cygwin, where some of the libraries are static yet a library you download from the Internet is only available in dynamic form (without source).
If the libraries come from two different compiler toolchains, then other issues arise; see this Stack Overflow thread: "linking dilemma (undefined reference) between MinGW and MSVC. MinGW fails MSVC works".
It would be best to simply get the newest version of the library from its source and compile the whole thing yourself, rather than rely on trying to cobble together bits and pieces from different sources (though it is possible to do that).
You can even load the dynamic library, call it (statically), and then reload portions of it later.
How tight are you on memory, and how fast do you want functions to run? If everything is in memory, your program can transfer execution to called functions straight away; if you are swapping a portion of your code to virtual memory, your execution times will really take a hit.
Running a profiler on your code will help decide which portions of a library to load if you want to do 'dynamic dynamic linking' (full control of your dynamic linking by loading a dynamic library so it can be used statically). This is the stuff that headaches and nightmares are made of. Good luck.
I compile the same .c and .h files twice and get object files with the same size but different md5sums.
Here is the only difference from objdump -d:
1) cpcidskephemerissegment.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_ZN68_GLOBAL__N_sdk_segment_cpcidskephemerissegment.cpp_00000000_B8B9E66611MinFunctionEii>:
2) cpcidskephemerissegment.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_ZN68_GLOBAL__N_sdk_segment_cpcidskephemerissegment.cpp_00000000_8B65537811MinFunctionEii>:
What can be the reason? Thanks!
I guess the compiler didn't know how to name this namespace and used the path to the source file plus some random number.
The compiler must guarantee that a symbol in an unnamed namespace does not conflict with any other symbol in your program. By default this is achieved by taking the full filename of the source and appending a random hash value to it. (It is legal to compile the same source twice, e.g. with different macros, and link the two objects into a single program; the unnamed-namespace symbols must still be distinct, so using just the source filename without the seed is not enough.)
If you know that you are not linking the same source file more than once, and want a bit-identical object file on re-compile, the solution is to add -frandom-seed="abcd" to your compile line (replace "abcd" with anything you want; it is common to use the filename as the value of the random seed). See the GCC documentation for -frandom-seed.
The reasons can be many:
Using macros like __DATE__ and __TIME__
Embedding counters that are incremented for each build (the Linux kernel does this)
Timestamps (or similarly variable quantities) embedded in the .comment ELF section. One example of a compiler that does this is the xlC compiler on AIX.
Different names as a result of name mangling (e.g. C++)
Changes in environment variables which are affecting the build process.
Compiler bug(s) (however unlikely)
To produce bit-identical builds, you can use GCC's -frandom-seed parameter. There were situations where it could break things before GCC 4.3, but GCC now turns functions defined in anonymous namespaces into static symbols. However, you will always be safe if you compile each file using a different value for -frandom-seed, the simplest way being to use the filename itself as the seed.
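For example, a per-file invocation following that advice might look like this (using the file name from the question; adjust the quoting to your build system):
$ g++ -c -frandom-seed=cpcidskephemerissegment.cpp cpcidskephemerissegment.cpp
Compiling the same file twice with the same seed produces identical unnamed-namespace symbols, so the object files should then have matching checksums (absent other sources of variation such as __DATE__/__TIME__).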
Finally I've found the answer!
The c++filt command gave the original name of the function:
{unnamed namespace}: MinFunction(int, int)
In the source there was:
namespace
{
int MinFunction(int a, int b) { ... }
}
I named the namespace and got a stable checksum for the object file!
As I guessed, the compiler didn't know how to name this namespace and used the path to the source file plus some random number.
I have a plug-in in the form of a shared library (bar.so) that links into a larger program (foo). Both foo and bar.so depend on the same third party library (baz) but they need to keep their implementations of baz completely separate. So when I link foo (using the supplied object files and archives) I need it to ignore any use of baz in bar.so and vice versa.
Right now if I link foo with --trace-symbol=baz_fun where baz_fun is one of the offending symbols I get the following output:
bar.so: definition of baz_fun
foo/src.a(baz.o): reference to baz_fun
I believe this is telling me that foo is referencing baz_fun from bar.so (and execution of foo confirms this).
Solutions that I have tried:
Using objcopy to "localize" the symbols of interest: objcopy --localize-symbols=local.syms bar.so, where local.syms contains all of the symbols of interest. I think I might just be confused here, and maybe "local" doesn't mean what I think it means. Regardless, I get the same output from the link above. I should note that if I run the nm tool on bar.so prior to using objcopy, all of the symbols in question have the T flag (upper-case, indicating global), and after objcopy they have a t, indicating they are now local. So it appears I am using objcopy correctly.
Compiling with -fvisibility=hidden. However, due to some other constraints I need to use GCC 3.3, which doesn't appear to support that feature. I might be able to upgrade to a newer version of GCC, but would like confirmation that compiling with this flag will help me before heading down that road.
Other things to note:
I do not have access to the source code of either foo or baz
I would prefer to keep all of my plug-in in one shared object (bar.so). baz is actually a licensing library, so I don't want it separated out.
Use dlopen to load your plugin with the RTLD_DEEPBIND flag.
(edit)
Please note that RTLD_DEEPBIND is Linux-specific and needs glibc 2.3.4 or newer.
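A minimal sketch of the call site (the plugin path is a placeholder; note that with glibc, RTLD_DEEPBIND requires _GNU_SOURCE to be defined):
#define _GNU_SOURCE        /* RTLD_DEEPBIND is a GNU extension */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* With RTLD_DEEPBIND, symbol lookups inside bar.so prefer its own
       dependencies (its copy of baz) over symbols already loaded by foo. */
    void *h = dlopen("./bar.so", RTLD_NOW | RTLD_DEEPBIND);
    if (!h) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* ... look up the plug-in's entry points with dlsym(h, "...") ... */
    return 0;
}
Link with -ldl on older glibc. (The libcommon_v1/libcommon_v2 demo earlier on this page shows the observable effect of this flag.)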