How tcmalloc gets linked to the main program - malloc

I want to know how malloc gets linked to the main program. Basically, I have a program which uses several static and dynamic libraries. I am including all of these in my makefile using options such as "-llibName1 -llibName2".
The documentation of TCMalloc says that we can override our malloc simply by setting "LD_PRELOAD=/usr/lib64/libtcmalloc.so". I am not able to understand how tcmalloc gets called by all these static and dynamic libraries. Also, how does tcmalloc get linked to the STL libraries and to C++'s new/delete operations?
Can anyone please give any insights on this?

"LD_PRELOAD=/usr/lib64/libtcmalloc.so" directs the loader to use libtcmalloc.so before any other shared library when resolving symbols external to your program, and because libtcmalloc.so defines a symbol named "malloc", that is the verison your program will use.
If you omit the LD_PRELOAD line, glibc.so (or whatever C library you have on your system) will be the first shared library to define a symbol named "malloc".
Note also that if you link against a static library which defines a symbol named "malloc" (and uses proper arguments, etc), or another shared library is loaded that defines a symbol named "malloc", your program will attempt to use that version of malloc.
That's the general idea anyway; the actual goings-on is quite interesting and I will have to direct to you http://en.wikipedia.org/wiki/Dynamic_linker as a starting point for more information.
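To see the mechanism in action, here is a minimal sketch of an interposing allocator of the same kind (the file name libmalloc_spy.so and the log message are made up for illustration); tcmalloc itself is considerably more careful, but it gets resolved ahead of glibc's malloc in exactly the same way:
// malloc_spy.cpp -- build: g++ -shared -fPIC malloc_spy.cpp -o libmalloc_spy.so -ldl
// Run any program with LD_PRELOAD=./libmalloc_spy.so and every malloc() call,
// including the ones made by other shared libraries and by C++ new (which
// calls malloc in the common implementations), goes through this wrapper.
#include <dlfcn.h>
#include <unistd.h>
#include <cstddef>

extern "C" void* malloc(size_t size)
{
    // Ask the loader for the next definition of "malloc" in search order,
    // i.e. the one glibc provides. (Caveat: dlsym itself may allocate on
    // some libc versions; real interposers such as tcmalloc handle this.)
    static void* (*real_malloc)(size_t) =
        reinterpret_cast<void* (*)(size_t)>(dlsym(RTLD_NEXT, "malloc"));

    void* p = real_malloc(size);
    // write(2) does not call malloc, so it is safe to use inside the wrapper.
    write(STDERR_FILENO, "malloc intercepted\n", 19);
    return p;
}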

Related

Is it possible to recursively populate missing shared objects when calling dlmopen(LM_ID_NEWLM)?

When a shared library is opened with dlmopen(LM_ID_NEWLM), there are no existing shared libraries in the new namespace, so many libraries will crash or at least report unresolved symbols.
Is it possible to load a .so with LM_ID_NEWLM and pull in all of the .so's that were already linked to the running application?
I could walk /proc/$$/maps and pull them in first, but it seems like that should be unnecessary.
Ideas?
No, there is no API to copy shared libraries into the new namespace from the default one. You can either use /proc/self/maps as you suggested or, better yet, use dl_iterate_phdr (/proc may not always be mounted).
As a side note, if your library fails to load due to unresolved symbols it means that it should be fixed to include its dependencies.
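For what it's worth, here is a minimal sketch of the dl_iterate_phdr approach (the callback name is made up; the dlmopen step is only indicated in the comments):
// list_loaded.cpp -- enumerate the shared objects already mapped into the
// default namespace; each path could then be dlmopen()ed into the new
// namespace before the target library is loaded there.
#include <link.h>
#include <dlfcn.h>
#include <cstdio>

static int list_object(struct dl_phdr_info* info, size_t, void*)
{
    // dlpi_name is empty for the main executable itself.
    if (info->dlpi_name && info->dlpi_name[0] != '\0')
        std::printf("loaded: %s\n", info->dlpi_name);
    return 0;   // returning non-zero would stop the iteration
}

int main()
{
    dl_iterate_phdr(list_object, nullptr);
    // Next step (not shown): dlmopen(LM_ID_NEWLM, ...) the first object,
    // obtain its Lmid_t with dlinfo(handle, RTLD_DI_LMID, &lmid), and load
    // the remaining objects into that same namespace.
    return 0;
}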

How to do runtime binding based on CPU capabilities on linux

Is it possible to have a Linux library (e.g. "libloader.so") load another library to resolve any external symbols?
I've got a whole bunch of code that gets conditionally compiled for the SIMD level to be supported (SSE2, AVX, AVX2). This works fine if the build platform is the same as the runtime platform. But it hinders reuse across different processor generations.
One thought is to have the executable that calls function link against libloader.so, which does not directly implement function. Rather, it resolves (binds?) that symbol from another loaded library, e.g. libimpl_sse2.so, libimpl_avx2.so, or so on depending on cpuflags.
There are hundreds of functions that need to be dynamically bound in this way, so changing the declarations or calling code is not practical.
The program linkage is fairly easy to change. The runtime environment variables could also be changed, but I'd prefer not to.
I've gotten as far as making an executable that builds and starts with unresolved external symbols (UES) via the ld flag --unresolved-symbols=ignore-all. But subsequent loading of the impl lib does not change the value of the UES function from NULL.
Edit: I found out later on that the technique described below will only work under limited circumstances. Specifically, your shared libraries must contain functions only, without any global variables. If there are globals inside the libraries that you want to dispatch to, then you will end up with a runtime dynamic linker error. This occurs because global variables are relocated before shared library constructors are invoked. Thus, the linker needs to resolve those references early, before the dispatching scheme described here has a chance to run.
One way of accomplishing what you want is to (ab)use the DT_SONAME field in your shared library's dynamic section. This can be used to alter the name of the file that the dynamic loader (ld-linux.so*) loads at runtime in order to resolve the shared library dependency. This is best explained with an example. Say I compile a shared library libtest.so with the following command line:
g++ test.cc -shared -fPIC -o libtest.so -Wl,-soname,libtest_dispatch.so
This will create a shared library whose filename is libtest.so, but its DT_SONAME field is set to libtest_dispatch.so. Let's see what happens when we link a program against it:
g++ testprog.cc -o test -ltest
Let's examine the runtime library dependencies for the resulting application binary test:
> ldd test
linux-vdso.so.1 => (0x00007fffcc5fe000)
libtest_dispatch.so => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1e4a55000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1e4e4f000)
Note that instead of looking for libtest.so, the dynamic loader wants to load libtest_dispatch.so. You can exploit this to implement the dispatching functionality that you want. Here's how I would do it:
Create the various versions of your shared library. I assume that there is some "generic" version that can always be used, with other optimized versions utilized at runtime as appropriate. I would name the generic version with the "plain" library name libtest.so, and name the others however you choose (e.g. libtest_sse2.so, libtest_avx.so, etc.).
When linking the generic version of the library, override its DT_SONAME to something else, like libtest_dispatch.so.
Create a dispatcher library called libtest_dispatch.so. When the dispatcher is loaded at application startup, it is responsible for loading the appropriate implementation of the library. Here's a sketch of what the implementation of libtest_dispatch.so might look like:
#include <dlfcn.h>
#include <stdlib.h>

// the __attribute__ ensures that this function is called when the library is loaded
__attribute__((constructor)) void init()
{
    // manually load the appropriate shared library based upon what the CPU
    // supports at runtime; __builtin_cpu_supports is a GCC/Clang builtin
    // that queries CPUID
    void* handle = NULL;
    if (__builtin_cpu_supports("avx"))
        handle = dlopen("libtest_avx.so", RTLD_NOW | RTLD_GLOBAL);
    else if (__builtin_cpu_supports("sse2"))
        handle = dlopen("libtest_sse2.so", RTLD_NOW | RTLD_GLOBAL);
    else
        handle = dlopen("libtest.so", RTLD_NOW | RTLD_GLOBAL);

    // NOTE: this is just an example; a real dispatcher should inspect
    // dlerror() and handle the failure more gracefully than this
    if (handle == NULL)
        abort();
}
When linking an application against your library, link it against the "vanilla" libtest.so, the one that has its DT_SONAME overridden to point to the dispatcher library. This makes the dispatching essentially transparent to any application authors that use your library.
This should work as described above on Linux. On Mac OS, shared libraries have an "install name" that is analogous to the DT_SONAME used in ELF shared libraries, so a very similar process could be used there. I'm not sure whether something similar could be used on Windows.
Note: There is one important assumption made in the above: ABI compatibility between the various implementations of the library. That is, your library should be designed such that it is safe to link against the most generic version at link time while using an optimized version (e.g. libtest_avx.so) at runtime.

Call dlopen with file descriptor?

I want to open a shared object as a data file and perform a verification check on it. The verification is a signature check, and I sign the shared object. If the verification is successful, I would like to load the currently opened shared object as a proper shared object.
First question: is it possible to call dlopen and load the shared object as a data file during the signature check so that code is not executed? According to the man pages, I don't believe so since I don't see a flag similar to RTLD_DATA.
Since I have the shared object open as a data file, I have the descriptor available. Upon successful verification, I would like to pass the descriptor to dlopen so the dynamic loader loads the shared object properly. I don't want to close the file then re-open it via dlopen because it could introduce a race condition (where the file verified is not the same file opened and executed).
Second question: how does one pass an open file to dlopen using a file descriptor so that dlopen performs customary initialization of the shared object?
On Linux, you probably could dlopen some /proc/self/fd/15 file (for file descriptor 15).
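Something like the following sketch, where verify_signature is a placeholder for whatever check you perform and the plugin path is made up:
// checked_dlopen.cpp -- build with: g++ checked_dlopen.cpp -ldl
#include <dlfcn.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Placeholder: your real signature verification on the open descriptor.
static bool verify_signature(int /*fd*/) { return true; }

int main()
{
    int fd = open("./plugin.so", O_RDONLY);
    if (fd < 0 || !verify_signature(fd))
        return 1;

    // Load through the descriptor we just verified, not through the path,
    // so the file that was checked is the file that gets mapped.
    char path[64];
    std::snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle)
        std::fprintf(stderr, "dlopen: %s\n", dlerror());

    close(fd);   // the library stays mapped after the descriptor is closed
    return handle ? 0 : 1;
}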
RTLD_DATA does not seem to exist, so if you want it, you would have to patch your own dynamic loader. Perhaps doing that within musl libc would be less hard. I still don't understand why you need it.
You have to trust the dlopen-ed plugin somehow (and it will run its constructor functions at dlopen time).
You could analyze the shared object plugin before dlopen-ing it by using some ELF parsing library, perhaps libelf or libbfd (from binutils); but I still don't understand what kind of analysis you want to make (and you really should explain that; in particular what happens if the plugin is indirectly linked to some bad behaving software). In other words you should explain more about your verification step. Notice that a shared object could overwrite itself....
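As a trivial example of such a pre-dlopen check, done here with plain <elf.h> rather than libelf (only a sketch, and it assumes a 64-bit object):
// Read the ELF header from the already-open descriptor and confirm the file
// really is a shared object before deciding to dlopen it.
#include <elf.h>
#include <unistd.h>
#include <cstring>

static bool looks_like_shared_object(int fd)
{
    Elf64_Ehdr hdr;
    if (pread(fd, &hdr, sizeof hdr, 0) != static_cast<ssize_t>(sizeof hdr))
        return false;
    return std::memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0
        && hdr.e_ident[EI_CLASS] == ELFCLASS64
        && hdr.e_type == ET_DYN;        // shared objects (and PIEs) are ET_DYN
}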
Alternatively, don't use dlopen and just mmap your file (you'll need to parse some ELF and process relocations; see elf(5) and Levine's Linkers and Loaders for details, and look into the source code of your ld.so, e.g. in GNU glibc).
Perhaps using some JIT generation techniques might be useful (you would JIT generate code from some validated data), e.g. with GCCJIT, LLVM, or libjit or asmjit (or even LuaJit or SBCL) etc...
And if you have two file descriptors to the same shared object you probably won't have any race conditions.
An option is to build your own ad-hoc static C or C++ source code analyzer (perhaps using some GCC plugin provided by you). That plugin might (with months, or perhaps years, of development effort) check some property of the user C++ code. Beware of Rice's theorem (limiting the properties of every static source code analyzer). Then your program might (like my manydl.c does, or like RefPerSys will soon do, in mid 2020, or like the obsolete GCC MELT did a few years ago) take as input the user C++ code, run some static analysis on that C++ code (e.g. using your GCC plugin), compile that C++ code into a temporary shared object, and dlopen that shared object. So read Drepper's paper How to Write Shared Libraries.

Do dynamic libraries share global variables in Linux?

As we know, Linux calls ldconfig to load all *.so libraries and then links the applications that use the shared library. However, I am confused about how global variables work in this case. Since there is only one copy of the shared library across all these applications, do they share the global variables in the shared library? If yes, then how do they synchronize?
Thanks,
No, it is not shared: the code/text section of the library is shared, but the data portion is unique to each process that uses the library.
As I commented:
Levine's book on linkers and loaders is a useful reference.
Linux dynamic linker ld.so is free software, part of GNU libc and you can study and improve its source code
the dynamic linker is ld.so, not ldconfig (which just updates cached information used by ld.so).
the ld.so linker uses the mmap(2) system call to map some .so segments into the virtual address space of the process; the "text" segment (for code and read-only constants) is mapped read-only/executable and its pages are shared between processes via the page cache, while the "data" segment (for global or static variables in C or C++) uses MAP_PRIVATE with PROT_WRITE, so each process gets its own copy-on-write copy
you would learn a lot by strace-ing your program to get a feeling of the involved system calls.
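A small sketch of the MAP_PRIVATE behaviour mentioned above (the mapped file is arbitrary; /bin/sh is just a file that is certain to exist and be readable):
// A MAP_PRIVATE mapping is copy-on-write: the write below modifies only this
// process's copy of the page, never the file or any other process's mapping.
// That is exactly why a library's data segment (its globals) is per-process.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("/bin/sh", O_RDONLY);
    if (fd < 0) return 1;

    void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED) return 1;
    char* p = static_cast<char*>(mem);

    p[0] = 'X';   // triggers copy-on-write of this page for this process only
    std::printf("first byte of the private copy: %c\n", p[0]);

    munmap(mem, 4096);
    close(fd);
    return 0;
}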

Do (statically linked) DLLs use a different heap than the main program?

I'm new to Windows programming and I've just "lost" two hours hunting a bug which everyone seems aware of: you cannot create an object on the heap in a DLL and destroy it in another DLL (or in the main program).
I'm almost sure that on Linux/Unix this is NOT the case (if it is, please say it, but I'm pretty sure I did that thousands of times without problems...).
At this point I have a couple of questions:
1) Do statically linked DLLs use a different heap than the main program?
2) Is the statically linked DLL mapped in the same process space of the main program? (I'm quite sure the answer here is a big YES otherwise it wouldn't make sense passing pointers from a function in the main program to a function in a DLL).
I'm talking about plain/regular DLL, not COM/ATL services
EDIT: By "statically linked" I mean that I don't use LoadLibrary to load the DLL but I link with the stub library
DLLs / exes will need to link to an implementation of C run time libraries.
In case of C Windows Runtime libraries, you have the option to specify, if you wish to link to the following:
Single-threaded C Run time library (support for single-threaded libraries has been discontinued now)
Multi-threaded DLL / Multi-threaded Debug DLL
Static Run time libraries.
A few more (you can check the link)
Each one of them will be referring to a different heap, so you are not allowed to pass an address obtained from the heap of one runtime library to another.
Now, it depends on which C run time library the DLL you are talking about has been linked to. Suppose the DLL you are using has been linked to the static C run time library and your application code (containing the main function) is linked to the multi-threaded C runtime DLL; if you pass a pointer to memory allocated in the DLL to your main program and try to free it there, or vice versa, it can lead to undefined behaviour. So, the basic root cause is the C runtime libraries. Please choose them carefully.
Please find more info on the C run time libraries supported here & here
A quote from MSDN:
Caution Do not mix static and dynamic versions of the run-time libraries. Having more than one copy of the run-time libraries in a process can cause problems, because static data in one copy is not shared with the other copy. The linker prevents you from linking with both static and dynamic versions within one .exe file, but you can still end up with two (or more) copies of the run-time libraries. For example, a dynamic-link library linked with the static (non-DLL) versions of the run-time libraries can cause problems when used with an .exe file that was linked with the dynamic (DLL) version of the run-time libraries. (You should also avoid mixing the debug and non-debug versions of the libraries in one process.)
Let's first understand heap allocation and the stack on Windows with respect to our applications/DLLs. Traditionally, the operating system and run-time libraries come with an implementation of the heap.
At the beginning of a process, the OS creates a default heap called Process heap. The Process heap is used for allocating blocks if no other heap is used.
Language run times also can create separate heaps within a process. (For example, C run time creates a heap of its own.)
Besides these dedicated heaps, the application program or one of the many loaded dynamic-link libraries (DLLs) may create and use separate heaps, called private heaps.
These heaps sit on top of the operating system's Virtual Memory Manager in all virtual memory systems.
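The layering above can be poked at directly through the Win32 heap API; a small sketch (error handling omitted):
// The default process heap versus a private heap created by the module.
// A block must always be freed against the heap it was allocated from,
// which is the same rule that bites you when two modules have two CRT heaps.
#include <windows.h>

int main()
{
    HANDLE processHeap = GetProcessHeap();          // the default process heap
    HANDLE privateHeap = HeapCreate(0, 0, 0);       // a growable private heap

    void* a = HeapAlloc(processHeap, 0, 128);
    void* b = HeapAlloc(privateHeap, 0, 128);

    HeapFree(processHeap, 0, a);
    HeapFree(privateHeap, 0, b);    // freeing b against processHeap would be undefined
    HeapDestroy(privateHeap);
    return 0;
}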
Let’s discuss more about CRT and associated heaps:
C/C++ Run-time (CRT) allocator: Provides malloc() and free() as well as new and delete operators.
The CRT creates such an extra heap for all its allocations (the handle of this CRT heap is stored internally in the CRT library in a global variable called _crtheap) as part of its initialization.
CRT creates its own private heap, which resides on top of the Windows heap.
The Windows heap is a thin layer surrounding the Windows run-time allocator (NTDLL).
Windows run-time allocator interacts with Virtual Memory Allocator, which reserves and commits pages used by the OS.
Your DLL and exe link to multithreaded static CRT libraries. Each DLL and exe you create therefore has its own heap, i.e. its own _crtheap. Allocations and de-allocations have to happen from the respective heap: memory dynamically allocated in the DLL cannot be de-allocated from the executable, and vice versa.
What can you do? Compile the code in your DLLs and exe using /MD or /MDd to use the multithreaded, DLL-specific version of the run-time library. That way both the DLL and the exe are linked against the same C run time DLL and hence share one _crtheap, so an allocation and its matching de-allocation always act on the same heap.
If I have an application that compiles as an .exe and I want to use a library, I can either statically link that library from a .lib file or dynamically link that library from a .dll file.
Each linked module (ie. each .exe or .dll) will be linked to an implementation of the C or C++ run times. The run times themselves are a library that can be statically or dynamically linked to and come in different threading configurations.
By saying "statically linked DLLs", are you describing a setup where an application .exe dynamically links to a library .dll and that library internally statically links to the runtime? I will assume that this is what you mean.
Also worth noting is that every module (.exe or .dll) has its own scope for statics i.e. a global static in an .exe will not be the same instance as a global static with the same name in a .dll.
In the general case therefore it cannot be assumed that lines of code running inside different modules are using the same implementation of the runtime, furthermore they will not be using the same instance of any static state.
Therefore certain rules need to be obeyed when dealing with objects or pointers that cross module boundaries. Allocations and deallocations must occur in the same module for any given address. Otherwise the heaps will not match and behaviour will not be defined.
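One common way to obey this rule is to have the allocating module also export the matching deallocation function; here is a sketch (the widget_create/widget_destroy names are made up for illustration):
// ---- exported from the DLL ----
struct Widget { int value; };

extern "C" __declspec(dllexport) Widget* widget_create()
{
    return new Widget{42};            // allocated on the DLL's CRT heap
}

extern "C" __declspec(dllexport) void widget_destroy(Widget* w)
{
    delete w;                         // freed on the very same heap
}

// ---- in the .exe ----
// Widget* w = widget_create();
// ... use w ...
// widget_destroy(w);                 // never "delete w;" in the .exe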
COM solves this using reference counting: objects delete themselves when the reference count reaches zero. This is a common pattern used to solve the matched allocation problem.
Other problems can exist; for instance, Windows defines certain behaviour, e.g. how allocation failures are handled, on a per-thread basis, not on a per-module basis. This means that code running in module A on a thread set up by module B can also run into unexpected behaviour.
