Why would dlopen reuse the address of a previously loaded symbol? - linux

I just debugged a strange issue involving two libraries; let's call them libA.so and libB.so.
The application dlopens libA.so (EDIT: it doesn't; it's linked with the -l option), a thin library that then loads libB.so, the actual implementation.
dlopen is called with RTLD_NOW only; no other flags are passed.
Both libraries use the same logger module, whose state is stored in a global variable. Because the logger is linked statically into both, the global variable has the same name in each library.
When libB is loaded, the two global variables sit at the same address and conflict: the dynamic linker reused the address of libA's variable for the one in libB.
If it matters, this variable is defined deep within a .cpp file; I'm not sure whether linking differs between C and C++.
The dlopen documentation says:
RTLD_GLOBAL
The symbols defined by this library will be made available for symbol resolution of subsequently loaded libraries.
RTLD_LOCAL
This is the converse of RTLD_GLOBAL, and the default if neither flag is specified. Symbols defined in this library are not made available to resolve references in subsequently loaded libraries.
So RTLD_LOCAL is supposed to be the default; that is, libA's symbols shouldn't be used when resolving libB's symbols. But it's still happening. Why?
As a workaround I added the visibility("hidden") attribute to this global so that it is no longer exported, and I raised a ticket to make all symbols hidden by default so that collisions like this can't happen in the future. I'm still wondering why this happens when it shouldn't.
EDIT2:
Source example:
commonvar.h:
#pragma once
#include <iostream>
struct A
{
A()
{
std::cout << "A inited. Address: " << this << "\n";
}
virtual ~A() {}
};
extern A object;
struct POD
{
int x, y, z;
};
extern POD pod;
commonvar.cpp:
#include <string>
#include "commonvar.h"
A object;
POD pod = {1, 2, 3};
a.h:
#pragma once
extern "C" void foo();
a.cpp:
#include <iostream>
#include "commonvar.h"
using FnFoo = void (*)();
extern "C" void foo()
{
std::cout << "A called.\n";
std::cout << "A: Address of foo is: " << &object << "\n";
std::cout << "A: Address of pod is: " << &pod << "\n";
std::cout << "A: {" << pod.x << ", " << pod.y << ", " << pod.z << "}\n";
pod.x = 42;
}
b.cpp:
#include <iostream>
#include <string>
#include "commonvar.h"
extern "C" void foo()
{
std::cout << "B called.\n";
std::cout << "B: Address of foo is: " << &object << "\n";
std::cout << "B: Address of pod is: " << &pod << "\n";
std::cout << "B: {" << pod.x << ", " << pod.y << ", " << pod.z << "}\n";
}
main.cpp:
#include <dlfcn.h>
#include <iostream>
#include <cassert>
#include "a.h"
using FnFoo = void (*)();
int main()
{
std::cout << "Start of program.\n";
foo();
std::cout << "Loading B\n";
void *b = dlopen("libb.so", RTLD_NOW);
assert(b);
FnFoo fnB;
fnB = FnFoo(dlsym(b, "foo"));
assert(fnB);
fnB();
}
Build script:
#!/bin/bash
g++ -fPIC -c commonvar.cpp
ar rcs common.a commonvar.o
g++ -fPIC -shared a.cpp common.a -o liba.so
g++ -fPIC -shared b.cpp common.a -o libb.so
g++ main.cpp liba.so -ldl -o main
Dynamic symbols of main:
U __assert_fail
0000000000202010 B __bss_start
U __cxa_atexit
w __cxa_finalize
U dlopen
U dlsym
0000000000202010 D _edata
0000000000202138 B _end
0000000000000bc4 T _fini
U foo
w __gmon_start__
0000000000000860 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
U __libc_start_main
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
0000000000202020 B _ZSt4cout
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
Dynamic symbols of liba.so:
0000000000202064 B __bss_start
U __cxa_atexit
w __cxa_finalize
0000000000202064 D _edata
0000000000202080 B _end
0000000000000e6c T _fini
0000000000000bba T foo
w __gmon_start__
0000000000000a30 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000202070 B object
0000000000202058 D pod
U _ZdlPvm
0000000000000dca W _ZN1AC1Ev
0000000000000dca W _ZN1AC2Ev
0000000000000e40 W _ZN1AD0Ev
0000000000000e22 W _ZN1AD1Ev
0000000000000e22 W _ZN1AD2Ev
U _ZNSolsEi
U _ZNSolsEPKv
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
U _ZSt4cout
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
0000000000201dd0 V _ZTI1A
0000000000000ed5 V _ZTS1A
0000000000201db0 V _ZTV1A
U _ZTVN10__cxxabiv117__class_type_infoE
Dynamic symbols of libb.so:
$ nm -D libb.so
0000000000202064 B __bss_start
U __cxa_atexit
w __cxa_finalize
0000000000202064 D _edata
0000000000202080 B _end
0000000000000e60 T _fini
0000000000000bba T foo
w __gmon_start__
0000000000000a30 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
0000000000202070 B object
0000000000202058 D pod
U _ZdlPvm
0000000000000dbe W _ZN1AC1Ev
0000000000000dbe W _ZN1AC2Ev
0000000000000e34 W _ZN1AD0Ev
0000000000000e16 W _ZN1AD1Ev
0000000000000e16 W _ZN1AD2Ev
U _ZNSolsEi
U _ZNSolsEPKv
U _ZNSt8ios_base4InitC1Ev
U _ZNSt8ios_base4InitD1Ev
U _ZSt4cout
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
0000000000201dd0 V _ZTI1A
0000000000000ec9 V _ZTS1A
0000000000201db0 V _ZTV1A
U _ZTVN10__cxxabiv117__class_type_infoE
Output:
A inited. Address: 0x7efd6cf97070
Start of program.
A called.
A: Address of foo is: 0x7efd6cf97070
A: Address of pod is: 0x7efd6cf97058
A: {1, 2, 3}
Loading B
A inited. Address: 0x7efd6cf97070
B called.
B: Address of foo is: 0x7efd6cf97070
B: Address of pod is: 0x7efd6cf97058
B: {42, 2, 3}
As can be seen, the addresses of the variables collide, but the function's address doesn't.
Moreover, the C++ initialization is peculiar: aggregates such as the pod variable are initialized only once (you can see that the call to foo() modifies it, and it is not reinitialized when B is loaded), yet the constructor of the full object is called again when libb.so is loaded.

The key to answering this question is whether the main executable exports the same symbol in its dynamic symbol table. That is, what is the output from:
nm -D a.out | grep ' mangled_name_of_the_symbol'
If the output is empty, the two libraries should indeed use separate (their own) copies of the symbol. If it is not empty, both libraries should reuse the symbol defined in the main binary. This happens because UNIX dynamic linking attempts to emulate what would have happened if everything had been statically linked into the main binary: UNIX support for shared libraries arrived long after UNIX itself became popular, and in that context this design decision made sense.
Demonstration:
// main.c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
int foo = 12;
int main()
{
printf("main: &foo = %p, foo = %d\n", &foo, foo);
void *h = dlopen("./foo.so", RTLD_NOW);
assert (h != NULL);
void (*fn)(void) = (void (*)()) dlsym(h, "fn");
fn();
return 0;
}
// foo.c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
int foo = 42;
void fn()
{
printf("foo: &foo = %p, foo = %d\n", &foo, foo);
void *h = dlopen("./bar.so", RTLD_NOW);
assert (h != NULL);
void (*fn)(void) = (void (*)()) dlsym(h, "fn");
fn();
}
// bar.c
#include <stdio.h>
int foo = 24;
void fn()
{
printf("bar: &foo = %p, foo = %d\n", &foo, foo);
}
Build this with:
gcc -fPIC -shared -o foo.so foo.c && gcc -fPIC -shared -o bar.so bar.c &&
gcc main.c -ldl && ./a.out
Output:
main: &foo = 0x5618f1d61048, foo = 12
foo: &foo = 0x7faad6955040, foo = 42
bar: &foo = 0x7faad6950028, foo = 24
Now rebuild just the main binary with -rdynamic (which causes foo to be exported from it): gcc main.c -ldl -rdynamic. The output changes to:
main: &foo = 0x55ced88f1048, foo = 12
foo: &foo = 0x55ced88f1048, foo = 12
bar: &foo = 0x55ced88f1048, foo = 12
P.S.
You can gain much insight into the behavior of dynamic linker by running with:
LD_DEBUG=symbols,bindings ./a.out
Update:
It turns out I asked a wrong question ... Added source example.
If you look at LD_DEBUG output, you'll see:
165089: symbol=object; lookup in file=./main [0]
165089: symbol=object; lookup in file=./liba.so [0]
165089: binding file ./liba.so [0] to ./liba.so [0]: normal symbol `object'
165089: symbol=object; lookup in file=./main [0]
165089: symbol=object; lookup in file=./liba.so [0]
165089: binding file ./libb.so [0] to ./liba.so [0]: normal symbol `object'
What this means: liba.so is in the global search list (by virtue of having been directly linked to by main). This is approximately equivalent to having done dlopen("./liba.so", RTLD_GLOBAL).
It should not be a surprise then that the symbols in it are available for subsequently loaded shared libraries to bind to, which is exactly what the dynamic loader does.

A possible solution for this issue is using the RTLD_DEEPBIND flag of dlopen (however, it's Linux specific, not POSIX standard), which will make the loaded library try to resolve symbols against itself (and its own dependencies) before going through the ones in the global scope.
For this to work properly, the executable has to be built with -fPIE, otherwise some violated ODR assumptions made by libstdc++ will likely cause a segfault (alternatively, if iostream is replaced with cstdio, it works without -fPIE).

Related

Why linkage failed even when nm found the symbol?

When I compile a demo main.cpp, linking fails with an undefined reference to a_mtk_bt_service_init(), even though I found the symbol with
nm -D ./libmtk_bt_service_client.so|grep a_mtk_bt_service_init,
whose output is 0000000000004098 T a_mtk_bt_service_init.
I'm sure the compiler finds the correct dynamic library, which I checked with the command aarch64-poky-linux-g++ -print-file-name=libmtk_bt_service_client.so -o main main.cpp
This is the demo code, main.cpp:
void a_mtk_bt_service_init();
int main()
{
a_mtk_bt_service_init();
return 0;
}
and my compile command is
aarch64-poky-linux-g++ -mcpu=cortex-a72.cortex-a53+crypto -mtune=cortex-a72.cortex-a53 --sysroot=/home/sundq/code/newT9/T9-Amazon-Sdk/build/tmp/sysroots/aud8516-slc -o build/xx main.cpp -I../../include -lmtk_bt_service_client
The answer is in "Call a C function from C++ code": when C++ code calls a C function, the declaration of that function must be marked extern "C" so that the C++ compiler does not mangle its name.
My function declaration should therefore look like this:
extern "C" void a_mtk_bt_service_init();

ambiguous call to overloaded function with "bind" when i add header file"boost/function"

I wrote a test of boost::function.
This code works:
#include <iostream>
#include <boost/lambda/lambda.hpp>
#include <boost/bind.hpp>
//#include <boost/function.hpp>
#include <boost/ref.hpp>
using namespace std;
using namespace boost;
template<typename FUN,typename T>
T fun( FUN function, T lhs, T rhs ){
cout << typeid(function).name() << endl;
return function(lhs,rhs);
}
int add4(int a, int b, int c){
return a + b + c;
}
int main(){
cout << fun(bind(add4,2,_1,_2),1,4) << endl;
system("pause");
}
But when I add the header file "boost/function.hpp",
VS2012 reports:
error C2668: 'std::bind' : ambiguous call to overloaded function.
Don't import both std and boost namespaces into the global namespace, to avoid such an ambiguity.
Instead, either specify fully qualified names, like boost::function, boost::bind, or import particular symbols: using boost::function;.

Why can the value of the symbol returned by dlsym() be null?

On Linux, the dlsym(3) man page says:
Since the value of the symbol could actually be NULL
(so that a NULL return from dlsym() need not indicate an error),
Why is this? When can a symbol (a function, specifically) actually be NULL? I am reviewing code and found a piece that calls dlerror first to clear any error, then dlsym, then dlerror again to check for errors. But it does not check whether the resulting function pointer is null before calling it:
dlerror();
a_func_name = ...dlsym(...);
if (dlerror()) goto end;
a_func_name(...); // Never checked if a_func_name == NULL;
I am just a reviewer so don't have the option to just add the check. And perhaps the author knows NULL can never be returned. My job is to challenge that but don't know what could make this return a valid NULL so I can then check if such a condition could be met in this code's context. Have not found the right thing to read with Google, a pointer to good documentation would be enough unless you want to explain explicitly which would be great.
I know of one particular case where the symbol value returned by dlsym() can be NULL, which is when using GNU indirection functions (IFUNCs). However, there are presumably other cases, since the text in the dlsym(3) manual page pre-dates the invention of IFUNCs.
Here's an example using IFUNCs. First, a file that will be used to create a shared library:
$ cat foo.c
/* foo.c */
#include <stdio.h>
/* This is a 'GNU indirect function' (IFUNC) that will be called by
dlsym() to resolve the symbol "foo" to an address. Typically, such
a function would return the address of an actual function, but it
can also just return NULL. For some background on IFUNCs, see
https://willnewton.name/uncategorized/using-gnu-indirect-functions/ */
asm (".type foo, #gnu_indirect_function");
void *
foo(void)
{
fprintf(stderr, "foo called\n");
return NULL;
}
Now the main program, which will look up the symbol foo in the shared library:
$ cat main.c
/* main.c */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
void *handle;
void (*funcp)(void);
handle = dlopen("./libfoo.so", RTLD_LAZY); /* the library is built as libfoo.so */
if (handle == NULL) {
fprintf(stderr, "dlopen: %s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any outstanding error */
funcp = dlsym(handle, "foo");
printf("Results after dlsym(): funcp = %p; dlerror = %s\n",
(void *) funcp, dlerror());
exit(EXIT_SUCCESS);
}
Now build and run to see a case where dlsym() returns NULL, while dlerror() also returns NULL:
$ cc -Wall -fPIC -shared -o libfoo.so foo.c
$ cc -Wall -o main main.c libfoo.so -ldl
$ LD_LIBRARY_PATH=. ./main
foo called
Results after dlsym(): funcp = (nil); dlerror = (null)
Well, if it's returned with no errors, then pointer is valid and NULL is about as illegal as any random pointer from the shared object. Like the wrong function, data or whatever.
It can't be if the library/PIE is a product of normal C compilation, as C won't ever put a global object at the NULL address, but you can get a symbol to resolve to NULL using special linker tricks:
null.c:
#include <stdio.h>
extern char null_addressed_char;
int main(void)
{
printf("&null_addressed_char=%p\n", &null_addressed_char);
}
Compile, link, and run:
$ clang null.c -Xlinker --defsym -Xlinker null_addressed_char=0 && ./a.out
&null_addressed_char=(nil)
If you don't allow any such weirdness, you can treat NULL returns from dlsym as errors.
dlerror() returns the last error, not the status of the last call. So if nothing else the code you show may potentially get a valid result from dlsym() and fool itself into thinking there was an error (because there was still one in the queue). The purpose behind dlerror is to provide human-readable error messages. If you aren't printing the result, you are using it wrong.

Why is this wrong!? Generating strings

I've been trying to generate strings in this way:
a
b
.
.
z
aa
ab
.
.
zz
.
.
.
.
zzzz
And I want to know why a Segmentation fault (core dumped) error occurs when it reaches 'yz'. I know my code doesn't cover all the possible strings, like 'zb' or 'zc', but that's not the point; I want to know why this error happens. I am not a master of coding, as you can see, so please try to explain it clearly. Thanks :)
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
void move_positions (char s[]);
int main (int argc, char *argv[])
{
char s[28];
s[0] = ' ';
s[1] = '\0';
int a = 0;
for(int r = 'a'; r <= 'z'; r++)
{
for(int t ='a';t <='z'; t++)
{
for(int u = 'a';u <= 'z'; u++)
{
for(int y = 'a'; y <= 'z'; y++)
{
s[a] = (char)y;
printf ("%s\n", s);
if (s[0] == 'z')
{
move_positions(s);
a++;
}
}
s[a-1] = (char)u;
}
s[a-2] = (char)t;
}
s[a-3] = (char)r;
}
return 0;
}
void move_positions (char s[])
{
char z[28];
z[0] = ' ';
z[1] = '\0';
strcpy(s, strcat(z, s));
}
First, let's compile with debugging turned on:
gcc -g prog.c -o prog
Now let's run it under a debugger:
> gdb prog
GNU gdb 6.3.50-20050815 (Apple version gdb-1822) (Sun Aug 5 03:00:42 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done
(gdb) run
Starting program: /Users/andrew/Documents/programming/sx/13422880/prog
Reading symbols for shared libraries +............................. done
a
b
c
d
e
...
yu
yv
yw
yx
yy
yz
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00007fffc0bff6c5
0x0000000100000c83 in main (argc=1, argv=0x7fff5fbff728) at prog.c:22
22 s[a] = (char)y;
Ok, it crashed on line 22, trying to do s[a] = (char)y. What's a?
(gdb) p a
$1 = 1627389953
So you're writing to roughly the 1.6 billionth entry of the array s. What is s?
(gdb) ptype s
type = char [28]
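Writing to index 1.6 billion of a 28-element array? That's not going to work. Note that 1627389953 is 0x61000001: a value of 1 with 0x61 ('a') in its top byte. That is consistent with s[a-2] = (char)t running while a is 1, which writes to s[-1], outside the array, and on this stack layout lands in a itself. Looks like you need to reset a to zero at the start of some of your loops, and make sure a-1, a-2, and a-3 are never negative before using them as indices.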

Unprotect start and end of ELF section in library, to override from linked program

I would like to get the start and end pointers of a section in a library in such a way that they can be overridden from the program the library is linked into.
This allows me to specify in the program some parameters as to how the library should load. Here's a concrete example:
foo.c, the library:
#include <stdio.h>
typedef void (*fptr)();
void lib_function();
void dummy()
{
printf("NO -- I should be overriden by prog_function\n");
}
fptr section_fptrlist __attribute__((weak, section("fptrlist"))) = (fptr)dummy;
extern fptr __start_fptrlist;
extern fptr __stop_fptrlist;
void __attribute__((constructor)) setup()
{
// setup library: call pre-init functions;
for (fptr *f = &__start_fptrlist; f != &__stop_fptrlist; f++)
(*f)();
}
void lib_function()
{
}
bar.c, the program:
#include <stdio.h>
void lib_function();
typedef void (*fptr)();
void pre_init()
{
printf("OK -- run me from library constructor\n");
}
fptr section_fptrlist __attribute__((section("fptrlist"))) = (fptr)pre_init;
int main()
{
lib_function();
return 0;
}
I build libfoo.so from foo.c and then a test program from bar.c and libfoo.so, for example:
gcc -g -O0 -fPIC -shared foo.c -o libfoo.so
gcc -g -O0 bar.c -L. -lfoo -o test
This used to work fine, i.e. with ld version 2.26.1 I get as expected:
$ ./test
OK -- run me from library constructor
Now with ld version 2.29.1 I get:
$ ./test
NO -- I should be overriden by prog_function
I have compiled everything on one machine, and only changed the linker step by copying the object file to a different machine, running ld -shared foo.o -o libfoo.so and copying the library back, so as far as I can tell the linker is the only difference between this working and not working.
I also use gcc 7.2.0 and glibc 2.22-62, but as stated above that doesn't seem to be decisive. The differences in the linker scripts seem minor, and using one instead of the other has not made any difference to the result so far (2.26 with 2.29's script works as expected, 2.29 with 2.26's script does not). Here's the diff anyway:
--- ld_script_v2.26 2018-02-02 21:52:56.038573732 +0100
+++ ld_script_v2.29 2018-02-02 21:52:41.154504340 +0100
@@ -1,4 +1,4 @@
@@ -53,5 +53,5 @@ SECTIONS
.plt : { *(.plt) *(.iplt) }
.plt.got : { *(.plt.got) }
-.plt.bnd : { *(.plt.bnd) }
+.plt.sec : { *(.plt.sec) }
.text :
{
@@ -226,4 +226,5 @@ SECTIONS
/* DWARF Extension. */
.debug_macro 0 : { *(.debug_macro) }
+ .debug_addr 0 : { *(.debug_addr) }
.gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
/DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
Looking at the dynamic symbol table (with readelf -Ws) I noticed that in the 2.29 versions the symbols are now protected:
with ld 2.29> readelf -Ws libfoo.so | grep fptr
8: 0000000000201028 0 NOTYPE GLOBAL PROTECTED 24 __start_fptrlist
14: 0000000000201028 8 OBJECT WEAK DEFAULT 24 section_fptrlist
16: 0000000000201030 0 NOTYPE GLOBAL PROTECTED 24 __stop_fptrlist
54: 0000000000201028 8 OBJECT WEAK DEFAULT 24 section_fptrlist
58: 0000000000201028 0 NOTYPE GLOBAL PROTECTED 24 __start_fptrlist
62: 0000000000201030 0 NOTYPE GLOBAL PROTECTED 24 __stop_fptrlist
with ld 2.26> readelf -Ws libfoo.so | grep fptrlist
9: 0000000000201028 0 NOTYPE GLOBAL DEFAULT 23 __start_fptrlist
15: 0000000000201028 8 OBJECT WEAK DEFAULT 23 section_fptrlist
17: 0000000000201030 0 NOTYPE GLOBAL DEFAULT 23 __stop_fptrlist
53: 0000000000201028 8 OBJECT WEAK DEFAULT 23 section_fptrlist
57: 0000000000201028 0 NOTYPE GLOBAL DEFAULT 23 __start_fptrlist
61: 0000000000201030 0 NOTYPE GLOBAL DEFAULT 23 __stop_fptrlist
I am aware, from this answer, that the feature I was relying on is more shady than well defined. I was able to track down the fact that this change was intentional. What is now the best way for me to achieve the goal of calling program function(s) from my library setup?
Can I still make this approach work? Is there a way to un-protect those symbols, for example?
Even though this example is small, the problem actually occurs in a pretty big C++ project, so the fewer changes the better.
I think the problem is
for (fptr *f = &__start_fptrlist; f != &__stop_fptrlist; f++)
(*f)();
Here the for-loop is expected to walk from the __start_fptrlist to the __stop_fptrlist defined in your app, but the protected visibility makes those symbols resolve to the definitions in the .so itself.
A simple workaround will be:
// foo.c
/* ... */
fptr *my_start = &__start_fptrlist;
fptr *my_stop = &__stop_fptrlist;
void __attribute__((constructor)) setup()
{
// setup library: call pre-init functions;
for (fptr *f = my_start; f != my_stop; f++)
(*f)();
}
Here the exact values of the my_* variables in the library are not important, because these two names are supposed to bind to the definitions in the app.
// bar.c
/* ... */
fptr section_fptrlist __attribute__((section("fptrlist"))) = (fptr)pre_init;
extern fptr __start_fptrlist;
extern fptr __stop_fptrlist;
fptr *my_start = &__start_fptrlist;
fptr *my_stop = &__stop_fptrlist;
This forces your for-loop to walk the addresses from your app instead of those from your .so, because the my_* symbols are global and are resolved from the app first.
Warning: Code not tested, since I don't have the environment as you described. Please let me know whether this approach works on your machine.

Resources