I am compiling a C program on Linux with gcc. The program itself links libc (and not much else) at build time, so ldd gives this output:
$ ldd myprogram
linux-vdso.so.1 => (0x00007fffd31fe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a991c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7a99bba000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7a98fbb000)
At run-time this program dlopen()s library B, which depends on a library A, which of course dlopen also loads before returning. A exports a function called re_exec, which B invokes (B is linked against A). libc also exports a function called re_exec. readelf output:
$ readelf -as A.so | grep re_exec
104: 00000000000044ff 803 FUNC GLOBAL PROTECTED 11 re_exec
469: 00000000000044ff 803 FUNC GLOBAL PROTECTED 11 re_exec
$ readelf -as /lib/x86_64-linux-gnu/libc.so.6 | grep re_exec
2165: 00000000000e4ae0 39 FUNC WEAK DEFAULT 12 re_exec@@GLIBC_2.2.5
The problem is that when B invokes re_exec, the re_exec inside libc is called, NOT the re_exec inside of A.
If, when I invoke the program, I set LD_PRELOAD=/path/to/A.so, then everything works as expected: B's invocation of re_exec correctly calls A, and not libc.
The dlopen call passes RTLD_NOW | RTLD_GLOBAL. I have tried with and without DEEPBIND, and get the same behavior in either case.
I have also tried dlopen()ing A directly, before B, both with and without DEEPBIND, which did not affect the behavior.
The question : is it possible to dlopen A/B with higher precedence than libraries that were included at link-time (libc, in this case) ?
(please don't suggest that I rename the call to something other than re_exec ; not useful)
Well, you know, I can't reproduce your error. Please take a look:
puts.c:
#include <stdio.h>
int puts(const char* _s) {
return printf("custom puts: %s\n", _s);
}
built with:
cc -Wall -fPIC -c puts.c -o puts.o
cc -shared -o libputs.so -fPIC -Wl,-soname,libputs.so puts.o
foo.c:
#include <stdio.h>
void foo() {
puts("Hello, world! I'm foo!");
}
built with:
cc -Wall -fPIC -c foo.c -o foo.o
cc -L`pwd` -shared -o libfoo.so -fPIC -Wl,-soname,libfoo.so foo.o -lputs
and rundl.c:
#include <dlfcn.h>
#include <assert.h>
#include <stdio.h>
typedef void (*FooFunc)();
int main(void) {
void *foolib = dlopen("./libfoo.so", RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND);
assert(foolib != NULL);
FooFunc foo = (FooFunc)dlsym(foolib, "foo");
assert(foo != NULL);
foo();
return 0;
}
built with:
cc -c -Wall rundl.c -o rundl.o
cc -o rundl rundl.o -ldl
now we can run rundl with LD_LIBRARY_PATH=$(pwd) (it's needed because libputs.so isn't in the ld.so known paths so libfoo.so can't be loaded w/ dlopen() & Co):
alex@rhyme ~/tmp/dynlib $ LD_LIBRARY_PATH=`pwd` ./rundl
custom puts: Hello, world! I'm foo!
alex@rhyme ~/tmp/dynlib $ _
if we move libputs.so to a directory known to ld.so and (re)run ldconfig to update caches then the code runs without any special environment variables:
alex@rhyme ~/tmp/dynlib $ ldd ./libfoo.so
linux-vdso.so.1 (0x00007fff48db8000)
libputs.so => /usr/local/lib64/libputs.so (0x00007f8595450000)
libc.so.6 => /lib64/libc.so.6 (0x00007f85950a0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8595888000)
alex@rhyme ~/tmp/dynlib $ ./rundl
custom puts: Hello, world! I'm foo!
If I link libfoo.so w/o -lputs foo() invokes the standard puts() from libc. That's it.
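If relying on the global lookup order is not an option, another approach is to ask for the symbol from a specific handle with dlsym, which searches that object and its dependencies rather than the whole process. Below is a minimal sketch of that idea. The helper name resolve_from is made up for illustration, and it only helps if the caller can be changed to call through the returned pointer:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): look up `name` starting
 * from the object behind `handle` instead of the global search order.
 * Returns NULL if the handle is NULL or the symbol is not found. */
void *resolve_from(void *handle, const char *name)
{
    if (handle == NULL)
        return NULL;
    return dlsym(handle, name);
}
```

For the case in the question, this would mean something like resolve_from(handle_of_A, "re_exec"), with handle_of_A being whatever dlopen returned for A.so. On older glibc versions the program still has to be linked with -ldl.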
pqy@localhost ~/src/test/a $ cat m.c
#include <stdio.h>
int aaaaa __attribute__ ((weak)) =8;
int main(void){
printf("%d\n", aaaaa);
return 0;
}
pqy@localhost ~/src/test/a $ cat lib.c
int aaaaa = 5;
pqy@localhost ~/src/test/a $ gcc lib.c -fPIC -shared -o libb.so;gcc m.c -o m -L. -lb -Wl,-rpath=$PWD;./m
8
Above is my code and test result. I am confused why it does not work as expected.
I also tried with a function, and that does not work either. Below is the test result.
pqy@localhost ~/src/test/a $ cat lib.c
int fun() {
return 5;
}
pqy@localhost ~/src/test/a $ cat m.c
#include <stdio.h>
__attribute__((weak)) int fun() {
return 8;
}
int main(void){
printf("%d\n", fun());
return 0;
}
pqy@localhost ~/src/test/a $ gcc lib.c -fPIC -shared -o libb.so;gcc m.c -O0 -o m -L. -lb -Wl,-rpath=$PWD;./m
8
pqy@localhost ~/src/test/a $ ldd m
linux-vdso.so.1 (0x00007ffd819ec000)
libb.so => /home/pqy/src/test/a/libb.so (0x00007f7226738000)
libc.so.6 => /lib64/libc.so.6 (0x00007f7226533000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7226744000)
pqy@localhost ~/src/test/a $
At bottom what you have observed here is just the fact that the linker will not
resolve a symbol dynamically if it can resolve it statically. See:
main.c
extern void foo(void);
extern void need_dynamic_foo(void);
extern void need_static_foo(void);
int main(void){
foo();
need_dynamic_foo();
need_static_foo();
return 0;
}
dynamic_foo.c
#include <stdio.h>
void foo(void)
{
puts("foo (dynamic)");
}
void need_dynamic_foo(void)
{
puts(__func__);
}
static_foo.c
#include <stdio.h>
void foo(void)
{
puts("foo (static)");
}
void need_static_foo(void)
{
puts(__func__);
}
Compile the sources so:
$ gcc -Wall -c main.c static_foo.c
$ gcc -Wall -fPIC -c dynamic_foo.c
Make a shared library:
$ gcc -shared -o libfoo.so dynamic_foo.o
And link a program:
$ gcc -o prog main.o static_foo.o libfoo.so -Wl,-rpath=$PWD
It runs like:
$ ./prog
foo (static)
need_dynamic_foo
need_static_foo
So foo and need_static_foo were statically resolved to the definitions from static_foo.o and
the definition of foo from libfoo.so was ignored, despite the fact that libfoo.so
was needed and provided the definition of need_dynamic_foo. It makes no difference
if we change the linkage order to:
$ gcc -o prog main.o libfoo.so static_foo.o -Wl,-rpath=$PWD
$ ./prog
foo (static)
need_dynamic_foo
need_static_foo
It also makes no difference if we replace static_foo.c with:
static_weak_foo.c
#include <stdio.h>
void __attribute__((weak)) foo(void)
{
puts("foo (static weak)");
}
void need_static_foo(void)
{
puts(__func__);
}
Compile that and relink:
$ gcc -Wall -c static_weak_foo.c
$ gcc -o prog main.o libfoo.so static_weak_foo.o -Wl,-rpath=$PWD
$ ./prog
foo (static weak)
need_dynamic_foo
need_static_foo
Although the definition of foo in static_weak_foo.c is now declared weak,
the fact that foo can be statically resolved to this definition
still preempts any need to resolve it dynamically.
Now if we write another source file containing another strong definition of
foo:
static_strong_foo.c
#include <stdio.h>
void foo(void)
{
puts("foo (static strong)");
}
and compile it and link as follows:
$ gcc -Wall -c static_strong_foo.c
$ gcc -o prog main.o static_weak_foo.o libfoo.so static_strong_foo.o -Wl,-rpath=$PWD
we see:
$ ./prog
foo (static strong)
need_dynamic_foo
need_static_foo
Now, libfoo.so still provides the definition of need_dynamic_foo, because there
is no other; static_weak_foo.o still provides the only definition of need_static_foo,
and the definition of foo in libfoo.so is still ignored because the symbol
can be statically resolved.
But in this case there are two definitions of foo in different files that are
available to resolve it statically: the weak definition in static_weak_foo.o and
the strong definition in static_strong_foo.o. By the linkage rules that you are
familiar with, the strong definition wins.
If both of these statically linked definitions of foo were strong, there would of course be a
multiple definition error, just like:
$ gcc -o prog main.o static_foo.o libfoo.so static_strong_foo.o -Wl,-rpath=$PWD
static_strong_foo.o: In function `foo':
static_strong_foo.c:(.text+0x0): multiple definition of `foo'
static_foo.o:static_foo.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
in which the dynamic definition in libfoo.so plays no part. So you can
be guided by this practical principle: The rules you are familiar with for arbitrating
between weak and strong definitions of the same symbol in a linkage only apply
to rival definitions which would provoke a multiple definition error in the absence
of the weak attribute.
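If the goal is for a shared library's definition to be used when one is present, one option is to not define the symbol in the program at all and only declare it as a weak reference. The reference then cannot be resolved statically, so it is left to the dynamic linker. A minimal sketch, assuming the function is called fun as in the question; the wrapper call_fun_or_default is a made-up name:

```c
#include <stddef.h>

/* Weak *reference*: fun is declared here but not defined. If no object or
 * shared library provides a definition, its address is null at run time. */
extern int fun(void) __attribute__((weak));

/* Hypothetical wrapper: call fun() if something defines it,
 * otherwise return the caller-supplied fallback value. */
int call_fun_or_default(int fallback)
{
    return fun != NULL ? fun() : fallback;
}
```

Linked against a libb.so that defines fun, the call would go to the library; linked without any definition, the fallback is returned.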
The symbol is resolved at link time: the linker finds the weak definition aaaaa = 8 in the program itself and binds the reference to it.
If a symbol can be resolved at link time, no relocation entry is generated for it, so nothing happens at load time.
There is no aaaaa in the relocation table:
% objdump -R m
m: file format elf64-x86-64
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
0000000000003dc8 R_X86_64_RELATIVE *ABS*+0x0000000000001130
0000000000003dd0 R_X86_64_RELATIVE *ABS*+0x00000000000010f0
0000000000004028 R_X86_64_RELATIVE *ABS*+0x0000000000004028
0000000000003fd8 R_X86_64_GLOB_DAT _ITM_deregisterTMCloneTable
0000000000003fe0 R_X86_64_GLOB_DAT __libc_start_main@GLIBC_2.2.5
0000000000003fe8 R_X86_64_GLOB_DAT __gmon_start__
0000000000003ff0 R_X86_64_GLOB_DAT _ITM_registerTMCloneTable
0000000000003ff8 R_X86_64_GLOB_DAT __cxa_finalize@GLIBC_2.2.5
0000000000004018 R_X86_64_JUMP_SLOT printf@GLIBC_2.2.5
Compiler or linker builds files from the command line in reverse order. In other words, files with ((weak)) should be located earlier in the command line than dynamic ones.
We are using dlopen to read in a dynamic library on Mac OS X. Update:
This is a POSIX problem; the same thing fails under Cygwin.
First the compile. On cygwin:
extern "C" void foo() { }
g++ -shared foo.c -o libfoo.so
nm -D libfoo.so
displays no public symbols. This appears to be the problem. If I could make them public, nm -D should display them.
Using:
nm libfoo.so | grep foo
000000x0xx0x00x0x0 T _foo
you can see the symbol is there. In Linux, this does seem to work:
nm -D foo.so
0000000000201020 B __bss_start
w __cxa_finalize
0000000000201020 D _edata
0000000000201028 B _end
0000000000000608 T _fini
0000000000000600 T foo
w __gmon_start__
00000000000004c0 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
However, even in Linux, we cannot seem to connect to the library. Here is the source code:
#include <dlfcn.h>
#include <iostream>
using namespace std;
int main() {
void* so = dlopen("foo.so", RTLD_NOW);
if (so = nullptr) {
cerr << "Can't open shared library\n";
exit(-1);
}
#if 0
const void* sym = dlsym(so, "foo");
if (sym == nullptr) {
cout << "Symbol not found\n";
}
#endif
dlclose(so);
}
If we remove the #if 0, the above code prints "Symbol not found"
but it crashes on the dlclose.
We tried exporting LD_LIBRARY_PATH=. just to see if the library cannot be reached. And the dlopen call seems to work in any case, the return is not nullptr.
So to summarize, the library does not seem to work on Mac and Cygwin. On Linux nm -D shows the symbol in the library, but the code to load the symbol does not work.
In your example, you wrote if (so = nullptr) {, which assigns nullptr to so, and the condition is always false. -Wall is a good idea when debugging!
This alone explains why you can't load the symbol, but I also found that I needed to do dlopen("./foo.so", RTLD_NOW); because dlopen otherwise searches library paths, not the current directory.
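Putting both points together, here is a rough sketch of the load sequence in plain C, with the comparison written correctly and dlerror used to report what went wrong. The helper name open_and_find is made up for illustration:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: open `path`, look up `symbol`, report failures.
 * On success *handle_out holds the handle the caller must dlclose().
 * On failure it returns NULL and sets *handle_out to NULL. */
void *open_and_find(const char *path, const char *symbol, void **handle_out)
{
    *handle_out = dlopen(path, RTLD_NOW);
    if (*handle_out == NULL) {            /* comparison, not assignment */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror();                            /* clear any stale error state */
    void *sym = dlsym(*handle_out, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(*handle_out);
        *handle_out = NULL;
        return NULL;
    }
    return sym;
}
```

A call such as open_and_find("./foo.so", "foo", &handle) uses the explicit ./ prefix so that the current directory is searched. Because the handle is never overwritten with a null pointer, the later dlclose receives a valid handle.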
I only care about default/hidden visibility.
The .o file is not compiled with IPO.
How to find out a symbol's visibility in the .o file?
Why I have to find it out from .o file:
On certain platform, I ran into a gcov.a which appears problematic.
And I have to figure out where it is wrong.
1. I cannot know how exactly the toolchain is configured and built.
2. As part of libgcc magic, figuring it out from source code is extremely difficult.
You can find out the visibility of a symbol in an object file by examining
the file's symbol table with objdump -t. If the symbol is hidden it
will be labelled .hidden in the 6th field of its objdump record, followed
by its name. If its visibility is default there will be no such label and
the 6th field will be the name (the usual case). For example:
foo.c (default visibility)
#include <stdio.h>
void foo(void)
{
puts("foo");
}
Compile and examine:
$ gcc -c -fPIC foo.c
$ objdump -t foo.o | grep foo
foo.o: file format elf64-x86-64
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 g F .text 0000000000000013 foo
foo.c (hidden visibility)
#include <stdio.h>
__attribute__ ((visibility ("hidden"))) void foo(void)
{
puts("foo");
}
Recompile and re-examine:
$ gcc -c -fPIC foo.c
$ objdump -t foo.o | grep foo
foo.o: file format elf64-x86-64
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 g F .text 0000000000000013 .hidden foo
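What objdump prints here comes from the low two bits of the st_other field of each ELF symbol table entry (readelf -s shows the same information in its Vis column). If you ever need to decode it yourself, a small sketch using the constants from the Linux <elf.h> header could look like this; the function name visibility_name is made up:

```c
#include <elf.h>

/* Hypothetical helper: map the st_other byte of an ELF symbol table
 * entry to a human-readable visibility name. */
const char *visibility_name(unsigned char st_other)
{
    switch (ELF64_ST_VISIBILITY(st_other)) {
    case STV_DEFAULT:   return "default";
    case STV_INTERNAL:  return "internal";
    case STV_HIDDEN:    return "hidden";
    case STV_PROTECTED: return "protected";
    }
    return "unknown";
}
```

This only covers the decoding step; reading the symbol table out of the .o file is left out.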
I have one .cu file that contains my cuda kernel, and a wrapper function that calls the kernel. I have a bunch of .c files as well, one of which contains the main function. One of these .c files calls the wrapper function from the .cu to invoke the kernel.
I compile these files as follows:
LIBS=-lcuda -lcudart
LIBDIR=-L/usr/local/cuda/lib64
CFLAGS = -g -c -Wall -Iinclude -Ioflib
NVCCFLAGS =-g -c -Iinclude -Ioflib
CFLAGSEXE =-g -O2 -Wall -Iinclude -Ioflib
CC=gcc
NVCC=nvcc
objects := $(patsubst oflib/%.c,oflib/%.o,$(wildcard oflib/*.c))
table-hash-gpu.o: table-hash.cu table-hash.h
	$(NVCC) $(NVCCFLAGS) table-hash.cu -o table-hash-gpu.o
main: main.c $(objects) table-hash-gpu.o
	$(CC) $(CFLAGSEXE) $(objects) table-hash-gpu.o -o udatapath udatapath.c $(LIBS) $(LIBDIR)
So far everything is fine. table-hash.cu calls a function from one of the .c files. When linking for main, I get the error that the function is not present. Can someone please tell me what is going on?
nvcc compiles both device and host code using the host C++ compiler, which implies name mangling. If you need to call a function compiled with a C compiler from C++, you must tell the C++ compiler that the function has C linkage. I presume that the errors you are seeing are analogous to this:
$ cat cfunc.c
float adder(float a, float b, float c)
{
return a + 2.f*b + 3.f*c;
}
$ cat cumain.cu
#include <cstdio>
float adder(float, float, float);
int main(void)
{
float result = adder(1.f, 2.f, 3.f);
printf("%f\n", result);
return 0;
}
$ gcc -m32 -c cfunc.c
$ nvcc -o app cumain.cu cfunc.o
Undefined symbols:
"adder(float, float, float)", referenced from:
_main in tmpxft_0000b928_00000000-13_cumain.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
Here we have code compiled with nvcc (so the host C++ compiler) trying to call a C function and getting a link error, because the C++ code expects a mangled name for adder in the supplied object file. If the main is changed like this:
$ cat cumain.cu
#include <cstdio>
extern "C" float adder(float, float, float);
int main(void)
{
float result = adder(1.f, 2.f, 3.f);
printf("%f\n", result);
return 0;
}
$ nvcc -o app cumain.cu cfunc.o
$ ./app
14.000000
It works. When the declaration of the function is qualified with extern "C", the C++ compiler does not use C++ mangling and linkage rules when referencing adder, and the resulting code links correctly.
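When the same declaration has to be shared between the .c files and the .cu file, the usual pattern is to wrap it in a guard keyed on __cplusplus, so that a C compiler sees a plain declaration and a C++ compiler (including nvcc's host compiler) sees extern "C". A minimal sketch, reusing adder from above; in a real project the guarded part would live in a header:

```c
/* Declarations shared by C and C++ translation units. */
#ifdef __cplusplus
extern "C" {
#endif

float adder(float a, float b, float c);

#ifdef __cplusplus
}
#endif

/* Definition, compiled as C in the example above. */
float adder(float a, float b, float c)
{
    return a + 2.f * b + 3.f * c;
}
```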
I tried to link my executable program with 2 static libraries using g++. The 2 static libraries define a function with the same name. I was expecting a "multiple definition" error from the linker, but I did not get one. Can anyone help explain why?
staticLibA.h
#ifndef _STATIC_LIBA_HEADER
#define _STATIC_LIBA_HEADER
int hello(void);
#endif
staticLibA.cpp
#include <cstdio>
#include "staticLibA.h"
int hello(void)
{
printf("\nI'm in staticLibA\n");
return 0;
}
output:
g++ -c -Wall -fPIC -m32 -o staticLibA.o staticLibA.cpp
ar -cvq ../libstaticLibA.a staticLibA.o
a - staticLibA.o
staticLibB.h
#ifndef _STATIC_LIBB_HEADER
#define _STATIC_LIBB_HEADER
int hello(void);
#endif
staticLibB.cpp
#include <cstdio>
#include "staticLibB.h"
int hello(void)
{
printf("\nI'm in staticLibB\n");
return 0;
}
output:
g++ -c -Wall -fPIC -m32 -o staticLibB.o staticLibB.cpp
ar -cvq ../libstaticLibB.a staticLibB.o
a - staticLibB.o
main.cpp
extern int hello(void);
int main(void)
{
hello();
return 0;
}
output:
g++ -c -o main.o main.cpp
g++ -o multipleLibsTest main.o -L. -lstaticLibA -lstaticLibB -lstaticLibC -ldl -lpthread -lrt
The linker does not look at staticLibB, because by the time staticLibA is linked, there are no unfulfilled dependencies.
That's an easy one. An object is only pulled out of a library if the symbol referenced hasn't already been defined. Only one of the hellos is pulled (from A). You'd get errors if you linked with the .o files.
When the linker tries to link main.o into multipleLibsTest and sees that hello() is unresolved, it starts searching the libraries in the order given on the command line. It will find the definition of hello() in staticLibA and will terminate the search.
It will not look in staticLibB or staticLibC at all.
If staticLibB.o contained another symbol not in staticLibA and that was pulled into the final executable, you then get a multiple definition of hello error, as individual .o files are pulled out of the library and two of them would have hello(). Reversing the order of staticLibA and staticLibB on the link command line would then make that error go away.