Cygwin: why are POSIX symbols in libc.a strong symbols?

On Cygwin POSIX symbols in libc.a are strong symbols (e.g. accept):
$ nm /usr/lib/libc.a | grep 'accept' -w
0000000000000000 T accept
while on Linux they are weak symbols (e.g. accept):
$ nm /usr/lib/x86_64-linux-gnu/libc.a |& grep 'accept' -w
accept.o:
0000000000000000 W accept
U accept
U accept
U accept
U accept
Note: having POSIX symbols in libc.a may be a bit unexpected.
However, why are POSIX symbols strong symbols in Cygwin's libc.a?
Example: if a user has mylib.a containing a definition of accept, and mylib.a comes after libc.a in the list of command-line arguments, then the accept from libc.a may be (and usually is) selected.
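To make the archive-order effect in the example concrete, here is a minimal Python sketch of how a traditional Unix linker scans static archives from left to right. It is a simplified model, not the real algorithm: the function name resolve, the member name myaccept.o and the data layout are all hypothetical, and the model ignores symbol binding (weak vs. strong) as well as new undefined symbols introduced by the members it pulls in.

```python
def resolve(undefined, archives):
    """Model a left-to-right scan over static archives.

    `archives` is an ordered list of (archive_name, {member: defined_symbols}).
    A member is pulled in only if it defines a symbol that is still
    undefined at that point, so an earlier archive wins over a later one.
    """
    pending = set(undefined)
    chosen = {}
    for archive_name, members in archives:
        for member, defined in members.items():
            hits = pending & set(defined)
            for symbol in hits:
                chosen[symbol] = f"{archive_name}({member})"
            pending -= hits
    return chosen, pending


# Hypothetical command line: ... libc.a mylib.a
chosen, pending = resolve(
    {"accept"},
    [
        ("libc.a", {"accept.o": {"accept"}}),
        ("mylib.a", {"myaccept.o": {"accept"}}),
    ],
)
print(chosen)  # {'accept': 'libc.a(accept.o)'}
```

Swapping the two archives in the list makes the model pick mylib.a(myaccept.o) instead, which matches the observation that the position on the command line decides which definition is used.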
UPD. https://stackoverflow.com/a/2290838/1778275:
MSVC++ has __declspec(selectany) which covers part of the functionality of weak symbols.

https://en.wikipedia.org/wiki/Weak_symbol
Weak symbols are not available in COFF PE binaries.
This is also one of the reasons why all symbols must be defined at compilation time.

Related

GNU ld: -z origin? -rpath $ORIGIN/../lib?

A legacy makefile that I'm trying to understand has -Wl,-z,origin,-rpath,'$ORIGIN/../lib'
OK, I see -Wl means the following are linker options; the commas will be replaced with spaces.
The manpage for the GNU ld mysteriously only says:
-z keyword
The recognized keywords are:
:
:
origin
Marks the object may contain $ORIGIN.
Likewise the next option -rpath (relative path?) contains this $ORIGIN suggesting it's some kind of key word but $ORIGIN is not otherwise mentioned in the ld man page.
$ORIGIN is mentioned under Substitution Sequences in the ELF specification. DF_ORIGIN is documented as well.
However, while GNU ld supports setting the DF_ORIGIN flag with the -z origin option, the dynamic loader in glibc always honors $ORIGIN, even if the flag is not set. This means that there is no reason to use the link editor flag when building for GNU/Linux.
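As a rough illustration of the substitution, the following Python sketch expands $ORIGIN in an rpath entry relative to the directory that contains the executable. The function name expand_origin and the path /opt/app/bin/tool are hypothetical, and the final normalization is only there to make the result readable; the real loader works on the actual file location and is not obliged to normalize the path this way.

```python
import posixpath


def expand_origin(entry, executable):
    """Replace $ORIGIN or ${ORIGIN} with the directory of `executable`.

    `executable` is assumed to be an absolute POSIX path.
    """
    origin = posixpath.dirname(executable)
    expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    # Lexical clean-up only, so that "bin/../lib" reads as "lib".
    return posixpath.normpath(expanded)


print(expand_origin("$ORIGIN/../lib", "/opt/app/bin/tool"))  # /opt/app/lib
```

This is why -rpath '$ORIGIN/../lib' lets an application find its bundled libraries wherever the installation tree is moved, as long as bin and lib stay next to each other.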

What's the rule of dynamic library searching for ld?

Linux separates the linker-time search path and run-time search path.
For the run-time search path, I found the rule for ld.so in its man page, ld.so(8):
DT_RPATH
environment LD_LIBRARY_PATH
DT_RUNPATH
ld.so.cache
/lib, /usr/lib
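The list above can be sketched as a small Python model that returns the directories in the order the dynamic loader would try them. The function and parameter names are hypothetical. One detail is taken from ld.so(8) rather than from the list: DT_RPATH is only consulted when the object has no DT_RUNPATH. Secure-execution mode, in which LD_LIBRARY_PATH is ignored, is not modeled.

```python
def runtime_search_order(rpath=(), runpath=(), ld_library_path="",
                         default_dirs=("/lib", "/usr/lib")):
    """Return (source, location) pairs in the order ld.so would try them."""
    order = []
    # DT_RPATH is ignored when DT_RUNPATH is present (per ld.so(8)).
    if rpath and not runpath:
        order += [("DT_RPATH", d) for d in rpath]
    order += [("LD_LIBRARY_PATH", d) for d in ld_library_path.split(":") if d]
    order += [("DT_RUNPATH", d) for d in runpath]
    order.append(("ld.so.cache", "/etc/ld.so.cache"))
    order += [("default", d) for d in default_dirs]
    return order


for source, location in runtime_search_order(
        runpath=["/tmp"], ld_library_path="/home/andrew/install/lib"):
    print(f"{source:16} {location}")
```

With runpath set, LD_LIBRARY_PATH comes first; passing the same directory as rpath instead moves it in front of LD_LIBRARY_PATH.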
But for the link-time search path, no luck with ld :(
The man page for ld, ld(1), says, besides the -L option:
The default set of paths searched (without being specified with -L) depends on which emulation mode ld is using, and in some cases also on how it was configured.
The paths can also be specified in a link script with the "SEARCH_DIR" command. Directories specified this way are searched at the point in which the linker script appears in the command line.
Does the "default set of paths" depending on emulation mode mean "SEARCH_DIR"?
misssprite, to look for the linker search path for specific ELF emulation just run ld -m<emulation> --verbose | grep SEARCH_DIR
Speaking about the ld itself, the library path search order is the following:
Directories specified via -L command line flags
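Directories in the LIBRARY_PATH environment variable (this variable is read by the gcc driver, which hands the directories to ld as -L options; ld does not consult it on its own)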
SEARCH_DIR variables in the linker script.
You can see which directories are specified in the default linker script by running ld --verbose | grep SEARCH_DIR. Note that a leading = in a SEARCH_DIR value is replaced by the value of the --sysroot option if you specify it.
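For readers who want the SEARCH_DIR entries as a plain list, here is a small Python sketch that extracts them from the text printed by ld --verbose and applies the sysroot substitution described above. The function name parse_search_dirs and the sample input are assumptions made for illustration; the actual directories depend on how your ld was configured.

```python
import re


def parse_search_dirs(verbose_output, sysroot=""):
    """Extract SEARCH_DIR paths; a leading '=' is replaced by `sysroot`."""
    dirs = []
    for path in re.findall(r'SEARCH_DIR\("([^"]+)"\)', verbose_output):
        if path.startswith("="):
            path = sysroot + path[1:]
        dirs.append(path)
    return dirs


# Illustrative input, shaped like a line from `ld --verbose`.
sample = 'SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("/usr/lib");'
print(parse_search_dirs(sample, sysroot="/opt/sysroot"))
# ['/opt/sysroot/usr/local/lib', '/opt/sysroot/lib', '/usr/lib']
```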
Usually ld is not invoked directly but via a compiler driver, which passes several -L options to the linker. With gcc or clang you can print the additional library search directories added by the compiler by invoking it with the -print-search-dirs option. Also note that if you specify machine-specific compiler flags (e.g. -m32, as misssprite mentioned), then the linker may use a different linker script according to the chosen ELF emulation. With gcc you can use the -dumpspecs option to see how different compiler flags affect the linker invocation. But IMHO the simplest way to see the linker command line is to compile and link a simple program with -v specified.
misssprite, there is no search for ld.so or ld-linux.so in binutils' ld linker.
When a dynamic program is built with gcc, gcc passes the -dynamic-linker option to the ld (collect2) program: http://man7.org/linux/man-pages/man1/ld.1.html
-Ifile, --dynamic-linker=file
Set the name of the dynamic linker. This is only meaningful when
generating dynamically linked ELF executables. The default
dynamic linker is normally correct; don't use this unless you
know what you are doing.
The runtime loader for ELF, "ld-linux.so", is registered as the interpreter in the dynamic ELF file, in the INTERP program header (the .interp section); check the output of readelf -l ./dynamic_application. As I understand it, this field holds a full path.
When ld is called directly (without gcc), or when this option is not given, ld uses a hardcoded full path to ld.so, and this default is incorrect for most OSes, including Linux:
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/bfd/elf64-x86-64.c#L510
/* The name of the dynamic interpreter. This is put in the .interp
section. */
#define ELF64_DYNAMIC_INTERPRETER "/lib/ld64.so.1"
#define ELF32_DYNAMIC_INTERPRETER "/lib/ldx32.so.1"
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/gold/x86_64.cc#L816
template<>
const Target::Target_info Target_x86_64<64>::x86_64_info =
...
"/lib/ld64.so.1", // program interpreter
const Target::Target_info Target_x86_64<32>::x86_64_info =
...
"/libx32/ldx32.so.1", // program interpreter
The correct dynamic linker/loader path is hardcoded in gcc's machine spec files; grep the output of gcc -dumpspecs for ld-linux to see the value passed to the -dynamic-linker option.

Linux ld: What's the meaning of `-m` option and the command `ld -melf_32 -Ttext 0 -e startup_32`

I have read the ld manual. It says that the -m emulation option makes ld "emulate the emulation linker"; what does that description mean? And the -T scriptfile option uses scriptfile as the linker script, but what does -Ttext 0 refer to? Is it valid?
-Ttext 0 tells the linker to place the .text section (the program's code) at address 0.
15.3 Linker emulation selection
A linker emulation is a "personality" of the linker, which gives the linker default values for the other aspects of the target system. In particular, it consists of
the linker script
the target
several "hook" functions that are run at certain stages of the linking process to do special things that some targets require
http://ftp.gnu.org/old-gnu/Manuals/binutils/html_node/binutils_20.html

how to determine why a dynamic library is linked against an application?

I have a linux app I'm building from source. When I run ldd against the binary, I understand most of the libraries...but not all.
Is there a way to add a flag to ld or gcc/g++ or anything I can do to determine why the linker chose to link against specific libraries?
Edit:
To explore the route @shloim suggested, I tried the following:
> nm -u /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
nm: /lib/x86_64-linux-gnu/libcrypto.so.1.0.0: no symbols
> file /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
/lib/x86_64-linux-gnu/libcrypto.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=230ebe6145b6681d0cb7e4c9021f0d899c02e0c4, stripped
Is there an obvious reason why nm would not work on libcrypto?
This should show you all symbols used in the so file that are undefined within the so:
nm -u <your_so_file>
You can then compare it with
nm --defined-only <3rd_party_so_file>
And try to figure out the common symbols
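The comparison can be automated. The Python sketch below parses two nm listings and reports the symbols that the first file leaves undefined and the second one defines. The function names and the sample listings are made up for illustration, and the set of "defined" type letters (T, W, D, B, R) is an assumption covering the common cases, not an exhaustive list.

```python
def parse_nm(output):
    """Map symbol name -> nm type letter for lines like 'ADDR T name' or 'U name'."""
    symbols = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            kind, name = parts
        elif len(parts) == 3:
            _, kind, name = parts
        else:
            continue  # file headers, blank lines, error messages
        # Drop a symbol-version suffix such as name@VERSION, if present.
        symbols[name.split("@")[0]] = kind
    return symbols


def shared_symbols(user_nm, provider_nm):
    """Symbols undefined in `user_nm` that `provider_nm` defines."""
    wanted = {n for n, k in parse_nm(user_nm).items() if k == "U"}
    provided = {n for n, k in parse_nm(provider_nm).items()
                if k in {"T", "W", "D", "B", "R"}}
    return sorted(wanted & provided)


# Illustrative listings, e.g. from `nm -u app` and `nm -D --defined-only lib.so`.
app = "                 U SHA1_Init\n                 U printf\n"
lib = "0000000000001000 T SHA1_Init\n0000000000002000 T MD5_Init\n"
print(shared_symbols(app, lib))  # ['SHA1_Init']
```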
Is there an obvious reason why nm would not work on libcrypto?
Generally nm lists the symbols in the regular symbol table of an object file. Here nm is used on a stripped shared object, which only keeps its dynamic symbol table, so try nm -D libcrypto.so instead.
readelf or objdump can also be used to check the symbols present in shared objects.
readelf -Ws will show all the symbols

Convincing gcc to ignore system libraries in favour of locally installed libraries

I am trying to build a simple executable that uses boost_serialization and boost_iostreams.
#include <fstream>
#include <iostream>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/file.hpp>
int main()
{
    using namespace boost::iostreams;
    filtering_ostream os;
    os.push(boost::iostreams::gzip_compressor());
    os.push(boost::iostreams::file_sink("emptyGzipBug.txt.gz"));
}
Unfortunately the system I am working with has a very outdated version of boost_serialization in /usr/lib/, and I have no way to change that.
I am fairly certain when I build the example using
g++ -o main main.cpp -lboost_serialization -lboost_iostreams
that the linker errors result because gcc uses the system version of boost_serialization rather than my locally installed version. Setting LIBRARY_PATH and LD_LIBRARY_PATH to /home/andrew/install/lib doesn't work. When I build using
g++ -o main main.cpp -L/home/andrew/install/lib -lboost_serialization -lboost_iostreams
then everything works.
My questions are:
How can I get gcc to tell me the filenames of the libraries it's using?
Is it possible to set up the environment so that I don't have to specify the absolute path to my local boost on the gcc command line?
PS After typing the below info, I thought I'd be kind and add what you need for your specific case:
g++ -Wl,-rpath,/home/andrew/install/lib -o main main.cpp -I/home/andrew/install/include -L/home/andrew/install/lib -lboost_serialization -lboost_iostreams
gcc itself doesn't care about the libraries. The linker does ;).
Even though the linker needs to find the shared libraries so it can resolve
symbols, it doesn't store the path of those libraries in the executable normally.
So, for a start, let's find out what is actually in the binary after you linked it:
$ readelf -d main | grep 'libboost'
0x0000000000000001 (NEEDED) Shared library: [libboost_serialization.so.1.54.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_iostreams.so.1.54.0]
Just the names thus.
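If you want those names in a script, a short Python sketch can pull the NEEDED entries out of the readelf -d text. The function name needed_libraries is hypothetical; the sample input is the output shown above.

```python
import re


def needed_libraries(readelf_output):
    """Return the library names from the (NEEDED) lines of `readelf -d`."""
    return re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]",
                      readelf_output)


sample = """\
0x0000000000000001 (NEEDED) Shared library: [libboost_serialization.so.1.54.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_iostreams.so.1.54.0]
0x000000000000000f (RPATH) Library rpath: [/tmp]
"""
print(needed_libraries(sample))
# ['libboost_serialization.so.1.54.0', 'libboost_iostreams.so.1.54.0']
```

Note that the result contains only sonames, never directories, which is exactly why the lookup has to happen again at run time.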
The libraries that are actually used are determined by /lib/ld-linux.so.*
at run time:
$ ldd main | grep libboost
libboost_serialization.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0 (0x00007fd8fa920000)
libboost_iostreams.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.54.0 (0x00007fd8fa700000)
The path is found by looking in /etc/ld.so.cache (which is normally
compiled by running ldconfig). You can print its contents with:
ldconfig -p | grep libboost_iostreams
libboost_iostreams.so.1.54.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.54.0
libboost_iostreams.so.1.49.0 (libc6,x86-64) => /usr/lib/libboost_iostreams.so.1.49.0
libboost_iostreams.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so
but since that is only the cached result of a previous look up,
you are more interested in the output of:
$ ldconfig -v 2>/dev/null | egrep '^[^[:space:]]|libboost_iostreams'
/lib/i386-linux-gnu:
/usr/lib/i386-linux-gnu:
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
libboost_iostreams.so.1.54.0 -> libboost_iostreams.so.1.54.0
/lib32:
/usr/lib32:
/lib:
/usr/lib:
libboost_iostreams.so.1.49.0 -> libboost_iostreams.so.1.49.0
which shows the paths that it looked in before finding a result.
Note that if you are linking a 64-bit program and it would find a 32-bit
library first (or vice versa), then that library is skipped as being
incompatible. Otherwise, the first one found is used.
The paths used to search are specified in /etc/ld.so.conf which is
read (usually at boot time, or after installing something new)
when running ldconfig as root.
However, paths given as a colon-separated list in the environment
variable LD_LIBRARY_PATH take precedence.
For example, if I'd do:
$ export LD_LIBRARY_PATH=/tmp
$ cp /usr/lib/libboost_iostreams.so.1.49.0 /tmp/libboost_iostreams.so.1.54.0
$ ldd main | grep libboost_iostreams
libboost_iostreams.so.1.54.0 => /tmp/libboost_iostreams.so.1.54.0 (0x00007f621add8000)
then it finds 'libboost_iostreams.so.1.54.0' in /tmp (even though it was a libboost_iostreams.so.1.49.0).
Note that you CAN hardcode a path in your executable by passing -rpath to
the linker:
$ unset LD_LIBRARY_PATH
$ g++ -Wl,-rpath,/tmp -o main main.cpp -lboost_serialization -lboost_iostreams
$ ldd main | grep libboost_iostreams
libboost_iostreams.so.1.54.0 => /tmp/libboost_iostreams.so.1.54.0 (0x00007fbd8bcd8000)
which can be made visible with
$ readelf -d main | grep RPATH
0x000000000000000f (RPATH) Library rpath: [/tmp]
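Note that LD_LIBRARY_PATH takes precedence over -rpath when the linker records the path as DT_RUNPATH (which is what --enable-new-dtags produces).
If the path is recorded as DT_RPATH instead (as in the readelf output above, or when you pass -Wl,--disable-new-dtags along with the -rpath), it is searched before LD_LIBRARY_PATH,
provided that you are linking an executable and your linker supports this flag.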
You can show the search paths that gcc uses during compile(link) time with the -print-search-dirs command line option:
$ g++ -print-search-dirs | grep libraries
libraries: =/usr/lib/gcc/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/:/lib/x86_64-linux-gnu/4.7/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/4.7/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../:/lib/:/usr/lib/
This can be influenced by adding -L command line options. If a library can't be found in a path specified with the -L option then it looks in paths found through the environment variable GCC_EXEC_PREFIX (see the man page for that) and if that fails it uses the environment variable LIBRARY_PATH.
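The libraries: line printed by -print-search-dirs is hard to read because of the many ../ components and duplicates. As a convenience, the following Python sketch splits it into a de-duplicated list of normalized directories. The function name library_dirs is hypothetical, and the normalization is purely lexical, so it can differ from what the filesystem resolves when symlinks are involved.

```python
import posixpath


def library_dirs(print_search_dirs_output):
    """Return the normalized, de-duplicated directories of the 'libraries:' line."""
    for line in print_search_dirs_output.splitlines():
        if line.startswith("libraries:"):
            raw = line[len("libraries:"):].strip().lstrip("=")
            seen, result = set(), []
            for directory in raw.split(":"):
                if not directory:
                    continue
                normalized = posixpath.normpath(directory)
                if normalized not in seen:
                    seen.add(normalized)
                    result.append(normalized)
            return result
    return []


sample = ("libraries: =/usr/lib/gcc/x86_64-linux-gnu/4.7/:"
          "/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/:"
          "/lib/../lib/:/lib/:/usr/lib/")
for directory in library_dirs(sample):
    print(directory)
```

The order of first appearance is kept, since the linker tries the directories in that order.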
When you run g++ with the -v option, it will print the LIBRARY_PATH used.
LIBRARY_PATH=/tmp/lib g++ -v -o main main.cpp -lboost_serialization -lboost_iostreams 2>&1 | grep LIBRARY_PATH
LIBRARY_PATH=/tmp/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/tmp/lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../:/lib/:/usr/lib/
Finally, note that especially for boost (but in general) you should
use header files that match the correct version! So, if the library that you
link with at run time is version xyz you should have used an -I command line option to get g++ to find the corresponding header files, or things might not link or worse, result in unexplainable crashes.
-nodefaultlibs
Do not use the standard system libraries when linking. Only the
libraries you specify are passed to the linker, and options
specifying linkage of the system libraries, such as
-static-libgcc or -shared-libgcc, are ignored. The standard
startup files are used normally, unless -nostartfiles is used.
The compiler may generate calls to "memcmp", "memset", "memcpy"
and "memmove". These entries are usually resolved by entries in
libc. These entry points should be supplied through some other
mechanism when this option is specified.
Haven't used it myself but it sounds exactly like what was asked for.
