Linking against an old version of libc to provide greater application coverage - linux

Linux binaries are usually dynamically linked to the core system library (libc). This keeps the binary's memory footprint small, but binaries that depend on the latest libraries will not run on older systems. Conversely, binaries linked against older libraries will run happily on the latest systems.
Therefore, to ensure our application has good coverage during distribution, we need to figure out the oldest libc we can support and link our binary against that.
How should we determine the oldest version of libc we can link to?

Work out which symbols in your executable are creating the dependency on the undesired version of glibc.
$ objdump -p myprog
...
Version References:
required from libc.so.6:
0x09691972 0x00 05 GLIBC_2.3
0x09691a75 0x00 03 GLIBC_2.2.5
$ objdump -T myprog | fgrep GLIBC_2.3
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 realpath
Look within the depended-upon library to see if there are any symbols in older versions that you can link against:
$ objdump -T /lib/libc.so.6 | grep -w realpath
0000000000105d90 g DF .text 0000000000000021 (GLIBC_2.2.5) realpath
000000000003e7b0 g DF .text 00000000000004bf GLIBC_2.3 realpath
We're in luck!
Request the version from GLIBC_2.2.5 in your code:
#include <limits.h>
#include <stdlib.h>

__asm__(".symver realpath,realpath@GLIBC_2.2.5");

int main () {
    char resolved[PATH_MAX];
    realpath("foo", resolved);
}
Observe that GLIBC_2.3 is no longer needed:
$ objdump -p myprog
...
Version References:
required from libc.so.6:
0x09691a75 0x00 02 GLIBC_2.2.5
$ objdump -T myprog | grep realpath
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realpath
For further information, see http://web.archive.org/web/20160107032111/http://www.trevorpounds.com/blog/?p=103.

Unfortunately, Sam's solution above doesn't work well in my situation, but following his approach I found my own way to solve it.
This is my situation:
I'm writing a C++ program using the Thrift framework (an RPC middleware). I prefer static linking to dynamic linking, so my program is linked against libthrift.a instead of libthrift.so. However, libthrift.a is itself dynamically linked against glibc, and since my libthrift.a was built on a system with glibc 2.15, it references memcpy@GLIBC_2.14 (the version provided by glibc 2.15).
The problem is that our server machines only have glibc 2.5, which provides nothing newer than memcpy@GLIBC_2.2.5. So, of course, my program can't run on those machines.
I found this solution:

1. Use .symver to obtain a reference to memcpy@GLIBC_2.2.5.
2. Write my own __wrap_memcpy function, which simply calls memcpy@GLIBC_2.2.5 directly (a sketch follows the gist link below).
3. When linking the program, pass the -Wl,--wrap=memcpy option to gcc/g++.
The code involved in steps 1 and 2 is here: https://gist.github.com/nicky-zs/7541169
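The gist contains the actual code; a minimal sketch of the same idea (assuming x86-64 glibc, where the oldest memcpy version is GLIBC_2.2.5) looks roughly like this:
/* memcpy_wrap.c -- a rough sketch of steps 1 and 2, not the exact gist code. */
#include <string.h>

/* Bind our memcpy references to the old versioned symbol. Because the
   reference is to memcpy@GLIBC_2.2.5 rather than plain memcpy, it is not
   affected by --wrap and does not recurse back into __wrap_memcpy. */
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

/* Step 3: link with -Wl,--wrap=memcpy so every call to memcpy in the final
   link is redirected here, which then forwards to the old version. */
void *__wrap_memcpy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}
Compile this file normally (gcc -c memcpy_wrap.c), add the object to the link, and pass -Wl,--wrap=memcpy when linking the final binary.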

To do this in a more automated fashion, you can use the following script to create a list of all the symbols in your glibc that are newer than a given version (set via maxver near the top of the script). It creates a glibc.h file (the filename is set by the script argument) which contains all the necessary .symver declarations. You can then add -include glibc.h to your CFLAGS to make sure it gets picked up everywhere in your compilation.
This is sufficient as long as you don't use any static libraries that were compiled without the above include. If you do, and you don't want to recompile them, you can use objcopy to create a copy of the library with its symbols renamed to the old versions. The second-to-last line of the script creates a version of your system libstdc++.a that will link against the old glibc symbols. Adding -L. (or -L with the directory containing the new libstdc++.a) will then make your program statically link libstdc++ without pulling in a bunch of new symbols. If you don't need this, delete the last two lines and the printf ... redeff line. A short usage sketch follows the script.
#!/bin/bash
maxver=2.9
headerf=${1:-glibc.h}
set -e
for lib in libc.so.6 libm.so.6 libpthread.so.0 libdl.so.2 libresolv.so.2 librt.so.1; do
    objdump -T /usr/lib/$lib
done | awk -v maxver=${maxver} -vheaderf=${headerf} -vredeff=${headerf}.redef -f <(cat <<'EOF'
BEGIN {
    split(maxver, ver, /\./)
    limit_ver = ver[1] * 10000 + ver[2]*100 + ver[3]
}
/GLIBC_/ {
    gsub(/\(|\)/, "", $(NF-1))
    split($(NF-1), ver, /GLIBC_|\./)
    vers = ver[2] * 10000 + ver[3]*100 + ver[4]
    if (vers > 0) {
        if (symvertext[$(NF)] != $(NF-1))
            count[$(NF)]++
        if (vers <= limit_ver && vers > symvers[$(NF)]) {
            symvers[$(NF)] = vers
            symvertext[$(NF)] = $(NF-1)
        }
    }
}
END {
    for (s in symvers) {
        if (count[s] > 1) {
            printf("__asm__(\".symver %s,%s@%s\");\n", s, s, symvertext[s]) > headerf
            printf("%s %s@%s\n", s, s, symvertext[s]) > redeff
        }
    }
}
EOF
)
sort ${headerf} -o ${headerf}
objcopy --redefine-syms=${headerf}.redef /usr/lib/libstdc++.a libstdc++.a
rm ${headerf}.redef
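A hedged usage sketch (the script and source file names below are placeholders, not from the original answer):
$ ./force_link_glibc.sh glibc.h        # generates glibc.h and, via objcopy, ./libstdc++.a
$ gcc -include glibc.h -c myprog.c     # every translation unit sees the .symver declarations
$ g++ -o myprog myprog.o -L.           # -L. lets the link pick up the rewritten libstdc++.a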

glibc 2.2 is a pretty common minimum version. However, finding a build platform for that version may be non-trivial.
Probably a better direction is to think about the oldest OS you want to support and build on that.

Related

Getting undefined reference to "_printf" error for assembly code despite using gcc linker

I am trying to follow the exercise in the book PC Assembly by Paul Carter. http://pacman128.github.io/pcasm/
I'm trying to run the program from 1.4 page 23 on Ubuntu 18. The files are all available on the github site above.
Since the original code is for 32-bit, I assemble using
nasm -f elf32
for first.asm and asm_io.asm to get the object files. I also compile driver.c.
I use the linker from gcc and run
gcc -m32 -o first first.o asm_io.o driver.o
but it keeps giving me a bunch of errors like
undefined reference to '_scanf'
undefined reference to '_printf'
(note: _printf appears instead of printf because some conversion is done in asm_io.asm to maintain compatibility between Windows and Linux)
I don't know why these errors are appearing. I also tried running the linker directly,
ld -m elf_i386 -e main -o first first.o driver.o asm_io.o -I /lib/i386-linux-gnu/ld-linux.so.2
and many variations, since it seems that it's not linking with the C libraries.
Any help? I've been stuck on this for a while and couldn't find a solution in similar questions.
Linux doesn't prepend _ to names when mapping from C to asm symbol names in ELF object files.¹
So call printf, not _printf, because there is no _printf in libc.
Whatever "compatibility" code did that is doing it wrong. Only Windows and OS X use _printf, Linux uses printf.
So either you've misconfigured something or defined the wrong setting, or it requires updating / porting to Linux.
Footnote 1: In ancient history (like over 20 years ago), Linux with the a.out file format did use leading underscores on symbol names.
Update: the library uses the NASM preprocessor to %define _scanf scanf and so on, but it requires you to manually define ELF_TYPE by assembling with nasm -d ELF_TYPE.
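So a likely fix (assuming the file names from the question) is to pass -d ELF_TYPE when assembling:
nasm -f elf32 -d ELF_TYPE asm_io.asm
nasm -f elf32 -d ELF_TYPE first.asm
gcc -m32 -c driver.c
gcc -m32 -o first first.o asm_io.o driver.o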
They could have detected ELF32 or ELF64 output formats on their own, because NASM pre-defines __OUTPUT_FORMAT__. Someone should submit a pull-request to make this detection automatic with code something like this:
%ifidn __OUTPUT_FORMAT__, elf32
%define ELF_TYPE 32
%elifidn __OUTPUT_FORMAT__, elf64
%define ELF_TYPE 64
%endif
%ifdef ELF_TYPE
...
%endif

Why are there global offset tables and procedure linkage tables in statically linked executables?

I've done a bunch of reading on dynamic linker relocations and position independent code including procedure linkage tables and global offset tables. I don't understand why a statically linked executable needs a PLT and GOT. I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
I also created a shared library with a simple increment function that I compiled with gcc -shared without -fpic and I also see PLT and GOT sections. I didn't expect this either.
I don't understand why a statically linked executable needs a PLT and GOT.
It doesn't.
I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
This is an accident of implementation. The sections come from crt1.o, and there isn't a separate crt1s.o for fully-static linking, so you end up with .plt and .got entries from there.
You can strip these sections, and the binary will still work:
objcopy -R.got -R.plt a.out a.out2
Note: do not strip .rela.plt, as that section is still needed to implement IFUNCs.
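As a quick sanity check, you can list what remains and confirm the stripped copy still runs, e.g.:
$ readelf -S a.out2 | grep -E '\.got|\.plt'   # .rela.plt (and possibly .got.plt) may remain
$ ./a.out2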
I found that gcc generates a .got and .got.plt when generating position-independent code and taking the address of a function defined in another source file.
My test files were:
part1.c:
extern void afunc();
int _start()
{
    return 0x55 & (__SIZE_TYPE__) afunc;
}
part2.c:
void afunc() {}
My test was (substitute your own gcc version):
for o in s 4 3 2 1 0
do
    aarch64-linux-gnu-gcc-10 -fPIC part1.c part2.c -o static.elf -static -nostdlib -O$o &&
    aarch64-linux-gnu-objdump -x static.elf | grep 'GLOBAL_OFFSET'
done
I get the following output for all optimization levels:
0000000000410fd8 l O .got 0000000000000000 _GLOBAL_OFFSET_TABLE_
Replacing -fPIC with -fno-PIC makes the .got entry go away.
You can tell if your compiler defaults to -fPIC by running this:
aarch64-linux-gnu-gcc-10 -mcmodel=large -x c - < /dev/null
If it does default to -fPIC, I get the error:
cc1: sorry, unimplemented: code model ‘large’ with ‘-fPIC’

Convincing gcc to ignore system libraries in favour of locally installed libraries

I am trying to build a simple executable that uses boost_serialization and boost_iostreams.
#include <fstream>
#include <iostream>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/file.hpp>
int main()
{
using namespace boost::iostreams;
filtering_ostream os;
os.push(boost::iostreams::gzip_compressor());
os.push(boost::iostreams::file_sink("emptyGzipBug.txt.gz"));
}
Unfortunately the system I am working with has a very outdated version of boost_serialization in /usr/lib/, and I have no way to change that.
I am fairly certain when I build the example using
g++ -o main main.cpp -lboost_serialization -lboost_iostreams
that the linker errors result because gcc uses the system version of boost_serialization rather than my locally installed version. Setting LIBRARY_PATH and LD_LIBRARY_PATH to /home/andrew/install/lib doesn't work. When I build using
g++ -o main main.cpp -L/home/andrew/install/lib -lboost_serialization -lboost_iostreams
then everything works.
My questions are:
How can I get gcc to tell me the filenames of the libraries it's using?
Is it possible to set up the environment so that I don't have to specify the absolute path to my local boost on the gcc command line?
PS After typing the below info, I thought I'd be kind and add what you need for your specific case:
g++ -Wl,-rpath,/home/andrew/install/lib -o main main.cpp -I/home/andrew/install/include -L/home/andrew/install/lib -lboost_serialization -lboost_iostreams
gcc itself doesn't care about the libraries. The linker does ;).
Even though the linker needs to find the shared libraries so it can resolve
symbols, it doesn't store the path of those libraries in the executable normally.
So, for a start, let's find out what is actually in the binary after you linked it:
$ readelf -d main | grep 'libboost'
0x0000000000000001 (NEEDED) Shared library: [libboost_serialization.so.1.54.0]
0x0000000000000001 (NEEDED) Shared library: [libboost_iostreams.so.1.54.0]
So the binary stores just the names.
The libraries that are actually used are determined by /lib/ld-linux.so.*
at run time:
$ ldd main | grep libboost
libboost_serialization.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.54.0 (0x00007fd8fa920000)
libboost_iostreams.so.1.54.0 => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.54.0 (0x00007fd8fa700000)
The path is found by looking in /etc/ld.so.cache (which is normally
compiled by running ldconfig). You can print its contents with:
ldconfig -p | grep libboost_iostreams
libboost_iostreams.so.1.54.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.54.0
libboost_iostreams.so.1.49.0 (libc6,x86-64) => /usr/lib/libboost_iostreams.so.1.49.0
libboost_iostreams.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so
but since that is only the cached result of a previous look up,
you are more interested in the output of:
$ ldconfig -v 2>/dev/null | egrep '^[^[:space:]]|libboost_iostreams'
/lib/i386-linux-gnu:
/usr/lib/i386-linux-gnu:
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
libboost_iostreams.so.1.54.0 -> libboost_iostreams.so.1.54.0
/lib32:
/usr/lib32:
/lib:
/usr/lib:
libboost_iostreams.so.1.49.0 -> libboost_iostreams.so.1.49.0
which shows the paths that it looked in before finding a result.
Note that if you are linking a 64-bit program and a 32-bit
library would be found first (or vice versa), then that library is skipped as being
incompatible. Otherwise, the first one found is used.
The paths used to search are specified in /etc/ld.so.conf which is
read (usually at boot time, or after installing something new)
when running ldconfig as root.
However, paths specified as a colon-separated list in the
environment variable LD_LIBRARY_PATH take precedence over those.
For example, if I'd do:
$ export LD_LIBRARY_PATH=/tmp
$ cp /usr/lib/libboost_iostreams.so.1.49.0 /tmp/libboost_iostreams.so.1.54.0
$ ldd main | grep libboost_iostreams
libboost_iostreams.so.1.54.0 => /tmp/libboost_iostreams.so.1.54.0 (0x00007f621add8000)
then it finds 'libboost_iostreams.so.1.54.0' in /tmp (even though it was a libboost_iostreams.so.1.49.0).
Note that you CAN hardcode a path in your executable by passing -rpath to
the linker:
$ unset LD_LIBRARY_PATH
$ g++ -Wl,-rpath,/tmp -o main main.cpp -lboost_serialization -lboost_iostreams
$ ldd main | grep libboost_iostreams
libboost_iostreams.so.1.54.0 => /tmp/libboost_iostreams.so.1.54.0 (0x00007fbd8bcd8000)
which can be made visible with
$ readelf -d main | grep RPATH
0x000000000000000f (RPATH) Library rpath: [/tmp]
Note that LD_LIBRARY_PATH even takes precedence over -rpath, unless
you also pass -Wl,--disable-new-dtags along with the -rpath, provided
that you are linking an executable and your linker supports this flag.
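For example, assuming a GNU linker that accepts --disable-new-dtags, something like this should make the embedded path win even when LD_LIBRARY_PATH is set:
$ g++ -Wl,--disable-new-dtags -Wl,-rpath,/home/andrew/install/lib \
      -L/home/andrew/install/lib -o main main.cpp -lboost_serialization -lboost_iostreams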
You can show the search paths that gcc uses during compile(link) time with the -print-search-dirs command line option:
$ g++ -print-search-dirs | grep libraries
libraries: =/usr/lib/gcc/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/:/lib/x86_64-linux-gnu/4.7/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/4.7/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../:/lib/:/usr/lib/
This can be influenced by adding -L command line options. If a library can't be found in a path specified with the -L option then it looks in paths found through the environment variable GCC_EXEC_PREFIX (see the man page for that) and if that fails it uses the environment variable LIBRARY_PATH.
When you run g++ with the -v option, it will print the LIBRARY_PATH used.
LIBRARY_PATH=/tmp/lib g++ -v -o main main.cpp -lboost_serialization -lboost_iostreams 2>&1 | grep LIBRARY_PATH
LIBRARY_PATH=/tmp/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/tmp/lib/:/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../:/lib/:/usr/lib/
Finally, note that (especially for boost, but in general) you should
use header files that match the library version you link against! So, if the library that you
link with at run time is version xyz, you should have used an -I command line option to get g++ to find the corresponding header files; otherwise things might not link, or worse, might result in unexplainable crashes.
-nodefaultlibs
Do not use the standard system libraries when linking. Only the
libraries you specify are passed to the linker, and options
specifying linkage of the system libraries, such as
-static-libgcc or -shared-libgcc, are ignored. The standard
startup files are used normally, unless -nostartfiles is used.
The compiler may generate calls to "memcmp", "memset", "memcpy"
and "memmove". These entries are usually resolved by entries in
libc. These entry points should be supplied through some other
mechanism when this option is specified.
Haven't used it myself but it sounds exactly like what was asked for.
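A rough sketch only (the exact set of system libraries to re-add is an assumption and varies by toolchain); with -nodefaultlibs you have to list everything yourself:
g++ -o main main.cpp -nodefaultlibs \
    -L/home/andrew/install/lib -lboost_serialization -lboost_iostreams \
    -lstdc++ -lm -lgcc_s -lgcc -lc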

GCC/ELF - from where comes my symbol?

There is an executable that is dynamically linked to a number of shared objects. How can I determine which of them a given symbol (imported into the executable) belongs to?
If there is more than one possibility, could I simulate ld and see where it is being taken from?
Have a look at nm(1), objdump(1) and elfdump(1).
As well as the ones Charlie mentioned, "ldd" might do some of what you're looking for.
If you can relink the executable, the simplest way to find out where references and definitions come from is to use the ld -y flag. For example:
$ cat t.c
#include <stdio.h>
int main() { printf("Hello\n"); return 0; }
$ gcc t.c -Wl,-yprintf
/lib/libc.so.6: definition of printf
If you cannot relink the executable, then run ldd on it, run 'nm -D' on all the libraries listed, in order, and grep for the symbol you are interested in.
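That check can be scripted; a rough sketch (assuming the binary is ./my_program, which is not from the original answer):
#!/bin/sh
# For each shared library that ldd resolves for the program, ask nm -D
# whether it defines (T/W/i) the symbol we are interested in.
sym=printf
for lib in $(ldd ./my_program | awk '/=>/ { print $3 }'); do
    if nm -D "$lib" 2>/dev/null | grep -qE " [TWi] ${sym}(@|\$)"; then
        echo "${sym} is defined in ${lib}"
    fi
done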
$ LD_DEBUG=bindings my_program
That would print all the symbol bindings on the console.

ELF file headers

A quick question about ELF file headers: I can't seem to find anything useful on how to add/change fields in the ELF header. I'd like to be able to change the magic numbers and to add a build date to the header, and probably a few other things.
As I understand it, the linker creates the header information, but I don't see anything in the LD script that refers to it (though I'm new to ld scripts).
I'm using gcc and building for ARM.
thanks!
Updates:
OK, maybe my first question should be: is it possible to create/edit the ELF header at link time?
I don't know of linker script commands that can do this, but you can do it post-link using the objcopy command. The --add-section option can be used to add a section containing arbitrary data to the ELF file. If the ELF header doesn't contain the fields you want, just make a new section and add them there.
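For example, a build date could be embedded post-link roughly like this (the section and file names are only illustrative):
$ date > build_date.txt
$ objcopy --add-section .build_date=build_date.txt myprog myprog_stamped
$ readelf -p .build_date myprog_stamped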
This link (teensy elf binary) was someone's answer to another question, but it goes into the intricacies of an ELF header in some detail.
You can create an object file with informative fields like a version number and link that file such that they are included in the resulting ELF binary.
Ident
For example, as part of your build process, you can generate - say - info.c that contains one or more #ident directives:
#ident "Build: 1.2.3 (Halloween)"
#ident "Environment: example.org"
Compile it:
$ gcc -c info.c
Check if the information is included:
$ readelf -p .comment info.o
String dump of section '.comment':
[ 1] Build: 1.2.3 (Halloween)
[ 1a] Environment: example.org
[ 33] GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)
Alternatively, you can use objdump -s --section .comment info.o. Note that GCC also writes its own .comment entry by default.
Check the information after linking an ELF executable:
$ gcc -o main main.o info.o
$ readelf -p .comment main
String dump of section '.comment':
[ 0] GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)
[ 2c] Build: 1.2.3 (Halloween)
[ 45] Environment: example.org
Comment Section
Using #ident in a C translation unit is basically equivalent to creating a .comment section in an assembler file. Example:
$ cat info.s
.section .comment
.string "Build: 1.2.3 (Halloween)"
.string "Environment: example.org"
$ gcc -c info.s
$ readelf -p .comment info.o
String dump of section '.comment':
[ 0] Build: 1.2.3 (Halloween)
[ 19] Environment: example.org
Using an uncommon section name works, as well (e.g. .section .blahblah). But .comment is used and understood by other tools. GNU as also understands the .ident directive, and this is what GCC translates #ident to.
With Symbols
For data that you also want to access from the ELF executable itself, you need to create symbols.
Objcopy
Say you want to include some magic bytes stored in a data file:
$ cat magic.bin
2342
Convert it into an object file with GNU objcopy:
$ objcopy -I binary -O elf64-x86-64 -B i386 \
--rename-section .data=.rodata,alloc,load,readonly,data,contents \
magic.bin magic.o
Check for the symbols:
$ nm magic.o
0000000000000005 R _binary_magic_bin_end
0000000000000005 A _binary_magic_bin_size
0000000000000000 R _binary_magic_bin_start
Example usage:
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
extern const char _binary_magic_bin_start[];
extern const char _binary_magic_bin_end[];
extern const unsigned char _binary_magic_bin_size;
static const size_t magic_bin_size = (uintptr_t) &_binary_magic_bin_size;
int main()
{
    char s[23];
    memcpy(s, _binary_magic_bin_start,
           _binary_magic_bin_end - _binary_magic_bin_start);
    s[magic_bin_size] = 0;
    puts(s);
    return 0;
}
Link everything together:
$ gcc -g -o main_magic main_magic.c magic.o
GNU ld
GNU ld is also able to turn data files into object files using an objcopy compatible naming scheme:
$ ld -r -b binary magic.bin -o magic-ld.o
Unlike objcopy, however, it places the symbols into the .data section instead of .rodata (compare objdump -h magic-ld.o with objdump -h magic.o).
incbin
In case GNU objcopy isn't available, one can use the GNU as .incbin directive to create the object file (assemble with gcc -c incbin.s):
.section .rodata
.global _binary_magic_bin_start
.type _binary_magic_bin_start, @object
_binary_magic_bin_start:
.incbin "magic.bin"
.size _binary_magic_bin_start, . - _binary_magic_bin_start
.global _binary_magic_bin_size
.type _binary_magic_bin_size, @object
.set _binary_magic_bin_size, . - _binary_magic_bin_start
.global _binary_magic_bin_end
.type _binary_magic_bin_end, @object
.set _binary_magic_bin_end, _binary_magic_bin_start + _binary_magic_bin_size
# an alternate way to include the size
.global _binary_magic_bin_len
.type _binary_magic_bin_len, @object
.size _binary_magic_bin_len, 8
_binary_magic_bin_len:
.quad _binary_magic_bin_size
xxd
A more portable alternative that requires neither GNU objcopy nor GNU as is to create an intermediate C file and compile and link that; a small consumer of the result is sketched after the listings. For example, with xxd:
$ xxd -i magic.bin | sed 's/\(unsigned\)/const \1/' > magic.c
$ gcc -c magic.c
$ nm magic.o
0000000000000000 R magic_bin
0000000000000008 R magic_bin_len
$ cat magic.c
const unsigned char magic_bin[] = {
0x32, 0x33, 0x34, 0x32, 0x0a
};
const unsigned int magic_bin_len = 5;
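The generated arrays can then be used from C like any other external symbols. A minimal consumer sketch (using the names from the nm output above; the file name print_magic.c is only illustrative):
/* print_magic.c -- minimal consumer of the xxd-generated object file */
#include <stdio.h>

extern const unsigned char magic_bin[];
extern const unsigned int magic_bin_len;

int main(void)
{
    /* write the embedded bytes verbatim to stdout */
    fwrite(magic_bin, 1, magic_bin_len, stdout);
    return 0;
}
Build with something like gcc -o print_magic print_magic.c magic.o.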
I'm fairly sure that a sufficiently complex ld script can do what you want. However, I have no idea how.
On the other hand, elfsh can easily do all sorts of manipulations to elf objects, so give it a whirl.
You might be able to use libmelf, a dead project on freshmeat, but available from LOPI - http://www.ipd.bth.se/ska/lopi.html
Otherwise, you can get the spec and (over)write the header yourself.
I haven't done this in a while, but can't you just append arbitrary data to an executable? If you always append fixed-size data, it would be trivial to recover anything you append; variable-size data wouldn't be much harder. It's probably easier than messing with ELF headers and potentially ruining your executables.
I didn't finish the book, but IIRC Linkers and Loaders by John Levine has the gory details you would need to be able to do this.
On Solaris you can use elfedit, but I think you are really asking for solutions for Linux. Linux Is Not UniX :P
In Linux Console:
$ man ld
$ ld --verbose
HTH
