Removing a specific symbol from an SO file? - linux

I tried to remove a symbol using:
objcopy -v -I elf32-little -N <function> main_lib.so new_lib.so
To which it responds:
copy from `main_lib.so' [elf32-little] to `new_lib.so' [elf32-little]
There is no change between the two files, and the function is not removed.
I used the following commands to get the list of functions and the file format:
readelf -Ws main_lib.so > main_functions.txt
file main_lib.so
File command says:
main_lib.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked (uses shared libs), BuildID[md5/uuid]=..., stripped

Related

Why doesn't `ld -L -l...` match the behavior of `ld /path/to/library.so`?

I have two questions highlighted below. I'm using 64-bit Linux.
I saw on another post that MUSL was mentioned as a libc implementation.
I tried using this with the following Hello world assembly program that uses two libc functions, write and _exit.
.data
hello:
.ascii "Hello, world\n"
.text
.globl _start
_start:
movl $13, %edx
movl $hello, %esi
movl $1, %edi
call write
movl $0, %edi
call _exit
I assembled the code with:
# Command 1
$ as -o hello.o hello.s
I then ran ld to generate an executable that statically links MUSL libc.
# Command 2
$ ld hello.o /usr/lib/x86_64-linux-musl/libc.a
That generated an a.out file that works as expected, outputting "Hello, world" when executed.
I also tried a different invocation of the preceding ld command, using -static -lc instead of specifying the path directly, and also using -L to give the path to MUSL so that glibc is not used, since the latter is already on ld's search path.
# Command 3
$ ld hello.o -L/usr/lib/x86_64-linux-musl/ -static -lc
That worked as expected.
Next I tried to dynamically link MUSL libc.
# Command 4
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o \
/usr/lib/x86_64-linux-musl/libc.so
That appears to work as expected. I can run a.out, and calling ldd on a.out shows that MUSL's libc is linked.
Lastly, I tried an analogous modification relative to the statically linked version earlier, using -lc and -L instead of specifying the path to the .so file directly.
# Command 5
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 hello.o \
-L/usr/lib/x86_64-linux-musl -lc
The program does not run properly, outputting the error:
bash: ./a.out: No such file or directory
When I run that same ld command with the --verbose flag, the output is the same as when passing --verbose to the earlier ld command (Command 4 that generated a working executable).
Running ldd on a.out also outputs an error:
./a.out: error while loading shared libraries: /usr/lib/x86_64-linux-gnu/libc.so: invalid ELF header
Question 1: Why does calling ld with -L and -lc in this case not match the behavior from earlier, where I specified the .so file directly?
I noticed that if I change the specified dynamic linker to /lib/ld-musl-x86_64.so.1, the generated a.out runs as expected.
# Command 6
$ ld -dynamic-linker /lib/ld-musl-x86_64.so.1 hello.o \
-L/usr/lib/x86_64-linux-musl -lc
However, calling ldd on the generated a.out produces the following error, which was not the case earlier when I did not use -lc and -L in Command 4:
./a.out: error while loading shared libraries: /usr/lib/x86_64-linux-gnu/libc.so: invalid ELF header
Question 2: Why does ldd fail on this binary, when it worked earlier, when I passed the path of the .so file to ld and used a different dynamic linker?
The problem I encountered was due to using -L with the linker, but not having that path available for loading libc.so at runtime.
I noticed this by calling readelf --dynamic --program-headers on the programs generated by Command 4 and Command 5.
# Command 4
0x0000000000000001 (NEEDED) Shared library: [/usr/lib/x86_64-linux-musl/libc.so]
# Command 5
0x0000000000000001 (NEEDED) Shared library: [libc.so]
I was able to resolve the issue for the program generated by Command 5 by using an environment variable, LD_LIBRARY_PATH=/usr/lib/x86_64-linux-musl, when running the program, or alternatively by passing an extra argument to ld, -rpath /usr/lib/x86_64-linux-musl.
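To make the difference between the two binaries easier to spot, here is a minimal Python sketch that pulls the NEEDED entries out of readelf --dynamic output and flags the ones that are bare names. The function names (needed_entries, is_searched) are made up for this illustration, and the sample text is just the two lines quoted above; in practice you would feed it the real readelf output.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED) Shared library: [libc.so]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]*)\]")


def needed_entries(readelf_dynamic_output):
    """Return the DT_NEEDED names in the order readelf printed them."""
    return NEEDED_RE.findall(readelf_dynamic_output)


def is_searched(name):
    """A NEEDED name without a slash is looked up on the loader's search path;
    a name containing a slash is used as a path directly."""
    return "/" not in name


# Sample input: the two NEEDED lines from Command 4 and Command 5.
sample = """\
 0x0000000000000001 (NEEDED) Shared library: [/usr/lib/x86_64-linux-musl/libc.so]
 0x0000000000000001 (NEEDED) Shared library: [libc.so]
"""

for name in needed_entries(sample):
    kind = "searched at run time" if is_searched(name) else "used as a path"
    print(f"{name}: {kind}")
```

The bare libc.so entry is the one that needs LD_LIBRARY_PATH or -rpath to be found in the MUSL directory.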

What's the rule of dynamic library searching for ld?

Linux separates the link-time search path from the run-time search path.
For the run-time search path, I found the rule for ld.so in its man page (8 ld.so):
DT_RPATH
environment LD_LIBRARY_PATH
DT_RUNPATH
ld.so.cache
/lib, /usr/lib
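As a rough model of that order, here is a small Python sketch. It is only a simplification of what the ld.so man page describes for glibc, not the loader's actual code: the resolve function, its parameters, and the injected exists callback are all invented for illustration. It assumes DT_RPATH is ignored when DT_RUNPATH is present, and it skips details such as secure-execution mode and hardware-capability subdirectories.

```python
import posixpath


def resolve(name, *, rpath=(), ld_library_path=(), runpath=(),
            cache=None, exists=posixpath.exists):
    """Return the path a loader following the listed order would pick, or None.

    `cache` stands in for ld.so.cache as a simple name -> path mapping.
    `exists` is injectable so the model can be tried without touching disk.
    """
    # A name containing a slash is used as a path, with no search.
    if "/" in name:
        return name if exists(name) else None

    dirs = []
    if not runpath:                # DT_RPATH only counts when DT_RUNPATH is absent
        dirs.extend(rpath)
    dirs.extend(ld_library_path)   # LD_LIBRARY_PATH
    dirs.extend(runpath)           # DT_RUNPATH

    for d in dirs:
        candidate = posixpath.join(d, name)
        if exists(candidate):
            return candidate

    if cache and name in cache:    # ld.so.cache
        return cache[name]

    for d in ("/lib", "/usr/lib"):  # default directories
        candidate = posixpath.join(d, name)
        if exists(candidate):
            return candidate
    return None


# Pretend file system containing a single library.
fake_fs = {"/opt/demo/lib/libdemo.so"}
print(resolve("libdemo.so", exists=fake_fs.__contains__))
print(resolve("libdemo.so", ld_library_path=["/opt/demo/lib"],
              exists=fake_fs.__contains__))
```

The first call finds nothing because no listed directory contains the file; the second succeeds once the directory is supplied the way LD_LIBRARY_PATH would supply it.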
But for the link-time search path, I had no luck with ld :(
Man page for ld (1 ld) says, besides -L option:
The default set of paths searched (without being specified with -L) depends on which emulation mode ld is using, and in some cases also on how it was configured.
The paths can also be specified in a link script with the "SEARCH_DIR" command. Directories specified this way are searched at the point in which the linker script appears in the command line.
Does the "default set of paths" depending on emulation mode mean "SEARCH_DIR"?
misssprite, to see the linker search path for a specific ELF emulation, just run ld -m<emulation> --verbose | grep SEARCH_DIR
Speaking about the ld itself, the library path search order is the following:
Directories specified via -L command line flags
Directories in the LIBRARY_PATH environment variable
SEARCH_DIR variables in the linker script.
You can see which directories are specified in the default linker script by running ld --verbose | grep SEARCH_DIR. Note that a leading = in a SEARCH_DIR value is replaced by the value of the --sysroot option if you specify it.
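If you want the directories as a clean list rather than grep output, something like the following Python sketch could post-process the text printed by ld --verbose. The search_dirs function and the sample script fragment are illustrative assumptions (the paths shown are examples, not necessarily what your ld prints); the only behaviour modelled is the leading = being replaced by the sysroot.

```python
import re

# SEARCH_DIR("=/usr/lib"); entries as they appear in a linker script.
SEARCH_DIR_RE = re.compile(r'SEARCH_DIR\("([^"]*)"\)')


def search_dirs(ld_verbose_output, sysroot=""):
    """Return SEARCH_DIR paths in script order, expanding a leading '='."""
    dirs = []
    for raw in SEARCH_DIR_RE.findall(ld_verbose_output):
        if raw.startswith("="):
            raw = sysroot + raw[1:]   # '=' stands for the sysroot prefix
        dirs.append(raw)
    return dirs


# Hypothetical fragment of `ld --verbose` output, for demonstration only.
sample_script = (
    'SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); '
    'SEARCH_DIR("=/lib/x86_64-linux-gnu"); '
    'SEARCH_DIR("/opt/lib");'
)

for d in search_dirs(sample_script, sysroot="/my/sysroot"):
    print(d)
```

To run it on real data, capture the output of ld --verbose (for example with subprocess.run) and pass the text in place of sample_script.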
Usually ld is not invoked directly, but via a compiler driver that passes several -L options to the linker. In the case of gcc or clang, you can print the additional library search directories added by the compiler by invoking it with the -print-search-dirs option. Also note that if you specify machine-specific compiler flags (e.g. -m32, as misssprite mentioned), then the linker may use a different linker script according to the chosen ELF emulation. In the case of gcc, you can use the -dumpspecs option to see how different compiler flags affect the linker invocation. But IMHO the simplest way to see the linker command line is to compile and link a simple program with -v specified.
misssprite, there is no search for ld.so or ld-linux.so in the binutils's ld linker.
When a dynamic program is built with gcc, it passes the -dynamic-linker option to the ld (collect2) program: http://man7.org/linux/man-pages/man1/ld.1.html
-Ifile, --dynamic-linker=file
Set the name of the dynamic linker. This is only meaningful when
generating dynamically linked ELF executables. The default
dynamic linker is normally correct; don't use this unless you
know what you are doing.
The runtime loader for ELF, "ld-linux.so", is registered as the interpreter in a dynamic ELF file, in the INTERP program header (the .interp section); check the output of readelf -l ./dynamic_application. This field holds a full path, as I understand it.
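For anyone who wants to read that field without readelf, here is a minimal Python sketch that walks the program headers and returns the PT_INTERP string. It follows the standard ELF header layout, but read_interp is just an illustrative helper written for this answer and has only basic error handling.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(data):
    """Return the PT_INTERP path from raw ELF bytes, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian

    # Locate the program header table from the ELF header.
    if is64:
        (e_phoff,) = struct.unpack_from(end + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(end + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 0x2A)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(end + "I", data, off)
        if p_type != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in the two classes.
        if is64:
            (p_offset,) = struct.unpack_from(end + "Q", data, off + 8)
            (p_filesz,) = struct.unpack_from(end + "Q", data, off + 32)
        else:
            (p_offset,) = struct.unpack_from(end + "I", data, off + 4)
            (p_filesz,) = struct.unpack_from(end + "I", data, off + 16)
        return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(read_interp(f.read()))
```

Run it with the path of a dynamically linked executable as the only argument; a statically linked binary has no INTERP header, so it prints None.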
When ld is called directly rather than through gcc, or when this option is not given, ld uses a hardcoded full path to ld.so, and this default is incorrect for most OSes, including Linux:
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/bfd/elf64-x86-64.c#L510
/* The name of the dynamic interpreter. This is put in the .interp
section. */
#define ELF64_DYNAMIC_INTERPRETER "/lib/ld64.so.1"
#define ELF32_DYNAMIC_INTERPRETER "/lib/ldx32.so.1"
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/gold/x86_64.cc#L816
template<>
const Target::Target_info Target_x86_64<64>::x86_64_info =
...
"/lib/ld64.so.1", // program interpreter
const Target::Target_info Target_x86_64<32>::x86_64_info =
...
"/libx32/ldx32.so.1", // program interpreter
The correct dynamic linker/loader path is hardcoded in gcc's machine spec files; grep the output of gcc -dumpspecs for ld-linux to see the value passed to the -dynamic-linker option.

how to determine why a dynamic library is linked against an application?

I have a linux app I'm building from source. When I run ldd against the binary, I understand most of the libraries...but not all.
Is there a way to add a flag to ld or gcc/g++ or anything I can do to determine why the linker chose to link against specific libraries?
Edit:
To explore the route @shloim set up, I tried the following:
> nm -u /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
nm: /lib/x86_64-linux-gnu/libcrypto.so.1.0.0: no symbols
> file /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
/lib/x86_64-linux-gnu/libcrypto.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=230ebe6145b6681d0cb7e4c9021f0d899c02e0c4, stripped
Is there an obvious reason why nm would not work on libcrypto?
This should show you all symbols used in the so file that are undefined within the so:
nm -u <your_so_file>
You can then compare it with
nm --defined-only <3rd_party_so_file>
And try to figure out the common symbols
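The comparison step can be automated. Here is a short Python sketch that takes the text output of the two nm commands and intersects the symbol names. The helper names (symbol_names, common_symbols) and the sample outputs are invented for illustration; the parsing assumes nm's usual layout, where the last two fields of each line are a one-letter type and the symbol name.

```python
def symbol_names(nm_output):
    """Extract symbol names from nm output, dropping any @VERSION suffix."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Expect "... <type-letter> <name>"; skip anything else.
        if len(fields) >= 2 and len(fields[-2]) == 1:
            names.add(fields[-1].split("@")[0])
    return names


def common_symbols(undefined_output, defined_output):
    """Symbols the first file needs that the second file provides."""
    return sorted(symbol_names(undefined_output) & symbol_names(defined_output))


# Made-up sample outputs, standing in for the two nm runs.
undefined_sample = """\
                 U SHA1_Init
                 U printf@GLIBC_2.2.5
"""
defined_sample = """\
0000000000012340 T SHA1_Init
0000000000012400 T SHA1_Update
"""

print(common_symbols(undefined_sample, defined_sample))
```

Any name in the result is a reason for the linker to keep that library as a dependency. For stripped shared objects the regular symbol table is gone, so the nm commands need -D to produce any input for this.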
Is there an obvious reason why nm would not work on libcrypto?
Generally nm lists the symbols of object files, which it reads from the regular symbol table. Here nm is being used on a stripped shared object, which keeps only its dynamic symbol table, so try nm -D libcrypto.so instead.
readelf or objdump can also be used to check the symbols present in shared objects.
readelf -Ws will show all the symbols

tagging a shared library with checksum

How can I tag ELF libs with build IDs?
I downloaded a precompiled library that has a sha1 sum in it:
user@localhost ~/tmp $ file foo.so.0
foo.so.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x7e3374eb34cafb69d3dca8b126f4aa33d44bb465, stripped
user@localhost ~/tmp $ ldd foo.so.0
linux-vdso.so.1 (0x00007fff955b1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f436d3c9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f436d022000)
/lib64/ld-linux-x86-64.so.2 (0x0000003000000000)
From http://fedoraproject.org/wiki/RolandMcGrath/BuildID
ld: new option --build-id:
This adds an option to ld to synthesize a .note.gnu.build-id section with type SHT_NOTE and flags SHF_ALLOC (read-only data), that contains an ELF note header and the build ID bits. This then goes into the link as if it were part of the first object file (so it may be placed or merged by the linker script). The build ID bits are determined as the very last thing ld does before writing out the linked file. You can give --build-id=style to choose md5, uuid (128 random bits), or 0xabcdef (your chosen bytes in hex). Just --build-id defaults to md5, which computes a 128-bit MD5 signature based on all the ELF header bits and section contents in the file, i.e., an ID that is unique among the set of meaningful contents for ELF files and identical when the output file would otherwise have been identical.
The Linux binutils-2.17.50.0.17 release includes this, in f8test1.
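To show what "an ELF note header and the build ID bits" looks like in practice, here is a small Python sketch that decodes the raw contents of such a note. It assumes the standard note layout (three 32-bit words for name size, descriptor size and type, then the name and descriptor each padded to 4 bytes), a little-endian file, and note type 3 for the GNU build ID. parse_build_id is an illustrative helper, and the sample bytes are built by hand from the ID that file printed above.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for the GNU build ID


def parse_build_id(note):
    """Decode a little-endian .note.gnu.build-id blob into a hex string."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_off = 12
    desc_off = name_off + ((namesz + 3) & ~3)   # name is padded to 4 bytes
    name = note[name_off:name_off + namesz]
    if name != b"GNU\0" or ntype != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build-id note")
    return note[desc_off:desc_off + descsz].hex()


# Hand-built note carrying the sha1 ID shown by `file foo.so.0`.
build_id = bytes.fromhex("7e3374eb34cafb69d3dca8b126f4aa33d44bb465")
sample_note = struct.pack("<III", 4, len(build_id), NT_GNU_BUILD_ID) + b"GNU\0" + build_id

print(parse_build_id(sample_note))
```

If you only need to see the ID of an existing file, readelf -n prints it directly; the sketch is just meant to make the note layout concrete.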

Import names in ELF binary

Where does the ELF format store the names of imported functions? Is it always possible to enumerate all import names, as it is for PE executables?
For example, if a binary uses printf, is it possible to tell that it does just by static analysis of the binary itself?
In ELF they're called undefined symbols. You can view the list of undefined symbols by:
nm -D <file>|grep -w U
objdump -T <file>|grep "\*UND\*"
ELF files don't specify which symbols come from which libraries; the binary just carries a list of shared libraries to link against, and the dynamic linker finds the symbols in those libraries at load time.

Resources