Import names in ELF binary - linux

Where does the ELF format store the names of imported functions? Is it always possible to enumerate all import names, as it is for PE executables?
For example, if a binary uses printf, is it possible to tell that it does just by static analysis of the binary itself?

In ELF they're called undefined symbols. You can view the list of undefined symbols with:
nm -D <file>|grep -w U
objdump -T <file>|grep "\*UND\*"
ELF files don't specify which symbols come from which libraries; the linker just records a list of required shared libraries (DT_NEEDED entries) in the ELF binary and lets the dynamic linker find the symbols in those libraries at run time.
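To see where these names actually live: in a dynamically linked ELF file, the imports are entries of the dynamic symbol table (.dynsym) whose section index is SHN_UNDEF, and their names sit in the string table that .dynsym links to (normally .dynstr). Here is a minimal Python sketch that reads them directly. It assumes a 64-bit little-endian file (e.g. x86-64) whose section headers have not been removed, and it is only meant to mirror what nm -D and objdump -T print, not to replace them:

```python
import struct
import sys

SHT_DYNSYM = 11  # section type of the dynamic symbol table
SHN_UNDEF = 0    # section index used for symbols not defined in this file


def undefined_dynamic_symbols(path):
    """Return the names of undefined entries in .dynsym (64-bit little-endian ELF only)."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    # Elf64_Ehdr: e_shoff at 0x28, e_shentsize and e_shnum at 0x3A and 0x3C
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    # Each Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    sections = [struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
                for i in range(e_shnum)]
    names = []
    for _, sh_type, _, _, offset, size, link, _, _, entsize in sections:
        if sh_type != SHT_DYNSYM:
            continue
        strtab = sections[link][4]  # file offset of the linked string table (.dynstr)
        entsize = entsize or 24     # sizeof(Elf64_Sym)
        # Entry 0 is the reserved null symbol, so start at the second entry.
        for pos in range(offset + entsize, offset + size, entsize):
            st_name, _, _, st_shndx = struct.unpack_from("<IBBH", data, pos)
            if st_shndx == SHN_UNDEF and st_name:
                start = strtab + st_name
                names.append(data[start:data.index(b"\0", start)].decode())
    return names


if __name__ == "__main__":
    for name in undefined_dynamic_symbols(sys.argv[1]):
        print(name)
```

Run it as, for example, python3 undef_syms.py ./a.out (the script name is just an example). The list should roughly match nm -D --undefined-only <file>; it also includes weak undefined symbols (type w in nm), which grep -w U filters out.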

Related

What's the rule of dynamic library searching for ld?

Linux separates the linker-time search path and run-time search path.
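For the run-time search path, I found the rule for ld.so in its man page, ld.so(8):
DT_RPATH (only if there is no DT_RUNPATH)
environment LD_LIBRARY_PATH (ignored in secure-execution mode, e.g. for set-user-ID programs)
DT_RUNPATH
/etc/ld.so.cache
/lib, /usr/lib (/lib64, /usr/lib64 on some 64-bit systems)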
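The run-time order listed above can be modelled as a small function. This is a simplified sketch of the ld.so(8) rules for looking up a single DT_NEEDED name: it ignores $ORIGIN expansion, hardware-capability subdirectories and the fact that DT_RUNPATH only applies to an object's direct dependencies, and it treats ld.so.cache as a plain name-to-path dictionary. The function and parameter names are made up for illustration, and default_dirs is an assumption that varies between distributions:

```python
import os


def candidate_paths(name, rpath="", runpath="", ld_library_path="",
                    cache=None, secure=False, default_dirs=("/lib", "/usr/lib")):
    """Paths ld.so would try, in order, for one DT_NEEDED name (simplified model).

    rpath, runpath and ld_library_path are colon-separated strings; cache is a
    hypothetical {soname: path} dict standing in for /etc/ld.so.cache.
    """
    if "/" in name:
        return [name]  # a name containing a slash is used as-is, without searching
    dirs = []
    if rpath and not runpath:
        dirs += rpath.split(":")            # 1. DT_RPATH, only when DT_RUNPATH is absent
    if ld_library_path and not secure:
        dirs += ld_library_path.split(":")  # 2. LD_LIBRARY_PATH, unless secure-execution
    if runpath:
        dirs += runpath.split(":")          # 3. DT_RUNPATH
    paths = [os.path.join(d, name) for d in dirs]
    if cache and name in cache:
        paths.append(cache[name])           # 4. /etc/ld.so.cache
    paths += [os.path.join(d, name) for d in default_dirs]  # 5. default directories
    return paths
```

In the real loader, the first candidate that exists and matches the program's architecture is the one that gets loaded.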
But for the link-time search path, I had no luck with ld :(
The ld(1) man page says, besides the -L option:
The default set of paths searched (without being specified with -L) depends on which emulation mode ld is using, and in some cases also on how it was configured.
The paths can also be specified in a link script with the "SEARCH_DIR" command. Directories specified this way are searched at the point in which the linker script appears in the command line.
Does the "default set of paths" depending on emulation mode mean "SEARCH_DIR"?
misssprite, to see the linker search path for a specific ELF emulation, just run ld -m<emulation> --verbose | grep SEARCH_DIR
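Speaking about ld itself, the library search order is the following:
Directories specified via -L command-line flags
Directories in the LIBRARY_PATH environment variable (this variable is actually read by the gcc driver, which passes its entries to ld after the -L directories from the command line)
SEARCH_DIR entries in the linker script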
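Here is a sketch of how a single -l option gets resolved under the order above. It assumes ld's documented ELF behaviour of trying lib<name>.so before lib<name>.a within each directory (unless static linking is requested), plus the -l:filename form, which looks for an exact file name. The function and its exists callback are hypothetical helpers, not part of any tool:

```python
import os


def resolve_l(namespec, l_dirs, library_path, search_dirs, exists, static=False):
    """File that -l<namespec> would resolve to, or None (simplified model of GNU ld).

    exists is any callable that says whether a path exists, e.g. os.path.exists.
    """
    # Directory order: -L flags, then LIBRARY_PATH entries, then SEARCH_DIR entries.
    dirs = list(l_dirs) + [d for d in library_path.split(":") if d] + list(search_dirs)
    if namespec.startswith(":"):
        names = [namespec[1:]]              # -l:filename looks for that exact file
    elif static:
        names = ["lib%s.a" % namespec]      # static linking only accepts archives
    else:
        # Within each directory the shared library is preferred over the archive.
        names = ["lib%s.so" % namespec, "lib%s.a" % namespec]
    for d in dirs:
        for n in names:
            path = os.path.join(d, n)
            if exists(path):
                return path
    return None
```

Note that directory order wins over the .so preference: a libfoo.a in an earlier directory beats a libfoo.so in a later one. On a real system you would pass os.path.exists as exists and the SEARCH_DIR list from ld --verbose as search_dirs.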
You can see which directories are specified in the default linker script by running ld --verbose | grep SEARCH_DIR. Note that a leading = in a SEARCH_DIR value will be replaced by the value of the --sysroot option if you specify it.
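If you want those default directories in a script, the SEARCH_DIR entries can be pulled out of the ld --verbose output with a regular expression. A minimal sketch, assuming the usual SEARCH_DIR("...") syntax and that a leading = is the only sysroot prefix you need to handle:

```python
import re
import subprocess

SEARCH_DIR_RE = re.compile(r'SEARCH_DIR\("([^"]*)"\)')


def search_dirs(script_text, sysroot=""):
    """SEARCH_DIR entries of a linker script, with a leading '=' replaced by sysroot."""
    dirs = []
    for d in SEARCH_DIR_RE.findall(script_text):
        if d.startswith("="):
            d = sysroot + d[1:]
        dirs.append(d)
    return dirs


if __name__ == "__main__":
    # Add e.g. "-m", "elf_i386" to the command to inspect another emulation.
    script = subprocess.run(["ld", "--verbose"], capture_output=True, text=True).stdout
    print("\n".join(search_dirs(script)))
```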
Usually ld is not invoked directly, but via a compiler driver that passes several -L options to the linker. In the case of gcc or clang, you can print the additional library search directories added by the compiler by invoking it with the -print-search-dirs option. Also note that if you specify machine-specific compiler flags (e.g. -m32, as misssprite mentioned), then the linker may use a different linker script according to the chosen ELF emulation. In the case of gcc, you can use the -dumpspecs option to see how different compiler flags affect the linker invocation. But IMHO the simplest way to see the linker command line is to compile and link a simple program with -v.
misssprite, there is no search for ld.so or ld-linux.so in the binutils's ld linker.
When a dynamic program is built with gcc, gcc passes the -dynamic-linker option to the ld (collect2) program: http://man7.org/linux/man-pages/man1/ld.1.html
-Ifile, --dynamic-linker=file
Set the name of the dynamic linker. This is only meaningful when
generating dynamically linked ELF executables. The default
dynamic linker is normally correct; don't use this unless you
know what you are doing.
The runtime loader for ELF, usually ld-linux.so, is registered as the interpreter in the dynamic ELF file through the INTERP program header (the .interp section); check the output of readelf -l ./dynamic_application. As I understand it, this field holds a full path.
When gcc is not used (ld is called directly) or this option is not given, ld uses a hardcoded full path to ld.so, and this default is incorrect for most OSes, including Linux:
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/bfd/elf64-x86-64.c#L510
/* The name of the dynamic interpreter. This is put in the .interp
section. */
#define ELF64_DYNAMIC_INTERPRETER "/lib/ld64.so.1"
#define ELF32_DYNAMIC_INTERPRETER "/lib/ldx32.so.1"
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/gold/x86_64.cc#L816
template<>
const Target::Target_info Target_x86_64<64>::x86_64_info =
...
"/lib/ld64.so.1", // program interpreter
const Target::Target_info Target_x86_64<32>::x86_64_info =
...
"/libx32/ldx32.so.1", // program interpreter
The correct dynamic linker/loader path is hardcoded in gcc's machine spec files; grep the output of gcc -dumpspecs for ld-linux to find the -dynamic-linker option value.
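To check which interpreter path actually ended up in a given binary, the INTERP program header can also be read directly. A minimal Python sketch, assuming a 64-bit little-endian ELF file (e.g. x86-64); for other classes or byte orders, readelf -l remains the tool to use:

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(path):
    """Return the PT_INTERP string of a 64-bit little-endian ELF file, or None."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize and e_phnum at 0x36 and 0x38
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # Leading Elf64_Phdr fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            # The path is stored NUL-terminated; drop the terminator.
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # static executables and most shared libraries have no PT_INTERP


if __name__ == "__main__":
    print(read_interp(sys.argv[1]))
```

For a binary linked through gcc this should print the same path as the "Requesting program interpreter" line of readelf -l; for one linked by calling ld directly without -dynamic-linker, you would expect the hardcoded default quoted above.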

How to know in which library function is located?

I have an executable and it calls a function. There are a lot of static and dynamic libraries that are linked with this exe. I need to know which one provides this function.
You can get a list of shared libraries used by the executable foo like this:
ldd -v foo
This post, "How do I list the symbols in a .so file", explains how to list symbols (exported functions) in a shared library.
If your library is linked statically, its functions will show up in the list of symbols inside the executable itself (unless the executable has been stripped):
nm -C foo
The same command will also list the names of all exported symbols (function names) in a static library:
nm -C libasan.a
You may want to construct a shell script to enumerate your libraries, looking for the specific function that you want inside each one. For example, to figure out which .a file provides sprintf():
for x in *.a; do echo "--- ${x} ---"; nm -C --defined-only "$x" | grep -w sprintf; done
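The loop above can also be written as a small Python script that runs nm on each file and reports only the files that define the symbol. A sketch, assuming GNU nm output of the form "address type name" (newer versions append a version such as @@GLIBC_2.2.5 to names when run with -D). It uses --defined-only so that files that merely reference the symbol are not reported:

```python
import subprocess
import sys


def defined_names(nm_output):
    """Set of symbol names in `nm --defined-only` output.

    Symbol lines look like '0000000000001139 T sprintf'; blank lines and the
    'member.o:' headers that nm prints for archives are skipped.
    """
    names = set()
    for line in nm_output.splitlines():
        parts = line.split(None, 2)  # address, type, name (a demangled name may contain spaces)
        if len(parts) == 3:
            names.add(parts[2].split("@")[0])  # drop a version suffix such as @@GLIBC_2.2.5
    return names


def providers(symbol, nm_outputs):
    """Files (keys of nm_outputs) whose nm output defines `symbol`."""
    return sorted(f for f, out in nm_outputs.items() if symbol in defined_names(out))


if __name__ == "__main__":
    symbol, files = sys.argv[1], sys.argv[2:]
    outputs = {}
    for f in files:
        # Shared objects are usually stripped, so read their dynamic symbol table (-D).
        cmd = ["nm", "--defined-only"] + (["-D"] if ".so" in f else []) + [f]
        outputs[f] = subprocess.run(cmd, capture_output=True, text=True).stdout
    print("\n".join(providers(symbol, outputs)))
```

For example: python3 who_defines.py sprintf *.a (the script name is just an example).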

linux gcc linking, duplicate symbols? [duplicate]

Is there any way we can get gcc to detect a duplicate symbol in static libraries vs. the main code (or another static library)?
Here's the situation:
main.c erroneously contained a function definition, e.g. with the signature uint foohash(const char*)
foo.c also contains a function definition with the signature uint foohash(const char*)
foo.c and other source files are compiled to a static util library, which the main program links in, i.e. something like:
gcc -o main main.o util.o -L ./libs -lfooutils
So, now main.o and libs/libfooutils.a both contain a foohash function. Presumably the linker found that symbol in main.o and doesn't bother looking for it elsewhere.
Is there any way we can get gcc to detect such a situation ?
Indeed, as Simon Richter stated, the --whole-archive option can be useful. Try changing your command line to:
gcc -o main main.o util.o -L ./libs -Wl,--whole-archive -lfooutils -Wl,--no-whole-archive
and you'll see a multiple definition error.
gcc calls the ld program for linking. The relevant ld options are:
--no-define-common
--traditional-format
--warn-common
See the man page for ld. These should be what you need to experiment with to get the warnings sought.
Short answer: no.
GCC does not actually do anything with libraries. It is the task of ld, the linker (called automatically by GCC), to pull in symbols from libraries, and that's really a fairly dumb tool.
The linker has lots of complex jiggery-pokery for combining different types of data from different sources, supporting different file formats, and handling all the evil little details of binary executables, but in the end, all it really does is look for undefined symbols and find the definitions.
What you can do is a link trace (pass -t to gcc) to see what comes from where. Or else run nm on all the object files and libraries in your system, and write a script to detect duplicates.
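The "run nm on everything and write a script" approach could look like this. A sketch under the assumption that a duplicate means the same global, non-weak symbol (nm types T, D, B, R, G, S) defined in more than one of the given files; weak (W, V) and common (C) symbols are ignored because the linker normally accepts repeats of those. Unlike the linker, it reports foohash even when the archive member defining it is never pulled in:

```python
import subprocess
import sys
from collections import defaultdict

# nm type letters for global, non-weak definitions (text, data, bss, read-only, small data).
STRONG_GLOBAL = set("TDBRGS")


def strong_definitions(nm_output):
    """Names with a strong global definition in plain `nm` output (no -C)."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in STRONG_GLOBAL:
            names.add(parts[2])
    return names


def duplicates(nm_outputs):
    """Map each symbol strongly defined in more than one file to those files."""
    where = defaultdict(list)
    for f, out in nm_outputs.items():
        for name in strong_definitions(out):  # a set, so each file counts once
            where[name].append(f)
    return {name: sorted(files) for name, files in where.items() if len(files) > 1}


if __name__ == "__main__":
    # e.g. python3 find_dups.py main.o util.o libs/libfooutils.a
    outputs = {f: subprocess.run(["nm", f], capture_output=True, text=True).stdout
               for f in sys.argv[1:]}
    for name, files in sorted(duplicates(outputs).items()):
        print(name, *files)
```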

how to determine why a dynamic library is linked against an application?

I have a Linux app I'm building from source. When I run ldd against the binary, I understand why most of the libraries are there, but not all.
Is there a way to add a flag to ld or gcc/g++ or anything I can do to determine why the linker chose to link against specific libraries?
Edit:
To explore the route @shloim suggested, I tried the following:
> nm -u /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
nm: /lib/x86_64-linux-gnu/libcrypto.so.1.0.0: no symbols
> file /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
/lib/x86_64-linux-gnu/libcrypto.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=230ebe6145b6681d0cb7e4c9021f0d899c02e0c4, stripped
Is there an obvious reason why nm would not work on libcrypto?
This should show you all symbols used in the .so file that are undefined within it:
nm -u <your_so_file>
You can then compare it with
nm --defined-only <3rd_party_so_file>
and try to figure out the common symbols. (If a file is stripped, add -D so that nm reads the dynamic symbol table instead.)
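Finding the common symbols is just a set intersection over the two nm outputs. A minimal sketch; it assumes GNU nm output formats and strips version suffixes such as @GLIBC_2.2.5 before comparing. Keep in mind that ldd also lists libraries pulled in only by other libraries, so an empty result may simply mean the library is an indirect dependency; readelf -d on the binary shows only its direct NEEDED entries:

```python
import subprocess
import sys


def undefined_names(nm_output):
    """Names from `nm -u` output; lines look like '   U name' or '   w name'."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            names.add(parts[1].split("@")[0])  # drop a version suffix like @GLIBC_2.2.5
    return names


def defined_names(nm_output):
    """Names from `nm --defined-only` output; lines look like 'address type name'."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            names.add(parts[2].split("@")[0])
    return names


def why_linked(app_nm_u, lib_nm_defined):
    """Symbols the application needs that the library defines."""
    return sorted(undefined_names(app_nm_u) & defined_names(lib_nm_defined))


def nm(*args):
    return subprocess.run(["nm", *args], capture_output=True, text=True).stdout


if __name__ == "__main__":
    app, lib = sys.argv[1], sys.argv[2]
    print("\n".join(why_linked(nm("-D", "-u", app), nm("-D", "--defined-only", lib))))
```

For example: python3 why_linked.py ./myapp /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (the script and application names are just examples).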
Is there an obvious reason why nm would not work on libcrypto?
By default nm lists the regular symbol table (.symtab), which strip removes; hence the "no symbols" message. A stripped shared object still keeps its dynamic symbol table (.dynsym), so try nm -D libcrypto.so.
readelf or objdump can also be used to check the symbols present in shared objects.
readelf -Ws will show all the symbols

How to create an executable hex from an ELF file

I am very, very new to this. I have an ELF file, input.out, and need to create an executable hex file from it. I am using objcopy to create an executable in Intel HEX format as follows:
objcopy -O ihex input.out out.hex
With this, out.hex contains data from all sections (.interp, .note.ABI-tag, etc.), but I am not sure whether all of it is required for the executable. Is the .text section alone enough for creating an executable hex file, so that I can just use the command below, or are more sections required?
objcopy -j.text -O ihex input.out out.hex
Also, is there a good reference for understanding this in detail? I couldn't find much by Googling; probably I don't know what to search for.
It should work with:
objcopy -O ihex input.elf output.hex
Adding -S (--strip-all) also drops relocation and symbol information and debug sections.
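To get a feel for what objcopy -O ihex emits, here is a minimal sketch of the Intel HEX record format itself (not how objcopy produces the file). Each line is ':' followed by a byte count, a 16-bit address, a record type (00 data, 01 end of file) and the data, closed by a checksum that makes all the record's bytes sum to zero modulo 256. The ihex helper is hypothetical and only handles one contiguous block below 64 KiB; larger images need extended address records, which it does not emit:

```python
def ihex_record(address, data, rtype=0):
    """One Intel HEX record: ':' + count + 16-bit address + type + data + checksum."""
    if len(data) > 255 or not 0 <= address <= 0xFFFF:
        raise ValueError("too much data or address beyond 16 bits")
    body = bytes([len(data), address >> 8, address & 0xFF, rtype]) + bytes(data)
    checksum = -sum(body) & 0xFF  # two's complement of the byte sum
    return ":" + body.hex().upper() + "%02X" % checksum


def ihex(data, base=0, chunk=16):
    """Data records for one contiguous block below 64 KiB, then the end-of-file record."""
    lines = [ihex_record(base + i, data[i:i + chunk]) for i in range(0, len(data), chunk)]
    lines.append(ihex_record(0, b"", rtype=1))  # type 01: end of file
    return "\n".join(lines)


if __name__ == "__main__":
    print(ihex(b"\x01\x02", base=0x100))
```

Running it prints the two records :020100000102FA and :00000001FF, the second being the end-of-file record that ends every Intel HEX file.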

Resources