Examining the text segment of a statically linked executable - Linux

I have a statically linked application binary that links against multiple user libraries and the pthread library. The application only uses a limited set of functions from each of these libraries. From a previous post (Size of a library and the executable) and from my experiments, I realize that the linker only includes the functions (in the executable) that are used/needed, not the entire contents of the library.
I want to find out which functions from each of the respective libraries are linked into the executable, and their addresses (VMAs). Ultimately, I want to compile a list that contains the start and end Virtual Memory Addresses (VMAs) of each library, based on the functions (in that library) that are mapped into the text segment.
One approach is to make a list of the functions in each library and then look up each of these functions in the executable, along with the virtual memory address it is mapped to. But this seems rather tedious to me. Is there a simpler way to accomplish this? Thanks.

> I want to find out which functions from each of the respective libraries are linked to the executable and their addresses (VMA).
Add the -Wl,-Map=foo.map argument to your link line. The resulting foo.map file will tell you all of the above.
> Ultimately I want to compile a list that contains the start and the end Virtual Memory Addresses (VMAs) for each of the libraries
That assumes that the linker does not reorder functions (so that all functions from a single library occupy a contiguous range of text addresses). This assumption probably holds in simple cases, but it is in no way guaranteed. See e.g. this patch.

Related

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: create a shared library and use the LD_PRELOAD variable. But it doesn't work for statically linked binaries.
One way is to write a static library that interposes the functions and link it with the application at compile time. But this isn't practical, because recompiling isn't always possible (think of third-party binaries, libraries, etc.).
So I am wondering if there's a way to interpose statically linked binaries in the same way LD_PRELOAD works for dynamically linked binaries, i.e., with no code changes or recompilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
> One way is to write a static library that interposes the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
> But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
> isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible using the displaced execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in the main executable that came from a static library, which is equivalent to interposing a symbol defined in the executable. The question thus reduces to whether it's possible to intercept a function defined in the executable.
This is not possible (EDIT: at least not without a lot of work) for two reasons:
by default, symbols defined in the executable are not exported, so they are not accessible to the dynamic linker (you can alter this via -export-dynamic or export lists, but this has unpleasant performance or maintenance side effects)
even if you export the necessary symbols, ELF requires the executable's dynamic symbol table to always be searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in Ulrich Drepper's "How To Write Shared Libraries", a.k.a. dsohowto); the symbol table of an LD_PRELOAD-ed library always follows that of the executable and thus never gets a chance to intercept the symbols
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is that you write a mutator program that attaches to (or statically rewrites) your original program (called the mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

How can I know the addresses and sizes of all functions (including shared libraries) when the program is running under Linux?

The program is already loaded into memory, and I need to know all the function addresses and their sizes within the program source code (using tools like nm is OK). "All functions" includes functions from loaded shared libraries, such as printf, and the addresses should be the real function addresses, not the PLT addresses. How could I implement that?
I am not sure that your question always makes sense, even if Employed Russian's answer gives a practically useful clue.
First, static functions in a stripped executable or library (including static functions inside shared libraries like libc) have no visible ELF symbols.
Second, when optimizing aggressively, some compilers are able (by cloning functions or using other techniques) to produce function code that is non-contiguous, e.g. because two functions share a piece of common machine code.
In a certain sense, this also happens when the compiler is optimizing tail-calls.
And most compilers are able to inline function calls (including calls to functions that are not declared inline). With link-time optimization (e.g. code compiled and linked with gcc -flto -O3) this may happen even across several translation units.
You could experiment with dladdr(3) & backtrace(3). You'll find out that function code might have surprising or even poorly defined "boundaries".
> I need to know all the function addresses and their sizes within the program source code
> (using tools like nm is OK)
You could read /proc/self/maps to find out all ELF images currently mapped into your process, and run nm on each one.
That will give you all function addresses (for shared libraries, you need to adjust the nm output by the relocation, i.e. the load address, which you also get from /proc/self/maps) and most of the function sizes (nm -S prints them).

How can I convert dynamically linked application to statically one?

I have an application, say gedit, which is dynamically linked, and I don't have the source code, so I can't compile it as I like. What I want to do is make it statically linked and move it to a system that doesn't have the libraries needed to run that application. Is it possible to do this, and how?
It is theoretically possible. You basically have to do the same job that the dynamic linker does, with some modifications, i.e.
dump all sections from the original file
resolve symbols
locate libraries
instead of loading them into memory, assemble them into a "virtual image"
resolve internal links
dump the whole thing into an independent file.
So objdump, readelf, and objcopy will be some of your friends.
The task is not easy and the result will be neither automatic, nor (probably) stable.
You may want to check out this code by someone else who tried the same thing by actually intercepting the dynamic linker (i.e. all steps above, except the last) and dumping the results to disk.
It is based on this tool, so it's anyone's bet whether it works on the newest kernels.
(It probably doesn't - and you need at least to patch it to reflect the new structures. This is my attempt at doing so. Caveat emptor).

Simple way to reorder ELF file sections

I'm looking for a simple way to reorder ELF file sections. I've got a sequence of custom sections that I would like to always be laid out in a certain order.
The only way I've found how to do it is to use a Linker script. However the documentation indicates that specifying a custom linker script overrides the default. The default linker script has a lot of content in it that I don't want to have to duplicate in my custom script just to get three sections always together in a certain order. It does not seem very flexible to hard code the linker behavior like that.
Why do I want to do this? I have a section of data that I need to know the run-time memory location of (beginning and end). So I've created two additional sections and put sentinel variables in them. I then want to use the memory locations of those variables to know the extent of the unknown section in memory.
.markerA
int markerA;
.targetSection
... Lots of variables ...
.markerB
int markerB;
In the above example, I would know that the data in .targetSection is between the address of markerA and markerB.
Is there another way to accomplish this? Are there libraries that would let me read in the currently executing ELF image and determine section location and size?
You can obtain the addresses of loaded sections by analyzing the ELF file format. Details may be found e.g. in the Tool Interface Standard (TIS) Portable Formats Specification, version 1.2 (http://refspecs.freestandards.org/elf/elf.pdf).
For a short impression of which information is available, it's worth taking a look at readelf:
readelf -S <filename>
returns a list of all sections contained in <filename>.
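The sections that are mapped into memory are those with the A (alloc) flag; most of them have type PROGBITS (.bss, for example, is NOBITS).
The address you are looking for is displayed in the Addr column.
For a shared object or a position-independent executable, you obtain the memory location by adding the load address of that executable / shared object; for a non-PIE executable, Addr is already the run-time address.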
There are a few ways to determine the load address of your executable/shared object:
you may parse /proc/[pid]/maps, where [pid] is the process id (the first column contains the address range of each mapping; the start of the file's mapping at offset 0 is its load address)
if you know one function contained in your file, you can apply dlsym to get a pointer to the function. That pointer is the input parameter for dladdr, which returns a Dl_info struct containing the requested load address (the dli_fbase field)
To get ELF information, the library libelf may be a helpful companion (I discovered it after studying the above-mentioned TIS, so I only took a short look at it and don't know the deeper details).
I hope this sketch of a possible solution will help.
You may consider using GCC's initializers to reference the variables that would otherwise go into a separate section, and maintain pointers to all of them in an array. I recommend using initializers because this works regardless of which file the variables are defined in.
You may look at the ELFIO library. It contains WriteObj and Writer examples. Using the library, you will be able to create/modify ELF binary files programmatically.
I'm afraid overriding the default linker script is the simple solution.
Since you are worried that it might not be flexible (even though I don't think the linker script changes that often), you could write a script that generates a linker script based on the host system's default ld script ("ld --verbose") and inserts your special sections into it.

Any downsides to using statically linked applications on Linux?

I've seen several discussions here on the subject, but wanted to ask about my particular situation:
If I have some 3rd-party libraries that my application uses, and I'd like to link them statically in order to save myself the hassle with LD_LIBRARY_PATH, etc., is there any downside to it on Linux, other than a larger file size?
Also, is it possible to statically link only some libraries, and other (standard Linux libraries) to link dynamically?
Thanks.
It is indeed possible to dynamically link against some libraries and statically link against others.
It sounds like what you really want to do is dynamically link against the system libraries, and statically link against the nonstandard ones that a user may not have installed (or that different users may have different installations of).
That's perfectly reasonable.
It's not generally a good idea to statically link against system libraries, especially libc.
It can often make sense to statically link against libraries that do not come with the OS and that will not be distributed with your application.
There are some bits of libc - those that use nsswitch - that need to load libraries dynamically. This can cause problems if you want to produce a completely static binary.
Statically linking your 3rd party libraries into your application should be completely fine.
The statically linked binary will be larger than if you had used a shared library, but I find that the library path hassles outweigh that disadvantage, provided I control the distribution of all the libraries involved. If you are dependent on a particular distro's shared libraries, then you have no choice but to use dynamic linking.
The main disadvantage I see is your application loses any automatic bugfixes that might be applied to a shared library. On the flip-side you don't get new bugs.
Static linking does not just affect the file size of the program; it also affects the memory footprint and startup time of the application. The code of a dynamically linked library is loaded into memory once and shared by all programs that use it. Statically linked library code must be loaded once per program that uses it (because it is now part of that program).
To answer your second question: yes, it is possible to link dynamic and static libraries into the same application. Just be careful to avoid inter-library dependencies, so you don't have a problem with library order. You should then be able to list the libraries in any order (as long as they come after the object files that reference them). Where I work, we prefer to list them alphabetically.
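Edit: -lfoo makes the linker prefer libfoo.so over libfoo.a when both are in the same directory. To force the static library, give its full path (/path/to/libfoo.a), use -l:libfoo.a, or surround the flag with -Wl,-Bstatic and -Wl,-Bdynamic. To add a directory to the library search path, use -L/path/to/libfoo.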
Edit: You don't have to link a dynamic library at all. Your program can open a dynamic library at run time with dlopen(3), or you can link it at build time, in which case the linker resolves the symbols but does not include the library's code in the binary.
Statically linking will make your binary bulky, but you won't need a shared version of that library in the target runtime environment. This is especially useful when developing embedded apps.

Resources