What are the uses for custom linker scripts? - gnu

What are the common uses for custom linker scripts? When would I want to use one? What important things can I do with a custom linker script that I can't do with the default linker script? When do they provide advantages over the default linker script and what are those advantages?
I have in mind C++ if it affects the answer.

The linker script can be considered a mapping from the "generic" data/code/uninitialized-data/read-only-constants/symbols sections in an ELF relocatable object file to the peculiarities required by the target system's executable loader. In general, the "default" linker script for a toolchain and target OS is provided with the toolchain; custom linker scripts are used primarily in the embedded world, where there may not be an OS at all, or when writing low-level code such as bootloaders or OS kernels, as in those applications, you may not have the same guarantees (or any guarantees) as to "what goes where" vs. what an OS' executable loader provides for you.

I don't know the common uses, but in our case it is customization for a non-standard OS. We use it to define symbols that identify special variables, which get different treatment at load time.
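To make the "symbols that identify special variables" idea concrete, here is a minimal sketch in C. It does not reproduce the setup described above; it assumes the GNU linker on an ELF target, which automatically defines __start_NAME and __stop_NAME for any section whose name is a valid C identifier. The section name special_vars, the SPECIAL macro and the variables are all hypothetical. A custom linker script could define equivalent boundary symbols explicitly and place the section wherever the loader expects it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: put a variable into a dedicated ELF section.
   "used" keeps the compiler from discarding it. */
#define SPECIAL __attribute__((section("special_vars"), used))

SPECIAL int boot_count = 1;
SPECIAL int retry_limit = 5;
SPECIAL int log_level = 3;

/* Boundary symbols generated by GNU ld for the section above. With a
   custom linker script these would be defined by the script itself. */
extern int __start_special_vars[];
extern int __stop_special_vars[];

/* Number of int-sized entries the linker collected in the section. */
size_t special_var_count(void)
{
    return (size_t)(__stop_special_vars - __start_special_vars);
}

/* Walk every "special" variable without naming any of them. */
int special_var_sum(void)
{
    int sum = 0;

    assert(__start_special_vars <= __stop_special_vars);
    for (int *p = __start_special_vars; p < __stop_special_vars; ++p)
        sum += *p;
    return sum;
}
```

Code that runs at load time can then treat everything between the two symbols differently from ordinary data, which is roughly the kind of special handling the answer refers to.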

Related

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: creating a shared library and using the LD_PRELOAD variable. But it doesn't work for statically linked binaries.
One way is to write a static library that interposes the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same way LD_PRELOAD works for dynamically linked binaries, i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
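For reference, the dynamic technique mentioned above usually looks something like the following sketch. It assumes glibc on Linux; the choice of rand as the function to intercept and the rand_calls counter are only for illustration. Built as a shared object (for example gcc -shared -fPIC hook.c -o libhook.so, where both file names are hypothetical) and listed in LD_PRELOAD, it is found before libc during symbol lookup.

```c
#define _GNU_SOURCE   /* exposes RTLD_NEXT in <dlfcn.h> */
#include <assert.h>
#include <dlfcn.h>
#include <stdlib.h>

#ifndef RTLD_NEXT
/* Fallback if the feature macro was defined too late. This is the value
   glibc uses; treat it as an assumption on other C libraries. */
#define RTLD_NEXT ((void *) -1l)
#endif

/* Hypothetical counter showing that the interposer was reached. */
int rand_calls = 0;

/* Same name and signature as the libc function, so callers bind to this
   definition when it comes earlier in the lookup order. */
int rand(void)
{
    static int (*real_rand)(void);

    if (real_rand == NULL) {
        /* RTLD_NEXT asks for the next definition after this object. */
        *(void **)(&real_rand) = dlsym(RTLD_NEXT, "rand");
        assert(real_rand != NULL);
    }
    ++rand_calls;
    return real_rand();
}
```

This only works because the call to rand goes through the dynamic linker. In a statically linked binary the call is already bound at link time, which is exactly the limitation the question is about.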
One way is to write a static library that interpose the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible to do using a displaced execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in the main executable that came from a static library, which is equivalent to interposing a symbol defined in the executable. The question thus reduces to whether it's possible to intercept a function defined in the executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default symbols defined in executable are not exported so not accessible to dynamic linker (you can alter this via -export-dynamic or export lists but this has unpleasant performance or maintenance side effects)
even if you export necessary symbols, ELF requires executable's dynamic symtab to be always searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in dsohowto); symtab of LD_PRELOAD-ed library will always follow that of executable and thus won't have a chance to intercept the symbols
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

Why does uClibc UCLIBC_BUILD_NOEXECSTACK not actually use the linker flag -Wl,-z,noexecstack

One modern Linux security hardening tactic is to compile and link code with the option -Wl,-z,noexecstack, which marks the DLL or binary as not needing an executable stack. This condition can be checked using readelf or other means.
I have been working with uClibc and noticed that it produces objects (.so files) that do not have this flag set. Yet uClibc has a configuration option UCLIBC_BUILD_NOEXECSTACK which according to the help means:
Mark all assembler files as noexecstack, which will mark uClibc
as not requiring an executable stack. (This doesn't prevent other
files you link against from claiming to need an executable stack, it
just won't cause uClibc to request it unnecessarily.)
This is a security thing to make buffer overflows harder to exploit.
...etc...
On some digging into the Makefiles this is correct - the flag is only applied to the assembler.
Because the flag is only passed to the assembler does this mean that the uClibc devs have missed an important hardening flag? There are other options, for example UCLIBC_BUILD_RELRO which do result in the equivalent flag being added to the linker (as -Wl,-z,relro)
However a casual observer could easily misread this and assume, as I originally did, that UCLIBC_BUILD_NOEXECSTACK actually marks the .so file, when in fact it does not. OpenWRT, for example, ensures that the flag is set when it builds uClibc.
Why would uClibc not do things the 'usual' way? What am I missing here? Are the libraries (e.g. librt.so, libpthread.so, etc) actually not NX?
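As an alternative to readelf for answering that last question, the marking can also be read straight from the program headers. The sketch below is only an illustration: it assumes a 64-bit ELF file in the host's byte order and the definitions in <elf.h> on Linux, and the function name and return codes are made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file requests an executable stack, 0 if PT_GNU_STACK
   is present without the execute flag, 2 if there is no PT_GNU_STACK
   header at all, and -1 on error or for a non-ELF64 file. */
int elf64_stack_state(const char *path)
{
    Elf64_Ehdr eh;
    int result = 2;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }
    /* Scan the program header table for the PT_GNU_STACK entry. */
    for (unsigned i = 0; i < eh.e_phnum; ++i) {
        Elf64_Phdr ph;
        long offset = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, offset, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_GNU_STACK) {
            result = (ph.p_flags & PF_X) ? 1 : 0;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling elf64_stack_state on a built library such as librt.so shows whether the linker ended up marking it, independent of which flags the Makefiles passed.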
EDIT
I was able to play with the Makefiles and get the noexecstack bit by using the -Wl,-z,noexecstack argument. So why would they not use that as well?
OK, it turns out after list conversation and further research that:
the GNU linker sets the DLL / executable stack state based on the 'lowest common denominator', i.e. if any linked or referenced part has an exec stack then the whole object is marked that way
the 'correct' way to resolve this problem is actually to find and fix assembly / object files that use an exec stack when they don't need to.
Using the linker to 'fix' things is a workaround if you can't otherwise fix the root cause.
So for uClibc the solution is to submit a bug so that the underlying objects get fixed. Otherwise anything linked with the static libraries won't get a non-exec stack.
For my own question, if building a custom firmware that does not use any static libraries, it is possibly sufficient to use the linker flag.
References:
Ubuntu Security Team - Executable Stacks

How to tell if current OS uses Linux-like or MacOSX-like shared libraries?

I am aware that there are (at least) two radically different kinds of shared-library files on Unix-type systems. One is the kind used on GNU/Linux systems and probably other systems as well (with the filename ending in ".so") and the other used in Mac OS X, and also possibly other systems as well (with the filename ending in ".dylib").
My question is this --- is there any type of test I could do from a shell-script that would easily detect which of these two paradigms the current OS uses for shared libraries?
I'm sure I could find some way to easily deal with this variance --- if only I knew of a simple test I could run from a shell-script that would tell me which type of shared library is used on the current system.
Well, I guess you need to check the file type of executables on the target platform. You can use file for that (check its output for, say, /bin/ls). ELF is the most widely used executable format on Linux, while Mach-O is used "natively" on Mac OS X.
A note: technically there are other executable formats on these systems, such as a.out and PEF, and, as you might guess, those formats have their own dynamic libraries. Linux in fact has pluggable support for executable formats, and even Win32 .EXEs may be executed "quasi-natively" on Linux (of course, they need an implementation of the Win32 API on top of the kernel API; WINE is one such implementation).
Also, if you need to create a dynamically loaded library, then you should use one of the portable build systems (to name a few: GNU autotools, CMake, QMake...). That way you get not only the right shared-library extension but also linker flags, portable methods of installation/uninstallation and so on...
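Under the hood, the file check suggested above mostly comes down to looking at the first few bytes of the executable. Here is a rough sketch of that idea in C, not what file itself does: the enum and function names are made up for illustration, and only the ELF and Mach-O magic numbers are recognized.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum exe_format { FORMAT_UNKNOWN, FORMAT_ELF, FORMAT_MACHO };

/* Classify an executable by its first four bytes. */
enum exe_format classify_magic(const unsigned char magic[4])
{
    static const unsigned char elf[4] = {0x7f, 'E', 'L', 'F'};
    /* Mach-O magic numbers as they appear on disk, in both byte orders,
       plus the universal ("fat") binary magic. Note that Java class files
       share the last one, so this check is only a heuristic. */
    static const unsigned char macho[][4] = {
        {0xfe, 0xed, 0xfa, 0xce}, {0xce, 0xfa, 0xed, 0xfe},  /* 32-bit */
        {0xfe, 0xed, 0xfa, 0xcf}, {0xcf, 0xfa, 0xed, 0xfe},  /* 64-bit */
        {0xca, 0xfe, 0xba, 0xbe},                            /* universal */
    };
    size_t i;

    assert(magic != NULL);
    if (memcmp(magic, elf, 4) == 0)
        return FORMAT_ELF;
    for (i = 0; i < sizeof macho / sizeof macho[0]; ++i)
        if (memcmp(magic, macho[i], 4) == 0)
            return FORMAT_MACHO;
    return FORMAT_UNKNOWN;
}

/* Read the magic bytes of a file such as /bin/ls and classify them. */
enum exe_format classify_file(const char *path)
{
    unsigned char magic[4];
    enum exe_format result = FORMAT_UNKNOWN;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return FORMAT_UNKNOWN;
    if (fread(magic, 1, sizeof magic, f) == sizeof magic)
        result = classify_magic(magic);
    fclose(f);
    return result;
}
```

A result of FORMAT_ELF for /bin/ls suggests ".so"-style shared libraries, and FORMAT_MACHO suggests ".dylib"-style ones.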

Does LD_LIBRARY_PATH really cause inconsistencies?

The blog article "LD_LIBRARY_PATH – or: How to get yourself into trouble!" by the DTU Computing Center states:
3. Inconsistency: This is the most common problem. LD_LIBRARY_PATH forces an application to load a shared library it wasn’t linked against, and that is quite likely not compatible with the original version. This can either be very obvious, i.e. the application crashes, or it can lead to wrong results, if the picked up library not quite does what the original version would have done. Especially the latter is sometimes hard to debug.
Is this really true? LD_LIBRARY_PATH allows us to modify the search path for dynamic libraries, but does it really suppress the soname lookup that ensures binary compatibility?
(Because, by my interpretation, the Program Library HOWTO doesn't say any such thing.)
Or is the author unaware of the concept of maintaining a consistent library versioning scheme, and therefore assuming that one is not in use for the library in question?
I think LD_LIBRARY_PATH should only be used for testing and not for a final installation, because it lets a specified library be used before the standard library locations are searched. The Linux Documentation Project says this about LD_LIBRARY_PATH, and puts it more clearly than I can.
3.3.1. LD_LIBRARY_PATH
You can temporarily substitute a different library for this particular
execution. In Linux, the environment variable LD_LIBRARY_PATH is a
colon-separated set of directories where libraries should be searched
for first, before the standard set of directories; this is useful when
debugging a new library or using a nonstandard library for special
purposes. The environment variable LD_PRELOAD lists shared libraries
with functions that override the standard set, just as
/etc/ld.so.preload does. These are implemented by the loader
/lib/ld-linux.so. I should note that, while LD_LIBRARY_PATH works on
many Unix-like systems, it doesn't work on all; for example, this
functionality is available on HP-UX but as the environment variable
SHLIB_PATH, and on AIX this functionality is through the variable
LIBPATH (with the same syntax, a colon-separated list).
LD_LIBRARY_PATH is handy for development and testing, but shouldn't be
modified by an installation process for normal use by normal users;
see ``Why LD_LIBRARY_PATH is Bad'' at
http://www.visi.com/~barr/ldpath.html for an explanation of why. But
it's still useful for development or testing, and for working around
problems that can't be worked around otherwise. If you don't want to
set the LD_LIBRARY_PATH environment variable, on Linux you can even
invoke the program loader directly and pass it arguments. For example,
the following will use the given PATH instead of the content of the
environment variable LD_LIBRARY_PATH, and run the given executable:
/lib/ld-linux.so.2 --library-path PATH EXECUTABLE
Just executing ld-linux.so without arguments will give you more help
on using this, but again, don't use this for normal use - these are
all intended for debugging.
Taken on August 13th, 2013 from: http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html
The link inside the document is old; I found the intended article here: http://xahlee.info/UnixResource_dir/_/ldpath.html
edit
You can override a library that a program was linked against at build/install time because of the order in which ld.so looks up libraries at run time. A library found in a location listed in the environment variable LD_LIBRARY_PATH will be loaded instead of one in the default path (/lib and /usr/lib).
from man 8 ld.so
ld.so loads the shared libraries needed by a program, prepares the
program to run, and then runs it. Unless explicitly specified via the
-static option to ld during compilation, all Linux programs are
incomplete and require further linking at run time.
The necessary shared libraries needed by the program are searched for
in the following order
o Using the environment variable LD_LIBRARY_PATH
(LD_AOUT_LIBRARY_PATH for a.out programs). Except if the
executable is a setuid/setgid binary, in which case it is ignored.
o From the cache file /etc/ld.so.cache which contains a compiled
list of candidate libraries previously found in the augmented
library path. Libraries installed in hardware capabilities
directories (see below) are preferred to other libraries.
o In the default path /lib, and then /usr/lib.
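To see which file the search order above actually selects for a given soname (for example, after changing LD_LIBRARY_PATH), one option is to ask the loader. This is a sketch assuming glibc's dlinfo extension with RTLD_DI_LINKMAP; the function name resolved_library_path is made up for this example.

```c
#define _GNU_SOURCE   /* exposes dlinfo and RTLD_DI_LINKMAP */
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

/* Copies the path the loader resolved `soname` to into `out`.
   Returns 0 on success, -1 if the library could not be loaded. */
int resolved_library_path(const char *soname, char *out, size_t out_size)
{
    struct link_map *map = NULL;
    void *handle;
    int status = -1;

    assert(soname != NULL && out != NULL && out_size > 0);
    /* A bare soname makes the loader apply its normal search order. */
    handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL)
        return -1;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL) {
        snprintf(out, out_size, "%s", map->l_name);
        status = 0;
    }
    dlclose(handle);
    return status;
}
```

Running this with and without a directory in LD_LIBRARY_PATH shows whether the variable changed which file satisfied the same soname. The soname is still what the loader looks for; only the directory it is found in changes.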

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without it! Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
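You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_LAZY | RTLD_GLOBAL); and the runtime loader will search for the library in system-appropriate locations. (dlopen requires either RTLD_LAZY or RTLD_NOW in its flags; RTLD_GLOBAL on its own is not a valid mode.)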
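Putting that advice together, an optional dependency can be wrapped roughly as follows. This is a sketch, not a complete solution: the optional_lib type and its helper functions are hypothetical names, and error reporting via dlerror is left out. On older glibc versions the program must be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* An optional dependency: handle stays NULL when the library is absent. */
struct optional_lib {
    void *handle;
};

/* Load by soname only, so the runtime loader does the path search. */
struct optional_lib optional_lib_open(const char *soname)
{
    struct optional_lib lib;

    assert(soname != NULL);
    lib.handle = dlopen(soname, RTLD_LAZY | RTLD_GLOBAL);
    return lib;
}

/* Returns the symbol's address, or NULL if the library or symbol is
   missing, so callers can fall back to reduced functionality. */
void *optional_lib_symbol(const struct optional_lib *lib, const char *name)
{
    if (lib->handle == NULL)
        return NULL;
    return dlsym(lib->handle, name);
}

/* Release the library if it was loaded. */
void optional_lib_close(struct optional_lib *lib)
{
    if (lib->handle != NULL) {
        dlclose(lib->handle);
        lib->handle = NULL;
    }
}
```

The program would call optional_lib_open("libXcursor.so.1") at startup, look up only the functions it needs, and switch to its limited mode whenever a lookup returns NULL.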
