dynamic linker dependency info embedded in an archive - linux

Dynamic libraries are nice. They have information embedded in them that helps the runtime linker figure out what other libraries the final executable needs to load. They also tell the executable what symbols will be loaded.
Static libraries, however, are a pain in the neck. The linker won't automatically link an archive's dependencies. This gets heinous when statically linking a library that is moderately complicated or has a deep dependency graph.
So why can't archives just include their dependency information? I tried to do just that. The key to my idea is /usr/lib/libc.so. This file is not a shared object file but a linker script.
Linker scripts give you a lot of control over the final linker output, but all I want is to specify dependencies, which you can do with:
INPUT( -ldependency -ldependency2 )
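For reference, /usr/lib/libc.so on a typical glibc system is exactly this kind of script; it looks something like the following (the exact paths vary by distribution and architecture):
/* GNU ld script */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a
        AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )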
Here are my questions.
This seems pretty simple; why hasn't it been done before? I can't be the first person to think of this. It seems easier and more intuitive than pkg-config or libtool (especially libtool, ugh).
I tried to embed the linker script in the archive, but it doesn't work. Why not? Can it be made to work? I suspect some clever use of ranlib might do the trick, but it is beyond me.
My current solution is to create a linker script called libMyLibrary.a. It contains
INPUT( -lMyRealLibrary -ldependency1 -ldependency2 )
I have to put in the dependencies by hand; it would be nice if ld could do it for me, but that is a problem for another day.
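A minimal sketch of that workaround (library names are illustrative); the "archive" is really just a text file, and GNU ld falls back to parsing it as a linker script:
cat > libMyLibrary.a <<'EOF'
INPUT( -lMyRealLibrary -ldependency1 -ldependency2 )
EOF
gcc main.o -L. -lMyLibrary -o app   # pulls in the real library plus its dependencies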

To answer the second part of your question, ar p ARCHIVE SCRIPT will cat the contents of SCRIPT contained in ARCHIVE. Or use ar x ARCHIVE SCRIPT to extract the file and then pass it to the linker with something like ld ... -T SCRIPT.
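For example (archive and member names are made up):
ar p libMyLibrary.a deps.ld    # print the member named deps.ld to stdout
ar x libMyLibrary.a deps.ld    # or extract it as a file in the current directory
ld ... -T deps.ld              # then pass it to the linker explicitly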

Linker scripts of this form are supported only by GNU ld (the new Gold ELF linker has limited support as well). If you have any portability requirements, you had better not depend on linker script support.
libtool is written specifically to help with portability in creating shared libraries (details of which vary greatly between platforms).
If you don't need portability, then of course you don't need libtool and its complexity.

Related

Any tools to find order of .o files to be linked in a project using gcc

I am porting a vc++ project to the Linux platform and using g++ as my compiler. I have resolved the compilation issues with g++ and am able to generate .o files for every source file in the vc++ project. Now I have to link them to produce the final executable.
I can do that with
g++ file1.o file2.o -o file.out
but when I do that in my makefile and run it, I get a lot of ld errors due to dependencies.
Is there any way I can figure out the order in which to give the object files?
Are there any tools to do that, or any vc++ project files which contain the order?
You say "vc++", but you are using "gcc" (usually that would be g++"). Likely you are missing one or more libraries, which you would specify with a "-l" option (documented as part of ld as well as gcc).
The distinction is important, because each wrapper (gcc and g++) adds the corresponding runtime library to the options it passes to ld.
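As a rough illustration of that distinction, using the file names from the question:
gcc file1.o file2.o -o file.out -lstdc++   # roughly what you would have to add by hand for C++ objects
g++ file1.o file2.o -o file.out            # g++ supplies the C++ runtime libraries itself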
The order of shared libraries (the most common form with Linux) supposedly does not matter (the linker makes two passes to resolve symbols). A while back, before shared libraries were common, I wrote a program (named liborder, and mentioned here) which analyzes a collection of ".o" objects and "-l" (static libraries) to print a recommended order for the "-l" options. That was workable for small programs, but not for complex ones. For example, Oracle's runtime libraries around 20 years ago were all static, and one needed a list of 15-20 libraries in the proper order to successfully link. My program could not handle that. However, since then, shared libraries (which do not have the problem with ordering) are common enough that I have not bothered to package liborder for use by others (it's still on a to-do list with a dozen other programs).
If your program uses symbols which are not in the standard library for C/C++, then you have to determine that yourself. I suppose one could have a program that searches all of the development libraries for a given symbol, but that appears wasteful, since only a tiny fraction would be pertinent. I see 200 of these in my /usr/lib.
Rather, I make it easy to see what my program is missing by presenting the symbols from nm in readable form:
For C, I use scripts (here as "externs" and "imports") to check which symbols are exported or imported from a collection of ".o" files. The scripts use the output of the nm program, which shows the given symbols.
For C++, there's an option "-C" of nm which shows the unmangled names of symbols.
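For instance, standard binutils nm options give a quick picture of what a set of objects still needs:
nm -u *.o                # undefined symbols: what must come from libraries
nm --defined-only *.o    # symbols these objects themselves provide
nm -C -u *.o             # the same, with C++ names demangled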

Why does uClibc UCLIBC_BUILD_NOEXECSTACK not actually use the linker flag -Wl,-z,noexecstack

One modern Linux security hardening tactic is to compile and link code with the option -Wl,-z,noexecstack; this marks the DLL or binary as not needing an executable stack. This condition can be checked using readelf or other means.
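A quick way to check is to look at the GNU_STACK program header with readelf (the library name here is just a placeholder):
readelf -lW libfoo.so | grep GNU_STACK
# flags RW  = stack not executable; RWE = executable stack requested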
I have been working with uClibc and noticed that it produces objects (.so files) that do not have this flag set. Yet uClibc has a configuration option UCLIBC_BUILD_NOEXECSTACK which according to the help means:
Mark all assembler files as noexecstack, which will mark uClibc
as not requiring an executable stack. (This doesn't prevent other
files you link against from claiming to need an executable stack, it
just won't cause uClibc to request it unnecessarily.)
This is a security thing to make buffer overflows harder to exploit.
...etc...
On digging into the Makefiles, this is correct - the flag is only applied to the assembler.
Because the flag is only passed to the assembler, does this mean that the uClibc devs have missed an important hardening flag? There are other options, for example UCLIBC_BUILD_RELRO, which do result in the equivalent flag being added to the linker (as -Wl,-z,relro).
However, a casual observer could easily misread this and assume, as I originally did, that UCLIBC_BUILD_NOEXECSTACK actually marks the .so file when in fact it does not. OpenWRT, for example, ensures that the flag is set when it builds uClibc.
Why would uClibc not do things the 'usual' way? What am I missing here? Are the libraries (e.g. librt.so, libpthread.so, etc) actually not NX?
EDIT
I was able to play with the Makefiles and get the noexecstack bit by using the -Wl,-z,noexecstack argument. So why would they not use that as well?
OK, it turns out after mailing list conversation and further research that:
the GNU linker sets the DLL / executable stack state based on the 'lowest common denominator' i.e. if any linked or referenced part has an exec stack then the whole object is set this way
the 'correct' way to resolve this problem is actually to find and fix the assembly / object files that use an exec stack when they don't need to.
Using the linker to 'fix' things is a workaround if you can't otherwise fix the root cause.
So for uClibc the solution is to submit a bug so that the underlying objects get fixed. Otherwise anything linked with the static libraries won't get a non-exec stack.
For my own question: if building a custom firmware that does not use any static libraries, it is possibly sufficient to use the linker flag.
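Concretely, the per-file fix and the flag-based approaches discussed above look roughly like this (file and library names are placeholders):
# mark a hand-written assembly source as not needing an executable stack
# (the @progbits spelling is for x86; some targets use %progbits)
echo '.section .note.GNU-stack,"",@progbits' >> helpers.S
# or let the assembler add the note for every file it assembles
gcc -Wa,--noexecstack -c helpers.S
# or force the bit at link time (the workaround, not the root-cause fix)
gcc -shared helpers.o -Wl,-z,noexecstack -o libfoo.so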
References:
Ubuntu Security Team - Executable Stacks

A question about how loader locates libraries at runtime

"Only a minimum amount of work is done at compile time by the linker; it only records what library routines the program needs and the index names or numbers of the routines in the library." (source)
So that means ld.so won't check all the libraries in its database, only those recorded by the application program itself, that is to say, only those specified with gcc -lxxx.
This contradicts my previous understanding that ld.so checks all the libraries in its database one by one until a match is found.
Which is the exact case?
I will make a stab at answering this question...
At link time, the linker (not ld.so) will make sure that all the symbols needed by the .o files being linked together are satisfied by the libraries the program is being linked against. If any of those libraries are dynamic libraries, it will also check the libraries they depend on (no need to include them in the -l list) to make sure that all of the symbols in those libraries are satisfied. And it will do this recursively.
Only the libraries the executable directly depends on via supplied -l parameters at link time will be recorded in the executable. If the libraries themselves declared dependencies, those dependencies will not be recorded in the executable unless those libraries were also specified with -l flags at link time.
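One way to see what actually got recorded is to dump the dynamic section of the executable; only the directly linked libraries appear as NEEDED entries (program and library names are illustrative):
readelf -d ./myprogram | grep NEEDED
#  (NEEDED)  Shared library: [libm.so.6]
#  (NEEDED)  Shared library: [libc.so.6]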
Linking happens when you run the linker. For gcc, this usually looks something like gcc a.o b.o c.o -lm -o myprogram. This generally happens at the end of the compilation process. Underneath the covers it generally runs a program called ld. But ld is not at all the same thing as ld.so (which is the runtime loader). Even though they are different entities they are named similarly because they do similar jobs, just at different times.
Loading is the step that happens when you run the program. For dynamic libraries, the loader does a lot of jobs that the linker would do if you were using static libraries.
When the program runs, ld.so (the runtime loader) actually hooks the symbols up on the executable to the definitions in the shared library. If that shared library depends on other shared libraries (a fact that's recorded in the library) it will also load those libraries and hook things up to them as well. If, after all this is done, there are still unresolved symbols, the loader will abort the program.
So, the executable says which dynamic libraries it directly depends upon. Each of those libraries say which dynamic libraries they directly depend upon, and so forth. The loader (ld.so) uses that to decide which libraries to look in for symbols. It will not go searching through random other libraries in a 'database' to find the appropriate symbols. They must be in libraries that are in the dependency chain.

How to statically-link a complex program

In Linux, I downloaded a program's source and want it to be statically linked.
It has a huge Makefile; I run
./configure
make
to compile.
Perhaps it's a bit too general to ask, but how can I make the binary statically linked?
EDIT: the reason for this is to make sure the binary has no dependencies (or at least as few as possible), making it possible to run it on any Linux-based computer, even one without an Internet connection and with a non-updated Linux.
Most autoconf-generated configure scripts will allow you to make a static build:
./configure --enable-static
make
If that doesn't work, you may be able to pass linker flags in via LDFLAGS, like this:
./configure LDFLAGS=-static
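Whichever way it is built, it is worth verifying the result; a fully static binary has no dynamic dependencies (the binary name is just an example):
file ./myprogram   # should report "statically linked"
ldd ./myprogram    # should print "not a dynamic executable"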
Yeah, you need to edit the makefile and add the -static parameter to gcc during the link.
I assume it's using gcc to compile a series of C programs, although you will have to look in the Makefile to find out.
If so, you can adjust the gcc lines in the makefile to do static linking, although depending upon the structure of the program, this may be a complex change. Take a look at man gcc to see how this is done.
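As a rough sketch, the change usually amounts to adding -static to the final link command in the Makefile (file names are illustrative):
gcc -static main.o util.o -o myprogram -lm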
I'd be interested to know why you are statically linking. Have you considered using prelinking instead?
You should be aware that there may be licence problems with doing this if all components are not GPL.
If you cannot compile a static binary, I've had good results using Statifier.

What does LD stand for in the LD_LIBRARY_PATH variable on *unix?

I know that LD_LIBRARY_PATH is an environment variable telling where the linker will look for the shared libraries (which contain shared objects) to link with the executable code.
But what does the LD stand for? Is it Load? Or List Directory?
Linker. The *nix linker is called ld. When a program with dynamic libraries is linked, the linker adds additional code to look for dynamic libraries to resolve symbols not statically linked. Usually this code looks in /lib and /usr/lib. LD_LIBRARY_PATH is a colon separated list of other directories to search.
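For example, to have the loader also search an extra directory for a single run (the path is illustrative):
LD_LIBRARY_PATH=/opt/myapp/lib:$LD_LIBRARY_PATH ./myprogram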
"ldd" is a handy program to see where the libraries are: try "ldd /bin/ls", for example.
It could also mean "Loader", though. ;-)
Editorial:
As a (semi) interesting side note: I think dynamic libraries will go away someday. They were needed when disk space and system memory were scarce. There is a performance hit to using them (i.e. the symbols need to be resolved and the object code edited). In these modern days of 3 GB of memory and 7-second boot times, it might be appropriate to go back to static linking.
Except for the fact that every C++ program would magically grow to 3MB. ;-)
LD_LIBRARY_PATH stands for LOAD LIBRARY PATH, or is sometimes called LOADER LIBRARY PATH.
