Any way for nasm to automatically add used external symbols? - linux

Is there any way for NASM to generate extern declarations automatically for any undeclared symbol it finds?
What I have now:
a source file for my app
a large include with all the GL_ defines and ~500 OpenGL functions declared as extern gl* (made from gl.h)
This of course generates a correspondingly large relocation table as a result.
But I am using only a small percentage of all the declared OpenGL functions, and I would like to include only those that I have actually used, ideally without having to declare the externs myself at all.
Or should I stick to my previous way of doing this and manually declaring every new extern as soon as I need it?
I did read the NASM docs and did a search, but all I found is how to declare externs and how to use them between .o files, which is not the problem.

After some more research I managed to find a solution to this: stripping.
First, I build my objects and executable as normal, including in them the large relocation tables. But at each step I add strip --strip-unneeded - both for intermediate objects and for the final executable.
What it does is leave only the symbols that are actually used in the objects, removing all the others. It does add a bit of overhead to the compilation process, but it's mostly unnoticeable.
I have also found that stripping both the .o files, as well as the executable produces the smallest file. I suppose there are new symbols added while linking and some of them are unused and thus can be stripped.
It's very easy to do by adding this to the Makefile.

Related

gnu linker doesn't include unreferenced modules in a shared library

I have a shared library that consists of quite a few .c modules, some of which are themselves linked into the shared library from other static .a libraries. Most of these are referenced internally within the library, but some are not. I'm finding that the linker does not include those modules in the shared library unless there is at least one call to a function in the module from within the shared library. I've been working around this problem by adding calls in a dummy ForceLinkages() function in a module that I know will be included.
That's okay, but it's surprising, since I'm using a .version file to define a public API to this library. I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
This library was originally developed on AIX, which uses a .exp file to define the public API. And there, I've never had the issue of unreferenced modules not getting included. I.e., referencing the modules in the .exp file was enough to get the linker to pull them in. Is there a way to get the Linux linker to work like that? If not, I can continue to use my silly ForceLinkages() function to get the job done...
That's okay, but it's surprising, since I'm using a .version file to define a public API to this library.
The .version file (assuming it's a linker version script) does not define the library API. It only determines which functions are exported from the library (and with which version label) and which are hidden.
I would've thought including functions in those unreferenced .c modules in the .version file would constitute a reference to the modules and force them to be included in the library.
The version script is applied after the linker has decided which objects are going to be part of the library, and which are going to be discarded, and has no effect on the decisions taken earlier.
This is all working as designed.
You need to either use --whole-archive --no-whole-archive (this has a danger of linking in code you don't need and bloating your binaries), or keep adding references as you've done before.

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: creating a shared library and using the LD_PRELOAD variable. But it doesn't work for statically-linked binaries.
One way is to write a static library that interposes the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same way LD_PRELOAD works for dynamically linked binaries, i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".
One way is to write a static library that interpose the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible using the displaced-execution technique, but the implementation is quite tricky to get right.
I'll assume you want to interpose symbols in the main executable which came from a static library, which is equivalent to interposing a symbol defined in the executable. The question thus reduces to whether it's possible to intercept a function defined in the executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default symbols defined in executable are not exported so not accessible to dynamic linker (you can alter this via --export-dynamic or export lists but this has unpleasant performance or maintenance side effects)
even if you export necessary symbols, ELF requires executable's dynamic symtab to be always searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in dsohowto); symtab of LD_PRELOAD-ed library will always follow that of executable and thus won't have a chance to intercept the symbols
What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

Given two Linux static libraries, how to tell if one depends on the other?

I have a bunch of .a files whose generation process is not controlled by me, nor are their sources. When I use them for linking, I want to know their dependencies (libA.a depends on libB.a if there is some symbol undefined in libA.a but defined in libB.a), so that I can put them in the correct order in the ld/gcc command line.
I don't want to do over-linking (i.e. specify those libraries twice), because I want to persist those dependencies into a Bazel BUILD file, so I want to know the precise dependencies.
I wonder if there is some command line tool, given libA.a and libB.a, can tell whether libA.a depends on libB.a? If there is not such, how do I write such a script?
Note: my definition for dependency may not be 100% accurate. Let me know if there are other types of dependency other than defined/undefined symbols.
The simplest way is to process the output of nm libA.a and nm libB.a and look for U symbols, but there are many types of symbols listed in man nm, each of which has different semantics, so I am concerned I might miss some with such a simplified approach.
I would use the approach beginning with U symbols. In practice, the uppercase symbol types are all you need to be concerned with (those are what you link against). I wrote scripts to print the exported and imported symbols, and for this case, it would be enough to do
exports libB.a >libB-exports
externs libA.a >libA-externs
comm -12 libB-exports libA-externs >libA-needs-libB
to list symbols where libA would use a symbol from libB (the lists are sorted, so comm should "just work"). If those were shared libraries, the scripts would have to be modified (adding a -D option to `nm`).
Further reading:
exports script to show which symbols are exported from a collection of object files
externs display all external symbols used by a collection of object files

Is it valid to link non PIC objects into an executable with PIC objects

I'm adding a thread local variable to a couple of object files that are always linked directly to executables. These objects will never be included inside a shared library (and it's safe to assume this will hold true for the foreseeable future). This means the -fPIC flag is not required for these objects, correct?
Our codebase has the -fPIC flag for all objects by default. Many of these are included in shared libraries so the use of -fPIC makes sense. However, this flag presents an issue debugging the new thread local variable because GDB crashes while stepping over thread local variable with -fPIC. If I remove -fPIC from those few object files with the new thread local variable, I can debug properly.
I can't find any authoritative statements that mixing non-PIC objects with PIC objects in an executable is okay. My testing thus far shows it's okay, but it does not feel kosher, and online discussion is generally "do not mix PIC and non PIC" due to the shared library case.
Is it safe to link non PIC objects into an executable built with PIC objects and libraries in this case? Maybe there is an authoritative statement from GCC docs on this being safe, but I cannot find it.
EDIT: Binary patching gcc to avoid this bug is not a solution in the short-term. Switching compiler on Linux is not a possible solution.
Except for bugs like the one above, it should be fine. I can't point you to definitive documents describing this, and can only speak from experience.
gcc (or the assembler) will produce different code when you specify -fPIC, but the resulting code still uses standardized relocation symbols.
For linking pieces together, this doesn't matter at first: a linker will just stubbornly string everything together and doesn't know whether the code is PIC or non-PIC. I know this because I work with systems that don't support shared libraries, and I had to write my own loaders.
The final point, though, is that you can tell the linker whether the resulting object should be a shared library or not. Only then will the linker generate the (OS-specific) structures and symbols that denote imports/exports.
Otherwise the linker will just finish its work; the primary difference is that missing symbols will result in an error.
The clean separation between compiler and linker should guarantee that the flags do not matter (outside of performance differences). I would be careful with LTO, though; it has had several problems with mixed compiler settings in the past.
As said, I spent some time investigating this and read several docs about ELF and dynamic loaders. You will find no explicit mention of linking PIC with non-PIC anywhere, but the linking process really doesn't care about the compiler settings for the inputs; valid code will stay valid code.
If you want to link non-PIC code into a shared library (PIC), the linker will quit if absolute relocations are encountered (which is very likely).
If you want to link any code into a program, you are only limited by what the final program can deal with. On an OS supporting PIC you can use anything; otherwise the linker might complain about missing symbols or unsupported sections/relocation types.
It is almost always possible, but sometimes it requires some tricks.

how to resolve weak symbols at link time (not load time) inside a shared library

I've had another incident of a shared library resolving some of its symbols somewhere other than inside itself.
How can I prevent this?
I'm already using -fvisibility=hidden.
It looks like all template functions are compiled as weak symbols and are only resolved at load time.
I'm already using RTLD_DEEPBIND to avoid this problem -- but purify ignores this option.
It seems the solution to this problem is the objcopy command from GNU binutils.
It allows one to change symbol attributes.
The option to use would potentially be
--localize-symbols=filename
or
--globalize-symbols=filename
Another way is to use the g++ compiler option -fno-weak -- but the g++ man page discourages the use of this option -- I'm not certain why -- potentially certain symbols from the C++ library must end up as weak.
