Access Linux kernel symbols that are not exported via EXPORT_SYMBOL*

Access Linux kernel symbols that are not exported via EXPORT_SYMBOL* - linux

We have a need to access kernel global vars in net/ipv4/af_inet.c that are not exported explicitly from a loadable kernel module. We are using 2.6.18 kernel currently.
kallsyms_lookup_name doesn't appear to be available anymore (not exported)
__symbol_get returns NULL (upon further reading, symbol_get/__symbol_get looks through the kernel and existing modules' symbol tables that contains only exported symbol, and it is there to make sure the module from which a symbol is exported is actually loaded)
Is there anyway to access symbols that are not exported from a kernel module?
After doing a lot of reading and looking at answers people provided, it appears that it would be very hard to find one method across many kernel versions since the kAPI changes significantly over time.

You can use the method you mentioned before by getting it from /proc/kallsyms or just use the address given in the System.map (which is the same thing), it may seem hackish but this is how I've seen it done before (never really had to do it myself). Either this or you can build your own custom kernel where you actually do EXPORT_SYMBOL on whatever it is you want exported but this is not as portable.

If performance is not a big concern, you can traverse the whole list of symbols with kallsyms_on_each_symbol() (exported by the kernel for GPL'd modules) and check the names to get the ones you need. I would not recommend doing so unless there is no other choice though.
If you would like to go this way, here is an example from one of our projects. See the usage of kallsyms_on_each_symbol() as well as the code of symbol_walk_callback(), the other parts are irrelevant to this question.

Related

Interpose statically linked binaries

There's a well-known technique for interposing dynamically linked binaries: creating a shared library and and using LD_PRELOAD variable. But it doesn't work for statically-linked binaries.
One way is to write a static library that interpose the functions and link it with the application at compile time. But this isn't practical because re-compiling isn't always possible (think of third-party binaries, libraries, etc).
So I am wondering if there's a way to interpose statically linked binaries in the same LD_PRELOAD works for dynamically linked binaries i.e., with no code changes or re-compilation of existing binaries.
I am only interested in ELF on Linux. So it's not an issue if a potential solution is not "portable".

One way is to write a static library that interpose the functions and link it with the application at compile time.
One difficulty with such an interposer is that it can't easily call the original function (since it has the same name).
The linker --wrap=<symbol> option can help here.
But this isn't practical because re-compiling
Re-compiling is not necessary here, only re-linking.
isn't always possible (think of third-party binaries, libraries, etc).
Third-party libraries work fine (relinking), but binaries are trickier.
It is still possible to do using displaced execution technique, but the implementation is quite tricky to get right.

I'll assume you want to interpose symbols in main executable which came from a static library which is equivalent to interposing a symbol defined in executable. The question thus reduces to whether it's possible to intercept a function defined in executable.
This is not possible (EDIT: at least not without a lot of work - see comments to this answer) for two reasons:
by default symbols defined in executable are not exported so not accessible to dynamic linker (you can alter this via -export-dynamic or export lists but this has unpleasant performance or maintenance side effects)
even if you export necessary symbols, ELF requires executable's dynamic symtab to be always searched first during symbol resolution (see section 1.5.4 "Lookup Scope" in dsohowto); symtab of LD_PRELOAD-ed library will always follow that of executable and thus won't have a chance to intercept the symbols

What you are looking for is called binary instrumentation (e.g., using Dyninst or ptrace). The idea is you write a mutator program that attaches to (or statically rewrites) your original program (called mutatee) and inserts code of your choice at specific points in the mutatee. The main challenge usually revolves around finding those insertion points using the API provided by the instrumentation engine. In your case, since you are mainly looking for static symbols, this can be quite challenging and would likely require heuristics if the mutatee is stripped of non-dynamic symbols.

How to resolve current process symbols in LLVM MCJIT based JIT?

I'm creating a simple MCJIT based JIT (implementing Kaleidoscope tutorial in Rust to be more precise). I'm using SectionMemoryManager::getSymbolAddress for symbol resolution. It sees symbols from libraries (e.g. sin function), but fails to resolve functions from my program (global, visible with nm, marked there by T). Is this the expected behavior? Or should it be some error in my code?
If this is the expected behavior, how should I properly resolve symbols from the current process? I'm adding symbols from the process with LLVMAddSymbol now, so resolution starts to work. Is this the right solution?
For those, who'll read my code. The problem with symbols is not related with the name mangling, as when I tried to make SectionMemoryManager::getSymbolAddress work, I used no_mangle directive, so they were named properly.

Thanks to Lang Hames, he has answered my question in other place. I cite the answer here for the case if somebody will look at the same problem as me:
In answer to your question: SectionMemoryManager::getSymbolAddress eventually (through the RTDyldMemoryManager base class) makes a call to llvm::sys::DynamicLibrary::SearchForAddressOfSymbol, which searches all previously loaded dynamic libraries for the symbol. You can call to llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr) as part of your JIT initialisation (before any calls to getSymbolAddress) to import the program's symbols into DynamicLibrary's symbol tables.
If you really want to expose all functions in your program to the JIT'd code this is a good way to go. If you only want to expose a limited set of runtime functions you can put them in a shared library and just load that.

How can I convert dynamically linked application to statically one?

I have an application, say gedit, which is dynamically linked and I don't have the source code. So I can not compile it as I like. what I want to do is to make it statically linked and move it to the system which doesn't have the necessary libraries to run that application. So is it possible to do it and how?

It is theoretically possible. You basically have to do the same job that the dynamic linker does, with some modifications, i.e.
dump all sections from the original file
resolve symbols
locate libraries
instead of loading them into memory, assemble them into a "virtual image"
resolve internal links
dump the whole thing in a independent file.
So objdump, readelf, and objcopy will be some of your friends.
The task is not easy and the result will be neither automatic, nor (probably) stable.
You may want to check out this code by someone else that tried the same, by actually intercepting the dynamic linker (i.e. all steps above, except the last) and dumping the results to disk.
It is based on this tool, so it's anyone's bet whether it works on the newest kernels.
(It probably doesn't - and you need at least to patch it to reflect the new structures. This is my attempt at doing so. Caveat emptor).

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: Isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but should also work without it! Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. Read about it here. As should be clear from that description, if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_GLOBAL);, and the runtime loader will search for the library in system-appropriate locations.

Finding the shared library name to use with dlload

In my open-source project Artha I use libnotify for showing passive desktop notifications to the user.
Instead of statically linking libnotify, a lookup at runtime is made for the shared object (.so) file via dlload, if available on the target machine, Artha exposes the notification feature in it's GUI. On app. start, a call to dlload with filename param as libnotify.so.1 is made and if it returns a non-null pointer, then the feature is exposed.
A recurring problem with this model is that every time the version number of the library is bumped, Artha's code needs to be updated, currently libnotify.so.4 is the latest to entail such an occurance.
Is there a linux system call (irrespective of the distro the app. is running on), which can tell me if a particular library's shared object is available at runtime? I know that there exists the bruteforce option of enumerating the library by going from 1 to say 10, I find the solution ugly and inelegant.
Also, if this can be addressed via autoconf, then that solution is welcome too I.e. at build time, based on the target machine, the configure.h generated should've the right .so name that can be passed to dlload.
P.S.: I think good distros follow the style of creating links to libnotify.so.x so that a programmer can just do dlload("libnotify.so", RTLD_LAZY) and the right version numbered .so is loaded; unfortunately not all distros follow this, including Ubuntu.

The answer is: you don't.
dlopen() is not designed to deal with things like that, and trying to load whichever soversion you find on the system just because it happens to have the symbols you need is not a good way to do it.
Different sonames have different ABIs, and different ABIs means that you may be calling the same exact symbol name that is expecting a different set (or different size) of parameters, which will cause crashes or misbehaviour that are extremely difficult do debug.
You should have a read on how shared object versions work and what an ABI is.
The libfoo.so link is there for the link editor (ld) and is usually installed with the -devel packages for that reason; it might also very well not be a link but rather a text file with a linker script, often times on purpose to avoid exactly what you're trying to do.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string