Finding the load address of a shared library in Linux

At runtime I need to print out an address, and then find which function that address is part of. The functions are in a shared library so are not at a fixed address. My map file obviously just shows the relative offsets for each shared library func. Is it possible at runtime to query where a library has been loaded, so that I can subtract that value from my address to get the correct map file offset?
Currently I'm using a slightly hacky approach whereby I also print out the address of one function in the library, then find that function in the map file to figure out where the load address must be. I would rather have a generic method that doesn't require naming a reference function.
(GDB is not available in my setup).
Thanks.

On a recent Linux, you can use dl_iterate_phdr to find out the load addresses of the shared libraries.
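A minimal sketch of that approach, assuming glibc (or another libc that provides dl_iterate_phdr in <link.h>). The names load_info, query, match_object and locate_address are made up for illustration; only dl_iterate_phdr and struct dl_phdr_info come from the library.

```c
#define _GNU_SOURCE           /* glibc exposes dl_iterate_phdr under this macro */
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record, not part of any library API. */
struct load_info {
    uintptr_t   base;    /* load address of the containing object */
    uintptr_t   offset;  /* addr - base, comparable with the map file */
    const char *path;    /* object path; "" usually means the main program */
};

/* Hypothetical state passed through the callback's void pointer. */
struct query {
    uintptr_t         addr;
    struct load_info *out;
};

/* Called once per loaded object; returning non-zero stops the iteration. */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct query *q = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;                       /* only loadable segments are mapped */
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;
        if (q->addr >= start && q->addr - start < ph->p_memsz) {
            q->out->base   = info->dlpi_addr;
            q->out->offset = q->addr - info->dlpi_addr;
            q->out->path   = info->dlpi_name;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 and fills *out if some loaded object contains addr, else 0. */
static int locate_address(const void *addr, struct load_info *out)
{
    struct query q = { (uintptr_t)addr, out };
    return dl_iterate_phdr(match_object, &q);
}
```

Calling locate_address(some_pointer, &li) and printing li.path and li.offset gives a value that can be looked up directly in the map file, without naming a reference function.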

dladdr does this, end of. :-)
More comprehensively, dladdr will take an address and work out which library and symbol it corresponds to, and then it'll give you the name of the library, the name of the symbol, and the base addresses of each. Personally I think that's nifty; it's also making my current debugging job a lot easier.
Hope this helps!
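For illustration, here is a small sketch of a dladdr lookup, assuming glibc. resolve_address is a hypothetical helper name; dladdr and the Dl_info fields (dli_fname, dli_fbase, dli_sname, dli_saddr) are the real interface from <dlfcn.h>.

```c
#define _GNU_SOURCE   /* dladdr and Dl_info are extensions in <dlfcn.h> */
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats "library+0xoffset" into buf, followed by " (symbol+0xoffset)"
 * when dladdr also found a symbol. Returns 1 on success, 0 if the address
 * does not belong to any loaded object. */
static int resolve_address(const void *addr, char *buf, size_t len)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return 0;

    uintptr_t a = (uintptr_t)addr;
    int n = snprintf(buf, len, "%s+0x%lx", info.dli_fname,
                     (unsigned long)(a - (uintptr_t)info.dli_fbase));

    /* dli_sname is NULL when no exported symbol covers the address. */
    if (info.dli_sname != NULL && n > 0 && (size_t)n < len)
        snprintf(buf + n, len - (size_t)n, " (%s+0x%lx)", info.dli_sname,
                 (unsigned long)(a - (uintptr_t)info.dli_saddr));
    return 1;
}
```

Two caveats: dladdr only sees symbols in the dynamic symbol table, so static functions are not named, and on glibc older than 2.34 the program has to be linked with -ldl.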

Take a look at the /proc/[PID]/maps file.
It lists the address ranges at which each library is mapped in the process's address space.
If you want to reach the executable portion, use readelf on your library and find the offset of the .text section.
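A process can read its own mappings from /proc/self/maps. The sketch below uses a hypothetical helper, find_load_base, and assumes that the mapping with file offset 0 marks the load base, which holds for typical shared libraries but is not guaranteed for every ELF layout.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scans /proc/self/maps for the first mapping whose pathname contains
 * `needle` and whose file offset is 0, and returns its start address.
 * Returns 0 if nothing matched or the file could not be opened. */
static uintptr_t find_load_base(const char *needle)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return 0;

    char line[4096];
    uintptr_t base = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, offset;
        int path_pos = 0;
        /* Line format: start-end perms offset dev inode pathname */
        if (sscanf(line, "%lx-%*lx %*s %lx %*s %*s %n",
                   &start, &offset, &path_pos) < 2 || path_pos == 0)
            continue;
        if (offset == 0 && strstr(line + path_pos, needle) != NULL) {
            base = (uintptr_t)start;
            break;
        }
    }
    fclose(f);
    return base;
}
```

For example, find_load_base("libfoo.so") (a made-up library name) would return the address to subtract from a runtime address before looking it up in that library's map file.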

Check the System V ABI, chapter 5. For the lazy ones out there, here is the standard way to do this for systems supporting the ELF binary format:
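#include <link.h>       /* struct r_debug, struct link_map, _DYNAMIC */
#include <string.h>     /* strcmp */
#include <sys/types.h>  /* off_t */

/* libname must match l_name exactly, i.e. the path the loader recorded. */
off_t load_offset = 0;
for (Elf64_Dyn *dyn = _DYNAMIC; dyn->d_tag != DT_NULL; ++dyn) {
    if (dyn->d_tag == DT_DEBUG) {
        /* DT_DEBUG points at the loader's r_debug structure. */
        struct r_debug *r_debug = (struct r_debug *) dyn->d_un.d_ptr;
        struct link_map *link_map = r_debug->r_map;
        /* Walk the list of loaded objects until the name matches. */
        while (link_map) {
            if (strcmp(link_map->l_name, libname) == 0) {
                load_offset = (off_t)link_map->l_addr;
                break;
            }
            link_map = link_map->l_next;
        }
        break;
    }
}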
This does not rely on any GNU extension.
On GNU systems, the macro ElfW(Dyn) will return either Elf64_Dyn or Elf32_Dyn which is handy.

GNU backtrace library?
I wouldn't rely on hacky approaches for something like this. I don't think there's any guarantee that the contents of libraries are loaded into contiguous memory.

Related

I am playing with processes etc., but I don't know how to add "client.dll" to a hex value

In Cheat Engine you can do "client.dll"+00D3AC5C, and in ReClass <client.dll>+00D3AC5C.
How do I do the same in Python? I am using ReadWriteMemory, but I will soon switch to something more complex. Can you please tell me how to do it with RWM or with something else?

According to the source code of that library, there's seemingly no way to get the base address of a process.
However, you can get the base address by bypassing the library and doing it yourself via this method. Once you have the hex value of the base address, simply add an offset to it, then use RWM's read() or get_pointer().

Locating and Editing Dynamic Symbol Table of Loaded Program?

My goal is explained in this question HERE
Is it possible to locate the address of a symbol's entry in the dynamic symbol table loaded into a program?
If we can locate it, can we edit it somehow? For example if the app made the call to a function named original_func then the control should actually come to my hook_func and from there I call the original_func.
Update:
Some code according to the answer by 'Employed Russian':
extern Elf32_Dyn _DYNAMIC[];
int i=0;
uint32_t DST_base_addr;
Elf32_Dyn *dyn;
for (dyn = _DYNAMIC; dyn->d_tag != DT_NULL; ++dyn)
{
    if(dyn->d_tag==DT_SYMTAB)
    {
        DST_base_addr=dyn->d_un.d_ptr;
        LOGE("Base address of dynamic symbol table is: 0x%x", DST_base_addr);
        break;
    }
}
Output: 0x148
1- Not sure what that 0x148 means. It's definitely not an absolute address.
2- Also, where can I find good listing of these useful pre-defined variables such as _DYNAMIC[] _GLOBAL_OFFSET_TABLE_ etc.? I wasn't very aware of such variables even when I went through ELF notes here and there.

Is it possible to locate the address of a symbol's entry in the dynamic symbol table loaded into a program?
Yes, it's pretty easy: iterate over elements of the _DYNAMIC[] array, until you find an element with .d_tag == DT_SYMTAB. The .d_un.d_ptr of that entry will point to the dynamic symbol table in memory.
To find a specific symbol, you will also need to refer to DT_STRTAB.
If we can locate it, can we edit it somehow?
Sure: it's just a memory location. You may need to mprotect it to be writable, but once you do, you can modify it to your heart's content.
However, most modifications will either have no effect, or cause your program to crash later.
For example if the app made the call to a function named original_func then the control should actually come to my hook_func and from there I call the original_func.
It's pretty difficult to achieve your stated goal using this particular method, and much easier methods exist.
Perhaps you are looking for this?

Get the end address of Linux kernel function on run-time

I am trying to get the boundaries of a kernel function (a system call, for example). If I understand correctly, I can get the start address of the function I am interested in by reading /proc/kallsyms or System.map, but I don't know how to get its end address.
As you may know, /proc/kallsyms lets us view the symbol table of the Linux kernel, so we can see the start address of all exported symbols. Can we use the start address of the next function to calculate the end address of the previous one? If not, could you suggest another way?

Generally, executables store only the start address of a function, as it is all that is required to call the function. You will have to infer the end address, rather than simply looking it up.
You could try to find the start address of the subsequent function, but that wouldn't always work either. Imagine the following:
void func_a() {
    // do something
}
static void helper_function() {
    // do something else
}
void func_b() {
    // ...
    helper_function();
    // ...
}
You could get the addresses of func_a and func_b, but helper_function would not show up, because nothing needs to link to it. If you tried to use func_b as the end of func_a (assuming that the order in the compiled code is equivalent to the order in the source code, which is not guaranteed), you would end up accidentally including code that you didn't need to include, and might not find code that you need to find when inlining other functions into func_b.
So, how do we find this information? Well, if you think about it - the information does exist - all of the paths within func_a will eventually terminate (in a loop, return statement, tail call, etc), probably before helper_function begins.
You would need to parse out the code of func_a and build up a map of all of the possible code paths within it. Of course, you would need to do this anyway to inline other functions into it - so it shouldn't be too much harder to simply not care about the end address of the function.
One final note: in this example, you would have trouble finding helper_function in order to know to inline it, because the symbol wouldn't show up in kallsyms. The solution here is that you can track the call instructions in individual functions to determine what hidden functions exist that you didn't know about otherwise.
TL;DR: You can only find the end address by parsing the compiled code. You have to parse this anyway, so just do it once.
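For comparison, the "start of the next symbol" heuristic from the question can be sketched as follows. As explained above, the result is only an upper bound, not a guaranteed end address. struct ksym and upper_bound_for are hypothetical names, and the table is assumed to have been filled from /proc/kallsyms or System.map beforehand.

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed symbol-table entry (hypothetical layout for this sketch). */
struct ksym {
    uint64_t    addr;   /* start address of the symbol */
    const char *name;   /* symbol name */
};

/* Returns the smallest symbol address strictly greater than `start`,
 * or 0 if no such symbol exists. The table does not need to be sorted.
 * Hidden helpers, padding and data between symbols all fall inside the
 * returned range, so treat it as an upper bound only. */
static uint64_t upper_bound_for(const struct ksym *syms, size_t n,
                                uint64_t start)
{
    uint64_t best = 0;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].addr > start && (best == 0 || syms[i].addr < best))
            best = syms[i].addr;
    }
    return best;
}
```

With a table holding func_a at 0x1000 and func_b at 0x1400 (made-up addresses), upper_bound_for(table, 2, 0x1000) returns 0x1400 even if an unlisted helper_function starts at 0x1200.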

Using the C preprocessor to determine current scope?

I am developing an application in C / Objective-C (No C++ please, I already have a solution there), and I came across an interesting use case.
Because clang does not support nested functions, my original approach will not work:
#define CREATE_STATIC_VAR(Type, Name, Dflt) static Type Name; __attribute__((constructor)) void static_ ## Type ## _ ## Name ## _init_var(void) { /* loading code here */ }
This code would compile fine with GCC, but because clang doesn't support nested functions, I get a compile error:
Expected ';' at end of declaration.
So, I found a solution that works for Clang on variables inside a function:
#define CREATE_STATIC_VAR_LOCAL(Type, Name, Dflt) static Type Name; ^{ /* loading code here */ }(); // anonymous block usage
However, I was wondering if there was a way to leverage macro concatenation to choose the appropriate one for the situation, something like:
#define CREATE_STATIC_VAR_GLOBAL(Type, Name, Dflt) static Type Name; __attribute__((constructor)) void static_ ## Type ## _ ## Name ## _init_var(void) { /* loading code here */ }
#define CREATE_STATIC_VAR_LOCAL(Type, Name, Dflt) static Type Name; ^{ /* loading code here */ }(); // anonymous block usage
#define SCOPE_CHOOSER LOCAL || GLOBAL
#define CREATE_STATIC_VAR(Type, Name, Dflt) CREATE_STATIC_VAR_ ## SCOPE_CHOOSER(Type, Name, Dflt)
Obviously, the ending implementation doesn't have to be exactly that, but something similar will suffice.
I have attempted to use __builtin_constant_p with __func__, but because __func__ is not a compile-time constant, that wasn't working.
I have also tried to use __builtin_choose_expr, but that doesn't appear to work at the global scope.
Is there something else I am missing in the docs? Seems like this should be something fairly easy to do, and yet, I cannot seem to figure it out.
Note: I am aware that I could simply type CREATE_STATIC_VAR_GLOBAL or CREATE_STATIC_VAR_LOCAL instead of messing with macro concatenation, but this is me attempting to push the limits of the compiler. I am also aware that I could use C++ and get this over with right away, but that's not my goal here.

#define SCOPE_CHOOSER LOCAL || GLOBAL
#define CREATE_STATIC_VAR(Type, Name, Dflt) CREATE_STATIC_VAR_ ## SCOPE_CHOOSER(Type, Name, Dflt)
The biggest difficulty here is that the C preprocessor works by textual substitution, so even if you figured out how to get SCOPE_CHOOSER to do what you want, you'd end up with a macro expansion that looked something like
CREATE_STATIC_VAR_LOCAL || GLOBAL(Type, Name, Dflt);
There's no way to get the preprocessor to "constant-fold" macro expansions during substitution; the only time things are "folded" is when they appear in #if expressions. So your only hope (modulo slight handwaving) is to find a single construction that will work both inside and outside of a function.
Can you explain more about the ultimate goal here? I don't think you can load the variable's initial value with __attribute__((constructor)), but maybe there's a way to load the initial value the first time the function body is entered... or register all the addresses of these variables into a global list at compile-time and have a single __attribute__((constructor)) function that traverses that list... or some mishmash of those approaches. I don't have any specific ideas in mind, but maybe if you give more information something will emerge.
EDIT: I don't think this helps you either, since it's not a preprocessor trick, but here is a constant-expression that will evaluate to 0 at function scope and 1 at global scope.
#define AT_GLOBAL_SCOPE __builtin_types_compatible_p(const char (*)[1], __typeof__(&__func__))
However, notice that I said "evaluate" and not "expand". These constructs are compile-time, not preprocessing-time.

Inspired by @Qxuuplusone's answer.
The suggested macro for AT_GLOBAL_SCOPE does indeed work (in GCC), but causes a compiler warning (and I am pretty sure it cannot be silenced by Diagnostic Pragma because it's created by pedwarn with a test here).
Unless you turn on -w you will always see these warnings and have, in the back of your mind, a horrible feeling that you probably shouldn't be doing whatever it is that you are doing.
Fortunately, there is a solution that can silence these lingering doubts.
In the Other Builtins section, there is __builtin_FUNCTION with this very interesting description (emphasis mine):
This function is the equivalent of the __FUNCTION__ symbol and returns an address constant pointing to the name of the function from which the built-in was invoked, or the empty string if the invocation is not at function scope.
It turns out, at least in version 8.3 of GCC, you can do this:
#define AT_GLOBAL_SCOPE (__builtin_FUNCTION()[0] == '\0')
This still probably won't answer the original question, but until GCC decides this too will cause a warning (it kind of seems like it's intentionally designed not to though), it lets me continue doing questionable things using macros without anything to warn me that it's a bad idea.

Magic numbers of the Linux reboot() system call

The Linux Programming Interface has an exercise in Chapter 3 that goes like this:
When using the Linux-specific reboot() system call to reboot the system, the second argument, magic2, must be specified as one of a set of magic numbers (e.g., LINUX_REBOOT_MAGIC2). What is the significance of these numbers? (Converting them to hexadecimal provides a clue.)
The man page tells us magic2 can be one of LINUX_REBOOT_MAGIC2 (672274793), LINUX_REBOOT_MAGIC2A (85072278), LINUX_REBOOT_MAGIC2B (369367448), or LINUX_REBOOT_MAGIC2C (537993216). I failed to decipher their meaning in hex. I also looked at /usr/include/linux/reboot.h, which didn't give any helpful comment either.
I then searched in the kernel's source code for sys_reboot's definition. All I found was a declaration in a header file.
Therefore, my first question is, what is the significance of these numbers? My second question is, where's sys_reboot's definition, and how did you find it?
EDIT: I found the definition in kernel/sys.c. I only grepped for sys_reboot, and forgot to grep for the MAGIC numbers. I figured the definition must be hidden behind some macro trick, so I looked at the System.map file under /boot, and found it next to ctrl_alt_del. I then grepped for that symbol, which led me to the correct file. If I had compiled the kernel from source code, I could try to find which object file defined the symbol, and go from there.

Just a guess, but those numbers look more interesting in hex:
672274793 = 0x28121969
85072278 = 0x05121996
369367448 = 0x16041998
537993216 = 0x20112000
Developers' or developers' children's birthdays?
Regarding finding the syscall implementation, I did a git grep -n LINUX_REBOOT_MAGIC2 and found the definition in kernel/sys.c. The symbol sys_reboot is generated by the SYSCALL_DEFINE4(reboot, ... gubbins, I suspect.
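The hex-to-date reading can be checked mechanically. The sketch below assumes the hex digits are laid out as DDMMYYYY, which is the interpretation the answers here suggest; magic_date, hex_digits_as_decimal and magic_to_date are names invented for this example.

```c
#include <stdint.h>

/* Hypothetical record for the decoded value. */
struct magic_date {
    unsigned day;
    unsigned month;
    unsigned year;
};

/* Reads the lowest `digits` hex digits of v as if they were decimal
 * digits, e.g. 0x1969 with digits == 4 becomes 1969. */
static unsigned hex_digits_as_decimal(uint32_t v, int digits)
{
    unsigned out = 0;
    for (int shift = (digits - 1) * 4; shift >= 0; shift -= 4)
        out = out * 10 + ((v >> shift) & 0xF);
    return out;
}

/* Splits a magic number into day, month and year, assuming DDMMYYYY. */
static struct magic_date magic_to_date(uint32_t magic)
{
    struct magic_date d;
    d.day   = hex_digits_as_decimal(magic >> 24, 2);
    d.month = hex_digits_as_decimal((magic >> 16) & 0xFF, 2);
    d.year  = hex_digits_as_decimal(magic & 0xFFFF, 4);
    return d;
}
```

For instance, magic_to_date(672274793u) decodes 0x28121969 as day 28, month 12, year 1969.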

These are the birthdays of Linus Torvalds (the developer of the Linux kernel and the Git version control system) and his 3 daughters. They serve as the magic numbers for rebooting the system.
http://en.wikipedia.org/wiki/Linus_Torvalds
