How to determine the memory address range of Linux kernel objects

I want to inspect the memory that the functions from kernel/bpf/verifier.c are loaded into.
After compilation to verifier.o, the object is linked into the kernel. /proc/kallsyms lists only non-static functions, but I want the addresses of all functions defined in that C file. If KASLR is turned off, they should lie sequentially in kernel address space, right?
If so, is there a way to determine the address range?
Thanks

The solution for my particular problem is to compile the kernel with CONFIG_DEBUG_INFO=y and obtain the address range from the large vmlinux binary with readelf -Ws vmlinux.
However, the kernel needs to be booted with nokaslr, or KASLR needs to be turned off in the config at compile time.

Related

How to reserve a particular range of virtual memory from a Linux process

x86 32-bit system:
Is there a way to reserve a particular range of virtual address space in a process memory map to stop ld.so (the dynamic linker) from loading any shared objects into that range?
I want at least two 1G regions of virtual memory to map two 1G huge pages; however, ld.so loads a shared library in the middle of that range, so I can't map the 1G huge pages.
The compiler can't do this job, and linker scripts can't either. ld.so is loaded into the executable by the loader, then ld.so loads the other shared libraries; however, ld.so itself ends up in the middle of the space I want to map.
The entry points of ld.so and libc.so are at a high address, which can't be changed for our application.
Entry point address: 0x46c38810
Thanks,
Jiangtao
ld.so is loaded into the executable by the loader,
No: ld.so is the loader, and it is loaded into the process by the kernel.
You do have a few choices:
The easiest solution is to link the binary fully statically. Note that on Linux such a binary can still dlopen other shared libraries, although this is not a well-supported or well-tested thing to do.
A harder solution is to build your own patched ld.so and make your application use it (via the -Wl,--dynamic-linker=... flag).
If you don't want to do that, rtldi may help (it runs before ld.so).
The entry point addresses in the shared libs are edited by prelink.
prelink avoids conflicts between the load addresses of shared libraries and optimizes and speeds up the run-time loader. By default it's on in our system.
prelink is a program that modifies ELF shared libraries and ELF dynamically linked binaries, assigning a unique virtual address space slot to each library, so that the time the dynamic linker needs to perform relocations at startup decreases significantly. Due to fewer relocations, run-time memory consumption decreases as well.
/usr/sbin/prelink -avmR
This prelinks all binaries found in the directories specified in /etc/prelink.conf, plus all their dependent libraries, assigning each library a unique virtual address space slot.
By disabling prelink, the library's load address is no longer in the middle of the range, so we can get another 1G of memory mmapped.

Is it possible to force a range of virtual addresses?

I have an Ada program that was written for a specific (embedded, multi-processor, 32-bit) architecture. I'm attempting to use this same code in a simulation on 64-bit RHEL as a shared object (since there are multiple versions and I have a requirement to choose a version at runtime).
The problem I'm having is that there are several places in the code where the people who wrote it (not me...) have used Unchecked_Conversions to convert System.Addresses to 32-bit integers. Not only that, but there are multiple routines with hard-coded memory addresses. I can make minor changes to this code, but completely porting it to x86_64 isn't really an option. There are routines that handle interrupts, CPU task scheduling, etc.
This code has run fine in the past when it was statically-linked into a previous version of the simulation (consisting of Fortran/C/C++). Now, however, the main executable starts, then loads a shared object based on some inputs. This shared object then checks some other inputs and loads the appropriate Ada shared object.
Looking through the code, it's apparent that it should work fine if I can keep the logical memory addresses between 0 and 2,147,483,647 (32-bit signed int). Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code or perhaps make the Ada code "think" that its addresses are between 0 and 2,147,483,647?
Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code
The good news is that the loader will leave the lower ranges untouched.
The bad news is that it will not load any shared object there. There is no interface you could use to influence placement of shared objects.
That said, dlopen from memory (which we implemented in our private fork of glibc) would allow you to do that. But that's not available publicly.
Your other possible options are:
If you can fit the entire process into the 32-bit address space, then your solution is trivial: just build everything with -m32.
Use prelink to relocate the library to the desired address. Since that address should almost always be available, the loader is very likely to load the library exactly there.
Link the loader with a custom mmap implementation, which detects the library of interest through some kind of side channel and does the mmap syscall with MAP_32BIT set (see the sketch after this list), or
Run the program in a ptrace sandbox. Such a sandbox can again intercept the mmap syscall and OR in MAP_32BIT when desirable.
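For illustration, here is a minimal sketch (my own, not from any existing loader) of what such an interposed mmap could look like on x86-64 Linux; the library name "libada_sim.so" and the /proc-based check are illustrative assumptions standing in for whatever side channel you choose:

/* Interposed mmap: force one specific library below 2 GiB with MAP_32BIT. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative side channel: resolve the fd back to a path and match it. */
static int fd_is_target_library(int fd)
{
    char link[64], path[4096];
    ssize_t n;

    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    n = readlink(link, path, sizeof(path) - 1);
    if (n <= 0)
        return 0;
    path[n] = '\0';
    return strstr(path, "libada_sim.so") != NULL;   /* hypothetical name */
}

void *mmap_for_loader(void *addr, size_t len, int prot, int flags,
                      int fd, off_t off)
{
    if (fd >= 0 && fd_is_target_library(fd))
        flags |= MAP_32BIT;       /* ask the kernel for an address < 2 GiB */
    return mmap(addr, len, prot, flags, fd, off);
}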
or perhaps make the Ada code "think" that its addresses are between 0 and 2,147,483,647?
I don't see how that's possible. If the library stores an address of a function or a global in a 32-bit memory location, then loads that address and dereferences it ... it's going to get a 32-bit truncated address and a SIGSEGV on dereference.
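A contrived C illustration of that failure mode (my own example, not the Ada code in question):

/* Squeeze a 64-bit pointer through a 32-bit integer and dereference the
   rebuilt pointer. On a 64-bit build where the object sits above 4 GiB,
   the value is truncated and the dereference typically faults. */
#include <stdint.h>
#include <stdio.h>

static int value = 42;

int main(void)
{
    uint32_t stored = (uint32_t)(uintptr_t)&value;  /* high bits are lost   */
    int *back = (int *)(uintptr_t)stored;           /* possibly bogus       */
    printf("original %p, rebuilt %p\n", (void *)&value, (void *)back);
    printf("%d\n", *back);                          /* SIGSEGV if truncated */
    return 0;
}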

How does GDB determine the base addresses of shared libraries? [internals of the info sharedlibrary command]

I am trying to understand the internal workings of GDB commands. After doing some initial homework on ELF, shared libraries, and address space randomization, I attempted to understand how GDB makes sense of the executable and the core file together.
solib.c contains the implementation of shared library processing. I am especially interested in the info sharedlibrary command.
The comment in solib.c goes like this:
/* Relocate the section binding addresses as recorded in the shared
object's file by the base address to which the object was actually
mapped. */
ops->relocate_section_addresses (so, p);
I could not understand much from this comment. Can somebody explain to me in plain English how relocation happens? I.e., every time an executable loads a shared object, it gets loaded at some location, say X, and all the symbols inside the shared library end up at a fixed offset, say X+Y, with some size Z. My question is: how does GDB do the same range of address relocation, so that it matches the load segments in the core file? How does it take that hint from the executable?
how does GDB do the same range of address relocation, so that it matches the load segments in the core file
In other words, how does GDB find the relocation X?
The answer depends on the operating system.
On Linux, GDB finds the _DYNAMIC[] array of struct Elf{32,64}_Dyn entries in the core file, which contains an element with .d_tag == DT_DEBUG.
The .d_ptr in that element points to a struct r_debug (see /usr/include/link.h), which points to a linked list of struct link_map entries that describe all loaded shared libraries and their relocations in l_addr.
The relevant file in GDB is solib-svr4.c.
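The same walk described above for a core file can be sketched in a live process: find DT_DEBUG in the executable's own _DYNAMIC[] array, follow d_ptr to struct r_debug, and iterate the link_map list. This is only an illustration of the data structures involved, not GDB's code:

/* Find DT_DEBUG in _DYNAMIC[], follow it to r_debug, then print each
   loaded object's relocation (l_addr) and name from the link_map list. */
#include <link.h>
#include <stdio.h>

extern ElfW(Dyn) _DYNAMIC[];            /* provided by the linker */

int main(void)
{
    struct r_debug *rd = NULL;
    struct link_map *m;

    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++)
        if (d->d_tag == DT_DEBUG)
            rd = (struct r_debug *)d->d_un.d_ptr;

    if (rd == NULL) {
        puts("no DT_DEBUG entry (statically linked?)");
        return 1;
    }

    for (m = rd->r_map; m != NULL; m = m->l_next)
        printf("l_addr = 0x%lx  %s\n", (unsigned long)m->l_addr,
               m->l_name[0] ? m->l_name : "(main executable)");
    return 0;
}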
EDIT:
I see that there are no .dynamic sections in the core file.
There shouldn't be. There is a .dynamic section in the executable and a matching LOAD segment in the core (the segment will "cover" the .dynamic section and have the contents that were there at runtime).

Reading kernel memory using a module

As part of my project I need to read kernel memory to get the address of the system call table and the addresses of the system calls. In effect, I need to extract the contents of the system call table and all the system calls.
Until now I have used GDB for this purpose. Is there any way I could do it using a kernel module? I am new to kernel module programming. Looking for advice from experts here.
Let me start by saying that reading arbitrary kernel memory is tricky business! And there are many ways to do it, which vary in their degree of complexity and flexibility.
1) Hard-code the address.
Search for it in your kernel version's System.map file:
# grep sys_call_table /boot/System.map-2.6.18-238.12.1.el5
c06254e0 R sys_call_table
With this, hard-code the address:
unsigned long *syscall_table = (unsigned long *)0xc06254e0;
Then, assuming you #include <linux/syscalls.h>, you can use the __NR_* definitions to grab the addresses of those syscalls in code:
syscall_table[__NR_close]
This is the easiest method, but by far the least flexible. This module will only work on that exact kernel. If you insmod it into a different kernel, you're liable to get a kernel OOPs.
2) Brute-force scan for the table
Have a look at this:
http://memset.wordpress.com/2011/03/18/syscall-hijacking-dynamically-obtain-syscall-table-address-kernel-2-6-x-2/
He uses a method that brute-forces the kernel memory address range to find the sys_call_table. As-is, it only works on 32-bit (64-bit has a different memory address range for the kernel).
This method is somewhat flexible, but may break down the road as the kernel semantics change.
3) Dynamically search System.map at load time
You can read your kernel's System.map file when you load the module. I demonstrate this in the tpe-lkm module I wrote. The project is hosted on github.
Have a look at find_symbol_address_from_file() in this file:
https://github.com/cormander/tpe-lkm/blob/master/symbols.c
Very flexible, as you can find any symbol you want, but reading files from kernel space is a big 'no no'. Don't ask me why, but people are always telling me that. You also run the risk that the System.map it looks at is invalid, and could cause a kernel OOPs. Also, the code is... messy.
4) Use kallsyms_on_each_symbol()
As of around kernel version 2.6.30, the kernel exports kallsyms_on_each_symbol(). We can thank the ksplice folks for that. With this you can't find the sys_call_table (it isn't in there for some reason), but you can find most other symbols.
Very flexible, very stable method of finding addresses of symbols, but somewhat complicated to understand ;)
I demonstrate this in my tpe-lkm project. Have a look at the find_symbol_callback() and find_symbol_address() functions in this file:
https://github.com/cormander/tpe-lkm/blob/master/symbols.c
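For completeness, here is a minimal sketch of method 4 (my own illustration, not code from tpe-lkm), using the callback signature exported by kernels of that era (~2.6.30 through 5.x; newer kernels changed and unexported it); "vfs_read" is just an arbitrary example symbol:

/* Look up one symbol's address with kallsyms_on_each_symbol(). */
#include <linux/module.h>
#include <linux/kallsyms.h>
#include <linux/string.h>

struct sym_lookup {
    const char *name;
    unsigned long addr;
};

static int find_symbol_cb(void *data, const char *name,
                          struct module *mod, unsigned long addr)
{
    struct sym_lookup *s = data;

    if (strcmp(name, s->name) == 0) {
        s->addr = addr;
        return 1;               /* non-zero stops the iteration */
    }
    return 0;
}

static int __init symfind_init(void)
{
    struct sym_lookup s = { .name = "vfs_read", .addr = 0 };

    kallsyms_on_each_symbol(find_symbol_cb, &s);
    printk(KERN_INFO "symfind: %s is at 0x%lx\n", s.name, s.addr);
    return 0;
}

static void __exit symfind_exit(void)
{
}

module_init(symfind_init);
module_exit(symfind_exit);
MODULE_LICENSE("GPL");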

How is a shared library shared by different processes?

I read in some documents that when a shared library is compiled with the -fPIC argument,
the .text segment of the .so is shared at the dynamic-linking stage after a process forks
(i.e., the processes map the .so to the same physical address).
I am interested in who accomplishes this (the kernel or ld.so) and how.
Maybe I should trace the code, but I don't know where to start.
Nevertheless, I tried to verify the statement.
I decided to check the address of a function like printf, which lives in the libc.so that every C program links against.
I got the virtual address of printf in the process and need to find the physical address. I tried to write a kernel module, pass the address value into the kernel, and call virt_to_phys, but that did not work because virt_to_phys only works for kmalloc'd addresses.
So, a process page-table walk might be the way to find the virtual-to-physical mapping. Is there any way to do such a walk, or another way that fits this verification experiment?
Thanks in advance!
The dynamic loader uses mmap(2) with MAP_PRIVATE and appropriate permissions. You can see exactly what it does by running a command under strace -e file,mmap. For instance:
strace -e file,mmap ls
All the magic comes from mmap(2). mmap(2) creates mappings in the calling process, they are usually backed either by a file or by swap (anonymous mappings). In a file-backed mapping, MAP_PRIVATE means that writes to the memory don't update the file, and cause that page to be backed by swap from that point on (copy-on-write).
The dynamic loader gets the info it needs from ELF's program headers, which you can view with:
readelf -l libfoo.so
From these, the dynamic loader determines what to map as code, read-only data, data, and bss (a zero-filled segment with zero size in the file, non-zero size in memory, and a name matched in crypticness only by Lisp's car and cdr).
So, in fact, code and also data are shared, until a write causes copy-on-write. That is why marking constant data as constant is a potentially important space optimization (see the DSO howto).
You can get more info from the mmap(2) manpage and from Documentation/nommu-mmap.txt (you want the MMU case; no-MMU is for embedded devices like ADSL routers and the Nintendo DS).
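To watch this from inside a program, a small sketch using glibc's dl_iterate_phdr() prints where the loader actually placed each object's PT_LOAD segments (the same headers readelf -l shows on disk):

/* Print each loaded object's base address and its PT_LOAD segments. */
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

static int show_object(struct dl_phdr_info *info, size_t size, void *data)
{
    printf("%s loaded at base 0x%lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main executable)",
           (unsigned long)info->dlpi_addr);

    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_LOAD)
            printf("  PT_LOAD at 0x%lx  %c%c%c\n",
                   (unsigned long)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr),
                   info->dlpi_phdr[i].p_flags & PF_R ? 'r' : '-',
                   info->dlpi_phdr[i].p_flags & PF_W ? 'w' : '-',
                   info->dlpi_phdr[i].p_flags & PF_X ? 'x' : '-');
    return 0;
}

int main(void)
{
    dl_iterate_phdr(show_object, NULL);
    return 0;
}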
Shared libraries are just a particular use of mapped files.
The address which a file is mapped at in a process's address space has nothing to do with whether it is shared or not.
Pages can be shared even if they are mapped at different addresses.
To find out if pages are being shared, do the following:
Find the address that the file(s) are mapped at by examining /proc/pid/maps
There is a tool that extracts data from /proc/pid/pagemap; find it and use it. It tells you exactly which page(s) of a mapping are present and which physical location they are at (a sketch of the same idea follows this list).
If two processes have a page mapped at the same physical address, it is, of course, shared.
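Since the question was about verifying this for printf, here is a minimal sketch of that /proc/self/pagemap lookup (run it as root; recent kernels zero the PFN field for unprivileged readers). Running it in two processes lets you compare the physical addresses:

/* Translate the virtual address of printf to a physical address using
   /proc/self/pagemap (bit 63 = page present, bits 0-54 = page frame number). */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    unsigned char *vaddr = (unsigned char *)printf; /* somewhere in libc .text */
    long page = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    int fd;

    (void)*(volatile unsigned char *)vaddr;         /* make sure the page is in */

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* one 64-bit entry per virtual page */
    if (pread(fd, &entry, sizeof(entry),
              ((uintptr_t)vaddr / page) * sizeof(entry)) != sizeof(entry)) {
        perror("pread");
        return 1;
    }
    close(fd);

    if (entry & (1ULL << 63))
        printf("printf: virt %p -> phys 0x%llx\n", (void *)vaddr,
               (unsigned long long)((entry & ((1ULL << 55) - 1)) * page
                                    + (uintptr_t)vaddr % page));
    else
        printf("printf: page not present\n");
    return 0;
}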
