Reading kernel memory using a module - linux

As part of my project I need to read the kernel to get the memory address of system call table and system call address. Or in effect i need to extract the contents of the system call table and all the system calls.
Till now I use GDB for this purpose. Is there any way so that I could do it using a kernel module. I am new the kernel module programming. Looking for advice from experts here.

Let me first start by saying reading arbitrary kernel memory is tricky business! And there are many ways to do it, which vary in their degree of complexity and flexability.
1) Hard-code the address.
Search for it in your kernel version's System.map file:
# grep sys_call_table /boot/System.map-2.6.18-238.12.1.el5
c06254e0 R sys_call_table
With this, hard-code the address:
unsigned long *syscall_table = (unsigned long *)0xc06254e0;
Then, assuming you #include <linux/syscalls.h>, you can use the __NR_syscall definitions to grab the addresses of those syscalls within the code:
syscall_table[__NR_close]
This is the easiest method, but by far the least flexible. This module will only work on that exact kernel. If you insmod it into a different kernel, you're liable to get a kernel OOPs.
2) Brute-force scan for the table
Have a look at this:
http://memset.wordpress.com/2011/03/18/syscall-hijacking-dynamically-obtain-syscall-table-address-kernel-2-6-x-2/
He uses a method to brute force the kernel memory address range to find the sys_call_stable. As-is, it only works on 32bit (64bit has a different memory address range for the kernel).
This method is somewhat flexible, but may break down the road as the kernel semantics change.
3) Dynamically search System.map load time
You can read your kernel's System.map file when you load the module. I demonstrate this in the tpe-lkm module I wrote. The project is hosted on github.
Have a look at the find_symbol_address_from_file() from this file:
https://github.com/cormander/tpe-lkm/blob/master/symbols.c
Very flexible, as you can find any symbol you want, but reading files from kernel space is a big 'no no'. Don't ask me why, but people are always telling me that. You also run the risk that the System.map it looks at is invalid, and could cause a kernel OOPs. Also, the code is... messy.
4) Use kallsyms_on_each_symbol()
As of around kernel version 2.6.30, the kernel exports kallsyms_on_each_symbol(). We can thank the ksplice folks for that. With this you can't find the sys_call_table (it isn't in there for some reason), but you can find most other symbols.
Very flexible, very stable method of finding addresses of symbols, but somewhat complicated to understand ;)
I demonstrate this in my tpe-lkm project. Have a look at the find_symbol_callback() and find_symbol_address() function in this file:
https://github.com/cormander/tpe-lkm/blob/master/symbols.c

Related

How can we tell an instruction is from application code or library code on Linux x86_64

I wanted to know whether an instruction is from the application itself or from the library code.
I observed some application code/data are located at about 0x000055xxxx while libraries and mmaped regions are by default located at 0x00007fcxxxx. Can I use for example, 0x00007f00...00 as a boundary to tell instruction is from the application itself or from the library?
How can I configure this boundary in Linux kernel?
Updated.
Can I prevent (or detect) a syscall instruction being issued from application code (only allow it to go through libc). Maybe we can do a binary scan, but due to the variable length of instruction, it's hard to prevent unintended syscall instruction.
Do it the other way. You need to learn a lot.
First, read a lot more about operating systems. So read the Operating Systems: Three Easy Pieces textbook.
Then, learn more about ASLR.
Read also Drepper's How to write shared libraries and Levine's Linkers and loaders book.
You want to use pmap(1) and proc(5).
You probably want to parse the /proc/self/maps pseudo-file from inside your program. Or use dladdr(3).
To get some insight, run cat /proc/$$/maps and cat /proc/self/maps in a Linux terminal
I wanted to know whether an instruction is from userspace or from library code.
You are confused: both library code and main executable code are userspace.
On Linux x86_64, you can distinguish kernel addresses from userpsace addresses, because the kernel addresses are in the FFFF8000'00000000 through FFFFFFFF'FFFFFFFF range on current (48-bit) implementations. See canonical form address description here.
I observed some application code/data are located at about 0x000055xxxx while libraries and mmaped regions are by default located at 0x00007fcxxxx. Can I use for example, 0x00007f00...00 as a boundary to tell instruction is from the application itself or from the library?
No, in general you can't. An application can be linked to load anywhere within canonical address space (though most applications aren't).
As Basile Starynkevitch already answered, you'll need to parse /proc/$pid/maps, or know what address the executable is linked to load at (for non-PIE binary).

Source of clock_gettime

I've tried to understand the behavior of the function clock_gettime by looking at the source code of the linux kernel.
I'm currently using a 4.4.0-83-lowlatency but I only could get the 4.4.76 source files (but it should be close enough).
My first issue is that there is several occurrence of the function. I chose pc_clock_gettime which appears to be the closest and the only one handling CLOCK_MONOTONIC_RAW but if I'm wrong, please correct me.
I tracked back the execution flow of the function and came to a mysterious ravb_ptp_gettime64 and ravb_ptp_time_read which is related to the Ethernet driver.
So... If I understand correctly when I ask the system to give me the time, it ask to the Ethernet driver ?
This is the first time I looked into kernel code so I'm not used to it. If someone could give me an explanation of "how" and "why", it would be marvelous.
clock_gettime use a mechanism named vDSO (Virutal Dynamic Shared Object). It's a shared library which is mapped in the user space by the kernel.
vDSO allow the use of syscall frequently without a drawback on performances. So the kernel "puts" time informations into memory which user programm can access. In the end, it won't be a system call but only a simple function call.

Is it possible to force a range of virtual addresses?

I have an Ada program that was written for a specific (embedded, multi-processor, 32-bit) architecture. I'm attempting to use this same code in a simulation on 64-bit RHEL as a shared object (since there are multiple versions and I have a requirement to choose a version at runtime).
The problem I'm having is that there are several places in the code where the people who wrote it (not me...) have used Unchecked_Conversions to convert System.Addresses to 32-bit integers. Not only that, but there are multiple routines with hard-coded memory addresses. I can make minor changes to this code, but completely porting it to x86_64 isn't really an option. There are routines that handle interrupts, CPU task scheduling, etc.
This code has run fine in the past when it was statically-linked into a previous version of the simulation (consisting of Fortran/C/C++). Now, however, the main executable starts, then loads a shared object based on some inputs. This shared object then checks some other inputs and loads the appropriate Ada shared object.
Looking through the code, it's apparent that it should work fine if I can keep the logical memory addresses between 0 and 2,147,483,647 (32-bit signed int). Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code or perhaps make the Ada code "think" that it's addresses are between 0 and 2,147,483,647?
Is there a way to either force the shared object loader to leave space in the lower ranges for the Ada code
The good news is that the loader will leave the lower ranges untouched.
The bad news is that it will not load any shared object there. There is no interface you could use to influence placement of shared objects.
That said, dlopen from memory (which we implemented in our private fork of glibc) would allow you to do that. But that's not available publicly.
Your other possible options are:
if you can fit the entire process into 32-bit address space, then your solution is trivial: just build everything with -m32.
use prelink to relocate the library to desired address. Since that address should almost always be available, the loader is very likely to load the library exactly there.
link the loader with a custom mmap implementation, which detects the library of interest through some kind of side channel, and does mmap syscall with MAP_32BIT set, or
run the program in a ptrace sandbox. Such sandbox can again intercept mmap syscall, and or-in MAP_32BIT when desirable.
or perhaps make the Ada code "think" that it's addresses are between 0 and 2,147,483,647?
I don't see how that's possible. If the library stores an address of a function or a global in a 32-bit memory location, then loads that address and dereferences it ... it's going to get a 32-bit truncated address and a SIGSEGV on dereference.

Using user-space functions like sprintf in the kernel, or not?

I am making a /proc entry for my driver. So, in the read callback function the first argument is the location into which we write the data intended for the user. I searched on how to write the data in it and i could see that everybody is using sprintf for this purpose. I am surprised to see that it works in kernel space. However this should be wrong to use a user space function in kernel space. Also i cant figure out how to write in that location without using any user space function like strcpy, sprintf, etc. I am using kernel version 3.9.10. Please suggest me how i should do this without using sprintf or any other user space function.
Most of the 'normal' user-space functions would make no sense in kernel code, so they are not available in the kernel.
However, some functions like sprintf, strcpy, or memcpy are useful in kernel code, so the kernel implements them (more or less completely) and makes them available for drivers.
See include/linux/kernel.h and string.h.
sprintf is a kernel-space function in Linux. It is totally separate from its user-space namesake and may or may not work identically to it.
Just because a function in user-space exist, it does not mean an identically named function in kernel-space cannot.

pinning a pthread to a single core

I am trying to measure the performance of some library calls. My primary measurement tool is the rdtsc call. After doing some reading I realize that I need to disable preemption and interrupts in order to get the most accurate readings. Can someone help me figure out how to do these? I know that pthreads have a 'set affinity' mechanism. Is that enough to get the job done?
I also read somewhere that I can make calls into the kernel of the sort
preempt_disable()
raw_local_irq_save(...)
Is there any benefit to using one approach over the other? I tried the latter approach and got this error.
error: 'preempt_disable' was not declared in this scope
which can be fixed by including linux/preempt.h but the compiler still complains.
linux/preempt.h: No such file or directory
Obviously I have not done any kernel hacking and I could not find this file on my system anywhere. I am really hoping I wont have to install a new linux kernel. :)
Thanks for your input.
Pinning a pthread to a single CPU can be done using pthread_setaffinity_np
But what you want to achieve at the end is not so simple. I'll explain you why.
preempt.h is part of the Linux Kernel source. Its located here. You need to have kernel sources with you. Anyways, you need to write a kernel module to access it, you cannot use it from user space. Learn how to write a kernel module here. Same is the case with functions preempt_disable and other interrupt disabling kernel functions
Now the point is, pthreads are in user space and your preemption disabling function is in kernel space. How to interact?
Either you need to write a new system call of your own where you do your preemption and interrupt disabling and call it from user space. Or you need to resort to other Kernel-User Space Interfaces like procfs, sysfs, ioctl etc
But I am really skeptical as to how all these will help you to benchmark library functions. You may want to have a look at how performance is typically measured using rdtsc

Resources