I am interested in replacing a system call with a custom one that I will implement in Linux kernel 3.
I read that the sys call table is no longer exposed.
Any ideas?
Any reference to an example like this one http://www.linuxtopia.org/online_books/linux_kernel/linux_kernel_module_programming_2.6/x978.html but for kernel 3 would be appreciated :)
Thank you!
I would recommend using kprobes for this kind of job, you can easily break on any kernel address (or symbol...) and alter the execution path, all of this at runtime, with a kernel module if you need to :)
Kprobes work by dynamically replacing an instruction (e.g. the first instruction of your syscall entry) with a breakpoint (e.g. int3 on x86). Inside the do_int3 handler, a notifier notifies kprobes, which in turn passes execution to your registered function, from which point you can do almost anything.
Very good documentation is given in Documentation/kprobes.txt, as well as a tiny example in samples/kprobes/kprobes_example.c (in this example they break on do_fork to log each fork on the system). It has a very simple API and is very portable nowadays.
Warning: If you need to alter the execution path, make sure your kprobes are not optimized (i.e. a jmp instruction to your handler replaces the instruction you break on, instead of an int3), otherwise you won't be able to alter the execution easily (after the ret of your function, the syscall function will still be executed as usual). If you are only interested in tracing, then this is fine and you can safely ignore this issue.
Writing an LKM would be the better option. What do you mean by replace? Do you want to add a new one?
In gdb, when debugging inside a function, we can use the "finish" command to run to the end of the function.
My question is: how does gdb know the ending position of a function, especially when there are no debugging symbols to match the source code "{}"?
I guess gdb looks for either "leave" or "mov %rbp, %rsp; pop %rbp" on x86 in order to judge whether it has reached the end of a function.
But the problem is,
(1) There are still some extra registers that need to be pushed/popped at the beginning/end of a function call, depending on the source code and the ABI.
(2) The number of registers that need to be pushed/popped is decided during the compilation phase, and I'm afraid this "number" information is not available through the binary executable file.
So, how does gdb determine, where is the end of a function call, so that "finish" command can jump to it?
Thanks!
gdb doesn't try to analyze the machine code. Instead, it unwinds the stack, finds the caller's PC, and sets a temporary breakpoint there. Then it lets the inferior run until the breakpoint is hit.
Due to the way gdb's unwinder is designed, this automatically handles finish from an inlined function as well (though there are still a few special cases in the code due to this).
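To illustrate why no machine-code analysis is needed, here is a minimal sketch in C. It is not gdb's implementation: it only shows that the caller's resume address can be read from the current frame, which is the address "finish" would put its temporary breakpoint on. The function names are hypothetical, and the builtins are GCC/Clang extensions.

```c
#include <stdio.h>

/* Hypothetical helper: reports the address at which execution resumes once
 * this function returns. A debugger gets the same value by unwinding one
 * frame; it never has to look for "leave" or "ret" in the function body. */
__attribute__((noinline))
static void *where_finish_would_stop(void)
{
    /* GCC/Clang builtins: return address of the current frame. */
    return __builtin_extract_return_addr(__builtin_return_address(0));
}

/* Hypothetical caller, present only to create a frame to "finish" out of. */
__attribute__((noinline))
void *demo_caller(void)
{
    void *resume = where_finish_would_stop();
    printf("finish would place its temporary breakpoint at %p\n", resume);
    return resume;
}
```

The printed address lies inside demo_caller, right after the call instruction, regardless of how many registers the callee saves and restores.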
I found a trick to make automated gdb commands "less disturbing": in non-stop mode I switch to the last thread and stop it, execute the requested commands, continue that thread, and then switch back to the main thread.
But this trick won't work if the inferior has only one thread. So I need to create a thread that runs forever in the background; that way I'll be able to attach to it any time I want without having to disturb the inferior.
The only solution that came to my mind was this:
1-)Break at syscall
2-)Allocate some memory with gdb to inject code
3-)Replace syscall with the jmp instruction that points to the allocated memory
4-)pushad and execute the code that "somehow" creates the thread
5-)Replace the jmp with syscall back
6-)popad and jmp back to where syscall located
But this is way too hacky and I still have no idea about the "somehow" part. Is there a more elegant way to do this? Maybe gdb has some tools for it and I'm missing it. If not, how can I do the "somehow" part?
Yes, there is. I found an elegant way to do it, so you don't have to do such hacky stuff. You can inject a thread like this:
1-)Write code that creates a thread, in any compilable language
2-)Compile it to a .so file (the -g option should be passed, gdb will need debug symbols)
3-)Load it into the inferior with the dlopen library function: call dlopen(".so path",int)
4-)Call any function you implemented in the injected .so file by executing call funcname(); gdb will pick it up automatically when you press Tab (that's why you need debug symbols)
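A minimal sketch of what such an injected library could look like, assuming a Linux inferior that uses pthreads. The names inject_helper_thread and helper_main, and the file names in the note below, are made up for illustration.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_t helper_thread;
static int helper_started;

/* Thread body: blocks forever, so it stays alive without using CPU time. */
static void *helper_main(void *arg)
{
    (void)arg;
    for (;;)
        pause();   /* returns only after a signal handler ran; wait again */
    return NULL;
}

/* Hypothetical entry point, meant to be invoked from gdb with
 * "call inject_helper_thread()". Returns 0 on success, otherwise the
 * pthread_create error code. */
int inject_helper_thread(void)
{
    int rc;

    if (helper_started)
        return 0;                      /* already injected, nothing to do */

    rc = pthread_create(&helper_thread, NULL, helper_main, NULL);
    if (rc != 0)
        return rc;

    pthread_detach(helper_thread);     /* nobody will ever join it */
    helper_started = 1;
    return 0;
}
```

It could be built with something like gcc -g -shared -fPIC -o libinject.so inject.c -lpthread and loaded with call dlopen("/path/to/libinject.so", 2), where 2 is the value of RTLD_NOW on glibc. If gdb complains about an unknown return type, casting the call, e.g. call (void*)dlopen(...), may help.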
I know that Kprobes can be used to probe any kernel function. But after going through its documents I realise that it is mostly a kind of passive entity. It simply puts a probe in the middle of an execution sequence.
But what if I want to invoke any kernel function directly, without bothering about the execution sequence?
How can I achieve that?
Updated:
Note: I want to invoke any kernel function inside my kernel module and not from any user space application.
Kernel functions cannot be simply invoked from applications that live in user space. System calls are the only functions in user space that can request kernel services.
To call kernel functions directly, if you are interested in kernel programming, you must implement a kernel module. This is a starting point.
EDIT
As you have specified that you want to call kernel functions from within a module, then there is no problem at all. Just follow the link I posted above for the documentation.
what if I want to invoke any kernel function directly
Not all functions can be called directly.
Consider the following points when calling a kernel function in your case.
A kernel function from a different module can be used only if it is exported using the EXPORT_SYMBOL family of macros.
static functions can't be used directly outside the file that defines them.
Example
Function definition (i2c_smbus_read_byte_data)
http://lxr.free-electrons.com/source/drivers/i2c/i2c-core.c#L2689
Used here
http://lxr.free-electrons.com/source/drivers/i2c/i2c-core.c#L350
Problem Statement
I'm trying to get the address of a running thread's start_routine as passed in the pthread_create() call.
Research so far
It is apparently not in /proc/[tid]/stat or /proc/[tid]/status.
I found that start_routine is a member of struct pthread and gets set by pthread_create.[1]
If I knew the address of this struct, I could read the start_routine address.
I also found td_thr_get_info defined in the debugging library thread_db.h.[2]
It fills a struct with information about the thread, including the start function.[3] But it needs a struct td_thragent as an argument and I don't know how to create it properly.
Links
[1] http://fxr.watson.org/fxr/source/nptl/pthread_create.c?v=GLIBC27;im=excerpts#L455
[2] http://fxr.watson.org/fxr/source/nptl_db/td_thr_get_info.c?v=GLIBC27#L27
[3] See comment, because I'm not allowed to post more than 2 links.
You probably can't, and I could even imagine a very wild scenario where it could not exist at the moment you are querying it.
Let's suppose that the initial thread start routine void*foo_start(void*) is in some dlopen-ed dynamic shared library libfoo.so.
Let's imagine that foo_start makes a tail call to bar, and that the bar function dlclose-s libfoo.so and later calls one of your routines that queries that start address. It is then a wild address in some defunct segment (which has been munmap-ed by the dlclose called by bar)!
So, even if you hack your libc to retrieve the start routine of a thread, that does not make much sense. BTW, you could look into MUSL libc, its src/thread/pthread_create.c file is quite readable.
NB: on some occasions, recent GCC (e.g. 4.8 or 4.9), when asked to optimize a lot (e.g. -O3), is able to generate tail calls from C code.
This is related to: https://stackoverflow.com/a/13413099/1284631
Now, the question is:
Why does the reboot() system call, when called with the LINUX_REBOOT_CMD_HALT parameter (see here: http://lxr.linux.no/linux+v3.6.6/kernel/sys.c#L480), call do_exit(0) after having already called kernel_halt(), given that calling kernel_halt() boils down to calling stop_this_cpu() (see here: http://lxr.linux.no/linux+v3.6.6/arch/x86/kernel/process.c#L519) as part of native_machine_halt() (see here: http://lxr.linux.no/linux+v3.6.6/arch/x86/kernel/reboot.c#L680)?
It seems to me that stop_this_cpu() never returns (it ends with an infinite loop).
So, is do_exit(0) called just in case kernel_halt() doesn't do its job and returns? Why not call panic() directly instead, then?
Some ideas:
It may be that kernel_halt() refuses to actually halt for a legitimate reason, though I can't think of any.
kernel_halt() may be designed to also be called by a hypervisor or something at a higher or equivalent level than the kernel (custom SMI code maybe?)
Perhaps the kernel_halt() function returns early, "scheduling" the halt, and the actual halt takes place some time later on some hardware. I remember reading about performing an ATX power off in DOS in assembly: you would issue the outb instruction to initiate the power off, but you'd have to have some nops, an endless loop, or a hlt right afterward, as the actual power off could happen some cycles later.
The calling process may wish to handle failure to reboot some other way than a kernel panic.