Is it possible to hook a function call with kprobes?

According to https://docs.kernel.org/trace/kprobes.html it is possible to set the instruction pointer within a kprobe's pre_handler function.
Since kprobes can probe into a running kernel code, it can change the register set, including instruction pointer. This operation requires maximum care, such as keeping the stack frame, recovering the execution path etc. Since it operates on a running kernel and needs deep knowledge of computer architecture and concurrent computing, you can easily shoot your foot.
If you change the instruction pointer (and set up other related registers) in pre_handler, you must return !0 so that kprobes stops single stepping and just returns to the given address. This also means post_handler should not be called anymore.
The same type of question was asked at https://linux-kernel.vger.kernel.narkive.com/et7AyFPm/kprobe-pre-handler-change-return-ip and, from that thread, it appears that if the current kprobe is "cleaned up" and the pre_handler sets the new instruction pointer and then returns 1, execution can continue in a function other than the probed one.
I may be doing things wrong, but here is my kprobes pre_handler function:
int handler_pre(struct kprobe *kp, struct pt_regs *regs) {
    regs->ip = (unsigned long)mock_function;
    reset_current_kprobe();
    preempt_enable_no_resched();
    return 1;
}
First off, when I compile my module I get the following warning:
WARNING: "per_cpu__current_kprobe" undefined!
If I try to add the line:
EXPORT_PER_CPU_SYMBOL(current_kprobe);
after I define the kprobe, I still get the undefined-symbol warning above. Removing the reset_current_kprobe call removes the warning and allows me to insert the module but, as you may have guessed, it completely crashes the kernel. Since the kernel crashes, I am unable to figure out what may be going wrong.
My understanding is that kprobes replace the first instruction at a probed address with a breakpoint instruction which triggers the pre_handler. So by the time the pre_handler is reached, a stack frame for the intended function shouldn't have been created. In my mind this removes the possibility that I could be somehow messing up the stack but I could be completely wrong.
Does anyone have any insight as to how I could go about fixing this issue or what I am doing wrong?

Related

Advantage of kprobes over kretprobes

Both kprobes and kretprobes allow you to put a probe on a particular instruction address in the kernel.
If you register a kprobe, the pre_handler gets executed before the actual function and the post_handler after the actual function.
With kretprobes, the entry_handler executes before the actual function and the ret_handler executes after it, with access to the return value of the function call.
So, what is the advantage of using kprobes over kretprobes, given that kretprobes have the features of kprobes plus the return value of the function?
A kprobe can be placed on any instruction, not only at the start of a kernel function (if kprobes are allowed in the given kernel code, of course).
The handlers of a kprobe run before and after the instruction.
Kretprobes only make sense for probing function entries and exits. The handlers of a kretprobe run on entry to a function and at its exit, rather than before and after some instruction, like kprobe handlers do.
Besides, if you don't need to run your code at the function exit, kprobes might be a better choice than kretprobes for probing functions (although Ftrace might be even better). Kretprobes meddle with the return address of the function on the stack to get the handler executed. If the function crashes or dumps the backtrace for some other reason, the backtrace may include the addresses of kretprobe internals rather than the real return addresses, which may be confusing.
https://www.kernel.org/doc/Documentation/kprobes.txt

Can I block a new process execution using Kprobe?

Kprobe has a pre-handler function vaguely documented as follows:
User's pre-handler (kp->pre_handler)::
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int pre_handler(struct kprobe *p, struct pt_regs *regs);
Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
I was wondering if one can use this function (or any other Kprobe feature) to prevent a process from being executed or forked.
As documented in the kernel documentation, you can change the execution path by changing the appropriate register (e.g., IP register in x86):
Changing Execution Path
-----------------------
Since kprobes can probe into a running kernel code, it can change the
register set, including instruction pointer. This operation requires
maximum care, such as keeping the stack frame, recovering the execution
path etc. Since it operates on a running kernel and needs deep knowledge
of computer architecture and concurrent computing, you can easily shoot
your foot.
If you change the instruction pointer (and set up other related
registers) in pre_handler, you must return !0 so that kprobes stops
single stepping and just returns to the given address.
This also means post_handler should not be called anymore.
Note that this operation may be harder on some architectures which use
TOC (Table of Contents) for function call, since you have to setup a new
TOC for your function in your module, and recover the old one after
returning from it.
So you might be able to block a process' execution by jumping over some code. I wouldn't recommend it; you're more likely to cause a kernel crash than to succeed in stopping the execution of a new process.
seccomp-bpf is probably better suited for your use case. This StackOverflow answer gives you all the information you need to leverage seccomp-bpf.

How to generate a kernel oops or panic crash in Linux kernel code?

How can I generate a kernel oops or crash in kernel code? Is there a function for that?
The usual way to crash the kernel is by using the BUG() macro. There's also the WARN() macro, which dumps the stack trace to the console while the kernel keeps running.
http://kernelnewbies.org/FAQ/BUG
What happens after the kernel hits a BUG() macro (which eventually results in an internal trap) or some similar error condition (like a null pointer dereference) depends on the setting of the panic_on_oops global variable. If it's set to 0, the kernel will try to keep running (with whatever awful consequences). If it's set to 1, the kernel will enter the panic state and halt.
If you want to crash the kernel from user space, you've got a handy <SysRq> + <c> key combo (or, alternatively, echo c > /proc/sysrq-trigger). It's worth looking at the handler implementation for this action (http://code.metager.de/source/xref/linux/stable/drivers/tty/sysrq.c#134):
static void sysrq_handle_crash(int key)
{
    char *killer = NULL;
    panic_on_oops = 1; /* force panic */
    wmb();
    *killer = 1;
}
The handler sets the global flag to make the kernel panic on traps, then writes through a null pointer.
panic() function
The kernel also has a panic() function if you want to do it from inside kernel module code:
#include <linux/kernel.h>
panic("my message");
It is defined at kernel/panic.c.
Here is a minimal runnable example.
Related threads:
Using assertion in the Linux kernel
https://unix.stackexchange.com/questions/66197/how-to-cause-kernel-panic-with-a-single-command

Is it safe to call dlclose(NULL)?

I experience a crash when I pass a null pointer to dlclose.
Should I check for null before calling dlclose?
POSIX says nothing about this:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/dlclose.html
Is it undefined behaviour or a bug in dlclose implementation?
This is tricky. POSIX states that
if handle does not refer to an open object, dlclose() returns a non-zero value
from which you could infer that it should detect, for an arbitrary pointer, whether that pointer refers to an open object. The Linux/Glibc version apparently does no such check, so you'll want to check for NULL yourself.
[Aside, the Linux manpage isn't very helpful either. It's quite implicit about libdl functions' behavior, deferring to POSIX without very clearly claiming conformance.]
It also says nothing about accepting NULL without crashing. We can assume from your test that it doesn't do an explicit NULL check, and it does need to use the pointer somehow to perform the closing action … so there you have it.
If dlclose is supposed to follow the malloc/free convention (1), it is a bug. If it follows the fopen/fclose convention (2), it is not. So if there is a bug, it is in the standard, because it lacks a convention for dealing with zombies.
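Checking for NULL yourself can be wrapped in a small helper. The following is only a sketch: close_library is a hypothetical name, not part of libdl, and it assumes you want free(NULL)-style behaviour where closing a null handle is a harmless no-op. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical wrapper (not a libdl function).
 * Closes *handle if it is non-NULL and resets it to NULL so that a
 * second call is a no-op. Returns 0 if there was nothing to close or
 * dlclose() succeeded, otherwise dlclose()'s non-zero result.
 */
int close_library(void **handle)
{
    int rc;

    if (handle == NULL || *handle == NULL)
        return 0;               /* nothing to close, never reaches dlclose() */

    rc = dlclose(*handle);
    *handle = NULL;             /* the handle must not be reused either way */
    return rc;
}
```

A destructor or cleanup path can then call close_library(&h) unconditionally, including after the handle has been moved away or was never opened.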
Convention (1) works well with C++11 move semantics
Convention (2) leaves more responsibility to the caller. In particular, a dtor must explicitly check for null if a move operation has been done.
I think this is something that should be revised for an upcoming POSIX revision in order to avoid confusion.
Update
I found from this answer https://stackoverflow.com/a/6277781/877329, and then from reading man pthread_join, that you can call pthread_join with an invalid tid, supporting the malloc/free convention. Another issue I have found with the dynamic loader interface is that it does not use the standard error handling system, but has its own dlerror function.

After Segfault: Is there a way, to check if pointer is still valid?

I plan to create a logging/tracing mechanism which writes the addresses (const char*) of string literals to a ring buffer. These strings are in the read-only data segment and are created by the preprocessor with __FUNCTION__ or __FILE__.
The question: is it possible, after a segfault, to analyze this ring buffer and check whether all its pointers are valid? By "valid" I mean that they point to a mapped memory area and dereferencing them won't cause a segmentation fault.
I'm working with Linux 2.6.3x and GCC 4.4.x.
Best regards,
Charly
The usual way to check if dereferencing a memory region will cause a segfault is to use read() or write(). E.g., to check if the first 128 bytes pointed to by ptr are safely readable:
int fd[2];
if (pipe(fd) >= 0) {
    if (write(fd[1], ptr, 128) > 0) {
        /* OK */
    } else {
        /* not OK */
    }
    close(fd[0]);
    close(fd[1]);
}
(write() will fail with errno set to EFAULT rather than raising a signal if the region isn't readable).
If you want to test more than PIPE_BUF bytes at a time, you'll need to read and discard from the reading side of the pipe.
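A minimal sketch of the chunked variant, assuming a hypothetical helper name region_is_readable: it writes at most PIPE_BUF bytes at a time and drains the pipe after each write so that the next write cannot block.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if [ptr, ptr + len) is readable,
 * 0 if the kernel reports EFAULT for some part of it, and -1 if the
 * check itself could not be carried out.
 */
int region_is_readable(const void *ptr, size_t len)
{
    const char *p = ptr;
    char sink[PIPE_BUF];
    int fd[2];
    int result = 1;

    if (pipe(fd) < 0)
        return -1;

    while (len > 0) {
        size_t chunk = len < PIPE_BUF ? len : PIPE_BUF;
        ssize_t n = write(fd[1], p, chunk);

        if (n <= 0) {
            /* EFAULT means the bytes at p are not readable */
            result = (n < 0 && errno == EFAULT) ? 0 : -1;
            break;
        }
        /* read and discard what was written so the pipe never fills up */
        if (read(fd[0], sink, (size_t)n) < 0) {
            result = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    close(fd[0]);
    close(fd[1]);
    return result;
}
```

Note that pipe() is not something you would normally want to call for the first time from inside a SIGSEGV handler; creating the pipe up front and reusing it is probably the safer design.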
I think the approach you are looking for is to handle a SIGSEGV signal via sigaction.
#include <signal.h>
#include <string.h>

void handler(int sig, siginfo_t *info, void *uap)
{
    /* Peek at parameters here... I'm not sure exactly what you want to do. */
}

/* Set up the signal handler... */
struct sigaction sa, old_sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = handler;
sa.sa_flags = SA_SIGINFO;
if (sigaction(SIGSEGV, &sa, &old_sa))
{
    /* TODO: handle error */
}
Note however that catching SIGSEGV on your own process is kind of weird. The process is likely in a bad state that can't be recovered from. The actions you'll be able to do in response to it may be limited, and it's most likely the process being killed is a good thing.
If you want it to be a bit more stable, there is the sigaltstack call which lets you specify an alternate stack buffer, so that if you've completely hosed your stack you can still handle SIGSEGV. To use this you need to set SA_ONSTACK in sa.sa_flags above.
If you want to respond to SEGV from the safety of another process (thereby isolating yourself from the poorly behaving segfaulting code and making it so that you won't crash while inspecting it), you can use ptrace. This interface is complex, has many non-portable parts, and is mainly used to write debuggers. But you can do great things with it, like read and write the process's memory and registers, and alter its execution.
Of course if the stack or other memory that you rely upon has been corrupted then there could be problems, but that is true for any code.
Assuming that there is no problem with the stack or other memory that you rely upon, and assuming that you do not call any functions like malloc() that are not async-signal safe, and assuming that you do not attempt to return from your signal handler, then there should be no problem reading or writing your buffer from within your signal handler.
If you are trying to test whether a particular address is valid, you could use a system call such as mincore() and check for an error result.
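A rough sketch of the mincore() check on Linux, assuming a hypothetical helper name address_is_mapped. mincore() requires a page-aligned address and fails with ENOMEM when the range contains unmapped memory. Keep in mind that "mapped" is weaker than "readable": a PROT_NONE mapping would still pass this test.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the page containing addr is
 * mapped, 0 if it is not, and -1 on any other error.
 */
int address_is_mapped(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec;      /* one status byte per page queried */
    uintptr_t base;

    if (page <= 0)
        return -1;

    /* round down to the start of the page, as mincore() requires */
    base = (uintptr_t)addr & ~((uintptr_t)page - 1);

    if (mincore((void *)base, (size_t)page, &vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}
```

For a string pointer taken from the ring buffer you would still need to decide how many bytes to trust, since only the page containing the first byte is checked here.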
Once you've received a segfault, all bets are off. The pointers may be valid or they may have been corrupted. You just don't know. You may be able to compare them with valid values or the pointer to the ring buffer itself may have been corrupted. In which case, you'll probably get garbage.
