Advantage of kprobes over kretprobes - linux

Both kprobes and kretprobes allow you to place a probe at a particular address in the kernel.
If you register a kprobe, the pre_handler gets executed before the actual function and the post_handler after the actual function.
With kretprobes, the entry_handler executes before the actual function and the ret_handler executes after it, with access to the return value of the function call.
So what is the advantage of using kprobes over kretprobes, given that kretprobes have the features of kprobes plus the return value of the function?

A kprobe can be placed on any instruction, not only at the start of a kernel function (if kprobes are allowed in the given kernel code, of course).
The handlers of a kprobe run before and after the instruction.
Kretprobes only make sense for probing function entries and exits. The handlers of a kretprobe run on entry to a function and at its exit, rather than before and after some instruction, like kprobe handlers do.
Besides, if you don't need to run your code at the function exit, kprobes might be a better choice than kretprobes for probing functions (although Ftrace might be even better). Kretprobes meddle with the return address of the function on the stack to get the handler executed. If the function crashes or dumps the backtrace for some other reason, the backtrace may include the addresses of kretprobe internals rather than the real return addresses, which may be confusing.
https://www.kernel.org/doc/Documentation/kprobes.txt
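To make the difference concrete, here is a minimal sketch of a module that registers both probe types on the same function. It assumes a kernel built with kprobes support and that the symbol "kernel_clone" exists and is probeable on the running kernel; the demo_* names are made up for illustration. It is only a sketch, not a reference implementation.

```c
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/ptrace.h>

/* Assumption: this symbol exists and may be probed on your kernel. */
#define PROBED_SYMBOL "kernel_clone"

/* kprobe: runs just before the probed instruction executes. */
static int demo_pre(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("kprobe: hit %pS\n", p->addr);
	return 0;	/* 0 = let kprobes single-step the probed instruction */
}

static struct kprobe demo_kp = {
	.symbol_name = PROBED_SYMBOL,
	.offset = 0,	/* a nonzero offset (on an instruction boundary) probes inside the function */
	.pre_handler = demo_pre,
};

/* kretprobe: runs when the probed function returns. */
static int demo_ret(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	pr_info("kretprobe: returned %lu\n",
		(unsigned long)regs_return_value(regs));
	return 0;
}

static struct kretprobe demo_krp = {
	.kp.symbol_name = PROBED_SYMBOL,
	.handler = demo_ret,
	.maxactive = 20,	/* number of concurrent invocations tracked */
};

static int __init demo_init(void)
{
	int ret = register_kprobe(&demo_kp);

	if (ret < 0)
		return ret;
	ret = register_kretprobe(&demo_krp);
	if (ret < 0)
		unregister_kprobe(&demo_kp);
	return ret;
}

static void __exit demo_exit(void)
{
	unregister_kretprobe(&demo_krp);
	unregister_kprobe(&demo_kp);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

Note that only the kprobe has an offset you can usefully change: the kretprobe is tied to the function's entry and return.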

Related

Is it possible to hook a function call with kprobes?

According to https://docs.kernel.org/trace/kprobes.html it is possible to set the instruction pointer within a kprobe's pre_handler function.
Since kprobes can probe into a running kernel code, it can change the register set, including instruction pointer. This operation requires maximum care, such as keeping the stack frame, recovering the execution path etc. Since it operates on a running kernel and needs deep knowledge of computer architecture and concurrent computing, you can easily shoot your foot.
If you change the instruction pointer (and set up other related registers) in pre_handler, you must return !0 so that kprobes stops single stepping and just returns to the given address. This also means post_handler should not be called anymore.
The same type of question was asked here: https://linux-kernel.vger.kernel.narkive.com/et7AyFPm/kprobe-pre-handler-change-return-ip. It appears that if the current kprobe is "cleaned up" and the pre_handler sets the new instruction pointer and then returns 1, execution can continue in a different function instead of at the probed instruction.
I may be doing things wrong, but here is my kprobes pre_handler function:
int handler_pre(struct kprobe *kp, struct pt_regs *regs) {
    regs->ip = (unsigned long)mock_function;
    reset_current_kprobe();
    preempt_enable_no_resched();
    return 1;
}
First off, when I compile my module I get the error:
WARNING: "per_cpu__current_kprobe" undefined!
If I try to add the line:
EXPORT_PER_CPU_SYMBOL(current_kprobe);
After I define the kprobe, I still get the undefined warning above. Removing the reset_current_kprobe call removes the compiler warning and allows me to insert the module but, as you may have guessed, it completely crashes the kernel. Since the kernel crashes, I am unable to figure out what may be going wrong.
My understanding is that kprobes replace the first instruction at a probed address with a breakpoint instruction which triggers the pre_handler. So by the time the pre_handler is reached, a stack frame for the intended function shouldn't have been created. In my mind this removes the possibility that I could be somehow messing up the stack but I could be completely wrong.
Does anyone have any insight as to how I could go about fixing this issue or what I am doing wrong?

ARM64 calling conventions: caller or callee saved when there are fewer than 8 args?

This is regarding the Unix/Linux ABI for ARM 64-bit.
If one function is using registers x0-x7 because it has received 8 parameters, and it then calls another function that also plans to use, let's say, x6-x7, is it expected that the caller will save those or that the callee will save them?
The Unix aarch64 ABI is unclear about this.
The function signature doesn't affect the calling convention.
All possible arg-passing registers are always call-clobbered whether the function actually takes that many or not, so the caller should keep any "precious" values in other registers, or memory.
This is generally a good design. For example, a function taking 2 args might want to call another function that takes more args, and wouldn't want to waste instructions saving/restoring its caller's x2 just so it could use x2 to pass an arg.
Also, with your hypothetical design, variadic functions like printf would have to restore all the arg-passing registers they might have touched, in case they were called with fewer. (Easier to do that than to count args and only restore the ones past the end of the arg list. Only 8 registers is only four ldp load-pair instructions.)
Plus, what about floating point registers? Most functions don't take FP args, but you don't want them to waste instructions saving/restoring them if they want to internally call a math function.

Can I block a new process execution using Kprobe?

Kprobe has a pre-handler function, vaguely documented as follows:
User's pre-handler (kp->pre_handler)::
#include <linux/kprobes.h>
#include <linux/ptrace.h>
int pre_handler(struct kprobe *p, struct pt_regs *regs);
Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
I was wondering if one can use this function (or any other Kprobe feature) to prevent a process from being executed or forked.
As documented in the kernel documentation, you can change the execution path by changing the appropriate register (e.g., IP register in x86):
Changing Execution Path
-----------------------
Since kprobes can probe into a running kernel code, it can change the
register set, including instruction pointer. This operation requires
maximum care, such as keeping the stack frame, recovering the execution
path etc. Since it operates on a running kernel and needs deep knowledge
of computer architecture and concurrent computing, you can easily shoot
your foot.
If you change the instruction pointer (and set up other related
registers) in pre_handler, you must return !0 so that kprobes stops
single stepping and just returns to the given address.
This also means post_handler should not be called anymore.
Note that this operation may be harder on some architectures which use
TOC (Table of Contents) for function call, since you have to setup a new
TOC for your function in your module, and recover the old one after
returning from it.
So you might be able to block a process' execution by jumping over some code. I wouldn't recommend it; you're more likely to cause a kernel crash than to succeed in stopping the execution of a new process.
seccomp-bpf is probably better suited for your use case. This StackOverflow answer gives you all the information you need to leverage seccomp-bpf.
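As a rough illustration of the seccomp-bpf route, the sketch below installs a filter that makes one syscall fail with EPERM and then checks, in a forked child, that execve() is refused. The function names deny_syscall and demo_execve_blocked are hypothetical. It is deliberately simplified: it does not check the architecture field of seccomp_data, it does not cover execveat, and the filter only applies to the calling process and its descendants, not to the whole system.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: make syscall number nr fail with EPERM from now on. */
int deny_syscall(long nr)
{
    assert(nr >= 0);
    struct sock_filter filter[] = {
        /* Load the syscall number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it matches nr, fall through to the ERRNO return; else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 0 if a child that installed the filter got EPERM from execve(). */
int demo_execve_blocked(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *argv[] = { "true", NULL };
        char *envp[] = { NULL };

        if (deny_syscall(SYS_execve) != 0)
            _exit(2);
        execve("/bin/true", argv, envp);   /* only returns on failure */
        _exit(errno == EPERM ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling demo_execve_blocked() from a test program should return 0 on a Linux system where seccomp filters are available. Unlike redirecting the instruction pointer from a kprobe, a rejected syscall here simply returns an error to the caller.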

what happens after read is called for a Linux socket

What actually happens after calling read:
n = read(fd, buf, try_read_size);
here fd is a TCP socket descriptor. buf is the buffer. try_read_size is the number of bytes that the program tries to read.
I guess this eventually invokes a system call into the kernel. But could anyone provide some details, for example the implementation in the glibc or kernel source?
From a high-level perspective, this is what happens:
1. A wrapper function provided by glibc is called.
2. The wrapper function puts the parameters passed on the stack into registers and sets the syscall number in the register dedicated for that purpose (e.g. EAX on x86).
3. The wrapper function executes a trap or equivalent instruction (e.g. SYSENTER).
4. The CPU switches to ring0, and the trap handler is invoked.
5. The trap handler checks the syscall number for validity and looks it up in a jump table to kernel functions.
6. The respective kernel function checks whether arguments are valid (e.g. the range buf to buf+try_read_size refers to accessible memory pages, fd is really a file descriptor). If something is amiss, a negative error code (e.g. -EFAULT) is generated, the cpu is switched back to user mode and the call returns to the wrapper.
7. Another function is called depending on the file descriptor's type (in your case a socket, but one could read from a block device or a proc entry or something more exotic).
8. The socket's input buffer is checked:
   - If there is some data in the buffer, min(available, try_read_size) is copied to buf, the amount is written to the return code register (EAX on x86), the cpu is switched back to user mode and the call returns to the wrapper.
   - If the input buffer is empty:
     - If the connection has been closed, zero is written to the return code register, the cpu is switched back to user mode and the call returns to the wrapper.
     - If the connection has not been closed:
       - A negative error code (-EAGAIN) is written to the return code register if the socket is nonblocking, the cpu is switched back to user mode and the call returns to the wrapper.
       - The process is suspended if the socket is not non-blocking.
9. The wrapper function checks whether the return value is negative (error):
   - If positive or zero, it returns the value.
   - If negative, it sets errno to the negated value (a positive error is reported) and returns -1.
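From the caller's side, the outcomes listed above can be observed with a short sketch like the following. To stay self-contained it uses a non-blocking AF_UNIX socketpair as a stand-in for a TCP connection (an assumption made for the demo; the return-value conventions of read() are the same). The names classify_read and demo_read_outcomes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum read_outcome { READ_DATA, READ_EOF, READ_WOULD_BLOCK, READ_ERROR };

/* Issue one read() and classify what the wrapper reported. */
enum read_outcome classify_read(int fd, char *buf, size_t try_read_size,
                                ssize_t *n_out)
{
    assert(n_out != NULL);
    ssize_t n = read(fd, buf, try_read_size);

    *n_out = n;
    if (n > 0)
        return READ_DATA;            /* min(available, try_read_size) bytes */
    if (n == 0)
        return READ_EOF;             /* peer closed, buffer empty */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;     /* non-blocking socket, nothing to read */
    return READ_ERROR;
}

/* Returns 0 if the three outcomes are observed in the expected order. */
int demo_read_outcomes(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Make the reading end non-blocking so an empty buffer yields EAGAIN. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags != -1 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != -1) {
        /* 1. Empty buffer, peer still open. */
        ok = classify_read(sv[0], buf, sizeof buf, &n) == READ_WOULD_BLOCK;

        /* 2. Five bytes available, sixteen requested: five are returned. */
        ok = ok && write(sv[1], "hello", 5) == 5;
        ok = ok && classify_read(sv[0], buf, sizeof buf, &n) == READ_DATA
                && n == 5;

        /* 3. Peer closed and buffer drained: read() returns 0. */
        close(sv[1]);
        sv[1] = -1;
        ok = ok && classify_read(sv[0], buf, sizeof buf, &n) == READ_EOF;
    }

    if (sv[1] != -1)
        close(sv[1]);
    close(sv[0]);
    return ok ? 0 : -1;
}
```

With a blocking socket, the first step would suspend the process instead of returning, which is why the demo switches the reading end to non-blocking mode.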

Using sigprocmask to implement locks

I'm implementing user threads on Linux (kernel 2.4), and I'm using ualarm to invoke context switches between the threads.
We have a requirement that our thread library's functions should be uninterruptible by the context switching mechanism for threads, so I looked into blocking signals and learned that using sigprocmask is the standard way to do this.
However, it looks like I need to do quite a lot to implement this:
sigset_t new_set, old_set;
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
This blocks SIGALRM, but it takes 3 function invocations to do so! A lot can happen in the time it takes for these functions to run, including the signal being sent.
The best idea I had to mitigate this was temporarily disabling ualarm, like this:
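sigset_t new_set, old_set;
useconds_t remaining = ualarm(0, 0);  /* cancel the alarm; returns the microseconds that were left */
sigemptyset(&new_set);
sigaddset(&new_set, SIGALRM);
sigprocmask(SIG_BLOCK, &new_set, &old_set);
ualarm(remaining, 0);  /* re-arm the alarm with the remaining time */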
Which is fine except that this feels verbose. Isn't there a better way to do this?
As WhirlWind points out, the signal set functions are quite lightweight and may even be implemented as macros; and you can also just keep around a signal set that contains only SIGALRM and re-use that.
Regardless, it doesn't actually matter if the signal happens during the sigaddset() or sigemptyset() calls - the new_set and old_set variables are (presumably) thread-local, and the critical section isn't entered until after sigprocmask() returns.
You'll find that sigemptyset() and sigaddset() in signal.h are just macros or inline functions, so they execute inline in your code. Just use a stack variable when you call them.
However, why don't you do this in a single-threaded startup section of your code? I also doubt the function call to sigprocmask will be atomic. Blocking signals does not mean your code will be uninterruptible.
By the way, I'm not sure how you're using ualarm, but if you're not catching or ignoring SIGALRM when you call it the first time, you'll probably kill your process.
sigprocmask() is the only function that goes to kernel level and actually changes the signal masking status. The other functions are just manipulation functions for setting up the mask before calling sigprocmask or passing the set to another signal related function.
