Handling a SIGSEGV in kernel space

I need help extending the signal-sending functionality in kernel space so that something extra happens when a SIGSEGV is raised.
I have been working with OP-TEE OS (see the image at the bottom). When a program running on the rich OS (Linux, in my case) crashes for some reason, I need to forward the detected crash to the OP-TEE TrustZone. I do not need to forward crash details or a core dump file to the TrustZone, only the fact that the process with pid XXXXXX crashed.
I know that the invalid access is detected by the MMU, which raises an exception; the kernel handles that exception and sends a SIGSEGV signal to the process. I want to extend the kernel module that sends the SIGSEGV signal so that it also forwards this information to the TrustZone, but I do not know which kernel module in Linux I should edit to do that.
For instance, starting with a simple C program:
void foo(int *p) {
    *p = 1;  /* write through a NULL pointer: this raises SIGSEGV */
}
int main(int argc, char **argv) {
    int *p = NULL;
    foo(p);
    return 0;
}
which crashes with a SIGSEGV, my question is:
Which kernel module should I edit in order to send the information to the TrustZone? I think it is not a complex task, because I only have to import the TrustZone library into that kernel module.
OP-TEE ARCHITECTURE

Related

Capturing power-off interrupt for i.MX6UL (linux kernel)

Context
I'm using an i.MX6 (i.MX6ULL) application processor, and want to know in software when the power-off button has been pressed.
Luckily, the IMX6ULL reference manual explains that this should be possible:
Section 10.5: ONOFF Button
The chip supports the use of a button input signal to request main SoC power state changes (i.e. On or Off) from the PMU. The ONOFF logic inside of SNVS_LP allows for connecting directly to a PMIC or other voltage regulator device. The logic takes a button input signal and then outputs a pmic_en_b and set_pwr_off_irq signal. [...] The logic has two different modes of operation (Dumb and Smart mode).
The Dumb PMIC Mode uses pmic_en_b to issue a level signal for on and off. Dumb pmic mode has many different configuration options which include (debounce, off to on time, and max time out).
(Also available in condensed form here on page 18)
Attempt
Therefore, I have built a trivially simple kernel module to try and capture this interrupt:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>
#include <linux/syscalls.h>
#include <linux/interrupt.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("John Doe <j.doe#acme.inc>");
// Forward declaration
irqreturn_t irq_handler (int, void *);
// Number of interrupt to capture
#define INTERRUPT_NO 36
static int __init pwr_ctl_init (void)
{
    pr_err("init()\n");
    // A shared IRQ needs a non-NULL dev_id; the handler address serves as the cookie
    return request_irq(INTERRUPT_NO, irq_handler, IRQF_SHARED, "onoff-button",
                       (void *)irq_handler);
}
static void __exit pwr_ctl_exit (void)
{
    pr_err("exit()\n");
    // free_irq() must receive the same dev_id that was passed to request_irq()
    free_irq(INTERRUPT_NO, (void *)irq_handler);
}
irqreturn_t irq_handler (int irq, void *dev_irq)
{
    pr_err("interrupt!\n");
    return IRQ_HANDLED;
}
module_init(pwr_ctl_init);
module_exit(pwr_ctl_exit);
Problem
However, I cannot find any information about what the number of the interrupt is. When searching on the internet, all I get is this one NXP forum post:
ONOFF button doesn't interrupt
Which hints it should be 36. However, I have found that this isn't the case on my platform. When I check /proc/interrupts 36 is already occupied by 20b4000.ethernet. Because the application manual also mentions that it is generated by the SNVS low power system, I checked the device-tree and found the following information:
snvs_poweroff: snvs-poweroff {
    compatible = "syscon-poweroff";
    regmap = <&snvs>;
    offset = <0x38>;
    value = <0x60>;
    mask = <0x60>;
    status = "disabled";
};
snvs_pwrkey: snvs-powerkey {
    compatible = "fsl,sec-v4.0-pwrkey";
    regmap = <&snvs>;
    interrupts = <GIC_SPI 4 IRQ_TYPE_LEVEL_HIGH>;
    linux,keycode = <KEY_POWER>;
    wakeup-source;
    status = "disabled";
};
This information seems useful for knowing that SNVS is the interrupt controller, but not how to capture this set_pwr_off_irq signal.
Conclusion
How do I capture the ON/OFF interrupt supposedly generated by SNVS?
How do I determine the number of an interrupt from the device-tree (if applicable at all)?
Am I misunderstanding something about how the ONOFF feature works? Is it possible to capture this from a kernel module at all?
Edit
This edit answers some user questions, and then goes into new information about the problem I have since discovered:
User Questions
Processor: The processor is an NXP i.MX 6UltraLite / 6ULL / 6ULZ ARM Cortex A7.
New Information
SNVS Driver: Using my build system kernel configuration, I have modified the snvs_pwrkey driver (see here) and verified that it is enabled. My modification consists of adding a single printk statement to the interrupt routine to see if the button trips it. This did not work.
I have tried updating the driver to a newer version, which claims to support newer i.MX6 processors. This also did not work.
I have tried to load the driver as a kernel module for easier debugging. This is not possible, as the kernel configuration requires it to be enabled and I cannot stop it from being statically built into the kernel.

The answer is rather anticlimactic. In short, there was a device-tree overlay that was disabling my changes to snvs_pwrkey, even when I had enabled it. Once I located and removed the overlay, the driver (snvs_pwrkey.c) was working as expected.
As for the IRQ number, it turns out that the IRQ for the power button is 45 as interpreted through Linux. The interrupt is not configured for sharing, so my kernel module could not be loaded.
If you want to capture power button toggle events, I suggest modifying the driver to add some output, and then perhaps adding a udev rule to capture button presses. I will update my answer with an example ASAP.
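On the IRQ numbering, here is a minimal sketch, assuming the standard Arm GIC device-tree binding, in which shared peripheral interrupts (GIC_SPI) start at hardware interrupt ID 32 and private peripheral interrupts (GIC_PPI) start at ID 16. Under that assumption, the GIC_SPI 4 entry in the snvs-powerkey node corresponds to hardware ID 36, which would be consistent with the 36 from the forum post. The numbers shown in /proc/interrupts (36 for the ethernet controller, 45 for the power button) are Linux's own virtual IRQ numbers, which is a separate numbering space. The helper name gic_hwirq is made up for illustration.

```c
/* First cell of a GIC interrupt specifier in the device tree
 * (same values as in dt-bindings/interrupt-controller/arm-gic.h). */
enum gic_irq_type { GIC_SPI = 0, GIC_PPI = 1 };

/* Hypothetical helper: hardware interrupt ID for a GIC specifier.
 * SPIs are numbered from ID 32, PPIs from ID 16. */
int gic_hwirq(enum gic_irq_type type, int number)
{
    return number + (type == GIC_SPI ? 32 : 16);
}
```

For example, gic_hwirq(GIC_SPI, 4) evaluates to 36. A driver normally does not hardcode either number; it asks the kernel for the Linux IRQ that belongs to its device-tree node.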

Troubles at singlestepping on ARM machine [duplicate]

OK, this is a simple question. Does Android support PTRACE_SINGLESTEP when I use the ptrace system call? When I try to ptrace an Android APK program, I find that I can't single-step it. But the situation changes when I use PTRACE_SYSCALL: it works perfectly. Did Android remove this function, or does ARM lack hardware support for it? Any help will be appreciated! Thanks.
This is my core program:
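#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <android/log.h>
#define TAG "ptrace-monitor" /* placeholder: the original definition was not shown */
int main(int argc, char *argv[])
{
    pid_t target_pid;
    int status;
    if(argc != 2) {
        __android_log_print(ANDROID_LOG_DEBUG,TAG,"please input the pid!");
        return -1;
    }
    target_pid = atoi(argv[1]); /* the pid to trace is the only argument */
    if(0 != ptrace(PTRACE_ATTACH, target_pid, NULL, NULL))
    {
        __android_log_print(ANDROID_LOG_DEBUG,TAG,"ptrace attach error");
        return -1;
    }
    __android_log_print(ANDROID_LOG_DEBUG,TAG,"start monitor process :%d",target_pid);
    while(1)
    {
        wait(&status);
        if(WIFEXITED(status) || WIFSIGNALED(status))
        {
            break;
        }
        if (ptrace(PTRACE_SINGLESTEP, target_pid, 0, 0) != 0) {
            __android_log_print(ANDROID_LOG_DEBUG,TAG,"PTRACE_SINGLESTEP error");
            break; /* the tracee is still stopped, so another wait() would block forever */
        }
    }
    ptrace(PTRACE_DETACH, target_pid, NULL, NULL);
    __android_log_print(ANDROID_LOG_DEBUG,TAG,"monitor finished");
    return 0;
}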
I run this program from a shell, and I have root privileges.
If I change the request to PTRACE_SYSCALL, the program runs normally.
But if the request is PTRACE_SINGLESTEP, the ptrace call returns an error!

PTRACE_SINGLESTEP was removed from ARM Linux in 2011, by this commit.
The hardware has no support for single-stepping; the previous kernel support involved decoding the current instruction to figure out which instruction would execute next (because of branches) and temporarily replacing that one with a software breakpoint.
Quoting a mailing list message about the same commit, describing the old situation: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/041324.html
PTRACE_SINGLESTEP is a ptrace request designed to offer single-stepping
support to userspace when the underlying architecture has hardware
support for this operation.
On ARM, we set arch_has_single_step() to 1 and attempt to emulate
hardware single-stepping by disassembling the current instruction to
determine the next pc and placing a software breakpoint on that
location.
Unfortunately this has the following problems:
Only a subset of ARMv7 instructions are supported
Thumb-2 is unsupported
The code is not SMP safe
We could try to fix this code, but it turns out that because of the
above issues it is rarely used in practice. GDB, for example, uses
PTRACE_POKETEXT and PTRACE_PEEKTEXT to manage breakpoints itself and
does not require any kernel assistance.
This patch removes the single-step emulation code from ptrace meaning
that the PTRACE_SINGLESTEP request will return -EIO on ARM. Portable
code must check the return value from a ptrace call and handle the
failure gracefully.
Signed-off-by: Will Deacon <will.deacon at arm.com>
---
The comments I received about v1 suggest that:
If emulation is required, it is plausible to do it from userspace
ltrace uses the SINGLESTEP call (conditionally at compile-time since other architectures, such as mips, do not support this
request) but does not check the return value from ptrace. This is a
bug in ltrace.
strace does not use SINGLESTEP
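As the commit message says, portable code has to check the return value of the ptrace call. The following is a minimal sketch of that check, not a complete tracer. The names step_result, try_single_step and advance_tracee are made up for illustration, and falling back to PTRACE_SYSCALL is only one possible choice, picked here because the question reports that it works on the target.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

enum step_result { STEP_OK, STEP_UNSUPPORTED, STEP_FAILED };

/* Try to single-step a stopped tracee. EIO is reported separately,
 * since that is the error the commit message describes for ARM. */
enum step_result try_single_step(pid_t pid)
{
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == 0)
        return STEP_OK;
    return errno == EIO ? STEP_UNSUPPORTED : STEP_FAILED;
}

/* Let the tracee run again: one instruction if the kernel supports it,
 * otherwise up to its next system call. Returns 0 on success, -1 on error. */
int advance_tracee(pid_t pid)
{
    switch (try_single_step(pid)) {
    case STEP_OK:
        return 0;
    case STEP_UNSUPPORTED:
        return ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == 0 ? 0 : -1;
    default:
        return -1;
    }
}
```

In the monitoring loop from the question, advance_tracee(target_pid) would take the place of the bare PTRACE_SINGLESTEP call, and a return value of -1 would end the loop.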

Why a segfault instead of privilege instruction error?

I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>
uint64_t
read_msr(int ecx)
{
    unsigned int a, d;
    __asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
    return ((uint64_t)a) | (((uint64_t)d) << 32);
}
int main(int ac, char **av)
{
    uint64_t start, end;
    cpu_set_t cpuset;
    unsigned int c = 0x186;
    int i = 0;
    int rc;
    CPU_ZERO(&cpuset);
    CPU_SET(i, &cpuset);
    /* keep the call outside assert() so it still runs when NDEBUG is defined */
    rc = sched_setaffinity(0, sizeof(cpuset), &cpuset);
    assert(rc == 0);
    /* PRIu64 is the portable format for uint64_t */
    printf("%" PRIu64 "\n", read_msr(c));
    return 0;
}

The question I will try to answer: Why does the above code cause SIGSEGV instead of SIGILL, even though the code contains no memory error but an illegal instruction (a privileged instruction called from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be a very hacky and unportable way to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value which is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR, SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.
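For illustration only, here is a minimal sketch of the si_code inspection described above. It assumes Linux with glibc; the function names are made up, and the mapping from SI_KERNEL to "general protection fault" is just the observation from this answer, not a documented guarantee. The same warning applies: do not rely on this in production.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map the si_code of a SIGSEGV to a short description. */
const char *describe_segv_code(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "address not mapped";
    case SEGV_ACCERR: return "invalid permissions for mapped page";
    case SI_KERNEL:   return "sent by the kernel (possibly a general protection fault)";
    default:          return "other";
    }
}

/* SIGSEGV handler: print the description with write(), then exit. */
void segv_report_handler(int signo, siginfo_t *info, void *ucontext)
{
    const char *msg = describe_segv_code(info->si_code);
    ssize_t n;

    (void)signo;
    (void)ucontext;
    n = write(STDERR_FILENO, msg, strlen(msg));
    n = write(STDERR_FILENO, "\n", 1);
    (void)n;
    _exit(1);
}

/* Install the handler with SA_SIGINFO so that si_code is available. */
int install_segv_report_handler(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = segv_report_handler;
    act.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &act, NULL);
}
```

Calling install_segv_report_handler() at the start of main() in the rdmsr program would make the handler report which kind of SIGSEGV was delivered before the process exits.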

How to get current program counter inside mprotect handler and update it

I want to get the current program counter (PC) value inside an mprotect handler. From there I want to advance the PC by 'n' instructions so that the program skips some instructions. I want to do all that on Linux kernel version 3.0.1. Which data structures hold the value of the PC, and how can I update that value? Sample code will be appreciated. Thanks in advance.
My idea is to run some task whenever a memory address is written. To do that, I use mprotect to make the address write-protected. When some code tries to write to that address, I use the mprotect handler to perform some operation. After the handler has done its work, I want the write operation to succeed. So my idea was to unprotect the memory address inside the handler and then perform the write operation again. When the code returns from the handler function, the PC will point to the original write instruction, whereas I want it to point to the next instruction. So I want to advance the PC by one instruction, irrespective of instruction length.
Check the following flow
MprotectHandler(){
    unprotect the memory address on which the protection fault arose
    write it again
    set PC to the next instruction of original write instruction
}
inside main function:
main(){
    mprotect a memory address
    try to write the mprotected address // original write instruction
    Other instruction // after mprotect handler execution, PC should point here
}

Since it is tedious to compute the instruction length on several CISC processors, I recommend a somewhat different procedure: fork using clone(..., CLONE_VM, ...) into a tracer and a tracee thread, and in the tracer, instead of
    write it again
    set PC to the next instruction of original write instruction
do a
    ptrace(PTRACE_SINGLESTEP, ...)
After the trace trap you may want to protect the memory again.

Here is sample code demonstrating the basic principle:
#define _GNU_SOURCE /* exposes REG_RIP in <sys/ucontext.h> */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ucontext.h>
static void
handler(int signal, siginfo_t* siginfo, void* uap) {
    /* printf() is not async-signal-safe; tolerable only in a demo like this */
    printf("Attempt to access memory at address %p\n", siginfo->si_addr);
    mcontext_t *mctx = &((ucontext_t *)uap)->uc_mcontext;
    greg_t *rip = &mctx->gregs[REG_RIP]; /* x86-64 only (index 16) */
    // Jump past the bad memory write.
    *rip = *rip + 7;
}
static void
dobad(uintptr_t *addr) {
    *addr = 0x998877;
    printf("I'm a survivor!\n");
}
int
main(int argc, char *argv[]) {
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigaction(SIGSEGV, &act, NULL);
    // Write to an address we don't have access to.
    dobad((uintptr_t*)0x1234);
    return 0;
}
It shows you how to update the PC in response to a page fault. It lacks the following which you have to implement yourself:
Instruction length decoding. As you can see, I have hardcoded + 7, which happens to work on my 64-bit Linux since the instruction causing the page fault is a 7-byte MOV. As Armali said in his answer, it is a tedious problem and you will probably have to use an external library like libudis86 or something similar.
mprotect() handling. You have the address that caused the page fault in siginfo->si_addr, and using that it should be trivial to find the address of the mprotected page and unprotect it.
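A rough sketch of the mprotect() part follows, under a few assumptions: Linux, a page obtained from mmap(), and a handler that simply makes the faulting page writable. Once the handler returns, the faulting write is executed again and succeeds, so in this variant the PC does not have to be changed at all. Note that mprotect() is not on the POSIX list of async-signal-safe functions, so calling it from a handler is a Linux-specific liberty. The names page_base, write_fault_handler and watched_write_demo are made up for illustration.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count; /* write faults seen so far */

/* Round an address down to the start of its page. */
void *page_base(void *addr, long page_size)
{
    return (void *)((uintptr_t)addr & ~((uintptr_t)page_size - 1));
}

/* On a fault, make the faulting page writable and return; the faulting
 * instruction is then restarted and the write goes through. */
void write_fault_handler(int signo, siginfo_t *info, void *uap)
{
    long page_size = sysconf(_SC_PAGESIZE);

    (void)signo;
    (void)uap;
    fault_count++;
    if (mprotect(page_base(info->si_addr, page_size), (size_t)page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1); /* not a page we can unprotect: give up */
}

/* Write to a read-only page once. Returns the number of faults that
 * were handled, or -1 if the setup or the write did not work. */
int watched_write_demo(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    struct sigaction act;
    volatile int *p;
    int ok;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = write_fault_handler;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &act, NULL) != 0)
        return -1;

    p = mmap(NULL, (size_t)page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    fault_count = 0;
    p[0] = 42; /* faults; the handler unprotects; the write is retried */
    ok = (p[0] == 42);
    munmap((void *)p, (size_t)page_size);
    return ok ? (int)fault_count : -1;
}
```

To watch every write rather than only the first one, the page would have to be protected again after the write has completed, which is where the single-step approach from the other answer comes in.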

SIGSEGV Crash but unable to collect backtrace

Information about the application:
Linux - 2.4.1 Kernel
m68k based embedded application
Single process multithreaded application
We have an application in which we have registered a segmentation_handler function for SIGSEGV. In this segmentation handler we create a file, write a plain string to it (like "obtained stack frame"), and then use backtrace and backtrace_symbols to write the whole stack trace into the same file.
Problem: We get a SIGSEGV (confirmed by the creation of the log file), but unfortunately the file is empty (a 0 KB file) with no information in it. (Even the first string, which is a plain string, is not in the file.)
I want to understand in what scenarios such a thing can happen, because we could solve the crash if we had the stack trace, but we don't have it, and the mechanism to get it did not work either :(
void segmentation_handler(int signal_no) {
    char buffer[512]; .............
    InitLog();//Create a log file
    printf("\n*** segmentation fault occurred ***\n");
    fflush(stdout);
    memset(buffer, 0, 512);
    size = backtrace (array, 50);
    strings = backtrace_symbols (array, size);
    sprintf(buffer, "Obtained %d stack frames.\n", size);
    Log(buffer);// Write the buffer into the file
    for (n = 0; n < size; n++) {
        sprintf(buffer, "%s\n", strings[n]);
        Log(buffer);
    }
    CloseLog();
}

Your segmentation handler is very naive and contains multiple errors. Here is a short list:
You are calling printf() and several other functions which are not async-signal-safe. Consider that printf uses a lock internally to synchronize calls on the same stream from multiple threads. What if your segmentation fault happened in the middle of a printf call, while the lock was held? You would deadlock inside the segmentation handler...
You are allocating memory (the call to backtrace_symbols), but if the segmentation fault was due to malloc arena corruption (a very likely cause of segmentation violations), you would double fault inside the segmentation handler.
If multiple threads fault at the same time, the code will open the file multiple times and overwrite the log.
There are other problems, but these are the basics...
There is a video of my lecture on how to write proper fault handlers available here: http://free-electrons.com/pub/video/2008/ols/ols2008-gilad-ben-yossef-fault-handlers.ogg

Remove the segmentation handler.
Allow the program to dump core (ulimit -c unlimited, or setrlimit in the process).
See if you have a core file.
Do the backtrace offline using your toolchain's debugger.
You can also write a program that segfaults on purpose, and test both methods (i.e. post mortem using the core file, or in the signal handler).
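For the "setrlimit in the process" variant, a minimal sketch could look like this. The function name enable_core_dumps is made up; it raises the soft core-size limit to whatever the hard limit allows, so if the hard limit is 0 no core file will be written either way.

```c
#include <sys/resource.h>

/* Raise the soft limit for core files to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling enable_core_dumps() early in main() has roughly the same effect for that process as running ulimit -c in the shell that starts it.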
