Why a segfault instead of privilege instruction error? - linux

I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>
uint64_t
read_msr(int ecx)
{
unsigned int a, d;
__asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
return ((uint64_t)a) | (((uint64_t)d) << 32);
}
int main(int ac, char **av)
{
uint64_t start, end;
cpu_set_t cpuset;
unsigned int c = 0x186;
int i = 0;
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset);
assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
printf("%" PRIu64 "\n", read_msr(c));
return 0;
}

The question I will try to answer: Why does the above code cause SIGSEGV instead of SIGILL, even though the code has no memory error, only an illegal instruction (a privileged instruction executed from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be a very hacky and unportable way to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value which is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR, SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.
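A minimal sketch of the si_code inspection described above, assuming Linux with glibc. The names segv_kind, on_segv and probe_null_write are made up for this illustration, and the exit-status encoding (10/11/12) is arbitrary. It only demonstrates the ordinary memory-error case; a privileged instruction such as rdmsr would be expected to show up as SI_KERNEL instead, as described above.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: coarse label for the si_code of a SIGSEGV. */
const char *segv_kind(int si_code)
{
    switch (si_code) {
    case SEGV_MAPERR: return "address not mapped";
    case SEGV_ACCERR: return "invalid permissions for mapped object";
    case SI_KERNEL:   return "sent by the kernel (e.g. general protection fault)";
    default:          return "other";
    }
}

/* Report si_code to the parent through the exit status; _exit() is
 * async-signal-safe, printf() is not. */
static void on_segv(int sig, siginfo_t *info, void *uap)
{
    (void)sig;
    (void)uap;
    _exit(info->si_code == SI_KERNEL ? 10 :
          info->si_code == SEGV_MAPERR ? 11 : 12);
}

/* Fork a child that writes to address 0 and return the child's exit
 * status: 10 = SI_KERNEL, 11 = SEGV_MAPERR, 12 = anything else,
 * -1 = fork/wait failure. */
int probe_null_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        sigemptyset(&act.sa_mask);
        act.sa_sigaction = on_segv;
        act.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &act, NULL);
        volatile uintptr_t addr = 0;   /* volatile: hide the NULL from the optimizer */
        *(volatile int *)addr = 1;     /* plain memory error */
        _exit(13);                     /* not reached if the write faults */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

As said above, this relies on an undocumented si_code for SIGSEGV, so it is a debugging aid at best.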

Related

How to skip a system call with ptrace?

I am trying to write a program with ptrace that tracks all system calls made by a child.
Now I have a list of system calls which are forbidden for the child. I am able to track all system calls using ptrace but I just don't know how to skip a particular system call.
Currently my tracking (parent) process gets a signal every time the child enters or exits a system call (PTRACE_SYSCALL). But if the child is trying to enter a prohibited system call, I want to make the child skip that call and move to the next step. Also, when I do this I want the child to know that there was a permission denied error, so I will be setting errno = 13; will that be enough?
Update:
gdb provides this feature of skipping one line. What mechanism does gdb use?
How to achieve that?
UPDATE:
The best way to achieve this with ptrace is to redirect the original system call to some other system call for example to nanosleep() call. This call will fail since it will receive illegal arguments. Then you just have to change the return code in EAX to -EACCES to pretend that call failed due to Permission denied error.
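A rough sketch of that idea, assuming x86-64 Linux. Instead of redirecting to nanosleep(), it uses a closely related variant: at the entry stop it replaces the system call number with -1 (which the kernel treats as an invalid call and skips), and at the exit stop it writes -EACCES into rax. The function name deny_getppid_demo and the choice of getppid as the "forbidden" call are only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __x86_64__
#error "this sketch uses the x86-64 register layout (orig_rax, rax)"
#endif

/* Runs a child that calls getppid() and makes that call fail with EACCES
 * from the tracer. Returns 0 if the child saw -1/EACCES, 1 if it did not,
 * -1 on tracer errors. */
int deny_getppid_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                 /* wait until the tracer is ready */
        errno = 0;
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EACCES ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* Make syscall stops distinguishable: they report SIGTRAP | 0x80. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    int in_syscall = 0;  /* toggles between entry and exit stops */
    int denied = 0;      /* set while a blocked call is in flight */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0 || WIFSIGNALED(status))
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;    /* not a syscall stop; this sketch drops the signal */

        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
        in_syscall = !in_syscall;
        if (in_syscall && regs.orig_rax == SYS_getppid) {
            regs.orig_rax = (unsigned long long)-1;   /* kernel skips the call */
            ptrace(PTRACE_SETREGS, pid, NULL, &regs);
            denied = 1;
        } else if (!in_syscall && denied) {
            regs.rax = (unsigned long long)-EACCES;   /* value the child sees */
            ptrace(PTRACE_SETREGS, pid, NULL, &regs);
            denied = 0;
        }
    }
}
```

The tracer never touches errno directly: it only sets the raw return value, and the libc wrapper in the child turns -EACCES into -1 with errno = 13.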
I found two college lectures that mention the inability to abort an initiated system call as a disadvantage of ptrace (the manpage mentions a PTRACE_SYSEMU macro that looks like it could do it, but the newer headers don't have it). Theoretically, you could make use of the ptrace entry and exit stops to counteract the calls you don't want, by swapping in bogus arguments that'll cause the system call to fail or do nothing, or by injecting code that'll counter a previous system call, but that seems extremely hacky.
On Linux, you should be able to achieve your goal with seccomp:
#include <fcntl.h>
#include <seccomp.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
static int set_security(){
int rc = -1;
scmp_filter_ctx ctx;
struct scmp_arg_cmp arg_cmp[] = { SCMP_A0(SCMP_CMP_EQ, 2) };
ctx = seccomp_init(SCMP_ACT_ERRNO(ENOSYS));
/*ctx = seccomp_init(SCMP_ACT_ALLOW);*/
if (ctx == NULL)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
if (rc < 0)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
if (rc < 0)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
SCMP_CMP(0, SCMP_CMP_EQ, 1));
if (rc < 0)
goto out;
rc = seccomp_rule_add_array(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
arg_cmp);
if (rc < 0)
goto out;
rc = seccomp_load(ctx);
if (rc < 0)
goto out;
/* ... */
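out:
seccomp_release(ctx);
/* Return rc unchanged (negative on failure) so the 0>set_security() check in main can detect errors. */
return rc;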
}
int main(int argc, char *argv[])
{
int fd;
const char out_msg[] = "stdout test\n";
const char err_msg[] = "stderr test\n";
if(0>set_security())
return 1;
if(0>write(1, out_msg, sizeof(out_msg)))
perror("Write stdout");
if(0>write(2, err_msg, sizeof(err_msg)))
perror("Write stderr");
//This should fail with ENOSYS
if(0>(fd=open("/dev/zero", O_RDONLY)))
perror("open");
exit(0);
}
If you want to disable a system call, it's probably easiest to use symbol interposition, instead of ptrace. (Assuming you're not aiming for security against malicious binaries. If this is for security reasons, PSKocik's answer shows how to use seccomp).
Make a shared library that provides a gettimeofday function which just sets errno and returns without making a system call.
Use LD_PRELOAD=./my_library.so ./a.out to get it loaded before libc.
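A minimal sketch of such a library, assuming glibc on Linux. The choice of EPERM is arbitrary, and struct timeval is only forward-declared because the function never looks inside it.

```c
#include <errno.h>

struct timeval;  /* only a pointer is needed, so a forward declaration suffices */

/* Same name as the libc wrapper, so the dynamic linker resolves calls to
 * this definition when the library is preloaded. No system call is made. */
int gettimeofday(struct timeval *tv, void *tz)
{
    (void)tv;
    (void)tz;
    errno = EPERM;   /* pretend the call was refused */
    return -1;
}
```

Built with something like gcc -shared -fPIC -o my_library.so my_library.c (the file names are placeholders), it can be loaded with the LD_PRELOAD line above.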
This won't work on binaries that statically link libc, or that use inline system calls instead of the libc wrappers (e.g. mov eax, SYS_gettimeofday / syscall). You can disassemble a binary and look for syscall (x86-64) or int 0x80 (i386 ABI) to check for that.
Note that glibc's gettimeofday and clock_gettime implementations normally don't make a real system call; instead they use RDTSC and the vDSO page exported by the kernel to find out how to scale the timestamp counter into a real time. (So intercepting the library function is your only hope; a strace-style method usually wouldn't catch them anyway.)
BTW, failed system calls return negative error values. e.g. on x86-64, rax = -EPERM. The glibc syscall wrappers take care of detecting negative values and setting the errno global variable. So if you are intercepting syscall instructions with ptrace, that's what you need to do.
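To make that convention concrete, here is a small sketch of the conversion a libc wrapper performs on the raw kernel return value. The function name is made up; the range [-4095, -1] is the one Linux reserves for error codes.

```c
#include <errno.h>

/* Hypothetical helper mirroring what a libc syscall wrapper does with the
 * raw value from the kernel: values in [-4095, -1] are negated errno codes,
 * anything else is a successful result. */
long syscall_result_to_errno(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

A tracer therefore writes the raw negative value (e.g. -EACCES) into the return register and leaves errno to the wrapper running in the child.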
re:edit: gdb skip line
gdb can skip a line by using ptrace to resume execution in a different place. That only works if you're already stopped there, though. So to use this to "skip" system calls, you'd have to set breakpoints at every system call site you want to block in the whole process.
It doesn't sound like a useful approach. If someone's actively trying to defeat it, they can just JIT-compile some code that makes a system call directly. To counter that, you could prevent processes from mapping memory that's both writable and executable, and scan a page for system calls every time the process faults by jumping into memory that it requested to be executable but that your mechanism left only writable. (So behind the scenes you catch the hardware-generated exception, scan the page and flip it from writable to executable, or flip it back to writable but not executable when it is written again.)
This sounds like a lot of kernel hacking to implement correctly, when you could just use seccomp (see the other answer) if you need something that's resistant to workarounds and static binaries.

Are function locations altered when running a program through GDB?

I'm trying to run through a buffer overflow exercise, here is the code:
#include <stdio.h>
int badfunction() {
char buffer[8];
gets(buffer);
puts(buffer);
}
int cantrun() {
printf("This function cant run because it is never called");
}
int main() {
badfunction();
}
This is a simple piece of code. The objective is to overflow the buffer in badfunction() and overwrite the return address so that it points to the memory address of the function cantrun().
Step 1: Find the offset of the return address (in this case it's 12 bytes: 8 for the buffer and 4 for the base pointer).
Step 2: Find the memory location of cantrun(); gdb says it's 0x0804849a.
When I run the program printf "%012x\x9a\x84\x04\x08" | ./vuln, I get the error "illegal instruction". This suggests to me that I have correctly overwritten the EIP, but that the memory location of cantrun() is incorrect.
I am using Kali Linux, Kernel 3.14, I have ASLR turned off and I am using execstack to allow an executable stack. Am I doing something wrong?
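For reference, the payload produced by that printf command can be sketched in C as follows: 12 filler bytes followed by the target address in little-endian order, which is how a 32-bit x86 return address is stored. The helper name build_payload is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `pad` filler bytes followed by `addr` in
 * little-endian byte order into out. Returns the number of bytes written,
 * or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad, uint32_t addr)
{
    if (cap < pad + 4)
        return 0;
    memset(out, '0', pad);   /* same filler as printf "%012x" with no argument */
    for (int i = 0; i < 4; i++)
        out[pad + i] = (unsigned char)(addr >> (8 * i));   /* low byte first */
    return pad + 4;
}
```

With pad = 12 and addr = 0x0804849a the last four bytes come out as 9a 84 04 08, matching the escape sequence used on the command line.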
UPDATE:
As a shot in the dark I tried to find the correct instruction by moving the address around, and 0x0804849b does the trick. Why is this different from what GDB shows? When running GDB, 0x0804849a is the location of the prologue instruction push ebp and 0x0804849b is the prologue instruction mov ebp,esp.
gdb doesn't do anything to change the locations of functions in the programs it executes. ASLR may matter, but by default gdb turns this off to enable simpler debugging.
It's hard to say why you are seeing the results you are. What does disassembling the function in gdb show?

How to get current program counter inside mprotect handler and update it

I want to get the current program counter(PC) value inside mprotect handler. From there I want to increase the value of PC by 'n' number of instruction so that the program will skip some instructions. I want to do all that for linux kernel version 3.0.1. Any help about the data structures where I can get the value of PC and how to update that value? Sample code will be appreciated. Thanks in advance.
My idea is to run some task whenever a memory address is written. So I use mprotect to make the address write-protected. When some code tries to write to that memory address, I will use the mprotect handler to perform some operation. After the handler has done its work, I want the write operation to succeed. So my idea was to unprotect the memory address inside the handler and then perform the write operation again. When the code returns from the handler function, the PC will point to the original write instruction, whereas I want it to point to the next instruction. So I want to advance the PC by one instruction irrespective of the instruction length.
Check the following flow
MprotectHandler(){
unprotect the memory address on which the protection fault arose
write it again
set PC to the next instruction of original write instruction
}
inside main function:
main(){
mprotect a memory address
try to write the mprotected address // original write instruction
Other instruction // after mprotect handler execution, PC should point here
}
Since it is tedious to compute the instruction length on several CISC processors, I recommend a somewhat different procedure: Fork using clone(..., CLONE_VM, ...) into a tracer and a tracee thread, and in the tracer instead of
write it again
set PC to the next instruction of original write instruction
do a
ptrace(PTRACE_SINGLESTEP, ...)
- after the trace trap you may want to protect the memory again.
Here is sample code demonstrating the basic principle:
#define _GNU_SOURCE  /* for the REG_RIP index into gregs */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ucontext.h>
static void
handler(int signal, siginfo_t* siginfo, void* uap) {
// printf is not async-signal-safe; it is tolerable only in a demo like this.
printf("Attempt to access memory at address %p\n", siginfo->si_addr);
mcontext_t *mctx = &((ucontext_t *)uap)->uc_mcontext;
// x86-64 only: REG_RIP (index 16) is the saved instruction pointer.
greg_t *rip = &mctx->gregs[REG_RIP];
// Jump past the bad memory write.
*rip = *rip + 7;
}
static void
dobad(uintptr_t *addr) {
*addr = 0x998877;
printf("I'm a survivor!\n");
}
int
main(int argc, char *argv[]) {
struct sigaction act;
memset(&act, 0, sizeof(struct sigaction));
sigemptyset(&act.sa_mask);
act.sa_sigaction = handler;
act.sa_flags = SA_SIGINFO | SA_ONSTACK;
sigaction(SIGSEGV, &act, NULL);
// Write to an address we don't have access to.
dobad((uintptr_t*)0x1234);
return 0;
}
It shows you how to update the PC in response to a page fault. It lacks the following which you have to implement yourself:
Instruction length decoding. As you can see, I have hardcoded + 7, which happens to work on my 64-bit Linux since the instruction causing the page fault is a 7-byte MOV. As Armali said in his answer, it is a tedious problem and you probably have to use an external library like libudis86 or something.
mprotect() handling. You have the address that caused the page fault in siginfo->si_addr and using that it should be trivial to find the address of the mprotected page and unprotect it.
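A sketch of the mprotect() part, assuming Linux. Note that if the handler only unprotects the page and returns, the kernel restarts the faulting instruction and the write succeeds, so for the "make the write go through" case the PC does not have to be advanced at all. mprotect() is not on the POSIX list of async-signal-safe functions, so calling it from a handler is an assumption that happens to hold in practice on Linux. All names here are made up for the example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static long page_size;

/* Unprotect the page containing the faulting address. When the handler
 * returns, the faulting instruction is restarted and now succeeds. */
static void unprotect_handler(int sig, siginfo_t *info, void *uap)
{
    (void)sig;
    (void)uap;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    fault_count++;
    if (mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);   /* cannot recover; avoid an endless fault loop */
}

/* Returns the number of faults taken for one write to a read-only page,
 * or -1 on setup failure or if the write did not land. */
int write_to_protected_page(int value)
{
    page_size = sysconf(_SC_PAGESIZE);
    int *p = mmap(NULL, (size_t)page_size, PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction act, old;
    memset(&act, 0, sizeof(act));
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = unprotect_handler;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &act, &old) != 0)
        return -1;

    fault_count = 0;
    *(volatile int *)p = value;               /* faults once, then is retried */
    int ok = (*(volatile int *)p == value);

    sigaction(SIGSEGV, &old, NULL);           /* restore the previous handler */
    munmap(p, (size_t)page_size);
    return ok ? (int)fault_count : -1;
}
```

Re-protecting the page after the write is the part this sketch leaves out; that is where single-stepping or instruction length decoding comes back in.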

Simple heap overflow exploit with toy example on old glibc

Consider this example of a heap buffer overflow vulnerable program in Linux, taken directly from the "Buffer Overflow Attacks" (p. 248) book:
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
char *A, *B;
A = malloc(128);
B = malloc(32);
strcpy(A, argv[1]);
free(A);
free(B);
return 0;
}
Since unlink() has been changed to prevent the most simple form of exploit using the FD and BK pointers with a sanity check, I'm using a very old system I have with an old version of glibc (version 2.3.2). I'm also setting MALLOC_CHECK_=0 for this testing.
My goal with this toy example is simply to see if I can write 4 bytes to some arbitrary address I specify. The simplest test I can think of is to try to write something to 0x41414141, which is an illegal address and should make the program crash, confirming that it is indeed trying to write to this address (something I should be able to observe in GDB).
So I try executing with the argument perl -e 'print "A"x128 . "\xf8\xff\xff\xff" . "\xf8\xff\xff\xff" . "\x41\x41\x41\x41" . "\x41\x41\x41\x41" '
So I have:
Buffer A: 128 bytes of 0x41.
prev_size: 0xfffffff8
size: 0xfffffff8
FD: 0x41414141
BK: 0x41414141
I'm using 0xfffffff8 instead of 0xfffffffc because there is a note that with glibc 2.3 the third lowest bit, NON_MAIN_ARENA, is used for management purposes for the arenas and has to be 0.
This should attempt to write 0x41414141 to 0x41414141 (+ 12 to be more correct, but still an illegal address), correct? However, when I execute this, the program simply terminates normally.
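For reference, the write that the classic unchecked unlink performs can be sketched like this. The struct and function names are illustrative and simplified, not glibc's actual definitions; the field order follows the prev_size / size / FD / BK layout listed above.

```c
#include <stddef.h>

/* Simplified chunk header in the layout of old dlmalloc-style allocators. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

/* The classic unchecked unlink: two pointer stores whose targets come
 * straight from the chunk being unlinked. */
void unchecked_unlink(struct chunk *p)
{
    struct chunk *fd = p->fd;
    struct chunk *bk = p->bk;
    fd->bk = bk;   /* writes bk at (char *)fd + offsetof(struct chunk, bk) */
    bk->fd = fd;   /* writes fd at (char *)bk + offsetof(struct chunk, fd) */
}
```

On a 32-bit system offsetof(struct chunk, bk) is 12 and offsetof(struct chunk, fd) is 8, so with FD = BK = 0x41414141 the first store would go to 0x4141414d, provided free() actually reaches the unlink for that chunk.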
What am I missing here? This seems simple enough that it shouldn't be that hard to get to work.
I've tried various things such as using 0xfffffffc instead for prev_size and size, using legal addresses for FD (some address on the heap). I've tried swapping the order A and B are free()'d, I've tried to step into free() to see what happens in GDB but I got lost. Note that there shouldn't be any other security features on this system as it is very old and wouldn't have NX-bit, ASLR, etc (not that it should matter for the purpose of just writing 4 bytes to an illegal address).
Any ideas for how to make this work?
I could add that if using MALLOC_CHECK_=3 I get this:
malloc: using debugging hooks
malloc: using debugging hooks
free(): invalid pointer 0x8049688!
Program received signal SIGABRT, Aborted.
0x4004a1b1 in kill () from /lib/libc.so.6

What happens if a program makes an OABI style syscall in an EABI-only kernel?

Or more generally, what happens if an swi instruction with an opcode !=0 is executed on such a kernel? Does it produce a signal? I ask because I'd like to trap it.
The code that fields swi instructions is here: http://lxr.linux.no/linux+*/arch/arm/kernel/entry-common.S#L335. I am not an ARM expert, but it appears that the CPU does not stash the swi argument anywhere the kernel can get at it; if the kernel wants to know, it has to fetch the instruction from the calling program's runtime image. This makes every system call more expensive, so (if I'm reading things correctly) the kernel only bothers to find out what the swi argument is if it's compiled with CONFIG_OABI_COMPAT.
EDIT: The ARM ARM confirms that SWI does not do anything useful with its argument. (Physical page 634 / logical page A7-118.)
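To illustrate what "fetching the instruction" amounts to, here is a sketch of decoding the immediate from an ARM-mode SWI instruction word. The function names are made up; the 0x900000 base is the kernel's __NR_OABI_SYSCALL_BASE, which OABI programs add to the system call number in the swi immediate.

```c
#include <stdint.h>

#define OABI_SYSCALL_BASE 0x900000u   /* kernel's __NR_OABI_SYSCALL_BASE */

/* Returns 1 if insn is an ARM-mode SWI/SVC (bits 27..24 all set),
 * assuming one of the ordinary condition codes in bits 31..28. */
int is_swi(uint32_t insn)
{
    return (insn & 0x0f000000u) == 0x0f000000u;
}

/* The 24-bit immediate ("comment field") of an ARM-mode SWI. */
uint32_t swi_immediate(uint32_t insn)
{
    return insn & 0x00ffffffu;
}
```

For example, swi 1 assembles to 0xef000001, so swi_immediate() yields 1, while an EABI system call is always swi 0 with the number passed in r7.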
So I tried to see what would happen. I compiled the following program and ran it:
#include <stdio.h>
#include <signal.h>
void traphandler(int signum, siginfo_t *info, void *context)
{
puts("trap");
}
int main()
{
struct sigaction sa;
sa.sa_sigaction = traphandler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGTRAP, &sa, NULL);
puts("begin");
asm("swi 1");
puts("after swi 1");
asm("swi 255");
puts("after swi 255");
}
and the output was:
begin
after swi 1
after swi 255
The signal handler was not called, nor was the program killed. Quite disappointing.

Resources