I want to get the current program counter (PC) value inside an mprotect handler. From there I want to advance the PC by 'n' instructions so that the program skips some instructions. I want to do all that on Linux kernel version 3.0.1. Which data structures hold the value of the PC, and how do I update that value? Sample code would be appreciated. Thanks in advance.
My idea is to run some task whenever a memory address is written. To do that, I use mprotect to make the address write-protected. When some code tries to write to that memory address, the mprotect handler performs some operation. After the handler has done its work, I want the write operation to succeed. So my plan was to unprotect the memory address inside the handler and then perform the write operation again. When the code returns from the handler function, the PC will point to the original write instruction, whereas I want it to point to the next instruction. So I want to advance the PC by one instruction irrespective of instruction length.
Check the following flow
MprotectHandler(){
unprotect the memory address on which the protection fault arose
write it again
set PC to the next instruction of original write instruction
}
inside main function:
main(){
mprotect a memory address
try to write the mprotected address // original write instruction
Other instruction // after mprotect handler execution, PC should point here
}
Since it is tedious to compute the instruction length on several CISC processors, I recommend a somewhat different procedure: Fork using clone(..., CLONE_VM, ...) into a tracer and a tracee thread, and in the tracer instead of
write it again
set PC to the next instruction of original write instruction
do a
ptrace(PTRACE_SINGLESTEP, ...)
- after the trace trap you may want to protect the memory again.
Here is sample code demonstrating the basic principle:
#define _GNU_SOURCE // needed for REG_RIP
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ucontext.h>
static void
handler(int signal, siginfo_t* siginfo, void* uap) {
// printf() is not async-signal-safe; it is used here only for demonstration.
printf("Attempt to access memory at address %p\n", siginfo->si_addr);
mcontext_t *mctx = &((ucontext_t *)uap)->uc_mcontext;
// gregs[REG_RIP] (index 16 on x86-64) holds the saved program counter.
greg_t *rip = &mctx->gregs[REG_RIP];
// Jump past the bad memory write.
*rip = *rip + 7;
}
static void
dobad(uintptr_t *addr) {
*addr = 0x998877;
printf("I'm a survivor!\n");
}
int
main(int argc, char *argv[]) {
struct sigaction act;
memset(&act, 0, sizeof(struct sigaction));
sigemptyset(&act.sa_mask);
act.sa_sigaction = handler;
act.sa_flags = SA_SIGINFO | SA_ONSTACK;
sigaction(SIGSEGV, &act, NULL);
// Write to an address we don't have access to.
dobad((uintptr_t*)0x1234);
return 0;
}
It shows you how to update the PC in response to a page fault. It lacks the following which you have to implement yourself:
Instruction length decoding. As you can see, I have hardcoded + 7, which happens to work on my 64-bit Linux since the instruction causing the page fault is a 7-byte MOV. As Armali said in his answer, it is a tedious problem and you probably have to use an external library like libudis86 or something.
mprotect() handling. You have the address that caused the page fault in siginfo->si_addr and using that it should be trivial to find the address of the mprotected page and unprotect it.
Related
Suppose I have an application that is to be executed on a 1st node. This application, however, cannot execute some function on this 1st node because the node lacks the necessary capabilities. To make the application run anyway, I am planning to steal the process's stack, heap and registers using ptrace and send them over to a fully capable 2nd node. On this 2nd node, I would like to execute the same process (i.e. the same executable on the same architecture, such as x86) up to the exact point the 1st process has reached, apply the previously stolen stack, heap and register values to this process, execute it there, transfer the results back to the 1st node, and resume executing the application from there.
I have also disabled the ASLR (Address space layout randomization) so that it will be one to one mapping between the process executed on remote node.
On applying such logic, the program ends up with "Stack smashing detected"
Is there anything that I am missing here, or is the idea itself not feasible?
NOTE: I am also skipping the part of copying kernel stack, as the process on both sides are executed exactly until the same instruction. Please also note that this was a very simple program that I tried as I don't want the complexity of heaps to be involved.
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
void add_one(int *p){
*p += 2;
}
int main(int argc, char **argv)
{
int i = 0;
add_one(&i);
return 0;
}
The program above is the one I experimented with. I disassembled it and found the address of the function add_one, which is the point at which I would steal the stack and process registers and send them over to apply to the other identical process on node 2.
Any help on how to do such migrations and the things that I am missing would really help me in moving forward.
If you want to do this, you need to at least disable stack canaries, because those will 100% mismatch when carrying over the execution to another machine, even if you copied the entire address space.
-fno-stack-protector will do
I made the following x86-64 program to view where the base address of the Interrupt Descriptor Tables starts:
#include <stdio.h>
#include <inttypes.h>
typedef struct __attribute__((packed)) {
uint16_t limit;
uint64_t base;
}idt_data_t;
static inline void store_idt(idt_data_t *idt_data)
{
asm volatile("sidt %0":"=m" (*idt_data));
}
int main(void)
{
idt_data_t idt_data;
store_idt(&idt_data);
printf("IDT Limit : 0x%X\n", idt_data.limit);
printf("IDT Base : 0x%" PRIX64 "\n", idt_data.base);
return 0;
}
And it prints the following:
IDT Limit : 0xFFF
IDT Base : 0xFFFFFE0000000000
The base address doesn't seem to be correct because the address should always be a physical address, am I right?
Also, I'm not sure but the limit seems to be too high. What am I doing wrong?
It's a linear address, not necessarily a physical address. In other words, it's subject to the page table like most other addresses. It has to be in pages that are never paged to disk--it wouldn't be able to handle page faults if not--but it can be in addresses that differ physically from virtually.
On x86-64, each entry of the IDT is 16 bytes long. There are 256 interrupt vectors. 256 * 16 = 4096 = 0x1000. The IDTR limit is a "less than or equal" check, so it's typical to use 0xFFF.
SIDT becomes a privileged instruction on newer CPUs when the OS enables User-Mode Instruction Prevention (UMIP, a bit in CR4), so it's advisable not to use it in user mode unless you're writing an exploit PoC or something. It's possible that an OS lies about the answer rather than throwing an exception, but I don't know.
I am trying to write a program with ptrace that tracks all system calls made by a child.
Now I have a list of system calls which are forbidden for the child. I am able to track all system calls using ptrace but I just don't know how to skip a particular system call.
Currently my tracking (parent) process gets a signal every time the child enters or exits a system call (PTRACE_SYSCALL). But if the child is trying to enter a prohibited system call, then I want to make the child skip that call and move on to the next step. Also, when I do this I want the child to know that there was a permission denied error, so I will be setting errno = 13. Will that be enough?
Update:
gdb provides this feature of skipping one line..what mechanism does gdb use?
How to achieve that?
UPDATE:
The best way to achieve this with ptrace is to redirect the original system call to some other system call, for example nanosleep(). That call will fail since it receives illegal arguments. Then you just have to change the return value in EAX to -EACCES to pretend that the call failed with a permission denied error.
I found two college lectures that mention the inability to abort an initiated system call as a disadvantage of ptrace (the manpage mentions a PTRACE_SYSEMU macro that looks like it could do it, but the newer headers don't have it). Theoretically, you could make use of the ptrace entry and exit stops to counteract the calls you don't want, by swapping in bogus arguments that'll cause the system call to fail or do nothing, or by injecting code that'll counter a previous system call, but that seems extremely hacky.
On Linux, you should be able to achieve your goal with seccomp:
#include <fcntl.h>
#include <seccomp.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
static int set_security(){
int rc = -1;
scmp_filter_ctx ctx;
struct scmp_arg_cmp arg_cmp[] = { SCMP_A0(SCMP_CMP_EQ, 2) };
ctx = seccomp_init(SCMP_ACT_ERRNO(ENOSYS));
/*ctx = seccomp_init(SCMP_ACT_ALLOW);*/
if (ctx == NULL)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
if (rc < 0)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
if (rc < 0)
goto out;
rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
SCMP_CMP(0, SCMP_CMP_EQ, 1));
if (rc < 0)
goto out;
rc = seccomp_rule_add_array(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
arg_cmp);
if (rc < 0)
goto out;
rc = seccomp_load(ctx);
if (rc < 0)
goto out;
/* ... */
out:
seccomp_release(ctx);
return rc; /* negative on failure, so the caller's 0 > set_security() check works */
}
int main(int argc, char *argv[])
{
int fd;
const char out_msg[] = "stdout test\n";
const char err_msg[] = "stderr test\n";
if(0>set_security())
return 1;
if(0>write(1, out_msg, sizeof(out_msg)))
perror("Write stdout");
if(0>write(2, err_msg, sizeof(err_msg)))
perror("Write stderr");
//This should fail with ENOSYS
if(0>(fd=open("/dev/zero", O_RDONLY)))
perror("open");
exit(0);
}
If you want to disable a system call, it's probably easiest to use symbol interposition, instead of ptrace. (Assuming you're not aiming for security against malicious binaries. If this is for security reasons, PSKocik's answer shows how to use seccomp).
Make a shared library that provides a gettimeofday function which just sets errno and returns without making a system call.
Use LD_PRELOAD=./my_library.so ./a.out to get it loaded before libc.
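A minimal sketch of such a library, assuming glibc and gcc. The file name my_library.c and the choice of EACCES are only for illustration; <sys/time.h> is deliberately not included so the definition cannot clash with the prototype of a particular glibc version.

```c
/* my_library.c: build with
 *   gcc -shared -fPIC -o my_library.so my_library.c
 */
#include <errno.h>
#include <stddef.h>

struct timeval; /* only used as an opaque pointer here */

/* Replacement for libc's gettimeofday(): always fails with EACCES and
 * never enters the kernel. */
int gettimeofday(struct timeval *tv, void *tz) {
    (void)tv;
    (void)tz;
    errno = EACCES;
    return -1;
}
```

Because the dynamic linker resolves gettimeofday to the first definition it finds, the preloaded copy wins over the one in libc.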
This won't work on binaries that statically link libc, or that use inline system calls instead of the libc wrappers (e.g. mov eax, SYS_gettimeofday / syscall). You can disassemble a binary and look for syscall (x86-64) or int 0x80 (i386 ABI) to check for that.
Note that glibc's gettimeofday and clock_gettime implementations normally don't make a real system call; instead they use RDTSC and the vDSO page exported by the kernel to find out how to scale the timestamp counter into a real time. (So intercepting the library function is your only hope in that case; a strace-style method wouldn't catch them anyway.)
BTW, failed system calls return negative error values. e.g. on x86-64, rax = -EPERM. The glibc syscall wrappers take care of detecting negative values and setting the errno global variable. So if you are intercepting syscall instructions with ptrace, that's what you need to do.
re:edit: gdb skip line
gdb can skip a line by using ptrace to resume execution in a different place. That only works if you're already stopped there, though. So to use this to "skip" system calls, you'd have to set breakpoints at every system call site you want to block in the whole process.
It doesn't sound like a useful approach. If someone's actively trying to defeat it, they can just JIT-compile some code that makes a system call directly. You could prevent processes from mapping memory that's both writable and executable, and scan it for system calls every time you detect a fault from the process jumping into memory that was requested to be executable but that your mechanism had set to writable only. (So behind the scenes you catch the hardware-generated exception and flip the page from writable to executable and scan it, or back to writable but not executable.)
This sounds like a lot of kernel hacking to implement correctly, when you could just use seccomp (see the other answer) if you need something that's resistant to workarounds and static binaries.
I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>
uint64_t
read_msr(int ecx)
{
unsigned int a, d;
__asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
return ((uint64_t)a) | (((uint64_t)d) << 32);
}
int main(int ac, char **av)
{
uint64_t start, end;
cpu_set_t cpuset;
unsigned int c = 0x186;
int i = 0;
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset);
assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
printf("%" PRIu64 "\n", read_msr(c));
return 0;
}
The question I will try to answer: Why does the above code cause SIGSEGV instead of SIGILL, even though the code contains no memory error, only an illegal instruction (a privileged instruction called from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be a very hacky and unportable way to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value which is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR, SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.
Consider this example of a heap buffer overflow vulnerable program in Linux, taken directly from the "Buffer Overflow Attacks" (p. 248) book:
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
char *A, *B;
A = malloc(128);
B = malloc(32);
strcpy(A, argv[1]);
free(A);
free(B);
return 0;
}
Since unlink() has been changed to prevent the most simple form of exploit using the FD and BK pointers with a sanity check, I'm using a very old system I have with an old version of glibc (version 2.3.2). I'm also setting MALLOC_CHECK_=0 for this testing.
My goal of this toy example is to simply see if I can write 4 bytes to some arbitrary address I specify. The most simple test I can think of is to try write something to 0x41414141, which is an illegal address and should let the program crash to just confirm to me that it is indeed trying to write to this address (something I should be able to observe in GDB).
So I try executing with the argument perl -e 'print "A"x128 . "\xf8\xff\xff\xff" . "\xf8\xff\xff\xff" . "\x41\x41\x41\x41" . "\x41\x41\x41\x41" '
So I have:
Buffer A: 128 bytes of 0x41.
prev_size: 0xfffffff8
size: 0xfffffff8
FD: 0x41414141
BK: 0x41414141
I'm using 0xfffffff8 instead of 0xfffffffc because there is a note that with glibc 2.3 the third lowest bit, NON_MAIN_ARENA, is used for management purposes for the arenas and has to be 0.
This should attempt to write 0x41414141 to 0x41414141 (+ 12 to be more correct, but still an illegal address), correct? However, when I execute this, the program simply terminates normally.
What am I missing here? This seems simple enough that it shouldn't be that hard to get to work.
I've tried various things such as using 0xfffffffc instead for prev_size and size, using legal addresses for FD (some address on the heap). I've tried swapping the order A and B are free()'d, I've tried to step into free() to see what happens in GDB but I got lost. Note that there shouldn't be any other security features on this system as it is very old and wouldn't have NX-bit, ASLR, etc (not that it should matter for the purpose of just writing 4 bytes to an illegal address).
Any ideas for how to make this work?
I could add that if using MALLOC_CHECK_=3 I get this:
malloc: using debugging hooks
malloc: using debugging hooks
free(): invalid pointer 0x8049688!
Program received signal SIGABRT, Aborted.
0x4004a1b1 in kill () from /lib/libc.so.6