Stack smashing detected while applying stack & register on the remote identical process - linux

Let us consider that I have an application that is to be executed on 1st node. This application however, cannot execute some function on this 1st node as the node lacks such capabilities. Hence, in order to make this application execution flawless, I am planning to steal the process's stack, heap & its registers using ptrace & send them over to other fully capable 2nd node. Here in this 2nd node, I would like to execute the same process(i.e same executable on the same architecture like x86) until the exact same point 1st process has exeuted, apply the previously stolen stack, heap & register's value onto this process and execute it here and transfer the results back to the 1st node and start executing the application from there.
I have also disabled the ASLR (Address space layout randomization) so that it will be one to one mapping between the process executed on remote node.
On applying such logic, the program ends up with "Stack smashing detected"
Is there anything that I am missing here, or is the idea itself not so feasible???
NOTE: I am also skipping the part of copying kernel stack, as the process on both sides are executed exactly until the same instruction. Please also note that this was a very simple program that I tried as I don't want the complexity of heaps to be involved.
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
void add_one(int *p){
*p += 2;
}
int main(int argc, char **argv)
{
int i = 0;
add_one(&i);
return 0;
}
Above picture holds that program that I experimented with, here I disassembled and found out the address of the function add_one, the point at which I would steal stack & process registers and send them over to apply onto the other identical process in node 2.
Any help on how to do such migrations and the things that I am missing would really help me in moving forward.

if you want to do this you need to at least disable stack canaries, because those will 100% mismatch when carrying over the execution to another machine even if you copied the entire address space.
-fno-stack-protector will do

Related

USB device stuck polling on second call

I have a scientific camera, a USB3Vision device, that I wrote a program to captures images for. No problems when I use it on my desktop, it works fine every time. If I take the exact same application to another (2 other actually) computer it acquires one image but hangs the second time that you try to capture another.
The desktop I'm developing on is Fedora 25 and I am having this problem on clean installs of both Fedora 24 and 25. Also, on both computers the permissions of the USB device have been set to 666 using a udev rule so it's not some permissions error.
If I attach gdb to the process that's hung it shows that it's waiting on a blocking call to poll(), I've tried waiting a really long time but it never completes. This is where I'm at a bit of a loss now, I'm not really sure what to poke at to troubleshoot why that call never finishes on computers that aren't the mine. I'm also not really sure how to determine what's different between the USB buses on the different computers, that would probably be really useful too.
Example code to recreate this issue is irrelevant IMO, but here it is because it may be asked for if not included:
#include <arv.h>
#include <stdlib.h>
#include <stdio.h>
int
main (int argc, char **argv)
{
ArvCamera *camera;
ArvBuffer *buffer;
camera = arv_camera_new (argc > 1 ? argv[1] : NULL);
buffer = arv_camera_acquisition (camera, 0);
if (ARV_IS_BUFFER (buffer))
printf ("Image successfully acquired\n");
else
printf ("Failed to acquire a single image\n");
g_clear_object (&camera);
g_clear_object (&buffer);
return EXIT_SUCCESS;
}
This is just one of the test programs that comes with the Aravis library that works with GenICam type cameras.

How to get current program counter inside mprotect handler and update it

I want to get the current program counter(PC) value inside mprotect handler. From there I want to increase the value of PC by 'n' number of instruction so that the program will skip some instructions. I want to do all that for linux kernel version 3.0.1. Any help about the data structures where I can get the value of PC and how to update that value? Sample code will be appreciated. Thanks in advance.
My idea is to use some task when a memory address is being written. So my idea is to use mprotect to make the address write protected. When some code tries to write something on that memory address, I will use mprotect handler to perform some operation. After taking care of the handler, I want to make the write operation successful. So my idea was to make the memory address unprotected inside handler and then perform the write operation again. When the code returns from the handler function, the PC will point to the original write instruction, whereas I want it to point to the next instruction. So I want to increase PC by one instruction irrespective of instruction lenght.
Check the following flow
MprotectHandler(){
unprotect the memory address on which protection fault arised
write it again
set PC to the next instruction of original write instruction
}
inside main function:
main(){
mprotect a memory address
try to write the mprotected address // original write instruction
Other instruction // after mprotect handler execution, PC should point here
}
Since it is tedious to compute the instruction length on several CISC processors, I recommend a somewhat different procedure: Fork using clone(..., CLONE_VM, ...) into a tracer and a tracee thread, and in the tracer instead of
write it again
set PC to the next instruction of original write instruction
do a
ptrace(PTRACE_SINGLESTEP, ...)
- after the trace trap you may want to protect the memory again.
Here is sample code demonstrating the basic principle:
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ucontext.h>
static void
handler(int signal, siginfo_t* siginfo, void* uap) {
printf("Attempt to access memory at address %p\n", siginfo->si_addr);
mcontext_t *mctx = &((ucontext_t *)uap)->uc_mcontext;
greg_t *rsp = &mctx->gregs[15];
greg_t *rip = &mctx->gregs[16];
// Jump past the bad memory write.
*rip = *rip + 7;
}
static void
dobad(uintptr_t *addr) {
*addr = 0x998877;
printf("I'm a survivor!\n");
}
int
main(int argc, char *argv[]) {
struct sigaction act;
memset(&act, 0, sizeof(struct sigaction));
sigemptyset(&act.sa_mask);
act.sa_sigaction = handler;
act.sa_flags = SA_SIGINFO | SA_ONSTACK;
sigaction(SIGSEGV, &act, NULL);
// Write to an address we don't have access to.
dobad((uintptr_t*)0x1234);
return 0;
}
It shows you how to update the PC in response to a page fault. It lacks the following which you have to implement yourself:
Instruction length decoding. As you can see I have hardcoded + 7 which happens to work on my 64bit Linux since the instruction causing the page fault is a 7 byte MOV. As Armali said in his answer, it is a tedious problem and you probably have to use an external library like libudis86 or something.
mprotect() handling. You have the address that caused the page fault in siginfo->si_addr and using that it should be trivial to find the address of the mprotected page and unprotect it.

Simple heap overflow exploit with toy example on old glibc

Consider this example of a heap buffer overflow vulnerable program in Linux, taken directly from the "Buffer Overflow Attacks" (p. 248) book:
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
char *A, *B;
A = malloc(128);
B = malloc(32);
strcpy(A, argv[1]);
free(A);
free(B);
return 0;
}
Since unlink() has been changed to prevent the most simple form of exploit using the FD and BK pointers with a sanity check, I'm using a very old system I have with an old version of glibc (version 2.3.2). I'm also setting MALLOC_CHECK_=0 for this testing.
My goal of this toy example is to simply see if I can write 4 bytes to some arbitrary address I specify. The most simple test I can think of is to try write something to 0x41414141, which is an illegal address and should let the program crash to just confirm to me that it is indeed trying to write to this address (something I should be able to observe in GDB).
So I try executing with the argument perl -e 'print "A"x128 . "\xf8\xff\xff\xff" . "\xf8\xff\xff\xff" . "\x41\x41\x41\x41" . "\x41\x41\x41\x41" '
So I have:
Buffer A: 128 bytes of 0x41.
prev_size: 0xfffffff8
size: 0xfffffff8
FD: 0x41414141
BK: 0x41414141
I'm using 0xfffffff8 instead of 0xfffffffc because there is a note that with glibc 2.3 the third lowest bit NON_MAIN_AREA is used for management purposes for the arenas and has to be 0.
This should attempt to write 0x41414141 to 0x41414141 (+ 12 to be more correct, but still an illegal address), correct? However, when I execute this, the program simply terminates normally.
What am I missing here? This seems simple enough that it shouldn't be that hard to get to work.
I've tried various things such as using 0xfffffffc instead for prev_size and size, using legal addresses for FD (some address on the heap). I've tried swapping the order A and B are free()'d, I've tried to step into free() to see what happens in GDB but I got lost. Note that there shouldn't be any other security features on this system as it is very old and wouldn't have NX-bit, ASLR, etc (not that it should matter for the purpose of just writing 4 bytes to an illegal address).
Any ideas for how to make this work?
I could add that if using MALLOC_CHECK_=3 I get this:
malloc: using debugging hooks
malloc: using debugging hooks
free(): invalid pointer 0x8049688!
Program received signal SIGABRT, Aborted.
0x4004a1b1 in kill () from /lib/libc.so.6

Is there a way to check whether the processor cache has been flushed recently?

On i386 linux. Preferably in c/(c/posix std libs)/proc if possible. If not is there any piece of assembly or third party library that can do this?
Edit: I'm trying to develop test whether a kernel module clear a cache line or the whole proccesor(with wbinvd()). Program runs as root but I'd prefer to stay in user space if possible.
Cache coherent systems do their utmost to hide such things from you. I think you will have to observe it indirectly, either by using performance counting registers to detect cache misses or by carefully measuring the time to read a memory location with a high resolution timer.
This program works on my x86_64 box to demonstrate the effects of clflush. It times how long it takes to read a global variable using rdtsc. Being a single instruction tied directly to the CPU clock makes direct use of rdtsc ideal for this.
Here is the output:
took 81 ticks
took 81 ticks
flush: took 387 ticks
took 72 ticks
You see 3 trials: The first ensures i is in the cache (which it is, because it was just zeroed as part of BSS), the second is a read of i that should be in the cache. Then clflush kicks i out of the cache (along with its neighbors) and shows that re-reading it takes significantly longer. A final read verifies it is back in the cache. The results are very reproducible and the difference is substantial enough to easily see the cache misses. If you cared to calibrate the overhead of rdtsc() you could make the difference even more pronounced.
If you can't read the memory address you want to test (although even mmap of /dev/mem should work for these purposes) you may be able to infer what you want if you know the cacheline size and associativity of the cache. Then you can use accessible memory locations to probe the activity in the set you're interested in.
Source code:
#include <stdio.h>
#include <stdint.h>
inline void
clflush(volatile void *p)
{
asm volatile ("clflush (%0)" :: "r"(p));
}
inline uint64_t
rdtsc()
{
unsigned long a, d;
asm volatile ("rdtsc" : "=a" (a), "=d" (d));
return a | ((uint64_t)d << 32);
}
volatile int i;
inline void
test()
{
uint64_t start, end;
volatile int j;
start = rdtsc();
j = i;
end = rdtsc();
printf("took %lu ticks\n", end - start);
}
int
main(int ac, char **av)
{
test();
test();
printf("flush: ");
clflush(&i);
test();
test();
return 0;
}
I dont know of any generic command to get the the cache state, but there are ways:
I guess this is the easiest: If you got your kernel module, just disassemble it and look for cache invalidation / flushing commands (atm. just 3 came to my mind: WBINDVD, CLFLUSH, INVD).
You just said it is for i386, but I guess you dont mean a 80386. The problem is that there are many different with different extension and features. E.g. the newest Intel series has some performance/profiling registers for the cache system included, which you can use to evalute cache misses/hits/number of transfers and similar.
Similar to 2, very depending on the system you got. But when you have a multiprocessor configuration you could watch the first cache coherence protocol (MESI) with the 2nd.
You mentioned WBINVD - afaik that will always flush complete, i.e. all, cache lines
It may not be an answer to your specific question, but have you tried using a cache profiler such as Cachegrind? It can only be used to profile userspace code, but you might be able to use it nonetheless, by e.g. moving the code of your function to userspace if it does not depend on any kernel-specific interfaces.
It might actually be more effective than trying to ask the processor for information that may or may not exist and that will be probably affected by your mere asking about it - yes, Heisenberg was way before his time :-)

Can a program assign the memory directly?

Is there any really low level programming language that can get access the memory variable directly? For example, if I have a program have a variable i. Can anyone access the memory to change my program variable i to another value?
As an example of how to change the variable in a program from “the outside”, consider the use of a debugger. Example program:
$ cat print_i.c
#include <stdio.h>
#include <unistd.h>
int main (void) {
int i = 42;
for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
return 0;
}
$ gcc -g -o print_i print_i.c
$ ./print_i
i = 42
i = 42
i = 42
…
(The program prints the value of i every 3 seconds.)
In another terminal, find the process id of the running program and attach the gdb debugger to it:
$ ps | grep print_i
1779 p1 S+ 0:00.01 ./print_i
$ gdb print_i 1779
…
(gdb) bt
#0 0x90040df8 in mach_wait_until ()
#1 0x90040bc4 in nanosleep ()
#2 0x900409f0 in sleep ()
#3 0x00002b8c in main () at print_i.c:6
(gdb) up 3
#3 0x00002b8c in main () at print_i.c:6
6 for (;;) { (void) printf("i = %d\n", i); (void) sleep(3); }
(gdb) set variable i = 666
(gdb) continue
Now the output of the program changes:
…
i = 42
i = 42
i = 666
So, yes, it's possible to change the variable of a program from the “outside” if you have access to its memory. There are plenty of caveats here, e.g. one needs to locate where and how the variable is stored. Here it was easy because I compiled the program with debugging symbols. For an arbitrary program in an arbitrary language it's much more difficult, but still theoretically possible. Of course, if I weren't the owner of the running process, then a well-behaved operating system would not let me access its memory (without “hacking”), but that's a whole another question.
Sure, unless of course the operating system protects that memory on your behalf. Machine language (the lowest level programming language) always "accesses memory directly", and it's pretty easy to achieve in C (by casting some kind of integer to pointer, for example). Point is, unless this code's running in your process (or the kernel), whatever language it's written in, the OS would normally be protecting your process from such interference (by mapping the memory in various ways for different processes, for example).
If another process has sufficient permissions, then it can change your process's memory. On Linux, it's as simple as reading and writing the pseudo-file /proc/{pid}/mem. This is how many exploits work, though they do rely on some vulnerability that allows them to run with very high privileges (root on Unix).
Short answer: yes. Long answer: it depends on a whole lot of factors including your hardware (memory management?), your OS (protected virtual address spaces? features to circumvent these protections?) and the detailed knowledge your opponent may or may not have of both your language's architecture and your application structure.
It depends. In general, one of the functions of an operating system is called segmentation -- that means keeping programs out of each other's memory. If I write a program that tries to access memory that belongs to your program, the OS should crash me, since I'm committing something called a segmentation fault.
But there are situations where I can get around that. For example, if I have root privileges on the system, I may be able to access your memory. Or worse -- I can run your program inside a virtual machine, then sit outside that VM and do whatever I want to its memory.
So in general, you should assume that a malicious person can reach in and fiddle with your program's memory if they try hard enough.

Resources