How to easily diagnose problems due to access to unmapped mmap regions? - linux

I've recently found a segfault that neither Valgrind nor Address Sanitizer could give any useful info about. It happened because the faulty program munmapped a file and then tried to access the formerly mmapped region.
The following example demonstrates the problem:
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
int main()
{
    const int fd = open("/tmp/test.txt", O_RDWR);
    if (fd < 0) abort();
    const char buf[] = "Hello";
    if (write(fd, buf, sizeof buf) != sizeof buf) abort();
    char *const volatile ptr = mmap(NULL, sizeof buf, PROT_READ, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) abort(); // mmap reports failure via MAP_FAILED, not NULL
    printf("1%c\n", ptr[0]);
    if (close(fd) < 0) abort();
    printf("2%c\n", ptr[0]);
    if (munmap(ptr, sizeof buf) < 0) abort();
    printf("3%c\n", ptr[0]); // Cause a segfault
}
With Address Sanitizer I get the following output:
1H
2H
AddressSanitizer:DEADLYSIGNAL
=================================================================
==8503==ERROR: AddressSanitizer: SEGV on unknown address 0x7fe7d0836000 (pc 0x55bda425c055 bp 0x7ffda5887210 sp 0x7ffda5887140 T0)
==8503==The signal is caused by a READ memory access.
#0 0x55bda425c054 in main /tmp/test/test1.c:22
#1 0x7fe7cf64fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#2 0x55bda425bcd9 in _start (/tmp/test/test1+0xcd9)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/test/test1.c:22 in main
And here's the relevant part of output with Valgrind:
1H
2H
==8863== Invalid read of size 1
==8863== at 0x108940: main (test1.c:22)
==8863== Address 0x4029000 is not stack'd, malloc'd or (recently) free'd
==8863==
==8863==
==8863== Process terminating with default action of signal 11 (SIGSEGV)
==8863== Access not within mapped region at address 0x4029000
==8863== at 0x108940: main (test1.c:22)
Compare this with the case when a malloced region is accessed after free. Test program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
    const char buf[] = "Hello";
    char *const volatile ptr = malloc(sizeof buf);
    if (!ptr)
    {
        fprintf(stderr, "malloc failed");
        return 1;
    }
    memcpy(ptr, buf, sizeof buf);
    printf("1%c\n", ptr[0]);
    free(ptr);
    printf("2%c\n", ptr[0]); // Cause a segfault
}
Output with Address Sanitizer:
1H
=================================================================
==7057==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x55b8f96b5003 bp 0x7ffff5179b70 sp 0x7ffff5179b60
READ of size 1 at 0x602000000010 thread T0
#0 0x55b8f96b5002 in main /tmp/test/test1.c:17
#1 0x7f4298fd8b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#2 0x55b8f96b4c49 in _start (/tmp/test/test1+0xc49)
0x602000000010 is located 0 bytes inside of 6-byte region [0x602000000010,0x602000000016)
freed by thread T0 here:
#0 0x7f42994b3b4f in free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10bb4f)
#1 0x55b8f96b4fca in main /tmp/test/test1.c:16
#2 0x7f4298fd8b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
previously allocated by thread T0 here:
#0 0x7f42994b3f48 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10bf48)
#1 0x55b8f96b4e25 in main /tmp/test/test1.c:8
#2 0x7f4298fd8b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
Output with Valgrind:
1H
==6888== Invalid read of size 1
==6888== at 0x108845: main (test1.c:17)
==6888== Address 0x522d040 is 0 bytes inside a block of size 6 free'd
==6888== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6888== by 0x108840: main (test1.c:16)
==6888== Block was alloc'd at
==6888== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6888== by 0x1087D2: main (test1.c:8)
My question: is there any way to make Valgrind or a Sanitizer, or some other Linux-compatible tool output useful diagnostic about the context of access to munmapped region (like where it had been mmapped and munmapped), similar to the above given output for the access-after-free?

Valgrind (and I guess ASan does the same) can output a 'use after free' error because it maintains a list of 'recently freed' blocks. Such blocks are logically freed, but they are not returned (directly) to the usable memory for further malloc calls. Instead, they are marked unaddressable.
The size of this 'recently freed' block list can be tuned using:
--freelist-vol=<number>        volume of freed blocks queue [20000000]
--freelist-big-blocks=<number> releases first blocks with size >= [1000000]
It would be possible to use a similar technique for munmap-ed memory: rather than physically unmapping it, it could be kept in a list of recently unmapped blocks, be logically unmapped, but marked unaddressable.
Note that you could simulate that in your program by having a function my_unmap that does not really do the unmap, but rather uses the client requests of Valgrind to mark this memory as unaddressable.

is there any way to make Valgrind or a Sanitizer, or some other Linux-compatible tool output useful diagnostic
I know of no such tool, although it would be relatively easy to make one.
Your problem is sufficiently different from heap corruption problems (which do require specialized tools) that it probably doesn't need such a tool.
The major difference is the "action at a distance" aspect: with heap corruption, the code in which the problem manifests is often very far removed from the code in which the problem originates. Hence the need to track memory state, to have red zones, etc.
In your case, the access to munmapped memory results in an immediate crash. So if you just log every mmap and munmap that your program performs, you only have to look back for the last munmap that "covered" the address on which you crashed.
In addition, most programs perform relatively few mmap and munmap operations. If your program performs so many that you can't log them all, it likely shouldn't actually do that (mmap and munmap are relatively expensive system calls).

Related

Why a segfault instead of privilege instruction error?

I am trying to execute the privileged instruction rdmsr in user mode, and I expect to get some kind of privilege error, but I get a segfault instead. I have checked the asm and I am loading 0x186 into ecx, which is supposed to be PERFEVTSEL0, based on the manual, page 1171.
What is the cause of the segfault, and how can I modify the code below to fix it?
I want to resolve this before hacking a kernel module, because I don't want this segfault to blow up my kernel.
Update: I am running on Intel(R) Xeon(R) CPU X3470.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <sched.h>
#include <assert.h>

uint64_t
read_msr(int ecx)
{
    unsigned int a, d;
    __asm __volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(ecx));
    return ((uint64_t)a) | (((uint64_t)d) << 32);
}

int main(int ac, char **av)
{
    cpu_set_t cpuset;
    unsigned int c = 0x186;
    int i = 0;
    CPU_ZERO(&cpuset);
    CPU_SET(i, &cpuset);
    assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
    printf("%" PRIu64 "\n", read_msr(c));
    return 0;
}
The question I will try to answer: why does the above code cause SIGSEGV instead of SIGILL, even though the code has no memory error, only an illegal instruction (a privileged instruction executed from non-privileged user space)?
I would expect to get a SIGILL with si_code ILL_PRVOPC instead of a segfault, too. Your question is currently 3 years old and today, I stumbled upon the same behavior. I am disappointed too :-(
What is the cause of the segfault
The cause seems to be that the Linux kernel code decides to send SIGSEGV. Here is the responsible function:
http://elixir.free-electrons.com/linux/v4.9/source/arch/x86/kernel/traps.c#L487
Have a look at the last line of the function.
In your follow up question, you got a list of other assembly instructions which get propagated as SIGSEGV to userspace though they are actually general protection faults. I found your question because I triggered the behavior with cli.
and how can I modify the code below to fix it?
As of Linux kernel 4.9, I'm not aware of any reliable way to distinguish between a memory error (what I would expect to be a SIGSEGV) and a privileged instruction error from userspace.
There may be a very hacky and unportable way to distinguish these cases. When a privileged instruction causes a SIGSEGV, the siginfo_t si_code is set to a value which is not directly listed in the SIGSEGV section of man 2 sigaction. The documented values are SEGV_MAPERR, SEGV_ACCERR, and SEGV_PKUERR, but I get SI_KERNEL (0x80) on my system. According to the man page, SI_KERNEL is a code "which can be placed in si_code for any signal". In strace, you see SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0}. The responsible kernel code is here.
It would also be possible to grep dmesg for this string.
Please, never ever use those two methods to distinguish between GPF and memory error on a production system.
Specific solution for your code: Just don't run rdmsr from user space. But this answer is really unsatisfying if you are looking for a generic way to figure out why a program received a SIGSEGV.

How to get current program counter inside mprotect handler and update it

I want to get the current program counter (PC) value inside an mprotect handler. From there I want to increase the PC by 'n' instructions so that the program skips some instructions. I want to do all that on Linux kernel version 3.0.1. Any help on the data structures where I can get the PC value and how to update it? Sample code will be appreciated. Thanks in advance.
My idea is to perform some task when a memory address is written. So my idea is to use mprotect to make the address write-protected. When some code tries to write to that memory address, I will use the mprotect handler to perform some operation. After taking care of the handler, I want the write operation to succeed. So my idea was to unprotect the memory address inside the handler and then perform the write operation again. When the code returns from the handler function, the PC will point to the original write instruction, whereas I want it to point to the next instruction. So I want to increase the PC by one instruction irrespective of instruction length.
Check the following flow:
MprotectHandler() {
    unprotect the memory address on which the protection fault arose
    write it again
    set PC to the next instruction after the original write instruction
}
Inside the main function:
main() {
    mprotect a memory address
    try to write to the mprotected address // original write instruction
    other instruction // after mprotect handler execution, PC should point here
}
Since it is tedious to compute the instruction length on several CISC processors, I recommend a somewhat different procedure: fork using clone(..., CLONE_VM, ...) into a tracer and a tracee thread, and in the tracer, instead of
write it again
set PC to the next instruction of original write instruction
do a
ptrace(PTRACE_SINGLESTEP, ...)
- after the trace trap you may want to protect the memory again.
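The single-step part of that idea can be sketched as follows (assumptions: the name singlestep_once is mine, and I use a plain fork() rather than clone(..., CLONE_VM, ...) to keep the sketch short; a real implementation would also catch the protection fault and flip the page protections around the step):

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child stopped with a trace trap after exactly one
   single-stepped instruction, 0 otherwise. */
int singlestep_once(void)
{
    pid_t pid = fork();
    if (pid == 0) {                              /* tracee */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                          /* hand control to the tracer */
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);                    /* stopped at SIGSTOP */
    ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);  /* execute one instruction */
    waitpid(pid, &status, 0);                    /* trace trap (SIGTRAP) */
    int ok = WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP;
    ptrace(PTRACE_CONT, pid, NULL, NULL);        /* let the tracee finish */
    waitpid(pid, &status, 0);
    return ok;
}
```

This is exactly the step that replaces "write it again; set PC to the next instruction": the kernel advances the PC past the instruction, whatever its length.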
Here is sample code demonstrating the basic principle:
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>

static void
handler(int signal, siginfo_t* siginfo, void* uap) {
    // Note: printf is not async-signal-safe; acceptable in a demo only.
    printf("Attempt to access memory at address %p\n", siginfo->si_addr);
    mcontext_t *mctx = &((ucontext_t *)uap)->uc_mcontext;
    greg_t *rip = &mctx->gregs[REG_RIP]; // instruction pointer on x86-64
    // Jump past the bad memory write.
    *rip = *rip + 7;
}

static void
dobad(uintptr_t *addr) {
    *addr = 0x998877;
    printf("I'm a survivor!\n");
}

int
main(int argc, char *argv[]) {
    struct sigaction act;
    memset(&act, 0, sizeof(struct sigaction));
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigaction(SIGSEGV, &act, NULL);
    // Write to an address we don't have access to.
    dobad((uintptr_t*)0x1234);
    return 0;
}
It shows you how to update the PC in response to a page fault. It lacks the following which you have to implement yourself:
Instruction length decoding. As you can see, I have hardcoded + 7, which happens to work on my 64-bit Linux since the instruction causing the page fault is a 7-byte MOV. As Armali said in his answer, it is a tedious problem and you probably have to use an external library like libudis86 or something.
mprotect() handling. You have the address that caused the page fault in siginfo->si_addr and using that it should be trivial to find the address of the mprotected page and unprotect it.

unexpected behavior of linux malloc

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKSIZE 1024*1024
// #define BLOCKSIZE 4096

int main(int argc, char *argv[])
{
    void *myblock = NULL;
    int count = 0;
    while (1)
    {
        myblock = malloc(BLOCKSIZE);
        if (!myblock) {
            puts("error");
            break;
        }
        memset(myblock, 1, BLOCKSIZE);
        count++;
    }
    printf("Currently allocated %d\n", count);
    printf("end");
    exit(0);
}
When BLOCKSIZE is 1024*1024, all is OK: malloc returns NULL, the loop breaks, and the program prints its text and exits.
When BLOCKSIZE is 4096, malloc never returns NULL; instead the program crashes, out of memory, killed by the kernel.
Why?
It's pitch black, you are likely to be eaten by an OOM killer.
Linux has this thing called an OOM killer which wanders about killing off processes when it finds memory allocation is very heavy. The selection of which process(es) to kill is based on certain properties of each process (such as one allocating a lot of memory being a prime candidate).
It does this, partly due to its optimistic memory allocation strategy (it will generally give you address space whether or not there's enough backing memory on devices for it, something known as overcommit).
It's likely in this case that, when allocating 1M at a time, an allocation fails before the OOM killer finds you. With 4K, you're discovered before the allocation routines decide you've had enough.
You can configure the OOM killer to leave you alone if that's your desire, by writing an adjustment value of -17 to your oom_adj entry in procfs. It's not advisable unless you know what you're doing, since it puts other (perhaps more important) processes at risk. Other values from -16 to +15 adjust the likelihood that your process will be selected.
You can also turn off overcommit altogether by writing vm.overcommit_memory=2 to /etc/sysctl.conf but that again can present problems in your environment.
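As a config sketch of both knobs (the process name myprog is a placeholder; note that newer kernels prefer oom_score_adj, with range -1000..1000 and -1000 meaning "never kill", over the legacy oom_adj shown in the answer):

```shell
# Exempt a running process from the OOM killer (legacy interface):
echo -17 > /proc/$(pidof myprog)/oom_adj

# Turn off overcommit system-wide and persist the setting:
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
sysctl -p
```

Both changes require root and affect the whole system's behavior under memory pressure, so treat them as last resorts.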

Simple heap overflow exploit with toy example on old glibc

Consider this example of a heap buffer overflow vulnerable program in Linux, taken directly from the "Buffer Overflow Attacks" (p. 248) book:
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
char *A, *B;
A = malloc(128);
B = malloc(32);
strcpy(A, argv[1]);
free(A);
free(B);
return 0;
}
Since unlink() has been changed to prevent the simplest form of exploit using the FD and BK pointers (a sanity check was added), I'm using a very old system I have with an old version of glibc (version 2.3.2). I'm also setting MALLOC_CHECK_=0 for this testing.
My goal of this toy example is to simply see if I can write 4 bytes to some arbitrary address I specify. The most simple test I can think of is to try write something to 0x41414141, which is an illegal address and should let the program crash to just confirm to me that it is indeed trying to write to this address (something I should be able to observe in GDB).
So I try executing with the argument perl -e 'print "A"x128 . "\xf8\xff\xff\xff" . "\xf8\xff\xff\xff" . "\x41\x41\x41\x41" . "\x41\x41\x41\x41" '
So I have:
Buffer A: 128 bytes of 0x41.
prev_size: 0xfffffff8
size: 0xfffffff8
FD: 0x41414141
BK: 0x41414141
I'm using 0xfffffff8 instead of 0xfffffffc because there is a note that with glibc 2.3 the third lowest bit NON_MAIN_AREA is used for management purposes for the arenas and has to be 0.
This should attempt to write 0x41414141 to 0x41414141 (+ 12 to be more correct, but still an illegal address), correct? However, when I execute this, the program simply terminates normally.
What am I missing here? This seems simple enough that it shouldn't be that hard to get to work.
I've tried various things such as using 0xfffffffc instead for prev_size and size, using legal addresses for FD (some address on the heap). I've tried swapping the order A and B are free()'d, I've tried to step into free() to see what happens in GDB but I got lost. Note that there shouldn't be any other security features on this system as it is very old and wouldn't have NX-bit, ASLR, etc (not that it should matter for the purpose of just writing 4 bytes to an illegal address).
Any ideas for how to make this work?
I could add that if using MALLOC_CHECK_=3 I get this:
malloc: using debugging hooks
malloc: using debugging hooks
free(): invalid pointer 0x8049688!
Program received signal SIGABRT, Aborted.
0x4004a1b1 in kill () from /lib/libc.so.6

Linux shared memory allocation on x86_64

I have 64-bit RHEL Linux, Linux ipms-sol1 2.6.32-71.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
RAM size = ~38GB
I changed default shared memory limits as follows in /etc/sysctl.conf & loaded changed file in memory as sysctl -p
kernel.shmmni=81474836
kernel.shmmax=32212254720
kernel.shmall=7864320
Just on an experimental basis, I changed the shmmax size to 32 GB and tried allocating 10 GB in code using shmget() as given below, but it fails to get 10 GB of shared memory in a single shot. When I reduce my request to 8 GB, it succeeds. Any clue as to where I am possibly going wrong?
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define SHMSZ 10737418240

int main()
{
    int shmspaceid;
    key_t key = 5678;
    struct shmid_ds shmid;

    fprintf(stderr, "Changed code\n");
    if ((shmspaceid = shmget(key, SHMSZ, IPC_CREAT | 0666)) < 0) {
        fprintf(stderr, "ERROR memory allocation failed\n");
        return 1;
    }
    shmctl(shmspaceid, IPC_RMID, &shmid);
    return 0;
}
Regards
Himanshu
I'm not sure that this solution is applicable to shared memory as well, but I know this phenomenon from normal malloc() calls.
It's pretty common that you cannot allocate very large blocks of memory as you try it here. What the function call means is "allocate me a block of contiguous memory of 10737418240 bytes". Often, even if the total system memory could theoretically satisfy this need, the implied "block of contiguous memory" forces the limit of allocatable memory to be much lower.
The in-memory program structure and the number of programs loaded can all contribute to blocking certain areas of memory, leaving no 10 contiguous gigabytes allocatable.
I have found often times that a reboot will change that (as programs get loaded to a different position on the heap). You can try out your maximum allocatable block size with something like this:
size_t i = 1024;
int error = 0;
while (!error) {
    char *a = malloc(i);
    error = (a == NULL);
    if (!error) {
        printf("Successfully allocated %zu.\n", i);
        free(a); /* probe only; release so the probes don't accumulate */
    }
    i *= 2;
}
Hope this helps or is applicable here. I found this out while checking why I could not allocate close to maximum system memory to a JVM.
Shot in the dark: you don't have enough swap space. Shared memory, by default, requires reserving space in swap. You can disable this behavior using SHM_NORESERVE:
http://linux.die.net/man/2/shmget
SHM_NORESERVE (since Linux 2.6.15) This flag serves the same purpose
as the mmap(2) MAP_NORESERVE flag. Do not reserve swap space for this
segment. When swap space is reserved, one has the guarantee that it is
possible to modify the segment. When swap space is not reserved one
might get SIGSEGV upon a write if no physical memory is available. See
also the discussion of the file /proc/sys/vm/overcommit_memory in
proc(5).
I was just looking at this and I recommend printing out the exact errno value and description for the problem, rather than just noting that it failed. For example:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
//#define SHMSZ 10737418240
#define SHMSZ 8589934592
int main()
{
int shmspaceid;
key_t key = 5678;
struct shmid_ds shmid;
if ((shmspaceid = shmget(key, SHMSZ, IPC_CREAT | 0666)) < 0) {
fprintf(stderr,"ERROR with shmget (%d: %s)\n", (int)(errno), strerror(errno));
return 1;
}
shmctl(shmspaceid, IPC_RMID, &shmid);
return 0;
}
I tried to reproduce your problem with an 8 GB block and 8 GB shmmax and shmall on my 16 GB system, but I could not. It worked fine. I do recommend using ipcs -m to look for other shared blocks that might prevent your 10 GB allocation from being honored. And definitely look closely at the exact error code that shmget() is returning through errno.
