Gameboy Emulator pop off empty stack - emulation

I'm working on a Gameboy Emulator, and I've reached a point in the ROM where I get opcode 0xD1 (pop DE off stack) but the stack is empty (no values have been pushed onto it). All unknown opcodes return an error, and all other instructions seem to be working fine.
Is it an error in my programming, the ROM, or is this just a quick way for the program to set DE to 0x0000?

Even if no value has been PUSHed to the stack, POP will load the 16-bit value stored at the address in SP into the specified register pair, and SP will be incremented by 2.
In your example, if SP has been initialized to, say, 0xD000 and WRAM has been zeroed beforehand, POP DE would effectively load 0 into DE and increment the stack pointer by 2.
21 00 C0 ld hl,C000 ;Start of WRAM
01 FF 1F ld bc,1FFF ;Length of WRAM
AF xor a ;a = 0
22 ldi (hl),a ;Blanks WRAM
0B dec bc
78 ld a,b
B1 or c
20 F9 jr nz,0158 ;Loops until WRAM is cleared
21 00 D0 ld hl,D000
F9 ld sp,hl ;SP = 0xD000
D1 pop de ;de = 0x0000, SP = 0xD002
Also, please note that the CALL instruction pushes the return address to the stack, and decrements SP by 2. In the same way, RET retrieves the address from the stack, and increases SP by 2.
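For an emulator, this means POP needs no "is the stack empty?" check at all. The following is only a minimal sketch of that behaviour, not code from any particular emulator: the CPU class, its field names, and the flat zero-filled 64 KiB memory are assumptions made for illustration.

```python
class CPU:
    """Hypothetical, heavily reduced Game Boy CPU state."""

    def __init__(self):
        self.mem = bytearray(0x10000)  # assumed flat 64 KiB address space, zero-filled
        self.sp = 0xD000               # same starting SP as in the listing above
        self.d = 0
        self.e = 0

    def read16(self, addr):
        # Little-endian: low byte at addr, high byte at addr + 1.
        lo = self.mem[addr & 0xFFFF]
        hi = self.mem[(addr + 1) & 0xFFFF]
        return (hi << 8) | lo

    def pop_de(self):
        # POP does not know or care whether anything was pushed;
        # it simply reads two bytes at SP and then adds 2 to SP.
        value = self.read16(self.sp)
        self.d = value >> 8
        self.e = value & 0xFF
        self.sp = (self.sp + 2) & 0xFFFF


cpu = CPU()
cpu.pop_de()
print(hex((cpu.d << 8) | cpu.e), hex(cpu.sp))  # 0x0 0xd002
```

If the ROM's POP returns garbage rather than 0, the first thing to check would be how the emulator initializes the memory that SP points to.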

How does a BL instruction that jumps to an invalid instruction still manage to work correctly?

I'm practicing reverse engineering on an il2cpp Unity project.
Things I've done:
get the APK
use Apktool to extract the files
open libunity.so with Ghidra (IDA works too)
Then I found a weird block of instructions like:
004ac818 f4 0f 1e f8 str x20,[sp, #local_20]!
004ac81c f3 7b 01 a9 stp x19,x30,[sp, #local_10]
004ac820 e1 03 1f 2a mov w1,wzr
004ac824 77 b5 00 94 bl FUN_004d9e00
I followed bl FUN_004d9e00 and found:
FUN_004d9e00
004d9e00 6e ?? 6Eh n
004d9e01 97 ?? 97h
004d9e02 85 ?? 85h
004d9e03 60 ?? 60h `
004d9e04 6d ?? 6Dh m
But here is the thing: the bytes at FUN_004d9e00 do not form a valid instruction. How can libunity.so still work properly?
Perhaps there is a relocation entry for address 0x004ac824? In that case the dynamic linker would patch the instruction when libunity.so is loaded, and it would end up calling a different address (maybe in a different shared library).
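One way to check what the file itself encodes is to decode the BL by hand. The sketch below is not taken from the question; decode_bl is a hypothetical helper, and it assumes only the standard AArch64 BL encoding (top six bits 100101 followed by a signed 26-bit word offset).

```python
def decode_bl(address, raw):
    """Return the branch target of an AArch64 BL given its 4 little-endian bytes."""
    word = int.from_bytes(raw, "little")
    # BL is identified by the fixed top six bits 100101.
    if word >> 26 != 0b100101:
        raise ValueError("not a BL instruction")
    imm26 = word & 0x03FFFFFF
    if imm26 & (1 << 25):  # sign-extend the 26-bit field
        imm26 -= 1 << 26
    # The offset is counted in 4-byte instructions, relative to the BL itself.
    return address + imm26 * 4


target = decode_bl(0x004AC824, bytes.fromhex("77b50094"))
print(hex(target))  # 0x4d9e00
```

So the bytes 77 b5 00 94 at 0x004ac824 really do encode a branch to 0x004d9e00. A different destination would therefore have to come from something that changes the bytes at load time, such as the relocation suggested above.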

Find frame base and variable locations using DWARF version 4

I'm following Eli Bendersky's blog on parsing the DWARF debug information. He shows an example of parsing the binary with DWARF version 2 in his blog. The frame base of a function (further used for retrieving local variables) can be retrieved from the location list:
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
<72> DW_AT_external : 1
<73> DW_AT_name : (...): do_stuff
<77> DW_AT_decl_file : 1
<78> DW_AT_decl_line : 4
<79> DW_AT_prototyped : 1
<7a> DW_AT_low_pc : 0x8048604
<7e> DW_AT_high_pc : 0x804863e
<82> DW_AT_frame_base : 0x0 (location list)
<86> DW_AT_sibling : <0xb3>
...
$ objdump --dwarf=loc tracedprog2
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 08048604 08048605 (DW_OP_breg4: 4 )
00000000 08048605 08048607 (DW_OP_breg4: 8 )
00000000 08048607 0804863e (DW_OP_breg5: 8 )
However, I find in DWARF version 4 there is no such .debug_loc section. Here is the function info on my machine:
<1><300>: Abbrev Number: 17 (DW_TAG_subprogram)
<301> DW_AT_external : 1
<301> DW_AT_name : (indirect string, offset: 0x1e0): do_stuff
<305> DW_AT_decl_file : 1
<306> DW_AT_decl_line : 3
<307> DW_AT_decl_column : 6
<308> DW_AT_prototyped : 1
<308> DW_AT_low_pc : 0x1149
<310> DW_AT_high_pc : 0x47
<318> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<31a> DW_AT_GNU_all_tail_call_sites: 1
Line <318> indicates the frame base is 1 byte block: 9c (DW_OP_call_frame_cfa). Any idea how to find the frame base for the DWARF v4 binaries?
Update based on @Employed Russian's answer:
The frame_base of a subprogram points to the Canonical Frame Address (CFA), which is the value RSP had just before the call instruction.
<2><329>: Abbrev Number: 19 (DW_TAG_variable)
<32a> DW_AT_name : (indirect string, offset: 0x7d): my_local
<32e> DW_AT_decl_file : 1
<32f> DW_AT_decl_line : 5
<330> DW_AT_decl_column : 9
<331> DW_AT_type : <0x65>
<335> DW_AT_location : 2 byte block: 91 6c (DW_OP_fbreg: -20)
So a local variable (my_local in the above example) can be located by the CFA using this calculation: &my_local = CFA - 20 = (current RBP + 16) - 20 = current RBP - 4.
Verify it by checking the assembly:
void do_stuff(int my_arg)
{
1149: f3 0f 1e fa endbr64
114d: 55 push %rbp
114e: 48 89 e5 mov %rsp,%rbp
1151: 48 83 ec 20 sub $0x20,%rsp
1155: 89 7d ec mov %edi,-0x14(%rbp)
int my_local = my_arg + 2;
1158: 8b 45 ec mov -0x14(%rbp),%eax
115b: 83 c0 02 add $0x2,%eax
115e: 89 45 fc mov %eax,-0x4(%rbp)
my_local is at -0x4(%rbp).
This isn't about DWARFv2 vs. DWARFv4 -- with either version the compiler may choose to use or not use location lists. Your compiler chose not to.
Any idea how to find the frame base for the DWARF v4 binaries?
It tells you right there: use the CFA pseudo-register, also known as "canonical frame address".
That "imaginary" register has the same value that %rsp had just before the current function was called. Because the x86-64 stack grows downwards, the current function's return address is always stored at CFA-8, and %rsp == CFA-8 on entry into the function.
If the function uses a frame pointer, then the previous value of %rbp is usually stored at CFA-16, which is why CFA == %rbp + 16 after the usual push %rbp; mov %rsp,%rbp prologue.
More info here.
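To make the arithmetic concrete, here is a small sketch that decodes the 2 byte block 91 6c from the DW_TAG_variable entry and turns it into an address. The function names are hypothetical, and the CFA == %rbp + 16 relation is an assumption that only holds for a function with the standard frame-pointer prologue; a real debugger would evaluate the call frame information instead.

```python
DW_OP_FBREG = 0x91  # opcode value of DW_OP_fbreg


def decode_sleb128(data):
    """Decode one signed LEB128 value; return (value, bytes_consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:        # last byte of the number
            if byte & 0x40:        # sign bit of the final group is set
                result -= 1 << shift
            return result, index + 1
    raise ValueError("truncated SLEB128 value")


def fbreg_offset(block):
    """Extract the frame-base offset from a DW_OP_fbreg location block."""
    if block[0] != DW_OP_FBREG:
        raise ValueError("block does not start with DW_OP_fbreg")
    value, _ = decode_sleb128(block[1:])
    return value


def variable_address(rbp, offset):
    # Assumption: frame-pointer prologue, so CFA = %rbp + 16.
    cfa = rbp + 16
    return cfa + offset


offset = fbreg_offset(bytes([0x91, 0x6C]))
print(offset)                                 # -20
print(variable_address(0x1000, offset) - 0x1000)  # -4, i.e. -0x4(%rbp)
```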

Why does a fully static Rust ELF binary have a Global Offset Table (GOT) section?

This code, when compiled for the x86_64-unknown-linux-musl target, produces a .got section:
fn main() {
    println!("Hello, world!");
}
$ cargo build --release --target x86_64-unknown-linux-musl
$ readelf -S hello
There are 30 section headers, starting at offset 0x26dc08:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[12] .got PROGBITS 0000000000637b58 00037b58
00000000000004a8 0000000000000008 WA 0 0 8
...
According to this answer for analogous C code, the .got section is an artifact that can be safely removed. However, it segfaults for me:
$ objcopy -R.got hello hello_no_got
$ ./hello_no_got
[1] 3131 segmentation fault (core dumped) ./hello_no_got
Looking at the disassembly, I see that the GOT basically holds static function addresses:
$ objdump -d hello -M intel
...
0000000000400340 <_ZN5hello4main17h5d434a6e08b2e3b8E>:
...
40037c: ff 15 26 7a 23 00 call QWORD PTR [rip+0x237a26] # 637da8 <_GLOBAL_OFFSET_TABLE_+0x250>
...
$ objdump -s -j .got hello | grep 637da8
637da8 50434000 00000000 b0854000 00000000 PC@.......@.....
$ objdump -d hello -M intel | grep 404350
0000000000404350 <_ZN3std2io5stdio6_print17h522bda9f206d7fddE>:
404350: 41 57 push r15
The number 404350 comes from 50434000 00000000, which is the little-endian encoding of 0x0000000000404350 (this was not obvious; I had to run the binary under GDB to figure this out!)
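The same decoding can be done without GDB. This is just a quick illustration using the bytes from the objdump -s output above; the variable names are made up.

```python
# First 8 bytes that objdump -s shows at 0x637da8, in file order.
raw = bytes.fromhex("5043400000000000")

# x86-64 stores integers least significant byte first.
address = int.from_bytes(raw, "little")
print(hex(address))  # 0x404350
```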
This is perplexing, since Wikipedia says that
[GOT] is used by executed programs to find during runtime addresses of global variables, unknown in compile time. The global offset table is updated in process bootstrap by the dynamic linker.
Why is the GOT present? From the disassembly, it looks like the compiler knows all the needed addresses. As far as I know, there is no bootstrap done by the dynamic linker: there are neither INTERP nor DYNAMIC program headers in my binary.
Why does the GOT store function pointers? Wikipedia says the GOT is only for global variables, and function pointers should be contained in the PLT.
TL;DR summary: the GOT is really a rudimentary build artifact, which I was able to get rid of via simple machine code manipulations.
Breakdown
If we look at
$ objdump -dj .text hello
and search for GLOBAL, we see only four distinct types of references to the GOT (constants differ):
40037c: ff 15 26 7a 23 00 call QWORD PTR [rip+0x237a26] # 637da8 <_GLOBAL_OFFSET_TABLE_+0x250>
425903: ff 25 5f 26 21 00 jmp QWORD PTR [rip+0x21265f] # 637f68 <_GLOBAL_OFFSET_TABLE_+0x410>
41d8b5: 48 3b 1d b4 a5 21 00 cmp rbx,QWORD PTR [rip+0x21a5b4] # 637e70 <_GLOBAL_OFFSET_TABLE_+0x318>
40b259: 48 83 3d 7f cb 22 00 cmp QWORD PTR [rip+0x22cb7f],0x0 # 637de0 <_GLOBAL_OFFSET_TABLE_+0x288>
40b260: 00
All of these instructions only read from the GOT, which means that the GOT is not modified at runtime. This in turn means that we can statically resolve the addresses that the GOT refers to! Let's consider the reference types one by one:
call QWORD PTR [rip+0x237a26] simply says "go to address [rip+0x237a26], take 8 bytes from there, interpret them as a function address and call the function". We can simply replace this instruction with a direct call:
40037c: e8 cf 3f 00 00 call 404350 <_ZN3std2io5stdio6_print17h522bda9f206d7fddE>
400381: 90 nop
Notice the nop at the end: we need to replace all 6 bytes of machine code that make up the original instruction, but the instruction we replace it with is only 5 bytes long, so we need to pad it. Fundamentally, as we are patching a compiled binary, we can replace an instruction with another one only if the new one is not longer.
jmp QWORD PTR [rip+0x21265f] is the same as the previous one, but instead of calling an address it jumps to it. This turns into:
425903: e9 b8 f7 ff ff jmp 4250c0 <_ZN68_$LT$core..fmt..builders..PadAdapter$u20$as$u20$core..fmt..Write$GT$9write_str17hc384e51187942069E>
425908: 90 nop
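The displacement in both patches is measured from the end of the new 5-byte instruction, not from its start. As a hedged sketch of that calculation (encode_rel32_branch is a hypothetical helper, separate from the script further down):

```python
def encode_rel32_branch(opcode, address, target, original_length):
    """Build a direct call (0xE8) or jmp (0xE9) and pad it with nops.

    address is where the patched instruction starts, target is the
    absolute destination, original_length is the size of the
    instruction being overwritten.
    """
    length = 5  # one opcode byte plus a 32-bit displacement
    displacement = target - (address + length)
    if not -2 ** 31 <= displacement < 2 ** 31:
        raise ValueError("target is out of rel32 range")
    patch = bytes([opcode]) + displacement.to_bytes(4, "little", signed=True)
    return patch + b"\x90" * (original_length - length)


call = encode_rel32_branch(0xE8, 0x40037C, 0x404350, 6)
print(call.hex(" "))  # e8 cf 3f 00 00 90

jmp = encode_rel32_branch(0xE9, 0x425903, 0x4250C0, 6)
print(jmp.hex(" "))   # e9 b8 f7 ff ff 90
```

Both results match the patched bytes shown above, including the backwards jump, whose negative displacement is stored in two's complement.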
cmp rbx,QWORD PTR [rip+0x21a5b4] - this takes 8 bytes from [rip+0x21a5b4] and compares them to the contents of the rbx register. This one is tricky, since cmp cannot compare register contents to a 64-bit immediate value. We could use another register for that, but we don't know which of the registers are used around this instruction. A careful solution would be something like
push rax
mov rax,0x0000006363c0
cmp rbx,rax
pop rax
But that would be way beyond our limit of 7 bytes. The real solution stems from an observation that the GOT contains only addresses; our address space is (roughly) contained in range [0x400000; 0x650000], which can be seen in the program headers:
$ readelf -l hello
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000035b50 0x0000000000035b50 R E 0x200000
LOAD 0x0000000000036380 0x0000000000636380 0x0000000000636380
0x0000000000001dd0 0x0000000000003918 RW 0x200000
...
It follows that we can (mostly) get away with only comparing 4 bytes of a GOT entry instead of 8. So the substitution is:
41d8b5: 81 fb c0 63 63 00 cmp ebx,0x6363c0
41d8bb: 90 nop
The last one consists of two lines of objdump output, since 8 bytes do not fit in one line:
40b259: 48 83 3d 7f cb 22 00 cmp QWORD PTR [rip+0x22cb7f],0x0 # 637de0 <_GLOBAL_OFFSET_TABLE_+0x288>
40b260: 00
It just compares 8 bytes of the GOT to a constant (in this case, 0x0). In fact, we can do the comparison statically; if the operands compare equal, we replace the comparison with
40b259: 48 39 c0 cmp rax,rax
40b25c: 90 nop
40b25d: 90 nop
40b25e: 90 nop
40b25f: 90 nop
40b260: 90 nop
Obviously, a register is always equal to itself. A lot of padding needed here!
If the left operand is greater than the right one, we replace the comparison with
40b259: 48 83 fc 00 cmp rsp,0x0
40b25d: 90 nop
40b25e: 90 nop
40b25f: 90 nop
40b260: 90 nop
In practice, rsp is always greater than zero.
If the left operand is smaller than the right one, things get a bit more complicated, but since we have a whole lot of bytes (8!) we can manage:
40b259: 50 push rax
40b25a: 31 c0 xor eax,eax
40b25c: 83 f8 01 cmp eax,0x1
40b25f: 58 pop rax
40b260: 90 nop
Notice that the second and the third instructions use eax instead of rax, since cmp and xor involving eax take one less byte than with rax.
Testing
I have written a Python script to do all these substitutions automatically (it's a bit hacky and relies on parsing of objdump output though):
#!/usr/bin/env python3
import re
import sys
import argparse
import subprocess


def read_u64(binary):
    # Little-endian 64-bit read from the start of a bytes object.
    return sum(binary[i] * 256 ** i for i in range(8))


def distance_u32(start, end):
    # Signed 32-bit distance, returned as an unsigned two's complement value.
    assert abs(end - start) < 2 ** 31
    diff = end - start
    if diff < 0:
        return 2 ** 32 + diff
    else:
        return diff


def to_u32(x):
    assert 0 <= x < 2 ** 32
    return bytes((x // (256 ** i)) % 256 for i in range(4))


class GotInstruction:
    def __init__(self, lines, symbol_address, symbol_offset):
        self.address = int(lines[0].split(":")[0].strip(), 16)
        self.offset = symbol_offset + (self.address - symbol_address)
        self.got_offset = int(lines[0].split("(File Offset: ")[1].strip().strip(")"), 16)
        self.got_offset = self.got_offset % 0x200000  # No idea why the offset is actually wrong
        self.bytes = []
        for line in lines:
            self.bytes += [int(x, 16) for x in line.split("\t")[1].split()]


class TextDump:
    symbol_regex = re.compile(r"^([0-9a-f]{16}) <(.*)> \(File Offset: 0x([0-9a-f]*)\):")

    def __init__(self, binary_path):
        self.got_instructions = []
        objdump_output = subprocess.check_output(["objdump", "-Fdj", ".text", "-M", "intel",
                                                  binary_path])
        lines = objdump_output.decode("utf-8").split("\n")
        current_symbol_address = 0
        current_symbol_offset = 0
        for line_group in self.group_lines(lines):
            match = self.symbol_regex.match(line_group[0])
            if match is not None:
                current_symbol_address = int(match.group(1), 16)
                current_symbol_offset = int(match.group(3), 16)
            elif "_GLOBAL_OFFSET_TABLE_" in line_group[0]:
                instruction = GotInstruction(line_group, current_symbol_address,
                                             current_symbol_offset)
                self.got_instructions.append(instruction)

    @staticmethod
    def group_lines(lines):
        if not lines:
            return
        line_group = [lines[0]]
        for line in lines[1:]:
            if line.count("\t") == 1:  # this line continues the previous one
                line_group.append(line)
            else:
                yield line_group
                line_group = [line]
        yield line_group

    def __iter__(self):
        return iter(self.got_instructions)


def read_binary_file(path):
    try:
        with open(path, "rb") as f:
            return f.read()
    except (IOError, OSError) as exc:
        print(f"Failed to open {path}: {exc.strerror}")
        sys.exit(1)


def write_binary_file(path, content):
    try:
        with open(path, "wb") as f:
            f.write(content)
    except (IOError, OSError) as exc:
        print(f"Failed to write {path}: {exc.strerror}")
        sys.exit(1)


def patch_got_reference(instruction, binary_content):
    got_data = read_u64(binary_content[instruction.got_offset:])
    code = instruction.bytes
    if code[0] == 0xff:
        assert len(code) == 6
        relative_address = distance_u32(instruction.address, got_data)
        if code[1] == 0x15:  # call QWORD PTR [rip+...]
            patch = b"\xe8" + to_u32(relative_address - 5) + b"\x90"
        elif code[1] == 0x25:  # jmp QWORD PTR [rip+...]
            patch = b"\xe9" + to_u32(relative_address - 5) + b"\x90"
        else:
            raise ValueError(f"unknown machine code: {code}")
    elif code[:3] == [0x48, 0x83, 0x3d]:  # cmp QWORD PTR [rip+...],<BYTE>
        assert len(code) == 8
        if got_data == code[7]:
            patch = b"\x48\x39\xc0" + b"\x90" * 5  # cmp rax,rax
        elif got_data > code[7]:
            patch = b"\x48\x83\xfc\x00" + b"\x90" * 4  # cmp rsp,0x0
        else:
            patch = b"\x50\x31\xc0\x83\xf8\x01\x58\x90"  # push rax
                                                         # xor eax,eax
                                                         # cmp eax,0x1
                                                         # pop rax
                                                         # nop
    elif code[:3] == [0x48, 0x3b, 0x1d]:  # cmp rbx,QWORD PTR [rip+...]
        assert len(code) == 7
        patch = b"\x81\xfb" + to_u32(got_data) + b"\x90"  # cmp ebx,<DWORD>
    else:
        raise ValueError(f"unknown machine code: {code}")
    return dict(offset=instruction.offset, data=patch)


def make_got_patches(binary_path, binary_content):
    patches = []
    text_dump = TextDump(binary_path)
    for instruction in text_dump.got_instructions:
        patches.append(patch_got_reference(instruction, binary_content))
    return patches


def apply_patches(binary_content, patches):
    for patch in patches:
        offset = patch["offset"]
        data = patch["data"]
        binary_content = binary_content[:offset] + data + binary_content[offset + len(data):]
    return binary_content


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("binary_path", help="Path to ELF binary")
    parser.add_argument("-o", "--output", help="Output file path", required=True)
    args = parser.parse_args()
    binary_content = read_binary_file(args.binary_path)
    patches = make_got_patches(args.binary_path, binary_content)
    patched_content = apply_patches(binary_content, patches)
    write_binary_file(args.output, patched_content)


if __name__ == "__main__":
    main()
Now we can get rid of the GOT for real:
$ cargo build --release --target x86_64-unknown-linux-musl
$ ./resolve_got.py target/x86_64-unknown-linux-musl/release/hello -o hello_no_got
$ objcopy -R.got hello_no_got
$ readelf -e hello_no_got | grep .got
$ ./hello_no_got
Hello, world!
I have also tested it on my ~3k LOC app, and it seems to work alright.
P.S. I am not an expert in assembly, so some of the above might be inaccurate.

Race condition on ticket-based ARM spinlock

I found that spinlocks in Linux kernel are all using "ticket-based" spinlock now. However after looking at the ARM implementation of it, I'm confused because the "load-add-store" operation is not atomic at all. Please see the code below:
74 static inline void arch_spin_lock(arch_spinlock_t *lock)
75 {
76     unsigned long tmp;
77     u32 newval;
78     arch_spinlock_t lockval;
79
80     __asm__ __volatile__(
81 "1: ldrex   %0, [%3]\n" /*Why this load-add-store is not atomic?*/
82 "   add     %1, %0, %4\n"
83 "   strex   %2, %1, [%3]\n"
84 "   teq     %2, #0\n"
85 "   bne     1b"
86     : "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
87     : "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
88     : "cc");
89
90     while (lockval.tickets.next != lockval.tickets.owner) {
91         wfe();
92         lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
93     }
94
95     smp_mb();
96 }
As you can see, lines 81 to 83 load lock->slock into "lockval", increment the ticket, and store the result back to lock->slock.
However, I don't see anything that ensures this is atomic. So it seems possible that:
Two users on different CPUs read lock->slock into their own "lockval" at the same time; then each of them increments its copy and stores it back.
Both users would then hold the same "number", and once the "owner" field reaches that number, both of them would acquire the lock and operate on the shared resource!
I don't think kernel can have such a bug in spinlock. Am I wrong somewhere?
STREX is a conditional store. This code has Load-Link/Store-Conditional semantics, even if ARM doesn't use that name.
The operation either completes atomically, or fails.
The assembler block tests for failure (the tmp variable indicates failure) and reattempts the modification, using the new value (updated by another core).
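The retry behaviour can be illustrated with a toy model. This is only a sketch of the idea, not of the real hardware: ExclusiveWord and take_ticket are hypothetical names, the exclusive monitor is reduced to a set of CPU ids, and the ticket is incremented by 1 rather than by 1 << TICKET_SHIFT.

```python
class ExclusiveWord:
    """Toy model of one memory word guarded by an exclusive monitor."""

    def __init__(self, value=0):
        self.value = value
        self.reserved = set()  # CPUs that currently hold a reservation

    def ldrex(self, cpu):
        self.reserved.add(cpu)
        return self.value

    def strex(self, cpu, new_value):
        # Returns 0 on success and 1 on failure, like the real instruction.
        if cpu not in self.reserved:
            return 1
        self.value = new_value
        self.reserved.clear()  # a successful store invalidates every reservation
        return 0


def take_ticket(word, cpu, interleave=None):
    """Mirror of the ldrex / add / strex / bne loop; returns the ticket drawn."""
    while True:
        lockval = word.ldrex(cpu)
        if interleave is not None:
            interleave()       # another CPU runs between our load and our store
            interleave = None
        if word.strex(cpu, lockval + 1) == 0:
            return lockval


word = ExclusiveWord()
tickets = {}


def cpu1_races():
    tickets[1] = take_ticket(word, cpu=1)


# CPU 0 loads the word, CPU 1 completes a whole load-add-store in between,
# so CPU 0's first strex fails and it retries with the updated value.
tickets[0] = take_ticket(word, cpu=0, interleave=cpu1_races)
print(sorted(tickets.values()), word.value)  # [0, 1] 2
```

In this model the scenario from the question cannot produce two identical tickets: the CPU whose store comes second has lost its reservation, so its store fails and it loops back to reload.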

gnu C++ library stuck in loop during vector alloc

Running linux kernel 3.6.6-1, gcc 4.7.2-2, the following program:
1 #include <vector>
2 using namespace std;
3 int main ()
4 {
5     vector<size_t> a (1 << 24);
6     return 0;
7 }
never returns from line 5.
when I run in gdb, I see that it is stuck in stl_algobase.h at line 743/744:
0x000000000040101c in std::__fill_n_a<unsigned long*, unsigned long, unsigned long> (__first=0x7fffeffd8060, __n=16777216, __value=@0x7fffffffe0a8: 0)
at /usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/bits/stl_algobase.h:743
740 __fill_n_a(_OutputIterator __first, _Size __n, const _Tp& __value)
741 {
742 const _Tp __tmp = __value;
743 for (__decltype(__n + 0) __niter = __n;
744 __niter > 0; --__niter, ++__first)
745 *__first = __tmp;
746 return __first;
747 }
__niter just stays at the value 1 and never counts down to 0.
This behavior only occurs after my system has been running for a while. And when it occurs, the whole system seems borked. That is, the gui soon stops responding, but I can ssh in and do some stuff, but eventually the whole system becomes unusable and I reboot.
After I reboot, the above program behaves as expected.
Obviously, the problem is not with my program. It's just a symptom of some larger problem.
My question is: What do I do next?
I have checked all my error logs and found nothing. I'm not getting hardware exceptions or anything like that, so it's hard to tell exactly when my system goes into this state.
I'm out of ideas, so any help would be very appreciated.
edit:
I changed my compiler options to -g -Wall and get the same result.
Here is the disassembly for __fill_n_a (with new options):
0x00000000004010bd <+0>: push %rbp
0x00000000004010be <+1>: mov %rsp,%rbp
0x00000000004010c1 <+4>: mov %rdi,-0x18(%rbp)
0x00000000004010c5 <+8>: mov %rsi,-0x20(%rbp)
0x00000000004010c9 <+12>: mov %rdx,-0x28(%rbp)
0x00000000004010cd <+16>: mov -0x28(%rbp),%rax
0x00000000004010d1 <+20>: mov (%rax),%rax
0x00000000004010d4 <+23>: mov %rax,-0x10(%rbp)
0x00000000004010d8 <+27>: mov -0x20(%rbp),%rax
0x00000000004010dc <+31>: mov %rax,-0x8(%rbp)
0x00000000004010e0 <+35>: jmp 0x4010f7 <std::__fill_n_a<unsigned long*, unsigned long, unsigned long>(unsigned long*, unsigned long, unsigned long const&)+58>
0x00000000004010e2 <+37>: mov -0x18(%rbp),%rax
0x00000000004010e6 <+41>: mov -0x10(%rbp),%rdx
0x00000000004010ea <+45>: mov %rdx,(%rax)
0x00000000004010ed <+48>: subq $0x1,-0x8(%rbp)
0x00000000004010f2 <+53>: addq $0x8,-0x18(%rbp)
0x00000000004010f7 <+58>: cmpq $0x0,-0x8(%rbp)
0x00000000004010fc <+63>: setne %al
0x00000000004010ff <+66>: test %al,%al
0x0000000000401101 <+68>: jne 0x4010e2 <std::__fill_n_a<unsigned long*, unsigned long, unsigned long>(unsigned long*, unsigned long, unsigned long const&)+37>
0x0000000000401103 <+70>: mov -0x18(%rbp),%rax
0x0000000000401107 <+74>: pop %rbp
0x0000000000401108 <+75>: retq
I've also run my system's memory diagnostic tool with no errors and, as suggested by DL, ran memtest86 with no errors.
edit:
I have confirmed that this is not a hardware problem by running the same code on a different machine. The other machine has the same kernel and compiler software installed, and it fails in the same way.
I am suspicious of ImageMagick. This seems to occur only after I have run scripts that make a lot of ImageMagick convert calls. I had problems with ImageMagick previously and had to set the shell variable MAGICK_THREAD_LIMIT=1.
The overall symptoms you describe sound like running out of memory. If the system memory use does not read as high, this may be due to some kind of RAM problem, as commenters have noted.
You say:
__niter just stays at the value 1 and never counts down to 0.
but this doesn't quite make sense -- __niter should start at 16777216 and count down to 0. If you were to break into this program at a random moment, it would almost certainly be in this loop, but the value of __niter would almost certainly not be 1 yet, and if you stepped through the loop it would just appear to keep looping. I'm highly suspicious of the debugging info put out by gcc 4.7 (actually, it has been a problem pretty much since gcc 4.0), in that gdb frequently seems to print the wrong values for local variables, but if you inspect the code and look at memory/registers directly you can see the correct value. If that's what is happening here, your problem probably has nothing to do with this program; it's a system instability (possibly due to a hardware problem) that manifests as things hanging, such as this program. Given what this program does, the hang probably occurs when it touches a previously untouched page (getting a page fault) and the kernel attempts to allocate a page. That suggests a memory problem, but you noted that you already ran memory diagnostics. Also make sure that you don't have anything overclocked or otherwise running out of spec.
