I've "finished" emulating my 6502 processor, and I'm in the testing phase right now. Being the beginner that I am, I've been stuck on this one issue for a couple of hours already. I'm following an NES test which basically is just a homebrew ROM and a log someone made, and it says what value is supposed to be in every register after a step through the memory. I'm stuck on one part where the log is showing that the overflow flag is cleared after an immediate SBC instruction, even though the accumulator has the value 0x40, and first argument (memory pc + 1) is equal to 0x41. That means that 0x40 - 0x41 should be 0xFF, i.e. -1, so that means that there was an overflow, right? I read some article about the overflow flag, and it states that an overflow is produced when a value is too big or too small to be held in a signed byte, and therefore overflows to the other side (i.e. 2 8 bit positives become an 8 bit negative, and vice versa). So am I thinking the wrong way? This is the log line:
CBC6 E9 41 SBC #$41 A:40 X:AA Y:73 P:E5
The line after that states that the P register has become 0xA4, which would mean the carry and overflow flags are both cleared. I get the carry part, but not the overflow part. This is the next line, if care for it:
CBC8 20 62 F9 JSR $F962 A:FF X:AA Y:73 P:A4
By the way; that second line proves my point: the A register has indeed become 0xFF, and 0xFF in a signed byte would equal 256 - 255 = -1........
As you correctly say, overflow is for signed operations if the result doesn't fit. Note that a signed byte can hold values between -128 and +127, and as such -1 fits nicely in that range. That it has crossed zero is not relevant for signed overflow because 0 is in the middle of the representable range. It's only interesting for unsigned carry. For overflow the boundary is at 0x80 (-128) which in turn is irrelevant for unsigned operations.
Related
So I got recently interested in buffer overflow and many tutorials and recourses online have this CTF like attack where you need to read the content of a flag file (using cat for example).
So I started looking online for assembly examples of how to do this and I came accross sites like this or shell-storm where there are plenty of examples on how to do this.
So I generated my exploit and got this machine code (it basically executes a shell doing cat flag):
shellcode = b'\x31\xc0\x50\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x89\xe3\x50\x68\x66\x6c\x61\x67\x89\xe1\x50\x51\x53\x89\xe1\x31\xc0\x83\xc0\x0b\xcd\x80'
The problem is that, thanks to stepping in with GDB to debug the problem, I noticed that my buffer doesn't get copied starting with \x0b towards the end of the shell code. I know the problem is there because if I change it to say \x3b then it works (with the rest of my exploits not copied here) even if it obviously crashes when it reaches the wrong value there but at least the whole buffer gets copied. Now doing some research it seems like \x0b is a "bad char" which can cause issues and should be avoided. Having said this I don't understand how:
All those online and even university tutorials use that shell code
for this exact task.
How to potentially fix this. Is it even possible without completely
change the assembly code?
I will add that I am on Ubuntu and trying to make this work on 64 bits.
One thing that's special about byte 0x0b is it's ASCII Vertical Tab, which is considered a whitespace character.
So I'm going to make a wild guess that the code you're exploiting looks something like
// Dangerous code, DO NOT USE
char buf[TOO_SMALL];
scanf("%s", buf);
since scanf("%s") is a commonly (mis)used input mechanism that stops when it hits whitespace. If so, then if your shellcode contains 0x0b or any other whitespace character, it will get truncated.
To your first question, as to "why do other tutorials use shellcode like this", they may be thinking instead of exploiting code like
// Dangerous code, DO NOT USE
char buf[TOO_SMALL];
gets(buf);
where gets() will not stop reading at 0x0b but only at newline 0x0a. Or maybe they are thinking of a buffer filled by strcpy() which will only stop at 0x00, or maybe a buffer filled by read() with a user-controlled size which will read the full amount of data no matter what bytes it contains. So the question of which characters are "bad" depends on what the vulnerable code actually does.
As to how to handle it, well, you need to modify your shellcode to use only instructions that don't contain any whitespace bytes. This sort of thing is more an art than a science; you have to know your instruction set well, and be creative in thinking about alternative instruction sequences to achieve the desired result. Sometimes you may be able to do it with minor tweaks; other times a wholesale rewrite may be needed. It really varies.
In this case, luckily the 0x0b is the only whitespace character in the whole code, and it appears in the instruction
83C00B add eax, 0x0b
Since eax was previously zeroed, the goal is to load it with the value 0xb which is the system call number of execve. When the "bad byte" appears as part of immediate data, it is usually not too hard to find another way to get that data to where it needs to go. (Life is harder when the bad byte is part of the opcode itself.) In this case, a simple solution is to take advantage of two's complement, and write instead
83E8F5 sub eax, -0x0b
The single byte -0x0b = 0xf5 gets sign-extended to 32 bits and used as the value to subtract, which leaves 0x0b in eax as desired. Of course there are lots of other ways, some of which may have smaller code size; I'll leave this to your ingenuity.
To find out the "bad char" for the shellcode is an important step to exploit an overflow vulneribility.
first, you have to figure out how many bits the target can be overflow (this field is also for the shellcode). if this zone is big enough and you can use all the "char"(google bad char from \x01 to \xff. \x00 is bad char) to act as shellcode send to target.
Then you can get find the Register to see what the char left.(if the zone is not big enough for all the chars you can send just some chars one time and repeat)
you can follow this https://netsec.ws/?p=180.
I realized when I am looking at some files through GDB, very frequently, there are these three lines of codes at the starting of the function
0x08048548 <+0>: lea ecx,[esp+0x4]
0x0804854c <+4>: and esp,0xfffffff0
0x0804854f <+7>: push DWORD PTR [ecx-0x4]
I usually ignored them because right after those three lines stack frame gets created which is how functions usually start.
Thank you.
This is aligning the stack pointer to a 16-byte boundary, because sometimes (for SSE) the CPU needs 16 byte alignment of data.
A good compiler will examine the call graph (figure out what calls what), and will decide that:
the function doesn't need stack alignment itself and doesn't call other functions that need stack alignment; and therefore no stack alignment is needed
all of the function's callers used an aligned stack, and therefore either:
the function only needs a fixed adjustment to re-establish the pre-existing alignment, like sub esp, 8 (which could be merged with any code that reserves stack space for local variables)
the data that actually needs 16 byte alignment can be given 16 byte alignment without aligning the stack itself
none of the above can be proven to be true, so the function has to assume "worst case" and enforce alignment itself (e.g. the instructions you've seen at the start of the function)
Of course for a good compiler, the last case (where the code you've shown is needed) is extremely rare.
However; most compilers can't be good because they're not able to see the whole program (if the program is split into multiple object files that are compiled separately, then the compiler can only see a fraction of the program at a time). They can't figure out much/any of the call graph, so the last case (where the code you've shown is needed) becomes very common. To solve this you need "link time code generation", but often people don't bother.
Note: For AVX2 you want 32 byte alignment, for AVX512 you want 64 byte alignment, and for some things (to avoid false sharing in heavily threaded code) you might want "cache line size alignment" (typically also 64 byte alignment). This makes the "examine call graph to determine what alignment is actually needed" algorithm a little more complicated than what I described.
In x86, I understand multi-byte objects are stored in memory little endian style.
Now generally speaking, when it comes to CPU instructions, the OPCODE determines the purpose of the instruction and data/memory addresses may follow the opcode in it's encoded format. My understanding is the Opcode portion of the instruction should be the most significant byte and thus appear at the highest address of any given instruction encoding representation.
Can someone explain the memory layout on this x86 linux gdb example? I would imagine that the opcode 0xb8 would appear at a higher address due to it being the most significant byte.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x08048080 <+0>: mov eax,0x11223344
(gdb) x/1xb _start+0
0x8048080 <_start>: 0xb8
(gdb) x/1xb _start+1
0x8048081 <_start+1>: 0x44
(gdb) x/1xb _start+2
0x8048082 <_start+2>: 0x33
(gdb) x/1xb _start+3
0x8048083 <_start+3>: 0x22
(gdb) x/1xb _start+4
0x8048084 <_start+4>: 0x11
It appears the instruction mov eax, 0x11223344 is encoding as 0x11 0x22 0x33 0x44 0xb8.
Questions.
1.) How does the CPU know how many bytes the instruction will take up if the first byte it see's is not an opcode?
2.) I'm wondering if perhaps x86 cpu instructions do not even have endian-ness and are considering some type of string? (probably way off here)
x86 is a variable length instruction set, you start with a single byte which has no endianness, it is wherever it is.
Then depending on the opcode there may be more bytes and those might for example be a 32 bit immediate, and (if that group of bytes is an immediate or address of some sort) THOSE bytes will be little endian. Say you have the five bytes ABCDE (no endianess, think array or string). The A byte is the opcode, the B byte would then be the lower 8 bits of the immediate and the E the upper 8 bits of the immediate.
Opcode is a hard to use term, in these older 8/16 bit CISC processors like x86 the entire byte was an opcode that you basically looked up in a table to see what it meant (and inside the processor they did use a table to figure out how to execute it). When you look at MIPS or ARM or other (certainly RISC) instruction sets like those, only a portion of the 32 bits are the "opcode" and in neither of those cases is it the same set of bits from one instruction to another, you have to look at various places in the instruction (yes there is overlap to make the decoding sane), MIPS is a lot more consistent you have one blob in one place you look at but one of those patterns requires you to look at another blob of bits to fully decode. ARM you start at a common bit and as you work your way across you are further decoding the instruction, then you may have to grab some random looking spots to fully decode. The rest of the bits are operands, what register to use or immediate or whatever the kind of thing that in a CISC you needed a look up table for (are implied by the opcode but not defined by bits in the opcode).
1) the next byte after the prior instruction will be interpreted as an opcode even if not intended to be one (if execution continues to that byte and doesnt branch). I dont remember my x86 table off hand to know if there are any undefined instructions or not, if undefined then it will hit a handler, otherwise it will decode what it finds as machine code and if it is not properly formed instructions will likely crash, sometimes you get lucky and it just messes something up and keeps going, or even more lucky and you cant tell that it almost crashed.
2) you are right for these 8/16 bit CISC or similar instruction sets they are treated more like strings that you parse through linearly.
I read in ARM docs that:
GE[3:0], bits[19:16]
The instructions described in Parallel addition and subtraction instructions on
page A4-171 update these flags to indicate the results from individual bytes or halfwords
of the operation. These flags can control a later SEL instruction.
So apparently GE[3:0] stands for "eq/lt/gt"?
I came into a couple of strange issues which I yet don't have a clue, but they all have CPSR value xxxf0030, so the GE bits are 0b1111? What does that stands for? Is it normal for these GE bits?
Thanks in advance!
In the ARMv7 ARM (which matches that text), the details of how the GE flags get set are only in the operation pseudocode of the parallel instructions themselves. Sadly, they seem to have removed this nice prose description which was in the ARMv6 ARM:
Instructions that operate on halfwords:
set or clear GE[3:2] together, based on the result of the top halfword calculation
set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
set or clear GE[3] according to the result of the top byte calculation
set or clear GE[2] according to the result of the second byte calculation
set or clear GE[1] according to the result of the third byte calculation
set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the
corresponding calculation are as follows:
for unsigned byte addition, if the result is greater than or equal to 2^8
for unsigned halfword addition, if the result is greater than or equal to 2^16
for unsigned subtraction, if the result is greater than or equal to zero
for signed arithmetic, if the result is greater than or equal to zero.
As arithmetic flags, they could have any old value (undefined at reset, and can be freely written to via APSR), so until you've specifically used one of the instructions which sets them, they're pretty much meaningless and can be ignored.
This question is not about any error I'm currently seeing, it's more about theory and getting educated on the variations in HW architecture design and implementation.
Scenario 1: Assuming a 16-bit processor with 16-bit registers, 16-bit addressing, and sizeof(int) = 16 bits:
unsigned int a, b, c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (In this case I would expect that an overflow flag would be raised during the add operation, and c either remain unchanged or be filled with undefined data.)
Scenario 2: Assuming a 32-bit processor with 32-bit registers, 32-bit addressing, sizeof(int) = 32 bits, and sizeof(short int)=16 bits:
unsigned int a, b;
unsigned short int c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (I would expect that no overflow flag be raised during the add operation, but whether or not a memory access or overflow flag be raised during the assignment operation would depend on the actual design and implementation of the HW. If d was located in the upper 16 bits of the same 32-bit address location (probably not even possible with 32-bit addressing), it might be overwritten.)
Scenario 3: Assuming a 32-bit processor with 32-bit registers, 16-bit addressing, sizeof(int) = 32 bits, and sizeof(short int)=16 bits:
unsigned int a, b;
unsigned short int c, d;
a=0xFFFF;
b=0xFFFF;
c=a+b;
Is it possible for the memory location next to c to be overwritten? (I would expect some overflow flag or memory violation flag to be raised during the type conversion and assignment operation.)
Scenario 4: Assuming a 32-bit processor with 32-bit registers, 32-bit addressing, and sizeof(int) = 32 bits:
unsigned int a, b;
struct {
unsigned int c:16;
unsigned int d:16;
} e;
a=0xFFFF;
b=0xFFFF;
e.c=a+b;
Is it possible for the memory location next to c, namely d, to be overwritten? (In this case, since c and d are expected to reside in the same 32-bit address and both are technically 32-bit integers, it's conceivable that no overflow flags be raised during addition or assignment, and d could be affected.)
I have not tried to test this on actual hardware because my question is more about theory and possible variations in design and implementation. Any insight would be appreciated.
Is it possible for a binary integer operation that results in an overflow to overwrite adjacent memory?
Are there currently any HW implementations that suffer from similar memory overwrite problems, or have there been systems in the past that had this problem?
What devices do typical processors use to guard against neighboring memory from being overwritten by arithmetic and assignment operations?
None of your 1,2,3,4 will result in any memory corruption, or writing "past" the storage for the integer location you are performing arithmetic on. C specifies what should happen on an overflow of unsigned integers. This is assuming the compiler produces the code it's supposed to. Nothing will guard you acainst a buggy compiler that generates code which copies 4 bytes into a 2 byte variable.
Here's what C99 (6.2.5) says:
"A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type."
So, it's well defined what will happen when you try to "overflow" an unsigned integer.
Now, if your integers had been signed integers, it is an other story. According to C, overflowing a signed integer results in undefined behavior. And undefined behavior means anything, including memory corruption, could occur. I have not yet seen a C compiler that would corrupt anything regarding overflowing an integer though.
What devices do typical processors use to guard against neighboring memory from being overwritten by arithmetic and assignment operations?
There's no guards against neighboring memory regarding assignment and arithmetic operations. The processor simply executes the machine code instructions given to it (And if those instructions overwrites memory it was not suppoed to do as expressed in a higher level language, the processor does not care).
At a slightly different level, the CPU might issue traps if it cannot carry out the operation(e.g. the operation specified memory location that does not exist), or tries to do some illegal operation (e.g. division by zero, or encounters an op code the processor does not understand, or tries to do unaligned access to data).
The addition operation is processed inside the processor afaik, so whatever you do, the add operation will be done inside the CPU (in the ALU more precisely)
The overflow register would be set in case of overflow, and the result will still be in the register, and then copied back to your memory location without risk of corrupting adjacent memory.
This is how the code would (sort of) be translated in asm:
mov ax, ptr [memory location of a]
mov bx, ptr [memory location of b]
add ax,bx
mov ptr [memory location of c], ax
so as you can see, c would only hold what was in ax (which is of a known and fixed size) no matter if an overflow occurred or not
In C, the behavior of overflows with unsigned types is well-defined. Any implementation in which overflow causes any result other than what the standard predicts is non-compliant.
The behavior of overflows with signed types is undefined. While the most common effects of an overflow would either be the assignment of some erroneous value or an outright crash, nothing in the C standard guarantees that the processor won't trigger an overflow fault in a way that the compiled code tries to recover from, but which for some reason trashes the contents of a stack variable or a register. It's also possible that an overflow could cause a flag to get set in a way that code doesn't expect, and that such a flag might cause erroneous behavior on future computations.
Other languages may have different semantics for what happens when an overflow occurs.
note: I've seen processors which trap on overflow, processors where unexpected traps that happen at the same time as an external interrupt may cause data corruption, and processors where an unexpected unsigned overflow could cause a succeeding computation to be off by one. I don't know of any where a signed overflow would latch a flag that would interfere with subsequent calculations, but some DSP's have interesting overflow behaviors, so I wouldn't be surprised if one exists.