intel x64 REX prefix - 64-bit

For these two instructions:
12 /r Add with carry r/m8 to byte register.
REX + 12 /r Add with carry r/m64 to byte register.
Both of these can have the REX prefix if the register on the first instruction is 9-16... So how does the CPU differentiate between the two? Is the REX prefix on the first instruction missing the 1 in the 7th bit so it's just REX.B 0x01?

No, obviously not. The first one doesn't have a REX prefix, and the second one does. The first form by definition has no REX prefix, and therefore cannot have it (that would, again by definition, make it the second form instead). The reason they are both in the manual is so there can be asterisks next to the form with a REX prefix, and a note that it can't encode AH, BH, CH, or DH.
Is the REX prefix on the first instruction missing the 1 in the 7th bit so it's just REX.B 0x01?
That makes no sense.
So how does the CPU differentiate between the two?
Well, one has a REX prefix and the other does not.
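To make the difference concrete, here is a minimal Python sketch of how a decoder might treat opcode 12 /r in 64-bit mode. The helper names (is_rex, decode_adc_r8) are made up for illustration, and it only handles the register-direct form (mod = 11). The point it tries to show: the mere presence of a byte in the range 0x40-0x4F before the opcode changes which 8-bit registers the numbers 4-7 refer to.

```python
# 8-bit register names selected by the register number.
# Without REX, numbers 4-7 mean the legacy high-byte registers.
LEGACY_R8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
# With any REX prefix present, 4-7 mean the low bytes of rsp/rbp/rsi/rdi,
# and REX.R / REX.B extend the number to reach r8b-r15b.
REX_R8 = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"] + [
    f"r{n}b" for n in range(8, 16)
]


def is_rex(byte):
    """In 64-bit mode a REX prefix is any byte of the form 0100WRXB (0x40-0x4F)."""
    return (byte & 0xF0) == 0x40


def decode_adc_r8(code):
    """Decode 'adc r8, r/m8' (opcode 12 /r), register-direct forms only."""
    code = bytes(code)
    rex = None
    pos = 0
    if is_rex(code[pos]):
        rex = code[pos]
        pos += 1
    if code[pos] != 0x12:
        raise ValueError("not opcode 12 /r")
    modrm = code[pos + 1]
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles mod=11 (register operands)")
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    if rex is not None:
        reg |= ((rex >> 2) & 1) << 3  # REX.R extends the reg field
        rm |= (rex & 1) << 3          # REX.B extends the r/m field
        names = REX_R8
    else:
        names = LEGACY_R8
    return f"adc {names[reg]}, {names[rm]}"


print(decode_adc_r8([0x12, 0xE3]))        # adc ah, bl
print(decode_adc_r8([0x40, 0x12, 0xE3]))  # adc spl, bl
print(decode_adc_r8([0x44, 0x12, 0xC3]))  # adc r8b, bl
```

Note that 40 12 E3 has every REX bit clear and still changes the meaning of register number 4, which is why the manual lists the REX form separately.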

Related

brk segment overflow error in x86 assembly [duplicate]

When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:
In general, the intended size of the data item at a given memory
address can be inferred from the assembly code instruction in which it is
referenced. For example, in all of the above instructions, the size of
the memory regions could be inferred from the size of the register
operand. When we were loading a 32-bit register, the assembler could
infer that the region of memory we were referring to was 4 bytes wide.
When we were storing the value of a one byte register to memory, the
assembler could infer that we wanted the address to refer to a single
byte in memory.
The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:
mov QWORD PTR [rip+0x21b520], 0x1
In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.
Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:
mov esi, DWORD PTR [rax*4+0x419260]
In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?
You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction
mov esi, DWORD PTR [rax*4+0x419260]
the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.
Similarly, in
mov rsi, QWORD PTR [rax*4+0x419260]
the QWORD PTR specifier is optional for the exact same reason.
But it is not always optional. Consider your first example:
mov QWORD PTR [rip+0x21b520], 0x1
Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.
In other words, when one of the operands is a register, the size specifier is optional because the assembler can figure out the size based on the size of the register. However, when there is no register operand, as when storing an immediate value to a memory operand, the size specifier is required to ensure you get the results you want.
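To see why the assembler cannot guess, it may help to look at the machine code. The following is a rough Python sketch (the function name encode_mov_rip_imm is invented for illustration) that builds the 64-bit-mode encodings of a store of an immediate to [rip+disp] for each operand size. Each size is a different instruction encoding, and nothing in the address or the value 0x1 selects among them.

```python
import struct


def encode_mov_rip_imm(disp, imm, size):
    """Encode 'mov <size> ptr [rip+disp], imm' for 64-bit mode.

    size is the operand width in bytes: 1, 2, 4, or 8.
    """
    modrm = 0x05  # mod=00, reg=/0, r/m=101: RIP-relative, disp32 follows
    disp32 = struct.pack("<i", disp)
    if size == 1:
        # C6 /0 ib: mov r/m8, imm8
        return bytes([0xC6, modrm]) + disp32 + (imm & 0xFF).to_bytes(1, "little")
    if size == 2:
        # 66 prefix selects 16-bit operand size; C7 /0 iw
        return bytes([0x66, 0xC7, modrm]) + disp32 + (imm & 0xFFFF).to_bytes(2, "little")
    if size == 4:
        # C7 /0 id: mov r/m32, imm32
        return bytes([0xC7, modrm]) + disp32 + (imm & 0xFFFFFFFF).to_bytes(4, "little")
    if size == 8:
        # REX.W selects 64-bit operand size; the immediate is still 32 bits
        # and is sign-extended by the CPU, so it must fit in a signed dword.
        return bytes([0x48, 0xC7, modrm]) + disp32 + struct.pack("<i", imm)
    raise ValueError("size must be 1, 2, 4, or 8")


for size in (1, 2, 4, 8):
    print(size, encode_mov_rip_imm(0x21B520, 0x1, size).hex(" "))
```

The qword line corresponds to the instruction from the question, 48 c7 05 20 b5 21 00 01 00 00 00.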
Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.
Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.
Note that AT&T syntax uses a slightly different tack. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:
movl 0x419260(,%rax,4), %esi
movq 0x419260(,%rax,4), %rsi
movq $0x1, 0x21b520(%rip)
Again, on the first two instructions, the l and q suffixes are optional, because the assembler can deduce the appropriate size. On the last instruction, just like in Intel syntax, the suffix is non-optional. So, the same thing in AT&T syntax as Intel syntax, just a different format for the size specifiers.
RIP, or any other register in the address, is only relevant to the addressing mode, not the width of the data transferred. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001, etc.). So in this case, the operand size has to be explicitly mentioned.
With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:
mov [rip+0x21b520],eax
And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here).
movl $1, 0x21b520(%rip)
It gets worse than that: an assembly language is defined by the assembler, the program that reads, interprets, and parses it. For x86 in particular, but also as a general rule, there is no technical reason for any two assemblers for the same target to accept the same assembly language. They tend to be similar, but don't have to be.
You have fallen into a couple of traps: first, the specific syntax your assembler uses for the size directive; second, whether there is a default. My recommendation is to ALWAYS use the size directive (or a unique instruction mnemonic, if there is one); then you never have to worry about it.

How to remove a char from a 4-byte string in MIPS32 assembly using a shift?

I'm new to MIPS32 assembly and trying to delete a character in a string (delete the first character, specifically) stored in the .data section but have no clue how to do so.
In the following line of code, is there a way to make it so that test just equals "bc" instead of "abc"
test: .asciiz "abc"
Is this simply a matter of using something like logical shifting left by 2 to remove the first char, or do I need to offset by something, or is there an opcode to just straight delete it?
As "Need to remove all non letter elements from a string in assembly" (for x86) explains, removing a character in a string means copying over the whole rest of the string.
In your case it's just 3 bytes left out of the 4 (including the terminating 0). So yes, you could do this just by shifting the word by 8 bits (1 byte). Especially if you make sure test is word-aligned with .p2align 2 before it, so you can safely lw and sw all 4 bytes with one load and one store.
For little-endian MIPS (like MARS simulates), that would be a right shift because the first byte in memory is the least significant. And right shifts shift out the low (least significant) bits.
For big-endian MIPS (most significant byte first, like some real MIPS CPUs operate), that would be a left shift, removing the most significant byte and shifting the low bits up.
Note that this will leave the word at test being 'b', 'c', 0, 0. So yes, as an implicit-length string it's "bc".
Also note that if you just had a pointer in a register, you could get a pointer to "bc" by simply incrementing it by 1 instead of modifying memory. Like addiu $t0, $t0, 1.
Or equivalently, la $t0, test+1 is a pointer to 1 byte past the start.
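If it helps to check the shift direction, here is a small Python model of the lw / shift / sw sequence described above. It is only a sketch of the arithmetic, not MIPS code, and the function name drop_first_char is made up for the example.

```python
def drop_first_char(word_bytes, byteorder):
    """Model lw, a shift by 8 bits, then sw on a 4-byte string buffer."""
    word = int.from_bytes(word_bytes, byteorder)  # lw  $t0, test
    if byteorder == "little":
        # First byte in memory is the least significant: shift it out the bottom.
        word >>= 8                                # srl $t0, $t0, 8
    else:
        # First byte in memory is the most significant: shift it out the top.
        word = (word << 8) & 0xFFFFFFFF           # sll $t0, $t0, 8
    return word.to_bytes(4, byteorder)            # sw  $t0, test


buf = b"abc\0"
print(drop_first_char(buf, "little"))  # b'bc\x00\x00'
print(drop_first_char(buf, "big"))     # b'bc\x00\x00'
```

Either way the memory ends up as 'b', 'c', 0, 0; only the direction of the register shift differs.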

What does a hexadecimal number, with a register in parenthesis mean in Assembly?

lea 0x1c(%ebp),%eax
So, I understand vaguely what the lea instruction does, and I know those are registers, but what is this structure: 0x1c(%ebp)? I got this code out of objdump.
It is one of the many x86 addressing modes. Specifically, this is referred to as "displacement" addressing: a constant offset (the displacement, here 0x1c) is added to the value of a base register (here %ebp).
Since you said you used objdump and didn't specify that you used the -M flag, I'm going to assume this is in GAS (AT&T) syntax, as opposed to Intel syntax. This means that the first operand is the source, and the second operand is the destination.
The lea 0x1C(%ebp),%eax instruction means, "Take the value in %ebp, add 0x1C (28 in decimal), then store that value in %eax".
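As a rough illustration, the arithmetic that lea performs can be modelled in a few lines of Python. The function name effective_address and the value chosen for ebp are hypothetical; the general AT&T form being modelled is disp(base, index, scale), of which 0x1c(%ebp) uses only the displacement and the base.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Compute disp(base, index, scale) with 32-bit wraparound."""
    return (base + index * scale + disp) & 0xFFFFFFFF


ebp = 0xBFFFF000                         # hypothetical frame pointer value
eax = effective_address(0x1C, base=ebp)  # lea 0x1c(%ebp), %eax
print(hex(eax))                          # 0xbffff01c
```

Note that lea only computes the address; it does not read the memory at that address.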

x64 opcodes and scaled byte index

I think I'm getting the Mod R/M byte down but I'm still confused by the effective memory address/scaled indexing byte. I'm looking at these sites: http://www.sandpile.org/x86/opc_rm.htm, http://wiki.osdev.org/X86-64_Instruction_Encoding. Can someone encode an example with the destination address being in a register where the SIB is used? Say for example adding an 8-bit register to an address in a 8-bit register with SIB used?
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Is the SIB always used with a source or destination address?
A memory address is never in an 8-bit register, but here's an example of using SIB:
add byte [rax + rdx], 1
This is an instance of add rm8, imm8, 80 /0 ib. The /0 indicates that the reg field in the ModR/M byte is zero. We must use a SIB here but don't need a displacement, so we can use 00b for the mod and 100b for the rm, to form 04h for the ModR/M byte (44h and 84h also work, but waste space encoding a zero displacement). Looking in the SIB table now, both registers are used with "scale 1", so the base and index are mostly interchangeable (rsp cannot be an index, but we're not using it here). So the SIB byte can be 10h or 02h.
Just putting the bytes in a row now:
80 04 10 01
; or
80 04 02 01
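The same byte-packing can be written out as a short Python sketch. The helper names (modrm, sib, encode_add_m8_imm8) are invented for this example, and it only covers the no-displacement case with the eight legacy registers, so no REX prefix is needed.

```python
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
       "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def modrm(mod, reg, rm):
    """Pack the ModR/M byte: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale, index, base):
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (ss << 6) | (REG[index] << 3) | REG[base]


def encode_add_m8_imm8(base, index, imm):
    """Encode 'add byte [base + index], imm8' with no displacement."""
    if index == "rsp":
        raise ValueError("rsp cannot be used as an index")
    if base == "rbp":
        raise ValueError("with mod=00, base=101 means disp32 and no base")
    # 80 /0 ib: opcode, ModR/M (mod=00, reg=0, r/m=100 -> SIB follows), SIB, imm8
    return bytes([0x80, modrm(0b00, 0, 0b100), sib(1, index, base), imm & 0xFF])


print(encode_add_m8_imm8("rax", "rdx", 1).hex(" "))  # 80 04 10 01
print(encode_add_m8_imm8("rdx", "rax", 1).hex(" "))  # 80 04 02 01
```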
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Yes. You saw the note, I'm sure. So it can be either, depending on whether you used an address size override or not. In every reasonable case, it will be rip + sdword. Using the other form gives you a truncated result, and I can't immediately imagine any circumstances under which that makes sense to do (for general lea math sure, but not for pointers). Probably (this is speculation, though) that possibility only exists to make the address size override work reasonably uniformly.
Is the SIB always used with a source or destination address?
Depends on what you mean. Certainly, if you have a SIB, it will encode a source or destination (because what else is there?), although you might argue that the SIB that can appear in nop rm encodes nothing because nop has neither sources nor destinations. If you mean "which one does it encode", it can be either one. Looking over all instructions, it can most often appear in a source operand. But obviously there are many cases where it can encode the destination; for an example, see above. If you mean "is it always used", well no, see that table that you were looking at.

Calculate number of bytes to fetch, Assembly

I'm trying to calculate how many bytes the "fetch" needs.
I'm writing this code in assembly:
jmp [2*eax]
and the command in the list file is 3 bytes.
When I write this command:
jmp [4*eax]
I get 7 bytes.
Does anyone know why?
I suspect your assembler is being smart and is encoding the jmp [2*eax] as jmp [eax+eax] which takes fewer bytes since it doesn't require a displacement. Whereas jmp [4*eax] is really the equivalent of jmp [4*eax+0x00000000] which requires an extra 4 bytes for the displacement.
It has to do with the way the SIB (scale-index-base) byte works. Typically this encodes addresses in the form base + index*scale + displacement. The displacement is optional, but only if a base register is included. If you want to leave off the base register, then you are forced to include a 32 bit displacement.
So to get eax*4 you need to use the form index*4 + displacement even though you don't need that displacement. But to get eax*2, you can use the form base + index*scale (i.e. eax+eax*1), and avoid having to include the displacement.
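The two byte counts can be reproduced with a small Python sketch of the 32-bit encoding of jmp through memory (opcode FF /4). The function name encode_jmp_mem is made up for illustration, and the sketch assumes the assembler picks the shortest form, as suspected above.

```python
import struct

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_jmp_mem(index, scale, base=None):
    """Encode 'jmp [base + index*scale]' for 32-bit mode (FF /4 with a SIB)."""
    if index == "esp":
        raise ValueError("esp cannot be used as an index")
    if base == "ebp":
        raise ValueError("this sketch does not handle ebp as base (needs mod=01)")
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    modrm = (0b00 << 6) | (4 << 3) | 0b100  # mod=00, reg=/4, r/m=100 -> SIB follows
    if base is None:
        # With mod=00, base field 101 means "no base register, disp32 follows".
        sib = (ss << 6) | (REG32[index] << 3) | 0b101
        return bytes([0xFF, modrm, sib]) + struct.pack("<i", 0)
    sib = (ss << 6) | (REG32[index] << 3) | REG32[base]
    return bytes([0xFF, modrm, sib])


two = encode_jmp_mem("eax", 1, base="eax")  # jmp [2*eax] rewritten as [eax+eax*1]
four = encode_jmp_mem("eax", 4)             # jmp [4*eax] needs a zero disp32
print(len(two), two.hex(" "))    # 3 ff 24 00
print(len(four), four.hex(" "))  # 7 ff 24 85 00 00 00 00
```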

Resources