Related
When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:
In general, the intended size of the of the data item at a given memory
address can be inferred from the assembly code instruction in which it is
referenced. For example, in all of the above instructions, the size of
the memory regions could be inferred from the size of the register
operand. When we were loading a 32-bit register, the assembler could
infer that the region of memory we were referring to was 4 bytes wide.
When we were storing the value of a one byte register to memory, the
assembler could infer that we wanted the address to refer to a single
byte in memory.
The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:
mov QWORD PTR [rip+0x21b520], 0x1
In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.
Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:
mov esi, DWORD PTR [rax*4+0x419260]
In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?
You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction
mov esi, DWORD PTR [rax*4+0x419260]
the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.
Similarly, in
mov rsi, QWORD PTR [rax*4+0x419260]
the QWORD PTR specifier is optional for the exact same reason.
But it is not always optional. Consider your first example:
mov QWORD PTR [rip+0x21b520], 0x1
Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.
In other words, when the value is in a register operand, the size specifier is optional because the assembler can figure out the size based on the size of the register. However, if you're dealing with an immediate value or a memory operand, the size specifier is probably required to ensure you get the results you want.
Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.
Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.
Note that AT&T syntax uses a slightly different tact. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:
movl 0x419260(,%rax,4), %esi
movq 0x419260(,%rax,4), %rsi
movq $0x1, 0x21b520(%rip)
Again, on the first two instructions, the l and q prefixes are optional, because the assembler can deduce the appropriate size. On the last instruction, just like in Intel syntax, the prefix is non-optional. So, the same thing in AT&T syntax as Intel syntax, just a different format for the size specifiers.
RIP, or any other register in the address is only relevant to the addressing mode, not the width of data transfered. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001 etc.) So in this case, the operand size has to be explicitly mentioned.
With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:
mov [rip+0x21b520],eax
And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here).
movl $1, 0x21b520(%rip)
it gets worse than that, an assembly language is defined by the assembler, the program that reads/interprets/parses it. And x86 in particular but as a general rule there is no technical reason for any two assemblers for the same target to have the same assembly language, they tend to be similar, but dont have to be.
You have fallen into a couple of traps, first off the specific syntax used for the assembler you are using with respect to the size directive, then second, is there a default. My recommendation is ALWAYS use the size directive (or if there is a unique instruction mnemonic), then you never have to worry about it right?
The general way to store strings in NASM is to use db like in msg: db 'hello world', 0xA. I think this stores the string in the bss section. So the string will occupy the storage for the duration of the entire program. Instead, if we store it in the stack, it will be alive only during the local frame. For small strings (less than 8 bytes), this can be done using mov dword [rsp], 'foo'. But for longer strings, the string has to be split and be stored using multiple instructions. So this would increase the executable size (I thought so).
So now, which is better in large programs with multiple strings? Are any of the assumptions I made above, wrong?
mov dword [rsp] 'foo' assembles to C70424666F6F00, it takes 7 bytes to encode 4 payload characters.
In comparison with standard static way DB 'foo',0 the definition of string as immediate operand in code section increases the code size by 75 %.
Your dynamic method may be profitable only if you could eliminate the .rodata or .data section entirely (which is seldom the case of large programs). Each additional section takes more space in executable program than its netto contents because of its file-alignment (in PE format it is 512 bytes).
And even when your program has no other static data in data sections beside long strings, you could spare more space with static declaration in .text (code) section.
But for longer strings string has to be split and be stored using multiple instructions. So this would increase the executable size (I thought so).
Yep, and in almost all cases, the number of bytes used by those instructions will exceed the number of bytes that would be needed to just store the string in memory normally. (The instruction includes all the bytes of the immediate, with a few exceptions like zero- and sign-extension, as well as additional bytes to encode the opcode, destination address, etc). And code, of course, also occupies (virtual) memory for the entire duration of the program. There's no free lunch.
As such, you should just assemble the strings directly into memory with db as you have done. But try to arrange for them to go in a read-only section such as .text or .rodata depending on what your system supports. A modern operating system will then be able to discard those pages from physical memory if there is memory pressure, knowing that they can be reloaded from the executable if needed again. This is entirely transparent to your program. If there are many strings that will be needed at the same times, you can optimize a little by trying to arrange for them to be adjacent in your program's memory (defining them all together in one asm file should suffice).
You could also design something where at runtime, you read your strings from an external file into dynamically allocated memory, then free it when done with them. This was a common technique for programs on ancient operating systems with limited physical memory and no virtual memory support (think MS-DOS).
The string data has to be somewhere. Existing as immediates in your machine code takes space in the .text section of your program, normally linked into the same program segment as .rodata where you'd keep string literals. Running instructions to store strings to the stack means the data has to come into I-cache, then go out into D-cache.
But for long redundant strings, code to generate them in memory may be smaller than the string itself. This comes down to the Kolmogorov complexity; minimum size of a program that can output (or generate in an array) the given data. That program could be a gzip or zstd decompressor with some input constant data, could be good for a very large string or set of strings, much larger than the decompression code. Or for a specific case, have a look at code-golf questions like The alphabet in programming languages where my answer is 9 bytes of x86 machine code (including a ret) which inefficiently stores 1 byte at a time, incrementing a pointer in a loop, to produce the 26-byte string (without a terminating 0). Slow but compact.
Pushing / Storing from immediates
For just 4 bytes (not including the 0 terminator) you'd use push 'foo' = 5 bytes = 80% efficiency. On x86-64, that's a qword push (sign-extending the imm32 to 64-bit), so we get 4 bytes of zeros for free.
After that, getting the pointer into a register with mov rdi, rsp (3 bytes) is 4 bytes smaller than lea rdi, [rel msg_foo] (7 bytes), so it's an overall win for total size (unless padding for function alignment bumps it up or hides it). But anyway, comparing against the best option instead of the worst might be a good idea for your answer.
Compilers will sometimes use immediate data to init a local struct or array that has to be on the stack anyway (i.e. they have to pass a non-const pointer to another function); their threshold for switching to copying from .rodata (with movdqa xmm load/store) is higher than 8 bytes. But when you just want to print the string, you just need to pass a pointer to .rodata without copying it to the stack at all, so the threshold is much lower for it to be worth it to use stack space. Maybe 8 bytes, maybe 16, maybe only 4, depending on I-cache vs. D-cache pressure in your program.
For short but not tiny strings
mov rcx, imm64 + push rcx is 10+1 = 11 total bytes for 8 bytes of payload = 73% efficiency, vs. 4/7 = 57%. (At an offset from RSP it would be even worse, but to save code size you could use RBP+imm8 instead of RSP+imm8, but that's still 4 bytes per 7. You could mix push sign_extended_imm32 with mov dword [rsp+4], imm32 but that's also bad.)
This does have overhead that scales with string size, so it quickly becomes more size-efficient to just copy from .rodata, e.g. with an XMM loop, call memcpy, or even rep movsb if you're optimizing for size.
Or of course just using the string in .rodata if possible, if you don't need to make a copy you can modify on the stack.
Shellcode is a common use-case for techniques like this. You need a single "flat" sequence of bytes, usually not containing any 00 bytes which would terminate a C string.
You can have some data past the end of the actual machine code part, and mov store some zeros into that and generate pointers to it, but that's somewhat cumbersome. And you need a position-independent way to get pointers into registers. call/pop works, if you jump forward and call backward so the rel32 doesn't involve any 00 bytes. Same for RIP-relative lea.
It's often just as easy to push a zeroed register, or an imm8 or imm32, and get some zeros into memory along with ASCII data that way. Generating it a bit at a time makes it easy to mov rsi, rsp or whatever, instead of needing position-independent addressing relative to RIP.
I want to know if using the MOV instruction to copy a string into a register causes the string to be stored in reverse order. I learned that when MASM stores a string into a variable defined as a word or higher (dw and larger sizes) the string is stored in reverse order. Does the same thing happen when I copy a string to a register?
Based on this questions (about the SCAS instruction and about assigning strings and characters to variables in MASM 32) I assumed the following:
When MASM loads a string into a variable, it loads it in reverse order, i.e. the last character in the string is stored in the lowest memory address (beginning) of the string variable. This means assigning a variable str like so: str dd "abc" causes MASM to store the strings as "cba", meaning "c" is in the lowest memory address.
When defining a variable as str db "abc" MASM treats str as an array of characters. Trying to match the array index with the memory address of str, MASM will store "a" at the lowest memory address of str.
By default, the SCAS and MOVS instructions execute from the beginning (lowest) address of the destination string, i.e. the string stored in the EDI register. They do not "pop" or apply the "last in, first out" rule to the memory addresses they operate on before executing.
MASM always treats character arrays and strings to memory registers the same way. Moving the character array 'a', 'b', 'c' to EAX is the same as moving "abc" to EAX.
When I transfer a byte array arLetters with the characters 'a', 'b', and 'c' to the double-word variable strLetters using MOVSD, I believe the letters are copied to strLetters in reverse, i.e. stored as "cba". When I use mov eax, "abc" are the letters also stored in reverse order?
The code below will set the zero flag before it exits.
.data?
strLetters dd ?,0
.data
arLetters db "abcd"
.code
start:
mov ecx, 4
lea esi, arLetters
lea edi, strLetters
movsd
;This stores the string "dcba" into strLetters.
mov ecx, 4
lea edi, strLetters
mov eax, "dcba"
repnz scasd
jz close
jmp printer
;strLetters is not popped as "abcd" and is compared as "dcba".
printer:
print "No match.",13,10,0
jmp close
close:
push 0
call ExitProcess
end start
I expect the string "dcba" to be stored in EAX "as is" - with 'd' in the lowest memory address of EAX - since MASM treats moving strings to registers different from assigning strings to variables. MASM copied 'a', 'b', 'c' 'd'" into strLetters as "dcba" to ensure that if strLetters was popped, the string is emmitted/released in the correct order ("abcd"). If the REP MOVSB instruction were used in place of MOVSD, strLetters would have contained "abcd" and would be popped/emmitted as "dcba". However, becasuse MOVSD was used and SCAS or MOVS instructions do not pop strings before executing, the code above should set the zero flag, right?
Don't use strings in contexts where MASM expects a 16-bit or larger integer. MASM will convert them to integers in a way that reverses the order of characters when stored in memory. Since this is confusing it's best to avoid this, and only use strings with the DB directive, which works as expected. Don't use strings with more than character as immediate values.
Memory has a byte order, registers don't
Registers don't have addresses, and it's meaningless to talk about the order of bytes within a register. On a 32-bit x86 CPU, the general purpose registers like EAX hold 32-bit integer values. You can divide a 32-bit value conceptually into 4 bytes, but while it lives in a register there is no meaningful order to the bytes.
It's only when 32-bit values exist in memory do the 4 bytes that make them up have addresses and so have an order. Since x86 CPUs use the little-endian byte order that means the least-significant byte of the 4 bytes is the first byte. The most-significant part becomes the last byte. Whenever the x86 loads or stores a 16-bit or wider value to or from memory it uses the little-endian byte order. (An exception is the MOVBE instruction which specifically uses the big-endian byte order when loading and storing values.)
So consider this program:
.MODEL flat
.DATA
db_str DB "abcd"
dd_str DD "abcd"
num DD 1684234849
.CODE
_start:
mov eax, "abcd"
mov ebx, DWORD PTR [db_str]
mov ecx, DWORD PTR [dd_str]
mov edx, 1684234849
mov esi, [num]
int 3
END _start
After assembling and linking it gets converted into sequence of bytes something like this:
.text section:
00401000: B8 64 63 62 61 8B 1D 00 30 40 00 8B 0D 04 30 40 ,dcba...0#....0#
00401010: 00 BA 61 62 63 64 8B 35 08 30 40 00 CC .Âșabcd.5.0#.I
...
.data section:
00403000: 61 62 63 64 64 63 62 61 61 62 63 64 abcddcbaabcd
(On Windows the .data section normally gets placed after the .text section in memory.)
DB and DD treat strings differently
So we can see that the DB and DD directives, the ones labelled db_str and dd_str, generates two different sequences of bytes for the same string "abcd". In the first case, the MASM generates a sequence of bytes that we would we would expect, 61h, 62h, 63h, and 64h, the ASCII values for a, b, c, and d respectively. For dd_str though the sequence of bytes is reversed. This is because the DD directive uses 32-bit integers as operands, so the string has to be converted to a 32-bit value and MASM ends up reversing the order of characters in the string when the result of the conversion gets stored in memory.
In memory, strings and numbers are both just bytes
You'll also notice the DD directive labelled num also generated the same sequence of bytes that the DB directive. Indeed, without looking at the source there's no way to tell that the first four bytes are supposed to be a string while the last four bytes are supposed to be a number. They only become strings or numbers if the program uses them that way.
(Less obvious is how the decimal value 1684234849 was converted into the same sequence bytes as generated by the DB directive. It's already a 32-bit value, it just needs to be converted into a sequence of bytes by MASM. Unsurprisingly, the assembler does so using the same little-endian byte order that the CPU uses. That means the first byte is the least significant part of 1684234849 which happens to have the same value as the ASCII letter a (1684234849 % 256 = 97 = 61h). The last byte is the most significant part of the number, which happens to be the ASCII value of d (1684234849 / 256 / 256 / 256 = 100 = 64h).)
Immediates treat strings like DD does
Looking the the values in the .text section more closely with a disassembler, we can see how the sequence of bytes stored there will interpreted as instructions when executed by the CPU:
00401000: B8 64 63 62 61 mov eax,61626364h
00401005: 8B 1D 00 30 40 00 mov ebx,dword ptr ds:[00403000h]
0040100B: 8B 0D 04 30 40 00 mov ecx,dword ptr ds:[00403004h]
00401011: BA 61 62 63 64 mov edx,64636261h
00401016: 8B 35 08 30 40 00 mov esi,dword ptr ds:[00403008h]
0040101C: CC int 3
What we can see here is that that MASM stored the bytes that make up the immediate value in the instruction mov eax, "abcd" in the same order it did with the dd_str DD directive. The first byte of the immediate part of the instruction in memory is 64h, the ASCII value of d. The reason why is because the with a 32-bit destination register this MOV instruction uses a 32-bit immediate. That means that MASM needs to convert the string to a 32-bit integer and ends up reversing the order of bytes as it did with dd_str. MASM also handles the decimal number given as the immediate to the mov ecx, 1684234849 the same way it did with the DD directive that used the same number. The 32-bit value was converted to same little-endian representation.
In memory, instructions are also just bytes
You'll also notice that the disassembler generated assembly instructions that use hexadecimal values for the immediates of these two instruction. Like the CPU, the assembler has no way of knowing that immediate values are supposed be strings and decimal numbers. They're just a sequence of bytes in the program, all it knows is that they're 32-bit immediate values (from the opcodes B8h and B9h) and so displays them as 32-bit hexadecimal values for the lack of any better alternative.
Values in registers reflect memory order
By executing the program under a debugger and inspecting the registers after it reaches the breakpoint instruction (int 3) we can see what actually ended up in the registers:
eax=61626364 ebx=64636261 ecx=61626364 edx=64636261 esi=64636261 edi=00000000
eip=0040101c esp=0018ff8c ebp=0018ff94 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
image00000000_00400000+0x101c:
0040101c cc int 3
Now we can see that the first and third instructions loaded a different value than the other instructions. These two instruction both involve cases where MASM converted the string to a 32-bit value and ended up reversing order of the characters in memory. The register dump confirms that reversed order of bytes in memory in memory results in different values being loaded into the registers.
But really, registers don't have a byte order
Now you might be looking at that register dump above and thinking that only EAX and ECX is in the correct order, with the ASCII value for a, 61h first and and the ASCII value for d, 64h last. That MASM reversing the order of the strings in memory actually caused them to be loaded into registers in the correct order. But as I said before, there's no byte order in registers. The number 61626364 is just how the debugger represents the value when displaying it as a sequence of characters you can read. The characters 61 come first in the debugger's representation because our numbering system puts the most significant part of the number on the left, and we read left-to-right so that makes it the first part. However, as I also said before, x86 CPUs are little-endian, which means the least significant part comes first in memory. That means the first byte in memory becomes the least significant part of the value in the register, which gets displayed as the rightmost two hexadecimal digits of the number by the debugger because that's where least significant part the number goes in our numbering system.
In other words because x86 CPUs are little-endian, least significant first, but our numbering system is big-endian, most significant first, hexadecimal numbers get displayed in a byte-wise reverse order to how they're actually stored in memory.
Simply copying "strings" won't change their order
It should also be hopefully clear by now that loading a string into a register is only something that happens conceptually. The string gets converted into a sequence of bytes by the assembler, which when loaded into a 32-bit register, gets treated as little-endian 32-bit integer in memory. When the 32-bit value in the register is stored in memory the 32-bit value is converted into a sequence of bytes that represent the value in little-endian format. To the CPU your string is just a 32-bit integer it loaded and stored to and from memory.
So that means that if the value loaded into EAX in the sample program is stored to memory with something like mov [mem], eax then the the 4 bytes stored at mem will be in the same order as they appeared in the bytes that made up the immediate of mov eax, "abcd". That is in the same reversed order, 64h, 63h, 62h, 61h, that MASM put them in the bytes that make up immediate.
But why? I dunno, just don't do that
Now as to why MASM is reversing the order of strings when converting them to 32-bit integers I don't know, but the moral here is not to use strings as immediates or any other context where they need to be converted to integers. Assemblers are inconsistent on how they convert string literals into integers. (A similar problem occurs in how C compilers convert character literals like 'abcd' into integers.)
SCASD and MOVSD aren't special
Nothing special happens with the SCASD or MOVSD instrucitons. SCASD treats the four bytes pointed to by EDI as a 32-bit little-endian value, loads it into an unnamed temporary register, compares the temporary register to EAX, and then adds or subtracts 4 from EDI depending on the DF flag. MOVSD loads a 32-bit value in memory pointed to by ESI into an unnamed temporary register, stores the temporary register the 32-bit memory location pointed to by EDI, and then updates ESI and EDI according to the DF flag. (Byte order doesn't matter for MOVSD as the bytes are never used as a 32-bit value, but the order isn't changed.)
I wouldn't try to think of SCASD or MOVSD as FIFO or LIFO because ultimately that depends on how you use them. MOVSD can just as easily be used as part of an implementation of FIFO queue as a LIFO stack. (Compare this to PUSH and POP, which in theory could independently be used part of an implementation of either a FIFO or LIFO data structure, but together can only be used to implement a LIFO stack.)
See #RossRidge's answer for a very detailed description of how MASM works. This answer compares it to NASM which might just be confusing if you only care about MASM.
mov ecx, 4 is four dwords = 16 bytes, when used with repne scasd.
Simpler would be to omit rep and just use scasd.
Or even simpler cmp dword ptr [strLetters], "dcba".
If you look at the immediate in the machine code, it will compare equal if it's in the same order in memory as the data, because both are treated as little-endian 32-bit integers. (Because x86 instruction encoding uses little-endian immediates, matching x86's data load/store endianness.)
And yes, for MASM apparently you do need "dcba" to get the desired byte order when using a string as an integer constant, because MASM treats the first character as "most significant" and puts it last in a 32-bit immediate.
NASM and MASM are very different here. In NASM, mov dword [mem], 'abcd' produces 'a', 'b', 'c', 'd' in memory. i.e. byte-at-a-time memory order matches source order. See NASM character constants. Multi-character constants are simply right-justified in a 32-bit little-endian immediate with the string bytes in source order.
e.g.
objdump -d -Mintel disassembly
c7 07 61 62 63 64 mov DWORD PTR [rdi], 0x64636261
NASM source: mov dword [rdi], "abcd"
MASM source: mov dword ptr [rdi], "dcba"
GAS source: AFAIK not possible with a multi-char string literal. You could do something like $'a' + ('b'<<8) + ...
I agree with Ross's suggestion to avoid multi-character string literals in MASM except as an operand to db. If you want nice sane multi-character literals as immediates, use NASM or EuroAssembler (https://euroassembler.eu/eadoc/#CharNumbers)
See also How are dw and dd different from db directives for strings? - in NASM, dd "abcd" is exactly equivalent to db "abcd" (because I used 4 ASCII characters so there's no extra padding to fill a dword). But MASM will mangle your strings in a dd like for a dword immediate.
Also, don't use jcc and jmp, just use a je close to fall-through or not.
(You did avoid the usual brain-dead idiom of jcc over a jmp, here your jz is sane and the jmp is totally redundant, jumping to the next instruction.)
I am using NASM assembler on linux 64 bit.
There is something with variables and registers I can't understand.
I create a variable named "msg":
msg db "hello, world"
Now when I want to write to the stdout I move the msg to rsi register, however I don't understand the mov instruction bitwise ... the rsi register consists of 64 bit , while the msg variable has 12 symbols which is 8 bits each , which means the msg variable has a size of 12 * 8 bits , which is greater than 64 bits obviously.
So how is this even possible to make an instruction like:
mov rsi, msg , without overflowing the memory allocated for rsi.
Or does the rsi register contain the memory location of the first symbol of the string and after writing 1 symbol it changes to the memory location of the next symbol?
Sorry if I wrote complete nonsense, I'm new to assembly and i just can't get the grasp of it for a while.
In NASM syntax (unlike MASM syntax) mov rsi, symbol puts the address of the symbol into RSI. (Inefficiently with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. How to load address of function or label into register in GNU Assembler)
mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.
mov rsi, msg ; rsi = address of msg. Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1] ; rax = 'e' (upper 7 bytes zeroed)
mov edx, [msg+6] ; rdx = ' wor' (upper 4 bytes zeroed)
Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but probably it can't with link-time constants. Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
and after writing 1 symbol it changes to the memory location of the next symbol?
No, if you want that you have to inc rsi. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.
Accessing registers doesn't magically modify them.
There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities where the size depends on the processor. I've done quite a bit of work on 8 and 16-bit processors, so that is where my answer is coming from.
General statements about Assembly:
Assembly is just like a high level language, except you have to handle a lot more of the details. So if you're used to some operation in say C, you can start there and then break the operation down even further.
For instance, if you have declared two variables that you want to add, that's pretty easy in C:
x = a + b;
In assembly, you have to break that down further:
mov R1, a * get value from a into register R1
mov R2, b * get value from b into register R2
add R1,R2 * perform the addition (typically goes into a particular location I'll call it the accumulator
mov x, acc * store the result of the addition from the accumulator into x
Depending on the flavor of assembly and the processor, you may be able to directly refer to variables in the addition instruction, but like I said I would have to look at the specific flavor you're working with.
Comments on your specific question:
If you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need to have some sort of indication for the termination of the string or another value that tells the size of the string, so you know when to stop.
I think I'm getting the Mod R/M byte down but I'm still confused by the effective memory address/scaled indexing byte. I'm looking at these sites: http://www.sandpile.org/x86/opc_rm.htm, http://wiki.osdev.org/X86-64_Instruction_Encoding. Can someone encode an example with the destination address being in a register where the SIB is used? Say for example adding an 8-bit register to an address in a 8-bit register with SIB used?
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?'
Is the SIB always used with a source or destination address?
A memory address is never in an 8-bit register, but here's an example of using SIB:
add byte [rax + rdx], 1
This is an instance of add rm8, imm8, 80 /0 ib. /0 indicates that the r field in the ModR/M byte is zero. We must use a SIB here but don't need an immediate offset, so we can use 00b for the mod and 100b for the rm, to form 04h for the ModR/M byte (44h and 84h also work, but wastes space encoding a zero-offset). Looking in the SIB table now, there are two registers both with "scale 1", so the base and index are mostly interchangeable (rsp can not be an index, but we're not using it here). So the SIB byte can be 10h or 02h.
Just putting the bytes in a row now:
80 04 10 01
; or
80 04 02 01
Also when I use the ModR/M byte of 0x05 is that (*) relative to the current instruction pointer? Is it 32 or 64 bits when in 64 bit mode?
Yes. You saw the note, I'm sure. So it can be either, depending on whether you used an address size override or not. In every reasonable case, it will be rip + sdword. Using the other form gives you a truncated result, I can't immediately imagine any circumstances under which that makes sense to do (for general lea math sure, but not for pointers). Probably (this is speculation though) that possibility only exists to make the address size override work reasonably uniformly.
Is the SIB always used with a source or destination address?
Depends on what you mean. Certainly, if you have a SIB, it will encode a source or destination (because what else is there?) (you might argue that the SIB that can appear in nop rm encodes nothing because nop has neither sources nor destinations). If you mean "which one does it encode", it can be either one. Looking over all instructions, it can most often appear in a source operand. But obviously there are many cases where it can encode the destination - example: see above. If you mean "is it always used", well no, see that table that you were looking at.