Related
When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:
In general, the intended size of the of the data item at a given memory
address can be inferred from the assembly code instruction in which it is
referenced. For example, in all of the above instructions, the size of
the memory regions could be inferred from the size of the register
operand. When we were loading a 32-bit register, the assembler could
infer that the region of memory we were referring to was 4 bytes wide.
When we were storing the value of a one byte register to memory, the
assembler could infer that we wanted the address to refer to a single
byte in memory.
The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:
mov QWORD PTR [rip+0x21b520], 0x1
In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.
Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:
mov esi, DWORD PTR [rax*4+0x419260]
In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?
You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction
mov esi, DWORD PTR [rax*4+0x419260]
the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.
Similarly, in
mov rsi, QWORD PTR [rax*4+0x419260]
the QWORD PTR specifier is optional for the exact same reason.
But it is not always optional. Consider your first example:
mov QWORD PTR [rip+0x21b520], 0x1
Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.
In other words, when the value is in a register operand, the size specifier is optional because the assembler can figure out the size based on the size of the register. However, if you're dealing with an immediate value or a memory operand, the size specifier is probably required to ensure you get the results you want.
Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.
Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.
Note that AT&T syntax uses a slightly different tact. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:
movl 0x419260(,%rax,4), %esi
movq 0x419260(,%rax,4), %rsi
movq $0x1, 0x21b520(%rip)
Again, on the first two instructions, the l and q prefixes are optional, because the assembler can deduce the appropriate size. On the last instruction, just like in Intel syntax, the prefix is non-optional. So, the same thing in AT&T syntax as Intel syntax, just a different format for the size specifiers.
RIP, or any other register in the address is only relevant to the addressing mode, not the width of data transfered. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001 etc.) So in this case, the operand size has to be explicitly mentioned.
With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:
mov [rip+0x21b520],eax
And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here).
movl $1, 0x21b520(%rip)
it gets worse than that, an assembly language is defined by the assembler, the program that reads/interprets/parses it. And x86 in particular but as a general rule there is no technical reason for any two assemblers for the same target to have the same assembly language, they tend to be similar, but dont have to be.
You have fallen into a couple of traps, first off the specific syntax used for the assembler you are using with respect to the size directive, then second, is there a default. My recommendation is ALWAYS use the size directive (or if there is a unique instruction mnemonic), then you never have to worry about it right?
I'm trying to write an array to the stack in x86 assembly using the AT&T (GAS) syntax. I have ran into an issue whereby I can write ~8.37 million entries to the stack, and then I get a Segmentation fault if I try to write any more. I don't know why this is, here's the code I'm using:
mov %rsp, %rbp
mov $8378658, %rdx
writeDataLoop:
sub %rdx, %rbp
movb $0b1, (%rbp)
add %rdx, %rbp
sub $1, %rdx
cmp $0, %rdx
jg writeDataLoop
Another odd thing that I've found is that the limit at which I can write data up to changes very slightly with each run (it's roughly at 8378658, which is also nothing significant in hex (0x7fd922). Can anyone tell me how to write more data to the stack, and also potentially explain what this arbitrary stack write limit is? Thanks.
To start off with, the default stack size is 8MB, this is the stack limit I was reaching. This can be found with the ulimit -a command. However, one should not use the stack for large amounts of data (usually arrays). Instead, the .space directive should be used, which, using AT&T syntax, takes the amount of data to store in bytes: .space <buffersize>. This can be labelled, for example:
buffer: .space 1024
This allocates 1024 bytes of storage to the label buffer. This label should be in the .bss section of your program, allowing for read and write access.
Finally, to access this data (read or write), one can use buffer(%rax), where rax is the offset.
Edit:
Using the .bss section is more efficient file-size wise than using the .data section, you just have to manually initialize the array.
I'm trying to create an object-oriented class using GNU assembly for educational purposes. I have many questions regarding the use of the .struct directive:
It is said that this directive switch the code to the absolute section. Why is it named .struct then? Does it have anything to do with the struct of C?
What is the difference between using:
.set Object.data, 8
.set Object.func_pointer, 16
and using
.struct 0
Object:
Object.data:
.struct Object + 8
Object.func_pointer:
.struct Object.data + 8
Object_size = . - Object
Is this actually the way to use the .struct directive or it it a miss-use? To be honest I don't even know what I'm doing with the .struct directive. It looks useless to me?
The full code I provided below will allocate one instance of the Object class manually. If I want to instantiate the class whenever I want using a constructor function, do I need to move my object onto the stack and how would that be done? using .bss I can name my instance (for example test_object) but I'm not sure if I can name a stack address.
Full code so you can understand what I'm talking about: (GNU as, Linux x64)
.data
msg: .asciz "Hello world!\n"
.set msglen, . - msg
# CLASS struct
# This class struct only hold pointers to functions available for the class
.struct 0
Object:
# Class variables
Object.data:
.struct Object + 8
# Class pointers
Object.func_pointer:
.struct Object.data + 8
Object_size = . - Object
# Allocate Object_size bytes in the bss section to store the object instance
.bss
test_object:
.space Object_size
#----------------------------------------------------------------------------
# Code starts here
.text
#----------------------------------------------------------------------------
# Simple hello world function used as the target
hellofunc:
push %rbp
mov %rsp, %rbp
mov $1, %rax
mov $1, %rdi
mov $msg, %rsi
mov $msglen, %rdx
syscall
leave
ret
# Initiate the Object class by:
# Allocate a chunk of Object_size bytes on the stack
# Load value given in %rdi into Object.data
# Load the address of hellofunc into Object.func_pointer
initiate:
# Prologue
push %rbp
mov %rsp, %rbp
# Make %rdx point at the current object instance
mov $test_object, %rdx
# Load hellofunc address onto test_object.func_pointer
mov $hellofunc, %rax
mov %rax, Object.func_pointer(%rdx)
# Load data from %rdi onto test_object.data
mov %rdi, Object.data(%rdx)
# Test call to see if test_object.func_pointer is properly loaded
call *Object.func_pointer(%rdx)
# Epilogue
leave
ret
#---------------------------------------------------------------------------
# Main function
.globl _start
#---------------------------------------------------------------------------
_start:
mov $0xDEADBEEF, %rdi
call initiate
# return(0);
mov $60, %rax
mov $0, %rdi
syscall
I'm pretty sure all you're ever going to get out of any assembly language struct syntax is symbolic names for offsets, and a way to define them using the same .space directives that you'd normally use to reserve space for one static instance of a struct in the BSS. That's what NASM gives you, for example, and GAS's is even more rudimentary than that.
So yes, it's like a struct foo {int x; char c;}; definition in C: you use syntax that looks like declaring (reserving space for) global variables, but instead you're just defining the layout of a struct. (Except in GAS, you can only use .space and similar directives that can't take a value, no distinguishing one int from an array of 4 chars for example.)
But in GAS, it's so primitive that yes, it's just one way to
separately set a bunch of assemble-time constants with names like Object.func_pointer. You could do exactly the same thing with Object.func_pointer = 8, as well.
What is the difference between using: (.set) and (.struct)
Nothing. That's all it's doing for you. There isn't anything like a concept of a struct object, it's just one part of the tools that you can use to get symbolic names for offsets. They chose to call it .struct because that's one use-case.
However, you're repeating yourself unnecessarily by using .struct repeatedly (the example in the GAS manual does this, too; I don't know why). AFAIK you do have to repeat the struct name in each label (because GAS .struct syntax isn't very sophisticated), but you can use .space instead of having to name the previous field.
.struct 0
Object:
# .space 4
Object.obdata:
.space 8
Object.func_pointer:
.space 8
Object_size = . - Object
.text
mov $Object.obdata, %eax
mov $Object.func_pointer, %ecx
gcc -c foo.s && objdump -drwC -Mintel foo.o
...
0: b8 00 00 00 00 mov eax,0x0
5: b9 08 00 00 00 mov ecx,0x8
The names you choose to define after switching to the absolute section are 100% up to you, and not associated with any overall name for the struct.
NASM's struct support is a bit more sophisticated: you can define a layout in one spot, and use those member names when statically initializing an instance of it (e.g. in the .data section). Nasm - access struct elements by value and by address has an example. Note that it involves endstruc and iend directives, because you are actually defining a struct with members, not just separately setting a bunch of assemble-time constants with names like Object.func_pointer
Anything else like "constructors" you'll have to do yourself. e.g. writing the members into memory somewhere that you've decided is now an instance of a struct object.
This is assembly language, there is no compiler, any instructions you want in the final machine code you're going to have to write explicitly. (Unless you were using MASM, which magically adds instructions to your code in a few cases. GAS certainly doesn't; it was primarily designed to compiler compiler-generated asm. Deciding to emit instructions for a "constructor" at a certain point is something a compiler does internally, not via any special asm syntax. Same for hand-written asm.)
I am using NASM assembler on linux 64 bit.
There is something with variables and registers I can't understand.
I create a variable named "msg":
msg db "hello, world"
Now when I want to write to the stdout I move the msg to rsi register, however I don't understand the mov instruction bitwise ... the rsi register consists of 64 bit , while the msg variable has 12 symbols which is 8 bits each , which means the msg variable has a size of 12 * 8 bits , which is greater than 64 bits obviously.
So how is this even possible to make an instruction like:
mov rsi, msg , without overflowing the memory allocated for rsi.
Or does the rsi register contain the memory location of the first symbol of the string and after writing 1 symbol it changes to the memory location of the next symbol?
Sorry if I wrote complete nonsense, I'm new to assembly and i just can't get the grasp of it for a while.
In NASM syntax (unlike MASM syntax) mov rsi, symbol puts the address of the symbol into RSI. (Inefficiently with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. How to load address of function or label into register in GNU Assembler)
mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.
mov rsi, msg ; rsi = address of msg. Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1] ; rax = 'e' (upper 7 bytes zeroed)
mov edx, [msg+6] ; rdx = ' wor' (upper 4 bytes zeroed)
Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but probably it can't with link-time constants. Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
and after writing 1 symbol it changes to the memory location of the next symbol?
No, if you want that you have to inc rsi. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.
Accessing registers doesn't magically modify them.
There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities where the size depends on the processor. I've done quite a bit of work on 8 and 16-bit processors, so that is where my answer is coming from.
General statements about Assembly:
Assembly is just like a high level language, except you have to handle a lot more of the details. So if you're used to some operation in say C, you can start there and then break the operation down even further.
For instance, if you have declared two variables that you want to add, that's pretty easy in C:
x = a + b;
In assembly, you have to break that down further:
mov R1, a * get value from a into register R1
mov R2, b * get value from b into register R2
add R1,R2 * perform the addition (typically goes into a particular location I'll call it the accumulator
mov x, acc * store the result of the addition from the accumulator into x
Depending on the flavor of assembly and the processor, you may be able to directly refer to variables in the addition instruction, but like I said I would have to look at the specific flavor you're working with.
Comments on your specific question:
If you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need to have some sort of indication for the termination of the string or another value that tells the size of the string, so you know when to stop.
I'm trying to have thread-safe local variables in an assembly program.
I've searched on the net, but I haven't found anything simple.
I'm currently using GCC assembler, as the program is a mix of C code and assembly, but the final program will contain code for multiple-platforms / calling conventions.
For now, I've declared my variables using the .lcomm pseudo-op.
As I understand it, those variables will be placed in the .bss section.
So I guess they will be shared by all threads.
Is there a way to have a kind of TLS variables directly in assembly, or should I use platform-specific implementations, like pthread or __declspec on Windows?
Hope it's clear enough. Don't hesitate to ask if more information is needed.
Thanks to everyone,
EDIT
Here's the code in question:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp
movq %rsp, %rbp
xor %rax, %rax
popq stack0( %rip )
popq stack1( %rip )
callq *%rdi
pushq stack1( %rip )
pushq stack0( %rip )
leave
ret
Basically, it's used to redirect a call to a C function.
The C prototype is:
extern uint64_t XSRuntime_CallMethod( void ( *m )( void * self, ... ), ... );
It takes a function pointer as first argument, hence the callq *%rdi, as I'm testing this with system V ABI.
The assembly code is very simple, and I'd like to keep it that way, so it can be easily maintainable.
The question is: how to make the stack0 and stack1 variables thread safe.
Not that familiar with assembler so:
.lcomm stack0, 8
.lcomm stack1, 8
.globl _XSRuntime_CallMethod
_XSRuntime_CallMethod:
pushq %rbp // save BP
movq %rsp, %rbp // load BP with SP
xor %rax, %rax // clear AX
popq stack0( %rip ) // pop return address into STACK0
popq stack1( %rip ) // pop flags into stack1
callq *%rdi // call the indirect procedure, so putting flags/return to XSRuntime_CallMethod onto stack
pushq stack1( %rip ) // put caller flags onto stack
pushq stack0( %rip ) // put caller return onto stack
leave // clean passed parameters from stack
ret // and back to caller
Is this how it works, yes??
If so, would it not be easier to just jump to the indirect procedure, rather than calling it? You then don't need any extra variables to hold the caller flags/return & the indirected procedure returns directly to the caller.
Just a suggestion - while since I did assembler.
If you have to store the caller address somewhere, dec the SP, (enter?), and use the stack frame. Anything else is likely to be thread-unsafe at some point.
Rgds,
Martin
Well, with a TLS maybee not thread-unsafe, but what about any recursive calls? You end up with another stack in the TLS to cover this, so you may as well use the 'SP' stack
Martin
How do you think the compiler implements thread-local variables? Try compiling such a program with -S or /FAs and you'll see. Hint: it has to rely on OS-specific APIs or other details to get access to TLS storage. Sometimes the preparation steps are hidden in the CRT but there's no single way to do it.
For example, here's how recent MSVC does it:
_TLS SEGMENT
?number##3HA DD 01H DUP (?) ; number
_TLS ENDS
EXTRN __tls_array:DWORD
EXTRN __tls_index:DWORD
_TEXT SEGMENT
[...]
mov eax, DWORD PTR __tls_index
mov ecx, DWORD PTR fs:__tls_array
mov edx, DWORD PTR [ecx+eax*4]
mov eax, DWORD PTR ?number##3HA[edx]
As you can see, it uses special variables which are initialized by the CRT.
On recent Linux, GCC can use TLS-specific relocations:
.globl number
.section .tbss,"awT",#nobits
number:
.zero 4
.text
[...]
movl %gs:number#NTPOFF, %eax
If you want portability, it's best not to rely on such OS-specific details but use a generic API such as pthread or use stack-based approach proposed by Martin. But I guess if you wanted portability you wouldn't use assembler :)
?? 'Classic' local variables, ie. parameters/variables/results accessed by stack offsets, are inherently thread-safe.
If you need a 'TLS' that is platform-agnostic, pass some suitable struct/class instance into all threads, either as the creation parameter, in a thread field before resuming all the threads, first message to the threads input queue or whatever...
Rgds,
Martin
As previously mentioned, local variables (stack-based) are inherently thread-safe because each thread has its own stack.
A thread-safe variable which is reachable by all threads (not stack-based) is probably best implemented using a spin-lock (or the equivalent in the Windows NT engine, a critical section). Such a variable must be locked before access, accessed and then unlocked. One variant could be that reads are free, but writes must be framed by locking/unlocking.
AFAIK only compilers themselves don't implement thread-safe variables. Instead they provide lib functions which access the required OS functionality.
You should probably be using(placing calls to) the TlsAlloc & TlsFree (or their other OS equivalents) to do something like this. the returned indices can be stored in global set once, read-only vars for easy use.
Depending on what the vars hold and what the code using them does, you might be able to get away with atomic ops, but that might yield problems of its own.