Global Offset Table: "Pointers to Pointers"? Is this handled by the loader? - linux

This question is about Linux (Ubuntu) executables.
I'll detail things as I understand them to make it clearer if anything's off (so please correct me where applicable):
The GOT acts as an extra level of indirection to enable accessing data from a text section which needs to be position-independent, for instance because the text section might be readonly and the actual addresses of the data may be unknown at (static) linking time.
The GOT then holds addresses to the actual data's location, which is known at loading time, and so the dynamic linker (which is invoked by the loader) is able to modify the appropriate GOT entries and make them point to the actual data.
The main thing that confuses me – not the only one at the moment, mind you :-) – is that this means the addresses in the text section now point to a value "of a different type":
If there was no GOT, I'd have expected this address (for instance in a RIP-relative addressing mode) to point to the actual value I'm after. With a GOT, though, I expect it to point to the appropriate GOT entry, which in turn holds the address to the value I'm after. In this case, there's an extra "dereferencing" required here.
Am I somehow misunderstanding this? If I use RIP-relative addressing, shouldn't the computed address (RIP+offset) be the actual address used in the instruction? So (in AT&T syntax):
mov fun_data(%rip), %rax
To my understanding, without GOT, this should be "rax = *(rip + (fun_data - rip))", or in short: rax = *fun_data.
With GOT, however, I expect this to be equivalent to rax = **fun_data, since *fun_data is just the GOT entry to the real fun_data.
Am I wrong about this, or is it just that the loader somehow knows to access the real data if the pointer is into the GOT? (In other words: that in a PIE, I suppose, some pointers effectively become pointers-to-pointers?)

Am I wrong about this
No.
or is it just that the loader somehow knows to access the real data if the pointer is into the GOT?
The compiler knows that double dereference is required.
Compile this source with and without -fPIC and observe for yourself:
extern int dddd;
int fn() { return dddd; }
Without -fPIC, you get (expected):
movl dddd(%rip), %eax
With -fPIC you get "double dereference":
movq dddd@GOTPCREL(%rip), %rax # move pointer to dddd into RAX
movl (%rax), %eax # dereference it
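The same two-step access can be modeled in plain C. This is only a sketch of the idea, not what the toolchain literally generates: got_slot_for_dddd is a hypothetical stand-in for one GOT entry, and it is initialized statically here, whereas a real entry is filled in by the dynamic linker at load time.

```c
#include <assert.h>

/* Stand-in for a variable that would normally live in another object. */
int dddd = 42;

/* Hypothetical model of a single GOT slot holding the address of dddd. */
static int *got_slot_for_dddd = &dddd;

/* Non-PIC flavor: a single load straight from the variable. */
int fn_direct(void)
{
    return dddd;                  /* movl dddd(%rip), %eax */
}

/* PIC flavor: load the pointer from the "GOT", then dereference it. */
int fn_via_got(void)
{
    int *p = got_slot_for_dddd;   /* movq dddd@GOTPCREL(%rip), %rax */
    assert(p != 0);               /* a real slot is resolved before first use */
    return *p;                    /* movl (%rax), %eax */
}
```

Both functions return the same value; the only difference is the extra pointer load, which is exactly the "pointer to pointer" the question describes.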

Related

brk segment overflow error in x86 assembly [duplicate]

When to use size directives in x86 seems a bit ambiguous. This x86 assembly guide says the following:
In general, the intended size of the data item at a given memory
address can be inferred from the assembly code instruction in which it is
referenced. For example, in all of the above instructions, the size of
the memory regions could be inferred from the size of the register
operand. When we were loading a 32-bit register, the assembler could
infer that the region of memory we were referring to was 4 bytes wide.
When we were storing the value of a one byte register to memory, the
assembler could infer that we wanted the address to refer to a single
byte in memory.
The examples they give are pretty trivial, such as mov'ing an immediate value into a register.
But what about more complex situations, such as the following:
mov QWORD PTR [rip+0x21b520], 0x1
In this case, isn't the QWORD PTR size directive redundant since, according to the above guide, it can be assumed that we want to move 8 bytes into the destination register due to the fact that RIP is 8 bytes? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer for this anywhere, thanks.
Update: As Ross pointed out, the destination in the above example isn't a register. Here's a more relevant example:
mov esi, DWORD PTR [rax*4+0x419260]
In this case, can't it be assumed that we want to move 4 bytes because ESI is 4 bytes, making the DWORD PTR directive redundant?
You're right; it is rather ambiguous. Assuming we're talking about Intel syntax, it is true that you can often get away with not using size directives. Any time the assembler can figure it out automatically, they are optional. For example, in the instruction
mov esi, DWORD PTR [rax*4+0x419260]
the DWORD PTR specifier is optional for exactly the reason you suppose: the assembler can figure out that it is to move a DWORD-sized value, since the value is being moved into a DWORD-sized register.
Similarly, in
mov rsi, QWORD PTR [rax*4+0x419260]
the QWORD PTR specifier is optional for the exact same reason.
But it is not always optional. Consider your first example:
mov QWORD PTR [rip+0x21b520], 0x1
Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you can't be assured of the correct result without explicitly specifying what you want.
In other words, when one of the operands is a register, the size specifier is optional because the assembler can infer the operand size from the register. When no operand is a register, as when storing an immediate value to a memory operand, the size specifier is required to ensure you get the result you want.
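To see why the width cannot be guessed for an immediate-to-memory store, here is a small C sketch. The function name bytes_changed_by_store is made up for illustration; it stores the constant 1 at the same address with each possible operand width and counts how many bytes of memory the store touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the constant 1 with the given operand width (1, 2, 4 or 8 bytes)
   at the start of a buffer pre-filled with 0xFF, and return how many
   bytes of the buffer the store modified. */
size_t bytes_changed_by_store(size_t width)
{
    unsigned char mem[8];
    size_t changed = 0;
    size_t i;

    memset(mem, 0xFF, sizeof mem);

    switch (width) {
    case 1: { uint8_t  v = 1; memcpy(mem, &v, sizeof v); break; }  /* BYTE PTR  / movb */
    case 2: { uint16_t v = 1; memcpy(mem, &v, sizeof v); break; }  /* WORD PTR  / movw */
    case 4: { uint32_t v = 1; memcpy(mem, &v, sizeof v); break; }  /* DWORD PTR / movl */
    case 8: { uint64_t v = 1; memcpy(mem, &v, sizeof v); break; }  /* QWORD PTR / movq */
    default: assert(!"width must be 1, 2, 4 or 8"); break;
    }

    /* Every stored byte is 0x00 or 0x01, so it differs from the 0xFF fill. */
    for (i = 0; i < sizeof mem; i++)
        if (mem[i] != 0xFF)
            changed++;
    return changed;
}
```

The value being stored is always 1, yet the effect on memory differs with the width, which is the information the size specifier carries.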
Personally, I prefer to always include the size when I write code. It's a couple of characters more typing, but it forces me to think about it and state explicitly what I want. If I screw up and code a mismatch, then the assembler will scream loudly at me, which has caught bugs more than once. I also think having it there enhances readability. So here I agree with old_timer, even though his perspective appears to be somewhat unpopular.
Disassemblers also tend to be verbose in their outputs, including the size specifiers even when they are optional. Hans Passant theorized in the comments this was to preserve backwards-compatibility with old-school assemblers that always needed these, but I'm not sure that's true. It might be part of it, but in my experience, disassemblers tend to be wordy in lots of different ways, and I think this is just to make it easier to analyze code with which you are unfamiliar.
Note that AT&T syntax uses a slightly different tack. Rather than writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:
movl 0x419260(,%rax,4), %esi
movq 0x419260(,%rax,4), %rsi
movq $0x1, 0x21b520(%rip)
Again, on the first two instructions, the l and q suffixes are optional, because the assembler can deduce the appropriate size. On the last instruction, just like in Intel syntax, the suffix is non-optional. So, the same thing in AT&T syntax as Intel syntax, just a different format for the size specifiers.
RIP, or any other register in the address, is only relevant to the addressing mode, not the width of data transferred. The memory reference [rip+0x21b520] could be used with a 1, 2, 4, or 8-byte access, and the constant value 0x01 could also be 1 to 8 bytes (0x01 is the same as 0x00000001 etc.). So in this case, the operand size has to be explicitly mentioned.
With a register as the source or destination, the operand size would be implicit: if, say, EAX is used, the data is 32 bits or 4 bytes:
mov [rip+0x21b520],eax
And of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix to the instruction mnemonic (the l here).
movl $1, 0x21b520(%rip)
It gets worse than that: an assembly language is defined by the assembler, the program that reads, interprets, and parses it. As a general rule, and for x86 in particular, there is no technical reason for any two assemblers for the same target to accept the same assembly language. They tend to be similar, but they don't have to be.
You have fallen into a couple of traps: first, the specific syntax your assembler uses for the size directive, and second, whether there is a default size. My recommendation is to ALWAYS use the size directive (or a size-specific instruction mnemonic, if there is one); then you never have to worry about it, right?

Linux assembly x86 | trying to get stack value, syntax error

I have been messing around with linux assembly on an x86 machine,
Basically, my question is this: I pushed a couple of values onto the stack, copied the stack pointer into the base pointer, and loaded 8 into a register to use as an offset for reading back a pushed value. Finally, I wanted to put that value into %ebx so that it becomes the exit status of the system call, but the assembler reports an error and I have no clue why.
Error is: junk (%ebp) after register
Example:
.section .data
.section .text
.globl _start
_start:
pushl $50
pushl $20
movl %esp,%ebp
movl $8,%edx
movl %edx(%ebp),%ebx ## Supposed to be return value at system termination // PROBLEM HERE
movl $1,%eax ## System call
int $0x80 # Terminate program
I think part of the problem might be that in x86 the stack actually grows downwards, not up. You're adding to the base pointer, which is giving junk, where you have to subtract from it. I don't have an x86 machine so I can't test this, but have you tried something like movl -%edx(%ebp),%ebx?
Oops, I reversed the direction of the operands in my head. In this case, your stack looks like this:
1952 - ???
1948 - 50
1944 - 20 <- ebp <- esp
So when you take ebp+8, you aren't getting 20 or 50; you're reading address 1952, and you don't know what that contains.
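The effect of the two pushes can be checked with a toy model in C. Everything here (struct toy_stack, toy_pushl, toy_load, the starting offset 32 and the 0xDEADBEEF filler) is invented for the sketch; byte offsets into the array play the role of addresses, and the stack grows toward lower offsets as it does on x86.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_UNKNOWN 0xDEADBEEFu   /* marks slots the program never wrote */

/* A toy 32-bit stack: byte offsets into mem act as addresses. */
struct toy_stack {
    uint32_t mem[16];
    uint32_t esp;                 /* byte offset of the current top of stack */
};

/* pushl: decrement esp by 4, then store the value at the new top. */
static void toy_pushl(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the 32-bit slot at a byte offset, like movl disp(%ebp), %ebx. */
static uint32_t toy_load(const struct toy_stack *s, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < sizeof s->mem);
    return s->mem[addr / 4];
}

/* Replay pushl $50; pushl $20; movl %esp,%ebp and read disp(%ebp). */
uint32_t read_after_two_pushes(uint32_t disp)
{
    struct toy_stack s;
    uint32_t ebp;
    unsigned i;

    for (i = 0; i < 16; i++)
        s.mem[i] = TOY_UNKNOWN;
    s.esp = 32;                   /* arbitrary starting top of stack */

    toy_pushl(&s, 50);
    toy_pushl(&s, 20);
    ebp = s.esp;
    return toy_load(&s, ebp + disp);
}
```

In this model displacement 0 yields the last value pushed, displacement 4 the one pushed before it, and displacement 8 lands on a slot the code never wrote.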
Check out the links in https://stackoverflow.com/tags/x86/info. I updated them recently, and added the info about using gdb to single-step asm.
What do you mean "get an error"? Segmentation fault? Syntax error? (The normal syntax is (%ebp, %edx). Only numeric-constant displacements go outside the parens, e.g. -4(%ebp, %edx))
Also, if you're going to use stack frame pointers at all, do the mov %esp, %ebp after pushing any registers you want to preserve, but before pushing args to any functions you're going to call. There's no need to use %ebp that way at all, though; gcc defaults to -fomit-frame-pointer since 4.4, I think. A frame pointer can still make it easier to keep track of where your local variables are if you're pushing and popping stuff.
You might want to just start with 64bit asm, instead of messing around with the obsolete x86 args-on-the-stack ABI.
This just made me think of what's probably wrong with your code. You're probably getting a segfault (but you didn't say whether it was that, a syntax error, or something else), because you probably built your code in 64bit mode. Build a 32bit binary, or change your code to use %rsp.

What is the purpose of this code segment from glibc

I am trying to understand what the following code segment from tls.h in glibc is doing and why:
/* Macros to load from and store into segment registers. */
# define TLS_GET_FS() \
({ int __seg; __asm ("movl %%fs, %0" : "=q" (__seg)); __seg; })
I think I understand the basic operation it is moving the value stored in the fs register to __seg. However, I have some questions:
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
More importantly I think that the scope of the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful? I'm sure that the authors of glibc have a good reason for doing this but I can't figure out what it is from looking at the source code.
I tried generating assembly for this code and I got the following?
#APP
# 13 "fs-test.cpp" 1
movl %fs, %eax
# 0 "" 2
#NO_APP
So in my case it looks like eax was used for __seg. But I don't know if that is always what happens or if it was just what happened in the small test file that I compiled. If it is always going to use eax why wouldn't the assembly be written that way? If the compiler might pick other registers then how will the programmer know which one to access since __seg goes out of scope at the end of the macro? Finally I did not see this macro used anywhere when I grepped for it in the glibc source code, so that further adds to my confusion about what its purpose is. Any explanation about what the code is doing and why is appreciated.
My understanding is the fs is only 16-bits. Is this correct? What happens when the value gets moved to a quadword memory location? Does this mean the upper bits get set to 0?
Yes.
the variable __seg that gets declared at the start of the segment is limited to this segment. So how is __seg useful?
Read about GCC's statement-expression extension: the value of a statement expression is the value of the last expression in it. The __seg; at the end supplies that value, so although __seg itself goes out of scope, the macro's result can be assigned to something else, like this:
int foo = TLS_GET_FS();
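For a version you can compile without touching segment registers, here is a minimal sketch of the same pattern. CLAMP_TO_BYTE and clamp_demo are invented names, and statement expressions are a GNU C extension (accepted by GCC and Clang), not ISO C.

```c
#include <assert.h>

/* GNU C statement expression: the value of ({ ... }) is the value of its
   last expression, here the trailing "v_;". The variable v_ exists only
   inside the braces, just like __seg in TLS_GET_FS. */
#define CLAMP_TO_BYTE(x) \
    ({ int v_ = (x); if (v_ < 0) v_ = 0; if (v_ > 255) v_ = 255; v_; })

int clamp_demo(int x)
{
    /* v_ is out of scope here; only its final value was carried out. */
    int result = CLAMP_TO_BYTE(x);
    assert(result >= 0 && result <= 255);
    return result;
}
```

The caller never names the temporary, which is also why it does not matter which register the compiler picks for it.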
Finally I did not see this macro used anywhere when I grepped for it in the glibc source code
The TLS_{GET,SET}_FS in fact do not appear to be used. They probably were used in some version, then accidentally left over when the code referencing them was removed.

Good references for the syscalls

I need some reference but a good one, possibly with some nice examples. I need it because I am starting to write code in assembly using the NASM assembler. I have this reference:
http://bluemaster.iu.hio.no/edu/dark/lin-asm/syscalls.html
which is quite nice and useful, but it's got a lot of limitations because it doesn't explain the fields in the other registers. For example, if I am using the write syscall, I know I should put 1 in the EAX register, and the ECX is probably a pointer to the string, but what about EBX and EDX? I would like that to be explained too, that EBX determines the input (0 for stdin, 1 for something else etc.) and EDX is the length of the string to be entered, etc. I hope you understand what I want; I couldn't find any such materials, which is why I am asking here.
Thanks in advance.
The standard programming language in Linux is C. Because of that, the best descriptions of the system calls will show them as C functions to be called. Given their description as a C function and a knowledge of how to map them to the actual system call in assembly, you will be able to use any system call you want easily.
First, you need a reference for all the system calls as they would appear to a C programmer. The best one I know of is the Linux man-pages project, in particular the system calls section.
Let's take the write system call as an example, since it is the one in your question. As you can see, the first parameter is a signed integer, which is usually a file descriptor returned by the open syscall. These file descriptors could also have been inherited from your parent process, as usually happens for the first three file descriptors (0=stdin, 1=stdout, 2=stderr). The second parameter is a pointer to a buffer, and the third parameter is the buffer's size (as an unsigned integer). Finally, the function returns a signed integer, which is the number of bytes written, or a negative number for an error.
Now, how to map this to the actual system call? There are many ways to do a system call on 32-bit x86 (which is probably what you are using, based on your register names); be careful that it is completely different on 64-bit x86 (be sure you are assembling in 32-bit mode and linking a 32-bit executable; see this question for an example of how things can go wrong otherwise). The oldest, simplest and slowest of them in the 32-bit x86 is the int $0x80 method.
For the int $0x80 method, you put the system call number in %eax, and the parameters in %ebx, %ecx, %edx, %esi, %edi, and %ebp, in that order. Then you execute int $0x80, and the return value from the system call is in %eax. Note that this return value is different from what the reference says; the reference shows how the C library will return it, but the system call returns -errno on error (for instance -EINVAL). The C library will move this to errno and return -1 in that case. See syscalls(2) and intro(2) for more detail.
So, in the write example, you would put the write system call number in %eax, the first parameter (file descriptor number) in %ebx, the second parameter (pointer to the string) in %ecx, and the third parameter (length of the string) in %edx. The system call will return in %eax either the number of bytes written, or the error number negated (if the return value is between -1 and -4095, it is a negated error number).
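The difference between the two return conventions can be sketched in C. The helper to_libc_result is hypothetical (it is not a glibc function) and reports the error number through an out-parameter instead of the real errno, so that it is easy to test; the range check is the one described above.

```c
#include <assert.h>
#include <errno.h>    /* EINVAL, EBADF, ... for callers */
#include <stddef.h>

/* Convert a raw kernel return value (as left in %eax after int $0x80)
   into the C library convention: -1 plus an error number on failure,
   the value itself on success. */
long to_libc_result(long raw, int *err_out)
{
    assert(err_out != NULL);

    if (raw >= -4095 && raw <= -1) {
        *err_out = (int)-raw;     /* e.g. raw == -EINVAL gives EINVAL */
        return -1;
    }
    *err_out = 0;
    return raw;                   /* e.g. number of bytes written */
}
```

In hand-written assembly you would do the equivalent comparison on %eax yourself after the system call returns.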
Finally, how do you find the system call numbers? They can be found at /usr/include/linux/unistd.h. On my system, this just includes /usr/include/asm/unistd.h, which finally includes /usr/include/asm/unistd_32.h, so the numbers are there (for write, you can see __NR_write is 4). The same goes for the error numbers, which come from /usr/include/linux/errno.h (on my system, after chasing the inclusion chain I find the first ones at /usr/include/asm-generic/errno-base.h and the rest at /usr/include/asm-generic/errno.h). For the system calls which use other constants or structures, their documentation tells which headers you should look at to find the corresponding definitions.
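If you would rather let the headers supply the number, a C program on Linux can use the SYS_write constant from <sys/syscall.h> together with the C library's generic syscall() function. The wrapper name write_by_number is made up for this sketch. Note that syscall() follows the C library convention (it returns -1 and sets errno), so it does not show you the raw negated error number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_write, derived from the kernel's __NR_write */
#include <unistd.h>        /* syscall() */

/* Invoke the write system call by number through syscall().
   Linux-specific; the arguments go in the same order as in write(2). */
long write_by_number(int fd, const void *buf, unsigned long len)
{
    assert(buf != 0);
    return syscall(SYS_write, fd, buf, len);
}
```

Calling write_by_number(1, "hi\n", 3) writes to standard output. The numeric value of SYS_write depends on the architecture you compile for, which is one reason to take it from the headers rather than hard-coding it.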
Now, as I said, int $0x80 is the oldest and slowest method. Newer processors have special system call instructions which are faster. To use them, the kernel makes available a virtual dynamic shared object (the vDSO; it is like a shared library, but in memory only) with a function you can call to do a system call using the best method available for your hardware. It also makes available special functions to get the current time without even having to do a system call, and a few other things. Of course, it is a bit harder to use if you are not using a dynamic linker.
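As a rough illustration that the vDSO really is an ELF image mapped into your process, the following C sketch asks the C library where the kernel put it. It assumes Linux with glibc 2.16 or later, where getauxval() and AT_SYSINFO_EHDR are available; the function names vdso_base and looks_like_elf are invented here, and parsing the image to find functions is beyond this sketch.

```c
#include <assert.h>
#include <elf.h>        /* AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <string.h>
#include <sys/auxv.h>   /* getauxval() */

/* Address at which the kernel mapped the vDSO, or 0 if it reported none. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Check for the 4-byte ELF magic number at the given address. */
int looks_like_elf(unsigned long base)
{
    assert(base != 0);
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

A dynamically linked program normally never does this by hand, because the dynamic linker and the C library locate the vDSO functions for you.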
There is also another older method, the vsyscall, which is similar to the vDSO but uses a single page at a fixed address. This method is deprecated, will result in warnings on the system log if you are using recent kernels, can be disabled on boot on even more recent kernels, and might be removed in the future. Do not use it.
If you download that web page (like it suggests in the second paragraph) and download the kernel sources, you can click the links in the "Source" column, and go directly to the source file that implements the system calls. You can read their C signatures to see what each parameter is used for.
If you're just looking for a quick reference, each of those system calls has a C library interface with the same name minus the sys_. So, for example, you could check out man 2 lseek to get the information about the parameters for sys_lseek:
off_t lseek(int fd, off_t offset, int whence);
where, as you can see, the parameters match the ones from your HTML table:
%ebx          %ecx   %edx
unsigned int  off_t  unsigned int

Can anybody explain some simple assembly code?

I have just started to learn assembly. This is the dump from gdb for a simple program which prints hello ranjit.
Dump of assembler code for function main:
0x080483b4 <+0>: push %ebp
0x080483b5 <+1>: mov %esp,%ebp
0x080483b7 <+3>: sub $0x4,%esp
=> 0x080483ba <+6>: movl $0x8048490,(%esp)
0x080483c1 <+13>: call 0x80482f0 <puts@plt>
0x080483c6 <+18>: leave
0x080483c7 <+19>: ret
My questions are :
1. Why is ebp pushed onto the stack every time at the start of the program? What is in ebp that is necessary to run this program?
2. In the second line, why is ebp copied to esp?
3. I can't get the third line at all. What I know about SUB syntax is "sub dest,source", but here how can esp be subtracted from 4 and stored in 4?
4. What is this value "$0x8048490"? Why is it moved to esp, and why is esp enclosed in brackets this time? Does it denote something different than esp without brackets?
5. The next line is the call to a function, but what is this "0x80482f0"?
6. What are leave and ret (maybe ret means returning to libc)?
operating system : ubuntu 10, compiler : gcc
ebp is used as a frame pointer in Intel processors (assuming you're using a calling convention that uses frames).
It provides a known point of reference for locating passed-in parameters (on one side) and local variables (on the other) no matter what you do with the stack pointer while your function is active.
The sequence:
push %ebp # save caller's frame pointer
mov %esp,%ebp # create a new frame pointer
sub $N,%esp # make space for locals
saves the frame pointer for the previous stack frame (the caller), loads up a new frame pointer, then adjusts the stack to hold things for the current "stack level".
Since parameters would have been pushed before setting up the frame, they can be accessed with N(%ebp) ([ebp+N] in Intel syntax), where N is a suitable offset.
Similarly, because locals are created "under" the frame pointer, they can be accessed with -N(%ebp) ([ebp-N] in Intel syntax).
The leave instruction is a single one which undoes that stack frame. You used to have to do it manually but Intel introduced a faster way of getting it done. It's functionally equivalent to:
mov %ebp, %esp # restore the old stack pointer
pop %ebp # and frame pointer
(the old, manual way).
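If it helps to watch the two registers move, here is a toy model of the prologue and of leave in C. The struct toy_cpu and its helpers are invented for this sketch; byte offsets into a small array stand in for stack addresses, and each line is annotated with the instruction it imitates.

```c
#include <assert.h>
#include <stdint.h>

struct toy_cpu {
    uint32_t mem[16];   /* toy stack memory; byte offsets act as addresses */
    uint32_t esp;       /* stack pointer */
    uint32_t ebp;       /* frame pointer */
};

static void push32(struct toy_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

static uint32_t pop32(struct toy_cpu *c)
{
    uint32_t value;

    assert(c->esp + 4 <= sizeof c->mem);
    value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* Function entry: save the caller's frame, start a new one, reserve locals. */
void prologue(struct toy_cpu *c, uint32_t locals_bytes)
{
    push32(c, c->ebp);            /* push %ebp       */
    c->ebp = c->esp;              /* mov %esp,%ebp   */
    assert(c->esp >= locals_bytes);
    c->esp -= locals_bytes;       /* sub $N,%esp     */
}

/* leave: drop the locals and restore the caller's frame pointer. */
void leave(struct toy_cpu *c)
{
    c->esp = c->ebp;              /* mov %ebp,%esp   */
    c->ebp = pop32(c);            /* pop %ebp        */
}
```

Running prologue followed by leave returns both registers to the values they had on entry, which is what lets ret find the return address at the top of the stack.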
Answering the questions one by one in case I've missed something:
1. To start a new frame. See above.
2. It isn't. esp is copied to ebp. This is AT&T notation (the %reg is a dead giveaway) where (among other things) source and destination operands are swapped relative to Intel notation.
3. See answer to (2) above. You're subtracting 4 from esp, not the other way around.
4. It's a parameter being passed to the function at 0x80482f0. It's not being loaded into esp but into the memory pointed at by esp. In other words, it's being pushed on the stack. Since the function being called is puts (see (5) below), it will be the address of the string you want putsed.
5. The function name is in the <> after the address. It's calling the puts function (probably the one in the standard library though that's not guaranteed). For a description of what the PLT is, see here.
6. I've already explained leave above as unwinding the current stack frame before exiting. The ret simply returns from the current function. If the current function is main, it's going back to the C startup code.
In my career I learned several assembly languages, you didn't mention which but it appears Intel x86 (segmented memory model as PaxDiablo pointed out). However, I have not used assembly since last century (lucky me!). Here are some of your answers:
1. The EBP register is pushed onto the stack at the beginning because the routine is about to overwrite it, and the caller still needs its original value. You don't want to just discard that value, thus corrupting the integrity of the rest of the application.
2. If I remember correctly (I may be wrong, long time) it is the other way around: we are moving %esp INTO %ebp. Remember we saved %ebp in the previous line? Now we are storing a new value in it without destroying the original one.
3. Actually they are SUBtracting the value of four (4) FROM the contents of the %esp register. The resulting value is not stored in "four" but in %esp. If %esp had 0xFFF8, after the SUB it will contain 0xFFF4. I think this is called "Immediate" if my memory serves me. What is happening here (I reckon) is the computation of a memory address (4 bytes less).
4. The value $0x8048490 I don't know. However, it is NOT being moved INTO %esp but rather INTO THE ADDRESS POINTED TO BY THE CONTENTS OF %esp. That is why the notation is (%esp) rather than %esp. This is kind of a common notation in all assembly languages I came across in my career. If on the other hand the right operand was simply %esp, then the value would have been moved INTO the %esp register. Basically the %esp register's contents are being used for addressing.
5. It is a fixed value, and the string on the right makes me think that this value is actually the address of the puts() (Put String) compiler library routine.
6. "leave" is an instruction that is the equivalent of "mov %ebp,%esp" followed by "pop %ebp". Remember we saved the contents of %ebp at the beginning; now that we are done with the routine we are restoring it back into the register so that the caller gets back to its context. The "ret" instruction is the final instruction of the routine; it "returns" to the caller.

Resources