How does __libc_start_main@plt work? - linux

To study how an object file is loaded and run on Linux, I wrote the simplest possible C program, simple.c.
int main(){}
Next, I compiled it and saved the disassembly of the resulting binary as a text file.
$gcc ./simple.c
$objdump -xD ./a.out > simple.text
From many internet articles, I gathered that gcc dynamically loads startup functions such as _start, _init, __libc_start_main@plt, and so on. So I started to read my assembly code, with the help of http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html .
Here is part of the disassembly.
080482e0 <__libc_start_main@plt>:
80482e0: ff 25 10 a0 04 08 jmp *0x804a010
80482e6: 68 08 00 00 00 push $0x8
80482eb: e9 d0 ff ff ff jmp 80482c0 <_init+0x2c>
Disassembly of section .text:
080482f0 <_start>:
80482f0: 31 ed xor %ebp,%ebp
80482f2: 5e pop %esi
80482f3: 89 e1 mov %esp,%ecx
80482f5: 83 e4 f0 and $0xfffffff0,%esp
80482f8: 50 push %eax
80482f9: 54 push %esp
80482fa: 52 push %edx
80482fb: 68 70 84 04 08 push $0x8048470
8048300: 68 00 84 04 08 push $0x8048400
8048305: 51 push %ecx
8048306: 56 push %esi
8048307: 68 ed 83 04 08 push $0x80483ed
804830c: e8 cf ff ff ff call 80482e0 <__libc_start_main@plt>
8048311: f4 hlt
8048312: 66 90 xchg %ax,%ax
8048314: 66 90 xchg %ax,%ax
8048316: 66 90 xchg %ax,%ax
8048318: 66 90 xchg %ax,%ax
804831a: 66 90 xchg %ax,%ax
804831c: 66 90 xchg %ax,%ax
804831e: 66 90 xchg %ax,%ax
080483ed <main>:
80483ed: 55 push %ebp
80483ee: 89 e5 mov %esp,%ebp
80483f0: b8 00 00 00 00 mov $0x0,%eax
80483f5: 5d pop %ebp
80483f6: c3 ret
80483f7: 66 90 xchg %ax,%ax
80483f9: 66 90 xchg %ax,%ax
80483fb: 66 90 xchg %ax,%ax
80483fd: 66 90 xchg %ax,%ax
80483ff: 90 nop
...
Disassembly of section .got:
08049ffc <.got>:
8049ffc: 00 00 add %al,(%eax)
...
Disassembly of section .got.plt:
0804a000 <_GLOBAL_OFFSET_TABLE_>:
804a000: 14 9f adc $0x9f,%al
804a002: 04 08 add $0x8,%al
...
804a00c: d6 (bad)
804a00d: 82 (bad)
804a00e: 04 08 add $0x8,%al
804a010: e6 82 out %al,$0x82
804a012: 04 08 add $0x8,%al
My question is:
At 0x804830c, 0x80482e0 is called (I already understand the preceding instructions).
At 0x80482e0, the process jumps to 0x804a010.
At 0x804a010, the instruction is < out %al,$0x82 >
...wait. Just out? What was in %al, and where is 0x82? I got stuck on this line.
Please help.
P.S. I'm a beginner with Linux and operating systems. I'm studying operating system concepts in a school class, but I still can't find out how to study Linux assembly language properly. I've already downloaded the Intel processor manual, but it is too huge to read. Can anyone point me to good material? Thanks.

80482e0: ff 25 10 a0 04 08 jmp *0x804a010
This means "retrieve the 4-byte address stored at 0x804a010 and jump to it."
804a010: e6 82 out %al,$0x82
804a012: 04 08 add $0x8,%al
Those 4 bytes (e6 82 04 08, read in little-endian order) will be treated as an address, 0x080482e6, not as instructions.
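A minimal sketch of that decoding, written in Python purely for illustration (the variable names are mine, not from the question). x86 stores multi-byte values little-endian, so the bytes listed at 0x804a010 through 0x804a013 combine into a single 32-bit address:

```python
# Bytes objdump printed at 0x804a010..0x804a013 in the .got.plt section.
got_entry_bytes = bytes([0xE6, 0x82, 0x04, 0x08])

# x86 is little-endian: the lowest address holds the least significant byte.
target = int.from_bytes(got_entry_bytes, byteorder="little")

print(hex(target))  # 0x80482e6
```

The result is the address of the push $0x8 instruction inside the PLT entry, as the listing that follows shows.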
80482e0: ff 25 10 a0 04 08 jmp *0x804a010
80482e6: 68 08 00 00 00 push $0x8
80482eb: e9 d0 ff ff ff jmp 80482c0 <_init+0x2c>
So we've just executed an instruction that has moved us exactly one instruction forward. At this point, you're probably wondering if there's a good reason for this.
There is. This is a typical PLT/GOT implementation. Much more detail, including a diagram, is at "Position Independent Code in shared libraries: The Procedure Linkage Table".
The real code for __libc_start_main is in a shared library, glibc. The compiler and compile-time linker don't know where the code will be at run-time, so they place in your compiled program a short stub, __libc_start_main@plt, which contains just three instructions:
jump to a location specified by the 4th (or 5th, depending on whether you like to count from 0 or 1) entry in the GOT
push $8 onto the stack
jump to a resolver routine
The first time you call __libc_start_main, the resolver code will run. It will find the actual location of __libc_start_main in a shared library and will patch the 4th entry of the GOT to be that address. If your program calls __libc_start_main again, the jmp *0x804a010 instruction will take the program directly to the code in the shared library.
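The three steps and the patching described above can be mimicked with a toy model. This is only a sketch in Python, not how the dynamic linker is actually implemented: the dictionary `got`, the functions `resolver` and `call_through_plt`, and the address `REAL_LIBC_START_MAIN` are hypothetical names and values chosen for illustration. Only 0x804a010 and 0x80482e6 come from the disassembly.

```python
# Toy model of lazy binding through the PLT and GOT.
GOT_SLOT = 0x804A010               # .got.plt entry read by the PLT stub
PLT_PUSH = 0x80482E6               # second instruction of the PLT stub
REAL_LIBC_START_MAIN = 0xF7E00000  # made-up address; really chosen at run time

got = {GOT_SLOT: PLT_PUSH}         # initial contents, as seen in the objdump output
stats = {"resolver_calls": 0}


def resolver():
    """Stand-in for the dynamic linker: look up the symbol and patch the GOT."""
    stats["resolver_calls"] += 1
    got[GOT_SLOT] = REAL_LIBC_START_MAIN
    return REAL_LIBC_START_MAIN


def call_through_plt():
    """Mimic `jmp *0x804a010`: go wherever the GOT slot currently points."""
    target = got[GOT_SLOT]
    if target == PLT_PUSH:         # unresolved: fall through to push + jmp resolver
        target = resolver()
    return target


first = call_through_plt()   # slow path: the resolver runs and patches the GOT
second = call_through_plt()  # fast path: the GOT already holds the real address
print(hex(first), hex(second), stats["resolver_calls"])
```

In this model both calls end up at the same address, but the resolver runs only once, which is the point of patching the GOT entry.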
"Can anyone point me to good material?"
The x86 Assembly book at Wikibooks might be one place to start.

Related

How does 32-bit socketcall system call work based on the libc assembly? [duplicate]

I am trying to understand how the 32-bit socketcall works by reading the assembly code of the socket API and a few other functions in the libc library.
000ed9f0 <socket>:
ed9f0: 89 da mov %ebx,%edx
ed9f2: b8 66 00 00 00 mov $0x66,%eax # socketcall syscall number
ed9f7: bb 01 00 00 00 mov $0x1,%ebx # SYS_SOCKET value
ed9fc: 8d 4c 24 04 lea 0x4(%esp),%ecx # pointer to the *arg structure
eda00: 65 ff 15 10 00 00 00 call *%gs:0x10 # invokes syscall? but this is not sysenter or int 0x80
eda07: 89 d3 mov %edx,%ebx
eda09: 83 f8 83 cmp $0xffffff83,%eax
eda0c: 73 01 jae eda0f <socket+0x1f>
eda0e: c3 ret
eda0f: e8 cb 8d 03 00 call 1267df <__frame_state_for+0x35f>
eda14: 81 c1 ec d5 0b 00 add $0xbd5ec,%ecx
eda1a: 8b 89 24 ff ff ff mov -0xdc(%ecx),%ecx
eda20: f7 d8 neg %eax
eda22: 65 03 0d 00 00 00 00 add %gs:0x0,%ecx
eda29: 89 01 mov %eax,(%ecx)
eda2b: 83 c8 ff or $0xffffffff,%eax
eda2e: c3 ret
eda2f: 90 nop
See my code comment above (#). It makes sense to me until this line:
eda00: 65 ff 15 10 00 00 00 call *%gs:0x10 # invokes syscall? but this is not Sysenter or int 0x80
I thought we invoke a syscall using either int 0x80 or sysenter. How does this call through a segment register invoke the socketcall syscall?

Linux perf_events annotation frame pointer confusion

I ran sudo perf record -F 99 find / followed by sudo perf report and selected "Annotate fdopendir" and here are the first seven instructions:
push %rbp
push %rbx
mov %edi,%esi
mov %edi,%ebx
mov $0x1,%edi
sub $0xa8,%rsp
mov %rsp,%rbp
The first instruction appears to be saving the caller's base frame pointer. I believe instructions 2 through 5 are irrelevant to this question but are here for completeness. Instructions 6 and 7 confuse me. Shouldn't the copy of rsp into rbp occur before subtracting 0xa8 from rsp?
The x86-64 System V ABI doesn't require making a traditional / legacy stack-frame. This looks close to a traditional stack frame setup, but it's definitely not because there's no mov %rsp, %rbp right after the first push %rbp.
We're seeing compiler-generated code that simply uses RBP as a temporary register, and is using it to hold a pointer to a local on the stack. It's just a coincidence that this happens to involve the instruction mov %rsp, %rbp sometime after push %rbp. This is not making a stack frame.
In x86-64 System V, RBX and RBP are the only 2 "low 8" registers that are call-preserved, and thus usable without REX prefixes in some cases (e.g. for the push/pop, and when used in addressing modes), saving code-size. GCC prefers to use them before saving/restoring any of R12..R15 (see "What registers are preserved through a linux x86-64 function call"). For pointers, copying them with mov always requires a REX prefix for 64-bit operand-size, so there are fewer savings than for 32-bit integers, but gcc still goes for RBX then RBP, in that order, when it needs to save/restore call-preserved regs in a function.
Disassembly of /lib/libc.so.6 (glibc) on my system (Arch Linux) shows similar but different code-gen for fdopendir. You stopped the disassembly too soon, before it makes a function call. That sheds some light on why it wanted a call-preserved temporary register: it wanted the var in a reg across the call.
00000000000c1260 <fdopendir>:
c1260: 55 push %rbp
c1261: 89 fe mov %edi,%esi
c1263: 53 push %rbx
c1264: 89 fb mov %edi,%ebx
c1266: bf 01 00 00 00 mov $0x1,%edi
c126b: 48 81 ec a8 00 00 00 sub $0xa8,%rsp
c1272: 64 48 8b 04 25 28 00 00 00 mov %fs:0x28,%rax # stack-check cookie
c127b: 48 89 84 24 98 00 00 00 mov %rax,0x98(%rsp)
c1283: 31 c0 xor %eax,%eax
c1285: 48 89 e5 mov %rsp,%rbp # save a pointer
c1288: 48 89 ea mov %rbp,%rdx # and pass it as a function arg
c128b: e8 90 7d 02 00 callq e9020 <__fxstat>
c1290: 85 c0 test %eax,%eax
c1292: 78 6a js c12fe <fdopendir+0x9e>
c1294: 8b 44 24 18 mov 0x18(%rsp),%eax
c1298: 25 00 f0 00 00 and $0xf000,%eax
c129d: 3d 00 40 00 00 cmp $0x4000,%eax
c12a2: 75 4c jne c12f0 <fdopendir+0x90>
....
c12c1: 48 89 e9 mov %rbp,%rcx # pass the pointer as the 4th arg
c12c4: 89 c2 mov %eax,%edx
c12c6: 31 f6 xor %esi,%esi
c12c8: 89 df mov %ebx,%edi
c12ca: e8 d1 f7 ff ff callq c0aa0 <__alloc_dir>
c12cf: 48 8b 8c 24 98 00 00 00 mov 0x98(%rsp),%rcx
c12d7: 64 48 33 0c 25 28 00 00 00 xor %fs:0x28,%rcx # check the stack cookie
c12e0: 75 38 jne c131a <fdopendir+0xba>
c12e2: 48 81 c4 a8 00 00 00 add $0xa8,%rsp
c12e9: 5b pop %rbx
c12ea: 5d pop %rbp
c12eb: c3 retq
This is pretty silly code-gen; gcc could have simply used mov %rsp, %rcx the 2nd time it needed it. I'd call this a missed-optimization. It never needed that pointer in a call-preserved register because it always knew where it was relative to RSP.
(Even if it hadn't been exactly at RSP+0, lea something(%rsp), %rdx and lea something(%rsp), %rcx would have been totally fine the two times it was needed, with probably less total cost than saving/restoring RBP + the required mov instructions.)
Or it could have used mov 0x18(%rbp),%eax instead of rsp to save a byte of code-size in that addressing mode. Avoiding direct references to RSP between function calls reduces the amount of stack-sync uops Intel CPUs need to insert.

How to know the current carry flag in assembly code?

I've been reading and studying assembly code. The code is below.
Disassembly of section .text:
08048510 <main>:
8048510: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048514: 83 e4 f0 and $0xfffffff0,%esp
8048517: ff 71 fc pushl -0x4(%ecx)
804851a: 55 push %ebp
804851b: 89 e5 mov %esp,%ebp
804851d: 51 push %ecx
804851e: 83 ec 08 sub $0x8,%esp
8048521: 68 e0 93 04 08 push $0x80493e0
8048526: 68 c0 93 04 08 push $0x80493c0
804852b: 68 c9 93 04 08 push $0x80493c9
8048530: e8 7a 07 00 00 call 8048caf <eos_printf>
8048535: c7 04 24 d6 93 04 08 movl $0x80493d6,(%esp)
804853c: e8 6e 07 00 00 call 8048caf <eos_printf>
8048541: a1 38 c0 04 08 mov 0x804c038,%eax
8048546: bc 00 00 00 00 mov $0x0,%esp
804854b: ff e0 jmp *%eax
804854d: 8b 4d fc mov -0x4(%ebp),%ecx
8048550: 31 c0 xor %eax,%eax
8048552: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048559: 00 00 00
804855c: c9 leave
804855d: 8d 61 fc lea -0x4(%ecx),%esp
8048560: c3 ret
Disassembly of section .data:
0804c030 <_irq_mask>:
804c030: ff (bad)
804c031: ff (bad)
804c032: ff (bad)
804c033: ff 01 incl (%ecx)
0804c034 <_eflags>:
804c034: 01 00 add %eax,(%eax)
...
0804c038 <_vector>:
804c038: 1d 8d 04 08 1d sbb $0x1d08048d,%eax
804c03d: 8d 04 08 lea (%eax,%ecx,1),%eax
804c040: 1d 8d 04 08 37 sbb $0x3708048d,%eax
804c045: 8d 04 08 lea (%eax,%ecx,1),%eax
At 0x8048541, the EAX register is set to 0x804c038.
At 0x804854b, the process jumps to the address held in the EAX register.
At 0x804c038, the instruction is < sbb $0x1d08048d, %eax >.
According to the instruction manual, sbb computes dest = dest - (src + carry flag), so the instruction at 0x804c038 amounts to %eax = %eax - (0x1d08048d + carry flag).
Then, what value does the carry flag hold at that point?
I didn't find any instruction that sets the carry flag before the 0x804c038 line. Is the carry flag initially set to 0?
My second question: at 0x804854b, the process jumps to the *%eax value. After that, how does the process return to the main function? There is no return instruction in the _vector section.
I'd be glad of your help. Thanks.
Oh........ @MarkPlotnick You are God to me...... I was totally trapped by the < sbb $0x1d08048d, %eax >.
In the assembly source code, the _vector array and the _os_reset_handler function are defined as below.
.data
.global _vector
_vector:
.long _os_reset_handler
.long _os_reset_handler
.long _os_reset_handler
.long _os_irq_handler
.text
.global _os_reset_handler
_os_reset_handler:
_CLI
lea _os_init_stack_end, %esp
call _os_initialization
jmp _os_reset_handler
-----------------------
_CLI is defined as a macro in a separate C header file:
#define _CLI \
movl $0, _eflags;
I kept wondering why the _vector array did not contain the address of _os_reset_handler. I read the disassembly again and found that objdump had misaligned the hex bytes of the _vector data: it decoded them as instructions, so the 0x1d at address 0x804c03c did not start a new line and the bytes were interpreted as unrelated assembly code. (I'm very unhappy. I did no other work for 10 hours while chasing this problem...)
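As a quick check of that reading, here is a small sketch in Python (my own illustration, not part of the original build) that regroups the 16 bytes objdump printed at <_vector> into four little-endian 32-bit words, the way the four .long entries lay them out. The name `vector_bytes` is an assumption of the sketch.

```python
import struct

# The 16 bytes objdump showed starting at 0x804c038 (<_vector>), in address order.
vector_bytes = bytes.fromhex(
    "1d8d0408" "1d8d0408" "1d8d0408" "378d0408"
)

# Four little-endian unsigned 32-bit words, one per `.long` entry.
entries = struct.unpack("<4I", vector_bytes)

for index, address in enumerate(entries):
    print(index, hex(address))
```

The first three words come out as 0x8048d1d, matching the three _os_reset_handler entries. The fourth, 0x8048d37, would then presumably be _os_irq_handler, although its listing is not shown here.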
Anyway, the _os_reset_handler function is at address 0x8048d1d.
08048d1d <_os_reset_handler>:
8048d1d: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048d24: 00 00 00
8048d27: 8d 25 48 d0 04 08 lea 0x804d048,%esp
8048d2d: e8 07 01 00 00 call 8048e39 <_os_initialization>
8048d32: e9 e6 ff ff ff jmp 8048d1d <_os_reset_handler>
No more questions. Thanks.

Smallest Stack Frame Size

I'm currently doing the Capture-the-Flag event by Stripe (you should check it out if you haven't seen it yet). The event requires you to look at disassembled executables a lot, and my knowledge of asm is rusty.
I keep seeing the constant 0x18 show up as some sort of minimum stack size. For instance, in a function that allocates a char[1024] array and calls the function strcpy(), the assembly looks like this:
8048484: 55 push %ebp
8048485: 89 e5 mov %esp,%ebp
8048487: 81 ec 18 04 00 00 sub $0x418,%esp
804848d: 8b 45 08 mov 0x8(%ebp),%eax
8048490: 89 44 24 04 mov %eax,0x4(%esp)
8048494: 8d 85 f8 fb ff ff lea -0x408(%ebp),%eax
804849a: 89 04 24 mov %eax,(%esp)
804849d: e8 e6 fe ff ff call 8048388 <strcpy@plt>
80484a2: c9 leave
80484a3: c3 ret
Why is the extra space needed?

operand generation of CALL instruction on x86-64 AMD

Following is the output of objdump of a sample program,
080483b4 <display>:
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
80483b7: 83 ec 18 sub $0x18,%esp
80483ba: 8b 45 0c mov 0xc(%ebp),%eax
80483bd: 89 44 24 04 mov %eax,0x4(%esp)
80483c1: 8d 45 fe lea 0xfffffffe(%ebp),%eax
80483c4: 89 04 24 mov %eax,(%esp)
80483c7: e8 ec fe ff ff call 80482b8 <strcpy@plt>
80483cc: 8b 45 08 mov 0x8(%ebp),%eax
80483cf: 89 44 24 04 mov %eax,0x4(%esp)
80483d3: c7 04 24 f0 84 04 08 movl $0x80484f0,(%esp)
80483da: e8 e9 fe ff ff call 80482c8 <printf@plt>
80483df: c9 leave
80483e0: c3 ret
080483e1 <main>:
80483e1: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483e5: 83 e4 f0 and $0xfffffff0,%esp
80483e8: ff 71 fc pushl 0xfffffffc(%ecx)
80483eb: 55 push %ebp
80483ec: 89 e5 mov %esp,%ebp
80483ee: 51 push %ecx
80483ef: 83 ec 24 sub $0x24,%esp
80483f2: c7 44 24 04 f3 84 04 movl $0x80484f3,0x4(%esp)
80483f9: 08
80483fa: c7 04 24 0a 00 00 00 movl $0xa,(%esp)
8048401: e8 ae ff ff ff call 80483b4 <display>
8048406: b8 00 00 00 00 mov $0x0,%eax
804840b: 83 c4 24 add $0x24,%esp
804840e: 59 pop %ecx
804840f: 5d pop %ebp
8048410: 8d 61 fc lea 0xfffffffc(%ecx),%esp
What I need to understand is this: in main, at address 8048401, we see call 80483b4 <display>, but the machine code is e8 ae ff ff ff. I see that the CALL opcode is E8, but how does the address of the function, 80483b4, get encoded as FFFFFFAE? I searched Google a lot but found nothing. Can anyone please explain?
E8 is the opcode for "Call Relative", meaning the destination address is computed by adding the operand to the address of the next instruction. The operand is 0xFFFFFFAE, which is negative 0x52. 0x8048406 - 0x52 is 0x80483b4.
Most disassemblers helpfully calculate the actual target address rather than just give you the relative address in the operand.
Complete info for x86 ISA at: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2a-manual.html
Interesting question. I've had a look at Intel's documentation and the E8 opcode is CALL rel16/32. 0xffffffae is actually a 32-bit two's complement signed integer equal to -82 decimal; it is a relative address from the byte immediately after the opcode and its operands.
If you do the math you can see it checks out:
0x8048406 - 82 = 0x80483b4
This puts the instruction pointer at the beginning of the display function.
Near calls are typically IP-relative -- meaning, the "address" is actually an offset from the instruction pointer. In such case, EIP points to the next instruction (so its value is 8048406). Add ffffffae (or -00000052 in two's complement) to it, and you get 80483b4.
Note that all this math is 32-bit. You're not doing any 64-bit operations here (or your registers would have Rs instead of Es in their names, and the addresses would be much longer).
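The calculation described in these answers can be reproduced with a short sketch. This is Python used only as a calculator. The helper name `call_target` is hypothetical, and it handles only the 5-byte E8 rel32 form seen in this listing:

```python
def call_target(insn_addr, insn_bytes):
    """Return the destination of an E8 (call rel32) instruction in 32-bit code."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # The four operand bytes are a little-endian signed displacement.
    rel32 = int.from_bytes(insn_bytes[1:], byteorder="little", signed=True)
    next_insn = insn_addr + len(insn_bytes)  # EIP once the call has been fetched
    return (next_insn + rel32) & 0xFFFFFFFF  # wrap around at 32 bits


# The call at 0x8048401 in main: e8 ae ff ff ff
target = call_target(0x8048401, bytes.fromhex("e8aeffffff"))
print(hex(target))  # 0x80483b4
```

The same helper gives 0x80482b8 for the strcpy call at 0x80483c7 (e8 ec fe ff ff), which agrees with the target objdump printed.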

Resources