Why are there some "meaningless" memory slots in the disassembly? - linux

I used objdump to disassemble an ELF file on 32-bit Linux.
The output is in Intel syntax.
In the disassembly I noticed some memory slots like the ones below:
80483ed: c7 44 24 18 07 00 00 mov DWORD PTR [esp+0x18],0x7
80483f4: 00
80483f5: c7 44 24 1c 0c 00 00 mov DWORD PTR [esp+0x1c],0xc
80483fc: 00
80483fd: c7 44 24 20 01 00 00 mov DWORD PTR [esp+0x20],0x1
8048404: 00
8048405: c7 44 24 24 fe ff ff mov DWORD PTR [esp+0x24],0xfffffffe
804840c: ff
and the original assembly source is:
mov DWORD PTR [esp+24], 7
mov DWORD PTR [esp+28], 12
mov DWORD PTR [esp+32], 1
mov DWORD PTR [esp+36], -2
Could anyone tell me what the lines at addresses like "80483f4" and "80483fc" are for?
Is this related to memory alignment?
Thank you!

These are part of the previous line's instruction. The "immediate" (constant) operands are encoded as 32-bit values, so 0x7 takes up 4 bytes, stored little-endian: 07 00 00 00. Each mov here is 8 bytes long, but whatever you're using to disassemble prints only 7 bytes per line, so it shows the last byte on a separate line.
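As a rough illustration of the point above, here is a small Python sketch that rebuilds the bytes of these four instructions by hand. The helper name encode_mov_esp_disp8_imm32 is made up for this example, and it only covers this one addressing form; it is not a general assembler.

```python
import struct

def encode_mov_esp_disp8_imm32(disp: int, imm: int) -> bytes:
    """Encode `mov DWORD PTR [esp+disp], imm` (opcode C7 /0, 8-bit displacement)."""
    opcode = 0xC7
    modrm = 0x44  # mod=01 (disp8), reg=000 (/0), rm=100 (a SIB byte follows)
    sib = 0x24    # scale=1, no index, base=esp
    # disp is one signed byte, the immediate is four little-endian bytes
    return bytes([opcode, modrm, sib]) + struct.pack("<b", disp) + struct.pack("<i", imm)

for disp, imm in [(0x18, 7), (0x1C, 12), (0x20, 1), (0x24, -2)]:
    code = encode_mov_esp_disp8_imm32(disp, imm)
    # Split after 7 bytes, the way the listing in the question wraps
    print(code[:7].hex(" "), "|", code[7:].hex(" "))
```

Every instruction comes out as 8 bytes, and the byte after the bar is the one that appears alone on the next line of the listing. Recent versions of objdump have an --insn-width option that should let you widen the byte column if you want each instruction on one line.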

Related

Why doesn't the objdump disassembly have a .data section?

Sorry if this question is too naive, but I don't understand why my disassembly doesn't contain the "Hello, World!" string, or how the string gets loaded into memory when the program executes.
section .data
msg: db "Hello, World!",0xa
len: equ $-msg
global _start
section .text
_start:
mov rax,0x01
mov rdi,1
mov rsi,msg
mov rdx,len
syscall
mov rax, 0x3c
mov rdi, 0
syscall
Output of objdump -d:
a.out: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: b8 01 00 00 00 mov $0x1,%eax
401005: bf 01 00 00 00 mov $0x1,%edi
40100a: 48 be 00 20 40 00 00 movabs $0x402000,%rsi
401011: 00 00 00
401014: ba 0e 00 00 00 mov $0xe,%edx
401019: 0f 05 syscall
40101b: b8 3c 00 00 00 mov $0x3c,%eax
401020: bf 00 00 00 00 mov $0x0,%edi
401025: 0f 05 syscall
If it is moving the address of the msg string into rsi, how is that address decided before the program even runs? Is this all done by the linker? If so, can you give a little insight? I know each program has its own virtual memory, but does the linker place that string somewhere in memory while linking?

ELF relocation: the offsets are there, but I see no call to those offsets

I'm trying to understand ELF relocation, and there are a couple of things I don't really understand.
Say I've got:
relamain.c
#include <stdio.h>
#include <stdlib.h>
#include "relafoo.c"
int main() {
int n;
scanf("%d",&n);
printf("\ngot %d, %d!=%d",n,n,factorial(n));
return 0;
}
and relafoo.c
int factorial(int n) {
if (n == 0 || n == 1) {
return 1;
}
return factorial(n-1)*n;
}
Now, running readelf -r on relamain.o, I see:
000000000027 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
000000000052 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
00000000005c 000c00000004 R_X86_64_PLT32 0000000000000000 __isoc99_scanf - 4
000000000066 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
How come I have two offsets for the same function (factorial)?
When I run objdump -d relamain.o I get:
0000000000000031 <main>:
31: 55 push rbp
32: 48 89 e5 mov rbp,rsp
35: 48 83 ec 10 sub rsp,0x10
39: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40: 00 00
42: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
46: 31 c0 xor eax,eax
48: 48 8d 45 f4 lea rax,[rbp-0xc]
4c: 48 89 c6 mov rsi,rax
4f: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 56 <main+0x25>
56: b8 00 00 00 00 mov eax,0x0
5b: e8 00 00 00 00 call 60 <main+0x2f>
60: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
63: 89 c7 mov edi,eax
65: e8 00 00 00 00 call 6a <main+0x39>
6a: 89 c1 mov ecx,eax
6c: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc]
6f: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
72: 89 c6 mov esi,eax
74: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 7b <main+0x4a>
7b: b8 00 00 00 00 mov eax,0x0
80: e8 00 00 00 00 call 85 <main+0x54>
85: b8 00 00 00 00 mov eax,0x0
8a: 48 8b 75 f8 mov rsi,QWORD PTR [rbp-0x8]
8e: 64 48 33 34 25 28 00 xor rsi,QWORD PTR fs:0x28
95: 00 00
97: 74 05 je 9e <main+0x6d>
99: e8 00 00 00 00 call 9e <main+0x6d>
9e: c9 leave
9f: c3 ret
Looking at the generated code, I see that none of the calls refer to 66 or 27, which are the offsets for my factorial function. Why is that? According to "Learning Linux Binary Analysis" by Ryan "elfmaster" O'Neill, I should expect to see at least call 66 or call 27. Can anyone explain this?
If anyone can point me to a good book that explains dynamic linking and relocations in detail, with examples (beyond man, of course), that would be great.
1- You don't have two offsets for the same function; you have the offsets of two locations where a relocation must be applied. Your code contains two calls to factorial: one at offset 0x26 (presumably the recursive call inside factorial itself, which lies before main in the section) and the other at offset 0x65. The relocations attached to these calls are at offsets 0x27 and 0x66, one byte past each e8 call opcode. The "00 00 00 00" at offsets 0x27 and 0x66 will be replaced by a value calculated by the linker. You can dump the linked executable to confirm this.
2- When the object file is created, the assembler doesn't know the address of factorial, so it emits "00 00 00 00" and adds a relocation telling the linker to replace these zeros with the value needed to reach factorial, since only the linker will know its exact location.
3- Maybe Linkers & Loaders by John R. Levine. However, I suggest you start by reading http://www.skyfree.org/linux/references/ELF_Format.pdf. It may be enough, depending on the level of understanding you seek.
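To make points 1 and 2 concrete, here is a minimal Python sketch of what the linker computes for an R_X86_64_PC32 relocation: S + A - P, where S is the symbol's address, A the addend (-4 here) and P the address of the 4-byte field being patched. The function name apply_pc32 is invented for this example, and placing factorial at offset 0x0 of the same section is an assumption (the question does not show where factorial starts); only the distance between S and P matters, so the result does not depend on where the section is finally loaded.

```python
import struct

def apply_pc32(symbol_addr: int, addend: int, place: int) -> bytes:
    """Return the 4 little-endian bytes written for R_X86_64_PC32 (S + A - P)."""
    value = symbol_addr + addend - place
    return struct.pack("<i", value)

FACTORIAL = 0x0  # assumption: factorial starts at offset 0 of the same section

for place in (0x27, 0x66):  # the two relocation offsets reported by readelf -r
    field = apply_pc32(FACTORIAL, -4, place)
    print(f"field at {place:#x}: e8 {field.hex(' ')}")
```

Under that assumption the call at 0x65 would become e8 96 ff ff ff, a displacement of -0x6a measured from the next instruction at 0x6a, which lands on offset 0. The -4 addend exists because the CPU measures the displacement from the end of the 4-byte field, not from its start.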

Why is [ebp-8] the location of the first local variable in Visual C++?

When I program in Visual Studio 2019, I enter the following code, compile it in debug mode, and look at the disassembly. I found that the variable "c" is located at ebp-8 (in myfunction). However, I read in books that "the first local variable should appear at address ebp-4". Is this something specific to Visual Studio or to debug mode?
int myfunction(int a, int b)
{
013017B0 55 push ebp
013017B1 8B EC mov ebp,esp
013017B3 81 EC D8 00 00 00 sub esp,0D8h
013017B9 53 push ebx
013017BA 56 push esi
013017BB 57 push edi
013017BC 8D BD 28 FF FF FF lea edi,[ebp+FFFFFF28h]
013017C2 B9 36 00 00 00 mov ecx,36h
013017C7 B8 CC CC CC CC mov eax,0CCCCCCCCh
013017CC F3 AB rep stos dword ptr es:[edi]
013017CE B9 08 C0 30 01 mov ecx,130C008h
013017D3 E8 3F FA FF FF call 01301217
//Nonsense above.
int c = a + b;
013017D8 8B 45 08 mov eax,dword ptr [ebp+8] //a
013017DB 03 45 0C add eax,dword ptr [ebp+0Ch] //b
013017DE 89 45 F8 mov dword ptr [ebp-8],eax //Why it is not [ebp-4]?
}
I've figured out that Visual Studio leaves 8 bytes between local variables in debug mode; in release mode the layout is as expected.
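The prologue marked "Nonsense above" in the listing can be read with a little arithmetic. The Python sketch below only redoes the numbers that appear in that listing; the helper to_signed32 is a name made up here.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed displacement."""
    return x - (1 << 32) if x & 0x80000000 else x

frame_bytes = 0xD8                       # sub esp,0D8h
fill_start = to_signed32(0xFFFFFF28)     # lea edi,[ebp+FFFFFF28h]
fill_bytes = 0x36 * 4                    # ecx = 36h dwords written by rep stos

print(f"fill starts at ebp{fill_start:+#x}, covers {fill_bytes:#x} bytes")
```

So the rep stos writes 0CCCCCCCCh over the entire 0xD8-byte local area, from ebp-0xD8 up to ebp, before any local is assigned. The slot at [ebp-8] therefore starts out holding the fill pattern until c is stored into it.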

Linux perf_events annotation frame pointer confusion

I ran sudo perf record -F 99 find / followed by sudo perf report and selected "Annotate fdopendir" and here are the first seven instructions:
push %rbp
push %rbx
mov %edi,%esi
mov %edi,%ebx
mov $0x1,%edi
sub $0xa8,%rsp
mov %rsp,%rbp
The first instruction appears to be saving the caller's base frame pointer. I believe instructions 2 through 5 are irrelevant to this question, but they are here for completeness. Instructions 6 and 7 confuse me. Shouldn't rsp be copied into rbp before 0xa8 is subtracted from rsp?
The x86-64 System V ABI doesn't require making a traditional / legacy stack-frame. This looks close to a traditional stack frame setup, but it's definitely not because there's no mov %rsp, %rbp right after the first push %rbp.
We're seeing compiler-generated code that simply uses RBP as a temporary register, and is using it to hold a pointer to a local on the stack. It's just a coincidence that this happens to involve the instruction mov %rsp, %rbp sometime after push %rbp. This is not making a stack frame.
In x86-64 System V, RBX and RBP are the only 2 "low 8" registers that are call-preserved, and thus usable without REX prefixes in some cases (e.g. for the push/pop, and when used in addressing modes), saving code-size. GCC prefers to use them before saving/restoring any of R12..R15; see "What registers are preserved through a linux x86-64 function call". (For pointers, copying them with mov always requires a REX prefix for 64-bit operand-size, so there are fewer savings than for 32-bit integers, but gcc still goes for RBX then RBP, in that order, when it needs to save/restore call-preserved regs in a function.)
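The code-size point above can be checked with a small Python sketch that encodes push for a 64-bit register. The function encode_push and the REGS64 table are illustrative names, not part of any real assembler library.

```python
# Register numbers as used by the x86-64 instruction encoding
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def encode_push(reg: str) -> bytes:
    """Encode `push reg`: opcode 50+r, with a REX.B prefix (41) for r8..r15."""
    n = REGS64.index(reg)
    if n < 8:
        return bytes([0x50 + n])
    return bytes([0x41, 0x50 + (n - 8)])

for reg in ("rbx", "rbp", "r12"):
    print(reg, encode_push(reg).hex(" "))
```

push rbx and push rbp are one byte each (53 and 55, matching the fdopendir listing), while push r12 needs the extra 41 prefix byte, and the same holds for the matching pop.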
Disassembly of /lib/libc.so.6 (glibc) on my system (Arch Linux) shows similar but different code-gen for fdopendir. You stopped the disassembly too soon, before it makes a function call. That sheds some light on why it wanted a call-preserved temporary register: it wanted the var in a reg across the call.
00000000000c1260 <fdopendir>:
c1260: 55 push %rbp
c1261: 89 fe mov %edi,%esi
c1263: 53 push %rbx
c1264: 89 fb mov %edi,%ebx
c1266: bf 01 00 00 00 mov $0x1,%edi
c126b: 48 81 ec a8 00 00 00 sub $0xa8,%rsp
c1272: 64 48 8b 04 25 28 00 00 00 mov %fs:0x28,%rax # stack-check cookie
c127b: 48 89 84 24 98 00 00 00 mov %rax,0x98(%rsp)
c1283: 31 c0 xor %eax,%eax
c1285: 48 89 e5 mov %rsp,%rbp # save a pointer
c1288: 48 89 ea mov %rbp,%rdx # and pass it as a function arg
c128b: e8 90 7d 02 00 callq e9020 <__fxstat>
c1290: 85 c0 test %eax,%eax
c1292: 78 6a js c12fe <fdopendir+0x9e>
c1294: 8b 44 24 18 mov 0x18(%rsp),%eax
c1298: 25 00 f0 00 00 and $0xf000,%eax
c129d: 3d 00 40 00 00 cmp $0x4000,%eax
c12a2: 75 4c jne c12f0 <fdopendir+0x90>
....
c12c1: 48 89 e9 mov %rbp,%rcx # pass the pointer as the 4th arg
c12c4: 89 c2 mov %eax,%edx
c12c6: 31 f6 xor %esi,%esi
c12c8: 89 df mov %ebx,%edi
c12ca: e8 d1 f7 ff ff callq c0aa0 <__alloc_dir>
c12cf: 48 8b 8c 24 98 00 00 00 mov 0x98(%rsp),%rcx
c12d7: 64 48 33 0c 25 28 00 00 00 xor %fs:0x28,%rcx # check the stack cookie
c12e0: 75 38 jne c131a <fdopendir+0xba>
c12e2: 48 81 c4 a8 00 00 00 add $0xa8,%rsp
c12e9: 5b pop %rbx
c12ea: 5d pop %rbp
c12eb: c3 retq
This is pretty silly code-gen; gcc could have simply used mov %rsp, %rcx the 2nd time it needed it. I'd call this a missed-optimization. It never needed that pointer in a call-preserved register because it always knew where it was relative to RSP.
(Even if it hadn't been exactly at RSP+0, lea something(%rsp), %rdx and lea something(%rsp), %rcx would have been totally fine the two times it was needed, with probably less total cost than saving/restoring RBP + the required mov instructions.)
Or it could have used mov 0x18(%rbp),%eax instead of rsp to save a byte of code-size in that addressing mode. Avoiding direct references to RSP between function calls reduces the amount of stack-sync uops Intel CPUs need to insert.

Can I use scasb on a program's own code? [duplicate]

I wrote:
mov 60, %rax
GNU as accepted it, although I should have written
mov $60, %rax
Is there any difference between these two instructions?
Yes; the first loads the value stored in memory at address 60 and stores the result in rax, the second stores the immediate value 60 into rax.
Just try it...
mov 60,%rax
mov $60,%rax
mov 0x60,%rax
0000000000000000 <.text>:
0: 48 8b 04 25 3c 00 00 mov 0x3c,%rax
7: 00
8: 48 c7 c0 3c 00 00 00 mov $0x3c,%rax
f: 48 8b 04 25 60 00 00 mov 0x60,%rax
16: 00
Ewww! Historically the dollar sign meant hex ($60 = 0x60), but gas also has a history of screwing up assembly languages... and historically x86 assembly languages allowed 60h to indicate hex, but I got an error when I tried that.
So with and without the dollar sign you get a different instruction.
0x8B is the register/memory-to-register form, and 0xC7 is the immediate-to-register/memory form. So, as davmac answered, mov 60,%rax moves from a memory location to a register, and mov $60,%rax moves an immediate to a register.
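As a sketch of how the two encodings differ, the Python below classifies just the two byte patterns from the listing. describe_mov is a name invented here, and it deliberately rejects anything else; it is not a general x86 decoder.

```python
import struct

def describe_mov(code: bytes) -> str:
    """Classify the two REX.W mov encodings shown in the objdump listing."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48:
        raise ValueError("this sketch only handles REX.W-prefixed forms")
    # 8B /r with ModRM=04, SIB=25: absolute 32-bit address, destination rax
    if opcode == 0x8B and modrm == 0x04 and code[3] == 0x25:
        (addr,) = struct.unpack_from("<i", code, 4)
        return f"load 8 bytes from absolute address {addr:#x} into rax"
    # C7 /0 with ModRM=C0: sign-extended 32-bit immediate, destination rax
    if opcode == 0xC7 and modrm == 0xC0:
        (imm,) = struct.unpack_from("<i", code, 3)
        return f"put immediate {imm:#x} into rax"
    raise ValueError("encoding not covered by this sketch")

print(describe_mov(bytes.fromhex("488b04253c000000")))  # mov 60,%rax
print(describe_mov(bytes.fromhex("48c7c03c000000")))    # mov $60,%rax
```

The same 3c 00 00 00 bytes appear in both encodings; the opcode and ModRM bytes decide whether they are treated as an address or as a value.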

Resources