Sorry if this question is too naive, but I don't understand why my disassembly doesn't contain the "Hello, World!" string, or how that string gets loaded into memory when the program runs.
section .data
msg: db "Hello, World!",0xa
len: equ $-msg
global _start
section .text
_start:
mov rax,0x01
mov rdi,1
mov rsi,msg
mov rdx,len
syscall
mov rax, 0x3c
mov rdi, 0
syscall
Output of objdump -d:
a.out: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <_start>:
401000: b8 01 00 00 00 mov $0x1,%eax
401005: bf 01 00 00 00 mov $0x1,%edi
40100a: 48 be 00 20 40 00 00 movabs $0x402000,%rsi
401011: 00 00 00
401014: ba 0e 00 00 00 mov $0xe,%edx
401019: 0f 05 syscall
40101b: b8 3c 00 00 00 mov $0x3c,%eax
401020: bf 00 00 00 00 mov $0x0,%edi
401025: 0f 05 syscall
If it is moving the address of the msg string into rsi, how is that address decided before the program even runs? Is this all done by the linker? If so, can you give a little insight? I know each program has its own virtual memory, but does the linker place the string somewhere in memory while linking?
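As a rough check of what the listing already shows, the sketch below decodes the movabs bytes by hand and recomputes len. It only uses bytes copied from the objdump output above; the variable names are mine. Note that objdump -d disassembles executable sections only, so the contents of .data are not printed by it (objdump -s -j .data a.out should dump them).

```python
import struct

# Bytes of the movabs instruction, copied from the objdump listing above.
insn = bytes.fromhex("48 be 00 20 40 00 00 00 00 00")

# 0x48 is the REX.W prefix and 0xbe is "mov imm64 -> rsi"; the remaining
# 8 bytes are the little-endian immediate, i.e. the address of msg.
msg_addr = struct.unpack("<Q", insn[2:])[0]

# 'len' is an assemble-time constant: the size of the string in bytes.
msg = b"Hello, World!\n"
msg_len = len(msg)

print(hex(msg_addr), hex(msg_len))
```

The address is already a plain constant inside the instruction, which suggests it was fixed when the file was linked rather than computed at run time.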
Related
I'm trying to understand ELF relocation, and there are a couple of things that I don't really understand.
Say I've got:
relamain.c
#include <stdio.h>
#include <stdlib.h>
#include "relafoo.c"
int main() {
int n;
scanf("%d",&n);
printf("\ngot %d, %d!=%d",n,n,factorial(n));
return 0;
}
and relafoo.c
int factorial(int n) {
if (n == 0 || n == 1) {
return 1;
}
return factorial(n-1)*n;
}
Now, in the output of readelf -r relamain.o I see:
000000000027 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
000000000052 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
00000000005c 000c00000004 R_X86_64_PLT32 0000000000000000 __isoc99_scanf - 4
000000000066 000900000002 R_X86_64_PC32 0000000000000000 factorial - 4
How come I have two offsets for the same function (factorial)?
Running objdump -d relamain.o gives:
0000000000000031 <main>:
31: 55 push rbp
32: 48 89 e5 mov rbp,rsp
35: 48 83 ec 10 sub rsp,0x10
39: 64 48 8b 04 25 28 00 mov rax,QWORD PTR fs:0x28
40: 00 00
42: 48 89 45 f8 mov QWORD PTR [rbp-0x8],rax
46: 31 c0 xor eax,eax
48: 48 8d 45 f4 lea rax,[rbp-0xc]
4c: 48 89 c6 mov rsi,rax
4f: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 56 <main+0x25>
56: b8 00 00 00 00 mov eax,0x0
5b: e8 00 00 00 00 call 60 <main+0x2f>
60: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
63: 89 c7 mov edi,eax
65: e8 00 00 00 00 call 6a <main+0x39>
6a: 89 c1 mov ecx,eax
6c: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc]
6f: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
72: 89 c6 mov esi,eax
74: 48 8d 3d 00 00 00 00 lea rdi,[rip+0x0] # 7b <main+0x4a>
7b: b8 00 00 00 00 mov eax,0x0
80: e8 00 00 00 00 call 85 <main+0x54>
85: b8 00 00 00 00 mov eax,0x0
8a: 48 8b 75 f8 mov rsi,QWORD PTR [rbp-0x8]
8e: 64 48 33 34 25 28 00 xor rsi,QWORD PTR fs:0x28
95: 00 00
97: 74 05 je 9e <main+0x6d>
99: e8 00 00 00 00 call 9e <main+0x6d>
9e: c9 leave
9f: c3 ret
Looking at the generated code, I see that none of the calls refer to 66 or 27, which are the offsets for my factorial function. Why is that? According to "Learning Linux Binary Analysis" by Ryan "elfmaster" O'Neill, I should expect to see at least call 66 or call 27. Can anyone explain this?
If anyone can point me to a good book that explains dynamic linking and relocations in detail with examples (beyond the man pages, of course), that would be great.
1- You don't have two offsets for the same function. You have two offsets for the two locations where relocations must be applied. In your example there are two calls to factorial: one at offset 0x26 (the recursive call inside factorial itself, which precedes main in .text because relafoo.c is #included) and one at offset 0x65 (in main). The relocations attached to these calls are at offsets 0x27 and 0x66, one byte past each e8 opcode. The "00 00 00 00" at offsets 0x27 and 0x66 will be replaced by a value calculated by the linker. You can dump the linked executable to confirm.
2- When the object file is created, the assembler doesn't know factorial's address, so it emits "00 00 00 00" and adds a relocation telling the linker to replace these zeros with the value needed to reach factorial, since only the linker knows its exact location.
3- Maybe Linkers & Loaders by John R. Levine. However, I suggest you start by reading http://www.skyfree.org/linux/references/ELF_Format.pdf. It may be enough, depending on the level of understanding you seek.
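To make point 2 concrete, here is a minimal sketch of how a linker could fill in an R_X86_64_PC32 field, using the usual S + A - P formula (symbol address, plus addend, minus the address of the field being patched). The final address TEXT_ADDR and the helper name apply_pc32 are made up for illustration; only the offsets 0x65/0x66, the addend -4 and factorial's section offset 0 come from the output in the question.

```python
import struct

def apply_pc32(code, offset, sym_addr, addend, section_addr):
    """Patch a 32-bit PC-relative field with S + A - P."""
    place = section_addr + offset        # P: address of the field being patched
    value = sym_addr + addend - place    # S + A - P
    code[offset:offset + 4] = struct.pack("<i", value)
    return value

# Hypothetical final address of .text, chosen only for illustration.
TEXT_ADDR = 0x401000
text = bytearray(0xA0)   # stand-in for relamain.o's .text (main ends at 0x9f)
text[0x65] = 0xE8        # the call opcode from the listing

# factorial sits at offset 0 of the same section; the addend is -4.
factorial_addr = TEXT_ADDR + 0x0
apply_pc32(text, 0x66, factorial_addr, -4, TEXT_ADDR)

# The CPU adds the displacement to the address of the next instruction.
disp = struct.unpack("<i", text[0x66:0x6A])[0]
call_target = TEXT_ADDR + 0x6A + disp
print(text[0x65:0x6A].hex(), hex(call_target))
```

The -4 addend compensates for the fact that the displacement is measured from the end of the 4-byte field (the next instruction), not from the field itself.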
I used to think that when using an integer as a constant I always need to prepend a "$" sign, unless the number is to be interpreted as an address, so I had:
.data
a=$2
.equ b,3
.text
.globl _start
_start:
movl $a,%ebx
movl $b,%ecx
movl $1,%eax
int $0x80
This code is compiled into:
0: bb 00 00 00 00 mov $0x0,%ebx
5: b9 03 00 00 00 mov $0x3,%ecx
a: b8 01 00 00 00 mov $0x1,%eax
f: cd 80 int $0x80
I don't know how the as assembler deals with a=$2. Why is $2 assembled into 0x0?
Then I removed the "$":
.data
a=2
.equ b,3
.text
.globl _start
_start:
movl $a,%ebx
movl $b,%ecx
movl $1,%eax
int $0x80
This time, the compiled code is correct:
0: bb 02 00 00 00 mov $0x2,%ebx
5: b9 03 00 00 00 mov $0x3,%ecx
a: b8 01 00 00 00 mov $0x1,%eax
f: cd 80 int $0x80
So my question is: what's the difference between '2' and '$2' here?
I'm using 64-bit Ubuntu and gas.
I'm using AT&T syntax. I tried to load the character 'e' into the al register and build it as a 32-bit program:
$ cat c2.s
.code32
.globl _start
_start:
movb 'e',%al # Problem here!!!!
mov $1,%eax
mov $0,%ebx
int $0x80
$ as -g c2.s -o c2.o && ld c2.o -o c2
$ c2
Segmentation fault(SIGSEGV)
I used gdb to debug c2 and found that it crashes at movb 'e',%al. How weird: how could a "movb" crash?
Then I switched to Intel syntax, with the same content:
$ cat b2.s
.intel_syntax noprefix
.code32
.section .text
.global _start
_start:
mov al,'e'
mov eax,1
mov ebx,0
int 0x80
$ as -g b2.s -o b2.o && ld b2.o -o b2
$ b2
This time there was no problem. But why? Is there something wrong with my use of AT&T syntax?
Isn't that interesting. as produces this object code for the AT&T syntax:
400078: a0 65 00 00 00 b8 01 movabs 0x1b800000065,%al
40007f: 00 00
400081: 00 bb 00 00 00 00 add %bh,0x0(%rbx)
400087: cd 80 int $0x80
Obviously, location 0x1b800000065 is not mapped. The Intel version, however, produces:
400078: b0 65 mov $0x65,%al
40007a: b8 01 00 00 00 mov $0x1,%eax
40007f: bb 00 00 00 00 mov $0x0,%ebx
400084: cd 80 int $0x80
Remove .code32 from the AT&T version and you get this:
400078: 8a 04 25 65 00 00 00 mov 0x65,%al
40007f: b8 01 00 00 00 mov $0x1,%eax
400084: bb 00 00 00 00 mov $0x0,%ebx
400089: cd 80 int $0x80
Notice how it wants to move the contents of memory location 0x65 into AL. Without the "$", 'e' is treated as an address, not as an immediate value.
movb $'e',%al
fixes that problem. In any event, developing 32-bit code on a 64-bit system will probably give you grief at some point, especially when you start dealing with the stack.
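The odd address 0x1b800000065 in the first listing can be reproduced by hand. The sketch below uses only the bytes shown above; it assumes the usual encoding in which opcode a0 is followed by a 4-byte address in 32-bit mode and an 8-byte address in 64-bit mode.

```python
# Bytes emitted for the AT&T version, copied from the first listing.
raw = bytes.fromhex("a0 65 00 00 00 b8 01 00 00")

# 'e' is ASCII 0x65, which is where the 0x65 comes from.
char_code = ord("e")

# With .code32 the assembler emitted opcode a0 plus a 4-byte address.
addr32 = int.from_bytes(raw[1:5], "little")

# Decoded as 64-bit code, the same opcode takes an 8-byte address, so it
# swallows the first bytes of the following "mov $1,%eax" (b8 01 00 00 00).
addr64 = int.from_bytes(raw[1:9], "little")

print(hex(char_code), hex(addr32), hex(addr64))
```

So the bytes were generated for 32-bit mode but executed (and disassembled) as 64-bit code, which is why the instruction stream after the first mov looks scrambled.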
I've been reading and studying some assembly code. The code is below.
Disassembly of section .text:
08048510 <main>:
8048510: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048514: 83 e4 f0 and $0xfffffff0,%esp
8048517: ff 71 fc pushl -0x4(%ecx)
804851a: 55 push %ebp
804851b: 89 e5 mov %esp,%ebp
804851d: 51 push %ecx
804851e: 83 ec 08 sub $0x8,%esp
8048521: 68 e0 93 04 08 push $0x80493e0
8048526: 68 c0 93 04 08 push $0x80493c0
804852b: 68 c9 93 04 08 push $0x80493c9
8048530: e8 7a 07 00 00 call 8048caf <eos_printf>
8048535: c7 04 24 d6 93 04 08 movl $0x80493d6,(%esp)
804853c: e8 6e 07 00 00 call 8048caf <eos_printf>
8048541: a1 38 c0 04 08 mov 0x804c038,%eax
8048546: bc 00 00 00 00 mov $0x0,%esp
804854b: ff e0 jmp *%eax
804854d: 8b 4d fc mov -0x4(%ebp),%ecx
8048550: 31 c0 xor %eax,%eax
8048552: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048559: 00 00 00
804855c: c9 leave
804855d: 8d 61 fc lea -0x4(%ecx),%esp
8048560: c3 ret
Disassembly of section .data:
0804c030 <_irq_mask>:
804c030: ff (bad)
804c031: ff (bad)
804c032: ff (bad)
804c033: ff 01 incl (%ecx)
0804c034 <_eflags>:
804c034: 01 00 add %eax,(%eax)
...
0804c038 <_vector>:
804c038: 1d 8d 04 08 1d sbb $0x1d08048d,%eax
804c03d: 8d 04 08 lea (%eax,%ecx,1),%eax
804c040: 1d 8d 04 08 37 sbb $0x3708048d,%eax
804c045: 8d 04 08 lea (%eax,%ecx,1),%eax
At 0x8048541, the EAX register is set to 0x804c038.
At 0x804854b, the process jumps to the address held in the EAX register.
At 0x804c038, the instruction is <sbb $0x1d08048d,%eax>.
According to the instruction manual, sbb stands for dest = dest - (src + carry flag). So we can rewrite the instruction at 0x804c038 as %eax = %eax - ($0x1d08048d + carry flag).
Then, at that point, what is the value of the carry flag?
I couldn't find any instruction that sets the carry flag before the 0x804c038 line. Is the carry flag initially 0?
My second question: at 0x804854b, the process jumps to the *%eax value. After that, how does the process return to the main function? There is no return instruction in the _vector section.
I'd be glad of your help. Thanks.
Oh... @MarkPlotnick, you are a god to me. I was totally trapped by the <sbb $0x1d08048d,%eax>.
In the assembly source code, the _vector array and the _os_reset_handler function are defined as below.
.data
.global _vector
_vector:
.long _os_reset_handler
.long _os_reset_handler
.long _os_reset_handler
.long _os_irq_handler
.text
.global _os_reset_handler
_os_reset_handler:
_CLI
lea _os_init_stack_end, %esp
call _os_initialization
jmp _os_reset_handler
-----------------------
_CLI is defined as a macro in another C header file:
#define _CLI \
movl $0, _eflags;
I kept wondering why the _vector array did not contain the _os_reset_handler address. I read the disassembled code again and found that objdump had misaligned the hex bytes of the _vector data: it decoded the data as instructions, so the second "0x1d" (at address 0x804c03c) did not start a new line and the bytes were interpreted as irrelevant assembly code. (I'm very unhappy. I did no other work for 10 hours while trying to catch this problem...)
Anyway, the _os_reset_handler function is at address 0x8048d1d:
08048d1d <_os_reset_handler>:
8048d1d: c7 05 34 c0 04 08 00 movl $0x0,0x804c034
8048d24: 00 00 00
8048d27: 8d 25 48 d0 04 08 lea 0x804d048,%esp
8048d2d: e8 07 01 00 00 call 8048e39 <_os_initialization>
8048d32: e9 e6 ff ff ff jmp 8048d1d <_os_reset_handler>
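For anyone hitting the same thing, the .data bytes can be read as data instead of instructions. This sketch simply regroups the bytes from the objdump listing of _vector into 4-byte little-endian words, which is what .long emits.

```python
import struct

# Raw bytes starting at 0x804c038, concatenated from the .data listing.
raw = bytes.fromhex("1d8d0408" "1d8d0408" "1d8d0408" "378d0408")

# Each .long is a 4-byte little-endian value.
vector = struct.unpack("<4I", raw)
for i, entry in enumerate(vector):
    print(f"_vector[{i}] = {entry:#010x}")
```

The first three entries come out as 0x08048d1d, the address of _os_reset_handler. The fourth is 0x08048d37, the address right after the final jmp of _os_reset_handler, which is presumably where _os_irq_handler starts.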
No more questions. Thanks.
I want to convert this assembly program to shellcode.
The program just creates a file. My question is how I should convert assembly to shellcode when it uses the extern directive.
My assembly code is:
extern _fopen,_fclose
global main
section .text
main:
xor r10,r10
push r10
mov r13, 0x6277
push r13
mov rsi,rsp
push r10
mov r13, 0x726964656b616d
push r13
mov rdi,rsp
call _fopen
mov r14, rax
mov rdi, r14
call _fclose
mov rax, 0x2000001 ; exit
mov rdi, 0
syscall
I used these commands to build it:
nasm -f macho64 test2.asm
ld -o test -e main test2.o -lSystem
and I used objdump -d test to create the shellcode:
...........$ objdump -d test
test: file format mach-o-x86-64
Disassembly of section .text:
0000000000001f93 <main>:
1f93: 4d 31 d2 xor %r10,%r10
1f96: 41 52 push %r10
1f98: 41 bd 77 62 00 00 mov $0x6277,%r13d
1f9e: 41 55 push %r13
1fa0: 48 89 e6 mov %rsp,%rsi
1fa3: 41 52 push %r10
1fa5: 49 bd 6d 61 6b 65 64 movabs $0x726964656b616d,%r13
1fac: 69 72 00
1faf: 41 55 push %r13
1fb1: 48 89 e7 mov %rsp,%rdi
1fb4: e8 1d 00 00 00 callq 1fd6 <_fopen$stub>
1fb9: 49 89 c6 mov %rax,%r14
1fbc: 4c 89 f7 mov %r14,%rdi
1fbf: e8 0c 00 00 00 callq 1fd0 <_fclose$stub>
1fc4: b8 01 00 00 02 mov $0x2000001,%eax
1fc9: bf 00 00 00 00 mov $0x0,%edi
1fce: 0f 05 syscall
Disassembly of section __TEXT.__stubs:
0000000000001fd0 <_fclose$stub>:
1fd0: ff 25 3a 00 00 00 jmpq *0x3a(%rip) # 2010 <_fclose$stub>
0000000000001fd6 <_fopen$stub>:
1fd6: ff 25 3c 00 00 00 jmpq *0x3c(%rip) # 2018 <_fopen$stub>
Disassembly of section __TEXT.__stub_helper:
0000000000001fdc <__TEXT.__stub_helper>:
1fdc: 68 00 00 00 00 pushq $0x0
1fe1: e9 0a 00 00 00 jmpq 1ff0 <_fopen$stub+0x1a>
1fe6: 68 0e 00 00 00 pushq $0xe
1feb: e9 00 00 00 00 jmpq 1ff0 <_fopen$stub+0x1a>
1ff0: 4c 8d 1d 11 00 00 00 lea 0x11(%rip),%r11 # 2008 <>
1ff7: 41 53 push %r11
1ff9: ff 25 01 00 00 00 jmpq *0x1(%rip) # 2000 <>
1fff: 90 nop
Normally I take the opcodes from the "main" section, convert them to shellcode, and use this code to run it:
#include <sys/mman.h>
#include <inttypes.h>
#include <unistd.h>
char code[] = "\x4d\x31\xd2\x41\x52\x41...For Example ...";
int main()
{
int (*ret)() = (int (*)())code;
void *page = (void *)((uintptr_t)code & ~(getpagesize() - 1));
mprotect(page, sizeof code, PROT_EXEC);
ret();
return 0;
}
But in this case it doesn't work. I know I should also use the opcodes from the other sections listed below the main section, but I don't know in what order to arrange them.
Please guide me.
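One thing that may help to see why copying only main's bytes fails: the two calls are encoded as "e8 rel32", i.e. relative to the address of the next instruction. The sketch below recomputes their targets from the listing above; the helper name call_target is mine.

```python
import struct

def call_target(insn_addr, insn_bytes):
    """Return the destination of a 5-byte 'e8 rel32' call."""
    assert insn_bytes[0] == 0xE8 and len(insn_bytes) == 5
    rel32 = struct.unpack("<i", insn_bytes[1:])[0]
    # The displacement is added to the address of the next instruction.
    return insn_addr + 5 + rel32

fopen_stub = call_target(0x1FB4, bytes.fromhex("e8 1d 00 00 00"))
fclose_stub = call_target(0x1FBF, bytes.fromhex("e8 0c 00 00 00"))
print(hex(fopen_stub), hex(fclose_stub))
```

The results match the _fopen$stub and _fclose$stub addresses in the listing. Once the bytes are moved into a char array, the same displacements point a fixed distance past each call inside that array, where no stub exists. The stubs themselves jump through pointers that the dynamic loader fills in for this particular executable, so copying them along would probably not be enough either.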
Your assembly code is written for x64. Are you sure that the loader (the C 'main' program) is also compiled for x64?
I've tried this one with a Mach-O 64-bit binary:
for i in $( otool -t test2.o | cut -d ' ' -f 2- | grep ' '); do echo -n '\\x'$i; done; echo
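The same conversion can be sketched in Python if the shell loop is awkward to adapt. This is only an illustration: the function name to_shellcode_literal is made up, and the input is assumed to be whitespace-separated hex bytes such as the byte columns from objdump or otool.

```python
def to_shellcode_literal(hex_dump: str) -> str:
    """Turn whitespace-separated hex bytes into a C-style \\x string body."""
    data = bytes.fromhex(hex_dump)
    return "".join(f"\\x{b:02x}" for b in data)

# First two instructions of main from the objdump listing above.
print(to_shellcode_literal("4d 31 d2 41 52"))
```

The printed text can be pasted between the quotes of the code[] array in the C loader.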